DataSnap: Enabling Domain Experts and Introductory Programmers to Process Big Data in a Block-Based Programming Language

Virginia Tech


Block-based programming languages were originally designed for educational purposes. Because they demand little programming skill from their users, such languages have great potential to serve both introductory programmers in educational settings and domain experts who need a data processing tool. However, the current design of block-based languages fails to address critical needs of these two audiences: 1) domain experts cannot perform crucial steps: importing data sources, processing data efficiently, and visualizing results; 2) online assignments for introductory programmers focus on entertainment (e.g., games, animation), which fails to convince students that computer science is important, relevant, and related to their day-to-day experiences.

In this thesis, we present the design and implementation of DataSnap, a block-based programming language that extends Snap!. Our work advances the state of the art in block-based programming languages for our two target audiences: domain experts and introductory programmers. Specifically, we: 1) provide easy-to-use interfaces for big data import, processing, and visualization for domain experts; 2) integrate relevant social media, geographic, and business-related data sets into online educational platforms for introductory programmers and enable teachers to develop their own real-time and big-data access blocks; and 3) present DataSnap in the Open edX online courseware platform, along with customized problem definitions and a dynamic analysis grading system. Together, these contributions encourage the further development and use of block-based languages for a broader range of audiences.



computer science, block-based programming