Browsing by Author "Mittelman, David Alexander"
- Automatically Generating Tests from Natural Language Descriptions of Software Behavior
  Sunil Kamalakar, FNU (Virginia Tech, 2013-10-18)
  Behavior-Driven Development (BDD) is an emerging agile development approach where all stakeholders (including developers and customers) work together to write user stories in structured natural language to capture a software application's functionality in terms of required "behaviors". Developers then manually write "glue" code so that these scenarios can be executed as software tests. This glue code represents individual steps within unit and acceptance test cases, and tools exist that automate the mapping from scenario descriptions to manually written code steps (typically using regular expressions). Instead of requiring programmers to write manual glue code, this thesis investigates a practical approach to convert natural language scenario descriptions into executable software tests fully automatically. To show feasibility, we developed a tool called Kirby that uses natural language processing techniques, code information extraction and probabilistic matching to automatically generate executable software tests from structured English scenario descriptions. Kirby relieves the developer from the laborious work of writing code for the individual steps described in scenarios, so that developers and customers can both focus on the scenarios as pure behavior descriptions (understandable to all, not just programmers). Results from assessing the performance and accuracy of this technique are presented.
- Bioflow: A web based workflow management system for design and execution of genomics pipelines
  Puthige, Ashwin Acharya (Virginia Tech, 2014-01-11)
  The cost of sequencing genomes has decreased drastically in the last few years. The knowledge of full genomes has increased the pace of advancements in the field of functional genomics. Computational genomics, which analyzes these sequences, has seen similar growth. The multitude of sequencing technologies has resulted in various formats for storing the sequences, which in turn has led to the creation of many tools for DNA analysis. There are various tools for sorting, indexing, analyzing read groups and other tasks. Genomic analysis often requires the creation of pipelines, which process the DNA sequences by chaining together many tools. This results in complex scripts that glue together these tools and pass the output from one stage to the next. There are also tools that allow these pipelines to be created with a graphical user interface, but they are complex to use, and it is difficult to quickly add newly developed tools to existing workflows. To solve these issues, we developed BioFlow, a web-based genomic workflow management system. Using BioFlow does not require any programming skills. The integrated workflow designer allows creating and saving workflows. A pipeline is created by connecting the tools with a visual connector. BioFlow provides a simple interface that allows users to quickly add tools for use in any workflow. Audit logs are maintained at each stage, which helps users easily identify and fix errors.
- NextBrowse: An integrated and interactive web-based genome browser for analyzing and interpreting genomic data
  Whisenhunt, Phillip J. (Virginia Tech, 2012-04-23)
  With the advent of high throughput sequencing technologies over the past decade there has been a surge in the amount of genomic data that needs to be analyzed and interpreted. Despite the availability of software frameworks such as the Genome Analysis Toolkit, data interpretation and analysis still requires human intervention and refinement. Genome browsers enable developers and users of sequence analysis tools to visualize, compare, and better interpret genomic data such as gene expression and functional annotations. We developed a next generation cross platform web-based genome browser, NextBrowse, for visualizing General Feature Format and Binary Alignment Map files. NextBrowse uses advanced visualization techniques such as 3D feature selection and transparency based on mapping quality, and improved Graphical User Interface elements such as individual track searching and textual and graphical reference location. NextBrowse is the first genome browser to allow BAM files to be streamed and visualized, the first genome browser to employ security measures, and the first to use only client side rendering. NextBrowse takes advantage of the open-source community, allowing developers and users to extend the project to fit their needs. NextBrowse along with all documentation is available for use at http://www.nextbrowse.vbi.vt.edu.
- Optimizing analysis pipelines for improved variant discovery
  Highnam, Gareth Wei An (Virginia Tech, 2014-04-17)
  In modern genomics, all experiments begin data collection with sequencing and downstream alignment or assembly processing. As such, the development of reliable sequencing pipelines is hugely important as a foundation for any future analysis of that data. While much existing work has been done on enhancing the throughput and computational performance of such pipelines, there is still the question of accuracy. The gap in knowledge between speed and accuracy can be attributed to the fact that accuracy is conceptually harder to measure. Unlike simply parsing logs of memory usage and CPU hours, accuracy requires experimental validation. Subsets of accuracy also arise when assessing alignment or variation around particular genomic features such as indels, Copy Number Variants (CNVs), or microsatellite repeats. Here we develop accuracy measurements for read alignment and variation calls, allowing the optimization of sequencing pipelines at all stages. The underlying hypothesis, then, is that different sequencing platforms and analysis software can be distinguished from each other in accuracy by both sample and genomic variation of interest. As the term accuracy suggests, the measurements of alignment and variation recall require comparison against a truth set, for which we have used read library simulations and high quality data from the Genome in a Bottle Consortium or the Illumina Omni array. In exploring the hypothesis, the measurements are built into a community resource to crowdsource the creation of a benchmarking repository for pipeline comparison. Results from pipelines promoted by this computational model are then wet lab validated with support for a hierarchy of pipeline performance.
  In particular, the construction of an accurate pipeline for genotyping microsatellite repeats will be investigated, and that pipeline is then used to create a database of human microsatellites. Progress in this area is vital for the growth of sequencing in both clinical and research settings. For genomics research to fully translate to the bedside, the boom of new technology must be controlled by rational metrics and industry standardization. This project will address both of these issues, as well as contribute to the understanding of human microsatellite variation.
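
The last abstract describes measuring variation recall by comparing a pipeline's calls against a truth set. A minimal sketch of that kind of comparison follows; it is illustrative only and is not the thesis's method. The `Variant` record, the `score_calls` function, and the exact-match rule (same chromosome, position, reference and alternate allele) are all assumptions introduced here.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not taken from the thesis)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def score_calls(calls: Iterable[Variant], truth: Iterable[Variant]) -> dict:
    """Compare called variants against a truth set using exact matching.

    Assumption: two variants match only if chromosome, position, reference
    allele and alternate allele are all identical.
    """
    call_set, truth_set = set(calls), set(truth)
    tp = len(call_set & truth_set)   # called and present in the truth set
    fp = len(call_set - truth_set)   # called but absent from the truth set
    fn = len(truth_set - call_set)   # in the truth set but not called
    recall = tp / (tp + fn) if truth_set else 0.0
    precision = tp / (tp + fp) if call_set else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "recall": recall, "precision": precision}


# Toy data, invented purely for illustration.
truth = [Variant("chr1", 100, "A", "G"),
         Variant("chr1", 250, "C", "T"),
         Variant("chr2", 42, "G", "GA")]
calls = [Variant("chr1", 100, "A", "G"),
         Variant("chr2", 42, "G", "GA"),
         Variant("chr3", 7, "T", "C")]

result = score_calls(calls, truth)
```

Exact matching is the simplest possible rule; indels and microsatellite repeats can often be represented in more than one equivalent way, so a real comparison would likely need normalization before matching.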