Browsing by Author "Li, Song"
Now showing 1 - 20 of 56
- Analysis of Shoot Architecture Traits in Edamame Reveals Potential Strategies to Improve Harvest EfficiencyDhakal, Kshitiz; Zhu, Qian; Zhang, Bo; Li, Mao; Li, Song (2021-03-03)Edamame is a type of green, vegetable soybean and improving shoot architecture traits for edamame is important for breeding of high-yield varieties by decreasing potential loss due to harvesting. In this study, we use digital imaging technology and computer vision algorithms to characterize major traits of shoot architecture for edamame. Using a population of edamame PIs, we seek to identify underlying genetic control of different shoot architecture traits. We found significant variations in the shoot architecture of the edamame lines including long-skinny and candle stick-like structures. To quantify the similarity and differences of branching patterns between these edamame varieties, we applied a topological measurement called persistent homology. Persistent homology uses algebraic geometry algorithms to measure the structural similarities between complex shapes. We found intriguing relationships between the topological features of branching networks and pod numbers in our plant population, suggesting combination of multiple topological features contribute to the overall pod numbers on a plant. We also identified potential candidate genes including a lateral organ boundary gene family protein and a MADS-box gene that are associated with the pod numbers. This research provides insight into the genetic regulation of shoot architecture traits and can be used to further develop edamame varieties that are better adapted to mechanical harvesting.
- Antibiotics ameliorate lupus-like symptoms in miceMu, Qinghui; Tavella, Vincent J.; Kirby, Jay L.; Cecere, Thomas E.; Chung, Matthias; Lee, Jiyoung; Li, Song; Ahmed, Sattar Ansar; Eden, Kristin; Allen, Irving C. (Nature, 2017-10-20)Gut microbiota and the immune system interact to maintain tissue homeostasis, but whether this interaction is involved in the pathogenesis of systemic lupus erythematosus (SLE) is unclear. Here we report that oral antibiotics given during active disease removed harmful bacteria from the gut microbiota and attenuated SLE-like disease in lupus-prone mice. Using MRL/lpr mice, we showed that antibiotics given after disease onset ameliorated systemic autoimmunity and kidney histopathology. They decreased IL-17-producing cells and increased the level of circulating IL-10. In addition, antibiotics removed Lachnospiraceae and increased the relative abundance of Lactobacillus spp., two groups of bacteria previously shown to be associated with deteriorated or improved symptoms in MRL/lpr mice, respectively. Moreover, we showed that the attenuated disease phenotype could be recapitulated with a single antibiotic vancomycin, which reshaped the gut microbiota and changed microbial functional pathways in a time-dependent manner. Furthermore, vancomycin treatment increased the barrier function of the intestinal epithelium, thus preventing the translocation of lipopolysaccharide, a cell wall component of Gram-negative Proteobacteria and known inducer of lupus in mice, into the circulation. These results suggest that mixed antibiotics or a single antibiotic vancomycin ameliorate SLE-like disease in MRL/lpr mice by changing the composition of gut microbiota.
- Application of Machine Learning and Hyperspectral Imaging in Plant Phenomics ResearchDhakal, Kshitiz (Virginia Tech, 2023-03-08)
- Applications of Machine Learning in Source Attribution and Gene Function PredictionChinnareddy, Sandeep (Virginia Tech, 2024-06-07)This research investigates the application of machine learning techniques in computational genomics across two distinct domains: (1) predicting the source of bacterial pathogens using whole genome sequencing data, and (2) the functional annotation of genes using single-cell RNA sequencing data. This work proposes the development of a bioinformatics pipeline tailored for identifying genomic variants, including gene presence/absence and single nucleotide polymorphisms. This methodology is applied to specific strains such as Salmonella enterica serovar Typhimurium and the Ralstonia solanacearum species complex. Phylogenetic analyses along with pan-genome and positive selection studies show that genomic variants and evolutionary patterns of S. Typhimurium vary across sources, which suggests that sources can be accurately attributed based on genomic variants empowered by machine learning. We benchmarked seven traditional machine learning algorithms, achieving a notable accuracy of 94.6% in host prediction for S. Typhimurium using the Random Forest model, underscored by SHAP value analyses which elucidated key predictive features. Next, the focus is shifted to the prediction of Gene Ontology terms for Arabidopsis genes using single-cell RNA-seq data. This analysis offers a detailed comparison of gene expression in root versus shoot tissues, juxtaposed with insights from bulk RNA-seq data. The integration of regulatory network data from DAP-seq significantly enhances the prediction accuracy of gene functions.
- Arabidopsis bioinformatics resources: The current state, challenges, and priorities for the futureDoherty, Colleen; Friesner, Joanna; Gregory, Brian; Loraine, Ann; Megraw, Molly; Meyers, Blake C.; Provart, Nicholas J.; Slotkin, R. Keith; Town, Chris; Assmann, Sarah M.; Axtell, Michael J.; Berardini, Tanya; Chen, Sixue; Gehan, Malia; Huala, Eva; Jaiswal, Pankaj; Larson, Stephen; Li, Song; May, Sean; Michael, Todd; Pires, J. Chris; Topp, Chris; Walley, Justin; Wurtele, Eve (Wiley, 2019-01-01)Effective research, education, and outreach efforts by the Arabidopsis thaliana community, as well as other scientific communities that depend on Arabidopsis resources, depend vitally on easily available and publicly-shared resources. These resources include reference genome sequence data and an ever-increasing number of diverse data sets and data types. TAIR (The Arabidopsis Information Resource) and Araport (originally named the Arabidopsis Information Portal) are community informatics resources that provide tools, data, and applications to the more than 30,000 researchers worldwide that use in their work either Arabidopsis as a primary system of study or data derived from Arabidopsis. Four years after Araport's establishment, the IAIC held another workshop to evaluate the current status of Arabidopsis Informatics and chart a course for future research and development. The workshop focused on several challenges, including the need for reliable and current annotation, community-defined common standards for data and metadata, and accessible and user-friendly repositories/tools/methods for data integration and visualization. Solutions envisioned included (a) a centralized annotation authority to coalesce annotation from new groups, establish a consistent naming scheme, distribute this format regularly and frequently, and encourage and enforce its adoption. 
(b) Standards for data and metadata formats, which are essential, but challenging when comparing across diverse genotypes and in areas with less-established standards (e.g., phenomics, metabolomics). Community-established guidelines need to be developed. (c) A searchable, central repository for analysis and visualization tools. Improved versioning and user access would make tools more accessible. Workshop participants proposed a "one-stop shop" website, an Arabidopsis "Super-Portal" to link tools, data resources, programmatic standards, and best practice descriptions for each data type. This must have community buy-in and participation in its establishment and development to encourage adoption.
- Comparative Functional Genomics Characterization of Low Phytic Acid Soybeans and Virus Resistant SoybeansDeMers, Lindsay Carlisle (Virginia Tech, 2020-06-02)The field of functional genomics aims to understand the complex relationship between genotype and phenotype by integrating genome-wide approaches, such as transcriptomics, proteomics, and metabolomics. Large-scale "-omics" research has been made widely possible by the advent of high-throughput techniques, such as next-generation sequencing and mass spectrometry. The vast data generated from such studies provide a wealth of information on the biological dynamics underlying phenotypes. Though functional genomics approaches are used extensively in human disease research, their use also spans organisms as minuscule as mycoplasmas to as great as sperm whales. In particular, functional genomics is instrumental in agricultural advancements for the improvement of productivity and sustainability in crop and livestock production. Improvement in soybean production is especially imperative, as soybeans are a primary source of oil and protein for human and livestock consumption, respectively. The research presented here employs functional genomics approaches – transcriptomics and metabolomics – to discern the transcriptional regulation and metabolic events underlying two economically important agronomic traits in soybean: seed phytic acid content and Soybean mosaic virus resistance. At normal levels, seed phytic acid content inhibits mineral absorption in humans and livestock, acting as an antinutrient and contributing to phosphorus pollution; however, the development of low phytic acid soybeans has helped mitigate these issues, as their seeds increase nutrient bioavailability and reduce environmental impact. Despite these desirable qualities, low phytic acid soybeans exhibit poor seed performance, which negatively affects germination rates and yield and has prevented their large-scale commercial production.
Thus, part of the focus of this research was investigating the effects of mutations conferring the low phytic acid phenotype on seed germination. Comparative studies between low and normal phytic acid soybean seeds were carried out and revealed distinct differences in metabolite profiles and in the transcriptional regulation of biological pathways that may be vital for successful seed germination. The final part of this research concerns Rsv3-mediated extreme resistance, a unique mode of resistance that is effective against the most virulent strains of Soybean mosaic virus. The molecular mechanisms governing this type of resistance are poorly characterized. Therefore, the research presented here attempts to elucidate the regulatory elements responsible for the induction of the Rsv3-mediated extreme resistance response. Utilizing a comparative transcriptomic time series approach on Soybean mosaic virus-inoculated Rsv3 (resistant) and rsv3 (susceptible) soybean lines, this final study provides gene candidates putatively functioning in the regulation of biological pathways demonstrated to be crucial for Rsv3-mediated resistance.
- Comparing time series transcriptome data between plants using a network module finding algorithmLee, Jiyoung; Heath, Lenwood S.; Grene, Ruth; Li, Song (2019-06-01)Background Comparative transcriptome analysis is the comparison of expression patterns between homologous genes in different species. Since most molecular mechanistic studies in plants have been performed in model species, including Arabidopsis and rice, comparative transcriptome analysis is particularly important for functional annotation of genes in diverse plant species. Many biological processes, such as embryo development, are highly conserved between different plant species. The challenge is to establish one-to-one mapping of the developmental stages between two species. Results In this manuscript, we solve this problem by converting the gene expression patterns into co-expression networks and then apply network module finding algorithms to the cross-species co-expression network. We describe how such analyses are carried out using bash scripts for preliminary data processing followed by using the R programming language for module finding with a simulated annealing method. We also provide instructions on how to visualize the resulting co-expression networks across species. Conclusions We provide a comprehensive pipeline from installing software and downloading raw transcriptome data to predicting homologous genes and finding orthologous co-expression networks. From the example provided, we demonstrate the application of our method to reveal functional conservation and divergence of genes in two plant species.
- Computational Analysis of Gene Expression Regulation from Cross Species Comparison to Single Cell ResolutionLee, Jiyoung (Virginia Tech, 2020-08-31)Gene expression regulation is dynamic and specific to various factors such as developmental stages, environmental conditions, and stimulation of pathogens. Nowadays, a tremendous amount of transcriptome data sets are available from diverse species. This trend enables us to perform comparative transcriptome analysis that identifies conserved or diverged gene expression responses across species using transcriptome data. The goal of this dissertation is to develop and apply approaches of comparative transcriptomics to transfer knowledge from model species to non-model species with the hope that such an approach can contribute to the improvement of crop yield and human health. First, we presented a comprehensive method to identify cross-species modules between two plant species. We adapted the unsupervised network-based module finding method to identify conserved patterns of co-expression and functional conservation between Arabidopsis, a model species, and soybean, a crop species. Second, we compared drought-responsive genes across Arabidopsis, soybean, rice, corn, and Populus in order to explore the genomic characteristics that are conserved under drought stress across species. We identified hundreds of common gene families and conserved regulatory motifs between monocots and dicots. We also presented a BLS-based clustering method which takes into account evolutionary relationships among species to identify conserved co-expression genes. Last, we analyzed single-cell RNA-seq data from monocytes to attempt to understand regulatory mechanism of innate immune system under low-grade inflammation. We identified novel subpopulations of cells treated with lipopolysaccharide (LPS), that show distinct expression patterns from pro-inflammatory genes. 
The data revealed that a promising therapeutic reagent, sodium 4-phenylbutyrate, masked the effect of LPS. We inferred the existence of specific cellular transitions under different treatments and prioritized important motifs that modulate the transitions using feature selection by a random forest method. There has been a transition in genomics research from bulk RNA-seq to single-cell RNA-seq, and scRNA-seq has become a widely used approach for transcriptome analysis. With the experience we gained by analyzing scRNA-seq data, we plan to conduct comparative single-cell transcriptome analysis across multiple species.
- Computational Tools for Improved Detection, Identification, and Classification of Plant Pathogens Using Genomics and MetagenomicsJohnson, Marcela Aguilera (Virginia Tech, 2023-02-13)Plant pathogens are one of the biggest threats to plant health and food security worldwide. To effectively contain plant disease outbreaks, classification and precise identification of pathogens is crucial to determine treatment and preventive measurements. Conventional methods of detection such as PCR may not be sufficient when the pathogen in question is unknown. Advances in sequencing technology have made it possible to sequence entire genomes and metagenomes in real-time and at a relatively low cost, opening an opportunity for the development of alternative methods for detection of novel and unknown plant pathogens. Within this dissertation, an integrated approach is used to reclassify a high-impact group of plant pathogens. Additionally, the application of metagenomics and nanopore sequencing using the Oxford Nanopore Technologies (ONT) MinION for fungal and bacterial plant pathogen detection and precise identification are demonstrated. To improve the classification of the strains belonging to the Ralstonia solanacearum species complex (RSSC), we performed a meta-analysis using a comparative genomics and a reverse ecology approach to accurately portray and refine the understanding of the diversity and evolution of the RSSC. The groups identified by these approaches were circumscribed and made publicly available through the LINbase web server so future isolates can be properly classified. To develop a culture-free detection method of plant pathogens, we used metagenomes of various plants and long-read nanopore sequencing to precisely identify plant pathogens to the strain-level and performed phylogenetic analysis with SNP resolution. In the first paper, we used tomato plants to demonstrate the detection power of bacterial plant pathogens. 
We compared bioinformatics tools for detection at the strain-level using reads and assemblies. In the second paper, we used a read-based approach to test the feasibility of the methodology to precisely detect the fungal pathogen causing boxwood blight. Lastly, with the improvement in nanopore sequencing, we used grapevine petioles to investigate whether we can go beyond detection and identification and do a phylogenetic analysis. We assembled a metagenome-assembled genome (MAG) of almost the same quality as the genomes obtained from cultured isolates and did a phylogenetic analysis with SNP resolution. Finally, for the cases where there may be no related genome in the database like the pathogen in question, we used machine learning and metagenomics to develop a reference-free approach to detection of plant diseases. We trained eight different machine learning models with reads from healthy and infected plant metagenomes and compared the classification accuracy of reads as belonging to a healthy or infected plant. From the comparison, random forest was the best model in terms of computational resources needed while maintaining a high accuracy (> 0.90).
- CoSpliceNet: a framework for co-splicing network inference from transcriptomics dataAghamirzaie, Delasa; Collakova, Eva; Li, Song; Grene, Ruth (BMC, 2016)Background: Alternative splicing has been proposed to increase transcript diversity and protein plasticity in eukaryotic organisms, but the extent to which this is the case is currently unclear, especially with regard to the diversification of molecular function. Eukaryotic splicing involves complex interactions of splicing factors and their targets. Inference of co-splicing networks capturing these types of interactions is important for understanding this crucial, highly regulated post-transcriptional process at the systems level. Results: First, several transcript and protein attributes, including coding potential of transcripts and differences in functional domains of proteins, were compared between splice variants and protein isoforms to assess transcript and protein diversity in a biological system. Alternative splicing was shown to increase transcript and function-related protein diversity in developing Arabidopsis embryos. Second, CoSpliceNet, which integrates co-expression and motif discovery at splicing regulatory regions to infer co-splicing networks, was developed. CoSpliceNet was applied to temporal RNA sequencing data to identify candidate regulators of splicing events and predict RNA-binding motifs, some of which are supported by prior experimental evidence. Analysis of inferred splicing factor targets revealed an unexpected role for the unfolded protein response in embryo development. Conclusions: The methods presented here can be used in any biological system to assess transcript diversity and protein plasticity and to predict candidate regulators, their targets, and RNA-binding motifs for splicing factors. CoSpliceNet is freely available at http://delasa.github.io/co-spliceNet/.
- Cyberbiosecurity Challenges of Pathogen Genome DatabasesVinatzer, Boris A.; Heath, Lenwood S.; Almohri, Hussain M.J.; Stulberg, Michael J.; Lowe, Christopher; Li, Song (Frontiers, 2019-05-15)Pathogen detection, identification, and tracking are shifting from non-molecular methods, DNA fingerprinting methods, and single gene methods to methods relying on whole genomes. Viral Ebola and influenza genome data are being used for real-time tracking, while food-borne bacterial pathogen outbreaks and hospital outbreaks are investigated using whole genomes in the UK, Canada, the USA, and other countries. Also, plant pathogen genomes are starting to be used to investigate plant disease epidemics such as the wheat blast outbreak in Bangladesh. While these genome-based approaches provide never-seen advantages over all previous approaches with regard to public health and biosecurity, they also come with new vulnerabilities and risks with regard to cybersecurity. The more we rely on genome databases, the more likely these databases will become targets for cyber-attacks to interfere with public health and biosecurity systems by compromising their integrity, taking them hostage, or manipulating the data they contain. Also, while there is the potential to collect pathogen genomic data from infected individuals or agricultural and food products during disease outbreaks to improve disease modeling and forecasting, how to protect the privacy of individuals, growers, and retailers is another major cyberbiosecurity challenge. As data become linkable to other data sources, individuals and groups become identifiable and potential malicious activities targeting those identified become feasible. Here, we define a number of potential cybersecurity weaknesses in today's pathogen genome databases to raise awareness, and we provide potential solutions to strengthen cyberbiosecurity during the development of the next generation of pathogen genome databases.
- Developing machine learning tools to understand transcriptional regulation in plantsSong, Qi (Virginia Tech, 2019-09-09)Abiotic stresses constitute a major category of stresses that negatively impact plant growth and development. It is important to understand how plants cope with environmental stresses and reprogram gene responses which in turn confers stress tolerance. Recent advances of genomic technologies have led to the generation of much genomic data for the model plant, Arabidopsis. To understand gene responses activated by specific external stress signals, these large-scale data sets need to be analyzed to generate new insight of gene functions in stress responses. This poses new computational challenges of mining gene associations and reconstructing regulatory interactions from large-scale data sets. In this dissertation, several computational tools were developed to address the challenges. In Chapter 2, ConSReg was developed to infer condition-specific regulatory interactions and prioritize transcription factors (TFs) that are likely to play condition specific regulatory roles. Comprehensive investigation was performed to optimize the performance of ConSReg and a systematic recovery of nitrogen response TFs was performed to evaluate ConSReg. In Chapter 3, CoReg was developed to infer co-regulation between genes, using only regulatory networks as input. CoReg was compared to other computational methods and the results showed that CoReg outperformed other methods. CoReg was further applied to identified modules in regulatory network generated from DAP-seq (DNA affinity purification sequencing). Using a large expression dataset generated under many abiotic stress treatments, many regulatory modules with common regulatory edges were found to be highly co-expressed, suggesting that target modules are structurally stable modules under abiotic stress conditions. 
In Chapter 4, exploratory analysis was performed to classify cell types for Arabidopsis root single cell RNA-seq data. This is a first step towards construction of a cell-type-specific regulatory network for Arabidopsis root cells, which is important for improving current understanding of stress response.
- Development of Open-Source Gantry-Plus Robot Systems for Plant Science researchKaundanya, Adwait Anand (Virginia Tech, 2024-12-19)Affordable and readily available automation options for plant research remain scarce; however, with the availability of such a system, many research tasks can be streamlined. In this project, we demonstrate a prototype of such an open-source, low-cost, heterogeneous robotic system called Mini T-Rex. We combine two over-the-counter robots and leverage the ROS2 framework to control this heterogeneous system. This system provides the unique advantage of a sensor-to-plant method to capture multi-view images at any angle and distance within the workspace. We demonstrate how making a digital twin in ROS2 can help to control a heterogeneous system via abstracted hardware control. We also describe I2GROW Oasis, a robotic system consisting of a remotely controlled robot with the ability to capture top-view images. In this thesis we describe the hardware and software design of both these robotic systems. To use this robotic system, the plants can be grown on a growth bed or a hydroponic system below the Mini T-Rex robot, and the camera will approach the plant without any contact with the plants due to the precise control of the robotic manipulator. We used the system to capture several large data sets of 3D phenotypic data for Solanum lycopersicum, Lactuca sativa, and Thlaspi. In conclusion, we have developed a 9-degree-of-freedom, fully open-source heterogeneous robotic system, called Mini T-Rex, capable of multi-view, camera-to-plant image capture for plant 3D model reconstruction. We show how to use gantry-like robots for phenotyping and create longitudinal datasets by automating these high-precision robotic systems.
- Development of tools to study the association of transposons to agronomic traitsYan, Haidong (Virginia Tech, 2020-05-21)Transposable elements (Transposons; TEs) constitute the majority of DNA in genomes and are a major source of genetic polymorphisms. TEs act as potential regulators of gene expression and lead to phenotypic plasticity in plants and animals. In crops, several TEs were identified to influence alleles associated with important agronomic traits, such as apical dominance in maize and seed number in rice. Crops may harbor more TE-mediated genetic regulations than expected in view of multifunctional TEs in genomes. However, tools that accurately annotate TEs and clarify their associations with agronomic traits are still lacking, which largely limits applications of TEs in crop breeding. Here we 1) evaluate the performance of popular tools and strategies to identify TEs in genomes, 2) develop a tool 'DeepTE' to annotate TEs based on deep learning models, and 3) develop a tool 'TE-marker' to identify potential TE-regulated alleles associated with agronomic traits. First, we propose a series of recommendations and a guideline to develop a comprehensive library to precisely identify TEs in genomes. Second, 'DeepTE' classifies TEs into 15-24 super families according to sequences from plants, metazoans, and fungi. For unknown sequences, this tool can distinguish non-TEs and TEs in plant species. Finally, the 'TE-marker' tool builds a TE-based marker system that is able to cluster rice populations similarly to a classical SNP marker approach. This system can also detect association peaks that are equivalent to the ones produced by SNP markers. 'TE-marker' is a novel complementary approach to the classical SNP markers in that it assists in revealing population structures and in identifying alleles associated with agronomic traits.
- Direct sequencing and expression analysis of a large number of miRNAs in Aedes aegypti and a multi-species survey of novel mosquito miRNAsLi, Song; Mead, Edward A.; Liang, Shaohui; Tu, Zhijian Jake (2009-12-04)Background MicroRNAs (miRNAs) are a novel class of gene regulators whose biogenesis involves hairpin structures called precursor miRNAs, or pre-miRNAs. A pre-miRNA is processed to make a miRNA:miRNA* duplex, which is then separated to generate a mature miRNA and a miRNA*. The mature miRNAs play key regulatory roles during embryonic development as well as other cellular processes. They are also implicated in control of viral infection as well as innate immunity. Direct experimental evidence for mosquito miRNAs has been recently reported in anopheline mosquitoes based on small-scale cloning efforts. Results We obtained approximately 130,000 small RNA sequences from the yellow fever mosquito, Aedes aegypti, by 454 sequencing of samples that were isolated from mixed-age embryos and midguts from sugar-fed and blood-fed females, respectively. We also performed bioinformatics analysis on the Ae. aegypti genome assembly to identify evidence for additional miRNAs. The combination of these approaches uncovered 98 different pre-miRNAs in Ae. aegypti which could produce 86 distinct miRNAs. Thirteen miRNAs, including eight novel miRNAs identified in this study, are currently only found in mosquitoes. We also identified five potential revisions to previously annotated miRNAs at the miRNA termini, two cases of highly abundant miRNA* sequences, 14 miRNA clusters, and 17 cases where more than one pre-miRNA hairpin produces the same or highly similar mature miRNAs. A number of miRNAs showed higher levels in midgut from blood-fed female than that from sugar-fed female, which was confirmed by northern blots on two of these miRNAs. Northern blots also revealed several miRNAs that showed stage-specific expression.
Detailed expression analysis of eight of the 13 mosquito-specific miRNAs in four divergent mosquito genera identified cases of clearly conserved expression patterns and obvious differences. Four of the 13 miRNAs are specific to certain lineage(s) within mosquitoes. Conclusion This study provides the first systematic analysis of miRNAs in Ae. aegypti and offers a substantially expanded list of miRNAs for all mosquitoes. New insights were gained on the evolution of conserved and lineage-specific miRNAs in mosquitoes. The expression profiles of a few miRNAs suggest stage-specific functions and functions related to embryonic development or blood feeding. A better understanding of the functions of these miRNAs will offer new insights in mosquito biology and may lead to novel approaches to combat mosquito-borne infectious diseases.
- Ecology of Root Nodule Bacterial Diversity: Implications for Soybean GrowthSharaf, Hazem (Virginia Tech, 2021-11-30)Diazotrophs supply legumes such as soybean (Glycine max L. Merr) with nitrogen (N) needed for protein synthesis through biological nitrogen fixation (BNF). Through BNF, bacteria such as Bradyrhizobium that reside in soybean root nodules convert atmospheric nitrogen (N2) into ammonia (NH3/NH4), a form that is biologically available for use by the plants, in return for photosynthate carbon from the plant. Abiotic stresses such as drought disrupt BNF and subsequently affect soybean yield. In addition, increasing demand for soybean is leading to supplementing its growth with synthetic N fertilizer. However, fertilizer application is known for its detrimental effects on the environment, causing eutrophication of waterways and contributing to global warming. On the other hand, diazotrophs can supply soybean with up to 90% of its N needs. As such, improving the understanding of, and exploiting, the relationship between soybean and diazotrophs is key to promoting the sustainable growing of soybean. This dissertation investigates three main questions. First, how soybean diazotrophs respond to changes in water such as rainfall and irrigation. Second, how changes in these bacterial diazotrophs are related to levels of BNF and N-related soybean molecular markers. Finally, as my colleagues and I found non-diazotrophs in the nodules of some soybean plants, I was curious about the role they are playing inside the nodules in concert with the diazotrophs. The main hypotheses tested in this dissertation are that the root nodule bacterial community (bacteriome) would (1) vary by plant type, (2) respond to changes in water, and (3) be related to BNF. To answer the research questions, I devised the dissertation as follows. In Chapter 2, my colleagues and I used nine commercial cultivars of soybean that vary in drought tolerance and agronomic traits.
We show that soybean sometimes, but not always, harbors a consortium of non-nitrogen-fixing bacteria belonging to the Pseudomonadaceae and Enterobacteriaceae families. However, as expected, nodule diazotrophs rather than non-diazotrophs responded most to changes in soil water status. In Chapter 3, I used a collection of 24 genotypes of soybean that vary in their ability to fix nitrogen. The results revealed that the bacteriome diazotroph alpha diversity metrics, phylogenetic richness and evenness, were correlated with changes in BNF. Moreover, a few N-related molecular markers were associated with some of the bacteria. However, we also observed a strong effect of the environment on the diazotroph-driven process of BNF (i.e., 39%-75%). For Chapter 4, we sequenced three of the Pseudomonas spp. strains that were subsequently recovered again from a diversity of soybean nodules in field trials. I found that one of the strains has the ability to adapt to the nodule's unique hypoxic conditions, supporting Bradyrhizobium nodulation and possibly nodule iron. The results include the draft assembly of the proposed Pseudomonas nodulensis sp. nov. as a novel species of nodule-adapted bacteria belonging to the P. fluorescens complex. The results of this dissertation contribute to the basic knowledge needed to advance sustainable breeding and management of soybean. Nodule diazotrophs are sensitive to water status, e.g., drought, and other experiments have shown that the nodule bacteriome is the driver of BNF. Thus, improving the understanding of, and exploiting, the nodule bacteriome will support developing more resilient cultivars of soybean that are efficient in BNF and tolerant of stress. Identifying and testing diazotrophs and atypical nodule bacteria will provide a platform for developing new inoculants and biofertilizers.
- Economic and chemometric studies to supplement food-grade soybean variety development in the Mid-Atlantic regionLord, Nilanka (Virginia Tech, 2021-01-07)Sustainability of the soybean industry relies on the growth of new industries and the continued improvement of seeds for utilization. Grower adoption and growth of the edamame industry has been slow in part due to insufficient information on its potential profitability and marketability. As such, the first and second objectives of this thesis aimed at 1) determining production costs of a hand-harvested fresh edamame enterprise and 2) exploring consumer willingness-to-pay (WTP) for fresh, local, organic, and "on-the-stalk" marketed edamame. Sucrose, raffinose, and stachyose sugars hold tremendous implications for utilization of soybean seeds in the livestock, soyfood, and probiotics industries. Current sugar phenotyping methods using high-performance liquid chromatography (HPLC) are costly and inefficient. Therefore, the third objective of this study was to develop calibrations to predict sugar content using near-infrared reflectance spectroscopy (NIRS). Results showed that labor accounted for 72% of production costs for edamame pods, which largely limits its profit potential. Mean WTP for fresh and local edamame exceeded their frozen and non-local counterparts by 94 and 88 cents, respectively. In addition, mean WTP for organic edamame exceeded non-GMO edamame by 33 cents. Pro-environmental attitudes appeared to be a consistent driver of WTP for these three attributes. Meanwhile, a 40-cent discount for "on-the-stalk" edamame compared to pods indicates convenience may also be a factor in edamame marketability. Calibration development for sucrose and stachyose was successful, with R2cal, R2cv, RMSEC, and RMSECV of 0.901, 0.869, 0.516, and 0.596, and 0.911, 0.891, 0.361, and 0.405, respectively. Alternative methods should be investigated for quantification of raffinose.
- An Empirical Study of API Breaking Changes in BioconductorChowdhury, Hemayet Ahmed (Virginia Tech, 2023-01-10)Bioconductor is the second largest R software package repository and is primarily used for the analysis of genomic and biological data. With downloads exceeding millions in recent years, the widespread growth of the repository's adoption can be attributed to its diverse selection of community-created packages, written in the programming language R, that provide statistical methodologies for the analysis and modelling of data. However, as these packages evolve, their APIs go through changes that can break existing user code. Fixing these API breaking changes whenever a package is updated can be frustrating and time-consuming, especially since a large fraction of the user community are researchers who do not necessarily have a software engineering background. In that context, we first present a tool that can detect syntactic API breaking changes between two released versions of a library written in R through static analysis of the package source code. This tool can be of utility to R package developers, so that they can more comprehensively report or handle the breaking changes in their releases, and to R package users, who want to be aware of the API differences that may exist between two releases before upgrading the libraries in their code. Through the use of this tool and manual inspection, we also conducted an empirical study of the breaking changes and backward incompatibility in Bioconductor packages. We studied the 100 most downloaded packages in the repository and found that 28% of all package releases are backward incompatible. We also found that 55% of these breaking changes go undocumented and that developers don't maintain semantic versioning for 22% of the releases. Finally, we manually inspected 10 library releases that consisted of breaking changes and found 2% of the APIs to affect 31 client projects.
- Explainable Interactive Projections for Image DataHan, Huimin (Virginia Tech, 2023-01-12)Making sense of large collections of images is difficult. Dimension reductions (DR) assist by organizing images in a 2D space based on similarities, but provide little support for explaining why images were placed together or apart in the 2D space. Additionally, they do not provide support for modifying and updating the 2D space to explore new relationships and organizations of images. To address these problems, we present an interactive DR method for images that uses visual features extracted by a deep neural network to project the images into 2D space and provides visual explanations of image features that contributed to the 2D location. In addition, it allows people to directly manipulate the 2D projection space to define alternative relationships and explore subsequent projections of the images. With an iterative cycle of semantic interaction and explainable-AI feedback, people can explore complex visual relationships in image data. Our approach to human-AI interaction integrates visual knowledge from both human mental models and pre-trained deep neural models to explore image data. Two usage scenarios are provided to demonstrate that our method is able to capture human feedback and incorporate it into the model. Our visual explanations help bridge the gap between the feature space and the original images to illustrate the knowledge learned by the model, creating a synergy between human and machine that facilitates a more complete analysis experience.
- Exploring transposable element-based markers to identify allelic variations underlying agronomic traits in riceYan, Haidong; Haak, David C.; Li, Song; Huang, Linkai; Bombarely, Aureliano (Elsevier, 2022-05-09)Transposable elements (TEs) are a major force in the production of new alleles during domestication; nevertheless, their use in association studies has been limited because of their complexity. We have developed a TE genotyping pipeline (TEmarker) and applied it to whole-genome genome-wide association study (GWAS) data from 176 Oryza sativa subsp. japonica accessions to identify genetic elements associated with specific agronomic traits. TE markers recovered a large proportion (69%) of single-nucleotide polymorphism (SNP)-based GWAS peaks, and these TE peaks retained ca. 25% of the SNPs. The use of TEs in GWASs may reduce false positives associated with linkage disequilibrium (LD) among SNP markers. A genome scan revealed positive selection on TEs associated with agronomic traits. We found several cases of insertion and deletion variants that potentially resulted from the direct action of TEs, including an allele of LOC_Os11g08410 associated with plant height and panicle length traits. Together, these findings reveal the utility of TE markers for connecting genotype to phenotype and suggest a potential role for TEs in influencing phenotypic variations in rice that impact agronomic traits.