Browsing by Author "Yu, Haipeng"
- An assessment of genomic connectedness measures in Nellore cattle
  Amorim, Sabrina T.; Yu, Haipeng; Momen, Mehdi; de Albuquerque, Lucia Galvao; Cravo Pereira, Angelica S.; Baldi, Fernando; Morota, Gota (2020-11)
  An important criterion to consider in genetic evaluations is the extent of genetic connectedness across management units (MU), especially if they differ in their genetic mean. Reliable comparisons of genetic values across MU depend on the degree of connectedness: the higher the connectedness, the more reliable the comparison. Traditionally, genetic connectedness was calculated through pedigree-based methods; however, in the era of genomic selection, this can be better estimated utilizing new approaches based on genomics. Most procedures consider only additive genetic effects, which may not accurately reflect the underlying gene action of the evaluated trait, and little is known about the impact of non-additive gene action on connectedness measures. The objective of this study was to investigate the extent of genomic connectedness measures, for the first time, in Brazilian field data by applying additive and non-additive relationship matrices using a fatty acid profile data set from seven farms located in the three regions of Brazil, which are part of the three breeding programs. Myristic acid (C14:0) was used due to its importance for human health and reported presence of non-additive gene action. The pedigree included 427,740 animals and 925 of them were genotyped using the Bovine high-density genotyping chip. Six relationship matrices were constructed, parametrically and non-parametrically capturing additive and non-additive genetic effects from both pedigree and genomic data. We assessed genome-based connectedness across MU using the prediction error variance of difference (PEVD) and the coefficient of determination (CD). PEVD values ranged from 0.540 to 1.707, and CD from 0.146 to 0.456. Genomic information consistently enhanced the measures of connectedness compared to the numerator relationship matrix by at least 63%. Combining additive and non-additive genomic kernel relationship matrices or a non-parametric relationship matrix increased the capture of connectedness. Overall, the Gaussian kernel yielded the largest measure of connectedness. Our findings showed that connectedness metrics can be extended to incorporate genomic information and non-additive genetic variation using field data. We propose that different genomic relationship matrices can be designed to capture additive and non-additive genetic effects, increase the measures of connectedness, and more accurately estimate the true state of connectedness in herds.
- Deciphering Cattle Temperament Measures Derived From a Four-Platform Standing Scale Using Genetic Factor Analytic Modeling
  Yu, Haipeng; Morota, Gota; Celestino, Elfren F., Jr.; Dahlen, Carl R.; Wagner, Sarah A.; Riley, David G.; Hulsman Hanna, Lauren L. (2020-06-12)
  The animal's reaction to human handling (i.e., temperament) is critical for work safety, productivity, and welfare. Subjective phenotyping methods have been traditionally used in beef cattle production. Even so, subjective scales rely on the evaluator's knowledge and interpretation of temperament, which may require substantial experience. Selection based on such subjective scores may not precisely change temperament preferences in cattle. The objectives of this study were to investigate the underlying genetic interrelationships among temperament measurements using genetic factor analytic modeling and validate a movement-based objective method (four-platform standing scale, FPSS) as a measure of temperament. Relationships among subjective methods of docility score (DS), temperament score (TS), 12 qualitative behavior assessment (QBA) attributes and objective FPSS including the standard deviation of total weight on FPSS over time (SSD) and coefficient of variation of SSD (CVSSD) were investigated using 1,528 calves at weaning age. An exploratory factor analysis (EFA) identified two latent variables that account for TS and 12 QBA attributes, termed difficult and easy from their characteristics. Inclusion of DS in EFA was not a good fit because it was evaluated under restraint and other measures were not. A Bayesian confirmatory factor analysis inferred the difficult and easy scores discovered in EFA. This was followed by fitting a pedigree-based Bayesian multi-trait model to characterize the genetic interrelationships among difficult, easy, DS, SSD, and CVSSD. Estimates of heritability ranged from 0.18 to 0.4 with the posterior standard deviation averaging 0.06. The factors of difficult and easy exhibited a large negative genetic correlation of -0.92. Moderate genetic correlations were found between DS and difficult (0.36), easy (-0.31), SSD (0.42), and CVSSD (0.34) as well as FPSS with difficult (CVSSD: 0.35; SSD: 0.42) and easy (CVSSD: -0.35; SSD: -0.4). Correlation coefficients indicate selection could be performed with either and have similar outcomes. We contend that genetic factor analytic modeling provided a new approach to unravel the complexity of animal behaviors and FPSS-like measures could increase the efficiency of genetic selection by providing automatic, objective, and consistent phenotyping measures that could be an alternative to DS, which has been widely used in beef production.
- Designing and modeling high-throughput phenotyping data in quantitative genetics
  Yu, Haipeng (Virginia Tech, 2020-04-09)
  Quantitative genetics aims to bridge the genome to phenome gap. The advent of high-throughput genotyping technologies has accelerated the progress of genome to phenome mapping, but a challenge remains in phenotyping. Various high-throughput phenotyping (HTP) platforms have been developed recently to obtain economically important phenotypes in an automated fashion with less human labor and reduced costs. However, the effective way of designing HTP has not been investigated thoroughly. In addition, high-dimensional HTP data bring up a big challenge for statistical analysis by increasing computational demands. A new strategy for modeling high-dimensional HTP data and elucidating the interrelationships among these phenotypes is needed. Previous studies used pedigree-based connectedness statistics to study the design of phenotyping. The availability of genetic markers provides a new opportunity to evaluate connectedness based on genomic data, which can serve as a means to design HTP. This dissertation first discusses the utility of connectedness, spanning three studies. In the first study, I introduced genomic connectedness and compared it with traditional pedigree-based connectedness. The relationship between genomic connectedness and prediction accuracy based on cross-validation was investigated in the second study. The third study introduced a user-friendly connectedness R package, which provides a suite of functions to evaluate the extent of connectedness. In the last study, I proposed a new statistical approach to model high-dimensional HTP data by leveraging the combination of confirmatory factor analysis and Bayesian network. Collectively, the results from the first three studies suggested the potential usefulness of applying genomic connectedness to design HTP. The statistical approach I introduced in the last study provides a new avenue to model high-dimensional HTP data holistically to further help us understand the interrelationships among phenotypes derived from HTP.
- Forecasting dynamic body weight of nonrestrained pigs from images using an RGB-D sensor camera
  Yu, Haipeng; Lee, Kiho; Morota, Gota (Oxford University Press, 2021-01-01)
  Average daily gain is an indicator of the growth rate, feed efficiency, and current health status of livestock species including pigs. Continuous monitoring of daily gain in pigs helps producers optimize their growth performance while ensuring animal welfare and sustainability, such as reducing stress reactions and feed waste. Computer vision has been used to predict live body weight from video images without direct handling of the pig. In most studies, videos were taken while pigs were immobilized at a weighing station or feeding area to facilitate data collection. An alternative approach is to capture videos while pigs are allowed to move freely within their own housing environment, which can be easily applied to the production system as no special imaging station needs to be established. The objective of this study was to establish a computer vision system by collecting RGB-D videos to capture top-view red, green, and blue (RGB) and depth images of nonrestrained, growing pigs to predict their body weight over time. Over a period of 38 d, eight growers were video recorded for approximately 3 min/d, at the rate of six frames per second, and manually weighed using an electronic scale. An image-processing pipeline in Python using OpenCV was developed to process the images. Specifically, each pig within the RGB frame was segmented by a thresholding algorithm, and the contour of the pig was identified to extract its length and width. The height of a pig was estimated from the depth images captured by the infrared depth sensor. Quality control included removing pigs that were touching the fence and sitting, as well as those showing extremely distorted shape or motion blur owing to their frequent movement. Fitting all of the morphological image descriptors simultaneously in linear mixed models yielded prediction coefficients of determination of 0.72-0.98, 0.65-0.95, 0.51-0.94, and 0.49-0.93 for 1-, 2-, 3-, and 4-d ahead forecasting, respectively, of body weight in time series cross-validation. Based on the results, we conclude that our RGB-D sensor-based imaging system coupled with the Python image-processing pipeline could potentially provide an effective approach to predict the live body weight of nonrestrained pigs from images.
- GCA: an R package for genetic connectedness analysis using pedigree and genomic data
  Yu, Haipeng; Morota, Gota (2021-02-15)
  Background: Genetic connectedness is a critical component of genetic evaluation as it assesses the comparability of predicted genetic values across units. Genetic connectedness also plays an essential role in quantifying the linkage between reference and validation sets in whole-genome prediction. Despite its importance, there is no user-friendly software tool available to calculate connectedness statistics. Results: We developed the GCA R package to perform genetic connectedness analysis for pedigree and genomic data. The software implements a large collection of various connectedness statistics as a function of prediction error variance or variance of unit effect estimates. The GCA R package is available at GitHub and the source code is provided as open source. Conclusions: The GCA R package allows users to easily assess the connectedness of their data. It is also useful to determine the potential risk of comparing predicted genetic values of individuals across units or measure the connectedness level between training and testing sets in genomic prediction.
- Genomic Bayesian Confirmatory Factor Analysis and Bayesian Network To Characterize a Wide Spectrum of Rice Phenotypes
  Yu, Haipeng; Campbell, Malachy T.; Zhang, Qi; Walia, Harkamal; Morota, Gota (Genetics Society of America, 2019-06-01)
  With the advent of high-throughput phenotyping platforms, plant breeders have a means to assess many traits for large breeding populations. However, understanding the genetic interdependencies among high-dimensional traits in a statistically robust manner remains a major challenge. Since multiple phenotypes likely share mutual relationships, elucidating the interdependencies among economically important traits can better inform breeding decisions and accelerate the genetic improvement of plants. The objective of this study was to leverage confirmatory factor analysis and graphical modeling to elucidate the genetic interdependencies among diverse agronomic traits in rice. We used a Bayesian network to depict conditional dependencies among phenotypes, which cannot be obtained by standard multi-trait analysis. We utilized Bayesian confirmatory factor analysis which hypothesized that 48 observed phenotypes resulted from six latent variables including grain morphology, morphology, flowering time, physiology, yield, and morphological salt response. This was followed by studying the genetics of each latent variable, also known as a factor, using single nucleotide polymorphisms. Bayesian network structures involving the genomic component of six latent variables were established by fitting four algorithms (i.e., Hill Climbing, Tabu, Max-Min Hill Climbing, and General 2-Phase Restricted Maximization algorithms). Physiological components influenced the flowering time and grain morphology, and morphology and grain morphology influenced yield. In summary, we show that the Bayesian network coupled with factor analysis can provide an effective approach to understand the interdependence patterns among phenotypes and to predict the potential influence of external interventions or selection related to target traits in interrelated complex trait systems.
- Identification of Quantitative Disease Resistance Loci Toward Four Pythium Species in Soybean
  Clevinger, Elizabeth M.; Biyashev, Ruslan M.; Lerch-Olson, Elizabeth; Yu, Haipeng; Quigley, Charles; Song, Qijian; Dorrance, Anne E.; Robertson, Alison E.; Saghai-Maroof, Mohammad A. (Frontiers, 2021-03-30)
  In this study, four recombinant inbred line (RIL) soybean populations were screened for their response to infection by Pythium sylvaticum, Pythium irregulare, Pythium oopapillum, and Pythium torulosum. The parents, PI 424237A, PI 424237B, PI 408097, and PI 408029, had higher levels of resistance to these species in a preliminary screening and were crossed with “Williams,” a susceptible cultivar. A modified seed rot assay was used to evaluate RIL populations for their response to specific Pythium species selected for a particular population based on preliminary screenings. Over 2500 single-nucleotide polymorphism (SNP) markers were used to construct chromosomal maps to identify regions associated with resistance to Pythium species. Several minor and large effect quantitative disease resistance loci (QDRL) were identified including one large effect QDRL on chromosome 8 in the population of PI 408097 × Williams. It was identified by two different disease reaction traits in P. sylvaticum, P. irregulare, and P. torulosum. Another large effect QDRL was identified on chromosome 6 in the population of PI 408029 × Williams, and conferred resistance to P. sylvaticum and P. irregulare. These large effect QDRL will contribute toward the development of improved soybean cultivars with higher levels of resistance to these common soil-borne pathogens.
- Modeling multiple phenotypes in wheat using data-driven genomic exploratory factor analysis and Bayesian network learning
  Momen, Mehdi; Bhatta, Madhav; Hussain, Waseem; Yu, Haipeng; Morota, Gota (2021-01)
  Inferring trait networks from a large volume of genetically correlated diverse phenotypes such as yield, architecture, and disease resistance can provide information on the manner in which complex phenotypes are interrelated. However, studies on statistical methods tailored to multidimensional phenotypes are limited, whereas numerous methods are available for evaluating the massive number of genetic markers. Factor analysis operates at the level of latent variables predicted to generate observed responses. The objectives of this study were to illustrate the manner in which data-driven exploratory factor analysis can map observed phenotypes into a smaller number of latent variables and infer a genomic latent factor network using 45 agro-morphological, disease, and grain mineral phenotypes measured in synthetic hexaploid wheat lines (Triticum aestivum L.). In total, eight latent factors including grain yield, architecture, flag leaf-related traits, grain minerals, yellow rust, two types of stem rust, and leaf rust were identified as common sources of the observed phenotypes. The genetic component of the factor scores for each latent variable was fed into a Bayesian network to obtain a trait structure reflecting the genetic interdependency among traits. Three directed paths were consistently identified by two Bayesian network algorithms. Flag leaf-related traits influenced leaf rust, and yellow rust and stem rust influenced grain yield. Additional paths that were identified included flag leaf-related traits to minerals and minerals to architecture. This study shows that data-driven exploratory factor analysis can reveal smaller dimensional common latent phenotypes that are likely to give rise to numerous observed field phenotypes without relying on prior biological knowledge. The inferred genomic latent factor structure from the Bayesian network provides insights for plant breeding to simultaneously improve multiple traits, as an intervention on one trait will affect the values of focal phenotypes in an interrelated complex trait system.
- Multi-omic data integration for the study of production, carcass, and meat quality traits in Nellore cattle
  de Novais, Francisco Jose; Yu, Haipeng; Cesar, Aline Silva Mello; Momen, Mehdi; Poleti, Mirele Daiana; Petry, Bruna; Mourao, Gerson Barreto; Regitano, Luciana Correia de Almeida; Morota, Gota; Coutinho, Luiz Lehmann (Frontiers, 2022-10)
  Data integration using hierarchical analysis based on the central dogma or common pathway enrichment analysis may not reveal non-obvious relationships among omic data. Here, we applied factor analysis (FA) and Bayesian network (BN) modeling to integrate different omic data and complex traits by latent variables (production, carcass, and meat quality traits). A total of 14 latent variables were identified: five for phenotype, three for miRNA, four for protein, and two for mRNA data. Pearson correlation coefficients showed negative correlations between latent variables miRNA 1 (mirna1) and miRNA 2 (mirna2) (-0.47), ribeye area (REA) and protein 4 (prot4) (-0.33), REA and protein 2 (prot2) (-0.3), carcass and prot4 (-0.31), carcass and prot2 (-0.28), and backfat thickness (BFT) and miRNA 3 (mirna3) (-0.25). Positive correlations were observed among the four protein factors (0.45-0.83) and between meat quality and fat content (0.71), fat content and carcass (0.74), fat content and REA (0.76), and REA and carcass (0.99). BN presented arcs from the carcass, meat quality, prot2, and prot4 latent variables to REA; from meat quality, REA, mirna2, and gene expression mRNA1 to fat content; from protein 1 (prot1) and mirna2 to protein 5 (prot5); and from prot5 and carcass to prot2. The relations of protein latent variables suggest new hypotheses about the impact of these proteins on REA. The network also showed relationships among miRNAs and nebulin proteins. REA seems to be the central node in the network, influencing carcass, prot2, prot4, mRNA1, and meat quality, suggesting that REA is a good indicator of meat quality. The connection among miRNA latent variables, BFT, and fat content relates to the influence of miRNAs on lipid metabolism. The relationship between mirna1 and prot5 composed of isoforms of nebulin needs further investigation. The FA identified latent variables, decreasing the dimensionality and complexity of the data. The BN was capable of generating interrelationships among latent variables from different types of data, allowing the integration of omics and complex traits and identifying conditional independencies. Our framework based on FA and BN is capable of generating new hypotheses for molecular research, by integrating different types of data and exploring non-obvious relationships.