Using whole genome sequence to compare variant callers and breed differences of US sheep

dc.contributor.author: Stegemiller, Morgan R. [en]
dc.contributor.author: Redden, Reid R. [en]
dc.contributor.author: Notter, David R. [en]
dc.contributor.author: Taylor, Todd [en]
dc.contributor.author: Taylor, J. Bret [en]
dc.contributor.author: Cockett, Noelle E. [en]
dc.contributor.author: Heaton, Michael P. [en]
dc.contributor.author: Kalbfleisch, Theodore S. [en]
dc.contributor.author: Murdoch, Brenda M. [en]
dc.date.accessioned: 2023-04-04T15:06:26Z [en]
dc.date.available: 2023-04-04T15:06:26Z [en]
dc.date.issued: 2023-01-04 [en]
dc.description.abstract: As whole genome sequence (WGS) data sets have become abundant and widely available, so has the need for variant detection and scoring. The aim of this study was to compare the accuracy of commonly used variant calling programs, Freebayes and GATK HaplotypeCaller (GATK-HC), and to use U.S. sheep WGS data sets to identify novel breed-associated SNPs. Sequence data from 145 sheep consisting of 14 U.S. breeds were filtered and biallelic single nucleotide polymorphisms (SNPs) were retained for genotyping analyses. Genotypes from both programs were compared to each other and to genotypes from bead arrays. The SNPs from WGS were compared to the bead array data with breed heterozygosity, principal component analysis and identifying breed associated SNPs to analyze genetic diversity. The average sequence read depth was 2.78 reads greater with 6.11% more SNPs being identified in Freebayes compared to GATK-HC. The genotype concordance of the variant callers to bead array data was 96.0% and 95.5% for Freebayes and GATK-HC, respectively. Genotyping with WGS identified 10.5 million SNPs from all 145 sheep. This resulted in an 8% increase in measured heterozygosity and greater breed separation in the principal component analysis compared to the bead array analysis. There were 1,849 SNPs identified in only the Romanov sheep where all 10 rams were homozygous for one allele and the remaining 135 sheep from 13 breeds were homozygous for the opposite allele. Both variant calling programs had greater than 95% concordance of SNPs with bead array data, and either was suitably accurate for ovine WGS data sets. The use of WGS SNPs improved the resolution of PCA analysis and was critical for identifying Romanov breed-associated SNPs. Subsets of such SNPs could be used to estimate germplasm composition in animals without pedigree information. [en]
dc.description.notes: This research was funded by Agriculture and Food Research Initiative Hatch grant no. IDA01566. Additional support for this research was provided by the USDA Agricultural Research Service (ARS project number 5438-32000-033-00D, MPH) and used resources provided by the SCINet (ARS project number 0500-00093-001-00-D). [en]
dc.description.sponsorship: Agriculture and Food Research Initiative Hatch [IDA01566]; USDA Agricultural Research Service (ARS) [5438-32000-033-00D]; SCINet [0500-00093-001-00-D] [en]
dc.description.version: Published version [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.doi: https://doi.org/10.3389/fgene.2022.1060882 [en]
dc.identifier.eissn: 1664-8021 [en]
dc.identifier.other: 1060882 [en]
dc.identifier.pmid: 36685812 [en]
dc.identifier.uri: http://hdl.handle.net/10919/114249 [en]
dc.identifier.volume: 13 [en]
dc.language.iso: en [en]
dc.publisher: Frontiers [en]
dc.rights: Creative Commons Attribution 4.0 International [en]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [en]
dc.subject: sheep [en]
dc.subject: whole genome sequence [en]
dc.subject: freebayes [en]
dc.subject: GATK HaplotypeCaller (HC) [en]
dc.subject: variant callers [en]
dc.title: Using whole genome sequence to compare variant callers and breed differences of US sheep [en]
dc.title.serial: Frontiers in Genetics [en]
dc.type: Article - Refereed [en]
dc.type.dcmitype: Text [en]
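
The abstract reports two quantities that are easy to misread without a concrete definition: genotype concordance between a variant caller and bead array data, and SNPs for which every Romanov animal is homozygous for one allele while every other animal is homozygous for the opposite allele. The following is a minimal Python sketch of how such quantities could be computed. It is not the authors' pipeline. The function names, the dictionary layout, and the toy genotypes are all hypothetical, and the choices to skip missing calls in the concordance and to reject any SNP with a missing call in the breed filter are assumptions made here for illustration.

```python
from typing import Dict, List, Optional, Tuple

# A genotype is an unordered pair of alleles; None marks a missing call.
Genotype = Optional[Tuple[str, str]]


def concordance(calls_a: Dict[str, Genotype], calls_b: Dict[str, Genotype]) -> float:
    """Fraction of SNPs called in both sets whose genotypes agree.

    Assumption: SNPs missing in either set are excluded from the denominator.
    """
    shared = [
        snp for snp, gt in calls_a.items()
        if gt is not None and calls_b.get(snp) is not None
    ]
    if not shared:
        raise ValueError("no SNPs were called in both sets")
    # Sort alleles so that ("A", "G") and ("G", "A") count as the same genotype.
    matches = sum(sorted(calls_a[snp]) == sorted(calls_b[snp]) for snp in shared)
    return matches / len(shared)


def breed_private_snps(
    genotypes: Dict[str, Dict[str, Genotype]],
    breed_of: Dict[str, str],
    target_breed: str,
) -> List[str]:
    """SNP ids where all target-breed animals are homozygous for one allele
    and all remaining animals are homozygous for a different allele.

    Assumption: a SNP with any missing or heterozygous call is rejected.
    """
    private = []
    for snp_id, calls in genotypes.items():
        target_alleles, other_alleles = set(), set()
        usable = True
        for animal, gt in calls.items():
            if gt is None or gt[0] != gt[1]:
                usable = False
                break
            if breed_of[animal] == target_breed:
                target_alleles.add(gt[0])
            else:
                other_alleles.add(gt[0])
        if (
            usable
            and len(target_alleles) == 1
            and len(other_alleles) == 1
            and target_alleles != other_alleles
        ):
            private.append(snp_id)
    return private
```

With real data the genotype dictionaries would be populated from the callers' VCF output and the bead array export; that parsing step is deliberately left out of this sketch.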

Files

Original bundle
Name: fgene-13-1060882.pdf
Size: 1.22 MB
Format: Adobe Portable Document Format
Description: Published version