Comprehensive description of genomewide nucleotide and structural variation in short-season soya bean.
ABSTRACT: Next-generation sequencing (NGS) and bioinformatics tools have greatly facilitated the characterization of nucleotide variation; nonetheless, an exhaustive description of both SNP haplotype diversity and of structural variation remains elusive in most species. In this study, we sequenced a representative set of 102 short-season soya beans and achieved an extensive coverage of both nucleotide diversity and structural variation (SV). We called close to 5M sequence variants (SNPs, MNPs and indels) and noticed that the number of unique haplotypes had plateaued within this set of germplasm (1.7M tag SNPs). This data set proved highly accurate (98.6%) based on a comparison of called genotypes at loci shared with a SNP array. We used this catalogue of SNPs as a reference panel to impute missing genotypes at untyped loci in data sets derived from lower density genotyping tools (150 K GBS-derived SNPs/530 samples). After imputation, 96.4% of the missing genotypes imputed in this fashion proved to be accurate. Using a combination of three bioinformatics pipelines, we uncovered ~92 K SVs (deletions, insertions, inversions, duplications, CNVs and translocations) and estimated that over 90% of these were accurate. Finally, we noticed that the duplication of certain genomic regions explained much of the residual heterozygosity at SNP loci in otherwise highly inbred soya bean accessions. This is the first time that a comprehensive description of both SNP haplotype diversity and SV has been achieved within a regionally relevant subset of a major crop.
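The accuracy figures quoted above (98.6% against a SNP array, 96.4% for imputed genotypes) are concordance rates: the share of compared genotypes that agree with a reference. The following is a minimal sketch of that calculation, not the authors' pipeline. The function name `genotype_concordance`, the two-letter genotype strings and the use of `None` for missing calls are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def genotype_concordance(
    called: Sequence[Optional[str]],
    reference: Sequence[Optional[str]],
) -> float:
    """Fraction of loci typed in both sets whose genotypes agree.

    Genotypes are hypothetical two-letter strings such as "AA" or "AG";
    None marks a missing call and is skipped.
    """
    if len(called) != len(reference):
        raise ValueError("both genotype lists must cover the same loci")

    compared = 0
    matched = 0
    for c, r in zip(called, reference):
        if c is None or r is None:
            continue  # locus not typed in one of the two data sets
        compared += 1
        # Alleles are unordered, so "AG" and "GA" count as the same genotype.
        if sorted(c) == sorted(r):
            matched += 1

    if compared == 0:
        raise ValueError("no loci were typed in both data sets")
    return matched / compared


if __name__ == "__main__":
    calls = ["AA", "AG", None, "GG"]
    array = ["AA", "GA", "AA", "AG"]
    print(f"concordance = {genotype_concordance(calls, array):.3f}")
```

The same function applies to imputation accuracy if `called` holds imputed genotypes at masked loci and `reference` holds the genotypes that were masked.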
Project description:White mould of soya bean is caused by Sclerotinia sclerotiorum (Lib.) de Bary, a necrotrophic fungus capable of infecting a wide range of plants. To dissect the genetic architecture of resistance to white mould, a high-density customized single nucleotide polymorphism (SNP) array (52 041 SNPs) was used to genotype two soya bean diversity panels. Combined with resistance variation data observed in the field and greenhouse environments, genome-wide association studies (GWASs) were conducted to identify quantitative trait loci (QTL) controlling resistance against white mould. Results showed that 16 and 11 loci were significantly associated with resistance in field and greenhouse, respectively. Of these, eight loci localized to previously mapped QTL intervals and one locus had significant associations with resistance across both environments. The expression level changes in genes located in GWAS-identified loci were assessed between partially resistant and susceptible genotypes through an RNA-seq analysis of the stem tissue collected at various time points after inoculation. A set of genes with diverse biological functionalities was identified as strong candidates underlying white mould resistance. Moreover, we found that genomic prediction models outperformed predictions based on significant SNPs. Prediction accuracies ranged from 0.48 to 0.64 for disease index measured in field experiments. The integrative methods, including GWAS, RNA-seq and genomic selection (GS), applied in this study facilitated the identification of causal variants, enhanced our understanding of mechanisms of white mould resistance and provided valuable information regarding breeding for disease resistance through genomic selection in soya bean.
Project description:Single-nucleotide polymorphisms (SNPs) provide an abundant source of DNA polymorphisms in a number of eukaryotic species. Information on the frequency, nature, and distribution of SNPs in plant genomes is limited. Thus, our objectives were (1) to determine SNP frequency in coding and noncoding soybean (Glycine max L. Merr.) DNA sequence amplified from genomic DNA using PCR primers designed to complete genes, cDNAs, and random genomic sequence; (2) to characterize haplotype variation in these sequences; and (3) to provide initial estimates of linkage disequilibrium (LD) in soybean. Approximately 28.7 kbp of coding sequence, 37.9 kbp of noncoding perigenic DNA, and 9.7 kbp of random noncoding genomic DNA were sequenced in each of 25 diverse soybean genotypes. Over the >76 kbp, mean nucleotide diversity expressed as Watterson's theta was 0.00097. Nucleotide diversity was 0.00053 and 0.00111 in coding and in noncoding perigenic DNA, respectively, lower than estimates in the autogamous model species Arabidopsis thaliana. Haplotype analysis of SNP-containing fragments revealed a deficiency of haplotypes vs. the number that would be anticipated at linkage equilibrium. In 49 fragments with three or more SNPs, five haplotypes were present in one fragment while four or fewer were present in the remaining 48, thereby supporting the suggestion of relatively limited genetic variation in cultivated soybean. Squared allele-frequency correlations (r²) among haplotypes at 54 loci with two or more SNPs indicated low genome-wide LD. The low level of LD and the limited haplotype diversity suggested that the genome of any given soybean accession is a mosaic of three or four haplotypes. To facilitate SNP discovery and the development of a transcript map, subsets of four to six diverse genotypes, whose sequence analysis would permit the discovery of at least 75% of all SNPs present in the 25 genotypes as well as 90% of the common (frequency >0.10) SNPs, were identified.
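Nucleotide diversity in this abstract is expressed as Watterson's theta, which scales the number of segregating sites S by the harmonic term a_n = 1 + 1/2 + ... + 1/(n-1) for n sampled sequences, and by the number of sites surveyed when reported per site. The following is a minimal sketch of that standard estimator. The function names and the input values in the usage example are hypothetical and are not taken from the study.

```python
def harmonic_term(n_sequences: int) -> float:
    """Return a_n = sum of 1/i for i = 1 .. n-1."""
    if n_sequences < 2:
        raise ValueError("at least two sequences are required")
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta(segregating_sites: int, n_sequences: int, n_sites: int) -> float:
    """Per-site Watterson's theta: S / (a_n * L).

    segregating_sites -- number of polymorphic sites S in the alignment
    n_sequences       -- number of sampled sequences n
    n_sites           -- alignment length L in base pairs
    """
    if segregating_sites < 0:
        raise ValueError("segregating_sites must be non-negative")
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return segregating_sites / (harmonic_term(n_sequences) * n_sites)


if __name__ == "__main__":
    # Hypothetical toy input: 10 segregating sites, 5 sequences, 1000 bp.
    print(watterson_theta(segregating_sites=10, n_sequences=5, n_sites=1000))
```

For inbred lines such as the 25 soybean genotypes described here, one sequence per genotype is a reasonable assumption, so n would be the number of genotypes.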
Project description:We explored the use of the eco-physiological crop model GECROS to identify markers for improved rice yield under well-watered (control) and water-deficit conditions. Eight model parameters were measured from the control in one season for 267 indica genotypes. The model accounted for 58% of yield variation among genotypes under control and 40% under water-deficit conditions. Using 213 randomly selected genotypes as training set, 90 SNP loci were identified using GWAS, explaining 42-77% of crop-model parameter variation. SNPs-based parameter values estimated from the additive loci effects were fed into the model. For the training set, the SNPs-based model accounted for 37% (control) and 29% (water-deficit) of yield variation, less than the 78% explained by a statistical genomic prediction (GP) model for the control treatment. Both models failed in predicting yields of the 54 testing genotypes. However, compared with the GP model, the SNPs-based crop model was advantageous when simulating yields under either control or water-stress conditions in an independent season. Crop-model sensitivity analysis ranked the SNP loci for their relative importance in accounting for yield variation, and the rank differed greatly between control and water-deficit environments. Crop models have a potential to use single-environment information for predicting phenotypes under different environments.
Project description:Genome-wide patterns of variation across individuals provide a powerful source of data for uncovering the history of migration, range expansion, and adaptation of the human species. However, high-resolution surveys of variation in genotype, haplotype and copy number have generally focused on a small number of population groups. Here we report the analysis and public release of high-quality genotypes at 525,910 single-nucleotide polymorphisms (SNPs) and 396 copy-number-variable loci in a worldwide sample of 29 populations. Analysis of SNP genotypes yields strongly supported fine-scale inferences about population structure. Increasing linkage disequilibrium is observed with geographic distance from Africa, as expected under a serial founder effect for an out-of-Africa spread of human populations. New approaches for haplotype analysis produce inferences about population structure that complement results based on unphased SNPs. Despite a difference from SNPs in the frequency spectrum of the copy-number variants (CNVs) detected (including a comparatively large number of CNVs in previously unexamined populations from Oceania and the Americas), the global distribution of CNVs largely accords with population structure analyses for SNP data sets of similar size. Our results produce new inferences about inter-population variation, support the utility of CNVs in human population-genetic research, and serve as a genomic resource for human-genetic studies in diverse worldwide populations. Keywords: High Density SNP array. Overall design: We genotyped 513 individuals drawn from 29 populations of the Human Genome Diversity Panel (HGDP; http://www.cephb.fr/HGDP-CEPH-Panel/) using Illumina Infinium HumanHap550 Genotyping BeadChips (Illumina Inc., San Diego, CA, www.illumina.com).
Project description:The laboratory rat is one of the most extensively studied model organisms. Inbred laboratory rat strains originated from limited Rattus norvegicus founder populations, and the inherited genetic variation provides an excellent resource for the correlation of genotype to phenotype. Here, we report a survey of genetic variation based on almost 3 million newly identified SNPs. We obtained accurate and complete genotypes for a subset of 20,238 SNPs across 167 distinct inbred rat strains, two rat recombinant inbred panels and an F2 intercross. Using 81% of these SNPs, we constructed high-density genetic maps, creating a large dataset of fully characterized SNPs for disease gene mapping. Our data characterize the population structure and illustrate the degree of linkage disequilibrium. We provide a detailed SNP map and demonstrate its utility for mapping of quantitative trait loci. This community resource is openly available and augments the genetic tools for this workhorse of physiological studies.
Project description:Peronospora effusa is an obligate pathogen that causes downy mildew on spinach and is considered the most economically important disease of spinach. The objective of the current research was to assess genetic diversity of known historical races and isolates collected in 2014 from production fields in Yuma, Arizona and Salinas Valley, California. Candidate neutral single nucleotide polymorphisms (SNPs) were identified by comparing sequence data from reference isolates of known races of the pathogen collected in 2009 and 2010. Genotypes were assessed using targeted sequencing on genomic DNA extracted directly from infected plant tissue. Genotyping 26 historical and 167 contemporary samples at 46 SNP loci revealed 82 unique multi-locus genotypes. The unique genotypes clustered into five groups and the majority of isolates collected in 2014 were genetically closely related, regardless of source location. The historical samples, representing several races, showed greater genetic differentiation. Overall, the SNP data indicate much of the genotypic variation found within fields was produced during asexual development, whereas overall genetic diversity may be influenced by sexual recombination on broader geographical and temporal scales.
Project description:Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.
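The "substantial correlations of SNPs with many of their neighbours" mentioned above are usually quantified as the squared allele-frequency correlation r² between two biallelic loci, where D = p_AB - p_A p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). The following is a minimal sketch of that calculation from phased haplotypes. The function name `ld_r_squared` and the 0/1 allele coding are assumptions made for illustration, not part of the HapMap resource.

```python
from typing import Sequence, Tuple


def ld_r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Squared allele-frequency correlation between two biallelic loci.

    Each haplotype is a hypothetical (allele_at_locus_A, allele_at_locus_B)
    pair coded 0 or 1, where 1 marks the allele being counted.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")

    p_a = sum(a for a, _ in haplotypes) / n       # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n       # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in haplotypes) / n  # frequency of the 1-1 haplotype

    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic in the sample")

    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denominator


if __name__ == "__main__":
    linked = [(0, 0), (0, 0), (1, 1), (1, 1)]    # alleles always co-occur
    unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]  # all four haplotypes equally common
    print(ld_r_squared(linked), ld_r_squared(unlinked))
```

Values near 1 mean that one SNP predicts its neighbour almost perfectly, which is the property that lets a subset of tag SNPs stand in for the rest of a haplotype block.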
Project description:Large nitrogen, phosphorus and potassium fertilizer inputs are used in many crop systems. Identifying genetic loci controlling nutrient accumulation may be useful in crop breeding strategies to increase fertilizer use efficiency and reduce financial and environmental costs. Here, variation in leaf nitrate concentration across a diversity population of 383 genotypes of Brassica napus was characterized. Genetic loci controlling variation in leaf nitrate, phosphorus and potassium concentration were then identified through Associative Transcriptomics using single nucleotide polymorphism (SNP) markers and gene expression markers (GEMs). Leaf nitrate concentration varied over 8-fold across the diversity population. A total of 455 SNP markers were associated with leaf nitrate concentration after false-discovery-rate (FDR) correction. A number of known nitrate transporters and sensors are in linkage disequilibrium with highly associated markers, including a gene thought to mediate expression of the major nitrate transporter NRT1.1. Several genes influencing root and root-hair development co-localize with chromosomal regions associated with leaf P concentration. Orthologs of three ABC-transporters involved in suberin synthesis in roots also co-localize with association peaks for both leaf nitrate and phosphorus. Allelic variation at nearby, highly associated SNPs confers large variation in leaf nitrate and phosphorus concentration. A total of five GEMs were associated with leaf K concentration after FDR correction, including a GEM that corresponds to an auxin-response family protein. Candidate loci, genes and favorable alleles identified here may prove useful in marker-assisted selection strategies to improve fertilizer use efficiency in B. napus.
Project description:Although growing numbers of single nucleotide polymorphisms (SNPs) and microsatellites (short tandem repeat polymorphisms or STRPs) are used to infer population structure, their relative properties in this context remain poorly understood. SNPs and STRPs mutate differently, suggesting multi-locus genotypes at these loci might differ in ability to detect population structure. Here, we use coalescent simulations to measure the power of sets of SNPs and STRPs to identify population structure. To maximize the applicability of our results to empirical studies, we focus on the popular STRUCTURE analysis and evaluate the role of several biological and practical factors in the detection of population structure. We find that: (1) fewer unlinked STRPs than SNPs are needed to detect structure at recent divergence times <0.3 N(e) generations; (2) accurate estimation of the number of populations requires many fewer STRPs than SNPs; (3) for both marker types, declines in power due to modest gene flow (N(e)m=1.0) are largely negated by increasing marker number; (4) variation in the STRP mutational model affects power modestly; (5) SNP haplotypes (θ=1, no recombination) provide power comparable with STRP loci (θ=10); (6) ascertainment schemes that select highly variable STRP or SNP loci increase power to detect structure, though ascertained data may not be suitable to other inference; and (7) when samples are drawn from an admixed population and one of its parent populations, the reduction in power to detect two populations is greater for STRPs than SNPs. These results should assist the design of multi-locus studies to detect population structure in nature.
Project description:The RADseq technology allows researchers to efficiently develop thousands of polymorphic loci across multiple individuals with little or no prior information on the genome. However, many questions remain about the biases inherent to this technology. Notably, sequence misalignments arising from paralogy may affect the development of single nucleotide polymorphism (SNP) markers and the estimation of genetic diversity. We evaluated the impact of putative paralog loci on genetic diversity estimation during the development of SNPs from a RADseq dataset for the nonmodel tree species Robinia pseudoacacia L. We sequenced nine genotypes and analyzed the frequency of putative paralogous RAD loci as a function of both the depth of coverage and the mismatch threshold allowed between loci. Putative paralogy was detected in a very variable number of loci, from 1% to more than 20%, with the depth of coverage having a major influence on the result. Putative paralogy artificially increased the observed degree of polymorphism and resulting estimates of diversity. The choice of the depth of coverage also affected diversity estimation and SNP validation: A low threshold decreased the chances of detecting minor alleles while a high threshold increased allelic dropout. SNP validation was better for the low threshold (4×) than for the high threshold (18×) we tested. Using the strategy developed here, we were able to validate more than 80% of the SNPs tested by means of individual genotyping, resulting in a readily usable set of 330 SNPs, suitable for use in population genetics applications.