Global landscape of recent inferred Darwinian selection for Homo sapiens.
ABSTRACT: Using the 1.6 million single-nucleotide polymorphism (SNP) genotype data set from Perlegen Sciences [Hinds, D. A., Stuve, L. L., Nilsen, G. B., Halperin, E., Eskin, E., Ballinger, D. G., Frazer, K. A. & Cox, D. R. (2005) Science 307, 1072-1079], we conducted a probabilistic search for the genomic landscape of positive Darwinian selection. By sorting each high-frequency allele by homozygosity, we searched for the expected decay of adjacent-SNP linkage disequilibrium (LD) at recently selected alleles, eliminating the need to infer haplotypes. We designate this approach the LD decay (LDD) test. By these criteria, 1.6% of Perlegen SNPs were found to exhibit the genetic architecture of selection. These results were confirmed on an independently generated data set of 1.0 million SNP genotypes (International Human Haplotype Map Phase I freeze). Simulation studies indicate that the LDD test, at the megabase scale used, effectively distinguishes selection from other causes of extensive LD, such as inversions, population bottlenecks, and admixture. The approximately 1,800 genes identified by the LDD test were clustered according to Gene Ontology (GO) categories. Overrepresentation analysis revealed several predominant biological themes among these selected alleles, including host-pathogen interactions, reproduction, DNA metabolism/cell cycle, protein metabolism, and neuronal function.
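The LDD test relies on measuring LD between adjacent SNPs without inferring haplotypes. As a minimal sketch of that building block only (not the LDD test itself, whose homozygosity-sorting procedure is not specified here), the code below computes r² between adjacent SNPs as the squared Pearson correlation of unphased allele counts coded 0/1/2, a common genotype-based proxy for haplotype r². The input layout and function names are hypothetical.

```python
from statistics import mean


def genotype_r2(g1, g2):
    """Squared Pearson correlation between two SNPs coded as allele counts (0, 1, 2).

    Works on unphased genotypes, so no haplotype inference is needed.
    Returns 0.0 when either SNP is monomorphic in the sample.
    """
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return 0.0
    return cov * cov / (var1 * var2)


def adjacent_r2(snps):
    """r^2 for each pair of physically adjacent SNPs.

    snps: list of SNPs ordered by position, each a list of per-individual
    allele counts (hypothetical layout).
    """
    return [genotype_r2(a, b) for a, b in zip(snps, snps[1:])]
```

Because r² is a squared correlation, a SNP whose genotypes are the mirror image of its neighbour's (2 minus the count) still yields r² = 1.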
Project description: Significant interest has emerged in mapping genetic susceptibility for complex traits through whole-genome association studies. These studies rely on the extent of association, i.e., linkage disequilibrium (LD), between single nucleotide polymorphisms (SNPs) across the human genome. LD describes the nonrandom association between SNP pairs and can be used as a metric when designing maximally informative panels of SNPs for association studies in human populations. Using data from the 1.58 million SNPs genotyped by Perlegen, we explored the allele frequency dependence of the LD statistic r² both empirically and theoretically. We show that average r² values between SNPs unmatched for allele frequency are always limited to much less than 1 (theoretically approximately 0.46 to 0.57 for this data set). Frequency matching of SNP pairs provides a more sensitive measure of the average decay of LD and generates average r² values across nearly the entire informative range (from 0 to a maximum of 0.89 to 0.95). Additionally, we analyzed the extent of perfect LD (r² = 1.0) using frequency-matched SNPs and found significant differences in the extent of LD between genic and intergenic regions. The SNP pairs exhibiting perfect LD showed a significant bias toward derived, nonancestral alleles, providing evidence for positive natural selection in the human genome.
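Why unmatched allele frequencies cap r² can be seen from its definition, r² = D² / (p_A(1 - p_A) p_B(1 - p_B)), where the disequilibrium D is itself bounded by the allele frequencies. The sketch below (function name hypothetical) returns the largest attainable r² for two biallelic SNPs by taking the larger of the positive and negative bounds on D.

```python
def max_r2(p_a, p_b):
    """Largest r^2 attainable between two biallelic SNPs with allele frequencies p_a, p_b.

    r^2 = D^2 / (p_a (1 - p_a) p_b (1 - p_b)), and |D| is bounded by the
    allele frequencies, so r^2 can reach 1 only when the two SNPs have
    matching (or complementary) frequencies.
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    d_max_pos = min(p_a * (1 - p_b), (1 - p_a) * p_b)  # bound on D when D > 0
    d_max_neg = min(p_a * p_b, (1 - p_a) * (1 - p_b))  # bound on |D| when D < 0
    return max(d_max_pos, d_max_neg) ** 2 / denom
```

For example, a SNP with allele frequency 0.05 paired with one at 0.4 cannot exceed r² of about 0.08, however tightly the two are linked, which is why averaging over frequency-unmatched pairs pulls mean r² well below 1.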
Project description: Understanding genetic diversity and linkage disequilibrium (LD) decay in diverse maize germplasm is fundamentally important for maize improvement. A total of 287 tropical and 160 temperate inbred lines were genotyped with 1,943 high-quality single nucleotide polymorphism (SNP) markers and compared for genetic diversity and LD decay using the SNPs and haplotypes built from genic and intergenic regions. Intronic SNPs revealed substantially higher variation than exonic SNPs. Large-window haplotypes (3-SNP sliding windows covering 2,160 kb on average) revealed much higher genetic diversity than the 10 kb-window and gene-window haplotypes. The polymorphic information content (PIC) values of the haplotypes (0.436-0.566) were generally much higher than those of individual SNPs (0.247-0.259). Cluster analysis classified the 447 maize lines into two major groups, corresponding to temperate and tropical types. The level of genetic diversity and subpopulation structure were associated with germplasm origin and post-domestication selection. Compared with temperate lines, the tropical lines had a much higher level of genetic diversity, with no significant subpopulation structure identified. LD decay distance varied significantly (2-100 kb) across the genome, chromosomal regions, and germplasm groups. The average LD decay distance in the temperate germplasm (10-100 kb) was two to ten times larger than in the tropical germplasm (5-10 kb). In conclusion, tropical maize not only hosts high genetic diversity that can be exploited in future plant breeding but also shows rapid LD decay, which provides more opportunity for selection.
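The gap between haplotype and single-SNP polymorphic information content follows from how the statistic depends on the number of alleles. The source does not state which formula was used; the sketch below assumes the widely used Botstein et al. (1980) definition, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², and the function name is hypothetical. A biallelic SNP can never exceed 0.375, whereas a multi-allele haplotype can.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphic information content (Botstein et al. 1980 formula).

    freqs: allele (or haplotype) frequencies or counts; they are normalised
    to sum to 1 before use.
    """
    total = sum(freqs)
    p = [f / total for f in freqs]
    heterozygosity = 1 - sum(x * x for x in p)
    correction = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return heterozygosity - correction
```

With two equally frequent alleles the function gives 0.375, the ceiling for any SNP, while four equally frequent haplotypes give 0.703125, which is consistent with the higher values reported for haplotypes.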
Project description: The effects of selection on genome variation were investigated and visualized in tomato using a high-density single nucleotide polymorphism (SNP) array. A total of 7,720 SNPs were genotyped in a collection of 426 tomato accessions (410 inbreds and 16 hybrids), and over 97% of the markers were polymorphic across the entire collection. Principal component analysis (PCA) and pairwise estimates of F_ST supported dividing the inbred accessions into seven sub-populations: processing, large-fruited fresh market, large-fruited vintage, cultivated cherry, landrace, wild cherry, and S. pimpinellifolium. Further divisions were found within both the contemporary processing and fresh market sub-populations. These sub-populations showed higher levels of genetic diversity than the vintage sub-population. The array provided a large number of polymorphic SNP markers within each sub-population, ranging from 3,159 in the vintage accessions to 6,234 in the cultivated cherry accessions. Visualization of minor allele frequency revealed genomic regions that distinguished three representative sub-populations of cultivated tomato (processing, fresh market, and vintage), particularly on chromosomes 2, 4, 5, 6, and 11. PCA loadings and F_ST outlier analysis among these three sub-populations identified a large number of candidate loci under positive selection on chromosomes 4, 5, and 11. The extent of linkage disequilibrium (LD) was examined within each chromosome for these sub-populations. LD decay varied between chromosomes and sub-populations, with large differences reflecting breeding history. For example, on chromosome 11, LD decayed over 0.8 cM in processing accessions and over 19.7 cM in fresh market accessions. The observed SNP variation and LD decay suggest that the different patterns of genetic variation in cultivated tomato result from introgression from wild species and selection for market specialization.
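An F_ST outlier scan ranks SNPs by how strongly allele frequencies differ between sub-populations. The source does not specify its estimator, so the sketch below swaps in the simple heterozygosity-based form F_ST = (H_T - H_S) / H_T with equally weighted sub-populations and no sample-size correction (unlike, for instance, the Weir and Cockerham estimator). The function names and the top-fraction cutoff are hypothetical.

```python
def fst(freqs):
    """F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: frequency of the same allele in each sub-population.
    Sub-populations are weighted equally and no sample-size correction is
    applied, so this is a simplified estimator.
    """
    k = len(freqs)
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-sub-population heterozygosity
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def fst_outliers(snp_freqs, top_fraction=0.01):
    """SNP ids in the top `top_fraction` of F_ST values (hypothetical cutoff).

    snp_freqs: mapping of SNP id -> list of per-sub-population allele frequencies.
    """
    scores = {snp: fst(f) for snp, f in snp_freqs.items()}
    n = max(1, int(len(scores) * top_fraction))
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

A SNP fixed for opposite alleles in two sub-populations scores 1.0, and one with identical frequencies scores 0.0; candidate selected loci are those in the extreme upper tail.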
Project description: BACKGROUND: The extent of linkage disequilibrium (LD) between molecular markers affects genome-wide association studies and the implementation of genomic selection. High-density single nucleotide polymorphism (SNP) genotyping platforms make it possible to investigate LD at unprecedented resolution. In this work, we characterised LD decay in beef cattle breeds of taurine, indicine, and composite origin and explored its variation across the autosomes and the X chromosome. FINDINGS: In each breed, LD decayed rapidly, and r² was less than 0.2 for marker pairs separated by 50 kb. The LD decay curves clustered into three groups that distinguished the three main cattle types. At short distances between markers (<10 kb), taurine breeds showed higher LD (r² = 0.45) than their indicine (r² = 0.25) and composite (r² = 0.32) counterparts. This higher LD in taurine breeds was attributed to a smaller effective population size and a stronger bottleneck during breed formation. Using all SNPs on the X chromosome alone, the three cattle types could still be distinguished. However, for taurine breeds, LD decay on the X chromosome was much faster and the background level much lower than for indicine breeds and composite populations. When only SNPs polymorphic in all breeds were used, the X-chromosome results mirrored those of the autosomes. CONCLUSIONS: The pattern of LD reflected aspects of the history of the breed populations and showed a sharp decay with increasing physical distance between markers. We conclude that the HD chip can detect association signals that remained hidden with lower-density genotyping platforms, since LD dropped below 0.2 at distances of 50 kb.
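An LD decay curve of the kind described here is typically built by grouping marker pairs by physical distance and averaging r² within each group; the distance at which the curve first drops below a threshold such as 0.2 then summarises the decay. The sketch below shows that procedure; the bin width, the 0.2 default, and the function names are hypothetical choices, not the study's settings.

```python
from collections import defaultdict


def ld_decay_curve(pairs, bin_kb=10):
    """Mean r^2 per distance bin.

    pairs: iterable of (distance_kb, r2) for marker pairs.
    Returns a list of (bin_start_kb, mean_r2) sorted by distance.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for dist, r2 in pairs:
        start = (dist // bin_kb) * bin_kb  # left edge of the bin holding this pair
        sums[start] += r2
        counts[start] += 1
    return [(b, sums[b] / counts[b]) for b in sorted(sums)]


def decay_distance(curve, threshold=0.2):
    """Start of the first bin whose mean r^2 falls below `threshold`, or None."""
    for start, mean_r2 in curve:
        if mean_r2 < threshold:
            return start
    return None
```

Comparing `decay_distance` across breeds, or between autosomes and the X chromosome, gives a single number per curve, although the full curves carry more information about background LD.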
Project description: Chilean farmed Atlantic salmon (Salmo salar) populations were established with individuals of both European and North American origin. These populations are expected to be highly genetically differentiated because of their evolutionary history and poor gene flow between ancestral populations on different continents. The extent and decay of linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs) affect the implementation of genome-wide association studies and genomic selection and provide relevant information about the demographic processes of fish populations. We assessed population structure and characterized the extent and decay of LD in three Chilean commercial populations of Atlantic salmon of North American (NAM), Scottish (SCO), and Norwegian (NOR) origin. A total of 123 animals were genotyped using a 159K SNP Axiom® myDesign™ Genotyping Array. After quality control, 32K SNP markers common to all three populations were used. The principal component analysis explained 78.9% of the genetic diversity between populations, clearly discriminating between populations of North American and European origin, and also between the European populations. NAM had the lowest effective population size, followed by SCO and NOR. Large differences in LD decay were observed between populations of North American and European origin. An r² of 0.2 was reached for marker pairs separated by 7,800, 64, and 50 kb in the NAM, SCO, and NOR populations, respectively. We show that this SNP panel can be used to detect associations between markers and traits of interest and to capture high-resolution information for genome-enabled predictions. We also suggest that similar prediction accuracies could be achieved with a smaller SNP data set for the NAM population than for populations of European origin, which would need a higher-density SNP array.
Project description: The accuracy of genome-wide association studies (GWAS) and the successful implementation of genomic selection depend on the level of linkage disequilibrium (LD) across the genome and on the persistence of LD phase between populations. In the present study, LD between adjacent SNPs and LD decay between SNPs were calculated in three Iranian water buffalo populations. Persistence of LD phase was evaluated across these populations, and effective population size (Ne) was estimated from corrected r² values. A set of 404 individuals from the three populations was genotyped with the Axiom Buffalo Genotyping 90K Array. Average r² and |D'| between adjacent SNP pairs across all chromosomes were 0.27 and 0.66 for AZI, 0.29 and 0.68 for KHU, and 0.32 and 0.72 for MAZ. As the physical distance between markers increased from 100 kb to 1 Mb, LD decreased from 0.234 to 0.018 for AZI, from 0.254 to 0.034 for KHU, and from 0.297 to 0.119 for MAZ. These results indicate that a density of 90K SNPs is sufficient for genomic analyses relying on long-range LD (e.g., GWAS and genomic selection). The persistence of LD phase decreased with increasing marker distance across all populations but remained above 0.8 for AZI and KHU for marker distances up to 100 kb. For multi-breed genomic evaluation, the 90K SNP panel is suitable for the AZI and KHU buffalo breeds. Estimated effective population sizes for recent generations were 477, 212, and 32 for AZI, KHU, and MAZ, respectively. These estimates indicate that the MAZ population is at risk and requires careful management.
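Estimating Ne from LD usually rests on Sved's (1971) expectation E[r²] ≈ 1 / (1 + 4 Ne c), where c is the recombination distance in Morgans, with r² measured at distance c reflecting Ne roughly 1 / (2c) generations ago. The source does not give its exact correction; the sketch below assumes the common sample-size adjustment of subtracting 1/n from r², and converting physical distance to c (for example with an approximate cM-per-Mb rate) is left to the caller as a further assumption. The function name is hypothetical.

```python
def ne_from_r2(r2, c_morgans, n_individuals):
    """Effective population size from mean r^2 at recombination distance c.

    Rearranges Sved's (1971) E[r^2] = 1 / (1 + 4 Ne c) after subtracting
    1/n as a sample-size correction (an assumption; other corrections,
    such as 1/(2n), are also used).
    Returns (Ne, generations_ago) with generations_ago = 1 / (2c).
    """
    r2_adj = r2 - 1.0 / n_individuals
    if r2_adj <= 0:
        raise ValueError("corrected r^2 must be positive")
    ne = (1.0 / r2_adj - 1.0) / (4.0 * c_morgans)
    return ne, 1.0 / (2.0 * c_morgans)
```

As an illustrative (not study-derived) input, a mean r² of 0.11 at c = 0.01 Morgans in 100 animals gives Ne of about 225 roughly 50 generations ago. Shorter distances (larger c is more recent) probe more recent generations.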
Project description: The number of SNPs required for QTL discovery is justified by the distance at which linkage disequilibrium has decayed. Simulations and real potato SNP data showed how to estimate and interpret LD decay. The magnitude of linkage disequilibrium (LD) and its decay with genetic distance determine the resolution of association mapping and are useful for assessing the desired number of SNPs on arrays. To study LD and LD decay in tetraploid potato, we simulated autotetraploid genotypes and used them to explore the dependence on (1) the number of haplotypes in the population (the amount of genetic variation) and (2) the percentage of haplotype-specific SNPs (hs-SNPs). Several estimators of short-range LD were explored, such as the average r², the median r², and other percentiles of r² (80th, 90th, and 95th). For LD decay, we used LD½,90, the distance at which short-range LD is halved when the 90th percentile of r² at short range is used as the estimator of LD. Simulations showed that the performance of the various LD decay estimators depended strongly on the number of haplotypes, although the true LD decay was not much influenced by this number. The estimator LD½,90 was chosen to evaluate LD decay in 537 tetraploid varieties. LD½,90 values were 1.5 Mb for varieties released before 1945 and 0.6 Mb for varieties released after 2005. LD½,90 values within three different subpopulations ranged from 0.7 to 0.9 Mb. LD½,90 was 2.5 Mb for introgressed regions, indicating large haplotype blocks. In pericentromeric heterochromatin, LD decay was negligible. This study demonstrates that several related factors influencing LD decay could be disentangled, that no universal approach can be suggested, and that LD decay must be estimated with great care and knowledge of the sampled material.
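The LD½,90 idea can be sketched as: take the 90th percentile of r² among short-range pairs as the reference LD, then find the distance at which the 90th-percentile r² falls to half that value. This sketch reads the crossing point from binned percentiles rather than fitting a decay curve, which is a simplification and not necessarily the study's procedure; the short-range cutoff, bin width, nearest-rank percentile, and function names are all hypothetical.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (q in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def ld_half_90(pairs, short_range, bin_width):
    """Distance at which the 90th-percentile r^2 drops to half its short-range value.

    pairs: list of (distance, r2); short_range and bin_width use the same units
    as distance. Returns the start of the first qualifying bin, or None.
    """
    short = [r2 for d, r2 in pairs if d < short_range]
    if not short:
        return None
    half = percentile(short, 90) / 2  # half of the short-range LD estimate
    bins = {}
    for d, r2 in pairs:
        bins.setdefault(int(d // bin_width), []).append(r2)
    for b in sorted(bins):
        if percentile(bins[b], 90) <= half:
            return b * bin_width
    return None
```

Because a high percentile rather than the mean is used, the estimate tracks the strongly linked pairs and is less diluted by frequency-mismatched SNP pairs, which is one motivation the abstract gives for comparing estimators.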
Project description: BACKGROUND: Linkage disequilibrium (LD) maps can provide a wealth of information on specific marker-phenotype relationships, especially in genomic regions where positional candidate genes with similar functions are located. A recently published high-resolution radiation hybrid map of bovine chromosome 14 (BTA14), together with the bovine physical map, has enabled the creation of more accurate LD maps for BTA14 in both dairy and beef cattle. RESULTS: Over 500 single nucleotide polymorphism (SNP) markers from both Angus and Holstein animals had their phased haplotypes estimated using GENOPROB, and their pairwise r² values were compared. For both breeds, average LD extended at moderate levels up to 100 kilobase pairs (kbp) and fell to background levels beyond 500 kbp. Haplotype block structure analysis using HAPLOVIEW under the four-gamete rule identified 122 haplotype blocks for both Angus and Holstein. In addition, SNP tagging analysis identified 410 SNPs and 420 SNPs in Holstein and Angus, respectively, for future whole-genome association studies on BTA14. Correlation analysis of marker pairs common to the two breeds confirmed that there are no substantial correlations between r-values at distances over 10 kbp. Comparison of extended haplotype homozygosity (EHH), which measures the decay of LD away from a core haplotype, shows that in Holstein LD extends over a long range from the DGAT1 region, consistent with selection for milk fat percentage in this population. EHH values for Angus in the same region show very little long-range LD. CONCLUSION: Overall, the results presented here can be applied in future single-marker or haplotype association analyses for both populations, helping to confirm or exclude potential polymorphisms as causative mutations, especially around quantitative trait locus (QTL) regions.
In addition, knowledge of LD among specific markers will help the research community select appropriate markers for whole-genome association studies.
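Extended haplotype homozygosity, as usually defined (Sabeti et al. 2002), is the probability that two randomly chosen chromosomes carrying a core haplotype are identical over the whole span from the core out to a given marker: EHH = Σ C(n_j, 2) / C(n, 2), where n carriers split into extended haplotypes with counts n_j. The sketch below simplifies the core to a single SNP allele, and the string representation of phased haplotypes and the function name are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, stop_index):
    """Extended haplotype homozygosity from a core allele out to stop_index.

    haplotypes: phased haplotypes, each a string with one allele per SNP
    (hypothetical representation). Only haplotypes carrying core_allele at
    core_index are used. Returns the probability that two such haplotypes,
    drawn at random, are identical over the whole span core..stop.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    lo, hi = sorted((core_index, stop_index))
    # Group carriers by their extended haplotype over the span.
    counts = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(k, 2) for k in counts.values()) / comb(n, 2)
```

Evaluating `ehh` at increasing distances on each side of the core traces the decay curve; a curve that stays high over long distances, as reported around DGAT1 in Holstein, is the signature of a recently selected haplotype.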
Project description: Linkage disequilibrium (LD) between molecular markers affects the accuracy of genome-wide association studies and the application of genomic selection. High-density genotyping platforms allow thousands of single nucleotide polymorphisms (SNPs) distributed throughout animal genomes to be genotyped, which increases the resolution of LD evaluations. This study evaluated the distribution of minor allele frequencies (MAF) and the level of LD in the Colombian Creole cattle breeds Blanco Orejinegro (BON) and Romosinuano (ROMO) using a medium-density SNP panel (BovineSNP50K_v2). LD decayed more slowly in these breeds than reported for other taurine breeds, with optimal LD values (r² ≥ 0.3) maintained up to a distance of 70 kb in BON and 100 kb in ROMO, possibly owing to the conservation status and effective population size of these cattle populations. The average MAF for both breeds was 0.27 ± 0.14, with a higher proportion of SNPs having high MAF values (≥ 0.3). The LD levels and allele frequency distributions found in this study suggest that the BovineSNP50K_v2 can provide adequate genome-wide coverage for these breeds, capturing the effects of most QTL related to productive traits and ensuring adequate predictive ability in genomic analyses.
Project description: Assortative mating, a potentially efficient prezygotic reproductive barrier, may prevent loss of genetic potential by avoiding the production of unfit hybrids (i.e., because of hybrid infertility or hybrid breakdown) that occur in regions of secondary contact between incipient species. In the mouse hybrid zone, where two subspecies of Mus musculus (M. m. domesticus and M. m. musculus) meet and exchange genes to a limited extent, assortative mating requires a means of subspecies recognition. The work reported here is based on the hypothesis that if a pheromone is sufficiently diverged between M. m. domesticus and M. m. musculus to mediate subspecies recognition, then that process must also require a specific receptor (or receptors), also sufficiently diverged between the subspecies, to receive the signal and elicit an assortative mating response. We studied the mouse V1R genes, which encode a large family of receptors in the vomeronasal organ (VNO), screened Perlegen SNP data, and identified one, Vmn1r67, with 24 fixed SNP differences between M. m. domesticus and M. m. musculus, most of which (15/24) are nonsynonymous nucleotide substitutions. We observed substantial linkage disequilibrium (LD) between Vmn1r67 and Abpa27, a mouse salivary androgen-binding protein gene that encodes a proteinaceous pheromone (ABP) capable of mediating assortative mating, perhaps in conjunction with its bound small lipophilic ligand. The LD we observed is likely a case of association rather than residual physical linkage from a very recent selective sweep, because an intervening gene, Vmn1r71, shows significant intra(sub)specific polymorphism but no inter(sub)specific divergence in its nucleotide sequence.
We discuss alternative explanations of these observations, for example, that Abpa27 and Vmn1r67 are coevolving as signal and receptor to reinforce subspecies hybridization barriers, or that the unusually divergent Vmn1r67 allele was not a product of fast positive selection but was derived from an introgressed allele, possibly from Mus spretus.
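Classifying a fixed SNP difference as synonymous or nonsynonymous amounts to translating the affected codon before and after the substitution. The source does not describe its annotation pipeline; the sketch below assumes the standard genetic code, an in-frame coding sequence starting at the start codon, and at most one substitution per codon. The sequence, positions, and function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(cds, position, alt_base):
    """Label a coding SNP as 'synonymous' or 'nonsynonymous'.

    cds: in-frame coding DNA sequence.
    position: 0-based index of the SNP within cds.
    alt_base: the substituted nucleotide.
    """
    start = position - position % 3  # first base of the affected codon
    offset = position - start
    ref_codon = cds[start:start + 3].upper()
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "nonsynonymous"
```

Applying `classify_snp` to each fixed difference and counting the "nonsynonymous" labels yields a tally of the kind reported for Vmn1r67 (15 of 24).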