Project description:Copy number variants (CNVs) contribute to human genetic and phenotypic diversity. However, the distribution of larger CNVs in the general population remains largely unexplored. We identify large variants in approximately 2500 individuals by using Illumina SNP data, with an emphasis on "hotspots" prone to recurrent mutations. We find variants larger than 500 kb in 5%-10% of individuals and variants greater than 1 Mb in 1%-2%. In contrast to previous studies, we find limited evidence for stratification of CNVs in geographically distinct human populations. Importantly, our sample size permits a robust distinction between truly rare and polymorphic but low-frequency copy number variation. We find that a significant fraction of individual CNVs larger than 100 kb are rare and that both gene density and size are strongly anticorrelated with allele frequency. Thus, although large CNVs commonly exist in normal individuals, which suggests that size alone cannot be used as a predictor of pathogenicity, such variation is generally deleterious. Considering these observations, we combine our data with published CNVs from more than 12,000 individuals contrasting control and neurological disease collections. This analysis identifies known disease loci and highlights additional CNVs (e.g., 3q29, 16p12, and 15q25.2) for further investigation. This study provides one of the first analyses of large, rare (0.1%-1%) CNVs in the general population, with insights relevant to future analyses of genetic disease.
Project description:Copy-number variants (CNVs) can reach appreciable frequencies in the human population, and recent discoveries have shown that several of these copy-number polymorphisms (CNPs) are associated with human diseases, including lupus, psoriasis, Crohn disease, and obesity. Despite new advances, significant biases remain in terms of CNP discovery and genotyping. We developed a method based on single-channel intensity data and benchmarked against copy numbers determined from sequencing read depth to successfully obtain CNP genotypes for 1495 CNPs from 487 human DNA samples of diverse ethnic backgrounds. This microarray contained CNPs in segmental duplication-rich regions and insertions of sequences not represented in the reference genome assembly or on standard SNP microarray platforms. We observe that CNPs in segmental duplications are more likely to be population differentiated than CNPs in unique regions (p = 0.015) and that biallelic CNPs show greater stratification when compared to frequency-matched SNPs (p = 0.0026). Although biallelic CNPs show a strong correlation of copy number with flanking SNP genotypes, the majority of multicopy CNPs do not (40% with r > 0.8). We selected a subset of CNPs for further characterization in 1876 additional samples from 62 populations; this revealed striking population-differentiated structural variants in genes of clinical significance such as OCLN, a tight junction protein involved in hepatitis C viral entry. Our microarray design allows these variants to be rapidly tested for disease association and our results suggest that CNPs (especially those that cannot be imputed from SNP genotypes) might have contributed disproportionately to human diversity and selection.
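The check of copy number against a flanking SNP genotype (the r > 0.8 criterion mentioned above) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `pearson_r` and `is_taggable`, the toy genotype values, and the use of the absolute value of r (so that the result does not depend on which SNP allele is counted) are all assumptions made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def is_taggable(copy_numbers, snp_genotypes, threshold=0.8):
    """True if |r| between copy number and SNP genotype exceeds threshold.

    The threshold default mirrors the r > 0.8 figure in the text; taking
    the absolute value is an assumption of this sketch.
    """
    return abs(pearson_r(copy_numbers, snp_genotypes)) > threshold


# Hypothetical toy data: per-sample integer copy number and SNP genotype
# (count of one SNP allele: 0, 1, or 2). Not taken from the study.
biallelic_cn = [0, 1, 1, 2]
biallelic_snp = [0, 1, 1, 2]
multicopy_cn = [2, 3, 4, 2, 3, 4]
multicopy_snp = [0, 0, 0, 2, 2, 2]

print(is_taggable(biallelic_cn, biallelic_snp))
print(is_taggable(multicopy_cn, multicopy_snp))
```

In this toy setup the biallelic CNP tracks the SNP exactly, while the multicopy CNP varies independently of it, which is the situation in which direct genotyping rather than imputation would be needed.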
Project description:Copy number variants (CNVs) in the human genome contribute to both Mendelian and complex traits as well as to genomic plasticity in evolution. The investigation of mutational rates of CNVs is critical to understanding genomic instability and the etiology of the copy number variation (CNV)-related traits. However, the evaluation of the CNV mutation rate at the genome level poses an insurmountable practical challenge that requires large samples and accurate typing. In this study, we show that an approximate estimation of the CNV mutation rate could be achieved by using the phylogeny information of flanking SNPs. This allows a genome-wide comparison of mutation rates between CNVs with the use of vast, readily available data of SNP genotyping. A total of 4187 CNV regions (CNVRs) previously identified in HapMap populations were investigated in this study. We showed that the mutation rates for the majority of these CNVRs are at the order of 10⁻⁵ per generation, consistent with experimental observations at individual loci. Notably, the mutation rates of 104 (2.5%) CNVRs were estimated at the order of 10⁻³ per generation; therefore, they were identified as potential hotspots. Additional analyses revealed that genome architecture at CNV loci has a potential role in inciting mutational hotspots in the human genome. Interestingly, 49 (47%) CNV hotspots include human genes, some of which are known to be functional CNV loci (e.g., CNVs of C4 and β-defensin causing autoimmune diseases and CNVs of HYDIN with implication in control of cerebral cortex size), implicating the important role of CNV in human health and evolution, especially in common and complex diseases.
Project description:Copy number variants (CNVs) can reach appreciable frequencies in the human population, and several of these copy number polymorphisms (CNPs) have been recently associated with human diseases including lupus, psoriasis, Crohn disease, and obesity. Despite new advances, significant biases remain in terms of CNP discovery and genotyping. Developing a novel method based on single-channel intensity data and benchmarking against copy numbers determined from sequencing read depth, we successfully obtained CNP genotypes for 1489 CNPs from 487 human DNA samples from diverse ethnic backgrounds. This customized microarray was enriched for segmental duplication-rich regions and novel insertions of sequences not represented in the reference genome assembly or on standard single nucleotide polymorphism (SNP) microarray platforms. We observe that CNPs in segmental duplications are more likely to be population differentiated than CNPs in unique regions (p = 0.015) and that bi-allelic CNPs show greater stratification when compared to frequency-matched SNPs (p = 0.0026). Although bi-allelic CNPs show a strong correlation of copy number with flanking SNP genotypes, the majority of multi-copy CNPs do not (40% with r > 0.8). We selected a subset of CNPs for further characterization in 1873 additional samples from 62 populations (947 samples analyzed by microarray; 926 samples analyzed with PCR-based assays); this revealed striking population-differentiated structural variants in genes of clinical significance such as OCLN, which encodes a tight junction protein involved in hepatitis C viral entry. Our new microarray design allows these variants to be rapidly tested for disease association and our results suggest that CNPs (especially those that are not in linkage disequilibrium with SNPs) may have contributed disproportionately to human diversity and selection.
Project description:Hirschsprung disease (HSCR) is a neurocristopathy characterized by absence of intramural ganglion cells along variable lengths of the gastrointestinal tract. The HSCR phenotype is highly variable with respect to gender, length of aganglionosis, familiality and the presence of additional anomalies. By molecular genetic analysis, a minimum of 11 neuro-developmental genes (RET, GDNF, NRTN, SOX10, EDNRB, EDN3, ECE1, ZFHX1B, PHOX2B, KIAA1279, TCF4) are known to harbor rare, high-penetrance mutations that confer a large risk to the bearer. In addition, two other genes (RET, NRG1) harbor common, low-penetrance polymorphisms that contribute only partially to risk and can act as genetic modifiers. To broaden this search, we examined whether a set of 67 proven and candidate HSCR genes harbored additional modifier alleles. In this pilot study, we utilized a custom-designed array CGH with ∼33,000 test probes at an average resolution of ∼185 bp to detect gene-sized or smaller copy number variants (CNVs) within these 67 genes in 18 heterogeneous HSCR patients. Using stringent criteria, we identified CNVs at three loci (MAPK10, ZFHX1B, SOX2) that are novel, involve regulatory and coding sequences of neuro-developmental genes, and show association with HSCR in combination with other congenital anomalies. Additional CNVs are observed under relaxed criteria. Our research suggests a role for CNVs in HSCR and, importantly, emphasizes the role of variation in regulatory sequences. A much larger study will be necessary both for replication and for identifying the full spectrum of small CNV effects.
Project description:Copy number variations (CNVs) are universal genetic variations, and their association with disease has been increasingly recognized. We designed high-density microarrays for CNVs, and detected 3000-4000 CNVs (4-6% of the genomic sequence) per population, including CNVs previously missed because of their smaller size or their location in segmental duplications. The patterns of CNVs across individuals were surprisingly simple at the kilobase scale, suggesting the applicability of a simple genetic analysis for these genetic loci. We used probabilistic theory to determine integer copy numbers of CNVs and employed a recently developed phasing tool to estimate the population frequencies of integer copy number alleles and CNV-SNP haplotypes. The results showed a tendency toward a lower frequency of CNV alleles and that most of our CNVs were explained only by zero-, one- and two-copy alleles. Using the estimated population frequencies, we found several CNV regions with exceptionally high population differentiation. Investigation of CNV-SNP linkage disequilibrium (LD) for 500-900 bi- and multi-allelic CNVs per population revealed that previous conflicting reports on bi-allelic LD were unexpectedly consistent and explained by an LD increase correlated with deletion-allele frequencies. Typically, the bi-allelic LD was lower than SNP-SNP LD, whereas the multi-allelic LD was somewhat stronger than the bi-allelic LD. After further investigation of tag SNPs for CNVs, we conclude that the customary tagging strategy for disease association studies is applicable to common deletion CNVs, but direct interrogation is needed for other types of CNVs.
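As an illustration of the CNV-SNP linkage disequilibrium mentioned above, the following minimal Python sketch computes the standard r² statistic between a bi-allelic CNV (for example, deletion versus non-deletion) and a SNP from phased haplotypes. The description does not state which LD measure or software the authors used, so the choice of r², the function name `ld_r2`, the 0/1 allele coding, and the toy haplotypes are assumptions for illustration only.

```python
def ld_r2(haplotypes):
    """r^2 between two bi-allelic loci from phased haplotypes.

    `haplotypes` is a list of (cnv_allele, snp_allele) pairs, each allele
    coded 0 or 1 (e.g. 1 = deletion allele at the CNV). This coding is a
    convention of this sketch.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(a for a, _ in haplotypes) / n           # CNV allele frequency
    p_b = sum(b for _, b in haplotypes) / n           # SNP allele frequency
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b                              # LD coefficient D
    return d * d / denom


# Hypothetical toy haplotypes, not data from the study.
perfectly_tagged = [(1, 1)] * 3 + [(0, 0)] * 7
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r2(perfectly_tagged))
print(ld_r2(independent))
```

A deletion that always travels with one SNP allele gives r² close to 1 and could be tagged by that SNP, whereas an r² near 0 corresponds to the case where direct interrogation of the CNV is needed.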
Project description:Hirschsprung disease (HSCR) is a neurocristopathy characterized by absence of intramural ganglion cells along variable lengths of the gastrointestinal tract. The HSCR phenotype is highly variable with respect to gender, segment length of aganglionosis, familiality and the presence of additional anomalies. By molecular genetic analysis, a minimum of 11 neuro-developmental genes (RET, GDNF, NRTN, SOX10, EDNRB, EDN3, ECE1, ZFHX1B, PHOX2B, KIAA1279, TCF4) are known to harbor rare high-penetrance mutations that confer a large risk to the bearer. In addition, two other genes (RET, NRG1) harbor common low-penetrance polymorphisms that contribute only partially to risk and act as genetic modifiers. To broaden this search, we examined whether a set of 67 proven and candidate HSCR genes harbored additional modifier alleles. In this pilot study, we utilized a custom-designed array CGH with ~33,000 test probes at an average resolution of ~185 bp to detect gene-sized or smaller copy number variants (CNVs) within these 67 genes in 18 heterogeneous HSCR patients. Using stringent criteria, we identified CNVs at three loci (MAPK10, ZFHX1B, SOX2) that are novel, involve regulatory and coding sequences of these neuro-developmental genes and show association with HSCR in combination with other congenital anomalies.
Project description:Hirschsprung disease (HSCR) is a neurocristopathy characterized by absence of intramural ganglion cells along variable lengths of the gastrointestinal tract. The HSCR phenotype is highly variable with respect to gender, segment length of aganglionosis, familiality and the presence of additional anomalies. By molecular genetic analysis, a minimum of 11 neuro-developmental genes (RET, GDNF, NRTN, SOX10, EDNRB, EDN3, ECE1, ZFHX1B, PHOX2B, KIAA1279, TCF4) are known to harbor rare high-penetrance mutations that confer a large risk to the bearer. In addition, two other genes (RET, NRG1) harbor common low-penetrance polymorphisms that contribute only partially to risk and act as genetic modifiers. To broaden this search, we examined whether a set of 67 proven and candidate HSCR genes harbored additional modifier alleles. In this pilot study, we utilized a custom-designed array CGH with ~33,000 test probes at an average resolution of ~185 bp to detect gene-sized or smaller copy number variants (CNVs) within these 67 genes in 18 heterogeneous HSCR patients. Using stringent criteria, we identified CNVs at three loci (MAPK10, ZFHX1B, SOX2) that are novel, involve regulatory and coding sequences of these neuro-developmental genes and show association with HSCR in combination with other congenital anomalies. Two-condition experiment: patient vs. control, with sex-matched controls. Technical replicates: 4 samples were examined twice (408.3.1, 408.3.2; 300.3.1, 300.3.2; 354.3.1, 354.3.2; 355.3.1, 355.3.2) and 3 were studied in triplicate (63.3.1, 63.3.2, 63.3.3; 122.7.1, 122.7.2, 122.7.3; 413.3.1, 413.3.2, 413.3.3).
Project description:Background: The Taiwan Human Disease iPSC Service Consortium was established to accelerate Taiwan's growing stem cell research initiatives and provide a platform for researchers interested in utilizing induced pluripotent stem cell (iPSC) technology. The consortium has generated and characterized 83 iPSC lines: 11 normal and 72 disease iPSC lines covering 21 different diseases, several of which are of high incidence in Taiwan. Whether there are any reprogramming-induced recurrent copy number variant (CNV) hotspots in iPSCs is still largely unknown. Methods: We performed genome-wide copy number variant screening of 83 Han Taiwanese iPSC lines and compared them with 1093 control subjects using an Affymetrix genome-wide human SNP array. Results: In the iPSCs, we identified ten specific CNV loci and seven "polymorphic" CNV regions that are associated with the reprogramming process. Additionally, we established several differentiation protocols for our iPSC lines. We demonstrated that our iPSC-derived cardiomyocytes respond to pharmacological agents and were successfully engrafted into the mouse myocardium, demonstrating their potential application in cell therapy. Conclusions: The CNV hotspots induced by cell reprogramming have successfully been identified in the current study. This finding may be used as a reference index for evaluating iPSC quality for future clinical applications. Our aim was to establish a national iPSC resource center generating iPSCs, made available to researchers, to benefit the stem cell community in Taiwan and throughout the world.