Genovar: a detection and visualization tool for genomic variants.
ABSTRACT: Along with single nucleotide polymorphisms (SNPs), copy number variation (CNV) is considered an important source of genetic variation associated with disease susceptibility. Despite the importance of CNV, the tools currently available for its analysis often produce false positive results due to limitations such as low resolution of array platforms, platform specificity, and the type of CNV. To resolve this problem, spurious signals must be separated from true signals by visual inspection. None of the previously reported CNV analysis tools support this function and the simultaneous visualization of comparative genomic hybridization arrays (aCGH) and sequence alignment. The purpose of the present study was to develop a useful program for the efficient detection and visualization of CNV regions that enables the manual exclusion of erroneous signals. A Java-based stand-alone program called Genovar was developed. To ascertain whether a detected CNV region is a novel variant, Genovar compares the detected CNV regions with previously reported CNV regions using the Database of Genomic Variants (DGV, http://projects.tcag.ca/variation) and the Single Nucleotide Polymorphism Database (dbSNP). The current version of Genovar is capable of visualizing genomic data from sources such as the aCGH data file and sequence alignment format files. Genovar is freely accessible and provides a user-friendly graphical user interface (GUI) to facilitate the detection of CNV regions. The program also provides comprehensive information to help in the elimination of spurious signals by visual inspection, making Genovar a valuable tool for reducing false positive CNV results. Available at http://genovar.sourceforge.net/.
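The comparison of detected CNV regions against previously reported ones can be illustrated with a small interval-overlap check. This is only a sketch: the abstract does not say how Genovar decides that two regions match, so the `Region` type, the function names, and the 50% reciprocal-overlap threshold below are all assumptions made for illustration, not Genovar's documented behavior.

```python
from typing import Iterable, NamedTuple


class Region(NamedTuple):
    """A genomic interval with 1-based, inclusive coordinates (hypothetical type)."""
    chrom: str
    start: int
    end: int


def reciprocal_overlap(a: Region, b: Region) -> float:
    """Return the smaller of the two fractions of each region covered by the other."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    len_a = a.end - a.start + 1
    len_b = b.end - b.start + 1
    return min(shared / len_a, shared / len_b)


def is_novel(candidate: Region, known: Iterable[Region], min_overlap: float = 0.5) -> bool:
    """Treat a candidate as novel when no known region reaches the overlap threshold.

    The 0.5 default is an assumed cutoff, not a value taken from the abstract.
    """
    return all(reciprocal_overlap(candidate, k) < min_overlap for k in known)


# Stand-in for regions loaded from a reference catalogue such as DGV.
known_regions = [Region("chr1", 1000, 2000)]
print(is_novel(Region("chr1", 1100, 2100), known_regions))  # large shared span
print(is_novel(Region("chr2", 1100, 2100), known_regions))  # different chromosome
```

A reciprocal criterion is used here so that a small candidate nested inside a very large known region is not automatically counted as a match; a simpler any-overlap rule would be a reasonable alternative.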
Project description:BACKGROUND: This epidemiological study was carried out in Sfax (south of Tunisia) and focused on genital Chlamydia trachomatis (C. trachomatis) genovar distribution. METHODS: One hundred and thirty-seven genital samples from 4067 patients (4.2%) attending the Habib Bourguiba University hospital of Sfax over 12 years (from 2000 to 2011) were found to be C. trachomatis PCR positive by the Cobas Amplicor system. These samples were genotyped by an in-house reverse hybridization method. RESULTS: One hundred and eight (78.8%) samples contained only one genovar and 29 (21.2%) samples contained two or three genovars. Genovar E was the most prevalent (70.8%) single genovar and it was detected in 90.6% of all the cases. Genovars J, C and L1-L3 were not detected in our samples, whereas ocular genovars A and B were detected in 5 cases, all of which were mixed infections. Men had more mixed infections than women (p=0.02) and were more frequently infected by genovars F and K (p<0.05). No associations between current infection, infertility and the genovar distribution were observed. Patients coinfected with Neisseria gonorrhoeae were also significantly more frequently infected with mixed genovars (p=0.04). CONCLUSIONS: We report a high prevalence of genovar E and of mixed infections in our study population. Such data could have implications for the control and vaccine development of C. trachomatis in Tunisia.
Project description:Array-based comparative genomic hybridization (aCGH) enables the measurement of DNA copy number across thousands of locations in a genome. The main goals of analyzing aCGH data are to identify the regions of copy number variation (CNV) and to quantify the amount of CNV. Although there are many methods for analyzing single-sample aCGH data, the analysis of multi-sample aCGH data is a relatively new area of research. Further, many of the current approaches for analyzing multi-sample aCGH data do not appropriately utilize the additional information present in the multiple samples. We propose a procedure called the Fused Lasso Latent Feature Model (FLLat) that provides a statistical framework for modeling multi-sample aCGH data and identifying regions of CNV. The procedure involves modeling each sample of aCGH data as a weighted sum of a fixed number of features. Regions of CNV are then identified through an application of the fused lasso penalty to each feature. Some simulation analyses show that FLLat outperforms single-sample methods when the simulated samples share common information. We also propose a method for estimating the false discovery rate. An analysis of an aCGH data set obtained from human breast tumors, focusing on chromosomes 8 and 17, shows that FLLat and Significance Testing of Aberrant Copy number (an alternative, existing approach) identify similar regions of CNV that are consistent with previous findings. However, through the estimated features and their corresponding weights, FLLat is further able to discern specific relationships between the samples, for example, identifying 3 distinct groups of samples based on their patterns of CNV for chromosome 17.
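The idea of modeling each sample as a weighted sum of features, with a fused lasso penalty on each feature, can be made concrete by writing out an objective of that shape. The sketch below is an illustration under assumptions: the abstract does not give the exact objective, so the squared-error fit term, the tuning parameters `lam1` and `lam2`, and all function and argument names are hypothetical, and any constraint FLLat places on the weights is omitted.

```python
from typing import List

Matrix = List[List[float]]


def fllat_style_objective(Y: Matrix, B: Matrix, Theta: Matrix,
                          lam1: float, lam2: float) -> float:
    """Evaluate a fused-lasso latent-feature objective (illustrative form only).

    Y     : probes x samples matrix of observed log-ratios
    B     : probes x features matrix (each column is one feature)
    Theta : features x samples matrix of weights
    """
    n_probes, n_samples, n_features = len(Y), len(Y[0]), len(Theta)

    # Squared reconstruction error of Y by the weighted sum of features B @ Theta.
    fit = 0.0
    for l in range(n_probes):
        for s in range(n_samples):
            approx = sum(B[l][j] * Theta[j][s] for j in range(n_features))
            fit += (Y[l][s] - approx) ** 2

    # Lasso part: encourages feature values to be exactly zero (no copy change).
    sparsity = sum(abs(value) for row in B for value in row)

    # Fusion part: penalizes jumps between neighbouring probes within a feature,
    # which favours piecewise-constant segments.
    smoothness = sum(abs(B[l][j] - B[l - 1][j])
                     for l in range(1, n_probes)
                     for j in range(n_features))

    return fit + lam1 * sparsity + lam2 * smoothness


# One feature with a gain over the two middle probes, shared by two samples.
B = [[0.0], [1.0], [1.0], [0.0]]
Theta = [[1.0, 2.0]]
Y = [[0.0, 0.0], [1.0, 2.0], [1.0, 2.0], [0.0, 0.0]]
print(fllat_style_objective(Y, B, Theta, lam1=0.5, lam2=2.0))
```

In this toy case the fit term is zero, so the value printed comes entirely from the two penalty terms; regions of CNV would correspond to the non-zero segments of each feature column.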
Project description:Large-scale high throughput studies using microarray technology have established that copy number variation (CNV) throughout the genome is more frequent than previously thought. Such variation is known to play an important role in the presence and development of phenotypes such as HIV-1 infection and Alzheimer's disease. However, methods for analyzing the complex data produced and identifying regions of CNV are still being refined. We describe the presence of a genome-wide technical artifact, spatial autocorrelation or 'wave', which occurs in a large dataset used to determine the location of CNV across the genome. By removing this artifact we are able to obtain both a more biologically meaningful clustering of the data and an increase in the number of CNVs identified by current calling methods without a major increase in the number of false positives detected. Moreover, removing this artifact is critical for the development of a novel model-based CNV calling algorithm - CNVmix - that uses cross-sample information to identify regions of the genome where CNVs occur. For regions of CNV that are identified by both CNVmix and current methods, we demonstrate that CNVmix is better able to categorize samples into groups that represent copy number gains or losses. Removing artifactual 'waves' (which appear to be a general feature of array comparative genomic hybridization (aCGH) datasets) and using cross-sample information when identifying CNVs enables more biological information to be extracted from aCGH experiments designed to investigate copy number variation in normal individuals.
Project description:This study describes a new multilocus variable number tandem-repeat (VNTR) analysis (MLVA) typing system for the discrimination of Chlamydia trachomatis genovar D to K isolates or specimens. We focused our MLVA scheme on genovar E which predominates in most populations worldwide. This system does not require culture and therefore can be performed directly on DNA extracted from positive clinical specimens. Our method was based on GeneScan analysis of five VNTR loci labelled with fluorescent dyes by multiplex PCR and capillary electrophoresis. This MLVA, called MLVA-5, was applied to a collection of 220 genovar E and 94 non-E genovar C. trachomatis isolates and specimens obtained from 251 patients and resulted in 38 MLVA-5 types. The genetic stability of the MLVA-5 scheme was assessed for results obtained both in vitro by serial passage culturing and in vivo using concomitant and sequential isolates and specimens. All anorectal genovar E isolates from men who have sex with men exhibited the same MLVA-5 type, suggesting clonal spread. In the same way, we confirmed the clonal origin of the Swedish new variant of C. trachomatis. The MLVA-5 assay was compared to three other molecular typing methods, ompA gene sequencing, multilocus sequence typing (MLST) and a previous MLVA method called MLVA-3, on 43 genovar E isolates. The discriminatory index was 0.913 for MLVA-5, 0.860 for MLST and 0.622 for MLVA-3. Among all of these genotyping methods, MLVA-5 displayed the highest discriminatory power and does not require a time-consuming sequencing step. The results indicate that MLVA-5 enables high-resolution molecular epidemiological characterisation of C. trachomatis genovars D to K infections directly from specimens.
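A discriminatory index of the kind quoted for MLVA-5, MLST and MLVA-3 can be computed from the type assigned to each isolate. The abstract does not name the formula; the sketch below assumes the Hunter-Gaston index (a form of Simpson's index of diversity), which is widely used to compare typing schemes. The function name and the example labels are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def discriminatory_index(type_labels: Sequence[Hashable]) -> float:
    """Hunter-Gaston index: probability that two isolates drawn at random,
    without replacement, are assigned to different types."""
    n = len(type_labels)
    if n < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(type_labels).values()
    same_type_pairs = sum(c * (c - 1) for c in counts)
    return 1.0 - same_type_pairs / (n * (n - 1))


# Made-up labels: four isolates, two of which share a type.
print(round(discriminatory_index(["t1", "t1", "t2", "t3"]), 3))
```

The index is 1.0 when every isolate has its own type and 0.0 when all isolates share one type, so a higher value means the scheme separates unrelated isolates more finely.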
Project description:BACKGROUND: Copy number variation (CNV) is important and widespread in the genome, and is a major cause of disease and phenotypic diversity. Herein, we performed a genome-wide CNV analysis in 12 diversified chicken genomes based on whole genome sequencing. RESULTS: A total of 8,840 CNV regions (CNVRs) covering 98.2 Mb and representing 9.4% of the chicken genome were identified, ranging in size from 1.1 to 268.8 kb with an average of 11.1 kb. Sequencing-based predictions were confirmed at a high validation rate by two independent approaches, including array comparative genomic hybridization (aCGH) and quantitative PCR (qPCR). The Pearson's correlation coefficients between sequencing and aCGH results ranged from 0.435 to 0.755, and qPCR experiments revealed a positive validation rate of 91.71% and a false negative rate of 22.43%. In total, 2,214 (25.0%) predicted CNVRs span 2,216 (36.4%) RefSeq genes associated with specific biological functions. Besides two previously reported copy number variable genes EDN3 and PRLR, we also found some promising genes with potential in phenotypic variation. Two genes, FZD6 and LIMS1, related to disease susceptibility/resistance are covered by CNVRs. The highly duplicated SOCS2 may lead to higher bone mineral density. Entire or partial duplication of some genes like POPDC3 may have great economic importance in poultry breeding. CONCLUSIONS: Our results based on extensive genetic diversity provide a more refined chicken CNV map and genome-wide gene copy number estimates, and warrant future CNV association studies for important traits in chickens.
Project description:Disseminated microsporidiosis is a life-threatening opportunistic infection. Here, we report a previously undescribed genovar of Encephalitozoon cuniculi causing disseminated infection in a non-HIV-infected renal transplant recipient. Disseminated microsporidiosis must be considered in the differential diagnosis of chronic fever in renal allograft recipients, even those without urinary symptoms.
Project description:Despite considerable excitement over the potential functional significance of copy-number variants (CNVs), we still lack knowledge of the fine-scale architecture of the large majority of CNV regions in the human genome. In this study, we used a high-resolution array-based comparative genomic hybridization (aCGH) platform that targeted known CNV regions of the human genome at approximately 1 kb resolution to interrogate the genomic DNAs of 30 individuals from four HapMap populations. Our results revealed that 1020 of 1153 CNV loci (88%) were actually smaller in size than what is recorded in the Database of Genomic Variants based on previously published studies. A reduction in size of more than 50% was observed for 876 CNV regions (76%). We conclude that the total genomic content of currently known common human CNVs is likely smaller than previously thought. In addition, approximately 8% of the CNV regions observed in multiple individuals exhibited genomic architectural complexity in the form of smaller CNVs within larger ones and CNVs with interindividual variation in breakpoints. Future association studies that aim to capture the potential influences of CNVs on disease phenotypes will need to consider how to best ascertain this previously uncharacterized complexity.
Project description:DNA copy number variation (CNV) accounts for a large proportion of genetic variation. One commonly used approach to detecting CNVs is array-based comparative genomic hybridization (aCGH). Although many methods have been proposed to analyze aCGH data, it is not clear how to combine information from multiple samples to improve CNV detection. In this paper, we propose to use a matrix to approximate the multisample aCGH data and minimize the total variation of each sample as well as the nuclear norm of the whole matrix. In this way, we can make use of the smoothness property of each sample and the correlation among multiple samples simultaneously in a convex optimization framework. We also developed an efficient and scalable algorithm to handle large-scale data. Experiments demonstrate that the proposed method outperforms the state-of-the-art techniques under a wide range of scenarios and it is capable of processing large data sets with millions of probes.
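The convex formulation described above can be written out as a single optimization problem. The following is a sketch under assumptions rather than the paper's exact statement: the symbols $Y$ (observed probes-by-samples matrix), $X$ (its approximation), $x_i$ (the $i$-th sample, a column of $X$ with entries $x_{l,i}$ over $L$ probes and $S$ samples), the squared-error fit term, and the weights $\lambda_1, \lambda_2 \ge 0$ are introduced here for illustration.

```latex
\min_{X \in \mathbb{R}^{L \times S}} \;
  \frac{1}{2}\,\lVert Y - X \rVert_F^2
  \;+\; \lambda_1 \sum_{i=1}^{S} \mathrm{TV}(x_i)
  \;+\; \lambda_2\, \lVert X \rVert_*,
\qquad
\mathrm{TV}(x_i) = \sum_{l=2}^{L} \lvert x_{l,i} - x_{l-1,i} \rvert,
\qquad
\lVert X \rVert_* = \sum_{k} \sigma_k(X).
```

The total variation term rewards piecewise-constant profiles within each sample, while the nuclear norm (the sum of the singular values $\sigma_k(X)$) rewards a low-rank $X$, which is how correlation among samples enters. All three terms are convex in $X$, so the sum is convex as well.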
Project description:Congenital heart defect (CHD) occurs in 40% of Down syndrome (DS) cases. While carrying three copies of chromosome 21 increases the risk for CHD, trisomy 21 itself is not sufficient to cause CHD. Thus, additional genetic variation and/or environmental factors could contribute to the CHD risk. Here we report genomic variations that, in concert with trisomy 21, determine the risk for CHD in DS. This case-control GWAS includes 187 DS with CHD (AVSD = 69, ASD = 53, VSD = 65) as cases, and 151 DS without CHD as controls. Chromosome 21-specific association studies revealed rs2832616 and rs1943950 as CHD risk alleles (adjusted genotypic P-values <0.05). These signals were confirmed in a replication cohort of 92 DS-CHD cases and 80 DS-without CHD (nominal P-value 0.0022). Furthermore, CNV analyses using a customized chromosome 21 aCGH of 135K probes in 55 DS-AVSD and 53 DS-without CHD revealed three CNV regions associated with AVSD risk (FDR ≤ 0.05). Two of these regions that are located within the previously identified CHD region on chromosome 21 were further confirmed in a replication study of 49 DS-AVSD and 45 DS-without CHD (FDR ≤ 0.05). One of these CNVs maps near the RIPK4 gene, and the second includes the ZBTB21 (previously ZNF295) gene, highlighting the potential role of these genes in the pathogenesis of CHD in DS. We propose that the genetic architecture of the CHD risk of DS is complex and includes trisomy 21, and SNP and CNV variations in chromosome 21. In addition, a yet-unidentified genetic variation in the rest of the genome may contribute to this complex genetic architecture.
Project description:BACKGROUND: Recent studies have shown that copy number variation (CNV) in mammalian genomes contributes to phenotypic diversity, including health and disease status. In domestic pigs, CNV has been catalogued by several reports, but the extent of CNV and the phenotypic effects are far from clear. The goal of this study was to identify CNV regions (CNVRs) in pigs based on array comparative genome hybridization (aCGH). RESULTS: Here a custom-made tiling oligo-nucleotide array was used with a median probe spacing of 2506 bp for screening 12 pigs including 3 Chinese native pigs (one Chinese Erhualian, one Tongcheng and one Yangxin pig), 5 European pigs (one Large White, one Pietrain, one White Duroc and two Landrace pigs), 2 synthetic pigs (Chinese new line DIV pigs) and 2 crossbred pigs (Landrace × DIV pigs) with a Duroc pig as the reference. Two hundred and fifty-nine CNVRs across chromosomes 1-18 and X were identified, with an average size of 65.07 kb and a median size of 98.74 kb, covering 16.85 Mb or 0.74% of the whole genome. Concerning copy number status, 93 (35.91%) CNVRs were called as gains, 140 (54.05%) were called as losses and the remaining 26 (10.04%) were called as both gains and losses. Of all detected CNVRs, 171 (66.02%) and 34 (13.13%) CNVRs directly overlapped with Sus scrofa duplicated sequences and pig QTLs, respectively. The CNVRs encompassed 372 full length Ensembl transcripts. Two CNVRs identified by aCGH were validated using real-time quantitative PCR (qPCR). CONCLUSIONS: Using 720 K array CGH (aCGH) we described a map of porcine CNVs which facilitated the identification of structural variations for important phenotypes and the assessment of the genetic diversity of pigs.