Project description:We developed smORFer, a novel algorithm that detects small open reading frames (smORFs, e.g. <50 codons) with higher accuracy in prokaryotic organisms. smORFer considers structural features of the genetic sequence together with in-register translation and uses a Fourier transform to convert them into a measurable score for faithfully selecting smORFs. The algorithm is modular, so different modules can be run depending on the data available.
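As a rough illustration of how a Fourier transform can turn in-register translation into a score, the sketch below measures the share of spectral power at the 3-nucleotide period in a per-nucleotide coverage profile. It is not smORFer's implementation: the input (per-nucleotide ribosome footprint counts over one candidate ORF), the function names `power_spectrum` and `periodicity_score`, and the normalization are all assumptions made for this example.

```python
import cmath


def power_spectrum(signal):
    """Naive discrete Fourier transform of a mean-centered signal.

    Returns the power at frequency indices 0 .. n // 2.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        powers.append(abs(coeff) ** 2)
    return powers


def periodicity_score(counts):
    """Fraction of non-constant spectral power found at the 3-nt period.

    `counts` is a hypothetical per-nucleotide coverage profile whose
    length is a whole number of codons (assumption of this sketch).
    """
    n = len(counts)
    if n < 3 or n % 3 != 0:
        raise ValueError("expected a whole number of codons")
    powers = power_spectrum(counts)
    total = sum(powers[1:])  # skip the constant (k = 0) component
    if total == 0:
        return 0.0  # flat profile: no periodic signal at all
    return powers[n // 3] / total  # index n / 3 corresponds to period 3


# A profile that peaks on every first codon position scores close to 1,
# while a flat profile scores 0.
in_frame = [10, 1, 1] * 20
flat = [4] * 60
print(periodicity_score(in_frame), periodicity_score(flat))
```

A candidate whose coverage follows the reading frame concentrates its power at period 3, so a threshold on this fraction could serve as one filter among several in a modular pipeline.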
Project description:Copy number variants (CNVs) are currently defined as genomic sequences that are polymorphic in copy number and range in length from 1,000 to several million base pairs. Among current array-based CNV detection platforms, long-oligonucleotide arrays promise the highest resolution. However, the performance of currently available analytical tools suffers when applied to these data because of the lower signal:noise ratio inherent in oligonucleotide-based hybridization assays. We have developed wuHMM, an algorithm for mapping CNVs from array comparative genomic hybridization (aCGH) platforms comprised of 385,000 to more than 3 million probes. wuHMM is unique in that it can utilize sequence divergence information to reduce the false positive rate (FPR). We apply wuHMM to 385K-aCGH, 2.1M-aCGH, and 3.1M-aCGH experiments comparing the 129X1/SvJ and C57BL/6J inbred mouse genomes. We assess wuHMM’s performance on the 385K platform by comparison to the higher resolution platforms and we independently validate 10 CNVs. The method requires no training data and is robust with respect to changes in algorithm parameters. At a FPR of less than 10%, the algorithm can detect CNVs with five probes on the 385K platform and three on the 2.1M and 3.1M platforms, resulting in effective resolutions of 24 kb, 2-5 kb, and 1 kb, respectively. Keywords: CNV detection algorithm development and assessment
Project description:Copy number variants (CNVs) are currently defined as genomic sequences that are polymorphic in copy number and range in length from 1,000 to several million base pairs. Among current array-based CNV detection platforms, long-oligonucleotide arrays promise the highest resolution. However, the performance of currently available analytical tools suffers when applied to these data because of the lower signal:noise ratio inherent in oligonucleotide-based hybridization assays. We have developed wuHMM, an algorithm for mapping CNVs from array comparative genomic hybridization (aCGH) platforms comprised of 385,000 to more than 3 million probes. wuHMM is unique in that it can utilize sequence divergence information to reduce the false positive rate (FPR). We apply wuHMM to 385K-aCGH, 2.1M-aCGH, and 3.1M-aCGH experiments comparing the 129X1/SvJ and C57BL/6J inbred mouse genomes. We assess wuHMM’s performance on the 385K platform by comparison to the higher resolution platforms and we independently validate 10 CNVs. The method requires no training data and is robust with respect to changes in algorithm parameters. At a FPR of less than 10%, the algorithm can detect CNVs with five probes on the 385K platform and three on the 2.1M and 3.1M platforms, resulting in effective resolutions of 24 kb, 2-5 kb, and 1 kb, respectively. Keywords: CNV detection algorithm development and assessment All four samples in this series are hybridizations of genomic DNA from inbred mouse strains 129X1/SvJ versus C57BL/6J. The experiments were performed at increasing resolutions (one 385K, two 2.1M, and one 3.1M).
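The description does not give wuHMM's model or parameters, so the following is only a generic sketch of the underlying idea: a hidden Markov model segments probe log2 ratios into copy-number states, and a segment is called only when enough consecutive probes support it. The three states, the Gaussian emission means, `SIGMA`, and `STAY` are illustrative assumptions, and the sequence-divergence step mentioned above is not modeled.

```python
import math

# All values below are assumptions for this sketch, not wuHMM settings.
STATES = ("loss", "normal", "gain")
MEANS = {"loss": -0.5, "normal": 0.0, "gain": 0.5}  # assumed log2-ratio means
SIGMA = 0.2   # assumed probe noise (standard deviation)
STAY = 0.99   # assumed probability of keeping the same state


def log_emission(x, state):
    """Log density of a Gaussian centered on the state's assumed mean."""
    z = (x - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))


def log_transition(prev, cur):
    """Log probability of moving from state `prev` to state `cur`."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(log_ratios):
    """Most likely state path for log2 ratios ordered by genomic position."""
    if not log_ratios:
        return []
    scores = {
        s: math.log(1 / len(STATES)) + log_emission(log_ratios[0], s)
        for s in STATES
    }
    backpointers = []
    for x in log_ratios[1:]:
        new_scores, pointers = {}, {}
        for cur in STATES:
            best_prev = max(
                STATES, key=lambda p: scores[p] + log_transition(p, cur)
            )
            new_scores[cur] = (
                scores[best_prev]
                + log_transition(best_prev, cur)
                + log_emission(x, cur)
            )
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Five consecutive elevated probes are called as a gain; with these
# assumed parameters a single elevated probe is absorbed as noise.
ratios = [0.0] * 10 + [0.5] * 5 + [0.0] * 10
print(viterbi(ratios))
```

With these assumed settings the cost of entering and leaving a state outweighs the evidence from one outlying probe, which is the same trade-off that links a minimum probe count to an effective resolution on each platform.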
Project description:DNA methylation is an important regulator of genome function in the eukaryotes, but it is currently unclear if the same is true in prokaryotes. While regulatory functions have been demonstrated for a small number of bacteria, there have been no large-scale studies of prokaryotic methylomes and the full repertoire of targets and biological functions of DNA methylation remains unclear. Here we applied single-molecule, real-time sequencing to directly study the methylomes of 232 phylogenetically diverse prokaryotes. Collectively, we identified 834 methylated motifs, enabling the specific annotation of 415 DNA methyltransferases (MTases), and adding substantially to existing databases of MTase specificities. While the majority of MTases function as components of restriction-modification systems, 139 MTases have no cognate restriction enzyme in the genome, suggesting some other functional role. Several of these “orphan” MTases are conserved across species and exhibit patterns of DNA methylation consistent with known regulatory MTases. Based on these patterns of methylation, we identify candidate novel regulators of gene expression in several phyla of bacteria, and candidate regulators of DNA replication in Haloarchaea. Together these data substantially advance our knowledge of DNA restriction-modification systems, and hint at a wider role for methylation in prokaryotic genome regulation. Single-molecule, real-time sequencing of DNA modifications across 232 diverse prokaryotic genomes.