Genomics

Dataset Information

Leveraging histone modifications to improve genome annotation


ABSTRACT: With the creation of accurate, chromosome-scale genomes, the next challenge facing the genomics community is the accurate identification of transcriptional units, distinguishing them from aberrant transcriptional noise. This has proven difficult because annotation by traditional means, such as short-read RNA-seq followed by transcriptome assembly, is prone to generating in-silico artifacts. To address this issue, we took advantage of epigenomic data in the form of ChIP-seq to annotate plant genomes in an unbiased manner, flag potential annotation issues, and identify novel genes. Histone modifications appear in the genome in a reproducible and predictable manner, making them an ideal resource for annotation. Trimethylation of histone 3 lysine 4 (H3K4me3) and acetylation of histone 3 lysine 56 (H3K56ac) are well documented to coincide with initiation of transcription by polymerase II (Pol II) at promoter sequences. These initiation marks, paired with marks deposited across the gene body during transcriptional elongation, such as histone 3 lysine 36 tri-methylation (H3K36me3) and histone 3 lysine 4 mono-methylation (H3K4me1), offer a framework for identifying complete transcriptional units. We leveraged these data on a genome-wide scale, allowing for identification of annotations discordant with empirical data. In total, 13,159 potential annotation issues were found in Zea mays across three different tissues, which were corroborated using complementary RNA-based approaches. Upon correction and validation, genes were extended by an average of 2,128 base pairs, and the length of discovered novel genes was 1,962 base pairs. Application of this method to five additional plant genomes revealed a variety of novel gene annotations, including 13,836 in Asparagus officinalis, 2,724 in Setaria viridis, 2,446 in Sorghum bicolor, 8,631 in Glycine max, and 2,585 in Phaseolus vulgaris.
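
The comparison of chromatin domains against existing gene models described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes that domains combining initiation and elongation marks have already been called and merged into intervals, and the names `Interval`, `classify_domains`, and the `min_extension` threshold of 500 bp are hypothetical choices made only for illustration.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_domains(
    genes: List[Interval],
    domains: List[Interval],
    min_extension: int = 500,  # hypothetical threshold, not from the source
) -> Tuple[List[Tuple[str, int]], List[Interval]]:
    """Split chromatin domains into two groups.

    extended: (gene name, extra bp) pairs where a domain overlapping the
              gene reaches at least `min_extension` bp beyond its bounds.
    novel:    domains that overlap no annotated gene at all.
    """
    extended: List[Tuple[str, int]] = []
    novel: List[Interval] = []
    for dom in domains:
        hits = [g for g in genes if overlaps(g, dom)]
        if not hits:
            novel.append(dom)
            continue
        for g in hits:
            # Bases of the domain lying upstream plus downstream of the gene.
            extra = max(0, g.start - dom.start) + max(0, dom.end - g.end)
            if extra >= min_extension:
                extended.append((g.name, extra))
    return extended, novel


if __name__ == "__main__":
    # Toy coordinates, invented for the example.
    genes = [Interval("chr1", 1000, 3000, "geneA")]
    domains = [Interval("chr1", 800, 5000), Interval("chr1", 20000, 22000)]
    print(classify_domains(genes, domains))
```

A quadratic scan like this is only suitable for small inputs; a genome-wide comparison would normally sort the intervals or use an interval tree, and would also need to account for strand and for domains that span several neighbouring genes.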

ORGANISM(S): Zea mays, Glycine max

PROVIDER: GSE160944 | GEO | 2021/04/26

REPOSITORIES: GEO

Similar Datasets

2020-10-06 | GSE159044 | GEO
| PRJNA253267 | ENA
2017-01-20 | GSE93848 | GEO
2005-08-25 | E-WMIT-3 | biostudies-arrayexpress
2011-04-05 | E-GEOD-28376 | biostudies-arrayexpress
2011-04-05 | E-GEOD-28377 | biostudies-arrayexpress
2011-04-05 | E-GEOD-28378 | biostudies-arrayexpress
2010-04-27 | E-GEOD-18588 | biostudies-arrayexpress
| PRJNA496322 | ENA
2011-03-01 | E-GEOD-14397 | biostudies-arrayexpress