Genome-wide discovery and characterization of maize long non-coding RNAs.
ABSTRACT: BACKGROUND: Long non-coding RNAs (lncRNAs) are transcripts that are 200 bp or longer, do not encode proteins, and potentially play important roles in eukaryotic gene regulation. However, the number, characteristics and expression inheritance pattern of lncRNAs in maize are still largely unknown. RESULTS: By exploiting available public EST databases, maize whole genome sequence annotation and RNA-seq datasets from 30 different experiments, we identified 20,163 putative lncRNAs. Of these lncRNAs, more than 90% are predicted to be the precursors of small RNAs, while 1,704 are considered to be high-confidence lncRNAs. High confidence lncRNAs have an average transcript length of 463 bp and genes encoding them contain fewer exons than annotated genes. By analyzing the expression pattern of these lncRNAs in 13 distinct tissues and 105 maize recombinant inbred lines, we show that more than 50% of the high confidence lncRNAs are expressed in a tissue-specific manner, a result that is supported by epigenetic marks. Intriguingly, the inheritance of lncRNA expression patterns in 105 recombinant inbred lines reveals apparent transgressive segregation, and maize lncRNAs are less affected by cis- than by trans-genetic factors. CONCLUSIONS: We integrate all available transcriptomic datasets to identify a comprehensive set of maize lncRNAs, provide a unique annotation resource of the maize genome and a genome-wide characterization of maize lncRNAs, and explore the genetic control of their expression using expression quantitative trait locus mapping.
Project description:Long non-coding RNAs (lncRNAs) are a new class of regulatory molecules with roles in diverse biological processes. While much effort has been invested in the analysis of lncRNAs from established plant models Arabidopsis, maize, and rice, almost nothing is known about lncRNAs from fruit crops, including those in the Rosaceae family. Here, we present a genome-scale identification and characterization of lncRNAs from a diploid strawberry, Fragaria vesca, based on rich RNA-seq datasets from 35 different flower and fruit tissues. A total of 5,884 Fve-lncRNAs derived from 3,862 loci were identified. These lncRNAs were carefully cataloged based on expression level and whether or not they contain repetitive sequences or generate small RNAs. About one fourth of them are termed high-confidence lncRNAs (hc-lncRNAs) because they are expressed at an FPKM higher than 2, produce no small RNAs, and contain no repetitive sequences. To identify regulatory interactions between lncRNAs and their potential protein-coding (PC) gene targets, pairs of lncRNAs and PC genes with positively or negatively correlated expression trends were identified; these pairs may be candidates for cis- or trans-acting lncRNAs and their targets. Finally, BLAST searches within plant species indicate that lncRNAs are not well conserved. Our study identifies a large number of tissue-specifically expressed lncRNAs in F. vesca, thereby highlighting their potential contributions to strawberry flower and fruit development and paving the way for future functional studies.
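The high-confidence criteria stated in this description (FPKM above 2, no small RNA production, no repetitive sequence) can be illustrated with a minimal Python sketch. The record layout, the field names, and the choice to compare a single per-transcript FPKM value against the threshold are assumptions made for illustration; the description does not say how expression was summarized across the 35 tissues, and this is not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LncRNA:
    # Hypothetical record layout; field names are illustrative only.
    transcript_id: str
    fpkm: float               # one summary expression value per transcript (assumed)
    produces_small_rna: bool  # True if the transcript generates small RNAs
    has_repeat: bool          # True if it contains repetitive sequence


def is_high_confidence(t: LncRNA, min_fpkm: float = 2.0) -> bool:
    """Apply the three stated criteria: FPKM > 2, no small RNAs, no repeats."""
    return t.fpkm > min_fpkm and not t.produces_small_rna and not t.has_repeat


def high_confidence(transcripts: Iterable[LncRNA], min_fpkm: float = 2.0) -> List[LncRNA]:
    """Return only the transcripts that pass all three criteria."""
    return [t for t in transcripts if is_high_confidence(t, min_fpkm)]
```

The threshold is strict (`>`), following the wording "higher than 2"; a transcript at exactly FPKM 2 is excluded under this reading.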
Project description:The genetic factors underlying changes in ear morphology, and particularly the inheritance of kernel row number (KRN), have been broadly investigated in diverse mapping populations in maize (Zea mays L.). In this study, we mapped a region on the long arm of chromosome 1 containing a QTL for KRN. This work was performed using a set of recombinant chromosome nearly isogenic lines (RCNILs) derived from a BC2S3 population produced using the inbred maize line W22 and teosinte (Zea mays ssp. parviglumis) as the parents. A set of 48 RCNILs was evaluated in the field during the summer of 2013 in order to perform the mapping. A QTL for KRN was found that explained approximately 51% of the phenotypic variance and had a 1.5-LOD confidence interval of 203 kb. Seven genes are described in this interval. One of these candidate genes may have been the target of domestication processes in maize and contributed to the shift from two kernel row ears in teosinte to a highly polystichous ear in maize.
Project description:Recent transcriptome annotation efforts using deep sequencing approaches have annotated a large number of long non-coding RNAs (lncRNAs) in zebrafish, a popular model organism for human diseases. These studies characterized lncRNAs in critical developmental stages as well as adult tissues. Each of the studies has uncovered a distinct set of lncRNAs, with minor overlaps. The availability of the raw RNA-Seq datasets in the public domain, encompassing critical developmental time-points and adult tissues, provides us with a unique opportunity to understand the spatiotemporal expression patterns of lncRNAs. In the present report, we created a catalog of lncRNAs in zebrafish, derived largely from the three annotation sets as well as manual curation of the literature, to compile a total of 2,267 lncRNA transcripts in zebrafish. The lncRNAs were further classified into 4 categories based on their genomic context and relationship with neighboring protein-coding genes. Analysis revealed a total of 86 intronic, 309 promoter associated, 485 overlapping and 1,386 lincRNAs. We created a comprehensive resource which houses the annotation of lncRNAs as well as associated information including expression levels, promoter epigenetic marks, genomic variants and retroviral insertion mutants. The resource also hosts a genome browser where the datasets can be browsed in the genome context. To the best of our knowledge, this is the first comprehensive resource providing a unified catalog of lncRNAs in zebrafish. The resource is freely available at URL: http://genome.igib.res.in/zflncRNApedia.
Project description:DNA methylation is a chromatin modification that contributes to epigenetic regulation of gene expression. The inheritance patterns and trans-generational stability of 962 differentially methylated regions (DMRs) were assessed in a panel of 71 near-isogenic lines (NILs) derived from maize (Zea mays) inbred lines B73 and Mo17. The majority of DMRs exhibit inheritance patterns that would be expected for local (cis) inheritance of DNA methylation variation such that DNA methylation level was coupled to local genotype. There are few examples of DNA methylation that exhibit trans-acting control or paramutation-like patterns. The cis-inherited DMRs provide an opportunity to study the stability of inheritance for DNA methylation variation. There was very little evidence for alterations of DNA methylation levels at these DMRs during the generations of the NIL population development. DNA methylation level was associated with local genotypes in nearly all of the >30,000 potential cases of inheritance. The majority of the DMRs were not associated with small RNAs. Together, our results suggest that a significant portion of DNA methylation variation in maize exhibits locally (cis) inherited patterns, is highly stable, and does not require active programming by small RNAs for maintenance. DNA methylation may contribute to heritable epigenetic information in many eukaryotic genomes. In this study, we have documented the inheritance patterns and trans-generational stability for nearly 1000 DNA methylation variants in a segregating maize population. At most loci studied, the DNA methylation differences are locally inherited and are not influenced by the other allele or other genomic regions. The inheritance of DNA methylation levels across generations is quite robust with almost no examples of unstable inheritance, suggesting that DNA methylation differences can be quite stably inherited, even in segregating populations.
Project description:Maize (Zea mays L.) is an important source of carbohydrates and protein in the diet in sub-Saharan Africa. The objectives of this study were to (i) estimate general (GCA) and specific combining abilities (SCA) of 13 new quality protein maize (QPM) lines in a diallel under stress and non-stress conditions, (ii) compare observed and predicted performance of QPM hybrids, (iii) characterize genetic diversity among the 13 QPM lines using single nucleotide polymorphism (SNP) markers and assess the relationship between genetic distance and hybrid performance, and (iv) assess diversity and population structure in 116 new QPM inbred lines as compared to eight older tropical QPM lines and 15 non-QPM lines. The GCA and SCA effects were significant for most traits under optimal conditions, indicating that both additive and non-additive genetic effects were important for inheritance of the traits. Additive genetic effects appeared to govern inheritance of most traits under optimal conditions and across environments. Non-additive genetic effects were more important for inheritance of grain yield but additive effects controlled most agronomic traits under drought stress conditions. Inbred lines CKL08056, CKL07292, and CKL07001 had desirable GCA effects for grain yield across drought stress and non-stress conditions. Prediction efficiency for grain yield was highest under optimal conditions. The classification of 139 inbred lines with 95 SNPs generated six clusters, four of which contained 10 or fewer lines, and 16 lines of mixed co-ancestry. There was good agreement between Neighbor Joining dendrogram and Structure classification. The QPM lines used in the diallel were nearly uniformly spread throughout the dendrogram. There was no relationship between genetic distance and grain yield in either the optimal or stressed environments in this study. The genetic diversity in mid-altitude maize germplasm is ample, and the addition of the QPM germplasm did not increase it measurably.
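As a rough illustration of how GCA and SCA effects are estimated from a diallel, the following Python sketch uses the standard least-squares estimators for a half diallel without parents or reciprocals (Griffing's Method 4). The description does not state which diallel design or model the study used, so that design, the single-environment cross means, and all function and variable names here are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def combining_abilities(
    crosses: Dict[FrozenSet[str], float], lines: List[str]
) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Estimate GCA and SCA effects from cross means.

    crosses maps frozenset({parent_a, parent_b}) to the hybrid's mean trait
    value. Assumes every pair of lines was crossed exactly once (half diallel,
    no parents, no reciprocals).
    """
    n = len(lines)
    if n < 3:
        raise ValueError("at least three parental lines are required")

    # Array totals X_i. (sum over all crosses involving line i) and grand total X..
    total = {p: 0.0 for p in lines}
    grand = 0.0
    for a, b in combinations(lines, 2):
        x = crosses[frozenset((a, b))]
        total[a] += x
        total[b] += x
        grand += x

    # g_i = (n * X_i. - 2 * X..) / (n * (n - 2))
    gca = {p: (n * total[p] - 2.0 * grand) / (n * (n - 2)) for p in lines}

    # s_ij = x_ij - (X_i. + X_j.) / (n - 2) + 2 * X.. / ((n - 1) * (n - 2))
    sca = {}
    for a, b in combinations(lines, 2):
        x = crosses[frozenset((a, b))]
        sca[(a, b)] = (
            x - (total[a] + total[b]) / (n - 2) + 2.0 * grand / ((n - 1) * (n - 2))
        )
    return gca, sca
```

Under a purely additive model (each hybrid equal to the grand mean plus the two parental GCA effects), these estimators return the parental effects as GCA and zero for every SCA, which is why significant SCA is read as evidence of non-additive effects.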
Project description:Long non-coding RNAs (lncRNAs) are of fundamental biological importance; however, their functional role is often unclear or loosely defined, as experimental characterization is challenging and bioinformatic methods are limited. We developed a novel integrated protocol for the annotation and detailed functional characterization of lncRNAs within the genome. It combines annotation, normalization and gene expression with sequence-structure conservation, functional interactome and promoter analysis. Our protocol allows an analysis based on the tissue and biological context, and is powerful in the functional characterization of experimental and clinical RNA-Seq datasets including existing lncRNAs. This is demonstrated on the uncharacterized lncRNA GATA6-AS1 in dilated cardiomyopathy.
Project description:BACKGROUND: The harvest index for many crops can be improved through introduction of dwarf stature to increase lodging resistance, combined with early maturity. The inbred line Shen5003 has been widely used in maize breeding in China as a key donor line for the dwarf trait. Also, one major quantitative trait locus (QTL) controlling plant height has been identified in bin 5.05-5.06, across several maize bi-parental populations. With the maize genome sequence now publicly available, the objective of this work was to identify the candidate genes that affect plant height among Chinese maize inbred lines with genome-wide association studies (GWAS). METHODS AND FINDINGS: A total of 284 maize inbred lines were genotyped using over 55,000 evenly spaced SNPs, from which a set of 41,101 SNPs were filtered with stringent quality control for further data analysis. With the population structure controlled in a mixed linear model (MLM) implemented with the software TASSEL, we carried out a GWAS for plant height. A total of 204 SNPs (P≤0.0001) and 105 genomic loci harboring coding regions were identified. Four loci containing genes associated with gibberellin (GA), auxin, and epigenetic pathways may be involved in natural variation that led to a dwarf phenotype in elite maize inbred lines. Among them, a favorable allele for dwarfing on chromosome 5 (SNP PZE-105115518) was also identified in six Shen5003 derivatives. CONCLUSIONS: The fact that a large number of previously identified dwarf genes are missing from our study highlights the discovery of the consistently significant association of the gene harboring the SNP PZE-105115518 with plant height (P=8.91e-10) and its confirmation in the Shen5003 introgression lines. Results from this study suggest that, in the maize breeding schema in China, specific alleles were selected that have played important roles in maize production.
Project description:Aspergillus flavus is a pathogenic fungus infecting maize and producing aflatoxins that are health hazards to humans and animals. Characterizing host defense mechanisms and prioritizing candidate resistance genes are important to the development of resistant maize germplasm. We investigated methods amenable to analyzing the significance of, and relations among, maize candidate genes based on empirical gene expression data obtained by RT-qPCR from maize inbred lines. We optimized a pipeline of analysis tools chosen from various programs to provide rigorous statistical analysis and state-of-the-art data visualization. A network-based method was also explored to construct the empirical gene expression relational structures. Maize genes at the centers of the network were considered important candidate genes for maize DNA marker studies. The methods in this research can be used to analyze large RT-qPCR datasets and establish complex empirical gene relational structures across multiple experimental conditions.
Project description:With the abundant mammalian lncRNAs identified recently, a comprehensive annotation resource for these novel lncRNAs is an urgent need. Since its first release in November 2016, AnnoLnc has been the only online server for comprehensively annotating novel human lncRNAs on-the-fly. Here, with significant updates to multiple annotation modules, backend datasets and the code base, AnnoLnc2 continues the effort to provide the scientific community with a one-stop online portal for systematically annotating novel human and mouse lncRNAs with a comprehensive functional spectrum covering sequences, structure, expression, regulation, genetic association and evolution. In response to numerous requests from multiple users, a standalone package is also provided for large-scale offline analysis. We believe that updated AnnoLnc2 (http://annolnc.gao-lab.org/) will help both computational and bench biologists identify lncRNA functions and investigate underlying mechanisms.
Project description:Although recent data suggest that some long non-coding RNAs (lncRNAs) exert widespread effects on gene expression and organelle formation, lncRNAs as a group constitute a sizable but poorly characterized fraction of the human transcriptome. We investigated whether some human lncRNA sequences were fortuitously represented on commonly used microarrays, then used this annotation to assess lncRNA expression in human brain. A computational and annotation pipeline was developed to identify lncRNA transcripts represented on Affymetrix U133 arrays. A previously published dataset derived from human nucleus accumbens was then examined for potential lncRNA expression. Twenty-three lncRNAs were determined to be represented on U133 arrays. Of these, dataset analysis revealed that five lncRNAs were consistently detected in samples of human nucleus accumbens. Strikingly, the abundance of these lncRNAs was up-regulated in human heroin abusers compared to matched drug-free control subjects, a finding confirmed by quantitative PCR. This study presents a paradigm for examining existing Affymetrix datasets for the detection and potential regulation of lncRNA expression, including changes associated with human disease. The finding that all detected lncRNAs were up-regulated in heroin abusers is consonant with the proposed role of lncRNAs as mediators of widespread changes in gene expression as occur in drug abuse.