Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon.
ABSTRACT: BACKGROUND:Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. RESULT:We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. CONCLUSION:The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns.
Project description:BACKGROUND: Populus is one of favorable model plants because of its small genome. Structural genomics of Populus has reached a breakpoint as nucleotides of the entire genome have been determined. Reaching the post genome era, functional genomics of Populus is getting more important for well-comprehended plant science. Development of bioresorce serving functional genomics is making rapid progress. Huge efforts have achieved deposits of expressed sequence tags (ESTs) in various plant species consequently accelerating functional analysis of genes. ESTs from full-length cDNA clones are especially powerful for accurate molecular annotation. We promoted collection and annotation of the ESTs from Populus full-length enriched cDNA clones as part of functional genomics of tree species. RESULTS: We have been collecting the full-length enriched cDNA of the female poplar (Populus nigra var. italica) for years. By sequencing P. nigra full-length (PnFL) cDNA libraries, we generated about 116,000 5'-end or 3'-end ESTs corresponding to 19,841 nonredundant PnFL clones. Population of PnFL cDNA clones represents 44% of the predicted genes in the Populus genome. CONCLUSION: Our resource of P. nigra full-length enriched clones is expected to provide valuable tools to gain further insight into genome annotation and functional genomics in Populus.
Project description:BACKGROUND:Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. RESULTS:We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. CONCLUSION:The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. This set of sequences constitutes also the basis for an oligo-based microarray for melon that is being used in experiments to further analyse the melon transcriptome.
Project description:BACKGROUND: Jatropha curcas L. is promoted as an important non-edible biodiesel crop worldwide. Jatropha oil, which is a triacylglycerol, can be directly blended with petro-diesel or transesterified with methanol and used as biodiesel. Genetic improvement in jatropha is needed to increase the seed yield, oil content, drought and pest resistance, and to modify oil composition so that it becomes a technically and economically preferred source for biodiesel production. However, genetic improvement efforts in jatropha could not take advantage of genetic engineering methods due to lack of cloned genes from this species. To overcome this hurdle, the current gene discovery project was initiated with an objective of isolating as many functional genes as possible from J. curcas by large scale sequencing of expressed sequence tags (ESTs). RESULTS: A normalized and full-length enriched cDNA library was constructed from developing seeds of J. curcas. The cDNA library contained about 1 × 10(6) clones and average insert size of the clones was 2.1 kb. Totally 12,084 ESTs were sequenced to average high quality read length of 576 bp. Contig analysis revealed 2258 contigs and 4751 singletons. Contig size ranged from 2-23 and there were 7333 ESTs in the contigs. This resulted in 7009 unigenes which were annotated by BLASTX. It showed 3982 unigenes with significant similarity to known genes and 2836 unigenes with significant similarity to genes of unknown, hypothetical and putative proteins. The remaining 191 unigenes which did not show similarity with any genes in the public database may encode for unique genes. Functional classification revealed unigenes related to broad range of cellular, molecular and biological functions. Among the 7009 unigenes, 6233 unigenes were identified to be potential full-length genes. CONCLUSIONS: The high quality normalized cDNA library was constructed from developing seeds of J. curcas for the first time and 7009 unigenes coding for diverse biological functions including oil biosynthesis were identified. These genes will serve as invaluable genetic resource for crop improvement in jatropha to make it an ideal and profitable crop for biodiesel production.
Project description:<h4>Background</h4>Previous transcriptome profiling studies have investigated the molecular mechanisms of pollen and anther development, and identified many genes involved in these processes. However, only 51 anther ESTs of Upland cotton (Gossypium hirsutum) were found in NCBI and there have been no reports of transcriptome profiling analyzing anther development in Upland cotton, a major fiber crop in the word.<h4>Methodology/principal finding</h4>Ninety-eight hundred and ninety-six high quality ESTs were sequenced from their 3'-ends and assembled into 6,643 unigenes from a normalized, full-length anther cDNA library of Upland cotton. Combined with previous sequenced anther-related ESTs, 12,244 unigenes were generated as the reference genes for digital gene expression (DGE) analysis. The DGE was conducted on anthers that were isolated at tetrad pollen (TTP), uninucleate pollen (UNP), binucleate pollen (BNP) and mature pollen (MTP) periods along with four other tissues, i.e., roots (RO), stems (ST), leaves (LV) and embryos (EB). Through transcriptome profiling analysis, we identified 1,165 genes that were enriched at certain anther development periods, and many of them were involved in starch and sucrose metabolism, pentose and glucuronate interconversion, flavonoid biosynthesis, and ascorbate and aldarate metabolism.<h4>Conclusions/significance</h4>We first generated a normalized, full-length cDNA library from anthers and performed transcriptome profiling analysis of anther development in Upland cotton. From these results, 10,178 anther expressed genes were identified, among which 1,165 genes were stage-enriched in anthers. And many of these stage-enriched genes were involved in some important processes regulating anther development.
Project description:BACKGROUND: Cucurbita pepo belongs to the Cucurbitaceae family. The "Zucchini" types rank among the highest-valued vegetables worldwide, and other C. pepo and related Cucurbita spp., are food staples and rich sources of fat and vitamins. A broad range of genomic tools are today available for other cucurbits that have become models for the study of different metabolic processes. However, these tools are still lacking in the Cucurbita genus, thus limiting gene discovery and the process of breeding. RESULTS: We report the generation of a total of 512,751 C. pepo EST sequences, using 454 GS FLX Titanium technology. ESTs were obtained from normalized cDNA libraries (root, leaves, and flower tissue) prepared using two varieties with contrasting phenotypes for plant, flowering and fruit traits, representing the two C. pepo subspecies: subsp. pepo cv. Zucchini and subsp. ovifera cv Scallop. De novo assembling was performed to generate a collection of 49,610 Cucurbita unigenes (average length of 626 bp) that represent the first transcriptome of the species. Over 60% of the unigenes were functionally annotated and assigned to one or more Gene Ontology terms. The distributions of Cucurbita unigenes followed similar tendencies than that reported for Arabidopsis or melon, suggesting that the dataset may represent the whole Cucurbita transcriptome. About 34% unigenes were detected to have known orthologs of Arabidopsis or melon, including genes potentially involved in disease resistance, flowering and fruit quality. Furthermore, a set of 1,882 unigenes with SSR motifs and 9,043 high confidence SNPs between Zucchini and Scallop were identified, of which 3,538 SNPs met criteria for use with high throughput genotyping platforms, and 144 could be detected as CAPS. A set of markers were validated, being 80% of them polymorphic in a set of variable C. pepo and C. moschata accessions. CONCLUSION: We present the first broad survey of gene sequences and allelic variation in C. pepo, where limited prior genomic information existed. The transcriptome provides an invaluable new tool for biological research. The developed molecular markers are the basis for future genetic linkage and quantitative trait loci analysis, and will be essential to speed up the process of breeding new and better adapted squash varieties.
Project description:BACKGROUND: Salinization causes negative effects on plant productivity and poses an increasingly serious threat to the sustainability of agriculture. Wild soybean (Glycine soja) can survive in highly saline conditions, therefore provides an ideal candidate plant system for salt tolerance gene mining. RESULTS: As a first step towards the characterization of genes that contribute to combating salinity stress, we constructed a full-length cDNA library of Glycine soja (50109) leaf treated with 150 mM NaCl, using the SMART technology. Random expressed sequence tag (EST) sequencing of 2,219 clones produced 2,003 cleaned ESTs for gene expression analysis. The average read length of cleaned ESTs was 454 bp, with an average GC content of 40%. These ESTs were assembled using the PHRAP program to generate 375 contigs and 696 singlets. The resulting unigenes were categorized according to the Gene Ontology (GO) hierarchy. The potential roles of gene products associated with stress related ESTs were discussed. We compared the EST sequences of Glycine soja to that of Glycine max by using the blastn algorithm. Most expressed sequences from wild soybean exhibited similarity with soybean. All our EST data are available on the Internet (GenBank_Accn: DT082443-DT084445). CONCLUSION: The Glycine soja ESTs will be used to mine salt tolerance gene, whose full-length cDNAs will be obtained easily from the full-length cDNA library. Comparison of Glycine soja ESTs with those of Glycine max revealed the potential to investigate the wild soybean's expression profile using the soybean's gene chip. This will provide opportunities to understand the genetic mechanisms underlying stress response of plants.
Project description:BACKGROUND: Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. RESULTS: We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. CONCLUSION: The new EST collection denotes an important step towards the identification of all genes in the citrus genome. Furthermore, public availability of the cDNA clones generated in this study, and not only their sequence, enables testing of the biological function of the genes represented in the collection. Expression of the citrus SEP3 homologue, CitrSEP, in Arabidopsis results in early flowering, along with other phenotypes resembling the over-expression of the Arabidopsis SEPALLATA genes. Our findings suggest that the members of the SEP gene family play similar roles in these quite distant plant species.
Project description:BACKGROUND: There are few genomic tools available in melon (Cucumis melo L.), a member of the Cucurbitaceae, despite its importance as a crop. Among these tools, genetic maps have been constructed mainly using marker types such as simple sequence repeats (SSR), restriction fragment length polymorphisms (RFLP) and amplified fragment length polymorphisms (AFLP) in different mapping populations. There is a growing need for saturating the genetic map with single nucleotide polymorphisms (SNP), more amenable for high throughput analysis, especially if these markers are located in gene coding regions, to provide functional markers. Expressed sequence tags (ESTs) from melon are available in public databases, and resequencing ESTs or validating SNPs detected in silico are excellent ways to discover SNPs. RESULTS: EST-based SNPs were discovered after resequencing ESTs between the parental lines of the PI 161375 (SC) x 'Piel de sapo' (PS) genetic map or using in silico SNP information from EST databases. In total 200 EST-based SNPs were mapped in the melon genetic map using a bin-mapping strategy, increasing the map density to 2.35 cM/marker. A subset of 45 SNPs was used to study variation in a panel of 48 melon accessions covering a wide range of the genetic diversity of the species. SNP analysis correctly reflected the genetic relationships compared with other marker systems, being able to distinguish all the accessions and cultivars. CONCLUSION: This is the first example of a genetic map in a cucurbit species that includes a major set of SNP markers discovered using ESTs. The PI 161375 x 'Piel de sapo' melon genetic map has around 700 markers, of which more than 500 are gene-based markers (SNP, RFLP and SSR). This genetic map will be a central tool for the construction of the melon physical map, the step prior to sequencing the complete genome. Using the set of SNP markers, it was possible to define the genetic relationships within a collection of forty-eight melon accessions as efficiently as with SSR markers, and these markers may also be useful for cultivar identification in Occidental melon varieties.
Project description:BACKGROUND: Common bean is an important legume crop with only a moderate number of short expressed sequence tags (ESTs) made with traditional methods. The goal of this research was to use full-length cDNA technology to develop ESTs that would overlap with the beginning of open reading frames and therefore be useful for gene annotation of genomic sequences. The library was also constructed to represent genes expressed under drought, low soil phosphorus and high soil aluminum toxicity. We also undertook comparisons of the full-length cDNA library to two previous non-full clone EST sets for common bean. RESULTS: Two full-length cDNA libraries were constructed: one for the drought tolerant Mesoamerican genotype BAT477 and the other one for the acid-soil tolerant Andean genotype G19833 which has been selected for genome sequencing. Plants were grown in three soil types using deep rooting cylinders subjected to drought and non-drought stress and tissues were collected from both roots and above ground parts. A total of 20,000 clones were selected robotically, half from each library. Then, nearly 10,000 clones from the G19833 library were sequenced with an average read length of 850 nucleotides. A total of 4,219 unigenes were identified consisting of 2,981 contigs and 1,238 singletons. These were functionally annotated with gene ontology terms and placed into KEGG pathways. Compared to other EST sequencing efforts in common bean, about half of the sequences were novel or represented the 5' ends of known genes. CONCLUSIONS: The present full-length cDNA libraries add to the technological toolbox available for common bean and our sequencing of these clones substantially increases the number of unique EST sequences available for the common bean genome. All of this should be useful for both functional gene annotation, analysis of splice site variants and intron/exon boundary determination by comparison to soybean genes or with common bean whole-genome sequences. In addition the library has a large number of transcription factors and will be interesting for discovery and validation of drought or abiotic stress related genes in common bean.
Project description:BACKGROUND: The obligate parasitic plant witchweed (Striga hermonthica) infects major cereal crops such as sorghum, maize, and millet, and is the most devastating weed pest in Africa. An understanding of the nature of its parasitism would contribute to the development of more sophisticated management methods. However, the molecular and genomic resources currently available for the study of S. hermonthica are limited. RESULTS: We constructed a full-length enriched cDNA library of S. hermonthica, sequenced 37,710 clones from the library, and obtained 67,814 expressed sequence tag (EST) sequences. The ESTs were assembled into 17,317 unigenes that included 10,319 contigs and 6,818 singletons. The S. hermonthica unigene dataset was subjected to a comparative analysis with other plant genomes or ESTs. Approximately 80% of the unigenes have homologs in other dicotyledonous plants including Arabidopsis, poplar, and grape. We found that 589 unigenes are conserved in the hemiparasitic Triphysaria species but not in other plant species. These are good candidates for genes specifically involved in plant parasitism. Furthermore, we found 1,445 putative simple sequence repeats (SSRs) in the S. hermonthica unigene dataset. We tested 64 pairs of PCR primers flanking the SSRs to develop genetic markers for the detection of polymorphisms. Most primer sets amplified polymorphicbands from individual plants collected at a single location, indicating high genetic diversity in S. hermonthica. We selected 10 primer pairs to analyze S. hermonthica harvested in the field from different host species and geographic locations. A clustering analysis suggests that genetic distances are not correlated with host specificity. CONCLUSIONS: Our data provide the first extensive set of molecular resources for studying S. hermonthica, and include EST sequences, a comparative analysis with other plant genomes, and useful genetic markers. All the data are stored in a web-based database and freely available. These resources will be useful for genome annotation, gene discovery, functional analysis, molecular breeding, epidemiological studies, and studies of plant evolution.