MELOGEN: an EST database for melon functional genomics.
ABSTRACT: BACKGROUND:Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. RESULTS:We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. CONCLUSION:The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. 
This set of sequences also constitutes the basis for an oligo-based microarray for melon that is being used in experiments to further analyse the melon transcriptome.
Project description:BACKGROUND:Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. RESULT:We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. 
Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. CONCLUSION:The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns.
Project description:BACKGROUND: Cucurbita pepo belongs to the Cucurbitaceae family. The "Zucchini" types rank among the highest-valued vegetables worldwide, and other C. pepo and related Cucurbita spp. are food staples and rich sources of fat and vitamins. A broad range of genomic tools is available today for other cucurbits that have become models for the study of different metabolic processes. However, these tools are still lacking in the Cucurbita genus, thus limiting gene discovery and the process of breeding. RESULTS: We report the generation of a total of 512,751 C. pepo EST sequences, using 454 GS FLX Titanium technology. ESTs were obtained from normalized cDNA libraries (root, leaves, and flower tissue) prepared using two varieties with contrasting phenotypes for plant, flowering and fruit traits, representing the two C. pepo subspecies: subsp. pepo cv. Zucchini and subsp. ovifera cv. Scallop. De novo assembly was performed to generate a collection of 49,610 Cucurbita unigenes (average length of 626 bp) that represent the first transcriptome of the species. Over 60% of the unigenes were functionally annotated and assigned to one or more Gene Ontology terms. The distributions of Cucurbita unigenes followed tendencies similar to those reported for Arabidopsis or melon, suggesting that the dataset may represent the whole Cucurbita transcriptome. About 34% of the unigenes had known orthologs in Arabidopsis or melon, including genes potentially involved in disease resistance, flowering and fruit quality. Furthermore, a set of 1,882 unigenes with SSR motifs and 9,043 high confidence SNPs between Zucchini and Scallop were identified, of which 3,538 SNPs met criteria for use with high throughput genotyping platforms, and 144 could be detected as CAPS. A set of markers was validated, 80% of which were polymorphic in a set of variable C. pepo and C. moschata accessions.
CONCLUSION: We present the first broad survey of gene sequences and allelic variation in C. pepo, where limited prior genomic information existed. The transcriptome provides an invaluable new tool for biological research. The developed molecular markers are the basis for future genetic linkage and quantitative trait loci analysis, and will be essential to speed up the process of breeding new and better adapted squash varieties.
Project description:BACKGROUND: Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agricultural crop worldwide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues. RESULTS: We performed half a Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall, 54.9% of the unigenes showed significant similarities to known sequences in the GenBank non-redundant (nr) protein database, and around two-thirds of them matched proteins of cucumber, the most closely related species with a sequenced genome. The unigenes were further assigned gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore, we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality, including pigmentation and sweetness.
Integrative analysis of metabolite and digital gene expression profiles helped elucidate the molecular mechanisms governing these important quality-related traits during watermelon fruit development. CONCLUSION: We have generated a large collection of watermelon ESTs, which represents a significant expansion of the current transcript catalog of watermelon and a valuable resource for future studies on the genomics of watermelon and other closely related species. Digital expression analysis of this EST collection allowed us to identify a large set of genes that were differentially expressed during watermelon fruit development and ripening, which provide a rich source of candidates for future functional analysis and represent a valuable increase in our knowledge base of watermelon fruit biology.
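As a rough cross-check of the watermelon figures above, the short Python sketch below derives two quantities that the abstract does not state directly: the total volume of 454 read data and the mean unigene length. The variable names are illustrative only, and the 31.8 Mb total is itself given as approximate, so the results should be read as estimates.

```python
# Figures quoted in the watermelon abstract.
n_454_ests = 577_023          # high-quality 454 ESTs
mean_est_len_bp = 302.8       # average EST length
n_unigenes = 75_068           # unigenes after de novo assembly
unigene_total_bp = 31.8e6     # "approximately 31.8 Mb" (approximate in the source)

# Derived estimates (not stated in the abstract).
total_read_mb = n_454_ests * mean_est_len_bp / 1e6
mean_unigene_len_bp = unigene_total_bp / n_unigenes

print(f"454 read data: {total_read_mb:.1f} Mb")
print(f"Mean unigene length: {mean_unigene_len_bp:.1f} bp")
```

The mean unigene length computed this way includes the 11,786 GenBank ESTs that were co-assembled, so it is not a property of the 454 data alone.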
Project description:BACKGROUND: Common bean (Phaseolus vulgaris) is the most important food legume in the world. Although this crop is very important to both the developed and developing world as a means of dietary protein supply, resources available in common bean are limited. Global transcriptome analysis is important to better understand gene expression, genetic variation, and gene structure annotation in addition to other important features. However, the number and description of common bean sequences are very limited, which greatly inhibits genome and transcriptome research. Here we used 454 pyrosequencing to obtain a substantial transcriptome dataset for common bean. RESULTS: We obtained 1,692,972 reads with an average read length of 207 nucleotides (nt). These reads were assembled into 59,295 unigenes including 39,572 contigs and 19,723 singletons, in addition to 35,328 singletons less than 100 bp. Comparing the unigenes to common bean ESTs deposited in GenBank, we found that 53.40% or 31,664 of these unigenes had no matches to this dataset and can be considered as new common bean transcripts. Functional annotation of the unigenes carried out by Gene Ontology assignments from hits to Arabidopsis and soybean indicated coverage of a broad range of GO categories. The common bean unigenes were also compared to the bean bacterial artificial chromosome (BAC) end sequences, and a total of 21% of the unigenes (12,724) including 9,199 contigs and 3,256 singletons match to the 8,823 BAC-end sequences. In addition, a large number of simple sequence repeats (SSRs) and transcription factors were also identified in this study. CONCLUSIONS: This work provides the first large scale identification of the common bean transcriptome derived by 454 pyrosequencing. This research has resulted in a 150% increase in the number of Phaseolus vulgaris ESTs. 
The dataset obtained through this analysis will provide a platform for functional genomics in common bean and related legumes and will aid in the development of molecular markers that can be used for tagging genes of interest. Additionally, these sequences will provide a means for better annotation of the on-going common bean whole genome sequencing.
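The counts reported for common bean can be checked with a few lines of arithmetic. The Python sketch below recomputes the unigene total and the quoted percentages from the figures in the abstract. The variable names are illustrative. The final step only notes that the two BAC-end subcounts do not add up to the stated total; the abstract does not explain the difference, and no explanation is assumed here.

```python
# Figures quoted in the common bean abstract.
contigs = 39_572
singletons = 19_723
unigenes = 59_295
novel = 31_664                 # unigenes with no match to GenBank bean ESTs
bac_matched = 12_724           # unigenes matching BAC-end sequences
bac_contigs, bac_singletons = 9_199, 3_256

# Contigs plus singletons should equal the unigene total.
assert contigs + singletons == unigenes

novel_pct = 100 * novel / unigenes      # abstract: 53.40%
bac_pct = 100 * bac_matched / unigenes  # abstract: 21%

# Difference between the stated BAC-matched total and the sum of its parts.
bac_gap = bac_matched - (bac_contigs + bac_singletons)

print(f"Novel unigenes: {novel_pct:.2f}%")
print(f"BAC-end matched: {bac_pct:.0f}%")
print(f"Unaccounted BAC-matched unigenes: {bac_gap}")
```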
Project description:We report the construction of an oligo-based microarray using a total of 17,510 unigenes derived from 30,675 high-quality melon ESTs. This chip is particularly enriched with genes that are expressed in fruit and during interaction with pathogens. Hybridizations for three independent experiments allowed the characterization of global gene expression profiles during fruit ripening, as well as in response to viral and fungal infections in plant cotyledons and roots, respectively. Microarray construction, statistical analyses and validation together with functional-enrichment analysis are presented in this study.
Project description:A normalized full-length cDNA library was constructed from the coralloid roots of Cycas debaoensis by the DSN (duplex-specific nuclease) normalization method combined with the SMART (Switching Mechanism At 5' end of the RNA Transcript) technique. The titer of the original cDNA library was about 1.5 × 10⁶ cfu·mL⁻¹ and the average insertion size was about 1 kb with a high recombination rate (97%). A total of 5011 high-quality expressed sequence tags (ESTs) were obtained from 5393 randomly picked cDNA clones. Clustering and assembly of ESTs resulted in 2984 unique sequences, consisting of 618 contigs and 2366 singlets. EST sequence annotation revealed that 2333 and 1901 unigenes were functionally annotated in the NCBI non-redundant database and Swiss-Prot protein database, respectively. Functional analysis demonstrated that 1495 (50.1%) unigenes were associated with 4082 Gene Ontology (GO) terms. A total of 847 unigenes were grouped into 22 Clusters of Orthologous Groups (COG) functional categories. Based on the EST dataset, 22 ESTs that encoded putative receptor-like protein kinase (RLK) genes were screened. Furthermore, a total of 94 simple sequence repeats (SSRs) were discovered, of which 20 loci were successfully amplified in C. debaoensis. This study is the first EST analysis for the coralloid roots of C. debaoensis and provides a valuable genomic resource for novel gene discovery, gene expression and comparative genomics, conservation and management studies as well as applications in C. debaoensis and related cycad species.
Project description:An investigation into proteins involved in chemosensory perception in the melon fly, Bactrocera cucurbitae (Diptera: Tephritidae) is described here using a newly generated transcriptome dataset. The melon fly is a major agricultural pest, widely distributed in the Asia-Pacific region and some parts of Africa. For this study, a transcriptome dataset was generated using RNA extracted from 4-day-old adult specimens of the melon fly. The dataset was assembled and annotated via Gene Ontology (GO) analysis. Based on this and similarity searches to data from other species, a number of protein sequences putatively involved in chemosensory reception were identified and characterized in the melon fly. This included the highly conserved "Orco" along with a number of other less conserved odorant binding protein sequences. In addition, several sequences representing putative ionotropic and gustatory receptors were also identified. This study provides a foundation for future functional studies of chemosensory proteins in the melon fly and for making more detailed comparisons to other species. In the long term, this will ultimately help in the development of improved tools for pest management.
Project description:BACKGROUND: The phenomenon of desiccation tolerance, also called anhydrobiosis, involves the ability of an organism to survive the loss of almost all cellular water without sustaining irreversible damage. Although there are several physiological, morphological and ecological studies on tardigrades, only limited DNA sequence information is available. Therefore, we explored the transcriptome in the active and anhydrobiotic state of the tardigrade Milnesium tardigradum which has extraordinary tolerance to desiccation and freezing. In this study, we present the first overview of the transcriptome of M. tardigradum and its response to desiccation and discuss potential parallels to stress responses in other organisms. RESULTS: We sequenced a total of 9984 expressed sequence tags (ESTs) from two cDNA libraries from the eutardigrade M. tardigradum in its active and inactive, anhydrobiotic (tun) stage. Assembly of these ESTs resulted in 3283 putative unique transcripts, whereof approximately 50% showed significant sequence similarity to known genes. The resulting unigenes were functionally annotated using the Gene Ontology (GO) vocabulary. A GO term enrichment analysis revealed several GOs that were significantly underrepresented in the inactive stage. Furthermore we compared the putative unigenes of M. tardigradum with ESTs from two other eutardigrade species that are available from public sequence databases, namely Richtersius coronifer and Hypsibius dujardini. The processed sequences of the three tardigrade species revealed similar functional content and the M. tardigradum dataset contained additional sequences from tardigrades not present in the other two. CONCLUSIONS: This study describes novel sequence data from the tardigrade M. tardigradum, which significantly contributes to the available tardigrade sequence data and will help to establish this extraordinary tardigrade as a model for studying anhydrobiosis. 
Functional comparison of active and anhydrobiotic tardigrades revealed a differential distribution of Gene Ontology terms associated with chromatin structure and the translation machinery, which are underrepresented in the inactive animals. These findings imply a widespread metabolic response of the animals to dehydration. The collective tardigrade transcriptome data will serve as a reference for further studies and support the identification and characterization of genes involved in the anhydrobiotic response.
Project description:BACKGROUND: Although melon (Cucumis melo L.) is an economically important fruit crop, no genome-wide sequence information is openly available at the current time. We therefore sequenced BAC-ends representing a total of 33,024 clones, half of them from a previously described melon BAC library generated with restriction endonucleases and the remainder from a new random-shear BAC library. RESULTS: We generated a total of 47,140 high-quality BAC-end sequences (BES), 91.7% of which were paired-BES. Both libraries were assembled independently and then cross-assembled to obtain a final set of 33,372 non-redundant, high-quality sequences. These were grouped into 6,411 contigs (4.5 Mb) and 26,961 non-assembled BES (14.4 Mb), representing ~4.2% of the melon genome. The sequences were used to screen genomic databases, identifying 7,198 simple sequence repeats (corresponding to one microsatellite every 2.6 kb) and 2,484 additional repeats of which 95.9% represented transposable elements. The sequences were also used to screen expressed sequence tag (EST) databases, revealing 11,372 BES that were homologous to ESTs. This suggests that ~30% of the melon genome consists of coding DNA. We observed regions of microsynteny between melon paired-BES and six other dicotyledonous plant genomes. CONCLUSION: The analysis of nearly 50,000 BES from two complementary genomic libraries covered ~4.2% of the melon genome, providing insight into properties such as microsatellite and transposable element distribution, and the percentage of coding DNA. The observed synteny between melon paired-BES and six other plant genomes showed that useful comparative genomic data can be derived through large scale BAC-end sequencing by anchoring a small proportion of the melon genome to other sequenced genomes.
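The microsatellite density and genome coverage quoted for the melon BAC-end sequences follow from simple arithmetic on the reported totals. The Python sketch below reproduces that calculation. The variable names are illustrative, and the genome size it prints is only the value implied by the ~4.2% coverage figure, not a measurement reported in the abstract.

```python
# Figures quoted in the melon BAC-end abstract.
contig_bp = 4_500_000        # 6,411 contigs (4.5 Mb)
singleton_bp = 14_400_000    # 26,961 non-assembled BES (14.4 Mb)
n_ssr = 7_198                # simple sequence repeats identified
genome_fraction = 0.042      # "~4.2% of the melon genome"

total_bp = contig_bp + singleton_bp

# Average spacing between microsatellites (abstract: one every 2.6 kb).
ssr_spacing_kb = total_bp / n_ssr / 1000

# Genome size implied by the coverage figure (an inference, not a stated value).
implied_genome_mb = total_bp / genome_fraction / 1e6

print(f"Total non-redundant sequence: {total_bp / 1e6:.1f} Mb")
print(f"One SSR every {ssr_spacing_kb:.1f} kb")
print(f"Implied genome size: {implied_genome_mb:.0f} Mb")
```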
Project description:BACKGROUND: Orchids are one of the most diversified angiosperm families, but few genomic resources are available for these non-model plants. In addition to its ecological significance, Phalaenopsis is an economically important floriculture crop worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome. RESULTS: To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, 1e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified. CONCLUSION: Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between the study of model organisms with many genomic resources and species that are important for ecological and evolutionary studies.