ABSTRACT: The crop expressed sequence tag database, CR-EST (http://pgrc.ipk-gatersleben.de/cr-est/), is a publicly available online resource providing access to sequence, classification, clustering and annotation data of crop EST projects. CR-EST currently holds more than 200,000 sequences derived from 41 cDNA libraries of four species: barley, wheat, pea and potato. The barley section comprises approximately one-third of all publicly available ESTs. CR-EST deploys an automatic EST preparation pipeline that includes the identification of chimeric clones in order to transparently display the data quality. Sequences are clustered in species-specific projects to currently generate a non-redundant set of approximately 22,600 consensus sequences and approximately 17,200 singletons, which form the basis of the provided set of unigenes. A web application allows the user to compute BLAST alignments of query sequences against the CR-EST database, query data from Gene Ontology and metabolic pathway annotations and query sequence similarities from stored BLAST results. CR-EST also features interactive JAVA-based tools, allowing the visualization of open reading frames and the explorative analysis of Gene Ontology mappings applied to ESTs.
Project description:BACKGROUND: Single-pass, partial sequencing of complementary DNA (cDNA) libraries generates thousands of chromatograms that are processed into high-quality expressed sequence tags (ESTs), and then assembled into contigs representative of putative genes. Usually, to be of value, ESTs and contigs must be associated with meaningful annotations, and made available to end-users. RESULTS: A web application, Expressed Sequence Tag Information Management and Annotation (ESTIMA), has been created to meet the EST annotation and data management requirements of multiple high-throughput EST sequencing projects. It is anchored on individual ESTs and organized around different properties of ESTs including chromatograms, base-calling quality scores, structure of assembled transcripts, and multiple sources of comparison to infer functional annotation, Gene Ontology associations, and cDNA library information. ESTIMA consists of a relational database schema and a set of interactive query interfaces. These are integrated with a suite of web-based tools that allow a user to query and retrieve information. Further, query results are interconnected among the various EST properties. ESTIMA has several unique features. Users may run their own EST processing pipeline, search against preferred reference genomes, and use any clustering and assembly algorithm. The ESTIMA database schema is very flexible and accepts output from any EST processing and assembly pipeline. ESTIMA has been used for the management of EST projects of many species, including honeybee (Apis mellifera), cattle (Bos taurus), songbird (Taeniopygia guttata), corn rootworm (Diabrotica virgifera), catfish (Ictalurus punctatus, Ictalurus furcatus), and apple (Malus x domestica). The entire resource may be downloaded and used as is, or readily adapted to fit the unique needs of other cDNA sequencing projects.
CONCLUSIONS: The scripts used to create the ESTIMA interface are freely available to academic users in an archived format from http://titan.biotec.uiuc.edu/ESTIMA/. The entity-relationship (E-R) diagrams and the programs used to generate the Oracle database tables are also available. We have also provided detailed installation instructions and a tutorial at the same website. Presently the chromatograms, EST databases and their annotations have been made available for cattle and honeybee brain EST projects. Non-academic users need to contact the W.M. Keck Center for Functional and Comparative Genomics, University of Illinois at Urbana-Champaign, Urbana, IL, for licensing information.
Project description:Fusarium culmorum is one of the most common and globally important causal agents of root and crown rot diseases of cereals. These diseases cause grain yield loss and reduced grain quality in barley. In this study, we analyzed an expressed sequence tag (EST) database derived from F. culmorum-infected barley root tissues, available at the National Center for Biotechnology Information (NCBI). The 2294 sequences were assembled into 1619 non-redundant sequences consisting of 359 contigs and 1260 singletons using the program CAP3. BLASTX analysis of these sequences was conducted in order to find similar sequences in all databases. Gene Ontology search, enzyme search, KEGG mapping and InterProScan search were done using the Blast2GO 3.0.7 tool. By BLASTX analysis, 41.7%, 7.7%, 3.2% and 47.4% of ESTs were categorized as annotated, unannotated, not mapping and without blast hits, respectively. BLASTX analysis revealed that the majority of top hits were barley proteins (43.5%). Based on Gene Ontology classification, 38.3%, 31.3%, and 16% of ESTs were assigned to molecular function, biological process, and cellular component GO terms, respectively. The most abundant GO terms were as follows: 157 sequences were related to response to stress (biological process), 207 sequences were related to ion binding (molecular function), and 160 sequences were related to plastid (cellular component). Furthermore, based on KEGG mapping, 369 sequences could be assigned to 264 enzymes and 83 different KEGG pathways. According to the Enzyme Commission (EC) distribution, 94 sequences were transferases (EC2) while 70 sequences were hydrolases (EC3).
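The assembly and annotation figures in this abstract can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the abstract; the variable names are illustrative, and the derived "ESTs per contig" value is an assumption-free calculation from those numbers but is not a figure reported by the authors.

```python
# Counts taken from the abstract above.
total_ests = 2294
contigs = 359
singletons = 1260

# Each singleton is a single unassembled EST, so the remaining ESTs
# must have been merged into contigs.
non_redundant = contigs + singletons
ests_in_contigs = total_ests - singletons
mean_ests_per_contig = ests_in_contigs / contigs  # derived, not reported

# BLASTX categories (percent of ESTs) as listed in the abstract.
blastx_percent = {
    "annotated": 41.7,
    "unannotated": 7.7,
    "not mapping": 3.2,
    "without blast hits": 47.4,
}
# Round to one decimal place to avoid floating-point noise in the sum.
blastx_total = round(sum(blastx_percent.values()), 1)

print("non-redundant sequences:", non_redundant)
print("ESTs merged into contigs:", ests_in_contigs)
print("mean ESTs per contig:", round(mean_ests_per_contig, 2))
print("BLASTX categories sum to:", blastx_total)
```

The contig and singleton counts add up to the reported 1619 non-redundant sequences, and the four BLASTX categories account for all ESTs.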
Project description:Expressed sequence tags (ESTs) are randomly sequenced cDNA clones. Currently, nearly 3 million human and 2 million mouse ESTs provide valuable resources that enable researchers to investigate the products of gene expression. The EST databases have proven to be useful tools for detecting homologous genes, for exon mapping, revealing differential splicing, etc. With the increasing availability of large amounts of poorly characterised eukaryotic (notably human) genomic sequence, ESTs have now become a vital tool for gene identification, sometimes yielding the only unambiguous evidence for the existence of a gene expression product. However, BLAST-based Web servers available to the general user have not kept pace with these developments and do not provide appropriate tools for querying EST databases with large highly spliced genes, often spanning 50 000-100 000 bases or more. Here we describe Gene2EST (http://woody.embl-heidelberg.de/gene2est/), a server that brings together a set of tools enabling efficient retrieval of ESTs matching large DNA queries and their subsequent analysis. RepeatMasker is used to mask dispersed repetitive sequences (such as Alu elements) in the query, BLAST2 for searching EST databases and Artemis for graphical display of the findings. Gene2EST combines these components into a Web resource targeted at the researcher who wishes to study one or a few genes to a high level of detail.
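The repeat-masking step mentioned above can be illustrated with a minimal Python sketch. It is not Gene2EST or RepeatMasker code: it assumes the repeat coordinates are already known (in practice RepeatMasker would find them), and the function name `mask_intervals` and the use of `N` as the masking character are choices made here for illustration.

```python
from __future__ import annotations


def mask_intervals(
    sequence: str,
    intervals: list[tuple[int, int]],
    mask_char: str = "N",
) -> str:
    """Return `sequence` with each 0-based, half-open [start, end) interval
    replaced by `mask_char`, so that a later similarity search ignores it."""
    masked = list(sequence)
    for start, end in intervals:
        if not 0 <= start <= end <= len(sequence):
            raise ValueError(f"interval ({start}, {end}) is outside the sequence")
        # Overwrite the repeat region character by character.
        masked[start:end] = mask_char * (end - start)
    return "".join(masked)


# Hypothetical query with one "repeat" at positions 2..4 (0-based).
query = "ACGTACGTAC"
print(mask_intervals(query, [(2, 5)]))  # ACNNNCGTAC
```

Masking before the search keeps the sequence length and coordinates unchanged, so EST matches found afterwards can still be placed on the original query.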
Project description:GrainGenes, http://www.graingenes.org, is the international database for the wheat, barley, rye and oat genomes. For these species it is the primary repository for information about genetic maps, mapping probes and primers, genes, alleles and QTLs. Documentation includes such data as primer sequences, polymorphism descriptions, genotype and trait scoring data, experimental protocols used, and photographs of marker polymorphisms, disease symptoms and mutant phenotypes. These data, curated with the help of many members of the research community, are integrated with sequence and bibliographic records selected from external databases and results of BLAST searches of the ESTs. Records are linked to corresponding records in other important databases, e.g. Gramene's EST homologies to rice BAC/PACs, TIGR's Gene Indices and GenBank. In addition to this information within the GrainGenes database itself, the GrainGenes homepage at http://wheat.pw.usda.gov provides many other community resources including publications (the annual newsletters for wheat, barley and oat, monographs and articles), individual datasets (mapping and QTL studies, polymorphism surveys, variety performance evaluations), specialized databases (Triticeae repeat sequences, EST unigene sets) and pages to facilitate coordination of cooperative research efforts in specific areas such as SNP development, EST-SSRs and taxonomy. The goal is to serve as a central point for obtaining and contributing information about the genetics and biology of these cereal crops.
Project description:BACKGROUND: We investigate the usefulness of expressed sequence tags (ESTs) for establishing divergences within the tree of placental mammals. This is done using the example of the established relationships among primates (human), lagomorphs (rabbit), rodents (rat and mouse), artiodactyls (cow), carnivorans (dog) and proboscideans (elephant). METHODOLOGY/PRINCIPAL FINDINGS: We have produced 2000 ESTs (1.2 megabases) from a marsupial mouse and characterized the data for their use in phylogenetic analysis. The sequences were used to identify putative orthologous sequences from whole genome projects. Although most ESTs stem from single sequence reads, the frequency of potential sequencing errors was found to be lower than allelic variation. Most of the sequences represented slowly evolving housekeeping-type genes, with an average amino acid distance of 6.6% between human and mouse. Positive Darwinian selection was identified at only a few single sites. Phylogenetic analyses of the EST data yielded trees that were consistent with those established from whole genome projects. CONCLUSIONS: The general quality of EST sequences and the general absence of positive selection in these sequences make ESTs an attractive tool for phylogenetic analysis. The EST approach allows, at reasonable cost, a fast extension of data sampling from species outside the genome projects.
Project description:BACKGROUND: Genome sequencing of barley has been delayed due to its large genome size (ca. 5,000 Mbp). Among the fast sequencing systems, 454 liquid phase pyrosequencing provides the longest reads and is the most promising method for BAC clones. Here we report the results of pooled sequencing of BAC clones selected with ESTs genetically mapped to chromosome 3H. RESULTS: We sequenced pooled barley BAC clones using a 454 parallel genome sequencer. A PCR screening system based on primer sets derived from genetically mapped ESTs on chromosome 3H was used for clone selection in a BAC library developed from cultivar "Haruna Nijo". The DNA samples of 10 or 20 BAC clones were pooled and used for shotgun library development. The homology between contig sequences generated in each pooled library and mapped EST sequences was studied. The number of contigs assigned to chromosome 3H was 372. Their lengths ranged from 1,230 bp to 58,322 bp, with an average of 14,891 bp. Of these contigs, 240 showed homology and colinearity with the genome sequence of rice chromosome 1. A contig annotation browser supplemented with query search by unique sequence or genetic map position was developed. The identified contigs can be annotated with barley cDNAs and reference sequences on the browser. Homology analysis of these contigs with rice genes indicated that 1,239 rice genes can be assigned to barley contigs by the simple comparison of sequence lengths in both species. Of these genes, 492 are assigned to rice chromosome 1. CONCLUSIONS: We demonstrate the efficiency of sequencing gene-rich regions from barley chromosome 3H, with special reference to syntenic relationships with rice chromosome 1.
Project description:An increasing number of eukaryotic and prokaryotic genes are being found to have natural antisense transcripts (NATs). There is also growing evidence to suggest that antisense transcription could play a key role in many human diseases. Consequently, there have been several recent attempts to set up computational procedures aimed at identifying novel NATs. Our group has developed the AntiHunter program for the identification of expressed sequence tag (EST) antisense transcripts from BLAST output. In order to perform an analysis, the program requires a genomic sequence plus an associated list of transcript names and coordinates of the genomic region. After masking the repeated regions, the program carries out a BLASTN search of this sequence in the selected EST database, reporting via email the EST entries that reveal an antisense transcript according to the user-supplied list. Here, we present the newly developed version 2.0 of the AntiHunter tool. Several improvements have been added to this version of the program in order to increase its ability to detect a larger number of antisense ESTs. As a result, AntiHunter can now detect, on average, >45% more antisense ESTs with little or no increase in the percentage of the false positives. We also raised the maximum query size to 3 Mb (previously 1 Mb). Moreover, we found that a reasonable trade-off between the program search sensitivity and the maximum allowed size of the input-query sequence could be obtained by querying the database with the MEGABLAST program, rather than by using the BLAST one. We now offer this new opportunity to users, i.e. if choosing the MEGABLAST option, users can input a query sequence up to 30 Mb long, thus considerably improving the possibility to analyze longer query regions. The AntiHunter tool is freely available at http://bioinfo.crs4.it/AH2.0.
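The core idea of flagging antisense ESTs from alignment results can be sketched in Python as follows. This is a deliberately naive illustration, not the AntiHunter algorithm: the abstract does not spell out its criteria, so the rule used here (an EST alignment that overlaps an annotated transcript on the opposite strand) and the names `Feature`, `overlaps` and `antisense_hits` are assumptions. Real pipelines also have to account for the read direction of each EST, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A genomic interval with 1-based inclusive coordinates."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def antisense_hits(transcripts, est_hits):
    """Return (transcript, EST) name pairs where the EST alignment overlaps
    the transcript but lies on the opposite strand."""
    return [
        (t.name, h.name)
        for t in transcripts
        for h in est_hits
        if overlaps(t, h) and t.strand != h.strand
    ]


# Hypothetical user-supplied transcript list and parsed EST alignments.
transcripts = [Feature("geneA", 100, 900, "+"), Feature("geneB", 2000, 2600, "-")]
est_hits = [
    Feature("EST1", 150, 400, "+"),    # same strand as geneA: sense
    Feature("EST2", 700, 1100, "-"),   # overlaps geneA, opposite strand
    Feature("EST3", 3000, 3300, "+"),  # overlaps nothing
]
print(antisense_hits(transcripts, est_hits))  # [('geneA', 'EST2')]
```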
Project description:Functional genomics has proven to be an efficient tool for identifying genes involved in various biological functions. However, functional resources for the commercially important seaweed Eucheuma denticulatum are still limited. EuDBase is the first online seaweed repository that provides integrated access to ESTs of Eucheuma denticulatum generated from samples collected from Kudat and Semporna in Sabah, Malaysia. The database stores 10,031 ESTs that are clustered and assembled into 2,275 unique transcripts (UT) and 955 singletons. Raw data were automatically processed using ESTFrontier, an in-house automated EST analysis pipeline. The data are stored in a MySQL database. The web interface is implemented in PHP and allows browsing and querying EuDBase through a search engine. Data are searchable via BLAST hit, domain search, Gene Ontology or KEGG Pathway. A user-friendly interface allows the identification of sequences using either a simple text query or a similarity search. EuDBase was developed to store, manage and analyze the E. denticulatum ESTs and to provide a cumulative digital resource for the global scientific community. EuDBase is freely available from http://www.inbiosis.ukm.my/eudbase/.
Project description:BACKGROUND: EST libraries are used in various biological studies, from microarray experiments to proteomic and genetic screens. These libraries usually contain many uncharacterized ESTs that are typically ignored since they cannot be mapped to known genes. Consequently, new discoveries are possibly overlooked. RESULTS: We describe a system (EST2Prot) that uses multiple elements to map EST sequences to their corresponding protein products. EST2Prot uses UniGene clusters, substring analysis, information about protein coding regions in existing DNA sequences and protein database searches to detect protein products related to a query EST sequence. Gene Ontology terms, Swiss-Prot keywords, and protein similarity data are used to map the ESTs to functional descriptors. CONCLUSION: EST2Prot extends and significantly enriches the popular UniGene mapping by utilizing multiple relations between known biological entities. It produces a mapping between ESTs and proteins in real-time through a simple web-interface. The system is part of the Biozon database and is accessible at http://biozon.org/tools/est/.
Project description:BACKGROUND: The recent rapid accumulation of sequence resources of various crop species ensures an improvement in the genetics approach, including quantitative trait loci (QTL) analysis as well as the holistic population analysis and association mapping of natural variations. Because the tribe Triticeae includes important cereals such as wheat and barley, integration of information on the genetic markers in these crops should effectively accelerate map-based genetic studies on Triticeae species and lead to the discovery of key loci involved in plant productivity, which can contribute to sustainable food production. Therefore, informatics applications and a semantic knowledgebase of genome-wide markers are required for the integration of information on and further development of genetic markers in wheat and barley in order to advance conventional marker-assisted genetic analyses and population genomics of Triticeae species. DESCRIPTION: The Triticeae mapped expressed sequence tag (EST) database (TriMEDB) provides information, along with various annotations, regarding mapped cDNA markers that are related to barley and their homologues in wheat. The current version of TriMEDB provides map-location data for barley and wheat ESTs that were retrieved from 3 published barley linkage maps (the barley single nucleotide polymorphism database of the Scottish Crop Research Institute, the barley transcript map of Leibniz Institute of Plant Genetics and Crop Plant Research, and HarvEST barley ver. 1.63) and 1 diploid wheat map. These data were imported to CMap to allow the visualization of the map positions of the ESTs and interrelationships of these ESTs with public gene models and representative cDNA sequences. The retrieved cDNA sequences corresponding to each EST marker were assigned to the rice genome to predict an exon-intron structure. 
Furthermore, to generate a unique set of EST markers in Triticeae plants among the public domain, 3472 markers were assembled to form 2737 unique marker groups as contigs. These contigs were applied for pairwise comparison among linkage maps obtained from different EST map resources. CONCLUSION: TriMEDB provides information regarding transcribed genetic markers and functions as a semantic knowledgebase offering an informatics facility for the acceleration of QTL analysis and for population genetics studies of Triticeae.