De novo transcriptome assembly of Sorghum bicolor variety Taejin.
ABSTRACT: Sorghum (Sorghum bicolor), also known as great millet, is one of the most popular cultivated grass species in the world. Sorghum is widely used as food for humans and feed for animals, as well as for ethanol production. In this study, we conducted de novo transcriptome assembly for sorghum variety Taejin by next-generation sequencing, obtaining 8.748 GB of raw data. The raw data are available in the NCBI SRA database under accession number SRX1715644. Using the Trinity program, we identified 222,161 transcripts from sorghum variety Taejin. We further predicted coding regions within the assembled transcripts using the TransDecoder program, resulting in a total of 148,531 proteins. We carried out BLASTP against the Swiss-Prot protein sequence database to annotate the functions of the identified proteins. To our knowledge, this is the first transcriptome data for a sorghum variety derived from Korea, and it can be usefully applied to the generation of genetic markers.
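The three steps named in the abstract (Trinity assembly, TransDecoder coding-region prediction, BLASTP against Swiss-Prot) can be sketched as a list of command lines. This is a minimal illustration, not the authors' actual invocation: the file names, CPU count, memory limit, and E-value cutoff are all assumptions, and the function `build_pipeline_commands` is a hypothetical helper that only builds the commands without running them. The location of the assembled FASTA differs between Trinity versions, so it is passed in explicitly.

```python
from pathlib import Path


def build_pipeline_commands(left_reads, right_reads, assembly_fasta, blast_db,
                            trinity_out="trinity_out", cpus=8, max_memory="50G"):
    """Return command lines for assembly, ORF prediction, and annotation.

    Nothing is executed here; each command is a list of arguments that could
    be handed to subprocess.run() on a machine where the tools are installed.
    All default values are illustrative assumptions.
    """
    # Trinity assembles paired-end FASTQ reads into transcripts.
    trinity = [
        "Trinity", "--seqType", "fq",
        "--left", left_reads, "--right", right_reads,
        "--CPU", str(cpus), "--max_memory", max_memory,
        "--output", trinity_out,
    ]
    # TransDecoder first extracts long ORFs, then predicts coding regions.
    long_orfs = ["TransDecoder.LongOrfs", "-t", assembly_fasta]
    predict = ["TransDecoder.Predict", "-t", assembly_fasta]
    # By default TransDecoder.Predict names its peptide file after the input
    # FASTA and writes it to the working directory.
    peptides = Path(assembly_fasta).name + ".transdecoder.pep"
    # BLASTP searches the predicted proteins against a Swiss-Prot database
    # and writes tabular output (-outfmt 6). The E-value cutoff is assumed.
    blastp = [
        "blastp", "-query", peptides, "-db", blast_db,
        "-outfmt", "6", "-evalue", "1e-5",
        "-num_threads", str(cpus), "-out", "blastp.outfmt6",
    ]
    return [trinity, long_orfs, predict, blastp]
```

Building the commands as argument lists rather than shell strings keeps file names with spaces safe and makes each step easy to inspect before it is run.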
Project description:Foxtail millet (Setaria italica), a member of the family Poaceae, is an important millet that is widely cultivated in East Asia. Of the cultivated millets, foxtail millet has the longest history and is one of the main food crops in South India and China. Moreover, foxtail millet is a model plant system for biofuel generation utilizing the C4 photosynthetic pathway. In this study, we carried out de novo transcriptome assembly for the foxtail millet variety Taejin collected from Korea using next-generation sequencing. We obtained a total of 8.676 GB of raw data by paired-end sequencing. The raw data are available in the NCBI SRA database under accession number SRR3406552. The Trinity program was used to de novo assemble 145,332 transcripts. Using the TransDecoder program, we predicted 82,925 putative proteins. BLASTP was performed against the Swiss-Prot protein sequence database to annotate the functions of identified proteins, resulting in 20,555 potentially novel proteins. Taken together, this study provides transcriptome data for the foxtail millet variety Taejin by RNA-Seq.
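One way to arrive at a count of "potentially novel" proteins is to list the predicted proteins that have no acceptable BLASTP hit. The sketch below works under that assumption; the record does not state the actual criterion used. The function names and the E-value cutoff of 1e-5 are hypothetical. It expects BLAST tabular output (`-outfmt 6`), in which column 1 is the query ID and column 11 is the E-value.

```python
def read_fasta_ids(lines):
    """Collect sequence identifiers (first word of each '>' header line)."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


def proteins_without_hits(fasta_lines, blast_lines, max_evalue=1e-5):
    """Return protein IDs with no BLASTP hit at or below max_evalue.

    The cutoff is an assumed value, not one taken from the study.
    """
    with_hit = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        if float(fields[10]) <= max_evalue:
            with_hit.add(fields[0])
    # Preserve the order in which proteins appear in the FASTA file.
    return [pid for pid in read_fasta_ids(fasta_lines) if pid not in with_hit]
```

Both arguments accept any iterable of lines, so open file handles for the TransDecoder peptide file and the BLASTP output can be passed directly.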
Project description:The adzuki bean (Vigna angularis), a member of the family Fabaceae, is widely grown in Asia, from East Asia to the Himalayas. The adzuki bean is known as an ingredient that adds sweetness to diverse desserts made in East Asian countries. Libraries prepared from two V. angularis varieties, referred to as Taejin Black and Taejin Red, were paired-end sequenced using the Illumina HiSeq 2000 system. The raw data are available in the NCBI SRA database under accession numbers SRR3406660 and SRR3406553. After de novo transcriptome assembly using Trinity, we obtained 324,219 and 280,056 transcripts from Taejin Black and Taejin Red, respectively. Using the TransDecoder program, we predicted a total of 238,321 and 179,519 proteins for Taejin Black and Taejin Red, respectively. We carried out BLASTP on the predicted proteins against the Swiss-Prot protein sequence database to predict the putative functions of identified proteins. Taken together, we provide transcriptomes of two adzuki bean varieties by RNA-Seq, which might be usefully applied to generate molecular markers.
Project description:Apricot (Prunus armeniaca), belonging to the genus Prunus, is a popular stone fruit tree. Apricot is native to Armenia and is currently cultivated in many countries with climates suitable for apricot growth. Both fresh and dried apricots are produced. However, information on genes and genetic markers for apricot is very limited. In this study, we carried out de novo transcriptome assembly for two selected apricot cultivars, Harcot and Ungarische Beste, which are commercially important worldwide, using next-generation sequencing. We obtained a total of 9.31 GB and 8.88 GB of raw data from Harcot and Ungarische Beste (NCBI accession numbers: SRX1186946 and SRX1186893), respectively. De novo transcriptome assembly using Trinity identified 147,501 and 152,235 transcripts for Harcot and Ungarische Beste, respectively. Next, we identified 113,565 and 126,444 proteins from Harcot and Ungarische Beste, respectively, using the TransDecoder program. We performed BLASTP against an NCBI non-redundant (nr) dataset to annotate identified proteins. Taken together, we provide transcriptomes of two different apricot cultivars by RNA-Seq.
Project description:The present study is expected to reveal genes that are differentially expressed under drought stress in Sorghum bicolor. Seeds of the drought-tolerant (DT) sorghum genotype were grown at 28-32°C day/night temperature with a 12/12 h light/dark period in the phytotron glass house. The fully opened uppermost leaves from control and drought-stressed seedlings were sampled and stored at -80°C. For RNA-Seq libraries, one microgram of total RNA was extracted with Trizol reagent (Invitrogen, USA), and mRNA libraries were produced using the TruSeq mRNA-Seq library kit (Illumina) according to the manufacturer's instructions. The libraries were quantitated using an Agilent Bioanalyzer DNA 1000 chip (Agilent Technologies, Santa Clara, CA), and 2x101 cycle paired-end sequencing (performed by Sandor Pvt. Ltd., Hyderabad, India) was carried out on an Illumina HiScanSQ sequencer (Illumina Inc.). Initially, raw reads were processed with the NGSQC toolkit (http://184.108.40.206:8080/ngsqctoolkit/), and high-quality reads were subjected to de novo assembly using the Trinity assembler (Patel and Jain, 2012). Assembled transcripts were quantified by a standard pipeline (Trinity→RSEM→R→DESeq), and transcripts with zero FPKM in all four samples were removed (Anders, 2010; Grabherr, et al., 2011; Li and Dewey, 2011). The remaining transcripts were further processed with the TransDecoder tool to retrieve full-length coding sequences and subsequently annotated with FastAnnotator (http://fastannotator.cgu.edu.tw/index.php) (Chen, et al., 2012). Pathway enrichment analysis was performed for the predicted transcripts using the KEGG Automatic Annotation Server (KAAS; www.genome.jp/tools/kaas/) for the classification of spatially and temporally governed pathways.
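The filtering step described above, which removes transcripts with zero FPKM in every sample, can be sketched as follows. This is an illustration only: the function name `drop_unexpressed` and the dictionary layout (transcript ID mapped to a list of per-sample FPKM values) are assumptions, not the format produced by RSEM or used in the study.

```python
def drop_unexpressed(fpkm_by_transcript):
    """Keep transcripts with a non-zero FPKM in at least one sample.

    fpkm_by_transcript maps a transcript ID to its per-sample FPKM values
    (four values per transcript in the setting described above).
    """
    return {
        transcript: values
        for transcript, values in fpkm_by_transcript.items()
        if any(value > 0 for value in values)
    }
```

A transcript is dropped only when all of its values are zero, so one that is expressed in a single sample is retained.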
Project description:Prunus mume, belonging to the Prunus genus, is an Asian tree, and its common names are Chinese plum and Japanese plum. P. mume is cultivated for fruit production as well as for ornamental purposes. In this study, we conducted de novo transcriptome assembly for two selected P. mume cultivars referred to as Takada and Wallyoung (commercially important cultivars for fruit production and ornamental trees, respectively) by RNA sequencing. We obtained 9.14 GB and 9.48 GB of sequence data from Takada and Wallyoung (NCBI accession numbers: SRX1187101 and SRX1187169), respectively. De novo transcriptome assembly identified 130,989 and 116,941 transcripts for Takada and Wallyoung, respectively. In addition, we identified 96,681 and 91,429 proteins from Takada and Wallyoung, respectively, using the TransDecoder program. We performed BLASTP against the NCBI non-redundant (nr) dataset to annotate identified proteins. This study provides transcriptomes and proteomes for two different P. mume cultivars, which might be useful for comparative transcriptome analyses and assist the development of genetic markers.
Project description:The shrimp Palaemon serratus is a coastal decapod crustacean with a high commercial value. It is harvested for human consumption. In this study, we used Illumina sequencing technology (HiSeq 2000) to sequence, assemble and annotate the transcriptome of P. serratus. RNA was isolated from the muscle of adult individuals and from a pool of larvae. A total of 4 cDNA libraries were constructed using the TruSeq RNA Sample Preparation Kit v2. The raw data were deposited in the NCBI SRA database under study accession number SRP090769. The obtained data were subjected to de novo transcriptome assembly using Trinity software, and coding regions were predicted by TransDecoder. We used Blastp and Sma3s to annotate the identified proteins. The transcriptome data could provide some insight into the genes involved in larval development and metamorphosis. SPECIFICATIONS:[Table: see text].
Project description:Sorghum bicolor is one of the most important crops for food and bioethanol production. Its small diploid genome and resistance to environmental stress make sorghum an attractive model for studying the functional genomics of the Saccharinae and other C4 grasses. We analyzed the domain-based functional annotation of the cDNAs using the gene ontology (GO) categories for molecular function to characterize all the genes cloned in the full-length cDNA library of sorghum. The sorghum cDNA library successfully captured a wide range of cDNA-encoded proteins with various functions. To characterize the protein function of newly identified cDNAs, a search of their deduced domains and comparative analyses in the Oryza sativa and Zea mays genomes were carried out. Furthermore, genes on the sense strand corresponding to antisense transcripts were classified based on the GO of molecular function. To add more information about these genes, we analyzed expression profiles using RNA-Seq of three tissues (spikelet, seed and stem) during the starch-filling phase. We performed functional analysis of tissue-specific genes and expression analysis of genes encoding starch biosynthesis enzymes. This functional analysis of sorghum full-length cDNAs and the transcriptome information will facilitate further analysis of the Saccharinae and grass families.
Project description:This experiment contains the subset of sorghum RNA-Seq data from experiment E-GEOD-50464 (http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-50464/), whose goal is to examine the transcriptome of various Sorghum bicolor (BTx623) tissues: flowers, vegetative and floral meristems, embryos, roots and shoots. We expanded the existing transcriptome atlas for sorghum by conducting RNA-Seq analysis on meristematic tissues, florets, and embryos, and these data sets have been used to improve the existing community structural annotations.
Project description:Earthworms are sensitive to toxic chemicals present in the soil and so are useful indicator organisms for soil health. Eisenia fetida is commonly used in ecotoxicological studies; therefore, the assembly of a baseline transcriptome is important for subsequent analyses exploring the impact of toxin exposure on genome-wide gene expression. This paper reports on the de novo transcriptome assembly of E. fetida using Trinity, a freely available software tool. Trinotate was used to carry out functional annotation of the Trinity-generated transcriptome file and the TransDecoder-generated peptide sequence file; the results, along with those of BLASTX, BLASTP and HMMER searches, were loaded into a Sqlite3 database. To identify differentially expressed transcripts, each of the original sequence files was aligned to the de novo assembled transcriptome using Bowtie, and RSEM was then used to estimate expression values based on the alignment. EdgeR was used to calculate differential expression between the two conditions with an FDR-corrected P value cut-off of 0.001, which returned six significantly differentially expressed genes. Initial BLASTX hits of these putative genes included hits with annelid ferritin and lysozyme proteins, as well as fungal NADH cytochrome b5 reductase and senescence-associated proteins. At a cut-off of P = 0.01 there were a further 26 differentially expressed genes. These data have been made publicly available, and to our knowledge represent the most comprehensive available transcriptome for E. fetida assembled from RNA sequencing data. This provides important groundwork for subsequent ecotoxicogenomic studies exploring the impact of the environment on global gene expression in E. fetida and other earthworm species.
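The FDR cut-off described above can be illustrated with a small sketch. The record says only that P values were "FDR corrected"; the code assumes the Benjamini-Hochberg procedure, which is a common choice but is not confirmed by the source. The function names are hypothetical, and the code operates on plain lists of P values rather than on edgeR output.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(ids, p_values, cutoff=0.001):
    """Return the IDs whose adjusted P value is at or below the cutoff."""
    adjusted = benjamini_hochberg(p_values)
    return [gene for gene, q in zip(ids, adjusted) if q <= cutoff]
```

Relaxing `cutoff` from 0.001 to 0.01 can only add genes to the result, which is consistent with the record reporting further genes at the looser threshold.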
Project description:This study utilized next-generation sequencing technology (RNA-Seq and BS-Seq) to examine the transcriptome and methylome of various tissues within sorghum plants, with the ultimate goal of improving the Sorghum bicolor annotation. We examined the mRNA of various Sorghum bicolor (BTx623) tissues (flowers, vegetative and floral meristems, embryos, roots and shoots) and bisulfite-treated DNA from two root samples.