Project description: Many new alternative splice forms have been detected at the transcript level using next-generation sequencing (NGS) methods, especially RNA-Seq, but it is not known how many of these transcripts are translated. Leveraging NGS, we collected RNA-Seq and proteomics data from the same cell population (Jurkat cells) and created a bioinformatics pipeline that builds customized databases for the discovery of novel splice-junction peptides. Results: Eighty million paired-end Illumina reads and ~500,000 tandem mass spectra were used to identify 12,873 transcripts (19,320 including isoforms) and 6,810 proteins. We developed a bioinformatics workflow to retrieve high-confidence, novel splice-junction sequences from the RNA data, translate these sequences into the corresponding polypeptide sequences, and create a customized splice-junction database for MS searching. Jurkat T-cell mRNA was analyzed on an Illumina HiSeq2000. Approximately 80 million paired-end reads (2x200bp, ~350bp lengths) were collected.
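The translation step of such a workflow can be illustrated with a minimal Python sketch. It assumes the standard genetic code and forward-strand, three-frame translation; the function name `translate_frames` and the `min_len` cutoff are hypothetical and are not taken from the study's pipeline.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frames(seq, min_len=6):
    """Translate a nucleotide sequence in the three forward frames.

    Returns (frame, peptide) tuples for every stop-free fragment of at
    least min_len residues. min_len is an illustrative cutoff only.
    """
    seq = seq.upper().replace("U", "T")
    peptides = []
    for frame in range(3):
        # Codons containing ambiguous bases (e.g. N) are translated as X.
        protein = "".join(
            CODON_TABLE.get(seq[i:i + 3], "X")
            for i in range(frame, len(seq) - 2, 3)
        )
        # Split at stop codons so each entry is a contiguous peptide.
        for fragment in protein.split("*"):
            if len(fragment) >= min_len:
                peptides.append((frame, fragment))
    return peptides


if __name__ == "__main__":
    for frame, peptide in translate_frames("ATGGCCAAGTAA", min_len=3):
        print(f">junction_frame{frame}\n{peptide}")
```

Each returned fragment could be written as a FASTA entry to form a searchable peptide database; a real pipeline would also need to track junction coordinates and strand.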
Project description: Purpose: To understand the functional significance of the sperm transcriptome in stallion fertility, this study aimed to generate a detailed body of knowledge about the sperm RNA profile that defines a normal fertile stallion. Methods: The 50 bp single-end ABI SOLiD raw reads were aligned directly to the horse reference sequence EquCab2 using ABI aligner software (NovoalignCS version 1.00.09, novocraft.com), which uses multiple indexes in the reference genome, identifies candidate alignment locations for each primary read, and allows completion of the alignment. Results: Next-generation sequencing (NGS) of total RNA from the sperm of two reproductively normal stallions generated about 70 million raw reads and more than 3 Gb of sequence per sample; over half of these reads aligned to the EquCab2 reference genome. Altogether, 19,257 sequence tags with average coverage ≥1 (normalized number of transcripts) were mapped in the horse genome. Conclusion: The sequence of the stallion sperm transcriptome is an important foundation for the discovery of transcripts of known and novel genes and of non-coding RNAs, thus improving the annotation of the horse genome sequence draft and providing markers for evaluating stallion fertility. Reproductively fertile stallion sperm transcriptome as revealed by RNA sequencing.
Project description: The goal of this study is to use next-generation sequencing (NGS) to detect bacterial mRNA profiles of Pseudomonas aeruginosa PAO1 in response to 0, 1, 20 and 25 mg/L AgNPs or 0, 1, 30 and 300 mg/L AgNRs for 2 h, using Illumina HiSeq 2500. The NGS QC toolkit (version 2.3.3) was used to trim the 3’-end residual adaptors and primers from the raw sequence reads and to remove ambiguous characters in the reads. Then, the reads were progressively trimmed at the 3’-ends until a quality value ≥ 20 was reached, and those consisting of at least 85% bases were kept. Downstream analyses were performed using the resulting clean reads of at least 75 bp. The clean reads of each sample were aligned to the E. coli reference genome (NC_000913) using SeqAlto (version 0.5). Cufflinks (version 2.2.1) was used to calculate the strand-specific coverage for each gene and to analyze differential expression in triplicate bacterial cell cultures. Statistical analyses and visualization were conducted using the cummeRbund package in R (http://compbio.mit.edu/cummeRbund/). Gene expression was calculated as fragments per kilobase of a gene per million mapped reads (FPKM), a normalized value generated from the frequency of detection and the length of a given gene.
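The 3’-end quality trimming and length filter described above can be sketched in Python as follows. This is only an illustration of the idea, not the NGS QC toolkit's implementation: it assumes Phred+33 quality encoding, and the function name `trim_3prime` and its parameters are hypothetical.

```python
def trim_3prime(seq, qual, min_q=20, min_len=75, offset=33):
    """Trim low-quality bases from the 3' end of a read.

    Bases are removed from the 3' end until the last remaining base has a
    Phred quality >= min_q. Reads shorter than min_len after trimming are
    discarded (None is returned). Phred+33 encoding is an assumption.
    """
    end = len(seq)
    # Walk back from the 3' end while the terminal base is below threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], qual[:end]


if __name__ == "__main__":
    read = "A" * 80
    quality = "I" * 76 + "#" * 4  # 'I' is Q40, '#' is Q2 under Phred+33
    trimmed = trim_3prime(read, quality)
    print(len(trimmed[0]) if trimmed else "discarded")
```

The thresholds mirror the values stated in the description (quality ≥ 20, length ≥ 75 bp); the additional 85% criterion is not modeled here.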
Project description: The goal of this study is to use next-generation sequencing (NGS) to detect bacterial mRNA profiles of original E. coli K-12 MG1655 and fluoxetine-induced E. coli mutants in response to 100 mg/L fluoxetine for 8 h, in triplicate, using Illumina HiSeq 2500. The NGS QC toolkit (version 2.3.3) was used to trim the 3’-end residual adaptors and primers from the raw sequence reads and to remove ambiguous characters in the reads. Then, the reads were progressively trimmed at the 3’-ends until a quality value ≥ 20 was reached, and those consisting of at least 85% bases were kept. Downstream analyses were performed using the resulting clean reads of at least 75 bp. The clean reads of each sample were aligned to the E. coli reference genome (NC_000913) using SeqAlto (version 0.5). Cufflinks (version 2.2.1) was used to calculate the strand-specific coverage for each gene and to analyze differential expression in triplicate bacterial cell cultures. Statistical analyses and visualization were conducted using the cummeRbund package in R (http://compbio.mit.edu/cummeRbund/). Gene expression was calculated as fragments per kilobase of a gene per million mapped reads (FPKM), a normalized value generated from the frequency of detection and the length of a given gene.
Project description: The goal of this study is to use next-generation sequencing (NGS) to detect bacterial mRNA profiles of wild-type E. coli K-12 MG1655 and triclosan-induced E. coli mutants in response to 0.2 mg/L triclosan for 8 h, in triplicate, using Illumina HiSeq 2500. The NGS QC toolkit (version 2.3.3) was used to trim the 3’-end residual adaptors and primers from the raw sequence reads and to remove ambiguous characters in the reads. Then, the reads were progressively trimmed at the 3’-ends until a quality value ≥ 20 was reached, and those consisting of at least 85% bases were kept. Downstream analyses were performed using the resulting clean reads of at least 75 bp. The clean reads of each sample were aligned to the E. coli reference genome (NC_000913) using SeqAlto (version 0.5). Cufflinks (version 2.2.1) was used to calculate the strand-specific coverage for each gene and to analyze differential expression in triplicate bacterial cell cultures. Statistical analyses and visualization were conducted using the cummeRbund package in R (http://compbio.mit.edu/cummeRbund/). Gene expression was calculated as fragments per kilobase of a gene per million mapped reads (FPKM), a normalized value generated from the frequency of detection and the length of a given gene.
Project description: The goal of this study is to use next-generation sequencing (NGS) to detect bacterial mRNA profiles of E. coli DH5α in response to 0, 20 µg/L and 2 mg/L triclosan for 2 h, in triplicate, using Illumina HiSeq 2500. The NGS QC toolkit (version 2.3.3) was used to trim the 3’-end residual adaptors and primers from the raw sequence reads and to remove ambiguous characters in the reads. Then, the reads were progressively trimmed at the 3’-ends until a quality value ≥ 20 was reached, and those consisting of at least 85% bases were kept. Downstream analyses were performed using the resulting clean reads of at least 75 bp. The clean reads of each sample were aligned to the E. coli reference genome (NC_000913) using SeqAlto (version 0.5). Cufflinks (version 2.2.1) was used to calculate the strand-specific coverage for each gene and to analyze differential expression in triplicate bacterial cell cultures. Statistical analyses and visualization were conducted using the cummeRbund package in R (http://compbio.mit.edu/cummeRbund/). Gene expression was calculated as fragments per kilobase of a gene per million mapped reads (FPKM), a normalized value generated from the frequency of detection and the length of a given gene.
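The FPKM normalization used in these expression analyses can be written out as a short Python sketch. It implements only the plain textbook definition (fragment count scaled by gene length in kilobases and by library size in millions); Cufflinks estimates FPKM with a statistical model, so its values will generally differ. The function name and arguments are hypothetical.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of gene per million mapped fragments.

    FPKM = fragments * 1e9 / (gene_length_bp * total_mapped_fragments),
    i.e. the count divided by (length / 1e3) and by (library size / 1e6).
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Illustrative numbers only: 500 fragments on a 2,000 bp gene
    # in a library of 10 million mapped fragments.
    print(fpkm(500, 2000, 10_000_000))
```

Dividing by gene length makes long and short genes comparable within a sample, and dividing by library size makes samples of different sequencing depth comparable.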