Project description:Over 2000 publicly accessible human and mouse ChIP-Seq datasets for about 250 transcription factors (TFs) and chromatin complexes from various databases (ENCODE, GEO) were mapped to custom-made human and mouse genomes containing a reference rDNA sequence of the appropriate species (GenBank U13369.1 for human, BK000964.3 for mouse). The read mapping density across the rDNA sequence was then extracted and normalized to the median in that dataset. Unbiased clustering and analysis, followed by curation, were performed to identify high-confidence patterns of rDNA occupancy for numerous hematopoietic TFs and TF families at canonical TF motif sequences. Data processing steps: FASTQs were trimmed using Trimmomatic with the following parameters: LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:30. Reads were mapped to customized genomes (containing additional rDNA sequence) using Bowtie2 with the following parameter: -X 2000. Read density across the rDNA sequence was extracted using igvtools.
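The median normalization mentioned in the record above can be illustrated with a minimal Python sketch. The function name and the toy coverage values are hypothetical, and the sketch assumes the median is taken over whatever density values are passed in; the record does not say whether the authors used the rDNA region alone or a wider window.

```python
from statistics import median

def normalize_to_median(density):
    """Divide each per-position read density by the median of the values given."""
    med = median(density)
    if med == 0:
        raise ValueError("median density is zero; cannot normalize")
    return [d / med for d in density]

# Toy coverage track (made-up numbers, not from the dataset)
coverage = [2.0, 4.0, 4.0, 8.0, 40.0]
normalized = normalize_to_median(coverage)
print(normalized)
```

After this scaling, a value of 1.0 marks a position at the dataset's median density, which makes tracks from libraries of different depth roughly comparable.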
Project description:We collected (Illumina) RNA-seq data (polyadenylated RNA fraction) for a number of tissue samples from common marmoset and elephant. We developed a subtraction approach based on male/female RNA-seq data, Illumina genomic data and available genomes to identify and assemble Y transcripts. For marmoset samples, we added Y coding genes and noncoding sequences to the reference genomes in order to assess their expression levels. We then mapped all RNA-seq reads with TopHat 1.4.0 and used Cufflinks 2.0.0 (all mapped reads, embedded multi-read and fragment bias correction) to calculate the FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values for all genes in the genomes with our refined annotations. Sequence and expression levels of reconstructed Y-linked genes
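As a rough illustration of the FPKM definition spelled out in the record above, the following sketch computes the plain ratio only. It is not Cufflinks' estimator, which additionally applies the multi-read and fragment bias corrections mentioned in the record; the function name and the example numbers are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)

# Made-up example: 500 fragments on a 2 kb transcript in a 10 M fragment library
example_fpkm = fpkm(500, 2_000, 10_000_000)
print(example_fpkm)
```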
Project description:The source of most errors in RNA sequencing (RNA-seq) read alignment is in the repetitive structure of the genome and not with the alignment algorithm. Genetic variation away from the reference sequence exacerbates this problem causing reads to be assigned to the wrong location. We developed a method, implemented as the software package Seqnature, to construct the imputed genomes of individuals (individualized genomes) of experimental model organisms including inbred mouse strains and genetically unique outbred animals. Alignment to individualized genomes increases read mapping accuracy and improves transcript abundance estimates. In an application to expression QTL mapping, this approach corrected erroneous linkages and unmasked thousands of hidden associations. Individualized genomes accounting for genetic variation will be useful for human short-read sequencing and other sequencing applications including ChIP-seq.
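The idea of an individualized genome can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the Seqnature implementation: it handles single-nucleotide substitutions only, so coordinates never shift, and the function name and variant format are hypothetical.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with SNPs applied.

    `snps` maps a 0-based position to a (ref_base, alt_base) pair.
    Substitutions only: insertions and deletions are out of scope here.
    """
    bases = list(reference)
    for pos, (ref_base, alt_base) in snps.items():
        if bases[pos] != ref_base:
            raise ValueError(f"reference mismatch at position {pos}")
        bases[pos] = alt_base
    return "".join(bases)

# Made-up reference and variants
individual = apply_snps("ACGTACGT", {2: ("G", "T"), 5: ("C", "A")})
print(individual)
```

Reads would then be aligned to the individualized sequence rather than to the reference, so that reads carrying the alternate alleles no longer count as mismatches.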
Project description:Purpose: The goal of this study is to compare the exosomal small RNA transcriptome of HCT116 cells under basal and PRDX3 knockdown conditions, and to identify the targets of PRDX3, by utilizing miRNA-seq. Methods: miRNA profiles of exosomes from siCTL- or siPRDX3-transfected HCT116 cells were generated by Illumina sequencing, in triplicate. After sequencing, the raw sequence reads were filtered based on quality. Sequence reads were mapped with the bowtie2 software tool, which yielded BAM files. Mature miRNA sequences were used as references for mapping. Read counts mapped to a mature miRNA sequence were extracted from the alignment file using bedtools v2.25.0 and Bioconductor, which uses the R statistical programming language. Read counts were used to determine the expression level of miRNAs. The CPM+TMM normalization method was used for between-sample comparison. Results: We identified known miRNA in species (miRDeep2) in the HCT116 exosome transfected with siCTL or siPRDX3. The expression profile of mature miRNA is used to analyze differentially expressed miRNA (DE miRNA). Conclusions: Our study represents the first analysis of HCT116 exosomal miRNA profiles affected by PRDX3 knockdown with biological replicates.
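The CPM part of the normalization named in the record above can be sketched as follows. This covers counts per million only; the TMM step, which rescales library sizes by a trimmed-mean factor, is deliberately left out. The function name, miRNA labels, and counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw read counts for one sample to counts per million (CPM)."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no mapped reads")
    return {name: c * 1_000_000 / library_size for name, c in counts.items()}

# Made-up read counts for one sample
raw_counts = {"miR-a": 50, "miR-b": 150, "miR-c": 800}
cpm = counts_per_million(raw_counts)
print(cpm)
```

With TMM, the denominator would be the library size multiplied by a per-sample normalization factor instead of the raw library size.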
Project description:Total RNA extracted from prostate cancer LNCaP cells transfected with siRNA against CTCF (siCTCF) or negative control siRNA (si-) was processed and sequenced by two different companies on the Illumina HiSeq 2000 platform, yielding paired-end RNA-seq reads of two lengths: 50 bp and 101 bp. Nearly 100 million and 50 million raw reads were obtained from each sample, respectively. We used FastQC to confirm the quality of the raw FASTQ sequencing data and SOAPfuse to detect fusion transcripts.
Project description:HDMYZ cells were treated with 2 ug/ml ActD for 0, 4 and 12 hours. Small RNAs of 15-40 bases were gel-purified from 10 ug total RNA and subjected to multiplex Illumina small RNA library preparation. Small RNA libraries were sequenced on a HiSeq2000 (Illumina) with 3 samples per lane. To quantify miRNA and isoform abundance, sequence reads were processed by the miRDeep2 package, with the following modifications. First, to remove adaptor sequence, we removed both the main adaptor sequence present in the sequencing reads and the second most abundant adaptor variant. In addition, we did not restrict the size of small RNAs during adaptor removal. Second, we used miRBase v18 for mapping the reads. Third, for quantifying miRNA and isoform frequency, we limited reads to those of 15 bases or longer that mapped with zero mismatches. The number of reads that mapped to known miRNAs was used to normalize read frequencies for each miRNA or each miRNA isoform. For quantification purposes, we only considered miRNAs or isoforms that had frequency >= 1x10e-6 in samples without ActD treatment, which corresponds to ~21-30 reads in raw count. These miRNAs or isoforms were referred to as reliably quantifiable. To analyze mapping to the genome, we removed reads that mapped to miRNA precursors. The rest of the reads were then mapped to the genome with Bowtie.
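The frequency normalization and cutoff described in the record above can be sketched as below. The function name and counts are hypothetical, and the default threshold of 1e-6 is only one reading of the record's "1x10e-6" notation, so it is exposed as a parameter rather than fixed.

```python
def reliably_quantifiable(counts, min_frequency=1e-6):
    """Keep miRNAs/isoforms whose frequency meets the cutoff.

    Frequency = reads for one miRNA / all reads mapped to known miRNAs.
    The default cutoff is an assumed interpretation of the record's threshold.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads mapped to known miRNAs")
    frequencies = {name: c / total for name, c in counts.items()}
    return {name: f for name, f in frequencies.items() if f >= min_frequency}

# Made-up counts from an untreated sample with 25 million miRNA-mapped reads
untreated = {"iso_a": 30, "iso_b": 10, "other": 24_999_960}
kept = reliably_quantifiable(untreated)
print(sorted(kept))
```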
Project description:Purpose: In order to understand the functional significance of the sperm transcriptome in stallion fertility, the aim of this study was to generate a detailed body of knowledge about the sperm RNA profile that defines a normal fertile stallion. Methods: The 50 bp single-end ABI SOLiD raw reads were directly aligned with the horse reference sequence EquCab2 using ABI aligner software (NovoalignCS version 1.00.09, novocraft.com), which uses multiple indexes in the reference genome, identifies candidate alignment locations for each primary read, and allows completion of the alignment. Results: Next-generation sequencing (NGS) of total RNA from the sperm of two reproductively normal stallions generated about 70 million raw reads and more than 3 Gb of sequence per sample; over half of these aligned with the EquCab2 reference genome. Altogether, 19,257 sequence tags with average coverage ≥1 (normalized number of transcripts) were mapped in the horse genome. Conclusion: The sequence of the stallion sperm transcriptome is an important foundation for the discovery of transcripts of known and novel genes and of non-coding RNAs, thus improving the annotation of the horse genome sequence draft and providing markers for evaluating stallion fertility.
Project description:In this study, we sequenced the small RNA content of three different rice cultivars using Illumina technology. More than 15 million reads were generated on the Illumina high-throughput sequencing platform. After pre-processing, distinct small RNA sequences were identified for each rice cultivar. We collected seedlings of the different rice cultivars, and the isolated total RNA was subjected to Illumina sequencing. The sequence data were further filtered using NGS QC Toolkit to obtain high-quality reads. The filtered reads were pre-processed using a modified Perl script provided in the miRTools software. After quality control, identical reads were collapsed into a unique read and the read count for each sequence was recorded. All the filtered unique reads from each sample were mapped to the rice genome to find their location.
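The read-collapsing step in the record above, where identical reads become one unique sequence with a recorded count, can be sketched with the Python standard library. This is an illustration only, not the miRTools script; the function name and toy reads are hypothetical.

```python
from collections import Counter

def collapse_reads(reads):
    """Collapse identical reads into unique sequences with their counts,
    ordered from most to least abundant."""
    return Counter(reads).most_common()

# Made-up filtered reads
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
unique_reads = collapse_reads(reads)
for sequence, count in unique_reads:
    print(sequence, count)
```

Mapping only the unique sequences and carrying the counts along avoids aligning the same small RNA sequence many times.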
Project description:Transposon insertion site sequencing (TIS) is a powerful method for associating genotype to phenotype. However, all TIS methods described to date use short nucleotide sequence reads, which cannot uniquely determine the locations of transposon insertions within repeating genomic sequences where the repeat units are longer than the sequence read length. To overcome this limitation, we have developed a TIS method using Oxford Nanopore sequencing technology that generates and uses long nucleotide sequence reads; we have called this method LoRTIS (Long Read Transposon Insertion-site Sequencing). The data for this experiment contain sequence files generated using the Nanopore and Illumina platforms. Biotin1308.fastq.gz and Biotin2508.fastq.gz are FASTQ files generated using Nanopore technology. Rep1-Tn.fastq.gz and Rep1-Tn.fastq.gz are FASTQ files generated using the Illumina platform. In this study, we have compared the efficiency of the two methods in identifying transposon insertion sites.