Project description:Whole exome sequencing of 5 HCLc tumor-germline pairs. Genomic DNA from HCLc tumor cells (tumor) and from T-cells (germline) was used. Whole exome enrichment was performed with either Agilent SureSelect (50 Mb, samples S3G/T, S5G/T, S9G/T) or Roche NimbleGen (44.1 Mb, samples S4G/T and S6G/T). The resulting exome libraries were sequenced on the Illumina HiSeq platform with paired-end 100 bp reads to an average depth of 120-134x. BAM files were generated using NovoalignMPI (v3.0) to align the raw FASTQ files to the reference genome sequence (hg19) and Picard tools (v1.34) to flag duplicate reads (optical or PCR), unmapped reads, reads mapping to more than one location, and reads failing vendor QC.
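The read categories flagged above correspond in part to bits of the SAM FLAG field. The following is a minimal sketch, not the pipeline used in this project: it decodes the three bits that the SAM specification defines for unmapped reads, vendor QC failures, and duplicates. Reads mapping to more than one location are usually marked through mapping quality or aligner-specific tags rather than a single FLAG bit, so they are left out here. The function names are hypothetical.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_UNMAPPED = 0x4      # read is unmapped
FLAG_QC_FAIL = 0x200     # read fails platform/vendor quality checks
FLAG_DUPLICATE = 0x400   # read is a PCR or optical duplicate


def describe_flag(flag):
    """Return the names of the filtering-relevant bits set in a SAM FLAG value."""
    names = []
    if flag & FLAG_UNMAPPED:
        names.append("unmapped")
    if flag & FLAG_QC_FAIL:
        names.append("failed_vendor_qc")
    if flag & FLAG_DUPLICATE:
        names.append("duplicate")
    return names


def is_usable(flag):
    """True if none of the three bits above is set (illustrative filter only)."""
    return not (flag & (FLAG_UNMAPPED | FLAG_QC_FAIL | FLAG_DUPLICATE))


if __name__ == "__main__":
    for flag in (99, 1123, 0x4 | 0x200):
        print(flag, describe_flag(flag), is_usable(flag))
```

Flagging rather than deleting such reads keeps the BAM file complete, so downstream tools can apply their own filters.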
Project description:We use nucleosome maps obtained by high-throughput sequencing to study sequence specificity of intrinsic histone-DNA interactions. In contrast with previous approaches, we employ an analogy between a classical one-dimensional fluid of finite-size particles in an arbitrary external potential and arrays of DNA-bound histone octamers. We derive an analytical solution to infer free energies of nucleosome formation directly from nucleosome occupancies measured in high-throughput experiments. The sequence-specific part of free energies is then captured by fitting them to a sum of energies assigned to individual nucleotide motifs. We have developed hierarchical models of increasing complexity and spatial resolution, establishing that nucleosome occupancies can be explained by systematic differences in mono- and dinucleotide content between nucleosomal and linker DNA sequences, with periodic dinucleotide distributions and longer sequence motifs playing a secondary role. Furthermore, similar sequence signatures are exhibited by control experiments in which genomic DNA is either sonicated or digested with micrococcal nuclease in the absence of nucleosomes, making it possible that current predictions based on high-throughput nucleosome positioning maps are biased by experimental artifacts. Included are raw (eland) and mapped (wig) reads. The mapped reads are provided in eland and wiggle formats, and the raw reads are included in the eland file. This series includes only MNase control data. The sonicated control is part of the following previously published accession, as is an in vitro nucleosome map: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15188 We also studied data (in vitro and in vivo maps as well as a model) from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE13622 and from http://www.ncbi.nlm.nih.gov/sra/?term=SRA001023
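The idea of describing the sequence-specific energy as a sum of energies assigned to individual nucleotide motifs can be illustrated with a small sketch. This is the simplest position-independent variant only, not the fitted model from this study: the function names and the energy values in the usage example are hypothetical, and the fitting step itself is not shown.

```python
from collections import Counter


def motif_counts(seq, k):
    """Count all overlapping k-mers (k=1: mononucleotides, k=2: dinucleotides)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def sequence_energy(seq, energies):
    """Sum per-motif energies over every occurrence in `seq`.

    `energies` maps motifs (e.g. "A", "AT") to energy values; motifs
    missing from the mapping contribute zero.
    """
    total = 0.0
    for k in sorted({len(motif) for motif in energies}):
        for motif, n in motif_counts(seq, k).items():
            total += n * energies.get(motif, 0.0)
    return total


if __name__ == "__main__":
    # Hypothetical energies in arbitrary units, for illustration only.
    example_energies = {"A": -0.5, "AT": 1.0}
    print(sequence_energy("AATT", example_energies))
```

A position-dependent model would instead assign a separate energy to each motif at each offset within the nucleosomal footprint.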
Project description:Purpose: To understand the functional significance of the sperm transcriptome in stallion fertility, this study aimed to generate a detailed body of knowledge about the sperm RNA profile that defines a normal fertile stallion. Methods: The 50 bp single-end ABI SOLiD raw reads were directly aligned with the horse reference sequence EquCab2 using ABI aligner software (NovoalignCS version 1.00.09, novocraft.com) which uses multiple indexes in the reference genome, identifies candidate alignment locations for each primary read, and allows completion of the alignment. Results: Next-generation sequencing (NGS) of total RNA from the sperm of two reproductively normal stallions generated about 70 million raw reads and more than 3 Gb of sequence per sample; over half of these aligned with the EquCab2 reference genome. Altogether, 19,257 sequence tags with average coverage ≥1 (normalized number of transcripts) were mapped in the horse genome. Conclusion: The sequence of the stallion sperm transcriptome is an important foundation for the discovery of transcripts of known and novel genes, and non-coding RNAs, thus improving the annotation of the horse genome sequence draft and providing markers for evaluating stallion fertility. Reproductively fertile Stallion sperm transcriptome as revealed by RNA sequencing
Project description:Purpose: Here we describe the modulation of a gene expression program involved in cell fate. Methods: We depleted U2AF1 in human induced pluripotent stem cells (hiPSCs) to the level found in differentiated cells using an inducible shRNA system, followed by high-throughput RNA-seq, revealing a gene expression program involved in cell fate determination. Results: Approximately 85% of the total raw reads were mapped to the human genome sequence (GRCh37), giving an average of 200 million human reads per sample for total RNA and 15 million human reads per sample for small RNA libraries. Conclusions: Our results show that transcriptional control of gene expression in hiPSCs can be set by the CSF U2AF1, establishing a direct link between transcription and AS during cell fate determination. Overall design: hiPSCs were differentiated into the three germ layers following the protocol described in the study (Gifford et al., 2013).
Project description:Purpose: The goal of this study is to identify the putative mRNA targets that are regulated by the 6C sRNA. We constructed an inducible vector to transiently overexpress the 6C sRNA in M. smegmatis, and then performed RNA-Seq to look for genes that are differentially expressed upon overexpression of the 6C sRNA; we consider these genes to be potential targets of the 6C sRNA. Overall design: Methods: Purified RNA was used to construct a cDNA library according to the TruSeq Stranded RNA LT Guide from Illumina. High-throughput sequencing was carried out on an Illumina HiSeq 2000 system according to the manufacturer's instructions (Illumina HiSeq 2000 User Guide) and 150-bp paired-end reads were obtained. The raw reads were filtered by Seqtk and then mapped to the M. smegmatis MC2 155 strain reference sequence (GenBank NC_008596) using Bowtie2 (version: 2-2.0.5). Counting of reads per gene was performed using HTSeq followed by TMM (trimmed mean of M-values) normalization. Differentially expressed genes were defined as those with a false discovery rate < 0.05 and fold-change >2 using the edgeR software.
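The differential-expression criterion above (false discovery rate < 0.05 and fold-change > 2) can be sketched as a simple filter over a results table. This is an illustration only, not the edgeR analysis itself. It assumes that fold changes are given on the log2 scale, as edgeR reports them, and that the fold-change threshold applies in both directions (up- and down-regulation), which the description does not state. The function names and gene identifiers are hypothetical.

```python
import math


def is_differentially_expressed(log2_fc, fdr, fdr_cutoff=0.05, fold_cutoff=2.0):
    """Apply the FDR and fold-change thresholds to one gene.

    Assumption: the fold-change cutoff is symmetric, so a gene passes when
    |log2 fold-change| > log2(fold_cutoff).
    """
    return fdr < fdr_cutoff and abs(log2_fc) > math.log2(fold_cutoff)


def select_de_genes(results, fdr_cutoff=0.05, fold_cutoff=2.0):
    """Filter an iterable of (gene_id, log2_fold_change, fdr) tuples."""
    return [
        gene_id
        for gene_id, log2_fc, fdr in results
        if is_differentially_expressed(log2_fc, fdr, fdr_cutoff, fold_cutoff)
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    table = [
        ("geneA", 1.5, 0.01),
        ("geneB", -2.0, 0.04),
        ("geneC", 3.0, 0.20),
        ("geneD", 0.5, 0.001),
    ]
    print(select_de_genes(table))
```

Requiring both thresholds excludes genes with statistically significant but small changes as well as genes with large but unreliable changes.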
Project description:Chromatin accessibility captures the binding status of protein factors to chromosomes in vivo, and has been considered a highly informative proxy for functional protein-DNA interactions. Existing DNase I and Tn5 transposase based assays generally require tens of thousands to millions of fresh cells. Applying Tn5 tagmentation to single cells yields very sparse maps. Here we present a transposome hypersensitive sites sequencing assay (THS-seq) for highly sensitive characterizations of chromatin accessibility. Validation of THS-seq method, and comparison of DNase-seq, ATAC-seq and THS-seq methods for quantitation of chromatin accessibility. GSE47753, GSM1155957 GM12878_ATACseq_50k_Rep1, data downsampled to 8,351,125 reads for analysis: pub_SRR891268_ATAC-seq_50k_cells_Rep1_downsampled_dfilter_peaks.bed.gz GSE47753, GSM1155958 GM12878_ATACseq_50k_Rep2, data downsampled to 8,351,125 reads for analysis: pub_SRR891269_ATAC-seq_50k_cells_Rep2_downsampled_dfilter_peaks.bed.gz GSE47753, GSM1155959 GM12878_ATACseq_50k_Rep3, data downsampled to 8,351,125 reads for analysis: pub_SRR891270_ATAC-seq_50k_cells_Rep3_downsampled_dfilter_peaks.bed.gz GSE47753, GSM1155960 GM12878_ATACseq_50k_Rep4, data downsampled to 8,351,125 reads for analysis: pub_SRR891271_ATAC-seq_50k_cells_Rep4_downsampled_dfilter_peaks.bed.gz GSE47753, GSM1155961 GM12878_ATACseq_500_Rep1, data downsampled to 8,351,125 reads for analysis: pub_SRR891272_ATAC-seq_500_cells_Rep1_downsampled_dfilter_peaks.bed.gz GSE47753, GSM1155962 GM12878_ATACseq_500_Rep2, data downsampled to 8,351,125 reads for analysis: pub_SRR891273_ATAC-seq_500_cells_Rep2_downsampled_dfilter_peaks.bed.gz raw data files were merged, alignment with bowtie 1.3 using parameters, bowtie -n1 -k1 --best --chunkmbs 10240 --strata -l32 -m1 -p4 --nomaqround --sam, followed by clonal read removal for the final data file for further analysis. 
GSM816665, raw data obtained from UCSC genome browser, https://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeOpenChromDnase: wgEncodeOpenChromDnaseGm12878RawData_merged_unique_dfilter_peaks.bed.gz raw data files were merged, alignment with bowtie 1.3 using parameters, bowtie -n1 -k1 --best --chunkmbs 10240 --strata -l32 -m1 -p4 --nomaqround --sam, followed by clonal read removal for the final data file for further analysis. GSM864360, raw data obtained from UCSC genome browser, https://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeOpenChromFaire: wgEncodeOpenChromFaireGm12878RawData_merged_unique_dfilter_peaks.bed.gz raw data files were merged, alignment with bowtie 1.3 using parameters, bowtie -n1 -k1 --best --chunkmbs 10240 --strata -l32 -m1 -p4 --nomaqround --sam, followed by clonal read removal for the final data file for further analysis. GSM736496, GSM736620, raw data obtained from UCSC genome browser, https://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeUwDnase: wgEncodeUwDnaseGm12878RawData_merged_unique_dfilter_peaks.bed.gz
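Several of the published ATAC-seq datasets above were downsampled to a fixed number of reads (8,351,125) before analysis. The record does not say how the downsampling was done. The sketch below shows one common approach, reservoir sampling, which draws a uniform random subset of fixed size in a single pass without knowing the total read count in advance. The function name and the fixed seed are assumptions made for illustration.

```python
import random


def reservoir_sample(items, k, seed=0):
    """Return k items drawn uniformly at random from an iterable (Algorithm R).

    If the iterable holds fewer than k items, all of them are returned.
    A fixed seed makes the subset reproducible.
    """
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


if __name__ == "__main__":
    reads = ("read_%d" % n for n in range(1000))  # stand-in for alignment records
    subset = reservoir_sample(reads, k=10)
    print(len(subset), subset[:3])
```

For paired-end data, the sampled unit would need to be the read pair rather than the individual read so that mates stay together.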