Project description: These data were generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly: Carrie Davis (davisc@cshl.edu, experimental), Roderic Guigo and lab (rguigo@imim.es, data processing), and Tom Gingeras (gingeras@cshl.edu, primary investigator). If you have questions about the Genome Browser track associated with these data, contact ENCODE (genome@soe.ucsc.edu). These tracks were generated by the ENCODE Consortia. They contain information about mouse RNAs > 200 nucleotides in length, obtained as short reads from the Illumina platform. Data are available from biological replicates. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf
Tissue Samples: Individual tissues were harvested from mouse strain C57BL/6NJ at different timepoints according to ENCODE cell culture protocols. Whenever possible, biological replicates were taken from littermates.
Library Preparation: The published cDNA sequencing protocol was used. This protocol generates directional libraries and reports the transcripts' strand of origin. Exogenous RNA spike-ins were added to each endogenous RNA isolate and carried through library construction and sequencing. The spike-in sequences and concentrations are available for download in the supplemental directory.
Sequencing and Mapping: The libraries were sequenced on the Illumina platform (either GAIIx or Hi-Seq) in mate-pair fashion (either paired-end 76 or paired-end 101) to an average depth of 100 million mate-pairs. The data were mapped against hg19 using Spliced Transcript Alignment and Reconstruction (STAR), written by Alex Dobin (CSHL). More information about STAR, including the parameters used for these data, is available from the Gingeras lab.
Verification: FPKM (fragments per kilobase of exon per million fragments mapped) values were calculated for annotated exons, and Spearman correlation coefficients were computed between biological replicates. In general, rho values are > 0.90.
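The verification step can be illustrated with a minimal sketch. It is not the submitting lab's pipeline: the function names (`fpkm`, `average_ranks`, `spearman_rho`) and all numbers in the usage lines are hypothetical, and the code only assumes the standard FPKM definition and Spearman's rho computed as the Pearson correlation of ranks.

```python
from statistics import mean


def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped."""
    return fragments / (exon_length_bp / 1e3) / (total_mapped_fragments / 1e6)


def average_ranks(values):
    """Rank values starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical exon lengths (bp) and fragment counts for two replicates.
exon_lengths = [500, 1200, 300, 2000]
rep1_counts = [50, 240, 3, 1000]
rep2_counts = [60, 200, 5, 900]
rep1 = [fpkm(c, n, 1_000_000) for c, n in zip(rep1_counts, exon_lengths)]
rep2 = [fpkm(c, n, 1_000_000) for c, n in zip(rep2_counts, exon_lengths)]
rho = spearman_rho(rep1, rep2)
```

Because Spearman's rho depends only on ranks, it is insensitive to the large dynamic range of expression values, which is one reason it is a common choice for comparing replicates.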
Project description: A High Density Rice Array (HDRA) was developed as an Affymetrix Custom GeneChip Array by the McCouch Rice Lab at Cornell University. The HDRA assays 700,000 SNPs, or approximately one SNP every 0.54 Kb across the rice genome (genome size = 380 Mb). It was designed to capture most of the haplotype variation observed in a discovery panel consisting of 16M SNPs (generated by sequencing 125 rice genomes at ~7X genome coverage) and to maximize the inclusion of non-synonymous SNPs. Six probes per SNP target were designed as 3 A-allele and 3 B-allele probes at offsets from center ranging from -6 to +6. A small fraction of SNPs have only 4 probes (2 A-allele, 2 B-allele). For all SNPs, the “A” allele is the reference allele (Os-Nipponbare-Reference-IRGSP-1.0 assembly). Additionally, we designed 23,656 25-bp probes complementary to invariant regions of the genome, which were used to normalize systematic differences between samples. An estimated 45% of HDRA SNPs map within genes, hitting all 39,045 unique, non-TE rice gene models (MSUv7 rice genome annotation, GFF3 file, Feb. 7, 2012, http://rice.plantbiology.msu.edu/), while 55% of SNPs map to intergenic regions. Non-synonymous SNPs are found in 91% of unique, non-TE gene models, and 57% of genic SNPs are distributed within exons, 36% within introns, 5% within 5’ UTRs and 2% within 3’ UTRs. Of the intergenic SNPs, 40% are located in putative regulatory regions within 2 Kb of a transcriptional start site.
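The density and distribution figures above follow from simple arithmetic, sketched below. The counts are approximations derived from the rounded values in the description (700,000 SNPs, 380 Mb, and the stated percentages), not numbers reported by the array designers, and the variable names are illustrative only.

```python
# Rounded figures taken from the description above.
genome_size_bp = 380_000_000
n_snps = 700_000

# Average spacing between assayed SNPs, in kilobases.
spacing_kb = genome_size_bp / n_snps / 1000
print(f"average spacing: {spacing_kb:.2f} Kb")  # average spacing: 0.54 Kb

# Approximate counts implied by the stated percentages (integer arithmetic).
genic = n_snps * 45 // 100               # SNPs within genes
intergenic = n_snps - genic              # remaining 55%
exonic = genic * 57 // 100               # genic SNPs within exons
regulatory = intergenic * 40 // 100      # within 2 Kb of a transcriptional start site

print(genic, intergenic, exonic, regulatory)  # 315000 385000 179550 154000
```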