Project description:<p><b>CREST</b></p> <p>The accurate identification of structural variations using whole-genome DNA sequencing data generated by next-generation sequencing technology is extremely difficult. To address this challenge, we have developed CREST, an algorithm that uses sequencing reads with partial alignments to the reference human genome (so-called soft-clipped reads) to directly map the breakpoints of somatic structural variations. We applied CREST to paired tumor/normal whole-genome sequencing data from five cases of T-lineage acute lymphoblastic leukemia (T-ALL). A total of 110 somatic structural variants were identified, >80% of which were validated by genomic PCR and Sanger sequencing. The validated structural variants included 31 inter-chromosomal translocations, 19 intra-chromosomal translocations, one inversion, 22 deletions and 16 insertions. A comparison of the results generated with CREST to those obtained using the traditional paired-end discordant mapping methods demonstrates that CREST has much higher sensitivity and specificity. In addition, application of CREST to publicly available whole-genome sequencing data from the human melanoma cell line COLO-829 identified 50 novel structural variations not detected using the standard methods, 20 of which were selected for validation with a 90% success rate. These data demonstrate that direct mapping of soft-clipped reads offers an improved method for detecting structural variants at the nucleotide level of resolution.</p> <p><b>T-ALL</b></p> <p>Early T-cell precursor acute lymphoblastic leukaemia (ETP ALL) is an aggressive malignancy of unknown genetic basis. We performed whole-genome sequencing of 12 ETP ALL cases and assessed the frequency of the identified somatic mutations in 94 T-cell acute lymphoblastic leukaemia cases. 
ETP ALL was characterized by activating mutations in genes regulating cytokine receptor and RAS signalling (67% of cases; <i>NRAS</i>, <i>KRAS</i>, <i>FLT3</i>, <i>IL7R</i>, <i>JAK3</i>, <i>JAK1</i>, <i>SH2B3</i> and <i>BRAF</i>), inactivating lesions disrupting haematopoietic development (58%; <i>GATA3</i>, <i>ETV6</i>, <i>RUNX1</i>, <i>IKZF1</i> and <i>EP300</i>) and histone-modifying genes (48%; <i>EZH2</i>, <i>EED</i>, <i>SUZ12</i>, <i>SETD2</i> and <i>EP300</i>). We also identified new targets of recurrent mutation including <i>DNM2</i>, <i>ECT2L</i> and <i>RELN</i>. The mutational spectrum is similar to that of myeloid tumours, and moreover, the global transcriptional profile of ETP ALL was similar to that of normal and myeloid leukaemia haematopoietic stem cells. These findings suggest that addition of myeloid-directed therapies might improve the poor outcome of ETP ALL.</p>
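To illustrate what a soft-clipped read looks like in alignment data, the sketch below parses a SAM-style CIGAR string and reports a candidate breakpoint coordinate at each clipped end. This is not the CREST implementation; the function names and the `min_clip` threshold of 10 bases are assumptions made for illustration only.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left_clip, right_clip) soft-clip lengths for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips; drop them first.
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def reference_span(cigar):
    """Number of reference bases consumed by the alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def candidate_breakpoints(pos, cigar, min_clip=10):
    """List (coordinate, side) pairs for soft clips of at least min_clip bases.

    pos is the 1-based leftmost aligned position (the SAM POS field).
    min_clip is a hypothetical threshold, not a value from the source.
    """
    left, right = soft_clips(cigar)
    found = []
    if left >= min_clip:
        found.append((pos, "left"))
    if right >= min_clip:
        found.append((pos + reference_span(cigar) - 1, "right"))
    return found
```

A read aligned at position 1000 with CIGAR `60M40S` has 40 unaligned bases at its right end, so the last aligned base (position 1059) marks a candidate breakpoint. Many reads clipped at the same coordinate would be the kind of evidence that soft-clip-based methods look for.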
Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Richard Sandstrom mailto:sull@u.washington.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track, produced as part of the ENCODE Project, contains deep sequencing DNase data that will be used to identify sites where regulatory factors bind to the genome (footprints). Footprinting is a technique used to define the DNA sequences that interact with and bind DNA-binding proteins, such as transcription factors, zinc-finger proteins, hormone-receptor complexes, and other chromatin-modulating factors like CTCF. The technique depends upon the strength and tight nature of protein-DNA interactions. In their native chromatin state, DNA sequences that interact directly with DNA-binding proteins are relatively protected from DNA degrading endonucleases, while the exposed/unbound portions are readily degraded by such endonucleases. A massively parallel next-generation sequencing technique to define the DNase hypersensitive sites in the genome was adopted. Sequencing these next-generation-sequencing DNase samples to significantly higher depths of 300-fold or greater produces a base-pair level resolution of the DNase susceptibility maps of the native chromatin state. These base-pair resolution maps represent and are dependent upon the nature and the specificity of interaction of the DNA with the regulatory/modulatory proteins binding at specific loci in the genome; thus they represent the native chromatin state of the genome under investigation. The deep sequencing approach has been used to define the footprint landscape of the genome by identifying DNA motifs that interact with known or novel DNA binding proteins. 
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf
Cells were grown according to the approved ENCODE cell culture protocols. Digital DNaseI was performed by DNaseI digestion of intact nuclei, followed by isolating DNaseI 'double-hit' fragments as described in Sabo et al. (2006), and direct sequencing of fragment ends (which correspond to in vivo DNaseI cleavage sites) using the Solexa platform (27 bp reads). High-quality reads were mapped to the GRCh37/hg19 human genome using Bowtie 0.12.5 (Eland was used to map to NCBI36/hg18); only unique mappings were kept. DNaseI sensitivity is directly reflected in raw tag density (Signal), which is shown in the track as density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). DNaseI hypersensitive zones (HotSpots) were identified using the HotSpot algorithm described in Sabo et al. (2004). False discovery rate thresholds of 1.0% (FDR 0.01) were computed for each cell type by applying the HotSpot algorithm to an equivalent number of random uniquely mapping 36-mers. DNaseI hypersensitive sites (DHSs or Peaks) were identified as signal peaks within 1.0% (FDR 0.01) hypersensitive zones using a peak-finding algorithm. Only DNase Solexa libraries from unique cell types producing the highest quality data, as defined by Percent Tags in Hotspots (PTIH ~40%), were designated for deep sequencing to a depth of over 200 million tags.
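The Signal definition above (tags counted within a 150 bp window that advances in 20 bp steps) can be sketched as follows. This is a minimal illustration rather than the ENCODE pipeline; the function name, the 0-based half-open window convention, and the single-chromosome input are assumptions.

```python
from bisect import bisect_left


def window_density(cut_sites, chrom_length, window=150, step=20):
    """Count tags (cleavage positions) in each sliding window.

    cut_sites: iterable of 0-based positions on one chromosome.
    Returns a list of (window_start, tag_count) pairs; each window is the
    half-open interval [window_start, window_start + window).
    """
    sites = sorted(cut_sites)
    densities = []
    last_start = max(chrom_length - window, 0)
    for start in range(0, last_start + 1, step):
        # Binary search gives the number of sites inside the window.
        lo = bisect_left(sites, start)
        hi = bisect_left(sites, start + window)
        densities.append((start, hi - lo))
    return densities
```

For a toy 200 bp chromosome with tags at positions 10, 25, 160 and 165, the windows starting at 0, 20 and 40 contain 2, 3 and 2 tags respectively.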
Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Yijun Ruan mailto:ruanyj@gis.a-star.edu.sg). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track, produced as part of the ENCODE Project, contains deep sequencing DNase data that will be used to identify sites where regulatory factors bind to the genome (footprints). Footprinting is a technique used to define the DNA sequences that interact with and bind DNA-binding proteins, such as transcription factors, zinc-finger proteins, hormone-receptor complexes, and other chromatin-modulating factors like CTCF. The technique depends upon the strength and tight nature of protein-DNA interactions. In their native chromatin state, DNA sequences that interact directly with DNA-binding proteins are relatively protected from DNA degrading endonucleases, while the exposed/unbound portions are readily degraded by such endonucleases. A massively parallel next-generation sequencing technique to define the DNase hypersensitive sites in the genome was adopted. Sequencing these next-generation-sequencing DNase samples to significantly higher depths of 300-fold or greater produces a base-pair level resolution of the DNase susceptibility maps of the native chromatin state. These base-pair resolution maps represent and are dependent upon the nature and the specificity of interaction of the DNA with the regulatory/modulatory proteins binding at specific loci in the genome; thus they represent the native chromatin state of the genome under investigation. The deep sequencing approach has been used to define the footprint landscape of the genome by identifying DNA motifs that interact with known or novel DNA binding proteins. 
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf
Cells were grown according to the approved ENCODE cell culture protocols. Digital DNaseI was performed by DNaseI digestion of intact nuclei, followed by isolating DNaseI 'double-hit' fragments as described in Sabo et al. (2006), and direct sequencing of fragment ends (which correspond to in vivo DNaseI cleavage sites) using the Solexa platform (27 bp reads). High-quality reads were mapped to the GRCh37/hg19 human genome using Bowtie 0.12.5 (Eland was used to map to NCBI36/hg18); only unique mappings were kept. DNaseI sensitivity is directly reflected in raw tag density (Signal), which is shown in the track as density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). DNaseI hypersensitive zones (HotSpots) were identified using the HotSpot algorithm described in Sabo et al. (2004). False discovery rate thresholds of 1.0% (FDR 0.01) were computed for each cell type by applying the HotSpot algorithm to an equivalent number of random uniquely mapping 36-mers. DNaseI hypersensitive sites (DHSs or Peaks) were identified as signal peaks within 1.0% (FDR 0.01) hypersensitive zones using a peak-finding algorithm. Only DNase Solexa libraries from unique cell types producing the highest quality data, as defined by Percent Tags in Hotspots (PTIH ~40%), were designated for deep sequencing to a depth of over 200 million tags.
Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Richard Sandstrom mailto:sull@u.washington.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track, produced as part of the mouse ENCODE Project, contains deep sequencing DNase data that will be used to identify sites where regulatory factors bind to the genome (footprints). Footprinting is a technique used to define the DNA sequences that interact with and bind DNA-binding proteins, such as transcription factors, zinc-finger proteins, hormone-receptor complexes, and other chromatin-modulating factors like CTCF. The technique depends upon the strength and tight nature of protein-DNA interactions. In their native chromatin state, DNA sequences that interact directly with DNA-binding proteins are relatively protected from DNA degrading endonucleases, while the exposed/unbound portions are readily degraded by such endonucleases. A massively parallel next-generation sequencing technique to define the DNase hypersensitive sites in the genome was adopted. The DNase samples were sequenced using next-generation sequencing machines to significantly higher depths of 300-fold or greater. This produces a base-pair level resolution of the DNase susceptibility maps of the native chromatin state. These base-pair resolution maps represent and are dependent upon the nature and the specificity of interaction of the DNA with the regulatory/modulatory proteins binding at specific loci in the genome; thus they represent the native chromatin state of the genome under investigation. The deep sequencing approach has been used to define the footprint landscape of the genome by identifying DNA motifs that interact with known or novel DNA binding proteins. 
Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). Digital DNaseI was performed by DNaseI digestion of intact nuclei, followed by isolating DNaseI "double-hit" fragments (Sabo et al., 2006), and direct sequencing of fragment ends (which correspond to in vivo DNaseI cleavage sites) using the Solexa platform (27 bp reads). High-quality reads were mapped to the NCBI37/mm9 mouse genome using Bowtie 0.12.5; only unique mappings were kept. DNaseI sensitivity is directly reflected in raw tag density (Raw Signal), which is shown in the track as density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). DNaseI hypersensitive zones (HotSpots) were identified using the HotSpot algorithm (Sabo et al., 2004). False discovery rate thresholds of 1.0% (FDR 0.01) were computed for each cell type by applying the HotSpot algorithm to an equivalent number of random uniquely mapping 36-mers. DNaseI hypersensitive sites (DHSs or Peaks) were identified as signal peaks within 1.0% (FDR 0.01) hypersensitive zones using a peak-finding algorithm. Only DNase Solexa libraries from unique cell types producing the highest quality data, as defined by Percent Tags in Hotspots (PTIH ~40%), were designated for deep sequencing to a depth of over 200 million tags. Results were validated by conventional DNaseI hypersensitivity assays using end-labeling/Southern blotting methods.
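The library quality metric named above, Percent Tags in Hotspots, is the share of mapped tags that fall inside hotspot regions. A minimal sketch of that calculation follows; the function name, the half-open interval representation, and the assumption that hotspots on a chromosome do not overlap are illustrative choices, not details taken from the HotSpot software.

```python
from bisect import bisect_right


def percent_tags_in_hotspots(tags, hotspots):
    """Percentage of tag positions that fall inside any hotspot.

    tags: iterable of positions on one chromosome.
    hotspots: list of non-overlapping (start, end) half-open intervals
              on the same chromosome (an assumed representation).
    """
    hotspots = sorted(hotspots)
    starts = [start for start, _ in hotspots]
    tags = list(tags)
    if not tags:
        return 0.0
    inside = 0
    for pos in tags:
        # Index of the last hotspot starting at or before this tag.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < hotspots[i][1]:
            inside += 1
    return 100.0 * inside / len(tags)
```

With tags at 5, 15, 25, 105 and 300 and hotspots covering [10, 30) and [100, 110), three of the five tags fall in hotspots, giving 60%.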
Project description:Purpose: To identify the genetic basis of posterior polymorphous corneal dystrophy 1 (PPCD1). Methods: Next-generation sequencing was performed on DNA samples from 4 affected and 4 unaffected members of a previously reported family with PPCD1 linked to chromosome 20 between D20S182 and D20S195. Custom capture probes were utilized for targeted region capture of the linked interval. Single nucleotide variants (SNVs) and insertions/deletions (indels) were identified using two bioinformatics pipelines and two annotation databases. Candidate variants met the following criteria: quality score ≥20, read depth ≥5X, heterozygous, novel or rare (minor allele frequency (MAF) ≤ 0.05), present in each affected individual and absent in each unaffected individual. Structural variants were detected with two different microarray platforms to identify indels of varying sizes. Results: Sequencing reads aligned to the linked region on chromosome 20, and high coverage was obtained across the sequenced region. The majority of identified variants were detected with both pipelines and annotation databases, although unique variants were identified. Twelve SNVs in 10 genes (2 synonymous variants and 10 noncoding variants) and 9 indels in 7 genes met the filtering criteria and were considered candidate variants for PPCD1. Conclusions: Next-generation sequencing of the PPCD1 interval has identified 17 genes containing novel or rare SNVs and indels that segregate with the affected phenotype in an affected family previously mapped to the PPCD1 locus. We anticipate that screening of these candidate genes in other families previously mapped to the PPCD1 locus will result in the identification of the genetic basis of PPCD1. Four affected and 4 unaffected individuals from a single family were analyzed for copy number variation within the PPCD1 disease locus. Array design and analysis are based on genome build hg19.
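The candidate-variant criteria listed in the Methods can be expressed as a simple segregation filter. The sketch below is an illustration under stated assumptions, not the authors' pipelines: the `Call` record, the genotype labels, and the choice to treat a missing call in an unaffected individual as "absent" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Call:
    """One sample's call at a variant site (hypothetical record layout)."""
    quality: float
    depth: int
    genotype: str  # "het", "hom_alt" or "hom_ref"


def is_candidate(
    calls: Dict[str, Call],
    affected: Iterable[str],
    unaffected: Iterable[str],
    maf: Optional[float],
    min_quality: float = 20,
    min_depth: int = 5,
    max_maf: float = 0.05,
) -> bool:
    """Apply the stated criteria to one variant site.

    maf is None for a novel variant (not present in the annotation database).
    """
    # Novel or rare: reject anything more common than max_maf.
    if maf is not None and maf > max_maf:
        return False
    # Present as a confident heterozygous call in each affected individual.
    for sample in affected:
        call = calls.get(sample)
        if call is None or call.genotype != "het":
            return False
        if call.quality < min_quality or call.depth < min_depth:
            return False
    # Absent in each unaffected individual (assumption: no call = absent).
    for sample in unaffected:
        call = calls.get(sample)
        if call is not None and call.genotype != "hom_ref":
            return False
    return True
```

A site passes only if every affected family member carries a heterozygous call meeting both thresholds and no unaffected member carries the variant allele.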
Project description:We generated genome-wide H3K9me3-state maps of DP thymocytes purified from ESET+/+ and ESET-/- mice using next-generation sequencing.