Project description: The genetic structure of the indigenous hunter-gatherer peoples of Southern Africa, the oldest known lineage of modern man, holds an important key to understanding humanity's early history. Previously sequenced human genomes have been limited to recently diverged populations. Here we present the first complete genome sequences of an indigenous hunter-gatherer from the Kalahari Desert and of a Bantu from Southern Africa, as well as protein-coding regions from an additional three hunter-gatherers from disparate regions of the Kalahari. We characterize the extent of whole-genome and exome diversity among the five men, reporting 1.3 million novel DNA differences genome-wide and 13,146 novel amino-acid variants. These data allow genetic relationships among Southern African foragers and neighboring agriculturalists to be traced more accurately than was previously possible. Adding the described variants to current databases will facilitate the inclusion of Southern Africans in medical research efforts. Copy-number differences between NA18507 and KB1 were predicted from the depth of whole-genome shotgun sequence reads. These predictions were then validated by array-CGH, using both a genome-wide design and a custom design targeted at specific regions of copy-number difference.
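The read-depth approach to predicting copy-number differences can be illustrated with a minimal sketch. It assumes both genomes have already been aligned and their reads counted in the same fixed-size windows. The function names, the pseudocount, the 0.5 log2-ratio threshold and the three-window minimum run are hypothetical choices for illustration, not the parameters used for NA18507 and KB1.

```python
import math
from typing import List, Tuple


def depth_log2_ratios(counts_a: List[int], counts_b: List[int],
                      pseudocount: float = 0.5) -> List[float]:
    """Per-window log2 ratio of library-size-normalised read depth, sample A vs sample B."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must be binned into the same windows")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample needs at least one read")
    ratios = []
    for a, b in zip(counts_a, counts_b):
        # Dividing by library size cancels out overall sequencing-depth differences;
        # the pseudocount avoids log(0) in windows with no reads.
        frac_a = (a + pseudocount) / total_a
        frac_b = (b + pseudocount) / total_b
        ratios.append(math.log2(frac_a / frac_b))
    return ratios


def call_copy_number_differences(ratios: List[float], threshold: float = 0.5,
                                 min_run: int = 3) -> List[Tuple[int, int, str]]:
    """Report runs of >= min_run consecutive windows whose |log2 ratio| >= threshold.

    Returns (first_window, last_window, "gain" | "loss"), where "gain" means
    sample A has higher relative depth than sample B.
    """
    def state(r: float) -> int:
        return 1 if r >= threshold else (-1 if r <= -threshold else 0)

    calls = []
    i = 0
    while i < len(ratios):
        direction = state(ratios[i])
        if direction == 0:
            i += 1
            continue
        # Extend the run while consecutive windows stay beyond the threshold in the same direction.
        j = i
        while j < len(ratios) and state(ratios[j]) == direction:
            j += 1
        if j - i >= min_run:
            calls.append((i, j - 1, "gain" if direction == 1 else "loss"))
        i = j
    return calls


# Hypothetical counts: sample A has doubled depth in windows 10-14.
ratios = depth_log2_ratios([100] * 10 + [200] * 5 + [100] * 10, [100] * 25)
print(call_copy_number_differences(ratios))  # [(10, 14, 'gain')]
```

Requiring a minimum run of adjacent windows is one simple way to suppress isolated noisy windows. A production pipeline would typically also correct for GC content and mappability before calling, which this sketch omits.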
Project description: We present a single-base-resolution sequencing methodology that reads complete genetic and complete epigenetic information simultaneously, in a single workflow. The approach is non-destructive to DNA and provides a digital readout of bases, which we exemplify by simultaneously sequencing G, C, T, A and 5mC/5hmC (5-letter sequencing), or G, C, T, A, 5mC and 5hmC as distinct bases (6-letter sequencing). We demonstrate sequencing of human genomic DNA and of cell-free DNA from a blood sample of a cancer patient. The approach is accurate, requires low DNA input, and has a simple workflow and analysis pipeline. We envisage that it will be versatile across many applications in the life sciences.
Project description: We developed an accurate taxonomic annotation strategy, based on metagenomic data, for deep metaproteomic coverage, and compared the performance of state-of-the-art LC-MS/MS techniques using a simulated microbial community of 12 species. In addition, we achieved deep proteome coverage of the human gut microbiome from stool samples.
Project description: Accurate functional annotation of regulatory elements is essential for understanding global gene regulation. Here, we report a genome-wide map of 827,000 transcription factor binding sites in human lymphoblastoid cell lines, comprising sites corresponding to 239 position weight matrices of known transcription factor binding motifs and 49 novel sequence motifs. To generate this map, we developed a probabilistic framework that integrates cell- or tissue-specific experimental data, such as histone modifications and DNase I cleavage patterns, with genomic information such as gene annotation and evolutionary conservation. Comparison with empirical ChIP-seq data suggests that our method is highly accurate, yet it has the advantage of targeting many factors in a single assay. We anticipate that this approach will be a valuable tool for genome-wide studies of gene regulation in a wide variety of cell types or tissues under diverse conditions. DNaseI-seq was performed on two YRI HapMap cell lines, with each individual sequenced on 8 lanes of the Illumina Genome Analyzer II.
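The integration of site-level genomic evidence with DNase I cleavage data can be pictured as a two-state (bound vs unbound) Bayesian calculation. The sketch below is a deliberately simplified illustration, not the authors' model. It assumes a logistic prior over hypothetical features and weights, and it replaces the cleavage-pattern model with a single Poisson count per site, using hypothetical "bound" and "unbound" rates. In practice such parameters would be estimated from the data rather than fixed by hand.

```python
import math
from typing import Sequence


def logistic(x: float) -> float:
    """Numerically stable logistic (sigmoid) function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def poisson_log_pmf(k: int, rate: float) -> float:
    """log P(K = k) for a Poisson distribution with the given mean (rate > 0)."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def binding_posterior(features: Sequence[float], weights: Sequence[float], bias: float,
                      cleavage_count: int, rate_bound: float, rate_unbound: float) -> float:
    """Posterior probability that a candidate motif site is bound.

    The prior log-odds are a linear function of site-level genomic features
    (e.g. PWM score, conservation); the data term compares the observed DNase I
    cleavage count under a 'bound' and an 'unbound' Poisson rate.
    """
    prior_log_odds = bias + sum(w * f for w, f in zip(weights, features))
    likelihood_log_ratio = (poisson_log_pmf(cleavage_count, rate_bound)
                            - poisson_log_pmf(cleavage_count, rate_unbound))
    # Bayes' rule on the log-odds scale: posterior log-odds = prior log-odds + log likelihood ratio.
    return logistic(prior_log_odds + likelihood_log_ratio)


# Hypothetical site: PWM score 8.2, conservation 0.9; hypothetical weights and rates.
p = binding_posterior([8.2, 0.9], [0.4, 1.5], bias=-4.0,
                      cleavage_count=18, rate_bound=20.0, rate_unbound=4.0)
print(f"P(bound) = {p:.3f}")
```

Working on the log-odds scale keeps the prior (sequence and conservation evidence) and the experimental data term additive, which mirrors the idea of combining cell-type-specific and genomic information in one score.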