Project description:This dataset comprises 2570 whole genome sequenced samples from the Medical Genome Reference Bank.
https://sgc.garvan.org.au/initiatives/mgrb
The files are provided in CRAM format, aligned to hs37d5 with decoys, with no further processing applied.
The dataset also contains phenotype information for each sample.
Project description:This dataset comprises 1440 whole genome sequenced samples from the Medical Genome Reference Bank.
https://sgc.garvan.org.au/initiatives/mgrb
The files are provided in CRAM format, aligned to hs37d5 with decoys, with no further processing applied.
The dataset also contains phenotype information for each sample.
Project description:This study used the NimbleGen dog whole genome CGH 2.1M tiling array to assay copy number variants in the dog genome across multiple dog breeds and the gray wolf. A total of 53 genomic DNA samples were hybridized to a reference sample. The dataset comprises 2 samples from each of 15 dog breeds, 10 samples from each of 2 dog breeds, and 3 samples from gray wolf.
Project description:Consider the problem of designing a panel of complex biomarkers to predict a patient's health or disease state when one can pair his or her current test sample, called a target sample, with the patient's previously acquired healthy sample, called a reference sample. In contrast to a population-averaged reference, this reference sample is individualized. Automated predictor algorithms that compare and contrast the paired samples could lead to a new generation of test panels that use a person's healthy reference to enhance predictive accuracy. This study develops such an individualized predictor and illustrates the added value of including the healthy reference in the design of predictive gene expression panels. The objective is to predict each subject's state of infection, e.g., neither exposed nor infected, exposed but not infected, pre-acute phase of infection, acute phase of infection, or post-acute phase of infection. Using gene microarray data collected in a large-scale serially sampled respiratory virus challenge study, we quantify the diagnostic advantage of pairing a person's baseline reference with his or her target sample.
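The pairing idea can be illustrated with a minimal sketch. This is not the study's predictor. It only shows how a target sample might be expressed relative to the same subject's baseline before it is passed to any classifier. The function name `paired_features`, the gene labels, and the expression values are hypothetical.

```python
def paired_features(target, baseline):
    """Return per-gene differences (target minus the subject's own baseline).

    Both arguments map gene name -> expression value. Genes missing from
    either sample are skipped. This is an illustrative feature
    transformation only, not the predictor developed in the study.
    """
    return {
        gene: target[gene] - baseline[gene]
        for gene in target
        if gene in baseline
    }


# Hypothetical expression values for one subject.
baseline_sample = {"GENE_A": 4.0, "GENE_B": 3.5}
target_sample = {"GENE_A": 5.0, "GENE_B": 3.0, "GENE_C": 2.0}

features = paired_features(target_sample, baseline_sample)
```

A population-averaged reference would subtract the same vector from every subject. Here each subject contributes his or her own baseline.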
Project description:<p>Stanford contributed samples to the PAGE study that can act as a population reference dataset across the globe. Therefore this dataset includes reference individuals, without phenotypes, chosen to help infer ancestry that will help us understand the diverse samples available in PAGE. The complete dataset comprises individuals of European, African, Asian, Oceanian, and Native American descent, from a total of over 50 populations. A subset of these individuals from Puno, Peru and Easter Island (Rapa Nui), Chile, are included in the PAGE samples that were whole genome sequenced in 2015. Additional details are available in the Study Acknowledgments.</p> <p>The Global Reference Panel comprises 6 sample sets: <ul><li>A population sample of Andean individuals primarily of Quechuan/Aymaran ancestry from Puno, Peru</li> <li>A population sample of Easter Island (Rapa Nui), Chile</li> <li>Individuals of indigenous origin from Oaxaca, Mexico</li> <li>Individuals of indigenous origin from Honduras</li> <li>Individuals of indigenous origin from Colombia</li> <li>Individuals of indigenous origin from the Nama and Khomani KhoeSan populations of the Northern Cape, South Africa</li></ul> </p> <p>In addition, we genotyped publicly available samples that will be hosted on the Bustamante lab website (<a href="https://bustamantelab.stanford.edu/">https://bustamantelab.stanford.edu/</a>). 
These comprise large public datasets to provide an open reference dataset for the world:</p> <p> <ul><li>The additional related individuals from the Americas in the Human Genome Diversity Panel (H952) plus all additional samples from the Americas</li> <li>A subset of the unrelated individuals from the Maasai in Kinyawa, Kenya (MKK) dataset from the International HapMap Project hosted at Coriell</li></ul> </p> <p>Additional samples will be available for restricted use with a data access agreement with the Bustamante Lab.</p> <p>This study is part of the Population Architecture using Genomics and Epidemiology (PAGE) study (<a href="./study.cgi?study_id=phs000356">phs000356</a>). </p>
Project description:Despite the widespread adoption of ChIP-seq, there is still no consensus on quality assessment metrics. No single published metric can reliably discriminate between the success and failure of an experiment, which hampers the objectivity and reproducibility of quality control. We introduce a new framework for ChIP-seq data quality assessment that overcomes the limitations of previous solutions. Our tool, called "ChIC", incorporates a novel set of quality control metrics integrated into a single score summarizing sample quality, together with a reference compendium of thousands of published ChIP-seq samples for easier evaluation of new data. This test dataset contains an example of a successful and an unsuccessful ChIP-seq sample for mouse H3K27me3.
Project description:DNA was isolated from whole red blood cells from various lines and crosses of broiler chickens. DNA was genotyped using the Axiom genome-wide chicken array, and CEL files were analyzed using Axiom Analysis Suite Software (version 3.0.1) with Gallus gallus 5.0, following the software's Best Practices for agricultural animals. All genotype calls were exported (Genotyping_Data-3-21-2018.vcf), and a text file listing all SNPs with a call rate >= 97% was also produced for filtering the VCF file (ALL_SNPSs_with_Call_Rate_97_Plus_3-21-2018).
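The filtering step described above could be sketched as follows. This sketch makes two assumptions that the description does not confirm: the SNP list holds one identifier per line (first whitespace-separated token), and those identifiers match the ID column of the VCF. The helper names `load_snp_ids` and `filter_vcf_by_snp_ids` and the demo records are hypothetical.

```python
def load_snp_ids(lines):
    """Collect SNP identifiers, assuming one per line (first token)."""
    ids = set()
    for line in lines:
        tokens = line.split()
        if tokens:
            ids.add(tokens[0])
    return ids


def filter_vcf_by_snp_ids(vcf_lines, keep_ids):
    """Yield VCF header lines plus records whose ID column is in keep_ids."""
    for line in vcf_lines:
        if line.startswith("#"):
            # Meta-information and column header lines pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # The third tab-separated column of a VCF record is the variant ID.
        if len(fields) > 2 and fields[2] in keep_ids:
            yield line


# Hypothetical miniature inputs standing in for the exported files.
demo_snp_list = ["AX-0002\n", "\n", "AX-0003\n"]
demo_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\tAX-0001\tA\tG\t.\t.\t.\n",
    "1\t200\tAX-0002\tC\tT\t.\t.\t.\n",
]

keep = load_snp_ids(demo_snp_list)
kept_lines = list(filter_vcf_by_snp_ids(demo_vcf, keep))
```

With real files, the same functions can be fed open file handles instead of lists, since both iterate line by line.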
Project description:This dataset contains the genotypes jointly called from whole genome sequencing data of 177 self-reported Peranakans in Singapore. Reads were aligned to the GRCh37 reference genome and jointly called with other WGS samples. Basic quality control measures and population phasing without a reference were performed on the called genotypes. The data are stored in VCF v4.3 format, with each .vcf.gz file storing the genotypes for one of the 23 chromosomes (22 autosomes + X chromosome).