Genomics

Dataset Information

Ena-DATASET-EBI-TEST-16-05-2017-16:52:32:693-547 - samples


ABSTRACT: This is a test dataset derived from public data of the 1000 Genomes Project. It is not intended to support any inference about cohort data or results; its purpose is to help bioinformaticians develop and test tools, and to help data consumers learn how to access information. The dataset consists of 2508 samples from the 1000 Genomes Project (https://www.nature.com/articles/nature15393). Data for each sample (e.g. NA18534) can be accessed through the IGSR portal (e.g. https://www.internationalgenome.org/data-portal/sample/NA18534) or in the corresponding folder on the 1000 Genomes FTP site (e.g. http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/CHB/NA18534/exome_alignment/). The dataset encompasses several types of data: Variant Call Format files (VCF, or its binary counterpart BCF), both joint (e.g. ALL_chr22_20130502_2504Individuals.vcf.gz) and split (e.g. HG01775.chrY.vcf.gz); exome sequencing CRAM files (e.g. NA18534.GRCh38DH.exome.cram); and whole genome sequencing CRAM/BAM files (e.g. NA19239.cram). Additionally, multiple files were sliced into shorter files that download quickly. These are named "{FILE-INFO}__{NUMBER-OF-READS}r__{CHR}.{START-COORDINATE}-{END-COORDINATE}.{FILETYPE}" (e.g. "HG01500.GRCh38DH__90r__3.10000-10500__4.10000-10500.cram"). All files can be downloaded directly with the EGA download client PyEGA3 (https://github.com/EGA-archive/ega-download-client). For any further questions, please contact the DAC (Helpdesk, email: helpdesk [at] ega-archive [dot] org).
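
The naming scheme for the sliced files can be split back into its parts with a few lines of Python. The following is only a minimal sketch, not part of the dataset or of PyEGA3: the names `Region`, `SlicedFile` and `parse_sliced_name` are hypothetical, and the sketch assumes (based on the example above, which lists two regions) that the "{CHR}.{START-COORDINATE}-{END-COORDINATE}" part may repeat, separated by a double underscore, with the file type following the last region.

```python
import re
from typing import List, NamedTuple


class Region(NamedTuple):
    """One sliced region (hypothetical helper type)."""
    chrom: str
    start: int
    end: int


class SlicedFile(NamedTuple):
    """Components of a sliced file name (hypothetical helper type)."""
    file_info: str
    n_reads: int
    regions: List[Region]
    filetype: str


# "{NUMBER-OF-READS}r", e.g. "90r"
_READS = re.compile(r"(\d+)r")
# "{CHR}.{START}-{END}", optionally followed by ".{FILETYPE}"
_REGION = re.compile(r"([^.]+)\.(\d+)-(\d+)(?:\.(.+))?")


def parse_sliced_name(name: str) -> SlicedFile:
    """Split a sliced file name into its documented components.

    Assumption: fields are separated by a double underscore and only
    the last region carries the file type.
    """
    parts = name.split("__")
    if len(parts) < 3:
        raise ValueError(f"not a sliced file name: {name!r}")
    file_info, reads_part, *region_parts = parts

    reads = _READS.fullmatch(reads_part)
    if reads is None:
        raise ValueError(f"unexpected read-count field: {reads_part!r}")

    regions: List[Region] = []
    filetype = None
    for index, part in enumerate(region_parts):
        match = _REGION.fullmatch(part)
        if match is None:
            raise ValueError(f"unexpected region field: {part!r}")
        chrom, start, end, ext = match.groups()
        is_last = index == len(region_parts) - 1
        # The file type must appear on the last region and nowhere else.
        if is_last != (ext is not None):
            raise ValueError(f"misplaced file type in: {part!r}")
        regions.append(Region(chrom, int(start), int(end)))
        filetype = ext

    return SlicedFile(file_info, int(reads.group(1)), regions, filetype)


if __name__ == "__main__":
    example = "HG01500.GRCh38DH__90r__3.10000-10500__4.10000-10500.cram"
    print(parse_sliced_name(example))
```

Unsliced names such as "NA19239.cram" contain no double underscore, so the sketch rejects them with a `ValueError` rather than guessing at their structure.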

PROVIDER: EGAD00001003338 | EGA

REPOSITORIES: EGA

Similar Datasets

Date | Accession | Repository
| EGAD00001009826 | EGA
2017-12-01 | GSE107558 | GEO
2017-12-01 | GSE107559 | GEO
2017-06-24 | E-MTAB-5797 | biostudies-arrayexpress
| phs001854 | dbGaP
2014-04-17 | E-GEUV-6 | biostudies-arrayexpress
2017-10-16 | GSE104687 | GEO
2013-10-17 | E-MTAB-1885 | biostudies-arrayexpress
2020-07-21 | GSE150861 | GEO
2013-10-17 | E-MTAB-1884 | biostudies-arrayexpress