Project description:This experiment contains a subset of data from the BLUEPRINT Epigenome project (http://www.blueprint-epigenome.eu), which aims to produce reference haemopoietic epigenomes for the research community. 74 samples of primary cells or cultured primary cells of different haemopoietic lineages from cord blood, venous blood, bone marrow and thymus are included in this experiment. This ArrayExpress record contains only metadata. Raw data files have been archived by the consortium at the European Genome-phenome Archive (EGA, www.ebi.ac.uk/ega), with restricted access to protect the identities of sample donors. There are 32 EGA data set accessions, which can be found under the Comment[EGA_DATA_SET] column in the 'Sample Data Relationship Format' (SDRF) file of this ArrayExpress record (http://www.ebi.ac.uk/arrayexpress/files/E-MTAB-3827/E-MTAB-3827.sdrf.txt). Details on how to apply for data access via the BLUEPRINT data access committee are on the EGA data set pages. Likewise, the mapping of samples to these EGA accessions can be found in the SDRF file. Please note that the raw data files for 11 sequencing runs have not yet been deposited at EGA; these runs are marked "not available" under the Comment[SUBMITTED_FILE_NAME] field in the SDRF file and are included for the sake of completeness. Further information on individual samples and sequencing libraries can also be found on the BLUEPRINT data coordination centre (DCC) website: http://dcc.blueprint-epigenome.eu
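Because the sample-to-accession mapping lives in a tab-delimited SDRF file, it can be extracted programmatically. The following is a minimal Python sketch, not an official ArrayExpress or EGA tool. It assumes that the SDRF has a "Source Name" column identifying each sample and that runs without deposited files carry the literal value "not available" in Comment[SUBMITTED_FILE_NAME]; the function name map_samples_to_ega is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Column names taken from the record; "Source Name" is ASSUMED to identify the sample.
SAMPLE_COL = "Source Name"
DATASET_COL = "Comment[EGA_DATA_SET]"
FILE_COL = "Comment[SUBMITTED_FILE_NAME]"
# ASSUMED literal marker for runs whose raw files are not yet at EGA.
MISSING_MARKER = "not available"


def map_samples_to_ega(lines: Iterable[str]) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Group samples by EGA data set accession and list samples of undeposited runs.

    `lines` may be an open text file or any iterable of tab-delimited lines.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # SDRF headers may repeat (e.g. several "Term Source REF" columns), so
    # csv.DictReader would keep only the last duplicate; use the first
    # occurrence of each needed column instead.
    s_idx = header.index(SAMPLE_COL)
    d_idx = header.index(DATASET_COL)
    f_idx = header.index(FILE_COL)
    width = max(s_idx, d_idx, f_idx)

    by_dataset: Dict[str, Set[str]] = defaultdict(set)
    pending: List[str] = []
    for row in reader:
        if len(row) <= width:
            continue  # skip blank or truncated lines
        sample = row[s_idx].strip()
        dataset = row[d_idx].strip()
        if dataset:
            by_dataset[dataset].add(sample)
        # A sample with several undeposited rows (runs) is listed once per row.
        if row[f_idx].strip().lower() == MISSING_MARKER:
            pending.append(sample)
    return dict(by_dataset), pending
```

For example, the downloaded E-MTAB-3827.sdrf.txt could be opened with `newline=""` and the handle passed to map_samples_to_ega; the number of keys in the first result can then be compared against the 32 data set accessions stated in the record. Indexing by column position rather than using csv.DictReader is a deliberate choice, since SDRF files commonly repeat header names.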
Project description:High-throughput molecular profiling techniques routinely generate vast amounts of data for translational medicine studies. Because these data are personally identifiable, secure access-controlled systems are needed to manage, store, transfer and distribute them. The European Genome-phenome Archive (EGA) was created to facilitate access to, and management of, long-term archives of biomolecular data. Each data provider is responsible for ensuring that a Data Access Committee is in place to grant access to data stored in the EGA. Moreover, data are encrypted in transit during both upload and download. ELIXIR, a European research infrastructure for life-science data, initiated a project (2016 Human Data Implementation Study) to understand and document the ELIXIR requirements for secure management of controlled-access data. As part of this project, a full ecosystem was designed to connect archived raw experimental molecular profiling data with interpreted data and computational workflows, using the CTMM Translational Research IT (CTMM-TraIT) infrastructure (http://www.ctmm-trait.nl) as an example. Here we present the first outcome of this project: a framework that enables EGA data to be downloaded securely to a Galaxy server. Galaxy provides an intuitive user interface for molecular biologists and bioinformaticians to design and run data analysis workflows. More specifically, we developed a tool, ega_download_streamer, that securely downloads data from EGA into a Galaxy server, where the data can then be processed further. This tool allows a user to run, from within the browser, an entire analysis involving sensitive data from EGA, and to make this analysis available to other researchers in a reproducible manner, as shown in a proof-of-concept study. The tool ega_download_streamer is available in the Galaxy Tool Shed: https://toolshed.g2.bx.psu.edu/view/yhoogstrate/ega_download_streamer.
Project description:This experiment contains a subset of data from the BLUEPRINT Epigenome project (http://www.blueprint-epigenome.eu), which aims to produce reference haemopoietic epigenomes for the research community. 4 samples of primary tonsil cells with the cell surface markers CD20med/CD38high from young individuals (3 to 10 years old) are included in this experiment. This ArrayExpress record contains only metadata. Raw data files have been archived by the consortium at the European Genome-phenome Archive (EGA, www.ebi.ac.uk/ega), with restricted access to protect the identities of sample donors. The relevant EGA data set accession is EGAD00001001523. Details on how to apply for data access via the BLUEPRINT data access committee are on the EGA data set pages. The mapping of samples to this EGA accession can be found in the 'Sample Data Relationship Format' file of this ArrayExpress record. Information on individual samples and sequencing libraries can also be found on the BLUEPRINT data coordination centre (DCC) website: http://dcc.blueprint-epigenome.eu
Project description:This experiment contains a subset of data from the BLUEPRINT Epigenome project (http://www.blueprint-epigenome.eu), which aims to produce reference haemopoietic epigenomes for the research community. 29 samples of primary cells or cultured primary cells of different haemopoietic lineages from cord blood are included in this experiment. This ArrayExpress record contains only metadata. Raw data files have been archived by the consortium at the European Genome-phenome Archive (EGA, www.ebi.ac.uk/ega), with restricted access to protect the identities of sample donors. The relevant EGA data set accession is EGAD00001001165. Details on how to apply for data access via the BLUEPRINT data access committee are on the EGA data set pages. The mapping of samples to this EGA accession can be found in the 'Sample Data Relationship Format' file of this ArrayExpress record. Information on individual samples and sequencing libraries can also be found on the BLUEPRINT data coordination centre (DCC) website: http://dcc.blueprint-epigenome.eu
Project description:The availability of high-throughput molecular profiling techniques has provided more accurate and informative data for routine clinical studies. Nevertheless, complex computational workflows are required to interpret these data. Over the past years, the data volume has grown explosively, requiring robust management of human data to organise and integrate it efficiently. For this reason, we set up an ELIXIR implementation study, together with the Translational research IT (TraIT) programme, to design a data ecosystem that is able to link raw and interpreted data. In this project, the data from the TraIT Cell Line Use Case (TraIT-CLUC) are used as a test case for this system. Within this ecosystem, we use the European Genome-phenome Archive (EGA) to store raw molecular profiling data; tranSMART to collect interpreted molecular profiling data and clinical data for the corresponding samples; and Galaxy to store, run and manage the computational workflows. These data are integrated by systematically linking their repositories. To showcase our design, we have structured the TraIT-CLUC data, which contain a variety of molecular profiling data types, for storage in both tranSMART and EGA. The metadata provided allow cross-referencing between tranSMART and EGA, completing the cycle of data submission and discovery; we have also designed a data flow from EGA to Galaxy that enables reanalysis of the raw data in Galaxy. In this way, users can select patient cohorts in tranSMART, trace them back to the raw data and perform (re)analysis in Galaxy. Our conclusion is that most metadata need not be stored (redundantly) in both databases; instead, FAIR persistent identifiers should be available for well-defined data ontology levels: study, data access committee, physical sample, data sample and raw data file. This approach will pave the way for the stable linkage and reuse of data.
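The conclusion that repositories need only share persistent identifiers can be illustrated with a minimal Python sketch of an index that stores nothing but identifiers at the five levels named above (study, data access committee, physical sample, data sample, raw data file) and the links between them. All class, field and function names (IdentifierIndex, DataSample, RawDataFile, trace_cohort) and the example identifiers are hypothetical, and the assumption that each data sample records its governing data access committee is ours; neither tranSMART nor EGA is claimed to expose this structure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, NamedTuple, Set


@dataclass(frozen=True)
class DataSample:
    """Hypothetical data-level sample linked to a physical sample and a DAC."""
    pid: str
    physical_sample_pid: str
    dac_pid: str  # ASSUMPTION: access is governed per data sample


@dataclass(frozen=True)
class RawDataFile:
    pid: str
    data_sample_pid: str


class TraceResult(NamedTuple):
    raw_files: List[str]  # raw-file identifiers to fetch (e.g. into Galaxy)
    dacs: Set[str]        # committees a user must apply to for access
    studies: Set[str]     # studies the cohort belongs to


@dataclass
class IdentifierIndex:
    """Holds only identifiers and cross-level links; no other metadata is duplicated."""
    study_of_physical_sample: Dict[str, str]
    data_samples: List[DataSample]
    raw_files: List[RawDataFile]

    def trace_cohort(self, physical_sample_pids: Iterable[str]) -> TraceResult:
        """Follow a cohort of physical samples (e.g. selected in tranSMART) down to raw files."""
        cohort = set(physical_sample_pids)
        selected = {ds.pid: ds for ds in self.data_samples if ds.physical_sample_pid in cohort}
        files = sorted(f.pid for f in self.raw_files if f.data_sample_pid in selected)
        dacs = {ds.dac_pid for ds in selected.values()}
        studies = {
            self.study_of_physical_sample[p]
            for p in cohort
            if p in self.study_of_physical_sample
        }
        return TraceResult(files, dacs, studies)
```

In this sketch each repository would keep its own descriptive metadata and resolve any of these identifiers locally, which is the non-redundant arrangement the conclusion argues for.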