A mechanism for controlled access to GWAS data: experience of the GAIN Data Access Committee.
ABSTRACT: The Genetic Association Information Network (GAIN) Data Access Committee was established in June 2007 to provide prompt and fair access to data from six genome-wide association studies through the database of Genotypes and Phenotypes (dbGaP). Of 945 project requests received through 2011, 749 (79%) have been approved; median receipt-to-approval time decreased from 14 days in 2007 to 8 days in 2011. Over half (54%) of the proposed research uses were for GAIN-specific phenotypes; other uses were for method development (26%) and adding controls to other studies (17%). Eight data-management incidents, defined as compromises of any of the data-use conditions, occurred among nine approved users; most were procedural violations, and none violated participant confidentiality. Over 5 years of experience with GAIN data access has demonstrated substantial use of GAIN data by investigators from academic, nonprofit, and for-profit institutions with relatively few and contained policy violations. The availability of GAIN data has allowed for advances in both the understanding of the genetic underpinnings of mental-health disorders, diabetes, and psoriasis and the development and refinement of statistical methods for identifying genetic and environmental factors related to complex common diseases.
Project description:The scientific and public health benefits of mandatory data-sharing mechanisms must be actively demonstrated. To this end, we manually reviewed 2724 data access requests approved between June 2007 and August 2010 through the U.S. National Center for Biotechnology Information database of genotypes and phenotypes (dbGaP). Our analysis demonstrates that dbGaP enables a wide range of secondary research by investigators from academic, governmental, and nonprofit and for-profit institutions in the United States and abroad. However, limitations in public reporting preclude the tracing of outcomes from secondary research to longer-term translational benefit.
Project description:The database of Genotypes and Phenotypes (dbGaP) Data Browser (https://www.ncbi.nlm.nih.gov/gap/ddb/) was developed in response to requests from the scientific community for a resource that enables view-only access to summary-level information and individual-level genotype and sequence data associated with phenotypic features maintained in the controlled-access tier of dbGaP. Until now, the dbGaP controlled-access environment required investigators to submit a data access request, wait for Data Access Committee review, download each data set, and examine it locally for potentially relevant information. Existing unrestricted-access genomic data browsing resources (e.g. http://evs.gs.washington.edu/EVS/, http://exac.broadinstitute.org/) provide only summary statistics or aggregate allele frequencies. The dbGaP Data Browser serves as a third solution, providing researchers with view-only access to a compilation of individual-level data from general research use (GRU) studies through a simplified controlled-access process. The National Institutes of Health (NIH) will continue to improve the Browser in response to user feedback and believes that this tool may decrease unnecessary download requests, while still facilitating responsible genomic data-sharing.
Project description:Background/Aims: The Kaiser Permanente Northern California (KPNC) Research Program on Genes, Environment, and Health (RPGEH) provides a research resource to support investigations of environmental and genetic factors in the development of a wide variety of conditions. While the resource is still evolving with new data collection, it consists of data from surveys and electronic medical records on over 400,000 adult members of KPNC; biospecimens collected and stored on ~200,000 of these individuals; data from genome-wide and telomere-length assays on ~110,000 of those who have contributed biospecimens; and linkage of these members to environmental, area-level databases. Methods: The RPGEH was developed in part with the understanding that it would be made available to the scientific community for appropriate studies. An Access and Collaborations Core has developed procedures for submission of applications for research studies, their review, and decisions on approval and support. Review of proposals by an Applications Review Committee follows a two-step process, with a pre-application to assess feasibility (e.g., adequate numbers of the phenotype, availability of appropriate data given inclusion criteria) and a full application to assess appropriateness of the study in the RPGEH context. Scientific merit; alignment with RPGEH guiding principles, including ethical, legal, and social implications; consistency with informed consents; potential overlap with prior approvals; and collaboration with a researcher affiliated with the Division of Research are among the criteria for approval. 
As an alternative for select analyses, genomic and selected phenotypic data will be available in the NIH database of genotypes and phenotypes (dbGaP) for the substantial subset of RPGEH participants who have consented to dbGaP deposition. Results: As of October 31, 2012, the RPGEH has received 74 pre-applications and full applications for the use of its resources; only 6 pre-applications were not approved. In 2011–2012, 13 approved applications were funded by NIH and other agencies. Studies currently underway include genome-wide association studies of prostate cancer, bipolar disorder, multiple sclerosis, and mammographic density. Conclusions: Access to the unique and outstanding research resources of the RPGEH balances the mission of promoting research with the need to shepherd finite resources and safeguard member confidentiality.
Project description:<h4>Summary</h4>Based on the Genomic Data Sharing Policy issued in August 2007, the National Institutes of Health (NIH) has supported several repositories such as the database of Genotypes and Phenotypes (dbGaP). dbGaP is an online repository that provides access to large-scale genetic and phenotypic datasets from more than 1000 studies. However, navigating the website and understanding the relationships between the studies are not easy tasks. Moreover, decrypting the files is a complex procedure. In this study we propose the dbgap2x R package, which provides a broad range of functions for searching dbGaP studies, exploring the characteristics of a study, and easily decrypting the files from dbGaP.<h4>Availability and implementation</h4>dbgap2x is an R package with the code available at https://github.com/gversmee/dbgap2x. A containerized version including the package, a Jupyter server, and an example notebook is available at https://hub.docker.com/r/gversmee/dbgap2x.<h4>Supplementary information</h4>Supplementary data are available at Bioinformatics online.
Project description:A global, comprehensive and open access listing of approved anticancer drugs does not currently exist. Partial information is available from multiple sources, including regulatory authorities, national formularies and scientific agencies. Many such data sources include drugs used in oncology for supportive care, diagnostic or other non-antineoplastic uses. We describe a methodology to combine and cleanse relevant data from multiple sources to produce an open access database of drugs licensed specifically for therapeutic antineoplastic purposes. The resulting list is provided as an open access database (http://www.redo-project.org/cancer-drugs-db/) so that it may be used by researchers as input for further research projects, for example literature-based text mining for drug repurposing.
Project description:Longitudinal analysis of supermarkets over time is essential to understanding the dynamics of foodscape environments for healthy living. Supermarkets for 2007, 2011, and 2014 for the City of Chicago were curated and further validated. The average distance to all supermarkets along the street network was constructed for each resident-populated census tract. These analytic results were generated with GIS software and stored as spatially enabled data files, facilitating further research and analysis. The data presented in this article are related to the research article entitled "Urban foodscape trends: Disparities in healthy food access in Chicago, 2007-2014" (Kolak et al., 2018).
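The tract-level measure described above (average street-network distance from a census tract to supermarkets) can be illustrated with a minimal sketch. It is not the authors' GIS workflow: the toy graph, the node names, and the helper functions `shortest_path_lengths` and `mean_network_distance` are all assumptions made for illustration, and a plain Dijkstra search stands in for the network analysis a GIS package would perform.

```python
import heapq


def add_street(graph, a, b, length_km):
    """Add an undirected street segment to an adjacency-dict graph."""
    graph.setdefault(a, {})[b] = length_km
    graph.setdefault(b, {})[a] = length_km


def shortest_path_lengths(graph, source):
    """Dijkstra: shortest network distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, length in graph.get(node, {}).items():
            nd = d + length
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def mean_network_distance(graph, origin, destinations):
    """Average shortest-path distance from origin to the reachable destinations."""
    dist = shortest_path_lengths(graph, origin)
    reachable = [dist[d] for d in destinations if d in dist]
    if not reachable:
        return None
    return sum(reachable) / len(reachable)


# Hypothetical toy network: one tract centroid, two intersections, two stores.
streets = {}
add_street(streets, "tract_A", "x1", 1.0)
add_street(streets, "x1", "store_1", 2.0)
add_street(streets, "x1", "x2", 1.5)
add_street(streets, "x2", "store_2", 0.5)
add_street(streets, "tract_A", "x2", 4.0)

average_km = mean_network_distance(streets, "tract_A", ["store_1", "store_2"])
```

Repeating the calculation with the supermarket lists for each study year would give one value per tract per year, which is the shape of the longitudinal comparison the abstract describes.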
Project description:According to World Health Organization (WHO) prevalence estimates, 1.1 million people in Mexico are infected with Trypanosoma cruzi, the etiologic agent of Chagas disease (CD). However, limited information is available about access to antitrypanosomal treatment. This study assesses the extent of access in Mexico, analyzes the barriers to access, and suggests strategies to overcome them. Semi-structured in-depth interviews were conducted with 18 key informants and policymakers at the national level in Mexico. Data on CD cases, relevant policy documents and interview data were analyzed using the Flagship Framework for Pharmaceutical Policy Reform policy interventions: regulation, financing, payment, organization, and persuasion. Data showed that 3,013 cases were registered nationally from 2007-2011, representing 0.41% of total expected cases based on Mexico's national prevalence estimate. In four of five years, new registered cases were below national targets by 11-36%. Of 1,329 cases registered nationally in 2010-2011, 834 received treatment, 120 were pending treatment as of January 2012, and the treatment status of 375 was unknown. The analysis revealed that the national program mainly coordinated donation of nifurtimox and that important obstacles to access include the exclusion of antitrypanosomal medicines from the national formulary (regulation), historical exclusion of CD from the social insurance package (organization), absence of national clinical guidelines (organization), and limited provider awareness (persuasion). Efforts to treat CD in Mexico indicate an increased commitment to addressing this disease. Access to treatment could be advanced by improving the importation process for antitrypanosomal medicines and adding them to the national formulary, increasing education for healthcare providers, and strengthening clinical guidelines. 
These recommendations have important implications for other countries in the region with similar problems in access to treatment for CD.
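As a quick consistency check on the case counts quoted in this abstract, the short sketch below recomputes the treatment-status shares and the expected-case total implied by the 0.41% figure. The variable names are illustrative, and the implied total is only approximate because the published percentage is rounded.

```python
# Figures quoted in the abstract.
registered_2007_2011 = 3013
share_of_expected = 0.0041  # 0.41%, rounded in the source

registered_2010_2011 = 1329
status_counts = {"treated": 834, "pending": 120, "unknown": 375}

# The three status groups should account for every 2010-2011 case.
status_total = sum(status_counts.values())

# Percentage of 2010-2011 cases in each status group, to one decimal place.
status_shares = {
    name: round(100 * count / registered_2010_2011, 1)
    for name, count in status_counts.items()
}

# Expected-case total implied by "3,013 cases = 0.41% of expected cases".
implied_expected_cases = registered_2007_2011 / share_of_expected

print(status_total, status_shares, round(implied_expected_cases))
```

The status groups sum to the 1,329 registered cases, so the breakdown in the abstract is internally consistent.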
Project description:Summary: Human biomedical datasets that are critical for research and clinical studies to benefit human health also often contain sensitive or potentially identifying information of individual participants. Thus, care must be taken when they are processed and made available to comply with ethical and regulatory frameworks and informed consent data conditions. To enable and streamline data access for these biomedical datasets, the Global Alliance for Genomics and Health (GA4GH) Data Use and Researcher Identities (DURI) work stream developed and approved the Data Use Ontology (DUO) standard. DUO is a hierarchical vocabulary of human- and machine-readable data use terms that consistently and unambiguously represents a dataset’s allowable data uses. DUO has been implemented by major international stakeholders such as the Broad and Sanger Institutes and is currently used in annotation of over 200,000 datasets worldwide. Using DUO in data management and access facilitates researchers’ discovery and access of relevant datasets. DUO annotations increase the FAIRness of datasets and support data linkages using common data use profiles when integrating the data for secondary analyses. DUO is implemented in the Web Ontology Language (OWL) and, to increase community awareness and engagement, hosted in an open, centralized GitHub repository. DUO, together with the GA4GH Passport standard, offers a new, efficient, and streamlined data authorization and access framework that has enabled increased sharing of biomedical datasets worldwide. 
Highlights: Biomedical advances depend on the efficient and compliant re-use of sensitive human data. The Data Use Ontology standardizes terms and definitions for consented data uses. The Data Use Ontology facilitates discovery of, request for, and access to datasets. Over 200,000 datasets worldwide have been annotated using the Data Use Ontology.
The GA4GH Data Use Ontology (DUO) provides unambiguous, machine-readable standard language for consent forms and the data sharing policies they represent. Lawson et al. describe the DUO standard and implementations throughout the data access workflow to expedite data access while maintaining or improving compliant processes.
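To make the idea of a hierarchical, machine-readable data use vocabulary concrete, here is a minimal sketch of how a dataset's permission might be matched against a requested research purpose. It is a simplified illustration, not the DUO OWL file or any GA4GH reference implementation: the `PARENT` map is an assumed toy hierarchy built from DUO-style shorthand (GRU, HMB, DS, POA), the function names are hypothetical, and real matching also has to handle modifiers and the specific disease attached to a disease-specific term.

```python
# Toy hierarchy (an assumption for illustration): each key is a more specific
# use, and its value is the broader permission that contains it.
PARENT = {
    "HMB": "GRU",  # health/medical/biomedical research, within general research use
    "DS": "HMB",   # disease-specific research, within HMB
    "POA": "GRU",  # population origins or ancestry research, within GRU
}


def self_and_ancestors(term):
    """Return the term followed by each progressively broader term."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def permits(dataset_permission, research_purpose):
    """A dataset permits a purpose that equals, or is narrower than, its permission."""
    return dataset_permission in self_and_ancestors(research_purpose)


if __name__ == "__main__":
    for permission, purpose in [("GRU", "DS"), ("DS", "GRU"), ("HMB", "POA")]:
        print(permission, "dataset permits", purpose, "request:",
              permits(permission, purpose))
```

Under these assumptions, a dataset consented for general research use accepts a disease-specific request, while a disease-specific dataset does not accept a general research request.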
Project description:<h4>Objective</h4>To examine the contributions of individual- and neighborhood-level spatial access to care to the utilization of emergency departments (EDs) for preventable conditions through implementation of novel local spatial access measures.<h4>Data sources/study setting</h4>Emergency department admissions data are from four HealthLNK member hospitals in Chicago from 2007 to 2011. Primary care physician office and clinic locations were obtained from the American Medical Association and the City of Chicago.<h4>Study design</h4>Multilevel logit regression was used to model the relationship between individual- and neighborhood-level attributes and preventable ED use.<h4>Data collection/extraction methods</h4>Emergency department admissions data were classified based on the primary diagnosis for each encounter. Spatial access to care indices were generated in ArcGIS, and values were extracted at each ZIP code centroid to match patients' ZIP codes.<h4>Principal findings</h4>Beyond sociodemographic factors such as gender and race, patients living in medically underserved areas (MUAs) and areas with lower spatial access to primary care clinics had higher odds of preventable ED use.<h4>Conclusions</h4>Preventable ED use can be associated with sociodemographic characteristics, as well as spatial access to primary care services. This study reveals potential for using local measures of spatial accessibility for preventable ED analyses.
Project description:Academic graphs are essential for communicating complex scientific ideas and results. To ensure that these graphs truthfully reflect underlying data and relationships, visualization researchers have proposed several principles to guide the graph creation process. However, the extent of violations of these principles in academic publications is unknown. In this work, we develop a deep learning-based method to accurately measure violations of the proportional ink principle (AUC = 0.917), which states that the size of shaded areas in graphs should be consistent with their corresponding quantities. We apply our method to analyze a large sample of bar charts contained in 300K figures from open access publications. Our results estimate that 5% of bar charts contain proportional ink violations. Further analysis reveals that these graphical integrity issues are significantly more prevalent in some research fields, such as psychology and computer science, and in some regions of the globe. Additionally, we find no temporal or seniority trends in violations. Finally, apart from openly releasing our large annotated dataset and method, we discuss how computational research integrity could be part of the peer-review and publication processes.
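The proportional ink principle itself can be stated in a few lines of code. The sketch below is not the deep learning method described in the abstract, which works on figure images; it assumes the bar values and the y-axis baseline are already known, and the function names are hypothetical. A bar chart satisfies the principle when each bar's drawn height is the same constant multiple of its value, which fails when the axis is truncated at a nonzero baseline.

```python
def drawn_heights(values, baseline):
    """Height of ink drawn for each bar when the y-axis starts at `baseline`."""
    return [v - baseline for v in values]


def violates_proportional_ink(values, baseline=0.0, tol=1e-9):
    """True if drawn bar heights are not proportional to the underlying values."""
    heights = drawn_heights(values, baseline)
    ratios = [h / v for h, v in zip(heights, values) if v != 0]
    if not ratios:
        return False
    # Proportional ink means height/value is the same for every bar.
    return max(ratios) - min(ratios) > tol


values = [50.0, 60.0]
zero_based = violates_proportional_ink(values, baseline=0.0)
truncated = violates_proportional_ink(values, baseline=40.0)
```

With the axis truncated at 40, the two bars are drawn with heights 10 and 20, so the second bar looks twice as large although its value is only 20% higher.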