Dataset Information

eMERGE Network Phase III: HRC Imputed Array Data of 83,000+ Participants

ABSTRACT:

The Electronic Medical Records and Genomics (eMERGE) Network is a National Institutes of Health (NIH)-organized and funded consortium of U.S. medical research institutions. The primary goal of the eMERGE Network is to develop, disseminate, and apply approaches to research that combine biorepositories with electronic medical record (EMR) systems for genomic discovery and genomic medicine implementation research. eMERGE was announced in September 2007 and began its third phase in September 2015. eMERGE Phase III (September 2015 - May 2019) consists of nine study sites, two central sequencing and genotyping facilities, and a coordinating center.

Included in this study are:

Haplotype Reference Consortium (HRC) imputed array data of ~83,000 eMERGE participants from nine Phase III study sites and three Phase II study site collaborators.
Corresponding demographics and body mass index measurements.
Case/control status for the following phenotypes: Abdominal aortic aneurysm; Ace-Inhibitor/Cough; Attention Deficit Hyperactivity Disorder; Age-related macular disease; Appendicitis; Asthma; Atopic Dermatitis; Autism; Benign Prostatic Hyperplasia; Carotid artery disease as a Quantitative Measure; caMRSA; Cataract; Clostridium difficile colitis; Extreme Obesity; Chronic Kidney Disease; Chronic Kidney Disease and Type 2 Diabetes; Chronic Kidney Disease, Type 2 Diabetes and Hypertension; Colon Polyps; Cardiorespiratory Fitness; Dementia; Diverticulosis; Diabetic retinopathy; Gastroesophageal Reflux Disease; Glaucoma; Height; Heart failure; Hypothyroidism; Lipids; Ocular hypertension; Peripheral Arterial Disease; QRS duration; Red blood cell indices; Remission of Diabetes after ROUX-EN-Y gastric bypass surgery; Resistant hypertension; MACE while on Statins; Type 2 Diabetes; Venous Thromboembolism; White blood cell indices; and Zoster virus infection.

Study sites and participants include:

Boston Children's Hospital: The Gene Partnership (TGP) is a prospective longitudinal registry at Boston Children's Hospital (BCH) to study the genetic and environmental contributions to childhood health and disease, collect genetic information on a large number of children who have been phenotyped, and implement the Informed Cohort and the Informed Cohort Oversight Board (ICOB). The term "The Gene Partnership" reflects a partnership between researchers and participants. Children seen at BCH are offered enrollment, as are their parents and siblings. DNA is collected on all enrollees. BCH has a comprehensive EMR system, and virtually all inpatient and outpatient data are captured electronically. Clinical data in the BCH EMR are loaded into the i2b2 data warehouse, which is available to investigators. Cases, phenotypes, and covariates are ascertained using the i2b2 database. Participants at BCH in TGP have consented to receive any research result and/or incidental finding that arises from studies using TGP that is approved by the Informed Cohort Oversight Board (ICOB) and is in accordance with the participants' preferences; results are returned through the Personally Controlled Health Record (PCHR). BCH and Cincinnati Children's Hospital Medical Center (CCHMC) have partnered as the Pediatric Alliance for Genomic and Electronic Medical Record (EMR) Research (PAGER) site for the eMERGE Phase II network for pediatric institutions, and the cohort for eMERGE at BCH is TGP.

Children's Hospital of Philadelphia (CHOP): The Center for Applied Genomics (CAG) at the Children's Hospital of Philadelphia (CHOP) is a high-throughput, highly automated genotyping and sequencing facility equipped with state-of-the-art genotyping and sequencing platforms. Children who are treated at the Children's Hospital Healthcare Network and their parents may be eligible to take part in a major initiative to collect more than 100,000 blood samples, covering a wide range of pediatric diseases. A large majority of participants consenting to prospective genomic analyses also consent to analysis of their de-identified electronic health records (EHRs). EHRs are longitudinal, with a mean duration of 6.5 years.

Cincinnati Children's Hospital Medical Center/Boston Children's Hospital (CCHMC/BCH): Cincinnati Children's Hospital Medical Center (CCHMC) is a pediatric institution dedicated to improving health and welfare of children and to the discovery and practical application of new genomic information to the ordinary care of children. CCHMC brings an extraordinary faculty to eMERGE III who are committed to gaining a better understanding of the genesis of disease and to elucidating the mechanisms of diseases that afflict children, specifically pediatric disease phenotypes that will leverage the available eMERGE adult genomic data and electronic medical records (EMRs) to discover meaningful use results. Generation of EMR phenotype algorithms, informed by natural language processing, using heuristic and machine learning methods is ongoing. CCHMC has developed tools to evaluate adolescent return of results preferences, examined the ethical and legal obligations and potential to reanalyze results, and developed clinical decision support for phenotyping, test ordering, and returning sequencing results. The success of these eMERGE III studies is enhanced by the ongoing institutional investment made in the CCHMC Biobank, the comprehensive EMR (EPIC), the i2b2 de-identified medical record data warehouse, and hundreds of faculty and senior staff who make genomics or informatics an active focus of their research.

Columbia University: Columbia University Medical Center/New York Presbyterian (CUMC/NYP) Hospital system is one of the nation's largest and most comprehensive hospital systems with over 2 million inpatient and outpatient yearly visits that serves a racially and ethnically diverse urban patient population. The Columbia University GENomic Integration with EHR (GENIE) research study contributed and shared phenotype and genotype data for individuals who were recruited as part of a diverse array of initiatives within the hospital, including Northern Manhattan Study (NOMAS), Pediatric Cardiac Genomic Consortium (PCGC), Caribbean Hispanics with Familial and Sporadic Late Onset Alzheimer's disease (AD), Alzheimer's Disease Sequencing Project (ADSP), and Genetics of Chronic Kidney Disease study. Some of these individuals had kidney or neurological problems, some were healthy adult volunteers with self-reported health status information from the medically underserved Northern Manhattan community, and others were pediatric patients with cardiac conditions. For the kidney disease cohort, patients with the diagnosis of Chronic Kidney Disease (CKD) and healthy controls were recruited to the Columbia University CKD biobank. For the NOMAS cohort, eligible participants were stroke-free, were 40 years old, and resided for at least 3 months in a Northern Manhattan household with a telephone. The PCGC study recruited parent-offspring trios with pediatric probands diagnosed with congenital heart defects (CHD). For the Caribbean Hispanics with Alzheimer's disease project, individuals from families affected by AD and with sporadic AD were recruited, along with unrelated controls. Samples for the ADSP study have been selected from well-characterized cohorts of individuals with AD diagnosis.

Geisinger Health System: A research cohort of adult Geisinger Clinic patients was enrolled from community-based primary care clinics of the Geisinger Health System. Patients were eligible for enrollment if they were a primary care patient of a Geisinger Clinic physician and were scheduled for a non-emergent clinic visit. All participants provided written informed consent and HIPAA authorization. Consenting patients agreed to provide blood samples for broad biomedical research use, and permission to access data in their Geisinger electronic medical record for research. The enrollment rate was 90% of patients approached. The demographics of the cohort approximate those of the Geisinger Clinic outpatient population. Research blood samples were collected during an outpatient clinical phlebotomy encounter. Research blood samples are coded and stored in a central biorepository. Samples are linkable to clinical data in a de-identified manner for research via an IRB-approved data broker process. For genomic analysis, DNA is extracted from EDTA-anticoagulated whole blood.

Partners Health Care (Harvard): The Partners HealthCare Biobank is a large research program designed to help researchers understand how people's health is affected by their genes, lifestyle, and environment. This large research data and sample repository provides access to high-quality, consented blood samples to help foster research, advance our understanding of the causes of common diseases, and advance the practice of medicine. For the Partners research community (Massachusetts General Hospital and Brigham and Women's Hospital), the Biobank provides:

Banked samples (plasma, serum, and DNA) collected from consented patients
Blood samples that were discarded after clinical testing in the Crimson Cores maintained in the Brigham and Women's Hospital and Massachusetts General Hospital Pathology Departments
Sample handling and preparation services
Links from the biobank data to the Partners Research Patient Data Registry (RPDR), a research instance of our electronic clinical chart
Data access through our research portal.

To date, over 60,000 Partners patients have consented to enroll, give a blood sample, receive research results, and be re-contacted for additional research studies. The Biobank has enabled Partners investigators to compete for nationally recognized grants in personalized medicine, such as a clinical electronic Medical Records and Genomics (eMERGE) network site and the national All of Us program. The Biobank currently supports over 120 Partners investigators and over 100 million dollars in NIH research.

Kaiser Permanente Washington with the University of Washington and the Fred Hutchinson Cancer Research Center: KPWA participants were enrolled in the eMERGE Network through the Northwest Institute of Genetic Medicine (NWIGM) biorepository, and provided the appropriate consent to receive clinically relevant genetic results (N~8,073). NWIGM is based at the University of Washington and co-managed by the University of Washington and KPWA. The purpose of the NWIGM biorepository is to build infrastructure and resources to carry out a broad range of future genetic research. KPWA members enrolled in the biorepository are asked to give informed consent to provide a DNA sample for storage in the NWIGM biorepository. The consent is purposefully broad to serve the dual purpose of reducing the burden on researchers who wish to use this biorepository and on the IRB committees who will be responsible for reviewing these requests in the future. Participants were eligible if aged 50 - 65 years old at the time of their enrollment into the NWIGM repository, living, enrolled in KPWA's integrated group practice, and had completed an online Health Risk Appraisal. The selection algorithm was based on several data sources from the EHR at KPWA. 1) Demographics - participants with self-reported race as Asian ancestry were prioritized and selected to enrich for non-European ancestry. The KPWA eMERGE cohort includes n=1,245 members of Asian ancestry. 2) Participants were also selected for a history of colorectal cancer (n=1,002), in order to enrich for germline pathogenic variants.
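The KPWA eligibility and prioritization rules described above can be sketched as a small filter. This is only an illustration: the `Member` record, its field names, and the inclusive treatment of the 50 - 65 age bounds are assumptions made for the example, not part of the study documentation.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical record; field names are illustrative, not from the study."""
    age_at_enrollment: int
    is_living: bool
    in_integrated_group_practice: bool
    completed_health_risk_appraisal: bool
    self_reported_race: str = ""
    history_colorectal_cancer: bool = False


def is_eligible(m: Member) -> bool:
    # Eligibility as described: aged 50 - 65 at enrollment (bounds assumed
    # inclusive), living, in the integrated group practice, appraisal completed.
    return (
        50 <= m.age_at_enrollment <= 65
        and m.is_living
        and m.in_integrated_group_practice
        and m.completed_health_risk_appraisal
    )


def is_prioritized(m: Member) -> bool:
    # Prioritization as described: self-reported Asian ancestry, or a
    # history of colorectal cancer, among eligible members.
    return is_eligible(m) and (
        m.self_reported_race == "Asian" or m.history_colorectal_cancer
    )
```

For example, a living 58-year-old member of the group practice who completed the appraisal and reports Asian ancestry would pass both checks, while a 70-year-old would fail eligibility regardless of the other fields.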

Essentia Institute of Rural Health, Marshfield Clinic, Pennsylvania State University (Marshfield): The Marshfield Clinic Personalized Medicine Research Project is a population-based biobank in central Wisconsin with more than 20,000 adult subjects who provided written, informed consent to access their medical records and provided a blood sample from which DNA was extracted and plasma and serum stored. In addition to an average of 30 years of medical history data, a questionnaire about environmental exposures, including a detailed food frequency questionnaire, is available to facilitate gene/environment studies.

Mayo Clinic: The Mayo Vascular Disease Biorepository is a disease-specific biobank for vascular diseases including peripheral arterial disease (PAD). PAD patients were identified from individuals referred to the non-invasive vascular laboratory for lower extremity arterial evaluation. Since 1997, laboratory findings have been recorded into an electronic database employing an in-house software package for data archiving and retrieval; this data becomes part of the Mayo EMR. Patients referred to the center with suspected PAD undergo a comprehensive non-invasive evaluation including the ankle-brachial index (ABI) - the ratio of blood pressure measured at the ankles to blood pressure measured in the upper arms. Control subjects are identified from patients referred to the Cardiovascular Health Clinic for stress ECG. The prevalence of PAD in patients with normal exercise capacity who do not have inducible ischemia on the stress ECG was <1%. Data regarding risk factors for atherosclerosis such as diabetes, dyslipidemia, hypertension, and smoking are ascertained from the EMR.

Case-control study of venous thromboembolism (PI John Heit)
Controls from a case-control study of pancreatic cancer (PI Gloria Petersen)
Mayo Clinic Biobank
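The ankle-brachial index used in the Mayo PAD evaluation can be illustrated with a short calculation. The sketch below assumes the conventional definition (ankle systolic pressure divided by brachial, i.e. upper-arm, systolic pressure) and a commonly cited screening cutoff of 0.90. The function names and the cutoff are illustrative assumptions; the source does not state which threshold the site applied.

```python
def ankle_brachial_index(ankle_systolic_mmhg: float,
                         brachial_systolic_mmhg: float) -> float:
    """Return ABI = ankle systolic pressure / brachial systolic pressure."""
    if ankle_systolic_mmhg <= 0 or brachial_systolic_mmhg <= 0:
        raise ValueError("pressures must be positive")
    return ankle_systolic_mmhg / brachial_systolic_mmhg


# Assumed cutoff: ABI <= 0.90 is a widely used screening threshold for PAD.
# It is not taken from the study documentation.
PAD_ABI_CUTOFF = 0.90


def suggests_pad(abi: float, cutoff: float = PAD_ABI_CUTOFF) -> bool:
    """Flag an ABI value at or below the assumed cutoff."""
    return abi <= cutoff
```

With an ankle pressure of 100 mmHg and a brachial pressure of 125 mmHg, the index is 100 / 125 = 0.8, which falls below the assumed cutoff; equal ankle and arm pressures give an index of 1.0, which does not.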

Icahn School of Medicine at Mount Sinai (Mt. Sinai): The Institute for Personalized Medicine (IPM) Biobank Project is a consented, EMR-linked medical care setting biorepository of the Mount Sinai Medical Center (MSMC) drawing from a population of over 70,000 inpatients and 800,000 outpatient visits annually. MSMC serves diverse local communities of upper Manhattan, including Central Harlem (86% African American), East Harlem (88% Hispanic Latino), and Upper East Side (88% Caucasian/white) with broad health disparities. IPM Biobank populations include 28% African American (AA), 38% Hispanic Latino (HL) predominantly of Caribbean origin, and 23% Caucasian/White (CW). IPM Biobank disease burden is reflective of health disparities with broad public health impact: average body mass index of 28.9 and frequencies of hypertension (55%), hypercholesterolemia (32%), diabetes (30%), coronary artery disease (25%), chronic kidney disease (23%), among others. Biobank operations are fully integrated in clinical care processes, including direct recruitment from clinical sites, waiting areas and phlebotomy stations by dedicated Biobank recruiters independent of clinical care providers, prior to or following a clinician standard of care visit. Recruitment currently occurs at a broad spectrum of over 30 clinical care sites.

Northwestern University: The NUgene Project is a repository with longitudinal medical information from participating patients at affiliated hospitals and outpatient clinics from the Northwestern University Medical Center. Participants' DNA samples are coupled with data from a self-reported questionnaire and continuously updated data from our Electronic Medical Record (EMR) representing actual clinical care events. Northwestern has a state-of-the-art, comprehensive inpatient and outpatient EMR system of over 2 million patients. NUgene has broad access to participant data for all outpatient visits as well as inpatient data via a consolidated data warehouse. NUgene participants consent to distribution and use of their coded DNA samples and data for a broad range of genetic research by third-party investigators.

Vanderbilt University Medical Center: BioVU, Vanderbilt's DNA databank, was designed as an enabling resource for exploration of the relationships among genetic variation, disease susceptibility, and variable drug responses. BioVU acquires DNA from discarded blood samples collected from routine patient care. The biobank is linked to de-identified clinical data extracted from Vanderbilt's EMR, which forms the basis for phenotype definitions used in genotype-phenotype correlations. BioVU is currently the largest single site DNA collection world-wide, at >235,000 samples as of spring 2017.

PROVIDER: phs001584 | dbGaP

REPOSITORIES: dbGaP

ACCESS DATA

Dataset's files

	File	Type
	GapExchange_phs001584.v1.p1.xml	XML
	dbGaPEx2.1.5.xsd	Other
	phs001584.v1-Documents.zip	Other
	Study_Report.phs001584.eMERGE_III_HRC.v1.p1.MULTI.pdf	PDF
	manifest_phs001584.eMERGE_III_HRC.v1.p1.c1.GRU.pdf	PDF

Similar Datasets

Project description: The electronic Medical Records and Genomics (eMERGE) Network is a consortium of ten participating sites (Cincinnati Children's Hospital Medical Center/Boston Children's Hospital, Children's Hospital of Philadelphia, Essentia Institute of Rural Health, Marshfield Clinic Research Foundation and Pennsylvania State University, Geisinger Clinic, Group Health Cooperative/University of Washington, Mayo Clinic, Icahn School of Medicine at Mount Sinai, Northwestern University, Vanderbilt University Medical Center) funded by the NHGRI to investigate the use of electronic medical record (EMR) systems for genomic research. The goal of eMERGE is to conduct genome-wide association studies in approximately 55,000 individuals using EMR-derived phenotypes and DNA from linked Biorepositories. Using electronic phenotyping methods, the consortium used DNA samples from all participating sites to explore the genetic determinants of over forty phenotypes, including Abdominal aortic aneurysm; Ace-Inhibitor/Cough; Attention Deficit Hyperactivity Disorder; Age-related macular disease; Appendicitis; Asthma; Atopic Dermatitis; Autism; Benign Prostatic Hyperplasia; Carotid artery disease as a Quantitative Measure; caMRSA; Cataract; Clostridium difficile colitis; Extreme Obesity; Chronic Kidney Disease; Chronic Kidney Disease and Type 2 Diabetes; Chronic Kidney Disease, Type 2 Diabetes and Hypertension; Colon Polyps; Cardiorespiratory Fitness; Dementia; Diverticulosis; Diabetic retinopathy; Gastroesophageal Reflux Disease; Glaucoma; Height; Heart failure; Hypothyroidism; Lipids; Ocular hypertension; Peripheral Arterial Disease; QRS duration; Red blood cell indices; Remission of Diabetes after ROUX-EN-Y gastric bypass surgery; Resistant hypertension; MACE while on Statins; Type 2 Diabetes; Venous Thromboembolism; White blood cell indices; and Zoster virus infection, as well as using the phenome-wide association study (PheWAS) paradigm to replicate and discover relationships 
between targeted genotypes with multiple phenotypes. Sites and participants include: Children's Hospital of Philadelphia (CHOP): The Center for Applied Genomics (CAG) at the Children's Hospital of Philadelphia (CHOP) is a high-throughput, highly automated genotyping and sequencing facility equipped with state-of-the-art genotyping and sequencing platforms. Children who are treated at the Children's Hospital Healthcare Network and their parents may be eligible to take part in a major initiative to collect more than 100,000 blood samples, covering a wide range of pediatric diseases. A large majority of participants consenting to prospective genomic analyses also consent to analysis of their de-identified electronic medical records (EMRs). EMRs are longitudinal, with a mean duration of 6.5 years. Cincinnati Children's Hospital Medical Center/Boston Children's Hospital (CCHMC/BCH): Cincinnati Children's Hospital Medical Center (CCHMC) and Boston Children's Hospital (BCH) are pediatric institutions dedicated to improving health and welfare of children and to the shared purpose of discovery and practical application of new genomic information to the ordinary care of children. The CCHMC/BCH site has been built on a five-year history of collaboration, particularly in patient electronic medical record (EMR)-related informatics, the basis of much of eMERGE II. CCHMC and BCH together bring an extraordinary faculty to eMERGE II who are committed to diseases that afflict children, specifically phenotypes that focus upon diseases of children in ways that will leverage the available eMERGE adult GWAS and EMRs to discover meaningful use results. CCHMC/BCH plans to demonstrate real-time execution of phenotypic selection across their two distinct pediatric institutions as a model for ensuring phenotypic standardization and for national scalability. 
They will also look carefully at parents' responses to results and use of their children's research results and better understand the factors that influence their decisions about learning incidental findings. In addition to patient and parent perceptions, CCHMC/BCH will also explore clinician perceptions of pharmacogenetic research results after EMR integration. Geisinger Health System: A research cohort of adult Geisinger Clinic patients was enrolled from community-based primary care clinics of the Geisinger Health System. Patients were eligible for enrollment if they were a primary care patient of a Geisinger Clinic physician and were scheduled for a non-emergent clinic visit. All participants provided written informed consent and HIPAA authorization. Consenting patients agreed to provide blood samples for broad biomedical research use, and permission to access data in their Geisinger electronic medical record for research. The enrollment rate was 90% of patients approached. The demographics of the cohort approximate those of the Geisinger Clinic outpatient population. Research blood samples were collected during an outpatient clinical phlebotomy encounter. Research blood samples are coded and stored in a central biorepository. Samples are linkable to clinical data in a de-identified manner for research via an IRB-approved data broker process. For genomic analysis, DNA is extracted from EDTA-anticoagulated whole blood. Group Health (GH)/University of Washington (UW): GH participants for the PGx project were enrolled in the eMERGE Network through the Northwest Institute of Genetic Medicine (NWIGM) biorepository, and provided the appropriate consent to receive clinically relevant genetic results (N~6300). Participants were eligible if aged 50 - 65 years old at the time of their enrollment into the NWIGM repository, living, enrolled in GH's integrated group practice, and had completed an online Health Risk Appraisal. 
The selection algorithm was based on several data sources from the EHR at Group Health: 1. Demographics - participants with self-reported race as Asian or African ancestry were prioritized and selected to enrich for non-European ancestry; 2. Diagnosis and procedure codes - participants were selected if found to have a history of hypertension, atrial fibrillation (AF), or congestive heart failure (CHF). Participants with a history of arrhythmia were added if the entire selection algorithm did not generate 900 individuals. We also enriched for participants with EHR evidence of actionable indications related to PGRNSeq genes. Participants were selected if found to have an ICD9 code for malignant hyperthermia, hypertension, atrial fibrillation, congestive heart failure or long QT syndrome (LQTS); 3. Laboratory values - if participants had any laboratory event of creatine kinase (CK) >1000, and were dispensed statins within 6 months of the event, then they were selected; and 4. Medications - participants were excluded if ever on carbamazepine or had a current regimen of warfarin. Essentia Institute of Rural Health, Marshfield Clinic, Pennsylvania State University (Marshfield): The Marshfield Clinic Personalized Medicine Research Project is a population-based biobank in central Wisconsin with more than 20,000 adult subjects who provided written, informed consent to access their medical records and provided a blood sample from which DNA was extracted and plasma and serum stored. In addition to an average of 30 years of medical history data, a questionnaire about environmental exposures, including a detailed food frequency questionnaire, is available to facilitate gene/environment studies. Mayo Clinic: The Mayo biobank is a disease-specific biobank for vascular diseases including peripheral arterial disease (PAD). PAD patients were identified from individuals referred to the non-invasive vascular laboratory for lower extremity arterial evaluation. 
Since 1997, laboratory findings have been recorded into an electronic database employing an in-house software package for data archiving and retrieval; this data becomes part of the Mayo EMR. Patients referred to the center with suspected PAD undergo a comprehensive non-invasive evaluation including the ankle-brachial index (ABI) - the ratio of blood pressure measured at the ankles to blood pressure measured in the upper arms. Control subjects are identified from patients referred to the Cardiovascular Health Clinic for stress ECG. The prevalence of PAD in patients with normal exercise capacity who do not have inducible ischemia on the stress ECG was <1%. Data regarding risk factors for atherosclerosis such as diabetes, dyslipidemia, hypertension, and smoking are ascertained from the EMR. Icahn School of Medicine at Mount Sinai (Mt. Sinai): The Institute for Personalized Medicine (IPM) Biobank Project is a consented, EMR-linked medical care setting biorepository of the Mount Sinai Medical Center (MSMC) drawing from a population of over 70,000 inpatients and 800,000 outpatient visits annually. MSMC serves diverse local communities of upper Manhattan, including Central Harlem (86% African American), East Harlem (88% Hispanic Latino), and Upper East Side (88% Caucasian/white) with broad health disparities. IPM Biobank populations include 28% African American (AA), 38% Hispanic Latino (HL) predominantly of Caribbean origin, and 23% Caucasian/White (CW). IPM Biobank disease burden is reflective of health disparities with broad public health impact: average body mass index of 28.9 and frequencies of hypertension (55%), hypercholesterolemia (32%), diabetes (30%), coronary artery disease (25%), chronic kidney disease (23%), among others. 
Biobank operations are fully integrated in clinical care processes, including direct recruitment from clinical sites, waiting areas and phlebotomy stations by dedicated Biobank recruiters independent of clinical care providers, prior to or following a clinician standard of care visit. Recruitment currently occurs at a broad spectrum of over 30 clinical care sites. Northwestern University: The NUgene Project is a repository with longitudinal medical information from participating patients at affiliated hospitals and outpatient clinics from the Northwestern University Medical Center. Participants' DNA samples are coupled with data from a self-reported questionnaire and continuously updated data from our Electronic Medical Record (EMR) representing actual clinical care events. Northwestern has a state-of-the art, comprehensive inpatient and outpatient EMR system of over 2 million patients. NUgene has broad access to participant data for all outpatient visits as well as inpatient data via a consolidated data warehouse. NUgene participants consent to distribution and use of their coded DNA samples and data for a broad range of genetic research by third-party investigators. Vanderbilt University: BioVU, Vanderbilt's DNA databank, is an enabling resource for exploration of the relationships among genetic variation, disease susceptibility, and variable drug responses, and represents a key first step in moving the emerging sciences of genomics and pharmacogenomics from research tools to clinical practice. BioVU acquires DNA from discarded blood samples collected from routine patient care. The biobank is linked to de-identified clinical data extracted from Vanderbilt's EMR, which forms the basis for phenotype definitions used in genotype-phenotype correlations.

Project description: The electronic Medical Records and Genomics (eMERGE) Network is a consortium of ten participating sites funded by the NHGRI to investigate the use of electronic medical record systems for genomic research. The goal of eMERGE is to conduct genome-wide association studies in approximately 19,000 individuals using EMR-derived phenotypes and DNA from linked Biorepositories. The eMERGE Network brings together researchers with a wide range of expertise in genomics, statistics, ethics, informatics, and clinical medicine from leading medical research institutions across the country. Each center participating in the consortium is uniquely situated to provide critical resources to this highly collaborative and productive network. Each site combines a biobank or study cohort with extensive genomic data and access to clinical data derived from electronic medical records. Sites are geographically dispersed and have diverse patient populations, including two sites focusing specifically on pediatrics. The eMERGE Network comprises nine sites and one coordinating center. Each site maintains its own biorepository where DNA specimens are linked to phenotypic data contained within EMRs.

Children's Hospital of Philadelphia, PI: Hakon Hakonarson, MD, PhD
Cincinnati Children's Medical Center with Boston Children's Hospital, PI: John Harley, MD, PhD - CCHMC and Ingrid Holm, MD, MPH - BCH
Geisinger Health System, PI: David Carey, PhD and Marc Williams, MD
Group Health Cooperative with University of Washington, PI: Eric Larson, MD, MPH - GHC and Gail Jarvik, MD, PhD - UW
Marshfield Clinic with Essentia Institute of Rural Health, PI: Catherine McCarty, PhD, MPH - Essentia and Murray Brilliant, PhD - Marshfield
Mayo Clinic, PI: Christopher Chute, MD, DrPH and Iftikhar Kullo, MD
Icahn School of Medicine at Mount Sinai, PI: Erwin Bottinger, MD
Northwestern University, PI: Rex Chisholm, PhD and Maureen Smith, MS
Vanderbilt University, PI: Dan Roden, MD
eMERGE Coordinating Center - Vanderbilt University, PI: Paul Harris, PhD

Using electronic phenotyping methods, the consortium used DNA samples from all participating sites to explore the genetic determinants of red cell indices, white blood count (WBC) differential, diabetic retinopathy, height, serum lipid levels (specifically total cholesterol, high density lipoprotein (HDL), low density lipoprotein (LDL), and triglycerides), and autoimmune hypothyroidism, as well as using the phenome-wide association study (PheWAS) paradigm to replicate and discover relationships between targeted genotypes with multiple phenotypes.

Project description: The Electronic Medical Records and Genomics (eMERGE) Network is a National Institutes of Health (NIH)-organized and funded consortium of U.S. medical research institutions. The primary goal of the eMERGE Network is to develop, disseminate, and apply approaches to research that combine biorepositories with electronic medical record (EMR) systems for genomic discovery and genomic medicine implementation research. eMERGE was announced in September 2007 and began its third phase in September 2015. eMERGE III consists of nine study sites, two central sequencing and genotyping facilities, and a coordinating center. eMERGE Phase III aims to: 1) sequence and assess the phenotypic implication of rare variants in a custom designed eMERGEseq panel consisting of 109 genes (including 56 ACMG actionable finding list genes and the top 6 genes from each site relevant to their specific aims), as well as approximately 1400 SNPs; 2) assess the phenotypic implications of these variants by developing, validating and implementing new phenotype algorithms; 3) integrate genetic variants into EMRs to inform clinical care; and 4) create community resources.

Included in this study are:

~15,000 eMERGE participants from nine eMERGE III study sites and one study site collaborator.
Corresponding demographics, body mass index measurements.
Top PheWAS codes generated from a collated list of ICD codes from all study sites.

Study sites and participants include:

Cincinnati Children's Hospital Medical Center (CCHMC): Cincinnati Children's Hospital Medical Center (CCHMC) is a not-for-profit hospital and research center pioneering breakthrough treatments, providing outstanding family-centered patient care and training healthcare professionals for the future, and dedicated to improving health and welfare of children and to the shared purpose of discovery and practical application of new genomic information to the ordinary care of children. 
We bring a comprehensive electronic health record (EPIC), a deidentified i2b2 data warehouse of 680K patient records, a biobank with >261,000 consents that allow return of results to >84,000 patients and guardians who have provided DNA samples, and hundreds of faculty and senior staff who make genomics or informatics an active focus of their research. CCHMC will help the eMERGE III Steering Committee identify genes for the eMERGE III targeted sequencing panel, provide 3,000 DNA samples from CCHMC patients to be sequenced, review targeted gene panels from clinical care at CCHMC for somatic mosaicism and reinterpretation, and further develop and disseminate a software workflow suite for sequence analysis. We will also extend our work generating phenotype algorithms using heuristic and machine learning methods to many new childhood diseases. We will develop tools to evaluate adolescent return of results preferences, examine the ethical and legal obligations and potential to reanalyze results, and develop clinical decision support for phenotyping, test ordering, and returning sequencing results. Children's Hospital of Philadelphia (CHOP): The Center for Applied Genomics (CAG) is a specialized Center of Emphasis at the Children's Hospital of Philadelphia (CHOP), and one of the world's largest genetics research programs, with access to state-of-the-art high-throughput sequencing and genotyping technology. Our primary goal is to translate basic research findings to medical innovations. We aim to develop new and better ways to diagnose and treat children affected by rare and complex medical disorders, including asthma, autism, epilepsy, pediatric cancer, learning disabilities, and a range of rare diseases. Ultimately, our objective is to generate new diagnostic tests and to guide physicians to the most appropriate therapies.
Participants were recruited from the CAG biorepository (n>450,000), specifically from >100,000 CHOP pediatric patients and family members, a population enriched for rare diseases (n>12,000). Center for Applied Genomics, The Children's Hospital of Philadelphia: We gratefully thank all the children and their families who enrolled in this study, and all individuals who donated blood samples for research purposes. Genotyping for this project was performed at the Center for Applied Genomics and supported by an Institutional Development Award from The Children's Hospital of Philadelphia. Sequencing was supported by the National Institutes of Health through an award from the National Human Genome Research Institute's Electronic Medical Records and Genomics (eMERGE) program (U01HG008684). Columbia University: The goal of the Columbia eMERGE III project is to develop methods for integrating genomic data in EHRs and to study the impact of such genomic informatics interventions on the health of a diverse, underserved urban adult English- and Spanish-speaking patient population in Northern Manhattan served by the Columbia University Medical Center/New York-Presbyterian Hospital system. The study group is 2500 patients recruited from diverse clinics and community outreach centers of self-reported White (~61%), Asian (~11%), African-American (~11%), American Indian/Alaska Native (<1%) racial and Hispanic (~33%) ethnic backgrounds. There are two subgroups in the study cohort: a retrospective group (N=1052) that includes patients from oncology and nephrology clinics, and a prospective one (N=1448) that includes healthy individuals as well as participants with diverse medical conditions. Confirmed pathogenic variants in 70 selected genes will be returned to participants and their healthcare providers through the EHR integration. Participants are able to choose the results they receive and will have the freedom to meet with a genetic counselor and a geneticist to review results.
The impact of genetic testing on clinical care is determined by periodic monitoring of EHRs. Geisinger: Samples and phenotype data in this study were provided by the Geisinger MyCode® Community Health Initiative. Participants are recruited across the Geisinger System via online consents or in-person consents at a hospital or clinic visit. Enrollment is ongoing with over 100,000 individuals currently consented. Partners Healthcare (Harvard University): The Partners HealthCare Biobank is a large research program designed to help researchers understand how people's health is affected by their genes, lifestyle, and environment. This large research data and sample repository provides access to high-quality, consented blood samples to help foster research, advance our understanding of the causes of common diseases, and advance the practice of medicine. For the Partners research community (Massachusetts General Hospital and Brigham and Women's Hospital), the Biobank provides: <ol> <li>Banked samples (plasma, serum, and DNA) collected from consented patients </li> <li>Blood samples that were discarded after clinical testing in the Crimson Cores maintained in the Brigham and Women's Hospital and Massachusetts General Hospital Pathology Departments </li> <li>Sample handling and preparation services</li> <li>Linkage of the biobank data to the Partners Research Patient Data Registry (RPDR), a research instance of our electronic clinical chart</li> <li>Data access through our research portal.</li> </ol> To date, over 70,000 Partners patients have given their consent to enroll, give a blood sample, receive research results, and be re-contacted for additional research studies. The Biobank has enabled Partners investigators to compete for nationally recognized grants in personalized medicine, such as a clinical site in the electronic Medical Records and Genomics (eMERGE) network and the national All of Us program.
The Biobank currently supports over 120 Partners investigators and over 130 million dollars in NIH research. Kaiser Permanente Washington (KPWA) / University of Washington (UW): KPWA participants were enrolled in the eMERGE Network through the Northwest Institute of Genetic Medicine (NWIGM) biorepository, and provided the appropriate consent to receive clinically relevant genetic results (N=2,500). NWIGM is based at the University of Washington and co-managed by the University of Washington and KPWA. The purpose of the NWIGM biorepository is to build infrastructure and resources to carry out a broad range of future genetic research. KPWA members enrolled in the biorepository are asked to provide informed consent to provide a DNA sample for storage in the NWIGM biorepository. The consent is purposefully broad to serve the dual purpose of reducing the burden on researchers who wish to use this biorepository and on the IRB committees who will be responsible for reviewing these requests in the future. Participants were eligible if aged 50 - 65 years old at the time of their enrollment into the NWIGM repository, living, enrolled in KPWA's integrated group practice, and had completed an online Health Risk Appraisal. The selection algorithm was based on several data sources from the EHR at KPWA. 1) Demographics - participants with self-reported race as Asian ancestry were prioritized and selected to enrich for non-European ancestry. The KPWA eMERGE cohort includes N=1,245 members of Asian ancestry. 2) Participants were also selected for a history of colorectal cancer (N=1,255), to enrich for germline pathogenic variants. Mayo Clinic: The Return of Actionable Variants Empirical (RAVE) Study was approved by the Mayo Clinic IRB. We recruited 2537 participants from Mayo Clinic biobanks in Rochester, MN, who had hypercholesterolemia or colon polyps, thereby enriching for familial hypercholesterolemia (FH) and monogenic causes of colorectal cancer (CRC).
Additional eligibility criteria were: 1) residents of Southeast MN who were alive and aged 18-70 years; 2) LDL-C level >155 mg/dl, or >120 mg/dl while on lipid-lowering therapy; 3) no known cause of secondary hyperlipidemia; and 4) no cognitive impairment or dementia that would compromise their ability to give written informed consent. Based on these criteria, we identified 5270 eligible patients and obtained informed consent from 3030 participants. Recruitment was conducted in waves and utilized mailed recruitment packets consisting of a study brochure, a written informed consent form, a baseline psychosocial questionnaire, and a return postage-paid envelope. DNA of 2537 participants was sent for CLIA-certified targeted sequencing of 109 genes including genes associated with FH and CRC. Targeted sequencing and genotyping were performed in a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory. Northwestern University: Samples and data used in this study were obtained from patients from Northwestern Medicine, an integrated healthcare system, formed through a partnership of Northwestern Memorial HealthCare and Northwestern University Feinberg School of Medicine. Participants include a retrospective cohort from the Northwestern Pharmacogenomics Study, funded through the eMERGE II project, NHGRI (3U01HG006388-02S1) and a prospective cohort from the Genetic Testing and Your Health Study, funded through the eMERGE III project, NHGRI (U01HG008673). Patients were eligible to participate if they were 18 years or older and saw a physician at Northwestern Medicine. Patients consented to genetic testing and to allow their results to be placed in their electronic medical record. Vanderbilt University Medical Center: Vanderbilt University Medical Center (VUMC) participants were enrolled in the eMERGE Network through the Vanderbilt Genome-Electronic Records (VGER) project.
Patients were provided the appropriate consent to receive clinically relevant genetic results (N=2,700). Participants were eligible if aged 21 or over, had a healthcare provider at VUMC, and visited the provider at least 3 times in the past 3 years. Study site collaborator and participants include: Meharry Medical College: Inclusion of ethnic groups in genomic research is critical to identify possible reasons for health disparities. African-Americans are being enrolled in various outpatient clinics of Nashville General Hospital at Meharry, an inner-city hospital primarily serving a poorer patient group. A total of 500 African Americans with four cancer types demonstrating health disparities in this population (prostate, colon, breast, and lung) are identified and approached by clinical research coordinators. The purpose of the study is to determine if any genetic information can be identified from these patients who have or are at high risk of one of these disparate cancers. All participants provide written informed consent and HIPAA authorization to provide blood samples for broad research use and permission to access data in their hospital electronic medical record for research now and in the future. An extensive demographic profile is obtained and entered into a REDCap database. Blood samples are obtained for a panel of alleles from extracted DNA at Baylor. In addition, de-identified coded samples are processed and stored in a central biorepository for further DNA, RNA and proteomic analyses. The survey and phlebotomy are performed at the time of the initial contact and agreement to participate. Nearly all patients approached willingly agree to participate for potential benefit to themselves, family members, or humankind. Little concern is voiced about providing samples for genetic analysis. Study investigators will share results with the participants and providers if testing does not indicate high risk.
Results indicating increased risk or actionable alleles for the patient and/or family will be returned by a genetic counselor. Monitoring of the patients' health in this cohort will continue to be followed in the EMR to identify any future associations that might explain health disparities in African Americans. Proposals will be reviewed from investigators to study the genetic or proteomic samples as well as the clinical and demographic information in the repository. Please note that this version of the dataset has a handful of mismatches between genotyped and provided sex, discovered after posting of the data. They will be removed from the version 2 dataset that is forthcoming. Data with the following IDs should be removed: <table> <tr><td>42025287</td><td>42137441</td><td>42139915</td><td>42284933</td><td>42412243</td><td>42456938</td><td>42456946</td><td>42672223</td><td>42954000</td><td>49368913</td><td>49868431</td></tr> <tr><td>52111929</td><td>63312723</td><td>68877688</td><td>81014537</td><td>81014938</td><td>81014945</td><td>81014946</td><td>81014959</td><td>81015004</td><td>81015025</td><td>81015050</td></tr> <tr><td>81015218</td><td>81015243</td><td>81015316</td><td>81015330</td><td>81015374</td><td>81015436</td><td>81015438</td><td>81015662</td><td>81015668</td><td>81015776</td><td>81015778</td></tr> </table>
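As a minimal sketch of how the exclusion above could be applied, the Python snippet below drops the listed IDs from a list of sample records. The field name `subject_id` and the example records are hypothetical; the actual identifier column in the distributed files may differ.

```python
# IDs flagged above for mismatches between genotyped and provided sex.
MISMATCHED_IDS = frozenset("""
42025287 42137441 42139915 42284933 42412243 42456938 42456946 42672223
42954000 49368913 49868431 52111929 63312723 68877688 81014537 81014938
81014945 81014946 81014959 81015004 81015025 81015050 81015218 81015243
81015316 81015330 81015374 81015436 81015438 81015662 81015668 81015776
81015778
""".split())


def drop_mismatched(records, id_field="subject_id"):
    """Return the records whose ID is not in MISMATCHED_IDS.

    `id_field` is a hypothetical column name, not one confirmed by the dataset.
    IDs are compared as strings so integer and string IDs are treated alike.
    """
    return [r for r in records if str(r[id_field]) not in MISMATCHED_IDS]


if __name__ == "__main__":
    # Illustrative records only; the second ID is made up for the example.
    sample = [{"subject_id": "42025287"}, {"subject_id": "10000001"}]
    print(drop_mismatched(sample))
```

Comparing IDs as strings avoids silent misses when a loader parses the identifier column as integers in one file and as text in another.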

Project description: The electronic Medical Records and Genomics (eMERGE) Network is a consortium of five participating sites (Group Health Seattle, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University) funded by the NHGRI to investigate the use of electronic medical record systems for genomic research. The goal of eMERGE is to conduct genome-wide association studies in approximately 19,000 individuals using EMR-derived phenotypes and DNA from linked biorepositories. Using electronic phenotyping methods, the consortium used DNA samples from all participating sites to explore the genetic determinants of red cell indices, white blood cell count (WBC) differential, diabetic retinopathy, height, serum lipid levels (specifically total cholesterol, high-density lipoprotein (HDL), low-density lipoprotein (LDL), and triglycerides), and autoimmune hypothyroidism, and used the phenome-wide association study (PheWAS) paradigm to replicate and discover relationships between targeted genotypes and multiple phenotypes.
eMERGE led studies for which original genotyping was performed and are included in this merged set: <ul><li>Genome-Wide Association Study on Cataract and HDL in the Personalized Medicine Research Project Cohort, <a href="./study.cgi?study_id=phs000170">phs000170</a></li></ul> <ul><li>Development and Use of Network Infrastructure for High-Throughput GWA Studies, <a href="./study.cgi?study_id=phs000234">phs000234</a></li></ul> <ul><li>Vanderbilt Genome-Electronic Records (VGER) Project: QRS Duration, <a href="./study.cgi?study_id=phs000188">phs000188</a></li></ul> <ul><li>Northwestern NUgene Project: Type 2 Diabetes, <a href="./study.cgi?study_id=phs000237">phs000237</a></li></ul> <ul><li>A Genome-Wide Association Study of Peripheral Arterial Disease, <a href="./study.cgi?study_id=phs000203">phs000203</a></li></ul> Sites and participants include: Vanderbilt University: BioVU, Vanderbilt's DNA databank, is an enabling resource for exploration of the relationships among genetic variation, disease susceptibility, and variable drug responses, and represents a key first step in moving the emerging sciences of genomics and pharmacogenomics from research tools to clinical practice. BioVU acquires DNA from discarded blood samples collected from routine patient care. The biobank is linked to de-identified clinical data extracted from Vanderbilt's EMR, which forms the basis for phenotype definitions used in genotype-phenotype correlations. Marshfield Clinic: The Marshfield Clinic Personalized Medicine Research Project is a population-based biobank in central Wisconsin with more than 20,000 adult subjects who provided written, informed consent to access their medical records and provided a blood sample from which DNA was extracted and plasma and serum stored. In addition to an average of 30 years of medical history data, a questionnaire about environmental exposures, including a detailed food frequency questionnaire, is available to facilitate gene/environment studies. 
Northwestern University: The NUgene Project is a repository with longitudinal medical information from participating patients at affiliated hospitals and outpatient clinics from the Northwestern University Medical Center. Participants' DNA samples are coupled with data from a self-reported questionnaire and continuously updated data from our Electronic Medical Record (EMR) representing actual clinical care events. Northwestern has a state-of-the-art, comprehensive inpatient and outpatient EMR system of over 2 million patients. NUgene has broad access to participant data for all outpatient visits as well as inpatient data via a consolidated data warehouse. NUgene participants consent to distribution and use of their coded DNA samples and data for a broad range of genetic research by third-party investigators. Group Health (GH)/University of Washington (UW): The Aging and Dementia eMERGE study biorepository leverages rich population-based longitudinal data from both electronic medical records and in-depth research data to explore genome-wide associations. Participants include Seattle-area members of GH (a large integrated health care system in Washington State) consented and enrolled in 1) the UW Alzheimer's Disease Patient Registry (ADPR) and 2) the Adult Changes in Thought (ACT) study. The ADPR (PI: Eric B. Larson; NIH/NIA U01 AG 006781) is a population-based registry of incident dementia cases designed to identify all new Alzheimer's Disease cases within GH from 1987 to 1996. Medical history, physical, laboratory testing, and neuropsychological testing were performed on all consenting potential cases for determination of dementia status by a consensus conference. The study base of the ADPR population was stable with an attrition rate of less than 1%/year. The ACT study (PI: Eric B. Larson; NIH/NIA U01 AG 006781) is an ongoing community-based cohort study of aging and dementia.
The original cohort of 2,581 randomly selected dementia-free members age 65 and older was enrolled in 1994-1996 and expanded by 811 in 2000-2002. Continuous enrollment to maintain a cohort of 2,000 dementia-free persons began in 2005. Participants receive biennial assessment including cognitive status determination. The ACT sub-sample is stable; for the original cohort, median enrollment in GH was 19 years prior to joining the ACT study, and 85% of the cohort has ≥10 years of GH enrollment. DNA for the ADPR participants was obtained through a companion study, Genetic Differences in Cases and Controls (PI: Walter Kukull; NIH/NIA R01 AG007584). DNA obtained through both studies was extracted from blood using Gentra Systems Puregene methods. DNA concentration is determined by UV optical density. All samples are checked for quality by 260/280 ratio. For long-term storage, samples are aliquoted and stored at -70°C. Mayo Clinic: The Mayo biobank is a disease-specific biobank for vascular diseases including peripheral arterial disease (PAD). PAD patients were identified from individuals referred to the non-invasive vascular laboratory for lower extremity arterial evaluation. Since 1997, laboratory findings have been recorded into an electronic database employing an in-house software package for data archiving and retrieval; these data become part of the Mayo EMR. Patients referred to the center with suspected PAD undergo a comprehensive non-invasive evaluation including the ankle-brachial index (ABI) - the ratio of blood pressure measured at the ankles divided by blood pressure measured in the upper arms. Control subjects are identified from patients referred to the Cardiovascular Health Clinic for stress ECG. The prevalence of PAD in patients with normal exercise capacity who do not have inducible ischemia on the stress ECG was <1%. Data regarding risk factors for atherosclerosis such as diabetes, dyslipidemia, hypertension, and smoking are ascertained from the EMR.
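The ankle-brachial index mentioned above is a simple ratio. The Python sketch below computes it under the standard clinical definition (ankle systolic pressure divided by brachial systolic pressure). The function name and the example pressures are illustrative assumptions, not values from the study, and no diagnostic cut-off is implied.

```python
def ankle_brachial_index(ankle_systolic_mmhg: float,
                         brachial_systolic_mmhg: float) -> float:
    """Return the ABI: ankle systolic pressure over brachial systolic pressure.

    Both pressures are expected in the same unit (mmHg here), so the result
    is dimensionless.
    """
    if ankle_systolic_mmhg < 0 or brachial_systolic_mmhg <= 0:
        raise ValueError("pressures must be non-negative and the brachial "
                         "pressure must be greater than zero")
    return ankle_systolic_mmhg / brachial_systolic_mmhg


if __name__ == "__main__":
    # Hypothetical readings chosen only to illustrate the arithmetic.
    print(round(ankle_brachial_index(108.0, 120.0), 2))
```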

Project description: The NUgene Project is a biorepository with longitudinal medical information from participating patients at Northwestern Medicine affiliated hospitals and outpatient clinics. Participants' DNA samples are coupled with data from a self-reported questionnaire (2 versions were used, 1 before and 1 after February 2006; both are included) and continuously updated data from our Electronic Medical Record (EMR) representing actual clinical care events. Northwestern has a state-of-the-art, comprehensive inpatient and outpatient EMR system of over 3 million patients. NUgene has broad access to participant data for all outpatient visits as well as inpatient data via a consolidated enterprise data warehouse. NUgene participants consent to distribution and use of their coded DNA samples and data for a broad range of genetic research by third-party investigators. The electronic Medical Records and Genomics (eMERGE) Network is a consortium of 9 clinical sites with EMR-linked DNA biobanks, including Northwestern University and its NUgene biobank, funded by the NHGRI (National Human Genome Research Institute) to investigate the use of electronic medical record systems for genomic research. The goal of eMERGE is to conduct genome-wide association studies in approximately 100,000 individuals using EMR-derived phenotypes and DNA from linked biorepositories. Using electronic phenotyping methods, the consortium has been and is using DNA samples from all participating sites to explore the genetic determinants of approximately 80 phenotypes, including both diseases and traits, for which the electronic phenotyping algorithms have been or are being published on <a href="https://phekb.org/">PheKB.org</a>. Thus, for the eMERGE network, ~900 NUgene subjects were selected, as described in the study inclusion/exclusion criteria section below, to be genotyped using whole genome sequencing (WGS), for use in the eMERGE network as eMERGE subjects.
The criteria used below were selected so that these subjects will likely meet the criteria to be cases and/or controls, of diverse races and/or ethnicities, for at least 1 of the ~80 phenotypes being studied within eMERGE, and could be recontacted for future research if needed.

Project description: Primitive neuro-ectodermal tumours (PNET) of the supratentorial region are rare, highly malignant embryonal brain tumours affecting young children. Although supratentorial PNET (sPNET) are histologically similar to infratentorial PNET/medulloblastoma, sPNET have more aggressive clinical phenotypes, which suggests that sPNET represent distinct biological entities. In contrast to considerable progress in understanding the signalling pathways involved in medulloblastoma, little is known about sPNET pathogenesis. Prior low resolution CGH (comparative genomic hybridization) studies indicate sPNET have frequent genomic imbalances and copy number aberrations (CNAs). To define genes involved in sPNET pathogenesis, we utilized the Affymetrix 250K Nsp SNP (single nucleotide polymorphism) analysis to identify genes targeted by recurrent CNAs in primary human sPNET samples. Copy number analysis was conducted on 39 primary PNET samples. Select target genes were validated by genomic and/or RT-PCR. Our analysis revealed frequent CNAs across the sPNET genome, encompassing large and focal chromosome segments, and corroborated previous reports that isochromosome 17q, an abnormality found in ~30% of medulloblastoma, is rare in sPNET. Keywords: single nucleotide polymorphism array, disease state analysis. A total of 56 primary sPNET samples were collected for this study from The Hospital for Sick Children, Johns Hopkins University, St. Jude Research Hospital, University of Cambridge, Children’s Hospital Boston, Virginia Commonwealth University, Istituto Nazionale per lo Studio e la Cura dei Tumori, and Texas Children’s Hospital, with local Institutional Research Ethics Board approval. Pineoblastoma and rhabdoid tumours were specifically excluded, leaving 52 samples, 39 of which had good quality DNA available. These samples, in addition to 28 diploid reference samples (a gift from Dr. S. Scherer, University of Toronto), were analyzed on the Affymetrix GeneChip Mapping 250K Nsp SNP array. Due to sample availability at the time the arrays were run, the diploid reference samples analyzed on the Sty arrays are different from reference samples analyzed on the Nsp arrays.
