Evaluating electronic health record data sources and algorithmic approaches to identify hypertensive individuals.
ABSTRACT: OBJECTIVE:Phenotyping algorithms applied to electronic health record (EHR) data enable investigators to identify large cohorts for clinical and genomic research. Algorithm development is often iterative, depends on fallible investigator intuition, and is time- and labor-intensive. We developed and evaluated 4 types of phenotyping algorithms and categories of EHR information to identify hypertensive individuals and controls and provide a portable module for implementation at other sites. MATERIALS AND METHODS:We reviewed the EHRs of 631 individuals followed at Vanderbilt for hypertension status. We developed features and phenotyping algorithms of increasing complexity. Input categories included International Classification of Diseases, Ninth Revision (ICD9) codes, medications, vital signs, narrative-text search results, and Unified Medical Language System (UMLS) concepts extracted using natural language processing (NLP). We developed a module and tested portability by replicating 10 of the best-performing algorithms at the Marshfield Clinic. RESULTS:Random forests using billing codes, medications, vitals, and concepts had the best performance with a median area under the receiver operator characteristic curve (AUC) of 0.976. Normalized sums of all 4 categories also performed well (0.959 AUC). The best non-NLP algorithm combined normalized ICD9 codes, medications, and blood pressure readings with a median AUC of 0.948. Blood pressure cutoffs or ICD9 code counts alone had AUCs of 0.854 and 0.908, respectively. Marshfield Clinic results were similar. CONCLUSION:This work shows that billing codes or blood pressure readings alone yield good hypertension classification performance. However, even simple combinations of input categories improve performance. The most complex algorithms classified hypertension with excellent recall and precision.
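The "normalized sum" idea and its AUC evaluation can be sketched in a few lines. This is a minimal illustration, not the authors' module: the abstract does not say how counts were normalized, so scaling each category by its cohort maximum is an assumption, and the category names and toy patients are hypothetical.

```python
def normalized_sum(rows):
    """Scale each category by its cohort maximum (an assumed normalization), then sum."""
    categories = sorted({c for row in rows for c in row})
    # "or 1" avoids division by zero when a category is all zeros.
    maxima = {c: max(row.get(c, 0) for row in rows) or 1 for c in categories}
    return [sum(row.get(c, 0) / maxima[c] for c in categories) for row in rows]


def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical per-patient counts: ICD9 codes, medication mentions, elevated BP readings.
patients = [
    {"icd9": 5, "meds": 3, "bp": 4},
    {"icd9": 0, "meds": 0, "bp": 1},
    {"icd9": 2, "meds": 0, "bp": 2},
    {"icd9": 0, "meds": 1, "bp": 0},
]
labels = [1, 0, 1, 0]  # 1 = hypertensive on chart review (toy labels)

scores = normalized_sum(patients)
toy_auc = auc(scores, labels)
```

On this toy cohort both cases outscore both controls, so the AUC is 1.0; real cohorts will not separate this cleanly.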
Project description:MOTIVATION:Genome-wide association studies (GWASs) are effective for describing genetic complexities of common diseases. Phenome-wide association studies (PheWASs) offer an alternative and complementary approach to GWAS using data embedded in the electronic health record (EHR) to define the phenome. International Classification of Disease version 9 (ICD9) codes are used frequently to define the phenome, but using ICD9 codes alone misses other clinically relevant information from the EHR that can be used for PheWAS analyses and discovery. RESULTS:As an alternative to ICD9 coding, a text-based phenome was defined by 23,384 clinically relevant terms extracted from Marshfield Clinic's EHR. Five single nucleotide polymorphisms (SNPs) with known phenotypic associations were genotyped in 4235 individuals and associated across the text-based phenome. All five SNPs genotyped were associated with expected terms (P<0.02), most at or near the top of their respective PheWAS ranking. Raw association results indicate that text data performed equivalently to ICD9 coding and demonstrate the utility of information beyond ICD9 coding for application in PheWAS.
Project description:In secondary analysis of electronic health records, a crucial task is to correctly identify the patient cohort under investigation. In many cases, the most valuable and relevant information for an accurate classification of medical conditions exists only in clinical narratives. Therefore, it is necessary to use natural language processing (NLP) techniques to extract and evaluate these narratives. The most commonly used approach to this problem relies on extracting a number of clinician-defined medical concepts from text and using machine learning techniques to identify whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNN) for text classification can augment the existing techniques by leveraging the representation of language to learn which phrases in a text are relevant for a given medical condition. In this work, we compare concept extraction based methods with CNNs and other commonly used models in NLP in ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept extraction based methods in almost all of the tasks, with improvements of up to 26 percentage points in F1-score and up to 7 percentage points in area under the ROC curve (AUC). We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification, and should be further investigated. Moreover, the deep learning approach presented in this paper can be used to assist clinicians during chart review or support the extraction of billing codes from text by identifying and highlighting relevant phrases for various medical conditions.
Project description:OBJECTIVE:Hepatorenal Syndrome (HRS) is a devastating form of acute kidney injury (AKI) in advanced liver disease patients with high morbidity and mortality, but phenotyping algorithms have not yet been developed using large electronic health record (EHR) databases. We evaluated and compared multiple phenotyping methods to achieve an accurate algorithm for HRS identification. MATERIALS AND METHODS:A national retrospective cohort of patients with cirrhosis and AKI admitted to 124 Veterans Affairs hospitals was assembled from electronic health record data collected from 2005 to 2013. AKI was defined by the Kidney Disease: Improving Global Outcomes criteria. Five hundred and four hospitalizations were selected for manual chart review and served as the gold standard. EHR-based predictors were identified from structured and free-text clinical data processed with NLP using the clinical Text Analysis and Knowledge Extraction System. We explored several dimension reduction techniques for the NLP data, including newer high-throughput phenotyping and word embedding methods, and ascertained their effectiveness in identifying the phenotype without structured predictor variables. With the combined structured and NLP variables, we analyzed five phenotyping algorithms: penalized logistic regression, naïve Bayes, support vector machines, random forest, and gradient boosting. Calibration and discrimination metrics were calculated using 100 bootstrap iterations. In the final model, we report odds ratios and 95% confidence intervals. RESULTS:The area under the receiver operating characteristic curve (AUC) for the different models ranged from 0.73 to 0.93, with penalized logistic regression having the best discriminatory performance. Calibration for logistic regression was modest, but gradient boosting and support vector machines were superior.
NLP identified 6985 variables; a priori variable selection performed similarly to dimensionality reduction using high-throughput phenotyping and semantic similarity informed clustering (AUC of 0.81 - 0.82). CONCLUSION:This study demonstrated improved phenotyping of a challenging AKI etiology, HRS, over ICD-9 coding. We also compared performance among multiple approaches to EHR-derived phenotyping, and found similar results between methods. Lastly, we showed that automated NLP dimension reduction is viable for acute illness.
Project description:HLA-DRB1 codes for a major histocompatibility complex class II cell surface receptor. Genetic variants in and around this gene have been linked to numerous autoimmune diseases. Most notably, an association between HLA-DRB1*1501 haplotype and multiple sclerosis (MS) has been defined. Utilizing electronic health records and 4235 individuals within Marshfield Clinic's Personalized Medicine Research Project, a reverse genetic screen coined phenome-wide association study (PheWAS) tested association of rs3135388 genotype (tagging HLA-DRB1*1501) with 4841 phenotypes. As expected, HLA-DRB1*1501 was associated with MS (International Classification of Disease version 9-CM (ICD9) 340, P=0.023), whereas the strongest association was with alcohol-induced cirrhosis of the liver (ICD9 571.2, P=0.00011). HLA-DRB1*1501 also demonstrated association with erythematous conditions (ICD9 695, P=0.0054) and benign neoplasms of the respiratory and intrathoracic organs (ICD9 212, P=0.042), replicating previous findings. This study not only builds on the feasibility/utility of the PheWAS approach and represents the first external validation of a PheWAS, but may also demonstrate the complex etiologies associated with the HLA-DRB1*1501 loci.
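A PheWAS scan of this kind reduces to one association test per phenotype followed by a ranking on P value. The sketch below is illustrative only: the abstract does not state which test was used, so Fisher's exact test on a carrier-by-phenotype 2x2 table is an assumption, and the patient identifiers and phenotype names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def phewas(all_patients, carriers, phenotypes):
    """Test one genotype against every phenotype; return (name, P) sorted by P."""
    results = []
    for name, cases in phenotypes.items():
        a = len(carriers & cases)                 # carriers with phenotype
        b = len(carriers - cases)                 # carriers without
        c = len((all_patients - carriers) & cases)  # non-carriers with phenotype
        d = len(all_patients - carriers - cases)    # non-carriers without
        results.append((name, fisher_exact_two_sided(a, b, c, d)))
    return sorted(results, key=lambda item: item[1])


# Toy cohort (hypothetical identifiers and phenotype names).
cohort_ids = {f"p{i}" for i in range(1, 9)}
carrier_ids = {"p1", "p2", "p3", "p4"}
toy_phenotypes = {
    "pheno_A": {"p1", "p2", "p3", "p5"},
    "pheno_B": {"p1", "p5"},
}
ranking = phewas(cohort_ids, carrier_ids, toy_phenotypes)
```

With thousands of phenotypes tested, as in the study, the raw P values would also need to be read against the multiple-testing burden; the sketch leaves that step out.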
Project description:OBJECTIVE:Electronic health records linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP). MATERIALS AND METHODS:We developed a mapping method for automatically identifying relevant ICD and NLP concepts for a specific phenotype leveraging the Unified Medical Language System. Along with health care utilization, aggregated ICD and NLP counts were jointly analyzed by fitting an ensemble of latent mixture models. The multimodal automated phenotyping (MAP) algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying participants with phenotype yes/no. The algorithm was validated using labeled data for 16 phenotypes from a biorepository and further tested in an independent cohort phenome-wide association studies (PheWAS) for 2 single nucleotide polymorphisms with known associations. RESULTS:The MAP algorithm achieved higher or similar AUC and F-scores compared to the ICD code across all 16 phenotypes. The features assembled via the automated approach had comparable accuracy to those assembled via manual curation (AUCMAP 0.943, AUCmanual 0.941). The PheWAS results suggest that the MAP approach detected previously validated associations with higher power when compared to the standard PheWAS method based on ICD codes. CONCLUSION:The MAP approach increased the accuracy of phenotype definition while maintaining scalability, thereby facilitating use in studies requiring large-scale phenotyping, such as PheWAS.
Project description:Colorectal cancer (CRC) screening rates are low despite confirmed benefits. The authors investigated the use of natural language processing (NLP) to identify previous colonoscopy screening in electronic records from a random sample of 200 patients at least 50 years old. The authors developed algorithms to recognize temporal expressions and 'status indicators', such as 'patient refused', or 'test scheduled'. The new methods were added to the existing KnowledgeMap concept identifier system, and the resulting system was used to parse electronic medical records (EMR) to detect completed colonoscopies. Using as the 'gold standard' expert physicians' manual review of EMR notes, the system identified timing references with a recall of 0.91 and precision of 0.95, colonoscopy status indicators with a recall of 0.82 and precision of 0.95, and references to actually completed colonoscopies with recall of 0.93 and precision of 0.95. The system was superior to using colonoscopy billing codes alone. Health services researchers and clinicians may find NLP a useful adjunct to traditional methods to detect CRC screening status. Further investigations must validate extension of NLP approaches for other types of CRC screening applications.
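The role of status indicators, and the way recall and precision are scored against a manually reviewed gold standard, can be illustrated with a toy rule-based detector. This is not the KnowledgeMap system: the keyword list, the sentences, and the gold labels below are all hypothetical, and only 'patient refused' and 'test scheduled' come from the abstract.

```python
import re

MENTION = re.compile(r"\bcolonoscopy\b", re.IGNORECASE)
# Hypothetical status keywords that mark a colonoscopy as not (yet) performed.
NOT_COMPLETED = re.compile(r"\b(refused|declined|scheduled|recommended)\b", re.IGNORECASE)


def completed_colonoscopy(sentence):
    """True if the sentence mentions a colonoscopy and carries no 'not done' indicator."""
    return bool(MENTION.search(sentence)) and not NOT_COMPLETED.search(sentence)


def precision_recall(predicted, gold):
    """Precision and recall of predicted positives against gold-standard positives."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Toy notes with hypothetical gold labels from "manual review".
notes = [
    "Colonoscopy in 2005 showed two polyps.",
    "Patient refused colonoscopy.",
    "Colonoscopy scheduled for next month.",
    "No complaints today.",
    "Screening colonoscopy was recommended and completed last year.",
]
gold_positive = {0, 4}

system_positive = {i for i, note in enumerate(notes) if completed_colonoscopy(note)}
precision, recall = precision_recall(system_positive, gold_positive)
```

The last note shows why naive keyword rules fall short: "recommended" suppresses a colonoscopy that was in fact completed, so recall drops to 0.5 while precision stays at 1.0. Handling such cases is what the temporal and status algorithms in the study are for.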
Project description:Objective:To study the effect on patient cohorts of mapping condition (diagnosis) codes from source billing vocabularies to a clinical vocabulary. Materials and Methods:Nine International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9-CM) concept sets were extracted from eMERGE network phenotypes, translated to Systematized Nomenclature of Medicine - Clinical Terms concept sets, and applied to patient data that were mapped from source ICD9-CM and ICD10-CM codes to Systematized Nomenclature of Medicine - Clinical Terms codes using Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) vocabulary mappings. The original ICD9-CM concept set and a concept set extended to ICD10-CM were used to create patient cohorts that served as gold standards. Results:Four phenotype concept sets were able to be translated to Systematized Nomenclature of Medicine - Clinical Terms without ambiguities and were able to perform perfectly with respect to the gold standards. The other 5 lost performance when 2 or more ICD9-CM or ICD10-CM codes mapped to the same Systematized Nomenclature of Medicine - Clinical Terms code. The patient cohorts had a total error (false positive and false negative) of up to 0.15% compared to querying ICD9-CM source data and up to 0.26% compared to querying ICD9-CM and ICD10-CM data. Knowledge engineering was required to produce that performance; simple automated methods to generate concept sets had errors up to 10% (one outlier at 250%). Discussion:The translation of data from source vocabularies to Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) resulted in very small error rates that were an order of magnitude smaller than other error sources. 
Conclusion:It appears possible to map diagnoses from disparate vocabularies to a single clinical vocabulary and carry out research using a single set of definitions, thus improving efficiency and transportability of research.
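The loss of performance when two or more source codes map to the same target code can be reproduced on a toy example. The codes below are placeholders (SRC_*, TGT_*), not real ICD or SNOMED CT identifiers, and the mapping table is hypothetical rather than taken from the OMOP vocabulary.

```python
# Hypothetical many-to-one mapping: SRC_A and SRC_B collapse onto TGT_1.
mapping = {"SRC_A": "TGT_1", "SRC_B": "TGT_1", "SRC_C": "TGT_2"}

# The phenotype's source concept set deliberately excludes SRC_B.
source_concept_set = {"SRC_A", "SRC_C"}

# Hypothetical patient records in the source vocabulary.
patient_codes = {
    "p1": {"SRC_A"},
    "p2": {"SRC_B"},
    "p3": {"SRC_C"},
    "p4": {"SRC_D"},  # unmapped, unrelated code
}


def translate(codes, code_map):
    """Map source codes to target codes, dropping codes with no mapping."""
    return {code_map[c] for c in codes if c in code_map}


def cohort(records, concept_set):
    """Patients with at least one code in the concept set."""
    return {pid for pid, codes in records.items() if codes & concept_set}


# Gold standard: query the source data with the source concept set.
gold = cohort(patient_codes, source_concept_set)

# Translated query: map both the concept set and the patient data.
target_concept_set = translate(source_concept_set, mapping)
mapped_records = {pid: translate(codes, mapping) for pid, codes in patient_codes.items()}
mapped = cohort(mapped_records, target_concept_set)

false_positives = mapped - gold
false_negatives = gold - mapped
total_error = (len(false_positives) + len(false_negatives)) / len(patient_codes)
```

Here patient p2 enters the translated cohort only because SRC_B shares a target code with SRC_A, giving one false positive among four patients. The exaggerated 25% error is an artifact of the tiny cohort; the study reports errors well under 1% after knowledge engineering.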
Project description:BACKGROUND:The performance of automated algorithms for childhood diabetes case ascertainment and type classification may differ by demographic characteristics. OBJECTIVE:This study evaluated the potential of administrative and electronic health record (EHR) data from a large academic care delivery system to conduct diabetes case ascertainment in youth according to type, age, and race/ethnicity. SUBJECTS:A total of 57,767 children aged <20 yr as of 31 December 2011 who were seen at University of North Carolina Health Care System in 2011 were included. METHODS:Using an initial algorithm including billing data, patient problem lists, laboratory test results, and diabetes related medications between 1 July 2008 and 31 December 2011, presumptive cases were identified and validated by chart review. More refined algorithms were evaluated by type (type 1 vs. type 2), age (<10 vs. ≥10 yr) and race/ethnicity (non-Hispanic White vs. 'other'). Sensitivity, specificity, and positive predictive value were calculated and compared. RESULTS:The best algorithm for ascertainment of overall diabetes cases was billing data. The best type 1 algorithm was the ratio of the number of type 1 billing codes to the sum of type 1 and type 2 billing codes ≥0.5. A useful algorithm to ascertain youth with type 2 diabetes with 'other' race/ethnicity was identified. Considerable age and racial/ethnic differences were present in type-non-specific and type 2 algorithms. CONCLUSIONS:Administrative and EHR data may be used to identify cases of childhood diabetes (any type), and to identify type 1 cases. The performance of type 2 case ascertainment algorithms differed substantially by race/ethnicity.
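The type 1 ratio rule and the three reported metrics are simple enough to write out. In this sketch the threshold is read as "at least 0.5", the handling of patients with no diabetes billing codes is an assumption, and the code counts and chart-review labels are hypothetical.

```python
def is_type1(n_type1_codes, n_type2_codes, threshold=0.5):
    """Ratio rule: type 1 if type1 / (type1 + type2) billing codes >= threshold."""
    total = n_type1_codes + n_type2_codes
    if total == 0:
        return False  # assumption: no diabetes billing codes means not classified as type 1
    return n_type1_codes / total >= threshold


def sens_spec_ppv(predicted, truth):
    """Sensitivity, specificity, and positive predictive value from boolean lists."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    tn = sum(not p and not t for p, t in zip(predicted, truth))
    fn = sum(not p and t for p, t in zip(predicted, truth))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity, ppv


# Toy data: (type 1 code count, type 2 code count, chart review says type 1).
toy_cases = [
    (4, 0, True),
    (3, 3, True),
    (1, 5, False),
    (0, 2, False),
    (2, 1, False),
]
predicted = [is_type1(t1, t2) for t1, t2, _ in toy_cases]
truth = [label for _, _, label in toy_cases]
sensitivity, specificity, ppv = sens_spec_ppv(predicted, truth)
```

On the toy data the rule finds both true type 1 cases (sensitivity 1.0) but also flags the last patient, so specificity and positive predictive value are both 2/3.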
Project description:Accurate, automated extraction of clinical stroke information from unstructured text has several important applications. ICD-9/10 codes can misclassify ischemic stroke events and do not distinguish acuity or location. Expeditious, accurate data extraction could provide considerable improvement in identifying stroke in large datasets, triaging critical clinical reports, and quality improvement efforts. In this study, we developed and report a comprehensive framework studying the performance of simple and complex stroke-specific Natural Language Processing (NLP) and Machine Learning (ML) methods to determine presence, location, and acuity of ischemic stroke from radiographic text. We collected 60,564 Computed Tomography and Magnetic Resonance Imaging Radiology reports from 17,864 patients from two large academic medical centers. We used standard techniques to featurize unstructured text and developed neurovascular specific word GloVe embeddings. We trained various binary classification algorithms to identify stroke presence, location, and acuity using 75% of 1,359 expert-labeled reports. We validated our methods internally on the remaining 25% of reports and externally on 500 radiology reports from an entirely separate academic institution. In our internal population, GloVe word embeddings paired with deep learning (Recurrent Neural Networks) had the best discrimination of all methods for our three tasks (AUCs of 0.96, 0.98, 0.93 respectively). Simpler NLP approaches (Bag of Words) performed best with interpretable algorithms (Logistic Regression) for identifying ischemic stroke (AUC of 0.95), MCA location (AUC 0.96), and acuity (AUC of 0.90). Similarly, GloVe and Recurrent Neural Networks (AUC 0.92, 0.89, 0.93) generalized better in our external test set than BOW and Logistic Regression for stroke presence, location and acuity, respectively (AUC 0.89, 0.86, 0.80). 
Our study provides a comprehensive assessment of NLP techniques for unstructured radiographic text. Our findings suggest that NLP/ML methods can be used to discriminate stroke features from large data cohorts for both clinical and research-related investigations.
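The simpler of the two pipelines, Bag of Words features fed to Logistic Regression, can be sketched without any external library. This is a toy stand-in for the study's models: the four report snippets and their labels are invented, and the plain gradient-descent trainer with its learning rate and epoch count is an assumption chosen for readability.

```python
import math
import re


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(texts):
    words = sorted({tok for text in texts for tok in tokenize(text)})
    return {word: index for index, word in enumerate(words)}


def vectorize(text, vocab):
    """Bag-of-words count vector; out-of-vocabulary tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.5, epochs=1000):
    """Unregularized logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(text, vocab, w, b):
    x = vectorize(text, vocab)
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Invented report snippets; 1 = ischemic stroke present, 0 = absent.
reports = [
    "acute infarct in left mca territory",
    "no acute infarct",
    "restricted diffusion consistent with acute infarct",
    "no evidence of stroke",
]
stroke_labels = [1, 0, 1, 0]

vocab = build_vocab(reports)
X = [vectorize(r, vocab) for r in reports]
weights, bias = train(X, stroke_labels)
train_predictions = [predict(r, vocab, weights, bias) for r in reports]
```

The learned weights are directly readable, which is the interpretability advantage the abstract attributes to this pairing: on this toy data the token "no" ends up with a negative weight, since it appears only in negative reports. A word-order-blind model like this cannot tell what a negation applies to, one limitation that sequence models such as the study's Recurrent Neural Networks are meant to address.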