Constructing a prediction model from physiological parameters for malnutrition in hemodialysis patients.
ABSTRACT: A retrospective analysis of the improvement in the health condition of patients undergoing hemodialysis was conducted to identify the important factors that affect malnutrition in these patients. Data from patients who underwent hemodialysis between 2010 and 2015 in a regional hospital in Yunlin County were collected from the Taiwan Society of Nephrology-Kidney Transplantation database. A total of 1049 medical records from 300 patients aged over 20 years who underwent hemodialysis were collected for this study. The C5.0 decision tree and logistic regression were first used to screen 40 independent variables and assess their association with the dependent variable, albumin. Then, the C5.0 decision tree, logistic regression and support vector machine (SVM) methods were applied to find combinations of factors contributing to malnutrition in patients undergoing hemodialysis, and predictive models were established. Finally, a receiver operating characteristic curve and a confusion matrix were used to evaluate the performance of these models. All analytical methods indicated that "age" was an important factor. In particular, the best predictive model was SVM model 4 (training accuracy 98.95%, test accuracy 66.89%), which identified "age" and 15 other important factors as those most related to malnutrition under hemodialysis. The findings of this study can be used as a reference for clinical applications.
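As an illustration of this workflow, the following minimal Python sketch trains the three model families and evaluates them with an ROC AUC and a confusion matrix. It is not the study's pipeline: scikit-learn has no C5.0 implementation, so DecisionTreeClassifier stands in for it, and the synthetic data merely mimic the shape of the problem (1049 records, 40 predictors, a binary albumin-derived label).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 1049 records, 40 predictors, binary label
X, y = make_classification(n_samples=1049, n_features=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "C5.0-like decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores = model.predict_proba(X_te)[:, 1]  # scores for the ROC curve
    tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
    print(f"{name}: AUC={roc_auc_score(y_te, scores):.3f} "
          f"TN={tn} FP={fp} FN={fn} TP={tp}")
```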
Project description: As an emerging technology, artificial intelligence has been applied to identify various physical disorders. Here, we developed a three-layer diagnosis system for lung cancer involving three machine learning approaches: the C5.0 decision tree, an artificial neural network (ANN) and a support vector machine (SVM). The area under the curve (AUC) was employed to evaluate their decision powers. In the first layer, the AUCs of C5.0, ANN and SVM were 0.676, 0.736 and 0.640, respectively; the ANN performed better than C5.0 and SVM. In the second layer, the ANN was comparable to SVM but superior to C5.0, as supported by the AUCs of 0.804, 0.889 and 0.825. Much higher AUCs of 0.908, 0.910 and 0.849 were obtained in the third layer, where the highest sensitivity, 94.12%, was achieved by C5.0. These data suggest a three-layer diagnosis system for lung cancer: the ANN is used as a broad-spectrum screening subsystem based on 14 epidemiological and clinical-symptom variables, adopted first to screen high-risk groups; then, combined with 5 additional tumor biomarkers, the ANN is used as an auxiliary diagnosis subsystem to identify suspected lung cancer patients; finally, C5.0 is employed to confirm lung cancer patients based on 22 CT nodule-based radiomic features.
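The layered decision logic can be sketched as below. The probability cut-offs t1 and t2 are hypothetical, MLPClassifier and DecisionTreeClassifier are scikit-learn stand-ins for the ANN and C5.0, and only the feature dimensions (14, +5 biomarkers, 22 radiomic features) follow the description; the random training data are purely illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

def cascade_predict(x_epi, x_bio, x_rad, screen_ann, diag_ann, confirm_tree,
                    t1=0.3, t2=0.5):  # t1, t2 are hypothetical cut-offs
    """Route one subject through the three layers described in the abstract."""
    # Layer 1: broad-spectrum screening on 14 epidemiological/symptom features
    if screen_ann.predict_proba(x_epi.reshape(1, -1))[0, 1] < t1:
        return "low risk"
    # Layer 2: auxiliary diagnosis adding 5 tumor biomarkers
    x12 = np.concatenate([x_epi, x_bio]).reshape(1, -1)
    if diag_ann.predict_proba(x12)[0, 1] < t2:
        return "suspected"
    # Layer 3: confirmation from 22 CT nodule-based radiomic features
    return "confirmed" if confirm_tree.predict(x_rad.reshape(1, -1))[0] else "suspected"

# Illustrative training on random data, only to make the sketch runnable
rng = np.random.default_rng(0)
X_epi, X_bio, X_rad = (rng.normal(size=(200, 14)), rng.normal(size=(200, 5)),
                       rng.normal(size=(200, 22)))
y = rng.integers(0, 2, 200)
screen = MLPClassifier(max_iter=500, random_state=0).fit(X_epi, y)
diag = MLPClassifier(max_iter=500, random_state=0).fit(np.hstack([X_epi, X_bio]), y)
confirm = DecisionTreeClassifier(random_state=0).fit(X_rad, y)
print(cascade_predict(X_epi[0], X_bio[0], X_rad[0], screen, diag, confirm))
```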
Project description: Patients with chronic obstructive pulmonary disease (COPD) experience repeated acute exacerbations (AE). The Global Initiative for Chronic Obstructive Lung Disease (GOLD) assessment is only applicable to patients in the stable phase, and there is currently a lack of assessment and prediction methods for patients with acute exacerbation of chronic obstructive pulmonary disease (AECOPD) during hospitalization. To enhance the monitoring and treatment of AECOPD patients, we developed a novel C5.0 decision tree classifier to predict the prognosis of hospitalized AECOPD patients from objective clinical indicators. The medical records of 410 hospitalized AECOPD patients were collected, and 28 features including vital signs, medical history, comorbidities and various inflammatory indicators were selected. The overall accuracy of the proposed C5.0 decision tree classifier is 80.3% (65 out of 81 participants), with a 95% confidence interval (CI) of (0.6991, 0.8827) and a kappa of 0.6054. In addition, the performance of the C5.0 model exceeds that of the C4.5, classification and regression tree (CART) and iterative dichotomiser 3 (ID3) models. The C5.0 decision tree classifier helps respiratory physicians assess patient severity early, thereby guiding the treatment strategy and improving patient prognosis.
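The reported test statistics can be recomputed along the following lines. The sketch assumes the 65-of-81 figure is a held-out test count and uses the exact (Clopper-Pearson) binomial interval of the kind R's caret package reports for accuracy; the labels used for the kappa are invented, since the per-class counts are not given.

```python
from scipy.stats import beta
from sklearn.metrics import cohen_kappa_score

def clopper_pearson(k, n, alpha=0.05):
    """Exact binomial confidence interval for k successes out of n trials."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

k, n = 65, 81  # correct predictions / evaluated participants (from the abstract)
print("accuracy = %.3f, 95%% CI = (%.4f, %.4f)" % (k / n, *clopper_pearson(k, n)))

# Cohen's kappa on invented labels, purely to show the call
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 1]
print("kappa = %.3f" % cohen_kappa_score(y_true, y_pred))
```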
Project description: Malaria is a predominant infectious disease with a global footprint, and it is especially severe in developing countries of the African subcontinent. In recent years, drug-resistant malaria has become an alarming problem, and hence the need for new and improved drugs is more crucial than ever before. One of the promising locations for an antimalarial drug target is the apicoplast, as this organelle does not occur in humans. The apicoplast is associated with many unique and essential pathways in many apicomplexan pathogens, including Plasmodium. Machine learning methods are now commonly available through open-source programs. In the present work, we describe a standard protocol to develop molecular-descriptor-based predictive models (QSAR models), which can be further utilized for the screening of large chemical libraries. This protocol is used to build models on training data sourced from apicoplast-specific bioassays. Multiple model-building methods are used, including Generalized Linear Models (GLM), Random Forest (RF), the C5.0 implementation of a decision tree, Support Vector Machines (SVM), K-Nearest Neighbour and Naive Bayes. Methods to evaluate the accuracy of each model-building method are included in the protocol. For the given dataset, C5.0, SVM and RF perform better than the other methods, with comparable accuracy on the test data.
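A compact sketch of the multi-method comparison step follows, assuming a descriptor matrix X and binary activity labels y; scikit-learn analogues (e.g. LogisticRegression for a binomial GLM, DecisionTreeClassifier in place of C5.0) stand in for the protocol's R implementations, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for a molecular-descriptor matrix with binary activity labels
X, y = make_classification(n_samples=500, n_features=100, random_state=1)

methods = {
    "GLM": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=1),
    "C5.0-like tree": DecisionTreeClassifier(random_state=1),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}
for name, clf in methods.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```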
Project description: There is accumulating evidence that serum levels of non-high-density lipoprotein cholesterol (non-HDL-C) are a more accurate predictor of cardiovascular outcomes when compared with low-density lipoprotein cholesterol. However, we recently found that higher serum concentrations of triglycerides are associated with better outcomes in patients undergoing hemodialysis. Therefore, we hypothesized that the association of serum levels of non-HDL-C (which includes triglyceride-rich lipoproteins) with outcomes may also be different in patients undergoing hemodialysis when compared with other patient populations. We studied the association of baseline and time-dependent serum levels of non-HDL-C with all-cause and cardiovascular mortality using Cox proportional hazard regression models in a nationally representative cohort of 50 118 patients undergoing incident hemodialysis from January 1, 2007, to December 31, 2011. In time-dependent models adjusted for case mix and surrogates of malnutrition and inflammation, a graded inverse association between non-HDL-C level and mortality was demonstrated with hazard ratios (95% confidence intervals) of the lowest (<60 mg/dL) and highest (≥160 mg/dL) categories: 1.88 (1.72-2.06) and 0.73 (0.64-0.83) for all-cause mortality and 2.07 (1.78-2.41) and 0.75 (0.60-0.93) for cardiovascular mortality, respectively (reference, 100-115 mg/dL). In analyses using baseline values, non-HDL-C levels <100 mg/dL were also associated with significantly higher mortality risk across all levels of adjustment. Similar associations were found when evaluating the non-HDL/HDL cholesterol ratio and mortality, with the highest all-cause and cardiovascular mortality being observed in patients with a decreased non-HDL/HDL-C ratio (<2.5). Contrary to the general population, decrements in non-HDL-C and the non-HDL/HDL cholesterol ratio were paradoxically associated with increased all-cause and cardiovascular mortality in patients undergoing incident hemodialysis. The underlying mechanisms responsible for these associations await further investigation.
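The core survival model can be sketched with the lifelines library as follows. The DataFrame layout, column names and synthetic data are assumptions, with two dummies encoding the extreme non-HDL-C categories against the 100-115 mg/dL reference; this is the baseline-covariate version (a time-dependent analysis would use a long-format dataset with lifelines' CoxTimeVaryingFitter).

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 1000
cat = rng.choice(["lt60", "ref", "ge160"], size=n)  # non-HDL-C category
df = pd.DataFrame({
    "followup_years": rng.exponential(3.0, n),
    "death": rng.integers(0, 2, n),
    "nonhdl_lt60": (cat == "lt60").astype(int),     # <60 mg/dL vs reference
    "nonhdl_ge160": (cat == "ge160").astype(int),   # >=160 mg/dL vs reference
    "age": rng.normal(65, 10, n),                   # one case-mix covariate
})
cph = CoxPHFitter().fit(df, duration_col="followup_years", event_col="death")
# exp(coef) columns give the hazard ratios with their 95% CIs
print(cph.summary[["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%"]])
```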
Project description: Background and Purpose: Stroke-related functional risk scores are used to predict patients' functional outcomes following a stroke event. We evaluated the predictive accuracy of machine-learning algorithms for predicting functional outcomes in acute ischemic stroke patients after endovascular treatment. Methods: Data were from the Precise and Rapid Assessment of Collaterals with Multi-phase CT Angiography (PROVE-IT) study, an observational study of 614 ischemic stroke patients. Machine learning and regression models, including random forest (RF), classification and regression tree (CART), C5.0 decision tree (DT), support vector machine (SVM), adaptive boost machine (ABM), least absolute shrinkage and selection operator (LASSO) logistic regression, and standard logistic regression, were trained to predict the 90-day functional impairment risk, defined as a modified Rankin scale (mRS) score > 2. The models were internally validated using split-sample cross-validation and externally validated in the INTERRSeCT cohort study. The accuracy of these models was evaluated using the area under the receiver operating characteristic curve (AUC), the Matthews correlation coefficient (MCC) and the Brier score. Results: Of the 614 patients included in the training data, 249 (40.5%) had 90-day functional impairment (i.e., mRS > 2). The median age and baseline NIHSS score were 77 years (interquartile range, IQR = 69-83) and 17 (IQR = 11-22), respectively. Both logistic regression and machine learning models had comparable predictive accuracy when validated internally (AUC range = [0.65-0.72]; MCC range = [0.29-0.42]) and externally (AUC range = [0.66-0.71]; MCC range = [0.34-0.42]). Conclusions: Machine learning algorithms and logistic regression had comparable predictive accuracy for predicting stroke-related functional impairment in stroke patients.
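The three validation metrics are standard and can be computed as in this short sketch; the labels and predicted probabilities are illustrative, not study data.

```python
from sklearn.metrics import brier_score_loss, matthews_corrcoef, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                 # 1 = 90-day mRS > 2
p_hat = [0.8, 0.3, 0.6, 0.7, 0.2, 0.4, 0.9, 0.1]  # predicted risks
y_pred = [int(p >= 0.5) for p in p_hat]           # 0.5 cut-off for the MCC

print("AUC   =", roc_auc_score(y_true, p_hat))    # discrimination
print("MCC   =", matthews_corrcoef(y_true, y_pred))
print("Brier =", brier_score_loss(y_true, p_hat)) # probability calibration
```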
Project description: Motivation: The genomic architecture of human complex diseases is thought to be attributable to single markers, polygenic components and epistatic components. No study has examined the ability of tree-based methods to detect epistasis in the presence of a polygenic signal. We sought to apply decision-tree-based methods, C5.0 and logic regression, to detect epistasis under several simulated conditions, varying the strength of interaction and the linkage disequilibrium (LD) structure. We then applied the same methods to the phenotype of educational attainment in a large population cohort. Results: LD pruning improved power and reduced the type I error. C5.0 had a conservative type I error rate, whereas logic regression had a type I error rate that exceeded 5%. Despite its more conservative type I error, C5.0 was observed to have higher power than logic regression across several conditions. In the presence of a polygenic signal, power was generally reduced. Applying both methods to educational attainment in a large population cohort yielded numerous interacting SNPs, notably a SNP in RCAN3, which is associated with reading and spelling, and a SNP in NPAS3, a neurodevelopmental gene. Availability and implementation: All methods used are implemented and freely available in R. Supplementary information: Supplementary data are available at Bioinformatics online.
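A toy sketch of the idea behind tree-based interaction detection follows: a simulated phenotype is driven by the joint genotypes of two SNPs, and a fitted decision tree surfaces both through its feature importances. This illustrates the concept only; it is not the paper's C5.0/logic-regression pipeline, which is available in R.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(2000, 10))    # 10 SNPs, additive coding 0/1/2
interact = (G[:, 2] > 0) ^ (G[:, 7] > 0)   # phenotype driven by a two-SNP interaction
# 90% penetrance for interaction carriers, 10% phenocopy rate otherwise
y = (interact & (rng.random(2000) < 0.9)) | (~interact & (rng.random(2000) < 0.1))

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(G, y.astype(int))
top = np.argsort(tree.feature_importances_)[::-1][:2]
print("top SNPs by importance:", top)      # expect SNPs 2 and 7
```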
Project description: The natriuretic peptides brain natriuretic peptide (BNP) and N-terminal pro-brain natriuretic peptide (NT-proBNP) are mainly known as diagnostic markers for heart failure, with high diagnostic and prognostic value in the general population. In patients undergoing hemodialysis (HD), changes in NT-proBNP can be related to noncardiac problems such as fluid overload, inflammation, or malnutrition, and can also be influenced by the dialysis characteristics. The current review aimed to summarize findings from studies on the association between NT-proBNP and malnutrition in HD patients. Articles published after 2009, covering a ten-year period, were considered for inclusion. We first briefly discuss the traditional functions of NT-proBNP and then describe the functions of this prohormone with a focus on its relation to protein energy wasting (PEW) in HD patients. Mechanisms that could explain these relationships are also discussed. Overall, 7 studies whose main objectives included investigating the relationship between NT-proBNP and nutritional status in HD patients were taken into account. NT-proBNP levels correlated with several factors from the 4 categories of markers indicative of PEW (body mass and composition, muscle mass, biochemical criteria, and dietary intake) and/or were associated with PEW. Interactions between several parameters could be involved in the association between NT-proBNP and malnutrition, with weight status playing a strong role. NT-proBNP is elevated in HD patients and is associated with malnutrition. Nevertheless, the prognostic value of NT-proBNP for nutritional status remains to be evaluated.
Project description: Iron overload used to be considered rare among hemodialysis patients after the advent of erythropoiesis-stimulating agents, but recent MRI studies have challenged this view. The aim of this study, based on decision-tree learning and on MRI determination of hepatic iron content, was to identify a noxious pattern of parenteral iron administration in hemodialysis patients. We performed a prospective cross-sectional study from 31 January 2005 to 31 August 2013 in the dialysis centre of a French community-based private hospital. A cohort of 199 fit hemodialysis patients free of overt inflammation and malnutrition were treated for anemia with parenteral iron-sucrose and an erythropoiesis-stimulating agent (darbepoetin), in keeping with current clinical guidelines. Patients had blinded measurements of hepatic iron stores by means of T1 and T2* contrast MRI, without gadolinium, analysed together with Chi-squared Automatic Interaction Detection (CHAID). The CHAID algorithm first split the patients according to their monthly infused iron dose, with a single cutoff of 250 mg/month. In the node comprising the 88 hemodialysis patients who received more than 250 mg/month of IV iron, 78 patients had iron overload on MRI (88.6%, 95% CI: 80% to 93%). The odds ratio for hepatic iron overload on MRI was 3.9 (95% CI: 1.81 to 8.4) with >250 mg/month of IV iron as compared to <250 mg/month. Age, gender (female sex) and the hepcidin level also influenced liver iron content on MRI. The standard maximal amount of iron infused per month should be lowered to 250 mg in order to lessen the risk of dialysis iron overload and to allow safer use of parenteral iron products.
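The reported odds ratio can be reconstructed with Woolf's log-OR interval as below. The abstract gives only the exposed-group counts (78 of 88 with overload), so the counts for the lower-dose group are back-solved from the cohort size and the published estimate; they are illustrative, not taken from the paper.

```python
import math

a, b = 78, 10   # >250 mg/month: overload yes / no (from the abstract)
c, d = 74, 37   # <250 mg/month: back-solved from n=199 and OR=3.9, not reported

or_ = (a * d) / (b * c)                        # cross-product odds ratio
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR), Woolf's method
lo, hi = or_ * math.exp(-1.96 * se), or_ * math.exp(1.96 * se)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")  # ~3.90 (1.81, 8.40)
```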
Project description: Background: The rise in serum ferritin levels among US maintenance hemodialysis patients has been attributed to higher intravenous iron administration and other changes in practice. We examined ferritin trends over time in hemodialysis patients and whether iron utilization patterns and other factors [erythropoiesis-stimulating agent (ESA) prescribing patterns, inflammatory markers] were associated with ferritin trajectory. Methods: In a 5-year (January 2007–December 2011) cohort of 81 864 incident US hemodialysis patients, we examined changes in ferritin averaged over 3-month intervals using linear mixed effects models adjusted for intravenous iron dose, malnutrition and inflammatory markers. We then examined ferritin trends across strata of baseline ferritin level, dialysis initiation year, cumulative iron and ESA use in the first dialysis year, and baseline hemoglobin level. Results: In models adjusted for iron dose, malnutrition and inflammation, mean ferritin levels increased over time in the overall cohort and across the three lower baseline ferritin strata. Among patients initiating dialysis in 2007, mean ferritin levels increased more sharply in the first year of dialysis than in the second, and again increased abruptly in the fifth year, independent of iron dose, malnutrition and inflammatory markers; similar trends were observed among patients who initiated dialysis in 2008 and 2009. In analyses stratified by cumulative iron use, mean ferritin increased among groups receiving iron but decreased in the no-iron group. In analyses stratified by cumulative ESA dose and baseline hemoglobin, mean ferritin increased over time. Conclusions: While ferritin trends correlated with patterns of iron use, increases in ferritin over time persisted independent of intravenous iron and ESA exposure, malnutrition and inflammation.
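The trend analysis can be sketched as a linear mixed-effects model with a patient-level random intercept, here via statsmodels; the variable names, the quarterly time coding and the synthetic data are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_pat, n_obs = 200, 8                        # 8 quarterly measurements per patient
pid = np.repeat(np.arange(n_pat), n_obs)     # patient id for each observation
t = np.tile(np.arange(n_obs), n_pat)         # time in 3-month intervals
iron = rng.gamma(2.0, 100.0, n_pat * n_obs)  # IV iron dose (mg/month)
ferritin = (300 + 25 * t + 0.3 * iron
            + rng.normal(0, 80, n_pat)[pid]  # patient random intercept
            + rng.normal(0, 60, n_pat * n_obs))

df = pd.DataFrame({"patient": pid, "quarter": t, "iron": iron,
                   "ferritin": ferritin})
# Ferritin trajectory over time, adjusted for iron dose, with a random
# intercept per patient
model = smf.mixedlm("ferritin ~ quarter + iron", df, groups=df["patient"]).fit()
print(model.summary())
```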
Project description: BACKGROUND: Multi-causality and heterogeneity of phenotypes and genotypes characterize complex diseases. In a database with a comprehensive collection of phenotypes and genotypes, we compared the performance of common machine learning methods in generating mathematical models to predict diabetic kidney disease (DKD). METHODS: In a prospective cohort of type 2 diabetic patients, we selected 119 subjects with DKD and 554 without DKD at enrolment and after a median follow-up period of 7.8 years for model training, testing and validation using seven machine learning methods (partial least squares regression, the classification and regression tree, the C5.0 decision tree, random forest, naïve Bayes classification, neural network and support vector machine). We used 17 clinical attributes and 70 single nucleotide polymorphisms (SNPs) of 54 candidate genes to build different models. The top attributes selected by the best-performing models were then used to build models with performance comparable to those using the entire dataset, as sketched below. RESULTS: Age, age at diagnosis, systolic blood pressure and genetic polymorphisms of uteroglobin and lipid metabolism were selected by most methods. Models generated by support vector machine (svmRadial) and random forest (cforest) had the best prediction accuracy, whereas models derived from the naïve Bayes classifier and partial least squares regression performed least well. Using 10 clinical attributes (systolic and diastolic blood pressure, age, age at diagnosis, triglyceride, white blood cell count, total cholesterol, waist-to-hip ratio, LDL cholesterol, and alcohol intake) and 5 genetic attributes (UGB G38A, LIPC -514C > T, APOB Thr71Ile, APOC3 3206T > G and APOC3 1100C > T), selected most often by SVM and cforest, we were able to build high-performance models. CONCLUSIONS: Amongst the different machine learning methods, svmRadial and cforest had the best performance. Genetic polymorphisms related to inflammation and lipid metabolism warrant further investigation for their associations with DKD.
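A minimal sketch of this two-stage approach under stated assumptions: synthetic data with the cohort's dimensions (673 subjects, 17 clinical + 70 genetic attributes), random-forest importances standing in for the caret-based svmRadial/cforest selection, and an SVM rebuilt on the top-ranked subset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in: 119 + 554 subjects, 17 clinical + 70 SNP attributes
X, y = make_classification(n_samples=673, n_features=87, n_informative=15,
                           random_state=2)

# Stage 1: rank attributes by importance (RF stands in for the paper's selection)
rf = RandomForestClassifier(n_estimators=500, random_state=2).fit(X, y)
top15 = np.argsort(rf.feature_importances_)[::-1][:15]  # 10 clinical + 5 genetic in the paper

# Stage 2: rebuild a compact model on the top-ranked subset and compare
full = cross_val_score(SVC(), X, y, cv=5).mean()
compact = cross_val_score(SVC(), X[:, top15], y, cv=5).mean()
print(f"full model: {full:.3f}, top-15 model: {compact:.3f}")
```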