<HashMap><database>biostudies-literature</database><scores/><additional><submitter>Gerussi A</submitter><funding>Ministero della Salute</funding><pagination>1587</pagination><full_dataset_link>https://www.ebi.ac.uk/biostudies/studies/S-EPMC9604872</full_dataset_link><repository>biostudies-literature</repository><omics_type>Unknown</omics_type><volume>12(10)</volume><pubmed_abstract>&lt;h4>Background&lt;/h4>The application of Machine Learning (ML) to genetic individual-level data represents a foreseeable advancement for the field, which is still in its infancy. Here, we aimed to evaluate the feasibility and accuracy of an ML-based model for disease risk prediction applied to Primary Biliary Cholangitis (PBC).&lt;h4>Methods&lt;/h4>Genome-wide significant variants identified in subjects of European ancestry in the recently released second international meta-analysis of GWAS in PBC were used as input data. Quality-checked, individual genomic data from two Italian cohorts were used. The ML included the following steps: import of genotype and phenotype data, genetic variant selection, supervised classification of PBC by genotype, generation of "if-then" rules for disease prediction by logic learning machine (LLM), and model validation in a different cohort.&lt;h4>Results&lt;/h4>The training cohort included 1345 individuals: 444 were PBC cases and 901 were healthy controls. After pre-processing, 41,899 variants entered the analysis. Several configurations of parameters related to feature selection were simulated. The best LLM model reached an Accuracy of 71.7%, a Matthews correlation coefficient of 0.29, a Youden's value of 0.21, a Sensitivity of 0.28, a Specificity of 0.93, a Positive Predictive Value of 0.66, and a Negative Predictive Value of 0.72. Thirty-eight rules were generated. The rule with the highest covering (19.14) included the following genes: RIN3, KANSL1, TIMMDC1, TNPO3. The validation cohort included 834 individuals: 255 cases and 579 controls. 
By applying the ruleset derived in the training cohort, the Area under the Curve of the model was 0.73.&lt;h4>Conclusions&lt;/h4>This study represents the first illustration of an ML model applied to common variants associated with PBC. Our approach is computationally feasible, leverages individual-level data to generate intelligible rules, and can be used for disease prediction in at-risk individuals.</pubmed_abstract><journal>Journal of personalized medicine</journal><pubmed_title>LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis.</pubmed_title><pmcid>PMC9604872</pmcid><funding_grant_id>PE-2016-02363915</funding_grant_id><funding_grant_id>GR-2018-12367794</funding_grant_id><pubmed_authors>Gerussi A</pubmed_authors><pubmed_authors>Verda D</pubmed_authors><pubmed_authors>Asselta R</pubmed_authors><pubmed_authors>Cristoferi L</pubmed_authors><pubmed_authors>Muselli M</pubmed_authors><pubmed_authors>Cappadona C</pubmed_authors><pubmed_authors>Invernizzi P</pubmed_authors><pubmed_authors>Carbone M</pubmed_authors><pubmed_authors>Bottaro S</pubmed_authors><pubmed_authors>On Behalf Of The Italian PBC Genetics Study Group</pubmed_authors><pubmed_authors>Bernasconi DP</pubmed_authors></additional><is_claimable>false</is_claimable><name>LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis.</name><description>&lt;h4>Background&lt;/h4>The application of Machine Learning (ML) to genetic individual-level data represents a foreseeable advancement for the field, which is still in its infancy. Here, we aimed to evaluate the feasibility and accuracy of an ML-based model for disease risk prediction applied to Primary Biliary Cholangitis (PBC).&lt;h4>Methods&lt;/h4>Genome-wide significant variants identified in subjects of European ancestry in the recently released second international meta-analysis of GWAS in PBC were used as input data. 
Quality-checked, individual genomic data from two Italian cohorts were used. The ML included the following steps: import of genotype and phenotype data, genetic variant selection, supervised classification of PBC by genotype, generation of "if-then" rules for disease prediction by logic learning machine (LLM), and model validation in a different cohort.&lt;h4>Results&lt;/h4>The training cohort included 1345 individuals: 444 were PBC cases and 901 were healthy controls. After pre-processing, 41,899 variants entered the analysis. Several configurations of parameters related to feature selection were simulated. The best LLM model reached an Accuracy of 71.7%, a Matthews correlation coefficient of 0.29, a Youden's value of 0.21, a Sensitivity of 0.28, a Specificity of 0.93, a Positive Predictive Value of 0.66, and a Negative Predictive Value of 0.72. Thirty-eight rules were generated. The rule with the highest covering (19.14) included the following genes: RIN3, KANSL1, TIMMDC1, TNPO3. The validation cohort included 834 individuals: 255 cases and 579 controls. By applying the ruleset derived in the training cohort, the Area under the Curve of the model was 0.73.&lt;h4>Conclusions&lt;/h4>This study represents the first illustration of an ML model applied to common variants associated with PBC. Our approach is computationally feasible, leverages individual-level data to generate intelligible rules, and can be used for disease prediction in at-risk individuals.</description><dates><release>2022-01-01T00:00:00Z</release><publication>2022 Sep</publication><modification>2025-04-03T21:29:27.757Z</modification><creation>2025-04-03T21:29:27.757Z</creation></dates><accession>S-EPMC9604872</accession><cross_references><pubmed>36294727</pubmed><doi>10.3390/jpm12101587</doi></cross_references></HashMap>