{"database":"biostudies-literature","file_versions":[],"scores":null,"additional":{"submitter":["Gerussi A"],"funding":["Ministero della Salute"],"pagination":["1587"],"full_dataset_link":["https://www.ebi.ac.uk/biostudies/studies/S-EPMC9604872"],"repository":["biostudies-literature"],"omics_type":["Unknown"],"volume":["12(10)"],"pubmed_abstract":["<h4>Background</h4>The application of Machine Learning (ML) to genetic individual-level data represents a foreseeable advancement for the field, which is still in its infancy. Here, we aimed to evaluate the feasibility and accuracy of an ML-based model for disease risk prediction applied to Primary Biliary Cholangitis (PBC).<h4>Methods</h4>Genome-wide significant variants identified in subjects of European ancestry in the recently released second international meta-analysis of GWAS in PBC were used as input data. Quality-checked, individual genomic data from two Italian cohorts were used. The ML included the following steps: import of genotype and phenotype data, genetic variant selection, supervised classification of PBC by genotype, generation of \"if-then\" rules for disease prediction by logic learning machine (LLM), and model validation in a different cohort.<h4>Results</h4>The training cohort included 1345 individuals: 444 were PBC cases and 901 were healthy controls. After pre-processing, 41,899 variants entered the analysis. Several configurations of parameters related to feature selection were simulated. The best LLM model reached an Accuracy of 71.7%, a Matthews correlation coefficient of 0.29, a Youden's value of 0.21, a Sensitivity of 0.28, a Specificity of 0.93, a Positive Predictive Value of 0.66, and a Negative Predictive Value of 0.72. Thirty-eight rules were generated. The rule with the highest covering (19.14) included the following genes: RIN3, KANSL1, TIMMDC1, TNPO3. The validation cohort included 834 individuals: 255 cases and 579 controls. By applying the ruleset derived in the training cohort, the Area under the Curve of the model was 0.73.<h4>Conclusions</h4>This study represents the first illustration of an ML model applied to common variants associated with PBC. Our approach is computationally feasible, leverages individual-level data to generate intelligible rules, and can be used for disease prediction in at-risk individuals."],"journal":["Journal of personalized medicine"],"pubmed_title":["LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis."],"pmcid":["PMC9604872"],"funding_grant_id":["PE-2016- 02363915","GR-2018-12367794"],"pubmed_authors":["Gerussi A","Verda D","Asselta R","Cristoferi L","Muselli M","Cappadona C","Invernizzi P","Carbone M","Bottaro S","On Behalf Of The Italian Pbc Genetics Study Group","Bernasconi DP"],"additional_accession":[]},"is_claimable":false,"name":"LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis.","description":"<h4>Background</h4>The application of Machine Learning (ML) to genetic individual-level data represents a foreseeable advancement for the field, which is still in its infancy. Here, we aimed to evaluate the feasibility and accuracy of an ML-based model for disease risk prediction applied to Primary Biliary Cholangitis (PBC).<h4>Methods</h4>Genome-wide significant variants identified in subjects of European ancestry in the recently released second international meta-analysis of GWAS in PBC were used as input data. Quality-checked, individual genomic data from two Italian cohorts were used. The ML included the following steps: import of genotype and phenotype data, genetic variant selection, supervised classification of PBC by genotype, generation of \"if-then\" rules for disease prediction by logic learning machine (LLM), and model validation in a different cohort.<h4>Results</h4>The training cohort included 1345 individuals: 444 were PBC cases and 901 were healthy controls. After pre-processing, 41,899 variants entered the analysis. Several configurations of parameters related to feature selection were simulated. The best LLM model reached an Accuracy of 71.7%, a Matthews correlation coefficient of 0.29, a Youden's value of 0.21, a Sensitivity of 0.28, a Specificity of 0.93, a Positive Predictive Value of 0.66, and a Negative Predictive Value of 0.72. Thirty-eight rules were generated. The rule with the highest covering (19.14) included the following genes: RIN3, KANSL1, TIMMDC1, TNPO3. The validation cohort included 834 individuals: 255 cases and 579 controls. By applying the ruleset derived in the training cohort, the Area under the Curve of the model was 0.73.<h4>Conclusions</h4>This study represents the first illustration of an ML model applied to common variants associated with PBC. Our approach is computationally feasible, leverages individual-level data to generate intelligible rules, and can be used for disease prediction in at-risk individuals.","dates":{"release":"2022-01-01T00:00:00Z","publication":"2022 Sep","modification":"2025-04-03T21:29:27.757Z","creation":"2025-04-03T21:29:27.757Z"},"accession":"S-EPMC9604872","cross_references":{"pubmed":["36294727"],"doi":["10.3390/jpm12101587"]}}