{"database":"biostudies-literature","file_versions":[],"scores":{"citationCount":0,"reanalysisCount":0,"viewCount":54,"searchCount":0},"additional":{"submitter":["Ji Y"],"funding":["NCATS NIH HHS","NICHD NIH HHS","NCRR NIH HHS","NIA NIH HHS","NHLBI NIH HHS","NHGRI NIH HHS","NINDS NIH HHS","National Institutes of Health","NIH HHS","NIGMS NIH HHS"],"pagination":["e1009814"],"full_dataset_link":["https://www.ebi.ac.uk/biostudies/studies/S-EPMC9278751"],"repository":["biostudies-literature"],"omics_type":["Unknown"],"volume":["18(6)"],"pubmed_abstract":["A common strategy for the functional interpretation of genome-wide association study (GWAS) findings has been the integrative analysis of GWAS and expression data. Using this strategy, many association methods (e.g., PrediXcan and FUSION) have been successful in identifying trait-associated genes via mediating effects on RNA expression. However, these approaches often ignore the effects of splicing, which can carry as much disease risk as expression. Compared to expression data, one challenge to detect associations using splicing data is the large multiple testing burden due to multidimensional splicing events within genes. Here, we introduce a multidimensional splicing gene (MSG) approach, which consists of two stages: 1) we use sparse canonical correlation analysis (sCCA) to construct latent canonical vectors (CVs) by identifying sparse linear combinations of genetic variants and splicing events that are maximally correlated with each other; and 2) we test for the association between the genetically regulated splicing CVs and the trait of interest using GWAS summary statistics. Simulations show that MSG has proper type I error control and substantial power gains over existing multidimensional expression analysis methods (i.e., S-MultiXcan, UTMOST, and sCCA+ACAT) under diverse scenarios. When applied to the Genotype-Tissue Expression Project data and GWAS summary statistics of 14 complex human traits, MSG identified on average 83%, 115%, and 223% more significant genes than sCCA+ACAT, S-MultiXcan, and UTMOST, respectively. We highlight MSG's applications to Alzheimer's disease, low-density lipoprotein cholesterol, and schizophrenia, and found that the majority of MSG-identified genes would have been missed from expression-based analyses. Our results demonstrate that aggregating splicing data through MSG can improve power in identifying gene-trait associations and help better understand the genetic risk of complex traits."],"journal":["PLoS genetics"],"pubmed_title":["Integration of multidimensional splicing data and GWAS summary statistics for risk gene discovery."],"pmcid":["PMC9278751"],"funding_grant_id":["UL1 TR002243","R01 HL151152","R01AG065611","U01 HG004798","RC2 GM092618","S10 OD017985","U01 HG006378","UL1 TR000445","R01 NS032830","U01HG009086","U01 HG009086","R01 AG069900","UL1 RR024975","R01HL151152","P50 GM115305","S10 RR025141","R01 HD074711","R01 AG065611","R01AG069900","U19 HL065962"],"pubmed_authors":["Wang Q","Wei Q","Tao R","Ji Y","Li B","Chen R"],"view_count":["54"],"additional_accession":[]},"is_claimable":false,"name":"Integration of multidimensional splicing data and GWAS summary statistics for risk gene discovery.","description":"A common strategy for the functional interpretation of genome-wide association study (GWAS) findings has been the integrative analysis of GWAS and expression data. Using this strategy, many association methods (e.g., PrediXcan and FUSION) have been successful in identifying trait-associated genes via mediating effects on RNA expression. However, these approaches often ignore the effects of splicing, which can carry as much disease risk as expression. Compared to expression data, one challenge to detect associations using splicing data is the large multiple testing burden due to multidimensional splicing events within genes. Here, we introduce a multidimensional splicing gene (MSG) approach, which consists of two stages: 1) we use sparse canonical correlation analysis (sCCA) to construct latent canonical vectors (CVs) by identifying sparse linear combinations of genetic variants and splicing events that are maximally correlated with each other; and 2) we test for the association between the genetically regulated splicing CVs and the trait of interest using GWAS summary statistics. Simulations show that MSG has proper type I error control and substantial power gains over existing multidimensional expression analysis methods (i.e., S-MultiXcan, UTMOST, and sCCA+ACAT) under diverse scenarios. When applied to the Genotype-Tissue Expression Project data and GWAS summary statistics of 14 complex human traits, MSG identified on average 83%, 115%, and 223% more significant genes than sCCA+ACAT, S-MultiXcan, and UTMOST, respectively. We highlight MSG's applications to Alzheimer's disease, low-density lipoprotein cholesterol, and schizophrenia, and found that the majority of MSG-identified genes would have been missed from expression-based analyses. Our results demonstrate that aggregating splicing data through MSG can improve power in identifying gene-trait associations and help better understand the genetic risk of complex traits.","dates":{"release":"2022-01-01T00:00:00Z","publication":"2022 Jun","modification":"2024-10-16T11:18:32.497Z","creation":"2022-07-19T10:55:35.491Z"},"accession":"S-EPMC9278751","cross_references":{"pubmed":["35771864"],"doi":["10.1371/journal.pgen.1009814"]}}