Long-read proteogenomics to connect disease-associated sQTLs to the protein isoform effectors in disease
ABSTRACT: Genome-wide association studies (GWASs) have revealed thousands of associations in many complex traits and diseases. Previous studies suggest that a subset of associations are due to alterations in splicing; however, interpreting the effects of splicing on protein isoforms is hindered by limitations in defining full-length transcript isoforms using short-read RNA-seq data. Long-read RNA-seq represents a powerful approach to define and quantify transcript isoforms. In this study, we developed a novel approach that integrates information from GWAS, splicing QTLs (sQTLs), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the protein isoforms ultimately produced. Such information enables identification of genes potentially responsible for GWAS associations. As a proof of concept, we generated deep-coverage (N ≈ 22 million full-length reads) PacBio long-read RNA-seq data on human fetal osteoblasts (hFOBs), a cell line of relevance to the regulation of bone mineral density (BMD). We identified 68,326 protein-coding isoforms, including 17,375 (25%) that were novel. Next, we used Bayesian colocalization to identify 1,863 sQTLs from the Genotype-Tissue Expression (GTEx) project in 732 protein-coding genes that colocalized with BMD associations (H4PP > 0.75). A total of 836 junctions with colocalizing sQTLs in 459 (of the 732) genes were expressed in hFOB long-read RNA-seq data. With these data, we formulated hypotheses regarding the potential mechanism of action of each sQTL. For example, we identified 7 junctions with colocalizing sQTLs (maximum H4PP = 0.98-0.99) in TPM2, involving splice junctions between two nearly mutually exclusive exons and two different transcript termination sites, a configuration that is impossible to interpret without long-read RNA-seq data. siRNA-mediated knockdown in hFOBs showed two TPM2 isoforms with opposing effects on mineralization.
Our results suggest that splicing is a major mechanism underlying GWAS associations and that long-read proteogenomics data are critical to precisely define the protein isoforms that are produced from splicing alterations.
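The two filtering steps described in the abstract (keeping sQTL junctions whose colocalization posterior exceeds the H4PP cutoff of 0.75, then retaining only those junctions also observed in the long-read data) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `ColocResult` record, the function names, the junction identifiers, and the toy values are all hypothetical, and the sketch assumes colocalization has already been run so that each record carries a precomputed H4PP value.

```python
from dataclasses import dataclass

H4PP_THRESHOLD = 0.75  # cutoff stated in the abstract (H4PP > 0.75)


@dataclass(frozen=True)
class ColocResult:
    """Hypothetical record: one sQTL junction and its colocalization posterior."""
    gene: str
    junction: str  # placeholder junction identifier (assumption)
    h4pp: float    # posterior probability of a shared causal variant (H4)


def colocalizing_junctions(results, threshold=H4PP_THRESHOLD):
    """Group junctions by gene, keeping records with H4PP strictly above threshold."""
    by_gene = {}
    for r in results:
        if r.h4pp > threshold:
            by_gene.setdefault(r.gene, set()).add(r.junction)
    return by_gene


def expressed_subset(by_gene, expressed_junctions):
    """Keep junctions also seen in long-read data; drop genes left with none."""
    kept = {}
    for gene, junctions in by_gene.items():
        hits = junctions & expressed_junctions
        if hits:
            kept[gene] = hits
    return kept


# Toy data only; these are not values from the study.
results = [
    ColocResult("GENE_A", "j1", 0.98),
    ColocResult("GENE_A", "j2", 0.40),
    ColocResult("GENE_B", "j3", 0.80),
    ColocResult("GENE_C", "j4", 0.75),  # equal to the cutoff, so excluded
]
expressed = {"j1", "j4"}  # junctions observed in the (toy) long-read data

coloc = colocalizing_junctions(results)
supported = expressed_subset(coloc, expressed)
```

In this toy run, `coloc` retains junctions for two genes and `supported` narrows that to the single gene whose colocalizing junction is also expressed, mirroring the reduction from 732 genes to 459 genes reported above.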
ORGANISM(S):  Homo sapiens 
PROVIDER: GSE224588 | GEO | 2023/03/01 
REPOSITORIES:  GEO