Dataset Information

Robust classification of protein variation using structural modelling and large-scale data integration.


ABSTRACT: Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than detailed interpretation of variant deleteriousness and frequently use only sequence-based or structure-based information. We present VIPUR, a computational framework that seamlessly integrates sequence analysis and structural modelling (using the Rosetta protein modelling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to mutations in any organism's proteome with improved generalized accuracy (AUROC .83) and interpretability (AUPR .87) compared to other methods. We demonstrate that VIPUR's predictions of deleteriousness match the biological phenotypes in ClinVar and provide a clear ranking of prediction confidence. We use VIPUR to interpret known mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of mutations associated with human diseases. Lastly, we demonstrate VIPUR's ability to highlight candidate variants associated with human diseases by applying VIPUR to de novo variants associated with autism spectrum disorders.

SUBMITTER: Baugh EH 

PROVIDER: S-EPMC4824117 | biostudies-literature | 2016 Apr

REPOSITORIES: biostudies-literature

Publications

Robust classification of protein variation using structural modelling and large-scale data integration.

Evan H Baugh, Riley Simmons-Edler, Christian L Müller, Rebecca F Alford, Natalia Volfovsky, Alex E Lash, Richard Bonneau

Nucleic Acids Research, 2016-02-28, issue 6


Similar Datasets

| S-EPMC4426897 | biostudies-literature
| S-EPMC6056659 | biostudies-literature
| S-EPMC2788929 | biostudies-literature
| S-EPMC7286535 | biostudies-literature
| S-EPMC3245020 | biostudies-literature
| S-EPMC8491082 | biostudies-literature
| S-EPMC3092116 | biostudies-literature
| S-EPMC1557986 | biostudies-literature