Dataset Information

Statistical analysis of a Bayesian classifier based on the expression of miRNAs.

ABSTRACT:

Background

Over the last decade, many studies have examined the possible use of miRNA levels as diagnostic and prognostic tools for different kinds of cancer. Developing reliable classifiers requires tackling several crucial aspects, some of which have been widely overlooked in the scientific literature: the distribution of the measured miRNA expressions and the statistical uncertainty that affects the parameters characterizing a classifier. In this paper, these topics are analysed in detail by discussing a model problem, namely the development of a Bayesian classifier that, on the basis of the expression of miR-205, miR-21 and snRNA U6, assigns samples to one of two classes of pulmonary tumors: adenocarcinomas and squamous cell carcinomas.

Results

We proved that the variance of miRNA expression triplicates is well described by a normal distribution and that triplicate averages also follow normal distributions. We provide a method to enhance a classifier's performance by exploiting the correlations between the class-discriminating miRNA and the expression of an additional normalized miRNA.
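Since the triplicate averages are reported to be normally distributed, a two-class Bayesian decision on a single expression feature can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the feature `x`, the class means, the standard deviations, and the prior `prior_a` are all hypothetical placeholders, and the function names are invented for this example.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_class_a(x, mean_a, sd_a, mean_b, sd_b, prior_a=0.5):
    """Posterior probability of class A given feature x, by Bayes' theorem.

    Each class is modelled as a normal distribution of the feature
    (hypothetical parameters, to be estimated from training samples).
    """
    weight_a = normal_pdf(x, mean_a, sd_a) * prior_a
    weight_b = normal_pdf(x, mean_b, sd_b) * (1.0 - prior_a)
    return weight_a / (weight_a + weight_b)


def classify(x, mean_a, sd_a, mean_b, sd_b, prior_a=0.5):
    """Assign the class with the larger posterior probability."""
    p_a = posterior_class_a(x, mean_a, sd_a, mean_b, sd_b, prior_a)
    return "A" if p_a >= 0.5 else "B"
```

With equal priors and equal standard deviations, the decision threshold falls at the midpoint between the two class means; unequal priors or spreads shift it.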

Conclusions

By exploiting the normal behavior of triplicate variances and averages, invalid samples (outliers) can be identified by checking their variability via a chi-square test or their displacement from the respective population mean via Student's t-test. Finally, the normal behavior allows the Bayesian classifier to be set optimally and its performance and the related uncertainty to be determined.
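The variability check mentioned above can be illustrated with a minimal sketch. It assumes that the three replicate measurements are normally distributed with a known population standard deviation `population_sd` (a hypothetical input, not a value from the paper). Under that assumption, the scaled sample variance (n - 1)s²/σ² follows a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is simply exp(-x/2). The function names and the threshold `alpha` are illustrative choices, and this is not necessarily the exact procedure used by the authors.

```python
import math
from statistics import variance


def triplicate_variance_pvalue(triplicate, population_sd):
    """Upper-tail chi-square p-value for the variability of one triplicate.

    Assumes normally distributed replicates with known standard deviation
    `population_sd` (hypothetical). For n = 3 the statistic has 2 degrees
    of freedom, so the survival function is exp(-x / 2).
    """
    if len(triplicate) != 3:
        raise ValueError("expected exactly three replicate measurements")
    statistic = 2.0 * variance(triplicate) / population_sd ** 2
    return math.exp(-0.5 * statistic)


def is_variability_outlier(triplicate, population_sd, alpha=0.01):
    """Flag a triplicate whose spread is implausibly large at level alpha."""
    return triplicate_variance_pvalue(triplicate, population_sd) < alpha
```

For example, with a hypothetical `population_sd` of 0.2, the triplicate `[20.0, 20.1, 19.9]` is not flagged, whereas `[20.0, 22.0, 18.0]` is.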

SUBMITTER: Ricci L

PROVIDER: S-EPMC4559882 | biostudies-literature | 2015 Sep

REPOSITORIES: biostudies-literature

Publications

Statistical analysis of a Bayesian classifier based on the expression of miRNAs.

Ricci L, Del Vescovo V, Cantaloni C, Grasso M, Barbareschi M, Denti MA

BMC Bioinformatics, 2015-09-04

PMID: 26338526

Similar Datasets

Project description: In clinical trials evaluating antibody-drug conjugates (ADCs), HER2-low breast cancer is defined by an immunohistochemistry (IHC) protein score of 1+ or 2+ without gene amplification. However, in daily practice, the accuracy of IHC is compromised by inter-observer variability. Herein, we aimed to identify HER2-low primary breast tumors by leveraging gene expression profiling. A discovery approach was applied to the gene expression profiles of the institutional INT1 (n = 125) and INT2 (n = 84) datasets. We identified differentially expressed genes (DEGs) in each specific HER2 IHC category: 0, 1+, 2+ and 3+. Principal Component Analysis was used to generate a HER2-low signature whose performance was evaluated in the independent INT3 dataset (n = 95) and in the publicly available TCGA and GSE81538 datasets. The association between the HER2-low signature and HER2 IHC categories was evaluated by the Kruskal-Wallis test with post hoc pair-wise comparisons. The discriminatory capability of the HER2-low signature was assessed by estimating the area under the receiver operating characteristic curve (AUC). Gene Ontology and KEGG analyses were performed to evaluate the functional enrichment of the HER2-low signature genes. A HER2-low signature was computed based on HER2 IHC category-specific DEGs. The twenty genes included in the signature were significantly enriched in lipid and steroid metabolism pathways, peptidase regulation, and humoral immune response. The HER2-low signature values showed a bell-shaped distribution across IHC categories (low values in 0 and 3+; high values in 1+ and 2+), effectively distinguishing HER2-low from both 0 (p < 0.001) and 3+ (p < 0.001). Notably, the signature values were higher in tumors scored 1+ than in those scored 0. The association of the HER2-low signature with IHC categories and its bell-shaped distribution were confirmed in the independent INT3, TCGA and GSE81538 datasets. In the combined INT1 and INT3 datasets, the HER2-low signature achieved an AUC of 0.74 (95% confidence interval, CI, 0.67-0.81) in distinguishing HER2-low from the other categories, outperforming ERBB2 mRNA alone, which achieved an AUC of 0.52 (95% CI 0.43-0.60). These results represent a proof of concept for an observer-independent, gene-expression-based classifier of HER2-low status. The 20-gene signature identified here shows promise for distinguishing between HER2 0 and HER2-low tumors, including those scored 1+ at IHC, and for developing an approach to selecting ADC candidates.

Project description: Protein secondary structure prediction provides insight into protein function and is a valuable preliminary step for predicting the 3D structure of a protein. Dynamic Bayesian networks (DBNs) and support vector machines (SVMs) have been shown to provide state-of-the-art performance in secondary structure prediction. As the size of the protein database grows, it becomes feasible to use a richer model in an effort to capture subtle correlations among the amino acids and the predicted labels. In this context, it is beneficial to derive sparse models that discourage over-fitting and provide biological insight. In this paper, we first show that we are able to obtain accurate secondary structure predictions. Our per-residue accuracy on a well established and difficult benchmark (CB513) is 80.3%, which is comparable to the state-of-the-art evaluated on this dataset. We then introduce an algorithm for sparsifying the parameters of a DBN. Using this algorithm, we can automatically remove up to 70-95% of the parameters of a DBN while maintaining the same level of predictive accuracy on the SD576 set. At 90% sparsity, we are able to compute predictions three times faster than a fully dense model evaluated on the SD576 set. We also demonstrate, using simulated data, that the algorithm is able to recover true sparse structures with high accuracy, and using real data, that the sparse model identifies known correlation structure (local and non-local) related to different classes of secondary structure elements. We present a secondary structure prediction method that employs dynamic Bayesian networks and support vector machines. We also introduce an algorithm for sparsifying the parameters of the dynamic Bayesian network. The sparsification approach yields a significant speed-up in generating predictions, and we demonstrate that the amino acid correlations identified by the algorithm correspond to several known features of protein secondary structure.
Datasets and source code used in this study are available at http://noble.gs.washington.edu/proj/pssp.
