
Dataset Information

An extension of PPLS-DA for classification and comparison to ordinary PLS-DA.


ABSTRACT: Classification studies are widely applied, e.g. in biomedical research to classify objects/patients into predefined groups. The goal is to find a classification function/rule which assigns each object/patient to a unique group with the greatest possible accuracy (i.e. the lowest classification error). In gene expression experiments in particular, many variables (genes) are often measured for only a few objects/patients. A suitable approach is the well-known method PLS-DA, which searches for a transformation to a lower-dimensional space. The resulting new components are linear combinations of the original variables. PPLS-DA is an advancement of PLS-DA that introduces a so-called 'power parameter', which is chosen to maximize the correlation between the components and the group membership. We introduce an extension of PPLS-DA that optimizes this power parameter directly for the final aim, namely a minimal classification error. We compare this new extension with the original PPLS-DA and also with ordinary PLS-DA using simulated and experimental data sets. For the investigated data sets with weak linear dependency between features/variables, neither PPLS-DA nor the extensions show an improvement over PLS-DA. For simulated data with very weak linear dependency and a low proportion of differentially expressed genes, PPLS-DA does not improve on PLS-DA, but our extension shows a lower prediction error. In contrast, for the data set with strong between-feature collinearity, a low proportion of differentially expressed genes and a large total number of genes, the prediction error of PPLS-DA and the extensions is clearly lower than that of PLS-DA. Moreover, we compare these prediction results with those of support vector machines with a linear kernel and of linear discriminant analysis.
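
ILLUSTRATION: The abstract states that PLS-DA components are linear combinations of the original variables. The following is a minimal sketch of that idea for two groups, not the authors' implementation. It computes only the first component of ordinary PLS-DA, with feature weights proportional to the covariance between each centred feature and the centred group indicator, and then assigns each object to the group whose mean score is nearest. The function names and the toy data are hypothetical, and the power parameter of PPLS-DA is not modelled here.

```python
def first_pls_component(X, y):
    """First PLS component for a two-group problem (illustrative sketch).

    X: list of rows (objects), each a list of feature values.
    y: list of 0/1 group labels, one per object.
    Returns (weights, scores).
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre the features and the group indicator.
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight of feature j is proportional to its covariance with the group indicator.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5  # assumes at least one feature covaries with y
    w = [v / norm for v in w]
    # The component (score) of each object is a linear combination of its features.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, t


def classify_by_nearest_mean(t, y):
    """Assign each score to the group whose mean score is closest."""
    scores0 = [s for s, g in zip(t, y) if g == 0]
    scores1 = [s for s, g in zip(t, y) if g == 1]
    m0 = sum(scores0) / len(scores0)
    m1 = sum(scores1) / len(scores1)
    return [0 if abs(s - m0) <= abs(s - m1) else 1 for s in t]


# Hypothetical toy data: 6 objects, 3 "genes"; only the first gene differs between groups.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 5.1, 0.1],
    [0.9, 4.9, 0.3],
    [3.0, 5.0, 0.2],
    [3.1, 5.2, 0.1],
    [2.9, 4.8, 0.3],
]
y = [0, 0, 0, 1, 1, 1]

w, t = first_pls_component(X, y)
predicted = classify_by_nearest_mean(t, y)
```

In this toy example the weight vector concentrates on the first feature, since it is the only one whose mean differs between the groups. A real analysis would extract several components and estimate the classification error on held-out data rather than on the training objects.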

SUBMITTER: Telaar A 

PROVIDER: S-EPMC3569448 | BioStudies | 2013-01-01

REPOSITORIES: biostudies

Similar Datasets

2020-01-01 | S-EPMC7180459 | BioStudies
1000-01-01 | S-EPMC3337399 | BioStudies
1000-01-01 | S-EPMC6127926 | BioStudies
1000-01-01 | S-EPMC3133555 | BioStudies
1000-01-01 | S-EPMC6320853 | BioStudies
2014-01-01 | S-EPMC4018361 | BioStudies
2019-01-01 | S-EPMC6832872 | BioStudies
2018-01-01 | S-EPMC5912890 | BioStudies
2020-01-01 | S-EPMC7098614 | BioStudies
2019-01-01 | S-EPMC6631843 | BioStudies