
Dataset Information


Hi-LASSO: High-performance python and apache spark packages for feature selection with high-dimensional data.


ABSTRACT: High-dimensional LASSO (Hi-LASSO) is a powerful feature selection tool for high-dimensional data. Our previous study showed that Hi-LASSO outperformed other state-of-the-art LASSO methods. However, the substantial computational cost of bootstrapping and the lack of experiments with a parametric statistical test for feature selection have hindered the use of Hi-LASSO in practical applications. In this paper, we present a Python package and its Spark library, designed to run efficiently in parallel on real-world problems, which also provide parametric statistical tests for feature selection on high-dimensional data. We demonstrate Hi-LASSO's superior performance through extensive experiments in practical settings. With these packages, Hi-LASSO can be applied efficiently and easily for feature selection. The Hi-LASSO packages are publicly available at https://github.com/datax-lab/Hi-LASSO under the MIT license. They can be installed with Python pip, and additional documentation is available at https://pypi.org/project/hi-lasso and https://pypi.org/project/Hi-LASSO-spark.
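The abstract names three ingredients: repeated bootstrapping of a LASSO fit, running those bootstraps in parallel, and a parametric test on the outcome. The following is a minimal, standard-library-only sketch of that general idea, not the hi-lasso package's API or its exact procedure. All function names are hypothetical, the per-bootstrap feature subsampling follows the random-LASSO style as an assumption, and the test is assumed to be a one-sided binomial test of each feature's selection count against the pooled selection rate. Threads are used only to show the parallel structure; because of Python's GIL, a real CPU-bound implementation would use processes or Spark workers instead.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def soft_threshold(z, t):
    """Soft-thresholding operator used by coordinate-descent LASSO."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, alpha, n_iter=100, tol=1e-6):
    """Coordinate descent for (1/2n)||y - Xb||^2 + alpha*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta initially zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, alpha) / col_sq[j]
            d = new - beta[j]
            if d != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * d
                beta[j] = new
                max_delta = max(max_delta, abs(d))
        if max_delta < tol:
            break
    return beta


def one_bootstrap(X, y, q, alpha, seed):
    """Fit LASSO on a bootstrap sample of rows and a random subset of q features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    rows = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
    feats = rng.sample(range(p), q)              # features drawn without replacement
    Xb = [[X[i][j] for j in feats] for i in rows]
    yb = [y[i] for i in rows]
    beta = lasso_cd(Xb, yb, alpha)
    return feats, [feats[k] for k, b in enumerate(beta) if b != 0.0]


def binom_sf(k, n, p0):
    """P(K >= k) for K ~ Binomial(n, p0)."""
    return sum(math.comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def bootstrap_lasso_select(X, y, B=30, q=None, alpha=0.2, workers=4, level=0.05):
    """Hypothetical helper: parallel bootstrap LASSO plus a binomial selection test."""
    p = len(X[0])
    q = q or p
    with ThreadPoolExecutor(max_workers=workers) as ex:
        results = list(ex.map(lambda s: one_bootstrap(X, y, q, alpha, s), range(B)))
    sampled, selected = [0] * p, [0] * p
    for feats, sel in results:
        for j in feats:
            sampled[j] += 1
        for j in sel:
            selected[j] += 1
    # Assumed null: every feature is selected at the pooled selection rate p0.
    total = sum(sampled)
    p0 = sum(selected) / total if total else 0.0
    pvals = [binom_sf(selected[j], sampled[j], p0) if sampled[j] else 1.0 for j in range(p)]
    chosen = [j for j in range(p) if pvals[j] < level]
    return chosen, pvals


# Synthetic example: only features 0 and 1 carry signal.
data_rng = random.Random(0)
X = [[data_rng.gauss(0, 1) for _ in range(10)] for _ in range(60)]
y = [3 * row[0] - 2 * row[1] + data_rng.gauss(0, 0.5) for row in X]
chosen, pvals = bootstrap_lasso_select(X, y)
print("selected features:", chosen)
```

Each bootstrap is independent of the others, which is what makes the procedure easy to distribute; in a Spark setting the same map step would run across executors, and only the per-feature counts need to be aggregated before the test.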

SUBMITTER: Jo J 

PROVIDER: S-EPMC9714948 | biostudies-literature | 2022

REPOSITORIES: biostudies-literature


Publications

Hi-LASSO: High-performance python and apache spark packages for feature selection with high-dimensional data.

Jongkwon Jo, Seungha Jung, Joongyang Park, Youngsoon Kim, Mingon Kang

PLoS ONE, 2022 Dec 1, issue 12



Similar Datasets

| S-EPMC6113509 | biostudies-literature
| S-EPMC10119907 | biostudies-literature
| S-EPMC9269990 | biostudies-literature
| S-EPMC10585895 | biostudies-literature
| S-EPMC8756192 | biostudies-literature
| S-EPMC7537910 | biostudies-literature
| S-EPMC7886179 | biostudies-literature
| S-EPMC3445441 | biostudies-literature
| S-EPMC4827277 | biostudies-literature
| S-EPMC9677495 | biostudies-literature