Unknown

Dataset Information

0

A tree-based gene-environment interaction analysis with rare features.


ABSTRACT: Gene-environment (G-E) interaction analysis plays a critical role in understanding and modeling complex diseases. Compared to main-effect-only analysis, it is more seriously challenged by higher dimensionality, weaker signals, and the unique "main effects, interactions" variable selection hierarchy. In joint G-E interaction analysis under which a large number of G factors are analysed in a single model, effort tailored to rare features (e.g., SNPs with low minor allele frequencies) has been limited. Existing investigations on rare features have been mostly focused on marginal analysis, where various data aggregation techniques have been developed, and hypothesis testings have been conducted to identify significant aggregated features. However, such techniques cannot be extended to joint G-E interaction analysis. In this study, building on a very recent tree-based data aggregation technique, which has been developed for main-effect-only analysis, we develop a new G-E interaction analysis approach tailored to rare features. The adopted data aggregation technique allows for more efficient information borrowing from neighboring rare features. Similar to some existing state-of-the-art ones, the proposed approach adopts penalization for variable selection, regularized estimation, and respect of the variable selection hierarchy. Simulation shows that it has more accurate identification of important interactions and main effects than several competing alternatives. In the analysis of NFBC1966 study, the proposed approach leads to findings different from the alternatives and with satisfactory prediction and stability performance.

SUBMITTER: Liu M 

PROVIDER: S-EPMC10691867 | biostudies-literature | 2022 Oct

REPOSITORIES: biostudies-literature

altmetric image

Publications

A tree-based gene-environment interaction analysis with rare features.

Liu Mengque M   Zhang Qingzhao Q   Ma Shuangge S  

Statistical analysis and data mining 20220301 5


Gene-environment (G-E) interaction analysis plays a critical role in understanding and modeling complex diseases. Compared to main-effect-only analysis, it is more seriously challenged by higher dimensionality, weaker signals, and the unique "main effects, interactions" variable selection hierarchy. In joint G-E interaction analysis under which a large number of G factors are analysed in a single model, effort tailored to rare features (e.g., SNPs with low minor allele frequencies) has been limi  ...[more]

Similar Datasets

| S-EPMC6399020 | biostudies-literature
| S-EPMC9555665 | biostudies-literature
| S-EPMC3713231 | biostudies-literature
| S-EPMC7028505 | biostudies-literature
| S-EPMC5288316 | biostudies-literature
| S-EPMC8661424 | biostudies-literature
| S-EPMC6339974 | biostudies-literature
| S-EPMC4926416 | biostudies-literature
| S-EPMC10622332 | biostudies-literature
| S-EPMC4157149 | biostudies-literature