DIAGNOSIS-GUIDED METHOD FOR IDENTIFYING MULTI-MODALITY NEUROIMAGING BIOMARKERS ASSOCIATED WITH GENETIC RISK FACTORS IN ALZHEIMER'S DISEASE.
ABSTRACT: Many recent imaging genetics studies focus on detecting associations between genetic markers, such as single nucleotide polymorphisms (SNPs), and quantitative traits (QTs). Although a large number of generalized multivariate regression methods exist, few of them use subjects' diagnosis information to enhance the analysis, and few traditional models investigate the identification of multi-modality phenotypic patterns associated with genotype groups of interest. To reveal disease-relevant imaging genetic associations, we propose a novel diagnosis-guided multi-modality (DGMM) framework to discover multi-modality imaging QTs that are associated with both Alzheimer's disease (AD) and its top genetic risk factor (i.e., the APOE SNP rs429358). The strength of the proposed method is that it explicitly models the prior diagnosis information among subjects in the objective function for selecting disease-relevant and robust multi-modality QTs associated with the SNP. We evaluate our method on two modalities of imaging phenotypes, i.e., those extracted from structural magnetic resonance imaging (MRI) data and fluorodeoxyglucose positron emission tomography (FDG-PET) data in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The experimental results demonstrate that, compared with reference methods, the proposed method not only achieves better performance in terms of root mean squared error and correlation coefficient but also identifies common informative regions of interest (ROIs) across multiple modalities to guide disease-induced biological interpretation.
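As an aside, the two evaluation metrics named above (root mean squared error and correlation coefficient) are standard and can be computed as in this minimal numpy sketch; the function names are illustrative, not from the paper:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted QT values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def corr(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    return np.corrcoef(np.asarray(y_true, float), np.asarray(y_pred, float))[0, 1]
```

Lower RMSE and higher correlation both indicate that the selected QTs are better predicted from the genotype.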
Project description: Neuroimaging genetics has attracted growing attention, as it is thought to be a powerful strategy for examining the influence of genetic variants (e.g., single nucleotide polymorphisms (SNPs)) on the structure and function of the human brain. In recent studies, univariate or multivariate regression analysis methods are typically used to capture effective associations between genetic variants and quantitative traits (QTs) such as brain imaging phenotypes. The identified imaging QTs, although associated with certain genetic markers, may not all be disease specific. A useful but underexplored scenario is to discover only those QTs associated with both genetic markers and disease status, revealing the chain from genotype to phenotype to symptom. In addition, multi-modal brain imaging phenotypes are extracted from different perspectives, and imaging markers consistently showing up across modalities may provide more insight into the mechanistic understanding of diseases such as Alzheimer's disease (AD). In this work, we propose a general framework that exploits multi-modal brain imaging phenotypes as intermediate traits bridging genetic risk factors and multi-class disease status. We applied the proposed method to explore the relation between the well-known AD risk SNP APOE rs429358 and three baseline brain imaging modalities (i.e., structural magnetic resonance imaging (MRI), fluorodeoxyglucose positron emission tomography (FDG-PET), and F-18 florbetapir PET amyloid imaging (AV45)) from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The empirical results demonstrate that the proposed method not only improves the performance of imaging genetic association analysis, but also discovers robust and consistent regions of interest (ROIs) across modalities to guide disease-induced interpretation.
Project description: Brain imaging genetics studies the genetic basis of brain structure and function by integrating genotypic data, such as single nucleotide polymorphisms (SNPs), with imaging quantitative traits (QTs). In this area, both multi-task learning (MTL) and sparse canonical correlation analysis (SCCA) methods are widely used, since they are superior to independent, pairwise univariate analyses. MTL methods generally incorporate only a few QTs and are not designed for feature selection from a large number of QTs, while existing SCCA methods typically employ only one modality of QTs to study its association with SNPs. Both MTL and SCCA encounter computational challenges as the number of SNPs increases. In this paper, combining the merits of MTL and SCCA, we propose a novel multi-task SCCA (MTSCCA) learning framework to identify bi-multivariate associations between SNPs and multi-modal imaging QTs. MTSCCA can make use of the complementary information carried by different imaging modalities. Using the G2,1-norm regularization, MTSCCA treats all SNPs in the same group together to enforce sparsity at the group level. The ℓ2,1-norm penalty is used to jointly select features across multiple tasks for SNPs, and across multiple modalities for QTs. A fast optimization algorithm is proposed using the grouping information of SNPs. Compared with conventional SCCA methods, MTSCCA obtains improved performance regarding both correlation coefficients and canonical weight patterns. In addition, our method runs very fast and is easy to implement, and thus could provide a powerful tool for genome-wide, brain-wide imaging genetic studies.
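The two penalties named above can be sketched as follows, assuming a coefficient matrix `W` with one row per SNP and one column per task, and `groups` given as a list of row-index lists; all names are illustrative, not from the paper:

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm: sum of the l2 norms of the rows of W.
    Penalizing it drives entire rows (features) to zero jointly across tasks."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def g21_norm(W, groups):
    """G2,1-norm: sum over groups of the Frobenius norms of the row blocks.
    `groups` lists the row indices of each SNP group; penalizing it enforces
    sparsity at the group level, zeroing whole groups of SNPs together."""
    return float(sum(np.linalg.norm(W[idx, :]) for idx in groups))
```

When every group contains a single row, the G2,1-norm reduces to the ℓ2,1-norm.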
Project description: Neuroimaging genetics is an emerging field that aims to identify the associations between genetic variants (e.g., single nucleotide polymorphisms (SNPs)) and quantitative traits (QTs) such as brain imaging phenotypes. In recent studies, in order to detect complex multi-SNP-multi-QT associations, bi-multivariate techniques such as various structured sparse canonical correlation analysis (SCCA) algorithms have been proposed and used in imaging genetics studies. However, associations between genetic markers and imaging QTs identified by existing bi-multivariate methods may not all be disease specific. To bridge this gap, we propose an analytical framework, based on three-way sparse canonical correlation analysis (T-SCCA), to explore the intrinsic associations among genetic markers, imaging QTs, and clinical scores of interest. We perform an empirical study using the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort to discover the relationships among SNPs from the AD risk gene APOE, imaging QTs extracted from structural magnetic resonance imaging scans, and cognitive and diagnostic outcomes. The proposed T-SCCA model not only outperforms the traditional SCCA method in terms of identifying strong associations, but also discovers robust outcome-relevant imaging genetic patterns, demonstrating its promise for improving disease-related mechanistic understanding.
Project description: Recent advances in high-throughput genotyping and brain imaging techniques enable new approaches to studying the influence of genetic variation on brain structure and function. Traditional association studies typically employ independent, pairwise univariate analyses, which treat single nucleotide polymorphisms (SNPs) and quantitative traits (QTs) as isolated units and ignore important underlying interacting relationships between the units. New methods are proposed here to overcome this limitation. Taking into account the interlinked structure within and between SNPs and imaging QTs, we propose a novel Group-Sparse Multi-task Regression and Feature Selection (G-SMuRFS) method to identify quantitative trait loci for multiple disease-relevant QTs, and apply it to a study of mild cognitive impairment and Alzheimer's disease. Built upon regression analysis, our model uses a new form of regularization, the group ℓ2,1-norm (G2,1-norm), to incorporate the biological group structure among SNPs induced by their genetic arrangement. The new G2,1-norm considers the regression coefficients of all the SNPs in each group, with respect to all the QTs, together, and enforces sparsity at the group level. In addition, an ℓ2,1-norm regularization is utilized to couple feature selection across multiple tasks, exploiting the shared underlying mechanism among different brain regions. The effectiveness of the proposed method is demonstrated both by clearly improved prediction performance in empirical evaluations and by a compact set of selected SNP predictors relevant to the imaging QTs. Software is publicly available at: http://ranger.uta.edu/%7eheng/imaging-genetics/.
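Group-level sparsity of this kind is typically enforced in proximal-gradient solvers via block soft-thresholding, the proximal operator of the G2,1-norm. The sketch below shows that standard operator (it is a generic building block, not necessarily G-SMuRFS's exact algorithm; names are illustrative):

```python
import numpy as np

def prox_group_l21(W, groups, tau):
    """Proximal operator of tau * G2,1-norm (block soft-thresholding).
    Each group's block of rows is shrunk toward zero; a group whose
    Frobenius norm is at most tau is zeroed out entirely, which is how
    group-level sparsity arises during optimization."""
    W = W.astype(float).copy()
    for idx in groups:
        nrm = np.linalg.norm(W[idx, :])
        W[idx, :] = 0.0 if nrm <= tau else (1.0 - tau / nrm) * W[idx, :]
    return W
```

Applied after each gradient step on the smooth loss, this operator keeps only SNP groups whose joint effect across all QTs is strong enough to survive the threshold.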
Project description: Neurodegenerative disorders, such as Alzheimer's disease, are associated with changes in multiple neuroimaging and biological measures, which may provide complementary information for diagnosis and prognosis. We present a multi-modality classification framework in which manifolds are constructed based on pairwise similarity measures derived from random forest classifiers. Similarities from multiple modalities are combined to generate an embedding that simultaneously encodes information about all the available features. Multi-modality classification is then performed using coordinates from this joint embedding. We evaluate the proposed framework by applying it to neuroimaging and biological data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Features include regional MRI volumes, voxel-based FDG-PET signal intensities, CSF biomarker measures, and categorical genetic information. Classification based on the joint embedding constructed using information from all four modalities outperforms classification based on any individual modality for comparisons between Alzheimer's disease patients and healthy controls, as well as between mild cognitive impairment patients and healthy controls. Based on the joint embedding, we achieve classification accuracies of 89% between Alzheimer's disease patients and healthy controls, and 75% between mild cognitive impairment patients and healthy controls. These results are comparable with those reported in other recent studies using multi-kernel learning. Random forests provide consistent pairwise similarity measures for multiple modalities, thus facilitating the combination of different types of feature data. We demonstrate this by applying the framework to data in which the number of features differs by several orders of magnitude between modalities.
Random forest classifiers extend naturally to multi-class problems, and the framework described here could be applied to distinguish between multiple patient groups in the future.
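A simplified sketch of the idea, assuming the forest is already trained and each sample's leaf assignments are available: similarity is taken as the fraction of trees placing two samples in the same leaf, per-modality similarity matrices are averaged, and the joint embedding is obtained classical-MDS style from the centered matrix. These are common simplifications, not necessarily the paper's exact construction; all names are illustrative:

```python
import numpy as np

def forest_similarity(leaf_ids):
    """Proximity from a trained forest. leaf_ids is (n_samples, n_trees),
    giving the leaf index each sample reaches in each tree; similarity is
    the fraction of trees in which two samples share a leaf."""
    n = leaf_ids.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        S[i] = np.mean(leaf_ids == leaf_ids[i], axis=1)
    return S

def joint_embedding(sims, dim=2):
    """Average per-modality similarity matrices, double-center the result,
    and embed samples via the top eigenvectors (classical-MDS style)."""
    S = np.mean(sims, axis=0)
    n = S.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    vals, vecs = np.linalg.eigh(J @ S @ J)
    order = np.argsort(vals)[::-1][:dim]  # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```

Because each modality contributes only an n-by-n similarity matrix, modalities with vastly different feature counts combine on equal footing, as described above.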
Project description: Multimodal data fusion has shown great advantages in uncovering information that could be overlooked when using a single modality. In this paper, we consider the integration of high-dimensional multi-modality imaging and genetic data for Alzheimer's disease (AD) diagnosis. With a focus on taking advantage of both phenotype and genotype information, a novel structured-sparsity multiple kernel learning method, regularized by the ℓ1,p-norm (p > 1), is designed. Specifically, to facilitate structured feature selection and fusion from heterogeneous modalities and to capture feature-wise importance, we represent each feature with a distinct kernel as a basis, then group the kernels according to modalities. An optimally combined kernel representation of multimodal features is then learned in a data-driven manner. In contrast to the Group Lasso (i.e., the ℓ2,1-norm penalty), which performs sparse group selection, the proposed regularizer on the kernel weights sparsely selects a concise feature set within each homogeneous group and fuses the heterogeneous feature groups by taking advantage of dense norms. We have evaluated our method using data from subjects in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The effectiveness of the method is demonstrated by clearly improved diagnostic prediction as well as the discovered brain regions and SNPs relevant to AD.
Project description: MOTIVATION: Identifying the genetic basis of brain structure, function, and disorder by using imaging quantitative traits (QTs) as endophenotypes is an important task in brain science. Brain QTs often change over time as a disorder progresses, and thus understanding how genetic factors influence these progressive changes is of great importance. Most existing imaging genetics methods analyze only baseline neuroimaging data, so longitudinal imaging data across multiple time points, which contain important disease progression information, are omitted. RESULTS: We propose a novel temporal imaging genetic model that performs multi-task sparse canonical correlation analysis (T-MTSCCA). Our model uses longitudinal neuroimaging data to uncover how single nucleotide polymorphisms (SNPs) affect brain QTs over time. By incorporating the relationships within the longitudinal imaging data and within the SNPs, T-MTSCCA can identify a trajectory of progressive imaging genetic patterns over time. We propose an efficient algorithm to solve the problem and show its convergence. We evaluate T-MTSCCA on 408 subjects from the Alzheimer's Disease Neuroimaging Initiative database with longitudinal magnetic resonance imaging data and genetic data available. The experimental results show that T-MTSCCA performs better than or comparably to the state-of-the-art methods. In particular, T-MTSCCA identifies higher canonical correlation coefficients and captures clearer canonical weight patterns. This suggests that T-MTSCCA identifies time-consistent and time-dependent SNPs and imaging QTs, which further helps in understanding the genetic basis of brain QT changes during disease progression. AVAILABILITY AND IMPLEMENTATION: The software and simulation data are publicly available at https://github.com/dulei323/TMTSCCA.
SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
Project description: Imaging genetics is an emerging field that studies the influence of genetic variation on brain structure and function. The major task is to examine the association between genetic markers, such as single nucleotide polymorphisms (SNPs), and quantitative traits (QTs) extracted from neuroimaging data. Sparse canonical correlation analysis (SCCA) is a bi-multivariate technique used in imaging genetics to identify complex multi-SNP-multi-QT associations. In imaging genetics, genes associated with a phenotype should at least be expressed in the phenotypical region. We study the association between genotype and amyloid imaging data, and propose a transcriptome-guided SCCA framework that incorporates gene expression information into the SCCA criterion. An alternating optimization method is used to solve the formulated problem. Although the problem is not biconcave, a closed-form solution has been found for each subproblem. The results on real data show that using gene expression data to guide feature selection facilitates the detection of genetic markers that are not only associated with the identified QTs but also highly expressed there.
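For intuition, a generic alternating SCCA baseline can be sketched as below, under a diagonal-covariance assumption as in penalized matrix decomposition variants. This is a plain SCCA sketch, not the transcriptome-guided criterion itself, and all names and parameters are illustrative:

```python
import numpy as np

def soft(v, lam):
    """Element-wise soft-thresholding: the closed-form solution of each
    l1-penalized subproblem."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def scca(X, Y, lam_u=0.1, lam_v=0.1, iters=100):
    """Alternately update canonical weights u (for SNPs) and v (for QTs):
    each update maximizes the cross-covariance with the other weight
    fixed, followed by sparsification and unit-norm scaling."""
    u = np.ones(X.shape[1]) / np.sqrt(X.shape[1])
    v = np.ones(Y.shape[1]) / np.sqrt(Y.shape[1])
    C = X.T @ Y  # cross-covariance (columns assumed standardized)
    for _ in range(iters):
        u = soft(C @ v, lam_u)
        if np.linalg.norm(u) > 0:
            u /= np.linalg.norm(u)
        v = soft(C.T @ u, lam_v)
        if np.linalg.norm(v) > 0:
            v /= np.linalg.norm(v)
    return u, v
```

The transcriptome-guided framework described above modifies this criterion so that selected genetic markers are additionally pushed toward those expressed in the identified brain regions.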
Project description: In this article, the authors aim to make maximal use of multimodality neuroimaging and genetic data for identifying Alzheimer's disease (AD) and its prodromal stage, mild cognitive impairment (MCI), among normal aging subjects. Multimodality neuroimaging data such as MRI and PET provide valuable insights into brain abnormalities, while genetic data such as single nucleotide polymorphisms (SNPs) provide information about a patient's AD risk factors. When these data are used together, the accuracy of AD diagnosis may be improved. However, these data are heterogeneous (e.g., with different data distributions) and have different numbers of samples (e.g., far fewer PET samples than MRI or SNP samples). Thus, learning an effective model from these data is challenging. To this end, we present a novel three-stage deep feature learning and fusion framework, in which a deep neural network is trained stage-wise. Each stage of the network learns feature representations for different combinations of modalities, trained effectively using the maximum number of available samples. Specifically, in the first stage, we learn latent representations (i.e., high-level features) for each modality independently, so that the heterogeneity among modalities can be partially addressed and high-level features from different modalities can be combined in the next stage. In the second stage, we learn joint latent features for each pair of modalities using the high-level features learned in the first stage. In the third stage, we learn the diagnostic labels by fusing the joint latent features learned in the second stage. To further increase the number of samples during training, we also use data from multiple scanning time points for each training subject.
We evaluate the proposed framework on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset for AD diagnosis, and the experimental results show that it outperforms other state-of-the-art methods.
Project description: Research on the associations between genetic variation and imaging phenotypes is advancing with progress in high-throughput genotyping and brain imaging techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso penalties such as the ℓ2,1-norm, leading to better predictive results and insights into SNPs. However, group sparsity alone cannot represent the correlation between multiple tasks, and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model for analyzing the associations between SNPs and QTs. We suppose that low-rank structure is also beneficial for uncovering the correlation between genetic variation and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model is more accurate in prediction than the compared methods and provides new insights into SNPs.
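Low-rank structure on a multi-task coefficient matrix is commonly induced with a nuclear-norm penalty, whose proximal operator is singular value thresholding. A minimal sketch of that standard operator (the paper's exact formulation may differ; names are illustrative):

```python
import numpy as np

def svt(W, tau):
    """Singular value thresholding: the proximal operator of
    tau * (nuclear norm). Soft-thresholds the singular values of W,
    zeroing the small ones, which yields the low-rank structure assumed
    to couple the regression tasks."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt
```

Used inside a proximal-gradient loop on the regression loss, this step keeps only the dominant shared directions across tasks, which is what lets the model capture correlations that row-wise ℓ2,1 sparsity alone cannot.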