Dataset Information

A data mining paradigm for identifying key factors in biological processes using gene expression data.


ABSTRACT: A large volume of biological data is being generated for studying mechanisms of various biological processes. These precious data enable large-scale computational analyses to gain biological insights. However, it remains a challenge to mine the data efficiently for knowledge discovery. The heterogeneity of these data makes it difficult to consistently integrate them, slowing down the process of biological discovery. We introduce a data processing paradigm to identify key factors in biological processes via systematic collection of gene expression datasets, primary analysis of data, and evaluation of consistent signals. To demonstrate its effectiveness, our paradigm was applied to epidermal development and identified many genes that play a potential role in this process. Besides the known epidermal development genes, a substantial proportion of the identified genes are still not supported by gain- or loss-of-function studies, yielding many novel genes for future studies. Among them, we selected a top gene for loss-of-function experimental validation and confirmed its function in epidermal differentiation, proving the ability of this paradigm to identify new factors in biological processes. In addition, this paradigm revealed many key genes in cold-induced thermogenesis using data from cold-challenged tissues, demonstrating its generalizability. This paradigm can lead to fruitful results for studying molecular mechanisms in an era of explosive accumulation of publicly available biological data.
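The "evaluation of consistent signals" step named in the abstract can be illustrated with a simple vote count across datasets. This is only a minimal sketch under stated assumptions, not the authors' published procedure: the function name `consistent_genes`, the `min_datasets` threshold, and the input layout (one dict per dataset mapping a gene to +1 for up-regulated or -1 for down-regulated) are all hypothetical choices made for illustration.

```python
from collections import Counter


def consistent_genes(per_dataset_calls, min_datasets=2):
    """Rank genes by how consistently they change across datasets.

    per_dataset_calls: list of dicts, one per dataset, mapping a gene
    name to +1 (up-regulated) or -1 (down-regulated). This layout is an
    assumption for illustration, not the format used in the paper.
    """
    up = Counter()
    down = Counter()
    for calls in per_dataset_calls:
        for gene, direction in calls.items():
            if direction > 0:
                up[gene] += 1
            elif direction < 0:
                down[gene] += 1

    ranked = []
    for gene in set(up) | set(down):
        # Net vote: datasets calling the gene up minus those calling it down.
        score = up[gene] - down[gene]
        if abs(score) >= min_datasets:
            ranked.append((gene, score))

    # Strongest net agreement first; ties broken alphabetically.
    ranked.sort(key=lambda item: (-abs(item[1]), item[0]))
    return ranked


if __name__ == "__main__":
    # Toy input with placeholder gene names, not real results.
    toy = [
        {"GENE_A": 1, "GENE_B": -1, "GENE_C": 1},
        {"GENE_A": 1, "GENE_B": -1, "GENE_C": -1},
        {"GENE_A": 1},
    ]
    print(consistent_genes(toy))
```

A net vote is used here so that a gene called up in one dataset and down in another cancels out; a real analysis would more likely weigh effect sizes and significance rather than raw counts.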

SUBMITTER: Li J 

PROVIDER: S-EPMC5998123 | biostudies-literature | 2018 Jun

REPOSITORIES: biostudies-literature

Publications

A data mining paradigm for identifying key factors in biological processes using gene expression data.

Li J, Zheng L, Uchiyama A, Bin L, Mauro TM, Elias PM, Pawelczyk T, Sakowicz-Burkiewicz M, Trzeciak M, Leung DYM, Morasso MI, Yu P

Scientific Reports, 2018-06-13, issue 1


Similar Datasets

2018-05-31 | GSE100100 | GEO
| S-EPMC4480927 | biostudies-literature
| S-EPMC6668839 | biostudies-literature
| S-EPMC3116449 | biostudies-literature
| S-EPMC4833187 | biostudies-literature