
Dataset Information

BioVDB: biological vector database for high-throughput gene expression meta-analysis.


ABSTRACT: High-throughput sequencing has created an exponential increase in the amount of gene expression data, much of which is freely and publicly available in repositories such as NCBI's Gene Expression Omnibus (GEO). Querying this data for patterns such as similarity and distance, however, becomes increasingly challenging as the total amount of data grows. Furthermore, artificial intelligence and machine learning (AI/ML) approaches commonly require the data to be vectorized. We present BioVDB, a vector database for storage and analysis of gene expression data, which enhances the potential for integrating biological studies with AI/ML tools. We used a previously developed approach called Automatic Label Extraction (ALE) to extract sample labels from metadata, including age, sex, and tissue or cell line. BioVDB stores 438,562 samples from eight microarray GEO platforms. We show that it supports efficient querying of data by similarity search, which is also useful for inferring missing sample labels and for rapid similarity analysis.
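The similarity search and label inference mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which vector database backend, distance metric, or API BioVDB uses, so everything below is an assumption for illustration: a brute-force cosine similarity search in NumPy stands in for a real vector index, the toy expression matrix and tissue labels are made up, and the function names `cosine_top_k` and `infer_label` are hypothetical.

```python
from collections import Counter

import numpy as np


def cosine_top_k(query, matrix, k=3):
    """Return (indices, similarities) of the k rows of `matrix` closest to `query`.

    Brute-force cosine similarity; assumes no vector is all zeros.
    """
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = m @ q                      # cosine similarity of each row to the query
    order = np.argsort(-sims)[:k]     # indices sorted by descending similarity
    return order, sims[order]


def infer_label(query, matrix, labels, k=3):
    """Guess a missing label by majority vote among the k nearest labeled samples."""
    idx, _ = cosine_top_k(query, matrix, k)
    votes = [labels[i] for i in idx if labels[i] is not None]
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]


# Toy data (made up): rows are samples, columns are genes.
expr = np.array([
    [5.0, 1.0, 0.5, 0.2],
    [4.8, 1.2, 0.4, 0.3],
    [0.3, 0.5, 4.9, 5.1],
    [0.2, 0.4, 5.2, 4.8],
])
tissues = ["liver", "liver", "brain", "brain"]

# A sample whose tissue label is missing from its metadata.
unlabeled = np.array([4.9, 1.1, 0.6, 0.2])
predicted = infer_label(unlabeled, expr, tissues, k=3)
print(predicted)
```

A brute-force scan like this costs one pass over every stored sample per query. At the scale the abstract reports (438,562 samples), a dedicated vector database would normally replace the scan with an index, but the query and voting logic stay conceptually the same.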

SUBMITTER: Winnicki MJ 

PROVIDER: S-EPMC10957786 | biostudies-literature | 2024

REPOSITORIES: biostudies-literature

Publications

BioVDB: biological vector database for high-throughput gene expression meta-analysis.

Michał J Winnicki, Chase A Brown, Hunter L Porter, Cory B Giles, Jonathan D Wren

Frontiers in Artificial Intelligence, 2024-03-08



Similar Datasets

2022-12-12 | GSE219045 | GEO
| PRJNA906635 | ENA
| S-EPMC9887474 | biostudies-literature
| S-EPMC1779561 | biostudies-literature
| S-EPMC10030367 | biostudies-literature
| S-EPMC2186358 | biostudies-literature
| S-EPMC516075 | biostudies-literature
| S-EPMC1971126 | biostudies-literature
| S-EPMC2719616 | biostudies-literature
| S-EPMC3699644 | biostudies-literature