Dataset Information

Orchid: a novel management, annotation and machine learning framework for analyzing cancer mutations.


ABSTRACT:

Motivation

As whole-genome tumor sequence and biological annotation datasets grow in size, number and content, there is an increasing basic science and clinical need for efficient and accurate data management and analysis software. With the emergence of increasingly sophisticated data stores, execution environments and machine learning algorithms, there is also a need for the integration of functionality across frameworks.

Results

We present orchid, a Python-based software package for the management, annotation and machine learning analysis of cancer mutations. Building on technologies for parallel workflow execution, in-memory database storage and machine learning analytics, orchid efficiently handles millions of mutations and hundreds of features in an easy-to-use manner. We describe the implementation of orchid and demonstrate its ability to distinguish tissue of origin in 12 tumor types based on 339 features using a random forest classifier.
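To make the classification step concrete, the following is a minimal sketch of the random forest idea (bootstrap sampling, random feature subsets at each split, majority voting). It is not orchid's code or API: the abstract states only that a random forest classifier was used. The toy data (three hypothetical tissue labels, four synthetic features), every function name, and all parameter values are assumptions chosen for illustration, and the forest is written in plain Python 3 so the example stays self-contained; a real analysis would normally rely on an established machine learning library.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, feature_ids):
    """Return (impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def grow_tree(rows, labels, rng, n_try, depth=0, max_depth=5):
    """Grow one tree; a leaf is a label, an internal node is a 4-tuple."""
    if depth >= max_depth or len(set(labels)) == 1:
        return majority(labels)
    # Each split considers only a random subset of the features.
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return majority(labels)
    _, f, t = split
    lo = [i for i, r in enumerate(rows) if r[f] <= t]
    hi = [i for i, r in enumerate(rows) if r[f] > t]
    return (
        f,
        t,
        grow_tree([rows[i] for i in lo], [labels[i] for i in lo], rng, n_try, depth + 1, max_depth),
        grow_tree([rows[i] for i in hi], [labels[i] for i in hi], rng, n_try, depth + 1, max_depth),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_try))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in forest])


def make_toy_data(seed, per_class=40):
    """Synthetic data: feature k is shifted upward for tissue k (assumption)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for k, tissue in enumerate(["breast", "lung", "skin"]):  # hypothetical labels
        for _ in range(per_class):
            row = [rng.gauss(0, 1) for _ in range(4)]
            row[k] += 10
            rows.append(row)
            labels.append(tissue)
    return rows, labels


train_rows, train_labels = make_toy_data(seed=1)
test_rows, test_labels = make_toy_data(seed=2)
forest = fit_forest(train_rows, train_labels)
hits = sum(predict_forest(forest, r) == y for r, y in zip(test_rows, test_labels))
accuracy = hits / len(test_rows)
print("held-out accuracy on toy data:", accuracy)
```

The toy classes are deliberately well separated, so the accuracy printed here says nothing about performance on real tumor data; the sketch only shows how per-sample feature vectors and tissue labels feed into training and prediction.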

Availability and implementation

Orchid and our annotated tumor mutation database are freely available at https://github.com/wittelab/orchid. The software is implemented in Python 2.7 and uses MySQL or MemSQL databases. Groovy 2.4.5 is optionally required for parallel workflow execution.

Contact

JWitte@ucsf.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Cario CL 

PROVIDER: S-EPMC5860353 | biostudies-literature | 2018 Mar

REPOSITORIES: biostudies-literature

Publications

Orchid: a novel management, annotation and machine learning framework for analyzing cancer mutations.

Clinton L. Cario, John S. Witte

Bioinformatics (Oxford, England), 2018-03-01, issue 6


Similar Datasets

2025-09-30 | GSE299429 | GEO
| S-EPMC6524241 | biostudies-literature
| S-EPMC9087966 | biostudies-literature
2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress
| S-EPMC11817629 | biostudies-literature
| S-EPMC11784821 | biostudies-literature
| S-EPMC10067827 | biostudies-literature
| S-EPMC7027427 | biostudies-literature
| S-EPMC11801618 | biostudies-literature