Data Portal for the Library of Integrated Network-based Cellular Signatures (LINCS) program: integrated access to diverse large-scale cellular perturbation response data.
ABSTRACT: The Library of Integrated Network-based Cellular Signatures (LINCS) program is a national consortium funded by the NIH to generate a diverse and extensive reference library of cell-based perturbation-response signatures, along with novel data analytics tools to improve our understanding of human diseases at the systems level. In contrast to other large-scale data generation efforts, LINCS Data and Signature Generation Centers (DSGCs) employ a wide range of assay technologies cataloging diverse cellular responses. Integration of, and unified access to LINCS data has therefore been particularly challenging. The Big Data to Knowledge (BD2K) LINCS Data Coordination and Integration Center (DCIC) has developed data standards specifications, data processing pipelines, and a suite of end-user software tools to integrate and annotate LINCS-generated data, to make LINCS signatures searchable and usable for different types of users. Here, we describe the LINCS Data Portal (LDP) (http://lincsportal.ccs.miami.edu/), a unified web interface to access datasets generated by the LINCS DSGCs, and its underlying database, LINCS Data Registry (LDR). LINCS data served on the LDP contains extensive metadata and curated annotations. We highlight the features of the LDP user interface that is designed to enable search, browsing, exploration, download and analysis of LINCS data and related curated content.
Project description:The NIH-funded LINCS Consortium is creating an extensive reference library of cell-based perturbation response signatures and sophisticated informatics tools incorporating a large number of perturbagens, model systems, and assays. To date, more than 350 datasets have been generated including transcriptomics, proteomics, epigenomics, cell phenotype and competitive binding profiling assays. The large volume and variety of data necessitate rigorous data standards and effective data management including modular data processing pipelines and end-user interfaces to facilitate accurate and reliable data exchange, curation, validation, standardization, aggregation, integration, and end user access. Deep metadata annotations and the use of qualified data standards enable integration with many external resources. Here we describe the end-to-end data processing and management at the DCIC to generate a high-quality and persistent product. Our data management and stewardship solutions enable a functioning Consortium and make LINCS a valuable scientific resource that aligns with big data initiatives such as the BD2K NIH Program and concords with emerging data science best practices including the findable, accessible, interoperable, and reusable (FAIR) principles.
Project description:The Library of Integrated Network-based Cellular Signatures (LINCS) L1000 big data provide gene expression profiles induced by over 10?000 compounds, shRNAs, and kinase inhibitors using the L1000 platform. We developed csNMF, a systematic compound signature discovery pipeline covering from raw L1000 data processing to drug screening and mechanism generation. The csNMF pipeline demonstrated better performance than the original L1000 pipeline. The discovered compound signatures of breast cancer were consistent with the LINCS KINOMEscan data and were clinically relevant. The csNMF pipeline provided a novel and complete tool to expedite signature-based drug discovery leveraging the LINCS L1000 resources.
Project description:For the Library of Integrated Network-based Cellular Signatures (LINCS) project many gene expression signatures using the L1000 technology have been produced. The L1000 technology is a cost-effective method to profile gene expression in large scale. LINCS Canvas Browser (LCB) is an interactive HTML5 web-based software application that facilitates querying, browsing and interrogating many of the currently available LINCS L1000 data. LCB implements two compacted layered canvases, one to visualize clustered L1000 expression data, and the other to display enrichment analysis results using 30 different gene set libraries. Clicking on an experimental condition highlights gene-sets enriched for the differentially expressed genes from the selected experiment. A search interface allows users to input gene lists and query them against over 100 000 conditions to find the top matching experiments. The tool integrates many resources for an unprecedented potential for new discoveries in systems biology and systems pharmacology. The LCB application is available at http://www.maayanlab.net/LINCS/LCB. Customized versions will be made part of the http://lincscloud.org and http://lincs.hms.harvard.edu websites.
Project description:BACKGROUND:Aiming to understand cellular responses to different perturbations, the NIH Common Fund Library of Integrated Network-based Cellular Signatures (LINCS) program involves many institutes and laboratories working on over a thousand cell lines. The community-based Cell Line Ontology (CLO) is selected as the default ontology for LINCS cell line representation and integration. RESULTS:CLO has consistently represented all 1097 LINCS cell lines and included information extracted from the LINCS Data Portal and ChEMBL. Using MCF 10A cell line cells as an example, we demonstrated how to ontologically model LINCS cellular signatures such as their non-tumorigenic epithelial cell type, three-dimensional growth, latrunculin-A-induced actin depolymerization and apoptosis, and cell line transfection. A CLO subset view of LINCS cell lines, named LINCS-CLOview, was generated to support systematic LINCS cell line analysis and queries. In summary, LINCS cell lines are currently associated with 43 cell types, 131 tissues and organs, and 121 cancer types. The LINCS-CLO view information can be queried using SPARQL scripts. CONCLUSIONS:CLO was used to support ontological representation, integration, and analysis of over a thousand LINCS cell line cells and their cellular responses.
Project description:Modern research in the biomedical sciences is data-driven utilizing high-throughput technologies to generate big genomic data. The Library of Integrated Network-based Cellular Signatures (LINCS) is an example for a large-scale genomic data repository providing hundred thousands of high-dimensional gene expression measurements for thousands of drugs and dozens of cell lines. However, the remaining challenge is how to use these data effectively for pharmacogenomics. In this paper, we use LINCS data to construct drug association networks (DANs) representing the relationships between drugs. By using the Anatomical Therapeutic Chemical (ATC) classification of drugs we demonstrate that the DANs represent a systems pharmacogenomic landscape of drugs summarizing the entire LINCS repository on a genomic scale meaningfully. Here we identify the modules of the DANs as therapeutic attractors of the ATC drug classes.
Project description:The library of integrated network-based cellular signatures (LINCS) L1000 data set currently comprises of over a million gene expression profiles of chemically perturbed human cell lines. Through unique several intrinsic and extrinsic benchmarking schemes, we demonstrate that processing the L1000 data with the characteristic direction (CD) method significantly improves signal to noise compared with the MODZ method currently used to compute L1000 signatures. The CD processed L1000 signatures are served through a state-of-the-art web-based search engine application called L1000CDS2. The L1000CDS2 search engine provides prioritization of thousands of small-molecule signatures, and their pairwise combinations, predicted to either mimic or reverse an input gene expression signature using two methods. The L1000CDS2 search engine also predicts drug targets for all the small molecules profiled by the L1000 assay that we processed. Targets are predicted by computing the cosine similarity between the L1000 small-molecule signatures and a large collection of signatures extracted from the gene expression omnibus (GEO) for single-gene perturbations in mammalian cells. We applied L1000CDS2 to prioritize small molecules that are predicted to reverse expression in 670 disease signatures also extracted from GEO, and prioritized small molecules that can mimic expression of 22 endogenous ligand signatures profiled by the L1000 assay. As a case study, to further demonstrate the utility of L1000CDS2, we collected expression signatures from human cells infected with Ebola virus at 30, 60 and 120 min. Querying these signatures with L1000CDS2 we identified kenpaullone, a GSK3B/CDK2 inhibitor that we show, in subsequent experiments, has a dose-dependent efficacy in inhibiting Ebola infection in vitro without causing cellular toxicity in human cell lines. In summary, the L1000CDS2 tool can be applied in many biological and biomedical settings, while improving the extraction of knowledge from the LINCS L1000 resource.
Project description:MOTIVATION:LINCS L1000 dataset contains numerous cellular expression data induced by large sets of perturbagens. Although it provides invaluable resources for drug discovery as well as understanding of disease mechanisms, the existing peak deconvolution algorithms cannot recover the accurate expression level of genes in many cases, inducing severe noise in the dataset and limiting its applications in biomedical studies. RESULTS:Here, we present a novel Bayesian-based peak deconvolution algorithm that gives unbiased likelihood estimations for peak locations and characterize the peaks with probability based z-scores. Based on the above algorithm, we build a pipeline to process raw data from L1000 assay into signatures that represent the features of perturbagen. The performance of the proposed pipeline is evaluated using similarity between the signatures of bio-replicates and the drugs with shared targets, and the results show that signatures derived from our pipeline gives a substantially more reliable and informative representation for perturbagens than existing methods. Thus, the new pipeline may significantly boost the performance of L1000 data in the downstream applications such as drug repurposing, disease modeling and gene function prediction. AVAILABILITY AND IMPLEMENTATION:The code and the precomputed data for LINCS L1000 Phase II (GSE 70138) are available at https://github.com/njpipeorgan/L1000-bayesian. SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
Project description:MOTIVATION:Adverse drug reactions (ADRs) are a central consideration during drug development. Here we present a machine learning classifier to prioritize ADRs for approved drugs and pre-clinical small-molecule compounds by combining chemical structure (CS) and gene expression (GE) features. The GE data is from the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 dataset that measured changes in GE before and after treatment of human cells with over 20 000 small-molecule compounds including most of the FDA-approved drugs. Using various benchmarking methods, we show that the integration of GE data with the CS of the drugs can significantly improve the predictability of ADRs. Moreover, transforming GE features to enrichment vectors of biological terms further improves the predictive capability of the classifiers. The most predictive biological-term features can assist in understanding the drug mechanisms of action. Finally, we applied the classifier to all ?>20 000 small-molecules profiled, and developed a web portal for browsing and searching predictive small-molecule/ADR connections. AVAILABILITY AND IMPLEMENTATION:The interface for the adverse event predictions for the ?>20 000 LINCS compounds is available at http://maayanlab.net/SEP-L1000/ CONTACT: firstname.lastname@example.org SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
Project description:<h4>Background</h4>Predicting the drug response of the cancer diseases through the cellular perturbation signatures under the action of specific compounds is very important in personalized medicine. In the process of testing drug responses to the cancer, traditional experimental methods have been greatly hampered by the cost and sample size. At present, the public availability of large amounts of gene expression data makes it a challenging task to use machine learning methods to predict the drug sensitivity.<h4>Results</h4>In this study, we introduced the WRFEN-XGBoost cell viability prediction algorithm based on LINCS-L1000 cell signatures. We integrated the LINCS-L1000, CTRP and Achilles datasets and adopted a weighted fusion algorithm based on random forest and elastic net for key gene selection. Then the FEBPSO algorithm was introduced into XGBoost learning algorithm to predict the cell viability induced by the drugs. The proposed method was compared with some new methods, and it was found that our model achieved good results with 0.83 Pearson correlation. At the same time, we completed the drug sensitivity validation on the NCI60 and CCLE datasets, which further demonstrated the effectiveness of our method.<h4>Conclusions</h4>The results showed that our method was conducive to the elucidation of disease mechanisms and the exploration of new therapies, which greatly promoted the progress of clinical medicine.
Project description:Sepsis is one of the most common clinical syndromes that causes death and disability. Although many studies have developed drugs for sepsis treatment, none have decreased the mortality rate. The aim of this study was to identify a novel treatment option for sepsis using the library of integrated network-based cellular signatures (LINCS) L1000 perturbation dataset based on an in vitro and in vivo sepsis model. Sepsis-related microarray studies of early-stage inflammatory processes in patients and innate immune cells were collected from the Gene Expression Omnibus (GEO) data repository and used for candidate drug selection based on the LINCS L1000 perturbation dataset. The anti-inflammatory effects of the selected candidate drugs were analyzed using activated macrophage cell lines. CGP-60474, an inhibitor of cyclin-dependent kinase, was the most potent drug. It alleviated tumor necrosis factor-? (TNF-?) and interleukin-6 (IL-6) in activated macrophages by downregulating the NF-?B activity, and it reduced the mortality rate in LPS induced endotoxemia mice. This study shows that CGP-60474 could be a potential therapeutic candidate to attenuate the endotoxemic process. Additionally, the virtual screening strategy using the LINCS L1000 perturbation dataset could be a cost and time effective tool in the early stages of drug development.