Compound signature detection on LINCS L1000 big data.
ABSTRACT: The Library of Integrated Network-based Cellular Signatures (LINCS) L1000 big data provide gene expression profiles induced by over 10?000 compounds, shRNAs, and kinase inhibitors using the L1000 platform. We developed csNMF, a systematic compound signature discovery pipeline covering from raw L1000 data processing to drug screening and mechanism generation. The csNMF pipeline demonstrated better performance than the original L1000 pipeline. The discovered compound signatures of breast cancer were consistent with the LINCS KINOMEscan data and were clinically relevant. The csNMF pipeline provided a novel and complete tool to expedite signature-based drug discovery leveraging the LINCS L1000 resources.
Project description:MOTIVATION:LINCS L1000 dataset contains numerous cellular expression data induced by large sets of perturbagens. Although it provides invaluable resources for drug discovery as well as understanding of disease mechanisms, the existing peak deconvolution algorithms cannot recover the accurate expression level of genes in many cases, inducing severe noise in the dataset and limiting its applications in biomedical studies. RESULTS:Here, we present a novel Bayesian-based peak deconvolution algorithm that gives unbiased likelihood estimations for peak locations and characterize the peaks with probability based z-scores. Based on the above algorithm, we build a pipeline to process raw data from L1000 assay into signatures that represent the features of perturbagen. The performance of the proposed pipeline is evaluated using similarity between the signatures of bio-replicates and the drugs with shared targets, and the results show that signatures derived from our pipeline gives a substantially more reliable and informative representation for perturbagens than existing methods. Thus, the new pipeline may significantly boost the performance of L1000 data in the downstream applications such as drug repurposing, disease modeling and gene function prediction. AVAILABILITY AND IMPLEMENTATION:The code and the precomputed data for LINCS L1000 Phase II (GSE 70138) are available at https://github.com/njpipeorgan/L1000-bayesian. SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
Project description:The library of integrated network-based cellular signatures (LINCS) L1000 data set currently comprises of over a million gene expression profiles of chemically perturbed human cell lines. Through unique several intrinsic and extrinsic benchmarking schemes, we demonstrate that processing the L1000 data with the characteristic direction (CD) method significantly improves signal to noise compared with the MODZ method currently used to compute L1000 signatures. The CD processed L1000 signatures are served through a state-of-the-art web-based search engine application called L1000CDS<sup>2</sup>. The L1000CDS<sup>2</sup> search engine provides prioritization of thousands of small-molecule signatures, and their pairwise combinations, predicted to either mimic or reverse an input gene expression signature using two methods. The L1000CDS<sup>2</sup> search engine also predicts drug targets for all the small molecules profiled by the L1000 assay that we processed. Targets are predicted by computing the cosine similarity between the L1000 small-molecule signatures and a large collection of signatures extracted from the gene expression omnibus (GEO) for single-gene perturbations in mammalian cells. We applied L1000CDS<sup>2</sup> to prioritize small molecules that are predicted to reverse expression in 670 disease signatures also extracted from GEO, and prioritized small molecules that can mimic expression of 22 endogenous ligand signatures profiled by the L1000 assay. As a case study, to further demonstrate the utility of L1000CDS<sup>2</sup>, we collected expression signatures from human cells infected with Ebola virus at 30, 60 and 120 min. Querying these signatures with L1000CDS<sup>2</sup> we identified kenpaullone, a GSK3B/CDK2 inhibitor that we show, in subsequent experiments, has a dose-dependent efficacy in inhibiting Ebola infection <i>in vitro</i> without causing cellular toxicity in human cell lines. In summary, the L1000CDS<sup>2</sup> tool can be applied in many biological and biomedical settings, while improving the extraction of knowledge from the LINCS L1000 resource.
Project description:For the Library of Integrated Network-based Cellular Signatures (LINCS) project many gene expression signatures using the L1000 technology have been produced. The L1000 technology is a cost-effective method to profile gene expression in large scale. LINCS Canvas Browser (LCB) is an interactive HTML5 web-based software application that facilitates querying, browsing and interrogating many of the currently available LINCS L1000 data. LCB implements two compacted layered canvases, one to visualize clustered L1000 expression data, and the other to display enrichment analysis results using 30 different gene set libraries. Clicking on an experimental condition highlights gene-sets enriched for the differentially expressed genes from the selected experiment. A search interface allows users to input gene lists and query them against over 100 000 conditions to find the top matching experiments. The tool integrates many resources for an unprecedented potential for new discoveries in systems biology and systems pharmacology. The LCB application is available at http://www.maayanlab.net/LINCS/LCB. Customized versions will be made part of the http://lincscloud.org and http://lincs.hms.harvard.edu websites.
Project description:BACKGROUND:LINCS L1000 is a high-throughput technology that allows gene expression measurement in a large number of assays. However, to fit the measurements of ~1000 genes in the ~500 color channels of LINCS L1000, every two landmark genes are designed to share a single channel. Thus, a deconvolution step is required to infer the expression values of each gene. Any errors in this step can be propagated adversely to the downstream analyses. RESULTS:We presented a LINCS L1000 data peak calling R package l1kdeconv based on a new outlier detection method and an aggregate Gaussian mixture model (AGMM). Upon the remove of outliers and the borrowing information among similar samples, l1kdeconv showed more stable and better performance than methods commonly used in LINCS L1000 data deconvolution. CONCLUSIONS:Based on the benchmark using both simulated data and real data, the l1kdeconv package achieved more stable results than the commonly used LINCS L1000 data deconvolution methods.
Project description:Sepsis is one of the most common clinical syndromes that causes death and disability. Although many studies have developed drugs for sepsis treatment, none have decreased the mortality rate. The aim of this study was to identify a novel treatment option for sepsis using the library of integrated network-based cellular signatures (LINCS) L1000 perturbation dataset based on an in vitro and in vivo sepsis model. Sepsis-related microarray studies of early-stage inflammatory processes in patients and innate immune cells were collected from the Gene Expression Omnibus (GEO) data repository and used for candidate drug selection based on the LINCS L1000 perturbation dataset. The anti-inflammatory effects of the selected candidate drugs were analyzed using activated macrophage cell lines. CGP-60474, an inhibitor of cyclin-dependent kinase, was the most potent drug. It alleviated tumor necrosis factor-? (TNF-?) and interleukin-6 (IL-6) in activated macrophages by downregulating the NF-?B activity, and it reduced the mortality rate in LPS induced endotoxemia mice. This study shows that CGP-60474 could be a potential therapeutic candidate to attenuate the endotoxemic process. Additionally, the virtual screening strategy using the LINCS L1000 perturbation dataset could be a cost and time effective tool in the early stages of drug development.
Project description:MOTIVATION:Adverse drug reactions (ADRs) are a central consideration during drug development. Here we present a machine learning classifier to prioritize ADRs for approved drugs and pre-clinical small-molecule compounds by combining chemical structure (CS) and gene expression (GE) features. The GE data is from the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 dataset that measured changes in GE before and after treatment of human cells with over 20 000 small-molecule compounds including most of the FDA-approved drugs. Using various benchmarking methods, we show that the integration of GE data with the CS of the drugs can significantly improve the predictability of ADRs. Moreover, transforming GE features to enrichment vectors of biological terms further improves the predictive capability of the classifiers. The most predictive biological-term features can assist in understanding the drug mechanisms of action. Finally, we applied the classifier to all ?>20 000 small-molecules profiled, and developed a web portal for browsing and searching predictive small-molecule/ADR connections. AVAILABILITY AND IMPLEMENTATION:The interface for the adverse event predictions for the ?>20 000 LINCS compounds is available at http://maayanlab.net/SEP-L1000/ CONTACT: firstname.lastname@example.org SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
Project description:<h4>Background</h4>Predicting the drug response of the cancer diseases through the cellular perturbation signatures under the action of specific compounds is very important in personalized medicine. In the process of testing drug responses to the cancer, traditional experimental methods have been greatly hampered by the cost and sample size. At present, the public availability of large amounts of gene expression data makes it a challenging task to use machine learning methods to predict the drug sensitivity.<h4>Results</h4>In this study, we introduced the WRFEN-XGBoost cell viability prediction algorithm based on LINCS-L1000 cell signatures. We integrated the LINCS-L1000, CTRP and Achilles datasets and adopted a weighted fusion algorithm based on random forest and elastic net for key gene selection. Then the FEBPSO algorithm was introduced into XGBoost learning algorithm to predict the cell viability induced by the drugs. The proposed method was compared with some new methods, and it was found that our model achieved good results with 0.83 Pearson correlation. At the same time, we completed the drug sensitivity validation on the NCI60 and CCLE datasets, which further demonstrated the effectiveness of our method.<h4>Conclusions</h4>The results showed that our method was conducive to the elucidation of disease mechanisms and the exploration of new therapies, which greatly promoted the progress of clinical medicine.
Project description:The LINCS L1000 data repository contains almost two million gene expression profiles for thousands of small molecules and drugs. However, due to the complexity and the size of the data repository and a lack of an interoperable interface, the creation of pharmacologically meaningful workflows utilizing these data is severely hampered. In order to overcome this limitation, we developed the L1000 Viewer, a search engine and graphical web interface for the LINCS data repository. The web interface serves as an interactive platform allowing the user to select different forms of perturbation profiles, e.g., for specific cell lines, drugs, dosages, time points and combinations thereof. At its core, our method has a database we created from inferring and utilizing the intricate dependency graph structure among the data files. The L1000 Viewer is accessible via http://L1000viewer.bio-complexity.com/.
Project description:The Library of Integrated Cellular Signatures (LINCS) project provides comprehensive transcriptome profiling of human cell lines before and after chemical and genetic perturbations. Its L1000 platform utilizes 978 landmark genes to infer the transcript levels of 14,292 genes computationally. Here we conducted the L1000 data quality control analysis by using MCF7, PC3, and A375 cell lines as representative examples. Before perturbations, a promising 80% correlation in transcriptome was observed between L1000- and Affymetrix HU133A-platforms. After library-based shRNA perturbations, a moderate 30% of differentially expressed genes overlapped between any two selected controls viral vectors using the L1000 platform. The mitogen-activated protein kinase, vascular endothelial growth factor, and T-cell receptor pathways were identified as the most significantly shared pathways between chemical and genetic perturbations in cancer cells. In conclusion, L1000 platform is reliable in assessing transcriptome before perturbation. Its response to perturbagens needs to be interpreted with caution. A quality control analysis pipeline of L1000 is recommended before addressing biological questions.
Project description:Millions of transcriptome samples were generated by the Library of Integrated Network-based Cellular Signatures (LINCS) program. When these data are processed into searchable signatures along with signatures extracted from Genotype-Tissue Expression (GTEx) and Gene Expression Omnibus (GEO), connections between drugs, genes, pathways and diseases can be illuminated. SigCom LINCS is a webserver that serves over a million gene expression signatures processed, analyzed, and visualized from LINCS, GTEx, and GEO. SigCom LINCS is built with Signature Commons, a cloud-agnostic skeleton Data Commons with a focus on serving searchable signatures. SigCom LINCS provides a rapid signature similarity search for mimickers and reversers given sets of up and down genes, a gene set, a single gene, or any search term. Additionally, users of SigCom LINCS can perform a metadata search to find and analyze subsets of signatures and find information about genes and drugs. SigCom LINCS is findable, accessible, interoperable, and reusable (FAIR) with metadata linked to standard ontologies and vocabularies. In addition, all the data and signatures within SigCom LINCS are available via a well-documented API. In summary, SigCom LINCS, available at https://maayanlab.cloud/sigcom-lincs, is a rich webserver resource for accelerating drug and target discovery in systems pharmacology.