Project description: The cell cycle is a highly conserved, continuous process that controls the faithful replication and division of cells. Single-cell technologies have enabled increasingly precise measurements of the cell cycle, both as a biological process of interest and as a possible confounding factor. Despite its importance and conservation, there is no universally applicable approach to infer position in the cell cycle with high resolution from single-cell RNA-seq data. Here, we present tricycle, an R/Bioconductor package, to address this challenge by leveraging key features of the biology of the cell cycle, the mathematical properties of principal component analysis of periodic functions, and the ubiquitous applicability of transfer learning. We show that tricycle can predict any cell’s position in the cell cycle regardless of cell type, species of origin, and even sequencing assay. The accuracy of tricycle compares favorably to gold-standard experimental assays, which generally require specialized measurements in specifically constructed in vitro systems. Unlike gold-standard assays, tricycle is applicable to any single-cell RNA-seq dataset. Tricycle is highly scalable, universally accurate, and eminently pertinent for atlas-level data.
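The property that tricycle exploits can be illustrated with a toy calculation: if each cell-cycle gene oscillates as a shifted cosine over the cycle, the projections of cells onto the top two principal components lie on a circle, so a cell's cycle position can be read off as the polar angle of its projection. The pure-Python sketch below assumes idealized synthetic genes with evenly spaced peak phases; the gene model and all function names are hypothetical, and this is not tricycle's R API. In this idealized setting the top two PCs span the same plane as the vectors (cos φ_k) and (sin φ_k), so that basis is supplied directly instead of being estimated by PCA.

```python
import math


def synthetic_expression(t, n_genes):
    """Toy cell at cycle position t with n_genes hypothetical cycle genes.
    Gene k peaks at phase phi_k; its expression is cos(t - phi_k), which is
    already centered (zero mean over the whole cycle)."""
    phases = [2 * math.pi * k / n_genes for k in range(n_genes)]
    return [math.cos(t - phi) for phi in phases], phases


def project_to_cycle_space(expr, loadings):
    """Project centered expression onto two loading vectors (PC1, PC2)."""
    pc1 = sum(x * w1 for x, (w1, _) in zip(expr, loadings))
    pc2 = sum(x * w2 for x, (_, w2) in zip(expr, loadings))
    return pc1, pc2


def cell_cycle_position(pc1, pc2):
    """Polar angle of the 2-D projection, mapped to [0, 2*pi)."""
    return math.atan2(pc2, pc1) % (2 * math.pi)


if __name__ == "__main__":
    expr, phases = synthetic_expression(t=1.2, n_genes=8)
    # For evenly spaced phases (n_genes >= 3), the top-2 PC plane is spanned
    # by (cos phi_k) and (sin phi_k); use that basis as the loadings.
    loadings = [(math.cos(p), math.sin(p)) for p in phases]
    theta = cell_cycle_position(*project_to_cycle_space(expr, loadings))
    print(round(theta, 6))  # 1.2
```

Because PCA loadings are only defined up to rotation and reflection, the origin and direction of the angle are arbitrary; this is presumably one reason a fixed reference projection, learned once and transferred to new data, is useful.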
Project description: Data and results for the paper "Deep Semi-Supervised Learning Improves Universal Peptide Identification of Shotgun Proteomics Data," which is available at:
https://doi.org/10.1101/2020.11.12.380881
Deep learning software for PSM (peptide-spectrum match) recalibration, called ProteoTorch-DNN, is available at:
https://github.com/proteoTorch/proteoTorch
with documentation:
https://proteotorch.readthedocs.io/en/latest/
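Semi-supervised PSM rescoring tools in the Percolator tradition typically measure progress with target-decoy q-values: decoy PSMs (matches to reversed or shuffled sequences) estimate how many target PSMs above a score threshold are false. The sketch below computes such q-values for a list of scored PSMs. It is a generic illustration of the standard target-decoy estimate (decoys divided by targets, without the +1 correction some tools add) and assumes distinct scores; it is not ProteoTorch code, and whether ProteoTorch-DNN uses exactly this estimator is an assumption. The function names are hypothetical.

```python
def target_decoy_qvalues(psms):
    """psms: list of (score, is_decoy) pairs, higher score = better match.
    Returns the PSMs sorted by descending score and their q-values.
    Assumes distinct scores; uses the simple decoys/targets FDR estimate."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were set at this PSM.
        fdrs.append(decoys / max(targets, 1))
    # q-value = smallest FDR at this threshold or any more permissive one
    # (i.e. over this PSM and all lower-scoring PSMs).
    qvals, running_min = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min
    return ranked, qvals


def accepted_targets(ranked, qvals, threshold=0.01):
    """Count target (non-decoy) PSMs accepted at the given q-value threshold."""
    return sum(
        1 for (_, is_decoy), q in zip(ranked, qvals)
        if not is_decoy and q <= threshold
    )
```

A rescoring model is then judged by how many target PSMs it accepts at a fixed threshold such as q ≤ 0.01: the better the learned score separates targets from decoys, the larger that count.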
Project description: RNA internal modifications play critical roles in the development of multicellular organisms and in their responses to environmental cues. Using nanopore direct RNA sequencing (DRS), we constructed a large in vitro epitranscriptome (IVET) resource from a plant cDNA library labeled separately with m6A, m1A, and m5C. Furthermore, through transfer learning, the model pre-trained on the IVET data was used to detect additional RNA internal modifications such as m1A, hm5C, m7G, and Ψ. Finally, we presented a global view of the epitranscriptome, covering m6A, m1A, m5C, m7G, and Ψ modifications, in rice seedlings under normal and high-salinity conditions. In summary, we provided a strategy for creating an IVET resource from a cDNA library and developed TandemMod, a computational method based on IVET-driven transfer learning, for profiling the epitranscriptome landscape with co-occupancy of multiple types of RNA modification in plants responding to environmental signals.
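One common form of transfer learning keeps a pre-trained feature extractor frozen and trains only a new output layer on labels for a different task, here a different modification. The pure-Python sketch below shows that pattern with a logistic-regression head. The frozen "encoder" (mean and variance of a per-read current signal), the synthetic signals, and all function names are hypothetical stand-ins; TandemMod's actual architecture and fine-tuning scheme are not described here and may differ (for example, it may update more than the last layer).

```python
import math
import random


def frozen_encoder(signal):
    """Hypothetical stand-in for pre-trained layers: maps a per-read current
    signal to a fixed feature vector (mean, variance). Never updated below."""
    mean = sum(signal) / len(signal)
    var = sum((x - mean) ** 2 for x in signal) / len(signal)
    return [mean, var]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fine_tune_head(signals, labels, lr=0.1, epochs=200):
    """Fit only a new logistic-regression head (weights w, bias b) with SGD
    on frozen-encoder features; labels: 1 = modified, 0 = unmodified."""
    feats = [frozen_encoder(s) for s in signals]
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            # Gradient of binary cross-entropy w.r.t. the logit is (p - y).
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def prob_modified(signal, w, b):
    """Predicted probability that a read carries the new modification."""
    x = frozen_encoder(signal)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: "modified" reads have a shifted mean current level.
    unmod = [[rng.gauss(0.0, 0.1) for _ in range(20)] for _ in range(30)]
    mod = [[rng.gauss(1.0, 0.1) for _ in range(20)] for _ in range(30)]
    w, b = fine_tune_head(unmod + mod, [0] * 30 + [1] * 30)
    print(prob_modified([1.0] * 20, w, b), prob_modified([0.0] * 20, w, b))
```

Freezing the encoder means only a handful of parameters are fit for the new modification, which is the usual motivation for transfer learning when labeled data for the new task are scarce.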
Project description: Predicting protein localization plays an important role in understanding protein function and mechanism. MULocDeep, a deep learning-based localization prediction tool, assesses each amino acid’s contribution to the localization process and thereby provides insights into the mechanism of protein sorting and into localization motifs. A dataset with 45 sub-organellar localization annotations under 10 major subcellular compartments was produced, and the tool was tested on an independent dataset of mitochondrial proteins that were extracted from Arabidopsis thaliana cell cultures, Solanum tuberosum tubers, and Vicia faba roots and analyzed by shotgun mass spectrometry.
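Per-residue contributions of the kind described for MULocDeep are often obtained from an attention layer: the model scores every residue, a softmax turns the scores into weights that sum to 1, and high-weight stretches point to candidate sorting signals. The sketch below illustrates only that post-processing step. The per-residue scores are hypothetical inputs (in a real model they would come from the trained network), the window search is an illustrative addition, the function names are hypothetical, and whether MULocDeep derives its contributions exactly this way is an assumption.

```python
import math


def residue_weights(scores):
    """Softmax over per-residue scores: weights are positive and sum to 1.
    Subtracting the max first avoids overflow in exp()."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def strongest_window(weights, width):
    """Start index and total weight of the contiguous window of `width`
    residues carrying the most weight (first one wins on ties)."""
    best_start, best_sum = 0, float("-inf")
    for i in range(len(weights) - width + 1):
        s = sum(weights[i:i + width])
        if s > best_sum:
            best_start, best_sum = i, s
    return best_start, best_sum


if __name__ == "__main__":
    # Hypothetical scores for a 10-residue peptide; residues 2-4 score highest.
    scores = [0.1, 0.3, 2.5, 3.0, 2.8, 0.2, 0.0, 0.4, 0.1, 0.2]
    weights = residue_weights(scores)
    start, share = strongest_window(weights, width=3)
    print(start, round(share, 3))
```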
Project description: Gene expression profiles were generated from RNA isolated from 199 primary breast cancer patients. Samples 1-176 were used in another study, GEO Series GSE22820, and form the training data set in this study. Samples 200-222 form a validation set. These data were used to build a machine learning classifier that predicts estrogen receptor (ER) status using only three gene features.
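The training/validation design above can be sketched as follows. The description does not say which classifier was used or which three genes were selected, so this pure-Python example swaps in a simple nearest-centroid rule on three hypothetical gene features; only the sample-number split (1-176 for training, 200-222 for validation) comes from the description. The record layout and function names are hypothetical.

```python
from statistics import mean


def split_by_sample_number(samples):
    """samples maps sample number -> (three_gene_features, er_status).
    Training: samples 1-176; validation: samples 200-222."""
    train = {n: rec for n, rec in samples.items() if 1 <= n <= 176}
    valid = {n: rec for n, rec in samples.items() if 200 <= n <= 222}
    return train, valid


def fit_centroids(train):
    """Mean three-gene profile for each ER status in the training set."""
    groups = {}
    for features, status in train.values():
        groups.setdefault(status, []).append(features)
    return {status: [mean(col) for col in zip(*rows)] for status, rows in groups.items()}


def predict_er_status(features, centroids):
    """Assign the ER status whose centroid is closest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda s: sum((f - c) ** 2 for f, c in zip(features, centroids[s])),
    )


def accuracy(records, centroids):
    """Fraction of records whose predicted status matches the label."""
    hits = sum(predict_er_status(f, centroids) == y for f, y in records.values())
    return hits / len(records)
```

Typical use would be `train, valid = split_by_sample_number(samples)`, then `accuracy(valid, fit_centroids(train))`, where `samples` is assembled from the series' expression values and ER annotations. Keeping the validation samples out of `fit_centroids` is what makes the reported accuracy an out-of-sample estimate.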
Project description: ImageMol is a representation learning framework that uses molecule images to encode molecular inputs as machine-readable vectors for downstream tasks such as bioactivity prediction, drug metabolism analysis, or drug toxicity prediction. The approach uses transfer learning: the model is pre-trained on massive unlabeled datasets to help it generalize feature extraction and is then fine-tuned on specific tasks. This model is fine-tuned on 13 assays covering target categories ranging from viral entry to toxicity in humans. These interactions are formulated as binary classification tasks.
Model Type: Predictive machine learning model.
Model Relevance: SARS-CoV-2 antiviral screening.
Model Encoded by: Dhanshree Arora (Ersilia)
Metadata Submitted in BioModels by: Zainab Ashimiyu-Abdusalam
Ersilia's implementation of this model is available here:
https://github.com/ersilia-os/eos4cxk
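Each of the 13 fine-tuned ImageMol assays is a binary classification task, and ROC-AUC is a common way to score such tasks; the metric reported for this model is not stated here, so treat that choice as an assumption. The sketch below computes ROC-AUC in its rank (Mann-Whitney) form, the probability that a randomly chosen active compound scores higher than a randomly chosen inactive one, with ties counted as one half, and averages it over assays. The function names and the per-assay dictionary layout are hypothetical; this is not the Ersilia implementation.

```python
def roc_auc(scores, labels):
    """ROC-AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).
    labels: 1 = active, 0 = inactive. O(n_pos * n_neg), fine for small sets."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def mean_auc(per_assay):
    """per_assay maps a (hypothetical) assay name -> (scores, labels).
    Returns the mean AUC across assays and the per-assay values."""
    aucs = {name: roc_auc(s, y) for name, (s, y) in per_assay.items()}
    return sum(aucs.values()) / len(aucs), aucs


if __name__ == "__main__":
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Because ROC-AUC depends only on the ranking of scores, it is insensitive to class imbalance in the number of actives versus inactives, which is often large in screening assays.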