Project description: The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a family of database resources to support the global academic and industrial communities. With multi-omics data accumulating at an unprecedented pace, CNCB-NGDC continuously expands and updates its core database resources through big data archiving, integrative analysis and value-added curation. Importantly, NGDC collaborates closely with major international databases and initiatives to ensure seamless data exchange and interoperability. Over the past year, significant efforts have been dedicated to integrating diverse omics data, synthesizing expanding knowledge, developing new resources, and upgrading major existing resources. In particular, several new database resources have been developed for the biodiversity of protists (P10K), bacteria (NTM-DB, MPA) and plants (PPGR, SoyOmics, PlantPan), as well as for disease/trait association (CROST, HervD Atlas, HALL, MACdb, BioKA, RePoS, PGG.SV, NAFLDkb). All the resources and services are publicly accessible at https://ngdc.cncb.ac.cn.
Project description: Branched-chain amino acid transaminase 1 (BCAT1) catalyzes the production of glutamate and branched-chain α-ketoacids from branched-chain amino acids, and abnormal BCAT1 expression is associated with tumorigenesis. Sequencing data from public databases, including The Cancer Genome Atlas, were used to analyze BCAT1 expression and regulatory networks in hepatocellular carcinoma (HCC). Expression and methylation were assessed using UALCAN, and data from multiple datasets on BCAT1 expression levels and associated survival rates were further analyzed using HCCDB; interaction networks of biological function were constructed using GeneMANIA. LinkedOmics was used to identify correlations between BCAT1 and differentially expressed genes. Gene enrichment analysis of BCAT1-associated genes was conducted using the Web-based Gene SeT AnaLysis Toolkit. BCAT1 expression levels were increased in patients with HCC, and in most cases the level of BCAT1 promoter methylation was reduced. Interaction network analysis suggested that BCAT1 is involved in 'metabolism', 'carcinogenesis' and the 'immune response' via numerous cancer-associated pathways. The present study revealed the expression patterns and potential functional networks of BCAT1 in HCC, providing insights for future research into the role of BCAT1 in hepatocarcinogenesis. In addition, the analytical workflow used here offers researchers a template for analyzing other genes of interest.
Project description: Jupyter Notebooks have transformed the communication of data analysis pipelines by facilitating a modular structure that brings together code, markdown text, and interactive visualizations. Here, we introduce Appyters, which extend Jupyter Notebooks to broaden their accessibility. Appyters turn Jupyter Notebooks into fully functional, standalone, web-based bioinformatics applications. Appyters present users with an entry form for uploading their data and setting parameters for a multitude of data analysis workflows. Once the form is filled in, the Appyter executes the corresponding notebook in the cloud, producing the output without requiring the user to interact directly with the code. Appyters were used to create many reusable web-based bioinformatics workflows, including applications that build customized machine learning pipelines, analyze omics data, and produce publishable figures. These Appyters are served in the Appyters Catalog at https://appyters.maayanlab.cloud. In summary, Appyters enable the rapid development of interactive web-based bioinformatics applications.
Project description: Relentless mining operations have severely damaged the environment. Soil-inhabiting microbes play a significant role in the ecological restoration of these areas. Microbial weathering processes, such as the chemical dissolution of rocks, significantly improve soil properties and promote the conversion of rock into soil. Earlier studies have reported that bacteria dissolve rock efficiently, releasing organic acids and other chemical elements from silicate rocks. However, the rock-dissolving mechanisms of these bacteria remain unclear. We therefore performed rock-dissolution experiments followed by genome and transcriptome sequencing of the novel Pseudomonas sp. NLX-4 strain to explore the efficiency of microbe-mediated habitat restoration and the molecular mechanisms underlying this biological process. Initial rock-dissolution experiments revealed that Pseudomonas sp. NLX-4 efficiently accelerates the dissolution of silicate rocks by secreting amino acids, exopolysaccharides, and organic acids, with elevated concentrations of potassium, silicon and aluminium. In these experiments, particle diameter variation values increased from day 0 to day 15 and declined thereafter. The 6,771,445-bp NLX-4 genome has a GC content of 63.21% and contains a total of 6,041 protein-coding genes. Genome-wide annotation assigned 5,045, 3,996, 5,342 and 4,386 proteins to COG, GO, InterPro and KEGG, respectively. Transcriptome analysis of NLX-4 cultured with or without silicate rocks identified 539 differentially expressed genes (DEGs; 288 up-regulated and 251 down-regulated). Fifteen DEGs involved in siderophore transport, EPS and amino acid synthesis, organic acid metabolism, and bacterial resistance to adverse environmental conditions were highly up-regulated when the strain was cultured with silicate rocks.
This study not only provides a new strategy for the ecological restoration of rock mining areas but also enriches the bacterial and genetic resources available for this purpose.
Project description: Background: The clinical significance of LINC00996 in colorectal cancer (CRC) has not been established. In the current study, the authors aimed to explore the expression of LINC00996 and its clinical significance in CRC by mining Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) datasets, and to elucidate the functions of its potential target genes. Materials and methods: GEO microarray datasets and TCGA data were used to evaluate LINC00996 expression and its clinical significance in CRC. LINC00996-related genes were identified using Multi Experiment Matrix, RNA-Binding Protein DataBase, and The Atlas of Noncoding RNAs in Cancer, and were then subjected to Gene Ontology enrichment and Kyoto Encyclopedia of Genes and Genomes pathway analysis. Results: LINC00996 expression is significantly decreased in CRC tissues compared with non-tumor tissues. Low LINC00996 levels are associated with distant metastasis and poor overall survival, whereas LINC00996 expression shows little association with gender, lymphatic invasion, tumor size, lymph node metastasis, or pathological stage. One hundred and forty-two LINC00996-related genes were identified; functional analysis indicated that LINC00996 might repress tumorigenesis and metastasis by modulating the JAK-STAT, NF-κB, HIF-1, TLR, and PI3K-AKT signaling pathways. Conclusion: Our study suggests that decreased LINC00996 expression may be involved in colorectal carcinogenesis and metastasis, and that depletion of LINC00996 is associated with poor outcome in CRC patients. Moreover, the JAK-STAT, NF-κB, HIF-1, TLR, and PI3K-AKT pathways may be key pathways regulated by LINC00996 in CRC.
Project description: Since its outbreak in late 2019, the SARS-CoV-2 pandemic has already infected over 3.7 million people and claimed more than 250,000 lives globally. It may take at least a year for an approved vaccine to become available, and in the meantime millions more could be infected, some with fatal outcomes. Over a thousand clinical trials with COVID-19 patients are already listed in ClinicalTrials.gov, some of them assessing the utility of therapeutics approved for other conditions. However, clinical trials take many months and are typically conducted with small cohorts. A much faster and far more efficient way to identify approved therapeutics that can be repurposed for treating COVID-19 patients is to mine the patients' past and current electronic health and prescription records for drugs that may protect infected individuals from severe COVID-19 symptoms. Examples are discussed of using health and prescription records to assess the potential repurposing (repositioning) of angiotensin receptor blockers, estradiol, or antiandrogens for reducing COVID-19 morbidity and fatalities. Data mining of prescription records of COVID-19 patients will not eliminate the need for controlled clinical trials, but could substantially assist in trial design, drug choice, inclusion and exclusion criteria, and prioritization. This approach requires a strong commitment from health providers to open collaboration with the biomedical research community, as health providers are typically the sole owners of retrospective drug prescription records.
Project description: The patent literature should reflect the past 30 years of engineering efforts directed toward developing monoclonal antibody therapeutics. Such information is potentially valuable for rational antibody design. Patents, however, are designed not to convey scientific knowledge but to provide legal protection. It is not obvious whether antibody information from patent documents, such as antibody sequences, is useful for conveying engineering know-how rather than serving only as a legal reference. To assess the utility of patent data for therapeutic antibody engineering, we quantified the number of antibody sequences in patents destined for medicinal purposes and how well they reflect the primary sequences of therapeutic antibodies in clinical use. We identified 16,526 patent families covering major jurisdictions (e.g., the US Patent and Trademark Office (USPTO) and the World Intellectual Property Organization) that contained antibody sequences. These families held 245,109 unique antibody chains (135,397 heavy chains and 109,712 light chains), which we compiled in our Patented Antibody Database (PAD, http://naturalantibody.com/pad). We find that antibodies make up a non-trivial proportion of all patent amino acid sequence depositions (e.g., 11% of the USPTO Full Text database). Our analysis of the 16,526 families demonstrates that the volume of patent documents with antibody sequences is growing, with the majority of documents classified as containing antibodies for medicinal purposes. Further study of the 245,109 antibody chains from the patent literature revealed that they closely reflect the primary sequences of antibody therapeutics in clinical use. This suggests that the patent literature could serve as a record of previous engineering efforts that can inform rational antibody design.
Project description: modENCODE was a five-year, NHGRI-funded project (2007-2012) to map the function of every base in the genomes of worms and flies by characterizing the positions of modified histones and other chromatin marks, origins of DNA replication, RNA transcripts, and the transcription factor binding sites that control gene expression. Here we describe the Drosophila modENCODE datasets and how best to access and use them for genome-wide and individual gene studies.