Dataset Information

Zombie cheminformatics: extraction and conversion of Wiswesser Line Notation (WLN) from chemical documents

ABSTRACT:

SUBMITTER: Blakey M

PROVIDER: S-EPMC11017645 | biostudies-literature | 2024 Jan

REPOSITORIES: biostudies-literature

Similar Datasets

Project description: With the increased availability of chemical data in public databases, innovative techniques and algorithms have emerged for the analysis, exploration, visualization, and extraction of information from these data. One such technique is chemical grouping, where chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or a combination. However, existing tools for chemical grouping often require specialized programming skills or the use of commercial software packages. To address these challenges, we developed a user-friendly chemical grouping workflow implemented in KNIME, a free, open-source, low/no-code, data analytics platform. The workflow serves as an all-encompassing tool, expertly incorporating a range of processes such as molecular descriptor calculation, feature selection, dimensionality reduction, hyperparameter search, and supervised and unsupervised machine learning methods, enabling effective chemical grouping and visualization of results. Furthermore, we implemented tools for interpretation, identifying key molecular descriptors for the chemical groups, and using natural language summaries to clarify the rationale behind these groupings. The workflow was designed to run seamlessly in both the KNIME local desktop version and KNIME Server WebPortal as a web application. It incorporates interactive interfaces and guides to assist users in a step-by-step manner. We demonstrate the utility of this workflow through a case study using an eye irritation and corrosion dataset.

Scientific contributions: This work presents a novel, comprehensive chemical grouping workflow in KNIME, enhancing accessibility by integrating a user-friendly graphical interface that eliminates the need for extensive programming skills. This workflow uniquely combines several features such as automated molecular descriptor calculation, feature selection, dimensionality reduction, and machine learning algorithms (both supervised and unsupervised), with hyperparameter optimization to refine chemical grouping accuracy. Moreover, we have introduced an innovative interpretative step and natural language summaries to elucidate the underlying reasons for chemical groupings, significantly advancing the usability of the tool and interpretability of the results.

Project description: During the past two decades, the world has witnessed the emergence of various SARS-CoV-2 variants with distinct mutational profiles influencing the global health, economy, and clinical aspects of the COVID-19 pandemic. These variants or mutants have raised major concerns regarding the protection provided by neutralizing monoclonal antibodies and vaccination, rates of virus transmission, and/or the risk of reinfection. The newly emerged Omicron, a genetically distinct lineage of SARS-CoV-2, continues its spread in the face of rising vaccine-induced immunity while maintaining its replication fitness. Efforts have been made to improve therapeutic interventions, and the FDA has issued Emergency Use Authorization for a few monoclonal antibodies and drug treatments for COVID-19. However, the rapid spread of Omicron and its lineages calls for effective therapeutic interventions to curb the COVID-19 pandemic. Several experimental studies have indicated that the FDA-approved monoclonal antibodies are less effective than antiviral drugs against the Omicron variant. Thus, in this study, we aim to identify antiviral compounds against the Spike protein of Omicron, which binds to the human angiotensin-converting enzyme 2 (ACE2) receptor and facilitates virus invasion. Initially, docking-based virtual screening of the in-house database was performed to extract the potential hit compounds against the Spike protein. The obtained hits were optimized by DFT calculations to determine the electronic properties and molecular reactivity of the compounds. Further, MD simulation studies were carried out to evaluate the dynamics of protein-ligand interactions at an atomistic level in a time-dependent manner. Collectively, five compounds (AKS-01, AKS-02, AKS-03, AKS-04, and AKS-05) with diverse scaffolds were identified as potential hits against the Spike protein of Omicron. Our study paves the way for further in vitro and in vivo studies.

OmicsDI is part of the ELIXIR infrastructure
