AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds.
ABSTRACT: Water is a ubiquitous solvent in chemistry and life. It is therefore no surprise that the aqueous solubility of compounds has a key role in various domains, including but not limited to drug discovery, paint, coating, and battery materials design. Measurement and prediction of aqueous solubility is a complex and prevailing challenge in chemistry. For the latter, different data-driven prediction models have recently been developed to augment the physics-based modeling approaches. To construct accurate data-driven estimation models, it is essential that the underlying experimental calibration data used by these models is of high fidelity and quality. Existing solubility datasets show variance in the chemical space of compounds covered, measurement methods, experimental conditions, but also in the non-standard representations, size, and accessibility of data. To address this problem, we generated a new database of compounds, AqSolDB, by merging a total of nine different aqueous solubility datasets, curating the merged data, standardizing and validating the compound representation formats, marking with reliability labels, and providing 2D descriptors of compounds as a Supplementary Resource.
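The merge-and-label step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each record is a (compound identifier, logS) pair, and the function name `merge_records`, the label names, and the 0.5 log-unit threshold are hypothetical choices made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def merge_records(records, sd_threshold=0.5):
    """Merge duplicate solubility measurements and attach a reliability label.

    records: iterable of (compound_id, log_s) pairs pooled from several datasets.
    sd_threshold: hypothetical cutoff (in log units) separating consistent
    from inconsistent repeated measurements.
    """
    groups = defaultdict(list)
    for compound_id, log_s in records:
        groups[compound_id].append(log_s)

    merged = {}
    for compound_id, values in groups.items():
        # Population standard deviation of the repeated measurements;
        # a single measurement has no spread to evaluate.
        sd = pstdev(values) if len(values) > 1 else 0.0
        if len(values) == 1:
            label = "single"          # only one source reported this compound
        elif sd <= sd_threshold:
            label = "consistent"      # sources agree within the threshold
        else:
            label = "inconsistent"    # sources disagree; treat with caution
        merged[compound_id] = {
            "logS": mean(values),
            "sd": sd,
            "n": len(values),
            "label": label,
        }
    return merged
```

Keeping the number of occurrences and the spread next to the merged value lets a downstream user filter by reliability rather than discard disputed compounds outright.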
Project description:Aqueous solubility is an important physicochemical property of compounds in anti-cancer drug discovery. Artificial intelligence solubility prediction tools have achieved impressive performance by employing regression, machine learning, and deep learning methods. The reported performances vary significantly, partly because of the different datasets used. Solubility prediction on novel compounds needs to be improved, which may be achieved by going deeper with deep learning. We constructed deeper-net models with a ~20-layer modified ResNet convolutional neural network architecture, which were trained and tested with 9,943 compounds encoded by molecular fingerprints. In a retrospective test on 62 recently published novel compounds, one deeper-net model outperformed four established tools, shallow-net models, and four human experts. Deeper-net models also outperformed the others in predicting the solubility values of a series of novel compounds newly synthesized for anti-cancer drug discovery. Solubility prediction may be improved by going deeper with deep learning. Our deeper-net models are accessible at http://www.npbdb.net/solubility/index.jsp.
Project description:Accurate prediction of the solubility of chemical substances in solvents remains a challenge. The sparsity of high-quality solubility data is recognized as the biggest hurdle in the development of robust data-driven methods for practical use. Nonetheless, the effects of the quality and quantity of data on aqueous solubility predictions have not yet been scrutinized. In this study, the roles of the size and the quality of data sets on the performances of the solubility prediction models are unraveled, and the concepts of actual and observed performances are introduced. In an effort to curtail the gap between actual and observed performances, a quality-oriented data selection method, which evaluates the quality of data and extracts the most accurate part of it through statistical validation, is designed. Applying this method on the largest publicly available solubility database and using a consensus machine learning approach, a top-performing solubility prediction model is achieved. Highlights: consensus machine learning models perform better than singular models; quality-oriented data selection yields better results than using all data; the uncertainty of test data determines the theoretical limit of a model's performance; the concepts of actual and observed performances of solubility models are introduced. Subject areas: Chemistry; Analytical Reagents; Computational Chemistry; Artificial Intelligence.
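One simple way to form a consensus prediction is to average the outputs of several individual models. The sketch below assumes that form of consensus purely for illustration; the abstract does not say how the study combines its models, and the function name `consensus_predict` and the toy models in the usage comment are hypothetical.

```python
from statistics import mean


def consensus_predict(models, x):
    """Return the unweighted mean of the predictions of several models.

    models: iterable of callables, each mapping an input to a predicted logS.
    x: the input (for example a descriptor vector) passed to every model.
    """
    return mean(model(x) for model in models)


# Usage with three toy "models" standing in for trained predictors:
# consensus_predict([lambda x: x + 1, lambda x: x - 1, lambda x: x], 2.0)
```

Averaging tends to cancel the uncorrelated errors of the individual models, which is one plausible reason a consensus can outperform any single member.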
Project description:Whole-cell screening of 20,000 drug-like small molecules led to the identification of nitrofuranyl methylpiperazines as potent anti-TB agents. In the present study, validation followed by medicinal chemistry has been used to explore the structure-activity relationship. Ten compounds demonstrated potent MIC in the range of 0.17-0.0072 µM against H37Rv Mycobacterium tuberculosis (MTB) and were further investigated against nonreplicating and resistant (Rif(R) and MDR) strains of MTB. These compounds were also tested for cytotoxicity. Among the 10 tested compounds, five showed submicromolar to nanomolar potency against nonreplicating and resistant (Rif(R) and MDR) strains of MTB along with a good safety index. Based on their overall in vitro profiles, the solubility and pharmacokinetic properties of five potent compounds were studied, and two analogues, 14f and 16g, were found to have comparatively better solubility than others tested and acceptable pharmacokinetic properties. This study presents the rediscovery of a nitrofuranyl class of compounds with improved aqueous solubility and acceptable oral PK properties, opening a new direction for further development.
Project description:Aqueous solubility is recognized as a critical parameter in both early- and late-stage drug discovery. Therefore, in silico modeling of solubility has attracted extensive interest in recent years. Most previous studies have been limited to relatively small data sets with limited diversity, which in turn limits the predictive power of the derived models. In this work, we present a support vector machine model for the binary classification of solubility by taking advantage of the largest known public data set, which contains over 46 000 compounds with experimental solubility. Our model was optimized in combination with a reduction and recombination feature selection strategy. The best model demonstrated robust performance in both cross-validation and prediction of two independent test sets, indicating it could be a practical tool to select soluble compounds for screening, purchasing, and synthesizing. Moreover, because it relies on completely public resources, our work may be used for comparative evaluation of solubility classification studies.
Project description:Predicting the equilibrium solubility of organic, crystalline materials at all relevant temperatures is crucial to the digital design of manufacturing unit operations in the chemical industries. The work reported in our current publication builds upon the limited number of recently published quantitative structure-property relationship studies which modelled the temperature dependence of aqueous solubility. One set of models was built to directly predict temperature dependent solubility, including for materials with no solubility data at any temperature. We propose that a modified cross-validation protocol is required to evaluate these models. Another set of models was built to predict the related enthalpy of solution term, which can be used to estimate solubility at one temperature based upon solubility data for the same material at another temperature. We investigated whether various kinds of solid state descriptors improved the models obtained with a variety of molecular descriptor combinations: lattice energies or 3D descriptors calculated from crystal structures or melting point data. We found that none of these greatly improved the best direct predictions of temperature dependent solubility or the related enthalpy of solution endpoint. This finding is surprising because the importance of the solid state contribution to both endpoints is clear. We suggest our findings may, in part, reflect limitations in the descriptors calculated from crystal structures and, more generally, the limited availability of polymorph specific data. We present curated temperature dependent solubility and enthalpy of solution datasets, integrated with molecular and crystal structures, for future investigations.
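The enthalpy-of-solution route described above can be illustrated with the van 't Hoff relation, ln(x2/x1) = -(ΔH_sol/R)(1/T2 - 1/T1). The sketch below is a minimal example, assuming the enthalpy of solution is constant over the temperature range and that solubility is expressed as a mole fraction in a near-ideal solution. The function name and the values in the usage comment are hypothetical and are not taken from the study.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)


def extrapolate_solubility(x1, t1, t2, delta_h_sol):
    """Estimate mole-fraction solubility at t2 from a known value at t1.

    x1: mole-fraction solubility measured at temperature t1 (kelvin).
    t2: target temperature (kelvin).
    delta_h_sol: enthalpy of solution in J/mol, assumed constant
    between t1 and t2 (a simplifying assumption of this sketch).
    """
    exponent = -delta_h_sol / R * (1.0 / t2 - 1.0 / t1)
    return x1 * math.exp(exponent)


# Usage: an endothermic dissolution (positive delta_h_sol) gives a higher
# estimated solubility at the higher temperature.
# extrapolate_solubility(0.01, 298.15, 318.15, 25000.0)
```

A model that predicts the enthalpy of solution therefore only needs one measured solubility point per material to yield estimates at other temperatures, within the limits of the constant-enthalpy assumption.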
Project description:We report a series of ionically modified ferrocene compounds for hybrid lithium-organic non-aqueous redox flow batteries, based on the ferrocene/ferrocenium redox couple as the active catholyte material. Tetraalkylammonium ionic moieties were incorporated into the ferrocene structure, in order to enhance the solubility of the otherwise relatively insoluble ferrocene. The effect of various counter anions of the tetraalkylammonium ionized species appended to the ferrocene, such as bis(trifluoromethanesulfonyl)imide, hexafluorophosphate, perchlorate, tetrafluoroborate, and dicyanamide on the solubility of the ferrocene was investigated. The solution chemistry of the ferrocene species was studied, in order to understand the mechanism of solubility enhancement. Finally, the electrochemical performance of these ionized ferrocene species was evaluated and shown to have excellent cell efficiency and superior cycling stability.
Project description:Prostanoid receptor EP2 can play a proinflammatory role, exacerbating disease pathology in a variety of central nervous system and peripheral diseases. A highly selective EP2 antagonist could be useful as a drug to mitigate the inflammatory consequences of EP2 activation. We recently identified a cinnamic amide class of EP2 antagonists. The lead compound in this class (5d) displays anti-inflammatory and neuroprotective actions. However, this compound exhibited moderate selectivity for EP2 over the DP1 prostanoid receptor (~10-fold) and low aqueous solubility. We now report compounds that display up to 180-fold selectivity against DP1 and up to 9-fold higher aqueous solubility than our previous lead. The newly developed compounds also display higher selectivity against EP4 and IP receptors and comparable plasma pharmacokinetics. Thus, these compounds are useful for proof-of-concept studies in a variety of models where EP2 activation plays a deleterious role.
Project description:Lapatinib, an approved epidermal growth factor receptor inhibitor, was explored as a starting point for the synthesis of new hits against Trypanosoma brucei, the causative agent of human African trypanosomiasis (HAT). Previous work culminated in 1 (NEU-1953), which was part of a series typically associated with poor aqueous solubility. In this report, we present various medicinal chemistry strategies that were used to increase the aqueous solubility and improve the physicochemical profile without sacrificing antitrypanosomal potency. To rank trypanocidal hits, a new assay (summarized in a cytocidal effective concentration (CEC50)) was established, as part of the lead selection process. Increasing the sp3 carbon content of 1 resulted in 10e (0.19 µM EC50 against T. brucei and 990 µM aqueous solubility). Further chemical exploration of 10e yielded 22a, a trypanocidal quinolinimine (EC50: 0.013 µM; aqueous solubility: 880 µM; and CEC50: 0.18 µM). Compound 22a reduced parasitemia 109 fold in trypanosome-infected mice; it is an advanced lead for HAT drug development.
Project description:Novel 6-nitro-2,3-dihydroimidazooxazole (NHIO) analogues containing polar functionalities were synthesized to produce a compound with enhanced solubility. NHIO analogues bearing polar functionalities, including sulfonyl, uridyl, and thiouridyl groups, were synthesized and evaluated against Mycobacterium tuberculosis (MTB) H37Rv. The aqueous solubility of compounds with MIC values ≤0.5 µg/mL was tested, and six compounds showed enhanced aqueous solubility. The best six compounds were further tested against resistant (Rif(R) and MDR) and dormant strains of MTB and tested for cytotoxicity in the HepG2 cell line. Based on its overall in vitro characteristics and solubility profile, compound 6d was further shown to possess high microsomal stability, solubility under all tested biological conditions (PBS, SGF and SIF), and favorable oral in vivo pharmacokinetics and in vivo efficacy.
Project description:We estimated aqueous solubilities and activity coefficients of atmospherically relevant highly oxidized multifunctional organic compounds in binary mixtures with water at temperatures between 278.15 and 338.15 K, using the COSMOtherm program. Physicochemical properties of organic aerosol constituents are needed in the modeling of atmospheric aerosol processes. As experimental data are often impossible to obtain, reliable estimates from theoretical approaches are a promising path to fill this gap. We investigated the effect of intramolecular hydrogen bonds on the estimation of these condensed-phase properties, attempting to improve the agreement between experimental and estimated values. Citric, tartaric, malic, and maleic acids, which are often used in atmospheric models as representatives of oxidized compounds, were selected to benchmark our calculations. In addition, we estimated aqueous solubilities and activity coefficients of α-pinene-derived organosulfates and highly oxidized isoprene-derived organic compounds, for which no experimental data are available. Our results indicate that the absolute aqueous solubility and activity coefficient estimates of citric, tartaric, malic, and maleic acids, and likely other multifunctional organics, can be improved significantly by selecting conformers on the basis of their intramolecular hydrogen bonding in COSMOtherm calculations.