ABSTRACT: The Molecular Entities in Linked Data (MEiLD) dataset comprises data on distinct atoms, molecules, ions, ion pairs, radicals, radical ions, and other species that are identifiable as separately distinguishable chemical entities. The dataset is provided in JSON-LD format and was generated by SDFEater, a tool that parses atoms, bonds, and other molecule data. MEiLD contains 349,960 'small' chemical entities. Our dataset is based on SDF files and is enriched with additional ontologies and line notation data. MEiLD uses the Resource Description Framework (RDF) data model as its basis. Storing the data in this model preserves the semantic relations between entities, such as hierarchical and associative ones. To describe chemical molecules, vocabularies such as the Chemical Vocabulary for Molecular Entities (CVME) and the Simple Knowledge Organization System (SKOS) are used. The dataset can benefit, among others, people working on research and development tools for cheminformatics and bioinformatics. In this paper, we describe various methods of accessing our dataset. In addition to the MEiLD dataset, we publish the Shapes Constraint Language (SHACL) schema of our dataset and the CVME ontology. The data is available in Mendeley Data.
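As a rough illustration of what working with a JSON-LD file of molecular entities might look like, the sketch below groups entity labels by type using only the Python standard library. The fragment, the `ex:` vocabulary, the property names, and the helper `labels_by_type` are all hypothetical and are not taken from MEiLD or CVME; only the `@context`, `@graph`, `@id`, and `@type` keywords come from JSON-LD itself.

```python
import json

# Hypothetical JSON-LD fragment. The "ex:" terms and the record layout are
# invented for illustration; they are NOT the actual MEiLD/CVME schema.
doc = json.loads("""
{
  "@context": {
    "ex": "http://example.org/vocab#",
    "skos": "http://www.w3.org/2004/02/skos/core#"
  },
  "@graph": [
    {"@id": "ex:water", "@type": "ex:Molecule",
     "skos:prefLabel": "water", "ex:smiles": "O"},
    {"@id": "ex:hydroxide", "@type": "ex:Ion",
     "skos:prefLabel": "hydroxide", "ex:smiles": "[OH-]"}
  ]
}
""")


def labels_by_type(document):
    """Group the preferred labels of all graph nodes by their @type."""
    result = {}
    for node in document.get("@graph", []):
        result.setdefault(node["@type"], []).append(node["skos:prefLabel"])
    return result


summary = labels_by_type(doc)
print(summary)
```

For real RDF processing (resolving contexts, querying triples), a dedicated RDF library would be a better fit than plain `json`; this sketch only shows the shape of the data.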
Project description:The mussel-inspired chemistry of dopamine oxidation to form polydopamine (PDA), together with the in situ reduction of metal ions in solution to form metal nanoparticles, has greatly broadened the application of metal nanoparticle surface modification technology. This article contains a dataset of scanning electron microscope (SEM) images of silver nanoparticles (Ag NPs) coated on polyethylene terephthalate (PET) films using dopamine chemistry alone or combined with polyvinylpyrrolidone or glucose. The Ag NPs formed in the various environments are round, cubic, or triangular in shape. Mendeley Data, http://dx.doi.org/10.17632/bjjrt2dwbn.1.
Project description:The dataset includes 1,771 locations of major seagrass families (Cymodoceaceae, Zosteraceae, Posidoniaceae, Hydrocharitaceae, Ruppiaceae), which are further divided into the species they include, as well as 1,284 locations of seagrass absence (algorithmically produced), in the Mediterranean Sea. For each location, 217 biological, chemical, physical, and human-related parameters are available, which were merged from other publicly available data sources. As the most comprehensive dataset for seagrass in the Mediterranean to date, it is suitable for data analysis and machine learning. For more insight, please see "Seagrass Detection in the Mediterranean: A Supervised Learning Approach" (Effrosynidis et al., 2018). The dataset is available on Mendeley Data (Effrosynidis, 2019).
Project description:Breast cancer diagnosis is one of the many areas that have taken advantage of artificial intelligence to achieve better performance, even though the availability of large medical image datasets remains a challenge. Transfer learning (TL) enables deep learning algorithms to overcome a shortage of training data when constructing an efficient model by transferring knowledge from a given source task to a target task. However, in most cases, models pre-trained on ImageNet (natural images), which does not include medical images, are used for transfer learning to medical images. Considering that microscopic cancer cell line images can be acquired in large amounts, we argue that learning from both natural and medical datasets improves performance in ultrasound breast cancer image classification. The proposed multistage transfer learning (MSTL) algorithm was implemented using three pre-trained models (EfficientNetB2, InceptionV3, and ResNet50) with three optimizers (Adam, Adagrad, and stochastic gradient descent (SGD)). Datasets of 20,400 cancer cell images, 200 ultrasound images from Mendeley, and 400 ultrasound images from the MT-Small-Dataset were used. ResNet50-Adagrad-based MSTL achieved a test accuracy of 99 ± 0.612% on the Mendeley dataset and 98.7 ± 1.1% on the MT-Small-Dataset, averaged over 5-fold cross validation. A p-value of 0.01191 was obtained when comparing MSTL against ImageNet-based TL on the Mendeley dataset. The result is a significant improvement in the performance of artificial intelligence methods for ultrasound breast cancer classification compared to state-of-the-art methods and could remarkably improve the early diagnosis of breast cancer in young women.
Project description:Recent trends in voicebot application development have enabled the use of both speech-to-text and text-to-speech (TTS) generation techniques. To generate a voice response to a given utterance, one needs a TTS engine. Recently developed TTS engines are shifting towards end-to-end approaches that use models such as Tacotron, Tacotron-2, WaveNet, and WaveGlow. This shift lets a TTS service provider focus on developing training and validation datasets comprising labelled texts and recorded speech, instead of designing an entirely new model that outperforms the others, which is time-consuming and costly. In this context, this work introduces the first Vietnamese FPT Open Speech Data (FOSD)-Tacotron-2-based TTS model dataset. The dataset comprises a configuration file in *.json format; training and validation text input files (in *.csv format); a 225,000-step checkpoint of the trained model; and several generated audio samples. The published dataset is valuable as a baseline for benchmarking against other newly developed TTS models and engines. In addition, it opens an entirely new TTS research optimization problem: how to effectively generate speech from text given a black-box (trained) TTS model and its training and validation input texts.
Project description:The data presented in this paper are related to the research paper entitled "Bremsstrahlung spectra produced by kilovolt electron impact on thick targets". The dataset includes our measured bremsstrahlung spectra on Al, Ti, Zr, Mo, and W thick targets at 5, 10, 15, 20, and 25 keV electron impact. In this paper we present the experimental method and make the dataset publicly available to enable extended analyses or reuse. The dataset is available on the Mendeley Data public repository at http://dx.doi.org/10.17632/5zx3459bj3.1.
Project description:Surface water overlies groundwater at the groundwater/surface water interface (GSI). Water and chemicals are continually exchanged across the GSI. Surface water recharges the underlying aquifer and undergoes significant changes in chemical composition before it discharges back into the stream or at the surface. Sustainable management of water resources therefore requires insight into the water chemistry and its seasonal variations. The hydrochemical dataset, representing a total of 37 groundwater samples and 13 surface water samples, was collected from Kattumannarkoil taluk, India, to identify the factors governing the water chemistry of the region. The samples were collected during two different seasons, summer (April 2015) and monsoon (September 2015), to broadly cover seasonal variation. The collected samples were analyzed for physical and chemical parameters. The physical parameters measured in the field are pH, electrical conductivity (EC), and total dissolved solids (TDS). The chemical parameters analyzed in the laboratory are calcium (Ca2+), magnesium (Mg2+), sodium (Na+), potassium (K+), chloride (Cl-), bicarbonate (HCO3-), nitrate (NO3-), phosphate (PO4), sulfate (SO4 2-), and silica (H4SiO4). Furthermore, the results were processed using AquaChem software, a geographical information system (GIS), multivariate statistical techniques, and the computer program WATCLAST, written in C++. This hydrochemical dataset helps ascertain the suitability of the water for its intended uses. The dataset can serve as a guide to the hydrogeochemistry of other predominantly agricultural areas that share similar geological characteristics. The raw data of this research work are hosted in the Mendeley repository.
Project description:Geochemical modelling data and powder X-ray diffraction (PXRD) data on samples collected along Rio Irvi (Montevecchio-Ingurtosu mining district, SW Sardinia, Italy) are reported in this paper. The data show the results of data processing to calculate the chemical speciation of ions in water and the saturation indices (SI) of relevant mineral phases. These data are related to the research article: De Giudici G. et al (2018), Application of hydrologic-tracer techniques to the Casargiu adit and Rio Irvi (SW-Sardinia, Italy): Using enhanced natural attenuation to reduce extreme metal loads, Applied Geochemistry, vol.96, 42-54. The comparison of the calculated saturation indices of relevant Fe-bearing phases with the PXRD data of samples collected along the stream confirms the quality of the SI dataset and the good correlation between the calculations and the observed data. Comparing this dataset with others can help to better understand and quantify the impact of past and current mining activity on water bodies, contributing to the scientific background needed for applying remediation actions.
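For readers unfamiliar with the term, the saturation index referred to above is conventionally defined as follows. This is the standard textbook definition, given here as background and not quoted from the paper; IAP denotes the ion activity product of the dissolved species and K_sp the solubility product of the mineral phase at the sample temperature.

```latex
\mathrm{SI} = \log_{10}\!\left(\frac{\mathrm{IAP}}{K_{sp}}\right)
```

Under this definition, SI < 0 indicates that the water is undersaturated with respect to the mineral (dissolution is favoured), SI = 0 indicates equilibrium, and SI > 0 indicates supersaturation (precipitation is favoured), which is why calculated SI values can be checked against the phases actually identified by PXRD.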
Project description:In this work, we present a dataset that provides information on the scientific program of a set of machine learning conferences. Data were extracted from the IEEE Xplore Digital Library and the official web site of the International Conference on Machine Learning Applications (ICMLA). We include data from four different editions (from 2014 to 2017). Web scraping techniques were used to mine the data contained in these web sites. The dataset covers 448 papers presented at the conference, and each paper has 6 attributes, including information about the thematic session in which it was presented. The dataset is hosted in the Mendeley Dataset Repository.
Project description:An integrated dataset was developed that combined stakeholder perceptions of environmental change (precipitation, air temperature, water temperature, fish abundance, fish size, residential development) and comparable instrumented measures of environmental changes based on sensor records. All data were transformed to a common 3-point categorical scale to support statistical comparison of social and biophysical change for the same change variables. The integrated dataset is available on Mendeley (http://dx.doi.org/10.17632/cjfxg84bmx.1).
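One way such a transformation to a common 3-point scale could be sketched is shown below. The category coding (-1 = decrease, 0 = no change, +1 = increase), the `tolerance` parameter, the function `to_three_point`, and the sample values are all assumptions made for illustration; the description above does not state how the authors defined their categories or thresholds.

```python
def to_three_point(change, tolerance=0.0):
    """Map a numeric change to a 3-point category.

    Hypothetical coding: -1 = decrease, 0 = no change, +1 = increase.
    Changes whose magnitude is within `tolerance` count as "no change".
    """
    if change > tolerance:
        return 1
    if change < -tolerance:
        return -1
    return 0


# Made-up sensor trends (e.g. change per decade) and perception scores,
# used only to show both sources ending up on the same scale.
sensor_trends = {"air_temperature": 0.8, "precipitation": -0.1, "fish_size": -1.2}
perceived = {"air_temperature": 1, "precipitation": 0, "fish_size": -1}

coded = {name: to_three_point(value, tolerance=0.5)
         for name, value in sensor_trends.items()}
agreement = {name: coded[name] == perceived[name] for name in coded}
print(coded)
print(agreement)
```

Once both the perception data and the sensor records share one categorical coding, they can be cross-tabulated variable by variable.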