
Dataset Information


A machine learning driven automated system for safety data sheet indexing.


ABSTRACT: Safety Data Sheets (SDS) are foundational to chemical management systems and are used in a wide variety of applications, such as green chemistry, industrial hygiene, and regulatory compliance, within the Environment, Health, and Safety (EHS) and the Environment, Social, and Governance (ESG) domains. Companies usually prefer to have key information extracted from these data sheets and stored in an easy-to-access, structured repository. This process is referred to as SDS "indexing". Historically, SDS indexing has been done manually, which is labor-intensive, time-consuming, and costly. In this paper, we present an automated system that indexes the composition information of chemical products from SDS documents using a multi-stage ensemble method, in which machine learning models and rule-based systems are stacked one after the other. The system specifically indexes the ingredient names, their corresponding Chemical Abstracts Service (CAS) numbers, and their weight percentages. It takes an SDS document in PDF format as input and outputs a table listing the ingredient names with their corresponding CAS numbers and weight percentages. The system achieves a precision of 0.93 at the document level when evaluated on 20,000 SDS documents annotated for this purpose.
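The abstract does not say what the rule-based stages check. As a minimal sketch of the kind of rule such a stage might apply, assuming the composition section has already been located and flattened to one text line per ingredient in the order name, CAS number, weight percentage (all assumptions), the Python below extracts those three fields and rejects any CAS number whose check digit fails. The check-digit rule is the standard one for CAS Registry Numbers; the function names, line layout, and sample lines are hypothetical, and this is not the authors' implementation.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rule-based stage: assumes each composition line reads
# "<ingredient name> <CAS number> <weight %>", in that order.

# CAS Registry Number: 2-7 digits, 2 digits, 1 check digit, hyphen-separated.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")

# A weight range such as "10 - 30 %" or "10 to 30 %", or a single value such as "60 %".
WEIGHT_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:-|to)\s*(\d+(?:\.\d+)?)\s*%"
    r"|(\d+(?:\.\d+)?)\s*%"
)

Row = Tuple[str, str, Optional[Tuple[float, float]]]


def cas_is_valid(cas: str) -> bool:
    """Verify the CAS check digit: weight each preceding digit by its
    position counted from the right (1, 2, 3, ...), sum, and take mod 10."""
    m = CAS_PATTERN.fullmatch(cas)
    if m is None:
        return False
    body = m.group(1) + m.group(2)
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == int(m.group(3))


def parse_composition_line(line: str) -> Optional[Row]:
    """Return (name, CAS number, (low %, high %)), or None if the first
    CAS-like token is missing or fails the check digit."""
    m = CAS_PATTERN.search(line)
    if m is None or not cas_is_valid(m.group(0)):
        return None
    # Text before the CAS number is taken as the ingredient name.
    name = line[: m.start()].strip(" \t|,;")
    # Only text after the CAS number is searched, so its hyphens cannot be read as a range.
    w = WEIGHT_PATTERN.search(line[m.end():])
    weight = None
    if w is not None:
        if w.group(3) is not None:
            value = float(w.group(3))
            weight = (value, value)
        else:
            weight = (float(w.group(1)), float(w.group(2)))
    return name, m.group(0), weight


def index_composition(lines: List[str]) -> List[Row]:
    """Keep only the lines that yield a row with a valid CAS number."""
    rows = []
    for line in lines:
        row = parse_composition_line(line)
        if row is not None:
            rows.append(row)
    return rows


if __name__ == "__main__":
    sample = [
        "Ethanol 64-17-5 10 - 30 %",
        "Water | 7732-18-5 | 60 %",
        "Fake solvent 1234-56-7 5 %",  # check digit should be 6, so this line is rejected
    ]
    for row in index_composition(sample):
        print(row)
    # ('Ethanol', '64-17-5', (10.0, 30.0))
    # ('Water', '7732-18-5', (60.0, 60.0))
```

Reading the name from the text before the CAS number and the weight from the text after it is a simplifying assumption; real SDS composition tables vary in column order and layout, which a rule this simple does not handle.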

SUBMITTER: Suman A 

PROVIDER: S-EPMC10883951 | biostudies-literature | 2024 Feb

REPOSITORIES: biostudies-literature


Publications

A machine learning driven automated system for safety data sheet indexing.

Aatish Suman, Misbah Khan, Veeru Talreja, Julia Penfield, Stephanie Crowell

Scientific Reports, 22 February 2024, issue 1



Similar Datasets

| S-EPMC10820802 | biostudies-literature
| S-EPMC10401178 | biostudies-literature
| S-EPMC9849330 | biostudies-literature
2022-08-14 | GSE184943 | GEO
| S-EPMC10349995 | biostudies-literature
| S-EPMC10783149 | biostudies-literature
| S-EPMC7687896 | biostudies-literature
| S-EPMC11556278 | biostudies-literature
2020-09-01 | E-MTAB-9501 | biostudies-arrayexpress
| S-EPMC10791605 | biostudies-literature