Dataset Information

COMPAS-2: a dataset of cata-condensed hetero-polycyclic aromatic systems.


ABSTRACT: Polycyclic aromatic systems are highly important to numerous applications, in particular to organic electronics and optoelectronics. High-throughput screening and generative models that can help to identify new molecules to advance these technologies require large amounts of high-quality data, which is expensive to generate. In this report, we present the largest freely available dataset of geometries and properties of cata-condensed poly(hetero)cyclic aromatic molecules calculated to date. Our dataset contains ~500k molecules comprising 11 types of aromatic and antiaromatic building blocks calculated at the GFN1-xTB level and is representative of a highly diverse chemical space. We detail the structure enumeration process and the methods used to provide various electronic properties (including HOMO-LUMO gap, adiabatic ionization potential, and adiabatic electron affinity). Additionally, we benchmark against a ~50k dataset calculated at the CAM-B3LYP-D3BJ/def2-SVP level and develop a fitting scheme to correct the xTB values to higher accuracy. These new datasets represent the second installment in the COMputational database of Polycyclic Aromatic Systems (COMPAS) Project.
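
The abstract mentions a fitting scheme that corrects xTB values toward the CAM-B3LYP-D3BJ/def2-SVP reference but does not give its functional form. The following is a minimal sketch of one way such a correction could look, assuming a simple per-property linear fit. The function names and all numbers are hypothetical placeholders, not the authors' method or values from the dataset.

```python
import numpy as np


def fit_linear_correction(xtb_values, dft_values):
    """Least-squares fit of dft ~= slope * xtb + intercept.

    Assumption: a straight-line relation; the source does not state
    the form of the fit actually used.
    """
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(xtb_values, dft_values, deg=1)
    return slope, intercept


def apply_correction(xtb_values, slope, intercept):
    """Map xTB-level values onto the fitted higher-level scale."""
    return slope * np.asarray(xtb_values, dtype=float) + intercept


# Synthetic placeholder numbers (in eV), NOT taken from COMPAS-2.
xtb_gap = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
dft_gap = 1.8 * xtb_gap + 0.4  # made-up linear relation for illustration

slope, intercept = fit_linear_correction(xtb_gap, dft_gap)
corrected = apply_correction(xtb_gap, slope, intercept)
```

In practice the fit would be trained on the molecules present in both the ~50k DFT subset and the xTB set, then applied to the remaining xTB-only molecules.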

SUBMITTER: Mayo Yanes E 

PROVIDER: S-EPMC10799083 | biostudies-literature | 2024 Jan

REPOSITORIES: biostudies-literature

Publications

COMPAS-2: a dataset of cata-condensed hetero-polycyclic aromatic systems.

Eduardo Mayo Yanes, Sabyasachi Chakraborty, Renana Gershoni-Poranne

Scientific Data, 2024-01-19 (issue 1)

