Dataset Information

Descriptor engineering in machine learning regression of electronic structure properties for 2D materials.


ABSTRACT: We build new material descriptors to predict the band gap and the work function of 2D materials with tree-based machine-learning models. The descriptors are constructed by vectorizing property matrices and by using an empirical property function, which yields mixing features that are cheap to compute. Combined with database-based features, the mixing features significantly improve the training and prediction of the models. We find R² greater than 0.9 and mean absolute errors (MAE) smaller than 0.23 eV for both training and prediction. The highest R² values of 0.95 and 0.98 and the smallest MAEs of 0.16 eV and 0.10 eV were obtained with extreme gradient boosting for the band-gap and work-function predictions, respectively. These metrics are greatly improved compared with those of predictions based on database features alone. We also find that the hybrid features slightly reduce overfitting despite the small size of the dataset. The relevance of the descriptor-based method was assessed by predicting the electronic properties of several 2D materials belonging to new classes (oxides, nitrides, carbides) and comparing them with those obtained from conventional computations. Our work provides a guideline for efficiently engineering descriptors by using vectorized property matrices and hybrid features to predict 2D material properties with ensemble models.
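
The two metrics quoted in the abstract, the coefficient of determination and the mean absolute error, can be illustrated with a minimal sketch. It is not the authors' code: the function names and the band-gap values below are hypothetical and chosen only to show how the quantities are computed.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical reference and predicted band gaps in eV (illustrative only,
# not taken from the dataset).
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.1, 0.9, 2.2, 2.8]

print("R^2:", round(r2_score(y_true, y_pred), 3))
print("MAE (eV):", round(mean_absolute_error(y_true, y_pred), 3))
```

An R² close to 1 together with a small MAE on both the training and the held-out data is the pattern the abstract reports for the gradient-boosting models.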

SUBMITTER: Dau MT 

PROVIDER: S-EPMC10070413 | biostudies-literature | 2023 Apr

REPOSITORIES: biostudies-literature

Publications

Descriptor engineering in machine learning regression of electronic structure properties for 2D materials.

Dau Minh Tuan, Al Khalfioui Mohamed, Michon Adrien, Reserbat-Plantey Antoine, Vézian Stéphane, Boucaud Philippe

Scientific Reports, 2023-04-03 (issue 1)


Similar Datasets

| S-EPMC8813923 | biostudies-literature
| S-EPMC7193307 | biostudies-literature
| S-EPMC10987159 | biostudies-literature
| S-EPMC9597664 | biostudies-literature
| S-EPMC6180079 | biostudies-literature
| S-EPMC7519134 | biostudies-literature
| S-EPMC8692616 | biostudies-literature
| S-EPMC10915406 | biostudies-literature
| S-EPMC10199776 | biostudies-literature
| S-EPMC5998124 | biostudies-literature