Dataset Information

Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing.


ABSTRACT:

Premise of the study

Phenotypic data sets are necessary to elucidate the genealogy of life, but assembling phenotypic data for taxa across the tree of life can be technically challenging and prohibitively time consuming. We describe a semi-automated protocol to facilitate and expedite the assembly of phenotypic character matrices of plants from formal taxonomic descriptions. This pipeline uses new natural language processing (NLP) techniques and a glossary of over 9000 botanical terms.

Methods and results

Our protocol includes the Explorer of Taxon Concepts (ETC), an online application that assembles taxon-by-character matrices from taxonomic descriptions, and MatrixConverter, a Java application that enables users to evaluate and discretize the characters extracted by ETC. We demonstrate this protocol using descriptions from Araucariaceae.
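As a rough illustration of what "discretizing" a taxon-by-character matrix involves, the following minimal Python sketch codes one numeric and one categorical character into discrete states. It is not the ETC or MatrixConverter implementation: the taxon names, character names, values, cutoff, and all function names are hypothetical and chosen only for this example.

```python
from bisect import bisect_right

# Hypothetical taxon-by-character matrix: rows are taxa, columns are characters.
# All values are invented for illustration; None marks missing data.
matrix = {
    "Taxon A": {"leaf length (cm)": 2.5, "leaf shape": "lanceolate"},
    "Taxon B": {"leaf length (cm)": 7.0, "leaf shape": "ovate"},
    "Taxon C": {"leaf length (cm)": None, "leaf shape": "lanceolate"},
}


def discretize_numeric(value, breakpoints):
    """Map a number to a state index using sorted breakpoints; missing becomes '?'."""
    if value is None:
        return "?"
    # bisect_right counts how many breakpoints are <= value.
    return str(bisect_right(breakpoints, value))


def discretize_categorical(value, states):
    """Map a text value to the index of its state; missing or unknown becomes '?'."""
    if value is None or value not in states:
        return "?"
    return str(states.index(value))


def discretize_matrix(matrix, numeric_rules, categorical_rules):
    """Return a new matrix in which every character is coded as a discrete state."""
    result = {}
    for taxon, chars in matrix.items():
        row = {}
        for name, value in chars.items():
            if name in numeric_rules:
                row[name] = discretize_numeric(value, numeric_rules[name])
            elif name in categorical_rules:
                row[name] = discretize_categorical(value, categorical_rules[name])
            else:
                row[name] = "?"  # no rule supplied for this character
        result[taxon] = row
    return result


coded = discretize_matrix(
    matrix,
    # Assumed cutoff for the example: below 5 cm is state 0, 5 cm or more is state 1.
    numeric_rules={"leaf length (cm)": [5.0]},
    categorical_rules={"leaf shape": ["lanceolate", "ovate"]},
)
for taxon, row in coded.items():
    print(taxon, row)
```

In this sketch the user supplies the breakpoints and state lists, which loosely mirrors the evaluation step the abstract describes, where a person reviews the extracted characters before they are discretized.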

Conclusions

The NLP pipeline unlocks the phenotypic data found in taxonomic descriptions and makes them usable for evolutionary analyses.

SUBMITTER: Endara L 

PROVIDER: S-EPMC5895189 | biostudies-literature | 2018 Mar

REPOSITORIES: biostudies-literature

Publications

Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing.

Endara L, Cui H, Burleigh JG

Applications in Plant Sciences, 2018-03-31, 3


Similar Datasets

| S-EPMC7797509 | biostudies-literature
| S-EPMC2995674 | biostudies-literature
| S-EPMC10403813 | biostudies-literature
| S-EPMC11390993 | biostudies-literature
| S-EPMC6714891 | biostudies-literature
| S-EPMC11004121 | biostudies-literature
| S-EPMC4849652 | biostudies-literature
| S-EPMC6255800 | biostudies-literature
| S-EPMC8285739 | biostudies-literature
| S-EPMC6750723 | biostudies-literature