Dataset Information


Semi-automatic conversion of BioProp semantic annotation to PASBio annotation.

ABSTRACT:

BACKGROUND: Semantic role labeling (SRL) is an important text analysis technique. In SRL, sentences are represented by one or more predicate-argument structures (PAS). Each PAS is composed of a predicate (verb) and several arguments (noun phrases, adverbial phrases, etc.) with different semantic roles, including main arguments (agent or patient) as well as adjunct arguments (time, manner, or location). PropBank is the most widely used PAS corpus and annotation format in the newswire domain. In the biomedical field, however, more detailed and restrictive PAS annotation formats such as PASBio are popular. Unfortunately, due to the lack of an annotated PASBio corpus, no publicly available machine-learning (ML) based SRL systems based on PASBio have been developed. In previous work, we constructed a biomedical corpus based on the PropBank standard, called BioProp, on which we developed an ML-based SRL system, BIOSMILE. In this paper, we aim to build a system to convert BIOSMILE's BioProp annotation output to PASBio annotation. Our system consists of BIOSMILE in combination with a BioProp-PASBio rule-based converter and an additional semi-automatic rule generator.

RESULTS: Our first experiment evaluated the rule-based converter's performance independently of BIOSMILE's performance. The converter achieved an F-score of 85.29%. The second experiment evaluated the combined system (BIOSMILE + rule-based converter). The system achieved an F-score of 69.08% for PASBio's 29 verbs.

CONCLUSION: Our approach allows PAS conversion between BioProp and PASBio annotation using BIOSMILE alongside our newly developed semi-automatic rule generator and rule-based converter. Our system can match the performance of other state-of-the-art domain-specific ML-based SRL systems and can be easily customized for PASBio application development.
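The rule-based conversion described in the abstract can be pictured with a minimal sketch. The abstract does not give the authors' rule format, so everything below is an assumption made for illustration: the `Argument` and `PAS` classes, the `convert` function, the example sentence, and the role mappings in `RULES` are hypothetical and are not taken from BioProp, PASBio, or BIOSMILE. The sketch only shows the general idea of relabeling the arguments of a predicate-argument structure with per-verb rules.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Argument:
    role: str  # semantic role label, e.g. "Arg0" or "ArgM-LOC"
    text: str  # the phrase that fills the role


@dataclass(frozen=True)
class PAS:
    predicate: str                    # the verb
    arguments: Tuple[Argument, ...]   # its labeled arguments


# Hypothetical conversion rules: (predicate, source role) -> target role.
# These mappings are invented for illustration and are NOT real
# BioProp-to-PASBio rules.
RULES: Dict[Tuple[str, str], str] = {
    ("express", "Arg0"): "Arg2",
    ("express", "ArgM-LOC"): "Arg3",
}


def convert(pas: PAS, rules: Dict[Tuple[str, str], str]) -> PAS:
    """Relabel each argument using the rule for (predicate, role), if any.

    Arguments without a matching rule keep their original label.
    """
    relabeled = tuple(
        Argument(rules.get((pas.predicate, arg.role), arg.role), arg.text)
        for arg in pas.arguments
    )
    return PAS(pas.predicate, relabeled)


# Example input in a PropBank-like labeling (made-up sentence).
source = PAS(
    "express",
    (
        Argument("Arg0", "the cell"),
        Argument("Arg1", "gene X"),
        Argument("ArgM-LOC", "in the nucleus"),
    ),
)
converted = convert(source, RULES)
```

Keying the rules by verb reflects the abstract's point that the target format is defined per predicate; a real converter would likely need conditions beyond a plain role lookup.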


PROVIDER: S-EPMC2638158 | BioStudies | 2008-01-01T00:00:00Z

REPOSITORIES: biostudies

Similar Datasets

2017-01-01 | S-EPMC5561560 | BioStudies
1000-01-01 | S-EPMC3483229 | BioStudies
1000-01-01 | S-EPMC3694679 | BioStudies
2009-01-01 | S-EPMC2774701 | BioStudies
2010-01-01 | S-EPMC2903722 | BioStudies
| S-EPMC3449393 | BioStudies
2011-01-01 | S-EPMC3111993 | BioStudies
2015-01-01 | S-EPMC4301805 | BioStudies
2017-01-01 | S-EPMC5549299 | BioStudies
| S-EPMC3168731 | BioStudies