Dataset Information

Dynamic decoding and dual synthetic data for automatic correction of grammar in low-resource scenario.

ABSTRACT: Grammar error correction (GEC) systems are pivotal in natural language processing (NLP); their primary goal is to identify and correct errors that compromise the grammatical integrity of written text. This is crucial for both language learning and formal communication. Recently, neural machine translation (NMT) has emerged as a promising and increasingly popular approach to GEC. However, this approach faces significant challenges, particularly the scarcity of training data and the complexity of GEC itself, especially for low-resource languages such as Indonesian. To address these challenges, we propose InSpelPoS, a confusion method that combines two synthetic data generation methods: the Inverted Spellchecker and Patterns+POS. Furthermore, we introduce an adapted seq2seq framework equipped with a dynamic decoding method and state-of-the-art Transformer-based neural language models to enhance the accuracy and efficiency of GEC. The dynamic decoding method is capable of navigating the complexities of GEC and correcting a wide range of errors, including contextual and grammatical errors. The proposed model leverages the contextual information of words and sentences to generate a corrected output. To assess the effectiveness of our proposed framework, we conducted experiments using synthetic data and compared its performance with existing GEC systems. The results demonstrate a significant improvement in the accuracy of Indonesian GEC compared to existing methods.
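The idea of combining two synthetic-data generators can be illustrated with a minimal Python sketch. It is not the authors' implementation: the confusion sets, the example sentence and its POS tags, the rule that attaches the preposition "di"/"ke" to the following word, and every function name below are hypothetical, chosen only to show how a spellchecker-style confusion set and a POS-conditioned error pattern can each turn a clean sentence into a noisy source paired with its clean target.

```python
import random
from typing import Dict, List, Tuple

# HYPOTHETICAL confusion sets. In an inverted-spellchecker setup these would
# come from a spellchecker's suggestion lists for each correct word.
CONFUSION_SETS: Dict[str, List[str]] = {
    "pergi": ["pegi", "prgi"],
    "sekolah": ["sekola", "sekolh"],
    "buku": ["bku", "bukuu"],
}

# HYPOTHETICAL pattern: prepositions that writers often attach to the next
# word (e.g. "ke sekolah" written as "kesekolah").
SPLIT_PREPOSITIONS = {"di", "ke"}


def inverted_spellchecker(token: str, rng: random.Random) -> str:
    """Replace a correct token with a plausible misspelling, if one is known."""
    candidates = CONFUSION_SETS.get(token)
    return rng.choice(candidates) if candidates else token


def make_synthetic_pair(
    tagged: List[Tuple[str, str]], p_error: float = 0.3, seed: int = 0
) -> Tuple[str, str]:
    """Corrupt a POS-tagged clean sentence; return (noisy source, clean target)."""
    rng = random.Random(seed)
    noisy: List[str] = []
    i = 0
    while i < len(tagged):
        token, tag = tagged[i]
        corrupt = rng.random() < p_error
        if (corrupt and tag == "ADP" and token in SPLIT_PREPOSITIONS
                and i + 1 < len(tagged)):
            # Patterns+POS-style error: merge the preposition with the next word.
            noisy.append(token + tagged[i + 1][0])
            i += 2
            continue
        if corrupt:
            # Inverted-spellchecker-style error: swap in a known misspelling.
            token = inverted_spellchecker(token, rng)
        noisy.append(token)
        i += 1
    clean = " ".join(tok for tok, _ in tagged)
    return " ".join(noisy), clean


if __name__ == "__main__":
    sentence = [("saya", "PRON"), ("pergi", "VERB"), ("ke", "ADP"), ("sekolah", "NOUN")]
    source, target = make_synthetic_pair(sentence, p_error=1.0, seed=1)
    print(source, "->", target)
```

Keeping the clean sentence as the target is what makes each pair usable for training a seq2seq corrector: the model learns to map the noisy source back to the original. A real pipeline would presumably derive its confusion sets and error patterns from data rather than hard-coding them as this sketch does.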

SUBMITTER: Musyafa A

PROVIDER: S-EPMC11232608 | biostudies-literature | 2024

REPOSITORIES: biostudies-literature


Publications

Dynamic decoding and dual synthetic data for automatic correction of grammar in low-resource scenario.

Ahmad Musyafa, Ying Gao, Aiman Solyman, Siraj Khan, Wentian Cai, Muhammad Faizan Khan

PeerJ Computer Science, 2024-07-05


PMID: 38983192

Similar Datasets

Project description: Introduction: Pakistan is among the countries with the highest maternal death rates. Obstetric hemorrhage accounts for 41% of these deaths. Uterine rupture is a grave obstetric emergency with high maternal and neonatal morbidity and mortality. It is important to identify its frequency and associated risk factors to formulate programs for its prevention and management. This study aimed to assess the frequency, associated risk factors, fetomaternal outcomes, and management of women with a ruptured uterus at our hospital. Material and methods: This was a retrospective study of 206 women, reviewing data collected from cases of uterine rupture managed at the WCTH Bannu, Pakistan, from October 2016 to October 2018. A structured proforma was designed and used to extract data from operating theatre registers and hospital medical records. Our hospital maintains a robust system for recording all patient information related to demographics, obstetric history, operative notes, and postoperative course in the patient's charts during their hospital stay. Detailed information on operative procedures is also maintained in the operating theatre register, and all registers are checked at weekly statistical meetings to ensure proper documentation. Data were entered and analyzed in SPSS version 21 (IBM Corp.; Armonk, NY, USA). Frequencies and percentages were calculated for categorical variables. For inferential statistics, chi-square or Fisher's exact tests were used. A p-value < 0.05 was considered statistically significant. Results: The overall incidence of uterine rupture was 1.71%. The important etiological factors were grand multiparity 62 (35.2%), obstructed/neglected labour 58 (32.9%), injudicious use of oxytocin 56 (31.8%) and prostaglandins 26 (14.7%), previous cesarean section 35 (19.8%), and previous pelvic surgery (0.5%).
Hysterectomy was performed in 80.6% of cases, 34 (19.2%) patients underwent uterine repair, and 4.5% had bladder repair. The mortality rate was 21%, mainly due to irreversible shock or disseminated intravascular coagulation. Perinatal mortality was 91.4%. Duration of surgery of more than two hours and presentation to the hospital at night were significantly associated with poor maternal outcome (p = 0.00). Conclusion: Uterine rupture is a preventable obstetric emergency associated with high fetomaternal morbidity and mortality. The main causes were grand multigravidity, obstructed labour, previous C-sections, and injudicious use of oxytocin and prostaglandins. Women with prolonged surgery and admission at night had a poor maternal outcome.

Project description: BACKGROUND: Intrahepatic dosimetry is paramount to optimizing radioembolization treatment accuracy using radioactive holmium-166 microspheres (166Ho). This requires a practical protocol that combines quantitative imaging of microsphere distribution with automated and robust delineation of the volumes of interest. To this end, we propose a dual-isotope single photon emission computed tomography (SPECT) protocol based on 166Ho therapeutic microspheres and technetium-99m (99mTc) stannous phytate, which accumulates in healthy liver tissue. This protocol may allow accurate and automatic estimation of tumor-absorbed dose and healthy liver-absorbed dose. The current study focuses on a Monte Carlo-based reconstruction framework that inherently corrects for scatter crosstalk between 166Ho and 99mTc imaging. To demonstrate the feasibility of the method, it is evaluated with realistic phantom experiments and patient data. METHODS: The Utrecht Monte Carlo System (UMCS) was extended to include detailed modeling of crosstalk interactions between 99mTc and 166Ho. First, 99mTc images were reconstructed including energy window-based corrections for 166Ho downscatter. Next, 99mTc downscatter in the 81-keV 166Ho window was Monte Carlo simulated to allow quantitative reconstruction of the 166Ho images. The accuracy of the 99mTc-downscatter modeling was evaluated by comparing measurements with simulations. In addition, the ratio between 99mTc and 166Ho yielding the best 166Ho dose estimates was established and the quantitative accuracy was reported. RESULTS: Given the same level of activity, 99mTc contributes twice as many counts to the 81-keV window as 166Ho, and four times as many counts to the 140-keV window. Applying a 166Ho/99mTc ratio of 5:1 yielded high accuracy in both 166Ho and 99mTc reconstruction. Phantom experiments revealed that the accuracy of quantitative 166Ho activity recovery was reduced by 10% due to the presence of 99mTc.
Twenty iterations (8 subsets) of the SPECT/CT reconstructions were considered feasible for clinical practice. Applicability of the proposed protocol was shown in a proof-of-concept case. CONCLUSION: A novel 166Ho/99mTc dual-isotope protocol for automatic dosimetry compensates accurately for downscatter and allows for the addition of 99mTc without compromising 166Ho SPECT image quality.


OmicsDI is part of the ELIXIR infrastructure
