Project description:
Background: Health professionals and consumers use different terms to describe medical events and concerns, which creates communication barriers between them. Misunderstanding or incomplete understanding may in turn bias diagnosis or treatment. To solve this issue, a consumer health vocabulary was developed to map the health terms used by consumers to the medical terms used by professionals.
Methods: In this study, we extracted Chinese consumer health terms from an online health forum and from patient education monographs, and manually mapped them to the medical terms used by professionals (terms in medical thesauri or medical books). To ensure annotation quality, we developed annotation guidelines.
Results: We applied our method to extract consumer-used disease terms in endocrinology, cardiology, gastroenterology, and dermatology. We identified 1349 medical mentions from 8436 questions posted in an online health forum and 1428 articles from patient education monographs. After manual annotation and review, we released 1036 Chinese consumer health terms mapped to 480 medical terms. Four annotators performed the manual annotation following the Chinese consumer health term annotation guidelines. Their average inter-annotator agreement (IAA) was 93.91%, indicating high consistency of the released terms.
Conclusions: We extracted Chinese consumer health terms from an online forum and patient education monographs and mapped them to the medical terms used by professionals, with manual annotation for both term identification and mapping. Our study may contribute to the construction of a Chinese consumer health vocabulary. In addition, our annotated corpus, comprising both the contexts of consumer health terms and the consumer-professional term mappings, would be a useful resource for developing automatic methods.
The Chinese consumer health terms (CHT) dataset is publicly available at http://www.phoc.org.cn/cht/.
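The description reports an average IAA of 93.91% across four annotators but does not state the formula. The sketch below assumes the simplest reading, pairwise percent agreement averaged over all annotator pairs; the published figure may instead use another measure such as Cohen's kappa. Annotator names and term IDs are hypothetical.

```python
from itertools import combinations

def percent_agreement(a, b):
    """Fraction of items on which two annotators assigned the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def average_pairwise_agreement(annotations):
    """Mean percent agreement over every pair of annotators.

    Assumption: IAA = simple pairwise agreement, averaged; not confirmed by the source.
    """
    pairs = list(combinations(annotations.values(), 2))
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)

# Hypothetical example: four annotators map the same five consumer terms
# to professional term IDs (IDs invented for illustration).
annotations = {
    "A1": ["T1", "T2", "T3", "T4", "T5"],
    "A2": ["T1", "T2", "T3", "T4", "T5"],
    "A3": ["T1", "T2", "T3", "T4", "T6"],
    "A4": ["T1", "T2", "T3", "T4", "T5"],
}
print(f"{average_pairwise_agreement(annotations):.2%}")  # prints 90.00%
```

With four annotators there are six pairs; here three pairs agree fully and three agree on four of five items, giving 0.9.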
Project description: Grammars Across Time Analyzed (GATA) is a resource that captures two snapshots, separated in time, of the grammatical structure of a diverse range of languages, with the aim of furthering research on historical linguistics, language evolution, and cultural change. GATA comprises grammatical information on 52 diverse languages from all continents, covering morphological, syntactic, and phonological features coded from published grammars of the same language at two different time points. Here we introduce the coding scheme and design features of GATA, and we describe some salient patterns related to language change and to the coverage of grammatical descriptions over time.
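Because each language is coded twice, a first-pass comparison has to separate features whose coded value differs between the two grammars from features that only the later grammar describes (a coverage effect rather than necessarily a change). The sketch below assumes a hypothetical layout, one dict per snapshot mapping feature names to values with `None` for "not described"; the feature names are illustrative and are not GATA's actual coding scheme.

```python
def compare_snapshots(early, late):
    """Compare the codings of one language at two time points.

    early, late: dict mapping feature name -> coded value, where None means
    the grammar does not describe that feature (hypothetical layout).
    Returns (features compared, features whose value differs,
    features described only in the later grammar).
    """
    compared = sorted(f for f in early.keys() & late.keys()
                      if early[f] is not None and late[f] is not None)
    differing = [f for f in compared if early[f] != late[f]]
    newly_described = sorted(f for f, v in late.items()
                             if v is not None and early.get(f) is None)
    return compared, differing, newly_described

# Hypothetical codings of one language in an older and a newer grammar.
early = {"basic_word_order": "SOV", "tone": None, "case_marking": "yes"}
late = {"basic_word_order": "SVO", "tone": "no", "case_marking": "yes"}
print(compare_snapshots(early, late))
```

A differing value may reflect a change in the language or simply a different analysis by the later grammarian; the sketch cannot tell these apart.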
Project description: Several studies have examined how characteristics of personal bankruptcy laws influenced entrepreneurial development over the last two decades. Our main objective is to analyze the association between self-employment and the leniency of personal bankruptcy systems in 24 EU countries. Unlike previous studies, we measure differences and changes in the leniency of the regulations with a composite index that incorporates 35 variables. Using a cross-country database of self-employment ratios and various control variables spanning 2000 to 2019, we estimate a panel regression model. We find that new regulations and reforms that make personal bankruptcy legislation more lenient correlate positively with entrepreneurial development, as measured by self-employment rates. This association is more significant in the group of countries where eligibility criteria do not restrict entrepreneurs. We find a one-year negative time-lag effect and conclude that strong anticipation of a more lenient law can immediately change the risk-reward profile, and thereby influence entrepreneurship before the actual reform is implemented. An important policy implication is that a major regulatory reform, or the first implementation of conservative legislation, has an effect on promoting entrepreneurship of the same order of magnitude as other public policy reforms with a similar purpose.
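The description does not say how the 35 variables are combined into the leniency index. A common default, assumed here and not confirmed by the source, is to min-max normalise each variable, orient it so that higher means more lenient, and take an equal-weight mean. The second helper builds the one-year lagged regressor per country that a lagged panel specification needs; the regression itself is not shown. Variable names are hypothetical.

```python
def composite_index(rows, higher_is_lenient):
    """Equal-weight composite index from min-max normalised indicators.

    rows: list of dicts, one per country-year, mapping variable -> value.
    higher_is_lenient: dict variable -> bool (False = higher value is stricter).
    Assumption: equal weights and min-max scaling; the study's weighting may differ.
    """
    variables = list(higher_is_lenient)
    bounds = {v: (min(r[v] for r in rows), max(r[v] for r in rows)) for v in variables}
    scores = []
    for r in rows:
        parts = []
        for v in variables:
            lo, hi = bounds[v]
            # A variable that never varies carries no information: score it neutrally.
            x = 0.5 if hi == lo else (r[v] - lo) / (hi - lo)
            # Flip indicators where a higher raw value means a stricter regime.
            parts.append(x if higher_is_lenient[v] else 1.0 - x)
        scores.append(sum(parts) / len(parts))
    return scores

def lag_by_country(panel, k=1):
    """Shift a {(country, year): value} panel by k years within each country."""
    return {(c, y): panel[(c, y - k)] for (c, y) in panel if (c, y - k) in panel}

# Hypothetical indicators for two country-years.
rows = [
    {"discharge_period_years": 6, "fresh_start_available": 0},
    {"discharge_period_years": 3, "fresh_start_available": 1},
]
print(composite_index(rows, {"discharge_period_years": False,
                             "fresh_start_available": True}))  # prints [0.0, 1.0]
```

Country-years without an observation for the previous year are simply dropped from the lagged series rather than imputed.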
Project description: Working with a large temporal dataset spanning several decades is often challenging, especially when the record is heterogeneous and incomplete. Statistical laws could potentially help overcome these problems. Here we apply Benford's Law (also called the "First-Digit Law") to the distances traveled by tropical cyclones since 1842. The tropical cyclone record has been strongly affected by improvements in detection capabilities over the past decades. We found that, while the first-digit distribution of the entire record follows the prediction of Benford's Law, specific changes such as satellite detection have had serious impacts on the dataset. We use the least-squares misfit as a proxy for temporal variations, which allows us to assess data quality and homogeneity both over the entire record and over specific periods. Such information is crucial when running climate models, and Benford's Law could potentially be used to overcome and correct for data heterogeneity and/or to select the most appropriate part of the record for detailed studies.
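Benford's Law predicts that the leading digit $d$ occurs with probability $P(d) = \log_{10}(1 + 1/d)$ for $d = 1, \dots, 9$. The sketch below computes observed first-digit frequencies and a least-squares misfit against that prediction. It assumes the misfit is the plain sum of squared frequency differences; the study may normalise or weight it differently.

```python
import math
from collections import Counter
from decimal import Decimal

# Benford's Law: expected share of numbers whose leading digit is d.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Leading non-zero digit of a non-zero number, read from its decimal form."""
    # Decimal strips leading zeros, so digits[0] is the first significant digit.
    return Decimal(str(abs(x))).as_tuple().digits[0]

def first_digit_frequencies(values):
    """Observed share of each leading digit 1..9 among the positive values."""
    digits = [first_digit(v) for v in values if v > 0]
    if not digits:
        raise ValueError("need at least one positive value")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

def least_squares_misfit(values):
    """Sum of squared differences between observed and Benford frequencies.

    Assumption: unweighted, unnormalised sum; not necessarily the study's exact measure.
    """
    observed = first_digit_frequencies(values)
    return sum((observed[d] - BENFORD[d]) ** 2 for d in range(1, 10))
```

To follow temporal variation, the misfit can be computed separately for consecutive periods of the record (for example, distances from before and after a change in detection practice) and compared.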
Project description: We present EU-Forest, a dataset that integrates the publicly available information on European tree species distribution and extends it by almost one order of magnitude. The core of our dataset (~96% of the occurrence records) comes from a large, unpublished database that harmonises forest plot surveys from National Forest Inventories on an INSPIRE-compliant 1 km × 1 km grid. These new data could benefit several disciplines, including forestry, biodiversity conservation, palaeoecology, plant ecology, the bioeconomy, and pest management.
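Placing plot records on a 1 km grid amounts to snapping each projected coordinate to the lower-left corner of its cell and collapsing repeated species records within a cell. The sketch assumes points are already reprojected to ETRS89-LAEA (EPSG:3035) in metres, and that cell codes follow the "1kmE…N…" naming of the European reference grid; it does not reproduce the harmonisation of the National Forest Inventory data. Species records are hypothetical.

```python
def inspire_1km_cell(easting_m, northing_m):
    """Code of the 1 km cell containing a point (lower-left corner, in km).

    Assumption: input is already in ETRS89-LAEA (EPSG:3035), metres;
    the "1kmE{e}N{n}" code format is assumed, not taken from the source.
    """
    return f"1kmE{int(easting_m // 1000)}N{int(northing_m // 1000)}"

def grid_occurrences(records):
    """Collapse (easting, northing, species) plot records to unique cell presences."""
    return sorted({(inspire_1km_cell(e, n), species) for e, n, species in records})

# Hypothetical plot records: two plots of the same species fall in one cell.
records = [
    (4321987.5, 3210004.2, "Fagus sylvatica"),
    (4321001.0, 3210999.0, "Fagus sylvatica"),
    (4322000.0, 3210000.0, "Picea abies"),
]
print(grid_occurrences(records))
```

Collapsing to presences per cell is what makes plot data from different inventories comparable, at the cost of discarding within-cell abundance.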
Project description: Glycan arrays remain the primary resource for determining the glycan-binding specificity of proteins. The volume and diversity of glycan-array data are increasing, but no common method or resource exists to analyze, integrate, and use the available data. To meet this need, we developed CarboGrove, a resource of analyzed glycan-array data. Because our approach can process and interpret data from any type of glycan array, we populated the database with results from 35 types of glycan arrays, 13 glycan families, 5 experimental methods, and 19 laboratories or companies. In meta-analyses of glycan-binding proteins, we observed glycan-binding specificities that were not uncovered from single sources. In addition, we confirmed that selections of glycan-binding proteins can be efficiently optimized for experiments that discriminate between closely related motifs. Through descriptive reports and a programmatically accessible Application Programming Interface (API), CarboGrove provides unprecedented access to the wealth of glycan-array data being produced, along with powerful capabilities for both experimentalists and bioinformaticians.
Project description: A number of genetic diseases result from missense mutations that alter protein structure. These mutations can lead to severe protein destabilization and misfolding. The unfolding mutation screen (UMS) is a computational method that calculates unfolding propensities for every possible missense mutation in a protein structure. Validation of UMS showed good agreement with experimental and phenotypic data. Fifteen protein structures (a combination of homology models and crystal structures) were analyzed using UMS. The standard and clustered heat maps, and the patterned protein structures produced by the analysis, were stored in a UMS library. The library currently comprises 15 protein structures from 14 inherited eye diseases, including retinal degenerations, glaucoma, and cataracts, and contains data for 181,110 mutations. The UMS protein library introduces 13 new human models of eye-disease-related proteins and is the first collection of consistently calculated unfolding propensities, which could be used as a tool for rapid analysis of novel mutations in clinical practice, next-generation sequencing, and studies of genotype-to-phenotype relationships in inherited eye disease.
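"Every possible missense mutation" means substituting each residue with each of the other 19 standard amino acids. The sketch below only enumerates that mutation space for a sequence, using standard one-letter codes and the conventional "wild-type, position, mutant" notation; the UMS unfolding-propensity calculation itself is not described in the source and is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (one-letter codes)

def all_missense_mutations(sequence):
    """List every single-residue substitution as e.g. 'M1A' (1-based positions)."""
    mutations = []
    for pos, wild_type in enumerate(sequence, start=1):
        if wild_type not in AMINO_ACIDS:
            raise ValueError(f"non-standard residue {wild_type!r} at position {pos}")
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:  # a substitution must change the residue
                mutations.append(f"{wild_type}{pos}{mutant}")
    return mutations

mutations = all_missense_mutations("MKV")
print(len(mutations))  # prints 57 (19 substitutions x 3 positions)
```

Each entry in such a list would correspond to one cell of a per-protein heat map (position by substituted residue).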
Project description: The dataset covers discretionary fiscal actions in 11 EU New Member States (NMS), a so-far under-researched group, from 2004 to 2019. It extends the narrative dataset of fiscal actions in Advanced Economies created by [1]. Information on actions and their estimated fiscal effects was collected from the annually published Convergence or Stability Programmes. Individual actions were classified according to the relevant category of government expenditure and revenue (in line with the classification proposed by [1]) and identified as either exogenous or endogenous (based on the distinction introduced by [3]). The raw data were then aggregated into fiscal plans by country and announcement year. Finally, two spreadsheets were created. The baseline spreadsheet contains only fiscal consolidation episodes, following the method of [1], while the alternative spreadsheet includes both consolidation and expansion fiscal plans, extending the methodological approach of [1]. Moreover, in the alternative spreadsheet, the time span of fiscal effects is extended from 5 to 8 years (the longest available) and dividends are added. Dividends are a category of public revenue not included in [1], but they are non-negligible in the NMS because of the post-socialist legacy.
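The aggregation step can be pictured as summing the yearly effects of individual actions into one plan per country and announcement year, kept separate by category and truncated to a horizon of 5 years (baseline) or 8 years (alternative). The sketch assumes a hypothetical record layout (keys `country`, `announced`, `category`, `exogenous`, `effects`, with effects as yearly values in % of GDP starting in the announcement year) and, as a further assumption, keeps only exogenous actions by default.

```python
from collections import defaultdict

def build_fiscal_plans(actions, horizon=5, exogenous_only=True):
    """Aggregate actions into plans keyed by (country, announcement year).

    Returns {(country, year): {category: [effect in year 0, ..., year horizon-1]}}.
    Assumptions: hypothetical field names; only exogenous actions kept by default.
    """
    plans = defaultdict(lambda: defaultdict(lambda: [0.0] * horizon))
    for action in actions:
        if exogenous_only and not action["exogenous"]:
            continue
        totals = plans[(action["country"], action["announced"])][action["category"]]
        # Effects beyond the horizon are dropped; missing later years stay 0.0.
        for i, effect in enumerate(action["effects"][:horizon]):
            totals[i] += effect
    return {key: dict(categories) for key, categories in plans.items()}

# Hypothetical actions announced in one year.
actions = [
    {"country": "PL", "announced": 2010, "category": "revenue", "exogenous": True, "effects": [0.2, 0.1]},
    {"country": "PL", "announced": 2010, "category": "revenue", "exogenous": True, "effects": [0.3]},
    {"country": "PL", "announced": 2010, "category": "spending", "exogenous": False, "effects": [0.5]},
]
print(build_fiscal_plans(actions, horizon=3))
```

Switching between the baseline and alternative layouts then reduces to changing `horizon` (and adding a dividends category), though the real spreadsheets may encode plans differently.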
Project description: The zebrafish (Danio rerio) is an important and widely used vertebrate model organism for studying human diseases, including disorders caused by dysfunctional mitochondria. Mitochondria play an essential role in both energy metabolism and apoptosis, processes mediated through the mitochondrial phospholipid cardiolipin (CL). To examine the cardiolipin profile in the zebrafish model, we developed a CL analysis platform using liquid chromatography-mass spectrometry (LC-MS). We also tested whether a chlorella diet would alter the CL profile in larval fish and in various organs of adult fish. The results showed that the chlorella diet increased the chain length of CL in larval fish. In adult zebrafish, the distribution patterns of CL species were similar between brain and eye tissues, and between heart and muscle. Interestingly, monolyso-cardiolipin (MLCL) was not detected in the brain and eyes but was found in the other examined tissues, indicating a different remodeling mechanism for maintaining CL integrity. When adult zebrafish were fed chlorella for four weeks, the CL distribution showed an increase in species with saturated acyl chains in the brain and eyes, but a decrease in the other organs. Moreover, the chlorella diet decreased the MLCL percentage in all organs except the brain and eyes, which contain no detectable MLCL. CL analysis in zebrafish provides an important tool for studying the mechanisms of mitochondrial diseases and may also be useful for testing medical regimens targeting Barth syndrome.
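Summaries such as "chain length of CL" and "MLCL percentage" can be derived from per-species LC-MS peak areas. The sketch assumes species are labelled in sum-composition notation ("CL 72:8" = 72 acyl carbons, 8 double bonds), that chain length is the abundance-weighted mean of total acyl carbons, and that MLCL percentage is MLCL / (CL + MLCL); these definitions are assumptions, and the peak areas are hypothetical.

```python
import re

# Assumed label format: class, total acyl carbons, total double bonds.
SPECIES = re.compile(r"^(CL|MLCL) (\d+):(\d+)$")

def parse_species(name):
    """Parse e.g. 'CL 72:8' into ('CL', 72, 8)."""
    match = SPECIES.match(name)
    if match is None:
        raise ValueError(f"unrecognised species: {name!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))

def summarize_profile(abundances):
    """Summarise a CL profile from {species name: peak area}.

    Assumptions: abundance-weighted means over CL species only, and
    MLCL percentage = MLCL / (CL + MLCL) * 100.
    """
    cl_total = mlcl_total = carbons_sum = double_bonds_sum = 0.0
    for name, area in abundances.items():
        lipid_class, carbons, double_bonds = parse_species(name)
        if lipid_class == "CL":
            cl_total += area
            carbons_sum += carbons * area
            double_bonds_sum += double_bonds * area
        else:
            mlcl_total += area
    total = cl_total + mlcl_total
    return {
        "mean_cl_carbons": carbons_sum / cl_total if cl_total else float("nan"),
        "mean_cl_double_bonds": double_bonds_sum / cl_total if cl_total else float("nan"),
        "mlcl_percent": 100 * mlcl_total / total if total else float("nan"),
    }

# Hypothetical peak areas for one tissue sample.
print(summarize_profile({"CL 72:8": 60, "CL 70:6": 30, "MLCL 52:4": 10}))
```

A lower mean number of double bonds corresponds to a shift toward more saturated acyl chains, the kind of diet effect described above.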