Project description: News analysis is a popular task in Natural Language Processing (NLP). In particular, the problem of clickbait in news has gained attention in recent years [1, 2]. However, most of this work has focused on English news, for which rich, representative resources already exist. For other languages, such as Indonesian, resources for clickbait tasks are still lacking. Therefore, we introduce the CLICK-ID dataset of Indonesian news headlines extracted from 12 Indonesian online news publishers. It comprises 15,000 annotated headlines with clickbait and non-clickbait labels. Using the CLICK-ID dataset, we then developed an Indonesian clickbait classification model that achieves favourable performance. We believe that this corpus will be useful for replicable experiments in clickbait detection and for other experiments in NLP.
Project description: Recognizing emotions is vital in communication, because emotions convey additional meaning. Nowadays, people can express their emotions on many platforms; one of them is the product review. Product reviews on online platforms are an important factor in customers' buying decisions. Hence, it is essential to recognize emotions in product reviews. Emotion recognition from product reviews can be done automatically using machine learning or deep learning algorithms. A dataset is the fuel for training such a recognizer. However, only limited datasets exist for recognizing emotions in product reviews, particularly in local languages. This research contributes a dataset of 5400 product reviews in Indonesian. It was carefully curated from 29 product categories, annotated with five emotions, and verified by an expert in clinical psychology. The dataset supports building automatic emotion classification for product reviews.
Project description: We present an ocean-basin-scale dataset that includes tail fluke photographic identification (photo-ID) and encounter data for most living individual humpback whales (Megaptera novaeangliae) in the North Pacific Ocean. The dataset was built through a broad collaboration combining 39 separate curated photo-ID catalogs, supplemented with community science data. Data from throughout the North Pacific were aggregated into 13 regions, including six breeding regions, six feeding regions, and one migratory corridor. All images were compared, with minimal pre-processing, using a recently developed machine-learning image recognition algorithm; this system is capable of rapidly detecting matches between individuals with an estimated 97-99% accuracy. For the 2001-2021 study period, a total of 27,956 unique individuals were documented in 157,350 encounters. Each individual was encountered, on average, in 5.6 sampling periods (i.e., breeding and feeding seasons), with an annual average of 87% of whales encountered in more than one season. The combined dataset and image recognition tool represent a living and accessible resource for collaborative, basin-wide studies of a keystone marine mammal in a time of rapid ecological change.
Project description: BACKGROUND: Annotation of cell identity is an essential process in neuroscience that allows comparison of cells, including comparison of neural activity, across different animals. In Caenorhabditis elegans, although unique identities have been assigned to all neurons, the number of annotatable neurons in an intact animal has been limited by the lack of quantitative information on the location and identity of neurons. RESULTS: Here, we present a dataset that facilitates the annotation of neuronal identities, and demonstrate its application in a comprehensive analysis of whole-brain imaging. We systematically identified neurons in the head region of 311 adult worms using 35 cell-specific promoters and created a dataset of the expression patterns and the positions of the neurons. We found large positional variations that illustrate the difficulty of the annotation task. We investigated multiple combinations of cell-specific promoters driving distinct fluorescence and generated optimal strains for the annotation of most head neurons in an animal. We also developed an automatic annotation method with human interaction functionality that facilitates the annotations needed for whole-brain imaging. CONCLUSION: Our neuron ID dataset and optimal fluorescent strains enable the annotation of most neurons in the head region of adult C. elegans, both in a fully automated fashion and in a semi-automated fashion that includes human interaction. Our method can potentially be applied to model species other than C. elegans, where the number and variety of available cell-type-specific promoters will be an important consideration.
Project description: The dataset contains 1339 cone penetration tests (CPT, CPTu, SCPT, SCPTu) executed within Austria and Germany by the company Premstaller Geotechnik ZT GmbH. As a first processing step, core drillings located within a maximum distance of approximately 50 m of the in situ tests were assigned to these cone penetration tests, which allows the in situ measurements to be interpreted on the basis of the grain size distribution of the drilled soil. In a second step, the software Geologismiki was used to calculate various normalized measures, which can, for example, be used as input parameters for soil behaviour type charts. The present data can be used by researchers, for example, to develop new approaches to soil classification based on cone penetration tests. Furthermore, it provides a framework for combining in situ measurements (qc, fs, Rf, u2, Vs), normalized measures (i.e. Qt, Bq, U2) and soil classifications.
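As an illustration of the normalized measures named above, the following minimal sketch computes Rf, Qt, Bq and U2 from a single CPTu reading using the textbook definitions commonly found in the CPT literature. It is not the Geologismiki implementation, and the dataset's own formulas may differ in detail (for instance, whether Rf is referenced to qc or to the corrected resistance qt). The class name `CptReading`, the function name `normalized_measures`, the default cone area ratio of 0.8, and the example stresses are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CptReading:
    """One CPTu reading; all values in the same stress unit (here kPa)."""
    qc: float  # measured cone resistance
    fs: float  # sleeve friction
    u2: float  # pore pressure measured behind the cone


def normalized_measures(reading, sigma_v0, sigma_v0_eff, u0, area_ratio=0.8):
    """Textbook normalizations (assumed, not taken from the dataset).

    sigma_v0     : total vertical stress at the test depth
    sigma_v0_eff : effective vertical stress at the test depth
    u0           : in situ equilibrium pore pressure
    area_ratio   : cone net area ratio (hypothetical default)
    """
    # Cone resistance corrected for pore pressure acting behind the cone.
    qt = reading.qc + reading.u2 * (1.0 - area_ratio)
    q_net = qt - sigma_v0
    excess_u = reading.u2 - u0
    return {
        "Rf": 100.0 * reading.fs / reading.qc,  # friction ratio in percent
        "Qt": q_net / sigma_v0_eff,             # normalized cone resistance
        "Bq": excess_u / q_net,                 # pore pressure ratio
        "U2": excess_u / sigma_v0_eff,          # normalized excess pore pressure
    }


# Hypothetical reading, not a row from the dataset.
example = normalized_measures(
    CptReading(qc=5000.0, fs=50.0, u2=200.0),
    sigma_v0=180.0,
    sigma_v0_eff=80.0,
    u0=100.0,
)
print(example)
```

The resulting values are the kind of inputs that soil behaviour type charts typically take; the stresses sigma_v0, sigma_v0_eff and u0 must be estimated separately from depth, unit weight and groundwater level.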
Project description: Protein abnormalities are the major cause of neurodegenerative diseases such as spinocerebellar ataxia (SCA). Protein misfolding and impaired degradation lead to the build-up of protein aggregates inside the cell, which may in turn cause cellular degeneration. Reducing levels of either the soluble misfolded form of the protein or its precipitated aggregate, even marginally, could significantly improve cellular health. Despite numerous existing strategies to target these protein aggregates, there is considerable room to improve their specificity and efficiency. In this study, we demonstrated the enhanced intracellular degradation of both monomers and aggregates of mutant ataxin1 (Atxn1 82Q) by engineering an E3 ubiquitin ligase enzyme, promyelocytic leukemia protein (PML). Specifically, we showed enhanced degradation of both soluble and aggregated Atxn1 82Q in mammalian cells by targeting this protein using PML fused to single chain variable fragments (scFvs) specific for monomers and aggregates of the target protein. The ability to solubilize Atxn1 82Q aggregates was due to the PML-mediated enhanced SUMOylation of the target protein. This ability to reduce the intracellular levels of both misfolded forms of Atxn1 82Q may not only be useful for treating SCA, but may also be applicable to the treatment of other PolyQ disorders.