Machine Learning Classifiers for Twitter Surveillance of Vaping: Comparative Machine Learning Study.
ABSTRACT: BACKGROUND:Twitter presents a valuable and relevant social media platform to study the prevalence of information and sentiment on vaping that may be useful for public health surveillance. Machine learning classifiers that identify vaping-relevant tweets and characterize sentiments in them can underpin a Twitter-based vaping surveillance system. Compared with traditional machine learning classifiers that are reliant on annotations that are expensive to obtain, deep learning classifiers offer the advantage of requiring fewer annotated tweets by leveraging the large numbers of readily available unannotated tweets. OBJECTIVE:This study aims to derive and evaluate traditional and deep learning classifiers that can identify tweets relevant to vaping, tweets of a commercial nature, and tweets with provape sentiments. METHODS:We continuously collected tweets that matched vaping-related keywords over 2 months from August 2018 to October 2018. From this data set of tweets, a set of 4000 tweets was selected, and each tweet was manually annotated for relevance (vape relevant or not), commercial nature (commercial or not), and sentiment (provape or not). Using the annotated data, we derived traditional classifiers that included logistic regression, random forest, linear support vector machine, and multinomial naive Bayes. In addition, using the annotated data set and a larger unannotated data set of tweets, we derived deep learning classifiers that included a convolutional neural network (CNN), long short-term memory (LSTM) network, LSTM-CNN network, and bidirectional LSTM (BiLSTM) network. The unannotated tweet data were used to derive word vectors that deep learning classifiers can leverage to improve performance. RESULTS:LSTM-CNN performed the best with the highest area under the receiver operating characteristic curve (AUC) of 0.96 (95% CI 0.93-0.98) for relevance, all deep learning classifiers including LSTM-CNN performed better than the traditional classifiers with an AUC of 0.99 (95% CI 0.98-0.99) for distinguishing commercial from noncommercial tweets, and BiLSTM performed the best with an AUC of 0.83 (95% CI 0.78-0.89) for provape sentiment. Overall, LSTM-CNN performed the best across all 3 classification tasks. CONCLUSIONS:We derived and evaluated traditional machine learning and deep learning classifiers to identify vaping-related relevant, commercial, and provape tweets. Overall, deep learning classifiers such as LSTM-CNN had superior performance and had the added advantage of requiring no preprocessing. The performance of these classifiers supports the development of a vaping surveillance system.
Project description:As one of the serious public health issues, vaccination refusal has been attracting more and more attention, especially for newly approved human papillomavirus (HPV) vaccines. Understanding public opinion towards HPV vaccines, especially concerns on social media, is of significant importance for HPV vaccination promotion.In this study, we leveraged a hierarchical machine learning based sentiment analysis system to extract public opinions towards HPV vaccines from Twitter. English tweets containing HPV vaccines-related keywords were collected from November 2, 2015 to March 28, 2016. Manual annotation was done to evaluate the performance of the system on the unannotated tweets corpus. Followed time series analysis was applied to this corpus to track the trends of machine-deduced sentiments and their associations with different days of the week.The evaluation of the unannotated tweets corpus showed that the micro-averaging F scores have reached 0.786. The learning system deduced the sentiment labels for 184,214 tweets in the collected unannotated tweets corpus. Time series analysis identified a coincidence between mainstream outcome and Twitter contents. A weak trend was found for "Negative" tweets that decreased firstly and began to increase later; an opposite trend was identified for "Positive" tweets. Tweets that contain the worries on efficacy for HPV vaccines showed a relative significant decreasing trend. Strong associations were found between some sentiments ("Positive", "Negative", "Negative-Safety" and "Negative-Others") with different days of the week.Our efforts on sentiment analysis for newly approved HPV vaccines provide us an automatic and instant way to extract public opinion and understand the concerns on Twitter. Our approaches can provide a feedback to public health professionals to monitor online public response, examine the effectiveness of their HPV vaccination promotion strategies and adjust their promotion plans.
Project description:Riboswitches are cis-regulatory genetic elements that use an aptamer to control gene expression. Specificity to cognate ligand and diversity of such ligands have expanded the functional repertoire of riboswitches to mediate mounting apt responses to sudden metabolic demands and signal changes in environmental conditions. Given their critical role in microbial life, riboswitch characterisation remains a challenging computational problem. Here we have addressed the issue with advanced deep learning frameworks, namely convolutional neural networks (CNN), and bidirectional recurrent neural networks (RNN) with Long Short-Term Memory (LSTM). Using a comprehensive dataset of 32 ligand classes and a stratified train-validate-test approach, we demonstrated the accurate performance of both the deep learning models (CNN and RNN) relative to conventional hyperparameter-optimized machine learning classifiers on all key performance metrics, including the ROC curve analysis. In particular, the bidirectional LSTM RNN emerged as the best-performing learning method for identifying the ligand-specificity of riboswitches with an accuracy >0.99 and macro-averaged F-score of 0.96. An additional attraction is that the deep learning models do not require prior feature engineering. A dynamic update functionality is built into the models to factor for the constant discovery of new riboswitches, and extend the predictive modeling to new classes. Our work would enable the design of genetic circuits with custom-tuned riboswitch aptamers that would effect precise translational control in synthetic biology. The associated software is available as an open-source Python package and standalone resource for use in genome annotation, synthetic biology, and biotechnology workflows.
Project description:Yield prediction is of great significance for yield mapping, crop market planning, crop insurance, and harvest management. Remote sensing is becoming increasingly important in crop yield prediction. Based on remote sensing data, great progress has been made in this field by using machine learning, especially the Deep Learning (DL) method, including Convolutional Neural Network (CNN) or Long Short-Term Memory (LSTM). Recent experiments in this area suggested that CNN can explore more spatial features and LSTM has the ability to reveal phenological characteristics, which both play an important role in crop yield prediction. However, very few experiments combining these two models for crop yield prediction have been reported. In this paper, we propose a deep CNN-LSTM model for both end-of-season and in-season soybean yield prediction in CONUS at the county-level. The model was trained by crop growth variables and environment variables, which include weather data, MODIS Land Surface Temperature (LST) data, and MODIS Surface Reflectance (SR) data; historical soybean yield data were employed as labels. Based on the Google Earth Engine (GEE), all these training data were combined and transformed into histogram-based tensors for deep learning. The results of the experiment indicate that the prediction performance of the proposed CNN-LSTM model can outperform the pure CNN or LSTM model in both end-of-season and in-season. The proposed method shows great potential in improving the accuracy of yield prediction for other crops like corn, wheat, and potatoes at fine scales in the future.
Project description:Early defibrillation by an automated external defibrillator (AED) is key for the survival of out-of-hospital cardiac arrest (OHCA) patients. ECG feature extraction and machine learning have been successfully used to detect ventricular fibrillation (VF) in AED shock decision algorithms. Recently, deep learning architectures based on 1D Convolutional Neural Networks (CNN) have been proposed for this task. This study introduces a deep learning architecture based on 1D-CNN layers and a Long Short-Term Memory (LSTM) network for the detection of VF. Two datasets were used, one from public repositories of Holter recordings captured at the onset of the arrhythmia, and a second from OHCA patients obtained minutes after the onset of the arrest. Data was partitioned patient-wise into training (80%) to design the classifiers, and test (20%) to report the results. The proposed architecture was compared to 1D-CNN only deep learners, and to a classical approach based on VF-detection features and a support vector machine (SVM) classifier. The algorithms were evaluated in terms of balanced accuracy (BAC), the unweighted mean of the sensitivity (Se) and specificity (Sp). The BAC, Se, and Sp of the architecture for 4-s ECG segments was 99.3%, 99.7%, and 98.9% for the public data, and 98.0%, 99.2%, and 96.7% for OHCA data. The proposed architecture outperformed all other classifiers by at least 0.3-points in BAC in the public data, and by 2.2-points in the OHCA data. The architecture met the 95% Sp and 90% Se requirements of the American Heart Association in both datasets for segment lengths as short as 3-s. This is, to the best of our knowledge, the most accurate VF detection algorithm to date, especially on OHCA data, and it would enable an accurate shock no shock diagnosis in a very short time.
Project description:Forecasting vessel flows is important to the development of intelligent transportation systems in the maritime field, as real-time and accurate traffic information has favorable potential in helping a maritime authority to alleviate congestion, mitigate emission of GHG (greenhouse gases) and enhance public safety, as well as assisting individual vessel users to plan better routes and reduce additional costs due to delays. In this paper, we propose three deep learning-based solutions to forecast the inflow and outflow of vessels within a given region, including a convolutional neural network (CNN), a long short-term memory (LSTM) network, and the integration of a bidirectional LSTM network with a CNN (BDLSTM-CNN). To apply those solutions, we first divide the given maritime region into M × N grids, then we forecast the inflow and outflow for all the grids. Experimental results based on the real AIS (Automatic Identification System) data of marine vessels in Singapore demonstrate that the three deep learning-based solutions significantly outperform the conventional method in terms of mean absolute error and root mean square error, with the performance of the BDLSTM-CNN-based hybrid solution being the best.
Project description:With the development of society, more and more attention has been paid to cultural festivals. In addition to the government's emphasis, the increasing consumption in festivals also proves that cultural festivals are playing increasingly important role in public life. Therefore, it is very vital to grasp the public festival sentiment. Text sentiment analysis is an important research content in the field of machine learning in recent years. However, at present, there are few studies on festival sentiment, and sentiment classifiers are also limited by domain or language. The Chinese text classifier is much less than the English version. This paper takes Sina Weibo as the text information carrier and Chinese festival microblogs as the research object. CHN-EDA is used to do Chinese text data augmentation, and then the traditional classifiers CNN, DNN, and naïve Bayes are compared to obtain a higher accuracy. The matching optimizer is selected, and relevant parameters are determined through experiments. This paper solves the problem of unbalanced Chinese sentiment data and establishes a more targeted festival text classifier. This festival sentiment classifier can collect public festival emotion effectively, which is beneficial for cultural inheritance and business decisions adjustment.
Project description:<h4>INTRODUCTION</h4> The public most frequently associates tobacco use solely with pulmonary health risks, despite heart disease being the leading cause of death in smokers. The health perceptions of e-cigarettes, especially cardiovascular health, have not been well studied. We aimed to evaluate the prevalence and health perceptions of tweets related to cardiovascular, pulmonary, and brain health – three organ systems for which tobacco use is a major disease risk factor. <h4>METHODS</h4> We examined the cardiovascular, pulmonary, and brain health perceptions of vaping and JUUL on Twitter, followed by a content analysis of tweets pertaining to the cardiovascular risks. A Twitter firehose API scraped about 6.2 million publicly available tweets from 2015–2019 that contained vaping-related terms, and a separate dataset of about 1.9 million tweets that contained the term JUUL. A quantitative content analysis (n=2145) of tweets was subsequently conducted to assess the health perceptions of vaping and JUUL. Two trained coders independently assessed the posts and Twitter profiles to determine age (<18 or ?18 years), sex, race, sentiment towards JUUL, and vaping-related topics. <h4>RESULTS</h4> The majority of tweets containing vaping or JUUL-related terms did not also contain cardiovascular, pulmonary, or brain health terms (97.99% and 96.67%, respectively). Multiple linear regression analysis showed that youth (<18 years), females, non-White individuals, mention of a flavor, and mention of cardiovascular health harm words were associated with more positive sentiments towards JUUL. Pearson’s chi-squared analyses indicated that youth were more likely to mention a JUUL flavor. Females and youth were more likely to reference cardiovascular terms with humor. <h4>CONCLUSIONS</h4> The cardiovascular health risks of vaping are not fully recognized by the public. Vulnerable populations such as youth and females reference JUUL with cardiovascular-related words that downplay the severity of tobacco as a major risk factor for cardiovascular disease.
Project description:The ability to identify and accurately predict abnormal behavior is important for health monitoring systems in smart environments. Specifically, for elderly persons wishing to maintain their independence and comfort in their living spaces, abnormal behaviors observed during activities of daily living are a good indicator that the person is more likely to have health and behavioral problems that need intervention and assistance. In this paper, we investigate a variety of deep learning models such as Long Short Term Memory (LSTM), Convolutional Neural Network (CNN), CNN-LSTM and Autoencoder-CNN-LSTM for identifying and accurately predicting the abnormal behaviors of elderly people. The temporal information and spatial sequences collected over time are used to generate models, which can be fitted to the training data and the fitted model can be used to make a prediction. We present an experimental evaluation of these models performance in identifying and predicting elderly persons abnormal behaviors in smart homes, via extensive testing on two public data sets, taking into account different models architectures and tuning the hyperparameters for each model. The performance evaluation is focused on accuracy measure.
Project description:Falls in the elderly is a major public health concern due to its high prevalence, serious consequences and heavy burden on the society. Many falls in older people happen within a very short time, which makes it difficult to predict a fall before it occurs and then to provide protection for the person who is falling. The primary objective of this study was to develop deep neural networks for predicting a fall during its initiation and descending but before the body impacts to the ground so that a safety mechanism can be enabled to prevent fall-related injuries. We divided the falling process into three stages (non-fall, pre-impact fall and fall) and developed deep neutral networks to perform three-class classification. Three deep learning models, convolutional neural network (CNN), long short term memory (LSTM), and a novel hybrid model integrating both convolution and long short term memory (ConvLSTM) were proposed and evaluated on a large public dataset of various falls and activities of daily living (ADL) acquired with wearable inertial sensors (accelerometer and gyroscope). Fivefold cross validation results showed that the hybrid ConvLSTM model had mean sensitivities of 93.15, 93.78, and 96.00% for non-fall, pre-impact fall and fall, respectively, which were higher than both LSTM (except the fall class) and CNN models. ConvLSTM model also showed higher specificities for all three classes (96.59, 94.49, and 98.69%) than LSTM and CNN models. In addition, latency test on a microcontroller unit showed that ConvLSTM model had a short latency of 1.06 ms, which was much lower than LSTM model (3.15 ms) and comparable with CNN model (0.77 ms). High prediction accuracy (especially for pre-impact fall) and low latency on the microboard indicated that the proposed hybrid ConvLSTM model outperformed both LSTM and CNN models. These findings suggest that our proposed novel hybrid ConvLSTM model has great potential to be embedded into wearable inertial sensor-based systems to predict pre-impact fall in real-time so that protective devices could be triggered in time to prevent fall-related injuries for older people.
Project description:BACKGROUND:Increases in electronic nicotine delivery system (ENDS) use among high school students from 2017 to 2019 appear to be associated with the increasing popularity of the ENDS device JUUL. OBJECTIVE:We employed a content analysis approach in conjunction with natural language processing methods using Twitter data to understand salient themes regarding JUUL use on Twitter, sentiment towards JUUL, and underage JUUL use. METHODS:Between July 2018 and August 2019, 11,556 unique tweets containing a JUUL-related keyword were collected. We manually annotated 4000 tweets for JUUL-related themes of use and sentiment. We used 3 machine learning algorithms to classify positive and negative JUUL sentiments as well as underage JUUL mentions. RESULTS:Of the annotated tweets, 78.80% (3152/4000) contained a specific mention of JUUL. Only 1.43% (45/3152) of tweets mentioned using JUUL as a method of smoking cessation, and only 6.85% (216/3152) of tweets mentioned the potential health effects of JUUL use. Of the machine learning methods used, the random forest classifier was the best performing algorithm among all 3 classification tasks (ie, positive sentiment, negative sentiment, and underage JUUL mentions). CONCLUSIONS:Our findings suggest that a vast majority of Twitter users are not using JUUL to aid in smoking cessation nor do they mention the potential health benefits or detriments of JUUL use. Using machine learning algorithms to identify tweets containing underage JUUL mentions can support the timely surveillance of JUUL habits and opinions, further assisting youth-targeted public health intervention strategies.