Using Twitter to Examine Web-Based Patient Experience Sentiments in the United States: Longitudinal Study.
ABSTRACT: BACKGROUND:There are documented differences in access to health care across the United States. Previous research indicates that Web-based data regarding patient experiences and opinions of health care are available from Twitter. Sentiment analyses of Twitter data can be used to examine differences in patient views of health care across the United States. OBJECTIVE:The objective of our study was to provide a characterization of patient experience sentiments across the United States on Twitter over a 4-year period. METHODS:Using data from Twitter, we developed a set of 4 software components to automatically label and examine a database of tweets discussing patient experience. The set includes a classifier to determine patient experience tweets, a geolocation inference engine for social data, a modified sentiment classifier, and an engine to determine if the tweet is from a metropolitan or nonmetropolitan area in the United States. Using the information retrieved, we conducted spatial and temporal examinations of tweet sentiments at national and regional levels. We examined trends by time of day and day of the week when tweets were posted. Statistical analyses were conducted to determine if any differences existed between the discussions of patient experience in metropolitan and nonmetropolitan areas. RESULTS:We collected 27.3 million tweets between February 1, 2013 and February 28, 2017, using a set of patient experience-related keywords; the classifier labeled 2,759,257 of these as patient experience tweets. We identified the approximate location of 31.76% (876,384/2,759,257) of patient experience tweets using a geolocation classifier to conduct spatial analyses. At the national level, we observed 27.83% (243,903/876,384) positive patient experience tweets, 36.22% (317,445/876,384) neutral patient experience tweets, and 35.95% (315,036/876,384) negative patient experience tweets. 
There were slight differences in tweet sentiments across all regions of the United States during the 4-year study period. We found the average sentiment polarity shifted toward less negative over the study period across all the regions of the United States. We observed the sentiment of tweets to have a lower negative fraction during daytime hours, whereas the sentiment of tweets posted between 8 pm and 10 am had a higher negative fraction. Nationally, sentiment scores for tweets in metropolitan areas were found to be more extremely negative and mildly positive compared with tweets in nonmetropolitan areas. This result is statistically significant (P<.001). Tweets with extremely negative sentiments had a medium effect size (d=0.34) at the national level. CONCLUSIONS:This study presents methodologies for a deeper understanding of Web-based discussion related to patient experience across space and time and demonstrates how Twitter can provide a unique and unsolicited perspective from users on the health care they receive in the United States.
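The effect size reported above (d=0.34) is presumably Cohen's d. The following is a minimal sketch of how such a value is commonly computed, assuming the pooled-standard-deviation form; the abstract does not state which variant was used, and the function name and sample scores are hypothetical illustrations, not study data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical sentiment scores for two groups of tweets (illustration only).
metro_scores = [-0.8, -0.6, -0.7, -0.9]
nonmetro_scores = [-0.5, -0.6, -0.4, -0.7]
print(round(cohens_d(metro_scores, nonmetro_scores), 2))
```

A negative value here means the first group's mean score is lower (more negative) than the second group's.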
Project description:BACKGROUND:The coronavirus disease (COVID-19) pandemic led to substantial public discussion. Understanding these discussions can help institutions, governments, and individuals navigate the pandemic. OBJECTIVE:The aim of this study is to analyze discussions on Twitter related to COVID-19 and to investigate the sentiments toward COVID-19. METHODS:This study applied machine learning methods in the field of artificial intelligence to analyze data collected from Twitter. Using tweets originating exclusively in the United States and written in English during the 1-month period from March 20 to April 19, 2020, the study examined COVID-19-related discussions. Social network and sentiment analyses were also conducted to determine the social network of dominant topics and whether the tweets expressed positive, neutral, or negative sentiments. Geographic analysis of the tweets was also conducted. RESULTS:There were a total of 14,180,603 likes, 863,411 replies, 3,087,812 retweets, and 641,381 mentions in tweets during the study timeframe. Out of 902,138 tweets analyzed, sentiment analysis classified 434,254 (48.2%) tweets as having a positive sentiment, 187,042 (20.7%) as neutral, and 280,842 (31.1%) as negative. The study identified 5 dominant themes among COVID-19-related tweets: health care environment, emotional support, business economy, social change, and psychological stress. Alaska, Wyoming, New Mexico, Pennsylvania, and Florida were the states expressing the most negative sentiment while Vermont, North Dakota, Utah, Colorado, Tennessee, and North Carolina conveyed the most positive sentiment. CONCLUSIONS:This study identified 5 prevalent themes of COVID-19 discussion with sentiments ranging from positive to negative. These themes and sentiments can clarify the public's response to COVID-19 and help officials navigate the pandemic.
Project description:BACKGROUND:Infodemiology is an emerging field of research that utilizes user-generated health-related content, such as that found in social media, to help improve public health. Twitter has become an important venue for studying emerging patterns in health issues such as substance use because it can reflect trends in real-time and display messages generated directly by users, giving a uniquely personal voice to analyses. Over the past year, several states in the United States have passed legislation to legalize adult recreational use of cannabis and the federal government in Canada has done the same. There are few studies that examine the sentiment and content of tweets about cannabis since the recent legislative changes regarding cannabis have occurred in North America. OBJECTIVE:To examine differences in the sentiment and content of cannabis-related tweets by state cannabis laws, and to examine differences in sentiment between the United States and Canada between 2017 and 2019. METHODS:In total, 1,200,127 cannabis-related tweets were collected from January 1, 2017, to June 17, 2019, using the Twitter application programming interface. Tweets then were grouped geographically based on cannabis legal status (legal for adult recreational use, legal for medical use, and no legal use) in the locations from which the tweets came. Sentiment scoring for the tweets was done with VADER (Valence Aware Dictionary and sEntiment Reasoner), and differences in sentiment for states with different cannabis laws were tested using Tukey adjusted two-sided pairwise comparisons. Topic analysis to determine the content of tweets was done using latent Dirichlet allocation in Python, using a Java implementation, LdaMallet, with Gensim wrapper. 
RESULTS:Significant differences were seen in tweet sentiment between US states with different cannabis laws (P=.001 for negative sentiment tweets in fully illegal compared to legal for adult recreational use states), as well as between the United States and Canada (P=.003 for positive sentiment and P=.001 for negative sentiment). In both cases, restrictive state policy environments (eg, those where cannabis use is fully illegal, or legal for medical use only) were associated with more negative tweet sentiment than less restrictive policy environments (eg, where cannabis is legal for adult recreational use). Six key topics were found in recent US tweet contents: fun and recreation (keywords, eg, love, life, high); daily life (today, start, live); transactions (buy, sell, money); places of use (room, car, house); medical use and cannabis industry (business, industry, company); and legalization (legalize, police, tax). The keywords representing content of tweets also differed between the United States and Canada. CONCLUSIONS:Knowledge about how cannabis is being discussed online, and geographic differences that exist in these conversations may help to inform public health planning and prevention efforts. Public health education about how to use cannabis in ways that promote safety and minimize harms may be especially important in places where cannabis is legal for adult recreational and medical use.
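VADER produces a compound score between -1 and 1 for each text. The sketch below shows one way such scores could be turned into positive, neutral, and negative shares for a group of tweets. It assumes the cutoffs of 0.05 and -0.05 that the VADER documentation suggests as a convention; the abstract does not say which cutoffs the authors applied, and the function names and sample scores are hypothetical.

```python
from collections import Counter

# Conventional VADER cutoffs; an assumption here, not taken from the study.
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label_from_compound(compound):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def sentiment_shares(compound_scores):
    """Return the fraction of scores falling under each label."""
    counts = Counter(label_from_compound(s) for s in compound_scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scores supplied")
    return {name: counts[name] / total
            for name in ("positive", "neutral", "negative")}


# Hypothetical compound scores for tweets from one policy environment.
print(sentiment_shares([0.62, -0.31, 0.0, 0.44, -0.7, 0.02]))
```

Shares computed this way for each legal-status group could then feed a pairwise comparison of the kind the abstract describes.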
Project description:BACKGROUND:With restrictions on movement and stay-at-home orders in place due to the COVID-19 pandemic, social media platforms such as Twitter have become an outlet for users to express their concerns, opinions, and feelings about the pandemic. Individuals, health agencies, and governments are using Twitter to communicate about COVID-19. OBJECTIVE:The aims of this study were to examine key themes and topics of English-language COVID-19-related tweets posted by individuals and to explore the trends and variations in how the COVID-19-related tweets, key topics, and associated sentiments changed over a period of time from before to after the disease was declared a pandemic. METHODS:Building on the emergent stream of studies examining COVID-19-related tweets in English, we performed a temporal assessment covering the time period from January 1 to May 9, 2020, and examined variations in tweet topics and sentiment scores to uncover key trends. Combining data from two publicly available COVID-19 tweet data sets with those obtained in our own search, we compiled a data set of 13.9 million English-language COVID-19-related tweets posted by individuals. We used guided latent Dirichlet allocation (LDA) to infer themes and topics underlying the tweets, and we used VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis to compute sentiment scores and examine weekly trends for 17 weeks. RESULTS:Topic modeling yielded 26 topics, which were grouped into 10 broader themes underlying the COVID-19-related tweets. Of the 13,937,906 examined tweets, 2,858,316 (20.51%) were about the impact of COVID-19 on the economy and markets, followed by spread and growth in cases (2,154,065, 15.45%), treatment and recovery (1,831,339, 13.14%), impact on the health care sector (1,588,499, 11.40%), and government response (1,559,591, 11.19%). 
Average compound sentiment scores were found to be negative throughout the examined time period for the topics of spread and growth of cases, symptoms, racism, source of the outbreak, and political impact of COVID-19. In contrast, we saw a reversal of sentiments from negative to positive for prevention, impact on the economy and markets, government response, impact on the health care industry, and treatment and recovery. CONCLUSIONS:Identification of dominant themes, topics, sentiments, and changing trends in tweets about the COVID-19 pandemic can help governments, health care agencies, and policy makers frame appropriate responses to prevent and control the spread of the pandemic.
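The weekly trend in average compound sentiment described above can be sketched as a simple grouping step: assign each tweet to a week counted from a start date, then average the scores per week. The function name, the record layout, and the sample values below are hypothetical, and the study's actual week boundaries are not given in the abstract.

```python
from collections import defaultdict
from datetime import date


def weekly_mean_scores(records, start):
    """Average compound scores per 7-day week counted from `start`.

    `records` is an iterable of (date, compound_score) pairs; week 0
    covers `start` and the six days after it.
    """
    buckets = defaultdict(list)
    for day, score in records:
        week_index = (day - start).days // 7
        buckets[week_index].append(score)
    return {week: sum(scores) / len(scores)
            for week, scores in sorted(buckets.items())}


# Hypothetical (date, compound score) pairs for a single topic.
sample = [
    (date(2020, 1, 1), -0.2),
    (date(2020, 1, 3), -0.4),
    (date(2020, 1, 8), 0.1),
]
print(weekly_mean_scores(sample, start=date(2020, 1, 1)))
```

Running this once per topic gives the per-topic weekly series in which a move from negative to positive averages would appear as a sign change.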
Project description:BACKGROUND:Twitter presents a valuable and relevant social media platform to study the prevalence of information and sentiment on vaping that may be useful for public health surveillance. Machine learning classifiers that identify vaping-relevant tweets and characterize sentiments in them can underpin a Twitter-based vaping surveillance system. Compared with traditional machine learning classifiers that are reliant on annotations that are expensive to obtain, deep learning classifiers offer the advantage of requiring fewer annotated tweets by leveraging the large numbers of readily available unannotated tweets. OBJECTIVE:This study aims to derive and evaluate traditional and deep learning classifiers that can identify tweets relevant to vaping, tweets of a commercial nature, and tweets with provape sentiments. METHODS:We continuously collected tweets that matched vaping-related keywords over 2 months from August 2018 to October 2018. From this data set of tweets, a set of 4000 tweets was selected, and each tweet was manually annotated for relevance (vape relevant or not), commercial nature (commercial or not), and sentiment (provape or not). Using the annotated data, we derived traditional classifiers that included logistic regression, random forest, linear support vector machine, and multinomial naive Bayes. In addition, using the annotated data set and a larger unannotated data set of tweets, we derived deep learning classifiers that included a convolutional neural network (CNN), long short-term memory (LSTM) network, LSTM-CNN network, and bidirectional LSTM (BiLSTM) network. The unannotated tweet data were used to derive word vectors that deep learning classifiers can leverage to improve performance. 
RESULTS:LSTM-CNN performed the best with the highest area under the receiver operating characteristic curve (AUC) of 0.96 (95% CI 0.93-0.98) for relevance, all deep learning classifiers including LSTM-CNN performed better than the traditional classifiers with an AUC of 0.99 (95% CI 0.98-0.99) for distinguishing commercial from noncommercial tweets, and BiLSTM performed the best with an AUC of 0.83 (95% CI 0.78-0.89) for provape sentiment. Overall, LSTM-CNN performed the best across all 3 classification tasks. CONCLUSIONS:We derived and evaluated traditional machine learning and deep learning classifiers to identify vaping-related relevant, commercial, and provape tweets. Overall, deep learning classifiers such as LSTM-CNN had superior performance and had the added advantage of requiring no preprocessing. The performance of these classifiers supports the development of a vaping surveillance system.
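The AUC values above summarize how well each classifier ranks relevant tweets above irrelevant ones. As a minimal sketch, the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The function name and the toy labels and scores are hypothetical; the study's own evaluation code and confidence interval method are not described in the abstract.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` holds 1 for positive and 0 for negative examples;
    `scores` holds the classifier's score for each example.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    # Count positive-negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical annotations (1 = vape relevant) and classifier scores.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise form is quadratic in the number of examples, which is acceptable for an annotated set of a few thousand tweets but not for very large collections.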
Project description:BACKGROUND:Discrimination in the health care system contributes to worse health outcomes among lesbian, gay, bisexual, transgender, and queer (LGBTQ) patients. OBJECTIVE:The aim of this study is to examine disparities in patient experience among LGBTQ persons using social media data. METHODS:We collected patient experience data from Twitter from February 2013 to February 2017 in the United States. We compared the sentiment of patient experience tweets between Twitter users who self-identified as LGBTQ and non-LGBTQ. The effect of state-level partisan identity on patient experience sentiment and differences between LGBTQ users and non-LGBTQ users were analyzed. RESULTS:We observed lower (more negative) patient experience sentiment among 13,689 LGBTQ users compared to 1,362,395 non-LGBTQ users. Increasing state-level liberal political identification was associated with higher patient experience sentiment among all users but had stronger effects for LGBTQ users. CONCLUSIONS:Our findings highlight that social media data can yield insights about patient experience for LGBTQ persons and suggest that a state-level sociopolitical environment influences patient experience for this group. Efforts are needed to reduce disparities in patient care for LGBTQ persons while taking into context the effect of the political climate on these inequities.
Project description:BACKGROUND:In the United States, racial disparities in birth outcomes persist and have been widening. Interpersonal and structural racism are leading explanations for the continuing racial disparities in birth outcomes, but research to confirm the role of racism and evaluate trends in the impact of racism on health outcomes has been hampered by the challenge of measuring racism. Most research on discrimination relies on self-reported experiences of discrimination, and few studies have examined racial attitudes and bias at the US national level. OBJECTIVE:This study aimed to investigate the associations between state-level Twitter-derived sentiments related to racial or ethnic minorities and birth outcomes. METHODS:We utilized Twitter's Streaming application programming interface to collect 26,027,740 tweets from June 2015 to December 2017, containing at least one race-related term. Sentiment analysis was performed using a support vector machine, a supervised machine learning model. We constructed overall indicators of sentiment toward minorities and sentiment toward race-specific groups. For each year, state-level Twitter-derived sentiment data were merged with birth data for that year. The study participants were women who had singleton births with no congenital abnormalities from 2015 to 2017 and for whom data were available on gestational age (n=9,988,030) or birth weight (n=9,985,402). The main outcomes were low birth weight (birth weight ≤2499 g) and preterm birth (gestational age <37 weeks). We estimated the incidence ratios controlling for individual-level maternal characteristics (sociodemographics, prenatal care, and health behaviors) and state-level demographics, using log binomial regression models. RESULTS:The accuracy for identifying negative sentiments on comparing the machine learning model to manually labeled tweets was 91%. 
Mothers living in states in the highest tertile for negative sentiment tweets referencing racial or ethnic minorities had greater incidences of low birth weight (8% greater, 95% CI 4%-13%) and preterm birth (8% greater, 95% CI 0%-14%) compared with mothers living in states in the lowest tertile. More negative tweets referencing minorities were associated with adverse birth outcomes in the total population, including non-Hispanic white people and racial or ethnic minorities. In stratified subgroup analyses, more negative tweets referencing specific racial or ethnic minority groups (black people, Middle Eastern people, and Muslims) were associated with poor birth outcomes for black people and minorities. CONCLUSIONS:A negative social context related to race was associated with poor birth outcomes for racial or ethnic minorities, as well as non-Hispanic white people.
Project description:OBJECTIVE:To examine public and media response to the draft (October 2011) and finalised (May 2012) recommendations of the United States Preventive Services Task Force (USPSTF) against prostate-specific antigen (PSA) testing via Twitter, a popular social network with over 200 million active users. MATERIALS AND METHODS:We used a mixed-methods design to analyse posts on Twitter, known as 'tweets'. Using the search term 'prostate cancer', we archived tweets in the 24-hour periods following the release of both the draft and the finalised USPSTF recommendations. We recorded tweet rate per hour and developed a coding system to assess the type of user and sentiment expressed in tweets and linked articles. RESULTS:After the draft and finalised USPSTF recommendations were released, 2042 and 5357 tweets focused on the USPSTF report, respectively. The tweet rate nearly doubled within 2 hours of both announcements. Fewer than 10% of tweets expressed an opinion about screening, and the majority of these were pro-screening during both periods. By contrast, anti-screening articles were tweeted more frequently in both the draft and finalised study periods. Between the draft and the finalised recommendations, the proportion of anti-screening tweets and anti-screening article links increased (P = 0.03 and P < 0.01, respectively). CONCLUSIONS:There was increased Twitter activity surrounding the USPSTF draft and finalised recommendations. The percentage of anti-screening tweets and articles appeared to increase, perhaps due to the interval public comment period. Despite this, most tweets did not express an opinion, suggesting a missed opportunity in this important arena for advocacy.
Project description:We examined openly shared substance-related tweets to estimate prevalent sentiment around substance use and identify popular substance use activities. Additionally, we investigated associations between substance-related tweets and business characteristics and demographics at the zip code level. A total of 79,848,992 tweets were collected from 48 states in the continental United States from April 2015 to March 2016 through the Twitter API, of which 688,757 were identified as being related to substance use. We implemented a machine learning algorithm (maximum entropy text classifier) to estimate sentiment score for each tweet. Zip code level summaries of substance use tweets were created and merged with the 2013 Zip Code Business Patterns and 2010 US Census Data. Quality control analyses with a random subset of tweets yielded excellent agreement rates between computer generated and manually generated labels: 97%, 88%, 86%, 75% for underage engagement in substance use, alcohol, drug, and smoking tweets, respectively. Overall, 34.1% of all substance-related tweets were classified as happy. Alcohol was the most frequently tweeted substance, followed by marijuana. Regression results suggested more convenience stores in a zip code were associated with higher percentages of tweets about alcohol. Larger zip code population size and higher percentages of African Americans and Hispanics were associated with fewer tweets about substance use and underage engagement. Zip code economic disadvantage was associated with fewer alcohol tweets but more drug tweets. The patterns in substance use mentions on Twitter differ by zip code economic and demographic characteristics. Online discussions have great potential to glorify and normalize risky behaviors. Health promotion and underage substance prevention efforts may include interactive social media campaigns to counter the social modeling of risky behaviors.
Project description:To investigate factors associated with engagement of U.S. Federal Health Agencies via Twitter. Our specific goals are to study factors related to a) numbers of retweets, b) time between the agency tweet and first retweet and c) time between the agency tweet and last retweet. We collect 164,104 tweets from 25 Federal Health Agencies and their 130 accounts. We use negative binomial hurdle regression models and Cox proportional hazards models to explore the influence of 26 factors on agency engagement. Account features include network centrality, tweet count, numbers of friends, followers, and favorites. Tweet features include age, the use of hashtags, user-mentions, URLs, sentiment measured using Sentistrength, and tweet content represented by fifteen semantic groups. A third of the tweets (53,556) had zero retweets. Less than 1% (613) had more than 100 retweets (mean = 284). The hurdle analysis shows that hashtags, URLs and user-mentions are positively associated with retweets; sentiment has no association with retweets; and tweet count has a negative association with retweets. Almost all semantic groups, except for geographic areas, occupations and organizations, are positively associated with retweeting. The survival analyses indicate that engagement is positively associated with tweet age and the follower count. Some of the factors associated with higher levels of Twitter engagement cannot be changed by the agencies, but others can be modified (e.g., use of hashtags, URLs). Our findings provide the background for future controlled experiments to increase public health engagement via Twitter.
Project description:As a serious public health issue, vaccination refusal has been attracting more and more attention, especially for newly approved human papillomavirus (HPV) vaccines. Understanding public opinion towards HPV vaccines, especially concerns expressed on social media, is of significant importance for HPV vaccination promotion. In this study, we leveraged a hierarchical machine learning based sentiment analysis system to extract public opinions towards HPV vaccines from Twitter. English tweets containing HPV vaccines-related keywords were collected from November 2, 2015 to March 28, 2016. Manual annotation was done to evaluate the performance of the system on the unannotated tweets corpus. Time series analysis was then applied to this corpus to track the trends of machine-deduced sentiments and their associations with different days of the week. The evaluation of the unannotated tweets corpus showed that the micro-averaged F score reached 0.786. The learning system deduced the sentiment labels for 184,214 tweets in the collected unannotated tweets corpus. Time series analysis identified a coincidence between mainstream outcome and Twitter contents. A weak trend was found for "Negative" tweets, which first decreased and later began to increase; an opposite trend was identified for "Positive" tweets. Tweets that contain worries about the efficacy of HPV vaccines showed a relatively significant decreasing trend. Strong associations were found between some sentiments ("Positive", "Negative", "Negative-Safety" and "Negative-Others") and different days of the week. Our efforts on sentiment analysis for newly approved HPV vaccines provide us an automatic and instant way to extract public opinion and understand the concerns on Twitter. Our approaches can provide feedback to public health professionals to monitor online public response, examine the effectiveness of their HPV vaccination promotion strategies and adjust their promotion plans.
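The micro-averaged F score mentioned above pools true positives, false positives, and false negatives across all sentiment classes before computing precision and recall. The sketch below illustrates that pooling, assuming one label per tweet. The function name and the toy gold and predicted labels are hypothetical; the label names are borrowed from the abstract for readability only.

```python
def micro_f1(gold, predicted, classes):
    """Micro-averaged F1: pool per-class counts, then compute P, R, and F."""
    tp = fp = fn = 0
    for cls in classes:
        for g, p in zip(gold, predicted):
            if p == cls and g == cls:
                tp += 1          # correctly predicted this class
            elif p == cls and g != cls:
                fp += 1          # predicted this class, gold differs
            elif p != cls and g == cls:
                fn += 1          # missed a gold instance of this class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical manual annotations versus system output.
gold = ["Positive", "Negative-Safety", "Negative-Others", "Negative-Safety"]
pred = ["Positive", "Negative-Others", "Negative-Others", "Negative-Safety"]
classes = ["Positive", "Negative-Safety", "Negative-Others"]
print(micro_f1(gold, pred, classes))
```

With exactly one label per tweet, every misclassification counts once as a false positive and once as a false negative, so the micro-averaged F score coincides with plain accuracy.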