Data on sentiments and emotions of Olympic-themed tweets.
ABSTRACT: Two code files and one dataset related to Olympic Twitter activity form the foundation of this article. Using Twitter's Spritzer streaming API (Application Programming Interface), we collected over 430 million tweets from May 12, 2016 to September 12, 2016, a window spanning the Rio de Janeiro Olympics and Paralympics. We cleaned these tweets and filtered them to retain Olympic-related content. We then analyzed the raw data of the remaining 21,218,652 tweets, including location data, language, and tweet content, to distill the sentiment and emotions of Twitter users toward the Olympic Games (Kassens-Noor et al., 2019). We generalized the original dataset to comply with Twitter's Terms of Service and Developer Agreement (2018). We present the modified dataset and accompanying code files in this article to support further analysis of sentiment and emotions related to the Rio de Janeiro Olympics and comparative research on the imagery and perceptions of other Olympic Games.
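The abstract does not list the terms used to filter Olympic-related content or spell out how the data were generalized. The Python sketch below illustrates both steps under stated assumptions: `OLYMPIC_TERMS` is a hypothetical keyword list, and `generalize` assumes that generalization means dropping tweet and user identifiers and raw text while keeping only date, language, and country. It is not the authors' published code.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical filter terms; the article does not publish its actual keyword list.
OLYMPIC_TERMS = ("olympic", "olympics", "rio2016", "paralympic", "paralympics")
OLYMPIC_PATTERN = re.compile(r"\b(?:" + "|".join(OLYMPIC_TERMS) + r")\b", re.IGNORECASE)


@dataclass
class Tweet:
    tweet_id: int
    user_id: int
    text: str
    lang: str
    country: Optional[str]
    created_at: datetime


def is_olympic(tweet: Tweet) -> bool:
    """True if the text contains one of the hypothetical terms as a whole word (any case)."""
    return OLYMPIC_PATTERN.search(tweet.text) is not None


def generalize(tweet: Tweet) -> dict:
    """Assumed generalization scheme: drop IDs and raw text, keep only coarse attributes."""
    return {
        "date": tweet.created_at.date().isoformat(),
        "lang": tweet.lang,
        "country": tweet.country or "unknown",
    }
```

Applied as `[generalize(t) for t in tweets if is_olympic(t)]`, this yields one coarse record per Olympic-related tweet. Word-boundary matching keeps "#Rio2016" while avoiding matches inside unrelated longer words.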
Project description: Background: Twitter has been used to track trends and disseminate health information during viral epidemics. On January 21, 2020, the Centers for Disease Control and Prevention activated its Emergency Operations Center and the World Health Organization released its first situation report about coronavirus disease 2019 (COVID-19), sparking significant media attention. How Twitter content and sentiment evolved in the early stages of the COVID-19 pandemic has not been described. Methods: We extracted tweets matching hashtags related to COVID-19 from January 14 to 28, 2020 using Twitter's application programming interface. We measured themes and the frequency of keywords related to infection prevention practices. We performed a sentiment analysis to identify the sentiment polarity and predominant emotions in tweets and conducted topic modeling to identify and explore discussion topics over time. We compared sentiment, emotion, and topics among the most popular tweets, defined by the number of retweets. Results: We evaluated 126,049 tweets from 53,196 unique users. The hourly number of COVID-19-related tweets increased sharply from January 21, 2020 onward. Approximately half (49.5%) of all tweets expressed fear, and approximately 30% expressed surprise. In the full cohort, the economic and political impact of COVID-19 was the most commonly discussed topic. Among the most retweeted tweets, the incidence of fear was lower, and topics focused on quarantine efforts, the outbreak and its transmission, and prevention. Conclusions: Twitter is a rich medium that can be leveraged to understand public sentiment in real time and potentially to target individualized public health messages based on user interest and emotion.
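The study measured the frequency of infection-prevention keywords and the hourly volume of COVID-19 tweets. A minimal Python sketch of both counts follows; the keyword list in `PREVENTION_TERMS` is hypothetical, and matching is a simple case-insensitive substring test rather than the authors' method.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable

# Hypothetical infection-prevention keywords; the study's actual list is not given here.
PREVENTION_TERMS = ("mask", "handwashing", "quarantine", "vaccine")


def keyword_frequency(texts: Iterable[str]) -> Counter:
    """Count the tweets that mention each term at least once (case-insensitive substring match)."""
    counts: Counter = Counter()
    for text in texts:
        lowered = text.lower()
        for term in PREVENTION_TERMS:
            if term in lowered:
                counts[term] += 1
    return counts


def hourly_volume(timestamps: Iterable[datetime]) -> Counter:
    """Count tweets per clock hour by truncating each timestamp to the start of its hour."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)
```

Substring matching is deliberately loose ("mask" also counts "masks"); a real analysis would need a vetted term list and tokenization.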
Project description: Background: Social media technology such as Twitter allows users to share their thoughts, feelings, and opinions online. The growing body of social media data is becoming a central part of infodemiology research, as these data can be combined with other public health datasets (eg, physical activity levels) to provide real-time monitoring of psychological and behavioral outcomes that inform health behaviors. Currently, it is unclear whether Twitter data can be used to monitor physical activity levels. Objective: The aim of this study was to establish the feasibility of using Twitter data to monitor physical activity levels by assessing whether the frequency and sentiment of physical activity-related tweets were associated with physical activity levels across the United States. Methods: Tweets were collected from Twitter's application programming interface (API) between January 10, 2017 and January 2, 2018. We used Twitter's garden hose method of collecting tweets, which provided a random sample of approximately 1% of all tweets with location metadata falling within the United States. These tweets were then filtered to retain those that were geotagged. A list of physical activity-related hashtags was compiled and used to further classify these geolocated tweets. Twitter data were merged with physical activity data collected as part of the Behavioral Risk Factor Surveillance System. Multiple linear regression models were fit to assess the relationship between physical activity-related tweets and physical activity levels by county while controlling for population and socioeconomic status measures. Results: During the study period, 442,959,789 unique tweets were collected, of which 64,005,336 (14.44%) were geotagged with latitude and longitude coordinates. Aggregated data were obtained for a total of 3138 counties in the United States.
The mean county-level percentage of physically active individuals was 74.05% (SD 5.2), or 75.30% (SD 4.96) after adjusting for age. The model showed that the percentage of physical activity-related tweets was significantly associated with physical activity levels (beta=.11; SE 0.2; P<.001) and age-adjusted physical activity (beta=.10; SE 0.20; P<.001) at the county level after adjusting for both the Gini index and education level. However, the overall explained variance of the model was low (R²=.11). The sentiment of the physical activity-related tweets was not a significant predictor of physical activity level or age-adjusted physical activity at the county level after including the Gini index and education level in the model (P>.05). Conclusions: Social media data may be a valuable tool for public health organizations to monitor physical activity levels, as such data can overcome the time lag in the reporting of physical activity epidemiology data faced by traditional research methods (eg, surveys and observational studies). Consequently, this tool may help public health organizations better mobilize and target physical activity interventions.
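The county-level analysis combines two steps: computing the percentage of physical activity-related tweets per county, then regressing physical activity levels on that percentage with control variables. The sketch below assumes a hypothetical hashtag set `PA_HASHTAGS`, whitespace tokenization, and tweets already mapped to county codes; `ols` solves the normal equations directly and returns point estimates only (no standard errors or P values).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical hashtag list; the study's actual list is not reproduced here.
PA_HASHTAGS = {"#running", "#workout", "#fitness"}


def pa_tweet_percentage(tweets: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """tweets: (county_code, text) pairs. Returns the % of each county's tweets with a PA hashtag."""
    total: Dict[str, int] = defaultdict(int)
    pa: Dict[str, int] = defaultdict(int)
    for county, text in tweets:
        total[county] += 1
        if set(text.lower().split()) & PA_HASHTAGS:  # whitespace tokens only
            pa[county] += 1
    return {c: 100.0 * pa[c] / total[c] for c in total}


def ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least squares via the normal equations (X'X)b = X'y; an intercept column is prepended.

    Returns [intercept, b1, b2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]
```

In this setting, each row of `rows` would hold a county's percentage of physical activity-related tweets plus its controls (for example, Gini index and education level), and `y` would hold that county's physical activity level.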
Project description: We evaluated the risk of Zika virus infection among members of the Spanish Olympic Team in Rio de Janeiro, Brazil, during the 2016 Games. We recruited 117 team members, all of whom tested negative for Zika virus. The absence of cases in this cohort supports the minimal-risk estimates made before the Games.
Project description: Background: Emerging evidence suggests that people with arthritis are reporting increased physical pain and psychological distress during the COVID-19 pandemic. At the same time, Twitter's daily usage has surged by 23% during the pandemic, presenting a unique opportunity to assess the content and sentiment of tweets. Individuals with arthritis use Twitter to communicate with peers and to receive up-to-date information from health professionals and services about novel therapies and management techniques. Objective: The aim of this research was to identify proxy topics of importance for individuals with arthritis during the COVID-19 pandemic and to explore the emotional context of tweets by people with arthritis during the early phase of the pandemic. Methods: From March 20 to April 20, 2020, publicly available English-language tweets with hashtag combinations related to arthritis and COVID-19 were extracted retrospectively from Twitter. Content analysis was used to identify common themes within tweets, and sentiment analysis was used to examine positive and negative emotions in these themes to understand the COVID-19 experiences of people with arthritis. Results: In total, 149 tweets were analyzed. Most tweeters were female and from the United States. Tweeters reported a range of arthritis conditions, including rheumatoid arthritis, systemic lupus erythematosus, and psoriatic arthritis. Seven themes were identified: health care experiences, personal stories, links to relevant blogs, discussion of arthritis-related symptoms, advice sharing, messages of positivity, and stay-at-home messaging.
Sentiment analysis demonstrated marked anxiety around medication shortages, increased physical symptom burden, and a strong desire for trustworthy information and emotional connection. Conclusions: Tweets by people with arthritis highlight the multitude of concurrent concerns during the COVID-19 pandemic. Understanding these concerns, which include heightened physical and psychological symptoms in the context of treatment misinformation, may assist clinicians to provide person-centered care during this time of great health uncertainty.
Project description: OBJECTIVE: To describe the frequency and distribution of degenerative disc disease (DDD) detected in athletes who underwent spine MRI during the 2016 Summer Olympic Games in Rio de Janeiro. METHODS: Data on spine MRI examinations from the 2016 Summer Olympics were retrospectively analyzed. We assessed the frequency of DDD of the cervical, thoracic, and lumbar spine using Pfirrmann's classification. Grades II and III were considered mild, grade IV moderate, and grade V severe disc degeneration. Data were analyzed according to the location of the degenerative disc, type of sport, age group, and gender of the athletes. RESULTS: One hundred of 11,274 athletes underwent 108 spine MRIs (21 cervical, 6 thoracic, and 81 lumbar; 53% female (F), 47% male (M)). The frequency of DDD over the entire spine was 40% (42% F, 58% M): 28% mild, 9% moderate, and 3% severe. Of the cervical spine discs, 58% (12% F, 88% M) showed some degree of degeneration (44% mild, 13.5% moderate, and 1% severe). Athletics, boxing, and swimming were the sports most affected by cervical DDD. Of the thoracic discs, 12.5% showed some degree of degeneration; all cases were mild and were seen exclusively in female athletes. Of the lumbar discs, 39% (53% F, 47% M) showed DDD (26% mild, 9% moderate, and 4% severe). CONCLUSION: Athletes who underwent spine MRI during the 2016 Summer Olympic Games showed a high frequency of DDD in the cervical and lumbar spine. Recognizing these conditions is important for developing training techniques that may minimize the development of degenerative spinal pathology.
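The severity grouping in the Methods maps directly onto a lookup. The sketch below encodes it, assuming Pfirrmann grades are recorded as integers 1 to 5 and treating grade I as no degeneration (the abstract counts only grades II to V as DDD).

```python
from collections import Counter
from typing import Dict, Sequence


def ddd_severity(grade: int) -> str:
    """Map a Pfirrmann grade (I-V, recorded as 1-5) to the study's severity bands."""
    if grade not in range(1, 6):
        raise ValueError(f"Pfirrmann grade must be 1-5, got {grade}")
    if grade == 1:
        return "none"  # assumption: grade I is not counted as degeneration
    if grade in (2, 3):
        return "mild"
    if grade == 4:
        return "moderate"
    return "severe"  # grade V


def severity_distribution(grades: Sequence[int]) -> Dict[str, float]:
    """Percentage of discs falling in each severity band."""
    counts = Counter(ddd_severity(g) for g in grades)
    n = len(grades)
    return {band: 100.0 * counts[band] / n for band in ("none", "mild", "moderate", "severe")}
```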
Project description: BACKGROUND: In the United States, racial disparities in birth outcomes persist and have been widening. Interpersonal and structural racism are leading explanations for these continuing disparities, but research confirming the role of racism and evaluating trends in its impact on health outcomes has been hampered by the difficulty of measuring racism. Most research on discrimination relies on self-reported experiences of discrimination, and few studies have examined racial attitudes and bias at the US national level. OBJECTIVE: This study aimed to investigate the associations between state-level Twitter-derived sentiments related to racial or ethnic minorities and birth outcomes. METHODS: We used Twitter's Streaming application programming interface to collect 26,027,740 tweets containing at least one race-related term from June 2015 to December 2017. Sentiment analysis was performed using a support vector machine, a supervised machine learning model. We constructed overall indicators of sentiment toward minorities and sentiment toward race-specific groups. For each year, state-level Twitter-derived sentiment data were merged with birth data for that year. The study participants were women who had singleton births with no congenital abnormalities from 2015 to 2017 and for whom data were available on gestational age (n=9,988,030) or birth weight (n=9,985,402). The main outcomes were low birth weight (birth weight ≤2499 g) and preterm birth (gestational age <37 weeks). We estimated incidence ratios using log-binomial regression models, controlling for individual-level maternal characteristics (sociodemographics, prenatal care, and health behaviors) and state-level demographics. RESULTS: Compared with manually labeled tweets, the machine learning model identified negative sentiment with 91% accuracy.
Mothers living in states in the highest tertile for negative sentiment tweets referencing racial or ethnic minorities had greater incidences of low birth weight (8% greater, 95% CI 4%-13%) and preterm birth (8% greater, 95% CI 0%-14%) compared with mothers living in states in the lowest tertile. More negative tweets referencing minorities were associated with adverse birth outcomes in the total population, including non-Hispanic white people and racial or ethnic minorities. In stratified subgroup analyses, more negative tweets referencing specific racial or ethnic minority groups (black people, Middle Eastern people, and Muslims) were associated with poor birth outcomes for black people and minorities. CONCLUSIONS: A negative social context related to race was associated with poor birth outcomes for racial or ethnic minorities, as well as non-Hispanic white people.
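The outcome definitions and the tertile comparison can be illustrated with an unadjusted calculation. The study itself used log-binomial regression to estimate adjusted incidence ratios; the `crude_incidence_ratio` function below is a simplified stand-in that omits covariate adjustment, so its output is not comparable to the reported estimates.

```python
from typing import Sequence


def is_low_birth_weight(grams: float) -> bool:
    """Low birth weight as defined in the abstract: birth weight <= 2499 g."""
    return grams <= 2499


def is_preterm(weeks: float) -> bool:
    """Preterm birth as defined in the abstract: gestational age < 37 weeks."""
    return weeks < 37


def crude_incidence_ratio(exposed: Sequence[bool], reference: Sequence[bool]) -> float:
    """Unadjusted ratio of outcome risk in the exposed group (e.g. highest tertile)
    to risk in the reference group (e.g. lowest tertile). True = outcome occurred."""
    risk_exposed = sum(exposed) / len(exposed)
    risk_reference = sum(reference) / len(reference)
    return risk_exposed / risk_reference
```

A log-binomial model estimates this same kind of ratio (its exponentiated coefficients are risk ratios) while adjusting for covariates such as maternal characteristics and state demographics.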
Project description: The tobacco industry has long sought affiliation with major sporting events, including the Olympic Games, for marketing, advertising, and promotion purposes. Since 1988, each Olympic Games has adopted a tobacco-free policy. Few studies have examined the effectiveness of the smoke-free policy to date, and none has examined the tobacco industry's involvement with the Olympics or its use of the Olympic brand. The contents of Olympic tobacco-free policies from 1988 to 2014 were compared by searching the websites of the International Olympic Committee (IOC) and host National Olympic Committees (NOCs). The specific tobacco control measures adopted for each Games were compiled and compared with measures recommended by the WHO Tobacco Free Sports Initiative and Article 13 of the Framework Convention on Tobacco Control (FCTC). This was supported by semi-structured interviews of key informants involved with the adoption of tobacco-free policies for selected Games. To understand the industry's interests in the Olympics, the Legacy Tobacco Documents Library (http://legacy.library.ucsf.edu) was systematically searched between June 2013 and August 2014. Company websites, secondary sources, and media reports were also searched to triangulate the above data sources. This paper finds that, although most direct associations between tobacco and the Olympics have been prohibited since 1988, a variety of indirect associations undermine the Olympic tobacco-free policy. This is due to variation in the scope of tobacco-free policies, limited jurisdiction, and continued efforts by the industry to be associated with Olympic ideals. The paper concludes that, consistent with the IOC's commitment to promoting healthy lifestyles, a comprehensive tobacco-free policy with standardized and binding measures should be adopted by the IOC and all NOCs.
Project description: Twitter data are becoming an important part of modern political science research, but key aspects of the inner workings of Twitter streams, as well as self-censorship on the platform, require further research. A particularly important research agenda is understanding the removal rates of politically charged tweets. In this article, I provide a strategy for understanding removal rates on Twitter, particularly on politically charged topics. First, the technical properties of Twitter's API that may distort analyses of removal rates are tested. Results show that the forward stream does not capture every possible tweet: between 2 and 5 percent of tweets are lost on average, even when the volume of tweets is low and the firehose is not needed. Second, data from Twitter's streams are collected on contentious topics, such as terrorism or political leaders, and on non-contentious topics, such as types of food. Multilevel analysis is used to detect uncommon removal rate patterns. Results show significant differences in the removal of tweets between topic groups. This article provides the first systematic comparison of information loss and removal on Twitter, as well as a strategy to collect valid removal samples of tweets.
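A removal rate can be framed as a set difference: tweets collected at one point in time that can no longer be retrieved later. In the sketch below, how `still_available_ids` is obtained (for example, by re-querying the collected IDs some time later) is an assumption, and the multilevel model used in the article is not reproduced. The same calculation applied to a reference sample and a stream capture would give a stream-loss rate like the 2 to 5 percent reported above.

```python
from typing import Dict, Iterable, Tuple


def removal_rate(collected_ids: Iterable[int], still_available_ids: Iterable[int]) -> float:
    """Fraction of originally collected tweet IDs missing from a later retrieval."""
    collected = set(collected_ids)
    if not collected:
        raise ValueError("no tweets were collected")
    removed = collected - set(still_available_ids)
    return len(removed) / len(collected)


def removal_rates_by_topic(
    samples: Dict[str, Tuple[Iterable[int], Iterable[int]]]
) -> Dict[str, float]:
    """Apply removal_rate per topic group, e.g. contentious vs non-contentious topics."""
    return {topic: removal_rate(collected, available)
            for topic, (collected, available) in samples.items()}
```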
Project description: BACKGROUND: Although cancer screening reduces morbidity and mortality, millions of people worldwide remain unscreened. Social media provide a unique platform for understanding public sentiment toward tools commonly used for cancer screening. OBJECTIVE: The objective of our study was to examine public sentiment toward colonoscopy, mammography, and Pap smear, and how this sentiment spreads, by analyzing discourse on Twitter. METHODS: In this observational study, we used a naive Bayes algorithm to classify 32,847 tweets (online postings on Twitter) related to colonoscopy, mammography, or Pap smears as containing positive, negative, or neutral sentiment. Additionally, we characterized the spread of sentiment on Twitter using an established model of contagion. RESULTS: Colonoscopy-related tweets were more likely to express negative than positive sentiment (negative to positive ratio 1.65, 95% CI 1.51-1.80, P<.001), in contrast to the more positive sentiment expressed regarding mammography (negative to positive ratio 0.43, 95% CI 0.39-0.47, P<.001). The proportions of negative versus positive tweets about Pap smear were not significantly different (negative to positive ratio 0.95, 95% CI 0.87-1.04, P=.18). Positive and negative tweets tended to share lexical features across screening modalities. Positive tweets expressed resonance with the benefits of early detection. Fear and pain were the principal lexical features seen in negative tweets. Negative sentiment about colonoscopy and mammography spread more than positive sentiment; no correlation between sentiment and spread was seen for Pap smear. CONCLUSIONS: Analysis of social media data provides a unique, quantitative framework for better understanding the public's perception of medical interventions commonly used for cancer screening.
Given the growing use of social media, public health interventions to improve cancer screening should draw on the health perceptions that people express in social network postings about common cancer screening tests, and should also consider the other people those postings may influence.
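The study classified tweets with a naive Bayes algorithm but does not describe its features or training data. The sketch below is a generic multinomial naive Bayes classifier with add-one (Laplace) smoothing over lowercase whitespace tokens; any training examples are illustrative, and the contagion model used to study spread is not shown.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one smoothing over whitespace tokens."""

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "NaiveBayesSentiment":
        self.class_counts: Counter = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        tokens = text.lower().split()
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum over tokens of log P(token | class), add-one smoothed.
            score = math.log(count / n_docs)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on a handful of invented examples such as "fear and pain" (negative) and "early detection saves lives" (positive), the model labels a new tweet by the class with the highest smoothed log-probability.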
Project description: BACKGROUND: Social media has become a major resource for observing and understanding public opinion using infodemiology and infoveillance methods, especially during emergencies such as disease outbreaks. For public health agencies, understanding the driving forces of web-based discussions will help deliver more effective and efficient information to general users on social media and the web. OBJECTIVE: The study aimed to identify the major contributors that drove overall Zika-related tweeting dynamics during the 2016 epidemic. Three hypothetical drivers were proposed: (1) the underlying Zika epidemic, quantified as a time series of case counts; (2) sporadic but critical real-world events, such as the 2016 Rio Olympics and the World Health Organization's Public Health Emergency of International Concern (PHEIC) announcement; and (3) the tweeting activities of a few influential users. METHODS: All tweets and retweets (RTs) containing the keyword Zika posted in 2016 were collected via the Gnip application programming interface (API). We developed an analytical pipeline, EventPeriscope, to identify trending events co-occurring with Zika and to quantify the strength of these events. We also retrieved Zika case data and identified the top influencers of the Zika discussion on Twitter. The influence of the 3 potential drivers was examined via multivariate time series analysis, signal processing, content analysis, and text mining techniques. RESULTS: Zika-related tweeting dynamics were not significantly correlated with the underlying Zika epidemic in the United States in any of the four quarters of 2016 or over the entire year. Instead, peaks of Zika-related tweeting activity were strongly associated with a few critical real-world events, both planned, such as the Rio Olympics, and unplanned, such as the PHEIC announcement. Around their respective peaks, the Rio Olympics was mentioned in >15% of all Zika-related tweets and the PHEIC in 27%.
In addition, the overall tweeting dynamics of the 100 users who tweeted most actively on the Zika topic, the 100 users receiving the most RTs, and the 100 most-mentioned users were the most highly correlated with, and preceded, the overall tweeting dynamics, making these groups potential drivers of tweeting dynamics. The 100 users who retweeted the most were not critical in driving the overall tweeting dynamics. There was very little overlap among these groups of potentially influential users. CONCLUSIONS: Using our proposed analytical workflow, EventPeriscope, we found that Zika discussion dynamics on Twitter were decoupled from the actual disease epidemic in the United States but were closely related to, and strongly influenced by, certain sporadic real-world events and a few influential users. This study provides a methodological framework and insights for better understanding the driving forces of web-based public discourse during health emergencies, which could help health agencies deliver more effective and efficient web-based communications in emerging crises.
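The finding that some user groups preceded the overall tweeting dynamics can be illustrated with a lagged correlation. The abstract does not detail the time series methods inside EventPeriscope, so the sketch below substitutes a plain lagged Pearson correlation between a hypothetical group's tweet counts and the overall counts; the series and their time unit are assumptions.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (ZeroDivisionError if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlation(driver: Sequence[float], overall: Sequence[float],
                       max_lag: int) -> Dict[int, float]:
    """Correlate driver[t] with overall[t + lag] for lag = 0..max_lag.

    A peak at a positive lag suggests the driver series tends to lead the overall series.
    """
    if len(driver) != len(overall):
        raise ValueError("series must have equal length")
    return {lag: pearson(driver[: len(driver) - lag], overall[lag:])
            for lag in range(max_lag + 1)}
```

Correlation at a lag is suggestive only; it does not by itself establish that a group drives the overall discussion.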