Quantifying collective attention from tweet stream.
ABSTRACT: Online social media are increasingly facilitating our social interactions, thereby making available a massive "digital fossil" of human behavior. Discovering and quantifying distinct patterns using these data is important for studying social behavior, although the rapidly time-varying nature and large volume of these data make this task challenging. In this study, we focused on the emergence of "collective attention" on Twitter, a popular social networking service. We propose a simple method for detecting and measuring the collective attention evoked by various types of events. This method exploits the fact that tweeting activity exhibits a burst-like increase and an irregular oscillation when a particular real-world event occurs; otherwise, it follows regular circadian rhythms. The difference between regular and irregular states in the tweet stream was measured using the Jensen-Shannon divergence, which corresponds to the intensity of collective attention. We then associated irregular incidents with the corresponding events that attracted attention and elicited responses from large numbers of people, based on the popularity and the enhancement of key terms in posted messages or "tweets." Next, we demonstrate the effectiveness of this method using a large dataset that contained approximately 490 million Japanese tweets by over 400,000 users, in which we identified 60 cases of collective attention, including one related to the Tohoku-oki earthquake. "Retweet" networks were also investigated to understand collective attention in terms of social interactions. This simple method provides a retrospective summary of collective attention, thereby contributing to the fundamental understanding of social behavior in the digital era.
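As an illustration of the divergence measure named in the abstract, the following minimal sketch computes the Jensen-Shannon divergence between two discrete distributions. The abstract does not specify how the regular and irregular states are represented, so the hourly count vectors, the function names, and the base-2 logarithm used here are assumptions for illustration only, not the authors' implementation.

```python
import math


def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Terms with p_i = 0 contribute nothing to the sum.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical hourly tweet counts (made-up numbers, not from the study).
regular_day = [5, 3, 2, 4, 10, 20, 25, 22, 18, 15, 12, 8]
burst_day = [5, 3, 2, 4, 10, 20, 90, 60, 18, 15, 12, 8]

same = js_divergence(normalize(regular_day), normalize(regular_day))
burst = js_divergence(normalize(regular_day), normalize(burst_day))
print(f"identical profiles: {same:.4f}")
print(f"regular vs burst:   {burst:.4f}")
```

With base-2 logarithms the divergence is bounded between 0 (identical distributions) and 1 (distributions with no overlap), which makes it convenient as an intensity score.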
Project description:The advent of social media expands our ability to transmit information and connect with others instantly, which enables us to behave as "social sensors." Here, we studied concurrent bursty behavior of Twitter users during major sporting events to determine their function as social sensors. We show that the degree of concurrent bursts in tweets (posts) and retweets (re-posts) works as a strong indicator of winning or losing a game. More specifically, our simple tweet analysis of Japanese professional baseball games in 2013 revealed that social sensors can immediately react to positive and negative events through bursts of tweets, but that positive events are more likely to induce a subsequent burst of retweets. We confirm that these findings also hold true for tweets related to Major League Baseball games in 2015. Furthermore, we demonstrate active interactions among social sensors by constructing retweet networks during a baseball game. The resulting networks commonly exhibited user clusters depending on the baseball team, with a scale-free connectedness that is indicative of a substantial difference in user popularity as an information source. While previous studies have mainly focused on bursts of tweets as a simple indicator of a real-world event, the temporal correlation between tweets and retweets implies unique aspects of social sensors, offering new insights into human behavior in a highly connected world.
Project description:"Media events" generate conditions of shared attention as many users simultaneously tune in with the dual screens of broadcast and social media to view and participate. We examine how collective patterns of user behavior under conditions of shared attention are distinct from other "bursts" of activity like breaking news events. Using 290 million tweets from a panel of 193,532 politically active Twitter users, we compare features of their behavior during eight major events during the 2012 U.S. presidential election to examine how patterns of social media use change during these media events compared to "typical" time and whether these changes are attributable to shifts in the behavior of the population as a whole or shifts from particular segments such as elites. Compared to baseline time periods, our findings reveal that media events not only generate large volumes of tweets, but they are also associated with (1) substantial declines in interpersonal communication, (2) more highly concentrated attention by replying to and retweeting particular users, and (3) elite users predominantly benefiting from this attention. These findings empirically demonstrate how bursts of activity on Twitter during media events significantly alter underlying social processes of interpersonal communication and social interaction. Because the behavior of large populations within socio-technical systems can change so dramatically, our findings suggest the need for further research about how social media responses to media events can be used to support collective sensemaking, to promote informed deliberation, and to remain resilient in the face of misinformation.
Project description:Since 2014, the Society of Critical Care Medicine has encouraged "live-tweeting" through the use of specific hashtags at each annual Critical Care Congress. We describe how the digital footprint of the Society of Critical Care Medicine Congress on Twitter has evolved at a time when social media use at conferences is becoming increasingly popular. Design: We used Symplur Signals (Symplur LLC, Pasadena, CA) to track all tweets containing the Society of Critical Care Medicine Congress hashtag for each annual meeting between 2014 and 2020. We collected data on the number of tweets, tweet characteristics, and impressions (i.e., potential views) for each year and data on the characteristics of the top 100 most actively tweeting users of that Congress. Setting: Twitter. Subjects: Users tweeting with the Critical Care Congress hashtag. Interventions: Not applicable. Measurements and Main Results: The Critical Care Congress digital footprint grew substantially from 2014 to 2020. The 2014 Critical Care Congress included 1,629 tweets by 266 users, compared with 29,657 tweets by 3,551 participants in 2020; average hourly tweets increased from 9.7 to 177. The percentage of tweets with mentions of other users and tweets with visual media increased. Users attending the conference were significantly more likely to compose original tweets, whereas those tweeting from afar were more likely to retweet Critical Care Congress content. There was a yearly increase in content-specific hashtags used in conjunction with Critical Care Congress hashtags (n = 429 in 2014 to n = 22,272 in 2020), most commonly related to pediatrics (18% of all hashtags), mobility/rehab (9%), sepsis (7%), social media (6%), and ICU burnout (1%). Conclusions: There has been significant growth in live-tweeting at the Critical Care Congress, along with the increased use of content-specific hashtags and visual media. This digital footprint is largely driven by a proportion of highly engaged users.
As medical conferences transition to completely or partially online platforms, understanding of the digital footprint is crucial for success.
Project description:BACKGROUND: The CDC hosts monthly panel presentations titled 'Public Health Grand Rounds' and publishes monthly reports known as Vital Signs. Hashtags #CDCGrandRounds and #VitalSigns were used to promote them on Twitter. Objectives: This study quantified the effects of hashtag count, mention count, URL count, and attached visual cues in #CDCGrandRounds or #VitalSigns tweets on their retweet frequency. METHODS: Through the Twitter Search Application Programming Interface, original tweets containing the hashtag #CDCGrandRounds (n = 6,966; April 21, 2011-October 25, 2016) and the hashtag #VitalSigns (n = 15,015; March 19, 2013-October 31, 2016) were retrieved. Negative binomial regression models were applied to each corpus to estimate the associations between retweet frequency and three predictors (hashtag count, mention count, and URL link count). Each corpus was subset into cycles (#CDCGrandRounds: n = 58, #VitalSigns: n = 42). We manually coded the 30 tweets with the highest number of retweets in each cycle for whether they contained visual cues (images or videos). Univariable negative binomial regression models were applied to compute the prevalence ratio (PR) of retweet frequency for each cycle, between tweets with and without visual cues. FINDINGS: URL links increased retweet frequency in both corpora; effects of hashtag count and mention count differed between the two corpora. Of the 58 #CDCGrandRounds cycles, 29 were found to have statistically significantly different retweet frequencies between tweets with and without visual cues. Of these 29 cycles, one had a PR estimate < 1; twenty-four, PR > 1 but < 3; and four, PR > 3. Of the 42 #VitalSigns cycles, 19 were statistically significant. Of these 19 cycles, six had PR > 1 but < 3, and thirteen had PR > 3. Conclusions: The increase of retweet frequency through attaching visual cues varied across cycles for original tweets with #CDCGrandRounds and #VitalSigns.
Future research is needed to determine the optimal choice of visual cues to maximize the influence of public health tweets.
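To make the prevalence ratio (PR) in the abstract above concrete: in a univariable count regression with a log link and a single binary predictor, the fitted group means coincide with the sample means, so the PR point estimate reduces to the ratio of mean retweet counts with and without visual cues. The sketch below computes only that point estimate. It assumes a log link (the abstract does not state the link function), uses made-up retweet counts, and does not reproduce the significance testing described in the study; the function name is hypothetical.

```python
from statistics import mean


def prevalence_ratio(with_cue, without_cue):
    """Ratio of mean retweet counts: tweets with visual cues vs. without.

    Assumption: this matches the point estimate exp(beta) of a univariable
    log-link count model with one binary predictor.
    """
    mean_without = mean(without_cue)
    if mean_without == 0:
        raise ValueError("mean retweet count without cues must be positive")
    return mean(with_cue) / mean_without


# Hypothetical retweet counts for one cycle (not data from the study).
with_visual = [12, 30, 8, 45, 20]
without_visual = [4, 10, 6, 2, 8]

pr = prevalence_ratio(with_visual, without_visual)
print(f"PR = {pr:.2f}")  # PR = 3.83
```

A PR above 1 means tweets with visual cues were retweeted more often on average in that cycle; a PR below 1 means the opposite.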
Project description:BACKGROUND: The Ebola communication crisis of 2014 generated widespread fear and attention among Western news media, social media users, and members of the United States (US) public. Health communicators need more information on misinformation and the social media environment during a fear-inducing disease outbreak to improve communication practices. The purpose of this study was to describe the content of Ebola-related tweets with a specific focus on misinformation, political content, health-related content, risk framing, and rumors. METHODS: We examined tweets from a random 1% sample of all tweets published September 30th - October 30th, 2014, filtered for English-language tweets mentioning "Ebola" in the content or hashtag, that had at least 1 retweet (N = 72,775 tweets). A randomly selected subset of 3639 (5%) tweets was evaluated for inclusion. We analyzed the 3113 tweets that met inclusion criteria using public health-trained human coders to assess tweet characteristics (joke, opinion, discord), veracity (true, false, partially false), political context, risk frame, health context, Ebola-specific messages, and rumors. We assessed the proportion of tweets with specific content using descriptive statistics and chi-squared tests. RESULTS: Of the non-joke Ebola-related tweets, 10% contained false or partially false information. Twenty-five percent were related to politics, 28% contained content that provoked reader response or promoted discord, 42% contained risk-elevating messages, and 72% were related to health. The most frequent rumor mentioned focused on government conspiracy. When comparing tweets with true information to tweets with misinformation, a greater percentage of tweets with misinformation were political in nature (36% vs 15%) and contained discord-inducing statements (45% vs 10%).
Discord-inducing statements and political messages were both significantly more common in tweets containing misinformation compared with those without (p < 0.001). CONCLUSIONS: Results highlight the importance of anticipating politicization of disease outbreaks, and the need for policy makers and social media companies to build partnerships and develop response frameworks in advance of an event. While each public health event is different, our findings provide insight into the possible social media environment during a future epidemic and could help optimize potential public health communication strategies.
Project description:Despite concerns about their health risks, e‑cigarettes have gained popularity in recent years. Concurrent with the recent increase in e‑cigarette use, social media sites such as Twitter have become a common platform for sharing information about e‑cigarettes and for marketing them. Monitoring the trends in e‑cigarette-related social media activity requires timely assessment of the content of posts and the types of users generating the content. However, little is known about the diversity of the types of users responsible for generating e‑cigarette-related content on Twitter. The aim of this study was to demonstrate a novel methodology for automatically classifying Twitter users who tweet about e‑cigarette-related topics into distinct categories. We collected approximately 11.5 million e‑cigarette-related tweets posted between November 2014 and October 2016 and obtained a random sample of Twitter users who tweeted about e‑cigarettes. Trained human coders examined the handles' profiles and manually categorized each as one of the following user types: individual (n=2168), vaper enthusiast (n=334), informed agency (n=622), marketer (n=752), and spammer (n=1021). Next, the Twitter metadata as well as a sample of tweets for each labeled user were gathered, and features that reflect users' metadata and tweeting behavior were analyzed. Finally, multiple machine learning algorithms were tested to identify a model with the best performance in classifying user types. Using a classification model that included metadata and features associated with tweeting behavior, we were able to predict with relatively high accuracy five different types of Twitter users that tweet about e‑cigarettes (average F1 score=83.3%). Accuracy varied by user type, with F1 scores of individuals, informed agencies, marketers, spammers, and vaper enthusiasts being 91.1%, 84.4%, 81.2%, 79.5%, and 47.1%, respectively.
Vaper enthusiasts were the most challenging user type to predict accurately and were commonly misclassified as marketers. The inclusion of additional tweet-derived features that capture tweeting behavior was found to significantly improve the model performance (an overall F1 score gain of 10.6%) beyond that achieved with metadata features alone. This study provides a method for classifying five different types of users who tweet about e‑cigarettes. Our model achieved high levels of classification performance for most groups, and examining the tweeting behavior was critical in improving the model performance. Results can help identify groups engaged in conversations about e‑cigarettes online to help inform public health surveillance, education, and regulatory efforts.
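The per-type F1 scores and the reported average can be related with a short calculation. The abstract does not state which averaging scheme was used, so the sketch below compares two common ones: an unweighted (macro) mean and a mean weighted by the number of labeled users of each type. Treating the labeled counts as the weights is an assumption; under that assumption the weighted mean agrees with the reported 83.3%, while the macro mean does not.

```python
# Per-type F1 scores (%) and labeled user counts, as given in the abstract.
f1_by_type = {
    "individual": 91.1,
    "informed agency": 84.4,
    "marketer": 81.2,
    "spammer": 79.5,
    "vaper enthusiast": 47.1,
}
n_by_type = {
    "individual": 2168,
    "informed agency": 622,
    "marketer": 752,
    "spammer": 1021,
    "vaper enthusiast": 334,
}


def macro_average(scores):
    """Unweighted mean of the per-class scores."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, weights):
    """Mean of per-class scores weighted by class size (assumed weights)."""
    total = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total


macro = macro_average(f1_by_type)
weighted = weighted_average(f1_by_type, n_by_type)
print(f"macro F1:    {macro:.1f}%")     # macro F1:    76.7%
print(f"weighted F1: {weighted:.1f}%")  # weighted F1: 83.3%
```

The gap between the two averages reflects the low score of the smallest class (vaper enthusiasts), which carries little weight when classes are weighted by size.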
Project description:BACKGROUND:Social media platforms play a vital role in the dissemination of health information. However, evidence suggests that a high proportion of Twitter posts (ie, tweets) are not necessarily accurate, and many studies suggest that tweets do not need to be accurate, or at least evidence based, to receive traction. This is a dangerous combination in the sphere of health information. OBJECTIVE:The first objective of this study is to examine health-related tweets originating from Saudi Arabia in terms of their accuracy. The second objective is to find factors that relate to the accuracy and dissemination of these tweets, thereby enabling the identification of ways to enhance the dissemination of accurate tweets. The initial findings from this study and methodological improvements will then be employed in a larger-scale study that will address these issues in more detail. METHODS:A health lexicon was used to extract health-related tweets using the Twitter application programming interface and the results were further filtered manually. A total of 300 tweets were each labeled by two medical doctors; the doctors agreed that 109 tweets were either accurate or inaccurate. Other measures were taken from these tweets' metadata to see if there was any relationship between the measures and either the accuracy or the dissemination of the tweets. The entire range of this metadata was analyzed using Python, version 3.6.5 (Python Software Foundation), to answer the research questions posed. RESULTS:A total of 34 out of 109 tweets (31.2%) in the dataset used in this study were classified as untrustworthy health information. These came mainly from users with a non-health care background and social media accounts that had no corresponding physical (ie, organization) manifestation. Unsurprisingly, we found that traditionally trusted health sources were more likely to tweet accurate health information than other users. 
Likewise, these provisional results suggest that tweets posted in the morning are more trustworthy than tweets posted at night, possibly corresponding to official and casual posts, respectively. Our results also suggest that the crowd was quite good at identifying trustworthy information sources, as evidenced by the number of times a tweet's author was tagged as favorited by the community. CONCLUSIONS:The results indicate some initially surprising factors that might correlate with the accuracy of tweets and their dissemination. For example, the time a tweet was posted correlated with its accuracy, which may reflect a difference between professional (ie, morning) and hobbyist (ie, evening) tweets. More surprisingly, tweets containing a kashida, a decorative element in Arabic writing used to justify the text within lines, were more likely to be disseminated through retweets. These findings will be further assessed using data analysis techniques on a much larger dataset in future work.
Project description:The 2008 financial crisis unveiled the intrinsic failures of the financial system as we know it. As a consequence, impact investing started to receive increasing attention, as evidenced by the high market growth rates. The goal of impact investment is to generate social and environmental impact alongside a financial return. In this paper we identify the main players in the sector and how they interact and communicate with each other. We use Twitter as a proxy of the impact investing market, and analyze relevant tweets posted over a period of ten months. We apply network, content, and sentiment analysis to the acquired dataset. Our study shows that Twitter users exhibit a favourable leaning (predominantly neutral or positive) towards impact investing. Retweet communities are decentralised and include users from a variety of sectors. Despite some basic common vocabulary used by all retweet communities identified, the vocabulary and the topics discussed by each community vary widely. We note that an additional effort should be made in raising awareness about the sector, especially by policymakers and media outlets. The role of investors and academia is also discussed, as well as the emergence of hybrid business models within the sector and its connections to the tech industry. This paper extends our previous study, one of the first analyses of Twitter activities in the impact investing market.
Project description:BACKGROUND: Twitter is an indicator of real-world performance and is thus an appropriate arena to assess the social consideration of and attitudes toward psychosis. OBJECTIVE: The aim of this study was to perform a mixed-methods study of the content and key metrics of tweets referring to psychosis in comparison with tweets referring to control diseases (breast cancer, diabetes, Alzheimer, and human immunodeficiency virus). METHODS: Each tweet's content was rated as nonmedical (NM: testimonies, health care products, solidarity or awareness, and misuse) or medical (M: included a reference to the illness's diagnosis, treatment, prognosis, or prevention). NM tweets were classified as positive or pejorative. We assessed the appropriateness of the medical content. The number of retweets generated and the potential reach and impact of the hashtags analyzed were also investigated. RESULTS: We analyzed a total of 15,443 tweets: 8055 classified as NM and 7287 as M. Psychosis-related tweets (PRT) had a significantly higher frequency of misuse 33.3% (212/636) vs 1.15% (853/7419; P<.001) and pejorative content 36.2% (231/636) vs 11.33% (840/7419; P<.001). The medical content of the PRT showed the highest scientific appropriateness 100% (391/391) vs 93.66% (6030/6439; P<.001) and had a higher frequency of content about disease prevention. The potential reach and impact of the tweets related to psychosis were low, but they had a high retweet-to-tweet ratio. CONCLUSIONS: We show a reduced number and a different pattern of content in tweets about psychosis compared with control diseases. PRT showed a predominance of nonmedical content with increased frequencies of misuse and pejorative tone. However, the medical content of PRT showed high scientific appropriateness and was aimed toward prevention.
Project description:The advent of the digital era provided fertile ground for the development of virtual societies, complex systems that influence real-world dynamics. Understanding online human behavior and its relevance beyond the digital boundaries is still an open challenge. Here we show that online social interactions during a massive voting event can be used to build an accurate map of real-world political parties and electoral ranks for the Italian elections in 2018. We provide evidence that information flow and collective attention are often driven by a special class of highly influential users, which we name "augmented humans", who exploit thousands of automated agents, also known as bots, to enhance their online influence. We show that augmented humans generate deep information cascades, to the same extent as news media and other broadcasters, while they uniformly infiltrate the full range of identified groups. Digital augmentation represents the cyber-physical counterpart of the human desire to acquire power within social systems.