Since social media emerged, there has never been a global event as significant as COVID-19, and the center was uniquely qualified to make sense of this evolving public health crisis. “With a new challenge like COVID, a systematic study of social media can help to uncover risk factors, shed light on public sentiment and even inform how to frame public health messaging or combat misinformation,” Merchant noted. “So when the COVID crisis started, it made sense for our team to pivot rapidly and begin studying COVID-related content.”
Using many of the same techniques in natural language processing from previous work, the center’s research team quickly created a web-based COVID-19 Twitter map to examine aggregate increases in reported symptoms or anxiety levels.
“The way people talk online really informs us of their concerns. So we thought there would be a lot of value in understanding what people were saying in different areas, particularly the variances in places that were harder hit compared to others,” Dr. Guntuku shared. The goal is to use this data to provide current, local, and actionable information for patients, providers, health systems, and policy makers. “We think that the COVID-19 Twitter map could help policy makers and health systems predict outbreak hotspots or a second wave coming in the fall, when traditional cold and flu season is also emerging,” adds Elissa Klinger, Center for Digital Health Assistant Director.
When Twitter launched its COVID-19 stream in April 2020, one of the first applications came from the Center for Digital Health and World Well-Being Project collaboration. The team was interested in how this data could help augment their Twitter map. The main challenges the team faced were the volume of the data (the COVID-19 stream was at least 10x as large as datasets they had previously worked with) and their ability to perform data validation at scale. The team also anonymized all the data and ran it through several custom machine learning tools to develop sentiment scores. “Processing and analyzing this amount of data on a daily basis was an exciting challenge,” says Garrick Sherman, Senior Data Scientist at the World Well-Being Project. “We are employing systems and models that have been developed over several years, but this was our first opportunity to apply them to a public health emergency in real time.”
Undaunted by the extra effort, analysts at the center see incredible value in mining social media data. Compared to the traditional survey and interview datasets that often inform health systems and public policy, there is possibly more predictive power in observing the conversations on Twitter. “We see people’s own thoughts and feelings, shared in their own words,” says Lauren Southwick, a research manager at the center. “This lets us uncover different terminologies, expressions, or sentence structures that help us better identify indicators of illness, anxiety, or isolation.”
With the ability to understand population-level moods and symptoms, the team can quickly validate new information, such as emerging symptoms, and rapidly iterate on their predictive models. “When the CDC added six new COVID-19 symptoms on April 17, we could go back to see those symptoms being discussed on Twitter as far back as early March,” notes Dr. Guntuku. The linguistic characteristics of people talking about these symptoms can then be applied to study the conversation in real time, so analysts can potentially identify regions that may see a surge in cases.