The businesses and brands we work with here at the Developer Platform consistently monitor Twitter through our APIs, for many reasons and in many ways. From tracking the latest consumer trends and analyzing competitors to staying ahead of breaking news and responding to customer service requests, Twitter APIs are key to unlocking insights from the real-time public conversations that affect business.
Twitter is a treasure trove of data, but language is complex, and getting to insights means processing a massive number of Tweets by organizing, sorting, and filtering them. In this article, I will discuss three common approaches to organizing large volumes of Tweets through topic analysis, the process of identifying and categorizing the underlying themes in Tweet text. I’ll also cover when it makes sense to use natural language processing (NLP) and custom machine learning models (CMLM) for topic and keyword extraction to power industry-specific use cases.
The purpose of this article is to introduce some common approaches to topic discovery with Twitter data so that you can choose the one that best fits your use case.
The first step in topic analysis is topic discovery (also known as topic detection or entity extraction). The goal of this technique is to organize and understand large collections of Tweet text by assigning tags or categories according to the topics or themes each Tweet contains.
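To make the idea of assigning topic tags concrete, here is a minimal sketch of rule-based tagging in Python. The topic names and keyword lists are hypothetical, chosen only for illustration; this is a generic keyword-matching example, not how Twitter assigns annotations, and real systems use much larger vocabularies or a trained model.

```python
import re

# Hypothetical topic -> keyword mapping, for illustration only.
TOPIC_KEYWORDS = {
    "shipping": {"delivery", "shipping", "package", "tracking"},
    "billing": {"refund", "charge", "invoice", "billing"},
    "product": {"app", "update", "feature", "bug"},
}


def tag_tweet(text: str) -> list[str]:
    """Return the sorted list of topics whose keywords appear in the Tweet text."""
    # Lowercase the text and pull out word tokens; '#' and '@' are not word
    # characters, so "#refund" yields the token "refund".
    words = set(re.findall(r"\w+", text.lower()))
    return sorted(topic for topic, keywords in TOPIC_KEYWORDS.items() if words & keywords)


tweets = [
    "Still waiting on my package, tracking hasn't updated in days",
    "Got charged twice, need a #refund please",
    "Love the new app update!",
]
for tweet in tweets:
    print(tag_tweet(tweet), "-", tweet)
```

A Tweet can receive several tags (or none), which is why the function returns a list rather than a single label.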
Typical use cases for discovering topics in a large volume of Tweets include:
- Analyzing trends
- Powering alerts and recommendations
- Enhancing search and personalization
- Gaining insights (customer feedback, market research, competitive intelligence, etc.)
- Detecting issues (customer service and support issues)
Approach 1: Tweet Annotations
A turnkey solution for topic discovery with the Twitter API is Tweet annotations, which offer named entity recognition and context annotations.
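As a sketch of how annotations are requested, the snippet below builds a request to the Twitter API v2 recent search endpoint and asks for the context_annotations and entities Tweet fields. It assumes a bearer token stored in an environment variable named TWITTER_BEARER_TOKEN (a name chosen for this example) and uses only the Python standard library; the search query is likewise a hypothetical example.

```python
import json
import os
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_url(query: str, max_results: int = 10) -> str:
    """Build a recent search URL that requests entity and context annotations."""
    params = {
        "query": query,
        "max_results": str(max_results),
        # Ask the API to include both annotation types in each Tweet object.
        "tweet.fields": "context_annotations,entities",
    }
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"


def search_recent(query: str) -> dict:
    """Call the endpoint; TWITTER_BEARER_TOKEN is an assumed env var name."""
    token = os.environ["TWITTER_BEARER_TOKEN"]
    request = urllib.request.Request(
        build_search_url(query),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Only hit the network when run directly with a token configured.
if __name__ == "__main__" and "TWITTER_BEARER_TOKEN" in os.environ:
    # Hypothetical query: English Tweets mentioning customer service, no retweets.
    result = search_recent('"customer service" lang:en -is:retweet')
    for tweet in result.get("data", []):
        print(tweet["id"], tweet.get("context_annotations", []))
```

Keeping URL construction separate from the network call makes the request easy to inspect or test without credentials.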
Twitter categorizes entities as “people,” “places,” “products,” “organizations,” or “other.” Entities are assigned programmatically based on what is explicitly mentioned in the Tweet text and are delivered in the entities object of the Tweet payload.
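Entity annotations arrive in the entities object of each Tweet, under an annotations list whose items carry the matched span (start, end), a probability, the entity type, and the normalized_text. The sketch below groups annotated entities by type. The sample Tweet is hand-written for illustration, not real API output, and the min_probability threshold is an optional parameter added for this example.

```python
from collections import defaultdict

# Hand-written sample in the shape of a v2 Tweet object (illustrative only).
sample_tweet = {
    "id": "1",
    "text": "Called Acme support about my Acme Router in Chicago",
    "entities": {
        "annotations": [
            {"start": 7, "end": 10, "probability": 0.62, "type": "Organization", "normalized_text": "Acme"},
            {"start": 29, "end": 39, "probability": 0.55, "type": "Product", "normalized_text": "Acme Router"},
            {"start": 44, "end": 50, "probability": 0.91, "type": "Place", "normalized_text": "Chicago"},
        ]
    },
}


def group_entities(tweet: dict, min_probability: float = 0.0) -> dict[str, list[str]]:
    """Group a Tweet's entity annotations by type, keeping those at or above a threshold."""
    groups: dict[str, list[str]] = defaultdict(list)
    # Tweets without annotations simply yield an empty result.
    for annotation in tweet.get("entities", {}).get("annotations", []):
        if annotation["probability"] >= min_probability:
            groups[annotation["type"]].append(annotation["normalized_text"])
    return dict(groups)


print(group_entities(sample_tweet))
# {'Organization': ['Acme'], 'Product': ['Acme Router'], 'Place': ['Chicago']}
print(group_entities(sample_tweet, min_probability=0.6))
# {'Organization': ['Acme'], 'Place': ['Chicago']}
```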
Context annotations are added to a Tweet when the Tweet’s text matches Twitter’s semantic classification. Twitter curates lists of keywords, hashtags, and @handles that are relevant to a given topic and assigns context annotation labels accordingly. These labels are applied based on semantic rules rather than a machine learning approach, in which a model is trained to classify text. Context annotations can help you discover Tweets on topics that may previously have been difficult to surface.
Tweet Annotations Example
Let’s explore Tweet annotations for a set of Tweets from the customer experience domain, using the annotations as filters to narrow the set down to the Tweets of interest. The examples below use a set of 300 Tweets ingested into a database. We’ll start by filtering on entity annotations. The five entity types (image below) provided by Tweet annotations are: