Twitter’s customers often build products that need the location of a Tweet or of the user who posted it. For example, a customer may be interested in public opinion on health care legislation in a specific part of the country, or may want to track customer satisfaction in different regions. Others may want to research social media communications during extreme weather events.
Customers looking to use or integrate location data in their products face the challenge of deciding which type of data best fits their use case. Factors in this decision include the precision and accuracy each kind of data provides and how easy it is to filter for each type.
What geospatial metadata comes with a Tweet?
Twitter gives its users the option to ‘geo-tag’ a Tweet as it is posted. A geo-tag can be an exact location, a Twitter Place (see HERE and HERE for more information), or both. A Twitter Place can be thought of as roughly neighborhood-level: it carries a “bounding box” of latitude and longitude coordinates that define the area. This type of geographic metadata, referred to as “Tweet Location,” provides the highest level of precision, and it requires no language parsing or processing to access. The main drawback of relying on Tweet Locations is that only 1-2% of Tweets are geo-tagged. In addition, because each geographic rule can cover only a limited area, targeting a very large area (e.g., an entire state or province) requires a large array of PowerTrack rules. Filtering for specific countries, however, is easy with the place_country: Operator, and Places also support filtering by place name.
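To make the "many rules for a large area" point concrete, here is a minimal Python sketch that tiles a large longitude/latitude box into PowerTrack `bounding_box:[west_long south_lat east_long north_lat]` clauses. It assumes a limit of 25 miles per side for each box and an approximation of about 69 miles per degree of latitude; both the limit and the helper name `tile_bounding_box` are assumptions to check against the current Operator documentation, not part of this article.

```python
import math
from typing import List

# ASSUMPTION: each side of a PowerTrack bounding_box must be under 25 miles.
MAX_SIDE_MILES = 25.0
MILES_PER_DEGREE_LAT = 69.0  # approximation; the true value varies slightly


def tile_bounding_box(west: float, south: float, east: float, north: float,
                      max_side_miles: float = MAX_SIDE_MILES * 0.99) -> List[str]:
    """Split a large lon/lat box into bounding_box clauses whose sides stay
    below max_side_miles (hypothetical helper, equirectangular approximation)."""
    # A degree of latitude covers roughly the same distance everywhere.
    lat_step = max_side_miles / MILES_PER_DEGREE_LAT
    # A degree of longitude shrinks with cos(latitude), so size the columns at
    # the latitude closest to the equator, where tiles are widest.
    widest_lat = 0.0 if south < 0 < north else min(abs(south), abs(north))
    lon_step = max_side_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(widest_lat)))

    rows = math.ceil((north - south) / lat_step)
    cols = math.ceil((east - west) / lon_step)
    clauses = []
    for r in range(rows):
        s = south + r * lat_step
        n = min(s + lat_step, north)
        for c in range(cols):
            w = west + c * lon_step
            e = min(w + lon_step, east)
            clauses.append(f"bounding_box:[{w:.4f} {s:.4f} {e:.4f} {n:.4f}]")
    return clauses


# Roughly the extent of Kentucky (illustrative coordinates only).
kentucky_clauses = tile_bounding_box(-89.6, 36.5, -81.9, 39.2)
print(len(kentucky_clauses), kentucky_clauses[0])
```

Each clause can be a rule on its own, or several can be OR-ed into one rule; either way, a state-sized area needs well over a hundred boxes under these assumptions, while a country is a single place_country: clause.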
A second source of geospatial metadata is mentions of locations in the Tweet content. This type of “Mentioned Location” metadata requires parsing the Tweet text for location names of interest, including nicknames: one Tweet may mention Manhattan, while another mentions the Big Apple. Ease of use is fairly high for these Tweets, provided you know how people on Twitter refer to the place you care about; you can simply write rules with keywords or phrases that look for those terms. Accuracy, on the other hand, is likely lower, since mentioning a place is a less reliable indicator of the user’s actual location.
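A minimal Python sketch of keyword-based Mentioned Location matching follows. The alias table `PLACE_ALIASES` and all helper names are hypothetical; the point is that the alias list (official names plus nicknames) does the real work. `to_keyword_rule` shows the same aliases as a PowerTrack-style OR clause, with multi-word nicknames quoted as exact phrases.

```python
import re
from typing import Dict, List

# Hypothetical alias table: official names plus the nicknames people use.
PLACE_ALIASES: Dict[str, List[str]] = {
    "New York City": ["Manhattan", "Big Apple", "NYC"],
    "Louisville": ["Louisville", "Derby City"],
}


def compile_alias_patterns(aliases: Dict[str, List[str]]) -> Dict[str, re.Pattern]:
    """Build one case-insensitive, word-bounded regex per place."""
    return {
        place: re.compile(
            r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
            re.IGNORECASE,
        )
        for place, names in aliases.items()
    }


def mentioned_places(text: str, patterns: Dict[str, re.Pattern]) -> List[str]:
    """Return each place whose name or nickname appears in the Tweet text."""
    return [place for place, pattern in patterns.items() if pattern.search(text)]


def to_keyword_rule(names: List[str]) -> str:
    """OR the aliases together, quoting multi-word names as exact phrases."""
    terms = [f'"{name}"' if " " in name else name for name in names]
    return "(" + " OR ".join(terms) + ")"


patterns = compile_alias_patterns(PLACE_ALIASES)
print(mentioned_places("Just landed in the big apple!", patterns))
print(to_keyword_rule(PLACE_ALIASES["New York City"]))
```

A match only shows that a place was mentioned, not that the author was there. The word boundaries (`\b`) keep a short alias such as NYC from matching inside a longer token.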
Finally, every Twitter profile has a “Location” setting that the account owner can fill out. These Profile Locations are the largest source of geospatial metadata, but not everyone provides this information, and it can contain any phrase the user wants. One account could have its location set to “Living in the Colorado foothills,” while another could be set to the less helpful “My parents’ basement.” This type of reference is a middle ground: it isn’t a definite geo-point validated by GPS, but the user has designated it as their location, which makes it more reliable than a place merely mentioned in a Tweet. The options for filtering on this type of data are abundant, and are discussed below.
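Because Profile Locations are free-form, even a simple match needs disambiguation. This hypothetical Python sketch labels a profile location string as Louisville, Kentucky, Louisville, Colorado, ambiguous, or unknown, using a small hand-picked list of hint words; the hint lists and function names are illustrative assumptions, not a Twitter API.

```python
import re
from typing import Optional, Sequence

LOUISVILLE = re.compile(r"\blouisville\b", re.IGNORECASE)


def has_hint(text: str, abbrev: str, words: Sequence[str]) -> bool:
    """True if text contains the upper-case state abbreviation (so lower-case
    'co' as in 'co-founder' is ignored) or any hint word, case-insensitively."""
    if re.search(rf"\b{abbrev}\b", text):
        return True
    return any(re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE) for w in words)


def classify_profile_location(location: Optional[str]) -> str:
    """Hypothetical classifier for a free-form profile location string."""
    if not location or not LOUISVILLE.search(location):
        return "unknown"
    kentucky = has_hint(location, "KY", ("kentucky", "derby"))
    colorado = has_hint(location, "CO", ("colorado",))
    if kentucky and not colorado:
        return "Louisville, KY"
    if colorado and not kentucky:
        return "Louisville, CO"
    return "Louisville (ambiguous)"


for loc in ["I live in Louisville, home of the Derby!",
            "I live in Louisville, the one in beautiful Colorado.",
            "Louisville",
            "My parents’ basement"]:
    print(f"{loc!r} -> {classify_profile_location(loc)}")
```

Hand-built rules like these get long quickly, which is part of why structured, pre-parsed profile data is attractive for this source.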
In summary, there are three metadata sources for geo-referencing Tweets:
- Tweet Location: Tweets that are geo-tagged with an exact location or a Twitter Place.
  - Exact location with long/lat coordinates: -85.7629, 38.2267
  - Twitter Place with a name (“Louisville Central”) and four pairs of long/lat coordinates that define a “bounding box.”
- Mentioned Location: parsing the Tweet text for geospatial locations.
  - “If you are in Louisville, check out the pizza place off main”
  - “I’m in Louisville and it is raining cats and dogs”
- Profile Location: parsing the account-level location for locations of interest.
  - “I live in Louisville, home of the Derby!”
  - “I live in Louisville, the one in beautiful Colorado.”
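To check which of these sources a given Tweet carries, a sketch like the following can inspect the payload. It assumes the GeoJSON-style layout of Twitter's v1.1 Tweet JSON: an exact location is a `coordinates` object in `[longitude, latitude]` order, and a Place has `place.bounding_box.coordinates` holding one ring of four `[longitude, latitude]` pairs. The `sample` Tweet is made up; its point reuses the example coordinates above, and its Place corners are invented. Mentioned Locations need text parsing and are not detected here.

```python
from typing import Any, Dict, List, Optional, Tuple

Tweet = Dict[str, Any]


def tweet_point(tweet: Tweet) -> Optional[Tuple[float, float]]:
    """Exact location as (longitude, latitude), if present (assumed layout)."""
    coords = tweet.get("coordinates")
    if coords and coords.get("type") == "Point":
        lon, lat = coords["coordinates"]
        return (lon, lat)
    return None


def place_bbox(tweet: Tweet) -> Optional[Tuple[float, float, float, float]]:
    """(west, south, east, north) computed from the Place's four corners."""
    place = tweet.get("place") or {}
    box = place.get("bounding_box")
    if not box:
        return None
    ring = box["coordinates"][0]  # assumed: four [lon, lat] pairs
    lons = [pt[0] for pt in ring]
    lats = [pt[1] for pt in ring]
    return (min(lons), min(lats), max(lons), max(lats))


def geo_sources(tweet: Tweet) -> List[str]:
    """Structured geo sources present in a Tweet (Mentioned Location needs
    text parsing and is deliberately left out)."""
    sources = []
    if tweet_point(tweet) is not None or place_bbox(tweet) is not None:
        sources.append("tweet_location")
    if (tweet.get("user") or {}).get("location"):
        sources.append("profile_location")
    return sources


# Made-up sample Tweet.
sample: Tweet = {
    "text": "I'm in Louisville and it is raining cats and dogs",
    "coordinates": {"type": "Point", "coordinates": [-85.7629, 38.2267]},
    "place": {
        "full_name": "Louisville Central",
        "bounding_box": {
            "type": "Polygon",
            "coordinates": [[[-85.80, 38.20], [-85.80, 38.26],
                             [-85.73, 38.26], [-85.73, 38.20]]],
        },
    },
    "user": {"location": "I live in Louisville, home of the Derby!"},
}

print(tweet_point(sample), place_bbox(sample), geo_sources(sample))
```

Note the coordinate order: longitude comes first in each pair, the reverse of the everyday "lat, long" habit.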
For sample JSON showing how this metadata is delivered in the Tweet payload, along with details on how to filter on it, see this article.
How can I use this metadata to geo-reference Tweets?
Twitter PowerTrack provides many ways to filter on these types of geospatial metadata. These filters, or rules, are built from more than fifty PowerTrack Operators (see the complete list HERE).
See our Filtering Twitter by location article for an introduction to the PowerTrack Operators that can be used to filter on Tweet Locations and Profile Locations. Because Profile Locations are by far the largest source of Twitter geographic metadata, yet their free-form text is hard to filter on directly, Twitter provides the Profile Geo enrichment, which turns that text into structured location data.
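Rules are plain strings, so covering an area often means combining a topic clause with many location clauses and splitting them across several rules. The hypothetical Python helper below does that packing, relying on PowerTrack's convention that a space between clauses means AND. The 2,048-character rule limit is an assumption to confirm for your PowerTrack plan; `place_country:` filters on Tweet Locations, while `profile_country:` is assumed here to be one of the Operators made available by the Profile Geo enrichment.

```python
from typing import List

# ASSUMPTION: maximum length of one PowerTrack rule value.
MAX_RULE_CHARS = 2048


def or_group(terms: List[str]) -> str:
    """Join terms into a parenthesized OR clause."""
    return "(" + " OR ".join(terms) + ")"


def pack_rules(topic: str, geo_clauses: List[str],
               max_chars: int = MAX_RULE_CHARS) -> List[str]:
    """Hypothetical helper: AND a topic clause with OR-ed geo clauses,
    starting a new rule whenever the next clause would exceed max_chars."""
    rules: List[str] = []
    batch: List[str] = []
    for clause in geo_clauses:
        if len(f"{topic} {or_group([clause])}") > max_chars:
            raise ValueError(f"clause too long for one rule: {clause}")
        if batch and len(f"{topic} {or_group(batch + [clause])}") > max_chars:
            rules.append(f"{topic} {or_group(batch)}")
            batch = []
        batch.append(clause)
    if batch:
        rules.append(f"{topic} {or_group(batch)}")
    return rules


topic = or_group(['"health care"', "healthcare"])
print(pack_rules(topic, ["place_country:US", "profile_country:US"]))
```

OR-ing a Tweet Location clause with a Profile Location clause, as in the last line, widens coverage beyond the small share of geo-tagged Tweets while still restricting results to one country.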
Because Profile Geo vastly increases the amount of usable geographic data, the enrichment has been widely adopted. For an introduction to the power of the Twitter Profile Geo enrichment, see our documentation HERE.
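With the enrichment, code can read structured fields instead of parsing profile text. The following is a minimal sketch under a loudly stated assumption: that enriched Tweets carry Profile Geo results under `user.derived.locations`, each entry with `country_code`, `region`, and `locality` keys. Confirm the exact field layout in the Profile Geo documentation; the helper names and the `enriched` payload are hypothetical.

```python
from typing import Any, Dict, Optional

# ASSUMPTION: enriched Tweets carry Profile Geo results under
# user.derived.locations, each with country_code, region, and locality keys.
FIELDS = ("country_code", "region", "locality")


def profile_geo(tweet: Dict[str, Any]) -> Optional[Dict[str, Optional[str]]]:
    """Structured Profile Geo fields for the first derived location, or None
    when the enrichment did not resolve the profile location."""
    user = tweet.get("user") or {}
    locations = (user.get("derived") or {}).get("locations") or []
    if not locations:
        return None
    return {field: locations[0].get(field) for field in FIELDS}


def in_region(tweet: Dict[str, Any], country_code: str, region: str) -> bool:
    """True if the enriched profile location falls in the given region."""
    geo = profile_geo(tweet)
    return geo is not None and geo["country_code"] == country_code and geo["region"] == region


# Made-up enriched payload for the profile "I live in Louisville, home of the Derby!".
enriched = {
    "user": {
        "location": "I live in Louisville, home of the Derby!",
        "derived": {"locations": [{
            "country": "United States", "country_code": "US",
            "region": "Kentucky", "locality": "Louisville",
        }]},
    }
}
print(profile_geo(enriched), in_region(enriched, "US", "Kentucky"))
```

Compared with classifying the raw string yourself, the enrichment moves parsing and disambiguation upstream; a profile it could not resolve, such as “My parents’ basement,” simply yields None here.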