Building rules for filtered stream
The filtered stream endpoints deliver Tweets to you in real time that match a set of rules applied to the stream. Rules are made up of operators that match on a variety of Tweet attributes.
Multiple rules can be applied to a stream using the POST /tweets/search/stream/rules endpoint. Once you’ve added rules and connected to your stream using the GET /tweets/search/stream endpoint, only those Tweets that match your rules will be delivered in real time through a persistent streaming connection. You do not need to disconnect from your stream to add or remove rules.
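As a rough sketch of what a request body for the rules endpoint might look like, the helpers below build add and delete payloads without sending anything. The `add`/`delete` JSON shape and the optional `tag` label are assumptions taken from the endpoint's API reference rather than from this page, and the function names are illustrative only.

```python
import json

# Path quoted in the text above; the host and authentication are left out on purpose.
RULES_PATH = "/tweets/search/stream/rules"

def build_add_payload(rules):
    """Build a body for adding rules; `rules` maps a tag label to a rule string."""
    # Assumed shape: {"add": [{"value": <rule>, "tag": <label>}, ...]}
    return {"add": [{"value": value, "tag": tag} for tag, value in rules.items()]}

def build_delete_payload(rule_ids):
    """Build a body for removing rules by their IDs (assumed shape)."""
    return {"delete": {"ids": list(rule_ids)}}

payload = build_add_payload(
    {"twitter-data": '"twitter data" has:mentions (has:media OR has:links)'}
)
print("POST", RULES_PATH)
print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the HTTP call makes it easy to inspect a rule set before it is submitted.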
To learn more about how to create high-quality rules, visit the following tutorial:
Building high-quality filters for getting Twitter data
Building a rule
Rule limitations
Your rules will be limited depending on which product track you are using.
If you are using the Standard product track at the Basic level of access, you are able to add 25 concurrent rules to your stream, and each rule can be 512 characters long.
If you are using the Academic Research product track, you are able to add 1000 concurrent rules to your stream, and each rule can be 1024 characters long.
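The limits above can be checked locally before submitting a rule set. This is a minimal sketch: the track labels are hypothetical keys, and it assumes that Python's `len()` counts characters the same way the API does, which may not hold for every character.

```python
# Limits quoted in the section above, keyed by hypothetical track labels.
LIMITS = {
    "standard_basic": {"max_rules": 25, "max_chars": 512},
    "academic_research": {"max_rules": 1000, "max_chars": 1024},
}

def check_rules(rules, track):
    """Return a list of problems found; an empty list means the rule set fits."""
    limits = LIMITS[track]
    problems = []
    if len(rules) > limits["max_rules"]:
        problems.append(
            f"{len(rules)} rules exceeds the limit of {limits['max_rules']}"
        )
    for rule in rules:
        # The whole rule string counts, including spaces and operators.
        if len(rule) > limits["max_chars"]:
            problems.append(
                f"rule is {len(rule)} characters long, limit is {limits['max_chars']}"
            )
    return problems

print(check_rules(["#twitterapiv2", "cat " * 200], "standard_basic"))
```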
Operator availability
While most operators are available to any developer, there are several that are reserved for those that have been approved for the Academic Research product track. We list which product tracks each operator is available to in the list of operators table using the following labels:
- All: Available when using any Project.
- Academic Research only: Available when using an Academic Research Project.
Operator types: standalone and conjunction-required
Standalone operators can be used alone or together with any other operators (including those that require conjunction).
For example, the following rule will work because it uses the #hashtag operator, which is standalone:
#twitterapiv2
Conjunction required operators cannot be used by themselves in a rule; they can only be used when at least one standalone operator is included in the rule. This is because using these operators alone would be far too general, and would match on an extremely high volume of Tweets.
For example, the following rules are not supported since they contain only conjunction required operators:
has:media
has:links OR is:retweet
If we add in a standalone operator, such as the phrase "twitter data", the rule would then work properly.
"twitter data" has:mentions (has:media OR has:links)
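A simple local check can catch rules that contain only conjunction-required operators. The sketch below is an approximation, not the service's own validation: it assumes the conjunction-required operators are those prefixed with `is:`, `has:`, `lang:`, or `sample:`, and it conservatively ignores negated terms when looking for a standalone operator.

```python
import re

# Prefixes assumed to mark conjunction-required operators.
CONJUNCTION_REQUIRED = ("is:", "has:", "lang:", "sample:")

def has_standalone_operator(rule):
    """Return True if the rule has at least one non-negated standalone operator."""
    # Match quoted phrases first, then any run of characters that is not
    # whitespace or a parenthesis.
    tokens = re.findall(r'-?"[^"]*"|[^\s()]+', rule)
    for token in tokens:
        if token == "OR" or token.startswith("-"):
            continue  # skip the boolean keyword and negated operators
        if not token.lower().startswith(CONJUNCTION_REQUIRED):
            return True
    return False

for rule in ["has:media", "has:links OR is:retweet",
             '"twitter data" has:mentions (has:media OR has:links)']:
    print(rule, "->", has_standalone_operator(rule))
```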
Boolean operators and grouping
If you would like to string together multiple operators in a single rule, you have the following tools at your disposal:
AND logic | Successive operators with a space between them will result in boolean "AND" logic, meaning that Tweets will match only if both conditions are met. For example, snow day #NoSchool will match Tweets containing the terms snow and day and the hashtag #NoSchool. |
OR logic | Successive operators with OR between them will result in OR logic, meaning that Tweets will match if either condition is met. For example, specifying grumpy OR cat OR #meme will match any Tweets containing at least the terms grumpy or cat, or the hashtag #meme. |
NOT logic, negation | Prepend a dash (-) to a keyword (or any operator) to negate it (NOT). For example, cat #meme -grumpy will match Tweets containing the hashtag #meme and the term cat, but only if they do not contain the term grumpy. One common rule clause is -is:retweet, which will not match on Retweets, thus matching only on original Tweets. All operators can be negated, but negated operators cannot be used alone. Do not negate a set of operators grouped together in a set of parentheses. Instead, negate each individual operator. For example, instead of using -(grumpy OR cat OR #meme), we suggest that you use -grumpy -cat -#meme. |
Grouping | You can use parentheses to group operators together. For example, (grumpy cat) OR (#meme has:images) will return either Tweets containing the terms grumpy and cat, or Tweets with images containing the hashtag #meme. Note that ANDs are applied first, then ORs are applied. |
A note on negations
All operators can be negated except for sample:, and -is:nullcast must always be negated. Negated operators cannot be used alone.
Do not negate a set of operators grouped together in a set of parentheses. Instead, negate each individual operator.
For example, instead of using skiing -(snow OR day OR noschool), we suggest that you use skiing -snow -day -noschool.
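Rewriting a negated group by hand is mechanical, so it can be automated for the simple case. The helper below is a sketch with a hypothetical name: it only handles a flat group of single terms joined by OR, and leaves anything else (nested parentheses, space-separated AND terms) untouched.

```python
import re

def distribute_negation(rule):
    """Rewrite -(a OR b OR c) as -a -b -c for flat OR groups of single terms."""
    def expand(match):
        terms = [term.strip() for term in match.group(1).split(" OR ")]
        # A space inside an unquoted term means AND logic; leave such groups alone.
        if any(" " in term and not term.startswith('"') for term in terms):
            return match.group(0)
        return " ".join(f"-{term}" for term in terms)

    # Only matches a negated group that has no parentheses nested inside it.
    return re.sub(r"-\(([^()]*)\)", expand, rule)

print(distribute_negation("skiing -(snow OR day OR noschool)"))
```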
Order of operations
When combining AND and OR functionality, the following order of operations will dictate how your rule is evaluated.
- Operators connected by AND logic are combined first
- Then, operators connected with OR logic are applied
For example:
- apple OR iphone ipad would be evaluated as apple OR (iphone ipad)
- ipad iphone OR android would be evaluated as (ipad iphone) OR android
To eliminate uncertainty and ensure that your rule is evaluated as intended, group terms together with parentheses where appropriate.
For example:
- (apple OR iphone) ipad
- iphone (ipad OR android)
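Python's `and` also binds more tightly than `or`, so the precedence described above can be mimicked directly. The sketch below uses naive whitespace tokenization and hypothetical function names purely to show how grouping changes the outcome; it is not how the stream evaluates rules internally.

```python
def tokens(text):
    """Naive tokenization: lowercase and split on whitespace."""
    return set(text.lower().split())

def ungrouped(text):
    # apple OR iphone ipad  ->  apple OR (iphone AND ipad)
    t = tokens(text)
    return "apple" in t or "iphone" in t and "ipad" in t

def grouped(text):
    # (apple OR iphone) ipad  ->  the parentheses force the OR to apply first
    t = tokens(text)
    return ("apple" in t or "iphone" in t) and "ipad" in t

sample = "I bought an apple"
print(ungrouped(sample), grouped(sample))
```

The same text matches the ungrouped rule through the lone term apple, but not the grouped rule, which also requires ipad.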
Punctuation, diacritics, and case sensitivity
If you specify a keyword or hashtag rule with character accents or diacritics, it will match only Tweet text (keywords or hashtags) that contains the same diacritics. A rule with the keyword Diacrítica or the hashtag #cumpleaños will match Diacrítica or #cumpleaños, but not Diacritica or #cumpleanos, which lack the accented í and the ñ.
Characters with accents or diacritics are treated the same as normal characters and are not treated as word boundaries. For example, a rule with the keyword cumpleaños would only match Tweets containing the word cumpleaños and would not match Tweets containing cumplea, cumplean, or os.
All operators are evaluated in a case-insensitive manner. For example, the rule cat will match all Tweets that include the following: cat, CAT, Cat.
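The behavior described in this section can be approximated locally as follows. This is an illustrative sketch for single keywords only (not hashtags or phrases): the `\w+` tokenizer and the NFC normalization step are assumptions, not the service's actual tokenizer.

```python
import re
import unicodedata

def tokenize(text):
    """Case-fold the text and split it on anything that is not a word character."""
    # NFC keeps accented letters as single characters so they stay inside a token.
    normalized = unicodedata.normalize("NFC", text).casefold()
    return re.findall(r"\w+", normalized)

def keyword_matches(keyword, tweet_text):
    """Case-insensitive, diacritic-sensitive match of one keyword against tokens."""
    keyword_tokens = tokenize(keyword)
    return bool(keyword_tokens) and keyword_tokens[0] in tokenize(tweet_text)

print(keyword_matches("cat", "My CAT is asleep"))
print(keyword_matches("cumpleaños", "Feliz CUMPLEAÑOS"))
print(keyword_matches("cumpleanos", "Feliz cumpleaños"))
```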
Specificity and efficiency
When you start to build your rule, it is important to keep a few things in mind.
- Using broad, standalone operators for your rule such as a single keyword or #hashtag is generally not recommended since it will likely match on a massive volume of Tweets. Creating a more robust rule will result in a more specific set of matching Tweets, and will hopefully reduce the amount of noise in the payload that you will need to sift through to find valuable insights.
- For example, if your rule was just the keyword happy you will likely get anywhere from 200,000 - 300,000 Tweets per day.
- Adding more conditional operators narrows your search results, for example (happy OR happiness) place_country:GB -birthday -is:retweet
- Writing efficient rules also helps you stay within the rule length limit. The character count includes the entire rule string, including spaces and operators.
- For example, the following rule is 59 characters long: (happy OR happiness) place_country:GB -birthday -is:retweet
Iteratively building a rule
Test your rule early and often
Getting a rule to return the "right" results the first time is rare. There is so much on Twitter that may or may not be obvious at first and the rule syntax described above may be hard to match to your desired search. As you build a rule, it is important for you to periodically test it out with the stream endpoint to see what data it returns. You can also test with one of the Search Tweet endpoints, assuming the operators that you are using are also available via that endpoint.
For this section, we are going to start with the following rule and adjust it based on the results that we receive during our test:
happy OR happiness
Use results to narrow the rule
As you test the rule, you should scan the returned Tweets to see if they include the data that you are expecting and hoping to receive. Starting with a broad rule and a superset of Tweet matches allows you to review the result and narrow the rule to filter out undesired results.
When we tested the example rule, we noticed that we were getting Tweets in a variety of different languages. In this situation, we want to only receive Tweets that are in English, so we’re going to add the lang: operator:
(happy OR happiness) lang:en
The test delivered a number of Tweets wishing people a happy birthday, so we are going to add -birthday as a negated keyword operator. We also want to only receive original Tweets, so we’ve added the negated -is:retweet operator:
(happy OR happiness) lang:en -birthday -is:retweet
Adjust for inclusion where needed
If you notice that you are not receiving data that you expect and know that there are existing Tweets that should return, you may need to broaden your rule by removing operators that may be filtering out the desired data.
For our example, we noticed that there were other Tweets in our personal timeline that expressed the emotion that we are looking for and weren’t included in the test results. To ensure we have greater coverage, we are going to add the keywords excited and elated:
(happy OR happiness OR excited OR elated) lang:en -birthday -is:retweet
Adjust for popular trends/bursts over the time period
Trends come and go on Twitter quickly. Maintaining your rule should be an active process. If you plan to use a single rule for a while, we suggest that you periodically check in on the data that you are receiving to see if you need to make any adjustments.
In our example, we noticed that we started to receive some Tweets that are wishing people “happy holidays”. Since we don’t want these Tweets included in our results, we are going to add a negated -holidays keyword:
(happy OR happiness OR excited OR elated) lang:en -birthday -is:retweet -holidays
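When a rule is adjusted repeatedly, as in the walkthrough above, it can help to assemble the rule string from its parts instead of editing it by hand. The helper below is a sketch under that assumption; the function and its parameter names are hypothetical and not part of any Twitter library.

```python
def build_rule(any_of, lang=None, exclude=(), original_only=False):
    """Compose a rule string from keyword lists and a few optional filters."""
    parts = ["(" + " OR ".join(any_of) + ")"]
    if lang:
        parts.append(f"lang:{lang}")
    # Each excluded keyword becomes a negated keyword operator.
    parts.extend(f"-{term}" for term in exclude)
    if original_only:
        parts.append("-is:retweet")
    return " ".join(parts)

rule = build_rule(
    ["happy", "happiness", "excited", "elated"],
    lang="en",
    exclude=["birthday", "holidays"],
    original_only=True,
)
print(rule, f"({len(rule)} characters)")
```

The negated operators come out in a slightly different order than in the hand-written rule; since successive operators are combined with AND logic, the order should not affect which Tweets match.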
Operators
Operator | Type | Availability | Description |
---|---|---|---|
keyword | Standalone | All | Matches a keyword within the body of a Tweet. This is a tokenized match, meaning that your keyword string will be matched against the tokenized text of the Tweet body. Tokenization splits words based on punctuation, symbols, and Unicode basic plane separator characters. For example, a Tweet with the text “I like coca-cola” would be split into the following tokens: I, like, coca, cola. These tokens would then be compared to the keyword string used in your rule. To match strings containing punctuation (for example coca-cola), symbol, or separator characters, you must wrap your keyword in double-quotes. Example: pepsi OR cola OR "coca cola" |
emoji | Standalone | All | Matches an emoji within the body of a Tweet. Similar to a keyword, emojis are a tokenized match, meaning that your emoji will be matched against the tokenized text of the Tweet body. Note that if an emoji has a variant, you must wrap it in double quotes to add to a rule. Example: (😃 OR 😡) 😬 |
"exact phrase match" | Standalone | All | Matches the exact phrase within the body of a Tweet. Example: ("Twitter API" OR #v2) -"filtered stream" |
# | Standalone | All | Matches any Tweet containing the given hashtag, if the hashtag is a recognized entity in the Tweet. This operator performs an exact match, NOT a tokenized match, meaning the rule #thanku will match Tweets with the exact hashtag #thanku, but not those with the hashtag #thankunext. Example: #thankunext #fanart OR @arianagrande |
@ | Standalone | All | Matches any Tweet that mentions the given username, if the username is a recognized entity (including the @ character). Example: (@twitterdev OR @twitterapi) -@twitter |
$ | Standalone | Academic Research only | Matches any Tweet that contains the specified ‘cashtag’ (where the leading character of the token is the ‘$’ character). Note that the cashtag operator relies on Twitter’s ‘symbols’ entity extraction to match cashtags, rather than trying to extract the cashtag from the body itself. Example: $twtr OR @twitterdev -$fb |
from: | Standalone | All | Matches any Tweet from a specific user. The value can be either the username (excluding the @ character) or the user’s numeric user ID. Example: from:twitterdev OR from:twitterapi -from:twitter |
to: | Standalone | All | Matches any Tweet that is in reply to a particular user. The value can be either the username (excluding the @ character) or the user’s numeric user ID. Example: to:twitterdev OR to:twitterapi -to:twitter |
url: | Standalone | All | Performs a tokenized match on any validly formatted URL of a Tweet. This operator can match on the contents of either the url or expanded_url fields. For example, a Tweet containing "You should check out Twitter Developer Labs: https://t.co/c0A36SWil4" (with the short URL redirecting to https://developer.twitter.com) will match both of the following rules: from:TwitterDev url:"https://developer.twitter.com" (because it will match the contents of entities.urls.expanded_url) from:TwitterDev url:"https://t.co" (because it will match the contents of entities.urls.url) Tokens and phrases containing punctuation or special characters should be double-quoted (for example, url:"/developer"). Similarly, to match on a specific protocol, enclose in double-quotes (for example, url:"https://developer.twitter.com"). |
retweets_of: | Standalone | All | Matches Tweets that are Retweets of the specified user. The value can be either the username (excluding the @ character) or the user’s numeric user ID. Example: retweets_of:twitterdev OR retweets_of:twitterapi |
context: | Standalone | All | NEW Matches Tweets with a specific domain ID and/or a specific domain ID, entity ID pair, where * represents a wildcard. To learn more about this operator, please visit our page on annotations. context:domain_id.entity_id context:domain_id.* context:*.entity_id Examples: context:10.799022225751871488 (domain_id.entity_id returns Tweets matching that specific domain-entity pair) context:47.* (domain_id.* returns Tweets matching that domain ID, with any domain-entity pair) context:*.799022225751871488 (*.entity_id returns Tweets matching that entity ID, with any domain-entity pair) |
entity: | Standalone | All | NEW Matches Tweets with a specific entity string value. To learn more about this operator, please visit our page on annotations. entity:"string declaration of entity/place" Examples: entity:"Michael Jordan" OR entity:"Barcelona" |
conversation_id: | Standalone | All | NEW Matches Tweets that share a common conversation ID. A conversation ID is set to the Tweet ID of a Tweet that started a conversation. As Replies to a Tweet are posted, even Replies to Replies, the conversation_id is added to its JSON payload. Example: conversation_id:1334987486343299072 (from:twitterdev OR from:twitterapi) |
bio: | Standalone | Academic Research only | Matches a keyword or phrase within the Tweet publisher's bio. This is a tokenized match within the contents of the description field within the User object. Example: bio:developer OR bio:"data engineer" OR bio:academic |
bio_name: | Standalone | Academic Research only | Matches a keyword within the Tweet publisher's user bio name. This is a tokenized match within the contents of a user’s “name” field within the User object. Example: bio_name:phd OR bio_name:md |
bio_location: | Standalone | Academic Research only | Matches Tweets that are published by users whose location contains the specified keyword or phrase. This operator performs a tokenized match, similar to the normal keyword rules on the message body. This location is part of the User object, matches on the 'location' field, and is a non-normalized, user-generated, free-form string. It is also different from a Tweet's location (see place:). Example: bio_location:"big apple" OR bio_location:nyc OR bio_location:manhattan |
place: | Standalone | Academic Research only | Matches Tweets tagged with the specified location or Twitter place ID. Multi-word place names (“New York City”, “Palo Alto”) should be enclosed in quotes. Note: See the GET geo/search standard v1.1 endpoint for how to obtain Twitter place IDs. Note: This operator will not match on Retweets, since Retweet's places are attached to the original Tweet. It will also not match on places attached to the original Tweet of a Quote Tweet. Example: place:"new york city" OR place:seattle OR place:fd70c22040963ac7 |
place_country: | Standalone | Academic Research only | Matches Tweets where the country code associated with a tagged place/location matches the given ISO alpha-2 character code. You can find a list of valid ISO codes on Wikipedia. Note: This operator will not match on Retweets, since Retweet's places are attached to the original Tweet. It will also not match on places attached to the original Tweet of a Quote Tweet. Example: place_country:US OR place_country:MX OR place_country:CA |
point_radius: | Standalone | Academic Research only | Matches against the place.geo.coordinates object of the Tweet when present, and in Twitter, against a place geo polygon, where the Place polygon is fully contained within the defined region. Note: This operator will not match on Retweets, since Retweet's places are attached to the original Tweet. It will also not match on places attached to the original Tweet of a Quote Tweet. Example: point_radius:[2.355128 48.861118 16km] OR point_radius:[174.761070 -41.287336 20mi] |
bounding_box: | Standalone | Academic Research only | Matches against the place.geo.coordinates object of the Tweet when present, and in Twitter, against a place geo polygon, where the place polygon is fully contained within the defined region. bounding_box:[west_long south_lat east_long north_lat] Note: This operator will not match on Retweets, since Retweet's places are attached to the original Tweet. It will also not match on places attached to the original Tweet of a Quote Tweet. Example: bounding_box:[-105.301758 39.964069 -105.178505 40.09455] |
is:retweet | Conjunction required | All | Matches on Retweets that match the rest of the specified rule. This operator looks only for true Retweets (for example, those generated using the Retweet button). Quote Tweets will not be matched by this operator. Example: data @twitterdev -is:retweet |
is:reply | Conjunction required | All | Deliver only explicit replies that match a rule. Can also be negated to exclude replies that match a rule from delivery. When used with the filtered stream, this operator matches on replies to an original Tweet, replies in quoted Tweets and replies in Retweets. Example: from:twitterdev is:reply |
is:quote | Conjunction required | All | Returns all Quote Tweets, also known as Tweets with comments. Example: "sentiment analysis" is:quote |
is:verified | Conjunction required | All | Deliver only Tweets whose authors are verified by Twitter. Example: #nowplaying is:verified |
-is:nullcast | Conjunction required | Academic Research only | Removes Tweets created for promotion only on ads.twitter.com that have a "source":"Twitter for Advertisers (legacy)" or "source":"Twitter for Advertisers". This operator must be negated. For more info on Nullcasted Tweets, see our page on Tweet availability. Example: "mobile games" -is:nullcast |
has:hashtags | Conjunction required | All | Matches Tweets that contain at least one hashtag. Example: from:twitterdev -has:hashtags |
has:cashtags | Conjunction required | Academic Research only | Matches Tweets that contain a cashtag symbol (with a leading ‘$’ character, for example $tag). Example: #stonks has:cashtags |
has:links | Conjunction required | All | This operator matches Tweets which contain links and media in the Tweet body. Example: from:twitterdev announcement has:links |
has:mentions | Conjunction required | All | Matches Tweets that mention another Twitter user. Example: #nowplaying has:mentions |
has:media | Conjunction required | All | Matches Tweets that contain a media object, such as a photo, GIF, or video, as determined by Twitter. This will not match on media created with Periscope, or Tweets with links to other media hosting sites. Example: (kittens OR puppies) has:media |
has:images | Conjunction required | All | Matches Tweets that contain a recognized URL to an image. Example: #meme has:images |
has:videos | Conjunction required | All | Matches Tweets that contain native Twitter videos, uploaded directly to Twitter. This will not match on videos created with Periscope, or Tweets with links to other video hosting sites. Example: #icebucketchallenge has:videos |
has:geo | Conjunction required | Academic Research only | Matches Tweets that have Tweet-specific geolocation data provided by the Twitter user. This can be either a location in the form of a Twitter place, with the corresponding display name, geo polygon, and other fields, or in rare cases, a geo lat-long coordinate. Note: Operators matching on place (Tweet geo) will only include matches from original tweets. Retweets do not contain any place data. Example: recommend #paris has:geo -bakery |
sample: | Conjunction required | All | Returns a random percent sample of Tweets that match a rule rather than the entire set of Tweets. The percent value must be represented by an integer between 1 and 100 (for example, sample:10 will return a random 10% sample). This operator first reduces the scope of the stream to the percentage you specified, then the rule/filter is applied to that sampled subset. In other words, if you are using, for example, sample:10, each Tweet will have a 10% chance of being in the sample. This operator applies to the entire rule and requires all OR'd terms to be grouped. Example: #nowplaying @spotify sample:15 |
lang: | Conjunction required | All | Matches Tweets that have been classified by Twitter as being of a particular language (if, and only if, the tweet has been classified). It is important to note that each Tweet is currently only classified as being of one language, so AND’ing together multiple languages will yield no results. Note: if no language classification can be made the provided result is ‘und’ (for undefined). Example: recommend #paris lang:en The list below represents the currently supported languages and their corresponding BCP 47 language identifier:
|