Introduction 

Premium and enterprise products use a specific operator language to deliver filtered social data to you based on the rule(s) you define. Rules are made up of one or more ‘clauses’, where a clause is a keyword, exact phrase, or one of the many operators. Before beginning to build rules with these operators, be sure to review the syntax described below, look through the list of available operators, and understand the restrictions around building rules. You should also understand the nuances of how rules are evaluated logically, described in the ‘Order of Operations’ section.

Multiple clauses can be combined with boolean logic. 
‘And'ed’ logic is specified with a space between clauses 
‘Or'ed’ logic is specified with an upper-case OR. 

See below for more details…

Each rule can be up to a specific character length depending on the level of access with no limits on the number of positive clauses (things you want to match or filter on) and negative clauses (things you want to exclude and not match on).

Rule length limits: sandbox (256 characters), premium (1,024 characters), enterprise (2,048 characters)
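As a rough illustration of these limits, the sketch below checks a rule's length before it is submitted. The function and dictionary names are hypothetical, and it assumes the limit is counted in characters as Python's len() counts them, which may not be exactly how the API counts.

```python
# Character limits per access level, taken from the list above.
RULE_LENGTH_LIMITS = {"sandbox": 256, "premium": 1024, "enterprise": 2048}


def check_rule_length(rule: str, access_level: str) -> bool:
    """Return True if the rule fits within the limit for the access level."""
    # Assumption: the limit applies to the number of characters in the rule.
    return len(rule) <= RULE_LENGTH_LIMITS[access_level]


rule = "(happy OR party) -birthday"
print(check_rule_length(rule, "sandbox"))
```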

Note that operators may be either positive or negative.

Positive Operators define what you want to include in the results. E.g. the ‘has:hashtags’ operator says “I want activities containing hashtags.”

Negative Operators define what you want to exclude from the results, and are created by using the Boolean NOT logic described below. E.g. ‘-has:hashtags’ says “Exclude any activities containing hashtags, even if they otherwise match my rule.”

Products have no restriction on the number of positive and negative clauses, subject to the rule length limit for your level of access (up to 2,048 characters for enterprise).

 

Boolean Syntax

Rule creation utilizes various types of boolean logic and grouping. See the table below for detail regarding the syntax and requirements for each.

Logic type: AND
Premium operator syntax: social data
Description: Whitespace between two operators results in AND logic between them.

Matches activities containing BOTH keywords ('social', 'data').

Do not use AND explicitly in your rule. Only use whitespace.
An explicit AND will be treated as a regular keyword.

Logic type: OR
Premium operator syntax: social OR data
Description: To OR together two operators, insert an all-caps OR, enclosed in whitespace, between them.

Matches activities with EITHER keyword ('social' OR 'data').

Note that if you combine OR and AND functionality in a single rule, you should understand the order of operations described in the ‘Order of Operations’ section below, and consider grouping operators together using parentheses to ensure your rule behaves as expected.

You must use upper-case 'OR' in your rule.
Lower-case 'or' will be treated as a regular keyword.

Logic type: NOT
Premium operator syntax:
social -data
apple -(fruit OR orange)
apple -(android phone)
Description: Insert a - character immediately in front of the operator or group of operators.

The first example rule shown matches activities containing the keyword 'social', but excludes those which contain the keyword 'data'.

Negated ORs are not allowed where the rule would request "everything in the firehose except the negation." E.g., apple OR -ipad is invalid because it would match all activities except those mentioning 'ipad'.

Logic type: Grouping
Premium operator syntax: (social OR data) -(gnip OR ping)
Description: Parentheses around multiple operators create a functional "group".

Groups can be connected to clauses in the same manner as an individual clause, via whitespace (AND) or ORs, and can be negated. However, note that the same restriction described above regarding negation/OR combination also applies to groups. For example, the following are examples of invalid syntax using groups:
ipad OR -(iphone OR ipod)
ipad OR (-iphone OR ipod)

Grouping is especially important where a single rule combines AND and OR functionality, due to the order of operations used to evaluate the rule. See below for more details.


Order of Operations

When combining AND and OR functionality in a single rule, the following order of operations will dictate how your rule is evaluated.

  1. Operators connected by AND logic are combined first
  2. Then, operators connected with OR logic are applied

Example:

  • apple OR iphone ipad would be evaluated as apple OR (iphone ipad)
  • ipad iphone OR android would be evaluated as (ipad iphone) OR android

To eliminate uncertainty and ensure that your rules are evaluated as intended, group terms together with parentheses where appropriate. For example:

  • (apple OR iphone) ipad
  • iphone (ipad OR android)
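To make the precedence visible, the following sketch adds the parentheses that AND-before-OR evaluation implies. It is a simplification under stated assumptions: show_precedence is a hypothetical helper, and it only handles flat rules without parentheses, quoted phrases, or negated groups.

```python
def show_precedence(rule: str) -> str:
    """Add the parentheses implied by evaluating AND before OR."""
    # Collect whitespace-separated terms into groups, starting a new group
    # at each upper-case OR. Terms inside one group are AND'ed together.
    groups = [[]]
    for token in rule.split():
        if token == "OR":
            groups.append([])
        else:
            groups[-1].append(token)

    rendered = []
    for terms in groups:
        text = " ".join(terms)
        # Wrap only multi-term AND groups so single terms stay readable.
        rendered.append(f"({text})" if len(terms) > 1 else text)
    return " OR ".join(rendered)


print(show_precedence("apple OR iphone ipad"))    # apple OR (iphone ipad)
print(show_precedence("ipad iphone OR android"))  # (ipad iphone) OR android
```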

Order of Operations exceptions:

If operators are grouped by parentheses and the group is negated, the entire group is evaluated as a whole first, ignoring the negation, and the result is then negated. The negation is not distributed across the operators in the group. If the operators in a group are all AND'ed together and one is false, the entire group is false, which is then turned into true by the negation.

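Examples (the evaluations below assume a Tweet from furiouscamper that contains "fresh tunes" and is not a Retweet):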

from:furiouscamper -(fresh tunes is:retweet)

This evaluates to: true AND NOT(true AND true AND false) ==> true AND NOT(false) ==> true AND true ==> true

 

from:furiouscamper -fresh -tunes -is:retweet

true AND NOT true AND NOT true AND NOT false ==> true AND false AND false AND true ==> false

 

from:furiouscamper -("fresh tunes" is:retweet)

Similarly, true AND NOT(true AND false) ==> true AND NOT(false) ==> true AND true ==> true

 

from:furiouscamper -"fresh tunes" -is:retweet

true AND NOT true AND NOT false ==> true AND false AND true ==> false
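The four evaluations can be reproduced with plain booleans. This is only a sketch: the variable names are made up for illustration, and the truth values assume a Tweet from furiouscamper that contains "fresh tunes" and is not a Retweet.

```python
# Truth values of each clause for the assumed Tweet.
from_user = True           # from:furiouscamper
fresh = True               # keyword: fresh
tunes = True               # keyword: tunes
fresh_tunes_phrase = True  # exact phrase: "fresh tunes"
is_retweet = False         # is:retweet

# -(fresh tunes is:retweet): evaluate the group first, then negate it.
grouped = from_user and not (fresh and tunes and is_retweet)

# -fresh -tunes -is:retweet: each clause is negated on its own.
separate = from_user and (not fresh) and (not tunes) and (not is_retweet)

# -("fresh tunes" is:retweet)
grouped_phrase = from_user and not (fresh_tunes_phrase and is_retweet)

# -"fresh tunes" -is:retweet
separate_phrase = from_user and (not fresh_tunes_phrase) and (not is_retweet)

print(grouped, separate, grouped_phrase, separate_phrase)
# True False True False
```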

 


Punctuation, Diacritics, and Case Sensitivity

If you specify a keyword or hashtag rule with character accents or diacritics for premium operators, it will match Tweet text (hashtags or keywords) honoring the diacritics. A rule with the keyword Diacrítica or hashtag #cumpleaños will match Diacrítica or #cumpleaños, but not Diacritica or #cumpleanos, which lack the accented í and the ñ.

Characters with accents or diacritics are treated the same as normal characters and are not treated as word boundaries. For example, a rule of cumpleaños would only match activities containing the word cumpleaños and would not match activities containing cumplea, cumplean, or os.

All operators are evaluated in a case-insensitive manner. For example, the rule Cat will match all of the following: cat, CAT, Cat.
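The behavior described in this section can be approximated as a comparison that ignores case but keeps diacritics. The sketch below is an illustration only: keyword_matches is a hypothetical helper that compares a rule keyword against a single token, and the Unicode normalization step is an assumption added so that composed and decomposed forms of the same character compare equal.

```python
import unicodedata


def keyword_matches(rule_keyword: str, token: str) -> bool:
    """Compare case-insensitively while leaving diacritics untouched."""
    # Assumption: normalize to NFC so 'í' typed in different ways is equal.
    a = unicodedata.normalize("NFC", rule_keyword).casefold()
    b = unicodedata.normalize("NFC", token).casefold()
    return a == b


print(keyword_matches("Cat", "CAT"))                # True
print(keyword_matches("Diacrítica", "diacrítica"))  # True
print(keyword_matches("Diacrítica", "Diacritica"))  # False
print(keyword_matches("cumpleaños", "cumplean"))    # False
```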

Building rules with operators (example)

This example builds a rule to capture Tweets from attendees of a specific house party on New Year's Eve.

Start with a keyword match

Keyword matches are similar to queries in a search interface (e.g. Google). For example, the following premium operator rule would match activities with ‘happy’ in the text body.

    happy

ANDing terms with white space

Adding another keyword is the same as adding another requirement for finding matches. For example, this rule would only match activities where both ‘happy’ and ‘party’ were present in the text, in either order – having a space between terms operates as boolean AND logic. If you include an explicit AND in your rule, it will be rejected by the rules endpoint.

    happy party

ORing terms with upper-case OR

Many situations actually call for boolean OR logic, however. This is easily accomplished as well. Note that the OR operator must be upper-case and a lower-case ‘or’ will be treated as a regular keyword.

    happy OR party

Negating terms

Still, other scenarios might call for excluding results with certain keywords (a boolean NOT logic). For instance, activities with ‘happy’, but excluding any with ‘birthday’ in the text.

    happy -birthday

Grouping with parentheses

These types of logic can be combined using grouping with parentheses and expanded to much more complex queries.

    (happy OR party) (holiday OR house) -(birthday OR democratic OR republican)

This is just the beginning though – while the above examples rely simply on tokenized matching for keywords, premium products also offer operators to perform different types of matching on the text.

Exact phrase match

    "new year's eve"

Substring match

    contains:day

Proximity match

    "happy birthday"~3

Further, other operators allow you to filter on unique aspects of social data, besides just the text. For example:

The user who is posting a Tweet

    from:user

Geo-tagged Tweets within 10 miles of Pearl St. in Boulder, CO

    point_radius:[-105.27346517 40.01924738 10.0mi]

Putting it all together

These can be combined with text filters using the same types of logic described above.

    (happy OR party) (holiday OR house OR "new year's eve") point_radius:[-105.27346517 40.01924738 10.0mi] lang:en -(birthday OR democratic OR republican)
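If rules are assembled in code, the combined rule above could be built from its parts as in the sketch below. The helper names any_of and negate are hypothetical and not part of any API; they only join strings using the syntax described in this document.

```python
def any_of(*terms: str) -> str:
    """OR the terms together and wrap them in a parenthesized group."""
    return "(" + " OR ".join(terms) + ")"


def negate(clause: str) -> str:
    """Negate a clause or group by prefixing it with '-'."""
    return "-" + clause


clauses = [
    any_of("happy", "party"),
    any_of("holiday", "house", '"new year\'s eve"'),
    "point_radius:[-105.27346517 40.01924738 10.0mi]",
    "lang:en",
    negate(any_of("birthday", "democratic", "republican")),
]

# Whitespace between clauses is AND logic.
rule = " ".join(clauses)
print(rule)
```

The printed rule is identical to the combined rule shown above.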

 

 

Rule Tags

As described here, each PowerTrack rule may be created with a tag. These tags have no effect on filtering, but can be used to create logical groupings of rules within your app. Each rule may have only one tag, with a maximum of 255 characters. Tags are included with the JSON formatted rule at the time of creation via the API, as described in our documentation.

Putting Rules in JSON Format

In order to add or delete a rule from a stream via the API, the rules must utilize JSON format. Essentially, this requires putting each rule into the following structure:

{"value":"insert_rule_here"}

Rules with Double-quotes

If the ‘rule’ contains double-quote characters (") associated with exact-match or other operators, they must be escaped using a backslash to distinguish them from the structure of the JSON format. For example, if your rule is:

"social data" @gnip

The JSON formatted rule would be:

{"value":"\"social data\" @gnip"}

Rules with Double-quote String Literals

To include a double-quote character as a string literal within an exact-match, it must be double-escaped. For example, for a rule matching on the exact phrase ‘Toys “R” Us’, including the double-quotes around R, the plain-text representation of this would look like the following:

"Toys \"R\" Us"

Translating this to JSON format, you should use the following structure:

{"value":"\"Toys \\\"R\\\" Us\""}
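Escaping by hand is error-prone, so one option is to let a JSON library do it. The sketch below uses Python's standard json module; to_json_rule is a hypothetical helper name, and the compact separators are chosen only to reproduce the layout shown above.

```python
import json


def to_json_rule(rule: str) -> str:
    """Serialize a plain-text rule into the {"value": ...} structure."""
    # json.dumps escapes double quotes and backslashes in the rule text.
    return json.dumps({"value": rule}, separators=(",", ":"))


# Raw strings keep the backslashes of the plain-text rule intact.
print(to_json_rule(r'"social data" @gnip'))
print(to_json_rule(r'"Toys \"R\" Us"'))
```

The two printed lines are {"value":"\"social data\" @gnip"} and {"value":"\"Toys \\\"R\\\" Us\""}, matching the hand-escaped versions above.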

Rules with Tags

To include an optional Tag with your rule, as described above, simply include an additional “tag” field with the rule value:

{"value":"\"social data\" @gnip","tag":"RULE-TAG-01"}

Formatting for API Requests

When adding or deleting rules from the stream via the API, multiple JSON formatted rules should be comma delimited, and wrapped in a JSON “rules” array, as shown below:

{"rules":[{"value":"from:gnip"},{"value":"\"social data\" @gnip","tag":"RULE-TAG-01"}]}
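A request body in this shape can also be generated rather than typed. The following sketch builds the "rules" array from (value, tag) pairs with Python's json module; rules_payload is a hypothetical helper, and sending the body to the API is left out.

```python
import json


def rules_payload(rules) -> str:
    """Build the JSON body from (value, tag) pairs; tag may be None."""
    entries = []
    for value, tag in rules:
        entry = {"value": value}
        if tag is not None:
            # The tag is optional and has no effect on filtering.
            entry["tag"] = tag
        entries.append(entry)
    return json.dumps({"rules": entries}, separators=(",", ":"))


payload = rules_payload([
    ("from:gnip", None),
    ('"social data" @gnip', "RULE-TAG-01"),
])
print(payload)
```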

Operators that Match Quote Tweets

In terms of filtering, the operators below will match on content from both the original quoted Tweet and the new “comment” Tweet.

  • Keywords
  • Phrases
  • Proximity
  • #hashtags
  • @mentions
  • $cashtags
  • url:
  • url_contains:
  • has:links
  • has:mentions
  • has:hashtags
  • has:media
  • has:symbols
  • is:quote

Premium operators

Below are the operators available in real-time and historical PowerTrack. A subset of these are available with the premium and enterprise search APIs. See this table for a product-by-product list of available operators. 

 Operator Description
keyword

Matches a keyword within the body of a Tweet. This is a tokenized match, meaning that your keyword string will be matched against the tokenized text of the Tweet body – tokenization is based on punctuation, symbol, and separator Unicode basic plane characters. For example, a Tweet with the text “I like coca-cola” would be split into the following tokens: I, like, coca, cola. These tokens would then be compared to the keyword string used in your rule. To match strings containing punctuation (e.g. coca-cola), symbol, or separator characters, you must use a quoted exact match as described below.
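The tokenized match described for the keyword operator can be approximated with a regular expression. This is a rough sketch, not the actual tokenizer: it assumes that runs of word characters (\w+) stand in for tokens, which ignores details such as emoji and underscores, and both function names are hypothetical.

```python
import re


def tokenize(text: str) -> list:
    """Split text into tokens on punctuation, symbols, and separators."""
    # Approximation: keep runs of letters, digits, and underscores.
    return re.findall(r"\w+", text)


def keyword_rule_matches(keyword: str, text: str) -> bool:
    """Return True if the keyword equals one of the tokens, ignoring case."""
    tokens = {token.casefold() for token in tokenize(text)}
    return keyword.casefold() in tokens


print(tokenize("I like coca-cola"))                           # ['I', 'like', 'coca', 'cola']
print(keyword_rule_matches("cola", "I like coca-cola"))       # True
print(keyword_rule_matches("coca-cola", "I like coca-cola"))  # False
```

The last call returns False because the hyphenated string is never a single token, which is why a quoted exact match is needed for it.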

emoji
Matches an emoji within the body of a Tweet. Emojis are a tokenized match, meaning that your emoji will be matched against the tokenized text of the Tweet body – tokenization is based on punctuation, symbol/emoji, and separator Unicode basic plane characters. For example, a Tweet with the text “I like 🍕” would be split into the following tokens: I, like, 🍕. These tokens would then be compared to the emoji used in your rule. Note that if an emoji has a variant, you must wrap it in double quotes to add it to a rule.
"exact phrase match"

Matches an exact phrase within the body of a Tweet.

Note: In 30 Day Search and Full Archive Search, punctuation is not tokenized and is instead treated as whitespace. 

e.g. quoted “#hashtag” will match “hashtag” but not #hashtag (use the hashtag # operator without quotes to match on actual hashtags).

e.g. quoted “$cashtag” will match “cashtag” but not $cashtag (use the cashtag $ operator without quotes to match on actual cashtags).

#

Matches any Tweet with the given hashtag.

This operator performs an exact match, NOT a tokenized match, meaning the rule “2016” will match posts with the exact hashtag “2016”, but not those with the hashtag “2016election”

Note that the hashtag operator relies on Twitter’s entity extraction to match hashtags, rather than extracting the hashtag from the body itself. See HERE for more information on Twitter Entities JSON attributes.

@

Matches any Tweet that mentions the given username.

The to: operator returns a subset match of the @mention operator.

The value can be either the username (excluding the @ character) or the user’s numeric Account ID. See HERE for looking up numeric Twitter Account IDs.

"keyword1 keyword2"~N

Commonly referred to as a proximity operator, this matches a Tweet where the keywords are no more than N tokens from each other.

If the keywords are in the opposite order, they cannot be more than N-2 tokens from each other.

Can have any number of keywords in quotes.

N cannot be greater than 6.

contains:

Substring match for Tweets that have the given substring in the body, regardless of tokenization. In other words, this does a pure substring match and does not consider word boundaries.

Use double quotes to match substrings that contain whitespace or punctuation.

from:

Matches any Tweet from a specific user.

The value must be the user’s Twitter numeric Account ID or username (excluding the @ character). See HERE for looking up numeric Twitter Account IDs.

to:

Matches any Tweet that is in reply to a particular user.

The value must be the user’s numeric Account ID or username (excluding the @ character). See HERE  for looking up numeric Twitter Account IDs.

url:
Performs a tokenized (keyword/phrase) match on the expanded URLs of a Tweet (similar to url_contains). Tokens and phrases containing punctuation or special characters should be double-quoted. E.g. url:"/developer". While generally not recommended, if you want to match on a specific protocol, enclose in double-quotes: url:"https://developer.twitter.com".
url_title:
Performs a keyword/phrase match on the (new) expanded URL HTML title metadata. See HERE for more information on expanded URL enrichment.
url_description:
Performs a keyword/phrase match on the (new) expanded page description metadata. See HERE for more information on expanded URL enrichment.
url_contains:

Matches Tweets with URLs that literally contain the given phrase or keyword. To search for patterns with punctuation in them (i.e. google.com) enclose the search term in double-quotes.

NOTE: If you’re using the Expanded URL output format, we will match against the expanded URL as well.

bio:
Matches a keyword or phrase within the user bio of a Tweet. This is a tokenized match within the contents of the 'description' field within the User object.
bio_name:
Matches a keyword within the user bio name of a Tweet. This is a tokenized match within the contents of a user’s “name” field within the User object.
bio_location:

Matches tweets where the User object's location contains the specified keyword or phrase. This operator performs a tokenized match, similar to the normal keyword rules on the message body.

This location is part of the User object and is the account's 'home' location. It is a non-normalized, user-generated, free-form string, and is different from a Tweet's location (when available).

statuses_count:

Matches Tweets when the author has posted a number of statuses that falls within the given range.

If a single number is specified, any number equal to or higher will match.

Additionally, a range can be specified to match any number in the given range (e.g., statuses_count:1000..10000).

followers_count:

Matches Tweets when the author has a followers count within the given range.

If a single number is specified, any number equal to or higher will match.

Additionally, a range can be specified to match any number in the given range (e.g., followers_count:1000..10000).

friends_count:

Matches Tweets when the author has a friends count (the number of users they follow) that falls within the given range.

If a single number is specified, any number equal to or higher will match.

Additionally, a range can be specified to match any number in the given range (e.g., friends_count:1000..10000).

listed_count:

Matches Tweets when the author has been listed on Twitter a number of times that falls within the given range.

If a single number is specified, any number equal to or higher will match.

Additionally, a range can be specified to match any number in the given range (e.g., listed_count:10..100).
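The four count operators above (statuses_count:, followers_count:, friends_count:, listed_count:) share the same argument format. The sketch below shows one reading of it; count_matches is a hypothetical helper, and treating both ends of a range as inclusive is an assumption, since the text only says "in the given range".

```python
def count_matches(argument: str, count: int) -> bool:
    """Check a count against an argument like '1000' or '1000..10000'."""
    if ".." in argument:
        low, high = (int(part) for part in argument.split("..", 1))
        # Assumption: both ends of the range are inclusive.
        return low <= count <= high
    # A single number matches any count equal to or higher.
    return count >= int(argument)


print(count_matches("1000", 1500))          # True
print(count_matches("1000", 999))           # False
print(count_matches("1000..10000", 5000))   # True
print(count_matches("1000..10000", 10001))  # False
```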

$

Matches any Tweet that contains the specified ‘cashtag’ (where the leading character of the token is the ‘$’ character).

Note that the cashtag operator relies on Twitter’s ‘symbols’ entity extraction to match cashtags, rather than trying to extract the cashtag from the body itself. See HERE for more information on Twitter Entities JSON attributes.

Note that this operator is only available with the enterprise search API.

retweets_of:
Matches Tweets that are Retweets of a specified user. Accepts both usernames and numeric Twitter Account IDs (NOT tweet status IDs).
See HERE for looking up numeric Twitter Account IDs.
retweets_of_status_id:
Deliver only explicit Retweets of the specified Tweet. Note that the status ID used should be the ID of an original Tweet and not a Retweet. 
in_reply_to_status_id:
Deliver only explicit replies to the specified Tweet.
sample:

Returns a random sample of Tweets that match a rule rather than the entire set of Tweets. Sample percent must be represented by an integer value between 1 and 100. This operator applies to the entire rule and requires any “OR’d” terms be grouped.

Important Note: The sample operator first reduces the scope of the firehose to X%, then the rule/filter is applied to that sampled subset. If you are using, for example, sample:10, each Tweet has a 10% chance of being in the sample. 

Also, the sampling is deterministic, and you will get the same data sample in realtime as you would if you pulled the data historically.

source:
Matches any Tweet generated by the given source application. The value must be either the name of the application or the application’s URL. Cannot be used alone.
lang:

Matches Tweets that have been classified by Twitter as being of a particular language (if, and only if, the tweet has been classified). It is important to note that each Tweet is currently only classified as being of one language, so AND’ing together multiple languages will yield no results.

Note: if no language classification can be made the provided result is ‘und’ (for undefined).

The list below represents the currently supported languages and their corresponding BCP 47 language identifier:

Amharic - am             Hungarian - hu        Portuguese - pt

Arabic - ar                  Icelandic - is               Romanian - ro

Armenian - hy           Indonesian - in          Russian - ru

Bengali - bn               Italian - it                    Serbian - sr

Bulgarian - bg            Japanese - ja              Sindhi - sd

Burmese - my           Kannada - kn             Sinhala - si

Chinese - zh               Khmer - km               Slovak - sk

Czech - cs                   Korean - ko                Slovenian - sl

Danish - da                 Lao - lo                       Sorani Kurdish - ckb

Dutch - nl                   Latvian - lv                 Spanish - es

English - en                Lithuanian - lt            Swedish - sv

Estonian - et              Malayalam - ml          Tagalog - tl

Finnish - fi                  Maldivian - dv            Tamil - ta

French - fr                  Marathi - mr               Telugu - te

Georgian - ka            Nepali - ne                  Thai - th

German - de              Norwegian - no         Tibetan - bo

Greek - el                   Oriya - or                    Turkish - tr

Gujarati - gu              Panjabi - pa                Ukrainian - uk

Haitian - ht                Pashto - ps                  Urdu - ur

Hebrew - iw              Persian - fa                  Uyghur - ug

Hindi - hi                   Polish - pl                    Vietnamese - vi

                                                                        Welsh - cy

 

time_zone:

Matches Tweets where the user-selected time zone specified in a user’s profile settings matches a given string.

These values are normalized to the options specified on a user’s account settings page: https://twitter.com/account/settings

place:

Matches Tweets tagged with the specified location or Twitter place ID (see examples). Multi-word place names (“New York City”, “Palo Alto”) should be enclosed in quotes.

Note: See the GET geo/search public API endpoint for how to obtain Twitter place IDs.

Note: Operators matching on place (Tweet geo) will only include matches from original tweets. Retweets do not contain any place data.

place_country:

Matches Tweets where the country code associated with a tagged place/location matches the given ISO alpha-2 character code.

Valid ISO codes can be found here: http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2

Note: Operators matching on place (Tweet geo) will only include matches from original tweets. Retweets do not contain any place data.

point_radius:[lon lat radius]

Matches against the Exact Location (x,y) of the Tweet when present, and in Twitter, against a “Place” geo polygon, where the Place is fully contained within the defined region.

  • Units of radius supported are miles (mi) and kilometers (km).
  • Radius must be less than 25mi.
  • Longitude is in the range of ±180
  • Latitude is in the range of ±90
  • All coordinates are in decimal degrees.
  • Rule arguments are contained within brackets, space delimited.

Note: Operators matching on place (Tweet geo) will only include matches from original tweets. Retweets do not contain any place data.
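The constraints listed for point_radius can be checked before a rule is submitted. The sketch below is an illustration under assumptions: point_radius here is a hypothetical Python helper, and it assumes the 25mi limit also applies to kilometer values after conversion, which the list above does not state.

```python
MAX_RADIUS_MI = 25.0
KM_PER_MILE = 1.609344


def point_radius(lon: float, lat: float, radius: float, unit: str = "mi") -> str:
    """Validate the arguments and format a point_radius clause."""
    if not -180 <= lon <= 180:
        raise ValueError("longitude must be in the range ±180")
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be in the range ±90")
    if unit not in ("mi", "km"):
        raise ValueError("unit must be 'mi' or 'km'")
    # Assumption: kilometer values are compared after converting to miles.
    radius_mi = radius if unit == "mi" else radius / KM_PER_MILE
    if not 0 < radius_mi < MAX_RADIUS_MI:
        raise ValueError("radius must be positive and less than 25mi")
    # Arguments are space delimited inside brackets, longitude first.
    return f"point_radius:[{lon} {lat} {radius}{unit}]"


print(point_radius(-105.27346517, 40.01924738, 10.0))
# point_radius:[-105.27346517 40.01924738 10.0mi]
```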

bounding_box:[west_long south_lat east_long north_lat]

Matches against the Exact Location (long, lat) of the Tweet when present, and in Twitter, against a “Place” geo polygon, where the Place is fully contained within the defined region.

  • west_long and south_lat represent the southwest corner of the bounding box, where west_long is the longitude of that point, and south_lat is the latitude.
  • east_long and north_lat represent the northeast corner of the bounding box, where east_long is the longitude of that point, and north_lat is the latitude.
  • Width and height of the bounding box must be less than 25mi
  • Longitude is in the range of ±180
  • Latitude is in the range of ±90
  • All coordinates are in decimal degrees.
  • Rule arguments are contained within brackets, space delimited.

Note: Operators matching on place (Tweet geo) will only include matches from original tweets. Retweets do not contain any place data.

profile_country:

Exact match on the “countryCode” field from the “address” object in the Profile Geo enrichment.

Uses a normalized set of two-letter country codes, based on ISO-3166-1-alpha-2 specification. This operator is provided in lieu of an operator for “country” field from the “address” object to be concise.

profile_region:

Matches on the “region” field from the “address” object in the Profile Geo enrichment.

This is an exact full string match. It is not necessary to escape characters with a backslash. For example, if matching something with a slash, use “one/two”, not “one\/two”. Use double quotes to match substrings that contain whitespace or punctuation.

profile_locality:

Matches on the “locality” field from the “address” object in the Profile Geo enrichment.

This is an exact full string match. It is not necessary to escape characters with a backslash. For example, if matching something with a slash, use “one/two”, not “one\/two”. Use double quotes to match substrings that contain whitespace or punctuation.

profile_subregion:

Matches on the “subRegion” field from the “address” object in the Profile Geo enrichment. In addition to targeting specific counties, these operators can be helpful to filter on a metro area without defining filters for every city and town within the region.

This is an exact full string match. It is not necessary to escape characters with a backslash. For example, if matching something with a slash, use “one/two”, not “one\/two”. Use double quotes to match substrings that contain whitespace or punctuation.

NOTE: ‘is:’ and ‘has:’ operators cannot be used as standalone operators and must be combined with another clause (e.g. @TwitterDev has:links).
has:geo

Matches Tweets that have Tweet-specific geolocation data provided from Twitter. This can be either “geo” lat-long coordinate, or a “location” in the form of a Twitter “Place”, with the corresponding display name, geo polygon, and other fields.

Note: Operators matching on place (Tweet geo) will only include matches from original tweets. Retweets do not contain any place data.

has:profile_geo
Matches Tweets that have any Profile Geo metadata, regardless of the actual value.

has:links

This operator matches Tweets which contain links in the message body.
is:retweet

Delivers only explicit Retweets that match a rule. Can also be negated to exclude Retweets that match a rule from delivery, so that only original content is delivered.

This operator looks only for true Retweets, which use Twitter’s Retweet functionality. Quoted Tweets and Modified Tweets which do not use Twitter’s Retweet functionality will not be matched by this operator.

is:quote
A Boolean search operator that returns all Quoted Tweets. Delivers only explicit Quote Tweets that match a rule. Can also be negated to exclude Quote Tweets that match a rule from delivery.
is:verified
Deliver only Tweets where the author is “verified” by Twitter. Can also be negated to exclude Tweets where the author is verified.
has:mentions
Matches Tweets that mention another Twitter user.
has:hashtags
Matches Tweets that contain a hashtag.
has:media
Matches Tweets that contain a media URL classified by Twitter, e.g. pic.twitter.com.
has:images
Matches Tweets that contain a media URL classified by Twitter, e.g. pic.twitter.com.
has:videos
Matches Tweets that contain native Twitter videos, uploaded directly to Twitter. This will not match on videos created with Vine, Periscope, or Tweets with links to other video hosting sites.
has:symbols

Matches Tweets that contain a cashtag symbol (with a leading ‘$’ character, e.g. $tag).

Note that this operator is only available with the enterprise search API.