Enterprise search APIs

Introduction 

There are two enterprise search APIs:

  • 30-Day Search API - provides Tweets posted with the last 30 days.
  • Full-Archive Search API - provides Tweets from as early as 2006, starting with the first Tweet posted in March 2006.

These search APIs share a common design and the documentation below applies to both.

Below you will find important details needed when integrating with the enterprise search APIs:

  • Methods for requesting Tweet data and counts
  • Authentication
  • Pagination
  • API request parameters and example requests
  • API response JSON payloads and example responses
  • HTTP response codes

The enterprise APIs provide low-latency, full-fidelity, query-based access to the Tweet archive. The only difference in the two APIs is the time frame you can search, either the previous 30 days or from as early as 2006. Time frames can be specified with minute granularity. Tweet data is served in reverse chronological order, starting with the most recent Tweet that matches your query. Tweets are available from the search API approximately 30 seconds after being published.

Methods 

The base URI for enterprise search is https://gnip-api.twitter.com/search/.

Method Description
POST /search/:product/accounts/:account_name/:label Retrieve Tweets from the past 30 days that match the specified PowerTrack rule.
POST /search/:product/accounts/:account_name/:label/counts Retrieve the number of Tweets from the past 30 days that match the specified PowerTrack rule.

Where:

  • :product indicates the search endpoint you are making requests to, either 30day or fullarchive.
  • :account_name is the (case-sensitive) name associated with your account, as displayed at console.gnip.com
  • :label is the (case-sensitive) label associated with your search endpoint, as displayed at console.gnip.com

For example, if the TwitterDev account has the 30-Day search product with a label of 'prod' (short for production), the search endpoints would be:

Your complete enterprise search API endpoint is displayed at https://console.gnip.com.

Below there are several example requests using a simple HTTP utility called curl. These examples use URLs with :product, :account_name, and :label. To use these examples, be sure to update the URLs with your details.

Authentication 

All requests to the enterprise search APIs must use HTTP Basic Authentication, constructed from a valid email address and password combination used to log into your account at https://console.gnip.com. Credentials must be passed as the Authorization header for each request.

Request/response behavior 

Using the fromDate and toDate parameters, you can request any time period that the API supports. The 30-Day search API provides Tweets from the most recent 31 days (even though referred to as the '30-Day' API, it makes 31 days available to enable users to make complete-month requests). The Full-Archive search API provides Tweets back to the very first tweet (March 21, 2006). However, a single response will be limited to the lesser of your specified 'maxResults' or 31 days. If matching data or your time range exceeds your specified maxResults or 31 days, you will receive a 'next' token which you should use to paginate through the remainder of your specified time range.

For example, say you are using Full-Archive search and want all Tweets matching your query from January 1, 2017 to June 30, 2017. You will specify that full six-month time period in your request using the fromDate and toDate parameters. The search API will respond with the first 'page' of Tweets, with the number of Tweets matching your maxResults parameter (which defaults to 100). Assuming there are more Tweets (and there most likely will be more), the API will also provide a 'next' token that enables you to make a request for the next 'page' of data. This process is repeated until the API does not return a 'next' token. See the next section for more details.

Pagination

When making both data and count requests it is likely that there is more data than can be returned in a single response. When that is the case the response will include a ‘next’ token. The ‘next’ token is provided as a root-level JSON attribute. Whenever a ‘next’ token is provided, there is additional data to retrieve so you will need to keep making API requests.

Note: The ‘next’ token behavior differs slightly for data and counts requests, and both are described below with example responses provided in the API Reference section.

Data pagination

Requests for data will likely generate more data than can be returned in a single response. Each data request includes a parameter that sets the maximum number of Tweets to return per request. This maxResults parameter defaults to 100 and can be set to a range of 10-500. If your query matches more Tweets than the 'maxResults' parameter used in the request, the response will include a 'next' token (as a root-level JSON attribute). This 'next' token is used in the subsequent request to retrieve the next portion of the matching Tweets for that query (i.e. the next 'page”). Next tokens will continue to be provided until you have reached the last 'page' of results for that query when no 'next' token is provided.

To request the next 'page' of data, you must make the exact same query as the original, including query, toDate, and fromDate parameters, if used, and also include a 'next' request parameter set to the value from the previous response. This can be utilized with either a GET or POST request. However, the 'next' parameter must be URL encoded in the case of a GET request.

You can continue to pass in the 'next' element from your previous query until you have received all Tweets from the time period covered by your query. When you receive a response that does not include a 'next' element, it means that you have reached the last page and no additional data is available for the specified query and time range.

Counts pagination

The 'counts' endpoint provides Tweet volumes associated with a query on either a daily, hourly, or per-minute basis. The 'counts' API endpoint will return a timestamped array of counts for a maximum of a 31-day payload of counts. If you request more than 31 days of counts you will be provided a 'next' token. As with the data 'next' tokens, you must make the exact same query as the original and also include a 'next' request parameter set to the value from the previous response.

Beyond requesting more than 31 days of counts, there is another scenario when a 'next' token is provided. For higher volume queries, there is the potential that the generation of counts will take long enough to trigger a response timeout. When this occurs you will receive less than 31 days of counts but will be provided a 'next' token in order to continue making requests for the entire payload of counts. Important: Timeouts will only issue full "buckets" - so 2.5 days would result in 2 full day "buckets".

Additional notes

  • When using a fromDate or toDate in a search request, you will only get results from within your time range. When you reach the last group of results within your time range, you will not receive a 'next' token.
  • The 'next' element can be used with any maxResults value between 10-500 (with a default value of 100). The maxResults determines how many Tweets are returned in each response, but does not prevent you from eventually getting all results.
  • The 'next' element does not expire. Multiple requests using the same 'next' query will receive the same results, regardless of when the request is made.
  • When paging through results using the 'next' parameter, you may encounter duplicates at the edges of the query. Your application should be tolerant of these.

Data endpoint  

POST /search/:product/:label

Endpoint pattern:

This endpoint returns data for the specified query and time period. If a time period is not specified the time parameters will default to the last 30 days. Note: This functionality can also be accomplished using a GET request, instead of a POST, by encoding the parameters described below into the URL.

Data request parameters 

Parameters Description Required Sample Value
query The equivalent of one PowerTrack rule, with up to 2,048 characters (and no limits on the number of positive and negative clauses).

This parameter should include ALL portions of the PowerTrack rule, including all operators, and portions of the rule should not be separated into other parameters of the query.

Note: Not all PowerTrack operators are supported. Supported Operators are listed HERE.
Yes (snow OR cold OR blizzard) weather
tag Tags can be used to segregate rules and their matching data into different logical groups. If a rule tag is provided the rule tag is included in the 'matching_rules' attribute.

It is recommended to assign rule-specific UUIDs to rule tags and maintain desired mappings on the client side.

No 8HYG54ZGTU
fromDate The oldest UTC timestamp (back to 3/21/2006 with Full-Archive search) from which the Tweets will be provided. Timestamp is in minute granularity and is inclusive (i.e. 12:00 includes the 00 minute).

Specified: Using only the fromDate with no toDate parameter will deliver results for the query going back in time from now( ) until the fromDate.

Not Specified: If a fromDate is not specified, the API will deliver all of the results for 30 days prior to now( ) or the toDate (if specified).

If neither the fromDate or toDate parameter is used, the API will deliver all results for the most recent 30 days, starting at the time of the request, going backwards.
No 201207220000
toDate The latest, most recent UTC timestamp to which the Tweets will be provided. Timestamp is in minute granularity and is not inclusive (i.e. 11:59 does not include the 59th minute of the hour).

Specified: Using only the toDate with no fromDate parameter will deliver the most recent 30 days of data prior to the toDate.

Not Specified: If a toDate is not specified, the API will deliver all of the results from now( ) for the query going back in time to the fromDate.

If neither the fromDate or toDate parameter is used, the API will deliver all results for the entire 30-day index, starting at the time of the request, going backwards.
No 201208220000
maxResults The maximum number of search results to be returned by a request. A number between 10 and the system limit (currently 500). By default, a request response will return 100 results. No 500
next This parameter is used to get the next 'page' of results as described HERE. The value used with the parameter is pulled directly from the response provided by the API, and should not be modified. No NTcxODIyMDMyODMwMjU1MTA0

Additional details 

Available Timeframe 30-Day: last 31 days
Full-Archive: March 21, 2006 - Present
Query Format The equivalent of one PowerTrack rule, with up to 2,048 characters (and no limits on the number of positive and negative clauses).

Note: Not all PowerTrack operators are supported. Refer to Available operators for a list of supported operators.
Rate Limit Partners will be rate limited at both minute and second granularity. The per minute rate limit will vary by partner as specified in your contract. However, these per-minute rate limits are not intended to be used in a single burst. Regardless of your per minute rate limit, all partners will be limited to a maximum of 20 requests per second, aggregated across all requests for data and/or counts.
Compliance All data delivered via the Full-Archive Search API is compliant at the time of delivery.
Realtime Availability Data is available in the index within 30 seconds of generation on the Twitter Platform

Example data requests and responses 

Example POST request

  • Request parameters in a POST request are sent via a JSON-formatted body, as shown below.
  • All portions of the PowerTrack rule being queried for (e.g. keywords, other operators like bounding_box:) should be placed in the 'query' parameter.
  • Do not split portions of the rule out as separate parameters in the query URL.

Here is an example POST (using cURL) command for making an initial data request:

curl -X POST -u<username> "https://gnip-api.twitter.com/search/:product/accounts/:account_name/:label.json" -d '{"query":"TwitterDev "search api"","maxResults":"500","fromDate":"<yyyymmddhhmm>","toDate":"<yyyymmddhhmm>"}'

If the API data response includes a 'next' token, below is a subsequent request that consists of the original request, with the 'next' parameter set to the provided token:

curl -X POST -u<username> "https://gnip-api.twitter.com/search/:product/accounts/:account_name/:label.json" -d '{"query":"TwitterDev "search api"","maxResults":"500","fromDate":"<yyyymmddhhmm>","toDate":"<yyyymmddhhmm>",
"next":"NTcxODIyMDMyODMwMjU1MTA0"}'

Example GET request 

  • Request parameters in a GET request are encoded into the URL, using standard URL encoding.
  • All portions of the PowerTrack rule being queried for (e.g. keywords, other operators like bounding_box:) should be placed in the 'query' parameter.
  • Do not split portions of the rule out as separate parameters in the query URL.

Here is an example GET (using cURL) command for making an initial data request:

curl -u<username> "http://gnip-api.twitter.com/search/:product/accounts/:account_name/:label.json?query=TwitterDev%20%5C%22search%20api%5C%22&maxResults=500&fromDate=<yyyymmddhhmm>&toDate=<yyyymmddhhmm>"

Example data responses 

Below is an example response to a data query. This example assumes that there were more than ‘maxResults’ Tweets available so a 'next' token is provided for subsequent requests. If 'maxResults' or fewer Tweets are associated with your query, no 'next' token would be included in the response.
The value of the 'next' element will change with each query and should be treated as an opaque string. The 'next' element will look like the following in the response body:

{
    "results":
      [
           {--Tweet 1--},
           {--Tweet 2--},
           ...
           {--Tweet 500--}
      ],
    "next":"NTcxODIyMDMyODMwMjU1MTA0",  
    "requestParameters":
      {
        "maxResults":500,
        "fromDate":"201101010000",
        "toDate":"201201010000"
      }
 }

The response to a subsequent request might look like the following (note the new Tweets and different 'next' value):

{
      "results":
      [
           {--Tweet 501--},
           {--Tweet 502--},
           ...
           {--Tweet 1000--}
      ],
      "next":"R2hCDbpBFR6eLXGwiRF1cQ",
      "requestParameters":
      {
        "maxResults":500,
        "fromDate":"201101010000",
        "toDate":"201201010000"
      }
 }

You can continue to pass in the 'next' element from your previous query until you have received all Tweets from the time period covered by your query. When you receive a response that does not include a 'next' element, it means that you have reached the last page and no additional data is available in your time range.


Counts endpoint  

/search/:stream/counts

Endpoint pattern:

/search/fullarchive/accounts/:account_name/:label/counts.json

This endpoint returns counts (data volumes) data for the specified query. If a time period is not specified the time parameters will default to the last 30 days. Data volumes are returned as a timestamped array on either daily, hourly (default), or by the minute.

Note: This functionality can also be accomplished using a GET request, instead of a POST, by encoding the parameters described below into the URL.

Counts request parameters 

Parameters Description Required Sample Value
query The equivalent of one PowerTrack rule, with up to 2,048 characters (and no limits on the number of positive and negative clauses).

This parameter should include ALL portions of the PowerTrack rule, including all operators, and portions of the rule should not be separated into other parameters of the query.

Note: Not all PowerTrack operators are supported. Refer to Available operators for a list of supported operators.
Yes (snow OR cold OR blizzard) weather
fromDate The oldest UTC timestamp (back to 3/21/2006) from which the Tweets will be provided. Timestamp is in minute granularity and is inclusive (i.e. 12:00 includes the 00 minute).

Specified: Using only the fromDate with no toDate parameter, the API will deliver counts (data volumes) data for the query going back in time from now until the fromDate. If the fromDate is older than 31 days from now( ), you will receive a next token to page through your request.

Not Specified: If a fromDate is not specified, the API will deliver counts (data volumes) for 30 days prior to now( ) or the toDate (if specified).

If neither the fromDate or toDate parameter is used, the API will deliver counts (data volumes) for the most recent 30 days, starting at the time of the request, going backwards.

No 201207220000
toDate The latest, most recent UTC timestamp to which the Tweets will be provided. Timestamp is in minute granularity and is not inclusive (i.e. 11:59 does not include the 59th minute of the hour).

Specified: Using only the toDate with no fromDate parameter will deliver the most recent counts (data volumes) for 30 days prior to the toDate.

Not Specified: If a toDate is not specified, the API will deliver counts (data volumes) for the query going back in time to the fromDate. If the fromDate is more than 31 days from now( ), you will receive a next token to page through your request.

If neither the fromDate or toDate parameter is used, the API will deliver counts (data volumes) for the most recent 30 days, starting at the time of the request, going backwards.
No 201208220000
bucket The unit of time for which count data will be provided. Count data can be returned for every day, hour or minute in the requested timeframe. By default, hourly counts will be provided. Options: 'day', 'hour', 'minute' No minute
next This parameter is used to get the next 'page' of results as described HERE. The value used with the parameter is pulled directly from the response provided by the API, and should not be modified. No NTcxODIyMDMyODMwMjU1MTA0

Additional details

Available Timeframe 30-Day: last 31 days
Full-Archive: March 21, 2006 - Present
Query Format The equivalent of one PowerTrack rule, with up to 2,048 characters.

Note: Not all PowerTrack operators are supported. Refer to Available operators for a list of supported operators.
Rate Limit Partners will be rate limited at both minute and second granularity. The per minute rate limit will vary by partner as specified in your contract. However, these per-minute rate limits are not intended to be used in a single burst. Regardless of your per minute rate limit, all partners will be limited to a maximum of 20 requests per second, aggregated across all requests for data and/or counts.
Count Precision The counts delivered through this endpoint reflect the number of Tweets that occurred and do not reflect any later compliance events (deletions, scrub geos). Some Tweets counted may not be available via data endpoint due to user compliance actions.

Example counts requests and responses 

Example POST request

  • Request parameters in a POST request are sent via a JSON-formatted body, as shown below.
  • All portions of the PowerTrack rule being queried for (e.g. keywords, other operators like bounding_box:) should be placed in the 'query' parameter.
  • Do not split portions of the rule out as separate parameters in the query URL.

Here is an example POST (using cURL) command for making an initial counts request:

curl -X POST -u<username> "https://gnip-api.twitter.com/search/:product/accounts/:account_name/:label/counts.json" -d '{"query":"TwitterDev","fromDate":"<yyyymmddhhmm>","toDate":"<yyyymmddhhmm>","bucket":"day"}'

If the API counts response includes a 'next' token, below is a subsequent request that consists of the original request, with the 'next' parameter set to the provided token:

curl -X POST -u<username> "https://gnip-api.twitter.com/search/:product/accounts/:account_name/:label/counts.json" -d '{"query":"TwitterDev","fromDate":"<yyyymmddhhmm>","toDate":"<yyyymmddhhmm>","bucket":"day",
"next":"YUcxO87yMDMyODMwMjU1MTA0"}'

Example GET request

  • Request parameters in a GET request are encoded into the URL, using standard URL encoding
  • All portions of the PowerTrack rule being queried for (e.g. keywords, other operators like bounding_box:) should be placed in the 'query' parameter
  • Do not split portions of the rule out as separate parameters in the query URL

Here is an example GET (using cURL) command for making an initial counts request:

curl -u<username> "http://gnip-api.twitter.com/search/fullarchive/accounts/:account_name/:label/counts.json?query=TwitterDev&bucket=day&fromDate=<yyyymmddhhmm>&toDate=<yyyymmddhhmm>"

Example counts responses

Below is an example response to a counts (data volume) query. This example response includes a 'next' token, meaning the counts request was for more than 31 days, or that the submitted query had a large enough volume associated with it to trigger a partial response.

The value of the 'next' element will change with each query and should be treated as an opaque string. The 'next' element will look like the following in the response body:

{
  "results": [
    { "timePeriod": "201101010000", "count": 32 },
    { "timePeriod": "201101020000", "count": 45 },
    { "timePeriod": "201101030000", "count": 57 },
    { "timePeriod": "201101040000", "count": 123 },
    { "timePeriod": "201101050000", "count": 134 },
    { "timePeriod": "201101060000", "count": 120 },
    { "timePeriod": "201101070000", "count": 43 },
    { "timePeriod": "201101080000", "count": 65 },
    { "timePeriod": "201101090000", "count": 85 },
    { "timePeriod": "201101100000", "count": 32 },
    { "timePeriod": "201101110000", "count": 23 },
    { "timePeriod": "201101120000", "count": 85 },
    { "timePeriod": "201101130000", "count": 32 },
    { "timePeriod": "201101140000", "count": 95 },
    { "timePeriod": "201101150000", "count": 109 },
    { "timePeriod": "201101160000", "count": 34 },
    { "timePeriod": "201101170000", "count": 74 },
    { "timePeriod": "201101180000", "count": 24 },
    { "timePeriod": "201101190000", "count": 90 },
    { "timePeriod": "201101200000", "count": 85 },
    { "timePeriod": "201101210000", "count": 93 },
    { "timePeriod": "201101220000", "count": 48 },
    { "timePeriod": "201101230000", "count": 37 },
    { "timePeriod": "201101240000", "count": 54 },
    { "timePeriod": "201101250000", "count": 52 },
    { "timePeriod": "201101260000", "count": 84 },
    { "timePeriod": "201101270000", "count": 120 },
    { "timePeriod": "201101280000", "count": 34 },
    { "timePeriod": "201101290000", "count": 83 },
    { "timePeriod": "201101300000", "count": 23 },
    { "timePeriod": "201101310000", "count": 12 }
   ],
  "totalCount":2027,
  "next":"NTcxODIyMDMyODMwMjU1MTA0",
  "requestParameters":
    {
      "bucket":"day",
      "fromDate":"201101010000",
      "toDate":"201201010000"
    }
}

The response to a subsequent request might look like the following (note the new counts timeline and different 'next' value):

{
  "results": [
    { "timePeriod": "201102010000", "count": 45 },
    { "timePeriod": "201102020000", "count": 76 },
     ....
    { "timePeriod": "201103030000", "count": 13 }
 ],
 "totalCount":3288,
 "next":"WE79fnakFanyMDMyODMwMjU1MTA0",
 "requestParameters":
    {
      "bucket":"day",
      "fromDate":"201101010000",
      "toDate":"201201010000"
    }
}

You can continue to pass in the 'next' element from your previous query until you have received all counts from the query time period. When you receive a response that does not include a 'next' element, it means that you have reached the last page and no additional counts are available in your time range.

HTTP response codes 

Status Text Description
200 OK The request was successful. The JSON response will be similar to the following:
400 Bad Request Generally, this response occurs due to the presence of invalid JSON in the request, or where the request failed to send any JSON payload.
401 Unauthorized HTTP authentication failed due to invalid credentials. Log in to console.gnip.com with your credentials to ensure you are using them correctly with your request.
404 Not Found The resource was not found at the URL to which the request was sent, likely because an incorrect URL was used.
422 Unprocessable Entity This is returned due to invalid parameters in the query -- e.g. invalid PowerTrack rules.
429 Unknown Code Your app has exceeded the limit on connection requests. The corresponding JSON message will look similar to the following:
500 Internal Server Error There was an error on the server side. Retry your request using an exponential backoff pattern.
502 Proxy Error There was an error on server side. Retry your request using an exponential backoff pattern.
503 Service Unavailable There was an error on server side. Retry your request using an exponential backoff pattern.