Filter realtime Tweets
Connecting to a streaming endpoint
Establishing a connection to the streaming APIs means making a very long lived HTTP request, and parsing the response incrementally. Conceptually, you can think of it as downloading an infinitely long file over HTTP.
The following authentication methods are supported by the Streaming APIs:
|Auth Type||Supported APIs||Description|
||Requests must be authorized according to the OAuth specification .|
||Requests must use of HTTP Basic Authentication, constructed from a valid email address and password combination.
To connect to the Streaming API, form a HTTP request and consume the resulting stream for as long as is practical. Our servers will hold the connection open indefinitely, barring server-side error, excessive client-side lag, network hiccups, routine server maintenance or duplicate logins.
The method to form an HTTP request and parse the response will be different for every language or framework, so consult the documentation for the HTTP library you are using.
Some HTTP client libraries only return the response body after the connection has been closed by the server. These clients will not work for accessing the Streaming API. You must use an HTTP client that will return response data incrementally. Most robust HTTP client libraries will provide this functionality. The Apache HttpClient will handle this use case, for example.
Twitter will close a streaming connection for the following reasons:
- A client establishes too many connections with the same credentials. When this occurs, the oldest connection will be terminated. This means you have to be careful not to run two reconnecting clients in parallel with the same credentials, or else they will take turns disconnecting each other.
- A client stops reading data suddenly. If the rate of Tweets being read off of the stream drops suddenly, the connection will be closed.
- A client reads data too slowly. Every streaming connection is backed by a queue of messages to be sent to the client. If this queue grows too large over time, the connection will be closed.
- A streaming server is restarted. This is usually related to a code deploy and is not very frequent.
- Twitter’s network configuration changes. These events are rare, and would represent load balancer restarts or network reconfigurations, for example.
Set a timer, either a 90 second TCP level socket timeout, or a 90 second application level timer on the receipt of new data. If 90 seconds pass with no data received, including newlines, disconnect and reconnect immediately according to the backoff strategies in the next section. The Streaming API will send a keep-alive newline every 30 seconds to prevent your application from timing out the connection. You should wait at least 3 cycles to prevent spurious reconnects in the event of network congestion, local CPU starvation, local GC pauses, etc.
Once an established connection drops, attempt to reconnect immediately. If the reconnect fails, slow down your reconnect attempts according to the type of error experienced:
- Back off linearly for TCP/IP level network errors. These problems are generally temporary and tend to clear quickly. Increase the delay in reconnects by 250ms each attempt, up to 16 seconds.
- Back off exponentially for HTTP errors for which reconnecting would be appropriate. Start with a 5 second wait, doubling each attempt, up to 320 seconds.
- Back off exponentially for HTTP 420 errors. Start with a 1 minute wait and double each attempt. Note that every HTTP 420 received increases the time you must wait until rate limiting will no longer will be in effect for your account.
Repeatedly opening and closing a connection (churn) wastes server resources. Keep your connections as stable and long-lived as possible.
Avoid mobile (cellular network) connections from mobile devices. WiFi is generally OK.
Delay opening a streaming connection in cases where the user may quit the application quickly.
If your client works in an environment where the connection quality changes over time, attempt to detect flaky connections. When detected, fall back to REST polling until the connection quality improves.
Clients which do not implement backoff and attempt to reconnect as often as possible will have their connections rate limited for a small number of minutes. Rate limited clients will receive HTTP 420 responses for all connection requests.
Clients which break a connection and then reconnect frequently (to change query parameters, for example) run the risk of being rate limited.
Twitter does not make public the number of connection attempts which will cause a rate limiting to occur, but there is some tolerance for testing and development. A few dozen connection attempts from time to time will not trigger a limit. However, it is essential to stop further connection attempts for a few minutes if a HTTP 420 response is received. If your client is rate limited frequently, it is possible that your IP will be blocked from accessing Twitter for an indeterminate period of time.
Test backoff strategies
A good way to test a backoff implementation is to use invalid authorization credentials and examine the reconnect attempts. A good implementation will not get any 420 responses.
Issue alerts for multiple reconnects
If a client reaches its upper threshold of its time between reconnects, it should send you notifications so you can triage the issues affecting your connection.
Handle DNS changes
Test that your client process honors the DNS Time To live (TTL). Some stacks will cache a resolved address for the duration of the process and will not pick up DNS changes within the proscribed TTL. Such aggressive caching will lead to service disruptions on your client as Twitter shifts load between IP addresses.
Ensure your user-agent HTTP header includes the client’s version. This will be critical in diagnosing issues on Twitter’s end. If your environment precludes setting the user-agent field, then set an x-user-agent header.
Most error codes are returned with a string with additional details. For all codes greater than 200, clients should wait before attempting another connection. See the Connecting section, above.
HTTP authentication failed due to either:
|403||Forbidden||The connecting account is not permitted to access this endpoint.|
|404||Unknown||There is nothing at this URL, which means the resource does not exist.|
At least one request parameter is invalid. For example, the filter endpoint returns this status if:
A parameter list is too long. For example, the filter endpoint returns this status if:
For example, an endpoint returns this status if:
The client has connected too frequently. For example, an endpoint returns this status if:
|503||Service Unavailable||A streaming server is temporarily overloaded. Attempt to make another connection, keeping in mind the connection attempt rate limiting and possible DNS caching in your client.|