When consuming realtime data, maximizing your connection time and receiving all matched data is a fundamental goal. This means that it is important to take advantage of redundant connections, to detect disconnections automatically, to reconnect quickly, and to have a plan for recovering lost data.
In this integration guide, we will discuss two different recovery and redundancy features: redundant connections and backfill.
A redundant connection allows you to establish more than one simultaneous connection to the filtered stream. This provides redundancy by letting you connect to the same stream with two separate consumers, receiving the same data through both connections. Your app therefore has a hot failover for situations such as one stream being disconnected or your application's primary server failing.
Filtered stream currently allows redundant connections only on the Academic Research product track, which can open up to two of them. To use a redundant stream, simply connect to the same URL used for your primary connection. The data for your stream will be sent through both connections.
Note that we deduplicate the Tweet counts you receive through multiple connections so that each Tweet only counts towards your Tweet cap once.
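Because both connections deliver the same Tweets, a consumer that reads from both usually needs to drop the second copy itself. The following is a minimal sketch of that idea, not an official client: the names TweetDeduplicator and merge_connections are hypothetical, and it assumes each payload is a parsed JSON object whose Tweet ID sits at payload["data"]["id"].

```python
from collections import OrderedDict


class TweetDeduplicator:
    """Remember recently seen Tweet IDs so repeated copies can be dropped.

    Hypothetical helper for illustration. Memory is bounded: once more than
    max_ids IDs are stored, the oldest one is forgotten.
    """

    def __init__(self, max_ids=100_000):
        self.max_ids = max_ids
        self._seen = OrderedDict()

    def is_new(self, tweet_id):
        """Return True the first time an ID is seen, False afterwards."""
        if tweet_id in self._seen:
            return False
        self._seen[tweet_id] = None
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # evict the oldest remembered ID
        return True


def merge_connections(payloads, dedup):
    """Yield each payload once, keyed on payload["data"]["id"] (assumed shape)."""
    for payload in payloads:
        if dedup.is_new(payload["data"]["id"]):
            yield payload
```

In practice, payloads would be whatever iterable interleaves the messages read from both connections. The same helper can also absorb the duplicates that backfill may deliver after a reconnect.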
Recovering missed data after a disconnection: Backfill
After you've detected a disconnection, your system should be smart enough to reconnect to the stream. If possible, your system should take note of how long the disconnection lasted so that you can use the proper recovery feature to backfill the data.
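One way to take note of how long a disconnection lasted is to record when anything last arrived on the connection and measure the gap once you have reconnected. The sketch below assumes this approach; DisconnectTracker is a hypothetical name, and the clock is injectable only so that the logic can be exercised without waiting.

```python
import time


class DisconnectTracker:
    """Track how long it has been since the stream last delivered anything.

    Hypothetical helper for illustration. A monotonic clock is used so that
    system clock adjustments do not distort the measured gap.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_data_at = clock()

    def mark_data(self):
        """Call whenever a message arrives on the connection."""
        self._last_data_at = self._clock()

    def gap_seconds(self):
        """Seconds elapsed since the last call to mark_data()."""
        return self._clock() - self._last_data_at
```

Calling gap_seconds() immediately after a successful reconnect gives a conservative estimate of the outage, since it also counts any quiet time just before the connection dropped.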
If you are using the Academic Research product track and have identified that the disconnection lasted five minutes or less, you can use the backfill parameter, backfill_minutes. If you pass this parameter with your GET /tweets/search/stream request, you will receive the Tweets that matched your rules within the past one to five minutes. We generally deliver these older Tweets before any newly matched Tweets, and we do not deduplicate them. This means that if you were disconnected for 90 seconds but request two minutes' worth of backfill data, you will receive 30 seconds' worth of duplicate Tweets, which your system should tolerate. Here is an example of what a request might look like with the backfill parameter:
curl 'https://api.twitter.com/2/tweets/search/stream?backfill_minutes=5' -H "Authorization: Bearer $BEARER_TOKEN"
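The choice of backfill_minutes can be derived from the measured disconnection: round the gap up to whole minutes and use backfill only if the result is within the one to five minute range described above. This is a minimal sketch under that reading; the function names and the gap_seconds input are assumptions, and only the URL and the backfill_minutes parameter come from the example above.

```python
import math
from urllib.parse import urlencode

STREAM_URL = "https://api.twitter.com/2/tweets/search/stream"
MAX_BACKFILL_MINUTES = 5  # upper limit stated in this guide


def backfill_minutes_for(gap_seconds):
    """Return the whole minutes of backfill needed to cover the gap.

    Returns None when the gap is longer than backfill can cover, in which
    case another recovery method is needed.
    """
    minutes = max(1, math.ceil(gap_seconds / 60))
    return minutes if minutes <= MAX_BACKFILL_MINUTES else None


def reconnect_url(gap_seconds):
    """Build the stream URL, adding backfill_minutes when it applies."""
    minutes = backfill_minutes_for(gap_seconds)
    if minutes is None:
        return STREAM_URL
    return STREAM_URL + "?" + urlencode({"backfill_minutes": minutes})
```

For the 90 second disconnection described above, backfill_minutes_for(90) returns 2, which is the case that yields roughly 30 seconds of duplicate Tweets.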
If you don't have access to the Academic Research product track, or have identified that the disconnection lasted longer than five minutes, you can use the recent search endpoint to request missed data. However, note that the search Tweets endpoints do not support the sample:, bio:, bio_name:, or bio_location: operators, and they have certain differences in matching behavior when using accents and diacritics with the keyword and #hashtag operators. These differences could mean that you don't fully recover all Tweets that would have been received via the filtered stream endpoints.
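A recovery request to recent search can be bounded to the outage window with the start_time and end_time parameters. The sketch below only builds the request URL; recovery_url and margin_seconds are hypothetical names, the query is assumed to be a search-compatible version of your stream rules, and the datetimes are assumed to be timezone-aware. Check the recent search reference for the limits that apply to the time range and page size.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

RECENT_SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def to_utc_timestamp(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def recovery_url(query, disconnected_at, reconnected_at, margin_seconds=30):
    """Build a recent search URL covering the disconnection window.

    margin_seconds widens the start of the window slightly, an assumed
    safety margin that trades a few duplicates for fewer missed Tweets.
    """
    start = disconnected_at - timedelta(seconds=margin_seconds)
    params = {
        "query": query,
        "start_time": to_utc_timestamp(start),
        "end_time": to_utc_timestamp(reconnected_at),
        "max_results": 100,
    }
    return RECENT_SEARCH_URL + "?" + urlencode(params)
```

Results longer than one page would need to be followed with the pagination token returned in each response, and any Tweets already received through the stream should be deduplicated by ID.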