
Step 5: Refining your filters and building a strong ruleset


This is step 5 of the learning path, How to detect signal from noise and build powerful filtering rules.


Building a robust ruleset is an iterative process. Once you have an initial set of rules and have started filtering Tweets with the Twitter API, review the data you receive and use it to refine your ruleset.

The following tactics can help you refine your ruleset:

  • Review the Tweets you received. First, sort the matched Tweets into two groups: signal and noise. Then identify where improvements can be made and fine-tune your rules accordingly.
  • Use rule tags and the “matching rules” section of the Tweet payload to identify which rule(s) caused a given Tweet to be returned. This is especially important for unwanted Tweets (noise), because it shows you which of your rules is returning unwanted data.
  • Consider using the Search API to refine your ruleset:
      • Your PowerTrack rule can be passed as a query to the Search API, unless it uses operators that the Search API does not support (see our enterprise operator list).
      • It is often faster to analyze data returned by the Search API, which lets you paginate through the results, than to analyze data delivered in real time. For this reason, you may want to start testing and refining your ruleset by querying the Search API.
      • You can also use the /counts endpoint available with the Search API to see the data volume for a given query. This helps confirm that the volume of data returned by your rules matches your expectations.
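Once you have labeled a sample of Tweets as signal or noise, the “matching rules” information lets you see which rules produce the most noise. The sketch below assumes the native enterprise payload format, in which each Tweet carries an `id_str` and a `matching_rules` list of `{"tag": ..., "id": ...}` objects; check the format your stream actually delivers. The function name and the `labels` mapping (your manual review results) are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def noise_by_rule(
    tweets: Iterable[Mapping], labels: Mapping[str, bool]
) -> dict[str, dict[str, float]]:
    """Tally reviewed Tweets as signal or noise for each rule that matched them.

    tweets: Tweet payloads, assumed to carry a "matching_rules" list of
            {"tag": ..., "id": ...} objects (native enterprise format).
    labels: hypothetical review result mapping a Tweet's "id_str" to
            True (noise) or False (signal).
    """
    signal, noise = Counter(), Counter()
    for tweet in tweets:
        is_noise = labels.get(tweet["id_str"])
        if is_noise is None:
            continue  # Tweet has not been reviewed yet
        # A Tweet can match several rules, so every matching rule is credited.
        for rule in tweet.get("matching_rules", []):
            # Fall back to the rule id when a rule was added without a tag.
            key = rule.get("tag") or f"untagged:{rule.get('id')}"
            (noise if is_noise else signal)[key] += 1

    report = {}
    for key in signal.keys() | noise.keys():
        total = signal[key] + noise[key]
        report[key] = {
            "signal": signal[key],
            "noise": noise[key],
            "noise_ratio": noise[key] / total,
        }
    return report
```

Sorting the result, for example with `sorted(report.items(), key=lambda kv: kv[1]["noise_ratio"], reverse=True)`, puts the rules most in need of tightening first. Giving every rule a distinct tag when you create it makes this report much easier to read than falling back to rule ids.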

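To check whether the volume returned by a rule matches your expectations, you could scan a /counts response for time buckets that fall outside the range you consider normal. This is a sketch under assumptions: it expects the response to carry a `results` list of `{"timePeriod": ..., "count": ...}` buckets, so confirm the field names against the Search API reference for your account, and if the response is paginated, merge the `results` from every page first. The function names and the `low`/`high` thresholds are hypothetical.

```python
from typing import Mapping


def total_count(counts_response: Mapping) -> int:
    """Sum the per-bucket counts in a (merged) counts response."""
    return sum(bucket["count"] for bucket in counts_response.get("results", []))


def flag_unexpected_buckets(
    counts_response: Mapping, low: int, high: int
) -> list[tuple[str, int]]:
    """Return (timePeriod, count) for every bucket outside [low, high].

    low and high are hypothetical thresholds chosen for your use case.
    """
    flagged = []
    for bucket in counts_response.get("results", []):
        if not low <= bucket["count"] <= high:
            flagged.append((bucket["timePeriod"], bucket["count"]))
    return flagged
```

A flagged spike may point to a keyword that is also matching an unrelated event, while a bucket far below the range may mean a rule is too narrow; either way, the flagged periods are good places to pull sample Tweets for review.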
Following these steps may feel cumbersome at first, but you will eventually end up with a solid ruleset that saves you processing time and ensures that you don’t consume (and pay for) unwanted data.

Once you have fine-tuned your rules and reached the limits of what the available filtering operators can achieve, consider building a post-processing layer to discard any Tweets that are not of interest to you. For example, you might create a series of regex filters to discard Tweets from users whose screen name starts with “bot_” or ends with “_bot”, since PowerTrack has no operator that can exclude Tweets based on a pattern in an account's screen name.
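A post-processing layer like this can start as a list of compiled regular expressions checked against each Tweet's author. The sketch below assumes the native enterprise payload, where the author's screen name is at `user.screen_name`; the pattern list and function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Mapping

# Hypothetical series of regex filters; a Tweet is discarded if its author's
# screen name matches any of them. IGNORECASE also catches "Bot_..." or "..._BOT".
DISCARD_SCREEN_NAME_PATTERNS = [
    re.compile(r"^bot_", re.IGNORECASE),  # starts with "bot_"
    re.compile(r"_bot$", re.IGNORECASE),  # ends with "_bot"
]


def screen_name(tweet: Mapping) -> str:
    # Assumes the native enterprise format, where the author is under "user".
    return tweet.get("user", {}).get("screen_name", "")


def drop_unwanted_authors(tweets: Iterable[Mapping]) -> Iterator[Mapping]:
    """Yield only the Tweets whose author matches none of the discard patterns."""
    for tweet in tweets:
        name = screen_name(tweet)
        if not any(p.search(name) for p in DISCARD_SCREEN_NAME_PATTERNS):
            yield tweet
```

Matching case-insensitively is a judgment call: Twitter screen names are case-insensitive, so “Bot_alerts” is treated the same as “bot_alerts”. Anchoring each pattern keeps names such as “robotics_fan” from being discarded, and new patterns can be appended to the list as your review turns up new kinds of noise.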

Go to the final article - Walkthrough: what this means in practice

Go back to the learning path homepage - Learning path: How to detect signal from noise and build powerful filtering rules