Introduction to Tweet JSON

All Twitter APIs that return Tweets provide that data encoded using JavaScript Object Notation (JSON). JSON is based on key-value pairs, with named attributes and associated values. These attributes, and their state, are used to describe objects.

At Twitter we serve many objects as JSON, including Tweets and Users. These objects all encapsulate core attributes that describe the object. Each Tweet has an author, a message, a unique ID, a timestamp of when it was posted, and sometimes geo metadata shared by the user. Each User has a Twitter name, an ID, a number of followers, and most often an account bio.

With each Tweet we also generate 'entity' objects, which are arrays of common Tweet contents such as hashtags, mentions, media, and links. If there are links, the JSON payload can also provide metadata such as the fully unwound URL and the webpage’s title and description.

So, in addition to the text content itself, a Tweet can have over 140 attributes associated with it. Let’s start with an example Tweet. The following JSON illustrates the structure of these objects and some of their attributes:

{
  "tweet": {
    "created_at": "Thu Apr 06 15:24:15 +0000 2017",
    "id_str": "850006245121695744",
    "text": "1\/ Today we\u2019re sharing our vision for the future of the Twitter API platform!\nhttps:\/\/t.co\/XweGngmxlP",
    "user": {
      "id": 2244994945,
      "name": "Twitter Dev",
      "screen_name": "TwitterDev",
      "location": "Internet",
      "url": "https:\/\/dev.twitter.com\/",
      "description": "Your official source for Twitter Platform news, updates & events. Need technical help? Visit https:\/\/twittercommunity.com\/ \u2328\ufe0f #TapIntoTwitter"
    },
    "place": {
      
    },
    "entities": {
      "hashtags": [
        
      ],
      "urls": [
        {
          "url": "https:\/\/t.co\/XweGngmxlP",
          "unwound": {
            "url": "https:\/\/cards.twitter.com\/cards\/18ce53wgo4h\/3xo1c",
            "title": "Building the Future of the Twitter API Platform"
          }
        }
      ],
      "user_mentions": [
        
      ]
    }
  }
}
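
As a minimal sketch of how a payload like the one above might be read, the following Python snippet parses the example and pulls out a few attributes. The summarize helper and the trimmed SAMPLE string are illustrative assumptions, not part of any Twitter library; the "tweet" wrapper key simply mirrors the illustration above.

```python
import json
from datetime import datetime

# A trimmed copy of the example payload above (raw string so the JSON
# escapes such as \/ and \u2019 reach the JSON parser untouched).
SAMPLE = r'''
{
  "tweet": {
    "created_at": "Thu Apr 06 15:24:15 +0000 2017",
    "id_str": "850006245121695744",
    "text": "1\/ Today we\u2019re sharing our vision for the future of the Twitter API platform!\nhttps:\/\/t.co\/XweGngmxlP",
    "user": {"id": 2244994945, "screen_name": "TwitterDev"},
    "entities": {
      "hashtags": [],
      "urls": [{"url": "https:\/\/t.co\/XweGngmxlP"}],
      "user_mentions": []
    }
  }
}
'''

# Format of the "created_at" string shown in the example.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def summarize(payload):
    """Hypothetical helper: collect a few core attributes of a Tweet."""
    tweet = payload["tweet"]  # wrapper key as used in the illustration
    return {
        "id": int(tweet["id_str"]),
        "author": tweet["user"]["screen_name"],
        "created_at": datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT),
        "links": [u["url"] for u in tweet["entities"]["urls"]],
    }


summary = summarize(json.loads(SAMPLE))
print(summary["author"], summary["created_at"].isoformat())
```

The ID is read from "id_str" rather than a numeric field because Tweet IDs are large integers, and some JSON parsers in other languages lose precision on them.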

Fundamental Twitter objects

Tweet objects

When ingesting Tweet data the main object is the Tweet Object, which is a parent object to several child objects. For example, all Tweets include a User object that describes who authored the Tweet.

If the Tweet is geo-tagged, there will be a 'place' object included.

Every Tweet includes an entities object that encapsulates arrays of hashtags, user mentions, URLs, cashtags, and native media. If the Tweet has any ‘attached’ or ‘native’ media (photos, video, animated GIF), there will be an extended_entities object.

{
  "tweet": {
    "user": {
      
    },
    "place": {
      
    },
    "entities": {
      
    },
    "extended_entities": {
      
    }
  }
}
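
One way to check which child objects a given Tweet actually carries is sketched below. The present_children helper and the sample dictionaries are hypothetical; the helper expects the Tweet object itself, that is, the value under the illustrative "tweet" key.

```python
# Child objects named in this section.
CHILD_KEYS = ("user", "place", "entities", "extended_entities")


def present_children(tweet):
    """Hypothetical helper: list the child objects that are present and non-empty."""
    return [key for key in CHILD_KEYS if tweet.get(key)]


# A plain Tweet: no geo-tag and no native media (made-up sample data).
plain = {
    "user": {"screen_name": "TwitterDev"},
    "place": {},
    "entities": {"hashtags": [], "urls": []},
}

# A geo-tagged Tweet with a native photo (made-up sample data).
geo_photo = {
    "user": {"screen_name": "TwitterDev"},
    "place": {"name": "Internet"},
    "entities": {"media": [{"type": "photo"}]},
    "extended_entities": {"media": [{"type": "photo"}]},
}

plain_children = present_children(plain)
geo_photo_children = present_children(geo_photo)
print(plain_children)
print(geo_photo_children)
```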

Notes on Retweets

If you are working with a Retweet object, then that object will contain two Tweet objects, complete with two User objects. The Tweet that was Retweeted is referred to as the 'original' Tweet and is displayed under the 'retweeted_status' key. If a Retweet gets Retweeted, the 'retweeted_status' will still point to the original Tweet, meaning the intermediate Retweet is not included.

{
  "tweet": {
    "user": {
      
    },
    "retweeted_status": {
      "tweet": {
        "user": {
          
        },
        "place": {
          
        },
        "entities": {
          
        },
        "extended_entities": {
          
        }
      }
      
    },
    "place": {
      
    },
    "entities": {
      
    },
    "extended_entities": {
      
    }
  }
}

Notice that Retweets are really made up of two Tweet objects (and two sets of child objects), with the ‘top level’ (Re)Tweet containing the original Tweet under the “retweeted_status” attribute.

The same is true of Quoted Tweets, where the original Tweet being Quoted is contained under a “quoted_status” attribute. 
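
The nesting described above can be navigated with a small helper such as the following sketch. The unwrap and original_tweet functions are hypothetical names; unwrap exists only because the illustrations in this document place each Tweet under a "tweet" key, which a real payload may not do.

```python
def unwrap(obj):
    """Return the Tweet object, dropping the illustrative "tweet" wrapper if present."""
    return obj.get("tweet", obj)


def original_tweet(payload):
    """Hypothetical helper: return (kind, tweet) for the Tweet being Retweeted or Quoted.

    kind is "retweet", "quote", or "original". For a plain Tweet the
    payload's own Tweet object is returned.
    """
    tweet = unwrap(payload)
    if tweet.get("retweeted_status"):
        return "retweet", unwrap(tweet["retweeted_status"])
    if tweet.get("quoted_status"):
        return "quote", unwrap(tweet["quoted_status"])
    return "original", tweet


# Made-up sample mirroring the Retweet structure shown above.
retweet = {
    "tweet": {
        "user": {"screen_name": "retweeter"},
        "retweeted_status": {
            "tweet": {"user": {"screen_name": "author"}}
        },
    }
}

kind, original = original_tweet(retweet)
print(kind, original["user"]["screen_name"])
```

Checking "retweeted_status" before "quoted_status" is a design choice of this sketch, not a documented rule.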

Data dictionaries

Whatever your Twitter use case, understanding what these JSON-encoded Tweet objects and attributes represent is critical to successfully finding your data signals of interest. To help in that effort, there is a set of Data Dictionaries for these fundamental Twitter objects.

Reflecting the JSON hierarchy above, here are links and further descriptions of these Objects:

  • Tweet - Also referred to as a ‘Status’ object, has many ‘root-level’ attributes, parent of other objects.
    • User - Twitter Account level metadata. Will include any available account-level enrichments, such as Profile geo and Klout.
    • Entities - Contains object arrays of #hashtags, @mentions, $symbols, URLs, and media.
    • Extended Entities - Contains up to four native photos, or one video or animated GIF.
    • Places - Parent to ‘coordinates’ object.

Parsing best-practices

  • Twitter JSON is encoded as UTF-8.
  • Parsers should tolerate variance in ordering of fields with ease. It should be assumed that Tweet JSON is served as an unordered hash of data.
  • Parsers should tolerate the addition of 'new' fields. The Twitter platform has continually evolved since 2006, so there is a long history of new metadata being added to Tweets.  
  • JSON parsers must be tolerant of ‘missing’ fields, since not all fields appear in all contexts.
  • It is generally safe to consider a nulled field, an empty set, and the absence of a field as the same thing.
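
The practices above might be applied along the lines of the following sketch. The is_blank and get_path helpers are hypothetical names, and the sample payloads are made up to show a nulled field, an empty object, and a missing field being treated alike.

```python
import json


def is_blank(value):
    """Treat null and empty collections as the same 'no data' case."""
    return value is None or (isinstance(value, (list, dict)) and len(value) == 0)


def get_path(obj, *keys, default=None):
    """Hypothetical helper: walk nested keys, tolerating missing, null, or empty fields."""
    current = obj
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)  # lookup by name, never by position
    return default if is_blank(current) else current


# Three byte payloads that differ in field order, extra fields, and how
# the absent 'place' is expressed.
raw_payloads = [
    b'{"place": null, "some_new_field": 1, "user": {"screen_name": "TwitterDev"}}',
    b'{"user": {"screen_name": "TwitterDev"}, "place": {}}',
    b'{"user": {"screen_name": "TwitterDev"}}',
]

results = []
for raw in raw_payloads:
    tweet = json.loads(raw.decode("utf-8"))  # Tweet JSON is UTF-8 encoded
    results.append(
        (get_path(tweet, "user", "screen_name"), get_path(tweet, "place", "name"))
    )

print(results)
```

Unknown fields such as "some_new_field" are ignored because the code only looks up the keys it needs.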

Important notes

Product details

These JSON attribute dictionaries are specifically for the Tweets delivered by the following Twitter products:

Please note that Tweets sourced elsewhere may vary somewhat in structure from this document.

Tweet JSON formats

Note that there are currently two JSON formats used to encode Tweets: Original (‘native firehose’) and Activity Stream (AS). Gnip started encoding Tweets in the Activity Stream format in 2010, and until 2016 that format was the only source for a set of data enrichments including Profile geomatching rules and Enhanced URLs.

When Gnip 2.0 was released in 2016, these data enrichments were included in the ‘native’ format for the first time. With enrichments available in the ‘native’ format, the AS format lost its most prominent distinction. As new Tweet metadata becomes available, note that it will be available only in the ‘native’ format. A first example of this is Twitter Polls metadata, which is now available in ‘native’ format, but not in the AS format.

Next steps