Twitter Entities  

Notice that Retweets are really made up of two Tweet objects (and two sets of child objects), with the ‘top level’ (Re)Tweet containing the original Tweet under the “retweeted_status” attribute.

The same is true of Quoted Tweets, where the original Tweet being Quoted is contained under a “quoted_status” attribute. (Check out this article on identifying Retweets and Quote Tweets.)
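As a minimal sketch of how the two nesting attributes described above might be used, the following Python function classifies a parsed Tweet payload. The function name `tweet_kind` and the sample dictionaries are illustrative assumptions, not part of any Twitter library. The retweeted_status check comes first on the assumption that a Retweet of a Quote Tweet may carry both attributes.

```python
def tweet_kind(tweet: dict) -> str:
    """Classify a parsed Tweet by which nested 'original' Tweet it carries."""
    # Checked first: a Retweet of a Quote Tweet may carry both attributes.
    if tweet.get("retweeted_status"):
        return "retweet"
    if tweet.get("quoted_status"):
        return "quote"
    return "original"


# Hypothetical, heavily trimmed payloads for illustration only.
retweet = {"text": "RT @someone: hi", "retweeted_status": {"text": "hi"}}
quote = {"text": "look at this", "quoted_status": {"text": "hi"}}
plain = {"text": "hi"}

for payload in (retweet, quote, plain):
    print(tweet_kind(payload))
```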

Data dictionaries

Whatever your Twitter use case, understanding what these JSON-encoded Tweet objects and attributes represent is critical to successfully finding your data signals of interest. To help in that effort, there is a set of Data Dictionaries for these fundamental Twitter objects.

Reflecting the JSON hierarchy above, here are links and further descriptions of these Objects:

  • Tweet - Also referred to as a ‘Status’ object, has many ‘root-level’ attributes, parent of other objects.
    • User - Twitter Account level metadata. Will include any available account-level enrichments, such as Profile geo and Klout.
    • Entities - Contains object arrays of #hashtags, @mentions, $symbols, URLs, and media.
    • Extended Entities - Contains up to four native photos, or one video or animated GIF.
    • Places - Parent to ‘coordinates’ object.

Parsing best-practices

  • Twitter JSON is encoded using UTF-8 characters.
  • Parsers should tolerate variance in ordering of fields with ease. It should be assumed that Tweet JSON is served as an unordered hash of data.
  • Parsers should tolerate the addition of 'new' fields. The Twitter platform has continually evolved since 2006, so there is a long history of new metadata being added to Tweets.  
  • JSON parsers must be tolerant of ‘missing’ fields, since not all fields appear in all contexts.
  • It is generally safe to consider a nulled field, an empty set, and the absence of a field as the same thing.
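The practices above can be sketched in Python roughly as follows. The helper name `get_list` and the sample payload (including its `brand_new_field` attribute) are made up for illustration. The point is only that unknown fields are ignored and that a missing key, a null, and an empty array are all read as "nothing here".

```python
import json


def get_list(obj: dict, key: str) -> list:
    """Return obj[key] as a list, treating a missing key, null, and [] alike."""
    value = obj.get(key)
    return value if value else []


# Hypothetical payload: one nulled field, one empty array, one unknown field,
# and no "media" field at all.
raw = '{"id_str": "1", "entities": {"hashtags": null, "urls": [], "brand_new_field": 1}}'

tweet = json.loads(raw)                 # field order in the payload is irrelevant
entities = tweet.get("entities") or {}  # tolerate a missing or nulled section

print(get_list(entities, "hashtags"))   # nulled field
print(get_list(entities, "urls"))       # empty array
print(get_list(entities, "media"))      # absent field
```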

Important notes

Product details

These JSON attribute dictionaries are specifically for the Tweets delivered by the following Twitter products:

Please note that Tweets sourced elsewhere may vary somewhat in structure from this document.

Tweet JSON formats

Note that there are currently two JSON formats used to encode Tweets: Original (‘native firehose’) and Activity Stream (AS). Gnip started encoding Tweets in the Activity Stream format in 2010, and until 2016 that format was the only source for a set of data enrichments, including Profile geomatching rules and Enhanced URLs.

When Gnip 2.0 was released in 2016, these data enrichments were included in the ‘native’ format for the first time. With enrichments available in the ‘native’ format, the prominent distinction of the AS format was ended. As new Tweet metadata becomes available, note that it will be available only in the ‘native’ format. A first example of this is Twitter Polls metadata, which is now available in ‘native’ format, but not in the AS format.

Next steps

Introduction

Twitter Entities

Entities provide metadata and additional contextual information about content posted on Twitter. The entities section provides arrays of common things included in Tweets: hashtags, user mentions, links, stock tickers (symbols), Twitter polls, and attached media. These arrays are convenient for developers when ingesting Tweets, since Twitter has essentially pre-processed, or pre-parsed, the text body. Instead of needing to explicitly search and find these entities in the Tweet body, your parser can go straight to this JSON section and there they are.

Beyond providing parsing conveniences, the entities section also provides useful ‘value-add’ metadata. For example, if you are using the Enhanced URLs enrichment, URL metadata include fully-expanded URLs, as well as associated website titles and descriptions. Another example is user mentions, where the entities metadata include the numeric user ID, which is useful when making requests to many Twitter APIs.

Every Tweet JSON payload includes an entities section, with the minimum set of hashtags, urls, user_mentions, and symbols attributes, even if none of those entities are part of the Tweet message. For example, if you examine the JSON for a Tweet with a body of “Hello World!” and no attached media, the Tweet’s JSON will include the following content with entity arrays containing zero items:

"entities": {
    "hashtags": [
    ],
    "urls": [
    ],
    "user_mentions": [
    ],
    "symbols": [
    ]
  }

Notes:

  • media and polls entities will only appear when that type of content is part of the Tweet.
  • if you are working with native media (photos, videos, or GIFs), the Extended Entities object is the way to go.
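To illustrate going straight to the entities section instead of searching the text body, here is a small Python sketch. The function name `summarize_entities` and the shape of its return value are assumptions chosen for this example. The media and polls keys are read defensively because, as noted above, they only appear when that content is present.

```python
def summarize_entities(tweet: dict) -> dict:
    """Collect the pre-parsed entities of a Tweet into simple Python values."""
    entities = tweet.get("entities") or {}
    return {
        "hashtags": [h["text"] for h in entities.get("hashtags") or []],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions") or []],
        "symbols": [s["text"] for s in entities.get("symbols") or []],
        "links": [u.get("expanded_url") or u["url"] for u in entities.get("urls") or []],
        # These two arrays are optional, so only their presence is reported.
        "has_media": bool(entities.get("media")),
        "has_poll": bool(entities.get("polls")),
    }


# The "Hello World!" case described above: four empty arrays, no media, no polls.
hello = {
    "text": "Hello World!",
    "entities": {"hashtags": [], "urls": [], "user_mentions": [], "symbols": []},
}
summary = summarize_entities(hello)
print(summary)
```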

Entities Object  

The entities and extended_entities sections are both made up of arrays of entity objects. Below you will find descriptions for each of these entity objects, including data dictionaries that describe the object attribute names, types, and short description. We’ll also indicate which PowerTrack Operators match these attributes, and include some sample JSON payloads.

A collection of common entities found in Tweets, including hashtags, links, and user mentions. This entities object does include a media attribute, but its implementation in the entities section is only completely accurate for Tweets with a single photo. For all Tweets with more than one photo, a video, or animated GIF, the reader is directed to the extended_entities section.

Entities Data Dictionary

The entities object is a holder of arrays of other entity sub-objects. After illustrating the entities structure, data dictionaries for these sub-objects, and the Operators that match them, will be provided.

Field Type Description
hashtags Array of Hashtag Objects

Represents hashtags which have been parsed out of the Tweet text. Example:

{
  "hashtags": [
    {
      "indices": [
        32,
        38
      ],
      "text": "nodejs"
    }
  ]
}
media Array of Media Objects

Represents media elements uploaded with the Tweet. Example:

{
  "media": [
    {
      "type": "photo",
      "sizes": {
        "thumb": {
          "h": 150,
          "resize": "crop",
          "w": 150
        },
        "large": {
          "h": 238,
          "resize": "fit",
          "w": 226
        },
        "medium": {
          "h": 238,
          "resize": "fit",
          "w": 226
        },
        "small": {
          "h": 238,
          "resize": "fit",
          "w": 226
        }
      },
      "indices": [
        15,
        35
      ],
      "url": "http://t.co/rJC5Pxsu",
      "media_url": "http://p.twimg.com/AZVLmp-CIAAbkyy.jpg",
      "display_url": "pic.twitter.com/rJC5Pxsu",
      "id": 114080493040967680,
      "id_str": "114080493040967680",
      "expanded_url": "http://twitter.com/yunorno/status/114080493036773378/photo/1",
      "media_url_https": "https://p.twimg.com/AZVLmp-CIAAbkyy.jpg"
    }
  ]
}
urls Array of URL Objects

Represents URLs included in the text of a Tweet.

Example (without Enhanced URLs enrichment enabled):

{
  "urls": [
    {
      "indices": [
        32,
        52
      ],
      "url": "http://t.co/IOwBrTZR",
      "display_url": "youtube.com/watch?v=oHg5SJ…",
      "expanded_url": "http://www.youtube.com/watch?v=oHg5SJYRHA0"
    }
  ]
}

Example (with Enhanced URLs enrichment enabled):

{"urls": [
      {
        "url": "https://t.co/D0n7a53c2l",
        "expanded_url": "http://bit.ly/18gECvy",
        "display_url": "bit.ly/18gECvy",
        "unwound": {
          "url": "https://www.youtube.com/watch?v=oHg5SJYRHA0",
          "status": 200,
          "title": "RickRoll'D",
          "description": "http://www.facebook.com/rickroll548 As long as trolls are still trolling, the Rick will never stop rolling."
        },
        "indices": [
          62,
          85
        ]
      }
    ]
}
user_mentions Array of User Mention Objects

Represents other Twitter users mentioned in the text of the Tweet. Example:

{
  "user_mentions": [
    {
      "name": "Twitter API",
      "indices": [
        4,
        15
      ],
      "screen_name": "twitterapi",
      "id": 6253282,
      "id_str": "6253282"
    }
  ]
}
symbols Array of Symbol Objects

Represents symbols, i.e. $cashtags, included in the text of the Tweet. Example:

{
  "symbols": [
    {
      "indices": [
        12,
        17
      ],
      "text": "twtr"
    }
  ]
}
polls Array of Poll Objects

Represents Twitter Polls included in the Tweet. Example:

{"polls": [
      {
        "options": [
          {
            "position": 1,
            "text": "I read documentation once."
          },
          {
            "position": 2,
            "text": "I read documentation twice."
          },
          {
            "position": 3,
            "text": "I read documentation over and over again."
          }
        ],
        "end_datetime": "Thu May 25 22:20:27 +0000 2017",
        "duration_minutes": 60
      }
    ]
  }
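The poll object in the example above carries an end_datetime string and a duration_minutes value. As a hedged sketch, the following Python parses that timestamp and derives a start time by subtracting the duration. The function name `poll_window` is hypothetical, and treating "end minus duration" as the poll's start is an inference made for this example rather than something the payload states.

```python
from datetime import datetime, timedelta

# Format of timestamps such as "Thu May 25 22:20:27 +0000 2017".
# %a and %b assume English day and month names (the default C locale).
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def poll_window(poll: dict) -> tuple:
    """Return (start, end) datetimes for a poll object."""
    end = datetime.strptime(poll["end_datetime"], TWITTER_TIME_FORMAT)
    # int() accepts the value whether it arrives as a number or a string.
    start = end - timedelta(minutes=int(poll["duration_minutes"]))
    return start, end


poll = {"end_datetime": "Thu May 25 22:20:27 +0000 2017", "duration_minutes": 60}
start, end = poll_window(poll)
print(start.isoformat(), end.isoformat())
```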

Hashtag Object  

The entities section will contain a hashtags array containing an object for every hashtag included in the Tweet body, and include an empty array if no hashtags are present.

The PowerTrack # Operator is used to match on the text attribute. The has:hashtags Operator will match if there is at least one item in the array.
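Every entity object carries an indices pair giving the start offset and the offset of the first character after the entity. A minimal Python sketch of using that pair to slice the Tweet text follows. It assumes the offsets count Unicode code points, which is how Python indexes strings. In languages whose strings are indexed in UTF-16 units the offsets may need adjusting. The helper name `entity_span` is illustrative, and the text and hashtag values are taken from the Direct Message example in this document.

```python
def entity_span(text: str, entity: dict) -> str:
    """Slice the Tweet text covered by an entity's [start, end) indices."""
    start, end = entity["indices"]
    return text[start:end]


# Text and hashtag entity from the Direct Message example in this document.
text = "test $TWTR @twitterapi #hashtag http://t.co/p5dOtmnZyu https://t.co/ZSvIEMOPb8"
hashtag = {"text": "hashtag", "indices": [23, 31]}

span = entity_span(text, hashtag)
print(span)
# The span covers the hashtag name plus one character for the leading '#'.
print(len(span) == len(hashtag["text"]) + 1)
```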

Field Type Description
indices Array of Int

An array of integers indicating the offsets within the Tweet text where the hashtag begins and ends. The first integer represents the location of the # character in the Tweet text string. The second integer represents the location of the first character after the hashtag. Therefore the difference between the two numbers will be the length of the hashtag name plus one (for the ‘#’ character). Example:

"indices":[32,38]
text String

Name of the hashtag, minus the leading ‘#’ character. Example:

"text":"nodejs"

Media Object  

The entities section will contain a media array containing a single media object if any media object has been ‘attached’ to the Tweet. If no native media has been attached, there will be no media array in the entities. For the following reasons the extended_entities section should be used to process Tweet native media:
  • Media type will always indicate ‘photo’, even when a video or GIF is attached to the Tweet.
  • Even though up to four photos can be attached, only the first one will be listed in the entities section.

The has:media Operator will match if this array is populated.
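Following the recommendation above, a parser might prefer extended_entities and fall back to entities only when the former is missing. The sketch below shows one way to do that in Python. The function name `native_media` and the trimmed sample payload are assumptions made for illustration.

```python
def native_media(tweet: dict) -> list:
    """Return media objects, preferring extended_entities over entities."""
    extended = (tweet.get("extended_entities") or {}).get("media")
    if extended:
        return extended
    # Fallback: may list only the first photo and may report type 'photo'.
    return (tweet.get("entities") or {}).get("media") or []


# Hypothetical, trimmed payload: two photos attached, only one in 'entities'.
tweet = {
    "entities": {"media": [{"id_str": "1", "type": "photo"}]},
    "extended_entities": {
        "media": [{"id_str": "1", "type": "photo"}, {"id_str": "2", "type": "photo"}]
    },
}
print([m["id_str"] for m in native_media(tweet)])
```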

Field Type Description
display_url String

URL of the media to display to clients. Example:

"display_url":"pic.twitter.com/rJC5Pxsu"
expanded_url String

An expanded version of display_url. Links to the media display page. Example:

"expanded_url": "http://twitter.com/yunorno/status/114080493036773378/photo/1"
id Int64

ID of the media expressed as a 64-bit integer. Example:

"id":114080493040967680
id_str String

ID of the media expressed as a string. Example:

"id_str":"114080493040967680"
indices Array of Int

An array of integers indicating the offsets within the Tweet text where the URL begins and ends. The first integer represents the location of the first character of the URL in the Tweet text. The second integer represents the location of the first non-URL character occurring after the URL (or the end of the string if the URL is the last part of the Tweet text). Example:

"indices":[15,35]
media_url String

An http:// URL pointing directly to the uploaded media file. Example:

"media_url":"http://p.twimg.com/AZVLmp-CIAAbkyy.jpg"

For media in direct messages, media_url is the same https URL as media_url_https and must be accessed via an authenticated twitter.com session or by signing a request with the user’s access token using OAuth 1.0A. It is not possible to directly embed these images in a web page.

media_url_https String

An https:// URL pointing directly to the uploaded media file, for embedding on https pages. Example:

"media_url_https":"https://p.twimg.com/AZVLmp-CIAAbkyy.jpg"

For media in direct messages, media_url_https must be accessed via an authenticated twitter.com session or by signing a request with the user’s access token using OAuth 1.0A. It is not possible to directly embed these images in a web page.

sizes Size Object

An object showing available sizes for the media file. Example:

{
  "sizes": {
    "thumb": {
      "h": 150,
      "resize": "crop",
      "w": 150
    },
    "large": {
      "h": 238,
      "resize": "fit",
      "w": 226
    },
    "medium": {
      "h": 238,
      "resize": "fit",
      "w": 226
    },
    "small": {
      "h": 238,
      "resize": "fit",
      "w": 226
    }
  }
}
source_status_id Int64

Nullable. For Tweets containing media that was originally associated with a different Tweet, this ID points to the original Tweet. Example:

"source_status_id": 205282515685081088
source_status_id_str String

Nullable. For Tweets containing media that was originally associated with a different Tweet, this string-based ID points to the original Tweet. Example:

"source_status_id_str": "205282515685081088"
type String

Type of uploaded media. Possible types include photo, video, and animated_gif. Example:

"type":"photo"
url String

Wrapped URL for the media link. This corresponds with the URL embedded directly into the raw Tweet text, and the values for the indices parameter. Example:

"url":"http://t.co/rJC5Pxsu"

 

Media Size Objects

All Tweets with native media (photos, video, and GIFs) will include a set of ‘thumb’, ‘small’, ‘medium’, and ‘large’ sizes with height and width pixel sizes.
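As a rough sketch of how these sizes might be used, the following Python picks the widest ‘fit’ rendition that does not exceed a target width. The function name `pick_size` and the selection rule are assumptions for this example, and the sizes dictionary is copied from the Direct Message example in this document. Cropped renditions are skipped here because they do not keep the native aspect ratio.

```python
from typing import Optional


def pick_size(sizes: dict, max_width: int) -> Optional[str]:
    """Name of the widest 'fit' size no wider than max_width, or None."""
    candidates = [
        (size["w"], name)
        for name, size in sizes.items()
        if size.get("resize") == "fit" and size["w"] <= max_width
    ]
    # max() compares widths first; the name only breaks ties.
    return max(candidates)[1] if candidates else None


# Sizes object from the Direct Message example in this document.
sizes = {
    "medium": {"w": 600, "h": 450, "resize": "fit"},
    "large": {"w": 1024, "h": 768, "resize": "fit"},
    "thumb": {"w": 150, "h": 150, "resize": "crop"},
    "small": {"w": 340, "h": 255, "resize": "fit"},
}
print(pick_size(sizes, 700))
```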

Sizes Object

Field Type Description
thumb Size Object

Information for a thumbnail-sized version of the media. Example:

"thumb":{"h":150, "resize":"crop", "w":150}
large Size Object

Information for a large-sized version of the media. Example:

"large":{"h":238, "resize":"fit", "w":226}
medium Size Object

Information for a medium-sized version of the media. Example:

"medium":{"h":238, "resize":"fit", "w":226}
small Size Object

Information for a small-sized version of the media. Example:

"small":{"h":238, "resize":"fit", "w":226}

Size Object

Field Type Description
w Int

Width in pixels of this size. Example:

"w":150
h Int

Height in pixels of this size. Example:

"h":150
resize String

Resizing method used to obtain this size. A value of fit means that the media was resized to fit one dimension, keeping its native aspect ratio. A value of crop means that the media was cropped in order to fit a specific resolution. Example:

"resize":"crop"

 

 

 

 

 

URL Object 

The entities section will contain a urls array containing an object for every link included in the Tweet body, and include an empty array if no links are present.

The has:links Operator will match if there is at least one item in the array. The url: Operator is used to match on the expanded_url attribute. If you are using the Expanded URL enrichment, the url: Operator is used to match on the unwound.url (fully unwound URL) attribute. If you are using the Enhanced URL enrichment, the url_title: and url_description: Operators are used to match on the unwound.title and unwound.description attributes.
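When consuming URL objects in your own code, a similar order of preference can be applied. The Python sketch below returns the unwound URL when the enrichment is present and reports status 200, then falls back to expanded_url, then to the wrapped url. The function name `best_url` and the fallback order are assumptions for illustration. The sample entity is the Enhanced URLs example from the data dictionary in this document, with the description omitted.

```python
from typing import Optional


def best_url(url_entity: dict) -> Optional[str]:
    """Pick the most fully resolved link available on a URL entity."""
    unwound = url_entity.get("unwound") or {}
    if unwound.get("status") == 200 and unwound.get("url"):
        return unwound["url"]
    return url_entity.get("expanded_url") or url_entity.get("url")


# URL entity from the Enhanced URLs example in this document (trimmed).
enriched = {
    "url": "https://t.co/D0n7a53c2l",
    "expanded_url": "http://bit.ly/18gECvy",
    "display_url": "bit.ly/18gECvy",
    "unwound": {
        "url": "https://www.youtube.com/watch?v=oHg5SJYRHA0",
        "status": 200,
        "title": "RickRoll'D",
    },
    "indices": [62, 85],
}
print(best_url(enriched))
```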

Field Type Description
display_url String

URL pasted/typed into Tweet. Example:

"display_url":"bit.ly/2so49n2"
expanded_url String

Expanded version of display_url. Example:

"expanded_url":"http://bit.ly/2so49n2"
indices Array of Int

An array of integers representing offsets within the Tweet text where the URL begins and ends. The first integer represents the location of the first character of the URL in the Tweet text. The second integer represents the location of the first non-URL character after the end of the URL. Example:

"indices":[30,53]
url String

Wrapped URL, corresponding to the value embedded directly into the raw Tweet text, and the values for the indices parameter. Example:

"url":"https://t.co/yzocNFvJuL"

If you are using the Expanded and/or Enhanced URL enrichments, the following metadata is available under the unwound attribute:

Field Type Description
url String

The fully unwound version of the link included in the Tweet. Example:

"url":"https://blog.twitter.com/en_us/topics/insights/2016/using-twitter-as-a-go-to-communication-channel-during-severe-weather-events.html"
status Int

Final HTTP status of the unwinding process, a '200' indicating success. Example:

"status":200
title String

HTML title for the link. Example:

"title":"Using Twitter as a ‘go-to’ communication channel during severe weather"
description String

HTML description for the link. Example:

"description":"Using Twitter as a ‘go-to’ communication channel during severe weather"

User Mention Object  

The entities section will contain a user_mentions array containing an object for every user mention included in the Tweet body, and include an empty array if no user mention is present.

The PowerTrack @ Operator is used to match on the screen_name attribute. The has:mentions Operator will match if there is at least one item in the array.
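If the mentioned accounts are needed for follow-up API requests, their IDs can be read directly from the user_mentions array. The Python sketch below collects the string form of each ID. The function name `mentioned_user_ids` is hypothetical, and the sample mention is the one used throughout this document.

```python
def mentioned_user_ids(tweet: dict) -> list:
    """Return the id_str of every user mentioned in a Tweet."""
    mentions = (tweet.get("entities") or {}).get("user_mentions") or []
    # id_str is used so the ID is never handled as a floating-point number.
    return [m["id_str"] for m in mentions if m.get("id_str")]


tweet = {
    "entities": {
        "user_mentions": [
            {
                "name": "Twitter API",
                "indices": [4, 15],
                "screen_name": "twitterapi",
                "id": 6253282,
                "id_str": "6253282",
            }
        ]
    }
}
print(mentioned_user_ids(tweet))
```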

Field Type Description
id Int64

ID of the mentioned user, as an integer. Example:

"id":6253282
id_str String

ID of the mentioned user, as a string. Example:

"id_str":"6253282"
indices Array of Int

An array of integers representing the offsets within the Tweet text where the user reference begins and ends. The first integer represents the location of the ‘@’ character of the user mention. The second integer represents the location of the first non-screenname character following the user mention. Example:

"indices":[4,15]
name String

Display name of the referenced user. Example:

"name":"Twitter API"
screen_name String

Screen name of the referenced user. Example:

"screen_name":"twitterapi"

 

Symbol Object  

The entities section will contain a symbols array containing an object for every $cashtag included in the Tweet body, and include an empty array if no symbol is present.

The PowerTrack $ Operator is used to match on the text attribute. The has:symbols Operator will match if there is at least one item in the array.

Field Type Description
indices Array of Int

An array of integers indicating the offsets within the Tweet text where the symbol/cashtag begins and ends. The first integer represents the location of the $ character in the Tweet text string. The second integer represents the location of the first character after the cashtag. Therefore the difference between the two numbers will be the length of the cashtag name plus one (for the ‘$’ character). Example:

"indices":[12,17]
text String

Name of the cashtag, minus the leading ‘$’ character. Example:

"text":"twtr"

Poll Object

The entities section will contain a polls array containing a single poll object if the Tweet contains a poll. If no poll is included, there will be no polls array in the entities section.

Note that these Poll metadata are only available with the following Enterprise APIs:

 

Field Type Description
options Array of Option Object

An array of options, each having a poll position, and the text for that position. Example:

{"options": [
          {
            "position": 1,
            "text": "I read documentation once."
          }
      ]
}
end_datetime String

Time stamp (UTC) of when poll ends. Example:

"end_datetime": "Thu May 25 22:20:27 +0000 2017"
duration_minutes Int

Duration of poll in minutes. Example:

"duration_minutes": 60

Retweet and Quote Tweet details

From the Twitter API perspective, Retweets and Quote Tweets are special kinds of Tweets that contain the original Tweet as an embedded object. So Retweet and Quote Tweet objects are parents of a child 'original' Tweet (and thus double the size). Retweets have a top-level "retweeted_status" object, and Quote Tweets have a "quoted_status" object. For consistency, these top-level Retweet and Quote Tweet objects also have a text property and associated entities. However, the entities at the top level can differ from the entities of the embedded 'original' Tweet. In the case of Retweets, new text is prepended to the original Tweet body. For Quote Tweets, new text is appended to the Tweet body.


In general, the best practice is to retrieve the text, entities, original author, and date from the original Tweet in retweeted_status whenever it exists. An exception is retrieving Twitter entities that are part of the text added by a Quote Tweet. See below for more details and tips.

 

Retweets

An important detail with Retweets is that no additional Twitter entities can be added to the Tweet. Users cannot add hashtags, URLs, or other details when they Retweet. However, the Retweet (top-level) text attribute is composed of the original Tweet text with “RT @username: ” prepended.

In some cases, especially with accounts with long user names, the combination of these new characters and the original Tweet body can easily exceed 140 characters. In order to preserve support for 140 character based display and storage, the top-level body truncates the end of the Tweet body and adds an ellipsis (“…”). Consequently, some top-level entities positioned at the end of the original Tweet might be incorrect or missing, for instance in the case of a truncated hashtag or URL entry.

Example: https://twitter.com/FloodSocial/status/907974220298125312

Just another test Tweet that needs to be exactly 140 characters with trailing URL and hashtag http://wapo.st/2w8iwPQ  #Testing

In the above example, both the URL and hashtag were affected. Since the hashtag was completely truncated and the URL partially truncated, these are missing from the top-level entities. You will also notice the additional user_mentions top-level entity coming from the “RT @floodsocial: ” prefix on the text field.

However, the Tweet text and entities in retweeted_status perfectly reflect the original Tweet with no truncation or incorrect entities, hence our recommendation to rely on the nested retweeted_status object for Retweets.
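This recommendation can be sketched in Python as a small helper that returns the nested original Tweet when one exists. The function name `original_tweet` and the trimmed payload are assumptions for illustration. The payload is not a real API response, only a hypothetical shape that mirrors the truncation described above.

```python
def original_tweet(tweet: dict) -> dict:
    """Return the nested original for a Retweet, else the Tweet itself."""
    return tweet.get("retweeted_status") or tweet


# Hypothetical, trimmed Retweet: the top-level text is cut off and its
# hashtag entity is missing, while the nested original is intact.
retweet = {
    "text": "RT @floodsocial: Just another test Tweet…",
    "entities": {"hashtags": [], "user_mentions": [{"screen_name": "floodsocial"}]},
    "retweeted_status": {
        "text": "Just another test Tweet #Testing",
        "entities": {"hashtags": [{"text": "Testing"}], "user_mentions": []},
    },
}

source = original_tweet(retweet)
print(source["text"])
print([h["text"] for h in source["entities"]["hashtags"]])
```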

 

Quote Tweets

Quote Tweets were introduced in 2016, and differ from Retweets in that when you "quote" a Tweet you are adding new content "on top" of a shared Tweet. This new content can include nearly anything an original Tweet can have, including new text, hashtags, mentions, and URLs. One important exception is that no native media (photos, videos, and GIFs) can be added.

Since Twitter entities can be added, the Quote entities are likely different from the original entities.

In this example, a new URL and hashtag were positioned at the end of the Quote Tweet. 

Example: https://twitter.com/FloodSocial/status/907983973225160704

strange and equally tragic when islands flood... trans-atlantic testing of quote tweets | @thisuser @thatuser http://bit.ly/2vMMDuu  #testing

In this case, the top-level entities do not reflect the Quote details. 

In the payload for this Quote Tweet, truncated is true, quoted_status_id is 907974220298125312, the original Tweet is nested under quoted_status, and the full Quote is nested under extended_tweet.

However, the Tweet text and entities in extended_tweet perfectly reflect the Quote Tweet with no truncation or incorrect entities, hence our recommendation to rely on the nested extended_tweet object for Quote Tweets.
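A hedged Python sketch of that recommendation follows. It reads the text and entities from extended_tweet when that object is present and otherwise falls back to the top-level values. The attribute name full_text inside extended_tweet is an assumption not stated in this document, and the function name `quote_text_and_entities` and the trimmed payload are hypothetical.

```python
def quote_text_and_entities(tweet: dict) -> tuple:
    """Return (text, entities), preferring the nested extended_tweet object."""
    extended = tweet.get("extended_tweet") or {}
    # 'full_text' is assumed to be the untruncated text inside extended_tweet.
    text = extended.get("full_text") or tweet.get("text", "")
    entities = extended.get("entities") or tweet.get("entities") or {}
    return text, entities


# Hypothetical, trimmed Quote Tweet payload with a truncated top-level text.
quote = {
    "text": "strange and equally tragic when islands flood…",
    "truncated": True,
    "entities": {"hashtags": []},
    "extended_tweet": {
        "full_text": "strange and equally tragic when islands flood... #testing",
        "entities": {"hashtags": [{"text": "testing"}]},
    },
    "quoted_status": {"text": "the original Tweet"},
}

text, entities = quote_text_and_entities(quote)
print(text)
print([h["text"] for h in entities["hashtags"]])
```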

 

 

Entities for User object

Entities for User Objects describe URLs that appear in the user-defined profile URL and description fields. They do not describe hashtags or user_mentions. Unlike Tweet entities, user entities can apply to multiple fields within the parent object. To disambiguate, you will find parent nodes called url and description that indicate which field contains the entitized URL.

In this example, the user url field contains a t.co link that is fully expanded within the entities/url/urls[0] node of the response. The user does not have a wrapped URL in their description.

 

JSON Example

 

  {
  "id": 6253282,
  "id_str": "6253282",
  "name": "Twitter API",
  "screen_name": "twitterapi",
  "location": "San Francisco, CA",
  "description": "The Real Twitter API. I tweet about API changes, service issues and happily answer questions about Twitter and our API. Don't get an answer? It's on my website.",
  "url": "http:\/\/t.co\/78pYTvWfJd",
  "entities": {
    "url": {
      "urls": [
        {
          "url": "http:\/\/t.co\/78pYTvWfJd",
          "expanded_url": "http:\/\/dev.twitter.com",
          "display_url": "dev.twitter.com",
          "indices": [
            0,
            22
          ]
        }
      ]
    },
    "description": {
      "urls": [
        
      ]
    }
  }
}
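Reading these user entities could look roughly like the Python sketch below, which returns the expanded links found in each of the two profile fields. The function name `profile_links` is hypothetical, and the sample user is a trimmed copy of the JSON example above.

```python
def profile_links(user: dict) -> dict:
    """Map each entitized profile field to the expanded URLs it contains."""
    entities = user.get("entities") or {}
    links = {}
    for field in ("url", "description"):
        urls = (entities.get(field) or {}).get("urls") or []
        links[field] = [u.get("expanded_url") or u.get("url") for u in urls]
    return links


# Trimmed copy of the user object in the JSON example above.
user = {
    "screen_name": "twitterapi",
    "url": "http://t.co/78pYTvWfJd",
    "entities": {
        "url": {
            "urls": [
                {
                    "url": "http://t.co/78pYTvWfJd",
                    "expanded_url": "http://dev.twitter.com",
                    "display_url": "dev.twitter.com",
                    "indices": [0, 22],
                }
            ]
        },
        "description": {"urls": []},
    },
}
print(profile_links(user))
```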

Entities for Direct Messages

Entities for Direct Messages are very similar to entities for Tweets. However, there are a few differences concerning the media entities.

Unlike media shared in Tweets, media shared in Direct Messages requires authorization to view. This authorization can be presented via an authenticated twitter.com session or by signing a request with the User’s access token using OAuth 1.0A.

Also, in Tweets, media URLs are only in the media entities, but in Direct Messages, media URLs are in both media and URLs entities.

 

JSON Example

 

  {
  "id": 411031503817039874,
  "id_str": "411031503817039874",
  "text": "test $TWTR @twitterapi #hashtag http:\/\/t.co\/p5dOtmnZyu https:\/\/t.co\/ZSvIEMOPb8",
  "created_at": "Thu Dec 12 07:15:21 +0000 2013",
  "entities": {
    "hashtags": [
      {
        "text": "hashtag",
        "indices": [
          23,
          31
        ]
      }
    ],
    "symbols": [
      {
        "text": "TWTR",
        "indices": [
          5,
          10
        ]
      }
    ],
    "urls": [
      {
        "url": "http:\/\/t.co\/p5dOtmnZyu",
        "expanded_url": "http:\/\/dev.twitter.com",
        "display_url": "dev.twitter.com",
        "indices": [
          32,
          54
        ]
      },
      {
        "url": "https:\/\/t.co\/ZSvIEMOPb8",
        "expanded_url": "https:\/\/ton.twitter.com\/1.1\/ton\/data\/dm\/411031503817039874\/411031503833792512\/cOkcq9FS.jpg",
        "display_url": "pic.twitter.com\/ZSvIEMOPb8",
        "indices": [
          55,
          78
        ]
      }
    ],
    "user_mentions": [
      {
        "screen_name": "twitterapi",
        "name": "Twitter API",
        "id": 6253282,
        "id_str": "6253282",
        "indices": [
          11,
          22
        ]
      }
    ],
    "media": [
      {
        "id": 411031503833792512,
        "id_str": "411031503833792512",
        "indices": [
          55,
          78
        ],
        "media_url": "https:\/\/ton.twitter.com\/1.1\/ton\/data\/dm\/411031503817039874\/411031503833792512\/cOkcq9FS.jpg",
        "media_url_https": "https:\/\/ton.twitter.com\/1.1\/ton\/data\/dm\/411031503817039874\/411031503833792512\/cOkcq9FS.jpg",
        "url": "https:\/\/t.co\/ZSvIEMOPb8",
        "display_url": "pic.twitter.com\/ZSvIEMOPb8",
        "expanded_url": "https:\/\/ton.twitter.com\/1.1\/ton\/data\/dm\/411031503817039874\/411031503833792512\/cOkcq9FS.jpg",
        "type": "photo",
        "sizes": {
          "medium": {
            "w": 600,
            "h": 450,
            "resize": "fit"
          },
          "large": {
            "w": 1024,
            "h": 768,
            "resize": "fit"
          },
          "thumb": {
            "w": 150,
            "h": 150,
            "resize": "crop"
          },
          "small": {
            "w": 340,
            "h": 255,
            "resize": "fit"
          }
        }
      }
    ]
  }
}
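Because media links in Direct Messages appear in both the media and urls arrays, a consumer that wants only the ordinary links may need to filter out the duplicates. The Python sketch below does this by comparing wrapped t.co values. The function name `non_media_urls` is an assumption, and the sample message is a trimmed copy of the JSON example above.

```python
def non_media_urls(message: dict) -> list:
    """Return URL entities that do not also appear as media entities."""
    entities = message.get("entities") or {}
    media_links = {m["url"] for m in entities.get("media") or []}
    return [u for u in entities.get("urls") or [] if u["url"] not in media_links]


# Trimmed copy of the Direct Message in the JSON example above.
message = {
    "id_str": "411031503817039874",
    "entities": {
        "urls": [
            {"url": "http://t.co/p5dOtmnZyu", "expanded_url": "http://dev.twitter.com"},
            {"url": "https://t.co/ZSvIEMOPb8", "display_url": "pic.twitter.com/ZSvIEMOPb8"},
        ],
        "media": [
            {"id_str": "411031503833792512", "url": "https://t.co/ZSvIEMOPb8", "type": "photo"}
        ],
    },
}
print([u["url"] for u in non_media_urls(message)])
```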