Downloading Historical PowerTrack Files

Accessing download links 

If you already have access to your Job URL, you can skip to the Downloading your job section.

You can use the GET /jobs Historical PowerTrack endpoint to request a list of all the jobs associated with your account and a brief summary of their status.

Once you find the job that you are trying to download, make note of the Job URL, as you will be using it in a later step. Your UUID is located at the end of your Job URL:

https://gnip-api.gnip.com/historical/powertrack/accounts/{ACCOUNT_NAME}/publishers/twitter/jobs/{JOB_UUID}.json

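As a minimal sketch, the UUID can be pulled out of a Job URL with standard shell tools. The function name job_uuid and the account name and UUID in the sample URL are hypothetical placeholders.

```shell
# Print the job UUID: the last path segment of the Job URL, minus ".json".
# job_uuid is a hypothetical helper name, not part of any Gnip tooling.
job_uuid() {
  basename "$1" .json
}

# Example with a placeholder account name and UUID (not real values).
job_url="https://gnip-api.gnip.com/historical/powertrack/accounts/example-account/publishers/twitter/jobs/abc123.json"
job_uuid "$job_url"   # prints: abc123
```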
The results of the GET /jobs request will also include a “percentComplete” field. Once this has reached 100, you can move on to the next step.

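A minimal sketch of checking the percentComplete value from the shell, assuming the status JSON for a single job is piped in on standard input. The helper names and the sample JSON body are hypothetical; only the percentComplete field name comes from the text above. The sketch uses sed rather than a real JSON parser, so it assumes the field appears once with an integer value.

```shell
# Print the integer value of "percentComplete" found in JSON read from stdin.
# Assumes the field appears once; a proper JSON parser would be more robust.
percent_complete() {
  sed -n 's/.*"percentComplete"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Succeed (exit status 0) only when the job reports 100 percent.
job_is_complete() {
  [ "$(percent_complete)" = "100" ]
}

# Hypothetical, abbreviated status body used only for illustration.
sample='{"percentComplete":100}'
if printf '%s\n' "$sample" | job_is_complete; then
  echo "Job is ready to download"
fi
```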
Downloading your job

Because your Historical PowerTrack (HPT) job might contain thousands of files that need to be downloaded, unzipped, and combined, you will likely want to automate the process. We have listed some example strategies below to help you get started with your data.

Downloading files with a bash script (Recommended)

  1. Replace the “.json” extension of the Data URL with a “.csv” extension (or, if you only have the Job URL, replace its trailing “.json” with “/results.csv”), as in:
    https://gnip-api.gnip.com/historical/powertrack/accounts/{ACCOUNT_NAME}/publishers/twitter/jobs/{JOB_UUID}/results.csv
    
  2. Make a GET request to the CSV endpoint (or open the link in your browser) to download a file that lists the download links for your data.

    When asked to log in, use your email address as your username. If you forgot your password, you can reset it here.

    Each line contains a file name and the corresponding Amazon S3 link. This file will be used by later steps to download, unzip, and combine your data files.
  3. Install the PTDataDownload tool on your machine:
    a. Download PTDataDownload.zip

    b. Unzip the file and note the resulting folder’s location
  4. Add the CSV file from step 2 to the ‘PTDataDownload/input’ folder and make sure the ‘PTDataDownload/downloads’ folder is empty.
  5. Open ‘Terminal’ or your favorite command line interface and change your working directory to ‘PTDataDownload’.
    If you need help with this step, we recommend that you read through this tutorial about navigating through your file system with bash.
  6. Type the following into your command line:
    $ ./run.sh
    
  7. You will be prompted with the following:
    Historical Download Options:
      d: Download files.
      D: Delete downloaded file.
      q: Quit/Exit.
    
    Enter selection:
    
    Type ‘d’ and press Return to start downloading your 10-minute interval gzip files into the ‘downloads’ folder within ‘PTDataDownload’. This could take a while depending on the size of the job you are trying to download.

    Please note: If your job is interrupted for any reason, delete the most recently added gzip file from the ‘downloads’ folder and re-enter $ ./run.sh into your command line.
  8. Once all of your files have been downloaded, make your working directory PTDataDownload and enter the following into your command line:
    $ gzcat -d -r downloads/ > filename.json
    

    macOS users: You might first have to run rm -f downloads/.DS_Store to delete the automatically generated .DS_Store file before you run the gzcat command.

    The gzcat command automatically unzips and combines all of your files into a single file, which you can name whatever you’d like. The file appears in PTDataDownload once the command completes.

    Please note: If your job is larger than 10 GB, consider dividing the downloaded gzip files from step 7 into a few different folders so that you don't end up with a file that is too big for your computer to handle.
    If you do divide your data, you will need to run the gzcat command once for each folder containing gzip files.

    Example:

    $ gzcat -d -r downloads/11_2017_folder > filename_11_2017.json
    $ gzcat -d -r downloads/12_2017_folder > filename_12_2017.json
    $ gzcat -d -r downloads/01_2018_folder > filename_01_2018.json
    
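If the files are split across many folders, the per-folder commands could be generated with a loop instead of typed one by one. This is a minimal sketch, assuming every subfolder of ‘downloads’ holds gzip files. The function name combine_folders and the output naming pattern are hypothetical, and it calls gzip -dc -r in place of gzcat -d -r so that it can also run on systems that do not ship gzcat.

```shell
# Decompress each subfolder of a downloads directory into its own JSON file.
# combine_folders is a hypothetical helper: $1 is the downloads directory and
# $2 is the prefix for the output files.
combine_folders() {
  for dir in "$1"/*/; do
    dir=${dir%/}                       # drop the trailing slash from the glob
    [ -d "$dir" ] || continue          # nothing to do if there are no subfolders
    name=$(basename "$dir")
    # Write the decompressed contents of every file under $dir to one output file.
    gzip -dc -r "$dir" > "${2}_${name}.json"
  done
}

# Usage, run from inside PTDataDownload:
#   combine_folders downloads filename
# A subfolder named 11_2017_folder is written to filename_11_2017_folder.json.
```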

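Two of the manual steps above lend themselves to small shell helpers: building the CSV link from a Job URL in step 1, and finding the file to delete after an interrupted download in step 7. The sketch below is illustrative only. The function names are hypothetical, “most recently added” is approximated by most recent modification time, and the downloaded files are assumed to end in .gz.

```shell
# Step 1: turn a Job URL (.../jobs/{JOB_UUID}.json) into the CSV link
# (.../jobs/{JOB_UUID}/results.csv). results_csv_url is a hypothetical name.
results_csv_url() {
  printf '%s\n' "${1%.json}/results.csv"
}

# Step 7: print the most recently modified .gz file in a folder, which is
# assumed to be the partial file left behind by an interrupted download.
# Parses ls output, so it relies on file names that contain no newlines.
newest_gzip() {
  ls -t "$1"/*.gz 2>/dev/null | head -n 1
}

# Usage, run from inside PTDataDownload (review the name before deleting):
#   partial=$(newest_gzip downloads)
#   [ -n "$partial" ] && rm -i "$partial"
```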
Next steps

Once you’ve downloaded your data, you should review our PowerTrack Data Format documentation.