API for UN Comtrade bulk data
21 Mar 2024: Found a Comtrade GitHub page with their official API guideline and code to use their new platform. They have explained better than I could: link here
21 Sep 2023: Comtrade decided to make an upgrade that makes the platform somehow LESS user-friendly... This API guide is not compatible to the current Comtrade platform.
UN Comtrade Database is one of the best source when it comes to bilateral trade data by product code. As of early 2022, it covers more country-year observations than WTO and ITC. This blogpost aims to guide you through step-by-step to fetch large amount of data from Comtrade via Stata-Python function.
*I used to be a Python dummy (to a certain extent, still am), it was my buddy, Satyam Anand, at Graduate Institute (now at George Town) introduced the API magic to me. All credit goes to him.
Too long didn’t read: jump to the final do.file here
Step 0 - Preamble: Is your Stata ready for this?
1) Check if Python is already integrated with your Stata, you can simply type “python search” in the Stata command box. If there is no Python installation found, follow this
. python search ---------------------------------------------------------------------------------------------------------------------- Python environments found: /usr/local/bin/python3 ----------------------------------------------------------------------------------------------------------------------
2) Check if you have the necessary python packages for the task. Follow this guide to install the missing package(s)
. python which json <module 'json' from '/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py > '> . python which numpy <module 'numpy' from '/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/n > umpy/__init__.py'> . python which pandas <module 'pandas' from '/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/ > pandas/__init__.py'> . python which requests <module 'requests' from '/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package > s/requests/__init__.py'>
Step 1 - Identify your data of interest
As a working example, let’s say we want to get:
Data: country i export to China, for all countries in the world
Frequency: Annual
Time period: 2000 to 2021
Classification: All 2-digit SITC Rev. 2
Once, you put the according info on Comtrade - Get data you should see the below page with “View API call” in the bottom
! The requested data is therefore accessible via: http://comtrade.un.org/api/get?max=1000&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2
Note that, I changed max=502 to max=1000, I will explain below what does that mean and why.
Step 2 - Understand the API call
/api/get?max=502&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2 (What are we looking at ??)
API code | Range | Category | In simple words |
---|---|---|---|
max=502 | [1, 10000] | Limit | Download limits on number of observations, guest limit is 10000 |
type=C | [C,S] | Data type | C is Commodities (merchandise trade data); S is Services (trade in services data) |
freq=A | [A,M] | Obs. frequency | A is annual frequency; M is monthly |
px=S2 | S[i], H[i], i = [1,4] | Classification system | S2 is SITC Revision 2 ; S[i] is SITC Revision i ; H0 is HS 1992 (oldest), H4 is HS 2012 (relatively latest) |
ps=2021 | [1962,2021] | Time period | 2021 is year 2021; it is possible to do up to 5 years per request, but it might run into the download limit, hence I will do it by year here |
r=all | All, UN codes | Reporting area | all is all countries (Not aggregated) |
p=156 | All, UN codes | Partner(s) | 156 is China’s UN numerical country code |
rg=2 | [1,2,all] | Trade flow | 2 is export from reporting area to partner, 1 is import, all is both (not recommended, because download ceiling AGAIN!) |
cc=AG2 | [TOTAL, AG1, AG2, AG3, AG4, AG5, AG6, ALL] | Classification code | AG2 is all of the 2-digit codes, AG6 is the most detailed option that are internationally comparable. |
See more in SITC Revision 2, UN numerical country code and API documentation
Some of you might already notice the pattern:
/api/get?[API code 1]&[API code 2]&….
while each block of [API code] is in a format of
[x] = [choice]
We will make use of this pattern for the Python code below.
! Now, recall our objective is to fetch all bilateral export to China from 2000 to 2021, that means the above API setting is applicable to all years once we introduce a simple loop on ps = [2000, 2001, 2002, …, 2021]
Step 3 - Check the API link
We have this link generated by the click and select interface from UN Comtrade
http://comtrade.un.org/api/get?max=1000&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2
We better make sure the API link is valid before put it in work
Install JSON formatter on Chrome extension
Copy and paste the API link
You should see the below:
! Note that all our data of interest begins from
“dataset” : (red box)
Step 4 - Test version
1) Open do-file, set up the destination folder
local filepath = "/Users/.../blogpost1_api_comtrade" // adjust to yours *country i export to china cd `filepath'/output/i_X_CHN
2) Initiate python in Stata do-file
python: [python code] end
3) [python code part] Add necessary packages
import json import numpy as np import pandas as pd import requests
4) Copy and paste the valid API link we have AND fetch the JSON file
url = f'http://comtrade.un.org/api/get?max=10000&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2' result = requests.get(url).json()
5) Recall that we found our data of interest from “dataset” : in Step 3
*Indent is essential
if 'dataset' in result: df = pd.DataFrame(result['dataset']) df = df.replace({None: np.nan}) df.columns= [i[:32] for i in df.columns]
6) Store the data in Stata format (.dta)
df.to_stata(f'i_X_China_2021.dta') end *exiting python environment
7) Check your work
use ./i_X_China_2021, clear
local filepath = "/Users/.../blogpost1_api_comtrade" // adjust to yours *country i export to china cd `filepath'/output/i_X_CHN python: import json import numpy as np import pandas as pd import requests url = f'http://comtrade.un.org/api/get?max=10000&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2' result = requests.get(url).json() if 'dataset' in result: df = pd.DataFrame(result['dataset']) df = df.replace({None: np.nan}) df.columns= [i[:32] for i in df.columns] df.to_stata(f'i_X_China_2021.dta') end use ./i_X_China_2021, clear
Step 5 - Loop it through years
Now that we have a single-year do-file up and running, we can easily extend it with a loop from 2000 to 2021 and storing each year as one .dta file.
1) Compartmentalize the API link
def Comtrade_Scraper (ps: int, type: str= 'C', freq: str= 'A', px : str= 'S2', r : str= 'all', p : int= 156, rg : int= 2, cc : str= 'AG2'):
Note that ps is the only compartment without an equal sign as it is our only varying part and it is set to be loops by the later command
2) Define the ps (refers to year) being the varying input
""" Wrapper for creating URLs to access the Comtrade API ARGUMENTS ********* Required ps = year """
3) Assemble the API link
base = 'https://comtrade.un.org/api/get?max=10000' url = f'{base}&type={type}&freq={freq}&px={px}&ps={ps}&r={r}&p={p}&rg={rg}&cc={cc}'
4) JSON to .dta file
a little twist here is that now we name the file as i_X_China_{ps}.dta where {ps} refers to the year [2000, 2021]
result = requests.get(url).json() if 'dataset' in result: df = pd.DataFrame(result['dataset']) df = df.replace({None: np.nan}) df.columns= [i[:32] for i in df.columns] df.to_stata(f'i_X_China_{ps}.dta') return df
Note that I also added return df, it is just a small trick to return the dataframe right before converting it into .dta file. The practical use is that you can keep an eye on which year has ran and quickly vet if there is any issue from the no. of obs. (red box) and/or no. of columns (blue box).
5) Introduce the loop
Python iterates with increments by 1 and stops before the ending number. In other words, we have to put (2000, 2022) for our period of interest year 2000 to 2021.
for i in range(2000,2022): Comtrade_Scraper(i)
6) Assemble all
local filepath = "/Users/.../blogpost1_api_comtrade" // adjust to yours *country i export to china cd `filepath'/output/i_X_CHN python: import json import numpy as np import pandas as pd import requests def Comtrade_Scraper (ps: int, type: str= 'C', freq: str= 'A', px : str= 'S2', r : str= 'all', p : int= 156, rg : int= 2, cc : str= 'AG2'): """ Wrapper for creating URLs to access the Comtrade API ARGUMENTS ********* Required ps = year """ base = 'https://comtrade.un.org/api/get?max=10000' url = f'{base}&type={type}&freq={freq}&px={px}&ps={ps}&r={r}&p={p}&rg={rg}&cc={cc}' result = requests.get(url).json() if 'dataset' in result: df = pd.DataFrame(result['dataset']) df = df.replace({None: np.nan}) df.columns= [i[:32] for i in df.columns] df.to_stata(f'i_X_China_{ps}.dta') return df for i in range(2000,2022): Comtrade_Scraper(i) end *Explains below in 8) Final touch di "`c(pwd)'" // Display path to current folder local files : dir "`c(pwd)'" files "*.dta" foreach x of local files { di "`x'" // Display file name append using `x' } save i_X_China_2000_2021.dta, replace
7) Result
8) Final touch
Appending all the by-year files into long format
di "`c(pwd)'" // Display path to current folder local files : dir "`c(pwd)'" files "*.dta" foreach x of local files { di "`x'" // Display file name append using `x' } save i_X_China_2000_2021.dta, replace
Variant version:
Some final notes:
An obvious shortcoming is that the download data limit restricting researchers access to the most granular data possible (6-digit product code). Some organisation has premium site license subscription, e.g. Geneva Graduate Institute. By connecting via their IP addresses, I could download up to 100’000 obs. per request, which is sufficient to cover all country i-to-country j by 4-digit product code (75’803 country-pair-product obs. for year 2019).
To go further from there, you will need to find an authentication token/ valid account at Comtrade from powerful friend(s), colleague(s) or supervisor(s).
Long page done and you have made it! Hope you all enjoy this guide! Any questions / suggestion, you can find me here.
More resources:
Stata-Python integration blogpost by Chuck Huber