API with Python via Google Colab
One reader pointed out that the Stata-Python integration is only available from Stata 16 onwards, so users with older versions cannot make use of the convenient API. This variant of the original guide shows how to download the data via Google Colab, a free, online Python platform that requires minimal setup.
Step 0 - Preamble
Install Google Colab on your browser (preferably Google Chrome) if you haven’t already.
Set up your folder on Google Drive. Recommended folder name: Comtrade, with two subfolders: code and i_X_CHN.
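If you would rather create the folders from a Colab cell than by hand, the layout above can be sketched as follows. This is only an illustration: `make_project_folders` is a hypothetical helper, and it should be run only after Drive has been mounted (Step 1), otherwise the folders end up on the Colab machine's local disk instead of on Drive.

```python
from pathlib import Path


def make_project_folders(drive_root: str = '/content/drive/MyDrive') -> Path:
    """Create Comtrade/code and Comtrade/i_X_CHN under drive_root if missing.

    drive_root is assumed to be the mounted Google Drive folder.
    """
    project = Path(drive_root) / 'Comtrade'
    for sub in ('code', 'i_X_CHN'):
        # exist_ok=True makes the call safe to repeat on every re-run
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project
```

Calling `make_project_folders()` in a cell after mounting returns the path of the Comtrade folder.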
Step 1 - Mount Google Drive locally
from google.colab import drive
drive.mount('/content/drive')
My personal trick is to keep the drive-mounting code in its own code block, separate from the rest, so you don’t need to re-authorize on every re-run.
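An alternative to a separate cell is to check whether Drive already looks mounted before calling `drive.mount`. The sketch below assumes that a mounted Drive shows up as a `MyDrive` folder under the mount point (as in the paths used in this guide); `needs_mount` and `mount_if_needed` are hypothetical helper names.

```python
import os

MOUNT_POINT = '/content/drive'


def needs_mount(mount_point: str = MOUNT_POINT) -> bool:
    """Return True if no MyDrive folder is visible under mount_point."""
    return not os.path.isdir(os.path.join(mount_point, 'MyDrive'))


def mount_if_needed(mount_point: str = MOUNT_POINT) -> bool:
    """Mount Google Drive only when it is not mounted yet.

    Returns True if a mount was attempted, False if it was skipped.
    """
    if not needs_mount(mount_point):
        return False
    # google.colab only exists inside a Colab runtime, so import it lazily
    from google.colab import drive
    drive.mount(mount_point)
    return True
```

In Colab, calling `mount_if_needed()` at the top of the notebook would then skip the mount when Drive is already available.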
Step 2 - Paste the Python code and adjust file directory
Copy and paste everything between Python: and end from the final do-file.
Add the following file directory shortcut:
root = '/content/drive/MyDrive/Comtrade/i_X_CHN'
In addition, adjust the data export line:
df.to_stata(f'{root}/i_X_CHN_{ps}.dta')
You should then end up with the following:
from google.colab import drive
drive.mount('/content/drive')

import json
import numpy as np
import pandas as pd
import requests

root = '/content/drive/MyDrive/Comtrade/i_X_CHN'

def Comtrade_Scraper(ps: int,
                     type: str = 'C',
                     freq: str = 'A',
                     px: str = 'S2',
                     r: str = 'all',
                     p: int = 156,
                     rg: int = 2,
                     cc: str = 'AG2'):
    """
    Wrapper for creating URLs to access the Comtrade API

    ARGUMENTS
    *********
    Required
    ps = year
    """
    base = 'https://comtrade.un.org/api/get?max=10000'
    url = f'{base}&type={type}&freq={freq}&px={px}&ps={ps}&r={r}&p={p}&rg={rg}&cc={cc}'
    result = requests.get(url).json()
    if 'dataset' in result:
        df = pd.DataFrame(result['dataset'])
        df = df.replace({None: np.nan})
        df.columns = [i[:32] for i in df.columns]
        df.to_stata(f'{root}/i_X_CHN_{ps}.dta')
        return df

for i in range(2000, 2022):
    Comtrade_Scraper(i)
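To check which request is actually sent for a given year, for example when one year comes back empty, the URL can be built and printed without calling the API. A minimal sketch: `build_url` is a hypothetical helper, not part of the original code, and it simply mirrors the default parameter values used in Comtrade_Scraper.

```python
from urllib.parse import urlencode

BASE = 'https://comtrade.un.org/api/get'


def build_url(ps, type_='C', freq='A', px='S2', r='all',
              p=156, rg=2, cc='AG2', max_rows=10000):
    """Build the request URL for one year (ps) without sending it."""
    params = {
        'max': max_rows, 'type': type_, 'freq': freq, 'px': px,
        'ps': ps, 'r': r, 'p': p, 'rg': rg, 'cc': cc,
    }
    # urlencode keeps the dict order and converts the values to strings
    return f'{BASE}?{urlencode(params)}'


print(build_url(2004))
# https://comtrade.un.org/api/get?max=10000&type=C&freq=A&px=S2&ps=2004&r=all&p=156&rg=2&cc=AG2
```

Pasting the printed URL into a browser is a quick way to see the raw response for that year.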
Output files should look like this:
Some drawbacks:
There are some glitches, e.g. the 2004 data were not downloaded (the file is only 315 bytes).
Re-running the code block too many times will hit the request limit (somehow more often than with the Stata version).
If you already have a Python setup, e.g. Anaconda or others, you are probably better off using that than Google Colab. Colab is convenient, but it is not without a cost.
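One way to soften the first two drawbacks is to re-download only the years that failed, pausing between requests. The sketch below is an assumption-laden illustration rather than a tested fix: `find_failed_years` and `rerun_years` are hypothetical helpers, the 1024-byte threshold is a guess based on the 315-byte empty file mentioned above, and the 5-second pause is arbitrary.

```python
import os
import time
from typing import Callable, Iterable, List


def find_failed_years(root: str, years: Iterable[int],
                      min_bytes: int = 1024) -> List[int]:
    """Return the years whose .dta file is missing or smaller than min_bytes."""
    failed = []
    for year in years:
        path = os.path.join(root, f'i_X_CHN_{year}.dta')
        if not os.path.isfile(path) or os.path.getsize(path) < min_bytes:
            failed.append(year)
    return failed


def rerun_years(years: Iterable[int], fetch: Callable[[int], object],
                pause: float = 5.0) -> None:
    """Call fetch(year) for each year, sleeping between calls.

    The pause length is an assumption; adjust it to the API's request limit.
    """
    for year in years:
        fetch(year)
        time.sleep(pause)
```

With the code from Step 2 already run, `rerun_years(find_failed_years(root, range(2000, 2022)), Comtrade_Scraper)` would retry only the missing or near-empty years.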