Import, clean, ready-to-go
Import data
Directly from GitHub:
import delimited "https://raw.githubusercontent.com/filelink.csv", clear
Some options:
delimiter(comma) varnames(1) encoding(UTF-8)
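As a minimal sketch of how these options fit together in one call: the block below first writes a tiny CSV to disk so it can run on its own. The file name example_import.csv and its contents are made up for illustration; with a real dataset you would pass the raw GitHub URL instead.

```stata
* write a small CSV so the example is self-contained (hypothetical file and data)
file open fh using "example_import.csv", write text replace
file write fh "id,country" _n
file write fh "1,Vietnam" _n
file write fh "2,Japan" _n
file close fh

* comma-delimited, variable names in row 1, UTF-8 encoded
import delimited "example_import.csv", delimiter(comma) varnames(1) encoding(UTF-8) clear
describe
```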
Drop all column(s) if all obs. are missing
A simple loop over all variables; assert mi() succeeds (so _rc is 0) only when every observation of the variable is missing:
foreach var of varlist _all {
    capture assert mi(`var')
    if !_rc {
        drop `var'
    }
}
The code above is written to be generic, so it works on any dataset without adjustment. There should be an equivalent for dropping row(s) in which all obs. are missing, but I couldn't find it in my old do-files as of now. Please drop me a note if you know one.
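One way the row version could be done, as a sketch rather than the snippet from the old do-files: count the non-missing values in each row and drop rows where the count is zero. The toy data and variable names (x, y, s) are assumptions for illustration; missing() handles both numeric and string variables.

```stata
* toy data: the 2nd observation is missing in every variable
clear
input x y str3 s
1 10 "a"
.  . ""
3  . ""
end

* capture the variable list before adding the helper variable
unab allvars : _all

* count non-missing values per row
tempvar n_nonmiss
generate long `n_nonmiss' = 0
foreach var of local allvars {
    quietly replace `n_nonmiss' = `n_nonmiss' + !missing(`var')
}

* drop rows where every variable is missing
drop if `n_nonmiss' == 0
list
```

Like the column loop, this does not hard-code variable names beyond the toy data, so the part from unab onwards should carry over to other datasets. Run it as one block, because the temporary variable only exists while the do-file is running.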
Deal with date
This guide from Wisconsin explains it better than I could.
Business Calendar
If the time variable is already nicely formatted, you can simply apply a business-calendar format (here for a calendar named simple):
format mydate %tbsimple
format mydate %tbsimple:CCYY.NN.DD
would display 21nov2011 as 2011.11.21; see more in the Stata guide
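The %tb formats need the matching business-calendar file to be available, so a quick way to try the CCYY.NN.DD display codes is on an ordinary daily date with %td. This is only a sketch of the display mask, not of business calendars themselves; mydate_td is a hypothetical variable name.

```stata
* 21nov2011 as a regular daily date, shown with the same display codes
display %tdCCYY.NN.DD td(21nov2011)

* the same mask applied to a variable
clear
set obs 1
generate mydate_td = td(21nov2011)
format mydate_td %tdCCYY.NN.DD
list
```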
Deal with String
See Stata Expert website
Annoying leading/trailing spaces in string variables:
replace var = strtrim(var)
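In the same generic spirit as the column-dropping loop, the trim can be applied to every string variable at once. A minimal sketch, assuming toy variables name, city and x that are not part of any real dataset:

```stata
* toy data with padded strings and one numeric variable
clear
set obs 2
generate str10 name = "  anna "
generate str10 city = " hanoi  "
generate x = 1

* collect the string variables, then trim each one
ds, has(type string)
local strvars `r(varlist)'
foreach var of local strvars {
    replace `var' = strtrim(`var')
}
list
```

Looping over a local macro rather than directly over r(varlist) keeps the loop harmless when the dataset has no string variables.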
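For country names, I usually remove leading/trailing spaces, collapse repeated internal spaces, and convert everything to upper case (stritrim() alone only collapses internal blanks, so strtrim() is needed as well):
gen COUNTRY = upper(strtrim(stritrim(country)))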
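To see what each function contributes, here is a small sketch on made-up country strings. strtrim() strips leading and trailing blanks, stritrim() collapses runs of internal blanks to one, and upper() changes the case; the sample values are assumptions for illustration.

```stata
* toy data with messy spacing and mixed case
clear
set obs 2
generate str20 country = "  viet   nam "
replace country = "Japan  " in 2

* trim the ends, collapse internal blanks, then upper-case
generate COUNTRY = upper(strtrim(stritrim(country)))
list
```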
Rename variable by stored value
Here I rename a variable to the string stored in one of its observations,
e.g. var2 → tes2
clear all
input str4 var1 str4 var2
"tes1" "tes2"
"tes3" "tes4"
end

* storing the 1st obs. from var2
local variable_name = var2[1]
* just checking what we have stored
di "`variable_name'"
* renaming var2 as the stored value
rename var2 `variable_name'
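The same idea can be extended to every variable at once, for example when the real header ended up in the first row of data. This is a sketch under assumptions: all variables are strings, the first-row values are distinct, and strtoname() is used to turn each value into a legal Stata name. Dropping the first row afterwards is my own addition.

```stata
clear all
input str4 var1 str4 var2
"tes1" "tes2"
"tes3" "tes4"
end

* rename each variable to the (sanitised) value in its first observation
foreach var of varlist _all {
    local newname = strtoname(`var'[1])
    rename `var' `newname'
}

* the first row has served its purpose as a header
drop in 1
describe
```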