Data Input — Steve Ka Lok Wong

Import, clean, ready-to-go

Import data

Directly from GitHub (use the raw file URL):

import delimited "https://raw.githubusercontent.com/filelink.csv", clear

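Some useful options: delimiters(",") sets the separator, varnames(1) reads variable names from the first row, and encoding("utf-8") declares the file's character encoding:

delimiters(",") varnames(1) encoding("utf-8")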
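Since the GitHub link above is only a placeholder, here is a minimal sketch you can run offline. It writes a tiny CSV to a hypothetical file demo_input.csv in the current directory and imports it with delimiters(","), varnames(1) and encoding("utf-8").

```stata
* Minimal sketch: demo_input.csv is a hypothetical file used only for illustration
clear all
file open fh using "demo_input.csv", write text replace
file write fh "country,gdp" _n      // header row, becomes the variable names
file write fh "France,2.7" _n
file write fh "Japan,4.2" _n
file close fh

import delimited "demo_input.csv", delimiters(",") varnames(1) encoding("utf-8") clear
describe
erase "demo_input.csv"              // tidy up; the data stay in memory
```

After the import, country is a string variable and gdp is numeric, with one observation per data row.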

Drop column(s) in which all obs. are missing

A simple loop over all variables:

foreach var of varlist _all {
    capture assert mi(`var')    // _rc == 0 only if the variable is missing in every obs.
    if !_rc {
        drop `var'              // drop the all-missing column
    }
}

The code above is generic, so it works on any dataset without adjustment. There should be an equivalent for dropping row(s) in which all obs. are missing; I couldn't find one in my old do-files yet. Please drop me a note if you know one.
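One possible way to drop all-missing rows, sketched under some assumptions: it counts the non-missing values in each row with egen's rownonmiss() (the strok option lets it count string variables too, where "" counts as missing) and drops rows where that count is zero. The helper name nonmiss_tmp is hypothetical and must not already exist in your data.

```stata
* Minimal sketch: toy data where the 2nd row is missing in every variable
clear all
input float x str4 s
1 "a"
. ""
. "b"
end

unab allvars : _all                                // all variables, before adding the helper
egen nonmiss_tmp = rownonmiss(`allvars'), strok    // non-missing count per row
drop if nonmiss_tmp == 0                           // row is missing everywhere
drop nonmiss_tmp
list
```

The community-contributed missings command on SSC also handles this, if installing a package is an option.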

Deal with dates

This guide from Wisconsin explains it better than I could.
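As a quick reminder of the basic step, here is a minimal sketch that turns a string date into a Stata daily date with date() and a "DMY" mask, then applies a %td display format. The variable names datestr and mydate are hypothetical.

```stata
* Minimal sketch: datestr and mydate are hypothetical variable names
clear all
input str12 datestr
"21nov2011"
"22 Nov 2011"
end

gen mydate = date(datestr, "DMY")   // days since 01jan1960; missing if unparseable
format mydate %td                   // display as a daily date
list
```

format only changes how the number is displayed; the stored value is still a day count, so comparisons like mydate == td(21nov2011) work regardless of the format.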

Business Calendar

If the time variable already holds business dates for the calendar simple, you can simply set the display format:

format mydate %tbsimple

format mydate %tbsimple:CCYY.NN.DD

would display 21nov2011 as 2011.11.21; see more in the Stata guide.
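If the variable holds regular daily (%td) dates instead, format alone is not enough: the values have to be converted with bofd() first. Below is a minimal sketch under stated assumptions: it writes its own hypothetical weekdays-only calendar demo.stbcal for November 2011 to the current directory (Stata looks for calendar files along the ado-path, which includes the current directory) rather than relying on simple.stbcal.

```stata
* Minimal sketch: demo.stbcal is a hypothetical calendar written just for this example
clear all
file open cal using "demo.stbcal", write text replace
file write cal "version 12" _n
file write cal `"purpose "Hypothetical weekdays-only calendar""' _n
file write cal "dateformat dmy" _n
file write cal "range 01nov2011 30nov2011" _n
file write cal "centerdate 01nov2011" _n
file write cal "omit dayofweek (Sa Su)" _n     // weekends are not business days
file close cal

set obs 1
gen mydate = td(21nov2011)                     // regular daily date
gen bdate = bofd("demo", mydate)               // same day as a business date
format mydate %td
format bdate %tbdemo:CCYY.NN.DD
list
```

bofd() returns missing for days the calendar omits (e.g. weekends), and dofb(bdate, "demo") converts back to a daily date.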

Deal with strings

See the Stata Expert website.

Remove annoying leading/trailing spaces in a string variable:

replace var = strtrim(var)
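In the same generic spirit as the column-dropping loop above, a minimal sketch that applies strtrim() to every string variable in memory, using ds to list the string variables. The toy variables name and city are hypothetical.

```stata
* Minimal sketch: toy data with stray leading/trailing spaces
clear all
input str10 name str10 city
" alice " "paris  "
"bob"     " rome"
end

quietly ds, has(type string)        // r(varlist) = all string variables
foreach var in `r(varlist)' {
    replace `var' = strtrim(`var')  // remove leading/trailing blanks
}
list
```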

For country names, I usually remove leading/trailing spaces, collapse repeated internal spaces into one, and convert everything to upper case:

gen COUNTRY = upper(strtrim(stritrim(country)))
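A minimal sketch showing why both functions are used: stritrim() only collapses consecutive internal blanks, while strtrim() removes the leading and trailing ones. The toy values are made up.

```stata
* Minimal sketch: messy country names (made-up values)
clear all
input str24 country
"  united   states "
"France"
end

gen COUNTRY = upper(strtrim(stritrim(country)))   // trim, collapse, upper-case
list
```

upper() only changes ASCII letters; for names with accented characters, ustrupper() is the Unicode-aware alternative.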

Rename variable by stored value

Here I rename a variable after a string value stored in one of its observations,
i.e. var2 → tes2, where "tes2" is the first observation of var2:

clear all 

input str4 var1 str4 var2
"tes1" "tes2"
"tes3" "tes4"
end

local variable_name = var2[1] 
* storing the 1st obs. from var2 
di "`variable_name'" 
* just checking what we have stored

rename var2 `variable_name' 
* renaming var2 as the stored value
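The same idea extends to every variable, which helps when a header row was read in as data. A minimal sketch, assuming all variables are strings and that their first-row values are distinct: strtoname() turns each value into a valid Stata name, and the header row is dropped afterwards.

```stata
* Minimal sketch: same toy data as above
clear all
input str4 var1 str4 var2
"tes1" "tes2"
"tes3" "tes4"
end

foreach var of varlist _all {
    local newname = strtoname(`var'[1])   // 1st obs. as a valid variable name
    rename `var' `newname'
}
drop in 1                                 // the header row is no longer data
list
```

When importing, varnames(1) avoids this step entirely.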