Advanced Database Tricks for Social Scientists – Querying a SQL Database in Stata

This is a quick guide to setting up a connection to directly query an ODBC-compliant database from a do file in Stata.

NOTE: As the title suggests, this guide assumes relatively advanced knowledge of both relational databases (e.g. Oracle, SQL Server, MySQL, etc) and Stata.  Viewer discretion advised.

OBJECTIVE: Use a Stata do file to directly query a live ODBC-compliant, relational, T-SQL database and load the results as the current dataset in Stata.  Do this using a connection string so no OS-specific named ODBC datasources are required.

Step 1 – Make sure you have a working connection to your database server of choice (and have the appropriate permissions and access)

Step 2 – Assemble the following information to use in creating a connection string.

Step 3 – Use the following code in your Stata do file

*** LOAD DATA ***
// clear any data already loaded in memory
clear

//database connection
local db = "DRIVER={YOURDRIVERNAMEHERE};SERVER=yourservernameoripaddress;DATABASE=databasename;UID=uid;PWD=pwd;"

// SQL SELECT statement to retrieve data
local sql = "SELECT * FROM YourTable"

// Load data via ODBC (using connection string)
odbc load, exec("`sql'") connectionstring(“`db'") clear

//destring any vars that came in as strings but should be numeric
destring, replace

 

The driver attribute in the connection string is the full name of the driver as your OS sees it when installed. For example, when I connect to MySQL, I use the following

DRIVER={MySQL ODBC 5.3 Unicode Driver}

You can find the driver name by checking your ODBC drivers using a command line (WIN/OSX/Linux) or with a GUI through the ODBC settings in your control panel (WIN) or by installing the iODBC GUI (OSX/Linux) available at: http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/Downloads

Coming Soon: In a former life, I was a software engineer, IT project manager, and database developer. I built software and worked with enterprise-level database tools on a daily basis for about a dozen years before getting my PhD and becoming a full-time researcher and teacher in the field of Management and Entrepreneurship.  When my fellow academics learn this part of my history, they often ask me for advice in learning how to better manage the data they use for social science research.  So, I’ve decided to post a series of tutorials (not sure whether they’ll be articles, videos, or both) giving an introduction to how an ordinary social scientist can use the kinds of high-horsepower tools that software and database engineers use every day.  Stay Tuned.