General Instructions for Opening an Extract on Your PC

The IPUMS extract system does not provide data in the format of any particular statistics package. Instead, we provide compressed data files and the command files necessary for reading the raw data into SPSS, Stata, and SAS.

Step 1: Download the data and command files

On the Download or Revise Extracts page, you will see your extract requests along with the creation date, optional description, and five links on the right side of the page: data, codebook, SPSS, SAS, STATA.

Download the data file and the command file. To get the data, right-click on the "data" link and select "Save Link As". Do the same for the link named "SPSS", "SAS", or "Stata". The downloaded data file should have the suffix "dat.gz", such as "ipumsi_".

Mac OSX will decompress the file when you double-click it. Many Windows-based computers will also do this. If your Windows-based computer does not know how to decompress the file, you need to download decompression software. A free option is 7zip; other programs are also available. When decompression is complete, you should see a file with the suffix ".dat" (decompression removes the ".gz" part of the suffix).

Note the path to the location of the ".dat" file on your computer. If you are unsure about the path to the file, right-click on the file and choose "Properties" (or, on Mac, "Get Info"). The path is indicated in the "Location:" section of the Properties or Get Info window.

The final step is to modify your command file to indicate the location of the ".dat" file on your computer. These instructions vary slightly for each statistics package. The examples below assume that you are working with a data file called "ipumsi_00001.dat" that is stored in a folder on your "C:" drive called "IPUMS", so that the full path to your dataset is "C:\IPUMS\ipumsi_00001.dat".

SPSS:
A line in the command file (the ".sps" file) will read
    data list file ='ipumsi_00001.dat'/
Edit that line to include the path to the directory where the data are stored:
    data list file ='C:\IPUMS\ipumsi_00001.dat'/
Pull down the "Run" menu and select "All".

Stata:
Change directories to the location containing your ".dat" and ".do" files. If your file is saved in a directory called IPUMS on your C drive, type
    cd "C:\IPUMS\"
You will see "end of do-file" when Stata has finished reading in the data.

SAS:
One line in the command file (the ".sas" file) will read
    libname ipumsdat '.'
For example, if your file is saved in a directory called IPUMS on your C drive, change that line to read
    libname ipumsdat 'C:\IPUMS\'
Pull down the "Run" menu and select "Submit." SAS will then read in your data.

Copyright © Minnesota Population Center, University of Minnesota.

IPUMS with PostgreSQL, R, and dplyr
Kyle Walker
July 22, 2014

This is a brief write-up to document how I am using IPUMS data with PostgreSQL, R, and dplyr. The Integrated Public Use Microdata Series (IPUMS) is a project from the Minnesota Population Center to provide users with access to formatted microdata. IPUMS is a fantastic resource for demographers as it includes many different datasets, including microdata samples from the US and several international censuses, as well as complete-count historical census data for a few countries.

I’ve been using IPUMS data for many years, but I have always done so with proprietary statistical software: first SPSS, then Stata. However, I’m trying to open-source my workflow to enhance reproducibility, so I wanted to figure out how to use IPUMS data with open-source software.

I’m presently working on a series of projects that look at internal migration trends in the United States over the past several years. I’m drawing data from a large extract I produced from the IPUMS system that includes 1 percent samples from the American Community Surveys between 20. IPUMS provides the data in fixed-width format along with scripts to import the data into SAS, SPSS, and Stata out of the box. However, they also give you the option of downloading the data as a CSV. My CSV, which has around 21 million rows, is just under 2.5 GB in size. Now, I could read this CSV directly into R; the fread function from the data.table package takes care of this in 33 seconds. However, I’d prefer not to read the whole file into memory, which is why I’ve turned to the database solution.

The import scripts and codebook provided by IPUMS provide information about the variables in the dataset. I’ve copied and pasted this information into Excel and used Excel to massage it into a SQL query. There are ways to parse these files programmatically, but I wanted to go through this process myself. I’m not going to deal with value labels here - just the values themselves.
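For readers who would rather script the step of turning codebook variable information into a SQL query than do it in Excel, here is a minimal sketch in Python. It is not the workflow described in the write-up. The variable names, column types, table name, and file path are all hypothetical placeholders, and the COPY statement is only one assumed way to load a CSV into PostgreSQL.

```python
# Hypothetical variable list: these names and types are illustrative
# assumptions, not the contents of the extract described in the write-up.
VARIABLES = [
    ("year", "integer"),
    ("serial", "bigint"),
    ("perwt", "numeric"),
    ("age", "integer"),
]


def build_create_table(table_name, variables):
    """Return a CREATE TABLE statement with one column per (name, type) pair."""
    columns = ",\n".join(f"    {name} {sql_type}" for name, sql_type in variables)
    return f"CREATE TABLE {table_name} (\n{columns}\n);"


def build_copy(table_name, csv_path):
    """Return a PostgreSQL COPY statement that loads a CSV with a header row."""
    return f"COPY {table_name} FROM '{csv_path}' WITH (FORMAT csv, HEADER true);"


if __name__ == "__main__":
    # Table name and path are placeholders; substitute your own.
    print(build_create_table("ipums_acs", VARIABLES))
    print(build_copy("ipums_acs", "/path/to/extract.csv"))
```

The printed statements could then be pasted into a PostgreSQL client. In practice the column list would come from the codebook for your own extract, and the types would need checking against the values in the CSV.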