This document discusses reading SPSS files into Stata. It describes various options for converting SPSS files to a format that Stata can import. These include using SPSS or other statistical software to export to Stata format, using specialized conversion software, or writing custom conversion code in Stata, Mata, or as a plugin. The document then introduces the USESPSS command, a new Stata plugin that allows direct import of SPSS files without requiring other software. It discusses features and syntax of USESPSS, and introduces the DESSPSS command for describing SPSS file contents without importing the data.
Radyakin usespss
1. 2008 Summer North American Stata Users Group meeting
Chicago, 24-25 July 2008
Using SPSS files in Stata
Sergiy Radyakin
The World Bank
2. How to get the data in?
• Use Stata to manipulate the data and read it in
• Use the data producing application to export the data in the
proper format that Stata can later import
• Use specialized conversion software to convert to a proper
format
• Use another statistical package that supports both formats to
make it convert the dataset
• Write your own conversion program in/as:
– Stata (slow, portable)
– Mata (faster, portable)
– Plugin (very fast, not portable, dependent on Stata’s bit-width)
– Standalone (very fast, not portable, independent of Stata’s bit-width)
3. Which data formats does Stata support (as of v10)?
• Stata native formats (use)
• ASCII (text) data, with or without dictionaries (insheet/infile/infix)
• SAS XPORT format (fdause)
• Data import via ODBC, provided that a
required driver is installed and configured
• But, no SPSS support
4. When SPSS is available:
• SPSS v14 and later support exporting data to Stata
format
• SPSS_to_Stata_00.sbs script by Alasdair Crockett is
available for earlier releases, requires both SPSS
and Stata for conversion
Data Services Guides: SPSS_to_Stata Conversion Utility Guide
http://www.data-archive.ac.uk/support/conversionguide.pdf
• This can be automated with an .ado wrapper similar
to USESAS by Dan Blanchette, which requires SAS
to be installed to import data to Stata
• These are not “true readers”, since they require
SPSS or SAS to be installed (with license costs, etc.)
5. Specialized Conversion Software
• Stat/Transfer
– http://www.stattransfer.com/
– $295 (New unit, Windows)
• DBMS/Copy
– http://www.dataflux.com/Product-Services/Products/dbms.asp
– $495 (New individual, Windows)
• Both support command-line parameters for batch-mode
conversion and thus can be “wrapped” for use with
Stata, see e.g. STCMD by Roger Newson
(Prices as of July 13, 2008)
6. USESPSS
• USESPSS is a new command for Stata to
read in SPSS data (*.sav files)
• It is a “true reader”: it does not require any
other software (other than the Windows operating system)
• Free
• Implemented as a plugin, with portions of
code (e.g. file decompression) written in
assembler for performance optimization
• Note: the SPSS format documentation has not been
released, and only fragmentary information is
available on the Internet
7. USESPSS Features
• Reads *.sav files originating from both Windows and UNIX
versions of SPSS (LoHi and HiLo byte orders)
• Supports compressed and non-compressed SPSS files
• Preserves variable and value labels
• Optimizes data storage types (2-pass)
• Supports long variable names
• Automatically renames invalid variable names and
resolves naming collisions
• Preserves number of decimals in numeric formats
• Transfers, but does not format, date/time variables
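Because date/time variables are transferred but not formatted, they may need converting after import. The following is a minimal sketch, assuming the imported variable holds the raw SPSS date/time value (seconds since 14 October 1582); the slides do not confirm this. The variable names spss_dt and stata_date and the sample value are hypothetical.

```stata
* Assumption: spss_dt holds a raw SPSS date/time, i.e. seconds since 14oct1582.
clear
set obs 1
generate double spss_dt = 13436236800      // hypothetical value: 24jul2008 00:00:00

* Stata daily dates count days since 01jan1960;
* mdy(10,14,1582) is the SPSS origin expressed on that scale.
generate long stata_date = floor(spss_dt/86400) + mdy(10,14,1582)
format stata_date %td
list spss_dt stata_date
```

Storing the source value as a double matters here, since a float cannot hold an eleven-digit number of seconds exactly.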
8. USESPSS Syntax
usespss can be used like any other command: on the
command line, in .do files, and in .ado programs:
usespss [using] "filename.sav"
    [, clear
    saving("filename.dta")
    iff(condition)
    inn(condition)
    memory(memsize)
    lowmemory(memsize)]
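A typical call might look like the sketch below. It uses a hypothetical file name and only the clear and saving() options from the syntax above; how saving() treats an existing file is not stated in the slides.

```stata
* Import mydata.sav (hypothetical file), replacing any data in memory,
* and keep a Stata-format copy on disk.
usespss using "mydata.sav", clear saving("mydata.dta")

* Inspect the result: storage types, variable labels and value labels.
describe
```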
9. Memory Tradeoff
• Stata and plugins share the same address space
• As a consequence, plugins can read Stata’s data
directly (if they know where it is located) and call
Stata’s subroutines (if exposed).
• However, the more memory is allocated for Stata
data, the less memory is available to the plugins,
because the size of the address space is limited
(typically 2GB on a 32-bit Windows system). In other
words, plugins compete for memory between
themselves and with Stata.
10. Memory Tradeoff
• Like Stata, usespss attempts to load the whole
data file into memory; this speeds up the 2-pass
processing (1st pass – optimization of the storage
types, 2nd pass – actual conversion)
• But when the user loads SPSS data, any data already
in Stata are discarded anyway, so Stata’s memory
allocation can be temporarily decreased within usespss.ado
• It is important to do this when working with large files,
otherwise the plugin will not be able to allocate
enough memory to load the SPSS data file.
11. Memory Use
Consider the following code:
set mem 800m
usespss using "mydata.sav", lowmemory(10) memory(800)
[Figure: memory use over time within the address-space limit (e.g. 2GB). Stacked regions: Stata code, Stata data, free memory, plugin data, plugin code. Timeline: usespss.ado starts → any dataset in Stata’s memory is cleared → Stata memory is temporarily set to a low value (10m) → Stata memory is set to a higher value (800m) → usespss.ado ends.]
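The sequence described for this slide can be written out by hand as the sketch below. It uses Stata 10 syntax and is not the actual contents of usespss.ado. The slides do not specify the point at which the converted data are handed over to Stata, so that step appears only as a comment.

```stata
* Sketch only (Stata 10 syntax); an illustration of the memory sequence,
* not the real usespss.ado code.
clear                  // any dataset in memory is discarded
set memory 10m         // cf. lowmemory(10): shrink Stata's data area, freeing address space
* ... at this point the plugin would load and convert the SPSS file ...
set memory 800m        // cf. memory(800): enlarge Stata's data area again
```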
12. DESSPSS
• desspss is a new Stata command to describe
the contents of an SPSS system *.sav file
• does not destroy the data in memory
• works much faster than
usespss using filename.sav, saving(filename.dta)
describe
because no optimization/conversion is
actually performed, but does not list the
variable types (these are determined after
optimization)
• saves all descriptive information in r()
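A short usage sketch follows, using the file name from the example report. The names of the individual r() results are not given in the slides, so the sketch simply lists whatever is returned.

```stata
* Describe the SPSS file without touching the data currently in memory.
desspss using "artificial.sav"

* Show everything desspss left behind in r().
return list
```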
13. DESSPSS Example Report
. desspss using artificial.sav
DESSPSS Report
==============
SPSS System file: artificial.sav
Created (date): 17-Jul- 8
Created (time): 22: 4: 0
SPSS product: SPSS-X SYSTEM FILE. SPSS 5.0 MS/Windows made by DBMS/COPY
File label (if present):
File size (as stored on disk): 382692 bytes
Data size: 381432 bytes
Data stored in compressed format
This file is likely to originate from a Windows platform (LoHi byte order)
Number of cases (observations): 10000
Number of variables: 10
Case size: 88 bytes
----------------------------------------------------------------------
Variables:
GENDER MARRIED B_YEAR W_HOURS CITY_COD
AGE EMP_STAT WAGE FULLTIME CITY_NAM
14. Demonstration:
• Embedded artificially created dataset in SPSS format: artificial.sav
Clicking on the icon opens the SPSS file in Stata if:
1. usespss is installed in Stata, and
2. the file association has been set:
--------------------- beginning of sav_file.reg --------------------
Windows Registry Editor Version 5.00
[HKEY_CLASSES_ROOT\.sav]
@="sav_auto_file"
[HKEY_CLASSES_ROOT\sav_auto_file]
@="SPSS Dataset"
[HKEY_CLASSES_ROOT\sav_auto_file\shell]
[HKEY_CLASSES_ROOT\sav_auto_file\shell\open]
[HKEY_CLASSES_ROOT\sav_auto_file\shell\open\command]
@="\"C:\\Stata10se\\wsestata.exe\" usespss \"%1\""
---------------------- end of sav_file.reg -------------------------
Substitute the full path of your Stata executable
• Questions?