Presentation made at OpenDataCamp in Bangalore (24th March 2012) on the organisation of unit-level data published by the National Sample Survey Organisation, Govt. of India.
1. Pre-History
1862 British Administration constituted the Statistical Committee for preparation of
forms for primary data collection, followed by the publication of the first
Statistical Abstract of British India (1840-1865)
1881 First Decennial Population Census begins
1914 Directorate of Statistics was established in Calcutta; it later became the
Directorate of Commercial Intelligence and Statistics, which was entrusted with the
compilation of colonial trade statistics
1916 Indian Industrial Commission
1925 Economic Enquiry Committee
1939 Wholesale Price Index collection and calculation begin
1947 P. C. Mahalanobis was appointed the Honorary Statistical Advisor
1949 The Central Statistical Unit was established
1951 Central Statistical Organisation and the Department of Statistics were established.
They continue to be the major organisations for collection of national-level data
2. Glossary
Round: Each round of data collection by NSSO, usually of annual duration
Schedule: Each thematic focus for data collection, multiple Schedules
per Round
Thick Round: Major data collection rounds repeated every 5 years
(hence called quinquennial rounds)
Thin Round: Minor data collection rounds
State-Region: Usually a cluster of three or more districts in a state
Fixed-Width File: Fixed-width text files are data files in text format specified
by fixed column widths, pad character and left/right alignment.
.do File: A Stata file format. Collection of Stata commands.
.dta File: A Stata file format for tabular data sets (rows and columns, like a spreadsheet), also readable by R
.smcl File: A Stata file format for log files, automatically records the Stata
commands and results
3. Schedules
Main themes for thick rounds / quinquennial surveys
Consumer expenditure
Employment and Unemployment
Debt and Investment
Manufacturing Enterprises (Organised and Unorganised)
Main themes for thin rounds
Participation and Expenditure in Education
Particulars of Slum and Housing Condition
Morbidity and Healthcare
Situation Assessment Survey of Farmers
Land and Livestock Holding
4. Organisation of Data
Organisation of Raw Data
- The fixed-width file (.txt)
- The binary coding of information
The Supporting Files
- The ‘schedule’ file – survey questionnaire
- The ‘layout’ file – how the information is organised in data files
- The ‘readme’ file – how different data sets are organised
- The state and district codes
Level
- Coding information about a single entity in multiple rows
Raw Data
12121212121212121212
232323232323232323
343434343434343434
Layout
Column 1-2: Person Serial Number
Column 3-4: Age of the Person
Column 5-6: Educational Status
…
Schedule
Q.1: What is the serial number of the person?
Q.2: What is the age of the person?
Q.3: What is the educational status of the person?
[12 = up to class X; 23 = class X-XII; 34 = graduate and higher]
…
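As a minimal sketch of how the layout maps onto the raw data, the Python snippet below cuts each line at the 1-based column ranges given in the layout and decodes the educational status with the code list from the schedule. The names `LAYOUT`, `EDUCATION_CODES` and `parse_line` are illustrative assumptions, not part of any NSSO tool.

```python
# Column ranges are 1-based and inclusive, as in the layout above.
LAYOUT = {
    "person_serial_no": (1, 2),
    "age": (3, 4),
    "educational_status": (5, 6),
}

# Code list taken from the schedule (Q.3).
EDUCATION_CODES = {
    "12": "up to class X",
    "23": "class X-XII",
    "34": "graduate and higher",
}


def parse_line(line):
    """Cut one fixed-width line into named fields (values kept as strings)."""
    record = {name: line[start - 1:end] for name, (start, end) in LAYOUT.items()}
    record["educational_status_label"] = EDUCATION_CODES.get(
        record["educational_status"], "unknown code"
    )
    return record


raw_lines = [
    "12121212121212121212",
    "232323232323232323",
    "343434343434343434",
]
records = [parse_line(line) for line in raw_lines]
for record in records:
    print(record)
```

Values are kept as strings so that leading zeros in codes are not lost; convert to numbers only where the schedule says the field is numeric.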
Raw Data with Levels
1201121212121212121212
1202121212121212121212
2301232323232323232323
Layout
Column 1-2: Person Serial Number
Column 3-4: Level Code
Column 5-6: Age of the Person [if Level = 01]; Educational Status [if Level = 02]
…
Schedule
Q.1: What is the serial number of the person?
Q.2: What is the age of the person?
Q.3: What is the educational status of the person?
[12 = up to class X; 23 = class X-XII; 34 = graduate and higher]
…
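The level mechanism can be sketched in Python as follows, assuming (as in the layout above) that the level code in columns 3-4 decides what columns 5-6 mean. Rows that share a person serial number are merged into one record. `LEVEL_FIELDS` and `merge_levels` are hypothetical names used only for this illustration.

```python
from collections import defaultdict

# Meaning of columns 5-6 depends on the level code in columns 3-4 (see layout).
LEVEL_FIELDS = {
    "01": "age",
    "02": "educational_status",
}


def merge_levels(lines):
    """Combine all rows sharing a person serial number into one record."""
    persons = defaultdict(dict)
    for line in lines:
        serial = line[0:2]  # columns 1-2
        level = line[2:4]   # columns 3-4
        value = line[4:6]   # columns 5-6
        field = LEVEL_FIELDS.get(level)
        if field is not None:  # rows with an unknown level code are skipped
            persons[serial][field] = value
    return dict(persons)


raw_lines = [
    "1201121212121212121212",
    "1202121212121212121212",
    "2301232323232323232323",
]
persons = merge_levels(raw_lines)
print(persons)
```

In this sample, person 12 has both a level 01 and a level 02 row, while person 23 has only a level 01 row, so the merged record for person 23 carries no educational status.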
5. The Extraction
Converting NSSO raw data to tabular form (Comma Separated) using Stata
- The .do file: Set of Stata commands for extraction
- The ‘infix’ command: Mapping variables to data columns
- The ‘label var’ command: Labeling the variables
- The levels: Multiple data rows for a single entity
- The .dta file: The Stata spreadsheet format
- The .smcl file: The Stata commands and results log file
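For readers without Stata, the same extraction steps can be approximated in Python. This is a rough analogue under stated assumptions, not the Stata workflow itself: the `VARIABLES` table stands in for the column mapping done by ‘infix’ and for the variable labels, and the result is written as comma-separated text. The variable names `psn`, `age` and `edu` and the function `extract_to_csv` are illustrative; the column ranges and labels follow the example layout shown earlier.

```python
import csv
import io

# (variable name, first column, last column, label); columns are 1-based.
# Variable names are illustrative assumptions.
VARIABLES = [
    ("psn", 1, 2, "Person Serial Number"),
    ("age", 3, 4, "Age of the Person"),
    ("edu", 5, 6, "Educational Status"),
]


def extract_to_csv(raw_text):
    """Return CSV text with one row per raw line and one column per variable."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _, _ in VARIABLES])
    for line in raw_text.splitlines():
        if line.strip():  # ignore blank lines
            writer.writerow([line[first - 1:last] for _, first, last, _ in VARIABLES])
    return out.getvalue()


raw_text = "12121212121212121212\n232323232323232323\n343434343434343434\n"
csv_text = extract_to_csv(raw_text)
labels = {name: label for name, _, _, label in VARIABLES}
print(csv_text)
print(labels)
```

Keeping the column mapping in a single table mirrors the role of the layout file: when a different schedule is extracted, only `VARIABLES` needs to change.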