The Cornell University CISER Data Archive holds about 27,000 online numeric data files, plus thousands of studies on CD/DVD, covering topics such as demography, economics, health, labor, and surveys. It provides consulting services to help users find, access, and use appropriate data for their research needs. Cornell researchers can download publicly available datasets or access restricted data within the CISER computing environment. The archive also provides Cornell social science researchers with a repository for sharing and long-term preservation of their own research data, and runs the Cornell Restricted Access Data Center.
2. Data Archive - Collection and Services
•Established over 30 years ago
•Collection of numeric datasets to support quantitative research
 • c. 27,000 online files, in addition to thousands of studies on CD/DVD
•Emphasis on demography (state/federal censuses), economics, health, labor, election studies, attitudinal and behavioral studies, family life, etc.
3. •Consulting services to match user needs with appropriate data
•Help with finding, accessing and using data
•Current Cornell researchers can download archive files from the online catalog (search & browse) in formats compatible with statistical software
•Data files are identified by a ‘traffic light’ icon that indicates usage level:
• Green – downloadable by anyone
• Yellow – downloadable from links in the catalog with CUWebAuth authentication, for use within the CISER research computing environment (CISERRSCH); Cornell researchers can apply for a computing account
• Red – restricted data, to be used under conditions imposed by the data provider
• Cornell Restricted Access Data Center
4. •Provides Cornell social science researchers with a repository for sharing and long-term preservation of their numeric/statistical research data
•Participates in Cornell’s Research Data Management Service Group
•Assists Cornell social science researchers with Research Data Management (RDM) plans
•Provides Cornell social science researchers with support and expertise in obtaining and using restricted data
5. Believe it or not – not all data are the same …
• Data means different things to different people (informatics, geography, art history, systems biology, architecture, archaeology, etc.)
• The definition and value of data differ between commercial and academic settings
• Data requirements differ for undergraduates, postgraduates, teachers and researchers
• Data catalogs, data libraries, gateways and portals exist for a range of disciplinary domains
6. Research data may include all of the following:
• Text or Word documents, spreadsheets
• Laboratory notebooks, field notebooks, diaries
• Questionnaires, transcripts, codebooks
• Audiotapes, videotapes
• Photographs, films
• Slides, artifacts, specimens, samples
• Database contents including video, audio, text, images
• Models, algorithms, scripts
• Contents of an application, such as input, output and log files for analysis or simulation software, and schemas
• Methodologies and workflows
• Standard operating procedures and protocols
Format, size, volume, openness, confidentiality, complexity, flat files – factors to consider as part of the reference interview, together with computing capabilities, software dependencies, and copyright and ethical considerations
7. Data Reference Interview – establish what the user actually needs (not what they think they may need!):
• Statistics or data? Summary statistics, secondary use datasets, raw or derived data
• Software requirements, contingencies
• What is the subject or topic? Health, unemployment, deprivation
• Type of analysis? Visualization, map, statistical analysis, modelling
• What is the unit of analysis? Individual, family, county-level, country-level
• Geographic constraints?
• Time constraints? Range of years; daily, monthly, quarterly, annual
• Cross-sectional or longitudinal?
• Data type? Historic, demographic, financial, administrative, geospatial
8. •Sets the goals and structure for the data interview and helps articulate any decisions made by the data librarian
•Establishes the ‘learning stage’ of the user and helps put them at ease
Observations:
•Establish a timeline for the research and data needs (this can buy the data librarian time, set priorities and allow time for further investigation)
•There is a fine balance between assistance and exploitation!
•Recognize that data finding, data handling, etc. may be the learning objective itself (e.g. identifying variables and using a codebook)
•View every data query as new; it will soon become evident if the request has similarities with previous enquiries.
9. •It is important not to use too much jargon and to double-check understanding of unfamiliar terms – often we use the same word to mean different things and, conversely, different words to mean the same thing
•Users will often say they understand when they don’t. If there’s any doubt, ask and explain again.
•Keep a supply of up-to-date user guides to hand
•Call management systems are great knowledge banks
•Be familiar with available expertise (colleagues, organization, national, international)
•Google is a friend. A very good friend.
10. Two recent examples:
Q. Grad student wanting the number of plastic surgery clinics in Seoul, South Korea, from 1990 to 2009
A. The International Society of Aesthetic Plastic Surgery (ISAPS, http://www.isaps.org/), in particular the ISAPS International Survey on Aesthetic/Cosmetic Procedures – there are data for 2010 and 2011 (http://www.isaps.org/isaps-global-statistics.html).
Process:
Check NGO/IGO sources (World Bank, UN, etc.)
Check Google – search deep into the results using a variety of related terms. Time-consuming but often productive: searches often turn up references in the literature, or discussion forums, that can be followed up.
11. Q. User needs statistical data on agrarian violence arising from land disputes. Variables include food riots, assassinations (if they occurred as a result of a land dispute), imprisonments, etc. Unit of analysis: country-year; area of interest: Latin American countries; period: 1960 to the present, yearly
Process:
Not likely to be available through NGO sources
Try deep searching through Google – finds literature sources with summary statistics on land disputes for individual countries, but no time series
Responded:
Check the Latin America Network Information Center (LANIC) at the University of Texas at Austin
Speak with our Cornell colleague Sean Knowlton, who has expertise in Latin American statistical resources
Check CEPALSTAT, the gateway to statistical information on Latin American and Caribbean countries published by the Economic Commission for Latin America and the Caribbean
12. Social science research data resources
•Inter-University Consortium for Political and Social Research (ICPSR)
•National Archive of Criminal Justice Data
•Minority Data Resource Center
•National Archive of Computerized Data on Aging
•Roper Center for Public Opinion Archives
•International data archives, e.g. CESSDA, UKDA, Eurostat
• CESSDA catalog (DDI) provides a multilingual interface to datasets from member social science data archives across Europe
• Study description and online documentation are free
•Non-Governmental Organizations
•National / Governmental Statistical Agencies
13. Social science statistical data on the internet:
CISER Internet Data Sources:
https://ciser.cornell.edu/info/datasource.shtml
MIT Data Sources:
http://libguides.mit.edu/ssds/any-subject
Columbia University Social Science Data
http://library.columbia.edu/locations/dssc/data/socsc.html
University of California, San Diego – Data on the Web
http://3stages.org/idata/
Most research-driven universities have similar listings via Data Library webpages
14. Location & hours:
CISER Data Archive is located at 391 Pine Tree Road, Ithaca
CISER is open 8:30am – 4:30pm (Mon–Fri). Walk-in assistance is not always available, so appointments are recommended
Contacts:
Tel.: (607) 255 4801
Email: ciser@cornell.edu
Speaker notes
Data, documentation and associated files (e.g. SAS, SPSS, Stata) are housed on the CISER file server. Files are downloaded from the catalog in ZIP-compressed format.
Cross-National Time Series data
UG – more general enquiries; summary statistics rather than raw data; what they ask for is often not what they really need.
PG – more specific enquiries, again more often for summary statistics; may need raw data as the PhD progresses. Often involves data collection, with the data to be used in conjunction with other sources, visualized, etc.
Teacher – teaching datasets or sample data, or data subsets (NGO, IGO).
Researcher – has a better idea of what data they need, usually raw data; needs help identifying variables and with the codebook/questionnaire. Uses statistical analysis packages, GIS.
As CISER is an ICPSR member, researchers can gain access to data held in those CESSDA archives that are themselves ICPSR members.
CESSDA member organisations adhere to a Trans-border Data Access Agreement.