Presented at CASRAI 2013: Reconnect Big Data.
Appreciation to Amber Leahey, the metadata librarian at Scholars Portal, whose 2012 iASSIST slides were very useful in putting this together.
3. <odesi>
• An online data research tool developed between
2007 and 2009
• Jointly funded by the Ontario Council of
University Libraries (OCUL) and OntarioBuys
• Developed to serve the Ontario university
community, now expanding beyond the province
4. <odesi> in context
is managed by
which is a service of
which is governed by
21 Ontario
university
libraries
5. <odesi> goals
• Facilitate discovery, downloading, and analysis of
data products
• Create a tool that is useful to both experienced
and new researchers
6. <odesi>: where does the content
come from?
Confidential Microdata
available through the RDC
Statistics Canada
(data producers)
Public Use Microdata Files
(PUMFs)
available through the DLI
Other public products
available through
statcan.gc.ca
7. <odesi>: where does the content
come from?
ICPSR metadata
Public Use Microdata Files
(PUMFs)
Available through the DLI
Canadian Gallup Polls data
Other public products
Available through
statcan.gc.ca
Canadian Opinion Research
Archive (CORA) data
10. <odesi> in use
Broad questions:
• “I want to write a paper on women in the workforce…”
11. <odesi> in use
Broad questions:
• “I’m interested in exploring on-reserve housing issues.”
12. <odesi> in use
Testing a hypothesis
• “How many Ontarians smoke today compared with 10 years ago?”
13. <odesi> in use
Testing a hypothesis
• “How many Ontarians smoke today compared with 50 years ago?”
14. <odesi> highlights
• Metadata is bilingual and DDI-compliant
• Don’t need statistical software to run many
analyses
• Surveys also include all supplementary material
• New surveys added daily
15. MarkIt! program
• OCUL members (usually data librarians) apply for
funding
• Funds pay for student employees, who are
trained to mark up surveys using DDI 2 standards
• 2013-2014: Carleton, U of Ottawa, Queen’s and
McMaster are participating, as well as Scholars
Portal
17. MarkIt! program best practices
• Be flexible; always be ready to shift priorities
• Establish best practices and adhere to them
• Make QA and editing each other’s work the
norm (35% of datasets are marked up at more
than one school)
Good afternoon, my name is Jacqueline Whyte Appleby and I’m the Client Services Librarian, as well as the interim data and geospatial librarian, which means I both manage Odesi and work in teaching and research support, helping libraries and end users with Odesi implementation and use. Today I’ll talk about Odesi, a platform for finding and working with data that is used by Ontario universities. Special thanks to Amber Leahey, whose presentation at iASSIST 2013 informed some of this presentation.
Odesi is a tool developed between 2007 and 2009, and I also want to acknowledge Jeff Moon, who was with this project from the beginning. Odesi was jointly funded by OCUL, which is the parent organization of Scholars Portal, and OntarioBuys, which is a government program we’ve had a lot of success with. Odesi was developed specifically to support the Ontario university community, but we’ve now expanded beyond Ontario and there are schools in other provinces using the service.
So, just so the structure is clear here: Odesi is managed on a day-to-day basis by Scholars Portal, meaning by myself and by our programmers and metadata staff. Scholars Portal is a service of the Ontario Council of University Libraries, which is governed by the 21 Ontario university libraries. So staff at all of these universities work to make Scholars Portal services what they are, and it’s really the data librarians at many of these schools who pushed Odesi to realization, and who continue to build it.
Odesi was created with the goal of facilitating discovery, downloading, and analysis of a range of data products. To do this, it was important that Odesi was useful to both experienced and new researchers, so it needed to be sophisticated enough to allow for advanced searching and analysis, but it also needed to be friendly enough that an undergraduate with a question could play around with it and get something useful.
As I’ve alluded to, Odesi has a lot of content, and much of it comes from Statistics Canada. As Jeff discussed, the RDC needs to be visited in person; it’s for people with very specific research needs. As Sylvie discussed, PUMFs are available through the DLI. This is data that you don’t need to go to the RDC to get, but it’s not available to everyone: you need to have signed the DLI license. So we include many of the PUMFs, as well as a lot of supporting and related documentation that StatsCan publishes. That includes the codebooks, sometimes copies of the actual surveys themselves, and reports based on the survey results. We have about 3000 surveys from this source.
We also have data from other sources. The ICPSR, the Inter-university Consortium for Political and Social Research, houses a lot of excellent data, and most of our universities actually subscribe to it separately. So we’ve set up a script to run monthly and pull all metadata for ICPSR so that these surveys are also searchable in Odesi; the students will then be directed to the ICPSR website. We also host a large number of Canadian Gallup polls and the Canadian Opinion Research Archive (CORA) data, which is based at Queen’s. These are really rich sources of social data for students wondering what people thought about smoking, or the Middle East, or the prime minister over time. They go back to the 70s (confirm).
Odesi has two pieces, and the first piece is the catalogue. You can search or browse for data in the catalogue, and you can do so at the series level (Census of Canada) or the study level (Census of Canada 2006), and using keywords. What’s really great about Odesi is that you can also search at the variable level: you can find particular questions and answers to questions. The Odesi catalogue was built in-house using MarkLogic (about).
Once you’ve found data you want to explore, you’ll move into the repository. The repository is run on a platform called Nesstar (about). It has this front end for users, and it also has a publisher’s back end, which is how we get all of this great metadata in there. It’s DDI compliant. You’re looking at a question from the 2006 census on field of study, and you can see there’s the literal question asked and the breakdown of responses. This is available for every question in almost every survey we have in Odesi. Users can run a cross tabulation on any variables that interest them right in the interface, or they can download a whole data set, or just part of one, in a number of formats, including SPSS, SAS, Stata, and CSV. In other words, it’s very easy to say: give me a file with the responses to every question by women who are over 50 (so a subset of cases), or I want to see everyone’s response to the question of how much exercise they get, and which province they live in, so I can compare across provinces.
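For anyone who downloads a file and wants to reproduce those two operations outside the interface, here is a minimal Python sketch. It is not how Nesstar does it; the records, variable names, and helper functions (subset_cases, subset_variables, crosstab) are all invented for illustration.

```python
from collections import Counter

# Hypothetical survey records; the variable names and values are invented
# for illustration and do not come from any survey in Odesi.
records = [
    {"sex": "F", "age": 62, "province": "ON", "exercise": "daily"},
    {"sex": "F", "age": 34, "province": "QC", "exercise": "weekly"},
    {"sex": "M", "age": 57, "province": "ON", "exercise": "rarely"},
    {"sex": "F", "age": 51, "province": "BC", "exercise": "weekly"},
    {"sex": "M", "age": 45, "province": "QC", "exercise": "daily"},
    {"sex": "F", "age": 70, "province": "ON", "exercise": "rarely"},
]


def subset_cases(rows, predicate):
    """Keep only the cases (rows) for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def subset_variables(rows, names):
    """Keep only the named variables (columns) for every case."""
    return [{name: row[name] for name in names} for row in rows]


def crosstab(rows, row_var, col_var):
    """Count cases for each (row_var value, col_var value) pair."""
    return Counter((row[row_var], row[col_var]) for row in rows)


# "Every question, but only women over 50": a subset of cases.
women_over_50 = subset_cases(records, lambda r: r["sex"] == "F" and r["age"] > 50)

# "Exercise and province for everyone": a subset of variables, then a crosstab.
exercise_by_province = crosstab(records, "province", "exercise")
```

The distinction to keep in mind is that subsetting cases filters rows, while subsetting variables filters columns; a cross tabulation only needs the two variables being compared.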
DDI: Data Documentation Initiative. We have 3300 data sets, and many more records. ICPSR records are pulled in by a script. Stats Can is adding new data all the time, so how do we stay on top of that?
Metadata markup is used in our MarkLogic database to allow for searching at granular levels. For example, if you wanted to know how many surveys had variables that asked about smoking, you can search for this using Odesi. The markup provided by the individuals doing the work makes searching better and more accurate.
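To make the idea of variable-level search concrete, here is a minimal Python sketch. It does not show how the Odesi catalogue or MarkLogic works. The XML is a simplified, namespace-free fragment modelled on DDI Codebook element names (var, labl, qstn, qstnLit), and the survey title, variable names, question text, and the find_variables helper are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment in the style of a DDI codebook.
# Real files carry namespaces and many more elements.
DDI_FRAGMENT = """
<codeBook>
  <stdyDscr>
    <citation><titlStmt><titl>Example Health Survey</titl></titlStmt></citation>
  </stdyDscr>
  <dataDscr>
    <var name="SMK_01">
      <labl>Smoking status</labl>
      <qstn><qstnLit>At the present time, do you smoke cigarettes?</qstnLit></qstn>
    </var>
    <var name="EXR_02">
      <labl>Weekly exercise</labl>
      <qstn><qstnLit>How many hours a week do you exercise?</qstnLit></qstn>
    </var>
  </dataDscr>
</codeBook>
"""


def find_variables(ddi_xml, keyword):
    """Return (name, label, question) for variables mentioning the keyword."""
    root = ET.fromstring(ddi_xml)
    keyword = keyword.lower()
    hits = []
    for var in root.iter("var"):
        label = var.findtext("labl", default="")
        question = var.findtext("qstn/qstnLit", default="")
        # Match against both the short label and the literal question text.
        if keyword in label.lower() or keyword in question.lower():
            hits.append((var.get("name"), label, question))
    return hits


matches = find_variables(DDI_FRAGMENT, "smok")
```

The point of the sketch is that a search like this is only possible because someone has marked up each variable with its label and literal question text; without that markup there is nothing at the variable level to match against.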
- Students grab the file off the Stats Can FTP server
- They mark up the variables and the study-level metadata (e.g. weighting, abstract, sampling procedures)
- This means we can get data sets up quite quickly
Dataverse: one-time deposits of legacy data, but also studies in process, with geographically dispersed researchers contributing to marking up data. We may also be in a good position to develop some guidelines for researchers doing all of their own depositing, or for librarians working to support them.