Efficient & effective
data management for research projects
ILRI's Data Management
Platform
Carlos Quiros
June, 2015
Contents
• Back in 2011
• Current status
• How we did it
• Example of a process
• CKAN
• Key decisions made
• Technology and skills required
Back in 2011
Survey design
• Too many
• Not common indicators
• Different variables
• Different calculations
Survey implementation
• Too many tools
• No protocols
• Poor field data cleaning
• No standard process
Storage
• In files
• Too many formats
• Too many versions
• Messy data cleaning
• No accountability
Availability & accessibility
• Nothing
Now
Survey design
• Too many
• Common indicators
• Same variables
• Same calculations
Storage
• Server database
• No formats
• One version
• Central cleaning
• Accountability
Availability & accessibility
• CKAN
• OData
Survey implementation
• 2 tools (ODK, CSPro)
• Protocols
• Field data cleaning
• Standard process
• Standard tools
How we went about it
Storage
• Server database
• How to integrate ODK and CSPro?
• How to make it easy for scientists?
• How to manage user decentralization?
• Increase accountability?
Availability and accessibility
• What to use? CKAN, Dataverse, etc.
→ CKAN
• How to extend it to serve our purpose?
• How to integrate it with a server database?
• How to manage our metadata and vocabularies?
• How to do this?
• Data interoperability? RDF, OData, GData, etc.?
→ OData
• How to do it?
Survey implementation
• Support only two tools
• Wrote protocols
• Wrote field data cleaning applications
• Wrote policies and implementation plans
• Wrote standard processes and tools for processing the data
Survey design (ongoing)
• Worked closely with teams
• Created a central place for all the surveys
• Separated surveys into modules
• Worked on common indicators
• Management supports this process
Example of a process
Detailed breakdown of ILRI’s RMD workflow with ODK
• Start
• Draft tool (.doc)
• Consultation
• Final tool (.doc)
• Coding (.doc → .xls)
• Testing & review (.xls)
• Uploaded to Formhub test account
• Testing & review (ODK Collect)
• Ok?
• Field deployment
• Uploaded to Formhub project account
• Data collection
• Upload data to Formhub
• End of data collection
• Create MySQL schema with ODKToMySQL
• MySQL schema in server
• Convert data to JSON with FormhubToJSON
• Data in JSON format
• Upload JSON into MySQL schema with JSONToMySQL
• Initialize META in schema
• Metadata for portal
• Data cleaning from server using MySQL for Excel
• Sharing in Data Portal
Who codes: RMG staff, project team member
S = Scientist input / usage
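The "upload JSON into a MySQL schema" step of the workflow can be illustrated with a minimal sketch. This is not the JSONToMySQL tool itself: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the sample records, the `survey_main` table name, and the `column_name` and `load_submissions` helpers are all hypothetical.

```python
import json
import sqlite3

# Hypothetical submissions, shaped loosely like flattened survey records.
SUBMISSIONS_JSON = """
[
  {"_uuid": "a1", "household/id": "HH001", "household/size": 5, "livestock/cattle": 3},
  {"_uuid": "b2", "household/id": "HH002", "household/size": 2, "livestock/cattle": 0}
]
"""


def column_name(key):
    """Turn a grouped field path such as "household/size" into a column name."""
    return key.strip("_").replace("/", "_")


def load_submissions(conn, raw_json, table="survey_main"):
    """Create the table if needed and store one row per submission.

    Assumes every record carries a "_uuid" field, which becomes the primary key.
    INSERT OR REPLACE is SQLite syntax; a MySQL schema would need its own upsert.
    """
    records = json.loads(raw_json)
    # Collect every field seen in any record so that sparse records still fit.
    columns = sorted({column_name(key) for rec in records for key in rec})
    col_list = ", ".join(f'"{col}"' for col in columns)
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" ({col_list}, PRIMARY KEY ("uuid"))'
    )
    rows = []
    for rec in records:
        renamed = {column_name(key): value for key, value in rec.items()}
        rows.append(tuple(renamed.get(col) for col in columns))
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT OR REPLACE INTO "{table}" ({col_list}) VALUES ({placeholders})', rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(load_submissions(connection, SUBMISSIONS_JSON), "submissions loaded")
```

Keying rows on the submission identifier means that reloading the same export replaces rows instead of duplicating them, which fits the "one version" goal of a central database.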
ILRI’s data portal (CKAN) – http://data.ilri.org/portal/
• CKAN?
  • The Open Knowledge Foundation
  • Biggest deployed data portal software
    • USA data portal
    • UK data portal
    • EU data portal
    • Open Africa
• What do you get out of the box?
  • Create datasets with minimum metadata
    • Name, Abstract, Author, Date
  • Tags into controlled vocabulary
  • Powerful search engine
  • Public / private access to datasets
  • Able to attach resources (files) to a dataset
  • Data interoperability through powerful API and RDF
  • Arrange datasets into organizations and topics
• What can you do by creating extensions?
  • Add new vocabularies (e.g., Language, Countries, etc.)
  • Add new metadata fields
  • Visualize different kinds of data (e.g., maps)
  • Change theme (colors, logos, fonts, etc.)
  • Create data hubs by harvesting other CKANs
  • Whatever else you want…
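As an illustration of the API mentioned above, the following sketch searches a CKAN portal through the Action API's `package_search` call. It assumes the portal exposes the API under `<portal URL>/api/3/action/`; whether ILRI's portal does so at that exact path is an assumption, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(portal_url, query, rows=5):
    """Build a CKAN Action API package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(payload):
    """Extract (name, title) pairs from a decoded package_search response."""
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [(d["name"], d.get("title", "")) for d in payload["result"]["results"]]


def search_portal(portal_url, query, rows=5):
    """Run the search over HTTP; this call needs network access."""
    with urlopen(build_search_url(portal_url, query, rows)) as response:
        return dataset_titles(json.load(response))


if __name__ == "__main__":
    # Only builds the URL; call search_portal(...) to query a live portal.
    print(build_search_url("http://data.ilri.org/portal/", "livestock survey"))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without a live portal.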
Key decisions made
• Use open source for all RDM
  Pros:
  • Bigger pool of tools
  • Flexible
  • Innovation
  Cons:
  • Complex skill set
  • Learning curve
• Relational Database Management System (RDBMS)
  Pros:
  • Central place
  • Auditing
  Cons:
  • DB management skill set
  • Scientists have no idea how to work with an RDBMS
• CKAN
  Pros:
  • There is nothing better out there
  • Flexible and extensible
  Cons:
  • Programming in several languages is required
  • Learning curve
Technology and skills required
• Server
  • Linux (Ubuntu server) [Linux administration]
    • http://www.ubuntu.com/download/server
• Database server
  • MySQL – An open source database system [DB administration, SQL]
    • http://www.mysql.com/
• Data processing software [Linux, C++, Python]
  • ODK – A toolset for collecting data on mobile devices.
    • https://opendatakit.org/
  • CSPro – Software for creating data entry applications.
    • https://www.census.gov/population/international/software/cspro/
  • Formhub – A software tool that collects ODK data.
    • https://github.com/SEL-Columbia/formhub
  • ODK Tools – A toolbox for processing ODK survey data into MySQL databases.
    • https://github.com/ilri/odktools
  • META – A toolbox for managing research data in MySQL databases.
    • https://github.com/ilri/meta
  • CSProTools – A toolbox for processing CSPro survey data into MySQL databases.
    • https://github.com/ilri/csprotools
• Data sharing and interoperability
  • CKAN – The open source data portal software. [Linux, Python, WebDev]
    • http://ckan.org/
    • http://docs.ckan.org/en/latest/maintaining/installing/index.html
    • http://docs.ckan.org/en/latest/extensions/index.html
  • OData – Allows the creation and consumption of queryable and interoperable data resources in a simple and standard way. [Linux, Java, WebDev]
    • http://www.odata.org/
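To give a feel for what "queryable" means for OData, here is a minimal sketch that composes a query URL from the standard `$filter`, `$select` and `$top` system query options. The service root, the `Households` entity set and the field names are hypothetical; the slides do not describe how ILRI's OData service is laid out.

```python
from urllib.parse import quote


def odata_query_url(service_root, entity_set, filter_expr=None, select=None, top=None):
    """Compose an OData query URL from a few standard system query options."""
    options = []
    if filter_expr:
        # Keep quotes and parentheses readable; spaces become %20.
        options.append("$filter=" + quote(filter_expr, safe="'()"))
    if select:
        options.append("$select=" + quote(",".join(select), safe=","))
    if top is not None:
        options.append(f"$top={int(top)}")
    url = f"{service_root.rstrip('/')}/{entity_set}"
    return url + ("?" + "&".join(options) if options else "")


if __name__ == "__main__":
    print(
        odata_query_url(
            "http://example.org/odata",  # hypothetical service root
            "Households",                # hypothetical entity set
            filter_expr="country eq 'Kenya'",
            select=["household_id", "cattle"],
            top=10,
        )
    )
```

Because the filter, the selected fields and the row limit all travel in the URL, any client that can issue an HTTP GET can pull a subset of the data without direct database access.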
Thank you
Visit us @
http://data.ilri.org/

 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 

Dernier (20)

HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 

Efficient & effective data management for research projects : ILRI's Data Management Platform

  • 1. Efficient & effective data management for research projects: ILRI's Data Management Platform. Carlos Quiros, June 2015
  • 2. Contents
      • Back in 2011
      • Current status
      • How we did it
      • Example of a process
      • CKAN
      • Key decisions made
      • Technology and skills required
  • 3. Back in 2011
      • Survey design
          • Too many
          • No common indicators
          • Different variables
          • Different calculations
      • Survey implementation
          • Too many tools
          • No protocols
          • Poor field data cleaning
          • No standard process
      • Storage
          • In files
          • Too many formats
          • Too many versions
          • Messy data cleaning
          • No accountability
      • Availability & accessibility
          • Nothing
    Now
      • Survey design
          • Too many
          • Common indicators
          • Same variables
          • Same calculations
      • Survey implementation
          • 2 tools (ODK, CSPro)
          • Protocols
          • Field data cleaning
          • Standard process
          • Standard tools
      • Storage
          • Server database
          • No formats
          • One version
          • Central cleaning
          • Accountability
      • Availability & accessibility
          • CKAN
          • OData
  • 4. How we went about it
      • Storage
          • Server database
          • How to integrate ODK and CSPro?
          • How to make it easy for scientists?
          • How to manage user decentralization?
          • Increase accountability?
      • Availability and accessibility
          • What to use? CKAN, Dataverse, etc. → CKAN
          • How to extend it to serve our purpose?
          • How to integrate it with a server database?
          • How to manage our metadata and vocabularies?
          • How to do this?
          • Data interoperability? RDF, OData, GData, etc. → OData
          • How to do it?
      • Survey implementation
          • Support only two tools
          • Wrote protocols
          • Wrote field data cleaning applications
          • Wrote policies and implementation plans
          • Wrote standard processes and tools for processing the data
          • Worked closely with teams
          • Created a central place for all the surveys
          • Separated surveys into modules
          • Worked on common indicators
          • Management supports this process
      • Survey design (ongoing)
  • 5. Example of a process: detailed breakdown of ILRI’s RMD workflow with ODK (flowchart)
      • Tool design
          • Start
          • Draft tool (.doc)
          • Consultation
          • Final tool (.doc)
          • Coding (.doc → .xls)
      • Testing, deployment and data collection
          • Testing & review (.xls)
          • Uploaded to Formhub test account
          • Testing & review (ODK Collect)
          • OK?
          • Field deployment
          • Uploaded to Formhub project account
          • Data collection
          • Upload data to Formhub
          • End of data collection
      • Server-side processing
          • Create MySQL schema with ODKToMySQL
          • MySQL schema in server
          • Convert data to JSON with FormhubToJSON
          • Data in JSON format
          • Upload JSON into MySQL schema with JSONToMySQL
          • Metadata for portal
          • Initialize META in schema
      • Cleaning and sharing
          • Data cleaning from server using MySQL for Excel
          • Sharing in data portal
      • Legend
          • Who: RMG staff, project team member
          • S = scientist input / usage
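The "upload JSON into a database schema" step in the workflow above can be illustrated with a minimal sketch. This is not the JSONToMySQL tool itself: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the `household` table, its columns, and the sample records are all hypothetical. It only shows the general idea of loading flat JSON submissions into a table whose primary key rejects duplicate uploads.

```python
import json
import sqlite3

# Hypothetical submissions, shaped like flat JSON records exported from a form server.
SUBMISSIONS_JSON = """
[
  {"household_id": "HH001", "village": "A", "cattle_owned": 4},
  {"household_id": "HH002", "village": "B", "cattle_owned": 0}
]
"""


def create_schema(conn):
    # Hypothetical single-table schema; the primary key rejects duplicate uploads.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS household ("
        "household_id TEXT PRIMARY KEY, "
        "village TEXT NOT NULL, "
        "cattle_owned INTEGER NOT NULL)"
    )


def load_submissions(conn, raw_json):
    """Insert each JSON record; return (inserted, rejected) counts."""
    inserted = rejected = 0
    for record in json.loads(raw_json):
        try:
            conn.execute(
                "INSERT INTO household (household_id, village, cattle_owned) "
                "VALUES (:household_id, :village, :cattle_owned)",
                record,
            )
            inserted += 1
        except sqlite3.IntegrityError:
            # Duplicate key or missing value: count it instead of aborting the load.
            rejected += 1
    conn.commit()
    return inserted, rejected


# sqlite3 in-memory database used here only as a stand-in for the MySQL server.
conn = sqlite3.connect(":memory:")
create_schema(conn)
first = load_submissions(conn, SUBMISSIONS_JSON)
second = load_submissions(conn, SUBMISSIONS_JSON)  # same file uploaded twice
print(first, second)
```

Counting rejected rows rather than stopping at the first error is one possible design choice; it lets a central team review problem records after the load, which fits the "central cleaning" and "accountability" goals listed earlier.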
  • 6. ILRI’s data portal (CKAN) – http://data.ilri.org/portal/
      • CKAN?
          • The Open Knowledge Foundation
          • Most widely deployed data portal software
              • USA data portal
              • UK data portal
              • EU data portal
              • Open Africa
      • What do you get out of the box?
          • Create datasets with minimum metadata
              • Name, Abstract, Author, Date
          • Tags into controlled vocabulary
          • Powerful search engine
          • Public / private access to datasets
          • Able to attach resources (files) to a dataset
          • Data interoperability through powerful API and RDF
          • Arrange datasets into organizations and topics
      • What can you do by creating extensions?
          • Add new vocabularies (e.g., Language, Countries, etc.)
          • Add new metadata fields
          • Visualize different kinds of data (e.g., maps)
          • Change theme (colors, logos, fonts, etc.)
          • Create data hubs by harvesting other CKANs
          • Whatever else you want
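As a rough illustration of the "data interoperability through API" point, the sketch below builds a dataset search request for CKAN's Action API (`package_search`) and reads dataset titles out of a response. It assumes the portal exposes the standard Action API under its base URL, which the slides do not confirm; the sample response is hypothetical and trimmed to the fields used here, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Assumption: the standard CKAN Action API is served under the portal's base URL.
PORTAL_URL = "http://data.ilri.org/portal"


def package_search_url(base_url, query, rows=10):
    """Build a URL for CKAN's package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [dataset["title"] for dataset in payload["result"]["results"]]


# Hypothetical response body, trimmed to the fields this sketch reads.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [{"name": "example-survey", "title": "Example survey"}],
    },
})

url = package_search_url(PORTAL_URL, "livestock", rows=5)
titles = dataset_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

To query a live portal, the URL could be fetched with `urllib.request.urlopen` and the decoded body passed to `dataset_titles`.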
  • 7. Key decisions made
      • Use open source for all RDM
          • Pros:
              • Bigger pool of tools
              • Flexible
              • Innovation
          • Cons:
              • Complex skill set
              • Learning curve
      • Relational Database Management System (RDBMS)
          • Pros:
              • Central place
              • Auditing
          • Cons:
              • DB management skill set
              • Scientists do not know how to work with an RDBMS
      • CKAN
          • Pros:
              • There is nothing better out there
              • Flexible and extensible
          • Cons:
              • Programming in several languages is required
              • Learning curve
  • 8. Technology and skills required
      • Server
          • Linux (Ubuntu server) [Linux administration]
              • http://www.ubuntu.com/download/server
      • Database server
          • MySQL – An open source database system. [DB administration, SQL]
              • http://www.mysql.com/
      • Data processing software [Linux, C++, Python]
          • ODK – A toolset for collecting data on mobile devices.
              • https://opendatakit.org/
          • CSPro – Software for creating data entry applications.
              • https://www.census.gov/population/international/software/cspro/
          • Formhub – A software tool that collects ODK data.
              • https://github.com/SEL-Columbia/formhub
          • ODK Tools – A toolbox for processing ODK survey data into MySQL databases.
              • https://github.com/ilri/odktools
          • META – A toolbox for managing research data in MySQL databases.
              • https://github.com/ilri/meta
          • CSProTools – A toolbox for processing CSPro survey data into MySQL databases.
              • https://github.com/ilri/csprotools
      • Data sharing and interoperability
          • CKAN – The open source data portal software. [Linux, Python, WebDev]
              • http://ckan.org/
              • http://docs.ckan.org/en/latest/maintaining/installing/index.html
              • http://docs.ckan.org/en/latest/extensions/index.html
          • OData – Allows the creation and consumption of queryable and interoperable data resources in a simple and standard way. [Linux, Java, WebDev]
              • http://www.odata.org/
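To give a sense of what "queryable" means for OData, the sketch below assembles a request URL from the standard OData system query options `$filter`, `$select` and `$top`. The service root `https://example.org/odata`, the `household` entity set, and the field names are hypothetical placeholders; the slides do not describe ILRI's actual OData endpoints.

```python
from urllib.parse import quote


def odata_query_url(service_root, entity_set, filter_expr=None, select=None, top=None):
    """Build an OData URL using the $filter, $select and $top query options."""
    options = []
    if filter_expr:
        # Percent-encode spaces but keep quotes and parentheses readable.
        options.append("$filter=" + quote(filter_expr, safe="'()"))
    if select:
        options.append("$select=" + ",".join(select))
    if top is not None:
        options.append("$top=" + str(int(top)))
    url = f"{service_root.rstrip('/')}/{entity_set}"
    return url + ("?" + "&".join(options) if options else "")


# Hypothetical service root, entity set and fields, for illustration only.
query_url = odata_query_url(
    "https://example.org/odata",
    "household",
    filter_expr="cattle_owned gt 0",
    select=["household_id", "village"],
    top=5,
)
print(query_url)
```

Because these query options are part of the OData standard, the same URL pattern works from any OData-aware client, which is the interoperability benefit the slide points to.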
  • 9. Thank you Visit us @ http://data.ilri.org/
