CKAN
An open-source data management solution for open data
Ivan Ermilov
AKSW Research Group
http://aksw.org
My experience with CKAN
● PublicData.eu portal
o Crowd-sourcing CSV2RDF mappings
● LODStats
o Version 1: crawling datahub.io (CKAN)
o Version 2: CKAN aggregator for data.gov, publicdata.eu and datahub.io; crawled all three portals and published the data on datahub.io
CKAN IS NOT
a file store!
Why CKAN?
● An open source platform
o Relatively easy to deploy
o Provides a rich set of features for free
● Data management
● Community involvement
Who uses CKAN?
● All major open government data portals
o Canada (open.canada.ca): 244,238 datasets
o The U.S. (data.gov): 131,348 datasets
o Europe (publicdata.eu): 47,863 datasets
● And some other communities:
o Semantic Web community (datahub.io): 9,509 datasets
CKAN architecture
CKAN Pros/Cons
● Pros
o Organizes your data in a structured way
o Has an extension that supports DCAT (for datasets only)
o Provides an API for consuming your data
● Cons
o The data model does not fit every use case (e.g. DBpedia)
o No strict guidelines for dataset publishing
CKAN functionality
● Publishing metadata
● Exposing metadata (API/front-end)
● Access control for users/organizations
● Additional functionality via plugins
CKAN extensions/plugins
● Data preview and visualization
● CKAN + DCAT
● Extension that adds the Disqus commenting system to CKAN
● Simple API dataset hits counter
The full list is available at: http://extensions.ckan.org/
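To make the CKAN + DCAT idea concrete, here is a heavily simplified sketch that turns a CKAN package dictionary into N-Triples using a handful of DCAT and Dublin Core terms (`dcat:Dataset`, `dct:title`, `dct:description`, `dcat:distribution`, `dcat:accessURL`). The URI scheme (`<instance>/dataset/<name>` and `.../resource/<id>`) and the choice of fields are assumptions for illustration; the actual DCAT extension maps many more fields and has its own configuration.

```python
from typing import Dict, List

DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def package_to_dcat(pkg: Dict, base_url: str) -> List[str]:
    """Map a few CKAN package fields to DCAT N-Triples (hypothetical URI scheme)."""
    ds_uri = f"{base_url.rstrip('/')}/dataset/{pkg['name']}"
    ds = f"<{ds_uri}>"
    triples = [
        f"{ds} <{RDF_TYPE}> <{DCAT}Dataset> .",
        f"{ds} <{DCT}title> {literal(pkg['title'])} .",
    ]
    if pkg.get("notes"):  # CKAN's free-text description field
        triples.append(f"{ds} <{DCT}description> {literal(pkg['notes'])} .")
    # Each CKAN resource becomes a dcat:Distribution of the dataset.
    for res in pkg.get("resources", []):
        dist = f"<{ds_uri}/resource/{res['id']}>"
        triples += [
            f"{ds} <{DCAT}distribution> {dist} .",
            f"{dist} <{RDF_TYPE}> <{DCAT}Distribution> .",
            f"{dist} <{DCAT}accessURL> <{res['url']}> .",
        ]
    return triples
```

Emitting plain N-Triples strings keeps the sketch dependency-free; a real exporter would more likely build a graph with an RDF library and serialize it to several formats.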
CKAN deployment
● From source
● OS package (e.g. a Debian package)
● Docker image
Official guide: http://docs.ckan.org/en/latest/maintaining/installing/index.html
CKAN Multi-Tier Deployment
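In a multi-tier deployment, local CKAN instances (e.g. for cities) are aggregated into a regional instance, as publicdata.eu does with a harvest extension. The merge step can be illustrated with a toy function: datasets pulled from each local instance are copied into a regional catalogue under a source-prefixed name and only overwritten when the local copy is newer. This is a hypothetical sketch, not how the harvest extension is implemented; it assumes each package dict carries CKAN's `metadata_modified` timestamp and that all timestamps share one ISO 8601 format, so they compare correctly as strings.

```python
from typing import Dict, Iterable


def harvest(catalog: Dict[str, Dict], label: str, packages: Iterable[Dict]) -> int:
    """Insert or update one source's datasets in the regional catalogue (in place).

    `label` is a hypothetical short name for the local instance.
    Returns the number of records that were added or replaced.
    """
    changed = 0
    for pkg in packages:
        key = f"{label}-{pkg['name']}"  # prefix avoids name clashes between sources
        current = catalog.get(key)
        # Keep the existing record unless the harvested copy is strictly newer.
        if current is None or pkg["metadata_modified"] > current["metadata_modified"]:
            catalog[key] = {**pkg, "name": key, "harvest_source": label}
            changed += 1
    return changed
```

Re-running the function on an unchanged source is a no-op, which is the property you want when the regional instance re-harvests its sources on a schedule.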
CKAN API
● Well documented
● Covers everything you can do with the web interface
o You can write your own web interface on top of it
● Various API clients
o ckanclient (python) - official
o Ruby, PHP, Java, Node.js, Perl, R
https://github.com/ckan/ckan/wiki/CKAN-API-Clients
CKAN API methods
● Retrieving data
● Creating new data
● Updating existing data
● Deleting existing data
● Data here means packages (datasets), resources, groups, tags, users, etc.
http://docs.ckan.org/en/latest/api/index.html
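Retrieving data uses plain GET requests, while creating, updating and deleting are done with POST requests that carry a JSON body and an API token. The sketch below uses only the Python standard library and assumes the token is sent in the `Authorization` header, as in recent CKAN versions (older deployments may expect something else). `build_write_request`, `call_write_action`, the demo URL and the placeholder token are hypothetical names for illustration.

```python
import json
import urllib.request

# Hypothetical instance and placeholder token; replace with your own.
CKAN_URL = "https://demo.ckan.org"
API_TOKEN = "YOUR-API-TOKEN"


def build_write_request(base_url, action, payload, api_token):
    """Build a POST request for a CKAN write action (create/update/delete).

    Parameters travel as a JSON body; the API token authorizes the call.
    """
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": api_token},
        method="POST",
    )


def call_write_action(base_url, action, payload, api_token):
    """Send the request (needs network access) and return the response's 'result'."""
    req = build_write_request(base_url, action, payload, api_token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]
```

For example, `call_write_action(CKAN_URL, "package_create", {"name": "my-dataset", "title": "My dataset"}, API_TOKEN)` would attempt to create a dataset; `package_update` and `package_delete` follow the same pattern with different payloads. Many instances also require extra fields (such as the owning organization), so check the instance's schema first.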
CKAN API: Examples
● Get package list
o http://demo.ckan.org/api/3/action/package_list
o Disabled on data.gov
● Get one package
o http://demo.ckan.org/api/3/action/package_show?id=adur_district_spending
● Get one organization (ckan.logic.action.get.organization_show)
o api/3/action/organization_show?id=...
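All of these example URLs follow one pattern, `<instance>/api/3/action/<action>?<parameters>`, and the response is a JSON object with a `success` flag and a `result` field. A minimal standard-library sketch built on that response shape; `action_url`, `parse_response` and `call_action` are hypothetical helper names, not part of the official `ckanclient` package.

```python
import json
import urllib.parse
import urllib.request


def action_url(base_url, action, **params):
    """Build a CKAN Action API (v3) URL, e.g. .../api/3/action/package_show?id=..."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def parse_response(raw):
    """Return 'result' from a CKAN response body, or raise if 'success' is false."""
    data = json.loads(raw)
    if not data.get("success"):
        raise RuntimeError(f"CKAN action failed: {data.get('error')}")
    return data["result"]


def call_action(base_url, action, **params):
    """GET a read-only action and return its result (needs network access).

    Failed calls typically come back with an HTTP error status, in which case
    urlopen raises urllib.error.HTTPError before parse_response runs.
    """
    with urllib.request.urlopen(action_url(base_url, action, **params)) as resp:
        return parse_response(resp.read())
```

For example, `call_action("http://demo.ckan.org", "package_show", id="adur_district_spending")` requests the package from the slide, and `call_action("http://demo.ckan.org", "package_list")` the list of package names (where that action is enabled).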
Use Case: LODStats
● Aggregate CKAN instances via the API
● Select only the relevant datasets
● Build an application on top of them
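The three LODStats steps can be sketched as: page through each instance with the `package_search` action (using its `start` and `rows` parameters), keep only datasets whose resources declare an RDF-like format, and hand the result to the application. The format list, the helper names and the use of `package_search` (one option where `package_list` is disabled) are assumptions for illustration, not the actual LODStats code. Passing `fetch` in as a parameter keeps the paging and filtering logic testable without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, List

# Formats counted as "relevant" (RDF) datasets; this list is an assumption.
RDF_FORMATS = {"rdf", "rdf/xml", "turtle", "n-triples", "sparql"}

# fetch(base_url, start, rows) -> the "result" object of one package_search page
Fetch = Callable[[str, int, int], Dict]


def http_fetch(base_url: str, start: int, rows: int) -> Dict:
    """Fetch one page of datasets via package_search (needs network access)."""
    qs = urllib.parse.urlencode({"start": start, "rows": rows})
    url = f"{base_url.rstrip('/')}/api/3/action/package_search?{qs}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["result"]


def iter_packages(base_url: str, fetch: Fetch = http_fetch, rows: int = 100) -> Iterator[Dict]:
    """Yield every dataset of one CKAN instance, page by page."""
    start = 0
    while True:
        page = fetch(base_url, start, rows)
        results = page["results"]
        yield from results
        start += len(results)
        if not results or start >= page["count"]:
            break


def is_rdf_dataset(pkg: Dict) -> bool:
    """True if any resource of the dataset declares an RDF-like format."""
    return any((res.get("format") or "").strip().lower() in RDF_FORMATS
               for res in pkg.get("resources", []))


def aggregate(instances: List[str], fetch: Fetch = http_fetch) -> List[Dict]:
    """Collect RDF datasets from several CKAN instances, tagged with their source."""
    found = []
    for base_url in instances:
        for pkg in iter_packages(base_url, fetch):
            if is_rdf_dataset(pkg):
                found.append({"source": base_url, "name": pkg["name"]})
    return found
```

Against live portals you would call, for example, `aggregate(["http://demo.ckan.org"])`; keeping the source URL next to each dataset name lets the application link back to the original portal.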
Use Case: CSV2RDF
● Integrated with a particular CKAN instance
● Aggregates all CSV files from the instance
● Provides an interface for CSV2RDF conversion
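Aggregating the CSV files of one instance boils down to walking its datasets and picking out the CSV resources. A minimal sketch over `package_show`-style dictionaries; the detection heuristic (declared format, or a URL path ending in `.csv`) and the helper names are assumptions, not the actual CSV2RDF implementation.

```python
from typing import Dict, Iterable, List


def is_csv(resource: Dict) -> bool:
    """Heuristic: the declared format is CSV, or the URL path ends in .csv."""
    fmt = (resource.get("format") or "").strip().lower()
    path = (resource.get("url") or "").split("?", 1)[0].lower()
    return fmt in {"csv", "text/csv"} or path.endswith(".csv")


def collect_csv_resources(packages: Iterable[Dict]) -> List[Dict]:
    """List the CSV resources of the given datasets as conversion candidates."""
    candidates = []
    for pkg in packages:
        for res in pkg.get("resources", []):
            if is_csv(res):
                candidates.append(
                    {"package": pkg["name"], "resource_id": res["id"], "url": res["url"]}
                )
    return candidates
```

Keeping the package name and resource id with each URL means a conversion result can later be attached back to the right dataset on the same instance.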
Thank you for your attention!
Presented by Ivan Ermilov.
LinkedIn: https://www.linkedin.com/in/iermilov
Email: iermilov@informatik.uni-leipzig.de
Skype: earthquakesan

Speaker notes

  1. What is CKAN, in two words? Who am I? :-) PhD student at AKSW, University of Leipzig; URZ (university data center). I hope the presentation will be interesting for all of you, and I'm looking forward to the discussion.
  2. I want to briefly introduce our research group. We are relatively big, with 40+ PhD students and research assistants. Our group is divided into subgroups working on different topics, as you can see from the group roster, such as “Semantic Abstraction”, “Emergent Semantics”, “Machine Learning”, etc.
  3. These projects started within the LOD2 project.
  4. A common misconception about CKAN is that it can store files for you. It can indeed be extended to store files, but it was originally designed to store METADATA, not the data itself.
  5. Open source: open-source solutions generally have fairly sparse documentation, and even a small deviation from the typical scenario requires a specialist. In most cases, problems can be resolved through the project's mailing lists. Customizing a CKAN instance (if no suitable plugin is available) requires a programmer. Data management: CKAN enables organizations and individuals to publish metadata about their datasets through a web front-end. This is an easy task that does not require much effort. Community involvement: CKAN has two main kinds of accounts: individual users and organizations. On government portals, registration is usually closed, because only government offices should be able to publish data. Registered users can comment on datasets and receive updates via various interfaces (more about this later).
  6. CKAN has been adopted by all the major open government portals (for instance, data.gov previously ran on the Socrata data platform). Why? For the reasons I mentioned before. Also important is that CKAN supports a multi-tier architecture, where local CKAN instances (for instance, for cities) can be aggregated into a regional CKAN instance. I will use the publicdata.eu portal as an example of how this can be achieved.
  7. This slide gives a general overview of the CKAN architecture. Like any web application, it consists of a back-end and a front-end. Organization example: at the AKSW group we have created an organization on the datahub.io portal, where we publish our datasets (to support dissemination).
  8. CKAN has a flexible architecture in which new functionality can be added via extensions. We've already seen what CKAN provides out of the box. Simple API dataset hits counter: stores a counter of calls to the “show” API command for a given dataset. CKAN + DCAT: exposes dataset information as RDF. The package and resource fields are mapped to the DCAT RDF vocabulary, which has the status of a W3C Recommendation.
  9. CKAN is relatively easy to deploy. The most complex option is installing from source: it requires manually installing CKAN itself into a Python virtual environment and setting up Apache Solr (a full-text index built on Lucene). You only need to install CKAN from source if you want to write your own extension or modify the source code for some reason. The second option, installing from an operating system package, is a good choice if you want to run only one CKAN instance per server (or virtual machine). The drawback of packages is that they are not very well maintained; in other words, you may have to wait a long time for updates. The third option, Docker, is relatively new and by far the most suitable for large-scale deployments, or if you need several CKAN instances per server/VM. The Docker image is built from the source code, and the latest image is available on Docker Hub; if it is not, you can build it yourself. The overhead here is that you need a person who can work with Docker. :-) At AKSW we prefer the latest long-term support (LTS) release of Ubuntu Server.
  10. PublicData.eu is an initiative to create a one-stop portal for data in Europe. Aggregation was not part of CKAN's initial functionality; a dedicated harvest extension was developed for this purpose. As a result, local governments can deploy their own CKAN instances, which can then be aggregated.
  11. You need a good API for your metadata to support the creation of cool applications on top of the data.