Retooling a Research Data Repository:
data.depositar.io
“Technology - Building Useful Tools”
ECAI at 20 Workshop in Conjunction with PNC 2017
November 9, 2017
NCKU, Tainan, Taiwan
莊庭瑞 Tyng-Ruey Chuang
黃韋菁 Andrea Wei-Ching Huang
李承錱 Cheng-Jen Lee
許煌鑫 Huang-Sin Syu
Institute of Information Science
Academia Sinica, Taipei, Taiwan
2
Outline
● Collaborative Research
● Software Tools
● Retooling a Research Data Repository
3
Collaborative Research
● Collaboration is the process of two or more
people or organizations working together to
realize or achieve something successfully. –
Wikipedia
● To do collaborative research, we should make
– the research project and
– the research data
open to project members (or even everyone).
4
Openness
● Libre
– can be used by people
● Digital
– can be used by machines and put online
● Raw
– can be modified and re-purposed
● Common (format & vocabulary)
– can be exchanged and interlinked
● Transparent
– (the process) can be fixed; meta-level
5
Openness Benefits Research
● Help disseminate research findings.
● Help reproduce and re-purpose research
results.
● Help encourage research collaborations.
6
Software Tools
(Of course, all are free and open source!)
7
Tools Help Make Data Open
GNU MediaGoblin, CKAN, GitLab...
8
A Web-based Research Data
Repository
● Built with CKAN
– A free and open source data management
system
– For self-hosted publishing, storing, managing,
showing, and using data.
● Manage research datasets
9
Data Search and Discovery
● With free-text
● With filters
● With a given
spatial-temporal
extent
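The three search modes above map naturally onto CKAN's Action API. The following is a minimal sketch, assuming the repository exposes the standard `package_search` endpoint and has the ckanext-spatial extension enabled for bounding-box queries; the filter value and coordinates are hypothetical, and the temporal part of the extent is left out because the slides do not say how it is queried.

```python
from urllib.parse import urlencode

# Assumption: standard CKAN Action API path under the site named in the slides.
BASE_URL = "https://data.depositar.io/api/3/action/package_search"


def build_search_url(text="", filters=None, bbox=None, rows=10):
    """Return a package_search URL combining free text, filters, and a bounding box."""
    # "*:*" is the Solr match-all query used when no free text is given.
    params = {"q": text or "*:*", "rows": rows}
    if filters:
        # fq takes Solr filter syntax, for example: tags:"Tainan"
        params["fq"] = " ".join(f'{k}:"{v}"' for k, v in sorted(filters.items()))
    if bbox:
        # ext_bbox (min lon, min lat, max lon, max lat) comes from ckanext-spatial.
        params["ext_bbox"] = ",".join(str(c) for c in bbox)
    return f"{BASE_URL}?{urlencode(params)}"


# Hypothetical query: free text "map", tag filter, rough box around Tainan.
url = build_search_url("map", {"tags": "Tainan"}, bbox=(120.0, 22.9, 120.4, 23.2))
print(url)
```

Building the URL separately from sending it keeps the sketch testable offline; the resulting string can be passed to any HTTP client.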
10
Data Visualization
11
Metadata
● Designed for cross-disciplinary
research with spatial-temporal
information.
12
Example: Map Comparison
① Showing places extracted from a
map of Tainan, Taiwan in 1924 (blue
place marks).
② Overlaying the 1924 places on the
1896 Rapid Survey Map of Tainan,
Taiwan.
③ Learning that the Koxinga Temple
(延平郡王祠) of 1896 had been
changed to the Koxinga Ancestral
Shrine (開山神社) by 1924, since
Taiwan was under Japanese rule.
13
Retooling a Research Data
Repository
● From Taijiang Research Data Repository (since 2014)
– taijiang.tw/en/
● To a general-purpose research data repository
– data.depositar.io/en/
● Based on all the aforementioned functions
● With adjustments & enhancements
– Generalized and multilingual metadata
– Wikidata-powered keywords
– More fill-in snippets
– Latest CKAN goodies
14
Generalized and Multilingual
Metadata
● One set of simpler metadata fields for all kinds
of datasets, with three categories:
– Basic Information: title, description, data type...
– Descriptive Information: language, temporal &
spatial information, keywords...
– Management Information: license, author, created
time, organization, maintainer…
● Result: ~35% fewer metadata fields than the
previous version
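One way to picture the three-category layout is as a small schema plus a completeness check. This is only an illustrative sketch: the category names and example fields come from the slide, but the key names, the per-language title structure, and the sample record are hypothetical and not taken from the repository's actual schema.

```python
# Hypothetical key names; only the three categories and the example
# fields they contain are taken from the slide.
METADATA_SCHEMA = {
    "basic": ["title", "description", "data_type"],
    "descriptive": ["language", "temporal_extent", "spatial_extent", "keywords"],
    "management": ["license", "author", "created", "organization", "maintainer"],
}


def missing_fields(record):
    """Return, per category, the schema fields absent from a dataset record."""
    missing = {}
    for category, fields in METADATA_SCHEMA.items():
        absent = [name for name in fields if name not in record]
        if absent:
            missing[category] = absent
    return missing


# Illustrative record; the title is keyed by language to suggest how
# multilingual values could be held side by side.
record = {
    "title": {"en": "Example dataset", "zh_TW": "範例資料集"},
    "description": {"en": "A placeholder description."},
    "data_type": "map",
    "language": "zh",
    "keywords": [],
    "license": "CC BY 4.0",
    "author": "Example Author",
}
print(missing_fields(record))
```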
15
Generalized and Multilingual
Metadata
● Multilingual
metadata
16
Wikidata-powered Keywords
● Keywords: controlled vocabularies for tagging
datasets
● Adding keywords to a predefined list
– A never-ending process…
● Use Wikidata as data source
– 37M+ entries
– Multilingual
– Semantic relations enable data inference
● e.g. Tainan is part of Taiwan
– Placenames with coordinates and geonames.org
information
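The "Tainan is part of Taiwan" inference can be sketched with a toy relation table. The table below is hand-written for illustration; a real system would presumably read such relations from Wikidata and key them by item ID rather than by name, and the slides do not describe how the repository does this.

```python
# Hand-written toy relations (child -> parent), not fetched from Wikidata.
PART_OF = {
    "Anping District": "Tainan",
    "Tainan": "Taiwan",
}


def broader_terms(keyword):
    """Follow part-of links upward; return enclosing terms, nearest first."""
    result = []
    seen = {keyword}
    current = PART_OF.get(keyword)
    # The seen set guards against accidental cycles in the relation table.
    while current is not None and current not in seen:
        result.append(current)
        seen.add(current)
        current = PART_OF.get(current)
    return result


def matches(dataset_keywords, query):
    """A dataset matches if the query equals a keyword or one of its broader terms."""
    return any(query == k or query in broader_terms(k) for k in dataset_keywords)


print(broader_terms("Anping District"))
print(matches(["Tainan"], "Taiwan"))
```

With this expansion a search for the broader place also finds datasets tagged only with the narrower one, while the reverse does not hold.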
17
Wikidata-powered Keywords
1. Search and select keywords when creating a
new dataset
2. Keywords are stored as Wikidata IDs. Viewed
in English
3. Viewed in Chinese and other languages too!
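Storing IDs and resolving labels at display time could look roughly like the sketch below. It assumes the public Wikidata `wbgetentities` API; the sample entity is hand-written in the shape that API returns (its label values are illustrative, not fetched), and the fallback order is a design choice made here, not something stated in the slides.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def labels_url(ids, languages):
    """Build a wbgetentities request for the labels of the given item IDs."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def pick_label(entity, preferred, fallback="en"):
    """Return the label in the preferred language, else the fallback, else the ID."""
    labels = entity.get("labels", {})
    for lang in (preferred, fallback):
        if lang in labels:
            return labels[lang]["value"]
    return entity["id"]


# Hand-written sample entity; label values are illustrative only.
sample = {
    "id": "Q865",
    "labels": {
        "en": {"language": "en", "value": "Taiwan"},
        "zh": {"language": "zh", "value": "臺灣"},
    },
}
print(pick_label(sample, "zh"))
print(labels_url(["Q865"], ["en", "zh"]))
```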
18
More Fill-in Snippets
✔ A checkbox to restrict the dataset to organization
members only (the default is open to all).
✔ Auto-completion of maintainer information (with
name and email from logged-in account).
✔ Generate better dataset URLs from their titles
(e.g. titles in Chinese characters).
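The slide does not say how URLs are derived from titles, so the following is just one possible approach, assuming CKAN's rule that dataset names use lowercase ASCII letters, digits, `-` and `_` and are 2 to 100 characters long. The `dataset_name` helper and its `dataset-` prefix are hypothetical: ASCII titles are slugified, and titles with no usable ASCII (such as all-Chinese titles) fall back to a short hash.

```python
import hashlib
import re
import unicodedata


def dataset_name(title, max_len=100):
    """Derive a URL-safe dataset name from a title (hypothetical helper)."""
    # Fold accented Latin letters to ASCII; characters with no ASCII form are dropped.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    if len(slug) < 2:
        # Nothing usable survived, so derive a short, stable name from a hash.
        slug = "dataset-" + hashlib.sha1(title.encode("utf-8")).hexdigest()[:8]
    return slug[:max_len].rstrip("-")


print(dataset_name("Tainan Maps, 1924"))
print(dataset_name("台南地圖"))
```

A hash keeps the name stable and collision-resistant but unreadable; transliterating the title would be a friendlier alternative at the cost of an extra dependency.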
19
Latest CKAN Goodies
● Private datasets (which can only be seen by
organization members) are now included in the
dataset search results (for those who have
access).
● Separated site language translations from
CKAN core.
● Speed improvements for displaying a dataset.
20
Thank you!
data.depositar.io
