Open Data Convergence
WHITE PAPER

International Institute of Information Technology, Bangalore
Table of Contents

Abstract
Introduction
Problem Definition
Solution
Benefits
Conclusion
Contact Information
Open Data Convergence | White Paper

Abstract
The Open Data initiative at data.gov.in provides a valuable platform for sharing datasets; to date it hosts around 5000 datasets belonging to more than 55 departments. These datasets are available in various formats, such as Excel, XML and CSV.
However, consumers of these datasets face two major issues. The first is data integration: there is currently no means of checking which datasets are semantically linked to each other. The second is data quality: there is no uniform data representation format and no quality metric that shows missing or invalid data.
In this white paper we propose a concept termed ‘Data Convergence’, which provides a unified view of data collected from different sources (or departments) in different formats. To demonstrate it, we built a software application that provides this unified view through an HTTP API in JSON/XML formats.
The main objective is to maintain loose coupling between the underlying storage structure and the consumer client.

“Make each bit count – by creating a network of datasets”

Introduction
A major shortcoming of most open data portals (such as http://data.gov.in) is that they do not provide any API to their datasets. Some portals, like http://data.worldbank.org/ and http://data.gov, do provide APIs, but these are limited to producing output for a single dataset. They do offer custom API builders, but there is no way to merge or combine two or more datasets that share one or more common attributes/dimensions.
Having a lot of data is useless unless we can derive useful information from it. For example, with the current representation of open data, it is not straightforward to identify the effect of weather on the number of tourist visits, because the two datasets belong to two different departments, earth science and tourism respectively. Moreover, a consumer may not be aware of the existence of a related dataset that could have acted as a catalyst in his/her analysis. It would be much better to have an API that identifies how datasets are connected/linked, and on which attribute/dimension, and produces a unified view of multiple datasets.
Linked datasets are the foundation of a complete Data Convergence system, which consists of a large number of graphs in which a node represents a dataset and an edge between two nodes denotes the existence of a relationship. Linked data helps identify correlations among parameters in different datasets.
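The linked-dataset graph can be sketched as a plain adjacency map. This minimal Python example assumes, purely for illustration, that two datasets are linked whenever they share an attribute name; in the system described here the relationships are declared by dataset producers, and the dataset names and schemas below are hypothetical.

```python
from itertools import combinations


def build_linked_graph(schemas):
    """Return an adjacency map: dataset -> {linked dataset: shared attributes}.

    `schemas` maps each dataset name to the set of its attribute names.
    Assumption: two datasets are linked when they share at least one attribute.
    """
    graph = {name: {} for name in schemas}
    for a, b in combinations(schemas, 2):
        shared = sorted(schemas[a] & schemas[b])
        if shared:
            # Undirected edge, labelled with the common attributes/dimensions.
            graph[a][b] = shared
            graph[b][a] = shared
    return graph


# Hypothetical schemas, for illustration only.
schemas = {
    "tourism_statistics": {"state", "year", "tourist_visits"},
    "rainfall": {"state", "year", "rainfall_mm"},
    "travel_agencies": {"agency_name", "city"},
}
graph = build_linked_graph(schemas)
print(graph["tourism_statistics"])  # {'rainfall': ['state', 'year']}
```

Matching on attribute names alone is naive: any two datasets with a `state` column would be linked. Producer-declared relationships or richer semantic matching would give more meaningful edges.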
Common flaws such as differing file formats, inconsistent representation, and missing/invalid values can also be addressed, to a certain extent, in the preprocessing phase of data convergence.
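A minimal sketch of such a preprocessing step, assuming CSV and XML inputs (Excel files would need a third-party reader and are omitted): records from both formats are normalized into one list-of-dictionaries representation, and a simple quality metric reports, per field, the fraction of values that are missing or fail a validator. The record tag, field names, values and validators are hypothetical; the paper does not specify its preprocessing rules.

```python
import csv
import io
import xml.etree.ElementTree as ET


def records_from_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def records_from_xml(text, record_tag="record"):
    """Parse XML in which each <record_tag> element is one record;
    child element names become field names and their text the values."""
    root = ET.fromstring(text)
    return [{child.tag: child.text or "" for child in rec}
            for rec in root.iter(record_tag)]


def is_number(value):
    """Validator: True if the value parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def quality_report(records, fields, validators=None):
    """Fraction of records, per field, whose value is missing (absent or
    blank) or fails that field's validator."""
    validators = validators or {}
    report = {}
    for field in fields:
        check = validators.get(field, lambda v: True)
        bad = 0
        for rec in records:
            value = (rec.get(field) or "").strip()
            if not value or not check(value):
                bad += 1
        report[field] = bad / len(records) if records else 0.0
    return report


# Toy inputs in two formats (hypothetical field names and values).
csv_text = "state,year,rainfall_mm\nS1,2012,100\nS2,2012,\n"
xml_text = ("<rows><record><state>S3</state><year>2012</year>"
            "<rainfall_mm>n/a</rainfall_mm></record></rows>")
records = records_from_csv(csv_text) + records_from_xml(xml_text)
# rainfall_mm: 2 of 3 values are missing or invalid
print(quality_report(records, ["state", "rainfall_mm"], {"rainfall_mm": is_number}))
```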

Problem Definition
In most open data portals, the consumer has to browse through a collection of datasets and select one dataset at a time. The user has no means of knowing the relationships/links among different datasets. Unless one goes through all the datasets manually, he/she won’t get a clear idea of the dependencies and connectivity among datasets on certain parameters, which could have yielded more aggregated information.
There should be a provision for the consumer to specify which data he/she wants, in which format and in what quantity, and the system should be able to work out how to process this query and display the relevant information.
There should also be a mechanism for searching for a dataset not only by its name/department but also by its content/fields/attributes/dimensions.
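Searching by content as well as by name can be supported with an inverted index. The sketch below assumes that each dataset's metadata (name, department, description and field names) is available as text; it maps every token to the datasets that contain it and answers a query with the datasets matching all of its tokens. Indexing cell values, for search "by data", would work the same way. The class and the dataset IDs are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case alphanumeric tokens; 'rainfall_mm' becomes ['rainfall', 'mm']."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DatasetIndex:
    """Inverted index: token -> IDs of datasets whose name, department,
    description or field names contain that token (hypothetical design)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, dataset_id, name, department, description, fields):
        text = " ".join([name, department, description, *fields])
        for token in tokenize(text):
            self._postings[token].add(dataset_id)

    def search(self, query):
        """Return IDs of datasets that match every token of the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        # Copy the first posting set so the index itself is never mutated.
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


index = DatasetIndex()
index.add("ds1", "Rainfall", "Earth Sciences", "Monthly rainfall by state",
          ["state", "year", "rainfall_mm"])
index.add("ds2", "Tourism Statistics", "Tourism", "Tourist visits by state",
          ["state", "year", "tourist_visits"])
print(sorted(index.search("state year")))  # ['ds1', 'ds2']
print(sorted(index.search("rainfall")))    # ['ds1']
```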
The system should be capable of displaying all datasets linked to a given dataset, letting the user visualize a graph of linked datasets; it must also be able to converge these datasets and produce a unified view.

“Leveraging the open data platform with the help of converged datasets”

Solution
Overview
At a high level, the data convergence system takes three inputs from the consumer, viz.: which datasets the user wants to converge, in which format, and how many records per page. It then processes the input and generates an API that gives a unified view of the datasets.
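The three inputs map naturally onto the query parameters of the sample API given later in this section (`key`, `offset`, `limit`, `format`). The sketch below is hypothetical in two respects: the paper does not say how keys are derived, so here the key is an MD5 digest of the sorted dataset IDs (a 32-hex-character string of the same shape as the sample key), and `offset` is assumed to be 1-based. The host and path are taken from the paper's sample URL.

```python
import hashlib
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "http://opendata.com/api.jsp"  # host and path from the paper's sample API
FORMATS = ("json", "xml")


def make_key(dataset_ids):
    """Hypothetical key scheme: MD5 of the sorted dataset IDs (32 hex chars)."""
    canonical = ",".join(sorted(dataset_ids))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def build_api_url(key, offset=1, limit=10, fmt="json"):
    """Build a request URL with the parameters of the sample API."""
    if fmt not in FORMATS:
        raise ValueError("format must be 'json' or 'xml'")
    query = urlencode({"key": key, "offset": offset, "limit": limit, "format": fmt})
    return f"{BASE_URL}?{query}"


def parse_api_url(url):
    """Extract and validate (key, offset, limit, format) from a request URL."""
    params = {name: values[0] for name, values in parse_qs(urlparse(url).query).items()}
    if "key" not in params:
        raise ValueError("missing key")
    offset = int(params.get("offset", 1))  # assumed 1-based
    limit = int(params.get("limit", 10))
    fmt = params.get("format", "json")
    if offset < 1 or limit < 1 or fmt not in FORMATS:
        raise ValueError("invalid offset, limit or format")
    return params["key"], offset, limit, fmt


key = make_key(["rainfall", "storm", "temperature"])
url = build_api_url(key, offset=1, limit=10, fmt="json")
print(url)
print(parse_api_url(url))
```

MD5 serves here only as a stable fingerprint identifying the generated API; if the key also had to control access, a random, unguessable token would be the safer choice.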

Description
A typical use case of the Data Convergence system is as follows:
1. The Data Convergence system has a Smart API builder, which provides two kinds of user interface for selecting a collection of datasets.
1.1 Navigational
a. Step-by-step navigation to select datasets, with filters such as country, state, district, etc.
b. The user first selects a department, then a master dataset, and finally chooses as many datasets as required (or all of them).
1.2 Search-based
a. The user can search the open data repository by data, dataset name, dataset description, and dataset field/attribute/dimension name.
b. The user selects any number of the datasets displayed in the query result.
2. From a given collection of datasets, the system identifies connected components. For example, assume the user selects the datasets Authorized Travel Agency, Tourism Statistics, Rainfall, Storm and Temperature. Then two connected components are identified, viz. {[Authorized Travel Agency, Tourism Statistics], [Rainfall, Storm, Temperature]}, depending on the relationships specified by the dataset producers while uploading the datasets into the open data repository.
3. An API is generated for each connected component, which provides the flexibility to modify the number of records per page, the offset and the output format (JSON/XML).
Sample API:
http://opendata.com/api.jsp?key=fbd7939d674997cdb4692d34de8633c4&offset=1&limit=10&format=json
4. Whenever the user performs a GET on the API using the key, the data convergence system retrieves the metadata of the API query, creates a temporary unified view and returns the result set in JSON/XML format.
5. JSON and XML are standard data exchange formats, so the results can be easily integrated into any application.
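Step 2 can be sketched as a breadth-first search over the links among the selected datasets. This minimal Python version assumes the producer-declared relationships are available as pairs of dataset names and, as a simplifying assumption, ignores paths that pass through datasets the user did not select. With hypothetical links, it reproduces the grouping in the example above.

```python
from collections import deque


def connected_components(selected, links):
    """Group the selected datasets into connected components.

    `selected` is the user's collection of dataset names; `links` is an
    iterable of (a, b) pairs declared as related. Only links between two
    selected datasets are considered.
    """
    adjacency = {name: set() for name in selected}
    for a, b in links:
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, components = set(), []
    for start in selected:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


selected = ["Authorized Travel Agency", "Tourism Statistics",
            "Rainfall", "Storm", "Temperature"]
# Hypothetical producer-declared relationships.
links = [("Authorized Travel Agency", "Tourism Statistics"),
         ("Rainfall", "Storm"), ("Storm", "Temperature")]
components = connected_components(selected, links)
print(components)
```

Each resulting component would then get its own generated API, as in step 3.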
The Data Convergence system also provides visualization of all connected datasets up to the 3rd level (extensible). A few more functionalities, such as filtering attributes and records in the unified view/result set, can be added.
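Showing linked datasets "up to the 3rd level" amounts to a breadth-first search with a depth limit. A minimal Python sketch, assuming the link graph is available as an adjacency mapping from each dataset to its linked datasets (the dataset names below are placeholders); the returned nodes and depths could then be passed to any graph-drawing tool.

```python
from collections import deque


def linked_within(graph, start, max_depth=3):
    """Return {dataset: hops from start} for every dataset reachable from
    `start` in at most `max_depth` hops; `start` itself has depth 0."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # reached the display limit: do not expand further
        for neighbour in graph.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


# Hypothetical chain of links: A - B - C - D - E
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
print(linked_within(graph, "A"))  # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
```

Making the level configurable through `max_depth` is one way to keep the limit "extensible".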

(fig. Data Convergence System Block Diagram)

The following image shows the converged output of two different datasets, Road Accident and Person Injured, for the state of Arunachal Pradesh in two different formats, JSON and XML.

(fig. API response in JSON and XML format)
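Step 4 of the use case (build a temporary unified view, then return a page of it as JSON or XML) can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: each dataset is a list of records, two datasets are inner-joined on their common attributes (`state` and `year` here), and `offset` is 1-based. The paper does not describe its join strategy, and the records below are toy placeholders, not the Road Accident or Person Injured data.

```python
import json
import xml.etree.ElementTree as ET


def unified_view(left, right, on):
    """Inner-join two lists of record dicts on the attribute names in `on`.
    On a clash of non-key field names, the right-hand value wins."""
    index = {}
    for rec in right:
        index.setdefault(tuple(rec[k] for k in on), []).append(rec)
    return [{**rec, **match}
            for rec in left
            for match in index.get(tuple(rec[k] for k in on), [])]


def page(records, offset=1, limit=10):
    """Return up to `limit` records starting at 1-based position `offset`."""
    start = offset - 1
    return records[start:start + limit]


def to_xml(records, root_tag="records", record_tag="record"):
    """Serialize records as XML; field names must be valid XML tag names."""
    root = ET.Element(root_tag)
    for rec in records:
        item = ET.SubElement(root, record_tag)
        for field, value in rec.items():
            ET.SubElement(item, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def respond(records, offset=1, limit=10, fmt="json"):
    """Page the unified view and serialize it in the requested format."""
    rows = page(records, offset, limit)
    return json.dumps(rows) if fmt == "json" else to_xml(rows)


# Toy placeholder records, not real statistics.
road_accident = [
    {"state": "S1", "year": 2012, "accidents": 10},
    {"state": "S2", "year": 2012, "accidents": 20},
]
persons_injured = [{"state": "S1", "year": 2012, "injured": 15}]

view = unified_view(road_accident, persons_injured, on=["state", "year"])
print(respond(view, fmt="json"))
print(respond(view, fmt="xml"))
```

Because the view is built per request from the stored datasets, the client sees only the JSON/XML contract, which is one way to keep the loose coupling between storage and consumer that the paper aims for.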

Benefits
The Data Convergence system provides single, uniform HTTP API access to a unified view of datasets. It also gives the user a facility to visualize the network of connected datasets.
Key Highlights
• Easy access to converged data through the API
• Standard data exchange formats (JSON and XML) are supported
• Flexible (select dimensions as required)
• Unified view supports real-time data
• Loose coupling
• Notifications generated with change data capture (CDC)
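The paper does not describe how change data capture is implemented. One simple form, assumed here purely for illustration, compares successive snapshots of a dataset keyed by a primary key and emits insert, update and delete events that can be passed to subscribers. The `notify` hook and the subscriber registry are hypothetical.

```python
def diff_snapshots(old, new):
    """Compare two snapshots of a dataset, each mapping a record key to the
    record, and return (change_type, key) events: insert, update or delete."""
    events = []
    for key, record in new.items():
        if key not in old:
            events.append(("insert", key))
        elif old[key] != record:
            events.append(("update", key))
    for key in old:
        if key not in new:
            events.append(("delete", key))
    return events


def notify(subscribers, dataset_id, events):
    """Hypothetical hook: call every callback registered for the dataset
    once per change event."""
    for callback in subscribers.get(dataset_id, []):
        for change_type, key in events:
            callback(dataset_id, change_type, key)


# Toy snapshots keyed by (state, year); values are placeholders.
old = {("S1", 2012): {"accidents": 10}, ("S2", 2012): {"accidents": 20}}
new = {("S1", 2012): {"accidents": 12}, ("S3", 2012): {"accidents": 5}}
events = diff_snapshots(old, new)
notify({"road_accident": [print]}, "road_accident", events)
```

Snapshot comparison is easy to add on top of periodically refreshed open data files but rescans the whole dataset each time; log-based CDC, which reads a database's change log, avoids the rescan but depends on the storage engine.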

(fig. Visualization of Datasets relationship)

Conclusion
In this document, an attempt has been made to introduce a concept, based on Data Convergence, that provides a unified view of the datasets present in an open data portal and improves the way open data is accessed.
Consumers of open data can now gain a much better understanding of the datasets and their relationships, which will eventually stimulate their data analysis process.

Contact Information
Prof. Chandrashekhar Ramanathan
Tel: +91 80 4140 7777
rc@iiitb.ac.in

Bisen Vikrantsingh M.
Tel: +91 8792708719
Bisenvikrantsingh.mohansingh@iiitb.org

Kodamasimham Pridhvi
Tel: +91 8123160887
Pridhvi.kodamasimham@iiitb.org

Institute
IIIT Bangalore
IIIT-Bangalore, 26/C, Electronics City, Hosur Road, Bangalore, 560100
Tel: +91 80 4140 7777 / 2852 7627
http://www.iiitb.ac.in
