Anzo: The Data Catalog for a Modern Data Fabric
Today's Speakers
Greg West
Senior Presales Engineer
• Over 10 years' experience with ETL, Semantics and Graph Databases
• Technical Consulting and Pre-Sales Architecture and Vision
• Cisco Systems and Cambridge Semantics
Barbara Petrocelli
VP Strategy and Presales Field Operations
• 25 years' experience in ETL, BI, Analytics and Semantics
• Strategic Marketing and Go-To-Market Planning and Execution
• DataStage (IBM), Netezza (IBM), and Podium Data (Qlik)
“Unprecedented levels of data
scale and distribution are
making it almost impossible
for organizations to effectively
exploit their data assets”
Source: How To Use Semantics to Drive the Business Value of Your Data, Gartner Group, Guido De Simoni, 27 Nov. 2018
ENTERPRISE DATA MANAGEMENT TODAY REQUIRES
A MODERN DATA DISCOVERY AND INTEGRATION LAYER
Modern Catalogs in the Data Fabric Stack
Forrester Research Data Fabric Reference Architecture (layers as labeled on the slide):
• Data management: metadata/catalog, data security, data governance, data processing, data quality, data lineage, policies
• Global data access: global distributed platform, in-memory, embedded, self-service, APIs
• Data discovery: data modeling, preparation, curation, graph engine
• Data orchestration: transformation, integration, cleansing
• Data processing and persistence: Hadoop, NoSQL, Spark, data platform (processing), data lake, EDW/BDW
• Data ingestion and streaming: ingestion, streaming, data movement
• Data sources: cloud, on-premises
Catalogs provide the discovery and integration layers of the Data Fabric
©2020 Cambridge Semantics Inc. All rights reserved.
Data Catalogs Over Time
1995 2000 2005 2010 2015 2020
DATA
WAREHOUSE
• Metadata
management
• Document data
• Control, quality,
and governance
DATA
LAKE
• Operationalize Hadoop
• Enable data citizens
• Crowdsource SME
• Simplifying access to
complex siloed data
• Deep blending
• Business
meaning/context
Visualize and
explore data in
graphs, charts, and
dashboards
Publish datasets for
use with desktop BI
and analytic tools
Map like elements
into common
business terms
Align onboarded
data using
mappings into a
business model
Onboard data
from source
systems
Metadata from
source systems
Semantic Data Catalog:
Catalog of metadata and data from source systems which has been blended and mapped
into a business-friendly model.
Simplifies data discovery
Business users can see and explore data from
many complex and diverse source systems through
the lens of a unified, business-oriented data model
Builds blended datasets
Blend related data into analytic-ready datasets
on a self-service, on-demand basis, even when the
data originates in many separate systems and
formats
Accelerates data delivery
More business people can request and get bespoke
datasets quickly, enabling greater use of data to
drive competitive advantage and digital
transformation
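The idea of mapping like elements into common business terms can be pictured with a small sketch. This is not Anzo's API; the `TERM_MAP` table, the business term names, and the simplified sample values are assumptions made purely for illustration.

```python
# Hypothetical mapping from (source system, column) to a shared business term.
# The business terms are invented for this sketch; they are not Anzo names.
TERM_MAP = {
    ("claims", "Subscriber ID"): "patient_id",
    ("claims", "Claim ID"): "claim_id",
    ("ehr", "Patient ID"): "patient_id",
    ("ehr", "Drug Name"): "medication",
}


def to_business_terms(source, row):
    """Rename one source row's columns to business terms; unmapped columns are dropped."""
    return {
        TERM_MAP[(source, column)]: value
        for column, value in row.items()
        if (source, column) in TERM_MAP
    }


# Illustrative rows with simplified values.
claim = to_business_terms("claims", {"Claim ID": "44223", "Subscriber ID": "BA213"})
record = to_business_terms("ehr", {"Patient ID": "BA213", "Drug Name": "Narcoleptol"})

# Both rows now share the key "patient_id", so they can be blended on it.
blended = {**claim, **record} if claim["patient_id"] == record["patient_id"] else None
```

Once two sources expose the same business term, blending becomes a plain key match rather than a source-specific join.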
A modern data discovery and integration platform
for your enterprise data fabric.
Anzo lets business users find, connect, and blend
enterprise data into analytic-ready datasets.
Map and Explore
Enterprise Data
Build Blended
Analytic-Ready
Datasets
Apply Enterprise-Ready
Data Management
3 Big Ideas
1. Anzo maps the physical/logical layer to the business layer in a
data collection. This makes data in the catalog understandable
in business terms.
2. Anzo goes beyond data cataloging to allow you to apply
integration and data quality in a single data management
process.
3. Anzo collects and connects. It collects metadata that
documents the data. It also connects to the data itself so you
can immediately use the data you find through the catalog.
Anzo makes data understandable by connecting the
logical layer with the business layer using semantics.
Logical Layer
Business Layer
“We are witnessing that data
catalogs are an important source
of technical and active [metadata].
Active metadata is best utilized
when organizations can share it
with data integration and data
quality tools to inform and, in
some cases, even automate
integration design.”
Source: Modern Data and Analytics Requirements Demand a Convergence of Data Management Capabilities, Gartner Group, Sept 11, 2019
“Changing requirements are driving demand for data quality
tools, data catalogs, metadata management solutions and
data integration tools in one comprehensive solution.”
How it works: Data onboarding
Graph data models flexibly connect and transform new data sources.
Claims
Claim ID    Process Date    Subscriber ID
44223       10/3/2015       ID-BA213
44224       10/7/2015       ID-234I2
…           …               …
Electronic Health Records
Patient ID  Condition         Drug Name
BA213       Sleep Apnea       Narcoleptol
CS289       Type II Diabetes  Insulin
…           …
On Site Doctor Note
On July 3, 2016, Patient BA213 experiencing headache and nausea following 500mg dosage of sleep aid therapeutic, Narcoleptol.
Graph view of the same data (labels recovered from the diagram):
• Nodes: Patient Record, Drug, Note, Claim, Subscriber
• Edges: PRESCRIBED, ABOUT (twice)
• Values: PATIENT ID BA213; CONDITION Sleep Apnea; BRAND NAME Narcoleptol; DOSAGE 500mg; WHEN 3/7/2016; EVENT headache and nausea; SENTIMENT SCORE -.05; CLAIM ID 44223; PROCESS DATE 10/3/2015; SUBSCRIBER ID ID-BA123
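The onboarding slide shows tabular rows becoming connected graph nodes. A minimal sketch of that idea follows, using plain tuples as triples. The snake_case keys, the `patient:` and `claim:` node prefixes, and the rule that a subscriber ID is a patient ID behind an `ID-` prefix are all assumptions for illustration, not something the deck states.

```python
def row_to_triples(subject, row):
    """Turn one tabular row into (subject, predicate, object) triples."""
    return [(subject, column, value) for column, value in row.items()]


def patient_node(subscriber_id):
    """Assumed rule: a subscriber ID is the patient ID behind an 'ID-' prefix."""
    prefix = "ID-"
    bare = subscriber_id[len(prefix):] if subscriber_id.startswith(prefix) else subscriber_id
    return "patient:" + bare


# Sample rows taken from the slide's tables; the key names are made up here.
claim = {"claim_id": "44223", "process_date": "10/3/2015", "subscriber_id": "ID-BA213"}
record = {"patient_id": "BA213", "condition": "Sleep Apnea", "drug_name": "Narcoleptol"}

graph = []
graph += row_to_triples(
    "patient:" + record["patient_id"],
    {"condition": record["condition"], "drug_name": record["drug_name"]},
)
graph += row_to_triples("claim:" + claim["claim_id"], {"process_date": claim["process_date"]})
# The edge that ties the two sources together.
graph.append(("claim:" + claim["claim_id"], "about", patient_node(claim["subscriber_id"])))

# Everything that points at the patient, regardless of which source it came from.
linked = [s for s, _, o in graph if o == "patient:BA213"]
```

Adding a new source in this picture means appending more triples, without altering the triples that are already there.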
How it works: Business models
Semantic data models to capture and navigate data relationships
• Patients
• Encounters
• Providers
• Medications
• Costs
• Care Plans
• Claims
• Etc.
Model diagram (labels recovered from the slide), grouped as care plans, canonical, electronic medical records, and claims:
Providers, Care Plans, Patients, Costs, Inpatient Claims, Carrier Claims, Outpatient Claims, Prescription Drug_Events, Beneficiary Summary, BestPractiseLinks, careprog2, careprog1, Medications, Patient, Encounters, Observations, Conditions, Allergies, Patients, Procedures, Imaging Studies, Immunizations, Care Plans
How it works: Data lineage
Metadata documents the end-to-end data journey
Source System → Source Metadata → Mapping: Source to Semantic Model → Semantic Model → Graph Representation → Graph Data Set → In-Memory Graph Data Blending → Analytics and Access
PHASE 1: METADATA ONLY
Initial steps build up metadata to describe the data sources and their
connections, and optionally materialize the semantic model.
PHASE 2: DATA AND METADATA
These steps use data to materialize the semantic model and enable
further data blending, enhancement, and delivery.
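The lineage stages above can be sketched as an ordered list that can be walked backwards from any stage. The stage names come from the slide. The point where Phase 1 ends (after the fourth stage) is an assumption, since the slide text does not say which stages fall in which phase.

```python
# Stage names as they appear on the lineage slide, in order.
STAGES = [
    "Source System",
    "Source Metadata",
    "Mapping: Source to Semantic Model",
    "Semantic Model",
    "Graph Representation",
    "Graph Data Set",
    "In-Memory Graph Data Blending",
    "Analytics and Access",
]

# Assumption: the first four stages are the metadata-only phase.
PHASE_1_COUNT = 4


def phase_of(stage):
    """Return 1 for metadata-only stages and 2 for stages that also use data."""
    return 1 if STAGES.index(stage) < PHASE_1_COUNT else 2


def upstream(stage):
    """Return every stage that feeds the given one, nearest first."""
    return STAGES[:STAGES.index(stage)][::-1]


trace = upstream("Graph Data Set")
```

Walking `trace` from front to back answers the usual lineage question: where did this dataset come from, and through which mapping?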
Step 1:
Discover and explore linked
data in a semantic model.
Select the data you want to
use from among all the data
in the model.
Step 2:
Automatically generate code
to query the data model,
select the data you want, and
subset it out.
Step 3:
New selected subset of the data
is made available for use in
various analytic and data
visualization tools
Export: convert from graph to rectangular
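Converting from graph to rectangular form amounts to pivoting triples into one row per subject. The sketch below shows one way to do that in plain Python; the `to_rows` helper and the predicate names are hypothetical, and a predicate with several values per subject would keep only the last value seen.

```python
def to_rows(triples, columns):
    """Pivot (subject, predicate, object) triples into one dict per subject."""
    rows = {}
    for subject, predicate, obj in triples:
        if predicate in columns:
            rows.setdefault(subject, {})[predicate] = obj
    # Fill gaps with None so every row has the same columns.
    return [
        {"id": subject, **{c: values.get(c) for c in columns}}
        for subject, values in rows.items()
    ]


# Sample values borrowed from the onboarding slide.
triples = [
    ("BA213", "condition", "Sleep Apnea"),
    ("BA213", "drug_name", "Narcoleptol"),
    ("BA213", "dosage", "500mg"),
    ("CS289", "condition", "Type II Diabetes"),
    ("CS289", "drug_name", "Insulin"),
]

table = to_rows(triples, ["condition", "drug_name"])
```

Choosing the `columns` list corresponds to Step 1 (select the data you want); the pivot is the export step that makes the subset usable in tools that expect rows and columns.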
How it works: User experience
The catalog of metadata linked to the semantic model becomes a platform to explore,
select, prepare, and use data in a variety of analytic tools, applications and algorithms
Framework of Data Governance, Data Security, and Metadata
PREFIX : <http://cambridgesemantics.com/ontologies/Customer_360_Ontological_Model#>
INSERT {
  GRAPH ${targetGraph} {
    # Create the connection between the individual and their credit report
    ?individual :p_has_credit_report ?doc .
  }
}
${usingSources}
WHERE {
  # Get every individual and their SSN
  ?individual a :Individual .
  …
Generate an Infinite Set of Blended Data Sets from the Catalog
Apply rules and
relationships to link,
conform and harmonize
Ingestion of raw
data
Surface business ready
datasets for analysis and
machine learning
Data Cataloging Problem
2 Main Questions:
• What data has my enterprise collected?
• How do I connect these data sources?
Data sources:
• Customer Complaints
• Insurance Policy DB
• Life Insurance Claims DB
• Data Scientist Prepared Files
• Crimes DB Extract
• Census Demographics Extract
Data Architect
Step 1: Create a Searchable Catalog of Data
Datasets are cataloged
and can be searched by
concepts or categorized
tags.
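Searching a catalog by tags can be pictured as a subset test over each dataset's tag set. The dataset names below come from the Data Cataloging Problem slide; the tags, the `CatalogEntry` class, and the `search` function are assumptions made for this sketch and do not describe Anzo's search.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One cataloged dataset with free-form tags (tags are invented here)."""
    name: str
    tags: set = field(default_factory=set)


CATALOG = [
    CatalogEntry("Customer Complaints", {"customer", "complaint"}),
    CatalogEntry("Insurance Policy DB", {"customer", "policy"}),
    CatalogEntry("Life Insurance Claims DB", {"claim", "policy"}),
    CatalogEntry("Census Demographics Extract", {"demographics"}),
]


def search(catalog, *tags):
    """Return names of datasets that carry every requested tag (case-insensitive)."""
    wanted = {tag.lower() for tag in tags}
    return [entry.name for entry in catalog if wanted <= entry.tags]


policy_sets = search(CATALOG, "policy")
customer_policy_sets = search(CATALOG, "Customer", "policy")
```

A concept-based search would go one step further and match tags through the business model, so that a query for a broader concept also finds datasets tagged with narrower ones.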
Step 2: Connect Selected Datasets

Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
