The rise of big data governance: Insight on this emerging trend from active open source initiatives
April 18, 2018 – Dataworks Summit Berlin 2018
@ODPiOrg
TODAY’S SPEAKERS
John Mertic, Director of Program Management, Linux Foundation
Maryna Strelchuk, Information Architect and Application Developer at ING
IMAGINE …
An enterprise data catalogue that lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
No matter where the data resides
Search
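As a rough illustration of what one entry in such a catalogue might hold, here is a minimal Python sketch. The `CatalogueEntry` class, its field names, the `search` helper and the sample data are all hypothetical, chosen to mirror the attributes listed on this slide; they are not taken from Apache Atlas or any ODPi specification.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """Hypothetical catalogue record; field names mirror the slide, not a real API."""
    name: str
    location: str          # where the data is located
    owner: str
    structure: dict        # column name -> type
    meaning: str           # business description
    classification: str    # e.g. "confidential"
    quality: float         # assumed score between 0.0 and 1.0
    lineage: list = field(default_factory=list)  # names of upstream sources


def search(entries, term):
    """Return entries whose name or meaning contains the term, case-insensitively."""
    term = term.lower()
    return [e for e in entries
            if term in e.name.lower() or term in e.meaning.lower()]


# Illustrative sample data only.
catalogue = [
    CatalogueEntry("customer_accounts", "hdfs://lake/accounts", "finance-team",
                   {"id": "int", "iban": "string"}, "Retail customer accounts",
                   "confidential", 0.9, ["core_banking.accounts"]),
    CatalogueEntry("web_clicks", "kafka://clicks", "marketing-team",
                   {"url": "string"}, "Raw website click events", "internal", 0.6),
]

hits = search(catalogue, "customer")
print([e.name for e in hits])
```

The point of the sketch is only that location, lineage, ownership and classification travel together in one record, so a single search can answer where the data is and who owns it.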
New tools from any vendor connect to your data catalogue out of the box
No vendor lock-in and no expensive population of yet another proprietary, siloed metadata repository
Search
Open Metadata Management & Governance
IMAGINE …
Metadata is added automatically to the catalogue as new data is created
Diagram: Databases, Applications, Functions, Files
It’s possible if data-driven enterprises collaborate to build it
Let’s talk about how
IMAGINE …
• The Metadata Problem
• Building an Open Ecosystem
• Benefits for Data Governance Professionals
AGENDA
1. Use data outside the application that created it
2. Find the right data sets
3. Automate governance processes
WHY DO WE NEED METADATA?
• Many data platforms do not have metadata support
• Proprietary tools support a limited range of data sources and governance actions
• Expensive efforts to create an enterprise data catalogue
TODAY’S REALITY
i. The maintenance of metadata must be automated
ii. Metadata management must become ubiquitous
iii. Metadata access must become open and remotely accessible
iv. Metadata should be used to drive the governance of data
v. Wherever possible, discovery and maintenance of metadata has to be an integral part of all tools that access, change and move information.
METADATA GOVERNANCE MANIFESTO
Diagram: Open and Unified Metadata. Components: Atlas metadata repository; IBM metadata repository; Custom metadata repository; Open Metadata Repository Service; Open Metadata Access Service.
WHAT NEEDS TO CHANGE
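One way to picture the change is a single query interface placed in front of several repositories. The Python sketch below is only an illustration under that assumption: `MetadataRepository`, `InMemoryRepository`, `FederatedCatalogue` and the asset names are hypothetical and do not represent the actual Open Metadata Repository Service or Open Metadata Access Service interfaces.

```python
from abc import ABC, abstractmethod


class MetadataRepository(ABC):
    """Hypothetical common interface each repository connector would implement."""

    @abstractmethod
    def find_assets(self, term):
        """Return a list of asset names matching the term."""


class InMemoryRepository(MetadataRepository):
    """Stand-in for a real repository (Atlas, IBM, custom); holds names in a list."""

    def __init__(self, label, assets):
        self.label = label
        self.assets = assets

    def find_assets(self, term):
        return [a for a in self.assets if term.lower() in a.lower()]


class FederatedCatalogue:
    """Fans a query out to every registered repository and merges the results."""

    def __init__(self, repositories):
        self.repositories = repositories

    def find_assets(self, term):
        results = []
        for repo in self.repositories:
            for asset in repo.find_assets(term):
                results.append((repo.label, asset))
        return results


# Illustrative sample data only.
catalogue = FederatedCatalogue([
    InMemoryRepository("atlas", ["hive.sales_orders", "hive.customers"]),
    InMemoryRepository("ibm", ["db2.SALES_LEDGER"]),
    InMemoryRepository("custom", ["files/hr_payroll.csv"]),
])

matches = catalogue.find_assets("sales")
print(matches)
```

A tool written against the common interface does not need to know which vendor's repository holds a given asset, which is the lock-in argument made earlier in the deck.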
Update to Apache Atlas
Automation
Capture of metadata from data platforms, data movement engines and data protection engines. Exception management and stewardship
Business Value
Specialized services for key data roles such as CDO, Data Scientist, Developer, DevOps Operator, Asset Owner, Applications
Connectivity
Metadata Highway offering open metadata exchange, linking and federation between heterogeneous metadata repositories.
Diagram: Open and Unified Metadata. Components: Atlas metadata repository; IBM metadata repository; Microsoft SSAS; Open Metadata Repository Service; OMAS; Information View; Asset Catalog; Subject Area Catalog; Search UI; Power BI.
CURRENTLY IN DEVELOPMENT
OPEN SOURCE COLLABORATION
Good metadata enables subject matter experts to collaborate around the data
Locate the data they need, quickly and efficiently
Feed back their knowledge about the data and the uses they have made of it to help others and support economic evaluation of data
CO-CREATION WITH PRACTITIONERS
Data Governance Professionals
Your governance program is based on established definitions
Allow a broader range of tools in your organization
Automated governance processes protect and manage your data
• Metadata-driven access control
• Auditing, metering and monitoring
• Quality control and exception management
• Rights management
Vendors
Your metadata offerings will deliver value faster as they tap into metadata collected by other vendors’ tools.
ODPi packages extend your metadata system’s and tools’ capabilities
Conformance tests minimize your effort in being compliant with key standards and regulations.
Customers have increased confidence in your tools and services due to ODPi certification.
HOW THIS HELPS
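To make "metadata-driven access control" and "auditing" a little more concrete, here is a minimal Python sketch in which the access decision is derived from an asset's classification in the catalogue and every decision is logged. The policy table, role names, asset names and the `can_read` function are hypothetical and stand in for whatever a real governance engine would provide.

```python
# Hypothetical policy: which roles may read assets with a given classification.
POLICY = {
    "public": {"analyst", "data_scientist", "steward"},
    "internal": {"data_scientist", "steward"},
    "confidential": {"steward"},
}

# Hypothetical catalogue metadata: asset name -> classification.
CLASSIFICATIONS = {
    "web_clicks": "internal",
    "customer_accounts": "confidential",
}

# Every decision is appended here, as a stand-in for an audit trail.
audit_log = []


def can_read(role, asset):
    """Decide access from the asset's classification and record the decision."""
    classification = CLASSIFICATIONS.get(asset)
    # Unclassified assets are denied by default in this sketch.
    allowed = (classification is not None
               and role in POLICY.get(classification, set()))
    audit_log.append((role, asset, allowed))
    return allowed


print(can_read("data_scientist", "web_clicks"))
print(can_read("data_scientist", "customer_accounts"))
print(can_read("steward", "unknown_table"))
```

Because the rule is attached to the classification rather than to each table, reclassifying an asset in the catalogue changes who can read it without touching the policy code.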
ROADMAP
March April May June July August September
Data Governance PMC meets weekly
• Meetings focus on developing the open metadata usage guidelines, best practices and connector descriptions
• Two threads every other week on the PMC
• Thread 1: Compliance tools and packs
• Thread 2: Practitioner - Subject matter experts
• Learn more at https://lists.odpi.org/g/odpi-pmc-datagovernance
Strata, San Jose
Dataworks Summit, Berlin
IBM Think, Las Vegas
Webinar for Offering Managers
Webinar for Developers
Privacy Pack GA
Apache Atlas 1.0 GA
Releases upcoming
• Privacy pack due in June (https://jira.odpi.org/browse/DG-3)
• Apache Atlas 1.0 GA to support work due in late June (https://cwiki.apache.org/confluence/display/ATLAS/Open+Metadata+and+Governance)
Future work
• Metadata tools and solutions will integrate through the open metadata interfaces
• Integrated solutions and products with the open metadata interfaces
Dataworks Summit, San Jose
Apache Atlas 1.0 beta
Strata, NYC
ODPi – A NEUTRAL HOME FOR COLLABORATION
FOUNDATIONS ENABLE TRUSTED
INNOVATION
Successful projects depend on members, developers and infrastructure to develop technology, which is turned into products that the market will adopt.
Ecosystem
GET INVOLVED WITH ODPi DATA GOVERNANCE
Have your organization support ODPi
https://www.odpi.org/about/join
Visit ODPi website and join the quarterly newsletter
https://www.odpi.org/
Learn more about Data Governance PMC
https://www.odpi.org/projects/data-governance-pmc
Join the Data Governance PMC Mailing List
https://lists.odpi.org/g/odpi-pmc-datagovernance
Questions?
The Rise of Big Data Governance: Insight on this Emerging Trend from Active Open Source Initiatives

  • 1. The rise of big data governance: Insight on this emerging trend from active open source initiatives April 18, 2018 – Dataworks Summit Berlin 2018
  • 2. @ODPiOrg TODAY’S SPEAKERS 2 John Mertic, Director of Program Management, Linux Foundation Maryna Strelchuk, Information Architect and Application Developer at ING
  • 3. @ODPiOrg IMAGINE … An enterprise data catalogue that lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality No matter where the data resides Search
  • 4. @ODPiOrg New tools from any vendor connect to your data catalogue out of the box No vendor lock-in and no expensive population of yet another proprietary, siloed metadata repository Search Open Metadata Management & Governance IMAGINE …
  • 5. @ODPiOrg Metadata is added automatically to the catalogue as new data is created Databases Applications Function Function Functions Files It’s possible if data-driven enterprises collaborate to build it Let’s talk about how IMAGINE …
  • 6. @ODPiOrg • The Metadata Problem • Building an Open Ecosystem • Benefits for Data Governance Professionals AGENDA
  • 7. @ODPiOrg 1.Use data outside the application that created it 2.Find the right data sets 3.Automate governance processes WHY DO WE NEED METADATA?
  • 8. @ODPiOrg • Many data platforms do not have metadata support • Proprietary tools support a limited range of data sources and governance actions • Expensive efforts to create an enterprise data catalogue TODAY’S REALITY
  • 10. @ODPiOrg i. The maintenance of metadata must be automated ii. Metadata management must become ubiquitous iii. Metadata access must become open and remotely accessible iv. Metadata should be used to drive the governance of data v. Wherever possible, discovery and maintenance of metadata has to be an integral part of all tools that access, change and move information. METADATA GOVERNANCE MANIFESTO
  • 11. @ODPiOrg Open and Unified Metadata Atlas Metadata repository IBM Metadata repository Custom Metadata repository Open Metadata Repository Service Open Metadata Access Service Open and Unified Metadata WHAT NEEDS TO CHANGE
  • 12. @ODPiOrg Update to Apache Atlas Automation Capture of metadata from data platforms, data movement engines and data protection engines. Exception management and stewardship Business Value Specialized services for key data roles such as CDO, Data Scientist, Developer, DevOps Operator, Asset Owner, Applications Connectivity Metadata Highway offering open metadata exchange, linking and federation between heterogeneous metadata repositories.
  • 13. @ODPiOrg Open and Unified Metadata Atlas Metadata repository IBM Metadata repository Microsoft SSAS Open Metadata Repository Service OMAS Open and Unified Metadata CURRENTLY IN DEVELOPMENT Information View Asset Catalog Subject Area Catalog Search UI Power BI
  • 15. @ODPiOrg Good metadata enables subject matter experts to collaborate around the data Locate the data they need, quickly and efficiently Feed back their knowledge about the data and the uses they have made of it to help others and support economic evaluation of data CO-CREATION WITH PRACTITIONERS
  • 16. @ODPiOrg Your governance program is based on established definitions Allow a broader range of tools in your organization Automated governance processes protect and manage your data Metadata-driven access control Auditing, metering and monitoring Quality control and exception management Rights management Your metadata offerings will deliver value faster as they tap into metadata collected by other vendors’ tools. ODPi packages extend your metadata system’s and tools’ capabilities Conformance tests minimize your effort in being compliant with key standards and regulations. Customers have increased confidence in your tools and services due to ODPi certification. Data Governance Professionals Vendors HOW THIS HELPS
  • 17. @ODPiOrg ROADMAP March April May June July August September Data Governance PMC meets weekly • Focus of meetings is to develop the open metadata usage guidelines, best practices and connector descriptions • Two threads every other week on the PMC • Thread 1: Compliance tools and packs • Thread 2: Practitioner - Subject matter experts • Learn more at https://lists.odpi.org/g/odpi-pmc-datagovernance Strata, San Jose Dataworks Summit, Berlin IBM Think, Las Vegas Webinar for Offering Managers Webinar for Developers Privacy Pack GA Apache Atlas 1.0 GA Releases upcoming • Privacy pack due in June (https://jira.odpi.org/browse/DG-3) • Apache Atlas 1.0 GA to support work due in late June (https://cwiki.apache.org/confluence/display/ATLAS/Open+Metadata+and+Governance) Future work • Metadata tools and solutions will integrate through the open metadata interfaces • Integrated solutions and products with the open metadata interfaces Dataworks Summit, San Jose Apache Atlas 1.0 beta Strata, NYC
  • 18. @ODPiOrg ODPi – A NEUTRAL HOME FOR COLLABORATION
  • 19. FOUNDATIONS ENABLE TRUSTED INNOVATION Successful Projects depend on members, developers, infrastructure to develop technology, which is turned into products that the market will adopt. Ecosystem
  • 20. GET INVOLVED WITH ODPi DATA GOVERNANCE Have your organization support ODPi https://www.odpi.org/about/join Visit ODPi website and join the quarterly newsletter https://www.odpi.org/ Learn more about Data Governance PMC https://www.odpi.org/projects/data-governance-pmc Join the Data Governance PMC Mailing List https://lists.odpi.org/g/odpi-pmc-datagovernance

Editor's notes

  1. Metadata enables data to be used outside of the application that created it. Analytics and decision making New business applications Reporting and compliance Metadata describes the format and content of data allowing people to judge which data set to use for a new project Structure Meaning Origin Valid values and quality Usage and ownership Regulations and classifications that apply <more> Metadata describes the business context and classification of data allowing automated governance processes to operate.
  2. Many data platforms do not have metadata support Proprietary tools support a range of data sources and governance actions No-one supports everything you need and assumes all tools come from their suite Each tool starts “empty” requiring effort to populate metadata Each tool operates as if it is the only tool No integration/interoperability of metadata repositories from different vendors Expensive efforts to create an enterprise data catalogue
  3. The maintenance of metadata must be automated to scale to the sheer volumes and variety of data involved in modern business. Metadata management must become ubiquitous in cloud platforms and large data platforms, such as Apache Hadoop, so that the processing engines on these platforms can rely on its availability and build capability around it. Metadata access must become open and remotely accessible so that tools from different vendors can work with metadata located on different platforms. This implies unique identifiers for metadata elements, some level of standardization in the types and formats for metadata and standard interfaces for manipulating metadata. Metadata should be used to drive the governance of data and create a business-friendly logical interface to the data landscape. Wherever possible, discovery and maintenance of metadata has to be an integral part of all tools that access, change and move information.
  4. Code development and standards development relationship
  5. ODPi, a Linux Foundation Project, can provide the platform for industry collaboration on shared technology In pursuit of its mission to make Apache Hadoop and associated Big Data solutions ready for enterprise-wide deployment, ODPi is focused on the biggest hurdles In 2016, the largest hurdles were cross-distro harmonization Today, a key blocker to broad-based production use of Big Data is Governance
  6. Mention that individuals can get involved.