Copyright Synaltic 2015
Manage traceability with
Apache Atlas,
a flexible metadata repository
Charly Clairmont
Synaltic
@egwada
cclairmont@synaltic.fr
http://synaltic.fr
More than ten years of experience in IT, mainly in BI
Cofounder of Altic, now Synaltic
Cofounder of the Hadoop User Group France
Believes in Open Source to help enterprises create value
Helps open source projects become known via meetups and conferences
Charly Clairmont
An integrator company focused mainly on Data Management
Founded in 2004, Synaltic is the merger of two companies, Synotis and Altic
25 specialists in Data Management
A Swiss subsidiary, based in Lausanne
Our values
● Commitment
● Expertise
● Loyalty
Synaltic
[Diagram: SYNALTIC]
● R&D, Training, Support, Project, Expertise
● Data Intelligence, Data Platform, Data Governance, Data Exchange
What about your data?
Do you know where your data is?
Do you know who is responsible for a specific dataset?
Do you know which application or task modified this entity last Friday?
Enterprise Data Governance
Provide a common approach to data governance across all systems and data within the organization
– Transparent
– Reproducible
– Auditable
– Consistent
Enterprise Data Governance, in Hadoop
No specific way to address this requirement
– Each project proposes its own way to resolve data governance
– No integration with existing enterprise frameworks for data governance
Apache Atlas
Data classification
Metadata Exchange
Centralized Auditing
Search & Lineage
Security & Policy engine
Apache Atlas, Overview
Data Classification
● Taxonomy business-oriented annotations
● Relationships between data sets and underlying elements including source, target, and derivation processes
● Export metadata to third-party systems
Centralized Auditing
● Security access information for every application, process
● Operational information for execution, steps, and activities
Search & Lineage (Browse)
● Navigation paths to explore the data classification and audit information
● Text-based search to locate what is relevant
● Visualization of data set lineage
Security & Policy Engine
● Compliance policy at runtime based on data classification schemes
● Advanced definition of policies for preventing data derivation
Apache Atlas, Knowledge Store
Knowledge store categorized with appropriate business-oriented taxonomy
● Data sets & objects
● Tables / Columns
● Logical context
● Source, destination
Support exchange of metadata between foundation components and third-party applications/governance tools
Tech:
● Titan with Apache HBase
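As a rough illustration of the knowledge store idea above, the following Python sketch models data sets tagged with business-oriented taxonomy terms and looks them up by tag. The class and field names (`Asset`, `KnowledgeStore`, `tags`) and the sample tags are hypothetical. This is not Atlas's actual type system or its Titan/HBase storage layer, only an in-memory stand-in.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A data set or object (for example a table or a column). Hypothetical model."""
    name: str
    kind: str                                  # e.g. "table" or "column"
    source: str = ""                           # where the data comes from
    destination: str = ""                      # where the data is written
    tags: set = field(default_factory=set)     # business-oriented taxonomy terms


class KnowledgeStore:
    """In-memory stand-in for a metadata repository (illustration only)."""

    def __init__(self):
        self._assets = {}

    def register(self, asset):
        self._assets[asset.name] = asset

    def classify(self, name, tag):
        """Attach a taxonomy term to an already registered asset."""
        self._assets[name].tags.add(tag)

    def find_by_tag(self, tag):
        """Return the names of all assets carrying the given term, sorted."""
        return sorted(a.name for a in self._assets.values() if tag in a.tags)


store = KnowledgeStore()
store.register(Asset("tweets", "table", source="hdfs:/raw", destination="hive"))
store.register(Asset("tweets.user_id", "column"))
store.classify("tweets", "social-media")
store.classify("tweets.user_id", "personal-data")
store.classify("tweets", "personal-data")
```

Calling `store.find_by_tag("personal-data")` then returns both the table and the column, which is the kind of lookup a taxonomy-driven search relies on.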
Apache Atlas, Data Lifecycle Management
Provenance
Multi-cluster replication
Data set retention/eviction
Late data handling
Automation
Tech:
● Apache Falcon
Apache Atlas, Audit Store
Historical repository for all governance events
● Security: Access Grant & Deny
● Operational: Data Provenance & Metrics
● Indexed and Searchable
Tech:
● YARN ATS, Apache HBase, Apache Hive, Solr, ElasticSearch (Pluggable)
Apache Atlas, Security
Establish global security policies based
on data classification.
Apache Atlas, Policy Engine
Runtime rationalization of policy rules with respect to data asset combinations and time. Fully extensible.
● Metadata-based rules
● Geo-based rules
● Time-based rules
● Column/Attribute Prohibitions
● Preview: Hive Row and Column Masking
Tech:
● Ranger
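To make the rule types listed above more concrete, here is a toy evaluator in Python. It is not Ranger's API or policy format. The `Request` fields, the `is_allowed` function, and the sample tags, countries and opening hours are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Request:
    """An access request as seen by this toy evaluator (hypothetical fields)."""
    user_country: str        # input for geo-based rules
    at: time                 # time of day, input for time-based rules
    column_tags: frozenset   # classifications attached to the requested column


def is_allowed(req, denied_tags, allowed_countries, open_from, open_until):
    """Return True only if every rule accepts the request."""
    # Metadata-based rule / column prohibition: deny any prohibited classification.
    if req.column_tags & denied_tags:
        return False
    # Geo-based rule: the caller must be in an allowed country.
    if req.user_country not in allowed_countries:
        return False
    # Time-based rule: access only inside the opening window.
    return open_from <= req.at <= open_until


rules = dict(
    denied_tags=frozenset({"personal-data"}),
    allowed_countries={"FR", "CH"},
    open_from=time(8, 0),
    open_until=time(19, 0),
)

ok = is_allowed(Request("FR", time(10, 30), frozenset({"public"})), **rules)
pii = is_allowed(Request("FR", time(10, 30), frozenset({"personal-data"})), **rules)
geo = is_allowed(Request("US", time(10, 30), frozenset({"public"})), **rules)
late = is_allowed(Request("CH", time(23, 0), frozenset({"public"})), **rules)
```

Only the first request passes. The other three are rejected by the classification, geography and time rules respectively, which shows how a decision can depend on metadata rather than on a hard-coded list of tables.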
Apache Atlas, RESTful interface
Extensible enterprise classification of data assets, relationships and policies organized in a meaningful way, aligned to business organization.
Supports exploration via user interface
Supports extensibility via API and CLI exposure
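The deck does not show the REST calls themselves. As a sketch, the helper below only builds request URLs and sends nothing. It assumes the v2 REST paths found in later Atlas releases (`/api/atlas/v2/search/basic` and `/api/atlas/v2/lineage/{guid}`), which postdate this 2015 presentation, together with a hypothetical host name and the commonly used port 21000. The paths should be checked against the documentation of the installed version.

```python
from urllib.parse import urlencode, quote

# Hypothetical server address; adjust to the actual deployment.
ATLAS_BASE = "http://atlas.example.com:21000"


def basic_search_url(query, type_name, limit=10):
    """URL for a text search restricted to one entity type (assumed v2 path)."""
    params = urlencode({"query": query, "typeName": type_name, "limit": limit})
    return f"{ATLAS_BASE}/api/atlas/v2/search/basic?{params}"


def lineage_url(guid, direction="BOTH", depth=3):
    """URL for the lineage graph around one entity GUID (assumed v2 path)."""
    params = urlencode({"direction": direction, "depth": depth})
    return f"{ATLAS_BASE}/api/atlas/v2/lineage/{quote(guid, safe='')}?{params}"


# "1234-abcd" is a placeholder GUID, not a real entity.
search = basic_search_url("social network", "hive_table")
lineage = lineage_url("1234-abcd")
```

Keeping URL construction separate from the HTTP client makes it easy to plug in whichever client and authentication scheme the cluster requires.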
A use case
Our process
[Diagram of the Data Platform]
● Data source: Twitter
● Steps: Import, Collect from Twitter (reference repository), Build social network (analysis)
● HDFS: raw data
● Hive tables: url, hash tags, users, tweets, social network
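The lineage question for this pipeline ("where does the social network table come from?") can be sketched as a small graph traversal. The edges below are an assumed reading of the diagram rather than an export from Atlas, and the node names (`hdfs:raw_data`, `hive:users`, and so on) are illustrative labels.

```python
from collections import deque

# Each dataset maps to the items it is derived from.
# The edges are an assumed reading of the diagram, not an Atlas export.
DERIVED_FROM = {
    "hdfs:raw_data": ["twitter"],
    "hive:url": ["hdfs:raw_data"],
    "hive:hash_tags": ["hdfs:raw_data"],
    "hive:users": ["hdfs:raw_data"],
    "hive:tweets": ["hdfs:raw_data"],
    "hive:social_network": ["hive:users", "hive:tweets"],
}


def upstream(node):
    """Return every ancestor of `node`, nearest first (breadth-first)."""
    seen, order, queue = {node}, [], deque([node])
    while queue:
        current = queue.popleft()
        for parent in DERIVED_FROM.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


ancestors = upstream("hive:social_network")
```

With these assumed edges, `ancestors` lists the users and tweets tables first, then the raw HDFS data, then Twitter itself, which mirrors what a lineage view displays for a table.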
A use case
Search based on tables
A use case
Search based on services
A use case
Table Metadata
A use case
Lineage
Thank you!

Contenu connexe

Tendances

Atlas ApacheCon 2017
Atlas ApacheCon 2017Atlas ApacheCon 2017
Atlas ApacheCon 2017Vimal Sharma
 
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...DataWorks Summit
 
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...DataWorks Summit
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaDataWorks Summit/Hadoop Summit
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...DataWorks Summit
 
Big Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondBig Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondDataWorks Summit/Hadoop Summit
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsSnapLogic
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...DataWorks Summit/Hadoop Summit
 
Navigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryNavigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryDataWorks Summit/Hadoop Summit
 
Azure data catalog your data your way eugene polonichko dataconf 21 04 18
Azure data catalog your data your way eugene polonichko dataconf 21 04 18Azure data catalog your data your way eugene polonichko dataconf 21 04 18
Azure data catalog your data your way eugene polonichko dataconf 21 04 18Olga Zinkevych
 
The Convergence of Reporting and Interactive BI on Hadoop
The Convergence of Reporting and Interactive BI on HadoopThe Convergence of Reporting and Interactive BI on Hadoop
The Convergence of Reporting and Interactive BI on HadoopDataWorks Summit
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDataWorks Summit
 
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4jUnified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4jDeepak Chandramouli
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Rittman Analytics
 
Cortana Analytics Workshop: Azure Data Catalog
Cortana Analytics Workshop: Azure Data CatalogCortana Analytics Workshop: Azure Data Catalog
Cortana Analytics Workshop: Azure Data CatalogMSAdvAnalytics
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformDeepak Chandramouli
 

Tendances (20)

Atlas ApacheCon 2017
Atlas ApacheCon 2017Atlas ApacheCon 2017
Atlas ApacheCon 2017
 
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
The Rise of Big Data Governance: Insight on this Emerging Trend from Active O...
 
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
 
Enterprise Data Classification and Provenance
Enterprise Data Classification and ProvenanceEnterprise Data Classification and Provenance
Enterprise Data Classification and Provenance
 
Big Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondBig Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyond
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management Requirements
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
 
Navigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data DiscoveryNavigating the World of User Data Management and Data Discovery
Navigating the World of User Data Management and Data Discovery
 
Azure data catalog your data your way eugene polonichko dataconf 21 04 18
Azure data catalog your data your way eugene polonichko dataconf 21 04 18Azure data catalog your data your way eugene polonichko dataconf 21 04 18
Azure data catalog your data your way eugene polonichko dataconf 21 04 18
 
The Big Metadata
The Big MetadataThe Big Metadata
The Big Metadata
 
The Convergence of Reporting and Interactive BI on Hadoop
The Convergence of Reporting and Interactive BI on HadoopThe Convergence of Reporting and Interactive BI on Hadoop
The Convergence of Reporting and Interactive BI on Hadoop
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
 
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4jUnified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
 
Cortana Analytics Workshop: Azure Data Catalog
Cortana Analytics Workshop: Azure Data CatalogCortana Analytics Workshop: Azure Data Catalog
Cortana Analytics Workshop: Azure Data Catalog
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic Platform
 
Scale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | GimelScale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | Gimel
 

En vedette

Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies DataWorks Summit/Hadoop Summit
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Hortonworks
 
Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Sean Roberts
 
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Artem Ervits
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasDataWorks Summit/Hadoop Summit
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceHortonworks
 
Webinaire Synaltic x Trifacta 27/10/2016
Webinaire Synaltic x Trifacta 27/10/2016Webinaire Synaltic x Trifacta 27/10/2016
Webinaire Synaltic x Trifacta 27/10/2016Synaltic Group
 
De l'idée à l'article, créer une viz en quelques étapes !
De l'idée à l'article, créer une viz en quelques étapes !De l'idée à l'article, créer une viz en quelques étapes !
De l'idée à l'article, créer une viz en quelques étapes !Synaltic Group
 
Apache Falcon at Hadoop Summit 2013
Apache Falcon at Hadoop Summit 2013Apache Falcon at Hadoop Summit 2013
Apache Falcon at Hadoop Summit 2013Seetharam Venkatesh
 
Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Seetharam Venkatesh
 
CDAP, la boîte à outil pour concevoir vos applications Big Data
CDAP,  la boîte à outil pour concevoir vos applications Big DataCDAP,  la boîte à outil pour concevoir vos applications Big Data
CDAP, la boîte à outil pour concevoir vos applications Big DataSynaltic Group
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
[G6]hadoop이중화왜하는거지
[G6]hadoop이중화왜하는거지[G6]hadoop이중화왜하는거지
[G6]hadoop이중화왜하는거지NAVER D2
 
2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa
2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa
2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa영진 박
 
Talend Data Mapper : Simplifiez-vous l'intégration de SAP !
Talend Data Mapper : Simplifiez-vous l'intégration de SAP !Talend Data Mapper : Simplifiez-vous l'intégration de SAP !
Talend Data Mapper : Simplifiez-vous l'intégration de SAP !Synaltic Group
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrChristos Manios
 
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellApache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellKoji Kawamura
 

En vedette (20)

Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015
 
Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015
 
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data Governance
 
Apache Atlasの現状とデータガバナンス事例 #hadoopreading
Apache Atlasの現状とデータガバナンス事例 #hadoopreadingApache Atlasの現状とデータガバナンス事例 #hadoopreading
Apache Atlasの現状とデータガバナンス事例 #hadoopreading
 
Webinaire Synaltic x Trifacta 27/10/2016
Webinaire Synaltic x Trifacta 27/10/2016Webinaire Synaltic x Trifacta 27/10/2016
Webinaire Synaltic x Trifacta 27/10/2016
 
De l'idée à l'article, créer une viz en quelques étapes !
De l'idée à l'article, créer une viz en quelques étapes !De l'idée à l'article, créer une viz en quelques étapes !
De l'idée à l'article, créer une viz en quelques étapes !
 
Apache Falcon at Hadoop Summit 2013
Apache Falcon at Hadoop Summit 2013Apache Falcon at Hadoop Summit 2013
Apache Falcon at Hadoop Summit 2013
 
Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014
 
CDAP, la boîte à outil pour concevoir vos applications Big Data
CDAP,  la boîte à outil pour concevoir vos applications Big DataCDAP,  la boîte à outil pour concevoir vos applications Big Data
CDAP, la boîte à outil pour concevoir vos applications Big Data
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
[G6]hadoop이중화왜하는거지
[G6]hadoop이중화왜하는거지[G6]hadoop이중화왜하는거지
[G6]hadoop이중화왜하는거지
 
2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa
2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa
2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa
 
Big Data Paris
Big Data ParisBig Data Paris
Big Data Paris
 
Talend Data Mapper : Simplifiez-vous l'intégration de SAP !
Talend Data Mapper : Simplifiez-vous l'intégration de SAP !Talend Data Mapper : Simplifiez-vous l'intégration de SAP !
Talend Data Mapper : Simplifiez-vous l'intégration de SAP !
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Intro to Apache Solr
Intro to Apache SolrIntro to Apache Solr
Intro to Apache Solr
 
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellApache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in Nutshell
 

Similaire à Manage tracability with Apache Atlas, a flexible metadata repository

VILT - Archiving and Decommissioning with OpenText InfoArchive
VILT - Archiving and Decommissioning with OpenText InfoArchiveVILT - Archiving and Decommissioning with OpenText InfoArchive
VILT - Archiving and Decommissioning with OpenText InfoArchiveVILT
 
Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...DataWorks Summit
 
How advanced analytics is impacting the banking sector
How advanced analytics is impacting the banking sectorHow advanced analytics is impacting the banking sector
How advanced analytics is impacting the banking sectorMichael Haddad
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarHortonworks
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Rittman Analytics
 
Atlas and ranger epam meetup
Atlas and ranger epam meetupAtlas and ranger epam meetup
Atlas and ranger epam meetupAlex Zeltov
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data DiscoveryHarald Erb
 
All data accessible to all my organization - Presentation at OW2con'19, June...
 All data accessible to all my organization - Presentation at OW2con'19, June... All data accessible to all my organization - Presentation at OW2con'19, June...
All data accessible to all my organization - Presentation at OW2con'19, June...OW2
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIDenodo
 
Data governance datalakes_multitenancy
Data governance datalakes_multitenancyData governance datalakes_multitenancy
Data governance datalakes_multitenancySathish K S
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationDatabricks
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderDataconomy Media
 
Data meets AI - AICUG - Santa Clara
Data meets AI  - AICUG - Santa ClaraData meets AI  - AICUG - Santa Clara
Data meets AI - AICUG - Santa ClaraSandesh Rao
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataScott Clinton
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseJeffrey T. Pollock
 
Agile enterprise analytics on aws
Agile enterprise analytics on awsAgile enterprise analytics on aws
Agile enterprise analytics on awsDon Gillis
 
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?DataWorks Summit
 
How to create a successful data archiving strategy for your Salesforce Org.
How to create a successful data archiving strategy for your Salesforce Org.How to create a successful data archiving strategy for your Salesforce Org.
How to create a successful data archiving strategy for your Salesforce Org.DataArchiva
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...DataWorks Summit/Hadoop Summit
 
Unleashing the power of apache atlas with apache - virtual dataconnector
Unleashing the power of apache atlas with apache  - virtual dataconnectorUnleashing the power of apache atlas with apache  - virtual dataconnector
Unleashing the power of apache atlas with apache - virtual dataconnectorNigel Jones
 

Similaire à Manage tracability with Apache Atlas, a flexible metadata repository (20)

VILT - Archiving and Decommissioning with OpenText InfoArchive
VILT - Archiving and Decommissioning with OpenText InfoArchiveVILT - Archiving and Decommissioning with OpenText InfoArchive
VILT - Archiving and Decommissioning with OpenText InfoArchive
 
Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...
 
How advanced analytics is impacting the banking sector
How advanced analytics is impacting the banking sectorHow advanced analytics is impacting the banking sector
How advanced analytics is impacting the banking sector
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
 
Atlas and ranger epam meetup
Atlas and ranger epam meetupAtlas and ranger epam meetup
Atlas and ranger epam meetup
 
Big Data Discovery
Big Data DiscoveryBig Data Discovery
Big Data Discovery
 
All data accessible to all my organization - Presentation at OW2con'19, June...
 All data accessible to all my organization - Presentation at OW2con'19, June... All data accessible to all my organization - Presentation at OW2con'19, June...
All data accessible to all my organization - Presentation at OW2con'19, June...
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 
Data governance datalakes_multitenancy
Data governance datalakes_multitenancyData governance datalakes_multitenancy
Data governance datalakes_multitenancy
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with Alation
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern Staender
 
Data meets AI - AICUG - Santa Clara
Data meets AI  - AICUG - Santa ClaraData meets AI  - AICUG - Santa Clara
Data meets AI - AICUG - Santa Clara
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your data
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San Jose
 
Agile enterprise analytics on aws
Agile enterprise analytics on awsAgile enterprise analytics on aws
Agile enterprise analytics on aws
 
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
 
How to create a successful data archiving strategy for your Salesforce Org.
How to create a successful data archiving strategy for your Salesforce Org.How to create a successful data archiving strategy for your Salesforce Org.
How to create a successful data archiving strategy for your Salesforce Org.
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
 
Unleashing the power of apache atlas with apache - virtual dataconnector
Unleashing the power of apache atlas with apache  - virtual dataconnectorUnleashing the power of apache atlas with apache  - virtual dataconnector
Unleashing the power of apache atlas with apache - virtual dataconnector
 

Manage traceability with Apache Atlas, a flexible metadata repository

  • 1. Copyright Synaltic 2015. Manage traceability with Apache Atlas, a flexible metadata repository. Charly Clairmont, Synaltic, @egwada, cclairmont@synaltic.fr, http://synaltic.fr
  • 2. Charly Clairmont: more than ten years of experience in IT, mainly in BI. Cofounder of Altic, now Synaltic. Cofounder of the Hadoop User Group France. Believes in open source to help enterprises create value. Helps open source projects become known via meetups and conferences.
  • 3. Synaltic: an integrator company mainly focused on data management. Founded in 2004, Synaltic is the merger of two companies, Synotis and Altic. 25 specialists in data management. A Swiss subsidiary based in Lausanne. Our values: commitment, expertise, loyalty. Diagram labels: R&D, Training, Support, Project, Expertise; Data Intelligence, Data Platform, Data Governance, Data Exchange.
  • 4. What about your data? Do you know where your data is? Do you know who is responsible for a specific dataset? Do you know which application or task modified this entity last Friday?
  • 5. Enterprise Data Governance: provide a common approach to data governance across all systems and data within the organization. It should be transparent, reproducible, auditable, and consistent.
  • 6. Enterprise Data Governance, in Hadoop: no specific way to address this requirement. Each project proposes its own way to handle data governance. No integration with existing enterprise frameworks for data governance.
  • 7. Apache Atlas: data classification, metadata exchange, centralized auditing, search & lineage, security & policy engine.
  • 8. Apache Atlas, Overview. Data Classification: taxonomy of business-oriented annotations; relationships between data sets and underlying elements, including source, target, and derivation processes; export of metadata to third-party systems. Centralized Auditing: security access information for every application and process; operational information for execution, steps, and activities. Search & Lineage (Browse): navigation paths to explore the data classification and audit information; text-based search to locate what is relevant; visualization of data set lineage. Security & Policy Engine: compliance policy at runtime based on data classification schemes; advanced definition of policies for preventing data derivation.
  • 9. Apache Atlas, Knowledge Store. Knowledge store categorized with an appropriate business-oriented taxonomy: data sets & objects; tables / columns; logical context; source, destination. Supports exchange of metadata between foundation components and third-party applications/governance tools. Tech: Titan with Apache HBase.
  • 10. Apache Atlas, Data Lifecycle Management: provenance, multi-cluster replication, data set retention/eviction, late data handling, automation. Tech: Apache Falcon.
  • 11. Apache Atlas, Audit Store. Historical repository for all governance events. Security: access grant & deny. Operational: data provenance & metrics. Indexed and searchable. Tech: YARN ATS, Apache HBase, Apache Hive, Solr, ElasticSearch (pluggable).
  • 12. Apache Atlas, Security: establish global security policies based on data classification.
  • 13. Apache Atlas, Policy Engine. Runtime rationalization of policy rules with respect to data asset combinations and time. Fully extensible: metadata-based rules, geo-based rules, time-based rules, column/attribute prohibitions. Preview: Hive row and column masking. Tech: Ranger.
  • 14. Apache Atlas, RESTful interface. Extensible enterprise classification of data assets, relationships, and policies, organized in a meaningful way and aligned to the business organization. Supports exploration via a user interface. Supports extensibility via API and CLI exposure.
  • 15. A use case: our process (diagram). Labels: Data source: Twitter; Import (collect from Twitter); HDFS: raw data; Repository; Hive: url; Hive: hash tags; Hive: users; Hive: tweets; Analyse (build social network); Hive: social network; Data Platform.
  • 16. A use case: search based on tables
  • 17. A use case: search based on services
  • 18. A use case: table metadata
  • 19. A use case: lineage
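
The lineage idea in the use case can be sketched without any Atlas dependency. The Python below is a minimal illustration, not the Atlas API: the data set names are adapted from the slide labels, while the edges between them, the `LINEAGE` dictionary, and the `upstream` and `downstream` helpers are assumptions made for the example (the slide diagram does not survive in the transcript, so the exact flow is a guess).

```python
from collections import deque

# Hypothetical lineage edges for the Twitter use case: each derived data set
# maps to the data sets it is assumed to be built from. This edge list is an
# illustration, not a transcription of the slide diagram.
LINEAGE = {
    "hdfs:raw_data": ["twitter"],
    "hive:tweets": ["hdfs:raw_data"],
    "hive:url": ["hive:tweets"],
    "hive:hash_tags": ["hive:tweets"],
    "hive:users": ["hive:tweets"],
    "hive:social_network": ["hive:users", "hive:tweets"],
}


def upstream(dataset, lineage=LINEAGE):
    """Return every data set that `dataset` depends on, in breadth-first order."""
    seen, order = set(), []
    queue = deque(lineage.get(dataset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(lineage.get(node, []))
    return order


def downstream(dataset, lineage=LINEAGE):
    """Return every data set derived, directly or indirectly, from `dataset`."""
    return sorted(d for d in lineage if dataset in upstream(d, lineage))


if __name__ == "__main__":
    print(upstream("hive:social_network"))
    print(downstream("hdfs:raw_data"))
```

Walking the graph upstream answers "where does this table come from?", and walking it downstream answers "what is affected if this source changes?", which are the two questions a lineage view is meant to address.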
