
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditional EDW and Hadoop with Apache Atlas

Published in: Technology

  1. Implementing the Business Catalog in the Modern Enterprise: Bridging Traditional EDW and Hadoop with Apache Atlas
  2. © Hortonworks Inc. 2011 – 2016. All Rights Reserved. Disclaimer: This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback, and the overarching Apache Software Foundation community development process can all affect timing and final delivery. This document’s description of these features and technology directions does not represent a contractual commitment, promise, or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  3. Speakers: Andrew Ahn, Governance Director, Product Management
  4. Agenda
     • Atlas Overview
     • Near term roadmap
     • Business Catalog
     • Questions
  5. Apache Atlas Overview
  6. Vision: Enterprise Data Governance Across Platforms
     Diagram labels: Structured; Unstructured; Traditional RDBMS; MPP Appliances; Data Lake; Metadata; Projects 1, 3, 4, 5, 6.
     GOAL: Provide a common approach to data governance across all systems and data within the enterprise.
     • Transparent: Governance standards and protocols must be clearly defined and available to all
     • Reproducible: Recreate the relevant data landscape at a point in time
     • Auditable: All relevant events and assets must be traceable with appropriate historical lineage
     • Consistent: Compliance practices must be consistent
  7. Ready for Trusted Governance
     Diagram labels: Operations; Security; Governance; Storage; Machine Learning; Batch; Streaming; Interactive; Search; YARN: Data Operating System.
     • Data Management along the entire data lifecycle with integrated provenance and lineage capability
     • Modeling with Metadata enables comprehensive data lineage through a hybrid approach with enhanced tagging and attribute capabilities
     • Interoperable Solutions across the Hadoop ecosystem, through a common metadata store
  8. DGI* Community becomes Apache Atlas
     Timeline milestones: DGI group kickoff; prototype built; Apache Atlas incubation; HDP 2.3 Foundation GA release (dates shown on the slide: Dec 2014, Feb 2015, May 2015, July 2015).
     First kickoff to GA in 7 months. Faster and safer co-development driven by customer use cases (Global Financial Company).
     * DGI: Data Governance Initiative
  9. Apache Atlas: Metadata Services
     • Cross-component dataset lineage. Centralized location for all metadata inside HDP
     • Single interface point for metadata exchange with platforms outside of HDP
     • Business taxonomy based classification: conceptual, logical, and technical
     Diagram labels: Apache Atlas; Hive; Ranger; Falcon; Sqoop; Storm; Kafka; Spark; NiFi.
  10. Big Data Management Through Metadata
     • Management Scalability: Many traditional tools and patterns do not scale when applied to multi-tenant data lakes. Many enterprises have siloed data and metadata stores that collide in the data lake. This is compounded by the ability to have very large windows (years). Can traditional EDW tools manage 100 million entities effectively with room to grow?
     • Metadata Tools: Scalable, decoupled, decentralized management driven through metadata is the only viable solution. This allows quick integration with automation and other metamodels.
     • Tags for Management, Discovery and Security: Proper metadata is the foundation for business taxonomy, stewardship, attribute-based security, and self-service.
  11. Apache Atlas High Level Architecture
     Diagram labels: Type System; Repository; Search DSL; Bridge; Hive; Storm; Falcon; Others; REST API; Graph DB; Search; Kafka; Sqoop; Connectors; Messaging Framework.
  12. Technical and Logical Metadata Exchange
     Diagram labels: Knowledge Store; Atlas REST API; Structured; Unstructured; Files: XML / JSON; 3rd Party Vendors; Custom Reporter; Non-Hadoop.
  13. Near Term Roadmap: Summer 2016
  14. Expanded Native Connector: Dataset Lineage
     Diagram labels: Sqoop; Teradata Connector; Apache Kafka; Custom Activity Reporter; Metadata Repository; RDBMS.
  15. Dynamic Access Policy Driven by Metadata
  16. Business Taxonomy UX Prototype
  17. User Interviews: We conduct open-ended user interviews so that we can learn more about who our users are and what their needs are. This helps us validate whether or not we’re solving the right problem.
  18. Usability Testing: We test our prototype in InVision, a click-through prototyping tool that allows users to interact with static mockups.
  19. Synthesis + Analysis: After conducting interviews and usability testing, we spend some time analyzing our findings and pulling out themes and insights.
  20. Usability Findings
     • Understood the hierarchy and how to search for data
     • Would generally search by file name or specific keyword
     • Would use tags for the purpose of searching
     • Would want to preview a subset of the data before analyzing the whole data set
     • Interested in the size of the data set
     • Concerned with how current and updated the information is
     • Would like the ability to contact a steward for more information regarding the data set
     • Would use an advanced boolean search if it were available
     • Viewing the popularity and access frequency would provide confidence
     • Would like to provide and view fellow users’ input
  21. Persona Findings
     • Data Scientists typically have backgrounds in Mathematics, Computer Science and Statistics
     • Responsible for analyzing and transforming data into more useful structures
     • Responsible for correcting missing values, typos and parsing issues
     • Typically fluent with SQL, Python and Hadoop tools
     • Require time upfront to understand and discover new data sets
     • Spend a significant amount of time reaching out to others with questions about data sets
     • Interact with Subject Matter Experts and Solution Architects
     • Noted that compliance is a big interest for enterprises and government
     • Felt Hadoop doesn’t support security and compliance very well
     • Find it difficult to see who is doing what in Hadoop
  22. Principal Roles
     • Data Steward: Curator, responsible for catalog veracity
     • Data Scientist: Analyst, primary consumer of Business Catalog
     • Administrator: Role management only
     • Data Engineer: Data ingress and egress, semantic data quality
  23. UX Prototype: Taxonomy Navigation. Callouts: breadcrumbs for taxonomy context path; contents at taxonomy context.
  24. Taxonomy Creation. Callout: in-place taxonomy management.
  25. Taxonomy Classification of Assets. Callout: create new object on the fly.
  26. Object Details. Callout: annotation for policies and rules.
  27. Object Lineage. Callouts: dataset lineage across components; assign tags to assets.
  28. User Comments. Callout: user comments for collaboration.
  29. Classify and Tag Assets. Callouts: keyword, DSL, and faceted search; define authoritative tags for the whole taxonomy.
  30. Business Taxonomy UX Prototype
     • Hierarchical Taxonomy Creation
     • Agile modeling: Model Conceptual, Logical, Physical assets
     • Authorization: Steward / Analytic Roles
     • Tag management: Definition and assignment
     • DQ tab for profiling and sampling
     • User Comments
     What other information would you like to see included?
  31. Availability
     • Tech Preview VMs: May 2016
     • GA Release: Summer 2016
  32. Questions?
  33. Reference
  34. Online Resources
     • VM: https://s3.amazonaws.com/demo-drops.hortonworks.com/HDP-Atlas-Ranger-TP.ova (download the Public Preview VM)
     • Tutorial: https://github.com/hortonworks/tutorials/tree/atlas-ranger-tp/tutorials/hortonworks/atlas-ranger-preview
     • Blog: http://hwxjojo.wpengine.com/blog/the-next-generation-of-hadoop-based-security-data-governance/ (this link is currently returning an error)
     • Learn More: http://hortonworks.com/solutions/atlas-ranger-integration/
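
The taxonomy slides (navigation with a breadcrumb context path, in-place taxonomy creation, and classification of assets) describe a hierarchy of business terms with assets attached to them. The following is a minimal Python sketch of that idea only. It does not use the Atlas type system or REST API; the `TaxonomyNode` class, its methods, and the asset names are all hypothetical and exist purely for illustration.

```python
class TaxonomyNode:
    """One term in a hypothetical business taxonomy (illustrative, not an Atlas type)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}   # child term name -> TaxonomyNode
        self.assets = set()  # asset identifiers classified directly under this term

    def add_child(self, name):
        """Create a sub-term in place and return it."""
        child = TaxonomyNode(name, parent=self)
        self.children[name] = child
        return child

    def breadcrumbs(self):
        """Return the context path from the root term down to this term."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " > ".join(reversed(parts))

    def classify(self, asset):
        """Attach an asset identifier to this term."""
        self.assets.add(asset)

    def assets_under(self):
        """Return assets classified at this term or at any descendant term."""
        found = set(self.assets)
        for child in self.children.values():
            found |= child.assets_under()
        return found


# Hypothetical taxonomy and asset names, chosen only for the example.
root = TaxonomyNode("Enterprise")
finance = root.add_child("Finance")
risk = finance.add_child("Risk")
risk.classify("hive.default.loan_exposure")
finance.classify("hive.default.ledger")

print(risk.breadcrumbs())
print(sorted(finance.assets_under()))
```

Keeping a parent pointer on each node makes the breadcrumb path a simple walk to the root, while the child map supports browsing the contents at the current taxonomy context.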

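
Several slides refer to dataset lineage across components (for example, data moving from an RDBMS through Sqoop into Hive). One way to picture this is as a directed graph whose edges record which process produced a dataset from which source. The sketch below is an assumption-laden illustration in Python: `LineageGraph`, its methods, and the dataset and process names are hypothetical and are not Atlas APIs or identifiers taken from the slides.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical in-memory lineage store: edges run from source to target dataset."""

    def __init__(self):
        self._upstream = defaultdict(set)    # target -> {(source, process)}
        self._downstream = defaultdict(set)  # source -> {(target, process)}

    def record(self, source, target, process):
        """Record that `process` produced `target` from `source`."""
        self._upstream[target].add((source, process))
        self._downstream[source].add((target, process))

    def _walk(self, edges, start):
        """Breadth-first traversal returning every dataset reachable from `start`."""
        seen = set()
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour, _process in edges.get(current, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return seen

    def upstream(self, dataset):
        """All datasets that `dataset` was derived from, directly or indirectly."""
        return self._walk(self._upstream, dataset)

    def downstream(self, dataset):
        """All datasets derived from `dataset`, directly or indirectly."""
        return self._walk(self._downstream, dataset)


# Hypothetical datasets and processes spanning several components.
graph = LineageGraph()
graph.record("teradata.sales.orders", "hdfs:/raw/orders", "sqoop-import")
graph.record("hdfs:/raw/orders", "hive.default.orders_clean", "hive-etl")
graph.record("hive.default.orders_clean", "hive.default.orders_summary", "hive-aggregate")

print(sorted(graph.upstream("hive.default.orders_summary")))
print(sorted(graph.downstream("teradata.sales.orders")))
```

Storing both directions of each edge lets the same traversal answer "where did this come from?" and "what is affected if this changes?" without scanning the whole graph.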