Data Modeling & Metadata for Graph Databases


Graph databases are seeing a spike in popularity as their value in leveraging large data sets for key areas such as fraud detection, marketing, and network optimization becomes increasingly apparent. With graph databases, it’s been said that ‘the data model and the metadata are the database’. What does this mean in a practical application, and how can this technology be optimized for maximum business value?


Data Modeling & Metadata for Graph Databases

  1. 1. 5-Minute Overview: Stardog Enterprise Knowledge Graph. stardog.com
  2. 2. Unconnected data is a liability
  3. 3. Enterprises need flexible, reusable data on demand, with less disruption and overhead
  4. 4. Knowledge Graph is the answer: flexible, reusable, accretive
  5. 5. Knowledge Graph = Knowledge Toolkit + Graph DB
  6. 6. What's a Knowledge Toolkit? • Virtual graphs build knowledge across silos • Business logic builds reusable, logical reasoning into the graph • Machine learning integrates statistical reasoning • Integrity constraint validation empowers data standards
  7. 7. Knowledge = Data plus Reasoning. Fact count: 4 explicit facts. [Diagram: Inferno, Rogue One, Tom Hanks, Felicity Jones, and Gareth Edwards, linked by three actor edges and one director edge]
  8. 8. Knowledge = Data plus Reasoning. Rules: actorOf inverseOf actor • directorOf inverseOf director • actorOf subPropertyOf workedOn • directorOf subPropertyOf workedOn • coworker propertyChain (workedOn [inverseOf workedOn]) • coworker subPropertyOf connectedTo • connectedTo a TransitiveProperty. [Diagram: the same movie graph with inferred actorOf, directorOf, workedOn, coworker, and connectedTo edges added] Fact count: 15 explicit/implicit facts. Business logic that better explains the domain.
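The reasoning step on this slide can be sketched as naive forward chaining over triples. This is a minimal illustration in plain Python, not how Stardog implements inference; the `infer` function and the rule tables are hypothetical names, and the property chain skips self-pairs as a simplifying assumption, so the number of derived facts need not match the slide's count.

```python
# Explicit facts, written as (subject, predicate, object).
explicit = {
    ("Inferno", "actor", "Tom Hanks"),
    ("Inferno", "actor", "Felicity Jones"),
    ("Rogue One", "actor", "Felicity Jones"),
    ("Rogue One", "director", "Gareth Edwards"),
}

# Rules from the slide, encoded as lookup tables (hypothetical encoding).
INVERSE_OF = {"actor": "actorOf", "director": "directorOf"}
SUB_PROPERTY_OF = {
    "actorOf": "workedOn",
    "directorOf": "workedOn",
    "coworker": "connectedTo",
}
TRANSITIVE = {"connectedTo"}


def infer(facts):
    """Apply every rule repeatedly until no new fact appears."""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            if p in INVERSE_OF:
                new.add((o, INVERSE_OF[p], s))
            if p in SUB_PROPERTY_OF:
                new.add((s, SUB_PROPERTY_OF[p], o))
        # Property chain: x workedOn m and y workedOn m => x coworker y.
        worked = [(s, o) for s, p, o in facts if p == "workedOn"]
        for x, movie_x in worked:
            for y, movie_y in worked:
                if movie_x == movie_y and x != y:  # assumption: skip self-pairs
                    new.add((x, "coworker", y))
        # Transitivity: a p b and b p c => a p c.
        for prop in TRANSITIVE:
            pairs = [(s, o) for s, p, o in facts if p == prop]
            for a, b in pairs:
                for c, d in pairs:
                    if b == c and a != d:
                        new.add((a, prop, d))
        if new <= facts:
            return facts
        facts |= new


closure = infer(explicit)
```

Tom Hanks and Gareth Edwards never worked on the same film, yet `("Tom Hanks", "connectedTo", "Gareth Edwards")` ends up in `closure` because both are coworkers of Felicity Jones and `connectedTo` is transitive.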
  9. 9. Knowledge graphs connect all data. Connecting all data changes everything.
  10. 10. Thank you. A.J. Cook, North American Sales, aj@stardog.com
  11. 11. Data Modeling & Metadata for Graph Databases. Donna Burbank, Global Data Strategy Ltd. Lessons in Data Modeling DATAVERSITY Series, July 27th, 2017
  12. 12. Global Data Strategy, Ltd. 2017. Donna Burbank. Donna is a recognised industry expert in information management with over 20 years of experience in data strategy, information management, data modeling, metadata management, and enterprise architecture. Her background is multi-faceted across consulting, product development, product management, brand strategy, marketing, and business leadership. She is currently the Managing Director at Global Data Strategy, Ltd., an international information management consulting company that specializes in the alignment of business drivers with data-centric technology. In past roles, she has served in key brand strategy and product management roles at CA Technologies and Embarcadero Technologies for several of the leading data management products in the market. As an active contributor to the data management community, she is a longtime DAMA International member, Past President and Advisor to the DAMA Rocky Mountain chapter, and was awarded the Excellence in Data Management Award from DAMA International in 2016. She was on the review committee for the Object Management Group’s (OMG) Information Management Metamodel (IMM) and the Business Process Modeling Notation (BPMN). Donna is also an analyst at the Boulder BI Brain Trust (BBBT), where she provides advice and gains insight on the latest BI and analytics software in the market. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co-authored two books, Data Modeling for the Business and Data Modeling Made Simple with ERwin Data Modeler, and is a regular contributor to industry publications. She can be reached at donna.burbank@globaldatastrategy.com. Donna is based in Boulder, Colorado, USA. Follow on Twitter @donnaburbank. Today’s hashtag: #LessonsDM
  13. 13. Lessons in Data Modeling Series: This Year’s Line Up • January 26th: How Data Modeling Fits Into an Overall Enterprise Architecture • February 23rd: Data Modeling and Business Intelligence • March: Conceptual Data Modeling – How to Get the Attention of Business Users • April: The Evolving Role of the Data Architect – What does it mean for your Career? • May: Data Modeling & Metadata Management • June: Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling • July: Data Modeling & Metadata for Graph Databases • August: Data Modeling & Data Integration • September: Data Modeling & MDM • October: Agile & Data Modeling – How Can They Work Together? • December: Data Modeling, Data Quality & Data Governance
  14. 14. Word from our Sponsor: Stardog Enterprise Knowledge Graph, www.stardog.com
  15. 15. Agenda: What we’ll cover today • What is a Graph Database • Use Cases for Graph Databases • Data Modeling & Metadata for Graph Databases
  16. 16. What is a Graph Database? • A graph database uses a set of nodes, edges, and properties to represent and store data. • With graph databases, the relationships between data points often matter more than the individual points themselves. In order to leverage those data relationships, your organization needs a database technology that stores • These relationships can help you discover new insights from your data.
  17. 17. Graph Database = Thing Relates to Thing
  18. 18. Graph Database = Thing Relates to Thing. [Diagram labels: Node / Vertex, Edge / Relationship] The more formal way of referring to “thing relates to thing” is “Nodes & Edges”, “Vertices & Relationships”, etc.
  19. 19. Graph Databases Mirror the Way We Think. Graph databases can be intuitive to many, since they mirror the way the human brain typically thinks: through association. [Thought bubbles: I should go visit Mary. I wonder how her brother John is doing? Is he still dating Stephanie? Do they still have that house at the lake? Riding their boats on the lake was great. Remember when John crashed the boat? Like my toy as a child. Squirrel! …In the mind, as in data, there are always random data points…]
  20. 20. “Traditional” Way of Looking at the World: Hierarchies • Carolus Linnaeus in 1735 established a hierarchy/taxonomy for organizing and identifying biological systems: Kingdom, Phylum, Class, Order, Family, Genus, Species.
  21. 21. “New” Way of Looking at the World: Emergence. “In philosophy, systems theory, science, and art, emergence is the way complex systems and patterns arise out of a multiplicity of relatively simple interactions.” (Wikipedia)
  22. 22. Graph Databases Combine Flexibility w/ Structure & Meaning • In many ways, graph databases provide the “best of both worlds”: the flexibility of the “New World” of discovery & “emergence” + the structure & meaning of the “Old World” through ontologies.
  23. 23. It’s All About Relationships • In graph databases, relationships are first-class constructs. • Rather ironically, relational databases lack relationships. • In relational databases, relationships are enforced through joins and constraints. • NoSQL (e.g. key-value) databases are also weak at supporting relationships. “A relational database isn’t about relationships, it’s about constraints.” – Karen Lopez. [Diagram: Customer, Is Owner Of, Account, i.e. <Customer> <Owner Of> <Account>]
  24. 24. Use Cases for Graph Databases
  25. 25. Social Networks. Who are the cool kids? i.e. people linked with Donna. [Diagram labels: Donna; Sad, Lonely Person who doesn’t like data]
  26. 26. X Degrees of Separation – “The Bacon Number” • What’s Audrey Hepburn’s “Bacon Number”, i.e. her degrees of separation from actor Kevin Bacon? • As always, metadata and data quality are important, i.e. which Audrey Hepburn? Courtesy of oracleofbacon.org
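Degrees of separation amount to a shortest-path search over a co-star graph. The sketch below uses breadth-first search in plain Python; the `degrees_of_separation` function and the placeholder names (`Actor A` and so on) are hypothetical and carry no real filmography data.

```python
from collections import deque


def degrees_of_separation(costars, start, target):
    """Return the fewest co-star hops from start to target, or None if unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for other in costars.get(person, ()):
            if other == target:
                return distance + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, distance + 1))
    return None


# Hypothetical co-star graph: each key maps to the people they appeared with.
costars = {
    "Actor A": {"Actor B"},
    "Actor B": {"Actor A", "Actor C"},
    "Actor C": {"Actor B"},
    "Actor D": set(),
}
```

With this data, `degrees_of_separation(costars, "Actor A", "Actor C")` is 2, and an actor with no shared credits yields `None`. The data-quality point from the slide shows up in the keys: two people with the same name would need distinct identifiers before the search is meaningful.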
  27. 27. Fraud Detection in Online Transactions • Online transactions typically have certain identifiers, e.g. user ID, IP address, geo location, tracking cookie, credit card number, etc. • Graph patterns can help detect fraud, e.g. • The more interconnections exist among identifiers, the greater the cause for concern. • Typically they would be 1:1. • Some variations may occur, e.g. one person with multiple credit cards, families using the same machine, etc. • Large and tightly-knit graphs are very strong indicators that fraud is taking place. • Triggers can be put into place so that these patterns are uncovered before they cause damage. [Diagram labels: IP1 nodes linked to credit cards CC1 through CC17; clusters marked “Family”, “Personal & Business Card”, and “Fraud?”]
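One way to picture the "large, tightly-knit graph" signal is to link the identifiers seen together in each transaction and flag unusually large connected groups. This is a rough sketch only: the function names, the sample identifiers, and the `max_size` threshold are assumptions for illustration, and a real trigger would weigh far more than group size.

```python
from collections import defaultdict


def connected_groups(transactions):
    """Group identifiers that are linked, directly or indirectly, by shared transactions.

    Each transaction is a non-empty tuple of identifiers (IP address, card number, ...).
    """
    graph = defaultdict(set)
    for identifiers in transactions:
        first = identifiers[0]
        graph[first]  # register the node even if it has no neighbours
        for other in identifiers[1:]:
            graph[first].add(other)
            graph[other].add(first)

    groups, seen = [], set()
    for node in list(graph):
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:  # depth-first walk of one connected component
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(graph[current] - group)
        seen |= group
        groups.append(group)
    return groups


def suspicious(transactions, max_size=4):
    """Return groups with more identifiers than max_size (an arbitrary example threshold)."""
    return [group for group in connected_groups(transactions) if len(group) > max_size]


# Hypothetical data: one IP address used with five cards, another with two.
transactions = [
    ("IP1", "CC1"), ("IP1", "CC2"), ("IP1", "CC3"), ("IP1", "CC4"), ("IP1", "CC5"),
    ("IP2", "CC6"), ("IP2", "CC7"),
]
flagged = suspicious(transactions)
```

Here only the `IP1` cluster is flagged; the `IP2` cluster stays below the threshold, much like the "family" or "personal and business card" cases on the slide.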
  28. 28. Recommendation Engines • Recommendation engines are familiar to most of us who do any online shopping. • These engines can be powered by a graph database, e.g. • Capture a customer’s browsing behavior and demographics • Combine those with their buying history to provide relevant recommendations
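A graph-style recommendation can be sketched by walking from a customer to the products they bought, on to other customers who bought the same products, and then to what those customers bought. The version below covers buying history only (browsing behavior and demographics are left out), and the `recommend` function and sample data are hypothetical.

```python
from collections import Counter


def recommend(purchases, customer, limit=3):
    """Suggest products bought by customers whose purchases overlap with this customer's."""
    owned = purchases.get(customer, set())
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)  # shared purchases act as the link strength
        if overlap == 0:
            continue
        for item in items - owned:
            scores[item] += overlap
    return [item for item, _ in scores.most_common(limit)]


# Hypothetical purchase history: customer -> set of products.
purchases = {
    "ann": {"tent", "stove"},
    "bob": {"tent", "stove", "lantern"},
    "cho": {"tent", "kayak"},
    "dee": {"novel"},
}
```

For `ann`, the lantern ranks first because `bob` shares two purchases with her, while the kayak comes from `cho`, who shares only one.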
  29. 29. Data Quality & Volume Matter • Recommendation engines are based on evaluating data sets. If those data sets are faulty or of poor quality, your results will be flawed, especially if the data sets are small.
  30. 30. Master Data Management (MDM) • Master Data Management (MDM) is the practice of identifying, cleansing, storing & governing the core data assets of the organization (e.g. customer, product, etc.) • There are many architectural approaches to MDM. Two are the following: • Centralized (commonly relational): core data is stored in a common schema in a centralized “hub” and used as a common reference for operational systems, DW, etc. • Virtualized/Registry (commonly graph): data remains in source systems and is referenced through a common MDM virtualization layer. BOTH require the same core foundation of data quality, parsing & matching, semantic meaning, data governance, etc. in order to be successful… and that’s usually the hardest stuff.
  31. 31. Graph Databases for Data Warehousing. When you have a hammer, everything looks like a nail, i.e. data warehouses serve a particular purpose for aggregating & summarizing data. Not ideal for graph databases.
  32. 32. Data Warehousing & Enterprise Knowledge Graph • Data Warehouse (relational & dimensional data model): “…Show me Total Sales by Region and by Customer each month in 2017” • Enterprise Knowledge Graph (graph data model): “…Who are my most influential customers (with the most connections)?”
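The "most connections" question is a degree count over the graph's edges. A minimal sketch, assuming customers are linked by undirected edges; the `most_connected` function and customer ids are hypothetical.

```python
from collections import Counter


def most_connected(edges, top=1):
    """Rank nodes by how many edges touch them (their degree)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(top)


# Hypothetical customer-to-customer connections.
edges = [("c1", "c2"), ("c1", "c3"), ("c1", "c4"), ("c2", "c3")]
```

`most_connected(edges)` returns `[("c1", 3)]`. Degree is only the simplest notion of influence; it is used here because the slide defines influence as "the most connections".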
  33. 33. Data Management & Ballroom Dancing. “First you dance with yourself, then with your partner, then you dance with the room.”
  34. 34. An Enterprise Knowledge Graph Provides a Holistic View of the Organization through Relationships. “First you dance with yourself, then with your partner, then you dance with the room.” • Data quality & semantics are important for core enterprise data assets. • But the true value is in the interrelationships between data assets. [Diagram: Customer Data. Name: Audrey Hepburn, DOB: May 4, 1929, Current Customer: No. Mother of: Name: Luca Dotti, DOB: February 8, 1970, Current Customer: Yes. Linked events: Purchased Yacht Insurance, Purchased Home Insurance, Filed a Claim]
  35. 35. Data Modeling & Metadata for Graph Databases
  36. 36. Data Modeling for Graph Databases • There are several dominant ways to model graph databases. Two popular ones include: • Resource Description Framework (RDF) Triples: made up of subject, predicate, object triples; sample query language: SPARQL; sample vendor: Stardog • Labeled Property Graph: made up of nodes, relationships, properties & labels; sample query language: Cypher; sample vendor: Neo4j • Both have a close affinity between logical & physical models, i.e. we already think in “thing relates to thing” • In the following slides, we’ll use the RDF example, since that is a W3C open standard.
  37. 37. Graph Query Languages • Unlike relational databases, where SQL is a general standard, there are a number of query language options available for graph databases: • SPARQL: a SQL-like declarative query language created by the W3C to query RDF (Resource Description Framework) graphs. • Cypher: also a declarative query language that resembles SQL. Created by Neo4j. • GraphQL: a query language for APIs. It isn’t specific to graph databases, but can be used for them. Developed by Facebook. • Gremlin: a graph traversal language developed for Apache TinkerPop™, an open source, vendor-agnostic graph computing framework distributed under the Apache 2.0 license. Again, we’ll use SPARQL in our examples since it’s a W3C standard.
  38. 38. Resource Description Framework (RDF) • The RDF (Resource Description Framework) model from the World Wide Web Consortium (W3C) provides a way to link resources on the web (people, places, things) using the concept of “triples”. • This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. [Diagram: RDF Triples: Subject, Predicate, Object]
  39. 39. RDF Triple Example. [Diagram: Cynthia (Subject), Is Owner Of (Predicate), Fido (Object)] <Cynthia> <Owner Of> <Fido> • Brackets indicate individual references in RDF. Note that these are defined by URIs in RDF, but have been simplified for this example.
  40. 40. RDF Triples. <Cynthia> <type> <Person> . <Fido> <type> <Dog> . <Cynthia> <hasName> “Cynthia Smith” . <Fido> <hasName> “Fido” . <Cynthia> <ownerOf> <Fido> . [Labels: Class (<Person>, <Dog>), Literal (“Cynthia Smith”, “Fido”), Instance (<Cynthia>, <Fido>)]
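The five triples above can be held in memory as plain (subject, predicate, object) tuples and filtered by any combination of positions. This is a teaching sketch in plain Python rather than an RDF library; the `match` helper is a hypothetical name, and the URIs are simplified to bare strings as on the slide.

```python
# The slide's triples as (subject, predicate, object) tuples.
triples = [
    ("Cynthia", "type", "Person"),
    ("Fido", "type", "Dog"),
    ("Cynthia", "hasName", "Cynthia Smith"),
    ("Fido", "hasName", "Fido"),
    ("Cynthia", "ownerOf", "Fido"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the given positions; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]
```

For example, `match(triples, p="ownerOf")` returns the single ownership triple, and `match(triples, s="Cynthia")` returns everything stated about Cynthia.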
  41. 41. RDF Triple Graphical Representation • RDF triples can be intuitively visualized graphically. [Diagram: <Cynthia> <type> <Person>; <Cynthia> <hasName> “Cynthia Smith”; <Cynthia> <ownerOf> <Fido>; <Fido> <type> <Dog>; <Fido> <hasName> “Fido”]
  42. 42. Logical Groupings. @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix example: <http://example.org/example#> . example:Cynthia rdf:type example:Person ; example:hasName "Cynthia Smith" ; example:ownerOf example:Fido . example:Fido rdf:type example:Dog ; example:hasName "Fido" . • A Person has a name • A Person can be an owner • A Dog has a name
  43. 43. Ontologies • An ontology is a data model of sorts to describe the “things” in RDF data. • Two ontology languages are: • OWL (W3C Web Ontology Language): a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things. • RDFS (RDF Schema): a general-purpose language for representing simple RDF vocabularies. It is considered a precursor to OWL. • For example, in RDFS: • People have Names • People can own kinds of things • Pets can be owned • A dog is a pet • Dogs can have names • OWL can be more expressive: • A Mother is the intersection of (Parent, Woman) • This Family ontology links with the Person ontology (meta-meta-metadata) • Etc.
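The "a dog is a pet" statement is a subclass relationship, and its practical effect is that anything typed as a Dog is also a Pet. A minimal sketch of that inference in plain Python, assuming a single-parent class table; `SUBCLASS_OF` and `types_of` are hypothetical names, not part of RDFS or any library.

```python
# Ontology fragment: child class -> parent class (single inheritance for simplicity).
SUBCLASS_OF = {"Dog": "Pet"}


def types_of(instance, triples, subclass_of=SUBCLASS_OF):
    """Return the asserted types of an instance plus every inherited superclass."""
    found = set()
    pending = [o for s, p, o in triples if s == instance and p == "type"]
    while pending:
        cls = pending.pop()
        if cls in found:
            continue
        found.add(cls)
        parent = subclass_of.get(cls)
        if parent is not None:
            pending.append(parent)
    return found


data = [("Fido", "type", "Dog"), ("Cynthia", "type", "Person")]
```

`types_of("Fido", data)` yields `{"Dog", "Pet"}` even though only the Dog type was stated, which is what lets a query about pets find Fido.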
  44. 44. Ontologies Help Define Queries. Ontology: People have Names • People can own kinds of things • Pets can be owned • A dog is a pet • Dogs can have names. Query: Show me all of the People who Own Dogs
  45. 45. Putting Ontologies & Queries Together • Define variables: SELECT ?name • Write out the graph using variables: WHERE { ?person type Person ; hasName ?name ; ownerOf ?pet . ?pet type Dog . } • Query across the graph -> RESULT “Cynthia Smith”
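The query above can be mimicked by matching each triple pattern in turn and carrying variable bindings forward. This sketch illustrates the idea behind a basic graph pattern; it is not a SPARQL engine, and `solve` and `is_var` are hypothetical helper names.

```python
triples = [
    ("Cynthia", "type", "Person"),
    ("Fido", "type", "Dog"),
    ("Cynthia", "hasName", "Cynthia Smith"),
    ("Fido", "hasName", "Fido"),
    ("Cynthia", "ownerOf", "Fido"),
]


def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def solve(patterns, triples, binding=None):
    """Return every variable binding that satisfies all patterns at once."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or reject if it is already bound to something else.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.extend(solve(rest, triples, candidate))
    return results


# The slide's WHERE clause, one pattern per triple.
query = [
    ("?person", "type", "Person"),
    ("?person", "hasName", "?name"),
    ("?person", "ownerOf", "?pet"),
    ("?pet", "type", "Dog"),
]
names = [b["?name"] for b in solve(query, triples)]
```

`names` comes out as `["Cynthia Smith"]`, matching the result on the slide: Cynthia is the only Person who owns something typed as a Dog.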
  46. 46. Summary • Graph databases provide powerful enterprise-wide association using simple constructs • “Thing Relates to Thing” • Relationships are first-class constructs • Enterprise use cases are best suited to those that focus on interrelationships between data points • Social Networks • Fraud Detection • Recommendation Engines • Enterprise Knowledge Graph • Data Modeling & Metadata are supported by simple constructs • Data structures through Triples: Subject, Predicate, Object • Semantics through Ontologies (e.g. OWL) • Queries through SPARQL and other methods
  47. 47. About Global Data Strategy, Ltd • Global Data Strategy is an international information management consulting company that specializes in the alignment of business drivers with data-centric technology. • Our passion is data, and helping organizations enrich their business opportunities through data and information. • Our core values center around providing solutions that are: • Business-Driven: We put the needs of your business first, before we look at any technology solution. • Clear & Relevant: We provide clear explanations using real-world examples. • Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s size, corporate culture, and geography. • High Quality & Technically Precise: We pride ourselves in excellence of execution, with years of technical expertise in the industry. Data-Driven Business Transformation: Business Strategy Aligned With Data Strategy. Visit www.globaldatastrategy.com for more information
  48. 48. Contact Info • Email: donna.burbank@globaldatastrategy.com • Twitter: @donnaburbank @GlobalDataStrat • Website: www.globaldatastrategy.com
  50. 50. Questions? Thoughts? Ideas?
