Real World Guide to Building Your Knowledge Graph

Speaker: Nir Avrahamov, Developer Relations, Neo4j

Abstract: Knowledge graphs are driving industry disruption and business transformation by bringing together previously disparate data, using connections for superior decision support, and adding context for more intelligent applications (including AI). In this session, we’ll walk through the fundamental elements of knowledge graphs including contextual relevancy, dynamic self-updating, understandability with intelligent metadata, and the combination of heterogeneous data.

Our use cases will cover the three main types of knowledge graphs (context-rich search, external insights sensing, and enterprise NLP), which build on each other. You’ll hear about real-world examples from organizations such as Refinitiv, a leading provider of financial information; the German Center for Diabetes Research; eBay; and NASA.

We’ll also cover how you can quickly and easily build analytical applications on top of your knowledge graph using Neo4j Solution Frameworks. Attend this session to see real-world knowledge graphs and walk away with practical approaches for building your knowledge graph and leveraging it for business applications.

Published in: Technology

Real World Guide to Building Your Knowledge Graph

  1. 1. Knowledge Graphs. Nir Avrahamov, Solutions Engineer, nir.avrahamov@neo4j.com, 10-01-2019
  2. 2. Index-Free Adjacency - What is it? • While any database can represent a graph, only a native graph database makes the graph structure explicit • In a graph database, each node (or vertex) stores a collection of pointers to its adjacent nodes • This means that as the database grows in size, the cost of each hop remains constant.
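The pointer-following idea on slide 2 can be sketched in a few lines of Python. This is only an illustration of the concept, not Neo4j's storage engine: the `Node` class, the `connect` method, and the `friends_of_friends` helper are hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, so cyclic references are safe
class Node:
    """A node that keeps direct references (pointers) to its neighbors."""
    name: str
    neighbors: list[Node] = field(default_factory=list)

    def connect(self, other: Node) -> None:
        # Store the pointer on both ends so either side can hop directly.
        self.neighbors.append(other)
        other.neighbors.append(self)


def friends_of_friends(start: Node) -> set[str]:
    """Two hops from `start`, following stored pointers only.

    No global index is consulted: each hop costs work proportional to the
    local neighbor list, regardless of how many nodes exist overall.
    """
    result = set()
    for friend in start.neighbors:
        for candidate in friend.neighbors:
            if candidate is not start and candidate not in start.neighbors:
                result.add(candidate.name)
    return result


alice, bob, carol, dave = (Node(n) for n in ("Alice", "Bob", "Carol", "Dave"))
alice.connect(bob)
bob.connect(carol)
carol.connect(dave)

print(friends_of_friends(alice))  # {'Carol'}
```

Adding unrelated nodes to this toy graph does not change the work done by `friends_of_friends(alice)`, which is the property the slide attributes to index-free adjacency.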
  3. 3. Neo4j: Why use Native Graph? Performance: • Index-free adjacency • Millions of hops per second. Real-time Transactional and Analytic Processing: • Operational workloads • Analytics workloads. Discovery and Visualization: • Interactive graph exploration • Graph representation of data. Agility: • Native property graph model • Dynamic schema. Developer Productivity: • Cypher - Declarative query language • Procedural language extensions • Worldwide developer community. Hardware Efficiency: • 10x less CPU with index-free adjacency • 10x less hardware than other platforms
  4. 4. The Knowledge Graph Problem. Organizations have difficulty maintaining their corporate memory for a variety of reasons: • Growth, which drives the need for new and continuous education • Digitalization / Digital Transformation initiatives to identify new markets • Turnover, where long-term knowledge is lost • Aging infrastructures and siloed information
  5. 5. Key Principles of a Knowledge Graph • Related entities are connected (contextually related). • Dynamically updating, not manual: uses intelligent labelling and ties in to the graph automatically. • Explainable: intelligent metadata helps traverse to find answers to specific problems, even when we don’t know exactly how to ask for them. • Usually contains heterogeneous data types: it combines and uncovers connections across silos of information.
  6. 6. Knowledge Graph vs. Knowledge Base: “Unlike a simple knowledge base with flat structures and static content, a knowledge graph acquires and integrates adjacent information using data relationships to derive new knowledge.”
  7. 7. [Diagram. Data sets: Purchases, Product Catalogue, Views, User Review, In-Store Purchase, Shopping Cart. Store types: relational DB, document store, wide column store, key-value store. Also shown: Connector, Apps and Systems, Real-Time Queries.]
  8. 8. Simple Enterprise Knowledge Graphs: Customer Graph, Product Graph, Supply Graph. [Diagram nodes: Customer, Address, Phone, Email, Email Address, Store, Street, Region, Product, Category X, Category Y]
  9. 9. Simple Enterprise Knowledge Graph: Customer Graph, Product Graph, Supply Graph. [Diagram nodes: Customer, Address, Phone, Email, Email Address, Store, Street, Region, Product, Category X, Category Y]
  10. 10. Unlock the Institutional Memory: Customer Graph, Product Graph, Supply Graph. [Diagram nodes: Customer, Address, Phone, Email, Email Address, Store, Street, Region, Product, Category X, Category Y] • Real-time product recommendations • Fraud Detection • Real-time supply chain management • Risk Management
  11. 11. Relationship-Driven Applications • Real-Time Recommendations • Dynamic Pricing • Artificial Intelligence & IoT applications • Fraud Detection • Network Management • Customer Engagement • Supply Chain Efficiency • Identity and Access Management
  12. 12. Strictly Confidential. Graph Algorithms in Neo4j (neo4j.com/docs/graph-algorithms/current/). Pathfinding & Search: • Parallel Breadth First Search • Parallel Depth First Search • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • Minimum Spanning Tree • A* Shortest Path • Yen’s K Shortest Path • K-Spanning Tree (MST) • Random Walk. Centrality / Importance: • Degree Centrality • Closeness Centrality • CC Variations: Harmonic, Dangalchev, Wasserman & Faust • Betweenness Centrality • Approximate Betweenness Centrality • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality. Community Detection: • Triangle Count • Clustering Coefficients • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity – 1 Step & Multi-Step • Balanced Triad (identification). Similarity: • Euclidean Distance • Cosine Similarity • Jaccard Similarity • Overlap Similarity • Pearson Similarity. Link Prediction: • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors
  13. 13. Algorithms - Pathfinding & Search • Parallel Breadth-First Search & Depth-First Search ○ Traverses a tree structure by exploring nearest neighbors first (BFS) or by going down each branch (DFS) • Single-Source Shortest Path ○ Calculates the shortest path between a node and all other nodes • All-Pairs Shortest Path ○ Calculates a shortest-path group containing all shortest paths between nodes • Minimum Weight Spanning Tree ○ Calculates the tree with the smallest total weight that connects all nodes. Use cases: analyzing network flow, least cost routing
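As a rough illustration of two of the pathfinding algorithms on slide 13, the sketch below runs a breadth-first traversal and a single-source shortest path (Dijkstra's algorithm) over a small weighted graph. The graph data and the function names `bfs_order` and `single_source_shortest_paths` are made up for this example; this is plain Python, not the Neo4j graph algorithms library.

```python
import heapq
from collections import deque

# Hypothetical weighted, directed graph: node -> {neighbor: edge weight}
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 5},
    "D": {},
}


def bfs_order(graph, start):
    """Visit nodes level by level: nearest neighbors first."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order


def single_source_shortest_paths(graph, source):
    """Dijkstra's algorithm: cheapest distance from `source` to every
    reachable node (assumes non-negative edge weights)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter route was already found
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


print(bfs_order(graph, "A"))                     # ['A', 'B', 'C', 'D']
print(single_source_shortest_paths(graph, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```

Note that the cheapest route to `B` goes through `C` (cost 3) instead of the direct edge (cost 4), which is the kind of result a least-cost-routing use case depends on.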
  14. 14. Algorithms - Centrality & Community Detection: Detecting Financial Fraud. Large financial institutions have existing pipelines to identify fraud via knowledge graphs, heuristics, and ML models. • Connected components to identify disjoint subgraphs sharing identifiers • PageRank to measure influence and transaction volumes • Louvain to identify communities that frequently interact • Jaccard to measure account similarity
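Two of the techniques named on slide 14 are simple enough to sketch directly: connected components over accounts that share identifiers (using union-find), and Jaccard similarity between two accounts' identifier sets. The account data and the function names below are invented for illustration; a real fraud pipeline would involve far more than this.

```python
# Hypothetical accounts and the identifiers (phone, email, address) they use.
accounts = {
    "acct1": {"phone:555-0100", "email:a@example.com"},
    "acct2": {"phone:555-0100", "email:b@example.com"},
    "acct3": {"email:b@example.com", "addr:1 Main St"},
    "acct4": {"phone:555-0199"},
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def connected_components(accounts):
    """Group accounts linked, directly or transitively, by a shared identifier."""
    parent = {name: name for name in accounts}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}  # identifier -> first account seen using it
    for name, identifiers in accounts.items():
        for ident in identifiers:
            if ident in first_owner:
                parent[find(name)] = find(first_owner[ident])
            else:
                first_owner[ident] = name

    groups = {}
    for name in accounts:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(group) for group in groups.values())


print(connected_components(accounts))
# [['acct1', 'acct2', 'acct3'], ['acct4']]
print(round(jaccard(accounts["acct1"], accounts["acct2"]), 3))  # 0.333
```

Here `acct1` and `acct3` share no identifier directly, yet they land in the same component through `acct2`, which is the kind of indirect link the slide refers to.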
  15. 15. Itau Unibanco, FINANCIAL SERVICES: Fraud Detection / Credit Monitoring. CE Customer since 2016 Q1; EE Customer since Q2 2017. Background • Brazil's largest bank, #38 on Forbes G2000 • $61B annual sales, 95K employees • Most valuable brand in Brazil • 28.9M credit card & 25.6M debit card accounts • High integrity, customer-centric values Business Problem • Data silos made assessing creditworthiness hard • High sensitivity to fraud activity • 73% of all transactions over internet and mobile • Needed real-time detection for 2,000 analysts • Scale to trillions of relationships Solution and Benefits • Credit monitoring and fraud detection application • 4.2M nodes & 4B relationships for 100 analysts • Grow to 93T relationships for 2,000 analysts by 2021 • Real-time visibility into money flow across multiple customers
  16. 16. Algorithms - Link Prediction: Mining Data for Drug Discovery. het.io - HetioNet: knowledge graph integrating 50+ years of biomedical data. Leveraged to predict new uses for drugs by using the graph topology to create features to predict new links
  17. 17. Algorithms - Link Prediction: Mining Data for Drug Discovery. het.io - HetioNet: knowledge graph integrating 50+ years of biomedical data. Leveraged to predict new uses for drugs by using the graph topology to create features to predict new links
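To make "using the graph topology to create features to predict new links" more concrete, here is a small sketch of three topology-based link prediction scores that also appear in the algorithm list on slide 12: Common Neighbors, Adamic Adar, and Preferential Attachment. The toy drug/gene/disease edges are invented, and this is not claimed to be the method HetioNet itself uses; it only shows how a candidate link can be scored from existing structure.

```python
import math
from collections import defaultdict

# Hypothetical undirected edges in a tiny biomedical graph.
edges = [
    ("DrugA", "Gene1"), ("DrugA", "Gene2"),
    ("DiseaseX", "Gene1"), ("DiseaseX", "Gene2"),
    ("DrugB", "Gene2"), ("DrugB", "Gene3"),
    ("DiseaseY", "Gene3"),
]

neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)


def common_neighbors(u, v):
    """Number of nodes adjacent to both u and v."""
    return len(neighbors[u] & neighbors[v])


def adamic_adar(u, v):
    """Sum of 1 / log(degree) over shared neighbors; rare shared
    neighbors count for more than highly connected ones."""
    return sum(1 / math.log(len(neighbors[z])) for z in neighbors[u] & neighbors[v])


def preferential_attachment(u, v):
    """Product of the two nodes' degrees."""
    return len(neighbors[u]) * len(neighbors[v])


# Score candidate (drug, disease) pairs that are not yet linked.
for drug, disease in [("DrugA", "DiseaseX"), ("DrugA", "DiseaseY")]:
    print(drug, disease,
          common_neighbors(drug, disease),
          round(adamic_adar(drug, disease), 3),
          preferential_attachment(drug, disease))
```

In this toy data, `DrugA` and `DiseaseX` share two genes and score well on all three measures, while `DrugA` and `DiseaseY` share none. Scores like these can then serve as input features for a downstream model.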
  18. 18. Generating Insights & Recommendations From Your Graph: INTELLIGENT RECOMMENDATIONS FRAMEWORK. [Diagram labels: Data Sources, Data Orchestration Layer, CLIENT, Admin Dashboard, Session Data, Feedback, Scored Recommendations, Graph Algorithms, AI / ML, Click Stream Data, RSS Feed, Org. Feed (Graph), Recommendation Engines, Discovery, Exclude, Boost, Diversity, User Segmentation, Item Similarity] • Strategic Data Modelling • Continuous Data Capture • Automated Tagging & Labelling (NLP) • Real-time Scoring Pipelines & Algos • Preserved Data Lineage • Relevant Alerting • Auto & Semi-auto deduplication/entity resolution • ML integration
  19. 19. Hybrid Scoring-Based Approach is More Contextual. Graph technology enables you to make recommendations that weight multiple methods: • Collaborative Filtering: based on user action history or product interaction • Content Filtering: based on user's profile or product attributes • Rules-Based Filtering: based on predefined rules and criteria • Business Strategy: based on promotions, margins, inventory
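One simple way to read "recommendations that weight multiple methods" is a weighted sum of per-method scores. The sketch below assumes each of the four methods on slide 19 has already produced a score between 0 and 1 for each candidate item; the weights, item names, and scores are all hypothetical, and the slide does not specify how the methods are actually combined.

```python
# Hypothetical weights for the four methods named on the slide (sum to 1.0).
WEIGHTS = {
    "collaborative": 0.4,  # user action history / product interaction
    "content": 0.3,        # user profile / product attributes
    "rules": 0.2,          # predefined rules and criteria
    "business": 0.1,       # promotions, margins, inventory
}


def hybrid_score(scores, weights=WEIGHTS):
    """Weighted sum of per-method scores; a missing method counts as 0."""
    return sum(weight * scores.get(method, 0.0) for method, weight in weights.items())


# Made-up per-method scores for two candidate items.
candidates = {
    "backpack": {"collaborative": 0.9, "content": 0.6, "rules": 1.0, "business": 0.2},
    "handbag": {"collaborative": 0.5, "content": 0.8, "rules": 1.0, "business": 0.9},
}

ranked = sorted(candidates, key=lambda item: hybrid_score(candidates[item]), reverse=True)
for item in ranked:
    print(item, round(hybrid_score(candidates[item]), 2))
# backpack 0.76
# handbag 0.73
```

Changing the weights shifts the ranking without touching the individual scoring methods: raising `business` to 0.3 and lowering `collaborative` to 0.2, for example, puts `handbag` first.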
  20. 20. Query-Based Knowledge Graphs: Connecting the Dots. Multiple graph layers of financial information. Includes corporate data with cross-relationships, external news, and customized weighting. Dashboards and tools: • Credit risk • Investment risk • Portfolio news recommendations. has become...
  21. 21. Demo
  22. 22. eBay Conversational Commerce, ONLINE RETAIL: Knowledge Graph powers Real-Time Recommendations. EE Customer since 2016 Q3. Background • Personal shopping assistant • Converses with buyer via text, picture and voice to provide real-time recommendations • Combines AI and natural language understanding (NLU) in Neo4j Knowledge Graph • First of many apps in eBay's AI Platform Business Problem • Improve personal context in online shopping • Transform buyer-provided context into ideal purchase recommendations over social platforms • "Feels like talking to a friend" Solution and Benefits • 3 developers, 8M nodes, 20M relationships • Needed high-performance traversals to respond to live customer requests • Easy to train new algorithms and grow model • Generating revenue since launch
  23. 23. Case Study: Knowledge Graphs at eBay
  24. 24. Case Study: Knowledge Graphs at eBay
  25. 25. Case Study: Knowledge Graphs & Conversational Commerce online retail - eBay
  26. 26. Case Study: Knowledge Graphs & Conversational Commerce online retail - eBay
  27. 27. Case Study: Knowledge Graphs & Conversational Commerce online retail - eBay
  28. 28. Case Study: Knowledge Graphs & Conversational Commerce online retail - eBay
  29. 29. Case Study: Knowledge Graphs & Conversational Commerce online retail - eBay
  30. 30. Bags Case Study: Knowledge Graphs & Conversational Commerce online retail - eBay
  31. 31. Men’s Backpack Handbag Case Study: Knowledge Graphs & Conversational Commerce online retail - eBay
  32. 32. Case Studies Neo4j Case Studies
  33. 33. UBS, FINANCIAL SERVICES: Master Data Management / Knowledge Graph. CE Customer since 2016 Q1; EE Customer since 2015. Background • Large global bank • Deploying Reference Data to users and systems • 12 data domains, 18 datasets, 400+ integrations • Complex data management infrastructure Business Problem • Master data silos were inflexible and hard to consume • Needed simplification to reduce redundancy • Reduce risk when data is in consumers’ hands • Dramatically improve efficiency Solution and Benefits • Data distribution flows improved dramatically • Knowledge Base improves consumer access • Ad-hoc analytics improved • Governance, lineage and trust improved • Better service level from IT to data consumers
  34. 34. Airbnb Dataportal, TRAVEL TECHNOLOGY: Knowledge Graph, Metadata Management. CE users since 2017. Background • SF-based C2C rental platform • Dataportal democratizes data access for growing number of employees while improving discoverability and trust • Data strewn everywhere: in silos, in segmented departments, nothing was universally accessible Business Problem • Data-driven culture hampered by variety and dependability of data, tribal knowledge and word-of-mouth distribution • Needed visibility into information usage, context, lineage and popularity across company of 3,000+ Solution and Benefits • Offers search with context & metadata, user & team-centric pages for origin & lineage • Nodes are resources: data tables, dashboards, reports, users, teams, business outcomes, etc. • Relationships reflect consumption, production, association, etc. • Neo4j, Elasticsearch, Python
  35. 35. Novartis, PHARMACEUTICAL RESEARCH: Content Management / Biomedical Research. CE Customer since 2016 Q1; CE Customer since 2012. Background • 5-year-long drug discovery research • Parse & navigate over 25 million scientific papers • Sourced from the National Library of Medicine and tagged with “Medical Subject Headings” (MeSH tags) Business Problem • Seeking to automate phenotype, compound and protein cell behavior research by using previously documented research more effectively • Text mining for research elements like DNA strings, proteins, RNA, chemicals and diseases Solution and Benefits • Found ways to identify compound interaction behavior from millions of research documents • Relations between biological entities can be identified and validated by biology experts • Still very challenging to keep up-to-date, add genomics data, and find a breakthrough
  36. 36. Columbia University, EDUCATION: Investigative Journalism / Fraud Detection. CE Customer since 2016 Q1; EE Customer since 2015 Q4. Background • How Neo4j is used in investigations • Non-technical reporters manually gather data • “Low-tech” data curation • Journalists want to model data as a story, not as data Business Problem • Identify repeated business relationships among individuals and their holdings and accounts • Scan documents and identify possible entities, then create relationships between people and documents • Names and alias variances Solution and Benefits • Uses Neo4j in “story discovery” phase • Uncovers shortest paths for leads for reporters • Many investigations underway now
  37. 37. TELIA ZONE, TELECOMMUNICATIONS: Smart Home / Internet of Things. EE Customer since 2016 Q4. Background • Large Nordic telecom provider • 1M broadband routers deployed in Sweden • Half of subscribers are over 55 years old • Each household connects 10 devices • Goal to improve customer experience Business Problem • Broadband router enhancement to improve customer experience • Context-based in-home services • How to build a smart home platform that allows vendors to build new “home-centric” apps Solution and Benefits • New features deployed to 1M homes • API-based platform for easy apps that: ○ Automatically assemble Spotify playlists based on who is in the house ○ Notify parents when children get home ○ Build smart shopping lists
  38. 38. Scripps Networks, MEDIA AND ENTERTAINMENT: Knowledge Graph / Asset Management. Background • Nashville-based developer of lifestyle-oriented content for TV, digital, mobile and publishing • Web properties generate tens of millions of unique visitors per month Business Problem • Needed new asset management backbone to handle scheduling, ads, sales and pushing linear streams to satellites • Novell LDAP content hierarchy not flexible enough to store graph-based business content Solution and Benefits • Neo4j selected for performance and domain fit • Flexible, native storage of content hierarchy • Graph includes metadata used by all systems: TV series --> Episodes --> Blocks with Tags --> Linked Content, tagged with legal rights, actors, dubbing, et al.
  39. 39. Pitney Bowes, MARKETING COMMUNICATIONS: Master Data Management. Background • Connecticut-based leader in digital marketing communications • Helps clients provide omni-channel experience with in-context information Business Problem • Needed to reimagine existing system to beat competition and provide 360-degree view of customers • Channel complexity necessitated move to graph database • Needed an enterprise-ready solution Solution and Benefits • Leapfrogged competition and increased digital business by 23% • Handles new data from mobile, social networks, experience and governance sources • After launch of new Neo4j MDM, Pitney Bowes stock declared a Buy
  40. 40. University of Washington, EDUCATION & RESEARCH: Metadata Management, IT & Network Operations. CE Customer since 2016 Q1. Background • Large public university (“U-Dub”) • IT staff for 80K+ students and employees • Transforming IT systems from mainframe to cloud • Providing IT & data warehousing services to 3 campuses, 6 hospitals, and 6,300 EDW users Business Problem • Old SharePoint metadata was too complicated for users, not flexible and not transparent • $1B project to migrate HR system from mainframe to Workday needed to be smooth • Future projects needed repeatable predictability • Needed new glossary, impact analysis, analytics Solution and Benefits • Consulted with NDU peers, built simple model • Built Visualizer with Elasticsearch, Neo4j & D3.js • Improved predictability, lineage, and impact understanding for over 6,300 users
  41. 41. MARRIOTT, TRAVEL & HOSPITALITY SERVICES: Pricing Recommendations Engine. EE Customer since 2014 Q2. Background • World's largest hospitality / hotel company • 7th largest web site on internet • 1.5M hotel rooms offered online by 2018 • Revenue Management System that allows property managers to update their pricing rates Business Problem • Provide the right room & price at the right time • Old rate program was inflexible and bogged down as they increased the pricing options per property per day • Lay the path to be an innovator in the future Solution and Benefits • 2016-era rate program embeds Neo4j as "cache" • Created a graph per hotel for 4,500 properties in 3 clusters • 1000% increase in volume over 4 years • 50% decrease in infrastructure costs • "Use Neo4j Support!"
  42. 42. Better Predictions with Graphs: Using the Data You Already Have • Current data science models ignore network structure • Graphs add highly predictive features to ML models, increasing accuracy • Otherwise unattainable predictions based on relationships. [Diagram: Machine Learning Pipeline]
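As a hedged illustration of what "graphs add highly predictive features to ML models" can mean in practice, the sketch below derives two structural features per node (degree and triangle count) from an edge list. These could then be appended to an existing feature table. The edge list and function names are hypothetical, and the slide does not say which features or pipeline are used.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical undirected edges, e.g. accounts that transacted with each other.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]


def build_adjacency(edges):
    """Map each node to the set of nodes it is directly connected to."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def graph_features(adjacency):
    """Per-node structural features a tabular ML model could consume."""
    features = {}
    for node, nbrs in adjacency.items():
        # A triangle exists for every pair of neighbors that are themselves linked.
        triangles = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adjacency[u])
        features[node] = {"degree": len(nbrs), "triangles": triangles}
    return features


features = graph_features(build_adjacency(edges))
for node in sorted(features):
    print(node, features[node])
# a {'degree': 2, 'triangles': 1}
# b {'degree': 2, 'triangles': 1}
# c {'degree': 3, 'triangles': 1}
# d {'degree': 1, 'triangles': 0}
```

A model trained only on per-row attributes has no way to see that `d` sits on the edge of the network while `a`, `b`, and `c` form a tightly connected group; features like these expose that structure.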
  43. 43. The Market Sees Strong Synergy between Graphs and Artificial Intelligence • AI research papers focused on graphs • New Book: 20K Downloads in first 2 weeks, ⅓ Net-new