Neo4j GraphTalks - Introduction to Graph Databases

Neo4j GraphTalks Zürich
Bruno Ungermann - Neo4j

Published in: Technology

  1. GraphTalks Zürich: Welcome! July 2017, bruno.ungermann@neotechnology.com
  2. Neo4j GraphTalks
      • 09:00-09:30 Breakfast and networking
      • 09:30-10:00 Introduction to graph databases and Neo4j (Bruno Ungermann, Neo Technology)
      • 10:00-11:00 Visualizing big data sets from the pharmaceutical industry with graph databases (Dr. Steffen Tomschke, Team Lead and UX Consultant, B-S-S Business Software Solutions GmbH)
      • Open end (Dirk Möller, Alexander Erdl)
  3. Complexity
  4. The Internet (oT)
  5. Domain Model: Logistics Process
  6. Traditional Approach: Fixed Schema, Tables
  7. Graph Model: Nodes & Relationships
      (Diagram: a logistics graph connecting shipments, container loads, physical containers, a vessel, carriers, routes, towns and parcels through relationships such as USING_CARRIER. Example properties: Emission Class A, Fueling Type Gas B, Max Wgt 80, route lengths of 10,520 km and 823 km, towns Tokyo, Hong Kong and Hamburg, parcel weight 15.5 kg.)
  8. Intuitiveness
  9. Flexibility: A Naturally Adaptive Model vs. a Fixed Schema
  10. Speed
      • "We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10-100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible." - Volker Pacher, Senior Developer
      • "Minutes to milliseconds" performance
      • Queries up to 1000x faster than other tested database types
  11. Use the Right Database for the Right Job
      • Discrete data (minimally connected): other NoSQL databases, relational DBMS
      • Connected data (focused on data relationships): Neo4j graph DB, designed for data relationships
      • Development benefits: easy model maintenance, easy queries
      • Deployment benefits: ultra-high performance, minimal resource usage
  12. Neo4j: The Graph Database Leader
      Timeline, 2000 through 2017 and beyond. Milestones shown:
      • Invented the property graph model
      • First native graph DB in 24/7 production
      • Contributed the first graph DB to open source
      • Introduced the first and only declarative query language for the property graph
      • Extended the graph data model to the labeled property graph
      • Published the O'Reilly book on graph databases
      • GraphConnect, the first conference for graph DBs
      • First Global 2000 customer
      • 150-200+ customers, 50-60K+ monthly downloads, 500-600 graph DB events worldwide
      • 2016, 2017 and beyond: openCypher, industry partnerships, Neo4j 3.X, 250+ customers, 65K+ monthly downloads, partner focus
  14. 2012 to 2017: May 10th-11th, London. Conference + training
  15. Neo4j Leads the Graph Database Revolution
      • "Forrester estimates that over 25% of enterprises will be using graph databases by 2017"
      • "Neo4j is the current market leader in graph databases."
      • "Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture."
      Sources:
      • IT Market Clock for Database Management Systems, 2014 https://www.gartner.com/doc/2852717/it-market-clock-database-management
      • TechRadar™: Enterprise DBMS, Q1 2014 http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801
      • Graph Databases – and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates) http://blogs.enterprisemanagement.com/dennisdrogseth/2013/11/06/graph-databasesand-potential-transform-capture-interdependencies/
  16. Graph-Based Success
  17. Common Graph Use Cases
      • Real-Time Recommendations
      • Fraud Detection
      • Network & IT Operations
      • Knowledge Management
      • Graph-Based Search
      • Identity & Access Management
  18. Knowledge Management: Status Quo (Dr. Andreas Weber, semantic data management, 11.11.2016)
      (Diagram: separate silos for QA/LIMS, ERP, logistics, warehouse management, product management, technical PDM/PLM and document management, plus Excel and PowerPoint files)
  19. Graph-Based Knowledge Management (MDM, Enterprise Search...)
      (Diagram sources: logistics (RDBMS), CRM (RDBMS), mails (mail system), documents (file system), media library (file system), CMS (RDBMS), social (RDBMS), log files (RDBMS), e-commerce (RDBMS))
  20. Adidas: Shared Meta Data Service (Knowledge Management)
      Background
      • Global leader in the sporting goods industry: footwear, apparel, hardware; 14.5 bn in sales; 53,000 people
      • Multitude of products, markets, media, assets and audiences
      Business Problem
      • Beset by a wide array of information silos, including data about products, markets, social media, master data, digital assets, brand content and more
      • Needed to provide the most compelling and relevant content to consumers
      • Needed to offer enhanced recommendations to drive revenue
      Solution and Benefits
      • Save time and cost through standardized access to a content-sharing system for internal teams, partners and IT units: fast, reliable, searchable, avoiding redundancy
      • Improve customer experience and increase revenue by providing relevant content and recommendations
  21. Cisco: Master Data Management (Communications)
      Background
      • San Jose-based communications equipment giant, ranked #91 in the Global 2000 with $44B in annual sales
      • Needed a high-performance system that could provide master-data access services 24x7 to applications company-wide
      Business Problem
      • Sales compensation system didn't meet needs
      • Oracle RAC system had reached its limits
      • Inflexible handling of complex organizational hierarchies and mappings
      • "Real-time" queries ran for more than a minute
      • P1 system must have zero downtime
      Solution and Benefits
      • New Hierarchy Management Platform (HMP) manages master data, rules and access
      • Cut access times from minutes to milliseconds
      • Graphs provided flexibility for business rules
      • Expanded master-data services to include product hierarchies
  22. Die Bayerische Versicherung: Knowledge Management (Insurance)
      Background
      • Mid-size German insurer founded in 1858
      • Project executed by Delvin, a subsidiary of die Bayerische Versicherung and an IT insurance specialist
      Business Problem
      • Field sales needed easy, dynamic, 24/7 access to policies and customer data
      • Existing DB2 system unable to meet performance and scaling demands
      Solution and Benefits
      • Enabled flexible searching of policies and associated personal data
      • Raised the bar on industry practices
      • Delivered high performance and scalability
      • Ported existing metadata easily
  23. UBS: Master Data Management / Metadata (Financial Services)
      Background
      • Large global bank
      • Deploying reference data to users and systems
      • 12 data domains, 18 datasets, 400+ integrations
      • Complex data management infrastructure
      Business Problem
      • Master data silos were inflexible and hard to consume
      • Needed simplification to reduce redundancy
      • Reduce risk when data is in consumers' hands
      • Dramatically improve efficiency
      Solution and Benefits
      • Data distribution flows improved dramatically
      • Knowledge base improves consumer access
      • Ad-hoc analytics improved
      • Governance, lineage and trust improved
      • Better service level from IT to data consumers
      CE Customer since 2
  24. Novartis: Content Management / Biomedical Research (Pharmaceutical Research)
      Background
      • 5-year drug discovery research effort
      • Parse and navigate over 25 million scientific papers
      • Sourced from the National Library of Medicine and tagged with Medical Subject Headings (MeSH tags)
      Business Problem
      • Seeking to automate phenotype, compound and protein cell behavior research by using previously documented research more effectively
      • Text mining for research elements like DNA strings, proteins, RNA, chemicals and diseases
      Solution and Benefits
      • Found ways to identify compound interaction behavior from millions of research documents
      • Relations between biological entities can be identified and validated by biology experts
      • Still very challenging to keep up to date, add genomics data, and find a breakthrough
  25. Airbnb Dataportal: Knowledge Graph, Metadata Management (Travel Technology)
      Background
      • SF-based C2C rental platform
      • Dataportal democratizes data access for a growing number of employees while improving discoverability and trust
      • Data was strewn everywhere, in silos and segmented departments; nothing was universally accessible
      Business Problem
      • Data-driven culture hampered by the variety and dependability of data, tribal knowledge and word-of-mouth distribution
      • Needed visibility into information usage, context, lineage and popularity across a company of 3,000+
      Solution and Benefits
      • Offers search with context and metadata, plus user- and team-centric pages for origin and lineage
      • Nodes are resources: data tables, dashboards, reports, users, teams, business outcomes, etc.
      • Relationships reflect consumption, production, association, etc.
      • Stack: Neo4j, Elasticsearch, Python
  26. Recommendations (In Real-Time)
      (Diagram: the main product, related products, and "people who bought X also bought Y")
  28. (Diagram: a KitchenAid series linked to returns, purchase history, price range, home delivery, inventory, express goods, complaints, reviews, tweets, emails, category, promotions, bundling and location)
  29. Walmart: Real-Time Recommendations (Retail)
      Background
      • Founded in 1962 and based in Arkansas
      • 11,000+ stores in 27 countries, plus the walmart.com online store
      • 2M+ employees and $470 billion in annual revenues
      Business Problem
      • Optimize the walmart.com user experience
      • Connect complex buyer and product data to gain super-fast insight into customer needs and product trends
      • RDBMS couldn't handle complex queries
      Solution and Benefits
      • Replaced a complex batch process with real-time online recommendations
      • Built a simple, real-time recommendation system with low-latency queries
      • Serve better and faster recommendations by combining historical and session data
  30. Accenture: Real-Time Routing Recommendations (Logistics)
      Background
      • One of the world's largest logistics carriers
      • Projected to outgrow the capacity of the old system
      • New parcel routing system: single source of truth for the entire network; B2C and B2B parcel tracking; real-time routing of up to 7M parcels per day
      Business Problem
      • Needed 365x24x7 availability
      • Peak loads of 3,000+ parcels per second
      • Complex and diverse software stack
      • Needed predictable performance and linear scalability
      • Daily changes to the logistics network: route from any point to any point
      Solution and Benefits
      • Ideal domain fit: a logistics network is a graph
      • Extreme availability and performance via clustering
      • Greatly simplified routing queries vs. relational
      • Flexible data model reflects real-world data variance much better than relational
      • Whiteboard-friendly model that is easy to understand
  31. Marriott: Real-Time Recommendations (Hospitality)
      Background
      • World's largest hospitality / hotel company
      • 1.5M hotel rooms offered online by 2018
      • 15 bn in eCommerce sales (2015); #7 in the IDC rating for internet sales
      Business Problem
      • Provide the right room and price at the right time
      • Extremely complex individual pricing calculations
      • Moved from per-month to per-day calculation
      • Former system too slow and too inflexible
      Solution and Benefits
      • Huge performance increase through replacement of the legacy system
      • A 4-core laptop at 6% CPU usage delivers better performance than a 3-server, 96-core configuration at 80% CPU usage: "mind-blowing"
      • 50% decrease in infrastructure costs
      • Overcame internal hurdles by using an embedded, application-internal cache instead of a new database system
  32. Cisco: Real-Time Recommendations (Communications)
      Background
      • San Jose-based communications equipment giant, ranked #91 in the Global 2000 with $44B in annual sales
      • Needed real-time recommendations to encourage knowledge base use on the company's support portal
      Business Problem
      • Reduce call-center volumes and costs via improved online self-service quality
      • Leverage large amounts of knowledge stored in service cases, solutions, articles, forums, etc.
      • Reduce resolution times and support costs
      Solution and Benefits
      • Faster problem resolution for customers and decreased reliance on support teams
      • Continuously scrape cases, solutions, articles, etc. for cross-reference links
      • Provide real-time reading recommendations
      • Uses a Neo4j Enterprise HA cluster
      (Diagram: support cases, a solution, a message and knowledge base articles connected as a graph)
  33. (Diagram: a network of gateways, routers, mesh routers, access points, a base station, CPUs and mobile devices)
  34. SFR: Network and IT Operations (Communications)
      Background
      • Second-largest communications company in France
      • Based in Paris, part of the Vivendi Group, partnering with Vodafone
      Business Problem
      • Infrastructure maintenance took a week to plan due to the need to model network impacts
      • Needed what-if analysis to model unplanned outages
      • Identify network weaknesses to uncover the need for additional redundancy
      • Information lived on 30+ systems, with daily changes
      Solution and Benefits
      • Flexible inventory management supports modeling, aggregation and troubleshooting
      • Single source of truth for the entire network
      • New apps model the network via a near-1:1 mapping between graph and real world
      • Schema adapts to changing needs
      (Diagram: routers, switches, a service, fiber links and an ocean-floor cable connected by LINKED and DEPENDS_ON relationships)
  35. Large Investment Bank: Network and IT Operations (Financial Services)
      Background
      • One of the world's oldest and largest banks
      • 100+ year-old bank with more than 1,000 predecessor institutions
      • 500,000 employees and contractors
      • Needed to manage and visualize ~50,000 Unix servers in its network
      Business Problem
      • Original RDBMS solution could handle only 5,000 servers
      • Improve network performance company-wide
      • Leverage M&A legacy systems with no room for error
      Solution and Benefits
      • Store UNIX server and network configuration in Neo4j
      • Combine Splunk log data into an application that visualizes events on the network
      • Neo4j vastly improved app performance
      • New apps built much faster with Neo4j than with SQL
  36. Hewlett Packard: Network and IT Operations (Web/ISV, Communications)
      Background
      • World's largest provider of IT infrastructure, software and services
      • Unified Correlation Analyzer (UCA) helps communications operators manage large networks with carrier-class resource and service management, root cause and impact analysis
      Business Problem
      • Use network topology to identify the root causes of problems on the network
      • Simplify and speed up alarm handling by operators
      • Automate handling of certain types of alarms
      • Filter, group or eliminate redundant alarms via event correlation
      Solution and Benefits
      • Accelerated product development time
      • Extremely fast network-topology queries
      • Graph representation a perfect domain fit
      • 24x7 carrier-grade reliability with Neo4j High Availability clustering
      • Met objective in under six months
  37. Identity Access Management and Identity Relationship Management
      (Diagram: people (workforce in the thousands, partners and suppliers, customers in the millions), endpoints (PCs, tablets, phones, wearables, things in the tens of millions) and applications and data (on-premises, private cloud, public cloud))
  38. Telenor: Identity and Access Management (Communications)
      Background
      • Oslo-based telecom provider, #1 in the Nordic countries and #10 in the world
      • Online, mission-critical, self-serve system lets users manage subscriptions and plans
      • Availability and responsiveness are critical to customer satisfaction
      Business Problem
      • Logins took minutes to retrieve relational access rights
      • Massive joins across millions of plans, customers, admins and groups
      • Nightly batch production required 9 hours and produced stale data
      Solution and Benefits
      • Shifted authentication from Sybase to Neo4j
      • Moved the resource graph to Neo4j
      • Replaced the batch process with real-time login responses, measured in milliseconds, that deliver real-time data vs. yesterday's snapshot
      • Mitigated customer retention risks
      (Diagram: Account, Customer, User and Subscription nodes connected by SUBSCRIBED_BY, CONTROLLED_BY, PART_OF and USER_ACCESS relationships)
  39. UBS: Identity and Access Management (Financial Services)
      Background
      • Top investment bank with $1+ trillion in assets
      • Using a relational database and GemFire to manage employee permissions to research documents and application-service resources
      • Permissions for new investment managers and traders provisioned manually
      Business Problem
      • Lost an average of 5 days per new hire while they waited to be granted access to hundreds of resources, each with its own permissions
      • Replace an unsuccessful onboarding process implemented by a competitor
      • Regulations left no room for error
      Solution and Benefits
      • Store models, groups and entitlements in Neo4j
      • Exceeded performance requirements
      • Major productivity advantage due to domain fit
      • Graph visualization eases the permissioning process
      • Fewer compromises than with relational
      • Expanded the Neo4j solution to online brokerage
  40. Fraud Detection with Discrete Analysis
      (Chart: revolving debt vs. number of accounts, separating normal behavior from outliers marked INVESTIGATE)
  41. Fraud Detection with Connected Analysis
      (Chart: revolving debt vs. number of accounts, showing normal behavior and a fraudulent pattern)
  42. London and New York Financial: Fraud Detection (Financial Services)
      Background
      • Global financial services firm with trillions of dollars in assets
      • Varying compliance and governance considerations
      • Incredibly complex transaction systems, with ever-growing opportunities for fraud
      Business Problem
      • Needed to detect and prevent fraud in real time, especially in payments that fall within "normal" behavior metrics
      • Needed more accurate and faster credit risk analysis for payment transactions
      • Needed to dramatically reduce chargebacks
      Solution and Benefits
      • Lowered TCO by simplifying credit risk analysis and fraud detection processes
      • Identify entities and connections uniquely
      • Saved billions by reducing chargebacks and fraud
      • Enabled building real-time apps with non-uniform data and no sparse tables or schema changes
  43. Panama Papers: Fraud Detection
      Background
      • Panama-based law firm Mossack Fonseca (M&F) ran a business hosting "letterbox companies"
      • Suspected of supporting tax avoidance and organized crime
      • Altogether: 2.6 TB of data, 11 million files, 214,000 letterbox companies
      Business Problem
      • Goal: unravel chains of Bank - Person - Client - Address - Intermediaries - M&F
      • Earlier cases: spreadsheet-based analysis (back and forth) and pencil to extract such connections
      • This case: the sheer amount of data and arbitrary chain lengths doomed such approaches to fail
      Solution and Benefits
      • 400 journalists investigating, updating and sharing, with 2 people with an IT background
      • Identify connections quickly and easily
      • Fast results would not have been possible without a graph database
  44. How to Start?
  45. Bootcamp
  46. GraphGists
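Slide 7 models the logistics process as labeled nodes joined by typed relationships rather than as rows in fixed tables. Below is a minimal in-memory sketch of that labeled property graph idea in plain Python. It is not Neo4j's API. The class names, the relationship types other than USING_CARRIER (PART_OF, DELIVERED_TO), and the way the nodes are wired together are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, so nodes can be compared with `is`
class Node:
    label: str
    props: dict = field(default_factory=dict)


@dataclass(eq=False)
class Rel:
    start: Node
    type: str
    end: Node
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy labeled property graph: labeled nodes plus typed, directed relationships,
    both carrying key/value properties. Illustrative only, not Neo4j."""

    def __init__(self):
        self.nodes = []
        self.rels = []

    def add_node(self, label, **props):
        node = Node(label, props)
        self.nodes.append(node)
        return node

    def relate(self, start, rel_type, end, **props):
        rel = Rel(start, rel_type, end, props)
        self.rels.append(rel)
        return rel

    def out(self, node, rel_type):
        """One-hop traversal: nodes reached from `node` via outgoing `rel_type`."""
        return [r.end for r in self.rels if r.start is node and r.type == rel_type]


# Property values come from slide 7; the wiring is hypothetical.
g = PropertyGraph()
parcel = g.add_node("Parcel", weight_kg=15.5)
load = g.add_node("ContainerLoad")
carrier = g.add_node("Carrier", emission_class="A")
hamburg = g.add_node("Town", name="Hamburg")
g.relate(parcel, "PART_OF", load)        # hypothetical relationship type
g.relate(load, "USING_CARRIER", carrier)
g.relate(load, "DELIVERED_TO", hamburg)  # hypothetical relationship type

# Two hops: parcel -> container load -> carrier
print(g.out(g.out(parcel, "PART_OF")[0], "USING_CARRIER")[0].props)  # {'emission_class': 'A'}
```

Adding a new relationship type or property here needs no schema change, which is the flexibility argument made on slides 6 and 9.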
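The "people who bought X also bought Y" pattern on slide 26 is a two-hop traversal. It starts at a product, moves to the customers who bought it, and then moves on to the other products those customers bought, which are counted and ranked. Below is a minimal sketch in plain Python. It assumes purchases are held as a mapping from each customer to their purchased products. The function name `also_bought`, the `top_n` parameter and the sample data are hypothetical.

```python
from collections import Counter


def also_bought(purchases, product, top_n=3):
    """Rank products co-purchased with `product`.

    purchases: dict mapping customer -> set of products (customer-BOUGHT->product).
    """
    counts = Counter()
    for basket in purchases.values():
        if product in basket:
            # Second hop: every other product this customer bought
            counts.update(p for p in basket if p != product)
    # Highest count first; ties broken alphabetically for deterministic output
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:top_n]]


# Hypothetical purchase data
purchases = {
    "alice": {"mixer", "bowl", "whisk"},
    "bob": {"mixer", "bowl"},
    "carol": {"mixer", "apron"},
    "dave": {"kettle", "apron"},
}
print(also_bought(purchases, "mixer"))  # ['bowl', 'apron', 'whisk']
```

A real-time system would combine this with the session and history signals mentioned on slides 28 and 29. The sketch shows only the co-purchase hop.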
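Slide 30 describes real-time parcel routing "from any point to any point" over a network that changes daily. If hubs are treated as nodes and legs as weighted relationships, a route query becomes a shortest-path search. The sketch below uses Dijkstra's algorithm in plain Python as an assumed stand-in, because the slide does not say which algorithm the production system used. The hub names and costs are hypothetical.

```python
import heapq


def shortest_route(edges, source, target):
    """Dijkstra over an undirected, weighted route graph.

    edges: iterable of (a, b, cost). Returns (total_cost, path),
    or (inf, []) if target is unreachable.
    """
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, source, [source])]  # (cost so far, node, path to node)
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (dist + cost, nxt, path + [nxt]))
    return float("inf"), []


# Hypothetical network: two alternative routes between the depots
network = [
    ("Depot A", "Hub 1", 40),
    ("Hub 1", "Hub 2", 30),
    ("Hub 2", "Depot B", 20),
    ("Depot A", "Hub 3", 50),
    ("Hub 3", "Depot B", 70),
]
print(shortest_route(network, "Depot A", "Depot B"))  # (90, ['Depot A', 'Hub 1', 'Hub 2', 'Depot B'])
```

Because the graph is rebuilt from the edge list on every call, a daily change such as dropping a leg only changes the input. A production system would keep the graph resident instead of rebuilding it for each query.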
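Slide 34 lists two of SFR's business problems: what-if modeling of unplanned outages and finding where more redundancy is needed. One simple way to express this on a graph is to remove the failed element, walk the LINKED relationships from a core node, and report every service whose DEPENDS_ON target is no longer reachable. The sketch below does this in plain Python. The element names, the single "core" starting point and the one-dependency-per-service model are hypothetical simplifications.

```python
from collections import deque


def reachable(links, start, removed):
    """Breadth-first walk over undirected LINKED edges, skipping the `removed` element."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def impacted_services(links, depends_on, core, failed):
    """What-if: which services lose their path to `core` if `failed` goes down?

    depends_on maps each service to the single element it DEPENDS_ON
    (a simplifying assumption). Assumes `failed` is not `core` itself.
    """
    alive = reachable(links, core, failed)
    return sorted(svc for svc, element in depends_on.items() if element not in alive)


# Hypothetical topology: switch1 is dual-homed, switch2 hangs off router1 only
links = [
    ("core", "router1"),
    ("core", "router2"),
    ("router1", "switch1"),
    ("router2", "switch1"),
    ("router1", "switch2"),
]
depends_on = {"voice": "switch1", "iptv": "switch2"}
print(impacted_services(links, depends_on, "core", "router1"))  # ['iptv']
```

Running the check for each element in turn flags the ones whose failure cuts off a service, such as router1 for iptv here. This is the "uncover need for additional redundancy" question from the slide.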
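Slide 38 shows Telenor's access model as Account, Customer, User and Subscription nodes connected by SUBSCRIBED_BY, CONTROLLED_BY, PART_OF and USER_ACCESS. Checking a login's rights then becomes a short traversal instead of the massive joins the slide describes. Below is a minimal sketch in plain Python. It rests on two assumptions that the slide does not spell out. The first is the direction of each relationship. The second is the access rule: a user may manage a subscription if USER_ACCESS reaches its account, the controlling customer, or any parent customer via PART_OF. All node ids are hypothetical.

```python
def can_access(edges, user, subscription):
    """Assumed (hypothetical) edge directions:
    (subscription, SUBSCRIBED_BY, account), (account, CONTROLLED_BY, customer),
    (customer, PART_OF, parent_customer), (user, USER_ACCESS, account_or_customer).
    """
    out = {}
    for src, rel, dst in edges:
        out.setdefault((src, rel), []).append(dst)

    # Collect every node the subscription rolls up to.
    scope = set()
    frontier = list(out.get((subscription, "SUBSCRIBED_BY"), []))
    while frontier:
        node = frontier.pop()
        if node in scope:
            continue  # already visited; also guards against PART_OF cycles
        scope.add(node)
        frontier += out.get((node, "CONTROLLED_BY"), [])
        frontier += out.get((node, "PART_OF"), [])

    # Access is granted if any USER_ACCESS target falls inside that scope.
    return any(target in scope for target in out.get((user, "USER_ACCESS"), []))


edges = [
    ("sub-1", "SUBSCRIBED_BY", "acct-1"),
    ("acct-1", "CONTROLLED_BY", "dept-sales"),
    ("dept-sales", "PART_OF", "corp-hq"),
    ("admin", "USER_ACCESS", "corp-hq"),   # company-wide admin
    ("owner", "USER_ACCESS", "acct-1"),    # direct account access
    ("clerk", "USER_ACCESS", "acct-2"),    # unrelated account
]
print(can_access(edges, "admin", "sub-1"), can_access(edges, "clerk", "sub-1"))  # True False
```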
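Slides 40 to 42 contrast two approaches. Discrete analysis judges each account on its own metrics. Connected analysis looks for accounts that seem normal individually but are linked through shared identifiers such as phone numbers or addresses. Below is a minimal sketch of the connected view in plain Python. It uses union-find over accounts that share any identifier and reports groups above a size threshold. The identifiers, the threshold of 3 and the function name are assumptions for illustration. They do not describe any bank's system.

```python
def fraud_rings(accounts, min_size=3):
    """Group accounts linked (directly or transitively) through shared identifiers.

    accounts: dict account_id -> set of identifiers (phone, address, ...).
    Returns sorted lists of account ids for groups with at least `min_size` members.
    """
    parent = {a: a for a in accounts}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    owner = {}  # identifier -> first account seen using it
    for acct, ids in accounts.items():
        for ident in ids:
            if ident in owner:
                parent[find(acct)] = find(owner[ident])  # union the two groups
            else:
                owner[ident] = acct

    groups = {}
    for acct in accounts:
        groups.setdefault(find(acct), []).append(acct)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)


# Hypothetical data: A1-A2 share a phone, A2-A3 share an ID document
accounts = {
    "A1": {"phone:555-0101", "addr:1 Main St"},
    "A2": {"phone:555-0101", "id:X-17"},
    "A3": {"id:X-17", "addr:9 Elm St"},
    "A4": {"phone:555-0199"},
    "A5": {"addr:7 Oak Ave"},
}
print(fraud_rings(accounts))  # [['A1', 'A2', 'A3']]
```

No single account in the ring is unusual on its own. The group only appears once the shared identifiers are followed, which is the point of the connected-analysis slide.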
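Slide 43 describes unraveling Bank - Person - Client - Address - Intermediaries - M&F chains of arbitrary length, which defeated spreadsheet analysis. On a graph, "how are these two entities connected?" is a shortest-path search that does not need to know the chain length in advance. Below is a minimal breadth-first-search sketch in plain Python. The entity names are hypothetical placeholders, and the slide does not show the journalists' actual queries.

```python
from collections import deque


def connection_chain(edges, start, goal):
    """Shortest chain of relationships linking two entities, of any length (BFS).

    edges: iterable of (a, b) pairs, treated as undirected.
    Returns the entities from start to goal, or None if they are not connected.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    prev = {start: None}  # predecessor map, doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            while node is not None:  # walk predecessors back to start
                chain.append(node)
                node = prev[node]
            return chain[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical entities following the chain described on the slide
edges = [
    ("Bank B", "Person P"),
    ("Person P", "Client C"),
    ("Client C", "Address A"),
    ("Address A", "Intermediary I"),
    ("Intermediary I", "M&F"),
    ("Person P", "Company Z"),  # dead end
]
print(connection_chain(edges, "Bank B", "M&F"))
# ['Bank B', 'Person P', 'Client C', 'Address A', 'Intermediary I', 'M&F']
```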