Neo4j in Depth

  1. 1. Neo4j in Depth, Max De Marzi
  2. 2. About Me • Max De Marzi, Neo4j Field Engineer • My Blog: http://maxdemarzi.com • Find me on Twitter: @maxdemarzi • Email me: maxdemarzi@gmail.com • GitHub: http://github.com/maxdemarzi
  3. 3. TLDR:
  4. 4. Property Graph Data Model
  5. 5. What you already know
  6. 6. The Problem • All JOINs are executed every time you query (traverse) the relationship • Executing a JOIN means searching for a key in another table • With indices, executing a JOIN means looking up a key • B-Tree index: O(log(n)) per lookup • More entries => more lookups => slower JOINs
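  7. 7. [Tables] People: 143 Max • Conferences: 326 Big Data Tech Con; 725 NoSQL Now; 981 Chariot Data IO • Attend (person, conference): (143, 981); (143, 725); (143, 326)
  8. 8. [Diagram] Max (143) connected to Big Data Tech Con (326), NoSQL Now (725) and Chariot Data IO (981) through the pairs (143, 981), (143, 725), (143, 326)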
  9. 9. A Property Graph • Nodes: (uid: MDM, name: Max); (uid: BDTC, where: Burlingame); (uid: NSN, where: San Francisco); (uid: CDIO, where: Philadelphia) • Relationships: member, member, member
  10. 10. Neo4j Secret Sauce • Pointers instead of look-ups • Fixed-size records for fast access • Do all your "joining" on creation • Spin, spin, spin through this data structure
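The contrast between key look-ups and pointers can be sketched in a few lines of Python. This is only an in-memory illustration, not how Neo4j stores records: the IDs come from the People/Attend/Conferences example on slides 7 and 8, while the `Node` class and both function names are hypothetical.

```python
from bisect import bisect_left

# Relational style: a sorted join table that is searched by key on every query.
attend = sorted([(143, 326), (143, 725), (143, 981)])
conferences = {326: "Big Data Tech Con", 725: "NoSQL Now", 981: "Chariot Data IO"}


def conferences_via_join(person_id):
    """Binary-search the join table (O(log n)), then look up each conference key."""
    start = bisect_left(attend, (person_id,))
    names = []
    for pid, cid in attend[start:]:
        if pid != person_id:
            break
        names.append(conferences[cid])
    return names


# Graph style: the "join" is done once, at creation, by storing direct references.
class Node:
    def __init__(self, **props):
        self.props = props
        self.out = []  # direct references to neighbouring nodes


max_node = Node(uid="MDM", name="Max")
for conference_name in ("Big Data Tech Con", "NoSQL Now", "Chariot Data IO"):
    max_node.out.append(Node(name=conference_name))


def conferences_via_pointers(person):
    """Follow the stored references; no key search is involved."""
    return [neighbour.props["name"] for neighbour in person.out]
```

In the first function the cost of finding Max's rows depends on the size of the join table; in the second it depends only on how many conferences Max is connected to.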
  11. 11. Relational Databases Can't Handle Relationships Well • Cannot model or store data and relationships without complexity • Performance degrades with number and levels of relationships, and database size • Query complexity grows with need for JOINs • Adding new types of data and relationships requires schema redesign, increasing time to market • … making traditional databases inappropriate when relationships are valuable in real-time • Result: slow development, poor performance, low scalability, hard to maintain
  12. 12. NoSQL Databases Don't Handle Relationships • No data structures to model or store relationships • No query constructs to support relationships • Relating data requires "JOIN logic" in the application • No ACID support for transactions • … making NoSQL databases inappropriate when relationships are valuable in real-time
  13. 13. Real-Time Query Performance: performance must hold steady with scale • [Chart] Response time plotted against connectedness and size of data set, ranging from "0 to 2 hops, 0 to 3 degrees, thousands of connections" to "tens to hundreds of hops, thousands of degrees, billions of connections"; series: relational and other NoSQL databases versus Neo4j • Neo4j is 1000x faster: reduces minutes to milliseconds
  14. 14. Re-Imagine Your Data as a Graph • Neo4j is an enterprise-grade graph database that enables you to: • Model and store your data as a graph • Query relationships with ease and in real-time • Seamlessly evolve applications to support new requirements by adding new kinds of data and relationships • Benefits: agile development, high performance, vertical and horizontal scale, seamless evolution
  15. 15. Neo4j Overview • Product: Neo4j, the world's leading graph database; 1M+ downloads, adding 50k+ per month; 150+ enterprise subscription customers, including over 50 of the Global 2000 • Company: Neo Technology, creator of Neo4j; 80 employees with HQ in Silicon Valley, London, Munich, Paris and Malmö; $45M in funding from Fidelity, Sunstone, Conor, Creandum, Dawn Capital
  16. 16. Neo4j: The Graph Database Leader • [Timeline] 2000, 2003, 2007, 2009, 2011, 2013, 2014, 2015 • Technical leadership: invented property graph model; contributed first graph DB to open source; first native graph DB in 24/7 production; introduced Cypher, a declarative query language for property graphs; extended graph data model to labeled property graph • Commercial leadership: first Global 2000 customer; GraphConnect, first conference for graph DBs; published O'Reilly book on Graph Databases; 150+ customers, 50K+ monthly downloads, 500+ graph DB events worldwide • Funding: $2.5M seed round from Sunstone and Conor; $11M Series A from Fidelity, Sunstone and Conor; $11M Series B from Fidelity, Sunstone and Conor; $20M Series C led by Creandum, with Dawn and existing investors
  17. 17. Neo4j Leads the Graph Database Revolution • "Forrester estimates that over 25% of enterprises will be using graph databases by 2017" • "Neo4j is the current market leader in graph databases." • "Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture." • Sources: 1. IT Market Clock for Database Management Systems, 2014; 2. TechRadar™: Enterprise DBMS, Q1 2014; 3. Graph Databases – and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates)
  18. 18. Building a Recommendation Engine in 2 Minutes with Neo4j • Developer experience: Neo4j UI with Cypher query language • Two-minute video demo: https://www.youtube.com/watch?v=qbZ_Q-YnHYo
  19. 19. Neo4j: Key Product Features • Native graph storage: ensures data consistency and performance • Native graph processing: millions of hops per second, in real time • "Whiteboard friendly" data modeling: model data as it naturally occurs • High data integrity: fully ACID transactions • The graph query language, Cypher: requires 10x to 100x less code than SQL • Scalability and high availability: vertical and horizontal scaling optimized for graphs • Built-in ETL: seamless import from other databases • Integration: drivers and APIs for popular languages
  20. 20. Property Graph Model Components • Nodes: the objects in the graph; can have properties; can be labeled • Relationships: relate nodes by type and direction; can have properties • [Diagram] PERSON (name: "Dan", born: May 29, 1970, twitter: "@dan"); PERSON (name: "Ann", born: Dec 5, 1975); CAR (brand: "Volvo", model: "V70"); relationships LOVES, LOVES, LIVES WITH, DRIVES and OWNS, one of which carries the property since: Jan 10, 2011
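The components listed on this slide map onto a very small data structure. The following Python sketch is an illustration under assumptions, not Neo4j's storage format: the class names, the `outgoing` helper, the direction of each relationship and the choice to attach `since` to DRIVES are all guesses, because the transcript does not preserve the diagram's arrows.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set
    props: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    type: str
    end: Node
    props: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "born": "1970-05-29", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "1975-12-05"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

# Directions and the placement of "since" are assumptions made for this example.
graph = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "DRIVES", car, {"since": "2011-01-10"}),
    Relationship(ann, "OWNS", car),
]


def outgoing(node, rel_type):
    """Relationships of the given type that start at this exact node."""
    return [r for r in graph if r.start is node and r.type == rel_type]
```

Both nodes and relationships carry a property dictionary, which is the feature that distinguishes this model from the triple model on the next slide.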
  21. 21. Triple Store/RDF Model • Resource Description Framework • Subject, Predicate, Object • Standard data model • Names for subjects, predicates, objects must be URIs • Names must be global • No properties on the relationships • Like "3rd Normal Form" for relational databases (but really more like 5th/6th)
  22. 22. Property Graph Data Model (Movies)
  23. 23. RDF Data Model (Movies)
  24. 24. Property Graph vs Triple Store • Property Graph is a more generic case of the Triple Store • The lack of properties on relationships in Triple Stores reduces (or complicates) their expressive power
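To make the point about relationship properties concrete, here is a hedged Python sketch that stores the same fact, a person rating a movie, in both shapes. The `ex:` names are hypothetical placeholders for the URIs a real triple store would require, and the intermediate `ex:rating1` resource is one common workaround, not the only one.

```python
# Property graph: the rating lives directly on the relationship.
rated = {"start": "max", "type": "RATED", "end": "toy_story", "props": {"rating": 9}}

# Triple store: only (subject, predicate, object) is available, so the rating
# needs an intermediate resource to hang its value on.
triples = [
    ("ex:max", "ex:madeRating", "ex:rating1"),
    ("ex:rating1", "ex:ofMovie", "ex:toy_story"),
    ("ex:rating1", "ex:value", 9),
]


def rating_from_triples(person, movie):
    """Find the intermediate rating resource, then read its value."""
    for s, p, o in triples:
        if s == person and p == "ex:madeRating":
            if (o, "ex:ofMovie", movie) in triples:
                for s2, p2, value in triples:
                    if s2 == o and p2 == "ex:value":
                        return value
    return None
```

One relationship with one property becomes three triples and a two-step lookup, which is the complication the slide refers to.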
  25. 25. Query Languages • Graph Databases: Cypher (declarative, pattern matching, easy to understand); Gremlin (imperative, step driven, math inspired); native APIs (Java, REST) • Triple Stores: SPARQL (standard); PROLOG (or Prolog-like languages)
  26. 26. General Use Cases • Graph Databases: local queries (anchor on a node or set of nodes, then traverse); realtime (<20ms) requirements; complex, deep traversals; flexible graph models • Triple Stores: global queries (find pattern in large volume of information); browsing content; inference discovery
  27. 27. How do you model Flight Data?
  28. 28. How do you model Flight Data?
  29. 29. How do you model Flight Data?
  30. 30. How do you model Flight Data?
  31. 31. How do you model Flight Data?
  32. 32. How do you model Flight Data?
  33. 33. How do you model Flight Data?
  34. 34. How do you model Comic Books? How do you model a world where anything can happen?
  35. 35. Graph Databases allow Model Flexibility • Watch the presentation at: https://vimeo.com/79399404
  36. 36. Java Core API: direct access to Nodes and Relationships
  37. 37. Java Core API • Step by step from GraphDatabaseService • Start a transaction (reads and writes) • findNode(Label, Property, Value) • findNodes(Label, Property, Value) • findNodes(Label) • getNodeById(Long) • getRelationships(Direction, Type) • getProperty(Property, (optional) Default Value)
  38. 38. Example (get the friends of a user)
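The code shown on this slide is not part of the transcript. As a stand-in, the step-by-step pattern from slide 37 (find a node, follow its relationships, read a property with a default) can be sketched against a tiny in-memory graph in Python. This is not the Neo4j Java API: `Graph`, `Node`, the `User` label, the `FRIENDS` type and every method name here are hypothetical and only echo the shape of the calls listed above.

```python
class Node:
    def __init__(self, label, **props):
        self.label = label
        self.props = props
        self.rels = []  # (direction, type, other_node) tuples

    def get_property(self, key, default=None):
        return self.props.get(key, default)

    def get_relationships(self, direction, rel_type):
        return [other for d, t, other in self.rels if d == direction and t == rel_type]


class Graph:
    def __init__(self):
        self.nodes = []

    def create_node(self, label, **props):
        node = Node(label, **props)
        self.nodes.append(node)
        return node

    def relate(self, start, rel_type, end):
        # Record the relationship on both ends so either side can traverse it.
        start.rels.append(("OUTGOING", rel_type, end))
        end.rels.append(("INCOMING", rel_type, start))

    def find_node(self, label, key, value):
        for node in self.nodes:
            if node.label == label and node.props.get(key) == value:
                return node
        return None


def friends_of(graph, username):
    """Step 1: find the user. Step 2: follow FRIENDS. Step 3: read a property."""
    user = graph.find_node("User", "username", username)
    if user is None:
        return []
    friends = user.get_relationships("OUTGOING", "FRIENDS")
    return [friend.get_property("name", "unknown") for friend in friends]


g = Graph()
max_user = g.create_node("User", username="max", name="Max")
ann_user = g.create_node("User", username="ann", name="Ann")
dan_user = g.create_node("User", username="dan", name="Dan")
g.relate(max_user, "FRIENDS", ann_user)
g.relate(max_user, "FRIENDS", dan_user)
```

In real embedded code these steps would run inside a transaction, as slide 37 notes; the sketch leaves that out.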
  39. 39. Traversal API: describe traversals
  40. 40. Traversal API • Start with the simple defaults (order, relationships, depth, uniqueness, etc.) • Custom Expanders: where should I go next? • Custom Evaluators: I've gone there… should I accept this path?
  41. 41. Traversal API Example
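The slide's own example is an image and is not in the transcript. The division of labour between an expander ("where should I go next?") and an evaluator ("should I accept this path?") can still be shown with a small Python sketch. The `traverse` function, the breadth-first order, the node-global uniqueness rule and the sample `friends` data are all assumptions made for illustration; this is not the Neo4j traversal framework.

```python
from collections import deque


def traverse(start, expander, evaluator, max_depth):
    """Breadth-first traversal that yields every path the evaluator accepts."""
    queue = deque([[start]])
    visited = {start}  # each node is visited at most once
    while queue:
        path = queue.popleft()
        if evaluator(path):
            yield path
        if len(path) - 1 >= max_depth:
            continue  # depth limit reached, do not expand further
        for next_node in expander(path):
            if next_node not in visited:
                visited.add(next_node)
                queue.append(path + [next_node])


friends = {"max": ["ann", "dan"], "ann": ["eve"], "dan": ["eve", "max"], "eve": []}


def friend_expander(path):
    """Where should I go next? Follow the friends of the last node on the path."""
    return friends[path[-1]]


def depth_two_evaluator(path):
    """Should I accept this path? Only friends-of-friends, exactly two hops away."""
    return len(path) - 1 == 2


friends_of_friends = list(traverse("max", friend_expander, depth_two_evaluator, 2))
```

Swapping in a different expander or evaluator changes what the traversal explores or returns without touching the traversal loop itself.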
  42. 42. Cypher Query Language: ASCII art pattern matching
  43. 43. Cypher: Powerful and Expressive Query Language • MATCH (:Person { name:"Dan"} ) -[:LOVES]-> (:Person { name:"Ann"} ) • [Diagram] Node (label: Person, property: name "Dan"), relationship LOVES, node (label: Person, property: name "Ann")
  44. 44. Express Complex Queries Easily with Cypher • Find all direct reports and how many people they manage, up to 3 levels down • Cypher query: MATCH (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report) WHERE boss.name = "John Doe" RETURN sub.name AS Subordinate, count(report) AS Total • SQL query: shown on the slide as an image for comparison
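What the variable-length patterns in this query compute can be traced by hand with a short Python sketch. It assumes the management hierarchy is a tree (so each person is reached by exactly one path) and uses made-up names below "John Doe"; the `manages` dictionary and both functions are hypothetical.

```python
# Hypothetical hierarchy: manager -> direct reports.
manages = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cid", "Dee"],
    "Cid": ["Eve"],
    "Eve": ["Fay"],
    "Fay": ["Gus"],
}


def descendants(name, min_hops, max_hops):
    """Yield everyone reachable in min_hops..max_hops MANAGES steps."""
    frontier = [name]
    for hops in range(max_hops + 1):
        if hops >= min_hops:
            yield from frontier
        frontier = [child for person in frontier for child in manages.get(person, [])]


def report_counts(boss):
    """Mirror of MATCH (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report)."""
    totals = {}
    for sub in descendants(boss, 0, 3):
        total = sum(1 for _ in descendants(sub, 1, 3))
        if total:  # a sub with no reports produces no rows, as with MATCH
            totals[sub] = total
    return totals
```

The `*0..3` range is why the boss appears in the result as a "sub" of himself, and people without any reports drop out because the second pattern finds nothing for them.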
  45. 45. Hello World Recommendation
  46. 46. Hello World Recommendation
  47. 47. Movie Data Model
  48. 48. Cypher Query: Movie Recommendation • What are the Top 25 Movies that I haven't seen, with the same genres as Toy Story, given high ratings by people who liked Toy Story? • MATCH (watched:Movie {title:"Toy Story"}) <-[r1:RATED]- () -[r2:RATED]-> (unseen:Movie) WHERE r1.rating > 7 AND r2.rating > 7 AND watched.genres = unseen.genres AND NOT( (:Person {username:"maxdemarzi"}) -[:RATED|WATCHED]-> (unseen) ) RETURN unseen.title, COUNT(*) ORDER BY COUNT(*) DESC LIMIT 25
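The logic of this query can be followed step by step in plain Python. The sketch below is an approximation under assumptions: the sample `genres` and `ratings` data are invented, and "seen" is reduced to "rated" because the sample data has no separate WATCHED relationship.

```python
from collections import Counter

# Invented sample data: movie -> genres, person -> {movie: rating}.
genres = {
    "Toy Story": {"Animation", "Comedy"},
    "Monsters, Inc.": {"Animation", "Comedy"},
    "Shrek": {"Animation", "Comedy"},
    "Alien": {"Horror", "Sci-Fi"},
}
ratings = {
    "ann": {"Toy Story": 9, "Monsters, Inc.": 8, "Alien": 9},
    "bob": {"Toy Story": 8, "Monsters, Inc.": 9, "Shrek": 8},
    "cid": {"Toy Story": 5, "Shrek": 10},
    "maxdemarzi": {"Toy Story": 10, "Shrek": 6},
}


def recommend(me, title, limit=25):
    """Count high ratings of unseen, same-genre movies by people who liked `title`."""
    seen = set(ratings[me])
    counts = Counter()
    for rated in ratings.values():
        if rated.get(title, 0) <= 7:
            continue  # this person did not rate the anchor movie highly
        for movie, rating in rated.items():
            if (
                rating > 7
                and movie != title
                and movie not in seen
                and genres[movie] == genres[title]
            ):
                counts[movie] += 1
    return counts.most_common(limit)
```

Each `if` clause corresponds to one condition in the query's WHERE clause, and `most_common` plays the role of ORDER BY COUNT(*) DESC LIMIT 25.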
  49. 49. Movie Data Model
  50. 50. Cypher Query: k-NN Recommendation • What are the Top 25 Movies that Zoltan Varju has not seen, using the average rating by his top 3 neighbors? • MATCH (m:Movie) <-[r:RATED]- (b:Person) -[s:SIMILARITY]- (p:Person {name:'Zoltan Varju'}) WHERE NOT( (p) -[:RATED|WATCHED]-> (m) ) WITH m, s.similarity AS similarity, r.rating AS rating ORDER BY m.name, similarity DESC WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS recommendation ORDER BY recommendation DESC RETURN movie, recommendation LIMIT 25
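The k-NN step in this query (sort raters by similarity, keep the top 3 per movie, average their ratings) can be sketched in Python as follows. The sample `ratings` and `similarity` data, the lowercase person keys and the function name are invented for the example, and "seen" is again approximated by "rated".

```python
# Invented sample data: person -> {movie: rating}, and undirected similarity scores.
ratings = {
    "zoltan": {"Toy Story": 9},
    "ann": {"Up": 8, "Alien": 4},
    "bob": {"Up": 10, "Alien": 6},
    "cid": {"Up": 6},
    "dee": {"Up": 2, "Alien": 10},
}
similarity = {
    ("zoltan", "ann"): 0.9,
    ("bob", "zoltan"): 0.8,
    ("zoltan", "cid"): 0.7,
    ("dee", "zoltan"): 0.1,
}


def knn_recommend(target, k=3, limit=25):
    """Average the ratings of the k most similar neighbours, per unseen movie."""
    seen = set(ratings.get(target, {}))
    per_movie = {}
    for (a, b), sim in similarity.items():
        if target not in (a, b):
            continue
        neighbour = b if a == target else a  # SIMILARITY is matched in either direction
        for movie, rating in ratings.get(neighbour, {}).items():
            if movie not in seen:
                per_movie.setdefault(movie, []).append((sim, rating))
    scores = []
    for movie, pairs in per_movie.items():
        top = sorted(pairs, key=lambda pair: pair[0], reverse=True)[:k]
        scores.append((movie, sum(rating for _, rating in top) / len(top)))
    scores.sort(key=lambda score: score[1], reverse=True)
    return scores[:limit]
```

With this data, the least similar neighbour's low rating of "Up" is ignored because three more similar neighbours also rated it, which is the effect of the `COLLECT(rating)[0..3]` slice in the query.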
  51. 51. Neo4j Interface: Server, Service, Library
  52. 52. High Speed Fraud: 1000 R/S • http://maxdemarzi.com/2014/02/12/online-payment-risk-management-with-neo4j/
  53. 53. High Speed Fraud: 8000 R/S • http://maxdemarzi.com/2014/02/27/neo4j-at-ludicrous-speed/
  54. 54. High Speed Fraud: 28000 R/S • http://maxdemarzi.com/2014/03/10/its-over-9000-neo4j-on-websockets/
  55. 55. Neo4j Additional Features
  56. 56. Neo4j Clustering Architecture: Optimized for Speed & Availability at Scale • Performance benefits: no network hops within queries; real-time operations with fast and consistent response times; cache sharding spreads cache across cluster for very large graphs • Clustering features: master-slave replication with master re-election and failover; each instance has its own local cache; horizontal scaling & disaster recovery • [Diagram] Load balancer in front of three Neo4j instances
  57. 57. Getting Data into Neo4j • Cypher-based "LOAD CSV" capability: transactional (ACID) writes; initial and incremental loads of up to 10 million nodes and relationships • Command-line bulk loader neo4j-import: for initial database population; for loads with 10B+ records; up to 1M records per second • Example: 4.58 million things and their relationships… loads in 100 seconds!
  58. 58. Neo4j Fits into Your Enterprise Environment • [Diagram] Components: Application; End User; Data Scientist (ad hoc analysis); Graph Database Cluster of three Neo4j instances; Databases (data storage and business rules execution); Bulk Analytic Infrastructure (Graph Compute Engine, Hadoop, EDW, …) for data mining and aggregation; ETL links between them
  59. 59. Value from Relationships: Common Use Cases • Internal applications: master data management; network and IT operations; fraud detection • Customer-facing applications: real-time recommendations; graph-based search; identity and access management
  60. 60. Open Corporates Uses Neo4j
  61. 61. Open Corporates
  62. 62. Open Corporates Uses Neo4j • https://skillsmatter.com/skillscasts/4097-case-study-how-opencorporates-uses-neo4j-to-provide-insight
  63. 63. Open Source Examples • http://maxdemarzi.com/2012/10/18/matches-are-the-new-hotness/
  64. 64. What are the Top 10 Jobs for me • that are in the same location I'm in • for which I have the necessary qualifications
  65. 65. Partial Subgraph Search
  66. 66. Recommend Love • Find your soulmate in the graph • Are they energetic? • Do they like dogs? • Have a good sense of humor? • Neat and tidy, but not crazy about it? • What are the Top 10 Potential Mates for me • that are in the same location • are sexually compatible • have traits I want • want traits I have
  67. 67. Love Recommendation
  68. 68. Two Party Partial Subgraph Search • http://maxdemarzi.com/2013/04/19/match-making-with-neo4j/
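The "two party" idea from slide 66 (they have traits I want, and they want traits I have) can be expressed compactly with set intersections. The Python sketch below is an illustration only: the sample `people` data and the overlap-count scoring are assumptions, and the compatibility filter from the slide is left out for brevity. The linked blog post describes the actual Neo4j approach.

```python
# Invented sample data: each person has a location, traits they have, traits they want.
people = {
    "me": {"location": "Chicago", "has": {"energetic", "tidy"}, "wants": {"funny", "likes dogs"}},
    "ann": {"location": "Chicago", "has": {"funny", "likes dogs"}, "wants": {"energetic", "tidy"}},
    "bea": {"location": "Chicago", "has": {"funny"}, "wants": {"energetic"}},
    "cat": {"location": "Chicago", "has": {"funny"}, "wants": {"rich"}},
    "dot": {"location": "Boston", "has": {"funny", "likes dogs"}, "wants": {"tidy"}},
}


def top_matches(me, limit=10):
    """Rank people in the same location by how well both parties' wants are met."""
    mine = people[me]
    scored = []
    for name, other in people.items():
        if name == me or other["location"] != mine["location"]:
            continue
        they_have_what_i_want = mine["wants"] & other["has"]
        i_have_what_they_want = other["wants"] & mine["has"]
        # Both directions must match at least partially, hence "two party".
        if they_have_what_i_want and i_have_what_they_want:
            score = len(they_have_what_i_want) + len(i_have_what_they_want)
            scored.append((name, score))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]
```

A candidate who satisfies only one direction, like "cat" here, is dropped, while partial overlaps in both directions still produce a ranked match.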
  69. 69. Real-Time Recommendations with Neo4j • Social recommendations • Products and services • Content • Routing
  70. 70. Walmart BUSINESS CASE • Company: world's largest company by revenue; world's largest retailer and private employer; SF-based global e-commerce division manages several websites; founded in 1969; Bentonville, Arkansas • Needed online customer recommendations to keep pace with competition • Data connections provided predictive context, but were not in a usable format • Solution had to serve many millions of customers and products while maintaining superior scalability and performance
  71. 71. Walmart SOLUTION • Brings customers, preferences, purchases, products and locations into a graph model • Uses connections to make product recommendations • Solution deployed across Walmart divisions and websites
  72. 72. Global Courier BUSINESS CASE • Company: world's largest courier; 480,000 employees; €55 billion in revenue; needed a new B2C and B2B parcel routing system for its logistics practice; legacy system supported neither the full network nor the shift to online demands • Needed to replace aging B2B and B2C parcel routing system whose requirements include: • 24x7 availability • Peak loads of 5M parcels per day, 3K per second • Support for complex and diverse software stack • Predictable performance with linear scalability • Daily changes to logistics networks • Route from any point to any point • Single point of truth for entire network
  73. 73. Global Courier SOLUTION • Neo4j provides the ideal domain fit since a logistics network is a graph • High availability and performance via Neo4j clustering • Greatly simplified Cypher queries for routing versus relational SQL queries • Flexible data model that reflects the real logistics world far better than relational • Easy-to-grasp, whiteboard-friendly model
  74. 74. eBay BUSINESS CASE • Company: C2C and B2C retail network; full e-commerce functionality for individuals and businesses; integrated with logistics vendors for product deliveries • Needed an offering to compete with Amazon Prime • Enable customer-selected delivery inside 90 minutes • Calculate best route option in real-time • Scale to enable a variety of services • Offer more predictable delivery times
  75. 75. eBay Now SOLUTION • Acquired UK-based Shutl, a leader in same-day delivery • Used Neo4j to create eBay Now • 1000 times faster than the prior MySQL-based solution • Faster time-to-market • Improved code quality with 10 to 100 times less query code
  76. 76. Classmates BUSINESS CASE • Company: online yearbook connecting friends from school, work and military in US and Canada; founded as Memory Lane in Seattle • Develop new social networking capabilities to monetize yearbook-related offerings • Show all the people I know in a yearbook • Show yearbooks my friends appear in most often • Show sections of a yearbook that my friends appear most in • Show me other schools my friends attended
  77. 77. Classmates SOLUTION • Neo4j provides a robust and scalable graph database solution • 3-instance cluster with cache sharding and disaster recovery • 18ms response time for top 4 queries • 100M nodes and 600M relationships in initial graph, including people, images, schools, yearbooks and pages • Projected to grow to 1B nodes and 6B relationships
  78. 78. National Geographic BUSINESS CASE • Company: non-profit scientific and educational institution founded in 1888; covers geography, archaeology, natural science, environment and historical conservation; journals, online media, radio, TV, documentaries, live events and consumer content and goods • Improve poor performance of PostgreSQL app • Increase user engagement by linking to 100+ years of multimedia content • Improve targeting by understanding subscribers' interests better • Recommend content and services to users based on their interests
  79. 79. National Geographic SOLUTION • Enabled complex real-time analytics across eight million users and a century of content • Delivered robust performance by eliminating triple-nested SQL joins • Cross-refers users among content, live events, travel, goods and causes • Neo4j solution much less cumbersome and easier to maintain than previous SQL system
  80. 80. Curaspan BUSINESS CASE • Company: leader in patient management for discharges and referrals; manages patient referrals for 4600+ health care facilities; connects providers and payers via web-based patient management platform; founded in 1999 in Newton, Massachusetts • Improve poor performance of Oracle solution • Support more complexity including granular, role-based access control • Satisfy complex Graph Search queries by discharge nurses and intake coordinators • Example query: find a skilled nursing facility within n miles of a given location, belonging to health care group XYZ, offering speech therapy and cardiac care, and optionally Italian language services
  81. 81. Curaspan SOLUTION • Met fast, real-time performance demands • Supported queries that span multiple hierarchies, including provider and employee-permissions graphs • Improved data model to handle adding more dimensions to the data such as insurance networks, service areas and care organizations • Greatly simplified queries, reducing multi-page SQL statements to one Neo4j function
  82. 82. FiftyThree BUSINESS CASE • Company: maker of Paper, one of the top apps in Apple's App Store, with millions of users; based in New York City • Add social capabilities to digital-paper app • Support social collaboration across millions of users in new Mix app • Enable seamless interaction between social and content-asset networks • Ensure new apps are robust, scalable and fast
  83. 83. FiftyThree SOLUTION • Neo4j data model ideal for social network, content management and access control • Users create, publish and share designs simply • Easy to develop and evolve Neo4j-based app • Integrates well with FiftyThree EC2 architecture • See the Neo4j solution in action: Betting the Company (Literally) on a Graph Database, http://aseemk.com/talks/neo4j-lessons-learned#/ • Awards: App Store Editor's Choice; 2012 iPad App of Year; Apple Best Apps of 2014
  84. 84. Users Love Neo4j • Testimonials attributed to: jQuery inventor; Heroku founder
  85. 85. THANK YOU
