Airline reservations and routing: a graph use case

We've all been there before... you hear the announcement that your flight is canceled. Fellow passengers race to the gate agent to rebook on the next available flight. How do they quickly determine the best route from Berlin to San Francisco? Ultimately, routing over the flight network is best solved as a graph problem. We will discuss the lessons we learned from working with a major airline to solve this problem using the JanusGraph database. JanusGraph is an open source graph database designed for massive scale. It is compatible with several pieces of the open source big data stack: Apache TinkerPop (graph computing framework), HBase, Cassandra, and Solr. We will go into depth about our approach to benchmarking graph performance and discuss the utilities we developed. We will share our comparison results for evaluating which storage backend to use with JanusGraph. Whether you are productizing a new database or you are a frustrated traveler, a fast resolution is needed to satisfy everybody involved.

Speakers
Jason Plurad, Open Source Developer and Advocate, IBM
Chin Huang, Software Engineer, IBM

  1. Airline Reservations and Routing: A Graph Use Case. Jason Plurad, Chin Huang. DataWorks Summit Berlin / April 18, 2018 / © 2018 IBM Corporation
  2. Pilots
     • Jason Plurad is a software developer in the IBM Digital Business Group. He develops open source software and builds open communities in the big data and analytics space, with a current focus on graph databases and graph analytics. He is a Technical Steering Committee member and committer on JanusGraph and Apache TinkerPop.
     • Chin Huang is a software engineer at IBM Open Technologies and Performance. He has worked on various enterprise and open source projects. His current focus is JanusGraph and Node.js development and performance characterization.
  3. How Did We Get Here?
     • Jason: Raleigh (RDU), Detroit (DTW), Amsterdam (AMS), Berlin (TXL)
     • Chin: San Francisco (SFO), Copenhagen (CPH), Berlin (TXL)
  4. Graphs are not new
  5. Graph Data Use Cases
     • Social network analysis
     • Configuration management database
     • Master data management
     • Recommendation engines
     • Knowledge graphs
     • Internet of things
     • Cyber security attack analysis
     [Figure: example graph with vertices A, B, C, and D]
  6. Property Graph
     [Figure: graph of the airports RDU, DTW, AMS, TXL, SFO, and CPH]
     • Example vertex: Type: vertex; Label: airport; Name: Berlin Tegel; Code: TXL; City: Berlin; Country: Germany
     • Example edge: Type: edge; Label: route; Flight: 343; Distance: 501; Depart: 13:05; Arrive: 14:57
  7. Gremlin: Graph Traversal Language (Apache TinkerPop, https://tinkerpop.apache.org)
     What is the shortest path to Berlin?
       > g.V(rdu).
           repeat( out('route').simplePath() ).
           until( has('code', 'TXL') ).
           limit(5).
           path().by('code').
           toList()
       ==> [RDU, JFK, TXL]
       ==> [RDU, LAX, TXL]
       ==> [RDU, MIA, TXL]
       ==> [RDU, YYZ, TXL]
       ==> [RDU, SFO, TXL]
  8. JanusGraph (https://janusgraph.org)
     • Maintainer: The Linux Foundation
     • License: Apache
     • Releases: 0.3.0 planned 2Q 2018
     • Established in January 2017
     • Fork of TitanDB
     • Scalable graph database distributed on multi-machine clusters with pluggable storage and indexing
     • Vendor-neutral, open community with open governance
     • Founders: Expero, Google, Grakn, Hortonworks, IBM
     • Members: Amazon, Huawei, Netflix, Orchestral Developments, Seeq, Uber
     • In production: Celum, Finc, G-Data, IBM Cloud, Seeq
  9. JanusGraph Architecture (http://docs.janusgraph.org/latest/arch-overview.html)
  10. Graph database storage backends: Performance evaluation. Graph use case: Air travel reservation
  11. Performance Test Environment
     Server spec
     • Physical servers: x3650 M5, 2 sockets x 14 cores, 384 GB (12 x 32 GB) memory
     • CPU: Intel Xeon Processor E5-2690 v4 14C 2.6GHz 35MB Cache 2400MHz
     • Network interface: Emulex VFA5.2 ML2 Dual Port 10GbE SFP+ Adapter
     • Disk: 720 GB SSD, RAID 5
     • Operating system: Ubuntu 16.04.2 LTS
     Public tools
     • JMeter: load testing tool
     • nmon, nmon analyser: system performance monitoring and analysis tools
     • VisualVM: all-in-one Java troubleshooting/profiling tool
     • GCeasy: garbage collection log analysis tool
     • Prometheus and Grafana: monitoring dashboard
  12. JanusGraph Utility Tools (open source code: https://github.com/IBM/janusgraph-utils)
     How about graph data in volume?
     • Existing data is often lacking or unavailable for performance evaluation
     • What are the performance characteristics at various volumes?
     • Graph Data Generator generates graph data in different sizes and shapes, so you can easily simulate real data and performance
     How to manage graph schema?
     • Graph schema management tools are lacking
     • Graph schemas may change for optimal performance
     • Graph Schema Loader enables you to quickly load and update schema definitions in JanusGraph
     How to massively load data into a graph database?
     • Many RDBMSs support exporting data to CSV files
     • I have millions/billions of records!
     • Data Batch Importer allows you to fully utilize system resources to import data in CSV files into JanusGraph
  13. Performance Test Topology
     [Figure: a load injector sends query, insert, and update requests to JanusGraph, which is backed by a database cluster; each of the three cluster nodes is labeled Cassandra, HBase + HDFS + ZooKeeper, Scylla]
  14. Performance Evaluation: Insert Vertices
     • 40 million vertices in total
     • 2 properties for each vertex
     • Insert scenario
     • Injectors fully utilized to generate load against the databases
  15. Performance Evaluation: Insert Edges
     • 30 million edges in total
     • 1 property for each edge
     • Query and update scenario
  16. Performance Evaluation: Graph Traversal
  17. Lessons Learned: Storage Backends
     Cassandra
     • Cluster bootstrapping takes more effort
     • Smaller memory footprint
     HBase
     • Uneven CPU% caused by hot regions
     • Read and write cache settings need careful configuration for better throughput
     Scylla
     • Easy clustering: multiple nodes can be added at once
     • Self-tunes well, but documentation is lacking
     • Evenly distributed load
     • Fully utilizes system resources
     • CPU utilization misrepresents real loads
     • Nice monitoring dashboard: Prometheus + Grafana
     • Works with existing Cassandra utility clients
  18. Flight Search Use Case
     Flight search
     • All flights from airport A to airport B on a given date and time
     • Number of stops: non-stop, one-stop, two-stop…
     Data spec
     • 600+ airports, 350K+ flight schedules
     Graph model
     • Vertex: Airport (airport code)
     • Vertex: Country (country name)
     • Edge: Flight Schedule (flight #, departure date, arrival date)
  19. Lessons Learned: Flight Search
     Model your graph database for performance
     • Design the data model for your use cases!
     • Understand the workload's read/write ratio
     • What kinds of queries do you want to support? How many levels deep does a traversal go?
     • Consider denormalization…
     • Design and use the various indexes supported in JanusGraph
     Try different approaches to get results back faster
     • Use a pre-processor in a custom app
     • Use Gremlin queries, applying filters as early as possible in a query to limit the number of traversals
     • Use Groovy methods as a programmable extension
     Fine-tune for your workloads and systems
     • JanusGraph supports pluggable storage and index backends, so tune your backends!
     • JanusGraph server configurations, such as threadPoolBoss and threadPoolWorker
     • JVM configurations, such as Xms (initial and minimum Java heap size) and Xmx (maximum Java heap size). You don’t want to see the annoying java.lang.OutOfMemoryError exceptions or long, slow GCs.
     • Use multiple threads and/or instances, up to your system’s capacity
     • Consider cloud and auto-scaling
     • Be thorough and be patient, because it will take a few iterations!
  20. Thank you
     • compose.com/databases/janusgraph
     • twitter.com/pluradj
     • twitter.com/chinhuang007
     • github.com/IBM/janusgraph-utils
     • developer.ibm.com/code/patterns
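
The property graph on slide 6 and the shortest-path question on slide 7 can be approximated without a graph database. The sketch below is a plain-Python illustration, not JanusGraph or TinkerPop code. Airport codes and cities come from slide 3 and the TXL properties from slide 6; the route list follows the speakers' itineraries, plus one hypothetical RDU to AMS route added so that more than one path exists. The helper names `out_routes` and `simple_paths` are made up for this example. The search is a breadth-first walk that never revisits an airport, which is roughly what `repeat(out('route').simplePath()).until(...)` expresses, although Gremlin's own evaluation order may differ.

```python
from collections import deque

# Vertices: airport code -> properties (cities from slide 3, TXL details from slide 6).
airports = {
    "RDU": {"label": "airport", "city": "Raleigh"},
    "DTW": {"label": "airport", "city": "Detroit"},
    "AMS": {"label": "airport", "city": "Amsterdam"},
    "SFO": {"label": "airport", "city": "San Francisco"},
    "CPH": {"label": "airport", "city": "Copenhagen"},
    "TXL": {"label": "airport", "name": "Berlin Tegel",
            "city": "Berlin", "country": "Germany"},
}

# Directed edges labelled 'route': (origin, destination, properties).
routes = [
    ("RDU", "DTW", {"label": "route"}),
    ("DTW", "AMS", {"label": "route"}),
    ("AMS", "TXL", {"label": "route"}),
    ("SFO", "CPH", {"label": "route"}),
    ("CPH", "TXL", {"label": "route"}),
    ("RDU", "AMS", {"label": "route"}),  # hypothetical, for illustration only
]


def out_routes(code):
    """Airports reachable from `code` over one outgoing 'route' edge."""
    return [dst for src, dst, _ in routes if src == code]


def simple_paths(start, target, limit=5):
    """Breadth-first search for paths that never repeat an airport.

    Paths come back shortest first, and at most `limit` are returned.
    """
    found = []
    queue = deque([[start]])
    while queue and len(found) < limit:
        path = queue.popleft()
        if path[-1] == target:
            found.append(path)
            continue
        for nxt in out_routes(path[-1]):
            if nxt not in path:  # the simple-path rule: no revisits
                queue.append(path + [nxt])
    return found


for path in simple_paths("RDU", "TXL"):
    print(path)
```

Because the queue is processed level by level, the two-hop path through AMS is reported before the three-hop path through DTW and AMS.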

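The flight search on slide 18 (all flights from airport A to airport B in a given time window, with a bounded number of stops) and the "filter as early as possible" advice on slide 19 can be sketched in the same way. Everything in this example is an assumption made for illustration: the flight numbers and times, the 45-minute minimum connection, the 6-hour maximum layover, and the function name `search`. Time zones are ignored. It is not the airline's data model or the speakers' implementation.

```python
from datetime import datetime, timedelta

# Hypothetical schedules: (flight number, origin, destination, departure, arrival).
SCHEDULES = [
    ("F100", "TXL", "AMS", datetime(2018, 4, 19, 8, 0), datetime(2018, 4, 19, 9, 30)),
    ("F200", "AMS", "SFO", datetime(2018, 4, 19, 11, 0), datetime(2018, 4, 19, 22, 0)),
    ("F300", "TXL", "CPH", datetime(2018, 4, 19, 9, 0), datetime(2018, 4, 19, 10, 0)),
    ("F400", "CPH", "SFO", datetime(2018, 4, 19, 10, 20), datetime(2018, 4, 19, 21, 30)),
    ("F500", "TXL", "SFO", datetime(2018, 4, 20, 7, 0), datetime(2018, 4, 20, 19, 0)),
]

MIN_CONNECTION = timedelta(minutes=45)  # assumed minimum time between legs
MAX_LAYOVER = timedelta(hours=6)        # assumed maximum wait between legs

# Index the schedule edges by origin airport so each step only looks at
# flights leaving the current airport.
BY_ORIGIN = {}
for schedule in SCHEDULES:
    BY_ORIGIN.setdefault(schedule[1], []).append(schedule)


def search(origin, destination, earliest, latest, max_stops=1):
    """Return itineraries (lists of flight numbers) with at most `max_stops` stops.

    The first leg must depart between `earliest` and `latest`.
    """
    results = []

    def extend(airport, ready_at, deadline, legs, visited):
        for flight, _, dest, dep, arr in BY_ORIGIN.get(airport, []):
            # Filter early: drop legs outside the time window or looping back
            # before they can spawn any further expansion.
            if dep < ready_at or dep > deadline or dest in visited:
                continue
            itinerary = legs + [flight]
            if dest == destination:
                results.append(itinerary)
            elif len(itinerary) <= max_stops:
                extend(dest, arr + MIN_CONNECTION, arr + MAX_LAYOVER,
                       itinerary, visited | {dest})

    extend(origin, earliest, latest, [], {origin})
    return results


day = datetime(2018, 4, 19)
print(search("TXL", "SFO", day, day + timedelta(hours=23, minutes=59)))
```

With this sample data the connection through CPH is pruned because the layover is shorter than the assumed minimum, so only the itinerary through AMS is returned for April 19. Pruning at each leg, rather than enumerating every path and filtering at the end, keeps the number of partial itineraries small, which is the point of the early-filter advice.
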