We live in a profoundly connected world. From supply chains to payment networks to digital business and complex portfolios, our ability to understand and navigate not just data, but relationships inside the data, plays an increasingly important role in all aspects of business. Highly connected value chains that generate massive volumes of connected data create an opportunity for graph analysis, which Gartner describes as "the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions." This talk will introduce the power of graph databases and share how the latest IBM Power Systems offerings featuring the POWER8 processor and CAPI-attached Flash enable unique scaling, performance and price-performance advantages for Neo4j workloads.
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
1. Neo4j on IBM POWER8
Philip Rathle
VP of Products
Neo Technology
Keshav Ranganathan
Senior Offering Manager, Data & Analytics Solutions
IBM POWER Systems
2. Neo4j on IBM POWER Systems
Key Takeaways:
• Why Graphs & Why Now?
• Unique Characteristics of Graph Data & Architecture Implications
• IBM Power Systems Overview
• Why deploy Neo4j on IBM Power Systems
• Q&A
3. Neo4j on IBM Power Systems
Solves Massive-Scale, Previously Unsolvable Problems
A paradigm shift accelerating time to insight and real-time decision making…
Bringing big data insights into action
11. Analyst Perspective
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”
By the end of 2018, 70% of leading organizations will have one or more pilot or proof-of-concept efforts underway utilizing graph databases.
“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”
Sources:
IT Market Clock for Database Management Systems, 2014
https://www.gartner.com/doc/2852717/it-market-clock-database-management
TechRadar™: Enterprise DBMS, Q1 2014
http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801
Making Big Data Normal with Graph Analysis for the Masses, 2015
http://www.gartner.com/document/3100219
12. Neo4j: World’s Leading Graph Database
• 100 Best in Show 2015
• Magic Quadrant for Operational DBMS 2015
• Technology of the Year 2015, 2014
• 100 Companies that Matter the Most in Data 2015
• Neo4j named most popular Graph Database, 2015
• Neo4j declared “Champion”, 2015 & 2016
• “Most Popular and Widely Deployed Database”
• Winner of NoSQL: Graph Database Technologies
DB-Engines Rankings
Source: http://db-engines.com/en/ranking/graph+dbms
14. Relationship Queries Strain Traditional Architectures
• Queries can take non-sequential, arbitrary paths through data
• Real-time queries need speed and consistent response times
• Queries must run reliably with consistent results
• A single query can touch a lot of data
15. Data Relationship Queries
Neo4j on IBM POWER8 (unified, in-memory map): lightning-fast queries due to replicated in-memory architecture + index-free adjacency
Using other NoSQL to join data (Machine 1, Machine 2, Machine 3): slow queries due to index lookups + network hops
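The contrast between index-free adjacency and index lookups can be illustrated with a minimal Python sketch. It is a toy model and not Neo4j's storage format: the `Node` class, the `hops_pointer` and `hops_indexed` functions, and the five-edge sample graph are all hypothetical names introduced here. The first traversal follows references stored on each node; the second goes back through a lookup structure for every record it visits, which is the per-hop cost the slide attributes to join-style approaches.

```python
from collections import defaultdict


class Node:
    """A node holding direct references to its neighbours (index-free adjacency)."""

    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to neighbouring Node objects


def hops_pointer(start, depth):
    """Follow stored references; each hop is a reference dereference."""
    frontier = [start]
    for _ in range(depth):
        frontier = [nbr for node in frontier for nbr in node.out]
    return sorted(n.name for n in frontier)


def hops_indexed(edge_rows, start_name, depth):
    """Join-style traversal: every hop goes back through an index keyed by name."""
    index = defaultdict(list)  # stands in for a global index over edge records
    for src, dst in edge_rows:
        index[src].append(dst)
    frontier = [start_name]
    lookups = 0
    for _ in range(depth):
        nxt = []
        for name in frontier:
            lookups += 1  # one index lookup per visited record
            nxt.extend(index[name])
        frontier = nxt
    return sorted(frontier), lookups


# Hypothetical sample graph, used only for illustration.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("c", "e")]
nodes = {n: Node(n) for pair in edges for n in pair}
for src, dst in edges:
    nodes[src].out.append(nodes[dst])

pointer_result = hops_pointer(nodes["a"], 2)
indexed_result, index_lookups = hops_indexed(edges, "a", 2)
print(pointer_result, indexed_result, index_lookups)
```

Both traversals reach the same nodes; the difference is that the indexed version pays a lookup for each visited record, and in a distributed store some of those lookups would also cross the network.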
16. Traversal Speeds
• Realistic retail dataset from Amazon
• Social recommendation (Java procedure) equivalent to:
MATCH (you)-[:BOUGHT]->(something)<-[:BOUGHT]-(other)-[:BOUGHT]->(reco)
WHERE id(you)={id}
RETURN reco
Threads    Hops/second
1          3-4M
10         17-29M
20         34-50M
30         36-60M
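As a rough illustration of what the recommendation pattern on this slide traverses, here is a minimal in-memory Python sketch. It is not the Java procedure that was benchmarked, and the `bought` data and the `recommend` function are hypothetical. It assumes at most one BOUGHT relationship per person and item, so that, as in the Cypher pattern, the other buyer differs from you and the recommended item differs from the shared item.

```python
from collections import Counter

# Hypothetical toy data: who BOUGHT what (not the Amazon dataset from the slide).
bought = {
    "you": {"book", "lamp"},
    "alice": {"book", "desk", "chair"},
    "bob": {"lamp", "desk"},
    "carol": {"pen"},
}


def recommend(bought, you):
    """Walk you -> something <- other -> reco and count the paths reaching each reco."""
    # Reverse adjacency: item -> buyers (the incoming BOUGHT relationships).
    buyers = {}
    for person, items in bought.items():
        for item in items:
            buyers.setdefault(item, set()).add(person)

    paths = Counter()
    for something in bought[you]:          # hop 1: you -> something
        for other in buyers[something]:    # hop 2: something <- other
            if other == you:
                continue                   # a relationship is not reused within one match
            for reco in bought[other]:     # hop 3: other -> reco
                if reco != something:
                    paths[reco] += 1
    return paths


recommendations = recommend(bought, "you")
print(recommendations.most_common())
```

Like the query on the slide, the sketch does not filter out items you already own; it only counts how many three-hop paths lead to each candidate.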
17. Write Scale
• Import highly connected Friendster dataset
• 1.8 billion relationships takes around 20 minutes
• That is 1M writes/second!
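The write-rate claim can be checked with simple arithmetic. The sketch below uses only the two figures from the slide; since "around 20 minutes" is approximate, the result is an estimate rather than a measured rate, and it counts relationships only.

```python
relationships = 1_800_000_000  # relationships imported, from the slide
minutes = 20                   # "around 20 minutes", from the slide

writes_per_second = relationships / (minutes * 60)
print(f"{writes_per_second:,.0f} relationship writes/second")
```

At exactly 20 minutes this works out to 1.5 million relationship writes per second, which is consistent with the slide's rounded "1M writes/second".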
18. Good News for Real-Time, In-Memory Graph Queries: Big RAM is Eating Big Data
19. Value from Data Relationships: Top Use Cases
Internal Applications
• Fraud Detection
• Master Data Management
• Network and IT Operations
Customer-Facing Applications
• Real-Time Recommendations
• Graph-Based Search
• Identity and Access Management
20. Solving Massive-Scale Challenges: Recommendations
People, Places, Things + Interests + Transactions + Activity
Each requires a new & higher level of scaling
21. Solving Massive-Scale Challenges: Fraud Detection
• Estimated cost in 2014: $16.31B (1)
• Fraud and the costs to prevent fraud are up 94% year over year (2)
• 62% of companies subject to payment fraud (3)
• Nearly 1 out of 4 declined transactions are false positives (4)
Sources: (1) The Nilson Report, 2015; (2), (4) 2015 LexisNexis True Cost of Fraud Survey; (3) 2015 AFP Payments Fraud and Control Survey
26. POWER8: Designed for data to deliver breakthrough performance
• 4X threads per core*
• 4X memory bandwidth*
• 4X more cache* @ lower latency
These design decisions result in best performance for data-centric workloads like: Database, NoSQL, Big Data Analytics, OLTP
[Figure: POWER8 (SMT8) vs. x86 (Hyperthread) compared on parallel processing and data flow; POWER8 + OpenPOWER vs. x86]
SMT = Simultaneous Multi-Threading
OLTP = On-Line Transaction Processing
27. The POWER of an open ecosystem
• 250 worldwide members of [logo]
• 30 hardware and technology providers
• 100+ collaborative innovations under way
• 2,500+ Linux ISVs developing on POWER
• 100,000+ open source packages
29. Power Systems Portfolio – Enterprise and Scale-out offerings
Offering | OS capability | Positioning in the Linux portfolio
Scale-up (E880, E870, E850) | Equally run AIX, IBM i and Linux with IFLs | Enterprise systems; Leadership Performance and Reliability; Utilization Guarantee (PowerVM – 70%/80%); Flexible, dynamic Capacity on Demand & Enterprise Pools
Scale-Out (S824, S822, S814) | Equally run AIX, IBM i and Linux | Scale out Systems; Utilization Guarantee (PowerVM – 65); High performance, availability and resiliency
L line (S824L, S822L, S812L) | Linux Only | Scale out Linux Systems; Price/Performance Leadership vs. x86; PowerVM, KVM
LC line (S812LC, S822LC BD, S822LC HPC, S821LC) | Linux Only | Cluster-optimized Linux Systems; Lowest cost Power System; KVM (New)
30. The IBM Power Systems Linux Portfolio
Offerings: The LC Line; The L Line; PurePower; Enterprise & IFLs
- Design and cost optimized for deployments of multiples (cloud and cluster)
- Broad number of optimal solutions
- Co-designed with the OpenPOWER Ecosystem
- Enterprise level RAS for single system deployments
- Solutions for Big Data & Analytics
- Converged infrastructure offering
- Rapid time to value and simplicity of management
- Enterprise level robustness and IFL capability
- Solution editions for in-memory databases (HANA, DB2 BLU)
- Hosted cloud and hybrid cloud solutions
- Rapid deployments and POCs
Support: Supported by Canonical; IBM Support; Community / 3rd Party Support
Pipeline of innovation
• Broad Linux portfolio delivers all your Linux deployment needs
• Expanding LC portfolio with two servers for data-centric applications and 2nd generation HPC server
POWER8 is designed for the Big Data era and delivers price-performance leadership to the Linux Market!
31. POWER8 CAPI: Coherent Accelerator Processor Interface
[Diagram: POWER8 with CAPP and Coherence Bus, connected over PCIe Gen3 to the PSL and a custom hardware application on an FPGA or ASIC]
Customizable Hardware Application Accelerator
• Specific system SW, middleware, or user application
• Written to durable interface provided by PSL
PCIe Gen3
• Transport for encapsulated messages
Processor Service Layer (PSL)
• Present robust, durable interfaces to applications
• Offload complexity / content from CAPP
Virtual Addressing
• Accelerator can work with same memory addresses that the processors use
• Pointers de-referenced same as the host application
• Removes OS & device driver overhead
Hardware Managed Cache Coherence
• Enables the accelerator to participate in “Locks” as a normal thread
• Lowers latency over I/O communication model
http://opencapi.org/
32. Why CAPI is Better than Traditional PCIe
[Diagram: Power Processor with CAPP, attached over PCIe to an FPGA hosting the AFU and the IBM Supplied POWER Service Layer]
Typical I/O Model Flow: DD Call → Copy or Pin Source Data → MMIO Notify Accelerator → Acceleration → Poll / Int Completion → Copy or Unpin Result Data → Ret. From DD Completion (Total ~13µs for data prep)
Flow with a Coherent Model: Shared Mem. Notify Accelerator → Acceleration → Shared Memory Completion (Total 0.36µs)
Advantages of Coherent Attachment Over I/O Attachment
• Virtual Addressing & Data Caching
– Shared Memory
– Lower latency for highly referenced data
• Easier, More Natural Programming Model
– Traditional thread level programming
– Long latency of I/O typically requires restructuring of application
• Enables Applications Not Possible on I/O
– Pointer chasing, etc…
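To put the two totals on this slide side by side, here is a small Python sketch. The step names are transcribed from the slide, and the pairing of each total with its flow follows the order in which the slide lists them, which is an assumption. The slide gives only totals, so no per-step latencies are modeled.

```python
# Steps as listed on the slide for each attachment model.
io_flow = [
    "DD call",
    "Copy or pin source data",
    "MMIO notify accelerator",
    "Acceleration",
    "Poll / interrupt completion",
    "Copy or unpin result data",
    "Return from DD completion",
]
coherent_flow = [
    "Shared-memory notify accelerator",
    "Acceleration",
    "Shared-memory completion",
]

io_total_us = 13.0        # "~13µs for data prep" (approximate, from the slide)
coherent_total_us = 0.36  # "0.36µs" (from the slide)

overhead_ratio = io_total_us / coherent_total_us
steps_removed = len(io_flow) - len(coherent_flow)
print(f"{steps_removed} fewer steps, about {overhead_ratio:.0f}x less overhead")
```

Under these assumptions the coherent model drops four software steps and cuts the quoted overhead by roughly a factor of 36.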
33. IBM Data Engine for NoSQL
Cost Savings for In-Memory NoSQL Data Stores
IBM Data Engine for NoSQL is an integrated platform for large and fast-growing NoSQL data stores. It builds on the CAPI capability of POWER8 systems and provides super-fast access to large flash storage capacity. It delivers high-speed access to both RAM and flash storage, which can result in significantly lower cost and higher workload density for NoSQL deployments than a standard RAM-based system. The solution offers superior performance and price-performance to scale-out x86 server deployments that are either limited in available memory per server or have flash memory with limited data access latency.
External Flash Configuration: Power S822L / S812L with FlashSystem 900. Up to 57TB of extended memory with one POWER8 server + CAPI-attached flash
Integrated Flash Configuration: Power S822L / S812L / S822LC (NEW). Up to 8TB of super-fast storage tier on one POWER8 server
34. CAPI Unlocks the Next Level of Performance for Flash
Identical hardware (IBM POWER S822L with FlashSystem) with 3 different paths to data: Conventional I/O (FC), CAPI - I, CAPI - E
[Chart: IOPS per Hardware Thread for Conventional, CAPI - I, CAPI - E; axis 0 to 450,000. Callout: >3x better IOPS per HW thread]
[Chart: Latency (microseconds) for Conventional, CAPI - I, CAPI - E; axis 0 to 200. Callout: Lower latency]
CAPI – I: Integrated CAPI Flash Card
CAPI – E: CAPI attached External Flash
35. POWER8 with CAPI enabled acceleration running Neo4j delivers 1.82X the performance versus Intel Broadwell servers with NVMe
[Chart: Throughput on a representative mixed workload. IBM Power S822LC (20c/160t): 711; x86 Broadwell server (24c/48t): 390. Callout: 82% more throughput]
• Accelerate Graph Databases with CAPI on POWER8
• Real-world mixed graph transaction workload running Neo4j on IBM Power S822LC server delivers 1.82X the throughput versus Intel Xeon E5-2650 v4 server
– POWER8 (20 cores / 128 GB): 711 Ops/sec
– Intel Xeon E5-2650 v4 processor (24 cores / 128 GB): 390 Ops/sec
• Based on IBM internal testing of a single system and OS image running mixed graph transactions based on a 200 GB data model, internal IBM and Neo4j workload. Conducted under laboratory conditions; individual results can vary based on workload size, use of storage subsystems & other conditions. Data as of October 19, 2016.
• IBM Power System S822LC; 20 cores (2 x 10c chips) / 160 threads, POWER8; 128 GB memory (16 x 8GB), 1.6 TB CAPI NVMe adapter, Neo4j 3.0.4, Ubuntu 16.04. Competitive stack: HP ProLiant DL380 Gen9; 24 cores (2 x 12c chips) / 48 threads; Intel E5-2650 v4; 128 GB memory (16 x 8GB), 1.6 TB NVMe adapter, Neo4j 3.0.4, Ubuntu 15.10.
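The headline ratio can be reproduced from the two throughput figures on this slide. The Python sketch below uses only numbers quoted there; the per-core figure is an extra derived view and not a claim made on the slide.

```python
# Figures quoted on the slide.
power8 = {"ops_per_sec": 711, "cores": 20, "threads": 160}
x86 = {"ops_per_sec": 390, "cores": 24, "threads": 48}

speedup = power8["ops_per_sec"] / x86["ops_per_sec"]
percent_more = (speedup - 1) * 100

# Derived view (not stated on the slide): throughput per core.
per_core_power8 = power8["ops_per_sec"] / power8["cores"]
per_core_x86 = x86["ops_per_sec"] / x86["cores"]
per_core_ratio = per_core_power8 / per_core_x86

print(f"{speedup:.2f}x throughput, {percent_more:.0f}% more")
print(f"{per_core_ratio:.2f}x throughput per core")
```

The per-server ratio matches the slide's 1.82X; because the POWER8 system has fewer cores, the per-core ratio comes out higher, at about 2.19x.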
36. POWER8 with CAPI enabled acceleration running Neo4j delivers 1.61X the price-performance versus Intel Xeon E5-2650 v4 with NVMe
Metric | IBM Power S822LC (20-core, 128GB) | HP DL380 Gen9 (24-core, 128GB)
Server price* (3-year warranty) | $19,123 | $16,911
Mixed graph transaction workload (total operations per second) | 711 | 390
Result: 1.61X price-performance; 1.82X performance per server
• Based on IBM internal testing of a single system and OS image running mixed graph transactions based on a 200 GB data model, internal IBM and Neo4j workload. Conducted under laboratory conditions; individual results can vary based on workload size, use of storage subsystems & other conditions. Data as of October 19, 2016.
• IBM Power System S822LC; 20 cores (2 x 10c chips) / 160 threads, POWER8; 128 GB memory (16 x 8GB), 1.6 TB CAPI NVMe adapter, Neo4j 3.0.4, Ubuntu 16.04. Competitive stack: HP ProLiant DL380 Gen9; 24 cores (2 x 12c chips) / 48 threads; Intel E5-2650 v4; 128 GB memory (16 x 8GB), 1.6 TB NVMe adapter, Neo4j 3.0.4, Ubuntu 15.10.
* Pricing is based on bundled pricing for S822LC with Integrated CAPI Flash card (IBM ordering system) and HP Web price https://h22174.www2.hp.com/SimplifiedConfig/Index
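The 1.61X price-performance figure follows from the prices and throughput numbers on this slide. The Python sketch below shows one way to derive it, assuming price-performance means operations per second per dollar of server price; the variable names are illustrative.

```python
# Figures quoted on the slide.
power8_price, power8_ops = 19_123, 711  # IBM Power S822LC
x86_price, x86_ops = 16_911, 390        # HP DL380 Gen9

# Assumption: price-performance = throughput per dollar of server price.
power8_ops_per_dollar = power8_ops / power8_price
x86_ops_per_dollar = x86_ops / x86_price
price_performance = power8_ops_per_dollar / x86_ops_per_dollar

# Equivalent view: dollars per unit of throughput (lower is better).
power8_dollars_per_op = power8_price / power8_ops
x86_dollars_per_op = x86_price / x86_ops

print(f"{price_performance:.2f}x price-performance")
print(f"${power8_dollars_per_op:.2f} vs ${x86_dollars_per_op:.2f} per op/sec")
```

Under that definition the POWER8 server costs about $26.90 per operation per second against about $43.36 for the x86 server, which gives the 1.61X ratio.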
37. Neo4j on IBM Power Systems
Solves Massive-Scale, Previously Unsolvable Problems
A paradigm shift accelerating time to insight and real-time decision making…
Bringing big data insights into action
38. Where Do I Go Next?
If you think that you have a graph problem, let’s qualify your use case
• neo4j.com/contact-us
• Info@neotechnology.com
• Your local IBM representative
• OpenDB@us.ibm.com
Learn more…
• About graphs & Neo4j @ http://neo4j.com
• Use cases / Case studies / Webinars / Training / Boot camp for your organization or team
• About IBM Power Systems @ http://www-03.ibm.com/systems/power
• About IBM Data Engine for NoSQL @ http://www-03.ibm.com/systems/power/solutions/bigdata-analytics/data-engine-nosql/
Editor's notes
Agenda:
Context & Value Proposition
Graph Database Uses, Architectures & Considerations
IBM POWER Overview
Neo4j and IBM POWER Systems Differentiation
Q&A
But also your problem… whatever size it may be.
R&D is ongoing in the lab. Current expectations are improvements of 2-4x over Intel with OpenPower, and CAPI Flash.
And deriving value from data-relationships is exactly what some of the most successful companies in the world have done.
Google created perhaps the most valuable advertising system of all time on top of their search engine, which is based on relationships between webpages.
LinkedIn created perhaps the most valuable HR tool ever, based on relationships among professionals.
And this is also what PayPal did, creating a peer-to-peer transaction service based on relationships.
Coherent Accelerator Processor Interface
IBM Data Engine for NoSQL
If you zoom into the previous picture, this is what you see… an individual, at the center of whose nervous system is a brain, that is an incredible network
The brain in fact is a network comprised of around 86 billion neurons and over 100 trillion synapses.
Our intelligence lies in the connections: the connectome. And when we process information, currents move along these channels.
Fix that effect
Continuing on: we have received very solid validation from these industry watchers that the market we are pursuing represents a huge opportunity. Being anointed as the leader in a market that is likely to grow at this rate is very exciting.
Neo4j Perspective = Graph First + Community First
We have received recognition from a broad cross section of the industry. On one hand, our technology continues to be seen as innovative and impactful in database management, particularly in the NoSQL space. On the other hand, more and more analyst firms are recognizing us as the most impactful player in the segment they focus on, which is the enterprise segment.
Macro/Portfolio View
Key virtuous cycle that occurs in data, and it’s a common architecture pattern
One of the things I’d like you to take away today is a perspective on the use of big data, and relationships in data
Similar to the brain, making decisions
We’re so often used to making calculations in batch
Neo4j on POWER8 enables disruptive innovation by bringing the power of graph analysis into your customer and end-user facing business applications
You might ask: why do you need a special database?
And in SQL, it’s a 20-table join
1M+ pointer-chasing operations per second per hardware thread
Neo4j best in class on traversal speed and scaling reads.
Super important macro-level trend:
Jure Leskovec from Stanford: analysis from GRADES 2016
Over the past 10 years, there are indications that in data analytics RAM size has been growing by 50% every year, while data size has been growing by only 20%.
“Maybe your data increases faster. Maybe you think data is bigger and increasing faster. But facts should trump opinions” – Szilard Pafka
Let’s look at the memory.
100G commodity, soon to be high-end laptop territory.
10T off the shelf (Jure’s group bought a 12TB machine)
1P not yet available in a commodity single machine, but 50% per year growth means it’s not far away (~7 years)
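The growth rates quoted in these notes compound quickly. The short Python sketch below uses only the 50% and 20% annual rates and the 10-year window from the notes, and assumes both rates hold constant over the whole period.

```python
ram_growth = 0.50   # RAM size growth per year, from the notes
data_growth = 0.20  # data size growth per year, from the notes
years = 10

ram_factor = (1 + ram_growth) ** years    # total RAM growth over the period
data_factor = (1 + data_growth) ** years  # total data growth over the period
relative_gain = ram_factor / data_factor  # how much RAM gains on data

print(f"RAM x{ram_factor:.1f}, data x{data_factor:.1f}, RAM gains x{relative_gain:.1f}")
```

With constant rates, RAM capacity grows about 58-fold in ten years while data grows about 6-fold, so memory gains roughly a factor of 9 on data size. This is the sense in which "big RAM is eating big data".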
Real-time Recommendations: Power real-time recommendations that incorporate current session and historical data, even from multiple data sources.
Fraud Detection: Detect and stop fraud as it happens with real-time analysis of data relationships.
Just like Recommendations: People, Places, Things + Interests + Transactions + Activity
POWER8 compared to Haswell EX
Sources: Haswell EX: http://ark.intel.com/products/84685/Intel-Xeon-Processor-E7-8890-v3-45M-Cache-2_50-GHz
POWER8: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=BR&infotype=PM&appname=STGE_PO_PO_USEN&htmlfid=POB03046USEN
Key Message: Our leading I/O and memory capabilities address the need for fast data access and movement across a wide range of BDA applications.
POWER8 is the first microprocessor designed for Big Data and Analytics. In IBM we have the advantage of having a hardware and software organization that can work together to optimize the entire solution stack.
When systems are designed for big data, there are a couple of key attributes that are important to create a balanced system design.
First having the processing capability, second having the memory space, the workspace, and the third is having the bandwidth, the ability to move the information in and out of the system at the rapid speeds required.
We’re delivering 4 times more threads per core vs. commodity infrastructure. We can easily support a growing number of users who need reports, or to perform ad hoc analytics. This is because the processor can run more concurrent queries in parallel faster, across multiple cores with more threads per core.
We’re delivering 3 times more memory bandwidth. Increased memory bandwidth to access up to 1 TB of memory for data operations and enlarged cache in every processor. This delivers the levels of performance your teams need to make decisions in real time.
We’re delivering faster IO to ingest, move and access large volumes of data so that analytics results are available faster.
Power Systems provide the capabilities needed to handle the varying analytics initiatives your business requires.
Across a broad range of data and analytics, from operational to computational to business analytics, as well as cognitive solutions leveraging IBM Watson technology, Power Systems are optimized for performance and can scale to support demanding and growing workloads.
These solutions help you capitalize on the currency of data by finding business insights faster and more efficiently.
Scale up and/or out depending on your application
Performance and Scale as YOU Need