2. About Me
• Max De Marzi - Neo4j Field Engineer
• My Blog: http://maxdemarzi.com
• Find me on Twitter: @maxdemarzi
• Email me: maxdemarzi@gmail.com
• GitHub: http://github.com/maxdemarzi
3. Big Data - What is it good for?
• Absolutely Nothing!
• Benchmarks: Is this performing better than that? Yes. Why? Uh.
• Recommendations: You should buy this right now.
• Predictions: You will probably buy this.
8. Collaborative Filtering Recommendations
• Step 1: Collect User Behavior
• Step 2: Find similar Users
• Step 3: Recommend Behavior taken by similar users
• Example: People with similar musical tastes
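The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the `listens` data, the user names, and the choice of Jaccard overlap as the similarity measure are all assumptions made for the example.

```python
# Step 1: collected user behavior (hypothetical listening data).
listens = {
    "ann": {"jazz", "blues", "soul"},
    "bob": {"jazz", "blues", "funk"},
    "cat": {"metal", "punk"},
}


def jaccard(a, b):
    """Overlap of two behavior sets: 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_users(user, behavior, k=1):
    """Step 2: the k users whose behavior overlaps most with `user`."""
    scores = [
        (jaccard(behavior[user], items), other)
        for other, items in behavior.items()
        if other != user
    ]
    # Users with no overlap at all are not considered similar.
    scores = [(s, other) for s, other in scores if s > 0]
    return [other for _, other in sorted(scores, reverse=True)[:k]]


def recommend(user, behavior, k=1):
    """Step 3: behavior taken by similar users that `user` has not taken."""
    seen = behavior[user]
    recs = set()
    for other in similar_users(user, behavior, k):
        recs |= behavior[other] - seen
    return sorted(recs)


print(recommend("ann", listens))
```

Here ann and bob share two of four genres, so bob is ann's nearest user and his remaining genre is what gets recommended.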
10. Using Relationships for Recommendations
Content-based filtering: Recommend items based on what users have liked in the past
Collaborative filtering: Predict what users like based on the similarity of their behaviors, activities and preferences to others
[Diagram: a Person node RATED (rating: 7) a Movie node; two Person nodes connected by SIMILARITY (value: .92)]
12. Benefits of Real-Time Recommendations
Online Retail
• Suggest related products and services
• Increase revenue and engagement
Media and Broadcasting
• Create an engaging experience
• Produce personalized content and offers
Logistics
• Recommend optimal routes
• Increase network efficiency
13. Challenges for Real-Time Recommendations
Make effective real-time recommendations
• Timing is everything in point-of-touch applications
• Base recommendations on current data, not last night’s batch load
Process large amounts of data and relationships for context
• Relevance is king: Make the right connections
• Drive traffic: Get users to do more with your application
Accommodate new data and relationships continuously
• Systems get richer with new data and relationships
• Recommendations become more relevant
14. Relational vs. Graph Models
[Diagram: Relational Model (Person, Movie and Ratings, with MAX as the example row) beside Graph Model (MAX node with RATED relationships to Terminator, Toy Story and Titanic)]
15. Cypher Query Language
    MATCH (:Person { name: "Dan" })-[:KNOWS]->(:Person { name: "Ann" })
[Diagram: node Dan -KNOWS-> node Ann; in each node pattern, Person is marked as the Label and name as the Property]
16. Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, up to 3 levels down
    MATCH (boss)-[:MANAGES*0..3]->(sub),
          (sub)-[:MANAGES*1..3]->(report)
    WHERE boss.name = "John Doe"
    RETURN sub.name AS Subordinate, count(report) AS Total
[Slide shows the Cypher Query beside the SQL Query]
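To make the variable-length patterns on this slide concrete, here is a rough Python sketch of the same traversal over an in-memory reporting tree. The `manages` data and all names except "John Doe" are hypothetical, and the sketch assumes the hierarchy is a tree (one manager per person), so each person is reached by exactly one path.

```python
# Hypothetical reporting tree: manager -> people they directly manage.
manages = {
    "John Doe": ["Amy", "Bo"],
    "Amy": ["Cy", "Di"],
    "Cy": ["Ed"],
}


def descendants(tree, start, min_depth, max_depth):
    """People reachable from `start` in min_depth..max_depth MANAGES hops."""
    found = []
    frontier = [start]
    for depth in range(max_depth + 1):
        if depth >= min_depth:
            found.extend(frontier)
        # Move one level down the hierarchy.
        frontier = [r for m in frontier for r in tree.get(m, [])]
    return found


def report_counts(tree, boss):
    """For each `sub` within 0..3 hops of boss, count reports 1..3 hops below."""
    counts = {}
    for sub in descendants(tree, boss, 0, 3):
        reports = descendants(tree, sub, 1, 3)
        # Like the MATCH, a sub with no reports produces no row.
        if reports:
            counts[sub] = len(reports)
    return counts


print(report_counts(manages, "John Doe"))
```

Because the first pattern starts at zero hops, the boss appears in the result alongside the subordinates, which mirrors the `*0..3` range in the query.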
20. Cypher Query: Movie Recommendation
    MATCH (watched:Movie {title: "Toy Story"})<-[r1:RATED]-()-[r2:RATED]->(unseen:Movie)
    WHERE r1.rating > 7 AND r2.rating > 7
      AND watched.genres = unseen.genres
      AND NOT( (:Person {username: "maxdemarzi"})-[:RATED|WATCHED]->(unseen) )
    RETURN unseen.title, COUNT(*)
    ORDER BY COUNT(*) DESC
    LIMIT 25
What are the Top 25 Movies
• that I haven't seen
• with the same genres as Toy Story
• given high ratings
• by people who liked Toy Story
22. Cypher Query: Ratings of Two Users
    MATCH (p1:Person {name: 'Michael Sherman'})-[r1:RATED]->(m:Movie),
          (p2:Person {name: 'Michael Hunger'})-[r2:RATED]->(m:Movie)
    RETURN m.name AS Movie,
           r1.rating AS `M. Sherman's Rating`,
           r2.rating AS `M. Hunger's Rating`
What are the Movies these 2 users have both rated
24. Cypher Query: Cosine Similarity
    MATCH (p1:Person)-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:Person)
    WITH SUM(x.rating * y.rating) AS xyDotProduct,
         SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
         SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
         p1, p2
    MERGE (p1)-[s:SIMILARITY]-(p2)
    SET s.similarity = xyDotProduct / (xLength * yLength)
Calculate it for all Person nodes with at least one Movie between them
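For readers who want to check the arithmetic outside the database, the same calculation can be sketched in Python. As in the query, only movies rated by both people enter the dot product and both vector lengths. The function name and the sample ratings are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two {movie: rating} dicts over co-rated movies.

    Returns None when there is no shared movie or a vector has zero length,
    mirroring the fact that the query only pairs people who share a movie.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    dot = sum(a[m] * b[m] for m in shared)
    len_a = math.sqrt(sum(a[m] ** 2 for m in shared))
    len_b = math.sqrt(sum(b[m] ** 2 for m in shared))
    if len_a == 0 or len_b == 0:
        return None
    return dot / (len_a * len_b)


# Hypothetical ratings for two people.
p1 = {"Toy Story": 3, "Titanic": 4, "Terminator": 9}
p2 = {"Toy Story": 4, "Titanic": 3}

print(cosine_similarity(p1, p2))
```

One consequence worth knowing: when two people share exactly one movie and both ratings are positive, the similarity is always 1.0 regardless of the ratings, so a minimum number of shared movies may be a sensible extra filter.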
26. Cypher Query: Your nearest neighbors
    MATCH (p1:Person {name: 'Grace Andrews'})-[s:SIMILARITY]-(p2:Person)
    WITH p2, s.similarity AS sim
    ORDER BY sim DESC
    LIMIT 5
    RETURN p2.name AS Neighbor, sim AS Similarity
Who are the
• top 5 Persons and their similarity score
• ordered by similarity in descending order
• for Grace Andrews
28. Cypher Query: k-NN Recommendation
    MATCH (m:Movie)<-[r:RATED]-(b:Person)-[s:SIMILARITY]-(p:Person {name: 'Zoltan Varju'})
    WHERE NOT( (p)-[:RATED]->(m) )
    WITH m, s.similarity AS similarity, r.rating AS rating
    ORDER BY m.name, similarity DESC
    WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
    WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS recommendation
    ORDER BY recommendation DESC
    RETURN movie, recommendation
    LIMIT 25
What are the Top 25 Movies
• that Zoltan Varju has not seen
• using the average rating
• by my top 3 neighbors
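The logic of this query can be sketched in Python as follows. Note what the query actually does: for each unseen movie it keeps the ratings of the three most similar people who rated that movie, then averages them. The function name, the sample data, and the alphabetical tie-break are assumptions for illustration.

```python
def knn_recommend(user, ratings, similarity, k=3, limit=25):
    """Rank unseen movies by the average rating of the k most similar raters.

    ratings:    {person: {movie: rating}}
    similarity: {person: {neighbor: similarity score}}
    """
    seen = ratings[user]
    candidates = {}  # movie -> list of (similarity, rating) pairs
    for neighbor, sim in similarity[user].items():
        for movie, rating in ratings[neighbor].items():
            if movie not in seen:
                candidates.setdefault(movie, []).append((sim, rating))

    scored = []
    for movie, pairs in candidates.items():
        # Keep the ratings from the k most similar neighbors for this movie.
        top = sorted(pairs, key=lambda p: p[0], reverse=True)[:k]
        average = sum(rating for _, rating in top) / len(top)
        scored.append((movie, average))

    # Highest average first; ties broken alphabetically (an assumption).
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]


# Hypothetical data.
ratings = {
    "zoltan": {"Toy Story": 9},
    "a": {"Toy Story": 8, "Titanic": 6, "Terminator": 9},
    "b": {"Titanic": 8, "Terminator": 8},
    "c": {"Titanic": 10},
    "d": {"Titanic": 1},
}
similarity = {"zoltan": {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}}

print(knn_recommend("zoltan", ratings, similarity))
```

In this data the low rating from "d" never counts toward Titanic, because three more similar neighbors have already rated it.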
30. Recommend Jobs to Job Seekers
What connects them?
• location
• skills
• education
• experience
31. Cypher Query: Job Recommendation
What are the Top 10 Jobs for me
• that are in the same location I’m in
• for which I have the necessary qualifications
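The query itself did not survive in this transcript, so the following Python sketch only illustrates the two criteria listed above. The data shapes, field names, and the idea of scoring each job by the fraction of required skills the seeker holds are assumptions, not the query from the slide.

```python
def recommend_jobs(seeker, jobs, limit=10):
    """Rank same-location jobs by the share of required skills the seeker has.

    Returns (title, match fraction, missing skills) tuples, best match first.
    """
    matches = []
    for job in jobs:
        if job["location"] != seeker["location"]:
            continue
        required = job["requires"]
        held = required & seeker["skills"]
        # A job with no stated requirements counts as a full match.
        score = len(held) / len(required) if required else 1.0
        missing = sorted(required - seeker["skills"])
        matches.append((job["title"], score, missing))
    matches.sort(key=lambda m: (-m[1], m[0]))
    return matches[:limit]


# Hypothetical seeker and job postings.
seeker = {"location": "Chicago", "skills": {"java", "neo4j", "sql"}}
jobs = [
    {"title": "Data Engineer", "location": "Chicago", "requires": {"sql", "spark"}},
    {"title": "Graph Developer", "location": "Chicago", "requires": {"java", "neo4j"}},
    {"title": "Graph Architect", "location": "New York", "requires": {"neo4j"}},
]

print(recommend_jobs(seeker, jobs))
```

Returning the missing skills alongside the score makes it easy to separate 100% matches from near misses.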
32. Job Recommendation Results
Perfect Candidate for 100% matches
• missing qualifications can be added quickly
• might encourage exaggerated resumes
33. Just one tiny itsy bitsy problem
Job Boards get paid by
• Number of Applicants to a Job
• Wholesale Resume sales
• Selling your data
34. Recommend Love
Find your soulmate in the graph
• Are they energetic?
• Do they like dogs?
• Have a good sense of humor?
• Neat and tidy, but not crazy about it?
What are the Top 10 Potential Mates for me
• that are in the same location
• are sexually compatible
• have traits I want
• want traits I have
48. Hacker News Recommendations
• Which stories should I read?
• Which users should I follow?
• What else should I be interested in?
• Who seems to know a lot about X?
• Etc.
49. GraphAware Recommendation Framework
• Ability to trade off recommendation quality for speed
• Ability to pre-compute recommendations
• Built-in algorithms and functions
• Ability to measure recommendation quality
• Ability to easily run in A/B test environments
51. Walmart BUSINESS CASE
World’s largest company by revenue
World’s largest retailer and private employer
SF-based global e-commerce division manages several websites
Founded in 1969 in Bentonville, Arkansas
• Needed online customer recommendations to keep pace with competition
• Data connections provided predictive context, but were not in a usable format
• Solution had to serve many millions of customers and products while maintaining superior scalability and performance
52. Walmart SOLUTION
• Brings customers, preferences, purchases, products and locations into a graph model
• Uses connections to make product recommendations
• Solution deployed across Walmart divisions and websites
53. Global Courier BUSINESS CASE
World’s largest courier
480,000 employees
€55 billion in revenue
Needed new B2C and B2B parcel routing system for its logistics practice
Legacy system supported neither the full network nor the shift to online demands
Needed to replace aging B2B and B2C parcel routing system whose requirements include:
• 24x7 availability
• Peak loads of 5M parcels per day, 3K per second
• Support for complex and diverse software stack
• Predictable performance with linear scalability
• Daily changes to logistics networks
• Route from any point to any point
• Single point of truth for entire network
54. Global Courier SOLUTION
Neo4j provides the ideal domain fit since a logistics network is a graph
• High availability and performance via Neo4j clustering
• Greatly simplified Cypher queries for routing versus relational SQL queries
• Flexible data model that reflects the real logistics world far better than relational
• Easy-to-grasp whiteboard-friendly model
55. eBay BUSINESS CASE
C2C and B2C retail network
Full e-commerce functionality for individuals and businesses
Integrated with logistics vendors for product deliveries
• Needed an offering to compete with Amazon Prime
• Enable customer-selected delivery inside 90 minutes
• Calculate best route option in real-time
• Scale to enable a variety of services
• Offer more predictable delivery times
56. eBay Now SOLUTION
• Acquired UK-based Shutl, a leader in same-day delivery
• Used Neo4j to create eBay Now
• 1000 times faster than the prior MySQL-based solution
• Faster time-to-market
• Improved code quality with 10 to 100 times less query code
57. Classmates BUSINESS CASE
Online yearbook connecting friends from school, work and military in US and Canada
Founded as Memory Lane in Seattle
Develop new social networking capabilities to monetize yearbook-related offerings
• Show all the people I know in a yearbook
• Show yearbooks my friends appear in most often
• Show sections of a yearbook that my friends appear most in
• Show me other schools my friends attended
58. Classmates SOLUTION
Neo4j provides a robust and scalable graph database solution
• 3-instance cluster with cache sharding and disaster-recovery
• 18ms response time for top 4 queries
• 100M nodes and 600M relationships in initial graph, including people, images, schools, yearbooks and pages
• Projected to grow to 1B nodes and 6B relationships
59. National Geographic BUSINESS CASE
Non-profit scientific and educational institution founded in 1888
Covers geography, archaeology, natural science, environment and historical conservation
Journals, online media, radio, TV, documentaries, live events and consumer content and goods
• Improve poor performance of PostgreSQL app
• Increase user engagement by linking to 100+ years of multimedia content
• Improve targeting by understanding subscribers’ interests better
• Recommend content and services to users based on their interests
60. National Geographic SOLUTION
• Enabled complex real-time analytics across eight million users and a century of content
• Delivered robust performance by eliminating triple-nested SQL joins
• Cross-refers users among content, live events, travel, goods and causes
• Neo4j solution much less cumbersome and easier to maintain than previous SQL system
61. Curaspan BUSINESS CASE
Leader in patient management for discharges and referrals
Manages patient referrals for 4600+ health care facilities
Connects providers and payers via web-based patient management platform
Founded in 1999 in Newton, Massachusetts
• Improve poor performance of Oracle solution
• Support more complexity including granular, role-based access control
• Satisfy complex Graph Search queries by discharge nurses and intake coordinators
Find a skilled nursing facility within n miles of a given location, belonging to health care group XYZ, offering speech therapy and cardiac care, and optionally Italian language services
62. Curaspan SOLUTION
• Met fast, real-time performance demands
• Supported queries that span multiple hierarchies, including provider and employee-permissions graphs
• Improved data model to handle adding more dimensions to the data such as insurance networks, service areas and care organizations
• Greatly simplified queries, reducing multi-page SQL statements to one Neo4j function
63. FiftyThree BUSINESS CASE
Maker of Paper, one of the top apps in Apple’s App Store, with millions of users
Based in New York City
• Add social capabilities to digital-paper app
• Support social collaboration across millions of users in new Mix app
• Enable seamless interaction between social and content-asset networks
• Ensure new apps are robust, scalable and fast
64. FiftyThree SOLUTION
• Neo4j data model ideal for social network, content management and access control
• Users create, publish and share designs simply
• Easy to develop and evolve Neo4j-based app
• Integrates well with FiftyThree EC2 architecture
See the Neo4j solution in action: Betting the Company (Literally) on a Graph Database
http://aseemk.com/talks/neo4j-lessons-learned#/
App Store Editor’s Choice
2012 iPad App of Year
Apple Best Apps of 2014
65. Questions
• How does Neo4j fit into my existing infrastructure? As a Service.
• Will Neo4j scale? Yes.