2. What’s Squidoo?
Passionate people sharing the ideas they care about
Social publishing platform with over 3 million users
100 million+ pageviews per month; ranked #35 in the US
by Quantcast
3. Introducing Postcards
A brand new product from Squidoo
Currently in private beta (not public just yet)
Single page, beautifully designed personal
recommendations of books, movies, music albums,
quotes, and other products and media types
5. Semantic Web
A group of methods and technologies that allow machines
to understand the meaning, or "semantics", of information
7. Postcards get better with the Semantic Web
We parse web pages and external APIs to extract
meaning.
Web pages - Meta and Open Graph tags
Title, Description, Photo, and Video
External APIs
Amazon, IMDB, Freebase, Google, YouTube, Bing,
and more
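The page-parsing step above can be sketched with nothing but the Python standard library. This is a minimal illustration, not Squidoo's actual crawler: the class name `MetaTagParser`, the function `extract_metadata`, and the returned field names are all hypothetical, and the `og:type` lookup is an assumption added to show where a media-type hint could come from.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects the <title> text plus <meta name=...> and <meta property=...> values."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph uses property="og:..."; classic meta tags use name="..."
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content:
                self.meta[key.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.meta.setdefault("title", data.strip())


def extract_metadata(html):
    """Return title, description, photo, video, and type, preferring Open Graph values."""
    parser = MetaTagParser()
    parser.feed(html)
    m = parser.meta
    return {
        "title": m.get("og:title") or m.get("title"),
        "description": m.get("og:description") or m.get("description"),
        "photo": m.get("og:image"),
        "video": m.get("og:video"),
        "type": m.get("og:type"),  # assumption: used here as a media-type hint
    }
```

Calling `extract_metadata(page_html)` on a fetched page returns a dictionary whose missing fields are `None`, which makes it easy to decide when an external API lookup is still needed.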
8. Problem is normalization
The meta tag “Hotel California” on a web page is not
particularly useful unless I know the tag is music related
- then I can search for music albums containing Hotel
California.
This is not easy, but the web as a whole is becoming
more structured.
9. Connecting the Dots
Crawl a web page or API to extract metadata
Store subjects, nouns, adjectives, and possessives into
Neo
Query Neo to organize subjects into Stacks based on
nouns, adjectives, and possessives
10. Stacking Up
Postcards are organized into Stacks. Stacks are a
taxonomy based on media type and other common
factors. Ex:
Books Stack
Crime Novel Books Stack
Tom Clancy Books Stack
Stacks are created automatically based on the metadata
associated with each Postcard.
A minimum of three Postcards is required for a Stack to
exist.
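The Stack rules above can be sketched in plain Python. This is only an illustration under stated assumptions: the function `build_stacks`, the dictionary layout of a Postcard, and the naive "add an s" pluralization are hypothetical, and the real system derives Stacks from the graph rather than from an in-memory list.

```python
from collections import defaultdict

MIN_POSTCARDS_PER_STACK = 3  # the threshold stated on the slide


def build_stacks(postcards):
    """Group Postcards into candidate Stacks and keep those with enough members.

    Each postcard is a dict with 'subject', 'nouns', 'adjectives', 'possessives'.
    Returns {stack name: sorted list of subjects}.
    """
    candidates = defaultdict(set)
    for card in postcards:
        for noun in card["nouns"]:
            plural = noun + "s"  # naive pluralization, for illustration only
            candidates[f"{plural} Stack"].add(card["subject"])
            for adjective in card["adjectives"]:
                candidates[f"{adjective} {plural} Stack"].add(card["subject"])
            for owner in card["possessives"]:
                candidates[f"{owner} {plural} Stack"].add(card["subject"])
    return {
        name: sorted(subjects)
        for name, subjects in candidates.items()
        if len(subjects) >= MIN_POSTCARDS_PER_STACK
    }
```

With three Tom Clancy books and one book by another author, this yields a Books Stack and a Tom Clancy Books Stack, while the single-member candidates are dropped.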
11. Modeling Taxonomy
We found that the “Parts of Speech” are a great way to
model the Postcards taxonomy.
All Postcards have:
Name of the item (subject)
Domains or media types (nouns)
Descriptors (adjectives)
Owners or creators (possessives)
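As a rough sketch of this model, a Postcard can be represented as a small record that expands into graph-style (node, relationship, node) triples. The class and method names are hypothetical. Of the relationship names, only `BY` appears in the deck's own Cypher queries; `TYPE` and `DESCRIBED_AS` are placeholders invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Postcard:
    subject: str                                      # name of the item
    nouns: list = field(default_factory=list)         # domains or media types
    adjectives: list = field(default_factory=list)    # descriptors
    possessives: list = field(default_factory=list)   # owners or creators

    def to_triples(self):
        """Expand the record into (subject, relationship, target) triples."""
        triples = []
        for noun in self.nouns:
            triples.append((self.subject, "TYPE", noun))               # placeholder name
        for adjective in self.adjectives:
            triples.append((self.subject, "DESCRIBED_AS", adjective))  # placeholder name
        for owner in self.possessives:
            triples.append((self.subject, "BY", owner))
        return triples
```

For example, `Postcard("Purple Cow", ["Book"], ["Business"], ["Seth Godin"]).to_triples()` produces one triple per part of speech, which is the shape of data a graph database stores natively.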
14. Very familiar with MySQL.
Extremely reliable.
Relational model makes normalization possible, but
scaling is a concern as joins get larger and larger.
15. Schema and Queries
Schema:
CREATE TABLE post_meta (
  post_id BIGINT,
  user_id VARCHAR,
  date_created SMALLINT,
  subject VARCHAR,
  noun VARCHAR,
  KEY (user_id),
  KEY (date_created),
  KEY (subject),
  KEY (noun)
);

CREATE TABLE adjectives (
  post_id BIGINT,
  user_id VARCHAR,
  adjective VARCHAR,
  PRIMARY KEY (user_id, adjective),
  KEY (adjective)
);

CREATE TABLE possessives (
  post_id BIGINT,
  user_id VARCHAR,
  possessive VARCHAR,
  PRIMARY KEY (user_id, possessive),
  KEY (possessive)
);

Queries:
Seth Godin’s Business Books
SELECT m.post_id FROM post_meta m
JOIN possessives USING(user_id)
JOIN adjectives USING(user_id)
WHERE
  possessive='Seth Godin'
  AND adjective='Business'
  AND noun='Book';

90s Rock Music Albums
SELECT m.post_id FROM post_meta m
JOIN adjectives USING(user_id)
WHERE
  adjective='Rock'
  AND noun='Music'
  AND date_created BETWEEN 1990 AND 1999;
16. At Squidoo, used primarily for analytics.
Massively scalable, but no relational model or
aggregation features. Heavy denormalization required.
Many operations have to be performed asynchronously
using queues or batch processes.
17. Truly Relational
Our data model is very much a graph problem
Recommendation systems are one query away (easy!)
Meets all our tech requirements
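To illustrate why recommendations map naturally onto a graph, here is a plain-Python stand-in for a two-hop traversal: user, to the Postcards they liked, to other users who liked the same Postcards, to what those users liked. It is a sketch only; the function `recommend` and its scoring by overlap count are assumptions, not the query Squidoo runs.

```python
from collections import Counter


def recommend(likes, user, limit=3):
    """Suggest Postcards liked by users who share likes with `user`.

    likes: {user id: set of postcard ids}
    Each candidate is scored by the overlap between its fan and `user`.
    """
    mine = likes.get(user, set())
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared likes, so no path through the graph
        for post in theirs - mine:
            scores[post] += overlap
    # Highest score first; ties broken by postcard id for stable output
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [post for post, _ in ranked[:limit]]
```

In a graph database the same idea is a single pattern match over the like relationships, with no join tables to maintain.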
19. Evaluating Tech Requirements
High availability
Great administrative tools
Great PHP wrapper
https://github.com/jadell/neo4jphp
Commercial support
20. Learning to think in graphs was HARD, but now feels NATURAL
Should it be a node or a property?
Which direction should the relationship point?
More so than any other type of database I’ve encountered,
graph DBs require you to know in advance exactly what
queries you’ll need to perform.
24. First Prototype
Basic HTML
REST API only
Easy to get started, but the real power comes from Cypher
25. Extending the Prototype with Cypher
Implement Cypher for recommendations and other
traversals.
Cypher looks intimidating at first, and the “it’s like SQL”
analogy was not particularly helpful for me.
However, Cypher is essential for using Neo’s most
powerful features, and is worth learning. Once you get
past the strange (but necessary) arrow syntax, it does
start to feel like SQL.
27. Tip #1: Use reference nodes
START ref=node:Meta(title = "Actor")
MATCH ref<-[:IS]-actor
RETURN actor;
28. Tip #2: Use reference properties
foreach ($posts as $post) {
    if ($post->getProperty('type') == 'Actor') {
        // do something special for actors
    }
}
29. Tip #3: Schema Changes
At first, there were a lot of schema changes during
development
No equivalent to MySQL’s ALTER TABLE or
TRUNCATE TABLE
Two options:
Shut down Neo, rm -rf data/graph.db/*, and restart
Or use this plugin: http://bitly.com/rHFSu6
With the plugin, node IDs do not restart from zero
30. Tip #3.1: Schema Changes
Wiped your DB and need to start over? Use an initialization script to set things up.
function initialize() {
    // Node 0 is the reference node; use it as the root of the taxonomy
    $master = $this->client->getNode(0);
    $master->setProperty('title', 'Master')->setProperty('parent', '')->save();

    // should be node 1
    $user_master = $this->client->makeNode();
    $user_master->save();

    // Indexes for users, posts, and the master taxonomy nodes
    $user_index = new Everyman\Neo4j\Index\NodeIndex($this->client, 'users');
    $user_index->save();
    $post_index = new Everyman\Neo4j\Index\NodeIndex($this->client, 'post');
    $post_index->save();
    $index = new Everyman\Neo4j\Index\NodeIndex($this->client, 'master');

    // One reference node and one index per top-level noun (media type)
    $nouns = array('Movie', 'Music', 'TV', 'Book', 'Video', 'Article', 'Photo', 'Product', 'Game', 'Squidoo');
    foreach ($nouns as $noun) {
        $node = $this->client->makeNode();
        $node->setProperty('title', $noun)->setProperty('type', 'master')->save();
        $index->add($node, 'noun', $noun);
        $index->save();
        $node->relateTo($master, 'IS')->save();
        $noun_index = new Everyman\Neo4j\Index\NodeIndex($this->client, $noun);
        $noun_index->save();
    }
}
37. Single User’s Stack about Director Martin Scorsese
START user=node({user_id})
MATCH user-[:POSTED]->post-[:POST]->subject-[:`BY`]->possessive
WHERE possessive.title={meta} AND subject.type={noun}
RETURN DISTINCT post, COLLECT(subject) as subject;
{user_id} = 123
{meta} = 'Martin Scorsese'
{noun} = 'Movie'
38. Finding Stacks for a Postcard
START post=node:post(post_id={post_id})
MATCH post-[:POST]->subject-->adjective-[:IS]->parent
RETURN subject, adjective, parent;
39. Finding a user’s “Liked” Postcards
START user=node({user_id})
MATCH user-[:LIKED]->post-[:POST]->subject
RETURN DISTINCT post, COLLECT(subject) as subject;
40. Popularity Sorting
Popularity is based on Likes, Comments, and other social
signals, using a time decay factor to favor newer Postcards.
It was difficult to find an algorithm that allowed us to support time
decay without having to constantly re-score all Postcards.
Long story short, we use Cypher’s ORDER BY for sorting. We
perform a calculation based on pop_score and pop_date
properties that exist in each Postcard node.
An individual Postcard’s pop_score and pop_date are
updated in real time when someone interacts with it.
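The deck does not give the formula, so the following is only one well-known way to get time decay without re-scoring, shown as a sketch rather than as Squidoo's algorithm. It assumes exponential decay with a hypothetical one-week half-life: each interaction decays the stored `pop_score` up to the present and adds a weight, and the ranking key `log(pop_score) + DECAY * pop_date` does not depend on the current time, so untouched Postcards never need updating.

```python
import math
from dataclasses import dataclass

HALF_LIFE_SECONDS = 7 * 24 * 3600          # assumption: score halves every week
DECAY = math.log(2) / HALF_LIFE_SECONDS


@dataclass
class Postcard:
    name: str
    pop_score: float = 0.0   # decayed score as of pop_date
    pop_date: float = 0.0    # Unix time of the last interaction


def record_interaction(card, weight, now):
    """Decay the stored score up to `now`, then add the new signal's weight."""
    elapsed = now - card.pop_date
    card.pop_score = card.pop_score * math.exp(-DECAY * elapsed) + weight
    card.pop_date = now


def current_score(card, now):
    """The score as it would be if decayed to `now` (for reference only)."""
    return card.pop_score * math.exp(-DECAY * (now - card.pop_date))


def sort_key(card):
    """Time-independent ranking key; ordering matches current_score at any time."""
    if card.pop_score <= 0:
        return -math.inf
    return math.log(card.pop_score) + DECAY * card.pop_date
```

Because the key is computed only from the two stored properties, it can be evaluated inside a sort expression at query time, which fits the approach of sorting on `pop_score` and `pop_date` described above.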