Neo4j at Seth Godin’s Squidoo
with Chief Engineer Gil Hildebrand
What’s Squidoo?

Passionate people sharing the ideas they care about
Social publishing platform with over 3 million users
100 million+ pageviews per month; Quantcast ranked #35 in the US
Introducing Postcards
A brand new product from Squidoo
Currently in private beta (not public just yet)
Single-page, beautifully designed personal recommendations of books, movies, music albums, quotes, and other products and media types
Semantic Web

A group of methods and technologies that allow machines to understand the meaning, or "semantics", of information
Postcards get better with the Semantic Web
We parse web pages and external APIs to extract meaning.
  Web pages: Meta and Open Graph tags
    Title, Description, Photo, and Video
  External APIs
    Amazon, IMDb, Freebase, Google, YouTube, Bing, and more
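
As a rough illustration of the web-page half of this step, the sketch below pulls Open Graph tags out of an HTML string using PHP's built-in DOM extension. The function name extractOpenGraphTags and the sample markup are made up for this example; it is not Squidoo's actual parser.

```php
<?php
// Hypothetical helper: collect <meta property="og:*"> tags from a page.
function extractOpenGraphTags(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; keep libxml warnings out of the output.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        // Keep only Open Graph properties and drop the "og:" prefix.
        if (strpos($property, 'og:') === 0) {
            $tags[substr($property, 3)] = $meta->getAttribute('content');
        }
    }
    return $tags;
}

// Sample markup, invented for this sketch.
$html = '<html><head>'
      . '<meta property="og:title" content="Hotel California">'
      . '<meta property="og:type" content="music.album">'
      . '<meta name="description" content="An album page">'
      . '</head><body></body></html>';

print_r(extractOpenGraphTags($html));
```

Plain meta tags (such as the description above) are ignored here; a fuller crawler would presumably fall back to them when no Open Graph tags are present.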
Problem is normalization

The meta tag “Hotel California” on a web page is not particularly useful unless I know the tag is music related. Once I know that, I can search for music albums containing “Hotel California”.
This is not easy, but the web as a whole is becoming more structured.
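
One place where structure already helps is the Open Graph og:type tag. The sketch below maps a few standard og:type values onto the media-type names used later in this deck (Movie, Music, TV, Book, and so on). The function name and the mapping table are assumptions for illustration only, not the rules Postcards actually applies.

```php
<?php
// Hypothetical lookup: turn an Open Graph type into a Postcards media type.
function nounForOpenGraphType(string $ogType): ?string
{
    // Assumed mapping; extend it as more types need to be recognized.
    $map = [
        'music.album'   => 'Music',
        'music.song'    => 'Music',
        'video.movie'   => 'Movie',
        'video.tv_show' => 'TV',
        'video.episode' => 'TV',
        'video.other'   => 'Video',
        'book'          => 'Book',
        'article'       => 'Article',
    ];
    $key = strtolower(trim($ogType));
    // Unknown types return null so the caller can try another signal.
    return $map[$key] ?? null;
}

var_dump(nounForOpenGraphType('music.album'));
var_dump(nounForOpenGraphType('website'));
```

With a media type in hand, a bare tag like “Hotel California” can be sent to a music-specific API instead of a generic search.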
Connecting the Dots

Crawl a web page or API to extract metadata
Store subjects, nouns, adjectives, and possessives in Neo
Query Neo to organize subjects into Stacks based on nouns, adjectives, and possessives
Stacking Up
Postcards are organized into Stacks. Stacks are a taxonomy based on media type and other common factors. For example:
  Books Stack
  Crime Novel Books Stack
  Tom Clancy Books Stack
Stacks are created automatically based on the metadata associated with each Postcard.
A Stack needs a minimum of three Postcards to exist.
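
To make the three-Postcard rule concrete, here is a small in-memory sketch. It assumes each Postcard is an array with a subject plus lists of media types (nouns), descriptors (adjectives), and creators (possessives); the function name, the array layout, and the stack keys are all invented for this example. In production the grouping is done by querying Neo, not in PHP.

```php
<?php
// The deck states that a Stack needs at least three Postcards.
const MIN_POSTCARDS_PER_STACK = 3;

// Hypothetical helper: group subjects into candidate Stacks and
// keep only the Stacks that are large enough to exist.
function buildStacks(array $postcards): array
{
    $candidates = [];
    foreach ($postcards as $card) {
        foreach ($card['nouns'] as $noun) {
            // e.g. "Book"
            $candidates[$noun][] = $card['subject'];
            foreach ($card['adjectives'] as $adjective) {
                // e.g. "Crime Novel Book"
                $candidates["$adjective $noun"][] = $card['subject'];
            }
            foreach ($card['possessives'] as $possessive) {
                // e.g. "Tom Clancy Book"
                $candidates["$possessive $noun"][] = $card['subject'];
            }
        }
    }
    return array_filter($candidates, function (array $subjects): bool {
        return count(array_unique($subjects)) >= MIN_POSTCARDS_PER_STACK;
    });
}

// Sample data, invented for this sketch.
$postcards = [
    ['subject' => 'The Hunt for Red October', 'nouns' => ['Book'],
     'adjectives' => ['Thriller'], 'possessives' => ['Tom Clancy']],
    ['subject' => 'Patriot Games', 'nouns' => ['Book'],
     'adjectives' => ['Thriller'], 'possessives' => ['Tom Clancy']],
    ['subject' => 'Clear and Present Danger', 'nouns' => ['Book'],
     'adjectives' => [], 'possessives' => ['Tom Clancy']],
    ['subject' => 'Gone Girl', 'nouns' => ['Book'],
     'adjectives' => ['Crime Novel'], 'possessives' => ['Gillian Flynn']],
];

print_r(array_keys(buildStacks($postcards)));
```

Display names such as “Tom Clancy Books Stack” are left out; the keys here only identify the grouping.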
Modeling Taxonomy
We found that the “parts of speech” are a great way to model the Postcards taxonomy.
All Postcards have:
  Name of the item (subject)
  Domains or media types (nouns)
  Descriptors (adjectives)
  Owners or creators (possessives)
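
One way to picture this model as a graph is to flatten a Postcard into (from, relationship, to) edges. The sketch below does that in plain PHP. The relationship types POST, BY, and IS appear in the Cypher queries later in this deck; DESCRIBED_AS is a made-up name, and linking each adjective to each noun with IS is a guess at the real layout, not a description of it.

```php
<?php
// Hypothetical helper: turn one Postcard into a list of graph edges.
function postcardToEdges(string $postId, array $postcard): array
{
    $subject = $postcard['subject'];
    // The post points at the thing being recommended.
    $edges = [[$postId, 'POST', $subject]];

    foreach ($postcard['possessives'] as $possessive) {
        // Owner or creator of the subject.
        $edges[] = [$subject, 'BY', $possessive];
    }
    foreach ($postcard['adjectives'] as $adjective) {
        // DESCRIBED_AS is an invented relationship type for this sketch.
        $edges[] = [$subject, 'DESCRIBED_AS', $adjective];
        foreach ($postcard['nouns'] as $noun) {
            // Assumed: each descriptor belongs to a media type.
            $edges[] = [$adjective, 'IS', $noun];
        }
    }
    return $edges;
}

// Sample data, invented for this sketch.
$edges = postcardToEdges('post:1', [
    'subject'     => 'Goodfellas',
    'nouns'       => ['Movie'],
    'adjectives'  => ['Crime'],
    'possessives' => ['Martin Scorsese'],
]);

foreach ($edges as [$from, $type, $to]) {
    echo "($from)-[:$type]->($to)\n";
}
```

Writing the edges out like this before touching the database is a cheap way to check whether a value should be a node of its own or just a property.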
Parts of Speech
Modeling with our existing DB platforms
Very familiar with MySQL.
Extremely reliable.
Relational model makes normalization possible, but scaling is a concern as joins get larger and larger.
Schema

    -- VARCHAR lengths are assumed; the original slide omits them.
    CREATE TABLE post_meta (
       post_id BIGINT,
       user_id VARCHAR(255),
       date_created SMALLINT,
       subject VARCHAR(255),
       noun VARCHAR(255),
       KEY (user_id),
       KEY (date_created),
       KEY (subject),
       KEY (noun)
    );

    CREATE TABLE adjectives (
       post_id BIGINT,
       user_id VARCHAR(255),
       adjective VARCHAR(255),
       PRIMARY KEY (user_id, adjective),
       KEY (adjective)
    );

    CREATE TABLE possessives (
       post_id BIGINT,
       user_id VARCHAR(255),
       possessive VARCHAR(255),
       PRIMARY KEY (user_id, possessive),
       KEY (possessive)
    );

Queries

Seth Godin’s Business Books

    SELECT m.post_id FROM post_meta m
    JOIN possessives USING(user_id)
    JOIN adjectives USING(user_id)
    WHERE
      possessive='Seth Godin'
      AND adjective='Business'
      AND noun='Book';

90s Rock Music Albums

    SELECT m.post_id FROM post_meta m
    JOIN adjectives USING(user_id)
    WHERE
      adjective='Rock'
      AND noun='Music'
      AND date_created BETWEEN 1990 AND 1999;
At Squidoo, used primarily for analytics.
Massively scalable, but no relational model or aggregation features. Heavy denormalization required.
Many operations have to be performed asynchronously using queues or batch processes.
Truly Relational
Our data model is very much a graph problem
Recommendation systems are one query away (easy!)
Meets all our tech requirements
Week One with Neo
Evaluating Tech Requirements

 High availability
 Great administrative tools
 Great PHP wrapper
   https://github.com/jadell/neo4jphp
 Commercial support
Learning to think in graphs was HARD, but now feels NATURAL

Should it be a node or a property?

Which direction should the relationship point?

More so than any other type of database I’ve encountered, graph DBs require you to know in advance exactly what queries you’ll need to perform.
Reviewing Sample Graphs (It Helps)

Official Examples: http://bit.ly/RzCDY9
5 Common Graphs: http://slidesha.re/cnomwz
Movies: http://bitly.com/QZbGw0
Designing with paper or flow chart
Learning PHP wrapper
First Prototype

Basic HTML
REST API only
  Easy to get started, but the real power comes from Cypher
Extending the Prototype with Cypher

Implement Cypher for recommendations and other traversals.
Cypher looks intimidating at first, and the “it’s like SQL” analogy was not particularly helpful for me.
However, Cypher is essential for using Neo’s most powerful features, and is worth learning. Once you get past the strange (but necessary) arrow syntax, it does start to feel like SQL.
3 Graph Design Tips
Tip #1: Use reference nodes

   START ref=node:Meta(title = "Actor")
   MATCH ref<-[:IS]-actor
   RETURN actor;
Tip #2: Use reference properties

    foreach ($posts as $post) {
      if ($post->getProperty('type') == 'Actor') {
        // do something special for actors
      }
    }
Tip #3: Schema Changes
At first, there were a lot of schema changes during development
No equivalent to MySQL’s ALTER TABLE or TRUNCATE TABLE
Two options:
  Shut down Neo, rm -rf data/graph.db/*, and restart
  Or use this plugin: http://bitly.com/rHFSu6
    With the plugin, node IDs do not restart from zero
Tip #3.1: Schema Changes
Wiped your DB and need to start over? Use an initialization script to set things up.

function initialize() {
    // Reuse node 0 as the master (root) node
    $master = $this->client->getNode(0);
    $master->setProperty('title', 'Master')->setProperty('parent', '')->save();

    // should be node 1
    $user_master = $this->client->makeNode();
    $user_master->save();
    $user_index = new Everyman\Neo4j\Index\NodeIndex($this->client, 'users');
    $user_index->save();

    $post_index = new Everyman\Neo4j\Index\NodeIndex($this->client, 'post');
    $post_index->save();

    // One reference node per noun, indexed by name and linked to the master node
    $index = new Everyman\Neo4j\Index\NodeIndex($this->client, 'master');
    $nouns = array('Movie', 'Music', 'TV', 'Book', 'Video', 'Article', 'Photo', 'Product', 'Game', 'Squidoo');

    foreach ($nouns as $noun) {
        $node = $this->client->makeNode();
        $node->setProperty('title', $noun)->setProperty('type', 'master')->save();
        $index->add($node, 'noun', $noun);
        $index->save();
        $node->relateTo($master, 'IS')->save();

        // Each noun also gets its own index
        $noun_index = new Everyman\Neo4j\Index\NodeIndex($this->client, $noun);
        $noun_index->save();
    }
}
Postcards Demo
Homepage
A Single Postcard
Nouns

“Noun” is our word for the domain or media type associated with a Postcard
Movie Noun
Just one example. We have books, music albums, products, and many others!
Single User’s Stack about Director Martin Scorsese

    START user=node({user_id})
    MATCH user-[:POSTED]->post-[:POST]->subject-[:`BY`]->possessive
    WHERE possessive.title={meta} AND subject.type={noun}
    RETURN DISTINCT post, COLLECT(subject) as subject;

    {user_id} = 123
    {meta} = 'Martin Scorsese'
    {noun} = 'Movie'
Finding Stacks for a Postcard




   START post=node:post(post_id={post_id})
   MATCH post-[:POST]->subject-->adjective-[:IS]->parent
   RETURN subject, adjective, parent;
Finding a user’s “Liked” Postcards




     START user=node({user_id})
     MATCH user-[:LIKED]->post-[:POST]->subject
     RETURN DISTINCT post, COLLECT(subject) as subject;
Popularity Sorting

Popularity is based on Likes, Comments, and other social signals, using a time decay factor to favor newer Postcards.
It was difficult to find an algorithm that allowed us to support time decay without having to constantly re-score all Postcards.
Long story short, we use Cypher’s ORDER BY for sorting. We perform a calculation based on the pop_score and pop_date properties that exist in each Postcard node.
An individual Postcard’s pop_score and pop_date are updated in real time when someone interacts with it.
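
The deck does not give the actual formula, so the following is only a sketch of one common way to get time decay without re-scoring everything: store a score together with the time it was last updated, and apply exponential decay lazily whenever the score is read or changed. The half-life constant, the function names, and the interaction weight are assumptions; only the pop_score and pop_date property names come from the slides.

```php
<?php
// Assumed half-life: a score loses half its value every 7 days.
const HALF_LIFE_SECONDS = 604800;

// Score as of $now, decayed from the time it was last stored.
function decayedScore(float $popScore, int $popDate, int $now): float
{
    $age = max(0, $now - $popDate);
    return $popScore * pow(0.5, $age / HALF_LIFE_SECONDS);
}

// On a Like or Comment: decay the stored score to now, add the new
// signal, and move pop_date forward. Only this one Postcard is touched.
function recordInteraction(array $postcard, float $weight, int $now): array
{
    $postcard['pop_score'] =
        decayedScore($postcard['pop_score'], $postcard['pop_date'], $now) + $weight;
    $postcard['pop_date'] = $now;
    return $postcard;
}

// Sort by the decayed score at read time (the role ORDER BY plays in Cypher).
function sortByPopularity(array $postcards, int $now): array
{
    usort($postcards, function (array $a, array $b) use ($now): int {
        return decayedScore($b['pop_score'], $b['pop_date'], $now)
            <=> decayedScore($a['pop_score'], $a['pop_date'], $now);
    });
    return $postcards;
}

// Sample data, invented for this sketch.
$old = ['title' => 'Old favorite', 'pop_score' => 10.0, 'pop_date' => 0];
$new = ['title' => 'Fresh post', 'pop_score' => 6.0, 'pop_date' => HALF_LIFE_SECONDS];

foreach (sortByPopularity([$old, $new], HALF_LIFE_SECONDS) as $postcard) {
    echo $postcard['title'], "\n";
}
```

Because each stored pair is enough to compute the current score, Postcards that nobody interacts with never need to be rewritten; they simply sink as the clock advances.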
Next Steps


Follow Users and Stacks (Activity Stream)
Load Balancing
Disambiguation
The End


          Gil Hildebrand
          gil@squidoo.com

When Relational Isn't Enough: Neo4j at Squidoo

