Modeling taste with Cassandra

Affinity is based on user tastes, preferences, and interests

What is a taste profile?

Operational definition: the set of things you like and dislike

Stuff I like          Stuff I don’t like

Challenge: how do you build the taste profile for someone?

Thesis: Likes are correlated

Inferring correlations

1) User A:
     • Democrat
     • Likes Arugula
2) User B:
     • Republican
     • Dislikes Arugula
3) User C indicates:
     • Democrat

What would we infer is User C’s affinity for Arugula?

Answer: User C would like Arugula

[Figure: graph with nodes A, B, C, D, and E and one unknown edge marked "?"]

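Inferring correlations

[Figure: scatter plot. Horizontal axis: Dislike Obama to Like Obama. Vertical axis: Dislike arugula to Like arugula.]

Points, written as <Obama affinity, arugula affinity>:
  User A:     <3, 2.5>
  User B:     <-2, -1.5>
  Unlabeled:  <1, 1> and <-3, -3>
  User C:     <2, ?>

If someone’s affinity for Obama is 2.0, what is their affinity for arugula?
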
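Discovering latent factors

[Figure: the same scatter plot (Dislike/Like Obama against Dislike/Like arugula) with a diagonal latent axis running from Conservative at the lower left to Liberal at the upper right.]

Labels along the latent axis: Obama, Arugula, Iceberg, GOP
Coordinates marked on the latent axis: <5, 5>, <4, 4>, <-4, -4>, <-5, -5>

User points:
  User A:     <3, 2>
  User B:     <-2, -1.5>
  Unlabeled:  <1, 1> and <-3, -3>
  User C:     <2, 1.5>

Predict 1.5 for how much this person will like arugula.
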
Taste space = many latent factors

[Figure: three-dimensional taste space with axes Liberal to Conservative, Extroverted to Introverted, and Masculine to Feminine.]

Points:
  A:          <0.5, 2.4, -0.4>
  B:          <-0.5, -3.1, 0.1>
  Unlabeled:  <0.7, 4.4, -0.1>

What is a taste profile?

Operational definition: a coordinate in taste space

Stuff I like (close to me in taste space)   Stuff I don’t like (far away in taste space)

Challenge: how do you calculate taste coordinates?

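To make "close to me in taste space" concrete, here is a minimal sketch that measures Euclidean distance between the three coordinates shown on the "Taste space = many latent factors" slide. The choice of Euclidean distance and the treatment of the unlabeled point as an item are assumptions for illustration; the slides do not say which distance measure is used.

```python
import math

def distance(p, q):
    """Euclidean distance between two points in taste space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Coordinates copied from the slide; "item" is the unlabeled point,
# treated here (as an assumption) as something a user might like.
user_a = (0.5, 2.4, -0.4)
user_b = (-0.5, -3.1, 0.1)
item = (0.7, 4.4, -0.1)

dist_a = distance(user_a, item)
dist_b = distance(user_b, item)

# The smaller distance marks the user more likely to like the item.
closer = "A" if dist_a < dist_b else "B"
print(f"distance from A: {dist_a:.2f}, from B: {dist_b:.2f}, closer: {closer}")
```

Under this measure the item sits much nearer to A than to B, which matches the slide's picture of A on the liberal, extroverted side of the space.
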
Calculating taste coordinates

Edge weight = dot product of nodes, to constrain similar items to be close to each other.

Assume edge weights of:
  +2 = “love”
  -2 = “hate”

[Figure: graph with nodes A <1, -2>, B <-1, 2>, C <1, -1>, E <1, -0.5>, and D <x, y> (unknown, marked "?"); the edges carry weights 2, 2, 2, and -2.]

Democratic node must solve:
  1*x - 2*y = 2 (edge from A)
  1*x - 1*y = 2 (edge from C)

Solution = <2, 0>

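The two edge equations above can be checked with a few lines of code. This is a minimal sketch using Cramer's rule for the two-dimensional case; the helper name `solve_2x2` is made up for illustration, and a real system with 50-dimensional vectors and many edges per node would need an approximate or least-squares method instead.

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the two edges do not determine a unique coordinate")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

node_a = (1, -2)   # coordinate of A from the slide
node_c = (1, -1)   # coordinate of C from the slide
love = 2           # edge weight for "love"

# Each edge says: dot(neighbor, unknown) == edge weight.
x, y = solve_2x2(node_a[0], node_a[1], love,
                 node_c[0], node_c[1], love)
print((x, y))
```

Both edges are satisfied by the result: the dot product of <2, 0> with A and with C is 2 in each case.
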
Updating taste coordinates

User A purchases a camera...

[Figure: the taste graph before and after the purchase, with edge weights 2, 2, 2, and -2.
  Before: <1, -1>, C <1, -1>, <-1, 0.5>, <1, -0.5>, A <1, -2>, B <-1, 2>
  After:  <1, -0.5>, C <1, -1>, <-1, 0.5>, <1, -0.5>, A <0.75, -2.5>, B <-1, 2>]

Resulting in blue coordinates changing.

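The slides do not say how the new coordinates are computed. One common way to nudge two vectors so that their dot product approaches a target edge weight is a gradient step on the squared error; the sketch below shows that technique only as an assumption, not as the method behind the numbers in the figure. The learning rate and the camera coordinate are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def sgd_step(u, v, target, lr=0.1):
    """Move u and v one gradient step toward dot(u, v) == target.

    Minimizes 0.5 * (dot(u, v) - target) ** 2; both updates use the
    old values of u and v.
    """
    err = dot(u, v) - target
    new_u = tuple(a - lr * err * b for a, b in zip(u, v))
    new_v = tuple(b - lr * err * a for a, b in zip(u, v))
    return new_u, new_v

user = (1.0, -2.0)     # User A's coordinate before the purchase
camera = (1.0, -1.0)   # hypothetical camera coordinate
purchase_weight = 2.0  # assumed: a purchase is treated like "love"

before = dot(user, camera)
user, camera = sgd_step(user, camera, purchase_weight)
after = dot(user, camera)
print(before, after)
```

After one step the dot product is closer to the target weight than it was before, and both the user and the item have moved, which is the behavior the figure illustrates.
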
v1 System overview - Model updates

Components: Rec. Engine, Updater, Reco. DB (User -> coord, Item -> coord), Taste graph

1) Receive event (e.g., Purchase)
2a) Write Purchase edge
2b) Read other edges for this user and item
3) Write user and item coordinates

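The update path can be sketched with in-memory stand-ins for the two stores. Everything named here (`handle_event`, the dictionaries, the `recompute` callback) is hypothetical; the real system uses Cassandra for the taste graph, and the slides do not describe the recomputation itself, so it is left as a caller-supplied function.

```python
from collections import defaultdict

# In-memory stand-ins for the taste graph and the Reco. DB.
user_edges = defaultdict(list)   # user_id -> [(timestamp, edge_type, item_id)]
item_edges = defaultdict(list)   # item_id -> [(timestamp, edge_type, user_id)]
reco_db = {}                     # ("user" | "item", id) -> coordinate

def handle_event(user_id, item_id, edge_type, timestamp, recompute):
    """Apply one event, following steps 1 to 3 of the model-update slide."""
    # 2a) Write the new edge, once per direction.
    user_edges[user_id].append((timestamp, edge_type, item_id))
    item_edges[item_id].append((timestamp, edge_type, user_id))

    # 2b) Read the other edges for this user and this item.
    edges_of_user = list(user_edges[user_id])
    edges_of_item = list(item_edges[item_id])

    # 3) Recompute and write the user and item coordinates.
    user_coord, item_coord = recompute(edges_of_user, edges_of_item)
    reco_db[("user", user_id)] = user_coord
    reco_db[("item", item_id)] = item_coord

# Usage with a placeholder recompute that returns fixed coordinates.
handle_event("A", "camera-1", "purchase", 1,
             recompute=lambda ue, ie: ((0.75, -2.5), (1.0, -0.5)))
print(reco_db)
```

Writing each edge in both directions mirrors the separate User Edges and Item Edges rows in the schema slide, so either side of an edge can be read with a single row lookup.
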
v1 System overview - Rec serving

Components: Rec. Engine, Updater, Reco. DB (User -> coord, Item -> coord), Taste graph

1) Page load requests recommendations
2) Rec. engine finds other cameras close to user’s coordinates
3) Recommendations shown to user

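Step 2 amounts to a nearest-neighbor lookup in taste space. The sketch below is a brute-force version over a small dictionary, assuming Euclidean distance; the function name and the item ids are invented for illustration, and a catalog of billions of items would need an index or pre-filtering rather than a full scan.

```python
import heapq
import math

def nearest_items(user_coord, item_coords, k=2):
    """Return the ids of the k items closest to user_coord."""
    return heapq.nsmallest(
        k,
        item_coords,  # iterating a dict yields its keys (item ids)
        key=lambda item_id: math.dist(user_coord, item_coords[item_id]),
    )

user_coord = (1.0, -2.0)
cameras = {
    "camera-x": (1.0, -1.0),
    "camera-y": (1.0, -0.5),
    "camera-z": (-1.0, 2.0),
}

recommended = nearest_items(user_coord, cameras, k=2)
print(recommended)
```

`math.dist` requires Python 3.8 or later.
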
v1 Taste Graph data size

40 billion edges
2 billion item nodes
200 million user nodes

5TB of data, takes up 10TB with a replication factor of 2

We expect this to quadruple next year as we get more events and add new types of edges

v1 Taste Graph DB configuration


32 Linux machines
  128GB RAM
  1TB iSCSI SSD
  10 GigE NIC


Cassandra version 1.0.8

8GB JVM heap space

Size-tiered compaction strategy
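
The data-size and hardware figures can be combined into a rough per-machine estimate. This is back-of-the-envelope arithmetic only: it assumes decimal terabytes, data spread evenly across machines, no growth in machine count, and it ignores compaction headroom and other overhead.

```python
raw_tb = 5                 # data before replication
replication_factor = 2
machines = 32
ssd_tb_per_machine = 1
growth_factor = 4          # "quadruple next year"

replicated_tb = raw_tb * replication_factor
per_machine_tb = replicated_tb / machines

next_year_tb = replicated_tb * growth_factor
next_year_per_machine_tb = next_year_tb / machines

# Size of the taste vectors alone: 200 bytes per user or item node.
nodes = 2_000_000_000 + 200_000_000
vectors_tb = nodes * 200 / 1e12

print(f"today:     {per_machine_tb:.4f} TB per machine")
print(f"next year: {next_year_per_machine_tb:.2f} TB per machine")
print(f"vectors:   {vectors_tb:.2f} TB before replication")
```

Under these assumptions each machine holds roughly a third of its 1TB SSD today, while a fourfold increase on the same 32 machines would exceed that capacity.
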
v1 Taste Graph schema

Column family   Row key   Column name                          Column value
User Edges      user_id   (timestamp, edge_type, item_id) …    <empty>
Item Edges      item_id   (timestamp, edge_type, user_id) …    <empty>
User Nodes      user_id   tastevector                          200 bytes (50 floats)
Item Nodes      item_id   tastevector                          200 bytes (50 floats)

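The "200 bytes (50 floats)" figure corresponds to 50 single-precision (4-byte) floats. The sketch below shows one way to serialize such a vector with Python's `struct` module; the little-endian byte order and the helper names are assumptions, since the slides do not specify the on-disk encoding.

```python
import struct

# 50 single-precision floats; the byte order is an assumption.
VECTOR_FORMAT = "<50f"

def pack_tastevector(values):
    """Serialize 50 floats into the 200-byte tastevector value."""
    if len(values) != 50:
        raise ValueError("a taste vector must have exactly 50 components")
    return struct.pack(VECTOR_FORMAT, *values)

def unpack_tastevector(blob):
    """Deserialize a 200-byte tastevector value into a list of floats."""
    return list(struct.unpack(VECTOR_FORMAT, blob))

vector = [i / 10 for i in range(50)]
blob = pack_tastevector(vector)
restored = unpack_tastevector(blob)
print(len(blob))
```

Values come back with single-precision rounding, so a round trip is approximately, not exactly, equal to the input.
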
v1 Real-time taste updates

Edges and nodes read per second
v1 Real-time taste updates

Edges and nodes written per second
Questions?

tp@hunch.com

Graph Based Recommendation Systems at eBay
