SlideShare une entreprise Scribd logo
1  sur  20
Télécharger pour lire hors ligne
Bullseye P13n Platform
April 7, 2014
Charles Bracher
Bullseye Dev Manager
Ranjan Sinha, PhD
Lead Research Scientist
Bullseye	
  
Outline
P13n Platform
Why Cassandra?
Cassandra Setup
Cassandra Usage
Cassandra Issues and Resolutions
Hand over to Ranjan for the Data Science Perspective
Bullseye
Bullseye
Bullseye Functional Architecture
Offline AnalysisOffline Database/
Batch Processing
Recent User Data
1-5 days
(Cassandra)
Real Time Model
Evaluation & Caching
(sharded/full user state
in memory)
Client
Access
Near Real Time
Event Collection
Tracking
Long Term
User Data
(Local SSD)
Why Cassandra?
Great write performance
Great replication performance
Reasonable read performance
Reasonable cost
Client controlled consistency settings
Bullseye
Cassandra Setup
Cassandra Version 1.2.9
We use Replication
–  Cassandra rings deployed to 3 datacenters
Cassandra clients
–  We use both the Datastax Java and C++ Beta clients
Using CQL Table specifications and commands
Not on SSDs
Bullseye
Cassandra Usage
Column Family Design:
– Avoid Tombstones
– Avoid Compaction
With Focus on Short Term Storage:
– Turn off automatic compaction / only manual compaction
– Use unique column key names to avoid tombstones
– Clear out old data with truncation
Bullseye
Cache Miss Flow (New Session)
Bullseye
CREATE TABLE DAY_N (USER_ID TEXT, RECORD_NAME TEXT,
RECORD_VALUE BLOB, PRIMARY KEY (USER_ID, RECORD_NAME));
Write to active day column family with key user id.
Truncate the oldest day column family.
When going from one day to the next, do a manual compaction for the old day.
On read, pull user id info from all col. families newer than the local SSD data.
Queuing Flow (Ongoing Activity)
Bullseye
CREATE TABLE HOUR_N (ID TEXT, RECORD_NAME TEXT,
RECORD_VALUE BLOB, PRIMARY KEY (ID, RECORD_NAME));
Read/Write from active hour with key timestamp rounded to nearest second
Store the column family one hour old to offline DB
Truncate the column family two hours old
Do async probe of record for current second as well as recent seconds till
state is captured. Data may be read 1-3 times. More if replication is lagging.
Cassandra Issues and Resolutions
Issues with C++ Datastax Cassandra beta client
– open sourced, so could apply fixes
Performance issues with the cache miss query
– increased heap size
– reduced replication factor
– turned off cross colo read repair
– deployed data center aware policy for C++
Bullseye
Personalization Applications
Ranjan Sinha, PhD
Lead Research Scientist
April 7, 2014
Disclaimer: Some of the content in this talk is based on my personal opinion. It does not reflect the views of ebay.
Outline
Why Personalize?
P13N Platform
– Introduction
– Conceptual architecture
– Modeling stages
P13N Applications
– User badges
– Search ranking
– Contextual models
– Deals
Personalization Applications
Why Personalize?
Enable more relevant experience
Retention of existing users
New user acquisition
Reactivating churned users
Increasing activity per user
Improving conversion from visits to transactions
Personalization Applications
P13N Platform: Introduction
Maintains activity timeline information
Enables event processing at near real-time
Enables in-session personalization
Provides environment for predictive model evaluation
Backup and restore to and from Hadoop/HBase
Personalization Applications
P13N Platform: Conceptual Architecture
Personalization Applications
Tracking Event
Source
m1 m3m2
….
Model Executor
Filters and forwards
events
Activity
Timeline
+
User Badges
In-memory
Cache +
Model
Evaluation
CEP Processor
Client Access
Hadoop/
HBase
Offline Modeling
Platform
User Badges
mn
Cassandra
P13N Platform: Modeling stages
Realtime
– In-session user intent
– Contextual Models
Nearline
– Update propensity models (aka User Badges)
Offline
– Bootstrap propensity models by mining long-term behavior history
Personalization Applications
Application (1): User Badges
Personalization Applications
Name Description
SaleType Auction vs. Buy-it-now
ItemCondition New vs. Used
Category Preference of categories
Price Price range of purchasing activity
Deals Propensity to purchase deals
Social Share Propensity to share items in social media
Profile based on long-term behavior
Application (2): Search Ranking …
Should all queries be personalized in the same manner?
– For some queries (ebay or google), everyone would like the same results
– For other queries, different people may want completely different results
Personalization Applications
Query: “big ben puzzles”
Not_P13N
Rank
P13N
Rank
Sold IsNew Title
1 1 No No
LOT OF 7 BIG BEN PUZZLES 5/1000PC. 2/1500
PUZZLES EUC
2 3 No Yes
1000 Pc MB Big Ben Jigsaw Puzzle Mount Shuksan
North Cascades National Park WA
3 2 Yes No
COMPLETE Fishing Village,Smalls Island MB Big
Ben Puzzle 1000 Piece Puzzle Size!
User: always buys used items
Application (3): Contextual models …
Personalization Applications
Infer categories that user is interested in within the current session
Long and Short term behavior
– Historic behavior may provide benefits at the start of the session
– Short-term behavior may contribute gains in an extended search session
– Combination of session and historic behavior may outperform using either alone
e2
t
Nearline, after session expiry
Online, in-session
Offline, historical
e3e1 …events… e1
Event
source
Application (4): Deals
Personalization Applications
Personalize
categories
Personalize
modules
Personalize
tabs
Personalize
items
fin
Personalization Applications

Contenu connexe

Plus de DataStax Academy

Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and DriversDataStax Academy
 

Plus de DataStax Academy (20)

Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and Drivers
 

Dernier

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Dernier (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Cassandra Day SV 2014: Building a Personalization Platform with Cassandra at eBay

  • 1. Bullseye P13n Platform April 7, 2014 Charles Bracher Bullseye Dev Manager Ranjan Sinha, PhD Lead Research Scientist Bullseye  
  • 2. Outline P13n Platform Why Cassandra? Cassandra Setup Cassandra Usage Cassandra Issues and Resolutions Hand over to Ranjan for the Data Science Perspective Bullseye
  • 3. Bullseye Bullseye Functional Architecture Offline AnalysisOffline Database/ Batch Processing Recent User Data 1-5 days (Cassandra) Real Time Model Evaluation & Caching (sharded/full user state in memory) Client Access Near Real Time Event Collection Tracking Long Term User Data (Local SSD)
  • 4. Why Cassandra? Great write performance Great replication performance Reasonable read performance Reasonable cost Client controlled consistency settings Bullseye
  • 5. Cassandra Setup Cassandra Version 1.2.9 We use Replication –  Cassandra rings deployed to 3 datacenters Cassandra clients –  We use both the Datastax Java and C++ Beta clients Using CQL Table specifications and commands Not on SSDs Bullseye
  • 6. Cassandra Usage Column Family Design: – Avoid Tombstones – Avoid Compaction With Focus on Short Term Storage: – Turn off automatic compaction / only manual compaction – Use unique column key names to avoid tombstones – Clear out old data with truncation Bullseye
  • 7. Cache Miss Flow (New Session) Bullseye CREATE TABLE DAY_N (USER_ID TEXT, RECORD_NAME TEXT, RECORD_VALUE BLOB, PRIMARY KEY (USER_ID, RECORD_NAME)); Write to active day column family with key user id. Truncate the oldest day column family. When going from one day to the next, do a manual compaction for the old day. On read, pull user id info from all col. families newer than the local SSD data.
  • 8. Queuing Flow (Ongoing Activity) Bullseye CREATE TABLE HOUR_N (ID TEXT, RECORD_NAME TEXT, RECORD_VALUE BLOB, PRIMARY KEY (ID, RECORD_NAME)); Read/Write from active hour with key timestamp rounded to nearest second Store the column family one hour old to offline DB Truncate the column family two hours old Do async probe of record for current second as well as recent seconds till state is captured. Data may be read 1-3 times. More if replication is lagging.
  • 9. Cassandra Issues and Resolutions Issues with C++ Datastax Cassandra beta client – open sourced, so could apply fixes Performance issues with the cache miss query – increased heap size – reduced replication factor – turned off cross colo read repair – deployed data center aware policy for C++ Bullseye
  • 10. Personalization Applications Ranjan Sinha, PhD Lead Research Scientist April 7, 2014 Disclaimer: Some of the content in this talk is based on my personal opinion. It does not reflect the views of ebay.
  • 11. Outline Why Personalize? P13N Platform – Introduction – Conceptual architecture – Modeling stages P13N Applications – User badges – Search ranking – Contextual models – Deals Personalization Applications
  • 12. Why Personalize? Enable more relevant experience Retention of existing users New user acquisition Reactivating churned users Increasing activity per user Improving conversion from visits to transactions Personalization Applications
  • 13. P13N Platform: Introduction Maintains activity timeline information Enables event processing at near real-time Enables in-session personalization Provides environment for predictive model evaluation Backup and restore to and from Hadoop/HBase Personalization Applications
  • 14. P13N Platform: Conceptual Architecture Personalization Applications Tracking Event Source m1 m3m2 …. Model Executor Filters and forwards events Activity Timeline + User Badges In-memory Cache + Model Evaluation CEP Processor Client Access Hadoop/ HBase Offline Modeling Platform User Badges mn Cassandra
  • 15. P13N Platform: Modeling stages Realtime – In-session user intent – Contextual Models Nearline – Update propensity models (aka User Badges) Offline – Bootstrap propensity models by mining long-term behavior history Personalization Applications
  • 16. Application (1): User Badges Personalization Applications Name Description SaleType Auction vs. Buy-it-now ItemCondition New vs. Used Category Preference of categories Price Price range of purchasing activity Deals Propensity to purchase deals Social Share Propensity to share items in social media Profile based on long-term behavior
  • 17. Application (2): Search Ranking … Should all queries be personalized in the same manner? – For some queries (ebay or google), everyone would like the same results – For other queries, different people may want completely different results Personalization Applications Query: “big ben puzzles” Not_P13N Rank P13N Rank Sold IsNew Title 1 1 No No LOT OF 7 BIG BEN PUZZLES 5/1000PC. 2/1500 PUZZLES EUC 2 3 No Yes 1000 Pc MB Big Ben Jigsaw Puzzle Mount Shuksan North Cascades National Park WA 3 2 Yes No COMPLETE Fishing Village,Smalls Island MB Big Ben Puzzle 1000 Piece Puzzle Size! User: always buys used items
  • 18. Application (3): Contextual models … Personalization Applications Infer categories that user is interested in within the current session Long and Short term behavior – Historic behavior may provide benefits at the start of the session – Short-term behavior may contribute gains in an extended search session – Combination of session and historic behavior may outperform using either alone e2 t Nearline, after session expiry Online, in-session Offline, historical e3e1 …events… e1 Event source
  • 19. Application (4): Deals Personalization Applications Personalize categories Personalize modules Personalize tabs Personalize items