Razorfish: Use of EMR for Marketing Segmentation




© 2009 Razorfish. All rights reserved.
Agenda
• Who we are.
• Razorfish, ATLAS, Microsoft
• ATLAS: what is it? Problems
• AWS – EMR – Why move?
• EMR Solution Outline
• Benefits gained, Opportunities




© 2012 Razorfish. All rights reserved.
Who we are
– Razorfish London is a full-service digital agency.
– Founded in London in 1996
– We are now 250 people strong and experts at creative, design, social
  media, digital media, analytics, technology, service operations and
  user experience.
– We are part of one of the world's largest interactive agency networks
  with more than 2,800 people.
– According to LinkedIn, Razorfish is the 31st most desirable employer in
  the world (even beating Starbucks).
– For the last three years we’ve been the only agency recognised by
  Forrester Research as a ‘leader’ in both the Media & Interactive
  Marketing and Experience Design & Technology categories.
– We are Adobe’s ‘Digital Marketing Global Partner of the Year, 2012’
– We are No. 4 in the last Ad Age ‘Agency A-List’ - the highest ranked
  digital agency.




RF – Atlas - Microsoft
• Razorfish developed the ATLAS ad serving engine
• Atlas was separated from Razorfish, but the two kept a
symbiotic relationship
• Google bought DoubleClick
• Microsoft bought aQuantive
• Microsoft incorporated Atlas into MS Advertising
and Publishing
• Microsoft sold Razorfish to Publicis Groupe
• RF continues to have a strong relationship with
Atlas, but has gone on to develop Razorfish Edge and
Insight On Demand (IoD), which use Atlas data
extensively.

Atlas
• Razorfish developed the ATLAS ad serving
engine
• Single cookie & Atlas tags
• 90% of browsers
• Clickstream analysis of current and
historical log file data. Users are placed into
buckets (segmented)
• Segmentation is used to serve ads and
cross-sell

Problem
45 Terabytes of raw clickstream (log) data


Business logic and metrics against loosely structured data

   • ROI
   • Custom ROI based on complex, client-specific business rules
   • Rich Media and Analytics


Custom user profiling


Custom analysis of web surfing activity


Targeting


Problem
• Giant datasets
• Building infrastructure requires large,
continuous investment
• Building for peak/holiday traffic
• Data mining apps / physical DBs at or
near their limit
• Client expectations and data volumes
increasing

Previously (2009)
•Custom Distributed Log Processing Engine

 • Sorted by cookie_id, then by time

 • Need to segment granularly across a larger number of segments (customer || prospect)
•SQL

 • 60 SQL Server boxes

 • Shared resources (contention issues)

 • In a DR configuration
•OLAP

 • In-house, constrained

By the end of 2009 (the Christmas holiday season), RF needed $500k to keep up with data
processing needs.




AWS + EMR
•   Efficient: Elastic infrastructure from AWS allows capacity to be provisioned as
    needed based on load, reducing cost and the risk of processing delays.
•   Configuration: Amazon Elastic MapReduce and Cascading let Razorfish focus on
    application development without having to worry about time-consuming set-up,
    management, or tuning of clusters or the compute capacity upon which they sit.
•   Ease of integration: Amazon Elastic MapReduce with Cascading allows data
    processing in the cloud without any changes to the underlying algorithms.
•   Flexible: EMR with Cascading is flexible enough to allow “agile” implementation
    and unit testing of sophisticated algorithms.
•   Adaptable: Cascading simplifies the integration of Hadoop with external ad
    systems.
•   Scalable: AWS infrastructure helps Razorfish reliably store and process huge
    (petabyte-scale) data sets.




AWS + EMR


AWS
• S3 storage: 45 TB of log data

EMR
• Measurement of customer value
• Measurement of customer affinity
• Joining 2.8 billion transactions against known site categorization
  information
• Unbalanced, so there is a hit to the reducers

Segmentation
• Actionable
• Rules flexible / customizable
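
The "unbalanced" join above is the usual key-skew problem: when a few sites account for most transactions, the reducers that receive those keys do most of the work. The deck does not say how Razorfish dealt with it. One common mitigation, sketched here purely as an assumption, is to salt the hot keys so they spread over several reducers. The site names, reducer count and salt count are all hypothetical.

```python
import zlib
from collections import Counter

def partition(key: str, num_reducers: int) -> int:
    # Deterministic stand-in for a hash partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer_load(keys, num_reducers, salt_buckets=1):
    """Count how many records each reducer would receive.

    With salt_buckets > 1 every join key is suffixed with a rotating salt,
    so one hot key is split into several sub-keys. The small side of the
    join (site categorization) would then have to be replicated once per
    salt value so that every sub-key still finds its match.
    """
    load = Counter()
    for i, key in enumerate(keys):
        salted_key = f"{key}#{i % salt_buckets}"
        load[partition(salted_key, num_reducers)] += 1
    return load

# Toy data: one hypothetical hot site receives 90% of the transactions.
transactions = ["hot-site"] * 9000 + [f"site-{i}" for i in range(1000)]

plain = reducer_load(transactions, num_reducers=10)
salted = reducer_load(transactions, num_reducers=10, salt_buckets=50)

print("busiest reducer without salting:", max(plain.values()))
print("busiest reducer with salting:   ", max(salted.values()))
```

Salting trades a larger replicated lookup table for a more even reducer load, which only pays off when the categorization table is small relative to the transaction data.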




We import a lot of Atlas Data
• 24 servers upload 200+ GB of data per day to cloud storage
( ½ trillion ICA records )

We filter out the relevant cookies
• Cloud storage feeds Elastic MapReduce
• A 100-machine cluster is created on demand. We filter for only the
  transactions that we need to process (more than 3.5 billion)
( about 71 million unique cookies a day )

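
The filtering step can be pictured as a map-only pass that drops every record the later joins do not need. The sketch below is an illustration, not the Razorfish job: the record fields and the filter criterion (a set of client sites) are assumptions, since the slide does not say which transactions count as relevant.

```python
from typing import Iterable, Iterator, NamedTuple, Set

class Transaction(NamedTuple):
    # Hypothetical record layout; the real Atlas log format is not shown.
    cookie_id: str
    timestamp: int
    site: str

def filter_transactions(records: Iterable[Transaction],
                        wanted_sites: Set[str]) -> Iterator[Transaction]:
    """Yield only the transactions that occurred on a wanted site."""
    for record in records:
        if record.site in wanted_sites:
            yield record

sample = [
    Transaction("c1", 100, "client-shop.example"),
    Transaction("c2", 101, "other.example"),
    Transaction("c1", 102, "client-shop.example"),
    Transaction("c3", 103, "client-shop.example"),
]

kept = list(filter_transactions(sample, {"client-shop.example"}))
unique_cookies = {t.cookie_id for t in kept}
print(len(kept), "transactions kept for", len(unique_cookies), "unique cookies")
```

Because each record is judged on its own, a filter like this needs no shuffle and scales with the number of machines in the cluster.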
Filter by behavior
• Inputs: filtered transactions, SKU table
• Generate a list of products that have been seen
( Match these cookies to 100,000s of SKUs )

Match to their affinity
• Inputs: filtered transactions, 70 million placements
• Join transactions to site genre information
• Determine profile information by the types of sites the user has visited
• Example output: Sport Enthusiast
( Cookies are matched to 3.5 billion ICA records )

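
The affinity step joins each transaction to a genre for the site it occurred on, then summarises per cookie. A minimal sketch of that idea follows. The site names, the genre table and the "at least two visits" threshold are assumptions made for illustration, since the deck does not give the actual rule.

```python
from collections import Counter, defaultdict

def affinities(visits, site_genre, min_visits=2):
    """Map each cookie to the genres it visited at least min_visits times.

    visits     -- iterable of (cookie_id, site) pairs
    site_genre -- lookup table from site to genre label
    """
    counts = defaultdict(Counter)
    for cookie_id, site in visits:
        genre = site_genre.get(site)   # sites without a genre are ignored
        if genre is not None:
            counts[cookie_id][genre] += 1
    return {
        cookie_id: {g for g, n in genre_counts.items() if n >= min_visits}
        for cookie_id, genre_counts in counts.items()
    }

site_genre = {
    "sports-news.example": "Sport Enthusiast",
    "football.example": "Sport Enthusiast",
    "recipes.example": "Foodie",
}
visits = [
    ("c1", "sports-news.example"),
    ("c1", "football.example"),
    ("c1", "recipes.example"),
    ("c2", "recipes.example"),
]

result = affinities(visits, site_genre)
print(result)
```

In a MapReduce setting the lookup would be the join and the per-cookie counting would happen in the reducers, keyed by cookie.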
…and run custom business rules
• Inputs: filtered transactions, SKU table
• Join site behavior to product info
• Determine the types of products the user is interested in from what they have done on the site
• Example output: In market Gamer
( super-computing power determines some key categories )

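
A business rule of this kind can be sketched as a join from product views to the SKU table followed by a threshold. The rule below (at least three views of a category within seven days) is invented for illustration, as are the SKU identifiers and category names. The real rules are client-specific and are not given in the deck.

```python
from collections import Counter

SECONDS_PER_DAY = 86_400

def in_market_categories(views, sku_table, now, window_days=7, min_views=3):
    """Return product categories the user appears to be in market for.

    views     -- iterable of (sku, unix_timestamp) product views for one cookie
    sku_table -- lookup table from SKU to product category
    """
    cutoff = now - window_days * SECONDS_PER_DAY
    counts = Counter()
    for sku, timestamp in views:
        # Keep only recent views of SKUs that exist in the SKU table.
        if timestamp >= cutoff and sku in sku_table:
            counts[sku_table[sku]] += 1
    return {category for category, n in counts.items() if n >= min_views}

sku_table = {"sku-1": "Games", "sku-2": "Games", "sku-3": "Audio"}
views = [
    ("sku-1", 990_000),
    ("sku-2", 980_000),
    ("sku-1", 970_000),
    ("sku-3", 960_000),
    ("sku-2", 100_000),   # outside the 7-day window
    ("sku-9", 990_000),   # not in the SKU table
]

categories = in_market_categories(views, sku_table, now=1_000_000)
print(categories)
```

Keeping the thresholds as parameters is one way to get the "rules flexible / customizable" property the deck asks for.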
We bring it all together




category                   affinity               generation

  In market
    Gamer        +         Sport Enthusiast
                                              +    Purchaser Home
                                                       Theater




              ( 1 of N “Personalization” segments )
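
Combining the three signals into a segment amounts to matching a user profile against a table of segment definitions. A minimal sketch under assumed names follows. The `Profile` and `SegmentRule` types and the segment name are hypothetical, and the deck does not describe how segments are actually encoded.

```python
from typing import List, NamedTuple, Optional

class Profile(NamedTuple):
    category: Optional[str]     # output of the business-rules step
    affinity: Optional[str]     # output of the site-genre join
    generation: Optional[str]   # third column, labelled "generation" on the slide

class SegmentRule(NamedTuple):
    name: str
    category: str
    affinity: str
    generation: str

def match_segments(profile: Profile, rules: List[SegmentRule]) -> List[str]:
    """Return the names of all segments whose three fields match the profile."""
    return [
        rule.name
        for rule in rules
        if (rule.category, rule.affinity, rule.generation)
        == (profile.category, profile.affinity, profile.generation)
    ]

rules = [
    SegmentRule("sports-games-upsell", "In market Gamer",
                "Sport Enthusiast", "Purchaser Home Theater"),
]

user = Profile("In market Gamer", "Sport Enthusiast", "Purchaser Home Theater")
other = Profile("In market Gamer", None, None)

print(match_segments(user, rules))
print(match_segments(other, rules))
```

Each additional row in `rules` would be another of the N personalization segments.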
Drive a personalized message
• User recently purchased a home theater system and is now looking for
  sports games
• Result: a targeted ad
( 1.7 million per day )

Each and every day
• This all happens in about 8 hours every day
( not bad )

AWS + EMR
– Perfect clarity of cost
– No upfront infrastructure investment
– No client processing contention
– We couldn’t have done it.
– Without EMR/Hadoop, the process took 3 days and relied heavily on
  manual processes. It now takes 5-8 hours
– Elasticity to complete a job faster if it’s worth the cost.
– We can meet our SLAs




Expanding Data Landscape
• EMR allows us to deal with the ever expanding number of
  channels and user interactions with sites and data:
• Clickstream data available from tools like Atlas and
  DoubleClick, which have cookied over 90% of the Internet
• Digital experience tracked through tools like Omniture,
  Webtrends and Google Analytics
• Other channel data across touchpoints (email, call center,
  mobile)
• Client Data
• Transactional data
• Survey-based data (Nielsen)
• Social data available through open APIs (hosepipes)


Thank you



     •Mandhir Gidda




