7/11/2016
VIEWBIX ENHANCED CREATIVE
VIDEO
BRANDING
CALL TO ACTION
Solution 1
- Send tracking events as query string params to a server hosted on Rackspace
- Hourly job to parse log files and insert summary data into SQL
- Problems:
  - Network bottleneck – dropping events
  - Managing SQL server drive space
  - No scalability
  - Because of sizing problems we limited ourselves in what we collected – poor analytics
  - No enrichment process
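A minimal sketch of the hourly summary job described for this first setup, assuming a simple log format of "<ISO timestamp> <request URL>". The `/track` path, the `event` and `video_id` parameter names, and the sample lines are hypothetical; the slides do not show the real log format.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Hypothetical access-log lines: "<ISO timestamp> <request URL>".
LOG_LINES = [
    "2016-07-11T10:05:12 /track?event=play&video_id=42",
    "2016-07-11T10:17:40 /track?event=play&video_id=42",
    "2016-07-11T10:59:59 /track?event=cta_click&video_id=42",
    "2016-07-11T11:00:03 /track?event=play&video_id=7",
]

def summarize(lines):
    """Count events per (hour, video_id, event) from raw log lines."""
    counts = Counter()
    for line in lines:
        timestamp, url = line.split(" ", 1)
        hour = datetime.fromisoformat(timestamp).strftime("%Y-%m-%d %H:00")
        params = parse_qs(urlsplit(url).query)
        # parse_qs returns a list per parameter; take the first value.
        event = params.get("event", ["unknown"])[0]
        video_id = params.get("video_id", ["unknown"])[0]
        counts[(hour, video_id, event)] += 1
    return counts

summary = summarize(LOG_LINES)
for key, count in sorted(summary.items()):
    print(key, count)
```

Only the counts would be inserted into SQL, so anything not chosen as a summary key is lost once the log files are discarded.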
Solution 2
- Distribute the collection of the tracking events to the Akamai cloud (GET requests to a CDN endpoint)
- Akamai aggregates the logs and sends a batch of logs via FTP every 4 hours
- Hadoop – Hive – SQL summary tables, all hosted in the Azure cloud
- Problems:
  - Need for faster end-to-end reporting
  - Staying scalable requires summary tables, which loses granular reporting
  - Changes to the data we need to report on require rebuilding and possibly re-importing raw data – data modeling
[Diagram labels: Akamai, Hadoop/Hive/SQL]
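To illustrate why summary tables lose granular reporting, here is a small sketch. The event tuples and the `country` column are made up for the example; they are not taken from the Viewbix schema.

```python
from collections import Counter

# Hypothetical raw events: (hour, video_id, event, country).
RAW_EVENTS = [
    ("2016-07-11 10:00", "42", "play", "US"),
    ("2016-07-11 10:00", "42", "play", "FR"),
    ("2016-07-11 10:00", "42", "cta_click", "US"),
    ("2016-07-11 11:00", "7", "play", "FR"),
]

def build_summary(events):
    """Aggregate raw events to counts per (hour, video_id, event)."""
    return Counter(
        (hour, video_id, event) for hour, video_id, event, _country in events
    )

summary = build_summary(RAW_EVENTS)

# Answerable from the summary: total plays per video.
plays_per_video = Counter()
for (_hour, video_id, event), count in summary.items():
    if event == "play":
        plays_per_video[video_id] += count

# A new question (plays per country) needs the raw events, because
# the summary never kept the country column.
plays_per_country = Counter(
    country for _hour, _video_id, event, country in RAW_EVENTS if event == "play"
)

print(plays_per_video)
print(plays_per_country)
```

Adding a dimension to the report means rebuilding the summary from raw data, which is the re-import problem listed above.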
Requirements doc for new solution
- Work with Flash and JavaScript trackers
- Robust data modeling: ability to change business requirements on the fly
- No need for summary data – granular reporting
- Robust and reliable enrichment process
- Fast and flexible end-to-end solution
3rd Party Solution
- Ability to send unlimited events and unstructured data
- Pricing not based on event volume (779 million events in Dec.)
- We own the data
- Hand-holding: managed service
- Beautiful and useful visualizations and a data export API (may require an additional 3rd party)
How’d we do?
Solution: Snowplow (each requirement is followed by the outcome)
- Work with Flash and JavaScript trackers
  Outcome: We wrote an open-source AS3 tracker
- Pricing not based on event volume (779 million events in Dec.)
  Outcome: Fixed monthly fee + AWS usage
- Ability to send unlimited events and unstructured data
  Outcome: No limits on size or event type
- Hand-holding
  Outcome: Amazing customer service
- Fast and flexible end-to-end solution
  Outcome: Pipeline can be adjusted based on needs
- We own the data
  Outcome: Sits in our AWS account
- Robust data modeling: ability to change business requirements on the fly
  Outcome: Because all data is stored, we can change the pipeline rules at any time and re-run
- No need for summary data – granular reporting
  Outcome: We learned to live with summary data
- Robust and reliable enrichment process
  Outcome: Constantly growing; today it surpasses our needs
- Beautiful and useful visualizations and a data export API (may require an additional 3rd party)
  Outcome: Today using Bime Analytics; soon to be in-house charting components or Amazon QuickSight
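The "change the pipeline rules and re-run" point can be sketched as follows. This is an illustration of the idea only, not Snowplow's enrichment API; `enrich_v1`, `enrich_v2`, `run_pipeline` and the sample events are hypothetical names and data.

```python
from urllib.parse import urlsplit, parse_qs

# Raw events are kept as received and never modified (hypothetical sample).
RAW_EVENTS = [
    {"event": "play", "page_url": "https://example.com/a?utm_source=newsletter"},
    {"event": "play", "page_url": "https://example.com/b"},
]

def enrich_v1(event):
    """First rule set: add the host of the page the event came from."""
    return {**event, "host": urlsplit(event["page_url"]).hostname}

def enrich_v2(event):
    """Later rule set: additionally extract the utm_source parameter."""
    enriched = enrich_v1(event)
    query = parse_qs(urlsplit(event["page_url"]).query)
    enriched["utm_source"] = query.get("utm_source", [None])[0]
    return enriched

def run_pipeline(raw_events, enrich):
    """Apply an enrichment function to every stored raw event."""
    return [enrich(event) for event in raw_events]

first_run = run_pipeline(RAW_EVENTS, enrich_v1)
# Requirements changed: re-run the same raw events through the new rules.
second_run = run_pipeline(RAW_EVENTS, enrich_v2)
print(second_run)
```

Because the raw events are untouched, historical data picks up the new field without being re-collected.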
Gotchas we ran into
- Errors in the raw data being sent in – garbage in, garbage out!
- The solution was not auto-scaling at the time.
- Redshift is not MS SQL Server: you need to understand the nuances of columnar database queries and optimizations.
- Real data analysts don’t want charts, they want data. We spent a lot of time and money perfecting our charts when ultimately our customers want CSV exports. Today our charts are about 95% for marketing purposes.
- AWS cost forecasting and control
- Data modeling: ultimately we do need to summarize, but at an acceptable level.
  - Invest heavily in this stage.
  - Overestimate your needs – you don’t know what you don’t know.
  - Work with Snowplow (at extra cost) to get it right.
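One way to guard against the "garbage in, garbage out" gotcha is to check raw events before they are loaded and to set bad ones aside with the reason. The sketch below is a generic illustration; the required fields and the `validate` and `split_good_bad` helpers are assumptions, not the checks Viewbix or Snowplow actually run.

```python
# Hypothetical minimal contract for a tracking event.
REQUIRED_FIELDS = {"event": str, "video_id": str, "timestamp": str}

def validate(event):
    """Return a list of problems found in one raw event (empty if valid)."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return errors

def split_good_bad(events):
    """Separate valid events from invalid ones, keeping the reasons."""
    good, bad = [], []
    for event in events:
        errors = validate(event)
        if errors:
            bad.append((event, errors))
        else:
            good.append(event)
    return good, bad

events = [
    {"event": "play", "video_id": "42", "timestamp": "2016-07-11T10:05:12"},
    {"event": "play", "video_id": 42, "timestamp": "2016-07-11T10:06:00"},
    {"event": "cta_click", "timestamp": "2016-07-11T10:07:30"},
]
good, bad = split_good_bad(events)
print(len(good), "good;", len(bad), "bad")
for event, errors in bad:
    print(errors)
```

Keeping the rejected events together with their errors makes it possible to fix the tracker and reprocess them later instead of silently losing them.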
What value do our analytics provide?
“It’s not that big data is bad, but by looking for the big wins, we risk losing the most exciting potential of big data: the very small actionable insights that are unique to each individual. The real future potential of big data isn’t in its capacity to be big, but rather in just how small it can get.”
Glen Tullman, Forbes
THANK YOU

Viewbix tracking journey

  • 4. Solution 1
    - Send tracking events as query string params to a server hosted on Rackspace
    - Hourly job to parse log files and insert summary data into SQL
    - Problems:
      - Network bottleneck: dropping events
      - Managing SQL server drive space
      - No scalability
      - Because of sizing problems we limited ourselves in what we collected: poor analytics
      - No enrichment process
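The hourly parse-and-summarize job described on this slide can be sketched roughly as follows. This is a minimal illustration, not the original job: the log line layout (an ISO timestamp, a space, then the request URL) and the parameter names `event` and `player_id` are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import parse_qs, urlsplit


def summarize_hourly(log_lines):
    """Count events per (hour, player, event type) from raw log lines.

    Assumed (hypothetical) line format: "<ISO timestamp> <request URL>".
    """
    counts = Counter()
    for line in log_lines:
        try:
            timestamp, url = line.split(" ", 1)
            hour = datetime.fromisoformat(timestamp).strftime("%Y-%m-%d %H:00")
        except ValueError:
            # Malformed line: skip it rather than fail the whole batch.
            continue
        params = parse_qs(urlsplit(url.strip()).query)
        event = params.get("event", ["unknown"])[0]
        player = params.get("player_id", ["unknown"])[0]
        counts[(hour, player, event)] += 1
    return counts


sample_lines = [
    "2016-07-11T10:15:00 /t.gif?event=play&player_id=42",
    "2016-07-11T10:45:10 /t.gif?event=play&player_id=42",
    "2016-07-11T11:02:00 /t.gif?event=cta_click&player_id=42",
    "garbage",
]
summary = summarize_hourly(sample_lines)
```

Each key of `summary` would map to one row of a summary table. Only the counts survive, which is why this approach loses granular reporting.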
  • 5. Solution 2
    - Distribute the collection of tracking events to the Akamai cloud (GET requests to a CDN endpoint)
    - Akamai aggregates the logs and sends a batch via FTP every 4 hours
    - Hadoop, Hive and SQL summary tables, all hosted in the Azure cloud
    - Problems:
      - Need for faster end-to-end reporting
      - Staying scalable requires summary tables, so we lose granular reporting
      - Changes to the data we need to report on require re-building, and possibly re-importing, the raw data (data modeling)
    - Diagram labels: Akamai, Hadoop/HIVE/SQL
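Collecting events as GET requests means every event is encoded entirely in the URL, so the CDN access log becomes the event store. A minimal sketch of building such a URL, where the endpoint `https://cdn.example.com/t.gif` and the field names are hypothetical placeholders:

```python
from urllib.parse import urlencode


def build_tracking_url(endpoint, event, **fields):
    """Encode one tracking event as a GET URL with query string params."""
    params = {"event": event, **fields}
    # Sort the params so the same event always yields the same URL.
    return f"{endpoint}?{urlencode(sorted(params.items()))}"


url = build_tracking_url("https://cdn.example.com/t.gif", "play", player_id="42")
```

A tracker would request this URL and ignore the response body. The request line written to the access log is what the downstream batch job later parses.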
  • 6. Requirements doc for new solution
    - Work with Flash and JavaScript trackers
    - Robust data modeling: ability to change business requirements on the fly
    - No need for summary data: granular reporting
    - Robust and reliable enrichment process
    - Fast and flexible end-to-end solution
    3rd Party Solution
    - Ability to send unlimited events and unstructured data
    - Pricing not based on event volume (Dec. 779 Million)
    - We own the data
    - Hand holding: managed service
    - Beautiful and useful visualizations and data export API (may require additional 3rd party)
  • 7. How'd we do?
    - Work with Flash and JavaScript trackers
    - Pricing not based on event volume (Dec. 779 Million)
    - Ability to send unlimited events and unstructured data
    - Hand holding
    - Fast and flexible end-to-end solution
    - We own the data
    - Robust data modeling: ability to change business requirements on the fly
    - No need for summary data: granular reporting
    - Robust and reliable enrichment process
    - Beautiful and useful visualizations and data export API (may require additional 3rd party)
    Solution: Snowplow
    - We wrote an open source AS3 tracker
    - Fixed monthly fee + AWS usage
    - No limits on size or event type
    - Amazing customer service
    - Pipeline can be adjusted based on needs
    - Sits in our AWS account
    - Because all data is stored, we can change the pipeline rules at any time and re-run
    - We learned to live with summary data
    - Constantly growing: today it surpasses our needs
    - Today using Bime Analytics; soon to be in-house charting components or Amazon QuickSight
  • 8. Gotchas we ran into
    - Errors in the raw data being sent in: garbage in, garbage out!
    - The solution, at the time, was not auto-scaling.
    - Redshift is not MS SQL Server: you need to understand the nuances of columnar database queries and optimizations
    - Real data analysts don't want charts, they want data. We spent a lot of time and money perfecting our charts when ultimately our customers want CSV exports. Today our charts are about 95% for marketing purposes.
    - AWS cost forecasting and control
    - Data modeling
      - Ultimately we do need to summarize, but at an acceptable level.
      - Invest heavily in this stage.
      - Overestimate your needs: you don't know what you don't know.
      - Work with Snowplow (at extra cost) to get it right
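One way to contain the "garbage in, garbage out" problem is to validate events before they reach the warehouse and set the failures aside for inspection. The sketch below is illustrative only: the required fields and their types are assumptions, not the schema used in the talk.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"event": str, "player_id": str, "timestamp": int}


def validate_event(event):
    """Return a list of problems found in one event dict (empty if valid)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            problems.append(f"missing {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems


def partition_events(events):
    """Split events into valid ones and (event, problems) pairs to review."""
    good, bad = [], []
    for event in events:
        problems = validate_event(event)
        if problems:
            bad.append((event, problems))
        else:
            good.append(event)
    return good, bad


good, bad = partition_events([
    {"event": "play", "player_id": "42", "timestamp": 1468231200},
    {"event": "play", "player_id": 42},
])
```

Keeping the rejected events together with the reasons makes it possible to fix the tracker and reprocess them later instead of silently losing data.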
  • 9. What value do our analytics provide?
    "It's not that big data is bad, but by looking for the big wins, we risk losing the most exciting potential of big data: the very small actionable insights that are unique to each individual. The real future potential of big data isn't in its capacity to be big, but rather in just how small it can get."
    Glen Tullman, Forbes