Counterfactual analysis: a Big Data case study using Cosmos/SCOPE
Ed Snelson
Work by Jonas Peters, Joaquin Quiñonero Candela, Denis Xavier Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, Ed Snelson, Léon Bottou
http://jmlr.org/papers/v14/bottou13a.html
I. MOTIVATION
Search ads
The search ads ecosystem
(Figure: users send queries, and advertisers send ads and bids, to the search engine, which returns ads and sets prices; clicks (and their consequences) flow back, forming the user feedback loop, the advertiser feedback loop, and the learning feedback loop.)
Learning to run a marketplace
• The learning machine is not a machine but an organization with lots of people doing stuff!
How can we help?
• Goal: improve the marketplace machinery so that its long-term revenue is maximal
• Approximate this goal by improving multiple performance measures (KPIs) related to all players
• Provide data for decision making
• Automatically optimize parts of the system
Outline from here on
II. Online Experimentation
III. Counterfactual measurements
IV. Cosmos/SCOPE
V. Implementation details
II. ONLINE EXPERIMENTATION
How do parameters affect KPIs?
• We want to determine how certain auction parameters affect KPIs
• Three options:
1. Offline log analysis – “correlational”
2. Auction simulation
3. Online experimentation – “causal”
The problem with correlation analysis (Simpson’s paradox)
Trying to decide whether a drug helps or not
• Historical data:

All patients (treatment rate 50%)
               All     Survived   Died    Survival rate
Treated        5,000   2,100      2,900   42%
Not treated    5,000   2,900      2,100   58%

• Conclusion: don’t give the drug
But what if the doctors were saving the drug for the severe cases?

Severe cases (treatment rate 80%)
               All     Survived   Died    Survival rate
Treated        4,000   1,200      2,800   30%
Not treated    1,000     100        900   10%

Mild cases (treatment rate 20%)
               All     Survived   Died    Survival rate
Treated        1,000     900        100   90%
Not treated    4,000   2,800      1,200   70%

• Conclusion reversed: the drug helps for both severe and mild cases
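The reversal can be checked directly from the stratified counts. The Python sketch below (all names are illustrative) pools the severe and mild strata and recomputes each survival rate. It uses 1,200 deaths for untreated mild cases, the value that makes the strata add up to the pooled untreated totals of 2,900 survivors and 2,100 deaths.

```python
# Counts per (stratum, arm): (survived, died).
data = {
    ("severe", "treated"): (1200, 2800),
    ("severe", "not treated"): (100, 900),
    ("mild", "treated"): (900, 100),
    ("mild", "not treated"): (2800, 1200),
}


def survival_rate(survived: int, died: int) -> float:
    return survived / (survived + died)


def pooled_rate(arm: str) -> float:
    """Survival rate for one arm after summing over both strata (ignores severity)."""
    survived = sum(s for (_, a), (s, d) in data.items() if a == arm)
    died = sum(d for (_, a), (s, d) in data.items() if a == arm)
    return survival_rate(survived, died)


def stratified_rate(stratum: str, arm: str) -> float:
    """Survival rate within a single stratum (controls for severity)."""
    return survival_rate(*data[(stratum, arm)])


for arm in ("treated", "not treated"):
    print(f"pooled {arm}: {pooled_rate(arm):.0%}")
for stratum in ("severe", "mild"):
    for arm in ("treated", "not treated"):
        print(f"{stratum} {arm}: {stratified_rate(stratum, arm):.0%}")
```

Pooled, the untreated group looks better. Within each stratum, the treated group is better, because severity drives both the treatment decision and the outcome.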
Overkill?
Pervasive causation paradoxes in ad data!
Example.
– Logged data shows a positive correlation between event A “First mainline ad gets a high quality score” and event B “Second mainline ad receives a click”.
– Do high-quality ads encourage clicking on the ads below them?
– Controlling for event C “Query categorized as commercial” reverses the correlation for both commercial and non-commercial queries.
Randomized experiments
Randomly select who to treat
• Selection is independent of all confounding factors
• This eliminates Simpson’s paradox and allows counterfactual estimates:
– If we had given the drug to a fraction $x$ of the patients, the success rate would have been $60\% \times x + 40\% \times (1 - x)$

All patients (treatment rate 30%)
               All     Survived   Died    Survival rate
Treated        3,000   1,800      1,200   60%
Not treated    7,000   2,800      4,200   40%
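Because treatment was assigned at random, each arm’s observed survival rate can stand in for what would happen to the whole population under that choice. Below is a minimal Python sketch of the counterfactual formula using the 30%-treatment counts; the function name and default arguments are illustrative. As a sanity check, plugging in the actual treatment rate x = 0.3 recovers the observed overall survival rate, (1,800 + 2,800) / 10,000 = 46%.

```python
from typing import Tuple


def counterfactual_success_rate(x: float,
                                treated: Tuple[int, int] = (1800, 3000),
                                untreated: Tuple[int, int] = (2800, 7000)) -> float:
    """Predicted survival rate if a fraction x of patients were treated.

    treated / untreated are (survived, total) counts from the randomized data.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a fraction in [0, 1]")
    p_treated = treated[0] / treated[1]        # 60% in the table
    p_untreated = untreated[0] / untreated[1]  # 40% in the table
    return p_treated * x + p_untreated * (1.0 - x)


for x in (0.0, 0.3, 1.0):
    print(x, round(counterfactual_success_rate(x), 3))
```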
Experiments in the online world
• A/B tests are used throughout the online world to compare different versions of the system
– A random fraction of the traffic (a flight) uses click-prediction system A
– Another random fraction uses click-prediction system B
• Wait for a week, measure KPIs, choose best!
• Our framework takes this one step further…
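A minimal sketch of the A/B workflow in Python. Two assumptions are not taken from the slides: traffic is split by hashing a user id with an experiment-specific salt (a common way to keep each user in the same flight), and the KPI compared is click-through rate, with a normal-approximation 95% interval for the difference between flights. All names are hypothetical.

```python
import hashlib
import math
from typing import Tuple


def assign_flight(user_id: str, salt: str = "exp-1", fraction: float = 0.5) -> str:
    """Hypothetical bucketing: hash (salt, user id) so a user always lands in the same flight."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "A" if bucket < fraction else "B"


def ctr_difference(clicks_a: int, views_a: int, clicks_b: int, views_b: int,
                   z: float = 1.96) -> Tuple[float, Tuple[float, float]]:
    """CTR(B) - CTR(A) with a normal-approximation confidence interval."""
    pa, pb = clicks_a / views_a, clicks_b / views_b
    se = math.sqrt(pa * (1 - pa) / views_a + pb * (1 - pb) / views_b)
    diff = pb - pa
    return diff, (diff - z * se, diff + z * se)


print(assign_flight("user-1"), assign_flight("user-2"))
print(ctr_difference(clicks_a=100, views_a=10_000, clicks_b=150, views_b=10_000))
```

Note that this design answers only the single question the flights were set up for: B versus A.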
III. COUNTERFACTUAL MEASUREMENTS
Counterfactuals
Measuring something that did not happen
“How would the system have performed if, when the data was collected, we had used $\mathrm{system}^*$ instead of $\mathrm{system}$?”
Replaying past data
Classification example
• Collect labeled data in the existing setup
• Replay the past data to evaluate what the performance would have been if we had used classifier θ
• Requires knowledge of all functions connecting the point of change to the point of measurement
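Replaying logged data works for classification because the labels do not depend on which classifier was running when they were collected. Below is a minimal Python sketch. It assumes a hypothetical threshold classifier parameterized by θ (`theta`) and a tiny toy log of (features, label) pairs.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label) collected in the existing setup


def replay_accuracy(logged: List[Example],
                    classifier: Callable[[Sequence[float]], int]) -> float:
    """Accuracy `classifier` would have had on the logged examples.

    Valid because the labels do not depend on which classifier was deployed
    when the data was collected.
    """
    correct = sum(1 for x, y in logged if classifier(x) == y)
    return correct / len(logged)


def make_threshold_classifier(theta: float) -> Callable[[Sequence[float]], int]:
    """Hypothetical classifier family: predict 1 when the feature sum exceeds theta."""
    return lambda x: int(sum(x) > theta)


logged = [([0.2, 0.1], 0), ([0.9, 0.4], 1), ([0.5, 0.6], 1), ([0.1, 0.3], 0)]
for theta in (0.25, 0.5, 1.0):
    print(theta, replay_accuracy(logged, make_threshold_classifier(theta)))
```

In an auction, by contrast, changing a parameter changes what the user sees and therefore the clicks that would have been logged. Simple replay breaks down there unless every function between the change and the measurement is known.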
Concrete example: mainline reserve (MLR)
(Figure: an ad whose score exceeds the MLR (Ad Score > MLR) is shown in the mainline; otherwise it goes to the sidebar.)
Online randomization
Q: Can we estimate the results of a change counterfactually
(without actually performing the change)?
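A: Yes, if $\mathrm{system}^*$ and $\mathrm{system}$ are non-deterministic (and close enough)
(Figure: deterministic vs. randomized MLR; in the randomized case, a data-collection distribution $P(\mathrm{MLR})$ and a counterfactual distribution $P^*(\mathrm{MLR})$.)
For each auction, a random MLR is used online, drawn from the data-collection distribution $P(\mathrm{MLR})$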
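The randomized data collection can be sketched in Python as follows. This rests on two assumptions. First, the placement rule is a simple threshold (ad score > MLR goes to the mainline). Second, P(MLR) is a hypothetical log-normal multiplier around a base reserve; the slides do not specify the distribution used online. The key point is that each auction logs the MLR actually drawn together with its density under P, because the reweighting step needs both.

```python
import math
import random
from typing import Dict, List


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a log-normal distribution with log-mean mu and log-sd sigma."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi))


def run_auction(ad_scores: List[float], base_mlr: float, sigma: float,
                rng: random.Random) -> Dict[str, object]:
    """One randomized auction (hypothetical P(MLR): base_mlr times a log-normal
    multiplier with median 1), followed by the mainline/sidebar placement rule."""
    mlr = base_mlr * rng.lognormvariate(0.0, sigma)
    mainline = [s for s in ad_scores if s > mlr]   # ad score > MLR -> mainline
    sidebar = [s for s in ad_scores if s <= mlr]
    # Log the MLR actually used and its density under P; both are needed
    # later to reweight this auction for a counterfactual P*.
    p_mlr = lognormal_pdf(mlr, math.log(base_mlr), sigma)
    return {"mlr": mlr, "p_mlr": p_mlr, "mainline": mainline, "sidebar": sidebar}


rng = random.Random(0)
log = [run_auction([0.9, 0.6, 0.3], base_mlr=0.5, sigma=0.2, rng=rng) for _ in range(5)]
for record in log:
    print(round(record["mlr"], 3), record["mainline"], record["sidebar"])
```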
Estimating counterfactual KPIs
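Usual additive KPI:
$$\mathrm{Clicks}_{\mathrm{total}} = \sum_i \mathrm{Clicks}(\mathrm{auction}_i)$$
Counterfactual KPI:
$$\mathrm{Clicks}^*_{\mathrm{total}} \approx \sum_i w^*_i \, \mathrm{Clicks}(\mathrm{auction}_i), \qquad w^*_i = \frac{P^*(\mathrm{MLR}_i)}{P(\mathrm{MLR}_i)}$$
• Weighted sum: auctions with MLRs “closer” to the counterfactual distribution get higher weight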
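Below is a minimal Python sketch of the weighted sum. It assumes each logged auction carries the MLR that was drawn, its density (or probability) under the data-collection distribution P, and the observed KPI. The counterfactual distribution P* is passed in as a function; the record layout is hypothetical.

```python
from typing import Callable, Iterable, Tuple

Record = Tuple[float, float, float]  # (MLR_i, P(MLR_i), KPI_i), hypothetical layout


def counterfactual_total(records: Iterable[Record],
                         p_star: Callable[[float], float]) -> float:
    """Estimate KPI*_total = sum_i w_i * KPI_i with w_i = P*(MLR_i) / P(MLR_i)."""
    total = 0.0
    for mlr, p_mlr, kpi in records:
        w = p_star(mlr) / p_mlr  # importance weight for this auction
        total += w * kpi
    return total


# Toy log: P picks MLR 1 or 2 with probability 0.5 each.
records = [(1.0, 0.5, 3.0), (2.0, 0.5, 5.0), (1.0, 0.5, 3.0), (2.0, 0.5, 5.0)]
print(counterfactual_total(records, lambda m: 0.5))                       # P* = P: 16.0
print(counterfactual_total(records, lambda m: 1.0 if m == 2.0 else 0.0))  # always MLR 2: 20.0
```

With P* equal to P, every weight is 1 and the estimate reduces to the usual additive KPI. Auctions whose MLR is impossible under P* get weight 0.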
Exploration
(Figure: data-collection distribution $P(\omega)$ and counterfactual distribution $P^*(\omega)$, illustrating the quality of the estimation.)
• Confidence intervals reveal whether the data-collection distribution $P(\omega)$ performs sufficient exploration to answer the counterfactual question of interest.
Clicks vs MLR
(Figure: counterfactual clicks as a function of MLR, with an inner “exploration” interval and an outer “sample-size” interval; annotated points: control with no randomization, and control with 18% lower MLR.)
Number of Mainline Ads vs MLR
(Figure.) This is easy to estimate.
Revenue vs MLR
(Figure.) Revenue always has high sample variance.
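The inner “exploration” and outer “sample-size” intervals come from bounds derived in the linked JMLR paper and are not reproduced here. As a simpler stand-in, this Python sketch reports two standard diagnostics. The first is a normal-approximation interval for the per-auction weighted KPI. The second is the Kish effective sample size of the weights: when P explores little around P*, a few auctions carry most of the weight and the effective sample size collapses. Function and argument names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def weighted_estimate_with_ci(weights: Sequence[float], kpis: Sequence[float],
                              z: float = 1.96) -> Tuple[float, Tuple[float, float], float]:
    """Return (estimate, (low, high), effective sample size).

    estimate: (1/n) * sum_i w_i * kpi_i, the counterfactual KPI per auction
    interval: normal approximation, estimate +/- z * standard error
    ess:      Kish effective sample size (sum w)^2 / sum w^2
    """
    n = len(weights)
    if n < 2 or n != len(kpis):
        raise ValueError("need at least two auctions and matching lengths")
    terms = [w * k for w, k in zip(weights, kpis)]
    mean = sum(terms) / n
    var = sum((t - mean) ** 2 for t in terms) / (n - 1)  # sample variance of w_i * kpi_i
    half = z * math.sqrt(var / n)
    ess = sum(weights) ** 2 / sum(w * w for w in weights)
    return mean, (mean - half, mean + half), ess


print(weighted_estimate_with_ci([1, 1, 1, 1], [1, 2, 3, 4]))  # even weights: ess = 4
print(weighted_estimate_with_ci([4, 0, 0, 0], [1, 2, 3, 4]))  # one auction dominates: ess = 1
```

Multiplying the estimate and interval by the number of auctions gives the corresponding interval for the total.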
More with the same data
How is this related to A/B testing?
• A/B testing tests two specific settings against each other
• You need to know beforehand which questions you want to ask!
Big advantage of more general randomization:
• Collect data first, choose question(s) later
• Randomizing more stuff increases opportunities
But…
• Requires more sophisticated offline log processing
IV. COSMOS/SCOPE
Ad Auction Logs
• ≈ 10 TB of ad-auction logs per day
• Cooked and joined from various raw logs
• Stored in Cosmos, queried via SCOPE
• A small fraction of total Bing logs and jobs:
– Tens of thousands of SCOPE jobs daily
– Tens of PBs read/written daily
Cosmos/SCOPE (SCOPE ≈ Pig/Hive; Cosmos ≈ HDFS)
http://research.microsoft.com/en-us/um/people/jrzhou/pub/Scope.pdf
http://research.microsoft.com/en-us/um/people/jrzhou/pub/scope-vldbj.pdf
Cosmos
• Microsoft’s internal distributed data store (≈ HDFS, GFS)
• Tens of thousands of commodity servers
• Append-only file system, optimized for sequential I/O
• Data replication and compression
Data Representation
1. Unstructured streams
– Custom extractors convert a sequence of bytes into a RowSet, specifying a schema for the columns
2. Structured streams
– Data stored alongside metadata: a well-defined schema and structural properties (e.g., partitioning and sorting information)
– Can be horizontally partitioned into tens of thousands of partitions (e.g., hash or range partitioning)
– Indexes for random access and index-based joins
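Real SCOPE extractors are C# classes. The Python function below is only an analogy for what an extractor does, not the SCOPE extractor API: it turns an unstructured byte stream into typed rows under a declared schema. The schema and sample bytes are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple, Type

# Hypothetical schema: (column name, Python type), mimicking the schema an
# extractor declares for the RowSet it produces.
Schema = List[Tuple[str, Type]]


def extract(raw: bytes, schema: Schema, sep: str = "\t") -> Iterator[Dict[str, object]]:
    """Turn an unstructured byte stream into typed rows, one per non-empty line."""
    for line in raw.decode("utf-8").splitlines():
        if not line:
            continue
        fields = line.split(sep)
        if len(fields) != len(schema):
            raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
        yield {name: typ(value) for (name, typ), value in zip(schema, fields)}


schema = [("query", str), ("clicks", int), ("mlr", float)]
raw = b"shoes\t3\t0.5\nflights\t0\t0.75\n"
rows = list(extract(raw, schema))
print(rows)
```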
SCOPE scripting language
• SQL-like (in syntax) declarative language specifying a data-transformation pipeline
• Each SCOPE statement takes one or more RowSets as input and outputs another RowSet
• Highly extensible with C# expressions, custom operators, and data types
• The SCOPE compiler and optimizer are responsible for generating a data-flow DAG for efficient parallel execution
C# Expressions and functions
R1 = SELECT A+C AS ac, B.Trim() AS B1
     FROM R
     WHERE StringOccurs(C, "xyz") > 2;

#CS
// Counts the (possibly overlapping) occurrences of ptrn in str.
public static int StringOccurs(string str, string ptrn)
{
    int cnt = 0;
    int pos = -1;
    while (pos + 1 < str.Length)
    {
        pos = str.IndexOf(ptrn, pos + 1);
        if (pos < 0) break;
        cnt++;
    }
    return cnt;
}
#ENDCS
(Slide callouts: “C# String method” and “C# String expression”, pointing at the C# code used inside the SELECT statement.)
C# User-defined types (UDTs)
– Arbitrary C# classes can be used as column types in scripts
– Extremely convenient for serialization/deserialization
– Can be referenced from external DLLs, C# backing files, and in-script (#CS … #ENDCS)

SELECT UserId, SessionId,
       new RequestInfo(binaryData) AS Request
FROM InputStream
WHERE Request.Browser.IsIE();
C# User-defined operators
– User defined aggregates
• Aggregate interface: Initialize, Accumulate, Finalize
• Can be declared recursive: allows partial aggregation
– MapReduce-like extensions
• PROCESS
• REDUCE
– Can be declared recursive
• COMBINE
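The Initialize/Accumulate/Finalize interface, and the way a recursive aggregate enables partial aggregation, can be illustrated with a Python analogy. The `merge` method is an assumption standing in for however SCOPE combines partial states. The point is that partial results from separate partitions combine into the same answer as a single pass over all rows.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Tuple


@dataclass
class WeightedMeanAgg:
    """Hypothetical aggregate: Initialize/Accumulate/Finalize plus a merge step."""
    sum_w: float = 0.0
    sum_wx: float = 0.0

    @classmethod
    def initialize(cls) -> "WeightedMeanAgg":
        return cls()

    def accumulate(self, w: float, x: float) -> None:
        self.sum_w += w
        self.sum_wx += w * x

    def merge(self, other: "WeightedMeanAgg") -> "WeightedMeanAgg":
        # Combining two partial states must equal accumulating all their rows at once.
        return WeightedMeanAgg(self.sum_w + other.sum_w, self.sum_wx + other.sum_wx)

    def finalize(self) -> float:
        return self.sum_wx / self.sum_w


def aggregate_partition(rows: List[Tuple[float, float]]) -> WeightedMeanAgg:
    agg = WeightedMeanAgg.initialize()
    for w, x in rows:
        agg.accumulate(w, x)
    return agg


partitions = [[(1.0, 2.0), (2.0, 4.0)], [(1.0, 6.0)]]
partials = [aggregate_partition(p) for p in partitions]  # could run on different machines
total = reduce(WeightedMeanAgg.merge, partials)
print(total.finalize())
```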
SCOPE compilation and execution
SELECT query, COUNT() AS count FROM "search.log"
USING LogExtractor
GROUP BY query
HAVING count > 1000
ORDER BY count DESC;
OUTPUT TO "qcount.result";
Runtime cost-based optimizer
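For comparison, here is a single-machine Python equivalent of what the script computes: grouping, the HAVING filter, and the descending sort. SCOPE's contribution is compiling the same declarative statement into an optimized parallel execution plan over Cosmos data. The function name here is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def frequent_queries(queries: Iterable[str], min_count: int = 1000) -> List[Tuple[str, int]]:
    """GROUP BY query, HAVING count > min_count, ORDER BY count DESC (single machine)."""
    counts = Counter(queries)
    kept = [(q, c) for q, c in counts.items() if c > min_count]
    return sorted(kept, key=lambda qc: qc[1], reverse=True)


print(frequent_queries(["a"] * 3 + ["b"] * 5 + ["c"], min_count=2))
```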
SCOPE: Pros/Cons (an opinion)
• Pros:
– Very quick to write simple queries without thinking about parallelization and execution
– Highly extensible with deep C# integration
– UDT columns and C# functions
– Easy development and debugging from Visual Studio
• IntelliSense
• Cons:
– No loop/iteration support, which makes it a poor fit for many ML algorithms
– Batch rather than interactive
V. IMPLEMENTATION
Counterfactual computation
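• Ideal for a MapReduce setting
• Map: $\mathrm{auction}_i \mapsto \mathrm{KPI}(\mathrm{auction}_i)$
• Reduce: $\sum_i w^*_i \ldots$
$$\mathrm{KPI}^*_{\mathrm{total}} = \sum_i w^*_i \, \mathrm{KPI}(\mathrm{auction}_i)$$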
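A Python sketch of the map and reduce steps. It assumes each auction record holds its drawn MLR, the density of that MLR under P, and its click count (a hypothetical layout). The reduce step only adds pairs of numbers, so each partition can be reduced independently and the partial sums combined afterwards. That associativity is what makes the computation fit MapReduce. Returning Σ w_i alongside the weighted KPI is a design choice that keeps a normalized estimate or a weight diagnostic available later.

```python
from functools import reduce
from typing import Callable, Dict, List, Tuple

Auction = Dict[str, float]  # hypothetical: {"mlr": ..., "p_mlr": ..., "clicks": ...}
Pair = Tuple[float, float]  # (w_i, w_i * KPI_i)


def map_auction(a: Auction, p_star: Callable[[float], float]) -> Pair:
    """Map step: one auction -> (w_i, w_i * KPI_i) with w_i = P*(MLR_i) / P(MLR_i)."""
    w = p_star(a["mlr"]) / a["p_mlr"]
    return w, w * a["clicks"]


def add(x: Pair, y: Pair) -> Pair:
    """Reduce step: associative addition, so partial sums can be combined in any order."""
    return x[0] + y[0], x[1] + y[1]


def counterfactual_kpi(partitions: List[List[Auction]],
                       p_star: Callable[[float], float]) -> Pair:
    partials = [reduce(add, (map_auction(a, p_star) for a in part), (0.0, 0.0))
                for part in partitions]            # one partial sum per partition
    return reduce(add, partials, (0.0, 0.0))       # (sum of w_i, KPI*_total)


parts = [[{"mlr": 1.0, "p_mlr": 0.5, "clicks": 3.0}, {"mlr": 2.0, "p_mlr": 0.5, "clicks": 5.0}],
         [{"mlr": 2.0, "p_mlr": 0.5, "clicks": 5.0}, {"mlr": 1.0, "p_mlr": 0.5, "clicks": 3.0}]]
print(counterfactual_kpi(parts, lambda m: 1.0 if m == 2.0 else 0.0))
```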
Counterfactual grid
SCOPE pseudo-code for counterfactuals
AuctionLogs = VIEW CosmosLogPath;
SELECT Auction
FROM AuctionLogs;
SELECT ComputeKPIs(Auction) AS KPIs,
       ComputeWeightGrid(Auction) AS WeightGrid;
SELECT GridPoint, ComputeWeightedKPIs(KPIs, GridPoint) AS wKPIs
CROSS APPLY WeightGrid AS GridPoint;
SELECT GridPoint, AggregateKPIs(wKPIs) AS TotalKPIs
GROUP BY GridPoint;
SELECT GridPoint, TotalKPIs.Finalize() AS FinalKPIs;
OUTPUT TO "Results.tsv";
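Annotations:
– Auction: C# UDT that wraps all logged info about a single auction
– ComputeKPIs, ComputeWeightGrid, ComputeWeightedKPIs: C# UDFs
– CROSS APPLY WeightGrid: unrolls the weight grid
– AggregateKPIs: recursive aggregator accumulating $\sum_i w_i$, $\sum_i w_i \mathrm{KPI}_i$, etc.
– TotalKPIs.Finalize(): calls an instance method on the “TotalKPIs” UDT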
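The pseudo-code's ComputeWeightGrid / CROSS APPLY / GROUP BY pattern can be mirrored in plain Python. This assumes P is a log-normal randomization around a base MLR, and grid point f denotes a hypothetical counterfactual P*_f with the same spread but median f × base MLR. A single pass over the log answers every question on the grid, which is the “collect data first, choose questions later” advantage.

```python
import math
from collections import defaultdict
from typing import Dict, List


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a log-normal distribution with log-mean mu and log-sd sigma."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi))


def weight_grid(auction: Dict[str, float], base_mlr: float, sigma: float,
                factors: List[float]) -> Dict[float, float]:
    """Hypothetical ComputeWeightGrid: w = P*_f(MLR_i) / P(MLR_i) for each grid point f."""
    p = lognormal_pdf(auction["mlr"], math.log(base_mlr), sigma)
    return {f: lognormal_pdf(auction["mlr"], math.log(f * base_mlr), sigma) / p
            for f in factors}


def grid_kpis(auctions: List[Dict[str, float]], base_mlr: float, sigma: float,
              factors: List[float]) -> Dict[float, Dict[str, float]]:
    totals = defaultdict(lambda: [0.0, 0.0])  # one accumulator per grid point (GROUP BY)
    for a in auctions:
        # Unroll the weight grid: one (grid point, weight) pair per auction (CROSS APPLY).
        for f, w in weight_grid(a, base_mlr, sigma, factors).items():
            totals[f][0] += w
            totals[f][1] += w * a["clicks"]
    # Finalize: expose the accumulated sums for each grid point.
    return {f: {"sum_w": s[0], "clicks": s[1]} for f, s in totals.items()}


auctions = [{"mlr": 0.5, "clicks": 1}, {"mlr": 0.6, "clicks": 0}, {"mlr": 0.4, "clicks": 2}]
results = grid_kpis(auctions, base_mlr=0.5, sigma=0.2, factors=[0.8, 1.0, 1.2])
for f, kpis in sorted(results.items()):
    print(f, kpis)
```

At f = 1.0 the counterfactual equals the data-collection distribution, so every weight is 1 and the grid point reproduces the observed totals.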
Conclusions
• Some real-world systems are too complex to formalize easily
• Causal inference clarifies many problems
– Ignoring causality => Simpson’s paradox
– Randomness allows inferring causality
• The counterfactual framework is modular
– Randomize in advance, ask later
• Counterfactual analysis is ideally suited to batch MapReduce
Мастер-класс Ильи Красинского и Елены Столбовой. Жизнь до и после выхода в storeМастер-класс Ильи Красинского и Елены Столбовой. Жизнь до и после выхода в store
Мастер-класс Ильи Красинского и Елены Столбовой. Жизнь до и после выхода в store
 
Бахрам Исмаилов. Продвижение мобильного приложение - оптимизация в App Store
Бахрам Исмаилов. Продвижение мобильного приложение - оптимизация в App StoreБахрам Исмаилов. Продвижение мобильного приложение - оптимизация в App Store
Бахрам Исмаилов. Продвижение мобильного приложение - оптимизация в App Store
 
Евгений Пальчевский. Что можно узнать из отзывов пользователей в мобильных ма...
Евгений Пальчевский. Что можно узнать из отзывов пользователей в мобильных ма...Евгений Пальчевский. Что можно узнать из отзывов пользователей в мобильных ма...
Евгений Пальчевский. Что можно узнать из отзывов пользователей в мобильных ма...
 
Евгений Невгень. Оптимизация мета-данных приложения для App Store и Google Play
Евгений Невгень. Оптимизация мета-данных приложения для App Store и Google PlayЕвгений Невгень. Оптимизация мета-данных приложения для App Store и Google Play
Евгений Невгень. Оптимизация мета-данных приложения для App Store и Google Play
 
Евгений Козяк. Tips & Tricks мобильного прототипирования
Евгений Козяк. Tips & Tricks мобильного прототипированияЕвгений Козяк. Tips & Tricks мобильного прототипирования
Евгений Козяк. Tips & Tricks мобильного прототипирования
 
Егор Белый. Модели успешной монетизации мобильных приложений
Егор Белый. Модели успешной монетизации мобильных приложенийЕгор Белый. Модели успешной монетизации мобильных приложений
Егор Белый. Модели успешной монетизации мобильных приложений
 
Станислав Пацкевич. Инструменты аналитики для мобильных платформ
Станислав Пацкевич. Инструменты аналитики для мобильных платформСтанислав Пацкевич. Инструменты аналитики для мобильных платформ
Станислав Пацкевич. Инструменты аналитики для мобильных платформ
 
Артём Азевич. Эффективные подходы к разработке приложений. Как найти своего п...
Артём Азевич. Эффективные подходы к разработке приложений. Как найти своего п...Артём Азевич. Эффективные подходы к разработке приложений. Как найти своего п...
Артём Азевич. Эффективные подходы к разработке приложений. Как найти своего п...
 
Дина Сударева. Развитие игровой команды и ее самоорганизация. Роль менеджера ...
Дина Сударева. Развитие игровой команды и ее самоорганизация. Роль менеджера ...Дина Сударева. Развитие игровой команды и ее самоорганизация. Роль менеджера ...
Дина Сударева. Развитие игровой команды и ее самоорганизация. Роль менеджера ...
 
Юлия Ерина. Augmented Reality Games: становление и развитие
Юлия Ерина. Augmented Reality Games: становление и развитиеЮлия Ерина. Augmented Reality Games: становление и развитие
Юлия Ерина. Augmented Reality Games: становление и развитие
 
Александр Дзюба. Знать игрока: плейтест на стадии прототипа и позже
Александр Дзюба. Знать игрока: плейтест на стадии прототипа и позжеАлександр Дзюба. Знать игрока: плейтест на стадии прототипа и позже
Александр Дзюба. Знать игрока: плейтест на стадии прототипа и позже
 

Recently uploaded

Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profileakrivarotava
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxRTS corp
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 

Recently uploaded (20)

Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profile
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 

Ed Snelson. Counterfactual Analysis

  • 1. Counterfactual analysis: a Big Data case-study using Cosmos/SCOPE Ed Snelson
  • 2. Work by Jonas Peters Joaquin Quiñonero Candela Denis Xavier Charles D. Max Chickering Elon Portugaly Dipankar Ray Patrice Simard Ed Snelson Léon Bottou http://jmlr.org/papers/v14/bottou13a.html
  • 5. The search ads ecosystem
    [Diagram: User, Search-engine and Advertiser, connected by queries, ads & bids, ads, prices, and clicks (and consequences); three loops: the user feedback loop, the advertiser feedback loop and the learning feedback loop.]
  • 6. Learning to run a marketplace • The learning machine is not a machine but an organization with lots of people doing stuff! How can we help? • Goal: improve the marketplace machinery so that its long-term revenue is maximal • Approximate this goal by improving multiple performance measures (KPIs) related to all players • Provide data for decision making • Automatically optimize parts of the system
  • 7. Outline from here on II. Online Experimentation III. Counterfactual measurements IV. Cosmos/SCOPE V. Implementation details
  • 9. How do parameters affect KPIs? • We want to determine how certain auction parameters affect KPIs • Three options: 1. Offline log analysis – “correlational” 2. Auction simulation 3. Online experimentation – “causal”
  • 10. The problem with correlation analysis (Simpson's paradox)
    Trying to decide whether a drug helps or not
    • Historical data:
                      All     Survived   Died    Survival rate
        Treated       5,000   2,100      2,900   42%
        Not treated   5,000   2,900      2,100   58%
    • Conclusion: don't give the drug
    But what if the doctors were saving the drug for the severe cases?
        Severe cases (treatment rate 80%)
                      All     Survived   Died    Survival rate
        Treated       4,000   1,200      2,800   30%
        Not treated   1,000   100        900     10%
        Mild cases (treatment rate 20%)
                      All     Survived   Died    Survival rate
        Treated       1,000   900        100     90%
        Not treated   4,000   2,800      1,200   70%
    • Conclusion reversed: the drug helps in both severe and mild cases
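The reversal can be checked mechanically from the table counts. The sketch below computes survival rates per stratum and then pooled over severity. It uses the mild/not-treated "died" count of 1,200, which is the value that makes that row sum to 4,000 and the pooled "Not treated" row come out at 2,900 survived and 2,100 died. The dictionary layout and function names are illustrative only.

```python
# Counts (survived, died) per stratum and arm, copied from the slide.
# The mild / not-treated "died" count is 1,200 so that the row sums to 4,000.
STRATA = {
    "severe": {"treated": (1200, 2800), "not_treated": (100, 900)},
    "mild": {"treated": (900, 100), "not_treated": (2800, 1200)},
}


def survival_rate(survived, died):
    """Fraction of patients in a cell who survived."""
    return survived / (survived + died)


def pooled(arm):
    """Aggregate one arm over both strata, i.e. ignore severity."""
    survived = sum(STRATA[s][arm][0] for s in STRATA)
    died = sum(STRATA[s][arm][1] for s in STRATA)
    return survived, died


if __name__ == "__main__":
    for stratum, arms in STRATA.items():
        for arm, counts in arms.items():
            print(f"{stratum:7s}{arm:12s}{survival_rate(*counts):.0%}")
    for arm in ("treated", "not_treated"):
        print(f"{'pooled':7s}{arm:12s}{survival_rate(*pooled(arm)):.0%}")
```

Within each stratum the treated rate is higher. Pooling reverses the comparison because 80% of the treated patients are severe cases.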
  • 11. Overkill? Pervasive causation paradoxes in ad data! Example. – Logged data shows a positive correlation between event A “First mainline ad gets a high quality score” and event B “Second mainline ad receives a click”. – Do high-quality ads encourage clicking below? – Controlling for event C “Query categorized as commercial” reverses the correlation for both commercial and non-commercial queries.
  • 12. Randomized experiments
    Randomly select whom to treat
    • Selection is independent of all confounding factors
    • This eliminates Simpson's paradox and allows counterfactual estimates:
    • If we had given the drug to a fraction $x$ of the patients, the success rate would have been $60\% \times x + 40\% \times (1 - x)$
        Whole population (treatment rate 30%)
                      All     Survived   Died    Survival rate
        Treated       3,000   1,800      1,200   60%
        Not treated   7,000   2,800      4,200   40%
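The sketch below simulates a randomized trial to show why coin-flip assignment removes confounding. It reuses the per-stratum survival probabilities from the Simpson's paradox slide. It assumes half of the patients are severe cases, matching the 5,000/5,000 split of that slide; under that assumption the expected rates are 0.5 × 30% + 0.5 × 90% = 60% treated and 0.5 × 10% + 0.5 × 70% = 40% untreated. The counterfactual formula is then applied to the measured rates. The function names and the seed are illustrative.

```python
import random

# Per-stratum survival probabilities from the Simpson's paradox slide.
SURVIVAL = {("severe", True): 0.30, ("severe", False): 0.10,
            ("mild", True): 0.90, ("mild", False): 0.70}


def run_trial(n_patients, treat_prob, severe_share=0.5, seed=0):
    """Simulate a trial where treatment is assigned by a coin flip.

    The coin ignores severity, so both arms get the same severity mix
    (in expectation) and severity cannot confound the comparison.
    """
    rng = random.Random(seed)
    outcomes = {True: [0, 0], False: [0, 0]}  # treated? -> [survived, total]
    for _ in range(n_patients):
        severity = "severe" if rng.random() < severe_share else "mild"
        treated = rng.random() < treat_prob
        survived = rng.random() < SURVIVAL[(severity, treated)]
        outcomes[treated][0] += survived
        outcomes[treated][1] += 1
    return {arm: s / t for arm, (s, t) in outcomes.items()}


def counterfactual_success(x, treated_rate, untreated_rate):
    """Success rate had a fraction x of the patients been treated."""
    return treated_rate * x + untreated_rate * (1 - x)


if __name__ == "__main__":
    rates = run_trial(100_000, treat_prob=0.3)
    print(rates)  # approximately 0.60 for True and 0.40 for False
    print(counterfactual_success(0.5, rates[True], rates[False]))
```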
  • 13. Experiments in the online world • A/B tests are used throughout the online world to compare different versions of the system – A random fraction of the traffic (a flight) uses click-prediction system A – Another random fraction uses click-prediction system B • Wait for a week, measure KPIs, choose the best! • Our framework takes this one step further…
  • 15. Counterfactuals Measuring something that did not happen “How would the system have performed if, when the data was collected, we had used system* instead of system?”
  • 16. Replaying past data Classification example • Collect labeled data in the existing setup • Replay the past data to evaluate what the performance would have been if we had used classifier θ • Requires knowledge of all functions connecting the point of change to the point of measurement
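Here is a minimal sketch of replay evaluation for the classification case. It assumes the logs hold (features, label) pairs and that the label does not depend on which classifier was running. The threshold classifier parameterized by θ and the toy data are hypothetical. In the ad system this assumption fails, because clicks depend on which ads were shown. That is why the next slides turn to randomization.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label) taken from the logs


def replay_accuracy(classifier: Callable[[Sequence[float]], int],
                    logged: List[Example]) -> float:
    """Accuracy the classifier would have had on the logged examples.

    Valid only because the labels do not depend on which classifier was
    deployed: the path from the point of change (the classifier) to the
    point of measurement (accuracy) is fully known.
    """
    correct = sum(classifier(x) == y for x, y in logged)
    return correct / len(logged)


def threshold_classifier(theta: float) -> Callable[[Sequence[float]], int]:
    """Hypothetical one-feature classifier parameterized by theta."""
    return lambda x: int(x[0] > theta)


if __name__ == "__main__":
    logged = [([0.2], 0), ([0.7], 1), ([0.4], 1), ([0.9], 1)]  # toy data
    for theta in (0.3, 0.5):
        print(theta, replay_accuracy(threshold_classifier(theta), logged))
```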
  • 17. Concrete example: mainline reserve (MLR) [Figure: ads are placed either in the mainline or in the sidebar; an ad goes to the mainline if its ad score > MLR]
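The MLR rule from the figure can be sketched as a placement function. This is a hypothetical simplification, not the production auction. Ads are ranked by a single score, an ad may enter the mainline only if its score exceeds the MLR, and the mainline has an assumed cap of three slots (`max_mainline`).

```python
from typing import List, Sequence, Tuple


def place_ads(ad_scores: Sequence[float], mlr: float,
              max_mainline: int = 3) -> Tuple[List[float], List[float]]:
    """Split ads into mainline and sidebar using a mainline reserve (MLR).

    Hypothetical simplification: ads are ranked by score, and an ad may
    appear in the mainline only if its score exceeds the MLR.
    """
    ranked = sorted(ad_scores, reverse=True)
    mainline = [s for s in ranked if s > mlr][:max_mainline]
    # Scores above the MLR form a prefix of the ranking, so the sidebar
    # is everything after the mainline slots.
    sidebar = ranked[len(mainline):]
    return mainline, sidebar
```

Lowering `mlr` moves more ads into the mainline. That is the knob whose effect the following slides estimate.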
  • 18. Online randomization Q: Can we estimate the results of a change counterfactually (without actually performing the change)? A: Yes, if system* and system are non-deterministic (and close enough) [Figure, two panels: "Deterministic" (fixed values MLR and MLR*) vs "Randomized" (distributions P(MLR) and P*(MLR))] For each auction, a random MLR is used online, drawn from the data-collection distribution P(MLR)
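  • 19. Estimating counterfactual KPIs
    Usual additive KPI:
    $$\mathrm{Clicks}_{\mathrm{total}} = \sum_i \mathrm{Clicks}(\mathrm{auction}_i)$$
    Counterfactual KPI:
    $$\mathrm{Clicks}^*_{\mathrm{total}} \approx \sum_i w^*_i \, \mathrm{Clicks}(\mathrm{auction}_i), \qquad w^*_i = \frac{P^*(\mathrm{MLR}_i)}{P(\mathrm{MLR}_i)}$$
    • Weighted sum: auctions whose MLRs are "closer" to the counterfactual distribution get higher weight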
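The estimator above can be sketched end to end on synthetic data. Everything numeric here is an assumption made for illustration. The logging distribution is P(MLR) = N(1.0, 0.1²), the counterfactual is P*(MLR) = N(0.9, 0.1²), and the click model is made up. Each logged auction's clicks are reweighted by w*ᵢ = P*(MLRᵢ) / P(MLRᵢ).

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def p_logging(mlr):
    """Assumed data-collection distribution P(MLR) = N(1.0, 0.1^2)."""
    return normal_pdf(mlr, 1.0, 0.1)


def p_counterfactual(mlr):
    """Assumed counterfactual distribution P*(MLR) = N(0.9, 0.1^2)."""
    return normal_pdf(mlr, 0.9, 0.1)


def click_prob(mlr):
    """Made-up click model: a higher reserve gives fewer clicks."""
    return min(1.0, max(0.0, 0.5 - 0.2 * (mlr - 1.0)))


def simulate_logs(n, seed=0):
    """Log n auctions, each run with an MLR drawn from P(MLR)."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        mlr = rng.gauss(1.0, 0.1)
        logs.append((mlr, int(rng.random() < click_prob(mlr))))
    return logs


def counterfactual_total(logs, p, p_star):
    """Sum over auctions of w_i* * Clicks_i, with w_i* = P*(MLR_i) / P(MLR_i)."""
    return sum(p_star(mlr) / p(mlr) * clicks for mlr, clicks in logs)


if __name__ == "__main__":
    logs = simulate_logs(100_000)
    n = len(logs)
    print("observed clicks/auction:", sum(c for _, c in logs) / n)
    print("estimated under P*:", counterfactual_total(logs, p_logging, p_counterfactual) / n)
```

Under this made-up click model, the true value under P* is 0.5 + 0.2 × 0.1 = 0.52 clicks per auction. The reweighted estimate should land close to it, even though no auction was actually run under P*. When P* equals P, every weight is 1 and the estimator reduces to the usual additive KPI.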
  • 20. Exploration: quality of the estimation [Figure: data-collection distribution P(ω) vs counterfactual distribution P*(ω)] • Confidence intervals reveal whether the data-collection distribution P(ω) performs sufficient exploration to answer the counterfactual question of interest.
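Two generic diagnostics give a feel for "sufficient exploration". They are swapped-in textbook tools, not the inner "exploration" and outer "sample-size" intervals used in the talk. The first is the Kish effective sample size of the importance weights. If P* puts its mass where P rarely explored, a few auctions carry most of the weight and the effective sample size collapses. The second is a normal-approximation confidence interval for the mean weighted KPI.

```python
import math
import statistics


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def normal_ci(weights, values, z=1.96):
    """Normal-approximation interval for the mean of w_i * value_i."""
    terms = [w * v for w, v in zip(weights, values)]
    mean = statistics.fmean(terms)
    half_width = z * statistics.stdev(terms) / math.sqrt(len(terms))
    return mean - half_width, mean + half_width
```

Equal weights give an effective sample size equal to the number of auctions. A single dominant weight drives it toward 1, and the interval widens accordingly.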
  • 21. Clicks vs MLR [Plot legend: inner "exploration" interval; outer "sample-size" interval; control with no randomization; control with 18% lower MLR]
  • 22. Number of mainline ads vs MLR • This is easy to estimate
  • 23. Revenue vs MLR • Revenue always has high sample variance
  • 24. More with the same data How is this related to A/B testing? • A/B testing tests 2 specific settings against each other • Need to know what questions you want to ask beforehand! Big advantage of more general randomization: • Collect data first, choose question(s) later • Randomizing more stuff increases opportunities But… • Requires more sophisticated offline log processing
  • 26. Ad Auction Logs • ≈ 10 TB per day of ad-auction logs • Cooked and joined from various raw logs • Stored in Cosmos, queried via SCOPE • Small fraction of total Bing logs and jobs: – Tens of thousands of SCOPE jobs daily – Tens of PBs read/written daily
  • 28. Cosmos • Microsoft’s internal distributed data store • Tens of thousands of commodity servers • Comparable to HDFS, GFS • Append-only file system, optimized for sequential I/O • Data replication and compression
  • 29. Data Representation 1. Unstructured streams – Custom extractors: convert a sequence of bytes into a RowSet, specifying a schema for the columns 2. Structured streams – Data stored alongside metadata: a well-defined schema and structural properties (e.g. partitioning and sorting information) – Can be horizontally partitioned into tens of thousands of partitions, e.g. by hash or range partitioning – Indexes for random access and index-based joins
  • 30. SCOPE scripting language • SQL-like (in syntax) declarative language specifying a data transformation pipeline • Each SCOPE statement takes one or more RowSets as input and outputs another RowSet • Highly extensible with C# expressions, custom operators and data types • The SCOPE compiler and optimizer are responsible for generating a data-flow DAG for efficient parallel execution
  • 31. C# expressions and functions
      R1 = SELECT A + C AS ac, B.Trim() AS B1
           FROM R
           WHERE StringOccurs(C, "xyz") > 2;

      #CS
      // Counts (possibly overlapping) occurrences of ptrn in str.
      public static int StringOccurs(string str, string ptrn)
      {
          int cnt = 0;
          int pos = -1;
          while (pos + 1 < str.Length)
          {
              pos = str.IndexOf(ptrn, pos + 1);
              if (pos < 0) break;
              cnt++;
          }
          return cnt;
      }
      #ENDCS
    (A + C is a C# string expression; B.Trim() is a C# String method.)
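A line-by-line Python port makes the helper's semantics easy to test. The loop restarts the search one character after the previous match, so it counts overlapping occurrences: "aa" occurs twice in "aaa". Python's non-overlapping `str.count` returns 1 for the same input. One caveat: Python's `str.find` compares ordinally, whereas .NET's `String.IndexOf(string, int)` is culture-sensitive by default. The two versions may therefore disagree on some non-ASCII input.

```python
def string_occurs(s: str, pattern: str) -> int:
    """Python port of the StringOccurs C# helper from the slide."""
    count, pos = 0, -1
    while pos + 1 < len(s):
        # Resume the search one character after the previous match,
        # so overlapping matches are counted.
        pos = s.find(pattern, pos + 1)
        if pos < 0:
            break
        count += 1
    return count
```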
  • 32. C# user-defined types (UDTs)
    – Arbitrary C# classes can be used as column types in scripts
    – Extremely convenient for serialization/deserialization
    – Can be referenced in external DLLs, C# backing files, and in-script (#CS … #ENDCS)
      SELECT UserId, SessionId, new RequestInfo(binaryData) AS Request
      FROM InputStream
      WHERE Request.Browser.IsIE();
  • 33. C# user-defined operators
    – User-defined aggregates
      • Aggregate interface: Initialize, Accumulate, Finalize
      • Can be declared recursive: allows partial aggregation
    – MapReduce-like extensions
      • PROCESS
      • REDUCE (can be declared recursive)
      • COMBINE
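To see why a "recursive" aggregate enables partial aggregation, here is a hypothetical Python analogue of an AVG aggregate. It mirrors the Initialize/Accumulate/Finalize interface. The `merge` method is an assumption about how partial states from different partitions get combined, not SCOPE's actual API. The key point is that averages of partitions cannot simply be averaged, but (total, count) states can be added.

```python
from typing import Iterable, List, Optional


class AvgAggregator:
    """Hypothetical analogue of a recursive user-defined aggregate (AVG)."""

    def __init__(self) -> None:
        self.initialize()

    def initialize(self) -> None:
        self.total, self.count = 0.0, 0

    def accumulate(self, value: float) -> None:
        self.total += value
        self.count += 1

    def merge(self, other: "AvgAggregator") -> None:
        """Fold in a partial aggregate computed on another partition."""
        self.total += other.total
        self.count += other.count

    def finalize(self) -> Optional[float]:
        return self.total / self.count if self.count else None


def aggregate_partitions(partitions: Iterable[List[float]]) -> Optional[float]:
    """Aggregate each partition locally, then merge the partial states."""
    result = AvgAggregator()
    for part in partitions:
        partial = AvgAggregator()
        for value in part:
            partial.accumulate(value)
        result.merge(partial)
    return result.finalize()
```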
  • 34. SCOPE compilation and execution
      SELECT query, COUNT() AS count
      FROM "search.log"
      USING LogExtractor
      GROUP BY query
      HAVING count > 1000
      ORDER BY count DESC;
      OUTPUT TO "qcount.result";
    [Figure label: runtime cost-based optimizer]
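For readers new to SCOPE, the sketch below spells out the query's semantics in plain Python. It assumes the extractor has already produced one query string per log row; the LogExtractor parsing itself is not reproduced. The `min_count` parameter stands in for the literal 1000. The point of SCOPE is that the compiler turns the declarative version into a parallel plan, which this single-machine sketch does not attempt.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def frequent_queries(queries: Iterable[str], min_count: int = 1000) -> List[Tuple[str, int]]:
    """GROUP BY query, HAVING count > min_count, ORDER BY count DESC."""
    counts = Counter(queries)
    rows = [(q, c) for q, c in counts.items() if c > min_count]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```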
  • 35. SCOPE: pros/cons (an opinion)
    • Pros:
      – Very quick to write simple queries without thinking about parallelization and execution
      – Highly extensible with deep C# integration: UDT columns and C# functions
      – Easy development and debugging from Visual Studio (IntelliSense)
    • Cons:
      – No loop/iteration support, which makes it a poor fit for many ML algorithms
      – Batch, rather than interactive
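  • 37. Counterfactual computation
    • Ideal for a map-reduce setting
    • Map: $\mathrm{auction}_i \mapsto \mathrm{KPI}(\mathrm{auction}_i)$
    • Reduce: $\sum_i w^*_i \ldots$
    $$\mathrm{KPI}^*_{\mathrm{total}} = \sum_i w^*_i \, \mathrm{KPI}(\mathrm{auction}_i)$$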
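The map and reduce steps can be sketched in a few lines of Python. The map step turns each auction into the pair (w*ᵢ, w*ᵢ · KPIᵢ). The reduce step adds pairs. Addition is associative and commutative, so partitions can be reduced independently and combined in any order, which is what makes the computation a good fit for batch map-reduce. `weight_fn` and `kpi_fn` are hypothetical stand-ins for the weight and KPI UDFs. Keeping Σ w*ᵢ alongside the weighted KPI is an extra; it is a cheap sanity check, since its expectation equals the number of auctions.

```python
from functools import reduce
from typing import Callable, Iterable, Tuple

Partial = Tuple[float, float]  # (sum of w_i*, sum of w_i* * KPI_i)


def map_auction(auction, weight_fn: Callable, kpi_fn: Callable) -> Partial:
    """Map step: one logged auction -> (w_i*, w_i* * KPI(auction_i))."""
    w = weight_fn(auction)
    return w, w * kpi_fn(auction)


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: sums are associative, so partials combine in any order."""
    return a[0] + b[0], a[1] + b[1]


def counterfactual_kpi(partitions: Iterable[Iterable], weight_fn, kpi_fn) -> Partial:
    """Reduce each partition locally, then combine the per-partition sums."""
    zero = (0.0, 0.0)
    partials = [reduce(combine, (map_auction(a, weight_fn, kpi_fn) for a in part), zero)
                for part in partitions]
    return reduce(combine, partials, zero)
```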
  • 39. SCOPE pseudo-code for counterfactuals
      AuctionLogs = VIEW CosmosLogPath;
      SELECT Auction FROM AuctionLogs;                      // C# UDT: wraps all logged info about a single auction
      SELECT ComputeKPIs(Auction) AS KPIs,                  // C# UDFs
             ComputeWeightGrid(Auction) AS WeightGrid;
      SELECT ComputeWeightedKPIs(KPIs, GridPoint) AS wKPIs
             CROSS APPLY WeightGrid AS GridPoint;           // unroll the weight grid
      SELECT AggregateKPIs(wKPIs) AS TotalKPIs              // recursive aggregator: Σ_i w_i, Σ_i w_i·KPI_i, etc.
             GROUP BY GridPoint;
      SELECT GridPoint, TotalKPIs.Finalize() AS FinalKPIs   // call instance method on the "TotalKPIs" UDT
      OUTPUT TO "Results.tsv";
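The weight-grid idea ("collect data first, choose question(s) later") can be mimicked on a single machine. The sketch below assumes, for simplicity, that the MLR takes a few discrete values. The logging distribution and the candidate counterfactual distributions are hypothetical dictionaries. The functions are Python stand-ins for the roles of ComputeWeightGrid and AggregateKPIs, not the SCOPE UDFs themselves. One pass over the logs answers every candidate question at once.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Dist = Dict[float, float]  # discrete distribution: MLR value -> probability


def compute_weight_grid(mlr: float, logging_dist: Dist,
                        candidates: Dict[str, Dist]) -> Dict[str, float]:
    """One importance weight P*(MLR) / P(MLR) per candidate distribution P*."""
    return {name: dist.get(mlr, 0.0) / logging_dist[mlr]
            for name, dist in candidates.items()}


def aggregate_grid(auctions: List[Tuple[float, int]], logging_dist: Dist,
                   candidates: Dict[str, Dist]) -> Dict[str, Dict[str, float]]:
    """Unroll the weight grid for each auction, then sum per grid point."""
    totals = defaultdict(lambda: {"sum_w": 0.0, "sum_w_clicks": 0.0})
    for mlr, clicks in auctions:
        for name, w in compute_weight_grid(mlr, logging_dist, candidates).items():
            totals[name]["sum_w"] += w
            totals[name]["sum_w_clicks"] += w * clicks
    return dict(totals)
```

Adding another candidate distribution means adding one dictionary entry and rerunning the aggregation. It does not require a new online experiment, provided P explored the MLR values that the new candidate cares about.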
  • 40. Conclusions • There are systems in the real world that are too complex to formalize easily • Causal inference clarifies many problems – Ignoring causality => Simpson’s paradox – Randomness allows inferring causality • The counterfactual framework is modular – Randomize in advance, ask later • Counterfactual analysis is ideally suited to batch map-reduce