Have Your Cake and Eat It Too! Preserving Privacy while
Achieving High Behavioral Targeting Performance
Qi Zhao, Yi Zhang
School of Engineering
University of California Santa
Cruz
{manazhao,yiz}@soe.ucsc.edu
Lucian Vlad Lita
Blue Kai, Inc
Cupertino, CA 95014
lucian@bluekai.com
ABSTRACT
Privacy is a major concern for Internet users and Internet
policy regulators. Privacy violations usually entail either
sharing Personally Identifying Information (PII) or non-PII
information such as a site visitor’s behavior on a website.
On the other hand, Internet advertising through behavioral
targeting is an important part of the Internet ecosystem,
as it provides users more relevant information and enables
content/data providers to provide free services to end users.
In order to achieve effective behavioral targeting, it is desirable
for advertisers to access a set of users with the targeted
behaviors. A key question is how data should flow from a
provider (e.g. a publisher) to a third-party advertiser to
achieve effective behavioral targeting without directly sharing
exact user behavior data. This paper attempts to answer this
question and proposes a privacy-preserving technique for
behavioral targeting that does not entail a drastic reduction
in advertising effectiveness. When
behavioral targeting data is transferred to an advertiser, we
use a smart, data mining-based noise injection method that
perturbs the results (a set of users meeting specified cri-
teria) by carefully adding noisy data points that maintain
a high level of performance. Upon receiving the data, the
advertiser cannot distinguish accurate data points adhering
to specifications, versus noisy data, which does not meet
the specifications. Using data from a top US Online
Travel Agent (OTA), we evaluated the proposed technique
for location-based behavioral targeting, whereby advertisers
run data campaigns targeting travelers for specific destina-
tions. Our experimental results demonstrate that such data
campaigns obtain results that enhance or preserve user pri-
vacy while maintaining a high level of targeting performance.
Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ADKDD’12, August 12–16, Beijing, China
Copyright 2012 ACM 978-1-4503-1545-6/12/08 ...$15.00.
General Terms
Algorithms, Performance, Security
Keywords
Behavioral targeting, privacy, data mining
1. INTRODUCTION
Recently, we have observed an exponential growth in the
number of web services spanning search engines and e-commerce
websites and propagating across multiple verticals. In most
cases we, the users, provide personal information in order
to enjoy these Internet services. For example, we willingly
register our demographic and geographic information with
social networks like Facebook so our friends can wish us a
happy birthday and follow us virtually on our vacations. We
want and sometimes need to have our purchase history in our
Amazon and eBay accounts for convenience and discounts
[2]. We rate movies in exchange for better movie recom-
mendations and higher enjoyment levels since for example,
Netflix now learns our cinematic preferences from our and
our friends’ movie rating activities.
The high availability of massive individual information on
the Internet also makes it a perfect place for delivering both
content and advertisements. Through leveraging web users’
search history, purchase history, demographic, geographic
location and other miscellaneous information, marketers are
able to identify those who are interested in their products.
This process is called behavioral targeting and is widely em-
ployed by most successful sites. Behavioral targeting helps
marketers gain advertising effectiveness and reduce their ad-
vertising budget. It focuses advertising spending towards
consumers who are already in market for a product or are
likely to purchase it. Since advertisers spend less on serv-
ing media (ads) in general, they can afford to offer more
aggressive discounts, thus bluring the line between content
and advertising. Behavioral targeting is also an important
part of the Internet ecosystem, as it enables many Internet
service providers to provide free services - from email and
social networking to games, discussion forums and produc-
tivity tools. As a side effect, it also improves consumers’
online experience since it reduces friction, clutter, and by-
passes irrelevant commercial ads.
While acknowledging the value of behavioral targeting,
privacy issues are of major concern since data sharing (via
partnerships, data exchanges, or direct deals) is the fuel for
content, media, and context optimization. Although users’
Personal Identifiable Information (PII) – such as SSN, email
address and telephone number – is usually removed in be-
havioral targeting, it is sometimes still possible to identify
specific users through non-PII data analysis. Real-world pri-
vacy breach cases occurred even though only non-PII data
was released. The Netflix competition [21] is one such example,
where Netflix released a set of anonymized film ratings
to the public for the purpose of improving movie prediction
accuracy. Though the dataset was anonymized before
being released, researchers were still able to identify some
of the profile owners by leveraging external datasets such as
the Internet Movie Database (IMDB) [20]. A similar privacy
infringement happened after AOL publicly released search
data [11]. Clearly, great caution is required when releasing,
sharing or selling consumer data, whether PII or non-PII,
such as a site visitor's browsing actions/behavior.
The widespread use of behavioral targeting is increasing
consumer privacy concerns, triggering protests from privacy
and consumer advocacy groups, and leading governments to
pass laws to regulate the usage of online consumer data in
several countries [10]. As pointed out by Goldfarb et al. [10],
reducing behavioral targeting usage will lead to a considerable
drop in advertising effectiveness. Without effective
behavioral targeting, the Internet content providers which
subsidize the cost through advertising might become unable
to continue providing their free services to the web users
due to a reduction in their advertising revenue stream. In
an extreme case, web users could be asked to pay for services
and content, or alternately be exposed to highly irrelevant
and irritating advertisements. As a consequence, both pri-
vacy concerns and advertising effectiveness are important
and none of them can be ignored. The end goal is to con-
verge towards building an ecosystem with a high privacy
bar, low irrelevant content, high advertising efficiency, and
no friction among publishers, advertisers, and consumers.
This paper focuses on addressing privacy issues in behav-
ioral targeting by reducing the trade-off against advertising
effectiveness and making this trade-off explicit. Before delv-
ing into the discussion, it is worthwhile gaining an under-
standing of the current online behavioral targeting ecosys-
tem. Online behavioral targeting is a very large and com-
plex industry; for the discussion at hand, we can artificially
group the stakeholders into three major groups: data sup-
pliers, data aggregators and data buyers (Fig. 1).
Data aggregators pool data from multiple data suppliers.
The aggregated data covers a wide spectrum of web user
facets including demographics, geographic location, interests,
purchasing history, etc. To optimize media campaigns, data buyers
turn to data aggregators and purchase relevant behavioral
targeting data by running real-time data campaigns.
For example, a marketer representing a car manufacturer
will run a data campaign purchasing user ids known to rep-
resent women in market for new luxury cars from a data
exchange. Subsequently, as the same users/consumers visit
publisher sites, the marketer will serve more ads to the users
known to be relevant (i.e. women in market for luxury cars).
In this paper, we use audience and online users/consumers
interchangeably. Here are two example requests from data
buyers (i.e. advertisers/marketers):
TRAVEL: A travel marketer wants to show ads to a set of
users who will travel to a specific destination. The mar-
keter requests audience data (i.e. runs a data campaign)
from a data aggregator; in this case, the audience is made of
users intending to fly to the given destination or book hotels
around that location.

Figure 1: Overview of the current online advertising
ecosystem. Consumer data is traded online,
frequently with the end goal of placing ads that
lead to branding or conversions. A data supplier
owns data sources or serves content on websites
frequented by the consumer. A data buyer will use
the data to optimize media (ads) they will place on
publisher sites, usually via entities such as ad networks,
ad exchanges, or demand side platforms. In
some cases, large data suppliers and large ad networks
also serve as data aggregators, while data
suppliers frequently serve as publishers.
AUTO: An auto marketer wants to show ads to a set of In-
ternet users who are likely to buy a car in the near future.
The marketer requests audience data from aggregators: con-
sumers who have shown purchase intent for certain automo-
bile brands, down to the specific make and model.
In both cases, the data campaigns are typically started by
buyers directly through a web service or via a user interface.
Simply stated, campaigns are equivalent to long-running
queries issued by the buyer, running in a data exchange,
and targeting audience behaviors. As data becomes available
(i.e. behaviors emerge), results are generated and
corresponding micro-transactions occur. For clarity, in this
paper we will treat data campaign specifications as goals of
the form "find audience who exhibit purchase intent for X",
and the corresponding query is "find audience who exhibit
behavior X". This excludes demographic and geographic
attributes such as age, gender, etc.¹
We focus on audience purchase intent behavior for two reasons.
First, it constitutes extremely valuable data, since
it translates into highly discriminating variables leading to
clicks and conversions, and as a consequence is the target
of most campaigns running in data exchanges. Second, the
problem setting is intuitive and straightforward (Section 3.1),
and is very often ignored in the literature, since it is not
considered PII.
In our problem setting, there are a limited number of data
aggregators/exchanges who have achieved a significant scale
in terms of data, and a large number of data buyers (mar-
keters, advertisers, agencies) that are targeting relevant user
data by running data campaigns, i.e. running queries such
as "find users who will travel to San Francisco". We assume
that aggregators process the data, thereby filtering out PII
content. We also assume data aggregators offer secure and
¹ The technique is not specific to a given data type; we
plan to expand on this aspect in future work.
trusted data platforms and are white-hat, certified entities in
the ecosystem. Data buyers, on the other hand, vary in their
level of security, trust, and ability. Due to the high number
of buyers who purchase behavioral targeting data, privacy
infringement, accidental or intentional, becomes more
of a concern. Even though de-identification, PII content filtering,
and data retention policies are a huge step towards
maintaining privacy, more can be done. One might argue
that since data buyers only target/purchase a limited set of
user behaviors, it is unlikely that user identification could happen.
However, an adversarial buyer may further aggregate
data from multiple campaigns and from public and private sources,
and attempt to identify users by corroborating and linking
different parts of their profiles [25]. The process is akin to
the Netflix and AOL privacy breach cases discussed above.
One characteristic of these privacy breaches arises from
the fact that data campaigns (or available datasets) are
exact, even though user identity is obscured. In other words,
user profiles are scrubbed, yet behavioral data campaigns /
queries produce exact results with missing attributes. This
makes it easy for technically savvy parties to join various
datasets together and recover the missing attributes. One
privacy preserving strategy that follows from this observa-
tion is to make this dataset joining problem impossible, or
extremely difficult. We can do this by preventing data buy-
ers from accurately reconstituting individual profiles if the
audiences and datasets available to data campaigns contain
uncertainty. With this motivation, we propose a smart noise
injection method to alleviate privacy concerns in the behav-
ioral targeting world. The method works by injecting noisy
data into the originally matching audience. The buyer will
be aware that the data is not 100 percent accurate, will also
know the signal to noise ratio, but will not be able to know
with certainty which users match their criteria and which do
not. At the same time, we also need to maintain the perfor-
mance (e.g. conversion rate) of the obfuscated audience. To
achieve this goal, noisy candidates are selected by a machine
learning algorithm instead of being selected randomly.
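As an illustration of the idea (not the paper's implementation), a minimal Python sketch contrasting random and smart noise injection might look like the following, with a hypothetical `score` function standing in for the learned model:

```python
import random

def inject_noise(matching_users, non_matching_users, noise_ratio,
                 score=None, seed=0):
    """Return an obfuscated audience: exact matches plus injected noise.

    `score` is a (hypothetical) model mapping a user to a predicted
    conversion probability; if None, noise is drawn uniformly at random.
    """
    rng = random.Random(seed)
    n_noise = int(len(matching_users) * noise_ratio)
    if score is None:
        # Random noise injection: uniform sample of non-matching users.
        noise = rng.sample(non_matching_users, n_noise)
    else:
        # Smart noise injection: top-ranked by predicted conversion.
        noise = sorted(non_matching_users, key=score, reverse=True)[:n_noise]
    audience = matching_users + noise
    rng.shuffle(audience)  # the buyer cannot tell signal and noise apart
    return audience

# Toy usage: users are ids; pretend a higher id means higher predicted
# conversion probability.
exact_matches = [1, 2, 3, 4]
non_matches = [10, 11, 12, 13, 14, 15]
obfuscated = inject_noise(exact_matches, non_matches, noise_ratio=0.5,
                          score=lambda u: u)
```

The buyer receives the shuffled audience and the signal-to-noise ratio, but not the labels.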
Further details are presented in subsequent sections. Section 2
reviews related work and Section 3 presents the proposed
method. In Section 4 we cover experimental design
and evaluate the privacy preserving method using real world
data.
2. RELATED WORK
Various Privacy Preserving Data Mining (PPDM) algorithms
have been developed to combat privacy violations
while minimizing the reduction of data utility [1, 24, 6].
PPDM can be traced back to the early 1990s and became
a very active research topic early this century. Randomized
methods and cryptographic methods are the two major approaches that can
let people do data mining without access to precise information
in individual data records. In [1], R. Agrawal et al.
proposed a randomization algorithm to prevent exposure of
sensitive attributes. It works by perturbing the attribute
values with additive noise. Since the probability distribution
of the noise is known, the distribution of the original
data can be recovered by applying Bayesian inference to
the noise and perturbed data. With the recovered distribution,
a decision tree algorithm can be run on the synthesized
data and obtain decent results in terms of privacy and
classification accuracy. Data swapping is another method related
to randomized perturbation in the sense of running data
mining algorithms on the aggregate statistics of the origi-
nal data. It works by swapping the attribute values across
different records[9].
k-anonymity is another heavily studied PPDM method.
The basic idea of k-anonymity is to reduce the granularity
of the attribute value such that every combination of values
of quasi-identifiers will have at least k respondents[24]. This
is achieved through generalization and suppression. Gener-
alization means modifying the attribute value to a wider
range and suppression means removing the attribute com-
pletely. k-anonymity is an NP-hard problem. Researchers
have proposed approximate algorithms to find the solution
more efficiently [12]. Although k-anonymity has gained great
popularity, it suffers from subtle but severe privacy problems
and can lead to insufficient diversity in sensitive
attributes. To overcome these problems, several methods
have been proposed, such as l-diversity [14], t-closeness [13],
(α,k)-anonymity [26], (k,3)-anonymity [28] and (c,k)-safety [15].
One major problem of k-anonymity is that it cannot provide
guaranteed privacy, as it cannot account for all possible background
knowledge. In contrast, differential privacy [7] gives
rigorous data privacy control without making assumptions
about background knowledge, while still providing data that enables
effective data mining. To help data miners reveal accurate
statistics about a population while preserving the privacy of
individuals in the data, differential privacy ensures that the outputs
on any two nearly identical input sets are nearly identical, and
therefore eliminates the possibility of deriving individual
attributes by differentiating the outputs of multiple queries.
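For intuition only, the standard Laplace mechanism for a counting query can be sketched as follows; this is a generic differential-privacy illustration, not part of the method proposed in this paper:

```python
import math
import random

def dp_count(values, predicate, epsilon, seed=0):
    """Epsilon-differentially-private count via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one
    individual changes the count by at most 1), so adding
    Laplace(0, 1/epsilon) noise suffices.
    """
    true_count = sum(1 for v in values if predicate(v))
    rng = random.Random(seed)
    # Sample Laplace noise by the inverse-CDF transform.
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# How many of 100 synthetic users exhibit the behavior, reported privately.
noisy_count = dp_count(range(100), lambda v: v < 30, epsilon=1.0)
```

Smaller `epsilon` means more noise and stronger privacy.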
Most of the existing PPDM methods address the
scenario where data is shared between a trusted data holder
and untrusted individuals. In most cases, data is shared for
the purpose of seeking the best data mining algorithm to
unveil the patterns concealed in the data. Privacy breaches
occur when adversary data recipients attempt to reconstruct
the identity of the anonymized subjects in the shared data.
There are two major data sharing frameworks: interactive
framework[5, 8, 23, 17] and non-interactive framework[3,
27, 18]. The interactive framework achieves privacy by per-
turbing the results of the query and limiting the number of
queries. The non-interactive framework releases sanitized
data that meets the differential privacy requirement. The
setting of our problem can also be viewed as an interactive
one: data buyers query the data aggregators for an audience
(i.e. results or web users). Similar to differential privacy, our
goal is to prevent the data buyer from knowing an
individual's true behaviors.
Compared to existing algorithms, our problem setting,
behavioral targeting, differs in the following ways.

Larger-scale data. The subjects involved are all online users
on the Internet.

Higher attribute dimensionality. Users' diverse Internet
activities, as mentioned earlier.

The way data is shared. Each data request is a specific
requirement about targeted behavior(s). A subset (i.e. a set
of Internet users) rather than the whole dataset (all users and
their behavior information) is shared for each advertising
campaign request.

Conversion rate as the utility of the data. The data utility
in our task is defined as conversion rate rather than the
effectiveness of data mining.
As we will see later, these changes pose great challenges
to adopting or extending existing methods for behavioral
targeting. On the other hand, the specific problem setting
also gives us the opportunity to tailor the privacy preserving
technique to optimize the performance of the task.
Preventing privacy violation in behavioral targeting is a
problem that has been looked at by industry practitioners,
government agencies, and the research community. Some
countries have implemented very strict privacy regulations
to restrict the collection and use of consumer data. Un-
fortunately, the side effects of these regulations are drastic.
For example, after Privacy Directive2
was passed, the ad-
vertising effectiveness decreased significantly [10]. Jonathan
Mayer et al. advocate using Do Not Track technology to en-
able user to opt out of tracking by websites they do not visit,
including analytics services, advertising networks and social
platforms [16]. Most personalization services store user profiles
on the server side, out of users' control. Instead
of aggregating user profiles at the server side, Mikhail Bilenko et
al. proposed storing user profiles as a cookie string on the
client side [4]. This gives users complete control of their
profiles and reduces privacy concerns. Toubiana et
al. proposed a cryptographic billing system to enable behavioral
targeting to take place in the user's browser. Provost et
al. proposed a technique to find an audience for a brand by
selecting the pseudo-social network (i.e. web page co-visiting
network) neighbors of brand seed users, without saving
information about browser identities or page content [22].
However, most of the existing solutions do not solve the
privacy issues in the data sharing scenario discussed before
(Figure 1), although this is a common practice in the ad-
vertising industry. This paper focuses on preserving privacy
while sharing data for behavioral targeting, and seeks a
solution that can be practically implemented in, and perform
well within, the existing advertising industry.
3. OUR APPROACH
3.1 Problem Setting
As shown in Fig. 1, the players in the computational ad-
vertising industry can be roughly grouped into three cat-
egories, namely, data suppliers, data aggregators and data
buyers. A data aggregator holds a wide spectrum of web
user information including demographics, searching history,
purchasing history, etc. The information serves as a source
for data buyers to conduct campaigns. In a typical advertising
campaign the data sharing process operates in three
steps³: first, a data buyer formulates campaign requirements
and expresses them in the form of queries; second,
the data buyer submits the queries to a data aggregator;
finally, the data aggregator returns to the data buyer a set of
web users who might meet the campaign requirements. We
assume the data buyer states the goal of a campaign as "find
audience who will do Y", and the queries are expressed in
the form "find audience who have done X". If a returned
web user later does do Y, we say this user is converted.
The utility of the data sharing process is aligned with how
² European Union "Privacy and Electronic Communications
Directive".
³ Variations of this exist in the industry. For example, a data
buyer may only submit the requirement/goal and the data
aggregator tells the buyer the corresponding query. These
variations do not significantly affect the analysis of this paper.
well the goal is achieved, which is measured by conversion
rate (more details in Section 4.3).
A reasonable assumption for the above setting is that the
data aggregator is a trustworthy player who already owns
the data, while the data buyer is not since there are millions
of possible buyers. When a data aggregator responds to each
campaign query with a group of audience members (i.e. web
users), the data buyer gains information about each web
user’s behavior, which leads to possible privacy breaches.
Though only a few attributes of web users could be learned
for each single campaign, richer individual profiles may be
obtained by joining multiple campaign results. This actually
increases the likelihood for the data buyer to derive sensitive
information about web users using the linking techniques
[25].
3.2 Noise Injection
We found that the root cause of the privacy breach problem
is that the returned audience satisfies the campaign criteria
(i.e. the query) exactly. Motivated by this observation and
by prior research on randomized methods for privacy preservation,
we propose to obfuscate the exact audience by injecting
a noisy audience that does not satisfy the campaign criteria.
In other words, we include audience members who did not
exhibit the requested behavior(s) in the returned set. The
presence of the noisy audience reduces data buyers’ belief
about an audience member's behavior. Let Φ denote the
set of all possible behaviors requested by data buyers. Each
audience member can be represented as a binary vector $\vec{b}$,
where each dimension $b_k$ corresponds to the $k$-th behavior
in Φ:⁴

$$b_k = \begin{cases} 1 & \text{if the member exhibits behavior } \Phi_k \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
The belief over each noise-perturbed audience member
is measured as

$$P(b_k = 1) = \frac{N(b_k = 1)}{N(b_k = 1) + N(b_k = 0)} \qquad (2)$$

where $N$ is a function counting the number of audience members.
Here we define belief and privacy as opposite endpoints
of a scale. Adjusting the ratio between $N(b_k = 1)$ and
$N(b_k = 0)$ yields the desired belief/privacy level. Consider
a concrete campaign in which the data buyer requests 1
million audience members for "Lexus". Instead of returning
1 million exactly matching web users to the data buyer, the
data aggregator returns half a million web users who searched
"Lexus" and another half million web users who did not
search "Lexus". The data buyer's belief over the behavior
(searched "Lexus") of each returned web user becomes 50%.
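Equation 2 and the Lexus example can be checked with a few lines of Python; the joint-belief helper anticipates Equation 3 below:

```python
def belief(n_match, n_noise):
    """Buyer's belief P(b_k = 1) over a returned audience (Eq. 2)."""
    return n_match / (n_match + n_noise)

def joint_belief(beliefs):
    """Product of single-campaign beliefs, as in Eq. (3) for
    independent campaigns."""
    p = 1.0
    for b in beliefs:
        p *= b
    return p

# The Lexus example: 500k exact matches mixed with 500k noisy users.
lexus_belief = belief(500_000, 500_000)   # 0.5
# Five independent campaigns at 70% belief each: 0.7**5, about 0.17.
five_campaigns = joint_belief([0.7] * 5)
```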
Equation 2 describes the belief over the behavior for a sin-
gle campaign. In a real world scenario, a data buyer can sub-
mit multiple campaign queries. By overlapping the results
from multiple campaigns, the data buyer can learn many
behaviors about the same web user. In this case, we need to
consider the joint belief over multiple behaviors. We assume
campaigns are independent. This assumption eliminates
privacy concerns stemming from analyzing results from
correlated campaigns.⁵ The joint belief for a set of K independent
campaigns decomposes into a product of single-campaign
beliefs. Let $\vec{b}_A$ represent the same user's multiple behaviors
gathered through K independent campaigns. The
joint belief is:

$$P(\vec{b}_A = \vec{1}) = \prod_{i=1}^{K} P(b_i = 1) \qquad (3)$$

where $P(b_i = 1)$ is defined in Equation 2. Equation 3
indicates that the joint belief decreases as the dimensionality
grows. Consider a 5-dimensional behavior vector with 70%
belief on each behavior: the overall belief is $0.7^5 \approx 0.17$.
This property helps counterbalance the use of the linking
technique for re-identification [25].

⁴ In this setting, "search Lexus once" and "search Lexus
twice" are treated as different behaviors.

⁵ Enforcing noise injection consistency at a user level, so that
the same user is not cast as both interested in traveling to
SFO and not interested at the same time, is one way to
address joint campaigns. Another is adopting query-auditing-like
mechanisms [19]. Optimizing both performance and
privacy across campaigns is an interesting topic, but it is
outside the scope of this paper.
3.3 Smart Noise Injection
So far we have not discussed the way the noisy audience
is generated. If they are randomly picked from those who
do not match the query, it is very unlikely they will convert
(i.e. meet the campaign goal and will do y). In this scenario,
effectiveness and privacy are conflicting objectives, because
increasing the level of injected random noise leads to better
privacy protection at the cost of effectiveness reduction. As
we discussed in Section 1, neither objective can
be ignored. It is desirable to seek a solution that achieves
both.
Can we select a better set of noisy audience members in a
more controlled manner? This question leads us to propose
the following smart noise injection approach. Smart noise
injection aims to select noisy audience members who are
most likely to convert for a campaign. We do this by first
predicting, for each user, the probability that the user will
convert (i.e. will do y in the future), and then adding as
noise those audience members who are likely to do y but did
not satisfy the query:

$$P(y = 1) = f(\vec{u}) \qquad (4)$$
where $f$ is the prediction function and $\vec{u}$ is the profile of
an audience member, which captures all the information data
aggregators have for the audience member under consider-
ation. The information could be demographic, geographic
location, search history, purchase history, etc. We rank all
users by P(y = 1) in descending order. Top ranked users
will be added into the results as smart noise.
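The ranking step can be sketched as follows, with a hand-written stand-in for the learned prediction function f and hypothetical profile features:

```python
def top_noise_candidates(candidates, f, k):
    """Rank users by predicted conversion probability P(y=1) = f(u)
    (Eq. 4) and return the top-k as smart noise."""
    return sorted(candidates, key=f, reverse=True)[:k]

# Toy profiles (hypothetical features): (searched_sibling_brand, same_metro)
profiles = {"u1": (1, 1), "u2": (0, 0), "u3": (1, 0)}

def f(uid):
    # Stand-in for a learned prediction function.
    searched_sibling, same_metro = profiles[uid]
    return 0.4 * searched_sibling + 0.5 * same_metro

noise = top_noise_candidates(list(profiles), f, k=2)
```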
The key to the smart noise injection idea is the prediction
function $f$ used in Equation 4; $f$ can be based either on
heuristics or on data mining methods.
Heuristics Using Taxonomy Proximity
An example taxonomy tree is illustrated in Fig. 2. A tax-
onomy tree is a way to represent the associations between
concepts. We can match each user to the node(s) on the
taxonomy tree and match the goal y or the original query to
the node(s) on the same tree. One heuristic is to add users
associated with nodes that are siblings of target node(s).
Consider an auto maker (data buyer) that needs to run a
campaign for Lexus cars. The campaign request would be
"find audience who have searched Lexus".

Figure 2: An example taxonomy showing the relationship
between various categories (IMT → Travel →
{Air Travel, Car Rental, Hotel}; IMT → Autos →
Makes → {Ford, Audi, BMW, Lexus}; IMT → Retail).

The corresponding
smart noise audience could be those who have searched
Audi, BMW, etc. This is based on a heuristic rule that
users who have searched any child node of luxury cars (e.g.
BMW, Audi and Lexus) might be interested in everything
under the luxury car node.
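This sibling heuristic can be sketched over a small slice of the Fig. 2 taxonomy (the dictionary layout is illustrative):

```python
# Illustrative fragment of the Fig. 2 taxonomy as parent -> children.
TAXONOMY = {
    "IMT": ["Travel", "Autos", "Retail"],
    "Travel": ["Air Travel", "Car Rental", "Hotel"],
    "Autos": ["Makes"],
    "Makes": ["Ford", "Audi", "BMW", "Lexus"],
}

def siblings(node):
    """Nodes sharing a parent with `node`: the heuristic noise candidates."""
    for children in TAXONOMY.values():
        if node in children:
            return [c for c in children if c != node]
    return []

lexus_noise_nodes = siblings("Lexus")
```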
Data Mining Approach
The rich information in the user profile $\vec{u}$ allows us to go beyond
heuristics like taxonomy trees. Consider the Lexus campaign
example: we can integrate various information such
as past behavior, geo-location, age, education, etc. to make
more reliable predictions. A data mining approach can be
used to learn the prediction function f from past conversion
data. We can treat f as a regression function, whose pa-
rameters can be obtained from the training data. Various
data mining algorithms (logistic regression, support vector
machines, gradient boosting trees, etc.) can be used to learn
the function.
Assuming the data aggregators have past conversion records
from a campaign, we can generate training data
$D = \{(\vec{x}_u, y_u)\}$ from these records by preparing, for each
audience member $u$:

$$\vec{x}_u \Leftarrow (\vec{u}, y) \qquad (5)$$

$$y_u = \begin{cases} 1 & \text{if user } u \text{ converted} \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

$\vec{x}$ is the feature vector representing each record; this
step can be application specific or very general and
simple (e.g. let $\vec{x} = \vec{u}$). Depending on the data mining
algorithm used, we can apply an existing approach to learn the
parameters.
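Generating training data per Equations 5 and 6, under the simple choice x = u, might look like the following sketch (the record layout is hypothetical):

```python
def make_training_data(past_records):
    """Build D = {(x_u, y_u)} from past conversion records (Eqs. 5-6),
    taking the simple choice x = u (the raw profile as features)."""
    D = []
    for profile, converted in past_records:
        x_u = list(profile)          # Eq. (5): x_u <= (u, y); here x = u
        y_u = 1 if converted else 0  # Eq. (6): conversion label
        D.append((x_u, y_u))
    return D

# Two hypothetical past records: (profile features, converted?).
records = [((1, 0, 1), True), ((0, 0, 0), False)]
D = make_training_data(records)
# Any off-the-shelf learner (logistic regression, SVMs, gradient
# boosting trees) can now be fit on D to obtain the parameters of f.
```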
4. EXPERIMENTAL DESIGN
We carried out experiments to evaluate the effectiveness
of the proposed approach.
4.1 Data Set
We created an evaluation data set based on web log data
from a top Online Travel Agent (OTA) website in the United
States. The data contains 20 days of web users' air ticket
searches and purchase history at the booking website. For
each airport, we have information about its location (longitude
and latitude), a text description of its attractions
(gathered from TripAdvisor, http://www.tripadvisor.com),
and a climate description including monthly temperature,
precipitation, etc.
We used this data to simulate real-world campaigns. Each
audience member in the experimental data is identified by
an anonymized user id and has airport search and ticket
purchase behaviors with time stamps.

Figure 3: Two examples of segments. Each segment
contains one or more search behaviors followed by
one purchasing behavior. The top example shows
that User 1 purchased a ticket to SFO after searching
LAX, LAX, SJC, SFO.

Each audience member is preprocessed
by sorting its behaviors in chronological order and
dividing the behavior’s time sequence into multiple segments
where each segment ends when a user finished purchasing a
ticket. Each segment contains the user’s search and pur-
chasing behaviors in that time frame. Fig. 3 includes two
example segments.
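The segmentation step described above (split a chronologically sorted event stream at each purchase) can be sketched as:

```python
def segment_events(events):
    """Split a chronologically sorted event stream into segments that
    each end with a purchase; a trailing run with no purchase is dropped.
    `events` are (action, airport) pairs, action in {"search", "purchase"}.
    """
    segments, current = [], []
    for event in events:
        current.append(event)
        if event[0] == "purchase":
            segments.append(current)
            current = []
    return segments

# User 1 from Fig. 3: searches LAX, LAX, SJC, SFO, then buys a ticket to SFO.
user1 = [("search", "LAX"), ("search", "LAX"), ("search", "SJC"),
         ("search", "SFO"), ("purchase", "SFO")]
segments = segment_events(user1)
```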
This gives us about 200k flight records (i.e. segments),
where each record contains a time sequence that includes
the airports the user searched and the airport the user purchased.
For each segment, we insert an advertising campaign at the
time point where the user has finished 20% of the behaviors
in the segment. The goal of each campaign is "find an
audience who will fly to airport y" and the corresponding
campaign query is "find an audience who has shown interest in
(i.e. searched) airport y".
We compared the following three strategies for generating
the audience set for each campaign:

Exact Targeting (ET): identify the users who have searched
airport y.

Random Noise (RN): replace part of the audience identified
by Exact Targeting with users uniformly sampled from the
noisy audience (users outside the exact targeting set).

Smart Noise (SN): the same as Random Noise, except that
the noisy users are selected using the proposed data mining
technique.
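The three strategies can be sketched as follows; `scorer` is a hypothetical stand-in for the prediction model of Section 4.2, and the uniform case corresponds to Random Noise:

```python
import random

def exact_targeting(users, searched, y):
    """ET: users whose profile contains a search for airport y."""
    return {u for u in users if y in searched[u]}

def with_noise(users, searched, y, p, scorer=None, seed=0):
    """RN/SN: keep a fraction p of the exact audience and fill the
    rest with noisy users from the complement. With scorer=None the
    noise is uniform (Random Noise); otherwise the highest-scoring
    complement users are taken (Smart Noise)."""
    rng = random.Random(seed)
    exact = sorted(exact_targeting(users, searched, y))
    complement = sorted(set(users) - set(exact))
    n_exact = round(p * len(exact))
    n_noise = len(exact) - n_exact
    kept = rng.sample(exact, n_exact)
    if scorer is None:
        noise = rng.sample(complement, min(n_noise, len(complement)))
    else:
        noise = sorted(complement, key=scorer, reverse=True)[:n_noise]
    return set(kept) | set(noise)
```

A data buyer receiving the returned set cannot tell which members came from `kept` and which from `noise`, which is the privacy mechanism at work here.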
4.2 Smart Noise Prediction Model
As discussed before (Equation 5), for each user in a
record/campaign we generate a feature vector x⃗ based on the
user profile u⃗ and the target y. Fig. 3 illustrates running the
campaign "find an audience who will fly to SFO". Each user
is represented by the search behaviors before the campaign.
From these we generated a 20-dimensional feature vector x⃗,
where each dimension captures a specific similarity between
the searched airports and the converted airport.
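Based on the feature names in Fig. 6, the 20 dimensions pair four reference airports from the user profile with five similarity types. A sketch, where `similarity` and `popularity` are hypothetical helpers standing in for the distance, attraction-style, precipitation, temperature, and overall measures:

```python
from collections import Counter

# Five similarity types per reference airport (cf. Fig. 6): distance,
# attraction style, precipitation, temperature, and overall.
KINDS = ["DIST", "STYLE", "PRECIP", "TEMP", "OV"]

def feature_vector(searches, target, similarity, popularity):
    """Build the 20-dimensional vector x for one user and target y.
    `similarity(a, b, kind)` and `popularity(a)` are hypothetical
    helpers; the reference airports follow the naming of Fig. 6."""
    refs = {
        "LT": searches[-1],                            # latest searched
        "ER": searches[0],                             # earliest searched
        "MP": max(set(searches), key=popularity),      # most popular
        "MF": Counter(searches).most_common(1)[0][0],  # most frequent
    }
    # 4 reference airports x 5 similarity kinds = 20 dimensions
    return [similarity(refs[r], target, k)
            for r in ("LT", "ER", "MP", "MF") for k in KINDS]
```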
We used 50% of the data for training and 50% for evaluation,
applying 5-fold cross-validation on the training set. We
used the well-known Gradient Boosting Trees (GBT) with
bagging to build the prediction model based on the training
data. We chose GBT because it can learn interactions among
features and generalizes well in various applications. For each
testing record, we first predict the probability that each user
will convert to the airport specified in the campaign, and
then add the top-ranked users as smart noise.

Table 1: Symbols Used for Evaluation
Symbol   Description
Ωt       set of targeted users
Nt       number of targeted users, |Ωt|
Ωet      users identified by exact targeting
Ω̄et      complement of Ωet
Ωr       noisy users randomly chosen from Ω̄et
Ωs       noisy users chosen from Ω̄et in a smart manner
Ωct      set of converted users
4.3 Evaluation Metric
The effectiveness of the audience targeting strategies is
measured by the Conversion Rate, defined as

CR = |Ωct| / |Ωt|    (7)

where Ωct and Ωt denote the converted audience and the
targeted audience, respectively. Ωt can be computed by any
of the targeting strategies considered here. For ease of
exposition, we use the symbols defined in Tab. 1. For exact
targeting, the targeted audience consists of all users identified
by exact targeting, namely Ωt = Ωet. For the random noise
and smart noise injection approaches, the final campaign user
set is a mixture of p% × |Ωet| exact users randomly sampled
from Ωet and (100 − p)% × |Ωet| noisy users Ωr/s chosen by
each method's own selection strategy:

Ωr/s = smp_r/s(Ω̄et, (100 − p)% × |Ωet|)    (8)
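Eq. (7) amounts to a one-line set computation; the following sketch uses Python sets for the targeted and converted audiences:

```python
def conversion_rate(targeted, converted):
    """Eq. (7): the fraction of the targeted audience that converted."""
    if not targeted:
        return 0.0
    return len(targeted & converted) / len(targeted)

# Four targeted users, two of whom converted: CR = 0.5.
cr = conversion_rate({1, 2, 3, 4}, {2, 4, 9})
```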
5. EXPERIMENTAL RESULTS
We compared the performance of the Exact Targeting (i.e. no
privacy protection), Random Noise, and Smart Noise methods.
As discussed before (Equation 2), the belief (privacy
level) can be tuned by adjusting the ratio between exact
and noisy audience. At the same time, changing this ratio
also impacts the effectiveness of the Smart Noise approach.
To examine the effectiveness/privacy trade-off, we evaluated
all three behavioral targeting methods under various noise
levels. For each method at each noise level, the average
conversion rate over all campaigns is reported (Fig. 4). The
Smart Noise method is clearly and consistently better than
the Random Noise method.
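The sweep behind Fig. 4 can be sketched as follows; `build_audience` is a hypothetical hook wrapping any one of the three targeting strategies at a given noise level:

```python
def average_cr(campaigns, build_audience, noise_level):
    """Average conversion rate (Eq. 7) over all campaigns at one
    noise level. `campaigns` yields (converted_users, candidate_users)
    pairs; `build_audience` wraps one targeting strategy."""
    rates = []
    for converted, candidates in campaigns:
        targeted = build_audience(candidates, noise_level)
        rates.append(len(targeted & converted) / len(targeted))
    return sum(rates) / len(rates)
```

Running this once per (strategy, noise level) pair yields the curves of Fig. 4.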
5.1 Further Analysis
Airport (Campaign) Examples
Airports differ in popularity among travelers. In this paper
we measure the popularity of an airport by the number of
records that converted to it. We are interested in how this
popularity factor affects performance. We present the results
for SFO (San Francisco International Airport), SEA
(Seattle-Tacoma International Airport), and SJC (Mineta
San Jose International Airport) in Fig. 5. The three airports
differ widely in popularity (SFO > SEA > SJC): in this data
set, 3053 segments converted to SFO, 2179 converted to SEA,
and 789 converted to SJC, while 4189 segments contain
searches for SFO, 2833 contain searches for SEA, and 1500
contain searches for SJC. For each airport, we calculated the
conversion rate under three noise levels (0.3, 0.6, and 0.8,
respectively). Each plot in Fig. 5 corresponds to one airport
and one noise level, as indicated in the plot title.

[Figure 5: nine panels of conversion-rate curves for Exact
Targeting, Random Noise, and Smart Noise: (a) SFO - 0.3,
(b) SFO - 0.6, (c) SFO - 0.8, (d) SEA - 0.3, (e) SEA - 0.6,
(f) SEA - 0.8, (g) SJC - 0.3, (h) SJC - 0.6, (i) SJC - 0.8.]
Figure 5: How conversion rates vary with recall for different
noise levels. For each graph, the horizontal axis denotes the
number of users targeted/returned to the data buyer and the
vertical axis denotes the conversion rate.
We have three major observations. First, the conversion
rate decreases as both the number of targeted users and the
noise level increase. The plots also show that Smart Noise
injection works better for the popular airport (SFO) than for
the others (SEA and SJC). A possible reason is that the
popular airport SFO may have more high-quality noisy users
(i.e. users who will convert to the airport), and thus the
top-ranked smart noise for SFO is of higher quality. Second,
we found that Smart Noise performs better than exact
targeting in some cases (Fig. 5(a)-(c)). This suggests that
the top-ranked list from the Smart Noise prediction is better
than the audience manually selected for the campaign.
Third, we were surprised to see that the conversion rate
(accuracy) goes up at certain points in some of the plots. A
closer look at the prediction results reveals that this is due
to insufficient representation of the users. For example, the
jump from 900 to 1050 in plot (h) is caused by a large
number of users who share the same airport search sequence
OAK → OAK → OAK: since their feature vectors x⃗ are
identical, all of these users occupy the same region (around
positions 900-1050) of the smart noise ranking list. Some of
them went (i.e. converted) to SJC and many others went to
other airports nearby.6 In this case, additional information
about the users is needed to better represent and differentiate
them in order to make accurate or smooth predictions.
We used GBT to learn the prediction model in our
experiments; however, the proposed idea is not restricted to
a specific machine learning technique, and GBT can be
replaced by other techniques depending on the specific
problem.
Influences of Features
We further examined the learned gradient boosting model
to see the influence of each input feature. As shown in Fig. 6,
LT-DIST and ER-DIST are the most influential features.
This is as expected: if there are multiple airports near the
city a user wants to visit, the user might search for all of
those airports and finally fly to one of them. ER-STYLE,
MF-STYLE, and LT-PRECIP are also useful features;
although they are much weaker than the distance features,
they still move the performance
6 OAK, SJC, and SFO are airports near each other.

[Figure 4: average conversion rate vs. noise level (0.1 to 1.0)
for Exact Targeting, Random Noise, and Smart Noise.]
Figure 4: Average conversion rate over all 255 airports.
needle. This is not surprising, since some people on vacation
might choose among cities with similar climates or similar
attractions.
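With a scikit-learn GBT (an assumed stand-in for the paper's implementation), the influence scores of Fig. 6 correspond to the fitted model's `feature_importances_`. A sketch on synthetic data with illustrative feature names:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
# Conversion is driven almost entirely by the first feature, so its
# influence should dominate, mirroring LT-DIST in Fig. 6.
y = (2 * X[:, 0] + rng.normal(scale=0.1, size=300) > 0).astype(int)

gbt = GradientBoostingClassifier(random_state=0).fit(X, y)
names = ["LT-DIST", "ER-DIST", "ER-STYLE", "MF-STYLE", "LT-PRECIP"]
influence = sorted(zip(names, gbt.feature_importances_),
                   key=lambda t: -t[1])
```

In scikit-learn the importances are normalized to sum to one, so they give a relative rather than absolute measure of influence.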
6. CONCLUSION AND FUTURE WORK
The advertising industry and its encompassing ecosystem
are increasingly moving towards higher levels of security and
consumer privacy. However, privacy concerns still exist and
are often mistakenly framed as a zero-sum game against
advertising performance. In this paper we proposed a data
mining method to handle data sharing privacy issues in
behavioral targeting. We tested our method on a very large
behavioral targeting dataset collected from a major travel
web site and ran location-based data campaigns over all 255
major United States airports, obtaining conversion rates at
various noise percentages. We show that the method
scales well with data size and attribute dimensionality. It
is clear that the smart noise strategy is consistently better
than uniform random noise injection. We also observed that
performance improvements vary across different campaigns
(i.e. airports). Instead of having to always trade off privacy
for utility, in certain cases we can even obtain more utility
(i.e. higher conversion rate) than exact data campaigns, by
inserting smart noise (e.g. in the case of “SFO travelers”).
This paper opens up a new direction in the advertising
world by showing that it is possible to pursue privacy and
performance simultaneously. We expect more methods to
emerge and hope the field will grow and mature as a result.
Our experiments are based on data campaigns targeting
travelers to specific destinations. Although we expect a
similar outcome when applied to other data types (e.g.
geographic, demographic, etc.), additional future work
[Figure 6: horizontal bar chart of feature influence (scale 0
to 60). LT-DIST and ER-DIST have by far the largest
influence, followed by ER-STYLE, MF-DIST, MP-DIST, and
MF-STYLE; the remaining features contribute little.]
Figure 6: Influence of each feature. Each name con-
tains the specific types of airports searched and in-
formation used to calculate the feature. LT, ER,
MP, MF denote latest, earliest, most popular, most
frequent airports in the user profile (i.e. the user
airport search sequence); DIST, STYLE, PRECIP,
TEMP denote the distance, attraction style similar-
ity, precipitation and temperature respectively. For
example, LT-DIST means the distance similarity between
the latest airport searched and the target airport.
is needed to validate this assumption. The proposed tech-
nique also needs to be adjusted and optimized for different
verticals (e.g. automobiles, retail, education, etc.). In our
experiments, we described a way to generate features and
take advantage of behavioral data richness. Even though
in our case the work focuses on features in the travel data
realm, it can be generalized and adapted to other applica-
tions. The regression function is not limited to GBT and
can be replaced by other alternatives depending on the ap-
plication. Our new approach is not specific to data sharing
between data aggregators/exchanges and data buyers, and
we are interested in applying it to data sharing between data
suppliers and data exchanges and other similar scenarios.
7. REFERENCES
[1] R. Agrawal and R. Srikant. Privacy-preserving data
mining. In ACM Sigmod Record, volume 29, pages
439–450. ACM, 2000.
[2] Amazon. Amazon mom program.
http://www.amazon.com/gp/mom/signup/info (last
visited: Jan 2012).
[3] B. Barak, K. Chaudhuri, C. Dwork, S. Kale,
F. McSherry, and K. Talwar. Privacy, accuracy, and
consistency too: a holistic solution to contingency
table release. In Proceedings of the twenty-sixth ACM
SIGMOD-SIGACT-SIGART Symposium on Principles
of Database Systems, pages 273–282. ACM, 2007.
[4] M. Bilenko and M. Richardson. Predictive client-side
profiles for personalized advertising. In Proceedings of
17th ACM SIGKDD Conference on Knowledge
Discovery and Data Mining (KDD-11), 2011.
[5] I. Dinur and K. Nissim. Revealing information while
preserving privacy. In Proceedings of the twenty-second
ACM SIGMOD-SIGACT-SIGART symposium on
Principles of database systems, pages 202–210. ACM,
2003.
[6] C. Dwork. Differential privacy. Automata, languages
and programming, pages 1–12, 2006.
[7] C. Dwork. Differential privacy. In Automata,
Languages and Programming, volume 4052 of Lecture
Notes in Computer Science, pages 1–12. Springer
Berlin / Heidelberg, 2006.
[8] C. Dwork, F. McSherry, K. Nissim, and A. Smith.
Calibrating noise to sensitivity in private data
analysis. Theory of Cryptography, pages 265–284, 2006.
[9] S. Fienberg and J. McIntyre. Data swapping:
Variations on a theme by dalenius and reiss. In
Privacy in Statistical Databases, pages 519–519.
Springer, 2004.
[10] A. Goldfarb and C. Tucker. Privacy regulation and
online advertising. Management Science, 57(1):57–71,
2011.
[11] S. Hansell. Aol removes search data on vast group of
web users. New York Times, 8:C4, 2006.
[12] K. LeFevre, D. DeWitt, and R. Ramakrishnan.
Incognito: Efficient full-domain k-anonymity. In
Proceedings of the 2005 ACM SIGMOD international
conference on Management of data, pages 49–60.
ACM, 2005.
[13] N. Li, T. Li, and S. Venkatasubramanian. t-closeness:
Privacy beyond k-anonymity and l-diversity. In Data
Engineering, 2007. ICDE 2007. IEEE 23rd
International Conference on, pages 106–115. IEEE,
2007.
[14] A. Machanavajjhala, D. Kifer, J. Gehrke, and
M. Venkitasubramaniam. l-diversity: Privacy beyond
k-anonymity. ACM Transactions on Knowledge
Discovery from Data (TKDD), 1(1):3, 2007.
[15] D. Martin, D. Kifer, A. Machanavajjhala, J. Gehrke,
and J. Halpern. Worst-case background knowledge for
privacy-preserving data publishing. In Data
Engineering, 2007. ICDE 2007. IEEE 23rd
International Conference on, pages 126–135. IEEE,
2007.
[16] J. Mayer and A. Narayanan. Do not track: Universal
web tracking opt out. http://donottrack.us/.
[17] F. McSherry and I. Mironov. Differentially private
recommender systems: building privacy into the net.
In KDD ’09: Proceedings of the 15th ACM SIGKDD
international conference on Knowledge discovery and
data mining, pages 627–636. ACM, 2009.
[18] N. Mohammed, R. Chen, B. Fung, and S. Philip.
Differentially private data release for data mining.
Engineer, 18(40):2, 2011.
[19] S. Nabar, K. Kenthapadi, N. Mishra, and R. Motwani.
A survey of query auditing techniques for data privacy.
Privacy-Preserving Data Mining, pages 415–431, 2008.
[20] A. Narayanan and V. Shmatikov. How to break
anonymity of the Netflix Prize dataset. CoRR, 2006.
[21] Netflix. Netflix Prize, 2006–2009.
http://www.netflixprize.com/.
[22] F. Provost, B. Dalessandro, R. Hook, X. Zhang, and
A. Murray. Audience selection for on-line brand
advertising: privacy-friendly social network targeting.
In Proceedings of the 15th ACM SIGKDD
international conference on Knowledge discovery and
data mining, KDD ’09, pages 707–716, New York, NY,
USA, 2009. ACM.
[23] A. Roth and T. Roughgarden. Interactive privacy via
the median mechanism. In Proceedings of the 42nd
ACM symposium on Theory of computing, pages
765–774. ACM, 2010.
[24] P. Samarati. Protecting respondents’ identities in
microdata release. IEEE Transactions on Knowledge
and Data Engineering, pages 1010–1027, 2001.
[25] L. Sweeney et al. k-anonymity: A model for protecting
privacy. International Journal of Uncertainty
Fuzziness and Knowledge Based Systems,
10(5):557–570, 2002.
[26] R. Wong, J. Li, A. Fu, and K. Wang. (α,
k)-anonymity: an enhanced k-anonymity model for
privacy preserving data publishing. In Proceedings of
the 12th ACM SIGKDD international conference on
Knowledge discovery and data mining, pages 754–759.
ACM, 2006.
[27] X. Xiao, G. Wang, and J. Gehrke. Differential privacy
via wavelet transforms. In Data Engineering (ICDE),
2010 IEEE 26th International Conference on, pages
225–236, March 2010.
[28] Q. Zhang, N. Koudas, D. Srivastava, and T. Yu.
Aggregate query answering on anonymized tables. In
Data Engineering, 2007. ICDE 2007. IEEE 23rd
International Conference on, pages 116–125. IEEE,
2007.

More Related Content

What's hot

Paradigm shift in advertising, Perceptive media, Behavioural Advertisment
Paradigm shift in advertising, Perceptive media, Behavioural AdvertismentParadigm shift in advertising, Perceptive media, Behavioural Advertisment
Paradigm shift in advertising, Perceptive media, Behavioural Advertisment
miteshb89
 
Playbook The Ins and Outs of First Party Data_final
Playbook The Ins and Outs of First Party Data_finalPlaybook The Ins and Outs of First Party Data_final
Playbook The Ins and Outs of First Party Data_final
Donna Chang
 

What's hot (15)

Online Behavioral Advertising (OBA) Legal & Regulatory Compliance
Online Behavioral Advertising (OBA) Legal & Regulatory ComplianceOnline Behavioral Advertising (OBA) Legal & Regulatory Compliance
Online Behavioral Advertising (OBA) Legal & Regulatory Compliance
 
Paradigm shift in advertising, Perceptive media, Behavioural Advertisment
Paradigm shift in advertising, Perceptive media, Behavioural AdvertismentParadigm shift in advertising, Perceptive media, Behavioural Advertisment
Paradigm shift in advertising, Perceptive media, Behavioural Advertisment
 
Thomvest Native Advertising Overview
Thomvest Native Advertising OverviewThomvest Native Advertising Overview
Thomvest Native Advertising Overview
 
Trusted Download Program: A Year in the Trenches - How Trusted Downloads Make...
Trusted Download Program: A Year in the Trenches - How Trusted Downloads Make...Trusted Download Program: A Year in the Trenches - How Trusted Downloads Make...
Trusted Download Program: A Year in the Trenches - How Trusted Downloads Make...
 
A Byte of Programmatic Buying
A Byte of Programmatic BuyingA Byte of Programmatic Buying
A Byte of Programmatic Buying
 
TRUSTe whitepaper- A Checklist of Practices that Impact Consumer Trust
TRUSTe whitepaper- A Checklist of Practices that Impact Consumer TrustTRUSTe whitepaper- A Checklist of Practices that Impact Consumer Trust
TRUSTe whitepaper- A Checklist of Practices that Impact Consumer Trust
 
Case Study: Natura
Case Study: NaturaCase Study: Natura
Case Study: Natura
 
Playbook The Ins and Outs of First Party Data_final
Playbook The Ins and Outs of First Party Data_finalPlaybook The Ins and Outs of First Party Data_final
Playbook The Ins and Outs of First Party Data_final
 
Lead Generation in B2B Communities
Lead Generation in B2B CommunitiesLead Generation in B2B Communities
Lead Generation in B2B Communities
 
RESEARCH PROPOSAL
RESEARCH PROPOSALRESEARCH PROPOSAL
RESEARCH PROPOSAL
 
Internet advertising
Internet advertisingInternet advertising
Internet advertising
 
FirstPartner Data Driven Marketing Market Map 2014
FirstPartner Data Driven Marketing Market Map 2014FirstPartner Data Driven Marketing Market Map 2014
FirstPartner Data Driven Marketing Market Map 2014
 
Integrating Offline Marketing Strategies Into the Digital World, Jan Jindra, ...
Integrating Offline Marketing Strategies Into the Digital World, Jan Jindra, ...Integrating Offline Marketing Strategies Into the Digital World, Jan Jindra, ...
Integrating Offline Marketing Strategies Into the Digital World, Jan Jindra, ...
 
Get Advertising Smart - Transforming Customer Relationships with the GDPR
Get Advertising Smart - Transforming Customer Relationships with the GDPRGet Advertising Smart - Transforming Customer Relationships with the GDPR
Get Advertising Smart - Transforming Customer Relationships with the GDPR
 
Members, Subcommittee on Digital Commerce and Consumer Protection Re: Underst...
Members, Subcommittee on Digital Commerce and Consumer Protection Re: Underst...Members, Subcommittee on Digital Commerce and Consumer Protection Re: Underst...
Members, Subcommittee on Digital Commerce and Consumer Protection Re: Underst...
 

Viewers also liked

Vì đâu dân văn phòng bị thoái hóa đốt sống cổ
Vì đâu dân văn phòng bị thoái hóa đốt sống cổVì đâu dân văn phòng bị thoái hóa đốt sống cổ
Vì đâu dân văn phòng bị thoái hóa đốt sống cổ
freeman352
 
cần bán đồng hồ casio thể thao
cần bán đồng hồ casio thể thaocần bán đồng hồ casio thể thao
cần bán đồng hồ casio thể thao
freeman352
 
Why is EducationalTechnology Important ?
Why is EducationalTechnology Important ?Why is EducationalTechnology Important ?
Why is EducationalTechnology Important ?
Catherine Bacalso
 
cửa hàng bán đồng hồ casio uy tín
cửa hàng bán đồng hồ casio uy tíncửa hàng bán đồng hồ casio uy tín
cửa hàng bán đồng hồ casio uy tín
teisha506
 
trung tâm bán đồng hồ casio kim
trung tâm bán đồng hồ casio kimtrung tâm bán đồng hồ casio kim
trung tâm bán đồng hồ casio kim
teodoro529
 
cửa hàng bán đồng hồ casio cao cấp
cửa hàng bán đồng hồ casio cao cấpcửa hàng bán đồng hồ casio cao cấp
cửa hàng bán đồng hồ casio cao cấp
neville371
 
tư vấn mua đồng hồ casio ở tphcm
tư vấn mua đồng hồ casio ở tphcmtư vấn mua đồng hồ casio ở tphcm
tư vấn mua đồng hồ casio ở tphcm
jone403
 

Viewers also liked (10)

Vì đâu dân văn phòng bị thoái hóa đốt sống cổ
Vì đâu dân văn phòng bị thoái hóa đốt sống cổVì đâu dân văn phòng bị thoái hóa đốt sống cổ
Vì đâu dân văn phòng bị thoái hóa đốt sống cổ
 
cần bán đồng hồ casio thể thao
cần bán đồng hồ casio thể thaocần bán đồng hồ casio thể thao
cần bán đồng hồ casio thể thao
 
Why is EducationalTechnology Important ?
Why is EducationalTechnology Important ?Why is EducationalTechnology Important ?
Why is EducationalTechnology Important ?
 
Why Choose Ireland for Meetings and Incentives
Why Choose Ireland for Meetings and IncentivesWhy Choose Ireland for Meetings and Incentives
Why Choose Ireland for Meetings and Incentives
 
cửa hàng bán đồng hồ casio uy tín
cửa hàng bán đồng hồ casio uy tíncửa hàng bán đồng hồ casio uy tín
cửa hàng bán đồng hồ casio uy tín
 
trung tâm bán đồng hồ casio kim
trung tâm bán đồng hồ casio kimtrung tâm bán đồng hồ casio kim
trung tâm bán đồng hồ casio kim
 
Master Limited Partnership - How do They Work?
Master Limited Partnership - How do They Work?Master Limited Partnership - How do They Work?
Master Limited Partnership - How do They Work?
 
Go on, Take the Money and Run! How to get the most out of reimbursement claims.
Go on, Take the Money and Run! How to get the most out of reimbursement claims.Go on, Take the Money and Run! How to get the most out of reimbursement claims.
Go on, Take the Money and Run! How to get the most out of reimbursement claims.
 
cửa hàng bán đồng hồ casio cao cấp
cửa hàng bán đồng hồ casio cao cấpcửa hàng bán đồng hồ casio cao cấp
cửa hàng bán đồng hồ casio cao cấp
 
tư vấn mua đồng hồ casio ở tphcm
tư vấn mua đồng hồ casio ở tphcmtư vấn mua đồng hồ casio ở tphcm
tư vấn mua đồng hồ casio ở tphcm
 

Similar to a6-zhao

Behavioral Targeting and Dynamic Content creation
Behavioral Targeting and Dynamic Content creationBehavioral Targeting and Dynamic Content creation
Behavioral Targeting and Dynamic Content creation
Simon Hjorth
 
Boosting impact bcg
Boosting impact bcgBoosting impact bcg
Boosting impact bcg
AdCMO
 
AB1401-SEM5-GROUP4
AB1401-SEM5-GROUP4AB1401-SEM5-GROUP4
AB1401-SEM5-GROUP4
Jeremy Chia
 
DLS_Electronic_Notice_Audit_Report
DLS_Electronic_Notice_Audit_ReportDLS_Electronic_Notice_Audit_Report
DLS_Electronic_Notice_Audit_Report
Todd B. Hilsee
 

Similar to a6-zhao (20)

What Is Behavioral Targeting? - CBS News
What Is Behavioral Targeting? - CBS NewsWhat Is Behavioral Targeting? - CBS News
What Is Behavioral Targeting? - CBS News
 
What Is Behavioral Targeting? - CBS News
What Is Behavioral Targeting? - CBS NewsWhat Is Behavioral Targeting? - CBS News
What Is Behavioral Targeting? - CBS News
 
What Is Behavioral Targeting? - CBS News
What Is Behavioral Targeting? - CBS NewsWhat Is Behavioral Targeting? - CBS News
What Is Behavioral Targeting? - CBS News
 
Behavioral Targeting and Dynamic Content creation
Behavioral Targeting and Dynamic Content creationBehavioral Targeting and Dynamic Content creation
Behavioral Targeting and Dynamic Content creation
 
IAB Europe Report: Using Data Effectively in Programmatic V2.0 (GDPR Update)
IAB Europe Report: Using Data Effectively in Programmatic V2.0 (GDPR Update)IAB Europe Report: Using Data Effectively in Programmatic V2.0 (GDPR Update)
IAB Europe Report: Using Data Effectively in Programmatic V2.0 (GDPR Update)
 
Winterberry group the state of consumer data onboarding november 2016
Winterberry group   the state of consumer data onboarding november 2016Winterberry group   the state of consumer data onboarding november 2016
Winterberry group the state of consumer data onboarding november 2016
 
Big Data, Analytics and Data Science
Big Data, Analytics and Data ScienceBig Data, Analytics and Data Science
Big Data, Analytics and Data Science
 
All That Glitters Is Not Gold Digging Beneath The Surface Of Data Mining
All That Glitters Is Not Gold  Digging Beneath The Surface Of Data MiningAll That Glitters Is Not Gold  Digging Beneath The Surface Of Data Mining
All That Glitters Is Not Gold Digging Beneath The Surface Of Data Mining
 
Data Monetization: Leveraging Subscriber Data to Create New Opportunities
Data Monetization: Leveraging Subscriber Data to Create New OpportunitiesData Monetization: Leveraging Subscriber Data to Create New Opportunities
Data Monetization: Leveraging Subscriber Data to Create New Opportunities
 
Collective 2010-display-study
Collective 2010-display-studyCollective 2010-display-study
Collective 2010-display-study
 
How to Safely Scrape Data from Social Media Platforms and News Websites.pptx
How to Safely Scrape Data from Social Media Platforms and News Websites.pptxHow to Safely Scrape Data from Social Media Platforms and News Websites.pptx
How to Safely Scrape Data from Social Media Platforms and News Websites.pptx
 
Boosting impact bcg
Boosting impact bcgBoosting impact bcg
Boosting impact bcg
 
Google and Boston Consulting Group Case Study
Google and Boston Consulting Group Case StudyGoogle and Boston Consulting Group Case Study
Google and Boston Consulting Group Case Study
 
A Practitioner’s Guide to Web Analytics: Designed for the B-to-B Marketer
A Practitioner’s Guide to Web Analytics: Designed for the B-to-B MarketerA Practitioner’s Guide to Web Analytics: Designed for the B-to-B Marketer
A Practitioner’s Guide to Web Analytics: Designed for the B-to-B Marketer
 
AB1401-SEM5-GROUP4
AB1401-SEM5-GROUP4AB1401-SEM5-GROUP4
AB1401-SEM5-GROUP4
 
Online macro environment, The digital marketing environment
Online macro environment, The digital marketing environmentOnline macro environment, The digital marketing environment
Online macro environment, The digital marketing environment
 
How to Safely Scrape Data from Social Media Platforms and News Websites.pdf
How to Safely Scrape Data from Social Media Platforms and News Websites.pdfHow to Safely Scrape Data from Social Media Platforms and News Websites.pdf
How to Safely Scrape Data from Social Media Platforms and News Websites.pdf
 
Getting attribution right
Getting attribution rightGetting attribution right
Getting attribution right
 
DLS_Electronic_Notice_Audit_Report
DLS_Electronic_Notice_Audit_ReportDLS_Electronic_Notice_Audit_Report
DLS_Electronic_Notice_Audit_Report
 
Big data unit i
Big data unit iBig data unit i
Big data unit i
 

a6-zhao

  • 1. Have Your Cake and Eat It Too! Preserving Privacy while Achieving High Behavioral Targeting Performance Qi Zhao, Yi Zhang School of Engineering University of California Santa Cruz {manazhao,yiz}@soe.ucsc.edu Lucian Vlad Lita Blue Kai, Inc Cupertino, CA 95014 lucian@bluekai.com ABSTRACT Privacy is a major concern for Internet users and Internet policy regulators. Privacy violations usually entail either sharing Personally Identifying Information (PII) or non-PII information such as a site visitor’s behavior on a website. On the other hand, Internet advertising through behavioral targeting is an important part of the Internet ecosystem, as it provides users more relevant information and enables content/data providers to provide free services to end users. In order to achieve effective behavioral targeting, it is de- sirable for the advertisers to access a set of users with the targeted behaviors. A key question is how should data flow from a provider (e.g. publisher) to a third party advertiser to achieve effective behavioral targeting, all while without directly sharing exact user behavior data. This paper at- tempts to answer this question and proposes a privacy pre- serving technique for behavioral targeting that does not en- tail a drastic reduction in advertising effectiveness. When behavioral targeting data is transferred to an advertiser, we use a smart, data mining-based noise injection method that perturbs the results (a set of users meeting specified cri- teria) by carefully adding noisy data points that maintain a high level of performance. Upon receiving the data, the advertiser cannot distinguish accurate data points adhering to specifications, versus noisy data, which does not meet the specifications. Using data from a major US top Online Travel Agent (OTA), we evaluated the proposed technique for location-based behavioral targeting, whereby advertisers run data campaigns targeting travelers for specific destina- tions. 
Our experimental results demonstrate that such data campaigns obtain results that enhance or preserve user pri- vacy while maintaining a high level of targeting performance. Categories and Subject Descriptors H.4 [Information Systems Applications]: Miscellaneous Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ADKDD’12, August, 12–16, Beijing, China Copyright 2012 ACM 978-1-4503-1545-6/12/08 ...$15.00. General Terms Algorithms, Performance, Security Keywords Behavioral targeting, privacy, data mining 1. INTRODUCTION Recently, we have observed an exponential growth in the number of web services spanning search engines and e-commerce websites and propagating across multiple verticals. In most cases we, the users, provide personal information in order to enjoy these Internet services. For example, we willingly register our demographic and geographic information with social networks like Facebook so our friends can wish us a happy birthday and follow us virtually on our vacations. We want and sometimes need to have our purchase history in our Amazon and eBay accounts for convenience and discounts [2]. We rate movies in exchange for better movie recom- mendations and higher enjoyment levels since for example, Netflix now learns our cinematic preferences from our and our friends’ movie rating activities. The high availability of massive individual information on the Internet also makes it a perfect place for delivering both content and advertisements. 
Through leveraging web users’ search history, purchase history, demographic, geographic location and other miscellaneous information, marketers are able to identify those who are interested in their products. This process is called behavioral targeting and is widely em- ployed by most successful sites. Behavioral targeting helps marketers gain advertising effectiveness and reduce their ad- vertising budget. It focuses advertising spending towards consumers who are already in market for a product or are likely to purchase it. Since advertisers spend less on serv- ing media (ads) in general, they can afford to offer more aggressive discounts, thus bluring the line between content and advertising. Behavioral targeting is also an important part of the Internet ecosystem, as it enables many Internet service providers to provide free services - from email and social networking to games, discussion forums and produc- tivity tools. As a side effect, it also improves consumers’ online experience since it reduces friction, clutter, and by- passes irrelevant commercial ads. While acknowledging the value of behavioral targeting, privacy issues are of major concern since data sharing (via partnerships, data exchanges, or direct deals) is the fuel for content, media, and context optimization. Although users’ Personal Identifiable Information (PII) – such as SSN, email
address and telephone number - is usually removed in behavioral targeting, it is sometimes still possible to identify specific users through non-PII data analysis. Real-world privacy breach cases have occurred even though only non-PII data was released. The Netflix competition [21] is one such example: Netflix released a set of anonymized film ratings to the public for the purpose of improving movie prediction accuracy. Though the dataset was anonymized before being released, researchers were still able to identify some of the profile owners by leveraging external datasets such as the Internet Movie Database (IMDB) [20]. A similar privacy infringement happened after AOL publicly released search data [11]. Clearly, great caution is required when releasing, sharing or selling consumer data, whether PII or non-PII such as a site visitor's browsing actions/behavior.

The widespread use of behavioral targeting is increasing consumer privacy concerns, triggering protests from privacy and consumer advocacy groups, and leading governments in several countries to pass laws regulating the usage of online consumer data [10]. As pointed out by Goldfarb et al. [10], reduced use of behavioral targeting will lead to a considerable drop in advertising effectiveness. Without effective behavioral targeting, the Internet content providers that subsidize their costs through advertising might become unable to continue providing free services to web users due to a reduction in their advertising revenue stream. In an extreme case, web users could be asked to pay for services and content, or alternately be exposed to highly irrelevant and irritating advertisements. As a consequence, both privacy concerns and advertising effectiveness are important, and neither can be ignored.
The end goal is to converge towards building an ecosystem with a high privacy bar, low irrelevant content, high advertising efficiency, and no friction among publishers, advertisers, and consumers.

This paper focuses on addressing privacy issues in behavioral targeting by reducing the trade-off against advertising effectiveness and making this trade-off explicit. Before delving into the discussion, it is worthwhile gaining an understanding of the current online behavioral targeting ecosystem. Online behavioral targeting is a very large and complex industry; for the discussion at hand, we can artificially group the stakeholders into three major groups: data suppliers, data aggregators and data buyers (Fig. 1).

Figure 1: Overview of the current online advertising ecosystem (data flows from Data Suppliers through Data Aggregators/DMPs to Data Buyers). Consumer data is traded online, frequently with the end goal of placing ads that lead to branding or conversions. A data supplier owns data sources or serves content on websites frequented by the consumer. A data buyer will use the data to optimize media (ads) they place on publisher sites, usually via entities such as ad networks, ad exchanges, or demand side platforms. In some cases, large data suppliers and large ad networks also serve as data aggregators, while data suppliers frequently serve as publishers.

Data aggregators pool data from multiple data suppliers. The aggregated data covers a wide spectrum of web user facets including demographics, geography, interests, purchasing history, etc. To optimize media campaigns, data buyers turn to data aggregators and purchase relevant behavioral targeting data by running real-time data campaigns. For example, a marketer representing a car manufacturer will run a data campaign purchasing user ids known to represent women in market for new luxury cars from a data exchange. Subsequently, as the same users/consumers visit publisher sites, the marketer will serve more ads to the users known to be relevant (i.e. women in market for luxury cars). In this paper, we use audience and online users/consumers interchangeably. Here are two example requests from data buyers (i.e. advertisers/marketers):

TRAVEL: A travel marketer wants to show ads to a set of users who will travel to a specific destination. The marketer requests audience data (i.e. runs a data campaign) from a data aggregator; in this case, the audience is made of users intending to fly to the given destination or book hotels around that location.

AUTO: An auto marketer wants to show ads to a set of Internet users who are likely to buy a car in the near future. The marketer requests audience data from aggregators: consumers who have shown purchase intent for certain automobile brands, down to the specific make and model.

In both cases, the data campaigns are typically started by buyers directly through a web service or via a user interface. Simply stated, campaigns are equivalent to long-running queries issued by the buyer, running in a data exchange, and targeting audience behaviors. As data becomes available - i.e. as behaviors emerge - results are generated and corresponding micro-transactions occur. For clarity, in this paper we will treat data campaign specifications as goals of the form "find audience who exhibit purchase intent for X", with the corresponding query "find audience who exhibit behavior X". This excludes demographic and geographic data attributes such as age, gender, etc.1 We focus on audience purchase intent behavior for two reasons. First, it constitutes extremely valuable data, since it translates into highly discriminating variables leading to clicks and conversions, and as a consequence is the target of most campaigns running in data exchanges. Second, the problem setting is intuitive, straightforward (Section 3.1), and is very often ignored in the literature, since it is not considered PII.
In our problem setting, there are a limited number of data aggregators/exchanges who have achieved a significant scale in terms of data, and a large number of data buyers (marketers, advertisers, agencies) that target relevant user data by running data campaigns - i.e. running queries such as "find users who will travel to San Francisco". We assume that aggregators process the data, thereby filtering out PII content. We also assume data aggregators offer secure and

1 The technique is not specific to a given data type and we plan to expand on this aspect in future work.
trusted data platforms and are white-hat, certified entities in the ecosystem. Data buyers, on the other hand, vary in their level of security, trust, and ability. Due to the high number of buyers who purchase behavioral targeting data, privacy infringement, accidental or intentional, becomes more of a concern. Even though de-identification, PII content filtering, and data retention policies are a huge step towards maintaining privacy, more can be done. One might argue that since data buyers only target/purchase a limited set of user behaviors, it is unlikely that user identification could happen. However, an adversarial buyer may further aggregate data from multiple campaigns, public and private sources, and attempt to identify users by corroborating and linking different parts of their profiles [25]. The process is akin to the Netflix and AOL privacy breach cases addressed above.

One characteristic of these privacy breaches arises from the fact that data campaigns (or available datasets) are exact, even though user identity is obscured. In other words, user profiles are scrubbed, yet behavioral data campaigns/queries produce exact results with missing attributes. This makes it easy for technically savvy parties to join various datasets together and recover the missing attributes. One privacy preserving strategy that follows from this observation is to make this dataset joining problem impossible, or extremely difficult. We can do this by preventing data buyers from accurately reconstituting individual profiles if the audiences and datasets available to data campaigns contain uncertainty. With this motivation, we propose a smart noise injection method to alleviate privacy concerns in the behavioral targeting world. The method works by injecting noisy data into the originally matching audience.
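To make the joining risk concrete, here is a minimal sketch (our own illustrative example, not from the paper) of how a buyer could link exact results from two campaigns to enrich user profiles:

```python
# Two exact campaign results, keyed by (pseudonymous) user id.
searched_lexus = {"u1", "u2", "u3"}
travels_to_sfo = {"u2", "u3", "u4"}

# Joining the exact result sets immediately yields multi-attribute profiles:
both = searched_lexus & travels_to_sfo
print(sorted(both))  # ['u2', 'u3'] -- these users exhibit both behaviors with certainty

# With noise injected into each result set, membership no longer implies the
# behavior, so the same join only bounds the buyer's belief instead of
# recovering exact multi-attribute profiles.
```

Each additional exact campaign joined this way narrows the candidate set further, which is exactly what the noise injection below is designed to prevent.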
The buyer will be aware that the data is not 100 percent accurate, and will also know the signal to noise ratio, but will not be able to know with certainty which users match their criteria and which do not. At the same time, we also need to maintain the performance (e.g. conversion rate) of the obfuscated audience. To achieve this goal, noisy candidates are selected by a machine learning algorithm instead of being selected randomly.

Further details are presented in subsequent sections. Section 2 reviews related work and Section 3 presents the proposed method. In Section 4 we cover experimental design and evaluate the privacy preserving method using real world data.

2. RELATED WORK

Various Privacy Preserving Data Mining (PPDM) algorithms have been developed to combat privacy violations while minimizing the reduction of data utility [1, 24, 6]. PPDM can be traced back to the early 90s and it became a very active research topic early this century. Randomized methods and cryptographic methods are the two major approaches that let people do data mining without access to precise information in individual data records. In [1], R. Agrawal et al. proposed a randomization algorithm to prevent exposure of sensitive attributes. It works by perturbing the attribute values with additive noise. Since the probability distribution of the noise is known, the distribution of the original data can be recovered by applying Bayesian inference to the noise and the perturbed data. With the recovered distribution, a decision tree algorithm can be run on the synthesized data and obtain decent results in terms of privacy and classification accuracy. Data swapping is another method related to randomized perturbation in the sense of running data mining algorithms on the aggregate statistics of the original data. It works by swapping the attribute values across different records [9]. k-anonymity is another heavily studied PPDM method.
The basic idea of k-anonymity is to reduce the granularity of the attribute values such that every combination of values of quasi-identifiers will have at least k respondents [24]. This is achieved through generalization and suppression. Generalization means modifying the attribute value to a wider range and suppression means removing the attribute completely. Optimal k-anonymization is an NP-hard problem, and researchers have proposed approximate algorithms to find solutions more efficiently [12]. Although k-anonymity has gained great popularity, it suffers from subtle but severe privacy problems and can lead to insufficient diversity in sensitive attributes. To overcome these problems, several methods have been proposed, such as l-diversity [14], t-closeness [13], (α,k)-anonymity [26], (k,3)-anonymity [28] and (c,k)-safety [15]. One major problem of k-anonymity is that it cannot provide guaranteed privacy, as it cannot account for all possible background knowledge. In contrast, differential privacy [7] gives rigorous data privacy control without making assumptions about background knowledge, while providing data that enables effective data mining. To help data miners reveal accurate statistics about a population while preserving the privacy of individuals in the data, differential privacy ensures that the outputs for any two nearly identical input sets are nearly identical, and therefore eliminates the possibility of deriving individual attributes by differentiating the outputs of multiple queries.

Most of the existing PPDM methods address the scenario where data is shared between a trusted data holder and untrusted individuals. In most cases, data sharing is for the purpose of seeking the best data mining algorithm, which unveils the rationale concealed in the data. Privacy breaches occur when adversarial data recipients attempt to reconstruct the identity of the anonymized subjects in the shared data.
There are two major data sharing frameworks: the interactive framework [5, 8, 23, 17] and the non-interactive framework [3, 27, 18]. The interactive framework achieves privacy by perturbing the results of the query and limiting the number of queries. The non-interactive framework releases sanitized data that meets the differential privacy requirement. The setting of our problem can also be viewed as an interactive one: data buyers query the data aggregators for an audience (i.e. results or web users). Similar to differential privacy, our goal is to prevent the data buyer from knowing an individual's true behaviors.

Compared to existing algorithms, the setting of our problem, behavioral targeting, differs in the following ways:

Larger scale data. The subjects involved are online users in the universe.

Higher attribute dimensionality. Diverse user Internet activities, as mentioned earlier.

The way data is shared. Each data request is a specific requirement about targeted behavior(s). A subset (i.e. a set of Internet users) instead of the whole data (all users and their behavior information) is shared for each advertising campaign request.

Conversion rate as the utility of the data. The definition of data utility in our task is the conversion rate instead of the effectiveness of data mining.

As we will see later, these changes pose great challenges
to adopting or extending existing methods for behavioral targeting. On the other hand, the specific problem setting also gives us the opportunity to tailor the privacy preserving technique to optimize the performance of the task.

Preventing privacy violations in behavioral targeting is a problem that has been looked at by industry practitioners, government agencies, and the research community. Some countries have implemented very strict privacy regulations to restrict the collection and use of consumer data. Unfortunately, the side effects of these regulations are drastic. For example, after the Privacy Directive2 was passed, advertising effectiveness decreased significantly [10]. Jonathan Mayer et al. advocate using Do Not Track technology to enable users to opt out of tracking by websites they do not visit, including analytics services, advertising networks and social platforms [16]. Most personalization services store user profiles on the server side, which is out of the users' control. Instead of aggregating user profiles at the server site, Mikhail Bilenko et al. proposed storing user profiles as a cookie string on the client side [4]. This gives users complete control of their profile and leads to decreased privacy concerns. Toubiana et al. proposed a cryptographic billing system to enable behavioral targeting to take place in the user's browser. Provost et al. proposed a technique to find an audience for a brand by selecting the pseudo-social network (i.e. web page co-visiting network) neighbors of brand seed users, without saving information about the browser identities or page content [22]. However, most of the existing solutions do not solve the privacy issues in the data sharing scenario discussed before (Figure 1), although this is a common practice in the advertising industry.
This paper focuses on preserving privacy while sharing data for behavioral targeting, and tries to find a solution that can be practically implemented in, and perform well in, the existing advertising industry.

3. OUR APPROACH

3.1 Problem Setting

As shown in Fig. 1, the players in the computational advertising industry can be roughly grouped into three categories, namely, data suppliers, data aggregators and data buyers. A data aggregator holds a wide spectrum of web user information including demographics, search history, purchasing history, etc. The information serves as a source for data buyers to conduct campaigns. In a typical advertising campaign the data sharing process operates in three steps3: first, a data buyer proposes campaign requirements and expresses the requirements in the form of queries; second, the data buyer submits the queries to a data aggregator; finally, the data aggregator returns to the data buyer a set of web users who might meet the campaign requirements. We assume the data buyer states that the goal of a campaign is to "find audience who will do Y", and the queries are expressed in the form "find audience who have done X". If a returned web user later does do Y, we say this user is converted. The utility of the data sharing process is aligned with how well the goal is achieved, which is measured by conversion rate (more details in Section 4.3).

2 European Union "Privacy and Electronic Communications Directive"
3 Variations of this exist in the industry. For example, a data buyer may only submit the requirement/goal and the data aggregator tells the buyer the corresponding query. These variations won't affect the analysis of this paper much.

A reasonable assumption for the above setting is that the data aggregator is a trustworthy player who already owns the data, while the data buyer is not, since there are millions of possible buyers. When a data aggregator responds to each campaign query with a group of audience members (i.e.
web users), the data buyer gains information about each web user's behavior, which leads to possible privacy breaches. Though only a few attributes of web users could be learned from each single campaign, richer individual profiles may be obtained by joining multiple campaign results. This actually increases the likelihood that the data buyer derives sensitive information about web users using linking techniques [25].

3.2 Noise Injection

We found that the root cause of the privacy breach problem is that the returned audience satisfies the campaign criteria (i.e. the query) exactly. Motivated by this observation and by prior research on randomized methods for privacy preservation, we propose to obfuscate the exact audience by injecting a noisy audience that would not qualify for the campaign criteria. In other words, we include audience members who did not exhibit the requested behavior(s) in the returned set. The presence of the noisy audience reduces the data buyer's belief about an audience member's behavior.

Let Φ denote the set of all possible behaviors requested by data buyers. Each audience member can be represented as a binary vector b, where each dimension b_k corresponds to the kth behavior in Φ:4

    b_k = 1 if the member exhibits behavior Φ_k, 0 otherwise    (1)

The belief over each noise-perturbed audience member is measured as

    P(b_k = 1) = N(b_k = 1) / (N(b_k = 1) + N(b_k = 0))    (2)

where N is a function counting the number of audience members. Here we define belief and privacy as opposite endpoints of a scale. Adjusting the ratio between N(b_k = 1) and N(b_k = 0) leads to the desired belief/privacy level. Consider a concrete campaign in which the data buyer requests 1 million audience members for "Lexus". Instead of returning 1 million exactly matching web users to the data buyer, the data aggregator returns half a million web users who searched "Lexus" and a second half million web users who did not search "Lexus".
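The belief computations in Equations 2 and 3 are simple enough to sketch directly. The snippet below is an illustrative sketch; the function names are our own, not from the paper:

```python
def belief(n_exact, n_noise):
    """Buyer's belief that a returned user truly exhibits behavior k (Eq. 2):
    the fraction of the returned audience that actually matched the query."""
    return n_exact / (n_exact + n_noise)

def joint_belief(beliefs):
    """Joint belief over behaviors gathered from independent campaigns (Eq. 3):
    the product of the per-campaign beliefs."""
    p = 1.0
    for b in beliefs:
        p *= b
    return p

# The "Lexus" example: 500k exact users mixed with 500k noisy users.
print(belief(500_000, 500_000))          # 0.5
# Five independent campaigns at 70% belief each: 0.7**5, roughly 0.17.
print(round(joint_belief([0.7] * 5), 2))
```

The second call illustrates how joint belief decays multiplicatively as a buyer overlaps more campaign results.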
The data buyer's belief over the behavior (searched "Lexus") of each returned web user becomes 50%.

Equation 2 describes the belief over the behavior for a single campaign. In a real world scenario, a data buyer can submit multiple campaign queries. By overlapping the results from multiple campaigns, the data buyer can learn many behaviors about the same web user. In this case, we need to consider the joint belief over multiple behaviors. We assume campaigns are independent. This assumption eliminates privacy concerns stemming from analyzing results from correlated campaigns.5 The joint belief for a set of independent campaigns can be decomposed into a product of single-campaign beliefs. Let b_A represent the same user's multiple behaviors gathered through K independent campaigns. The joint belief is:

    P(b_A = 1) = ∏_{i=1}^{K} P(b_i = 1)    (3)

where P(b_i = 1) is defined in Equation 2. Equation 3 indicates that the joint belief decreases as the dimensionality grows. Considering a 5-dimensional behavior vector with 70% belief in each behavior, the overall belief is 0.7^5 ≈ 0.17. This property helps counterbalance the use of the linking technique for reidentification [25].

4 In this setting, "search Lexus once" and "search Lexus twice" will be treated as different behaviors.
5 Enforcing noise injection consistency at a user level, so that the same user would not be cast as both interested in traveling to SFO and not interested at the same time, is one way to address joint campaigns. Another is adopting query auditing like mechanisms [19]. Optimizing both performance and privacy cross-campaign is an interesting topic; however, it is outside the scope of this paper.

3.3 Smart Noise Injection

So far we have not discussed the way the noisy audience is generated. If they are randomly picked from those who do not match the query, it is very unlikely they will convert (i.e. meet the campaign goal and do Y). In this scenario, effectiveness and privacy are conflicting objectives, because increasing the level of injected random noise leads to better privacy protection at the cost of reduced effectiveness. As we discussed in Section 1, neither of the objectives can be ignored. It is desirable to seek a solution that achieves both.

Can we select a better set of noisy audience members in a more controlled manner? This question leads us to propose the following smart noise injection approach. Smart noise injection aims to select the noisy audience members who are most likely to convert for a campaign. We can do this by first predicting, for each user, the probability that the user will convert (i.e. will do Y in the future), and then adding noisy audience members who are likely to do Y but did not satisfy the query:

    P(y = 1) = f(u)    (4)

where f is the prediction function and u is the profile of the audience member under consideration, which captures all information the data aggregator has for that member. The information could be demographics, geographic location, search history, purchase history, etc. We rank all users by P(y = 1) in descending order. Top ranked users are added into the results as smart noise.

The key to the smart noise injection idea is the prediction function f used in Equation 4. f could be based either on heuristics or on data mining methods.

Heuristics Using Taxonomy Proximity

An example taxonomy tree is illustrated in Fig. 2. A taxonomy tree is a way to represent the associations between concepts. We can match each user to the node(s) on the taxonomy tree and match the goal y or the original query to the node(s) on the same tree. One heuristic is to add users associated with nodes that are siblings of the target node(s). Consider an auto maker (data buyer) that needs to run a campaign for Lexus cars. The campaign request would be "find audience who have searched Lexus". The corresponding smart noise audience could be those who have searched Audi, BMW, etc. This is based on the heuristic rule that users who have searched any child node of luxury cars (e.g. BMW, Audi and Lexus) might be interested in everything under the luxury car node.

Figure 2: An example taxonomy showing the relationship between various categories (e.g. IMT → Travel → {Air Travel, ..., Car Rental, ..., Hotel}; IMT → Autos → Makes → {Ford, Audi, BMW, Lexus}; IMT → Retail ...).

Data Mining Approach

The rich information in user profile u allows us to go beyond heuristics like taxonomy trees. Considering the Lexus campaign example, we can integrate various information such as past behavior, geo-location, age, education, etc. to make more reliable predictions. A data mining approach can be used to learn the prediction function f from past conversion data.
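The selection step described above, ranking disqualified users by the prediction function and keeping the top of the list, can be sketched as follows. This is our own illustrative sketch: `predict` stands in for the learned function f of Equation 4, and the toy score lookup is hypothetical.

```python
def smart_noise(candidates, predict, k):
    """Select the k disqualified users most likely to convert: rank all
    non-matching candidates by predicted conversion probability (Eq. 4)
    and keep the top of the ranking."""
    return sorted(candidates, key=predict, reverse=True)[:k]

# Toy example: four non-matching users with predicted conversion probabilities.
scores = {"n1": 0.9, "n2": 0.2, "n3": 0.7, "n4": 0.1}
print(smart_noise(list(scores), scores.get, k=2))  # ['n1', 'n3']
```

Random noise injection corresponds to replacing the `sorted(...)` ranking with a uniform sample from the same candidate pool, which is what makes the two methods directly comparable in the experiments.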
We can treat f as a regression function, whose parameters can be obtained from the training data. Various data mining algorithms (logistic regression, support vector machines, gradient boosting trees, etc.) can be used to learn the function. Assuming the data aggregator has past conversion records from a campaign, we can generate training data D = (x_u, y_u) from these past records by preparing each audience member u as follows:

    x_u ⇐ (u, y)    (5)

    y_u = 1 if user u converted, 0 otherwise    (6)

x is the feature vector representing each record; this step could be application specific or very general and simple (e.g. let x = u). Depending on the data mining algorithm used, we can use an existing approach to learn the parameters.

4. EXPERIMENTAL DESIGN

We carried out experiments to evaluate the effectiveness of the proposed approach.

4.1 Data Set

We created an evaluation data set based on web log data from a top Online Travel Agent (OTA) website in the United States. The data contains 20 days of web users' air ticket searches and purchase history at the booking website. For each airport, we have information about its location (longitude and latitude), a text description of its attractions (gathered from TripAdvisor, http://www.tripadvisor.com), and a climate description including monthly temperature, precipitation, etc. We used this data to simulate real-world campaigns. Each audience member involved in the experimental data is identified by
an anonymized user id and has airport search and ticket purchase behaviors with a time stamp. Each audience member is preprocessed by sorting its behaviors in chronological order and dividing the behavior time sequence into multiple segments, where each segment ends when the user finished purchasing a ticket. Each segment contains the user's search and purchasing behaviors in that time frame. Fig. 3 includes two example segments. This gives us about 200k flight records (i.e. segments), where each record contains a time sequence that includes the airports the user searched and the airport the user purchased.

Figure 3: Two examples of segments. Each segment contains one or more search behaviors followed by one purchasing behavior. The top example shows that User 1 purchased a ticket to SFO after searching LAX, LAX, SJC, SFO.

For each segment, we insert an advertising campaign at the time point where the user has finished 20% of the behaviors in the segment. The goal of each campaign is "find an audience who will fly to airport y" and the corresponding query is "find an audience who has shown interest in (i.e. searched) airport y". We compared the following three strategies to generate the audience set for each campaign:

Exact Targeting (ET): identify an audience who has searched airport y.

Random Noise (RN): replace part of the audience identified by Exact Targeting with an audience uniformly sampled from the noisy audience.

Smart Noise (SN): the same as Random Noise except that the noisy audience is selected using the data mining technique.

4.2 Smart Noise Prediction Model

As discussed before (Equation 5), we need to prepare each audience member in a record/campaign to generate a feature vector x based on the user profile u and the target y. Fig. 3 illustrates running the campaign "find an audience who will fly to SFO". Each audience member is represented by the search behaviors before the campaign.
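The segmentation step above can be sketched as follows. The event encoding (`'search'`/`'buy'` tuples) is our own hypothetical representation of the log data, not the OTA's actual format:

```python
def split_segments(events):
    """Split one user's chronologically ordered events into segments,
    each ending with a ticket purchase (as in Fig. 3)."""
    segments, current = [], []
    for action, airport in events:
        current.append((action, airport))
        if action == "buy":  # a purchase closes the current segment
            segments.append(current)
            current = []
    return segments

# A hypothetical log mirroring Fig. 3: searches for LAX, LAX, SJC, SFO
# followed by an SFO purchase, then a second search-and-buy segment.
log = [("search", "LAX"), ("search", "LAX"), ("search", "SJC"),
       ("search", "SFO"), ("buy", "SFO"),
       ("search", "LAX"), ("buy", "LAX")]
print(len(split_segments(log)))  # 2
```

Any trailing searches not yet followed by a purchase are simply dropped here; how the paper handles such open segments is not specified.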
Thus we generated a 20-dimensional feature vector x, where each dimension captures a specific similarity between the searched airports and the converted airport.

We used 50% of the data for training and 50% for evaluation. For training, 5-fold cross validation was applied. We used the well-known Gradient Boosting Trees (GBT) with bagging to build the prediction model on the training data. We chose GBT because it can learn feature interactions and generalizes well in various applications. For each testing record we first predict the probability that each user will convert to the airport specified in the campaign, and add the top ranking users as smart noise.

Table 1: Symbols Used for Evaluation

    Ω_t     set of targeted users
    N_t     number of targeted users, |Ω_t|
    Ω_et    users by exact targeting
    Ω̄_et    complementary set of Ω_et
    Ω_r     noisy users randomly chosen from Ω̄_et
    Ω_s     noisy users chosen from Ω̄_et in a smart manner
    Ω_t^c   set of converted users

4.3 Evaluation Metric

The effectiveness of the audience targeting strategies is measured by Conversion Rate, defined as

    CR = |Ω_t^c| / |Ω_t|    (7)

where Ω_t^c and Ω_t denote the converted audience and the targeted audience. Ω_t can be computed by any of the targeting strategies considered here. For ease of exposition, we use the symbols defined in Tab. 1. For exact targeting, the targeted audience consists of all users found by exact targeting, namely, Ω_t = Ω_et. For the random noise and smart noise injection approaches, the final campaign user set is a mixture of p% × |Ω_et| exact records randomly sampled from Ω_et and (100 − p)% × |Ω_et| noisy users Ω_{r/s} chosen by the respective selection method:

    Ω_{r/s} = smp_{r/s}(Ω̄_et, (100 − p)% × |Ω_et|)    (8)

5. EXPERIMENTAL RESULTS

We compared the performance of the Exact Targeting (i.e. no privacy protection), Random Noise, and Smart Noise methods.
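Equation 7 amounts to a set intersection over user ids; a minimal sketch (function and variable names are ours):

```python
def conversion_rate(targeted, converted):
    """CR = |Ω_t^c| / |Ω_t| (Eq. 7): the fraction of the targeted audience
    that later converted. Ω_t^c is the converted subset of the targeted set."""
    targeted = set(targeted)
    return len(targeted & set(converted)) / len(targeted)

# Four targeted users, two of whom later converted ("u9" converted but was
# never targeted, so it does not count).
print(conversion_rate({"u1", "u2", "u3", "u4"}, {"u2", "u4", "u9"}))  # 0.5
```

The same function scores all three strategies, since each one only changes which users end up in the targeted set.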
As we discussed before (Equation 2), the belief (privacy level) can be tuned by adjusting the ratio between the exact audience and the noisy audience. At the same time, changing the ratio will also impact the effectiveness of the Smart Noise approach. To see the effectiveness/privacy trade-off, we evaluated all three behavioral targeting methods under various noise levels. For each method at each noise level, the average conversion rate over all campaigns is reported (Fig. 4). It is clear that the Smart Noise method is consistently better than the Random Noise method.

5.1 Further Analysis

Airport (Campaign) Examples

Each airport has a different popularity among travelers. In this paper we measure the popularity of an airport by the number of records that converted to it. We are interested in how the popularity factor affects performance. We present the results for SFO (San Francisco International Airport), SEA (Seattle-Tacoma International Airport) and SJC (Mineta San Jose International Airport) in Fig. 5. The popularity of the three airports is very different (SFO > SEA > SJC). In this data set, 3053 segments converted to SFO, 2179 segments converted to SEA, and 789 segments converted to SJC. 4189 segments contain a search for SFO, 2833 segments contain a search for SEA, and 1500 segments contain a search for SJC. For each airport, we calculated the conversion rate under three noise levels (0.3, 0.6 and 0.8,
respectively).

[Figure 5 panels (a)-(i): SFO, SEA and SJC at noise levels 0.3, 0.6 and 0.8, each comparing Exact Targeting, Random Noise and Smart Noise.]

Figure 5: How conversion rates vary with recall rates for different noise levels. For each graph, the horizontal axis denotes the number of audience members targeted/returned to the data buyer and the vertical axis denotes the conversion rate.

Each plot in Fig. 5 corresponds to one airport and one noise level, which are indicated in the title of the plot. We have three major observations.
First, the conversion rate decreases with both the size of the targeted audience and the noise level. The plots also show that Smart Noise injection works better for the popular airport (SFO) than for the others (SEA and SJC). A possible reason is that a popular airport such as SFO may have more high-quality noisy audience members (i.e., audience members who will convert to the airport), so the top-ranked smart noise for SFO is of higher quality. Second, Smart Noise performs better than exact targeting in some cases (Fig. 5(a)-(c)). This suggests that the top of the Smart Noise ranking is better than the audience manually selected for the campaign. Third, we were surprised to see the conversion rate (accuracy) jump upward at certain points in some of the plots. A close look at the prediction results reveals that this is due to insufficient representation of the audience. For example, the jump from 900 to 1050 in plot (h) is caused by a large number of users who share the same airport search sequence OAK → OAK → OAK: because their feature vectors x are identical, they all land at about the same position (around 900-1050) in the smart noise ranking. Some of them went (i.e., converted) to SJC and many others went to other airports nearby6. In such cases, additional information about the audience is needed to represent and differentiate users well enough to make accurate, smooth predictions.

We used GBT to learn the prediction model in our experiments. However, the proposed idea is not restricted to a specific machine learning technique; GBT can be replaced by other methods depending on the problem.

Influences of Features

We further examine the learned gradient boosting model to see the influence of each input feature. As shown in Fig. 6, LT-DIST and ER-DIST are still the most influential features.
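The ranking behavior behind these plateaus can be illustrated with a minimal sketch. The frequency-based scorer below merely stands in for the paper's GBT model (any model that maps a feature vector to a conversion probability would do), and the profiles, user IDs, and helper names are invented for illustration.

```python
from collections import defaultdict


def train_frequency_scorer(profiles, converted):
    """Score each feature vector by its empirical conversion rate.

    Stand-in for the paper's GBT model: it maps a user's feature vector
    (here, the tuple of airports searched) to a conversion probability.
    """
    counts = defaultdict(lambda: [0, 0])  # vector -> [conversions, total]
    for x, y in zip(profiles, converted):
        counts[x][0] += y
        counts[x][1] += 1
    return {x: c / n for x, (c, n) in counts.items()}


def rank_noisy_audience(candidates, scores):
    """Rank candidate noisy users by predicted conversion probability."""
    return sorted(candidates, key=lambda u: scores.get(u[1], 0.0), reverse=True)


# Toy training data: did each user fly out of the target airport?
train_profiles = [("SFO",), ("SFO",), ("OAK", "OAK", "OAK"), ("OAK", "OAK", "OAK")]
train_converted = [1, 1, 1, 0]

scores = train_frequency_scorer(train_profiles, train_converted)

candidates = [("u1", ("OAK", "OAK", "OAK")), ("u2", ("SFO",)),
              ("u3", ("OAK", "OAK", "OAK"))]
ranked = rank_noisy_audience(candidates, scores)
# u1 and u3 share a feature vector, so they receive identical scores and
# occupy adjacent ranks -- the plateau/jump effect around positions
# 900-1050 in Fig. 5(h) arises the same way, at scale.
```

Because the model sees only the feature vector, every user with the identical search sequence gets the identical score, so conversion rate can only change in coarse steps as the cutoff sweeps through such a tied block.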
This is as expected: if multiple airports are near the city a user wants to visit, the user might search for all of them and finally fly from one. ER-STYLE, MF-STYLE, and LT-PRECIP are also useful features. Although they are much weaker than the distance features, they still move the performance needle. This is not surprising, since some people on vacation might choose among cities with similar climates or similar attractions.

6 OAK, SJC, and SFO are airports near each other.

Figure 4: Average conversion rate over all 255 airports. (Conversion rate versus noise level, from 0.1 to 1.0, for Exact Targeting, Random Noise, and Smart Noise.)

6. CONCLUSION AND FUTURE WORK

The advertising industry and its surrounding ecosystem are increasingly moving toward higher levels of security and consumer privacy. However, privacy concerns persist and are often mistakenly framed as a zero-sum game against advertising performance. In this paper we proposed a data mining method to handle data sharing privacy issues in behavioral targeting. We tested our method on a very large behavioral targeting dataset collected from a major travel web site and ran location-based data campaigns over all 255 major United States airports, obtaining conversion rates at various noise percentages. The method scales well with data size and attribute dimensionality. The smart noise strategy is consistently better than uniform random noise injection. We also observed that performance improvements vary across campaigns (i.e., airports). Instead of always having to trade privacy for utility, in certain cases we can even obtain more utility (i.e., a higher conversion rate) than exact data campaigns by inserting smart noise (e.g., in the case of "SFO travelers").

This paper opens up a new direction in the advertising world by showing that it is possible to pursue privacy and performance simultaneously. We expect more methods to emerge, and we hope the field will grow and mature as a result. Our experiments are based on data campaigns targeting travelers to specific destinations. Although we expect a similar outcome when applied to other data types (e.g.
geographic, demographic, etc.), additional future work is needed to validate this assumption.

Figure 6: Influence of each feature. Each feature name combines the type of airport searched with the information used to calculate the feature: LT, ER, MP, and MF denote the latest, earliest, most popular, and most frequent airports in the user profile (i.e., the user's airport search sequence); DIST, STYLE, PRECIP, and TEMP denote distance, attraction-style similarity, precipitation, and temperature, respectively. For example, LT-DIST is the distance similarity between the latest airport searched and the target airport.

The proposed technique also needs to be adjusted and optimized for different verticals (e.g., automobiles, retail, education). In our experiments, we described a way to generate features that takes advantage of the richness of behavioral data. Although this work focuses on features in the travel domain, the approach can be generalized and adapted to other applications. The regression function is not limited to GBT and can be replaced by other alternatives depending on the application. Our approach is not specific to data sharing between data aggregators/exchanges and data buyers, and we are interested in applying it to data sharing between data suppliers and data exchanges, as well as other similar scenarios.