Have Your Cake and Eat It Too! Preserving Privacy while
Achieving High Behavioral Targeting Performance

Qi Zhao, Yi Zhang
School of Engineering
University of California Santa Cruz
{manazhao,yiz}@soe.ucsc.edu

Lucian Vlad Lita
Blue Kai, Inc.
Cupertino, CA 95014
lucian@bluekai.com
ABSTRACT
Privacy is a major concern for Internet users and Internet
policy regulators. Privacy violations usually entail either
sharing Personally Identifying Information (PII) or non-PII
information such as a site visitor’s behavior on a website.
On the other hand, Internet advertising through behavioral
targeting is an important part of the Internet ecosystem,
as it provides users more relevant information and enables
content/data providers to provide free services to end users.
In order to achieve effective behavioral targeting, it is de-
sirable for the advertisers to access a set of users with the
targeted behaviors. A key question is how data should flow
from a provider (e.g. a publisher) to a third-party advertiser
to achieve effective behavioral targeting without
directly sharing exact user behavior data. This paper at-
tempts to answer this question and proposes a privacy pre-
serving technique for behavioral targeting that does not en-
tail a drastic reduction in advertising effectiveness. When
behavioral targeting data is transferred to an advertiser, we
use a smart, data mining-based noise injection method that
perturbs the results (a set of users meeting specified cri-
teria) by carefully adding noisy data points that maintain
a high level of performance. Upon receiving the data, the
advertiser cannot distinguish accurate data points, which ad-
here to the specifications, from noisy data points, which do not.
Using data from a top US Online
Travel Agent (OTA), we evaluated the proposed technique
for location-based behavioral targeting, whereby advertisers
run data campaigns targeting travelers for specific destina-
tions. Our experimental results demonstrate that such data
campaigns obtain results that enhance or preserve user pri-
vacy while maintaining a high level of targeting performance.
Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ADKDD’12, August 12–16, 2012, Beijing, China
Copyright 2012 ACM 978-1-4503-1545-6/12/08 ...$15.00.
General Terms
Algorithms, Performance, Security
Keywords
Behavioral targeting, privacy, data mining
1. INTRODUCTION
Recently, we have observed an exponential growth in the
number of web services spanning search engines and e-commerce
websites and propagating across multiple verticals. In most
cases we, the users, provide personal information in order
to enjoy these Internet services. For example, we willingly
register our demographic and geographic information with
social networks like Facebook so our friends can wish us a
happy birthday and follow us virtually on our vacations. We
want and sometimes need to have our purchase history in our
Amazon and eBay accounts for convenience and discounts
[2]. We rate movies in exchange for better movie recom-
mendations and higher enjoyment levels since for example,
Netflix now learns our cinematic preferences from our and
our friends’ movie rating activities.
The high availability of massive individual information on
the Internet also makes it a perfect place for delivering both
content and advertisements. Through leveraging web users’
search history, purchase history, demographic, geographic
location and other miscellaneous information, marketers are
able to identify those who are interested in their products.
This process is called behavioral targeting and is widely em-
ployed by most successful sites. Behavioral targeting helps
marketers gain advertising effectiveness and reduce their ad-
vertising budget. It focuses advertising spending towards
consumers who are already in market for a product or are
likely to purchase it. Since advertisers spend less on serv-
ing media (ads) in general, they can afford to offer more
aggressive discounts, thus blurring the line between content
and advertising. Behavioral targeting is also an important
part of the Internet ecosystem, as it enables many Internet
service providers to provide free services - from email and
social networking to games, discussion forums and produc-
tivity tools. As a side effect, it also improves consumers’
online experience since it reduces friction, clutter, and by-
passes irrelevant commercial ads.
While acknowledging the value of behavioral targeting,
privacy issues are of major concern since data sharing (via
partnerships, data exchanges, or direct deals) is the fuel for
content, media, and context optimization. Although users’
Personally Identifiable Information (PII) – such as SSN, email
address and telephone number – is usually removed in be-
havioral targeting, it is sometimes still possible to identify
specific users through non-PII data analysis. Real-world pri-
vacy breach cases occurred even though only non-PII data
was released. The Netflix competition[21] is one such exam-
ple, where Netflix released a set of anonymized film ratings
to the public for the purpose of improving the movie pre-
diction accuracy. Though the dataset was anonymized before
being released, researchers were still able to identify some
of the profile owners by leveraging external datasets such as
the Internet Movie Database (IMDB)[20]. A similar privacy
infringement happened after AOL publicly released search
data[11]. Clearly, great caution is required when releasing,
sharing or selling consumer data whether PII or non-PII,
such as a site visitor browsing actions/behavior.
The widespread use of behavioral targeting is increasing
consumer privacy concerns, triggering protests from privacy
and consumer advocacy groups, and leading governments to
pass laws to regulate the usage of online consumer data in
several countries[10]. As pointed out by Goldfarb et al. [10],
behavioral targeting usage reduction will lead to a consid-
erable drop in advertising effectiveness. Without effective
behavioral targeting, the Internet content providers which
subsidize the cost through advertising might become unable
to continue providing their free services to the web users
due to a reduction in their advertising revenue stream. In
an extreme case, web users could be asked to pay for services
and content, or alternately be exposed to highly irrelevant
and irritating advertisements. As a consequence, both pri-
vacy concerns and advertising effectiveness are important
and neither can be ignored. The end goal is to con-
verge towards building an ecosystem with a high privacy
bar, low irrelevant content, high advertising efficiency, and
no friction among publishers, advertisers, and consumers.
This paper focuses on addressing privacy issues in behav-
ioral targeting by reducing the trade-off against advertising
effectiveness and making this trade-off explicit. Before delv-
ing into the discussion, it is worthwhile gaining an under-
standing of the current online behavioral targeting ecosys-
tem. Online behavioral targeting is a very large and com-
plex industry; for the discussion at hand, we can artificially
group the stakeholders into three major groups: data sup-
pliers, data aggregators and data buyers (Fig. 1).
Data aggregators pool data from multiple data suppliers.
The aggregated data covers a wide spectrum of web user
facets including demographic, geographic, interests, purchas-
ing history, etc. To optimize media campaigns data buyers
will turn to data aggregators and purchase relevant behav-
ioral targeting data by running real-time data campaigns.
For example, a marketer representing a car manufacturer
will run a data campaign purchasing user ids known to rep-
resent women in market for new luxury cars from a data
exchange. Subsequently, as the same users/consumers visit
publisher sites, the marketer will serve more ads to the users
known to be relevant (i.e. women in market for luxury cars).
In this paper, we use audience and online users/consumers
interchangeably. Here are two example requests from data
buyers (i.e. advertisers/marketers):
TRAVEL: A travel marketer wants to show ads to a set of
users who will travel to a specific destination. The mar-
keter requests audience data (i.e. runs a data campaign)
from a data aggregator; in this case, the audience is made of
users intending to fly to the given destination or book hotels
Figure 1: Overview of the current online advertis-
ing ecosystem. Consumer data is traded online,
frequently with the end goal of placing ads that
lead to branding or conversions. A data supplier
owns data sources or serves content on websites fre-
quented by the consumer. A data buyer will use
the data to optimize media (ads) they will place on
publisher sites, usually via entities such as ad net-
works, ad exchanges, or demand side platforms. In
some cases, large data suppliers and large ad net-
works also serve as data aggregators, while data
suppliers frequently serve as publishers.
around that location.
AUTO: An auto marketer wants to show ads to a set of In-
ternet users who are likely to buy a car in the near future.
The marketer requests audience data from aggregators: con-
sumers who have shown purchase intent for certain automo-
bile brands, down to the specific make and model.
In both cases, the data campaigns are typically started by
buyers directly through a web service or via a user interface.
Simply stated, campaigns are equivalent to long-running
queries issued by the buyer, running in a data exchange,
and targeting audience behaviors. As data becomes avail-
able - i.e. behaviors emerge - results are being generated and
corresponding micro-transactions occur. For clarity, in this
paper we will treat a data campaign's specification as a goal of
the form “find an audience who exhibit purchase
intent for X”, with the corresponding query “find an audi-
ence who exhibit behavior X”. This excludes demographic
and geographic data attributes such as age, gender, etc.¹
We focus on audience purchase intent behavior for two rea-
sons. First, it constitutes extremely valuable data, since
it translates into highly discriminating variables leading to
clicks and conversions, and as a consequence is the target
of most campaigns running in data exchanges. Second, the
problem setting is intuitive, straightforward (Section 3.1),
and is very often ignored in the literature, since it is not
considered PII.
In our problem setting, there are a limited number of data
aggregators/exchanges who have achieved a significant scale
in terms of data, and a large number of data buyers (mar-
keters, advertisers, agencies) that are targeting relevant user
data by running data campaigns - i.e. running queries such
as “find users who will travel to San Francisco”). We assume
that aggregators process the data thereby filtering out PII
content. We also assume data aggregators offer secure and
¹ The technique is not specific to a given data type and we
plan to expand on this aspect in future work.
trusted data platforms and are white-hat, certified entities in
the ecosystem. Data buyers, on the other hand vary in their
level of security, trust, and ability. Due to the high num-
ber of buyers who purchase behavioral targeting data, pri-
vacy infringement, accidental or intentional, becomes more
of a concern. Even though de-identification, PII content fil-
tering, and data retention policies are a huge step towards
maintaining privacy, more can be done. One might argue
that since data buyers only target/purchase a limited set of
user behaviors, it is unlikely that user identification could hap-
pen. However, an adversarial buyer may further aggregate
data from multiple campaigns, public and private sources
and attempt to identify users by corroborating and linking
different parts of their profiles[25]. The process is akin to
the Netflix and AOL privacy breach cases discussed above.
One characteristic of these privacy breaches arises from
the fact that data campaigns (or available datasets) are ex-
act, even though user identity is obscured. In other words,
user profiles are scrubbed, yet behavioral data campaigns /
queries produce exact results with missing attributes. This
makes it easy for technically savvy parties to join various
datasets together and recover the missing attributes. One
privacy preserving strategy that follows from this observa-
tion is to make this dataset joining problem impossible, or
extremely difficult. We can do this by preventing data buy-
ers from accurately reconstituting individual profiles if the
audiences and datasets available to data campaigns contain
uncertainty. With this motivation, we propose a smart noise
injection method to alleviate privacy concerns in the behav-
ioral targeting world. The method works by injecting noisy
data into the originally matching audience. The buyer will
be aware that the data is not 100 percent accurate, will also
know the signal to noise ratio, but will not be able to know
with certainty which users match their criteria and which do
not. At the same time, we also need to maintain the perfor-
mance (e.g. conversion rate) of the obfuscated audience. To
achieve this goal, noisy candidates are selected by a machine
learning algorithm instead of being selected randomly.
Further details are presented in subsequent sections. Sec-
tion 2 reviews related work and section 3 presents the pro-
posed method. In Section 4 we cover experimental design
and evaluate the privacy preserving method using real world
data.
2. RELATED WORK
Various Privacy Preserving Data Mining (PPDM) algo-
rithms have been developed to combat privacy violations
while minimizing the loss of data utility [1, 6, 24].
PPDM can be traced back to the early 90s and
became a very active research
topic early this century. Randomized methods and cryp-
tographic methods are the two major approaches that can
let people do data mining without access to precise infor-
mation in individual data records. In [1], R.Agrawal et al.
proposed a randomization algorithm to prevent exposure of
sensitive attributes. It works by perturbing the attribute
values with additive noise. Since the probability distribu-
tion of the noise is known, the distribution of the original
data can be recovered by applying Bayesian inference to
the noise and perturbed data. With the recovered distribu-
tion, a decision tree algorithm can be run on the synthesized
data and obtain decent results in terms of privacy and classi-
fication accuracy. Data swapping is another method related
to randomized perturbation in the sense of running data
mining algorithms on the aggregate statistics of the origi-
nal data. It works by swapping the attribute values across
different records[9].
k-anonymity is another heavily studied PPDM method.
The basic idea of k-anonymity is to reduce the granularity
of the attribute value such that every combination of values
of quasi-identifiers will have at least k respondents[24]. This
is achieved through generalization and suppression. Gener-
alization means modifying the attribute value to a wider
range and suppression means removing the attribute com-
pletely. k-anonymity is an NP-hard problem. Researchers
have proposed approximate algorithms to find the solution
more efficiently[12]. Although k-anonymity has gained great
popularity, it suffers from subtle but severe privacy prob-
lems and can lead to insufficient diversity in sensitive
attributes. To overcome these problems, several methods
have been proposed, such as l-diversity [14], t-closeness[13],
(α,k)-anonymity[26], (k,3)-anonymity[28] and (c,k)-safety[15].
One major problem of k-anonymity is that it cannot provide
guaranteed privacy as it cannot account for all possible back-
ground knowledge. In contrast, differential privacy[7] gives
rigorous data privacy control without making assumptions
of background knowledge, while providing data to enable
effective data mining. To help data miners reveal accurate
statistics about a population while preserving the privacy of
individuals in the data, differential privacy ensures that the out-
puts on any two nearly identical input sets are nearly identical, and
therefore eliminates the possibility of deriving individual at-
tributes by differencing the outputs of multiple queries.
Most of the existing PPDM methods address the
scenario where data is shared between a trusted data holder
and untrusted individuals. In most cases, data sharing is for
the purpose of seeking the best data mining algorithm that
unveils the patterns concealed in the data. Privacy breaches
occur when adversary data recipients attempt to reconstruct
the identity of the anonymized subjects in the shared data.
There are two major data sharing frameworks: interactive
framework[5, 8, 23, 17] and non-interactive framework[3,
27, 18]. The interactive framework achieves privacy by per-
turbing the results of the query and limiting the number of
queries. The non-interactive framework releases sanitized
data that meets the differential privacy requirement. The
setting of our problem can also be viewed as an interactive
one: data buyers query the data aggregators for an audience
(i.e. results or web users). Similar to differential privacy, our
goal is to prevent the data buyer from knowing an indi-
vidual’s true behaviors.
Compared to existing algorithms, the setting of our prob-
lem, behavioral targeting, differs in the following ways:

Larger-scale data. The subjects involved are all online users
on the web.

Higher attribute dimensionality. Users exhibit the diverse
Internet activities mentioned earlier.

The way data is shared. Each data request is a specific
requirement about targeted behavior(s). A subset (i.e. a set
of Internet users) instead of the whole dataset (all users and
their behavior information) is shared for each advertising
campaign request.

Conversion rate as the utility of the data. The data utility
in our task is defined as the conversion rate rather than the
effectiveness of data mining.
As we will see later, these changes pose great challenges
4. to adopting or extending existing methods for behavioral
targeting. On the other hand, the specific problem setting
also gives us the opportunity to tailor the privacy preserving
technique to optimize the performance of the task.
Preventing privacy violation in behavioral targeting is a
problem that has been looked at by industry practitioners,
government agencies, and the research community. Some
countries have implemented very strict privacy regulations
to restrict the collection and use of consumer data. Un-
fortunately, the side effects of these regulations are drastic.
For example, after the Privacy Directive² was passed, the ad-
vertising effectiveness decreased significantly [10]. Jonathan
Mayer et al. advocate using Do Not Track technology to en-
able users to opt out of tracking by websites they do not visit,
including analytics services, advertising networks and social
platforms[16]. Most personalization services store user pro-
files on the server side, out of users’ control. Instead
of aggregating user profiles at the server, Mikhail Bilenko et
al. proposed storing user profiles as a cookie string on the
client side[4]. This gives users complete control of their pro-
files and reduces privacy concerns. Toubiana et
al. proposed a cryptographic billing system that enables behav-
ioral targeting to take place in the user’s browser. Provost et
al. proposed a technique to find an audience for a brand by
selecting the pseudo-social network (i.e. web page co-visiting
network) neighbors of brand seed users, without saving in-
formation about the browsers’ identities or page content [22].
However, most of the existing solutions do not solve the
privacy issues in the data sharing scenario discussed before
(Figure 1), although this is a common practice in the ad-
vertising industry. This paper focuses on preserving privacy
while sharing data for behavioral targeting, and seeks
a solution that can be practically deployed and per-
form well in the existing advertising industry.
3. OUR APPROACH
3.1 Problem Setting
As shown in Fig. 1, the players in the computational ad-
vertising industry can be roughly grouped into three cat-
egories, namely, data suppliers, data aggregators and data
buyers. A data aggregator holds a wide spectrum of web
user information including demographics, searching history,
purchasing history, etc. The information serves as a source
for data buyers to conduct campaigns. In a typical adver-
tising campaign the data sharing process operates in three
steps³: first, a data buyer proposes campaign requirements
and expresses them in the form of queries; second,
the data buyer submits the queries to a data aggregator; fi-
nally, the data aggregator returns to the data buyer a set of
web users who might meet the campaign requirements. We
assume the data buyer states that the goal of a campaign is to “find
an audience who will do Y”, and the queries are expressed in
the form “find an audience who have done X”. If a returned
web user later does Y, we say this user has converted.
The utility of the data sharing process is aligned with how
² European Union “Privacy and Electronic Communications
Directive”
³ Variations of this exist in the industry. For example, a data
buyer may only submit the requirement/goal and the data
aggregator tells the buyer the corresponding query. These
variations do not affect the analysis of this paper much.
well the goal is achieved, which is measured by conversion
rate (more details in Section 4.3).
A reasonable assumption for the above setting is that the
data aggregator is a trustworthy player who already owns
the data, while the data buyer is not since there are millions
of possible buyers. When a data aggregator responds to each
campaign query with a group of audience members (i.e. web
users), the data buyer gains information about each web
user’s behavior, which leads to possible privacy breaches.
Though only a few attributes of web users could be learned
for each single campaign, richer individual profiles may be
obtained by joining multiple campaign results. This actually
increases the likelihood for the data buyer to derive sensitive
information about web users using the linking techniques
[25].
3.2 Noise Injection
We found that the root cause of the privacy breach problem
is that the returned audience satisfies the campaign criteria
(i.e. the query) exactly. Motivated by this observation and
prior research on randomized methods for privacy preserv-
ing, we propose to obfuscate the exact audience by injecting
noisy audience members who do not satisfy the campaign criteria.
In other words, we include audience members who did not
exhibit the requested behavior(s) in the returned set. The
presence of the noisy audience reduces data buyers’ belief
about an audience member’s behavior. Let $\Phi$ denote the
set of all possible behaviors requested by data buyers. Each
audience member can be represented as a binary vector $\vec{b}$,
where each dimension $\vec{b}_k$ corresponds to the $k$th behavior
in $\Phi$:⁴

$$\vec{b}_k = \begin{cases} 1 & \text{if the member exhibits behavior } \Phi_k \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
The belief over each noise-perturbed audience member
is measured as

$$P(\vec{b}_k = 1) = \frac{N(\vec{b}_k = 1)}{N(\vec{b}_k = 1) + N(\vec{b}_k = 0)} \qquad (2)$$
where N is a function counting the number of audience mem-
bers. Here we define belief and privacy as opposite endpoints
of a scale. Adjusting the ratio between $N(\vec{b}_k = 1)$ and
$N(\vec{b}_k = 0)$ leads to the desired belief/privacy level. Consid-
ering a concrete campaign in which the data buyer requests 1
million audience members for “Lexus”. Instead of returning
1 million exact matching web users to the data buyer, the
data aggregator returns half a million web users who searched
“Lexus” and another half million web users who did not
search “Lexus”. The data buyer’s belief over the behavior
(search “Lexus”) of each returned web user becomes 50%.
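The belief computation in Equation 2 can be sketched as a tiny helper (an illustrative sketch; the function and variable names are ours):

```python
def belief(n_exact, n_noise):
    """Data buyer's belief that a returned user truly exhibits the
    requested behavior: the fraction of exact matches among all
    returned audience members (Equation 2)."""
    return n_exact / (n_exact + n_noise)

# The Lexus campaign above: 0.5M exact matches mixed with 0.5M noisy users.
print(belief(500_000, 500_000))  # 0.5
```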
Equation 2 describes the belief over the behavior for a sin-
gle campaign. In a real world scenario, a data buyer can sub-
mit multiple campaign queries. By overlapping the results
from multiple campaigns, the data buyer can learn many
behaviors about the same web user. In this case, we need to
consider the joint belief over multiple behaviors. We assume
campaigns are independent; this assumption sets aside pri-
vacy concerns stemming from analyzing results from corre-
lated campaigns.⁵
The joint belief for a set of independent
campaigns can be decomposed into the product of the single-
campaign beliefs. Let $\vec{b}_A$ represent the same user’s multiple be-
haviors gathered through K independent campaigns. The
joint belief is:

$$P(\vec{b}_A = \vec{1}) = \prod_{i=1}^{K} P(\vec{b}_i = 1) \qquad (3)$$

⁴ In this setting, “search Lexus once” and “search Lexus
twice” will be treated as different behaviors.
⁵ Enforcing noise injection consistency at a user level, so that
where $P(\vec{b}_i = 1)$ is defined in Equation 2. Equation 3
indicates that the joint belief decreases as the dimensionality
grows. Considering a 5-dimensional behavior vector with 70%
belief on each behavior, the overall belief is $0.7^5 \approx 0.17$.
This property helps counterbalance the use of the linking
technique for reidentification[25].
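Equation 3 can be checked numerically with a short sketch (names are illustrative):

```python
import math

def joint_belief(per_campaign_beliefs):
    """Joint belief over K independent campaigns (Equation 3):
    the product of the per-campaign beliefs."""
    return math.prod(per_campaign_beliefs)

# Five independent campaigns, each with 70% belief.
print(round(joint_belief([0.7] * 5), 2))  # 0.17
```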
3.3 Smart Noise Injection
So far we have not discussed the way the noisy audience
is generated. If they are randomly picked from those who
do not match the query, it is very unlikely they will convert
(i.e. meet the campaign goal and will do y). In this scenario,
effectiveness and privacy are conflicting objectives, because
increasing the level of injected random noise leads to better
privacy protection at the cost of effectiveness reduction. As
we have discussed in Section 1, neither objective can
be ignored. It is desirable to seek a solution that achieves
both.
Can we select a better set of noisy audience members in a
more controlled manner? This question leads us to propose
the following smart noise injection approach. Smart noise
injection aims to select the noisy audience members who are
most likely to convert for a campaign. We can do this by first
predicting, for each user, the probability that the user will con-
vert (i.e. will do y in the future), and then adding noisy audience
members who are likely to do y but did not satisfy the query.
$$P(y = 1) = f(\vec{u}) \qquad (4)$$

where $f$ is the prediction function and $\vec{u}$ is the profile of
an audience member which captures all information data
aggregators have for the audience member under consider-
ation. The information could be demographic, geographic
location, search history, purchase history, etc. We rank all
users by P(y = 1) in descending order. Top ranked users
will be added into the results as smart noise.
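The ranking step can be sketched as follows, assuming some prediction function f is already available (all names here are illustrative):

```python
def select_smart_noise(non_matching_users, f, k):
    """Rank the users who did NOT satisfy the query by their
    predicted conversion probability P(y=1) = f(u) (Equation 4)
    and keep the top-k as smart noise."""
    ranked = sorted(non_matching_users, key=f, reverse=True)
    return ranked[:k]

# Toy example: f looks up a precomputed conversion probability.
scores = {"u1": 0.9, "u2": 0.2, "u3": 0.6}
print(select_smart_noise(scores, scores.get, 2))  # ['u1', 'u3']
```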
The key to the smart noise injection idea is the prediction
function f used in Equation 4. f could be based either on
heuristics or data mining methods.
Heuristics Using Taxonomy Proximity
An example taxonomy tree is illustrated in Fig. 2. A tax-
onomy tree is a way to represent the associations between
concepts. We can match each user to the node(s) on the
taxonomy tree and match the goal y or the original query to
the node(s) on the same tree. One heuristic is to add users
associated with nodes that are siblings of target node(s).
Consider an auto maker (data buyer) that needs to run a
campaign for Lexus cars. The campaign request would be
the same user would not be cast as interested in traveling to
SFO and not interested at the same time, is one way to
address joint campaigns. Another is adopting query-auditing-
like mechanisms[19]. Optimizing both performance and
privacy across campaigns is an interesting topic, but it is
outside the scope of this paper.
[Figure 2 depicts a taxonomy tree rooted at IMT, with children
Travel (Air Travel, Car Rental, Hotel, ...), Autos (Makes:
Ford, Audi, BMW, Lexus), and Retails (...).]
Figure 2: An example taxonomy showing the rela-
tionship between various categories.
“find audience who have searched Lexus”. The correspond-
ing smart noise audience could be those who have searched
Audi, BMW, etc. This is based on a heuristic rule that
users who have searched any child node of luxury cars (e.g.
BMW, Audi and Lexus) might be interested in everything
under the luxury car node.
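This sibling heuristic can be sketched over a hypothetical fragment of the Fig. 2 taxonomy (the child-to-parent map below is our own illustrative encoding):

```python
# Hypothetical fragment of the Fig. 2 taxonomy, as a child -> parent map.
parent_of = {
    "Ford": "Makes", "Audi": "Makes", "BMW": "Makes", "Lexus": "Makes",
    "Makes": "Autos", "Autos": "IMT", "Travel": "IMT", "Retails": "IMT",
}

def sibling_nodes(node):
    """Heuristic noise candidates: nodes sharing the target's parent
    in the taxonomy tree."""
    p = parent_of[node]
    return sorted(n for n, q in parent_of.items() if q == p and n != node)

print(sibling_nodes("Lexus"))  # ['Audi', 'BMW', 'Ford']
```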
Data Mining Approach
The rich information in the user profile $\vec{u}$ allows us to go beyond
heuristics like taxonomy trees. Considering the Lexus cam-
paign example, we can integrate various information such
as past behavior, geo-location, age, education, etc. to make
more reliable predictions. A data mining approach can be
used to learn the prediction function f from past conversion
data. We can treat f as a regression function, whose pa-
rameters can be obtained from the training data. Various
data mining algorithms (logistic regression, support vector
machines, gradient boosting trees, etc.) can be used to learn
the function.
Assuming the data aggregators have past conversion records
from a campaign, we can generate training data $D = \{(\vec{x}_u, y_u)\}$
by preparing each audience member $u$ in the past records:

$$\vec{x}_u \Leftarrow (\vec{u}, y) \qquad (5)$$

$$y_u = \begin{cases} 1 & \text{if user } u \text{ converted} \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$
$\vec{x}$ is the feature vector representing each record; this
step could be application specific or very general and
simple (e.g. let $\vec{x} = \vec{u}$). Depending on the data mining
algorithm used, we can use an existing approach to learn the
parameters.
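The training data construction in Equations 5 and 6 might be sketched as follows (the record layout and names are our own assumptions, standing in for application-specific feature engineering):

```python
def make_training_data(past_records):
    """Build D = {(x_u, y_u)} from past conversion records
    (Equations 5-6). Each record is (profile, target, converted);
    the feature step here simply pairs the profile with the target."""
    data = []
    for profile, target, converted in past_records:
        x_u = (tuple(profile), target)   # Equation 5: x_u <= (u, y)
        y_u = 1 if converted else 0      # Equation 6
        data.append((x_u, y_u))
    return data

records = [(["searched SFO", "searched LAX"], "fly to SFO", True),
           (["searched JFK"], "fly to SFO", False)]
print(make_training_data(records)[0][1])  # 1
```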
4. EXPERIMENTAL DESIGN
We carried out experiments to evaluate the effectiveness
of the proposed approach.
4.1 Data Set
We created an evaluation data set based on web log data
from a top Online Travel Agent (OTA) website in the United
States. The data contains 20 days of web users’ air ticket
searches and purchase history at the booking website. For
each airport, we have information about its location (lon-
gitude and latitude), a text description of its attractions
(gathered from TripAdvisor (http://www.tripadvisor.com)),
and a climate description including monthly temperature,
precipitation, etc.
We used this data to simulate real-world campaigns. Each
audience involved in the experimental data is identified by
Figure 3: Two examples of segments. Each segment
contains one or more search behaviors followed by
one purchasing behavior. The top example shows
that User1 purchased a ticket to SFO after searching
LAX, LAX, SJC, SFO.
an anonymized user id and has airport search and ticket pur-
chase behaviors with a time stamp. Each audience member is pre-
processed by sorting their behaviors in chronological order and
dividing the behavior time sequence into multiple segments,
where each segment ends when the user finishes purchasing a
ticket. Each segment contains the user’s search and pur-
chasing behaviors in that time frame. Fig. 3 includes two
example segments.
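The segmentation step might be sketched as follows (the event encoding is our own assumption):

```python
def segment_behaviors(events):
    """Split a user's chronologically sorted events into segments,
    each ending with a purchase (as in Fig. 3). Events are
    ('search', airport) or ('buy', airport) pairs; trailing searches
    with no purchase are discarded."""
    segments, current = [], []
    for action, airport in events:
        current.append((action, airport))
        if action == "buy":
            segments.append(current)
            current = []
    return segments

# User1 from Fig. 3: searches LAX, LAX, SJC, SFO, then buys SFO.
user1 = [("search", "LAX"), ("search", "LAX"), ("search", "SJC"),
         ("search", "SFO"), ("buy", "SFO")]
print(len(segment_behaviors(user1)))  # 1
```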
This gives us about 200k flight records (i.e. segments),
where each record contains a time sequence of the
airports the user searched and the airport the user purchased.
For each segment, we insert an advertising campaign at the
time point where the user has finished 20% of the behav-
iors in the segment. The goal of each campaign is “find an
audience who will fly to airport y” and the corresponding
query is “find an audience who has shown interest in
(i.e. searched) airport y”.
We compared the following three strategies to generate
the audience set for each campaign:
Exact Targeting (ET): identify the audience who has searched
airport y.
Random Noise (RN): replace part of the audience identified
by Exact Targeting with users uniformly sampled from the
noisy (non-targeted) audience.
Smart Noise (SN): the same as Random Noise, except that
the noisy audience is selected using the proposed data mining
technique.
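The three strategies can be sketched together as follows (hypothetical helpers; `score` stands in for the learned prediction model described in Section 4.2):

```python
import random

def exact_targeting(profiles, y):
    """ET: users whose search history contains airport y."""
    return {u for u, searches in profiles.items() if y in searches}

def inject_noise(exact_set, noise_pool, p, score=None):
    """Keep p% of the exact audience; fill the remainder from the
    noise pool, uniformly (RN) or by a learned score (SN)."""
    n = len(exact_set)
    n_exact = round(n * p / 100)
    kept = set(random.sample(sorted(exact_set), n_exact))
    if score is None:                                    # Random Noise
        noise = random.sample(sorted(noise_pool), n - n_exact)
    else:                                                # Smart Noise
        noise = sorted(noise_pool, key=score, reverse=True)[: n - n_exact]
    return kept | set(noise)

profiles = {"u1": ["LAX", "SFO"], "u2": ["SJC"], "u3": ["SFO"], "u4": ["SEA"]}
et = exact_targeting(profiles, "SFO")                    # {"u1", "u3"}
mixed = inject_noise(et, {"u2", "u4"}, 50, score=lambda u: int(u[1:]))
```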
4.2 Smart Noise Prediction Model
As discussed before (Equation 5), we need to prepare each
audience in a record/campaign to generate a feature vector
−→x based on user profile −→u and the target y. Fig. 3 illus-
trates running campaign “find an audience who will fly to
SFO”. Each audience is represented by the search behaviors
before the campaign. Thus we generated a 20-dimensional
feature vector −→x , where each dimension captures a specific
similarity between the searched airports and the converted
airport.
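Under our reading of the feature names in Fig. 6, the 20 dimensions are the cross product of four reference airports (LT, ER, MP, MF) and five similarity measures; the sketch below assumes that structure, with hypothetical similarity functions passed in by the caller:

```python
from collections import Counter

AIRPORT_TYPES = ["LT", "ER", "MP", "MF"]  # latest, earliest, most popular, most frequent
MEASURES = ["DIST", "STYLE", "PRECIP", "TEMP", "OV"]  # 4 x 5 = 20 dimensions

def reference_airports(searches, popularity):
    """Pick the four reference airports from the user's search sequence."""
    return {
        "LT": searches[-1],
        "ER": searches[0],
        "MP": max(set(searches), key=lambda a: popularity[a]),
        "MF": Counter(searches).most_common(1)[0][0],
    }

def feature_vector(searches, target, popularity, sim):
    """20-d vector: one similarity per (reference airport, measure) pair."""
    refs = reference_airports(searches, popularity)
    return [sim[m](refs[t], target) for t in AIRPORT_TYPES for m in MEASURES]

pop = {"LAX": 10, "SJC": 3, "SFO": 8}
sims = {m: (lambda a, b: 1.0 if a == b else 0.0) for m in MEASURES}
v = feature_vector(["LAX", "LAX", "SJC", "SFO"], "SFO", pop, sims)
```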
We used 50% of the data for training and 50% for evaluation;
within the training data, 5-fold cross validation was applied. We
used the well-known Gradient Boosting Trees (GBT) with bagging
to build the prediction model on the training data.
We chose GBT because it can learn feature interactions
and generalizes well in a variety of applications. For each
testing record, we first predict the probability that each user
will convert to the airport specified in the campaign, and then
add the top-ranked users as smart noise.

Table 1: Symbols Used for Evaluation
Ωt: set of targeted users
Nt: number of targeted users, |Ωt|
Ωet: users selected by exact targeting
Ω̄et: the complement of Ωet
Ωr: noisy users randomly chosen from Ω̄et
Ωs: noisy users chosen from Ω̄et in a smart manner
Ωct: set of converted users
4.3 Evaluation Metric
The effectiveness of the audience targeting strategies is
measured by the conversion rate, defined as

CR = |Ωct| / |Ωt|    (7)

where Ωct and Ωt denote the converted audience and the targeted
audience. Ωt can be produced by any of the targeting
strategies considered here. For ease of exposition, we use
the symbols defined in Tab. 1. For exact targeting,
the targeted audience consists of all users selected by exact
targeting, namely Ωt = Ωet. For the random noise and
smart noise injection approaches, the final campaign user
set is a mixture of p% × |Ωet| exact users randomly sampled
from Ωet and (100 − p)% × |Ωet| noisy users Ωr/s chosen by
the respective selection method:

Ωr/s = smpr/s(Ω̄et, (100 − p)% × |Ωet|)    (8)
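The metric and the mixture in Equations 7 and 8 can be illustrated in code (hypothetical helper names; the deterministic slicing below stands in for the sampling functions smpr/s):

```python
def conversion_rate(targeted, converted):
    """Eq. 7: CR = |targeted users who converted| / |targeted users|."""
    return len(targeted & converted) / len(targeted)

def mixed_audience(exact_users, ranked_noise, p):
    """Eq. 8 (sketch): p% exact users plus (100 - p)% noisy users,
    keeping the final set the same size as the exact audience."""
    n = len(exact_users)
    n_exact = round(n * p / 100)
    return set(sorted(exact_users)[:n_exact]) | set(ranked_noise[: n - n_exact])

targeted = {"u1", "u2", "u3", "u4"}
converted = {"u2", "u4", "u9"}
cr = conversion_rate(targeted, converted)   # 2 of 4 targeted users converted
```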
5. EXPERIMENTAL RESULTS
We compared the performance of Exact Targeting (i.e. no
privacy protection), Random Noise, and Smart Noise.
As discussed before (Equation 2), the belief (privacy
level) can be tuned by adjusting the ratio between exact
audience and noisy audience. At the same time, changing
the ratio also impacts the effectiveness of the Smart Noise
approach. To examine the effectiveness/privacy trade-off, we
evaluated all three behavioral targeting methods under various
noise levels. For each method at each noise level, the average
conversion rate over all campaigns is reported (Fig. 4).
The Smart Noise method is consistently better
than the Random Noise method.
5.1 Further Analysis
Airport (Campaign) Examples
Airports differ in popularity among travelers. In
this paper we measure the popularity of an airport by
the number of records that converted to it. We are interested
in how this popularity factor affects performance.
We present results for SFO (San Francisco International
Airport), SEA (Seattle-Tacoma International Airport), and
SJC (Mineta San Jose International Airport) in Fig. 5. The
popularity of the three airports differs considerably (SFO >
SEA > SJC). In this data set, 3053 segments converted to
SFO, 2179 segments converted to SEA, and 789 segments
converted to SJC; 4189 segments contain a search for SFO,
2833 contain a search for SEA, and 1500
contain a search for SJC. For each airport, we calculated the
conversion rate under three noise levels (0.3, 0.6 and 0.8,
respectively). Each plot in Fig. 5 corresponds to one airport
and one noise level, as indicated in the title of the plot.

Figure 5: How conversion rates vary with recall rates for different
noise levels. Panels (a)-(c) show SFO at noise levels 0.3, 0.6, and
0.8; panels (d)-(f) show SEA and panels (g)-(i) show SJC at the same
noise levels. For each graph, the horizontal axis denotes the number
of audience targeted/returned to the data buyer and the vertical axis
denotes the conversion rate; each plot compares Exact Targeting,
Random Noise, and Smart Noise.
We have three major observations. First, we observe that
the conversion rate decreases with both the number of targeted
audience and the noise level, and the plots show
that Smart Noise injection works better for the popular airport
(SFO) than for the others (SEA and SJC). A possible reason is
that the popular airport SFO may have more high-quality noisy
audience (i.e. audience who will convert to the airport), so
the top-ranked smart noise for SFO is of higher quality.
Second, we found that Smart Noise performs better than
exact targeting in some cases (Fig. 5(a)-(c)). This suggests
the top-ranked list from the Smart Noise prediction is better
than the audience manually selected for the campaign.
Third, we were surprised to see that the conversion rate
(accuracy) goes up at certain points in some of the plots. A
close look at the prediction results reveals that this is due to
insufficient representation of the audience. For example,
the jump from 900 to 1050 in plot (h) is due to a large
number of users who share the same airport search sequence
OAK → OAK → OAK: all of them occupy roughly the same
positions (around 900-1050) in the smart noise ranking
list, since their feature vectors x are identical.
Some of them went (i.e. converted) to SJC, and many others
went to other nearby airports⁶. In this case, additional
information about the audience is needed to better represent
and differentiate them in order to make accurate or smooth
predictions.
We used GBT to learn the prediction model in our experiments.
However, the proposed idea is not restricted to a
specific machine learning technique; GBT can be replaced
by other techniques depending on the specific problem.
Influences of Features
We further examined the learned gradient boosting model to
see the influence of each input feature. As shown in Fig. 6,
we found that LT-DIST and ER-DIST are the most
influential features. This is as expected: if there are
multiple airports near a city a user wants to visit, the user
might search for all of those airports and finally fly to one of
them. ER-STYLE, MF-STYLE and LT-PRECIP are
also useful features. Although they are much weaker than
the distance features, they still move the performance needle.
This is not surprising, since some people on vacation
might choose among cities with similar climates or similar
attractions.

⁶OAK, SJC, and SFO are airports near each other.

Figure 4: Average conversion rate over all 255 airports at
noise levels from 0.1 to 1, comparing Exact Targeting, Random
Noise, and Smart Noise. The horizontal axis denotes the noise
level and the vertical axis the conversion rate.
6. CONCLUSION AND FUTURE WORK
The advertising industry and its encompassing ecosystem
are increasingly moving towards higher levels of security and
consumer privacy. However, privacy concerns still exist, and
privacy is often mistakenly framed as a zero-sum
game with advertising performance. In this paper we proposed a
data mining method to handle data sharing privacy issues for
behavioral targeting. We tested our method on a very
large behavioral targeting dataset collected from a major
travel web site and ran location based data campaigns over
all 255 major United States airports, obtaining conversion
rates at various noise percentages. We show that the method
scales well with data size and attribute dimensionality. It
is clear that the smart noise strategy is consistently better
than uniform random noise injection. We also observed that
performance improvements vary across different campaigns
(i.e. airports). Instead of having to always trade off privacy
for utility, in certain cases we can even obtain more utility
(i.e. higher conversion rate) than exact data campaigns, by
inserting smart noise (e.g. in the case of “SFO travelers”).
This paper opens up a new direction in the advertising
world by proving it is possible to pursue privacy and per-
formance simultaneously. We are certain more methods will
emerge and we hope the field will grow and mature as a
result. Our experiments are based on data campaigns tar-
geting travelers to specific destinations. Although we ex-
pect a similar outcome when applied to other data types
(e.g. geographic, demographic, etc.), additional future work
is needed to validate this assumption.

Figure 6: Influence of each feature, shown as a bar chart with
influence values from 0 to 60; LT-DIST and ER-DIST are the most
influential, followed by ER-STYLE, MF-DIST, and MP-DIST. Each
feature name encodes the type of searched airport and the
information used to calculate the feature: LT, ER, MP, MF denote
the latest, earliest, most popular, and most frequent airports in
the user profile (i.e. the user's airport search sequence); DIST,
STYLE, PRECIP, TEMP denote distance, attraction-style similarity,
precipitation, and temperature, respectively. For example, LT-DIST
is the distance similarity between the latest airport searched and
the target airport.

The proposed technique also needs to be adjusted and optimized for different
verticals (e.g. automobiles, retail, education, etc.). In our
experiments, we described a way to generate features and
take advantage of the richness of behavioral data. Even though
our work focuses on features in the travel
realm, it can be generalized and adapted to other applications.
The regression function is not limited to GBT and
can be replaced by other alternatives depending on the application.
Our approach is not specific to data sharing
between data aggregators/exchanges and data buyers, and
we are interested in applying it to data sharing between data
suppliers and data exchanges, and other similar scenarios.