3. Image Analyzer for Creative Tester
User Issue / Yahoo! Challenge
Roadmap Theme & Goal
• Creative Tester receives approximately a
million creatives per day to be tested for
malicious content. Of these, 2% to 5% of adverts
fall into the Windows-mimic category. These need
to be detected and banned as early as possible, with
minimal human intervention.
• Need to validate brand safety and ensure
quality impressions for advertisers.
• The Trust and Safety team, in collaboration with
Sciences, came up with an Image Analyzer
module that can detect malicious
advertisements such as Windows mimics or fake
brands with phony downloads, and tag them
appropriately to be banned.
Value Proposition/Positioning – Reduce
the manual effort in recognizing and
banning malicious advertisements that
can be visually identified as fraudulent.
12/4/2013
4. IONIX / CT Ecosystem
[Diagram: IONIX / CT ecosystem]
• Components shown: Cqueuer (RMX Apps); IONIX; Creative Tester (CT); Downloaders (Chrome, Firefox, IE); Virus Checker (ClamAV / Trend Micro); Flash Checker; Image Analyzer; Classifiers; Domain Lookup Service; Media Trust (3rd party); TRF_PROD DB; Media Guard Manual Audit Queue.
• Flows labeled: Creative feed based on advertiser's profile; primary/secondary creative and click_URL review; Min-bar/technical tags; creatives/line items get banned with Min-Bar; creatives banned.
11. What Does Success Look Like
• Who are the customers?
– RMX and APT creative serving systems.
– Moneyball (Going forward)
• Success metrics
– Reducing the manual effort needed to identify Windows-mimic
advertisements
– This would be measured by the confidence score generated by the system, which
would eventually allow the process to be fully automated
– Reduction in customer complaints.
• Key business stakeholders who have/will validate success
– Serving systems
– Business teams
– Manual review teams
12. Competitive Landscape
• 3rd party ad verification companies, etc.
• What differentiates our product/Solution?
– Avoiding the need to expose and send out demand inventory.
– Flexibility to keep improving the algorithms for higher precision/recall.
– Quick turnaround time for validation.
– Building highly targeted models (e.g. fake Facebook or fake Adobe).
Editor's notes
Cequer -> IONIX -> can send to DLS or to CT -> a set of tests (downloader, virus checker, flash checker, etc.), a classifier for landing page detection, and 3rd party validations. The results are tagged and sent back. The business logic deciding what action to take is handled by Cequer.
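The flow in this note (run a set of tests, tag the results, and leave the decision to the caller) can be sketched roughly as follows. This is only an illustrative sketch: the function names, the creative fields, and the tag strings are all hypothetical and are not taken from the actual IONIX or CT code.

```python
from typing import Callable, Dict, List, Set

# Hypothetical creative record; the field names are assumptions for illustration.
Creative = Dict[str, str]
# A check inspects a creative and returns zero or more tags.
Check = Callable[[Creative], List[str]]


def virus_check(creative: Creative) -> List[str]:
    """Stand-in for a virus checker: tags creatives whose payload is flagged."""
    return ["virus"] if creative.get("payload") == "infected" else []


def image_check(creative: Creative) -> List[str]:
    """Stand-in for the Image Analyzer: tags creatives labeled as a Windows mimic."""
    return ["windows_mimic"] if creative.get("image_label") == "windows_mimic" else []


def run_checks(creative: Creative, checks: List[Check]) -> List[str]:
    """Tester side: run every check and collect the tags, without deciding anything."""
    tags: List[str] = []
    for check in checks:
        tags.extend(check(creative))
    return tags


def decide_action(tags: List[str], ban_tags: Set[str]) -> str:
    """Caller side: the business logic that maps returned tags to an action."""
    return "ban" if any(tag in ban_tags for tag in tags) else "serve"


creative = {"payload": "clean", "image_label": "windows_mimic"}
tags = run_checks(creative, [virus_check, image_check])
action = decide_action(tags, {"virus", "windows_mimic"})
```

Keeping `decide_action` separate from `run_checks` mirrors the note's split between tagging (done by the tests) and business logic (done by the caller).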
Feature extraction: In pattern recognition and image processing, feature extraction is a special form of dimensionality reduction. When the input data to an algorithm is too large to be processed and is suspected to be highly redundant (e.g. the same measurement in both feet and meters), the input data is transformed into a reduced representation set of features (also called a feature vector). A feature can be the result of applying a local neighborhood operation to the image. A neighborhood operation visits every point and applies a function at that point, based on its neighbors:

Visit each point p in the image data and do {
    N = a neighborhood or region of the image data around the point p
    result(p) = f(N)
}

Edge detection: Edge detection is the name for a set of mathematical methods that aim to identify points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities.

Corner detection: A corner (interest point) can be defined as the intersection of two edges. A corner can also be defined as a point for which there are two dominant and different edge directions in a local neighborhood of the point.

Blob detection: Informally, a blob is a region of a digital image in which some properties are constant or vary within a prescribed range of values; all the points in a blob can be considered in some sense to be similar to each other.

SIFT (scale-invariant feature transform): The SIFT algorithm computes local features that are invariant to image scale and rotation, so an object can still be recognized when it is rotated or scaled and, to some extent, when it is viewed from a different viewpoint.

CBOW (contextual bag of words): In computer vision, the bag-of-words model (BoW model) can be applied to image classification by treating image features as words. In document classification, a bag of words is a sparse vector of occurrence counts of words; that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words is a sparse vector of occurrence counts of a vocabulary of local image features.

SVM: A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks.
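The neighborhood-operation pseudocode in these notes can be made concrete with a small sketch. The example below is an illustration only, not the Image Analyzer's implementation: it uses a Sobel gradient magnitude as the function f, which is one common choice for edge detection, and it assumes border pixels are handled by repeating the nearest edge pixel. The names `neighborhood_op` and `sobel_magnitude` are hypothetical.

```python
from typing import Callable, List

Image = List[List[float]]

# Standard 3x3 Sobel kernels for horizontal and vertical brightness changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def neighborhood_op(image: Image, f: Callable[[Image], float], radius: int = 1) -> Image:
    """Visit each point p, build its neighborhood N, and store result(p) = f(N)."""
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Assumption: coordinates outside the image are clamped to the border.
            neighborhood = [
                [
                    image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                    for dx in range(-radius, radius + 1)
                ]
                for dy in range(-radius, radius + 1)
            ]
            result[y][x] = f(neighborhood)
    return result


def sobel_magnitude(n: Image) -> float:
    """Gradient magnitude of a 3x3 neighborhood; large values indicate an edge."""
    gx = sum(SOBEL_X[i][j] * n[i][j] for i in range(3) for j in range(3))
    gy = sum(SOBEL_Y[i][j] * n[i][j] for i in range(3) for j in range(3))
    return (gx * gx + gy * gy) ** 0.5


# A 4x4 image with a vertical step from dark (0) to bright (1).
step_image = [[0, 0, 1, 1] for _ in range(4)]
edges = neighborhood_op(step_image, sobel_magnitude)
```

In `edges`, the response is nonzero only in the two columns next to the brightness step, which matches the definition of an edge as a point where brightness changes sharply.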