SlideShare une entreprise Scribd logo
1  sur  28
Fight against robots with
enbrite.ly data platform
Joe MÉSZÁROS
Joe MÉSZÁROS
lead software engineer
@joemesz
joemeszaros
Who we are?
Our vision is to revolutionize the KPIs and metrics the online
advertisement industry currently using. With our products,
Antifraud, Brandsafety and Viewability we provide actionable
data to our customers.
Ad display fraud
(ad stacking, pixel stuffing)
Ad viewability
Brand safety
Detecting traffic that comes from unwanted
categories (e.g. adult), countries and single domains
39%
39%Anti fraud detection
DATA
COLLECTION
ANALYZE
DATA PROCESSION
ANTI FRAUD
VIEWABILITY
BRAND SAFETY
REPORT + API
What we do?
How we do? DATA PLATFORM
...so we need do analyze vast amount of data
Infrastucture Big Data
technologies
+
enbrite.ly
data
platform
=
Amazon Web Services (AWS)
● Most popular cloud service provider
● ~70 services, 13 geographical "regions"
● Amazon Big Data = Elastic Map Reduce
● BUT Do not trust the BIG guy (API problem)
https://aws.amazon.com/
Apache Hadoop
● de facto Big Data technology
● open source software
● distributed storage (HDFS) + data processing
(MapReduce)
● ecosystem: many additional softwares
http://hadoop.apache.org/ | https://github.com/apache/hadoop
Apache Spark
● large-scale data processing engine
● open source software (popular)
● modules: core, sql, sreaming, graph, ML
● faster than Hadoop MapReduce
http://spark.apache.org/ | https://github.com/apache/spark
Data platform in numbers
20+ node cluster
16 services 110 servers
0.5 - 4 TB /day
100+ TB on S3
How we do? DATA COLLECTION
How we do? DATA PROCESSION
Let me tell you a short story...
Real world example
You have a simple idea to detect bot traffic, which saves
the world. Let’s implement it!
Real world example
THE IDEA: Analyse events which are too hasty and deviate from
regular, humanlike profiles: too many clicks in a defined
timeframe.
INPUT: Collected events on Amazon S3
OUTPUT: Invalid sessions
Step 1: sessionize events
How to solve it?
Step 2: detect too many clicks
code: https://github.com/enbritely/startup-safary
Step 1: event to session
//configure Spark application
//read events from HDFS
JavaRDD<Event> events = lines.map(Converter::jsonToEvent);
Application code : https://github.com/enbritely/startup-safary
//configure Spark application
//read events from HDFS
JavaRDD<Event> events = lines.map(Converter::jsonToEvent);
JavaRDD<Event> clicks = events.filter(e ->
e.type.equals("click"));
//configure Spark application
//read events from HDFS
JavaRDD<Event> events = lines.map(Converter::jsonToEvent);
JavaRDD<Event> clicks = events.filter(e ->
e.type.equals("click"));
JavaPairRDD<String, List<Event>> grouped = clicks
.groupBy(Event::sessionId);
//configure Spark application
//read events from HDFS
JavaRDD<Event> events = lines.map(Converter::jsonToEvent);
JavaRDD<Event> clicks = events.filter(e ->
e.type.equals("click"));
JavaPairRDD<String, List<Event>> grouped = clicks
.groupBy(Event::sessionId);
JavaRDD<Session> sessions = grouped.mapValues(sessionizer);
Step 1: event to session
//Sessionizer
(Function<Iterable<Event>, Session>) unorderedEvents -> {
List<Event> clickOrdered = sortyByTimestamp(unorderedEvents);
Session session = new Session(sessionId);
for (Event event: clickOrdered) {
session.addClick(event.getTimestamp());
}
return session;
}
Application code : https://github.com/enbritely/startup-safary
Step 2: apply heuristic
Application code : https://github.com/enbritely/startup-safary
JavaRDD<String> badSessions = sessions
.filter(s -> s.getClickCount() > threshold)
.map(s -> s.sessionId + ":" + s.clickCount);
// save output to HDFS
Live demo!
● 4 node EMR (Hadoop) Cluster
● Apache Spark 1.6.1
● 1 GB input events
build app : create-cluster : events S3 -> HDFS : submit app
Congratulation!
MISSION COMPLETED
YOU just saved the world with a
simple idea within ~10 minutes.
WE ARE HIRING!
working @exPrezi office, K9
check out the company in Forbes :-)
amazing company culture
BUT the real reason ….
WE ARE HIRING!
… is our mood manager, Bigyó :)
BEYOND enbrite.ly
...our investor and event sponsor
is looking for talented guys
Joe MÉSZÁROS
lead software engineer
joe@enbrite.ly
@joemesz
@enbritely
joemeszaros
enbritely
THANK YOU!
QUESTIONS?

Contenu connexe

Tendances

DevOOPS: Attacks and Defenses for DevOps Toolchains
DevOOPS: Attacks and Defenses for DevOps ToolchainsDevOOPS: Attacks and Defenses for DevOps Toolchains
DevOOPS: Attacks and Defenses for DevOps ToolchainsChris Gates
 
AWS Survival Guide
AWS Survival GuideAWS Survival Guide
AWS Survival GuideKen Johnson
 
DAST в CI/CD, Ольга Свиридова
DAST в CI/CD, Ольга СвиридоваDAST в CI/CD, Ольга Свиридова
DAST в CI/CD, Ольга СвиридоваMail.ru Group
 
Yunong Xiao - The Paved PaaS to Microservices - Codemotion Milan 2017
Yunong Xiao - The Paved PaaS to Microservices - Codemotion Milan 2017Yunong Xiao - The Paved PaaS to Microservices - Codemotion Milan 2017
Yunong Xiao - The Paved PaaS to Microservices - Codemotion Milan 2017Codemotion
 
HAProxy as Egress Controller
HAProxy as Egress ControllerHAProxy as Egress Controller
HAProxy as Egress ControllerJulien Pivotto
 
Webinar - Matteo Manchi: Dal web al nativo: Introduzione a React Native
Webinar - Matteo Manchi: Dal web al nativo: Introduzione a React Native Webinar - Matteo Manchi: Dal web al nativo: Introduzione a React Native
Webinar - Matteo Manchi: Dal web al nativo: Introduzione a React Native Codemotion
 
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...Codemotion
 
HashiCorp Vault Workshop:幫 Credentials 找個窩
HashiCorp Vault Workshop:幫 Credentials 找個窩HashiCorp Vault Workshop:幫 Credentials 找個窩
HashiCorp Vault Workshop:幫 Credentials 找個窩smalltown
 
Dynamic Database Credentials: Security Contingency Planning
Dynamic Database Credentials: Security Contingency PlanningDynamic Database Credentials: Security Contingency Planning
Dynamic Database Credentials: Security Contingency PlanningSean Chittenden
 
Prometheus: From technical metrics to business observability
Prometheus: From technical metrics to business observabilityPrometheus: From technical metrics to business observability
Prometheus: From technical metrics to business observabilityJulien Pivotto
 
10 Excellent Ways to Secure Your Spring Boot Application - The Secure Develop...
10 Excellent Ways to Secure Your Spring Boot Application - The Secure Develop...10 Excellent Ways to Secure Your Spring Boot Application - The Secure Develop...
10 Excellent Ways to Secure Your Spring Boot Application - The Secure Develop...Matt Raible
 
10 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 2019
10 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 201910 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 2019
10 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 2019Matt Raible
 
Regex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadRegex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadAll Things Open
 
Incident Resolution as Code
Incident Resolution as CodeIncident Resolution as Code
Incident Resolution as CodeJulien Pivotto
 
Containerizing your Security Operations Center
Containerizing your Security Operations CenterContainerizing your Security Operations Center
Containerizing your Security Operations CenterJimmy Mesta
 
Erik Wendel - Beyond JavaScript Frameworks: Writing Reliable Web Apps With El...
Erik Wendel - Beyond JavaScript Frameworks: Writing Reliable Web Apps With El...Erik Wendel - Beyond JavaScript Frameworks: Writing Reliable Web Apps With El...
Erik Wendel - Beyond JavaScript Frameworks: Writing Reliable Web Apps With El...Codemotion
 
JavaOne India 2011 - Running your Java EE 6 Apps in the Cloud
JavaOne India 2011 - Running your Java EE 6 Apps in the CloudJavaOne India 2011 - Running your Java EE 6 Apps in the Cloud
JavaOne India 2011 - Running your Java EE 6 Apps in the CloudArun Gupta
 
Managing secrets at scale
Managing secrets at scaleManaging secrets at scale
Managing secrets at scaleAlex Schoof
 
You wouldn't build a toast, would you?
You wouldn't build a toast, would you?You wouldn't build a toast, would you?
You wouldn't build a toast, would you?Yan Cui
 

Tendances (20)

DevOOPS: Attacks and Defenses for DevOps Toolchains
DevOOPS: Attacks and Defenses for DevOps ToolchainsDevOOPS: Attacks and Defenses for DevOps Toolchains
DevOOPS: Attacks and Defenses for DevOps Toolchains
 
AWS Survival Guide
AWS Survival GuideAWS Survival Guide
AWS Survival Guide
 
DAST в CI/CD, Ольга Свиридова
DAST в CI/CD, Ольга СвиридоваDAST в CI/CD, Ольга Свиридова
DAST в CI/CD, Ольга Свиридова
 
Yunong Xiao - The Paved PaaS to Microservices - Codemotion Milan 2017
Yunong Xiao - The Paved PaaS to Microservices - Codemotion Milan 2017Yunong Xiao - The Paved PaaS to Microservices - Codemotion Milan 2017
Yunong Xiao - The Paved PaaS to Microservices - Codemotion Milan 2017
 
HAProxy as Egress Controller
HAProxy as Egress ControllerHAProxy as Egress Controller
HAProxy as Egress Controller
 
ruxc0n 2012
ruxc0n 2012ruxc0n 2012
ruxc0n 2012
 
Webinar - Matteo Manchi: Dal web al nativo: Introduzione a React Native
Webinar - Matteo Manchi: Dal web al nativo: Introduzione a React Native Webinar - Matteo Manchi: Dal web al nativo: Introduzione a React Native
Webinar - Matteo Manchi: Dal web al nativo: Introduzione a React Native
 
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
Security Testing with OWASP ZAP in CI/CD - Simon Bennetts - Codemotion Amster...
 
HashiCorp Vault Workshop:幫 Credentials 找個窩
HashiCorp Vault Workshop:幫 Credentials 找個窩HashiCorp Vault Workshop:幫 Credentials 找個窩
HashiCorp Vault Workshop:幫 Credentials 找個窩
 
Dynamic Database Credentials: Security Contingency Planning
Dynamic Database Credentials: Security Contingency PlanningDynamic Database Credentials: Security Contingency Planning
Dynamic Database Credentials: Security Contingency Planning
 
Prometheus: From technical metrics to business observability
Prometheus: From technical metrics to business observabilityPrometheus: From technical metrics to business observability
Prometheus: From technical metrics to business observability
 
10 Excellent Ways to Secure Your Spring Boot Application - The Secure Develop...
10 Excellent Ways to Secure Your Spring Boot Application - The Secure Develop...10 Excellent Ways to Secure Your Spring Boot Application - The Secure Develop...
10 Excellent Ways to Secure Your Spring Boot Application - The Secure Develop...
 
10 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 2019
10 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 201910 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 2019
10 Excellent Ways to Secure Your Spring Boot Application - Devoxx Belgium 2019
 
Regex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadRegex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language Instead
 
Incident Resolution as Code
Incident Resolution as CodeIncident Resolution as Code
Incident Resolution as Code
 
Containerizing your Security Operations Center
Containerizing your Security Operations CenterContainerizing your Security Operations Center
Containerizing your Security Operations Center
 
Erik Wendel - Beyond JavaScript Frameworks: Writing Reliable Web Apps With El...
Erik Wendel - Beyond JavaScript Frameworks: Writing Reliable Web Apps With El...Erik Wendel - Beyond JavaScript Frameworks: Writing Reliable Web Apps With El...
Erik Wendel - Beyond JavaScript Frameworks: Writing Reliable Web Apps With El...
 
JavaOne India 2011 - Running your Java EE 6 Apps in the Cloud
JavaOne India 2011 - Running your Java EE 6 Apps in the CloudJavaOne India 2011 - Running your Java EE 6 Apps in the Cloud
JavaOne India 2011 - Running your Java EE 6 Apps in the Cloud
 
Managing secrets at scale
Managing secrets at scaleManaging secrets at scale
Managing secrets at scale
 
You wouldn't build a toast, would you?
You wouldn't build a toast, would you?You wouldn't build a toast, would you?
You wouldn't build a toast, would you?
 

En vedette

Budapest Spark Meetup - Apache Spark @enbrite.ly
Budapest Spark Meetup - Apache Spark @enbrite.lyBudapest Spark Meetup - Apache Spark @enbrite.ly
Budapest Spark Meetup - Apache Spark @enbrite.lyMészáros József
 
Designing Teams for Emerging Challenges
Designing Teams for Emerging ChallengesDesigning Teams for Emerging Challenges
Designing Teams for Emerging ChallengesAaron Irizarry
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with DataSeth Familian
 
The Drift Brand Book
The Drift Brand BookThe Drift Brand Book
The Drift Brand BookDrift
 
Getting Started With SlideShare
Getting Started With SlideShareGetting Started With SlideShare
Getting Started With SlideShareSlideShare
 
UX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and ArchivesUX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and ArchivesNed Potter
 
BigWeatherGear Group and Corporate Services Brochure 2013
BigWeatherGear Group and Corporate Services Brochure 2013BigWeatherGear Group and Corporate Services Brochure 2013
BigWeatherGear Group and Corporate Services Brochure 2013Kristin Matson
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017Drift
 
TEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkTEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkVolker Hirsch
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsLinkedIn
 
Privacy is for losers 2016
Privacy is for losers 2016Privacy is for losers 2016
Privacy is for losers 2016Cain Ransbottyn
 
Building Healthier Communities: TEDMED 2016
Building Healthier Communities: TEDMED 2016Building Healthier Communities: TEDMED 2016
Building Healthier Communities: TEDMED 2016Luminary Labs
 
2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShareSlideShare
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerLuminary Labs
 
What to Upload to SlideShare
What to Upload to SlideShareWhat to Upload to SlideShare
What to Upload to SlideShareSlideShare
 
How to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & TricksHow to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & TricksSlideShare
 
UX at York: starting small and scaling up (#nclxux)
UX at York: starting small and scaling up (#nclxux)UX at York: starting small and scaling up (#nclxux)
UX at York: starting small and scaling up (#nclxux)Ned Potter
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheLeslie Samuel
 
What Makes Great Infographics
What Makes Great InfographicsWhat Makes Great Infographics
What Makes Great InfographicsSlideShare
 

En vedette (20)

Budapest Spark Meetup - Apache Spark @enbrite.ly
Budapest Spark Meetup - Apache Spark @enbrite.lyBudapest Spark Meetup - Apache Spark @enbrite.ly
Budapest Spark Meetup - Apache Spark @enbrite.ly
 
Designing Teams for Emerging Challenges
Designing Teams for Emerging ChallengesDesigning Teams for Emerging Challenges
Designing Teams for Emerging Challenges
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with Data
 
The Drift Brand Book
The Drift Brand BookThe Drift Brand Book
The Drift Brand Book
 
Build Features, Not Apps
Build Features, Not AppsBuild Features, Not Apps
Build Features, Not Apps
 
Getting Started With SlideShare
Getting Started With SlideShareGetting Started With SlideShare
Getting Started With SlideShare
 
UX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and ArchivesUX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and Archives
 
BigWeatherGear Group and Corporate Services Brochure 2013
BigWeatherGear Group and Corporate Services Brochure 2013BigWeatherGear Group and Corporate Services Brochure 2013
BigWeatherGear Group and Corporate Services Brochure 2013
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017
 
TEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of WorkTEDx Manchester: AI & The Future of Work
TEDx Manchester: AI & The Future of Work
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving Cars
 
Privacy is for losers 2016
Privacy is for losers 2016Privacy is for losers 2016
Privacy is for losers 2016
 
Building Healthier Communities: TEDMED 2016
Building Healthier Communities: TEDMED 2016Building Healthier Communities: TEDMED 2016
Building Healthier Communities: TEDMED 2016
 
2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI Explainer
 
What to Upload to SlideShare
What to Upload to SlideShareWhat to Upload to SlideShare
What to Upload to SlideShare
 
How to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & TricksHow to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & Tricks
 
UX at York: starting small and scaling up (#nclxux)
UX at York: starting small and scaling up (#nclxux)UX at York: starting small and scaling up (#nclxux)
UX at York: starting small and scaling up (#nclxux)
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
 
What Makes Great Infographics
What Makes Great InfographicsWhat Makes Great Infographics
What Makes Great Infographics
 

Similaire à Startup Safary | Fight against robots with enbrite.ly data platform

Big dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlBig dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlKhanderao Kand
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandMaarten Balliauw
 
Web APIs & Apps - Mozilla
Web APIs & Apps - MozillaWeb APIs & Apps - Mozilla
Web APIs & Apps - MozillaRobert Nyman
 
NSA for Enterprises Log Analysis Use Cases
NSA for Enterprises   Log Analysis Use Cases NSA for Enterprises   Log Analysis Use Cases
NSA for Enterprises Log Analysis Use Cases WSO2
 
Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemYael Garten
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemShirshanka Das
 
Honeypots, Deception, and Frankenstein
Honeypots, Deception, and FrankensteinHoneypots, Deception, and Frankenstein
Honeypots, Deception, and FrankensteinPhillip Maddux
 
Serverless Swift for Mobile Developers
Serverless Swift for Mobile DevelopersServerless Swift for Mobile Developers
Serverless Swift for Mobile DevelopersAll Things Open
 
Bootstrapping an App for Launch
Bootstrapping an App for LaunchBootstrapping an App for Launch
Bootstrapping an App for LaunchCraig Phares
 
APIsecure 2023 - Learning from a decade of API breaches and why application-c...
APIsecure 2023 - Learning from a decade of API breaches and why application-c...APIsecure 2023 - Learning from a decade of API breaches and why application-c...
APIsecure 2023 - Learning from a decade of API breaches and why application-c...apidays
 
Outsmarting smartphones
Outsmarting smartphonesOutsmarting smartphones
Outsmarting smartphonesSensePost
 
[Hands-on] CQRS(Command Query Responsibility Segregation) 와 Event Sourcing 패턴 실습
[Hands-on] CQRS(Command Query Responsibility Segregation) 와 Event Sourcing 패턴 실습[Hands-on] CQRS(Command Query Responsibility Segregation) 와 Event Sourcing 패턴 실습
[Hands-on] CQRS(Command Query Responsibility Segregation) 와 Event Sourcing 패턴 실습Oracle Korea
 
CQRS and Event Sourcing
CQRS and Event Sourcing CQRS and Event Sourcing
CQRS and Event Sourcing Inho Kang
 
Data-Driven and User-Centric: Improving enterprise productivity and engagemen...
Data-Driven and User-Centric: Improving enterprise productivity and engagemen...Data-Driven and User-Centric: Improving enterprise productivity and engagemen...
Data-Driven and User-Centric: Improving enterprise productivity and engagemen...Windows Developer
 
So You Want a Job in Cybersecurity
So You Want a Job in CybersecuritySo You Want a Job in Cybersecurity
So You Want a Job in CybersecurityTeri Radichel
 
2016 IBM Watson IoT Forum
2016 IBM Watson IoT Forum2016 IBM Watson IoT Forum
2016 IBM Watson IoT ForumDeirdre Curran
 
2016 ibm watson io t forum 躍升雲端 敏捷打造物聯網平台
2016 ibm watson io t forum 躍升雲端 敏捷打造物聯網平台2016 ibm watson io t forum 躍升雲端 敏捷打造物聯網平台
2016 ibm watson io t forum 躍升雲端 敏捷打造物聯網平台Mike Chang
 
AnDevCon - Tracking User Behavior Creatively
AnDevCon - Tracking User Behavior CreativelyAnDevCon - Tracking User Behavior Creatively
AnDevCon - Tracking User Behavior CreativelyKiana Tennyson
 
viWave Study Group - Introduction to Google Android Development - Chapter 23 ...
viWave Study Group - Introduction to Google Android Development - Chapter 23 ...viWave Study Group - Introduction to Google Android Development - Chapter 23 ...
viWave Study Group - Introduction to Google Android Development - Chapter 23 ...Ted Chien
 

Similaire à Startup Safary | Fight against robots with enbrite.ly data platform (20)

Big dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlBig dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosql
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays Finland
 
Web APIs & Apps - Mozilla
Web APIs & Apps - MozillaWeb APIs & Apps - Mozilla
Web APIs & Apps - Mozilla
 
NSA for Enterprises Log Analysis Use Cases
NSA for Enterprises   Log Analysis Use Cases NSA for Enterprises   Log Analysis Use Cases
NSA for Enterprises Log Analysis Use Cases
 
Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystem
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
 
Honeypots, Deception, and Frankenstein
Honeypots, Deception, and FrankensteinHoneypots, Deception, and Frankenstein
Honeypots, Deception, and Frankenstein
 
Serverless Swift for Mobile Developers
Serverless Swift for Mobile DevelopersServerless Swift for Mobile Developers
Serverless Swift for Mobile Developers
 
Bootstrapping an App for Launch
Bootstrapping an App for LaunchBootstrapping an App for Launch
Bootstrapping an App for Launch
 
APIsecure 2023 - Learning from a decade of API breaches and why application-c...
APIsecure 2023 - Learning from a decade of API breaches and why application-c...APIsecure 2023 - Learning from a decade of API breaches and why application-c...
APIsecure 2023 - Learning from a decade of API breaches and why application-c...
 
Outsmarting smartphones
Outsmarting smartphonesOutsmarting smartphones
Outsmarting smartphones
 
[Hands-on] CQRS(Command Query Responsibility Segregation) 와 Event Sourcing 패턴 실습
[Hands-on] CQRS(Command Query Responsibility Segregation) 와 Event Sourcing 패턴 실습[Hands-on] CQRS(Command Query Responsibility Segregation) 와 Event Sourcing 패턴 실습
[Hands-on] CQRS(Command Query Responsibility Segregation) 와 Event Sourcing 패턴 실습
 
CQRS and Event Sourcing
CQRS and Event Sourcing CQRS and Event Sourcing
CQRS and Event Sourcing
 
Data-Driven and User-Centric: Improving enterprise productivity and engagemen...
Data-Driven and User-Centric: Improving enterprise productivity and engagemen...Data-Driven and User-Centric: Improving enterprise productivity and engagemen...
Data-Driven and User-Centric: Improving enterprise productivity and engagemen...
 
So You Want a Job in Cybersecurity
So You Want a Job in CybersecuritySo You Want a Job in Cybersecurity
So You Want a Job in Cybersecurity
 
cybersecurity-careers.pdf
cybersecurity-careers.pdfcybersecurity-careers.pdf
cybersecurity-careers.pdf
 
2016 IBM Watson IoT Forum
2016 IBM Watson IoT Forum2016 IBM Watson IoT Forum
2016 IBM Watson IoT Forum
 
2016 ibm watson io t forum 躍升雲端 敏捷打造物聯網平台
2016 ibm watson io t forum 躍升雲端 敏捷打造物聯網平台2016 ibm watson io t forum 躍升雲端 敏捷打造物聯網平台
2016 ibm watson io t forum 躍升雲端 敏捷打造物聯網平台
 
AnDevCon - Tracking User Behavior Creatively
AnDevCon - Tracking User Behavior CreativelyAnDevCon - Tracking User Behavior Creatively
AnDevCon - Tracking User Behavior Creatively
 
viWave Study Group - Introduction to Google Android Development - Chapter 23 ...
viWave Study Group - Introduction to Google Android Development - Chapter 23 ...viWave Study Group - Introduction to Google Android Development - Chapter 23 ...
viWave Study Group - Introduction to Google Android Development - Chapter 23 ...
 

Dernier

ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 

Dernier (20)

ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 

Startup Safary | Fight against robots with enbrite.ly data platform

  • 1. Fight against robots with enbrite.ly data platform Joe MÉSZÁROS
  • 2. Joe MÉSZÁROS lead software engineer @joemesz joemeszaros
  • 3. Who we are? Our vision is to revolutionize the KPIs and metrics the online advertisement industry currently using. With our products, Antifraud, Brandsafety and Viewability we provide actionable data to our customers.
  • 4. Ad display fraud (ad stacking, pixel stuffing) Ad viewability
  • 5. Brand safety Detecting traffic that comes from unwanted categories (e.g. adult), countries and single domains
  • 7.
  • 9. How we do? DATA PLATFORM ...so we need do analyze vast amount of data Infrastucture Big Data technologies + enbrite.ly data platform =
  • 10. Amazon Web Services (AWS) ● Most popular cloud service provider ● ~70 services, 13 geographical "regions" ● Amazon Big Data = Elastic Map Reduce ● BUT Do not trust the BIG guy (API problem) https://aws.amazon.com/
  • 11. Apache Hadoop ● de facto Big Data technology ● open source software ● distributed storage (HDFS) + data processing (MapReduce) ● ecosystem: many additional softwares http://hadoop.apache.org/ | https://github.com/apache/hadoop
  • 12. Apache Spark ● large-scale data processing engine ● open source software (popular) ● modules: core, sql, sreaming, graph, ML ● faster than Hadoop MapReduce http://spark.apache.org/ | https://github.com/apache/spark
  • 13. Data platform in numbers 20+ node cluster 16 services 110 servers 0.5 - 4 TB /day 100+ TB on S3
  • 14. How we do? DATA COLLECTION
  • 15. How we do? DATA PROCESSION
  • 16. Let me tell you a short story...
  • 17. Real world example You have a simple idea to detect bot traffic, which saves the world. Let’s implement it!
  • 18. Real world example THE IDEA: Analyse events which are too hasty and deviate from regular, humanlike profiles: too many clicks in a defined timeframe. INPUT: Collected events on Amazon S3 OUTPUT: Invalid sessions
  • 19. Step 1: sessionize events How to solve it? Step 2: detect too many clicks code: https://github.com/enbritely/startup-safary
  • 20. Step 1: event to session //configure Spark application //read events from HDFS JavaRDD<Event> events = lines.map(Converter::jsonToEvent); Application code : https://github.com/enbritely/startup-safary //configure Spark application //read events from HDFS JavaRDD<Event> events = lines.map(Converter::jsonToEvent); JavaRDD<Event> clicks = events.filter(e -> e.type.equals("click")); //configure Spark application //read events from HDFS JavaRDD<Event> events = lines.map(Converter::jsonToEvent); JavaRDD<Event> clicks = events.filter(e -> e.type.equals("click")); JavaPairRDD<String, List<Event>> grouped = clicks .groupBy(Event::sessionId); //configure Spark application //read events from HDFS JavaRDD<Event> events = lines.map(Converter::jsonToEvent); JavaRDD<Event> clicks = events.filter(e -> e.type.equals("click")); JavaPairRDD<String, List<Event>> grouped = clicks .groupBy(Event::sessionId); JavaRDD<Session> sessions = grouped.mapValues(sessionizer);
  • 21. Step 1: event to session //Sessionizer (Function<Iterable<Event>, Session>) unorderedEvents -> { List<Event> clickOrdered = sortyByTimestamp(unorderedEvents); Session session = new Session(sessionId); for (Event event: clickOrdered) { session.addClick(event.getTimestamp()); } return session; } Application code : https://github.com/enbritely/startup-safary
  • 22. Step 2: apply heuristic Application code : https://github.com/enbritely/startup-safary JavaRDD<String> badSessions = sessions .filter(s -> s.getClickCount() > threshold) .map(s -> s.sessionId + ":" + s.clickCount); // save output to HDFS
  • 23. Live demo! ● 4 node EMR (Hadoop) Cluster ● Apache Spark 1.6.1 ● 1 GB input events build app : create-cluster : events S3 -> HDFS : submit app
  • 24. Congratulation! MISSION COMPLETED YOU just saved the world with a simple idea within ~10 minutes.
  • 25. WE ARE HIRING! working @exPrezi office, K9 check out the company in Forbes :-) amazing company culture BUT the real reason ….
  • 26. WE ARE HIRING! … is our mood manager, Bigyó :)
  • 27. BEYOND enbrite.ly ...our investor and event sponsor is looking for talented guys
  • 28. Joe MÉSZÁROS lead software engineer joe@enbrite.ly @joemesz @enbritely joemeszaros enbritely THANK YOU! QUESTIONS?