5. WE FIND PROSPECTS THAT ARE IN-MARKET TO BUY
WE ARE THE CENTRAL NERVOUS SYSTEM EMPOWERING ALL MARKETING, SALES AND BIZ OPERATIONS TEAMS
6sense
AS A TEAM, WE LIVE ON: DATA, STATISTICS AND BEER
6. CTO & CO-FOUNDER @ 6SENSE
EARLY HADOOP ADOPTER (LATE 2008)
3B+ EVENTS PER DAY
FUN FACT: Used a sledgehammer to unrack my first Hadoop cluster
7. Problem
Predict who is in-market to buy!
e.g., Company XYZ is 90% likely to buy routers in the next 90 days.
What kind of data do we need? A lot!
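The slides do not say how the training target is defined. One common way to frame "will buy in the next 90 days" is a time-windowed label computed per company at a snapshot date; a classifier trained on such labels then outputs the kind of probability quoted above. A minimal stdlib sketch, assuming purchase dates per company are available; the function names and the data are hypothetical illustrations, not 6sense's actual pipeline.

```python
from datetime import date, timedelta

# Hypothetical horizon mirroring the slide's "next 90 days" example.
HORIZON_DAYS = 90


def in_market_label(snapshot, purchase_dates, horizon_days=HORIZON_DAYS):
    """Return 1 if any purchase falls in (snapshot, snapshot + horizon_days], else 0."""
    window_end = snapshot + timedelta(days=horizon_days)
    return int(any(snapshot < d <= window_end for d in purchase_dates))


def build_labels(snapshot, purchases_by_company, horizon_days=HORIZON_DAYS):
    """Label every company as in-market (1) or not (0) relative to one snapshot date."""
    return {
        company: in_market_label(snapshot, dates, horizon_days)
        for company, dates in purchases_by_company.items()
    }


if __name__ == "__main__":
    snapshot = date(2016, 1, 1)
    # Made-up illustration data, not real purchases.
    purchases = {
        "Company XYZ": [date(2016, 2, 15)],  # 45 days after the snapshot
        "Company ABC": [date(2016, 6, 1)],   # outside the 90-day window
        "Company DEF": [],                   # no purchase at all
    }
    print(build_labels(snapshot, purchases))
    # → {'Company XYZ': 1, 'Company ABC': 0, 'Company DEF': 0}
```

Features for each company would be drawn only from events before the snapshot date, so the model never sees the purchase it is being asked to predict.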
9. Insights
Research patterns are different for different products:
- Expensive routers
- Freemium cloud services
- Open-source tools (think H2O)
10. Data Science Needs
Need to build different models for each product
Plus, we don’t like to make our lives easy :)
- Where’s the fun in easy?
- Need to build 4 models per product
100s OF MODELS IN PROD
17. Modeling
Scikit-Learn or H2O
Output types: pickle files or POJO
Script to promote model to production
Stores all artifacts it used in S3
e.g., data, stats, queries
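The promotion script itself is not shown. A minimal sketch of the idea, assuming a local directory stands in for the S3 bucket so the example runs anywhere; with boto3, each file would instead be sent through the S3 client's `upload_file(Filename, Bucket, Key)`. The directory layout, the `PRODUCTION` pointer file and all function names are hypothetical.

```python
import json
import pickle
import shutil
import tempfile
from pathlib import Path


def promote_model(model, product, version, artifacts, store_root):
    """Store a pickled model and the artifacts used to build it under one versioned prefix.

    store_root is a local directory standing in for an S3 bucket (assumption).
    artifacts maps a label such as "queries" to a local file path.
    """
    base = Path(store_root) / "models" / product
    prefix = base / version
    prefix.mkdir(parents=True, exist_ok=True)

    with open(prefix / "model.pkl", "wb") as f:
        pickle.dump(model, f)

    manifest = {"product": product, "version": version, "artifacts": {}}
    for label, src in artifacts.items():
        dest = prefix / Path(src).name
        shutil.copy2(src, dest)
        manifest["artifacts"][label] = dest.name
    (prefix / "manifest.json").write_text(json.dumps(manifest, indent=2, sort_keys=True))

    # Move the production pointer only after every artifact has been written.
    (base / "PRODUCTION").write_text(version)
    return prefix


def load_production_model(store_root, product):
    """Unpickle whichever version the PRODUCTION pointer names (trusted storage only)."""
    base = Path(store_root) / "models" / product
    version = (base / "PRODUCTION").read_text().strip()
    with open(base / version / "model.pkl", "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        query = Path(tmp) / "training_query.sql"
        query.write_text("-- query used to pull training data\n")
        model = {"weights": [0.4, -1.2]}  # any picklable object stands in for a model
        store = Path(tmp) / "store"
        promote_model(model, "routers", "v1", {"queries": query}, store)
        print(load_production_model(store, "routers"))
```

Keeping the data, stats and queries next to the model under one versioned prefix means any production model can be traced back to exactly what produced it. Because `pickle.load` can execute arbitrary code, models should only be loaded from storage you control.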
19. Multiple Models for same prediction
[Diagram: Model 1, Model 2, Model 3, each with its Model Stats → Continue Prod Pipeline]
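The diagram suggests several models scored for the same prediction, each producing its own stats before the production pipeline continues. A stdlib sketch, assuming one model is designated as production and the others are scored alongside it only so their stats can be compared; the stat names, the 0.5 threshold and the function names are hypothetical.

```python
from statistics import mean


def model_stats(scores, threshold=0.5):
    """Summarize one model's scores: count, mean, and share at or above a threshold."""
    values = list(scores.values())
    if not values:
        return {"n": 0, "mean_score": None, "share_above_threshold": None}
    return {
        "n": len(values),
        "mean_score": mean(values),
        "share_above_threshold": sum(v >= threshold for v in values) / len(values),
    }


def run_models(models, accounts, production_name):
    """Score every account with every model; return the production scores plus all stats."""
    outputs, stats = {}, {}
    for name, score_fn in models.items():
        scores = {acct: score_fn(features) for acct, features in accounts.items()}
        outputs[name] = scores
        stats[name] = model_stats(scores)
    # Only the production model's output continues down the pipeline.
    return outputs[production_name], stats


if __name__ == "__main__":
    accounts = {"acme": {"visits": 10}, "globex": {"visits": 2}}
    models = {
        "model_1": lambda f: min(1.0, f["visits"] / 10),
        "model_2": lambda f: 0.5,
        "model_3": lambda f: 0.1,
    }
    prod_scores, stats = run_models(models, accounts, "model_1")
    print(prod_scores)  # → {'acme': 1.0, 'globex': 0.2}
    for name, s in stats.items():
        print(name, s)
```

Recording stats for every candidate on the same inputs makes the models directly comparable without changing what production consumes.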
20. Experimental Modeling
Same pipeline as before…
Output written to temporary tables
Use templating to switch settings at runtime
Stats compared to production runs:
- top decile
- raw data for top-100 items
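Neither the template engine nor the exact comparison stats are named. A minimal sketch, assuming Python's `string.Template` for the runtime switch between production and temporary tables, and reading "top decile" as the overlap between the production and experimental top 10% of accounts; the table names, query, settings keys and helpers are all hypothetical.

```python
from string import Template

# Hypothetical HiveQL template; only the settings differ between runs.
# Values come from trusted internal config, never from user input.
QUERY = Template(
    "INSERT OVERWRITE TABLE $output_table\n"
    "SELECT account_id, score FROM $scores_table WHERE model = '$model'"
)

SETTINGS = {
    "prod": {"output_table": "predictions", "scores_table": "scores"},
    "experiment": {"output_table": "tmp_predictions_exp", "scores_table": "tmp_scores_exp"},
}


def render_query(mode, model):
    """Fill the template with the settings for the chosen run mode."""
    return QUERY.substitute(SETTINGS[mode], model=model)


def rank(scores):
    """Account ids sorted by descending score (ties broken by id for determinism)."""
    return sorted(scores, key=lambda acct: (-scores[acct], acct))


def top_decile(scores):
    """Top 10% of accounts by score, rounding up so at least one is kept."""
    k = max(1, -(-len(scores) // 10))  # integer ceil(n / 10), avoids float error
    return set(rank(scores)[:k])


def compare_runs(prod, exp, raw_n=100):
    """Top-decile overlap plus raw side-by-side rows for the experiment's top raw_n accounts."""
    prod_top, exp_top = top_decile(prod), top_decile(exp)
    overlap = len(prod_top & exp_top) / len(prod_top) if prod_top else 0.0
    raw = [(acct, prod.get(acct), exp[acct]) for acct in rank(exp)[:raw_n]]
    return {"top_decile_overlap": overlap, "raw_top": raw}


if __name__ == "__main__":
    print(render_query("experiment", "routers_v2"))
    prod = {f"acct{i:02d}": i / 20 for i in range(20)}
    exp = dict(prod, acct00=5.0)  # the experiment promotes one low-ranked account
    print(compare_runs(prod, exp, raw_n=3))
```

The overlap gives a single number to track across runs, while the raw top-N rows let a person eyeball which accounts moved and why.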
21. Tech Stack
Platform: AWS
Backend: Hadoop, Hive, Presto, Redshift… and a lot more
ML: H2O, Scikit-Learn
Ops: Fabric, Mesos, Docker, Marathon and home-grown tools