H2O World - Data Science in Action @ 6sense - Viral Bajaria


Sri Ambati

H2O World 2015 - Viral Bajaria @ 6sense


Viral Bajaria, CTO & Co-Founder
@viralbajaria
@6senseInc
#h2oworld #6sense
#bestpresentationyet

Let’s start with a prediction
BEST TALK @ H2O WORLD!!

ONLY 1 FEATURE!
BEST TALK @ H2O WORLD
TOP FEATURE: SAYS MY MOM

PREDICTION ?
BEST TALK @ H2O WORLD
LOW PRECISION, LOW RECALL

6sense
WE FIND PROSPECTS THAT ARE IN MARKET TO BUY
WE ARE THE CENTRAL NERVOUS SYSTEM EMPOWERING ALL MARKETING, SALES AND BIZ OPERATIONS TEAMS
AS A TEAM, WE LIVE ON: DATA, STATISTICS AND BEER

about.me
CTO & CO-FOUNDER @ 6SENSE
EARLY HADOOP ADOPTER (LATE 2008)
3B+ EVENTS PER DAY
FUN FACT: Used a sledgehammer to unrack my first Hadoop cluster

Problem
Predict who is in-market to buy!!
e.g., Company XYZ is 90% likely to buy routers in the next 90 days.
What kind of data do we need? A lot!

Data Needs
1st Party:
- Web (e.g., Apache logs)
- Marketing Automation (e.g., Eloqua)
- CRM (e.g., Salesforce)
6sense Data Network:
- Publishers
- Ads
- Blogs

Insights
Research patterns are different for different products
- Expensive routers
- Freemium cloud services
- Open source tools (think H2O)

Data Science Needs
Need to build different models for each product
Plus, we don’t like to make our lives easy :)
- Where’s the fun in easy?
- Need to build 4 models per product

Processing Pipeline
(diagram labels, in slide order)
- Web
- Identify Companies
- Identify Contacts
- Customer Contacts
- Sales
- Normalize Companies
- Custom Data Set
- Make Consistent
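
The talk does not say how "Normalize Companies" works. As one possible reading, here is a minimal sketch that maps spelling variants of a company name to a single key. The function name and the suffix list are hypothetical, and a real pipeline would need far more rules.

```python
import re

# Hypothetical list of legal suffixes to drop; not taken from the talk.
_SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "co"}


def normalize_company(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while tokens and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


# Two spellings of the same (made-up) company collapse to one key.
variants = ["Acme Corp", "ACME Co., Ltd."]
keys = {normalize_company(v) for v in variants}
```

With a key like this, web, CRM and marketing-automation records for the same company can be joined before modeling.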

Modeling
Baseline Model
Model Stats
Predictive Model
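
The slide pairs a baseline model with a predictive model and compares their stats. A minimal sketch of such a comparison follows. It assumes (the talk does not say this) that the baseline ranks accounts by one raw signal and that the stat is precision among the top-k ranked accounts. All numbers are invented for illustration.

```python
def top_k_precision(scores, labels, k):
    """Fraction of true buyers among the k highest-scored accounts."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


# Invented data: 1 means the account actually bought.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
visits = [5, 9, 7, 1, 8, 2, 3, 1, 0, 4]  # hypothetical baseline signal
model_scores = [0.9, 0.2, 0.8, 0.1, 0.3, 0.7, 0.2, 0.1, 0.05, 0.4]

baseline_stat = top_k_precision(visits, labels, k=3)
model_stat = top_k_precision(model_scores, labels, k=3)
```

A predictive model is only worth promoting if its stat clearly beats the baseline on the same data.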

Modeling
Scikit-Learn or H2O
Output types: pickle files or POJO
Script to promote model to production
Puts all artifacts used in S3
e.g., data, stats, queries
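
The promotion script itself is not shown in the talk. The sketch below illustrates the idea of bundling a pickled model with the artifacts used to build it. A local directory stands in for the S3 bucket, and `promote_model`, the file names and the manifest layout are all assumptions.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def promote_model(model, artifacts: dict, dest: Path, name: str) -> Path:
    """Write the pickled model, its artifacts and a manifest under dest/name."""
    target = dest / name
    target.mkdir(parents=True, exist_ok=True)
    blob = pickle.dumps(model)
    (target / "model.pkl").write_bytes(blob)
    for filename, text in artifacts.items():
        (target / filename).write_text(text)
    manifest = {
        "name": name,
        "sha256": hashlib.sha256(blob).hexdigest(),  # lets prod verify the binary
        "files": sorted(["model.pkl", *artifacts]),
    }
    (target / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return target


# Stand-in "model" (a plain dict) and a temp dir in place of a bucket.
with tempfile.TemporaryDirectory() as tmp:
    out = promote_model(
        {"weights": [0.1, 0.2]},
        {"stats.json": "{}", "query.sql": "SELECT 1"},
        Path(tmp),
        "routers_v1",
    )
    promoted_files = sorted(p.name for p in out.iterdir())
```

Keeping data, stats and queries next to the binary makes it possible to reproduce or audit a production model later.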

Modeling
Model Info
• Name
• Type
• Binary Location
• Active
• ……..
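
The four fields named on the slide could be held in a small record like the one below. This is a sketch only: the field types, the type labels and the example locations are assumptions, and the fields elided on the slide are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelInfo:
    name: str
    model_type: str       # e.g. "sklearn-pickle" or "h2o-pojo" (labels are assumptions)
    binary_location: str  # where the serialized model lives
    active: bool          # whether the production pipeline should use it


def active_models(registry: List[ModelInfo]) -> List[ModelInfo]:
    """Return only the models flagged for production use."""
    return [m for m in registry if m.active]


# Hypothetical registry entries; bucket and paths are placeholders.
registry = [
    ModelInfo("routers_v1", "sklearn-pickle", "s3://example-bucket/routers_v1/model.pkl", False),
    ModelInfo("routers_v2", "h2o-pojo", "s3://example-bucket/routers_v2/model.java", True),
]
live = [m.name for m in active_models(registry)]
```

Flipping `active` is then enough to switch which model the pipeline loads, without redeploying code.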

Multiple Models for same prediction
Model 1 Model Stats
Continue Prod Pipeline
Model 2 Model Stats
Model 3 Model Stats

Experimental Modeling
Same pipeline as before…
Output written to temporary tables
Use templating to switch settings at runtime
Stats compared to production runs
- top decile
- raw data for top-100 items
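
A minimal sketch of the two ideas on this slide: a query template whose output table is switched at runtime, and a top-decile comparison between a production run and an experimental run. The table names, the query and the overlap metric are assumptions rather than details from the talk.

```python
from string import Template

# Hypothetical query; only the table names change between prod and experiment.
QUERY = Template("INSERT INTO $output_table SELECT account_id, score FROM $scores_table")


def render(experimental: bool) -> str:
    """Render the query, writing to a temporary table for experimental runs."""
    settings = {
        "output_table": "tmp_predictions" if experimental else "predictions",
        "scores_table": "model_scores",
    }
    return QUERY.substitute(settings)


def top_decile(scores: dict) -> set:
    """Account ids in the highest-scoring 10% (at least one account)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[: max(1, len(ranked) // 10)])


def top_decile_overlap(prod: dict, experiment: dict) -> float:
    """Share of the production top decile that the experiment also ranks in its top decile."""
    a, b = top_decile(prod), top_decile(experiment)
    return len(a & b) / len(a)


# Invented scores for 20 accounts; the experiment reshuffles two of them.
prod_scores = {f"acct{i}": float(i) for i in range(20)}
exp_scores = {**prod_scores, "acct18": 0.5, "acct5": 30.0}
overlap = top_decile_overlap(prod_scores, exp_scores)
```

A low overlap is a cue to look at the raw data for the top-ranked items before promoting the experimental model.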

Tech Stack
Platform: AWS
Backend: Hadoop, Hive, Presto, Redshift… and a lot more
ML: H2O, Scikit-Learn
Ops: Fabric, Mesos, Docker, Marathon and home-grown tools

THANK YOU!
VIRAL BAJARIA, CTO & CO-FOUNDER
viral@6sense.com
@viralbajaria
@6senseInc


H2O World - Data Science in Action @ 6sense - Viral Bajaria

1. Viral Bajaria, CTO & Co-Founder @viralbajaria @6senseInc #h2oworld #6sense #bestpresentationyet

2. Let’s start with a prediction: BEST TALK @ H2O WORLD!!

3. ONLY 1 FEATURE! BEST TALK @ H2O WORLD TOP FEATURE: SAYS MY MOM

4. PREDICTION ? BEST TALK @ H2O WORLD LOW PRECISION, LOW RECALL

5. 6sense: WE FIND PROSPECTS THAT ARE IN MARKET TO BUY. WE ARE THE CENTRAL NERVOUS SYSTEM EMPOWERING ALL MARKETING, SALES AND BIZ OPERATIONS TEAMS. AS A TEAM, WE LIVE ON: DATA, STATISTICS AND BEER

6. about.me: CTO & CO-FOUNDER @ 6SENSE. EARLY HADOOP ADOPTER (LATE 2008). 3B+ EVENTS PER DAY. FUN FACT: Used a sledgehammer to unrack my first Hadoop cluster

7. Problem: Predict who is in-market to buy!! e.g., Company XYZ is 90% likely to buy routers in the next 90 days. What kind of data do we need? A lot!

8. Data Needs. 1st Party: Web (e.g., Apache logs), Marketing Automation (e.g., Eloqua), CRM (e.g., Salesforce). 6sense Data Network: Publishers, Ads, Blogs

9. Insights: Research patterns are different for different products - Expensive routers - Freemium cloud services - Open source tools (think H2O)

10. Data Science Needs: Need to build different models for each product. Plus, we don’t like to make our lives easy :) - Where’s the fun in easy? - Need to build 4 models per product

11. Data Science Needs: Need to build different models for each product. Plus, we don’t like to make our lives easy :) - Where’s the fun in easy? - Need to build 4 models per product. 100’S OF MODELS IN PROD

12. Data Sync Pipeline

13. Data Sync Pipeline

14. Pre Processing Pipeline MOST IMPORTANT

15. Processing Pipeline: Web, Identify Companies, Identify Contacts, Customer Contacts, Sales, Normalize Companies, Custom Data Set, Make Consistent

16. Modeling: Baseline Model, Model Stats, Predictive Model

17. Modeling: Scikit-Learn or H2O. Output types: pickle files or POJO. Script to promote model to production. Puts all artifacts used in S3, e.g., data, stats, queries

18. Modeling: Model Info • Name • Type • Binary Location • Active • ……..

19. Multiple Models for same prediction: Model 1 Model Stats, Model 2 Model Stats, Model 3 Model Stats. Continue Prod Pipeline

20. Experimental Modeling: Same pipeline as before… Output written to temporary tables. Use templating to switch settings at runtime. Stats compared to production runs: top decile, raw data for top-100 items

21. Tech Stack. Platform: AWS. Backend: Hadoop, Hive, Presto, Redshift… and a lot more. ML: H2O, Scikit-Learn. Ops: Fabric, Mesos, Docker, Marathon and home-grown tools

22. Questions ?? JOBS@6SENSE.COM

23. THANK YOU! VIRAL BAJARIA, CTO & CO-FOUNDER viral@6sense.com @viralbajaria @6senseInc