Between traditional Business Intelligence and "Big Data" approaches, many companies need to innovate and work in a hybrid manner. How, and with what tools, can business and technical profiles collaborate productively? Florian Douetteau, Dataiku's CEO, answers these questions.
2. Hi! I’m Florian, CEO of Dataiku, maker of Data Science Studio, the « Photoshop for Data Science »
Community Edition (it’s free): http://www.dataiku.com/dss/trynow/
React on Twitter: @fdouetteau #BigDataParis
15. M LIKE METRICS
How much does it cost to produce and maintain a metric?
How many metrics do I need?
Am I following the right metrics?
Do I have enough data?
16. MORE METRICS MEANS MORE MEANS
• Self-Service: build your own metrics
• Analytical Capabilities: find your patterns
• Large Volume: store it all
17. MORE METRICS MEANS MORE APPLICATIONS
Axes: Mission Critical vs. Sheer Curiosity; Small, Structured vs. Large, Diverse
Chart labels: Classic BI, Data Mining, Data Exploration, Large Production Platform, Optimization
Examples:
• Reporting for finance in any industry
• Analyze each tweet
• Web navigation for e-merchants
• Ticket data for discounts in retail
• Phone call logs for security
• RTB data for advertising
• Customer consumption for anti-churn in utilities
• Filings for fraud in insurance
18. TODAY EACH HAS ITS OWN STORE
Axes: Mission Critical vs. Sheer Curiosity; Small, Structured vs. Large, Diverse
Chart labels: Classic BI, Data Mining, Data Exploration, Large Production Platform, Optimization
Stores:
• Data warehousing
• Data mining repositories
• Data lake
• Google-like platform
19. It’s not just about the metrics
25. PREDICTIVE MAIN COMFORT ZONE
Axes: Mission Critical vs. Sheer Curiosity; Small, Structured vs. Large, Diverse
Chart label: Optimization
Examples:
• Reporting for finance in any industry
• Analyze each tweet
• Web navigation for e-merchants
• Ticket data for discounts in retail
• Phone call logs for security
• RTB data for advertising
• Customer consumption for anti-churn in utilities
• Filings for fraud in insurance
Annotations:
• Not enough data to learn from?
• Not enough "hard" examples to learn from
27. Welcome to Technoslavia
Regions on the map:
• Machine Learning Mystery Land
• Scalability Central
• SQL Columnar Republic
• Visualization County
• Data Clean Wasteland
• Statistician Old House
• Real-time Island
• NoSQL Nihiland
Tools on the map: Hadoop, Ceph, Sphere, Cassandra, Kafka, Flume, Spark, Scikit-Learn, GraphLab, prediction.io, jubatus, Mahout, WEKA, MLBase, LibSVM, RapidMiner, Pandas, Kibana, InfiniDB, Drill, Spark SQL, Hive, Impala, …, Elasticsearch, Solr, MongoDB, Riak, Membase, Pig, Cascading, Talend, R, Storm
28. Embrace Many Skills Many-Sets
Roles: Data Plumber, BI Manager, Data Scientist, Data Waiter, Data Cleaner, Business Analyst
Chart labels: Real Job, Dream Job
29. HOW CAN WE IMPROVE THE RELEVANCE OF OUR ANSWERS BY ANALYZING USER BEHAVIOR?
Signals:
• Search reformulation
• No answer
• Click on a professional
• Top search
• Navigation click or filter
Figures on the slide:
• >200M searches
• 20M
• 1.4M queries with >10 occurrences
• 0.5M prioritized queries
• Analysis & corrections, automation
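The funnel on this slide (count how often each query occurs, keep the frequent ones, then prioritize them using behavioral signals) can be sketched in a few lines of Python. This is only an illustration, not the method used in the project: the log format, the signal names (`reformulation`, `no_answer`, `click_pro`), the choice of "failure" signals, and the ranking rule are all assumptions. The toy example uses a threshold of 2, whereas the slide mentions more than 10 occurrences.

```python
from collections import Counter

# Hypothetical log rows: (normalized query, behavioral signal).
LOG = [
    ("plombier paris", "click_pro"),
    ("plombier paris", "click_pro"),
    ("plombier paris", "reformulation"),
    ("plomberie pari", "no_answer"),
    ("plomberie pari", "no_answer"),
    ("plomberie pari", "reformulation"),
    ("taxi", "click_pro"),
]

# Assumption: these two signals indicate that the answer was not relevant.
FAILURE_SIGNALS = {"reformulation", "no_answer"}


def prioritize(log, min_occurrences):
    """Keep queries seen more than min_occurrences times, worst first."""
    totals = Counter(query for query, _ in log)
    failures = Counter(q for q, signal in log if signal in FAILURE_SIGNALS)
    frequent = [q for q, n in totals.items() if n > min_occurrences]
    # Rank by share of occurrences that ended in a failure signal.
    return sorted(frequent, key=lambda q: failures[q] / totals[q], reverse=True)


prioritized = prioritize(LOG, min_occurrences=2)
```

Ranking by failure share is one simple way to decide which frequent queries to analyze and correct first; a real pipeline would run the same counting at scale on the full search logs.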
31. Optimizing Last Mile with Data Science Studio
• Historical delivery and retrieval data
• Cleaning and temporal enrichment of data
• Data aggregation by geographic location
• Modeling of a score for each delivery
• Incorporation of new deliveries to the existing model
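The steps listed on this slide (clean and enrich historical deliveries, aggregate by location, score each delivery, then fold new deliveries into the existing model) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for the example: the record layout, the zone codes, the `ZoneScoreModel` class, and the use of a smoothed success rate as the score. It does not describe how Data Science Studio or the actual project implemented the model.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical history: (timestamp, zone, delivered on first attempt).
HISTORY = [
    ("2014-03-03 09:15", "75011", True),
    ("2014-03-03 18:40", "75011", False),
    ("2014-03-04 10:05", "75011", True),
    ("2014-03-04 11:30", "92100", True),
    ("bad-timestamp", "92100", False),
]


def clean_and_enrich(rows):
    """Drop rows with unparseable timestamps; add hour and weekday fields."""
    for ts, zone, ok in rows:
        try:
            when = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        except ValueError:
            continue  # cleaning step: skip malformed records
        yield {"zone": zone, "hour": when.hour,
               "weekday": when.weekday(), "ok": ok}


class ZoneScoreModel:
    """Aggregates outcomes per zone and scores deliveries by zone."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])  # zone -> [successes, total]

    def update(self, rows):
        """Incorporate deliveries (historical or new) into the aggregates."""
        for row in rows:
            counts = self.counts[row["zone"]]
            counts[0] += int(row["ok"])
            counts[1] += 1

    def score(self, zone):
        """Laplace-smoothed success rate; 0.5 for a zone never seen."""
        successes, total = self.counts.get(zone, (0, 0))
        return (successes + 1) / (total + 2)


model = ZoneScoreModel()
model.update(clean_and_enrich(HISTORY))
```

New deliveries go through the same `clean_and_enrich` step and a further `model.update(...)` call, so the model is extended rather than rebuilt. The `hour` and `weekday` fields are not used by this toy score; they stand in for the temporal enrichment a richer model would exploit.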
32. EXPLORE NEW WORLDS
Axes: Mission Critical vs. Sheer Curiosity; Small, Structured vs. Large, Diverse
Chart labels: Optimization, Analytics, Predictive, Self Service, Cluster
• Optimize existing BI capabilities
• Build mandatory large volume capabilities
• Explore potential
• Danger zone: not being relevant