SlideShare une entreprise Scribd logo
1  sur  19
Télécharger pour lire hors ligne
Knowledge Discovery
& Data Mining
22nd ACM SIGKDD 2016
André Karpištšenko
~80 sessions for 2,700 participants
• Business Applications and Frameworks at Scale
• Data Streams Mining
• DashOpt features
• Outlier Detection
• Bayesian Optimization
• Deep Learning
• Investing into AI and Data
• Bonus keywords
88 countries, 35% YoY, 15-20% acceptance
Business Application Examples
• Consumer Internet focus: Content Ranking,
Recommendation, User Intent and Context Prediction
• Industrial Internet focus: Autonomy, Predictive
Maintenance, Operational Intelligence, Production Planning
• B2B focus: Targeting, Lead Generation, Sales Development,
Opportunity Management, Account Management
• Web content analytics: Image, Video, Text Classification for
Relevance, Products Categorization, Sentiments
• Other: Cyber Security, Fraud/Spam Detection, NLP, Speech
Recognition, Image/Video Recognition
Predictive Modeling Flow
DashOpt
Feature
Engineering
Raw
Data
Raw
Features
Labels
Feature
Integration
Features
with Labels
Data
Partitioning
Training
Data
Validation
Data
Testing
Data
Model Training
Evaluate for
model selection
Compute offline
evaluation metrics
Best model
Offline scoring
and indexing
Online/offline
systems
Online A/B test
Label
preparation Log data
Scoring
features
Raw features
Feature
integration
Model
Performance
Test Results
Data Technologies
• Most common: HDFS, MapReduce, Spark, Hive
• Decision spectrum: build, assemble, buy
• Factors: network, open-source, maturity, needs
Exploratory Production
Classic ML Platform at Scale
Workflows
HDFS Feature Mart Ground Truth Models Scores
MapReduce / Yarn
Workflow Scheduler & Manager
Workflows Workflows
Intelligence Engine
Metadata
Store
Pig, Hive,
Python, Scala,
Shell script, …
Feature Engineering
Libraries
Machine Learning
Libraries
Drivers
Drivers
Application 1 Application 2 Application N
…
http://github.com/szilard/benchm-ml
Baseline Performance Benchmark
https://sites.google.com/site/iotminingtutorial/
IoT Data Streams Mining
• 10 billion devices in 2015 -> 34 billion devices by 2020
• Continuous data, dynamic models, distributed, few seconds
Streams Mining: Actors Model
Data processing pipeline Distributed processing
Kappa Architecture
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
Outlier Detection
• Single point anomaly detection: likelihood over distribution
• Finding anomalous groups: divergence estimation
• Methods: percentage change, T-test, Chi-square test, Generalized ESD (Extreme
Studentized Deviate) test, Seasonal Hybrid ESD, etc.
• Goal: move from detection to automated response
Outlier Detection in Practice
• Too many detections of too little value
• Use methods for thresholds
• Breakout detection and Concept Drift
• For changing distributions move baselines over time
• Risk of overfitting to known anomalies, not finding unknown anomalies
Bayesian aka Active Optimization
• Examples: Design of Experiments, hyper-parameters of
supervised learning, algorithms tested with simulations
f is an unknown expensive black-box function with the goal to
approximately optimize f with as few experiments as possible
• No free lunch theorem
• Other bio-inspired
algorithms for
optimization exploitation
and exploration: neural
networks, genetic
algorithms, swarm
intelligence, ant colony
optimisation, etc.
Bayesian Optimization in Practice
• SigOpt experience: 20 dimensions, above human capacity.
• Uber ATC experience: scaling active optimization to high
dimensions as the default works reliably for 5-7 dim.
• Variables are added during optimization.
• Choose fidelity using heuristics.
Deep Learning
Deep Learning
• Compute power, GPU, learning architectures and a lot of
labeled data are what drive DL
• Applied for Vision and Speech: matches human performance
• Not possible where experiments are costly: biotech
• Kaggle winners are not DL models: tree ensembles, SVMs
• Common technologies: TensorFlow, Caffe, Theano, Keras
• Thousands of pieces of software: modules and layers
• Explainability and interpretability are the next big things
• EU regulation. Tradeoff: accuracy vs explainability.
Deep Learning Trends
• Vision nets are deeper and structured (Larsson 2016)
• Language nets have also dynamics, memory and attention
(Rocktaschel 2016, Miller 2016)
• Probabilistic programming (Lake, Tenenbaum)
• Programs as networks (Riedel)
• The Neural Programmer and Interpreter for learning programs
(Reed et al 2016)
• Computation graphs interacting with memory
• Loop for reasoning for nested questions (Miller 2016)
• Generative adversarial networks (Reed 2016). Models capable of
imagining images, videos and text.
Investing into AI and Data
• Data acquisition, real-time detection and visualization not solved yet.
• Empower more people to do data science. Automate routines.
• Unsolved problems are learning from unlabeled data, planning,
reasoning, problem solving, concept formulation, 1/10k compute.
• Key decisions: Timing, accuracy in what is hard, find verticals and
focus, identify differentiation & size of the prize & people & partners
Business of outliers: 1% capital returns 526x, 48% returns 0
0
450
900
Q2'15 Q3'15 Q4'15 Q1'16 Q2'16
Peak Data
Peak ML
VC assessment:
Bonus Keywords
• Lifelong Machine Learning: systems approach, transfer learning,
never-ending learners. Useful for knowledge build.
• Graphons: graph convergence and limits through infinite number of
vertices. Useful for privacy preserving mining.
• Computational Social Science: how individuals interact to produce
collective behaviour. Individuals exert more effort by themselves than
groups.
• Information Security: trusted key management is most sensitive.
Secret must be changed frequently. Confidentiality easier to violate
than authenticity. Integrity. Offence more lucrative than defence.
• Enterprise Data: in reality “random data salad” prone to constant
change due to M&A, politics, dynamic schema DBs (e.g. Mongo), legacy
burden, restructuring, leadership changes, data hoarding. Machine
driven, human guided processes required.
Thinking about Value from Data Science

Contenu connexe

Tendances

Production machine learning_infrastructure
Production machine learning_infrastructureProduction machine learning_infrastructure
Production machine learning_infrastructurejoshwills
 
Getting Started With Dato - August 2015
Getting Started With Dato - August 2015Getting Started With Dato - August 2015
Getting Started With Dato - August 2015Turi, Inc.
 
Introduction to Azure Machine Learning
Introduction to Azure Machine LearningIntroduction to Azure Machine Learning
Introduction to Azure Machine LearningPaul Prae
 
Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create PyData
 
Dealing with uncertainty in fintech using AI
Dealing with uncertainty in fintech using AIDealing with uncertainty in fintech using AI
Dealing with uncertainty in fintech using AIData Products Meetup
 
Predicting Medical Test Results using Driverless AI
Predicting Medical Test Results using Driverless AIPredicting Medical Test Results using Driverless AI
Predicting Medical Test Results using Driverless AISri Ambati
 
Azure Machine Learning 101
Azure Machine Learning 101Azure Machine Learning 101
Azure Machine Learning 101Andrew Badera
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?Ivo Andreev
 
Building Personalized Data Products with Dato
Building Personalized Data Products with DatoBuilding Personalized Data Products with Dato
Building Personalized Data Products with DatoTuri, Inc.
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDatabricks
 
H2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in PythonH2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in PythonSri Ambati
 
Real-time Recommendations for Retail: Architecture, Algorithms, and Design
Real-time Recommendations for Retail: Architecture, Algorithms, and DesignReal-time Recommendations for Retail: Architecture, Algorithms, and Design
Real-time Recommendations for Retail: Architecture, Algorithms, and DesignJuliet Hougland
 
The path to be a data scientist
The path to be a data scientistThe path to be a data scientist
The path to be a data scientistPoo Kuan Hoong
 
Machine Learning with GraphLab Create
Machine Learning with GraphLab CreateMachine Learning with GraphLab Create
Machine Learning with GraphLab CreateTuri, Inc.
 
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...Spark Summit
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnSarah Guido
 
Machine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive ServicesMachine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive ServicesTuri, Inc.
 
Introduction to Azure machine learning
Introduction to Azure machine learningIntroduction to Azure machine learning
Introduction to Azure machine learningJasjit Chopra
 
Teaching the cloud to think
Teaching the cloud to thinkTeaching the cloud to think
Teaching the cloud to thinkJosh Gillespie
 

Tendances (20)

Production machine learning_infrastructure
Production machine learning_infrastructureProduction machine learning_infrastructure
Production machine learning_infrastructure
 
Getting Started With Dato - August 2015
Getting Started With Dato - August 2015Getting Started With Dato - August 2015
Getting Started With Dato - August 2015
 
Introduction to Azure Machine Learning
Introduction to Azure Machine LearningIntroduction to Azure Machine Learning
Introduction to Azure Machine Learning
 
Architecting for Data Science
Architecting for Data ScienceArchitecting for Data Science
Architecting for Data Science
 
Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create
 
Dealing with uncertainty in fintech using AI
Dealing with uncertainty in fintech using AIDealing with uncertainty in fintech using AI
Dealing with uncertainty in fintech using AI
 
Predicting Medical Test Results using Driverless AI
Predicting Medical Test Results using Driverless AIPredicting Medical Test Results using Driverless AI
Predicting Medical Test Results using Driverless AI
 
Azure Machine Learning 101
Azure Machine Learning 101Azure Machine Learning 101
Azure Machine Learning 101
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?
 
Building Personalized Data Products with Dato
Building Personalized Data Products with DatoBuilding Personalized Data Products with Dato
Building Personalized Data Products with Dato
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
 
H2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in PythonH2O for Medicine and Intro to H2O in Python
H2O for Medicine and Intro to H2O in Python
 
Real-time Recommendations for Retail: Architecture, Algorithms, and Design
Real-time Recommendations for Retail: Architecture, Algorithms, and DesignReal-time Recommendations for Retail: Architecture, Algorithms, and Design
Real-time Recommendations for Retail: Architecture, Algorithms, and Design
 
The path to be a data scientist
The path to be a data scientistThe path to be a data scientist
The path to be a data scientist
 
Machine Learning with GraphLab Create
Machine Learning with GraphLab CreateMachine Learning with GraphLab Create
Machine Learning with GraphLab Create
 
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
From Machine Learning to Learning Machines: Creating an End-to-End Cognitive ...
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-Learn
 
Machine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive ServicesMachine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive Services
 
Introduction to Azure machine learning
Introduction to Azure machine learningIntroduction to Azure machine learning
Introduction to Azure machine learning
 
Teaching the cloud to think
Teaching the cloud to thinkTeaching the cloud to think
Teaching the cloud to think
 

Similaire à Knowledge Discovery

Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceData Science Milan
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017StampedeCon
 
predictive analysis and usage in procurement ppt 2017
predictive analysis and usage in procurement  ppt 2017predictive analysis and usage in procurement  ppt 2017
predictive analysis and usage in procurement ppt 2017Prashant Bhatmule
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systemsTraditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systemsGanesan Narayanasamy
 
Gse uk-cedrinemadera-2018-shared
Gse uk-cedrinemadera-2018-sharedGse uk-cedrinemadera-2018-shared
Gse uk-cedrinemadera-2018-sharedcedrinemadera
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseSoftServe
 
fINAL Lesson_1_Course_Introduction_v1.pptx
fINAL Lesson_1_Course_Introduction_v1.pptxfINAL Lesson_1_Course_Introduction_v1.pptx
fINAL Lesson_1_Course_Introduction_v1.pptxdataKarthik
 
Customer value analysis of big data products
Customer value analysis of big data productsCustomer value analysis of big data products
Customer value analysis of big data productsVikas Sardana
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Roger Barga
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsSri Ambati
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!Khalid Salama
 
Data Science Training in Chandigarh h
Data Science Training in Chandigarh    hData Science Training in Chandigarh    h
Data Science Training in Chandigarh hasmeerana605
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningTamir Taha
 
Data Science at UCSB Information Meeting
Data Science at UCSB Information MeetingData Science at UCSB Information Meeting
Data Science at UCSB Information MeetingJason Freeberg
 

Similaire à Knowledge Discovery (20)

Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial Intelligence
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
 
predictive analysis and usage in procurement ppt 2017
predictive analysis and usage in procurement  ppt 2017predictive analysis and usage in procurement  ppt 2017
predictive analysis and usage in procurement ppt 2017
 
AI in the Enterprise at Scale
AI in the Enterprise at ScaleAI in the Enterprise at Scale
AI in the Enterprise at Scale
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systemsTraditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
Traditional Machine Learning and Deep Learning on OpenPOWER/POWER systems
 
Gse uk-cedrinemadera-2018-shared
Gse uk-cedrinemadera-2018-sharedGse uk-cedrinemadera-2018-shared
Gse uk-cedrinemadera-2018-shared
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science Expertise
 
Deep learning
Deep learningDeep learning
Deep learning
 
fINAL Lesson_1_Course_Introduction_v1.pptx
fINAL Lesson_1_Course_Introduction_v1.pptxfINAL Lesson_1_Course_Introduction_v1.pptx
fINAL Lesson_1_Course_Introduction_v1.pptx
 
Customer value analysis of big data products
Customer value analysis of big data productsCustomer value analysis of big data products
Customer value analysis of big data products
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
 
Managing AI Products
Managing AI ProductsManaging AI Products
Managing AI Products
 
Bigdata analytics
Bigdata analyticsBigdata analytics
Bigdata analytics
 
Data Science Training in Chandigarh h
Data Science Training in Chandigarh    hData Science Training in Chandigarh    h
Data Science Training in Chandigarh h
 
Data mining
Data miningData mining
Data mining
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
Data Science at UCSB Information Meeting
Data Science at UCSB Information MeetingData Science at UCSB Information Meeting
Data Science at UCSB Information Meeting
 

Plus de André Karpištšenko

Plus de André Karpištšenko (7)

Lingvist - Statistical Methods in Language Learning
Lingvist - Statistical Methods in Language LearningLingvist - Statistical Methods in Language Learning
Lingvist - Statistical Methods in Language Learning
 
Starship, Building Intelligent Delivery Robots
Starship, Building Intelligent Delivery RobotsStarship, Building Intelligent Delivery Robots
Starship, Building Intelligent Delivery Robots
 
Cognitive plausibility in learning algorithms
Cognitive plausibility in learning algorithmsCognitive plausibility in learning algorithms
Cognitive plausibility in learning algorithms
 
Practical Deep Learning
Practical Deep LearningPractical Deep Learning
Practical Deep Learning
 
Data science for everyone
Data science for everyoneData science for everyone
Data science for everyone
 
AI Control
AI ControlAI Control
AI Control
 
Deep learning
Deep learningDeep learning
Deep learning
 

Dernier

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 

Dernier (20)

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 

Knowledge Discovery

  • 1. Knowledge Discovery & Data Mining 22nd ACM SIGKDD 2016 André Karpištšenko
  • 2. ~80 sessions for 2,700 participants • Business Applications and Frameworks at Scale • Data Streams Mining • DashOpt features • Outlier Detection • Bayesian Optimization • Deep Learning • Investing into AI and Data • Bonus keywords 88 countries, 35% YoY, 15-20% acceptance
  • 3. Business Application Examples • Consumer Internet focus: Content Ranking, Recommendation, User Intent and Context Prediction • Industrial Internet focus: Autonomy, Predictive Maintenance, Operational Intelligence, Production Planning • B2B focus: Targeting, Lead Generation, Sales Development, Opportunity Management, Account Management • Web content analytics: Image, Video, Text Classification for Relevance, Products Categorization, Sentiments • Other: Cyber Security, Fraud/Spam Detection, NLP, Speech Recognition, Image/Video Recognition
  • 4. Predictive Modeling Flow DashOpt Feature Engineering Raw Data Raw Features Labels Feature Integration Features with Labels Data Partitioning Training Data Validation Data Testing Data Model Training Evaluate for model selection Compute offline evaluation metrics Best model Offline scoring and indexing Online/offline systems Online A/B test Label preparation Log data Scoring features Raw features Feature integration Model Performance Test Results
  • 5. Data Technologies • Most common: HDFS, MapReduce, Spark, Hive • Decision spectrum: build, assemble, buy • Factors: network, open-source, maturity, needs Exploratory Production
  • 6. Classic ML Platform at Scale Workflows HDFS Feature Mart Ground Truth Models Scores MapReduce / Yarn Workflow Scheduler & Manager Workflows Workflows Intelligence Engine Metadata Store Pig, Hive, Python, Scala, Shell script, … Feature Engineering Libraries Machine Learning Libraries Drivers Drivers Application 1 Application 2 Application N …
  • 8. https://sites.google.com/site/iotminingtutorial/ IoT Data Streams Mining • 10 billion devices in 2015 -> 34 billion devices by 2020 • Continuous data, dynamic models, distributed, few seconds
  • 9. Streams Mining: Actors Model Data processing pipeline Distributed processing Kappa Architecture https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
  • 10. Outlier Detection • Single point anomaly detection: likelihood over distribution • Finding anomalous groups: divergence estimation • Methods: percentage change, T-test, Chi-square test, Generalized ESD (Extreme Studentized Deviate) test, Seasonal Hybrid ESD, etc. • Goal: move from detection to automated response
  • 11. Outlier Detection in Practice • Too many detections of too little value • Use methods for thresholds • Breakout detection and Concept Drift • For changing distributions move baselines over time • Risk of overfitting to known anomalies, not finding unknown anomalies
  • 12. Bayesian aka Active Optimization • Examples: Design of Experiments, hyper-parameters of supervised learning, algorithms tested with simulations f is an unknown expensive black-box function with the goal to approximately optimize f with as few experiments as possible • No free lunch theorem • Other bio-inspired algorithms for optimization exploitation and exploration: neural networks, genetic algorithms, swarm intelligence, ant colony optimisation, etc.
  • 13. Bayesian Optimization in Practice • SigOpt experience: 20 dimensions, above human capacity. • Uber ATC experience: scaling active optimization to high dimensions as the default works reliably for 5-7 dim. • Variables are added during optimization. • Choose fidelity using heuristics.
  • 15. Deep Learning • Compute power, GPU, learning architectures and a lot of labeled data are what drive DL • Applied for Vision and Speech: matches human performance • Not possible where experiments are costly: biotech • Kaggle winners are not DL models: tree ensembles, SVMs • Common technologies: TensorFlow, Caffe, Theano, Keras • Thousands of pieces of software: modules and layers • Explainability and interpretability are the next big things • EU regulation. Tradeoff: accuracy vs explainability.
  • 16. Deep Learning Trends • Vision nets are deeper and structured (Larsson 2016) • Language nets have also dynamics, memory and attention (Rocktaschel 2016, Miller 2016) • Probabilistic programming (Lake, Tenenbaum) • Programs as networks (Riedel) • The Neural Programmer and Interpreter for learning programs (Reed et al 2016) • Computation graphs interacting with memory • Loop for reasoning for nested questions (Miller 2016) • Generative adversarial networks (Reed 2016). Models capable of imagining images, videos and text.
  • 17. Investing into AI and Data • Data acquisition, real-time detection and visualization not solved yet. • Empower more people to do data science. Automate routines. • Unsolved problems are learning from unlabeled data, planning, reasoning, problem solving, concept formulation, 1/10k compute. • Key decisions: Timing, accuracy in what is hard, find verticals and focus, identify differentiation & size of the prize & people & partners Business of outliers: 1% capital returns 526x, 48% returns 0 0 450 900 Q2'15 Q3'15 Q4'15 Q1'16 Q2'16 Peak Data Peak ML VC assessment:
  • 18. Bonus Keywords • Lifelong Machine Learning: systems approach, transfer learning, never-ending learners. Useful for knowledge build. • Graphons: graph convergence and limits through infinite number of vertices. Useful for privacy preserving mining. • Computational Social Science: how individuals interact to produce collective behaviour. Individuals exert more effort by themselves than groups. • Information Security: trusted key management is most sensitive. Secret must be changed frequently. Confidentiality easier to violate than authenticity. Integrity. Offence more lucrative than defence. • Enterprise Data: in reality “random data salad” prone to constant change due to M&A, politics, dynamic schema DBs (e.g. Mongo), legacy burden, restructuring, leadership changes, data hoarding. Machine driven, human guided processes required.
  • 19. Thinking about Value from Data Science