This document summarizes the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining. It covers the following topics:
- Overview of the conference with ~80 sessions and 2,700 participants
- Popular business applications of data mining like recommendation systems, predictive maintenance, and customer targeting
- The typical predictive modeling flow including data preparation, model training, evaluation, and deployment
2. ~80 sessions for 2,700 participants
• Business Applications and Frameworks at Scale
• Data Streams Mining
• DashOpt features
• Outlier Detection
• Bayesian Optimization
• Deep Learning
• Investing into AI and Data
• Bonus keywords
• 88 countries, 35% YoY growth, 15-20% acceptance rate
3. Business Application Examples
• Consumer Internet focus: Content Ranking,
Recommendation, User Intent and Context Prediction
• Industrial Internet focus: Autonomy, Predictive
Maintenance, Operational Intelligence, Production Planning
• B2B focus: Targeting, Lead Generation, Sales Development,
Opportunity Management, Account Management
• Web content analytics: Image, Video, Text Classification for
Relevance, Product Categorization, Sentiment
• Other: Cyber Security, Fraud/Spam Detection, NLP, Speech
Recognition, Image/Video Recognition
9. Streams Mining: Actors Model
• Data processing pipeline
• Distributed processing
• Kappa Architecture
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
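To make the actor model concrete, here is a minimal sketch of a stream pipeline in which each actor owns a mailbox, handles one message at a time, and forwards results downstream. The class `Actor`, the `STOP` sentinel, and the three stages (parse, filter, sink) are hypothetical names chosen for illustration; the talk does not prescribe an implementation, and a production system would use a distributed actor framework rather than threads.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel that shuts the pipeline down


class Actor(threading.Thread):
    """An actor: private mailbox, sequential message handling."""

    def __init__(self, handler, downstream=None):
        super().__init__(daemon=True)
        self.mailbox = queue.Queue()
        self.handler = handler
        self.downstream = downstream

    def send(self, message):
        # Other actors communicate only by putting messages in the mailbox.
        self.mailbox.put(message)

    def run(self):
        while True:
            message = self.mailbox.get()
            if message is STOP:
                # Propagate shutdown along the pipeline, then exit.
                if self.downstream is not None:
                    self.downstream.send(STOP)
                break
            result = self.handler(message)
            # A handler returning None drops the message (acts as a filter).
            if result is not None and self.downstream is not None:
                self.downstream.send(result)


results = []
sink = Actor(results.append)
keep_large = Actor(lambda v: v if v >= 10 else None, downstream=sink)
parse = Actor(lambda raw: int(raw), downstream=keep_large)

for actor in (sink, keep_large, parse):
    actor.start()
for raw in ["3", "12", "7", "40"]:
    parse.send(raw)
parse.send(STOP)
for actor in (parse, keep_large, sink):
    actor.join()

print(results)
```

Because every stage only sees its own mailbox, stages can in principle be moved to separate processes or machines without changing the handlers.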
10. Outlier Detection
• Single point anomaly detection: likelihood over distribution
• Finding anomalous groups: divergence estimation
• Methods: percentage change, T-test, Chi-square test, Generalized ESD (Extreme
Studentized Deviate) test, Seasonal Hybrid ESD, etc.
• Goal: move from detection to automated response
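As an illustration of single point detection by likelihood, the sketch below fits a normal distribution to a baseline sample and flags new points whose two-sided tail probability falls below a cutoff. It is a simplified stand-in, not the Generalized ESD or Seasonal Hybrid ESD tests named above; the Gaussian assumption, the function names, and the cutoff `alpha=0.001` are assumptions made for the example.

```python
from statistics import NormalDist


def tail_probability(dist, x):
    """Two-sided probability of a value at least as extreme as x under dist."""
    z = abs(x - dist.mean) / dist.stdev
    return 2.0 * (1.0 - NormalDist().cdf(z))


def flag_anomalies(baseline, new_points, alpha=0.001):
    """Return the new points that are unlikely under a normal fit to baseline."""
    dist = NormalDist.from_samples(baseline)  # assumes roughly Gaussian data
    return [x for x in new_points if tail_probability(dist, x) < alpha]


baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0, 10.1, 9.9]
flagged = flag_anomalies(baseline, [10.2, 9.6, 14.0])
print(flagged)
```

Only the value 14.0 is flagged here; 9.6 is about two standard deviations from the mean and passes at this cutoff. The same flag could feed an automated response instead of a report.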
11. Outlier Detection in Practice
• Too many detections of too little value
• Use statistical methods to set thresholds
• Breakout detection and Concept Drift
• For changing distributions, move baselines over time
• Risk of overfitting to known anomalies, not finding unknown anomalies
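The idea of moving baselines can be sketched with a rolling window: each point is compared against the mean and spread of the most recent observations, so a level shift is reported once and then absorbed into the baseline. The window size, the threshold of three standard deviations, and the `min_std` floor are hypothetical parameters chosen for this example.

```python
from collections import deque
from statistics import fmean, pstdev


def rolling_outliers(series, window=5, threshold=3.0, min_std=0.5):
    """Return indices of points far from a baseline that moves with the data."""
    recent = deque(maxlen=window)  # the moving baseline
    flagged = []
    for i, x in enumerate(series):
        if len(recent) == window:
            mu = fmean(recent)
            # Floor the spread so a very flat window does not flag tiny noise.
            sigma = max(pstdev(recent), min_std)
            if abs(x - mu) > threshold * sigma:
                flagged.append(i)
        recent.append(x)  # flagged points also enter the baseline
    return flagged


# Level shift from about 10 to about 20 at index 6.
series = [10, 11, 10, 9, 10, 10, 20, 21, 20, 19, 20, 20, 21]
print(rolling_outliers(series))
```

With this data only the first point after the shift (index 6) is flagged. A fixed baseline computed from the first five points would keep flagging every later point, which is one source of the "too many detections" problem.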
12. Bayesian aka Active Optimization
• Examples: Design of Experiments, hyper-parameters of
supervised learning, algorithms tested with simulations
• f is an unknown, expensive black-box function; the goal is to
approximately optimize f with as few experiments as possible
• No free lunch theorem
• Other bio-inspired algorithms for optimization (exploitation and
exploration): neural networks, genetic algorithms, swarm intelligence,
ant colony optimisation, etc.
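One common way to optimize an expensive black-box f is to fit a Gaussian process surrogate to the experiments run so far and pick the next experiment by an acquisition rule that trades off exploitation (high predicted value) against exploration (high uncertainty). The sketch below does this in one dimension using only the standard library. The RBF kernel, its length scale, the upper confidence bound rule with `kappa=2.0`, the candidate grid, and the toy function `expensive` are all assumptions for illustration, not the method presented at the conference.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel (assumed; unit prior variance)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(k, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 0.0))


def bayes_opt(f, n_iter=12, kappa=2.0):
    """Maximize f on [0, 1] with an upper-confidence-bound acquisition."""
    grid = [i / 100 for i in range(101)]
    xs = [0.0, 1.0]                      # two initial experiments
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        def ucb(x):
            mean, std = gp_posterior(xs, ys, x)
            return mean + kappa * std    # exploitation + exploration
        x_next = max((x for x in grid if x not in xs), key=ucb)
        xs.append(x_next)
        ys.append(f(x_next))             # the only calls to the expensive f
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]


def expensive(x):
    """Toy stand-in for a costly experiment; true maximum at x = 0.3."""
    return -(x - 0.3) ** 2


best_x, best_y = bayes_opt(expensive)
print(best_x, best_y)
```

The function f is evaluated only 14 times in total; all remaining work happens on the cheap surrogate. Larger `kappa` values push the search toward unexplored regions, smaller values toward the current best estimate.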
13. Bayesian Optimization in Practice
• SigOpt experience: 20 dimensions, above human capacity.
• Uber ATC experience: scaling active optimization to high
dimensions, as the default works reliably only for 5-7 dimensions.
• Variables are added during optimization.
• Choose fidelity using heuristics.
15. Deep Learning
• Compute power, GPU, learning architectures and a lot of
labeled data are what drive DL
• Applied for Vision and Speech: matches human performance
• Not possible where experiments are costly: biotech
• Kaggle winners are not DL models: tree ensembles, SVMs
• Common technologies: TensorFlow, Caffe, Theano, Keras
• Thousands of pieces of software: modules and layers
• Explainability and interpretability are the next big things
• EU regulation. Tradeoff: accuracy vs explainability.
16. Deep Learning Trends
• Vision nets are deeper and structured (Larsson 2016)
• Language nets also have dynamics, memory and attention
(Rocktäschel 2016, Miller 2016)
• Probabilistic programming (Lake, Tenenbaum)
• Programs as networks (Riedel)
• The Neural Programmer and Interpreter for learning programs
(Reed et al 2016)
• Computation graphs interacting with memory
• Loop for reasoning for nested questions (Miller 2016)
• Generative adversarial networks (Reed 2016). Models capable of
imagining images, videos and text.
17. Investing into AI and Data
• Data acquisition, real-time detection and visualization not solved yet.
• Empower more people to do data science. Automate routines.
• Unsolved problems are learning from unlabeled data, planning,
reasoning, problem solving, concept formulation, 1/10k compute.
• Key decisions: Timing, accuracy in what is hard, find verticals and
focus, identify differentiation & size of the prize & people & partners
• Business of outliers: 1% of capital returns 526x, 48% returns 0
[Chart, VC assessment: quarterly values from Q2'15 to Q2'16 on a 0 to 900 axis, annotated "Peak Data" and "Peak ML"]
18. Bonus Keywords
• Lifelong Machine Learning: systems approach, transfer learning,
never-ending learners. Useful for building knowledge.
• Graphons: graph convergence and limits as the number of vertices
goes to infinity. Useful for privacy-preserving mining.
• Computational Social Science: how individuals interact to produce
collective behaviour. Individuals exert more effort by themselves than
in groups.
• Information Security: trusted key management is most sensitive.
Secrets must be changed frequently. Confidentiality is easier to violate
than authenticity. Integrity. Offence is more lucrative than defence.
• Enterprise Data: in reality “random data salad” prone to constant
change due to M&A, politics, dynamic schema DBs (e.g. Mongo), legacy
burden, restructuring, leadership changes, data hoarding. Machine
driven, human guided processes required.