www.globalbigdataconference.com
Twitter: @bigdataconf
Architecting a predictive, petabyte-scale, self-learning fraud detection system
WHAT WE’RE UP AGAINST
• 50+ schemes (and counting)
• 99.9999% ‘good’ messages
• 6+ months per case
• Needle in a haystack
• Hybrid analytics
• No training data
• Semi-supervised learning
• Adversarial learning
• Online feedback
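To make the "needle in a haystack" figure concrete, here is a small arithmetic sketch. Only the 99.9999% share of good messages comes from the slide. The detector's hit rate and false-alarm rate are made-up numbers chosen for illustration, and `alert_precision` is a hypothetical helper name.

```python
from fractions import Fraction


def alert_precision(good_share, hit_rate, false_alarm_rate):
    """Share of alerts that point at a truly bad message (Bayes' rule)."""
    bad_share = 1 - good_share
    true_alerts = bad_share * hit_rate
    false_alerts = good_share * false_alarm_rate
    return true_alerts / (true_alerts + false_alerts)


# From the slide: 99.9999% of messages are 'good'.
GOOD_SHARE = Fraction(999_999, 1_000_000)

# Assumed detector: catches 99% of bad messages, wrongly flags 0.1% of good ones.
precision = alert_precision(GOOD_SHARE, Fraction(99, 100), Fraction(1, 1000))
print(f"share of alerts that are real: {float(precision):.4%}")
```

Under these assumed rates roughly one alert in a thousand is real, which is one way to see why a single classifier is not enough on its own.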
WHY HYBRID ANALYTICS?
• Ignore more rules
• Unusual timing of events
• Unusual personal network
• Teamwork & scale
• Think & talk differently
(BITS OF) THE TOOLBOX
• Rule inference
• Time series analysis
• Link analysis
• Ensemble learning
• Natural language
THE CODE, PLEASE
• Freely available Jupyter notebooks
• Open source libraries & open data
• Github.com/atigeo/hunting_criminals_demo
STREAM PROCESSING
• Kafka
• Email stream
• Account transactions stream
• Email NLP features
• People graph
• Transactions time series
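The flow on this slide can be sketched without a broker. The snippet below is a plain-Python stand-in, not the speaker's pipeline: an in-memory list plays the role of the Kafka topics, and the event fields (`type`, `sender`, `recipient`, `account`, `ts`, `amount`) are assumed names. It shows one loop folding email events into a people graph and transaction events into per-account time series.

```python
from collections import defaultdict


def process_events(events):
    """Fold a merged event stream into a people graph and per-account series."""
    graph = defaultdict(set)    # person -> people they exchanged email with
    series = defaultdict(list)  # account -> [(timestamp, amount), ...]
    for event in events:        # a real deployment would consume from Kafka here
        if event["type"] == "email":
            graph[event["sender"]].add(event["recipient"])
            graph[event["recipient"]].add(event["sender"])
        elif event["type"] == "transaction":
            series[event["account"]].append((event["ts"], event["amount"]))
    return graph, series


# Hypothetical events, already interleaved by timestamp.
events = [
    {"type": "email", "ts": 1, "sender": "ann", "recipient": "bob"},
    {"type": "transaction", "ts": 2, "account": "ann", "amount": 120.0},
    {"type": "email", "ts": 3, "sender": "bob", "recipient": "cy"},
    {"type": "transaction", "ts": 4, "account": "ann", "amount": 80.0},
]
graph, series = process_events(events)
print(sorted(graph["bob"]))
print(series["ann"])
```

Keeping both structures keyed by person makes it straightforward to join graph, time series and NLP features per user later on.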
SAMPLE EMAIL PATTERNS
SAMPLE NATURAL LANGUAGE ANNOTATORS
• Understand vocabulary
– Jargon
– Code words
– Multi-lingual
• Understand grammar
– Who are we talking about?
– Past, present or future?
– Compound sentences
• Understand context
– Email: Re:, Fwd:, attachments
– SMS & IM have their own grammar
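As a rough illustration of the three annotator families listed above, here is a toy annotator built only on regular expressions. It is a sketch under loud assumptions: the code-word lexicon is invented, the tense rule is a crude heuristic (it will mislabel words such as "need"), and the function and field names are hypothetical. Production annotators would be far richer.

```python
import re

# Hypothetical code-word lexicon; a real one would come from investigators.
CODE_WORDS = {"package", "gift"}

# One or more leading "Re:", "Fw:" or "Fwd:" markers in a subject line.
REPLY_PREFIX = re.compile(r"^\s*((re|fwd?):\s*)+", re.IGNORECASE)


def annotate(subject, body):
    """Return toy vocabulary, grammar and context annotations for one email."""
    tokens = re.findall(r"[a-z']+", body.lower())
    prefix = REPLY_PREFIX.match(subject)
    if re.search(r"\b(will|shall|going to)\b", body, re.IGNORECASE):
        tense = "future"
    elif any(token.endswith("ed") for token in tokens):
        tense = "past"  # crude heuristic: regular past-tense suffix
    else:
        tense = "present"
    return {
        "code_words": sorted(set(tokens) & CODE_WORDS),          # vocabulary
        "tense": tense,                                          # grammar
        "is_reply_or_forward": prefix is not None,               # context
        "clean_subject": subject[prefix.end():] if prefix else subject.strip(),
    }


result = annotate("Re: Fwd: lunch", "I will bring the package tomorrow")
print(result)
```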
SAMPLE GRAPH FEATURES
• Standard algorithms like KMeans don’t work on “haystacks”
• Bregman Bubble Clustering
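The slide names Bregman Bubble Clustering but gives no detail, so the following is only a simplified sketch in the spirit of its size-constrained form, using squared Euclidean distance as the divergence. Unlike k-means, only the `s` points closest to a center are clustered and everything else is left as unassigned background. The data, the starting center and the value of `s` are assumptions made for the example.

```python
def bubble_cluster(points, centers, s, iterations=10):
    """Cluster only the s points nearest to any center; ignore the rest."""
    for _ in range(iterations):
        # Assignment: nearest center for every point (squared Euclidean distance).
        scored = []
        for index, (x, y) in enumerate(points):
            dist, nearest = min(
                ((x - cx) ** 2 + (y - cy) ** 2, c)
                for c, (cx, cy) in enumerate(centers)
            )
            scored.append((dist, index, nearest))
        kept = sorted(scored)[:s]  # only the s tightest points stay clustered
        # Update: each center moves to the mean of the points it kept.
        new_centers = []
        for c, old in enumerate(centers):
            members = [points[i] for _, i, owner in kept if owner == c]
            if members:
                new_centers.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    labels = {i: owner for _, i, owner in kept}  # unlisted points are background
    return centers, labels


# Assumed toy data: a sparse background grid plus one small dense group.
background = [(x, y) for x in range(0, 10, 3) for y in range(0, 10, 3)]
dense = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
points = background + dense

centers, labels = bubble_cluster(points, [(4.0, 4.0)], s=4)
print(centers, sorted(labels))
```

With these inputs the single center drifts from its starting position onto the dense group while the sixteen grid points stay unassigned, which is the behaviour the "haystack" setting calls for.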
USER ANALYSIS ITERATION
• Email NLP features
• User graph
• Transactions time series
• Graph features
• Time series features
• NLP features
• Agent feedback
• Train/test classifier
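One way to picture this loop is an online classifier that takes a single gradient step each time an agent confirms or dismisses a case. The sketch below uses a small logistic regression in plain Python as a stand-in; the slides do not say which classifier was used, and the feature names, values and learning rate are all invented for illustration.

```python
import math


def predict(weights, features):
    """Logistic score in (0, 1) for one user's combined feature vector."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def learn_from_feedback(weights, features, label, rate=0.5):
    """One stochastic-gradient step on log loss using an agent's verdict."""
    error = label - predict(weights, features)
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + rate * error * value
    return weights


# Hypothetical users: graph, time series and NLP features merged per user.
suspicious = {"bias": 1.0, "graph_isolation": 0.9, "odd_hours": 0.8, "code_words": 1.0}
ordinary = {"bias": 1.0, "graph_isolation": 0.1, "odd_hours": 0.2, "code_words": 0.0}

weights = {}
for _ in range(50):  # each pass stands in for one round of agent feedback
    learn_from_feedback(weights, suspicious, 1)  # agent confirmed fraud
    learn_from_feedback(weights, ordinary, 0)    # agent dismissed the alert

print(predict(weights, suspicious), predict(weights, ordinary))
```

Because every verdict updates the weights immediately, the same mechanism also covers the cold start (weights begin empty) and adapts as schemes change.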
SUMMARY: CHALLENGES OF LEARNING CRIMINALS
• Needle in a very large haystack
– Actually needs a petabyte-scale platform
• Multi-modal: no single trick works
– Hybrid analytics
• No labeled data
– Semi-supervised learning
– Cold start problem
• Sparse & high-dimensional
– Graph based features & change over time
• Adversarial
– Feedback & online learning
@davidtalby
