© 2016 ZestFinance, Inc.
The Right Tool for the Job: Guidelines for
Algorithm Selection in Predictive Modeling
Derek Wilcox
ZestFinance
2 © 2016 ZestFinance, Inc.
About ZestFinance
• Founded by Douglas Merrill, the former CIO of Google
• Nearly $65M in funding from Lightspeed, Matrix and others
• Additional $150M funding from Fortress for Basix
• The team is mostly data geeks, math whizzes, and financial
analysts from prestigious universities and top companies
• Based in Los Angeles
3 © 2016 ZestFinance, Inc.
Our mission
Make fair and transparent credit available to everyone
4 © 2016 ZestFinance, Inc.
Zest is built to achieve that mission
We built a technology platform that is transforming how credit decisions
are made.
We are using that platform to:
• Partner with high-volume lenders worldwide to extend credit to their
customers
• Provide convenient, online loans that help millions of middle-class
Americans move from near-prime to prime
5 © 2016 ZestFinance, Inc.
Technology platform
ZestFinance has developed an underwriting platform that:
• Ingests data from tens of thousands of disparate sources
• Cleans, scrubs, and normalizes the data
• Runs the data through ensembled Machine Learning
algorithms, enhanced with a touch of Machine Learning artistry
• To deliver scores/ratings that best predict
– Probability of fraud
– Likelihood of default
– Overall creditworthiness
All in under 5 seconds
6 © 2016 ZestFinance, Inc.
ZestFinance: more data is better
The world is flooded with information that’s currently being
overlooked.
Why use only a bit of data when there is an infinite amount available?
We are always striving to use even more data and really advanced
math to change the world.
7 © 2016 ZestFinance, Inc.
Turning shopping data into credit data
• Now, let’s talk about China
• Only 240 million of the more than 1 billion Chinese citizens have a
credit history
• On the other hand, China has more data than almost any other place
in the world
– eCommerce at $275B and growing over 30%
– 33% of eCommerce via mobile phones
• This data has tremendous potential to create the most accurate
credit history and decisioning system in the world
8 © 2016 ZestFinance, Inc.
JD.com
• We’ve partnered with JD.com -- the largest e-tailer in China
• We’re working together to turn shopping data into credit data,
creating credit histories from scratch
• Our approach also identifies fraud
9 © 2016 ZestFinance, Inc.
Applying Deep Learning to structured
data
• Among the 29 challenge-winning solutions published on Kaggle’s blog in 2015, 17
used XGBoost and 11 used Deep Neural Networks
• Problems with more inherent structure, such as image, audio, and NLP,
seem to favor Deep Neural Nets
• When a problem doesn’t have this sort of structure, we can use
XGBoost
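The rule of thumb on this slide can be written down as a small lookup. This is only a sketch of the heuristic as stated: the function name `suggest_model` and the category labels are made up for illustration and are not part of any library or of ZestFinance's platform.

```python
# Hypothetical helper encoding the slide's rule of thumb.
# The category names are illustrative assumptions, not an official taxonomy.
KINDS_WITH_INHERENT_STRUCTURE = {"image", "audio", "speech", "text", "sequence"}


def suggest_model(data_kind: str) -> str:
    """Return a starting-point model family for a kind of data."""
    if data_kind.lower() in KINDS_WITH_INHERENT_STRUCTURE:
        return "deep neural network"
    return "gradient-boosted trees (e.g. XGBoost)"


print(suggest_model("image"))
print(suggest_model("tabular"))
```

A lookup like this is a starting point for experiments, not a replacement for comparing models on held-out data.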
10 © 2016 ZestFinance, Inc.
Neural Network
http://cs231n.github.io/neural-networks-1/
11 © 2016 ZestFinance, Inc.
Neural Network
http://arxiv.org/pdf/1509.07627.pdf
12 © 2016 ZestFinance, Inc.
Deep Learning - ImageNet
http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
13 © 2016 ZestFinance, Inc.
What kind of structure does my data
have?
• Is there some sort of invariance or equivariance?
• Can we effectively learn representations?
• Examples: Image, Speech, Sequences
https://arxiv.org/pdf/1602.02660v2.pdf
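To make the invariance/equivariance question concrete, here is a minimal sketch using a one-dimensional signal. It assumes wrap-around (circular) padding so that the property holds exactly; the signal, the kernel, and the helper names are invented for the example.

```python
def circular_conv(signal, kernel):
    """1D cross-correlation with wrap-around padding."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, k):
    """Cyclically shift a list k positions to the right."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


signal = [0, 0, 1, 3, 1, 0, 0, 0]
kernel = [1, -1]

# Equivariance: shifting the input shifts the feature map by the same amount.
equivariant = circular_conv(shift(signal, 2), kernel) == shift(
    circular_conv(signal, kernel), 2
)

# Invariance: a global max over the feature map ignores the shift entirely.
invariant = max(circular_conv(shift(signal, 2), kernel)) == max(
    circular_conv(signal, kernel)
)

print(equivariant, invariant)
```

Convolution followed by pooling is one way a network can exploit this kind of structure when the data has it.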
14 © 2016 ZestFinance, Inc.
Where does my data actually live?
http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
15 © 2016 ZestFinance, Inc.
Text Embedding - word2vec
http://www.offconvex.org/2015/12/12/word-embeddings-1/
http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
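As a rough illustration of what an embedding gives you, the sketch below compares words by the cosine similarity of their vectors. The three-dimensional vectors are invented for the example; real word2vec embeddings are learned from text and typically have hundreds of dimensions.

```python
import math

# Toy vectors, made up for illustration only.
embeddings = {
    "loan": [0.9, 0.1, 0.0],
    "credit": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word):
    """Return the other word whose vector points in the most similar direction."""
    return max(
        (w for w in embeddings if w != word),
        key=lambda w: cosine(embeddings[word], embeddings[w]),
    )


print(nearest("loan"))
```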
16 © 2016 ZestFinance, Inc.
Learning representations
http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
17 © 2016 ZestFinance, Inc.
Deep Learning
• Composition of many different functions
• Combining lower level features to create more complicated ones
http://www.iro.umontreal.ca/~bengioy/talks/DL-Tutorial-NIPS2015.pdf
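The two bullets above can be sketched with a tiny hand-built network that computes XOR. The weights are set by hand rather than learned, and the feature names are assumptions chosen for readability; the point is only that the output is a composition of simple functions, each built on the features below it.

```python
def relu(z):
    return max(0.0, z)


def hidden_layer(x1, x2):
    """Two low-level features built from the raw inputs."""
    at_least_one = relu(x1 + x2)  # grows when either input is on
    both = relu(x1 + x2 - 1.0)    # fires only when both inputs are on
    return at_least_one, both


def output_layer(at_least_one, both):
    """Combine the two features into XOR: one input on, but not both."""
    return at_least_one - 2.0 * both


def network(x1, x2):
    # The whole network is the composition output_layer(hidden_layer(x)).
    return output_layer(*hidden_layer(x1, x2))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, network(a, b))
```

No single linear function of the raw inputs can compute XOR, which is why the intermediate features are needed.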
18 © 2016 ZestFinance, Inc.
Deep Learning - Convolutional Nets
http://www.slideshare.net/matsukenbook/deep-learning-chap6-convolutional-neural-net
19 © 2016 ZestFinance, Inc.
Deep Learning - Recurrent Nets
• Sequences like text
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
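A minimal sketch of why recurrent nets suit sequences: the same weights are applied at every step, and the hidden state carries information forward, so the order of the inputs matters. The one-unit network and its weights below are assumptions picked for the example.

```python
import math


def rnn_final_state(sequence, w_x=1.0, w_h=0.5):
    """Run a one-unit tanh RNN over a sequence and return the last hidden state.

    The same two weights are reused at every time step.
    """
    h = 0.0
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h)
    return h


early = rnn_final_state([1, 0, 0])  # signal at the start, then silence
late = rnn_final_state([0, 0, 1])   # same values in a different order
print(early, late)
```

The two sequences contain the same values, yet the final states differ, and the early signal has already faded after two steps.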
20 © 2016 ZestFinance, Inc.
Long Short-Term Memory (LSTMs)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
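For reference, here is a sketch of a single step of the standard LSTM cell (forget, input, and output gates plus a candidate value, no peephole connections), reduced to one unit so it fits in plain Python. The parameter values are hypothetical and chosen to show the cell state being carried unchanged across steps.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a one-unit LSTM.

    p maps each gate name to a (w_x, w_h, bias) triple.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much of the candidate to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # proposed new content

    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Hypothetical parameters: forget gate forced open, input gate forced shut.
params = {
    "forget": (0.0, 0.0, 20.0),
    "input": (0.0, 0.0, -20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 1.0, 0.0),
}

h, c = 0.0, 0.5
for x in [1.0, -1.0, 0.3, 0.0]:
    h, c = lstm_step(x, h, c, params)
print(c)
```

With the gates set this way the cell state stays very close to its initial value regardless of the inputs, which is the mechanism that lets an LSTM remember over long spans.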
21 © 2016 ZestFinance, Inc.
Convolutional and Recurrent Networks
• Speech systems
https://arxiv.org/pdf/1512.02595v1.pdf
22 © 2016 ZestFinance, Inc.
Deeper Networks
http://icml.cc/2016/tutorials/icml2016_tutorial_deep_residual_networks_kaiminghe.pdf
23 © 2016 ZestFinance, Inc.
How do we make networks deeper/longer?
• Exploding/Vanishing Gradient Problem
http://deepdish.io/2015/02/24/network-initialization/
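The problem can be seen with simple arithmetic. The sketch below assumes, as a simplification, that every layer scales the backpropagated gradient by the same factor; real networks have a different factor per layer and per unit.

```python
def gradient_scale(per_layer_factor, depth):
    """Size of a gradient after passing back through `depth` layers,
    assuming every layer multiplies it by the same factor."""
    scale = 1.0
    for _ in range(depth):
        scale *= per_layer_factor
    return scale


# 0.25 is the largest possible slope of the sigmoid function.
vanishing = gradient_scale(0.25, 50)
exploding = gradient_scale(1.5, 50)
stable = gradient_scale(1.0, 50)
print(vanishing, exploding, stable)
```

Only factors very close to 1 survive fifty layers, which is why weight initialization matters so much for deep or long networks.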
24 © 2016 ZestFinance, Inc.
How do we make networks deeper/longer?
http://arxiv.org/pdf/1512.03385v1.pdf
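One answer from the paper linked above is the residual block, y = x + f(x). The scalar sketch below is an assumption-laden simplification (each f has the same small slope) meant only to show how the identity shortcut keeps the gradient from collapsing.

```python
def plain_chain_gradient(layer_slope, depth):
    """d(output)/d(input) for a stack of layers y = f(x)."""
    grad = 1.0
    for _ in range(depth):
        grad *= layer_slope
    return grad


def residual_chain_gradient(layer_slope, depth):
    """d(output)/d(input) for a stack of blocks y = x + f(x)."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + layer_slope  # the identity shortcut contributes the 1
    return grad


slope = 0.01  # assumed small slope of each learned transformation f
plain = plain_chain_gradient(slope, 20)
residual = residual_chain_gradient(slope, 20)
print(plain, residual)
```

The plain stack multiplies twenty small numbers together, while each residual block multiplies by something close to 1.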
25 © 2016 ZestFinance, Inc.
Thanks
• Christopher Olah and Andrej Karpathy for their amazing blogs that inspired
and provided some of the visuals in this presentation

Big Data Day LA 2016/ Data Science Track - The Right Tool for the Job: Guidelines for Algorithm Selection in Predictive Modeling, Derek Wilcox, Senior Data Scientist, ZestFinance

  • 1. © 2016 ZestFinance, Inc. The Right Tool for the Job: Guidelines for Algorithm Selection in Predictive Modeling Derek Wilcox ZestFinance
  • 2. About ZestFinance • Founded by Douglas Merrill, the former CIO of Google • Nearly $65M in funding from Lightspeed, Matrix and others • Additional $150M funding from Fortress for Basix • The team is mostly data geeks, math whizzes, and financial analysts from prestigious universities and top companies • Based in Los Angeles
  • 3. Our mission: Make fair and transparent credit available to everyone
  • 4. Zest is built to achieve that mission: We built a technology platform that is transforming how credit decisions are made. We are using that platform to: • Partner with high-volume lenders worldwide to extend credit to their customers • Provide convenient, online loans that help millions of middle-class Americans move from near-prime to prime
  • 5. Technology platform: ZestFinance has developed an underwriting platform that: • Ingests data from tens of thousands of disparate sources • Cleans, scrubs, and normalizes the data • Runs the data through ensembled Machine Learning algorithms, enhanced with a touch of Machine Learning artistry • To deliver scores/ratings that best predict – Probability of fraud – Likelihood of default – Overall creditworthiness All in under 5 seconds
  • 6. ZestFinance: more data is better. The world is flooded with information that’s currently being overlooked. Why use only a bit of data when there is an infinite amount available? We are always striving to use even more data and really advanced math to change the world.
  • 7. Turning shopping data into credit data • Now, let’s talk about China • Only 240 million of the more than 1 billion Chinese citizens have a credit history • On the other hand, China has more data than most any other place in the world – eCommerce at $275B and growing over 30% – 33% of eCommerce via mobile phones • This data has tremendous potential to create the most accurate credit history and decisioning system in the world
  • 8. JD.com • We’ve partnered with JD.com -- the largest e-tailer in China • We’re working together to turn shopping data into credit data, creating credit histories from scratch • Our approach also identifies fraud
  • 9. Applying Deep Learning to structured data • Among 29 challenge-winning solutions published on Kaggle’s blog in 2015, 17 used XGBoost and 11 used Deep Neural Networks • Problems with more inherent structure, like image, audio, and NLP, seem to favor Deep Neural Nets • When problems don’t have this sort of structure we can use XGBoost
  • 10. Neural Network: http://cs231n.github.io/neural-networks-1/
  • 11. Neural Network: http://arxiv.org/pdf/1509.07627.pdf
  • 12. Deep Learning - ImageNet: http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
  • 13. What kind of structure does my data have? • Is there some sort of invariance or equivariance? • Can we effectively learn representations? • Examples: Image, Speech, Sequences https://arxiv.org/pdf/1602.02660v2.pdf
  • 14. Where does my data actually live? http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
  • 15. Text Embedding - word2vec: http://www.offconvex.org/2015/12/12/word-embeddings-1/ http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
  • 16. Learning representations: http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
  • 17. Deep Learning • Composition of many different functions • Combining lower level features to create more complicated ones http://www.iro.umontreal.ca/~bengioy/talks/DL-Tutorial-NIPS2015.pdf
  • 18. Deep Learning - Convolutional Nets: http://www.slideshare.net/matsukenbook/deep-learning-chap6-convolutional-neural-net
  • 19. Deep Learning - Recurrent Nets • Sequences like text http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  • 20. Long Short-Term Memory (LSTMs): http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  • 21. Convolutional and Recurrent Networks • Speech systems https://arxiv.org/pdf/1512.02595v1.pdf
  • 22. Deeper Networks: http://icml.cc/2016/tutorials/icml2016_tutorial_deep_residual_networks_kaiminghe.pdf
  • 23. How do we make networks deeper/longer? • Exploding/Vanishing Gradient Problem http://deepdish.io/2015/02/24/network-initialization/
  • 24. How do we make networks deeper/longer? http://arxiv.org/pdf/1512.03385v1.pdf
  • 25. Thanks • Christopher Olah and Andrej Karpathy for their amazing blogs that inspired and provided some of the visuals in this presentation
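
Slides 23 and 24 name the exploding/vanishing gradient problem and link to the residual-network paper without showing the mechanics. The sketch below is an illustration added here, not code from the talk. It uses a toy chain of scalar layers, so each layer's Jacobian is a single number and the end-to-end gradient is just the product of those numbers. The function names, the choice of tanh, and the weight and depth values are all assumptions made for the example.

```python
import math


def linear_chain_gradient(weight, depth):
    """Gradient of h_depth w.r.t. h_0 for h_{k+1} = weight * h_k.

    By the chain rule this is the product of `depth` identical factors.
    """
    grad = 1.0
    for _ in range(depth):
        grad *= weight
    return grad


def plain_tanh_gradient(weight, depth, x=0.5):
    """Gradient of h_depth w.r.t. h_0 for h_{k+1} = tanh(weight * h_k)."""
    h, grad = x, 1.0
    for _ in range(depth):
        h = math.tanh(weight * h)
        # d/dh tanh(weight * h) = weight * (1 - tanh(weight * h) ** 2)
        grad *= weight * (1.0 - h * h)
    return grad


def residual_tanh_gradient(weight, depth, x=0.5):
    """Same layers with an identity shortcut: h_{k+1} = h_k + tanh(weight * h_k)."""
    h, grad = x, 1.0
    for _ in range(depth):
        t = math.tanh(weight * h)
        # The shortcut contributes the constant 1 to every per-layer factor.
        grad *= 1.0 + weight * (1.0 - t * t)
        h = h + t
    return grad


if __name__ == "__main__":
    depth = 30  # hypothetical depth, chosen only for illustration
    print("linear, weight 0.5:", linear_chain_gradient(0.5, depth))    # shrinks toward 0
    print("linear, weight 1.5:", linear_chain_gradient(1.5, depth))    # grows with depth
    print("plain tanh, weight 0.1:", plain_tanh_gradient(0.1, depth))
    print("residual tanh, weight 0.1:", residual_tanh_gradient(0.1, depth))
```

In this toy setting the plain chain multiplies 30 factors that are each at most 0.1, so almost no gradient reaches the first layer, while the shortcut version multiplies factors close to 1 and keeps the gradient at a usable scale. Real networks have matrix-valued Jacobians and the picture is less tidy, so treat this only as intuition for why identity shortcuts make very deep networks easier to train.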