SlideShare a Scribd company logo
1 of 21
1
Roman Orac, 1Tap Machine Learning & Data Analysis
A Gentle introduction to Machine Learning
1Tap is a Automated Accounting Platform
For the Self Employed*
* Sole Trader, Sole Proprietor, Freelancer, Contractor, Independent, Non Incorporated
Businesses
Fully
The Self Employed can’t buy the stuff they want
Profit…Welfare…
Taxes…
No idea
That is a problem for
the new year...
Denied...
Hopefully I get better
real soon...
Credit…
6
Making Self Employment
> Employment
Our Mission
1Tap Receipts
Take a photo Data Extracted Tax Return updated Customers Love it
1 2 3 4
The foundation of our apps
Ruby on Rails
Restful JSON API
4.0 Code Climate GPA
Enough about us …
What is Machine Learning Anyway?
What is Machine Learning?
Training data
Machine Learning
algorithm
ClassifierNew samples Prediction
Pre-processing
● Machine Learning is the science of getting computers to act without
being explicitly programmed
Predict survival on the Titanic
In 1912 the Titanic sank, killing 1,502 out
of 2,224 passengers and crew.
Some groups of people were more
likely to survive than others.
Let’s look at the data
Abbreviations
● Embarked: Port of embarkation
○ C = Cherbourg
○ Q = Queenstown
○ S = Southampton
● Parch: Number of parents/children
aboard
● Pclass: Passenger's class
● SibSp: Number of siblings/spouses
aboard
● Survived: Survived (1) or died (0)
● Ticket: Ticket number
Understanding the data
● Distributions of the fare of passengers who survived or did
not survive
● Many passengers with cheaper fares died
● Is fare a good predictive variable?
Most Important Step: Data preprocessing
Original data Preprocessed data
preprocessing
● Clean the data
● Encode attributes
● Fill in missing values
● Add new attributes
Decision Tree
● Use training set and build a decision tree model
● Use the model to predict new samples
What types of problems do we solve with
ML at 1Tap?
Receipt categorization
Initial receipt categorization
based on company’s industry
deterministic categorization
many mis-categorization
The Numbers
600K categorized receipts
40K users
80K new receipts every month
Receipt categorization with ML
Categorizing receipts in a smarter and more contextual way
● Features:
○ user’s profession
○ vendor name, date, expense total and text
● Preprocessing:
○ Filter receipts
○ Recategorize most obvious receipts
● Train a classifier that categorizes receipts
● This approach improves categorization as receipt text adds more context
Receipt categorization with ML
Questions?
Come talk to
us over pizza!
Nejc, Human
Resources
Roman, Machine
Learning
Vesna, Head of
Product

More Related Content

Viewers also liked

Placement of BPM runtime components in an SOA environment
Placement of BPM runtime components in an SOA environmentPlacement of BPM runtime components in an SOA environment
Placement of BPM runtime components in an SOA environmentKim Clark
 
How to Triple Your Speed of Development Using Automation
How to Triple Your Speed of Development Using AutomationHow to Triple Your Speed of Development Using Automation
How to Triple Your Speed of Development Using AutomationAllCloud
 
Deloitte BPM case study by WorkflowGen
Deloitte BPM case study by WorkflowGenDeloitte BPM case study by WorkflowGen
Deloitte BPM case study by WorkflowGenAlain Bezançon
 
IBM Connections 4.5 Integration - From Zero To Social Hero - 2.0 - with Domin...
IBM Connections 4.5 Integration - From Zero To Social Hero - 2.0 - with Domin...IBM Connections 4.5 Integration - From Zero To Social Hero - 2.0 - with Domin...
IBM Connections 4.5 Integration - From Zero To Social Hero - 2.0 - with Domin...Frank Altenburg
 
AI & Machine Learning - Webinar Deck
AI & Machine Learning - Webinar DeckAI & Machine Learning - Webinar Deck
AI & Machine Learning - Webinar DeckThe Digital Insurer
 
ExpertsLive Asia Pacific 2017 - Planning and Deploying SharePoint Server 2016...
ExpertsLive Asia Pacific 2017 - Planning and Deploying SharePoint Server 2016...ExpertsLive Asia Pacific 2017 - Planning and Deploying SharePoint Server 2016...
ExpertsLive Asia Pacific 2017 - Planning and Deploying SharePoint Server 2016...Thuan Ng
 
Machine Learning Application to Manufacturing using Tableau and Google by Pluto7
Machine Learning Application to Manufacturing using Tableau and Google by Pluto7Machine Learning Application to Manufacturing using Tableau and Google by Pluto7
Machine Learning Application to Manufacturing using Tableau and Google by Pluto7Manju Devadas
 
Practical Strategies to Designing Beautiful Portals
Practical Strategies to Designing Beautiful PortalsPractical Strategies to Designing Beautiful Portals
Practical Strategies to Designing Beautiful PortalsKanwal Khipple
 
Practical Strategies for Transitioning to Office 365 #sptechcon
Practical Strategies for Transitioning to Office 365 #sptechconPractical Strategies for Transitioning to Office 365 #sptechcon
Practical Strategies for Transitioning to Office 365 #sptechconKanwal Khipple
 
Operations Playbook: Monitoring and Automation - RightScale Compute 2013
Operations Playbook: Monitoring and Automation - RightScale Compute 2013Operations Playbook: Monitoring and Automation - RightScale Compute 2013
Operations Playbook: Monitoring and Automation - RightScale Compute 2013RightScale
 
Case Study for Project Management System Using Sharepoint
Case Study for Project Management System Using SharepointCase Study for Project Management System Using Sharepoint
Case Study for Project Management System Using SharepointMike Taylor
 
Entrepreneurship with Data, Machine Learning and AI
Entrepreneurship with Data, Machine Learning and AIEntrepreneurship with Data, Machine Learning and AI
Entrepreneurship with Data, Machine Learning and AIJesus Ramos
 
NIC 2017 Azure AD Identity Protection and Conditional Access: Using the Micro...
NIC 2017 Azure AD Identity Protection and Conditional Access: Using the Micro...NIC 2017 Azure AD Identity Protection and Conditional Access: Using the Micro...
NIC 2017 Azure AD Identity Protection and Conditional Access: Using the Micro...Morgan Simonsen
 
Automation Technology Series: Part 2: Intelligent automation: Driving efficie...
Automation Technology Series: Part 2: Intelligent automation: Driving efficie...Automation Technology Series: Part 2: Intelligent automation: Driving efficie...
Automation Technology Series: Part 2: Intelligent automation: Driving efficie...Accenture Insurance
 
Ansible- Durham Meetup: Using Ansible for Cisco ACI deployment
Ansible- Durham Meetup: Using Ansible for Cisco ACI deploymentAnsible- Durham Meetup: Using Ansible for Cisco ACI deployment
Ansible- Durham Meetup: Using Ansible for Cisco ACI deploymentJoel W. King
 
SEM Performance with Machine Learning
SEM Performance with Machine LearningSEM Performance with Machine Learning
SEM Performance with Machine LearningAcquisio
 
Closing with Coffee: Energizing and Engaging Target Accounts
Closing with Coffee: Energizing and Engaging Target AccountsClosing with Coffee: Energizing and Engaging Target Accounts
Closing with Coffee: Energizing and Engaging Target AccountsTerminus
 
Microsoft PPM tool (Project Online / Project Server) Case Study by epmsolutio...
Microsoft PPM tool (Project Online / Project Server) Case Study by epmsolutio...Microsoft PPM tool (Project Online / Project Server) Case Study by epmsolutio...
Microsoft PPM tool (Project Online / Project Server) Case Study by epmsolutio...Sophia Zhou
 

Viewers also liked (18)

Placement of BPM runtime components in an SOA environment
Placement of BPM runtime components in an SOA environmentPlacement of BPM runtime components in an SOA environment
Placement of BPM runtime components in an SOA environment
 
How to Triple Your Speed of Development Using Automation
How to Triple Your Speed of Development Using AutomationHow to Triple Your Speed of Development Using Automation
How to Triple Your Speed of Development Using Automation
 
Deloitte BPM case study by WorkflowGen
Deloitte BPM case study by WorkflowGenDeloitte BPM case study by WorkflowGen
Deloitte BPM case study by WorkflowGen
 
IBM Connections 4.5 Integration - From Zero To Social Hero - 2.0 - with Domin...
IBM Connections 4.5 Integration - From Zero To Social Hero - 2.0 - with Domin...IBM Connections 4.5 Integration - From Zero To Social Hero - 2.0 - with Domin...
IBM Connections 4.5 Integration - From Zero To Social Hero - 2.0 - with Domin...
 
AI & Machine Learning - Webinar Deck
AI & Machine Learning - Webinar DeckAI & Machine Learning - Webinar Deck
AI & Machine Learning - Webinar Deck
 
ExpertsLive Asia Pacific 2017 - Planning and Deploying SharePoint Server 2016...
ExpertsLive Asia Pacific 2017 - Planning and Deploying SharePoint Server 2016...ExpertsLive Asia Pacific 2017 - Planning and Deploying SharePoint Server 2016...
ExpertsLive Asia Pacific 2017 - Planning and Deploying SharePoint Server 2016...
 
Machine Learning Application to Manufacturing using Tableau and Google by Pluto7
Machine Learning Application to Manufacturing using Tableau and Google by Pluto7Machine Learning Application to Manufacturing using Tableau and Google by Pluto7
Machine Learning Application to Manufacturing using Tableau and Google by Pluto7
 
Practical Strategies to Designing Beautiful Portals
Practical Strategies to Designing Beautiful PortalsPractical Strategies to Designing Beautiful Portals
Practical Strategies to Designing Beautiful Portals
 
Practical Strategies for Transitioning to Office 365 #sptechcon
Practical Strategies for Transitioning to Office 365 #sptechconPractical Strategies for Transitioning to Office 365 #sptechcon
Practical Strategies for Transitioning to Office 365 #sptechcon
 
Operations Playbook: Monitoring and Automation - RightScale Compute 2013
Operations Playbook: Monitoring and Automation - RightScale Compute 2013Operations Playbook: Monitoring and Automation - RightScale Compute 2013
Operations Playbook: Monitoring and Automation - RightScale Compute 2013
 
Case Study for Project Management System Using Sharepoint
Case Study for Project Management System Using SharepointCase Study for Project Management System Using Sharepoint
Case Study for Project Management System Using Sharepoint
 
Entrepreneurship with Data, Machine Learning and AI
Entrepreneurship with Data, Machine Learning and AIEntrepreneurship with Data, Machine Learning and AI
Entrepreneurship with Data, Machine Learning and AI
 
NIC 2017 Azure AD Identity Protection and Conditional Access: Using the Micro...
NIC 2017 Azure AD Identity Protection and Conditional Access: Using the Micro...NIC 2017 Azure AD Identity Protection and Conditional Access: Using the Micro...
NIC 2017 Azure AD Identity Protection and Conditional Access: Using the Micro...
 
Automation Technology Series: Part 2: Intelligent automation: Driving efficie...
Automation Technology Series: Part 2: Intelligent automation: Driving efficie...Automation Technology Series: Part 2: Intelligent automation: Driving efficie...
Automation Technology Series: Part 2: Intelligent automation: Driving efficie...
 
Ansible- Durham Meetup: Using Ansible for Cisco ACI deployment
Ansible- Durham Meetup: Using Ansible for Cisco ACI deploymentAnsible- Durham Meetup: Using Ansible for Cisco ACI deployment
Ansible- Durham Meetup: Using Ansible for Cisco ACI deployment
 
SEM Performance with Machine Learning
SEM Performance with Machine LearningSEM Performance with Machine Learning
SEM Performance with Machine Learning
 
Closing with Coffee: Energizing and Engaging Target Accounts
Closing with Coffee: Energizing and Engaging Target AccountsClosing with Coffee: Energizing and Engaging Target Accounts
Closing with Coffee: Energizing and Engaging Target Accounts
 
Microsoft PPM tool (Project Online / Project Server) Case Study by epmsolutio...
Microsoft PPM tool (Project Online / Project Server) Case Study by epmsolutio...Microsoft PPM tool (Project Online / Project Server) Case Study by epmsolutio...
Microsoft PPM tool (Project Online / Project Server) Case Study by epmsolutio...
 

Similar to Gentle introduction to Machine Learning

SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)
SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)
SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)Laura Chiticariu
 
Data science Applications in the Enterprise
Data science Applications in the EnterpriseData science Applications in the Enterprise
Data science Applications in the EnterpriseSrinath Perera
 
"The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor...
"The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor..."The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor...
"The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor...Quantopian
 
First steps in Data Mining Kindergarten
First steps in Data Mining KindergartenFirst steps in Data Mining Kindergarten
First steps in Data Mining KindergartenAlexey Zinoviev
 
Machine Learning: What Assurance Professionals Need to Know
Machine Learning: What Assurance Professionals Need to Know Machine Learning: What Assurance Professionals Need to Know
Machine Learning: What Assurance Professionals Need to Know Andrew Clark
 
Artificial Intelligence and Antitrust (Hal Varian)
Artificial Intelligence and Antitrust (Hal Varian)Artificial Intelligence and Antitrust (Hal Varian)
Artificial Intelligence and Antitrust (Hal Varian)FSR Communications and Media
 
Tips to get the most out of OpenERP. Jean Luc Delsaute & Coralie Girardet, Au...
Tips to get the most out of OpenERP. Jean Luc Delsaute & Coralie Girardet, Au...Tips to get the most out of OpenERP. Jean Luc Delsaute & Coralie Girardet, Au...
Tips to get the most out of OpenERP. Jean Luc Delsaute & Coralie Girardet, Au...Odoo
 
Tips to get the most out of OpenERP
Tips to get the most out of OpenERPTips to get the most out of OpenERP
Tips to get the most out of OpenERPAudaxis
 
GDG DEvFest Hellas 2020 - Automated ML - Panagiotis Papaemmanouil
GDG DEvFest Hellas 2020 -  Automated ML - Panagiotis PapaemmanouilGDG DEvFest Hellas 2020 -  Automated ML - Panagiotis Papaemmanouil
GDG DEvFest Hellas 2020 - Automated ML - Panagiotis PapaemmanouilPanagiotis Papaemmanouil
 
Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Rittman Analytics
 
Introduction to machine learning and applications (1)
Introduction to machine learning and applications (1)Introduction to machine learning and applications (1)
Introduction to machine learning and applications (1)Manjunath Sindagi
 
From DBA to DE: Becoming a Data Engineer
From DBA to DE:  Becoming a Data Engineer From DBA to DE:  Becoming a Data Engineer
From DBA to DE: Becoming a Data Engineer Jim Czuprynski
 
Neo4j GraphTalk Copenhagen - Next Generation Solutions using Neo4j
Neo4j GraphTalk Copenhagen - Next Generation Solutions using Neo4j Neo4j GraphTalk Copenhagen - Next Generation Solutions using Neo4j
Neo4j GraphTalk Copenhagen - Next Generation Solutions using Neo4j Neo4j
 
Final project presentation group2
Final project presentation   group2Final project presentation   group2
Final project presentation group2VikalpUpadhyay1
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practiceVivek Murugesan
 
Data analytics in fraud detection and customer feedback
Data analytics in fraud detection and customer feedbackData analytics in fraud detection and customer feedback
Data analytics in fraud detection and customer feedbackAnkit Jain
 
Merchant Churn Prediction Using SparkML at PayPal with Chetan Nadgire and Ani...
Merchant Churn Prediction Using SparkML at PayPal with Chetan Nadgire and Ani...Merchant Churn Prediction Using SparkML at PayPal with Chetan Nadgire and Ani...
Merchant Churn Prediction Using SparkML at PayPal with Chetan Nadgire and Ani...Databricks
 
How to Make Cars Smarter: A Step Towards Self-Driving Cars
How to Make Cars Smarter: A Step Towards Self-Driving CarsHow to Make Cars Smarter: A Step Towards Self-Driving Cars
How to Make Cars Smarter: A Step Towards Self-Driving CarsVMware Tanzu
 
Guide for a Data Scientist
Guide for a Data ScientistGuide for a Data Scientist
Guide for a Data ScientistRohit Dubey
 

Similar to Gentle introduction to Machine Learning (20)

SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)
SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)
SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)
 
Data science Applications in the Enterprise
Data science Applications in the EnterpriseData science Applications in the Enterprise
Data science Applications in the Enterprise
 
"The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor...
"The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor..."The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor...
"The Hunt For Alpha Among Alternative Data Sources" by Dr. Michael Halls-Moor...
 
First steps in Data Mining Kindergarten
First steps in Data Mining KindergartenFirst steps in Data Mining Kindergarten
First steps in Data Mining Kindergarten
 
Machine Learning: What Assurance Professionals Need to Know
Machine Learning: What Assurance Professionals Need to Know Machine Learning: What Assurance Professionals Need to Know
Machine Learning: What Assurance Professionals Need to Know
 
Artificial Intelligence and Antitrust (Hal Varian)
Artificial Intelligence and Antitrust (Hal Varian)Artificial Intelligence and Antitrust (Hal Varian)
Artificial Intelligence and Antitrust (Hal Varian)
 
Tips to get the most out of OpenERP. Jean Luc Delsaute & Coralie Girardet, Au...
Tips to get the most out of OpenERP. Jean Luc Delsaute & Coralie Girardet, Au...Tips to get the most out of OpenERP. Jean Luc Delsaute & Coralie Girardet, Au...
Tips to get the most out of OpenERP. Jean Luc Delsaute & Coralie Girardet, Au...
 
Tips to get the most out of OpenERP
Tips to get the most out of OpenERPTips to get the most out of OpenERP
Tips to get the most out of OpenERP
 
GDG DEvFest Hellas 2020 - Automated ML - Panagiotis Papaemmanouil
GDG DEvFest Hellas 2020 -  Automated ML - Panagiotis PapaemmanouilGDG DEvFest Hellas 2020 -  Automated ML - Panagiotis Papaemmanouil
GDG DEvFest Hellas 2020 - Automated ML - Panagiotis Papaemmanouil
 
Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17
 
Introduction to machine learning and applications (1)
Introduction to machine learning and applications (1)Introduction to machine learning and applications (1)
Introduction to machine learning and applications (1)
 
3 types of monitoring for 2020
3 types of monitoring for 20203 types of monitoring for 2020
3 types of monitoring for 2020
 
From DBA to DE: Becoming a Data Engineer
From DBA to DE:  Becoming a Data Engineer From DBA to DE:  Becoming a Data Engineer
From DBA to DE: Becoming a Data Engineer
 
Neo4j GraphTalk Copenhagen - Next Generation Solutions using Neo4j
Neo4j GraphTalk Copenhagen - Next Generation Solutions using Neo4j Neo4j GraphTalk Copenhagen - Next Generation Solutions using Neo4j
Neo4j GraphTalk Copenhagen - Next Generation Solutions using Neo4j
 
Final project presentation group2
Final project presentation   group2Final project presentation   group2
Final project presentation group2
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practice
 
Data analytics in fraud detection and customer feedback
Data analytics in fraud detection and customer feedbackData analytics in fraud detection and customer feedback
Data analytics in fraud detection and customer feedback
 
Merchant Churn Prediction Using SparkML at PayPal with Chetan Nadgire and Ani...
Merchant Churn Prediction Using SparkML at PayPal with Chetan Nadgire and Ani...Merchant Churn Prediction Using SparkML at PayPal with Chetan Nadgire and Ani...
Merchant Churn Prediction Using SparkML at PayPal with Chetan Nadgire and Ani...
 
How to Make Cars Smarter: A Step Towards Self-Driving Cars
How to Make Cars Smarter: A Step Towards Self-Driving CarsHow to Make Cars Smarter: A Step Towards Self-Driving Cars
How to Make Cars Smarter: A Step Towards Self-Driving Cars
 
Guide for a Data Scientist
Guide for a Data ScientistGuide for a Data Scientist
Guide for a Data Scientist
 

Recently uploaded

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Recently uploaded (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Gentle introduction to Machine Learning

  • 1. 1 Roman Orac, 1Tap Machine Learning & Data Analysis A Gentle introduction to Machine Learning
  • 2. 1Tap is a Automated Accounting Platform For the Self Employed* * Sole Trader, Sole Proprietor, Freelancer, Contractor, Independent, Non Incorporated Businesses Fully
  • 3. The Self Employed can’t buy the stuff they want Profit…Welfare… Taxes… No idea That is a problem for the new year... Denied... Hopefully I get better real soon... Credit… 6
  • 4. Making Self Employment > Employment Our Mission
  • 5. 1Tap Receipts Take a photo Data Extracted Tax Return updated Customers Love it 1 2 3 4
  • 6. The foundation of our apps Ruby on Rails Restful JSON API 4.0 Code Climate GPA
  • 7. Enough about us … What is Machine Learning Anyway?
  • 8. What is Machine Learning? Training data Machine Learning algorithm ClassifierNew samples Prediction Pre-processing ● Machine Learning is the science of getting computers to act without being explicitly programmed
  • 9. Predict survival on the Titanic In 1912 the Titanic sank, killing 1,502 out of 2,224 passengers and crew. Some groups of people were more likely to survive than others.
  • 10. Let’s look at the data Abbreviations ● Embarked: Port of embarkation ○ C = Cherbourg ○ Q = Queenstown ○ S = Southampton ● Parch: Number of parents/children aboard ● Pclass: Passenger's class ● SibSp: Number of siblings/spouses aboard ● Survived: Survived (1) or died (0) ● Ticket: Ticket number
  • 11. Understanding the data ● Distributions of the fare of passengers who survived or did not survive ● Many passengers with cheaper fares died ● Is fare a good predictive variable?
  • 12. Most Important Step: Data preprocessing Original data Preprocessed data preprocessing ● Clean the data ● Encode attributes ● Fill in missing values ● Add new attributes
  • 13. Decision Tree ● Use training set and build a decision tree model ● Use the model to predict new samples
  • 14. What types of problems do we solve with ML at 1Tap?
  • 15. Receipt categorization Initial receipt categorization based on company’s industry deterministic categorization many mis-categorization The Numbers 600K categorized receipts 40K users 80K new receipts every month
  • 16. Receipt categorization with ML Categorizing receipts in a smarter and more contextual way
  • 17. ● Features: ○ user’s profession ○ vendor name, date, expense total and text ● Preprocessing: ○ Filter receipts ○ Recategorize most obvious receipts ● Train a classifier that categorizes receipts ● This approach improves categorization as receipt text adds more context Receipt categorization with ML
  • 19.
  • 20.
  • 21. Come talk to us over pizza! Nejc, Human Resources Roman, Machine Learning Vesna, Head of Product