Copyright © 2018 Information Systems Audit and Control Association, Inc. All rights reserved.
Andrew Clark, GStat, CAP, AWS Solutions Architect – Associate
Principal, Machine Learning Audit, Capital One
Machine Learning for Auditors
About me
• B.S. in Business Administration with a concentration in Accounting, Summa Cum Laude, from the University of Tennessee at Chattanooga.
• M.S. in Data Science from Southern Methodist University.
• American Statistical Association Graduate Statistician (GStat), INFORMS Certified Analytics Professional (CAP), and AWS Certified Solutions Architect – Associate.
• Has designed, built, and deployed numerous machine learning and continuous auditing solutions using open source technologies.
Overview
• What is machine learning?
• Why is it important?
• What do all of the buzzwords mean?
• What are the two broad types of machine learning?
• Non-technical introduction to modeling
• Examples
• How does it pertain to auditors?
• Security issues
• Case studies
• What would a machine learning audit entail?
• Where can I learn more about machine learning?
Kong, Qingkai. "Machine Learning 1 - What is machine learning and real world example." Qingkai's Blog (web log), October 4, 2016. Accessed February 21, 2017.
http://qingkaikong.blogspot.com/2016/10/machine-learning-1-what-is-machine.html?showComment=1484689212391#c4748865641151946089.
What is Machine Learning?
A computer learning to recognize patterns from data without being explicitly programmed.
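The toy sketch below illustrates the definition in plain Python. It contrasts a rule a person hard-codes with a rule learned from labeled examples. The expense amounts, labels, and function names are all hypothetical. The "learning" step is a one-split decision stump, which is about the simplest learner there is.

```python
# Hypothetical history: (expense amount in dollars, later confirmed as a problem?)
history = [(120, False), (95, False), (300, False), (2400, True),
           (1800, True), (410, False), (2900, True)]


def explicit_rule(amount):
    """Explicitly programmed: a person chose the $1,000 cutoff by hand."""
    return amount > 1000


def learn_threshold(examples):
    """Learned from data: try each observed amount as a cutoff and keep the
    one that misclassifies the fewest labeled examples (a decision stump)."""
    best_cutoff, best_errors = None, len(examples) + 1
    for cutoff in sorted(amount for amount, _ in examples):
        errors = sum((amount > cutoff) != label for amount, label in examples)
        if errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff


cutoff = learn_threshold(history)
print(f"learned cutoff: flag amounts above ${cutoff}")  # learned cutoff: flag amounts above $410
```

If next year's labeled history looks different, re-running `learn_threshold` moves the cutoff. Nobody has to rewrite the rule by hand.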
Why is Machine Learning Important?
• Disrupting business: ML-powered businesses have disrupted Blockbuster, the taxi industry, and others.
• Revolutionizing existing business models: predictive maintenance in manufacturing, retail, credit card fraud detection, and loan underwriting.
• One of the key technologies driving economic growth.
• One of the most talked-about but least understood topics in modern discourse.
• “Facebook shuts down robots after they invent their own language” (The Telegraph August 1, 2017)
• “Elon Musk: regulate AI to combat 'existential threat' before it's too late” (The Guardian July 17, 2017).
What Machine Learning is not
• Magic
• Going to take your job (for the majority of professionals)
• Always the best tool for the job
What do all these buzzwords mean?
“Machine Learning based, artificial intelligent, Big Data spewing, Deep Learning,
Neural Network touting, Cognitive Computing, Virtual Reality Natural Language
Processing,…Chat Bot.”
Two broad types of machine learning
• Supervised
• Unsupervised
Supervised Learning
• Given a labeled dataset (e.g., ‘fraud’ / ‘not fraud’), the algorithm is ‘trained’ to recognize which items are fraud and which are not.
• Examples:
• Transaction fraud detection
• Classifying images: dog/not dog
• Common techniques include:
• Logistic Regression
• Support Vector Machines
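Below is a minimal supervised-learning sketch: logistic regression trained by batch gradient descent, written in plain Python so every step is visible. The two features (amount in $1,000s and transactions in the past hour), the eight labeled rows, and the learning-rate and epoch settings are all hypothetical. In practice a tested library class such as scikit-learn's `LogisticRegression` would be the usual choice.

```python
import math

# Hypothetical labeled training data: (amount in $1,000s, transactions in past hour)
X = [(0.2, 1), (0.5, 2), (0.1, 1), (0.8, 1), (4.0, 6), (3.5, 5), (5.0, 8), (4.2, 7)]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = fraud, 0 = not fraud


def sigmoid(z):
    """Numerically stable logistic function mapping any score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss.
    The gradient for each example is (predicted probability - label) * features."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that transaction x is fraud."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


w, b = train(X, y)
print([round(predict_proba(w, b, x), 3) for x in [(0.3, 1), (4.5, 6)]])
```

The "training" is nothing more than repeatedly nudging the weights to reduce error on the labeled examples. A support vector machine differs in the loss it minimizes, not in this basic idea.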
Unsupervised Learning
• Given cleaned but unlabeled data, the algorithm divides the data into groups of similar items.
• Examples:
• Pattern recognition
• Anomaly detection
• Clustering
• Popular models:
• K-means
• Gaussian mixture models
• DBSCAN
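Below is a bare-bones sketch of k-means (Lloyd's algorithm) in plain Python. It groups hypothetical vendors by monthly invoice count and average invoice size. The data, the choice of k = 2, and the hand-picked starting centroids are assumptions that keep the example deterministic. Library implementations (for example scikit-learn's `KMeans`) add smarter initialization and convergence checks.

```python
# Hypothetical vendors: (invoices per month, average invoice in $1,000s)
vendors = [(2, 1.0), (3, 1.2), (2, 0.8), (20, 9.0), (22, 10.0), (21, 9.5)]


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, initial_centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and moving
    each centroid to the mean of the points assigned to it."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            # Keep an empty cluster's centroid where it was.
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members))
                           if members else old)
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


centroids, labels = kmeans(vendors, initial_centroids=[vendors[0], vendors[-1]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No labels were supplied. The algorithm finds the small-vendor and large-vendor groups on its own, and a point far from every centroid would be a natural anomaly candidate.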
A non-technical introduction
• The steps, when strung together, are called a pipeline:
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment
Kearn, Martin. "Machine Learning is for Muggles too!" Microsoft Developer (web log), March 1, 2016. Accessed February 21, 2017.
https://blogs.msdn.microsoft.com/martinkearn/2016/03/01/machine-learning-is-for-muggles-too/.
Business Understanding
• The most important step – ‘The Why’
Why is this needed and what is the desired outcome?
Data Understanding
• An understanding of where the data is coming from is key to good modeling
• SQL relational database? NoSQL database? CSV, TXT, web page, tweets?
• What scale is the data on? For example, Celsius or Fahrenheit?
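One way to handle the scale question in code is to normalize every record to a single unit before it reaches the model. This sketch assumes each hypothetical reading arrives tagged with its unit ("C" or "F"). Real source systems may not label units at all, which is exactly why this question belongs in data understanding.

```python
def to_celsius(value, unit):
    """Convert a temperature to Celsius; refuse units we don't recognize."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown unit: {unit!r}")


# Hypothetical readings pulled from two source systems with different conventions
readings = [(21.0, "C"), (69.8, "F"), (22.5, "C"), (212.0, "F")]
celsius = [round(to_celsius(value, unit), 2) for value, unit in readings]
print(celsius)  # [21.0, 21.0, 22.5, 100.0]
```

Without the conversion, a model would treat 69.8 and 21.0 as very different values even though they describe the same temperature.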
Data Preparation
Currently, roughly 80% of what data scientists do (per the survey cited below):
• ‘Munging’
• Data scaling
• Variable selection
• Dividing the data into training and test sets
“I’m a data janitor. That’s the sexiest job of the 21st century. It’s very flattering, but it’s
also a little baffling.” – Josh Wills, Head of Data Engineering @ Slack
Press, Gil. "Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says." Forbes. March 23, 2016. Accessed March 13, 2017.
https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#21e789136f63.
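Two of the preparation steps above, the train/test split and scaling, are sketched below in plain Python. The amounts, the 25% test fraction, the seed, and the helper names are hypothetical. One design point is worth auditing: the scaler is fitted on the training set only, so no information from the test set leaks into the model.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=42):
    """Shuffle reproducibly (fixed seed), then hold out test_fraction of rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_min_max(values):
    """Learn min and max from the training data; return a scaling function."""
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero when all values match
    return lambda v: (v - low) / span


# Hypothetical invoice amounts
amounts = [120.0, 95.0, 300.0, 2400.0, 1800.0, 410.0, 2900.0, 75.0]
train, test = split_train_test(amounts)
scale = fit_min_max(train)               # fitted on the training set only
train_scaled = [scale(v) for v in train]
test_scaled = [scale(v) for v in test]   # may fall outside [0, 1]; that is expected
print(len(train), len(test))  # 6 2
```

Fixing the seed makes the split reproducible, which lets an auditor re-create exactly the data the model was trained and tested on.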
Modeling
Evaluation
• Accuracy
• Precision
• Recall
• Does the model solve the problem?
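Accuracy alone can mislead when the thing being hunted is rare, as fraud usually is. The sketch below computes all three metrics from scratch. The two "models" are hypothetical prediction lists for 100 transactions, 2 of which are fraud. Precision is reported as 0.0 when nothing is flagged; that is a convention chosen here.

```python
def evaluate(actual, predicted):
    """Return (accuracy, precision, recall) for 0/1 labels, where 1 = fraud."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of items flagged, share truly fraud
    recall = tp / (tp + fn) if tp + fn else 0.0     # of true fraud, share caught
    return accuracy, precision, recall


actual = [1, 1] + [0] * 98                # hypothetical: 2 frauds in 100 transactions

lazy_model = [0] * 100                    # never flags anything
flagging_model = [1, 1, 1, 1] + [0] * 96  # flags 4 items, 2 of them real fraud

print(evaluate(actual, lazy_model))      # (0.98, 0.0, 0.0)
print(evaluate(actual, flagging_model))  # (0.98, 0.5, 1.0)
```

Both models score 98% accuracy, yet only the second catches any fraud. That is why the last question on the slide, whether the model solves the problem, matters most.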
Deployment
• Integrated into existing infrastructure or application?
• Separate web application?
• Scheduled job?
• Run ad hoc?
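For the scheduled-job option, one simple pattern works like this: train once, save the model's parameters to a file, and have a batch job load them and score new records. This sketch assumes a logistic regression whose parameters are stored as JSON. The file name, weights, and bias are hypothetical. scikit-learn's documentation, for comparison, discusses persisting fitted estimators with pickle or joblib.

```python
import json
import math
import os
import tempfile


def save_model(path, weights, bias):
    """Persist model parameters as plain, reviewable JSON."""
    with open(path, "w") as f:
        json.dump({"weights": weights, "bias": bias}, f)


def score_batch(path, rows, threshold=0.5):
    """What a nightly job might run: load the saved model, flag each row."""
    with open(path) as f:
        model = json.load(f)
    flags = []
    for row in rows:
        z = sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
        # Numerically stable logistic function
        probability = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
        flags.append(probability >= threshold)
    return flags


with tempfile.TemporaryDirectory() as folder:
    model_path = os.path.join(folder, "fraud_model.json")  # hypothetical file name
    save_model(model_path, weights=[1.5, 0.8], bias=-6.0)  # hypothetical parameters
    flags = score_batch(model_path, [(0.5, 1), (4.0, 6)])

print(flags)  # [False, True]
```

A plain-text model file also helps the audit. The deployed parameters can be versioned, diffed, and compared with what was approved.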
How Machine Learning works: the animal version
As an auditor, what does this mean for you?
• New opportunities and risks
• Machine Learning control frameworks
• Catch-22: businesses must either accept the risk of black-box models or risk becoming irrelevant
• Use cases in audit analytics
• A more complicated environment; new skills are required to understand business implications and to audit algorithms
Machine Learning Security issues
Mikhailov, Emil, and Roman Trusov. "How Adversarial Attacks Work." Y Combinator. November 02, 2017. Accessed January 17, 2018.
http://blog.ycombinator.com/how-adversarial-attacks-work/?imm_mid=0f81cc&cmp=em-data-na-na-newsltr_20171115.
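The cited article covers attacks on image classifiers, but the core idea fits in a few lines on a linear model. This sketch applies the fast gradient sign method, one well-known attack, to a hypothetical logistic-regression fraud model. Each feature is nudged by a small epsilon in the direction that most increases the model's loss, and a transaction the model had flagged slips through. The weights, bias, transaction, and epsilon are all assumptions.

```python
import math


def fraud_probability(weights, bias, x):
    """Logistic-regression score, computed in a numerically stable way."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(weights, bias, x, true_label, epsilon):
    """Fast gradient sign method for logistic regression.

    The gradient of the log-loss with respect to the input x is
    (p - true_label) * w, so each feature moves epsilon in the sign of that term.
    """
    p = fraud_probability(weights, bias, x)
    return [xi + epsilon * sign((p - true_label) * w) for xi, w in zip(x, weights)]


weights, bias = [1.5, 0.8], -6.0   # hypothetical trained model
transaction = [2.5, 3.0]           # hypothetical, truly fraudulent (label 1)

adversarial = fgsm(weights, bias, transaction, true_label=1, epsilon=0.2)
print(round(fraud_probability(weights, bias, transaction), 3))  # above 0.5: flagged
print(round(fraud_probability(weights, bias, adversarial), 3))  # below 0.5: slips through
```

No feature moved by more than 0.2, yet the decision flipped. An attacker who knows or can probe the model can exploit exactly this sensitivity.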
Fun with Kim - Machine Learning issues
Bourdakos, Nick. "Capsule Networks Are Shaking up AI - Here's How to Use Them." Hacker Noon. November 10, 2017. Accessed January 17, 2018.
https://hackernoon.com/capsule-networks-are-shaking-up-ai-heres-how-to-use-them-c233a0971952?imm_mid=0f8530&cmp=em-data-na-na-newsltr_ai_20171127.
Use cases in Assurance and Compliance
• Anomaly detection
• Unsupervised journal entry anomaly detection
• Clustering on invoice and AP data for outliers
• Outlier user access
• ‘Auditor sense’ investigation
• Supervised model for expense report investigation
• Supervised model for journal entries
• AP transactions, customer transactions, etc.
• Document review
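For unsupervised journal entry anomaly detection, even a simple statistical screen can surface candidates for review. This sketch uses the modified z-score in place of the clustering models named earlier. It measures distance from the median in units of the median absolute deviation (MAD), with the commonly used 0.6745 constant and 3.5 cutoff. The journal entry amounts are hypothetical.

```python
import statistics


def modified_z_scores(amounts):
    """Robust z-scores: distance from the median in MAD units, so one huge
    entry cannot inflate the spread and hide itself."""
    median = statistics.median(amounts)
    mad = statistics.median([abs(a - median) for a in amounts])
    if mad == 0:
        # Most values equal the median; this sketch declines to score
        # rather than divide by zero.
        return [0.0] * len(amounts)
    return [0.6745 * (a - median) / mad for a in amounts]


def flag_entries(amounts, cutoff=3.5):
    """True for each entry whose absolute modified z-score exceeds the cutoff."""
    return [abs(z) > cutoff for z in modified_z_scores(amounts)]


# Hypothetical journal entry amounts for one account
entries = [1200, 1150, 1300, 1250, 1180, 1220, 98000, 1270]
print([a for a, flagged in zip(entries, flag_entries(entries)) if flagged])  # [98000]
```

Flagged entries are leads for an auditor to investigate, not findings. A supervised model trained on the outcomes of those investigations is the natural next step.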
The Machine Learning Algorithm Audit
• With algorithms increasingly dictating our lives, how do we know that they are operating as intended?
• Weapons of Math Destruction by Cathy O'Neil
• An unfilled role for assurance professionals
• Review assumptions and, when available, look at the feature weights in the model
• e.g., decision trees, logistic regression
• Even using only SDLC audit methodologies, auditors can provide a lot of value
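For interpretable models such as logistic regression, looking at the feature weights can be as simple as ranking the coefficients and converting them to odds ratios. The feature names and coefficients below are hypothetical. Raise one caveat in an audit: raw coefficients are only comparable across features measured on the same scale, so ask whether the inputs were standardized first.

```python
import math

# Hypothetical fitted logistic-regression coefficients, keyed by feature name
coefficients = {
    "amount_thousands": 1.5,
    "transactions_past_hour": 0.8,
    "is_weekend": -0.2,
}


def rank_features(coefs):
    """Sort features by absolute weight. exp(coef) is the odds ratio: the factor
    by which the odds of the positive class change per one-unit increase in
    the feature, holding the other features fixed."""
    ranked = sorted(coefs.items(), key=lambda item: abs(item[1]), reverse=True)
    return [(name, coef, math.exp(coef)) for name, coef in ranked]


for name, coef, odds_ratio in rank_features(coefficients):
    print(f"{name}: coefficient {coef:+.2f}, odds ratio {odds_ratio:.2f}")
```

A weight whose sign contradicts business expectations is a good audit question. So is a large weight on a feature that should not matter, such as a proxy for a protected attribute.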
Where can I learn more about Machine Learning?
• Visual Intro, highly recommended, short and sweet
http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
• Wikipedia
https://en.wikipedia.org/wiki/Machine_learning
• Good introductory article with some fantastic book recommendations
http://machinelearningmastery.com/4-steps-to-get-started-in-machine-learning/
• Weka
http://www.cs.waikato.ac.nz/ml/weka/
• Scikit-Learn
http://scikit-learn.org/
Conclusion
• Definition of Machine Learning
• Buzzword breakdown
• Broad algorithm overview
• Machine Learning process
• Real world use cases
• The Machine Learning Audit
• Where to learn more about Machine Learning
Questions?
https://photos.google.com/share/AF1QipPX0SCl7OzWilt9LnuQliattX4OUCj_8EP65_cTVnBmS1jnYgsGQAieQUc1VQWdgQ/photo/AF1QipNlJ6WstaF6chZe1nbnCHfTpg4e_cuGmgyxI-i-?key=aVBxWjhwSzg2RjJWLWRuVFBBZEN1d205bUdEMnhB
“At the age of six, I wanted to be a cook. At seven I wanted to be Napoleon. And my ambition has been growing steadily ever since.” – Salvador Dalí
“Every morning upon awakening I experience a supreme pleasure: that of being Salvador Dalí, and I ask myself, wonderstruck, what prodigious thing will he do today, this Salvador Dalí.” – Salvador Dalí
Thank you!
Contact
• Email: andrewtaylorclark@gmail.com
• GitHub: aclarkData
• Blog: https://aclarkdata.github.io/
• LinkedIn: www.linkedin.com/in/andrew-clark-b326b767

SFBA Splunk Usergroup meeting March 13, 2024Becky Burwell
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 

Dernier (17)

Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 

Machine Learning for Auditors

  • 1. Copyright © 2018 Information Systems Audit and Control Association, Inc. All rights reserved. Machine Learning for Auditors. Andrew Clark, GStat, CAP, AWS Solutions Architect – Associate; Principal, Machine Learning Audit, Capital One
  • 2. About me • B.S. in Business Administration with a concentration in Accounting, Summa Cum Laude, from the University of Tennessee at Chattanooga. • M.S. in Data Science from Southern Methodist University. • American Statistical Association Graduate Statistician (GStat), INFORMS Certified Analytics Professional (CAP) and AWS Certified Solutions Architect – Associate. • Has designed, built and deployed numerous machine learning and continuous auditing solutions using open source technologies.
  • 3. Overview • What is machine learning? • Why is it important? • What do all of the buzzwords mean? • What are the two broad types of machine learning? • Non-technical introduction to modeling • Examples • How does it pertain to auditors? • Security issues • Case studies • What would a machine learning audit entail? • Where can I learn more about machine learning? Kong, Qingkai. "Machine Learning 1 - What is machine learning and real world example." Qingkai's Blog (web log), October 4, 2016. Accessed February 21, 2017. http://qingkaikong.blogspot.com/2016/10/machine-learning-1-what-is-machine.html?showComment=1484689212391#c4748865641151946089.
  • 4. What is Machine Learning? A computer recognizing patterns without having to be explicitly programmed
  • 5. What is Machine Learning?
  • 7. Why is Machine Learning Important? • Disrupting business: for example, ML-powered businesses disrupted Blockbuster, taxis, etc. • Revolutionizing existing business models: predictive maintenance in manufacturing, retailing, credit card fraud detection, loan underwriting. • One of the key technologies driving economic growth. • One of the most talked about but least understood topics in modern discourse. • “Facebook shuts down robots after they invent their own language” (The Telegraph, August 1, 2017) • “Elon Musk: regulate AI to combat 'existential threat' before it's too late” (The Guardian, July 17, 2017).
  • 8. What Machine Learning is not • Magic • Going to take your job (for the majority of professionals) • Always the best tool for the job
  • 9. What do all these buzzwords mean? “Machine Learning based, artificial intelligent, Big Data spewing, Deep Learning, Neural Network touting, Cognitive Computing, Virtual Reality Natural Language Processing,…Chat Bot.”
  • 10. Two broad types of machine learning • Supervised • Unsupervised
  • 11. Supervised Learning • Given a labeled dataset (e.g., ‘fraud’/‘not fraud’), the algorithm is ‘trained’ to recognize which items are fraud and which are not. • Examples: • Transaction fraud detection • Classifying images: dog/not dog • Common techniques include: • Logistic regression • Support vector machines
  • 14. Unsupervised Learning • Given some cleaned, unlabeled data, the algorithm divides the data into groups of similar items. • Examples: • Pattern recognition • Anomaly detection • Clustering • Popular models: • K-means • Gaussian mixture models • DBSCAN
  • 16. A non-technical introduction • The steps, when strung together, are called a pipeline: • Business Understanding • Data Understanding • Data Preparation • Modeling • Evaluation • Deployment Kearn, Martin. "Machine Learning is for Muggles too!" Microsoft Developer (web log), March 1, 2016. Accessed February 21, 2017. https://blogs.msdn.microsoft.com/martinkearn/2016/03/01/machine-learning-is-for-muggles-too/.
  • 17. Business Understanding • The most important step: ‘the why’. Why is this needed, and what is the desired outcome?
  • 18. Data Understanding • Understanding where the data comes from is key to good modeling • SQL relational database? NoSQL database? CSV, TXT, web pages, tweets? • What scale is the data on? For example, Celsius or Fahrenheit?
  • 19. Data Preparation • Currently, close to 90% of what data scientists do • ‘Munging’ • Data scaling • Selecting variables • Dividing into test and train sets • “I’m a data janitor. That’s the sexiest job of the 21st century. It’s very flattering, but it’s also a little baffling.” – Josh Wills, Head of Data Engineering @ Slack. Press, Gil. "Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says." Forbes. March 23, 2016. Accessed March 13, 2017. https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#21e789136f63.
  • 20. Modeling
  • 21. Evaluation • Accuracy • Precision • Recall • Does the model solve the problem?
  • 22. Deployment • Integrated into existing infrastructure or application? • Separate web application? • Scheduled job? • Run ad hoc?
  • 23. THE ANIMAL VERSION How Machine Learning works
  • 30. As an auditor, what does this mean for you? • New opportunities and risks • Machine learning control frameworks • Catch-22: businesses must either accept the risk of black boxes or risk becoming irrelevant • Use cases in audit analytics • A more complicated environment, with new skills required to understand business implications and audit algorithms
  • 31. Machine Learning Security issues Mikhailov, Emil, and Roman Trusov. "How Adversarial Attacks Work." Y Combinator. November 2, 2017. Accessed January 17, 2018. http://blog.ycombinator.com/how-adversarial-attacks-work/?imm_mid=0f81cc&cmp=em-data-na-na-newsltr_20171115.
  • 32. Machine Learning Security issues cont. Mikhailov, Emil, and Roman Trusov. "How Adversarial Attacks Work." Y Combinator. November 2, 2017. Accessed January 17, 2018. http://blog.ycombinator.com/how-adversarial-attacks-work/?imm_mid=0f81cc&cmp=em-data-na-na-newsltr_20171115.
  • 33. Fun with Kim - Machine Learning issues Bourdakos, Nick. "Capsule Networks Are Shaking up AI - Here's How to Use Them." Hacker Noon. November 10, 2017. Accessed January 17, 2018. https://hackernoon.com/capsule-networks-are-shaking-up-ai-heres-how-to-use-them-c233a0971952?imm_mid=0f8530&cmp=em-data-na-na-newsltr_ai_20171127.
  • 34. Fun with Kim - Machine Learning issues cont. Bourdakos, Nick. "Capsule Networks Are Shaking up AI - Here's How to Use Them." Hacker Noon. November 10, 2017. Accessed January 17, 2018. https://hackernoon.com/capsule-networks-are-shaking-up-ai-heres-how-to-use-them-c233a0971952?imm_mid=0f8530&cmp=em-data-na-na-newsltr_ai_20171127.
  • 35. Fun with Kim - Machine Learning issues cont. Bourdakos, Nick. "Capsule Networks Are Shaking up AI - Here's How to Use Them." Hacker Noon. November 10, 2017. Accessed January 17, 2018. https://hackernoon.com/capsule-networks-are-shaking-up-ai-heres-how-to-use-them-c233a0971952?imm_mid=0f8530&cmp=em-data-na-na-newsltr_ai_20171127.
  • 36. Use cases in Assurance and Compliance • Anomaly detection • Unsupervised journal entry anomaly detection • Clustering on invoice and AP data for outliers • Outlier user access • ‘Auditor sense’ investigation • Supervised model for expense report investigation • Supervised model for journal entries • AP transactions, customer transactions, etc. • Document review
  • 37. The Machine Learning Algorithm Audit • With algorithms increasingly dictating our lives, how do we know that they are operating as intended? • Weapons of Math Destruction by Cathy O'Neil • An unfilled role for assurance professionals • Review assumptions and, when available, look at the weighting of features in the model • Decision tree, logistic regression, etc. • Can provide a lot of value even when using only SDLC audit methodologies
  • 38. Where can I learn more about Machine Learning? • Visual intro, highly recommended, short and sweet: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/ • Wikipedia: https://en.wikipedia.org/wiki/Machine_learning • A good beginner article with some fantastic book recommendations: http://machinelearningmastery.com/4-steps-to-get-started-in-machine-learning/ • Weka: http://www.cs.waikato.ac.nz/ml/weka/ • Scikit-Learn: http://scikit-learn.org/
  • 39. Conclusion • Definition of Machine Learning • Buzzword breakdown • Broad algorithm overview • Machine Learning process • Real-world use cases • The Machine Learning Audit • Where to learn more about Machine Learning
  • 40. Questions? https://photos.google.com/share/AF1QipPX0SCl7OzWilt9LnuQliattX4OUCj_8EP65_cTVnBmS1jnYgsGQAieQUc1VQWdgQ/photo/AF1QipNlJ6WstaF6chZe1nbnCHfTpg4e_cuGmgyxI-i-?key=aVBxWjhwSzg2RjJWLWRuVFBBZEN1d205bUdEMnhB “At the age of six, I wanted to be a cook. At seven I wanted to be Napoleon. And my ambition has been growing steadily ever since.” – Salvador Dalí “Every morning upon awakening I experience a supreme pleasure: that of being Salvador Dalí, and I ask myself, wonderstruck, what prodigious thing will he do today, this Salvador Dalí.” – Salvador Dalí
  • 41. Thank you!
  • 42. Contact • Email: andrewtaylorclark@gmail.com • GitHub: aclarkData • Blog: https://aclarkdata.github.io/ • LinkedIn: www.linkedin.com/in/andrew-clark-b326b767
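The supervised-learning pipeline on slides 11 and 16 to 21 (data preparation, modeling, evaluation) can be made concrete with a short pure-Python sketch. It is a minimal illustration, not the presenter's method. The toy "transactions" are invented and hypothetical, with two features: amount in $K and transactions that day. The function names are also illustrative. The sketch holds out a test set and scales features using training-set statistics only. It then fits a logistic regression by batch gradient descent, so the algorithm learns the feature weights and nobody sets them by hand. Finally it reports accuracy, precision and recall. In practice, scikit-learn's `train_test_split`, `StandardScaler`, `LogisticRegression`, `precision_score` and `recall_score` would replace the hand-written pieces.

```python
import math
import random

# Hypothetical toy data, invented for illustration:
# ((amount in $K, transactions that day), label) where 1 = fraud, 0 = not fraud.
DATA = [
    ((0.2, 1), 0), ((0.5, 2), 0), ((0.3, 1), 0), ((0.8, 3), 0),
    ((0.4, 2), 0), ((0.6, 1), 0), ((0.7, 2), 0), ((0.1, 1), 0),
    ((4.5, 9), 1), ((5.2, 8), 1), ((3.9, 10), 1), ((4.8, 7), 1),
    ((5.5, 9), 1), ((4.1, 8), 1),
]


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def fit_scaler(rows):
    """Learn per-feature mean and standard deviation from the training rows only."""
    columns = list(zip(*(x for x, _ in rows)))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0  # guard constant columns
        for col, m in zip(columns, means)
    ]
    return means, stds


def scale(x, means, stds):
    """Standardize one feature vector: (value - mean) / std."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, lr=0.5, epochs=500):
    """Logistic regression fitted by batch gradient descent on the log loss."""
    n = len(rows)
    w, b = [0.0] * len(rows[0][0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # p - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold else 0


def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged items that were fraud
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of actual fraud that was flagged
    return accuracy, precision, recall


train, test = train_test_split(DATA)
means, stds = fit_scaler(train)  # data preparation uses training statistics only
train_scaled = [(scale(x, means, stds), y) for x, y in train]
w, b = fit_logistic(train_scaled)
y_true = [y for _, y in test]
y_pred = [predict(w, b, scale(x, means, stds)) for x, _ in test]
print("accuracy=%.2f precision=%.2f recall=%.2f" % evaluate(y_true, y_pred))
```

The three metrics answer different questions:

- **Precision** asks: of the items the model flagged, how many were really fraud?
- **Recall** asks: of the real fraud, how much did the model flag?

For a rare event such as fraud, accuracy alone can mislead. With 99 legitimate items and 1 fraudulent one, a model that flags nothing is 99% accurate but has a recall of 0.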

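Slide 14 lists K-means among unsupervised models, and slide 36 mentions clustering invoice and AP data for outliers. The sketch below implements K-means (Lloyd's algorithm) in plain Python. It then flags members of very small clusters as candidates for auditor review. Several pieces are assumptions for illustration:

- The invoice records are hypothetical.
- The `min_size` flagging rule is one simple choice among many.
- The farthest-first initialization is used only to make the result deterministic.

Real data would normally be scaled first. As slide 18 hints, features on different scales (dollars versus days) distort distances.

```python
import math
from collections import Counter


def farthest_first_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add the point
    farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """K-means (Lloyd's algorithm): assign each point to its nearest centroid,
    move each centroid to the mean of its members, repeat until nothing changes."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def flag_small_clusters(points, labels, min_size):
    """Flag members of clusters smaller than min_size as candidates for review."""
    sizes = Counter(labels)
    return [p for p, label in zip(points, labels) if sizes[label] < min_size]


# Hypothetical invoice records, invented for illustration: (amount in $K, days to payment).
invoices = [
    (1.0, 30.0), (1.2, 28.0), (0.9, 31.0), (1.1, 29.0),
    (5.0, 45.0), (5.3, 44.0), (4.8, 46.0), (5.1, 45.0),
    (20.0, 2.0),
]
centroids, labels = kmeans(invoices, k=3)
flagged = flag_small_clusters(invoices, labels, min_size=2)
print(flagged)  # → [(20.0, 2.0)]
```

The flag is only a pointer for follow-up, not a finding. A large invoice paid in two days may be perfectly legitimate, which is why slide 36 pairs anomaly detection with ‘auditor sense’ investigation.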
Speaker notes

  1. The term machine learning was coined by Arthur Samuel in 1959, when he was working at IBM and wrote a paper called “Some Studies in Machine Learning Using the Game of Checkers” about how an algorithm could learn the optimal moves in a game of checkers by itself. Today I will provide an accessible overview of what machine learning is, how it works conceptually, and things to keep in mind when you begin to encounter it in the enterprise.
  2. Basically, statistics on steroids. I recently read an article where the author referred to machine learning as “statistics on a Mac”. That isn’t completely accurate, but the basics behind machine learning are not as “revolutionary” as one may think; they are the culmination of a “perfect storm” of statistics, ingenious mathematics, Moore’s law, distributed computing, cheap data storage, and the rise of the Silicon Valley firm. AI, of which machine learning is a subset, will not, as Elon Musk famously postulates, pose an existential threat to human existence, and will not replace the need for human workers. Machines cannot generalize learned processes to completely new areas as humans can, and they cannot reason, whatever some (IBM, harrumph) might tell you. There is no such thing as a “thinking” machine. For a machine to “think”, it would need a conscience, empathy, curiosity, and invention: all uniquely human traits. This, in fact, means that machine learning and “AI” will make human employees more important, not less. Certain jobs that require only a very narrow range of movement or thought (think factory line jobs and possibly driving jobs, though the jury is still out on this one) will be automated, but this will create more and more opportunities for “human” jobs, ones that require empathy, compassion, relationships, etc. The need for skilled tech workers will also keep increasing. There is work underway to automate repetitive aspects of programming, but this only frees up more time for creativity and innovation.
  3. The Facebook story is false: http://www.snopes.com/facebook-ai-developed-own-language/ Facebook: http://www.telegraph.co.uk/technology/2017/08/01/facebook-shuts-robots-invent-language/ Musk: https://www.theguardian.com/technology/2017/jul/17/elon-musk-regulation-ai-combat-existential-threat-tesla-spacex-ceo For example, ML-powered businesses disrupted Blockbuster, taxis, etc. One might argue that customer-centric businesses actually caused the disruption; however, I believe the correct lesson to take away from Blockbuster and traditional taxi companies is that the disruptors were companies that saw a way to use new technology to cater better to customers’ needs and wants. It is both, not an either-or scenario. Techies prefer the first explanation, that ML disrupted Blockbuster (after all, the tool is always the answer). Go to any computer science or data science program in the country, or better yet, any meetup or forum, and you will find almost exclusively discussions about the tool, not the process or how to actually use the tool in the real world. Many times, “new, shiny objects” are not ready for game time. For example, data science programs focus almost exclusively on modeling, giving students standard, pristine datasets. Even when they claim a dataset is “real world”, they have just slightly jumbled a real dataset. The real world doesn’t have a standard definition for ‘y’ (the outcome, what is right or wrong), and the data almost always includes serious problems. I would say the majority of the time working in data science is spent dealing with datasets, be they text, web, or relational, where nobody has a clue why the data is there or what happened during the last botched implementation that created bad data in the system. The real “data science” is not about the fanciest new algorithm, but about business concerns, wrangling data, feature engineering, culture changes, model deployment, and a bit of modeling dropped in.
  4. Address: Why would data need to be prepared? How are candidate models chosen?
  5. It starts and ends right here. As data scientists and machine learning experts, we are excited about, and love talking about, the tools and algorithmic implementations. Outside an academic setting, however, this means nothing in the ‘real world’. It is all for naught if it cannot be applied to optimizing and solving business problems.
  6. Talk about the difference between accuracy, recall, etc.
  7. Talk about the difference between accuracy, recall, etc.
  8. Explain. Emphasize that the computer learns the parameters: nobody goes in and determines what the feature weights are.
  9. Trained model
  10. “Recent studies by Google Brain have shown that any machine learning classifier can be tricked to give incorrect predictions, and with a little bit of skill, you can get them to give pretty much any result you want.” “Machine learning algorithms accept inputs as numeric vectors. Designing an input in a specific way to get the wrong result from the model is called an adversarial attack.” “Non-targeted adversarial attack: the most general type of attack when all you want to do is to make the classifier give an incorrect result. Targeted adversarial attack: a slightly more difficult attack which aims to receive a particular class for your input.” “The simplest yet still very efficient algorithm is known as Fast Gradient [Sign] Method (FGSM). The core idea is to add some weak noise on every step of optimization, drifting towards the desired class – or, if you wish, away from the correct one.” “You start with the same thing. You generate noise, add it to the image, send it to the classifier and repeat the process until the machine makes a mistake. At some point, whether you limit the amplitude of the noise or not, you will hit the spot where the true class stops appearing at all – all you have to do now is to figure out the weakest possible noise that would give you the same result. Simple binary search.” There are two types of defense strategies: 1. Reactive strategy: training another classifier to detect adversarial inputs and reject them. 2. Proactive strategy: implementing an adversarial training routine.
  11. “Up until now Convolutional Neural Networks (CNNs) have been the state-of-the-art approach to classifying images. CNNs work by accumulating sets of features at each layer. It starts off by finding edges, then shapes, then actual objects. However, the spatial relationship information of all these features is lost.”
  12. “Yikes! There’s definitely two eyes, a nose and a mouth, but something is wrong, can you spot it? We can easily tell that an eye and her mouth are in the wrong place and that this isn’t what a person is supposed to look like. However, a well trained CNN has difficulty with this concept:”
  13. “In addition to being easily fooled by images with features in the wrong place a CNN is also easily confused when viewing an image in a different orientation. One way to combat this is with excessive training of all possible angles, but this takes a lot of time and seems counter intuitive. We can see here the massive drop in performance by simply flipping Kim upside down:” “Finally, convolutional neural networks can be susceptible to white box adversarial attacks. Which is essentially embedding a secret pattern into an object to make it look like something else.”
  14. Examination of the purpose, process, execution, and monitoring of a machine learning model ‘in the wild’. As assurance professionals, how do we know that the model is doing what it should be doing? What is the risk to the business? Data science is a new discipline, without the formal rigor and mature processes that exist in other disciplines. Statistics is a profession that has been around for many years, yet there are many issues with its peer review process, and statistical models aren’t as complicated!
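Speaker note 10 describes adversarial attacks and FGSM, commonly expanded as the Fast Gradient Sign Method. The sketch below is hedged and minimal. It applies the single-step, untargeted variant to a hypothetical logistic regression whose weights and input are invented for illustration. Each feature is nudged by `epsilon` in the sign of the loss gradient, which for logistic regression with log loss is `(p - y) * w`. The sketch then runs the binary search the note mentions to find the weakest perturbation that flips the prediction. Attacks on image classifiers follow the same principle but need the gradient of a deep network; the iterative version the note quotes repeats small steps instead of taking one.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic regression with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, epsilon):
    """Untargeted single-step FGSM: x_adv = x + epsilon * sign(d loss / d x).
    For logistic regression with log loss, d loss / d x = (p - y) * w."""
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


def smallest_flipping_epsilon(w, b, x, y, hi=10.0, tol=1e-4):
    """Binary search for the weakest perturbation that changes the predicted class.
    Assumes epsilon = hi is already large enough to flip it."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        flipped = (predict_proba(w, b, fgsm(w, b, x, y, mid)) >= 0.5) != (y == 1)
        if flipped:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical trained model and input, invented for illustration.
W, B = [1.5, -2.0, 0.5], 0.1
x, y = [0.4, -0.3, 0.2], 1  # the model assigns this input to class 1

print(predict_proba(W, B, x))
print(predict_proba(W, B, fgsm(W, B, x, y, 0.4)))
print(smallest_flipping_epsilon(W, B, x, y))
```

With these invented numbers, the clean input scores about 0.80. Shifting every feature by 0.4 drops it to about 0.45, across the 0.5 decision threshold. The search lands just above 0.35, where the logit `1.4 - 4 * epsilon` changes sign. The note's two defenses map onto this picture: a reactive detector tries to spot such shifted inputs, while adversarial training adds them, with correct labels, to the training set.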