Machine learning is permeating our world. As it gains wider adoption, what does it mean for assurance professionals? This session will help you cut through the buzzwords and discover how machine learning can be leveraged in audit and compliance.
After completing this session, you will be able to:
Understand the two groups of algorithms
Understand the machine learning process
Describe use cases in assurance and compliance
Know where to learn more about machine learning
The term “machine learning” was coined by Arthur Samuel in 1959, while he was working at IBM, in a paper called “Some Studies in Machine Learning Using the Game of Checkers” about how a program could teach itself the optimal moves in a game of checkers.
Today I will provide an accessible overview of what machine learning is, conceptually how it works, and things to keep in mind when you begin to encounter it in the enterprise.
Basically, statistics on steroids. I recently read an article where the author referred to machine learning as “statistics on a Mac”. That isn’t completely accurate, but the basics behind machine learning are not as “revolutionary” as one may think; they are the culmination of a “perfect storm” of statistics, ingenious mathematics, Moore’s law, distributed computing, cheap data storage, and the rise of the Silicon Valley firm. AI, of which machine learning is a subset, will not, as Elon Musk famously postulates, pose an existential threat to humanity, and will not replace the need for human workers. Machines cannot generalize learned processes to completely new areas, as humans can, and cannot reason, whatever some (IBM, harrumph) might tell you.
There is no such thing as a “thinking” machine. For a machine to “think”, it would need to have a conscience, empathy, curiosity, invention; all uniquely human traits. This, in fact, means that machine learning and “AI” will make human employees more important, not less. Certain jobs that require no more than a very narrow range of movement or thought (think factory line jobs and possibly driving jobs, though the jury is still out on that one) will be automated, but this will create more and more opportunities for “human” jobs, ones that require empathy, compassion, relationships, etc. The need for skilled tech workers will increase as well. There is work going on to automate repetitive aspects of programming, but this only frees up more time for creativity and innovation.
False: http://www.snopes.com/facebook-ai-developed-own-language/
Facebook: http://www.telegraph.co.uk/technology/2017/08/01/facebook-shuts-robots-invent-language/
Musk: https://www.theguardian.com/technology/2017/jul/17/elon-musk-regulation-ai-combat-existential-threat-tesla-spacex-ceo
Example: ML-powered businesses disrupted Blockbuster, taxis, etc. One might argue that customer-centric businesses actually caused the disruption; however, I believe the correct lesson to take away from Blockbuster and traditional taxi companies is that the winners were “companies that saw a way to use new technology to cater better to customers’ needs and wants”. It is both, not an either-or scenario.
Techies prefer the first explanation, that ML disrupted Blockbuster (after all, the tool is always the answer). Go to any computer science or data science program in the country, or better yet, any meetup or forum, and you will find almost exclusively discussions about the tool, not the process or how to actually use the tool in the real world. Many times, “new, shiny objects” are not ready for prime time. For example, data science programs focus almost exclusively on modeling, giving students standard, pristine datasets. Even when they claim a dataset is “real world”, they have just slightly jumbled a real dataset. The real world doesn’t have a standard definition for ‘y’ (the outcome) or for what is right or wrong, and the data almost always includes serious problems. I would say the majority of the time spent working in data science is about dealing with datasets, be they text, web, or relational, where nobody has a clue why the data is there, what happened during the last botched implementation that created bad data in the system, etc. The real “data science” is not about the fanciest new algorithm, but about business concerns, wrangling data, feature engineering, culture changes, model deployment, and a bit of modeling dropped in.
Address:
Why would data need to be prepared?
How are candidate models chosen?
It starts and ends right here. As data scientists and machine learning experts, we are excited about, and love talking about, the tools and algorithmic implementations. In the ‘real world’ outside of an academic setting, however, this means nothing. It is all for naught if it cannot be applied to optimizing and solving business problems.
Talk about the difference between accuracy, recall, etc.
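To make the accuracy versus recall point concrete, here is a minimal sketch. The data is hypothetical (100 transactions, 5 of them fraudulent), and the helper names are illustrative rather than taken from any particular library. It shows how a model that never flags anything can still score 95% accuracy while catching no fraud at all.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred):
    """Of the items flagged, the share that really are positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    """Of the real positives, the share the model caught."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical ground truth: 5 fraudulent transactions followed by 95 clean ones.
y_true = [1] * 5 + [0] * 95

# Model A never flags anything.
always_clean = [0] * 100

# Model B catches 4 of the 5 frauds but raises 6 false alarms.
flagging = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

for name, preds in [("always clean", always_clean), ("flagging", flagging)]:
    print(name, accuracy(y_true, preds), precision(y_true, preds), recall(y_true, preds))
```

Model B has lower accuracy than Model A, yet it is the only one of the two that is useful for finding fraud, which is why the choice of metric has to follow the business question.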
Explain.
Emphasize that the computer learns the parameters. Nobody goes in and sets the feature weights by hand.
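A minimal sketch of what “the computer learns the parameters” means, assuming the simplest possible model (one feature, a straight line) and plain gradient descent. The data, the function name `fit_linear`, and the learning rate are all hypothetical choices for illustration. Nobody types in the weight or the intercept; the loop recovers them from examples.

```python
def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Learn w and b for y ~ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # start with no knowledge at all
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge each parameter in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical training data generated from the rule y = 3x + 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x + 2 for x in xs]

w, b = fit_linear(xs, ys)
print(w, b)  # the learned weight and intercept, close to 3 and 2
```

For an auditor, the takeaway is that the weights are an output of the training data and the procedure, so both need to be in scope when the model is reviewed.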
Trained model
"Recent studies by Google Brain have shown that any machine learning classifier can be tricked to give incorrect predictions, and with a little bit of skill, you can get them to give pretty much any result you want.”
“Machine learning algorithms accept inputs as numeric vectors. Designing an input in a specific way to get the wrong result from the model is called an adversarial attack.”
“Non-targeted adversarial attack: the most general type of attack when all you want to do is to make the classifier give an incorrect result.
Targeted adversarial attack: a slightly more difficult attack which aims to receive a particular class for your input.”
“The simplest yet still very efficient algorithm is known as Fast Gradient Sign Method (FGSM). The core idea is to add some weak noise on every step of optimization, drifting towards the desired class or, if you wish, away from the correct one.”
“You start with the same thing. You generate noise, add it to the image, send it to the classifier and repeat the process until the machine makes a mistake. At some point, whether you limit the amplitude of the noise or not, you will hit the spot where the true class stops appearing at all; all you have to do now is to figure out the weakest possible noise that would give you the same result. Simple binary search.”
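The two ideas quoted above can be sketched together on a toy model. This is an illustration under stated assumptions, not the method from the cited studies: the “classifier” is a hypothetical, already-trained logistic regression with made-up weights, the attack is a single gradient-sign step (the FGSM idea), and the binary search then looks for the weakest noise that still flips the prediction.

```python
import math


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y_true, eps):
    """One gradient-sign step that pushes x away from its true class."""
    p = predict_proba(w, b, x)
    # Gradient of the cross-entropy loss with respect to the input x.
    grad = [(p - y_true) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


def smallest_fooling_eps(w, b, x, y_true, hi=1.0, steps=30):
    """Binary search for the weakest noise amplitude that flips the prediction."""
    def fooled(eps):
        x_adv = fgsm(w, b, x, y_true, eps)
        return (predict_proba(w, b, x_adv) >= 0.5) != (y_true == 1)

    if not fooled(hi):
        return None  # even the largest allowed noise does not fool the model
    lo = 0.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if fooled(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical trained weights and one input that is correctly classified as 1.
w, b = [2.0, -1.0, 0.5], -0.5
x, y_true = [1.0, 0.5, 1.0], 1

eps = smallest_fooling_eps(w, b, x, y_true)
x_adv = fgsm(w, b, x, y_true, eps)
print(eps, predict_proba(w, b, x), predict_proba(w, b, x_adv))
```

Because this toy model is linear, a larger step always pushes the score further in the same direction, so binary search is valid here. Against a deep network, the same search is a heuristic rather than a guarantee.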
There are two types of defense strategies:
1. Reactive strategy: training another classifier to detect adversarial inputs and reject them.
2. Proactive strategy: implementing an adversarial training routine.
“Up until now Convolutional Neural Networks (CNNs) have been the state-of-the-art approach to classifying images.
CNNs work by accumulating sets of features at each layer. It starts off by finding edges, then shapes, then actual objects. However, the spatial relationship information of all these features is lost.”
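One reason the spatial information gets lost is the pooling step that typically follows a convolution. The sketch below is a deliberately simplified illustration: the 4x4 “eye detector” feature maps are made up, and real networks retain coarse position through several layers. It shows that max pooling reports that a feature was found but blurs, or entirely drops, where it was found.

```python
def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling over a feature map with even dimensions."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]


def global_max_pool(fmap):
    """Collapse the whole feature map to its single strongest activation."""
    return max(max(row) for row in fmap)


# Hypothetical activations of an "eye detector" filter: 9 marks where the eye is.
eye_top_left = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
eye_shifted = [  # same eye, moved by one pixel diagonally
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
eye_bottom_right = [  # same eye, moved to the opposite corner
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 9],
]

# A small shift inside one pooling window is invisible after 2x2 pooling.
print(max_pool_2x2(eye_top_left) == max_pool_2x2(eye_shifted))
# After global pooling, even opposite corners look identical.
print(global_max_pool(eye_top_left) == global_max_pool(eye_bottom_right))
```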
“Yikes! There’s definitely two eyes, a nose and a mouth, but something is wrong, can you spot it? We can easily tell that an eye and her mouth are in the wrong place and that this isn’t what a person is supposed to look like. However, a well trained CNN has difficulty with this concept:”
“In addition to being easily fooled by images with features in the wrong place a CNN is also easily confused when viewing an image in a different orientation. One way to combat this is with excessive training of all possible angles, but this takes a lot of time and seems counter intuitive. We can see here the massive drop in performance by simply flipping Kim upside down:”
“Finally, convolutional neural networks can be susceptible to white box adversarial attacks. Which is essentially embedding a secret pattern into an object to make it look like something else.”
Examination of the purpose, process, execution, and monitoring of a machine learning model ‘in the wild’.
As assurance professionals, how do we know that the model is doing what it should be doing? What is the risk to the business?
Data science is a new discipline, without the formal rigor and mature processes that exist in other disciplines. Statistics is a profession that has been around for years, yet there are still many issues with the peer review of statistical work, and its models aren’t as complicated!