https://www.iscwest.com/en/Sessions/52328/Fundamentals-of-Machine-Learning-Perspectives-from-a-Data-Scientist
Abstract:
As our world grows more connected, organizations are collecting ever-growing amounts of data. Such data almost always contains hidden insights that can lead to better outcomes and more value. One important tool for tapping into these opportunities is Machine Learning (ML), and across all verticals more and more companies are investing in their ML operations. In this talk, we will look at what ML is, what problems it solves, how it is applied, and why companies need a strategy for employing ML.
First, we will explain the relevant fundamental concepts with a focus on supervised learning and geometric models. An intuitive data set with an accessible instance space from the physical world is used to illustrate our ability to classify data. Various models are used and visually represented to explain the underlying algorithms in an accessible fashion.
Next, we will discuss how ML is revolutionizing approaches to cybersecurity, and how the cybersecurity industry has been changing its approach to the data it collects. From there, we explore other applications in the larger domain of security.
Lastly, we will wrap up with an outlook on where this technology is going and some pointers for getting started with applying ML to the data you already collect.
11. FEATURE SELECTION
Plot axes: “Buttock Circumference” [mm], Weight [10⁻¹ kg]
• Correlation
• Gender-specific slope
• Reduced overlap
• Selection of features matters
• How to make a prediction?
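The correlation and group-specific slope mentioned above can be checked numerically. The following is a minimal sketch on synthetic stand-in data: the two groups, their means, and the slopes used to generate them are assumptions made for illustration, not the measurements plotted on the slide.

```python
import numpy as np

# Synthetic stand-in data; these are NOT the measurements shown on the slide.
rng = np.random.default_rng(0)
n = 200
weight_a = rng.normal(850.0, 120.0, n)   # hypothetical group A, weight in 10^-1 kg
weight_b = rng.normal(680.0, 100.0, n)   # hypothetical group B, weight in 10^-1 kg
circ_a = 700.0 + 0.35 * weight_a + rng.normal(0.0, 25.0, n)   # circumference in mm
circ_b = 650.0 + 0.50 * weight_b + rng.normal(0.0, 25.0, n)   # circumference in mm


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def fitted_slope(x, y):
    """Slope of the least-squares line y = slope * x + intercept."""
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)


corr_a, corr_b = pearson(weight_a, circ_a), pearson(weight_b, circ_b)
slope_a, slope_b = fitted_slope(weight_a, circ_a), fitted_slope(weight_b, circ_b)

print(f"group A: r = {corr_a:.2f}, slope = {slope_a:.2f} mm per 10^-1 kg")
print(f"group B: r = {corr_b:.2f}, slope = {slope_b:.2f} mm per 10^-1 kg")
```

If the two features are strongly correlated but the slope differs between groups, the pair of features carries more information about group membership than either feature alone, which is one way to read "selection of features matters".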
15. LET’S CLASSIFY
Plot axes: “Buttock Circumference” [mm], Weight [10⁻¹ kg]
• Classifier generalizes
• Note some misclassifications
• Let’s assume we want to detect males (blue)
– I.e. “blue” is our positive class
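To make "positive class" and "misclassifications" concrete, here is a minimal sketch of a simple geometric model: a linear classifier whose decision boundary is the perpendicular bisector between the two class centroids. Whether this is the classifier drawn on the slide is not known; the data below is synthetic, and label 1 plays the role of the positive ("blue") class.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic two-feature data (an assumption, not the slide's data set).
n = 300
pos = rng.normal([3.0, 3.0], 1.0, size=(n, 2))   # positive class, label 1
neg = rng.normal([0.0, 0.0], 1.0, size=(n, 2))   # negative class, label 0
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])

# Hold out a third of the points to check that the classifier generalizes.
idx = rng.permutation(2 * n)
train, test = idx[:400], idx[400:]
X_train, y_train = X[train], y[train]
X_test, y_test = X[test], y[test]

# Fit: the boundary is the perpendicular bisector between the class centroids.
mu_pos = X_train[y_train == 1].mean(axis=0)
mu_neg = X_train[y_train == 0].mean(axis=0)
w = mu_pos - mu_neg
t = (mu_pos @ mu_pos - mu_neg @ mu_neg) / 2.0


def predict(points):
    """Return 1 (positive) where a point lies on the positive centroid's side."""
    return (points @ w > t).astype(int)


pred = predict(X_test)
tp = int(np.sum((pred == 1) & (y_test == 1)))   # positives correctly detected
fp = int(np.sum((pred == 1) & (y_test == 0)))   # negatives flagged as positive
fn = int(np.sum((pred == 0) & (y_test == 1)))   # positives that were missed
tn = int(np.sum((pred == 0) & (y_test == 0)))   # negatives correctly rejected
accuracy = (tp + tn) / len(y_test)

print(f"TP={tp} FP={fp} FN={fn} TN={tn} accuracy={accuracy:.3f}")
```

The false positives and false negatives are the misclassifications; which of the two matters more depends on what the positive class stands for in the application.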
34. • Unstructured file content
• Algorithm uncovers interesting properties
• Requires a lot more input data
• Unlocks more insight
• “Deep Learning”
43. • Large datasets require algorithmic approaches
– Many sensors, e.g. IoT
– Large input, e.g. video surveillance
– Complex relationships, e.g. social graph
• Hidden structure
• Better accuracy, better response time
44. Why deploy an ML-based technology?
• Making the most out of available data
• Less friction, better customer experience
• Automation
• Empiricism (but be careful of bias in input data)
45. Why build ML-enabled products?
• Increasingly effective and viable technology
• Mind the innovator’s dilemma
• Replace rule-based systems
– ML modeling is repeatable
– Maintainability
– Measurability
46. Beyond the Hype: Recognizing Solid ML
• True positive/false positive trade-off
– ROC curve
– Base rate
– Overfitting
• What is the data?
– Does the data intuitively contain signal?
– What is the system trained on?
• Training data applicable to your use case
• Ground truth
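The true positive/false positive trade-off and the role of the base rate can be worked through in a few lines. This is a sketch with hypothetical helper names (`precision_from_rates`, `roc_points`) and made-up scores; it is not tied to any particular product or data set.

```python
import numpy as np


def precision_from_rates(tpr, fpr, base_rate):
    """Fraction of alerts that are real, given detector rates and base rate."""
    true_alerts = tpr * base_rate
    false_alerts = fpr * (1.0 - base_rate)
    return true_alerts / (true_alerts + false_alerts)


def roc_points(scores, labels):
    """Return one (FPR, TPR) pair per threshold, from strictest to most lenient."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        flagged = scores >= threshold
        tpr = (flagged & labels).sum() / labels.sum()
        fpr = (flagged & ~labels).sum() / (~labels).sum()
        points.append((float(fpr), float(tpr)))
    return points


# A detector with 99% TPR and 1% FPR looks excellent, but if only 1 in 1000
# events is truly positive, most of its alerts are still false alarms.
rare = precision_from_rates(tpr=0.99, fpr=0.01, base_rate=0.001)
balanced = precision_from_rates(tpr=0.99, fpr=0.01, base_rate=0.5)
print(f"precision at base rate 0.001: {rare:.3f}")
print(f"precision at base rate 0.5:   {balanced:.3f}")

# Made-up classifier scores; label 1 marks the positive class.
curve = roc_points([0.9, 0.8, 0.7, 0.4, 0.3, 0.1], [1, 1, 0, 1, 0, 0])
print(curve)
```

With the rare base rate, roughly 9% of alerts are true positives even though the detector's rates are unchanged, which is why a quoted detection rate means little without the false positive rate and the base rate of the deployment.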
47. • Making defense easier
• But: also making attack easier
– Adversarial models
– Adversarial examples
“Adversarial Patch,” Brown et al., https://arxiv.org/abs/1712.09665
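One way to see why adversarial examples exist is the gradient-sign idea on a linear model: a small, bounded change to every feature adds up to a large change in the score. The sketch below uses a hypothetical three-feature linear scorer with made-up weights; it illustrates the general principle only and is not the patch technique from the paper cited above.

```python
import numpy as np

# Hypothetical linear scorer (weights are made up): score > 0 means "detected".
w = np.array([1.5, -2.0, 0.5])
b = -0.2


def score(x):
    """Linear decision score for one feature vector."""
    return float(w @ x + b)


def gradient_sign_perturbation(x, eps):
    """Move each feature by at most eps in the direction that lowers the score.

    For a linear model the gradient of the score with respect to x is w,
    so stepping against sign(w) lowers the score by eps * sum(|w|).
    """
    return x - eps * np.sign(w)


x = np.array([1.0, 0.2, 0.4])
x_adv = gradient_sign_perturbation(x, eps=0.3)

print(f"original score:  {score(x):+.2f}")
print(f"perturbed score: {score(x_adv):+.2f}")
print(f"largest change to any single feature: {np.max(np.abs(x_adv - x)):.2f}")
```

No single feature moves by more than `eps`, yet the decision flips from positive to negative; with many input dimensions (pixels, for example) the per-feature change can be far smaller for the same effect.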
48. Some Adversarial Challenges for the Physical Domain
• Autonomous systems
– Malicious use of, e.g., drones
– Manipulating autonomous systems (self-driving cars)
• Spoofing
– Lyrebird
– DeepFake
• Adversarial data
– Circumvent facial recognition
– Road signs etc.
49. Getting Started with Scikit-Learn
>>> from sklearn.datasets import load_iris
>>> from sklearn import tree
>>> iris = load_iris()  # 150 samples, 4 features, 3 classes
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(iris.data, iris.target)  # train on the full data set
>>> clf.predict(iris.data[:1, :])  # predict the class of the first sample
array([0])
Source: http://scikit-learn.org/stable/modules/tree.html#classification
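The snippet above predicts on a sample the tree was trained on, so it cannot reveal overfitting. A natural next step, sketched here under the assumption of a 70/30 split and fixed random seeds (both arbitrary choices), is to hold out part of the data and compare training and test accuracy.

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Keep 30% of the samples aside; stratify so each class stays represented.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# score() returns mean accuracy on the given data.
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)

print(f"training accuracy: {train_acc:.3f}")
print(f"held-out accuracy: {test_acc:.3f}")
```

A large gap between the two numbers is a sign that the model has memorized its training data rather than learned something that generalizes.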