Machine learning is the future. You will learn about classification based on Boolean principles, as well as about the classifiers used in many of the most common machine learning systems. The speaker will present a simple example of deploying machine-learning-based pipeline security systems built with Apache Spark.
7. Python toolkits
● scikit-learn - Python library that implements a range of machine learning algorithms and helpers
● TensorFlow - library for numerical computation using data flow graphs / deep learning
8. scikit-learn
● Easy-to-use, general-purpose toolbox for machine learning in Python
● Supervised and unsupervised machine learning techniques
● Utilities for common tasks such as model selection,
feature extraction, and feature selection
● Built on NumPy, SciPy, and matplotlib
● Open source, commercially usable - BSD license
9. TensorFlow
● Open source
● By Google
● Used for both research and production
● Used widely for deep learning/neural nets
○ But not restricted to just deep models
● Multiple GPU Support
11. Basic terms
Classifier
"An algorithm that implements classification, especially in a concrete
implementation, is known as a classifier. The term "classifier" sometimes
also refers to the mathematical function, implemented by a classification
algorithm, that maps input data to a category."
Model
A linear regression algorithm fits points to a line y = m x + c. After
fitting you get, for example, y = 10 x + 4. This is a model. A model is
something that produces an output when you give it an input. In ML, any
'object' produced by training an ML algorithm is a model.
Linear Regression
Fitting a linear relationship between two quantitative variables
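
The distinction between the algorithm and the model can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the talk: `fit_line` and `make_model` are hypothetical helper names, and the data points are made up so that they lie exactly on y = 10 x + 4.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + c with a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def make_model(m, c):
    """The 'model' is just the fitted function: input in, output out."""
    return lambda x: m * x + c


# Illustrative training data (assumption): points generated from y = 10x + 4.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [4.0, 14.0, 24.0, 34.0]

m, c = fit_line(xs, ys)      # the algorithm runs once, during training
model = make_model(m, c)     # the result of training is the model
prediction = model(5.0)      # the model is then applied to new inputs
```

Here `fit_line` plays the role of the learning algorithm, while `model` is the object that remains after training and can be reused without the training data.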
16. Regression
● regression = finding relationships between variables
[Figure: training data, regression learning algorithm, regression model/function; labels: size of population, profit]
24. Anomaly detection
● Outliers vs. novelties
○ Novelties: unobserved patterns in new observations not included in training data
● Simple statistics/forecasting methods
○ Exponential smoothing, Holt-Winters algorithm
● Machine learning methods
○ Elliptical envelope, density-based, clustering, SVM
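
As a rough illustration of the simple statistics/forecasting approach, the sketch below uses simple exponential smoothing as a one-step-ahead forecast and flags points whose forecast error is too large. The function name, the `alpha` and `threshold` values, and the readings are all assumptions chosen for the example. Skipping the level update for flagged points is one possible design choice, so that a single spike does not distort the following forecasts.

```python
def exponential_smoothing_anomalies(series, alpha=0.5, threshold=10.0):
    """Return indices whose one-step-ahead forecast error exceeds `threshold`."""
    anomalies = []
    level = series[0]  # initialise the smoothed level with the first observation
    for i in range(1, len(series)):
        forecast = level                 # forecast for step i is the current level
        error = series[i] - forecast
        if abs(error) > threshold:
            anomalies.append(i)          # flag the point, keep the level unchanged
            continue
        # Normal point: blend the new observation into the level.
        level = alpha * series[i] + (1 - alpha) * level
    return anomalies


# Made-up sensor readings with one obvious spike at index 4.
readings = [10.0, 11.0, 10.0, 12.0, 50.0, 11.0, 10.0]
flagged = exponential_smoothing_anomalies(readings, alpha=0.5, threshold=15.0)
```

In practice the threshold would usually be derived from the spread of past forecast errors rather than fixed by hand.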
29. Supervised classification
● Many different algorithms!
● We will go through five:
○ Naive Bayes
○ K-nearest neighbors
○ Support Vector Machines
○ Decision Trees
30. Bayes Theorem
The probability of an event happening is based on prior knowledge of conditions that might be related to the event.
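
A small worked example may make the statement concrete. The sketch below applies Bayes' theorem, P(A|B) = P(B|A) P(A) / P(B), to a spam-filtering question. All probabilities are invented for illustration and are not taken from any dataset; `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(A|B) via Bayes' theorem, with P(B) from the law of total probability."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers (assumptions, for illustration only):
p_spam = 0.5              # prior: P(spam)
p_word_given_spam = 0.4   # P(word appears | spam)
p_word_given_ham = 0.1    # P(word appears | not spam)

# Updated belief that a message is spam once the word has been observed.
p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word_given_ham)
```

With these numbers, seeing the word raises the probability of spam from the prior of 0.5 to about 0.8.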
33. The dataset: 2007 TREC Public Spam Corpus
http://plg.uwaterloo.ca/~gvcormac/treccorpus07/
34. Multiclass classification
Two ways to do it:
● One-vs-rest
○ One binary classifier per class
● One-vs-one
○ One binary classifier per pair of classes
○ K*(K−1)/2 classifiers for a K-class problem
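
To see how the two schemes differ in the number of binary classifiers, the sketch below only enumerates the binary problems each scheme would train; it does not train anything. The class labels and the helper names are hypothetical.

```python
from itertools import combinations


def one_vs_rest_tasks(classes):
    """One binary problem per class: that class against all the others."""
    return [(c, "rest") for c in classes]


def one_vs_one_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


# Hypothetical labels for a K = 4 class problem.
classes = ["spam", "ham", "newsletter", "phishing"]
k = len(classes)

ovr = one_vs_rest_tasks(classes)   # K binary classifiers
ovo = one_vs_one_tasks(classes)    # K*(K-1)/2 binary classifiers
```

For K = 4 this gives 4 one-vs-rest problems and 6 one-vs-one problems. The gap widens quickly as K grows, although each one-vs-one classifier sees only the data for its two classes.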