SlideShare une entreprise Scribd logo
1  sur  59
Télécharger pour lire hors ligne
Practical Machine Learning
(Part I)
Traian Rebedea
Faculty of Automatic Control and Computers, UPB
traian.rebedea@cs.pub.ro
Outline
• About me
• Why machine learning?
• Basic notions of machine learning
• Sample problems and solutions
– Finding vandalism on Wikipedia
– Sexual predator detection
– Identification of buildings on a map
– Change in the meaning of concepts over time
• Practical ML
– R
– Python (sklearn)
– Java (Weka)
– Others
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
About me
• Started working in 2003 (year 2 of studies)
– J2EE developer
• Then created my own company with a colleague (in 2005)
– Web projects (PHP, Javascript, etc.), involved in some startups for the
technological aspects (e.g. Happyfish TV)
• Started teaching at A&C, UPB (in 2006)
– TA for Algorithms, Natural Language Processing
• Started my PhD (in 2007)
– Natural Language Processing, Discourse Analysis, Tehnology-Enhanced
Learning
• Now I am teaching several lectures: Algorithm Design, Algorithm Design
and Complexity, Symbolic and Statistical Learning, Information Retrieval
• Working on several topics in NLP: opinion mining, conversational agents,
question-answering, culturonomics
• Interested in all new projects that use NLP, ML and IR as well as any other
“smart” applications
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Why machine learning?
• “Simply put, machine learning is the part of artificial
intelligence that actually works” (Forbes, link)
• Most popular course on Coursera
• Popular?
– Meaning practical: ML results are nice and easy to show to
others
– Meaning well-paid/in demand: companies are increasing
demand for ML specialists
• Large volumes of data, corpora, etc.
• Computer science, data science, almost any other
domain (especially in research)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Why ML?
• Mix of Computer science and Math (Statistics)
skills
• Why is math essential to ML?
– http://courses.washington.edu/css490/2012.Winter/lecture_slides/02
_math_essentials.pdf
• “To get really useful results, you need good
mathematical intuitions about certain general
machine learning principles, as well as the
inner workings of the individual algorithms.”
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Basic Notions of ML
• Different tasks in ML
• Supervised
– Regression
– Classification
• Unsupervised
– Clustering
– Dimensionality reduction / visualization
• Semi-supervised
– Label propagation
• Reinforcement learning
• Deep learning
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Supervised learning
• Dataset consisting of labeled examples
• Each example has one or several attributes
and a dependent variable (label, output,
response, predicted variable)
• After building, selection and assessing the
model on the labeled data, it is used to find
labels for new data
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Supervised learning
• Dataset is usually split into three parts
– A rule is 50-25-25
• Training set
– Used to build models
– Compute the basic parameters of the model
• Holdout/validation set
– Used to find out the best model and adjust some parameters
of the model (usually, in order to minimize some error)
– Also called model selection
• Test set
– Used to assess the final model on new data, after model
building and selection
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://blogs.sas.com/content/jmp/2010/07/06/train-validate-and-test-for-
data-mining-in-jmp/
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Cross-validation
If no of partitions (S) = no of training instances 
leave-one-out (only for small datasets)
Main disadvantages:
- no of runs is
increased by a factor
of S
- Multiple parameters
for the same model
for different runs
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Regression
• Supervised learning when the output variable
is continous
– This is the usual textbook definition
– Also may be used for binary output variables (e.g.
Not-spam=0, Spam=1) or output variables that
have an ordering (e.g. integer numbers, sentiment
levels, etc.)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://smlv.cc.gatech.edu/2010/10/06/linear-regression-and-least-
squares-estimation/
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Classification
• Supervised learning when the output variable
is discrete (classes)
– Can also be used for binary or integer outputs
– Even if they are ordered, but classification usually
does not account for the order
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• CIFAR dataset: http://www.cs.toronto.edu/~kriz/cifar.html (also image source)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Classification
• Several important types
– Binary vs. multiple classes
– Hard vs. soft
– Single-label vs. multi-label (per example)
– Balanced vs. unbalanced classes
• Extensions that are semi-supervised
– One-class classification
– PU (Positive and Unlabeled) learning
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Unsupervised learning
• For an unlabeled dataset, infer properties (find
“hidden” structure) about the distribution of the
objects in the dataset
– without any help related to correct answers
• Generally, much more data than for supervised
learning
• There are several techniques that can be applied
– Clustering (cluster analysis)
– Dimensionality reduction (visualization of data)
– Association rules, frequent itemsets
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Clustering
• Partition dataset into clusters (groups) of
objects such that the ones in the same cluster
are more similar to each other than to objects
in different clusters
• Similarity
– The most important notion when clustering
– “Inverse of a distance”
• Possible objective: approximate the modes of
the input dataset distribution
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Clustering
• Lots of different cluster models:
– Connectivity (distance models) (e.g Hierarchical)
– Centroid (e.g. K-means)
– Distribution (e.g GMM)
– Density of data (e.g. DBSCAN)
– Graph (e.g. cliques, community detection)
– Subspace (e.g. biclustering)
– …
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://genomics10.bu.edu/terrence/gems/gems.html
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: https://sites.google.com/site/statsr4us/advanced/multivariate/cluster
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Dimensionality reduction
• Usually, unsupervised datasets are large both
in number of examples and in number of
attributes (features)
• Reduce the number of features in order to
improve (human) readability and
interpretation of data
• Mainly linked to visualization
– Also used for feature selection, data compression
or denoising
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://www.ifs.tuwien.ac.at/ifs/research/pub_pdf/rau_ecdl01.pdf
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Semi-supervised learning
• Mix of labeled and unlabeled data in the dataset
• Enlarge the labeled dataset with unlabeled instances for which we
might determine the correct label with a high probability
• Unlabeled data is much cheaper or simpler to get than labeled data
– Human annotation either requires experts or is not challenging (it is
boring)
– Annotation may also require specific software or other types of
devices
– Students (or Amazon Mturk users) are not always good annotators
• More information:
http://pages.cs.wisc.edu/~jerryzhu/pub/sslicml07.pdf
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Label propagation
• Start with the labeled dataset
• Want to assign labels to some (or all) of the unlabeled
instances
• Assumption: “data points that are close have similar labels”
• Build a fully connected graph with both labeled and unlabeled
instances as vertices
• The weight of an edge represents the similarity between the
points (or inverse of a distance)
• Repeat until convergence
– Propagate labels through edges (larger weights allow a
more quick propagation)
– Nodes will have soft labels
• More info: http://lvk.cs.msu.su/~bruzz/articles/classification/zhu02learning.pdf
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://web.ornl.gov/sci/knowledgediscovery/Projects.htm
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Basic Notions of ML
• How to asses the performance of a model?
• Used to choose the best model
• Supervised learning
– Assessing errors of the model on the validation
and test sets
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Performance measures - supervised
• Regression
– Mean error, mean squared error (MSE), root MSE, etc.
• Classification
– Accuracy
– Precision / recall (possibly weighted)
– F-measure
– Confusion matrix
– TP, TN, FP, FN
– AUC / ROC
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://www.resacorp.com/lsmse2.htm
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://en.wikipedia.org/wiki/Sensitivity_and_specificity
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Basic Notions of ML
• Overfitting
– Low error on training data
– High error on (unseen/new) test data
– The model does not generalize well from the training data
on unseen data
– Possible causes:
• Too few training examples
• Model too complex (too many parameters)
• Underfitting
– High error both on training and test data
– Model is too simple to capture the information in the
training data
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://gerardnico.com/wiki/data_mining/overfitting
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Sample problems and solutions
• Chosen from prior experience
• May not be the most interesting problems for all
(any) of you
• I have tried to present 4 problems that are
somehow diverse in solution and simple to
explain
• All solutions are state-of-the-art (that means not
the very best one, but among the top)
– Very best solution may be found in research papers
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Finding vandalism on Wikipedia
• Work with Dan Cioiu
• Vandalism = editing in a malicious manner that is intentionally
disruptive
• Dataset used at PAN 2011
– Real articles extracted from Wikipedia in 2009 and 2010
– Training data: 32,000 edits
– Test data: 130,000 edits
• An edit consists of:
– the old and new content of the article
– author information
– revision comment and timestamp
• We chose to use only the actual text in the revision (no author and
timestamp information) to predict vandalism
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Features
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Using several classifiers
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Cross-validation results
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Combining results using a meta-
classifier
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Final results on test set
• Source: http://link.springer.com/chapter/10.1007%2F978-3-642-40495-5_32
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Main conclusions
• Feature extraction is important in ML
– Which are the best features?
• Syntactic and context analysis are the most important
• Semantic analysis is slow but increases detection by
10%
• Using features related to users would further improve
our results
• A meta-classifier sometimes improves the results of
several individual classifiers
– Not always! Maybe another discussion on this…
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Sexual predator detection
• Work with Claudia Cardei
• Automatically detect sexual predators in a chat
discussion in order to alert the parents or the police
• Corpus from PAN 2012
– 60000+ chat conversations (around 10000 after removing
outliers)
– 90000+ users
– 142 sexual predators
• Sexual predator is used “to describe a person seen as
obtaining or trying to obtain sexual contact with
another person in a metaphorically predatory manner”
(Wikipedia)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Problem description
• Decide either a user is a predator or not
• Use only the chats that have a predator
(balanced corpus: 50% predators - 50%
victims)
• One document for each user
• How to determine the features?
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Simple solution
• “Bag of words” (BoW) features with tf-idf
weighting
• Feature extraction using mutual information
• MLP (neural network) with 500 features: 89%
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Going beyond
• The should be other features that are discriminative for predators
• Investigated behavioral features:
– Percentage of questions and inquiries initiated by an user
– Percentage of negations
– Expressions that might denote an underage user (“don’t know”, “never
did”,…)
– The age of the user if found in a chat reply (“asl” like answers)
– Percentage of slang words
– Percentage of words with a sexual connotation
– Readability scores (Flesch)
– Who started the conversation
• AdaBoost (with DecisionStump): 90%
• Random Forest: 93%
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Main conclusions
• Feature selection is useful when the number of
features is large
– Reduced from 10k+ features to hundreds/thousands
– We used MI, there are others techniques
• Fewer features can be as descriptive as (or even
more descriptive than) the BoW model
– Difficult to choose
– Need additional computations and information
– Need to understand the dataset
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Identification of buildings on a map
• Work with Alexandra Vieru, Paul Vlase
• Given an aerial image, identify the buildings
• Non-military purposes:
– Estimate population density
– Identify main landmarks (shopping malls, etc.)
– Identify new buildings over time
• Features
– HOG descriptors – 3x3 cells
– Histogram of RGB – 5 bins, normalized
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Preliminary results
• SVM classifier
• Train: 2276 images, Test: 637 images
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Main conclusions
• Verify which are the best features
– If this is possible
• Must verify for overfitting using train and test
corpus
• Each task can have a baseline (a simple
classification) method
– Here it might be SVM with RGB histogram features
• Actually this could be treated as a one-class
classification (or PU learning) problem
– There are much more negative instances than positive
ones
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Change in the meaning of concepts
over time
• Work with Costin Chiru, Bianca Dinu, Diana Mincu
• Assumption:
– Different meanings of words should be identifiable
based on the other words that are associated with
them
– These other words are changing over time thus
denoting a change in the meaning (or usage of the
meanings)
• Using the Google Books n-grams corpus
– Dependency information
– 5-grams information centered on the analyzed word
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: https://books.google.com/ngrams (Query: “gay=>*_NOUN”)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Clustering using tf-idf over dependencies of the
analyzed word
• Each distinct year is considered a different document
and each dependency is a feature/term
• Example for the word “gay”:
– Cluster 1 (1600 - 1925): signature {more, lively, happy,
brilliant, cheerful}
– Cluster 2 (1923-1969): signature {more, lively, happy,
cheerful, be}
– Cluster 3 (1981 - today): signature {lesbian, bisexual, anti,
enolum, straight}
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Practical ML
• Brief discussion of alternatives that might be
used by a programmer
• The main idea is to show that it is simple to
start working on a ML task
• However, it is more difficult to achieve very
good results
– Experience
– Theoretical underpinnings of models
– Statistical background
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
R language
• Designed for “statistical computing”
• Lots of packages and functions for ML & statistics
• Simple to output visualizations
• Easy to write code, short
• It is kind of slow on some tasks
• Need to understand some new concepts: data frame,
factor, etc.
• You may find the same functionality in several
packages (e.g. NaïveBayes, Cohen’s Kappa, etc.)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Python (sklearn)
• sklearn (scikit-learn) package developed for
ML
• Uses existing packages numpy, scipy
• Started as a GSOC project in 2007
• It is new, implemented efficiently (time and
memory), lots of functionality
• Easy to write code
– Naming conventions, parameters, etc.
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
• Source: http://peekaboo-vision.blogspot.ro/2013/01/machine-learning-cheat-
sheet-for-scikit.html
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Java (Weka)
• Similar to scikit-learn for Python
• Older (since 1997) and more stable than scikit-learn
• Some implementations are (were) a little bit more
inefficient, especially related to memory consumption
• However, usually it is fast, accurate, and has lots of
other useful utilities
• It also has a GUI for getting some results fast without
writing any code
• Lots of people are debating whether Weka or scikit-learn is
better. I don’t want to get into this discussion as it depends on
your task and resources.
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Not discussed
• RapidMiner
– Easy to use
– GUI interface for building the processing pipeline
– Can write some code/plugins
• Matlab/Octave
– Numeric computing
– Paid vs. free
• SPSS/SAS/Stata
– Somehow similar to R, faster, more robust, expensive
• Many others:
– Apache Mahout / MALLET / NLTK / Orange / MLPACK
– Specific solutions for various ML models
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Conclusions
• Lots of interesting ML tasks and data available
– Do you have access to new and interesting data in your
business?
• Several easy to use programming alternatives
• Understand the basics
• Enhance your maths (statistics) skills for a better ML
experience
• Hands-on experience and communities of practice
– www.kaggle.com (and other similar initiatives)
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
Thank you!
traian.rebedea@cs.pub.ro
29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
_____
_____

Contenu connexe

Tendances

Tips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsTips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsDarius Barušauskas
 
Overview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringOverview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringTuri, Inc.
 
Star ,Snow and Fact-Constullation Schemas??
Star ,Snow and  Fact-Constullation Schemas??Star ,Snow and  Fact-Constullation Schemas??
Star ,Snow and Fact-Constullation Schemas??Abdul Aslam
 
Chapter 2 Relational Data Model-part1
Chapter 2 Relational Data Model-part1Chapter 2 Relational Data Model-part1
Chapter 2 Relational Data Model-part1Eddyzulham Mahluzydde
 
Supervised learning
Supervised learningSupervised learning
Supervised learningAlia Hamwi
 
Machine learning - AI
Machine learning - AIMachine learning - AI
Machine learning - AIWitekio
 
Machine Learning Foundations
Machine Learning FoundationsMachine Learning Foundations
Machine Learning FoundationsAlbert Y. C. Chen
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learningKien Le
 
Ensemble methods in machine learning
Ensemble methods in machine learningEnsemble methods in machine learning
Ensemble methods in machine learningSANTHOSH RAJA M G
 
Virtualization
VirtualizationVirtualization
VirtualizationMadnanS
 
Prepare your data for machine learning
Prepare your data for machine learningPrepare your data for machine learning
Prepare your data for machine learningIvo Andreev
 
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...TEJVEER SINGH
 

Tendances (20)

Nosql data models
Nosql data modelsNosql data models
Nosql data models
 
Tips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsTips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitions
 
Overview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringOverview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature Engineering
 
Database, Lecture-1.ppt
Database, Lecture-1.pptDatabase, Lecture-1.ppt
Database, Lecture-1.ppt
 
Star ,Snow and Fact-Constullation Schemas??
Star ,Snow and  Fact-Constullation Schemas??Star ,Snow and  Fact-Constullation Schemas??
Star ,Snow and Fact-Constullation Schemas??
 
Databases: Normalisation
Databases: NormalisationDatabases: Normalisation
Databases: Normalisation
 
Chapter 2 Relational Data Model-part1
Chapter 2 Relational Data Model-part1Chapter 2 Relational Data Model-part1
Chapter 2 Relational Data Model-part1
 
Supervised learning
Supervised learningSupervised learning
Supervised learning
 
Machine learning - AI
Machine learning - AIMachine learning - AI
Machine learning - AI
 
Data models
Data modelsData models
Data models
 
Machine Learning Foundations
Machine Learning FoundationsMachine Learning Foundations
Machine Learning Foundations
 
Normalization in DBMS
Normalization in DBMSNormalization in DBMS
Normalization in DBMS
 
Normalization
NormalizationNormalization
Normalization
 
Data Wrangling
Data WranglingData Wrangling
Data Wrangling
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
 
Ensemble methods in machine learning
Ensemble methods in machine learningEnsemble methods in machine learning
Ensemble methods in machine learning
 
Virtualization
VirtualizationVirtualization
Virtualization
 
Prepare your data for machine learning
Prepare your data for machine learningPrepare your data for machine learning
Prepare your data for machine learning
 
supervised learning
supervised learningsupervised learning
supervised learning
 
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
 

Similaire à Practical machine learning - Part 1

Ibm colloquium 070915_nyberg
Ibm colloquium 070915_nybergIbm colloquium 070915_nyberg
Ibm colloquium 070915_nybergdiannepatricia
 
Simulation and modeling introduction.pptx
Simulation and modeling introduction.pptxSimulation and modeling introduction.pptx
Simulation and modeling introduction.pptxShamasRehman4
 
Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...
Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...
Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...Hans Põldoja
 
Domain Modeling for Personalized Learning
Domain Modeling for Personalized LearningDomain Modeling for Personalized Learning
Domain Modeling for Personalized LearningPeter Brusilovsky
 
Enabling e labs experiments delivery using Moodle LMS
Enabling e labs experiments delivery using Moodle LMSEnabling e labs experiments delivery using Moodle LMS
Enabling e labs experiments delivery using Moodle LMSMohamed EL Zayat
 
Measuring the user experience
Measuring the user experienceMeasuring the user experience
Measuring the user experienceAndres Baravalle
 
Lessons Learned from Testing Machine Learning Software
Lessons Learned from Testing Machine Learning SoftwareLessons Learned from Testing Machine Learning Software
Lessons Learned from Testing Machine Learning SoftwareChristian Ramirez
 
Using Qualtrics for Online Trainings
Using Qualtrics for Online TrainingsUsing Qualtrics for Online Trainings
Using Qualtrics for Online TrainingsShalin Hai-Jew
 
Course Possibilities & Architecture
Course Possibilities & ArchitectureCourse Possibilities & Architecture
Course Possibilities & ArchitectureFolajimi Fakoya
 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learningananth
 
Unit IV Software Engineering
Unit IV Software EngineeringUnit IV Software Engineering
Unit IV Software EngineeringNandhini S
 
Web-based Self- and Peer Assessment of Teachers Digital Competences
Web-based Self- and Peer Assessment of Teachers Digital CompetencesWeb-based Self- and Peer Assessment of Teachers Digital Competences
Web-based Self- and Peer Assessment of Teachers Digital CompetencesHans Põldoja
 
CEN6016-Chapter1.ppt
CEN6016-Chapter1.pptCEN6016-Chapter1.ppt
CEN6016-Chapter1.pptNelsonYanes6
 
Evaluation beyond the desktop
Evaluation beyond the desktopEvaluation beyond the desktop
Evaluation beyond the desktopThomas Grill
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Xavier Amatriain
 

Similaire à Practical machine learning - Part 1 (20)

Ibm colloquium 070915_nyberg
Ibm colloquium 070915_nybergIbm colloquium 070915_nyberg
Ibm colloquium 070915_nyberg
 
WWW2015 PHD Symposium
WWW2015 PHD SymposiumWWW2015 PHD Symposium
WWW2015 PHD Symposium
 
Simulation and modeling introduction.pptx
Simulation and modeling introduction.pptxSimulation and modeling introduction.pptx
Simulation and modeling introduction.pptx
 
Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...
Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...
Web-Based Self- and Peer-Assessment of Teachers’ Educational Technology Compe...
 
Domain Modeling for Personalized Learning
Domain Modeling for Personalized LearningDomain Modeling for Personalized Learning
Domain Modeling for Personalized Learning
 
Enabling e labs experiments delivery using Moodle LMS
Enabling e labs experiments delivery using Moodle LMSEnabling e labs experiments delivery using Moodle LMS
Enabling e labs experiments delivery using Moodle LMS
 
Lecture 1 (bce-7)
Lecture   1 (bce-7)Lecture   1 (bce-7)
Lecture 1 (bce-7)
 
Measuring the user experience
Measuring the user experienceMeasuring the user experience
Measuring the user experience
 
Building Comfort with MATLAB
Building Comfort with MATLABBuilding Comfort with MATLAB
Building Comfort with MATLAB
 
Lessons Learned from Testing Machine Learning Software
Lessons Learned from Testing Machine Learning SoftwareLessons Learned from Testing Machine Learning Software
Lessons Learned from Testing Machine Learning Software
 
Using Qualtrics for Online Trainings
Using Qualtrics for Online TrainingsUsing Qualtrics for Online Trainings
Using Qualtrics for Online Trainings
 
Course Possibilities & Architecture
Course Possibilities & ArchitectureCourse Possibilities & Architecture
Course Possibilities & Architecture
 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learning
 
Unit IV Software Engineering
Unit IV Software EngineeringUnit IV Software Engineering
Unit IV Software Engineering
 
Be cse
Be cseBe cse
Be cse
 
Web-based Self- and Peer Assessment of Teachers Digital Competences
Web-based Self- and Peer Assessment of Teachers Digital CompetencesWeb-based Self- and Peer Assessment of Teachers Digital Competences
Web-based Self- and Peer Assessment of Teachers Digital Competences
 
CEN6016-Chapter1.ppt
CEN6016-Chapter1.pptCEN6016-Chapter1.ppt
CEN6016-Chapter1.ppt
 
CEN6016-Chapter1.ppt
CEN6016-Chapter1.pptCEN6016-Chapter1.ppt
CEN6016-Chapter1.ppt
 
Evaluation beyond the desktop
Evaluation beyond the desktopEvaluation beyond the desktop
Evaluation beyond the desktop
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 

Plus de Traian Rebedea

An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeTraian Rebedea
 
AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5Traian Rebedea
 
Deep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesDeep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesTraian Rebedea
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringTraian Rebedea
 
How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...Traian Rebedea
 
A focused crawler for romanian words discovery
A focused crawler for romanian words discoveryA focused crawler for romanian words discovery
A focused crawler for romanian words discoveryTraian Rebedea
 
Detecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large CorporaDetecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large CorporaTraian Rebedea
 
Propunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitarePropunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitareTraian Rebedea
 
Automatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corporaAutomatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corporaTraian Rebedea
 
Relevance based ranking of video comments on YouTube
Relevance based ranking of video comments on YouTubeRelevance based ranking of video comments on YouTube
Relevance based ranking of video comments on YouTubeTraian Rebedea
 
Opinion mining for social media and news items in Romanian
Opinion mining for social media and news items in RomanianOpinion mining for social media and news items in Romanian
Opinion mining for social media and news items in RomanianTraian Rebedea
 
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...Traian Rebedea
 
Importanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuriImportanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuriTraian Rebedea
 
Web services for supporting the interactions of learners in the social web - ...
Web services for supporting the interactions of learners in the social web - ...Web services for supporting the interactions of learners in the social web - ...
Web services for supporting the interactions of learners in the social web - ...Traian Rebedea
 
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...Traian Rebedea
 
Conclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD SurveyConclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD SurveyTraian Rebedea
 
Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009Traian Rebedea
 
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009Traian Rebedea
 
Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009Traian Rebedea
 

Plus de Traian Rebedea (20)

An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
 
AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5
 
Deep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesDeep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profiles
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question Answering
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 
How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...
 
A focused crawler for romanian words discovery
A focused crawler for romanian words discoveryA focused crawler for romanian words discovery
A focused crawler for romanian words discovery
 
Detecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large CorporaDetecting and Describing Historical Periods in a Large Corpora
Detecting and Describing Historical Periods in a Large Corpora
 
Propunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitarePropunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitare
 
Automatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corporaAutomatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corpora
 
Relevance based ranking of video comments on YouTube
Relevance based ranking of video comments on YouTubeRelevance based ranking of video comments on YouTube
Relevance based ranking of video comments on YouTube
 
Opinion mining for social media and news items in Romanian
Opinion mining for social media and news items in RomanianOpinion mining for social media and news items in Romanian
Opinion mining for social media and news items in Romanian
 
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
 
Importanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuriImportanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuri
 
Web services for supporting the interactions of learners in the social web - ...
Web services for supporting the interactions of learners in the social web - ...Web services for supporting the interactions of learners in the social web - ...
Web services for supporting the interactions of learners in the social web - ...
 
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
 
Conclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD SurveyConclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD Survey
 
Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009
 
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
 
Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009
 

Dernier

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Dernier (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Practical machine learning - Part 1

  • 1. Practical Machine Learning (Part I) Traian Rebedea Faculty of Automatic Control and Computers, UPB traian.rebedea@cs.pub.ro
  • 2. Outline • About me • Why machine learning? • Basic notions of machine learning • Sample problems and solutions – Finding vandalism on Wikipedia – Sexual predator detection – Identification of buildings on a map – Change in the meaning of concepts over time • Practical ML – R – Python (sklearn) – Java (Weka) – Others 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 3. About me • Started working in 2003 (year 2 of studies) – J2EE developer • Then created my own company with a colleague (in 2005) – Web projects (PHP, Javascript, etc.), involved in some startups for the technological aspects (e.g. Happyfish TV) • Started teaching at A&C, UPB (in 2006) – TA for Algorithms, Natural Language Processing • Started my PhD (in 2007) – Natural Language Processing, Discourse Analysis, Tehnology-Enhanced Learning • Now I am teaching several lectures: Algorithm Design, Algorithm Design and Complexity, Symbolic and Statistical Learning, Information Retrieval • Working on several topics in NLP: opinion mining, conversational agents, question-answering, culturonomics • Interested in all new projects that use NLP, ML and IR as well as any other “smart” applications 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 4. Why machine learning? • “Simply put, machine learning is the part of artificial intelligence that actually works” (Forbes, link) • Most popular course on Coursera • Popular? – Meaning practical: ML results are nice and easy to show to others – Meaning well-paid/in demand: companies are increasing demand for ML specialists • Large volumes of data, corpora, etc. • Computer science, data science, almost any other domain (especially in research) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 5. Why ML? • Mix of Computer science and Math (Statistics) skills • Why is math essential to ML? – http://courses.washington.edu/css490/2012.Winter/lecture_slides/02 _math_essentials.pdf • “To get really useful results, you need good mathematical intuitions about certain general machine learning principles, as well as the inner workings of the individual algorithms.” 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 6. Basic Notions of ML • Different tasks in ML • Supervised – Regression – Classification • Unsupervised – Clustering – Dimensionality reduction / visualization • Semi-supervised – Label propagation • Reinforcement learning • Deep learning 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 7. Supervised learning • Dataset consisting of labeled examples • Each example has one or several attributes and a dependent variable (label, output, response, predicted variable) • After building, selection and assessing the model on the labeled data, it is used to find labels for new data 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 8. Supervised learning • Dataset is usually split into three parts – A rule is 50-25-25 • Training set – Used to build models – Compute the basic parameters of the model • Holdout/validation set – Used to find out the best model and adjust some parameters of the model (usually, in order to minimize some error) – Also called model selection • Test set – Used to assess the final model on new data, after model building and selection 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 10. Cross-validation If no of partitions (S) = no of training instances  leave-one-out (only for small datasets) Main disadvantages: - no of runs is increased by a factor of S - Multiple parameters for the same model for different runs 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 11. Regression • Supervised learning when the output variable is continous – This is the usual textbook definition – Also may be used for binary output variables (e.g. Not-spam=0, Spam=1) or output variables that have an ordering (e.g. integer numbers, sentiment levels, etc.) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 12. • Source: http://smlv.cc.gatech.edu/2010/10/06/linear-regression-and-least- squares-estimation/ 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 13. Classification • Supervised learning when the output variable is discrete (classes) – Can also be used for binary or integer outputs – Even if they are ordered, but classification usually does not account for the order 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 14. • CIFAR dataset: http://www.cs.toronto.edu/~kriz/cifar.html (also image source) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 15. Classification • Several important types – Binary vs. multiple classes – Hard vs. soft – Single-label vs. multi-label (per example) – Balanced vs. unbalanced classes • Extensions that are semi-supervised – One-class classification – PU (Positive and Unlabeled) learning 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 16. Unsupervised learning • For an unlabeled dataset, infer properties (find “hidden” structure) about the distribution of the objects in the dataset – without any help related to correct answers • Generally, much more data than for supervised learning • There are several techniques that can be applied – Clustering (cluster analysis) – Dimensionality reduction (visualization of data) – Association rules, frequent itemsets 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 17. Clustering • Partition dataset into clusters (groups) of objects such that the ones in the same cluster are more similar to each other than to objects in different clusters • Similarity – The most important notion when clustering – “Inverse of a distance” • Possible objective: approximate the modes of the input dataset distribution 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 18. Clustering • Lots of different cluster models: – Connectivity (distance models) (e.g Hierarchical) – Centroid (e.g. K-means) – Distribution (e.g GMM) – Density of data (e.g. DBSCAN) – Graph (e.g. cliques, community detection) – Subspace (e.g. biclustering) – … 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 19. • Source: http://genomics10.bu.edu/terrence/gems/gems.html 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 20. • Source: https://sites.google.com/site/statsr4us/advanced/multivariate/cluster 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 21. Dimensionality reduction • Usually, unsupervised datasets are large both in number of examples and in number of attributes (features) • Reduce the number of features in order to improve (human) readability and interpretation of data • Mainly linked to visualization – Also used for feature selection, data compression or denoising 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 22. • Source: http://www.ifs.tuwien.ac.at/ifs/research/pub_pdf/rau_ecdl01.pdf 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 23. Semi-supervised learning • Mix of labeled and unlabeled data in the dataset • Enlarge the labeled dataset with unlabeled instances for which we might determine the correct label with a high probability • Unlabeled data is much cheaper or simpler to get than labeled data – Human annotation either requires experts or is not challenging (it is boring) – Annotation may also require specific software or other types of devices – Students (or Amazon Mturk users) are not always good annotators • More information: http://pages.cs.wisc.edu/~jerryzhu/pub/sslicml07.pdf 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 24. Label propagation • Start with the labeled dataset • Want to assign labels to some (or all) of the unlabeled instances • Assumption: “data points that are close have similar labels” • Build a fully connected graph with both labeled and unlabeled instances as vertices • The weight of an edge represents the similarity between the points (or inverse of a distance) • Repeat until convergence – Propagate labels through edges (larger weights allow a more quick propagation) – Nodes will have soft labels • More info: http://lvk.cs.msu.su/~bruzz/articles/classification/zhu02learning.pdf 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 25. • Source: http://web.ornl.gov/sci/knowledgediscovery/Projects.htm 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 26. Basic Notions of ML • How to asses the performance of a model? • Used to choose the best model • Supervised learning – Assessing errors of the model on the validation and test sets 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 27. Performance measures - supervised • Regression – Mean error, mean squared error (MSE), root MSE, etc. • Classification – Accuracy – Precision / recall (possibly weighted) – F-measure – Confusion matrix – TP, TN, FP, FN – AUC / ROC 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 28. • Source: http://www.resacorp.com/lsmse2.htm 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 29. • Source: http://en.wikipedia.org/wiki/Sensitivity_and_specificity 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 30. Basic Notions of ML • Overfitting – Low error on training data – High error on (unseen/new) test data – The model does not generalize well from the training data on unseen data – Possible causes: • Too few training examples • Model too complex (too many parameters) • Underfitting – High error both on training and test data – Model is too simple to capture the information in the training data 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 31. • Source: http://gerardnico.com/wiki/data_mining/overfitting 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 32. Sample problems and solutions • Chosen from prior experience • May not be the most interesting problems for all (any) of you • I have tried to present 4 problems that are somehow diverse in solution and simple to explain • All solutions are state-of-the-art (that means not the very best one, but among the top) – Very best solution may be found in research papers 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 33. Finding vandalism on Wikipedia • Work with Dan Cioiu • Vandalism = editing in a malicious manner that is intentionally disruptive • Dataset used at PAN 2011 – Real articles extracted from Wikipedia in 2009 and 2010 – Training data: 32,000 edits – Test data: 130,000 edits • An edit consists of: – the old and new content of the article – author information – revision comment and timestamp • We chose to use only the actual text in the revision (no author and timestamp information) to predict vandalism 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 34. Features 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 35. Using several classifiers 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 36. Cross-validation results 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 37. Combining results using a meta- classifier 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 38. Final results on test set • Source: http://link.springer.com/chapter/10.1007%2F978-3-642-40495-5_32 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 39. Main conclusions • Feature extraction is important in ML – Which are the best features? • Syntactic and context analysis are the most important • Semantic analysis is slow but increases detection by 10% • Using features related to users would further improve our results • A meta-classifier sometimes improves the results of several individual classifiers – Not always! Maybe another discussion on this… 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 40. Sexual predator detection • Work with Claudia Cardei • Automatically detect sexual predators in a chat discussion in order to alert the parents or the police • Corpus from PAN 2012 – 60000+ chat conversations (around 10000 after removing outliers) – 90000+ users – 142 sexual predators • Sexual predator is used “to describe a person seen as obtaining or trying to obtain sexual contact with another person in a metaphorically predatory manner” (Wikipedia) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 41. Problem description • Decide either a user is a predator or not • Use only the chats that have a predator (balanced corpus: 50% predators - 50% victims) • One document for each user • How to determine the features? 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 42. Simple solution • “Bag of words” (BoW) features with tf-idf weighting • Feature extraction using mutual information • MLP (neural network) with 500 features: 89% 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 43. Going beyond • The should be other features that are discriminative for predators • Investigated behavioral features: – Percentage of questions and inquiries initiated by an user – Percentage of negations – Expressions that might denote an underage user (“don’t know”, “never did”,…) – The age of the user if found in a chat reply (“asl” like answers) – Percentage of slang words – Percentage of words with a sexual connotation – Readability scores (Flesch) – Who started the conversation • AdaBoost (with DecisionStump): 90% • Random Forest: 93% 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 44. Main conclusions • Feature selection is useful when the number of features is large – Reduced from 10k+ features to hundreds/thousands – We used MI, there are others techniques • Fewer features can be as descriptive as (or even more descriptive than) the BoW model – Difficult to choose – Need additional computations and information – Need to understand the dataset 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 45. Identification of buildings on a map • Work with Alexandra Vieru, Paul Vlase • Given an aerial image, identify the buildings • Non-military purposes: – Estimate population density – Identify main landmarks (shopping malls, etc.) – Identify new buildings over time • Features – HOG descriptors – 3x3 cells – Histogram of RGB – 5 bins, normalized 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 46. 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 47. Preliminary results • SVM classifier • Train: 2276 images, Test: 637 images 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 48. Main conclusions • Verify which are the best features – If this is possible • Must verify for overfitting using train and test corpus • Each task can have a baseline (a simple classification) method – Here it might be SVM with RGB histogram features • Actually this could be treated as a one-class classification (or PU learning) problem – There are much more negative instances than positive ones 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 49. Change in the meaning of concepts over time • Work with Costin Chiru, Bianca Dinu, Diana Mincu • Assumption: – Different meanings of words should be identifiable based on the other words that are associated with them – These other words are changing over time thus denoting a change in the meaning (or usage of the meanings) • Using the Google Books n-grams corpus – Dependency information – 5-grams information centered on the analyzed word 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 50. • Source: https://books.google.com/ngrams (Query: “gay=>*_NOUN”) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 51. • Clustering using tf-idf over dependencies of the analyzed word • Each distinct year is considered a different document and each dependency is a feature/term • Example for the word “gay”: – Cluster 1 (1600 - 1925): signature {more, lively, happy, brilliant, cheerful} – Cluster 2 (1923-1969): signature {more, lively, happy, cheerful, be} – Cluster 3 (1981 - today): signature {lesbian, bisexual, anti, enolum, straight} 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 52. Practical ML • Brief discussion of alternatives that might be used by a programmer • The main idea is to show that it is simple to start working on a ML task • However, it is more difficult to achieve very good results – Experience – Theoretical underpinnings of models – Statistical background 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 53. R language • Designed for “statistical computing” • Lots of packages and functions for ML & statistics • Simple to output visualizations • Easy to write code, short • It is kind of slow on some tasks • Need to understand some new concepts: data frame, factor, etc. • You may find the same functionality in several packages (e.g. NaïveBayes, Cohen’s Kappa, etc.) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 54. Python (sklearn) • sklearn (scikit-learn) package developed for ML • Uses existing packages numpy, scipy • Started as a GSOC project in 2007 • It is new, implemented efficiently (time and memory), lots of functionality • Easy to write code – Naming conventions, parameters, etc. 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 55. • Source: http://peekaboo-vision.blogspot.ro/2013/01/machine-learning-cheat- sheet-for-scikit.html 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 56. Java (Weka) • Similar to scikit-learn for Python • Older (since 1997) and more stable than scikit-learn • Some implementations are (were) a little bit more inefficient, especially related to memory consumption • However, usually it is fast, accurate, and has lots of other useful utilities • It also has a GUI for getting some results fast without writing any code • Lots of people are debating whether Weka or scikit-learn is better. I don’t want to get into this discussion as it depends on your task and resources. 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 57. Not discussed • RapidMiner – Easy to use – GUI interface for building the processing pipeline – Can write some code/plugins • Matlab/Octave – Numeric computing – Paid vs. free • SPSS/SAS/Stata – Somehow similar to R, faster, more robust, expensive • Many others: – Apache Mahout / MALLET / NLTK / Orange / MLPACK – Specific solutions for various ML models 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 58. Conclusions • Lots of interesting ML tasks and data available – Do you have access to new and interesting data in your business? • Several easy to use programming alternatives • Understand the basics • Enhance your maths (statistics) skills for a better ML experience • Hands-on experience and communities of practice – www.kaggle.com (and other similar initiatives) 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49
  • 59. Thank you! traian.rebedea@cs.pub.ro 29 May 2014 Practical Machine Learning (Part 1) Softbinator Talks #49 _____ _____