Introduction to 
Machine Learning 
September 2014 Meetup 
Rahul Jain 
@rahuldausa 
Join us @ For Solr, Lucene, Elasticsearch, Machine Learning, IR 
http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/ 
http://www.meetup.com/DataAnalyticsGroup/ 
Join us @ For Hadoop, Spark, Cascading, Scala, NoSQL, Crawlers and all cutting edge technologies. 
http://www.meetup.com/Hyderabad-Programming-Geeks-Group/
Agenda 
• Introduction 
• Basics 
• Classification 
• Clustering 
• Regression 
• Use-Cases 
Quick Questionnaire 
How many people have heard about Machine Learning? 
How many people know about Machine Learning? 
How many people are using Machine Learning?
About 
• A subfield of Artificial Intelligence (AI) 
• The name comes from the idea that it deals with the 
“construction and study of systems that can learn from data” 
• Can be seen as a set of building blocks for making computers 
behave more intelligently 
• It is a concept rather than a single algorithm: there are various 
techniques, each with various implementations. 
• http://en.wikipedia.org/wiki/Machine_learning
In other words… 
“A computer program is said to learn from 
experience (E) with respect to some class of tasks (T) and a 
performance measure (P) if its performance at tasks 
in T, as measured by P, improves with experience E” 
(Tom Mitchell, Machine Learning, 1997)
Terminology 
• Features 
– The distinct traits that can be used to describe each item in a 
quantitative manner. 
• Samples 
– A sample is an item to process (e.g. classify). It can be a document, a 
picture, a sound, a video, a row in database or CSV file, or whatever 
you can describe with a fixed set of quantitative traits. 
• Feature vector 
– An n-dimensional vector of numerical features that represents some 
object. 
• Feature extraction 
– Preparation of the feature vector 
– Transforms data from a high-dimensional space into a space of 
fewer dimensions. 
• Training/Evaluation set 
– Set of data used to discover potentially predictive relationships.
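To make the terms above concrete, here is a minimal sketch of feature extraction. The sample items, trait names (`color`, `width_cm`, `is_fruit`) and the color vocabulary are hypothetical, chosen only for illustration. It turns each sample into a fixed-length numeric feature vector.

```python
# Assumed, fixed vocabulary of colors (illustrative only).
COLORS = ["red", "yellow", "blue"]


def extract_features(sample):
    """Turn one raw sample (a dict of traits) into a numeric feature vector."""
    # One-hot encode the color so every sample has the same dimensions.
    color_part = [1.0 if sample["color"] == c else 0.0 for c in COLORS]
    # Append the remaining traits as plain numbers.
    return color_part + [float(sample["width_cm"]), float(sample["is_fruit"])]


# Hypothetical samples.
apple = {"color": "red", "width_cm": 8, "is_fruit": True}
banana = {"color": "yellow", "width_cm": 18, "is_fruit": True}

print(extract_features(apple))   # [1.0, 0.0, 0.0, 8.0, 1.0]
print(extract_features(banana))  # [0.0, 1.0, 0.0, 18.0, 1.0]
```

Every sample ends up as a 5-dimensional vector, which is what the later techniques (classification, clustering, regression) expect as input.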
Let’s dig deep into it… 
What do you mean by 
Apple
Learning (Training) 
Features: 
1. Color: Reddish/Red 
2. Type : Fruit 
3. Shape 
etc… 
Features: 
1. Sky Blue 
2. Logo 
3. Shape 
etc… 
Features: 
1. Yellow 
2. Fruit 
3. Shape 
etc…
Workflow
Categories 
• Supervised Learning 
• Unsupervised Learning 
• Semi-Supervised Learning 
• Reinforcement Learning
Supervised Learning 
• the correct classes of the training data are 
known 
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Unsupervised Learning 
• the correct classes of the training data are not 
known 
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Semi-Supervised Learning 
• A Mix of Supervised and Unsupervised learning 
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Reinforcement Learning 
• allows the machine or software agent to learn its 
behavior based on feedback from the environment. 
• This behavior can be learnt once and for all, or keep on 
adapting as time goes by. 
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Machine Learning Techniques
Techniques 
• classification: predict class from observations 
• clustering: group observations into 
“meaningful” groups 
• regression (prediction): predict value from 
observations
Classification 
• Classify a document into a predefined category. 
• Documents can be text or images 
• A popular one is the Naive Bayes classifier. 
• Steps: 
– Step 1: Train the program (build a model) using a training set in 
which each document is labeled with a category, e.g. sports, cricket, news 
– For each word, the classifier computes the probability that it 
makes a document belong to each of the considered categories 
– Step 2: Test the model against a test data set 
• http://en.wikipedia.org/wiki/Naive_Bayes_classifier
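The two steps above can be sketched as a small multinomial Naive Bayes text classifier. This is an illustrative sketch, not the implementation used in the talk: the training sentences are made up, tokenization is a plain whitespace split, and add-one (Laplace) smoothing is an assumption added so that unseen words do not produce zero probabilities.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Step 1: build a model (word counts per category) from a training set."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of documents
    vocabulary = set()
    for text, category in labeled_docs:
        words = text.lower().split()
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocabulary.update(words)
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Step 2: return the category with the highest log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        # Prior: how common the category is in the training set.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical training set with two categories.
training_set = [
    ("the batsman scored a century", "cricket"),
    ("the bowler took five wickets", "cricket"),
    ("the election results were announced", "news"),
    ("the minister announced a new policy", "news"),
]
model = train(training_set)
print(classify("the bowler scored", model))                # cricket
print(classify("policy announced after election", model))  # news
```

Working in log-probabilities avoids numeric underflow when many small word probabilities are multiplied together.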
Clustering 
• Clustering is the task of grouping a set of objects in 
such a way that objects in the same group (called 
a cluster) are more similar to each other than to 
those in other groups 
• The groups are not predefined 
• E.g. these keywords 
– “man’s shoe” 
– “women’s shoe” 
– “women’s t-shirt” 
– “man’s t-shirt” 
– can be clustered into 2 categories, “shoe” and “t-shirt”, or 
“man” and “women” 
• Popular ones are K-means clustering and Hierarchical 
clustering
K-means Clustering 
• partition n observations into k clusters in which each observation belongs 
to the cluster with the nearest mean, serving as a prototype of the cluster. 
• http://en.wikipedia.org/wiki/K-means_clustering 
http://pypr.sourceforge.net/kmeans.html
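The "nearest mean" idea can be sketched with the standard iterative procedure (Lloyd's algorithm). The 2-D points and the two starting centroids below are made-up values. Picking the initial centroids by hand is an assumption that keeps the example deterministic, whereas real implementations usually choose them randomly.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and mean-update steps until the means stop moving."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical observations forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centroids, final_clusters = kmeans(points, [(1, 1), (8, 8)])
print(final_centroids)
print(final_clusters)
```

Here k = 2 and each returned centroid is the mean of its cluster, serving as the cluster's prototype.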
Hierarchical clustering 
• method of cluster analysis which seeks to build 
a hierarchy of clusters. 
• There can be two strategies 
– Agglomerative: 
• This is a "bottom up" approach: each observation starts in its own 
cluster, and pairs of clusters are merged as one moves up the 
hierarchy. 
• Time complexity of the standard algorithm is O(n^3) 
– Divisive: 
• This is a "top down" approach: all observations start in one cluster, 
and splits are performed recursively as one moves down the 
hierarchy. 
• Time complexity with an exhaustive search is O(2^n) 
• http://en.wikipedia.org/wiki/Hierarchical_clustering
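The agglomerative ("bottom up") strategy can be sketched in a few lines. This is a deliberately naive version on made-up one-dimensional points. The choice of single linkage (distance between the two closest members) is an assumption, since the slides do not name a linkage criterion.

```python
def agglomerative(points, target_clusters):
    """Naive bottom-up clustering of 1-D points using single linkage."""
    clusters = [[p] for p in points]  # each observation starts in its own cluster
    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# Hypothetical observations.
print(agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], 3))
# [[1.0, 1.5], [5.0, 5.2], [9.0]]
```

Each merge rescans every pair of clusters, and up to n merges are needed. That repeated pair search is why naive implementations scale so poorly with the number of observations.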
Regression 
• Regression is a measure of the relation between 
the mean value of one variable (e.g. 
output) and corresponding values of 
other variables (e.g. time and cost). 
• Regression analysis is a statistical 
process for estimating the 
relationships among variables. 
• In machine learning, regression means predicting the 
output value using training data. 
• A popular one is logistic regression 
(binary regression), which models the probability of a 
binary outcome and is therefore often used for classification 
• http://en.wikipedia.org/wiki/Logistic_regression
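Predicting an output value from training data can be sketched with ordinary least squares on a single input variable. Note that this swaps in simple linear regression rather than the logistic regression linked above, because the output here is a real number. The house sizes and prices are made-up figures.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: y is approximated by slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: house size -> price (arbitrary units).
sizes = [50, 60, 80, 100]
prices = [150, 180, 240, 300]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 70 + intercept  # predict the price of an unseen size
print(slope, intercept, predicted)
```

The fitted line is the "model"; prediction for a new input is just evaluating that line.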
Classification vs Regression 
• Classification means grouping the output into a class. 
• Example: predicting the type of a tumor, i.e. harmful or not harmful, 
using training data 
• If the output is a discrete/categorical variable, it is a 
classification problem 
• Regression means predicting the output value using training data. 
• Example: predicting a house price from training data 
• If the output is a real number/continuous, it is a regression 
problem.
Let’s see the usage in Real life
Use-Cases 
• Spam Email Detection 
• Machine Translation (Language Translation) 
• Image Search (Similarity) 
• Clustering (KMeans) : Amazon 
Recommendations 
• Classification : Google News 
continued…
Use-Cases (contd.) 
• Text Summarization - Google News 
• Rating a Review/Comment: Yelp 
• Fraud detection : Credit card Providers 
• Decision Making : e.g. Bank/Insurance sector 
• Sentiment Analysis 
• Speech Understanding – iPhone with Siri 
• Face Detection – Facebook’s Photo tagging
Classification in Action 
isn’t it easy?
it’s not (snapshot of a Spam folder; two of the messages in it are marked “Not a Spam”)
NER (Named Entity Recognition) 
http://nlp.stanford.edu:8080/ner/process
Similar/Duplicate Images 
Remember 
Features ? 
(Feature Extraction) 
Can be : 
• Width 
• Height 
• Contrast 
• Brightness 
• Position 
• Hue 
• Colors 
Credit: https://www.google.co.in/ 
Check this : 
LIRE (Lucene Image REtrieval) 
library - 
https://code.google.com/p/lire/
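As a rough illustration of how such features can be used to find similar or duplicate images, the sketch below computes a small feature vector (width, height, brightness, contrast) for tiny grayscale images and compares the vectors by Euclidean distance. The pixel grids are made up, and this is not how LIRE works internally; it only illustrates the feature list above.

```python
import math
from statistics import mean, pstdev


def image_features(pixels):
    """Feature vector for a grayscale image given as rows of 0-255 values."""
    flat = [value for row in pixels for value in row]
    width, height = len(pixels[0]), len(pixels)
    brightness = mean(flat)   # average intensity
    contrast = pstdev(flat)   # spread of intensities
    return [width, height, brightness, contrast]


def distance(a, b):
    """Euclidean distance between feature vectors; smaller means more similar."""
    return math.dist(a, b)


# Hypothetical 2x2 images: img_b is a near-duplicate of img_a, img_c is not.
img_a = [[10, 20], [30, 40]]
img_b = [[12, 18], [32, 38]]
img_c = [[200, 220], [240, 250]]

fa, fb, fc = (image_features(img) for img in (img_a, img_b, img_c))
print(distance(fa, fb))  # small: likely similar/duplicate
print(distance(fa, fc))  # large: different image
```

In practice the features would be scaled to comparable ranges before measuring distance, so that no single trait dominates.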
Recommendations 
http://www.webdesignerdepot.com/2009/10/an-analysis-of-the-amazon-shopping-experience/
Popular Frameworks/Tools 
• Weka 
• Carrot2 
• GATE 
• OpenNLP 
• LingPipe 
• Stanford NLP 
• Mallet – Topic Modelling 
• Gensim – Topic Modelling (Python) 
• Apache Mahout 
• MLlib – Apache Spark 
• scikit-learn - Python 
• LIBSVM : Support Vector Machines 
• and many more…
Advanced concepts (related to IR) 
• Topic Modelling 
• Latent Dirichlet allocation (LDA) 
• Latent semantic analysis (LSA/LSI) - Semantic 
Search 
• Singular Value Decomposition (SVD) 
• Summarization (without Training)
Solr/Lucene Meetup 
• Case study of Rujhaan.com 
(A social news app ) 
• Saturday, Sep 27, 2014 10:00 AM 
• IIIT Hyderabad 
• URL: http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/events/203434032/ 
OR 
• Search on Google … 
Topics of Talk 
• Crawler (Crawler4j) 
• MongoDB 
• Solr 
• Nginx, Apache Tomcat 
• Redis 
• Machine Learning 
1. Classification - Classification of News, Tweets - LingPipe 
2. Clustering - Similar Items - Carrot2 (Near Future: Hadoop and Apache Spark) 
3. Summarization - Extracting the main text with Automatic Summary of article 
4. Topics Extraction from text
Questions ? 
Thanks! 
@rahuldausa on twitter and slideshare 
http://www.linkedin.com/in/rahuldausa 
Interested in Search/Information Retrieval ? 
Join us @ http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/ 

Contenu connexe

Tendances

Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningLior Rokach
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Marina Santini
 
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Simplilearn
 
Machine Learning
Machine LearningMachine Learning
Machine LearningRahul Kumar
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learningbutest
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learningTonmoy Bhagawati
 
Machine Learning
Machine LearningMachine Learning
Machine LearningKumar P
 
Machine Learning Algorithms
Machine Learning AlgorithmsMachine Learning Algorithms
Machine Learning AlgorithmsDezyreAcademy
 
Machine learning ppt
Machine learning ppt Machine learning ppt
Machine learning ppt Poojamanic
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningEng Teong Cheah
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its ApplicationsDr Ganesh Iyer
 
Machine Learning Using Python
Machine Learning Using PythonMachine Learning Using Python
Machine Learning Using PythonSavitaHanchinal
 

Tendances (20)

Machine learning
Machine learning Machine learning
Machine learning
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
 
Machine Can Think
Machine Can ThinkMachine Can Think
Machine Can Think
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Machine Learning Algorithms
Machine Learning AlgorithmsMachine Learning Algorithms
Machine Learning Algorithms
 
Machine learning ppt
Machine learning ppt Machine learning ppt
Machine learning ppt
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its Applications
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Machine Learning Using Python
Machine Learning Using PythonMachine Learning Using Python
Machine Learning Using Python
 
ML Basics
ML BasicsML Basics
ML Basics
 
Deep learning
Deep learningDeep learning
Deep learning
 

En vedette

Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...PAPIs.io
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.pptbutest
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksBICA Labs
 
Introduction to Digital Image Processing Using MATLAB
Introduction to Digital Image Processing Using MATLABIntroduction to Digital Image Processing Using MATLAB
Introduction to Digital Image Processing Using MATLABRay Phan
 
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ..."Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...Edge AI and Vision Alliance
 
Image Processing Basics
Image Processing BasicsImage Processing Basics
Image Processing BasicsNam Le
 
8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set
8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set
8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction setSaumitra Rukmangad
 
8085 microprocessor architecture ppt
8085 microprocessor architecture ppt8085 microprocessor architecture ppt
8085 microprocessor architecture pptParvesh Gautam
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
 
Linux command ppt
Linux command pptLinux command ppt
Linux command pptkalyanineve
 
Digital Image Processing
Digital Image ProcessingDigital Image Processing
Digital Image ProcessingSahil Biswas
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications Ahmed_hashmi
 
Stock exchange simple ppt
Stock exchange simple pptStock exchange simple ppt
Stock exchange simple pptAvinash Varun
 

En vedette (20)

Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
 
Machine Learning and Artificial Intelligence
Machine Learning and Artificial IntelligenceMachine Learning and Artificial Intelligence
Machine Learning and Artificial Intelligence
 
supervised learning
supervised learningsupervised learning
supervised learning
 
Basic web dev
Basic web devBasic web dev
Basic web dev
 
Machine learning
Machine learningMachine learning
Machine learning
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.ppt
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
 
Introduction to Digital Image Processing Using MATLAB
Introduction to Digital Image Processing Using MATLABIntroduction to Digital Image Processing Using MATLAB
Introduction to Digital Image Processing Using MATLAB
 
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ..."Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
"Deep Learning and Vision Algorithm Development in MATLAB Targeting Embedded ...
 
Future technology
Future technologyFuture technology
Future technology
 
Image Processing Basics
Image Processing BasicsImage Processing Basics
Image Processing Basics
 
8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set
8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set
8085 Paper Presentation slides,ppt,microprocessor 8085 ,guide, instruction set
 
8085 microprocessor architecture ppt
8085 microprocessor architecture ppt8085 microprocessor architecture ppt
8085 microprocessor architecture ppt
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
Linux command ppt
Linux command pptLinux command ppt
Linux command ppt
 
Digital Image Processing
Digital Image ProcessingDigital Image Processing
Digital Image Processing
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications
 
IoT architecture
IoT architectureIoT architecture
IoT architecture
 
Stock exchange simple ppt
Stock exchange simple pptStock exchange simple ppt
Stock exchange simple ppt
 

Similaire à Introduction to Machine Learning

Introduction to Machine learning ppt
Introduction to Machine learning pptIntroduction to Machine learning ppt
Introduction to Machine learning pptshubhamshirke12
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Lucidworks
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningJoaquin Delgado PhD.
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningS. Diana Hu
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for EveryoneAly Abdelkareem
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Lucidworks
 
Facets and Pivoting for Flexible and Usable Linked Data Exploration
Facets and Pivoting for Flexible and Usable Linked Data ExplorationFacets and Pivoting for Flexible and Usable Linked Data Exploration
Facets and Pivoting for Flexible and Usable Linked Data ExplorationRoberto García
 
Handout on Object orienetd Analysis and Design
Handout on Object orienetd Analysis and DesignHandout on Object orienetd Analysis and Design
Handout on Object orienetd Analysis and DesignSAFAD ISMAIL
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic WebRoberto García
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Rahul Jain
 
Analyzing a system and specifying the requirements
Analyzing a system and specifying the requirementsAnalyzing a system and specifying the requirements
Analyzing a system and specifying the requirementsvikramgopale2
 
Object Modelling Technique " ooad "
Object Modelling Technique  " ooad "Object Modelling Technique  " ooad "
Object Modelling Technique " ooad "AchrafJbr
 
Machine Learning Innovations
Machine Learning InnovationsMachine Learning Innovations
Machine Learning InnovationsHPCC Systems
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineTrey Grainger
 
Object Oriented Programming
Object Oriented ProgrammingObject Oriented Programming
Object Oriented ProgrammingManish Pandit
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
 

Similaire à Introduction to Machine Learning (20)

Introduction to Machine learning ppt
Introduction to Machine learning pptIntroduction to Machine learning ppt
Introduction to Machine learning ppt
 
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,...
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
 
Facets and Pivoting for Flexible and Usable Linked Data Exploration
Facets and Pivoting for Flexible and Usable Linked Data ExplorationFacets and Pivoting for Flexible and Usable Linked Data Exploration
Facets and Pivoting for Flexible and Usable Linked Data Exploration
 
Handout on Object orienetd Analysis and Design
Handout on Object orienetd Analysis and DesignHandout on Object orienetd Analysis and Design
Handout on Object orienetd Analysis and Design
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic Web
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )
 
Analyzing a system and specifying the requirements
Analyzing a system and specifying the requirementsAnalyzing a system and specifying the requirements
Analyzing a system and specifying the requirements
 
Object Modelling Technique " ooad "
Object Modelling Technique  " ooad "Object Modelling Technique  " ooad "
Object Modelling Technique " ooad "
 
Machine Learning Innovations
Machine Learning InnovationsMachine Learning Innovations
Machine Learning Innovations
 
Ooad unit – 1 introduction
Ooad unit – 1 introductionOoad unit – 1 introduction
Ooad unit – 1 introduction
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
 
Object Oriented Programming
Object Oriented ProgrammingObject Oriented Programming
Object Oriented Programming
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 

Plus de Rahul Jain

Flipkart Strategy Analysis and Recommendation
Flipkart Strategy Analysis and RecommendationFlipkart Strategy Analysis and Recommendation
Flipkart Strategy Analysis and RecommendationRahul Jain
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataRahul Jain
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to ScalaRahul Jain
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremRahul Jain
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrRahul Jain
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperRahul Jain
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersRahul Jain
 
Hibernate tutorial for beginners
Hibernate tutorial for beginnersHibernate tutorial for beginners
Hibernate tutorial for beginnersRahul Jain
 

Plus de Rahul Jain (14)

Flipkart Strategy Analysis and Recommendation
Flipkart Strategy Analysis and RecommendationFlipkart Strategy Analysis and Recommendation
Flipkart Strategy Analysis and Recommendation
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to Scala
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Hibernate tutorial for beginners
Hibernate tutorial for beginnersHibernate tutorial for beginners
Hibernate tutorial for beginners
 


Introduction to Machine Learning

  • 1. Introduction to Machine Learning September 2014 Meetup Rahul Jain @rahuldausa Join us @ For Solr, Lucene, Elasticsearch, Machine Learning, IR http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/ http://www.meetup.com/DataAnalyticsGroup/ Join us @ For Hadoop, Spark, Cascading, Scala, NoSQL, Crawlers and all cutting edge technologies. http://www.meetup.com/Hyderabad-Programming-Geeks-Group/
  • 2. Agenda • Introduction • Basics • Classification • Clustering • Regression • Use-Cases
  • 3. Quick Questionnaire How many people have heard about Machine Learning How many people know about Machine Learning How many people are using Machine Learning
  • 4. About • subfield of Artificial Intelligence (AI) • name is derived from the concept that it deals with “construction and study of systems that can learn from data” • can be seen as building blocks to make computers learn to behave more intelligently • It is a theoretical concept. There are various techniques with various implementations. • http://en.wikipedia.org/wiki/Machine_learning
  • 5. In other words… “A computer program is said to learn from experience (E) with respect to some class of tasks (T) and a performance measure (P) if its performance at tasks in T, as measured by P, improves with E”
  • 6. Terminology • Features – The distinct traits that can be used to describe each item in a quantitative manner. • Samples – A sample is an item to process (e.g. classify). It can be a document, a picture, a sound, a video, a row in a database or CSV file, or whatever you can describe with a fixed set of quantitative traits. • Feature vector – an n-dimensional vector of numerical features that represents some object. • Feature extraction – Preparation of the feature vector – transforms the data from a high-dimensional space to a space of fewer dimensions. • Training/Evaluation set – Set of data used to discover potentially predictive relationships.
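The terminology above can be made concrete with a small sketch. It assumes a bag-of-words representation, which is only one possible way to prepare a feature vector; the sample strings and the helper names `build_vocabulary` and `to_feature_vector` are made up for illustration and do not come from the slides.

```python
def build_vocabulary(samples):
    """Collect every distinct word across the samples, in sorted order."""
    return sorted({word for text in samples for word in text.lower().split()})


def to_feature_vector(text, vocabulary):
    """Turn one sample into an n-dimensional vector of word counts."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]


# Hypothetical samples: each string is one item to process.
samples = ["red apple fruit", "yellow banana fruit"]
vocabulary = build_vocabulary(samples)
vectors = [to_feature_vector(s, vocabulary) for s in samples]

print(vocabulary)
print(vectors)
```

Each position in a vector corresponds to one feature (here, one word), so every sample is described by the same fixed set of quantitative traits.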
  • 7. Let’s dig deep into it… What do you mean by Apple
  • 8. Learning (Training) Features: 1. Color: Reddish/Red 2. Type : Fruit 3. Shape etc… Features: 1. Sky Blue 2. Logo 3. Shape etc… Features: 1. Yellow 2. Fruit 3. Shape etc…
  • 10. Categories • Supervised Learning • Unsupervised Learning • Semi-Supervised Learning • Reinforcement Learning
  • 11. Supervised Learning • the correct classes of the training data are known Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
  • 12. Unsupervised Learning • the correct classes of the training data are not known Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
  • 13. Semi-Supervised Learning • A Mix of Supervised and Unsupervised learning Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
  • 14. Reinforcement Learning • allows the machine or software agent to learn its behavior based on feedback from the environment. • This behavior can be learnt once and for all, or keep on adapting as time goes by. Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
  • 16. Techniques • classification: predict class from observations • clustering: group observations into “meaningful” groups • regression (prediction): predict value from observations
  • 17. Classification • classify a document into a predefined category. • documents can be text or images • A popular technique is the Naive Bayes classifier. • Steps: – Step 1: Train the program (build a model) using a training set in which each document is labelled with a category, e.g. sports, cricket, news – For each word, the classifier computes the probability that the word occurs in documents of each considered category – Step 2: Test the model against a test data set • http://en.wikipedia.org/wiki/Naive_Bayes_classifier
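The two steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a multinomial Naive Bayes model with add-one (Laplace) smoothing; the training sentences, the category names and the function names `train` and `classify` are invented for the example and are not taken from the talk.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Step 1: build a model from (text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of documents
    vocab = set()
    for text, category in docs:
        words = text.lower().split()
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the category with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        # Log prior: how common the category is in the training set.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing avoids zero probabilities.
            p = (word_counts[category][word] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical training set with two categories.
training_set = [
    ("the batsman scored a century in the cricket match", "cricket"),
    ("the bowler took five wickets", "cricket"),
    ("the parliament passed the new budget bill", "news"),
    ("the minister announced a new policy", "news"),
]
model = train(training_set)

# Step 2: test the model on unseen text.
predicted = classify(model, "the bowler dismissed the batsman")
print(predicted)
```

Working in log-probabilities avoids numeric underflow when many small word probabilities are multiplied together.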
  • 18. Clustering • clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups • the groups are not predefined • For e.g. these keywords – “man’s shoe” – “women’s shoe” – “women’s t-shirt” – “man’s t-shirt” – can be clustered into 2 categories, “shoe” and “t-shirt” or “man” and “women” • Popular ones are K-means clustering and Hierarchical clustering
  • 19. K-means Clustering • partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. • http://en.wikipedia.org/wiki/K-means_clustering http://pypr.sourceforge.net/kmeans.html
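The "nearest mean" idea can be sketched with the standard iterative procedure (often called Lloyd's algorithm): assign each observation to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat until nothing changes. The version below is a simplified sketch, not a production implementation; seeding the centroids with the first k points is an assumption made to keep the example deterministic, and the sample points are made up.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Partition points (tuples of floats) into k clusters."""
    centroids = list(points[:k])  # assumption: seed with the first k points
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D observations forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

In practice the result depends on the initial centroids, which is why libraries usually offer smarter seeding and multiple restarts.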
  • 20. Hierarchical clustering • method of cluster analysis which seeks to build a hierarchy of clusters. • There can be two strategies – Agglomerative: • This is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. • Time complexity is O(n^3) – Divisive: • This is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy. • Time complexity is O(2^n) • http://en.wikipedia.org/wiki/Hierarchical_clustering
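The "bottom up" strategy can be sketched as follows. This is a naive illustration, assuming one-dimensional values and single linkage (the distance between two clusters is the distance between their closest members); both choices, the sample values and the function name `agglomerative` are assumptions for the example. Scanning all pairs before every merge is what makes the naive approach roughly cubic in the number of observations.

```python
def agglomerative(values, target_clusters):
    """Merge the two closest clusters until target_clusters remain."""
    clusters = [[v] for v in values]  # each observation starts in its own cluster
    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i].extend(clusters.pop(j))
    return clusters


# Hypothetical observations.
result = agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], target_clusters=3)
print(result)
```

Stopping at a target number of clusters is one way to cut the hierarchy; recording every merge instead would give the full tree.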
  • 21. Regression • is a measure of the relation between the mean value of one variable (e.g. output) and corresponding values of other variables (e.g. time and cost). • regression analysis is a statistical process for estimating the relationships among variables. • Regression means to predict the output value using training data. • Popular ones are linear regression (continuous output) and logistic regression (binary output; despite its name it is commonly used for classification) • http://en.wikipedia.org/wiki/Logistic_regression
  • 22. Classification vs Regression • Classification means to group the output into a class. • classification to predict the type of tumor i.e. harmful or not harmful using training data • if it is discrete/categorical variable, then it is classification problem • Regression means to predict the output value using training data. • regression to predict the house price from training data • if it is a real number/continuous, then it is regression problem.
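The house price case can be sketched with simple linear regression fitted by ordinary least squares. The areas and prices below are invented numbers chosen to lie exactly on a line, and the function name `fit_line` is hypothetical; real data would be noisy and would usually involve more than one feature.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: area in square metres, price in thousands.
areas = [50, 80, 100, 120]
prices = [70, 100, 120, 140]

slope, intercept = fit_line(areas, prices)

# Regression predicts a continuous value for an unseen house.
predicted_price = slope * 90 + intercept
print(predicted_price)
```

The output is a real number rather than a class label, which is the distinction the slide draws between regression and classification.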
  • 23. Let’s see the usage in Real life
  • 24. Use-Cases • Spam Email Detection • Machine Translation (Language Translation) • Image Search (Similarity) • Clustering (KMeans) : Amazon Recommendations • Classification : Google News continued…
  • 25. Use-Cases (contd.) • Text Summarization - Google News • Rating a Review/Comment: Yelp • Fraud detection : Credit card Providers • Decision Making : e.g. Bank/Insurance sector • Sentiment Analysis • Speech Understanding – iPhone with Siri • Face Detection – Facebook’s Photo tagging
  • 26. Classification in Action isn’t it easy?
  • 27. it’s not (Snapshot of Spam folder) Not a Spam Not a Spam
  • 28. NER (Named Entity Recognition) http://nlp.stanford.edu:8080/ner/process
  • 29. Similar/Duplicate Images Remember Features ? (Feature Extraction) Can be : • Width • Height • Contrast • Brightness • Position • Hue • Colors Credit: https://www.google.co.in/ Check this : LIRE (Lucene Image REtrieval) library - https://code.google.com/p/lire/
  • 31. Popular Frameworks/Tools • Weka • Carrot2 • Gate • OpenNLP • LingPipe • Stanford NLP • Mallet – Topic Modelling • Gensim – Topic Modelling (Python) • Apache Mahout • MLlib – Apache Spark • scikit-learn - Python • LIBSVM : Support Vector Machines • and many more…
  • 32. Advanced concepts (related to IR) • Topic Modelling • Latent Dirichlet allocation (LDA) • Latent semantic analysis (LSA/LSI) - Semantic Search • Singular Value Decomposition (SVD) • Summarization (without Training)
  • 33. Solr/Lucene Meetup • Case study of Rujhaan.com (a social news app) • Saturday, Sep 27, 2014 10:00 AM • IIIT Hyderabad • URL: http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/events/203434032/ OR • Search on Google … Topics of Talk • Crawler (Crawler4j) • MongoDB • Solr • Nginx, Apache Tomcat • Redis • Machine Learning 1. Classification - Classification of News, Tweets - Lingpipe 2. Clustering - Similar Items - carrot2 (Near Future: Hadoop and Apache Spark) 3. Summarization - Extracting the main text with Automatic Summary of article 4. Topics Extraction from text
  • 35. Thanks! @rahuldausa on twitter and slideshare http://www.linkedin.com/in/rahuldausa Interested in Search/Information Retrieval ? Join us @ http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/