MACHINE LEARNING AS
A SERVICE
MAKING SENTIMENT PREDICTIONS IN REALTIME WITH ZMQ
AND NLTK
ABOUT ME
DISSERTATION
Let's make something cool!
SOCIAL MEDIA
+
MACHINE
LEARNING
+
API
SENTIMENT ANALYSIS
AS A SERVICE
A STEP-BY-STEP GUIDE
Fundamental Topics
Machine Learning
Natural Language Processing
Overview of the platform
The process
Prepare
Analyze
Train
Use
Scale
MACHINE LEARNING
WHAT IS MACHINE LEARNING?
A method of teaching computers to make and improve predictions or behaviors based on some data.
It allows computers to evolve behaviors based on empirical data
Data can be anything
Stock market prices
Sensors and motors
Email metadata
SUPERVISED MACHINE LEARNING
SPAM OR HAM
NATURAL LANGUAGE PROCESSING
WHAT IS NATURAL LANGUAGE PROCESSING?
Interactions between computers and human languages
Extract information from text
Some NLTK features
Bigrams
Part-of-speech tagging
Tokenization
Stemming
WordNet lookup
NATURAL LANGUAGE PROCESSING
SOME NLTK FEATURES
Tokenization
Stopword Removal
>>> phrase = "I wish to buy specified products or service"
>>> phrase = nlp.tokenize(phrase)
>>> phrase
['I', 'wish', 'to', 'buy', 'specified', 'products', 'or', 'service']
>>> phrase = nlp.remove_stopwords(phrase)
>>> phrase
['I', 'wish', 'buy', 'specified', 'products', 'service']
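The `nlp` helper on the slide is not shown, so the following is only a minimal sketch of what tokenization and stopword removal could look like using the standard library. The function names mirror the slide, but the regular expression and the tiny stopword list are assumptions made for illustration (NLTK's English stopword list is much longer).

```python
import re

# Tiny illustrative stopword list (an assumption, not NLTK's list).
STOPWORDS = {"a", "an", "and", "or", "the", "to", "of", "is"}


def tokenize(phrase):
    """Split a phrase into word tokens, keeping apostrophes inside words."""
    return re.findall(r"[A-Za-z0-9']+", phrase)


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens whose lowercase form is in the stopword set."""
    return [token for token in tokens if token.lower() not in stopwords]


tokens = tokenize("I wish to buy specified products or service")
filtered = remove_stopwords(tokens)
print(filtered)
```

With this particular stopword list, "to" and "or" are dropped and the remaining tokens keep their original order.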
SENTIMENT ANALYSIS
CLASSIFYING TWITTER SENTIMENT IS HARD
Improper language use
Spelling mistakes
140 characters to express sentiment
Different types of English (US, UK, Pidgin)
Gr8 picutre..God bless u RT @WhatsNextInGosp:
Resurrection Sunday Service @PFCNY with
@Donnieradio pic.twitter.com/nOgz65cpY5
7:04 PM - 21 Apr 2014
Donnie McClurkin
@Donnieradio
BACK TO BUILDING OUR API
.. FINALLY!
CLASSIFIER
3 STEPS
THE DATASET
SENTIMENT140
160,000 labelled tweets
CSV format
Polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
The text of the tweet (Lyx is cool)
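As a rough sketch of how such a file could be turned into labelled training examples, the snippet below maps the polarity codes to label strings. It assumes the polarity is the first CSV column and the tweet text is the last one; the two sample rows and the `read_labelled_tweets` helper are made up for illustration.

```python
import csv
import io

# Polarity codes as listed on the slide, mapped to short label names.
POLARITY_LABELS = {"0": "neg", "2": "neutral", "4": "pos"}


def read_labelled_tweets(csv_file):
    """Yield (text, label) pairs from a Sentiment140-style CSV file object."""
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        polarity, text = row[0], row[-1]
        label = POLARITY_LABELS.get(polarity)
        if label is not None:
            yield text, label


# Two made-up rows standing in for the real dataset.
sample = io.StringIO(
    '"4","1","Mon May 11 2009","NO_QUERY","some_user","Lyx is cool"\n'
    '"0","2","Mon May 11 2009","NO_QUERY","other_user","such a horrible day"\n'
)
tweets = list(read_labelled_tweets(sample))
print(tweets)
```

Rows with an unknown polarity code are skipped rather than guessed at.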
FEATURE EXTRACTION
How are we going to find features from a phrase?
"Bag of Words" representation
my_phrase = "Today was such a rainy and horrible day"
In [12]: from nltk import word_tokenize
In [13]: word_tokenize(my_phrase)
Out[13]: ['Today', 'was', 'such', 'a', 'rainy', 'and', 'horrible', 'day']
FEATURE EXTRACTION
CREATE A PIPELINE OF FEATURE EXTRACTORS
FORMATTER = formatting.FormatterPipeline(
    formatting.make_lowercase,
    formatting.strip_urls,
    formatting.strip_hashtags,
    formatting.strip_names,
    formatting.remove_repetitons,
    formatting.replace_html_entities,
    formatting.strip_nonchars,
    functools.partial(
        formatting.remove_noise,
        stopwords=stopwords.words('english') + ['rt']
    ),
    functools.partial(
        formatting.stem_words,
        stemmer=nltk.stem.porter.PorterStemmer()
    )
)
FEATURE EXTRACTION
PASS THE REPRESENTATION DOWN THE PIPELINE
In [11]: feature_extractor.extract("Today was such a rainy and horrible day")
Out[11]: {'day': True, 'horribl': True, 'raini': True, 'today': True}
The result is a dictionary of variable length, with the features as keys and True as every value
DIMENSIONALITY REDUCTION
Remove features that are common across all classes (noise)
Increase performance of the classifier
Decrease the size of the model: less memory usage and more speed
DIMENSIONALITY REDUCTION
CHI-SQUARE TEST
NLTK gives us BigramAssocMeasures.chi_sq
DIMENSIONALITY REDUCTION
CHI-SQUARE TEST
# Calculate the number of words for each class
pos_word_count = label_word_fd['pos'].N()
neg_word_count = label_word_fd['neg'].N()
total_word_count = pos_word_count + neg_word_count
# For each word and its total occurrence count
for word, freq in word_fd.items():
    # Calculate a score for the positive class
    pos_score = BigramAssocMeasures.chi_sq(label_word_fd['pos'][word],
                                           (freq, pos_word_count), total_word_count)
    # Calculate a score for the negative class
    neg_score = BigramAssocMeasures.chi_sq(label_word_fd['neg'][word],
                                           (freq, neg_word_count), total_word_count)
    # The sum of the two gives the word's total score
    word_scores[word] = pos_score + neg_score
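To make the scoring step concrete without depending on NLTK, here is a small sketch that computes a Pearson chi-square statistic for a 2x2 table from the same four counts, sums it over the classes, and keeps the top-scoring words. The `score_words` helper, the toy counts and this version of `reduce_features` (which the slides use but do not define) are assumptions for illustration.

```python
from collections import Counter


def chi_sq(n_ii, n_ix, n_xi, n_xx):
    """Pearson chi-square for a 2x2 table built from marginal counts.

    n_ii: count of the word in the class, n_ix: count of the word overall,
    n_xi: count of all words in the class, n_xx: count of all words.
    """
    n_io = n_ix - n_ii  # this word, other classes
    n_oi = n_xi - n_ii  # other words, this class
    n_oo = n_xx - n_ii - n_io - n_oi  # other words, other classes
    denom = (n_ii + n_io) * (n_ii + n_oi) * (n_io + n_oo) * (n_oi + n_oo)
    if denom == 0:
        return 0.0
    return n_xx * (n_ii * n_oo - n_io * n_oi) ** 2 / denom


def score_words(label_counts):
    """Sum each word's chi-square score over all classes."""
    word_totals = Counter()
    for counts in label_counts.values():
        word_totals.update(counts)
    total = sum(word_totals.values())
    return {
        word: sum(
            chi_sq(counts[word], freq, sum(counts.values()), total)
            for counts in label_counts.values()
        )
        for word, freq in word_totals.items()
    }


def reduce_features(feature_vector, best_features):
    """Keep only the features that belong to the selected set."""
    return {w: v for w, v in feature_vector.items() if w in best_features}


# Toy word counts per class, made up for illustration.
label_counts = {
    "pos": Counter({"love": 4, "day": 3}),
    "neg": Counter({"horribl": 4, "day": 3}),
}
scores = score_words(label_counts)
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
best_features = {word for word, _ in ranked[:2]}
print(best_features)
```

In this toy data "day" appears equally often in both classes, so it scores zero and is dropped, while "love" and "horribl" each appear in only one class and are kept.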
TRAINING
Now that we can extract features from text, we can train a classifier. The simplest and most flexible learning algorithm for text classification is Naive Bayes
P(label | features) = P(label) * P(features | label) / P(features)
Simple to compute = fast
Assumes feature independence = easy to update
Supports multiclass = scalable
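The formula above can be sketched in a few lines of plain Python. This is a simplified illustration, not NLTK's implementation: it scores only the features present in the input, uses add-one smoothing (an assumption), works in log space to avoid underflow, and drops P(features) because it is the same for every label. The training data is made up.

```python
import math
from collections import Counter, defaultdict


def train(labelled_features):
    """Count labels and (label, feature) pairs from (features, label) examples."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    for features, label in labelled_features:
        label_counts[label] += 1
        for name in features:
            feature_counts[label][name] += 1
    return label_counts, feature_counts


def classify(features, label_counts, feature_counts):
    """Pick the label maximizing log P(label) + sum of log P(feature | label)."""
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        for name in features:
            # Add-one smoothing so an unseen feature does not zero out the product.
            score += math.log((feature_counts[label][name] + 1) / (count + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training examples in the same dict-of-True format as the extractor output.
train_feats = [
    ({"love": True, "food": True}, "pos"),
    ({"amaz": True}, "pos"),
    ({"horribl": True, "day": True}, "neg"),
    ({"raini": True, "day": True}, "neg"),
]
label_counts, feature_counts = train(train_feats)
print(classify({"horribl": True}, label_counts, feature_counts))
```

Because the model is just a set of counts, adding a new training example only increments a few counters, which is what makes the independence assumption convenient for updates.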
TRAINING
NLTK provides built-in components
1. Train the classifier
2. Serialize classifier for later use
3. Train once, use as much as you want
>>> from nltk.classify import NaiveBayesClassifier
>>> nb_classifier = NaiveBayesClassifier.train(train_feats)
... wait a lot of time
>>> nb_classifier.labels()
['neg', 'pos']
>>> serializer.dump(nb_classifier, file_handle)
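The "train once, use as much as you want" step could be sketched with the standard `pickle` module as below. The `save_model` and `load_model` helpers and the stand-in model object are hypothetical; the slides do not say which serializer was used for dumping.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Write a trained model to disk in binary pickle format."""
    with open(path, "wb") as file_handle:
        pickle.dump(model, file_handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Read a previously saved model back into memory."""
    with open(path, "rb") as file_handle:
        return pickle.load(file_handle)


# A stand-in object; a trained classifier would be saved the same way.
model = {"labels": ["neg", "pos"]}
path = os.path.join(tempfile.mkdtemp(), "classifier.pickle")
save_model(model, path)
restored = load_model(path)
print(restored)
```

Unpickling executes code embedded in the file, so only load model files from a source you trust.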
USING THE CLASSIFIER
# Load the classifier from the serialized file
classifier = pickle.loads(classifier_file.read())
# Pick a new phrase
new_phrase = "At Pycon Italy! Love the food and this speaker is so amazing"
# 1) Preprocessing
feature_vector = feature_extractor.extract(new_phrase)
# 2) Dimensionality reduction, best_features is our set of best words
reduced_feature_vector = reduce_features(feature_vector, best_features)
# 3) Classify!
print(classifier.classify(reduced_feature_vector))
>>> "pos"
BUILDING A CLASSIFICATION API
Classifier is slow, no matter how much optimization is made
Classifier is a blocking process, API must be event-driven
BUILDING A CLASSIFICATION API
SCALING TOWARDS INFINITY AND BEYOND
BUILDING A CLASSIFICATION API
ZEROMQ
Fast, uses native sockets
Promotes horizontal scalability
Language-agnostic framework
BUILDING A CLASSIFICATION API
ZEROMQ
...
socket = context.socket(zmq.REP)
...
while True:
    message = socket.recv()
    phrase = json.loads(message)["text"]
    # 1) Feature extraction
    feature_vector = feature_extractor.extract(phrase)
    # 2) Dimensionality reduction, best_features is our set of best words
    reduced_feature_vector = reduce_features(feature_vector, best_features)
    # 3) Classify!
    result = classifier.classify(reduced_feature_vector)
    # send_string encodes the JSON text to bytes before sending
    socket.send_string(json.dumps(result))
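A REP socket has to answer every request it receives, so a malformed message that raises inside the loop would leave the client waiting. One possible way to guard against that, sketched here without any ZeroMQ dependency, is to move the message handling into a pure function that always returns a reply. The `handle_request` name and the `{"sentiment": ...}` and `{"error": ...}` reply shapes are assumptions; the slide sends the bare label instead.

```python
import json


def handle_request(raw_message, classify):
    """Turn one raw JSON request into one raw JSON reply (bytes in, bytes out)."""
    try:
        phrase = json.loads(raw_message)["text"]
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, a missing "text" key, or a non-object payload.
        error = {"error": "expected a JSON object with a 'text' field"}
        return json.dumps(error).encode("utf-8")
    return json.dumps({"sentiment": classify(phrase)}).encode("utf-8")


# A stand-in classifier; the real one would run extraction, reduction and classify.
def fake_classify(phrase):
    return "pos" if "love" in phrase.lower() else "neg"


reply = handle_request(b'{"text": "Love the food"}', fake_classify)
print(reply)
```

Inside the worker loop this would be used as `socket.send(handle_request(socket.recv(), classify))`, and the function can be unit-tested without opening a socket.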
DEMO
POST-MORTEM
Real-time sentiment analysis APIs can be implemented, and can be scalable
What if we use Redis instead of having serialized classifiers?
Deep learning is giving very good results in NLP, let's try it!
FIN
QUESTIONS

Machine Learning as a Service: making sentiment predictions in realtime with ZMQ and NLTK