The session will identify the applications and opportunities that AI and ML present and show how anyone can get started: there is a whole host of free resources for learning and experimenting. One repetitive task that can be automated is keyword categorisation, which supervised models can speed up. Not only is it fascinating to understand what is possible, but these applications can free up your time for tasks that add more value for your clients or business.
4. Free Resources To Get Started – Classify Images of Clothing
Fashion MNIST dataset, which contains 70,000 grayscale images in 10 categories
https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/keras/classification.ipynb#scrollTo=yR0EdgrLCaWR
Label Class
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot
#brightonSEO
5. Plotting Predictions to Verify Results
6. Results Can Be Incorrect Even When Confidence is High
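In this kind of classifier, confidence typically comes from a softmax over the model's raw scores. A high value says nothing about whether the label is correct. The sketch below is minimal and assumes hypothetical raw scores (logits) for a single image, plus a 0.9 threshold chosen only for illustration. It converts the scores to probabilities, reads off the predicted class and its confidence, and flags predictions that are both confident and wrong.

```python
import math

# Class names in label order (0-9), taken from the Fashion MNIST label table.
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def softmax(logits):
    """Turn raw model scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def check_prediction(logits, true_label, threshold=0.9):
    """Return (predicted class name, confidence, confidently_wrong flag)."""
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=lambda i: probs[i])
    confidence = probs[predicted]
    confidently_wrong = predicted != true_label and confidence >= threshold
    return CLASS_NAMES[predicted], confidence, confidently_wrong


# Hypothetical scores for one image whose true label is 0 (T-shirt/top).
logits = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6.0, 0.0, 0.0, 0.0]
name, confidence, wrong = check_prediction(logits, true_label=0)
print(name, round(confidence, 3), wrong)  # Shirt 0.974 True
```

Here the model is about 97% sure the T-shirt is a shirt. Plotting predictions, or listing the confidently wrong ones, is what surfaces mistakes like this.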
7. Reducing the Time it Takes to Write Meta Descriptions for Large Websites
Use Text Summarization Algorithms to Help Aid the Writing of Meta Descriptions (GitHub Repo)
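The deck does not say which algorithm the linked repository uses. Purely as an illustration, the sketch below swaps in a simple frequency-based extractive summariser. It scores each sentence by how often its non-stop words appear across the page text, then keeps the top sentences that fit within a character budget. The 155-character default and the stop-word list are assumptions, not figures from the talk. The output is a draft for a person to edit, not a finished meta description.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "for",
              "is", "are", "on", "with", "our", "your", "we", "you"}


def content_words(text):
    """Lower-case the text and keep only non-stop-word tokens."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOP_WORDS]


def summarise(text, max_chars=155):
    """Frequency-based extractive summary that fits within max_chars."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    freq = Counter(content_words(text))

    def score(sentence):
        # Average page-level frequency of the sentence's content words.
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, length = [], 0
    for i in ranked:
        extra = len(sentences[i]) + (1 if chosen else 0)  # +1 for the joining space
        if length + extra <= max_chars:
            chosen.append(i)
            length += extra
    if not chosen:
        # Even the best sentence is too long: cut it at the last word boundary.
        return sentences[ranked[0]][:max_chars].rsplit(" ", 1)[0]
    # Keep chosen sentences in their original order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(chosen))


page = ("Our shop sells running trainers. Running trainers for men and women "
        "are in stock. Delivery is free.")
print(summarise(page, max_chars=60))  # Our shop sells running trainers. Delivery is free.
```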
8. What Are the Repetitive Tasks That You Undertake?
• Forecasting
• Classification
• Metadata
• Image alt text
• Sentiment analysis
10. The aim of keyword research is to understand your audience’s needs and behaviour.
11. Keyword Research Underpins Many Tasks
• Identifying content gaps
• Optimising and refreshing content
• Keyword mapping and strategy
• Gaining audience insight
• Organising content into a user-friendly information architecture
• Segmenting client performance in search by category for keyword tracking
12. When You Need to Categorise Thousands of Keywords
Phones
• iPhone: iPhone 12 Pro Max, iPhone 12 Pro, iPhone 12, iPhone 12 Mini
• Samsung: Galaxy Z Flip, Galaxy Z Fold, Galaxy S21, Galaxy S20
• Google: Pixel 5a, Pixel 5, Pixel 4a, Pixel 4
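To make the target output concrete, here is a minimal sketch that stores the phones hierarchy as a nested dictionary and tags a keyword with the longest product name it contains. `TAXONOMY` and `tag_keyword` are hypothetical names. This is a rule-based lookup, not machine learning: it only recognises products already written into the dictionary, so new models or unusual phrasings fall through and return `None`.

```python
# Category > brand > product hierarchy from the phones example.
TAXONOMY = {
    "Phones": {
        "iPhone": ["iPhone 12 Pro Max", "iPhone 12 Pro", "iPhone 12", "iPhone 12 Mini"],
        "Samsung": ["Galaxy Z Flip", "Galaxy Z Fold", "Galaxy S21", "Galaxy S20"],
        "Google": ["Pixel 5a", "Pixel 5", "Pixel 4a", "Pixel 4"],
    }
}


def tag_keyword(keyword):
    """Return (category, brand, product) for the longest product name in keyword, else None."""
    kw = keyword.lower()
    best = None
    for category, brands in TAXONOMY.items():
        for brand, products in brands.items():
            for product in products:
                # Prefer the longest match so "iPhone 12 Pro Max" beats "iPhone 12".
                if product.lower() in kw and (best is None or len(product) > len(best[2])):
                    best = (category, brand, product)
    return best


print(tag_keyword("iphone 12 pro max case"))  # ('Phones', 'iPhone', 'iPhone 12 Pro Max')
print(tag_keyword("cheap laptops"))           # None
```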
13. Keyword Categorisation is a Time-Intensive Process
Keyword List → Cleaning & Sorting → Categorisation → Insight & Analysis
A list of a few thousand keywords can take a few days to categorise, depending on the level of granularity.
15. Apply Machine Learning Techniques to Automate the Process
Diagram: Deep Learning sits within Machine Learning, which sits within Artificial Intelligence
Artificial Intelligence: a technique that enables machines to mimic human behaviour
16. Apply Machine Learning Techniques to Automate the Process
Machine Learning: a subset of AI that uses statistical methods to enable improvement with experience
17. Apply Machine Learning Techniques to Automate the Process
Deep Learning: a subset of ML that makes the computation of multi-layer neural networks feasible
18. Supervised or Unsupervised Learning?
Machine Learning
• Supervised Learning (Regression, Classification) – develop predictive models based on both input and output data
• Unsupervised Learning (Clustering) – group and interpret data based only on input data
20. How Does Supervised Machine Learning Work?
Input Data → Annotations (‘These are t-shirts’) → Prediction (‘It is a t-shirt’)
21. What is the Process?
3,000 Keywords
• Keywords to Label: 750 Keywords
• Keywords to Predict: 2,250 Keywords
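One way to carry out this split is sketched below. It assumes the keywords to label are chosen by simple random sampling, which the deck does not specify. `split_for_labelling` is a hypothetical helper, and the keyword list is placeholder data. A fixed seed keeps the sample reproducible.

```python
import random


def split_for_labelling(keywords, n_label, seed=42):
    """Randomly pick n_label keywords to label by hand; the rest are left to predict."""
    unique = list(dict.fromkeys(keywords))  # drop duplicates, keep first-seen order
    rng = random.Random(seed)               # fixed seed makes the split reproducible
    to_label = rng.sample(unique, n_label)
    chosen = set(to_label)
    to_predict = [kw for kw in unique if kw not in chosen]
    return to_label, to_predict


# Hypothetical placeholder keywords standing in for a real 3,000-keyword list.
keywords = [f"keyword {i}" for i in range(3000)]
to_label, to_predict = split_for_labelling(keywords, 750)
print(len(to_label), len(to_predict))  # 750 2250
```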
22. Cleaning Natural Language Data
Tokenization
Tokenization is splitting a phrase, sentence, paragraph, or an entire text document into smaller units, such as individual words or terms.
E.g. ‘natural language processing’ becomes ‘natural’, ‘language’, ‘processing’
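Below is a minimal tokenizer sketch that uses Python's `re` module rather than an NLP library. The regular expression is an assumption about what counts as a token: it lower-cases the text, then keeps runs of letters and digits, allowing internal apostrophes or hyphens.

```python
import re


def tokenize(text):
    """Lower-case text and split it into word tokens (keeping internal ' and -)."""
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())


print(tokenize("natural language processing"))  # ['natural', 'language', 'processing']
print(tokenize("Men's T-Shirts UK"))            # ["men's", 't-shirts', 'uk']
```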
23. Cleaning Natural Language Data
Lemmatization
Lemmatization reduces a word to its base dictionary form, which is called the lemma.
E.g. ‘insured’ and ‘insures’ both become ‘insure’
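The sketch below treats lemmatization as a simple lookup. It assumes a small hand-made table; `LEMMAS` is hypothetical. In practice a library lemmatizer supplies the mappings, usually taking the word's part of speech into account. This sketch only shows where the step sits in the pipeline: tokens go in, base forms come out, and unknown words pass through unchanged.

```python
# Hypothetical hand-made lemma table; a real pipeline would use a library
# lemmatizer (for example NLTK's WordNet lemmatizer or spaCy) instead.
LEMMAS = {
    "insured": "insure",
    "insures": "insure",
    "insuring": "insure",
    "phones": "phone",
    "trainers": "trainer",
}


def lemmatize(tokens):
    """Replace each token with its lemma, leaving unknown tokens unchanged."""
    return [LEMMAS.get(token, token) for token in tokens]


print(lemmatize(["cheap", "phones", "insured"]))  # ['cheap', 'phone', 'insure']
```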
24. How Does a Machine Process Natural Language?
clothing  shirt  jeans  dress  trainers
shirt     1      0      0      0
jeans     0      1      0      0
dress     0      0      1      0
trainers  0      0      0      1
jeans     0      1      0      0
A machine cannot read categories, only numerical data, so all input data must be converted into numbers. A common method is one-hot encoding.
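The one-hot table on this slide can be reproduced in a few lines of plain Python. This is a minimal sketch, and `one_hot_encode` is a hypothetical helper. For real projects, libraries such as pandas and scikit-learn provide their own one-hot encoders.

```python
def one_hot_encode(values):
    """Return (categories, rows): one column per unique value, one 0/1 row per input."""
    categories = list(dict.fromkeys(values))  # unique values in first-seen order
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # switch on the column for this value
        rows.append(row)
    return categories, rows


categories, rows = one_hot_encode(["shirt", "jeans", "dress", "trainers", "jeans"])
print(categories)  # ['shirt', 'jeans', 'dress', 'trainers']
for row in rows:
    print(row)
```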
25. Cleaning Natural Language Data
Tokenization & Lemmatization → 3,000 Keywords
• Keywords to Label: 750 Keywords, split into Training Set (80%) and Test Set (20%)
• Keywords to Predict: 2,250 Keywords
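The sketch below shows the 80/20 split. It assumes the labelled keywords are shuffled at random before splitting. `split_train_test` is a hypothetical helper written for illustration; most projects would use scikit-learn's `train_test_split` for the same job. With the 750 labelled keywords from the slide, an 80/20 split gives 600 training and 150 test keywords.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=42):
    """Shuffle items and return (train, test) with test_fraction held out."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(750))
print(len(train), len(test))  # 600 150
```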
26. Apply the Supervised Model
Input Data + Annotations (‘These are t-shirts’) → Model Training
Test Data → Model Processing → Prediction (‘It is a t-shirt’)
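The deck does not name the classifier it used. As an assumption, the sketch below swaps in multinomial Naive Bayes, a common baseline for short text, written in plain Python. Training counts how often each token appears under each label. Prediction picks the label with the highest log-probability. The labelled keywords are hypothetical and use the brands from the phones example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Fit a multinomial Naive Bayes model from (tokens, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return {"labels": label_counts, "tokens": token_counts,
            "vocab": vocab, "n": len(examples)}


def predict(model, tokens):
    """Return the label with the highest log-probability for the given tokens."""
    best_label, best_score = None, -math.inf
    v = len(model["vocab"])
    for label, count in model["labels"].items():
        score = math.log(count / model["n"])  # log prior P(label)
        total = sum(model["tokens"][label].values())
        for t in tokens:
            if t not in model["vocab"]:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing keeps unseen label/word pairs from scoring zero.
            score += math.log((model["tokens"][label][t] + 1) / (total + v))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labelled keywords (the "keywords to label" set).
labelled = [
    ("iphone 12 pro max price", "iPhone"), ("iphone 12 mini case", "iPhone"),
    ("apple iphone deals", "iPhone"), ("galaxy s21 ultra", "Samsung"),
    ("samsung galaxy z flip", "Samsung"), ("galaxy s20 screen", "Samsung"),
    ("google pixel 5a", "Google"), ("pixel 4a battery", "Google"),
    ("pixel 5 camera", "Google"),
]
model = train_naive_bayes([(kw.split(), label) for kw, label in labelled])

for kw in ["pixel 5a camera review", "samsung galaxy s21 case", "iphone 12 case"]:
    print(kw, "->", predict(model, kw.split()))
```

A real run would train on the labelled training set and score the held-out test set before predicting the remaining keywords.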
27. Model Performance Metrics
• Accuracy – Out of all predictions made, how many are correctly classified?
• Precision – What proportion of positive identifications were actually correct?
• Recall (sensitivity) – What proportion of actual positives was identified correctly?
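All three metrics can be computed directly from the true and predicted labels. The sketch below is minimal. It assumes one category is treated as the "positive" class when computing precision and recall; with several keyword categories, you would repeat this per category. The example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical test-set labels and model predictions.
y_true = ["iPhone", "iPhone", "Samsung", "Google", "Samsung"]
y_pred = ["iPhone", "Samsung", "Samsung", "Samsung", "Samsung"]
print(classification_metrics(y_true, y_pred, positive="Samsung"))  # (0.6, 0.5, 1.0)
```

In this example every Samsung keyword was found (recall 1.0), but half of the "Samsung" predictions were wrong (precision 0.5).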
31. Key Takeaways
• AI and Machine Learning techniques can speed up processes.
• If you have a time-intensive, repetitive task, it can probably be automated.
• Open-source resources mean you do not need deep technical knowledge.
• Anyone can learn R or Python. When you encounter a problem, paste the error message into Google.
https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/keras/classification.ipynb#scrollTo=yR0EdgrLCaWR
This is an example of how you can use machine learning today and try out a model on your computer; it's easier than you think!
Maybe forecasting traffic? For example, you can use the Prophet package in Python to train a model on your data and then make predictions. Predictive models are only valid if all other factors remain equal. If a website has more development or content resource than when the prediction was made, growth could be greater than predicted because the internal drivers have changed. The model was built on different assumptions, so it might not be valid for predicting the outcome of the new strategy.
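Prophet is not used below. To keep the sketch dependency-free, it swaps in an ordinary least-squares straight-line trend. That is far simpler than Prophet, but it illustrates the same caveat: the forecast only extends the pattern in past data, so any change in resourcing after the fit is invisible to it. The monthly session figures are hypothetical.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(range(n), values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(values, periods):
    """Extrapolate the fitted line `periods` steps beyond the last observation."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + i) for i in range(periods)]


# Hypothetical monthly organic sessions (in thousands).
history = [100, 110, 120, 130]
print(forecast(history, 2))  # [140.0, 150.0]
```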
Training set – 720 keywords
Test set – 180 keywords
Accuracy – Number of correct predictions / Total number of predictions
Precision – True Positives / (True Positives + False Positives)
Recall – True Positives / (True Positives + False Negatives)
Improving Precision & Recall
Review the labelling and sampling of the training set
https://datastudio.ca/functions/how-to-use-the-chrome-ux-crux-report-in-google-data-studio-for-seo/
https://www.tensorflow.org/tutorials/keras/classification
https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/images/transfer_learning_with_hub.ipynb#scrollTo=PWUmcKKjtwXL
https://kiosk-dot-codelabs-site.appspot.com/codelabs/tensorflow-for-poets/#0
https://www.pemavor.com/solution/keyword-tagging/
Use Text Summarization Algorithms to Help Aid the Writing of Meta Descriptions - https://searchengineland.com/reducing-the-time-it-takes-to-write-meta-descriptions-for-large-websites-299887