This document discusses and summarizes five algorithms: sentiment analysis, PageRank, nudity detection, language detection, and TF-IDF. It provides practical applications and technical explanations for each algorithm. For sentiment analysis, applications include mining large volumes of reviews to understand consumer feedback and to aid forecasting. The math involves natural language processing and classification. For PageRank, the algorithm assumes that more inbound links indicate more valid content, and it underlies the Google search engine. Nudity detection helps moderate content by detecting skin-colored pixels and computing skin-to-non-skin ratios. Language detection aids in search, content moderation, and overcoming language barriers in natural language processing. TF-IDF assigns term importance weights based on a term's frequency in a document and across a collection.
3. Sample algorithms
● Text Analysis: summarizer, sentence tagger, profanity detection
● Machine Learning: digit recognizer, recommendation engines
● Web: crawler, scraper, PageRank, emailer, HTML to text
● Computer Vision: image similarity, face detection, smile detection
● Audio & Video: speech recognition, sound filters, file conversions
● Computation: linear regression, spike detection, Fourier filter
● Graph: traveling salesman, maze generator, theta star
● Utilities: parallel for-each, geographic distance, email validator
4. A marketplace for algorithms...
We host algorithms
Anyone can turn their algorithms into scalable web services
Typical users: scientists, academics, domain experts
We make them discoverable
Anyone can use and integrate these algorithms into their solutions
Typical users: businesses, data scientists, app developers, IoT makers
We make them monetizable
Users of algorithms pay for algorithms they use
Typical scenarios: heavy-load use cases with a large user base
8. Why?
Create something bigger
Easily combine algorithms like building blocks, regardless of language
Growing Catalogue of Algorithms
New algorithms every day; make them usable by software developers
Make applications smarter
Smarter algorithms = cooler toys
9. The Five Algorithms
- Sentiment Analysis
- Language Detection
- PageRank
- Nudity Detection
- Term Frequency-Inverse Document Frequency (TF-IDF)
10. Sentiment Analysis - Practical Applications
Businesses frequently seek consumer feedback
on the quality of their products, but large
volumes of reviews take too much time to
read manually.
The resulting data can also be used in various
forecasting applications, such as
political elections.
11. Sentiment Analysis - The Math
Basic sentiment analysis uses natural language
processing (NLP), via a “bag of words”, to
spot keywords that signal strong emotion.
Based on the keywords it finds, the algorithm
classifies a document as positive, negative, or
neutral.
Statements can be dual in nature, such as “I
loved the food, BUT hated the service”. These
require more advanced algorithms that can
separate the two sentiments.
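The bag-of-words idea above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm hosted on Algorithmia: the two word lists are tiny hand-made lexicons, and the names `bag_of_words`, `classify`, and `classify_clauses` are hypothetical. Splitting on "but" is only a naive stand-in for the more advanced handling of dual statements.

```python
import re
from collections import Counter

# Tiny hand-made lexicons, purely illustrative (an assumption, not from the slides).
POSITIVE = {"loved", "great", "good", "excellent", "happy"}
NEGATIVE = {"hated", "bad", "terrible", "awful", "poor"}


def bag_of_words(text):
    """Lowercase the text and count each word, ignoring word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def classify(text):
    """Label text by comparing counts of positive and negative keywords."""
    bag = bag_of_words(text)
    score = sum(bag[w] for w in POSITIVE) - sum(bag[w] for w in NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_clauses(text):
    """Naive handling of dual statements: split on 'but', classify each part."""
    parts = re.split(r"\bbut\b", text, flags=re.IGNORECASE)
    return [classify(part) for part in parts if part.strip()]
```

On the slide's example sentence, `classify` sees one positive and one negative keyword and returns "neutral", while `classify_clauses` returns one label per clause and so exposes the mixed sentiment.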
12. PageRank - Practical Applications
PageRank assumes that the more inbound links
a page has from across the web, the more valid
its content.
The most famous application of PageRank is the
Google search engine, whose initial success was
based largely on the success of PageRank.
PageRank is not limited to web pages. Any
data that can be modeled as a directed graph
can be ranked with PageRank.
13. PageRank - Graph Terminology
Node (vertex): Item in graph.
Edge: Relationship between two nodes.
Directionality: Property of an edge indicating
which way the relationship points.
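Using the terminology above, a small PageRank can be sketched as repeated redistribution of rank along directed edges. This is a simplified illustration and not Google's production algorithm: the function name `pagerank`, the damping factor of 0.85 (a conventional choice), the fixed iteration count, and the even spreading of rank from nodes with no outbound links are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank nodes of a directed graph.

    links maps each node to the list of nodes it links to (its outbound edges).
    Returns a dict of node -> rank, with ranks summing to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node keeps a small baseline share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly across its outbound edges.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # No outbound edges: spread the rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

For example, with `links = {"a": ["c"], "b": ["c"], "c": ["a"]}`, node "c" has two inbound edges and ends up with the highest rank, while "b" has none and ends up with the lowest.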
15. Nudity Detection - Practical Applications
Nudity detection algorithms minimize the need for manual moderation and deletion
of malicious content.
In a CMS, this algorithm can help prevent pornography from being uploaded by
users.
16. Nudity Detection - The Math
1. Detect skin-colored pixels in the image.
2. Locate skin regions based on the detected pixels.
3. Detect face in image.
4. Calculate the ratio of skin-toned to non-skin-toned pixels in the image, taking into
account the size of the face.
5. Classify the image as nude or not.
More information at: https://algorithmia.com/algorithms/sfw/NudityDetection
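Steps 1, 4, and 5 can be roughly sketched in plain Python. This is not the implementation behind the linked algorithm: the RGB rule in `is_skin` is one commonly cited heuristic for skin color under daylight and is used here as an assumption, the 0.4 threshold is arbitrary, and the function names are hypothetical. The sketch computes the share of skin pixels among all pixels as a simple stand-in for the ratio in step 4. Region location (step 2) and face detection (step 3) are omitted.

```python
def is_skin(r, g, b):
    """Rough RGB skin-color test (a heuristic assumption, not the hosted rule)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_ratio(pixels):
    """Fraction of skin-colored pixels.

    pixels is a list of rows, each row a list of (r, g, b) tuples.
    """
    total = 0
    skin = 0
    for row in pixels:
        for r, g, b in row:
            total += 1
            if is_skin(r, g, b):
                skin += 1
    return skin / total if total else 0.0


def looks_nude(pixels, threshold=0.4):
    """Classify by skin ratio alone; the threshold is an arbitrary assumption."""
    return skin_ratio(pixels) > threshold
```

A real classifier would also use the detected face size, as step 4 describes, so that a close-up portrait is not flagged just because skin fills most of the frame.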
19. TF/IDF - Practical Applications
- Keyword extraction is used in search engines and in content
categorization algorithms.
- Creates great content recommendations!
- https://drupal.org/project/algorithmia
- https://wordpress.org/plugins/algorithmia
- https://algorithmia.com/recommends
20. TF/IDF - The Math
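TF-IDF computes a weight that reflects the
importance of a term inside a document by
comparing its frequency in that document with
how common it is across the document set.
The more often a term appears in the document,
the higher its weight; the more documents it
appears in, the lower its weight.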
Thanks toothpastefordinner.com for the comic.
21. TF/IDF - The Math
Assume you have a 100-word blog post with the word "JavaScript" in it 5 times.
Term Frequency = 5/100 = 0.05
Also assume your entire collection of blog posts has 10,000 documents, and the
word "JavaScript" appears at least once in 100 of these.
Inverse Document Frequency = log10(10,000/100) = 2
For this document, this gives us the score:
TF-IDF = 0.05 * 2 = 0.1
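The arithmetic above can be reproduced with a short Python sketch. The function names are hypothetical, and the base-10 logarithm is chosen because it is the base that yields the value 2 in the example; other TF-IDF variants use a different base or add smoothing terms.

```python
import math


def term_frequency(term_count, total_words):
    """Share of the document's words that are the term."""
    return term_count / total_words


def inverse_document_frequency(total_docs, docs_with_term):
    """Base-10 log of how rare the term is across the collection."""
    return math.log10(total_docs / docs_with_term)


# Numbers from the "JavaScript" blog post example.
tf = term_frequency(5, 100)                    # 5 occurrences in 100 words
idf = inverse_document_frequency(10_000, 100)  # 100 of 10,000 documents
score = tf * idf
print(f"TF = {tf}, IDF = {idf}, TF-IDF = {score}")
```

If "JavaScript" appeared in all 10,000 documents, the IDF would be log10(1) = 0 and the score would drop to 0, which is how TF-IDF discounts terms that are common everywhere.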
22. Language Detection - Practical Applications
Applications
- Web searching, as engines bring up sites in dozens of languages.
- May be required in conjunction with other Natural Language Processing (NLP)
algorithms. Data sets may include documents in several languages, and some
algorithms only work in the language of their training data.
- Spam filtering services, so they can properly filter out specific languages and areas
of origin.
23. Language Detection - The Math
Each language has a characteristic pattern of components, learned from a corpus,
that uniquely identifies it.
Profiling algorithms build a core set of words that identifies each language.
The problem is that not all text is long enough to contain those words.
Instead, the 3-gram algorithm, available via the Algorithmia API with one HTTP
request, breaks detection down by looking at groups of 3 letters.
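The 3-gram idea can be illustrated locally with a minimal sketch. It does not call the Algorithmia API and is not its implementation: the two one-sentence samples stand in for real training corpora, and the names `trigrams`, `SAMPLES`, `PROFILES`, and `detect_language` are hypothetical.

```python
from collections import Counter


def trigrams(text):
    """Count every run of 3 consecutive characters, after normalizing spaces."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


# Toy training samples (an assumption); real profiles come from large corpora.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog "
               "and this is the house of the king",
    "spanish": "el rapido zorro marron salta sobre el perro perezoso "
               "y esta es la casa del rey",
}
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def detect_language(text):
    """Return the language whose trigram profile overlaps most with the text."""
    grams = trigrams(text)

    def overlap(profile):
        # Count shared trigrams, capped by how often each occurs on both sides.
        return sum(min(count, profile[g]) for g, count in grams.items())

    return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))
```

Because trigrams are drawn from every position in the text, even a short phrase yields a usable number of them, which is why this approach copes with short inputs better than matching whole words.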
24. Algorithmia Credits
Sign up at https://algorithmia.com
Use code: FiveAlgorithmsBook
10,000k additional free API credits.