This document discusses unsupervised learning techniques, including distance measures, clustering algorithms, and examples from practice at Styria. It introduces common distance measures such as the Euclidean, Manhattan, and Mahalanobis distances. For clustering, it describes k-means and provides a Spark example. It also discusses using convolutional neural network outputs as features for unsupervised learning, and shows examples of semi-manual photo clustering, t-SNE concept visualization, and automatically learned hierarchies from Styria projects.
4. UNSUPERVISED LEARNING
• Observations are not assigned to classes
• The computer program is not ‘supervised’ throughout the learning process
• Usually the task is to find ‘meaningful’ groups within the data
• Decisions are made based on distances, i.e. similarities among data points
10.03.2016 4
5. DISTANCES
• To decide upon the groups we have to introduce a similarity measure or, conversely, a distance measure
• Pythagoras’ theorem – Euclidean distance
• dist((2, -1), (-2, 2)) = √((2 - (-2))² + ((-1) - 2)²) = √(4² + (-3)²) = √(16 + 9) = √25 = 5
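The worked Euclidean example above, together with the Manhattan and Mahalanobis distances mentioned in the overview, can be sketched in a few lines of NumPy. The small reference data set used to estimate the covariance for the Mahalanobis distance is made up for illustration:

```python
import numpy as np

a = np.array([2.0, -1.0])
b = np.array([-2.0, 2.0])

# Euclidean distance: straight-line length (Pythagoras' theorem)
euclidean = float(np.sqrt(np.sum((a - b) ** 2)))   # -> 5.0, as in the slide

# Manhattan distance: sum of absolute coordinate differences
manhattan = float(np.sum(np.abs(a - b)))           # -> |4| + |-3| = 7.0

# Mahalanobis distance: Euclidean distance after scaling by the inverse
# covariance of some reference sample (this sample is made up)
data = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, -1.0],
                 [-2.0, 2.0], [1.0, -2.0]])
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
diff = a - b
mahalanobis = float(np.sqrt(diff @ cov_inv @ diff))

print(euclidean, manhattan, mahalanobis)
```

Unlike the first two, the Mahalanobis distance depends on the data set it is computed against, which is why it is useful when features are correlated or on different scales.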
8. DEMO (SPARK!)
• K-means clustering of photos (i.e. their vector representations)
• Convolutional neural network as a supervised model, with its outputs used as features for unsupervised models
• Vector representations taken after the pooling layers and after every convolutional layer (Caffe)
• Clustering in Spark
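The demo ran k-means in Spark over CNN feature vectors; as a minimal single-machine sketch of the same Lloyd iteration that Spark MLlib's KMeans distributes, with tiny made-up 2-D points standing in for the photo feature vectors:

```python
import numpy as np

def kmeans(points, k, iters=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Naive init: take the first k points as starting centroids
    centroids = points[:k].copy()
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :],
                               axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Two tight, well-separated groups stand in for photo feature vectors
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, cents = kmeans(pts, k=2)
print(labels)  # first three points share one cluster, last three the other
```

In the actual Spark setting, the same job would use `pyspark.ml.clustering.KMeans` on a DataFrame of feature vectors, with the assignment and update steps executed as distributed map and reduce operations.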