Active Human-in-the-Loop Deep Learning for Cultural Metadata Enrichment by Natasa Sofou - EuropeanaTech Conference 2018

Active Human-in-the-Loop Deep Learning for Cultural Metadata Enrichment
EuropeanaTech 2018
Eddie Dervakos, Antonis Korkofigas, Natasa Sofou, Giorgos Stamou
Intelligent Systems Laboratory, National Technical University of Athens

Context and Motivation
Artificial Intelligence
● Processes large amounts of data
● Deep Data
● Improves content discoverability through automatic information extraction for metadata enrichment
Digital Cultural Heritage
● Huge amount of content in Digital Cultural Heritage
● Content is not easily accessible or discoverable
● Poor metadata quality
● Manual annotation cannot solve the problem: it is costly and does not scale

Automatic metadata enrichment
● Use AI-enabled services to extract meaningful, useful information from the available content (new metadata that does not already exist).
● Information is usually extracted and identified through a classification process: content is classified into predefined categories.
Example: Vanitas Still Life with the Spinario, Pieter Claesz (image courtesy of Rijksmuseum, Public Domain)
● Classify the image by type and format: Oil painting
● Classify the image by content w.r.t. musical instruments: Violin
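A minimal sketch of classification into predefined categories, one category set per facet as in the painting example. The facet keys, the extra categories and the scores are hypothetical; in practice the scores would come from a trained classifier.

```python
# Hypothetical facets and category lists; the slide only names "type and
# format" and "musical instruments" as examples.
FACETS = {
    "type_and_format": ["Oil painting", "Drawing", "Photograph", "Print"],
    "musical_instrument": ["Violin", "Piano", "Flute", "Trumpet"],
}


def classify(scores):
    """For each facet, pick the predefined category with the highest score."""
    enrichment = {}
    for facet, categories in FACETS.items():
        facet_scores = scores[facet]
        if len(facet_scores) != len(categories):
            raise ValueError(f"expected {len(categories)} scores for {facet}")
        best = max(range(len(categories)), key=lambda i: facet_scores[i])
        enrichment[facet] = categories[best]
    return enrichment


# Made-up classifier scores for one painting.
scores = {
    "type_and_format": [0.91, 0.04, 0.03, 0.02],
    "musical_instrument": [0.72, 0.10, 0.08, 0.10],
}
print(classify(scores))
```

The returned dictionary is the kind of record that could be attached to an item as enrichment metadata.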

AI and classification
Training Data → Feature Selection → Learning Algorithm (e.g. SVM, CNN, …) → Output: AI Classifier
Information extraction: classify a data object into predefined categories
Restriction: availability of training data (lack of annotated content)
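The pipeline on this slide can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: feature selection here keeps the k highest-variance features, and a nearest-centroid rule stands in for the learning algorithm (the slide names SVMs and CNNs). `select_features`, `train` and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def select_features(X, k):
    """Feature selection: keep the indices of the k highest-variance features."""
    n_features = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: variances[j], reverse=True)[:k]


def train(X, y, k):
    """Learning algorithm (nearest centroid): returns a classifier function."""
    keep = select_features(X, k)
    centroids = {}
    for label in set(y):
        rows = [[x[j] for j in keep] for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def classifier(x):
        z = [x[j] for j in keep]
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(z, centroids[c])))

    return classifier


# Toy training data: feature 1 is constant, so it carries no information.
X = [[0.0, 1.0, 5.0], [0.1, 1.0, 5.1], [5.0, 1.0, 0.0], [5.2, 1.0, 0.1]]
y = ["violin", "violin", "piano", "piano"]
clf = train(X, y, k=2)
print(clf([0.2, 3.0, 4.9]))
```

Dropping the constant feature shows the point of the selection step: the classifier only sees features with representational power.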

AI model: CNN for feature selection and classification
Learning algorithm & classification
Take some input data (image, music recording, text, etc.) and output a class (a cat, a dog, etc.) or a probability distribution over the classes that best describes the data (class labels needed)
Unsupervised Feature Extraction
● Feature extraction is necessary because data usually carry too much redundant and/or irrelevant information
● Many possible features to consider
● Start from a wide set of data features and arrive at a smaller set with strong representational power
Figure: CNN architecture with convolution + nonlinearity and max-pooling layers, followed by fully connected layers that output an N-dimensional binary classification vector (piano, sax, violin, cello, voice)
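A toy forward pass through the layer types named in the figure, written in plain Python for a 1-D signal. It is a sketch, not the authors' network: the kernel, weights and five output classes are made up, a real model learns its weights (typically applying 2-D convolutions to spectrograms), and each output goes through a sigmoid to give one independent binary decision per class, which is one reading of "N binary classification vector".

```python
import math


def conv1d_relu(signal, kernel, bias=0.0):
    """'Valid' 1-D convolution (cross-correlation, as CNN layers compute it) + ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)) + bias)
        for i in range(len(signal) - k + 1)
    ]


def max_pool(x, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def dense(x, weights, biases):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(signal, kernel, weights, biases, pool=2):
    features = max_pool(conv1d_relu(signal, kernel), pool)
    logits = dense(features, weights, biases)
    return [sigmoid(z) for z in logits]  # one yes/no probability per class


classes = ["piano", "sax", "violin", "cello", "voice"]
weights = [[0.5, -0.2, 0.1], [0.0, 1.0, -1.0], [0.3, 0.3, 0.3], [-1.0, 0.0, 0.0], [0.2, 0.4, -0.1]]
biases = [0.0] * len(classes)
probs = forward([0, 1, 2, 3, 2, 1, 0, 1], kernel=[1, -1], weights=weights, biases=biases)
print(dict(zip(classes, probs)))
```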

Human-in-the-Loop
Figure: the AI classifier's confident predictions go straight to the output, while uncertain ones are sent for human annotation (active learning).
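The confident/uncertain split can be expressed as a threshold on the classifier's confidence. A minimal sketch; the `route` helper and the threshold value 0.8 are assumptions, not values from the presentation.

```python
def route(predictions, threshold=0.8):
    """Split (item_id, label, confidence) tuples into auto-accepted and human-review lists."""
    accepted, for_review = [], []
    for item_id, label, confidence in predictions:
        target = accepted if confidence >= threshold else for_review
        target.append((item_id, label, confidence))
    return accepted, for_review


preds = [("a", "violin", 0.95), ("b", "piano", 0.55), ("c", "cello", 0.80)]
accepted, for_review = route(preds)
print([p[0] for p in accepted], [p[0] for p in for_review])
```

Raising the threshold sends more items to annotators, trading annotation cost for metadata quality.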

HITL: Active learning approach
Figure: cycle between the AI model, the labelled training set L, the unlabelled pool of data U and the human annotator (learn a model → select queries).
Given an unlabelled pool of examples U:
● Rank the examples in order of informativeness (using existing methods and defining new ones based on description logic)
● Query the labels of the most informative examples of U
● Include the newly labelled examples in the training data L
● Re-train the model on the new training data
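The loop above can be sketched in plain Python with least-confidence uncertainty sampling, a standard informativeness measure; the description-logic measures mentioned on the slide are not implemented. The 1-D "class mean" model, the `oracle` standing in for the human annotator and the toy data are all hypothetical.

```python
import math


def train(labelled):
    """Toy model: the mean 1-D feature value of each class."""
    sums, counts = {}, {}
    for x, y in labelled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_proba(model, x):
    """Softmax over negative squared distances to the class means."""
    scores = {y: -(x - m) ** 2 for y, m in model.items()}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def least_confidence(model, x):
    """Informativeness: 1 minus the probability of the most likely class."""
    return 1.0 - max(predict_proba(model, x).values())


def active_learning(labelled, unlabelled, oracle, rounds=2, batch=1):
    labelled, unlabelled = list(labelled), list(unlabelled)
    model = train(labelled)
    for _ in range(rounds):
        if not unlabelled:
            break
        # 1. Rank U by informativeness, most uncertain first.
        unlabelled.sort(key=lambda x: least_confidence(model, x), reverse=True)
        # 2. Query the annotator for the top examples.
        queried, unlabelled = unlabelled[:batch], unlabelled[batch:]
        # 3. Add the newly labelled examples to L.
        labelled += [(x, oracle(x)) for x in queried]
        # 4. Re-train on the enlarged L.
        model = train(labelled)
    return model, labelled, unlabelled


oracle = lambda x: "a" if x < 5 else "b"  # stands in for the human annotator
model, L_final, U_final = active_learning([(0.0, "a"), (10.0, "b")], [1.0, 4.0, 5.5, 9.0], oracle)
print(model, U_final)
```

The queried points (5.5, then 4.0) are the ones closest to the decision boundary; the easy examples 1.0 and 9.0 are never sent to the annotator.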

Musical Instrument Identification
Data: Music audio signals (synthetic and real recordings)
Classes (labels): acoustic guitar, electric guitar, violin, cello, saxophone, organ, piano, voice, flute, clarinet, trumpet
Figure: audio waveform and its spectrogram
Datasets:
● NSynth: 1.5k, 5 classes
● MIS: 2k, 5 classes
● IRMAS: 6.5k, 5 classes
● Europeana Sounds user-annotated set: 2k, 63 labels
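The waveform-to-spectrogram step can be illustrated with a magnitude short-time Fourier transform. This is a plain-Python sketch with a naive DFT; the frame length, hop size and Hann window are assumptions rather than the presentation's settings, and a real pipeline would use an FFT library instead.

```python
import cmath
import math


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude STFT: Hann-windowed frames, naive DFT, bins 0..frame_len//2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ])
    return frames  # frames[t][k]: magnitude of frequency bin k in frame t


# A 250 Hz sine sampled at 8 kHz falls exactly on bin 8 (8 * 8000 / 256 = 250).
sr = 8000
tone = [math.sin(2 * math.pi * 250 * n / sr) for n in range(1024)]
spec = spectrogram(tone)
print(len(spec), len(spec[0]), max(range(len(spec[0])), key=spec[0].__getitem__))
```

Each row of `spec` is one time frame, so the result is the time-frequency image that a CNN can treat like a picture.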

Unsupervised learning for feature extraction
HITL: Deep active learning
Figure: a CNN learning algorithm is trained on the labelled data L; the most informative samples of the unlabelled data U are sent for human annotation, and the annotated samples are used to update the model.

Experimental Results
Problems
● Recording quality (scratches)
● Unknown instruments
● Piano is often annotated as a string instrument
● User annotations need extra validation
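Labeled set L: counts per class and source dataset
Class                  IRMAS   MIS  NSynth  Total
cel (cello)              388   471       0    859
pia (piano)              721     0     300   1021
vio (violin)             580   228       0    808
cla (clarinet)           505   239       0    744
org (organ)              682     0     176    858
gac (acoustic guitar)    637     0     300    937
tru (trumpet)            441   235       0    676
flu (flute)              451   219       0    670
sax (saxophone)          626   323       0    949
gel (electric guitar)    760     0     300   1060
voi (voice)              778     0     300   1078
Total                   6569  1715    1376   9660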
Confidence := output of the final (softmax) layer of the CNN
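Under this definition, the confidence of a prediction is the largest softmax probability. A minimal sketch with made-up logits; `predict_with_confidence` is a hypothetical helper, not part of the presented system.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits, labels):
    """Return the most probable label and its softmax probability (the confidence)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


label, confidence = predict_with_confidence([2.0, 0.5, 0.1], ["cel", "pia", "vio"])
print(label, round(confidence, 3))
```

Subtracting the largest logit leaves the probabilities unchanged but prevents `math.exp` from overflowing on large logits.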
Split of L into L1 and L2 (counts per class)
Class    L1    L2
cel     604   255
pia     738   283
vio     574   234
cla     515   229
org     575   283
gac     672   265
tru     468   208
flu     468   202
sax     649   300
gel     739   321
voi     760   318
Total  6762  2898
L: labeled set of data, L = L1 + L2
Each track is divided into segments of 3 seconds
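Splitting a track into 3-second segments can be sketched as follows. Dropping a trailing piece shorter than 3 seconds is an assumption (zero-padding it would be an alternative), and the `segment` helper is hypothetical.

```python
def segment(samples, sample_rate, seconds=3.0):
    """Split a track into consecutive, non-overlapping segments of fixed duration.

    A trailing piece shorter than `seconds` is dropped.
    """
    size = int(round(seconds * sample_rate))
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


# A 10-second track at 1 kHz yields three 3-second segments; the last second is dropped.
segments = segment(list(range(10_000)), sample_rate=1000)
print(len(segments), [len(s) for s in segments])
```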
Accuracy by training set and evaluation set (n/a = not reported)
Training set   L1       L2       L        Europeana Sounds annotated set
L1             85.18%   71.25%   81.00%   n/a
L2             n/a      n/a      n/a      n/a
L              90.55%   89.85%   90.34%   n/a
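The L accuracies are consistent with a size-weighted average of the L1 and L2 accuracies. The check below assumes that accuracy on L is computed over every counted item of L1 and L2, and uses the sums of the per-class counts listed for L1 (6762) and L2 (2898); `overall_accuracy` is a hypothetical helper.

```python
# Sizes of the two subsets: sums of their per-class counts.
sizes = {"L1": 6762, "L2": 2898}

# Reported accuracies (%) on L1 and L2, keyed by training set.
results = {
    "L1": {"L1": 85.18, "L2": 71.25},
    "L": {"L1": 90.55, "L2": 89.85},
}


def overall_accuracy(per_subset, sizes):
    """Accuracy on L = L1 + L2 as the size-weighted mean of the subset accuracies."""
    total = sum(sizes.values())
    return sum(per_subset[name] * n for name, n in sizes.items()) / total


for training_set, per_subset in results.items():
    print(training_set, round(overall_accuracy(per_subset, sizes), 2))
```

Both rows reproduce the reported L accuracies (81.00% and 90.34%) to two decimals.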

Europeana Photography
Assessing the aesthetic quality of images
Attributes:
● balance, fill the frame, lead room, rule of thirds, motion blur, simple, color harmony, framed, leading lines, shallow depth of field (DOF), repetition and pattern, symmetry
● interesting content, object emphasis, good lighting, color harmony, vivid color, shallow depth of field, motion blur, rule of thirds, balancing element, repetition, and symmetry
Early results
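To make one of these attributes concrete, the rule of thirds can be scored by how close the main subject lies to an intersection of the thirds grid. This hand-crafted sketch is not the authors' approach: it assumes the subject centre is already known (e.g. from a detector), and `rule_of_thirds_score` is a hypothetical helper.

```python
import math

# Intersections of the rule-of-thirds grid in normalised image coordinates.
THIRDS_POINTS = [(x, y) for x in (1 / 3, 2 / 3) for y in (1 / 3, 2 / 3)]


def rule_of_thirds_score(cx, cy, width, height):
    """1.0 when the subject centre sits on a thirds intersection, 0.0 at a frame corner."""
    u, v = cx / width, cy / height
    d = min(math.hypot(u - px, v - py) for px, py in THIRDS_POINTS)
    # The farthest any point in the frame can be from its nearest intersection
    # is the distance from a corner to the closest intersection.
    max_d = math.hypot(1 / 3, 1 / 3)
    return max(0.0, 1.0 - d / max_d)


print(rule_of_thirds_score(100, 100, 300, 300))  # subject on an intersection
print(rule_of_thirds_score(150, 150, 300, 300))  # subject at the centre
```

A learned model would predict such attributes directly from pixels; an explicit score like this is mainly useful for checking or explaining its outputs.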

Europeana Fashion
Extracting information from catwalk images:
● Identification of clothing items (dresses, shoes, skirts, trousers, swimsuits)
● Clothing categories
● Details
● Colors
● Patterns
● Fabric texture
Images courtesy of Europeana Fashion, under copyright.
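As an illustration of the color attribute, the dominant colors of a garment can be named by mapping each pixel to its nearest color in a reference palette and counting. A minimal sketch that assumes the garment's pixels have already been segmented; the palette and the `dominant_colors` helper are hypothetical.

```python
from collections import Counter

# Hypothetical reference palette; a real system would use a richer color vocabulary.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (200, 30, 30),
    "blue": (30, 60, 200),
    "green": (30, 150, 60),
    "beige": (220, 200, 160),
}


def nearest_color(rgb):
    """Name of the palette color closest to `rgb` in squared RGB distance."""
    return min(PALETTE, key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name])))


def dominant_colors(pixels, top=2):
    """Name every pixel with its nearest palette color and return the most frequent names."""
    counts = Counter(nearest_color(p) for p in pixels)
    return [name for name, _ in counts.most_common(top)]


garment_pixels = [(210, 20, 25)] * 6 + [(10, 10, 10)] * 3 + [(250, 250, 250)]
print(dominant_colors(garment_pixels))
```

Distances in RGB only roughly match perceived color differences; a perceptual space such as CIELAB is a common refinement.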
