[SNU Computer Vision Course Project] Image Style Recognition

Style Recognition
Youngjin Kim, Beomjun Shin, Hunjae Jung
4190.429 Image Processing (2015 Spring)
Final Project
June 1, 2015

• Why Style Recognition
• Environment Settings
• Data Sources
• Convolutional Neural Network
• CNN with Fine-tuning
• Hand-crafted Features
• Classifier
• Conclusion
• Demo
Outline

Why Style Recognition
What do you see in the picture?

4
Not only Object, But also Style is important
Pastel Noir

5
Style contains rich information about image
Flickr Style dataset Wikipaintings dataset

Build improved style recognizer
that provides rich and vivid information to image
Our Mission

Application
Example of filtering image search results by style.

Environment Setting
Required Hardware Specification
GPU with 4GB(≥) memory Affordable Motherboard 600W(≥) Power Supply
+ @

Environment Setting
What we have
i3 processor, 2GB memory, GTX 430 graphic card (…)

Environment Setting
We bought …
GTX 980(4GB)
Power(600W)
740 thousand ₩

Environment Setting
We bought …
GTX 980(4GB)
Power(600W)
740 thousand ₩
Our Motherboard was
Micro-ATX, So …

Environment Setting
We bought …
GTX 980(4GB)
Power(600W)
740 thousand ₩
We bought …
Motherboard, CPU,
SSD, Case
690 thousand ₩
Our Motherboard was
Micro-ATX, So …

Environment Setting
We bought …
GTX 980(4GB)
Power(600W)
740 thousand ₩
We bought …
Motherboard, CPU,
SSD, Case
690 thousand ₩
Our Motherboard was
Micro-ATX, So …
Memory was only 4GB
So…

Environment Setting
We bought …
GTX 980(4GB)
Power(600W)
740 thousand ₩
Our Motherboard was
Micro-ATX, So …
We bought …
Motherboard, CPU,
SSD, Case
690 thousand ₩
Memory was only 4GB
So…
We bought …
Memory 16GB
128 thousand ₩

Environment Setting
We bought …
GTX 980(4GB)
Power(600W)
740 thousand ₩
We bought …
Motherboard, CPU,
SSD, Case
690 thousand ₩
We bought …
Memory 16GB
128 thousand ₩
Our Motherboard was
Micro-ATX, So …
Memory was only 4GB
So…

Setting Deep Learning Toolbox
Keras(ver0.1): Theano-based library Caffe: fastest, nice framework
Python, No Configuration file(!) C++, protobuf

Work Flow
Classifier
pre-training
Top5 Score
Handcraft Features
ConvNet Features
Bright
Romantic
Serene
Sunny
Macro
Color Histogram
Color Variance
Datasets
GIST Descriptor
HOG Features

Work Flow
Classifier
pre-training
Top5 Score
Handcraft Features
ConvNet Features
Bright
Romantic
Serene
Sunny
Macro
Color Histogram
GIST Descriptor
HOG Features
Color Variance
Datasets

Data source
All labeled 20 styles
60% training, 20% validation, 20% test
Positive example : clean
Negative example : not clean
80,000 images
Curated by Flickr Group[Kerayev. et.al]

Style Labels
• Optical techniques : Macro, Depth-of-Field, Long Exposure, HDR
• Atmosphere : Hazy, Sunny
• Mood : Serene, Melancholy, Ethereal
• Composition styles : Minimal, Geometric, Detailed, Texture
• Color : Pastel, Bright
• Genre : Noir, Vintage, Romantic, Horror

Work Flow
Classifier
pre-training
Top5 Score
Handcraft Features
ConvNet Features
Bright
Romantic
Serene
Sunny
Macro
Color Histogram
Color Variance
Datasets
GIST Descriptor

Convolutional Neural Network
We made theano-based CNN model and tested it
too slow, need more data
128 batch, SGD optimize
INPUT(256x256)
CONV(32,3x3)
CONV(32,3x3)
MAXPOOLING(4x4)
CONV(32,3x3)
CONV(32,3x3)
MAXPOOLING(4x4)
FC(8192-2048)
FC(2048-20)
SOFTMAX
Our CNN Model (keras)

CaffeNet without fine-tuning
too slow, need more data (!)
INPUT(256x256)
CONV(48,2x2)
MAXPOOLING(2x2)
CONV(128,3,3)
MAXPOOLING(2x2)
CONV(192,3x3)
CONV(192,3x3)
CONV(128,3x3)
MAXPOOLING(2x2)
FC(10,816-4,096)
FC(4,096-4,096)
FC(4,096-2000)
SOFTMAX
CaffeNet (Modified AlexNet)

No Fine-tuning, No Future
Let’s not contribute to global warming
INPUT(256x256)
CONV(48,2x2)
MAXPOOLING(2x2)
CONV(128,3,3)
MAXPOOLING(2x2)
CONV(192,3x3)
CONV(192,3x3)
CONV(128,3x3)
MAXPOOLING(2x2)
FC(10,816-4,096)
FC(4,096-4,096)
FC(4,096-2000)
SOFTMAX
CaffeNet (Modified AlexNet)

Fine-tuning CNN (1)
Dataset MIT Places Database (2.5 million images, 205 category)
Model Deeply Supervised Net (DSN)
Deeply-Supervised Nets [Chen-Yu Lee, Saining Xie]

Fine-tuning CNN (2)
Dataset ImageNet (1.2 million images into 1000 category)
Model CaffeNet (replication of AlexNet)
Caffe: Convolutional Architecture for Fast Feature Embedding [Yangqing Jia, Evan Shelhamer]

Fine-tuning CNN (3)
Dataset ImageNet (1.2 million images into 1000 category)
Model VGG 16-layer Net
VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION [Karen Simonyan, Andrew Zisserman]

Winner was VGG, But we used CaffeNet

Hand-crafted Features
GIST Color Histogram
3 scale, {8, 8, 4} orientation
4 x 4 grid
960 dimension
8 bins, 1 patch for Channel
48 dimension
Color Variance
4 x 4 grid for each channel
48 dimension

Concatenate Features
CNN GIST Histogram Variance
4,096 960 96 48
1 by 5200 Vector

Work Flow
Classifier
pre-training
Top5 Score
Handcraft Features
ConvNet Features
Bright
Romantic
Serene
Sunny
Macro
Color Histogram
GIST Descriptor
Color Variance
Datasets

Classifier
Linear Classifier
5,200 Feature Vector
Hinge Log
Squared
Hinge
Modified
Huber
Perceptron
0.314 0.309 0.327 0.316 0.303
Input
Loss func
Accuracy
Not enough performance!

Classifier
Full-connected Neural Network
Let’s tune hyper-parameters

Classifier
Varying Number of Hidden layers
1 hidden layer performed better
INPUT(5200x1)
FC(5152-2048)
RELU
FC(2048-20)
SOFTMAX
1 Hidden Layer
INPUT(5200x1)
FC(5152-2048)
RELU
FC(2048-1024)
RELU
FC(1024-20)
SOFTMAX
2 Hidden Layer

Classifier
Varying Dropout rate
Dropout rate 0.8 is 2% better

Classifier
Varying Optimization Techniques
All of them overfitted, only SGD survived

Classifier
Varying Activation functions
RELU shows highest accuracy
tanh RELU Sigmoid Soft Plus

Classifier
Linear vs Full Connected
31.4 %
30.9 %
32.7 %
31.6 %
30.3 %
39.5 %

Final Touch
Now we have every building blocks
Any remaining variation?
Classifier
pre-training
Top5 Score
Handcraft Features
ConvNet Features
Bright
Romantic
Serene
Sunny
Macro
Color Histogram
GIST Descriptor
Color Variance
Datasets

Final Touch
Let’s compare combination of features
Classifier
pre-training
Top5 Score
Handcraft Features
ConvNet Features
Bright
Romantic
Serene
Sunny
Macro
Color Histogram
Color Variance
Datasets
GIST Descriptor

Final Touch
Let’s compare combination of features
CNN (4,096) GIST Color Histogram Color Variance Acc
COMB 1 O O O O 0.3945
COMB 2 O O O 0.3941
COMB 3 O O 0.3949
COMB 4 O 0.3942
Accuracy for various feature combinations
CNN GIST
4,096 960
1 by 5,056 Feature Vector
Final feature combination
Top1 Accuracy. 0.395 > 0.368 (benchmark)

Conclusion
Confusion matrix
Vintage vs Pastel
Pastel vs Romantic
Sunny vs Serene

Conclusion
Prediction Accuracy

[SNU Computer Vision Course Project] Image Style Recognition

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

En vedette

En vedette (15)

Similaire à [SNU Computer Vision Course Project] Image Style Recognition

Similaire à [SNU Computer Vision Course Project] Image Style Recognition (20)

Dernier

Dernier (20)

[SNU Computer Vision Course Project] Image Style Recognition