This document walks through the steps of conducting a deep learning experiment; the presentation was originally given in Korean. It introduces the speaker and their background in artificial intelligence and natural language processing, then lists the steps: understanding neural networks; deep neural networks, with techniques such as pretraining, rectified linear units, and dropout; the Theano library; writing deep learning code with Theano; and applying deep learning to natural language processing with libraries such as Gensim. It also discusses recent interest in deep learning and example applications.
2. Introduction
김현호 (Hyunho Kim)
- Computer science major, UST
- Automatic Speech Translation Research Lab, ETRI (Electronics and Telecommunications Research Institute)
- Team Popong, in charge of mobile
- Interests: artificial intelligence, machine learning, natural language processing
- stray.leone@gmail.com
3. Agenda
1. Understanding Neural Networks
2. Deep Neural Networks
   a. Pretraining
   b. Rectified Linear Units
   c. Dropout
3. The Theano library
4. Deep learning code using Theano
5. Deep Learning for Natural Language Processing
   a. The Gensim library
   b. Automatic word spacing with a Recurrent Neural Network
35. Difficulties with traditional Deep Learning
Networks deeper than two or three levels yielded poorer results.
36. Why Deep Learning is hard
- Overfitting
  - deep nets have lots of parameters
- Underfitting
  - vanishing gradients during gradient descent
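To see why gradients vanish, note that the sigmoid's derivative is at most 0.25, so backpropagation through many sigmoid layers multiplies many small factors together. A minimal numpy sketch of this bound (not from the original slides):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # s'(x) = s(x) * (1 - s(x)), which peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

print sigmoid_grad(0.0)  # 0.25, the maximum
# upper bound on the gradient factor after n sigmoid layers
# (small weights only make the decay faster)
for n in [1, 2, 5, 10, 20]:
    print n, 0.25 ** n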
37. Breakthroughs in Deep Learning
- Pretraining
- Dropout
- Rectified Linear Units
38. Pretraining performance
"Why Does Unsupervised Pre-training Help Deep Learning?" (Erhan, Bengio, et al., 2010)
- Pretrained initialization starts from a better local minimum than random initialization.
56. Why Theano
- Definition
  - "Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently."
    (http://deeplearning.net/software/theano/)
  - "Optimizing GPU-meta-programming code generating array oriented optimizing math compiler in Python"
    (https://github.com/josephmisiti/awesome-machine-learning)
57. Why Theano
- Runs computations on the GPU from Python code, without writing any CUDA code
- grad(), updates, function()
- Symbolic functions
58. Why Theano - grad(), updates, function()
Calling gradients = T.grad(...) computes the gradient for you.

Example:
x = T.scalar()
gx = T.grad(x**2, x)  # the gradient of x**2 with respect to x (= 2x)
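The three names in the slide title compose naturally: T.grad() builds a symbolic gradient, and theano.function() with an updates list turns it into a gradient-descent step on a shared variable. A minimal sketch, assuming an illustrative variable w and learning rate 0.1 (not from the slides):

import theano
import theano.tensor as T
import numpy

# a shared variable holds state that the updates list can modify in place
w = theano.shared(numpy.asarray(5.0), name='w')
cost = w ** 2                 # minimize w^2
gw = T.grad(cost, w)          # symbolic gradient: 2w

# each call performs one step: w <- w - 0.1 * gw
step = theano.function([], cost, updates=[(w, w - 0.1 * gw)])

for i in range(10):
    print step(), w.get_value()  # cost shrinks toward 0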
60. Why Theano - grad(), updates, function()
"This module provides function(), commonly accessed as theano.function, the interface for compiling graphs into callable objects. You've already seen example usage in the basic tutorial... something like this:"
>>> x = theano.tensor.dscalar()
>>> f = theano.function([x], 2*x)  # inputs: [x]; output: 2*x
>>> print f(4)  # prints 8.0
http://deeplearning.net/software/theano/library/compile/function.html
63. Why Theano - grad(), updates, function()
import theano
import theano.tensor as T

x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y
f = theano.function([x, y], z)

64. Why Theano - grad(), updates, function()
Theano represents symbolic mathematical computations as graphs.
[figure: the computation graph for z = x + y, nodes labeled scalar + scalar -> scalar]

65. Why Theano - grad(), updates, function()
x = theano.tensor.dscalar('x')
y = theano.tensor.dscalar('y')
z = x + y
f = theano.function([x, y], z)
print f(4, 3)  # array(7.0)
73. DBN.py
# fine-tuning loop (from the deeplearning.net DBN tutorial)
while (epoch < training_epochs) and (not done_looping):
    epoch = epoch + 1
    for minibatch_index in xrange(n_train_batches):
        # one gradient step on the current minibatch
        minibatch_avg_cost = train_fn(minibatch_index)
        iter = (epoch - 1) * n_train_batches + minibatch_index
        # periodically evaluate on the validation set
        if (iter + 1) % validation_frequency == 0:
            validation_losses = validate_model()
            this_validation_loss = numpy.mean(validation_losses)
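The snippet ends at the validation loss; in the tutorial the loop continues with patience-based early stopping. A minimal sketch of that bookkeeping, assuming patience, patience_increase, improvement_threshold, and best_validation_loss are initialized as in the tutorial:

            # keep the best validation score seen so far, and extend
            # patience when the improvement is significant
            if this_validation_loss < best_validation_loss:
                if this_validation_loss < best_validation_loss * improvement_threshold:
                    patience = max(patience, iter * patience_increase)
                best_validation_loss = this_validation_loss
            # stop once no significant improvement has been seen for a while
            if patience <= iter:
                done_looping = True
                break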
74. DNN using ReLU
import theano
from theano import tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
import numpy as np
from load import mnist
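Given these imports, the two techniques named in the agenda take only a few lines each. A minimal sketch; the helper names rectify and dropout are illustrative, not confirmed from the full source:

srng = RandomStreams()

# rectified linear unit: max(0, x)
def rectify(X):
    return T.maximum(X, 0.)

# dropout: zero each unit with probability p at training time, and
# rescale the survivors so expected activations stay unchanged
def dropout(X, p=0.5):
    if p > 0:
        retain = 1 - p
        X = X * srng.binomial(X.shape, p=retain, dtype=theano.config.floatX)
        X = X / retain
    return X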
81. DNN using ReLU (training loop)
for i in range(100):
    # iterate over the training set in minibatches of 128
    for start, end in zip(range(0, len(trX), 128), range(128, len(trX), 128)):
        cost = train(trX[start:end], trY[start:end])
    # report test-set accuracy after each epoch
    print np.mean(np.argmax(teY, axis=1) == predict(teX))
86. Building the data 1
from numpy import genfromtxt
import gzip, cPickle
...
train_set_x = genfromtxt(dir_path + "train_set.x.txt", delimiter=",")
...
train_set = train_set_x, train_set_x
valid_set = valid_set_x, valid_set_x
test_set = test_set_x, test_set_x

print "writing to pkl.gz..."
data_set = [train_set, valid_set, test_set]
print "zip data into a file"
f = gzip.open(output_dir + str(i) + "_" + pkl_filename + ".pkl.gz", 'wb')
print "zip data file name is " + str(i) + "_" + pkl_filename + ".pkl.gz"
cPickle.dump(data_set, f, protocol=2)
f.close()
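For completeness, the file written above can be read back the same way the Theano tutorials load their datasets; a minimal sketch (the path is illustrative):

import gzip, cPickle

f = gzip.open(output_dir + "0_" + pkl_filename + ".pkl.gz", 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()
train_set_x, train_set_y = train_set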
87. Building the data 2
for n, sentence in enumerate(file_lines):
    ...
    data_batch_fpath = vector_dir + "data_batch_" + str(n) + ".npz"
    ...
    # save the vector list
    numpy.savez(data_batch_fpath,
                data=numpy.asarray(sentence_vector_list),
                labels=label_vector,
                length=max_length,
                dim=dimension)
94. Deep Learning for Natural Language Processing
one-hot (1-of-K) representation
- 나는 밥을 먹는다 ("I eat a meal")
95. Deep Learning for Natural Language Processing
one-hot (1-of-K) representation
- 나는 밥을 먹는다
- 나 는 밥 을 먹 는 다  (split into morphemes)
96. Deep Learning for Natural Language Processing
one-hot (1-of-K) representation
- 나는 밥을 먹는다
- 나 는 밥 을 먹 는 다  (split into morphemes)
- 밥 = [0,0,0,0,0,0,0,………,0,0,0,0,1,0,0,0,0,0,0]

Each morpheme becomes a vector with a single 1 at its vocabulary index:

index  0(나)  1(가)  2(는)  ...   ...   ...   ...   999(.)
나       1      0      0     0     0     0     0     0
는       0      0      1     0     0     0     0     0
..       0      0      0     0     0     1     0     0
..       0      0      0     0     1     0     0     0
다       0      0      0     0     0     0     1     0
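A minimal numpy sketch of this construction; the vocabulary mapping shown is an illustrative subset, not the experiment's actual index:

# -*- coding: utf-8 -*-
import numpy as np

K = 1000                                # vocabulary size, as on the slide
vocab = {u'나': 0, u'가': 1, u'는': 2}  # illustrative subset of the index

def one_hot(morpheme):
    # a length-K vector with a single 1 at the morpheme's index
    v = np.zeros(K)
    v[vocab[morpheme]] = 1
    return v

print one_hot(u'는')[:5]  # [ 0.  0.  1.  0.  0.]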
97. Deep Learning for Natural Language Processing
word2vec representation
- 나는 밥을 먹는다
- 나 는 밥 을 먹 는 다  (split into morphemes, each represented as a vector)
98. Deep Learning for Natural Language Processing
word2vec representation
- 나는 밥을 먹는다
- 나 는 밥 을 먹 는 다  (split into morphemes, each represented as a vector)
- Word2Vec model
- 밥 = [0.323112, -0.021232, …….. , 0.82123123]
99. Deep Learning for Natural Language Processing
- one-hot (1-of-K) representation: 밥 = [0,0,0,0,0,0,0,………,0,0,0,0,1,0,0,0,0,0,0]
- word2vec representation: 밥 = [0.323112, -0.021232, …….. , 0.82123123]
100. Gensim
- Definition
  - "Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora."
- word2vec class
  - word vector representation
  - multithreading
  - Skip-gram
  - Continuous Bag of Words (CBOW)
101. Gensim - imports, settings
# imports
from gensim.models.word2vec import LineSentence
from gensim.models import word2vec
...
# settings
THREADS = 8            # number of worker threads (multithreading)
DIMENSION = 50         # word vector dimensionality
SKIPGRAM = 1           # 1 = skip-gram, 0 = CBOW
WINDOW_SIZE = 8        # context window size
NTimes = 10            # number of passes over the corpus
min_count_of_word = 5  # ignore words rarer than this
...
from gensim import utils
102. Gensim - training, saving the model
# load raw sentences
sentences = LineSentence(input_train_file_path)
# model settings
model = word2vec.Word2Vec(size=DIMENSION, workers=THREADS,
                          min_count=min_count_of_word, sg=SKIPGRAM,
                          window=WINDOW_SIZE)

# build the vocabulary and train
number_iter = NTimes  # number of iterations (epochs) over the corpus
model.build_vocab(sentences)

ss = utils.RepeatCorpusNTimes(sentences, number_iter)
model.train(ss)
# save the model
model.save(model_file_name)
model.save_word2vec_format(model_file_name + '.bin', binary=True)
103. Gensim - loading the model, testing
try:
    model = utils.SaveLoad.load(fname=model_file_name)
except:
    print "failed to load. Retrying with load_word2vec_format()!"
    model = word2vec.Word2Vec.load_word2vec_format(fname=model_file_name + ".bin", binary=True)
...
x = model[w.decode('utf-8')]
...
mw, score = model.most_similar(positive=[x])[0]
print "most similar : ", mw
print "target vector :", x
104. Most similar words to ‘서울’ (Seoul)
most similar word    similarity
대구 (Daegu)         0.4282917082309723
광주 (Gwangju)       0.4046330451965332
부산 (Busan)         0.40132588148117065
울산 (Ulsan)         0.3863871693611145
수원 (Suwon)         0.38555505871772766
청주 (Cheongju)      0.35919708013534546
안양 (Anyang)        0.35622960329055786
주왕산 (Juwangsan)   0.3543151617050171
평택 (Pyeongtaek)    0.3505415618419647
cebu                 0.34598737955093384
105. Automatic word spacing with a Recurrent Neural Network
- 나는 밥을 먹는다
- per-character spacing labels: 0 0 1 0 1 0 0
- per-character input vectors: [0.323112, -0.021232, …….. , 0.82123123]
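How the labels line up with the sentence: a minimal sketch, assuming the convention (consistent with the slide's 0 0 1 0 1 0 0) that a character is labeled 1 when a space precedes it in the correctly spaced text:

# -*- coding: utf-8 -*-
def spacing_labels(sentence):
    # label each non-space character 1 if a space precedes it, else 0
    labels = []
    prev_was_space = False
    for ch in sentence:
        if ch == u' ':
            prev_was_space = True
            continue
        labels.append(1 if prev_was_space else 0)
        prev_was_space = False
    return labels

print spacing_labels(u'나는 밥을 먹는다')  # [0, 0, 1, 0, 1, 0, 0]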
106. Difficulties encountered while experimenting with Deep Learning
- There are many hyperparameters to choose: the number of layers, the number of nodes per layer, the learning rate, the number of epochs, the batch size, the activation function, and so on (see the sketch below).
- Checking experimental results after changing a parameter takes a long time.
- GPU memory problems, because the data is big.
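To make the size of that search space concrete, a minimal sketch with illustrative candidate values (not the ones used in the experiments): even a handful of choices per hyperparameter multiplies into a hundred-plus configurations.

import itertools

# illustrative candidate values for each hyperparameter
n_layers        = [2, 3, 4]
nodes_per_layer = [256, 512, 1024]
learning_rates  = [0.1, 0.01, 0.001]
batch_sizes     = [64, 128]
activations     = ['sigmoid', 'relu']

grid = list(itertools.product(n_layers, nodes_per_layer,
                              learning_rates, batch_sizes, activations))
print len(grid), "configurations to try"  # 108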