Drop Out in Deep Learning

Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Prochain SlideShare
Es272 ch4b
Es272 ch4b
Chargement dans…3
×

Consultez-les par la suite

1 sur 4 Publicité

Plus De Contenu Connexe

Similaire à Drop Out in Deep Learning (20)

Plus récents (20)

Publicité

Drop Out in Deep Learning

Inverse Dropout – Example 1

• Train – forward propagation (a minimal NumPy sketch follows this slide):
  1. Assume the dropout ratio of layer 2 is 0.3.
  2. Assume the layer-2 activation vector [aL2N1, aL2N2] = [0.23, 0.47].
  3. Bernoulli mask vector – a vector whose entries each follow a Bernoulli distribution, with the probability of being False equal to the dropout ratio.
     a) Assume the layer-2 mask vector [mL2N1, mL2N2] = [False, True].
  4. Updated activation vector = activation vector ⊙ mask vector (element-wise product).
     a) The element-wise product zeroes out every activation whose mask entry is False.
     b) Updated activation vector = [__, 0.47], which means node 1 is dropped from layer 2.
     c) Takeaway – node 1 of layer 2 is dropped in roughly 30% of all iterations.
     d) Note that the dropped nodes stay dropped for all samples in a batch; the network drops a different set of nodes only at the start of a new batch.
  5. Dropping node 1 of layer 2 about 30% of the time reduces the z values in layer 3 in a similar proportion. Hence we upscale the z values in layer 3 as follows:
     a) z3 ~ w3 · a2 – z in layer 3 depends on the activations in layer 2.
     b) We divide the values of a2 by (1 – dropout ratio) = 1 – 0.3 = 0.7 while computing z3.
• Train – backward propagation: the same nodes that were disconnected in forward propagation remain disconnected in backward propagation.
• Test – no change: all nodes stay connected; no nodes are dropped.
• Keras and PyTorch implement inverse (inverted) dropout.
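The forward-pass logic above can be condensed into a small function. This is a minimal NumPy sketch under the slide's assumptions (dropout ratio 0.3, layer-2 activations [0.23, 0.47]); the function name and the random seed are illustrative, not the Keras or PyTorch implementation.

```python
# Minimal sketch of inverted dropout (train-time scaling), not framework source code.
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only so the example is repeatable

def inverted_dropout_forward(a, drop_ratio, training=True):
    """Apply inverted dropout to an activation vector `a`.

    Training: sample a Bernoulli mask (keep prob = 1 - drop_ratio), zero out the
    dropped activations, and divide the survivors by the keep probability so the
    expected activation seen by the next layer is unchanged.
    Test: return the activations untouched (no nodes are dropped).
    """
    if not training:
        return a
    keep_prob = 1.0 - drop_ratio
    mask = rng.random(a.shape) < keep_prob   # True = node kept, False = node dropped
    return (a * mask) / keep_prob

# Slide values: layer-2 activations with a 0.3 dropout ratio.
a2 = np.array([0.23, 0.47])
print(inverted_dropout_forward(a2, drop_ratio=0.3))
# If node 1 happens to be dropped, this prints [0.0, 0.47 / 0.7 ≈ 0.671].
```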
Inverse Dropout
Inverse Dropout – Example 2

1. The dropout ratio of layer 2 is 0.1 (10%).
2. Layer 2 has 50 nodes.
3. Layer-2 activation vector [aL2N1, aL2N2, …, aL2N50] = [0.23, 0.47, …, 0.6].
4. Layer-2 mask vector [mL2N1, mL2N2, …, mL2N50] = [True, False, …, True].
5. Updated activation vector = [0.23, __, …, 0.6].
6. Dropping each layer-2 node 10% of the time reduces the z values in layer 3 in a similar proportion. Hence we upscale the z values in layer 3 as follows (a numeric check follows this slide):
   1. z3 ~ w3 · a2
   2. We divide the values of a2 by (1 – dropout ratio) = 1 – 0.1 = 0.9 while computing z3.
   3. Numeric example:
      a) Dropout ratio = 0.1.
      b) Nodes dropped = 0.1 × 50 = 5 (on average).
      c) aL2N2 gets dropped.
      d) While computing z3, we divide the surviving layer-2 activations by 0.9.
         • For example, aL2N50 becomes 0.6 / 0.9 ≈ 0.67, which is about 111% of 0.6, i.e. aL2N50 is upscaled by roughly 11%.
• Takeaway – we reduced the number of active nodes in layer 2 by 10% and compensated for that loss with a proportionate upscaling of the surviving activations while computing layer 3.
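A quick numeric check of the scaling above, using the slide's assumed values; it also confirms that dividing the kept activations by the keep probability preserves the activation's expected value.

```python
# Numeric check for Example 2 (values taken from the slide, seed is arbitrary).
import numpy as np

drop_ratio = 0.1
keep_prob = 1.0 - drop_ratio      # 0.9
a_L2N50 = 0.6

print(a_L2N50 / keep_prob)        # 0.666..., i.e. ~111% of 0.6 (an ~11% upscale)

# E[mask * a / keep_prob] = keep_prob * a / keep_prob = a, so the expected
# activation seen by layer 3 is unchanged by inverted dropout.
rng = np.random.default_rng(1)
samples = (rng.random(1_000_000) < keep_prob) * a_L2N50 / keep_prob
print(samples.mean())             # ≈ 0.6
```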
Dropout

1. Train – nodes are disconnected exactly as in inverse dropout. The difference is that we do not compensate for the dropped nodes during training.
2. The compensation happens at test time instead.
3. Test – the weights leaving the layer whose nodes were dropped are scaled by the keep probability (a sketch follows this slide).
4. In the slide's example, nodes 2 and 4 of layer 2 are disconnected during the training phase. During the test phase, the weights w1 through w8 are all multiplied by (1 – dropout ratio).
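A short sketch of classic (non-inverted) dropout under the same kind of assumptions: no rescaling during training, and the outgoing weights scaled by the keep probability at test time. The weight matrix `W3` and its values are made up purely for illustration.

```python
# Sketch of classic (non-inverted) dropout: compensate at test time, not train time.
import numpy as np

rng = np.random.default_rng(0)
drop_ratio = 0.3
keep_prob = 1.0 - drop_ratio

def train_forward(a2, W3):
    """Training: drop layer-2 nodes, no rescaling of the survivors."""
    mask = rng.random(a2.shape) < keep_prob
    return W3 @ (a2 * mask)              # z3 computed from the thinned layer 2

def test_forward(a2, W3):
    """Test: keep every node, but scale the layer-3 weights by the keep
    probability so z3 matches its expected value at training time."""
    return (W3 * keep_prob) @ a2

a2 = np.array([0.23, 0.47])              # layer-2 activations (from the slides)
W3 = np.array([[0.1, 0.2],               # illustrative layer-3 weights
               [0.3, 0.4]])
print(train_forward(a2, W3))
print(test_forward(a2, W3))
```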
