2. Lecture Content
• Non-linearly separable functions: XOR gate implementation
– MLP data transformation
– Mapping implementation
– Graphical solution
3. Non linear problems
• XOR problem
• To separate the positive from the negative examples we must either draw two lines (i.e., we need two straight-line equations) or use a nonlinear region that captures only one class.
[Figure: the +ve and –ve XOR patterns plotted in the (x1, x2) plane; no single straight line separates the two classes.]
4. Non linear problems
• To implement the nonlinearity we insert one or more extra layers of nodes between the input layer and the output layer (hidden layers).
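As a minimal sketch of such an architecture (a hypothetical 2-2-1 layout with random, untrained weights; numpy and sigmoid units are assumptions, not from the lecture):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Hypothetical 2-2-1 network: 2 inputs -> 2 hidden nodes -> 1 output.
    rng = np.random.default_rng(0)
    W_hidden = rng.normal(size=(2, 2))   # input -> hidden weights (untrained)
    b_hidden = rng.normal(size=2)
    w_out = rng.normal(size=2)           # hidden -> output weights (untrained)
    b_out = rng.normal()

    def forward(x):
        h = sigmoid(W_hidden @ x + b_hidden)   # the extra (hidden) layer of nodes
        return sigmoid(w_out @ h + b_out)      # output layer

    print(forward(np.array([0.0, 1.0])))       # some value in (0, 1); not trained yet

The hidden activations h are the recoding of the input that the following slides exploit.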
6. MLP data transformation and mapping implementation
• Need for hidden units:
• With one layer of enough hidden units, the input can be recoded (memorized); such a network is a multilayer perceptron (MLP).
• This recoding allows any problem to be mapped/represented (e.g., x², x³, etc.).
• Question: how can the weights of the hidden units be trained?
• Answer: learning algorithms, e.g., back propagation.
• 'Back propagation' refers to propagating the error for the weight adaptation of the layers, starting from the last layer and working back to the first, i.e., the weights of the last layer are updated before those of the previous layers.
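A minimal numpy sketch of this idea, training a small MLP on XOR; the 2-2-1 architecture, learning rate, and epoch count are illustrative assumptions. Note how the output-layer error term is computed first and then propagated back to the hidden layer:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

    W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)    # input -> hidden
    W2 = rng.normal(size=(2, 1)); b2 = np.zeros(1)    # hidden -> output

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 0.5
    for epoch in range(20000):
        # Forward pass
        h = sigmoid(X @ W1 + b1)                  # hidden activations
        y = sigmoid(h @ W2 + b2)                  # network output
        # Backward pass: the output-layer error term is computed FIRST ...
        delta2 = (y - t) * y * (1 - y)
        # ... and only then propagated back to the hidden layer
        delta1 = (delta2 @ W2.T) * h * (1 - h)
        # Weight adaptation: last layer before the previous layers
        W2 -= lr * h.T @ delta2; b2 -= lr * delta2.sum(axis=0)
        W1 -= lr * X.T @ delta1; b1 -= lr * delta1.sum(axis=0)

    # Should print approximately [0, 1, 1, 0]; convergence depends on the
    # random initialization, so rerun with another seed if it stalls.
    print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))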
7. Learning Non Linearly Separable Functions
• Back propagation tries to transform the training patterns to make them almost linearly separable, so that a linear network can do the final classification.
• In other words, if we need more than one straight line to separate the +ve and –ve patterns, we solve the problem in two phases (see the sketch after this list):
– Phase 1: we represent each straight line with a single perceptron and use it to classify/map the training patterns (its outputs).
– Phase 2: these outputs are transformed patterns that are now linearly separable and can be classified by an additional perceptron, giving the final result.
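A minimal sketch of the two phases for the XOR problem; the two lines (x1 + x2 = 0.5 and x1 + x2 = 1.5) and the output-perceptron weights are one hand-picked choice, not necessarily the lecture's:

    def step(v):                      # hard-limit activation
        return 1 if v >= 0 else 0

    # Phase 1: one perceptron per straight line (assumed lines for XOR)
    def y1(x1, x2): return step(x1 + x2 - 0.5)   # perceptron for g1(x) = 0
    def y2(x1, x2): return step(x1 + x2 - 1.5)   # perceptron for g2(x) = 0

    # Phase 2: one more perceptron on the transformed (now separable) patterns
    def xor(x1, x2):
        a, b = y1(x1, x2), y2(x1, x2)
        return step(a - 2 * b - 0.5)  # assumed output-layer weights

    for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(p, '->', xor(*p))       # prints 0, 1, 1, 0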
8. Multi-layer Networks and Perceptrons
• Multi-layer networks have one or more layers of hidden units.
• With two (possibly very large) hidden layers, it is possible to implement any function.
• Networks without a hidden layer are called perceptrons.
• Perceptrons are very limited in what they can represent, but this makes their learning problem much simpler.
9. Solving Non Linearly Separable Functions
• Example: the XOR problem
• Phase 1: we draw two arbitrary lines.
• We find the line equations g1(x) = 0 and g2(x) = 0 using arbitrary intersections with the axes (the yellow points p1, p2, p3, p4 in the figure below).
• We assume the +ve and –ve directions for each line.
• We classify the given patterns as +ve/–ve with respect to both g1(x) and g2(x).
• Phase 2: we transform the patterns we have.
• Patterns that are +ve or –ve with respect to both g1(x) and g2(x) (similar signs) belong to class B; patterns with different signs belong to class A.
• We find the line equation g(y) = 0 using arbitrary intersections with the new axes.
[Figure: the four XOR patterns in the (x1, x2) plane, labeled A and B, with the two arbitrary lines g1(x) = 0 and g2(x) = 0 drawn through the axis points p1, p2, p3, p4 and their +ve directions marked.]
x1 x2 XOR Class
0 0 0 B
0 1 1 A
1 0 1 A
1 1 0 B
11. Solving Non Linearly Separable Functions
• Assume x1 > p1(1) is the positive direction for g1(x).
• Assume x1 > p3(1) is the positive direction for g2(x).
• Classifying the given patterns with respect to g1(x) and g2(x):
• We represent the +ve and –ve values as outputs of a step function, i.e., y1 = f(g1(x)) = 0 if g1(x) is –ve and 1 otherwise, and y2 = f(g2(x)) = 0 if g2(x) is –ve and 1 otherwise (see the table below).
• We now have only three distinct patterns, and they are linearly separable; the transformation got rid of the extra pattern causing the problem (two patterns now coincide).
x1 x2 g1(x) g2(x) y1 y2 Class
0 0 -ve -ve 0 0 B
0 1 +ve -ve 1 0 A
1 0 +ve -ve 1 0 A
1 1 +ve +ve 1 1 B
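The table can be checked with a short script; the concrete line equations below (g1(x) = x1 + x2 - 0.5, g2(x) = x1 + x2 - 1.5) are an assumed choice that reproduces the sign pattern above:

    def f(g):                              # step function from the slide
        return 0 if g < 0 else 1

    print('x1 x2 g1  g2  y1 y2')
    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        g1 = x1 + x2 - 0.5                 # assumed g1(x)
        g2 = x1 + x2 - 1.5                 # assumed g2(x)
        s1 = '+ve' if g1 >= 0 else '-ve'
        s2 = '+ve' if g2 >= 0 else '-ve'
        print(x1, x2, s1, s2, f(g1), f(g2))

    # The four inputs map to just three distinct (y1, y2) points:
    # (0,0) -> B, (1,0) -> A (twice), (1,1) -> B, which are linearly separable.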
13. Solving Non Linearly Separable Functions
• Example: the linearly non-separable patterns x1 = [3 0], x2 = [5 2], x3 = [1 3], x4 = [2 4], x5 = [1 1], x6 = [3 3] have to be classified into two categories C1 = {x1, x2, x3, x4} and C2 = {x5, x6} using a feed-forward 2-layer neural network.
– Select a suitable number of partitioning straight lines.
– Consequently, design the first stage (hidden layer) of the network with bipolar discrete perceptrons.
– Using this layer, transform the six samples.
– Design the output layer of the network with a bipolar discrete perceptron using the transformed samples.
15. Solving Non Linearly Separable Functions
• Assume x2 < p1(2) is the positive direction for g1(x).
• Assume x1 > p3(1) is the positive direction for g2(x).
• We represent the +ve and –ve values as outputs of a bipolar function, i.e., y1 = f(g1(x)) = -1 if g1(x) is –ve and 1 otherwise, and y2 = f(g2(x)) = -1 if g2(x) is –ve and 1 otherwise.
• Classifying the given patterns with respect to g1(x) and g2(x):

x1 x2 g1(x) g2(x) y1 y2 Class
3  0  +ve   +ve    1   1  B
5  2  +ve   +ve    1   1  B
1  3  -ve   -ve   -1  -1  B
2  4  -ve   -ve   -1  -1  B
1  1  +ve   -ve    1  -1  A
3  3  +ve   -ve    1  -1  A

• The six samples now map to only three distinct points in the (y1, y2) plane (several patterns coincide), and these three points are linearly separable.
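A sketch of the complete two-layer design; the partitioning lines g1(x) = x1 - x2 + 1 and g2(x) = x1 - x2 - 2 and the output-layer discriminant g(y) = y1 - y2 - 1 are assumptions chosen to reproduce the table above, since the lecture's own line equations are only given graphically:

    def bipolar(g):                        # bipolar hard limiter
        return 1 if g >= 0 else -1

    samples = {'x1': (3, 0), 'x2': (5, 2), 'x3': (1, 3),
               'x4': (2, 4), 'x5': (1, 1), 'x6': (3, 3)}

    for name, (a, b) in samples.items():
        # Hidden layer: two bipolar discrete perceptrons (assumed lines)
        y1 = bipolar(a - b + 1)            # g1(x) = x1 - x2 + 1, +ve below the line
        y2 = bipolar(a - b - 2)            # g2(x) = x1 - x2 - 2, +ve to its right
        # Output layer: one bipolar perceptron on the transformed pattern
        out = bipolar(y1 - y2 - 1)         # assumed g(y) = y1 - y2 - 1
        print(name, (a, b), '->', (y1, y2), 'class', 'A' if out == 1 else 'B')

Running this reproduces the table: x1–x4 fall in class B (C1) and x5, x6 in class A (C2).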