2. Comments
The Self-Organizing Feature Map (SOM), the Topographic Product, and the Cascade 2 algorithm are all used in the SCPSNSP method.
SCPSNSP
Paper: A SOM clustering pattern sequence-based next symbol prediction method for day-ahead direct electricity load and price forecasting
Journal: Energy Conversion and Management, Vol. 90, 2015
https://www.sciencedirect.com/science/article/pii/S0196890414009662
3. The SCPSNSP Algorithm
SCPSNSP: SOM Cluster Pattern Sequence-based Next Symbol Prediction
- Clustering
- Data Normalization
- SOM Clustering
- Feature Map Optimization
- Next Symbol Prediction
- Determine Window Size
- Cascade 2 Algorithm
- Find the Nearest Nonzero Hits Pattern
4. Proposed Prediction Framework
[Framework diagram: (1) Data → Normalization; (2) SOM Clustering; (3) Feature Map Optimization — together producing labeled data and the label sequence; (4) Determine Window Size W; (5) ANN (Cascade 2) trained on label coordinates; (6) Find Nearest Cluster Label, followed by Denormalization to produce the forecast]
5. Data Clustering
Hourly electricity time series vector of day i:
X(i) = [x1, x2, …, x24]
By clustering these vectors, we obtain one cluster label per day, giving the label sequence
5 2 2 12 4 6 … 3 5

Electricity time series data:
Day         Time_00:00  Time_01:00  Time_02:00  …  Time_23:00  Cluster
2012-01-01  20972.0     19904.3     19837.6     …  19864.9     5
2012-01-02  21140.5     19908.6     19691.8     …  19742.1     2
2012-01-03  20803.5     19447.2     18555.5     …  18592.0     2
2012-01-04  21313.8     19935.3     19595.5     …  19187.8     12
2012-01-05  19258.8     18735.5     18248.7     …  19829.1     4
2012-01-06  18980.2     18532.7     17963.6     …  19908.4     6
…           …           …           …           …  …           …
2012-09-02  21215.6     19766.8     18961.9     …  18949.6     3
2012-09-03  21315.8     21919.1     22137.1     …  18093.2     5
6. Sliding Window over the Label Sequence
A window of size W = 3 slides over the clustering pattern sequence 5 2 2 12 4 6 … 3 5.
Each label is mapped to its pattern coordinates on the map, e.g. 5 → (0,4), 2 → (0,1), 12 → (1,3), 4 → (0,3), 6 → (0,5).
Each training sample pairs the coordinates of W consecutive labels (input pattern sequence) with the coordinates of the next label (output pattern):
① 5 2 2 → 12, i.e. (0,4) (0,1) (0,1) → (1,3)
② 2 2 12 → 4, i.e. (0,1) (0,1) (1,3) → (0,3)
③ 2 12 4 → 6, i.e. (0,1) (1,3) (0,3) → (0,5)
The windows before each day to be predicted form the test dataset.
[ANN diagram: the input layer takes (c1, r1), …, (cw, rw); three hidden layers; the output layer produces (cw+1, rw+1)]
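The windowing step above can be sketched as follows. The label-to-coordinates table is the illustrative example from the slide; the function name is my own.

```python
def make_training_pairs(labels, coords, w):
    """Slide a window of size w over the label sequence and emit
    (coordinates of w consecutive labels, coordinates of the next label)."""
    pairs = []
    for t in range(len(labels) - w):
        window = labels[t:t + w]
        target = labels[t + w]
        pairs.append(([coords[l] for l in window], coords[target]))
    return pairs

# Illustrative label -> (row, col) coordinates, matching the slide's example
coords = {5: (0, 4), 2: (0, 1), 12: (1, 3), 4: (0, 3), 6: (0, 5)}
pairs = make_training_pairs([5, 2, 2, 12, 4, 6], coords, w=3)
```

Each pair is one ANN training sample: the flattened input coordinates feed the input layer, and the output coordinates are the regression target.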
7. Why do we choose SOM and ANN?
SOM [Kohonen 1990]
Preserves the topological properties of the input space
Other clustering algorithms, however, do not provide any topology relations between cluster patterns
Cascade 2 [Fahlman 1996] training algorithm
Can model the complex relationships between input and output
No need to determine the number of hidden layers and neurons in advance
8. Pseudo Code of Proposed Method
SCPSNSP:
SOM Clustering Pattern-Sequence Next Symbol Prediction method using the Cascade 2 algorithm
10. Normalization
Time series data may be recorded in different situations or with different measurement units.
Ex: this year's electricity price may be two or three times higher than last year's.
To analyze under equal conditions, each value is divided by the daily mean:

\hat{x}_i = \frac{x_i}{\frac{1}{N}\sum_{j=1}^{N} x_j}

where N is 24 and x_i is the time series value at the ith hour of a day.
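A minimal sketch of this normalization with NumPy:

```python
import numpy as np

def normalize_day(day):
    """Divide each hourly value by the daily mean (N = 24 values per day),
    so the normalized day always has mean 1."""
    day = np.asarray(day, dtype=float)
    return day / day.mean()
```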
12. Self-Organizing Maps (SOM)
Unsupervised learning neural network
Projects high-dimensional input data onto a two-dimensional output map
Preserves the topology of the input data
Primarily used for the organization and visualization of complex data
13. Self-Organizing Maps (SOM)
A lattice of neurons ('nodes') accepts and responds to a set of input signals
Responses are compared; a 'winning' neuron is selected from the lattice
The selected neuron is activated together with its 'neighbourhood' neurons
An adaptive process changes the weights to more closely resemble the inputs
14. Self-Organizing Maps (SOM)
SOM algorithm
1. Randomly initialise all weights
2. Select an input vector x = [x1, x2, x3, …, xn]
3. Compare x with the weights wj of each neuron j to determine the winner
4. Update the winner so that it becomes more like x, together with the winner's neighbours
5. Adjust the parameters: learning rate & 'neighbourhood function'
6. Repeat from (2) until the map has converged (i.e. no noticeable changes in the weights) or a pre-defined number of training cycles has passed
19. Neuron index and coordinates
Neuron index: i
Coordinates: (r_i, c_i) on an R × C map
floor(): returns the nearest smaller integer
mod(): returns the modulus after division

r_i = \mathrm{floor}((i - 1) / C)
c_i = \mathrm{mod}(i - 1, C)
i = r_i \cdot C + c_i + 1
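The two conversions can be written directly, following the formulas above (1-based neuron index, 0-based coordinates):

```python
def index_to_coords(i, C):
    """Neuron index i (1-based) -> (r_i, c_i) on an R x C map."""
    return (i - 1) // C, (i - 1) % C

def coords_to_index(r, c, C):
    """(r, c) -> 1-based neuron index i = r*C + c + 1."""
    return r * C + c + 1
```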
20. Self-Organizing Maps (SOM)
Map size [R, C]: the number of neurons is M = R × C
Vector initialization:

m_i = [m_{i1}, m_{i2}, \ldots, m_{id}], \quad i = 1, 2, \ldots, M

Find the BMU (Best Matching Unit) c for input vector v:

c = \arg\min_i \| v - m_i \|
21. Self-Organizing Maps (SOM)
Update the codebook vectors according to the training algorithm: sequential or batch training.
Sequential training:

m_i(t+1) = m_i(t) + \alpha(t)\, h_{ci}(t)\, [v(t) - m_i(t)]

v(t): input vector randomly selected from the input dataset at time t
α(t): the learning rate
h_ci(t): the neighborhood function
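The sequential training loop (random initialization, BMU search, Gaussian neighborhood, codebook update) can be sketched as below. The linear decay schedules and default parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def train_som_sequential(data, R, C, iters=500, alpha0=0.5, sigma0=None, seed=0):
    """Minimal sequential SOM: m_i(t+1) = m_i(t) + a(t) h_ci(t) [v(t) - m_i(t)],
    with a Gaussian neighborhood measured in output-space coordinates."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    codebook = rng.normal(size=(R * C, d))
    # Output-space coordinates (r_i, c_i) of each neuron
    coords = np.array([((i - 1) // C, (i - 1) % C) for i in range(1, R * C + 1)], dtype=float)
    sigma0 = sigma0 or max(R, C) / 2.0
    for t in range(iters):
        frac = t / iters
        alpha = alpha0 * (1 - frac)          # linearly decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.5    # shrinking neighborhood radius
        v = data[rng.integers(len(data))]    # random input sample v(t)
        bmu = np.argmin(np.linalg.norm(codebook - v, axis=1))
        h = np.exp(-np.sum((coords - coords[bmu]) ** 2, axis=1) / (2 * sigma ** 2))
        codebook += alpha * h[:, None] * (v - codebook)
    return codebook
```

The ordering phase corresponds to the early iterations (large α and σ); convergence happens as both shrink.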
22. Self-Organizing Maps (SOM)
Batch training:

m_i(t+1) = \frac{\sum_{j=1}^{n} h_{c(j)i}(t)\, v(j)}{\sum_{j=1}^{n} h_{c(j)i}(t)}

c(j): the index of the BMU of v(j)
n: the number of data samples
t: the t-th iteration
23. Self-Organizing Maps (SOM)
Memory-efficient implementation of batch training
m: the number of neurons in the feature map
n_j: the number of data samples mapped to unit j
\bar{v}_j: the mean of the input data mapped to node j

Using the per-unit sums s_j(t) = \sum_{l=1}^{n_j} v(l):

m_i(t+1) = \frac{\sum_{j=1}^{m} h_{ji}(t)\, s_j(t)}{\sum_{j=1}^{m} n_j h_{ji}(t)}

or, equivalently, using the per-unit means:

m_i(t+1) = \frac{\sum_{j=1}^{m} n_j h_{ji}(t)\, \bar{v}_j}{\sum_{j=1}^{m} n_j h_{ji}(t)}
24. Self-Organizing Maps (SOM)
Learning process
Ordering
The learning rate and neighborhood size are reduced over the iterations.
Convergence
The SOM is fine-tuned with the shrunken neighborhood and a constant learning rate.
25. Self-Organizing Maps (SOM)
Common neighborhood function:

h_{ci}(t) = \exp\left( -\frac{\| r_c - r_i \|^2}{2\sigma^2(t)} \right)

Note: \| r_c - r_i \| is measured in the output space.
σ(t): defines the radius of the neighborhood around the BMU.
26. SOM Sample Hits
Sample hits: the number of input samples for which each neuron is the BMU.
Zero-hits patterns are not considered.
27. Advantages and disadvantages of SOM
Advantages:
Projects high-dimensional data onto a lower-dimensional map
Preserves the topological properties of the input space, so that similar objects are mapped to nearby locations on the map
Useful for visualization
Disadvantages:
The clustering result depends on the initial weight vectors
Requires a large quantity of good training data
Difficult to determine the optimal map size
Edge/boundary effect in the traditional SOM (two-dimensional regular spacing in a rectangular or hexagonal grid): neurons on the map boundary have fewer neighbors
High computation cost
29. Why Need Feature Map Optimization?
Feature map size
Important for detecting deviations in the data
It directly influences the clustering
Usually it is recommended to choose a large feature map size, even when there are few input data samples
A small map has relatively little freedom to align the topological relations between neurons
30. Why Need Feature Map Optimization?
If the map size is too large
The differences between neurons become too fine-grained
The computation cost increases dramatically due to the huge number of neurons
If the map size is too small
The output patterns become more general
Some important differences that should be detected could be missed
31. Map Quality Measures for SOM
Survey of map quality measures [Pölzlbauer 2004]
Several measures have been introduced; however, no measure is suitable in all cases.
Quantization Error
Completely disregards map topology and alignment
Computed as the average distance between each data vector and its BMU (the cluster centroid by which it is represented)
The value decreases as the map size increases
Topographic Error
The proportion of all data vectors for which the first and second BMUs are not adjacent units
However, [Pölzlbauer 2004] concluded that the topographic error is not reliable for small maps
32. Map Quality Measures for SOM
Trustworthiness and Neighborhood Preservation [Venna 2001]
Determines whether the projected data points that are actually visualized are close to each other in the input space
Does not take a single value; instead, a series of values depending on one parameter represents the map quality
SOM distortion [Lampinen 1992]
Under some assumptions, it can be seen as a cost function
However, it cannot compare the quality of maps of different sizes
Topographic Product [Bauer 1992]
Only the map's codebook is considered
Indicates whether the size of the map is appropriate for the dataset
Represents the appropriateness of the feature map size for the given training data with a single value
However, this value depends on the codebook vector initialization
33. Topographic Product (1/10)
Notation
Input space: V
Output space (map): A
w_j: weight vector of node j (here, node = neuron)
d^V(w_j, v): the distance in the input space V between w_j and v
The best-matching node i satisfies d^V(w_i, v) = \min_{j \in A} d^V(w_j, v)
n^A_k(j): the kth nearest neighbor of node j in the output space
n^V_k(j): the kth nearest neighbor of node j in the input space
34. Topographic Product (2/10)
In the output space, distances are measured with coordinates:

n^A_1(j): \quad d^A(j, n^A_1(j)) = \min_{j' \in A \setminus \{j\}} d^A(j, j')
n^A_2(j): \quad d^A(j, n^A_2(j)) = \min_{j' \in A \setminus \{j,\, n^A_1(j)\}} d^A(j, j')

In the input space, distances are measured with weight vectors:

n^V_1(j): \quad d^V(w_j, w_{n^V_1(j)}) = \min_{j' \in A \setminus \{j\}} d^V(w_j, w_{j'})
n^V_2(j): \quad d^V(w_j, w_{n^V_2(j)}) = \min_{j' \in A \setminus \{j,\, n^V_1(j)\}} d^V(w_j, w_{j'})
35. Topographic Product (3/10)
Ratio of the k nearest neighbors in the input space:

Q_1(j,k) = \frac{d^V(w_j, w_{n^A_k(j)})}{d^V(w_j, w_{n^V_k(j)})}

Ratio of the k nearest neighbors in the output space:

Q_2(j,k) = \frac{d^A(j, n^A_k(j))}{d^A(j, n^V_k(j))}

Q_1(j,k) = Q_2(j,k) = 1 only if the nearest neighbors of order k in the input and output space coincide.
36. Topographic Product (4/10)
The first nearest neighbors coincide: n^V_1(3) = n^A_1(3) = 4
The second nearest neighbors do not: n^V_2(3) = 5, n^A_2(3) = 2
Therefore

Q_1(3,2) = \frac{d^V(w_3, w_2)}{d^V(w_3, w_5)} \neq 1

[Figure: input space vs. output space, nodes 1–5 and 26–28]
37. Topographic Product (5/10)
The pointers in the input space form a line in the same way as the nodes in the output space.
However, the previous definition is sensitive to the ordering of the nearest neighbors.
To cancel the nearest-neighbor ordering constraints, normalize as:

P_1(j,k) = \left( \prod_{l=1}^{k} Q_1(j,l) \right)^{1/k}, \quad P_2(j,k) = \left( \prod_{l=1}^{k} Q_2(j,l) \right)^{1/k}

Now we have P_1(j,k) \geq 1 and P_2(j,k) \leq 1.
38. Topographic Product (6/10)
Now P1 and P2 are only sensitive to severe neighborhood violations.
Example: nodes close in the input space but far apart in the output space, e.g. n^V_4(3) = 28, which is far away on the map.

P_1(3,3) = \left( \frac{d^V(w_3,w_4)\, d^V(w_3,w_2)\, d^V(w_3,w_5)}{d^V(w_3,w_4)\, d^V(w_3,w_5)\, d^V(w_3,w_2)} \right)^{1/3} = 1, \quad \text{with } P_2(3,3) = 1
39. Topographic Product (7/10)

P_1(3,4) = \left( \frac{d^V(w_3,w_4)\, d^V(w_3,w_2)\, d^V(w_3,w_5)\, d^V(w_3,w_1)}{d^V(w_3,w_4)\, d^V(w_3,w_5)\, d^V(w_3,w_2)\, d^V(w_3,w_{28})} \right)^{1/4} = \left( \frac{d^V(w_3,w_1)}{d^V(w_3,w_{28})} \right)^{1/4} \approx 1

P_2(3,4) = \left( \frac{d^A(3,4)\, d^A(3,2)\, d^A(3,5)\, d^A(3,1)}{d^A(3,4)\, d^A(3,5)\, d^A(3,2)\, d^A(3,28)} \right)^{1/4} = \left( \frac{d^A(3,1)}{d^A(3,28)} \right)^{1/4} \ll 1

In this case: a small deviation from 1 for P1, and a very strong deviation for P2.
40. Topographic Product (8/10)
Constant magnification factors of the map do not change the next-neighbor ordering.
Therefore P1 and P2 are insensitive to constant gradients of the map, so we combine them:

P_3(j,k) = \left( \prod_{l=1}^{k} Q_1(j,l)\, Q_2(j,l) \right)^{1/2k}

However, this is still only for one neuron.
41. Topographic Product (9/10)
Averaging over the whole map:

P = \frac{1}{N(N-1)} \sum_{j=1}^{N} \sum_{k=1}^{N-1} \log\left( P_3(j,k) \right)

Therefore:
P ≈ 0: perfect map
P < 0: map too small
P > 0: map too big
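A direct, unoptimized sketch of the topographic product P, assuming Euclidean distances in both spaces and ignoring distance ties:

```python
import numpy as np

def topographic_product(codebook, coords):
    """Topographic product P [Bauer 1992] for a trained SOM.
    codebook: (N, d) weight vectors; coords: (N, 2) output-space coordinates.
    P ~ 0 for a well-fitting map; |P| grows with neighborhood violations."""
    N = len(codebook)
    dV = np.linalg.norm(codebook[:, None] - codebook[None], axis=-1)  # input space
    dA = np.linalg.norm(coords[:, None].astype(float) - coords[None], axis=-1)  # output space
    total = 0.0
    for j in range(N):
        # k-th nearest neighbors of j (excluding j itself) in each space
        nV = np.argsort(dV[j]); nV = nV[nV != j]
        nA = np.argsort(dA[j]); nA = nA[nA != j]
        q1 = dV[j, nA] / dV[j, nV]   # Q1(j, k), k = 1..N-1
        q2 = dA[j, nA] / dA[j, nV]   # Q2(j, k)
        # log P3(j, k) = (1 / 2k) * sum_{l<=k} log(Q1(j,l) Q2(j,l))
        logp3 = np.cumsum(np.log(q1 * q2)) / (2 * np.arange(1, N))
        total += logp3.sum()
    return total / (N * (N - 1))
```

A perfectly topology-preserving 1-D chain gives P = 0, while swapping weights so that map neighbors are no longer input-space neighbors pushes P away from 0.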
43. Heuristic Way for Map Size
Predefined map size [Vesanto 2000]
M: the total number of neurons
n: the number of training data samples

M = 5\sqrt{n}
\frac{R}{C} = \frac{\text{the largest eigenvalue}}{\text{the second largest eigenvalue}} \quad \text{(of the training data)}
M = R \times C

Recommended map size: [R, C]
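A sketch of this heuristic, taking the side ratio directly from the two largest covariance eigenvalues as stated on the slide (some SOM toolboxes use the square root of this ratio instead):

```python
import numpy as np

def heuristic_map_size(data):
    """Initial map size: M ~ 5*sqrt(n) neurons, side ratio R/C taken from
    the two largest eigenvalues of the data covariance matrix."""
    n = len(data)
    M = 5 * np.sqrt(n)
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(data, rowvar=False)))[::-1]
    ratio = eigvals[0] / eigvals[1]                    # R / C
    C = max(1, int(round(np.sqrt(M / ratio))))         # since R*C = M and R = ratio*C
    R = max(1, int(round(M / C)))
    return R, C
```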
44. Trained Map Size Range
Previously recommended map size: [R, C]
Trained map sizes [i, j] with
5 ≤ i ≤ 1.5 × max([R, C])
5 ≤ j ≤ 1.5 × max([R, C])
Among these trained maps, we choose the map size with the smallest topographic product value.
46. Window Size for Prediction
What window size is better for prediction?
[Figure: windows ①, ②, ③ of size w = 3 sliding over the clustering label sequence 5 1 1 3 3 8 4 3 5 3]
47. Determine the window size W
It is unrealistic to consider the whole pattern sequence in the further steps.
A window of W patterns before the days to be predicted is chosen to minimize [Martínez-Álvarez 2008]

\sum_{d=1}^{D} \sum_{i=1}^{N} \left\| \hat{x}_{d,i} - x_{d,i} \right\|

where
D: the number of consecutive days to be predicted
N: the number of time series values recorded at equal time intervals within a day, N = 24
\hat{x}_{d,i}: the forecasted value and x_{d,i} the actual value at hour i of day d
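The selection of W can be sketched generically. Here `validate` is a hypothetical callback that, for a candidate window size, returns (actual, forecast) arrays over the D validation days:

```python
import numpy as np

def window_error(actual, forecast):
    """Sum over the D validation days and N = 24 hours of |x_hat - x|."""
    return np.abs(np.asarray(forecast) - np.asarray(actual)).sum()

def choose_window(candidates, validate):
    """Pick the W whose validation forecasts minimize the error above."""
    errors = {W: window_error(*validate(W)) for W in candidates}
    return min(errors, key=errors.get)
```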
49. Artificial Neural Network
An ANN is a model (emulation) of the information processing capabilities of nervous systems.
A neuron receives signals (impulses) from other neurons through its dendrites (receivers) and transmits signals generated by its cell body along the axon (transmitter). The terminals of the axon branches are the synapses, the functional units between two neurons.
50. Artificial Neural Network
Artificial Neural Network (ANN)
A mathematical or computational model inspired by the structure and/or functional aspects of biological neural networks
Consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation
However, it is difficult to determine the network architecture:
Too many hidden neurons may cause overfitting
Too few may cause underfitting
51. Cascade architecture
Cascade architecture
No need to determine the number of layers and hidden nodes
A cascade architecture consists of
A cascade algorithm: how the neurons should be added to the network (ex: Cascade-Correlation, Cascade 2)
A weight update algorithm: Quickprop, RPROP
Key ideas of the cascade architecture
Neurons are added to the network one at a time
Their input weights do not change after being added
52. Cascade-Correlation
Cascade 2 [Fahlman 1996] is based on Cascade-Correlation [Fahlman 1989]
Cascade-Correlation
Starts with a net without hidden nodes
Hidden neurons have connections from all inputs and all existing hidden nodes
During candidate neuron training, only the input weights to the candidate neuron are trained; the other weights are kept frozen
[Figure: 1st candidate — frozen weights vs. weights being trained as the candidate is added]
53. Cascade-Correlation
Correlation learning
After being added, the input connections to the candidate neuron are kept frozen
Then all output connections to the output neurons are initialized with small random values
[Figure: 2nd candidate being added]
54. Cascade-Correlation
Train w_new to maximize the covariance between the candidate output x_new and the old error E:

S(w_{new}) = \sum_{k=1}^{K} \left| \sum_{p=1}^{P} (x_{new,p} - \bar{x}_{new})(E_{k,p} - \bar{E}_k) \right|

where
K: the number of output neurons
P: the number of training samples
w_new: vector of weights to the candidate neuron
x_{new,p}: output of the candidate x_new for the pth sample
\bar{x}_{new}: average candidate output value of x_new over all samples
E_{k,p}: error on the kth output node for the pth sample with the old weights
\bar{E}_k: average error at output neuron k over all samples
55. Cascade-Correlation
Drawbacks of Cascade-Correlation
Covariance training
Tendency to overcompensate for errors, because S gives large activations whenever the error of the ANN deviates from the average
Deep networks
Overfitting
Weight freezing
The network does not manage to escape from the minimum during candidate training, without achieving better performance
56. Cascade 2 algorithm
Overcoming the previous drawbacks
Covariance training
Replaced by direct error minimization
Deep networks
Also a difficult problem; simply use more training patterns
Weight freezing
Two-part training: the output connections are trained first, and then the entire ANN is trained
Allows the candidate to find a proper place before the entire training
57. Cascade 2 algorithm
Cascade 2 algorithm vs. Cascade-Correlation
Based on Cascade-Correlation
Candidate training has been changed
Candidates in Cascade 2 have trainable output connections to all of the outputs in the ANN
(no such direct connections in Cascade-Correlation)
Output connections are trained together with the inputs to the candidates
(trained separately in Cascade-Correlation)
58. Cascade 2 algorithm
Train the candidate to minimize the difference between the error of the output neurons and the input from the candidate to the output neurons:

S(w_{new}) = \sum_{k=1}^{K} \sum_{p=1}^{P} \left( e_{k,p} - c_p w_{c,k} \right)^2

where
w_new: the weight vector of the candidate neuron
K: the number of output neurons
P: the number of training patterns
e_{k,p}: the error of output neuron k for training pattern p
c_p: the output value of the candidate neuron for training pattern p
w_{c,k}: the connection weight from candidate neuron c to output neuron k
[Figure: candidate c connected to output node k by w_{c,k}; output node error e_{k,p}]
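The candidate objective S can be evaluated in a few lines; the array shapes are my own convention:

```python
import numpy as np

def cascade2_candidate_loss(errors, cand_out, w_ck):
    """Cascade 2 candidate objective: S = sum_k sum_p (e_{k,p} - c_p * w_{c,k})^2.
    errors: (K, P) residual errors e_{k,p}; cand_out: (P,) candidate outputs c_p;
    w_ck: (K,) candidate-to-output weights."""
    diff = errors - np.outer(w_ck, cand_out)   # (K, P) matrix of e_{k,p} - c_p w_{c,k}
    return float((diff ** 2).sum())
```

S is zero exactly when the candidate's contribution c_p w_{c,k} reproduces the residual error on every output and pattern.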
59. Cascade 2 algorithm
The e_{k,p} is calculated as follows:

e_{k,p} = d_{k,p} - y_{k,p}, \quad y_{k,p} = g_k\left( \sum_{j=0}^{n} w_{j,k} x_{j,p} \right)

where
d_{k,p}: the desired value of output neuron k for training pattern p
y_{k,p}: the actual forecasted value of output neuron k for training pattern p
n: the number of incoming connections to output neuron k
w_{j,k}: the connection weight from neuron j to output neuron k
x_{j,p}: the output value of neuron j for training pattern p
g_k: the activation function used at output neuron k
60. Cascade 2 algorithm
If a linear activation function is used,

y_{k,p} = g_k\left( \sum_{j=0}^{n} w_{j,k} x_{j,p} \right) \quad \text{changes to} \quad y_{k,p} = \sum_{j=0}^{n} w_{j,k} x_{j,p}

After the candidate is added, the updated forecasted output value is

y^{new}_{k,p} = w_{c,k} c_p + y_{k,p} = w_{c,k} c_p + \sum_{j=0}^{n} w_{j,k} x_{j,p}
61. Cascade 2 algorithm
The new error becomes

e^{new}_{k,p} = d_{k,p} - y^{new}_{k,p} = d_{k,p} - (y_{k,p} + w_{c,k} c_p) = e_{k,p} - w_{c,k} c_p

If a linear activation function is used and the candidate has been added to the network, the new error and the old error differ only by w_{c,k} c_p.
Interesting finding: this difference e_{k,p} − w_{c,k} c_p is exactly what the candidate is trained to minimize.
62. ANN Normalization
Clustering label sequence: the cluster labels are tied to the feature map size.
(r, c) are the coordinates; R and C are the numbers of rows and columns of the feature map.

r_{norm} = \frac{r}{R - 1}, \quad c_{norm} = \frac{c}{C - 1}
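A one-line sketch of this coordinate normalization:

```python
def normalize_coords(r, c, R, C):
    """Map (r, c) on an R x C grid into [0, 1] x [0, 1] for the ANN inputs."""
    return r / (R - 1), c / (C - 1)
```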
65. Find the Nearest Pattern
According to the topology relations:
1. Sort the patterns in ascending order of Euclidean distance from the predicted coordinates.
2. Check whether the nearest pattern is a nonzero-hits pattern; if not, check the 2nd nearest pattern, and so on, until a nonzero-hits pattern is found for the first time.
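A sketch of this search, with `hits` as a hypothetical mapping from pattern coordinates to their sample-hit counts:

```python
import math

def nearest_nonzero_pattern(pred, patterns, hits):
    """Sort neuron coordinates by Euclidean distance to the predicted
    coordinates and return the first one with nonzero sample hits."""
    for p in sorted(patterns, key=lambda p: math.dist(p, pred)):
        if hits[p] > 0:
            return p
    return None  # no nonzero-hits pattern exists
```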
67. Summary
Time series prediction
Discover patterns using SOM clustering
Feature map optimization
Cascade 2 training with the patterns' coordinates
No need to determine the neural network architecture size
Directly forecast future time series with the nearest nonzero-hits pattern
68. References
[Abraham 2001] Abraham, A. and Nath, B., “A neuro-fuzzy approach for modelling electricity demand in victoria,” Applied
Soft Computing, Vol. 1, No. 3040, pp. 127–138, 2001.
[Amjady 2006] Amjady, N., “Day-ahead price forecasting of electricity markets by a new fuzzy neural network,” IEEE
Transactions on Power Systems, Vol. 21, No. 2, pp. 887–896, 2006.
[Amjady 2009a] Amjady, N. and Daraeepour, A., “Design of input vector for day-ahead price forecasting of electricity
markets,” Expert Systems with Applications, Vol. 36, No. 10, pp. 12281–12294, 2009.
[Amjady 2009b] Amjady, N. and Daraeepour, A., “Mixed price and load forecasting of electricity markets by a new iterative
prediction method,” Electric Power Systems Research, Vol. 79, No. 9, pp. 1329–1336, 2009.
[Amjady 2009c] Amjady, N. and Hemmati, M., “Day-ahead price forecasting of electricity markets by a hybrid intelligent
system,” European Transactions on Electrical Power, Vol. 19, No. 1, pp. 89–102, 2009.
[Amjady 2009d] Amjady, N. and Keynia, F., “Day-ahead price forecasting of electricity markets by a new feature selection
algorithm and cascaded neural network technique,” Energy Conversion and Management, Vol. 50, No. 12, pp. 2976–2982, 2009.
[Amjady 2009e] Amjady, N. and Keynia, F., “Day-ahead price forecasting of electricity markets by mutual information
technique and cascaded neuro-evolutionary algorithm,” IEEE Transactions on Power Systems, Vol. 24, No. 1, pp. 306–318,
2009.
[Amjady 2009g] Amjady, N. and Keynia, F., “Short-term load forecasting of power systems by combination of wavelet
transform and neuro-evolutionary algorithm,” Energy, Vol. 34, No. 1, pp. 46–57, 2009.
[Amjady 2010a] Amjady, N., Daraeepour, A. and Keynia, F., “Day-ahead electricity price forecasting by modified relief
algorithm and hybrid neural network,” IET generation, transmission & distribution, Vol. 4, No. 3, pp. 432–444, 2010.
[Amjady 2010b] Amjady, N. and Keynia, F., “Application of a new hybrid neuro-evolutionary system for day-ahead price
forecasting of electricity markets,” Applied Soft Computing, Vol. 10, No. 3, pp. 784–792, 2010.
69. References
[An 2013] An, N., Zhao, W., Wang, J., Shang, D. and Zhao, E., “Using multi-output feedforward neural network with
empirical mode decomposition based signal filtering for electricity demand forecasting,” Energy, Vol. 49, No. 1, pp. 279 – 288,
2013.
[Anbazhagan 2013] Anbazhagan, S. and Kumarappan, N., “Day-ahead deregulated electricity market price forecasting using
recurrent neural network,” IEEE Systems Journal, Vol. 7, No. 4, pp. 866–872, 2013.
[Anbazhagan 2014] Anbazhagan, S. and Kumarappan, N., “Day-ahead deregulated electricity market price forecasting using
neural network input featured by DCT,” Energy Conversion and Management, Vol. 78, No. 0, pp. 711 – 719, 2014.
[ANE 2012] “Australian energy market operator,” http://www.nemmco.com.au, 2012.
[Bauer 1992] Bauer, H. and Pawelzik, K., “Quantifying the neighborhood preservation of self-organizing feature maps,” IEEE
Transactions on Neural Networks, Vol. 3, No. 4, pp. 570–579, 1992.
[Burrows 1994] Burrows, M. and Wheeler, D. J., “A block-sorting lossless data compression algorithm,” Digital SRC
Research Report, Tech. Rep., 1994.
[Catalão 2007] Catalão, J., Mariano, S., Mendes, V. and Ferreira, L., “Short-term electricity prices forecasting in a competitive
market: A neural network approach,” Electric Power Systems Research, Vol. 77, No. 10, pp. 1297–1304, 2007.
[Catalão 2009] Catalão, J., Pousinho, H. and Mendes, V., “Neural networks and wavelet transform for short-term electricity
prices forecasting,” in 15th International Conference on Intelligent System Applications to Power Systems, pp. 1–5, 2009.
[Catalão 2011a] Catalão, J., Pousinho, H. and Mendes, V., “Hybrid wavelet-PSO-ANFIS approach for short-term electricity
prices forecasting,” IEEE Transactions on Power Systems, Vol. 26, No. 1, pp. 137–144, 2011.
[Catalão 2011b] Catalão, J., Pousinho, H. and Mendes, V., “Short-term electricity prices forecasting in a competitive market by a
hybrid intelligent approach,” Energy Conversion and Management, Vol. 52, No. 2, pp. 1061–1065, 2011.
[Che 2010] Che, J. and Wang, J., “Short-term electricity prices forecasting based on support vector regression and
autoregressive integrated moving average modeling,” Energy Conversion and Management, Vol. 51, No. 10, pp. 1911–1917, 2010.
70. References
[Che 2012] Che, J., Wang, J. and Wang, G., “An adaptive fuzzy combination model based on self-organizing map and support
vector regression for electric load forecasting,” Energy, Vol. 37, No. 1, pp. 657–664, 2012.
[Chen 1995] Chen, J.-F., Wang, W.-M. and Huang, C.-M., “Analysis of an adaptive time-series autoregressive moving-average
(ARMA) model for short-term load forecasting,” Electric Power Systems Research, Vol. 34, No. 3, pp. 187–196, 1995.
[Chen 2007] Chen, J., Deng, S.-J. and Huo, X., “Electricity price curve modeling by manifold learning,” IEEE Transactions on
Power Systems, Vol. 15, pp. 723–736, 2007.
[Cleary 1984] Cleary, J. and Witten, I., “Data compression using adaptive coding and partial string matching,” IEEE
Transactions on Communications, Vol. 32, No. 4, pp. 396–402, 1984.
[Cleeremans 1989] Cleeremans, A., Servan-Schreiber, D. and McClelland, J. L., “Finite state automata and simple recurrent
networks,” Neural computation, Vol. 1, No. 3, pp. 372–381, 1989.
[Conejo 2005] Conejo, A., Plazas, M., Espinola, R. and Molina, A., “Day-ahead electricity price forecasting using the wavelet
transform and ARIMA models,” IEEE Transactions on Power Systems, Vol. 20, No. 2, pp. 1035–1042, 2005.
[Contreras 2003] Contreras, J., Espinola, R., Nogales, F. and Conejo, A., “ARIMA models to predict next-day electricity
prices,” IEEE Transactions on Power Systems, Vol. 18, No. 3, pp. 1014–1020, 2003.
[Cormack 1987] Cormack, G. V. and Horspool, R., “Data compression using dynamic markov modelling,” The Computer
Journal, Vol. 30, No. 6, pp. 541–550, 1987.
[Cortes 1995] Cortes, C. and Vapnik, V., “Support-vector networks,” Machine learning, Vol. 20, No. 3, pp. 273–297, 1995.
[Diebold 1995] Diebold, F. X. and Mariano, R. S., “Comparing predictive accuracy,” Journal of Business & Economic
Statistics, Vol. 13, No. 3, pp. 253–263, 1995.
[Dong 2011] Dong, Y., Wang, J., Jiang, H. and Wu, J., “Short-term electricity price forecast based on the improved hybrid
model,” Energy Conversion and Management, Vol. 52, No. 8, pp. 2987–2995, 2011.
[Ehrenfeucht 1992] Ehrenfeucht, A. and Mycielski, J., “A pseudorandom sequence–how random is it?” American
Mathematical Monthly, pp. 373–375, 1992.
71. References
[Fahlman 1996] Fahlman, S., Baker, L. and Boyan, J., “The cascade 2 learning architecture,” Technical Report CMUCS-TR-96-184, Carnegie Mellon University, 1996.
[Fan 2006a] Fan, S. and Chen, L., “Short-term load forecasting based on an adaptive hybrid method,” IEEE Transactions on
Power Systems, Vol. 21, No. 1, pp. 392–401, 2006
[Fan 2006b] Fan, S., Liao, J. R., Kaneko, K. and Chen, L., “An integrated machine learning model for day-ahead electricity
price forecasting,” in Power Systems Conference and Exposition, 2006. PSCE’06. 2006 IEEE PES, pp. 1643–1649, IEEE, 2006.
[Fan 2006c] Fan, S., Mao, C., Zhang, J. and Chen, L., “Forecasting electricity demand by hybrid machine learning model,” in
Neural Information Processing, pp. 952– 963, Springer, 2006.
[Fan 2007] Fan, S., Mao, C. and Chen, L., “Next-day electricity-price forecasting using a hybrid network,” IET generation,
transmission & distribution, Vol. 1, No. 1, pp. 176–182, 2007.
[Feder 1994] Feder, M. and Merhav, N., “Relations between entropy and error probability,” IEEE Transactions on Information
Theory, Vol. 40, No. 1, pp. 259–266, 1994.
[García-Martos 2007] García-Martos, C., Rodríguez, J. and Sánchez, M., “Mixed models for short-run forecasting of
electricity prices: application for the spanish market,” IEEE Transactions on Power Systems, Vol. 22, No. 2, pp. 544–552, 2007.
[Gelper 2010] Gelper, S., Fried, R. and Croux, C., “Robust forecasting with exponential and holt–winters smoothing,” Journal
of Forecasting, Vol. 29, No. 3, pp. 285–300, 2010.
[Graves 2013] Graves, A., “Generating sequences with recurrent neural networks,” arXiv preprint arXiv:1308.0850, 2013.
[Han 2006] Han, S.-J. and Cho, S.-B., “Predicting user’s movement with a combination of self-organizing map and markov
model,” in Proceedings of International Conference Artificial Neural Networks, pp. 884–893, 2006.
[Hartmann 2007] Hartmann, M. and Schreiber, D., “Prediction algorithms for user actions.” in LWA, pp. 349–354, 2007.
[Hong 2012] Hong, Y.-Y. and Wu, C.-P., “Day-ahead electricity price forecasting using a hybrid principal component analysis
network,” Energies, Vol. 5, No. 11, pp. 4711–4725, 2012.
72. References
[Hu 2007] Hu, G.-s., Zhu, F.-f. and Zhang, Y.-z., “Short-term load forecasting based on fuzzy c-mean clustering and weighted
support vector machines,” in Natural Computation, 2007. ICNC 2007. Third International Conference on, Vol. 5, pp. 654–659,
IEEE, 2007.
[Igel 2000] Igel, C. and Hüsken, M., “Improving the rprop learning algorithm,” in Proceedings of the 2nd International
Symposium on Neural Computation, pp. 115–121, 2000.
[Jacobs 2002] Jacobs, N. and Blockeel, H., “Sequence prediction with mixed order markov chains,” in Proceedings of the
Belgian/Dutch Conference on Artificial Intelligence, 2002.
[Jacquet 2002] Jacquet, P., Szpankowski, W. and Apostol, I., “A universal predictor based on pattern matching,” IEEE
Transactions on Information Theory, Vol. 48, No. 6, pp. 1462–1472, 2002.
[Katsaros 2009] Katsaros, D. and Manolopoulos, Y., “Prediction in wireless networks by markov chains,” IEEE Wireless
Communications, Vol. 16, No. 2, pp. 56–64, 2009.
[Keynia 2012] Keynia, F., “A new feature selection algorithm and composite neural network for electricity price forecasting,”
Engineering Applications of Artificial Intelligence, Vol. 25, No. 8, pp. 1687–1697, 2012.
[Kohonen 1990] Kohonen, T., “The self-organizing map,” Proceedings of the IEEE, Vol. 78, No. 9, pp. 1464–1480, 1990.
[Kouhi 2013] Kouhi, S. and Keynia, F., “A new cascade nn based method to short-term load forecast in deregulated electricity
market,” Energy Conversion and Management, Vol. 71, pp. 76–83, 2013.
[Laird 1994] Laird, P. and Saul, R., “Discrete sequence prediction and its applications,” Machine learning, Vol. 15, No. 1, pp.
43–68, 1994.
[Lampinen 1992] Lampinen, J. and Oja, E., “Clustering properties of hierarchical self-organizing maps,” Journal of
Mathematical Imaging and Vision, Vol. 2, No. 2, pp. 261–272, 1992.
[Lora 2007] Lora, A., Santos, J., Expósito, A., Ramos, J. and Santos, J., “Electricity market price forecasting based on
weighted nearest neighbors techniques,” IEEE Transactions on Power Systems, Vol. 22, No. 3, pp. 1294–1301, 2007.
[MacQueen 1967] MacQueen, J. et al., “Some methods for classification and analysis of multivariate observations,” in
Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1, p. 14, California, USA, 1967.
73. References
[Martínez-Álvarez 2008] Martínez-Álvarez, F., Troncoso, A., Riquelme, J. and Aguilar-Ruiz, J., “LBF: A labeled-based
forecasting algorithm and its application to electricity price time series,” in Proceedings of the 8th IEEE International Conference
on Data Mining, pp. 453– 461, 2008.
[Martínez-Álvarez 2011a] Martínez-Álvarez, F., Troncoso, A., Riquelme, J. and Aguilar-Ruiz, J., “Discovery of motifs to
forecast outlier occurrence in time series,” Pattern Recognition Letters, Vol. 32, No. 12, pp. 1652–1665, 2011.
[Martínez-Álvarez 2011b] Martínez-Álvarez, F., Troncoso, A., Riquelme, J. and Aguilar-Ruiz, J., “Energy time series
forecasting based on pattern sequence similarity,” IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 8, pp.
1230–1243, 2011.
[McCulloch 1943] McCulloch, W. S. and Pitts, W., “A logical calculus of the ideas immanent in nervous activity,” The
bulletin of mathematical biophysics, Vol. 5, No. 4, pp. 115–133, 1943.
[Medsker 1999] Medsker, L. and Jain, L. C., Recurrent neural networks: design and applications, CRC press, 1999.
[Mikolov 2010] Mikolov, T., Karafiát, M., Burget, L., Cernock`y, J. and Khudanpur, S., “Recurrent neural network based
language model,” in INTERSPEECH, pp. 1045–1048, 2010.
[Nissen 2003] Nissen, S., “Implementation of a fast artificial neural network library (FANN),” Report, Department of
Computer Science University of Copenhagen (DIKU), Vol. 31, 2003.
[Nowicka-Zagrajek 2002] Nowicka-Zagrajek, J. and Weron, R., “Modeling electricity loads in california: Arma models with
hyperbolic noise,” Signal Processing, Vol. 82, No. 12, pp. 1903–1915, 2002.
[NYI 2012] “New york independent system operator,” http://www.nyiso.com, 2012.
[OME 2012] “Spanish electricity market operator,” http://www.omel.es, 2012.
[Pahasa 2007] Pahasa, J. and Theera-Umpon, N., “Short-term load forecasting using wavelet transform and support vector
machines,” in Power Engineering Conference, 2007. IPEC 2007. International, pp. 47–52, IEEE, 2007.
[Pai 2005] Pai, P.-F. and Hong, W.-C., “Support vector machines with simulated annealing algorithms in electricity load
forecasting,” Energy Conversion and Management, Vol. 46, No. 17, pp. 2669–2688, 2005.
[Pappas 2010] Pappas, S. S., Ekonomou, L., Karampelas, P., Karamousantas, D., Katsikas, S., Chatzarakis, G. and Skafidas,
P., “Electricity demand load forecasting of the Hellenic power system using an ARMA model,” Electric Power Systems Research,
Vol. 80, No. 3, pp. 256–264, 2010.
[Parate 2013] Parate, A., Böhmer, M., Chu, D., Ganesan, D. and Marlin, B. M., “Practical prediction and prefetch for faster
access to applications on mobile phones,” in Proceedings of the ACM International Joint Conference on Pervasive and
Ubiquitous Computing, pp. 275–284, 2013.
[Pérez-Ortiz 2001a] Pérez-Ortiz, J. A., Calera-Rubio, J. and Forcada, M. L., “Online symbolic-sequence prediction with
discrete-time recurrent neural networks,” in Proceedings of the International Conference on Artificial Neural Networks, pp. 719–
724, 2001.
[Pérez-Ortiz 2001b] Pérez-Ortiz, J. A., Calera-Rubio, J. and Forcada, M. L., “Online text prediction with recurrent neural
networks,” Neural Processing Letters, Vol. 14, No. 2, pp. 127–140, 2001.
[Pindoriya 2008] Pindoriya, N., Singh, S. and Singh, S., “An adaptive wavelet neural network-based energy price forecasting
in electricity markets,” IEEE Transactions on Power Systems, Vol. 23, No. 3, pp. 1423–1432, 2008.
[Pölzlbauer 2004] Pölzlbauer, G., “Survey and comparison of quality measures for self-organizing maps,” in 5th Workshop on
Data Analysis, pp. 67–82, 2004.
[Pousinho 2012] Pousinho, H., Mendes, V. and Catalão, J., “Short-term electricity prices forecasting in a competitive market by
a hybrid PSO–ANFIS approach,” International Journal of Electrical Power & Energy Systems, Vol. 39, No. 1, pp. 29–35, 2012.
[Prasad 2010] Prasad, P. S. and Agrawal, P., “Movement prediction in wireless networks using mobility traces,” in 7th IEEE
Consumer Communications and Networking Conference, pp. 1–5, 2010.
[Shafie-Khah 2011] Shafie-Khah, M., Moghaddam, M. P. and Sheikh-El-Eslami, M., “Price forecasting of day-ahead
electricity markets using a hybrid forecast method,” Energy Conversion and Management, Vol. 52, No. 5, pp. 2165–2169, 2011.
[Shayeghi 2013] Shayeghi, H. and Ghasemi, A., “Day-ahead electricity prices forecasting by a modified CGSA technique and
hybrid WT in LSSVM based scheme,” Energy Conversion and Management, Vol. 74, pp. 482–491, 2013.
[Shen 2013] Shen, W., Babushkin, V., Aung, Z. and Woon, W. L., “An ensemble model for day-ahead electricity demand time
series forecasting,” in Proceedings of the 4th International Conference on Future Energy Systems, 2013.
[Shrivastava 2014] Shrivastava, N. A. and Panigrahi, B. K., “A hybrid wavelet-ELM based short term price forecasting for
electricity markets,” International Journal of Electrical Power & Energy Systems, Vol. 55, pp. 41–50, 2014.
[Skaruz 2007] Skaruz, J. and Seredynski, F., “Recurrent neural networks towards detection of SQL attacks,” in IEEE
International Parallel and Distributed Processing Symposium, pp. 1–8, 2007.
[SOM 2012] “SOMVIS,” http://www.ifs.tuwien.ac.at/dm/somvis-matlab/index.html#License, 2012.
[Tan 2010] Tan, Z., Zhang, J., Wang, J. and Xu, J., “Day-ahead electricity price forecasting using wavelet transform combined
with ARIMA and GARCH models,” Applied Energy, Vol. 87, No. 11, pp. 3606–3610, 2010.
[Tino 2000] Tino, P., Stancík, M. and Benuskova, L., “Building predictive models on complex symbolic sequences with a
second-order recurrent BCM network with lateral inhibition,” in Proceedings of the International Joint Conference on Neural
Networks, Vol. 2, pp. 265–270, 2000.
[Troncoso Lora 2004] Troncoso Lora, A., Riquelme Santos, J., Riquelme, J., Gómez Expósito, A. and Martínez Ramos, J.,
“Time-series prediction: application to the short-term electric energy demand,” Lecture Notes in Artificial Intelligence, Vol. 23,
pp. 577–586, 2004.
[Venna 2001] Venna, J. and Kaski, S., “Neighborhood preservation in nonlinear projection methods: An experimental study,”
Artificial Neural Networks-ICANN 2001, pp. 485–491, 2001.
[Vesanto 2000] Vesanto, J., Himberg, J., Alhoniemi, E. and Parhankangas, J., SOM Toolbox for Matlab 5, Report, Helsinki
University of Technology, 2000.
[Willems 1995] Willems, F. M., Shtarkov, Y. M. and Tjalkens, T. J., “The context-tree weighting method: Basic properties,”
IEEE Transactions on Information Theory, Vol. 41, No. 3, pp. 653–664, 1995.
[Xu 2005] Xu, R., Wunsch, D. et al., “Survey of clustering algorithms,” IEEE Transactions on Neural Networks, Vol. 16, No.
3, pp. 645–678, 2005.
[Xu 2009] Xu, Y. and Nagasaka, K., “Demand and price forecasting by artificial neural networks (ANNs) in a deregulated
power market,” International Journal of Electrical and Power Engineering, Vol. 3, No. 6, pp. 268–275, 2009.
[Zhang 2012] Zhang, J., Tan, Z. and Yang, S., “Day-ahead electricity price forecasting by a new hybrid method,”
Computers & Industrial Engineering, Vol. 63, No. 3, pp. 695–701, 2012.
[Zhou 2004] Zhou, M., Yan, Z., Ni, Y. and Li, G., “An ARIMA approach to forecasting electricity price with accuracy
improvement by predicted errors,” in IEEE Power Engineering Society General Meeting, pp. 233–238, 2004.
[Ziv 1977] Ziv, J. and Lempel, A., “A universal algorithm for sequential data compression,” IEEE Transactions on
Information Theory, Vol. 23, No. 3, pp. 337–343, 1977.
[Ziv 1978] Ziv, J. and Lempel, A., “Compression of individual sequences via variable-rate coding,” IEEE Transactions on
Information Theory, Vol. 24, No. 5, pp. 530–536, 1978.