2. Comments
The Self-Organizing Feature Map (SOM), the Topographic Product, and the Cascade 2 algorithm are all used in the SCPSNSP method.
SCPSNSP
Paper: A SOM clustering pattern sequence-based next symbol prediction method for day-ahead direct electricity load and price forecasting
Journal: Energy Conversion and Management, Vol. 90, 2015
https://www.sciencedirect.com/science/article/pii/S0196890414009662
3. The SCPSNSP Algorithm
SCPSNSP: SOM Cluster Pattern Sequence-based Next Symbol Prediction
- Clustering
- Data Normalization
- SOM Clustering
- Feature Map Optimization
- Next Symbol Prediction
- Determine Window Size
- Cascade 2 Algorithm
- Find the Nearest Nonzero Hits Pattern
4. Proposed Prediction Framework
[Framework diagram: (1) Data → Normalization; (2) SOM Clustering; (3) Feature Map Optimization — together producing labeled data and the label sequence; (4) Determine Window Size W; (5) ANN (Cascade 2) trained on label coordinates; (6) Find Nearest Cluster Label, followed by Denormalization to produce the forecast]
5. Data Clustering
Hourly electricity time series vector of day i:
X(i) = [x1, x2, …, x24]
By clustering these vectors, we obtain one cluster label per day, giving the label sequence
5 2 2 12 4 6 … 3 5

Electricity time series data:
Day         Time_00:00  Time_01:00  Time_02:00  …  Time_23:00  Cluster
2012-01-01  20972.0     19904.3     19837.6     …  19864.9     5
2012-01-02  21140.5     19908.6     19691.8     …  19742.1     2
2012-01-03  20803.5     19447.2     18555.5     …  18592.0     2
2012-01-04  21313.8     19935.3     19595.5     …  19187.8     12
2012-01-05  19258.8     18735.5     18248.7     …  19829.1     4
2012-01-06  18980.2     18532.7     17963.6     …  19908.4     6
…           …           …           …           …  …           …
2012-09-02  21215.6     19766.8     18961.9     …  18949.6     3
2012-09-03  21315.8     21919.1     22137.1     …  18093.2     5
6. Sliding Window over the Label Sequence
A window of size W = 3 slides over the clustering pattern sequence 5 2 2 12 4 6 … 3 5.
Each label is mapped to its pattern coordinates on the map, e.g. 5 → (0,4), 2 → (0,1), 12 → (1,3), 4 → (0,3), 6 → (0,5).
Each training sample pairs the coordinates of W consecutive labels (input pattern sequence) with the coordinates of the next label (output pattern):
① 5 2 2 → 12, i.e. (0,4) (0,1) (0,1) → (1,3)
② 2 2 12 → 4, i.e. (0,1) (0,1) (1,3) → (0,3)
③ 2 12 4 → 6, i.e. (0,1) (1,3) (0,3) → (0,5)
The windows before each day to be predicted form the test dataset.
[ANN diagram: the input layer takes (c1, r1), …, (cw, rw); three hidden layers; the output layer produces (cw+1, rw+1)]
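The windowing step above can be sketched as follows. The label-to-coordinates table is the illustrative example from the slide; the function name is my own.

```python
def make_training_pairs(labels, coords, w):
    """Slide a window of size w over the label sequence and emit
    (coordinates of w consecutive labels, coordinates of the next label)."""
    pairs = []
    for t in range(len(labels) - w):
        window = labels[t:t + w]
        target = labels[t + w]
        pairs.append(([coords[l] for l in window], coords[target]))
    return pairs

# Illustrative label -> (row, col) coordinates, matching the slide's example
coords = {5: (0, 4), 2: (0, 1), 12: (1, 3), 4: (0, 3), 6: (0, 5)}
pairs = make_training_pairs([5, 2, 2, 12, 4, 6], coords, w=3)
```

Each pair is one ANN training sample: the flattened input coordinates feed the input layer, and the output coordinates are the regression target.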
7. Why do we choose SOM and ANN?
SOM [Kohonen 1990]
Preserves the topological properties of the input space
Other clustering algorithms, however, do not provide any topology relations between cluster patterns
Cascade 2 [Fahlman 1996] training algorithm
Can model the complex relationships between input and output
No need to determine the number of hidden layers and neurons in advance
8. Pseudo Code of Proposed Method
SCPSNSP:
SOM Clustering Pattern-Sequence Next Symbol Prediction method using the Cascade 2 algorithm
10. Normalization
Time series data may be recorded in different situations or with different measurement units.
Ex: this year's electricity price may be two or three times higher than last year's.
To analyze under equal conditions, each value is divided by the daily mean:

\hat{x}_i = \frac{x_i}{\frac{1}{N}\sum_{j=1}^{N} x_j}

where N is 24 and x_i is the time series value at the ith hour of a day.
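A minimal sketch of this normalization with NumPy:

```python
import numpy as np

def normalize_day(day):
    """Divide each hourly value by the daily mean (N = 24 values per day),
    so the normalized day always has mean 1."""
    day = np.asarray(day, dtype=float)
    return day / day.mean()
```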
12. Self-Organizing Maps (SOM)
Unsupervised learning neural network
Projects high-dimensional input data onto a two-dimensional output map
Preserves the topology of the input data
Primarily used for the organization and visualization of complex data
13. Self-Organizing Maps (SOM)
A lattice of neurons ('nodes') accepts and responds to a set of input signals
Responses are compared; a 'winning' neuron is selected from the lattice
The selected neuron is activated together with its 'neighbourhood' neurons
An adaptive process changes the weights to more closely resemble the inputs
14. Self-Organizing Maps (SOM)
SOM algorithm
1. Randomly initialise all weights
2. Select an input vector x = [x1, x2, x3, …, xn]
3. Compare x with the weights wj of each neuron j to determine the winner
4. Update the winner so that it becomes more like x, together with the winner's neighbours
5. Adjust the parameters: learning rate & 'neighbourhood function'
6. Repeat from (2) until the map has converged (i.e. no noticeable changes in the weights) or a pre-defined number of training cycles has passed
19. Neuron index and coordinates
Neuron index: i
Coordinates: (r_i, c_i) on an R × C map
floor(): returns the nearest smaller integer
mod(): returns the modulus after division

r_i = \mathrm{floor}((i - 1) / C)
c_i = \mathrm{mod}(i - 1, C)
i = r_i \cdot C + c_i + 1
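The two conversions can be written directly, following the formulas above (1-based neuron index, 0-based coordinates):

```python
def index_to_coords(i, C):
    """Neuron index i (1-based) -> (r_i, c_i) on an R x C map."""
    return (i - 1) // C, (i - 1) % C

def coords_to_index(r, c, C):
    """(r, c) -> 1-based neuron index i = r*C + c + 1."""
    return r * C + c + 1
```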
20. Self-Organizing Maps (SOM)
Map size [R, C]: the number of neurons is M = R × C
Vector initialization:

m_i = [m_{i1}, m_{i2}, \ldots, m_{id}], \quad i = 1, 2, \ldots, M

Find the BMU (Best Matching Unit) c for input vector v:

c = \arg\min_i \| v - m_i \|
21. Self-Organizing Maps (SOM)
Update the codebook vectors according to the training algorithm: sequential or batch training.
Sequential training:

m_i(t+1) = m_i(t) + \alpha(t)\, h_{ci}(t)\, [v(t) - m_i(t)]

v(t): input vector randomly selected from the input dataset at time t
α(t): the learning rate
h_ci(t): the neighborhood function
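The sequential training loop (random initialization, BMU search, Gaussian neighborhood, codebook update) can be sketched as below. The linear decay schedules and default parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def train_som_sequential(data, R, C, iters=500, alpha0=0.5, sigma0=None, seed=0):
    """Minimal sequential SOM: m_i(t+1) = m_i(t) + a(t) h_ci(t) [v(t) - m_i(t)],
    with a Gaussian neighborhood measured in output-space coordinates."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    codebook = rng.normal(size=(R * C, d))
    # Output-space coordinates (r_i, c_i) of each neuron
    coords = np.array([((i - 1) // C, (i - 1) % C) for i in range(1, R * C + 1)], dtype=float)
    sigma0 = sigma0 or max(R, C) / 2.0
    for t in range(iters):
        frac = t / iters
        alpha = alpha0 * (1 - frac)          # linearly decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.5    # shrinking neighborhood radius
        v = data[rng.integers(len(data))]    # random input sample v(t)
        bmu = np.argmin(np.linalg.norm(codebook - v, axis=1))
        h = np.exp(-np.sum((coords - coords[bmu]) ** 2, axis=1) / (2 * sigma ** 2))
        codebook += alpha * h[:, None] * (v - codebook)
    return codebook
```

The ordering phase corresponds to the early iterations (large α and σ); convergence happens as both shrink.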
22. Self-Organizing Maps (SOM)
Batch training:

m_i(t+1) = \frac{\sum_{j=1}^{n} h_{c(j)i}(t)\, v(j)}{\sum_{j=1}^{n} h_{c(j)i}(t)}

c(j): the index of the BMU of v(j)
n: the number of data samples
t: the t-th iteration
23. Self-Organizing Maps (SOM)
Memory-efficient implementation of batch training
m: the number of neurons in the feature map
n_j: the number of data samples mapped to unit j
\bar{v}_j: the mean of the input data mapped to node j

Using the per-unit sums s_j(t) = \sum_{l=1}^{n_j} v(l):

m_i(t+1) = \frac{\sum_{j=1}^{m} h_{ji}(t)\, s_j(t)}{\sum_{j=1}^{m} n_j h_{ji}(t)}

or, equivalently, using the per-unit means:

m_i(t+1) = \frac{\sum_{j=1}^{m} n_j h_{ji}(t)\, \bar{v}_j}{\sum_{j=1}^{m} n_j h_{ji}(t)}
24. Self-Organizing Maps (SOM)
Learning process
Ordering
The learning rate and neighborhood size are reduced over the iterations.
Convergence
The SOM is fine-tuned with the shrunken neighborhood and a constant learning rate.
25. Self-Organizing Maps (SOM)
Common neighborhood function:

h_{ci}(t) = \exp\left( -\frac{\| r_c - r_i \|^2}{2\sigma^2(t)} \right)

Note: \| r_c - r_i \| is measured in the output space.
σ(t): defines the radius of the neighborhood around the BMU.
26. SOM Sample Hits
Sample hits: the number of input samples for which each neuron is the BMU.
Zero-hits patterns are not considered.
27. Advantages and disadvantages of SOM
Advantages:
Projects high-dimensional data onto a lower-dimensional map
Preserves the topological properties of the input space, so that similar objects are mapped to nearby locations on the map
Useful for visualization
Disadvantages:
The clustering result depends on the initial weight vectors
Requires a large quantity of good training data
Difficult to determine the optimal map size
Edge/boundary effect in the traditional SOM (two-dimensional regular spacing in a rectangular or hexagonal grid): neurons on the map boundary have fewer neighbors
High computation cost
29. Why Need Feature Map Optimization?
Feature map size
Important for detecting deviations in the data
It directly influences the clustering
Usually it is recommended to choose a large feature map size, even when there are few input data samples
A small map has relatively little freedom to align the topological relations between neurons
30. Why Need Feature Map Optimization?
If the map size is too large
The differences between neurons become too fine-grained
The computation cost increases dramatically due to the huge number of neurons
If the map size is too small
The output patterns become more general
Some important differences that should be detected could be missed
31. Map Quality Measures for SOM
Survey of map quality measures [Pölzlbauer 2004]
Several measures have been introduced; however, no measure is suitable in all cases.
Quantization Error
Completely disregards map topology and alignment
Computed as the average distance between each data vector and its BMU (the cluster centroid by which it is represented)
The value decreases as the map size increases
Topographic Error
The proportion of all data vectors for which the first and second BMUs are not adjacent units
However, [Pölzlbauer 2004] concluded that the topographic error is not reliable for small maps
32. Map Quality Measures for SOM
Trustworthiness and Neighborhood Preservation [Venna 2001]
Determines whether the projected data points that are actually visualized are close to each other in the input space
Does not take a single value; instead, a series of values depending on one parameter represents the map quality
SOM distortion [Lampinen 1992]
Under some assumptions, it can be seen as a cost function
However, it cannot compare the quality of maps of different sizes
Topographic Product [Bauer 1992]
Only the map's codebook is considered
Indicates whether the size of the map is appropriate for the dataset
Represents the appropriateness of the feature map size for the given training data with a single value
However, this value depends on the codebook vector initialization
33. Topographic Product (1/10)
Notation
Input space: V
Output space (map): A
w_j: weight vector of node j (here, node = neuron)
d^V(w_j, v): the distance in the input space V between w_j and v
The best-matching node i satisfies d^V(w_i, v) = \min_{j \in A} d^V(w_j, v)
n^A_k(j): the kth nearest neighbor of node j in the output space
n^V_k(j): the kth nearest neighbor of node j in the input space
34. Topographic Product (2/10)
In the output space, distances are measured with coordinates:

n^A_1(j): \quad d^A(j, n^A_1(j)) = \min_{j' \in A \setminus \{j\}} d^A(j, j')
n^A_2(j): \quad d^A(j, n^A_2(j)) = \min_{j' \in A \setminus \{j,\, n^A_1(j)\}} d^A(j, j')

In the input space, distances are measured with weight vectors:

n^V_1(j): \quad d^V(w_j, w_{n^V_1(j)}) = \min_{j' \in A \setminus \{j\}} d^V(w_j, w_{j'})
n^V_2(j): \quad d^V(w_j, w_{n^V_2(j)}) = \min_{j' \in A \setminus \{j,\, n^V_1(j)\}} d^V(w_j, w_{j'})
35. Topographic Product (3/10)
Ratio of the k nearest neighbors in the input space:

Q_1(j,k) = \frac{d^V(w_j, w_{n^A_k(j)})}{d^V(w_j, w_{n^V_k(j)})}

Ratio of the k nearest neighbors in the output space:

Q_2(j,k) = \frac{d^A(j, n^A_k(j))}{d^A(j, n^V_k(j))}

Q_1(j,k) = Q_2(j,k) = 1 only if the nearest neighbors of order k in the input and output space coincide.
36. Topographic Product (4/10)
The first nearest neighbors coincide: n^V_1(3) = n^A_1(3) = 4
The second nearest neighbors do not: n^V_2(3) = 5, n^A_2(3) = 2
Therefore

Q_1(3,2) = \frac{d^V(w_3, w_2)}{d^V(w_3, w_5)} \neq 1

[Figure: input space vs. output space, nodes 1–5 and 26–28]
37. Topographic Product (5/10)
The pointers in the input space form a line in the same way as the nodes in the output space.
However, the previous definition is sensitive to the ordering of the nearest neighbors.
To cancel the nearest-neighbor ordering constraints, normalize as:

P_1(j,k) = \left( \prod_{l=1}^{k} Q_1(j,l) \right)^{1/k}, \quad P_2(j,k) = \left( \prod_{l=1}^{k} Q_2(j,l) \right)^{1/k}

Now we have P_1(j,k) \geq 1 and P_2(j,k) \leq 1.
38. Topographic Product (6/10)
Now P1 and P2 are only sensitive to severe neighborhood violations.
Example: nodes close in the input space but far apart in the output space, e.g. n^V_4(3) = 28, which is far away on the map.

P_1(3,3) = \left( \frac{d^V(w_3,w_4)\, d^V(w_3,w_2)\, d^V(w_3,w_5)}{d^V(w_3,w_4)\, d^V(w_3,w_5)\, d^V(w_3,w_2)} \right)^{1/3} = 1, \quad \text{with } P_2(3,3) = 1
39. Topographic Product (7/10)

P_1(3,4) = \left( \frac{d^V(w_3,w_4)\, d^V(w_3,w_2)\, d^V(w_3,w_5)\, d^V(w_3,w_1)}{d^V(w_3,w_4)\, d^V(w_3,w_5)\, d^V(w_3,w_2)\, d^V(w_3,w_{28})} \right)^{1/4} = \left( \frac{d^V(w_3,w_1)}{d^V(w_3,w_{28})} \right)^{1/4} \approx 1

P_2(3,4) = \left( \frac{d^A(3,4)\, d^A(3,2)\, d^A(3,5)\, d^A(3,1)}{d^A(3,4)\, d^A(3,5)\, d^A(3,2)\, d^A(3,28)} \right)^{1/4} = \left( \frac{d^A(3,1)}{d^A(3,28)} \right)^{1/4} \ll 1

In this case: a small deviation from 1 for P1, and a very strong deviation for P2.
40. Topographic Product (8/10)
Constant magnification factors of the map do not change the next-neighbor ordering.
Therefore P1 and P2 are insensitive to constant gradients of the map, so we combine them:

P_3(j,k) = \left( \prod_{l=1}^{k} Q_1(j,l)\, Q_2(j,l) \right)^{1/2k}

However, this is still only for one neuron.
41. Topographic Product (9/10)
Averaging over the whole map:

P = \frac{1}{N(N-1)} \sum_{j=1}^{N} \sum_{k=1}^{N-1} \log\left( P_3(j,k) \right)

Therefore:
P ≈ 0: perfect map
P < 0: map too small
P > 0: map too big
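A direct, unoptimized sketch of the topographic product P, assuming Euclidean distances in both spaces and ignoring distance ties:

```python
import numpy as np

def topographic_product(codebook, coords):
    """Topographic product P [Bauer 1992] for a trained SOM.
    codebook: (N, d) weight vectors; coords: (N, 2) output-space coordinates.
    P ~ 0 for a well-fitting map; |P| grows with neighborhood violations."""
    N = len(codebook)
    dV = np.linalg.norm(codebook[:, None] - codebook[None], axis=-1)  # input space
    dA = np.linalg.norm(coords[:, None].astype(float) - coords[None], axis=-1)  # output space
    total = 0.0
    for j in range(N):
        # k-th nearest neighbors of j (excluding j itself) in each space
        nV = np.argsort(dV[j]); nV = nV[nV != j]
        nA = np.argsort(dA[j]); nA = nA[nA != j]
        q1 = dV[j, nA] / dV[j, nV]   # Q1(j, k), k = 1..N-1
        q2 = dA[j, nA] / dA[j, nV]   # Q2(j, k)
        # log P3(j, k) = (1 / 2k) * sum_{l<=k} log(Q1(j,l) Q2(j,l))
        logp3 = np.cumsum(np.log(q1 * q2)) / (2 * np.arange(1, N))
        total += logp3.sum()
    return total / (N * (N - 1))
```

A perfectly topology-preserving 1-D chain gives P = 0, while swapping weights so that map neighbors are no longer input-space neighbors pushes P away from 0.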
43. Heuristic Way for Map Size
Predefined map size [Vesanto 2000]
M: the total number of neurons
n: the number of training data samples

M = 5\sqrt{n}
\frac{R}{C} = \frac{\text{the largest eigenvalue}}{\text{the second largest eigenvalue}} \quad \text{(of the training data)}
M = R \times C

Recommended map size: [R, C]
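A sketch of this heuristic, taking the side ratio directly from the two largest covariance eigenvalues as stated on the slide (some SOM toolboxes use the square root of this ratio instead):

```python
import numpy as np

def heuristic_map_size(data):
    """Initial map size: M ~ 5*sqrt(n) neurons, side ratio R/C taken from
    the two largest eigenvalues of the data covariance matrix."""
    n = len(data)
    M = 5 * np.sqrt(n)
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(data, rowvar=False)))[::-1]
    ratio = eigvals[0] / eigvals[1]                    # R / C
    C = max(1, int(round(np.sqrt(M / ratio))))         # since R*C = M and R = ratio*C
    R = max(1, int(round(M / C)))
    return R, C
```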
44. Trained Map Size Range
Previously recommended map size: [R, C]
Trained map sizes [i, j] with
5 ≤ i ≤ 1.5 × max([R, C])
5 ≤ j ≤ 1.5 × max([R, C])
Among these trained maps, we choose the map size with the smallest topographic product value.
46. Window Size for Prediction
What window size is better for prediction?
[Figure: windows ①, ②, ③ of size w = 3 sliding over the clustering label sequence 5 1 1 3 3 8 4 3 5 3]
47. Determine the window size W
It is unrealistic to consider the whole pattern sequence in the further steps.
A window of W patterns before the days to be predicted is chosen to minimize [Martínez-Álvarez 2008]

\sum_{d=1}^{D} \sum_{i=1}^{N} \left\| \hat{x}_{d,i} - x_{d,i} \right\|

where
D: the number of consecutive days to be predicted
N: the number of time series values recorded at equal time intervals within a day, N = 24
\hat{x}_{d,i}: the forecasted value and x_{d,i} the actual value at hour i of day d
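The selection of W can be sketched generically. Here `validate` is a hypothetical callback that, for a candidate window size, returns (actual, forecast) arrays over the D validation days:

```python
import numpy as np

def window_error(actual, forecast):
    """Sum over the D validation days and N = 24 hours of |x_hat - x|."""
    return np.abs(np.asarray(forecast) - np.asarray(actual)).sum()

def choose_window(candidates, validate):
    """Pick the W whose validation forecasts minimize the error above."""
    errors = {W: window_error(*validate(W)) for W in candidates}
    return min(errors, key=errors.get)
```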
49. Artificial Neural Network
An ANN is a model (emulation) of the information processing capabilities of nervous systems.
A neuron receives signals (impulses) from other neurons through its dendrites (receivers) and transmits signals generated by its cell body along the axon (transmitter). The terminals of the axon branches are the synapses, the functional units between two neurons.
50. Artificial Neural Network
Artificial Neural Network (ANN)
A mathematical or computational model inspired by the structure and/or functional aspects of biological neural networks
Consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation
However, it is difficult to determine the network architecture:
Too many hidden neurons may cause overfitting
Too few may cause underfitting
51. Cascade architecture
Cascade architecture
No need to determine the number of layers and hidden nodes
A cascade architecture consists of
A cascade algorithm: how the neurons should be added to the network (ex: Cascade-Correlation, Cascade 2)
A weight update algorithm: Quickprop, RPROP
Key ideas of the cascade architecture
Neurons are added to the network one at a time
Their input weights do not change after being added
52. Cascade-Correlation
Cascade 2 [Fahlman 1996] is based on Cascade-Correlation [Fahlman 1989]
Cascade-Correlation
Starts with a net without hidden nodes
Hidden neurons have connections from all inputs and all existing hidden nodes
During candidate neuron training, only the input weights to the candidate neuron are trained; the other weights are kept frozen
[Figure: 1st candidate — frozen weights vs. weights being trained as the candidate is added]
53. Cascade-Correlation
Correlation learning
After being added, the input connections to the candidate neuron are kept frozen
Then all output connections to the output neurons are initialized with small random values
[Figure: 2nd candidate being added]
54. Cascade-Correlation
Train w_new to maximize the covariance between the candidate output x_new and the old error E:

S(w_{new}) = \sum_{k=1}^{K} \left| \sum_{p=1}^{P} (x_{new,p} - \bar{x}_{new})(E_{k,p} - \bar{E}_k) \right|

where
K: the number of output neurons
P: the number of training samples
w_new: vector of weights to the candidate neuron
x_{new,p}: output of the candidate x_new for the pth sample
\bar{x}_{new}: average candidate output value of x_new over all samples
E_{k,p}: error on the kth output node for the pth sample with the old weights
\bar{E}_k: average error at output neuron k over all samples
55. Cascade-Correlation
Drawbacks of Cascade-Correlation
Covariance training
Tendency to overcompensate for errors, because S gives large activations whenever the error of the ANN deviates from the average
Deep networks
Overfitting
Weight freezing
The network does not manage to escape from the minimum during candidate training, without achieving better performance
56. Cascade 2 algorithm
Overcoming the previous drawbacks
Covariance training
Replaced by direct error minimization
Deep networks
Also a difficult problem; simply use more training patterns
Weight freezing
Two-part training: the output connections are trained first, and then the entire ANN is trained
Allows the candidate to find a proper place before the entire training
57. Cascade 2 algorithm
Cascade 2 algorithm vs. Cascade-Correlation
Based on Cascade-Correlation
Candidate training has been changed
Candidates in Cascade 2 have trainable output connections to all of the outputs in the ANN
(no such direct connections in Cascade-Correlation)
Output connections are trained together with the inputs to the candidates
(trained separately in Cascade-Correlation)
58. Cascade 2 algorithm
Train the candidate to minimize the difference between the error of the output neurons and the input from the candidate to the output neurons:

S(w_{new}) = \sum_{k=1}^{K} \sum_{p=1}^{P} \left( e_{k,p} - c_p w_{c,k} \right)^2

where
w_new: the weight vector of the candidate neuron
K: the number of output neurons
P: the number of training patterns
e_{k,p}: the error of output neuron k for training pattern p
c_p: the output value of the candidate neuron for training pattern p
w_{c,k}: the connection weight from candidate neuron c to output neuron k
[Figure: candidate c connected to output node k by w_{c,k}; output node error e_{k,p}]
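The candidate objective S can be evaluated in a few lines; the array shapes are my own convention:

```python
import numpy as np

def cascade2_candidate_loss(errors, cand_out, w_ck):
    """Cascade 2 candidate objective: S = sum_k sum_p (e_{k,p} - c_p * w_{c,k})^2.
    errors: (K, P) residual errors e_{k,p}; cand_out: (P,) candidate outputs c_p;
    w_ck: (K,) candidate-to-output weights."""
    diff = errors - np.outer(w_ck, cand_out)   # (K, P) matrix of e_{k,p} - c_p w_{c,k}
    return float((diff ** 2).sum())
```

S is zero exactly when the candidate's contribution c_p w_{c,k} reproduces the residual error on every output and pattern.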
59. Cascade 2 algorithm
The e_{k,p} is calculated as follows:

e_{k,p} = d_{k,p} - y_{k,p}, \quad y_{k,p} = g_k\left( \sum_{j=0}^{n} w_{j,k} x_{j,p} \right)

where
d_{k,p}: the desired value of output neuron k for training pattern p
y_{k,p}: the actual forecasted value of output neuron k for training pattern p
n: the number of incoming connections to output neuron k
w_{j,k}: the connection weight from neuron j to output neuron k
x_{j,p}: the output value of neuron j for training pattern p
g_k: the activation function used at output neuron k
60. Cascade 2 algorithm
If a linear activation function is used,

y_{k,p} = g_k\left( \sum_{j=0}^{n} w_{j,k} x_{j,p} \right) \quad \text{changes to} \quad y_{k,p} = \sum_{j=0}^{n} w_{j,k} x_{j,p}

After the candidate is added, the updated forecasted output value is

y^{new}_{k,p} = w_{c,k} c_p + y_{k,p} = w_{c,k} c_p + \sum_{j=0}^{n} w_{j,k} x_{j,p}
61. Cascade 2 algorithm
The new error becomes

e^{new}_{k,p} = d_{k,p} - y^{new}_{k,p} = d_{k,p} - (y_{k,p} + w_{c,k} c_p) = e_{k,p} - w_{c,k} c_p

If a linear activation function is used and the candidate has been added to the network, the new error and the old error differ only by w_{c,k} c_p.
Interesting finding: this difference e_{k,p} − w_{c,k} c_p is exactly what the candidate is trained to minimize.
62. ANN Normalization
Clustering label sequence: the cluster labels are tied to the feature map size.
(r, c) are the coordinates; R and C are the numbers of rows and columns of the feature map.

r_{norm} = \frac{r}{R - 1}, \quad c_{norm} = \frac{c}{C - 1}
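A one-line sketch of this coordinate normalization:

```python
def normalize_coords(r, c, R, C):
    """Map (r, c) on an R x C grid into [0, 1] x [0, 1] for the ANN inputs."""
    return r / (R - 1), c / (C - 1)
```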
65. Find the Nearest Pattern
According to the topology relations:
1. Sort the patterns in ascending order of Euclidean distance from the predicted coordinates.
2. Check whether the nearest pattern is a nonzero-hits pattern; if not, check the 2nd nearest pattern, and so on, until a nonzero-hits pattern is found for the first time.
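A sketch of this search, with `hits` as a hypothetical mapping from pattern coordinates to their sample-hit counts:

```python
import math

def nearest_nonzero_pattern(pred, patterns, hits):
    """Sort neuron coordinates by Euclidean distance to the predicted
    coordinates and return the first one with nonzero sample hits."""
    for p in sorted(patterns, key=lambda p: math.dist(p, pred)):
        if hits[p] > 0:
            return p
    return None  # no nonzero-hits pattern exists
```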
67. Summary
Time series prediction
Discover patterns using SOM clustering
Feature map optimization
Cascade 2 training with the patterns' coordinates
No need to determine the neural network architecture size
Directly forecast future time series with the nearest nonzero-hits pattern
68. References
[Abraham 2001] Abraham, A. and Nath, B., “A neuro-fuzzy approach for modelling electricity demand in victoria,” Applied
Soft Computing, Vol. 1, No. 3040, pp. 127–138, 2001.
[Amjady 2006] Amjady, N., “Day-ahead price forecasting of electricity markets by a new fuzzy neural network,” IEEE
Transactions on Power Systems, Vol. 21, No. 2, pp. 887–896, 2006.
[Amjady 2009a] Amjady, N. and Daraeepour, A., “Design of input vector for day-ahead price forecasting of electricity
markets,” Expert Systems with Applications, Vol. 36, No. 10, pp. 12281–12294, 2009.
[Amjady 2009b] Amjady, N. and Daraeepour, A., “Mixed price and load forecasting of electricity markets by a new iterative
prediction method,” Electric Power Systems Research, Vol. 79, No. 9, pp. 1329–1336, 2009.
[Amjady 2009c] Amjady, N. and Hemmati, M., “Day-ahead price forecasting of electricity markets by a hybrid intelligent
system,” European Transactions on Electrical Power, Vol. 19, No. 1, pp. 89–102, 2009.
[Amjady 2009d] Amjady, N. and Keynia, F., “Day-ahead price forecasting of electricity markets by a new feature selection
algorithm and cascaded neural network technique,” Energy Conversion and Management, Vol. 50, No. 12, pp. 2976–2982, 2009.
[Amjady 2009e] Amjady, N. and Keynia, F., “Day-ahead price forecasting of electricity markets by mutual information
technique and cascaded neuro-evolutionary algorithm,” IEEE Transactions on Power Systems, Vol. 24, No. 1, pp. 306–318,
2009.
[Amjady 2009g] Amjady, N. and Keynia, F., “Short-term load forecasting of power systems by combination of wavelet
transform and neuro-evolutionary algorithm,” Energy, Vol. 34, No. 1, pp. 46–57, 2009.
[Amjady 2010a] Amjady, N., Daraeepour, A. and Keynia, F., “Day-ahead electricity price forecasting by modified relief
algorithm and hybrid neural network,” IET generation, transmission & distribution, Vol. 4, No. 3, pp. 432–444, 2010.
[Amjady 2010b] Amjady, N. and Keynia, F., “Application of a new hybrid neuro-evolutionary system for day-ahead price
forecasting of electricity markets,” Applied Soft Computing, Vol. 10, No. 3, pp. 784–792, 2010.
69. References
[An 2013] An, N., Zhao, W., Wang, J., Shang, D. and Zhao, E., “Using multi-output feedforward neural network with
empirical mode decomposition based signal filtering for electricity demand forecasting,” Energy, Vol. 49, No. 1, pp. 279 – 288,
2013.
[Anbazhagan 2013] Anbazhagan, S. and Kumarappan, N., “Day-ahead deregulated electricity market price forecasting using
recurrent neural network,” IEEE Systems Journal, Vol. 7, No. 4, pp. 866–872, 2013.
[Anbazhagan 2014] Anbazhagan, S. and Kumarappan, N., “Day-ahead deregulated electricity market price forecasting using
neural network input featured by DCT,” Energy Conversion and Management, Vol. 78, No. 0, pp. 711 – 719, 2014.
[ANE 2012] “Australian energy market operator,” http://www.nemmco.com.au, 2012.
[Bauer 1992] Bauer, H. and Pawelzik, K., “Quantifying the neighborhood preservation of self-organizing feature maps,” IEEE
Transactions on Neural Networks, Vol. 3, No. 4, pp. 570–579, 1992.
[Burrows 1994] Burrows, M. and Wheeler, D. J., “A block-sorting lossless data compression algorithm,” Digital SRC
Research Report, Tech. Rep., 1994.
[Catalão 2007] Catalão, J., Mariano, S., Mendes, V. and Ferreira, L., “Short-term electricity prices forecasting in a competitive
market: A neural network approach,” Electric Power Systems Research, Vol. 77, No. 10, pp. 1297–1304, 2007.
[Catalão 2009] Catalão, J., Pousinho, H. and Mendes, V., “Neural networks and wavelet transform for short-term electricity
prices forecasting,” in 15th International Conference on Intelligent System Applications to Power Systems, pp. 1–5, 2009.
[Catalão 2011a] Catalão, J., Pousinho, H. and Mendes, V., “Hybrid wavelet-PSO-ANFIS approach for short-term electricity
prices forecasting,” IEEE Transactions on Power Systems, Vol. 26, No. 1, pp. 137–144, 2011.
[Catalão 2011b] Catalão, J., Pousinho, H. and Mendes, V., “Short-term electricity prices forecasting in a competitive market by a
hybrid intelligent approach,” Energy Conversion and Management, Vol. 52, No. 2, pp. 1061–1065, 2011.
[Che 2010] Che, J. and Wang, J., “Short-term electricity prices forecasting based on support vector regression and
autoregressive integrated moving average modeling,” Energy Conversion and Management, Vol. 51, No. 10, pp. 1911–1917, 2010.
70. References
[Che 2012] Che, J., Wang, J. and Wang, G., “An adaptive fuzzy combination model based on self-organizing map and support
vector regression for electric load forecasting,” Energy, Vol. 37, No. 1, pp. 657–664, 2012.
[Chen 1995] Chen, J.-F., Wang, W.-M. and Huang, C.-M., “Analysis of an adaptive time-series autoregressive moving-average
(ARMA) model for short-term load forecasting,” Electric Power Systems Research, Vol. 34, No. 3, pp. 187–196, 1995.
[Chen 2007] Chen, J., Deng, S.-J. and Huo, X., “Electricity price curve modeling by manifold learning,” IEEE Transactions on
Power Systems, Vol. 15, pp. 723–736, 2007.
[Cleary 1984] Cleary, J. and Witten, I., “Data compression using adaptive coding and partial string matching,” IEEE
Transactions on Communications, Vol. 32, No. 4, pp. 396–402, 1984.
[Cleeremans 1989] Cleeremans, A., Servan-Schreiber, D. and McClelland, J. L., “Finite state automata and simple recurrent
networks,” Neural computation, Vol. 1, No. 3, pp. 372–381, 1989.
[Conejo 2005] Conejo, A., Plazas, M., Espinola, R. and Molina, A., “Day-ahead electricity price forecasting using the wavelet
transform and ARIMA models,” IEEE Transactions on Power Systems, Vol. 20, No. 2, pp. 1035–1042, 2005.
[Contreras 2003] Contreras, J., Espinola, R., Nogales, F. and Conejo, A., “ARIMA models to predict next-day electricity
prices,” IEEE Transactions on Power Systems, Vol. 18, No. 3, pp. 1014–1020, 2003.
[Cormack 1987] Cormack, G. V. and Horspool, R., “Data compression using dynamic markov modelling,” The Computer
Journal, Vol. 30, No. 6, pp. 541–550, 1987.
[Cortes 1995] Cortes, C. and Vapnik, V., “Support-vector networks,” Machine learning, Vol. 20, No. 3, pp. 273–297, 1995.
[Diebold 1995] Diebold, F. X. and Mariano, R. S., “Comparing predictive accuracy,” Journal of Business & Economic
Statistics, Vol. 13, No. 3, pp. 253–263, 1995.
[Dong 2011] Dong, Y., Wang, J., Jiang, H. and Wu, J., “Short-term electricity price forecast based on the improved hybrid
model,” Energy Conversion and Management, Vol. 52, No. 8, pp. 2987–2995, 2011.
[Ehrenfeucht 1992] Ehrenfeucht, A. and Mycielski, J., “A pseudorandom sequence–how random is it?” American
Mathematical Monthly, pp. 373–375, 1992.
71. References
[Fahlman 1996] Fahlman, S., Baker, L. and Boyan, J., “The cascade 2 learning architecture,” Technical Report CMUCS-TR-96-184, Carnegie Mellon University, 1996.
[Fan 2006a] Fan, S. and Chen, L., “Short-term load forecasting based on an adaptive hybrid method,” IEEE Transactions on
Power Systems, Vol. 21, No. 1, pp. 392–401, 2006
[Fan 2006b] Fan, S., Liao, J. R., Kaneko, K. and Chen, L., “An integrated machine learning model for day-ahead electricity
price forecasting,” in Power Systems Conference and Exposition, 2006. PSCE’06. 2006 IEEE PES, pp. 1643–1649, IEEE, 2006.
[Fan 2006c] Fan, S., Mao, C., Zhang, J. and Chen, L., “Forecasting electricity demand by hybrid machine learning model,” in
Neural Information Processing, pp. 952– 963, Springer, 2006.
[Fan 2007] Fan, S., Mao, C. and Chen, L., “Next-day electricity-price forecasting using a hybrid network,” IET generation,
transmission & distribution, Vol. 1, No. 1, pp. 176–182, 2007.
[Feder 1994] Feder, M. and Merhav, N., “Relations between entropy and error probability,” IEEE Transactions on Information
Theory, Vol. 40, No. 1, pp. 259–266, 1994.
[García-Martos 2007] García-Martos, C., Rodríguez, J. and Sánchez, M., “Mixed models for short-run forecasting of
electricity prices: application for the spanish market,” IEEE Transactions on Power Systems, Vol. 22, No. 2, pp. 544–552, 2007.
[Gelper 2010] Gelper, S., Fried, R. and Croux, C., “Robust forecasting with exponential and holt–winters smoothing,” Journal
of Forecasting, Vol. 29, No. 3, pp. 285–300, 2010.
[Graves 2013] Graves, A., “Generating sequences with recurrent neural networks,” arXiv preprint arXiv:1308.0850, 2013.
[Han 2006] Han, S.-J. and Cho, S.-B., “Predicting user’s movement with a combination of self-organizing map and markov
model,” in Proceedings of International Conference Artificial Neural Networks, pp. 884–893, 2006.
[Hartmann 2007] Hartmann, M. and Schreiber, D., “Prediction algorithms for user actions.” in LWA, pp. 349–354, 2007.
[Hong 2012] Hong, Y.-Y. and Wu, C.-P., “Day-ahead electricity price forecasting using a hybrid principal component analysis
network,” Energies, Vol. 5, No. 11, pp. 4711–4725, 2012.
72. References
[Hu 2007] Hu, G.-s., Zhu, F.-f. and Zhang, Y.-z., “Short-term load forecasting based on fuzzy c-mean clustering and weighted
support vector machines,” in Natural Computation, 2007. ICNC 2007. Third International Conference on, Vol. 5, pp. 654–659,
IEEE, 2007.
[Igel 2000] Igel, C. and Hüsken, M., “Improving the rprop learning algorithm,” in Proceedings of the 2nd International
Symposium on Neural Computation, pp. 115–121, 2000.
[Jacobs 2002] Jacobs, N. and Blockeel, H., “Sequence prediction with mixed order markov chains,” in Proceedings of the
Belgian/Dutch Conference on Artificial Intelligence, 2002.
[Jacquet 2002] Jacquet, P., Szpankowski, W. and Apostol, I., “A universal predictor based on pattern matching,” IEEE
Transactions on Information Theory, Vol. 48, No. 6, pp. 1462–1472, 2002.
[Katsaros 2009] Katsaros, D. and Manolopoulos, Y., “Prediction in wireless networks by markov chains,” IEEE Wireless
Communications, Vol. 16, No. 2, pp. 56–64, 2009.
[Keynia 2012] Keynia, F., “A new feature selection algorithm and composite neural network for electricity price forecasting,”
Engineering Applications of Artificial Intelligence, Vol. 25, No. 8, pp. 1687–1697, 2012.
[Kohonen 1990] Kohonen, T., “The self-organizing map,” Proceedings of the IEEE, Vol. 78, No. 9, pp. 1464–1480, 1990.
[Kouhi 2013] Kouhi, S. and Keynia, F., “A new cascade nn based method to short-term load forecast in deregulated electricity
market,” Energy Conversion and Management, Vol. 71, pp. 76–83, 2013.
[Laird 1994] Laird, P. and Saul, R., “Discrete sequence prediction and its applications,” Machine learning, Vol. 15, No. 1, pp.
43–68, 1994.
[Lampinen 1992] Lampinen, J. and Oja, E., “Clustering properties of hierarchical self-organizing maps,” Journal of
Mathematical Imaging and Vision, Vol. 2, No. 2, pp. 261–272, 1992.
[Lora 2007] Lora, A., Santos, J., Expósito, A., Ramos, J. and Santos, J., “Electricity market price forecasting based on
weighted nearest neighbors techniques,” IEEE Transactions on Power Systems, Vol. 22, No. 3, pp. 1294–1301, 2007.
[MacQueen 1967] MacQueen, J. et al., “Some methods for classification and analysis of multivariate observations,” in
Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1, p. 14, California, USA, 1967.
73. References
[Martínez-Álvarez 2008] Martínez-Álvarez, F., Troncoso, A., Riquelme, J. and Aguilar-Ruiz, J., “LBF: A labeled-based
forecasting algorithm and its application to electricity price time series,” in Proceedings of the 8th IEEE International Conference
on Data Mining, pp. 453– 461, 2008.
[Martínez-Álvarez 2011a] Martínez-Álvarez, F., Troncoso, A., Riquelme, J. and Aguilar-Ruiz, J., “Discovery of motifs to
forecast outlier occurrence in time series,” Pattern Recognition Letters, Vol. 32, No. 12, pp. 1652–1665, 2011.
[Martínez-Álvarez 2011b] Martínez-Álvarez, F., Troncoso, A., Riquelme, J. and Aguilar-Ruiz, J., “Energy time series
forecasting based on pattern sequence similarity,” IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 8, pp.
1230–1243, 2011.
[McCulloch 1943] McCulloch, W. S. and Pitts, W., “A logical calculus of the ideas immanent in nervous activity,” The
bulletin of mathematical biophysics, Vol. 5, No. 4, pp. 115–133, 1943.
[Medsker 1999] Medsker, L. and Jain, L. C., Recurrent neural networks: design and applications, CRC press, 1999.
[Mikolov 2010] Mikolov, T., Karafiát, M., Burget, L., Cernock`y, J. and Khudanpur, S., “Recurrent neural network based
language model,” in INTERSPEECH, pp. 1045–1048, 2010.
[Nissen 2003] Nissen, S., “Implementation of a fast artificial neural network library (FANN),” Report, Department of
Computer Science University of Copenhagen (DIKU), Vol. 31, 2003.
[Nowicka-Zagrajek 2002] Nowicka-Zagrajek, J. and Weron, R., “Modeling electricity loads in california: Arma models with
hyperbolic noise,” Signal Processing, Vol. 82, No. 12, pp. 1903–1915, 2002.
[NYI 2012] “New york independent system operator,” http://www.nyiso.com, 2012.
[OME 2012] “Spanish electricity market operator,” http://www.omel.es, 2012.
[Pahasa 2007] Pahasa, J. and Theera-Umpon, N., “Short-term load forecasting using wavelet transform and support vector
machines,” in Power Engineering Conference, 2007. IPEC 2007. International, pp. 47–52, IEEE, 2007.
[Pai 2005] Pai, P.-F. and Hong, W.-C., “Support vector machines with simulated annealing algorithms in electricity load
forecasting,” Energy Conversion and Management, Vol. 46, No. 17, pp. 2669–2688, 2005.
[Pappas 2010] Pappas, S. S., Ekonomou, L., Karampelas, P., Karamousantas, D., Katsikas, S., Chatzarakis, G. and Skafidas,
P., “Electricity demand load forecasting of the Hellenic power system using an ARMA model,” Electric Power Systems Research,
Vol. 80, No. 3, pp. 256–264, 2010.
[Parate 2013] Parate, A., Böhmer, M., Chu, D., Ganesan, D. and Marlin, B. M., “Practical prediction and prefetch for faster
access to applications on mobile phones,” in Proceedings of the ACM International Joint Conference on Pervasive and
Ubiquitous Computing, pp. 275–284, 2013.
[Pérez-Ortiz 2001a] Pérez-Ortiz, J. A., Calera-Rubio, J. and Forcada, M. L., “Online symbolic-sequence prediction with
discrete-time recurrent neural networks,” in Proceedings of the International Conference on Artificial Neural Networks, pp. 719–
724, 2001.
[Pérez-Ortiz 2001b] Pérez-Ortiz, J. A., Calera-Rubio, J. and Forcada, M. L., “Online text prediction with recurrent neural
networks,” Neural Processing Letters, Vol. 14, No. 2, pp. 127–140, 2001.
[Pindoriya 2008] Pindoriya, N., Singh, S. and Singh, S., “An adaptive wavelet neural network-based energy price forecasting
in electricity markets,” IEEE Transactions on Power Systems, Vol. 23, No. 3, pp. 1423–1432, 2008.
[Pölzlbauer 2004] Pölzlbauer, G., “Survey and comparison of quality measures for self-organizing maps,” in 5th Workshop on
Data Analysis, pp. 67–82, 2004.
[Pousinho 2012] Pousinho, H., Mendes, V. and Catalão, J., “Short-term electricity prices forecasting in a competitive market by
a hybrid PSO–ANFIS approach,” International Journal of Electrical Power & Energy Systems, Vol. 39, No. 1, pp. 29–35, 2012.
[Prasad 2010] Prasad, P. S. and Agrawal, P., “Movement prediction in wireless networks using mobility traces,” in 7th IEEE
Consumer Communications and Networking Conference, pp. 1–5, 2010.
[Shafie-Khah 2011] Shafie-Khah, M., Moghaddam, M. P. and Sheikh-El-Eslami, M., “Price forecasting of day-ahead
electricity markets using a hybrid forecast method,” Energy Conversion and Management, Vol. 52, No. 5, pp. 2165–2169, 2011.
[Shayeghi 2013] Shayeghi, H. and Ghasemi, A., “Day-ahead electricity prices forecasting by a modified CGSA technique and
hybrid WT in LSSVM based scheme,” Energy Conversion and Management, Vol. 74, pp. 482–491, 2013.
[Shen 2013] Shen, W., Babushkin, V., Aung, Z. and Woon, W. L., “An ensemble model for day-ahead electricity demand time
series forecasting,” in Proceedings of the 4th International Conference on Future Energy Systems, 2013.
[Shrivastava 2014] Shrivastava, N. A. and Panigrahi, B. K., “A hybrid wavelet-ELM based short term price forecasting for
electricity markets,” International Journal of Electrical Power & Energy Systems, Vol. 55, pp. 41–50, 2014.
[Skaruz 2007] Skaruz, J. and Seredynski, F., “Recurrent neural networks towards detection of SQL attacks,” in IEEE
International Parallel and Distributed Processing Symposium, pp. 1–8, 2007.
[SOM 2012] “SOMVIS,” http://www.ifs.tuwien.ac.at/dm/somvis-matlab/index.html#License, 2012.
[Tan 2010] Tan, Z., Zhang, J., Wang, J. and Xu, J., “Day-ahead electricity price forecasting using wavelet transform combined
with ARIMA and GARCH models,” Applied Energy, Vol. 87, No. 11, pp. 3606–3610, 2010.
[Tino 2000] Tino, P., Stancík, M. and Benuskova, L., “Building predictive models on complex symbolic sequences with a
second-order recurrent BCM network with lateral inhibition,” in Proceedings of the International Joint Conference on Neural
Networks, Vol. 2, pp. 265–270, 2000.
[Troncoso Lora 2004] Troncoso Lora, A., Riquelme Santos, J., Riquelme, J., Gómez Expósito, A. and Martínez Ramos, J.,
“Time-series prediction: application to the short-term electric energy demand,” Lecture Notes in Artificial Intelligence, Vol. 23,
pp. 577–586, 2004.
[Venna 2001] Venna, J. and Kaski, S., “Neighborhood preservation in nonlinear projection methods: An experimental study,”
Artificial Neural Networks-ICANN 2001, pp. 485–491, 2001.
[Vesanto 2000] Vesanto, J., Himberg, J., Alhoniemi, E. and Parhankangas, J., SOM Toolbox for Matlab 5, Report, Helsinki
University of Technology, 2000.
[Willems 1995] Willems, F. M., Shtarkov, Y. M. and Tjalkens, T. J., “The context-tree weighting method: Basic properties,”
IEEE Transactions on Information Theory, Vol. 41, No. 3, pp. 653–664, 1995.
[Xu 2005] Xu, R., Wunsch, D. et al., “Survey of clustering algorithms,” IEEE Transactions on Neural Networks, Vol. 16, No.
3, pp. 645–678, 2005.
[Xu 2009] Xu, Y. and Nagasaka, K., “Demand and price forecasting by artificial neural networks (ANNs) in a deregulated
power market,” International Journal of Electrical and Power Engineering, Vol. 3, No. 6, pp. 268–275, 2009.
[Zhang 2012] Zhang, J., Tan, Z. and Yang, S., “Day-ahead electricity price forecasting by a new hybrid method,”
Computers & Industrial Engineering, Vol. 63, No. 3, pp. 695–701, 2012.
[Zhou 2004] Zhou, M., Yan, Z., Ni, Y. and Li, G., “An ARIMA approach to forecasting electricity price with accuracy
improvement by predicted errors,” in IEEE Power Engineering Society General Meeting, pp. 233–238, 2004.
[Ziv 1977] Ziv, J. and Lempel, A., “A universal algorithm for sequential data compression,” IEEE Transactions on
Information Theory, Vol. 23, No. 3, pp. 337–343, 1977.
[Ziv 1978] Ziv, J. and Lempel, A., “Compression of individual sequences via variable-rate coding,” IEEE Transactions on
Information Theory, Vol. 24, No. 5, pp. 530–536, 1978.