Cognitive Computing
Feature Extraction, Classification & Prediction

              www.oliviamoran.me
About The Author

Olivia Moran is a leading training specialist who specialises in E-Learning instructional design and is a certified
Moodle expert. She has been working as a trainer and course developer for 3 years, developing and delivering
training courses for traditional classroom, blended learning and E-Learning.




Courses Olivia Moran Has Delivered:
● MOS
● ECDL
● Internet Marketing
● Social Media
● Google [Getting Irish Businesses Online]
● Web Design [FETAC Level 5]
● Adobe Dreamweaver
● Adobe Flash
● Moodle


Specialties:
★ Moodle [MCCC Moodle Certified Expert]
★ E-Learning Tools/Technologies [Commercial & Opensource]
★ Microsoft Office Specialist
★ Web Design & Online Content Writer
★ Adobe Dreamweaver, Flash & Photoshop




Feature Extraction, Classification & Prediction

1. ABSTRACT

This document will examine issues pertaining to feature extraction, classification and prediction. It will
consider the application of these techniques to unlabelled Electroencephalogram (E.E.G.) data in an
attempt to discriminate between left and right hand imagery movements. It will briefly reflect on the
need for brainwave signal preprocessing. The feature extraction and classification process will be
examined in depth and the results obtained using various classifiers will be illustrated. Classification
algorithms will be given some thought, namely Linear Discriminant Analysis (L.D.A.), K-Nearest
Neighbour (K.N.N.) and Neural Network (N.N.) analysis. This document will explore prediction and
highlight its effect on accuracy. Due to time and knowledge constraints the data could not be tested
using all the desired approaches; however, these are briefly addressed. The way in which biology and
nature inspire the design of feature extraction, classification and prediction systems will be explored.
Finally, future work will be touched on.



2. INTRODUCTION

The study of E.E.G. data is a very important field that, according to Ebrahimi et al (2003), has
been "Motivated by the hope of creating new communication channels for persons with severe motor
disabilities". Advances in this area of research cater for the construction of more advanced Brain
Computer Interfaces (B.C.I.s). Wolpaw et al (2002) describe such an interface as a "Non-muscular
channel for sending messages and commands to the external world". The impact that such
technologies could have on the quality of people's everyday lives, particularly those who have some form
of physical disability, is enormous. "Brain-Computer Interfacing is an interesting emerging technology
that translates intentional variations in the Electroencephalogram into a set of particular commands in
order to control a real world machine" Atry et al (2005). Improvements to these systems are often
made through an increased understanding of the human body and the way in which it operates.
Feature extraction, classification and prediction are all processes that our bodies carry out on a daily
basis, with or without our knowledge. Studying such activities will undoubtedly lead researchers to the
creation of more biologically plausible B.C.I. solutions.

It is not only individuals who will benefit from further studies and understanding of these processes, as
feature extraction, classification and prediction have many other applications. Take for example, the
world of business. Companies everywhere have to deal with a constant bombardment of information
from both their internal and external environments. There seems to be an endless amount of both
useful and useless information. As one can imagine, it is often very difficult to find exactly what you
are looking for. When people eventually locate what they have been seeking it may be in a format
that does not suit them. This is where feature extraction, classification and prediction play their part.
These processes are often the only way in which a business can locate information gems in a sea of
data.




This document explores the various issues pertaining to feature extraction, classification and
prediction. The application of these techniques to unlabelled E.E.G. data is examined in an attempt to
discriminate between left and right hand imagery movements. It briefly looks at brainwave signal
preprocessing. An in depth study of the feature extraction and the classification process is carried out
focusing on numerous classifiers. L.D.A., K.N.N. and N.N. classification algorithms are examined. This
document gives thought to prediction and how it could be used to improve accuracy. Due to time and
knowledge constraints the data could not be tested using all the desired approaches, however, these
methods are mentioned in this document. Biology and nature often inspire the computing industry to
produce feature extraction, classification and prediction systems that operate in the same or a similar
way as the human body does. This issue of inspiration is briefly addressed and examples from nature
are given. Finally areas for future work are considered.



3. BRAINWAVE SIGNAL PREPROCESSING

E.E.G. data is commonly used for tasks such as discrimination between left and right hand imagery
movements. "An E.E.G. is a recording of the very weak electrical potentials generated by the brain on
the scalp" Ebrahimi et al (2003). The collection of such signals is non-invasive and they can be "Easily
recorded and processed with inexpensive equipment" Ebrahimi et al (2003). It also offers many
advantages over other methods as "It is based on a much simpler technology and is characterized by
much smaller time constants when compared to other noninvasive approaches such as M.E.G., P.E.T. and
F.M.R.I." Ebrahimi et al (2003).

The E.E.G. data used as input for the analysis carried out during the course of this assignment had been
preprocessed. Ebrahimi et al (2003) point out that "Some preprocessing is generally performed due to the
high levels of noise and interference usually present". Artifacts, such as motor movements, eye blinking
and electrode movement, are removed during this step; they are not required, and all the essential
data needed to carry out classification is left behind.
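
The preprocessing itself was done before this assignment, but a typical first step of this kind is to band-pass filter each channel to the frequency range of interest. Below is a minimal sketch using Python's scipy.signal; the 8-30 Hz band, the 128 Hz sampling rate and the channel variable are illustrative assumptions, not details taken from the assignment data.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(eeg, fs, low=8.0, high=30.0, order=4):
    """Band-pass filter one E.E.G. channel.

    Motor imagery studies commonly keep the mu/beta range
    (roughly 8-30 Hz); the exact band here is illustrative.
    """
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, eeg)  # zero-phase filtering, no time shift

# Hypothetical usage: c3 stands in for one recorded channel at 128 Hz.
fs = 128
c3 = np.random.randn(10 * fs)   # placeholder signal
c3_filtered = bandpass(c3, fs)
```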

The E.E.G. data was recorded on two different channels, C3 and C4. These correspond to the left and
right hemispheres of the motor cortex and would have been recorded by placing electrodes over the
right and left sides of the motor cortex, as shown in Figure 1 below.




Figure 1. – Showing the placing of the electrodes at channels 3 and 4 of the motor cortex.




It is important to record signals at these two channels due to the fact that "When people execute or
imagine the movement of left and right hand, E.E.G. features differs in two brain hemispheres
corresponding to sensorimotor hand representation area" Pei & Zheng (2004). Subsequently, when an
imagined left hand movement is made there are essentially two signals recorded, C3 and C4, both
labelled as left signals, and vice versa for right hand imagery movements.



4. FEATURE EXTRACTION

A feature is described by Sriraja (2002) as "Any structural characteristic, transform, structural
description or graph, extracted from a signal or a part of it, for use in pattern recognition or
interpretation. It is a representation of the signal or pattern, containing only the salient information".
Ripley (1996) goes on to argue that a "Feature is a measurement on an example, so the training set of
examples has measured features and a class for each".

Feature extraction is concerned with the identification of features that are unique or specific to a
particular type of E.E.G. data, such as all imagined left hand movements. The aim of this process is the
formation of useful new features by combining existing ones. Using such features facilitates the
process of data classification. There are many such features; some provide useful information while
others provide none. The next logical step is the elimination of the features that produce the
lowest accuracy.
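
As a concrete illustration of this step, the sketch below builds one feature vector per trial from the two channels. It is a minimal sketch in Python/NumPy; the (n_trials, n_samples, 2) array layout and the function name are assumptions for illustration, not details of the assignment data.

```python
import numpy as np
from scipy.stats import kurtosis

def extract_features(trials):
    """Build one feature vector per trial.

    `trials` is assumed to have shape (n_trials, n_samples, 2),
    holding the C3 and C4 channels of each trial.
    """
    rows = []
    for trial in trials:
        row = []
        for ch in range(trial.shape[1]):
            x = trial[:, ch]
            # Three descriptive features per channel (see section 5.1).
            row += [x.mean(), x.std(), kurtosis(x)]
        rows.append(row)
    return np.asarray(rows)     # shape (n_trials, 6)

# Hypothetical usage with placeholder data:
trials = np.random.randn(120, 512, 2)
X = extract_features(trials)
```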

For each test run, the accuracy of the classifier used was calculated. This was important as it allowed
the author to determine which classifiers gave the best results for the data being examined. Wolpert
(1992) points out that "Estimating the accuracy of a classifier is important not only to predict its future
prediction accuracy, but also for choosing a classifier from a given set (model selection), or combining
classifiers".



5. THE CLASSIFICATION PROCESS

5.1. Descriptive Classifiers
In an effort to find the most appropriate type of classifier for the analysis of the E.E.G. data used in this
assignment, the author turned to descriptive methods. These included basic features like the mean,
standard deviation and kurtosis. Using this descriptive approach allows for the summarisation of the
test and training data. This is useful where the sample contains a large number of variables.

5.1.1. Mean
The mean is "Short for arithmetic mean: in descriptive statistics, the average value, calculated for a
finite set of scores by adding the scores together and then dividing the total by the number of scores"
Coleman (2003). During 'Descriptive Features – Test 1' an accuracy of 64% was obtained using the
mean feature. It performed slightly higher than the standard deviation, which reached 61%
accuracy.



5.1.2. Standard Deviation
Standard deviation is defined by Coleman (2003) as "A measure of the degree of dispersion, variability
or scatter in a set of scores, expressed in the same units as the scores themselves, defined as the square
root of the variance". 'Descriptive Features – Test 2' attempted to classify the E.E.G. data by utilising
the standard deviation feature. An accuracy of 61% was achieved.

5.1.3. Kurtosis
Kurtosis is useful in that it "Provides information about the 'peakedness' of the distribution. If the
distribution is perfectly normal you would obtain a skewness and kurtosis value of 0" Pallant (2001).
The results obtained during 'Descriptive Features – Test 3' using the kurtosis feature were
disappointing, with an accuracy of 49%. Kurtosis in this instance was not able to offer a higher level of
separability than either the mean or the standard deviation. Kurtosis is usually more appropriate for
larger samples, with which more satisfactory results could be accomplished. As noted by Tabachnick &
Fidell (1996), "Kurtosis can result in an underestimate of the variance, however, this risk is also reduced
with a large sample".

5.1.4. Combination Of Mean, Standard Deviation And Kurtosis Features
In some instances the combination of features can allow for greater accuracy; however, this was not the
case for the E.E.G. data that was examined using the mean, standard deviation and kurtosis. Test
results from 'Descriptive Features – Test 4' showed accuracy to be in the region of 49%, a much
lower performance than that of the mean and standard deviation features used individually.

5.1.5. Conclusion Drawn From Mean, Standard Deviation And Kurtosis Feature Tests
The accuracy of the mean as a classifier was substantially higher than that of the standard deviation
and kurtosis, as well as a combination of all three. On the other hand, it still did not offer a satisfactory
level of separation between the imagery left and right signals. It seems these three features are not
appropriate for E.E.G. data and are better suited to simpler forms of data. With this in mind the
author turned to the Hjorth features.


5.2. Hjorth Features
A number of Hjorth parameters were drawn upon during the course of this assignment. "In 1970, Bo
Hjorth derived certain features that described the E.E.G. signal by means of simple time domain
analysis. These parameters, namely Activity, Mobility and Complexity, together characterize the E.E.G.
pattern in terms of amplitude, time scale and complexity" Sriraja (2002). These were used in an
attempt to achieve a separation between imagery left and right hand signals.

The Hjorth approach involves the measurement of the E.E.G. signal "For successive epochs (or windows)
of one to several seconds. Two of the attributes are obtained from the first and second time derivates
of the amplitude fluctuations in the signal. The first derivative is the rate of change of the signal's
amplitude. At peaks and troughs the first derivative is zero. At other points it will be positive or
negative depending on whether the amplitude is increasing or decreasing with time. The steeper the
slope of the wave, the greater will be the amplitude of the first derivative. The second derivative is
determined by taking the first derivative of the first derivative of the signal. Peaks and troughs in the
first derivative, which correspond to points of greatest slope in the original signal, result in zero
amplitude in the second derivative, and so forth" Miranda & Brouse (2005).

According to Sriraja (2002), if x_1, x_2, …, x_n are the n E.E.G. data values and the consecutive
differences x_i − x_{i−1} are denoted d_i, the mobility and complexity can mathematically be written as

$$\text{Mobility} = \sqrt{\frac{\operatorname{var}(d)}{\operatorname{var}(x)}} \qquad \text{Complexity} = \frac{\sqrt{\operatorname{var}(g)/\operatorname{var}(d)}}{\sqrt{\operatorname{var}(d)/\operatorname{var}(x)}}$$

where g_i = d_i − d_{i−1} denotes the consecutive differences of d, i.e. the second differences of the signal.
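
In code, the three parameters reduce to a few variance computations on the signal and its successive differences. The following is a minimal NumPy sketch of these definitions, not the assignment's original implementation:

```python
import numpy as np

def hjorth(x):
    """Return the Hjorth activity, mobility and complexity of signal x."""
    d = np.diff(x)                 # first differences ~ first derivative
    g = np.diff(d)                 # second differences ~ second derivative
    var_x, var_d, var_g = x.var(), d.var(), g.var()
    activity = var_x               # variance of the amplitude fluctuations
    mobility = np.sqrt(var_d / var_x)
    complexity = np.sqrt(var_g / var_d) / mobility
    return activity, mobility, complexity

# Hypothetical usage on one epoch of one channel:
epoch = np.random.randn(512)
print(hjorth(epoch))
```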




5.2.1. Activity Feature
Activity is defined by Miranda & Brouse (2005) as "The variance of the amplitude fluctuations in the
epoch". During 'Hjorth Features – Test 1' this feature was able to achieve an accuracy of only 44% and
therefore offered very poor separability. 'Hjorth Features – Test 2' used the same classifier; however,
the time interval for sampling was changed from the 6th second to the 7th. This change resulted in an
accuracy of 55%, an increase of 11% on the previous test. 'Hjorth Features – Test 3' was also carried
out using the activity feature. This test aimed to determine whether or not changing the number of
neurons used in the N.N. would have a notable effect on the accuracy of the classification. A change in
the neuron numbers did not have a significant impact on performance in this instance.

5.2.2. Mobility Feature
"Mobility is calculated by taking the square root of the variance of the first derivative divided by the
variance of the primary signal" Miranda & Brouse (2005). 'Hjorth Features – Test 4' utilised the
mobility feature for classification purposes. Results from this test showed that accuracy using this
feature stands at 52%.

5.2.3. Complexity Feature
Complexity is described as "The ratio of the mobility of the first derivative of the signal to the mobility
of the signal itself" Miranda & Brouse (2005). 'Hjorth Features – Test 5' examined the complexity
feature and its effect on accuracy. Results for this test showed the level of accuracy using this feature
to be 64%.

5.2.4. Combination Of Activity, Mobility And Complexity Features
'Hjorth Features – Test 6' combined the activity, mobility and complexity features in the hope of
increasing accuracy further. This test showed very mediocre results, with accuracy at 56%. However,
when the data windows were specified, as in 'Hjorth Features – Test 7', more promising results were
recorded. Accuracy of 74% was achieved, with a greater level of separability of the imagery left and
right hand signals than all previous results.




Combining multiple features is useful as it can often lead to improved accuracy. Lotte et al (2007)
highlight this point, arguing, "A combination of similar classifiers is very likely to outperform one of
the classifiers on its own. Actually, combining classifiers is known to reduce the variance and thus the
classification error".



6. CLASSIFICATION ALGORITHMS

Kohavi (1995) defines a classifier as "A function that maps an unlabelled instance to a label using
internal data structures". Three different types of algorithms were used for classification: the
L.D.A., K.N.N. and N.N. classification algorithms.


6.1. L.D.A. Classification
L.D.A., also known as Fisher's L.D.A., is "Often used to investigate the difference between various
groups when their relationship is not clear. The goal of a discriminant analysis is to find a set of
features or discriminants whose values are such that the different groups are separated as much as
possible" Sriraja (2002). Lotte et al (2007) describe the aim of L.D.A. as being to "Use hyperplanes to
separate the data representing the different classes. For a two-class problem, the class of a feature
vector depends on which side of the hyperplane the vector is". L.D.A. is concerned with finding the
features that maximise the distance between the two classes while minimising the spread within
each class. This concept is illustrated in Figure 2 below.

           Imagery Left Hand
           Data




                                                                                        Imagery Right
                                                                                        Hand Data



                                  Imagery Right Hand
                                  Data


Figure 2. – Shows a hyperplane that is used to illustrate graphically the separation of the classes i.e. the
separability of the imagery left hand data from the imagery right hand data

The equation for L.D.A. can be denoted in mathematical terms. Sriraja (2002) discusses the equation of
L.D.A. and the principles on which it works. "First, a linear combination of the features x are projected
into a new feature, y. The idea is to have a projection such that the y's from the two classes would be
as much separated as possible. The measure of separation between the two sets of y's is evaluated in
terms of the respective means and the variances of the projected classes . . . The objective is therefore
to have a linear combination such that the following ratio is maximised."




where 1 y and 2 y are the means of the two sets y’s, y1and y2 respectively, and




where 1 y and 2 y are the means of the two sets y’s, y1and y2 respectively, and n1 and n2 are the
sample sizes for the two sets‛.
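
In practice the projection and the classification can be obtained from an off-the-shelf implementation. Below is a minimal sketch using scikit-learn's LinearDiscriminantAnalysis (an assumed modern substitute; the assignment's actual tooling is not stated), with placeholder features and labels standing in for the extracted E.E.G. features.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Placeholder feature matrix and labels (0 = left, 1 = right).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = rng.integers(0, 2, size=120)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
proj = lda.transform(X)          # the 1-D projection y = w^T x
print("training accuracy:", lda.score(X, y))
```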

During testing the author utilised scatter graphs like Figure 3 below to display the results of the
tests graphically. Figure 3 shows the scatter graph that was constructed as part of a test that attempted
classification of the E.E.G. data using the mean feature. The accuracy achieved using this feature was
64%.
Figure 3. – Mean Scatter Graph

The next graph, Figure 4, illustrates the results of a test examining standard deviation, with the accuracy
of this feature standing at 61%.




Figure 4. – Standard Deviation Scatter Graph

Scatter graphs are described by Fisher & Holtom (1999) as useful for presenting "The
relationship between two different types of information plotted on horizontal, x, and vertical, y, axis.
You simply plot the point at which the values meet, to get an idea of the overall distribution of your
data". Pallant (2001) is keen to point out that "The scatter graph also provides a general indication of
the strength of the relationship between your two variables. If the relationship is weak, the points will
be all over the place, in a blob type arrangement. For a strong relationship the points will form a
vague cigar shape with a definite clumping of scores around an imaginary straight line".


6.2. K.N.N. Classification
The K.N.N. function is concerned with the computation of the minimum distance between the test data
and the data used for training. Ripley (1996) defines test data as a "Set of examples used only to assess
the performance of a fully specified classifier" while training data is a "Set of examples used for
learning, that is to fit the parameters of the classifier". The K.N.N. belongs to the family of
discriminative nonlinear classifiers. According to Lotte et al (2007) the main objective of this method is
"To assign to an unseen point the dominant class among its k nearest neighbours within the training
set". A metric distance may be used to find the nearest neighbour. "With a sufficiently high value of k
and enough training samples, K.N.N. can approximate any function which enables it to produce
nonlinear decision boundaries" Lotte et al (2007).
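
A minimal K.N.N. sketch follows, again using scikit-learn as an assumed substitute for the original tooling; the held-out split plays the role of the test set described by Ripley.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Placeholder features and labels standing in for the E.E.G. features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = rng.integers(0, 2, size=120)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Each unseen point is assigned the dominant class among its
# k nearest training points, measured by Euclidean distance.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print("k-NN test accuracy:", knn.score(X_test, y_test))
```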


6.3. N.N. Classification
N.N.s are widely used for classification "Due to their non-linear model and parallel computation
capabilities" Sriraja (2002). N.N.s are described by Lotte et al (2007) as "An assembly of several
artificial neurons which enables us to produce nonlinear decision boundaries". The N.N. used for the
classification tests was the Multilayer Perceptron (M.L.P.), which is one of the more popular N.N.s. It
used 10 linear neurons for the first input layer and 12 for the hidden layer. In this M.L.P. N.N.
"Each neuron's input is connected with the output of the previous layer's neurons whereas the neurons
of the output layer determine the class of the input feature vector" Lotte et al (2007).
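
The sketch below reproduces that architecture in scikit-learn (an assumption; the original network was presumably built with a different toolbox). The input layer is implied by the 10 features of each example, so only the 12-neuron hidden layer is specified.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder data with 10 features per trial, matching the
# 10-neuron input layer described above.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))
y = rng.integers(0, 2, size=120)

# One hidden layer of 12 neurons; the output layer determines
# the class of the input feature vector.
mlp = MLPClassifier(hidden_layer_sizes=(12,), max_iter=2000,
                    random_state=0)
mlp.fit(X, y)
print("training accuracy:", mlp.score(X, y))
```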




M.L.P.s are useful for classification; provided they have a satisfactory number of neurons and layers,
"They can approximate any continuous function" Lotte et al (2007). They are commonly used as they
can quickly adapt to different problems and situations. However, it must be noted that "The fact that
M.L.P. are universal approximators makes these classifiers sensitive to overtraining, especially with such
noisy and non-stationary data as E.E.G. therefore, careful architecture selection and regularization is
required" Lotte et al (2007).

The greater the number of neurons available or used, the greater the ability of the N.N. to learn.
However, N.N.s are susceptible to overlearning, and therefore a lower number of neurons sometimes
gives greater accuracy. Cross validation is useful as it is concerned with preventing the N.N. from
learning too much and consequently ignoring new data when it is inputted.

Usually training sets are small in size, as collecting "Known cases for training and testing" is very
time consuming and costly Masters (1995). These small datasets are often broken down further into
relatively small sets for both training and testing; however, this is not a desirable approach. Instead of
taking this action one can avail of cross validation. This is a process which "Combines training and
validation into one operation" Masters (1995).

When constructing a prediction rule, reducing the error rate where possible is an important task. Efron
(1983) describes an error rate as the "Probability of incorrectly classifying a randomly selected future
case, in other words the exception" to the rule. Cross validation is often used to reduce this error rate
and "Provides a nearly unbiased estimate, using only the original data" Efron (1983).
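
As a sketch of this estimate, scikit-learn's cross_val_score (again an assumed substitute for the original tooling) rotates every example through training and validation folds and averages the fold accuracies.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = rng.integers(0, 2, size=120)

# 5-fold cross validation: each trial is used for validation exactly
# once, so the averaged accuracy is a nearly unbiased estimate.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print("fold accuracies:", scores, "mean:", scores.mean())
```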

6.3.1. Euclidean Distance
A part of the N.N. algorithm examines the Euclidean distance. This distance is the square root of the
sum of the squared differences between the coordinates, i.e. the locations, of a pair of points. The
Euclidean distance between two points p = (p_1, p_2, …, p_n) and q = (q_1, q_2, …, q_n) can be denoted as

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
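
As a worked example, for p = (0, 0) and q = (3, 4) the formula gives √(9 + 16) = 5. The short NumPy sketch below mirrors the equation directly.

```python
import numpy as np

def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sqrt(np.sum((p - q) ** 2))

print(euclidean([0, 0], [3, 4]))  # 5.0
```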




7. PREDICTION

Frank et al (2001) define a time series as "A sequence of vectors, x(t), t = 0,1,… , where t represents
elapsed time. Theoretically, x may be a value which varies continuously with t, such as a temperature".
This time series method can be used in prediction, in what is known as time series prediction. It involves
the examination of past performance to predict future performance.

This, according to Coyle et al (2004), can be used to improve classification accuracy. Their work uses a
"Novel feature extraction procedure which carries out self-organising fuzzy neural network based time
series prediction, performing feature extraction in the time domain only". Using such a method in
their studies allowed for classification accuracies in the region of 94%. They argue that the main
advantage of this approach is that "The problem of specifying the neural network architecture does
not have to be considered". Instead of adapting the parameters for individual users, the system can
"Self-organise the network architecture, adding and pruning neurons as required", just like with the
human body.

Using 6-step-ahead prediction, the author carried out a number of tests. Unless otherwise stated, the
parameters for these tests were set as follows (a sketch of this set-up in code appears after the list):

● Data was trained and tested with x (trl3)
● Embedding Dimension = 6
● Time Lag = 1
● Cross Validation was not used
● Number of neurons available to the neural network = one layer of 6
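
The sketch below shows what this set-up could look like in code: a time-delay embedding with dimension 6 and lag 1, a target 6 steps ahead, and a single hidden layer of 6 neurons. scikit-learn's MLPRegressor and the synthetic signal are assumptions standing in for the original environment and the trl3 data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def embed(series, dim=6, lag=1, horizon=6):
    """Time-delay embedding: each row holds `dim` lagged samples;
    the target is the sample `horizon` steps ahead of the last one."""
    X, y = [], []
    for t in range(len(series) - (dim - 1) * lag - horizon):
        X.append(series[t : t + dim * lag : lag])
        y.append(series[t + (dim - 1) * lag + horizon])
    return np.asarray(X), np.asarray(y)

# Placeholder signal standing in for one E.E.G. channel (e.g. trl3).
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 60, 3000)) + 0.05 * rng.normal(size=3000)

X, y = embed(x)
net = MLPRegressor(hidden_layer_sizes=(6,), max_iter=2000, random_state=0)
net.fit(X, y)
rmse = np.sqrt(np.mean((net.predict(X) - y) ** 2))
print("training root mean square error:", rmse)
```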

All results were graphically displayed on a chart like that seen in Figure 5 below.

[Figure: 'Training Vectors' chart – Target and Output plotted against Time Step t.]
Figure 5. – Shows the training data in blue and the test data in red. The difference between these two
lines is referred to as the root mean square error, or simply the error rate.


7.1. One Layer Neural Network
The first test examined accuracy using a neural network with one layer of 6 neurons. This test was run
10 times and the average training and testing root mean square errors were then calculated. The
training root mean square error was recorded at 0.0324 and the testing root mean square error at
0.0313.


7.2. Multi Layer Neural Network
The next test was conducted using the exact same parameters, except the neural network was changed
from a single layer network with 6 neurons to one that also has a hidden layer of 8 neurons. The
results from this test were slightly worse than the previous, with training and testing root mean
square errors of 0.0326 and 0.0314 respectively. The difference between the figures from test 1 and
test 2 was extremely minute.




7.3. Cross Validation
The next test was exactly the same as test 1, except that cross validation was used, to determine
whether it has a negative or positive effect. The training data scored slightly better with cross validation,
at 0.0293 compared to the 0.0324 obtained in test 1. On the other hand, the testing data performed
better in test 1, with 0.0313 rather than the 0.0317 found with cross validation.


7.4. Left E.E.G. Data
A test was carried out which used trl3 to train the network and trl4 to test it. The training root mean
square error was relatively the same as in previous experiments using the same parameters for the
training data. The testing root mean square error, however, was much improved, with a result of
0.0240 compared to the 0.0313 obtained when trl3 was used for both training and testing.


7.5. Right E.E.G. Data
Tests were conducted using the right hand data. The N.N. was trained and tested with trr3. The error
was a lot less than that found in the tests on the left data using the same parameters: 0.0292 was
recorded for the training root mean square error and 0.0281 for the testing root mean square error.
The right data was also tested to see what effect testing the N.N. with trr4 instead of trr3 would have
on performance. The training root mean square error stayed more or less the same and the testing
root mean square error increased slightly to 0.0293.



8. OTHER METHODS THAT COULD BE USED FOR FEATURE EXTRACTION

There are many other methods that could be used and that offer satisfactory performance when it
comes to feature extraction for B.C.I.s.


8.1. Amplitude And Phase Coupling Measure
One such approach, created by Wei et al (2007), is known as the 'Amplitude and Phase Coupling
Measure'. This method is concerned with "Using amplitude and phase coupling measures, quantified
by a nonlinear regressive coefficient and phase locking value respectively". Wei and his colleagues
carried out studies utilising this approach. The results obtained from the application of this feature
extraction method were promising. The "Averaged classification accuracies of the five subjects ranged
from 87.4% to 92.9%" and the "Best classification accuracies ranged between 84.4% and 99.6%". The
conclusion reached from these studies is that "The combination of coupling and autoregressive
features can effectively improve the classification accuracy due to their complementarities" Wei et al
(2007).




8.2. Combination Of Classifiers
Some researchers, in an effort to improve performance and accuracy, have begun using multiple
classifiers to achieve the desired results. The author attempted this approach with the combination of
mean, standard deviation and kurtosis, as well as activity, mobility and complexity; however, there are
various different strategies that can be followed. These include boosting, voting and stacking, to name
but a few. Boosting basically operates on the principle of cooperation, with "Each classifier focusing on
the errors committed by the previous ones" Lotte et al (2007).

Voting, on the other hand, works like a voting system. The different modules of the N.N. are "Modeled
as multiple voters electing one candidate in a single ballot election assuming the availability of votes'
preferences and intensities. All modules are considered as candidates as well as voters. Voting bids are
the output-activations of the modules forming the cooperative modular structure" Auda et al (1995).
The candidate with the majority of votes wins. According to Lotte et al (2007) "Voting is the most
popular way of combining classifiers in B.C.I. research, probably because it is simple and efficient".
Another strategy used for combining classifiers is what's known as 'Stacking'. This method,
according to Ghorbani & Owrangh (2001), "Improves classification performance and generalization
accuracy over single level cross-validation model".
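
A majority-vote combination of the three classifier types used in this assignment might look like the following scikit-learn sketch (the library and the placeholder data are assumptions; the combination scheme itself is the voting described above).

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Placeholder features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = rng.integers(0, 2, size=120)

# Hard voting: each base classifier casts one vote per trial and
# the majority label wins.
vote = VotingClassifier(
    estimators=[
        ("lda", LinearDiscriminantAnalysis()),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(12,), max_iter=2000,
                              random_state=0)),
    ],
    voting="hard",
)
vote.fit(X, y)
print("majority-vote training accuracy:", vote.score(X, y))
```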


8.3. Multivariate Autoregressive Analysis (M.V.A.R.)
Studies have been conducted in the past based on the M.V.A.R. model. Pei & Zheng (2004) carried out
such a study and boast a classification accuracy of 88.57%. They describe the M.V.A.R. model as "The
extension form of univariate A.R. model" and argue that "Using the coefficients of M.V.A.R. model as
EEG features is feasible".



9. INSPIRATION FROM BIOLOGY

There is no doubt that inspiration for some of the classification and prediction techniques that we use
today came from the world of biology. Shadbolt (2004) points out that "We see complexity all around
us in the natural world – from the cytology and fine structures of cells to the organization of the
nervous system . . . Biological systems cope with and glory in complexity – they seem to scale, to be
robust and inherently adaptable at the system level . . . Nature might provide the most direct
inspiration". The author shares the view of Bamford et al (2006) that "An attempt to imitate a
biological phenomenon is spawning innovative system designs in an emerging alternative
computational paradigm with both specific and yet unexplored potential".


9.1. Classification And Object Recognition
Our brains are constantly classifying things in our everyday environment, whether we are aware of it or
not. Classification is the process that is responsible for letting us determine what the objects around us
are, i.e. a chair, a car, a person. It even allows us to recognise the faces of the different people with
whom we come in contact. The brain is able to distinguish each specific object by examining its
numerous features, and does so with great speed and accuracy. Many systems seek to reproduce a
similar means of classifying data and can be useful in nearly every kind of industry. Take, for example,
the medical industry, in which classification plays a crucial role. Classification is used extensively for the
identification of almost every kind of disease and illness. The process of diagnosis would be much
more complex and time consuming if classification techniques were not applied to it.


9. 2. Self-Organisation
Computer systems i.e. neural networks can be constructed on the same principles and concepts of self-
organisation in humans. The term self-organisation is used to describe the process by which ‚Internal
structures can evolve without the intervention of an external designer or the presence of some
centralised form of internal control. If the capacities of the system satisfy a number of constraints, it
can develop a distributed form of internal structure through a process of self-organisation‛ Cilliers
(1998). Self-organising maps are widely used a method for feature extraction and data mapping as
well as prediction. Self-organising neural networks can encompass a time series prediction element
and often with huge success. These can be extremely useful for predicting trends in different areas
such as weather forecasting, marketing, the list is endless.

The various prediction algorithms available work in a similar way to the nervous system in humans.
These programs aim to replicate the 'anticipatory neural activity' that occurs in the body and reproduce
this in a system. Take, for example, a financial decision system recently developed. This system looked
at how using the 'anticipatory neural activity' element and taking it into consideration could help
people using the system to make decisions that are more likely to be successful and thus less risky.
When people are making financial decisions, they can often opt for an option that seems like the
irrational one. The reasons for this irrational thought had not previously been known.
Kuhnen & Knutson (2005) examined "Whether anticipatory neural activity would predict optimal and
suboptimal choices in a financial decision-making task". They observed that the nucleus accumbens
was more active when risky choices were being made, and the anterior insula when riskless options
were being followed. From their findings they concluded that particular neural circuits linked to
anticipatory affect would either hinder or encourage an individual to go for either a risky or riskless
choice. They also uncovered the fact that overactivation of these circuits is more likely to cause
investing mistakes and "Thus, consideration of anticipatory neural mechanisms may add predictive
power to the rational actor model of economic decision making". The system was able to replicate
relatively successfully the way in which humans make investment decisions.



10. FUTURE WORK

The combination of classifiers is gaining popularity and becoming more widely used as a means of
improving accuracy and performance. From researching this topic one can see that most publications
deal with one particular classifier, with little effort being made to compare one classifier to the next.
Studies could be undertaken in an attempt to compare classifiers against particular criteria.

There is a lot more room for improvement considering the algorithms that are available at the
moment. A deeper understanding of the human brain and how it classifies and predicts should lead to
the creation of more biologically plausible solutions.




11. CONCLUSION

This document addressed the various issues pertaining to feature extraction, classification and
prediction. It focused on the application of these techniques to unlabelled E.E.G. data in an effort to
discriminate between left and right hand imagery movements. It briefly reflected on the need for
brainwave signal preprocessing. An in depth analysis of the feature extraction and classification
process was carried out and the results highlighted. Classification algorithms were examined, namely
L.D.A., K.N.N. and N.N. This document looked at prediction and its effect on accuracy. Due to time and
knowledge constraints the data could not be tested using all the desired approaches; however, a
number of the untested methods were discussed. This document also highlighted the fact that
inspiration for the design of feature extraction, classification and prediction systems often comes from
nature. Finally, thought was given to future work.

From studying the E.E.G. data and carrying out various tests using numerous parameters and classifiers,
it has been concluded that a combination of the three Hjorth features, activity, mobility and
complexity, gives the highest level of accuracy. The author discovered that the descriptive classifiers
drawn upon are not suitable for E.E.G. data, as they do not provide a satisfactory level of separation;
they work better with simple data. It was found that feature extraction and classification enjoyed
more success using cross validation and a multiple layer N.N., in contrast to prediction, which was best
suited to a single layer N.N. without cross validation.

The greatest level of accuracy recorded using the combined Hjorth features was 74%. Separability of
the left hand imagery motor signal from the right was greater at 7 seconds than it was at 6. Accuracy
was improved by specifying the data window extents of s=680 and e=700. Prediction tests indicated
that left hand data is more easily separated and classified than the right hand data. The author also
realised that the N.N. performed better when different data was used for training and testing.

New methods of feature extraction, classification and prediction will undoubtedly be discovered as the
understanding of the human body evolves. Research on this particular topic can extend over
multiple disciplines and therefore it is likely that "Insights from one subject will inform the thinking in
another" Shadbolt (2004). Advances made in the field of science often result in complementary gains
in the area of computing and vice versa.

All the processes discussed in this document can have a huge impact on the lives of individuals,
businesses and society at large. Many people suffering from motor impairments rely heavily on B.C.I.
technologies that incorporate classification and prediction techniques for everyday living. Such
technologies will undoubtedly move us towards a safer and more inclusive society. Classification
and prediction can also be an integral part of business decisions. A manager may consult his or her
computer system when making risky business decisions such as: should I invest in this new product?
How much stock should I buy? Society also benefits from feature extraction, classification and
prediction. These processes are widely used for disease and illness diagnosis and for other things such
as weather forecasting and storm prediction, to name but a few. Consequently, it is safe to assume that
this field of study will remain a popular one in the years to come and will make many more advances.




BIBLIOGRAPHY

ATRY, F. & OMIDVARNIA, A. H. & SETAREHDAN, S. K. (2005) "Model Based E.E.G. Signal Purification to
Improve the Accuracy of the B.C.I. Systems" Proceedings from the 13th European Signal Processing
Conference.

AUDA, G. & KAMEL, M. & RAAFAT, H. (1995) "Voting Schemes for Cooperative Neural Network Classifiers"
Neural Networks 3(3), pp. 1240-1243, Proceedings of the IEEE International Conference on Neural
Networks.

BAMFORD, S. & MURRAY, A. & WILLSHAW, D. J. (2006) "Synaptic Rewiring in Neuromorphic VLSI for
Topographic Map Formation" [Internet], Date Accessed: 15 April 2007, Available From:
http://www.see.ed.ac.uk/~s0454958/interimreport.pdf.

CILLIERS, P. (1998) "Complexity and Postmodernism: Understanding Complex Systems" London:
Routledge.

COLEMAN, A. M. (2003) "Oxford Dictionary of Psychology" Oxford: Oxford University Press.

COYLE, D. & PRASAD, G. & MCGINNITY, T. M. (2004) "Extracting Features for a Brain-Computer
Interface by Self-Organising Fuzzy Neural Network-Based Time Series Prediction" Proceedings from the
26th Annual International Conference of the IEEE EMBS.

EBRAHIMI, T. & VESIN, J. M. & GARCIA, G. (2003) "Brain-Computer Interface in Multimedia
Communication" IEEE Signal Processing Magazine 20(1), pp. 14-24.

EFRON, B. (1983) "Estimating the Error Rate of Prediction Rules: Improvement on Cross Validation"
Journal of the American Statistical Association 78(382), pp. 316-331.

FISHER, E. & HOLTOM, D. (1999) "Enjoy Writing Your Science Thesis or Dissertation – A Step by Step
Guide to Planning and Writing" London: Imperial College Press.

FRANK, R. J. & DAVEY, N. & HUNT, S. P. (2001) "Time Series Prediction and Neural Networks" Journal of
Intelligent and Robotic Systems 31(1-3), pp. 91-103.

GHORBANI, A. A. & OWRANGH, K. (2001) "Stacked Generalization in Neural Networks: Generalization
on Statistically Neutral Problems" Neural Networks 3, pp. 1715-1720, Proceedings from the IJCNN
International Joint Conference on Neural Networks.

KOHAVI, R. (1995) "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model
Selection" Proceedings from the IJCAI International Joint Conference on Artificial Intelligence.

KUHNEN, C. M. & KNUTSON, B. (2005) "The Neural Basis of Financial Risk Taking" Neuron 47(5),
pp. 763-770.

LOTTE, F. & CONGEDO, M. & LECUYER, A. & LAMARCHE, F. & ARNALDI, B. (2007) "A Review of
Classification Algorithms for EEG-Based Brain-Computer Interfaces" Journal of Neural Engineering 4,
pp. R1-R13.

MASTERS, T. (1995) "Neural, Novel & Hybrid Algorithms for Time Series Prediction" New York: John
Wiley & Sons Inc.

MIRANDA, E. & BROUSE, A. (2005) "Toward Direct Brain-Computer Musical Interfaces" Proceedings
from the 2005 Conference on New Interfaces for Musical Expression, pp. 216-219.

PALLANT, J. (2001) "S.P.S.S. Survival Manual – A Step By Step Guide To Data Analysis Using S.P.S.S."
Berkshire: Open University Press.

PEI, X. M. & ZHENG, C. X. (2004) "Feature Extraction and Classification of Brain Motor Imagery Task
Based on MVAR Model" Machine Learning and Cybernetics 6, pp. 3726-3730, Proceedings from the 3rd
International Conference on Machine Learning and Cybernetics.

RIPLEY, B. D. (1996) "Pattern Recognition and Neural Networks" Cambridge: Cambridge University
Press.

SHADBOLT, N. (2004) "From the Editor in Chief: Nature-Inspired Computing" IEEE Intelligent Systems
19(1), pp. 2-3.

SRIRAJA, Y. (2002) "E.E.G. Signal Analysis for Detection of Alzheimer's Disease" PhD Thesis, Texas Tech
University, Date Accessed: 11 April 2007, Available From:
http://webpages.acs.ttu.edu/ysriraja/MSthesis/Thesis.pdf.

TABACHNICK, B. G. & FIDELL, L. S. (1996) "Using Multivariate Statistics" 3rd ed. New York: Harper Collins.

WEI, Q. & WANG, Y. & GAO, X. & GAO, S. (2007) "Amplitude and Phase Coupling Measures for Feature
Extraction in an E.E.G.-Based Brain-Computer Interface" Journal of Neural Engineering 4, pp. 120-129.

WOLPAW, J. R. & BIRBAUMER, N. & MCFARLAND, D. J. & PFURTSCHELLER, G. & VAUGHAN, T. M. (2002)
"Brain-Computer Interfaces for Communication and Control" The Journal of Clinical Neurophysiology
113(6), pp. 767-791.

WOLPERT, D. H. (1992) "Stacked Generalization" Neural Networks 5(2), pp. 241-259, Pergamon Press.





 
Technology Infrastructure For The Pervasive Vision, Does It Exist Yet?
Technology Infrastructure For The Pervasive Vision, Does It Exist Yet?Technology Infrastructure For The Pervasive Vision, Does It Exist Yet?
Technology Infrastructure For The Pervasive Vision, Does It Exist Yet?
 
Biometrics Iris Scanning: A Literature Review
Biometrics Iris Scanning: A Literature ReviewBiometrics Iris Scanning: A Literature Review
Biometrics Iris Scanning: A Literature Review
 
Self Organisation: Inspiring Neural Network & IT Design
Self Organisation: Inspiring Neural Network & IT DesignSelf Organisation: Inspiring Neural Network & IT Design
Self Organisation: Inspiring Neural Network & IT Design
 
Project Management: A Critical Examination of the PPARS Project
Project Management: A Critical Examination of the PPARS ProjectProject Management: A Critical Examination of the PPARS Project
Project Management: A Critical Examination of the PPARS Project
 
Knowledge Management: A Literature Review
Knowledge Management: A Literature ReviewKnowledge Management: A Literature Review
Knowledge Management: A Literature Review
 
Complexity Versus Comprehendability: Simplifying Wireless Security
Complexity Versus Comprehendability: Simplifying Wireless SecurityComplexity Versus Comprehendability: Simplifying Wireless Security
Complexity Versus Comprehendability: Simplifying Wireless Security
 
Baseline Brainwave Biometrics
Baseline Brainwave BiometricsBaseline Brainwave Biometrics
Baseline Brainwave Biometrics
 

Recently uploaded

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 

Recently uploaded (20)

Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 

Brainwave Feature Extraction, Classification & Prediction

world of business. Companies everywhere have to deal with a constant bombardment of information from both their internal and external environments. There seems to be an endless amount of both useful and useless information. As one can imagine, it is often very difficult to find exactly what you are looking for. When people eventually locate what they have been seeking, it may be in a format that does not suit them. This is where feature extraction, classification and prediction play their part. These processes are often the only way in which a business can locate information gems in a sea of data.
This document explores the various issues pertaining to feature extraction, classification and prediction. The application of these techniques to unlabelled E.E.G. data is examined in an attempt to discriminate between left and right hand imagery movements. It briefly looks at brainwave signal preprocessing. An in-depth study of the feature extraction and classification process is carried out, focusing on numerous classifiers. L.D.A., K.N.N. and N.N. classification algorithms are examined. This document gives thought to prediction and how it could be used to improve accuracy. Due to time and knowledge constraints the data could not be tested using all the desired approaches; however, these methods are mentioned in this document. Biology and nature often inspire the computing industry to produce feature extraction, classification and prediction systems that operate in the same or a similar way as the human body does. This issue of inspiration is briefly addressed and examples from nature are given. Finally, areas for future work are considered.

3. BRAINWAVE SIGNAL PREPROCESSING

E.E.G. data is commonly used for tasks such as discrimination between left and right hand imagery movements. "An E.E.G. is a recording of the very weak electrical potentials generated by the brain on the scalp" Ebrahimi et al (2003). The collection of such signals is non-invasive and they can be "Easily recorded and processed with inexpensive equipment" Ebrahimi et al (2003). It also offers many advantages over other methods as "It is based on a much simpler technology and is characterized by much smaller time constants when compared to other noninvasive approaches such as M.E.G., P.E.T. and F.M.R.I." Ebrahimi et al (2003).

The E.E.G. data used as input for the analysis carried out during the course of this assignment had been preprocessed. Ebrahimi et al (2003) points out "Some preprocessing is generally performed due to the high levels of noise and interference usually present". Artifacts, caused by factors such as motor movements, eye blinking and electrode movement, are removed because they are not required; all the essential data needed to carry out classification is left behind.

The E.E.G. data was recorded on two different channels, C3 and C4. These correspond to the left and right hemispheres of the motor cortex and would have been recorded by placing electrodes over the right and left sides of the motor cortex, as shown in figure 1 below.

Figure 1. – Showing the placing of the electrodes at channels 3 and 4 of the motor cortex.
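Although the data used in this assignment arrived already preprocessed, the following is a minimal sketch of the kind of band-pass filtering commonly applied to raw E.E.G., assuming SciPy is available; the 8-30 Hz band, the sampling rate and the array names are illustrative assumptions rather than details drawn from this study.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(signal, fs, low=8.0, high=30.0, order=4):
    """Zero-phase band-pass filter, e.g. an 8-30 Hz band covering
    the mu and beta rhythms often used in motor imagery work."""
    nyquist = fs / 2.0
    b, a = butter(order, [low / nyquist, high / nyquist], btype="band")
    return filtfilt(b, a, signal)

# Example: filter one channel of simulated E.E.G. sampled at 128 Hz.
fs = 128
t = np.arange(0, 8, 1 / fs)
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)
clean = bandpass(raw, fs)
```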
It is important to record signals at these two channels due to the fact that "When people execute or imagine the movement of left and right hand, E.E.G. features differs in two brain hemispheres corresponding to sensorimotor hand representation area" Pei & Zheng (2004). Subsequently, when an imagined left hand movement is made there are essentially two recorded signals, C3 and C4, both being left signals, and vice versa for the right hand imagery movements.

4. FEATURE EXTRACTION

A feature is described by Sriraja (2002) as "Any structural characteristic, transform, structural description or graph, extracted from a signal or a part of it, for use in pattern recognition or interpretation. It is a representation of the signal or pattern, containing only the salient information". Ripley (1996) goes on to argue that a "Feature is a measurement on an example, so the training set of examples has measured features and a class for each". Feature extraction is concerned with the identification of features that are unique or specific to a particular type of E.E.G. data, such as all imagined left hand movements. The aim of this process is the formation of useful new features by combining existing ones. Using such features facilitates the process of data classification. There are many such features; some provide useful information while others provide none. The next logical step is the elimination of features that produce the lowest accuracy.

For each test run, the accuracy of the classifier used was calculated. This was important as it allowed the author to determine which classifiers gave the best results for the data being examined. Wolpert (1992) points out that "Estimating the accuracy of a classifier is important not only to predict its future prediction accuracy, but also for choosing a classifier from a given set (model selection), or combining classifiers".

5. THE CLASSIFICATION PROCESS

5.1. Descriptive Classifiers

In an effort to find the most appropriate type of classifier for the analysis of the E.E.G. data used in this assignment, the author turned to descriptive methods. These included basic features like the mean, standard deviation and kurtosis. Using this descriptive approach allows for the summarisation of the test and training data. This is useful where the sample contains a large number of variables.

5.1.1. Mean

The mean is "Short for arithmetic mean: in descriptive statistics, the average value, calculated for a finite set of scores by adding the scores together and then dividing the total by the number of scores" Coleman (2003). During 'Descriptive Features – Test 1' an accuracy of 64% was obtained using the mean feature. It performed slightly higher than the standard deviation, which reached 61% accuracy.
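To make the descriptive approach concrete, the following is a minimal sketch of how the mean feature, along with the standard deviation and kurtosis features discussed in the next sections, might be computed per trial and channel; the trials array, its shape and all names are hypothetical, as the layout of the actual data is not specified here.

```python
import numpy as np
from scipy.stats import kurtosis

def descriptive_features(trials):
    """Collapse each (trial, channel) signal to summary statistics.

    trials: array of shape (n_trials, n_channels, n_samples).
    Returns an (n_trials, n_channels * 3) feature matrix holding the
    mean, standard deviation and kurtosis per channel.
    """
    mean = trials.mean(axis=2)
    std = trials.std(axis=2)
    kurt = kurtosis(trials, axis=2)   # 0 for a perfectly normal signal
    return np.concatenate([mean, std, kurt], axis=1)

# Hypothetical example: 120 trials, 2 channels (C3, C4), 1024 samples.
features = descriptive_features(np.random.randn(120, 2, 1024))
print(features.shape)                 # (120, 6)
```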
5.1.2. Standard Deviation

Standard deviation is defined by Coleman (2003) as "A measure of the degree of dispersion, variability or scatter in a set of scores, expressed in the same units as the scores themselves, defined as the square root of the variance". 'Descriptive Features – Test 2' attempted to classify the E.E.G. data by utilising the standard deviation feature. An accuracy of 61% was achieved.

5.1.3. Kurtosis

Kurtosis is useful in that it "Provides information about the 'peakedness' of the distribution. If the distribution is perfectly normal you would obtain a skewness and kurtosis value of 0" Pallant (2001). The results obtained during 'Descriptive Features – Test 3' using the kurtosis feature were disappointing, with an accuracy of 49%. Kurtosis in this instance was not able to offer a higher level of separability than either the mean or the standard deviation. Kurtosis is usually more appropriate for larger samples, with which more satisfactory results could be accomplished. As noted by Tabachnick & Fidell (1996), "Kurtosis can result in an underestimate of the variance, however, this risk is also reduced with a large sample".

5.1.4. Combination Of Mean, Standard Deviation And Kurtosis Features

In some instances the combination of features can allow for greater accuracy; however, this was not the case for the E.E.G. data examined using the mean, standard deviation and kurtosis. Test results from 'Descriptive Features – Test 4' showed accuracy to be in the region of 49%, a much lower performance than that of the mean and standard deviation features used individually.

5.1.5. Conclusion Drawn From Mean, Standard Deviation And Kurtosis Feature Tests

The accuracy of the mean as a classifier was substantially higher than that of the standard deviation and kurtosis, as well as a combination of all three. On the other hand, it still did not offer a satisfactory level of separation between the imagery left and right signals. These three features, it seems, are not appropriate for E.E.G. data and are better suited to simpler forms of data. With this in mind the author turned to the Hjorth features.

5.2. Hjorth Features

A number of Hjorth parameters were drawn upon during the course of this assignment. "In 1970, Bo Hjorth derived certain features that described the E.E.G. signal by means of simple time domain analysis. These parameters, namely Activity, Mobility and Complexity, together characterize the E.E.G. pattern in terms of amplitude, time scale and complexity" Sriraja (2002). These were used in an attempt to achieve a separation between imagery left and right hand signals.

The Hjorth approach involves the measurement of the E.E.G. signal "For successive epochs (or windows) of one to several seconds. Two of the attributes are obtained from the first and second time derivatives of the amplitude fluctuations in the signal. The first derivative is the rate of change of the signal's amplitude. At peaks and troughs the first derivative is zero. At other points it will be positive or negative depending on whether the amplitude is increasing or decreasing with time. The steeper the slope of the wave, the greater will be the amplitude of the first derivative. The second derivative is determined by taking the first derivative of the first derivative of the signal.
Peaks and troughs in the first derivative, which correspond to points of greatest slope in the original signal, result in zero amplitude in the second derivative, and so forth" Miranda & Brouse (2005).

According to Sriraja (2002), if $x_1, x_2, \dots, x_n$ are the $n$ E.E.G. data values and the consecutive differences $x_n - x_{n-1}$ are denoted $d_n$, the mobility and complexity measures can be written as

\[ \text{Mobility}(x) = \sqrt{\frac{\operatorname{var}(d)}{\operatorname{var}(x)}}, \qquad \text{Complexity}(x) = \frac{\text{Mobility}(d)}{\text{Mobility}(x)} \]

5.2.1. Activity Feature

Activity is defined by Miranda & Brouse (2005) as "The variance of the amplitude fluctuations in the epoch". During 'Hjorth Features – Test 1' this feature was able to achieve an accuracy of only 44% and therefore offered very poor separability. 'Hjorth Features – Test 2' used the same classifier; however, the time interval for sampling was changed from the 6th second to the 7th. This change resulted in an accuracy of 55%, an increase of 11% on the previous test. 'Hjorth Features – Test 3' was also carried out using the activity feature. This test aimed to determine whether or not changing the number of neurons used in the N.N. would have a notable effect on the accuracy of the classification. A change in the neuron numbers did not, in this instance, have a significant impact on performance.

5.2.2. Mobility Feature

"Mobility is calculated by taking the square root of the variance of the first derivative divided by the variance of the primary signal" Miranda & Brouse (2005). 'Hjorth Features – Test 4' utilised this mobility feature for classification purposes. Results from this test showed that accuracy using this feature stands at 52%.

5.2.3. Complexity Feature

Complexity is described as "The ratio of the mobility of the first derivative of the signal to the mobility of the signal itself" Miranda & Brouse (2005). 'Hjorth Features – Test 5' examined the complexity feature and its effect on accuracy. Results for this test showed the level of accuracy using this feature to be 64%.

5.2.4. Combination Of Activity, Mobility And Complexity Features

'Hjorth Features – Test 6' combined the activity, mobility and complexity features in the hope of increasing accuracy further. This test showed very mediocre results, with accuracy at 56%. However, when the data windows were specified, as in 'Hjorth Features – Test 7', more promising results were recorded. Accuracy of 74% was achieved, with a greater level of separability of the imagery left and right hand signals than all previous results.
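A minimal sketch of the three Hjorth parameters follows, approximating the first derivative by the consecutive differences $d_n$ defined above; the toy signal is illustrative only.

```python
import numpy as np

def hjorth(x):
    """Return (activity, mobility, complexity) for a 1-D signal x."""
    d = np.diff(x)                  # first derivative (consecutive differences)
    dd = np.diff(d)                 # second derivative
    activity = np.var(x)            # variance of the amplitude fluctuations
    mobility = np.sqrt(np.var(d) / np.var(x))
    # Complexity: mobility of the first derivative over mobility of the signal.
    complexity = np.sqrt(np.var(dd) / np.var(d)) / mobility
    return activity, mobility, complexity

x = np.cumsum(np.random.randn(1024))    # toy random-walk signal
print(hjorth(x))
```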
Combining multiple features is useful as it can often lead to improved accuracy. Lotte et al (2007) highlights this point, arguing, "A combination of similar classifiers is very likely to outperform one of the classifiers on its own. Actually, combining classifiers is known to reduce the variance and thus the classification error".

6. CLASSIFICATION ALGORITHMS

Kohavi (1995) defines a classifier as "A function that maps an unlabelled instance to a label using internal data structures". Three different types of algorithms were used for classification: the L.D.A., K.N.N. and N.N. classification algorithms.

6.1. L.D.A. Classification

L.D.A., also known as Fisher's L.D.A., is "Often used to investigate the difference between various groups when their relationship is not clear. The goal of a discriminant analysis is to find a set of features or discriminants whose values are such that the different groups are separated as much as possible" Sriraja (2002). Lotte et al (2007) describes the aim of L.D.A. as being to "Use hyperplanes to separate the data representing the different classes. For a two-class problem, the class of a feature vector depends on which side of the hyperplane the vector is". The L.D.A. is concerned with finding the features that will maximise the distance between the two classes while reducing the distance within each class. This concept is illustrated in figure 2 below.

Figure 2. – Shows a hyperplane that is used to illustrate graphically the separation of the classes, i.e. the separability of the imagery left hand data from the imagery right hand data.

The equation for L.D.A. can be denoted in mathematical terms. Sriraja (2002) discusses the equation of L.D.A. and the principles on which it works:
"First, a linear combination of the features x are projected into a new feature, y. The idea is to have a projection such that the y's from the two classes would be as much separated as possible. The measure of separation between the two sets of y's is evaluated in terms of the respective means and the variances of the projected classes . . . The objective is therefore to have a linear combination such that the following ratio is maximised."

\[ J = \frac{(\bar{y}_1 - \bar{y}_2)^2}{s_{y_1}^2 + s_{y_2}^2} \]

where $\bar{y}_1$ and $\bar{y}_2$ are the means of the two sets of projected values, $y_1$ and $y_2$ respectively, $s_{y_1}^2$ and $s_{y_2}^2$ are their variances, and the sample sizes $n_1$ and $n_2$ for the two sets enter through these variance estimates.

During testing the author utilised scatter graphs, like figure 3 below, to display the test results graphically. Figure 3 shows the scatter graph constructed as part of the test that attempted classification of the E.E.G. data using the mean feature. The accuracy achieved using this feature was 64%.

Figure 3. – Mean Scatter Graph
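The following is a minimal sketch of this style of two-class L.D.A. using scikit-learn; the feature matrix X and labels y are synthetic stand-ins, not the E.E.G. data analysed above.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Synthetic two-class features: 0 = imagined left, 1 = imagined right.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-0.02, 0.03, (60, 2)),
               rng.normal(+0.02, 0.03, (60, 2))])
y = np.repeat([0, 1], 60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
print("accuracy:", lda.score(X_test, y_test))
```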
The next graph, Figure 4, illustrates the results of a test examining standard deviation, with the accuracy of this feature standing at 61%.

Figure 4. – Standard Deviation Scatter Graph

Scatter graphs are described by Fisher & Holtom (1999) as useful for presenting "The relationship between two different types of information plotted on horizontal, x, and vertical, y, axis. You simply plot the point at which the values meet, to get an idea of the overall distribution of your data". Pallant (2001) is keen to point out that "The scatter graph also provides a general indication of the strength of the relationship between your two variables. If the relationship is weak, the points will be all over the place, in a blob type arrangement. For a strong relationship the points will form a vague cigar shape with a definite clumping of scores around an imaginary straight line".

6.2. K.N.N. Classification

The K.N.N. function is concerned with the computation of the minimum distance between the test data and the data used for training. Ripley (1996) defines test data as a "Set of examples used only to assess the performance of a fully specified classifier", while training data is a "Set of examples used for learning, that is to fit the parameters of the classifier". The K.N.N. belongs to the family of discriminative nonlinear classifiers. According to Lotte et al (2007) the main objective of this method is "To assign to an unseen point the dominant class among its k nearest neighbours within the training set". A metric distance may be used to find the nearest neighbour. "With a sufficiently high value of k and enough training samples, K.N.N. can approximate any function which enables it to produce nonlinear decision boundaries" Lotte et al (2007).
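A corresponding minimal K.N.N. sketch is given below; k = 5 and the synthetic data are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-0.02, 0.03, (60, 2)),   # imagined left features
               rng.normal(+0.02, 0.03, (60, 2))])  # imagined right features
y = np.repeat([0, 1], 60)

# The unseen point takes the dominant class among its k nearest
# neighbours under the (default) Euclidean metric.
knn = KNeighborsClassifier(n_neighbors=5)
print("mean accuracy:", cross_val_score(knn, X, y, cv=5).mean())
```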
6.3. N.N. Classification

N.N.s are widely used for classification "Due to their non-linear model and parallel computation capabilities" Sriraja (2002). N.N.s are described by Lotte et al (2007) as "An assembly of several artificial neurons which enables us to produce nonlinear decision boundaries". The N.N. used for the classification tests was the Multilayer Perceptron (M.L.P.), one of the more popular N.N.s. It used 10 linear neurons for the first input layer and 12 for the hidden layer. In this M.L.P. N.N. "Each neuron's input is connected with the output of the previous layer's neurons whereas the neurons of the output layer determine the class of the input feature vector" Lotte et al (2007).

M.L.P.s are useful for classification; provided they have a satisfactory number of neurons and layers, "They can approximate any continuous function" Lotte et al (2007). They are commonly used as they can quickly adapt to different problems and situations. However, it must be noted that "The fact that M.L.P. are universal approximators makes these classifiers sensitive to overtraining, especially with such noisy and non-stationary data as E.E.G. therefore, careful architecture selection and regularization is required" Lotte et al (2007). The greater the number of neurons available or used, the greater the ability of the N.N. to learn; however, networks are susceptible to overlearning, and therefore a lower number of neurons sometimes gives greater accuracy.

Cross validation is useful as it is concerned with preventing the N.N. from learning too much and consequently ignoring new data when it is inputted. Usually training sets are small in size, as collecting "Known cases for training and testing" is very time consuming and costly Masters (1995). These small sets are often broken down further into relatively small sets for both training and testing; however, this is not a desirable approach. Instead, one can avail of cross validation, a process which "Combines training and validation into one operation" Masters (1995).

When constructing a prediction rule, reducing the error rate where possible is an important task. Efron (1983) describes an error rate as the "Probability of incorrectly classifying a randomly selected future case, in other words the exception" to the rule. Cross validation is often used to reduce this error rate and "Provides a nearly unbiased estimate, using only the original data" Efron (1983).

6.3.1. Euclidean Distance

Part of the N.N. algorithm examines the Euclidean distance, that is, the square root of the sum of the squared differences between the coordinates of a set of objects. The Euclidean distance between two points $p = (p_1, \dots, p_n)$ and $q = (q_1, \dots, q_n)$ can be denoted as

\[ d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \]
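Closing out the classification algorithms, the following is a minimal sketch of an M.L.P. with one hidden layer of 12 neurons, evaluated with cross validation as discussed above; scikit-learn determines the input layer from the number of features, and the synthetic data is again a stand-in for the actual E.E.G. features.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-0.02, 0.03, (60, 3)),   # hypothetical left features
               rng.normal(+0.02, 0.03, (60, 3))])  # hypothetical right features
y = np.repeat([0, 1], 60)

# One hidden layer of 12 neurons; cross validation gives a less biased
# accuracy estimate and guards against the overtraining that noisy,
# non-stationary E.E.G. data invites.
mlp = MLPClassifier(hidden_layer_sizes=(12,), max_iter=2000, random_state=0)
print("mean cross-validated accuracy:",
      cross_val_score(mlp, X, y, cv=5).mean())
```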
7. PREDICTION

Frank et al (2001) defines a time series as "A sequence of vectors, x(t), t = 0,1,… , where t represents elapsed time. Theoretically, x may be a value which varies continuously with t, such as a temperature". This time series method can be used in prediction, in what is known as time series prediction. It involves the examination of past performance to predict future performance. This, according to Coyle et al (2004), can be used to improve classification accuracy. Their work uses a "Novel feature extraction procedure which carries out self-organising fuzzy neural network based time series prediction, performing feature extraction in the time domain only". Using such a method in their studies allowed for classification accuracies in the region of 94%. They argue that the main advantage of this approach is that "The problem of specifying the neural network architecture does not have to be considered". Instead of adapting the parameters for individual users, the system can "Self-organise the network architecture, adding and pruning neurons as required", just as with the human body.

The author carried out a number of tests using 6-step ahead prediction. The parameters for these tests were set as follows, unless otherwise stated:
● Data was trained and tested with x (trl3)
● Embedding Dimension = 6
● Time Lag = 1
● Cross Validation was not used
● Number of neurons available to the neural network = one layer of 6

All results were graphically displayed on a chart like that seen in figure 5 below.

Figure 5. – Training vectors: target and output against time step t. The training data is shown in blue and the test data in red; the difference between these two lines is referred to as the root square error, or simply the error rate.

7.1. One Layer Neural Network

The first test examined accuracy using a neural network with one layer of 6 neurons. This test was run 10 times and then the average training root mean square and testing root mean square were calculated. The training root mean square was recorded at 0.0324 and the testing root mean square at 0.0313.

7.2. Multi Layer Neural Network

The next test was conducted using exactly the same parameters, except that the neural network was changed from a single layer network with 6 neurons to one that also has a hidden layer of 8 neurons. The results from this test were slightly worse than the previous, with a training and testing root mean square of 0.0326 and 0.0314 respectively. The difference between the figures from test 1 and test 2 was extremely minute.
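A minimal sketch of this kind of 6-step ahead prediction follows, using the stated embedding dimension of 6 and time lag of 1; the regressor, the toy series and the root mean square error calculation are illustrative stand-ins for the author's actual network and data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def embed(series, dim=6, lag=1, horizon=6):
    """Build (X, y) pairs: each row of X holds `dim` lagged samples;
    y is the value `horizon` steps ahead of the row's last sample."""
    rows = len(series) - (dim - 1) * lag - horizon
    X = np.array([series[i:i + dim * lag:lag] for i in range(rows)])
    y = series[(dim - 1) * lag + horizon:(dim - 1) * lag + horizon + rows]
    return X, y

series = np.sin(np.linspace(0, 60, 3000)) + 0.05 * np.random.randn(3000)
X, y = embed(series)                       # embedding dim 6, time lag 1
split = len(X) // 2
net = MLPRegressor(hidden_layer_sizes=(6,), max_iter=2000,
                   random_state=0).fit(X[:split], y[:split])
pred = net.predict(X[split:])
rmse = np.sqrt(np.mean((pred - y[split:]) ** 2))   # root mean square error
print("testing RMSE:", rmse)
```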
7.3. Cross Validation

The next test was exactly the same as test 1, except that cross validation was used, in order to determine whether it has a negative or positive effect. The training data scored slightly better with cross validation, at 0.0293 compared to the 0.0324 obtained in test 1. On the other hand, the testing data performed better in test 1, with 0.0313 rather than the 0.0317 found with cross validation.

7.4. Left E.E.G. Data

A test was carried out which used trl3 to train the network and trl4 to test it. The training root mean square was much the same as in previous experiments using the same parameters for the training data. The testing root mean square, however, was much improved, with a result of 0.0240 compared to the 0.0313 obtained when testing with trl3.

7.5. Right E.E.G. Data

Tests were conducted using the right hand data. The N.N. was trained and tested with trr3. The error was much lower than that found in the tests on the left data using the same parameters: 0.0292 was recorded for the training root mean square error and 0.0281 for the testing root mean square error. The right data was also tested to see what effect testing the N.N. with trr4 instead of trr3 would have on performance. The training root mean square error stayed more or less the same, and the testing root mean square error increased slightly, to 0.0293.

8. OTHER METHODS THAT COULD BE USED FOR FEATURE EXTRACTION

There are many other methods that could be used and that offer satisfactory performance when it comes to feature extraction for B.C.I.s.

8.1. Amplitude And Phase Coupling Measure

One such approach, created by Wei et al (2007), is known as the 'Amplitude and Phase Coupling Measure'. This method is concerned with "Using amplitude and phase coupling measures, quantified by a nonlinear regressive coefficient and phase locking value respectively". Wei and his colleagues carried out studies utilising this approach, and the results obtained from the application of this feature extraction method were promising. The "Averaged classification accuracies of the five subjects ranged from 87.4% to 92.9%" and the "Best classification accuracies ranged between 84.4% and 99.6%". The conclusion reached from these studies is that "The combination of coupling and autoregressive features can effectively improve the classification accuracy due to their complementarities" Wei et al (2007).
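A minimal sketch of the phase locking value component of such a coupling measure is given below, computed between two hypothetical C3 and C4 signals via the analytic signal; this is a generic P.L.V. computation, not Wei et al's exact implementation.

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """P.L.V. between two equal-length signals: 1 means the instantaneous
    phase difference is constant, 0 means no phase locking."""
    phase_x = np.angle(hilbert(x))
    phase_y = np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

t = np.linspace(0, 4, 1024)
c3 = np.sin(2 * np.pi * 10 * t) + 0.1 * np.random.randn(t.size)
c4 = np.sin(2 * np.pi * 10 * t + 0.5) + 0.1 * np.random.randn(t.size)
print(phase_locking_value(c3, c4))   # close to 1 for these coupled signals
```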
8.2. Combination Of Classifiers

Some researchers, in an effort to improve performance and accuracy, have begun using multiple classifiers to achieve the desired results. The author attempted this approach with the combination of mean, standard deviation and kurtosis, as well as activity, mobility and complexity; however, there are various other strategies that can be followed. These include boosting, voting and stacking, to name but a few.

Boosting operates on the principle of cooperation, with "Each classifier focusing on the errors committed by the previous ones" Lotte et al (2007). Voting, on the other hand, works like an election. The different modules of the N.N. are "Modeled as multiple voters electing one candidate in a single ballot election assuming the availability of votes' preferences and intensities. All modules are considered as candidates as well as voters. Voting bids are the output-activations of the modules forming the cooperative modular structure" Auda et al (1995). The candidate with the majority of votes wins. According to Lotte et al (2007), "Voting is the most popular way of combining classifiers in B.C.I. research, probably because it is simple and efficient". A sketch of this voting strategy is given after section 8.3 below. Another strategy for combining classifiers is what is known as 'Stacking'. This method, according to Ghorbani & Owrangh (2001), "Improves classification performance and generalization accuracy over single level cross-validation model".

8.3. Multivariate Autoregressive Analysis (M.V.A.R.)

Studies have been conducted in the past based on the M.V.A.R. model. Pei et al (2004) carried out such a study and boasts a classification accuracy of 88.57%. They describe the M.V.A.R. model as "The extension form of univariate A.R. model" and argue, "Using the coefficients of M.V.A.R. model as EEG features is feasible".
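As promised above, the following minimal sketch combines the three classifier families discussed earlier under a simple majority vote using scikit-learn; it is a generic ensemble, not Auda et al's modular scheme, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-0.02, 0.03, (60, 3)),
               rng.normal(+0.02, 0.03, (60, 3))])
y = np.repeat([0, 1], 60)

# Hard voting: each classifier casts one vote and the majority class wins.
ensemble = VotingClassifier(estimators=[
    ("lda", LinearDiscriminantAnalysis()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(12,), max_iter=2000,
                          random_state=0)),
], voting="hard")
print("mean cross-validated accuracy:",
      cross_val_score(ensemble, X, y, cv=5).mean())
```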
9. INSPIRATION FROM BIOLOGY

There is no doubt that inspiration for some of the classification and prediction techniques that we use today came from the world of biology. Shadbolt (2004) points out that "We see complexity all around us in the natural world – from the cytology and fine structures of cells to the organization of the nervous system . . . Biological systems cope with and glory in complexity – they seem to scale, to be robust and inherently adaptable at the system level . . . Nature might provide the most direct inspiration". The author shares the view of Bamford et al (2006) that "An attempt to imitate a biological phenomenon is spawning innovative system designs in an emerging alternative computational paradigm with both specific and yet unexplored potential".

9.1. Classification And Object Recognition

Our brains are constantly classifying things in our everyday environment, whether we are aware of it or not. Classification is the process that lets us determine what the objects around us are, i.e. a chair, a car, a person. It even allows us to recognise the different faces of the people with whom we come in contact. The brain is able to distinguish each specific object by examining its numerous features, and does so with great speed and accuracy. Many systems seek to reproduce a similar means of classifying data, which can be useful in nearly every kind of industry. Take, for example, the medical industry, in which classification plays a crucial role. Classification is used extensively for the identification of almost every kind of disease and illness. The process of diagnosis would be much more complex and time consuming if classification techniques were not applied to it.

9.2. Self-Organisation

Computer systems, i.e. neural networks, can be constructed on the same principles and concepts as self-organisation in humans. The term self-organisation describes the process by which "Internal structures can evolve without the intervention of an external designer or the presence of some centralised form of internal control. If the capacities of the system satisfy a number of constraints, it can develop a distributed form of internal structure through a process of self-organisation" Cilliers (1998).

Self-organising maps are widely used as a method for feature extraction and data mapping, as well as prediction. Self-organising neural networks can encompass a time series prediction element, often with huge success. These can be extremely useful for predicting trends in areas such as weather forecasting and marketing; the list is endless. The various prediction algorithms available aim to work in much the same way as the human nervous system. These programs aim to replicate the 'anticipatory neural activity' that occurs in the body and reproduce it in a system.

Take, for example, a recently developed financial decisions system. This system looked at how taking the 'anticipatory neural activity' element into consideration could help people using the system to make decisions that are more likely to be successful, and thus less risky. When people are making financial decisions, they can often opt for an option that seems like the irrational one, and the reasons for this irrational thought had not previously been known. Kuhnen & Knutson (2005) examined "Whether anticipatory neural activity would predict optimal and suboptimal choices in a financial decision-making task". They observed that the nucleus accumbens was more active when risky choices were being made, and the anterior insula when riskless options were being followed. From their findings they concluded that particular neural circuits linked to anticipatory affect would either hinder or encourage an individual to go for either a risky or a riskless choice. They also uncovered that overactivation of these circuits is more likely to cause investing mistakes and "Thus, consideration of anticipatory neural mechanisms may add predictive power to the rational actor model of economic decision making". The system was able to replicate relatively successfully the way in which humans make investment decisions.

10. FUTURE WORK

The combination of classifiers is gaining popularity and becoming more widely used as a means of improving accuracy and performance. From researching this topic one can see that most publications deal with one particular classifier, with little effort being taken to compare one classifier to the next. Studies could be undertaken to compare classifiers against particular criteria. There is a lot more room for improvement considering the algorithms that are available at the moment. A deeper understanding of the human brain and how it classifies and predicts should lead to the creation of more biologically plausible solutions.
11. CONCLUSION

This document addressed the various issues pertaining to feature extraction, classification and prediction. It focused on the application of these techniques to unlabelled E.E.G. data in an effort to discriminate between left and right hand imagery movements. It briefly reflected on the need for brainwave signal preprocessing. An in-depth analysis of the feature extraction and classification process was carried out and the results highlighted. Classification algorithms were examined, namely L.D.A., K.N.N. and N.N. This document looked at prediction and its effect on accuracy. Due to time and knowledge constraints the data could not be tested using all the desired approaches; however, a number of the untested methods were discussed. This document also highlighted the fact that inspiration for the design of feature extraction, classification and prediction systems often comes from nature. Finally, thought was given to future work.

From studying the E.E.G. data and carrying out various tests using numerous parameters and classifiers, it has been concluded that a combination of the three Hjorth features, activity, mobility and complexity, gives the highest level of accuracy. The author discovered that the descriptive classifiers drawn upon are not suitable for E.E.G. data, as they do not provide a satisfactory level of separation; they work better with simple data. It was found that feature extraction and classification enjoyed more success using cross validation and a multiple layer N.N., in contrast to prediction, which was best suited to a single layer N.N. without cross validation. The greatest level of accuracy recorded using the combined Hjorth features was 74%. Separability of the left hand imagery motor signal from the right was greater at 7 seconds than at 6. Accuracy was improved by specifying the data window extents of s=680 and e=700. Prediction tests indicated that left hand data is more easily separated and classified than right hand data. The author also found that the N.N. performed better when different data was used for training and testing.

New methods of feature extraction, classification and prediction will undoubtedly be discovered as the understanding of the human body evolves. Research on this topic extends over multiple disciplines, and therefore it is likely that "Insights from one subject will inform the thinking in another" Shadbolt (2004). Advances made in the field of science often result in complementary gains in the area of computing, and vice versa. All the processes discussed in this document can have a huge impact on the lives of individuals, businesses and society at large. Many people with motor impairments rely heavily on B.C.I. technologies that incorporate classification and prediction techniques for everyday living; such technologies will undoubtedly help progress towards a safer and more inclusive society. Classification and prediction can often be an integral part of any business decision. A manager may consult his or her computer system to make risky business decisions such as: should I invest in this new product? how much stock should I buy? Society also benefits from feature extraction, classification and prediction; these processes are widely used for disease and illness diagnosis, as well as for weather forecasting and storm prediction, to name but a few. Consequently, it is safe to assume that this field of study will remain a popular one in the years to come and make many more advances.
BIBLIOGRAPHY

Atry, F., Omidvarnia, A. H. & Setarehdan, S. K. (2005) "Model Based E.E.G. Signal Purification to Improve the Accuracy of the B.C.I. Systems" Proceedings from the 13th European Signal Processing Conference.

Auda, G., Kamel, M. & Raafat, H. (1995) "Voting Schemes for Cooperative Neural Network Classifiers" Neural Networks 3(3), pp. 1240-1243, Proceedings of the IEEE International Conference on Neural Networks.

Bamford, S., Murray, A. & Willshaw, D. J. (2006) "Synaptic Rewiring in Neuromorphic VLSI for Topographic Map Formation" [Internet], Date Accessed: 15 April 2007, Available From: http://www.see.ed.ac.uk/~s0454958/interimreport.pdf.

Cilliers, P. (1998) "Complexity and Postmodernism: Understanding Complex Systems" London: Routledge.

Coleman, A. M. (2003) "Oxford Dictionary of Psychology" Oxford: Oxford University Press.

Coyle, D., Prasad, G. & McGinnity, T. M. (2004) "Extracting Features for a Brain-Computer Interface by Self-Organising Fuzzy Neural Network-Based Time Series Prediction" Proceedings from the 26th Annual International Conference of the IEEE EMBS.

Ebrahimi, T., Vesin, J. M. & Garcia, G. (2003) "Brain-Computer Interface in Multimedia Communication" IEEE Signal Processing Magazine 20(1), pp. 14-24.

Efron, B. (1983) "Estimating the Error Rate of Prediction Rules: Improvement on Cross Validation" Journal of the American Statistical Association 78(382), pp. 316-331.

Fisher, E. & Holtom, D. (1999) "Enjoy Writing Your Science Thesis or Dissertation – A Step by Step Guide to Planning and Writing" London: Imperial College Press.

Frank, R. J., Davey, N. & Hunt, S. P. (2001) "Time Series Prediction and Neural Networks" Journal of Intelligent and Robotic Systems 31(1-3), pp. 91-103.

Ghorbani, A. A. & Owrangh, K. (2001) "Stacked Generalization in Neural Networks: Generalization on Statistically Neutral Problems" Neural Networks 3, pp. 1715-1720, Proceedings from the IJCNN International Joint Conference on Neural Networks.

Kohavi, R. (1995) "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection" Proceedings from the International Joint Conference on Artificial Intelligence (IJCAI).

Kuhnen, C. M. & Knutson, B. (2005) "The Neural Basis of Financial Risk Taking" Neuron 47(5), pp. 763-770.

Lotte, F., Congedo, M., Lecuyer, A., Lamarche, F. & Arnaldi, B. (2007) "A Review of Classification Algorithms for EEG-Based Brain-Computer Interfaces" Journal of Neural Engineering 4, pp. R1-R13.

Masters, T. (1995) "Neural, Novel & Hybrid Algorithms for Time Series Prediction" New York: John Wiley & Sons Inc.

Miranda, E. & Brouse, A. (2005) "Toward Direct Brain-Computer Musical Interfaces" Proceedings from the 2005 Conference on New Interfaces for Musical Expression, pp. 216-219.

Pallant, J. (2001) "S.P.S.S. Survival Manual – A Step By Step Guide To Data Analysis Using S.P.S.S." Berkshire: Open University Press.

Pei, X. M. & Zheng, C. X. (2004) "Feature Extraction and Classification of Brain Motor Imagery Task Based on MVAR Model" Machine Learning and Cybernetics 6, pp. 3726-3730, Proceedings from the 3rd International Conference on Machine Learning and Cybernetics.

Ripley, B. D. (1996) "Pattern Recognition and Neural Networks" Cambridge: Cambridge University Press.

Shadbolt, N. (2004) "From the Editor in Chief: Nature-Inspired Computing" IEEE Intelligent Systems 19(1), pp. 2-3.

Sriraja, Y. (2002) "E.E.G. Signal Analysis for Detection of Alzheimer's Disease" Thesis, Texas Tech University, Date Accessed: 11 April 2007, Available From: http://webpages.acs.ttu.edu/ysriraja/MSthesis/Thesis.pdf.

Tabachnick, B. G. & Fidell, L. S. (1996) "Using Multivariate Statistics" 3rd ed. New York: Harper Collins.

Wei, Q., Wang, Y., Gao, X. & Gao, S. (2007) "Amplitude and Phase Coupling Measures for Feature Extraction in an E.E.G.-Based Brain-Computer Interface" Journal of Neural Engineering 4, pp. 120-129.

Wolpaw, J. R., Birbaumer, N., McFarland, D. J., Pfurtscheller, G. & Vaughan, T. M. (2002) "Brain-Computer Interfaces for Communication and Control" Clinical Neurophysiology 113(6), pp. 767-791.

Wolpert, D. H. (1992) "Stacked Generalization" Neural Networks 5(2), pp. 241-259, Pergamon Press.