International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 4, July-August (2013), pp. 432-440, © IAEME
CLASSIFICATION OF DATA USING SEMI-SUPERVISED LEARNING (A
LEARNING DISABILITY CASE STUDY)
Pooja Manghirmalani Mishra (1), Dr. Sushil Kulkarni (2)
(1) Dept. of Computer Science, University of Mumbai, Mumbai, India
(2) Dept. of Mathematics, Jai Hind College, Mumbai, India
ABSTRACT
In classification, semi-supervised learning applies when a large amount of unlabeled data is available alongside a small amount of labeled data. In such a situation, the focus is on how to use the unlabeled data to improve classification accuracy. In this paper, we propose a semi-supervised learning methodology based on the Support Vector Machine and apply it to case samples of learning disability.
It is observed that about 10% of children enrolled in school have a learning disability. Predicting learning disability in school-age children is a complicated task because it tends to be identified in elementary school, where there is no single sign to look for. Because the information comes in the form of labeled and unlabeled data, using both together with the concept of margins gives better accuracy in predicting learning disability in children.
Keywords: Support Vector Machine, Learning Disability, Semi-Supervised Learning, Hyperplane.
I. INTRODUCTION
Learning refers to the process of inferring common rules by surveying examples. For instance, a child can learn what a ‘car’ is just by being shown examples of objects that are cars and objects that are not cars. The child need not be told any rules about what makes an object a car; the child can simply learn the model ‘car’ by observing examples. As each object comes with the label that defines it, its examples (data or samples) fall in the supervised learning category.
Semi-supervised learning (SSL) is a class of machine learning techniques that make use of both labeled and unlabeled samples for training. SSL falls between unsupervised learning and supervised learning. Because SSL requires less human effort and claims to give better accuracy, it is of great interest both in theory and in practice. Computational models are needed because, in classification, SSL arises when a large number of unlabeled samples is available with only a small
number of labeled samples. In such a situation, the focus is on how to use the unlabeled samples to improve classification accuracy. As classification is a data mining technique used to predict group membership for data instances, this work aims at novel large-margin SSL methodologies that use grouping information from unlabeled samples as a form of regularization controlling the interplay between labeled and unlabeled samples.
The working of SSL is demonstrated on a data set of Learning Disability (LD) cases. Learning disability refers to a neurobiological disorder which affects a person's brain and interferes with a person's ability to think and remember [1]. The causes that lead to LD are maturational delay, some unexplained disorder of the nervous system, and injuries before birth or in early childhood. Children born prematurely and children who had medical problems soon after birth can also develop LD [2]. LD can be broadly classified into three types: difficulties in learning to read (Dyslexia), to write (Dysgraphia), or to do simple mathematical calculations (Dyscalculia) [3].
The term "specific learning disability" means a disorder in one or more of the basic psychological processes involved in understanding or in using language, spoken or written. This may manifest itself in an imperfect ability to listen, speak, read, write, spell, or do mathematical calculations. The term includes such conditions as perceptual handicaps, brain injury, minimal brain dysfunction, dyslexia, and developmental aphasia. The term does not include children who have learning disabilities which are primarily the result of visual, hearing, or motor handicaps, mental retardation, emotional disturbance, or of environmental, cultural, or economic disadvantage [4].
LD cannot be cured completely by medication. Children with LD go through remedial study to help them keep pace with non-LD children of their age. No universally accepted method exists for detecting LD.
This paper proposes a model for the diagnosis and classification of LD. Section II explores in detail different computational methods and models applied for SSL. Having explored these approaches, we find that there is still room for new ways of approaching the problem. Section III discusses the Support Vector Machine concept designed to classify LD. Section IV gives the implementation requirements of the system, and Sections V and VI discuss the results and future objectives respectively.
II. TAXONOMY
Lately, several SSL methods have been introduced [5], and comprehensive reviews of SSL algorithms are given in [6]. One of the popular approaches to SSL is based on a weighted graph [7], in which labeled and unlabeled points form the vertices and the edge weights represent the similarities between pairs of data points. A function is then used to label the unlabeled points on the graph. The techniques for finding appropriate weights and selecting the labeling function may differ.
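As a rough illustration of the graph-based idea, the sketch below propagates labels over a small weighted graph by repeatedly replacing each unlabeled vertex's score with the weighted average of its neighbours' scores, while labeled vertices stay fixed. The graph, its weights, and the name `propagate_labels` are hypothetical, and this is only one simple choice of labeling function, not the method of any specific cited paper.

```python
def propagate_labels(weights, labels, iterations=100):
    """Iteratively average neighbour scores; labeled vertices stay clamped.

    weights: symmetric matrix (list of lists) of edge weights (similarities).
    labels:  list with +1 / -1 for labeled vertices and None for unlabeled ones.
    Returns real-valued scores; the sign of a score is the predicted label.
    """
    n = len(labels)
    scores = [float(y) if y is not None else 0.0 for y in labels]
    for _ in range(iterations):
        updated = scores[:]
        for i in range(n):
            if labels[i] is not None:
                continue  # keep the known label fixed
            total = sum(weights[i])
            if total > 0:
                updated[i] = sum(weights[i][j] * scores[j] for j in range(n)) / total
        scores = updated
    return scores


# Hypothetical chain of 5 vertices; the edge between vertices 1 and 2 is weak.
W = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
known = [+1, None, None, None, -1]  # only the two end vertices are labeled

scores = propagate_labels(W, known)
predicted = [+1 if s >= 0 else -1 for s in scores]
print(predicted)  # [1, 1, -1, -1, -1]
```

The label boundary falls on the weak edge, which is the behaviour one would expect when edge weights encode similarity.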
Many of these graph-based methods, such as [8, 9, 10, 11], presume a transductive setting, in which the learner needs to observe the unlabeled (testing) data while training. As a result, these transductive algorithms must be retrained every time a test sample is to be classified. Hence transductive algorithms may not satisfy the run-time requirements of many real-world applications, including computer-aided diagnosis applications where new patient cases need to be classified in real time as part of the physician's workflow.
Apart from the graph-based methods, T. Joachims [12] introduced an SVM-based semi-supervised algorithm (TSVM), in which the labels of the unlabeled points are initialized with the predictions of an SVM classifier trained on the labeled data. The labels of the unlabeled points are then altered for as long as the margin improves. However, this approach may converge to a locally optimal solution, and it can be comparatively time consuming.
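The general pattern of starting from a classifier trained on labeled data and then letting guessed labels for the unlabeled points move the boundary can be shown with a deliberately simplified one-dimensional sketch. This is plain self-training with a midpoint threshold, not Joachims' TSVM; the data and the names `fit_threshold` and `self_train` are hypothetical.

```python
def fit_threshold(points, labels):
    """1-D maximum-margin threshold between two classes (assumed separable)."""
    highest_neg = max(x for x, y in zip(points, labels) if y == -1)
    lowest_pos = min(x for x, y in zip(points, labels) if y == +1)
    return (highest_neg + lowest_pos) / 2.0


def self_train(labeled_x, labeled_y, unlabeled_x, max_rounds=10):
    """Guess labels for the unlabeled points, refit, and repeat until stable."""
    threshold = fit_threshold(labeled_x, labeled_y)
    for _ in range(max_rounds):
        guessed = [+1 if x > threshold else -1 for x in unlabeled_x]
        new_threshold = fit_threshold(labeled_x + unlabeled_x, labeled_y + guessed)
        if new_threshold == threshold:
            break  # the guessed labels no longer change the boundary
        threshold = new_threshold
    return threshold


labeled_x, labeled_y = [0.0, 10.0], [-1, +1]
unlabeled_x = [1.0, 2.0, 3.0, 4.0, 8.0, 9.0]

supervised_only = fit_threshold(labeled_x, labeled_y)
with_unlabeled = self_train(labeled_x, labeled_y, unlabeled_x)
print(supervised_only, with_unlabeled)  # 5.0 6.0
```

With the unlabeled points included, the boundary moves into the gap between 4 and 8, where no data lie.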
Bennett et al. [13] introduced a mixed integer programming (MIP) formulation that results in inductive classifiers (i.e., the algorithm produces a classifier that can be used directly to classify new samples without retraining). Nevertheless, this method needs a complex optimization solver and is not feasible when the unlabeled set is large.
Apart from these, there are some methods that attempt to find efficient approximate solutions to the MIP formulation [14]. The drawback of these formulations is that they converge to a local minimum, which may not be a sufficiently “good” solution.
A major open problem in SSL is scalability. Current semi-supervised learning methods have not yet handled large amounts of data. The complexity of many elegant graph-based methods is close to O(n^3). Speed-up improvements have been proposed by many researchers, but their effectiveness has yet to be proven on large real-world problems [15].
III. CLASSIFYING LEARNING DISABILITY DATA SAMPLE
LD is mostly detected using the Wechsler Intelligence Scale for Children (WISC) test [16], conducted under the supervision of special educators and with the observations of parents and teachers. In this context, a computational approach to detecting LD is quite significant.
Although states vary considerably in the IQ and achievement criteria used to designate a
child as LD, discrepancy is used in either the definition and/or criteria by virtually all states, with the
use of an IQ test to establish “aptitude” equally common.
The IQ-achievement discrepancy criterion is the most controversial and best-studied
component of the central definition of LD. From a classification perspective, it is a hypothesis
that children with poor achievement below a level predicted by an IQ score are different from
children with poor achievement consistent with their IQ score. IQ-discrepant children with LD have
been proposed to differ from low achievers who are not IQ discrepant on several dimensions,
including neurological integrity, cognitive characteristics, response to intervention, prognosis,
gender, and the heritability of LD.
A solution to this is a form of knowledge representation suitable for notions that cannot be defined precisely but depend upon their contexts. The soft computing technique called Support Vector Machine provides an alternative way to represent linguistic and subjective attributes of the real world in computing, and it can deal with labeled and unlabeled data together. In this circumstance, a computational approach to classifying LD is quite significant.
A. Collection of Exhaustive Parameters
A curriculum-based test was designed with respect to the syllabus of primary-level school
going children. This test was conducted in schools for collecting LD datasets for testing. Historic
data for LD cases were collected from LD Clinics of Government hospitals where the tests were
conducted in real-time medical environments. The system was fed with 11 input units which
correspond to 11 different sections of the curriculum-based test.
Table-1 shows the initial 11 inputs corresponding to the curriculum-based test. Column 1 gives the name of the parameter, column 2 the total marks allocated to that section, and column 3 the category of LD the section corresponds to. The dataset consists of 340 cases of LD children. The system was trained using 70% of the data items, and the remaining 30% was used to test it [17, 18, 19, 20].
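The 70/30 split described above could be sketched as follows. The function name, the fixed random seed, and the integer case identifiers are assumptions made for illustration; the paper does not say how cases were assigned to the two sets.

```python
import random


def split_cases(case_ids, train_percent=70, seed=0):
    """Shuffle case identifiers and split them into training and test lists."""
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100  # integer arithmetic, no rounding issues
    return shuffled[:cut], shuffled[cut:]


# 340 cases, as stated in the text; the identifiers 0..339 are placeholders.
train_ids, test_ids = split_cases(range(340))
print(len(train_ids), len(test_ids))  # 238 102
```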
Input Parameter    Marks    Category of LD
Essay              10       Dysgraphia
Reading            10       Dyslexia
Comprehension      10       Dyslexia, Dysgraphia
Spelling           10       Dysgraphia
Perception         10       Dyslexia
Solve              10       Dysgraphia
Word Problem       10       Dyscalculia, Dyslexia
Mental Sums        10       Dyscalculia
Time               10       Dyscalculia
Calendar           05       Dyscalculia
Money              05       Dyscalculia
Table-1: Parameters and marks of the curriculum-based test
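For readers who want to work with Table-1 programmatically, one possible encoding is shown below. The names `SECTIONS`, `TOTAL_MARKS`, and `sections_for` are hypothetical; the marks and categories are copied from the table.

```python
# (section name, maximum marks, LD categories) copied from Table-1.
SECTIONS = [
    ("Essay", 10, ("Dysgraphia",)),
    ("Reading", 10, ("Dyslexia",)),
    ("Comprehension", 10, ("Dyslexia", "Dysgraphia")),
    ("Spelling", 10, ("Dysgraphia",)),
    ("Perception", 10, ("Dyslexia",)),
    ("Solve", 10, ("Dysgraphia",)),
    ("Word Problem", 10, ("Dyscalculia", "Dyslexia")),
    ("Mental Sums", 10, ("Dyscalculia",)),
    ("Time", 10, ("Dyscalculia",)),
    ("Calendar", 5, ("Dyscalculia",)),
    ("Money", 5, ("Dyscalculia",)),
]

TOTAL_MARKS = sum(marks for _, marks, _ in SECTIONS)


def sections_for(category):
    """Return the names of the test sections associated with an LD category."""
    return [name for name, _, cats in SECTIONS if category in cats]


print(TOTAL_MARKS)  # 100
print(sections_for("Dyscalculia"))
# ['Word Problem', 'Mental Sums', 'Time', 'Calendar', 'Money']
```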
B. Support Vector Machine
Vladimir Vapnik invented the Support Vector Machine in 1979 [21]. The Support Vector Machine algorithm is based on statistical learning theory. It is a method for the classification of both linear and non-linear data.
The fundamental idea behind the SVM is to map the original data into a high-dimensional feature space through a non-linear mapping function and to construct an optimal hyperplane in the new space [22].
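A toy example may make the mapping idea concrete. The one-dimensional points below cannot be separated by a single threshold, but after the hypothetical mapping phi(x) = (x, x^2) a hyperplane in the new space separates them. The data, the map, and the hand-picked hyperplane are invented for illustration and are not taken from the LD dataset.

```python
def phi(x):
    """Map a scalar to the 2-D feature vector (x, x^2)."""
    return (x, x * x)


# Inner points belong to class -1, outer points to class +1, so no single
# threshold on x separates the classes.
points = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
labels = [+1, +1, -1, -1, +1, +1]

# In feature space the hyperplane w . z + b = 0 with w = (0, 1), b = -2.5
# (that is, x^2 = 2.5) separates the two classes.
w, b = (0.0, 1.0), -2.5


def predict(x):
    z = phi(x)
    score = w[0] * z[0] + w[1] * z[1] + b
    return +1 if score >= 0 else -1


predictions = [predict(x) for x in points]
print(predictions == labels)  # True
```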
SVM can be applied to both classification and regression. In the case of classification, an optimal hyperplane is found that separates the data into two classes, whereas in the case of regression a hyperplane is constructed that lies close to as many points as possible [23, 24]. Separating the classes with a large margin minimizes a bound on the expected generalization error. A minimum generalization error means that when new examples arrive for classification, the chance of making an error in the prediction based on the learned classifier should be minimal [25]. Such a classifier is one that achieves the maximum separation margin between the classes. The two planes parallel to the classifier that pass through one or more points in the data set are called bounding planes [26].
SVMs select a small number of critical boundary instances, called support vectors, from each class and build a linear discriminant function that separates them as widely as possible [15]. The points in the dataset falling on the bounding planes are the support vectors.
The SVM algorithm transforms the original data into a higher dimension, where it can find a hyperplane separating the data using essential training tuples called support vectors [27]. If the training vectors are separated without errors by an optimal hyperplane, the expected error rate on a test sample is bounded by the ratio of the expected number of support vectors to the number of training vectors [28]. Since this ratio is independent of the dimension of the problem, if one can find a small set of support vectors, good generalization is guaranteed.
Figure-1: Maximum-margin hyperplane and margins for an SVM trained with samples from two
classes. Samples on the margin are called the support vectors
The overall aim is to generalize well to test data. This is obtained by introducing a
separating hyperplane, which must maximize the margin between the two classes; this is known as
the optimum separating hyperplane [30].
Assume each example consists of m data points (x1, ..., xm) followed by a label which, in the two-class classification we consider later, is +1 or -1, with -1 representing one state and +1 representing the other. The two classes are then separated by an optimum hyperplane, illustrated in Figure-1, that maximizes the distance between the closest +1 and -1 points, which are known as support vectors [29]. The right-hand side of the separating hyperplane represents the +1 class and the left-hand side represents the -1 class. This classification divides the two separate classes, which are generated from training examples.
C. Algorithm
We consider data points of the form $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $y_i \in \{+1, -1\}$ is a constant denoting the class to which the point $x_i$ belongs and $n$ is the number of samples.
i. Each $x_i$ is a $p$-dimensional real vector. Scaling is important to guard against variables (attributes) with larger variance. We can view this training data by means of the dividing (or separating) hyperplane, which takes the form
$w \cdot x + b = 0$    (1)
ii. Here $b$ is a scalar and $w$ is a $p$-dimensional vector. The vector $w$ points perpendicular to the separating hyperplane. Adding the offset parameter $b$ allows us to increase the margin; in the absence of $b$, the hyperplane is forced to pass through the origin, restricting the solution. As we are interested in the maximum margin, we are interested in the SVM and the parallel hyperplanes, which can be described by the equations $w \cdot x + b = 1$ and $w \cdot x + b = -1$.
iii. If the training data are linearly separable, we can select these hyperplanes so that there are no points between them and then try to maximize their distance. By geometry, the distance between the hyperplanes is $2 / \|w\|$, so we want to minimize $\|w\|$. To exclude data points from the margin, we need to ensure that for all $i$ either
$w \cdot x_i + b \geq 1$ or $w \cdot x_i + b \leq -1$
iv. This can be written as
$y_i (w \cdot x_i + b) \geq 1, \quad 1 \leq i \leq n$    (2)
v. Samples along the parallel hyperplanes are called Support Vectors (SVs). A separating hyperplane with the largest margin, $M = 2 / \|w\|$, is specified by the support vectors, that is, the training data points closest to it, which satisfy
$y_j (w^T x_j + b) = 1$ for every support vector $x_j$    (3)
vi. The Optimal Canonical Hyperplane (OCH) is a canonical hyperplane having maximum margin.
vii. For all the data, the OCH should satisfy the constraints
$y_i (w^T x_i + b) \geq 1, \quad i = 1, 2, \ldots, l$    (4)
where $l$ is the number of training data points.
viii. In order to find the optimal separating hyperplane having maximal margin, a learning machine should minimize $\|w\|^2$ subject to the inequality constraints
$y_i (w^T x_i + b) \geq 1, \quad i = 1, 2, \ldots, l$
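The quantities used in the derivation above can be checked numerically. The sketch below uses the w . x + b convention of equation (1), takes a hypothetical two-dimensional data set and a hand-picked canonical hyperplane (w, b), verifies the margin constraints, computes the margin 2/||w||, and lists the points that meet the constraint with equality (the support vectors). It does not solve the optimization problem, and the helper names are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def functional_margins(X, y, w, b):
    """Return y_i (w . x_i + b) for every training point."""
    return [yi * (dot(w, xi) + b) for xi, yi in zip(X, y)]


def geometric_margin(w):
    """Distance between the hyperplanes w . x + b = +1 and w . x + b = -1."""
    return 2.0 / math.sqrt(dot(w, w))


# Hypothetical, linearly separable toy data.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [+1, +1, -1, -1]

# Hand-picked canonical hyperplane: x1 + x2 - 2 = 0, scaled so that the
# closest points of each class give y_i (w . x_i + b) = 1.
w, b = (0.5, 0.5), -1.0

margins = functional_margins(X, y, w, b)
feasible = all(m >= 1.0 for m in margins)
support_vectors = [xi for xi, m in zip(X, margins) if math.isclose(m, 1.0)]

print(margins)          # [1.0, 2.0, 1.0, 1.5]
print(feasible)         # True
print(support_vectors)  # [(2.0, 2.0), (0.0, 0.0)]
```

Here the margin equals the distance between the two support vectors, so no other hyperplane could separate this toy data with a wider margin.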
D. Steps
i. The data is in the form of pairs, each consisting of an input object and a desired output value.
ii. Take the input marks of the 11 test sections for each student.
iii. Calculate the weighted sum.
iv. Plot the graph as Student Name Index vs. Marks (weighted sum).
v. Based on the data plotted on the graph, find the maximum-margin hyperplane that divides the points having class -1 from those having class +1.
vi. After getting the hyperplane, check:
IF the point is above the hyperplane
THEN assign the label LD->NO
ELSE IF the point is below the hyperplane
THEN assign the label LD->YES.
vii. Compare this predicted label with the actual label and calculate the accuracy.
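The steps above can be sketched in a few lines. Because the weighted sum reduces each student to a single number, the maximum-margin hyperplane for separable data is simply the midpoint between the lowest non-LD score and the highest LD score. The marks, the equal weights, and all function names below are hypothetical; the paper does not give the weights it used.

```python
def weighted_sum(marks, weights):
    return sum(m * w for m, w in zip(marks, weights))


def max_margin_threshold(scores, labels):
    """1-D maximum-margin separator; +1 = non-LD (above), -1 = LD (below)."""
    lowest_pos = min(s for s, y in zip(scores, labels) if y == +1)
    highest_neg = max(s for s, y in zip(scores, labels) if y == -1)
    if highest_neg >= lowest_pos:
        raise ValueError("classes are not separable by a single threshold")
    return (lowest_pos + highest_neg) / 2.0


def predict(score, threshold):
    return +1 if score > threshold else -1  # +1: LD->NO, -1: LD->YES


def accuracy(predicted, actual):
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


weights = [1.0] * 11  # hypothetical: every section weighted equally

# Hypothetical marks for the 11 sections (nine out of 10, two out of 5).
train_marks = [
    [9, 8, 9, 9, 8, 9, 8, 9, 9, 5, 4],
    [7, 8, 7, 6, 7, 8, 7, 7, 6, 4, 4],
    [3, 4, 2, 3, 4, 3, 2, 3, 4, 2, 1],
    [5, 4, 5, 4, 4, 5, 4, 5, 4, 2, 3],
]
train_labels = [+1, +1, -1, -1]

test_marks = [
    [8, 7, 8, 7, 8, 7, 8, 7, 8, 4, 3],
    [6, 5, 6, 5, 6, 5, 6, 5, 6, 3, 3],
    [6, 6, 6, 6, 6, 6, 6, 6, 6, 3, 3],
    [2, 3, 2, 3, 2, 3, 2, 3, 2, 1, 1],
]
test_labels = [+1, -1, -1, -1]

train_scores = [weighted_sum(m, weights) for m in train_marks]
threshold = max_margin_threshold(train_scores, train_labels)
predictions = [predict(weighted_sum(m, weights), threshold) for m in test_marks]

print(threshold)                          # 58.0
print(accuracy(predictions, test_labels)) # 75.0
```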
IV. IMPLEMENTATION
The system is implemented in Java. The LD data collected is stored in Excel sheets. The experiments were conducted on a workstation with an Intel Pentium(R) 4 CPU at 3.06 GHz and 2 GB of RAM, running Microsoft Windows 7 Home Edition, Version 2002 with Service Pack 3.
V. RESULT AND DISCUSSION
SVM belongs to the group of supervised learning algorithms, in which the learning machine is given a set of examples with associated labels. As in the case of decision trees, the examples are in the form of attribute vectors. This case study was carried out on more than 300 real cases whose attributes, which represent the symptoms of LD, take binary values; more work needs to be carried out on quantitative data, as that is an important part of any data set. In comparison with our other studies in this field using LVQ and the Single Layer Perceptron [17, 18], we found that SVM is more suitable for attribute selection, while a decision tree would probably be better suited to classification.
Graph 1: SVM classifier graph of the entire dataset. The margin represents the border: points above the line belong to the non-LD group and points below the line belong to the LD group of children.
Correctly classified instances: 84.615%
Incorrectly classified instances: 15.385%
Hence the system accuracy is 84.615%.
VI. FUTURE WORK OBJECTIVE
We want to propose algorithms that overcome the following limitations of SSL.
i. The first limitation is that labeled instances are very costly to obtain. This is mainly observed in the medical domain, where a physician must evaluate cases and assign labels, which is time consuming. We want to propose methods that exploit the relationship between labeled and unlabeled samples and then incorporate the information hidden in the unlabeled samples into a learning algorithm such as SVM.
ii. The second limitation is that classification techniques assume that samples from the training set and the test set are independent and identically distributed (i.i.d.), that is, each random variable has the same probability distribution as the others and all are mutually independent. In practical applications this assumption may not be realistic, as sub-groups of samples can have a high degree of correlation among both their features and their labels. We therefore intend to introduce approaches that relax the i.i.d. assumption in learning algorithms such as SVM.
iii. The last limitation that we want to study is that most classification techniques are designed for binary classification, whereas many applications require more than two classes. SVM, for instance, is designed for binary classification. We are therefore interested in studying algorithms in which SVM learning can be extended to multiple classes, with two goals in mind: efficiency in terms of training and testing times, and increased accuracy from information hidden in inter-class relationships.
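One common way to extend a binary classifier to several classes is the one-vs-rest scheme: keep one binary scorer per class and predict the class whose scorer gives the largest value. The sketch below shows only the prediction step, with hand-picked hypothetical weights and features standing in for trained SVMs; it illustrates the general idea and is not the approach the authors have committed to.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def one_vs_rest_predict(x, scorers):
    """scorers: dict mapping class name -> (w, b) of a binary linear scorer."""
    return max(scorers, key=lambda c: dot(scorers[c][0], x) + scorers[c][1])


# Hypothetical features: (reading deficit, writing deficit, arithmetic deficit).
scorers = {
    "Dyslexia": ((1.0, 0.0, 0.0), 0.0),
    "Dysgraphia": ((0.0, 1.0, 0.0), 0.0),
    "Dyscalculia": ((0.0, 0.0, 1.0), 0.0),
}

print(one_vs_rest_predict((0.2, 0.9, 0.1), scorers))  # Dysgraphia
```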
REFERENCES
[1]. S.A Kirk, Educating Exceptional Children Book, Wadsworth Publishing, ISBN:
0547124139.
[2]. Lisa L. Weyandt; "The physiological bases of cognitive and behavioral disorders";
Blausen Medical Communications, United States.
[3]. Lerner, Janet W.; "Learning disabilities: theories, diagnosis, and teaching strategies";
Boston: Houghton Mifflin; ISBN 0395961149.
[4]. Fletcher, Francis, Rourke, Shaywitz & Shaywitz; "Classification of learning disabilities:
an evidence based evaluation"; 1993.
[5]. Belkin, Matveeva, and Niyogi. Regularization and semi-supervised learning on large
graphs. Proceedings of Workshop on Computational Learning Theory, Morgan Kaufmann,
2004.
[6]. M. Seeger. ‘Learning with labeled and unlabeled data’; technical report, Institute for
ANC, Edinburgh, UK, 2000.
[7]. A. Blum and S. Chawla, ‘Learning from labeled and unlabeled data using graph mincuts’;
in ICML ’01: Proceedings of the Eighteenth International Conference on Machine Learning,
pages 19–26. Morgan Kaufmann, 2001.
[8]. A. Corduneanu and T. Jaakkola. Distributed information regularization on graphs. In
NIPS ’04: Advances in Neural Information Processing Systems, 2004.
[9]. D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf. Learning with local
and global consistency. In NIPS ’03: Advances in Neural Information Processing Systems.
MIT Press, 2003.
[10]. D. Zhou and B. Schölkopf. Learning from labeled and unlabeled data using random walks.
In German Pattern Recognition Symposium, pages 237–244, 2004.
[11]. X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi-supervised learning using Gaussian fields
and harmonic functions. In ICML ’03: Proceedings of the Twentieth International
Conference on Machine Learning, pages 912–919, 2003.
[12]. T. Joachims. Transductive inference for text classification using support vector machines.
In ICML ’99: Proceedings of The Sixteenth International Conference on Machine Learning,
pages 200–209. Morgan Kaufmann, 1999.
[13]. K. P. Bennett and A. Demiriz. Semi-supervised support vector machines. In NIPS’98:
Advances in Neural Information Processing Systems 10, pages 368–374. MIT Press, 1998.
[14]. G. Fung and O. L. Mangasarian. Semi-supervised support vector machines for unlabeled
data classification. Optimization Methods and Software, 15:29–44, 2001.
[15]. Xiaojin Zhu; ‘Semi-Supervised Learning Literature Survey’; Computer Sciences TR 1530
University of Wisconsin – Madison, July 19, 2008.
[16]. Kaplan, Robert M.; Saccuzzo, Dennis P. (2009); ‘Psychological Testing: Principles,
Applications, and Issues (Seventh ed.)’, Belmont (CA): Wadsworth. p. 262 (citing Wechsler
(1958) The Measurement and Appraisal of Adult Intelligence), ISBN 978-0-495-09555-2.
[17]. Kavita Jain, Pooja Manghirmalani, Jyotshna Dongardive, Siby Abraham; ‘Computational
Diagnosis of Learning Disability’; International Journal of Recent Trends in Engineering,
Vol 2, No. 3, pages 64-44, ACEEE 2009.
[18]. Pooja Manghirmalani, Zenobia Panthaky, Kavita Jain; ‘Learning Disability Diagnosis and
Classification- a soft Computing Approach’; IEEE World Congress on Information and
Communication Technologies, pages 479 – 484, 2011.
[19]. Pooja Manghirmalani, Darshana More, Kavita Jain; ‘A Fuzzy Approach to Classify Learning
Disability’; International Journal of Advanced Research in Artificial Intelligence – IJARAI,
pages 1-7, 2012.
[20]. Kavita Jain, Pooja Manghirmalani Mishra, Sushil Kulkarni. ‘A Neuro-Fuzzy System to
Diagnose Learning Disability’; IEEE International Conference on Radar, Communication
and Computing (ICRCC 2012).
[21]. N. Cristianini and J. Shawe-Taylor; ‘An Introduction to Support Vector Machines’;
Cambridge University Press, 2000; ISBN: 0 521780195
[22]. Asa Ben-Hur, Jason Weston; ‘A User's Guide to Support Vector Machines’; O.Carugo,
F. Eisenhaber (eds.), Data Mining Techniques for the Life Sciences, Methods in Molecular
Biology 609, DOI 10.1007/978-1-60327-241-4_13, Humana Press, a part of Springer
Science + Business Media, LLC 2010.
[23]. Soman K.P., Loganathan R., Ajay V, ‘Machine Learning with SVM and other Kernel
Methods’; New Delhi, PHI Learning Pvt. Ltd, ISBN-978-81-203-3435-9, 2009
[24]. Stuart Russell, Peter Norvig; ‘Artificial Intelligence – A Modern Approach’; Pearson
Prentice Hall, 2009.
[25]. Anshu Bharadwaj; ‘Support Vector Machines’; Chapter, Indian Agricultural Statistics
Research Institute, New Delhi, India.
[26]. Radhika Y, Shashi M., ‘Atmospheric Temperature Prediction using Support Vector
Machines’, International Journal of Computer Theory and Engineering, Vol. 1, No.1,
April 2009, 1793-8201 55-58.
[27]. Chapelle, O., Zien, A., & Schölkopf, B. (Eds.); ‘Semi-supervised learning’; MIT Press,
2006.
[28]. Chawla, N. V., & Karakoulas, G; ‘Learning from labeled and unlabeled data: An
empirical study across techniques and domains’; Journal of Artificial Intelligence
Research, 23, 331–366, 2005.
[29]. De Bie, T., & Cristianini, N; ‘Semi-supervised learning using semi definite
programming’; O. Chapelle, B. Schölkopf and A. Zien (Eds.), Semi supervised learning.
Cambridge, Massachusetts: MIT Press. 48, 2006.
[30]. Fung, G., & Mangasarian, O; ‘Semi-supervised support vector machines for unlabeled data
classification’; (Technical Report 99-05). Data Mining Institute, University of Wisconsin
Madison, 1999.
[31]. S. Aruna, L.V. Nandakishore and Dr S.P. Rajagopalan, “A Novel Lns Semi Supervised
Learning Algorithm for Detecting Breast Cancer”, International Journal of Computer
Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 44 - 53, ISSN Print:
0976 – 6367, ISSN Online: 0976 – 6375.
[32]. Jagadanand G, Kiran Y M, Saly George and Jeevamma Jacob, “Single Chip Implementation
of Support Vector Machine Based Bi-Classifier”, International Journal of Computer
Engineering & Technology (IJCET), Volume 3, Issue 2, 2012, pp. 74 - 84, ISSN Print:
0976 – 6367, ISSN Online: 0976 – 6375.