1. A Comparative Study on SVM, TSVM and SVM-kNN in Text Categorization. By: Sy-Quan Nguyen, Minh-Hoang Nguyen, Phi-Dung Tran. Instructors: Prof. Quang-Thuy Ha, Tuan-Quang Nguyen
48. The SVM-kNN training loop: (1) train SVM1 on the initial training set; (2) use SVM1 to predict labels for all the remaining unlabeled data; (3) from this new testing set, choose the 2n vectors closest to the decision boundary (the boundary vectors); (4) give the boundary vectors new labels using kNN; (5) put them in the training set and retrain, obtaining a new SVM2; (6) repeat until the number of training examples reaches m times that of the initial training set.
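The loop above can be sketched as follows. This is a minimal illustration assuming scikit-learn; the dataset, the variable names (`n_boundary`, `target_size`), and the choice of picking the 2n vectors with the smallest absolute decision-function value are illustrative assumptions, not details from the original slides.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
labeled = rng.choice(len(X), size=30, replace=False)
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)
X_train, y_train = X[labeled], y[labeled]

n_boundary = 10    # the "2n" boundary vectors chosen per round (assumed value)
target_size = 150  # stop when the training set is m times its initial size

while len(X_train) < target_size and len(unlabeled) > 0:
    svm1 = SVC(kernel="rbf").fit(X_train, y_train)          # weak SVM1
    margins = np.abs(svm1.decision_function(X[unlabeled]))  # distance to boundary
    picked = unlabeled[np.argsort(margins)[:n_boundary]]    # boundary vectors
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    new_labels = knn.predict(X[picked])                     # relabel with kNN
    X_train = np.vstack([X_train, X[picked]])               # put them in training set
    y_train = np.concatenate([y_train, new_labels])
    unlabeled = np.setdiff1d(unlabeled, picked)

final_svm = SVC(kernel="rbf").fit(X_train, y_train)         # retrained SVM2
```

Each round, the examples the current SVM is least sure about are relabeled by kNN on the trusted training set, so the training set grows with (hopefully) reliable labels.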
This situation arises in many application areas of machine learning.
For instance, in content-based image retrieval, a user usually poses several example images as a query and asks the system to return similar images. In this situation there are many unlabeled examples, i.e., images that exist in the database, but only several labeled examples. Another instance is online web page recommendation. When a user is surfing the Internet, he may occasionally encounter some interesting web pages and may want the system to bring him similarly interesting pages. It would be difficult to require the user to confirm more interesting pages as training examples, because the user may not know where they are. In this instance, too, there are a lot of unlabeled examples but only a few labeled ones. In extreme cases there is only one labeled training example to rely on. If an initial weakly useful predictor cannot be generated from this single example, the above-mentioned SSL techniques cannot be applied. [4]
Because the RBF kernel nonlinearly maps examples into a higher-dimensional space, it can, unlike the linear kernel, handle the situation in which the relation between class labels and attributes is nonlinear.
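A hedged illustration of this point, assuming scikit-learn: on XOR-patterned data the two classes are not linearly separable, so a linear-kernel SVM cannot do much better than chance, while the RBF kernel separates them well. The synthetic data below is an assumed example, not from the paper.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # XOR-style labels: nonlinear in X

# Training accuracy of each kernel on the same nonlinear problem
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)
```

The RBF kernel's implicit feature map lets the SVM draw a nonlinear boundary in the original space, which is exactly what this problem requires.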
This method is very simple: if example x1 has k nearest examples in the feature space and the majority of them have the same label y1, then x1 is assigned label y1. Although the kNN method rests on a limit theorem in theory, during classification it consults only a small number of nearest neighbors, so adopting this method can avoid the problem of imbalanced examples; moreover, kNN depends mainly on the limited number of neighbors around the query point rather than on a global decision boundary.
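The majority-vote rule just described can be written in a few lines. This is a minimal sketch with Euclidean distance; the helper name `knn_predict` and the toy data are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Return the majority label among the k nearest neighbors of x."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each example
    nearest = np.argsort(dists)[:k]              # indices of the k nearest examples
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two examples near the origin (class 0), two near (1, 1) (class 1)
X_train = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 0.9]])
y_train = np.array([0, 0, 1, 1])
```

For a query near the origin, two of its three nearest neighbors carry label 0, so the vote assigns label 0.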
Because the examples located near the decision boundary are easy to misclassify, yet are likely to be support vectors, we call them boundary vectors; we therefore pick out these boundary vectors, whose labels, assigned by the weaker classifier SVM, are fuzzy (uncertain).
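One way the boundary vectors might be picked, assuming a scikit-learn SVM: the unlabeled examples with the smallest absolute decision-function value lie closest to the separating hyperplane, so their SVM-assigned labels are the least reliable. The dataset and the split sizes below are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, random_state=0)
svm = SVC(kernel="linear").fit(X[:50], y[:50])   # weak SVM on the few labeled examples

# |decision_function| is proportional to distance from the separating hyperplane
margins = np.abs(svm.decision_function(X[50:]))
boundary_idx = np.argsort(margins)[:20] + 50     # the "2n" boundary vectors
```

These are exactly the examples whose SVM labels are "fuzzy" and which SVM-kNN hands over to kNN for relabeling.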