### Ai_Project_report

• 1. Indian Institute of Technology Jodhpur. Computer Science and Engineering, Sixth Semester (2015-2016). Machine Learning: Building and comparing various machine learning models to recognize handwritten digits. Team Members: Shrey Maheshwari (ug201314017), Ravi Prakash Gupta (ug201310027). Mentor: Prof. K.R. Chowdhary
• 2. Contents: 1 Introduction; 2 Theory; 3 Implementation (Data Structures and Algorithms); 4 Application; 5 Result; 6 Conclusion
• 3. 1 Introduction The data file contains grayscale images of hand-drawn digits, from zero through nine. Each image is 16 pixels high and 16 pixels wide, for a total of 256 pixels. Each pixel has a single pixel value associated with it, indicating the lightness or darkness of that pixel. Each image is a single-channel image with 8-bit depth, so this pixel value is an integer between 0 and 255, inclusive. We modified the data as follows: value = 1 if the pixel value is greater than 127, and value = 0 otherwise. Previously each pixel value took 8 bits; now each pixel value takes only 1 bit, so one image takes only 256 bits. The data set (train.csv) has 266 columns. The first 256 columns are the pixel values, and the other 10 indicate the label, i.e. the digit that was drawn by the user. We divided our data into two sets: 1. Training data, which comprises 80% of the data. 2. Test data, which comprises 20% of the data. Figure 1 shows the data. Figure 1: Data. The test data set (test.csv) is the same as the training set, except that it does not contain the "label" column.
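The thresholding rule and the 80/20 split described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the project's actual code: the function names `binarize` and `split_train_test`, the random shuffle before splitting, and the synthetic stand-in data are all assumptions (the report does not say whether the rows were shuffled).

```python
import numpy as np


def binarize(pixels, threshold=127):
    """Map 8-bit grayscale values to bits: 1 if value > threshold, else 0."""
    return (np.asarray(pixels) > threshold).astype(np.uint8)


def split_train_test(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows (an assumption) and split them into train/test parts."""
    rows = np.asarray(rows)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(rows))
    cut = int(round(train_fraction * len(rows)))
    return rows[order[:cut]], rows[order[cut:]]


# Synthetic stand-in for the real data: 10 images of 16 x 16 = 256 pixels each.
images = np.random.default_rng(1).integers(0, 256, size=(10, 256))
bits = binarize(images)
train, test = split_train_test(bits)
```

With 10 synthetic rows, `train` holds 8 rows and `test` holds 2, matching the 80%/20% proportions.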
• 4. Figure 2: Visualization of data. Classification is the process of assigning new data to a category based on training data whose categories are known. In this paper, we use a set of human-labeled digit images split into a training set and a test set. A classifier learns from the training images and labels and produces output for the test images. The output is then compared to the test labels to evaluate the classification performance. A good classifier should learn from the training data while still generalizing, so that it remains accurate when identifying the test set.
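Comparing the classifier's output to the test labels can be expressed as a simple accuracy score. The sketch below assumes that predictions and labels are given as plain digit values; the function name `accuracy` is hypothetical, and accuracy is only one possible performance measure.

```python
import numpy as np


def accuracy(predicted, actual):
    """Return the fraction of test examples whose predicted digit is correct."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    if predicted.shape != actual.shape:
        raise ValueError("predicted and actual must have the same shape")
    return float(np.mean(predicted == actual))


# Illustrative values only: three of the four predictions match the labels.
score = accuracy([1, 2, 3, 4], [1, 2, 0, 4])
```

A large gap between training accuracy and test accuracy would suggest that the classifier has not generalized well.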
• 5. 2 Theory The given problem falls under the category of supervised learning. Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). Our problem is a multiclass classification problem. To solve it we used logistic regression. Logistic regression is a regression model where the dependent variable (DV) is categorical. The logistic function is defined as follows: $\sigma(t) = \frac{e^{t}}{e^{t} + 1} = \frac{1}{1 + e^{-t}}$. Figure 3 shows the logistic function. Figure 3: Logistic Function.
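The logistic function can be evaluated directly from its definition. The sketch below is an illustration rather than the project's implementation; the function name `sigmoid` is an assumption. It uses the form 1 / (1 + e^(-t)) for non-negative inputs and the equivalent form e^t / (1 + e^t) for negative inputs, so that the exponential never overflows.

```python
import numpy as np


def sigmoid(t):
    """Evaluate the logistic function element-wise; always returns an array."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = np.empty_like(t)
    non_negative = t >= 0
    # For t >= 0, exp(-t) is at most 1, so 1 / (1 + exp(-t)) is safe.
    out[non_negative] = 1.0 / (1.0 + np.exp(-t[non_negative]))
    # For t < 0, exp(t) is below 1, so exp(t) / (1 + exp(t)) is safe.
    exp_t = np.exp(t[~non_negative])
    out[~non_negative] = exp_t / (1.0 + exp_t)
    return out


values = sigmoid([-2.0, 0.0, 2.0])
```

The output always lies between 0 and 1 and equals 0.5 at t = 0, which is why it can be read as the probability of a class in logistic regression.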