Working With Python
Algorithm Implementations In Python
The algorithms involved in machine learning and data science have two vital types of
implementation:
• Classification
• Regression
We will study and analyze some algorithms from both these types and understand how they
accelerate the process of working with data and drawing important insights from it.
Linear Regression
Linear regression comes under predictive analysis and is used to find the relationship between
two variables. These two variables are the target variable and the predictor variable. The
dependent variable is the target variable and the independent variable is the predictor variable.
Both of these variables are features that already exist in a dataset.
The overall concept of regression is to check two things: does the given group of predictor
variables do a satisfactory job of predicting the dependent variable? And which variables, in
particular, are the real predictors of the dependent variable, and what is their impact on the
outcome variable?
Linear regression is represented by a simple equation:
Y = b*x + c
where Y is the dependent variable, x is the independent (predictor) variable, b is the regression
coefficient (the slope of the line), and c is the intercept (constant).
The Line of Best Fit
The line of best fit is a line which demonstrates the correlation between the observed or actual
values against the predicted ones. After applying the linear regression algorithm to our data, we
use this line to check how close the predicted values are to the actual ones. It helps in reducing
the distance between both those values, also known as the error values or residuals.
These residuals are symbolized by the vertical lines showing the
comparison between the predicted and actual values.
For example, in a plot of weight against age, we can see that the weight of a person increases with an increase in their age.
The line drawn through these points represents our line of best fit, which is also known as the regression line.
For calculating the distance between the line and the points, we need the following formula:
SS(residual) = ∑[h(x) - y]^2
where h(x) is the predicted value and y is the actual value.
The Cost Function
Let us consider an example to understand this case. A sales department of a company is
planning to invest some capital to increase its sales in the next 6 months. But, they couldn't hit
their targets and had to incur some loss. Hence, to minimize that loss, we use the cost function.
This cost function is applied to represent and calculate the error of the model.
Therefore, cost function, J(Θ0, Θ1) = (1/2m) ∑[h(x) - y]^2, where m is the number of rows in the
training set.
Gradient Descent
Gradient Descent is yet another important term, used to find the minimum cost of a
function or an equation. It is one of the most widely used optimization algorithms in machine
learning and deep learning. Based on a convex cost function, this descent makes small tweaks
and changes to its parameters iteratively in order to minimize a given function to a local
minimum if possible.
Gradient Descent can be imagined as climbing down to the bottom of a mountain, instead of
climbing up. This is because it is a minimization technique used to find a local minimum of a
given function.
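To make the cost function and the descent steps concrete, here is a minimal sketch of gradient descent for the hypothesis h(x) = b*x + c. The data is synthetic and the learning rate and iteration count are arbitrary choices for illustration, not values from this tutorial.
# A minimal gradient descent sketch for simple linear regression (illustrative only)
import numpy as np
# Synthetic data: y roughly follows 2*x + 1 with some noise (an assumption)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 1, size=x.shape)
b, c = 0.0, 0.0      # initial slope and intercept
alpha = 0.02         # learning rate
m = len(x)           # number of rows in the training set
for _ in range(5000):
    h = b * x + c                                  # predictions h(x)
    cost = (1 / (2 * m)) * np.sum((h - y) ** 2)    # J(b, c)
    grad_b = (1 / m) * np.sum((h - y) * x)         # partial derivative of J with respect to b
    grad_c = (1 / m) * np.sum(h - y)               # partial derivative of J with respect to c
    b -= alpha * grad_b                            # small iterative tweak to the slope
    c -= alpha * grad_c                            # small iterative tweak to the intercept
print(b, c, cost)    # b and c should end up close to 2 and 1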
Code in Python
# Importing the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Retrieving the dataset
dataset = pd.read_csv('Salary_Data.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
# Splitting the dataset into the training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 1/3, random_state = 0)
# Performing feature scaling
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train.reshape(-1, 1)).ravel()
# Fitting the Simple Linear Regression model to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)
# Test set results prediction (the regressor was trained on the scaled target,
# so y_pred is in scaled units; sc_y.inverse_transform can map it back)
y_pred = regressor.predict(x_test)
Logistic Regression
Logistic regression is a statistical technique that comes under classification rather than regression.
Like all regression techniques, logistic regression falls under predictive analysis.
Logistic regression is used to describe the structure of data and explain the
correlation between a dependent binary variable and one or more nominal independent
variables.
It is favorable for predicting binary outcomes as 1/0 or yes/no or true/false considering the kind
of dataset given and the output required. Logistic regression can also be considered as a
special case of linear regression when the outcome variable is categorical, where we are using
log of odds as the dependent variable. In simple words, it predicts the probability of occurrence
of an event by fitting data to a logit function.
This type of regression can be characterized by probabilities of following events-
Odds = p/(1-p) = probability of event occurring/probability of event not occurring
Ln (odds) = ln (p/(1-p))
Logit (p) = ln (p/(1-p))
Here, p/(1-p) is the odds. If the log of the odds is positive, the probability of
success is always higher than 50%. In a typical logistic model plot, it is
observed that the probability never goes below 0 or above 1.
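The following small sketch shows these quantities and how the logit is inverted by the sigmoid; the probability value 0.8 is an arbitrary example.
# Odds, log-odds (logit) and the logistic (sigmoid) function for a sample probability
import numpy as np
def odds(p):
    return p / (1 - p)                 # probability of occurring / probability of not occurring
def logit(p):
    return np.log(odds(p))             # log of odds
def sigmoid(z):
    return 1 / (1 + np.exp(-z))        # inverse of the logit; always stays between 0 and 1
p = 0.8                                # an arbitrary example probability
print(odds(p))                         # 4.0
print(logit(p))                        # about 1.386
print(sigmoid(logit(p)))               # 0.8, recovering the original probability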
We can check the performance of this regression by testing it through the following parameters.
Akaike Information Criteria- AIC is a measure of fit that penalizes a model for the
number of its model coefficients. Therefore, we always prefer the model with the minimum
AIC value for better results.
Null Deviance- Null deviance represents the response predicted by a model with nothing but the
intercept. The lower the null deviance, the better the model.
Residual Deviance- Residual deviance describes the response predicted by a model when the
independent variables are added. Again, the lower the value, the better the
results.
Confusion Matrix- A confusion matrix is the tabular representation of actual vs predicted values.
It helps in evaluating the performance of a classification model.
                    Predicted Positive       Predicted Negative
Actual Positive     True Positive (TP)       False Negative (FN)
Actual Negative     False Positive (FP)      True Negative (TN)
The accuracy of a model can be calculated by
Accuracy = (TP + TN) / (TP + TN + FP + FN)
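As a quick illustration, the accuracy can be computed directly from the four counts; the numbers below are made up.
# Accuracy from hypothetical confusion-matrix counts (illustrative numbers only)
tp, tn, fp, fn = 50, 35, 10, 5
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)   # 0.85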
ROC curve
The receiver operating characteristic curve, or ROC curve, signifies how well the model can
distinguish between the two classes by plotting the true positive rate against the false positive rate.
A good model will be able to accurately distinguish between the two, whereas a poor model will have
difficulties in differentiating between them.
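A minimal sketch of plotting an ROC curve with scikit-learn is shown below; the labels and predicted probabilities are hypothetical, and any fitted classifier's probability output could be used in their place.
# Plotting an ROC curve from hypothetical labels and predicted probabilities
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score
y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3, 0.7, 0.5]
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print(roc_auc_score(y_true, y_prob))        # area under the curve
plt.plot(fpr, tpr, label = 'model')
plt.plot([0, 1], [0, 1], linestyle = '--', label = 'random guess')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend()
plt.show()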
Code in Python
# Importing the necessary libraries
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import scipy
from scipy import stats
from scipy.stats import spearmanr
# Retrieving the dataset
t1= 'C:/Users/ml/datasets/train.csv'
train=pd.read_csv(t1)
t2= 'C:/Users/ml/datasets/test.csv'
test=pd.read_csv(t2)
x= train.iloc[:, [2,4,5,6,7,9]].values
y= train.iloc[:, 1].values
# Splitting the dataset into the training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
# Performing feature scaling
from sklearn.preprocessing import StandardScaler
sc_x=StandardScaler()
x_train=sc_x.fit_transform(x_train)
x_test=sc_x.transform(x_test)
# Fitting the Logistic Regression model to the training set
from sklearn.linear_model import LogisticRegression
classifier= LogisticRegression(random_state = 0)
classifier.fit(x_train,y_train)
# Test set results prediction
y_pred=classifier.predict(x_test)
# Creating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm=confusion_matrix(y_test, y_pred)
Support Vector Machines
Support Vector Machines(SVMs) are used to find the best hyperplane in an array of data points
that will best suit the results in a supervised learning environment. Suppose we have got two
columns x and y and they consist of some random data-points. These points are plotted in a
two-dimensional plane. Our motive is to derive a line that is going to separate these points.
The line that separates these points, whether horizontally, vertically or diagonally, is known as a
hyperplane. The distance between a candidate hyperplane and the nearest data points is used to
determine the appropriate hyperplane for classifying these points. This distance
is known as the margin.
SVM supports both regression and classification tasks and can tackle multiple continuous and
categorical variables. For categorical variables, a dummy variable is created with case values
as either 0 or 1. Thus, a categorical dependent variable consisting of three levels, say A, B, C, is
represented by a set of three dummy variables (a short encoding sketch follows the list below):
A: {1 0 0}
B: {0 1 0}
C: {0 0 1}
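A small pandas sketch of this encoding, assuming a hypothetical column named level with the three values A, B and C:
# One-hot (dummy) encoding of a three-level categorical column (hypothetical data)
import pandas as pd
df = pd.DataFrame({'level': ['A', 'B', 'C', 'B', 'A']})
dummies = pd.get_dummies(df['level']).astype(int)   # one 0/1 column per level
print(dummies)   # e.g. the row for level A reads A=1, B=0, C=0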
Now that we know what a hyperplane is, the question is how to identify the right one?
We can reach a conclusion by considering the following cases.
CASE 1
There are three hyperplanes in our n-dimensional space, x1, x2 and x3. We need to
identify the right hyperplane among the three. x1 and x3 are traversing between the points
while x2 is separating these points in a perfect fashion. Hence, x2 is our ideal hyperplane.
CASE 2
The three hyperplanes x1, x2 and x3 are segregating the points quite well as they are all vertical
and parallel to each other. So, how can we identify the right hyperplane in this situation? x1 and
x2 are planes that are nearer to the points, which means their margins are quite small compared
to x3. Hence, x3 has the larger margin and is therefore the ideal hyperplane.
CASE 3
In the third case, all the points are residing very close to each other in the center of the plane
with little or no room for the hyperplane to pass between them. What can we do in such a case?
This problem can be dealt with by adding a third axis, the Z-axis! If we define z as x^2 + y^2, all the
values for z will be positive because z is the sum of the squares of x and y. Deriving such a
transformation by hand is not always practical, and this is where the kernel trick comes into play
for such scenarios. It converts the not-so-separable problem (the scenario discussed above) into a
separable problem. These functions are called kernels. They are useful in non-linear separation
problems. Simply put, a kernel does some extremely complex data transformations implicitly and
then finds the way to separate the data based on the labels or outputs that have been defined.
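Below is a small sketch of this idea on a toy, made-up "ring" dataset: the manual lift z = x^2 + y^2 makes the classes separable by a simple threshold, and an RBF kernel in scikit-learn's SVC achieves a similar effect implicitly.
# Kernel-trick sketch on toy data: an inner cluster (class 0) inside an outer ring (class 1)
import numpy as np
from sklearn.svm import SVC
rng = np.random.default_rng(0)
radius = np.concatenate([rng.uniform(0, 1, 50), rng.uniform(2, 3, 50)])
angle = rng.uniform(0, 2 * np.pi, 100)
points = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
labels = np.concatenate([np.zeros(50), np.ones(50)])
# Manual lift: z = x^2 + y^2 turns the ring problem into a one-dimensional threshold
z = points[:, 0] ** 2 + points[:, 1] ** 2
print(SVC(kernel = 'linear').fit(z.reshape(-1, 1), labels).score(z.reshape(-1, 1), labels))
# The kernel trick performs an equivalent transformation implicitly
print(SVC(kernel = 'rbf').fit(points, labels).score(points, labels))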
Code in Python
# Importing the important libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Retrieving the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Splitting the dataset into the training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
# Performing Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
# Fitting the SVM model to the training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(x_train, y_train)
# Test set results prediction
y_pred = classifier.predict(x_test)
# Creating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
Decision Trees
Decision trees are among the most preferred and favored classification techniques in
machine learning. They not only help us with predictive analysis but are also a very efficient
way to understand the characteristics of various variables. They are supervised learning
algorithms with a predefined target variable that is to be determined, and they are suited for
both categorical as well as continuous output variables.
The basic functioning of decision trees goes this way- there are a set of points that are plotted
on a plane. These points can’t be separated easily by a line due to their heterogeneous
properties. Hence, decision trees divide these points into different clusters or leaves based on
some predefined criteria and take care of them individually.
There are two different types of decision trees which are classified based on the type of target
variable we have taken.
Binary Variable Decision Tree- The decision tree which has a binary target variable is known
as Binary Variable Decision Tree. In this case, the output will be either “yes” or “no”.
Continuous Variable Decision Tree- The decision tree which has a continuous target variable
is known as Continuous Variable Decision Tree. In this case, the output will be any recurring
value such as the salary of a person.
Let us go through some of the key terms commonly used in decision trees.
Root Node- It represents the entire population or the given sample and further gets divided into
two or more homogeneous sets.
Splitting- It enables the division of a node into two or more sub-nodes.
Decision Node- This is like sub-node splitting into further sub-nodes.
Leaf/Terminal Node- These are nodes with zero sub-nodes, that is, these nodes can’t be split
further.
Pruning- When the size of the decision trees is reduced by removing nodes, the process is
called pruning.
Branch/Subtree- A subsection of a decision tree is called as a branch or a sub-tree.
Parent and Child Node- A node which is divided further into sub-nodes is called the parent
node, whereas the sub-nodes are the children of this parent node.
There are some important terms that we first need to understand before we can implement
decision trees in python.
Impurity
Impurity is a measure of how mixed the classes are within a node, which is evident when there are
traces of one class within another. There are reasons for its existence: the decision tree may run out
of attributes with which to divide the classes any further, and we may deliberately allow some
percentage of impurity in our data for better performance, which introduces impurity into our humble
model!
Entropy
Entropy is the degree of randomness of the elements or, in other terms, it is a measure of impurity.
Mathematically, it can be calculated with the help of probability of the items as:
H= -Σp(x)*log[p(x)]
It is the negative summation of probability times the log of the probability of item x.
Information Gain
Information gain is the main ingredient that is instrumental in the construction and setting up of a
decision tree. Constructing a decision tree from scratch is all about finding the attribute that will
return the highest information gain in order to produce maximum accuracy in the decision trees.
Therefore, IG = entropy(parent) - weighted average of entropy(children).
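A short sketch of these two formulas on hypothetical binary labels is given below.
# Entropy and information gain for a hypothetical candidate split (illustrative labels)
import numpy as np
def entropy(labels):
    _, counts = np.unique(labels, return_counts = True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))          # H = -sum p(x) * log2 p(x)
def information_gain(parent, left, right):
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted       # IG = entropy(parent) - weighted child entropy
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left = np.array([0, 0, 0, 1])
right = np.array([0, 1, 1, 1])
print(entropy(parent))                      # 1.0 for a 50/50 class split
print(information_gain(parent, left, right))   # about 0.19, so the split reduces impurity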
Code in Python
# Importing the important libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Retrieving the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Splitting the dataset into the training and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Performing Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Fitting the Decision Tree Classification model to the Training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)
# Test set results prediction
y_pred = classifier.predict(X_test)
# Creating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
Random Forest
The Random Forest algorithm is another supervised classification approach after decision trees.
It is a natural extension of the decision tree algorithm. There is a correlation between the number
of trees in the forest and the results it produces: the higher the number of trees, the more
accurate the result tends to be.
Random forests are an ensemble learning technique for classification and regression. Random
forests reduce the problem of overfitting by averaging over a sufficient number of trees in the
model. Another advantage is that a random forest classifier can easily manage missing values.
It can also be modeled for categorical values.
Working
Working of the random forest depends on 2 stages- one is creating a random forest and the
other is making predictions and extracting useful observations from the random forest classifier
created in the first stage.
These are some of the steps used in the creation of random forests.
• We need to select some random “k” features out of the total “m” features where k is less
than m.
• Among the selected “k” features, we need to calculate a node “d” applying the best split
point.
• We need to split the node into further nodes using the derived best split.
• Steps 1, 2 and 3 must be repeated until some “l” number of nodes has been achieved.
• Construct the forest by re-applying steps 1 to 4 for “n” number of times to create “n”
number of trees.
Applications
Stock market- A random forest can be used to identify the right stock which can attract profits
for the user at most times.
E-commerce- It can be effective in this field by predicting the products which the customer can
buy in future, based on their past choices.
Banking- It can recognize the defaulters and the non-defaulters by analyzing the behavior of
the customer through their past records.
Code in Python
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Splitting the dataset into the training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
# Fitting the Random Forest Classification model to the Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 7, criterion = 'entropy', random_state = 0)
classifier.fit(x_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(x_test)
# Creating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
K-means clustering
Clustering is the process of classifying the given data points into a number of groups or classes
such that the data points in the same groups are compatible with each other in terms of features
and characteristics. In simple words, k-means works by segregating points
into groups with similar properties and assigning them to clusters.
Working
It starts with specifying the desired number of clusters ‘k’ required, let’s consider k as 2 for the
five random data points in 2-D space.
Then, we need to randomly assign each data point to a cluster. We will assign three points in
cluster 1 as shown in red color and two points in cluster 2 as shown in grey color.
Next, we need to compute centroids for these clusters, the centroid of data points in the red
cluster is signified by a red cross while for the grey cluster, it is shown using a grey cross.
Then comes the step of re-assigning each individual data point to the closest cluster centroid.
The data point at the bottom was initially assigned to the red cluster even though it is closer to
the centroid of the grey cluster; hence, we reassign that data point to the grey cluster.
In the end, we need to recompute cluster centroids- We have to recompute the centroids for
both the clusters.
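The two alternating steps described above can be sketched directly in NumPy; the five 2-D points and the choice of starting centroids below are hypothetical.
# Manual k-means sketch: alternate between assigning points and recomputing centroids
import numpy as np
points = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0], [3.5, 5.0]])
centroids = points[[0, 3]].copy()           # two arbitrary starting centroids (k = 2)
for _ in range(10):
    # Assignment step: index of the closest centroid for every point
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis = 2)
    labels = distances.argmin(axis = 1)
    # Update step: recompute each centroid as the mean of its assigned points
    centroids = np.array([points[labels == k].mean(axis = 0) for k in range(2)])
print(labels)      # cluster index of each point
print(centroids)   # final cluster centers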
Feature engineering is the process of using the domain knowledge and expertise to choose
which data variables to input as features before building a machine learning model. Feature
engineering plays a key role in k-means clustering; using meaningful features that capture the
variability and essence of the data is essential before feeding the selected features into
k-means.
Feature transformations are conducted, particularly to represent rates rather than
measurements, which helps in normalizing the data. At times, it is observed that this engineering
might help get rid of as much as 80% of the error in a dataset. It proves to be effective in maintaining
the accuracy of the machine learning model that is implemented to draw useful insights from the data.
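For instance, a count column can be re-expressed as a rate before clustering; the column names below are made up for illustration.
# Re-expressing raw counts as a rate before clustering (hypothetical columns)
import pandas as pd
df = pd.DataFrame({'visits': [10, 200, 50], 'purchases': [2, 20, 25]})
df['purchase_rate'] = df['purchases'] / df['visits']   # a rate rather than a raw measurement
print(df)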
Code in Python
# Importing the required libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Retrieving the dataset
dataset = pd.read_csv('customers.csv')
x = dataset.iloc[:, [3, 4]].values
y = dataset.iloc[:, 3].values
# Splitting the dataset into the training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)
# Performing Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train.reshape(-1, 1)).ravel()
# Finding the optimal number of clusters using the elbow method
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
    kmeans.fit(x)
    wcss.append(kmeans.inertia_)
# Visualising the results using plots
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
# Fitting the K-Means algorithm to our dataset, choosing n_clusters based on the elbow plot
kmeans = KMeans(n_clusters = 10, init = 'k-means++', random_state = 55)
y_kmeans = kmeans.fit_predict(x)
K-nearest Neighbor(K-NN)
K-nearest neighbor can be considered for both classification and regression problems. A KNN
model is considered when a number of points need to be classified into homogeneous groups of
data points, or in this case, features of a dataset. These data points are all similar to each other
and lie close together. When a new point is introduced in the plane, it is classified based on the
characteristics that match one of these homogeneous groups or classes.
It is a non-parametric approach, meaning it doesn't assume the data follows any particular
distribution. It is also referred to as a lazy classification model, which predicts classes based on
the features of the observations that match most closely.
Selecting the number of nearest neighbors, that is, selecting the value of k, plays a significant
role in calculating the capacity of our model. Selection of k will determine how well the data can
be used to characterize the results of the kNN algorithm. A large k-value will generally tend to
reduce the variance caused by noisy data, but it introduces bias and may smooth over
smaller patterns in the data which could be fruitful.
There are many data points in the plane whose distance can be calculated by the following
techniques.
Euclidean Distance: Euclidean distance is calculated to be the square root of the sum of the
squared differences between a new point (x) and an existing point (y).
ED = √Σ(x - y)^2
Manhattan Distance: Manhattan distance is the distance between vectors using the sum of
their absolute difference.
MD= Σ|x-y|
Hamming Distance: It is in favor of categorical variables. If the value (x) and the value (y) are
same, the distance D will be equivalent to zero.
HD = Σ D(x, y)
where D = 0 when x = y and D = 1 when x ≠ y
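These three measures can be sketched directly in NumPy; the vectors used are arbitrary examples.
# Euclidean, Manhattan and Hamming distances between example vectors
import numpy as np
def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))    # square root of the sum of squared differences
def manhattan(x, y):
    return np.sum(np.abs(x - y))            # sum of the absolute differences
def hamming(x, y):
    return np.sum(x != y)                   # number of positions where the values differ
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])
print(euclidean(a, b))                      # 5.0
print(manhattan(a, b))                      # 7.0
print(hamming(np.array(['red', 'blue']), np.array(['red', 'green'])))   # 1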
KNN is mostly used for searching purposes. It enables the search by finding the nearest item to
the customers' interests. It can also be implemented for building Recommender Systems. It will
find similar items based on the user's personal taste or preference. Normally, the KNN algorithm
is not preferred much when compared to SVM or neural networks as it runs slower compared to
other algorithms.
Code in Python
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Splitting the data into the training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
# Fitting our K-nearest neighbor model to the Training data
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)  # minkowski with p = 2 is the Euclidean distance
classifier.fit(x_train, y_train)
# Test data result prediction
y_pred = classifier.predict(x_test)
# Creating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
Naive Bayes
Naive Bayes is a basic technique for building classifiers. These models assign class labels to
problem instances, represented as vectors of features. It is a part of classification techniques
based on Bayes’ theorem with the assumption that there exists independence between
predictor variables.
In plain terms, a naive Bayes classifier calculates the probability of the outcome assuming that
the presence of a particular feature in a class is not at all related to the presence of any other
feature. For instance, a knife may be considered to have features like
sharpness, being made of stainless steel and a size of 20 inches. These features do not depend
on each other for their existence. Similarly, a naive Bayes approach would take into account all
of the properties of each variable to independently contribute to their probability.
Naive Bayes classifiers need to be trained effectively in a supervised learning setting for
different sorts of probability models. In many practical applications, parameter estimation for
naive Bayes models relies on maximum likelihood, which means that one can
work with the naive Bayes model without subscribing to Bayesian probability or using any
particular Bayesian methods.
P(c|x) = [P(x|c) * P(c)] / P(x)
where, P(c|x) is called the posterior probability of target given predictor which is x(features), P(c)
is known prior probability of class, P(x|c) is the likelihood, which is the probability of predictor
given class and P(x) is the prior probability of predictor.
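Plugging hypothetical numbers into the formula makes the roles of these terms clear; all three probabilities below are made up for illustration.
# Bayes' theorem with made-up numbers
p_c = 0.3            # prior probability of the class
p_x_given_c = 0.8    # likelihood: probability of the predictor given the class
p_x = 0.5            # prior probability of the predictor
p_c_given_x = (p_x_given_c * p_c) / p_x    # posterior probability of the class given the predictor
print(p_c_given_x)   # 0.48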
Code in Python
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('SN_Ads.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Splitting the data into the training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
# Performing Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
# Fitting the Naive Bayes model to the Training data
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(x_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Contenu connexe

Tendances

Linear Regression (Machine Learning)
Linear Regression (Machine Learning)Linear Regression (Machine Learning)
Linear Regression (Machine Learning)Omkar Rane
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in AgricultureAman Vasisht
 
Machine learning session7(nb classifier k-nn)
Machine learning   session7(nb classifier k-nn)Machine learning   session7(nb classifier k-nn)
Machine learning session7(nb classifier k-nn)Abhimanyu Dwivedi
 
Unit ii chapter 1 operator and expressions in c
Unit ii chapter 1 operator and expressions in cUnit ii chapter 1 operator and expressions in c
Unit ii chapter 1 operator and expressions in cSowmya Jyothi
 
Numerical analysis using Scilab: Numerical stability and conditioning
Numerical analysis using Scilab: Numerical stability and conditioningNumerical analysis using Scilab: Numerical stability and conditioning
Numerical analysis using Scilab: Numerical stability and conditioningScilab
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Md. Main Uddin Rony
 
Project in TLE
Project in TLEProject in TLE
Project in TLEPGT_13
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning IntroductionKuppusamy P
 
[ITP - Lecture 09] Conditional Operator in C/C++
[ITP - Lecture 09] Conditional Operator in C/C++[ITP - Lecture 09] Conditional Operator in C/C++
[ITP - Lecture 09] Conditional Operator in C/C++Muhammad Hammad Waseem
 
Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine LearningKuppusamy P
 
Report Group 4 Constants and Variables
Report Group 4 Constants and VariablesReport Group 4 Constants and Variables
Report Group 4 Constants and VariablesGenard Briane Ancero
 
Numerical analysis using Scilab: Solving nonlinear equations
Numerical analysis using Scilab: Solving nonlinear equationsNumerical analysis using Scilab: Solving nonlinear equations
Numerical analysis using Scilab: Solving nonlinear equationsScilab
 
Lesson 4 ar-ma
Lesson 4 ar-maLesson 4 ar-ma
Lesson 4 ar-maankit_ppt
 
Operators in c programming
Operators in c programmingOperators in c programming
Operators in c programmingsavitamhaske
 

Tendances (20)

Linear Regression (Machine Learning)
Linear Regression (Machine Learning)Linear Regression (Machine Learning)
Linear Regression (Machine Learning)
 
ML Workshop at SACON 2018
ML Workshop at SACON 2018ML Workshop at SACON 2018
ML Workshop at SACON 2018
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in Agriculture
 
Time series project
Time series projectTime series project
Time series project
 
Implimenting_HJM
Implimenting_HJMImplimenting_HJM
Implimenting_HJM
 
Machine learning session7(nb classifier k-nn)
Machine learning   session7(nb classifier k-nn)Machine learning   session7(nb classifier k-nn)
Machine learning session7(nb classifier k-nn)
 
Unit ii chapter 1 operator and expressions in c
Unit ii chapter 1 operator and expressions in cUnit ii chapter 1 operator and expressions in c
Unit ii chapter 1 operator and expressions in c
 
Numerical analysis using Scilab: Numerical stability and conditioning
Numerical analysis using Scilab: Numerical stability and conditioningNumerical analysis using Scilab: Numerical stability and conditioning
Numerical analysis using Scilab: Numerical stability and conditioning
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Programming for Problem Solving
Programming for Problem SolvingProgramming for Problem Solving
Programming for Problem Solving
 
Project in TLE
Project in TLEProject in TLE
Project in TLE
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
 
[ITP - Lecture 09] Conditional Operator in C/C++
[ITP - Lecture 09] Conditional Operator in C/C++[ITP - Lecture 09] Conditional Operator in C/C++
[ITP - Lecture 09] Conditional Operator in C/C++
 
Control structures(class 02)
Control structures(class 02)Control structures(class 02)
Control structures(class 02)
 
Machine learning session1
Machine learning   session1Machine learning   session1
Machine learning session1
 
Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine Learning
 
Report Group 4 Constants and Variables
Report Group 4 Constants and VariablesReport Group 4 Constants and Variables
Report Group 4 Constants and Variables
 
Numerical analysis using Scilab: Solving nonlinear equations
Numerical analysis using Scilab: Solving nonlinear equationsNumerical analysis using Scilab: Solving nonlinear equations
Numerical analysis using Scilab: Solving nonlinear equations
 
Lesson 4 ar-ma
Lesson 4 ar-maLesson 4 ar-ma
Lesson 4 ar-ma
 
Operators in c programming
Operators in c programmingOperators in c programming
Operators in c programming
 

Similaire à working with python

Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)Abhimanyu Dwivedi
 
WEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been LearnedWEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been Learnedweka Content
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdfgadissaassefa
 
Bootcamp of new world to taken seriously
Bootcamp of new world to taken seriouslyBootcamp of new world to taken seriously
Bootcamp of new world to taken seriouslykhaled125087
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithmsArunangsu Sahu
 
A Systematic Approach To Probabilistic Pointer Analysis
A Systematic Approach To Probabilistic Pointer AnalysisA Systematic Approach To Probabilistic Pointer Analysis
A Systematic Approach To Probabilistic Pointer AnalysisMonica Franklin
 
Ai_Project_report
Ai_Project_reportAi_Project_report
Ai_Project_reportRavi Gupta
 
maXbox starter69 Machine Learning VII
maXbox starter69 Machine Learning VIImaXbox starter69 Machine Learning VII
maXbox starter69 Machine Learning VIIMax Kleiner
 
Correation, Linear Regression and Multilinear Regression using R software
Correation, Linear Regression and Multilinear Regression using R softwareCorreation, Linear Regression and Multilinear Regression using R software
Correation, Linear Regression and Multilinear Regression using R softwareshrikrishna kesharwani
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdfBeyaNasr1
 
Lecture 3.1_ Logistic Regression.pptx
Lecture 3.1_ Logistic Regression.pptxLecture 3.1_ Logistic Regression.pptx
Lecture 3.1_ Logistic Regression.pptxajondaree
 
Machine learning
Machine learningMachine learning
Machine learningShreyas G S
 
Essay on-data-analysis
Essay on-data-analysisEssay on-data-analysis
Essay on-data-analysisRaman Kannan
 

Similaire à working with python (20)

Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)
 
WEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been LearnedWEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been Learned
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdf
 
Bootcamp of new world to taken seriously
Bootcamp of new world to taken seriouslyBootcamp of new world to taken seriously
Bootcamp of new world to taken seriously
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
 
3ml.pdf
3ml.pdf3ml.pdf
3ml.pdf
 
A Systematic Approach To Probabilistic Pointer Analysis
A Systematic Approach To Probabilistic Pointer AnalysisA Systematic Approach To Probabilistic Pointer Analysis
A Systematic Approach To Probabilistic Pointer Analysis
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Ai_Project_report
Ai_Project_reportAi_Project_report
Ai_Project_report
 
maXbox starter69 Machine Learning VII
maXbox starter69 Machine Learning VIImaXbox starter69 Machine Learning VII
maXbox starter69 Machine Learning VII
 
2. diagnostics, collinearity, transformation, and missing data
2. diagnostics, collinearity, transformation, and missing data 2. diagnostics, collinearity, transformation, and missing data
2. diagnostics, collinearity, transformation, and missing data
 
Correation, Linear Regression and Multilinear Regression using R software
Correation, Linear Regression and Multilinear Regression using R softwareCorreation, Linear Regression and Multilinear Regression using R software
Correation, Linear Regression and Multilinear Regression using R software
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 
Lecture 3.1_ Logistic Regression.pptx
Lecture 3.1_ Logistic Regression.pptxLecture 3.1_ Logistic Regression.pptx
Lecture 3.1_ Logistic Regression.pptx
 
Machine learning
Machine learningMachine learning
Machine learning
 
Essay on-data-analysis
Essay on-data-analysisEssay on-data-analysis
Essay on-data-analysis
 
Regression
RegressionRegression
Regression
 
Correlation and regression in r
Correlation and regression in rCorrelation and regression in r
Correlation and regression in r
 
1607.01152.pdf
1607.01152.pdf1607.01152.pdf
1607.01152.pdf
 
R nonlinear least square
R   nonlinear least squareR   nonlinear least square
R nonlinear least square
 

Plus de bhavesh lande

The Annual G20 Scorecard – Research Performance 2019
The Annual G20 Scorecard – Research Performance 2019 The Annual G20 Scorecard – Research Performance 2019
The Annual G20 Scorecard – Research Performance 2019 bhavesh lande
 
information control and Security system
information control and Security systeminformation control and Security system
information control and Security systembhavesh lande
 
information technology and infrastructures choices
information technology and  infrastructures choicesinformation technology and  infrastructures choices
information technology and infrastructures choicesbhavesh lande
 
ethical issues,social issues
 ethical issues,social issues ethical issues,social issues
ethical issues,social issuesbhavesh lande
 
managing inforamation system
managing inforamation systemmanaging inforamation system
managing inforamation systembhavesh lande
 
• E-commerce, e-business ,e-governance
• E-commerce, e-business ,e-governance• E-commerce, e-business ,e-governance
• E-commerce, e-business ,e-governancebhavesh lande
 
organisations and information systems
organisations and  information systemsorganisations and  information systems
organisations and information systemsbhavesh lande
 
IT stratergy and digital goods
IT stratergy and digital goodsIT stratergy and digital goods
IT stratergy and digital goodsbhavesh lande
 
Implement Mapreduce with suitable example using MongoDB.
 Implement Mapreduce with suitable example using MongoDB. Implement Mapreduce with suitable example using MongoDB.
Implement Mapreduce with suitable example using MongoDB.bhavesh lande
 
aggregation and indexing with suitable example using MongoDB.
aggregation and indexing with suitable example using MongoDB.aggregation and indexing with suitable example using MongoDB.
aggregation and indexing with suitable example using MongoDB.bhavesh lande
 
Unnamed PL/SQL code block: Use of Control structure and Exception handling i...
 Unnamed PL/SQL code block: Use of Control structure and Exception handling i... Unnamed PL/SQL code block: Use of Control structure and Exception handling i...
Unnamed PL/SQL code block: Use of Control structure and Exception handling i...bhavesh lande
 
database application using SQL DML statements: all types of Join, Sub-Query ...
 database application using SQL DML statements: all types of Join, Sub-Query ... database application using SQL DML statements: all types of Join, Sub-Query ...
database application using SQL DML statements: all types of Join, Sub-Query ...bhavesh lande
 
database application using SQL DML statements: Insert, Select, Update, Delet...
 database application using SQL DML statements: Insert, Select, Update, Delet... database application using SQL DML statements: Insert, Select, Update, Delet...
database application using SQL DML statements: Insert, Select, Update, Delet...bhavesh lande
 
Design and Develop SQL DDL statements which demonstrate the use of SQL objec...
 Design and Develop SQL DDL statements which demonstrate the use of SQL objec... Design and Develop SQL DDL statements which demonstrate the use of SQL objec...
Design and Develop SQL DDL statements which demonstrate the use of SQL objec...bhavesh lande
 
applications and advantages of python
applications and advantages of pythonapplications and advantages of python
applications and advantages of pythonbhavesh lande
 
introduction of python in data science
introduction of python in data scienceintroduction of python in data science
introduction of python in data sciencebhavesh lande
 
data scientists and their role
data scientists and their roledata scientists and their role
data scientists and their rolebhavesh lande
 

Plus de bhavesh lande (20)

The Annual G20 Scorecard – Research Performance 2019
The Annual G20 Scorecard – Research Performance 2019 The Annual G20 Scorecard – Research Performance 2019
The Annual G20 Scorecard – Research Performance 2019
 
information control and Security system
information control and Security systeminformation control and Security system
information control and Security system
 
information technology and infrastructures choices
information technology and  infrastructures choicesinformation technology and  infrastructures choices
information technology and infrastructures choices
 
ethical issues,social issues
 ethical issues,social issues ethical issues,social issues
ethical issues,social issues
 
managing inforamation system
managing inforamation systemmanaging inforamation system
managing inforamation system
 
• E-commerce, e-business ,e-governance
• E-commerce, e-business ,e-governance• E-commerce, e-business ,e-governance
• E-commerce, e-business ,e-governance
 
IT and innovations
 IT and  innovations  IT and  innovations
IT and innovations
 
organisations and information systems
organisations and  information systemsorganisations and  information systems
organisations and information systems
 
IT stratergy and digital goods
IT stratergy and digital goodsIT stratergy and digital goods
IT stratergy and digital goods
 
Implement Mapreduce with suitable example using MongoDB.
 Implement Mapreduce with suitable example using MongoDB. Implement Mapreduce with suitable example using MongoDB.
Implement Mapreduce with suitable example using MongoDB.
 
aggregation and indexing with suitable example using MongoDB.
aggregation and indexing with suitable example using MongoDB.aggregation and indexing with suitable example using MongoDB.
aggregation and indexing with suitable example using MongoDB.
 
Unnamed PL/SQL code block: Use of Control structure and Exception handling i...
 Unnamed PL/SQL code block: Use of Control structure and Exception handling i... Unnamed PL/SQL code block: Use of Control structure and Exception handling i...
Unnamed PL/SQL code block: Use of Control structure and Exception handling i...
 
database application using SQL DML statements: all types of Join, Sub-Query ...
 database application using SQL DML statements: all types of Join, Sub-Query ... database application using SQL DML statements: all types of Join, Sub-Query ...
database application using SQL DML statements: all types of Join, Sub-Query ...
 
database application using SQL DML statements: Insert, Select, Update, Delet...
 database application using SQL DML statements: Insert, Select, Update, Delet... database application using SQL DML statements: Insert, Select, Update, Delet...
database application using SQL DML statements: Insert, Select, Update, Delet...
 
Design and Develop SQL DDL statements which demonstrate the use of SQL objec...
 Design and Develop SQL DDL statements which demonstrate the use of SQL objec... Design and Develop SQL DDL statements which demonstrate the use of SQL objec...
Design and Develop SQL DDL statements which demonstrate the use of SQL objec...
 
applications and advantages of python
applications and advantages of pythonapplications and advantages of python
applications and advantages of python
 
introduction of python in data science
introduction of python in data scienceintroduction of python in data science
introduction of python in data science
 
tools
toolstools
tools
 
data scientists and their role
data scientists and their roledata scientists and their role
data scientists and their role
 
applications
applicationsapplications
applications
 

Dernier

Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxronsairoathenadugay
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...kumargunjan9515
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...gajnagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...HyderabadDolls
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 

Dernier (20)

Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 1/3, random_state = 0)

# Performing feature scaling (optional for simple linear regression, shown here for completeness)
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train.reshape(-1, 1)).ravel()

# Fitting the Simple Linear Regression model to the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)

# Test set results prediction (in the scaled units of y; use sc_y.inverse_transform to return to the original scale)
y_pred = regressor.predict(x_test)

Logistic Regression

Logistic regression is a statistical technique that falls under classification rather than regression. Like the other regression techniques, it belongs to predictive analysis. Logistic regression is used to describe the structure of the data and to explain the relationship between a dependent binary variable and one or more independent variables. It is well suited to predicting binary outcomes such as 1/0, yes/no or true/false, depending on the dataset given and the output required.

Logistic regression can also be considered a special case of linear regression in which the outcome variable is categorical and the log of the odds is used as the dependent variable. In simple words, it predicts the probability of occurrence of an event by fitting the data to a logit function. The model is characterised by the following quantities:

Odds = p/(1-p) = probability of the event occurring / probability of the event not occurring
ln(odds) = ln(p/(1-p))
logit(p) = ln(p/(1-p))
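To make the relationship between probability, odds and the logit concrete, here is a minimal sketch (using NumPy, not part of the original slides) that evaluates these quantities for a few illustrative probabilities and inverts the logit with the sigmoid function.

# A small illustration of the odds and logit transformations (illustrative values only)
import numpy as np

p = np.array([0.1, 0.5, 0.9])        # probabilities of the event occurring
odds = p / (1 - p)                   # odds = p / (1 - p)
log_odds = np.log(odds)              # logit(p) = ln(p / (1 - p))

# The sigmoid (inverse logit) maps any real number back to a probability in (0, 1)
recovered_p = 1 / (1 + np.exp(-log_odds))

print(odds)         # [0.111..., 1.0, 9.0]
print(log_odds)     # negative below p = 0.5, zero at p = 0.5, positive above
print(recovered_p)  # [0.1, 0.5, 0.9]

Note that the log odds are positive exactly when p is greater than 0.5, which is the point the next paragraph makes about the odds ratio.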
In this formula, p/(1-p) is the odds ratio. Whenever the log of the odds ratio is positive, the probability of success is greater than 50%. A typical logistic model plot is the S-shaped sigmoid curve, and it shows that the predicted probability never goes below 0 or above 1.

We can check the performance of this regression through the following parameters.

Akaike Information Criterion- AIC is a measure of model fit that penalises a model for the number of its coefficients. Therefore, we always prefer the model with the minimum AIC value.

Null Deviance- Null deviance represents the response predicted by a model containing only the intercept. The lower the null deviance, the better the model.

Residual Deviance- Residual deviance describes the response predicted by a model once the independent variables are added. The same rule applies: the lower the value, the better the result.

Confusion Matrix- The confusion matrix is a tabular representation of actual versus predicted values. It helps in assessing the performance of a classification model.

                      Predicted positive      Predicted negative
Actual positive       True Positive (TP)      False Negative (FN)
Actual negative       False Positive (FP)     True Negative (TN)

The accuracy of a model can be calculated as

Accuracy = (TP + TN) / (TP + TN + FP + FN)
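As a quick illustration of the accuracy formula above, the short sketch below uses hypothetical confusion-matrix counts (not from the slides) to compute accuracy directly from the four cells.

# Computing accuracy from hypothetical confusion-matrix counts
tp, fn = 50, 10    # actual positives: correctly and incorrectly classified
fp, tn = 5, 35     # actual negatives: incorrectly and correctly classified

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)    # (50 + 35) / 100 = 0.85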
ROC curve

The receiver operating characteristic (ROC) curve shows how well the model can distinguish between the two classes by plotting the true positive rate against the false positive rate at different classification thresholds. A good model is able to separate the two classes accurately, whereas a poor model has difficulty differentiating between them.
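The slides do not include code for the ROC curve itself; the following is a minimal sketch, assuming a fitted binary classifier with a predict_proba method (such as the LogisticRegression model trained below) together with x_test and y_test, that plots the curve with scikit-learn and matplotlib.

# Sketch: plotting a ROC curve for a fitted binary classifier (assumes classifier, x_test, y_test exist)
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

y_scores = classifier.predict_proba(x_test)[:, 1]       # predicted probability of the positive class
fpr, tpr, thresholds = roc_curve(y_test, y_scores)       # false/true positive rates per threshold

plt.plot(fpr, tpr, label = 'AUC = %.2f' % roc_auc_score(y_test, y_scores))
plt.plot([0, 1], [0, 1], linestyle = '--')                # the diagonal of a random classifier
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()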
Code in python

# Importing the necessary libraries
import numpy as np
import pandas as pd

# Retrieving the dataset
t1 = 'C:/Users/ml/datasets/train.csv'
train = pd.read_csv(t1)
t2 = 'C:/Users/ml/datasets/test.csv'
test = pd.read_csv(t2)
x = train.iloc[:, [2, 4, 5, 6, 7, 9]].values
y = train.iloc[:, 1].values

# Splitting the dataset into the training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)

# Performing feature scaling
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)

# Fitting the Logistic Regression model to the training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(x_train, y_train)

# Test set results prediction
y_pred = classifier.predict(x_test)

# Creating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Support Vector Machines

Support Vector Machines (SVMs) are used to find the best hyperplane for an array of data points in a supervised learning setting. Suppose we have two columns x and y consisting of some random data points plotted in a two-dimensional plane. Our aim is to derive a line that separates these points. The line that separates the points, whether horizontally, vertically or diagonally, is known as a hyperplane. The distance between the hyperplane and the nearest data points is known as the margin, and the most appropriate hyperplane is the one that classifies the points with the largest margin.

SVM supports both regression and classification tasks and can handle multiple continuous and categorical variables. For categorical variables, dummy variables are created with case values of either 0 or 1. Thus, a categorical dependent variable consisting of three levels, say A, B and C, is represented by a set of three dummy variables:
A: {1 0 0}
B: {0 1 0}
C: {0 0 1}

Now that we know what a hyperplane is, the question is how to identify the right one. We can reach a conclusion by considering the following cases.

CASE 1
There are three hyperplanes in our n-dimensional space: x1, x2 and x3. We need to identify the right hyperplane among the three. x1 and x3 pass through the points, while x2 separates the two groups cleanly. Hence, x2 is our ideal hyperplane.

CASE 2
The three hyperplanes x1, x2 and x3 all segregate the points well, as they are vertical and parallel to each other. So how can we identify the right hyperplane in this situation? x1 and x3 lie nearer to the points, which means their margins are quite small compared to x2. Since x2 has the larger margin, it is the ideal hyperplane.

CASE 3
In the third case, all the points reside very close to each other in the centre of the plane, with little or no room for a straight hyperplane to pass between them. What can we do in such a case? This problem can be dealt with by adding a third axis, the z-axis. If we define z as x^2 + y^2, every value of z is positive, because z is the sum of the squares of x and y, and the two groups become separable along this new axis.

Building such a feature by hand is not always practical, which is where the kernel trick comes into play. Kernel functions convert a problem that is not separable (the scenario discussed above) into a separable one, and they are useful in non-linear separation problems. Simply put, a kernel performs some fairly complex data transformations implicitly and then finds out how to separate the data based on the labels or outputs that have been defined.
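The z = x^2 + y^2 idea above can be shown in a few lines. This is an illustrative sketch with made-up points (not from the slides): a ring of outer points becomes separable from a cloud of inner points once the squared-distance feature is added.

# Sketch: making circularly arranged points separable by adding z = x^2 + y^2
import numpy as np

rng = np.random.default_rng(0)
inner = rng.normal(0, 0.5, size = (20, 2))              # points near the origin (class 0)
angles = rng.uniform(0, 2 * np.pi, 20)
outer = np.c_[3 * np.cos(angles), 3 * np.sin(angles)]   # points on a ring of radius 3 (class 1)

points = np.vstack([inner, outer])
z = points[:, 0] ** 2 + points[:, 1] ** 2                # the new third dimension

# In the (x, y) plane no straight line separates the classes,
# but along z a simple threshold does: inner z values are small, the ring sits at z = 9
print(z[:20].max(), z[20:].min())

In practice the SVC used below does not build this feature explicitly; choosing kernel = 'rbf' or kernel = 'poly' instead of 'linear' lets the kernel trick perform an equivalent transformation implicitly.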
Code in python

# Importing the important libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Retrieving the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)

# Performing feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

# Fitting the SVM model to the training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(x_train, y_train)

# Test set results prediction
y_pred = classifier.predict(x_test)

# Creating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Decision Trees

Decision trees are among the most widely used classification techniques in machine learning. They not only help with predictive analysis but are also an efficient way to understand the characteristics of the variables involved. They are supervised learning algorithms with a predefined target variable and are suited to both categorical and continuous output variables.

The basic idea behind decision trees is this: there is a set of points plotted on a plane which cannot be separated easily by a single line because of their heterogeneous properties. Decision trees therefore divide these points into different clusters or leaves based on predefined criteria and handle each of them individually.

There are two types of decision trees, classified by the type of target variable.

Binary Variable Decision Tree- A decision tree with a binary target variable. In this case, the output is either "yes" or "no".

Continuous Variable Decision Tree- A decision tree with a continuous target variable. In this case, the output is a continuous value such as the salary of a person.

Let us go through some of the key terms commonly used in decision trees.

Root Node- It represents the entire population or the given sample and further gets divided into two or more homogeneous sets.

Splitting- The division of a node into two or more sub-nodes.

Decision Node- A sub-node that splits into further sub-nodes.

Leaf/Terminal Node- A node with no sub-nodes, that is, a node that cannot be split further.

Pruning- Reducing the size of a decision tree by removing nodes.
Branch/Subtree- A subsection of a decision tree is called a branch or a sub-tree.

Parent and Child Node- A node which is divided further into sub-nodes is called a parent node, and the sub-nodes are the children of that parent node.

There are some important concepts we need to understand before we can implement decision trees in Python.

Impurity
Impurity is a measure of how mixed the classes are within a node; it shows up when there are traces of one class inside another. It exists for practical reasons: a decision tree can run out of attributes with which to split a node any further, so we usually allow some percentage of impurity in the leaves for better overall performance.

Entropy
Entropy is the degree of randomness of the elements or, in other terms, a measure of impurity. Mathematically, it can be calculated from the probabilities of the items as:

H = -Σ p(x) * log[p(x)]

That is, the negative sum over items x of the probability of x times the log of that probability.

Information Gain
Information gain is the main ingredient in the construction of a decision tree. Building a tree from scratch is all about finding, at each split, the attribute that returns the highest information gain, in order to produce the most accurate tree. Therefore,

IG = entropy(parent) - (weighted average) * entropy(children)
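As a worked illustration of the entropy and information gain formulas above, the sketch below uses a hypothetical split (not taken from the slides) of a 10-sample parent node into two children and computes the gain of that split.

# Entropy and information gain for a hypothetical split
import numpy as np

def entropy(labels):
    # H = -sum(p * log2(p)) over the classes present in the node
    _, counts = np.unique(labels, return_counts = True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

parent = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])   # 5 positives, 5 negatives
left   = np.array([1, 1, 1, 1, 0])                  # one child after the split
right  = np.array([1, 0, 0, 0, 0])                  # the other child

weights = np.array([len(left), len(right)]) / len(parent)
gain = entropy(parent) - (weights[0] * entropy(left) + weights[1] * entropy(right))
print(round(gain, 3))   # 1.0 - 0.722 = 0.278 bits gained by this split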
Code in python

# Importing the important libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Retrieving the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the training and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Performing feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting the Decision Tree Classification model to the training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

# Test set results prediction
y_pred = classifier.predict(X_test)

# Creating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Random Forest

The random forest algorithm is another supervised classification approach and can be seen as an extension of the decision tree algorithm. There is a relationship between the number of trees in the forest and the quality of the results it produces: the more trees, the more accurate the result tends to be. Random forests are an ensemble learning technique for both classification and regression. A random forest helps avoid the problem of overfitting, provided there are enough trees in the model. Another advantage is that a random forest classifier can handle missing values reasonably well and can also be used with categorical variables.

Working

The working of a random forest has two stages: first creating the random forest, and then making predictions and extracting useful observations from the random forest classifier created in the first stage. These are the steps used in the creation of a random forest.

• Select some random "k" features out of the total "m" features, where k is less than m.
• Among the selected "k" features, calculate the node "d" using the best split point.
• Split the node into further sub-nodes using that best split.
• Repeat steps 1 to 3 until some number "l" of nodes has been reached.
• Construct the forest by repeating steps 1 to 4 "n" times to create "n" trees.

Applications

Stock market- A random forest can be used to identify stocks whose behaviour suggests they are likely to be profitable for the user.
E-commerce- It can be effective in this field by predicting the products a customer is likely to buy in the future, based on their past choices.
Banking- It can recognise defaulters and non-defaulters by analysing the behaviour of customers through their past records.

Code in python

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)

# Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

# Fitting the Random Forest Classification model to the training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 7, criterion = 'entropy', random_state = 0)
classifier.fit(x_train, y_train)

# Predicting the test set results
y_pred = classifier.predict(x_test)

# Creating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

K-means clustering

Clustering is the process of dividing the given data points into a number of groups or classes such that the data points in the same group are similar to each other in terms of features and characteristics. In simple words, k-means segregates points with similar properties and assigns them to clusters.

Working

It starts with specifying the desired number of clusters k. Let us consider k as 2 for five random data points in 2-D space.
Then, we randomly assign each data point to a cluster. In the slides, three points are assigned to cluster 1, shown in red, and two points to cluster 2, shown in grey. Next, we compute the centroids of these clusters; the centroid of the red cluster is marked with a red cross, while the centroid of the grey cluster is marked with a grey cross.
Then comes the step of re-assigning each individual data point to the closest cluster centroid. The data point at the bottom had been placed in the red cluster even though it is closer to the centroid of the grey cluster, so we re-assign it to the grey cluster. Finally, we recompute the centroids of both clusters, and the re-assignment and recomputation steps are repeated until the cluster assignments no longer change.
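The assign-and-recompute loop described above can be written out directly. The following is a minimal NumPy sketch with made-up 2-D points and k = 2; it illustrates the procedure itself rather than the scikit-learn implementation used later.

# A bare-bones k-means loop: assign points to the nearest centroid, then recompute centroids
import numpy as np

points = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0], [3.5, 5.0]])
centroids = points[:2].copy()            # start from two arbitrary points as initial centroids

for _ in range(10):
    # Distance of every point to every centroid, then pick the closest one
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis = 2)
    labels = distances.argmin(axis = 1)
    # Recompute each centroid as the mean of the points assigned to it
    new_centroids = np.array([points[labels == k].mean(axis = 0) for k in range(2)])
    if np.allclose(new_centroids, centroids):
        break                            # assignments have stabilised
    centroids = new_centroids

print(labels)      # cluster index for each of the five points
print(centroids)   # final cluster centroids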
Feature engineering is the process of using domain knowledge and expertise to choose which data variables to use as features before building a machine learning model. Feature engineering plays a key role in k-means clustering; using meaningful features that capture the variability and essence of the data is essential before feeding the selected features into k-means. Feature transformations are often applied, particularly to represent rates rather than raw measurements, which helps in normalising the data. At times, such engineering is said to remove as much as 80% of the error in a dataset, and it helps maintain the accuracy of the machine learning model that is built to draw insights from the data.
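As a small illustration of the "rates rather than measurements" idea above, the sketch below (with hypothetical column names, not from the slides) turns raw purchase counts into a per-visit rate with pandas before any clustering is applied.

# Turning a raw measurement into a rate feature (hypothetical columns)
import pandas as pd

customers = pd.DataFrame({
    'visits':    [10, 40, 5, 25],
    'purchases': [ 2, 10, 4,  5],
})

# A rate normalises the measurement by exposure, so heavy visitors are no longer over-weighted
customers['purchase_rate'] = customers['purchases'] / customers['visits']
print(customers)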
Code in python

# Importing the required libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Retrieving the dataset
dataset = pd.read_csv('customers.csv')
x = dataset.iloc[:, [3, 4]].values
y = dataset.iloc[:, 3].values

# Splitting the dataset into the training and test set
# (k-means is unsupervised, so the split and y are not strictly needed for the clustering below)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)

# Performing feature scaling
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train.reshape(-1, 1)).ravel()

# Finding the optimal number of clusters with the elbow method
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
    kmeans.fit(x)
    wcss.append(kmeans.inertia_)

# Visualising the results using plots
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

# Fitting the K-Means algorithm to our dataset
# (replace 10 with the number of clusters suggested by the elbow plot)
kmeans = KMeans(n_clusters = 10, init = 'k-means++', random_state = 55)
y_kmeans = kmeans.fit_predict(x)

K-nearest Neighbor (K-NN)

K-nearest neighbor can be used for both classification and regression problems. A KNN model is considered when a number of points need to be classified into homogeneous groups of data points, or in this case features of a dataset, where the points in a group are all similar to one another. When a new point is introduced in the plane, it is classified according to the group or class whose members it most closely resembles. It is a non-parametric approach, meaning it does not assume that the data follows a normal distribution. It is also referred to as a lazy learning model, since it predicts classes from the features of the observations that match most closely.

Selecting the number of nearest neighbors, that is, selecting the value of k, plays a significant role in determining the capacity of our model. The chosen k determines how well the data can be used to characterise the results of the KNN algorithm. A large value of k generally reduces the variance caused by noisy data, but it also introduces bias and can smooth over the smaller patterns in the data that might otherwise be informative.

The distance between data points in the plane can be calculated with the following techniques.

Euclidean Distance: Euclidean distance is the square root of the sum of the squared differences between a new point (x) and an existing point (y).
ED = √Σ(x - y)^2

Manhattan Distance: Manhattan distance is the distance between vectors measured as the sum of their absolute differences.
MD = Σ|x - y|
Hamming Distance: Hamming distance is used for categorical variables. If the value x and the value y are the same, the distance D is zero.
HD = Σ|x - y|
where D = 0 when x = y and D = 1 when x ≠ y

KNN is mostly used for search purposes: it enables search by finding the items nearest to a customer's interests. It can also be used for building recommender systems, finding similar items based on a user's personal taste or preferences. Normally, the KNN algorithm is not preferred as much as SVMs or neural networks, as it runs more slowly than those algorithms.

Code in python

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the data into the training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)

# Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

# Fitting our K-nearest neighbor model to the training data
# (Euclidean distance; the features here are continuous, so the Hamming metric is not appropriate)
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(x_train, y_train)

# Test data result prediction
y_pred = classifier.predict(x_test)

# Creating the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Naive Bayes

Naive Bayes is a basic technique for building classifiers. These models assign class labels to problem instances, which are represented as vectors of features. It is part of a family of classification techniques
based on Bayes' theorem, with the assumption that the predictor variables are independent of one another. In plain terms, a naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature when it calculates the probability of an outcome. For instance, a knife may be considered to have features such as sharpness, being made of stainless steel and having a size of 20 inches. These features do not depend on each other for their existence, and a naive Bayes approach likewise lets each property of a variable contribute independently to the overall probability.

Naive Bayes classifiers are trained in a supervised learning setting and can use different kinds of probability models. In many practical applications, parameter estimation for naive Bayes models uses maximum likelihood, which means that one can work with the naive Bayes model without adopting Bayesian probability or using any particular Bayesian methods.

P(c|x) = P(x|c) * P(c) / P(x)

where P(c|x) is the posterior probability of the target class given the predictor x (the features), P(c) is the prior probability of the class, P(x|c) is the likelihood, which is the probability of the predictor given the class, and P(x) is the prior probability of the predictor.

Code in python

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('SN_Ads.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the data into the training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)

# Performing feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

# Fitting the Naive Bayes model to the training data
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)

# Predicting the test set results
y_pred = classifier.predict(x_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
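To close, here is a small numeric illustration of the Bayes formula above, with made-up probabilities (not from the slides): the posterior P(c|x) is obtained from the likelihood, the class prior and the evidence.

# Worked example of P(c|x) = P(x|c) * P(c) / P(x) with hypothetical numbers
p_c = 0.3            # prior probability of the class c
p_x_given_c = 0.8    # likelihood: probability of observing feature x in class c
p_x_given_not_c = 0.2

# Evidence P(x), expanded over the two possible classes
p_x = p_x_given_c * p_c + p_x_given_not_c * (1 - p_c)

posterior = p_x_given_c * p_c / p_x
print(round(posterior, 3))   # 0.24 / 0.38 = 0.632

The GaussianNB classifier above performs the same kind of computation for every class, with the likelihood P(x|c) modelled as a Gaussian for each feature.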