SlideShare une entreprise Scribd logo
1  sur  5
Télécharger pour lire hors ligne
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1149
An Overview of Machine Learning Algorithms for Data Science
Rupesh Deshmukh1, Milind Kubal2
1,2Student, Dept. of Computer Engineering, Terna Engineering College, Nerul, Navi Mumbai, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract – Data Science has become a huge trend over
the past few years. Organizations all over the world have
realized the true intrinsic value of their data and the
demand for data scientists has risen tremendously. Setting
up Business Intelligence departments and making data-
driven decisions has gained popularity. Uncovering
knowledge and hidden patterns from huge chunks of data
can prove highly beneficial to an organization in terms of
profit or otherwise. But analyzing the data in spreadsheets
for this information with the naked eye turns out to be time-
consuming and highly inefficient. Various machine learning
algorithms have been designed over the past decade to
make the data classification and information extraction
process effortless. In this paper, we describe some of the
basic machine learning algorithms that every data science
enthusiast should be familiar with.
Key Words: Data Science, Machine Learning,
Supervised, Unsupervised, Algorithms
1. INTRODUCTION
Data Science is the art of uncovering patterns and
information from a huge chunk of data, that can prove
beneficial to the business. The first step is to understand
the business model and try to comprehend its goals and its
vision for its customers. It is possible to find something in
data if only we know what we are looking for. Most
datasets are never perfect enough in their raw form to be
able to extract information from it. The data needs to be
cleaned, sorted and even scaled [1]. Missing values are to
be handled and inconsistencies and redundancies have to
be taken care of. After the data cleaning is complete, it is
examined visually. If processing is performed on all the
data available, it might take months to complete the
analysis. To make it feasible, some important features are
selected from the complete dataset, and other features are
discarded. This saves a lot of unnecessary calculations and
processing time. Finally, when the targeted dataset is
ready, it can be processed using various machine learning
algorithms.
The insights derived from the data can be in more than
one form. They can be patterns, strays and even
predictions for future data. These insights are easy to
comprehend for data experts, but might not be
understandable by ordinary business people. Hence, it
needs to be presented visually in a way that could be
understood by anyone. This is the final step of the data
science process, and this presented information can be
used for further business operations.
Fig -1: Data Science Lifecycle
Machine Learning has a huge variety of applications and
they are growing every day. The algorithms in machine
learning learn from the training data and tune the
algorithm’s parameters accordingly to accurately
understand and classify the real-world data. The part of
the dataset selected for training always contains the same
features as the data to be classified. Machine Learning can
be considered as one of the most important tools in a data
science professional’s toolset. Nowadays, a huge number
of organizations rely on decisions backed by these
algorithmic findings from the data. As not all the data can
be considered useful, it is up to a data scientist to deal with
redundant or incomplete data, select the appropriate
algorithms and work with them to get the best insights out
of the datasets.
Fig -2: The Machine Learning Process
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1150
2. TYPES OF ALGORITHMS
Machine Learning algorithms can broadly be classified
into two types – Supervised Learning and Unsupervised
Learning.
2.1 Supervised Learning Algorithms
In Supervised Learning, the class labels are already known
by the machine learning model. The goal is to learn from
the training data and classify future data into the pre-
defined classes as accurately as possible. Below are some
of the supervised learning algorithms.
2.1.1 Decision Tree
Fig -3: Example of a Decision Tree Model
One of the most classic algorithms, the Decision Tree
algorithm consists of a flowchart-like structure that
indicates various outputs as a result of different sequences
of decisions. The decision tree model analyzes the training
dataset and automatically creates a hierarchical tree-like
structure [2]. This process is known as “Induction”. Future
data can be classified into different classes based on this
tree. Although this tree doesn’t require any expertise to
comprehend visually, a huge tree can make it harder for
the model to correctly classify new data and runs a risk of
overfitting as the new data cannot be expected to be
exactly accurate as training data. Hence, unnecessary
branches are removed from the tree. This process is
known as “Pruning”, and helps reduce the complexity of
the structure as well as makes it easier to understand.
Decision Trees can be further classified into two types –
Categorical Variable Decision Tree and Continuous
Variable Decision Tree. This is based on the nature of the
target variable. For example, making financial decisions
can only have outcomes like ‘Yes’ and ‘No’; there cannot be
a ‘Maybe’ outcome as it does not fit this application. But
some qualities, like the willingness of a customer to buy a
product, can be visualized on a continuous scale.
Tree-based algorithms empower prognosticative models
with high accuracy and easy interpretation. They are used
to map non-linear relationships as well. Techniques like
decision trees, random forest and gradient boosting are
being popularly used in all kinds of data science problems.
They are useful to solve classification as well as regression
problems.
2.1.2 K-Nearest Neighbours
Fig -4: Steps for KNN Algorithm
The K-Nearest Neighbours (KNN) algorithm is a simple,
easily interpretable supervised machine learning
algorithm that can be used to solve both classification and
regression problems. For each new data point in the
classified dataset, we calculate the distance between the
query point and the neighboring data points. The value of
‘K’ tells us about the number of closest neighbors to
consider. After the neighboring data points are selected,
we vote to find out the maximum points of a particular
cluster among them. The new data point is assigned to the
winning cluster [3].
KNN makes predictions based on the outcome of the ‘K’
neighbors closest to that point. Usually odd value of ‘K’ is
selected to avoid ties during voting. To make predictions
with KNN, we need to define a metric for measuring the
distance between the new and the neighboring points
from the data set. One of the most popular choices to
measure this distance is Euclidean Distance, although
Cosine, Chi-Square, and Minkowski measures are also
used.
KNN can out-perform many other classifiers in selected
applications that cannot entirely rely on features of their
datasets. Sometimes it becomes impossible to fill in the
missing data into the dataset. Hence, plotting the available
features on a graph and classifying new points based on
their proximity to original points becomes a better
alternative. Applications like text mining and
classification, and fingerprint and pattern recognition use
KNN to achieve their goals optimally.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1151
2.1.3 Linear Regression
Fig -5: Example of a Linear Regression Model
Linear regression is a supervised machine learning
algorithm that plots a linear relationship of a target
variable against other variables. The model plots the
available points from the dataset on a graph and finds a
statistical relationship between the scalar variable with
the target variable [4]. A statistical relationship differs
from a deterministic relationship in the sense that it
cannot be always accurately predicted. The basic idea of
this algorithm is to plot a line that can be used to
represent the trend of the target variable against other
variables with minimum error. An error in this model can
be defined as the closest distance from the point to the
line. Once the line is plotted, it can be used to predict the
value of the target variable for future independent
variables.
The equation of Linear Regression line can be given as
follows:
Here, ‘Y’ represents the target variable to be predicted, X is
the independent variable or predictor. ‘B0’ is the intercept
and B1 is the coefficient which represents the slope of the
line. To check the accuracy of our model, the R-square
metric can be formulated as follows:
Here, ‘RSS’ is the Root Mean Square which is the squared
difference between the observed value and the predicted
value. ‘TSS’ is the Total Sum of Squares which is defined as
the squared difference between the observed value and
the mean. Typically, a high R2 value is associated with a
good model, but it often depends on what the application
considers as a fit model. Linear Regression is popular with
applications that have a linear relationship but not exactly
predictable. For example, changes in house prices with
respect to the area or height of a person concerning his or
her age.
2.2 Unsupervised Algorithms
Unsupervised algorithms look for trends in unlabeled
classes of data with minimal or no human intervention.
They are usually classified into two types – Clustering and
Association.
Fig -6: Clustering
Clustering can be defined as grouping data points into
clusters based on the similarity between their features.
Different datasets might require different similarity
measures like Cosine, Jaccard or Euclidean to get the
optimum clusters based on the application [5].
Association can be defined as grouping objects that are
most frequently grouped based on past data. This does not
depend on attributes of the object but rather on the
frequent associations of objects with one another.
2.2.1 K-Means
Fig -7: Steps for K-Means Algorithm
K-Means is the most basic algorithm used for partitioning
data into the required number of clusters. For 'k' clusters,
'k' representatives are selected from the given data. These
are called as centroids. The proximity of every data
element with every centroid is calculated, and the data
elements are assigned to a cluster with the lowest
proximity from its centroid. The classical version of the K-
Means algorithm used Euclidean distance as a proximity
measure among data points plotted on a two-dimensional
plane. Once all the points are allotted to some cluster, new
centroids are calculated as the barycenter of each cluster,
and the process is repeated until a fixed criterion is met,
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1152
usually until two consecutive iterations do not show a
significant difference in the clusters [6].
Although the algorithm can be proved to terminate in
every case, it does not guarantee optimal partitioning of
data into clusters. Also, there are no guides for selecting
the number of clusters and the initial representatives
which lead to varied results with different values.
2.2.2 Hierarchical Clustering
Fig -8: Steps for Hierarchical Clustering Algorithm
Some applications require datasets to be partitioned into
nested clusters that might be arranged to form a tree-like
structure. The leaf or the smallest clusters contain a single
data element whereas the root or the largest cluster
contains the whole data set. There are two approaches to
creating the hierarchy – Agglomerative and Divisive [7].
The 'Agglomerative' method is a bottom-up approach that
starts with every cluster containing a single object, and
proceeds with merging clusters until one final cluster is
obtained. The 'Divisive' method is a top-down approach
that begins with the whole dataset in one cluster and splits
clusters with each iteration until every cluster contains a
single object.
Steps of creating the hierarchy can be given as follows:
1. Assign every object to an individual cluster.
2. After ‘n’ objects are assigned ‘n’ individual clusters, a
cluster pair-wise similarity matrix of order ‘n*n’ is created.
3. Based on the matrix, a similar pair of clusters are
merged into a single cluster, and the similarity matrix is
updated for new clusters.
4. Step 3 is repeated until a single cluster remains or
until set criteria are satisfied. Set criteria could be the
maximum number of merges or leaves.
Hierarchical Clustering is used in applications that need to
find the source or path from one node to another related
node. For example, the evolution tree of a species or a
possible source of a virus can be tracked using this
algorithm.
2.2.3 Apriori Algorithm
Fig -9: Apriori Algorithm
Apriori Algorithm is an association algorithm which is
used to find frequent item-sets in a given dataset. An
iterative or a level-wise approach can be used to eliminate
the infrequent item-sets. It uses k-frequent item-sets to
find (k+1)-frequent item-sets using the ‘Apriori’ property.
The Apriori property states that if an item-set is frequent,
all its non-empty subsets will also be considered frequent.
This implies that, if an item-set is infrequent, all its
supersets will also be infrequent.
To begin, we create a table for each item in the dataset.
This table is known as ‘Candidate Set’. Then we eliminate
the candidates that have frequencies less than a pre-set
support level. Once we get a valid candidate set, we create
a Level 2 candidate set with two elements each in the
item-set. This can be derived from the Level 1 candidate
set using Apriori Property. This process is repeated until
we get the required level of frequent item-sets. Association
Rules can be derived from these item-sets, and these rules
can be used to predict future item-sets [8].
In Data Science, these Association Rules can be of huge
significance since they can be used to find out patterns in
applications that record consumer behavior. For example,
supermarket stores and online shopping portals use these
insights to recommend similar products and services to
customers to increase their sales. Association algorithms
can be used in real-time as consumer data is constantly
updated every day.
3. SUMMARY
Machine Learning models and algorithms are an important
part of the Data Science process. They are useful to extract
important insights and patterns from the data, which can
be beneficial to the organization to achieve its goals. In the
real world, no two applications are the same. The
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1153
algorithm that will provide optimal results varies for each
application. Being familiar with the above machine
learning algorithms will help data science professionals
make the right decisions regarding their models.
REFERENCES
[1] Claus Weihs, Katja Ickstadt, “Data Science: The Impact
of Statistics”, International Journal of Data Science and
Analytics, 2018.
[2] Himani Sharma, Sunil Kumar, “A Survey on Decision
Tree Algorithms of Classification in Data Mining”,
International Journal of Science and Research (IJSR),
2016.
[3] Jasmina Novakovic, Alempije Veljovic, Sinisa Ilic, Milos
Papic, “Experimental Study of Using the K-Nearest
Neighbor Classifier with Filter Methods”, Computer
Science and Technology, At Varna, Bulgaria, June
2016.
[4] Azizur Rahman, Mayooran Thevaraja, Mathew Ekele
Gabriel, “Recent Developments in Data Science:
Comparing Linear, Ridge and Lasso Regressions
Techniques Using Wine Data”, DISP '19, Oxford, United
Kingdom, April 2019.
[5] Memoona Khanam, Tahira Mahboob, Warda Imtiaz,
Humaraia Abdul Ghafoor, Rabeea Sehar, “A Survey on
Unsupervised Machine Learning Algorithms for
Automation, Classification and Maintenance,
International Journal of Computer Applications
119(13):34-39, June 2015.
[6] Md. Zakir Hossain, Md.Nasim Akhtar, R.B. Ahmad,
Mostafijur Rahman, “A Dynamic K-Means Clustering
for Data Mining”, IJEECS, February 2019.
[7] Fahdah Alalyan, Nuha Zamzami, Nizar Bouguila,
“Model-Based Hierarchical Clustering for Categorical
Data”, IEEE 28th International Symposium on
Industrial Electronics (ISIE), June 2019.
[8] Foxiao Zhan, Xiaolan Zhu, Lei Zhang, Xuexi Wang, Lu
Wang, Chaoyi Liu, “Summary of Association Rules”,
IOP Conference Series Earth and Environmental
Science 252:032219, July 2019.

Contenu connexe

Tendances

Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive ModellingAmit Kumar
 
Data mining techniques a survey paper
Data mining techniques a survey paperData mining techniques a survey paper
Data mining techniques a survey papereSAT Publishing House
 
IRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data MiningIRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data MiningIRJET Journal
 
Variance rover system web analytics tool using data
Variance rover system web analytics tool using dataVariance rover system web analytics tool using data
Variance rover system web analytics tool using dataeSAT Publishing House
 
Variance rover system
Variance rover systemVariance rover system
Variance rover systemeSAT Journals
 
IRJET- Missing Data Imputation by Evidence Chain
IRJET- Missing Data Imputation by Evidence ChainIRJET- Missing Data Imputation by Evidence Chain
IRJET- Missing Data Imputation by Evidence ChainIRJET Journal
 
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...IRJET Journal
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsIJERA Editor
 
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION cscpconf
 
Statistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and PredictingStatistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and PredictingGalit Shmueli
 
Analysis on Data Mining Techniques for Heart Disease Dataset
Analysis on Data Mining Techniques for Heart Disease DatasetAnalysis on Data Mining Techniques for Heart Disease Dataset
Analysis on Data Mining Techniques for Heart Disease DatasetIRJET Journal
 
Recommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceRecommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceIJDKP
 
PCA and DCT Based Approach for Face Recognition
PCA and DCT Based Approach for Face RecognitionPCA and DCT Based Approach for Face Recognition
PCA and DCT Based Approach for Face Recognitionijtsrd
 
A statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentA statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentIJDKP
 
IRJET- Analyzing Voting Results using Influence Matrix
IRJET- Analyzing Voting Results using Influence MatrixIRJET- Analyzing Voting Results using Influence Matrix
IRJET- Analyzing Voting Results using Influence MatrixIRJET Journal
 
Research scholars evaluation based on guides view using id3
Research scholars evaluation based on guides view using id3Research scholars evaluation based on guides view using id3
Research scholars evaluation based on guides view using id3eSAT Journals
 
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...ijtsrd
 

Tendances (19)

Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive Modelling
 
Data mining techniques a survey paper
Data mining techniques a survey paperData mining techniques a survey paper
Data mining techniques a survey paper
 
IRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data MiningIRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data Mining
 
Variance rover system web analytics tool using data
Variance rover system web analytics tool using dataVariance rover system web analytics tool using data
Variance rover system web analytics tool using data
 
Variance rover system
Variance rover systemVariance rover system
Variance rover system
 
IRJET- Missing Data Imputation by Evidence Chain
IRJET- Missing Data Imputation by Evidence ChainIRJET- Missing Data Imputation by Evidence Chain
IRJET- Missing Data Imputation by Evidence Chain
 
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining Algorithms
 
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
 
Statistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and PredictingStatistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and Predicting
 
50120140503013
5012014050301350120140503013
50120140503013
 
Analysis on Data Mining Techniques for Heart Disease Dataset
Analysis on Data Mining Techniques for Heart Disease DatasetAnalysis on Data Mining Techniques for Heart Disease Dataset
Analysis on Data Mining Techniques for Heart Disease Dataset
 
COMPARATIVE ANALYSIS OF DIFFERENT MACHINE LEARNING ALGORITHMS FOR PLANT DISEA...
COMPARATIVE ANALYSIS OF DIFFERENT MACHINE LEARNING ALGORITHMS FOR PLANT DISEA...COMPARATIVE ANALYSIS OF DIFFERENT MACHINE LEARNING ALGORITHMS FOR PLANT DISEA...
COMPARATIVE ANALYSIS OF DIFFERENT MACHINE LEARNING ALGORITHMS FOR PLANT DISEA...
 
Recommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceRecommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduce
 
PCA and DCT Based Approach for Face Recognition
PCA and DCT Based Approach for Face RecognitionPCA and DCT Based Approach for Face Recognition
PCA and DCT Based Approach for Face Recognition
 
A statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentA statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environment
 
IRJET- Analyzing Voting Results using Influence Matrix
IRJET- Analyzing Voting Results using Influence MatrixIRJET- Analyzing Voting Results using Influence Matrix
IRJET- Analyzing Voting Results using Influence Matrix
 
Research scholars evaluation based on guides view using id3
Research scholars evaluation based on guides view using id3Research scholars evaluation based on guides view using id3
Research scholars evaluation based on guides view using id3
 
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...
 

Similaire à IRJET - An Overview of Machine Learning Algorithms for Data Science

CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGESCASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGESIRJET Journal
 
MACHINE LEARNING CLASSIFIERS TO ANALYZE CREDIT RISK
MACHINE LEARNING CLASSIFIERS TO ANALYZE CREDIT RISKMACHINE LEARNING CLASSIFIERS TO ANALYZE CREDIT RISK
MACHINE LEARNING CLASSIFIERS TO ANALYZE CREDIT RISKIRJET Journal
 
Review on Sentiment Analysis on Customer Reviews
Review on Sentiment Analysis on Customer ReviewsReview on Sentiment Analysis on Customer Reviews
Review on Sentiment Analysis on Customer ReviewsIRJET Journal
 
IRJET - Employee Performance Prediction System using Data Mining
IRJET - Employee Performance Prediction System using Data MiningIRJET - Employee Performance Prediction System using Data Mining
IRJET - Employee Performance Prediction System using Data MiningIRJET Journal
 
IRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom IndustryIRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom IndustryIRJET Journal
 
Survey on Feature Selection and Dimensionality Reduction Techniques
Survey on Feature Selection and Dimensionality Reduction TechniquesSurvey on Feature Selection and Dimensionality Reduction Techniques
Survey on Feature Selection and Dimensionality Reduction TechniquesIRJET Journal
 
A Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data MiningA Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data MiningIRJET Journal
 
TERM DEPOSIT SUBSCRIPTION PREDICTION
TERM DEPOSIT SUBSCRIPTION PREDICTIONTERM DEPOSIT SUBSCRIPTION PREDICTION
TERM DEPOSIT SUBSCRIPTION PREDICTIONIRJET Journal
 
IRJET- Comparison of Classification Algorithms using Machine Learning
IRJET- Comparison of Classification Algorithms using Machine LearningIRJET- Comparison of Classification Algorithms using Machine Learning
IRJET- Comparison of Classification Algorithms using Machine LearningIRJET Journal
 
An Robust Outsourcing of Multi Party Dataset by Utilizing Super-Modularity an...
An Robust Outsourcing of Multi Party Dataset by Utilizing Super-Modularity an...An Robust Outsourcing of Multi Party Dataset by Utilizing Super-Modularity an...
An Robust Outsourcing of Multi Party Dataset by Utilizing Super-Modularity an...IRJET Journal
 
Loan Default Prediction Using Machine Learning Techniques
Loan Default Prediction Using Machine Learning TechniquesLoan Default Prediction Using Machine Learning Techniques
Loan Default Prediction Using Machine Learning TechniquesIRJET Journal
 
IRJET- Prediction of Crime Rate Analysis using Supervised Classification Mach...
IRJET- Prediction of Crime Rate Analysis using Supervised Classification Mach...IRJET- Prediction of Crime Rate Analysis using Supervised Classification Mach...
IRJET- Prediction of Crime Rate Analysis using Supervised Classification Mach...IRJET Journal
 
IRJET- Financial Analysis using Data Mining
IRJET- Financial Analysis using Data MiningIRJET- Financial Analysis using Data Mining
IRJET- Financial Analysis using Data MiningIRJET Journal
 
IRJET- Road Accident Prediction using Machine Learning Algorithm
IRJET- Road Accident Prediction using Machine Learning AlgorithmIRJET- Road Accident Prediction using Machine Learning Algorithm
IRJET- Road Accident Prediction using Machine Learning AlgorithmIRJET Journal
 
IRJET- Medical Data Mining
IRJET- Medical Data MiningIRJET- Medical Data Mining
IRJET- Medical Data MiningIRJET Journal
 
CUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONCUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONIRJET Journal
 
V2 i9 ijertv2is90699-1
V2 i9 ijertv2is90699-1V2 i9 ijertv2is90699-1
V2 i9 ijertv2is90699-1warishali570
 
Smart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisSmart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisIRJET Journal
 
A Comparative Study of Techniques to Predict Customer Churn in Telecommunicat...
A Comparative Study of Techniques to Predict Customer Churn in Telecommunicat...A Comparative Study of Techniques to Predict Customer Churn in Telecommunicat...
A Comparative Study of Techniques to Predict Customer Churn in Telecommunicat...IRJET Journal
 
RESUME SCREENING USING LSTM
RESUME SCREENING USING LSTMRESUME SCREENING USING LSTM
RESUME SCREENING USING LSTMIRJET Journal
 

Similaire à IRJET - An Overview of Machine Learning Algorithms for Data Science (20)

CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGESCASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
CASE STUDY: ADMISSION PREDICTION IN ENGINEERING AND TECHNOLOGY COLLEGES
 
MACHINE LEARNING CLASSIFIERS TO ANALYZE CREDIT RISK
MACHINE LEARNING CLASSIFIERS TO ANALYZE CREDIT RISKMACHINE LEARNING CLASSIFIERS TO ANALYZE CREDIT RISK
MACHINE LEARNING CLASSIFIERS TO ANALYZE CREDIT RISK
 
Review on Sentiment Analysis on Customer Reviews
Review on Sentiment Analysis on Customer ReviewsReview on Sentiment Analysis on Customer Reviews
Review on Sentiment Analysis on Customer Reviews
 
IRJET - Employee Performance Prediction System using Data Mining
IRJET - Employee Performance Prediction System using Data MiningIRJET - Employee Performance Prediction System using Data Mining
IRJET - Employee Performance Prediction System using Data Mining
 
IRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom IndustryIRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom Industry
 
Survey on Feature Selection and Dimensionality Reduction Techniques
Survey on Feature Selection and Dimensionality Reduction TechniquesSurvey on Feature Selection and Dimensionality Reduction Techniques
Survey on Feature Selection and Dimensionality Reduction Techniques
 
A Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data MiningA Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data Mining
 
TERM DEPOSIT SUBSCRIPTION PREDICTION
TERM DEPOSIT SUBSCRIPTION PREDICTIONTERM DEPOSIT SUBSCRIPTION PREDICTION
TERM DEPOSIT SUBSCRIPTION PREDICTION
 
IRJET- Comparison of Classification Algorithms using Machine Learning
IRJET- Comparison of Classification Algorithms using Machine LearningIRJET- Comparison of Classification Algorithms using Machine Learning
IRJET- Comparison of Classification Algorithms using Machine Learning
 
An Robust Outsourcing of Multi Party Dataset by Utilizing Super-Modularity an...
An Robust Outsourcing of Multi Party Dataset by Utilizing Super-Modularity an...An Robust Outsourcing of Multi Party Dataset by Utilizing Super-Modularity an...
An Robust Outsourcing of Multi Party Dataset by Utilizing Super-Modularity an...
 
Loan Default Prediction Using Machine Learning Techniques
Loan Default Prediction Using Machine Learning TechniquesLoan Default Prediction Using Machine Learning Techniques
Loan Default Prediction Using Machine Learning Techniques
 
IRJET- Prediction of Crime Rate Analysis using Supervised Classification Mach...
IRJET- Prediction of Crime Rate Analysis using Supervised Classification Mach...IRJET- Prediction of Crime Rate Analysis using Supervised Classification Mach...
IRJET- Prediction of Crime Rate Analysis using Supervised Classification Mach...
 
IRJET- Financial Analysis using Data Mining
IRJET- Financial Analysis using Data MiningIRJET- Financial Analysis using Data Mining
IRJET- Financial Analysis using Data Mining
 
IRJET- Road Accident Prediction using Machine Learning Algorithm
IRJET- Road Accident Prediction using Machine Learning AlgorithmIRJET- Road Accident Prediction using Machine Learning Algorithm
IRJET- Road Accident Prediction using Machine Learning Algorithm
 
IRJET- Medical Data Mining
IRJET- Medical Data MiningIRJET- Medical Data Mining
IRJET- Medical Data Mining
 
CUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTIONCUSTOMER CHURN PREDICTION
CUSTOMER CHURN PREDICTION
 
V2 i9 ijertv2is90699-1
V2 i9 ijertv2is90699-1V2 i9 ijertv2is90699-1
V2 i9 ijertv2is90699-1
 
Smart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend AnalysisSmart E-Logistics for SCM Spend Analysis
Smart E-Logistics for SCM Spend Analysis
 
A Comparative Study of Techniques to Predict Customer Churn in Telecommunicat...
A Comparative Study of Techniques to Predict Customer Churn in Telecommunicat...A Comparative Study of Techniques to Predict Customer Churn in Telecommunicat...
A Comparative Study of Techniques to Predict Customer Churn in Telecommunicat...
 
RESUME SCREENING USING LSTM
RESUME SCREENING USING LSTMRESUME SCREENING USING LSTM
RESUME SCREENING USING LSTM
 

Plus de IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

Plus de IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Dernier

Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Erbil Polytechnic University
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadaditya806802
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfRajuKanojiya4
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Risk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectRisk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectErbil Polytechnic University
 
Configuration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentConfiguration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentBharaniDharan195623
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 

Dernier (20)

Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
 
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasad
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Risk Management in Engineering Construction Project
Risk Management in Engineering Construction ProjectRisk Management in Engineering Construction Project
Risk Management in Engineering Construction Project
 
Configuration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentConfiguration of IoT devices - Systems managament
Configuration of IoT devices - Systems managament
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 

IRJET - An Overview of Machine Learning Algorithms for Data Science

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1149 An Overview of Machine Learning Algorithms for Data Science Rupesh Deshmukh1, Milind Kubal2 1,2Student, Dept. of Computer Engineering, Terna Engineering College, Nerul, Navi Mumbai, India ---------------------------------------------------------------------***---------------------------------------------------------------------- Abstract – Data Science has become a huge trend over the past few years. Organizations all over the world have realized the true intrinsic value of their data and the demand for data scientists has risen tremendously. Setting up Business Intelligence departments and making data- driven decisions has gained popularity. Uncovering knowledge and hidden patterns from huge chunks of data can prove highly beneficial to an organization in terms of profit or otherwise. But analyzing the data in spreadsheets for this information with the naked eye turns out to be time- consuming and highly inefficient. Various machine learning algorithms have been designed over the past decade to make the data classification and information extraction process effortless. In this paper, we describe some of the basic machine learning algorithms that every data science enthusiast should be familiar with. Key Words: Data Science, Machine Learning, Supervised, Unsupervised, Algorithms 1. INTRODUCTION Data Science is the art of uncovering patterns and information from a huge chunk of data, that can prove beneficial to the business. The first step is to understand the business model and try to comprehend its goals and its vision for its customers. It is possible to find something in data if only we know what we are looking for. Most datasets are never perfect enough in their raw form to be able to extract information from it. The data needs to be cleaned, sorted and even scaled [1]. Missing values are to be handled and inconsistencies and redundancies have to be taken care of. After the data cleaning is complete, it is examined visually. If processing is performed on all the data available, it might take months to complete the analysis. To make it feasible, some important features are selected from the complete dataset, and other features are discarded. This saves a lot of unnecessary calculations and processing time. Finally, when the targeted dataset is ready, it can be processed using various machine learning algorithms. The insights derived from the data can be in more than one form. They can be patterns, strays and even predictions for future data. These insights are easy to comprehend for data experts, but might not be understandable by ordinary business people. Hence, it needs to be presented visually in a way that could be understood by anyone. This is the final step of the data science process, and this presented information can be used for further business operations. Fig -1: Data Science Lifecycle Machine Learning has a huge variety of applications and they are growing every day. The algorithms in machine learning learn from the training data and tune the algorithm’s parameters accordingly to accurately understand and classify the real-world data. The part of the dataset selected for training always contains the same features as the data to be classified. Machine Learning can be considered as one of the most important tools in a data science professional’s toolset. Nowadays, a huge number of organizations rely on decisions backed by these algorithmic findings from the data. As not all the data can be considered useful, it is up to a data scientist to deal with redundant or incomplete data, select the appropriate algorithms and work with them to get the best insights out of the datasets. Fig -2: The Machine Learning Process
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1150 2. TYPES OF ALGORITHMS Machine Learning algorithms can broadly be classified into two types – Supervised Learning and Unsupervised Learning. 2.1 Supervised Learning Algorithms In Supervised Learning, the class labels are already known by the machine learning model. The goal is to learn from the training data and classify future data into the pre- defined classes as accurately as possible. Below are some of the supervised learning algorithms. 2.1.1 Decision Tree Fig -3: Example of a Decision Tree Model One of the most classic algorithms, the Decision Tree algorithm consists of a flowchart-like structure that indicates various outputs as a result of different sequences of decisions. The decision tree model analyzes the training dataset and automatically creates a hierarchical tree-like structure [2]. This process is known as “Induction”. Future data can be classified into different classes based on this tree. Although this tree doesn’t require any expertise to comprehend visually, a huge tree can make it harder for the model to correctly classify new data and runs a risk of overfitting as the new data cannot be expected to be exactly accurate as training data. Hence, unnecessary branches are removed from the tree. This process is known as “Pruning”, and helps reduce the complexity of the structure as well as makes it easier to understand. Decision Trees can be further classified into two types – Categorical Variable Decision Tree and Continuous Variable Decision Tree. This is based on the nature of the target variable. For example, making financial decisions can only have outcomes like ‘Yes’ and ‘No’; there cannot be a ‘Maybe’ outcome as it does not fit this application. But some qualities, like the willingness of a customer to buy a product, can be visualized on a continuous scale. Tree-based algorithms empower prognosticative models with high accuracy and easy interpretation. They are used to map non-linear relationships as well. Techniques like decision trees, random forest and gradient boosting are being popularly used in all kinds of data science problems. They are useful to solve classification as well as regression problems. 2.1.2 K-Nearest Neighbours Fig -4: Steps for KNN Algorithm The K-Nearest Neighbours (KNN) algorithm is a simple, easily interpretable supervised machine learning algorithm that can be used to solve both classification and regression problems. For each new data point in the classified dataset, we calculate the distance between the query point and the neighboring data points. The value of ‘K’ tells us about the number of closest neighbors to consider. After the neighboring data points are selected, we vote to find out the maximum points of a particular cluster among them. The new data point is assigned to the winning cluster [3]. KNN makes predictions based on the outcome of the ‘K’ neighbors closest to that point. Usually odd value of ‘K’ is selected to avoid ties during voting. To make predictions with KNN, we need to define a metric for measuring the distance between the new and the neighboring points from the data set. One of the most popular choices to measure this distance is Euclidean Distance, although Cosine, Chi-Square, and Minkowski measures are also used. KNN can out-perform many other classifiers in selected applications that cannot entirely rely on features of their datasets. Sometimes it becomes impossible to fill in the missing data into the dataset. Hence, plotting the available features on a graph and classifying new points based on their proximity to original points becomes a better alternative. Applications like text mining and classification, and fingerprint and pattern recognition use KNN to achieve their goals optimally.
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1151 2.1.3 Linear Regression Fig -5: Example of a Linear Regression Model Linear regression is a supervised machine learning algorithm that plots a linear relationship of a target variable against other variables. The model plots the available points from the dataset on a graph and finds a statistical relationship between the scalar variable with the target variable [4]. A statistical relationship differs from a deterministic relationship in the sense that it cannot be always accurately predicted. The basic idea of this algorithm is to plot a line that can be used to represent the trend of the target variable against other variables with minimum error. An error in this model can be defined as the closest distance from the point to the line. Once the line is plotted, it can be used to predict the value of the target variable for future independent variables. The equation of Linear Regression line can be given as follows: Here, ‘Y’ represents the target variable to be predicted, X is the independent variable or predictor. ‘B0’ is the intercept and B1 is the coefficient which represents the slope of the line. To check the accuracy of our model, the R-square metric can be formulated as follows: Here, ‘RSS’ is the Root Mean Square which is the squared difference between the observed value and the predicted value. ‘TSS’ is the Total Sum of Squares which is defined as the squared difference between the observed value and the mean. Typically, a high R2 value is associated with a good model, but it often depends on what the application considers as a fit model. Linear Regression is popular with applications that have a linear relationship but not exactly predictable. For example, changes in house prices with respect to the area or height of a person concerning his or her age. 2.2 Unsupervised Algorithms Unsupervised algorithms look for trends in unlabeled classes of data with minimal or no human intervention. They are usually classified into two types – Clustering and Association. Fig -6: Clustering Clustering can be defined as grouping data points into clusters based on the similarity between their features. Different datasets might require different similarity measures like Cosine, Jaccard or Euclidean to get the optimum clusters based on the application [5]. Association can be defined as grouping objects that are most frequently grouped based on past data. This does not depend on attributes of the object but rather on the frequent associations of objects with one another. 2.2.1 K-Means Fig -7: Steps for K-Means Algorithm K-Means is the most basic algorithm used for partitioning data into the required number of clusters. For 'k' clusters, 'k' representatives are selected from the given data. These are called as centroids. The proximity of every data element with every centroid is calculated, and the data elements are assigned to a cluster with the lowest proximity from its centroid. The classical version of the K- Means algorithm used Euclidean distance as a proximity measure among data points plotted on a two-dimensional plane. Once all the points are allotted to some cluster, new centroids are calculated as the barycenter of each cluster, and the process is repeated until a fixed criterion is met,
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1152 usually until two consecutive iterations do not show a significant difference in the clusters [6]. Although the algorithm can be proved to terminate in every case, it does not guarantee optimal partitioning of data into clusters. Also, there are no guides for selecting the number of clusters and the initial representatives which lead to varied results with different values. 2.2.2 Hierarchical Clustering Fig -8: Steps for Hierarchical Clustering Algorithm Some applications require datasets to be partitioned into nested clusters that might be arranged to form a tree-like structure. The leaf or the smallest clusters contain a single data element whereas the root or the largest cluster contains the whole data set. There are two approaches to creating the hierarchy – Agglomerative and Divisive [7]. The 'Agglomerative' method is a bottom-up approach that starts with every cluster containing a single object, and proceeds with merging clusters until one final cluster is obtained. The 'Divisive' method is a top-down approach that begins with the whole dataset in one cluster and splits clusters with each iteration until every cluster contains a single object. Steps of creating the hierarchy can be given as follows: 1. Assign every object to an individual cluster. 2. After ‘n’ objects are assigned ‘n’ individual clusters, a cluster pair-wise similarity matrix of order ‘n*n’ is created. 3. Based on the matrix, a similar pair of clusters are merged into a single cluster, and the similarity matrix is updated for new clusters. 4. Step 3 is repeated until a single cluster remains or until set criteria are satisfied. Set criteria could be the maximum number of merges or leaves. Hierarchical Clustering is used in applications that need to find the source or path from one node to another related node. For example, the evolution tree of a species or a possible source of a virus can be tracked using this algorithm. 2.2.3 Apriori Algorithm Fig -9: Apriori Algorithm Apriori Algorithm is an association algorithm which is used to find frequent item-sets in a given dataset. An iterative or a level-wise approach can be used to eliminate the infrequent item-sets. It uses k-frequent item-sets to find (k+1)-frequent item-sets using the ‘Apriori’ property. The Apriori property states that if an item-set is frequent, all its non-empty subsets will also be considered frequent. This implies that, if an item-set is infrequent, all its supersets will also be infrequent. To begin, we create a table for each item in the dataset. This table is known as ‘Candidate Set’. Then we eliminate the candidates that have frequencies less than a pre-set support level. Once we get a valid candidate set, we create a Level 2 candidate set with two elements each in the item-set. This can be derived from the Level 1 candidate set using Apriori Property. This process is repeated until we get the required level of frequent item-sets. Association Rules can be derived from these item-sets, and these rules can be used to predict future item-sets [8]. In Data Science, these Association Rules can be of huge significance since they can be used to find out patterns in applications that record consumer behavior. For example, supermarket stores and online shopping portals use these insights to recommend similar products and services to customers to increase their sales. Association algorithms can be used in real-time as consumer data is constantly updated every day. 3. SUMMARY Machine Learning models and algorithms are an important part of the Data Science process. They are useful to extract important insights and patterns from the data, which can be beneficial to the organization to achieve its goals. In the real world, no two applications are the same. The
  • 5. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1153 algorithm that will provide optimal results varies for each application. Being familiar with the above machine learning algorithms will help data science professionals make the right decisions regarding their models. REFERENCES [1] Claus Weihs, Katja Ickstadt, “Data Science: The Impact of Statistics”, International Journal of Data Science and Analytics, 2018. [2] Himani Sharma, Sunil Kumar, “A Survey on Decision Tree Algorithms of Classification in Data Mining”, International Journal of Science and Research (IJSR), 2016. [3] Jasmina Novakovic, Alempije Veljovic, Sinisa Ilic, Milos Papic, “Experimental Study of Using the K-Nearest Neighbor Classifier with Filter Methods”, Computer Science and Technology, At Varna, Bulgaria, June 2016. [4] Azizur Rahman, Mayooran Thevaraja, Mathew Ekele Gabriel, “Recent Developments in Data Science: Comparing Linear, Ridge and Lasso Regressions Techniques Using Wine Data”, DISP '19, Oxford, United Kingdom, April 2019. [5] Memoona Khanam, Tahira Mahboob, Warda Imtiaz, Humaraia Abdul Ghafoor, Rabeea Sehar, “A Survey on Unsupervised Machine Learning Algorithms for Automation, Classification and Maintenance, International Journal of Computer Applications 119(13):34-39, June 2015. [6] Md. Zakir Hossain, Md.Nasim Akhtar, R.B. Ahmad, Mostafijur Rahman, “A Dynamic K-Means Clustering for Data Mining”, IJEECS, February 2019. [7] Fahdah Alalyan, Nuha Zamzami, Nizar Bouguila, “Model-Based Hierarchical Clustering for Categorical Data”, IEEE 28th International Symposium on Industrial Electronics (ISIE), June 2019. [8] Foxiao Zhan, Xiaolan Zhu, Lei Zhang, Xuexi Wang, Lu Wang, Chaoyi Liu, “Summary of Association Rules”, IOP Conference Series Earth and Environmental Science 252:032219, July 2019.