Big data and machine learning for Businesses

Big Data and Machine
Learning for businesses
By Abdul Wahid

Who AM I
Abdul Wahid
CEO at GleeTech (http://gleetech.com)
• Started working as a Software Engineer in 2005.
• PhD (Computer Science).
• 2016, Victoria University of Wellington
• Help people to develop awesome digital products.
• Passionate about
• Big Data
• Machine Learning
• Tech Startups/Businesses
• Email:abdul@gleetech.com
• Personal site: http://awahid.net

Big Data
• Additional Dimentions
• Complexity
• Multiple sources and data streams
• Variability
• Unpredictable Data flows.
• Social media trending.

Data is the new Oil of the Digital
Economy
Wired 2014

Why Big Data is important
• Data contains information.
• Information leads to insights.
• Insights helps in making better decisions.

Examples - Using data
• Weather Data
• Utility & Manufacturing companies
• Transport & Tourism
• News, Accidents, Census
• Insurance Companies
• Survey, Blogs, Comments, Tweets
• Marketing Analysis
• Customer Segmentation

Examples - Benefits of data
• Sports
• Strategies, Data driven analytics
• Entertainment
• Users opinions, Sentiments
• Retail
• Consumer behavior
• Health Care
• Patient/symptom analysis
• Financial Institutions
• Fraud Detection

How to derive insights from data?

Machine Learning?
“Field of study that gives computers the ability to learn without being explicitly
programmed.” Arthur Samuel (1959).

https://www.linkedin.com/pulse/business-intelligence-its-relationship-big-data-geekstyle

Top Algorithms
• Supervised Learning
• Decision Trees
• Naïve Bayes Classification
• Ordinary Least Squares Regression
• Logistic Regression
• Support Vector Machines
• Ensemble Methods
• Unsupervised Learning
• Clustering Algorithms
• Centroid-based algorithms
• Connectivity-based algorithms
• Density-based algorithms
• Probabilistic
• Dimensionality Reduction
• Neural networks / Deep Learning
• Principal Component Analysis
• Singular Value Decomposition
• Independent Component Analysis

Example
• We want some hypothesis h that predicts whether we will be paid back
1. +a, -c, +i, +e, +o, +u: Y
2. -a, +c, -i, +e, -o, -u: N
3. +a, -c, +i, -e, -o, -u: Y
4. -a, -c, +i, +e, -o, -u: Y
5. -a, +c, +i, -e, -o, -u: N
6. -a, -c, +i, -e, -o, +u: Y
7. +a, -c, -i, -e, +o, -u: N
8. +a, +c, +i, -e, +o, -u: N
• Lots of possible hypotheses: will be paid back if…
• Income is high (wrong on 2 occasions in training data)
• Income is high and no Criminal record (always right in training data)
• (Address is known AND ((NOT Old) OR Unemployed)) OR ((NOT Address is known) AND (NOT
Criminal Record)) (always right in training data)
• Which one seems best? Anything better?

Decision trees
high Income?
yes no
NO
yes no
NO
Criminal record?
YES

Constructing a decision
tree, one step at a time address?
yes no
+a, -c, +i, +e, +o, +u: Y
-a, +c, -i, +e, -o, -u: N
+a, -c, +i, -e, -o, -u: Y
-a, -c, +i, +e, -o, -u: Y
-a, +c, +i, -e, -o, -u: N
-a, -c, +i, -e, -o, +u: Y
+a, -c, -i, -e, +o, -u: N
+a, +c, +i, -e, +o, -u: N
-a, +c, -i, +e, -o, -u: N
-a, -c, +i, +e, -o, -u: Y
-a, +c, +i, -e, -o, -u: N
-a, -c, +i, -e, -o, +u: Y
+a, -c, +i, +e, +o, +u: Y
+a, -c, +i, -e, -o, -u: Y
+a, -c, -i, -e, +o, -u: N
+a, +c, +i, -e, +o, -u: N
criminal? criminal?
-a, +c, -i, +e, -o, -u: N
-a, +c, +i, -e, -o, -u: N
-a, -c, +i, +e, -o, -u: Y
-a, -c, +i, -e, -o, +u: Y
+a, -c, +i, +e, +o, +u: Y
+a, -c, +i, -e, -o, -u: Y
+a, -c, -i, -e, +o, -u: N
+a, +c, +i, -e, +o, -u: N
income?
+a, -c, +i, +e, +o, +u: Y
+a, -c, +i, -e, -o, -u: Y
+a, -c, -i, -e, +o, -u: N
yes no
yes no
yes no Address was
maybe not the
best attribute to
start with…

Starting with a different attribute
• Seems like a much better starting point than address
• Each node almost completely uniform
• Almost completely predicts whether we will be paid back
yes no
+a, -c, +i, +e, +o, +u: Y
-a, +c, -i, +e, -o, -u: N
+a, -c, +i, -e, -o, -u: Y
-a, -c, +i, +e, -o, -u: Y
-a, +c, +i, -e, -o, -u: N
-a, -c, +i, -e, -o, +u: Y
+a, -c, -i, -e, +o, -u: N
+a, +c, +i, -e, +o, -u: N
criminal?
-a, +c, -i, +e, -o, -u: N
-a, +c, +i, -e, -o, -u: N
+a, +c, +i, -e, +o, -u: N
+a, -c, +i, +e, +o, +u: Y
+a, -c, +i, -e, -o, -u: Y
-a, -c, +i, +e, -o, -u: Y
-a, -c, +i, -e, -o, +u: Y
+a, -c, -i, -e, +o, -u: N

Linear Regression
• The task of fitting a
straight line through a
set of points

Logistic Regression
• Measures the relationship
between the categorical
dependent variable and
one or more independent
variables
• Credit Scoring
• Measuring the success
rates of marketing
campaigns
• Predicting the revenues of
a certain product

Support Vector Machine
• Binary classification algorithm
• SVM generates a (N — 1)
dimensional hyperlane to
separate those points into 2
groups.

Singular Value Decomposition
• PCA is actually a simple
application of SVD

Independent Component Analysis
• Revealing hidden factors that
underlie sets of random
variables.
• data variables are assumed to
be linear mixtures of some
unknown latent variables, and
the mixing system is also
unknown.
• The latent variables are
assumed non-gaussian and
mutually independent.
• Images, Documents

https://thinkgrowth.org/artificial-intelligence-and-you-demystifying-the-technology-landscape-21d4d34fbb10

Conclusions
• Data is nothing without insights
• Machine learning is the key for deriving insights from data
• Big Data and Machine Learning has a huge potential
• Mobile First to AI First Approach

Next Steps
• Machine Learning Course
• https://www.coursera.org/learn/machine-learning
• Big Data Course
• https://www.coursera.org/specializations/big-data
• Must know Machine Learning Algorithms
• http://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-
engineers.html
• Machine Learning Startups
• https://angel.co/machine-learning
• Recent Cool Startups
• http://www.informationweek.com/big-data/software-platforms/10-cool-machine-
learning-startups-to-watch/d/d-id/1326571

References
• https://www.slideshare.net/nasrinhussain1/big-data-ppt-
31616290?from_action=save
• http://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html
• https://www.slideshare.net/larsga/introduction-to-big-datamachine-learning/40-
Top_10_machine_learning_algs1
• https://www.google.com.pk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0ahUK
Ewjx0_2Q5b7TAhXFPRQKHfiNCV8QFggmMAE&url=http%3A%2F%2Fnlp.cs.rpi.edu%2Fco
urse%2Fspring16%2Flecture10.pptx&usg=AFQjCNFFg6FbId7hqMj1lwbufcfPvb7Wmw
• https://thinkgrowth.org/artificial-intelligence-and-you-demystifying-the-technology-
landscape-21d4d34fbb10
• https://www.linkedin.com/pulse/business-intelligence-its-relationship-big-data-geekstyle

Big data and machine learning for Businesses

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

En vedette

En vedette (8)

Similaire à Big data and machine learning for Businesses

Similaire à Big data and machine learning for Businesses (20)

Dernier

Dernier (20)

Big data and machine learning for Businesses