The document discusses machine learning and big data research at the Data Science Institute of Multimedia University. The institute conducts research across various domains using machine learning techniques. Some areas of research include high performance computing for massive data sources, social media analytics, smart cities, and public health analytics. The document provides examples of how machine learning can be applied to problems in business analytics like predictive customer churn analysis and operations analytics like predictive maintenance. It also outlines the basic machine learning process of obtaining data, exploring it, building predictive models, applying and validating models, and taking action based on forecasts.
2. Disclaimer: The views and opinions expressed in these slides are those of
the author and do not necessarily reflect the official policy or position
of Multimedia University. The analyses in these slides are examples only.
They should not be used in real-world analytic products, as they are
based on very limited and dated open-source information. Assumptions
made within the analysis do not reflect the position of Multimedia University.
3. Data Science Institute
• The Data Science Institute is a research
center based in the Faculty of Computing
& Informatics, Multimedia University.
• Its members bring expertise from across
faculties, including the Faculty of
Computing and Informatics, Faculty of
Engineering, Faculty of Management and
Faculty of Information Science and
Technology.
• It conducts research in leading data science
areas, including stream mining, video
analytics, machine learning, deep
learning, next-generation data
visualization and advanced data
modelling.
4. Research Structure (Domain, Sub-Domain, Research Areas)
Algorithm and Machine Learning
• High Performance and Parallel Computing
  1. HPC for massive heterogeneous data sources
  2. Enhanced algorithmic performance using shared and distributed memory parallel processing (GPGPU)
• Performance Optimization
  1. Big Data Stream Mining
  2. Data Storage
Social Media Analytics
• Data Mining
  1. Predictive Analytics
• Social Media Modelling
  1. Sentiment Analysis
  2. Topic Modelling
5. Research Structure (Domain, Sub-Domain, Research Areas)
Behavioral Analytics
• Media Analytics
  1. Media Recommender
  2. Customer Profiling
• Smart Cities
  1. Sensor networks
• Transport & mobility management
  1. Image and Video Analytics
• Network Analysis
  1. Fault Prediction
  2. Intrusion Prediction
6. Research Structure (Domain, Sub-Domain, Research Areas)
Public Health Analytics
• Public health data
  1. Infectious Disease modeling
  2. Home Monitoring and Sensing Technologies
• Multi-domain Electronic Health Records data
  1. Knowledge + Data Driven Risk Factor
  2. Text mining for clinical notes
Financial & Business Analytics
• Marketing and e-commerce
  1. Finance and Banking
• Financial market design and behavior
  1. Time Series Analysis
8. Machine learning is all around us…
• Machine learning is part of our daily lives
• Email spam detection
• Photo search using keywords
• Movie/song recommender systems
• Voice recognition
• Video captioning
• Self-driving cars
• etc.
10. Machine Learning 101
• Machine learning is a process for generalizing
from examples
• examples = example or "training" data
• generalizing = building "statistical models" to capture
correlations
• process = an ongoing process; we keep validating and
refitting models to improve accuracy
• Simple machine learning workflow:
• EXPLORE the data
• FIT models based on the data
• APPLY the models in prediction
• EVALUATE and validate the models
*Essentially, all models are wrong, but some are
useful
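The workflow above can be sketched in a few lines of Python. This is only a minimal illustration: the data values, the choice of a least-squares line as the model, and the function names (`explore`, `fit`, `apply_model`, `validate`) are assumptions made for the example, not part of the original slides.

```python
from statistics import mean

# Hypothetical example data: (hours of usage, monthly bill) pairs.
data = [(1, 12.0), (2, 14.1), (3, 15.9), (4, 18.2),
        (5, 19.8), (6, 22.1), (7, 24.0), (8, 25.9)]

# Hold out the last two examples so validation uses data the model never saw.
train, test = data[:6], data[6:]

def explore(rows):
    """EXPLORE: summarise the data before modelling."""
    xs, ys = zip(*rows)
    return {"n": len(rows), "mean_x": mean(xs), "mean_y": mean(ys)}

def fit(rows):
    """FIT: ordinary least squares for y = slope * x + intercept."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def apply_model(model, x):
    """APPLY: predict y for a new x."""
    slope, intercept = model
    return slope * x + intercept

def validate(model, rows):
    """VALIDATE: mean absolute error on held-out examples."""
    return mean(abs(apply_model(model, x) - y) for x, y in rows)

summary = explore(train)
model = fit(train)
error = validate(model, test)
print(summary, model, round(error, 3))
```

If the validation error is too high, the loop goes back to exploring and refitting, which is the "ongoing process" the slide refers to.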
11. 3 types of machine learning
• Supervised Learning – generalizing from labeled data
http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_2.jpg
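As a small illustration of generalizing from labeled data, the sketch below classifies a new point by majority vote among its nearest labeled neighbours (k-nearest neighbour). The feature values, the "ham"/"spam" labels and the function name `knn_predict` are hypothetical and chosen only for the example.

```python
import math
from collections import Counter

# Hypothetical labeled examples: (features, label).
# The two features could be, say, scaled counts of links and exclamation marks.
labeled = [
    ((1.0, 1.2), "ham"), ((0.8, 1.0), "ham"), ((1.2, 0.9), "ham"),
    ((4.0, 4.2), "spam"), ((4.3, 3.9), "spam"), ((3.8, 4.1), "spam"),
]

def knn_predict(examples, point, k=3):
    """Return the majority label among the k labeled examples closest to point."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(labeled, (1.1, 1.0)))
print(knn_predict(labeled, (4.1, 4.0)))
```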
12. 3 types of machine learning
• Unsupervised learning – generalizing from unlabeled data
http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_3.jpg
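Generalizing from unlabeled data can be illustrated with a bare-bones k-means clustering sketch. The points, the choice of initial centroids and the fixed iteration count are assumptions for the example; no labels are used at any step.

```python
import math

# Hypothetical unlabeled points (there is no target column).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and update steps for a fixed number of iterations."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Initial centroids are simply the first and last points (an arbitrary choice).
centroids, clusters = kmeans(points, centroids=[points[0], points[-1]])
print(centroids)
```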
13. 3 types of machine learning
• Reinforcement learning – generalizing from feedback received over time
http://cfs22.simplicdn.net/ice9/free_resources_article_thumb/Machine_Learning_5.jpg
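Learning from feedback over time can be sketched with an epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning settings. The reward means, noise level, step count and the function name `run_bandit` are hypothetical values chosen for the illustration.

```python
import random

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Estimate the value of each action from noisy reward feedback."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    values = [0.0] * len(true_means)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))                    # explore
        else:
            arm = max(range(len(values)), key=lambda a: values[a])  # exploit
        reward = rng.gauss(true_means[arm], 0.1)                    # feedback
        counts[arm] += 1
        # Incremental mean: nudge the estimate towards the observed reward.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

counts, values = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=lambda a: values[a])
print(best_arm, counts)
```

The agent is never told which action is best; it only sees rewards, and its estimates improve as feedback accumulates.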
15. Common machine learning techniques…
Naive Bayes
Decision Tree
K-Nearest Neighbour
Artificial Neural Network
Support Vector Machine
Ensemble Methods: Random Forest,
Bagging, AdaBoost
Logistic Regression
K-means
16. Which technique to use?
• What is the size and dimensionality of my
training set?
• Is my data linearly separable?
• How much do I care about
computational efficiency?
• Model building time vs real-time prediction time
• Eager vs lazy learning / online vs batch
learning
• Prediction performance vs speed
• Do I care about interpretability, or
should it "just work well"?
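One rough way to probe the linear separability question is to train a perceptron and see whether it reaches an epoch with no mistakes. The sketch below assumes labels of -1 and +1 and a hypothetical epoch budget; failing to converge within the budget only suggests, and does not prove, that the data is not linearly separable.

```python
def perceptron_converges(samples, max_epochs=100):
    """Return True if a perceptron classifies every sample correctly within max_epochs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in samples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            if label * score <= 0:  # misclassified or on the boundary
                weights = [w + label * x for w, x in zip(weights, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:
            return True
    return False

# AND is linearly separable; XOR is the classic case that is not.
and_gate = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
xor_gate = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

print(perceptron_converges(and_gate))
print(perceptron_converges(xor_gate))
```

If the check fails, a non-linear technique from the list above (for example a decision tree or a kernel SVM) may be a better starting point than a linear model.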
17. What can I do with machine learning?
• Customer Churn Analysis
• Predictive Maintenance
• Customer Segmentation
• Product Recommendation
18. Business Analytics: Predict Customer Churn
• Problem: Customer churn leads to lost income and high costs of
acquiring new customers
• Solution: Build a predictive model to forecast likely churn, act
pre-emptively and learn from historical data
1. Get customer data (set-top boxes, web logs, transaction history)
2. Explore data, and fit predictive models based on past or real-time data
3. Apply and validate models until predictions are accurate
4. Identify customers likely to churn
5. Escalate the incidents to Business Ops. to investigate and act accordingly
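Steps 2 to 4 above can be sketched with a small logistic regression trained by gradient descent. Everything concrete here is a hypothetical assumption for illustration: the two features, the customer names, the 0.5 threshold and the training settings. A real churn model would need far more data and a proper held-out validation set.

```python
import math

# Hypothetical history: ((share of inactive days, support tickets scaled to 0-1), churned?)
history = [
    ((0.1, 0.0), 0), ((0.2, 0.1), 0), ((0.3, 0.2), 0),
    ((0.7, 0.8), 1), ((0.9, 0.6), 1), ((0.8, 0.9), 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def churn_probability(model, features):
    """APPLY: estimated probability that a customer churns."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def fit_logistic(rows, learning_rate=0.5, epochs=2000):
    """FIT: batch gradient descent on the logistic loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, churned in rows:
            error = churn_probability((weights, bias), features) - churned
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias

model = fit_logistic(history)

# Step 4: score current customers and flag those above a hypothetical threshold.
current = {"cust_a": (0.15, 0.1), "cust_b": (0.85, 0.8), "cust_c": (0.25, 0.05)}
at_risk = sorted(name for name, f in current.items() if churn_probability(model, f) > 0.5)
print(at_risk)
```

The `at_risk` list is what would be handed to Business Ops. in step 5.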
19. Operations Analytics: Predictive Maintenance
• Problem: Network or service outages lead to lost income and high
expenses
• Solution: Build a predictive model to forecast likely outages, act
pre-emptively and learn from historical data
1. Get resource usage data (latency, syslog, outage reports)
2. Explore data, and fit predictive models based on past or real-time data
3. Apply and validate models until predictions are accurate
4. Forecast resource saturation, demand and usage
5. Escalate the incidents to IT Ops. to investigate and act accordingly
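Step 4, forecasting resource saturation, can be sketched by fitting a linear trend to recent usage and extrapolating to a capacity limit. The usage readings, the 90% capacity limit and the 30-day alert horizon are hypothetical values for illustration; real usage is rarely this linear, so treat this as a sketch under that assumption.

```python
from statistics import mean

# Hypothetical daily disk usage in percent, one reading per day (day 0 is the oldest).
usage = [50.0, 52.0, 54.0, 56.0, 58.0, 60.0]

def fit_trend(values):
    """Least-squares line through (day index, value) pairs."""
    days = range(len(values))
    mean_day, mean_value = mean(days), mean(values)
    slope = sum((d - mean_day) * (v - mean_value) for d, v in zip(days, values)) / sum(
        (d - mean_day) ** 2 for d in days
    )
    return slope, mean_value - slope * mean_day

def days_until_saturation(values, capacity=90.0):
    """Days from the latest reading until the trend line reaches capacity."""
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None  # usage is flat or falling, so no saturation is forecast
    saturation_day = (capacity - intercept) / slope
    return max(0.0, saturation_day - (len(values) - 1))

days_left = days_until_saturation(usage)
# Step 5: escalate to IT Ops. when saturation falls within the alert horizon.
escalate = days_left is not None and days_left <= 30
print(days_left, escalate)
```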
20. Summary: The machine learning process
• Problem: Identify a problem that may cost time and money
• Solution: Build a predictive model to forecast likely incidents, act
pre-emptively and learn
1. Get all data relevant to the problem
2. Explore the data, and fit predictive models on past or real-time data
3. Apply and validate models until predictions are accurate
4. Forecast KPIs and metrics associated with the use case
5. Escalate incidents to the respective units to investigate and act