The Paradigm of Fog Computing with Bio-inspired Search Methods and the “5Vs” of Big Data
1. The Paradigm of Fog Computing
with Bio-inspired Search Methods
and the “5Vs” of Big Data
Presenters:
Richard Millham, Israel Edem
Agbehadji, and Samuel Ofori Frimpong
Durban University of Technology, South Africa
2. Outline
• Introduction
• Growth of Big Data
• The 5Vs of Big Data
• Framework to Manage Big Data
• Data Streaming vs Datasets
• Edge/Fog Computing paradigm
• Challenges of Fog Computing and Potential Solutions
• Conclusion
3. Introduction
• This presentation seeks to briefly present some of the issues of
big data:
• What characteristics constitute big data?
• What methods and phases are needed to process big data?
• Datasets vs data streaming? What is the difference?
• What is the role and domain of bio-inspired algorithms?
• What are the drivers for fog/edge computing architecture?
4. Big data
• Like many concepts, there is no consensus on what constitutes big data
• Many will say big data is a voluminous amount of varied data arriving at a high rate, but it possesses other characteristics as well (the 5 Vs)
• By itself, big data yields neither meaning nor value; it is important to understand the unique features of the data, which may inform the analysis
• Any framework for analysing big data must address the big data characteristics, namely velocity, variety, veracity, volume and value
• Sources of big data are numerous but have evolved with our changing
society
• IOT and smart entities
• Enterprise systems
• Social media
5. The growth of IOT, along with the subsequent growth of IOT data, is one of
the main contributors to the growth of Big Data and the need for methods to
manage it
6. Smart Cities and IOT Sensors/Data Analytics
• Smart cities enable their citizens to enjoy a wide range of new services:
• The health sector can monitor the quality of service delivery
• Government gains better insights for better social intervention programs for citizens
• Companies can understand their customers' perception of products
• These services are enabled through the use of IOT sensors to monitor the environment and data analytics to make sense of the monitored data collected
7. The 5-Vs of Big Data
8. Big Data Framework
• To manage big data, a framework consisting of a set of steps and phases is needed. Although some of these phases may overlap and the steps may vary, the framework is as follows:
• Data Pre-Processing
• Data Cleansing
• Acquire data from a multitude of heterogeneous devices: social media, IOT sensors, mobile phones, enterprise system transactions, GPS devices, etc.
• Estimate missing values, if needed
• Remove redundant values
• Reformat heterogeneous data into more uniform format(s)
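The pre-processing steps above (estimate missing values, remove redundant values, reformat into a uniform shape) can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool's API; the record fields (`source`, `value`) and the mean-imputation choice are assumptions for the example.

```python
from statistics import mean

def preprocess(readings):
    """Pre-processing sketch for heterogeneous sensor readings.

    Each reading is a dict with a 'value' that may be missing (None)
    and a 'source' label; field names here are illustrative only.
    """
    # Estimate missing values with the mean of the observed ones
    observed = [r["value"] for r in readings if r["value"] is not None]
    fill = mean(observed) if observed else 0.0
    cleaned, seen = [], set()
    for r in readings:
        value = r["value"] if r["value"] is not None else fill
        key = (r["source"], value)
        if key in seen:          # remove redundant (duplicate) readings
            continue
        seen.add(key)
        # Reformat heterogeneous records into one uniform shape
        cleaned.append({"source": r["source"], "value": float(value)})
    return cleaned

readings = [
    {"source": "iot", "value": 10},
    {"source": "iot", "value": 10},      # redundant
    {"source": "gps", "value": None},    # missing
    {"source": "sms", "value": 14},
]
print(preprocess(readings))
```

In a real pipeline, each step would be tuned to the data source; this sketch only shows the order of operations the framework describes.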
9. Big Data Framework (cont)
Data Cleansing (Data Reduction)
(Figure: data scattered in 3-D space)
• One of the most important steps
in data cleansing is data
reduction (reducing the amount
of data to be processed by later
stages). This can be
accomplished by:
• Removing outliers (noise)
• Removing redundant data
• Removing non-interesting data
(with little value)
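A minimal sketch of the data-reduction step: remove outliers (noise) and redundant values. The z-score rule and its 2.0 cutoff are illustrative assumptions, not a prescribed method.

```python
from statistics import mean, stdev

def reduce_data(values, z_cut=2.0):
    """Data-reduction sketch: drop values more than z_cut standard
    deviations from the mean (outliers), then drop exact duplicates."""
    mu, sigma = mean(values), stdev(values)
    kept, seen = [], set()
    for v in values:
        if sigma and abs(v - mu) / sigma > z_cut:
            continue                # outlier (noise)
        if v in seen:
            continue                # redundant value
        seen.add(v)
        kept.append(v)
    return kept

print(reduce_data([10, 11, 10, 12, 11, 500]))
```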
10. Big Data Framework (cont)
• After data cleansing is complete, the next step is data clustering: combining similar items into groups for easier processing of the data in later stages
• Clustering methods include:
• K-Nearest Neighbour (strictly a similarity-based classifier, often used to group similar items)
• Density-based scan (DBSCAN), which discovers clusters of different shapes
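The density-based idea can be illustrated with a much-simplified one-dimensional sketch: points closer together than a threshold `eps` join the same cluster. Real DBSCAN also applies a `min_samples` core-point rule and works in any number of dimensions; this toy version only shows the density intuition.

```python
def density_clusters(values, eps=1.0):
    """Much-simplified density-based grouping in one dimension:
    sorted points closer than eps join the same cluster."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= eps:
            clusters[-1].append(v)   # close enough: extend current cluster
        else:
            clusters.append([v])     # gap larger than eps: start a new cluster
    return clusters

print(density_clusters([1.0, 1.5, 2.0, 8.0, 8.4, 20.0]))
```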
11. Big Data Framework (cont)
Feature Extraction and Classification
• The next step after data clustering
is feature extraction and
classification where important
features are extracted from the
data and classified (labeled). This
reduces the amount of resources
used to describe a group of data
• Many tools may be used including:
• Autoencoder (to learn unlabeled
data)
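The idea that extracted features reduce the resources needed to describe a group of data can be sketched simply: summarise a raw window with two features, then label it by its nearest labelled centroid. The feature choice (mean, standard deviation), the class names, and the centroids are all illustrative assumptions; an autoencoder would instead learn its encoding from unlabeled data.

```python
from statistics import mean, stdev

def extract_features(window):
    """Feature-extraction sketch: summarise a raw sensor window with two
    descriptive features instead of keeping every raw value."""
    return (mean(window), stdev(window))

def classify(features, labelled_centroids):
    """Classification sketch: label a feature vector by its nearest
    labelled centroid (centroids here are assumed, not learned)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(labelled_centroids, key=lambda lab: dist(features, labelled_centroids[lab]))

centroids = {"steady": (10.0, 0.5), "volatile": (10.0, 5.0)}
feats = extract_features([9.8, 10.1, 10.0, 9.9, 10.2])
print(classify(feats, centroids))
```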
12. Big Data Framework (cont)
• Data Mining Phase
• This phase involves finding relationships
among groups of data identified during the
previous phase
• These relationships include correlations
(dependencies among variables) and
association rules (if-then rules) among others
• Methods include Apriori, PageRank, etc.
• Many data mining tools exist, using a variety
of methods, including:
• Orange
• Weka
• Apache Mahout
• RapidMiner
• KNIME integrates various components
for machine learning and data mining.
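The association-rule idea behind Apriori can be sketched at its first level: count item pairs across transactions and keep those whose support clears a threshold. The baskets and the 0.5 threshold are made-up illustrations; full Apriori extends this level by level, pruning candidate itemsets as it goes.

```python
from itertools import combinations

def frequent_pairs(transactions, min_support=0.5):
    """Apriori-flavoured sketch: keep item pairs whose support (share of
    transactions containing the pair) meets min_support."""
    counts = {}
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    n = len(transactions)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}

baskets = [
    {"hamburgers", "rolls", "cola"},
    {"hamburgers", "rolls"},
    {"rolls", "butter"},
    {"hamburgers", "rolls", "mustard"},
]
print(frequent_pairs(baskets))
```

This mirrors the hamburgers-and-rolls example later in the deck: the pair appears in three of four baskets, so it survives the support cut while every other pair is pruned.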
13. Big Data Framework (cont)
• Visualisation/Business Intelligence Phase
• In this phase, the data relationships and classes identified in previous stages may be visualized in the form of pie charts, bar charts, line graphs, etc. and/or incorporated into business rules within the organization.
• Some examples:
• A line graph may show the increase/decrease in sales of particular products based on particular features offered. Hence, businesses may be able to determine the most popular features for each price range
• Business rules may find associations between different itemsets. For example, a store might find a strong association between the sale of hamburgers and rolls.
14. Datasets vs Data Streams
• Datasets may exhibit high volume, veracity, value and variety but are fixed in terms of velocity. In other words, these datasets may contain four of the five Vs of big data; high-velocity data may arrive while the dataset is being formed, but once the dataset is formed, it is stable. Consequently, many different methods and tools may be used to analyse it
• Data streams, on the other hand, share the characteristics of datasets but also exhibit continuous high velocity, with often-changing variety, value, and veracity of data. Analysis of this data, due to these characteristics, is problematic and would require huge computational resources (i.e. a supercomputer)
15. Datasets vs Data Streams (cont)
• As this solution is not usually practical,
different methods must be used to
manage data streams including:
• Fixed or random sampling of the stream (e.g. 1 in 50 frames) to get a snapshot of current data
• Sliding windows to contain these
samples and to ensure that these
samples are current as the streams
may change
• Potentially different methods that are
used for data streams in order to
handle the high velocity and produce
satisfactory results
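The two stream-management techniques above can be sketched together: reservoir sampling keeps a uniform random sample of a stream of unknown length in one pass, and a bounded sliding window keeps only the most recent items so the sample stays current. The stream contents, sample size, and window length are illustrative choices.

```python
import random
from collections import deque

def reservoir_sample(stream, k, rng=random.Random(42)):
    """Classic reservoir sampling: a uniform random sample of size k
    from a stream of unknown length, using one pass and O(k) memory."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item   # keep item with probability k/(i+1)
    return sample

# A sliding window retains only the most recent readings, so analysis
# reflects the stream as it changes rather than its whole history
window = deque(maxlen=5)
for reading in range(12):
    window.append(reading)

print(sorted(reservoir_sample(range(1000), 10)))
print(list(window))
```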
16. Big Data Analytics
• The following diagram shows some of the methods mentioned, or to be mentioned, in this presentation under the term Big Data Analytics
• Batch (dataset) vs stream processing
• Machine learning and advanced
learning (feature extraction,
classification, and business rules)
• Data mining
• Stochastic (probability) models for
preprocessing of noise, feature
extraction, classification, etc
• Edge computing and cloud computing
18. Bio-inspired Computation
• Bio-inspired computation models the natural behavior of animals
(optimized over a very long time period) to achieve some set goal
• Numerous bio-inspired algorithms exist (200+) each with their
advantages and disadvantages
• One basic premise of these algorithms is exploration vs exploitation
• exploration: search different regions of the solution space to find a global solution
• exploitation: search a small region around the present solution in order to improve its quality with a small perturbation
• Bio-inspired algorithms have been used in many application domains
such as route optimization, recommender systems, renewable energy
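The exploration/exploitation trade-off can be shown with a toy search loop, not any named bio-inspired algorithm: with some probability the search jumps to a random region (exploration), otherwise it slightly perturbs the best-known solution (exploitation). The objective function, step size, and probabilities are all illustrative assumptions.

```python
import random

def bio_search(f, lo, hi, iters=2000, step=0.1, p_explore=0.2,
               rng=random.Random(0)):
    """Toy search balancing exploration and exploitation: with
    probability p_explore sample anywhere in [lo, hi]; otherwise
    perturb the best-known solution by at most step."""
    best = rng.uniform(lo, hi)
    for _ in range(iters):
        if rng.random() < p_explore:
            cand = rng.uniform(lo, hi)               # exploration
        else:
            cand = best + rng.uniform(-step, step)   # exploitation
        cand = min(max(cand, lo), hi)
        if f(cand) < f(best):                        # keep improvements only
            best = cand
    return best

# Minimise (x - 3)^2 on [-10, 10]; the optimum is x = 3
x = bio_search(lambda x: (x - 3) ** 2, -10, 10)
print(round(x, 2))
```

Real bio-inspired algorithms (200+ of them) differ mainly in how a population of candidates negotiates this same trade-off.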
21. Why is Edge/Fog Computing Needed?
Cloud Computing
Problems with Cloud – Need for New
Paradigm
• As illustrated in the diagram, big data (huge amounts from many types of devices, flowing at high speed) is sent to the cloud to be processed using the big data framework there
• The network soon becomes overloaded, as many early phases (preprocessing and data reduction) are done only in the cloud [bottleneck]
22. Fog Computing Paradigm
• The focus is on devices connected to the
edge of networks.
• Fog computing (or edge computing) operates on the concept that, instead of devices working from a centralized location (i.e. a cloud server), fog systems operate at the ends of the network (Naha et al. 2018).
• An advantage of fog computing is that it avoids delay by processing raw data collected from edge networks locally rather than sending it directly to the cloud for processing
25. Fog computing applications
• Smart city monitoring
• Energy-efficient model
• Fog computing in health monitoring
26. Quality Challenge of Fog Computing and the 5Vs, and a Solution
• There are many issues in fog computing with big data, but a key challenge is the issue of data quality.
• Solution: a Fog Computing and “5Vs” for Quality-of-Use (QoU) framework.
• This framework has an analytical model that considers the speed, size and type of data from IoT devices and then determines the quality and importance of the data to store on the cloud platform.
• The framework has two components, namely IoT (data) and fog computing
• The IoT (data) component is the location of sensors and Internet-enabled devices, which capture large amounts of data, at speed, and of different types
• The data generated are processed and analyzed by the fog computing component to produce quality data that is useful
27. More Challenges in Fog Computing and IoT
• The challenges include:
• energy consumption
• data distribution
• heterogeneity of edge devices
• dynamicity of the fog network, etc.
• This motivates finding new methods to address these challenges
• One promising method is the use of bio-inspired algorithms (a subset of evolutionary algorithms) to manage different aspects of these problems
28. Fog Computing and Evolutionary Algorithm Models
• Evolutionary Algorithm for Energy Efficient Model.
• Bio-Inspired Algorithm for Scheduling of Service Requests
to Virtual Machine (VMs).
• Bio-Inspired Algorithms and Fog Computing for Intelligent
Computing in Logistic Data Center.
• Ensemble of Swarm Algorithm for Fire-and-Rescue
Operations.
• Evolutionary Computation and Epidemic Models for Data
Availability in Fog Computing.
• Bio-Inspired Optimization for Job Scheduling in Fog
Computing.
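As a hedged illustration of the job-scheduling item above (not the model from any of the listed works): a tiny genetic algorithm can assign jobs to fog nodes so as to minimise the makespan, the load of the busiest node. The job durations, node count, and GA parameters are all assumed for the example.

```python
import random

def schedule_jobs(job_times, n_nodes, gens=200, pop_size=30,
                  rng=random.Random(1)):
    """Genetic-algorithm sketch for job scheduling: evolve assignments
    of jobs to fog nodes, minimising the busiest node's total load."""
    def makespan(assign):
        loads = [0.0] * n_nodes
        for job, node in zip(job_times, assign):
            loads[node] += job
        return max(loads)

    # Random initial population of job-to-node assignments
    pop = [[rng.randrange(n_nodes) for _ in job_times] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=makespan)
        survivors = pop[: pop_size // 2]            # selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(len(job_times))
            child = a[:cut] + b[cut:]               # one-point crossover
            i = rng.randrange(len(job_times))
            child[i] = rng.randrange(n_nodes)       # mutation
            children.append(child)
        pop = survivors + children
    best = min(pop, key=makespan)
    return best, makespan(best)

jobs = [4, 2, 7, 3, 5, 1, 6]
assign, cost = schedule_jobs(jobs, n_nodes=3)
print(assign, cost)
```

The same selection/crossover/mutation loop generalises to the other scheduling problems listed, with the fitness function swapped for energy use, data availability, or response time.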
29. Conclusion
• This presentation is a brief overview of big data along with many of its aspects
• Increasing technological and societal changes make big data much more prevalent
• With increasing prevalence of big data comes a demand to manage this data (particularly
data streams) through new methods and new architectures (edge/fog computing)
• Promising methods have emerged in the field of bio-inspired algorithms which have been
applied to a variety of domains, including challenges with new architectures
Editor's notes
K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions).
Auto-encoder: is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning)
PageRank is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web.