Academic Skills in English, Oct. 12, 2015
Ins. Colette Gattoni (M.Ed.)
Visualizing Big Data
Dawit Nida
Abstract
Ever since the creation of computers, data have been generated continuously, in several
forms and structures. Many organizations use Big Data, a volume of data too large to be
handled by traditional data management systems, to make business decisions.
Big Data can refer to unstructured, semi-structured or structured data generated by users.
Unstructured data includes emails, texts, and sensor data; structured data includes transactions.
Big Data can be characterized by the four Vs: Volume, Velocity, Variety and Veracity. Handling Big
Data requires expertise in the field to minimize risks in security, data management and analytics.
Data scientists, who perform different kinds of analytics for organizations, use a set of techniques for
processing large amounts of data to support business decisions using business intelligence tools and methods.
Keywords
Big Data, Analytics, Business Intelligence
Contents
Introduction
1 Why Big Data is ‘BIG’?
1.1 The Four Vs of Big Data
1.2 The Risks of Big Data
2 Big Data Analytics
2.1 Types of Analytics
3 BI and Big Data
4 Conclusion
References
Introduction
Big Data is a volume of data too large to be
handled by traditional data management systems.
To handle these data, analytics is required.
Analytics is the process of knowledge discovery:
extracting valuable trends from data that can
be visualized to reveal insights and patterns.
It relies on a set of techniques and tools to
collect, store, and visualize data, turning it
into valuable information that supports
efficient business decisions.
1. Why Big Data is ‘BIG’?
Big data is data like any other, but in huge
amounts, from several sources and in various
forms, that cannot be handled by traditional
data management systems. Big data can be
unstructured, semi-structured or structured.
Unstructured data from social media and
sensors includes emails, texts, sensor readings,
video and audio. Structured data includes
transaction data from customers.
For instance, according to an infographic posted
on iDigitalTimes (July 2013) [1][2], the following
happens on the internet every 60 seconds:
• 204 million emails are sent
• 1.8 million Facebook likes are generated by users
• 278,000 tweets are posted
• 200,000 photos are uploaded to Facebook
• Around 100 hours of video are uploaded to
YouTube
• 20 million photos are viewed on Flickr
• 120 new users register on LinkedIn
• 88,000 calls are made on Skype
• €73,000 is spent on Amazon
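To give a feel for the scale, the per-minute figures can be extrapolated to a full day. The sketch below is only illustrative: it assumes the rates stay constant over 24 hours, which the infographic does not claim, and the dictionary and function names are invented for this example.

```python
# Per-minute figures quoted in the list above (from the cited infographic).
PER_MINUTE = {
    "emails sent": 204_000_000,
    "Facebook likes": 1_800_000,
    "tweets": 278_000,
    "photos uploaded to Facebook": 200_000,
    "Skype calls": 88_000,
}

MINUTES_PER_DAY = 24 * 60  # 1,440 minutes


def per_day(per_minute):
    """Scale a per-minute count to a per-day count (assumes a constant rate)."""
    return per_minute * MINUTES_PER_DAY


# Extrapolated daily counts for each activity.
daily = {name: per_day(count) for name, count in PER_MINUTE.items()}

for name, count in daily.items():
    print(f"{name}: {count:,} per day")
```

Under that constant-rate assumption, 204 million emails per minute corresponds to roughly 294 billion emails per day.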
Figure 1. Two selected Big Data articles analysed using the R programming language
1.1 The Four Vs of Big Data
The four Vs described below define the
features of Big Data [3].
Volume: refers to the amount of data generated
every millisecond. Huge amounts of transactional
data, machine and sensor data, social media
data, and enterprise data are collected and
stored every second. Even though storage cost
is not a big issue, sorting the data by
relevance is today’s problem.
Velocity: refers to the speed of streaming data
that must be processed in real time. It covers
not only how fast data is produced or changed
but also how fast it has to be received,
understood and processed.
Variety: in addition to size and speed,
data also varies in structure and type.
For instance, emails, photos, video and audio
are examples of unstructured data that
can be used alongside traditional structured
data that fits into tables and relational databases.
Veracity: refers to the trustworthiness (or
uncertainty) of the data: where the data came
from and how accurate it is. The number of
sources and the amount of data directly
affect the value and quality of the data.
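The Variety dimension can be made concrete with a small sketch that loads one structured, one semi-structured and one unstructured item using only the Python standard library. All sample values, column names and function names below are hypothetical and chosen purely for illustration.

```python
import csv
import io
import json

# Structured: rows that fit a fixed table schema (hypothetical columns).
structured_source = "order_id,customer,amount\n1001,alice,25.50\n"

# Semi-structured: self-describing JSON whose fields may vary per record.
semi_structured_source = '{"user": "bob", "event": "like", "tags": ["photo", "travel"]}'

# Unstructured: free text with no schema at all.
unstructured_source = "Hi team, the sensor in hall 3 reported a fault again this morning."


def load_structured(text):
    """Parse CSV text into one dict per row, keyed by the header columns."""
    return list(csv.DictReader(io.StringIO(text)))


def load_semi_structured(text):
    """Parse a JSON document; the keys come from the data itself."""
    return json.loads(text)


def load_unstructured(text):
    """Wrap free text with the little metadata available without analysis."""
    return {"text": text, "word_count": len(text.split())}


rows = load_structured(structured_source)
event = load_semi_structured(semi_structured_source)
note = load_unstructured(unstructured_source)

print(rows[0]["amount"], event["tags"], note["word_count"])
```

The structured row can be queried by column straight away, the JSON record has to be inspected to learn its fields, and the free text needs further analysis (for example text mining) before it yields anything beyond a word count.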
1.2 The Risks of Big Data
Because of the nature of Big Data, the pace of
technological advancement, and the limited
expertise of the data scientists who handle and
visualize data, Big Data brings the following
risks for organizations that consider using it [4].
Bad data and bad data analysis: collecting data
does not by itself help a company’s business grow
and compete with similar businesses. Irrelevant
and outdated data can lead business decisions
to undesired outcomes. Misinterpreting patterns
and trends in the data during analysis is
also risky.
Security: Big Data can contain sensitive data,
and the logistics of data collection and
analysis can be insecure and may expose it.
Moreover, the bigger the data, the bigger the
risk to the company. Data privacy is closely
related to the issue of security.
Costs: collecting data, aggregating and processing
it, storing it, and then analysing it to generate
reports and visual graphs requires budget planning
and money. Big Data might therefore be costly
compared to the benefit the data later delivers.
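As an illustration of the first risk, bad data, the sketch below filters out incomplete and outdated records before any analysis is run. The records, the one-year freshness threshold and the function name are assumptions made for this example, not rules taken from the cited sources.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value.
records = [
    {"customer": "A", "revenue": 120.0, "updated": date(2015, 9, 30)},
    {"customer": "B", "revenue": None, "updated": date(2015, 10, 1)},
    {"customer": "C", "revenue": 80.0, "updated": date(2012, 1, 15)},
]


def is_usable(record, today, max_age_days=365):
    """A record is usable if it has no missing values and is recent enough."""
    complete = all(value is not None for value in record.values())
    fresh = (today - record["updated"]).days <= max_age_days
    return complete and fresh


today = date(2015, 10, 12)  # fixed reference date so the example is reproducible
usable = [r for r in records if is_usable(r, today)]

print([r["customer"] for r in usable])
```

Only customer A passes both checks: B has a missing revenue value and C was last updated more than a year before the reference date.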
2. Big Data Analytics
Today, many organizations collect and store
petabytes and exabytes of data from customers,
sensors, transactions and various other sources.
Working out what is important and what is not
requires high-performance analytics and new
advances in computing technology. Analytics is
simply knowledge discovery: extracting valuable
trends from data that can be visualized to
reveal insights, unknown correlations and hidden
patterns. It uses high-performance data and text
mining, diagnostic and predictive analytics, and
prescriptive analytics for forecasting and
optimization to make the best possible business
decisions for an organization [5].
2.1 Types of Analytics
Data visualization refers to the approaches and
tools used to visually understand the insights
from data in order to prove or disprove a hypothesis.
Organizations use four types of analytics to
achieve high-performance analytics and increase
competence in their domains.
1. Descriptive analytics is a set of techniques
for reviewing and examining a data set to
understand the data and analyse business
performance, providing insight into the past.
2. Diagnostic analytics is a set of techniques for
determining what has happened and why, for
example why demand for services is low or high,
how customers are segmented, and whether the
data shows any trend.
3. Predictive analytics is a set of techniques that
analyse current and historical data to determine
what is most likely (or unlikely) to happen and
who needs more, and that help to model and forecast.
4. Prescriptive analytics is a set of techniques for
computationally developing and analysing
alternatives in a tactical or strategic model and
discovering the unexpected. It seeks to determine
the best solution or outcome among various
choices, given the known parameters.
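The four types listed above can be sketched on a toy data set. This is a minimal illustration under stated assumptions, not a method from the cited sources: the monthly figures, the profit margin and the candidate budgets are invented, and a simple least-squares line stands in for a real predictive model.

```python
from statistics import mean

# Hypothetical monthly data: advertising spend and units sold.
ad_spend = [10, 12, 15, 11, 18, 20]
units_sold = [100, 115, 140, 108, 170, 190]

# 1. Descriptive: summarize what happened.
average_units = mean(units_sold)


# 2. Diagnostic: check whether sales move together with spend (Pearson correlation).
def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


correlation = pearson(ad_spend, units_sold)


# 3. Predictive: fit a least-squares line and forecast sales for a given spend.
def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


slope, intercept = fit_line(ad_spend, units_sold)


def predict(spend):
    return slope * spend + intercept


# 4. Prescriptive: pick the candidate spend with the best predicted profit.
UNIT_MARGIN = 0.5  # hypothetical profit per unit, in the same currency as spend
candidates = [10, 15, 20, 25]


def predicted_profit(spend):
    return predict(spend) * UNIT_MARGIN - spend


best_spend = max(candidates, key=predicted_profit)

print(average_units, round(correlation, 3), round(predict(22), 1), best_spend)
```

Because the stand-in model is linear, the prescriptive step simply picks the largest candidate whenever each extra unit of spend is predicted to earn more than it costs; a realistic model would include diminishing returns and budget constraints.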
3. BI and Big Data
Business Intelligence (BI) comprises the techniques,
processes, technology and tools required to
collect, store, and analyse data, turning it into
valuable information and business advantage.
BI lets users easily consolidate, search, and
visually analyse data and gain unexpected
business insights by understanding how data
is associated.
Usefulness of BI
• Optimize business processes
• Make better decisions at every level of
the organization, based on facts
• Create a better customer experience
• Improve competitiveness
• Increase revenues, etc.
Data discovery and visualization in BI can take the form of
• Reports
• Scorecards
• Dashboards
• Ad hoc analysis
• Visualization
4. Conclusion
To summarize, finding ways to use Big Data
and analytics has become a major concern for
organizations ranging from small and mid-sized
businesses to big corporations, all looking for
ways to keep up with larger competitors. Big Data
helps organizations improve product quality,
marketing operations and customer relationships,
and make better business decisions. BI and data
visualization are essential for utilizing Big Data
through descriptive, diagnostic, predictive or
prescriptive analytics, or a combination of the
four. Although Big Data has risks and weaknesses,
it has been called a renewable oil that can be
applied in many organizations and fields.
References
[1] Big data: what it is and why it matters.
[2] Tomas Eklund. DW lecture 4: Big data. ÅAU, 4.5.2015.
[3] Big data analytics: advanced analytics in Oracle Database. 2013.
[4] Bernard Marr. The 5 biggest risks of big data.
[5] Marko Grobelnik. Big data tutorial.