The document discusses big data, including what it is, its history, current considerations, and importance. It notes that big data refers to large volumes of structured and unstructured data that businesses deal with daily. While the term is relatively new, collecting and storing large amounts of information for analysis has existed for a long time. Big data is now defined by its volume, velocity, and variety. Businesses can gain insights from big data analysis to make better decisions and strategic moves.
Big Data
What it is and why it matters
Big data is a term that describes the large
volume of data – both structured and
unstructured – that inundates a business
on a day-to-day basis. But it’s not the
amount of data that’s important. It’s what
organizations do with the data that
matters. Big data can be analyzed for
insights that lead to better decisions and
strategic business moves.
Big Data History and Current
Considerations
While the term “big data” is relatively new,
the act of gathering and storing large
amounts of information for eventual
analysis is ages old. The concept gained
momentum in the early 2000s when
industry analyst Doug Laney articulated
the now-mainstream definition of big data
as the three Vs:
Volume. Organizations collect data from a
variety of sources, including business
transactions, social media and information
from sensor or machine-to-machine data.
In the past, storing it would’ve been a
problem – but new technologies (such as
Hadoop) have eased the burden.
Velocity. Data streams in at an
unprecedented speed and must be dealt
with in a timely manner. RFID tags,
sensors and smart metering are driving the
need to deal with torrents of data in near-
real time.
Variety. Data comes in all types of formats
– from structured, numeric data in
traditional databases to unstructured text
documents, email, video, audio, stock
ticker data and financial transactions.
At SAS, we consider two additional
dimensions when it comes to big data:
Variability. In addition to the increasing
velocities and varieties of data, data flows
can be highly inconsistent with periodic
peaks. Is something trending in social
media? Daily, seasonal and event-triggered
peak data loads can be challenging to
manage. Even more so with unstructured
data.
Complexity. Today's data comes from
multiple sources, which makes it difficult
to link, match, cleanse and transform data
across systems. However, it’s necessary to
connect and correlate relationships,
hierarchies and multiple data linkages or
your data can quickly spiral out of control.
Big data’s big potential
The amount of data that’s being created
and stored on a global level is almost
inconceivable, and it just keeps growing.
That means there’s even more potential to
glean key insights from business
information – yet only a small percentage
of data is actually analyzed. What does
that mean for businesses? How can they
make better use of the raw information
that flows into their organizations every
day?
Why Is Big Data Important?
The importance of big data doesn’t revolve
around how much data you have, but what
you do with it. You can take data from any
source and analyze it to find answers that
enable 1) cost reductions, 2) time
reductions, 3) new product development
and optimized offerings, and 4) smart
decision making. When you combine big
data with high-powered analytics, you can
accomplish business-related tasks such
as:
Determining root causes of failures, issues
and defects in near-real time.
Generating coupons at the point of sale
based on the customer’s buying habits.
Recalculating entire risk portfolios in
minutes.
Detecting fraudulent behavior before it
affects your organization.
Why is Big Data so Important in Today’s
World?
There is no doubt that industries are
being transformed by the explosion of
data. No sector has remained untouched
by this drastic change over the past
decade. Technology has made its way into
every business arena and has become an
essential part of every process. In the IT
industry specifically, software and
automation are bare essentials, used in
every phase of a process cycle.
Businesses are focusing more on agility
and innovation than on stability, and
adopting big data technologies helps
companies achieve that quickly. Big
data analytics has not only allowed
firms to keep up with changing
dynamics but has also let them predict
future trends, giving them a competitive edge.
What is driving the widespread adoption of
big data across the industries?
Let's look at the reasons behind the
hype around big data:
Firms witnessing surprising growth
Big Data is taking the world by storm
through its many benefits. It is allowing
leading firms such as IBM and Amazon to
develop cutting-edge technologies that
provide high-end services to their
customers.
"Orchestrating Big Data, Cloud and Mobility
strategies leads to 53% greater growth than
peers not adopting these technologies." -
Forbes.
A survey conducted by Dell revealed that
companies using Big Data, Cloud and
Mobility are well ahead, with 53% greater
growth than late adopters or
non-adopters.
Although Big Data is still in its nascent
stage, it is giving 50% more revenue to
companies that have integrated it into
their processes. This highlights that Big
Data is acting like oil or coal, fueling
the staggering performance of today's
businesses.
Getting a hold on the dark data
Gartner Inc., a renowned consulting firm,
describes dark data as the “information
assets that organizations collect, process
and store in the course of their regular
business activity, but generally fail to use
for other purposes.”
However, the advent of Big Data systems
has allowed companies to put this
untouched data to use and extract
meaningful insights from it. Data that was
overlooked or considered useless in the
past has suddenly become a goldmine.
Companies can accelerate their processes
and thus reduce their operating costs, all
thanks to big data analytics.
Software is eating the world
We are currently in a data-driven economy
where no organization can survive without
analyzing the current and future trends.
Whether it is a manufacturing firm or a
retail chain, wrangling data has become a
crucial job to be done before taking a
single step further.
Because customers are central to any
organization, it is imperative to address
their needs on time. This is possible only
if business strategies are backed by
strong software that supports and
accelerates business operations. This
ultimately fuels the need for powerful big
data technologies that can benefit
organizations in numerous ways.
Smarter and faster decision-making
In this era of fierce competition, everyone
wants to stand out from the crowd. But
how? How can a company be perceived as
unique when its operations are the same
as those of others in the industry? The
answer lies in the practices the firm
adopts. To outperform competitors, the
ability to make good, intelligent decisions
plays a pivotal role at every step.
Decisions should not only be good, but
should also be made as quickly as
possible so that companies can remain
proactive rather than reactive.
Implementing Big Data analytics in the
process sheds light on unstructured data
in a way that helps managers analyze
their decisions systematically and take
an alternative approach as and when
needed.
Customer-centricity is the new policy
Now that customers can shop anywhere
and anytime, companies face the
challenge of using relevant information to
make every interaction better than the
previous one. How can they do this on a
continuous basis? The answer is
“Big Data”. Customer dynamics are
ever-changing, and marketers' strategies
should change with them. Companies can
become more responsive by combining
past and real-time data to assess the
tastes and preferences of their
customers.
For example, Amazon has grown from a
product-based company into a big market
player with 152 million customers by
leveraging powerful big data engines.
Amazon aims to delight customers by
tracking their buying trends and giving
marketers all the related information they
need instantly. Moreover, Amazon fulfills
the needs of its customers by monitoring
1.5 billion products across the world in
real time.
Value generation through leveraging
data silos
Companies are getting bigger, and their
different processes generate varied data.
Much important information sits in data
silos and remains inaccessible. However,
companies have been able to dig into this
mountain with big data analytics, which
has let analysts and engineers drill deep
and come out with fresh, informative
insights.
After this discussion, one thing is certain:
this is just the beginning of a highly
digitized and technology-driven era
revolving around powerful real-time big
data analytics.
How Midsized Businesses Can Take
Advantage of Big Data
More and more midsized businesses are
taking a serious look at data visualization
because it makes it easier to identify
insights in the huge amounts of data that
can now be tapped. With tight budgets,
limited IT resources, and (for the most
part) no highly trained data analysts on
staff, many midsized companies aren’t sure
where to begin.
This white paper offers some practical tips
on how to get started with data
visualization and how best to succeed. It
also covers some specific business
functions where visualizing and analyzing
data can deliver results.
A Non-Geek’s Big Data Playbook
This paper examines how a non-geek yet
technically savvy business professional can
understand how to use Hadoop – and how
it will affect enterprise data environments
for years to come.
Big data and data mining
Data mining expert Jared Dean wrote the
book on data mining. He explains how to
maximize your analytics program using
high-performance computing and
advanced analytics.
Adding Hadoop to your Big Data Mix?
The Big Data Hadoop Architect Master's
Program is a structured learning path
recommended by leading industry experts
and designed to give you an in-depth
education on Big Data technologies such
as Hadoop, Spark, Scala,
MongoDB/Cassandra, Kafka, Impala, and
Storm.
The program begins with the Big Data
Hadoop and Spark developer course to
provide a solid foundation in the Big Data
Hadoop framework, then moves on to
Apache Spark and Scala to give you an
in-depth understanding of real-time
processing. Finally, you will get an
introduction to the concepts of NoSQL
database technology, where you will
choose between Apache Cassandra and
MongoDB. We have included Apache
Storm, Kafka, and Impala as electives to
help you gain a further edge.
You will get access to 100+ live instructor-
led online classrooms, 90+ hours of self-
paced video content, 7+ industry-based
projects to be practiced on
CloudLabs/virtual machines, 10+
simulation exams, a community
moderated by experts, and other
resources that ensure you follow the
optimal path to your dream role as a Big
Data Hadoop architect.
Who uses big data?
Big data affects organizations across
practically every industry. See how
each industry can benefit from this
onslaught of information.
Banking
With large amounts of information
streaming in from countless sources,
banks are faced with finding new and
innovative ways to manage big data. While
it’s important to understand customers
and boost their satisfaction, it’s equally
important to minimize risk and fraud while
maintaining regulatory compliance. Big
data brings big insights, but it also
requires financial institutions to stay one
step ahead of the game with advanced
analytics.
Education
Educators armed with data-driven insight
can make a significant impact on school
systems, students and curriculums. By
analyzing big data, they can identify at-risk
students, make sure students are making
adequate progress, and can implement a
better system for evaluation and support of
teachers and principals.
Government
When government agencies are able to
harness and apply analytics to their big
data, they gain significant ground when it
comes to managing utilities, running
agencies, dealing with traffic congestion or
preventing crime. But while there are many
advantages to big data, governments must
also address issues of transparency and
privacy.
Health Care
Patient records. Treatment plans.
Prescription information. When it comes to
health care, everything needs to be done
quickly, accurately – and, in some cases,
with enough transparency to satisfy
stringent industry regulations. When big
data is managed effectively, health care
providers can uncover hidden insights that
improve patient care.
Manufacturing
Armed with insight that big data can
provide, manufacturers can boost quality
and output while minimizing waste –
processes that are key in today’s highly
competitive market. More and more
manufacturers are working in an
analytics-based culture, which means they
can solve problems faster and make more
agile business decisions.
Retail
Customer relationship building is critical
to the retail industry – and the best way to
manage that is to manage big data.
Retailers need to know the best way to
market to customers, the most effective
way to handle transactions, and the most
strategic way to bring back lapsed
business. Big data remains at the heart of
all those things.
Big data in action: UPS
As a company with many pieces and parts
constantly in motion, UPS stores a large
amount of data – much of which comes
from sensors in its vehicles. That data not
only monitors daily performance, but also
triggered a major redesign of UPS drivers'
route structures. The initiative was called
ORION (On-Road Integrated Optimization
and Navigation), and was arguably the
world's largest operations research project.
It relied heavily on online map data to
reconfigure a driver's pickups and
drop-offs in real time.
The project led to savings of more than 8.4
million gallons of fuel by cutting 85 million
miles off of daily routes. UPS estimates
that saving only one daily mile per driver
saves the company $30 million, so the
overall dollar savings are substantial.
It’s important to remember that the
primary value from big data comes not
from the data in its raw form, but from the
processing and analysis of it and the
insights, products, and services that
emerge from analysis. The sweeping
changes in big data technologies and
management approaches need to be
accompanied by similarly dramatic shifts
in how data supports decisions and
product/service innovation.
Data exploration
Data exploration is the first step in data
analysis and typically involves
summarizing the main characteristics of a
dataset. It is commonly conducted using
visual analytics tools, but can also be done
in more advanced statistical software, such
as R.
Before a formal data analysis can be
conducted, the analyst must know how
many cases are in the dataset, what
variables are included, how many missing
observations there are and what general
hypotheses the data is likely to support.
An initial exploration of the dataset helps
answer these questions by familiarizing
analysts with the data with which they
are working.
Analysts commonly use data visualization
software for data exploration because it
allows users to quickly and simply view
most of the relevant features of their
dataset. From this step, users can identify
variables that are likely to have interesting
observations. By displaying data
graphically -- for example, through scatter
plots or bar charts -- users can see if two
or more variables correlate and determine
if they are good candidates for further in-
depth analysis.
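The checks described above (number of cases, variables, missing observations, and whether two variables correlate) can be sketched in a few lines of Python. This is only an illustrative sketch: the sample data, the column names, and the helper names load_rows, summarize and pearson are assumptions made for this example, not part of any tool mentioned in the text.

```python
import csv
import io
import math

# Hypothetical sample data (not from the text); the empty income field
# stands for a missing observation.
SAMPLE_CSV = """age,income,bedrooms
34,52000,2
45,,3
29,41000,1
52,88000,4
41,67000,3
"""

def load_rows(csv_text):
    """Parse CSV text into a list of dictionaries, one per case."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def summarize(rows):
    """Count cases, list variables, and count missing values per variable."""
    variables = list(rows[0].keys()) if rows else []
    missing = {name: sum(1 for row in rows if row[name] == "")
               for name in variables}
    return {"cases": len(rows), "variables": variables, "missing": missing}

def pearson(rows, x, y):
    """Pearson correlation of two numeric columns, skipping incomplete rows."""
    pairs = [(float(r[x]), float(r[y]))
             for r in rows if r[x] != "" and r[y] != ""]
    mean_x = sum(a for a, _ in pairs) / len(pairs)
    mean_y = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a, _ in pairs))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for _, b in pairs))
    return cov / (spread_x * spread_y)

rows = load_rows(SAMPLE_CSV)
print(summarize(rows))
print(round(pearson(rows, "age", "income"), 3))
```

A correlation close to 1 or -1 would mark the pair of variables as a candidate for the further in-depth analysis mentioned above.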
Data mining project: Exploring data
Part of any data mining project is learning
about and understanding the nature of
your data. By leveraging controls from
Office Web Components (OWC), the DSV
Designer provides the functionality to
explore your data in four different views.
By right-clicking a DSV table and selecting
Explore Data, you can view your data as a
table, pivot table, simple charts, and a
pivot chart. By default, the Explore Data
component will sample 5,000 points of
your data. The option buttons in the upper
left of the Explore Data window allow you
to change this setting to a maximum of
20,000 points, due to a limitation of the
OWC controls.
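The sampling limits just described can be illustrated with a small Python sketch. The text gives only the default (5,000) and the maximum (20,000); how the Explore Data component actually chooses its sample is not stated, so the uniform random sampling and the function name sample_rows below are assumptions.

```python
import random

DEFAULT_SAMPLE = 5_000   # default sample size stated in the text
MAX_SAMPLE = 20_000      # upper limit stated in the text

def sample_rows(rows, requested=DEFAULT_SAMPLE, seed=None):
    """Return a random sample capped at MAX_SAMPLE and at the table size.

    Uniform random sampling is an assumption made for this sketch.
    """
    size = min(requested, MAX_SAMPLE, len(rows))
    return random.Random(seed).sample(rows, size)

table = list(range(100_000))                      # stand-in for a large table
print(len(sample_rows(table)))                    # default sample
print(len(sample_rows(table, requested=50_000)))  # request above the cap
```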
The tabular views allow you to do a simple
exploration of your data. Clever use of the
pivot table will allow you to get a better
understanding of the data by arranging,
slicing, and aggregating your data in
different ways. For example, by exploring a
pivot chart on the Customers table, you
can find the average Age and its standard
deviation by using the Bedrooms column
we created previously. (See Figure 3.8.)
This is possible because we are exploring
the DSV table and not the actual source
table as it is in the data. We can explore
Named Queries in the DSV in precisely the
same manner.
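The kind of aggregation a pivot table performs here, the average Age and its standard deviation for each Bedrooms value, can be sketched in plain Python. The column names follow the text; the rows are invented for illustration and the function name age_by_bedrooms is hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

# Invented customer rows for illustration only.
customers = [
    {"Bedrooms": 1, "Age": 24},
    {"Bedrooms": 1, "Age": 30},
    {"Bedrooms": 2, "Age": 35},
    {"Bedrooms": 2, "Age": 41},
    {"Bedrooms": 2, "Age": 38},
    {"Bedrooms": 3, "Age": 52},
]

def age_by_bedrooms(rows):
    """Group ages by bedroom count; return {bedrooms: (mean, std dev)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Bedrooms"]].append(row["Age"])
    return {
        bedrooms: (mean(ages), stdev(ages) if len(ages) > 1 else 0.0)
        for bedrooms, ages in sorted(groups.items())
    }

for bedrooms, (average, spread) in age_by_bedrooms(customers).items():
    print(bedrooms, average, round(spread, 2))
```

A group with a single row has no sample standard deviation, so the sketch reports 0.0 for it; a real tool may display that case differently.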
The graphical exploration offers a page of
simple column, pie, and bar charts plus a
pivot chart view. Using the simple charts
you can see histograms and pies of various
attributes side by side. If your data is
continuous, the chart divides the
continuous range into 10 buckets. The
pivot chart, by contrast, provides a
wealth of graphing controls to analyze your
data, from your standard line, bar, scatter,
column, and pie charts, to more exotic
types such as doughnut and radar charts,
as shown in Figure 3.9.
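Dividing a continuous range into 10 buckets, as the simple charts do, can be sketched as follows. The text does not say how the bucket boundaries are chosen, so equal-width buckets between the minimum and maximum are an assumption, and the function name bucket_counts is hypothetical.

```python
def bucket_counts(values, buckets=10):
    """Count values per equal-width bucket between min(values) and max(values).

    Returns (lowest value, bucket width, list of counts).
    """
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / buckets
    counts = [0] * buckets
    for value in values:
        if width == 0:
            index = 0  # all values identical: everything lands in one bucket
        else:
            # Clamp so the maximum value falls in the last bucket.
            index = min(int((value - lowest) / width), buckets - 1)
        counts[index] += 1
    return lowest, width, counts

ages = [18, 22, 25, 31, 34, 37, 40, 44, 52, 58, 63, 71]  # invented values
lowest, width, counts = bucket_counts(ages)
for i, count in enumerate(counts):
    print(f"{lowest + i * width:.1f} to {lowest + (i + 1) * width:.1f}: {count}")
```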
The pivot table and chart have many
configuration options to help you analyze
your data in different ways. Many of these
are available through the context-sensitive
Command and Options dialog box, from
the Context menu, or from embedded
toolbars. Virtually every aspect of the
tables and charts can be modified, either
by graphically selecting the object or by
using the selection box on the General tab
of the dialog. Describing the full feature set
of the OWC could easily fill another book
and mastering the OWC controls for best
value will take some practice, but with
experience you will be able to manipulate
the controls to find exactly the right view
for you. Additionally, the pivot table and
chart are linked, so you can switch back
and forth, make edits, and see how the
change affected the other view.
One additional feature of the pivot chart
that is important for data exploration is
graphical named query generation. By
clicking the Named Query button on the
toolbar, you can use elements of the chart
to define a named query. For instance you
could select only those homeowners with
one bedroom and renters with four or more
on the chart and add them to the query.
This named query becomes like any other
and can be used as a source for exploring
data.
Note: Although the Explore Data window
looks like other document windows, it
is, in fact, a tool window like the
Solution Explorer and Properties
windows. By right-clicking the Window
tab you can change the Explore Data
window into a floating or dockable
window. You can also open up many
Explore Data windows on different DSV
tables to display charts and tables side
by side.
This section is excerpted from Chapter 3,
'Using SQL Server 2005 data mining,' from the
book Data Mining with SQL Server 2005.
How It Works
Before discovering how big data can
work for your business, you should first
understand where it comes from. The
sources for big data generally fall into
one of three categories:
Streaming data
This category includes data that reaches
your IT systems from a web of connected
devices. You can analyze this data as it
arrives and make decisions on what data
to keep, what not to keep and what
requires further analysis.
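As a rough illustration of deciding, as data arrives, what to keep, what not to keep and what requires further analysis, the Python sketch below triages a stream of sensor readings one at a time. The thresholds, the decision labels and the function name triage are all hypothetical; a real streaming system would apply its own rules.

```python
def triage(readings, low=0.0, high=100.0, margin=10.0):
    """Yield a (decision, reading) pair for each reading as it arrives.

    The rules are illustrative assumptions:
    - outside [low, high]: flag for further analysis
    - within `margin` of either limit: keep
    - otherwise: discard as routine
    """
    for value in readings:
        if value < low or value > high:
            yield "analyze", value
        elif value < low + margin or value > high - margin:
            yield "keep", value
        else:
            yield "discard", value

stream = iter([50.0, 95.0, 120.0, 3.0, -4.0])  # stand-in for a device feed
for decision, value in triage(stream):
    print(decision, value)
```

Because triage is a generator, each reading is classified as soon as it is read, without waiting for the rest of the stream.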
Social media data
The data on social interactions is an
increasingly attractive set of information,
particularly for marketing, sales and
support functions. It's often in
unstructured or semistructured forms, so
it poses a unique challenge when it comes
to consumption and analysis.
Publicly available sources
Massive amounts of data are available
through open data sources like the US
government’s data.gov, the CIA World
Factbook or the European Union Open
Data Portal.
After identifying all the potential
sources for data, consider the decisions
you’ll need to make once you begin
harnessing information. These include:
How to store and manage it
Whereas storage would have been a
problem several years ago, there are now
low-cost options for storing data if that’s
the best strategy for your business.
How much of it to analyze
Some organizations don't exclude any data
from their analyses, which is possible with
today’s high-performance technologies
such as grid computing or in-memory
analytics. Another approach is to
determine upfront which data is relevant
before analyzing it.
How to use any insights you uncover
The more knowledge you have, the more
confident you’ll be in making business
decisions. It’s smart to have a strategy in
place once you have an abundance of
information at hand.
The final step in making big data work
for your business is to research the
technologies that help you make the
most of big data and big data analytics.
Consider:
Cheap, abundant storage.
Faster processors.
Affordable open source, distributed big
data platforms, such as Hadoop.
Parallel processing, clustering, MPP,
virtualization, large grid environments,
high connectivity and high throughputs.
Cloud computing and other flexible
resource allocation arrangements.