A LITTLE BEE BOOK
What is Big Data?
from a blog post by Mike Ferguson
with additional material from Bob Yelland
For more copies of this book, or to read others in the series, visit: littlebeelibrary.com
Everywhere you look today, people are looking
down at a mobile device. They are online: browsing,
collaborating, shopping for goods and services, and
transacting business.
And it’s not just the consumer. It is happening in
business-to-business as well.
In today’s online world, customers and prospects
have become all-powerful.
Social networks, comparison websites, and review
websites have allowed them to quickly become well
informed before they buy, and with information on
their side, they will sacrifice loyalty in the blink of an
eye or the click of a mouse if quality and service are
not good.
In such a fast-moving, online, mobile economy, where
company lifespans are shrinking, it is not surprising
that company CEOs are so customer focused.
This is about business survival where companies
need to:
• Retain and grow customers
• Optimise business operations
• Reduce risk
• Improve financial management
Business survival is becoming as much about
customer retention as it is about growth, and if
companies are going to survive, they need to know as
much as possible about their customers.
So the question is: what do you know about them?
Most of what companies know is typically held
in a data warehouse – a database that collects
transactions and looks at customer transaction
activity over time to understand who is buying what
through which channel.
So unless there is a transaction, we know very little.
With competition breathing down your neck, the
next question is: can you get more data, analyse it,
and enrich what you already know about customers?
The answer, of course, is yes.
We can get a lot of data from a wide range of new
sources. These include:
1. Click stream data
(all the clicks of every visitor on your website(s))
Analysing this type of data allows you to
understand site navigation behaviour, the paths
people take to buying products and services, what
else they looked at on the way to buying, paths
that led to abandonment, etc.
This helps improve customer experience and
conversion. It may also be possible to associate
clicks with customers and prospects.
2. Shopping cart data from your website
This lets you see what people are putting into and
taking out of shopping carts en route to your online
checkout.
3. Social network data
e.g. Twitter, Facebook, LinkedIn
Analysing this type of data allows you to get
additional information about customers that you
don’t yet have and identify sentiment, i.e. what
people are saying about your products, your
customer service, your brand, and their likes and
dislikes.
Analysis also allows you to identify who the
influencers are in the network and how people are
connected across multiple communities.
Targeting influencers with marketing campaigns
could significantly boost sales.
4. Sensor data
This is data from smart products (e.g. GPS
sensors in phones) to give you information on
product usage or location.
Sensor data may also exist to help monitor
production lines, asset performance, medical
equipment, supply chains and distribution
chains, e.g. to see if customers are getting
deliveries on time.
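To make the first source above more concrete, here is a minimal Python sketch of click stream analysis: it groups clicks into one path per visit, then counts the paths that reached checkout and the page where the other visits were abandoned. The event format, the page names and the helper functions are assumptions made for illustration; real click stream data is far larger and messier.

```python
from collections import Counter, defaultdict

# Hypothetical click events: (session_id, page), already in time order.
clicks = [
    ("s1", "home"), ("s1", "product"), ("s1", "cart"), ("s1", "checkout"),
    ("s2", "home"), ("s2", "product"), ("s2", "cart"),
    ("s3", "home"), ("s3", "search"), ("s3", "product"),
    ("s4", "home"), ("s4", "product"), ("s4", "cart"), ("s4", "checkout"),
]


def paths_by_session(events):
    """Group page views into one ordered path per session."""
    paths = defaultdict(list)
    for session_id, page in events:
        paths[session_id].append(page)
    return dict(paths)


def summarise(paths, goal="checkout"):
    """Count converting paths and the last page seen before abandonment."""
    converted = Counter()
    abandoned_at = Counter()
    for path in paths.values():
        if goal in path:
            converted[" > ".join(path)] += 1
        else:
            abandoned_at[path[-1]] += 1
    return converted, abandoned_at


paths = paths_by_session(clicks)
converted, abandoned_at = summarise(paths)
conversion_rate = sum(converted.values()) / len(paths)

print("Most common buying path:", converted.most_common(1))
print("Abandoned at:", dict(abandoned_at))
print(f"Conversion rate: {conversion_rate:.0%}")
```

The same grouping idea extends to shopping cart data: replace page names with "add" and "remove" events to see what goes into and comes out of carts on the way to checkout.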
The challenge with these four data types is that
they don’t all fit in the traditional tables of rows and
columns that we are used to in relational databases.
A key reason for this is that the format varies
by data type. Text, for example, is unstructured data
but we still want to analyse it to extract details about
people, products, locations, monetary amounts,
dates, and times and to understand sentiment.
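As a rough illustration of pulling details out of unstructured text, the Python sketch below uses regular expressions to find monetary amounts and dates, and a tiny word list to guess sentiment. The patterns, the word lists and the sample review are assumptions made for this example; production text analytics relies on much richer language models than this.

```python
import re

# Illustrative patterns only: amounts like $1,299.00 and ISO-style dates.
AMOUNT = re.compile(r"[£$€]\d+(?:,\d{3})*(?:\.\d{2})?")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Hypothetical word lists standing in for a real sentiment model.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"late", "broken", "poor"}


def analyse(text):
    """Extract amounts and dates, and score sentiment by word counts."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        sentiment = "positive"
    elif score < 0:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return {
        "amounts": AMOUNT.findall(text),
        "dates": DATE.findall(text),
        "sentiment": sentiment,
    }


review = ("Paid $1,299.00 on 2017-03-14 and the delivery was late "
          "and the box was broken.")
result = analyse(review)
print(result)
```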
Also, data volumes can be very large, and there is an
increasing requirement to capture and analyse high
velocity data in real time.
These three attributes of big data (volume, variety
and velocity) have necessitated a new technological
approach to both the storage and analysis of data.
Companies need a single approach to analyse
structured (e.g. transactions), semi-structured (e.g.
JSON, XML), and unstructured (e.g. text, image) data.
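One way to picture a single approach is to convert each kind of data into the same simple record before analysing it. The Python sketch below does this for a CSV transaction (structured), a JSON click event and an XML sensor message (semi-structured), and a free-text review (unstructured). All field names, sample values and helper functions are assumptions made for illustration, not a description of any particular product.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def from_csv(text):
    """Structured: rows and columns, one record per transaction."""
    return [
        {"customer": row["customer"], "source": "transaction",
         "detail": row["product"]}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text):
    """Semi-structured: nested fields that can vary per event."""
    event = json.loads(text)
    return [{"customer": event["user"]["id"], "source": "clickstream",
             "detail": event["page"]}]


def from_xml(text):
    """Semi-structured: attributes and child elements."""
    root = ET.fromstring(text)
    return [{"customer": root.get("customer"), "source": "sensor",
             "detail": root.findtext("reading")}]


def from_text(customer, text):
    """Unstructured: keep the raw text for later text analysis."""
    return [{"customer": customer, "source": "review",
             "detail": text.strip()}]


# Hypothetical sample inputs, one per data shape.
csv_text = "customer,product,amount\nc1,kettle,24.99\n"
json_text = '{"user": {"id": "c1"}, "page": "/products/kettle"}'
xml_text = '<device customer="c1"><reading>71.5</reading></device>'

records = (from_csv(csv_text) + from_json(json_text)
           + from_xml(xml_text) + from_text("c1", " Kettle arrived quickly. "))

for record in records:
    print(record)
```

Once everything shares the customer key, the records can be combined to enrich what the data warehouse already holds about that customer.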
Because big data is more complex, larger in volume
and arriving very quickly, new types of analysis have
emerged. These new analytical workloads include:
• Analysis of data in motion (streaming analytics)
• Analytics at the edge – for sensor data
• Complex analysis of structured data
• Machine learning to find patterns and
correlations in data
• Exploratory analysis of un-modelled
multi‑structured data such as Twitter data for
sentiment analytics
• Graph analysis, e.g. social networks, fraud
detection and real-time recommendation engines
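To give a feel for the first workload in the list, analysis of data in motion, the Python sketch below processes sensor readings one at a time and raises an alert whenever the average of the most recent readings exceeds a threshold. The readings, the window size and the threshold are assumptions chosen for illustration; a real streaming engine would also handle timestamps, late data and scale.

```python
from collections import deque


def rolling_alerts(readings, window=3, threshold=80.0):
    """Yield (index, average) whenever the rolling mean exceeds threshold.

    Readings are consumed one by one, so the full data set never has to
    be stored: only the last `window` values are kept in memory.
    """
    recent = deque(maxlen=window)
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            average = sum(recent) / window
            if average > threshold:
                yield index, round(average, 2)


# Hypothetical temperature readings arriving from a sensor.
temperatures = [70.0, 72.0, 75.0, 85.0, 90.0, 88.0, 71.0]
alerts = list(rolling_alerts(iter(temperatures)))

for index, average in alerts:
    print(f"reading {index}: rolling average {average} is above threshold")
```

Keeping only a small window of recent values is what makes this style of analysis suitable for high velocity data and for running at the edge, close to the sensor.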
The result has been the emergence of new platforms
better suited to these new analytical workloads, which
extend the analytical environment beyond the data
warehouse into multiple types of data store.
NoSQL databases are data stores for unstructured
or semi-structured data.
Apache Hadoop is an open-source software
framework used for distributed storage and
processing of very large data sets.
Apache Spark is a fast, in-memory data
processing engine for streaming and machine
learning workloads.
These will be discussed further in other Little Bee
books.
The IBM Watson Data Platform is a cloud-based
data and analytics platform designed to integrate
all types of data and enable artificial intelligence-
powered decision-making, such as natural language-
based discovery, machine learning and cognitive
analytics services.
The Watson Data Platform is based on Apache
Spark technology and encompasses relational
databases, document stores, graph and Hadoop
environments.
The goal: to make it simple for business leaders and
data professionals to collect, organise, govern and
secure all data, so they can get the insights needed
to become a cognitive business.
Why not sign up for a free trial?