Big data vs data warehousing
1. Big data
BIG DATA VS DATA WAREHOUSING
A LOOK AT THE VALUE AND DIFFERENCES OF DATA WAREHOUSING AND BIG DATA
Tshegofatso Mogomotsi
2. The purpose of this presentation is to outline the value that Big data
and Data warehousing can each contribute to a business, and to
differentiate the two concepts and their benefits.
2016
3. Overview
What are Data warehousing, Big data, and Fast data
Big data tools
Use Case
Summary of differences
4. Defining Data warehousing, Big data and Fast data in business
Data warehousing
Data warehouses are usually used to correlate broad business data from various data sources to provide greater
insight into the performance of a business. Data warehouses differ from regular operational databases, which
are optimized to maintain strict accuracy of data by rapidly updating real-time data. Unlike operational databases, data
warehouses are designed to give a long-range view of data over time and specialize in data gathering, which
allows for further processing such as data mining (Informatica, 2016).
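The contrast between an operational database and a warehouse can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the product name, quantities, dates and the `record_stock` helper are hypothetical and do not come from the presentation.

```python
from datetime import date

# Operational view: one current value per product, overwritten on each update.
operational_stock = {}

# Warehouse-style view: every dated observation is appended and kept.
warehouse_stock_history = []


def record_stock(product, quantity, as_of):
    """Update the operational value and append a dated row to the history."""
    operational_stock[product] = quantity
    warehouse_stock_history.append(
        {"product": product, "quantity": quantity, "as_of": as_of}
    )


# Hypothetical month-end stock levels.
record_stock("running shoe", 120, date(2016, 1, 31))
record_stock("running shoe", 95, date(2016, 2, 29))
record_stock("running shoe", 140, date(2016, 3, 31))

# The operational store only knows the latest figure.
current = operational_stock["running shoe"]

# The history supports a long-range view, for example month-to-month change.
quantities = [row["quantity"] for row in warehouse_stock_history]
changes = [later - earlier for earlier, later in zip(quantities, quantities[1:])]
```

The operational dictionary answers "what is the stock now", while the appended history is what makes trend questions answerable at all.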
Big data
Big data is defined by data sets so large or complex that traditional data processing techniques and applications are
inadequate to handle them. Challenges include analysis, storage, transfer, visualization, querying, updating, and information privacy.
The term often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced
data analytics methods that extract value from data.
Fast data
Big data grows through a constant stream of incoming data. John Hugg, a software architect, proposes that instead
of simply storing that data to be analyzed later, perhaps we've reached the point where it can be analyzed as it's
ingested while still maintaining extremely high intake rates.
Big data is measured not only by the volume of data but also by its velocity, the rate at which data arrives over time. Velocity
represents working data, immediate status, or data with ongoing purpose. The best way to capture the value of
incoming data is to react to it the instant it arrives. If you are processing incoming data in batches, you've already lost
time and, thus, the value of the active data.
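The difference between reacting on arrival and waiting for a batch can be sketched as follows. This is a minimal illustration, not a description of any particular fast data product: the readings, the window size and the threshold are hypothetical values chosen for the example.

```python
from collections import deque


def react_to_stream(events, window_size=3, threshold=100.0):
    """Yield (position, rolling average) as soon as the average of the
    most recent events exceeds the threshold."""
    window = deque(maxlen=window_size)
    for position, value in enumerate(events):
        window.append(value)
        average = sum(window) / len(window)
        if average > threshold:
            # The reaction happens here, while the event is still fresh.
            yield position, average


def batch_average(events):
    """Compute one figure after all events have been collected."""
    events = list(events)
    return sum(events) / len(events)


# Hypothetical incoming values, for example order amounts.
readings = [80, 90, 95, 130, 150, 60]

alerts = list(react_to_stream(readings))
overall = batch_average(readings)
```

With these sample values the streaming version raises its first alert at the fourth event, while the batch version produces a single number only once every event has been stored.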
5. Defining Data warehousing, Big data and Fast data in business
Deliver business value through the analysis of data
William H. Inmon described a data
warehouse as a subject-oriented,
integrated, time-variant collection of data that
supports management's decision-making
process.
Big data is technology capable of holding
large amounts of data stored in an
unstructured format. This data, when
captured, manipulated, and analyzed, can
help a corporation gain useful insight.
Fast data is the application of big data analytics
to smaller data sets in real-time in order to
solve a problem or create business value. The
goal of fast data is to quickly gather and mine
structured and unstructured data so that action
can be taken.
6. Big data tools
Big data: data storage, data cleaning, data mining, data analysis, data visualisation
Below is a view of some of the applications/tools used for Big data
management and processing
Data Storage and Management
Cloudera
MongoDB
Oracle Database (or the Oracle NoSQL Database)
Data cleaning tools
OpenRefine
DataCleaner
Data mining tools – predictive analysis
RapidMiner
IBM SPSS Modeler
Oracle Data Miner GUI
Data analytics
Oracle R
BigML
Data visualization
Tableau
Silk
7. Uses: Case study
Company ABC is a large South African shoe manufacturing company that
also has retail stores across the African region. It manufactures various
shoe types for the whole family. ABC's annual turnover for the 2015/16
financial year was 16.6 million.
The company is looking to increase its profit margin by 10 percent in the
2017/18 financial year, and to achieve this it recently invested in Big
data infrastructure.
8. Uses: Case study
Big data
ABC recently recognized that an increasing amount of data is
not captured in its operational databases, such as clickstream logs, social
feeds, customer support emails, location data from mobile devices and chat
transcripts. Big data systems harness these new sources of data, and allow
businesses to analyze and extract business value from these large data sets.
Example of how Big data systems can add value to ABC
Using Big data tools, the BI team identifies customers who are active on
specific marathon websites, search for information related to
marathons or running, and engage with social feeds related to
marathons or running. The team then uses this data to predict that these
customers may be running a marathon soon, and forwards products and
specials on running shoes to them.
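The selection step in this example can be sketched with a simple rule. This is an assumption-laden illustration rather than the BI team's actual method: the `CustomerActivity` fields, the customer ids, the counts and the `min_signals` cut-off are all hypothetical, and a rule of this kind stands in for whatever predictive model a real Big data tool would fit.

```python
from dataclasses import dataclass


@dataclass
class CustomerActivity:
    """Hypothetical per-customer counts drawn from the three data sources."""
    customer_id: str
    marathon_site_visits: int
    running_searches: int
    running_social_interactions: int


def likely_marathon_runners(activities, min_signals=2):
    """Return ids of customers active in at least min_signals of the sources."""
    selected = []
    for activity in activities:
        signals = sum(
            count > 0
            for count in (
                activity.marathon_site_visits,
                activity.running_searches,
                activity.running_social_interactions,
            )
        )
        if signals >= min_signals:
            selected.append(activity.customer_id)
    return selected


# Hypothetical sample data.
activities = [
    CustomerActivity("C001", 4, 2, 1),
    CustomerActivity("C002", 0, 1, 0),
    CustomerActivity("C003", 1, 0, 3),
]

targets = likely_marathon_runners(activities)
```

Customers in `targets` would be the ones to receive the running shoe specials; requiring two independent signals is one way to avoid targeting someone on the strength of a single search.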
9. Uses: Case study
Data warehouse
ABC’s data warehouse contains data from its company financial systems, its customer
marketing systems, its billing systems, its point-of-sale systems, and so on.
Traditionally, data warehouses source data solely from other databases. The need for a
data warehouse often becomes evident when analytic requirements become challenging
for the ongoing performance of operational databases.
The data warehouse stores current and historical data and is used for creating analytical
reports for knowledge workers throughout the company. Examples of reports could
range from annual and quarterly comparisons and trends to detailed daily sales analysis.
The data warehouse provides the company with reliable, credible and accessible data
that everyone in the company can depend on.
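A quarterly comparison report of the kind described above can be sketched against a small relational table. This is a minimal illustration using Python's built-in `sqlite3` module as a stand-in for a warehouse; the `sales` table, its columns and every row in it are hypothetical.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption for the sketch).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE sales (sale_date TEXT, store TEXT, amount REAL)"
)
connection.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2015-02-10", "Store A", 1200.0),
        ("2015-05-21", "Store A", 900.0),
        ("2016-02-03", "Store A", 1500.0),
        ("2016-03-15", "Store B", 700.0),
        ("2016-06-30", "Store B", 1100.0),
    ],
)

# Total sales per year and quarter, so the same quarter can be compared
# across years. Integer division maps months 1-3 to quarter 1, and so on.
quarterly_totals = connection.execute(
    """
    SELECT strftime('%Y', sale_date) AS year,
           (CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3 AS quarter,
           SUM(amount) AS total
    FROM sales
    GROUP BY year, quarter
    ORDER BY year, quarter
    """
).fetchall()
connection.close()
```

Because the table keeps historical rows rather than only current balances, the same query serves both the annual trend and the quarter-on-quarter comparison.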
Even with a Big data initiative incorporated into ABC’s business, the data warehouse,
built upon a relational database, can continue to be the primary analytic database for
storing much of the company’s core transactional data: financial records, customer data,
point-of-sale data and so forth.
10. Summary of differences
Big data: A Big data solution is a technology, a means to store and manage large amounts of data.
Data warehousing: Data warehousing is an architecture, a way of organizing data so that there is corporate credibility and integrity.

Big data: The Big data scope of data is beyond data found in the corporation (Web, sales, customer contact center, social media, mobile data).
Data warehousing: An enterprise’s data warehouse contains data from its enterprise databases.

Big data: Big data applies an architecture that acquires data from multiple data sources, then organizes and stores that data in a suitable format for analysis.
Data warehousing: Data warehouses do not excel at handling raw, unstructured, or complex data.

Big data: Big data is measured by volume and velocity.
Data warehousing: A data warehouse is measured by volume.

Big data: If unlocked properly, the data can yield valuable information that leads to better decisions, which in turn can lead to more revenue, more profitability and increased market share.
Data warehousing: A data warehouse provides a “single version of the truth” for decision making in the corporation. With a data warehouse there is an integrated, granular, historical single point of reference for data in the corporation.