What is big data? A 3-page summary of the key differences between "big data" and "small data."
Includes a comparison of data jargon, high-level technologies, staffing/people, and the nature of the data itself.
Perfect for data-savvy marketers & agencies, and beginner-to-intermediate data and analytics professionals.
1. Big Data vs. Small Data
Finally, a (somewhat) layman’s guide to what the hell that means.
Dense “1-pager”
2. Even my mom has heard the phrase “Big Data,” but what does it actually mean?
The phrase has been overused to the point of obscurity, and when asked, most marketing professionals, clients, and even some tech & data pros will cop to the fact that they have no idea how to start defining what makes big data, well…big.

This is the quick, though dense, guide to the facets that differentiate big data from small data.
3. What makes “Big Data”… well, big?
(simple version)
Small Data vs. Big Data

Overview
  Small Data: A steady stream of lots of relatively consistent data that the human brain can handle and work with
  Big Data: Gigantic waves of erratic data every millisecond that humans can’t comprehend, let alone work with manually
Nomenclature
  Small Data: • Traditional • Relational • SQL • Flat • Structured • Small
  Big Data: • New-ish, Contemporary • Non-Relational • NoSQL • Not Flat (Erratic) • Unstructured • Big
Technology
  Small Data: 30-year-old, standardized tech
  Big Data: Less than 5 years old, with more layers, but often actually less expensive
People
  Small Data: Data Engineers/DBAs, Analysts
  Big Data: More Data Engineers/DBAs, Data Scientists, Analysts
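To make the "Relational / SQL" vs. "Non-Relational / NoSQL" distinction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions: the standard-library sqlite3 module stands in for a relational database, a plain list of JSON documents stands in for a NoSQL document store, and the table, field names, and data are all hypothetical. The relational table forces every row into the same columns; the documents each carry their own fields.

```python
import json
import sqlite3

# Relational ("small data") stand-in: a fixed schema, every row has the same columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (user_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO likes VALUES (?, ?)",
    [(1, "shoes"), (2, "shoes"), (2, "hats")],
)
# Plain SQL works because the structure is known in advance.
rows = conn.execute(
    "SELECT page, COUNT(*) FROM likes GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('hats', 1), ('shoes', 2)]

# Non-relational ("big data") stand-in: JSON documents, each with its own fields.
docs = [
    json.loads('{"user_id": 1, "page": "shoes"}'),
    json.loads('{"user": {"id": 2}, "page": "shoes", "device": "mobile"}'),
    json.loads('{"user_id": 2, "pages": ["hats"], "geo": "US"}'),
]
# There is no single schema; the set of fields is only known after reading the data.
all_fields = sorted({key for doc in docs for key in doc})
print(all_fields)  # ['device', 'geo', 'page', 'pages', 'user', 'user_id']
```

Note that the same user shows up as `user_id` in one document and as a nested `user.id` in another; that inconsistency is exactly what "unstructured" or "erratic" means in the table.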
The world is moving toward Big Data tech & methods.
But Big Data methods exist to translate Big Data BACK into small (structured) data so humans can make it useful.
4. Small Data vs. Big Data

Data Volume
  Small Data: A lot, but a human brain can handle it
  Big Data: More than a human can comprehend, let alone work with
Data Velocity
  Small Data: Steady stream throughout the day
  Big Data: Gigantic waves of volume every millisecond (think: keeping track of every Facebook “like,” all the time, across the internet)
Data Uncertainty (Veracity)
  Small Data: You know where changes might happen
  Big Data: Almost no idea how data might come in, or in what format
Variety
  Small Data: Expected set of file types
  Big Data: Unknown file types
Structure
  Small Data: Relational
  Big Data: Non-Relational
Hardware
  Small Data: One server
  Big Data: Many servers (a cluster)
Database
  Small Data: SQL
  Big Data: NoSQL
Processing & Querying
  Small Data: A SQL client such as Sequel Pro
  Big Data: Spark, Hive, or Pig (typically on top of Hadoop)
Query Language
  Small Data: SQL
  Big Data: Python, Java, R, SQL
Analysis Areas
  Small Data: Data Marts (Analytics)
  Big Data: Clusters (Data Scientists); Data Marts (Analytics)
Optimization
  Small Data: Manual, human-powered (almost always)
  Big Data: Machine learning
People
  Small Data: Data Engineers/DBAs, Analysts
  Big Data: More Data Engineers/DBAs, Data Scientists, Analysts
Data Nomenclature
  Small Data: Database, Data Warehouse, Data Mart
  Big Data: Data Lake
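The "one server vs. many servers" row, and tools like Hadoop and Spark, rest on one idea: split the data across machines, let each machine work on its own piece, then merge the partial results. The sketch below simulates that split, count, and merge (MapReduce-style) pattern in a single Python process. The event list and function names are hypothetical, and a real cluster would run each chunk on a different machine instead of in a loop.

```python
from collections import Counter
from functools import reduce

# Hypothetical "like" events; on a real cluster these would be spread across machines.
events = ["shoes", "hats", "shoes", "bags", "shoes", "hats", "bags", "bags", "bags"]


def split_across_servers(items, n_servers):
    """Deal items round-robin into n_servers chunks (one chunk per simulated server)."""
    return [items[i::n_servers] for i in range(n_servers)]


def count_chunk(chunk):
    """'Map' step: each simulated server counts only the events it holds."""
    return Counter(chunk)


def merge_counts(partial_counts):
    """'Reduce' step: combine the per-server counts into one total."""
    return reduce(lambda total, part: total + part, partial_counts, Counter())


chunks = split_across_servers(events, 3)
partials = [count_chunk(chunk) for chunk in chunks]  # would run in parallel on a cluster
totals = merge_counts(partials)
print(totals.most_common())  # [('bags', 4), ('shoes', 3), ('hats', 2)]
```

The design choice that matters is that no single step ever needs all the data at once: each chunk is counted independently, and only the small partial counts are combined. That is what lets the work scale out by adding servers.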
Perhaps Most Importantly:
“Big Data” technology exists in order to convert unstructured data into structured data, i.e., into a format humans can understand and work with.
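Here is a minimal Python sketch of that conversion, assuming raw events arrive as JSON lines whose keys vary from source to source. The records, key names, and the `normalize` helper are all hypothetical. Each record is mapped onto one fixed (user_id, page) schema, unparseable lines are dropped, and the result is loaded into a relational table (standard-library sqlite3) that an analyst can query with plain SQL.

```python
import json
import sqlite3

# Hypothetical raw events: same meaning, different shapes, plus one malformed line.
raw_lines = [
    '{"user_id": 1, "page": "shoes"}',
    '{"user": {"id": 2}, "page": "Shoes", "device": "mobile"}',
    '{"uid": "3", "pages": ["hats", "bags"]}',
    "not json at all",
]


def normalize(line):
    """Map one raw record onto a fixed (user_id, page) schema; returns a list of rows."""
    try:
        doc = json.loads(line)
    except json.JSONDecodeError:
        return []  # unparseable input is dropped in this sketch
    # The user id hides under different keys depending on the (hypothetical) source.
    user_id = doc.get("user_id")
    if user_id is None:
        user_id = doc.get("uid")
    if user_id is None:
        user_id = doc.get("user", {}).get("id")
    if user_id is None:
        return []
    # Some records hold one page, others a list of pages.
    pages = doc.get("pages", [doc.get("page")])
    return [(int(user_id), page.lower()) for page in pages if page]


# Load the now-structured rows into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (user_id INTEGER, page TEXT)")
for line in raw_lines:
    conn.executemany("INSERT INTO likes VALUES (?, ?)", normalize(line))

rows = conn.execute("SELECT user_id, page FROM likes ORDER BY user_id, page").fetchall()
print(rows)  # [(1, 'shoes'), (2, 'shoes'), (3, 'bags'), (3, 'hats')]
```

Four differently shaped inputs become four uniform rows; once the data looks like this, it is "small data" again and the traditional tools apply.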
What makes “Big Data”… well, big?
(full version)
5. [Diagram: Small Data vs. Big Data infrastructure.
Small Data: a few data sources → data warehouse (databases & business rules/ETL) → data mart → BI tools.
Big Data: many data sources → NoSQL database environment on Hadoop (Hadoop “clusters”), serving as the data science sandbox / “data lake” → relational data warehouse → data mart → BI tools.
People: data engineers & data scientists, analysts, data partners.]
What makes “Big Data”… well, big?
(Infrastructure)
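The small-data path in the infrastructure diagram (data sources, then a data warehouse with business rules/ETL, then a data mart, then BI tools) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production pipeline: sqlite3 stands in for the warehouse, and the two sources, the business rules, and the `mart_region_traffic` table are hypothetical.

```python
import csv
import io
import sqlite3

# Two hypothetical data sources: a CRM export (CSV) and web-analytics visit counts.
crm_csv = "email,region\nAnn@Example.com,West\nbob@example.com,East\ntest@internal.local,West\n"
web_visits = [("ann@example.com", 3), ("bob@example.com", 5), ("ann@example.com", 4)]

# sqlite3 stands in for the data warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (email TEXT PRIMARY KEY, region TEXT)")
warehouse.execute("CREATE TABLE visits (email TEXT, pageviews INTEGER)")

# Extract, transform (apply business rules), load.
for row in csv.DictReader(io.StringIO(crm_csv)):
    email = row["email"].strip().lower()  # rule: emails are stored lowercase
    if email.endswith("@internal.local"):  # rule: skip internal test accounts
        continue
    warehouse.execute("INSERT INTO customers VALUES (?, ?)", (email, row["region"]))
warehouse.executemany("INSERT INTO visits VALUES (?, ?)", web_visits)

# Data mart: a smaller, pre-joined, pre-aggregated table shaped for one team's questions.
warehouse.execute(
    """
    CREATE TABLE mart_region_traffic AS
    SELECT c.region AS region, SUM(v.pageviews) AS pageviews
    FROM customers AS c
    JOIN visits AS v ON v.email = c.email
    GROUP BY c.region
    """
)

# A BI tool would typically issue simple queries like this against the mart.
report = warehouse.execute(
    "SELECT region, pageviews FROM mart_region_traffic ORDER BY region"
).fetchall()
print(report)  # [('East', 5), ('West', 7)]
```

On the big-data side of the diagram, the same warehouse, mart, and BI steps would sit downstream of the Hadoop “data lake,” fed by jobs that first turn the raw, erratic data into tables like these.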