2. Textbook Definitions
• “extremely large data sets that may be analysed computationally to reveal
patterns, trends, and associations, especially relating to human behaviour
and interactions.”
• “Big data is a broad term for data sets so large or complex that traditional
data processing applications are inadequate. Challenges include analysis,
capture, data curation, search, sharing, storage, transfer, visualization, and
information privacy. The term often refers simply to the use of predictive
analytics or certain other advanced methods to extract value from data,
and seldom to a particular size of data set.”
3. In Simple Terms
• Big data is literally just a lot of data. While it's more of a marketing term
than anything, the implication is usually that you have so much data that
you can't analyze it all at once, because holding the data in memory
(RAM) to process and analyze it would take more memory than you have
available.
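When the data won't fit in RAM, a common workaround is to stream it through memory in small pieces rather than loading it all at once. A minimal sketch in plain Python (the file name, its CSV layout, and the aggregation are all hypothetical, just to illustrate the idea):

```python
# Sketch: aggregate a file too large to fit in RAM by reading it
# one line at a time, so only a tiny piece is in memory at once.
# "clicks.csv" and its user_id,clicked layout are made up for this example.

def count_clicks(path):
    total = 0
    with open(path) as f:
        next(f)  # skip the header row
        for line in f:  # the file object yields one line at a time
            user_id, clicked = line.rstrip("\n").split(",")
            total += int(clicked)
    return total
```

The same streaming pattern underlies most big-data tooling: process a chunk, keep only a small running summary, discard the chunk, repeat.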
4. An Example of Big Data
• Let’s say Facebook wants to know which adverts work best for people with
degrees. Let's say there are 200,000,000 Facebook users with degrees, and
they have been each served 100 ads. That's 20,000,000,000 events of
interest, and each "event" (an ad being served) contains several data
points (features) about the ad: what was the ad for? Did it have a picture
in it? Was there a man or woman in the ad? How big was the ad? What
was the most prominent color? Let's say for each ad there are 50
"features". This means you have 1,000,000,000,000 (one trillion) pieces of
data to sort through. If each "piece" of data were only 100 bytes, you'd
have roughly 100 terabytes of data to parse. That's pretty big, but we're
still only on the brink of "big data" territory; it gets much, much bigger!