This document discusses big data principles including what data is, why big data is important, how it differs from traditional data, and its key characteristics. Big data is characterized by volume, variety, and velocity. It comes from many sources and in many formats. Tools like Hadoop enable storage and analysis at scale. Applications include search, customer analytics, business optimization, health, and security. Benefits are better decisions and flexibility to store now and analyze later. The future of big data is predicted to be a $100 billion industry growing at 10% annually.
2. Today we will discuss:
• What is Data?
• Why Big Data?
• How is it Different?
• Characteristics of Big Data
• Applications of Big Data
• Benefits of Big Data
• Future of Big Data
To be continued …
3. What is Data?
• Data can be any character, text, words,
number, pictures, sound, or video and, if not
put into context, means little or nothing to a
human.
• Information is useful and usually formatted
in a manner that allows it to be understood
by a human.
4. Big Data vs. Small Data
Big Data
• The large picture
• Encompasses many
different types of data
• Unstructured data
• Unfocused
• Difficult to interpret
Small Data
• The small picture.
• Mostly Homogeneous
• Structured
• Focused
• Easily Interpreted
5. Why the hype around Big Data?
• An aim to solve new problems or old
problems in a better way
• Big Data generates value from the storage
and processing of very large quantities of
digital information
6. How is big data different?
• Automatically generated by a machine (e.g.
Sensor embedded in an engine)
• Typically an entirely new source of data (e.g.
Use of the internet)
• Not designed to be friendly (e.g. Text
streams)
• May not have much value
• Need to focus on the important part
8. How big is big data?
• Analysts predict that by 2020, there will be 5,200
gigabytes of data on every person in the world.
• On average, people send about 500 million tweets per
day.
• The average U.S. customer uses 1.8 gigabytes of data
per month on his or her cell phone plan.
• Walmart processes one million customer transactions
per hour.
• Amazon sells 600 items per second.
• On average, each person who uses email receives 88
emails per day and sends 34. That adds up to more than
200 billion emails each day.
• MasterCard processes 74 billion transactions per year.
• Commercial airlines make about 5,800 flights per day.
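The figures above are quoted per day, per hour, per second, and per year, which makes them hard to compare. As a rough sketch, the snippet below converts them to a common per-second rate. The dictionary keys are illustrative names, and the year is assumed to have 365 days.

```python
# Convert the rates quoted on this slide to events per second.
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: leap years ignored

# Keys are illustrative labels; values come from the bullets above.
rates_per_second = {
    "tweets": 500_000_000 / SECONDS_PER_DAY,
    "walmart_transactions": 1_000_000 / SECONDS_PER_HOUR,
    "amazon_items_sold": 600.0,
    "mastercard_transactions": 74_000_000_000 / SECONDS_PER_YEAR,
    "commercial_flights": 5_800 / SECONDS_PER_DAY,
}

# Print the sources from fastest to slowest.
for name, rate in sorted(rates_per_second.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>24}: {rate:,.2f} per second")
```

On these numbers, tweets arrive at several thousand per second, while flights depart at well under one per second.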
9. Big data is not so much about how big the
data is; it is about the value within
the data
12. Volume
• Today, Facebook ingests 500 terabytes of new
data every day.
• A Boeing 737 will generate 240 terabytes of
flight data during a single flight across the US.
• Smartphones and the data they create and
consume, along with sensors embedded into
everyday objects, will soon result in billions of
new, constantly updated data feeds containing
environmental, location, and other
information, including video.
14. Velocity
• Clickstreams and ad impressions capture user
behavior at millions of events per second
• High-frequency stock trading algorithms
reflect market changes within microseconds
• Machine to machine processes exchange data
between billions of devices
• Infrastructure and sensors generate massive
log data in real time
• Online gaming systems support millions of
concurrent users, each producing multiple
inputs per second.
16. Variety
• Big Data analysis includes different types of
data
• Geospatial data, 3D data, audio and video,
and unstructured text, including log files and
social media.
• Traditional database systems were designed
to address smaller volumes of structured
data, fewer updates, and a predictable,
consistent data structure.
17. Some common Types
• Activity Data
• Conversation Data
• Photo and Video Image
• Sensor Data
• IoT Data
• Scientific Data
• Geo-spatial Data
• Biological Data
18. Veracity – The 4th V
Refers to the messiness or trustworthiness of
the data
20. Storing Big Data
• Data models: key value, graph, document,
column-family
• Hadoop Distributed File System
• HBase
• Hive
Overview of Big Data stores
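To make the four data models listed above more concrete, here is a minimal sketch that mimics each one with plain Python structures. It does not use HBase, Hive, or any real store, and every name and record in it is invented for illustration.

```python
# Key-value: an opaque value is looked up by a single key.
key_value_store = {"user:42": '{"name": "Ada"}'}

# Document: each record is a nested, self-describing structure.
document_store = {
    "users": [{"_id": 42, "name": "Ada", "tags": ["admin", "beta"]}],
}

# Column-family: row key -> column family -> column -> value.
column_family_store = {
    "user:42": {
        "profile": {"name": "Ada"},
        "activity": {"last_login": "2012-02-01"},
    },
}

# Graph: nodes with labelled edges to other nodes.
graph_store = {
    "Ada": {"follows": ["Grace"]},
    "Grace": {"follows": []},
}

# The same question ("what is user 42's name?") is answered differently per model.
name_from_document = next(u["name"] for u in document_store["users"] if u["_id"] == 42)
name_from_columns = column_family_store["user:42"]["profile"]["name"]
followed_by_ada = graph_store["Ada"]["follows"]
```

The point of the comparison is the access pattern: key-value stores only look up by key, document and column-family stores expose structure inside a record, and graph stores make relationships between records the primary thing to query.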
21. Storing Big Data
• Selecting data sources for analysis
• Eliminating redundant data
• Establishing the role of NoSQL
Analyzing your data characteristics
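One simple way to approach "eliminating redundant data" is to fingerprint each record and keep only the first occurrence. The sketch below assumes records are JSON-serializable dictionaries and that two records with the same fields and values count as duplicates. The function names are hypothetical.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Hash a canonical JSON form so that key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drop_duplicates(records):
    """Return records in original order, keeping the first of each duplicate set."""
    seen = set()
    unique = []
    for record in records:
        fingerprint = record_fingerprint(record)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


# Illustrative input: the first two records differ only in key order.
sample = [{"id": 1, "city": "Pune"}, {"city": "Pune", "id": 1}, {"id": 2, "city": "Delhi"}]
deduplicated = drop_duplicates(sample)
```

At real scale the set of fingerprints would not fit in one process, so this only illustrates the idea, not a distributed implementation.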
22. Data Analytics
• Examining large amounts of data
• Identification of hidden patterns, unknown
correlations
• Better business decisions: strategic and
operational
• Effective marketing, customer satisfaction,
increased revenue
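As a small illustration of looking for "unknown correlations", the sketch below computes the Pearson correlation coefficient between two numeric columns. The column names and values are made up for the example. A strong correlation found this way is only a starting point for analysis, not proof of cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(variance_x * variance_y)


# Hypothetical data: weekly ad spend and revenue, in arbitrary units.
ad_spend = [1, 2, 3, 4, 5]
revenue = [2, 4, 6, 8, 10]
r = pearson(ad_spend, revenue)
```

Here revenue is exactly twice the ad spend, so the coefficient sits at the top of its range of -1 to 1.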
29. Risks of Big Data
• Risk of being overwhelmed by the data
• Need the right people solving the right
problems
• Costs can escalate too fast
• It isn't necessary to capture 100% of the data
• Data privacy
• Self-regulation
• Legal regulation
39. Benefits of Big Data
• Ability to make better decisions and take
meaningful actions at the right time.
• Technologies like Hadoop give you the scale
and flexibility to store data before you know
how you are going to process it.
40. Benefits of Big Data
• Organizations are using big data to target
customer-centric outcomes, tap into internal
data and build a better information
ecosystem.
• Technologies such as MapReduce, Hive and
Impala enable you to run queries without
changing the data structures underneath.
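For readers who have not seen MapReduce, the sketch below imitates its three phases (map, shuffle, reduce) in a single Python process, using word counting as the example. It is not Hadoop's API, and the function names are illustrative. In a real cluster each phase would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "Big value"])
```

The raw lines are never restructured. The query imposes its own interpretation at read time, which is the flexibility the slide describes.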
41. Future of Big Data
• $15 billion has been spent on software firms
specializing only in data management and analytics.
• This industry on its own is worth more than
$100 billion and growing at almost 10% a
year, which is roughly twice as fast as the
software business as a whole.
• In February 2012, the open source analyst
firm Wikibon released the first market
forecast for Big Data, listing $5.1B revenue in
2012 with growth to $53.4B in 2017
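The growth figures on this slide can be put on a common footing with a quick calculation. The sketch below derives the compound annual growth rate implied by the Wikibon forecast and the doubling time implied by 10% annual growth. It assumes steady compounding over whole years, and the function names are illustrative.

```python
from math import log


def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


def doubling_time(annual_rate):
    """Years needed to double at a constant annual growth rate."""
    return log(2) / log(1 + annual_rate)


# Wikibon forecast from the slide: $5.1B in 2012 growing to $53.4B in 2017.
wikibon_cagr = cagr(5.1, 53.4, 2017 - 2012)

# The broader industry figure from the slide: almost 10% a year.
years_to_double = doubling_time(0.10)

print(f"Implied Wikibon CAGR: {wikibon_cagr:.1%}")
print(f"Doubling time at 10% a year: {years_to_double:.1f} years")
```

The forecast implies growth of roughly 60% a year for the Big Data segment, far faster than the 10% a year quoted for data management and analytics as a whole.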
42. Thank you
You can download the presentation at
slideshare.com/enfarose