SlideShare une entreprise Scribd logo
1  sur  42
Big Data
Principles of Database Design
Textbook Reference:
Oracle The Big Data Handbook
Today we will discuss:
• What is Data?
• Why Big Data?
• How it is Different?
• Characteristic of Big Data
• Application of Big Data
• Benefits of Big Data
• Future of Big Data
2
To be continued …
What is Data?
• Data can be any character, text, words,
number, pictures, sound, or video and, if not
put into context, means little or nothing to a
human.
• Information is useful and usually formatted
in a manner that allows it to be understood
by a human.
Big data V/s Small Data
Big Data
• The large picture
• Encompasses many
different types of data
• Unstructured data
• Unfocused
• Difficulty to interpret
Small Data
• The small picture.
• Mostly Homogenous
• Structured
• Focused
• Easily Interpreted
Why the hype around Big Data?
• An aim to solve new problems or old
problems in a better way
• Big Data generates value from the storage
and processing of very large quantities of
digital information
How is big data different?
• Automatically generated by a machine (e.g.
Sensor embedded in an engine)
• Typically an entirely new source of data (e.g.
Use of the internet)
• Not designed to be friendly (e.g. Text
streams)
• May not have much values
• Need to focus on the important part
Some real examples
How big is big data?
• Analysts predict that by 2020, there will be 5,200
gigabytes of data on every person in the world.
• On average, people send about 500 million tweets per
day.
• The average U.S. customer uses 1.8 gigabytes of data
per month on his or her cell phone plan.
• Walmart processes one million customer transactions
per hour.
• Amazon sells 600 items per second.
• On average, each person who uses email receives 88
emails per day and send 34. That adds up to more than
200 billion emails each day.
• MasterCard processes 74 billion transactions per year.
• Commercial airlines make about 5,800 flights per day.
Big data is not much howbig the
data is, it is about the value within
the data
Characteristics of Big Data
Volume Data Quantity
VarietyData Types
Velocity Data Speed
Volume
Refers to vast amount of data that is generated
every second
Volume
• Today, Facebook ingests 500 terabytes of new
data every day.
• Boeing 737 will generate 240 terabytes of
flight data during a single flight across the US.
• The smart phones, the data they create and
consume; sensors embedded into everyday
objects will soon result in billions of new,
constantly-updated data feeds containing
environmental, location, and other
information, including video.
Velocity
Refers to the speed at which new data is
generated
Velocity
• Clickstreams and ad impressions capture user
behavior at millions of events per second
• High-frequency stock trading algorithms
reflect market changes within microseconds
• Machine to machine processes exchange data
between billions of devices
• Infrastructure and sensors generate massive
log data in realtime
• On-line gaming systems support millions of
concurrent users, each producing multiple
inputs per second.
Variety
Different Types of Data
Variety
• Big Data analysis includes different types of
data
• Geospatial data, 3D data, audio and video,
and unstructured text, including log files and
social media.
• Traditional database systems were designed
to address smaller volumes of structured
data, fewer updates or a predictable,
consistent data structure.
Some common Types
• Activity Data
• Conversation Data
• Photo and Video Image
• Sensor Data
• IoT Data
• Scientific Data
• Geo-spatial Data
• Biological Data
Veracity – The 4th V
Refers to the massiveness or trust worthies of
the data
Big Data Sources
Sources
Users
Systems
Sensors
Application
Storing Big Data
• Data models: key value, graph, document,
column-family
• Hadoop Distributed File System
• HBase
• Hive
Overview of Big Data stores
Storing Big Data
• Selecting data sources for analysis
• Eliminating redundant data
• Establishing the role of NoSQL
Analyzing your data characteristics
Data Analytics
• Examining large amount of data
• Identification of hidden patterns, unknown
correlations
• Better business decisions: strategic and
operational
• Effective marketing, customer satisfaction,
increased revenue
TYPES OF TOOLS IN BIG DATA
Where processing is hosted?
• Distributed Servers / Cloud (e.g. Amazon EC2)
Where data is stored?
• Distributed Storage (e.g. Amazon S3)
What is the programming model?
• Distributed Processing (e.g. MapReduce)
How data is stored & indexed?
• High-performance schema-free databases (e.g. MongoDB)
What operations are performed on data?
• Analytic / Semantic Processing
Risks of Big Data
• Will be so overwhelmed
• Need the right people and solve the right
problems
• Costs escalate too fast
• Isn’t necessary to capture 100%
• Data privacy
• Self-regulation
• Legal regulation
Applications of Big Data
Search Quality
Better understand and
target customers
Understand and
Optimize Business
Improving Health
Improving Security
and Law Enforcement
Trading Analytics
Multichannel Sales
Basically…. Endless….
Benefits of Big Data
• Ability to make better decisions and take
meaningful actions at the right time.
• Technologies like Hadoop give you the scale
and flexibility to store data before you know
how you are going to process it.
Benefits of Big Data
• Organizations are using big data to target
customer-centric outcomes, tap into internal
data and build a better information
ecosystem.
• Technologies such as MapReduce, Hive and
Impala enable you to run queries without
changing the data structures underneath.
Future of Big Data
• $15 billion on software firms only specializing
in data management and analytics.
• This industry on its own is worth more than
$100 billion and growing at almost 10% a
year which is roughly twice as fast as the
software business as a whole. •
• In February 2012, the open source analyst
firm Wikibon released the first market
forecast for Big Data , listing $5.1B revenue in
2012 with growth to $53.4B in 2017
Thank you
You can download the presentation at
slideshare.com/enfarose

Contenu connexe

Tendances

Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 
Measuring Data Quality Return on Investment
Measuring Data Quality Return on InvestmentMeasuring Data Quality Return on Investment
Measuring Data Quality Return on InvestmentDATAVERSITY
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadhMithlesh Sadh
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data AnalyticsS P Sajjan
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud DetectionNitesh Kumar
 
Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingDerek Kane
 
Credit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
Credit Card Fraud Detection Using Unsupervised Machine Learning AlgorithmsCredit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
Credit Card Fraud Detection Using Unsupervised Machine Learning AlgorithmsHariteja Bodepudi
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and workAmr Abd El Latief
 
Introduction to Big Data
Introduction to Big Data Introduction to Big Data
Introduction to Big Data Srinath Perera
 
Applications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesApplications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesT.S. Lim
 
Big data introduction
Big data introductionBig data introduction
Big data introductionChirag Ahuja
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithmRashid Ansari
 

Tendances (20)

Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
 
Big data unit i
Big data unit iBig data unit i
Big data unit i
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Measuring Data Quality Return on Investment
Measuring Data Quality Return on InvestmentMeasuring Data Quality Return on Investment
Measuring Data Quality Return on Investment
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Big Data Analytics (1).ppt
Big Data Analytics (1).pptBig Data Analytics (1).ppt
Big Data Analytics (1).ppt
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data Analytics
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud Detection
 
Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image Processing
 
Big Data
Big DataBig Data
Big Data
 
Big data
Big dataBig data
Big data
 
Credit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
Credit Card Fraud Detection Using Unsupervised Machine Learning AlgorithmsCredit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
Credit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
 
Big data
Big dataBig data
Big data
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
 
Introduction
IntroductionIntroduction
Introduction
 
Introduction to Big Data
Introduction to Big Data Introduction to Big Data
Introduction to Big Data
 
Applications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesApplications of Big Data Analytics in Businesses
Applications of Big Data Analytics in Businesses
 
Big data introduction
Big data introductionBig data introduction
Big data introduction
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
 

Similaire à Big data

Similaire à Big data (20)

Content1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxContent1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docx
 
Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01
 
Special issues on big data
Special issues on big dataSpecial issues on big data
Special issues on big data
 
ppt final.pptx
ppt final.pptxppt final.pptx
ppt final.pptx
 
Kartikey tripathi
Kartikey tripathiKartikey tripathi
Kartikey tripathi
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
Big_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptxBig_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptx
 
Bigdata " new level"
Bigdata " new level"Bigdata " new level"
Bigdata " new level"
 
bigdatappt.pptx
bigdatappt.pptxbigdatappt.pptx
bigdatappt.pptx
 
Big data
Big dataBig data
Big data
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
 
Big Data Analytics - A Glimpse
Big Data Analytics - A GlimpseBig Data Analytics - A Glimpse
Big Data Analytics - A Glimpse
 
BigData.pptx
BigData.pptxBigData.pptx
BigData.pptx
 
Big Data
Big DataBig Data
Big Data
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big data and analytics
Big data and analyticsBig data and analytics
Big data and analytics
 
Big Data Analytics.pdfbgfjgjgghfhhffhdfyf
Big Data Analytics.pdfbgfjgjgghfhhffhdfyfBig Data Analytics.pdfbgfjgjgghfhhffhdfyf
Big Data Analytics.pdfbgfjgjgghfhhffhdfyf
 
Ictam big data
Ictam big dataIctam big data
Ictam big data
 
Big data
Big dataBig data
Big data
 
Unit 1 (DSBDA) PD.pptx
Unit 1 (DSBDA)  PD.pptxUnit 1 (DSBDA)  PD.pptx
Unit 1 (DSBDA) PD.pptx
 

Dernier

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Big data

  • 1. Big Data Principles of Database Design Textbook Reference: Oracle The Big Data Handbook
  • 2. Today we will discuss: • What is Data? • Why Big Data? • How it is Different? • Characteristic of Big Data • Application of Big Data • Benefits of Big Data • Future of Big Data 2 To be continued …
  • 3. What is Data? • Data can be any character, text, words, number, pictures, sound, or video and, if not put into context, means little or nothing to a human. • Information is useful and usually formatted in a manner that allows it to be understood by a human.
  • 4. Big data V/s Small Data Big Data • The large picture • Encompasses many different types of data • Unstructured data • Unfocused • Difficulty to interpret Small Data • The small picture. • Mostly Homogenous • Structured • Focused • Easily Interpreted
  • 5. Why the hype around Big Data? • An aim to solve new problems or old problems in a better way • Big Data generates value from the storage and processing of very large quantities of digital information
  • 6. How is big data different? • Automatically generated by a machine (e.g. Sensor embedded in an engine) • Typically an entirely new source of data (e.g. Use of the internet) • Not designed to be friendly (e.g. Text streams) • May not have much values • Need to focus on the important part
  • 8. How big is big data? • Analysts predict that by 2020, there will be 5,200 gigabytes of data on every person in the world. • On average, people send about 500 million tweets per day. • The average U.S. customer uses 1.8 gigabytes of data per month on his or her cell phone plan. • Walmart processes one million customer transactions per hour. • Amazon sells 600 items per second. • On average, each person who uses email receives 88 emails per day and send 34. That adds up to more than 200 billion emails each day. • MasterCard processes 74 billion transactions per year. • Commercial airlines make about 5,800 flights per day.
  • 9. Big data is not much howbig the data is, it is about the value within the data
  • 10. Characteristics of Big Data Volume Data Quantity VarietyData Types Velocity Data Speed
  • 11. Volume Refers to vast amount of data that is generated every second
  • 12. Volume • Today, Facebook ingests 500 terabytes of new data every day. • Boeing 737 will generate 240 terabytes of flight data during a single flight across the US. • The smart phones, the data they create and consume; sensors embedded into everyday objects will soon result in billions of new, constantly-updated data feeds containing environmental, location, and other information, including video.
  • 13. Velocity Refers to the speed at which new data is generated
  • 14. Velocity • Clickstreams and ad impressions capture user behavior at millions of events per second • High-frequency stock trading algorithms reflect market changes within microseconds • Machine to machine processes exchange data between billions of devices • Infrastructure and sensors generate massive log data in realtime • On-line gaming systems support millions of concurrent users, each producing multiple inputs per second.
  • 16. Variety • Big Data analysis includes different types of data • Geospatial data, 3D data, audio and video, and unstructured text, including log files and social media. • Traditional database systems were designed to address smaller volumes of structured data, fewer updates or a predictable, consistent data structure.
  • 17. Some common Types • Activity Data • Conversation Data • Photo and Video Image • Sensor Data • IoT Data • Scientific Data • Geo-spatial Data • Biological Data
  • 18. Veracity – The 4th V Refers to the massiveness or trust worthies of the data
  • 20. Storing Big Data • Data models: key value, graph, document, column-family • Hadoop Distributed File System • HBase • Hive Overview of Big Data stores
  • 21. Storing Big Data • Selecting data sources for analysis • Eliminating redundant data • Establishing the role of NoSQL Analyzing your data characteristics
  • 22. Data Analytics • Examining large amount of data • Identification of hidden patterns, unknown correlations • Better business decisions: strategic and operational • Effective marketing, customer satisfaction, increased revenue
  • 23. TYPES OF TOOLS IN BIG DATA
  • 24. Where processing is hosted? • Distributed Servers / Cloud (e.g. Amazon EC2)
  • 25. Where data is stored? • Distributed Storage (e.g. Amazon S3)
  • 26. What is the programming model? • Distributed Processing (e.g. MapReduce)
  • 27. How data is stored & indexed? • High-performance schema-free databases (e.g. MongoDB)
  • 28. What operations are performed on data? • Analytic / Semantic Processing
  • 29. Risks of Big Data • Will be so overwhelmed • Need the right people and solve the right problems • Costs escalate too fast • Isn’t necessary to capture 100% • Data privacy • Self-regulation • Legal regulation
  • 39. Benefits of Big Data • Ability to make better decisions and take meaningful actions at the right time. • Technologies like Hadoop give you the scale and flexibility to store data before you know how you are going to process it.
  • 40. Benefits of Big Data • Organizations are using big data to target customer-centric outcomes, tap into internal data and build a better information ecosystem. • Technologies such as MapReduce, Hive and Impala enable you to run queries without changing the data structures underneath.
  • 41. Future of Big Data • $15 billion on software firms only specializing in data management and analytics. • This industry on its own is worth more than $100 billion and growing at almost 10% a year which is roughly twice as fast as the software business as a whole. • • In February 2012, the open source analyst firm Wikibon released the first market forecast for Big Data , listing $5.1B revenue in 2012 with growth to $53.4B in 2017
  • 42. Thank you You can download the presentation at slideshare.com/enfarose