Agenda
• What is Big data?
• Some BIG facts
• Objective
• Sources
• 3 V’s of Big data
• 3 + 1 V’s of Big data
• Technologies
• Opportunities
• Major Players
• Questions
• Conclusion
What is Big data?

Data

Big Data
Some BIG facts
• 90% of the data in the world today has been created in the
last two years alone
• IDC forecast: the global universe of data will double
every two years, reaching 40,000 exabytes (40 trillion GB)
by 2020
• The Large Hadron Collider near Geneva, Switzerland, will
produce about 15 petabytes of data per year.
• Ancestry.com, the genealogy site, stores around 2.5
petabytes of data.
• The Internet Archive stores around 2 petabytes of data, and
is growing at a rate of 20 terabytes per month.
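The IDC figure above can be checked with a little arithmetic. The sketch below assumes decimal (SI) units, where 1 exabyte is 1,000,000,000 GB; the function names are illustrative and do not come from the deck.

```python
# Assumption: decimal (SI) units, so 1 EB = 10**9 GB.
GB_PER_EB = 10**9


def exabytes_to_gb(exabytes):
    """Convert exabytes to gigabytes using decimal units."""
    return exabytes * GB_PER_EB


def size_after(start_size, years, doubling_period_years=2):
    """Project a size that doubles every `doubling_period_years` years."""
    return start_size * 2 ** (years / doubling_period_years)


forecast_gb = exabytes_to_gb(40_000)
print(f"{forecast_gb:,} GB")  # 40,000,000,000,000 GB, i.e. 40 trillion GB

# Doubling every two years is a 32x increase over ten years.
print(size_after(1, 10))  # 32.0
```

Under the binary convention (powers of 1024) the GB figure would be somewhat larger, so the "40 trillion GB" on the slide reads most naturally as a decimal conversion.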
Some BIG facts – What happens every day?
• The New York Stock Exchange generates about one
terabyte of new trade data
• Zynga processes 1 petabyte of content
• 30 billion pieces of content are added to Facebook
• 2 billion videos are watched on YouTube
• 2.5 quintillion bytes of data are created
Some BIG facts – What happens every minute?

Courtesy: http://practicalanalytics.files.wordpress.com
Big data – Objective

Effectively store, manage and analyze all
the data to create meaningful information
out of it
Big data – Sources
Big data – 3 V’s of Big data

Courtesy: bigdatablog.emc.com
Big data – 3 + 1 V’s of Big data

Courtesy: http://www.datasciencecentral.com/
Big data - Volume

Volumes are measured in:
• Terabytes
• Petabytes
• Exabytes
• Zettabytes

Courtesy: http://www.datasciencecentral.com/
Big data - Volume

Name                 Value
1 Gigabyte (GB)      1,073,741,824 bytes
1 Terabyte (TB)      1,024 GB
1 Petabyte (PB)      1,048,576 GB
1 Exabyte (EB)       1,073,741,824 GB
1 Zettabyte (ZB)     1,099,511,627,776 GB
1 Yottabyte (YB)     1,125,899,906,842,624 GB

Courtesy: http://www.datasciencecentral.com/
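
The values in the table above follow the binary convention, in which each unit is 1024 times the previous one. A minimal sketch that reproduces them (the names UNITS and unit_in_gb are illustrative):

```python
# Binary convention used in the table: each unit is 1024 times the previous one.
UNITS = ["GB", "TB", "PB", "EB", "ZB", "YB"]

BYTES_PER_GB = 1024 ** 3  # 1,073,741,824 bytes


def unit_in_gb(unit):
    """Return how many gigabytes one `unit` holds (powers of 1024)."""
    return 1024 ** UNITS.index(unit)


print(f"1 GB = {BYTES_PER_GB:,} bytes")
for unit in UNITS[1:]:
    print(f"1 {unit} = {unit_in_gb(unit):,} GB")
```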
Big data - Velocity

• Live Stream
• Real time
• Batch

Courtesy: http://www.datasciencecentral.com/
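
The difference between batch and streaming velocity can be sketched in a few lines. This is a toy illustration with made-up readings, not a real streaming framework: the batch function waits for the whole data set, while the streaming function updates its result as each value arrives.

```python
def batch_total(readings):
    """Batch: wait for the full data set, then process it in one pass."""
    return sum(readings)


def streaming_totals(readings):
    """Streaming: update the running result as each reading arrives."""
    total = 0
    for value in readings:
        total += value
        yield total


readings = [3, 5, 2]  # hypothetical sensor readings
print(batch_total(readings))             # 10
print(list(streaming_totals(readings)))  # [3, 8, 10]
```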
Big data - Variety

• Structured (Tables)
• Unstructured (Tweets, SMSes)
• Semi-structured (Logfiles, RFID)

Courtesy: http://www.datasciencecentral.com/
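
The three kinds of variety differ mainly in how much structure a program can rely on. The sketch below uses invented sample data (a small table, a log line and a tweet) and only the Python standard library; the log layout is an assumption made for the example.

```python
import csv
import io
import re

# Structured: fixed columns, as in a database table (hypothetical rows).
table = io.StringIO("id,name,amount\n1,Asha,250\n2,Ben,120\n")
rows = list(csv.DictReader(table))

# Semi-structured: a log line with a loose but recognisable pattern.
log_line = "2014-03-01 12:00:05 ERROR disk full on node-7"
LOG_PATTERN = re.compile(
    r"(?P<date>\S+) (?P<time>\S+) (?P<level>\w+) (?P<message>.*)"
)
log_record = LOG_PATTERN.match(log_line).groupdict()

# Unstructured: free text such as a tweet; only crude features can be pulled out.
tweet = "Loving the new phone! #happy #bigdata"
hashtags = re.findall(r"#(\w+)", tweet)

print(rows[0]["name"], log_record["level"], hashtags)
```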
Big data - Veracity

• Data veracity (uncertain or imprecise
data) is often overlooked
• It is now considered as important as
the 3 V’s of Big Data
• The effort needed to clean up data is
often not given enough importance
• Poor data quality costs the U.S.
economy around $3.1 trillion a
year

Source: McKinsey, Gartner, Twitter, Cisco, EMC, SAS, IBM, MEPTEC, QAS
Big data Technologies
Technologies & Solution providers:
• Storage (Microsoft SQL Server, Apache Hadoop, MongoDB)
• Processing (MapReduce, Impala)
• Analytics (SAS, R, Business Intelligence)
• Integration (Flume, Sqoop)
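
To give a feel for the MapReduce processing model listed above, here is a single-process word count in plain Python. It only imitates the map, shuffle and reduce phases; it does not use the Hadoop API, and the input documents are invented for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data big facts", "big opportunities"]  # hypothetical input
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

In a real cluster the map and reduce steps would run in parallel on different machines, with the framework handling the shuffle between them.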
Big data - Opportunities
• Storage
• Processing
• Analytics
• Integration
• Solution
Big data – Major Players
Big data – Questions?
Big data – Thank you !!!

Welcome to big data

Editor's Notes

  1. Data veracity (how uncertain or imprecise the data is) is often overlooked, yet it may be as important as the 3 V's of Big Data: Volume, Velocity and Variety. Traditional data warehouse / business intelligence (DW/BI) architecture assumes certain and precise data, achieved by spending unreasonably large amounts of human capital on data preparation, ETL/ELT and master data management. Yet the big data revolution forces us to rethink the traditional DW/BI architecture so that it can accept massive amounts of both structured and unstructured data at great velocity. By definition, unstructured data contains a significant amount of uncertain and imprecise data. For example, social media data is inherently uncertain. Considering the variety and velocity of big data, an organization can no longer commit time and resources to traditional ETL/ELT and data preparation to clean up the data and make it certain and precise for analysis. While there are tools to help automate data preparation and cleansing, they are still in the pre-industrial age. As a result, organizations must now analyze both structured and unstructured data that is uncertain and imprecise. The level of uncertainty and imprecision varies case by case, yet it must be factored in. It may be prudent to assign a Data Veracity score and ranking to specific data sets to avoid making decisions based on analysis of uncertain and imprecise data.
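
The notes suggest assigning a Data Veracity score and ranking to data sets but do not say how. One possible sketch, assuming that completeness of required fields is an acceptable stand-in for veracity (the function name, field names and sample data sets are all hypothetical):

```python
def veracity_score(records, required_fields):
    """Fraction of records in which every required field is present and non-empty.

    Assumption: completeness is used as a simple proxy for veracity.
    """
    if not records:
        return 0.0
    complete = sum(
        1
        for record in records
        if all(record.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(records)


# Hypothetical data sets: a curated export and a noisier social feed.
datasets = {
    "crm_export": [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": "b@example.com"},
    ],
    "social_feed": [
        {"id": 1, "email": ""},
        {"id": 2, "email": "c@example.com"},
        {"id": 3},
        {"id": 4, "email": None},
    ],
}

scores = {name: veracity_score(rows, ["id", "email"]) for name, rows in datasets.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {'crm_export': 1.0, 'social_feed': 0.25}
print(ranking)  # ['crm_export', 'social_feed']
```

A real score would probably combine several signals, such as validity checks, source reliability and freshness, rather than completeness alone.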