Copyright ©2012 Big Logic Technologies
A Big Data - Technology, Consulting & Training Firm
-- Big Logic was founded in the US after seeing the value of Apache Hadoop as a
Big Data analytics platform.

-- At Big Logic, we share our experiences after guiding many enterprises through successful Big
Data projects. We empower you to decide on build versus buy when it comes to achieving your
defined business objectives across various technical environments.

Big data is a term applied to data sets whose size is beyond the ability of commonly used
software tools to capture, manage, and process the data within a tolerable elapsed time.

Gartner predicts 800% data growth over the next 5 years.

80-90% of data produced today is unstructured.

Unit              Approx. bytes   In next-smaller unit
gigabyte (GB)     10^9            1024 MB
terabyte (TB)     10^12           1024 GB
petabyte (PB)     10^15           1024 TB
exabyte (EB)      10^18           1024 PB
zettabyte (ZB)    10^21           1024 EB
yottabyte (YB)    10^24           1024 ZB

2009: 800,000 petabytes
2020: 35 zettabytes, i.e. 35 billion TB
44x as much data and content over the coming decade

Source: IDC, The Digital Universe Decade – Are You Ready?, May 2010

1 zettabyte = 1 099 511 627 776 GB
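
The figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper names (`decimal_bytes`, `binary_ratio`) are hypothetical, and it assumes the IDC totals use decimal units (powers of ten) while the "1 zettabyte = ... GB" line uses binary steps of 1024.

```python
# Decimal exponents (10^n bytes) and binary ranks (1024^n bytes) per unit.
DECIMAL_EXPONENT = {"GB": 9, "TB": 12, "PB": 15, "EB": 18, "ZB": 21, "YB": 24}
BINARY_RANK = {"GB": 3, "TB": 4, "PB": 5, "EB": 6, "ZB": 7, "YB": 8}


def decimal_bytes(value, unit):
    """Size in bytes, assuming decimal (power-of-ten) units."""
    return value * 10 ** DECIMAL_EXPONENT[unit]


def binary_ratio(larger, smaller):
    """How many `smaller` units fit in one `larger` unit, in steps of 1024."""
    return 1024 ** (BINARY_RANK[larger] - BINARY_RANK[smaller])


size_2009 = decimal_bytes(800_000, "PB")   # IDC estimate for 2009
size_2020 = decimal_bytes(35, "ZB")        # IDC forecast for 2020

tb_in_2020 = size_2020 // decimal_bytes(1, "TB")
growth = size_2020 / size_2009

print(tb_in_2020)                # 35000000000, i.e. 35 billion TB
print(growth)                    # 43.75, rounded to "44x" on the slide
print(binary_ratio("ZB", "GB"))  # 1099511627776
```

Under these assumptions the slide's numbers are consistent: the "44x" figure is 43.75 rounded up, and the GB count per zettabyte matches 1024^4.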

Source: http://www.slideshare.net/cultureofperformance/gartner-idc-and-mckinsey-on-big-data

“Moore's law is the observation that, over the history of computing hardware, the
number of transistors on integrated circuits doubles approximately every two years.”
(named after Intel co-founder Gordon E. Moore)

RAM Max Capacity : 32GB

HDD Max Size : 6TB

CPU Max Speed

If I need to process a 100 TB dataset:
• On 1 node:
  – scanning @ 50 MB/s = 23 days
• On a 1000-node cluster:
  – scanning @ 50 MB/s per node = 33 min
• Challenge: hardware problems, plus processing and combining data from
  multiple disks
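
The scan times above follow from dividing the dataset size by the aggregate read throughput. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), perfectly even distribution of data across nodes, and no coordination overhead; the function name `scan_seconds` is hypothetical.

```python
def scan_seconds(dataset_bytes, nodes, mb_per_s_per_node=50):
    """Time to scan the dataset once if every node reads its share in parallel."""
    aggregate_bytes_per_s = nodes * mb_per_s_per_node * 10**6
    return dataset_bytes / aggregate_bytes_per_s


DATASET = 100 * 10**12  # 100 TB in bytes (decimal units assumed)

single_node = scan_seconds(DATASET, nodes=1)
cluster = scan_seconds(DATASET, nodes=1000)

print(f"1 node:     {single_node / 86_400:.1f} days")  # about 23 days
print(f"1000 nodes: {cluster / 60:.1f} minutes")        # about 33 minutes
```

This is an idealized lower bound. A real cluster also has to cope with the challenges listed above, namely hardware failures and the cost of combining results read from many disks.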

• Apache Hadoop is an open source framework for storing, processing
  and analysing massive amounts of multi-structured data in a
  distributed environment.
• Hadoop was inspired by Google's MapReduce and Google File
  System (GFS) papers.
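
To give a feel for the MapReduce programming model mentioned above, here is a toy word count in plain Python. It is a single-process illustration of the map, shuffle, and reduce steps only. It does not use Hadoop or any Hadoop API, and the function names and sample input are assumptions made for this sketch.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# Each string stands in for a block of data held on a different node.
splits = ["big data is big", "hadoop stores big data"]

mapped = [pair for split in splits for pair in map_phase(split)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)  # {'big': 3, 'data': 2, 'is': 1, 'hadoop': 1, 'stores': 1}
```

In a distributed framework the map calls would run on the nodes that hold each split, and the shuffle would move data across the network. Here everything runs in one process to keep the idea visible.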
If you are in any of the above segments, you would be part of the above revenue.


Introduction to Big Data by Manouj Bongirr
