What is Better Alert?
Big Data in Action: Operations, Analytics and more
Agenda
• Meet & Greet Introduction.
• Unfolding the term “Big Data”.
– Evolution of Data to Big Data: Static to Stream.
– 3 V’s of Big Data.
• Overview of Implementing Big Data.
– Examples of Big Data implementations.
– Implementing Big Data with Hadoop infrastructure.
– Implementing Big Data with NoSQL databases such as Cassandra & MongoDB.
• Advantages of implementing Big Data solutions.
• Open Forum Discussion/ Networking.
Vibhu Bhutani
Technical Project Manager
I started as a Java developer and have many years of experience in developing and managing
state-of-the-art applications. With extensive experience in all phases of the SDLC model, I
lead the Innovations & Mobile Excellence team at Softweb Solutions. I am involved in
various innovative implementations, including Big Data systems, IoT implementations
and iBeacon development at Softweb Solutions.
in/vibhuis
Welcome
Unfolding the Term Big Data
• IBM reported in a study that every day we create roughly 2.5 quintillion bytes of data from
sources such as climate sensors, GPS signals, social media and online transactions, and that 90%
of the world's data was created in the last couple of years. Big Data is a buzzword for technology
that can process huge amounts of data so that we can extract valuable information from it.
• How old is Big Data?
– It is as old as data; however, the parameters change every year. In 2012 it was about a couple of
petabytes and now it is about a few exabytes.
• Why do we hear about Big Data now?
– Although big data is old, nowadays more industries are aware of its implications. In 2004 Google
published a paper explaining the MapReduce technique for analyzing large datasets. After that,
many other companies followed, and the buzzword Big Data came into existence.
• Static Data vs. Dynamic Data
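To put IBM's figure in the units used above, here is a quick conversion. It assumes decimal SI units (1 PB = 10^15 bytes, 1 EB = 10^18 bytes) and a 365-day year; the function name `daily_volume` is purely illustrative.

```python
# Convert IBM's "2.5 quintillion bytes per day" into petabytes and exabytes.
BYTES_PER_DAY = 2.5e18  # 2.5 quintillion = 2.5 x 10^18 bytes
PB = 10**15             # petabyte (decimal SI units assumed)
EB = 10**18             # exabyte (decimal SI units assumed)


def daily_volume(bytes_per_day: float) -> dict:
    """Express a daily data volume in PB/day, EB/day and EB/year (365-day year assumed)."""
    return {
        "pb_per_day": bytes_per_day / PB,
        "eb_per_day": bytes_per_day / EB,
        "eb_per_year": bytes_per_day * 365 / EB,
    }


if __name__ == "__main__":
    for unit, value in daily_volume(BYTES_PER_DAY).items():
        print(f"{unit}: {value:,.1f}")
```

Under these assumptions, 2.5 quintillion bytes a day is about 2.5 EB per day, or a little over 900 EB in a year.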
Evolution of Data
In 76 KB of hardwired memory, NASA successfully took men to the Moon and brought them back.
An 8 GB iPhone has enough memory to do it more than 100,000 times over.
Strange Fact
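The ratio behind this fact is easy to check. The sketch below reads the 76 KB and 8 GB figures either in decimal units (1 GB = 10^6 KB) or binary units (1 GB = 1024 × 1024 KB), since the slide does not say which; `copies` is a hypothetical helper name.

```python
AGC_MEMORY_KB = 76  # memory figure quoted on the slide
IPHONE_GB = 8       # storage capacity quoted on the slide


def copies(kb_per_unit: int, gigabytes: int, binary: bool = False) -> int:
    """Return how many whole kb_per_unit-sized memories fit into `gigabytes` of storage."""
    kb_per_gb = 1024 * 1024 if binary else 1000 * 1000
    return (gigabytes * kb_per_gb) // kb_per_unit


if __name__ == "__main__":
    print("decimal units:", copies(AGC_MEMORY_KB, IPHONE_GB))
    print("binary units: ", copies(AGC_MEMORY_KB, IPHONE_GB, binary=True))
```

Either way the result is on the order of 10^5: about 105,000 copies in decimal units and about 110,000 in binary units.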
Evolution of Data
Necessity is the mother of invention, and I believe technology is its father.
3 V’s of Big Data
3 V’s of Big Data
4th V of Big Data
Application of Big Data - Cern
• In the 1960s CERN stored its data on a mainframe
computer.
• In the 1970s CERN distributed data across several
machines, splitting the mainframe's work among
smaller pieces of equipment; CERNET was
introduced to link these machines, which reduced
the need to travel between them.
• In the 1980s these machines were placed in
different countries across the US and Europe, and
the internet was introduced to connect them.
• Due to the enormous increase in data, a CERN grid
was introduced in the 2000s, connecting many
smaller computers to analyze and process the
data.
• The LHC detectors, with 150 million sensors, work as
3D cameras that photograph collisions of protons
travelling at nearly the speed of light, at a rate of
40 million times per second. The data is now stored
in the cloud and analyzed using big data techniques.
Implementation of Big Data - CERN
Figure: proton injection for collision; collision of particles, recording data in sensors.
Other Industries using Big Data
• Government Applications:
– The US government has invested heavily in big data applications. Big data analysis played a large
role in Barack Obama's successful 2012 re-election campaign.
– The Utah Data Center is a data center currently being constructed by the United States National
Security Agency. The exact amount of storage space is unknown, but more recent sources claim it
will be on the order of a few exabytes.
– Big data analysis was, in part, responsible for the BJP and its allies winning the 2014 Indian
general election by a large margin.
– The UK government is using big data to improve weather forecasting & forecasts of new drug
releases.
• Manufacturing Industries:
– Vast amounts of sensor data such as acoustics, vibration, pressure, current, voltage and controller
data, together with historical data, make up the big data in manufacturing. This data serves as the
input to predictive tools and preventive strategies.
• Technology Industries:
– eBay and Amazon are industry leaders in storing large volumes of user searches and performing
predictive analysis. This helps them identify user needs and provide better results.
• Retail Industries:
– Walmart holds about 2.5 petabytes of data and handles 1 million customer transactions every hour.
– Amazon processes transactions worth USD 80,000 in an hour. Amazon has three of the world's
largest databases.
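As a rough illustration of a "predictive tool" consuming manufacturing sensor data, the sketch below flags vibration readings that deviate sharply from the recent past using a rolling z-score. This is a deliberately simple stand-in, not a method the slide attributes to any manufacturer; the readings, window size and threshold are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def flag_anomalies(readings: Sequence[float], window: int = 5, threshold: float = 3.0) -> List[int]:
    """Return indices whose value lies more than `threshold` standard deviations
    away from the mean of the preceding `window` readings (window must be >= 2)."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat history: a zero standard deviation would make the z-score undefined.
        if sigma > 0 and abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical vibration amplitudes from one machine; index 7 is a spike.
    vibration = [0.51, 0.49, 0.50, 0.52, 0.48, 0.50, 0.51, 1.90, 0.49, 0.50]
    print(flag_anomalies(vibration))
```

In a real plant the same idea would run continuously over streams from many sensors, and the flagged indices would feed maintenance alerts rather than a print statement.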
Big Data Solutions - Hadoop
• Hadoop is an open-source system to reliably
store and process large amounts of information.
• A Big Data solution that handles the complexity
of the volume, variety and velocity of data.
• It turns commodity hardware into a service that
handles petabytes of data in distributed
environments: Pigeon Computing.
• Hadoop is redundant, reliable, powerful,
batch-processing-centric and distributed.
High Level Architecture of Hadoop
Map Reduce program – Word Count
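The word-count program is the classic MapReduce example. Below is a single-process Python sketch of its three phases (map, shuffle/sort, reduce) to show the data flow; it is not Hadoop's Java API, which expresses the same steps as `Mapper` and `Reducer` classes running across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs).items())


if __name__ == "__main__":
    docs = ["Big data in action", "big data analytics", "data in motion"]
    print(word_count(docs))
```

On a Hadoop cluster, each mapper would process one input split in parallel and the framework would route every key to a single reducer; the logic per record stays the same.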
Hadoop Implementation in Real World
• Yahoo:
– In 2008 Yahoo claimed the world’s largest Hadoop production application. The Yahoo Search Webmap is a Hadoop
application that runs on a Linux cluster with more than 10,000 cores.
• Facebook:
– In 2010 Facebook claimed that it had the largest Hadoop cluster in the world, with 21 PB of storage. On
June 13, 2012 it announced that the data had grown to 100 PB, and on November 8, 2012 it announced that the
data gathered in its warehouse grows by roughly half a PB per day.
• As of 2013, Hadoop adoption is widespread; for example, more
than half of the Fortune 50 use Hadoop.
• The New York Times used 100 Amazon EC2 instances and a
Hadoop application to process 4 TB of raw TIFF image data (stored
in S3) into 11 million finished PDFs in the space of 24 hours, at a
computation cost of about $240 (not including bandwidth).
Distributed System - CAP Theorem
Introduction to NoSQL
• A NoSQL database provides a mechanism
for storage and retrieval of data that is modeled by means other
than the tabular relations used in relational databases.
• Types of NoSQL Databases:
– Column: Cassandra, HBase
– Document: Apache CouchDB, MongoDB
– Key-value: Couchbase, Dynamo, Redis
– Graph: Neo4J
– Multi-model: OrientDB, Alchemy Database, CortexDB
High Level Architecture - Cassandra
• Ring-based replication
• Only 1 type of server (Cassandra)
• All nodes hold data and can answer queries
• No single point of failure
• Built for high availability (HA) & scalability
• Multi-DC
• Data is found by key (CQL)
• Runs on the JVM
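To make "ring-based replication" concrete, here is a toy token ring in Python. Each key is hashed to a token, the first node whose token is at or after the key's token owns it, and further replicas go to the next nodes clockwise, which is how Cassandra's SimpleStrategy places replicas. Simplifications (all assumptions): MD5 instead of Cassandra's default Murmur3 partitioner, one token per node instead of many virtual nodes, and hypothetical node names.

```python
import bisect
import hashlib
from typing import List, Tuple


def token(key: str) -> int:
    """Hash a key onto the ring (MD5 for simplicity; Cassandra's default partitioner uses Murmur3)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: one token per node (real clusters usually assign many virtual nodes per host)."""

    def __init__(self, nodes: List[str]) -> None:
        self._ring: List[Tuple[int, str]] = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, replication_factor: int) -> List[str]:
        """Owner is the first node whose token >= the key's token (wrapping around);
        the remaining replicas are the next distinct nodes clockwise."""
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas("user:alice", replication_factor=3))
```

Because every node can compute this placement from the key alone, any node can accept a request and forward it to the replicas, which is why there is no single master and no single point of failure.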
High Level Architecture - Cassandra
High Level Architecture - Cassandra
Example: Single Row Partition
• Simple user system
• Identified by name (primary key)
• One row per partition
High Level Architecture - Cassandra
Example: Multiple Rows
• Comments on photos
• Comments are always selected by
the photo_id
• There are only 4 rows in 2 partitions
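The two examples above (single-row user partitions and multi-row comment partitions) can be modeled with a toy in-memory table: a partition key maps to rows kept in clustering-key order. This is a hypothetical Python sketch of the layout, not Cassandra's storage engine; the user names, photo ids and comment sequence numbers are made up.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List


class Table:
    """Toy model of a Cassandra table: partition key -> {clustering key -> row}."""

    def __init__(self) -> None:
        self.partitions: Dict[Hashable, Dict[Any, dict]] = defaultdict(dict)

    def insert(self, partition_key: Hashable, row: dict, clustering_key: Any = None) -> None:
        # Writing the same primary key again overwrites the row (Cassandra writes are upserts).
        self.partitions[partition_key][clustering_key] = row

    def select(self, partition_key: Hashable) -> List[dict]:
        """Read one partition; rows come back ordered by clustering key."""
        rows = self.partitions.get(partition_key, {})
        return [rows[ck] for ck in sorted(rows)]


if __name__ == "__main__":
    # Single-row partitions: users identified by name, no clustering key.
    users = Table()
    users.insert("alice", {"name": "alice", "email": "alice@example.com"})
    users.insert("bob", {"name": "bob", "email": "bob@example.com"})

    # Multi-row partitions: comments grouped by photo_id, ordered by a comment sequence number.
    comments = Table()
    comments.insert("photo-1", {"user": "bob", "text": "Great shot"}, clustering_key=2)
    comments.insert("photo-1", {"user": "alice", "text": "Nice!"}, clustering_key=1)
    comments.insert("photo-2", {"user": "carol", "text": "Where is this?"}, clustering_key=1)
    comments.insert("photo-2", {"user": "alice", "text": "Iceland"}, clustering_key=2)

    print(len(comments.partitions), "partitions,",
          sum(len(p) for p in comments.partitions.values()), "rows")
    print(comments.select("photo-1"))
```

Selecting comments by `photo_id` touches exactly one partition, which is why the slide stresses that comments are always queried that way.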
High Level Architecture - Cassandra
• Multiple rows are transposed into a single partition
• Partitions vary in size
• Old terminology: "wide row"
• Cassandra is built for fast writes, so the data model should be denormalized to require as few
reads as possible
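Denormalization in practice usually means writing the same data into one table per query, so that every read is a single partition lookup. The Python sketch below illustrates the idea with a hypothetical `PhotoComments` class holding two query tables; in Cassandra these would be two CQL tables, each written with its own INSERT.

```python
from collections import defaultdict
from typing import Dict, List


class PhotoComments:
    """Two query tables holding the same comments, one per access pattern (hypothetical schema)."""

    def __init__(self) -> None:
        self.by_photo: Dict[str, List[dict]] = defaultdict(list)  # serves "comments for a photo"
        self.by_user: Dict[str, List[dict]] = defaultdict(list)   # serves "comments by a user"

    def add_comment(self, photo_id: str, user: str, text: str) -> None:
        # One logical write fans out to every table that answers a query: extra writes, cheap reads.
        comment = {"photo_id": photo_id, "user": user, "text": text}
        self.by_photo[photo_id].append(comment)
        self.by_user[user].append(comment)

    def for_photo(self, photo_id: str) -> List[dict]:
        return self.by_photo.get(photo_id, [])  # single partition lookup, no join

    def for_user(self, user: str) -> List[dict]:
        return self.by_user.get(user, [])  # single partition lookup, no join


if __name__ == "__main__":
    store = PhotoComments()
    store.add_comment("photo-1", "alice", "Nice!")
    store.add_comment("photo-2", "alice", "Iceland")
    store.add_comment("photo-1", "bob", "Great shot")
    print(store.for_user("alice"))
```

The trade-off matches the slide: writes are cheap in Cassandra, so paying for duplicate writes to avoid joins and multi-partition reads is usually the right choice.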
High Level Architecture – MongoDB
• Open-source, document-oriented, popular for its
agile and scalable approach
• Notable Features:
– JSON/BSON data model with dynamic schema
– Auto-sharding for horizontal scalability
– Built-in replication with automated failover
– Full, flexible index support, including secondary
indexes
– Rich document-based queries
– Aggregation framework and Map/Reduce
– GridFS for large file storage
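Auto-sharding with a ranged shard key splits the key space into chunks, each covering a range with an inclusive lower bound and an exclusive upper bound, and each chunk lives on one shard. In MongoDB the `mongos` router consults this chunk metadata from the config servers. The Python sketch below only illustrates the routing lookup; the split points and shard names are hypothetical, and real chunk splitting and balancing are not modeled.

```python
import bisect
from typing import List


class ShardRouter:
    """Route a shard-key value to the shard owning its chunk; chunks are [lower, upper) ranges."""

    def __init__(self, split_points: List[int], shards: List[str]) -> None:
        if split_points != sorted(split_points):
            raise ValueError("split points must be sorted")
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = split_points
        self.shards = shards

    def shard_for(self, shard_key_value: int) -> str:
        # bisect_right: a value equal to a split point belongs to the chunk that starts there.
        return self.shards[bisect.bisect_right(self.split_points, shard_key_value)]


if __name__ == "__main__":
    # Hypothetical layout: (-inf, 1000) -> shard-a, [1000, 5000) -> shard-b, [5000, +inf) -> shard-c
    router = ShardRouter([1000, 5000], ["shard-a", "shard-b", "shard-c"])
    for user_id in (42, 1000, 4999, 5000):
        print(user_id, "->", router.shard_for(user_id))
```

Adding capacity then amounts to moving some chunks to a new shard and updating this metadata, which is what makes horizontal scaling transparent to the application.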
High Level Architecture – MongoDB
• Ensures high availability, redundancy and automated
failover
• Writes go to the primary; reads can be served by any member, depending on read preference
• Asynchronous replication
• In conventional terms, similar to master/slave
replication
• Members can be configured as: secondary-only / non-voting / hidden / arbiters / delayed
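Automated failover depends on the member types listed above: a new primary can only be elected while a strict majority of the configured votes is reachable, and the winner must be a data-bearing member with priority greater than 0 (arbiters vote but hold no data; hidden, delayed and secondary-only members have priority 0). The Python sketch below checks that condition for a hypothetical configuration; it ignores other election details such as replication lag and network partitions between surviving members.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Member:
    name: str
    votes: int = 1         # 0 for a non-voting member (which must also have priority 0)
    priority: float = 1.0  # 0 for hidden, delayed or secondary-only members
    arbiter: bool = False  # arbiters vote but hold no data


def can_elect_primary(members: List[Member], down: Set[str]) -> bool:
    """True if the reachable members hold a strict majority of all configured votes
    and at least one of them is data-bearing with priority > 0."""
    total_votes = sum(m.votes for m in members)
    up = [m for m in members if m.name not in down]
    reachable_votes = sum(m.votes for m in up)
    electable = any(m.priority > 0 and not m.arbiter for m in up)
    return reachable_votes * 2 > total_votes and electable


if __name__ == "__main__":
    # Hypothetical primary-secondary-arbiter set.
    psa = [Member("a"), Member("b"), Member("arb", priority=0, arbiter=True)]
    print(can_elect_primary(psa, down={"a"}))         # b can take over with the arbiter's vote
    print(can_elect_primary(psa, down={"a", "arb"}))  # b alone holds only 1 of 3 votes
```

This is also why replica sets are usually deployed with an odd number of voting members: it keeps a majority available after a single failure.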
When to use: MongoDB
• Unstructured data from multiple suppliers
• GridFS: stores large binary objects
• Spring Data services
• Embedding and linking documents
• Easy replication setup on AWS
Advantages of using Big Data
Thank you for your patience.
Thank You!

Big Data in Action : Operations, Analytics and more

  • 1. What is Better Alert?Big Data in Action: Operations, Analytics and more
  • 2. Agenda • Meet & Greet Introduction. • Unfolding the term “Big Data”. – Evolution of Data to Big Data : Static to Stream. – 3 V’s of Big Data. • Overview of Implementing Big Data – Examples of implementation of Big Data – Implementing Big data with Hadoop infrastructure – Implementing Big data with NoSql like Cassandra & MongoDB. • Advantages of implementing Big Data solutions. • Open Forum Discussion/ Networking.
  • 3. Vibhu Bhutani Technical Project Manager Started as a Java developer, and I have many years of experience in developing and managing state of the art applications. With extensive experience in the phases of the SDLC model, I leads the team of innovations & mobile excellence in softweb soloutions. Am involved in various innovative implementations which include the implementation of Big Data systems, IOT implementations and iBeacon developments at Softweb Solutions. in/vibhuis Welcome
  • 4. Unfolding the Term Big Data • IBM reported in a study that every day we create roughly 2.5 quintillion data from various data sources like Climate Sensors, GPS Signals, Social Media, Online transactions. Out of which 90% was created in the last couple of years. Big Data is a buzz word of a technology that shows a potential to process, huge amount of data so that we get some valuable information out of it. • How old is Big Data? – Its as old as data however the parameters changes every year. In 2012 it was about couple of Petabytes and now its about few Exabyte's. • Why do we now here about Big Data? – Although big data is old, but now a days more industries are knowing about the implications of big data. In 2004 Google introduced a paper explaining Map Reduce technique to analyze large datasets. After that many other companies joined together and the buzz word Big Data came into existence. • Static data VS Dynamic Data
  • 5. Evolution of Data In 76 KB of Hardwired Memory, Nasa successfully took Men to moon and brought them back. With an 8 Gigs iPhone it can be done 108 times. Strange Fact
  • 6. Evolution of Data Necessity is the Mother of Invention, and I believe Technology his father
  • 7. 3 V’s of Big Data
  • 8. 3 V’s of Big Data
  • 9. 4th V of Big Data
  • 10. Application of Big Data - Cern • In 1960 Cern used to store data in a main frame computer. • In 1970 cern used to distribute data in several machines dividing mainframe computer into a smaller piece of equipments and cern net was introduced to bridge these machine and travel was reduced. • In 1980 these machines were placed in different countries of US and Europe and internet was introduced to connect these machines. • Due to enormous increase of data in 2000 a cern grid was introduced connecting different smaller computers together to analyze and process the data. • Detector with 150 million sensors are used in LHC where protons collides at a light speed works as a 3D camera where pictures are by a rate of 40 million times per second. The data is now stored in cloud and analyzed using big data techniques.
  • 11. Implementation of Big Data - Cern Proton injection for collision Collision of particles recording data in sensors
  • 12. Other Industries using Big Data • Government Application: – US government invested a lot in the big data applications. Big data analysis played a large role in Barack Obama's successful 2012 re-election campaign. – The Utah Data Center is a data center currently being constructed by the United States National Security Agency. The exact amount of storage space is unknown, but more recent sources claim it will be on the order of a few exabytes. – Big data analysis was, in parts, responsible for the BJP and its allies to win a highly successful Indian General Election 2014. – UK government is utilizing big data to improv weather forecasting & new drug release forecasts. • Manufacturing Industries: • Vast amount of sensory data such as acoustics, vibration, pressure, current, voltage and controller data in addition to historical data construct the big data in manufacturing. The generated big data acts as the input into predictive tools and preventive strategies. • Technology Industries: • Ebay and Amazon are industry leaders for maintaining large amount of user searches and predictive analysis. This helps to identify user needs and provide them with better results. • Retail Industries: • Walmart contains about 2.5 peta bytes of data handling 1 million customer transaction every hour. • Amazon does a transaction of USD 80,000 in an hour. Amazon has worlds three largest databases.
  • 13. Big Data Solutions - Hadoop • Hadoop is an open-source system to reliably store and process lot of information. • Solution of Big Data that handles complexity involved in volume, variety and velocity of data. • It transform the commodity hardware to services to handle peta bytes of data into distributed environments: Pigeon Computing. • Hadoop is redundant , reliable, powerful, batch process centric, distributed.
  • 15. Map Reduce program – Word Count
  • 16. Hadoop Implementation in Real World • Yahoo: – In 2008 Yahoo claimed world’s largest hadoop prodcution application. Yahoo Search Webmap is a hadoop application that runs on Linux with more that 10,000 cores. • Facebook: – In 2010 Facebook claimed that they had the largest Hadoop cluster in the world with 21 PB of storage. On June 13, 2012 they announced the data had grown to 100 PB] On November 8, 2012 they announced the data gathered in the warehouse grows by roughly half a PB per day • As of 2013, Hadoop adoption is widespread. For example, more than half of the Fortune 50 use Hadoop. • The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4 TB of raw image TIFF data (stored in S3) into 11 million finished PDFs in the space of 24 hours at a computation cost of about $240 (not including bandwidth)
  • 17. Distributed System - CAP Theorem
  • 18. Introduction to No SQL • A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases • Types of NoSQL Databases: – Column: Cassandra, HBase – Document: Apache CouchDB, MongoDB – Key-value: CouchDB, Dynamo, Redis – Graph: Neo4J – Multi-model: OrientDB, Alchemy Database, CortexDB
  • 19. High Level Architecture - Cassandra • Ring based replication • Only 1 type of server (cassandra) • All nodes hold data and can answer queries • No Single Point of Failure • Build for HA & Scalability • Multi-DC • Data is found by key (CQL) • Runs on JVM
  • 20. High Level Architecture - Cassandra
  • 21. High Level Architecture - Cassandra • Example: Single-Row Partition • Simple user system • Identified by name (primary key) • One row per partition
  • 22. High Level Architecture - Cassandra • Example: Multiple Rows • Comments on photos • Comments are always selected by photo_id • There are only 4 rows in 2 partitions
  • 23. High Level Architecture - Cassandra • Multiple rows are transposed into a single partition • Partitions vary in size • Old terminology: "wide row" • Cassandra is built for fast writes, so the data model should be denormalized to do as few reads as possible
  • 24. High Level Architecture – MongoDB • Open-source, document-oriented, popular for its agile and scalable approach • Notable features: – JSON/BSON data model with dynamic schema – Auto-sharding for horizontal scalability – Built-in replication with automated failover – Full, flexible index support, including secondary indexes – Rich document-based queries – Aggregation framework and MapReduce – GridFS for large file storage
  • 25. High Level Architecture – MongoDB • Ensures high availability, redundancy and automated failover • Writes go to the primary; reads go to the primary by default and can be served by secondaries when the read preference allows it • Asynchronous replication • In conventional terms, similar to master/slave replication • Members can be configured as: secondary-only / non-voting / hidden / arbiters / delayed
  • 26. When to use: MongoDB • Unstructured data from multiple suppliers • GridFS: stores large binary objects • Spring Data services • Embedding and linking documents • Easy replica set setup on AWS
  • 28. Thank you for your patience. Thank You!
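Slide 19 describes ring-based replication in which every node holds data and can answer queries. The following is a minimal sketch of that idea under a few assumptions. It uses MD5 as the hash function (the older RandomPartitioner used MD5; current Cassandra defaults to Murmur3Partitioner) and assigns one token per node, whereas real clusters usually give each server many virtual nodes. `TokenRing`, `token` and `replicas` are hypothetical names. Replicas are placed by walking the ring clockwise, in the spirit of Cassandra's SimpleStrategy.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 used as a stand-in hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Toy token ring: one token per node; replicas are chosen by walking
    clockwise from the key's token, similar in spirit to SimpleStrategy."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds node count")
        self.rf = replication_factor
        # Sort (token, node) pairs so the ring can be binary-searched.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        """Return the rf nodes responsible for `key`, owner first."""
        # First node whose token is >= the key's token, wrapping to index 0.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:alice"))  # three distinct nodes
```

Every node has the same ring metadata, so any node can compute `replicas()` and act as the coordinator for a request. This is one reason the design has no single point of failure.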
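Slides 21 and 22 contrast two layouts. In the first, each partition holds a single row (users identified by name). In the second, a partition holds several rows (comments grouped by photo). In CQL the difference comes from the primary key. For example, a table declared with `PRIMARY KEY ((photo_id), created_at)` uses `photo_id` as the partition key, and `created_at` becomes a clustering column that orders rows inside each partition. The sketch below imitates that layout with plain Python dictionaries. The `created_at` column, the helper `build_partitions` and the sample rows are all hypothetical.

```python
from collections import defaultdict


def build_partitions(rows, partition_key, clustering_key=None):
    """Group rows by partition key; within a partition, order rows by the
    clustering column (if any), as Cassandra orders rows in a partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    if clustering_key is not None:
        for part in partitions.values():
            part.sort(key=lambda r: r[clustering_key])
    return dict(partitions)


# Slide 21: users identified by name -> one row per partition.
users = [
    {"name": "alice", "email": "alice@example.com"},
    {"name": "bob", "email": "bob@example.com"},
]
users_by_name = build_partitions(users, "name")

# Slide 22: 4 hypothetical comments on 2 photos -> 2 partitions.
comments = [
    {"photo_id": 1, "created_at": 3, "text": "Nice!"},
    {"photo_id": 2, "created_at": 1, "text": "Where is this?"},
    {"photo_id": 1, "created_at": 1, "text": "First!"},
    {"photo_id": 2, "created_at": 2, "text": "Looks great"},
]
comments_by_photo = build_partitions(comments, "photo_id", "created_at")

# Selecting comments by photo_id reads exactly one partition.
print([c["text"] for c in comments_by_photo[1]])  # ['First!', 'Nice!']
```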
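Slide 23 recommends denormalizing the data model so that each query needs as few reads as possible. One common way to do this is to write the same record once per query pattern. The sketch below models two hypothetical tables, `comments_by_photo` and `comments_by_user`, as dictionaries of partitions. It illustrates the idea and is not a prescribed schema. Since Cassandra is built for fast writes, the trade-off is to pay for an extra write so that no read needs a join or a multi-partition scan.

```python
from collections import defaultdict


class CommentStore:
    """Toy denormalized model: each comment is written to two hypothetical
    tables, one partitioned by photo and one partitioned by user."""

    def __init__(self):
        self.comments_by_photo = defaultdict(list)  # photo_id -> rows
        self.comments_by_user = defaultdict(list)   # user -> rows

    def add_comment(self, photo_id, user, text):
        row = {"photo_id": photo_id, "user": user, "text": text}
        # Two cheap writes now instead of a join or scan at read time.
        self.comments_by_photo[photo_id].append(row)
        self.comments_by_user[user].append(row)

    def for_photo(self, photo_id):
        """Single-partition read: all comments on one photo."""
        return list(self.comments_by_photo.get(photo_id, []))

    def by_user(self, user):
        """Single-partition read: all comments written by one user."""
        return list(self.comments_by_user.get(user, []))


store = CommentStore()
store.add_comment(1, "alice", "Nice shot")
store.add_comment(2, "alice", "Love the colors")
store.add_comment(1, "bob", "Where was this taken?")
print(len(store.for_photo(1)), len(store.by_user("alice")))  # 2 2
```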
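Slide 24 lists a dynamic schema and rich document-based queries among MongoDB's features. The toy matcher below shows what that means in practice. Documents in one collection can have different shapes, and a query can filter on nested fields through dotted paths and comparison operators. The operator names `$gt`, `$gte`, `$lt` and `$in` and the dotted-path syntax come from MongoDB's query language. `find`, `matches` and `get_path`, however, are hypothetical helpers written for illustration. They support only this small subset and skip MongoDB's rules for arrays, missing fields and comparisons across types.

```python
# Toy subset of MongoDB-style query operators (comparison only).
OPS = {
    "$gt": lambda value, arg: value > arg,
    "$gte": lambda value, arg: value >= arg,
    "$lt": lambda value, arg: value < arg,
    "$in": lambda value, arg: value in arg,
}


def get_path(doc, path):
    """Follow a dotted path like 'address.city'; return (value, found)."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None, False
        doc = doc[part]
    return doc, True


def matches(doc, query):
    """True if `doc` satisfies every field condition in `query`."""
    for path, cond in query.items():
        value, found = get_path(doc, path)
        if not found:
            return False  # toy rule: a missing field never matches
        if isinstance(cond, dict):
            # Operator form, e.g. {"$gt": 30}
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:
            return False  # plain equality
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


# Documents in one collection need not share a schema.
users = [
    {"name": "alice", "age": 34, "address": {"city": "Chicago"}},
    {"name": "bob", "age": 27, "tags": ["iot"]},
    {"name": "carol", "age": 41, "address": {"city": "Pune"}},
]
print([d["name"] for d in find(users, {"age": {"$gt": 30}, "address.city": "Chicago"})])
# ['alice']
```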
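Slide 25 describes MongoDB replication: writes go to the primary and reach the secondaries asynchronously. MongoDB implements this with the oplog, an ordered log of operations that secondaries replay. The toy model below shows one consequence of asynchronous replication. A read served by a secondary can return stale data until that secondary has caught up. The class and method names are hypothetical, and elections, write concerns and failover are deliberately left out.

```python
class Member:
    """One replica set member holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # how many oplog entries this member has applied


class ReplicaSet:
    """Toy replica set with asynchronous, oplog-based replication."""

    def __init__(self, names):
        self.primary = Member(names[0])
        self.secondaries = [Member(n) for n in names[1:]]
        self.oplog = []  # ordered (key, value) write operations

    def write(self, key, value):
        # Writes always go to the primary and are logged in the oplog.
        self.oplog.append((key, value))
        self.primary.data[key] = value
        self.primary.applied = len(self.oplog)

    def replicate(self, member):
        # A secondary catches up by replaying entries it has not applied yet.
        for key, value in self.oplog[member.applied:]:
            member.data[key] = value
        member.applied = len(self.oplog)

    def read(self, key, member=None):
        # Default: read from the primary. Passing a secondary models a
        # read preference that accepts possibly stale data.
        source = member if member is not None else self.primary
        return source.data.get(key)


rs = ReplicaSet(["primary", "secondary-1", "secondary-2"])
s1 = rs.secondaries[0]
rs.write("status", "active")
print(rs.read("status"), rs.read("status", s1))  # active None
rs.replicate(s1)
print(rs.read("status", s1))  # active
```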
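Slides 24 and 26 mention GridFS for storing large binary objects. GridFS splits a file into chunks of 255 KiB by default. Each chunk is stored as a separate document in a chunks collection, and one metadata document goes into a files collection (by default `fs.chunks` and `fs.files`). The sketch below imitates that layout with plain dictionaries to show the idea. In a real application the driver's GridFS support (for example PyMongo's `gridfs` module) does this work, and real chunk documents also carry their own `_id`. The helper names are hypothetical.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size: 255 KiB


def to_gridfs_docs(file_id, filename, data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Split `data` into one files-collection document and ordered
    chunks-collection documents (layout simplified; no chunk _id)."""
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    chunks = [
        {"files_id": file_id, "n": i, "data": data[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def reassemble(chunks):
    """Rebuild the original bytes by concatenating chunks in `n` order."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


payload = bytes(range(256)) * 2400  # 600 KiB of sample data
files_doc, chunks = to_gridfs_docs("file-1", "video.bin", payload)
print(len(chunks), [len(c["data"]) // 1024 for c in chunks])  # 3 [255, 255, 90]
```

Storing the sequence number `n` on every chunk is what allows a file to be read back in order, or read from an offset, without loading the whole object at once.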

Editor's Notes

  1. 4. Example of streaming data: consider an application that searches for certain text in the emails we send. Emails can be treated as a stream of data. Algorithms identify text that matches specific patterns and send an alert when something is found. Nowadays many government agencies work on this kind of system.
  2. 5. The image shows how data storage evolved. Archaeological findings show that around 2000 BC the Phaistos Disc, a clay disc with symbols stamped into it, was used to record information and preserve it for a long period of time. Later, people wrote things in pyramids, followed by stone tablets.
  3. 6. Necessity is the mother of invention. The human brain always wants to know more, and to know more, we need to process more. The Information Era gave us the data, and to process this data we created Big Data.
  4. 7. The characteristics of Big Data consist of the 3 Vs: Volume, Variety and Velocity. Volume represents the bulk and size of the data. The definition of Big Data changes every decade: it used to be hard to store kilobytes of data, but now we store huge amounts of data on a smartphone. The image shows the amount of data being stored in different parts of the world. Next comes Variety, the categorization of Big Data. Categorizing data makes it easy for data analysts to group interdependent data and get some advantage out of it.
  5. 8. Velocity represents the speed at which this data is generated. The image shows how fast we generate this data. It is worth asking what happens to this enormous amount of data we generate, and this leads to the 4th V.
  6. 9. Value: what is the value of analyzing the data? The image shows how various industries use and analyze this data. Apart from the monetary benefits, many other fields such as machine learning, scientific experiments and medicine benefit from Big Data.
  7. 10. Arthur Samuel wrote a computer program to play checkers (in 1962 it won a game against a human checkers player). The program lost at first, but Samuel later wrote a subprogram that analyzed the board and computed winning moves. Once the subprogram was linked with the checkers program, the computer started to win. This was one of the first instances of artificial intelligence in which data generated by the computer was recorded and used to plan its moves.
  8. 11.
  10. 12. Some car manufacturers gather data from sensors in the driver's seat. They identify the pattern that appears when a driver feels sleepy and alert the driver by vibrating the steering wheel. The same technology is being used to detect theft based on seating patterns.
  10. 13.
  11. 14. MapReduce is the processing part: it runs the computation and returns the results. The second part is HDFS (Hadoop Distributed File System), which stores all the data as files and directories and is highly scalable and distributed.
  12. 15. This is the classic MapReduce program for word count.
  13. 16.
  14. 17. In theoretical computer science, the CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: Consistency (all nodes see the same data at the same time) Availability (a guarantee that every request receives a response about whether it succeeded or failed) Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
  15. 18. Column: a column of a distributed data store is a NoSQL object of the lowest level in a keyspace. It is a tuple (a key-value pair) consisting of three elements: a unique name, a value and a timestamp. Document: a trivial example would be scanning paper documents, extracting the title, author and date from them (either by OCR or by having a human locate and enter them), and storing each document in a 4-column relational database, the columns being author, title, date, and a blob full of page images. Key-value: an associative array (also called a map, symbol table or dictionary) is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears just once in the collection. Graph: a graph database is a database that uses graph structures for semantic queries, with nodes, edges and properties to represent and store data.
  16. 19.
  17. 20.
  18. 21.
  19. 22.
  20. 23.
  21. 24.
  22. 25.
  23. 26.
  24. 27. That is not to say there are no disadvantages: difficulty finding the right talent, difficulty finding the proper use case, the impact on white-collar jobs due to the high demand for data scientists, and the challenge of analyzing Big Data to find the good data within it.
  25. 28.
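Note 1 (slide 4) describes scanning a stream of emails for text that matches specific patterns and raising an alert when a match is found. The following is a minimal Python sketch of that idea. It assumes each email arrives as a `(message_id, body)` pair and that the patterns are regular expressions. The function name `scan_stream`, the `Alert` type and the sample patterns are hypothetical. `scan_stream` is a generator, so it inspects one message at a time and never needs the whole stream in memory.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Alert(NamedTuple):
    message_id: str
    pattern: str
    excerpt: str  # the text that matched


def scan_stream(messages: Iterable[tuple[str, str]], patterns: list[str]) -> Iterator[Alert]:
    """Yield an Alert for every (message, pattern) match, one message at a time."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    for message_id, body in messages:
        for rx in compiled:
            match = rx.search(body)
            if match:
                yield Alert(message_id, rx.pattern, match.group(0))


emails = [
    ("m1", "Meeting moved to 5 pm"),
    ("m2", "Your invoice 4521 is overdue"),
    ("m3", "Please confirm the wire TRANSFER today"),
]
for alert in scan_stream(iter(emails), [r"invoice \d+", r"wire transfer"]):
    print(alert.message_id, alert.excerpt)
```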
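Note 11 (slide 14) says HDFS stores data in a highly scalable, distributed way. In practice, HDFS splits every file into fixed-size blocks and stores each block on several DataNodes. The arithmetic below assumes the Hadoop 2.x+ defaults: a `dfs.blocksize` of 128 MB and a `dfs.replication` of 3. Clusters often configure other values, and the function names are hypothetical.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed default dfs.blocksize (Hadoop 2.x+)
REPLICATION = 3        # assumed default dfs.replication


def hdfs_blocks(file_size, block_size=BLOCK_SIZE):
    """Sizes of the blocks a file of `file_size` bytes is split into.
    The last block holds only the remainder; it is not padded."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def raw_storage(file_size, replication=REPLICATION):
    """Total bytes stored across the cluster: each block is kept
    `replication` times, so raw usage is file size x replication."""
    return file_size * replication


sizes = hdfs_blocks(300 * MB)
print([s // MB for s in sizes])     # [128, 128, 44]
print(raw_storage(300 * MB) // MB)  # 900
```

Splitting files into blocks lets MapReduce schedule one map task per block on a node that already holds that block. Replication lets the cluster survive the loss of a DataNode.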
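Note 12 (slide 15) refers to the classic MapReduce word-count program. Hadoop's own version is written in Java with `Mapper` and `Reducer` classes. The sketch below is not Hadoop code. It is a single-process Python simulation that makes the three phases explicit: map emits `(word, 1)` pairs, the shuffle groups the pairs by word, and reduce sums each group. Splitting words with a simple regular expression is an assumption made for illustration.

```python
import re
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort: group emitted values by key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(word, counts):
    """Reducer: sum the counts emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
# {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

In Hadoop the map calls run in parallel on the nodes that hold each HDFS block, and the framework performs the shuffle over the network. Here all three phases run in one process.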
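Note 14 (slide 17) states the CAP theorem. The toy model below makes the trade-off concrete for two replicas of a single value. While the network is healthy, both replicas are updated. During a partition, the system must either reject writes to stay consistent (CP) or accept them and let the replicas diverge (AP). The class and its `mode` parameter are hypothetical. Real systems make this choice in more nuanced and often tunable ways.

```python
class TwoReplicaStore:
    """Two replicas ("A" and "B") of one value, illustrating CAP."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"A": None, "B": None}
        self.partitioned = False  # True = the replicas cannot talk

    def write(self, node, value):
        """Write through `node`; returns False if the write is rejected."""
        if not self.partitioned:
            # Healthy network: update both replicas, so all reads agree.
            self.values["A"] = self.values["B"] = value
            return True
        if self.mode == "CP":
            # Cannot reach the other replica: give up availability.
            return False
        # AP: stay available; the other replica is now stale.
        self.values[node] = value
        return True

    def read(self, node):
        return self.values[node]


store = TwoReplicaStore("AP")
store.write("A", 1)
store.partitioned = True
store.write("A", 2)
print(store.read("A"), store.read("B"))  # 2 1
```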
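Note 15 (slide 18) defines a column as a (name, value, timestamp) tuple. The timestamp is what lets a wide-column store such as Cassandra reconcile replicas that disagree: the version with the newest timestamp wins (last-write-wins). The sketch below models that rule. Breaking ties between equal timestamps by comparing values is an assumption made here only to keep the result deterministic, and the names are hypothetical.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """Lowest-level unit in a wide-column store: (name, value, timestamp)."""
    name: str
    value: str
    timestamp: int  # e.g. microseconds since the epoch


def reconcile(*versions: Column) -> Column:
    """Last-write-wins: keep the version with the newest timestamp.
    Equal timestamps fall back to the larger value (an assumption here,
    used only to make the choice deterministic)."""
    return max(versions, key=lambda c: (c.timestamp, c.value))


old = Column("email", "old@example.com", 1_000)
new = Column("email", "new@example.com", 2_000)
print(reconcile(new, old).value)  # new@example.com
```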