Big Data
Presented by,
Mohamedsalman S
(BIT CSE)
Contents
• Introduction.
• Components.
• Methods.
• What is Hadoop?
• Hadoop Offers.
• MapReduce.
• What is HPCC?
• HPCC Components.
• Big Data Samples.
• Difference between HPCC and Hadoop.
• Privacy and Security Issues.
• Knowledge Discovery.
• Conclusion.
Introduction
• Big data and its analysis are at the center of modern science and business.
• These data are generated from online transactions, emails, videos, audio, images, etc.
• Stored in databases, these data grow massively and become difficult to capture, store, manage, and share.
• The volume of data was predicted to double every two years, reaching about 8 zettabytes by 2015.
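The doubling prediction can be written as a simple exponential growth formula. This is a sketch that assumes steady doubling every two years from a baseline volume $V_0$ in year $t_0$ (both symbols are introduced here only for illustration):

```latex
V(t) = V_0 \cdot 2^{(t - t_0)/2}
\qquad\text{e.g.}\qquad
\frac{V(t_0 + 10)}{V_0} = 2^{10/2} = 2^{5} = 32
```

Under this assumption, ten years of growth multiplies the stored volume by a factor of 32.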
Components
• Variety.
Variety makes big data really big.
Big data comes from a great variety of sources.
It generally falls into three types: structured, unstructured, and semi-structured.
Structured data enters a data warehouse already tagged and is easily sorted.
Unstructured data is random and difficult to analyze.
Semi-structured data does not conform to fixed fields but contains tags to separate data elements.
• Volume.
The volume, or size, of data is now measured in terabytes, petabytes, and even zettabytes.
• Velocity.
The flow of data is massive and continuous.
Big data should be used as it streams into the organization in order to maximize its value.
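To make the three types of variety concrete, here is a minimal Python sketch. The sample records, field names, and variable names are invented for illustration; it only shows how each type is typically accessed with the standard library (`csv`, `json`, and plain string handling), not how a real big data pipeline would ingest them.

```python
import csv
import io
import json

# Structured: fixed, typed columns, like rows headed for a data warehouse table.
structured_csv = "id,name,amount\n3,carol,7.5\n1,alice,12.0\n2,bob,3.25\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
# Every row has the same fields, so sorting on one of them is trivial.
sorted_rows = sorted(rows, key=lambda r: float(r["amount"]))

# Semi-structured: no fixed schema, but tags (JSON keys) separate the elements.
semi_structured = '[{"id": 1, "tags": ["video"]}, {"id": 2, "email": {"to": "x@example.com"}}]'
records = json.loads(semi_structured)
# Fields differ per record, so the code must check which tags are present.
emails = [r["email"]["to"] for r in records if "email" in r]

# Unstructured: free text with no fields; any structure has to be inferred.
unstructured = "Customer called about a late order and asked for a refund."
words = unstructured.lower().rstrip(".").split()
```

The cost of analysis rises from top to bottom: the structured rows sort with one key function, the semi-structured records need per-record checks, and the unstructured text only yields a bag of words without further processing.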
Methods
• Organizations face large amounts of new data arriving in many different forms.
• Big data has generated a whole new industry of supporting architectures, such as MapReduce.
• MapReduce is a programming framework for distributed computing.
• It was introduced by Google and uses a divide-and-conquer approach.
• MapReduce can be divided into two stages: the Map step and the Reduce step.
• Platforms: HPCC and Hadoop.
What is Hadoop?
• Hadoop is an open-source software framework.
• It is a Java-based framework.
• Essentially, it accomplishes two tasks: massive data storage and faster processing.
• It is not a replacement for a database, a data warehouse, or ETL.
Hadoop Offers
• HDFS (Hadoop Distributed File System) - responsible for storing data on the cluster.
• MapReduce - parallel processing of the stored data.
• HBase - distributed database for random read/write access.
• Pig - high-level data processing system.
• Hive - data warehouse application.
• Sqoop - transfers data between relational databases and Hadoop.
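The way HDFS stores data on the cluster can be sketched as splitting each file into fixed-size blocks and keeping several copies of every block on different machines. The Python below is a simplified model, assuming the common defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). The function `plan_blocks` and the node names are hypothetical, and the round-robin placement stands in for HDFS's real rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # common HDFS default block size (dfs.blocksize)
REPLICATION = 3      # common HDFS default replication factor (dfs.replication)


def plan_blocks(file_size_mb, datanodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to distinct nodes.

    Hypothetical helper: simplified round-robin placement, not HDFS's rack-aware policy.
    Returns a list of (block_index, block_size_mb, [node, ...]) tuples.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    plan = []
    for b in range(num_blocks):
        # The last block may be smaller than block_size_mb.
        size = min(block_size_mb, file_size_mb - b * block_size_mb)
        # Consecutive nodes (mod cluster size) give `replication` distinct nodes.
        nodes = [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        plan.append((b, size, nodes))
    return plan


plan = plan_blocks(300, ["dn1", "dn2", "dn3", "dn4"])
```

With four nodes, a 300 MB file becomes two full 128 MB blocks plus a 44 MB final block, and no block keeps two replicas on the same node, so losing any single node still leaves copies of every block.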
MapReduce
• MapReduce is a programming framework for distributed computing.
• It was introduced by Google and uses a divide-and-conquer approach.
• MapReduce can be divided into two stages:
Map step.
Reduce step.
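The two stages can be illustrated with the classic word-count example. This single-process Python sketch is not Hadoop's API; the function names are hypothetical. It simulates on one machine what the framework distributes across a cluster: the Map step emits `(word, 1)` pairs, a shuffle groups them by key, and the Reduce step sums each group.

```python
from collections import defaultdict
from itertools import chain


def map_step(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between the stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document is mapped independently (on a cluster, in parallel).
    intermediate = chain.from_iterable(map_step(doc) for doc in documents)
    grouped = shuffle(intermediate)
    # Each key is reduced independently (also parallelizable).
    return dict(reduce_step(k, v) for k, v in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
# counts == {"big": 2, "data": 2, "is": 2, "everywhere": 1}
```

Because each call to `map_step` and each call to `reduce_step` depends only on its own input, both stages can be split across machines, which is the divide-and-conquer idea behind MapReduce.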
[Figure: MapReduce]
What is HPCC?
• HPCC is also known as DAS (Data Analytics Supercomputer).
• HPCC Systems is a distributed, data-intensive, open-source computing platform that provides big data workflow management services.
• Unlike in Hadoop, the data model in HPCC is defined by the user.
• The HPCC platform does not require third-party tools such as Greenplum, Cassandra, an RDBMS, or Oozie.
HPCC Components
• HPCC Data Refinery (Thor)
A massively parallel ETL engine that enables data integration and provides batch-oriented data manipulation.
• HPCC Data Delivery Engine (Roxie)
High-throughput, ultra-fast, low-latency data delivery.
• Enterprise Control Language (ECL)
A simple programming language optimized for big data operations and query transactions.
Big Data Samples
• Biological science.
• Life sciences.
• Medical records.
• Scientific research.
• Mobile phones.
• Government.
Difference between HPCC and Hadoop
[Figure: comparison of HPCC and Hadoop]
Privacy and Security Issues
• Big data stores must be properly access-controlled.
• To ensure authentication, a cryptographically secure communication framework has to be implemented.
• Data must be controlled as specified by regulations, for example by imposing storage periods.
• Organizations have to consider the legal ramifications of storing data.
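One building block for authenticated communication is a message authentication code. The Python sketch below uses the standard library's `hmac` module with SHA-256 so that a receiver holding the same secret key can check that a request came from a key holder and was not altered in transit. The key handling and the request string are hypothetical; HMAC alone does not encrypt anything, so in practice it would be combined with a transport protocol such as TLS and a proper key-management system.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared key; a real deployment would obtain it from key management.
SHARED_KEY = secrets.token_bytes(32)


def sign(message: bytes, key: bytes = SHARED_KEY) -> str:
    """Compute an HMAC-SHA256 tag that only a holder of the key can produce."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes = SHARED_KEY) -> bool:
    """Compare tags in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(sign(message, key), tag)


request = b"GET /warehouse/patients?id=42"
tag = sign(request)
```

Any change to the request, even a single character, produces a different tag, so `verify` rejects tampered messages as well as messages signed with a different key.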
Knowledge Discovery
• A set of operations designed to extract information from complicated data sets.
• Removing noise, handling missing data fields, and calculating time information.
• Mapping the purposes of the analysis to a particular data mining method.
• Choosing the data mining algorithm and method used to search for data patterns.
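The preprocessing operations in this list (removing noise, handling missing fields, calculating time information) can be sketched in a few lines of Python. The function name, the `(timestamp, value)` record format, the plausibility threshold used to define noise, and the choice of median imputation are all assumptions for illustration; real knowledge discovery work chooses these per data set.

```python
from statistics import median


def preprocess(readings, max_value):
    """Clean a series of readings before mining (hypothetical helper).

    readings: list of (timestamp_seconds, value) pairs; value may be None.
    max_value: assumed plausibility threshold; values outside [0, max_value] count as noise.
    """
    # Remove noise: drop readings outside the plausible range (keep missing ones for now).
    kept = [(t, v) for t, v in readings if v is None or 0 <= v <= max_value]
    # Handle missing fields: fill gaps with the median of the observed values.
    observed = [v for _, v in kept if v is not None]
    fill = median(observed) if observed else 0.0
    filled = [(t, fill if v is None else v) for t, v in kept]
    # Calculate time information: seconds elapsed since the previous kept reading.
    result = []
    for i, (t, v) in enumerate(filled):
        gap = t - filled[i - 1][0] if i > 0 else 0
        result.append({"time": t, "value": v, "gap": gap})
    return result


data = [(0, 10.0), (60, None), (120, 9999.0), (180, 12.0)]
clean = preprocess(data, max_value=100.0)
```

Here the implausible reading at `t=120` is dropped as noise, the missing value at `t=60` is filled with the median of the remaining observations, and each record gains a `gap` field that later mining steps can use as time information.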
Conclusion
• Big data is difficult to manage.
• Data must be kept secure.
• It is used by a growing number of organizations.
