Data processing, Preprocessing
Cloud computing
Author: Sergey Borkhonov
MNU, Computer Science
What is Big Data?
• Big data is an all-encompassing term for any
collection of data sets so large and complex
that it becomes difficult to process using on-
hand data management tools or traditional data
processing applications.
• Big Data refers to extremely large amounts of
multi-structured data that have typically been
cost-prohibitive to store and analyze.
In the simplest terms, Big Data can be broken
down into structured and unstructured data,
with semi-structured data falling in between.
• Structured – data that follows a predefined data model
• Spreadsheets and Oracle relational databases
• Unstructured – data that has no predefined data model
or is not organized in a predefined manner
• Video, audio, images, metadata, etc.
• Semi-structured – structured data embedded
with some unstructured data
• Email, text messaging
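To make the three categories concrete, here is a minimal Python sketch. The sample CSV text, the email message, and the byte string are invented for illustration and are not part of the original slides.

```python
import csv
import io
from email import message_from_string

# Structured: every row follows the same predefined columns.
csv_text = "id,name,amount\n1,Alice,10.5\n2,Bob,7.0\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: the headers are well-defined fields,
# but the body is free-form text with no schema.
raw_email = (
    "From: alice@example.com\n"
    "Subject: Quarterly report\n"
    "\n"
    "Hi Bob, the numbers look good this quarter.\n"
)
msg = message_from_string(raw_email)
headers = {"from": msg["From"], "subject": msg["Subject"]}
body = msg.get_payload()

# Unstructured: raw bytes (here, the first bytes of a made-up image file)
# carry no field names or schema that a program can query directly.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])

print(rows[1]["name"])      # a named column can be addressed directly
print(headers["subject"])   # so can an email header
print(len(image_bytes))     # for raw bytes, only size is known up front
```

The structured rows and the email headers can be queried by name, while the email body and the image bytes need further processing (text analysis, image decoding) before they yield anything usable.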
• Where does big data come from?
• A simple answer is ‘everywhere’.
• Sources that were once ignored because of technical limitations
are treated as gold mines today.
• Big data may come from web logs, RFID tags, GPS systems,
sensor networks, social networks, IoT devices, search indices, call
detail records, science experiments such as nuclear physics, medical
records, military surveillance, photo archives, video archives,
e-commerce activity, etc.
• Since the advent of data warehouses in the early 1990s, companies
have been storing relevant data in large volumes.
• Many believe that big data is characterized not only by the amount of data
but also by its variety, velocity, veracity, variability and value
proposition.
• Cover varying types of data sources
Data can be streaming, batch, structured, unstructured, or
semi-structured, depending on the information type, where
it comes from and its primary use. A Big Data platform must
be able to accommodate all of these types of data on a very
large scale.
• Analytics
A Big Data platform must provide mechanisms for ad-hoc
queries, data discovery and experimentation on large
data sets, so that events and data types can be correlated
into an understanding of the data that is useful and
addresses business needs.
• Big data is typically defined by three “V”s:
– Volume,
– Variety and
– Velocity.
• In addition to these three, leading big data solution
providers added other Vs such as
– Veracity (IBM)
– Variability (SAS)
– Value Proposition
Big Data Technologies
• Although a number of different
technologies are useful in analyzing Big
Data, most of them share some common
characteristics.
• Three Big Data technologies
stand out from the rest:
– MapReduce
– Hadoop
– NoSQL
• MapReduce is a technique popularized by Google
that distributes the processing of very large
multi-structured data files across a large cluster of
machines.
• High performance is achieved by breaking the
processing into small units of work that can be run in
parallel across the thousands of machines in a cluster.
• MapReduce helps organizations process and
analyze large volumes of multi-structured data, for
example in graph analysis, text analysis, machine
learning and data transformation.
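The map, shuffle and reduce steps behind this technique can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, not Google's or Hadoop's implementation. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text chunks are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # In a real cluster each chunk would be mapped on a different machine;
    # here the chunks are simply processed one after another.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


chunks = ["big data is big", "data is everywhere"]
counts = word_count(chunks)
print(counts)
```

Because each call to `map_phase` depends only on its own chunk, and each call to `reduce_phase` depends only on its own key, both phases can be spread across many machines without coordination. That independence is what makes the small units of work parallelizable.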
• Hadoop is an open source framework for processing, storing
and analyzing massive amounts of distributed,
unstructured data.
• Hadoop was inspired by Google's MapReduce and was
designed to handle petabytes and exabytes of data.
• Rather than working through a huge block of data on
a single machine, Hadoop breaks up Big Data into
multiple parts so that each part can be processed and
analyzed at the same time.
• Sources of data may include log files, social media
feeds and internal data sources.
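The idea of splitting data into parts and processing them at the same time can be imitated on one machine with Python's standard library. The sketch below does not use any Hadoop API. The helper names, the block size, and the sample log lines are hypothetical, and a thread pool stands in for the separate machines of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """Break the full data set into fixed-size parts."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def count_errors(block):
    """Work done independently on one part: count lines marked ERROR."""
    return sum(1 for line in block if "ERROR" in line)


def parallel_error_count(records, block_size=2, workers=4):
    blocks = split_into_blocks(records, block_size)
    # Each block is handed to a worker; partial results are combined at the end.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_errors, blocks))
    return sum(partial_counts)


# Made-up log lines standing in for a large log file.
log_lines = [
    "2024-01-01 INFO service started",
    "2024-01-01 ERROR disk full",
    "2024-01-02 INFO request served",
    "2024-01-02 ERROR timeout",
    "2024-01-03 INFO service stopped",
]
print(parallel_error_count(log_lines))
```

The final sum is only valid because each block can be counted without looking at any other block. Work that needs to see data across blocks requires an extra grouping step, such as the shuffle phase in MapReduce.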
