Hassnain Ali 15081598-066
Nadeem Tahir 15081598-106
What is Big Data?
“Big data is the data characterized by 4 key
attributes: volume, variety, velocity and
value.”
-- Oracle
Let’s look at
Big Data
in a different way.
Byte : one grain of rice
Kilobyte : a cup of rice
Megabyte : 8 bags of rice
Gigabyte : 3 semi trucks
Terabyte : 2 container ships
Petabyte : blankets Manhattan
Exabyte : blankets the West Coast states
Zettabyte : fills the Pacific Ocean
Yottabyte : an Earth-sized rice ball!
Labels added alongside the scale as it grows, in order: Hobbyist, Desktop, Internet, Big Data, The Future?
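Each unit in the scale above is 1,000 times the previous one (decimal SI prefixes). As a rough sketch, a raw byte count can be mapped onto these unit names as follows; the function name `describe_size` is made up for illustration and is not part of the original slides.

```python
# Decimal (SI) unit names, in the same order as the rice analogy.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def describe_size(num_bytes: int) -> str:
    """Return a size such as '2.50 terabytes' for a raw byte count."""
    if num_bytes < 0:
        raise ValueError("size cannot be negative")
    index = 0
    # Pick the largest unit that is not bigger than the value.
    # Integer comparisons avoid floating-point drift on huge numbers.
    while index < len(UNITS) - 1 and num_bytes >= 1000 ** (index + 1):
        index += 1
    value = num_bytes / 1000 ** index
    return f"{value:.2f} {UNITS[index]}s"


if __name__ == "__main__":
    for exponent in (0, 3, 9, 15, 24):
        print(describe_size(10 ** exponent))
```

Anything beyond a yottabyte is simply reported as a large number of yottabytes, since the scale in the slides stops there.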
Big Data is not about the size of the data,
it’s about the value within the data.
We are generating huge
amounts of data.
Data with a
lot of information.
… and a lot of noise.
The ability to hear the signal
from the noise is the key…
to unlocking the human conversation
that is taking place around us.
Did it work?
Most people don’t know
what to do with all the data
that they already have…
Get Big
by starting
small
Big Data isn’t big, if you know
how to use it.
Storing Big Data
• Data plays an increasingly important role in
business and science.
• Storing, searching, sharing, analysing and visualising big
data has become a challenge.
• Storage in particular is often overlooked as an
issue. A single MySQL database is sometimes not
enough.
• Hadoop offers an out of the box distributed filesystem for
storing data files. However, the challenge appears when
someone needs DB capabilities, frequent updates or real
Problems Nowadays
• Traditional relational databases can reach their performance
limits.
• Data keeps arriving at high velocity, in high volumes, and in high
variety.
• Common practices to increase performance fail after a while:
buying a faster server, adding more RAM, using materialised
views, fine-tuning queries...
• Furthermore, “alter table” doesn’t really work with lots of
data. Backups and data availability become an issue.
NoSQL
• The term is too broad and too new to define precisely; the traits below are typical rather than universal.
• No fixed schema
• No joins between tables
• No common query language (like SQL)
• No full ACID (atomicity, consistency, isolation, durability) guarantees
• In exchange, you gain horizontal scalability and high performance.
Also, many NoSQL systems are Map/Reduce ready and/or integrate with
Hadoop.
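To make the Map/Reduce idea mentioned above concrete, here is a minimal single-process sketch of the classic word count. It only illustrates the map, shuffle, and reduce phases; it is not how Hadoop or any particular NoSQL system implements them, and the function names are chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: List[str]) -> Dict[str, int]:
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "big value"]))
```

In a real cluster the map and reduce calls would run on different machines, which is where the horizontal scalability comes from.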
MongoDB Example
A document is represented in a JSON-like format:
{
  "_id" : 12345678,
  "Link" : "http://news.scotsman.com/abc.html",
  "Title" : "Blah blah blah",
  "Content" : "More blah blah",
  "OutletID" : 14,
  "Date" : ISODate("2011-11-17T20:33:15.097Z"),
  "Hash" : 550973592,
  "Tags" : [ "International", "News", "Scotland" ],
  ...
}
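As a rough illustration, the same document can be modelled as a plain Python dictionary and serialised with the standard library. The key names `_id` and `Hash` are my reading of the slide, and the helpers `to_json` and `has_tag` are hypothetical; no MongoDB driver is involved here.

```python
import json
from datetime import datetime, timezone

# The slide's document as a Python dict (nested lists and mixed types allowed).
article = {
    "_id": 12345678,
    "Link": "http://news.scotsman.com/abc.html",
    "Title": "Blah blah blah",
    "Content": "More blah blah",
    "OutletID": 14,
    "Date": datetime(2011, 11, 17, 20, 33, 15, 97000, tzinfo=timezone.utc),
    "Hash": 550973592,
    "Tags": ["International", "News", "Scotland"],
}


def to_json(document: dict) -> str:
    """Serialise a document; datetimes are written as ISO 8601 strings."""
    return json.dumps(document, default=lambda value: value.isoformat(), indent=2)


def has_tag(document: dict, tag: str) -> bool:
    """Documents need not share a schema, so a missing field is tolerated."""
    return tag in document.get("Tags", [])


if __name__ == "__main__":
    print(to_json(article))
    print(has_tag(article, "Scotland"), has_tag({"_id": 1}, "Scotland"))
```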
MongoDB - Replication
Master/Slave
Single Server
MongoDB - Sharding
If a new shard is added, data is balanced automatically.
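The idea of automatic balancing can be sketched with a toy model in which data is split into chunks and chunks are moved from the fullest shard to the emptiest one. This is an assumption-laden illustration, not MongoDB's actual balancer; the `rebalance` function and the shard names are invented for the example.

```python
from collections import Counter
from typing import Dict, List


def rebalance(chunk_owner: Dict[int, str], shards: List[str]) -> Dict[int, str]:
    """Move chunks until no shard holds more than one chunk above another.

    chunk_owner maps a chunk id to the shard that currently stores it;
    every owner must appear in shards.
    """
    owner = dict(chunk_owner)  # work on a copy
    while True:
        counts = Counter({shard: 0 for shard in shards})
        counts.update(owner.values())
        fullest = max(shards, key=lambda shard: counts[shard])
        emptiest = min(shards, key=lambda shard: counts[shard])
        if counts[fullest] - counts[emptiest] <= 1:
            return owner
        # Migrate the lowest-numbered chunk from the fullest shard.
        chunk = next(c for c, s in sorted(owner.items()) if s == fullest)
        owner[chunk] = emptiest


if __name__ == "__main__":
    before = {i: "shard-a" if i < 4 else "shard-b" for i in range(8)}
    after = rebalance(before, ["shard-a", "shard-b", "shard-c"])
    print(Counter(after.values()))
```

Adding `shard-c` to the list is all it takes for chunks to start flowing to it, which mirrors the behaviour the slide describes.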
Data Processing
• Without data processing, organizations cannot make use of the
massive amounts of data that can help them gain a competitive
edge and give them insight into sales, marketing strategies and
consumer needs. It is imperative that companies large and small
understand the necessity of data processing.
• Data processing occurs when data is collected and translated
into usable information.
The Six Stages of Data Processing
• Data Collection
• Data Preparation
• Data Input
• Processing
• Data Output/Interpretation
• Data Storage
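The six stages listed above can be sketched as a tiny pipeline with one function per stage. The sample readings and function names are hypothetical, and "storage" is reduced to producing a JSON string rather than writing to a real database.

```python
import json
import statistics
from typing import Dict, List


def collect() -> List[str]:
    """1. Collection: gather raw values (hard-coded sample data here)."""
    return ["12.5", " 7 ", "n/a", "20.5", ""]


def prepare(raw: List[str]) -> List[str]:
    """2. Preparation: trim whitespace and drop entries that are not numbers."""
    cleaned = []
    for item in raw:
        text = item.strip()
        try:
            float(text)
        except ValueError:
            continue
        cleaned.append(text)
    return cleaned


def load(prepared: List[str]) -> List[float]:
    """3. Input: convert the cleaned text into machine-usable numbers."""
    return [float(text) for text in prepared]


def process(values: List[float]) -> Dict[str, float]:
    """4. Processing: compute summary figures."""
    return {"count": len(values), "total": sum(values),
            "mean": statistics.mean(values)}


def interpret(summary: Dict[str, float]) -> str:
    """5. Output/interpretation: present the result in readable form."""
    return f"{summary['count']} valid readings, mean {summary['mean']:.2f}"


def store(summary: Dict[str, float]) -> str:
    """6. Storage: serialise the result (a stand-in for a real data store)."""
    return json.dumps(summary, sort_keys=True)


if __name__ == "__main__":
    summary = process(load(prepare(collect())))
    print(interpret(summary))
    print(store(summary))
```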
The Future of Data Processing
The future of data processing lies in the cloud. Cloud technology
builds on the convenience of current electronic data processing
methods and accelerates their speed and effectiveness. Faster,
higher-quality data means more data for each organization to
utilize and more valuable insights to extract.
Big data tools:
1. Apache Hadoop
2. Microsoft HDInsight
3. NoSQL
4. Hive
5. Sqoop
6. PolyBase
7. Big data in Excel
8. Presto
Big Data Techniques
Quantitative Analysis
Quantitative analysis is a data analysis technique that focuses on quantifying
the patterns and correlations found in the data. Based on statistical practices,
this technique involves analyzing a large number of observations from a dataset.
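One common way to quantify a correlation is the Pearson coefficient, which ranges from -1 to 1. The sketch below computes it by hand for two equally long lists; it is one possible example of quantitative analysis, not a method prescribed by the slides.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / math.sqrt(spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical observations: ad spend versus units sold.
    print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```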
Qualitative Analysis
Qualitative analysis is a data analysis technique that focuses
on describing various data qualities using words. It involves
analyzing a smaller sample in greater depth compared to
quantitative data analysis. These analysis results cannot be
generalized to an entire dataset due to the small sample size.
DATA MINING
Data mining, also known as data discovery, is a specialized form of
data analysis that targets large datasets. In relation to Big Data
analysis, data mining generally refers to automated, software-based
techniques that sift through massive datasets to identify patterns and
trends.
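A very small example of automated pattern finding is counting which pairs of items occur together often. The sketch below is a simplified take on frequent-itemset mining over hypothetical shopping baskets; real data mining tools use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def frequent_pairs(transactions: Iterable[List[str]],
                   min_support: int) -> Dict[Tuple[str, str], int]:
    """Return item pairs that appear together in at least min_support baskets."""
    counts: Counter = Counter()
    for basket in transactions:
        # Sort and de-duplicate so each pair is counted once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "milk", "eggs"],
        ["milk", "eggs"],
        ["bread", "milk"],
    ]
    print(frequent_pairs(baskets, min_support=3))
```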
STATISTICAL ANALYSIS
Statistical analysis uses statistical methods based on mathematical formulas as a
means for analyzing data. Statistical analysis is most often quantitative, but can also be
qualitative. This type of analysis is commonly used to describe datasets via
summarization, such as providing the mean, median, or mode of statistics associated
with the dataset.
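The summary measures named above are available in Python's standard `statistics` module. A minimal sketch, with a made-up sample and a helper name chosen for illustration:

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Describe a dataset by its mean, median, and mode."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


if __name__ == "__main__":
    print(summarize([2, 3, 3, 5, 7]))
```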
MACHINE LEARNING
Humans are good at spotting patterns and relationships within data.
Unfortunately, we cannot process large amounts of data very quickly.
Machines, on the other hand, are very adept at processing large amounts of
data quickly, but only if they know how.
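One of the simplest ways a machine can "learn how" is to fit a straight line to example data and use it to predict new values. The sketch below uses ordinary least squares with made-up numbers; it stands in for machine learning in general and is not a technique named in the slides.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Learn slope and intercept from examples by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) examples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    if spread_x == 0:
        raise ValueError("all x values are identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / spread_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: Tuple[float, float], x: float) -> float:
    """Apply the learned line to an unseen input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(model, predict(model, 10))
```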
SEMANTIC ANALYSIS
A fragment of text or speech data can carry different meanings in different
contexts, whereas a complete sentence may retain its meaning, even if
structured in different ways. In order for the machines to extract valuable
information, text and speech data needs to be understood by the machines
in the same way as humans do. Semantic analysis represents practices for
extracting meaningful information from textual and speech data.
VISUAL ANALYSIS
Visual analysis is a form of data analysis that involves the
graphic representation of data to enable or enhance its visual
perception. Based on the premise that humans can
understand and draw conclusions from graphics more quickly
than from text, visual analysis acts as a discovery tool in the
field of Big Data.
Intro to big data and how it works
