CLASSIFICATION OF DATA
Dr. C.V. Suresh Babu
(CentreforKnowledgeTransfer)
institute
OBJECTIVES
• To understand the various classifications of data
• To know what structured data is
• To know what unstructured data is
• To know what semi-structured data is
• To understand the key differences between structured and unstructured data
DISCUSSION TOPICS
• Classification of Data
• What is Structured Data?
• What is Unstructured Data?
• What is Semistructured Data?
• Structured vs Unstructured Data: 5 Key Differences
CLASSIFICATION OF DATA
• Data classification is broadly defined as the process of organizing data by relevant
categories so that it may be used more efficiently. On a basic level, the classification
process makes data easier to locate and retrieve. Data classification is of particular
importance when it comes to risk management, compliance, and data security.
• Data classification involves tagging data to make it easily searchable and trackable. It
also helps eliminate duplicate copies of data, which can reduce storage and backup
costs while speeding up searches.
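The tagging and de-duplication described above can be sketched in a few lines of Python. This is only an illustration: the keyword rules, the tag names, and the sample documents are hypothetical, and real classification schemes are usually driven by policy rather than simple keyword matching.

```python
import hashlib

# Hypothetical keyword -> tag rules, for illustration only.
RULES = {
    "salary": "confidential",
    "password": "restricted",
    "newsletter": "public",
}


def classify(text: str) -> str:
    """Return the tag of the first matching rule, or 'internal' if none match."""
    lowered = text.lower()
    for keyword, tag in RULES.items():
        if keyword in lowered:
            return tag
    return "internal"


def build_catalog(documents: list[str]) -> dict[str, dict]:
    """Tag each document and keep one copy per distinct content hash."""
    catalog: dict[str, dict] = {}
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in catalog:  # identical content is stored only once
            catalog[digest] = {"text": doc, "tag": classify(doc)}
    return catalog


docs = [
    "Salary review for Q3",
    "Monthly newsletter draft",
    "Salary review for Q3",  # exact duplicate of the first document
    "Meeting notes",
]
catalog = build_catalog(docs)

# Tags make retrieval a simple filter instead of a full-text scan.
confidential = [e["text"] for e in catalog.values() if e["tag"] == "confidential"]
print(len(catalog), confidential)
```

Hashing the content is one simple way to detect exact duplicates; near-duplicates would need a different technique.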
PYRAMID OF DATA
WHAT IS STRUCTURED DATA?
• The term structured data refers to data that resides in a fixed field within a file or
record. Structured data is typically stored in a relational database (RDBMS). It can
consist of numbers and text, and it can be entered automatically or manually, as
long as it fits the RDBMS structure. It depends on a data model that defines
what types of data to include and how to store and process them.
• The language most commonly used to query and manage structured data is SQL
(Structured Query Language). Typical examples of structured data are names,
registration numbers, marks, attendance, and so on.
S.No.  First Name     Last Name   Reg. No.
1      Priya          Dharshini   18132001
2      Mawa           Chouhan     18132002
3      Sai phanindra  Muvvala     18132003
4      Nandhini       Venkatesan  18132004
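As a minimal sketch of how such a table behaves in a relational database, the example below loads the four rows into an in-memory SQLite database and queries them with SQL. The table name `students` and the column names are assumptions made for illustration; the slide only shows the column headings.

```python
import sqlite3

# In-memory database, so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# The data model is defined before any data is stored.
# Table and column names are hypothetical.
conn.execute(
    """
    CREATE TABLE students (
        s_no       INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        reg_no     TEXT NOT NULL UNIQUE
    )
    """
)

rows = [
    (1, "Priya", "Dharshini", "18132001"),
    (2, "Mawa", "Chouhan", "18132002"),
    (3, "Sai phanindra", "Muvvala", "18132003"),
    (4, "Nandhini", "Venkatesan", "18132004"),
]
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?)", rows)

# Because every value sits in a fixed field, lookup is a one-line query.
result = conn.execute(
    "SELECT first_name, last_name FROM students WHERE reg_no = ?",
    ("18132003",),
).fetchone()
count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(result, count)
```

Registration numbers are stored as text here because they are identifiers rather than quantities; an integer column would also work.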
WHAT IS UNSTRUCTURED DATA?
• Unstructured data is more or less all the data that is not structured. Even though
unstructured data may have a native, internal structure, it's not structured in a
predefined way. There is no data model; the data is stored in its native format.
• Typical examples of unstructured data are rich media, text, social media activity,
surveillance imagery, and so on.
• The amount of unstructured data is much larger than that of structured data.
Unstructured data makes up about 80% of all enterprise data, and the percentage
keeps growing. This means that companies not taking unstructured data into
account are missing out on a lot of valuable business intelligence.
EXAMPLES: UNSTRUCTURED DATA
WHAT IS SEMI-STRUCTURED DATA?
• Semistructured data is a third category that falls somewhere between the other two.
It's a type of structured data that does not fit into the formal structure of a relational
database. But while not matching the description of structured data entirely, it still
employs tagging systems or other markers, separating different elements and enabling
search. Sometimes, this is referred to as data with a self-describing structure.
• A typical example of semi-structured data is a smartphone photo. Every photo taken
with a smartphone contains unstructured image content as well as the tagged time,
location, and other identifiable (and structured) information. Semi-structured data
formats include JSON, CSV, and XML.
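To illustrate the "self-describing" idea, here is a small sketch that parses photo metadata stored as JSON. The file names, timestamps, coordinates, and field names are made-up sample values, not taken from any real device format. Each record carries its own field names, and records are free to differ in which fields they include.

```python
import json

# Hypothetical photo metadata; every value below is invented for illustration.
raw = """
[
  {"file": "IMG_0001.jpg", "taken_at": "2024-03-01T09:15:00",
   "location": {"lat": 13.08, "lon": 80.27}},
  {"file": "IMG_0002.jpg", "taken_at": "2024-03-02T18:40:00"},
  {"file": "IMG_0003.jpg", "taken_at": "2024-03-02T19:05:00",
   "camera": "rear", "location": {"lat": 12.97, "lon": 77.59}}
]
"""

photos = json.loads(raw)

# The tags (keys) enable search even though no fixed schema was declared.
geotagged = [p["file"] for p in photos if "location" in p]

# Records need not share the same set of fields.
all_keys = sorted({key for p in photos for key in p})

print(geotagged)
print(all_keys)
```

The second record has no location and the third adds a camera field, which a fixed relational schema would have to handle with nullable columns or a schema change.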
STRUCTURED VS UNSTRUCTURED DATA: 5 KEY DIFFERENCES
• Defined vs Undefined Data
• Quantitative vs Qualitative Data
• Storage in Data Warehouses vs Data Lakes
• Easy vs Hard to Analyze
• Predefined Format vs Variety of Formats
DEFINED VS UNDEFINED DATA
Defined (structured data)
• Structured data consists of clearly defined types of data in a structure.
• Structured data lives in rows and columns, and it can be mapped into predefined
fields.
Undefined (unstructured data)
• Unstructured data is usually stored in its native format.
• Unlike structured data, which is organized and easy to access in relational
databases, unstructured data does not have a predefined data model.
QUANTITATIVE VS QUALITATIVE DATA
Quantitative Data (structured data)
• Structured data is often quantitative data, meaning it usually consists of hard
numbers or things that can be counted.
• Methods for analysis include regression (to predict relationships between
variables); classification (to estimate probability); and clustering of data
(based on different attributes).
Qualitative Data (unstructured data)
• Unstructured data, on the other hand, is often categorized as qualitative data,
and cannot be processed and analyzed using conventional tools and methods.
• In a business context, qualitative data can, for example, come from customer
surveys, interviews, and social media interactions. Extracting insights from
qualitative data requires advanced analytics techniques like data mining and
data stacking.
STORAGE IN DATA WAREHOUSES VS DATA LAKES
Storage in Data Warehouses (structured data)
• Structured data is often stored in data warehouses.
• A data warehouse is the endpoint for the data's journey through an ETL pipeline.
• Structured data requires less storage space.
• As for databases, structured data is usually stored in a relational database
(RDBMS).
Storage in Data Lakes (unstructured data)
• Unstructured data is stored in data lakes.
• A data lake, on the other hand, is an almost limitless repository where data is
stored in its original format or after undergoing a basic "cleaning" process.
• Unstructured data requires more storage space. For example, even a tiny image
takes up more space than many pages of text.
• The best fit for unstructured data is instead a so-called non-relational, or
NoSQL, database.
Both data warehouses and data lakes have the potential for cloud use.
EASE OF ANALYSIS
One of the most significant differences between structured and unstructured data is how
well each lends itself to analysis.
Structured data
• Structured data is easy to search, both for humans and for algorithms.
• There is a wide array of sophisticated analytics tools for structured data.
Unstructured data
• Unstructured data, on the other hand, is intrinsically more difficult to search and
requires processing to become understandable.
• It is challenging to deconstruct since it lacks a predefined data model and hence
doesn't fit into relational databases.
• Most analytics tools for mining and arranging unstructured data are still in the
developing phase.
• The lack of predefined structure makes data mining tricky, and developing best
practices on how to handle data sources like rich media, blogs, social media data, and …
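The contrast in ease of analysis can be sketched as follows. The marks and the sentence are hypothetical sample values, and the regular expression is a deliberately narrow pattern that only works for this one phrasing, which is the point: free text has to be processed into fields before it can be searched the way structured records can.

```python
import re

# Structured: values already sit in named fields (marks are invented).
students = [
    {"first_name": "Priya", "marks": 82},
    {"first_name": "Mawa", "marks": 67},
]
high_structured = [s["first_name"] for s in students if s["marks"] >= 75]

# Unstructured: the same facts buried in free text.
feedback = "Priya scored 82 marks this term, while Mawa got 67 marks."

# Processing step: pull (name, marks) pairs out of the sentence.
# This pattern is an assumption about the wording and would miss other phrasings.
pairs = re.findall(r"([A-Z][a-z]+)\D+?(\d+) marks", feedback)
extracted = [{"first_name": name, "marks": int(marks)} for name, marks in pairs]

# Only after extraction can the same filter be applied.
high_unstructured = [s["first_name"] for s in extracted if s["marks"] >= 75]

print(high_structured, high_unstructured)
```

Both filters give the same answer here, but the unstructured path depends on an extraction rule that breaks as soon as the wording changes.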
PREDEFINED FORMAT VS VARIETY OF FORMATS
Predefined Format (structured data)
• The most common format for structured data is text and numbers.
• Structured data has been defined beforehand in a data model.
• Structured data requires less storage space.
• As for databases, structured data is usually stored in a relational database
(RDBMS).
Variety of Formats (unstructured data)
• Unstructured data, on the other hand, comes in a variety of shapes and sizes. It
can consist of everything from audio, video, and imagery to sensor data.
• There is no data model for unstructured data; it is stored natively or in a data
lake that doesn't require any transformation.
• Unstructured data requires more storage space. For example, even a tiny image
takes up more space than many pages of text.
• The best fit for unstructured data is instead a so-called non-relational, or
NoSQL, database.

