SlideShare une entreprise Scribd logo
1  sur  12
Submitted by,
M. Kavitha M.Sc.,
Nadar Saraswathi College of
Art & Science, Theni.
Data Mining
Data Integration and
Transformation
Data Integration
* Data Integration involves combining data from
several disparate source, which are stored using various
technologies and provide a unified view of the data.
* The later initiative is often called a data warehouse.
* It merges the data from multiple data stores (data
source).
* It includes multiple databases, data cubes or flat
files.
* Metadata, correlation analysis, data conflict detection
and resolution of semantic heterogeneity contribute towards
smooth data integration.
Advantages :
1. Independence.
2. Faster query processing.
3. Complex query processing.
4. Advanced data summarization & storage possible.
5. High volume data processing.
Disadvantages :
1. Latency (since data needs to be loaded using ETL).
2. Costlier (data localization, infrastructure, security).
There are a number of issues to consider during data integration.
1. Schema Integration.
2. Redundancy.
3. Detection and resolution of data value conflicts.
Schema integration :
The real-world entities from multiple source be matched
is referred to as the entity identification problem.
For example,
Data analyst or the computer be sure that customer_id in
one database and cust_number in another refer to the same
entity. Databases and data warehouses that is a data about the
data it’s a meta data.
Redundancy :
* It is another important issue.
* An attribute may be redundant if it can be “derived”
from another table, such as annual revenue.
* Some redundancies can be detected by correlation
analysis.
For example,
Two attributes, such analysis can measure how
strongly one attribute implies the other based on the
available data.
The correlation between attributes attribute A and B by
Detection and resolution of data value conflicts :
* A third important issue in data integration is the
detection and resolution of data value conflicts.
* The same real-world entity, attribute values from
different sources. This may be due to differences in
representation, scaling, or encoding.
* An attribute in one system may be recorded at a
lower level of abstraction than the “same” attribute in another.
* For example, the total sales in one database may
refer to one branch of All Electronics, an attribute of the same
name in another database may refer to the total sales for All
Electronics stores in a given region.
Data Transformation
* Data transformation the data are transformed or
consolidated into forms in appropriate for mining.
* Data transformation can involve
1. Smoothing.
2. Aggregation.
3. Generalization.
4. Normalization.
5. Attribute construction.
Smoothing :
Which works to remove the noise from data. Such
techniques include binning, clustering and regression.
Aggregation :
* Where summary or aggregation operations are applied
to the data.
* For example, the daily sales data may be aggregated so
as to compute monthly and annual total amounts.
Generalization :
* The data where low-level or “primitive” data are placed
by higher-level concepts through the use of concept through
the use of concept hierarchies.
* For example, the attributes like street can be
generalized to higher-level concept city or country when the
numeric attributes to higher-level concept young, middle-
aged and street.
Normalization :
Where the attribute data are scaled so as to fall within
a specified range, such as -1.0 to 1.0 or 0.0 to 1.0
Attribute construction :
Where new attribute are a constructed and added
from the given set of attributes to help the mining
process.
There are many method for data normalization.
* Min-Max normalization.
* Z-Score normalization.
* Normalization by decimal scaling.
Min – Max Normalization :
It performs a linear transformation on the original data.
Suppose that min A and max A are the minimum and
maximum values of attributes A. A Min – Max
normalization maps a value v of A to v’ in the range.
Z – Score Normalization :
The Z – Score normalization a value of an attribute A
are normalized based on the mean and standard deviation of
A. A value v of A is normalized to v’
Normalization by Decimal Scaling :
Normalization by decimal scaling normalizes by moving
the decimal point of values of attribute A.
The number of decimal points moved depends on the
maximum absolute value of A. A value v of A is normalized
to v’ by computing
where j is the smallest integer such that Max(|V’|) < 1.
Thank You

Contenu connexe

Tendances

Database Management System
Database Management SystemDatabase Management System
Database Management SystemNishant Munjal
 
1.4 data independence
1.4 data independence1.4 data independence
1.4 data independenceBHARATH KUMAR
 
Query Decomposition and data localization
Query Decomposition and data localization Query Decomposition and data localization
Query Decomposition and data localization Hafiz faiz
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data miningSlideshare
 
Distributed dbms architectures
Distributed dbms architecturesDistributed dbms architectures
Distributed dbms architecturesPooja Dixit
 
Architecture of data mining system
Architecture of data mining systemArchitecture of data mining system
Architecture of data mining systemramya marichamy
 
Dbms relational model
Dbms relational modelDbms relational model
Dbms relational modelChirag vasava
 
Characteristic of dabase approach
Characteristic of dabase approachCharacteristic of dabase approach
Characteristic of dabase approachLuina Pani
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecturepcherukumalla
 
Distributed Query Processing
Distributed Query ProcessingDistributed Query Processing
Distributed Query ProcessingMythili Kannan
 
Attributes of output primitive(line attributes)
Attributes of output primitive(line attributes)Attributes of output primitive(line attributes)
Attributes of output primitive(line attributes)shalinikarunakaran1
 
13. Query Processing in DBMS
13. Query Processing in DBMS13. Query Processing in DBMS
13. Query Processing in DBMSkoolkampus
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsJustin Cletus
 
Introduction to distributed database
Introduction to distributed databaseIntroduction to distributed database
Introduction to distributed databaseSonia Panesar
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPTTrinath
 

Tendances (20)

Database security
Database securityDatabase security
Database security
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
 
1.4 data independence
1.4 data independence1.4 data independence
1.4 data independence
 
Query Decomposition and data localization
Query Decomposition and data localization Query Decomposition and data localization
Query Decomposition and data localization
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
Distributed dbms architectures
Distributed dbms architecturesDistributed dbms architectures
Distributed dbms architectures
 
Data reduction
Data reductionData reduction
Data reduction
 
Architecture of data mining system
Architecture of data mining systemArchitecture of data mining system
Architecture of data mining system
 
Dbms relational model
Dbms relational modelDbms relational model
Dbms relational model
 
Artificial Neural Networks for Data Mining
Artificial Neural Networks for Data MiningArtificial Neural Networks for Data Mining
Artificial Neural Networks for Data Mining
 
Characteristic of dabase approach
Characteristic of dabase approachCharacteristic of dabase approach
Characteristic of dabase approach
 
Parallel Database
Parallel DatabaseParallel Database
Parallel Database
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Distributed Query Processing
Distributed Query ProcessingDistributed Query Processing
Distributed Query Processing
 
Attributes of output primitive(line attributes)
Attributes of output primitive(line attributes)Attributes of output primitive(line attributes)
Attributes of output primitive(line attributes)
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
13. Query Processing in DBMS
13. Query Processing in DBMS13. Query Processing in DBMS
13. Query Processing in DBMS
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
Introduction to distributed database
Introduction to distributed databaseIntroduction to distributed database
Introduction to distributed database
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPT
 

Similaire à Data Integration and Transformation Techniques for Data Mining

Unit_2_Feature Engineering.pdf
Unit_2_Feature Engineering.pdfUnit_2_Feature Engineering.pdf
Unit_2_Feature Engineering.pdfPyingkodi Maran
 
Attribute oriented analysis
Attribute oriented analysisAttribute oriented analysis
Attribute oriented analysisHirra Sultan
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data PreprocessingT Kavitha
 
data processing.pdf
data processing.pdfdata processing.pdf
data processing.pdfDimpyJindal4
 
Data Preprocessing&tools
Data Preprocessing&toolsData Preprocessing&tools
Data Preprocessing&toolsAmandeep Gill
 
Data MIning: Data processing
Data MIning: Data processingData MIning: Data processing
Data MIning: Data processingDatamining Tools
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine LearningPyingkodi Maran
 
Data preprocessing in Data Mining
Data preprocessing in Data MiningData preprocessing in Data Mining
Data preprocessing in Data MiningDHIVYADEVAKI
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data scienceTanujaSomvanshi1
 
Characterization and Comparison
Characterization and ComparisonCharacterization and Comparison
Characterization and ComparisonBenjamin Franklin
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.pptPalaniKumarR2
 

Similaire à Data Integration and Transformation Techniques for Data Mining (20)

Data integration
Data integrationData integration
Data integration
 
Data Mining: Data Preprocessing
Data Mining: Data PreprocessingData Mining: Data Preprocessing
Data Mining: Data Preprocessing
 
Unit_2_Feature Engineering.pdf
Unit_2_Feature Engineering.pdfUnit_2_Feature Engineering.pdf
Unit_2_Feature Engineering.pdf
 
Chapter 3.pdf
Chapter 3.pdfChapter 3.pdf
Chapter 3.pdf
 
Data preprocess
Data preprocessData preprocess
Data preprocess
 
Attribute oriented analysis
Attribute oriented analysisAttribute oriented analysis
Attribute oriented analysis
 
1234
12341234
1234
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
data processing.pdf
data processing.pdfdata processing.pdf
data processing.pdf
 
Data Preprocessing&tools
Data Preprocessing&toolsData Preprocessing&tools
Data Preprocessing&tools
 
Datapreprocessing
DatapreprocessingDatapreprocessing
Datapreprocessing
 
Data MIning: Data processing
Data MIning: Data processingData MIning: Data processing
Data MIning: Data processing
 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine Learning
 
Data preprocessing in Data Mining
Data preprocessing in Data MiningData preprocessing in Data Mining
Data preprocessing in Data Mining
 
Analysis Of Attribute Revelance
Analysis Of Attribute RevelanceAnalysis Of Attribute Revelance
Analysis Of Attribute Revelance
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data science
 
Characterization and Comparison
Characterization and ComparisonCharacterization and Comparison
Characterization and Comparison
 
Unit 3-2.ppt
Unit 3-2.pptUnit 3-2.ppt
Unit 3-2.ppt
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt
 

Plus de kavitha muneeshwaran (13)

Physical Security
Physical SecurityPhysical Security
Physical Security
 
Digital Audio
Digital AudioDigital Audio
Digital Audio
 
Java
JavaJava
Java
 
Data structure
Data structureData structure
Data structure
 
Internet Programming with Java
Internet Programming with JavaInternet Programming with Java
Internet Programming with Java
 
Digital image processing
Digital image processing  Digital image processing
Digital image processing
 
Staffing level estimation
Staffing level estimation Staffing level estimation
Staffing level estimation
 
Transaction Management - Deadlock Handling
Transaction Management - Deadlock HandlingTransaction Management - Deadlock Handling
Transaction Management - Deadlock Handling
 
Process
ProcessProcess
Process
 
Digital Logic circuit
Digital Logic circuitDigital Logic circuit
Digital Logic circuit
 
C and C++ functions
C and C++ functionsC and C++ functions
C and C++ functions
 
I/O system in intel 80386 microcomputer architecture
I/O system in intel 80386 microcomputer architectureI/O system in intel 80386 microcomputer architecture
I/O system in intel 80386 microcomputer architecture
 
narrow Band ISDN
narrow Band ISDNnarrow Band ISDN
narrow Band ISDN
 

Dernier

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 

Dernier (20)

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 

Data Integration and Transformation Techniques for Data Mining

  • 1. Submitted by, M. Kavitha M.Sc., Nadar Saraswathi College of Art & Science, Theni. Data Mining Data Integration and Transformation
  • 2. Data Integration * Data Integration involves combining data from several disparate source, which are stored using various technologies and provide a unified view of the data. * The later initiative is often called a data warehouse. * It merges the data from multiple data stores (data source). * It includes multiple databases, data cubes or flat files. * Metadata, correlation analysis, data conflict detection and resolution of semantic heterogeneity contribute towards smooth data integration.
  • 3. Advantages : 1. Independence. 2. Faster query processing. 3. Complex query processing. 4. Advanced data summarization & storage possible. 5. High volume data processing. Disadvantages : 1. Latency (since data needs to be loaded using ETL). 2. Costlier (data localization, infrastructure, security).
  • 4. There are a number of issues to consider during data integration. 1. Schema Integration. 2. Redundancy. 3. Detection and resolution of data value conflicts. Schema integration : The real-world entities from multiple source be matched is referred to as the entity identification problem. For example, Data analyst or the computer be sure that customer_id in one database and cust_number in another refer to the same entity. Databases and data warehouses that is a data about the data it’s a meta data.
  • 5. Redundancy : * It is another important issue. * An attribute may be redundant if it can be “derived” from another table, such as annual revenue. * Some redundancies can be detected by correlation analysis. For example, Two attributes, such analysis can measure how strongly one attribute implies the other based on the available data. The correlation between attributes attribute A and B by
  • 6. Detection and resolution of data value conflicts : * A third important issue in data integration is the detection and resolution of data value conflicts. * The same real-world entity, attribute values from different sources. This may be due to differences in representation, scaling, or encoding. * An attribute in one system may be recorded at a lower level of abstraction than the “same” attribute in another. * For example, the total sales in one database may refer to one branch of All Electronics, an attribute of the same name in another database may refer to the total sales for All Electronics stores in a given region.
  • 7. Data Transformation * Data transformation the data are transformed or consolidated into forms in appropriate for mining. * Data transformation can involve 1. Smoothing. 2. Aggregation. 3. Generalization. 4. Normalization. 5. Attribute construction. Smoothing : Which works to remove the noise from data. Such techniques include binning, clustering and regression.
  • 8. Aggregation : * Where summary or aggregation operations are applied to the data. * For example, the daily sales data may be aggregated so as to compute monthly and annual total amounts. Generalization : * The data where low-level or “primitive” data are placed by higher-level concepts through the use of concept through the use of concept hierarchies. * For example, the attributes like street can be generalized to higher-level concept city or country when the numeric attributes to higher-level concept young, middle- aged and street.
  • 9. Normalization : Where the attribute data are scaled so as to fall within a specified range, such as -1.0 to 1.0 or 0.0 to 1.0 Attribute construction : Where new attribute are a constructed and added from the given set of attributes to help the mining process. There are many method for data normalization. * Min-Max normalization. * Z-Score normalization. * Normalization by decimal scaling.
  • 10. Min – Max Normalization : It performs a linear transformation on the original data. Suppose that min A and max A are the minimum and maximum values of attributes A. A Min – Max normalization maps a value v of A to v’ in the range. Z – Score Normalization : The Z – Score normalization a value of an attribute A are normalized based on the mean and standard deviation of A. A value v of A is normalized to v’
  • 11. Normalization by Decimal Scaling : Normalization by decimal scaling normalizes by moving the decimal point of values of attribute A. The number of decimal points moved depends on the maximum absolute value of A. A value v of A is normalized to v’ by computing where j is the smallest integer such that Max(|V’|) < 1.