SlideShare une entreprise Scribd logo
1  sur  6
Télécharger pour lire hors ligne
D. Mohanapriya, Dr. T. Meyyappan / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 4, Jul-Aug 2013, pp.1463-1468
1463 | P a g e
Slicing : A Efficient Method For Privacy Preservation In Data
Publishing
D. Mohanapriya*, Dr. T. Meyyappan**
*(Resarch scholar ,Department of Computer Science and engineering, AlagappaUniversity, karaikudi
** (Professor, Department of Computer Science and engineering, AlagappaUniversity, karaikudi
ABSTRACT
In this paper we propose and prove a new
technique called “Overlapping Slicing” for
privacy preservation of high dimensional data.
The process of publishing the data in the web,
faces many challenges today. The data usually
contains the personal information which are
personally identifiable to anyone, thus poses the
problem of Privacy. Privacy is an important issue
in data publishing. Many organizations distribute
non-aggregate personal data for research, and
they must take steps to ensure that an adversary
cannot predict sensitive information pertaining to
individuals with high confidence. Recent work in
data publishing information, especially for high
dimensional data. Bucketization, on the other
hand, does not prevent membership disclosure.
We propose an overlapping slicing method for
handling high into more than one column; we
protect privacy by breaking the association of
uncorrelated attributes and preserve data utility
by preserving the association between highly
correlated attributes. This technique releases mo
correlations thereby, overlapping slicing preserves
better data utility than generalization and is more
effective than bucketization in workloads
involving the sensitive attribute
Keywords :Overlapping slicing, privacy
preservation, high dimensional data, privacy
techniques
I. INTRODUCTION
Privacy preserving publishing of microdata
has been studied extensively in recent years.
Microdata contain records each of which contains
information about an individual entity, such as a
person, a household, or an organization. Several
microdata anonymization techniques have been
proposed. The most popular ones are generalization,
for k-anonymity and bucketization for diversity. In
both approaches, attributes are partitioned into three
categories:
 Some attributes are identifiers that can
uniquely identify an individual, such as Name
or Social Security Number.
 Some attributes are Quasi Identifiers (QI),
which the adversary may already know
(possibly from other publicly available
databases) and which, when taken together,
can potentially identify an individual, e.g.,
Birthdate, Sex, and Zipcode.
 Some attributes are Sensitive Attributes (SAs),
which are unknown to the adversary and are
considered sensitive, such as Disease and
Salary.
In both generalization and bucketization, one
first removes identifiers from the data and then
partitions tuples into buckets. The two techniques
differ in the next step.Generalization transforms the
QI-values in each bucket into “less specific but
semantically consistent” values so that tuples in the
same bucket cannot be distinguished by their QI
values. In bucketization, one separates the SAs from
the QIs by randomly permuting the SA values in each
bucket. The anonymized data consist of a set of
buckets with permuted sensitive attribute values.
II. RELATED WORKS
In this chapter we discuss about the literature
survey and related works done in privacy preserving
microdata and their techniques. The main
disadvantage of Generalization is: it loses
considerable amount of information, especially for
high- dimensional data. And also, Bucketization does
not prevent membership disclosure and does not apply
for data that do not have a clear separation between
quasi-identifying attributes and sensitive attributes.
Generalization loses considerable amount of
information, especially for high- dimensional data.
Bucketizations do not have a clear separation between
quasi-identifying attributes and sensitive attributes.
C.Aggarwal [1] initially proposed On k-
anonymity and curse of dimensionality concept.
Where the author [1] proposed privacy preserving
anonymization technique where a record is released
only if it indistinguishable from k other entities of
data. In this paper [1] the authors [1] show that when
the data contains a large number of attributes which
may be considered quasi-identifiers, it becomes
difficult to anonymize the data without an
unacceptably high amount of information loss. This is
because an exponential number of combinations of
dimensions can be used to make precise inference
attacks, even when individual attributes are partially
specified within a range. In this paper they provide an
analysis of the effect of dimensionality on k-
anonymity methods. They [1] conclude that when a
data set contains a large number of attributes which
D. Mohanapriya, Dr. T. Meyyappan / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 4, Jul-Aug 2013, pp.1463-1468
1464 | P a g e
are open to inference attacks, and also the author [1]
faced with a choice of either completely suppressing
most of the data or losing the desired level of
anonymity. Thus, the work showed that the curse of
high dimensionality also applies to the problem of
privacy preserving data mining.
A. Blum[2] et.al., proposed a new
framework for practical privacy and they named it as
SULQ framework. They[2] consider a statistical
database in which a trusted administrator introduces
noise to the query responses with the goal of
maintaining privacy of individual database entries. In
such a database, a query consists of a pair (S, f)
where S is a set of rows in the database and f is a
function mapping database rows to {0, 1}. The true
answer is P i∈S f(di), and a noisy version is released
as the response to the query. Results of Dinur,
Dwork, and Nissim show that a strong form of
privacy can be maintained using a surprisingly small
amount of noise – much less than the sampling error
– provided the total number of queries is sublinear in
the number of database rows. We call this query and
(slightly) noisy reply the SuLQ (Sub-Linear Queries)
primitive. The assumption of sublinearity becomes
reasonable as databases grow increasingly large. The
authors [2] extend the work in two ways. First, they
[2] modify the privacy analysis to real-valued
functions f and arbitrary row types, as a consequence
greatly improving the bounds on noise required for
privacy. Second, they [2] examine the computational
power of the SuLQ primitive. They [2] show that it is
very powerful indeed, in that slightly noisy versions
of the following computations can be carried out with
very few invocations of the primitive: principal
component analysis, k means clustering, the
Perceptron Algorithm, the ID3 algorithm, and
(apparently!) all algorithms that operate in the in the
statistical query learning model.
J. Brickell [3] introduced a new
anonymization technique called the cost of privacy.
In this work, Re-identification is a major privacy
threat to public datasets containing individual
records. Many privacy protection algorithms rely on
generalization and suppression of "quasi-identifier"
attributes such as ZIP code and birthdate. Their
objective is usually syntactic sanitization: for
example, k-anonymity requires that each "quasi-
identifier" tuple appear in at least k records, while l-
diversity requires that the distribution of sensitive
attributes for each quasi-identifier have high entropy.
The utility of sanitized data is also measured
syntactically, by the number of generalization steps
applied or the number of records with the same quasi-
identifier. In this paper [3], query generalization and
suppression of quasi-identifiers offer any benefits
over trivial sanitization which simply separates quasi-
identifiers from sensitive attributes. Previous work
showed that k-anonymous databases can be useful for
data mining, but k-anonymization does not guarantee
any privacy. By contrast, we measure the tradeoff
between privacy (how much can the adversary learn
from the sanitized records?) and utility, measured as
accuracy of data-mining algorithms executed on the
same sanitized records.
For our experimental evaluation, we use the
same datasets from the UCI machine learning
repository as were used in previous research on
generalization and suppression. Our results
demonstrate that even modest privacy gains require
almost complete destruction of the data-mining utility.
In most cases, trivial sanitization provides equivalent
utility and better privacy than k-anonymity, l-
diversity, and similar methods based on generalization
and suppression.
A multidimensional technique was proposed
by B.C. Chen et. al [4], which they named as Skyline
based technique. Privacy is an important issue in data
publishing.
I.Dinur [5] proposed another technique of revealing
information while preserving privacy. The authors [5]
examine the tradeoff between privacy and usability of
statistical databases.
Consider microdata such as census data
andmedical data. Typically, microdata are stored in a
table, and each record (row) corresponds to one
individual. Each record has a number of attributes,
which can be divided into the following three
categories:
1. Identifier: Identifiers are attributes that clearly
identify individuals. Examples include Social
Security Number and Name.
2. Quasi-Identifier: Quasi-identifiers are attributes
whose values when taken together can potentially
identify an individual. Examples include Zip-
code, Birthdate, and Gender. An adversary may
already know the QI values of some individuals
in the data. This knowledge can be either from
personal contact or from other publicly available
databases (e.g., a voter registration list) that
include both explicit identifiers and quasi-
identifiers.
3. Sensitive Attribute: Sensitive attributes are
attributes whose values should not be associated
with an individual by the adversary. Examples
include Disease and Salary.
TABLE1 – ORIGINAL TABLE
Name Age Gender Zipcode Disease
Ann 20 F 12345 AIDS
Bob 24 M 12342 Flu
Cary 23 F 12344 Flu
Dick 27 M 12344 AIDS
Ed 35 M 12412 Flu
Frank 34 M 12433 Cancer
Gary 31 M 12453 Flu
Tom 38 M 12455 AIDS
D. Mohanapriya, Dr. T. Meyyappan / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 4, Jul-Aug 2013, pp.1463-1468
1465 | P a g e
TABLE 2: GENERALIZATION
TABLE 3: BUCKETIZATION
III. PROPOSED WORK
In this paper, we present a novel technique
called slicing for privacy-preserving data publishing.
Our contributions include the following.
First, we introduce slicing as a new technique for
privacy preserving data publishing.Slicing has
several advantages when compared with
generalization and bucketization. It preserves better
data utility than generalization. It preserves more
attribute correlations with the SAs than
bucketization. It can also handle high-dimensional
data and data without a clear separation of QIs and
SAs.
Second, we show that slicing can be
effectively used for preventing attribute disclosure,
based on the privacy requirement of l-diversity. We
introduce a notion called l-diverse slicing, which
ensures that the adversary cannot learn the sensitive
value of any individual with a probability greater
than 1/l.
Third, we develop an efficient algorithm for
computing the sliced table that satisfies ldiversity.
Our algorithm partitions attributes into columns,
applies column generalization, and partitions tuples
into buckets. Attributes that are highly correlated are
in the same column; this preserves the correlations
between such attributes. The associations between
uncorrelated attributes are broken; the provides better
privacy as the associations between such attributes
are less- frequent and potentially identifying.
Fourth, we describe the intuition behind
membership disclosure and explain how slicing
prevents membership disclosure. A bucket of size k
can potentially match kc tuples where c is the number
of columns. Because only k of the kc tuples are
actually in the original data, the existence of the other
kc − k tuples hides the membership information of
tuples in the original data.
Slicing partitions the dataset both vertically
and horizontally. Vertical partitioning is done by
grouping attributes into columns based on the
correlations among the attributes. Each column
contains a subset of attributes that are highly
correlated. Horizontal partitioning is done by grouping
tuples into buckets.
Finally, within each bucket, values in each
column are randomly permutated (or sorted) to break
the linking between different columns. The basic idea
of slicing is to break the association cross columns,
but to preserve the association within each column.
This reduces the dimensionality of the data and
preserves better utility than generalization and
bucketization.
Slicing preserves utility because it groups
highly correlated attributes together, and preserves the
correlations between such attributes. Slicing protects
privacy because it breaks the associations between
uncorrelated attributes, which areinfrequent and thus
identifying. Note that when the dataset contains QIs
and one SA, bucketization has to break their
correlation; slicing, on the other hand, can group some
QI attributes with the SA, preserving attribute
correlations with the sensitive attribute.
Slicing retains improved data utility than
generalization and can be recycled for membership
exposure shield. Additional important benefit of
slicing is that it can manage data with greater
dimension. We depict how slicing can be recycled for
attribute exposure protection and build an effective
algorithm for calculating the sliced data that comply
with the ℓ -diversity requisite. Slicing conserves
enhanced utility than generalization and is more
efficient than binning in assignments comprising the
sensitive attribute. Slicing can be used to stop
membership exposure.
Age Gender Zipcode Disease
[20-38] F 12*** AIDS
[20-38] M 12*** Flu
[20-38] F 12*** Flu
[20-38] M 12*** AIDS
[20-38] M 12*** Flu
[20-38] M 12*** Cancer
[20-38] M 12*** Flu
[20-38] M 12*** AIDS
Age Gender Zipcode Disease
[20-27] * 1234* AIDS
[20-27] * 1234* Flu
[20-27] * 1234* Flu
[20-27] * 1234* AIDS
[35-38] * 124** Flu
[35-38] * 124** Cancer
[35-38] * 124** Flu
[35-38] * 124** AIDS
D. Mohanapriya, Dr. T. Meyyappan / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 4, Jul-Aug 2013, pp.1463-1468
1466 | P a g e
Fig 1: Overall process diagram
First, generalization for k-anonymity suffers
from the curse of dimensionality. In order for
generalization to be effective, records in the same
bucket must be close to each other so that
generalizing the records would not lose too much
information. However, in high dimensional data,
most data points have similar distances with each
other, forcing a great amount of generalization to
satisfy k-anonymity even forrelatively small
k’s.While bucketization [14], [11], [7] has better data
utility than generalization, it has several
limitations.Bucketization does not prevent
membership disclosure [14]. Because bucketization
publishes the QI values in their original forms, an
adversary can find out whether an individual has a
record in the published data or not. attributes
(Birthdate, Sex, and Zipcode). A microdata (e.g.,
census data) usually contains many other attributes
besides those three attributes. This means that the
membership information of most individuals can be
inferred from the bucketized table.
1) Slicing Algorithms:
An effective slicing algorithm to obtain ℓ -
diverse slicing is offered. For a given a micro data
table T and two factors c and ℓ , the
algorithm calculates the sliced table that involves of c
columns and gratifies the privacy requisite of ℓ -
diversity. Our algorithm involves of three steps:
attribute partitioning column generalization and tuple
partitioning. The three phases are
1.1 Attribute Partitioning:
Our algorithm divides attributes such that
largely related attributes are in the same column. This
is better for utility as well as privacy. With respect to
privacy, the association of not related attributes shows
more identification risks than that of the association of
high related attributes since the association of
unrelated attribute values is very less common and
therefore more identifiable. Thus, it is good to split the
associations among uncorrelated attributes to guard
privacy. In this step, we first calculate the relations
among pairs of attributes and then group attributes on
the basis of their correlations.
1.2 Column Generalization
Records are generalized to gratify certain
minimum frequencyrequisite.
1.3 Tuple Partitioning
In the tuple partitioning steps, records are
divided into buckets. We change Mondrian algorithm
for tuple partition. Not like Mondrian k-anonymity, no
other generalization can be related to the records; we
make use of the Mondrian for the reason of dividing
tuples into buckets.
1.4 Membership Disclosure Protection
Let us first inspect how a challenger can
conclude membership data from binning. Since
binning liberates the QI values in their real form and
more individuals can be solely determined using the
QI values, the challenger can easily settle the
membership of single individual in the real data by
inspecting the regularity of the QI values in the binned
information. Precisely, if the regularity is 0, the
challenger knows for certain that the individual is not
in information.
If the regularity is higher than 0, the
challenger knows with good assurance that the
individual is in the information, since this similar
records must fit to that unique as nearly no further
individual has the identical values of QI. The above
perception advises that so as to defend data of
members, it is necessary that, in the anonymized
information, a record in the real information should
have a same occurrence as a record which is not
present in the original information. Or else, by
investigating their occurrences in the data that is
anonymized,the opponent can be able to distinguish
records in the real information from records that are
not present in the original information.
1.5 Sliced Data
Another important advantage of slicing is its
ability to handle high-dimensional data. By
partitioning attributes into columns, slicing reduces
the dimensionality of the data. Each column of the
table can be viewed as a sub-table with a lower
dimensionality. Slicing is also different from the
Dataset
Slicing
Multiset
Generaliz
ation
Multiset
Attribute
Update
Process
Column
Generaliz
ation
Tuple
PartitionDisproba
bility
Multiset Slicing
Classifica
tion
accuracy
D. Mohanapriya, Dr. T. Meyyappan / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 4, Jul-Aug 2013, pp.1463-1468
1467 | P a g e
approach of publishing multiple independent sub-
tables in that these sub-tables are linked by the
buckets in slicing.
TABLE 4 ORIGINAL DATA
TABLE 5 SLICED DATA
IV. ALGORITHM
Our Algorithm of “Overlapping Slicing”, is
presented below:
1. load dataset;
2. attribute partition and column
3. process tuple partition and buckets
4. slicing
5. undergo column generalization
6. do matching buckets
7. duplicate an attribute in more than one
columns
8. end;
V. VI SIMULATION WORKS/RESULTS
We have simulated our system in Dot NET.
We implemented and tested with a system
configuration on Intel Dual Core processor, Windows
XP and using Visual Studio 2008 (C#.net). We have
used the following modules in our implementation
part. The details of each module for this system are as
follows:
Fig 2: Load the dataset
Fig 3: Execution of Generalization Process
Fig 4 : Execution of Bucketization Process
Fig 5 Resultant of Overlapping Slicing
Name Age Gender Zipcode Disease
Ann 20 F 12345 AIDS
Bob 24 M 12342 Flu
Cary 23 F 12344 Flu
Dick 27 M 12344 AIDS
Ed 35 M 12412 Flu
Frank 34 M 12433 Cancer
Gary 31 M 12453 Flu
Tom 38 M 12455 AIDS
(Age, Gender,
Disease)
(Zip-code, Disease)
20,F,Flu 12345,Flu
24,M,AIDS 12342,AIDS
23,F,AIDS 12344,AIDS
27,M,Flu 12344,Flu
35,M,Flu 12412,Flu
34,M,AIDS 12433,AIDS
31,M,Flu 12453,Flu
38,M,Cancer 12455,Cancer
D. Mohanapriya, Dr. T. Meyyappan / International Journal of Engineering Research and
Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 3, Issue 4, Jul-Aug 2013, pp.1463-1468
1468 | P a g e
VI. CONCLUSION & FUTURE WORK
Thus from our theories and implementation
we prove that Overlapping Slicing overcomes the
limitations of existing techniques of generalization
and bucketization and pre- serves better utility while
protecting against privacy threats. Overlapping
slicing to prevent attribute disclosure and
membership disclosure.
Overlapping Slicing preserves better data utility than
generalization and is more effective than
bucketization in workloads involving the sensitive
attribute.
The general methodology proposed by this work is
that: before anonymizing the data, one can analyze
the data characteristics and use these characteristics
in data anonymization. As our future work we plan to
design more effective tuple grouping algorithms. The
trade-off between column generalization and tuple
partitioning is the subject of future work. The design
of tuple grouping algorithms is left to future work.
REFERENCES
[1] C. Aggarwal, “On k-Anonymity and the
Curse of Dimensionality,” Proc. Int’l Conf.
Very Large Data Bases (VLDB), pp. 901-
909, 2005.
[2] A. Blum, C. Dwork, F. McSherry, and K.
Nissim, “Practical Privacy: The SULQ
Framework,” Proc. ACM Symp. Principles
of Database Systems (PODS), pp. 128-138,
2005.
[3] J. Brickell and V. Shmatikov, “The Cost of
Privacy: Destruction of Data-Mining Utility
in Anonymized Data Publishing,” Proc.
ACM SIGKDD Int’l Conf. Knowledge
Discovery and Data Mining (KDD), pp. 70-
78, 2008.
[4] B.-C. Chen, K. LeFevre, and R.
Ramakrishnan, “Privacy Skyline: Privacy
with Multidimensional Adversarial
Knowledge,” Proc. Int’l Conf. Very Large
Data Bases (VLDB), pp. 770-781, 2007.
[5] H. Cramt’er, Mathematical Methods of
Statistics. Princeton Univ. Press, 1948.
[6] I. Dinur and K. Nissim, “Revealing
Information while Preserving Privacy,”
Proc. ACM Symp. Principles of Database
Systems (PODS), pp. 202-210, 2003.
[7] C. Dwork, “Differential Privacy,” Proc. Int’l
Colloquium Automata, Languages and
Programming (ICALP), pp. 1-12, 2006.
[8] C. Dwork, “Differential Privacy: A Survey
of Results,” Proc. Fifth Int’l Conf. Theory
and Applications of Models of Computation
(TAMC), pp. 1-19, 2008.
[9] C. Dwork, F. McSherry, K. Nissim, and A.
Smith, “Calibrating Noise to Sensitivity in
Private Data Analysis,” Proc. Theory of
Cryptography Conf. (TCC), pp. 265-284,
2006.
[10] J.H. Friedman, J.L. Bentley, and R.A. Finkel,
“An Algorithm for Finding Best Matches in
Logarithmic Expected Time,” ACM Trans.
Math. Software, vol. 3, no. 3, pp. 209-226,
1977.
[11] B.C.M. Fung, K. Wang, and P.S. Yu, “Top-
Down Specialization for Information and
Privacy Preservation,” Proc. Int’l Conf. Data
Eng. (ICDE), pp. 205-216, 2005.
[12] G. Ghinita, Y. Tao, and P. Kalnis, “On the
Anonymization of Sparse High-Dimensional
Data,” Proc. IEEE 24th Int’l Conf. Data Eng.
(ICDE), pp. 715-724, 2008.

Contenu connexe

Tendances

International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Enabling Use of Dynamic Anonymization for Enhanced Security in Cloud
Enabling Use of Dynamic Anonymization for Enhanced Security in CloudEnabling Use of Dynamic Anonymization for Enhanced Security in Cloud
Enabling Use of Dynamic Anonymization for Enhanced Security in CloudIOSR Journals
 
78201919
7820191978201919
78201919IJRAT
 
Privacy Preserving Data Mining
Privacy Preserving Data MiningPrivacy Preserving Data Mining
Privacy Preserving Data MiningROMALEE AMOLIC
 
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...Kato Mivule
 
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data PrivacyA Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data PrivacyKato Mivule
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data miningeSAT Publishing House
 
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...IJDKP
 
An Investigation of Data Privacy and Utility Preservation Using KNN Classific...
An Investigation of Data Privacy and Utility Preservation Using KNN Classific...An Investigation of Data Privacy and Utility Preservation Using KNN Classific...
An Investigation of Data Privacy and Utility Preservation Using KNN Classific...Kato Mivule
 
Data mining and privacy preserving in data mining
Data mining and privacy preserving in data miningData mining and privacy preserving in data mining
Data mining and privacy preserving in data miningNeeda Multani
 
Iaetsd a survey on one class clustering
Iaetsd a survey on one class clusteringIaetsd a survey on one class clustering
Iaetsd a survey on one class clusteringIaetsd Iaetsd
 
Privacy Preserving Data Mining
Privacy Preserving Data MiningPrivacy Preserving Data Mining
Privacy Preserving Data MiningVrushali Malvadkar
 
Towards A Differential Privacy and Utility Preserving Machine Learning Classi...
Towards A Differential Privacy and Utility Preserving Machine Learning Classi...Towards A Differential Privacy and Utility Preserving Machine Learning Classi...
Towards A Differential Privacy and Utility Preserving Machine Learning Classi...Kato Mivule
 
A Rule based Slicing Approach to Achieve Data Publishing and Privacy
A Rule based Slicing Approach to Achieve Data Publishing and PrivacyA Rule based Slicing Approach to Achieve Data Publishing and Privacy
A Rule based Slicing Approach to Achieve Data Publishing and Privacyijsrd.com
 
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...Kato Mivule
 

Tendances (18)

International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Enabling Use of Dynamic Anonymization for Enhanced Security in Cloud
Enabling Use of Dynamic Anonymization for Enhanced Security in CloudEnabling Use of Dynamic Anonymization for Enhanced Security in Cloud
Enabling Use of Dynamic Anonymization for Enhanced Security in Cloud
 
130509
130509130509
130509
 
78201919
7820191978201919
78201919
 
Privacy Preserving Data Mining
Privacy Preserving Data MiningPrivacy Preserving Data Mining
Privacy Preserving Data Mining
 
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Usin...
 
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data PrivacyA Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data mining
 
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
 
An Investigation of Data Privacy and Utility Preservation Using KNN Classific...
An Investigation of Data Privacy and Utility Preservation Using KNN Classific...An Investigation of Data Privacy and Utility Preservation Using KNN Classific...
An Investigation of Data Privacy and Utility Preservation Using KNN Classific...
 
Data mining and privacy preserving in data mining
Data mining and privacy preserving in data miningData mining and privacy preserving in data mining
Data mining and privacy preserving in data mining
 
Seminar
SeminarSeminar
Seminar
 
Iaetsd a survey on one class clustering
Iaetsd a survey on one class clusteringIaetsd a survey on one class clustering
Iaetsd a survey on one class clustering
 
Privacy Preserving Data Mining
Privacy Preserving Data MiningPrivacy Preserving Data Mining
Privacy Preserving Data Mining
 
Bi4101343346
Bi4101343346Bi4101343346
Bi4101343346
 
Towards A Differential Privacy and Utility Preserving Machine Learning Classi...
Towards A Differential Privacy and Utility Preserving Machine Learning Classi...Towards A Differential Privacy and Utility Preserving Machine Learning Classi...
Towards A Differential Privacy and Utility Preserving Machine Learning Classi...
 
A Rule based Slicing Approach to Achieve Data Publishing and Privacy
A Rule based Slicing Approach to Achieve Data Publishing and PrivacyA Rule based Slicing Approach to Achieve Data Publishing and Privacy
A Rule based Slicing Approach to Achieve Data Publishing and Privacy
 
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...
Lit Review Talk - Signal Processing and Machine Learning with Differential Pr...
 

En vedette (20)

If3415111540
If3415111540If3415111540
If3415111540
 
Ie3415061510
Ie3415061510Ie3415061510
Ie3415061510
 
Ih3415471551
Ih3415471551Ih3415471551
Ih3415471551
 
Hw3414551459
Hw3414551459Hw3414551459
Hw3414551459
 
Hp3414121418
Hp3414121418Hp3414121418
Hp3414121418
 
Hn3414011407
Hn3414011407Hn3414011407
Hn3414011407
 
Ia3414771479
Ia3414771479Ia3414771479
Ia3414771479
 
Co33548550
Co33548550Co33548550
Co33548550
 
Cj33518522
Cj33518522Cj33518522
Cj33518522
 
Ck33523530
Ck33523530Ck33523530
Ck33523530
 
Dz33758761
Dz33758761Dz33758761
Dz33758761
 
Id3313941396
Id3313941396Id3313941396
Id3313941396
 
Io3116021612
Io3116021612Io3116021612
Io3116021612
 
Fj349961003
Fj349961003Fj349961003
Fj349961003
 
Fn3410321036
Fn3410321036Fn3410321036
Fn3410321036
 
Fk3410041018
Fk3410041018Fk3410041018
Fk3410041018
 
Do35641647
Do35641647Do35641647
Do35641647
 
Ah35190193
Ah35190193Ah35190193
Ah35190193
 
Dc32644652
Dc32644652Dc32644652
Dc32644652
 
Primeiras Páginas de Jornais do Mundo - domingo, 2005.04.02
Primeiras Páginas de Jornais do Mundo - domingo, 2005.04.02Primeiras Páginas de Jornais do Mundo - domingo, 2005.04.02
Primeiras Páginas de Jornais do Mundo - domingo, 2005.04.02
 

Similaire à Hy3414631468

IRJET- Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...
IRJET-  	  Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...IRJET-  	  Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...
IRJET- Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...IRJET Journal
 
A Comparative Study on Privacy Preserving Datamining Techniques
A Comparative Study on Privacy Preserving Datamining  TechniquesA Comparative Study on Privacy Preserving Datamining  Techniques
A Comparative Study on Privacy Preserving Datamining TechniquesIJMER
 
78201919
7820191978201919
78201919IJRAT
 
journal for research
journal for researchjournal for research
journal for researchrikaseorika
 
Multilevel Privacy Preserving by Linear and Non Linear Data Distortion
Multilevel Privacy Preserving by Linear and Non Linear Data DistortionMultilevel Privacy Preserving by Linear and Non Linear Data Distortion
Multilevel Privacy Preserving by Linear and Non Linear Data DistortionIOSR Journals
 
ANONYMIZATION OF PRIVACY PRESERVATION
ANONYMIZATION OF PRIVACY PRESERVATIONANONYMIZATION OF PRIVACY PRESERVATION
ANONYMIZATION OF PRIVACY PRESERVATIONpharmaindexing
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data miningeSAT Journals
 
Query Processing with k-Anonymity
Query Processing with k-AnonymityQuery Processing with k-Anonymity
Query Processing with k-AnonymityWaqas Tariq
 
A survey on privacy preserving data publishing
A survey on privacy preserving data publishingA survey on privacy preserving data publishing
A survey on privacy preserving data publishingijcisjournal
 
Privacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data SetsPrivacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data SetsIJERA Editor
 
Achieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsAchieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsIOSR Journals
 
51 privacy-preserving-publication-of-set-valued-data
51 privacy-preserving-publication-of-set-valued-data51 privacy-preserving-publication-of-set-valued-data
51 privacy-preserving-publication-of-set-valued-datakarunyaieeeproj
 
AN EFFICIENT SOLUTION FOR PRIVACYPRESERVING, SECURE REMOTE ACCESS TO SENSITIV...
AN EFFICIENT SOLUTION FOR PRIVACYPRESERVING, SECURE REMOTE ACCESS TO SENSITIV...AN EFFICIENT SOLUTION FOR PRIVACYPRESERVING, SECURE REMOTE ACCESS TO SENSITIV...
AN EFFICIENT SOLUTION FOR PRIVACYPRESERVING, SECURE REMOTE ACCESS TO SENSITIV...cscpconf
 
Paper id 212014109
Paper id 212014109Paper id 212014109
Paper id 212014109IJRAT
 
Ijeee 7-11-privacy preserving distributed data mining with anonymous id assig...
Ijeee 7-11-privacy preserving distributed data mining with anonymous id assig...Ijeee 7-11-privacy preserving distributed data mining with anonymous id assig...
Ijeee 7-11-privacy preserving distributed data mining with anonymous id assig...Kumar Goud
 

Similaire à Hy3414631468 (20)

130509
130509130509
130509
 
IRJET- Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...
IRJET-  	  Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...IRJET-  	  Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...
IRJET- Study Paper on: Ontology-based Privacy Data Chain Disclosure Disco...
 
A Comparative Study on Privacy Preserving Datamining Techniques
A Comparative Study on Privacy Preserving Datamining  TechniquesA Comparative Study on Privacy Preserving Datamining  Techniques
A Comparative Study on Privacy Preserving Datamining Techniques
 
78201919
7820191978201919
78201919
 
journal for research
journal for researchjournal for research
journal for research
 
Multilevel Privacy Preserving by Linear and Non Linear Data Distortion
Multilevel Privacy Preserving by Linear and Non Linear Data DistortionMultilevel Privacy Preserving by Linear and Non Linear Data Distortion
Multilevel Privacy Preserving by Linear and Non Linear Data Distortion
 
ANONYMIZATION OF PRIVACY PRESERVATION
ANONYMIZATION OF PRIVACY PRESERVATIONANONYMIZATION OF PRIVACY PRESERVATION
ANONYMIZATION OF PRIVACY PRESERVATION
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data mining
 
Query Processing with k-Anonymity
Query Processing with k-AnonymityQuery Processing with k-Anonymity
Query Processing with k-Anonymity
 
A survey on privacy preserving data publishing
A survey on privacy preserving data publishingA survey on privacy preserving data publishing
A survey on privacy preserving data publishing
 
Privacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data SetsPrivacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data Sets
 
S34119122
S34119122S34119122
S34119122
 
Ib3514141422
Ib3514141422Ib3514141422
Ib3514141422
 
F046043234
F046043234F046043234
F046043234
 
Achieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsAchieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logs
 
51 privacy-preserving-publication-of-set-valued-data
51 privacy-preserving-publication-of-set-valued-data51 privacy-preserving-publication-of-set-valued-data
51 privacy-preserving-publication-of-set-valued-data
 
AN EFFICIENT SOLUTION FOR PRIVACYPRESERVING, SECURE REMOTE ACCESS TO SENSITIV...
AN EFFICIENT SOLUTION FOR PRIVACYPRESERVING, SECURE REMOTE ACCESS TO SENSITIV...AN EFFICIENT SOLUTION FOR PRIVACYPRESERVING, SECURE REMOTE ACCESS TO SENSITIV...
AN EFFICIENT SOLUTION FOR PRIVACYPRESERVING, SECURE REMOTE ACCESS TO SENSITIV...
 
ϵ-DIFFERENTIAL PRIVACY MODEL FOR VERTICALLY PARTITIONED DATA TO SECURE THE PR...
ϵ-DIFFERENTIAL PRIVACY MODEL FOR VERTICALLY PARTITIONED DATA TO SECURE THE PR...ϵ-DIFFERENTIAL PRIVACY MODEL FOR VERTICALLY PARTITIONED DATA TO SECURE THE PR...
ϵ-DIFFERENTIAL PRIVACY MODEL FOR VERTICALLY PARTITIONED DATA TO SECURE THE PR...
 
Paper id 212014109
Paper id 212014109Paper id 212014109
Paper id 212014109
 
Ijeee 7-11-privacy preserving distributed data mining with anonymous id assig...
Ijeee 7-11-privacy preserving distributed data mining with anonymous id assig...Ijeee 7-11-privacy preserving distributed data mining with anonymous id assig...
Ijeee 7-11-privacy preserving distributed data mining with anonymous id assig...
 

Dernier

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 

Dernier (20)

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 

Hy3414631468

  • 1. D. Mohanapriya, Dr. T. Meyyappan / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 4, Jul-Aug 2013, pp.1463-1468 1463 | P a g e Slicing : A Efficient Method For Privacy Preservation In Data Publishing D. Mohanapriya*, Dr. T. Meyyappan** *(Resarch scholar ,Department of Computer Science and engineering, AlagappaUniversity, karaikudi ** (Professor, Department of Computer Science and engineering, AlagappaUniversity, karaikudi ABSTRACT In this paper we propose and prove a new technique called “Overlapping Slicing” for privacy preservation of high dimensional data. The process of publishing the data in the web, faces many challenges today. The data usually contains the personal information which are personally identifiable to anyone, thus poses the problem of Privacy. Privacy is an important issue in data publishing. Many organizations distribute non-aggregate personal data for research, and they must take steps to ensure that an adversary cannot predict sensitive information pertaining to individuals with high confidence. Recent work in data publishing information, especially for high dimensional data. Bucketization, on the other hand, does not prevent membership disclosure. We propose an overlapping slicing method for handling high into more than one column; we protect privacy by breaking the association of uncorrelated attributes and preserve data utility by preserving the association between highly correlated attributes. This technique releases mo correlations thereby, overlapping slicing preserves better data utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute Keywords :Overlapping slicing, privacy preservation, high dimensional data, privacy techniques I. INTRODUCTION Privacy preserving publishing of microdata has been studied extensively in recent years. Microdata contain records each of which contains information about an individual entity, such as a person, a household, or an organization. Several microdata anonymization techniques have been proposed. The most popular ones are generalization, for k-anonymity and bucketization for diversity. In both approaches, attributes are partitioned into three categories:  Some attributes are identifiers that can uniquely identify an individual, such as Name or Social Security Number.  Some attributes are Quasi Identifiers (QI), which the adversary may already know (possibly from other publicly available databases) and which, when taken together, can potentially identify an individual, e.g., Birthdate, Sex, and Zipcode.  Some attributes are Sensitive Attributes (SAs), which are unknown to the adversary and are considered sensitive, such as Disease and Salary. In both generalization and bucketization, one first removes identifiers from the data and then partitions tuples into buckets. The two techniques differ in the next step.Generalization transforms the QI-values in each bucket into “less specific but semantically consistent” values so that tuples in the same bucket cannot be distinguished by their QI values. In bucketization, one separates the SAs from the QIs by randomly permuting the SA values in each bucket. The anonymized data consist of a set of buckets with permuted sensitive attribute values. II. RELATED WORKS In this chapter we discuss about the literature survey and related works done in privacy preserving microdata and their techniques. The main disadvantage of Generalization is: it loses considerable amount of information, especially for high- dimensional data. And also, Bucketization does not prevent membership disclosure and does not apply for data that do not have a clear separation between quasi-identifying attributes and sensitive attributes. Generalization loses considerable amount of information, especially for high- dimensional data. Bucketizations do not have a clear separation between quasi-identifying attributes and sensitive attributes. C.Aggarwal [1] initially proposed On k- anonymity and curse of dimensionality concept. Where the author [1] proposed privacy preserving anonymization technique where a record is released only if it indistinguishable from k other entities of data. In this paper [1] the authors [1] show that when the data contains a large number of attributes which may be considered quasi-identifiers, it becomes difficult to anonymize the data without an unacceptably high amount of information loss. This is because an exponential number of combinations of dimensions can be used to make precise inference attacks, even when individual attributes are partially specified within a range. In this paper they provide an analysis of the effect of dimensionality on k- anonymity methods. They [1] conclude that when a data set contains a large number of attributes which
  • 2. D. Mohanapriya, Dr. T. Meyyappan / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 4, Jul-Aug 2013, pp.1463-1468 1464 | P a g e are open to inference attacks, and also the author [1] faced with a choice of either completely suppressing most of the data or losing the desired level of anonymity. Thus, the work showed that the curse of high dimensionality also applies to the problem of privacy preserving data mining. A. Blum[2] et.al., proposed a new framework for practical privacy and they named it as SULQ framework. They[2] consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping database rows to {0, 1}. The true answer is P i∈S f(di), and a noisy version is released as the response to the query. Results of Dinur, Dwork, and Nissim show that a strong form of privacy can be maintained using a surprisingly small amount of noise – much less than the sampling error – provided the total number of queries is sublinear in the number of database rows. We call this query and (slightly) noisy reply the SuLQ (Sub-Linear Queries) primitive. The assumption of sublinearity becomes reasonable as databases grow increasingly large. The authors [2] extend the work in two ways. First, they [2] modify the privacy analysis to real-valued functions f and arbitrary row types, as a consequence greatly improving the bounds on noise required for privacy. Second, they [2] examine the computational power of the SuLQ primitive. They [2] show that it is very powerful indeed, in that slightly noisy versions of the following computations can be carried out with very few invocations of the primitive: principal component analysis, k means clustering, the Perceptron Algorithm, the ID3 algorithm, and (apparently!) all algorithms that operate in the in the statistical query learning model. J. Brickell [3] introduced a new anonymization technique called the cost of privacy. In this work, Re-identification is a major privacy threat to public datasets containing individual records. Many privacy protection algorithms rely on generalization and suppression of "quasi-identifier" attributes such as ZIP code and birthdate. Their objective is usually syntactic sanitization: for example, k-anonymity requires that each "quasi- identifier" tuple appear in at least k records, while l- diversity requires that the distribution of sensitive attributes for each quasi-identifier have high entropy. The utility of sanitized data is also measured syntactically, by the number of generalization steps applied or the number of records with the same quasi- identifier. In this paper [3], query generalization and suppression of quasi-identifiers offer any benefits over trivial sanitization which simply separates quasi- identifiers from sensitive attributes. Previous work showed that k-anonymous databases can be useful for data mining, but k-anonymization does not guarantee any privacy. By contrast, we measure the tradeoff between privacy (how much can the adversary learn from the sanitized records?) and utility, measured as accuracy of data-mining algorithms executed on the same sanitized records. For our experimental evaluation, we use the same datasets from the UCI machine learning repository as were used in previous research on generalization and suppression. Our results demonstrate that even modest privacy gains require almost complete destruction of the data-mining utility. In most cases, trivial sanitization provides equivalent utility and better privacy than k-anonymity, l- diversity, and similar methods based on generalization and suppression. A multidimensional technique was proposed by B.C. Chen et. al [4], which they named as Skyline based technique. Privacy is an important issue in data publishing. I.Dinur [5] proposed another technique of revealing information while preserving privacy. The authors [5] examine the tradeoff between privacy and usability of statistical databases. Consider microdata such as census data andmedical data. Typically, microdata are stored in a table, and each record (row) corresponds to one individual. Each record has a number of attributes, which can be divided into the following three categories: 1. Identifier: Identifiers are attributes that clearly identify individuals. Examples include Social Security Number and Name. 2. Quasi-Identifier: Quasi-identifiers are attributes whose values when taken together can potentially identify an individual. Examples include Zip- code, Birthdate, and Gender. An adversary may already know the QI values of some individuals in the data. This knowledge can be either from personal contact or from other publicly available databases (e.g., a voter registration list) that include both explicit identifiers and quasi- identifiers. 3. Sensitive Attribute: Sensitive attributes are attributes whose values should not be associated with an individual by the adversary. Examples include Disease and Salary. TABLE1 – ORIGINAL TABLE Name Age Gender Zipcode Disease Ann 20 F 12345 AIDS Bob 24 M 12342 Flu Cary 23 F 12344 Flu Dick 27 M 12344 AIDS Ed 35 M 12412 Flu Frank 34 M 12433 Cancer Gary 31 M 12453 Flu Tom 38 M 12455 AIDS
  • 3. D. Mohanapriya, Dr. T. Meyyappan / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 4, Jul-Aug 2013, pp.1463-1468 1465 | P a g e TABLE 2: GENERALIZATION TABLE 3: BUCKETIZATION III. PROPOSED WORK In this paper, we present a novel technique called slicing for privacy-preserving data publishing. Our contributions include the following. First, we introduce slicing as a new technique for privacy preserving data publishing.Slicing has several advantages when compared with generalization and bucketization. It preserves better data utility than generalization. It preserves more attribute correlations with the SAs than bucketization. It can also handle high-dimensional data and data without a clear separation of QIs and SAs. Second, we show that slicing can be effectively used for preventing attribute disclosure, based on the privacy requirement of l-diversity. We introduce a notion called l-diverse slicing, which ensures that the adversary cannot learn the sensitive value of any individual with a probability greater than 1/l. Third, we develop an efficient algorithm for computing the sliced table that satisfies ldiversity. Our algorithm partitions attributes into columns, applies column generalization, and partitions tuples into buckets. Attributes that are highly correlated are in the same column; this preserves the correlations between such attributes. The associations between uncorrelated attributes are broken; the provides better privacy as the associations between such attributes are less- frequent and potentially identifying. Fourth, we describe the intuition behind membership disclosure and explain how slicing prevents membership disclosure. A bucket of size k can potentially match kc tuples where c is the number of columns. Because only k of the kc tuples are actually in the original data, the existence of the other kc − k tuples hides the membership information of tuples in the original data. Slicing partitions the dataset both vertically and horizontally. Vertical partitioning is done by grouping attributes into columns based on the correlations among the attributes. Each column contains a subset of attributes that are highly correlated. Horizontal partitioning is done by grouping tuples into buckets. Finally, within each bucket, values in each column are randomly permutated (or sorted) to break the linking between different columns. The basic idea of slicing is to break the association cross columns, but to preserve the association within each column. This reduces the dimensionality of the data and preserves better utility than generalization and bucketization. Slicing preserves utility because it groups highly correlated attributes together, and preserves the correlations between such attributes. Slicing protects privacy because it breaks the associations between uncorrelated attributes, which areinfrequent and thus identifying. Note that when the dataset contains QIs and one SA, bucketization has to break their correlation; slicing, on the other hand, can group some QI attributes with the SA, preserving attribute correlations with the sensitive attribute. Slicing retains improved data utility than generalization and can be recycled for membership exposure shield. Additional important benefit of slicing is that it can manage data with greater dimension. We depict how slicing can be recycled for attribute exposure protection and build an effective algorithm for calculating the sliced data that comply with the ℓ -diversity requisite. Slicing conserves enhanced utility than generalization and is more efficient than binning in assignments comprising the sensitive attribute. Slicing can be used to stop membership exposure. Age Gender Zipcode Disease [20-38] F 12*** AIDS [20-38] M 12*** Flu [20-38] F 12*** Flu [20-38] M 12*** AIDS [20-38] M 12*** Flu [20-38] M 12*** Cancer [20-38] M 12*** Flu [20-38] M 12*** AIDS Age Gender Zipcode Disease [20-27] * 1234* AIDS [20-27] * 1234* Flu [20-27] * 1234* Flu [20-27] * 1234* AIDS [35-38] * 124** Flu [35-38] * 124** Cancer [35-38] * 124** Flu [35-38] * 124** AIDS
  • 4. D. Mohanapriya, Dr. T. Meyyappan / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 4, Jul-Aug 2013, pp.1463-1468 1466 | P a g e Fig 1: Overall process diagram First, generalization for k-anonymity suffers from the curse of dimensionality. In order for generalization to be effective, records in the same bucket must be close to each other so that generalizing the records would not lose too much information. However, in high dimensional data, most data points have similar distances with each other, forcing a great amount of generalization to satisfy k-anonymity even forrelatively small k’s.While bucketization [14], [11], [7] has better data utility than generalization, it has several limitations.Bucketization does not prevent membership disclosure [14]. Because bucketization publishes the QI values in their original forms, an adversary can find out whether an individual has a record in the published data or not. attributes (Birthdate, Sex, and Zipcode). A microdata (e.g., census data) usually contains many other attributes besides those three attributes. This means that the membership information of most individuals can be inferred from the bucketized table. 1) Slicing Algorithms: An effective slicing algorithm to obtain ℓ - diverse slicing is offered. For a given a micro data table T and two factors c and ℓ , the algorithm calculates the sliced table that involves of c columns and gratifies the privacy requisite of ℓ - diversity. Our algorithm involves of three steps: attribute partitioning column generalization and tuple partitioning. The three phases are 1.1 Attribute Partitioning: Our algorithm divides attributes such that largely related attributes are in the same column. This is better for utility as well as privacy. With respect to privacy, the association of not related attributes shows more identification risks than that of the association of high related attributes since the association of unrelated attribute values is very less common and therefore more identifiable. Thus, it is good to split the associations among uncorrelated attributes to guard privacy. In this step, we first calculate the relations among pairs of attributes and then group attributes on the basis of their correlations. 1.2 Column Generalization Records are generalized to gratify certain minimum frequencyrequisite. 1.3 Tuple Partitioning In the tuple partitioning steps, records are divided into buckets. We change Mondrian algorithm for tuple partition. Not like Mondrian k-anonymity, no other generalization can be related to the records; we make use of the Mondrian for the reason of dividing tuples into buckets. 1.4 Membership Disclosure Protection Let us first inspect how a challenger can conclude membership data from binning. Since binning liberates the QI values in their real form and more individuals can be solely determined using the QI values, the challenger can easily settle the membership of single individual in the real data by inspecting the regularity of the QI values in the binned information. Precisely, if the regularity is 0, the challenger knows for certain that the individual is not in information. If the regularity is higher than 0, the challenger knows with good assurance that the individual is in the information, since this similar records must fit to that unique as nearly no further individual has the identical values of QI. The above perception advises that so as to defend data of members, it is necessary that, in the anonymized information, a record in the real information should have a same occurrence as a record which is not present in the original information. Or else, by investigating their occurrences in the data that is anonymized,the opponent can be able to distinguish records in the real information from records that are not present in the original information. 1.5 Sliced Data Another important advantage of slicing is its ability to handle high-dimensional data. By partitioning attributes into columns, slicing reduces the dimensionality of the data. Each column of the table can be viewed as a sub-table with a lower dimensionality. Slicing is also different from the Dataset Slicing Multiset Generaliz ation Multiset Attribute Update Process Column Generaliz ation Tuple PartitionDisproba bility Multiset Slicing Classifica tion accuracy
  • 5. D. Mohanapriya, Dr. T. Meyyappan / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 4, Jul-Aug 2013, pp.1463-1468 1467 | P a g e approach of publishing multiple independent sub- tables in that these sub-tables are linked by the buckets in slicing. TABLE 4 ORIGINAL DATA TABLE 5 SLICED DATA IV. ALGORITHM Our Algorithm of “Overlapping Slicing”, is presented below: 1. load dataset; 2. attribute partition and column 3. process tuple partition and buckets 4. slicing 5. undergo column generalization 6. do matching buckets 7. duplicate an attribute in more than one columns 8. end; V. VI SIMULATION WORKS/RESULTS We have simulated our system in Dot NET. We implemented and tested with a system configuration on Intel Dual Core processor, Windows XP and using Visual Studio 2008 (C#.net). We have used the following modules in our implementation part. The details of each module for this system are as follows: Fig 2: Load the dataset Fig 3: Execution of Generalization Process Fig 4 : Execution of Bucketization Process Fig 5 Resultant of Overlapping Slicing Name Age Gender Zipcode Disease Ann 20 F 12345 AIDS Bob 24 M 12342 Flu Cary 23 F 12344 Flu Dick 27 M 12344 AIDS Ed 35 M 12412 Flu Frank 34 M 12433 Cancer Gary 31 M 12453 Flu Tom 38 M 12455 AIDS (Age, Gender, Disease) (Zip-code, Disease) 20,F,Flu 12345,Flu 24,M,AIDS 12342,AIDS 23,F,AIDS 12344,AIDS 27,M,Flu 12344,Flu 35,M,Flu 12412,Flu 34,M,AIDS 12433,AIDS 31,M,Flu 12453,Flu 38,M,Cancer 12455,Cancer
  • 6. D. Mohanapriya, Dr. T. Meyyappan / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 4, Jul-Aug 2013, pp.1463-1468 1468 | P a g e VI. CONCLUSION & FUTURE WORK Thus from our theories and implementation we prove that Overlapping Slicing overcomes the limitations of existing techniques of generalization and bucketization and pre- serves better utility while protecting against privacy threats. Overlapping slicing to prevent attribute disclosure and membership disclosure. Overlapping Slicing preserves better data utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. The general methodology proposed by this work is that: before anonymizing the data, one can analyze the data characteristics and use these characteristics in data anonymization. As our future work we plan to design more effective tuple grouping algorithms. The trade-off between column generalization and tuple partitioning is the subject of future work. The design of tuple grouping algorithms is left to future work. REFERENCES [1] C. Aggarwal, “On k-Anonymity and the Curse of Dimensionality,” Proc. Int’l Conf. Very Large Data Bases (VLDB), pp. 901- 909, 2005. [2] A. Blum, C. Dwork, F. McSherry, and K. Nissim, “Practical Privacy: The SULQ Framework,” Proc. ACM Symp. Principles of Database Systems (PODS), pp. 128-138, 2005. [3] J. Brickell and V. Shmatikov, “The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing,” Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), pp. 70- 78, 2008. [4] B.-C. Chen, K. LeFevre, and R. Ramakrishnan, “Privacy Skyline: Privacy with Multidimensional Adversarial Knowledge,” Proc. Int’l Conf. Very Large Data Bases (VLDB), pp. 770-781, 2007. [5] H. Cramt’er, Mathematical Methods of Statistics. Princeton Univ. Press, 1948. [6] I. Dinur and K. Nissim, “Revealing Information while Preserving Privacy,” Proc. ACM Symp. Principles of Database Systems (PODS), pp. 202-210, 2003. [7] C. Dwork, “Differential Privacy,” Proc. Int’l Colloquium Automata, Languages and Programming (ICALP), pp. 1-12, 2006. [8] C. Dwork, “Differential Privacy: A Survey of Results,” Proc. Fifth Int’l Conf. Theory and Applications of Models of Computation (TAMC), pp. 1-19, 2008. [9] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating Noise to Sensitivity in Private Data Analysis,” Proc. Theory of Cryptography Conf. (TCC), pp. 265-284, 2006. [10] J.H. Friedman, J.L. Bentley, and R.A. Finkel, “An Algorithm for Finding Best Matches in Logarithmic Expected Time,” ACM Trans. Math. Software, vol. 3, no. 3, pp. 209-226, 1977. [11] B.C.M. Fung, K. Wang, and P.S. Yu, “Top- Down Specialization for Information and Privacy Preservation,” Proc. Int’l Conf. Data Eng. (ICDE), pp. 205-216, 2005. [12] G. Ghinita, Y. Tao, and P. Kalnis, “On the Anonymization of Sparse High-Dimensional Data,” Proc. IEEE 24th Int’l Conf. Data Eng. (ICDE), pp. 715-724, 2008.