Nowadays, data management applications have evolved from the pure storage and retrieval of information to discovering interesting patterns and associations in large amounts of data. With advances in Internet and networking technologies, more and more computing applications, including data mining programs, must be conducted over multiple data sources scattered across different sites, which jointly carry out the computation to reach a common result. However, due to legal constraints and competitive pressures, privacy issues arise in distributed data mining, attracting interest from both the data mining and security research communities.
In this project, each party participates in a protocol to learn the output of some function f over the joint inputs of the parties. We mainly focus on the DNCC (Deterministic Non-Cooperative Computation) model rather than a probabilistic extension; the DNCC model also needs to be extended to cover the possibility of collusion.
2. ABSTRACT
• In many cases, competing parties who hold private data may collaboratively conduct privacy-preserving distributed data analysis (PPDA) tasks to learn beneficial data models or analysis results. The field of privacy has seen rapid advances in recent years because of increases in the ability to store data. In particular, recent advances in the data mining field have led to increased concerns about privacy.
• It is often highly valuable for organizations to have their
data analyzed by external agents. However, any program
that computes on potentially sensitive data risks leaking information through its output. Differential privacy provides a
theoretical framework for processing data while protecting
the privacy of individual records in a dataset.
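The deck mentions differential privacy only in passing; as an illustration of the idea, the following is a minimal sketch of the standard Laplace mechanism for a counting query (sensitivity 1). The function names, the example data, and the parameter choices are assumptions for illustration, not part of this project's implementation.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5          # uniform on (-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon, rng):
    """Release a count query with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

ages = [23, 35, 41, 29, 52, 61, 34, 47]   # hypothetical sensitive records
rng = random.Random(0)                     # fixed seed for reproducibility
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng)
print(noisy)
```

The released value is the true count (here 4) perturbed by noise whose magnitude shrinks as the privacy budget epsilon grows.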
3. EXISTING SYSTEM
• SECURE MULTIPARTY COMPUTATION
• Definition:
In the existing approach, we generally assume that participating parties provide truthful inputs. This assumption is usually justified by the fact that learning the correct data analysis models or results is in the best interest of all participating parties. If a party does not want to learn the data models and analysis results, it should not participate in the protocol.
4. PROPOSED SYSTEM
• The term incentive compatible means that participating parties have an incentive to provide their actual inputs when they compute the functionality. Although SMC-based privacy-preserving data analysis protocols (under the malicious adversary model) can prevent participating parties from modifying their inputs once the protocols are initiated, they cannot prevent the parties from modifying their inputs before execution. On the other hand, parties are expected to provide their true inputs to correctly evaluate a function that satisfies the NCC model. Therefore, any functionality that satisfies the NCC model is inherently incentive compatible, under the assumption that participating parties prefer to learn the function result correctly and, if possible, exclusively. The question, then, is which functionalities or data analysis tasks satisfy the NCC model.
5. ADVANTAGES IN PROPOSED SYSTEM
• Each of the prior approaches deals with the problem of ensuring truthfulness in data mining. However, each one requires the ability to verify the data after the calculation.
• Although verification based techniques are
very useful, there are cases where verification
is not feasible due to legal, social and privacy
concerns.
7. Module Description
• USER INTERFACE DESIGN:
•
In this module we create a user page with a Graphical User Interface (GUI), which serves as the medium connecting the user to the server: the client can send requests to the server, and the server can send responses back to the client. Through this module we establish the communication between client and server via a web page.
•
A GUI is a program interface that takes advantage of the computer's graphics capabilities to make the program easier to use. Well-designed graphical user interfaces can free the user from learning complex command languages. On the other hand, many users find that they work more effectively with a command-driven interface, especially if they already know the command language. The goal of a GUI is to enhance the efficiency and ease of use of the underlying logical design of a stored program. Thus the user interacts with information by manipulating visual widgets that allow interactions appropriate to the kind of data they hold; the widgets of a well-designed interface are selected to support the actions necessary to achieve the goals of the user.
8. Module Description (contd.)
• CREATE MULTIPLE ORGANIZATIONS:
This is the second module of our project. Here we design a number of parties. Each party has information to store in its own database, and all parties send their inputs to the data analysis module. All n parties send their inputs to a single data analysis site, which stores those inputs as either horizontal or vertical partitions.
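The flow above, n parties each holding a private database and submitting inputs to one data analysis site, can be sketched as follows. The class and method names are illustrative assumptions; the real protocol would transfer protected shares rather than raw records.

```python
class Party:
    """One participating organization with its own private database."""
    def __init__(self, name, records):
        self.name = name
        self._records = records   # private data, held locally

    def send_input(self):
        # In the actual protocol this would be a secure/encrypted share;
        # here the party simply releases a copy of its local records.
        return list(self._records)

class DataAnalysis:
    """Single analysis site that collects the inputs of all n parties."""
    def __init__(self):
        self.inputs = {}

    def collect(self, party):
        self.inputs[party.name] = party.send_input()

parties = [Party("P1", [10, 20]), Party("P2", [30]), Party("P3", [40, 50, 60])]
analysis = DataAnalysis()
for p in parties:
    analysis.collect(p)

print(sorted(analysis.inputs))    # names of the parties whose inputs arrived
```

Whether the collected inputs are then stored as horizontal or vertical partitions is a choice made per party, as described in the next module.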
9. Module Description (contd.)
• DATA ANALYSIS AND INTEGRATION:
This is the third module of our project. Our data analysis is designed using cryptographic techniques. Data are generally assumed to be either vertically or horizontally partitioned. In the case of horizontally partitioned data, different sites collect the same set of information about different entities. In the case of vertically partitioned data, we assume that different sites collect information about the same set of entities. A party can store its input data in either a vertical or a horizontal partition: with horizontal partitioning, each site holds the same attributes for many different individuals; with vertical partitioning, each site holds different attributes for the same set of individuals.
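The two partition schemes can be made concrete with a small sketch. The table contents and function names below are illustrative assumptions; the point is only the row-wise versus column-wise split.

```python
# Full (virtual) table: rows are individuals, columns are attributes.
table = {
    "alice": {"age": 30, "income": 50, "zip": 10001},
    "bob":   {"age": 41, "income": 70, "zip": 10002},
    "carol": {"age": 25, "income": 40, "zip": 10003},
    "dave":  {"age": 52, "income": 90, "zip": 10004},
}

def horizontal_partition(table, row_groups):
    """Each site holds the SAME attributes for DIFFERENT individuals."""
    return [{r: table[r] for r in group} for group in row_groups]

def vertical_partition(table, col_groups):
    """Each site holds DIFFERENT attributes for the SAME individuals."""
    return [{r: {c: table[r][c] for c in group} for r in table}
            for group in col_groups]

h1, h2 = horizontal_partition(table, [["alice", "bob"], ["carol", "dave"]])
v1, v2 = vertical_partition(table, [["age"], ["income", "zip"]])
```

Site `h1` sees complete records for alice and bob only, while site `v1` sees the age of every individual but nothing else.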
10. Module Description (contd.)
• Inputs computation model
•
This is the fourth module of our project. This model is designed to compute over the truthful inputs of all participating parties, under two assumptions: first, the top priority of every participating party is to learn the correct result; second, if possible, every participating party prefers to learn the correct result exclusively.
11. Module Description (contd.)
• ASSOCIATION DATA MINING
• This is the last module of our project. Our data mining module summarizes association rule mining and analyzes whether association rule mining can be done in an incentive-compatible manner over a horizontally or vertically partitioned database. When a query is received, the module determines whether the requested data lie in a horizontal or a vertical partition, retrieves the result from that partition, and sends the result to the requesting party.
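As a reminder of what association rule mining computes, here is a minimal support/confidence sketch over pairs of items. The transactions, thresholds, and function names are illustrative assumptions, not the project's algorithm over partitioned data.

```python
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Find rules a -> b over item pairs meeting both thresholds."""
    items = set().union(*transactions)
    found = []
    for a, b in combinations(sorted(items), 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue
        conf = s / support({a}, transactions)   # confidence of a -> b
        if conf >= min_confidence:
            found.append((a, b, round(s, 2), round(conf, 2)))
    return found

rules = pair_rules(transactions)
print(rules)
```

In a distributed setting each party would contribute partial supports from its partition; this sketch only shows the target computation on the union of the data.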
12. TECHNIQUE USED
ASSOCIATION RULEMINING ALGORITHM
The above definition simply states which functions can be computed deterministically in the NCC setting (i.e., the computation result is correct with probability one): no party can correctly compute the result once that party lies about its inputs in a way that changes the original function result. In other words, if a party i replaces its true input v_i with v'_i and f(v'_i, v_-i) ≠ f(v_i, v_-i), then party i should not be able to calculate the correct f(v_i, v_-i) from f(v'_i, v_-i) and v_i. Note that a strategy (t_i, g_i) consists of the way the input is modified, denoted by t_i, and the way the output is calculated, denoted by g_i: t_i can be considered as choosing a value different from the actual input, and g_i as the way the correct μ and s² are computed. Another implication of the above definition is that for any t_i, the corresponding g_i should be deterministic, because each party wants to compute exactly the "correct" result.
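A short worked example shows why publishing the mean μ and population variance s² fails this definition: a lying party, knowing both its reported value and its true value, can invert the published statistics and recover the correct result anyway. The variable names and numbers are illustrative assumptions.

```python
def analysis(inputs):
    """The published result: population mean and variance of all inputs."""
    n = len(inputs)
    mu = sum(inputs) / n
    s2 = sum(x * x for x in inputs) / n - mu * mu   # E[x^2] - mu^2
    return mu, s2

others = [4.0, 7.0, 9.0]        # honest parties' inputs (unknown to party i)
v_true, v_lie = 6.0, 100.0      # party i's true input and its lie

# Published result, computed on the lied input:
mu_pub, s2_pub = analysis(others + [v_lie])

# Party i inverts the published statistics using only v_lie and v_true:
n = len(others) + 1
s1 = n * mu_pub - v_lie                           # sum of the others' inputs
sq = n * (s2_pub + mu_pub ** 2) - v_lie ** 2      # sum of their squares
mu_rec = (s1 + v_true) / n                        # recovered true mean
s2_rec = (sq + v_true ** 2) / n - mu_rec ** 2     # recovered true variance

mu_true, s2_true = analysis(others + [v_true])
```

Because the liar computes the exact correct (μ, s²) from the distorted output, this functionality is not DNCC: the strategy (t_i, g_i) with the inversion above as g_i beats truth-telling under the exclusivity preference.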
13. • A two-party protocol is proposed to securely compute JC.
The protocol consists of two stages.
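Assuming JC here denotes the Jaccard coefficient (this is an assumption; the slide does not expand the abbreviation), the target functionality of the two-party protocol can be stated in plain form as follows. This sketch shows only what is computed, not the secure two-stage protocol itself.

```python
def jaccard(set_a, set_b):
    """JC(A, B) = |A ∩ B| / |A ∪ B|, for nonempty A ∪ B."""
    inter = len(set_a & set_b)
    union = len(set_a | set_b)
    return inter / union

# Hypothetical item sets held by the two parties:
a = {"x1", "x2", "x3", "x4"}
b = {"x3", "x4", "x5"}
print(jaccard(a, b))
```

A secure protocol would let the parties learn this ratio without revealing their sets to each other, typically by privately computing the intersection size first.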
15. System Architecture Description
• The above diagram contains Client Login, Database, Work Allocation, Worker Page, Computing, Reporting, and Work Grouping. First, the computation node starts running. A party node then enters a user name and password, which are validated by the compatible node. The computation node assigns work to the data mining nodes; each data mining node finishes its work and reports back to the compatible node. The TTP collects the parties' inputs and groups the inputs submitted by the party nodes for a particular piece of work.
16. USE CASE DIAGRAM
[Diagram: actors party1, party2, and party3 supply private inputs to the TTP, which computes over the input data; use cases include evaluating the function over the joined inputs under the NCC model and storing data in vertical or horizontal partitions for data mining.]
18. SEQUENCE DIAGRAM
[Diagram: the parties store their data in either vertical or horizontal partitions and send their inputs to data analysis; all inputs are computed and the different parties' inputs are stored; the requested data are sent to the NCC model and rule mining, and a response is returned.]
28. Conclusion
• Even though privacy-preserving data analysis techniques guarantee that nothing other than the final result is disclosed, whether or not participating parties provide truthful input data cannot be verified. In this paper, we have investigated what kinds of PPDA tasks are incentive compatible under the NCC model. Based on our findings, there are several important PPDA tasks that are incentive compatible. Table II classifies the common data analysis tasks studied in this paper into DNCC or Non-DNCC categories. Most often, the data partition scheme makes the difference in determining the DNCC or Non-DNCC classification.
Editor's notes
The above diagram contains actors such as the parties, the NCC model, and the data mining model; the remaining elements are use cases such as providing inputs and evaluating the function over the joined inputs. A party sends its inputs to the NCC actor, which assigns the work to the TTP; the TTP computes over all the parties' inputs and sends the function result back to every party. When a party later sends a request related to its input, the model assigns that work to data mining.
The above diagram contains classes such as the parties, the NCC model, data mining, and the competitive model. The data mining class stores the parties' information in either vertical or horizontal partitions. After choosing how to store their data, all parties send their input data to the NCC class, which assigns the work to the TTP; the TTP computes over all the input information and sends the results back to the parties.
The sequence diagram depicts the objects and classes involved in the scenario and the sequence of messages exchanged between them. After both the party and the data analysis log in, data mining first identifies the party nodes and then the data analysis nodes; the next steps are splitting the work, allocating it, and sending the inputs to the TTP nodes. The work is then computed, and the finished work is finally reported back to data mining.
Activity diagrams are graphical representations of workflows of stepwise activities and actions, with support for choice, iteration, and concurrency; an activity diagram shows the overall flow of control. The above diagram describes the activity processed in the party node. The first dot represents the starting point, where we start the party node. The flow moves to the validation process; if validation fails, it returns to the login page and stops. If validation succeeds, the data analysis activity continues, and the data mining activity identifies the requested inputs and their distribution.