1. Ethics in DW & DM. R. Abethan (PGM-IT10-0410), MSc IT, Sri Lanka Institute of Information Technology, 9 Oct 2010
2. Ethic: The rules of conduct recognized in respect to a particular class of human actions or a particular group, culture, etc.
3. Road Map: What can data mining do? Why ethics in DW & DM? Who is responsible? Ethically speaking. Summary. Conclusion.
4. What can data mining do? Data mining provides correlations, market basket analysis, neural networks, and other advanced artificial intelligence (AI) techniques, allowing discovery of patterns and relationships that were not previously visible. Data mining works because the more information it has at its disposal, the higher the level of confidence it produces.
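As a rough illustration of the market basket analysis named on this slide, the sketch below computes the support and confidence of a single rule ("customers who buy bread also buy milk"). The transactions, item names, and function names are hypothetical and are not taken from the slides; "confidence" here is the standard rule metric, which is only loosely related to the general sense used above.

```python
# Hypothetical toy transactions; not data from the presentation.
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never applies, so report zero rather than divide by zero
    return support(antecedent | consequent, transactions) / base


if __name__ == "__main__":
    rule_conf = confidence(frozenset({"bread"}), frozenset({"milk"}), TRANSACTIONS)
    print(f"confidence(bread -> milk) = {rule_conf:.2f}")
```

With more transactions, estimates like these become more stable, which is one way to read the slide's point about volume.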
5. Why Ethics in DW & DM? Data is sensitive. When applied to people, DW and DM are frequently used to discriminate: who gets the loan, who gets the special offer, and so on. Certain kinds of discrimination (racial, sexual, religious, and so on) are not only unethical but also illegal.
6. Who is responsible? The project manager is responsible for providing the tools that the business uses to gain new insights. “The project manager should worry about what uses the data will be put to within the organization; they have a need to establish different layers/gatekeepers and qualifications on who has access to the information.” “The task of deciding what is ethical usage and what is not falls on focus groups of business users to look at nomenclature, access and security.” (Dr. Donald Burton, Executive Director, The International Import Export Institute)
7. Without these considerations… There is a chance that end users may have access to information that they should not be examining. Without knowing it, an end user may break federal regulations, state laws, or worse.
8. Ethically Speaking… The implementers of the technology are simply told to integrate the data, and the project manager builds a project to make it happen (with the support of the business). In the future, as ethical concerns become a hot topic in local governments, it will become more important that implementers and project managers ask the business users to supply documents that outline access, roles, and ethical uses of the information those users will receive.
9. There are also ethical considerations around the use of basic ETL processes and BI tools with small data sets. Ethical considerations abound when small data sets are moved from source systems to target systems for testing purposes. A data set doesn’t have to be large to be an ethical concern, although large data sets lend themselves to a particular host of ethical problems such as profiling and segmentation: users learn things they shouldn’t know, and in some cases aren’t allowed to know (especially in classified areas).
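One possible way to reduce the risk of moving real records into a test system is to pseudonymize sensitive fields before the copy. The sketch below is an assumption, not a method described in the slides: the field names, the secret key, and the helper names are hypothetical, and a keyed hash hides the original values but is not full anonymization.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept outside the test environment.
SECRET_KEY = b"replace-me"
# Assumed names of the sensitive columns.
SENSITIVE_FIELDS = {"name", "email"}


def pseudonymize(value):
    """Replace a value with a keyed hash: stable for joins, but the original is hidden."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_row(row):
    """Return a copy of row with the sensitive fields pseudonymized."""
    return {
        key: pseudonymize(str(val)) if key in SENSITIVE_FIELDS else val
        for key, val in row.items()
    }


if __name__ == "__main__":
    source_row = {"name": "Alice", "email": "alice@example.com", "balance": 100}
    print(mask_row(source_row))
```

Because the same input always maps to the same token, masked tables can still be joined in the test environment without exposing the underlying values.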
10. The PM must decide which publicly available information is acceptable to integrate and which is a potentially risky proposition (one that, once integrated, may raise ethical concerns), e.g. Yahoo subscribers broken down by religion. End users may begin to ask the warehousing team to integrate external data sources such as stock trades, financial portfolio information, newsletter, and Yahoo subscription information, all of which is public (to a degree).
11. Summary… Data mining and data warehousing raise ethical and legal issues. Combining information via data warehousing could violate the Privacy Act; people must be told how their information will be used when the data is obtained. Data mining raises ethical issues mainly during the application of results, e.g. using ethnicity as a factor in loan approval decisions; screening job applications based on age or sex (where not directly relevant); or declining insurance coverage based on neighbourhood if this is related to race (“red-lining” is illegal in much of the US). Whether something is ethical depends on the application: e.g. it is probably ethical to use ethnicity to diagnose and choose treatments for a medical problem, but not to decline medical insurance.
12. Checklist for Project Manager and Technology Implementers… Develop SLAs with end users that define who has access to what levels of information. Have end users involved in defining the ethical standards of use for the data that will be delivered. Define the bounds around the integration of public data (where it will be integrated and where it will not) so as to avoid conflicts of interest. Do not use “live” or real data for testing purposes, or else lock down the test environment; too often test environments are left wide open and accessible to too many individuals. Define where, how, and by whom data mining will be used, and restrict the mining efforts to specific sets of information. Build a notification system to monitor data mining usage. Allow customers to “block” the integration of their own information (this one is questionable), depending on whether the customer information will be made available on the web after integration. Remember that any efforts made are still subject to governmental laws.
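Two of the checklist items, restricting mining to specific sets of information and monitoring data mining usage, can be sketched together as a small access check that records every request. This is a minimal sketch under stated assumptions: the roles, data set names, and function names are hypothetical, and a real deployment would rely on the warehouse's own access controls and alerting.

```python
import logging

# Hypothetical mapping of roles to the data sets each may mine.
ROLE_ACCESS = {
    "analyst": {"sales_summary"},
    "risk_officer": {"sales_summary", "loan_applications"},
}

audit_log = logging.getLogger("mining_audit")


def can_mine(role, dataset):
    """True if the role is explicitly allowed to mine the data set."""
    return dataset in ROLE_ACCESS.get(role, set())


def request_mining(user, role, dataset, events):
    """Check access, record the request in events, and warn on a denial."""
    allowed = can_mine(role, dataset)
    events.append({"user": user, "role": role, "dataset": dataset, "allowed": allowed})
    if not allowed:
        # Stand-in for the notification system mentioned in the checklist.
        audit_log.warning("Denied: %s (%s) requested %s", user, role, dataset)
    return allowed


if __name__ == "__main__":
    history = []
    request_mining("u1", "analyst", "loan_applications", history)
    request_mining("u2", "risk_officer", "loan_applications", history)
    print(history)
```

Access is denied by default for unknown roles, so a new role gains nothing until someone deliberately grants it a data set.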
13. Conclusion: It is a challenging quest to maintain balance, control and security over our ever-growing data sets. It’s also our duty to examine the ethical consequences of the business decisions we make through the use of that information. Finally, we must consider the quality of the information we are basing our decisions on. Incorrect information can harm more than it can help.