
Governance compliance

Recipes for GDPR-friendly Data Science


  1. 1. www.kensu.io GOVERNANCE AND COMPLIANCE 1 Recipes for GDPR-friendly Data Science
  2. 2. www.kensu.io ANDY -|- KENSU 2 Andy Petrella - Founder @ Kensu. Maths MSc / Computer Science MSc. 10+ years in data computing (science?). http://kensu.io: Analytics & AI Governance (Analytics, Governance, Performance, Compliance)
  3. 3. www.kensu.io I. COMPLIANCE a. Data Privacy b. Risk c. Ethic x. How to guarantee compliance
  4. 4. www.kensu.io A. DATA PRIVACY Information privacy, also known as data privacy or data protection, is the relationship between the collection and dissemination of a. data, b. technology, c. the public expectation of privacy, d. legal and political issues surrounding them.[1] Privacy concerns exist wherever personally identifiable information or other sensitive information is collected, stored, used, and finally destroyed or deleted – in digital form or otherwise. Improper or non-existent disclosure control can be the root cause for privacy issues. https://en.wikipedia.org/wiki/Information_privacy
  5. 5. www.kensu.io A. DATA PRIVACY GDPR Each controller/processor shall maintain a record of processing activities under its responsibility (cf. Art. 30). That record shall contain information including: • the purposes of the processing • a description of the categories of data subjects and of the categories of personal data • etc. (a minimal sketch of such a record follows below)
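As a rough illustration (not from the deck), an Art. 30 record entry can be held as a small structured object; the field names below are assumptions based on the items listed above, not an official schema.

```python
# Minimal sketch of an Art. 30 "record of processing activities" entry.
# Field names are illustrative, not an official GDPR schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProcessingRecord:
    controller: str                      # controller/processor responsible for the activity
    purposes: List[str]                  # the purposes of the processing
    data_subject_categories: List[str]   # e.g. "customers", "employees"
    personal_data_categories: List[str]  # e.g. "contact details", "repayment history"
    recipients: List[str] = field(default_factory=list)  # who the data is disclosed to
    retention: str = ""                  # envisaged time limit for erasure

record = ProcessingRecord(
    controller="Acme Retail Ltd",
    purposes=["credit scoring"],
    data_subject_categories=["loan applicants"],
    personal_data_categories=["income", "repayment history"],
    recipients=["internal risk team"],
    retention="5 years after contract termination",
)
print(record)
```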
  6. 6. www.kensu.io A. DATA PRIVACY CaCPA: California Consumer Privacy Act of 2018. Prior to collecting Californians’ personal data, businesses must disclose in their privacy policy:
 “the categories of personal information to be collected and the purposes for which the categories of personal information shall be used”,
 with any additional uses requiring notice to the consumer.
  7. 7. www.kensu.io B. RISKS Risks are present wherever data is used: - Managing business risks with data - Building new data business https://www.eiuperspectives.economist.com/sites/default/files/RetailBanksandBigData.pdf
  8. 8. www.kensu.io B. RISKS Business’ risks… - Retail banks worry about credit risk:
 the imbalance between class sizes (defaulters <<< non-defaulters) generates overly optimistic scores (see the sketch below)…
 - Commercial banks focus on market risk:
 VaR and its variations require extensive backtesting
 - Investment banks are concerned about operational risk:
 just think about BCBS… govern, monitor, control!
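A toy illustration (assumed numbers, not from the deck) of why class imbalance yields overly optimistic scores: on a portfolio with ~1% defaulters, a classifier that never predicts default still reaches ~99% accuracy while catching no defaulter at all.

```python
# Why accuracy is overly optimistic on imbalanced credit data:
# a "model" that always predicts "non-defaulter" still scores ~99% accuracy.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.01).astype(int)  # ~1% defaulters
y_pred = np.zeros_like(y_true)                    # always predict "non-defaulter"

print("accuracy:", accuracy_score(y_true, y_pred))            # ~0.99
print("recall on defaulters:", recall_score(y_true, y_pred))  # 0.0
```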
  9. 9. www.kensu.io B. RISKS Intrinsic https://unicsoft.net/risks-data-science-project/
  10. 10. www.kensu.io B. RISKS Intrinsic Losers (records stolen): JP Morgan Chase 76,000,000 • Evernote 50,000,000 • eBay 145,000,000 • Target 70,000,000 • LinkedIn 117,000,000 • Yahoo 1,000,000,000
  11. 11. www.kensu.io B. RISKS Intrinsic Improper Analytics: one tiny mistake can ruin the whole project. Low Data Quality: even the most advanced analytics methods fail with incorrect data (see the sketch below)
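As an illustration of a minimal data-quality gate (the column names and rules are made up for the example), incoming data can be checked before any analytics runs on it:

```python
# Illustrative data-quality checks run before analytics; the rules are examples only.
import pandas as pd

def check_quality(df: pd.DataFrame) -> list:
    issues = []
    if df["customer_id"].isna().any():
        issues.append("missing customer_id")
    if df.duplicated(subset=["customer_id"]).any():
        issues.append("duplicate customer_id")
    if (df["age"].lt(18) | df["age"].gt(120)).any():
        issues.append("age out of expected range [18, 120]")
    return issues

df = pd.DataFrame({"customer_id": [1, 2, 2], "age": [34, 151, 45]})
print(check_quality(df))  # ['duplicate customer_id', 'age out of expected range [18, 120]']
```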
  12. 12. www.kensu.io C. ETHIC Data Ethics refers to systemising, defending, and recommending concepts of right and wrong conduct in relation to data, in particular personal data. Data ethics differs from information ethics: information ethics is more concerned with issues of intellectual property, while data ethics is more concerned with collectors and disseminators of structured or unstructured data such as data brokers, governments and large corporations. https://en.wikipedia.org/wiki/Big_data_ethics
  13. 13. www.kensu.io C. ETHIC WAT? http://rsta.royalsocietypublishing.org/content/roypta/374/2083/20160360.full.pdf Data ethics can be defined as the branch of ethics that studies and evaluates moral problems related to data (generation, recording, processing, dissemination, sharing and use), algorithms (artificial intelligence, artificial agents, machine learning, robots (well…)) and practices (responsible innovation, programming, hacking, professional codes), in order to formulate and support morally good solutions
  15. 15. www.kensu.io C. ETHIC WAT? The ethics of data focuses on ethical problems posed by the collection and analysis of large datasets, on issues ranging from the use of big data in biomedical research and social sciences to profiling, advertising, data philanthropy and open data
  16. 16. www.kensu.io C. ETHIC WAT? The ethics of algorithms addresses issues posed by the increasing complexity and autonomy of algorithms broadly understood, especially in the case of machine learning applications. Crucial challenges include moral responsibility and accountability of both designers and data scientists with respect to unforeseen and undesired consequences as well as missed opportunities.
  17. 17. www.kensu.io C. ETHIC WAT? The ethics of practices addresses the pressing questions concerning the responsibilities and liabilities of people and organizations in charge of data processes, strategies and policies, including data scientists’ work, to ensure ethical practices that foster the protection of data subjects’ rights.
  18. 18. www.kensu.io C. ETHIC Automated decision-making https://arxiv.org/pdf/1606.08813.pdf
  19. 19. www.kensu.io C. ETHIC Automated decision-making https://arxiv.org/pdf/1606.08813.pdf Non-discrimination Right to explanation
  20. 20. www.kensu.io C. ETHIC Automated decision-making https://arxiv.org/pdf/1606.08813.pdf Non-discrimination 1. Article 21 of the Charter of Fundamental Rights of the European Union 2. Article 14 of the European Convention on Human Rights 3. Articles 18-25 of the Treaty on the Functioning of the European Union.
  21. 21. www.kensu.io C. ETHIC Automated decision-making https://www.miamiherald.com/news/nation-world/national/article89562297.html Discrimination… can be unintended
  22. 22. www.kensu.io C. ETHIC Automated decision-making https://www.miamiherald.com/news/nation-world/national/article89562297.html Discrimination… can be unintended “Ingress players, like the database volunteers, appeared to skew male, young and English-speaking, […]. 
 Though the surveys did not gather data on race or income levels, the average player spent almost $80 on the Ingress game […] suggesting access to disposable income.”
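One common way to detect such unintended skew is to compare outcome rates across groups. The sketch below computes a disparate impact ratio on made-up data; the check and the 0.8 rule of thumb are standard practice, not something from the talk.

```python
# Disparate impact ratio: selection rate of the least-favoured group divided by
# the rate of the most-favoured group. Data below is synthetic.
import pandas as pd

df = pd.DataFrame({
    "group":    ["A"] * 100 + ["B"] * 100,
    "selected": [1] * 60 + [0] * 40 + [1] * 30 + [0] * 70,
})
rates = df.groupby("group")["selected"].mean()
ratio = rates.min() / rates.max()
print(rates.to_dict())                    # {'A': 0.6, 'B': 0.3}
print("disparate impact ratio:", ratio)   # 0.5, below the usual 0.8 rule of thumb
```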
  23. 23. www.kensu.io C. ETHIC Automated decision-making https://arxiv.org/pdf/1606.08813.pdf Right to explanation Profiling is inherently discriminatory:
 data subjects are grouped into categories and decisions are made on that basis. Moreover, as noted, machine learning can reify existing patterns of discrimination.
 Consequence: biased decisions are presented as the outcome of an “objective” algorithm.
  24. 24. www.kensu.io C. ETHIC Automated decision-making https://arxiv.org/pdf/1606.08813.pdf Right to explanation Standard supervised machine learning algorithms are based on discovering reliable associations to make predictions. There is no concern for causal reasoning or “explanation”
  25. 25. www.kensu.io C. ETHIC Automated decision-making https://arxiv.org/pdf/1606.08813.pdf Right to explanation For Burrell in How the machine “thinks”: Understanding opacity in machine learning algorithms, there are three barriers to transparency 1. Intentional hiding of the decision procedures by corporations 2. Source code is overly complex 3. Machine learning can reason in very high dimensions; human brains can’t
  26. 26. www.kensu.io X. HOW TO GUARANTEE COMPLIANCE a. Monitoring b. Automated Reporting
  27. 27. www.kensu.io X. HOW TO GUARANTEE COMPLIANCE In (data) engineering, processes have improved to satisfy the need for stability, quality and compliance by introducing: 1. logging 2. testing 3. continuous deployment Monitoring
  28. 28. www.kensu.io X. HOW TO GUARANTEE COMPLIANCE Data science projects are slightly different in nature from pure engineering projects, in that most issues come from the dynamic nature of experimentation and the volatility of the data. As a result, monitoring becomes key to AUTOMATED compliance! Monitoring
  29. 29. www.kensu.io X. HOW TO GUARANTEE COMPLIANCE For data projects, monitoring is about: - what/how data are used (e.g. data lineage, products, …) - what/how models are built (e.g. methods, metrics, …) - where/how data products are used (e.g. marketing, fraud, …) (see the sketch below) Monitoring
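A minimal sketch of the kind of activity metadata such monitoring could capture; the schema and the JSON-lines sink are illustrative assumptions, not the Kensu product API.

```python
# Record which datasets fed which model, how it was built, and what it is used for.
# Schema and storage are illustrative only.
import json
import time

def log_activity(inputs, output, method, metrics, purpose, path="data_activity.jsonl"):
    entry = {
        "timestamp": time.time(),
        "inputs": inputs,      # what/how data are used (lineage)
        "output": output,
        "method": method,      # what/how models are built
        "metrics": metrics,
        "purpose": purpose,    # where/how the data product is used
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

log_activity(
    inputs=["s3://raw/customers.parquet", "s3://raw/transactions.parquet"],
    output="models/churn-v3.pkl",
    method="gradient boosting",
    metrics={"auc": 0.87},
    purpose="marketing churn campaign",
)
```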
  30. 30. www.kensu.io X. HOW TO GUARANTEE COMPLIANCE Pursuing the parallel with engineering:
 CI/CD and Q/A are similar to our current compliance needs! Automated Reporting The automation of compliance can be approached with a set of rules to estimate the level of risk and to limit effort to actionable events only. Reporting is mandatory for compliance.
 Reports can be generated from the combination of monitored activities and established rules dictated by regulations (see the sketch below).
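Continuing the sketch above, a few rules can be evaluated over the monitored activity log to produce report findings; the rules and the allowed-purpose list are illustrative only.

```python
# Evaluate simple compliance rules against the activity log written by the
# previous sketch; each match becomes a finding in the report.
import json

ALLOWED_PURPOSES = {"credit scoring", "fraud detection"}  # illustrative whitelist

RULES = [
    ("undeclared purpose", lambda a: a.get("purpose") not in ALLOWED_PURPOSES),
    ("no recorded lineage", lambda a: not a.get("inputs")),
]

def build_report(log_path="data_activity.jsonl"):
    findings = []
    with open(log_path) as f:
        for line in f:
            activity = json.loads(line)
            for name, rule in RULES:
                if rule(activity):
                    findings.append({"rule": name, "activity": activity.get("output")})
    return findings

print(build_report())
# e.g. [{'rule': 'undeclared purpose', 'activity': 'models/churn-v3.pkl'}]
```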
  31. 31. www.kensu.io X. HOW TO GUARANTEE COMPLIANCE The Kensu way: Data Activity Manager Monitor Automated Registry Report
  32. 32. www.kensu.io II. GOVERNANCE a. Data in the Wild b. Effects of constraints x. How to govern
  33. 33. www.kensu.io A. DATA IN THE WILD Working on data is perceived as the Wild West. • Experimentations in highly dynamic environments (e.g. notebooks) • Local copy or duplication of datasets • Creation of intermediate dumps (models, prepared datasets)
  34. 34. www.kensu.io B. EFFECTS OF CONSTRAINTS Adding constraints (policies) to govern is a classic approach (rules, laws, …). For example: - predefine the set of needed data - list the methods to be used - create documents… and maintain them
  35. 35. www.kensu.io B. EFFECTS OF CONSTRAINTS The consequences of such constraints are: • Lack of freedom • Anonymization • what about the marketing use case? • what is the reliability of the process? • anonymization is actually itself a process to be listed! • Poor/slow reactivity to market changes (performance drop) … might not be best
  36. 36. www.kensu.io X. HOW TO GOVERN For compliance reasons, we have to introduce monitoring. Monitoring data opens new governance doors: - Govern data activities with a bottom-up approach - Control vs. Constrain In other words, data governance in a data-driven fashion
  37. 37. www.kensu.io X. HOW TO GOVERN The Kensu way: Data Activity Manager
  38. 38. www.kensu.io THANKS! http://kensu.io Analytics & AI Governance (Analytics, Governance, Performance, Compliance) Q/A Check out Kensu Data Activity Manager
