Business added value of GDPR's accountability principles

Through the lens of Data Science Governance and the data (science) driven enterprise.
How to adapt your business to renew and disrupt your market

Published in: Data & Analytics

  1. www.kensu.io DATA SCIENCE GOVERNANCE: Turn GDPR’s accountability principles into added value for your business. Big Data Spain, 2017
  2. ANDY PETRELLA: KENSU & ME. CEO & Founder; Mathematics and Computer Science. Spark Notebook, O’Reilly Training, O’Reilly Book. Kensu started in Belgium by building an enterprise stack for Data Scientists (Agile Data Science Toolkit), pivoted on an internal component (the Data Science Catalog), and now focuses on Data Science Governance. Accelerated by Alchemist Accelerator in San Francisco and The Faktory in Belgium. Kensu Inc. in October!
  3. TOPICS: 1. Some thoughts on “Data Science” 2. Data Science Governance: What 3. Data Science Governance: How 4. GDPR: Accountability principle and transparency 5. Business advantages
  4. SOME THOUGHTS ON “DATA SCIENCE”
  5. MACHINE LEARNING. Pioneers in the 1950s. AI winter in the 1970s due to pessimism. Resurgence in the 1980s. Machine learning (and related techniques) has been in use since the 1990s (esp. SVM and RNN). Deep learning sees widespread commercial use in the 2000s. Machine learning receives great publicity (read: buzz) in the 2010s. Ref: https://en.wikipedia.org/wiki/Timeline_of_machine_learning
  6. DATA SCIENCE: +ENGINEERING. Claim: “Data Scientist” was coined by DJ Patil in 2008, pretty much when Machine Learning became part of software. In a way, that is when we added “engineering” to the mix. Engineering is even more prominent with Big Data and Distributed Computing.
  7. DATA SCIENCE: +EXPERIMENTATION. So much data available. So many tools, libraries, frameworks, … So many things we can try. We have distributed computing now, right? => Let’s try everything. Discover new insights (and potentially new businesses).
  8. DATA SCIENCE: RECAP. Maths: stats, machine learning and so on. Engineering: ETL, databases, computing frameworks, software, platforms, … Creativity: “From business intelligence to intelligent business” (Michael Fergusson). Data Science is an umbrella over all activities on data.
  9. DON’T BELIEVE ME? https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
  10. DATA SCIENCE GOVERNANCE: WHAT
  11. DATA PIPELINE. A data pipeline connects activities on data, potentially involving several technologies. A pipeline is generally thought of as an end-to-end processing line that solves one problem. But parts of pipelines are reused to save computation, storage, time, … Thus the interdependency between pipeline segments grows with the number of initiatives.
  12. GOAL: MAKE DECISIONS. Data pipelines, connected together, aren’t created for the beauty of it. The ultimate goal is always to make decisions. Decisions are generally made by, or linked to, humans with responsibilities (even for self-driving cars, in case of a problem). Given that pipelines are cut and rewired, interleaved, … how can we not be anxious when deploying the last piece used by the decision maker?
  13. SOURCES OF ANXIETY. What if: • one of the data sources used in the process suddenly shows different patterns? • one of the tools, projects or similar is modified upstream? • the insights deviate from reality? • …
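
The first "what if" on this slide (data that suddenly shows different patterns) can be illustrated by comparing a new batch against a reference batch. The following is a minimal sketch, not something shown in the talk: the function names `mean_shift` and `has_drifted`, the threshold of 3 standard deviations, and the sample values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def mean_shift(reference, batch):
    """Distance between the two means, in reference standard deviations."""
    spread = stdev(reference)
    gap = abs(mean(batch) - mean(reference))
    if spread == 0:
        # A constant reference: any change at all counts as an infinite shift.
        return 0.0 if gap == 0 else float("inf")
    return gap / spread


def has_drifted(reference, batch, threshold=3.0):
    """Flag a batch whose mean moved more than `threshold` deviations (assumed cut-off)."""
    return mean_shift(reference, batch) > threshold


# Hypothetical daily order counts: one reference week and two new batches.
reference = [10, 11, 9, 10, 12, 8, 10, 11]
similar_batch = [10, 9, 11, 10]
shifted_batch = [20, 21, 19, 22]
```

A real monitor would likely track more than the mean (null rates, distributions, cardinalities), but the principle of comparing production data against a known baseline stays the same.
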
  14. DEBUGGING? To reduce the anxiety or, actually, the risks, we need ways to debug. In pure engineering we have unit, functional and integration tests, … but what do we do when the problems come from the data themselves? We can’t generate every case of data variation, right? How to debug? Without the big picture, we may spend weeks optimising a model for nothing.
  15. DATA SCIENCE GOVERNANCE. Data governance: controlling that data meets precise standards, which involves monitoring against production data. Data Science Governance: controlling that data activity meets precise standards, which involves monitoring against production data activity. A Data Activity is described by at least: technologies, users, systems, data, processing.
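
The five elements that describe a Data Activity can be captured in a small record type. This is only a sketch of one possible shape, not Kensu's actual model: the class name `DataActivity`, the split of "data" into `inputs` and `outputs`, the `touches` helper, and the example values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataActivity:
    """One record of who did what, on which data, and with which tools."""

    technologies: tuple  # libraries or engines involved
    users: tuple         # people or service accounts running the activity
    systems: tuple       # where the data lives or the job runs
    inputs: tuple        # "data" from the slide, split into what is read...
    outputs: tuple       # ...and what is produced (an assumption of this sketch)
    processing: str      # short description of what the activity does

    def touches(self, dataset: str) -> bool:
        """True if the activity reads or writes the given dataset."""
        return dataset in self.inputs or dataset in self.outputs


# Hypothetical example: a model trained from a table stored in a database.
training = DataActivity(
    technologies=("TensorFlow",),
    users=("alice",),
    systems=("Oracle",),
    inputs=("customers",),
    outputs=("churn_model",),
    processing="train churn model",
)
```

Making the record immutable (`frozen=True`) is a design choice for this sketch: a collected activity is a fact about the past and should not be edited after the event.
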
  16. GOVERNING DATA SCIENCE. Who does what, on which data, and where is it done? What is the impact of a process on the global system? What are the performance metrics (quality, execution, …) of the processes?
  17. CONTINUOUS INTEGRATION FOR DATA SCIENCE. Data Scientists/Citizens have a view of all the activities applied to the original sources used in their own processes. They also have control over their own results in production. They have the opportunity to analyse and debug a pipeline involving all activities: • independently of the technologies • involving several people in the enterprise
  18. DATA SCIENCE GOVERNANCE: HOW
  19. CHALLENGES. So many tools are using data! The number of processes is growing impressively. We have to take care of the legacy…
  20. GET THE DATA. As usual, we have to collect the right data to make the right decision. First, run an assessment to create a high-level map of all the tools involved in a company. For each tool, do whatever it takes to collect information about the activities it creates. This information includes metadata, lineage, statistics, accuracy measures, …
  21. CONNECT THE DATA. Data Science Governance needs the global picture. To get it, we need to connect all the data that can be collected, so that it is possible to create a cartography of all ongoing processes. This map tracks all data and their descendants.
  22. USE THE DATA. This is where the fun part starts… The map of data activities is an amazing source of information. Here are a few things you can do with this kind of data: • impact analysis • dependency analysis • optimisation • recommendation
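
Impact analysis and dependency analysis can both be read off the map of data activities once it is stored as a graph. The sketch below assumes each collected activity is reduced to a pair (input datasets, output datasets); the function names and the dataset names are hypothetical and only illustrate the idea.

```python
from collections import defaultdict, deque


def build_lineage(activities):
    """Map each dataset to the datasets derived directly from it."""
    children = defaultdict(set)
    for inputs, outputs in activities:
        for source in inputs:
            children[source].update(outputs)
    return children


def invert(graph):
    """Flip the edges: map each dataset to its direct sources."""
    parents = defaultdict(set)
    for source, targets in graph.items():
        for target in targets:
            parents[target].add(source)
    return parents


def reachable(graph, start):
    """Breadth-first search for every node reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def impact(graph, dataset):
    """Impact analysis: everything downstream of `dataset`."""
    return reachable(graph, dataset)


def dependencies(graph, dataset):
    """Dependency analysis: everything upstream of `dataset`."""
    return reachable(invert(graph), dataset)


# Hypothetical activities, each given as (inputs, outputs).
lineage = build_lineage([
    (["raw_orders"], ["clean_orders"]),
    (["clean_orders", "customers"], ["features"]),
    (["features"], ["churn_model"]),
])
```

With this map, `impact(lineage, "raw_orders")` lists what could break if the raw orders change, and `dependencies(lineage, "churn_model")` lists what the model ultimately relies on.
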
  23. GDPR: General Data Protection Regulation
  24. ACCOUNTABILITY PRINCIPLE. Implement appropriate technical and organisational measures that ensure and demonstrate that you comply. This may include internal data protection policies such as staff training, internal audits of processing activities, and reviews of internal HR policies.
  25. TRANSPARENCY. As well as your obligation to provide comprehensive, clear and transparent privacy policies, if your organisation has more than 250 employees, you must maintain additional internal records of your processing activities.
  26. ACCOUNTABILITY: DATA SCIENCE GOVERNANCE. To govern data science, we have to: • collect activities • connect activities. With this information we can reliably and automatically create the process registry.
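
Once activities are collected and connected, a process registry can be derived by grouping them. The following is a rough sketch under stated assumptions: activities are plain dictionaries with hypothetical keys, and the registry fields (`datasets`, `users`, `systems`) are illustrative, not the set of fields the regulation requires.

```python
from collections import defaultdict


def build_registry(activities):
    """Group collected activities by processing purpose into registry entries."""
    registry = defaultdict(
        lambda: {"datasets": set(), "users": set(), "systems": set()}
    )
    for activity in activities:
        entry = registry[activity["processing"]]
        entry["datasets"].update(activity["data"])
        entry["users"].update(activity["users"])
        entry["systems"].update(activity["systems"])
    # Sort the sets so the registry is stable and easy to print or diff.
    return {
        purpose: {key: sorted(values) for key, values in entry.items()}
        for purpose, entry in registry.items()
    }


# Hypothetical collected activities.
collected = [
    {"processing": "churn scoring", "data": ["customers"],
     "users": ["alice"], "systems": ["Oracle"]},
    {"processing": "churn scoring", "data": ["orders", "customers"],
     "users": ["bob"], "systems": ["Spark"]},
    {"processing": "billing export", "data": ["invoices"],
     "users": ["carol"], "systems": ["Oracle"]},
]
registry = build_registry(collected)
```

Because the registry is computed from observed activities rather than filled in by hand, it can be regenerated whenever new activities are collected.
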
  27. TRANSPARENCY: DATA SCIENCE GOVERNANCE. To govern data science seen as a continuous integration solution, we have to explain and measure activities independently of the technologies. With this information we can reliably create transparent reports of activities across the whole processing chain.
  28. CONSEQUENCES. Connect data and business. Spoiler alert: one-liner ahead.
  29. DATA TO BUSINESS. Business KPIs are nothing but data!
  30. BUSINESS TO DATA. Change the business to match the data. ADAPT!
  31. KENSU. Taking the idea further
  32. SOLUTION: DATA SCIENCE ON DATA SCIENCE. Data: Oracle. Activity: TensorFlow. (*) collect activities metadata (*) performance optimisations. Data Science Governance: Compliance, Performance.
  33. OUR PRODUCT: KENSU DATA ACTIVITY MANAGEMENT. Data Science Governance: first Governance, Compliance and Performance solution for Data Science.
      • Feature: Connect.Collect.Learn. Benefit: automatically captures all data science relevant activities related to governance, compliance and performance within a given domain. Why it matters: provides end-to-end control and insights into all relevant aspects of data science related activities. #GDPR
      • Feature: DPO Dashboard. Benefit: one-stop control center for all potential data privacy violations. Why it matters: near-realtime notifications and actionable intelligence on the current state of “compliance health”. #GDPR
      • Feature: Compliance Reporting. Benefit: one-click reports for all relevant governance and compliance reports. Why it matters: guarantee for a good relationship with the authorities in charge by respecting their templates. #GDPR
  34. DATA SCIENCE GOVERNANCE. Andy Petrella, CEO & Co-Founder. @noootsab @kensuio