How to Build a Global Data Mapping
Andy Petrella
CEO @ Kensu
https://kensu.io
© 2019 - Kensu Inc. www.kensu.io

How I got here…
1999-2005: Study
• MSc in Mathematics, specialization in Graph Theory
• MSc in Computer Science, specialization in Semantics
2006-2011: Geospatial
• Data Analysis
• Catalog
• Large scale processing
• International projects
2012-2014: Data Science
• Big Data
• Open Source: Spark Notebook
• Speaker at world-class events
2015-2017: Spark Notebook Enterprise
• Enterprise Ready Data Science Platform
• Cutting-edge distributed and AI systems
• Pivot on Internal Data Activity Tool (Adalog)
2018-2019: Kensu Data Activity Manager
• Governance-Compliance-Performance
• New Category
• GDPR
• Customers across EU
Today
• Data Mapping
• GDPR and beyond
• LDPD

DATA & ACTIVITY
GOVERNANCE. COMPLIANCE. PERFORMANCE.

Agenda
01 Data Mapping & GDPR: What is it? And… why?
02 Data Activity for GDPR: Reconquer your IT. Enforce governance.
03 Data Activity to Data Mapping: Business process tracker, automation of maintenance, and AI-powered.
04 Turn GDPR activity into value: Thinking outside of the box.

Data Mapping & GDPR
All data is one of:
• Collected (from sensor, human, …)
• A result of a series of transformations (processing, AI, …)
A Data Mapping explains:
• Where the data was collected from
• How the data has been transformed
GDPR requires accountability at the Business Processes level, which is:
• All information about internal or external usage of data
• A series of Data Mappings
GDPR:
• More or less limited to personal data only
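
The definitions above can be pictured with a minimal sketch: every dataset is either collected or derived, and its Data Mapping is what you get by walking back through the transformations. All names here (`Dataset`, `data_mapping`, the sample datasets) are illustrative assumptions, not something taken from the talk or from any Kensu API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical model: a dataset is either collected or derived."""
    name: str
    collected_from: Optional[str] = None   # set when the data was collected
    transformation: Optional[str] = None   # set when the data was derived
    inputs: Tuple["Dataset", ...] = ()      # parents of a derived dataset


def data_mapping(ds: Dataset) -> List[str]:
    """Explain where the data was collected from and how it was transformed."""
    if ds.collected_from is not None:
        return [f"{ds.name}: collected from {ds.collected_from}"]
    lines: List[str] = []
    for parent in ds.inputs:
        # Walk back to the origins first, so the explanation reads in order.
        lines.extend(data_mapping(parent))
    parents = ", ".join(p.name for p in ds.inputs)
    lines.append(f"{ds.name}: {ds.transformation} of {parents}")
    return lines


# Sample data, invented for the illustration.
clicks = Dataset("clicks", collected_from="web form")
crm = Dataset("crm", collected_from="sales team")
scores = Dataset("scores", transformation="join + scoring", inputs=(clicks, crm))

for line in data_mapping(scores):
    print(line)
```

In this sketch a dataset shared by two branches would be explained twice; a real system would probably deduplicate and store the graph rather than recompute it.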

(interlude) Reasons for GDPR (my 2¢)
• Awareness of the value of the data
• Accessibility of:
  • Big Data & Real Time
  • Cloud
  • Data Science
• Global understanding of Data Science and AI impacts
• Worries about the global effects of social networks
• Strategic investments in AI
• Data Science considered as “Wild West”

Data Mapping Challenges
Mismatch
• Strategy: use more data, creativity with data, faster to market
• Compliance: pre-analysis, periodic checks, manual reporting
All fine, but almost contradictory with the reasons behind GDPR (?)
Top-Down
• Manual periodic audit / handmade documentation
• Post-processing to extract GDPR reports

Data Activity
Data Activity is the new pillar of Data Governance.
With Data Activity, the scope of Data Governance includes:
• Collection origins (Provenance)
• Behaviors of the Data Transformation Tools
Data Activities are the metadata required to govern Business Processes:
• They are generated by the Tools and the IT Systems
• They are a new concept in data
• They are technology agnostic
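
As a rough illustration of "generated by the tools" and "technology agnostic", a Data Activity could be a small, plain record that any tool serializes to JSON. The field names and the `record_activity` helper below are assumptions made for this sketch; the talk does not specify a schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class DataActivity:
    """Hypothetical record describing one thing a tool did with data."""
    tool: str            # which tool or system produced the activity
    process: str         # business process the activity belongs to
    inputs: List[str]    # datasets read (provenance)
    outputs: List[str]   # datasets written
    operation: str       # what the tool did (transformation behavior)
    timestamp: str       # ISO 8601, so any technology can parse it


def record_activity(tool: str, process: str, inputs: List[str],
                    outputs: List[str], operation: str,
                    now: Optional[datetime] = None) -> str:
    """Build an activity and serialize it to JSON (plain text, no tool lock-in)."""
    now = now or datetime.now(timezone.utc)
    activity = DataActivity(tool, process, list(inputs), list(outputs),
                            operation, now.isoformat())
    return json.dumps(asdict(activity), sort_keys=True)


# Sample call with invented values.
print(record_activity("spark-job", "Create invoice",
                      ["orders"], ["invoices"], "aggregate"))
```

Keeping the record to plain strings and lists is the design choice that matters here: a legacy batch job and a notebook can both emit it without sharing any library.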

Data Activity & GDPR
Data Activity:
• Continually creates and updates the Art. 30, Process Registry
• Tracks Consents usage and demonstrates accountability
• Simplifies the handling of Data Subjects Rights issues, or avoids them
• Automatically discovers DPIA needs and requirements
• Reports the activities and impact of a Data Breach
• And even tracks the quality of Profiling & Decision making (WP 29)
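
Two of the points above, the process registry and the impact of a breach, can be sketched as simple queries over collected activities. This is an illustration under assumptions: the activity dictionaries, dataset names, and both functions are hypothetical, and a real Art. 30 record contains far more than a list of datasets.

```python
from collections import defaultdict, deque

# Hypothetical activities, as they might be received from instrumented tools.
ACTIVITIES = [
    {"process": "Create invoice",
     "inputs": ["orders", "customers"], "outputs": ["invoices"]},
    {"process": "Create marketing campaign",
     "inputs": ["customers"], "outputs": ["segments"]},
    {"process": "Create marketing campaign",
     "inputs": ["segments"], "outputs": ["mailing_list"]},
]


def process_registry(activities):
    """Group every dataset touched by each business process."""
    registry = defaultdict(set)
    for activity in activities:
        registry[activity["process"]].update(activity["inputs"],
                                             activity["outputs"])
    return {process: sorted(data) for process, data in registry.items()}


def breach_impact(activities, breached):
    """Follow the activities downstream from a breached dataset (breadth-first)."""
    affected_data = {breached}
    affected_processes = set()
    queue = deque([breached])
    while queue:
        current = queue.popleft()
        for activity in activities:
            if current in activity["inputs"]:
                affected_processes.add(activity["process"])
                for out in activity["outputs"]:
                    if out not in affected_data:
                        affected_data.add(out)
                        queue.append(out)
    return sorted(affected_data), sorted(affected_processes)


print(process_registry(ACTIVITIES))
print(breach_impact(ACTIVITIES, "customers"))
```

Because both results are recomputed from the activity flow, they stay current as tools change, which is the contrast with a handmade, periodically audited document.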

Data Activity & Data Mapping
Data Activity is the technical measure, the implementation tactic to create and maintain the global Data Map.
However, Data Activities require a dedicated system to convert their constant flow into Compliance assets.
Kensu Data Activity Manager

Data Activity: Implementation Strategy
Remember that:
1. Data Activities are published by the Tools themselves.
2. Tools include legacy and new systems.
Here is one approach to create and use the global Data Map to your advantage:
1. Choose a few Business Processes (e.g. create invoice, create marketing campaign)
2. Instrument the tools along the value chain to send all their activities
3. Start governing the chosen Business Processes
4. Let the system discover all other Business Processes to be governed
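
The four steps above can be sketched in a few lines, assuming the "tools" are plain Python functions and the receiving system is an in-memory list. The decorator `publishes_activity`, the `PUBLISHED` list, and the `GOVERNED` set are all hypothetical stand-ins; real instrumentation would depend on each tool.

```python
import functools

PUBLISHED = []                  # stand-in for the system receiving activities
GOVERNED = {"Create invoice"}   # step 1: the Business Processes chosen first


def publishes_activity(process, inputs, outputs):
    """Step 2: instrument a tool so that it sends an activity on every run."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            PUBLISHED.append({"process": process, "tool": func.__name__,
                              "inputs": list(inputs), "outputs": list(outputs)})
            return result
        return wrapper
    return decorator


@publishes_activity("Create invoice", ["orders"], ["invoices"])
def build_invoices(orders):
    return [{"order": order, "amount": 10 * order} for order in orders]


@publishes_activity("Create marketing campaign", ["customers"], ["segments"])
def segment_customers(customers):
    return sorted(customers)


def ungoverned_processes(published, governed):
    """Step 4: processes seen in the activity flow but not governed yet."""
    return sorted({activity["process"] for activity in published} - set(governed))


# The tools run as usual; the activities are a side effect of running them.
build_invoices([1, 2])
segment_customers(["b", "a"])
print(ungoverned_processes(PUBLISHED, GOVERNED))
```

The point of the sketch is that discovery falls out of the data: nobody declared "Create marketing campaign" to the governance side, it simply showed up in the published activities.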

Values
Data Activity & Data Mapping

GDPR: From Manual to Automated
Art. 30, Process Registry
Consent Management
Data Subjects Rights Management
Art. 35, DPIA
Breaches Management
Art. 29, Data Science and Quality Management

Data Strategy
Data Activity & Data Mapping

Validated Governance
Data Migration Control
Govern Hybrid Infrastructures
Data Marketplace
Segregated Governance across Units
Efficient Data Change Management
Control Data Activity / Science KPIs

Thanks!
Q/A
Check out: http://kensu.io
Come visit us at our Kensu booth

RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 

How to Build a Global Data Mapping

  • 1. How to Build A Global Data Mapping. Andy Petrella, CEO @ Kensu, https://kensu.io. © 2019 - Kensu Inc., www.kensu.io
  • 2. How I got here… 1999-2005, Study: Ms. Mathematics (specialization in Graph Theory), Ms. Computer Science (specialization in Semantics). 2006-2011, Geospatial: data analysis, catalog, large-scale processing, international projects. 2012-2014, Data Science: Big Data, open source (Spark Notebook), speaker at world-class events. 2015-2017, Spark Notebook Enterprise: enterprise-ready data science platform, cutting-edge distributed and AI systems, pivot on internal Data Activity tool (Adalog). 2018-2019, Kensu Data Activity Manager: Governance-Compliance-Performance, new category, GDPR, customers across the EU. Today: Data Mapping, GDPR and beyond, LDPD.
  • 3. DATA & ACTIVITY: GOVERNANCE. COMPLIANCE. PERFORMANCE.
  • 4. Agenda. 01 Data Mapping & GDPR: What is it? And… why? 02 Data Activity for GDPR: Reconquer your IT. Enforce governance. 03 Data Activity to Data Mapping: Business process tracker, automation of maintenance, and AI-powered. 04 Turn GDPR activity into value: Thinking outside of the box.
  • 5. Data Mapping & GDPR. All data is either: collected (from a sensor, a human, …), or the result of a series of transformations (processing, AI, …). A Data Mapping explains: where the data was collected from, and how the data has been transformed. GDPR requires accountability at the Business Processes level, which is: all information about internal or external usage of data, that is, a series of Data Mappings. GDPR, sort of, limits this to personal data only.
  • 6. (interlude) Reasons for GDPR (my 2¢). Awareness of the value of the data. Accessibility of Big Data & Real Time, Cloud, and Data Science. Global understanding of Data Science and AI impacts. Worries about the global effects of social networks. Strategic investments in AI. Data Science considered as the “Wild West”.
  • 7. Data Mapping Challenges. Mismatch. Strategy: use more data, creativity with data, faster on the market. Compliance: pre-analysis, periodical checks, manual reporting. All fine, but almost contradictory with the reasons behind GDPR (?). Top-Down. Manual. Periodical Audit / Handmade Documentation. Post-processing to extract GDPR reports.
  • 8. Data Activity. Data Activity is the new pillar of Data Governance. With Data Activity, the scope of Data Governance includes: collection origins (Provenance) and the behaviors of Data Transformation Tools. Data Activities are the metadata mandatory to govern Business Processes: they are generated by the Tools and the IT Systems, they are a new concept in Data, and they are technology agnostic.
  • 9. Data Activity & GDPR. Continually creates and updates the Art. 30 Process Registry. Tracks Consent usage and demonstrates accountability. Simplifies the handling of Data Subjects Rights issues, or avoids them. Automatically discovers DPIA needs and requirements. Reports the activities and impact of a Data Breach. And even tracks the quality of Profiling & Decision making (WP 29).
  • 10. Data Activity & Data Mapping. Data Activity is the technical measure, the implementation tactic to create and maintain the global Data Map. However, Data Activities require a dedicated system to convert their constant flow into Compliance assets: Kensu Data Activity Manager.
  • 11. Data Activity: Implementation Strategy. Remember that (1) Data Activities are published by the Tools themselves, and (2) Tools include legacy and new systems. Here is one approach to create and use the global Data Map to your advantage: (1) Choose a few Business Processes (e.g. create invoice, create marketing campaign). (2) Instrument the tools along the value chain to send all their activities. (3) Start governing the chosen Business Processes. (4) Let the system discover all other Business Processes to be governed.
  • 12. Values: Data Activity & Data Mapping
  • 13. GDPR: From Manual to Automated. Art. 30, Process Registry. Consent Management. Data Subjects Rights Management. Art. 35, DPIA. Breaches Management. Art. 29, Data Science and Quality Management.
  • 14. Data Strategy: Data Activity & Data Mapping
  • 15. Validated Governance. Data Migration Control. Govern Hybrid Infrastructures. Data Marketplace. Segregated Governance across Units. Efficient Data Change Management. Control Data Activity / Science KPIs.
  • 16. Thanks! Q/A. Check out: http://kensu.io. Come visit us at our Kensu booth.
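
Slides 5 and 11 describe a Data Mapping as the answer to two questions (where was the data collected from, and how was it transformed) and suggest instrumenting tools so that they publish their own activities. The idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `DataActivity` record, its fields, the function names, and the example tool and data source names are hypothetical and are not taken from the Kensu product or its API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataActivity:
    """Hypothetical event a tool publishes each time it transforms data."""
    tool: str      # tool or IT system reporting the activity
    process: str   # business process the activity belongs to
    inputs: tuple  # data sources read by the tool
    output: str    # data source written by the tool


def build_data_map(activities):
    """Index the activities by the data source they produce."""
    produced_by = defaultdict(list)
    for activity in activities:
        produced_by[activity.output].append(activity)
    return produced_by


def provenance(data_map, source):
    """Walk upstream from `source` and return the collected (origin) sources."""
    origins, seen, stack = set(), set(), [source]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        producers = data_map.get(current, [])
        if not producers:
            # No activity produced this source, so treat it as collected data.
            origins.add(current)
        for activity in producers:
            stack.extend(activity.inputs)
    return origins


# Hypothetical activities for the "Create invoice" business process.
activities = [
    DataActivity("crm_export", "Create invoice",
                 ("crm.customers",), "staging.customers"),
    DataActivity("billing_job", "Create invoice",
                 ("staging.customers", "erp.orders"), "reports.invoices"),
]

data_map = build_data_map(activities)
print(sorted(provenance(data_map, "reports.invoices")))
# prints ['crm.customers', 'erp.orders']
```

In this sketch the map is rebuilt from the stream of published activities instead of being documented by hand, which is the contrast slide 7 draws with manual, periodical audits. A real system would also need to persist events, handle schema changes, and attach consent or purpose metadata, none of which is shown here.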