Using ML to Protect Customer Privacy
by fmr Amazon Sr PM
www.productschool.com
Using ML to
protect customer
privacy
Pushpak Pujari
PM at Verkada | ex Sr. PM at Amazon
Bio
PM at Verkada for Security Cameras
Sr. PM at Amazon Alexa AI - Privacy
Sr. PM at Amazon Web Services IoT
Wharton MBA, EE from IIT Delhi
Hobbies: Tennis, Hiking, Beer Brewing
Takeaways from this Webinar
Privacy fundamentals
Privacy preservation techniques
Using ML for privacy – a walkthrough
Strategies for being an impactful Privacy PM
Why Privacy Matters
Companies collect and retain tons of
customer data:
• Fulfilling a service request
• Legal or regulatory requirements
• Better CX: recommendations, marketing etc.
• Resell data to 3P
Collected data can contain sensitive
information
Such data falling into the wrong hands can be devastating, both for the customer and for the organization
Why Privacy Matters
• Data breaches happen far more frequently than you think
• Data is spread across different organizations and media, which makes data lineage almost impossible to track
• Rise of privacy laws (HIPAA, GDPR, CCPA,
COPPA etc.) with more coming soon
• Growing distrust of social media providers
• Customers want transparency on how their
data is being used
What constitutes Personal Data
• Direct identifiers
• E.g.: Full name, address, SSN, phone
number
• Indirect identifiers
• E.g.: location history, gender, demographic
information, salary
Data classification
• Identified: contains direct or indirect identifiers
• Pseudonymous: eliminate or transform direct identifiers
• De-identified: direct and known indirect identifiers removed
• Anonymous: mathematically proven to prevent re-identification
[Figure: "John Doe" (personal data) is mapped with a key to "eEf2gT_334" (pseudonymized data); "Mary Jane" (personal data) is combined with random noise to give "********" (anonymous data).]
Privacy vs Utility Tradeoff
Picture credit: Mostly AI
Stakeholders in Privacy
Enactment
• Compliance Team
• Information Security
• Legal
• Privacy Engineering
• Product
Benefits of being Privacy-first
• Avoid huge fines
• Prevent loss of business licenses
• Brand impact, trust
• Customer loyalty and retention
• Increase Customer Lifetime Value and higher conversion
• Competitive moat
Privacy-first positioning is table stakes
Sources of Privacy Risk
• Raw customer data and its derivatives
• Metadata and logs
• ML models
For attackers, raw data is the holy grail, but ML models should not be ignored
Privacy Risks from ML Models
[Figure: output distributions of model predictions on a test dataset of potential members, one for members of the training dataset and one for non-members; the delta between the two distributions denotes the privacy risk.]
Source: Privacy-Preserving Machine Learning: Threats and Solutions
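To make the member vs. non-member delta concrete, here is a minimal sketch of how such a gap could be measured and exploited. The function names, the threshold attack and the confidence values are all assumptions made for illustration; none of them come from the talk or from a real model.

```python
def membership_gap(member_conf: list, nonmember_conf: list) -> float:
    """Difference in mean prediction confidence between members and non-members."""
    member_mean = sum(member_conf) / len(member_conf)
    nonmember_mean = sum(nonmember_conf) / len(nonmember_conf)
    return member_mean - nonmember_mean


def guess_membership(confidence: float, threshold: float) -> bool:
    """Naive threshold attack: very confident predictions are guessed to be training members."""
    return confidence >= threshold


# Made-up confidences for illustration only, not measurements from any real model.
member_conf = [0.99, 0.97, 0.95, 0.98]
nonmember_conf = [0.80, 0.65, 0.90, 0.72]
gap = membership_gap(member_conf, nonmember_conf)
```

A larger gap suggests the model behaves noticeably differently on records it was trained on, which is what an attacker would try to exploit.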
Don’t be alarmed!
• Locking customer data in a secure vault and throwing away the key is not the answer
• The goal is to protect customer data while still using it to deliver great CX
The rest of the presentation focuses on using ML to mitigate privacy risks while leaving enough utility in the data
Privacy Preservation Techniques
Data Sanitization
• Direct Identifier Detection and Filtering
• Pseudonymization
• K-anonymization
• Differential Privacy
Privacy-preserving Computation
• Homomorphic Encryption
• Secure Multi-Party Computation
• Trusted Execution Environments
• Federated Learning
Direct
Identifiers
Examples
• Name
• Address (all geographic subdivisions smaller than state)
• All dates related to an individual
• Telephone / Fax numbers
• Email address
• Social Security Number
• Medical record number
• Health plan beneficiary number
• Any account number
• Any certificate or license number
• Vehicle identifiers including license plate numbers
• Device identifiers and serial numbers
• Web URLs
• Internet Protocol (IP) Address
• Biometrics including finger or voice print
• Photographic image - not limited to images of the face
Direct Identifier Detection
and Filtering
Define a list of identifiers and scan
datasets for said identifiers
Easiest to implement
No measurable guarantees
Needs humans in the loop
Maintaining and improving models is hard
Pseudonymization
Map direct identifiers to unique tokens
Can be one-way or two-way
Easier to implement
Allows joins with other data tables
Re-identification impossible from tokens
Original data can be extracted
Needs consistent implementation
[Figure: the card number 4145 4455 3489 9985 is tokenized to 41ss utoh dkjbg 9985.]
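The one-way and two-way variants can be sketched as follows. This is a minimal illustration under stated assumptions: `SECRET_KEY`, `one_way_token` and `TokenVault` are hypothetical names, and unlike the card example on the slide the tokens here do not preserve any of the original digits.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret; in practice it would come from a key management system.
SECRET_KEY = b"demo-key-not-for-production"


def one_way_token(value: str) -> str:
    """Keyed hash: the same input always yields the same token, so joins still work."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


class TokenVault:
    """Two-way mapping: random tokens plus a lookup table that must itself be protected."""

    def __init__(self):
        self._forward = {}  # original value -> token
        self._reverse = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the mapping stays consistent across tables.
        if value not in self._forward:
            token = secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def detokenize(self, token: str) -> str:
        return self._reverse[token]
```

The keyed hash needs no stored table, but anyone holding the key can test guesses against it. The vault allows authorized recovery of the original value, so the table becomes sensitive data in its own right.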
K-anonymization
Generalize quasi-identifiers and make each
record indistinguishable from at least k-1
other records
Stronger anonymization
Reduces data utility
Choosing ideal k value is hard
Choosing generalization logic is hard
[Figure: Zip Codes 94401, 94454 and 94432 are generalized to 944*; Age values shown are 26, 24, 27 and 29.]
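A minimal sketch of the idea, assuming zip code and age are the quasi-identifiers. The generalization rules (3-digit zip prefix, 10-year age band) and the pairing of zip codes with ages in `records` are assumptions made for illustration; the slide only lists the values.

```python
from collections import Counter


def generalize(record: dict) -> tuple:
    """Coarsen quasi-identifiers: keep a 3-digit zip prefix and a 10-year age band."""
    zip_prefix = record["zip"][:3] + "*"
    decade = (record["age"] // 10) * 10
    return (zip_prefix, f"{decade}-{decade + 9}")


def is_k_anonymous(records: list, k: int) -> bool:
    """True if every generalized quasi-identifier combination appears at least k times."""
    groups = Counter(generalize(r) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical pairing of the zip codes and ages shown on the slide.
records = [
    {"zip": "94401", "age": 26},
    {"zip": "94454", "age": 24},
    {"zip": "94432", "age": 27},
    {"zip": "94401", "age": 29},
]
```

With these rules all four records collapse into the single group ("944*", "20-29"), so the set is 4-anonymous. The cost is that exact zip codes and ages are no longer available for analysis.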
Differential Privacy
Query outcome is not dependent
on any one record
Measurable privacy guarantees
Hard to choose right parameters
Not practical for a lot of use cases (yet)
Maintaining DP datasets over time is expensive
Picture credit: Winton Research
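One common way to get this property for a counting query is the Laplace mechanism, sketched below. The slide does not name a specific mechanism, so this choice, the function names and the `epsilon` parameter are assumptions for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Counting query with Laplace noise; a count has sensitivity 1, so scale = 1 / epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

A smaller `epsilon` adds more noise and gives stronger privacy. Picking that value is the parameter-selection difficulty the slide refers to.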
ML to detect direct identifiers: a walkthrough
• Use cases:
• [p0] Scan search phrases for direct identifiers; if any are found, delete them immediately
• [p1] If an employee is trying to access customer data for customer analytics, ensure that it contains no direct identifiers
• Functional requirements
• Detect 5 types of identifiers: full name, address, telephone number, email address, SSN
• en_US locale only
• Goal success criteria: precision 70%, recall 95%
• Non-functional requirements
• [p0] Scan 1 query (~5 search words) in 250ms
• [p1] Provide API for batch detection
Ingredients for a spicy ML model
• Training Data
• Success Metrics
• Model architecture
• ML Infrastructure
• Continuous improvement
Training Data
• Garbage-in, garbage-out: training data should be as close as possible to your runtime data in syntax and semantics
• Human labeling challenges
• Identifying which search phrases contain PII so they can be annotated
• Ambiguity: high-quality ground truth requires multiple passes
• Using actual customer data might lead to privacy exposure
• Track labeling metrics, as they directly impact model performance
• Size and diversity in training data to minimize overfitting and underfitting
Metrics and Performance Evaluation
• Precision and recall: which one is more important?
• Sampling challenges with skewed identifier distribution
• Measurement can be expensive
• How frequently should you run the measurement workflow?
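For reference, precision and recall can be computed from labeled samples as in the sketch below; the function name and the boolean encoding are assumptions for illustration. The walkthrough's targets (precision 70%, recall 95%) suggest recall is weighted more heavily there: a missed identifier stays in storage, while a false alarm only deletes a harmless query.

```python
def precision_recall(predicted: list, actual: list) -> tuple:
    """Compute (precision, recall) from parallel lists of booleans (True = contains an identifier)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)      # flagged and truly an identifier
    fp = sum(1 for p, a in pairs if p and not a)  # flagged but harmless
    fn = sum(1 for p, a in pairs if a and not p)  # missed identifier
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```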
Model Architecture: Choose Your Weapon
• Logistic Regression based binary classifiers
• Easy to implement
• Hard to attribute what is working and what isn’t
• Regular Expression (Regex)
• Highly effective for direct identifiers which have consistent schema
• Dumb, hard to generalize, hard to expand and scale
• NER (Stanford NER, Stanza, FLAIR, spaCy, transformers like BERT)
• Ideal for names, addresses and context dependent identifiers
• Computationally expensive, requires large training data
• No one size fits all solution
• Trial and Error based experimentation is key
Model Architecture: Choose Your Weapon
1. Name - NER
2. Address - NER
3. Telephone numbers - Regex
4. Email address - Regex
5. Social Security Number - Regex
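The three regex-based detectors could look roughly like the sketch below. The patterns are simplified en_US assumptions written for illustration, not the ones used in the talk, and they would need many more cases before production use. Names and addresses are left out because the slide assigns them to NER.

```python
import re

# Simplified en_US patterns, assumed for illustration only.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(
        r"(?<!\d)(?:\+?1[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}(?!\d)"
    ),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}


def find_identifiers(text: str) -> list:
    """Return (type, matched_text) pairs for every pattern hit in the text."""
    hits = []
    for kind, pattern in PATTERNS.items():
        hits.extend((kind, m.group()) for m in pattern.finditer(text))
    return hits
```

A search phrase with at least one hit would be the candidate for immediate deletion under the [p0] use case.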
Infrastructure
All public cloud providers have offerings
for training, testing, hosting and MLOps
Work with ML scientists to pick
framework of choice
Continuous Improvement Workflow
• Re-train your model periodically
• Track model performance metrics regularly
• Optimize training frequency
• Watch out for model drift over time
• Track labeling quality metrics regularly
• Optimize labeling workflow
Effective Privacy PM
Cheat Sheet
The most rewarding PM opportunity
Can seem technically challenging and ambiguous, but:
• True opportunity to lead and stand out
• Core Product Management
• Tremendous learning opportunity; builds specific skills for the data-first world
• Truly multi-disciplinary, cutting across AI/ML/data, security, legal, compliance and cloud
• Create positive impact and make the world a better place
Strategies to Gain Leverage
• Partner: Identify who cares (CISO, senior leadership)
• Quantify: Quantify impact on Brand and tie it to the organization’s business metrics
• Goals: Work backwards from Customer Promises
• Vision: Set an exciting and appealing North Star vision
Strategies to Gain Leverage
• Team: Put together a cross-team task force of curious people
• Incremental: Build an incremental roadmap with a few quick wins
• Visibility: Provide continuous visibility
• Incentivize: Create an adoption plan with the right incentives
Where to begin
• Follow the data: Chart the customer data lifecycle (Ingestion, Storage, Usage, Deletion)
• Create threat map: Where are humans in the loop? What tools do they use to access the data?
• Identify use cases: Privacy vs Utility tradeoff; identify drivers and define success metrics
Best Practices
• Stay abreast of new technology
• Build a community
• Join conferences
• Experiment
Resources
• Visual guide to practical data de-identification: https://fpf.org/wp-content/uploads/2016/04/FPF_Visual-Guide-to-Practical-Data-DeID.pdf
• Google's Patent on PII detection: https://patents.google.com/patent/US8561185B1/en
• Microsoft Presidio: https://github.com/microsoft/presidio
• Use NER mode to detect person names in text: https://pii-tools.com/detect-person-names-in-text/
• Custom NLP approaches to data anonymization: https://towardsdatascience.com/nlp-approaches-to-data-anonymization-1fb5bde6b929
• Detecting and redacting PII using Amazon Comprehend: https://aws.amazon.com/blogs/machine-learning/detecting-and-redacting-pii-using-amazon-comprehend/
Thank you!
• https://www.linkedin.com/in/pushpakpujari/
• @pushpakpujari

Contenu connexe

Tendances

Business Analyst Job Course.pptx
Business Analyst Job Course.pptxBusiness Analyst Job Course.pptx
Business Analyst Job Course.pptx
Rohit Dubey
 

Tendances (20)

Product Analytics Workshop
Product Analytics WorkshopProduct Analytics Workshop
Product Analytics Workshop
 
Customer Centricity and Product Led Growth by Airbnb Product & Growth
Customer Centricity and Product Led Growth by Airbnb Product & Growth Customer Centricity and Product Led Growth by Airbnb Product & Growth
Customer Centricity and Product Led Growth by Airbnb Product & Growth
 
How to Master Product-Led Growth Strategy in B2B by Gainsight CTO
How to Master Product-Led Growth Strategy in B2B by Gainsight CTOHow to Master Product-Led Growth Strategy in B2B by Gainsight CTO
How to Master Product-Led Growth Strategy in B2B by Gainsight CTO
 
mtpcon London+EMEA 2022 – Why Product Managers should not be data-driven.pdf
mtpcon London+EMEA 2022 – Why Product Managers should not be data-driven.pdfmtpcon London+EMEA 2022 – Why Product Managers should not be data-driven.pdf
mtpcon London+EMEA 2022 – Why Product Managers should not be data-driven.pdf
 
Navigating Polarities as a PM by Google Product Leader
Navigating Polarities as a PM by Google Product LeaderNavigating Polarities as a PM by Google Product Leader
Navigating Polarities as a PM by Google Product Leader
 
Prioritization Method for Every Case by fmr Atlassian Principal PM
Prioritization Method for Every Case by fmr Atlassian Principal PMPrioritization Method for Every Case by fmr Atlassian Principal PM
Prioritization Method for Every Case by fmr Atlassian Principal PM
 
Crafting Product Strategy Blueprint for Success by Atlassian PM.pdf
Crafting Product Strategy Blueprint for Success by Atlassian PM.pdfCrafting Product Strategy Blueprint for Success by Atlassian PM.pdf
Crafting Product Strategy Blueprint for Success by Atlassian PM.pdf
 
Product Discovery At Google
Product Discovery At GoogleProduct Discovery At Google
Product Discovery At Google
 
Customer Centric & Hypothesis Driven Innovation by Cruise VP of Product Engin...
Customer Centric & Hypothesis Driven Innovation by Cruise VP of Product Engin...Customer Centric & Hypothesis Driven Innovation by Cruise VP of Product Engin...
Customer Centric & Hypothesis Driven Innovation by Cruise VP of Product Engin...
 
Agile Product Roadmaps
Agile Product RoadmapsAgile Product Roadmaps
Agile Product Roadmaps
 
Revolutionizing the Customer Experience_ Innovating and Scaling within Enterp...
Revolutionizing the Customer Experience_ Innovating and Scaling within Enterp...Revolutionizing the Customer Experience_ Innovating and Scaling within Enterp...
Revolutionizing the Customer Experience_ Innovating and Scaling within Enterp...
 
Solving Design Problem in 2.5 Hours with Google Design Sprint
Solving Design Problem in 2.5 Hours with Google Design SprintSolving Design Problem in 2.5 Hours with Google Design Sprint
Solving Design Problem in 2.5 Hours with Google Design Sprint
 
B2B vs B2C Product Management by Booking.com Sr PM
B2B vs B2C Product Management by Booking.com Sr PMB2B vs B2C Product Management by Booking.com Sr PM
B2B vs B2C Product Management by Booking.com Sr PM
 
Product Management
Product ManagementProduct Management
Product Management
 
Agile Roles & responsibilities
Agile Roles & responsibilitiesAgile Roles & responsibilities
Agile Roles & responsibilities
 
Agile and Generative AI - friends or foe?
Agile and Generative AI - friends or foe?Agile and Generative AI - friends or foe?
Agile and Generative AI - friends or foe?
 
Lean Product Discovery
Lean Product DiscoveryLean Product Discovery
Lean Product Discovery
 
Business Analyst Job Course.pptx
Business Analyst Job Course.pptxBusiness Analyst Job Course.pptx
Business Analyst Job Course.pptx
 
The Scientific Method of Experimentation by Google PM
The Scientific Method of Experimentation by Google PMThe Scientific Method of Experimentation by Google PM
The Scientific Method of Experimentation by Google PM
 
Creating Agile Product Roadmaps Everyone Understands
Creating Agile Product Roadmaps Everyone UnderstandsCreating Agile Product Roadmaps Everyone Understands
Creating Agile Product Roadmaps Everyone Understands
 

Similaire à Using ML to Protect Customer Privacy by fmr Amazon Sr PM

It’s all about me_ From big data models to personalized experience Presentation
It’s all about me_ From big data models to personalized experience PresentationIt’s all about me_ From big data models to personalized experience Presentation
It’s all about me_ From big data models to personalized experience Presentation
Yao H. Morin, Ph.D.
 
Securing the Digital Enterprise
Securing the Digital EnterpriseSecuring the Digital Enterprise
Securing the Digital Enterprise
Cybersecurity Education and Research Centre
 
Building a Complete View Across the Customer Experience on Oracle BICS
Building a Complete View Across the Customer Experience on Oracle BICSBuilding a Complete View Across the Customer Experience on Oracle BICS
Building a Complete View Across the Customer Experience on Oracle BICS
Shiv Bharti
 
Executive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you thinkExecutive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you think
Peter Skomoroch
 
AI in Quality Control: How to do visual inspection with AI
AI in Quality Control: How to do visual inspection with AIAI in Quality Control: How to do visual inspection with AI
AI in Quality Control: How to do visual inspection with AI
Skyl.ai
 
Getting Data Quality Right
Getting Data Quality RightGetting Data Quality Right
Getting Data Quality Right
DATAVERSITY
 

Similaire à Using ML to Protect Customer Privacy by fmr Amazon Sr PM (20)

ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
How to classify documents automatically using NLP
How to classify documents automatically using NLPHow to classify documents automatically using NLP
How to classify documents automatically using NLP
 
Delivering Machine Learning Solutions by fmr Sears Dir of PM
Delivering Machine Learning Solutions by fmr Sears Dir of PMDelivering Machine Learning Solutions by fmr Sears Dir of PM
Delivering Machine Learning Solutions by fmr Sears Dir of PM
 
Bridging the AI Gap: Building Stakeholder Support
Bridging the AI Gap: Building Stakeholder SupportBridging the AI Gap: Building Stakeholder Support
Bridging the AI Gap: Building Stakeholder Support
 
How to analyze text data for AI and ML with Named Entity Recognition
How to analyze text data for AI and ML with Named Entity RecognitionHow to analyze text data for AI and ML with Named Entity Recognition
How to analyze text data for AI and ML with Named Entity Recognition
 
It’s all about me_ From big data models to personalized experience Presentation
It’s all about me_ From big data models to personalized experience PresentationIt’s all about me_ From big data models to personalized experience Presentation
It’s all about me_ From big data models to personalized experience Presentation
 
Securing the Digital Enterprise
Securing the Digital EnterpriseSecuring the Digital Enterprise
Securing the Digital Enterprise
 
Dont let governance risk and compliance be a roll of the dice | ESPC22
Dont let governance risk and compliance be a roll of the dice |  ESPC22 Dont let governance risk and compliance be a roll of the dice |  ESPC22
Dont let governance risk and compliance be a roll of the dice | ESPC22
 
Building a Complete View Across the Customer Experience on Oracle BICS
Building a Complete View Across the Customer Experience on Oracle BICSBuilding a Complete View Across the Customer Experience on Oracle BICS
Building a Complete View Across the Customer Experience on Oracle BICS
 
Executive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you thinkExecutive Briefing: Why managing machines is harder than you think
Executive Briefing: Why managing machines is harder than you think
 
How to achieve a single view of critical business data with MDM
How to achieve a single view of critical business data with MDMHow to achieve a single view of critical business data with MDM
How to achieve a single view of critical business data with MDM
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning Models
 
How to Build an AI/ML Product and Sell it by SalesChoice CPO
How to Build an AI/ML Product and Sell it by SalesChoice CPOHow to Build an AI/ML Product and Sell it by SalesChoice CPO
How to Build an AI/ML Product and Sell it by SalesChoice CPO
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AI
 
How AI-Powered Search Drives Employee Experience
How AI-Powered Search Drives Employee ExperienceHow AI-Powered Search Drives Employee Experience
How AI-Powered Search Drives Employee Experience
 
Get your data analytics strategy right!
Get your data analytics strategy right!Get your data analytics strategy right!
Get your data analytics strategy right!
 
AI in Quality Control: How to do visual inspection with AI
AI in Quality Control: How to do visual inspection with AIAI in Quality Control: How to do visual inspection with AI
AI in Quality Control: How to do visual inspection with AI
 
Getting Data Quality Right
Getting Data Quality RightGetting Data Quality Right
Getting Data Quality Right
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
Data Science and Analytics
Data Science and Analytics Data Science and Analytics
Data Science and Analytics
 

Plus de Product School

Plus de Product School (20)

Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
 
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
 
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
 
Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...
 
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
 
Revolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, MonzoRevolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product School
 
Webinar How PMs Use AI to 10X Their Productivity by Product School EiR.pdf
Webinar How PMs Use AI to 10X Their Productivity by Product School EiR.pdfWebinar How PMs Use AI to 10X Their Productivity by Product School EiR.pdf
Webinar How PMs Use AI to 10X Their Productivity by Product School EiR.pdf
 
Webinar: Using GenAI for Increasing Productivity in PM by Amazon PM Leader
Webinar: Using GenAI for Increasing Productivity in PM by Amazon PM LeaderWebinar: Using GenAI for Increasing Productivity in PM by Amazon PM Leader
Webinar: Using GenAI for Increasing Productivity in PM by Amazon PM Leader
 
Unlocking High-Performance Product Teams by former Meta Global PMM
Unlocking High-Performance Product Teams by former Meta Global PMMUnlocking High-Performance Product Teams by former Meta Global PMM
Unlocking High-Performance Product Teams by former Meta Global PMM
 
The Types of TPM Content Roles by Facebook product Leader
The Types of TPM Content Roles by Facebook product LeaderThe Types of TPM Content Roles by Facebook product Leader
The Types of TPM Content Roles by Facebook product Leader
 
Match Is the New Sell in The Digital World by Amazon Product leader
Match Is the New Sell in The Digital World by Amazon Product leaderMatch Is the New Sell in The Digital World by Amazon Product leader
Match Is the New Sell in The Digital World by Amazon Product leader
 
Beyond the Cart: Unleashing AI Wonders with Instacart’s Shopping Revolution
Beyond the Cart: Unleashing AI Wonders with Instacart’s Shopping RevolutionBeyond the Cart: Unleashing AI Wonders with Instacart’s Shopping Revolution
Beyond the Cart: Unleashing AI Wonders with Instacart’s Shopping Revolution
 
Designing Great Products The Power of Design and Leadership
Designing Great Products The Power of Design and LeadershipDesigning Great Products The Power of Design and Leadership
Designing Great Products The Power of Design and Leadership
 
Command the Room: Empower Your Team of Product Managers with Effective Commun...
Command the Room: Empower Your Team of Product Managers with Effective Commun...Command the Room: Empower Your Team of Product Managers with Effective Commun...
Command the Room: Empower Your Team of Product Managers with Effective Commun...
 
Metrics That Matter: Bridging User Needs and Board Priorities for Business Su...
Metrics That Matter: Bridging User Needs and Board Priorities for Business Su...Metrics That Matter: Bridging User Needs and Board Priorities for Business Su...
Metrics That Matter: Bridging User Needs and Board Priorities for Business Su...
 
Customer-Centric PM: Anticipating Needs Across the Product Life Cycle
Customer-Centric PM: Anticipating Needs Across the Product Life CycleCustomer-Centric PM: Anticipating Needs Across the Product Life Cycle
Customer-Centric PM: Anticipating Needs Across the Product Life Cycle
 
AI in Action The New Age of Intelligent Products and Sales Automation
AI in Action The New Age of Intelligent Products and Sales AutomationAI in Action The New Age of Intelligent Products and Sales Automation
AI in Action The New Age of Intelligent Products and Sales Automation
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Dernier (20)

Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Using ML to Protect Customer Privacy by fmr Amazon Sr PM

  • 1. Using ML to Protect Customer Privacy by fmr Amazon Sr PM www.productschool.com
  • 5. Using ML to protect customer privacy Pushpak Pujari PM at Verkada | ex Sr. PM at Amazon
  • 6. Bio • PM at Verkada for Security Cameras • Sr. PM at Amazon Alexa AI - Privacy • Sr. PM at Amazon Web Services IoT • Wharton MBA, EE from IIT Delhi • Hobbies: Tennis, Hiking, Beer Brewing
  • 7. Takeaways from this Webinar • Privacy fundamentals • Privacy preservation techniques • Using ML for privacy – a walkthrough • Strategies for being an impactful Privacy PM
  • 8. Why Privacy Matters Companies collect and retain tons of customer data: • Fulfilling a service request • Legal or regulatory requirements • Better CX: recommendations, marketing etc. • Reselling data to third parties (3P) Collected data can contain sensitive information. Such data falling into the wrong hands can be devastating for both the customer and the organization
  • 9. Why Privacy Matters • Data breaches happen far more frequently than you think • Data is spread across different organizations and media, so tracking data lineage is almost impossible • Rise of privacy laws (HIPAA, GDPR, CCPA, COPPA etc.) with more coming soon • Growing distrust of social media providers • Customers want transparency on how their data is being used
  • 10. What constitutes Personal Data • Direct identifiers • E.g.: Full name, address, SSN, phone number • Indirect identifiers • E.g.: location history, gender, demographic information, salary
  • 11. Data classification • Identified: contains direct or indirect identifiers • Pseudonymous: eliminate or transform direct identifiers • De-identified: direct and known indirect identifiers removed • Anonymous: mathematically proven to prevent re-identification [Diagram: personal data "John Doe" becomes pseudonymized data "eEf2gT_334"; personal data "Mary Jane" becomes anonymous data "********"; diagram labels: Random Noise, Key]
  • 12. Privacy vs Utility Tradeoff Picture credit: Mostly AI
  • 13. Stakeholders in Privacy Enactment • Compliance Team • Information Security • Legal • Privacy Engineering • Product
  • 14. Benefits of being Privacy-first • Avoid huge fines • Prevent loss of business licenses • Brand impact, trust • Customer loyalty and retention • Increase Customer Lifetime Value and higher conversion • Competitive moat Privacy-first positioning is table stakes
  • 15. Sources of Privacy Risk • Raw customer data and its derivatives • Metadata and logs • ML models For attackers, raw data is the holy grail, but ML models should not be ignored
  • 16. Privacy Risks from ML Models [Figure: a model's output distributions over predictions for members and non-members of the training dataset, evaluated on a test dataset of potential members; the delta between the two distributions denotes privacy risk] Source: Privacy-Preserving Machine Learning: Threats and Solutions
  • 17. Don’t be alarmed! • Locking customer data in a secure vault and throwing away the key is not the answer • The goal is to use customer data to deliver great CX without sacrificing customer privacy The rest of the presentation focuses on using ML to mitigate privacy risks while leaving enough utility in the data
  • 18. Privacy Preservation Techniques • Data Sanitization: Direct Identifier Detection and Filtering, Pseudonymization, K-anonymization, Differential Privacy • Privacy-preserving Computation: Homomorphic Encryption, Secure Multi-Party Computation, Trusted Execution Environments, Federated Learning
  • 19. Direct Identifiers Examples • Name • Address (all geographic subdivisions smaller than state) • All dates related to an individual • Telephone / Fax numbers • Email address • Social Security Number • Medical record number • Health plan beneficiary number • Any account number • Any certificate or license number • Vehicle identifiers including license plate numbers • Device identifiers and serial numbers • Web URLs • Internet Protocol (IP) Address • Biometrics including finger or voice print • Photographic image - not limited to images of the face
  • 20. Direct Identifier Detection and Filtering • Define a list of identifiers and scan datasets for said identifiers • Easiest to implement • No measurable guarantees • Needs humans in the loop • Maintaining and improving models is hard
  • 21. Pseudonymization • Map direct identifiers to unique tokens • Can be one-way or two-way • Easier to implement • Allows joins with other data tables • One-way: re-identification is impossible from the tokens • Two-way: original data can be extracted • Needs consistent implementation [Example: 4145 4455 3489 9985 tokenized as 41ss utoh dkjbg 9985]
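The one-way and two-way variants of pseudonymization can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not name a tokenization scheme, so a keyed hash (HMAC-SHA256) stands in for the one-way case and an in-memory lookup table for the two-way case. `SECRET_KEY`, `one_way_token` and `TokenVault` are hypothetical names, not part of the talk. A keyed hash is used because plain hashes of low-entropy identifiers such as phone numbers can be guessed by brute force.

```python
import hashlib
import hmac
import secrets

# Hypothetical key for illustration; a real system would fetch it from a key manager.
SECRET_KEY = b"example-key-not-for-production"


def one_way_token(identifier, key=SECRET_KEY):
    """One-way: a keyed hash gives the same token for the same input,
    so tables can still be joined on the token."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability (assumption)


class TokenVault:
    """Two-way: random tokens plus a lookup table that allows authorized reversal."""

    def __init__(self):
        self._forward = {}
        self._reverse = {}

    def tokenize(self, identifier):
        # Reuse the existing token so the mapping stays consistent across calls.
        if identifier not in self._forward:
            token = secrets.token_hex(8)
            self._forward[identifier] = token
            self._reverse[token] = identifier
        return self._forward[identifier]

    def detokenize(self, token):
        return self._reverse[token]
```

The consistency requirement from the slide shows up in both variants: the same identifier must always map to the same token, otherwise joins across tables break.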
  • 22. K-anonymization • Generalize quasi-identifiers so that each record is indistinguishable from at least k-1 other records • Stronger anonymization • Reduces data utility • Choosing an ideal k value is hard • Choosing generalization logic is hard [Example: zip codes 94401, 94454, 94432 generalized to 944*; ages 26, 24, 27, 29]
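To make the k-anonymization idea concrete, here is a minimal sketch that generalizes two quasi-identifiers and then measures the resulting k. The pairing of zip codes with ages, the decade-wide age bands and the function names `generalize` and `k_of` are assumptions made for illustration; the slide only shows the raw values.

```python
from collections import Counter


def generalize(record, zip_digits=3, age_band=10):
    """Coarsen a (zip_code, age) pair: keep a zip prefix and bucket the age."""
    zip_code, age = record
    if zip_digits < len(zip_code):
        zip_general = zip_code[:zip_digits] + "*"
    else:
        zip_general = zip_code
    low = (age // age_band) * age_band
    return (zip_general, f"{low}-{low + age_band - 1}")


def k_of(records, **kwargs):
    """k is the size of the smallest group of records that look identical
    after generalization."""
    groups = Counter(generalize(r, **kwargs) for r in records)
    return min(groups.values())


# Hypothetical records built from the values shown on the slide.
records = [("94401", 26), ("94454", 24), ("94432", 27), ("94401", 29)]
```

With the defaults every record becomes `("944*", "20-29")`, so the four records are indistinguishable from each other. Keeping full zip codes and exact ages (`zip_digits=5, age_band=1`) leaves every record unique, which illustrates the utility cost of generalization.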
  • 23. Differential Privacy • Query outcome does not depend on any one record • Measurable privacy guarantees • Hard to choose the right parameters • Not practical for a lot of use cases (yet) • Maintaining DP datasets over time is expensive Picture credit: Winton Research
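One common way to obtain the guarantee described here is the Laplace mechanism, sketched below for a counting query. The slides do not say which mechanism the speaker had in mind, so treat this as one standard option rather than the talk's method; `dp_count` and its parameters are hypothetical names. The privacy parameter `epsilon` is the hard-to-choose knob the slide mentions: smaller values add more noise.

```python
import random


def dp_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values that satisfy predicate.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is added.
    """
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    # The difference of two exponential draws with rate epsilon is
    # Laplace-distributed with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise
```

Each call returns a different answer, but the answers are centered on the true count, so aggregate analysis stays useful while any single record's influence is masked.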
  • 24. ML to detect direct identifiers: a walkthrough • Use cases: • [p0] Scan search phrases for direct identifiers, if found delete immediately • [p1] If an employee is trying to access customer data for customer analytics, ensure that it contains no direct identifiers • Functional requirements • Detect 5 types of identifiers: full name, address, telephone numbers, email id, SSN • en_US locale only • Goal Success Criteria – precision 70%, recall 95% • Non-functional requirements • [p0] Scan 1 query (~5 search words) in 250ms • [p1] Provide API for batch detection
  • 25. Ingredients for a spicy ML model • Training Data • Success Metrics • Model architecture • ML Infrastructure • Continuous improvement
  • 26. Training Data • Garbage in, garbage out: training data should be as close as possible to your runtime data in syntax and semantics • Human labeling challenges • Identifying which search phrases contain PII so they can be annotated • Ambiguity: high-quality ground truth requires multiple passes • Using actual customer data might lead to privacy exposure • Track labeling metrics because they directly impact model performance • Ensure size and diversity in training data to minimize overfitting and underfitting
  • 27. Metrics and Performance Evaluation • Precision and recall: which one is more important? • Sampling challenges with skewed identifier distribution • Measurement can be expensive • How frequently should you run the measurement workflow?
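As a small worked example of the two metrics, the sketch below computes precision and recall from labeled queries and checks them against the 70% precision and 95% recall targets stated in the walkthrough requirements. The function names and the boolean-list input format are assumptions for illustration.

```python
def precision_recall(predicted, actual):
    """predicted and actual are equal-length lists of booleans:
    True means the query was flagged as, or truly contains, a direct identifier."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # flagged and truly PII
    fp = sum(1 for p, a in pairs if p and not a)    # flagged but clean
    fn = sum(1 for p, a in pairs if not p and a)    # missed PII
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def meets_targets(precision, recall, min_precision=0.70, min_recall=0.95):
    """Targets default to the success criteria from the walkthrough slide."""
    return precision >= min_precision and recall >= min_recall
```

The asymmetric targets reflect the use case: a missed identifier (low recall) is a privacy exposure, while a false alarm (low precision) only costs an unnecessary deletion.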
  • 28. Model Architecture: Choose Your Weapon • Logistic Regression based binary classifiers • Easy to implement • Hard to attribute what is working and what isn’t • Regular Expression (Regex) • Highly effective for direct identifiers which have consistent schema • Dumb, hard to generalize, hard to expand and scale • NER (Stanford NER, Stanza, FLAIR, spaCy, transformers like BERT) • Ideal for names, addresses and context dependent identifiers • Computationally expensive, requires large training data • No one size fits all solution • Trial and Error based experimentation is key
  • 29. Model Architecture: Choose Your Weapon 1. Name - NER 2. Address - NER 3. Telephone numbers - Regex 4. Email address - Regex 5. Social Security Number - Regex
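For the three identifier types assigned to regex above, a minimal detector might look like the following. The patterns are simplified assumptions for the en_US locale, not the speaker's production rules: they cover common formats only (for example, SSNs written with dashes) and will miss other variants. `PATTERNS` and `detect_identifiers` are hypothetical names.

```python
import re

# Simplified en_US patterns (assumptions for illustration).
PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(
        r"(?<!\d)(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
    ),
}


def detect_identifiers(text):
    """Return a list of (label, matched_text) pairs found in text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group()))
    return found


def contains_direct_identifier(text):
    """True if any pattern matches; the p0 use case would delete such a query."""
    return bool(detect_identifiers(text))
```

Names and addresses have no consistent schema, which is why the slide routes them to NER models instead; regex handles only the identifiers whose format is predictable.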
  • 30. Infrastructure • All public cloud providers have offerings for training, testing, hosting and MLOps • Work with ML scientists to pick the framework of choice
  • 31. Continuous Improvement Workflow • Re-train your model periodically • Track model performance metrics regularly • Optimize training frequency • Watch out for model drift over time • Track labeling quality metrics regularly • Optimize labeling workflow
  • 33. The most rewarding PM opportunity Can seem technically challenging and ambiguous, but: • True opportunity to lead and stand out • Core Product Management • Tremendous learning opportunity to build specific skills for the data-first world • Truly multi-disciplinary, cutting across AI/ML/data, security, legal, compliance, cloud • Create positive impact and make the world a better place
  • 34. Strategies to Gain Leverage • Partner: Identify who cares (CISO, senior leadership) • Quantify: Quantify impact on brand and tie it to the organization’s business metrics • Goals: Work backwards from customer promises • Vision: Set an exciting and appealing North Star vision
  • 35. Strategies to Gain Leverage • Team: Put together a cross-team task force of curious people • Incremental: Build an incremental roadmap with a few quick wins • Visibility: Provide continuous visibility • Incentivize: Create an adoption plan with the right incentives
  • 36. Where to begin • Follow the data: chart the customer data lifecycle • Create threat map: where are humans in the loop? What tools do they use to access the data? • Identify use cases: privacy vs. utility tradeoff; identify drivers and define success metrics [Diagram: customer data lifecycle with stages Ingestion, Storage, Usage, Deletion]
  • 37. Best Practices • Stay abreast of new technology • Build a community • Join conferences • Experiment
  • 38. Resources • Visual guide to practical data de-identification: https://fpf.org/wp-content/uploads/2016/04/FPF_Visual-Guide-to-Practical-Data-DeID.pdf • Google's Patent on PII detection: https://patents.google.com/patent/US8561185B1/en • Microsoft Presidio: https://github.com/microsoft/presidio • Use NER mode to detect person names in text: https://pii-tools.com/detect-person-names-in-text/ • Custom NLP approaches to data anonymization: https://towardsdatascience.com/nlp-approaches-to-data-anonymization-1fb5bde6b929 • Detecting and redacting PII using Amazon Comprehend: https://aws.amazon.com/blogs/machine-learning/detecting-and-redacting-pii-using-amazon-comprehend/
  • 40. www.productschool.com Part-time Product Management Training Courses and Corporate Training