Do you have customers residing in California? If so, you need to prepare for the California Consumer Privacy Act (CCPA), which goes into effect in January 2020. Much like GDPR, CCPA mandates data privacy protection, in this case for California consumers. It covers personal information about consumers, households, and devices, and it defines that information broadly. It’s not just names and addresses or personal identifiers like driver’s license and Social Security numbers; it also includes geolocation data; records of personal property; products or services purchased, obtained, or even considered for purchase; browsing history; education information; professional information; and more. And you need to know where all that information is.
To ensure compliance, it’s time to put data profiling to work! You need rapid insight into your data sources, whether on traditional platforms or in your data lake. A cursory review of data samples is not enough: you need to find the outliers, because they reveal the places this information has spread to as it has been copied, reported, and delivered from central data stores.
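Pattern analysis is one common profiling technique for surfacing those outliers: each value is reduced to a character-class pattern, and the patterns are counted across the full column rather than a sample. The Python below is a minimal sketch of that general idea, not Trillium Discovery’s implementation; the function names, the "A"/"9" encoding, and the 5% rarity threshold are illustrative assumptions.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Map each character to a class: letters -> 'A', digits -> '9', others kept."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def profile_patterns(values, rare_threshold=0.05):
    """Count patterns over *all* values; return (counts, sorted rare patterns).

    A pattern is 'rare' when it covers less than rare_threshold of the rows.
    Rare patterns are the outliers worth reviewing by hand.
    """
    counts = Counter(value_pattern(v) for v in values)
    total = sum(counts.values())
    rare = sorted(p for p, n in counts.items() if n / total < rare_threshold)
    return counts, rare


# Hypothetical "phone" column with one SSN-shaped value buried in it.
phones = ["555-123-4567"] * 30 + ["555-987-6543"] * 20 + ["123-45-6789"]
counts, rare = profile_patterns(phones)
print(counts.most_common())  # [('999-999-9999', 50), ('999-99-9999', 1)]
print(rare)                  # ['999-99-9999']
```

A random sample of a few rows would very likely miss the single SSN-shaped entry; counting patterns over every row makes it stand out immediately.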
View this on-demand webinar, in which we talk through some of the salient points of CCPA and show you how to use Trillium Discovery to profile, assess, and evaluate your data sources to find data at risk.
1. Finding Data at Risk for CCPA Compliance
Harald Smith, Director Product Marketing
Jeff Knotts, Principal Sales Engineer
2. Housekeeping
Webcast Audio
• Today’s webcast audio is streamed through your computer speakers.
• If you need technical assistance with the web interface or audio, please reach out to us using the chat window.
Questions Welcome
• Submit your questions at any time during the presentation using the chat window.
• We will answer them during our Q&A session following the presentation.
Recording and slides
• This webcast is being recorded. You will receive an email following the webcast with a link to download both the recording and the slides.
3. Speakers
Harald Smith
• Director of Product Marketing, Syncsort
• 20+ years in Information Management: focus on data quality, integration, and governance
• Co-author of Patterns of Information Management
• Author of two Redbooks on Information Governance and Data Integration
• Blog author: currently through Dataversity; previously “Data Democratized”
Jeff Knotts
• Principal Sales Engineer, Syncsort
• 25+ years in Data Quality and Data Governance
4. Data Governance is top of mind
Volume and complexity of data are growing
Broader and deeper compliance & regulation
Timeline: May 2018 (GDPR) to Jan 2020 (CCPA)
5. Data Governance is enterprise-wide
We are all responsible for enterprise data and for ensuring compliance with regulations
More than just training!
We must:
• Know the data we are working with
• Know where affected data resides
• Actively consider data use, and whether it is valid for that purpose
7. CCPA is essentially about knowing:
• What types of personal & sensitive data you hold
• What you are doing with it and how you are using it
• Whether you have permission to sell it
• What specific data you have stored and what it looks like
• Whether the data has suffered a breach
• How you are keeping it SAFE
8. Customers are now recognizing their new power
• “What do you know about me?” Right to access data and receive a copy of it
• “Did you get my consent?” Right to restrict data shared on minors
• “Erase all my data for good!” Right to be forgotten
• “Has my data been breached?” Right to be informed within 72 hours
• “How do you use my data?” Right to limit processing of personal data, and whether it can be sold
• “Demand human interaction!” Right not to participate in fully automated decisions based on a customer profile
9. What is personal data under CCPA?
• Identifiers: Real name, alias, postal address, unique personal identifier, online identifier, IP address, email address, account name, Social Security number, driver’s license number, passport number, signature, bank account, credit/debit card numbers, medical or health insurance information, physical characteristics or description, or other similar identifiers
• Protected Categories: Race, religion, gender, national origin, sexual orientation, etc.
• Commercial information: Records of personal property, products or services purchased, obtained, or considered, or other purchasing or consuming histories or tendencies
• Biometrics, Location, etc.: Biometric information, geolocation data, audio, electronic, visual, thermal, olfactory, or similar information
• Internet Activity: Including, but not limited to, browsing history, search history, and information regarding a consumer’s interaction with a website, application, or advertisement
• Background information: Professional or employment-related information, education information
• Inferences: Drawn from any of the information above to create a profile about a consumer reflecting the consumer’s preferences, characteristics, psychological trends, predispositions, behavior, attitudes, intelligence, abilities, and aptitudes
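Some of these categories, such as Social Security numbers, email addresses, and IP addresses, have recognizable formats, so a first pass at finding them buried in free-text or mis-fielded columns can be a regular-expression scan. The sketch below assumes simple, illustrative patterns; it is not a complete definition of CCPA identifiers, real scanners add validation to limit false positives, and categories such as inferences or protected characteristics generally cannot be detected by format alone.

```python
import re

# Illustrative, deliberately simple patterns (assumptions, not an authoritative
# or complete definition of CCPA identifiers).
IDENTIFIER_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_column(values):
    """Return {identifier_type: number_of_values_containing_it}, omitting zeros."""
    hits = {name: 0 for name in IDENTIFIER_PATTERNS}
    for value in values:
        for name, pattern in IDENTIFIER_PATTERNS.items():
            if pattern.search(value):
                hits[name] += 1
    return {name: n for name, n in hits.items() if n > 0}


# Hypothetical "notes" column from a customer-service table.
notes = [
    "Customer called about order 1182",
    "Verified SSN 123-45-6789 over the phone",
    "Send receipt to pat@example.com",
    "Login from 203.0.113.7 flagged",
]
print(scan_column(notes))  # {'ssn': 1, 'email': 1, 'ipv4': 1}
```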
10. Where is all that data? What connects with what?
Multiple touchpoints/databases
MDM, ERP, CRM, Website, ODS, EDW, Data Lake, Marketing Data Mart, Data Science sandbox, Reports & spreadsheets, BI reports, and unknown locations (?)
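Once data flows are documented, “what connects with what?” becomes a reachability question. The sketch below models a hypothetical lineage map (the edges are invented for illustration, not taken from the slide) and uses breadth-first search to list every system that may hold copies of data originating in one source.

```python
from collections import deque

# Hypothetical lineage map: each system -> systems it feeds copies of data to.
LINEAGE = {
    "CRM": ["MDM", "Marketing Data Mart"],
    "ERP": ["EDW"],
    "MDM": ["EDW"],
    "EDW": ["Data Lake", "BI reports"],
    "Data Lake": ["Data Science sandbox"],
    "Marketing Data Mart": ["Reports & spreadsheets"],
}


def downstream(source, lineage):
    """Breadth-first search: return every system reachable from `source`."""
    seen = {source}
    queue = deque([source])
    while queue:
        system = queue.popleft()
        for target in lineage.get(system, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(source)  # report only the downstream copies
    return seen


print(sorted(downstream("CRM", LINEAGE)))
# ['BI reports', 'Data Lake', 'Data Science sandbox', 'EDW', 'MDM',
#  'Marketing Data Mart', 'Reports & spreadsheets']
```

A map like this only covers flows someone has recorded; the “?” locations are why profiling the systems themselves still matters.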
11. CCPA – where DQ helps deliver compliance
1. Data Profiling
• Confirms actual content vs. metadata
• Highlights patterns, types of data, mis-fielded data, and outlying data not conforming to policy, formatting, structure, syntax, etc.
• Exposes relevant fields with buried, unexpected personal & sensitive data
• Applies business rules to data to identify and monitor for data issues
2. Data Quality Processing
• Real-time & batch data cleansing & matching across multiple data sources, generating a Single Customer View (SCV) so businesses can quickly locate all of a customer’s records from a single record
• An SCV also means customer permissions are respected, records can be amended, suppressed, or deleted, and businesses can react to consumer requests quickly
• Full traceability of the original data source
• Documented DQ routines for transparency & auditing (e.g. user & process control, security)
3. Data Integration
• Integration with Data Governance tools (e.g. Collibra) triggers issue management and controls, so that CCPA rules and reports (and overall compliance) can be easily understood and monitored by data stewards
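Matching is what turns scattered records into a Single Customer View. As a simplified sketch, assuming exact matching on a normalized name and email (production matching typically uses fuzzy comparison and many more attributes), records from several hypothetical source systems can be grouped under a shared key so that a consumer request can locate every copy in one lookup. The normalization here also drops non-ASCII letters, which is a further simplification.

```python
import re
from collections import defaultdict


def match_key(record):
    """Hypothetical match key: lowercase name with non a-z characters removed,
    plus trimmed, lowercased email."""
    name = re.sub(r"[^a-z]", "", record["name"].lower())
    email = record["email"].strip().lower()
    return (name, email)


def single_customer_view(sources):
    """Group (system, record id) pairs from all sources under shared match keys."""
    groups = defaultdict(list)
    for system, records in sources.items():
        for rec in records:
            groups[match_key(rec)].append((system, rec["id"]))
    return groups


sources = {
    "CRM": [{"id": "C1", "name": "Pat O'Neil", "email": "Pat@Example.com"}],
    "ERP": [{"id": "E7", "name": "pat oneil", "email": "pat@example.com "}],
    "Website": [{"id": "W3", "name": "Lee Wong", "email": "lee@example.org"}],
}
scv = single_customer_view(sources)
# A deletion request for Pat can now find every copy in one lookup.
print(scv[("patoneil", "pat@example.com")])  # [('CRM', 'C1'), ('ERP', 'E7')]
```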
13. Demo: Trillium Discovery
Watch for 3 Scenarios
1. Initiating data profiling to discover content
2. Reviewing profiled results, looking for patterns, anomalies, and other content
3. Discovering related content through join analysis
Notice key features
• Wizard-driven data source selection
• Visual and text-based review
• Exploration and drilldown to any content
• Annotation and export of findings
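Join analysis looks for columns in different tables whose values overlap, which suggests the tables can be joined; once a field holding personal data is found, that points to related content elsewhere. The sketch below is a hypothetical, one-directional value-overlap check, not Trillium Discovery’s join analysis; the table contents and the 0.8 cutoff are assumptions.

```python
def overlap_ratio(left, right):
    """Share of distinct non-empty values in `left` that also appear in `right`."""
    left_set = {v for v in left if v}
    right_set = {v for v in right if v}
    if not left_set:
        return 0.0
    return len(left_set & right_set) / len(left_set)


def candidate_joins(tables, min_ratio=0.8):
    """Compare each column pair across different tables; keep high-overlap pairs."""
    found = []
    names = list(tables)
    for i, t1 in enumerate(names):
        for t2 in names[i + 1:]:
            for c1, v1 in tables[t1].items():
                for c2, v2 in tables[t2].items():
                    ratio = overlap_ratio(v1, v2)
                    if ratio >= min_ratio:
                        found.append((f"{t1}.{c1}", f"{t2}.{c2}", ratio))
    return found


# Hypothetical tables: column name -> list of values.
tables = {
    "orders": {"cust_ref": ["C1", "C2", "C3"], "sku": ["K9", "K8", "K9"]},
    "crm": {"customer_id": ["C1", "C2", "C3", "C4"],
            "city": ["Fresno", "Napa", "Davis", "Chico"]},
}
print(candidate_joins(tables))  # [('orders.cust_ref', 'crm.customer_id', 1.0)]
```

Because the ratio is measured only from the earlier table’s side, a fuller version would also check the reverse direction and sample-size effects.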
14. Culture of Data Literacy
Meeting CCPA compliance requires cultural support
• Empowered to ask questions about the data
• Trained to understand the business context, compliance requirements, and appropriate use of data
• Trained to understand and evaluate data
  • Traditional data, Big data, 3rd party data, …
• Empowered to identify data at risk
Program of Data Governance
• Provide the processes, practices, and tools necessary for success
• Identify, evaluate, and document
• Continuous iteration and development
• Communicate what you’ve discovered! (and where others can find it!)
Center of Excellence/Knowledge Base
• Where do you go to find answers?
• Who can help show you how?
Communicate!
15. Integrate findings through Collibra
Collibra Data Governance Center
• Enables non-technical users to define business policies and data quality rules in plain language
• Makes data quality results available to all users
Trillium Discovery
• Imports DGC business rules so technical users can convert them to executable data quality rules
• Continuously runs data quality metrics in near real time and passes results back to Collibra dashboards
Rulebooks to rules; quality test results
Bi-directional connectivity; constant sync
Metrics falling below thresholds can trigger workflow in Collibra Issue Management
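To illustrate the rules-to-metrics loop on this slide, the sketch below turns a hypothetical plain-language policy into an executable rule, computes a pass-rate metric over a column, and compares it with a threshold. It does not use any Collibra or Trillium API; the `Rule` class, the rule name, the email regex, and the 98% threshold are assumptions, and the workflow trigger is only a placeholder print.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    check: Callable[[str], bool]  # returns True when a value conforms
    threshold: float              # minimum acceptable pass rate, 0..1


def evaluate(rule, values):
    """Return (pass_rate, breached) for one rule over a column of values."""
    if not values:
        return 1.0, False
    passed = sum(1 for v in values if rule.check(v))
    rate = passed / len(values)
    return rate, rate < rule.threshold


# Hypothetical policy "Email fields hold only email addresses" as an executable rule.
email_rule = Rule(
    name="email_field_contains_only_emails",
    check=lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    threshold=0.98,
)

column = ["a@example.com"] * 45 + ["123-45-6789"] * 5
rate, breached = evaluate(email_rule, column)
if breached:
    # Placeholder: an integration would raise an issue-management workflow here.
    print(f"{email_rule.name}: pass rate {rate:.0%} below {email_rule.threshold:.0%}")
```

Run on a schedule, the same evaluation produces the metric history a governance dashboard would display.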
16. Summary
Addressing CCPA
• CCPA mandates tight control of customer data!
• Without data governance, at-risk data will propagate, resulting in customers’ wishes and demands being misunderstood and disregarded.
• Over time, this will inevitably escalate to non-compliance with CCPA, and fines!
Finding Data at Risk
1. Understand that everyone in your organization has responsibility!
2. Understand the requirements of CCPA: what data is at risk
3. Profile and review data in use; ask questions!
4. Communicate findings to ensure data remains in compliance with CCPA!
Data Quality helps ensure CCPA compliance!
17. Further Resources
• Syncsort Trillium for Data Governance
• Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for the Data Lake
• How to Strengthen Enterprise Data Governance with Data Quality
• Unlocking Greater Insights with Integrated Data Quality for Collibra
harald.smith@syncsort.com
jeff.knotts@syncsort.com