Tenant-based encryption in Flink
An introduction to the challenges
of tenanted data-streaming in WORKDAY
Enrico Agnoli
Sr. Software Engineer
Machine Learning Foundation
Leire Fernandez de Retana
Sr. Software Engineer
Machine Learning Foundation
Workday Inc.
Abstract
At WORKDAY Inc. we process data for thousands of customers, and our strict security regulations demand that we
always encrypt customer data at rest and in transit. That means each piece of data must always be stored
encrypted with the customer's key.
This is a challenge in a Data Streaming platform like Flink, where data may be persisted in multiple phases:
• Storage of States in Checkpoints or Savepoints
• Temporary fs storage for time-window aggregation
• Common spilling to disk when heap is full
On top of that, we need to consider that in a Flink dataflow the data might get manipulated. After the manipulation we
need to maintain the context needed to correctly encrypt it.
We solved this challenge by extending the serialization libraries (AVRO) to enable encryption at serialization.
In this talk we will walk through the complexity of runtime encryption in a multi-tenant data streaming world,
how we solved it, and how we support data traceability for GDPR.
Agenda
- Introduction to the business case
- Flink @ Workday
- Our special requirements
- Data Streaming in a multi-tenancy world
- Solution overview
- Summary
- Other Challenges
Introduction to the business case
What do we do
About Workday
A leading provider of enterprise cloud applications, Workday delivers financial management,
human capital management, planning, and analytics applications designed for the world's largest
organizations.
• Founded in 2005
• Over 11k employees; a NASDAQ-listed company.
• Development headquarters in California and Ireland.
• Over 100 billion transactions were processed with Workday in FY19, a 50% increase YoY
• A community of 40 million workers, including 40 percent of Fortune 500 organizations
• Awards:
‒ #1 2018 Future 50 Fortune, best prospects for long-term growth.
‒ #4 2019 100 Best Companies to Work For, Fortune
‒ #2 2019 50 Best Workplaces in Technology, Fortune
‒ ….
The Leading Enterprise Cloud for Finance and HR
40 Million +
workers
100 Billion +
transactions per year
96.1%
transactions < 1 second
99.9%
actual availability
~200
companies
#1
Future 50, Fortune
#2
40 Best Workplaces in
Technology, Fortune
10 Thousand +
certified resources
in the ecosystem
Flink @Workday
How we do Data Streaming
Why does Workday need Flink?
- Workday's success is tied to a monolith service
- Expansion brought more microservices and external platforms, so
data processing can happen outside TS
- Machine Learning is expensive and can’t be done separately for
each customer, so it needs a single separate system
- Flink allows us to quickly develop and deploy logic to
correlate/aggregate data from and to multiple services
Infrastructure
- How we deploy
- OpenStack + K8s, 1 job per cluster
- Tooling to deploy jobs on Workday’s OpenStack platform
- Plugs into Workday’s metrics, logs and stats infrastructures
- Accelerates job development by providing libraries that integrate with the internal stack
- Jobs for ML and data transfer
- Currently supporting 6 different Flink jobs across 6 datacenters, ingesting
hundreds of millions of messages per week
Jobs
- DataPipelines
- To thousands of S3 buckets
- To thousands of HDFS buckets
- Enriching data with async calls
- Mainly for offline/online model training
- Anomaly detection
- Generic Inference Services
- Run inference on data using the models created offline
- E.g. helping with financial transaction bookkeeping
Our special requirements
Introducing the main requirements at the base of this work
Multi-tenancy
From Gartner glossary: Multitenancy
- 1 Tenant ≃ 1 Customer
Security requirement
Requirement: At WORKDAY Inc. we process data for thousands of
customers and our strict security regulations demand we always
encrypt customer data at rest and in transit.
Data Streaming
in a multi-tenancy world
Given the requirements above, see how the architecture is impacted
Common Flink architecture
[Diagram components: Event production, Bus, Data Streaming Platform, DataLake]
This translates to these internal (de)serializations
Flink architecture in WORKDAY
[Diagram components: Event production, Bus, Data Streaming Platform, DataLake]
Where we need to look: possible issues
Solution overview
Give a high level diagram/explanation of the solution
Initial attempt - wrapping the message
We started by wrapping the data in a “container” in which every interaction
with the real data goes through accessor logic (get/setPayload) that encrypts/decrypts
it as needed.
However, this is not enough:
- Filtering, keying and mapping data are essential in streaming, and a simple
wrapper doesn’t work for them
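The wrapper idea can be sketched as follows. This is only an illustration under stated assumptions: the class name `EncryptedContainer`, the use of AES-GCM, and the raw 16-byte tenant key are all hypothetical and are not taken from the talk. Only the get/setPayload accessor idea comes from the slide.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

/** Hypothetical container: the payload is only ever held encrypted with the tenant key. */
final class EncryptedContainer {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    private final SecretKeySpec tenantKey; // assumption: one AES key per tenant
    private byte[] ivAndCiphertext;        // what would end up in state or on disk

    EncryptedContainer(byte[] rawTenantKey) {
        this.tenantKey = new SecretKeySpec(rawTenantKey, "AES");
    }

    /** Encrypts the payload with a fresh random IV and stores IV + ciphertext. */
    void setPayload(String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, tenantKey, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            ivAndCiphertext = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    /** Decrypts on every access; the caller now holds an unprotected copy. */
    String getPayload() {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, tenantKey,
                    new GCMParameterSpec(TAG_BITS, ivAndCiphertext, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(ivAndCiphertext, IV_BYTES, ivAndCiphertext.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }

    byte[] storedBytes() {
        return ivAndCiphertext.clone();
    }
}
```

One reading of why this falls short: every filter, key selector or map function has to call `getPayload()` itself, and whatever it extracts or emits is a new object that the container no longer protects unless the operator re-wraps it by hand.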
Solution: Encryption at serialization
The solution is to handle encryption at serialization: any time the object
is serialized by the platform (FLINK), encryption is applied
transparently.
Two options:
- Extend Flink
- Extend AVRO
Encryption via AVRO
Implementation overview
Three main parts:
- A serialization library that delegates the encryption work to the
object and is executed transparently by Flink
- A common library where we define the encryption logic
- A process that lets us mark objects as encryptable (sensitive)
so the new serialization kicks in
Serialization library
How we delegate some work to the object itself...
AVRO - new interface
SerializeFinalizationDelegate.java
package org.apache.avro;

import org.apache.avro.io.Decoder;
import org.apache.avro.io.Encoder;
import java.io.ByteArrayOutputStream;

// Implemented by records that want to post-process their own serialized bytes.
public interface SerializeFinalizationDelegate {
    // Receives the already-serialized record and writes its final form to finalEncoder.
    void afterSerialization(ByteArrayOutputStream serializedData, Encoder finalEncoder);
    // Returns the decoder that the regular deserialization should read from.
    Decoder beforeDeserialization(Decoder dataToDecode);
}
AVRO - create 2 hooks in AVRO writer/reader
GenericDatumWriter.java
public void write(D datum, Encoder out) throws IOException {
    // Check whether the datum wants to finalize its own serialization
    if (datum instanceof SerializeFinalizationDelegate) {
        // Serialize into a separate buffer instead of writing directly to the
        // output stream attached to the received (out) encoder
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Encoder newEncoder = EncoderFactory.get().binaryEncoder(baos, null);
        // Run the standard serialization
        write(root, datum, newEncoder);
        // Flush the buffer, then delegate the finalization
        newEncoder.flush();
        SerializeFinalizationDelegate delegate = (SerializeFinalizationDelegate) datum;
        delegate.afterSerialization(baos, out);
    } else {
        write(root, datum, out);
    }
}
GenericDatumReader.java
public D read(D reuse, Decoder in) throws IOException {
    try {
        // Make sure an instance exists so its hook can run before decoding
        Class<?> clazz = Class.forName(actual.getFullName());
        Constructor<?> constructor = clazz.getConstructor();
        if (reuse == null) {
            reuse = (D) constructor.newInstance();
        }
        // Let the object replace the decoder (e.g. with one over decrypted bytes)
        if (reuse instanceof SerializeFinalizationDelegate) {
            SerializeFinalizationDelegate delegate = (SerializeFinalizationDelegate) reuse;
            in = delegate.beforeDeserialization(in);
        }
    } catch (InstantiationException | InvocationTargetException | NoSuchMethodException |
            IllegalAccessException e) {
        LOG.debug("Not possible to instantiate object of the class.");
    } catch (ClassNotFoundException e) {
        LOG.debug("The class can't be found in the classLoader, skip...");
    }
    // Standard Avro deserialization from here on
    ResolvingDecoder resolver = getResolver(actual, expected);
    resolver.configure(in);
    D result = (D) read(reuse, expected, resolver);
    resolver.drain();
    return result;
}
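The two hooks follow a buffer-then-delegate pattern that can be tried without Avro on the classpath. The sketch below is a stand-in under stated assumptions: it uses `java.io` data streams in place of Avro's `Encoder`/`Decoder`, and `SimpleRecord`, `FinalizationHook`, `PlainNote`, `HookedNote` and `HookedCodec` are hypothetical names. The "finalization" only inverts the bits of the serialized bytes as a visible placeholder for encryption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical minimal record contract, standing in for Avro's field (de)serialization. */
interface SimpleRecord {
    void writeFields(DataOutputStream out) throws IOException;
    void readFields(DataInputStream in) throws IOException;
}

/** Stand-in mirroring the shape of SerializeFinalizationDelegate. */
interface FinalizationHook {
    void afterSerialization(ByteArrayOutputStream serializedData, DataOutputStream finalOut) throws IOException;
    DataInputStream beforeDeserialization(DataInputStream in) throws IOException;
}

class PlainNote implements SimpleRecord {
    String text = "";
    public void writeFields(DataOutputStream out) throws IOException { out.writeUTF(text); }
    public void readFields(DataInputStream in) throws IOException { text = in.readUTF(); }
}

class HookedNote extends PlainNote implements FinalizationHook {
    public void afterSerialization(ByteArrayOutputStream serializedData, DataOutputStream finalOut)
            throws IOException {
        byte[] body = invert(serializedData.toByteArray());
        finalOut.writeInt(body.length); // length prefix so the reader knows how much to take
        finalOut.write(body);
    }

    public DataInputStream beforeDeserialization(DataInputStream in) throws IOException {
        byte[] body = new byte[in.readInt()];
        in.readFully(body);
        return new DataInputStream(new ByteArrayInputStream(invert(body)));
    }

    /** Placeholder transformation; a real delegate would encrypt/decrypt here. */
    private static byte[] invert(byte[] src) {
        byte[] dst = new byte[src.length];
        for (int i = 0; i < src.length; i++) dst[i] = (byte) ~src[i];
        return dst;
    }
}

final class HookedCodec {
    static byte[] write(SimpleRecord datum) {
        try {
            ByteArrayOutputStream target = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(target);
            if (datum instanceof FinalizationHook) {
                // Serialize into a side buffer, flush, then let the object finalize it
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                DataOutputStream bufferOut = new DataOutputStream(buffer);
                datum.writeFields(bufferOut);
                bufferOut.flush();
                ((FinalizationHook) datum).afterSerialization(buffer, out);
            } else {
                datum.writeFields(out);
            }
            out.flush();
            return target.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static <T extends SimpleRecord> T read(T reuse, byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            if (reuse instanceof FinalizationHook) {
                // The object swaps in a stream over the restored bytes
                in = ((FinalizationHook) reuse).beforeDeserialization(in);
            }
            reuse.readFields(in);
            return reuse;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Records that do not implement the hook take the unchanged path, which matches the `else` branch of the writer above: existing non-tenanted types keep their wire format.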
Logic to encrypt data
How the delegation can be used to encrypt the object
AVRO - details - implementation example
SerializeWithTenantEncryption.java
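As a rough illustration of the kind of encryption logic a delegate could run (this is not the SerializeWithTenantEncryption.java shown in the talk), the sketch below seals already-serialized bytes with a per-tenant AES-GCM key and opens them again. Everything here is an assumption: the names `TenantKeys` and `TenantEnvelope`, the in-memory key store, the envelope layout, and the choice to carry the tenant id in clear so the reader can pick the key.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory key store; a real system would ask a key management service. */
final class TenantKeys {
    private static final Map<String, SecretKeySpec> KEYS = new ConcurrentHashMap<>();

    static void register(String tenant, byte[] rawAesKey) {
        KEYS.put(tenant, new SecretKeySpec(rawAesKey, "AES"));
    }

    static SecretKeySpec keyFor(String tenant) {
        SecretKeySpec key = KEYS.get(tenant);
        if (key == null) throw new IllegalArgumentException("No key for tenant " + tenant);
        return key;
    }
}

/** Assumed envelope layout: tenant id (UTF), 12-byte IV, ciphertext length, ciphertext. */
final class TenantEnvelope {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    /** What an afterSerialization-style hook could do with the serialized record. */
    static byte[] seal(String tenant, byte[] serialized) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, TenantKeys.keyFor(tenant), new GCMParameterSpec(TAG_BITS, iv));
            // Bind the clear-text tenant id to the ciphertext so it cannot be swapped
            cipher.updateAAD(tenant.getBytes(StandardCharsets.UTF_8));
            byte[] ct = cipher.doFinal(serialized);

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(tenant);
            out.write(iv);
            out.writeInt(ct.length);
            out.write(ct);
            out.flush();
            return bytes.toByteArray();
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("Could not seal data for tenant " + tenant, e);
        }
    }

    /** What a beforeDeserialization-style hook could do before handing bytes to the reader. */
    static byte[] open(byte[] envelope) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(envelope));
            String tenant = in.readUTF();
            byte[] iv = new byte[IV_BYTES];
            in.readFully(iv);
            byte[] ct = new byte[in.readInt()];
            in.readFully(ct);

            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, TenantKeys.keyFor(tenant), new GCMParameterSpec(TAG_BITS, iv));
            cipher.updateAAD(tenant.getBytes(StandardCharsets.UTF_8));
            return cipher.doFinal(ct);
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("Could not open envelope", e);
        }
    }
}
```

A fresh random IV per call means the same record sealed twice yields different bytes, so the sealed form should not be relied on for equality or keying.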
Definition of tenanted data
How we generate the Java classes used on the platform
Use the schemas
A pipeline with a Gradle project builds our POJOs out of .avsc files
https://github.com/davidmc24/gradle-avro-plugin
Then in the Flink project we use these classes as a dependency! DONE!
Let’s look at the template...
task generateAvroTenanted(type: com.commercehub.gradle.plugin.avro.GenerateAvroJavaTask) {
    // tenanted schemas
    source("${rootDir}/basicTypes", generatedAvroTenantedSchemas)
    outputDir = generatedJavaTenantedSources
    templateDirectory = "$rootDir/avro_compiler_tenanted_templates/"
}
Modify avro templates
Modified the standard template at avro/compiler/specific/templates/java/classic/record.vm
- So the generated class is “tenanted”
- If a piece of the class is extracted, the context should be passed along
...
public class ${this.mangle($schema.getName())}#if ($schema.isError()) extends org.apache.avro.workday.specific.SpecificExceptionBase#else extends SerializeWithTenantEncryption#end implements org.apache.avro.workday.specific.SpecificRecord {
...
public ${this.javaType($field.schema())} ${this.generateGetMethod($schema, $field)}() {
    ${this.javaType($field.schema())} local_value = ${this.mangle($field.name(), $schema.isError())};
    if (SerializeWithTenantEncryption.class.isInstance(local_value)) {
        SerializeWithTenantEncryption.class.cast(local_value).__setTenant(this.__getTenant());
    }
    return local_value;
}
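To make the getter template concrete, here is roughly what the generated code amounts to for a record with one nested record field and one plain field. The classes are hypothetical: `TenantAware` stands in for SerializeWithTenantEncryption, and `Employee`/`Address` are invented schema types. Only the `__setTenant`/`__getTenant` calls and the `isInstance`/`cast` pattern come from the template.

```java
/** Stand-in for the SerializeWithTenantEncryption base class; only the tenant context is modeled. */
abstract class TenantAware {
    private String tenant;

    public void __setTenant(String tenant) { this.tenant = tenant; }

    public String __getTenant() { return tenant; }
}

/** Hypothetical nested record. */
final class Address extends TenantAware {
    final String city;

    Address(String city) { this.city = city; }
}

/** Hypothetical parent record with getters shaped like the template output. */
final class Employee extends TenantAware {
    private final String name;
    private final Address address;

    Employee(String name, Address address) {
        this.name = name;
        this.address = address;
    }

    public Address getAddress() {
        Address localValue = this.address;
        // Nested tenanted record: push the parent's tenant down before handing it out
        if (TenantAware.class.isInstance(localValue)) {
            TenantAware.class.cast(localValue).__setTenant(this.__getTenant());
        }
        return localValue;
    }

    public String getName() {
        String localValue = this.name;
        // Same generated check; it is simply false for a String (and for null)
        if (TenantAware.class.isInstance(localValue)) {
            TenantAware.class.cast(localValue).__setTenant(this.__getTenant());
        }
        return localValue;
    }
}
```

Using `Class.isInstance` instead of a typed check lets one template body serve every field type, and it returns false for null, so unset fields pass through untouched. After `employee.getAddress()`, the extracted `Address` carries the tenant it needs if it is later serialized on its own.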
Summary
How the flow works
• A schema is defined and a Java class is generated from it
• Data is produced using the schema class
‒ Avro serializes the data
‒ Then finds it is a tenanted message
‒ Delegates the encryption to the message
‒ Bytes are sent
• Flink receives the message and
Avro sees it is tenanted data
‒ Delegates the decryption to the object
‒ Then deserializes
‒ If a piece of data is extracted,
the context is pushed down
• All done transparently at every serialization
Other challenges
- OpenStack cluster management
- Need to interact with the FLINK API: changes are painful
- Flink UI can’t be used as we can’t access PROD endpoints
- Anomaly detection job with 120GB+ of state
- S3 writing to thousands of folders blocks the Sink
- Tests/performance runs on AWS services can be expensive
- End-to-end tests are complex because of
- Production-like data distributions
- Encryption logic
- Flink ParameterTool flexibility
- We read some properties from the local FS, but these differ between TM and JM
Thank You
Any questions?
enrico.agnoli@workday.com
leire.retana@workday.com

Multi Tenanted Streams @Workday - Enrico Agnoli & Leire Fernandez de Retana Roitegui, Workday

  • 1. Tenant-based encryption in Flink An introduction to the challenges of tenanted data-streaming in WORKDAY
  • 2. Enrico Agnoli Sr. Software Engineer Machine Learning Foundation Leire Fernandez de Retana Sr. Software Engineer Machine Learning Foundation Workday Inc.
  • 3. Abstract At WORKDAY Inc. we process data for thousands of customers and our strict security regulations demand we always encrypt customer data at rest and in transit. That means, each piece of data should always be stored, encrypted with the customer key. This is a challenge in a Data Streaming platform like Flink, where data may be persisted in multiple phases: • Storage of States in Checkpoints or Savepoints • Temporary fs storage for time-window aggregation • Common spilling to disk when heap is full On top of that, we need to consider that in a Flink dataflow the data might get manipulated. After the manipulation we need to maintain the context needed to correctly encrypt it. We solved this challenge by extending the serialization libraries (AVRO) to enable encryption at serialization. In this talk we will walk through the complexity of having a runtime encryption in a multi-tenant data streaming world, how we solved it and and how we support data traceability for GDPR.
  • 4. Agenda - Introduction to the business case - Flink @ Workday - Our special requirements - Data Streaming in a multi-tenancy world - Solution overview - Summary - Other Challenges
  • 5. Introduction to the business case What do we do
  • 6. About Workday A leading provider of enterprise cloud applications, Workday delivers financial management, human capital management, planning, and analytics applications designed for the world's largest organizations. • Founded in 2005 • Over 11k employees and a NASDAQ listed company. • Development headquarters in California and Ireland. • Over 100 billion transactions were processed with Workday in FY19, a 50% increase YoY • A community of 40 million workers, including 40 percent of Fortune 500 organizations • Awards: ‒ #1 2018 Future 50 Fortune, best prospects for long-term growth. ‒ #4 2019 100 Best Companies to Work For, Fortune ‒ #2 2019 50 Best Workplaces in Technology, Fortune ‒ ….
  • 7. The Leading Enterprise Cloud for Finance and HR 40 Million + workers 100 Billion + transactions per year 96.1% transactions < 1 seconds 99.9% actual availability ~200 companies #1 Future 50, Fortune #2 40 Best Workplaces in Technology, Fortune 10 Thousand + certified resources in the ecosystem
  • 8. Flink @Workday How we do Data Streaming
  • 9. Why does Workday need Flink? - Workday’s success is tied to a Monolith Service - Expansion brought more microservices and external platforms; data processing can happen outside TS - Machine Learning is expensive and can’t be done individually for each customer; it needs a single separate system - Flink allows us to quickly develop and deploy logic to correlate/aggregate data from and to multiple services
  • 10. Infrastructure - How we deploy - Openstack + K8, 1 job per cluster - Tooling to deploy jobs on Workday’s Openstack platform - Plugs into Workday’s metrics, logs and stats infrastructures - Accelerates job development by providing libraries to integrate with the internal stack - Jobs for ML and data transfer - Currently supporting 6 different Flink jobs across 6 datacenters, ingesting hundreds of millions of messages per week
  • 11. Jobs - DataPipelines - To thousands of S3 buckets - To thousands of HDFS buckets - Enriching data with async calls - Mainly to do offline/online model training - Anomalies-Detection - Generic Inference Services - Do inference on data using the models created offline - E.g.: help financial transaction bookkeeping
  • 12. Our special requirements Introducing the main requirements at the base of this work
  • 13. Multi-tenancy From Gartner glossary: Multitenancy - 1 Tenant ≃ 1 Customer
  • 14. Security requirement Requirement: At WORKDAY Inc. we process data for thousands of customers and our strict security regulations demand we always encrypt customer data at rest and in transit.
  • 15. Data Streaming in a multi-tenancy world Given the requirements above, see how the architecture is impacted
  • 16. Common Flink architecture Event production Bus Data Streaming Platform DataLake
  • 17. This translates to these internal (de)serializations
  • 18. Flink architecture in WORKDAY Event production Bus Data Streaming Platform DataLake
  • 19. Where we need to look at - possible issues
  • 20. Solution overview Give a high level diagram/explanation of the solution
  • 21. Initial attempt - wrapping the message We started by wrapping the data in a “container” where all interaction with the real data is controlled by logic (get/setPayload) that encrypts/decrypts it as needed. However, this is not enough: - Key to streaming is the ability to filter, key and map data: a simple wrapper doesn’t work
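A minimal sketch of what such a wrapper could look like, using only the JDK. The class name `EncryptedContainer`, the AES-GCM choice and the way the tenant key is obtained are assumptions made for illustration; the slides only mention a container with get/setPayload.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical wrapper: the payload is only ever held in encrypted form.
class EncryptedContainer {
    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private final String tenantId;     // assumption: identifies whose key is used
    private final SecretKey tenantKey; // assumption: looked up per tenant elsewhere
    private byte[] encryptedPayload;   // layout: IV followed by ciphertext + tag

    EncryptedContainer(String tenantId, SecretKey tenantKey) {
        this.tenantId = tenantId;
        this.tenantKey = tenantKey;
    }

    // Encrypts on the way in, so the clear payload is never stored in the object.
    void setPayload(byte[] clearPayload) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, tenantKey, new GCMParameterSpec(TAG_BITS, iv));
            byte[] cipherText = cipher.doFinal(clearPayload);
            encryptedPayload = ByteBuffer.allocate(iv.length + cipherText.length)
                    .put(iv).put(cipherText).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed for tenant " + tenantId, e);
        }
    }

    // Decrypts on the way out; callers see clear data only in memory.
    byte[] getPayload() {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, tenantKey,
                    new GCMParameterSpec(TAG_BITS, encryptedPayload, 0, IV_BYTES));
            return cipher.doFinal(encryptedPayload, IV_BYTES, encryptedPayload.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed for tenant " + tenantId, e);
        }
    }

    // What a serializer would persist: opaque bytes only.
    byte[] rawBytes() {
        return encryptedPayload.clone();
    }

    // Test helper: generates a throwaway key in place of a real key service.
    static SecretKey newTenantKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the payload is opaque to the platform, an operator that wants to filter or key on a field has to call getPayload and re-wrap the result itself, which is the limitation the slide points out.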
  • 22. Solution: Encryption at serialization Solution is to handle encryption at serialization: so any time the object is serialized by the platform (FLINK), encryption will be used transparently. Two options: Extend Flink Extend AVRO
  • 24. Overview implementation Three main parts: - A serialization library: delegates the encryption work to the object and is executed transparently by Flink - A common library: where we define the encryption logic - A process that allows marking objects as encryptable (sensitive) so the new serialization will kick in
  • 26. Serialization library How we delegate some work to the object itself...
  • 27. AVRO - new interface

    SerializeFinalizationDelegate.java

    package org.apache.avro;

    import org.apache.avro.io.Decoder;
    import org.apache.avro.io.Encoder;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import java.io.ByteArrayOutputStream;

    public interface SerializeFinalizationDelegate {
      void afterSerialization(ByteArrayOutputStream serializedData, Encoder finalEncoder);
      Decoder beforeDeserialization(Decoder dataToDecode);
    }
  • 28. AVRO - create 2 hooks in AVRO writer/reader

    GenericDatumWriter.java

    public void write(D datum, Encoder out) throws IOException {
      // Check if we should delegate the after-serialization
      if (datum instanceof SerializeFinalizationDelegate) {
        // Create a new encoder to handle the serialization separately, without
        // writing directly to the output stream attached to the received (out) encoder
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Encoder newEncoder = EncoderFactory.get().binaryEncoder(baos, null);
        // Call the standard serialization
        write(root, datum, newEncoder);
        // Flush the buffered bytes, then delegate the finalization
        newEncoder.flush();
        SerializeFinalizationDelegate delegate = (SerializeFinalizationDelegate) datum;
        delegate.afterSerialization(baos, out);
      } else {
        write(root, datum, out);
      }
    }

    GenericDatumReader.java

    public D read(D reuse, Decoder in) throws IOException {
      try {
        // Instantiate the target class so we can check whether it is a delegate
        Class<?> clazz = Class.forName(actual.getFullName());
        Constructor<?> constructor = clazz.getConstructor();
        if (reuse == null) {
          reuse = (D) constructor.newInstance();
        }
        if (reuse instanceof SerializeFinalizationDelegate) {
          // Let the object pre-process the input before the standard read
          SerializeFinalizationDelegate delegate = (SerializeFinalizationDelegate) reuse;
          in = delegate.beforeDeserialization(in);
        }
      } catch (InstantiationException | InvocationTargetException
               | NoSuchMethodException | IllegalAccessException e) {
        LOG.debug("Not possible to instantiate object of the class.");
      } catch (ClassNotFoundException e) {
        LOG.debug("The class can't be found in the classLoader, skip...");
      }
      // Standard deserialization on the (possibly replaced) decoder
      ResolvingDecoder resolver = getResolver(actual, expected);
      resolver.configure(in);
      D result = (D) read(reuse, expected, resolver);
      resolver.drain();
      return result;
    }
  • 29. Logic to encrypt data How the delegation can be used to encrypt the object
  • 30. AVRO - details - implementation example SerializeWithTenantEncryption.java
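The transcript keeps only the file name SerializeWithTenantEncryption.java, not its content. As a rough illustration of how the two delegate hooks could be used for encryption, here is a self-contained sketch. It is not the authors' class: `FinalizationDelegate`, `TenantEncryptedRecord` and `TenantAwareCodec` are hypothetical names, `java.io` data streams stand in for Avro's `Encoder`/`Decoder` so the code runs without Avro, and AES-GCM with a locally generated key replaces whatever key service is used in production.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Stand-in for the delegate interface, with java.io streams instead of Avro types.
interface FinalizationDelegate {
    void afterSerialization(ByteArrayOutputStream serializedData, DataOutputStream finalOut);
    DataInputStream beforeDeserialization(DataInputStream dataToDecode);
}

// Hypothetical record base class that encrypts its own serialized bytes.
class TenantEncryptedRecord implements FinalizationDelegate {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private final SecretKey tenantKey; // assumption: resolved from the tenant context

    TenantEncryptedRecord(SecretKey tenantKey) {
        this.tenantKey = tenantKey;
    }

    // Test helper: throwaway key instead of a real per-tenant key lookup.
    static TenantEncryptedRecord withFreshKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return new TenantEncryptedRecord(generator.generateKey());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts the already-serialized bytes and writes: length, IV, ciphertext.
    @Override
    public void afterSerialization(ByteArrayOutputStream serializedData, DataOutputStream finalOut) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, tenantKey, new GCMParameterSpec(TAG_BITS, iv));
            byte[] cipherText = cipher.doFinal(serializedData.toByteArray());
            finalOut.writeInt(cipherText.length);
            finalOut.write(iv);
            finalOut.write(cipherText);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the framed ciphertext and returns a stream over the decrypted bytes.
    @Override
    public DataInputStream beforeDeserialization(DataInputStream dataToDecode) {
        try {
            byte[] cipherText = new byte[dataToDecode.readInt()];
            byte[] iv = new byte[IV_BYTES];
            dataToDecode.readFully(iv);
            dataToDecode.readFully(cipherText);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, tenantKey, new GCMParameterSpec(TAG_BITS, iv));
            return new DataInputStream(new ByteArrayInputStream(cipher.doFinal(cipherText)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Mirrors the writer/reader hooks: delegate only when the datum asks for it.
class TenantAwareCodec {
    static byte[] write(Object datum, byte[] plainSerialized) {
        if (!(datum instanceof FinalizationDelegate)) {
            return plainSerialized;
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        buffer.write(plainSerialized, 0, plainSerialized.length);
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        ((FinalizationDelegate) datum).afterSerialization(buffer, new DataOutputStream(wire));
        return wire.toByteArray();
    }

    static byte[] read(Object reuse, byte[] wire) {
        if (!(reuse instanceof FinalizationDelegate)) {
            return wire;
        }
        DataInputStream in = ((FinalizationDelegate) reuse)
                .beforeDeserialization(new DataInputStream(new ByteArrayInputStream(wire)));
        try {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the design is that the platform code only ever calls the codec: whether bytes end up encrypted depends on the type of the object, not on the job logic.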
  • 31. Definition of tenanted data how we generate the java classes used on the platform
  • 32. Use the schemas A pipeline with a Gradle project builds our POJOs out of .avsc https://github.com/davidmc24/gradle-avro-plugin Then in the Flink project we use these classes as a dependency! DONE! Let’s look at the template...

    task generateAvroTenanted(type: com.commercehub.gradle.plugin.avro.GenerateAvroJavaTask) {
      // tenanted schemas
      source("${rootDir}/basicTypes", generatedAvroTenantedSchemas)
      outputDir = generatedJavaTenantedSources
      templateDirectory = "$rootDir/avro_compiler_tenanted_templates/"
    }
  • 33. Modify avro templates Modified the standard template at avro/compiler/specific/templates/java/classic/record.vm - So the generated class is “tenanted” - If a piece of the class is extracted, the context should be passed along

    ...
    public class ${this.mangle($schema.getName())}#if ($schema.isError()) extends org.apache.avro.workday.specific.SpecificExceptionBase#else extends SerializeWithTenantEncryption#end implements org.apache.avro.workday.specific.SpecificRecord {
    ...
      public ${this.javaType($field.schema())} ${this.generateGetMethod($schema, $field)}() {
        ${this.javaType($field.schema())} local_value = ${this.mangle($field.name(), $schema.isError())};
        if(SerializeWithTenantEncryption.class.isInstance(local_value)){
          SerializeWithTenantEncryption.class.cast(local_value).__setTenant(this.__getTenant());
        }
        return local_value;
      }
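To make the effect of the modified getter concrete, here is a small sketch of what a generated class might boil down to once the template is expanded. `TenantAware`, `Worker` and `Address` are hypothetical names chosen for the example; only `__setTenant` and `__getTenant` come from the template above, and the tenant is modeled as a plain string for simplicity.

```java
// Hypothetical stand-in for the tenant-carrying base class used by the template.
abstract class TenantAware {
    private String tenant;

    public void __setTenant(String tenant) {
        this.tenant = tenant;
    }

    public String __getTenant() {
        return tenant;
    }
}

// A nested record; on its own it has no tenant until a parent hands one over.
class Address extends TenantAware {
    private final String city;

    Address(String city) {
        this.city = city;
    }

    public String getCity() {
        return city;
    }
}

// A top-level record whose getter pushes the tenant context down.
class Worker extends TenantAware {
    private final String name;
    private final Address address;

    Worker(String name, Address address) {
        this.name = name;
        this.address = address;
    }

    // Plain fields (String, numbers) are returned as-is: nothing to propagate.
    public String getName() {
        return name;
    }

    // Same shape as the generated getter: if the field value is itself tenanted,
    // copy the parent's tenant into it before returning it.
    public Address getAddress() {
        Address localValue = address;
        if (localValue instanceof TenantAware) {
            localValue.__setTenant(this.__getTenant());
        }
        return localValue;
    }
}
```

With this in place, a map operator that extracts `worker.getAddress()` and emits it downstream still produces an object that knows which tenant key to use when it is next serialized.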
  • 35. How is the flow • A schema is defined and a Java class is created from it • Data is produced using the schema class ‒ Avro serializes the info ‒ Then finds it is a tenanted message ‒ Delegates the encryption to the message ‒ Bytes are sent • Flink receives the message, Avro sees it is tenanted info ‒ Delegates the decryption to the object ‒ Then deserializes ‒ If a piece of data is extracted, the context is pushed down • All done transparently at any serialization
  • 36. Other challenges - Openstack cluster management - Need to interact with FLINK API: changes are painful - Flink UI can’t be used as we can’t access PROD endpoints - Anomalies Detection job having 120GB+ states - S3 writing to thousands of folders blocks the Sink - Tests/Performances of AWS services can be expensive - End-to-end tests are complex because of - Production-like data distributions - Encryption logic - Flink ParameterTool flexibility - We read some properties from local FS, but these are different from TM to JM