Telco Data Pipelines in the Cloud: Architecture and Use Cases
www.croz.net
Miro Miljanić
Data Architect
Agenda
• Introduction
• Use Cases:
• Reporting and data migration
• File processing offload
• User Feedback Anonymization
• Conclusion and Q&A
What do all Cloud initiatives have in common?
They are all structured as a teen Rom-Com
1. Act One: Setup
- Introduction of the protagonist
- Establish the status quo
- Inciting incident
2. Act Two: Conflict
- Rising action
- Complications
- The false victory
- Emotional low point
3. Act Three: Resolution
- Climax
- Resolution
- Happily ever after
They also involve (over) optimistic assumptions…
Reporting and data migration
Reporting and
data migration
Act One
Setup
Business problem:
• We want to move reporting and its data to the cloud:
• To have a single view of data in the cloud (operational and analytical)
• Cloud Reimplementation of reporting – source and tool replacement
• Initial Governance setup
Reporting and
data migration
Act One
Setup
What does it really mean:
• (Delta) replication of data from several on-prem DBs into one
cloud DB
• Reimplementation of reporting logic from legacy reports and
DB procedures
• Data catalogue, lineage, data access…
• Strict data security rules
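As an illustration of what delta replication involves, here is a minimal sketch that copies only rows changed since the last run. It assumes each source table carries an `updated_at` column that can serve as a watermark; the table layout, the `replicate_delta` helper and the use of SQLite are hypothetical stand-ins, not the setup from the project.

```python
import sqlite3

def replicate_delta(source, target, table, last_watermark):
    """Copy rows changed since last_watermark and return the new watermark."""
    rows = source.execute(
        f"SELECT id, payload, updated_at FROM {table} "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Upsert so that changed rows overwrite their earlier copies in the target.
    target.executemany(
        f"INSERT OR REPLACE INTO {table} (id, payload, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    return rows[-1][2] if rows else last_watermark

# Two in-memory databases stand in for the on-prem source and the cloud target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
ddl = "CREATE TABLE customers (id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT)"
source.execute(ddl)
target.execute(ddl)
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "a", "2023-01-01"), (2, "b", "2023-01-02")],
)

watermark = replicate_delta(source, target, "customers", "")  # initial load
source.execute(
    "UPDATE customers SET payload = 'b2', updated_at = '2023-01-03' WHERE id = 2"
)
watermark = replicate_delta(source, target, "customers", watermark)  # delta only
```

A watermark like this does not capture deletes, which is one reason delta load scenarios usually need their own analysis.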
Reporting and
data migration
Act One
Setup
[Diagram: on-premise data and reports; governance; security; business logic reimplementation; initial setup; delta data replication]
Reporting and
data migration
Act One
Setup
Off we go
• Thorough analysis and Proofs of Concept (POC)
• Sources - Multiple DB and RDBMS, >10k objects
• Target architecture
• Delta load scenarios
• DB code reimplementation scenarios
• Reporting logic reimplementation scenarios and reporting
optimizations
• Data maintenance scenarios
• …
Reporting and
data migration
Act Two
Conflict
That’s all nice, but this is a bit too much for us now.
What can we get for X days?
Reporting and
data migration
Act Three
Resolution
What did we do:
• Simpler scenario which was manageable in X days:
• No delta replication of data: initial DDLs, then loops and
inserts from views and tables using the data dictionary
• No code and report reimplementation
• Setup of all environments, systems and replication
• Security setup
• Development templates
• Cookbooks
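The simpler "initial DDLs, then loops and inserts" approach can be sketched roughly as follows. This is only an illustration: SQLite's `sqlite_master` catalog stands in for the source data dictionary, the `full_copy` helper is hypothetical, and only tables (not views) are covered.

```python
import sqlite3

def full_copy(source, target):
    """Recreate every table listed in the source catalog, then copy all rows."""
    tables = source.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    copied = {}
    for name, ddl in tables:
        target.execute(ddl)  # initial DDL taken from the data dictionary
        rows = source.execute(f'SELECT * FROM "{name}"').fetchall()
        if rows:
            placeholders = ", ".join("?" for _ in rows[0])
            target.executemany(
                f'INSERT INTO "{name}" VALUES ({placeholders})', rows
            )
        copied[name] = len(rows)
    target.commit()
    return copied

# In-memory databases stand in for the on-prem source and the cloud target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
source.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "A"), (2, "B")])
source.execute("INSERT INTO orders VALUES (10, 1)")

copied = full_copy(source, target)
```

The trade-off is that every run reloads everything, which is acceptable only while data volumes and load windows allow it.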
File processing offload
File processing
offload
Act One
Setup
Business problem:
• File processing offload to Cloud
• We want to reduce our resources - DB, storage, ETL
• We want to be more flexible in terms of scaling
• We want to learn how this differs from our current process (and
keep the new solution as similar as possible to the previous one)
File processing
offload
Act One
Setup
What does it really mean:
• Large number of raw files
• Possible DQ and format issues – human intervention needed
• Time constraints for processing
• Performance requirements
• Detailed logging – per each file
• Preferred set of tools
File processing
offload
Act One
Setup
Off we go
• POC
• Similar loading and processing logic as on prem: iterative, high
logging, multiple step processing
• Processing in ETL tool
• Detailed DQ checks for each file
File processing
offload
Act One
Setup
[Diagram: file landing area; file preprocessing; DQ checks; error bucket (verify and fix); loop until ready for aggregation; clean files; aggregation; export; archive]
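The per-file DQ step with an error bucket might look roughly like the sketch below. The three-field CSV layout, the folder names and the `check_file` and `triage` helpers are assumptions made for illustration; the real checks and tools are not described in the slides.

```python
import csv
import logging
import shutil
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("file_dq")

EXPECTED_COLUMNS = 3  # hypothetical record layout

def check_file(path):
    """Return the list of DQ problems found in one file (empty means clean)."""
    problems = []
    with path.open(newline="") as handle:
        for line_no, row in enumerate(csv.reader(handle), start=1):
            if len(row) != EXPECTED_COLUMNS:
                problems.append(
                    f"line {line_no}: expected {EXPECTED_COLUMNS} fields, got {len(row)}"
                )
    return problems

def triage(landing, clean, errors):
    """Move each landed file to the clean area or the error bucket, logging per file."""
    result = {"clean": [], "error": []}
    for path in sorted(landing.glob("*.csv")):
        problems = check_file(path)
        if problems:
            log.info("%s -> error bucket: %s", path.name, "; ".join(problems))
            shutil.move(str(path), str(errors / path.name))
            result["error"].append(path.name)
        else:
            log.info("%s -> clean", path.name)
            shutil.move(str(path), str(clean / path.name))
            result["clean"].append(path.name)
    return result

# Temporary folders stand in for the landing area, clean area and error bucket.
root = Path(tempfile.mkdtemp())
landing, clean, errors = root / "landing", root / "clean", root / "errors"
for folder in (landing, clean, errors):
    folder.mkdir()
(landing / "good.csv").write_text("1,a,10\n2,b,20\n")
(landing / "bad.csv").write_text("1,a,10\n2,b\n")

outcome = triage(landing, clean, errors)
```

After a person fixes a file in the error bucket, it can be dropped back into the landing folder and the same function run again, which matches the "loop until ready for aggregation" idea.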
File processing
offload
Act Two
Conflict
The test run and billing estimate were too high!
Way too high…
File processing
offload
Act Three
Resolution
What did we do:
• Various load, processing and logging scenarios until we
found the solution
• Where to process files and how?
• What can be sequential and what can be parallel?
• How to log file processing?
• How to handle and pinpoint errors?
• Database, ETL or Code file processing?
• Database, ETL or Code data logging?
…or combination?
File processing
offload
Act Three
Resolution
• Preparation and DQ:
• Iterative, sequential, high logging
• Direct filesystem access
• Error pinpointing
• Orchestration
• Parallel processing of clean files
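The split between a sequential, heavily logged preparation step and parallel processing of clean files can be sketched as below. The file contents, the numeric check and the `prepare` and `aggregate` helpers are hypothetical; a thread pool is used only to show the shape of the orchestration, not as a statement about the tooling that was chosen.

```python
from concurrent.futures import ThreadPoolExecutor

def prepare(name, content):
    """Sequential step: validate one file and pinpoint the first bad line."""
    for line_no, line in enumerate(content.splitlines(), start=1):
        if not line.strip().isdigit():
            return False, f"{name}: line {line_no} is not numeric"
    return True, f"{name}: ok"

def aggregate(content):
    """Parallel step: independent work on a file already known to be clean."""
    return sum(int(line) for line in content.splitlines())

# In-memory strings stand in for raw files.
files = {"a.txt": "1\n2\n3", "b.txt": "4\nx\n6", "c.txt": "10\n20"}

clean, prep_log = {}, []
for name in sorted(files):  # iterative, sequential, one log entry per file
    ok, message = prepare(name, files[name])
    prep_log.append(message)
    if ok:
        clean[name] = files[name]

with ThreadPoolExecutor(max_workers=4) as pool:  # clean files only
    totals = dict(zip(clean, pool.map(aggregate, clean.values())))
```

Keeping the detailed logging in the sequential step means the parallel step only needs coarse, per-batch logging.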
User Feedback Anonymization
User Feedback
Anonymization
Act One
Setup
Business problem:
• User feedback contains free-form entries in which users
sometimes enter comments that contain personal information.
• We want to keep those comments, but we can keep this
information for only 90 days; after that we would like to
anonymize it.
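The 90-day rule can be expressed as a simple cutoff check. The record fields (`id`, `created`, `anonymized`) and the `due_for_anonymization` helper below are hypothetical and only illustrate the selection logic.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90  # retention period stated in the business problem

def due_for_anonymization(feedback, today):
    """Return ids of comments older than the retention period and not yet anonymized."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [
        item["id"]
        for item in feedback
        if item["created"] < cutoff and not item["anonymized"]
    ]

feedback = [
    {"id": 1, "created": date(2023, 1, 10), "anonymized": False},
    {"id": 2, "created": date(2023, 5, 1), "anonymized": False},
    {"id": 3, "created": date(2023, 1, 5), "anonymized": True},
]
due = due_for_anonymization(feedback, today=date(2023, 6, 1))
```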
User Feedback
Anonymization
Act One
Setup
• Hi, my name is Miro Miljanić from CROZ, I contacted your
agent yesterday about the problem on my address –
Marohnićeva 1, Zagreb. I checked today and it seems that
problem still exists. Please contact me on 099887766 or on this
e-mail - mmiljanic@croz.net. Thanks in advance.
• Hi, my name is <Person> from <Organization>, I contacted
your agent yesterday about the problem on my address –
<Location>, <Location>. I checked today and it seems that
problem still exists. Please contact me on <Phone Number> or on this
e-mail <e-mail>. Thanks in advance.
• Hi, my name is James Bond from Microsoft, I contacted your
agent yesterday about the problem on my address – 5th
Avenue, New York, NY. I checked today and it seems that
problem still exists. Please contact me on 01999289029 or on this e-mail
dj.example2234@example.com. Thanks in advance.
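Part of the masking shown in these examples can be approximated with plain pattern matching. The sketch below covers only e-mail addresses and phone-like digit runs; the two regular expressions and the eight-digit minimum are assumptions, and names, organizations and locations would need an entity-recognition step that is not shown here.

```python
import re

# Hypothetical patterns: a simple e-mail shape and any run of 8 or more digits.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{8,}\b")

def mask_patterns(text):
    """Replace e-mail addresses first, then phone-like numbers, with placeholders."""
    text = EMAIL.sub("<e-mail>", text)
    return PHONE.sub("<Phone Number>", text)

sample = (
    "Please contact me on 099887766 or on this e-mail - "
    "mmiljanic@croz.net. Thanks in advance."
)
masked = mask_patterns(sample)
```

E-mail addresses are replaced first so that digits inside an address are not mistaken for a phone number.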
User Feedback
Anonymization
Act One
Setup
What does it really mean:
• Application and company specific lingo
• Additional checks needed in some cases (human)
• E2E pipeline which can be reused for other purposes
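One way to combine company-specific lingo with a human check is to filter detected entities against a list of known domain terms and send low-confidence cases to a reviewer. The entity format, the 0.8 threshold, the `route` helper and the product name "SuperNet" are all made up for this sketch.

```python
def route(entities, threshold=0.8, domain_terms=frozenset()):
    """Drop known domain terms, then decide whether a human must review the comment."""
    kept = [e for e in entities if e["text"] not in domain_terms]
    needs_review = any(e["score"] < threshold for e in kept)
    return ("human_review" if needs_review else "auto_anonymize"), kept

# Hypothetical detector output for one comment.
entities = [
    {"text": "Miro Miljanić", "type": "Person", "score": 0.97},
    {"text": "SuperNet", "type": "Organization", "score": 0.55},
]

with_lingo = route(entities, domain_terms=frozenset({"SuperNet"}))
without_lingo = route(entities)
```

With the product name registered as a domain term the comment is anonymized automatically; without it, the uncertain match sends the comment to a human.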
User Feedback
Anonymization
Act Two
Conflict
This is great! Could you reimplement it in Azure?
User Feedback
Anonymization
Act Three
Resolution
Azure vs AWS – Human in the loop – custom integration
Conclusion
• In more than one way, clouds are teens, too. They:
• Are sometimes immature
• Are sometimes unpredictable
• Have a logic of their own (different from adults)
(But they also have a fresh vision and are part of our future)
Our first job is to explain this to our customers and try to manage their
expectations:
• How it is (not) going to solve all their problems
• How it differs from what they are used to
• Cost, Scope of Work, Way of Work, Maintenance, Performance …
And use POCs as much as possible for every topic that is new to your customer.
sales@croz.net | www.croz.net

Rock Songs common codes and conventions.pptx
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxCCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
 

[DSC Adria 23] Miro MIljanic Telco Data Pipelines in the Cloud Architecture and Use Cases.pptx

  • 1. Telco Data Pipelines in the Cloud: Architecture and Use Cases www.croz.net Miro Miljanić Data Architect
  • 2. Agenda • Introduction • Use Cases: • Reporting and data migration • File processing offload • User Feedback Anonymization • Conclusion and Q&A
  • 3. What do all Cloud initiatives have in common? They are all structured as a teen Rom-Com 1. Act One: Setup - Introduction of the protagonist - Establish the status quo - Inciting incident 2. Act Two: Conflict - Rising action - Complications - The false victory - Emotional low point 3. Act Three: Resolution - Climax - Resolution - Happily ever after They also involve (over) optimistic assumptions…
  • 4. Reporting and data migration
  • 5. Reporting and data migration Act One Setup Business problem: • We want to move the reporting and its data in the cloud: • To have Single view of data, in Cloud (operational and analytical) • Cloud Reimplementation of reporting – source and tool replacement • Initial Governance setup
  • 6. Reporting and data migration Act One Setup What does it really mean: • (Delta) Replication of data from several on-prem DB into one Cloud DB • Reimplementation of reporting logic from legacy reports and DB procedures • Data catalogue, lineage, data access… • Strict data security rules
  • 7. Reporting and data migration Act One Setup On Premise Data Reports Governance Security Business logic reimplementation Initial setup Delta Data replication
  • 8. Reporting and data migration Act One Setup Off we go • Thorough analysis and Proofs of Concept (POC) • Sources - Multiple DB and RDBMS, >10k objects • Target architecture • Delta load scenarios • DB code reimplementation scenarios • Reporting logic reimplementation scenarios and reporting optimizations • Data maintenance scenarios • …
  • 9. Reporting and data migration Act Two Conflict That’s all nice, but this is a bit too much for us now, what could we get for X days?
  • 10. Reporting and data migration Act Three Resolution What did we do: • Simpler scenario which was manageable in X days: • No delta replication of data – initial DDLs and then loops and inserts from views and tables using data dictionary • No code and report reimplementation • Setup of all environments, systems and replication • Security setup • Development templates • Cookbooks
  • 11. Reporting and data migration Act One Setup On Premise Data Reports Governance Security Business logic reimplementation Initial setup Delta Data replication
  • 13. File processing offload Act One Setup Business problem: • File processing offload to Cloud • We want to reduce our resources - DB, storage, ETL • We want to be more flexible in terms of scaling • We want to learn how this differs from our current process (and have new solution as similar as possible to previous)
  • 14. File processing offload Act One Setup What does it really mean: • Large number of raw files • Possible DQ and format issues – human intervention needed • Time constraints for processing • Performance requirements • Detailed logging – per each file • Preferred set of tools
  • 15. File processing offload Act One Setup • Off we go • POC • Similar loading and processing logic as on prem: iterative, high logging, multiple step processing • Processing in ETL tool • Detailed DQ checks per each file
  • 16. File processing offload Act One Setup DQ checks File preprocessing Error Bucket Verify and fix File landing area Error Loop until ready for aggregation Clean files Aggregation Export Archive
  • 17. File processing offload Act Two Conflict The test run and billing estimate were too high! Way too high…
  • 18. File processing offload Act Three Resolution What did we do: • Various load, processing and logging scenarios until we found the solution • Where to process files and how? • What can be sequential and what can be parallel? • How to log file processing? • How to handle and pinpoint errors? • Database, ETL or Code file processing? • Database, ETL or Code data logging? …or combination?
  • 19. File processing offload Act Three Resolution • Preparation and DQ: • Iterative, sequential, high logging • Direct filesystem access • Error pinpointing • Orchestration • Parallel processing of clean files
  • 21. User Feedback Anonymization Act One Setup Business problem: • User feedback contains free-form entries in which users sometimes enter comments that contain personal information. • We want to keep those comments, but we can keep this information for only 90 days; after that we would like to anonymize it.
  • 22. User Feedback Anonymization Act One Setup • Hi, my name is Miro Miljanić from CROZ, I contacted your agent yesterday about the problem on my address – Marohnićeva 1, Zagreb. I checked today and it seems that problem still exists. Please contact me on 099887766 or on this e-mail - mmiljanic@croz.net. Thanks in advance. • Hi, my name is <Person> from <Organization>, I contacted your agent yesterday about the problem on my address – <Location>, <Location>. I checked today and it seems that still exists. Please contact me on <Phone Number> or on this e-mail <e-mail>. Thanks in advance. • Hi, my name is James Bond from Microsoft, I contacted your agent yesterday about the problem on my address – 5th Avenue, New York, NY. I checked today and it seems that exists. Please contact me on 01999289029 or on this e-mail dj.example2234@example.com. Thanks in advance.
  • 23. User Feedback Anonymization Act One Setup What does it really mean: • Application and company specific lingo • Additional checks needed in some cases (human) • E2E pipeline which can be reused for other purposes
  • 25. User Feedback Anonymization Act Two Conflict This is great! Could you reimplement it in Azure?
  • 26. User Feedback Anonymization Act Three Resolution Azure vs AWS – Human in the loop – custom integration
  • 28. Conclusion • In more than one way, Clouds are teens, also. They are: • Sometimes immature • Sometimes unpredictable • Have a logic of their own (different than adults) (But also, with a fresh vision, and part of our future)
  • 29. Conclusion Our first job is to explain this to our customers, and try to manage their expectations • How it is (not) going to solve all their problems • How it differs from what they are used to • Cost, Scope of Work, Way of Work, Maintenance, Performance … And use POCs as much as possible, for all new topics for your customer.
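
The file processing resolution on slide 19 (sequential, heavily logged preparation and DQ, followed by parallel processing of clean files) can be sketched roughly as below. This is only an illustration under stated assumptions: the semicolon-separated record layout, the function names and the in-memory `files` dictionary are all hypothetical, and a thread pool stands in for whatever parallel execution the target platform offers.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("file_offload")

EXPECTED_COLUMNS = 3  # hypothetical record layout: msisdn;event;bytes


def check_file(name, lines):
    """Sequential DQ step: return (line_number, reason) for every bad line."""
    errors = []
    for number, line in enumerate(lines, start=1):
        fields = line.split(";")
        if len(fields) != EXPECTED_COLUMNS:
            errors.append((number, "wrong column count"))
        elif not fields[2].isdigit():
            errors.append((number, "non-numeric byte count"))
    # One log record per file, so errors can be pinpointed later.
    log.info("file=%s lines=%d errors=%d", name, len(lines), len(errors))
    return errors


def aggregate_file(lines):
    """Aggregation step for one clean file: total bytes per msisdn."""
    totals = {}
    for line in lines:
        msisdn, _event, size = line.split(";")
        totals[msisdn] = totals.get(msisdn, 0) + int(size)
    return totals


def run_pipeline(files, workers=4):
    clean, error_bucket = {}, {}
    # Preparation and DQ: iterative and sequential, with per-file logging.
    for name, lines in files.items():
        errors = check_file(name, lines)
        if errors:
            error_bucket[name] = errors  # held back for "verify and fix"
        else:
            clean[name] = lines
    # Clean files no longer depend on each other, so they run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(aggregate_file, clean.values()))
    totals = {}
    for partial in partials:
        for msisdn, size in partial.items():
            totals[msisdn] = totals.get(msisdn, 0) + size
    return totals, error_bucket


files = {
    "a.csv": ["385991;data;100", "385992;data;50"],
    "b.csv": ["385991;data;25"],
    "c.csv": ["385993;data", "385994;data;abc"],
}
totals, error_bucket = run_pipeline(files)
```

In this sketch only the DQ stage pays for detailed per-file logging, while the aggregation stage fans out over files already known to be clean; that split is one plausible reading of the slide, not a description of the actual solution.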

Editor's notes

  1. Thank you all for coming today. My name is Miro Miljanić, I’m a Data Architect at CROZ and I’m currently responsible for managing Cloud data initiatives. At CROZ we have a long history of data and software engineering and consulting, but only in the last few years have we begun to gain significant experience in Cloud. Today’s talk is about several of our experiences with data and AI related Cloud initiatives. Although it has Telco in its name, not all of the examples were built for Telco companies, but they are legitimate use cases that could be applied in any telecommunications company.
  2. Act One, the Setup, introduces the main protagonist, their everyday life and the inciting incident: the introduction of a love interest or a challenge. In our case this is the initial project or POC (pi ou si) description and its drive. Act Two, the Conflict, contains the main character’s pursuit of the goal and the complications that arise. At one point the main character seems to have reached the goal, but it is a false victory: there are deeper issues, or the victory is short-lived. This is the emotional low point, where the character learns an important lesson and re-evaluates their priorities. In our case this is the turning point, the situation that arose and changed the course of action. Act Three, the Resolution, is all about the happy ending. Our hero overcomes all obstacles, wins their love, becomes a better person, and they live happily ever after. So, in the third part, I’ll explain what we did to solve the problem.
  3. Single view meant that there should be replication and delta replication from several databases (on different engines) into one Cloud DB that would be used not only for corporate reporting but for other analysis as well. This would be the main purpose of this DB; the applications, data processing and ETL logic would remain on the on-prem databases. Reimplementation meant rewriting reporting logic from legacy DB code, views, packages, procedures and the legacy reporting tool into the new reporting tool. And initial Governance meant that there should be cataloguing and lineage of data, together with a data access and security model. As with every change implementation – this was the right time to enforce
  4. So, since this was a large scope of work, we began a thorough analysis of the requirements and of the multiple source systems, which contained more than 10k objects and their code. We also started working on the architecture and on several POCs (pi ou sis) covering various scenarios. Delta load: what we have to implement on the source to propagate only the changes to the target DB. DB code reimplementation scenarios: how to handle it, where to do it, and how not to affect application logic. And how to manage reporting logic reimplementation, reporting optimizations, data maintenance scenarios and so on.
  5. Things were going pretty well: we had a good relationship with the customer and a shared understanding of the complexity of the task, and the analysis and architecture setup were on track, when we received the following response from the customer: That’s all nice, but this is a bit too much for us now, what could we get for X days? And yes, X was nowhere near the number we anticipated for the whole solution.
  6. Things were going pretty well, …
  7. So, what did we do?
  8. Anonymization provides a way to remove private data from the reviews without deleting them, so the reviews can stay in the database and be used for other purposes in the future, e.g. topic extraction or sentiment analysis.
  9. Example of the original message, regular anonymization and synthetic replacement. The scope of this project covers a named entity recognition (NER) machine learning problem.
  10. Anonymization: a rule engine handles rule-based anonymization of PII that can be recognized algorithmically by its specific format, e.g. an e-mail address or a phone number. PII detection assigns labels, each with its own label confidence.
  11. Things were going pretty well, … AWS, Databricks problem
  12. 1. Human in the loop: Label Studio, an open-source application, integrated in our solution and deployed as an app service on Azure. 2. Human in the loop enhancement: the human-in-the-loop output is used as gold labels, i.e. new training data.
  13. AWS - 2006 (S3, EC2); Azure - 2008, commercial release - 2010; Google - 2010
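
The anonymization flow in notes 10 and 12 (format rules for e-mail and phone numbers, model labels with a confidence, and human review for uncertain cases) can be sketched as follows. This is a minimal illustration, not the actual implementation: the regular expressions, the 0.8 threshold, the `(start, end, label, confidence)` entity shape and the helper names are all assumptions, and the entity list is hand-written here where a real pipeline would take it from a named entity recognition model.

```python
import re

# Hypothetical format rules; real data would need locale-specific patterns.
RULES = [
    ("<e-mail>", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("<Phone Number>", re.compile(r"\b\d{9,11}\b")),
]


def apply_rules(text):
    """Rule engine: replace PII that is recognizable by format alone."""
    for placeholder, pattern in RULES:
        text = pattern.sub(placeholder, text)
    return text


def apply_entities(text, entities, threshold=0.8):
    """Replace model-detected spans and flag uncertain texts for review.

    `entities` holds (start, end, label, confidence) tuples, an assumed
    shape for the output of an NER model.
    """
    needs_review = any(conf < threshold for _, _, _, conf in entities)
    # Replace from the end of the text so earlier offsets stay valid.
    for start, end, label, _conf in sorted(entities, reverse=True):
        text = text[:start] + f"<{label}>" + text[end:]
    return text, needs_review


def span(text, value, label, confidence):
    """Helper for the example only: locate `value` and build an entity tuple."""
    start = text.index(value)
    return (start, start + len(value), label, confidence)


text = (
    "Hi, my name is Miro Miljanić from CROZ. "
    "Please contact me on 099887766 or on this e-mail - mmiljanic@croz.net."
)
entities = [
    span(text, "Miro Miljanić", "Person", 0.97),
    span(text, "CROZ", "Organization", 0.62),  # low confidence on purpose
]

# Entity offsets refer to the original text, so they are applied first.
masked, needs_review = apply_entities(text, entities)
anonymized = apply_rules(masked)
print(anonymized)
print("send to human review:", needs_review)
```

Texts flagged with `needs_review` would go to the human-in-the-loop tool, and the corrected labels could then be fed back as new training data, as note 12 describes.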