SlideShare une entreprise Scribd logo
1  sur  34
Télécharger pour lire hors ligne
A Federated Information
Infrastructure that Works
Xavier Gumara Rigol
@xgumara
October 3rd, 2019 Barcelona
multitenancy (noun) mode of operation of
software where multiple independent instances
operate in a shared environment.
3
What are the challenges of
building a multi tenant
information architecture for
business insights?
How we solved them at
Adevinta?
Data Engineering Manager at Adevinta (former Schibsted)
since 2016.
Consultant Professor at the Open University of Catalonia
(UOC) since 2016.
Between 2013 and 2016 I worked as a Business Intelligence
Engineer at Schibsted, and previously as a Business
Intelligence Consultant at Stratebi for almost 3 years.
4
Xavier Gumara Rigol
About me
@xgumara
Adevinta is a marketplaces specialist. We are an
international family of local digital brands.
Our marketplaces create perfect matches on the
world’s most trusted marketplaces.
Thanks to our second hand effect our users
potentially save every year:
5
About Adevinta
20.5 million tons of
greenhouse gases
1.1 million
tonnes of plastic
More than 30 brands in 16 countries in Europe, Latin America and North Africa:
6
About Adevinta
+ a global services organization located between Barcelona and Paris.
7
Framing the problem
● Easy access to key facts about our
marketplaces (tenants)
● Eliminate data-quality discussions, establish
trust in the facts
● Reduce impact on manual data requests to
each tenant
● Minimize regional effort needed for global
data collection
● Provide a framework and infrastructure that
can be extended locally
8
Problems we are trying to solve
● Executive support
● Provide results sooner than later and iterate
● It is not a project but an initiative
● Fix data quality at the source
● Invest in solving technical debt
9
The lowest common denominator
for successful information architecture
initiatives
10
The challenges of a multi tenant
information architecture
1. Finding the right level of authority
2. Governance of the data sets
3. Building common infrastructure as a platform
01 Finding the right level of authority
12
Finding the right level of authority
Silos of unreachable data
Centralization
Authority not delegated
Decentralization
Authority delegated
Monolithic data platform bottleneck
Pros: speed of execution
(locally) and market
customization
Cons: difficult to have a
global view, duplication of
efforts
Pros: can work at small
scale
Cons: long response times,
difficult to harmonise
13
Finding the right level of authority
Decentralization
Transactional
PostgreSQL
Transactional
PostgreSQL
Analytical
PostgreSQL
Transactional
database X
Mature data
warehouse
Corporate KPIs
database
Regional view Global view
API
14
Finding the right level of authority
Centralization
Sources to ingest Consumers to serve
Big Data
Platform
Finding the right level of authority
Current solution: federation
Corporate data sources
Corporate
data lake
Regional data sources
and warehouse
Regional data sources
and warehouse
Downwards
federation
Finding the right level of authority
Current solution: federation
● Each regional data warehouse is a different Redshift instance
● Physical storage is S3 and can be accessed:
● Via Athena for global/central analysts
● Via Redshift Spectrum for global teams (downwards federation)
02 Governance of data sets
18
Governance of data sets
Embrace the concept of “data set as a product” that defines the basic qualities of a data set as:
Discoverable Addressable Trustworthy
Inter operableSelf-describing Secure
Source: “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh” by Zhamak Dehghani
https://martinfowler.com/articles/data-monolith-to-mesh.html
19
Governance of data sets
“Data set as a product”: Discoverable
20
Governance of data sets
“Data set as a product”: Addressable
21
Governance of data sets
“Data set as a product”: Trustworthy
Contextual data
quality information
22
Governance of data sets
“Data set as a product”: Self-describing
All data set documentation includes:
● Data location
● Data provenance and data mapping
● Example data
● Execution time and freshness
● Input preconditions
● Example Jupyter notebook using the data set
23
Governance of data sets
“Data set as a product”: Inter operable
● Defining a common nomenclature is a must in all layers of the platform
● Usage of schema.org to identify the same object across different domains
"adType" : {
"description" : "Type of the ad" ,
"enum": [
"buy",
"sell",
"rent",
"let",
"lease",
"swap",
"give",
"jobOffer"
]
}
24
Governance of data sets
“Data set as a product”: Secure
03 Building common infrastructure as a platform
26
Building common infrastructure as a platform
Patterns for business metrics calculation:
● Metrics need to use specific events (filter)
● Some transformations applied before aggregating
● Group by several dimensions
● Aggregation function (count, count distinct, sum,...)
● Some transformations applied after aggregating
● Different periods of calculation day, week, month, 7d, 28d
Use-case: metrics calculation
Filter and
Cleanup
Group by
dimensions
Aggregate
Filter
Post
transformations
27
Building common infrastructure as a platform
Use-case: metrics calculation
val simpleMetric: Metric = withSimpleMetric(
metricId = AdsWithLeads,
cleanupTransformations = Seq(
filterEventTypes(List(isLeadEvent(EventType, ObjectType)))
),
dimensions = Seq(DeviceType, ProductType, TrackerType),
aggregate = countDistinct(AdId),
postTransformations = Seq(
withConstantColumn(Period, period)(_),
withConstantColumn(ClientId, client)(_))
)
Filter and
Cleanup
Group by
dimensions
Aggregate
Filter
Post
transformations
28
Building common infrastructure as a platform
Use-case: metrics calculation
val simpleMetricWithSubtotals: Metric =
simpleMetric.withSubtotals(Seq(DeviceType, ProductType, TrackerType)
)
Filter and
Cleanup
Group by
dimensions
Aggregate
Filter
Post
transformations
This configuration is then passed to the cube() function in Spark.
The cube() function “calculates subtotals and a grand total for every
permutation of the columns specified”.
29
Building common infrastructure as a platform
Use-case: metrics calculation
private val metricDefinitions: Seq[MetricDefinition] = List(
MetricDefinition(
metricIdentifiers(Sessions),
countDistinct(SessionId)
),
MetricDefinition(
metricIdentifiers(LoggedInSessions),
countDistinct(SessionId),
filterEventTypes(List(col(EventIsLogged) === 1)) _
),
MetricDefinition(
metricIdentifiers(AdsWithLeads),
countDistinct(AdId),
filterEventTypes(List(isLeadEvent(EventType, ObjectType))) _
)
)
30
Building common infrastructure as a platform
Use-case: Recency-Frequency-Monetization (RFM) user segmentation
val df = spark.read.parquet(path)
.groupBy( "user_id")
.agg(
count(col( "event_id")).as("total_events" ),
countDistinct()(col( "session_id" )).as("total_sessions" )
)
val dfWithSegments = df.transform(withSegment( "segment_chain" ,
Seq(
SegmentDimension(col( "total_events" ), "events_percentile" , 0.5, 0.8),
SegmentDimension(col( "total_sessions" ), "sessions_percentile" , 0.5, 0.8)
)
))
The withSegment method requires a name to store the output of the segmentation and a list of all dimensions that
will be used. You can tune the thresholds for each segment dimension.
31
Building common infrastructure as a platform
Use-case: Recency-Frequency-Monetization (RFM) user segmentation
val myMap = Map[String, String](
"LL" -> "That's a low active user" ,
"LM" -> "Users that do few events in different sessions" ,
"LH" -> "Users that do almost nothing but somehow generate many sessions" ,
"ML" -> "Meh... in little sessions" ,
"MM" -> "Meh... in medium sessions" ,
"MH" -> "Meh... in multiple sessions" ,
"HL" -> "Users that do a lot of things in a row" ,
"HM" -> "Users that do a lot of things along the day" ,
"HH" -> "Da best users"
)
dfWithSegments.transform(withSegmentMapping( "segment_name" , col("segment_chain" ), myMap))
The withSegmentMapping method applies a map to the result of the segmentation to add meaningful names to the
user segments.
32
What have we learned?
33
What have we learned?
● Federation gives autonomy
● Non-invasive governance is key
● Balance the delivery of business value vs tooling
Thank you!
Xavier Gumara Rigol
@xgumara

Contenu connexe

Tendances

Brown bag eventdrivenmicroservices-cqrs
Brown bag  eventdrivenmicroservices-cqrsBrown bag  eventdrivenmicroservices-cqrs
Brown bag eventdrivenmicroservices-cqrsVikash Kodati
 
Context Broker Introduction and Reference Architecure
Context Broker Introduction and Reference ArchitecureContext Broker Introduction and Reference Architecure
Context Broker Introduction and Reference ArchitecureMaruti Gollapudi
 
90300 633579030311875000
90300 63357903031187500090300 633579030311875000
90300 633579030311875000sumit621
 
L'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo BrignoliL'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo BrignoliData Driven Innovation
 
Real World MongoDB: Use Cases from Financial Services by Daniel Roberts
Real World MongoDB: Use Cases from Financial Services by Daniel RobertsReal World MongoDB: Use Cases from Financial Services by Daniel Roberts
Real World MongoDB: Use Cases from Financial Services by Daniel RobertsMongoDB
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data miningPradnya Saval
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data miningRohit Kumar
 
Significance of Data Mining
Significance of Data MiningSignificance of Data Mining
Significance of Data Mining8trackweb
 
Connecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data VirtualizationConnecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data VirtualizationDenodo
 
Logical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsLogical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsDenodo
 
FIWARE: Cross-domain concepts and technologies in domain Reference Architectures
FIWARE: Cross-domain concepts and technologies in domain Reference ArchitecturesFIWARE: Cross-domain concepts and technologies in domain Reference Architectures
FIWARE: Cross-domain concepts and technologies in domain Reference ArchitecturesOPEN DEI
 
MongoDB on Financial Services Sector
MongoDB on Financial Services SectorMongoDB on Financial Services Sector
MongoDB on Financial Services SectorNorberto Leite
 
Data warehouse
Data warehouseData warehouse
Data warehouseprinceahan
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining ConceptsDung Nguyen
 
Webinar: How MongoDB is making Government Better, Faster, Smarter
Webinar: How MongoDB is making Government Better, Faster, SmarterWebinar: How MongoDB is making Government Better, Faster, Smarter
Webinar: How MongoDB is making Government Better, Faster, SmarterMongoDB
 

Tendances (20)

Introduction
IntroductionIntroduction
Introduction
 
Brown bag eventdrivenmicroservices-cqrs
Brown bag  eventdrivenmicroservices-cqrsBrown bag  eventdrivenmicroservices-cqrs
Brown bag eventdrivenmicroservices-cqrs
 
Context Broker Introduction and Reference Architecure
Context Broker Introduction and Reference ArchitecureContext Broker Introduction and Reference Architecure
Context Broker Introduction and Reference Architecure
 
90300 633579030311875000
90300 63357903031187500090300 633579030311875000
90300 633579030311875000
 
L'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo BrignoliL'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo Brignoli
 
Real World MongoDB: Use Cases from Financial Services by Daniel Roberts
Real World MongoDB: Use Cases from Financial Services by Daniel RobertsReal World MongoDB: Use Cases from Financial Services by Daniel Roberts
Real World MongoDB: Use Cases from Financial Services by Daniel Roberts
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data mining
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data mining
 
Data mining
Data miningData mining
Data mining
 
Significance of Data Mining
Significance of Data MiningSignificance of Data Mining
Significance of Data Mining
 
Connecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data VirtualizationConnecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data Virtualization
 
Logical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsLogical Data Fabric: Architectural Components
Logical Data Fabric: Architectural Components
 
Data mining Introduction
Data mining IntroductionData mining Introduction
Data mining Introduction
 
FIWARE: Cross-domain concepts and technologies in domain Reference Architectures
FIWARE: Cross-domain concepts and technologies in domain Reference ArchitecturesFIWARE: Cross-domain concepts and technologies in domain Reference Architectures
FIWARE: Cross-domain concepts and technologies in domain Reference Architectures
 
MongoDB on Financial Services Sector
MongoDB on Financial Services SectorMongoDB on Financial Services Sector
MongoDB on Financial Services Sector
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Webinar: How MongoDB is making Government Better, Faster, Smarter
Webinar: How MongoDB is making Government Better, Faster, SmarterWebinar: How MongoDB is making Government Better, Faster, Smarter
Webinar: How MongoDB is making Government Better, Faster, Smarter
 
Data mining in e commerce
Data mining in e commerceData mining in e commerce
Data mining in e commerce
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 

Similaire à A federated information infrastructure that works

Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionDenodo
 
MongoDB .local Toronto 2019: MongoDB – Powering the new age data demands
MongoDB .local Toronto 2019: MongoDB – Powering the new age data demandsMongoDB .local Toronto 2019: MongoDB – Powering the new age data demands
MongoDB .local Toronto 2019: MongoDB – Powering the new age data demandsMongoDB
 
Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7Denodo
 
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data VirtualizationDAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data VirtualizationDenodo
 
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...Denodo
 
Confluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointConfluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointconfluent
 
Fast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow PresentationFast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow PresentationDenodo
 
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Denodo
 
Analytics in Your Enterprise
Analytics in Your EnterpriseAnalytics in Your Enterprise
Analytics in Your EnterpriseWSO2
 
Big Data LDN 2018: CONNECTING SILOS IN REAL-TIME WITH DATA VIRTUALIZATION
Big Data LDN 2018: CONNECTING SILOS IN REAL-TIME WITH DATA VIRTUALIZATIONBig Data LDN 2018: CONNECTING SILOS IN REAL-TIME WITH DATA VIRTUALIZATION
Big Data LDN 2018: CONNECTING SILOS IN REAL-TIME WITH DATA VIRTUALIZATIONMatt Stubbs
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesJames Serra
 
Overcoming Data Gravity in Multi-Cloud Enterprise Architectures
Overcoming Data Gravity in Multi-Cloud Enterprise ArchitecturesOvercoming Data Gravity in Multi-Cloud Enterprise Architectures
Overcoming Data Gravity in Multi-Cloud Enterprise ArchitecturesVMware Tanzu
 
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demandsMongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demandsMongoDB
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseMongoDB
 
Chapter 6 Technology and Data Platforms.pptx
Chapter 6 Technology and Data Platforms.pptxChapter 6 Technology and Data Platforms.pptx
Chapter 6 Technology and Data Platforms.pptxMinHtetAung5
 
KELLY_MANOVERV.PDF
KELLY_MANOVERV.PDFKELLY_MANOVERV.PDF
KELLY_MANOVERV.PDFHernanKlint
 
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo
 
The Role of Logical Data Fabric in a Unified Platform for Modern Analytics (A...
The Role of Logical Data Fabric in a Unified Platform for Modern Analytics (A...The Role of Logical Data Fabric in a Unified Platform for Modern Analytics (A...
The Role of Logical Data Fabric in a Unified Platform for Modern Analytics (A...Denodo
 
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern AnalyticsThe Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern AnalyticsDenodo
 

Similaire à A federated information infrastructure that works (20)

Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An Introduction
 
MongoDB .local Toronto 2019: MongoDB – Powering the new age data demands
MongoDB .local Toronto 2019: MongoDB – Powering the new age data demandsMongoDB .local Toronto 2019: MongoDB – Powering the new age data demands
MongoDB .local Toronto 2019: MongoDB – Powering the new age data demands
 
Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7
 
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data VirtualizationDAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
 
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
Product Keynote: Denodo 8.0 - A Logical Data Fabric for the Intelligent Enter...
 
Confluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointConfluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPoint
 
Fast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow PresentationFast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow Presentation
 
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
 
Analytics in Your Enterprise
Analytics in Your EnterpriseAnalytics in Your Enterprise
Analytics in Your Enterprise
 
Big Data LDN 2018: CONNECTING SILOS IN REAL-TIME WITH DATA VIRTUALIZATION
Big Data LDN 2018: CONNECTING SILOS IN REAL-TIME WITH DATA VIRTUALIZATIONBig Data LDN 2018: CONNECTING SILOS IN REAL-TIME WITH DATA VIRTUALIZATION
Big Data LDN 2018: CONNECTING SILOS IN REAL-TIME WITH DATA VIRTUALIZATION
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Overcoming Data Gravity in Multi-Cloud Enterprise Architectures
Overcoming Data Gravity in Multi-Cloud Enterprise ArchitecturesOvercoming Data Gravity in Multi-Cloud Enterprise Architectures
Overcoming Data Gravity in Multi-Cloud Enterprise Architectures
 
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demandsMongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
 
Webinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick Database
 
Chapter 6 Technology and Data Platforms.pptx
Chapter 6 Technology and Data Platforms.pptxChapter 6 Technology and Data Platforms.pptx
Chapter 6 Technology and Data Platforms.pptx
 
KELLY_MANOVERV.PDF
KELLY_MANOVERV.PDFKELLY_MANOVERV.PDF
KELLY_MANOVERV.PDF
 
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...
 
The Role of Logical Data Fabric in a Unified Platform for Modern Analytics (A...
The Role of Logical Data Fabric in a Unified Platform for Modern Analytics (A...The Role of Logical Data Fabric in a Unified Platform for Modern Analytics (A...
The Role of Logical Data Fabric in a Unified Platform for Modern Analytics (A...
 
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern AnalyticsThe Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
 
Scaling Your Data: Data Democratisation and DataOps
Scaling Your Data: Data Democratisation and DataOpsScaling Your Data: Data Democratisation and DataOps
Scaling Your Data: Data Democratisation and DataOps
 

Plus de Stratebi

Destinos turisticos inteligentes
Destinos turisticos inteligentesDestinos turisticos inteligentes
Destinos turisticos inteligentesStratebi
 
Azure Synapse
Azure SynapseAzure Synapse
Azure SynapseStratebi
 
Options for Dashboards with Python
Options for Dashboards with PythonOptions for Dashboards with Python
Options for Dashboards with PythonStratebi
 
Dashboards with Python
Dashboards with PythonDashboards with Python
Dashboards with PythonStratebi
 
PowerBI Tips y buenas practicas
PowerBI Tips y buenas practicasPowerBI Tips y buenas practicas
PowerBI Tips y buenas practicasStratebi
 
Machine Learning Meetup Spain
Machine Learning Meetup SpainMachine Learning Meetup Spain
Machine Learning Meetup SpainStratebi
 
LinceBI IIoT (Industrial Internet of Things)
LinceBI IIoT (Industrial Internet of Things)LinceBI IIoT (Industrial Internet of Things)
LinceBI IIoT (Industrial Internet of Things)Stratebi
 
SAP - PowerBI integration
SAP - PowerBI integrationSAP - PowerBI integration
SAP - PowerBI integrationStratebi
 
Aplicaciones Big Data Marketing
Aplicaciones Big Data MarketingAplicaciones Big Data Marketing
Aplicaciones Big Data MarketingStratebi
 
9 problemas en proyectos Data Analytics
9 problemas en proyectos Data Analytics9 problemas en proyectos Data Analytics
9 problemas en proyectos Data AnalyticsStratebi
 
PowerBI: Soluciones, Aplicaciones y Cursos
PowerBI: Soluciones, Aplicaciones y CursosPowerBI: Soluciones, Aplicaciones y Cursos
PowerBI: Soluciones, Aplicaciones y CursosStratebi
 
Sports Analytics
Sports AnalyticsSports Analytics
Sports AnalyticsStratebi
 
Vertica Extreme Analysis
Vertica Extreme AnalysisVertica Extreme Analysis
Vertica Extreme AnalysisStratebi
 
Businesss Intelligence con Vertica y PowerBI
Businesss Intelligence con Vertica y PowerBIBusinesss Intelligence con Vertica y PowerBI
Businesss Intelligence con Vertica y PowerBIStratebi
 
Vertica Analytics Database general overview
Vertica Analytics Database general overviewVertica Analytics Database general overview
Vertica Analytics Database general overviewStratebi
 
Talend Cloud en detalle
Talend Cloud en detalleTalend Cloud en detalle
Talend Cloud en detalleStratebi
 
Master Data Management (MDM) con Talend
Master Data Management (MDM) con TalendMaster Data Management (MDM) con Talend
Master Data Management (MDM) con TalendStratebi
 
Talend Introducion
Talend IntroducionTalend Introducion
Talend IntroducionStratebi
 
Talent Analytics
Talent AnalyticsTalent Analytics
Talent AnalyticsStratebi
 
El Futuro del Business Intelligence
El Futuro del Business IntelligenceEl Futuro del Business Intelligence
El Futuro del Business IntelligenceStratebi
 

Plus de Stratebi (20)

Destinos turisticos inteligentes
Destinos turisticos inteligentesDestinos turisticos inteligentes
Destinos turisticos inteligentes
 
Azure Synapse
Azure SynapseAzure Synapse
Azure Synapse
 
Options for Dashboards with Python
Options for Dashboards with PythonOptions for Dashboards with Python
Options for Dashboards with Python
 
Dashboards with Python
Dashboards with PythonDashboards with Python
Dashboards with Python
 
PowerBI Tips y buenas practicas
PowerBI Tips y buenas practicasPowerBI Tips y buenas practicas
PowerBI Tips y buenas practicas
 
Machine Learning Meetup Spain
Machine Learning Meetup SpainMachine Learning Meetup Spain
Machine Learning Meetup Spain
 
LinceBI IIoT (Industrial Internet of Things)
LinceBI IIoT (Industrial Internet of Things)LinceBI IIoT (Industrial Internet of Things)
LinceBI IIoT (Industrial Internet of Things)
 
SAP - PowerBI integration
SAP - PowerBI integrationSAP - PowerBI integration
SAP - PowerBI integration
 
Aplicaciones Big Data Marketing
Aplicaciones Big Data MarketingAplicaciones Big Data Marketing
Aplicaciones Big Data Marketing
 
9 problemas en proyectos Data Analytics
9 problemas en proyectos Data Analytics9 problemas en proyectos Data Analytics
9 problemas en proyectos Data Analytics
 
PowerBI: Soluciones, Aplicaciones y Cursos
PowerBI: Soluciones, Aplicaciones y CursosPowerBI: Soluciones, Aplicaciones y Cursos
PowerBI: Soluciones, Aplicaciones y Cursos
 
Sports Analytics
Sports AnalyticsSports Analytics
Sports Analytics
 
Vertica Extreme Analysis
Vertica Extreme AnalysisVertica Extreme Analysis
Vertica Extreme Analysis
 
Businesss Intelligence con Vertica y PowerBI
Businesss Intelligence con Vertica y PowerBIBusinesss Intelligence con Vertica y PowerBI
Businesss Intelligence con Vertica y PowerBI
 
Vertica Analytics Database general overview
Vertica Analytics Database general overviewVertica Analytics Database general overview
Vertica Analytics Database general overview
 
Talend Cloud en detalle
Talend Cloud en detalleTalend Cloud en detalle
Talend Cloud en detalle
 
Master Data Management (MDM) con Talend
Master Data Management (MDM) con TalendMaster Data Management (MDM) con Talend
Master Data Management (MDM) con Talend
 
Talend Introducion
Talend IntroducionTalend Introducion
Talend Introducion
 
Talent Analytics
Talent AnalyticsTalent Analytics
Talent Analytics
 
El Futuro del Business Intelligence
El Futuro del Business IntelligenceEl Futuro del Business Intelligence
El Futuro del Business Intelligence
 

Dernier

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 

Dernier (20)

꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 

A federated information infrastructure that works

  • 1. A Federated Information Infrastructure that Works Xavier Gumara Rigol @xgumara October 3rd, 2019 Barcelona
  • 2. multitenancy (noun) mode of operation of software where multiple independent instances operate in a shared environment.
  • 3. 3 What are the challenges of building a multi tenant information architecture for business insights? How we solved them at Adevinta?
  • 4. Data Engineering Manager at Adevinta (former Schibsted) since 2016. Consultant Professor at the Open University of Catalonia (UOC) since 2016. Between 2013 and 2016 I worked as a Business Intelligence Engineer at Schibsted, and previously as a Business Intelligence Consultant at Stratebi for almost 3 years. 4 Xavier Gumara Rigol About me @xgumara
  • 5. Adevinta is a marketplaces specialist. We are an international family of local digital brands. Our marketplaces create perfect matches on the world’s most trusted marketplaces. Thanks to our second hand effect our users potentially save every year: 5 About Adevinta 20.5 million tons of greenhouse gases 1.1 million tonnes of plastic
  • 6. More than 30 brands in 16 countries in Europe, Latin America and North Africa: 6 About Adevinta + a global services organization located between Barcelona and Paris.
  • 8. ● Easy access to key facts about our marketplaces (tenants) ● Eliminate data-quality discussions, establish trust in the facts ● Reduce impact on manual data requests to each tenant ● Minimize regional effort needed for global data collection ● Provide a framework and infrastructure that can be extended locally 8 Problems we are trying to solve
  • 9. ● Executive support ● Provide results sooner than later and iterate ● It is not a project but an initiative ● Fix data quality at the source ● Invest in solving technical debt 9 The lowest common denominator for successful information architecture initiatives
  • 10. 10 The challenges of a multi tenant information architecture 1. Finding the right level of authority 2. Governance of the data sets 3. Building common infrastructure as a platform
  • 11. 01 Finding the right level of authority
  • 12. 12 Finding the right level of authority Silos of unreachable data Centralization Authority not delegated Decentralization Authority delegated Monolithic data platform bottleneck Pros: speed of execution (locally) and market customization Cons: difficult to have a global view, duplication of efforts Pros: can work at small scale Cons: long response times, difficult to harmonise
  • 13. 13 Finding the right level of authority Decentralization Transactional PostgreSQL Transactional PostgreSQL Analytical PostgreSQL Transactional database X Mature data warehouse Corporate KPIs database Regional view Global view API
  • 14. 14 Finding the right level of authority Centralization Sources to ingest Consumers to serve Big Data Platform
  • 15. Finding the right level of authority Current solution: federation Corporate data sources Corporate data lake Regional data sources and warehouse Regional data sources and warehouse Downwards federation
  • 16. Finding the right level of authority Current solution: federation ● Each regional data warehouse is a different Redshift instance ● Physical storage is S3 and can be accessed: ● Via Athena for global/central analysts ● Via Redshift Spectrum for global teams (downwards federation)
  • 17. 02 Governance of data sets
  • 18. 18 Governance of data sets Embrace the concept of “data set as a product” that defines the basic qualities of a data set as: Discoverable Addressable Trustworthy Inter operableSelf-describing Secure Source: “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh” by Zhamak Dehghani https://martinfowler.com/articles/data-monolith-to-mesh.html
  • 19. 19 Governance of data sets “Data set as a product”: Discoverable
  • 20. 20 Governance of data sets “Data set as a product”: Addressable
  • 21. 21 Governance of data sets “Data set as a product”: Trustworthy Contextual data quality information
  • 22. 22 Governance of data sets “Data set as a product”: Self-describing All data set documentation includes: ● Data location ● Data provenance and data mapping ● Example data ● Execution time and freshness ● Input preconditions ● Example Jupyter notebook using the data set
  • 23. 23 Governance of data sets “Data set as a product”: Inter operable ● Defining a common nomenclature is a must in all layers of the platform ● Usage of schema.org to identify the same object across different domains "adType" : { "description" : "Type of the ad" , "enum": [ "buy", "sell", "rent", "let", "lease", "swap", "give", "jobOffer" ] }
  • 24. 24 Governance of data sets “Data set as a product”: Secure
  • 25. 03 Building common infrastructure as a platform
  • 26. 26 Building common infrastructure as a platform Patterns for business metrics calculation: ● Metrics need to use specific events (filter) ● Some transformations applied before aggregating ● Group by several dimensions ● Aggregation function (count, count distinct, sum,...) ● Some transformations applied after aggregating ● Different periods of calculation day, week, month, 7d, 28d Use-case: metrics calculation Filter and Cleanup Group by dimensions Aggregate Filter Post transformations
  • 27. 27 Building common infrastructure as a platform Use-case: metrics calculation val simpleMetric: Metric = withSimpleMetric( metricId = AdsWithLeads, cleanupTransformations = Seq( filterEventTypes(List(isLeadEvent(EventType, ObjectType))) ), dimensions = Seq(DeviceType, ProductType, TrackerType), aggregate = countDistinct(AdId), postTransformations = Seq( withConstantColumn(Period, period)(_), withConstantColumn(ClientId, client)(_)) ) Filter and Cleanup Group by dimensions Aggregate Filter Post transformations
  • 28. 28 Building common infrastructure as a platform Use-case: metrics calculation val simpleMetricWithSubtotals: Metric = simpleMetric.withSubtotals(Seq(DeviceType, ProductType, TrackerType) ) Filter and Cleanup Group by dimensions Aggregate Filter Post transformations This configuration is then passed to the cube() function in Spark. The cube() function “calculates subtotals and a grand total for every permutation of the columns specified”.
  • 29. 29 Building common infrastructure as a platform Use-case: metrics calculation private val metricDefinitions: Seq[MetricDefinition] = List( MetricDefinition( metricIdentifiers(Sessions), countDistinct(SessionId) ), MetricDefinition( metricIdentifiers(LoggedInSessions), countDistinct(SessionId), filterEventTypes(List(col(EventIsLogged) === 1)) _ ), MetricDefinition( metricIdentifiers(AdsWithLeads), countDistinct(AdId), filterEventTypes(List(isLeadEvent(EventType, ObjectType))) _ ) )
  • 30. 30 Building common infrastructure as a platform Use-case: Recency-Frequency-Monetization (RFM) user segmentation val df = spark.read.parquet(path) .groupBy( "user_id") .agg( count(col( "event_id")).as("total_events" ), countDistinct()(col( "session_id" )).as("total_sessions" ) ) val dfWithSegments = df.transform(withSegment( "segment_chain" , Seq( SegmentDimension(col( "total_events" ), "events_percentile" , 0.5, 0.8), SegmentDimension(col( "total_sessions" ), "sessions_percentile" , 0.5, 0.8) ) )) The withSegment method requires a name to store the output of the segmentation and a list of all dimensions that will be used. You can tune the thresholds for each segment dimension.
  • 31. 31 Building common infrastructure as a platform Use-case: Recency-Frequency-Monetization (RFM) user segmentation val myMap = Map[String, String]( "LL" -> "That's a low active user" , "LM" -> "Users that do few events in different sessions" , "LH" -> "Users that do almost nothing but somehow generate many sessions" , "ML" -> "Meh... in little sessions" , "MM" -> "Meh... in medium sessions" , "MH" -> "Meh... in multiple sessions" , "HL" -> "Users that do a lot of things in a row" , "HM" -> "Users that do a lot of things along the day" , "HH" -> "Da best users" ) dfWithSegments.transform(withSegmentMapping( "segment_name" , col("segment_chain" ), myMap)) The withSegmentMapping method applies a map to the result of the segmentation to add meaningful names to the user segments.
  • 32. 32 What have we learned?
  • 33. 33 What have we learned? ● Federation gives autonomy ● Non-invasive governance is key ● Balance the delivery of business value vs tooling
  • 34. Thank you! Xavier Gumara Rigol @xgumara