SlideShare a Scribd company logo
1 of 58
Download to read offline
Architecting for change:
LinkedIn's new data ecosystem
Sept 28, 2016
Shirshanka Das, Principal Staff Engineer, LinkedIn
Yael Garten, Director of Data Science, LinkedIn
@shirshanka, @yaelgarten
Design for change. Expect it. Embrace it.
Product Change Technology Culture
&
Process
Learnings
The Product Change: 

Launch a completely rewritten LinkedIn mobile app
What does this impact?
Data driven
product
Tracking data records user activity
InvitationClickEvent()
Tracking data records user activity
InvitationClickEvent()
Scale fact:

~ 1000 tracking event types, 

~ Double-digit TB per day, 

hundreds of metrics & data
products
user
engagement
tracking data
metric scripts
production code
Tracking Data Lifecycle
TransportProduce Consume
Member facing
data products
Business facing
decision making
Tracking Data Lifecycle & Teams
TransportProduce Consume
Product or App teams:
PMs, Developers, TestEng
Infra teams:
Hadoop, Kafka, DWH, ...
Data teams: 

Analytics, Relevance Engineers,...
user
engagement
tracking data
metric scripts
production code
Member facing
data products
Business facing
decision making
How do we calculate a metric: ProfileViews
PageViewEvent
	Record	1:	
{	
		"header"	:	{	
				"memberId"	:	12345,	
				"time"	:	1454745292951,	
				"appName"	:	{	
						"string"	:	"LinkedIn"	
				"pageKey"	:	"profile_page"	
				},	
		},	
		"trackingInfo"	:	{	
				["vieweeID"	:	"23456"],	
	 ...	
		}	
}	
Metric: 

ProfileViews = sum(PageViewEvent)

where pageKey = profile_page

PageViewEvent
	Record	1:	
{	
		"header"	:	{	
				"memberId"	:	12345,	
				"time"	:	1454745292951,	
				"appName"	:	{	
						"string"	:	"LinkedIn"	
				"pageKey"	:	"new_profile_page"	
				},	
		},	
		"trackingInfo"	:	{	
				["vieweeID"	:	"23456"],	
	 ...	
		}	
}	
or new_profile_page
PageViewEvent
	Record	1:	
{	
		"header"	:	{	
				"memberId"	:	12345,	
				"time"	:	1454745292951,	
				"appName"	:	{	
						"string"	:	"LinkedIn"	
				"pageKey"	:	"profile_page"	
				},	
		},	
		"trackingInfo"	:	{	
				["vieweeID"	:	"23456"],	
	 ...	
		}	
}	
CASE	
		WHEN	trackingInfo["profileIds"]	...	
		WHEN	trackingInfo["profileid"]	...	
		WHEN	trackingInfo["profileId"]	...	
		WHEN	trackingInfo["url$profileIds"]	...	
		WHEN	trackingInfo["11"]	LIKE	'%profileIds=%'	THEN	SUBSTRING(trackingInfo["11"],9,60)	
		WHEN	trackingInfo["12"]	LIKE	'%priceIds=%'	THEN	SUBSTRING(trackingInfo["12"],9,60)	
		ELSE	NULL	
END	AS	profile_id
Evolution as we mature and grow...
Metric: ProfileViews = sum(PageViewEvent

where 

pagekey = profile_page and
memberID != trackinginfo[vieweeID] )
Eventually… unmaintainable
get_tracking_codes	=	foreach	get_domain_rolled_up	generate	
		..entry_domain_rollup,	
	 (		(tracking_code	matches	'eml-ced.*'	or	tracking_code	matches	'eml-b2_content_ecosystem
_digest.*'	
				or	(referer	is	not	null	and	(referer	matches	'.*touch.linkedin.com.*trk=eml-ced.*'		
				or	referer	matches	'.*touch.linkedin.com.*trk=eml-b2_content_ecosystem_digest.*'))	?	'Email	-	CED'	:						
(tracking_code	matches	'eml-.*'	or	(referer	is	not	null	and	referer	matches	'.*touch.linkedin.com.*trk=eml-.*')	
or	entry_domain_rollup	==	'Email'	?	'Email	-	Other'	:	
								(tracking_code	==	'hp-feed-article-title-hpm'	and	entry_domain_rollup	==	'Linkedin'	?	'Homepage	Pulse	
Module'	:	((tracking_code	matches	'hp-feed-.*'	and	entry_domain_rollup	==	'Linkedin')	or	(std_user_interface	
matches	'(phone	app|tablet	app|phone	browser|tablet	browser)'	and	tracking_code	==	'v-feed')			or	
(tracking_code	==	'Organic	Traffic'	and	entry_domain_rollup	==	'Linkedin'	and	(referer	==	'https://
www.linkedin.com/nhome'	or	referer	==	'http://www.linkedin.com/nhome'))	?	'Feed'	:	
												(tracking_code	matches	'hb_ntf_MEGAPHONE_.*'	and	entry_domain_rollup	==	'Linkedin'	?	'Desktop	
Notifications'	:			(tracking_code	==	'm_sim2_native_reader_swipe_right'	?	'Push	Notification'	:		(tracking_code	==	
'pulse_dexter_stream_scroll'		and	entry_domain_rollup	==	'Linkedin'	?	'Pulse	-	Infinite	Scroll	on	Dexter'	:	--infinite	
scroll	on	dexter	
			((tracking_code	==	'pulse_dexter_nav_click'	or	tracking_code	==	'pulse-det-nav_art')	and	entry_domain_rollup	==	
'Linkedin'	?	'Pulse	-	Left	Rail	Click	on	Dexter'	:	--left	rail	click	on	dexter	
																				(tracking_code	==	'Organic	Traffic'	and	referer	is	not	null	and	referer	matches	'.*linkedin.com/pulse
/article.*'	?	'Publishing	Platform'	:	
																						'None	Found	Yet'))))))))))	as	entry_point;	
Homepage
team
Push Notification
team
Email team
Long form post
team
PageViewEvent
	Record	1:	
{	
		"header"	:	{	
				"memberId"	:	12345,	
				"time"	:	1454745292951,	
				"appName"	:	{	
						"string"	:	"LinkedIn"	
				"pageKey"	:	"profile_page"	
				},	
		},	
		"trackingInfo"	:	{	
				["vieweeID"	:	"23456"],	
	 ...	
		}	
}	
We wanted to move to better data models
LI_ProfileViewEvent
	Record	1:	
{	
		"header"	:	{	
				"memberId"	:	12345,	
				"time"	:	4745292951145,	
				"appName"	:	{	
						"string"	:	"LinkedIn"	
				"pageKey"	:	"profile_page"	
				},	
		},	
"entityView"	:	{	
					"viewType"	:	"profile-view",	
					"viewerId"	:	“12345”,		
"vieweeId"	:	“23456”,		
		},	
}
Two options:

1. Keep the old tracking:
a. Cost: producers (try to) replicate it (write bad old code
from scratch),
b. Save: consumers avoid migrating.



2. Evolve.
a. Cost: time on data modeling, and on consumer
migration,
b. Save: pays down data modeling tech debt
How much work would it be?
How much work would it be?
Two options:
1. Keep the old tracking:
a. Cost: producers (try to) replicate it (write bad old code
from scratch),
b. Save: consumers avoid migrating.



2. Evolve.
a. Cost: time on data modeling, and on consumer
migration,
b. Save: pays down data modeling tech debt
2000 daysEstimated cost to update consumers to new tracking with clean, committee-approved data models
Estimated cost for producers to attempt to replicate old tracking
5000 days
#AnalyticsHappiness
The Task and Opportunity
Must do: So we will do the data modeling, and rewrite all the metrics to
account for the changes happening upstream… but…



Extra credit points: How do we make sure that the cost is not this high the
next time?




How do we handle evolution in a principled way?
Product Change Technology Culture
&
Process
Learnings
Metrics ecosystem at LinkedIn: 3 yrs ago
Operational Challenges
Diminished Trust due to multiple sources of truth
Data Stages
Ingest Process Serve VisualizeCreate
Ingest Process Serve VisualizeCreate
Tracking
Kafka
Espresso
…
Tracking Architecture
SDKs in different frameworks
(server, client)
Tracking front-end
Monitoring Tools
Components
KafkaClient-side
Tracking
Tracking
Frontend
Services
Tools
Create
Data Stages
Ingest Process Serve VisualizeCreate
Unified Ingestion with
Hundreds of TB / day
Thousands of datasets
80+% of data ingest
Ingest
In production @ LinkedIn, Intel, Swisscom, NerdWallet, PayPal
Ingest
Ingest Process Serve VisualizeCreate
Hadoop
Processing engines @ LinkedInProcess
Ingest Process Serve VisualizeCreate
Pinot
Pinot
Kafka Hadoop
Samza Jobs
Pinot
minutes
hour +
Distributed Multi-dimensional OLAP
Columnar + indexes
No joins
Latency: low ms to sub-second
Serve
Site-facing	Apps Reporting	dashboards Monitoring
In production @
LinkedIn, Uber
Serve
Ingest Process VisualizeCreate
Hadoop Pinot Raptor
Serve
Ingest Process VisualizeCreate
Unified Metrics Platform (UMP)
Hadoop Pinot Raptor
Serve
Unified Metrics Platform
Metrics Logic
Raw
Data
Pinot
UMP Harness
Incremental
Aggregate
Backfill
Auto-join
Raptor
dashboards
HDFS
Aggregated
Data
Experiment
Analysis
Relevance
...
HDFS
Ad-hoc
Ingest Process Serve VisualizeCreate
RaptorKafka

Espresso

…
Hadoop Pinot
Tracking Unified Metrics Platform (UMP)
How do we handle old and new?
PageViewEvent
ProfileViewEvent
Producers Consumers
old
new
Relevance
Analytics
The Big Challenge
load “/data/tracking/PageViewEvent” using AvroStorage()
(Pig scripts)
My Raw Data
Our scripts were doing ….
My Raw Data
My Data API
We need “microservices" for Data
The Database community solved this
decades ago...
Views!
We had been working on something that could help...
A Data Access Layer for Linkedin
Abstract away underlying physical details to allow users to
focus solely on the logical concerns
Logical Tables + Views
Logical FileSystem
Solving
With
Views
Producers
LinkedInProfileView
PageViewEvent
ProfileViewEvent
new
old
Consumers
pagekey==
profile
1:1
Relevance
Analytics
Views
ecosystem
41
Producers Consumers
LinkedInProfileView
JSAProfileView
Job Seeker App
(JSA)
LinkedIn App
UnifiedProfileView
Data Catalog +
Discovery
(DALI)
DaliFileSystem Client
Data Source
(HDFS)
Data Sink
(HDFS)
Processing Engine
(MapReduce, Spark)
DALI Datasets (Tables + Views)
Query Layers
(Hive, Pig, Spark)
View Defs +
UDFs
(Artifactory, Git)
Dataflow APIs
(MR, Spark,
Scalding)
DALI CLI
Dali: Implementation Details in Context
From
load ‘/data/tracking/PageViewEvent’
using AvroStorage();
To
load ‘tracking.UnifiedProfileView’ using
DaliStorage();
One small step for a script
A Few Hard Problems
Versioning
Views and UDFs
Mapping to Hive metastore entities
Development lifecycle
Git as source of truth
Gradle for build
LinkedIn tooling integration for deployment
Early experiences with Dali views
How we executed
Lots of work to get the infra ready
Closed beta model
Tons of training and education (hand holding) for all
Governance body
Feedback from analysts is overwhelmingly positive:
+ Much simpler to share and standardize data cleansing code with peers
+ Provides effective insulation to scripts from upstream changes
- Harder to debug where problems are due to additional layer
State of the world today
~100 producer views
~200 consumer views
~30% of UMP metrics use Dali data
sources
~80 unique tracking event data sources
ProfileViews
MessagesSent
Searches
InvitationsSent
ArticlesRead
JobApplications
...
What’s next for Dali?
Real-time Views on streaming data
Selective materialization
Hive is an implementation detail, not a long term bet
Open source
Data Quality Framework
Product Change Technology Culture
&
Process
Learnings
Infrastructure enables, but culture really preserves
get_tracking_codes	=	foreach	get_domain_rolled_up	generate	
		..entry_domain_rollup,	
	 (		(tracking_code	matches	'eml-ced.*'	or	tracking_code	matches	'eml-b2_content
_ecosystem_digest.*'	
				or	(referer	is	not	null	and	(referer	matches	'.*touch.linkedin.com.*trk=eml-ced.*'		
				or	referer	matches	'.*touch.linkedin.com.*trk=eml-b2_content_ecosystem_digest.*'))	?	'Email	
-	CED'	:						(tracking_code	matches	'eml-.*'	or	(referer	is	not	null	and	referer	matches	'.*touch.linkedin
.com.*trk=eml-.*')	or	entry_domain_rollup	==	'Email'	?	'Email	-	Other'	:	
								(tracking_code	==	'hp-feed-article-title-hpm'	and	entry_domain_rollup	==	'Linkedin'	?	'Homepage	
Pulse	Module'	:	((tracking_code	matches	'hp-feed-.*'	and	entry_domain_rollup	==	'Linkedin')	or	
(std_user_interface	matches	'(phone	app|tablet	app|phone	browser|tablet	browser)'	and	tracking_code	
==	'v-feed')			or	(tracking_code	==	'Organic	Traffic'	and	entry_domain_rollup	==	'Linkedin'	and	(referer	==	
'https://www.linkedin.com/nhome'	or	referer	==	'http://www.linkedin.com/nhome'))	?	'Feed'	:	
												(tracking_code	matches	'hb_ntf_MEGAPHONE_.*'	and	entry_domain_rollup	==	'Linkedin'	?	
'Desktop	Notifications'	:			(tracking_code	==	'm_sim2_native_reader_swipe_right'	?	'Push	Notification'	:		
(tracking_code	==	'pulse_dexter_stream_scroll'		and	entry_domain_rollup	==	'Linkedin'	?	'Pulse	-
For a great data ecosystem that can handle change:
1. Standardize core data entities

2. Create clear maintainable contracts between data producers
& consumers

3. Ensure dialogue between data producers & consumers

1. Standardize core data entities
• Event types and names: Page, Action, Impression
• Framework level client side tracking: views, clicks, flows
• For all else (custom) - guide when to create a new Event or Dali view

Navigation
Page View
Control Interaction
2. Create clear maintainable contracts
1
1. Tracking specification with monitoring: clear, visual, consistent contract
Need tooling to support culture shift

Tracking specification Tool
2
2. Dali dataset specification with data quality rules
3. Ensure dialogue between Producers & Consumers
• Awareness: Train about end-to-end data pipeline, data modeling
• Instill communication & collaborative ownership process between all: a step-by-step
playbook for who & how to develop and own tracking

PM → Analyst → Engineer → All3 → TestEng → Analyst
user engagement
tracking data
metric 

scripts
production

code
Member facing

data products
Business facing
decision making
Product Change Technology Culture
&
Process
Learnings
Our Learnings
Culture and Process
● Spend time to identify what needs culture & process, and 

what needs tools & tech
● Big changes can mean big opportunities
● Very hard to massively change things like data culture or data tech debt; never
a good time to invest in “invisible” behind-the-scenes change

→ Make it non-invisible -- try to clarify or size out the cost of NOT doing it

→ needed strong leaders, and a village

Tech
● Must build tooling to support that culture change otherwise culture will revert
● Work hard to make any new layer as frictionless as possible
● Virtual views on Hadoop data can work at scale! (Dali views)
For a great data ecosystem that can handle change:
1. Standardize core data entities

2. Create clear maintainable contracts between data producers & consumers

3. Ensure dialogue between data producers & consumers

Design for change. Expect it. Embrace it.
Did we succeed? We just handled another huge change!
#AnalyticsHappiness
Thank you.
@shirshanka, @yaelgarten

More Related Content

What's hot

Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaKai Wähner
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019Timothy Spann
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewenconfluent
 
Software architecture for high traffic website
Software architecture for high traffic websiteSoftware architecture for high traffic website
Software architecture for high traffic websiteTung Nguyen Thanh
 
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Khai Tran
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Delivering: from Kafka to WebSockets | Adam Warski, SoftwareMill
Delivering: from Kafka to WebSockets | Adam Warski, SoftwareMillDelivering: from Kafka to WebSockets | Adam Warski, SoftwareMill
Delivering: from Kafka to WebSockets | Adam Warski, SoftwareMillHostedbyConfluent
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Amazon Web Services
 
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkUnifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkDataWorks Summit/Hadoop Summit
 
ELK at LinkedIn - Kafka, scaling, lessons learned
ELK at LinkedIn - Kafka, scaling, lessons learnedELK at LinkedIn - Kafka, scaling, lessons learned
ELK at LinkedIn - Kafka, scaling, lessons learnedTin Le
 
RabbitMQ vs Apache Kafka Part II Webinar
RabbitMQ vs Apache Kafka Part II WebinarRabbitMQ vs Apache Kafka Part II Webinar
RabbitMQ vs Apache Kafka Part II WebinarErlang Solutions
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedInGuozhang Wang
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connectconfluent
 
Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain confluent
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David AndersonVerverica
 

What's hot (20)

Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
Software architecture for high traffic website
Software architecture for high traffic websiteSoftware architecture for high traffic website
Software architecture for high traffic website
 
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Delivering: from Kafka to WebSockets | Adam Warski, SoftwareMill
Delivering: from Kafka to WebSockets | Adam Warski, SoftwareMillDelivering: from Kafka to WebSockets | Adam Warski, SoftwareMill
Delivering: from Kafka to WebSockets | Adam Warski, SoftwareMill
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache FlinkUnifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 
ELK at LinkedIn - Kafka, scaling, lessons learned
ELK at LinkedIn - Kafka, scaling, lessons learnedELK at LinkedIn - Kafka, scaling, lessons learned
ELK at LinkedIn - Kafka, scaling, lessons learned
 
Real time data quality on Flink
Real time data quality on FlinkReal time data quality on Flink
Real time data quality on Flink
 
RabbitMQ vs Apache Kafka Part II Webinar
RabbitMQ vs Apache Kafka Part II WebinarRabbitMQ vs Apache Kafka Part II Webinar
RabbitMQ vs Apache Kafka Part II Webinar
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Log analytics with ELK stack
Log analytics with ELK stackLog analytics with ELK stack
Log analytics with ELK stack
 
Diving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka ConnectDiving into the Deep End - Kafka Connect
Diving into the Deep End - Kafka Connect
 
Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David Anderson
 

Similar to Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem

Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Shirshanka Das
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Yael Garten
 
Backstage 2019 - The Atlassian Journey with Amplitude - Itzik Feldman
Backstage 2019 - The Atlassian Journey with Amplitude - Itzik FeldmanBackstage 2019 - The Atlassian Journey with Amplitude - Itzik Feldman
Backstage 2019 - The Atlassian Journey with Amplitude - Itzik FeldmanAmplitude
 
Digital Twin: A radical new approach to IoT
Digital Twin: A radical new approach to IoTDigital Twin: A radical new approach to IoT
Digital Twin: A radical new approach to IoTDimitri Volkmann
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningKai Wähner
 
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - ExcercisesAgile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - ExcercisesRaphael Branger
 
Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsArcadia Data
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like ProductsVMware Tanzu
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureMongoDB
 
Dynatrace: Davis - Hololens - AI update - Cloud announcements - Self driving IT
Dynatrace: Davis - Hololens - AI update - Cloud announcements - Self driving ITDynatrace: Davis - Hololens - AI update - Cloud announcements - Self driving IT
Dynatrace: Davis - Hololens - AI update - Cloud announcements - Self driving ITDynatrace
 
Real-time big data analytics based on product recommendations case study
Real-time big data analytics based on product recommendations case studyReal-time big data analytics based on product recommendations case study
Real-time big data analytics based on product recommendations case studydeep.bi
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022StreamNative
 
DataAquitaine February 2022
DataAquitaine February 2022DataAquitaine February 2022
DataAquitaine February 2022Yves Caseau
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsClusterpoint
 
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo JapanAI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo JapanAvkash Chauhan
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationMongoDB
 
Applying linear regression and predictive analytics
Applying linear regression and predictive analyticsApplying linear regression and predictive analytics
Applying linear regression and predictive analyticsMariaDB plc
 

Similar to Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem (20)

Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 
Backstage 2019 - The Atlassian Journey with Amplitude - Itzik Feldman
Backstage 2019 - The Atlassian Journey with Amplitude - Itzik FeldmanBackstage 2019 - The Atlassian Journey with Amplitude - Itzik Feldman
Backstage 2019 - The Atlassian Journey with Amplitude - Itzik Feldman
 
Digital Twin: A radical new approach to IoT
Digital Twin: A radical new approach to IoTDigital Twin: A radical new approach to IoT
Digital Twin: A radical new approach to IoT
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
 
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - ExcercisesAgile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
Agile Testing Days 2017 Intoducing AgileBI Sustainably - Excercises
 
Scaling Legacy
Scaling LegacyScaling Legacy
Scaling Legacy
 
Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017
 
Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time Analytics
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like Products
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise Architecture
 
Dynatrace: Davis - Hololens - AI update - Cloud announcements - Self driving IT
Dynatrace: Davis - Hololens - AI update - Cloud announcements - Self driving ITDynatrace: Davis - Hololens - AI update - Cloud announcements - Self driving IT
Dynatrace: Davis - Hololens - AI update - Cloud announcements - Self driving IT
 
Real-time big data analytics based on product recommendations case study
Real-time big data analytics based on product recommendations case studyReal-time big data analytics based on product recommendations case study
Real-time big data analytics based on product recommendations case study
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
 
DataAquitaine February 2022
DataAquitaine February 2022DataAquitaine February 2022
DataAquitaine February 2022
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutions
 
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo JapanAI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
AI Solutions with Macnica.ai - AI Expo 2018 Tokyo Japan
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital Transformation
 
Applying linear regression and predictive analytics
Applying linear regression and predictive analyticsApplying linear regression and predictive analytics
Applying linear regression and predictive analytics
 

Recently uploaded

April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 

Recently uploaded (20)

VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 

Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem