SlideShare une entreprise Scribd logo
1  sur  15
Télécharger pour lire hors ligne
Collecting and Making Sense of
Diverse Data at WayUp
Harlan D. Harris, PhD
Director of Data Science
DataEngConf 2017
Thanks to:
JJ Fliegelman (CTO)
WayUp Engineers!
Why we built WayUp...
The leading digital platform for
employers to reach, recruit, and
engage candidates in an authentic way.
… with the focus on college students and recent grads.
One of thirty innovative
companies changing the
world.
2
Talking about Choices
● Where we focus effort
○ Event Collection & Data Refinement
● Tech stack
○ Segment, Redshift, dbt, Periscope
● Warehouse table design
○ ELT, layers & abstractions
● We’re Hiring!
4
Data Sources
5
Why We Warehouse
Support Business Analytics and Product (Data Science)
● Clean, Normalized Tables
● Abstract over Changes in Systems
● Right Type of Domain Knowledge
6
Data Reflects
the World
Decisions & Products
Reflect the World
Tech Stack for Analytics
● Segment
● Amazon Redshift
● S3/Spectrum
● dbt
● Periscope +
targeted tools
7
Avoid Vendor Lock-in;
Design to Minimize
Downstream Impact
Event Tracking
● Heap approach
○ Developers don’t make choices
○ Automatically get every load and click
○ UI changes can lose continuity
● Traditional approach
○ Developers choose what to track
○ Can miss stuff -- requires communication!
○ Can keep semantic continuity across
changes
○ Less lock-in
8
“Actions with
Meaning”
Redshift and Spectrum
● Value of familiarity, broad support
● Sweet spot in scale, room to grow
● Spectrum
○ External tables on S3 CSV
○ Query and join like internal tables
○ Avoid or delay loading until needed
○ Use Transform tools to load
9
The ELT Pattern
● “Data Lake” in columnar database
● Piped in via Segment data loader
● Transform on-database vs. in-transit
● Requires compute power, but space is cheap
● Can be more agile, “schema on read”
10
“most data transformation use cases can be much more
effectively handled in-database rather than in some
external processing layer” -dbt
dbt
11
https://blog.fishtownanalytics.com/what-exactly-is-dbt-47ba57309068
Abstraction Layers
12
Raw
LookupStaging
Analytics
Reporting Product
(Spectrum)
Dimension Tables and Activity Streams
13
hist_user
now_userdim_user
act_
user
actor
ts
action
object
ob_type
properties
Alice
Nov 2nd
viewed
sales-123
listing
{ pos: 3 }
fact table with
specific, consistent
structure
(see WeWork talk!)
What We’ve Learned; Where We’re Going
● Pay close attention to
what you store, and
how you refine data
● Tools now are amazing
● Design with empathy
and creativity
14
● Grow it with the
business!
● Build insights &
products to help our
users and customers!
Thanks!
Data Scientist (Recommender Systems)
Data Engineer (this stuff!)
FS & BE Engineers (Python)
harlan@wayup.com, @harlanh
We’re Hiring!

Contenu connexe

Tendances

Creating stunning data analytics dashboard using php and flex
Creating stunning data analytics dashboard using php and flexCreating stunning data analytics dashboard using php and flex
Creating stunning data analytics dashboard using php and flex
10n Software, LLC
 
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
Romeo Kienzler
 

Tendances (20)

Data warehousing
Data warehousingData warehousing
Data warehousing
 
Introduction to Big Data & Hadoop
Introduction to Big Data & Hadoop Introduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
 
SFScon19 - Grazia Cazzin - KNOWAGE the open source answer to the new needs in...
SFScon19 - Grazia Cazzin - KNOWAGE the open source answer to the new needs in...SFScon19 - Grazia Cazzin - KNOWAGE the open source answer to the new needs in...
SFScon19 - Grazia Cazzin - KNOWAGE the open source answer to the new needs in...
 
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDB
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDBMongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDB
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDB
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Thilga
ThilgaThilga
Thilga
 
Big data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantBig data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You Want
 
Creating stunning data analytics dashboard using php and flex
Creating stunning data analytics dashboard using php and flexCreating stunning data analytics dashboard using php and flex
Creating stunning data analytics dashboard using php and flex
 
DataTables view CKAN monthly live
DataTables view   CKAN monthly liveDataTables view   CKAN monthly live
DataTables view CKAN monthly live
 
Big Data: Improving capacity utilization of transport companies
Big Data: Improving capacity utilization of transport companiesBig Data: Improving capacity utilization of transport companies
Big Data: Improving capacity utilization of transport companies
 
"Interactive Deep Analytics" Dashboard
"Interactive Deep Analytics" Dashboard"Interactive Deep Analytics" Dashboard
"Interactive Deep Analytics" Dashboard
 
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
 
How to build a data stack from scratch
How to build a data stack from scratchHow to build a data stack from scratch
How to build a data stack from scratch
 
Big data - Cassandra
Big data - CassandraBig data - Cassandra
Big data - Cassandra
 
Tor Hovland: Taking a swim in the big data lake
Tor Hovland: Taking a swim in the big data lakeTor Hovland: Taking a swim in the big data lake
Tor Hovland: Taking a swim in the big data lake
 
Business intelligence
Business intelligence Business intelligence
Business intelligence
 
Design | expose ap is with cqrs
Design | expose ap is with cqrsDesign | expose ap is with cqrs
Design | expose ap is with cqrs
 
ODA Use-Case: XaitPorter Appliance
ODA Use-Case: XaitPorter ApplianceODA Use-Case: XaitPorter Appliance
ODA Use-Case: XaitPorter Appliance
 
Design | expose ap is with cqr
Design | expose ap is with cqrDesign | expose ap is with cqr
Design | expose ap is with cqr
 
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
 

Similaire à Collecting and Making Sense of Diverse Data at WayUp

Simplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
Simplified Machine Learning, Text, and Graph Analytics with Pivotal GreenplumSimplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
Simplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
VMware Tanzu
 
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabasePowering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Kinetica
 

Similaire à Collecting and Making Sense of Diverse Data at WayUp (20)

LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
Enabling Your Data Science Team with Modern Data Engineering
Enabling Your Data Science Team with Modern Data EngineeringEnabling Your Data Science Team with Modern Data Engineering
Enabling Your Data Science Team with Modern Data Engineering
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data Stack
 
Database Shootout: What's best for BI?
Database Shootout: What's best for BI?Database Shootout: What's best for BI?
Database Shootout: What's best for BI?
 
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile ApproachUsing OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
 
Ledingkart Meetup #4: Data pipeline @ lk
Ledingkart Meetup #4: Data pipeline @ lkLedingkart Meetup #4: Data pipeline @ lk
Ledingkart Meetup #4: Data pipeline @ lk
 
Simplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
Simplified Machine Learning, Text, and Graph Analytics with Pivotal GreenplumSimplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
Simplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabasePowering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
 
Achieve data democracy in data lake with data integration
Achieve data democracy in data lake with data integration Achieve data democracy in data lake with data integration
Achieve data democracy in data lake with data integration
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
 
the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World Distilled
 
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and UncertaintyAgile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
 
Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
How to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st centuryHow to build and run a big data platform in the 21st century
How to build and run a big data platform in the 21st century
 

Dernier

Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
HyderabadDolls
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 

Dernier (20)

TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 

Collecting and Making Sense of Diverse Data at WayUp

  • 1. Collecting and Making Sense of Diverse Data at WayUp Harlan D. Harris, PhD Director of Data Science DataEngConf 2017 Thanks to: JJ Fliegelman (CTO) WayUp Engineers!
  • 2. Why we built WayUp... The leading digital platform for employers to reach, recruit, and engage candidates in an authentic way. … with the focus on college students and recent grads. One of thirty innovative companies changing the world. 2
  • 3.
  • 4. Talking about Choices ● Where we focus effort ○ Event Collection & Data Refinement ● Tech stack ○ Segment, Redshift, dbt, Periscope ● Warehouse table design ○ ELT, layers & abstractions ● We’re Hiring! 4
  • 6. Why We Warehouse Support Business Analytics and Product (Data Science) ● Clean, Normalized Tables ● Abstract over Changes in Systems ● Right Type of Domain Knowledge 6 Data Reflects the World Decisions & Products Reflect the World
  • 7. Tech Stack for Analytics ● Segment ● Amazon Redshift ● S3/Spectrum ● dbt ● Periscope + targeted tools 7 Avoid Vendor Lock-in; Design to Minimize Downstream Impact
  • 8. Event Tracking ● Heap approach ○ Developers don’t make choices ○ Automatically get every load and click ○ UI changes can lose continuity ● Traditional approach ○ Developers choose what to track ○ Can miss stuff -- requires communication! ○ Can keep semantic continuity across changes ○ Less lock-in 8 “Actions with Meaning”
  • 9. Redshift and Spectrum ● Value of familiarity, broad support ● Sweet spot in scale, room to grow ● Spectrum ○ External tables on S3 CSV ○ Query and join like internal tables ○ Avoid or delay loading until needed ○ Use Transform tools to load 9
  • 10. The ELT Pattern ● “Data Lake” in columnar database ● Piped in via Segment data loader ● Transform on-database vs. in-transit ● Requires compute power, but space is cheap ● Can be more agile, “schema on read” 10 “most data transformation use cases can be much more effectively handled in-database rather than in some external processing layer” -dbt
  • 13. Dimension Tables and Activity Streams 13 hist_user now_userdim_user act_ user actor ts action object ob_type properties Alice Nov 2nd viewed sales-123 listing { pos: 3 } fact table with specific, consistent structure (see WeWork talk!)
  • 14. What We’ve Learned; Where We’re Going ● Pay close attention to what you store, and how you refine data ● Tools now are amazing ● Design with empathy and creativity 14 ● Grow it with the business! ● Build insights & products to help our users and customers!
  • 15. Thanks! Data Scientist (Recommender Systems) Data Engineer (this stuff!) FS & BE Engineers (Python) harlan@wayup.com, @harlanh We’re Hiring!