SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
DATA DEMOCRATIZATION @ SOUNDCLOUD
October, 29th 2013
HI, I’M BRUNO
SOUNDCLOUD IS THE WORLD’S
LEADING AUDIO PLATFORM
Every minute, creators upload


12hrs 

of audio

reaching over 


200m
people every month



!

8% 

of the internet

PRESIDENT OBAMA

FOO FIGHTERS

SNOOP LION

MADONNA

SKRILLEX

MACKLEMORE

JOHN OLIVER˝
(DAILY SHOW/BUGLE)
what gets

listened to

where?

how much revenue do

we make in Brasil?

how many new users

did we get from that

campaign?
what makes a

sound successfull?

how do users use

our iOS and

Android apps?
do comments on

tracks correlate

with listens?

did the product

change affect

feature x?

what makes an

artist successfull?
DATA DEMOCRATIZATION

• Avoid Silos

• Remove unnecessary restrictions

• Provide simple tools

• Teach People how to use data
DATA DEMOCRATIZATION

In one sentence:
Deliver the right information to the
right person at the right time.
DATA ANALYSIS AND REPORTING
2010-2012
PRODUCTION DB

ANALYTICS DB
DATA ANALYSIS AND REPORTING

Listens

Sounds

Users

Comments

Favorites

Shares

Reposts

Impressions

Clicks

Conversions

Suggestions

Downloads

Taggings

Uploads
DATA ANALYSIS AND REPORTING

Listens

timestamp

duration

sound

owner

listener

API-key

(location)

country
DATA ANALYSIS AND REPORTING
additional metadata:

• location within sound

• context (location on site)

• segmentation
Listening creates >6000 events/s

BIG DATA
HADOOP TO THE RESCUE

2 Datacenter in AMS

200+ Nodes
HADOOP TO THE RESCUE

listen data

listen metadata

search data

recommender data

product testing data

backend production data

backend logs
HADOOP AND DATA DEMOCRATIZATION

Data is siloed on hadoop

Data governance not existing

Technical hurdles for access

Not realtime

Slow access
AMAZON REDSHIFT
Fast fully managed DW service

Optimized for petabyte or more
datasets

Fast query and I/O performance

Columnar storage technology
BI INFRASTRUCTURE
Source Systems

2013
Staging Area

DataWarehouse

Data Exploration

Amazon EMR
Hadoop

Pig/Ruby Scripts

COPY
MySql
(production db)

Pig/Ruby Scripts

External Systems

...

ETL Scripts
Job execution powered by:

ETL Scripts
IMPORT DATA FROM SOURCE SYSTEMS
First: Gather data from the several source systems into S3


Hadoop

Full/Daily Imports
MySql
(production db)

External Systems

MapReduce for: 	

- Listens 	

- Plays	

- Impressions	

- Affiliations	

- ...
IMPORT DATA FROM SOURCE SYSTEMS
Second: Rebuild staging area tables for full imports

Based on configuration files

!
tracks

users

client 	

applications

Create statements generated

...

!

Re-create DISTKEYS and SORTKEYS



Full control in changes in the data
model


Staging Area

!
yaml config files
IMPORT DATA FROM SOURCE SYSTEMS
Third: Import the data from S3 to RedShift


tracks

Full import: TRUNCATE & COPY	

Daily import: COPY	


users

Staging Area

client 	

applications

...
ETL AND DW DATAMODEL
!

ETL scripts divided into layers:

!

- Layer 1: Staging -> DW (dimensions)

- Layer 2: Staging -> DW (fact tables - raw data)

- Layer 3: DW -> DW (aggregated fact tables)

- Layer 4: DW -> Reporting Data Cubes (reporting data)

!
ETL AND DW DATAMODEL
DataWarehouse
ETL Layer 1 & 2

ETL Layer 3

ETL Layer 4

Data Exploration

Staging Area

Data Cleaning	

Data Transformation	


Data Presentation	


!

!

SQL

Ruby/SQL Scripts
Data Aggregation	


!

Ruby/SQL Scripts
JOB SCHEDULE AND EXECUTION
Job-scheduling tool developed
internally

Set dependencies between jobs

Execution in multiple machines

Supports all the ETL layers
DATA EXPLORATION
Simple and fast access to data

More time for “deep dives” into
data

Individualized Reporting

Allows interactivity between users

Integrated with RedShift
TIMELINE
Week 2
•
•

Week 4

Gap Analysis˝
Business Exploration
(requirements
interviews)

Week 6

Week 8

Week 10

Week 12

Week 14

Week 16

Requirement Analysis˝

•
•

Information Mapping
Design˝
Solution Design (Draft)

End of Analysis Stage˝

•
•

Define Infrastructure˝
Design Data Model

Infrastructure Ready!˝

•
•
•

Build ETL ˝
Build Data Cubes˝
Design Reports/Dashboards (Presentation
Layer)

BI 1.0 is built!˝

•
•

System/Integration
Tests ˝
User Acceptance
BI 1.0 is tested!˝

•
•

User Workshops˝
BI 1.0 Evaluation

BI 1.0 is ready
to use!˝

Milestones˝

Analysis Stage˝

Design & Build˝

Test & Deploy
DATA DEMOCRATIZATION
• Reports designed by end users

• Central repository for data analysis

• User interaction

• Data from one source only

• Scalable solution

• Data to the people!
what gets

listened to

where?

how much revenue do

we make in Brasil?

how many new users

did we get from that

campaign?
what makes a

sound successfull?

how do users use

our iOS and

Android apps?
do comments on

tracks correlate

with listens?

did the product

change affect

feature x?

what makes an

artist successfull?
!

QUESTIONS?
THANK YOU!
P.S. WE’RE HIRING.

SOUNDCLOUD.COM/JOBS

Contenu connexe

En vedette

Databases and agile development - Dwight Merriman (MongoDB)
Databases and agile development - Dwight Merriman (MongoDB)Databases and agile development - Dwight Merriman (MongoDB)
Databases and agile development - Dwight Merriman (MongoDB)jaxLondonConference
 
Design is a Process, not an Artefact - Trisha Gee (MongoDB)
Design is a Process, not an Artefact - Trisha Gee (MongoDB)Design is a Process, not an Artefact - Trisha Gee (MongoDB)
Design is a Process, not an Artefact - Trisha Gee (MongoDB)jaxLondonConference
 
45 second video proposal
45 second video proposal45 second video proposal
45 second video proposalNicole174
 
Are Hypermedia APIs Just Hype? - Aaron Phethean (Temenos) & Daniel Feist (Mul...
Are Hypermedia APIs Just Hype? - Aaron Phethean (Temenos) & Daniel Feist (Mul...Are Hypermedia APIs Just Hype? - Aaron Phethean (Temenos) & Daniel Feist (Mul...
Are Hypermedia APIs Just Hype? - Aaron Phethean (Temenos) & Daniel Feist (Mul...jaxLondonConference
 
Packed Objects: Fast Talking Java Meets Native Code - Steve Poole (IBM)
Packed Objects: Fast Talking Java Meets Native Code - Steve Poole (IBM)Packed Objects: Fast Talking Java Meets Native Code - Steve Poole (IBM)
Packed Objects: Fast Talking Java Meets Native Code - Steve Poole (IBM)jaxLondonConference
 
How Windows 10 will change the way we use devices
How Windows 10 will change the way we use devicesHow Windows 10 will change the way we use devices
How Windows 10 will change the way we use devicesCommelius Solutions
 
Lambda Expressions: Myths and Mistakes - Richard Warburton (jClarity)
Lambda Expressions: Myths and Mistakes - Richard Warburton (jClarity)Lambda Expressions: Myths and Mistakes - Richard Warburton (jClarity)
Lambda Expressions: Myths and Mistakes - Richard Warburton (jClarity)jaxLondonConference
 
Legal and ethical considerations redone
Legal and ethical considerations   redoneLegal and ethical considerations   redone
Legal and ethical considerations redoneNicole174
 
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...jaxLondonConference
 
Little words of wisdom for the developer - Guillaume Laforge (Pivotal)
Little words of wisdom for the developer - Guillaume Laforge (Pivotal)Little words of wisdom for the developer - Guillaume Laforge (Pivotal)
Little words of wisdom for the developer - Guillaume Laforge (Pivotal)jaxLondonConference
 
The state of the art biorepository at ILRI
The state of the art biorepository at ILRIThe state of the art biorepository at ILRI
The state of the art biorepository at ILRIAbsolomon Kihara
 
What You Need to Know About Lambdas - Jamie Allen (Typesafe)
What You Need to Know About Lambdas - Jamie Allen (Typesafe)What You Need to Know About Lambdas - Jamie Allen (Typesafe)
What You Need to Know About Lambdas - Jamie Allen (Typesafe)jaxLondonConference
 
Interactive media applications
Interactive media applicationsInteractive media applications
Interactive media applicationsNicole174
 
Practical Performance: Understand the Performance of Your Application - Chris...
Practical Performance: Understand the Performance of Your Application - Chris...Practical Performance: Understand the Performance of Your Application - Chris...
Practical Performance: Understand the Performance of Your Application - Chris...jaxLondonConference
 
Interactive media applications
Interactive media applicationsInteractive media applications
Interactive media applicationsNicole174
 
Introducing Vert.x 2.0 - Taking polyglot application development to the next ...
Introducing Vert.x 2.0 - Taking polyglot application development to the next ...Introducing Vert.x 2.0 - Taking polyglot application development to the next ...
Introducing Vert.x 2.0 - Taking polyglot application development to the next ...jaxLondonConference
 
Scaling Scala to the database - Stefan Zeiger (Typesafe)
Scaling Scala to the database - Stefan Zeiger (Typesafe)Scaling Scala to the database - Stefan Zeiger (Typesafe)
Scaling Scala to the database - Stefan Zeiger (Typesafe)jaxLondonConference
 
What makes Groovy Groovy - Guillaume Laforge (Pivotal)
What makes Groovy Groovy  - Guillaume Laforge (Pivotal)What makes Groovy Groovy  - Guillaume Laforge (Pivotal)
What makes Groovy Groovy - Guillaume Laforge (Pivotal)jaxLondonConference
 

En vedette (19)

Databases and agile development - Dwight Merriman (MongoDB)
Databases and agile development - Dwight Merriman (MongoDB)Databases and agile development - Dwight Merriman (MongoDB)
Databases and agile development - Dwight Merriman (MongoDB)
 
Design is a Process, not an Artefact - Trisha Gee (MongoDB)
Design is a Process, not an Artefact - Trisha Gee (MongoDB)Design is a Process, not an Artefact - Trisha Gee (MongoDB)
Design is a Process, not an Artefact - Trisha Gee (MongoDB)
 
45 second video proposal
45 second video proposal45 second video proposal
45 second video proposal
 
Are Hypermedia APIs Just Hype? - Aaron Phethean (Temenos) & Daniel Feist (Mul...
Are Hypermedia APIs Just Hype? - Aaron Phethean (Temenos) & Daniel Feist (Mul...Are Hypermedia APIs Just Hype? - Aaron Phethean (Temenos) & Daniel Feist (Mul...
Are Hypermedia APIs Just Hype? - Aaron Phethean (Temenos) & Daniel Feist (Mul...
 
Packed Objects: Fast Talking Java Meets Native Code - Steve Poole (IBM)
Packed Objects: Fast Talking Java Meets Native Code - Steve Poole (IBM)Packed Objects: Fast Talking Java Meets Native Code - Steve Poole (IBM)
Packed Objects: Fast Talking Java Meets Native Code - Steve Poole (IBM)
 
How Windows 10 will change the way we use devices
How Windows 10 will change the way we use devicesHow Windows 10 will change the way we use devices
How Windows 10 will change the way we use devices
 
Lambda Expressions: Myths and Mistakes - Richard Warburton (jClarity)
Lambda Expressions: Myths and Mistakes - Richard Warburton (jClarity)Lambda Expressions: Myths and Mistakes - Richard Warburton (jClarity)
Lambda Expressions: Myths and Mistakes - Richard Warburton (jClarity)
 
Legal and ethical considerations redone
Legal and ethical considerations   redoneLegal and ethical considerations   redone
Legal and ethical considerations redone
 
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
 
Little words of wisdom for the developer - Guillaume Laforge (Pivotal)
Little words of wisdom for the developer - Guillaume Laforge (Pivotal)Little words of wisdom for the developer - Guillaume Laforge (Pivotal)
Little words of wisdom for the developer - Guillaume Laforge (Pivotal)
 
The state of the art biorepository at ILRI
The state of the art biorepository at ILRIThe state of the art biorepository at ILRI
The state of the art biorepository at ILRI
 
What You Need to Know About Lambdas - Jamie Allen (Typesafe)
What You Need to Know About Lambdas - Jamie Allen (Typesafe)What You Need to Know About Lambdas - Jamie Allen (Typesafe)
What You Need to Know About Lambdas - Jamie Allen (Typesafe)
 
Interactive media applications
Interactive media applicationsInteractive media applications
Interactive media applications
 
Practical Performance: Understand the Performance of Your Application - Chris...
Practical Performance: Understand the Performance of Your Application - Chris...Practical Performance: Understand the Performance of Your Application - Chris...
Practical Performance: Understand the Performance of Your Application - Chris...
 
Interactive media applications
Interactive media applicationsInteractive media applications
Interactive media applications
 
Why other ppl_dont_get_it
Why other ppl_dont_get_itWhy other ppl_dont_get_it
Why other ppl_dont_get_it
 
Introducing Vert.x 2.0 - Taking polyglot application development to the next ...
Introducing Vert.x 2.0 - Taking polyglot application development to the next ...Introducing Vert.x 2.0 - Taking polyglot application development to the next ...
Introducing Vert.x 2.0 - Taking polyglot application development to the next ...
 
Scaling Scala to the database - Stefan Zeiger (Typesafe)
Scaling Scala to the database - Stefan Zeiger (Typesafe)Scaling Scala to the database - Stefan Zeiger (Typesafe)
Scaling Scala to the database - Stefan Zeiger (Typesafe)
 
What makes Groovy Groovy - Guillaume Laforge (Pivotal)
What makes Groovy Groovy  - Guillaume Laforge (Pivotal)What makes Groovy Groovy  - Guillaume Laforge (Pivotal)
What makes Groovy Groovy - Guillaume Laforge (Pivotal)
 

Plus de jaxLondonConference

Conflict Free Replicated Data-types in Eventually Consistent Systems - Joel J...
Conflict Free Replicated Data-types in Eventually Consistent Systems - Joel J...Conflict Free Replicated Data-types in Eventually Consistent Systems - Joel J...
Conflict Free Replicated Data-types in Eventually Consistent Systems - Joel J...jaxLondonConference
 
JVM Support for Multitenant Applications - Steve Poole (IBM)
JVM Support for Multitenant Applications - Steve Poole (IBM)JVM Support for Multitenant Applications - Steve Poole (IBM)
JVM Support for Multitenant Applications - Steve Poole (IBM)jaxLondonConference
 
How Java got its Mojo Back - James Governor (Redmonk)
How Java got its Mojo Back - James Governor (Redmonk)					How Java got its Mojo Back - James Governor (Redmonk)
How Java got its Mojo Back - James Governor (Redmonk) jaxLondonConference
 
Java Testing With Spock - Ken Sipe (Trexin Consulting)
Java Testing With Spock - Ken Sipe (Trexin Consulting)Java Testing With Spock - Ken Sipe (Trexin Consulting)
Java Testing With Spock - Ken Sipe (Trexin Consulting)jaxLondonConference
 
The Java Virtual Machine is Over - The Polyglot VM is here - Marcus Lagergren...
The Java Virtual Machine is Over - The Polyglot VM is here - Marcus Lagergren...The Java Virtual Machine is Over - The Polyglot VM is here - Marcus Lagergren...
The Java Virtual Machine is Over - The Polyglot VM is here - Marcus Lagergren...jaxLondonConference
 
Java EE 7 Platform: Boosting Productivity and Embracing HTML5 - Arun Gupta (R...
Java EE 7 Platform: Boosting Productivity and Embracing HTML5 - Arun Gupta (R...Java EE 7 Platform: Boosting Productivity and Embracing HTML5 - Arun Gupta (R...
Java EE 7 Platform: Boosting Productivity and Embracing HTML5 - Arun Gupta (R...jaxLondonConference
 
Exploring the Talend unified Big Data toolset for sentiment analysis - Ben Br...
Exploring the Talend unified Big Data toolset for sentiment analysis - Ben Br...Exploring the Talend unified Big Data toolset for sentiment analysis - Ben Br...
Exploring the Talend unified Big Data toolset for sentiment analysis - Ben Br...jaxLondonConference
 
The Curious Clojurist - Neal Ford (Thoughtworks)
The Curious Clojurist - Neal Ford (Thoughtworks)The Curious Clojurist - Neal Ford (Thoughtworks)
The Curious Clojurist - Neal Ford (Thoughtworks)jaxLondonConference
 
Run Your Java Code on Cloud Foundry - Andy Piper (Pivotal)
Run Your Java Code on Cloud Foundry - Andy Piper (Pivotal)Run Your Java Code on Cloud Foundry - Andy Piper (Pivotal)
Run Your Java Code on Cloud Foundry - Andy Piper (Pivotal)jaxLondonConference
 
Put your Java apps to sleep? Find out how - John Matthew Holt (Waratek)
Put your Java apps to sleep? Find out how - John Matthew Holt (Waratek)Put your Java apps to sleep? Find out how - John Matthew Holt (Waratek)
Put your Java apps to sleep? Find out how - John Matthew Holt (Waratek)jaxLondonConference
 
Project Lambda: Functional Programming Constructs in Java - Simon Ritter (Ora...
Project Lambda: Functional Programming Constructs in Java - Simon Ritter (Ora...Project Lambda: Functional Programming Constructs in Java - Simon Ritter (Ora...
Project Lambda: Functional Programming Constructs in Java - Simon Ritter (Ora...jaxLondonConference
 
Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...
Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...
Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...jaxLondonConference
 
Large scale, interactive ad-hoc queries over different datastores with Apache...
Large scale, interactive ad-hoc queries over different datastores with Apache...Large scale, interactive ad-hoc queries over different datastores with Apache...
Large scale, interactive ad-hoc queries over different datastores with Apache...jaxLondonConference
 
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...jaxLondonConference
 

Plus de jaxLondonConference (15)

Conflict Free Replicated Data-types in Eventually Consistent Systems - Joel J...
Conflict Free Replicated Data-types in Eventually Consistent Systems - Joel J...Conflict Free Replicated Data-types in Eventually Consistent Systems - Joel J...
Conflict Free Replicated Data-types in Eventually Consistent Systems - Joel J...
 
JVM Support for Multitenant Applications - Steve Poole (IBM)
JVM Support for Multitenant Applications - Steve Poole (IBM)JVM Support for Multitenant Applications - Steve Poole (IBM)
JVM Support for Multitenant Applications - Steve Poole (IBM)
 
How Java got its Mojo Back - James Governor (Redmonk)
How Java got its Mojo Back - James Governor (Redmonk)					How Java got its Mojo Back - James Governor (Redmonk)
How Java got its Mojo Back - James Governor (Redmonk)
 
Java Testing With Spock - Ken Sipe (Trexin Consulting)
Java Testing With Spock - Ken Sipe (Trexin Consulting)Java Testing With Spock - Ken Sipe (Trexin Consulting)
Java Testing With Spock - Ken Sipe (Trexin Consulting)
 
The Java Virtual Machine is Over - The Polyglot VM is here - Marcus Lagergren...
The Java Virtual Machine is Over - The Polyglot VM is here - Marcus Lagergren...The Java Virtual Machine is Over - The Polyglot VM is here - Marcus Lagergren...
The Java Virtual Machine is Over - The Polyglot VM is here - Marcus Lagergren...
 
Java EE 7 Platform: Boosting Productivity and Embracing HTML5 - Arun Gupta (R...
Java EE 7 Platform: Boosting Productivity and Embracing HTML5 - Arun Gupta (R...Java EE 7 Platform: Boosting Productivity and Embracing HTML5 - Arun Gupta (R...
Java EE 7 Platform: Boosting Productivity and Embracing HTML5 - Arun Gupta (R...
 
Exploring the Talend unified Big Data toolset for sentiment analysis - Ben Br...
Exploring the Talend unified Big Data toolset for sentiment analysis - Ben Br...Exploring the Talend unified Big Data toolset for sentiment analysis - Ben Br...
Exploring the Talend unified Big Data toolset for sentiment analysis - Ben Br...
 
The Curious Clojurist - Neal Ford (Thoughtworks)
The Curious Clojurist - Neal Ford (Thoughtworks)The Curious Clojurist - Neal Ford (Thoughtworks)
The Curious Clojurist - Neal Ford (Thoughtworks)
 
TDD at scale - Mash Badar (UBS)
TDD at scale - Mash Badar (UBS)TDD at scale - Mash Badar (UBS)
TDD at scale - Mash Badar (UBS)
 
Run Your Java Code on Cloud Foundry - Andy Piper (Pivotal)
Run Your Java Code on Cloud Foundry - Andy Piper (Pivotal)Run Your Java Code on Cloud Foundry - Andy Piper (Pivotal)
Run Your Java Code on Cloud Foundry - Andy Piper (Pivotal)
 
Put your Java apps to sleep? Find out how - John Matthew Holt (Waratek)
Put your Java apps to sleep? Find out how - John Matthew Holt (Waratek)Put your Java apps to sleep? Find out how - John Matthew Holt (Waratek)
Put your Java apps to sleep? Find out how - John Matthew Holt (Waratek)
 
Project Lambda: Functional Programming Constructs in Java - Simon Ritter (Ora...
Project Lambda: Functional Programming Constructs in Java - Simon Ritter (Ora...Project Lambda: Functional Programming Constructs in Java - Simon Ritter (Ora...
Project Lambda: Functional Programming Constructs in Java - Simon Ritter (Ora...
 
Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...
Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...
Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...
 
Large scale, interactive ad-hoc queries over different datastores with Apache...
Large scale, interactive ad-hoc queries over different datastores with Apache...Large scale, interactive ad-hoc queries over different datastores with Apache...
Large scale, interactive ad-hoc queries over different datastores with Apache...
 
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim...
 

Dernier

Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sectoritnewsafrica
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 

Dernier (20)

Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 

Data Democratization at Soundcloud - Bruno Sá (SoundCloud)

  • 1. DATA DEMOCRATIZATION @ SOUNDCLOUD October, 29th 2013
  • 3. SOUNDCLOUD IS THE WORLD’S LEADING AUDIO PLATFORM
  • 4. Every minute, creators upload
 12hrs 
 of audio

  • 6. ! 8% 
 of the internet

  • 7.
  • 8. PRESIDENT OBAMA FOO FIGHTERS SNOOP LION MADONNA SKRILLEX MACKLEMORE JOHN OLIVER˝ (DAILY SHOW/BUGLE)
  • 9.
  • 10. what gets
 listened to
 where? how much revenue do
 we make in Brasil? how many new users
 did we get from that
 campaign? what makes a
 sound successfull? how do users use
 our iOS and
 Android apps? do comments on
 tracks correlate
 with listens? did the product
 change affect
 feature x? what makes an
 artist successfull?
  • 11. DATA DEMOCRATIZATION • Avoid Silos
 • Remove unnecessary restrictions
 • Provide simple tools
 • Teach People how to use data
  • 12. DATA DEMOCRATIZATION In one sentence: Deliver the right information to the right person at the right time.
  • 13. DATA ANALYSIS AND REPORTING 2010-2012 PRODUCTION DB ANALYTICS DB
  • 14. DATA ANALYSIS AND REPORTING Listens
 Sounds
 Users
 Comments
 Favorites
 Shares
 Reposts Impressions
 Clicks
 Conversions
 Suggestions
 Downloads
 Taggings
 Uploads
  • 15. DATA ANALYSIS AND REPORTING Listens timestamp
 duration
 sound
 owner
 listener
 API-key
 (location)
 country
  • 16. DATA ANALYSIS AND REPORTING additional metadata:
 • location within sound
 • context (location on site)
 • segmentation Listening creates >6000 events/s BIG DATA
  • 17. HADOOP TO THE RESCUE 2 Datacenter in AMS
 200+ Nodes
  • 18. HADOOP TO THE RESCUE listen data
 listen metadata
 search data
 recommender data
 product testing data
 backend production data
 backend logs
  • 19. HADOOP AND DATA DEMOCRATIZATION Data is siloed on hadoop
 Data governance not existing
 Technical hurdles for access
 Not realtime
 Slow access
  • 20. AMAZON REDSHIFT Fast fully managed DW service
 Optimized for petabyte or more datasets
 Fast query and I/O performance
 Columnar storage technology
  • 21. BI INFRASTRUCTURE Source Systems 2013 Staging Area DataWarehouse Data Exploration Amazon EMR Hadoop Pig/Ruby Scripts COPY MySql (production db) Pig/Ruby Scripts External Systems ... ETL Scripts Job execution powered by: ETL Scripts
  • 22. IMPORT DATA FROM SOURCE SYSTEMS First: Gather data from the several source systems into S3
 Hadoop Full/Daily Imports MySql (production db) External Systems MapReduce for: - Listens - Plays - Impressions - Affiliations - ...
  • 23. IMPORT DATA FROM SOURCE SYSTEMS Second: Rebuild staging area tables for full imports
 Based on configuration files
 ! tracks users client applications Create statements generated
 ... ! Re-create DISTKEYS and SORTKEYS
 
 Full control in changes in the data model
 Staging Area ! yaml config files
  • 24. IMPORT DATA FROM SOURCE SYSTEMS Third: Import the data from S3 to RedShift
 tracks Full import: TRUNCATE & COPY Daily import: COPY users Staging Area client applications ...
  • 25. ETL AND DW DATAMODEL ! ETL scripts divided into layers:
 ! - Layer 1: Staging -> DW (dimensions)
 - Layer 2: Staging -> DW (fact tables - raw data)
 - Layer 3: DW -> DW (aggregated fact tables)
 - Layer 4: DW -> Reporting Data Cubes (reporting data)
 !
  • 26. ETL AND DW DATAMODEL DataWarehouse ETL Layer 1 & 2 ETL Layer 3 ETL Layer 4 Data Exploration Staging Area Data Cleaning Data Transformation Data Presentation ! ! SQL Ruby/SQL Scripts Data Aggregation ! Ruby/SQL Scripts
  • 27. JOB SCHEDULE AND EXECUTION Job-scheduling tool developed internally
 Set dependencies between jobs
 Execution in multiple machines
 Supports all the ETL layers
  • 28. DATA EXPLORATION Simple and fast access to data
 More time for “deep dives” into data
 Individualized Reporting
 Allows interactivity between users
 Integrated with RedShift
  • 29. TIMELINE Week 2 • • Week 4 Gap Analysis˝ Business Exploration (requirements interviews) Week 6 Week 8 Week 10 Week 12 Week 14 Week 16 Requirement Analysis˝ • • Information Mapping Design˝ Solution Design (Draft) End of Analysis Stage˝ • • Define Infrastructure˝ Design Data Model Infrastructure Ready!˝ • • • Build ETL ˝ Build Data Cubes˝ Design Reports/Dashboards (Presentation Layer) BI 1.0 is built!˝ • • System/Integration Tests ˝ User Acceptance BI 1.0 is tested!˝ • • User Workshops˝ BI 1.0 Evaluation BI 1.0 is ready to use!˝ Milestones˝ Analysis Stage˝ Design & Build˝ Test & Deploy
  • 30. DATA DEMOCRATIZATION • Reports designed by end users
 • Central repository for data analysis
 • User interaction
 • Data from one source only
 • Scalable solution
 • Data to the people!
  • 31. what gets
 listened to
 where? how much revenue do
 we make in Brasil? how many new users
 did we get from that
 campaign? what makes a
 sound successfull? how do users use
 our iOS and
 Android apps? do comments on
 tracks correlate
 with listens? did the product
 change affect
 feature x? what makes an
 artist successfull?
  • 33. THANK YOU! P.S. WE’RE HIRING.
 SOUNDCLOUD.COM/JOBS