Evolution of Data at Nubank - Product.io Meetup 2019-01-29

Evolution of
Data at
Nubank
29/01/2019
André Tavares
Product Manager

• Biggest fintech
outside Asia
• 5 million+ credit
card customers
• 2.5 million+
NuContas
• 1300 employees

• 60 squads
• 200 microservices and
30 models in production
• 40 TB of data
processed every day
• 550 DAUs on data tools

Provide reliable and efficient
platform, services and stewardship
for Nubank to
make better decisions with data

• Company started in May 2013
• 10 employees by the end of the year
• Mostly engineers, no one directly
working with data
• No product yet
2013

No Product = No Data
• Getting to product-market fit
is priority #1
• You won’t even have that
much data to work with
until you get there
• Early stage startups are not
the right place to work as a
Data Product Manager
Learning 1

• First credit card transaction in April
2014
• Product launched for friends & family
• Manual credit approval
• From 10 to 35 employees, head of credit
and first 4 analysts hired
• 10,000 customers by the end of the year
2014

Credit is hard!
• Takes a long time for
credit decisions to be
evaluated (in our case,
several months)
• An incorrect policy could
cause the company to go
bankrupt before anyone
notices
Learning 2

• Product goes viral: from 10,000
customers to 400,000 in a single year!
• Surge in number of customers requires
very fast growth of customer service:
from 35 to 250 employees
• Business Analysts and Data Scientists
are now 10
• Data Science squad created
2015

• First policies built to predict how much
customers would spend and how likely
they are to pay back their cards
2015

Data itself is a product
• Do we have all the data we
need? Obtaining it is part of
the problem
• Is it complete? Correct? Of
good quality? Do we need
backfills?
• Need to follow all regulations
Learning 3

Failure: “We don’t
need SQL”
2015

• Hit a million customers during the year
• Finished the year with 400 employees
• 30 BAs and DSs
• Data Science squad is split up, with data people
working from various teams
• Some engineers start specializing in
data pipelines
2016

Centralized BI doesn’t
scale
• A central team can be
effective to establish
standards and best practices,
and to prioritize an
overwhelming number of
requests
• As the company grows, you
need to embed analytics into
each team to stay agile
Learning 4

• Model creation starts to become more
industrialized
• Automating key reports for the central
bank leads us to create our ETL and
our analytical environment
2016

ETL
• Extract: Data is extracted from the production
environment and sent to the analytical
environment
• Transform: Data is refined into cleaner and
easier to use datasets
• Load: Datasets are loaded into databases that
can be accessed by consumers
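
The three steps above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the record fields, the `customer_spend` table, and the in-memory SQLite database are hypothetical stand-ins, not Nubank's actual production or analytical environment.

```python
import sqlite3

# Hypothetical production records; the field names are illustrative only.
PRODUCTION_ROWS = [
    {"customer_id": 1, "amount_cents": 1250, "status": "settled"},
    {"customer_id": 1, "amount_cents": 800, "status": "settled"},
    {"customer_id": 2, "amount_cents": 4300, "status": "declined"},
    {"customer_id": 2, "amount_cents": 990, "status": "settled"},
]


def extract(source):
    """Extract: copy raw records out of the production source."""
    return [dict(row) for row in source]


def transform(rows):
    """Transform: keep settled purchases and total the spend per customer."""
    totals = {}
    for row in rows:
        if row["status"] != "settled":
            continue
        key = row["customer_id"]
        totals[key] = totals.get(key, 0) + row["amount_cents"]
    return sorted(totals.items())


def load(conn, dataset):
    """Load: write the dataset to a table that consumers can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_spend "
        "(customer_id INTEGER PRIMARY KEY, total_cents INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer_spend VALUES (?, ?)", dataset)
    conn.commit()


def run_etl(source, conn):
    load(conn, transform(extract(source)))


# An in-memory database stands in for the analytical environment.
conn = sqlite3.connect(":memory:")
run_etl(PRODUCTION_ROWS, conn)
result = conn.execute(
    "SELECT customer_id, total_cents FROM customer_spend ORDER BY customer_id"
).fetchall()
print(result)
```

Consumers only ever see `customer_spend`; the declined purchase never leaves the transform step.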

You need an ETL
• High latency, high
throughput
• Horizontally scalable
• High accessibility
• Heterogeneous data
• Pain on write
• Unified, global
Learning 5

• Over 3 million customers
• Launched our next two products:
Rewards and NuConta
• 700 employees
• 50 BAs and DSs
• Squad data infra
2017

• Structuring our data warehouse
• Dimensional modeling
• Batch models running on the ETL
• First BI tool: Metabase
2017

First BI tool: Metabase
• Open source, self-hosted
• Allows querying our data
warehouse (ETL results)
• Go-to tool for writing simple
queries and creating simple
dashboards
• Point and click interface
empowers users that don’t know
SQL

2017
Failure: Contribution
Margin Dataset

ETL Jobs
• Anyone in the company can
contribute ETL jobs by opening a PR
in our monorepo 
• Teams are responsible for writing
and maintaining their jobs
• Jobs are written in Scala (Spark SQL);
some DSLs are provided
• Use Databricks to iterate on logic
• Peer review to ensure quality and
consistency
• 100 contributors making 400+
contributions per month
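
One way to picture team-owned jobs living in a shared repository is sketched below. The real jobs are written in Scala; this Python version is only a hypothetical illustration. The `Job` class, its `owner` and `inputs` fields, the team names, and the idea of ordering jobs by their declared inputs are all assumptions, not a description of Nubank's tooling.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Job:
    name: str                # dataset this job produces
    owner: str               # team that maintains the job (hypothetical field)
    inputs: Tuple[str, ...]  # datasets this job reads
    run: Callable[[Dict[str, object]], object]


def settled_purchases(datasets):
    return [p for p in datasets["raw_purchases"] if p["status"] == "settled"]


def spend_per_customer(datasets):
    totals = {}
    for p in datasets["settled_purchases"]:
        totals[p["customer_id"]] = totals.get(p["customer_id"], 0) + p["amount_cents"]
    return totals


# Jobs contributed by different (hypothetical) teams, listed in no particular order.
JOBS = [
    Job("spend_per_customer", "credit", ("settled_purchases",), spend_per_customer),
    Job("settled_purchases", "payments", ("raw_purchases",), settled_purchases),
]


def execution_order(jobs):
    """Order jobs so that each one runs after the datasets it reads."""
    by_name = {job.name: job for job in jobs}
    graph = {job.name: set(job.inputs) for job in jobs}
    ordered = TopologicalSorter(graph).static_order()
    # Raw inputs appear in the graph too; keep only names that are jobs.
    return [name for name in ordered if name in by_name]


def run_all(jobs, raw_datasets):
    by_name = {job.name: job for job in jobs}
    datasets = dict(raw_datasets)
    for name in execution_order(jobs):
        datasets[name] = by_name[name].run(datasets)
    return datasets


RAW = {
    "raw_purchases": [
        {"customer_id": 1, "amount_cents": 1250, "status": "settled"},
        {"customer_id": 1, "amount_cents": 800, "status": "settled"},
        {"customer_id": 2, "amount_cents": 4300, "status": "declined"},
        {"customer_id": 2, "amount_cents": 990, "status": "settled"},
    ]
}
datasets = run_all(JOBS, RAW)
print(execution_order(JOBS))
print(datasets["spend_per_customer"])
```

Because each job declares what it reads, a team can add a dataset without coordinating the run order with every other team.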

Focus on the Platform
Problem: Data team creating
datasets (tables) for the
entire company
• Lack of context
• Hard to prioritize among
various teams
• Becoming a bottleneck
Learning 6

Solution: Empower vertical
teams to own dataset
creation
• Focus on tooling,
training and support
• Remove
interdependencies
Learning 6

• Over 5 million customers
• Launched debit cards
• 1200 employees
• 90 BAs and DSs
• Squad data infra in Berlin office, squad
data access in São Paulo office
2018

• Models starting to pop up in several areas
of the company
2018

Data Services
Trainings: Weekly trainings on SQL,
Python or Scala, new employee
onboarding, new tool rollout
Support: Dedicated Slack support
channels; community of users supporting
each other
Meetings: Forums for sharing data
scientist and analyst work, monthly
meetings to discuss the state of data
Data Analysts: Function focused on
improving data usage in the company
(not SQL slaves!)

Invest in your people
Learning 7
• Training employees is not
only HR’s job
• Proactive investment in
training can avoid reactive
support work
• Sometimes the problem is
behavioral, not technological

Failure: Moving users to a
new BI tool too fast
2018

Building is not enough
• Internal launches are also
launches
• You need training and
support
• Do the benefits of your new
internal product outweigh
the switching costs?
Learning 8

• Future: tens of millions of customers
• Thousands of employees
• Hundreds of analysts, dozens of data
scientists
• Growing data org
2019
and beyond

• Things we’ll work on:
• New data protection law
• Giving employees even more data
ownership
• Data Portal
• New Data Warehouse
• Infra refactors to better support new
products and refactors
2019
and beyond

No Product = No Data
Credit is hard!
Data itself is a product
Centralized BI doesn’t
scale
You need an ETL
Invest in your people
Building is not enough

Interested in working
with us?
sou.nu/jobs-at-nubank
