Nubank is using machine learning, analytics and data engineering to disrupt the financial industry in Brazil. With over 5 million customers, it's already the biggest fintech outside Asia.
In this presentation I go over the company's main lessons in the data space since it was founded 5 years ago. What did the company look like in each of those years? What mistakes did we make? What lessons did we learn?
This deck was presented at the São Paulo Product.io Meetup.
8. • Company founded in May 2013
• 10 employees by the end of the year
• Mostly engineers, no one working directly with data
• No product yet
2013
9. No Product = No Data
• Getting to product-market fit is priority #1
• You won’t even have that much data to work with until you get there
• Early-stage startups are not the right place to work as a Data Product Manager
Learning 1
11. • First credit card transaction in April 2014
• Product launched for friends & family
• Manual credit approval
• From 10 to 35 employees; head of credit and first 4 analysts hired
• 10,000 customers by the end of the year
2014
12. Credit is hard!
• It takes a long time to evaluate credit decisions (in our case, several months)
• An incorrect policy could cause the company to go bankrupt before anyone notices
Learning 2
14. • Product goes viral: from 10,000 customers to 400,000 in a single year!
• Surge in the number of customers requires very fast growth of customer service: from 35 to 250 employees
• Business analysts and data scientists now number 10
• Squad data science created
2015
15. • First policies built to predict how much customers would spend and how likely they were to pay their card bills
2015
16. Data itself is a product
• Do we have all the data we need? Obtaining it is part of the problem
• Is it complete? Correct? Of good quality? Do we need backfills?
• We need to comply with all regulations
Learning 3
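The questions on this slide (complete? correct? backfills needed?) can be turned into simple automated checks. What follows is only a minimal sketch under stated assumptions: the row layout, the field names (`customer_id`, `amount`, `day`) and the `quality_report` helper are hypothetical and are not taken from the talk.

```python
from datetime import date, timedelta


def quality_report(rows, required_fields, date_field, start, end):
    """Count incomplete rows and list the days with no data at all.

    Hypothetical helper: rows are plain dicts, and a day with no rows
    between start and end (inclusive) is treated as a backfill candidate.
    """
    incomplete = [
        r for r in rows if any(r.get(f) is None for f in required_fields)
    ]
    seen_days = {r[date_field] for r in rows if r.get(date_field) is not None}
    expected_days = [
        start + timedelta(days=i) for i in range((end - start).days + 1)
    ]
    missing_days = [d for d in expected_days if d not in seen_days]
    return {
        "total_rows": len(rows),
        "incomplete_rows": len(incomplete),
        "missing_days": missing_days,
    }


# Made-up sample data, for illustration only.
rows = [
    {"customer_id": 1, "amount": 120.0, "day": date(2015, 3, 1)},
    {"customer_id": 2, "amount": None, "day": date(2015, 3, 2)},
    {"customer_id": 3, "amount": 75.5, "day": date(2015, 3, 4)},
]
report = quality_report(
    rows, ["customer_id", "amount"], "day", date(2015, 3, 1), date(2015, 3, 4)
)
```

Here the report flags one incomplete row (the missing `amount`) and one day without data (March 3), which would be the candidate for a backfill.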
19. • Hit a million customers during the year
• Finished the year with 400 employees
• 30 BAs and DSs
• Squad DS is split up; data people now work from within various teams
• Some engineers start specializing in data pipelines
2016
20. Centralized BI doesn’t scale
• A central team can be effective to establish standards and best practices, and to prioritize an overwhelming number of requests
• As the company grows, you need to embed analytics into each team to stay agile
Learning 4
21. • Model creation starts to become more industrialized
• Automating key reports for the central bank leads us to create our ETL and our analytical environment
2016
22. ETL
• Extract: Data is extracted from the production environment and sent to the analytical environment
• Transform: Data is refined into cleaner and easier-to-use datasets
• Load: Datasets are loaded into databases that can be accessed by consumers
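The three steps above can be sketched end to end in a few lines. This is a toy illustration, not the pipeline described in the talk: the in-memory SQLite database stands in for the analytical environment, and the `purchases` table, the field names and the sample rows are all assumptions made for the example.

```python
import sqlite3


def extract(production_rows):
    """Extract: copy rows out of the (simulated) production environment."""
    return [dict(r) for r in production_rows]


def transform(raw_rows):
    """Transform: drop rows with no amount, normalize names, convert cents."""
    cleaned = []
    for r in raw_rows:
        if r["amount_cents"] is None:
            continue
        merchant = r["merchant"].strip().lower()
        cleaned.append((r["customer_id"], merchant, r["amount_cents"] / 100))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned dataset to a table consumers can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(customer_id INTEGER, merchant TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    conn.commit()


# Made-up production data, for illustration only.
production_rows = [
    {"customer_id": 1, "merchant": "  Padaria Central ", "amount_cents": 1250},
    {"customer_id": 2, "merchant": "Livraria", "amount_cents": None},
    {"customer_id": 1, "merchant": "LIVRARIA", "amount_cents": 4990},
]

warehouse = sqlite3.connect(":memory:")  # stand-in analytical database
load(warehouse, transform(extract(production_rows)))

total_by_customer = warehouse.execute(
    "SELECT customer_id, SUM(amount) FROM purchases GROUP BY customer_id"
).fetchall()
merchants = [
    m for (m,) in warehouse.execute(
        "SELECT merchant FROM purchases ORDER BY merchant"
    )
]
```

Keeping each step as a separate function mirrors the slide: consumers only ever see the loaded table, never the raw production rows.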
23. You need an ETL
• High latency, high throughput
• Horizontally scalable
• High accessibility
• Heterogeneous data
• Pain on write
• Unified, global
Learning 5
25. • Over 3 million customers
• Launched our next two products:
Rewards and NuConta
• 700 employees
• 50 BAs and DSs
• Squad data infra
2017
26. • Structuring our data warehouse
• Dimensional modeling
• Batch models running on the ETL
• First BI tool: Metabase
2017
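For readers unfamiliar with dimensional modeling, the idea is to split a warehouse into fact tables (events with measures) and dimension tables (descriptive attributes), joined by keys. The sketch below is a generic star schema in SQLite; the table names, columns and data are hypothetical and do not describe the actual warehouse from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one row per entity.
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: one row per purchase, pointing at the dimensions.
    CREATE TABLE fact_purchase (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        amount REAL
    );

    -- Made-up sample data, for illustration only.
    INSERT INTO dim_customer VALUES (1, 'SP'), (2, 'RJ');
    INSERT INTO dim_date VALUES (20170101, 2017, 1), (20170201, 2017, 2);
    INSERT INTO fact_purchase VALUES
        (1, 20170101, 100.0),
        (1, 20170201, 50.0),
        (2, 20170101, 30.0);
    """
)

# A typical analytical query: aggregate the facts, sliced by dimensions.
spend_by_state_month = conn.execute(
    """
    SELECT c.state, d.month, SUM(f.amount)
    FROM fact_purchase AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY c.state, d.month
    ORDER BY c.state, d.month
    """
).fetchall()
```

The same fact table can be sliced by any dimension attribute without changing its layout, which is what makes this shape convenient for BI tools.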
27. First BI tool: Metabase
• Open source, self-hosted
• Allows querying our data warehouse (ETL results)
• Go-to tool for writing simple queries and creating simple dashboards
• Point-and-click interface empowers users who don’t know SQL
29. ETL Jobs
• Anyone in the company can contribute ETL jobs by opening a PR in our monorepo
• Teams are responsible for writing and maintaining their jobs
• Jobs are written in Scala (Spark SQL); some DSLs are provided
• Databricks is used to iterate on logic
• Peer review ensures quality and consistency
• 100 contributors making 400+ contributions per month
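One way to picture team-owned jobs is as small declarative definitions that a shared platform orders and runs. The jobs in the talk are written in Scala with Spark SQL and their DSLs are not shown, so the Python sketch below is purely illustrative: the `Job` class, the `run_order` helper, the team names and the queries are all hypothetical. It requires Python 3.9+ for `graphlib`.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Job:
    """Hypothetical job definition a team might contribute via a PR."""
    name: str      # dataset the job produces
    owner: str     # team responsible for maintaining the job
    inputs: tuple  # datasets the job reads
    query: str     # SQL that builds the dataset


def run_order(jobs):
    """Order jobs so every job runs after the jobs that produce its inputs.

    Inputs that no job produces (e.g. raw source tables) are ignored.
    """
    produced = {j.name for j in jobs}
    graph = {j.name: {i for i in j.inputs if i in produced} for j in jobs}
    return list(TopologicalSorter(graph).static_order())


jobs = [
    Job(
        name="customer_spend",
        owner="credit",
        inputs=("clean_purchases",),
        query="SELECT customer_id, SUM(amount) AS spend "
              "FROM clean_purchases GROUP BY customer_id",
    ),
    Job(
        name="clean_purchases",
        owner="data-infra",
        inputs=("raw_purchases",),
        query="SELECT * FROM raw_purchases WHERE amount IS NOT NULL",
    ),
]

order = run_order(jobs)
```

Because each definition carries its owner and inputs, a reviewer can see from the PR alone who maintains the dataset and what it depends on.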
30. Focus on the Platform
Problem: Data team creating datasets (tables) for the entire company
• Lack of context
• Hard to prioritize among various teams
• Becoming a bottleneck
Learning 6
31. Focus on the Platform
Solution: Empower vertical teams to own dataset creation
• Focus on tooling, training and support
• Remove interdependencies
Learning 6
33. • Over 5 million customers
• Launched debit cards
• 1200 employees
• 90 BAs and DSs
• Squad data infra in Berlin office, squad data access in São Paulo office
2018
35. Data Services
Training: Weekly sessions on SQL, Python or Scala; new employee onboarding; new tool rollouts
Support: Dedicated Slack support channels; a community of users who support each other
Meetings: Forums for sharing data scientist and analyst work; monthly meetings to discuss the state of data
Data Analysts: Function focused on improving data usage in the company (not SQL slaves!)
36. Invest in your people
Learning 7
• Training employees is not only HR’s job
• Proactive investment in training can avoid reactive support work
• Sometimes the problem is behavioral, not technological
38. Building is not enough
• Internal launches are also launches
• You need training and support
• Do the benefits of your new internal product outweigh the switching costs?
Learning 8
40. • Future: tens of millions of customers
• Thousands of employees
• Hundreds of analysts, dozens of data scientists
• Growing data org
2019 and beyond
41. • Things we’ll work on:
• New data protection law
• Giving employees even more data ownership
• Data Portal
• New Data Warehouse
• Infra refactors to better support new products and refactors
2019 and beyond
43. No Product = No Data
Credit is hard!
Data itself is a product
Centralized BI doesn’t scale
You need an ETL
Focus on the Platform
Invest in your people
Building is not enough