Dafiti R&D, Semana Acadêmica do Centro de Tecnologia (SACT), UFSM 2019

❖ Head of R&D, Dafiti
❖ 17+ years IT stuff
❖ 1981 - 2011 in Germany
❖ 2011 Rocket Internet (locondo, lamoda, dafiti)
❖ Since 2011 in Brazil
➢ and in some way or another w/ Dafiti
❖ married, 2 sons
❖ Skype: georg.buske
❖ Drop me an email: georg.buske@dafiti.com.br
whoami; Georg Buske

Lots of stuff, won’t stop at each slide for long
Should be interactive - please ask anything during the presentation
Disclaimer

Industry view
Get to know Dafiti
Lots of examples and showcases, about successes and failures
Answer your questions
Today’s Objective

• Founded in 2011;
• Offices in 4 countries;
• 2.900 employees;
• 5 warehouses in LATAM;
• 50MM monthly users;
• > R$ 1.4 bi gross revenue
• Belongs to Global Fashion Group since 2014.
• Today ~120 people in IT (Brasil)
• R&D area created in 01/2018
• DFTech: Dafiti’s tech brand
Our history
Dafiti & GFG

Dafiti & GFG
- Global Fashion Group (GFG)
- founded in 2014
- HQ London / Singapore / tech hub Vietnam
- operates in 27 countries
- joint initiatives

IT Organizational timeline
● 2010 - 2011
○ project CTO / Rocket Internet
○ local dev team
○ Berlin dev team
● 2011 - 2012 (incl. first jira generation)
○ IT support
○ project teams
○ sprint team (each 1 Manager + coordinators)
○ backoffice team
○ infrastructure team
○ dedicated QA
○ outsourced developers
● 2012 - 2013 (incl. new jira generation)
○ as before with architecture team
● 2013 - 2014
○ as before with module owners inside sprint and project teams (technical ownership)
○ NOC
○ agile cells and committee of technical leaders (with POs and SMs) instead of project and sprint
○ lots of SAP consultants, backoffice team is now more part of global IT

IT Organizational timeline
● 2015 - 2016
○ renamed architecture team to labs
○ dedicated UX / frontend team
○ removed NOC
○ removed outsourced developers
○ squads and [explicit] cross functional teams instead of agile cells
○ PO = PM (product manager)
○ removed scrum master
○ renamed labs to devtools
○ added maintenance team
● 2017 - 2019++
○ PMs, POs, squads, pillars, SREs, prioritization committee and product funnel, R&D (from 2018)
Learning 1: Whatever the next approach will be, we should fix the structure last
(processes first).
Learning 2: The approaches since 2014 are not that different from one another and could have
worked with the right focus and methodologies. Thus another approach might [...]

“Culture eats strategy for breakfast”
-- Peter Drucker

Organizational Design and communication
structures
Conway’s Law
Organizations which design systems are constrained to produce designs which are copies of the
communication structures of these organizations.
Brooks’s Law
Adding human resources to a late software project makes it later
Jeff’s 2 Pizza rule
If a team couldn't be fed with two pizzas, it was too big

Project timeline
2011 Relaunch (Magento -> Alice and Bob)
2012 - 2013 lots of minor projects
2014 SAP
2015 Marketplace
2016 TriKan Integration (also deployment change because of audit problem)
...
Learning: There will never be the right time for a [technical and/or cultural] shift -
and there will always be #blackfriday on the last Friday in November!

Legacy systems
“Today’s implementation is tomorrow’s legacy”
Dafiti’s system is an 8-year-old set of monolithic applications

#DfTechJourney - Concept
● Core Customer
● 21st century mindset (VUCA)
● Technology Heavy User
● Contributors Culture – Empowerment
● Space for experimentation
● Execution speed & Fail Fast (MVP’s)
● BT - Business Technology
● Never leave Day 1: agile “start-up” mindset (Jeff Bezos, Amazon)
What are the main points of Exponential structure & BT?

Transform Dafiti's E-commerce to an
Exponential Platform for our Customers
Improving the user experience, applying the best technologies,
through a learning culture and continuous improvement.

Wave 1 - Rice and Beans
6 months

Wave 2 - The place to be
1 Year

Wave 3 - F*cking Awesome
6 months

Infrastructure
Corporate IT & DC
Information Security
People & Culture
Products
SRE
Governance
Innovation &
Intelligence
Tech Stack
D&A
Platform
Backoffice

HIERARCHY CHART
● CTO: Cristiano Hyppolito
● Head of Eng LATAM: Rafael Morelo
● Head of R&D LATAM: Georg Buske
● Manager of Eng LATAM: Pablo Maronna
● Head of Gov LATAM: Leandro Lemes
● Head of InfoSec LATAM: Luis Gonçalvez
● Head of BackOffice LATAM: Adriana Ramos
● Head of Infra LATAM: Fabio Jacometto
● Head of AGILE: TBD
● Coord Helpdesk CH & CO: TBD
● Country labels on the chart: Argentina, Chile, Colômbia
Org structure: Classical Organigram, but in practice super flat
During 2019: 300 Astronauts in Brazil + Argentina + Chile + Colômbia

#Dafiti
Our purpose is to revolutionize the fashion
ecosystem with intelligence.
Our principles:
- we put the customer at the center of everything
- we never stop learning
- we act with intelligence
- we build the best teams
- we trust and support each other
- we work together for the common good
Lots of achievements:
Our purpose, our journey, our blackfriday!
4 x orders of a normal day
324 orders / minute

● Lots of new colleagues (third parties and full time hires)
● company wide agile rollout
● Ghostbusters (internal hackathon)
● intercontinental teams (AR + BR)
● lots of fun and beer (in fact, at least every friday - cheers)
● consulting for agile, platform and more
● new platform to come
● new dashboards via live
and many more...
#DFTechJourney

R&D and Innovation Recap & Outlook
Training for all

Safari as learning platform

Other departments asked to be taught
technical topics:
Python, SQL, HTML, Big Data, Angular /
React, database architecture / ETL,
R (programming)
DFTAcademy rollout

#DfTechJourney
Trying new ways for talent
acquisition in tech: hackerX
and stackoverflow talent

● Machine Learning 101: regression (home prices prediction)
○ https://docs.google.com/presentation/d/1JAg382c9LMrdUm1lSvOfiTGTpWj9iEDKU1Saz9NAEPk
● Machine Learning 101: Image Understanding (Fashion-MNIST)
○ https://docs.google.com/presentation/d/122Pl6ej1x4JZVI1aN-Lawb6LlQ7gEOKT3C5x11L0EkA
● Machine Learning 101: Natural Language Processing (Rating and Reviews)
○ https://docs.google.com/presentation/d/1mC01GXDTByoRNtrPUdpxe1rWqlZ9u5EaxbJsM9Yl0rw
● Machine Learning 101: clustering (Dafiti brands)
○ https://drive.google.com/drive/folders/1XeHMBgh2Lx9LwJpX6Hunb2I0RgX5WgdQ?ogsrc=32
● Machine Learning 101: Recommendation engine (Dafiti products)
○ https://drive.google.com/drive/folders/1hgf4NzOEE0ExRb0EFQ7XUpT8MqFivrfM?ogsrc=32
● Python 101:
○ https://drive.google.com/drive/folders/1OHbNu8DBh3WecpY3jmJVQdd_tpyyaACs
Internal workshops

Workshops delivered through DFT
Academy and HR support 11/2018
and 12/2018
#DfTechJourney
ML Workshops

Machine learning guild’s main
objective in 2018:
● create internal workshops
objective 2019:
● papers we love / journal club
#DfTechJourney
ML Guild

● 25.07.2018 - definition and goals
● 29.08.2018 - DWH training Redshift
https://docs.google.com/presentation/d/1muxuxnlBgG0GAF9RP9vFtYcNfEw5JWCydqaoEYT8VUY
● 19.09.2018 - Data catalog and internal system Hulk
https://docs.google.com/presentation/d/1CscU8TcI-2YsJGJCJxiewEXS9o1qxCykOzJZ95SZd4w/edit?ts=5ba293fb&pli=1#slide=id.g3f4ca1ae3c_1_0
● 10.10.2018 - internal system Nick
● 02.01.2019 - data security (TBD)
Summaries: https://docs.google.com/document/d/1d9Edegl2iiLlH4Qa7PkROwb_5FyYd-e3GaYAVQpufgU/edit#
Data Guild

● Agile trends (H1)
● Sponsoring papis.io (H1)
● Hosting pydata meetup
● Semacomp
● II Congresso Latino-Americano de IA
● Mediaeval
● Hosting deep learning meetup
Events
#DfTechJourney

papis.io sponsoring
papis.io is a major conference
about machine learning
#DfTechJourney

Follow: https://twitter.com/dafiti_tech or https://www.linkedin.com/company/dafiti/
Tweet: I <3 ML #papis #dafiti #ufsm
To take part in the raffle to win a papis LATAM 2019 ticket

Hosting pyData meetup
#DfTechJourney

II Congresso IA LatAm

#DfTechJourney
mediaEval, France

And more
Agile trends 04/2018, Semacomp 10/2018, deep
learning meetup in 12/2018 (TBD) o/

OKRs (a.k.a. objectives and
key results)

Shared OKRs
1
12
5.001
● company wide
● guarantees alignment and focus
● Strategic Objectives valid for 1 year
● KRs reviewed every 3 months
● regular team check-ins
● confidence index

Shared OKRs
#DfTechJourney
Still learning - MVP

#DfTechJourney
● Physical Kanban board with
backlog
● Started with sprints and Jira board
○ continue to improve in 2019
with participation by agile
masters
○ timebox: at least weeks
○ caution: extensive planning
● 2 - 2 - 2
○ 2 days kick - off
○ 2 weeks demo
○ 2 months verifiable user
facing prototype
Methodology
CRISP-DM
Various breakdowns on Kanban board

[Figure: funnel of initiatives along a Cone of Uncertainty, running from Uncertainty to Predictability]
Estimate multipliers on the cone: 4x, 2x, 1.25x, 0.8x, 0.5x, 0.25x
Questions along the workflow: When? What? How? For what? Which? Why?
Roles: Stakeholders, C-Level, Product Manager, Engineering Manager, Product Owner
At this stage the team will be able to give delivery predictability based on its history.
Product development workflow
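To make the Cone of Uncertainty multipliers concrete, here is a minimal sketch that turns a point estimate into a range. Pairing each multiplier with its reciprocal (4x with 0.25x, and so on) and the stage names are assumptions for illustration; the slide only lists the six numbers.

```python
# Multiplier pairs (high, low) built from the six values on the slide.
# The stage names are hypothetical labels, not taken from the slide.
CONE = [
    ("early idea", 4.0, 0.25),
    ("refined", 2.0, 0.5),
    ("detailed", 1.25, 0.8),
]


def estimate_range(point_estimate_days, stage):
    """Return (low, high) bounds for a point estimate at the given stage."""
    for name, high, low in CONE:
        if name == stage:
            return point_estimate_days * low, point_estimate_days * high
    raise ValueError(f"unknown stage: {stage}")


for name, _, _ in CONE:
    low, high = estimate_range(20, name)
    print(f"{name:>10}: {low:g} to {high:g} days")
```

The range narrows as an initiative moves through the workflow, which is why delivery dates only become predictable in the later stages.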

[Figure: three pillars, each with a Platform Team, a Feature Team and an SRE Team]
Pillars: Discovery | Payment & Order + MKTplace | Post Sales
Counts as printed on the slide (three groups, in slide order):
- Platform 01/01, Feature 01/01, Infra 00/01
- Platform 01/01, Feature 01/01, Infra 01/01
- Platform 01/01, Feature 01/01, Infra 00/01
Product team split (pillars & squads)
- each pillar has a PM, an engineering manager and various squads responsible for specific
features, consisting of engineers and product owners
- and supported by (cross): Agile coaches, UX, Data engineering, AI and infrastructure

Architecture 2019
#DfTechJourney

A macro view of the technologies we will use...

...AND DATA FUELED
by our awesome D&A team :)

Data Lake DWH
Reporting
Sharing
Data
Load/ Export
Orchestration
Data Quality
Scheduling
Monitoring
ETL
Data Streaming
Datamarts
Feeds
Named Queries
Data Security
D&A
Tech Services

accengage
adjust
admotion
appannie
b2b
b2w
bingads
bob
campaign
carmen
criteo
cubiscan
dynad
exacttarget
exchange
external
fabric
facebook
financial
fit
freight
google
gotcha
Dafiti Data Lake
homer
ino
internal
itunes
king
madruga
marketing
markovian
netsuite
osticket
parallel
price
reception
responsys
sap
seller
solr
supplier
taboola
tms
wms
yahoo
zanox
zendesk
> 50 different sources
> 160 database schemas
8 TB distributed in 800k ORC / Parquet files
7.5 TB in 6k tables
http://172.18.10.70:8080/nick/home
Huge files
When the files aren’t so big and we need to apply filters
For more demanded data
D&A
Data Architecture

D&A Governance
D&A
Data Sharing Map
Transactional Systems / External
Sources
BI Tools
Operational Reports
(based on 1 system)
Data Feeds / Data Interfaces
Operational Reports - “Heavy”, “Hard to run” reports
(based on 2 or more systems)
Tactical Reports / Dashboards
External platforms
Historical Data
Data Mining / AI
GFG BI
Global Pricing
Live
Visenze
Marketing Apps

D&A
Data Hub
Pricing
Commercial Planning
Supply Chain
Logistics
Transportation
Data Mining / AI
Other Platforms
GFG BI
Global Pricing
Google / Facebook
Financial Processes
Customer Service

Now R&D and Innovation…
Executive Summary

D&A / DWH
R&D / D&A
Ops / D&A
R&D / Eng.

R&D Team
Will Marcio
Ricardo
(hiring)
Georg
Drop us an email: research-and-development@dafiti.com.br
Rafael Albert

● Visual conception
○ Visenze
○ Streamoid
○ Flashwall
○ Markable.ai
○ Syte
○ Flixstock
Third party product integration (PoCs 2018)
Understand needs, create assessment framework, search more
possible third parties (benchmark or integration)
use the startup ecosystem to create value for Dafiti!
there are many startups pushing into chatbots
and fashion (image similarity and catalog
enrichment) but nobody is trying the hard stuff
as a product (e.g. marketing budget allocation)
;-)

● academic research group
● current status: paper work
● works on AR and image
understanding

● Internships @ Dafiti
○ Still working on the contractual part but we are making this
happen
○ feel free to send me an email if you have interest:
georg.buske@dafiti.com.br
Dafiti <3 UFSM

Innovation hubs
● Starting in 2019 create hubs in Brazil
● Work more closely with GFG (first calls with lamoda
R&D, get back to innovation topics within GFG)
● Until 2020 assess possibilities in China and USA
Use the startup ecosystem as multiplier - this is
what an exponential platform means...
If you take part in the UFSM incubator
and/or are creating a startup disrupting
fashion and/or commerce, we want to
hear from you :)

R&D Vision
Purpose: Lead the revolution of fashion and shopping
with AI and technological innovation.
Mission: Give Dafiti the capacity to use state-of-the-
art AI.
Innovation and research
needs alignment too!
Innovation and
Intelligence Committee

Innovate our fashion e-commerce and help transform it into THE fashion platform in LATAM, using
machine learning and artificial intelligence in general. E.g.:
● Build algorithms that help us with anticipated shipping and purchasing forecasts and protect us against system failure.
● Use image recognition to give our users the highest possible convenience and the coolest features.
● Use state-of-the-art game engines to build virtual reality into our customer experience.
● Optimize product search and build data consistency monitoring.
● Help build a large-scale architecture together with the entire IT team.
● Create a machine learning framework / standard stack and rules (e.g. SageMaker, CI/CD, multi-cloud, experiment,
tracking, etc.)
The outcome will be nothing less than transforming the way e-commerce works and
providing sustainable solutions.
Mostly the team won’t work directly on user-facing products; it assesses ways to create impact and works together with
other areas to make them happen.

Key takeaways
● Events and techbranding are important to attract talent (team goal achieved - open positions filled,
BIG THANKS TO OUR HRBPs <3)
○ OTOH we’ll reduce the number of indicators which make up the brand index [the old index is not
in these slides, please refer to the R&D strategy docs for more information]
● Techbranding and internal workshops not only help foster DFTech as a brand and teach our internal
workforce but also create insights and identify problems and opportunities
● The plan is to start 2019 with 100 % alignment and a mixed model of internal workforce, consulting
partnerships and third party providers
○ regular update and alignment meetings will be held in form of an intelligence & innovation
committee
○ using more rigid agile methods such as timeboxed sprints (incl. planning / review) to create
more visibility and better alignment on results
● we will invest more into our ML standards and stack (as already started)
1
Innovation and research needs alignment,
too!
=> Innovation and Intelligence Committee

Key takeaways
● To become a name in research we must invest more and thus will start with 20 % time for this and
will partner more with academic institutions
● We’ll reduce the number of area KPIs monitored to budget, people, PoCs realized, models launched
in production, innovations launched (ideation will be part of this metric), third parties assessed,
internal workshops given and techbrand initiatives (papers, articles, events, etc.) for now - KPI
review is not in this presentation (please see strategy docs for old area metrics list)
● Pricing optimization and marketing allocation projects haven’t brought the expected results yet
○ eventually we will invest in more research
○ also there are many startups pushing into chatbots and fashion (image similarity and
catalog enrichment) but nobody is trying the hard stuff as a product ;-)
● Investment in search, recommendations (looks, emails, onsite), catalog enrichment and image
recognition might be the most important in 2019
2
Balance explore (PoCs) VS.
exploit (production)!

R&D Framework
How
Ideation
HOW
● Area or Product wishlist
● ML guilds or 20 % research
● Design thinking workshops per area
OUTPUT
● Wishlist backlog (now: google docs, future: open innovation portal)
● ideas, hypotheses
Detailing
HOW
● Workshop per area together with R&D (regular schedule)
OUTPUT
● ML Canvas (ML 101 workshop)
○ definition of success criteria and metrics
● Business canvas
● 6 pager
Prioritization
HOW
● Committee
OUTPUT
● prioritized shortlist
● team composition (third party, R&D, interdisciplinary team, etc.)
Labels on the slide: Committee; R&D / areas; Dafiti; Identify collaboration / work type

R&D Framework
How
PoC
HOW
● Data curation
● Paper research
● Third party benchmark
● EDA
OUTPUT
● Baseline model
● insights
● validated hypothesis
● GO/NOGO
Implement
HOW
● Development
● Test (AB test)
● Refine until satisfied or aborted (validation with user)
OUTPUT
● Success -> deployment, ops
○ API
○ end-to-end
● Failure -> fail wall
Finalization
HOW
● Retrospective (Committee)
● Operation (if success)
OUTPUT
● Lessons learned / Insights
Labels on the slide: Committee; R&D, area, third party (twice); Agile; squads; TBD; KR: 6; KR: 2

R&D Framework
RULES
● 2 weeks ahead of each committee meeting, the requirements of possible projects and their
definitions (success metrics, ML canvas) need to be done / aligned with R&D
● no ideation during committee, only backlog discussion (exception: today)
● the responsible area / product person [optionally together with R&D] will present the detailed
ideation item to the committee
How

R&D Framework
COLLABORATION / WORK TYPES
● PoC internal: The PoC execution is fully owned by R&D.
● Implementation: The implementation of the deployable live product is fully owned
by R&D (either end-to-end or as API).
● Coach: The implementation or PoC execution is owned by the area and an R&D member
supports the initiative as a coach.
● Workshop (not part of framework): ML concepts are taught to DFT employees (rather than
a concrete business problem being solved), either as a TechTalk-like workshop through
DFTAcademy or as deeper classroom training (certification).
How

Design Thinking workshops
[Figure: design thinking diagram with convergent and divergent phases]
Phases: Understand, Define, Generate ideas, Decide
Needs (people), Viability (business), Possibility (technology), Opportunities (innovation)
● with and by our awesome UX team

Machine Learning Canvas
● the canvas helps to understand the maturity of the project (in terms of data sources, value proposition,
etc.)
● not everyone needs to understand every part but having the canvas created and validated shows it is
ready to work on
● value proposition
○ if there is an overall business model canvas the proportional value can be used for the ML task at
hand
○ there must be a success metric
● For the ones eager to learn more:
○ New book draft (I will send later on)
○ ML 101 workshop where we’ll discuss ML canvas
○ more to come :-)

Example: Customer service
● Develop intelligence / integration for services
we already have: Facebook Messenger chatbot /
e-mail form on the site / FAQ
● Develop intelligence / integration for channels
that we would like to have: bot for Facebook,
Instagram and Twitter timelines / shop chatbot /
WhatsApp (online and offline), IVR (voice response)
● In addition, work done at B2W that brought
many gains in speed of service, quality and
standardization of contacts was the development
of a virtual attendant which, in addition to
passing on information, can execute actions
(sending a duplicate boleto, sending a duplicate
invoice (nota fiscal), changing registration data,
sending a password reset)
Initial ideas
BMC
(optional)
MLC
Ideation / Detailing

Innovation and product
launches

Image Similarity

Return rate prediction

Approach
Given a purchased item, predict whether the item will be returned or not.
Pipeline: DW -> Feature Extraction -> Model 1 and Model 2 -> Aggregator Model -> Probability of Return
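The pipeline above (two base models feeding an aggregator that outputs a return probability) can be sketched as follows. This only illustrates the data flow: the two base models, their rules, the feature names and the aggregator weights are all made up here and are not the models used at Dafiti.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical base models: each maps a feature dict to a score in [0, 1].
def model_1(features):
    # toy rule: customers with a high past return ratio score higher
    return min(1.0, features["past_returns"] / max(1, features["past_orders"]))


def model_2(features):
    # toy rule: long delivery times score higher
    return sigmoid(0.3 * (features["delivery_days"] - 7))


def aggregator(score_1, score_2, w1=2.0, w2=1.5, bias=-2.0):
    """Logistic combination of the base scores; weights are invented, not fitted."""
    return sigmoid(w1 * score_1 + w2 * score_2 + bias)


def return_probability(features):
    return aggregator(model_1(features), model_2(features))


item = {"past_orders": 10, "past_returns": 4, "delivery_days": 12}
print(round(return_probability(item), 3))
```

In a real setup the aggregator would be trained on held-out predictions of the base models rather than configured by hand.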

Features
Around 100 features derived from the product, customer and
transaction were used to train the models. Among the features:
- Delivery postal code (CEP)
- Supplier
- Brand
- Age of the user account
- Time between order and delivery
- Time since the customer's last purchase
- Net Total Value
- Number of orders and returns observed for the customer up to that date
- etc...
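Customer history features such as "orders and returns observed up to the date" need care so that a row never sees its own outcome. A minimal sketch, assuming a hypothetical record layout of (customer_id, order_date, was_returned); it also simplifies by treating a return as known at the order date, which a production pipeline would not do.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order-item records: (customer_id, order_date, was_returned)
orders = [
    ("c1", date(2018, 1, 5), False),
    ("c1", date(2018, 2, 4), True),
    ("c2", date(2018, 2, 10), False),
    ("c1", date(2018, 3, 1), False),
]


def customer_history_features(records):
    """For each record, build features from earlier records of the same customer only."""
    seen = defaultdict(lambda: {"orders": 0, "returns": 0, "last": None})
    rows = []
    for customer, day, returned in sorted(records, key=lambda r: r[1]):
        h = seen[customer]
        rows.append({
            "customer": customer,
            "order_date": day,
            "prior_orders": h["orders"],
            "prior_returns": h["returns"],
            "days_since_last_purchase": (day - h["last"]).days if h["last"] else None,
        })
        # update the history only after this row's features are emitted
        h["orders"] += 1
        h["returns"] += int(returned)
        h["last"] = day
    return rows


for row in customer_history_features(orders):
    print(row)
```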

Aggregated Model
- Scores below 0.01:
- 90% of non-returned items
- Return rate of 0.02%
- Scores above 0.5:
- 53% of returned items
- Return rate of 19%
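The two kinds of numbers on this slide are easy to mix up: the share of all (non-)returned items that fall into a score bucket, and the return rate inside that bucket. A small sketch with invented scores shows how each could be computed; the function name and the toy data are assumptions.

```python
def bucket_stats(scored_items, low=None, high=None):
    """scored_items: list of (score, was_returned). Select items by score bounds."""
    selected = [
        r for s, r in scored_items
        if (low is None or s > low) and (high is None or s < high)
    ]
    total_returned = sum(r for _, r in scored_items)
    total_kept = len(scored_items) - total_returned
    returned = sum(selected)
    kept = len(selected) - returned
    return {
        "share_of_all_returned": returned / total_returned if total_returned else 0.0,
        "share_of_all_non_returned": kept / total_kept if total_kept else 0.0,
        "return_rate_in_bucket": returned / len(selected) if selected else 0.0,
    }


# Toy data, not Dafiti numbers.
data = [
    (0.005, False), (0.002, False), (0.008, False), (0.3, False),
    (0.6, True), (0.7, False), (0.9, True), (0.2, True),
]
print(bucket_stats(data, high=0.01))
print(bucket_stats(data, low=0.5))
```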

● a successful model for return rate prediction was created
● deployed via AWS SageMaker (part of ML standard)
● could be easily adapted for cancellation rate

● During the return rate project we noted that many of our business
concerns involve Survival Analysis.
● Survival Analysis models situations in which there are discrete
events that take some time to occur.
● Most of our problems fall into a less standard type of survival
model called Cure Models.
● We are currently developing the capability of applying cure
models to complex datasets for both insights and predictive
modelling.
● This will allow us to attack return rates, cancellation rates,
second purchase behavior, time-to-delivery, time-to-stock-replenishment
and all sorts of time-to-X problems.
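As a first intuition for time-to-X problems, here is a minimal Kaplan-Meier estimator in plain Python. It is the standard non-parametric survival estimator, not a cure model and not the team's implementation; the toy data is invented. A curve that flattens well above zero is the pattern cure models are built for: a fraction of items never experiences the event (for example, is never returned).

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival)], survival = P(event has not happened by time).

    durations: time until the event, or until observation stopped
    observed:  True if the event happened, False if censored
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Toy data: days until an item was returned; False = not returned when observation ended.
days = [3, 5, 5, 8, 30, 30, 30, 30]
returned = [True, True, True, True, False, False, False, False]
curve = kaplan_meier(days, returned)
for t, s in curve:
    print(f"day {t}: {s:.3f} of items not yet returned")
```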

● Search the look (H1)
● Search - S4 (H1)
● Categorization (catalog automatization)
● Causal impact and marketing budget allocation
● Size filters [external partner: bmind]
● Ratings and reviews
● Brand clustering
● Sales forecasting Blackfriday
PoCs (proof of concepts)

Search S4

Search PoC (s4) as
fallback for datajet
(because of earlier
outages) with advanced
learning to rank and
search optimization

● Strategy 2019 work together on Search as a global product (datajet)
● learnings and advanced concepts from s4 will be applied to datajet

Sales forecasting Blackfriday

Blackfriday throughout the years
Looking at sales from Thursday 00:00 to Sunday 23:59 in the years 2013 to 2018 there is a pattern that repeats every year:

Simulating revenue for 2018 based on 2017
Given the distribution of gross revenue per hour that was generated in 2017 during the Blackfriday, we could generate a
revenue projection for 2018. The values expected for each hour were derived based on the total revenue estimated by the Live
Sales, which is a system used at Dafiti that implements a moving average type of calculation.
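The projection described above can be sketched as scaling last year's hourly shape to this year's estimated total. The numbers are invented, and the total estimate is simply passed in; how Live Sales computes it is not shown here.

```python
def project_hourly_revenue(previous_hourly, total_estimate):
    """Scale last year's hourly shape so it sums to this year's estimated total."""
    previous_total = sum(previous_hourly)
    if previous_total <= 0:
        raise ValueError("previous year revenue must be positive")
    return [total_estimate * h / previous_total for h in previous_hourly]


# Hypothetical numbers: four "hours" of 2017 revenue and an estimated 2018 total.
revenue_2017 = [100.0, 300.0, 400.0, 200.0]
projection_2018 = project_hourly_revenue(revenue_2017, total_estimate=1500.0)
print(projection_2018)
```

Comparing actual revenue per hour against such a projection during the event shows early whether the day is running ahead of or behind plan.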

● Success during Blackfriday
● Knowledge and models obtained being applied to “General Sales
Forecasting”:
○ awareness of cyclic sales behaviour in specific time windows
○ lag features
○ extraction and usage of Dafiti’s full sales history
○ how to deal with the data granularity
○ benchmarking GBM vs Neural Network
While starting to work on pricing optimization
we realized we need a sophisticated
forecasting first
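Lag features, mentioned in the list above, give a model access to past values of the series it is predicting. A minimal sketch on a plain list of daily sales; the lag choices (1 and 7 days) and the function name are assumptions for illustration.

```python
def add_lag_features(series, lags=(1, 7)):
    """series: daily sales values; returns rows with lagged values (None if unavailable)."""
    rows = []
    for i, value in enumerate(series):
        row = {"day": i, "sales": value}
        for lag in lags:
            row[f"lag_{lag}"] = series[i - lag] if i >= lag else None
        rows.append(row)
    return rows


daily_sales = [10, 12, 9, 14, 20, 25, 30, 11]
for row in add_lag_features(daily_sales):
    print(row)
```

A 7-day lag is one simple way to expose weekly cyclic behaviour to a GBM or a neural network.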

Categorization (catalog automatization)

● The goal is to automate object identification from SKU images alone.
● ImageNet* has existed since 2010, and this task is considered mastered by
computer science.
● Deep learning models are the current state of the art for this task.
● We have enough data for big learning models, over 3 million images.
● We have the data (needs some work) and we have the model!
● The data needs some adjustments as catalog “mistakes are easy to find”.
● Also the catalog trees used have duplicates, and attributes are treated as
categories; examples from name_tree3:
○ "Other", "Outras Roupas", "Outros".
○ "Pijamas", "Pijamas e Camisetas".
○ "Polo Manga Curta", "Polo Manga Longa", "Polos".
Catalog automatization

● The trained model achieves these results for these catalog trees:
Catalog      model errors   total SKUs   accuracy
name_tree1         72.683      681.244    89,33 %
name_tree2        136.793      681.244    79,92 %
name_tree3        158.898      681.244    76,67 %
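The accuracy column follows from the other two columns as 1 - errors / total. The sketch below recomputes it; truncating to two decimals (rather than rounding) is an assumption that happens to reproduce all three slide values.

```python
import math


def accuracy_percent(errors, total):
    """Accuracy = 1 - errors / total, in percent, truncated to two decimals."""
    return math.floor((1 - errors / total) * 10_000) / 100


TOTAL_SKUS = 681_244
errors_by_tree = {"name_tree1": 72_683, "name_tree2": 136_793, "name_tree3": 158_898}
for tree, errors in errors_by_tree.items():
    print(tree, accuracy_percent(errors, TOTAL_SKUS))
```

The deeper the catalog tree, the more classes there are to confuse, which matches the falling accuracy from name_tree1 to name_tree3.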

Suggested steps towards automation:
1. Data cleansing with model’s insights and/or enhanced categorization tree and attributes.
2. Train and validate new model’s predictions.
3. Repeat 1 and 2 until satisfied.
4. Connect this API into the sku registration steps.
Next steps catalog automatization and conclusion:
● high potential for catalog curation
● learnings from 2018 will be applied in catalog cleanup 2019

Ratings & Reviews

● The goal is to automate approval of reviews.
● Started with preparation for slides for a congress -> made part of the
hackathon -> was incorporated into ML 101 workshops -> results aligned with
business
● We have the data (also needs some work) and we have the model!
● The data needs some adjustments:
○ Is there a defined policy for approval/rejection of reviews?
○ Is historical data accurate enough for what the company wants for the future?
○ Does the company want more insights from reviews?*
Ratings & Reviews

Ratings & Reviews Historical data
historical data:
reviews_approved.csv   519.463
reviews_rejected.csv    81.598
                             total reviews   model’s errors   accuracy   f1-score
manually evaluated reviews         601.061           57.704    90,39 %       88 %
Test data (15%) results:
[Figure: results on the test data for the classes approved / rejected]

model’s confidence   text
0.916   A qualidade não é tão boa. Pelo preço esperava ms
0.968   Muito boa,linda.
0.663   Não consigo fechar a compra
0.589   A calça e pequena tenho 1.63 ela ficou no meio das pernas odiei.por favor me reponha o valor pago.
0.773   Descascou no primeiro dia de uso. Decepcionada...
0.878   Recebi o tênis tem uma semana, a primeira vez que meu filho usou e fui limpar, o tênis desbotou. Não tem qualidade
0.869   Lola lp.k
0.917   Produto
in store: REJECTED model’s prediction: APPROVED

model’s confidence   text
0.731   Gostaria de saber quando estará disponível o nº 34?
0.973   very satisfied with the product. Great finish and very good value for the money. Fits my shoe-size perfectly
0.549   gOSTARIA DE SABER SE VCS TEM ESSE SAPATO EM AZUL MARINHO!! OBRIGADO MARIA LUCIA
0.700   oieeeeeeeeee eu queria essa linda sandalia pfv venha me dar xhauu
0.527   Quero saber se posso trocar o número se não der ..
0.521   è muito bonito mas eu vivo em moçambique e gostava que abrisem uma loja ca em maputo na capital de moçambique.
0.691   morri porfavor digam-me alguma coisa porfavor
in store: APPROVED model’s prediction: REJECTED

Ratings & Reviews Pending
reviews_pending.csv 219.414
                  approved            rejected
pending reviews   194.173 (88,5 %)    25.241 (11,5 %)
most confident cases of:   confidence value   text
APPROVED   0.999   Decepção. Malha Muito fina e áspera, parece uma lixa.
REJECTED   0.999   Gostei muito da sandália, super confortável mas já estou mandando de volta pois ela esfolou inteirinha na parte interna em dois usos. Já enviei fotos, estarei enviando de volta amanhã pra dafiti.

Suggested steps towards automation:
1. Data cleansing with model’s insights.
2. Train and validate new model’s predictions.
3. Repeat 1 and 2 until satisfied.
4. Connect this API into the rating and reviews validation steps.
New project: extract insights and information directly from users reviews,
possibilities to explore:
a. brand and products alarms on user problems (quality, fitting,...)
b. detect reviews that are customer support related
c. sentiment analysis
Ratings & Reviews Pending
Owner? Decision?
=> committee

Causal Impact and budget allocation

Hold-Out Tests
● Processing of structured time series
● Production test on the “google non-brand SEM” channel
● Statistical confirmation of the channel's representative value
● Creation of algorithms in Python

Hackathon - Marketing Budget Allocation:
● Time series and non-linear optimization
● Minimization of “CIR” (1 / ROI)
● Algorithm makes resource allocation suggestions to optimize CIR
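To illustrate what a CIR-minimizing allocation looks like, here is a toy sketch. It assumes CIR = spend / revenue (the inverse of ROI), invented square-root response curves with made-up channel names and scales, and a simple greedy allocator; the hackathon's actual time-series and non-linear optimization approach is not reproduced here.

```python
import math

# Hypothetical response curves: revenue = scale * sqrt(spend), i.e. diminishing returns.
CHANNELS = {"sem_non_brand": 90.0, "social": 60.0, "display": 30.0}


def revenue(channel, spend):
    return CHANNELS[channel] * math.sqrt(spend)


def allocate(budget, step=100.0):
    """Greedy: give each budget step to the channel with the highest marginal revenue."""
    spend = {c: 0.0 for c in CHANNELS}
    remaining = budget
    while remaining >= step:
        best = max(
            CHANNELS,
            key=lambda c: revenue(c, spend[c] + step) - revenue(c, spend[c]),
        )
        spend[best] += step
        remaining -= step
    return spend


def cir(spend):
    """Cost-income ratio: total spend divided by total revenue (lower is better)."""
    total_revenue = sum(revenue(c, s) for c, s in spend.items())
    return sum(spend.values()) / total_revenue


plan = allocate(10_000.0)
even = {c: 10_000.0 / len(CHANNELS) for c in CHANNELS}
print(plan)
print("CIR greedy:", round(cir(plan), 4), "CIR even split:", round(cir(even), 4))
```

With a fixed budget, minimizing CIR is the same as maximizing revenue, so the allocator moves money towards channels whose next unit of spend still pays off most.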

Results:
● Opensourced port of causal impact package in R to python
● A Hackathon can create good insights and kick off BUT might create a false
sense of success
● Understood GA data is not complete
● Optimization TBD

Brand clustering

Brand Clustering Analysis
● The goal is to bring marketing insights on how users act on brands, and
reduce the brands dimension
● We used Google Analytics (GA) actions for 2 days on Dafiti website. 640.235
cookie sessions interacting with 276.923 skus of 4.825 brands.
● The 4 brands with the most interactions are:
○ [('Colcci', 54948), ('Vizzano', 42715), ('Santa Lolla', 41401), ('Moleca', 34054)]
● Top 4 GA scores:
○ [('Beautiful Lingerie', 13.7532), ('Philco', 12.5799), ('#Euqfiz', 11.2241), ('Kmc', 9.1530)]

Brand Clustering Analysis
How are Dafiti brands related to others?
● 9 brands cluster -> ['Armadillo', 'DAFITI UNIQUE', 'Ki-fofo', 'Lua Luá', 'Mania De Moça',
'Meketrefe', 'Penguin', 'Red Life', 'Styll Baby']
● 6 brands cluster -> ['Cavage', 'DAFITI JOY', 'La Beauté Cosmétiques', 'Miu Miu', 'Montain Boot',
'Refuse']
● 10 brands cluster -> ['DAFITI ACCESSORIES', 'Enox', 'Khatto', 'Paul Ryan', 'Prorider', 'Secret',
'Sunnies', 'THOMASTON', 'Terra e Agua', 'Tilit']
● 582 brands cluster -> ['...Lost', '100% Marca Própria', '3 Sprouts', ..., 'DAFITI I.D.', 'DC
Original', 'DGK', 'DKNY', ...,'Sex and the City Cosmetics', ...,'Shoes Shoes', ...,'You Rock',
'Zebu', 'Zenit', 'Ziva']
● 71 brands cluster -> ['Alta Villa Shoes', 'Asics', 'Ausländer', 'Beautiful Lingerie', 'Botswana',
'Bracciale Acessórios', 'Bull Motors', 'CZ Brand', 'Calcifran', 'Cisco', 'Columbia', 'Crocs',
'DAFITI EDGE', 'Dangelis Moda Íntima', ...,'Won Sports', 'Yardley', 'adidas', 'adidas Originals',
'adidas Performance', 'test', 'zeus']
● 534 brands cluster -> ['24 Horas Calçados', ..., 'Bvlgari', ...,'Café Brasil', ...,'Cravo &
Canela', ..., 'DAFITI', 'DAFITI SHOES', ...,'GUESS Kids', ...,'Harley-Davidson Footwear',
...,'Moleca', ...,'Santa Lolla', ...,'Tiffany & Co.', ...,'VIA UNO', ...,'Vizzano',...]

The resulting clustering does not directly help much with marketing insights.
Some changes are needed to provide a direct business value:
1. Consider the problem as a recommendation task.
2. Implement changes to the “Marreco” system, to provide an analysis over brand interactions.
Brand clustering
3 most similar brands to Dafiti brands and its similarity score (cosine):
● DAFITI I.D. - [('D-Tox', 0.3293), ('Monte Carlo Polo Club', 0.1646), ('Drop Life', 0.0856)]
● DAFITI SHOES - [('Moleca', 0.2111), ('Ana Cristina', 0.1580), ('Vizzano', 0.1565)]
● DAFITI EDGE - [('FKN', 0.0718), ('Lemon Grove', 0.0665), ('Yachtsman', 0.0366)]
● DAFITI - [('Ride Skateboard', 0.0666), ('Santa Maria', 0.0591), ('Snoopy', 0.0524)]
● DAFITI ACCESSORIES - [('Vila Flor', 0.0455), ('Prorider', 0.0435), ('Flyca Girls', 0.0334)]
● DAFITI UNIQUE - [('Shoulder', 0.0246), ('Energia', 0.0231), ('DAFITI ONTREND', 0.0166)]
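The cosine scores above compare brands by how similarly visitors interact with them. A minimal sketch, assuming each brand is represented by a sparse vector of per-session interaction counts; this representation and the toy sessions are assumptions, since the slides do not say how the vectors were built.

```python
import math
from collections import Counter

# Hypothetical sessions: each is the list of brands a visitor interacted with.
sessions = [
    ["DAFITI SHOES", "Moleca", "Vizzano"],
    ["DAFITI SHOES", "Moleca"],
    ["Colcci", "DAFITI SHOES"],
    ["Colcci", "Santa Lolla"],
]


def brand_vectors(sessions):
    """Map each brand to a sparse vector: session index -> interaction count."""
    vectors = {}
    for i, brands in enumerate(sessions):
        for brand, count in Counter(brands).items():
            vectors.setdefault(brand, {})[i] = count
    return vectors


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def most_similar(brand, vectors, top=3):
    scores = [
        (other, cosine(vectors[brand], vec))
        for other, vec in vectors.items()
        if other != brand
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top]


vectors = brand_vectors(sessions)
print(most_similar("DAFITI SHOES", vectors))
```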

Filter cleanup analysis (sizes)

Sizes children’s clothes

Internal workshops

Conclusion:
● we leveraged third party knowledge (consulting) to do the analysis
● few [marketplace] products are creating a very bad user experience
● has some potential quickwins
● we need to align the best form in terms of architecture (first idea of DB
update might not be ideal) - product development support?
● What can we fix in registration process already?

AI awareness
Sales forecasting (train new model with learnings from blackfriday forecasting)
Price optimization
ng and Buying
Marketing allocation
Cancellation rate
Email click prediction (recipient selection, Markovien)
Customer segmentation / user profiles
Online recommendations
Search
Reinforcement learning
Survival analysis
Email recommendations (jetlore competition)
Image similarity
Delivery prediction
Delivery visualization
Image segmentation
Anticipatory shipping
Intelligent Sizing
NLP for chatbots and sentiment analysis
Looks, Image understanding and shoppable videos
VR
personalized discounts
