9. Simple mathematical law …
Sum of Bernoulli = Binomial
Positive response ratio: $p_i = \frac{1}{N} \sum_j Y_{ij}$
Large N: sum of Bernoulli -> Gaussian
(Figure panels: N = 10, N = 100)
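The following is a minimal sketch of the positive response ratio for one style. It also computes the standard error that the Gaussian (normal) approximation assigns to that ratio. The function names and the list-of-0/1 input are illustrative assumptions.

```python
import math

def positive_response_ratio(feedback):
    """Return p_i = sum_j Y_ij / N for a list of 0/1 feedback values."""
    n = len(feedback)
    if n == 0:
        raise ValueError("need at least one interaction")
    return sum(feedback) / n

def normal_approx_se(p, n):
    """Standard error of a Binomial proportion under the normal approximation."""
    return math.sqrt(p * (1.0 - p) / n)

ys = [1, 0, 1, 1]                      # four hypothetical interactions
p = positive_response_ratio(ys)
se = normal_approx_se(p, len(ys))
print(p, round(se, 4))                 # 0.75 0.2165
```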
10. … that can be very helpful …
(Figure panels: p(time) = 0.7 - 0.2 * time; p(time) = 0.5)
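One way to put the approximation to work is sketched below. It splits interactions into fixed-length time intervals and estimates p per interval with a normal-approximation 95% interval. Several details are illustrative choices, not the deck's code: the `(time, y)` tuple format, the interval length, and the 1.96 multiplier.

```python
import math
from collections import defaultdict

def binned_ratios(interactions, interval):
    """Group (time, y) pairs into intervals of the given length and return
    {bin_index: (p_hat, lower, upper)} using the normal approximation."""
    counts = defaultdict(lambda: [0, 0])   # bin -> [n, positives]
    for t, y in interactions:
        b = int(t // interval)
        counts[b][0] += 1
        counts[b][1] += y
    out = {}
    for b, (n, k) in sorted(counts.items()):
        p = k / n
        half = 1.96 * math.sqrt(p * (1 - p) / n)   # 95% normal-approx half-width
        out[b] = (p, p - half, p + half)
    return out

data = [(0.1, 1), (0.2, 1), (0.3, 0), (0.6, 0), (0.7, 1), (0.9, 0)]
for b, (p, lo, hi) in binned_ratios(data, 0.5).items():
    print(b, round(p, 3), round(lo, 3), round(hi, 3))
```

Each bin here holds only three interactions. As a result, the interval is very wide and even extends past 1 for the first bin.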
11. … but certain assumptions break
Length of every time interval
● Large:
○ Poor temporal resolution
○ p no longer constant
● Small:
○ Few interactions, so the normal approximation breaks
○ Slower computation
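The simulations in this deck use 3 interactions per style per day. At that rate, the number of interactions per interval grows linearly with the interval length. The sketch below applies a common textbook rule of thumb, which accepts the normal approximation only when N·p ≥ 5 and N·(1 - p) ≥ 5. That rule is an assumption here, not the deck's criterion.

```python
def normal_approx_ok(n, p, threshold=5):
    """Rule-of-thumb check: n*p and n*(1-p) must both reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

PER_DAY = 3  # interactions per style per day, as in the simulations
for days in (1, 3, 7, 30):
    n = PER_DAY * days
    print(f"{days:>2} days: N = {n:>2}, normal approx ok at p=0.6? "
          f"{normal_approx_ok(n, 0.6)}")
```

Under this rule, daily or 3-day intervals are too short at p = 0.6. Weekly intervals pass, at the cost of temporal resolution.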
13. Categorical aggregation
Bernoulli: feedback $Y_{ij} \in \{0, 1\}$
Binomial: counts $(N_0, N_1)$, e.g. (34, 27)
logit(p) ~ time + style_color + style_group
Group by each feature to make sure that p is approximately constant within each
Binomial draw. Because the Binomial likelihood does not rely on the normal approximation, time can now be aggregated to an arbitrarily small time scale.
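Below is a minimal sketch of the grouping step. It assumes each interaction is a tuple `(time_bin, style_color, style_group, y)`; the tuple layout and the function name are hypothetical. Every distinct feature combination becomes one Binomial row (N0, N1), which is the kind of input a Binomial GLM or GLMM takes.

```python
from collections import Counter

def aggregate_to_binomial(rows):
    """Collapse Bernoulli rows (time, style_color, style_group, y) into
    Binomial cells keyed by every feature, returning {key: (N0, N1)}."""
    n = Counter()    # interactions per cell
    n1 = Counter()   # positive interactions per cell
    for time, color, group, y in rows:
        key = (time, color, group)
        n[key] += 1
        n1[key] += y
    return {k: (n[k] - n1[k], n1[k]) for k in n}

rows = [
    (0, "red", "dress", 1),
    (0, "red", "dress", 0),
    (0, "red", "dress", 1),
    (0, "blue", "dress", 0),
    (1, "red", "dress", 1),
]
print(aggregate_to_binomial(rows))
```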
14. Statistical methods with Bernoulli variables
Logistic Regression Models
● Pros:
○ Simple, flexible
○ Well-studied technique
● Cons:
○ Large dataset
○ Large number of features
○ Scalability problems
Generalized Linear Mixed Models
● Pros:
○ Smaller dataset
○ Faster computation
○ Natural regularization that
helps with non-uniform data
● Cons:
○ Requires a more complex
ETL and analysis process.
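This sketch shows why the smaller, aggregated dataset loses nothing for a model whose p is constant within each cell. The Binomial log-likelihood of a cell equals the sum of its Bernoulli log-likelihoods plus log C(N, N1), a constant that does not depend on p. Both forms therefore yield the same maximum-likelihood estimates. The example uses the (34, 27) cell from the aggregation slide, read as N0 = 34 and N1 = 27.

```python
import math

def bernoulli_loglik(ys, p):
    """Sum of per-interaction Bernoulli log-likelihoods (one row per interaction)."""
    return sum(math.log(p) if y else math.log(1 - p) for y in ys)

def binomial_loglik(n0, n1, p):
    """Log-likelihood of one aggregated Binomial row (N0, N1)."""
    n = n0 + n1
    return math.log(math.comb(n, n1)) + n1 * math.log(p) + n0 * math.log(1 - p)

n0, n1 = 34, 27                         # the example cell from the aggregation slide
ys = [0] * n0 + [1] * n1                # the same data as 61 Bernoulli rows
for p in (0.3, 0.44, 0.6):
    diff = binomial_loglik(n0, n1, p) - bernoulli_loglik(ys, p)
    print(p, round(diff, 6))            # same constant, log C(61, 27), for every p
```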
15. Simulating linear fashion trends
1000 random styles $S_i$ in inventory -> interacting with a large, uniform set of clients -> 3 interactions per day for two years with probability $p_i$
$p_i = p_{i,0} + m_i \cdot \mathrm{time}$, with $p_{i,0} \sim \mathcal{N}(0.6, 0.1)$ and $m_i \sim \mathcal{U}(-0.1, 0.1)$
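The simulation can be sketched with the standard library as shown below. Three assumptions are not stated in the deck: time is measured in years, the 0.1 in N(0.6, 0.1) is a standard deviation, and p is clipped to [0, 1]. The function name is hypothetical.

```python
import random

def simulate_linear_trends(n_styles=1000, days=730, per_day=3, seed=0):
    """Simulate daily positive-feedback counts for styles with linear trends.

    Assumptions: time in years, N(0.6, 0.1) uses 0.1 as the standard
    deviation, and p is clipped to [0, 1].
    Returns (params, counts): params[i] = (p_i0, m_i) and
    counts[i][d] = positives for style i on day d (out of per_day).
    """
    rng = random.Random(seed)
    params, counts = [], []
    for _ in range(n_styles):
        p0 = rng.gauss(0.6, 0.1)
        m = rng.uniform(-0.1, 0.1)
        params.append((p0, m))
        daily = []
        for d in range(days):
            t = d / 365.0
            p = min(1.0, max(0.0, p0 + m * t))
            # one Binomial(per_day, p) draw as a sum of Bernoulli draws
            daily.append(sum(rng.random() < p for _ in range(per_day)))
        counts.append(daily)
    return params, counts

params, counts = simulate_linear_trends(n_styles=5, seed=42)
print(len(counts), len(counts[0]), sum(counts[0]))
```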
16. A GLMM linear trend classifier
logit(p) ~ X + Z +
X and Z have an offset and time as features
There is a slope per style id, with 95% CI
● Out of fashion: CI all negative
● Trending: CI all positive
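In the deck, the model is a GLMM: each style's slope is a random effect, and its 95% CI comes from that model. Fitting a GLMM from scratch is beyond a short example, so this sketch swaps in a simpler technique. It fits a separate (no-pooling) Binomial logistic regression per style by Newton-Raphson, then builds a Wald 95% CI for the slope. The sketch illustrates the classification rule, not the deck's estimator. Unlike the GLMM, it does not shrink noisy styles toward the population. The function names and the third label "no clear trend" (for CIs that straddle 0) are hypothetical.

```python
import math

def _score_and_info(a, b, rows):
    """Score vector and Fisher information of logit(p) = a + b*t for Binomial rows."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for t, n0, n1 in rows:
        n = n0 + n1
        p = 1.0 / (1.0 + math.exp(-(a + b * t)))
        r, w = n1 - n * p, n * p * (1.0 - p)
        g0, g1 = g0 + r, g1 + r * t
        h00, h01, h11 = h00 + w, h01 + w * t, h11 + w * t * t
    return (g0, g1), (h00, h01, h11)

def fit_slope(rows, iters=25):
    """Newton-Raphson fit; returns (a, b, se_b) for rows of (time, N0, N1)."""
    a = b = 0.0
    for _ in range(iters):
        (g0, g1), (h00, h01, h11) = _score_and_info(a, b, rows)
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det   # Newton step via the explicit 2x2 inverse
        b += (h00 * g1 - h01 * g0) / det
    _, (h00, h01, h11) = _score_and_info(a, b, rows)
    se_b = math.sqrt(h00 / (h00 * h11 - h01 * h01))   # slope entry of I^{-1}
    return a, b, se_b

def classify_trend(b, se_b, z=1.96):
    """Label a style from the 95% Wald CI of its time slope."""
    lower, upper = b - z * se_b, b + z * se_b
    if upper < 0:
        return "out of fashion"
    if lower > 0:
        return "trending"
    return "no clear trend"

# Deterministic example: p(time) = 0.7 - 0.2 * time, 1000 interactions per time point
times = [d / 10 for d in range(11)]
declining = [(t, 1000 - round(1000 * (0.7 - 0.2 * t)), round(1000 * (0.7 - 0.2 * t)))
             for t in times]
a, b, se_b = fit_slope(declining)
print(round(b, 3), round(se_b, 3), classify_trend(b, se_b))
```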
19. Simulating cyclical seasonal trends
1000 random styles $S_i$ in inventory -> interacting with a large, uniform set of clients -> 3 interactions per day for two years with probability $p_i$
$p_i = p_{i,0} + A_i \cos(2\pi(\mathrm{time} - t_0))$, with $p_{i,0} \sim \mathcal{N}(0.6, 0.1)$, $A_i \sim \mathcal{U}(0, 0.1)$ and $t_0 \sim \mathcal{U}(0, 1)$
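Here is a sketch of the seasonal simulation for one style at a time. It assumes the cosine argument is 2π(time - t0) with time in years, which gives a yearly cycle with phase t0. It further assumes the 0.1 in N(0.6, 0.1) is a standard deviation and that p is clipped to [0, 1]. The function names are hypothetical.

```python
import math
import random

def seasonal_p(p0, amp, t0, t):
    """p_i(t) = p_i0 + A_i * cos(2*pi*(t - t0)), with t in years (assumption)."""
    return p0 + amp * math.cos(2 * math.pi * (t - t0))

def simulate_seasonal_style(rng, days=730, per_day=3):
    """Draw one style's parameters and its daily positive-feedback counts.
    Assumes N(0.6, 0.1) uses 0.1 as the standard deviation; p is clipped to [0, 1]."""
    p0 = rng.gauss(0.6, 0.1)
    amp = rng.uniform(0.0, 0.1)
    t0 = rng.uniform(0.0, 1.0)
    counts = []
    for d in range(days):
        p = min(1.0, max(0.0, seasonal_p(p0, amp, t0, d / 365.0)))
        counts.append(sum(rng.random() < p for _ in range(per_day)))
    return (p0, amp, t0), counts

rng = random.Random(7)
styles = [simulate_seasonal_style(rng) for _ in range(10)]  # the deck simulates 1000
```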
23. Discovering cyclical seasonal trends
Thousands of real styles $S_i$ in inventory -> interacting with a large, uniform set of clients
Use the style feedback as a probe for seasonality
25. Conclusions
● Defining client feedback as a binary variable simplifies the
statistical analysis of trends
● The normal approximation is a useful tool but lacks the right
level of flexibility, and its assumptions are easily broken.
● Binomial data can be fit with generalized linear mixed-effects
models, and the random-effect coefficients can be used to
classify trends on styles.
● Our application to Stitch Fix data shows that the method
has real business applications.
28. Linearizing the cosine term
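$p_i = p_{i,0} + A_i \cos(2\pi(\mathrm{time} - t_0))$
$\cos(\alpha - \beta) = \cos\alpha \cos\beta + \sin\alpha \sin\beta$
$p_i = p_{i,0} + B_i \cos(2\pi \cdot \mathrm{time}) + C_i \sin(2\pi \cdot \mathrm{time})$, with $B_i = A_i \cos(2\pi t_0)$ and $C_i = A_i \sin(2\pi t_0)$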
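A quick numerical check of the linearization follows. It assumes a yearly cycle 2π(time - t0) and takes B_i = A_i cos(2π t0) and C_i = A_i sin(2π t0), which follow from the angle-difference identity with α = 2π·time and β = 2π t0. The inverse map recovers amplitude and phase from the two linear coefficients: A_i = sqrt(B_i² + C_i²) and t0 = atan2(C_i, B_i)/(2π) mod 1. The function names are illustrative.

```python
import math

def to_linear(amp, t0):
    """(A, t0) -> (B, C) so that A*cos(2*pi*(t - t0)) = B*cos(2*pi*t) + C*sin(2*pi*t)."""
    phase = 2 * math.pi * t0
    return amp * math.cos(phase), amp * math.sin(phase)

def from_linear(b, c):
    """(B, C) -> (A, t0), with t0 wrapped into [0, 1)."""
    return math.hypot(b, c), (math.atan2(c, b) / (2 * math.pi)) % 1.0

amp, t0 = 0.08, 0.3
b, c = to_linear(amp, t0)
for t in (0.0, 0.17, 0.5, 0.91):
    lhs = amp * math.cos(2 * math.pi * (t - t0))
    rhs = b * math.cos(2 * math.pi * t) + c * math.sin(2 * math.pi * t)
    assert math.isclose(lhs, rhs, abs_tol=1e-12)
print(from_linear(b, c))   # approximately (0.08, 0.3)
```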
29. A GLMM seasonal trend classifier
logit(p) ~ X + Z +
X and Z have an offset, $\cos(2\pi \cdot \mathrm{time})$
and $\sin(2\pi \cdot \mathrm{time})$ as features
There are two temporal coefficients
per style id, with 95% CI
● Non-seasonal: both CIs compatible with 0 (each contains 0)
● Seasonal: any other case
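The seasonal rule can be sketched as follows, given each style's two temporal coefficients and their standard errors. The sketch assumes Wald intervals (estimate ± 1.96·se); the deck does not say how its 95% CIs are computed. The input numbers below are hypothetical, not Stitch Fix results.

```python
import math

def ci(est, se, z=1.96):
    """Two-sided 95% Wald interval (est - z*se, est + z*se)."""
    return est - z * se, est + z * se

def classify_seasonality(b, se_b, c, se_c):
    """Non-seasonal if both the cosine and sine CIs contain 0, seasonal otherwise."""
    both_contain_zero = all(lo <= 0.0 <= hi for lo, hi in (ci(b, se_b), ci(c, se_c)))
    return "non-seasonal" if both_contain_zero else "seasonal"

# Hypothetical per-style estimates (B_i, se_B, C_i, se_C)
print(classify_seasonality(0.01, 0.02, -0.015, 0.02))   # non-seasonal
print(classify_seasonality(0.01, 0.02, 0.09, 0.02))     # seasonal
```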