Lightning: large-scale machine learning in Python
1. LIGHTNING, A LIBRARY FOR LARGE-SCALE MACHINE LEARNING IN PYTHON
Fabian Pedregosa (1), Mathieu Blondel (2)
(1) Chaire Havas-Dauphine / INRIA, Paris, France
(2) NTT Communication Science Laboratories, Kyoto, Japan
2. SCIKIT-LEARN: WITH GREAT CODE COMES GREAT RESPONSIBILITY
Figure: number of lines of code in scikit-learn.
Very selective for new algorithms/models.
3. LIGHTNING
Incorporate recent progress in large-scale optimization.
scikit-learn compatible.
scalable on large datasets.
support for dense and sparse input.
emphasis on structured sparsity penalties.
dependencies = Python + Cython + scikit-learn.
5. FROM LARGE DATA TO LARGE OPTIMIZATION
Big data comes in different flavors.
Figure: DATA matrix with n rows and p columns.
Large sample: computer vision, advertising, etc.
Large dimension: biology, neuroscience, etc.
6. LEARNING FROM LARGE SAMPLES
Usual methods (gradient descent, BFGS, etc.):
Pass through the data at each iteration.
Prohibitive for large datasets.
Back to simple methods:
Stochastic gradient descent (Robbins and Monro, 1951).
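To make the contrast with full-pass methods concrete, here is a minimal sketch of stochastic gradient descent on a least-squares problem. It is an illustration only, not lightning's implementation: the function name `sgd_least_squares`, the toy data, and the step size are assumptions chosen for this example. Each update touches a single row of the data instead of the whole dataset.

```python
import random


def sgd_least_squares(X, y, step=0.01, n_epochs=50, seed=0):
    """Plain SGD on the loss 0.5 * (x_i . w - y_i)^2, one sample per update."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_epochs):
        for _ in range(n):
            i = rng.randrange(n)  # pick one sample uniformly at random
            residual = sum(X[i][j] * w[j] for j in range(p)) - y[i]
            for j in range(p):
                # gradient of the i-th loss term is residual * x_i
                w[j] -= step * residual * X[i][j]
    return w


# Toy noiseless data generated from the (hypothetical) weights [2.0, -1.0].
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [2.0 * a - 1.0 * b for a, b in X]

w_sgd = sgd_least_squares(X, y)
```

The cost of one update is proportional to p, independent of n, which is what makes the method attractive when n is large.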
7. LEARNING FROM LARGE SAMPLES
lightning example, n = 100,000
In the last 5 years, a flurry of new stochastic methods:
Stochastic Variance-Reduced Gradient (SVRG)
Stochastic Dual Coordinate Ascent (SDCA)
Stochastic Average Gradient (SAG/SAGA)
They are all in lightning!
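As an illustration of what these variance-reduced methods do, here is a minimal sketch of the SAGA update on a least-squares problem. It is not lightning's implementation: the function name `saga_least_squares`, the toy data, and the step size 1/(3L) (with L the largest squared row norm) are assumptions made for this example. The method remembers the last gradient seen for each sample and uses the running average of those stored gradients to correct each stochastic step.

```python
import random


def saga_least_squares(X, y, step, n_epochs=100, seed=0):
    """SAGA sketch for the average of 0.5 * (x_i . w - y_i)^2 over all samples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = [0.0] * p
    memory = [0.0] * n    # last scalar derivative (x_i . w - y_i) seen per sample
    grad_avg = [0.0] * p  # average of the stored per-sample gradients
    for _ in range(n_epochs * n):
        i = rng.randrange(n)
        fresh = sum(X[i][j] * w[j] for j in range(p)) - y[i]
        delta = fresh - memory[i]
        for j in range(p):
            # direction = fresh gradient - stale gradient + average of stored ones
            w[j] -= step * (delta * X[i][j] + grad_avg[j])
            # refresh the average once it has been used for this coordinate
            grad_avg[j] += delta * X[i][j] / n
        memory[i] = fresh
    return w


# Toy noisy data generated from the (hypothetical) weights [2.0, -1.0].
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
y = [2.0 * a - 1.0 * b + 0.1 * data_rng.gauss(0, 1) for a, b in X]

L = max(sum(v * v for v in row) for row in X)  # largest squared row norm
w_saga = saga_least_squares(X, y, step=1.0 / (3.0 * L))

# Full gradient at the returned point; it should be close to zero.
full_grad = [
    sum((sum(X[i][k] * w_saga[k] for k in range(2)) - y[i]) * X[i][j]
        for i in range(100)) / 100
    for j in range(2)
]
```

Unlike plain stochastic gradient descent, this scheme can use a constant step size and still drive the full gradient towards zero on noisy data, at the price of storing one scalar per sample.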
8. LEARNING FROM LARGE FEATURES
Iterate through the columns.
Coordinate Descent-like algorithms.
Very efficient for sparse models.
(Blondel et al. 2013), multiclass classification with group-lasso penalty
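A minimal sketch of the coordinate-wise idea, using the Lasso as the example problem. This is an illustration under stated assumptions, not lightning's solver: the names `soft_threshold` and `lasso_cd`, the objective scaling (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, and the toy data are chosen for this example. Each inner step updates a single column's weight in closed form and keeps the residual up to date.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (proximal operator of |.|)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw, maintained incrementally
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # correlation of column j with the residual that excludes w[j]
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, alpha) / col_sq[j]
            diff = w_new - w[j]
            if diff != 0.0:
                for i in range(n):
                    residual[i] -= diff * X[i][j]
                w[j] = w_new
    return w


# Toy design with mutually orthogonal columns (hypothetical data).
X = [[1.0, 1.0, 1.0],
     [1.0, -1.0, 1.0],
     [1.0, 1.0, -1.0],
     [1.0, -1.0, -1.0]]
y = [3.1, -2.9, 2.9, -3.1]  # equals 3 * column 1 + 0.1 * column 2

w_lasso = lasso_cd(X, y, alpha=0.5)
```

On this toy problem the weak third feature and the unused first feature are set exactly to zero, while the strong second feature is kept with a shrunken weight. Coordinates that stay at zero need no residual update, which is why the approach is cheap for sparse models.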
9. STRUCTURED SPARSITY
There's so much more than the Lasso ...
Group sparse penalty.
Total variation.
Trace norm (low rank).
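To show how a group sparse penalty differs from the Lasso, here is a minimal sketch of its proximal operator (block soft-thresholding), the building block that proximal methods apply after each gradient step. It is an illustration only, not lightning's code: the name `prox_group_lasso`, the group layout, and the threshold are assumptions for this example. A whole group of coefficients is either shrunk together or set to zero together.

```python
import math


def prox_group_lasso(w, groups, threshold):
    """Block soft-thresholding: shrink each group's Euclidean norm by threshold."""
    out = list(w)
    for indices in groups:
        norm = math.sqrt(sum(w[j] ** 2 for j in indices))
        # groups whose norm is below the threshold are zeroed as a block
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0.0 else 0.0
        for j in indices:
            out[j] = scale * w[j]
    return out


# Two hypothetical groups: one strong (norm 5.0), one weak (norm 0.5).
w = [3.0, 4.0, 0.3, -0.4]
groups = [[0, 1], [2, 3]]
shrunk = prox_group_lasso(w, groups, threshold=1.0)
```

Here the strong group is scaled by 0.8 and the weak group is removed entirely, whereas the Lasso would threshold each coefficient independently.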
10. API
Similarities and differences with scikit-learn
scikit-learn:
LogisticRegression(penalty='l1', solver='liblinear')
LogisticRegression: loss function
solver='liblinear': algorithm
lightning:
CDClassifier(penalty='l1', loss='log')
CDClassifier: algorithm
loss='log': loss function
API based on algorithms, not models.
11. EXTENSIBILITY
Typical loss and penalties available.
Possible to pass custom loss or penalty function
clf = FistaClassifier(
    loss=my_loss,
    penalty=my_penalty)
(available for Fista* and SAGA*)
13. SCIKIT-LEARN-CONTRIB
lightning is just the beginning.
Welcome projects that are:
scikit-learn compatible.
Documented.
Test coverage > 80%.
14. THANKS FOR YOUR ATTENTION
http://contrib.scikit-learn.org/lightning/
(We're hiring!)