LIGHTNING, A LIBRARY FOR
LARGE-SCALE MACHINE
LEARNING IN PYTHON
Fabian Pedregosa (1), Mathieu Blondel (2)
(1) Chaire Havas-Dauphine / INRIA, Paris, France
(2) NTT Communication Science Laboratories, Kyoto, Japan
SCIKIT-LEARN: WITH GREAT CODE
COMES GREAT RESPONSIBILITY
[Figure: number of lines of code in scikit-learn]
Very selective for new algorithms/models.
LIGHTNING
Incorporate recent progress in large-scale optimization.
scikit-learn compatible.
scalable on large datasets.
support for dense and sparse input.
emphasis on structured sparsity penalties.
dependencies = Python + Cython + scikit-learn.
SCIKIT-LEARN COMPATIBLE
Mix lightning with scikit-learn Pipeline, GridSearchCV, etc.
FROM LARGE DATA TO LARGE
OPTIMIZATION
Big data comes in different flavors.
[Figure: data matrix with n rows (samples) and p columns (features)]
Large sample (large n): computer vision, advertising, etc.
Large dimension (large p): biology, neuroscience, etc.
LEARNING FROM LARGE SAMPLES
Usual methods (gradient descent, BFGS, etc.):
Pass through the data at each iteration.
Prohibitive for large datasets.
Back to simple methods:
Stochastic gradient descent (Robbins and Monro, 1951).
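To make the contrast with full-pass methods concrete, here is a minimal sketch of stochastic gradient descent on a least-squares problem, written in plain Python. It is not lightning's implementation; the function name `sgd_least_squares`, the step size, and the toy data are assumptions chosen for illustration. Each update touches a single sample instead of the whole dataset.

```python
import random

def sgd_least_squares(X, y, step=0.1, n_epochs=50, seed=0):
    """Plain SGD on the per-sample loss 0.5 * (x_i . w - y_i)^2."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_epochs * n):
        i = rng.randrange(n)  # draw one sample uniformly at random
        residual = sum(X[i][j] * w[j] for j in range(p)) - y[i]
        for j in range(p):
            # gradient of the i-th term only, not of the full objective
            w[j] -= step * residual * X[i][j]
    return w

# Toy noiseless data generated from the weights [2, -1] (illustrative only).
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [2.0 * a - 1.0 * b for a, b in X]
w_sgd = sgd_least_squares(X, y)
```

On noiseless data like this a constant step is enough; with noisy targets, plain SGD with a constant step typically stalls at a noise floor, which is one motivation for the variance-reduced methods on the next slide.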
LEARNING FROM LARGE SAMPLES
lightning example, n = 100,000
In the last 5 years, a flurry of new stochastic methods:
Stochastic Variance-Reduced Gradient (SVRG)
Stochastic Dual Coordinate Ascent (SDCA)
Stochastic Average Gradient (SAG/SAGA)
They are all in lightning!
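As an illustration of the variance-reduction idea behind these methods, here is a hedged sketch of SAGA on a least-squares problem in plain Python. It is not lightning's code; `saga_least_squares`, `full_gradient`, the step size 1/(3L), and the toy data are assumptions for this example. The key ingredient is a table that remembers the last gradient seen for each sample, plus the running average of that table.

```python
import random

def saga_least_squares(X, y, n_epochs=200, seed=0):
    """SAGA on the average of 0.5 * (x_i . w - y_i)^2."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    # Step 1 / (3 L), with L the largest squared row norm.
    step = 1.0 / (3.0 * max(sum(v * v for v in row) for row in X))
    w = [0.0] * p
    # The i-th gradient is residual_i * x_i, so storing the scalar
    # residual is enough to remember the whole gradient.
    memory = [-yi for yi in y]  # residuals at the starting point w = 0
    grad_avg = [sum(memory[i] * X[i][j] for i in range(n)) / n for j in range(p)]
    for _ in range(n_epochs * n):
        i = rng.randrange(n)
        residual = sum(X[i][j] * w[j] for j in range(p)) - y[i]
        delta = residual - memory[i]
        for j in range(p):
            # new gradient - stored gradient + average of stored gradients
            w[j] -= step * (delta * X[i][j] + grad_avg[j])
            grad_avg[j] += delta * X[i][j] / n  # keep the average in sync
        memory[i] = residual
    return w

def full_gradient(X, y, w):
    """Gradient of the full objective, used here only to check convergence."""
    n, p = len(X), len(X[0])
    g = [0.0] * p
    for xi, yi in zip(X, y):
        r = sum(a * b for a, b in zip(xi, w)) - yi
        for j in range(p):
            g[j] += r * xi[j] / n
    return g

# Toy noisy data (illustrative only).
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(100)]
y = [2.0 * a - 1.0 * b + data_rng.gauss(0.0, 0.1) for a, b in X]
w_saga = saga_least_squares(X, y)
```

Unlike constant-step SGD, the correction term lets the iterates settle on the exact minimizer even with noisy targets, at the cost of storing one scalar per sample.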
LEARNING FROM LARGE FEATURES
Iterate through the columns.
Coordinate Descent-like algorithms.
Very efficient for sparse models.
(Blondel et al. 2013), multiclass classification with group-lasso penalty
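The column-wise idea can be sketched with cyclic coordinate descent for the Lasso in plain Python. This is a textbook version under stated assumptions, not the algorithm of Blondel et al. or lightning's CDClassifier; the names `lasso_cd` and `soft_threshold`, the objective scaling (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, and the toy design are all illustrative.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |z|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent, one column (feature) at a time."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw, kept up to date; w starts at 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # empty column: coefficient stays at zero
            # correlation of column j with the residual that ignores feature j
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, alpha) / col_sq[j]
            if w_new != w[j]:
                diff = w_new - w[j]
                for i in range(n):
                    residual[i] -= diff * X[i][j]  # only column j is touched
                w[j] = w_new
    return w

# Toy design with orthogonal columns; y depends mostly on the first one.
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [3.05, 2.95, 3.05, 2.95]
w_lasso = lasso_cd(X, y, alpha=0.1)
```

Coefficients whose correlation with the residual stays below `alpha` are set exactly to zero and cost nothing to update afterwards, which is why this family of methods works well for sparse models.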
STRUCTURED SPARSITY
There's so much more than the Lasso ...
Group sparse penalty.
Total variation.
Trace norm (low rank).
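For the group sparse penalty, the basic building block in proximal solvers is block soft-thresholding. The sketch below is a plain-Python illustration under assumptions (the function name `prox_group_l2` and the group layout are hypothetical, and this is not lightning's API); total variation and trace norm need different operators and are not shown.

```python
import math

def prox_group_l2(w, groups, threshold):
    """Proximal operator of threshold * sum over groups of ||w_g||_2."""
    out = list(w)
    for g in groups:
        norm = math.sqrt(sum(w[j] ** 2 for j in g))
        # shrink the whole group toward zero, or remove it entirely
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        for j in g:
            out[j] = scale * w[j]
    return out

w = [3.0, 4.0, 0.1, -0.1]
groups = [[0, 1], [2, 3]]  # two groups of two coefficients each
shrunk = prox_group_l2(w, groups, threshold=1.0)
```

Here the first group (norm 5) is scaled down but kept, while the second group (norm below the threshold) is zeroed as a whole. The Lasso, by contrast, zeroes coefficients one at a time.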
API
Similarities and differences with scikit-learn
scikit-learn:
LogisticRegression(penalty='l1', solver='liblinear')
  LogisticRegression: loss function
  solver='liblinear': algorithm
lightning:
CDClassifier(penalty='l1', loss='log')
  CDClassifier: algorithm
  loss='log': loss function
API based on algorithms, not models.
EXTENSIBILITY
Typical loss and penalties available.
Possible to pass a custom loss or penalty function:
from lightning.classification import FistaClassifier
clf = FistaClassifier(
    loss=my_loss,
    penalty=my_penalty)
(available for Fista* and SAGA*)
FUTURE CHALLENGES
Parallel stochastic methods
(Leblond, Pedregosa, Lacoste-Julien 2016)
Out of core (scale beyond computer memory).
SCIKIT-LEARN-CONTRIB
lightning is just the beginning.
Welcome projects that are:
scikit-learn compatible.
Documented.
Test coverage > 80%.
THANKS FOR YOUR ATTENTION
http://contrib.scikit-learn.org/lightning/
(We're hiring!)
