Apache Spark MLlib
● What is Apache Spark?
● What is MLlib?
● Functionality
● Dependencies
● Books
● Ecosystem
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
Spark – What is it?
● Alternative to MapReduce for certain applications
● A low-latency cluster computing system
● For very large data sets
● Can be up to 100 times faster than MapReduce for some in-memory workloads
● Used with Hadoop / HDFS
● Uses in-memory cluster computing
● Memory access is faster than disk access
● Has APIs in Scala / Java / Python
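
To make the MapReduce comparison concrete, here is a minimal sketch of the map and reduce steps as a word count. It is plain Python using only the standard library, not the Spark API. The names map_phase and reduce_phase and the sample lines are made up for illustration. The point it tries to show is that the mapped data stays in memory, so a second computation can reuse it without going back to disk.

```python
from collections import Counter
from functools import reduce


def map_phase(lines):
    """Map step: turn each line into a Counter of its words."""
    return [Counter(line.lower().split()) for line in lines]


def reduce_phase(partials):
    """Reduce step: merge the per-line counts into one total."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical sample input
lines = ["spark is fast", "spark uses memory", "memory is fast"]

# The mapped data is held in memory ...
partials = map_phase(lines)

# ... so two different computations can reuse it without re-reading the input
word_counts = reduce_phase(partials)
distinct_words_per_line = [len(c) for c in partials]

print(word_counts.most_common())
print(distinct_words_per_line)
```

A real Spark job would spread both steps across a cluster. This sketch only shows the shape of the computation on one machine.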
Spark MLlib – What is it?
● Spark Machine Learning Library
● Provided with the Spark install
● Code in Scala / Java / Python
● Contains two packages
– spark.mllib
– spark.ml (from V1.2)
● Provides common functionality
– classification, regression, clustering
– collaborative filtering, dimensionality reduction
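
As an illustration of what the clustering functionality computes, here is a small sketch of k-means (Lloyd's algorithm) on one-dimensional data. It is plain Python, not the MLlib API, which runs distributed and differs in detail. The function name kmeans_1d, the sample points and the starting centers are all assumptions chosen for the example.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D data: assign points, then recompute centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center)
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


# Hypothetical data with two obvious groups
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)
```

With this data the centers settle on the means of the two groups after the first pass, and the remaining iterations leave them unchanged.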
Spark MLlib – Functionality
● Basic statistics
● Classification and regression
● Collaborative filtering
● Clustering
● Dimensionality reduction
● Feature extraction and transformation
● Optimization
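
The regression and optimization items can be tied together with a short sketch: fitting a straight line by gradient descent on the mean squared error. This is plain Python for illustration only. MLlib's own optimizers are distributed and are not reproduced here. The function name fit_line, the learning rate, the step count and the sample data are assumptions.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every sample with the current parameters
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```

The learning rate is picked to suit this tiny data set. A larger value can make the updates diverge, so it would need tuning for other data.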
Spark MLlib – Dependencies
● NumPy for Python
● Breeze (linear algebra)
● netlib-java
● jblas
● gfortran runtime library
Available Books
● See our Hadoop book from Apress / Springer
– “Big Data Made Easy”
● Look out for our Apache Spark-based book
– from Packt in 2015
Spark Ecosystem
Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems

An introduction to Apache Spark MLlib
