A Programming Language/Toolkit Tour
                            Rory Winston




Monday, 27 February 2012
Agenda
                   • A quick overview and tour of:
                    • R
                    • Python
                    • Java/C++
                   • For data analysis/analytics applications
                   • Comparison
Purpose

                   • To give a feeling for the relative advantages
                           and disadvantages of each approach
                   • Understand the tradeoffs involved
                   • See some demos


R
                   •       R is a domain-specific language (DSL) for statistics
                           and data analysis
                   •       Functional-based language
                   •       Based on an earlier language called S
                   •       Core engine written in C
                   •       Open-source
                   •       Popularity has exploded in the last few years
                   •       Some commercial support


Pros
                   •       R is the de facto standard in statistical analysis tooling
                   •       Incredible range of functionality via contributed libraries
                   •       Powerful interactive analysis environment and
                           visualization tools
                   •       Large number of built-in datasets
                   •       Cross-platform
                   •       Broad user community
                   •       Wide range of resources (books, tutorials, papers)
                           available



Cons
                   • Performance limitations
                   • Single-threaded interpreter
                   • Language limitations and quirks
                   • Initial learning curve may be steep
                   • R gives you a lot of power, but assumes
                           you know how to use it!


Language Features
                   •       R is vectorized:
                           •   Loops are not required for many operations
                               (and are actually discouraged)
                   •       R is functional:
                           •   Functions can be passed around like other
                               variables
                   •       R integrates with a BLAS:
                           •   high-performance numerical operations
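A minimal sketch of the first two points, written in Python with NumPy rather than R so that it lines up with the comparison slides later in the deck. The names `prices` and `apply_twice` and the sample numbers are made up for illustration.

```python
import numpy as np

# Vectorized: one expression operates on the whole array, no explicit loop.
prices = np.array([100.0, 102.0, 101.0, 105.0])
returns_vectorized = prices[1:] / prices[:-1] - 1.0

# The equivalent explicit loop, which is more verbose and typically slower.
returns_loop = []
for i in range(1, len(prices)):
    returns_loop.append(prices[i] / prices[i - 1] - 1.0)

# Functional: a function is an ordinary value that can be passed around.
def apply_twice(f, x):
    """Apply f to x two times."""
    return f(f(x))

doubled_twice = apply_twice(lambda v: v * 2, np.array([1, 2, 3]))

print(returns_vectorized)
print(doubled_twice)
```

The vectorized form and the loop compute the same values; the difference is that the array expression hands the iteration to compiled code.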


Demo

                   • Console R
                   • R GUI
                   • RStudio


Tips

                   • Learn how to use ggplot2 (http://had.co.nz/ggplot2/)
                   • Consider using RStudio (http://www.rstudio.org)




Python

                   • Initially developed in the late 1980s
                   • Object-oriented / functional support
                   • Open-source
                   • Initially popular in web applications, now
                           popular across a number of domains



Pros
                   • Very readable, simple and clear syntax
                   • Well-supported (many libraries and
                           extensions)
                   • Easy to integrate with other languages (e.g.
                           C)
                   • Very efficient environment to develop in

Cons
                   • Language syntax is not universally popular
                   • In terms of analytics, many libraries are still
                           slightly immature
                   • Performance can be lacking (although there
                           are many options to tune it)
                   • Interpreter is effectively single-threaded

Python + Analytics
                   •       There are a number of excellent libraries available
                           for analytics applications:
                           •   NumPy + SciPy
                           •   matplotlib
                           •   pandas
                           •   scikits
                   •       Some packages (e.g. pandas) are designed to replicate
                           the ‘feel’ and functionality of analysis operations in R


NumPy + SciPy

                   • Using NumPy + SciPy + matplotlib provides
                           an experience similar to using an
                           interactive R/Matlab environment
                   • Supports vectorization and BLAS
                           integration
                   • Add ipython for more goodness

Tips

                   • Use ipython!
                   • Check out:
                    • http://pandas.pydata.org/
                    • http://statsmodels.sourceforge.net/
                    • http://scikit-learn.org

Comparisons
                       R                               Python (NumPy)
                       x <- 1:10                       x = arange(1, 11)
                       x <- seq(1, 2, .2)              x = arange(1, 2, .2)    # stops before 2, unlike seq
                       x <- seq(1, 2, length.out=15)   x = linspace(1, 2, 15)
                       M <- matrix(1:100, 10, 10)      M = arange(1, 101).reshape(10, 10)    # row-major fill; R fills by column
                       x[ x < 1.5 ]                    x[x < 1.5]
                       X <- cbind(a, b)                X = column_stack((a, b))
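The NumPy side of this comparison can be run as a short script. This is a sketch rather than part of the original slides: it uses explicit `np.` prefixes, and the variable names `x_open`, `x_closed`, `M_rows`, `M_cols` and the sample vectors `a` and `b` are assumptions added to show two places where the R and NumPy calls differ (endpoint handling and fill order).

```python
import numpy as np

x = np.arange(1, 11)                 # R: x <- 1:10

# R's seq(1, 2, .2) includes the endpoint 2; np.arange stops before it.
x_open = np.arange(1, 2, 0.2)        # 1.0 up to 1.8
x_closed = np.linspace(1, 2, 6)      # 1.0 up to 2.0, like seq(1, 2, .2)

x15 = np.linspace(1, 2, 15)          # R: seq(1, 2, length.out=15)

# R's matrix() fills column by column; order="F" reproduces that layout.
M_rows = np.arange(1, 101).reshape(10, 10)             # row-major fill
M_cols = np.arange(1, 101).reshape(10, 10, order="F")  # like matrix(1:100, 10, 10)

subset = x15[x15 < 1.5]              # R: x[x < 1.5]

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
X = np.column_stack((a, b))          # R: cbind(a, b)

print(X)
```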




Java/C++

                   • The ultimate in power/flexibility
                   • Also the ultimate in development time and
                           effort
                   • Let's just look at C++ briefly


C++
                   •       Old but still very popular
                   •       Just had a revamp (C++11, was C++0x)
                   •       Mostly competes with Java on the server side
                   •       Everything else (JVM, R, Python) is written in C/C++
                   •       Both R and Python provide easy ways to interface
                           with C/C++ code
                           •   This is used a lot


Pros

                   • Flexibility
                   • Lots of libraries available
                   • Control of resources for performance-critical
                           apps (e.g. memory)
                   • C++11 adds a lot of nice stuff (finally)

Cons
                   •       Lots of effort
                   •       Lots of hidden traps for the unwary
                   •       Initial experience may be a large productivity
                           hit
                   •       Effort in porting between systems
                   •       There is “modern” C++ (which is actually
                           pretty nice) and everything else (which isn’t so
                           nice)


Examples
                   •       Let's look at a sample library
                   •       This one is called Armadillo (http://arma.sourceforge.net/)
                   •       Developed in Australia (NICTA / Univ.
                           Queensland)
                   •       Contains functions for numerical applications
                           and some statistical functions
                   •       Modern, efficient use of C++


Armadillo

                   • Armadillo supports vectorized operations
                   • Also integrates with a BLAS
                   • Example (see console)


Simple Example

                   • Using the Box-Jenkins airline passenger
                           data
                   • Classic dataset
                   • 12 years of monthly airline passenger
                           observations (144 in all)



Passenger Dataset




Linear Model
                • We will use a simple linear model (explains
                           85% of the variance of this data)


                                        $$Ax = b, \qquad A = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ 1 & t_3 \\ \vdots & \vdots \end{pmatrix}$$
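A minimal sketch of solving Ax = b by least squares with NumPy. The data here is a synthetic stand-in (a made-up linear trend plus noise over 144 time steps), not the airline passenger series, so the fitted coefficients and R² it produces say nothing about the 85% figure quoted above. The names `t`, `b`, `fitted` and `r_squared` are illustrative.

```python
import numpy as np

# Hypothetical stand-in data: 144 monthly points, linear trend plus noise.
rng = np.random.default_rng(0)
t = np.arange(1, 145, dtype=float)
b = 90.0 + 2.5 * t + rng.normal(scale=20.0, size=t.size)

# Design matrix A: a column of ones (intercept) and a column of time indices.
A = np.column_stack((np.ones_like(t), t))

# Solve Ax = b in the least-squares sense; x holds (intercept, slope).
x, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)

# R^2: the fraction of the variance in b explained by the fitted line.
fitted = A @ x
r_squared = 1.0 - np.sum((b - fitted) ** 2) / np.sum((b - b.mean()) ** 2)

print(x, r_squared)
```

The system is overdetermined (144 equations, 2 unknowns), so there is generally no exact solution; `lstsq` returns the x that minimizes the squared residual.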


Conclusion
                   • Use the toolkit that’s most appropriate for
                           you
                           • A common approach is to use e.g. R for
                             prototyping and model selection and (if
                             required) switch to a higher-performance
                             implementation for production
                   • If you have time, learn all of them!
Language Map
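                 [Figure: languages arranged from dynamic typing (left) to static typing (right), with axis labels "Interactivity" and "Performance, complexity". Left: R, Octave. Centre: Python, Ruby. Right: Java, C/C++.]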




Resources





An Analytics Toolkit Tour

  • 1. A Programming Language/Toolkit Tour Rory Winston Monday, 27 February 2012
  • 2. Agenda • A quick overview and tour of: • R • Python • Java/C++ • For data analysis/analytics applications • Comparison
  • 3. Purpose • To give a feeling for the relative advantages and disadvantages of each approach • Understand the tradeoffs involved • See some demos
  • 4. R • R is a domain-specific language (DSL) for statistics and data analysis • Functional-based language • Based on an earlier language called S • Core engine written in C • Open-source • Popularity has exploded in the last few years • Some commercial support
  • 5. Pros • R is the de facto standard in statistical analysis tooling • Incredible range of functionality via contributed libraries • Powerful interactive analysis environment and visualization tools • Large number of built-in datasets • Cross-platform • Broad user community • Wide range of resources (books, tutorials, papers) available
  • 6. Cons • Performance limitations • Single-threaded interpreter • Language limitations and quirks • Initial learning curve may be steep • R gives you a lot of power, but assumes you know how to use it!
  • 7. Language Features • R is vectorized: • Loops are not required for many operations (and are actually discouraged) • R is functional: • Functions can be passed around like other variables • R integrates with a BLAS: • High-performance numerical operations
  • 8. Demo • Console R • R GUI • RStudio
  • 9. Tips • Learn how to use ggplot2 (http://had.co.nz/ggplot2/) • Consider using RStudio (http://www.rstudio.org)
  • 10. Python • Initially developed in the late 1980s • Object-oriented / functional support • Open-source • Initially popular in web applications, now popular across a number of domains
  • 11. Pros • Very readable, simple and clear syntax • Well-supported (many libraries and extensions) • Easy to integrate with other languages (e.g. C) • Very efficient environment to develop in
  • 12. Cons • Language syntax is not universally popular • In terms of analytics, many libraries are still slightly immature • Performance can be lacking (although there are many options to tune it) • Interpreter is effectively single-threaded
  • 13. Python + Analytics • There are a number of excellent libraries available for analytics applications: • NumPy + SciPy • matplotlib • pandas • scikits • Some packages (e.g. pandas) are designed to replicate the ‘feel’ and functionality of analysis operations in R
  • 14. NumPy + SciPy • Using NumPy + SciPy + matplotlib provides an experience similar to using an interactive R/Matlab environment • Supports vectorization and BLAS integration • Add ipython for more goodness
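
As a rough illustration of the vectorization and BLAS points on the NumPy + SciPy slide, the sketch below computes the same result with an explicit loop and with a whole-array expression, then forms a matrix product. The array sizes, the seed and the variable names are arbitrary choices made for this example and do not come from the talk; whether the product is actually dispatched to a BLAS depends on how the local NumPy was built.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded so the run is repeatable
x = rng.standard_normal(1000)

# Explicit loop: one Python-level operation per element.
looped = np.empty_like(x)
for i in range(len(x)):
    looped[i] = 2.0 * x[i] + 1.0

# Vectorized: the same arithmetic expressed on the whole array at once.
vectorized = 2.0 * x + 1.0

# Matrix product; NumPy typically hands this to the BLAS it was built against.
A = rng.standard_normal((50, 50))
B = rng.standard_normal((50, 50))
C = A @ B

print(np.allclose(looped, vectorized), C.shape)
```

The vectorized line is both shorter and usually much faster, which is the same argument the R slides make against explicit loops.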
  • 15. Tips • Use ipython! • Check out: • http://pandas.pydata.org/ • http://statsmodels.sourceforge.net/ • http://scikit-learn.org
  • 16. Comparisons (R on the left, NumPy on the right)
        x <- 1:10                        x = arange(1, 11)
        x <- seq(1, 2, .2)               x = arange(1, 2, .2)
        x <- seq(1, 2, length.out=15)    x = linspace(1, 2, 15)
        M <- matrix(1:100, 10, 10)       M = arange(1, 101).reshape(10, 10)
        x[x < 1.5]                       x[x < 1.5]
        X <- cbind(a, b)                 X = column_stack((a, b))
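
The R/NumPy pairs on the Comparisons slide are close but not always exact equivalents, and a runnable sketch makes the differences easier to see. The version below assumes NumPy is imported as `np`; the names `grid`, `steps`, `M_rows` and `M_cols` are introduced only for this illustration. Two points are worth checking by hand: `arange` leaves out its stop value where R's `seq` and `:` include it, and `reshape` fills row by row where R's `matrix` fills column by column.

```python
import numpy as np

# R: x <- 1:10  (both ends included)
x = np.arange(1, 11)            # the stop value 11 is excluded

# R: seq(1, 2, length.out=15)
grid = np.linspace(1, 2, 15)    # both endpoints included

# R: seq(1, 2, .2) includes 2.0; arange normally stops short of it,
# so linspace with 6 points is the closer match.
steps = np.linspace(1, 2, 6)

# R: matrix(1:100, 10, 10) fills column by column.
M_rows = np.arange(1, 101).reshape(10, 10)             # row-major fill
M_cols = np.arange(1, 101).reshape(10, 10, order="F")  # column-major, as in R

# R: x[x < 1.5]
small = grid[grid < 1.5]

# R: cbind(a, b)
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
X = np.column_stack((a, b))

print(x[0], x[-1], len(grid), X.shape)
```

If the goal is to reproduce R output exactly, `M_cols` (or `M_rows.T`) is the matrix that matches `matrix(1:100, 10, 10)`.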
  • 17. Java/C++ • The ultimate in power/flexibility • Also the ultimate in development time and effort • Let's just look at C++ briefly
  • 18. C++ • Old but still very popular • Just had a revamp (C++11, was C++0x) • Mostly competes with Java on the server side • Everything else (JVM, R, Python) is written in C/C++ • Both R and Python provide easy ways to interface with C/C++ code • This is used a lot
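
To make the point about interfacing with C/C++ code concrete on the Python side, here is a minimal sketch using the standard-library `ctypes` module to call `strlen` from the C runtime. The slides do not say which mechanism the author has in mind, so treat `ctypes` as one possible route among several; the sketch also assumes a POSIX-like system (Linux or macOS), where the C library can be located this way.

```python
import ctypes
import ctypes.util

# Locate the C runtime. On Linux/macOS find_library("c") usually resolves it;
# if it returns None, CDLL(None) falls back to symbols already loaded
# in the running process (assumption: POSIX-like platform).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

n = libc.strlen(b"analytics")
print(n)
```

Declaring `argtypes` and `restype` up front is what keeps the call safe: without them `ctypes` has to guess how to convert arguments and return values.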
  • 19. Pros • Flexibility • Lots of libraries available • Control of resources for performance-critical apps (e.g. memory) • C++11 adds a lot of nice stuff (finally)
  • 20. Cons • Lots of effort • Lots of hidden traps for the unwary • Initial experience may be a large productivity hit • Effort in porting between systems • There is “modern” C++ (which is actually pretty nice) and everything else (which isn’t so nice)
  • 21. Examples • Let's look at a sample library • This one is called Armadillo (http://arma.sourceforge.net/) • Developed in Australia (NICTA / Univ. Queensland) • Contains functions for numerical applications and some statistical functions • Modern, efficient use of C++
  • 22. Armadillo • Armadillo supports vectorized operations • Also integrates with a BLAS • Example (see console)
  • 23. Simple Example • Using the Box-Jenkins airline passenger data • Classic dataset • 12 years of monthly airline passenger observations (144 in all)
  • 25. Linear Model • We will use a simple linear model (explains 85% of the variance of this data): $Ax = b$, where $A = \begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ 1 & t_3 \\ \vdots & \vdots \end{bmatrix}$
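
A small worked example may help show what the Linear Model slide sets up: a design matrix with a column of ones and a column of time indices, solved in the least-squares sense. The sketch below is plain Python and uses made-up data (a straight line plus an alternating offset), not the airline passenger series, so its numbers say nothing about the 85% figure quoted on the slide. In practice a library routine such as `numpy.linalg.lstsq` would be the more usual choice; the normal equations are written out here only to keep the steps visible.

```python
# Synthetic, made-up observations (NOT the airline data): a trend plus a wiggle.
t = list(range(1, 13))
b = [100.0 + 3.0 * ti + (2.0 if ti % 2 else -2.0) for ti in t]

# Design matrix A: a column of ones (intercept) and a column of times (slope).
A = [[1.0, float(ti)] for ti in t]

# Normal equations (A^T A) x = A^T b, written out for the 2x2 case.
n = len(A)
s_t = sum(row[1] for row in A)
s_tt = sum(row[1] ** 2 for row in A)
s_b = sum(b)
s_tb = sum(row[1] * bi for row, bi in zip(A, b))

det = n * s_tt - s_t * s_t
intercept = (s_tt * s_b - s_t * s_tb) / det
slope = (n * s_tb - s_t * s_b) / det

# R^2: the share of the variance in b explained by the fitted line.
fitted = [intercept + slope * ti for ti in t]
mean_b = s_b / n
ss_res = sum((bi - fi) ** 2 for bi, fi in zip(b, fitted))
ss_tot = sum((bi - mean_b) ** 2 for bi in b)
r_squared = 1.0 - ss_res / ss_tot

print(f"intercept={intercept:.2f} slope={slope:.2f} R^2={r_squared:.3f}")
```

The `r_squared` value is the quantity behind a statement such as "explains 85% of the variance": it compares the residual sum of squares with the total variation around the mean.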
  • 26. Conclusion • Use the toolkit that’s most appropriate for you • A common approach is to use e.g. R for prototyping and model selection and (if required) switch to a higher-performance implementation for production • If you have time, learn all of them!
  • 27. Language Map • [Figure: R, Octave, Python, Ruby, Java and C/C++ placed on two axes: dynamic typing vs. static typing, and interactivity vs. performance/complexity]