Major League Soccer Analytics with Python
Chris Armstrong, Dan Derringer, Jude Ken-Kwofie, Hemanth Mahadevaiah and Sujana Veeraganti
Stevens Institute of Technology



Abstract - Unlike European soccer leagues and popular American sports, relatively little work has been done on Major League Soccer (MLS) player and team performance analytics. With MLS growing in popularity and only a small community of individuals conducting MLS analytics, we decided to apply web analytics concepts taught in the Business Intelligence & Analytics class (BIA 660) to help determine player ratings and compensation. To this end we used the Python programming language and related modules to: 1) crawl the web, 2) scrape relevant data, 3) compile captured data into a data set, 4) determine player ratings and simple statistics, and 5) create attractive plots showcasing the data relationships.

Index Terms - Major League Soccer, Python, Visualization, Web Scraping.

                       PROJECT GOAL
The primary goal of the project was to use BIA 660 web analytics lessons on the Python programming language and related modules to analyze and visualize MLS-specific data. The following Python modules were used in this work:

         Web - mechanize, urllib2, BeautifulSoup, PyPDF2
         Regular Expression - re
         System & I/O - sys, StringIO, csv, print, json
         Data Analysis - R, Pandas, Numpy, Scipy
         Data Visualization - Tkinter, Matplotlib

The following sections describe our Python data scraping, compilation, analysis and visualization efforts.

              DATA SCRAPING & COMPILATION
The Python script has gone through several iterations. The original plan was to extract four tables of players' all-time stats and six PDF files with salary data for players in 2007-2012. The idea was to merge these ten lists to create one master list; however, not all players in the all-time stats tables collected a salary in 2012, and not all of those who collected a salary in 2012 also collected a salary in 2007. This issue drastically reduced the number of records to analyze in the master list. Therefore, it was decided that only the salary data from 2012 would be used.

The next issue faced was how to merge the tables. The first idea was to use a for loop in Python to match the players' names and produce a master table with all their all-time stats and salaries from 2012. Although it was successful, it was less than ideal, as it took over 45 minutes to merge these five tables. The next idea was to write a script in R to merge the tables, since R is designed as a statistical tool and can better manipulate tables. This plan reduced the processing time to less than a minute, and we added the ability for Python to run the R script automatically after the data scraping was complete. However, this wasn't as clean as we would have liked. The final solution was to use the Pandas module for Python, which gave us the ability to manipulate the data the way we needed without having to go outside of Python.

The key Python scripts used in our work are as follows:

         MLS_Statistical_Application.py - Includes a full scraping function plus an interactive plotting feature developed in Tkinter. The Tkinter function imports a comma-separated value (csv) file and allows the user to plot results by selecting column names as the x- and y-axis.

                       DATA ANALYSIS
Initially, our analysis focused on determining 1) the best XI MLS players of all time and 2) whether a reasonable correlation exists between player compensation and performance, i.e., goals, assists, and shots. However, due to the lack of publicly available player passing efficiency data, we found it challenging to build relationships between salary and performance and to determine the best players. Ultimately, we decided to analyze player compensation versus player goals, assists and shots, as well as to simply calculate statistics based on player minutes, goals, assists, shots, shots on goal, game winning goals and game winning assists. From a data set of 251 MLS players we determined for the year 2012:

         The average MLS player earns $200,262.58.
         The lowest paid player, Jeb Brovsky, earns $33,750.
         The highest paid player, Thierry Henry, earns $5,000,000.
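
As a rough illustration of the Pandas-based merge and the summary figures described above, the following minimal sketch joins a stats table and a salary table on player name and then computes the average, lowest and highest salary. The column names (`player`, `goals`, `salary_2012`), the helper names and all values are hypothetical placeholders, not the project's actual schema or data.

```python
import pandas as pd

# Hypothetical miniature tables; names, columns and values are placeholders.
stats = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player D"],
    "goals": [3, 1, 4, 2],
})
salaries = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player E"],
    "salary_2012": [40000.0, 100000.0, 250000.0, 90000.0],
})


def build_master(stats_df, salary_df):
    """Inner join: keep only players that appear in both tables."""
    return stats_df.merge(salary_df, on="player", how="inner")


def salary_summary(master_df, threshold=100000):
    """Average, lowest-paid, highest-paid, and share at or above a threshold."""
    salary = master_df["salary_2012"]
    return {
        "mean": float(salary.mean()),
        "lowest_paid": master_df.loc[salary.idxmin(), "player"],
        "highest_paid": master_df.loc[salary.idxmax(), "player"],
        "pct_at_or_above": float((salary >= threshold).mean() * 100),
    }


master = build_master(stats, salaries)
summary = salary_summary(master)
print(master)
print(summary)
```

An inner join mirrors the decision to keep only players who have both stats and a 2012 salary; players missing from either table (Player D and Player E here) drop out of the master table.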
                                                                                      November 13, 2012, Hoboken, NJ
                                        Major League Soccer Analytics with Python
Out of the 251 players, 55.77% earn salaries greater than or equal to $100,000. Additional statistics are presented below.

Table 1 shows the average, median, lowest and highest salary by position. Also included in the table are the five highest-paid players at each position. As anticipated, forwards are paid the highest salaries of the four positions. Goalkeepers are the lowest wage earners on average.

We also found with the data on hand that in MLS there is little to no correlation between players' salaries and their goals, assists and shots (shown in Figure 1). Player compensation seems to be based more on popularity than on the ability to produce goals, assists and shots. There is a solid relationship between a player's Google search hit rate and salary.

            FIGURE 1 GOALS AND ASSISTS VERSUS SALARY

The lack of correlation between salary and performance is an interesting result, since in other leagues the highest paid players are usually the best at scoring and assisting. As mentioned earlier, an adequate data set on player passing may provide better insight into the relationship between salary and performance.
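
The paper does not state which correlation measure was used. As one common choice, assumed here, the sketch below computes Pearson's r between salary and a performance column in plain Python. The function name `pearson_r` and the sample values are hypothetical and are not taken from the data set.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only.
sample_salaries = [50000, 100000, 150000, 200000]
sample_goals = [3, 1, 4, 2]
r = pearson_r(sample_salaries, sample_goals)
print(r)
```

A value of r near 0 indicates no linear relationship, which is the pattern the text reports for salary versus goals, assists and shots; values near 1 or -1 would indicate a strong linear relationship.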


                 DATA VISUALIZATION

The visual representation of the statistics was generated with
R, Matplotlib and Pandas. Scatter plots and histograms were
developed to show:

          Player compensation versus player goals, assists
          and shots (scatter plots)

          Player minutes, goals, assists, shots, shots on goal,
          game winning goals and game winning assists
          (histograms)
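
The two plot types listed above can be sketched with Matplotlib as follows. This is a minimal illustration, not the project's plotting code: the function name `plot_salary_views`, the axis labels and the sample values are all assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt


def plot_salary_views(salaries, goals, minutes):
    """Draw a salary-versus-goals scatter plot next to a minutes histogram."""
    fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

    ax_scatter.scatter(goals, salaries)
    ax_scatter.set_xlabel("Goals")
    ax_scatter.set_ylabel("Salary (USD)")
    ax_scatter.set_title("Salary versus goals")

    ax_hist.hist(minutes, bins=5)
    ax_hist.set_xlabel("Minutes played")
    ax_hist.set_ylabel("Number of players")
    ax_hist.set_title("Distribution of minutes")

    fig.tight_layout()
    return fig


# Hypothetical values for illustration only.
fig = plot_salary_views(
    salaries=[50000, 100000, 150000, 200000],
    goals=[3, 1, 4, 2],
    minutes=[900, 1800, 2700, 2500],
)
```

Calling `fig.savefig(...)` with a file name of your choice writes the figure to disk; in an interactive session the backend line can be dropped and `plt.show()` used instead.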

The following section presents a few of the generated
visuals.
             FIGURES, TABLES AND EQUATIONS
                TABLE 1 - PLAYER PAY BY POSITION




Results


FIGURE 3 FORWARD, DEFENDER AND MIDFIELDER GOALS VERSUS SALARY

Results

          There is little correlation between goals or assists and a high salary.

          Owners can get similar goal and assist production from someone making less than $200K as from someone making between $400K and $1.2M. This suggests that higher-paid players have the same impact on goals or assists as lower-paid players, which is interesting.

          The data show that the players have similar skill sets. It takes special players to score goals or give assists.

FIGURE 2 - 3D PLOT OF FORWARDS GOALS, ASSISTS AND MINUTES

Figure 2 shows a 3D rendering of player assists, minutes and game winning assists. In general, the plots show little correlation between the fields. However, for defenders there is a strong correlation between the fields, suggesting that assists by defenders lead to wins.

FIGURE 3 - HISTOGRAMS OF PLAYER MINUTES, GOALS, ASSISTS, SHOTS, SHOTS ON GOAL, GAME WINNING GOALS, GAME WINNING ASSISTS AND SALARY

Results

          The plot shows exploratory data analysis of attributes such as Minutes, Goals, Shots, Assists, Shots on Goal, Game Winning Goals, Game Winning Assists and Salary, summarizing their main characteristics in an easy-to-understand form.


                          CONCLUSION
Unlike European soccer leagues and popular American sports, relatively little work has been done on Major League Soccer (MLS) player and team performance analytics. With MLS growing in popularity and only a small community of individuals conducting MLS analytics, we decided to apply web analytics concepts taught in the Business Intelligence & Analytics class (BIA 660) to help determine player ratings and compensation.

The primary goal of the project was to use BIA 660 web analytics lessons on the Python programming language and related modules to analyze and visualize MLS-specific data.


                         ACKNOWLEDGMENT
We acknowledge the mentoring of Professor Winter Mason.





                      AUTHOR INFORMATION (1)
          Chris Armstrong, chris.r.armstrong@gmail.com
          Dan Derringer, dderringer311@gmail.com
          Jude Ken-Kwofie, jkenkwof@stevens.edu
          Hemanth Mahadevaiah, hemanth.m1@gmail.com
          Sujana Veeraganti, sujanaveeraganti@gmail.com

(1) Stevens Institute of Technology Business Intelligence & Analytics Graduate Students