How smart is Football Data Analytics today?
Dr. Stefan Kühn
data2day - Karlsruhe
29.09.2015
Topic
Why Football Data Analytics?
• It’s about Football
• There is a lot of data out there
• There is a lot of ignorance out there
• Three examples
  • Corners
  • Marginal goals
  • Substitutions
• Alternatives
2
Info
Why Football is an interesting Use Case
• 209 FIFA member associations worldwide
• Most popular sport: 3.3-3.5 billion fans
• Monetary facts: revenue (Deloitte Football Money League)
  • Real Madrid 2013/14: €549.5 million (position 1)
  • Bayern Munich 2013/14: €487.5 million (position 3)
  • Everton 2013/14: €144.1 million (position 20)
• Social media facts (Deloitte Football Money League)
  • Facebook: FC Barcelona, 81.4 million likes
  • Twitter: Real Madrid, 14.4 million followers
3
Some Stats
Why Football is a Data Use Case
• 306 Bundesliga matches per season
• 2000+ recorded events per match
• 512 Bundesliga players
• Live statistics (Opta, Prozone, etc.):
  • Shots, passes, assists
  • Tackles, blocks, intercepted passes
  • Saves and other goalkeeper actions
  • Fouls and foul types
  • Position data including timestamps
• 1.8 million amateur matches (Germany)
4
Some Remarks
Is there anything left to do?
• Big companies like SAP are involved
• Players are tracked in training and matches (and sometimes at home as well)
• Physiological data, nutrition data, training plans
★ BUT:
Big data is not about the data.
(Gary King, Harvard University, 2013)
It’s about analytics.
5
Some Remarks
Where is the ignorance?
• “The Numbers Game: Why Everything You Know About Football Is Wrong”
  • Book by Chris Anderson (former Cornell University professor) and David Sally (economics and behavioral game theory)
• “Is it easier to score as a sub?”
  • Blog post by Dan Altman, founder of North Yard Analytics
6
Ignorance - Part 1
7
Corners
Claim: Long corners are overrated; short corners are better (see, e.g., Barça).
8
Long corners versus Short corners
Corners
Some useful stats
• Average number of goals per team per match: 1.3
• Average number of corners per team per match: 5
• Long corners account for ~8.5% of all goals
• Silly question: The average team scores from a penalty once every ten games; should it give up on penalties as well?
• Lack of relevant context
  • How efficient are the alternatives?
  • How efficient is the average possession?
9
Corners
Average Possession
• Average number of possessions per team per match: 200
• Average number of goals per team per match: 1.3
• Expected goals per possession: 0.0065
• Normalized per match (200 possessions):
  • All possessions are corners: 4.4 goals
  • Half of the possessions are corners: 2.85 goals
  • 10 possessions (5%) are corners: 1.46 goals
• The efficiency of long corners is more than three times as high as the efficiency of the average possession.
• Still unknown:
  • How efficient are the alternatives?
  • Are there any negative counter-effects?
10
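The per-match figures on this slide can be reproduced from the averages on the previous slide. A minimal Python sketch, assuming (as the slide does) that each long corner is worth 8.5% × 1.3 / 5 ≈ 0.022 goals and that every other possession is worth the average 0.0065 goals; the function name `expected_goals` is illustrative only:

```python
# Sketch: expected goals per team per match for a given number of corners,
# using the averages quoted on the slides. Assumption: every possession that
# is not a corner is worth the league-average number of goals per possession.

GOALS_PER_MATCH = 1.3        # average goals per team per match
CORNERS_PER_MATCH = 5        # average corners per team per match
LONG_CORNER_SHARE = 0.085    # share of all goals that come from long corners
POSSESSIONS_PER_MATCH = 200  # average possessions per team per match

# Expected goals from one corner: 0.085 * 1.3 / 5 = 0.0221
GOALS_PER_CORNER = LONG_CORNER_SHARE * GOALS_PER_MATCH / CORNERS_PER_MATCH
# Expected goals from an average possession: 1.3 / 200 = 0.0065
GOALS_PER_POSSESSION = GOALS_PER_MATCH / POSSESSIONS_PER_MATCH


def expected_goals(n_corners: int, n_possessions: int = POSSESSIONS_PER_MATCH) -> float:
    """Expected goals if n_corners of n_possessions are corners and the rest are average possessions."""
    if not 0 <= n_corners <= n_possessions:
        raise ValueError("n_corners must be between 0 and n_possessions")
    other = n_possessions - n_corners
    return n_corners * GOALS_PER_CORNER + other * GOALS_PER_POSSESSION


if __name__ == "__main__":
    print(f"goals per corner: {GOALS_PER_CORNER:.4f}")
    print(f"corner vs. average possession: {GOALS_PER_CORNER / GOALS_PER_POSSESSION:.2f}x")
    for corners in (200, 100, 20, 10):
        print(f"{corners:3d} corners -> {expected_goals(corners):.2f} goals")
```

Run as a script, it prints about 4.42, 2.86, 1.61 and 1.46 goals for 200, 100, 20 and 10 corners, and a ratio of about 3.4. The slide's 2.85 comes from rounding the per-corner value to 0.022; its 1.46 corresponds to 10 corners (5% of possessions), while 10% of possessions (20 corners) would give about 1.61.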
Corners
11
Ignorance - Part 2
12
Marginal Goals
13
Claim: Some goals count more than others, and players should be rated accordingly.
Marginal goals
14
Why they should have bought Darren Bent
What do you think?
Marginal goals
Why they should have bought a book on hypothesis testing
• How many second goals could have been scored without the first goal?
• Do the samples of matches with one goal, two goals, etc. (scored by the team itself) differ, and if so (definitely yes: selection bias), how?
• Do teams tend to score more against weaker opponents and less against stronger ones?
• And of course: the events considered here are not statistically independent.
15
What they should have done
• Compute marginal goals per sample group (e.g. matches with a fixed number of goals scored by the team). Within such a group, the first goal cannot be worth fewer marginal points than the second goal, and so on, which is the only reasonable result.
• Do not compare apples and oranges (in some sense, Simpson’s paradox).
• Or: hire the best striker for first goals and the best striker for second goals.
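A minimal Python sketch of the stratified computation, assuming a deliberately simple definition of a goal’s marginal value: league points with the goal minus points without it, with the opponent’s goals held fixed and goal timing ignored. This definition, the function names and the sample results are hypothetical; a real analysis would use event data with goal times.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def points(goals_for: int, goals_against: int) -> int:
    """League points for a single result: 3 for a win, 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0


def marginal_points_by_group(matches: List[Tuple[int, int]]) -> Dict[int, List[float]]:
    """Average marginal points of the k-th goal, stratified by the team's total goals.

    Simplifying assumption: the k-th goal is worth
    points(k, goals_against) - points(k - 1, goals_against).
    Returns {n_goals: [avg for goal 1, ..., avg for goal n]}.
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for goals_for, goals_against in matches:
        if goals_for == 0:
            continue  # no goals scored, nothing to attribute
        group = sums.setdefault(goals_for, [0.0] * goals_for)
        counts[goals_for] += 1
        for k in range(1, goals_for + 1):
            group[k - 1] += points(k, goals_against) - points(k - 1, goals_against)
    # Average within each group only; groups are never pooled.
    return {n: [s / counts[n] for s in group] for n, group in sorted(sums.items())}


if __name__ == "__main__":
    # Hypothetical results (goals for, goals against), for illustration only.
    sample = [(1, 0), (1, 1), (2, 1), (2, 0), (2, 2)]
    for n_goals, values in marginal_points_by_group(sample).items():
        print(n_goals, [round(v, 2) for v in values])
```

Comparing first and second goals only within the same group (matches with exactly n goals) avoids pooling matches with very different scorelines and opponents, which is where the selection bias enters.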
Ignorance
-
Part 3
16
Substitutions and Scoring
17
Substitutions and Scoring
Claim: Subs score more than expected
• This is the first correct claim!
• But the effect is still weak and its reason(s) unknown
• Do opponents score more as well?
• Corrections needed:
  • 36% of subs are forwards
  • Individual orders
  • Tactical changes
  • Lots of other things
18
Substitutions and Scoring
Only forwards, controlled for time on the field
• Claim: Fatigue is the cause of this effect!
19
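The two corrections behind this chart (looking only at forwards, and normalising by minutes played) can be sketched in Python as follows. The `Appearance` record and its field names are hypothetical, not a data provider’s schema, and the sample numbers are made up. The sketch controls for time on the field only; it says nothing about fatigue.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Appearance:
    """One player appearance (hypothetical record layout, not a provider schema)."""
    position: str     # e.g. "FW", "MF", "DF"
    substitute: bool  # True if the player came on from the bench
    minutes: int      # minutes spent on the field
    goals: int        # goals scored in this appearance


def goals_per_90(apps: Iterable[Appearance], position: str = "FW") -> Dict[str, float]:
    """Goals per 90 minutes for starters and substitutes of a single position.

    Filtering on one position removes the bias from subs being forwards more
    often; dividing by minutes controls for time on the field (not fatigue).
    Returns NaN for a group with no minutes.
    """
    minutes = {"starters": 0, "subs": 0}
    goals = {"starters": 0, "subs": 0}
    for app in apps:
        if app.position != position:
            continue
        group = "subs" if app.substitute else "starters"
        minutes[group] += app.minutes
        goals[group] += app.goals
    return {
        group: 90 * goals[group] / minutes[group] if minutes[group] else float("nan")
        for group in minutes
    }


if __name__ == "__main__":
    # Made-up appearances, for illustration only.
    sample = [
        Appearance("FW", False, 90, 1),
        Appearance("FW", False, 90, 0),
        Appearance("FW", True, 30, 1),
        Appearance("FW", True, 15, 0),
        Appearance("MF", True, 20, 1),  # ignored: not a forward
    ]
    print(goals_per_90(sample))  # {'starters': 0.5, 'subs': 2.0}
```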
Substitutions and Scoring
A closer look
Estimates of the mean for the first and second half
• Analysis: Controlling for fatigue is not possible; only time spent on the field can be controlled for.
From minute 60 on, the share of subs starts to rise. Effect on the number of goals?
20
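Per-half estimates of a mean scoring rate come with uncertainty that belongs next to the point estimate. A minimal Python sketch, assuming a constant-rate Poisson model within each half and a normal (Wald) approximation for the interval; this is a swapped-in textbook method, not necessarily what the original analysis used, and the input totals are hypothetical:

```python
import math
from typing import Tuple


def rate_per_90_ci(goals: int, minutes: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Goals per 90 minutes with an approximate confidence interval.

    Assumes a Poisson process with a constant scoring rate over the observed
    minutes and uses the normal (Wald) approximation sqrt(goals) for the
    standard error of the count; this is rough when goals is small.
    z = 1.96 gives an approximate 95% interval.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    rate = 90 * goals / minutes
    se = 90 * math.sqrt(goals) / minutes
    return rate, max(0.0, rate - z * se), rate + z * se


if __name__ == "__main__":
    # Hypothetical totals of goals and minutes played in each half.
    for half, goals, minutes in [("1st half", 100, 9000), ("2nd half", 144, 9000)]:
        rate, low, high = rate_per_90_ci(goals, minutes)
        print(f"{half}: {rate:.2f} goals/90 (95% CI {low:.2f} to {high:.2f})")
```

Clearly separated intervals for the two halves would suggest a real difference in scoring rate, but not its cause: minutes on the field, match state and opponents all change over the course of a match.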
Substitutions and Scoring
Detected reason: fatigue, subs are fitter
• What do you think when looking at this graph?
21
Summary
What are the commonalities in all cases?
• “New” spectacular insights
• Preconceptions
• Confirmation bias
• Lack of reflection
  • Challenging one’s own results?
  • Alternative explanations?
• Do not mix up a variable and your interpretation of this variable (fatigue vs. time on field)
• BUT: The data and tools were good!
22
Alternatives
23
What keeps Football Data Analytics from being smart?
24
Requirements
+ Scientific Method!
Reality
Tools Data
Money
???
+ Severe Time Constraint
+ Results must impress
What keeps Data Analytics from being smart?
25
Requirements
+ Scientific Method!
Reality
Tools Data
Money
???
+ Severe Time Constraint
+ Results must impress
Alternatives
26
27
Thanks a lot!
And enjoy the game :-)
www.codecentric.de
blog.codecentric.de
stefan.kuehn@codecentric.de
