Harness Racing and SAS

HARNESS RACING AND SAS
USING SAS TO MODEL HORSE RACES

DATA SET
• “Past Performance” from TrackMaster for races September 26, 2013 at Yonkers Raceway
• Published in advance of the race
• Cost: $1.50
• Comes in XML format – parsed using Python
• Contains 10 most recent PPs for each horse racing that day
• 12 races x 8 horses x 10 past performances = 960 records
• Variables of use: Lengths back at each quarter, final time, lead final time, gait, age (meta), track condition, track name, track length
• Created race-level, horse-race-level, and longitudinal data sets for different aspects of this analysis
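A minimal sketch of the XML-to-records step, assuming a made-up layout: the element and attribute names below (`card`, `race`, `horse`, `pp`, `final_time`, and so on) are hypothetical stand-ins, not TrackMaster's actual schema. It only illustrates how nested race, horse, and past-performance elements flatten into one record per past performance.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real TrackMaster tag names are not shown in the deck.
SAMPLE = """
<card track="Yonkers Raceway" date="2013-09-26">
  <race number="1" gait="P">
    <horse name="Example Horse" age="5">
      <pp date="2013-09-12" final_time="115.2" condition="FT"/>
      <pp date="2013-09-05" final_time="116.0" condition="SY"/>
    </horse>
  </race>
</card>
"""

def flatten(xml_text):
    """Return one dict per past performance (race x horse x pp)."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for race in root.iter("race"):
        for horse in race.iter("horse"):
            for pp in horse.iter("pp"):
                rows.append({
                    "race": int(race.get("number")),
                    "gait": race.get("gait"),
                    "horse": horse.get("name"),
                    "age": int(horse.get("age")),
                    "pp_date": pp.get("date"),
                    "final_time": float(pp.get("final_time")),
                    "condition": pp.get("condition"),
                })
    return rows

rows = flatten(SAMPLE)
print(len(rows), "records in the sample")

# Record count stated on the slide: 12 races x 8 horses x 10 past performances.
expected_records = 12 * 8 * 10
print(expected_records)
```

The flat list of dicts can then be written out as CSV for import into SAS.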

GAIT AND CONDITION
• Hypothesis: Gait and track condition influence race time
• Gait
    • Binary: Pacers and Trotters
    • Each race is one or the other
    • Each horse is one or the other
• Condition
    • Categorical: Fast, Good, or Sloppy
    • Each race categorized into one
• Created and cleaned race-level data set
• Comparison of means showed that mean race time differs across both variables
• T-test showed these differences are statistically significant
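The deck does not say which t-test variant was used. As an illustration only, the sketch below computes a two-sample t statistic with unequal variances (Welch's form, an assumption) for race time by gait. The times are invented placeholders, not values from the TrackMaster data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic allowing unequal variances (Welch's form)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Invented race times in seconds, for illustration only.
pacer_times = [114.0, 115.0, 116.0]
trotter_times = [118.0, 119.0, 120.0]

t_stat = welch_t(pacer_times, trotter_times)
print(f"t = {t_stat:.3f}")
```

A negative statistic here means the first group (pacers) has the lower mean time. Track condition has three levels, so a pairwise comparison like this would have to be repeated for each pair of conditions.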

CORRELATION: LENGTHS BACK AT CALLS
• Some horses pull away early; others seem to wait for the last quarter to go to the front
• TrackMaster reports lengths back from the leader at the call for each quarter
• Lengths are recorded as fractional numbers (to the nearest quarter length) and as parts of a horse:
    • Nose
    • Head
    • Neck
• Additional complication: “costly breaks” of pace and disqualification
• Still not happy – strange lengths back for winners at the finish
CORRELATION OF LENGTHS BACK BY QUARTER

AGE AND SPEED
• Goal: Quantify how much horses slow down with age
• Merged metadata for each horse with past performance data
• Single-variable regression analysis of mean data set
• Found that age is not a great predictor of speed
• Age: Discrete, yet not categorical
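A single-variable regression of mean final time on age can be sketched with the closed-form least-squares formulas. The ages and mean times below are invented for illustration; they are not the deck's data, and the original analysis was run in SAS rather than Python.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Invented data: horse age in years and mean final time in seconds.
ages = [3, 4, 5, 6, 7]
mean_times = [116.0, 115.5, 116.2, 115.8, 116.4]

slope, intercept = fit_line(ages, mean_times)
print(f"slope={slope:.3f}, intercept={intercept:.2f}, "
      f"R^2={r_squared(ages, mean_times, slope, intercept):.3f}")
```

Treating age as a numeric regressor matches the "discrete, yet not categorical" point: one slope is estimated instead of a separate effect per age.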

MULTIVARIATE REGRESSION
• Longitudinal data set
• Created dummy variables for past and present track conditions, gaits, and track sizes
• Used SAS’s “Lag” and “Last” features
• Removed disqualified races
• Modeled race time based on current race conditions and the two prior races
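The lagging step can be illustrated outside SAS. The sketch below is a rough Python analogue, not the author's DATA step: it attaches each horse's previous two final times to every record and restarts the history for each horse. The field names (`horse`, `race_date`, `final_time`) are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def add_lags(records, n_lags=2):
    """Attach lagged final times within each horse's own history."""
    ordered = sorted(records, key=itemgetter("horse", "race_date"))
    out = []
    for _, group in groupby(ordered, key=itemgetter("horse")):
        history = []  # restarts for every horse
        for rec in group:
            row = dict(rec)
            for k in range(1, n_lags + 1):
                row[f"lag{k}_final_time"] = (
                    history[-k] if len(history) >= k else None
                )
            history.append(rec["final_time"])
            out.append(row)
    return out

# Invented records; ISO dates sort correctly as strings.
records = [
    {"horse": "A", "race_date": "2013-09-12", "final_time": 115.2},
    {"horse": "A", "race_date": "2013-08-29", "final_time": 116.0},
    {"horse": "A", "race_date": "2013-09-05", "final_time": 115.6},
    {"horse": "B", "race_date": "2013-09-12", "final_time": 117.1},
]

lagged = add_lags(records)
for row in lagged:
    print(row)
```

Restarting the history per horse matters: in SAS the LAG function reads from a running queue and does not reset at BY-group boundaries on its own, so lagged values must be cleared at each horse's first record.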

MULTIVARIATE REGRESSION
Control Variables and Variables of Interest

Label              Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept                   104.67788          4.81142     21.76     <.0001
Lag final time                0.01412          0.03120      0.45     0.6510
Lag2 final time               0.11361          0.02975      3.82     0.0001
Pacer                        -3.68185          0.21247    -17.33     <.0001
Fast                         -0.77005          0.38954     -1.98     0.0484
Sloppy                        0.86942          0.43605      1.99     0.0465
Age                           0.05312          0.04023      1.32     0.1871
5/8 Track                    -2.74052          0.20313    -13.49     <.0001
1 Track                      -3.18411          0.47824     -6.66     <.0001
Fast lag                      0.35883          0.38598      0.93     0.3528
Sloppy lag                    0.48532          0.43151      1.12     0.2610
Fast lag2                     0.09472          0.37245      0.25     0.7993
Sloppy lag2                  -0.39904          0.42068     -0.95     0.3431
5/8 Track lag                 0.14639          0.23680      0.62     0.5366
1 Track lag                   0.40192          0.51792      0.78     0.4379
5/8 Track lag2                0.58564          0.21764      2.69     0.0073
1 Track lag2                  0.67260          0.49172      1.37     0.1717

Final race times from previous races are not great determinants of this race's final time!
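Each t value in the regression output is the parameter estimate divided by its standard error. The short check below reproduces a few of the reported values from the estimates and standard errors listed above. Only the numbers come from the deck; the helper name is an illustrative choice.

```python
# (estimate, standard error, reported t value) copied from the regression output.
ROWS = {
    "Intercept": (104.67788, 4.81142, 21.76),
    "Lag final time": (0.01412, 0.03120, 0.45),
    "Lag2 final time": (0.11361, 0.02975, 3.82),
    "Pacer": (-3.68185, 0.21247, -17.33),
}

def t_value(estimate, std_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error

for label, (est, se, reported) in ROWS.items():
    print(f"{label:16s} computed={t_value(est, se):8.2f} reported={reported:8.2f}")
```

The two lagged final times have small coefficients compared with the Pacer and track-size terms, which is consistent with the conclusion above.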

PREDICTION OF SEPTEMBER 26 RACES
• Used the coefficients from my multivariate regression and the two most recent races for each horse
• Ranked horses by predicted race times
• My bets weren’t great, but they were better than choosing at random!
• Reason: Very low variance in race times among horses. The model lacks the predictive power to separate them, even with R^2 > 0.5
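The ranking step can be sketched as follows. The coefficients are taken from the regression output, but this is a simplification: the lagged condition and lagged track-size dummies are left out, the three horses are invented, and treating "Good" and the smallest track size as the baseline categories (all related dummies 0) is an assumption.

```python
# Subset of the reported coefficients; lagged dummies omitted for brevity.
COEF = {
    "intercept": 104.67788,
    "lag_final_time": 0.01412,
    "lag2_final_time": 0.11361,
    "pacer": -3.68185,
    "fast": -0.77005,
    "sloppy": 0.86942,
    "age": 0.05312,
    "track_5_8": -2.74052,
    "track_1": -3.18411,
}

def predict_time(features):
    """Linear prediction: intercept plus coefficient-weighted features."""
    return COEF["intercept"] + sum(
        COEF[name] * value for name, value in features.items()
    )

# Invented horses in one race: all pacers, fast track, both track dummies 0.
race = {"pacer": 1, "fast": 1, "sloppy": 0, "track_5_8": 0, "track_1": 0}
horses = {
    "A": {"lag_final_time": 116.0, "lag2_final_time": 115.0, "age": 5},
    "B": {"lag_final_time": 118.0, "lag2_final_time": 119.0, "age": 7},
    "C": {"lag_final_time": 117.0, "lag2_final_time": 114.0, "age": 4},
}

predicted = {name: predict_time({**race, **h}) for name, h in horses.items()}
ranking = sorted(predicted, key=predicted.get)  # fastest predicted time first
for name in ranking:
    print(name, round(predicted[name], 2))
```

Within one race the race-level terms are identical for every horse, so the ranking depends only on the lagged times and age. With coefficients this small the predicted times sit within a second of each other, which illustrates the low-variance problem noted above.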

[Chart: Predicting the Winner (Right vs. Wrong)]

FINAL THOUGHTS
• SAS’s LAG and LAST features are great for dealing with longitudinal data
• Most work was on the DATA steps, not the PROC steps
• My model was based on only 960 records from 96 horses
• With more data, might model Pacers and Trotters separately, and each track condition separately
• Still want to investigate lengths back for winning horses
• Learned much about SAS and about harness racing
