This research examined the viability of predicting the outcomes (wins or losses) of an American football team from its historical data. We focused on the Miami Dolphins and attempted to forecast the number of victories in 2015 using data from 1997 to 2011.
We selected this subject because American football is the most popular sport in the country, and winning bets on it could bring a sizable sum of money to a savvy bettor. Because the Miami Dolphins were founded in Florida, the team’s performance is of natural local interest. The main research question was whether the result estimated by a time-series regression of wins over a 15-year interval matches the actual result for 2015.
Completing the project demonstrates Saint Leo University’s core value of excellence through the quality of the research and the presentation of its results.
Data
The data source for the statistics was a website about professional sports. The site contains a number of pages with statistical data. The information can be considered reliable because it can be checked by many sports fans and by people who need high-quality data for commercial purposes. The data on this topic is free of charge. However, the American football statistics that such sites provide have their own specifics, in particular a concentration on technical elements of the game and on the results of individual players.
The data sample used in this research covers the years 1997-2017. Because the 2017 season was not yet finished, the data for that year is incomplete; that year could be used only for forecasting, so the sample includes 20 years of observations. The data comes from https://www.pro-football-reference.com, which contains American football statistics for individual teams; the Miami Dolphins were chosen for analysis. The sample includes wins and losses per season for 20 seasons (1997-2016). The third available variable, “ties”, was equal to zero for all years in the sample. Under these conditions wins and losses sum to 16 in every season and mirror each other: when the number of wins grows, the number of losses declines. For that reason the analysis in this paper uses mostly the wins variable. In the linear regression model presented in this paper, the independent variable is time (nominal years are replaced with period numbers) and the dependent variable is wins.
The values of all variables studied in research are presented in the table below:
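| Nominal year | Period | Wins | Losses |
|---|---|---|---|
| 1997 | 1 | 9 | 7 |
| 1998 | 2 | 10 | 6 |
| 1999 | 3 | 9 | 7 |
| 2000 | 4 | 11 | 5 |
| 2001 | 5 | 11 | 5 |
| 2002 | 6 | 9 | 7 |
| 2003 | 7 | 10 | 6 |
| 2004 | 8 | 4 | 12 |
| 2005 | 9 | 9 | 7 |
| 2006 | 10 | 6 | 10 |
| 2007 | 11 | 1 | 15 |
| 2008 | 12 | 11 | 5 |
| 2009 | 13 | 7 | 9 |
| 2010 | 14 | 7 | 9 |
| 2011 | 15 | 6 | 10 |
| 2012 | 16 | 7 | 9 |
| 2013 | 17 | 8 | 8 |
| 2014 | 18 | 8 | 8 |
| 2015 | 19 | 6 | 10 |
| 2016 | 20 | 10 | 6 |
| 2017 | 21 | 4 | 5 |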
Years and periods are ordinal variables, while wins and losses are quantitative. Wins and losses are the numbers of games the Miami Dolphins won or lost in a given season.
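The rule that wins and losses sum to 16 can be checked directly against the table. The following is a minimal sketch in Python; the variable names are illustrative and not part of the original analysis, and the values are copied from the table above.

```python
# Season records copied from the data table: (nominal year, wins, losses).
SEASONS = [
    (1997, 9, 7), (1998, 10, 6), (1999, 9, 7), (2000, 11, 5),
    (2001, 11, 5), (2002, 9, 7), (2003, 10, 6), (2004, 4, 12),
    (2005, 9, 7), (2006, 6, 10), (2007, 1, 15), (2008, 11, 5),
    (2009, 7, 9), (2010, 7, 9), (2011, 6, 10), (2012, 7, 9),
    (2013, 8, 8), (2014, 8, 8), (2015, 6, 10), (2016, 10, 6),
    (2017, 4, 5),
]

GAMES_PER_SEASON = 16

# Seasons whose wins and losses do not add up to a full 16-game schedule.
incomplete = [year for year, wins, losses in SEASONS
              if wins + losses != GAMES_PER_SEASON]

# Period numbers replace nominal years: 1997 -> 1, 1998 -> 2, and so on.
periods = [year - 1996 for year, _, _ in SEASONS]

print("Seasons not summing to 16 games:", incomplete)
print("First and last period:", periods[0], periods[-1])
```

Only the unfinished 2017 season should fail the check, which is consistent with treating it as incomplete.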
Descriptive statistics
Descriptive statistics for the primary variables, wins and losses, are provided in the table below. The comparison shows that the Miami Dolphins lost slightly more often on average than they won. At the same time, the number of wins per season varies slightly more than the number of losses.
| Statistic | Wins | Losses |
|---|---|---|
| N | 21 | 21 |
| Mean | 7.761905 | 7.904762 |
| Std. Deviation | 2.624972 | 2.567192 |
| Skewness | -0.88691 | 1.096436 |
| Kurtosis | 0.667457 | 1.474791 |
| Minimum | 1 | 5 |
| Median | 8 | 7 |
| Maximum | 11 | 15 |
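The statistics in the table can be recomputed from the 21 seasons of data. The sketch below uses Python's standard `statistics` module. The skewness and kurtosis functions assume the bias-adjusted sample formulas used by common spreadsheet software; that choice of formula is an assumption, since the paper does not state which tool produced the table. The function names are illustrative.

```python
import statistics

# Wins and losses for periods 1..21 (1997-2017), copied from the data table.
WINS = [9, 10, 9, 11, 11, 9, 10, 4, 9, 6, 1, 11, 7, 7, 6, 7, 8, 8, 6, 10, 4]
LOSSES = [7, 6, 7, 5, 5, 7, 6, 12, 7, 10, 15, 5, 9, 9, 10, 9, 8, 8, 10, 6, 5]


def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed spreadsheet-style formula)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    cubed = sum(((v - mean) / s) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (assumed spreadsheet-style formula)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    fourth = sum(((v - mean) / s) ** 4 for v in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


def describe(values):
    """Collect the statistics reported in the descriptive table."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "skewness": sample_skewness(values),
        "kurtosis": sample_excess_kurtosis(values),
        "minimum": min(values),
        "median": statistics.median(values),
        "maximum": max(values),
    }


for name, series in (("Wins", WINS), ("Losses", LOSSES)):
    print(name, describe(series))
```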
Graphs
The next graph shows the dynamics of wins and losses over time. The symmetry of the changes comes from the fixed total number of games per season and the absence of ties.
The next graph contains linear approximations for both series. The slopes of the lines indicate that the share of losses is growing while the share of wins is decreasing. The quality of the approximations, however, is low: R-squared is 0.1568 for wins and 0.0348 for losses. That means time alone is not enough to explain the changes; the model requires more independent variables to reflect changes in wins and losses.
The next graph and table show that the most frequent outcome was more than 8.5 wins in a season (10 of 21 seasons).
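| Bin | Frequency | Cumulative % | Bin (sorted by frequency) | Frequency | Cumulative % |
|---|---|---|---|---|---|
| 1 | 1 | 4.76% | More | 10 | 47.62% |
| 3.5 | 0 | 4.76% | 6 | 5 | 71.43% |
| 6 | 5 | 28.57% | 8.5 | 5 | 95.24% |
| 8.5 | 5 | 52.38% | 1 | 1 | 100.00% |
| More | 10 | 100.00% | 3.5 | 0 | 100.00% |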
The graph and table below show that the most frequent outcome was between 7.5 and 10 losses in a season (8 of 21 seasons).
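| Bin | Frequency | Cumulative % | Bin (sorted by frequency) | Frequency | Cumulative % |
|---|---|---|---|---|---|
| 5 | 4 | 19.05% | 10 | 8 | 38.10% |
| 7.5 | 7 | 52.38% | 7.5 | 7 | 71.43% |
| 10 | 8 | 90.48% | 5 | 4 | 90.48% |
| 12.5 | 1 | 95.24% | 12.5 | 1 | 95.24% |
| More | 1 | 100.00% | More | 1 | 100.00% |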
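The frequency tables for wins and losses can be rebuilt by counting how many seasons fall at or below each bin edge. The sketch below is one way to do this in Python. It assumes that each bin value is an inclusive upper edge and that anything above the last edge goes into "More", which is consistent with the counts in the tables. The `histogram` helper is illustrative, not the tool used in the paper.

```python
from bisect import bisect_left

# Wins and losses for periods 1..21 (1997-2017), copied from the data table.
WINS = [9, 10, 9, 11, 11, 9, 10, 4, 9, 6, 1, 11, 7, 7, 6, 7, 8, 8, 6, 10, 4]
LOSSES = [7, 6, 7, 5, 5, 7, 6, 12, 7, 10, 15, 5, 9, 9, 10, 9, 8, 8, 10, 6, 5]


def histogram(values, upper_edges):
    """Return (bin label, frequency, cumulative %) rows.

    A value falls into the first bin whose upper edge is >= the value;
    values above the last edge are counted in a final "More" bin.
    """
    counts = [0] * (len(upper_edges) + 1)
    for value in values:
        counts[bisect_left(upper_edges, value)] += 1

    labels = [str(edge) for edge in upper_edges] + ["More"]
    rows = []
    running = 0
    for label, count in zip(labels, counts):
        running += count
        rows.append((label, count, round(100 * running / len(values), 2)))
    return rows


for row in histogram(WINS, [1, 3.5, 6, 8.5]):
    print("wins", row)
for row in histogram(LOSSES, [5, 7.5, 10, 12.5]):
    print("losses", row)
```

Sorting the returned rows by frequency in descending order gives the right-hand half of each table.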
Testing conducted
The test conducted in this research is a regression of the wins variable on time (with a t-test and an F-test for quality). The number of observations included was 15. The quality of the model was low, as shown by R-squared, the p-value of the F-test, and the p-values of the t-tests for both the intercept and the time variable (all above 0.05).
Simple Linear Regression Analysis
Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.481357 |
| R Square | 0.231704 |
| Adjusted R Square | 0.172605 |
| Standard Error | 2.595643 |
| Observations | 15 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 26.41429 | 26.41429 | 3.920568 | 0.069269 |
| Residual | 13 | 87.58571 | 6.737363 | | |
| Total | 14 | 114 | | | |

Coefficients

| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 623.5143 | 310.8599 | 2.005773 | 0.066153 | -48.0577 | 1295.086 |
| X | -0.30714 | 0.155119 | -1.98004 | 0.069269 | -0.64226 | 0.027972 |
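Most figures in the regression output can be reproduced from the 15 observations (1997-2011) with the textbook least-squares formulas. The sketch below does this in plain Python, with period numbers 1 to 15 as the independent variable. The function and key names are illustrative. The p-values are omitted because they require the t and F distributions, which the standard library does not provide.

```python
import math

# Wins for periods 1..15 (1997-2011), copied from the data table.
WINS_1997_2011 = [9, 10, 9, 11, 11, 9, 10, 4, 9, 6, 1, 11, 7, 7, 6]
PERIODS = list(range(1, 16))


def simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x, with summary statistics."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # ANOVA decomposition: total = regression + residual.
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_reg = ss_total - ss_resid
    ms_resid = ss_resid / (n - 2)  # residual mean square, df = n - 2

    slope_se = math.sqrt(ms_resid / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": ss_reg / ss_total,
        "standard_error": math.sqrt(ms_resid),
        "ss_regression": ss_reg,
        "ss_residual": ss_resid,
        "f_stat": ss_reg / ms_resid,  # regression df = 1
        "slope_se": slope_se,
        "slope_t": slope / slope_se,
    }


for key, value in simple_ols(PERIODS, WINS_1997_2011).items():
    print(f"{key}: {value:.6f}")
```

With a single predictor the F statistic equals the square of the slope's t statistic, which is why both tests report the same p-value in the output.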
The prediction for the 19th period (2015), based on 15 years of data, is 623.5143 - 2015 × 0.30714 ≈ 4.62.
In the other variant of the model, with period numbers instead of nominal years, it is 10.45714 - 19 × 0.30714 ≈ 4.62 as well. The actual number of wins was six. The predicted result is about 0.5 standard deviations away from the actual one.
The model, in general, reflects the tendency of a decreasing share of wins, but its prediction is not very precise.
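The two variants of the prediction differ only in how time is coded, so they should give the same forecast. The sketch below fits the line both ways in plain Python and compares the 2015 forecast with the actual six wins. The helper `fit_line` and the variable names are illustrative, not taken from the original analysis.

```python
# Wins for 1997-2011, copied from the data table.
WINS_1997_2011 = [9, 10, 9, 11, 11, 9, 10, 4, 9, 6, 1, 11, 7, 7, 6]
YEARS = list(range(1997, 2012))
PERIODS = [year - 1996 for year in YEARS]  # 1997 -> 1, ..., 2011 -> 15


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Variant 1: nominal years as the independent variable.
b0_year, b1_year = fit_line(YEARS, WINS_1997_2011)
pred_by_year = b0_year + b1_year * 2015

# Variant 2: period numbers as the independent variable.
b0_period, b1_period = fit_line(PERIODS, WINS_1997_2011)
pred_by_period = b0_period + b1_period * 19

ACTUAL_WINS_2015 = 6
error = ACTUAL_WINS_2015 - pred_by_period

print(f"intercept (years):   {b0_year:.4f}")
print(f"intercept (periods): {b0_period:.5f}")
print(f"forecast for 2015:   {pred_by_year:.2f} / {pred_by_period:.2f}")
print(f"forecast error:      {error:.2f} wins")
```

The slope is identical in both variants. Only the intercept shifts, because subtracting 1996 from every year moves the origin of the time axis without changing the fitted line.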
Conclusion
The results of a sports team in any particular season depend on many factors; however, it is possible to discern a general tendency over time. The long-term negative changes are possibly driven by some third parameter other than time itself. That parameter requires further investigation and a model with more independent variables.
I would not recommend using the presented statistical model for forecasting wins, as it is based only on the time parameter. Because of its low quality, it is dangerous to apply it to predictions involving any financial interest. In addition, approximating results with a regression model may in general be recommended only for studying existing results, that is, for interpolation. Extrapolation requires a model based on much more data and more variables, and even then the quality of the prediction may be questioned.
Work cited
PFR. (2017). 2017 Miami Dolphins Statistics & Players. https://www.pro-football-reference.com/teams/mia/2017.htm#all_team_stats