Selecting the Teams for Football Bowl Games
In recent years the college football teams playing in the various bowl games at the end of the season have been selected with an elaborate program in what is called the Bowl Championship Series (BCS) computer. The program produces a rank for every college team in the nation, and these ranks determine which teams play in the games. I know very little about the process, but I think that the bowl games have a ranking of their own, and the Rose Bowl gets the top-ranked team in the nation plus the next-lower team which is not from the same conference as the top team. The next-lower bowl game gets teams that are lower ranked and from different conferences than the first two teams; the process continues until all teams for bowl games are picked. Even if this description of the process is not quite right, the fact remains that it depends completely upon the ranking of the college teams -- and the computer program for doing the ranking is full of subjectivity in its data inputs. This subjectivity is the source of endless complaining by sports fans about the validity of the BCS process and the mismatch of the teams in the bowl games.
Some inputs to the BCS program are pure subjective judgment calls from coaches and sports writers on the relative strengths of teams. Others are observable numbers from games played, but given weight in the computation according to subjective criteria. Now there is nothing wrong with expert judgment and subjectivity. Indeed, almost every enterprise depends mostly upon expert judgment. But the team standings in conference play are determined solely by wins and losses -- no subjectivity at all, winning is everything -- and nobody argues about the standings. It is a strong departure to go from pure objectivity to a method heavily laced with subjectivity -- treacherous when the opinions of the experts are highly variable (not to mention the opinions of all the would-be experts and fans).
I propose here a scheme of picking teams for the bowl games which is 100% objective, using only win-loss data. To do that, the contest analysis algorithm (described elsewhere on this site) must be employed. But even before the algorithm is used, some consideration must be given to the entries in the win-loss scoreboard itself. First of all, why not use game points instead of wins in the scoreboard, if that permits distinguishing strong wins from weak ones? There are several reasons why this is not desirable: (1) Game points can easily give different team standings than wins, and winning is everything. (2) Some teams may focus on a strong passing game, with associated risk in making it click in every game, leading to wide variation in scores. Other teams may have ground play as their strong suit, with low scores being typical in their games, and not so much variation. A lopsided score may be nothing more than a reflection of chance. (3) A zero entry in the scoreboard (for either wins or points) causes problems when it is used to compute a team's strength (even causing a strength to come out as zero). Such an entry can be corrected through Bayesian statistics if it counts wins, but not if it counts points. So, what do I mean by Bayesian statistics?
Bayesian statistics concerns itself with predicting the outcome of a future random event from knowledge of what has happened before. It takes account of small sample sizes, and football conference standings are based on the smallest of sample sizes -- usually only one game between any two teams in a season (sometimes not even one). In reliability engineering, a lot of expert analysis and judgment goes into developing the so-called prior knowledge about an event (say a rocket launch) before testing. It is not just a matter of saying that, if 2 out of 3 trials have been successful so far, the chance of the next one being a success is 2/3. If the engineering data support a much greater reliability than 2/3, one can compute a prior sequence of "virtual" tests (say 10 tests, with 9 being successful) before the 3 that were actually conducted. With prior knowledge included this way, the total count of tests (real plus virtual) would be 13, with 11 successes -- a probability of success for the next future test of 11/13. If the 2-out-of-3 test had been done with no prior knowledge about the object being tested, one would represent that prior ignorance with 2 virtual prior tests, one being successful, and an estimate of 3/5 for the success probability on the next future test. This lower number (lower than the 2/3 of the test result) brings out explicitly in the estimate the weakness inherent in the small sample size. (The extreme example, of course, would be one test which comes out a success. It would be absurd to estimate, based on that alone, that the probability of future success is 1. In fact, that estimate can only approach 1 as the number of conducted tests approaches infinity.) The rule for the Bayesian reliability estimate with zero prior knowledge about the test object is to add 1 to the number of actual successes and 2 to the actual number of trials. 
If no trials have yet been done, the estimate (under the assumption of complete ignorance) has the perfectly logical value of 1/2.
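The arithmetic in the paragraph above can be checked with a minimal Python sketch. The function name `bayes_estimate` and its parameter names are illustrative assumptions, not anything from the original analysis; the default of 1 virtual success in 2 virtual trials encodes the zero-prior-knowledge rule just stated.

```python
from fractions import Fraction


def bayes_estimate(successes, trials, prior_successes=1, prior_trials=2):
    """Estimated probability that the next trial succeeds.

    Virtual (prior) tests are simply added to the real ones. The defaults,
    1 virtual success in 2 virtual trials, stand for zero prior knowledge.
    """
    return Fraction(successes + prior_successes, trials + prior_trials)


print(bayes_estimate(2, 3))         # 3/5   (2 of 3 real tests, no prior knowledge)
print(bayes_estimate(2, 3, 9, 10))  # 11/13 (same tests plus 9-of-10 virtual tests)
print(bayes_estimate(1, 1))         # 2/3   (one success does not give certainty)
print(bayes_estimate(0, 0))         # 1/2   (no tests at all)
```

Exact fractions are used so the results can be compared directly with the 3/5, 11/13 and 1/2 quoted in the text.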
Bayesian statistics used to be held in low esteem by the community of 'classical' statisticians, because they did not like the idea of working with prior information, but it is now very widely accepted. (I am not a statistician and not qualified to speak too strongly here, but I think that all problems in which Bayesian statistics are applicable should go via that route.) Now, back to the football conference games, we observe that we want to have zero subjective input to the problem, so we will want to use the Bayesian estimate for the probability that the next game will be a win when there is no prior information. The case of no prior information is represented by giving every team in a conference one virtual win against every other team, before the season starts, and then adding on the actual wins as the season progresses. Then the Bayesian scoreboard is just the usual array of team vs. team wins with every entry increased by one. Before the season starts, every team has a 50% win average -- all virtual data at this point. During the season, the teams will go above or below the 50% win average as they accumulate wins and losses in the array. (For the usual case of one game to be played against each other team, the starting 1's will either be left unchanged or increased to 2's, depending upon which team wins each game.) By itself, this Bayesian scoreboard will not change the standings in typical conference play; its advantage comes into the picture only when it is used together with the contest analysis algorithm to take account of the strengths of opposing teams. That advantage is key when it comes to analyzing interconference play, but is even significant for intraconference play when there are ties. (The small number of conference games assures that ties will be all too common, and the algorithm will resolve them -- unless the two tied teams have not played each other and have had identical win patterns.)
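As a rough illustration of the Bayesian scoreboard just described, the Python sketch below starts every pair of teams with one virtual win each and then adds real wins. The four-team conference and its game results are hypothetical, and the helper names `bayesian_scoreboard` and `win_average` are assumptions made for this example only.

```python
def bayesian_scoreboard(n_teams, games):
    """Build the scoreboard: entry [r][c] is real plus virtual wins of r over c.

    games is a list of (winner, loser) pairs using 0-based team indices.
    """
    # One virtual win for every team against every other team, zero on the diagonal.
    board = [[0 if r == c else 1 for c in range(n_teams)] for r in range(n_teams)]
    for winner, loser in games:
        board[winner][loser] += 1  # add each real win on top of the virtual one
    return board


def win_average(board, team):
    """Wins are the row sum, losses are the column sum."""
    wins = sum(board[team])
    losses = sum(row[team] for row in board)
    return wins / (wins + losses)


# Before the season: all virtual data, so every team sits at 50%.
preseason = bayesian_scoreboard(4, [])
print([win_average(preseason, t) for t in range(4)])  # [0.5, 0.5, 0.5, 0.5]

# Hypothetical results: team 0 beats 1 and 2, team 1 beats 2, team 3 beats 0.
season = bayesian_scoreboard(4, [(0, 1), (0, 2), (1, 2), (3, 0)])
print(season[0])               # [0, 2, 2, 1]
print(win_average(season, 0))  # 5 wins, 4 losses
```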
The Bayesian estimates of win probabilities are not expected to be as good as subjective estimates done by experts who bring along all of the factors they deem significant -- weather, coaching, sidelined players, past-season performance, etc. For betting purposes, we want to use all the information we have; the people who use it best are the best sports gamblers. But these estimates vary from expert to expert, which is why they are not suitable for determining official team standings or placement of teams into bowl competitions.
Worked example: The 2000 season in the Big 10 and Pacific 10 conferences.
Following the 2000 season, the Big 10 and Pacific 10 conferences played each other in the Rose Bowl (Purdue and Washington). In this example I work through the season in the two conferences to demonstrate the operation of the contest algorithm. After that I show how the algorithm can be used to decide which conferences should be represented in bowl games and which teams from those conferences should play.
The 21 teams in the two conferences will be numbered as follows (first column is Big 10, but it has 11 teams):
Big 10               Pacific 10
 1. Illinois         12. Arizona
 2. Indiana          13. Arizona St.
 3. Iowa             14. California
 4. Michigan         15. Oregon
 5. Michigan St.     16. Oregon St.
 6. Minnesota        17. So. California
 7. Northwestern     18. Stanford
 8. Ohio St.         19. UCLA
 9. Penn St.         20. Washington
10. Purdue           21. Washington St.
11. Wisconsin
The Bayesian scoreboards for the two conferences are shown below side-by-side. Every off-diagonal entry is either a 1 or 2, the number of real plus virtual games the team in the row has won against the team in the column. (The numbers could have gone beyond 2 if the teams had played each other more than once.)
     Big 10                Wins           Pacific 10          Wins
 1.  0 2 2 1 1 1 1 1 1 1 1   12      12.  0 1 1 1 1 2 2 1 1 2   12
 2.  1 0 2 1 1 2 1 1 1 1 1   12      13.  2 0 2 1 1 1 1 1 1 2   12
 3.  1 1 0 1 2 1 2 1 2 1 1   13      14.  1 1 0 1 1 2 1 2 1 1   11
 4.  2 2 1 0 2 1 1 2 2 1 2   16      15.  2 2 2 0 1 2 1 2 2 2   16
 5.  2 1 1 1 0 1 1 1 1 2 1   12      16.  2 1 2 2 0 2 2 2 1 2   16
 6.  2 1 2 1 1 0 1 2 2 1 1   14      17.  1 2 1 1 1 0 1 2 1 1   11
 7.  2 2 1 2 2 2 0 1 1 1 2   16      18.  1 2 2 1 1 2 0 1 1 2   13
 8.  2 1 2 1 2 1 1 0 2 1 2   15      19.  2 2 1 1 1 1 2 0 1 1   12
 9.  2 2 1 1 2 1 1 1 0 2 1   14      20.  2 2 2 1 2 1 2 2 0 2   16
10.  1 2 1 2 1 2 2 2 1 0 2   16      21.  1 1 2 1 1 2 1 1 1 0   11
11.  1 2 2 1 2 2 1 1 1 1 0   14
All teams play eight conference games, the Big 10 skipping two matches for each of its teams and the Pacific 10 skipping one match. Unplayed combinations are still given their virtual games, one win to each team. The actual wins can be read by subtracting the virtual wins from the 'wins' column or by counting the 2's in the row. Each conference has a three-way tie for first place, resolvable with the contest algorithm. The 'wins' column does not have a lot of variation and there are some matches not played, so it is rather hopeless to guess the best team from inspection of the raw data.
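The two ways of reading actual wins off the scoreboard can be sketched in Python as follows. The matrix is the Big 10 scoreboard above, typed in by hand, and the names `BIG10`, `TEAMS` and `actual_wins` are illustrative assumptions; the Pacific 10 scoreboard could be treated the same way.

```python
TEAMS = ["Illinois", "Indiana", "Iowa", "Michigan", "Michigan St.", "Minnesota",
         "Northwestern", "Ohio St.", "Penn St.", "Purdue", "Wisconsin"]

# Big 10 Bayesian scoreboard: row team's real plus virtual wins over column team.
BIG10 = [
    [0, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 2, 1, 1, 2, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 2, 1, 2, 1, 2, 1, 1],
    [2, 2, 1, 0, 2, 1, 1, 2, 2, 1, 2],
    [2, 1, 1, 1, 0, 1, 1, 1, 1, 2, 1],
    [2, 1, 2, 1, 1, 0, 1, 2, 2, 1, 1],
    [2, 2, 1, 2, 2, 2, 0, 1, 1, 1, 2],
    [2, 1, 2, 1, 2, 1, 1, 0, 2, 1, 2],
    [2, 2, 1, 1, 2, 1, 1, 1, 0, 2, 1],
    [1, 2, 1, 2, 1, 2, 2, 2, 1, 0, 2],
    [1, 2, 2, 1, 2, 2, 1, 1, 1, 1, 0],
]


def actual_wins(board):
    """Count the 2's in each row (valid when no pair met more than once)."""
    return [row.count(2) for row in board]


wins = actual_wins(BIG10)
virtual = len(BIG10) - 1  # one virtual win against each of the other teams

# Same answer by subtracting the virtual wins from the 'wins' column.
assert wins == [sum(row) - virtual for row in BIG10]

best = max(wins)
leaders = [TEAMS[i] for i, w in enumerate(wins) if w == best]
print(wins)     # [2, 2, 3, 6, 2, 4, 6, 5, 4, 6, 4]
print(leaders)  # ['Michigan', 'Northwestern', 'Purdue']
```

The three leaders share six actual wins, which is the three-way tie for first place mentioned above.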
If the goal is to decide which two teams are to go to the Rose Bowl, presuming that it has already been decided that these two conferences will play there, the interconference games between Big 10 and Pacific 10 can be included, bringing more objective data into the problem. There were six such games, and a joint Big 10 - Pacific 10 win-loss scoreboard can be formed using the two above scoreboards plus the entries from the six interconference games. As before, all possible team combinations are given one virtual win for each team as an expression of prior ignorance about probability of win in the first actual game. The combined scoreboard looks like this (where wins and losses include both real and virtual games--the row sum and the column sum for each team):
Team  Real plus virtual wins against teams 1-21  Wins  Losses  Win fraction
Big 10
 1.   0 2 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1    23      26  0.469
 2.   1 0 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1    22      26  0.458
 3.   1 1 0 1 2 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1    23      25  0.479
 4.   2 2 1 0 2 1 1 2 2 1 2 1 1 1 1 1 1 1 1 1 1    26      23  0.531
 5.   2 1 1 1 0 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1    22      26  0.458
 6.   2 1 2 1 1 0 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1    24      24  0.500
 7.   2 2 1 2 2 2 0 1 1 1 2 1 1 1 1 1 1 1 1 1 1    26      22  0.542
 8.   2 1 2 1 2 1 1 0 2 1 2 2 1 1 1 1 1 1 1 1 1    26      23  0.531
 9.   2 2 1 1 2 1 1 1 0 2 1 1 1 1 1 1 1 1 1 1 1    24      25  0.490
10.   1 2 1 2 1 2 2 2 1 0 2 1 1 1 1 1 1 1 1 1 1    26      22  0.542
11.   1 2 2 1 2 2 1 1 1 1 0 1 1 2 1 1 1 1 2 1 1    26      24  0.520
Pacific 10
12.   1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 2 2 1 1 2    23      26  0.469
13.   1 1 1 1 1 1 1 1 1 1 1 2 0 2 1 1 1 1 1 1 2    23      25  0.479
14.   1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 2 1 2 1 1    22      28  0.440
15.   1 1 1 1 1 1 1 1 1 1 1 2 2 2 0 1 2 1 2 2 2    27      21  0.551
16.   1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 0 2 2 2 1 2    27      21  0.551
17.   1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 0 1 2 1 1    23      26  0.469
18.   1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 2 0 1 1 2    24      24  0.500
19.   1 1 1 2 1 1 1 1 1 1 1 2 2 1 1 1 1 2 0 1 1    24      26  0.480
20.   1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 2 1 2 2 0 2    27      21  0.563
21.   1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 1 0    22      26  0.458
Judging from the 'win fraction' column, Big 10 still has a two-way tie (Northwestern and Purdue), while Pacific 10 has a preferred contender (Washington). The teams sent to the Rose Bowl in 2001 were Purdue and Washington, so the Bayesian accounting on the combined scoreboard is probably very close to giving the same results as the BCS computer.
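For readers who want to rebuild the combined scoreboard, here is a minimal Python sketch. It assumes the two conference scoreboards as printed earlier, and it takes the six interconference results to be the entries of 2 that lie outside the two conference blocks of the combined scoreboard, written as (winner, loser) team numbers. The names `combine`, `record` and `INTERCONFERENCE` are illustrative only.

```python
BIG10 = [
    [0, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1], [1, 0, 2, 1, 1, 2, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 2, 1, 2, 1, 2, 1, 1], [2, 2, 1, 0, 2, 1, 1, 2, 2, 1, 2],
    [2, 1, 1, 1, 0, 1, 1, 1, 1, 2, 1], [2, 1, 2, 1, 1, 0, 1, 2, 2, 1, 1],
    [2, 2, 1, 2, 2, 2, 0, 1, 1, 1, 2], [2, 1, 2, 1, 2, 1, 1, 0, 2, 1, 2],
    [2, 2, 1, 1, 2, 1, 1, 1, 0, 2, 1], [1, 2, 1, 2, 1, 2, 2, 2, 1, 0, 2],
    [1, 2, 2, 1, 2, 2, 1, 1, 1, 1, 0],
]
PAC10 = [
    [0, 1, 1, 1, 1, 2, 2, 1, 1, 2], [2, 0, 2, 1, 1, 1, 1, 1, 1, 2],
    [1, 1, 0, 1, 1, 2, 1, 2, 1, 1], [2, 2, 2, 0, 1, 2, 1, 2, 2, 2],
    [2, 1, 2, 2, 0, 2, 2, 2, 1, 2], [1, 2, 1, 1, 1, 0, 1, 2, 1, 1],
    [1, 2, 2, 1, 1, 2, 0, 1, 1, 2], [2, 2, 1, 1, 1, 1, 2, 0, 1, 1],
    [2, 2, 2, 1, 2, 1, 2, 2, 0, 2], [1, 1, 2, 1, 1, 2, 1, 1, 1, 0],
]
# (winner, loser) team numbers 1..21, read off the combined scoreboard.
INTERCONFERENCE = [(1, 14), (8, 12), (11, 14), (11, 19), (17, 9), (19, 4)]


def combine(board_a, board_b, cross_games):
    """Join two conference scoreboards and add the interconference wins."""
    na, nb = len(board_a), len(board_b)
    n = na + nb
    # Every pair of teams starts with one virtual win each.
    board = [[0 if r == c else 1 for c in range(n)] for r in range(n)]
    for r in range(na):
        board[r][:na] = board_a[r]        # upper-left block
    for r in range(nb):
        board[na + r][na:] = board_b[r]   # lower-right block
    for winner, loser in cross_games:
        board[winner - 1][loser - 1] += 1
    return board


def record(board, team_number):
    """Return (wins, losses, win fraction) from the row sum and column sum."""
    i = team_number - 1
    wins = sum(board[i])
    losses = sum(row[i] for row in board)
    return wins, losses, wins / (wins + losses)


combined = combine(BIG10, PAC10, INTERCONFERENCE)
for team in (1, 7, 10, 14):
    wins, losses, fraction = record(combined, team)
    print(team, wins, losses, f"{fraction:.3f}")
```

Computed this way, the row and column sums agree with the wins and losses columns printed for all 21 teams. One thing to be aware of: 27/(27 + 21) is about 0.563, so the 0.551 printed for teams 15 and 16 is not reproduced by this arithmetic.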
Running the contest algorithm on the three above scoreboards (after dividing the entries by games played), the relative applied strengths of the teams come out as shown below. (The units shown are arbitrary--for relative placement of teams only.)
Team   Separate scoreboards   Combined scoreboards
 1.            281                    919
 2.            283                    906
 3.            306                    946
 4.            364                   1038
 5.            285                    909
 6.            325                    983
 7.            365                   1060
 8.            343                   1035
 9.            323                    962
10.            374*                  1071*
11.            322                   1019
12.            356                    921
13.            355                    936
14.            332                    886
15.            466                   1070
16.            467                   1092
17.            333                    924
18.            379                    973
19.            360                    945
20.            469*                  1093*
21.            329                    901
(* highest strength in its conference)
The teams picked by the contest algorithm are those with the highest strength in each conference, and it is seen that the contest algorithm picks teams 10 and 20 with both the separate-scoreboard and combined-scoreboard approaches. These are the same teams (Purdue and Washington) that actually played in the Rose Bowl, so the contest algorithm made the same choice as the BCS program in this case. The 'win-fraction' column attached to the previous scoreboard had a tie between Big 10 teams 7 and 10, and that tie was broken by the algorithm. It is evident from the numbers that the choice is very close -- a photo finish, one might say -- and typical of the close finishes seen in other types of sports. The contest algorithm can be thought of as something which does the same job as the camera in a sprint or horse race: it removes the human element from the decision. The combined-scoreboard approach gives a more precise result, due to the inclusion of interconference data, but the extra precision did not change anything in this case.
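The contest analysis algorithm itself is described elsewhere on this site and is not reproduced here. Purely as a stand-in, the Python sketch below uses a different, well-known method, the Bradley-Terry model fitted by simple iteration, to show how a scoreboard can be turned into strengths that take the strength of each opponent into account. It is not the contest algorithm, its numbers should not be expected to match the table above, and the function name `bradley_terry` and the four-team example are assumptions made for illustration.

```python
def bradley_terry(board, iterations=200):
    """Estimate relative strengths from a scoreboard of pairwise wins.

    board[i][j] is the number of (real plus virtual) wins of i over j.
    Uses the standard iterative update for the Bradley-Terry model:
        p_i <- wins_i / sum_j (games_ij / (p_i + p_j))
    and rescales so the strengths average 1.
    """
    n = len(board)
    strength = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            wins = sum(board[i])
            denom = sum(
                (board[i][j] + board[j][i]) / (strength[i] + strength[j])
                for j in range(n) if j != i
            )
            updated.append(wins / denom)
        total = sum(updated)
        strength = [s * n / total for s in updated]
    return strength


# All-virtual scoreboard: nobody has played, so all strengths come out equal.
even = [[0 if r == c else 1 for c in range(4)] for r in range(4)]
print(bradley_terry(even))

# Hypothetical season: team 0 has beaten each of the other three teams once.
one_sided = [[0, 2, 2, 2],
             [1, 0, 1, 1],
             [1, 1, 0, 1],
             [1, 1, 1, 0]]
print(bradley_terry(one_sided))
```

Because every off-diagonal entry of a Bayesian scoreboard is at least 1, no team ends up with zero wins, which keeps every strength in this sketch strictly positive.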
Using interconference data from college games to select bowl contestants.
In the last section it was assumed that the Big 10 and Pacific 10 conferences were to go to the Rose Bowl, and the question had to do with picking the teams. In the old days the bowl games were always reserved for certain conferences, I think, and that was the case for the Rose Bowl in the 50's when I was a student in a Big 10 school. Back then, some team from the Big 10 went to the Rose Bowl every year, together with some team from a western conference (presumably Pacific 10). Whether some vestige of that old preference for certain conferences at certain bowl games remains today, I do not know, but applying contest analysis to the win-loss data of more of the college games will improve the selection of teams for bowl games in any case.
Using data from conferences which play some games with other conferences, we can rank conferences according to the average strengths of their teams. This can go all the way to inclusion of every college team and every conference, but I would hope that minor conferences (not of bowl class) could be left out to ease the labor of data entry. If each bowl game is supposed to draw its pair of teams from say four conferences, only those four conferences would have their data entered into the contest algorithm, and it would be run separately for each bowl game (as in the above example for two conferences). Selection could be the top two conferences, with team selection as a second step, or the top two teams from different conferences, with no second step. If conference affiliation is not to matter at all and the nation's two best teams (from any two different conferences) are to play in the highest-ranking bowl game, we would have to use the full set of all relevant college data to produce the applied strengths of all candidate teams and pick the bowl pairs sequentially by going down the list. (The question of rank of the bowl games is a separate one, already answered to everybody's satisfaction, no doubt. It had to be dealt with in years past -- maybe even by random drawing, except for giving the Rose Bowl permanent top rank by virtue of seniority.)
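One possible reading of "going down the list" can be sketched in Python using the combined-scoreboard strengths from the worked example. The pairing rule coded here (the top remaining team, then the next team down from a different conference) is an assumption about how the sequential selection might work, and treating these 21 teams as the whole candidate list is only for illustration. The names `pick_bowl_pairs`, `conference_of` and `conference_average` are hypothetical.

```python
# Combined-scoreboard strengths from the worked example, keyed by team number.
STRENGTH = {1: 919, 2: 906, 3: 946, 4: 1038, 5: 909, 6: 983, 7: 1060,
            8: 1035, 9: 962, 10: 1071, 11: 1019, 12: 921, 13: 936, 14: 886,
            15: 1070, 16: 1092, 17: 924, 18: 973, 19: 945, 20: 1093, 21: 901}


def conference_of(team):
    """Teams 1-11 are Big 10, teams 12-21 are Pacific 10."""
    return "Big 10" if team <= 11 else "Pacific 10"


def conference_average(strengths, name):
    """Average strength of a conference's teams, for ranking conferences."""
    values = [s for t, s in strengths.items() if conference_of(t) == name]
    return sum(values) / len(values)


def pick_bowl_pairs(strengths, n_bowls):
    """Fill bowls in rank order, each with two teams from different conferences."""
    remaining = sorted(strengths, key=strengths.get, reverse=True)
    pairs = []
    for _ in range(n_bowls):
        if not remaining:
            break
        first = remaining.pop(0)  # strongest team still available
        partner = next(
            (t for t in remaining if conference_of(t) != conference_of(first)),
            None,
        )
        if partner is None:       # no eligible opponent left
            break
        remaining.remove(partner)
        pairs.append((first, partner))
    return pairs


print(pick_bowl_pairs(STRENGTH, 3))  # [(20, 10), (16, 7), (15, 4)]
print(conference_average(STRENGTH, "Big 10"))
print(conference_average(STRENGTH, "Pacific 10"))
```

Under this rule the top bowl gets teams 20 and 10 (Washington and Purdue), the same pairing as in the worked example, and lower bowls are filled from the teams that remain.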
To appreciate the size of this problem, consider the number of conferences. Including the subdivisions, I counted 88 college conferences -- the last 40 or so looking decidedly minor to my unpracticed eye. Even if those were excluded, we might be talking about some number of college teams approaching 500. Interconference play is extensive; the Big 10 in 2000 played 44 conference games and 40 interconference games (not counting the Rose Bowl). Since only 6 of those latter games were with Pacific 10, one could estimate an involvement of six or seven conferences in a Big 10 season. If, say, seven conferences form a sort of conference of conferences, playing only among themselves, it could give a limit to the amount of data needed for contest analysis. If it is specified in advance that each bowl game is to have its teams drawn from a certain set of conferences, that would also give a nice limit to the amount of data needed, since extraneous interconference games would be excluded.
Alan E. Johnsrud
12 October 2009