URL for this frameset: http://slack.net/~whelan/tbrw/tbrw.cgi?2000/pairwise.shtml
For the better part of the past decade, the NCAA
Division I hockey tournament has been seeded largely by
statistical analysis. For the knowledgeable college hockey fan,
this means not only more confidence that the selection and
seeding is being done fairly without shady back-room deals, but
also a chance to predict in advance how the tournament will be
seeded. One impediment to this used to be the difficulty of
finding out ahead of time what rules the selection committee was
applying to make its decisions, but in March 1997 a major blow
was struck for public education on the process. After the
announcement of the tournament pairings for that year, Selection
Committee chair Joe Marsh provided a detailed
explanation of how it was done to Adam Wodon of US College Hockey
Online, a web site devoted to college hockey. That kicked off a
detailed effort by the committee to educate coaches and the public
about the selection process, and while
some of the details of the 1999 selections were left
unexplained, we still have a reasonable description of the
process from Marsh's original interview, supplemented by the
announcement of the changes introduced for 1999 and
occasional inquiries to the NCAA.
First of all, from the NCAA's point of view, only official games played between established Division I programs count towards the selection process. This season, those teams are
Two of the ten MAAC teams--Bentley and Mercyhurst--plus three of the six CHA teams--Alabama-Huntsville, Bemidji State, and Findlay--are still Division II this season, and games against them will not be used in the selection process.
The underlying principle behind the current selection process is the pairwise comparison. One team is compared to another team based on five criteria:
A team wins one point towards the comparison for each of the first four criteria, and one point for each head-to-head game in which they defeated the other team in the comparison. Whichever team gets more points wins the comparison, and if it's a tie, the team with the higher RPI wins.
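The scoring rule above can be sketched in Python for a single comparison: one point per criterion won among the first four, one point per head-to-head win, and RPI as the tiebreaker. This is a minimal sketch under stated assumptions, not the committee's actual code. The `ComparisonInput` container and its fields are hypothetical, and the four criterion values are placeholders (higher is better) standing in for whatever statistics the criteria use. Two further assumptions: a criterion on which the teams are exactly even awards no point, and an exact RPI tie goes to the first team passed in.

```python
from dataclasses import dataclass


@dataclass
class ComparisonInput:
    """One side of a pairwise comparison (hypothetical container)."""
    name: str
    rpi: float
    # Placeholder values for the first four criteria; higher is better.
    criteria: tuple
    # Head-to-head wins over the other team in this comparison.
    head_to_head_wins: int


def compare(a: ComparisonInput, b: ComparisonInput) -> str:
    """Return the name of the team that wins the pairwise comparison."""
    points_a = points_b = 0
    for value_a, value_b in zip(a.criteria, b.criteria):
        # One point to the team that is better on this criterion.
        # Assumption: an exactly even criterion awards no point.
        if value_a > value_b:
            points_a += 1
        elif value_b > value_a:
            points_b += 1
    # One point for each head-to-head game won.
    points_a += a.head_to_head_wins
    points_b += b.head_to_head_wins
    if points_a != points_b:
        return a.name if points_a > points_b else b.name
    # Tied on points: the higher RPI wins (assumption: exact RPI tie goes to a).
    return a.name if a.rpi >= b.rpi else b.name


# Hypothetical teams: criteria split 2-2, head-to-head split 1-1, so RPI decides.
x = ComparisonInput("Team X", 0.560, (0.560, 0.600, 0.600, 0.500), 1)
y = ComparisonInput("Team Y", 0.545, (0.545, 0.650, 0.550, 0.520), 1)
print(compare(x, y))  # Team X
```

Giving Team Y one more head-to-head win would flip the comparison, since each head-to-head game counts as much as a whole criterion.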
Every Team Under Consideration (TUC) is compared to every other TUC in this way. The total number of such comparisons won is called the Pairwise Rating (PWR). This number can be used to rank the TUCs, and in the past it was believed that the teams were seeded in the order of these Pairwise Rankings, but that is not precisely how it's done. The PWR is used to get a rough sense of which teams are in contention for which spots, but then those teams are placed according to the pairwise comparisons among or between them. For example, if you're battling it out for the twelfth and final spot in the postseason, it doesn't matter how you compare with the fifth-rated team. Since two contenders are separated by their direct comparison, which one team always wins (RPI breaks a tie in points), a two-way tie is impossible. If three teams end up in an unresolvable tie (rock-scissors-paper), we go to the RPI to resolve the deadlock.
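The PWR, and the placement of a few contenders by their mutual comparisons, can be sketched in the same spirit. Here `beats(a, b)` is a hypothetical callable that returns True when team `a` wins the comparison with `b`. Ordering a group by comparisons won inside the group, with RPI breaking whatever remains tied, is an assumed reading of the procedure. For two teams it reduces to their direct comparison, and for a rock-scissors-paper cycle it falls back entirely on RPI.

```python
from itertools import combinations


def pairwise_rating(teams, beats):
    """Count how many comparisons each team wins against the others listed."""
    pwr = {team: 0 for team in teams}
    for a, b in combinations(teams, 2):
        pwr[a if beats(a, b) else b] += 1
    return pwr


def order_contenders(group, beats, rpi):
    """Order a group of contenders using only the comparisons among them.

    Assumption: rank by comparisons won inside the group, then by RPI, which
    settles a rock-scissors-paper cycle where every team wins once.
    """
    inside = pairwise_rating(group, beats)
    return sorted(group, key=lambda team: (inside[team], rpi[team]), reverse=True)


# Hypothetical comparison results as (winner, loser) pairs: a three-way cycle.
results = {("A", "B"), ("B", "C"), ("C", "A")}


def beats(a, b):
    return (a, b) in results


rpi = {"A": 0.550, "B": 0.560, "C": 0.540}
print(pairwise_rating(["A", "B", "C"], beats))        # {'A': 1, 'B': 1, 'C': 1}
print(order_contenders(["A", "B", "C"], beats, rpi))  # ['B', 'A', 'C']
print(order_contenders(["A", "C"], beats, rpi))       # ['C', 'A']
```

Note that C finishes last in the three-way cycle on RPI, yet ranks ahead of A when only those two are in contention, because C wins their direct comparison.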
With the debut last year of the Metro-Atlantic Athletic Conference, and the resulting appearance on the scene of six teams newly eligible for the Division I tournament and playing the lion's share of their games against one another, some of the weaknesses of the RPI and PWR systems were brought to light. MAAC regular-season champion Quinnipiac finished the season ranked 12th in the RPI and with a pairwise comparison advantage over all but 9 teams in the nation, but were not included in the NCAA's field of 12. This was presumably related to the following paragraph in the NCAA News report on the Summer 1998 Division I Men's Ice Hockey Committee meeting:
In addition to revising one of its selection criteria, the committee noted that it reserves the right to evaluate each team based on the relative strength of their respective conference using the overall conference ratings percentage index (RPI) in determining competitive equity.
It's not known exactly what measure was used to determine this lack of competitive equity, but with no games last season between the MAAC and any of the four established conferences, the best information on the relative strengths of the conferences was their respective performance against the four (last season) Division I Independents, which is summarized in the following table. (The average RPI of all the teams in the conference, which may be alluded to in the paragraph above, is also included for reference.)
| Conference | Avg RPI | vs Indies | vs Army | vs Niagara | vs Air Force | vs MSU-Mankato |
|---|---|---|---|---|---|---|
At any rate, the reason for Quinnipiac's deceptively high RPI and PWR last season is no big mystery. RPI attempts to correct a team's winning percentage for their strength of schedule by mixing it with the average winning percentage of their opponents. However, if those opponents have also played abnormally weak schedules, their winning percentages will be a poor indicator of their strength, and hence of the schedule strength of the team in question. According to the more sophisticated KRACH rating system, Quinnipiac was rated #41 out of 52 teams. The pairwise comparison algorithm is even more fragile, as the "Last 16" and "Teams Under Consideration" criteria make no allowance for strength of schedule at all, simply comparing the teams' winning percentages in those games. Niagara, despite having a low RPI, was able to win a few key comparisons last year by accumulating good records against weak teams in their last 16 games and against teams which accumulated winning records against weak schedules.
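To see why a closed league can inflate RPI, here is a deliberately simplified version that blends a team's winning percentage with the average winning percentage of its opponents. The 50/50 `own_weight`, counting a tie as half a win, and including games against the team itself in its opponents' records are all simplifying assumptions. The official RPI uses its own weights and additional terms. Still, the sketch shows the core problem: two leagues that never play each other give their champions identical ratings, however different their true strength.

```python
from collections import defaultdict


def winning_pcts(games):
    """Winning percentage per team. Each game is (team_a, team_b, score_a) with
    score_a = 1.0 for a team_a win, 0.5 for a tie, 0.0 for a loss (assumption:
    a tie counts as half a win)."""
    points, played = defaultdict(float), defaultdict(int)
    for team_a, team_b, score_a in games:
        points[team_a] += score_a
        points[team_b] += 1.0 - score_a
        played[team_a] += 1
        played[team_b] += 1
    return {team: points[team] / played[team] for team in played}


def simple_rpi(games, own_weight=0.5):
    """Blend each team's winning pct with its opponents' average winning pct.

    Simplified, hypothetical weighting: not the official RPI formula, and
    opponents' records here include their games against the team itself.
    """
    wp = winning_pcts(games)
    opponent_wps = defaultdict(list)
    for team_a, team_b, _ in games:
        opponent_wps[team_a].append(wp[team_b])
        opponent_wps[team_b].append(wp[team_a])
    return {
        team: own_weight * wp[team] + (1 - own_weight) * sum(opps) / len(opps)
        for team, opps in opponent_wps.items()
    }


# Two hypothetical two-team leagues that never meet each other.
games = [
    ("Strong1", "Strong2", 1.0), ("Strong1", "Strong2", 1.0),
    ("Weak1", "Weak2", 1.0), ("Weak1", "Weak2", 1.0),
]
rpi = simple_rpi(games)
print(rpi["Strong1"], rpi["Weak1"])  # 0.5 0.5
```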
The bottom line is that the committee is at liberty to leave CHA and MAAC teams out of the tournament on the basis of the relative weakness of their schedules, even if their pairwise comparisons would otherwise entitle them to a berth. (Unfortunately, this method cannot correct for the other consequences of RPI's shortcomings, such as the potential overvaluing of top MAAC and CHA opponents appearing on major conference teams' schedules this season.) Here is a table of each conference's average RPI and their record vs each other conference; additionally, the team with the best RPI in the conference is listed as well as the average RPI of their conference opponents.
| Conference | Avg RPI | vs HE | vs WCHA | vs CCHA | vs CHA | vs ECAC | vs MAAC | Leader | Opp RPI |
|---|---|---|---|---|---|---|---|---|---|
| Hockey East (H) | .5320 | | 13-6 | 10-7 | 3-2-1 | 26-15-3 | 5-0 | Me | .5225 |
For comparison, here is how the KRACH rating system predicts each conference would fare if each of its teams played each Division I team in each other conference once.
| Conference | vs Hockey East | vs WCHA | vs CCHA | vs ECAC | vs CHA | vs MAAC |
|---|---|---|---|---|---|---|
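KRACH is a Bradley-Terry style rating: a team rated K_i is expected to beat a team rated K_j with probability K_i / (K_i + K_j). Once the ratings are known, the conference-versus-conference prediction described above amounts to averaging that probability over every cross-conference pairing. The ratings below are made-up inputs, and fitting actual KRACH ratings from game results is a separate step that this sketch does not show.

```python
def expected_win_prob(k_i, k_j):
    """Bradley-Terry (KRACH-style) chance that a team rated k_i beats one rated k_j."""
    return k_i / (k_i + k_j)


def conference_matchup(ratings_a, ratings_b):
    """Expected winning pct of conference A if every A team played every B team once."""
    total = sum(expected_win_prob(ka, kb) for ka in ratings_a for kb in ratings_b)
    return total / (len(ratings_a) * len(ratings_b))


# Hypothetical ratings, not real KRACH values.
conf_a = [300.0, 100.0]
conf_b = [100.0, 100.0]
print(conference_matchup(conf_a, conf_b))  # 0.625
print(conference_matchup(conf_b, conf_a))  # 0.375
```

Because each pairing's two probabilities sum to one, the two directions of a matchup always add up to 1.0.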
The NCAA tournament consists of twelve teams, divided for the first round and a half into two regionals, East and West. In each regional, two teams receive first-round byes while the other four play on the first night. On the second night of the regional, the two bye teams play the two first-round winners, with the two survivors from each regional then advancing to the national semifinals the following weekend. The selection and seeding process can be divided into the following steps:
The regular season and tournament champions in each of the four major conferences (WCHA, CCHA, ECAC and Hockey East) receive automatic berths, which account for between four and eight of the twelve teams, depending on how many teams win both titles. (The MAAC and CHA do not receive any automatic bids.) The remaining four to eight at-large teams are selected according to the pairwise method, with one stipulation: each major conference must have at least two representatives in the tourney.
This is one of the places where our understanding of the process is still a little lacking. We know that the committee gives "obvious" at-large bids to teams that win comparisons with the rest of the candidates, then scrutinizes the "bubble" teams by comparing them individually to one another. Usually, the precise mechanics of this process are irrelevant, but in the 1999 selections, there were between two and four conceivable sets of tournament teams depending on how the bubble was pared down. We know that Ohio State and Northern Michigan got the last two bids in that particular season, but there were a couple of different lines of reasoning that could have given that result, and the selection committee hasn't explained which one was used.
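The bid arithmetic from the selection step can be sketched as follows, assuming the at-large candidates arrive already ordered by the pairwise method. The function name, its inputs, and the team names are hypothetical. Because the committee's handling of the bubble and of the two-representatives rule isn't fully documented, the sketch only flags a major conference that falls short and does not guess how the committee would repair the field.

```python
MAJOR_CONFERENCES = ("WCHA", "CCHA", "ECAC", "Hockey East")


def select_field(regular_champs, tourney_champs, ranked_candidates,
                 conference_of, field_size=12):
    """Fill the field: automatic bids first, then at-large teams in order.

    regular_champs / tourney_champs map each major conference to its champion;
    ranked_candidates is assumed to be ordered by the pairwise method.
    Returns the field and any major conference with fewer than two teams.
    """
    field = []
    for conf in MAJOR_CONFERENCES:
        for champ in (regular_champs[conf], tourney_champs[conf]):
            if champ not in field:  # a double champion takes only one berth
                field.append(champ)
    for team in ranked_candidates:
        if len(field) == field_size:
            break
        if team not in field:
            field.append(team)
    # Flag, rather than repair, a shortfall: the committee's fix isn't documented.
    short = [conf for conf in MAJOR_CONFERENCES
             if sum(conference_of[t] == conf for t in field) < 2]
    return field, short


# Hypothetical teams; the first letter stands for the conference.
prefix = {"W": "WCHA", "C": "CCHA", "E": "ECAC", "H": "Hockey East"}
regular = {"WCHA": "W1", "CCHA": "C1", "ECAC": "E1", "Hockey East": "H1"}
tourney = {"WCHA": "W1", "CCHA": "C2", "ECAC": "E1", "Hockey East": "H1"}
ranked = ["W2", "W3", "H2", "C3", "W4", "H3", "H4", "E2"]
all_teams = list(regular.values()) + list(tourney.values()) + ranked
conference_of = {t: prefix[t[0]] for t in all_teams}
field, short = select_field(regular, tourney, ranked, conference_of)
print(len(field), short)  # 12 ['ECAC']
```

In this example three double champions leave five automatic bids and seven at-large spots, and the ECAC ends up with only its champion.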
Any major conference team which wins both the regular season title in its league and its conference tournament receives an automatic first-round bye; since there are two conferences in each region and two byes in each regional, this will fill between zero and two of those slots in a given region. The other bye(s), if any, are given to the best team(s) in the appropriate region(s) according to a pairwise analysis.
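A sketch of the bye assignment for one regional. It assumes the East regional draws on the ECAC and Hockey East and the West on the WCHA and CCHA. It also assumes that `region_ranking` lists the region's tournament teams already ordered by pairwise analysis. Team names are hypothetical, and deciding which bye team becomes the one seed is left to a separate comparison.

```python
# Assumed regional alignment of the four major conferences.
REGION_CONFERENCES = {"East": ("ECAC", "Hockey East"), "West": ("WCHA", "CCHA")}


def assign_byes(region, regular_champs, tourney_champs, region_ranking):
    """Return the two bye teams for one regional.

    A team that won both its league's regular season and tournament gets an
    automatic bye; the rest go down region_ranking (assumed to be the region's
    tournament teams ordered by pairwise analysis).
    """
    byes = [regular_champs[conf] for conf in REGION_CONFERENCES[region]
            if regular_champs[conf] == tourney_champs[conf]]
    for team in region_ranking:
        if len(byes) == 2:
            break
        if team not in byes:
            byes.append(team)
    return byes


# Hypothetical East results: Hockey East has a double champion, ECAC does not.
regular = {"ECAC": "E1", "Hockey East": "H1"}
tourney = {"ECAC": "E2", "Hockey East": "H1"}
print(assign_byes("East", regular, tourney, ["E2", "H1", "E1", "H2"]))  # ['H1', 'E2']
```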
There are now four remaining spots in each regional to fill with the other eight teams. If those eight teams are evenly divided, four from each region, the two better teams in each region play in their respective regionals, while the two lower teams are "shipped out" to play in the opposite region. If there is an imbalance, the bottom team(s) from the over-represented region are placed into the other region before the swap. However, the host schools (Minnesota in the West and Rensselaer in the East this year) must be kept in their own regions. See also the fine-tuning step described below.
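The balanced case of the swap can be sketched as follows. Each region's four non-bye teams are assumed to arrive best first. The top two stay home and the bottom two trade places. Imbalanced splits and the exact host-school rule are not modeled. The sketch simply refuses an imbalanced input and raises an error if a host would be shipped out. Team names are hypothetical.

```python
def assign_regionals(east, west, hosts=()):
    """Place the eight non-bye teams, four per regional (balanced 4/4 case only).

    east / west: each region's non-bye teams, best first by pairwise comparison.
    The top two stay home and the bottom two are shipped to the other regional.
    """
    if len(east) != 4 or len(west) != 4:
        raise ValueError("this sketch only handles an even 4/4 split")
    shipped = east[2:] + west[2:]
    stranded = [team for team in hosts if team in shipped]
    if stranded:
        # Hosts must stay in their own region; the committee would rearrange
        # here, which this sketch does not attempt.
        raise ValueError(f"host school would be shipped out: {stranded}")
    return east[:2] + west[2:], west[:2] + east[2:]


# Hypothetical non-bye teams in pairwise order.
east = ["E3", "H4", "E5", "H6"]
west = ["W3", "C4", "W5", "C6"]
east_regional, west_regional = assign_regionals(east, west)
print(east_regional)  # ['E3', 'H4', 'W5', 'C6']
print(west_regional)  # ['W3', 'C4', 'E5', 'H6']
```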
Once the four non-bye teams in each regional are determined, they are placed in seeds three through six according to their pairwise comparisons. The four and five seeds play in the first round, with the winner to face the one seed, while the three and six seeds meet for the right to play the two seed.
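Seeding a regional then comes down to pairing slots. In this sketch, `bye_teams` is assumed to be ordered one seed first, and `non_bye_ranked` lists seeds three through six in pairwise order. The result pairs each bye team with the first-round game whose winner it will face. Team names are hypothetical.

```python
def regional_bracket(bye_teams, non_bye_ranked):
    """Pair each bye team with the first-round game whose winner it will face.

    bye_teams: [one seed, two seed]; non_bye_ranked: seeds three through six,
    best first by pairwise comparison.
    """
    seeds = dict(enumerate(list(bye_teams) + list(non_bye_ranked), start=1))
    return [
        (seeds[1], (seeds[4], seeds[5])),  # 4 vs 5, winner meets the one seed
        (seeds[2], (seeds[3], seeds[6])),  # 3 vs 6, winner meets the two seed
    ]


print(regional_bracket(["H1", "E2"], ["E3", "H4", "W5", "C6"]))
# [('H1', ('H4', 'W5')), ('E2', ('E3', 'C6'))]
```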
At this point, we have a setup for the tournament according to the numbers, but there could be other problems with it. For instance, all four first-round contests could be rematches of the conference title games, or the teams with the biggest fan bases could be playing outside of their regions. These are both considered undesirable by the NCAA, so the committee can shuffle things a bit, either by altering the seedings within a region, or choosing to send different teams to the opposite regionals. First-round intra-conference matchups are positively verboten, and potential second-round games between teams in the same conference should be avoided, especially if the teams met in their conference playoffs. If two teams are swapped within a region to eliminate a second-round matchup, the other two teams will be swapped as well to retain the first-round pairings, if that doesn't cause more problems. Also, teams can be shifted to different regions to increase attendance.
This is the one part of the selection procedure which is really a judgement call on the committee's part, and thus the most unpredictable. Ordering teams within a regional is basically deterministic, but when deciding which non-bye teams go in which regional, the committee is supposed to consider
How much weight they give to each is completely unspecified, although attendance seems to be very important, while conference considerations are not a big priority in populating the regionals. The best way to guess what they'll do is to look at historical precedent.
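The mechanical part of the fine tuning, which covers spotting same-conference games and moving a first-round game to the other bye team, can be sketched on a regional bracket represented as `[(bye_team, (team_x, team_y)), ...]`, with the 4-5 game listed under the one seed. Swapping seeds three and four together with five and six keeps both first-round pairings but changes which bye team each winner meets. The attendance and other judgement calls discussed above are not modeled, and all names are hypothetical.

```python
def conference_conflicts(bracket, conference_of):
    """Find same-conference first-round games and possible second-round games.

    bracket: [(bye_team, (team_x, team_y)), ...] for one regional.
    """
    first_round, second_round = [], []
    for bye_team, (x, y) in bracket:
        if conference_of[x] == conference_of[y]:
            first_round.append((x, y))  # never allowed
        for team in (x, y):
            if conference_of[team] == conference_of[bye_team]:
                second_round.append((bye_team, team))  # avoid if possible
    return first_round, second_round


def flip_halves(bracket):
    """Swap seeds 3<->4 and 5<->6 together: first-round pairings are kept,
    but each game's winner now meets the other bye team."""
    (bye_1, game_a), (bye_2, game_b) = bracket
    return [(bye_1, game_b), (bye_2, game_a)]


# Hypothetical regional: each bye team could meet a conference rival in round two.
conference_of = {"H1": "Hockey East", "H4": "Hockey East", "E2": "ECAC",
                 "E3": "ECAC", "W5": "WCHA", "C6": "CCHA"}
bracket = [("H1", ("H4", "W5")), ("E2", ("E3", "C6"))]
print(conference_conflicts(bracket, conference_of))
# ([], [('H1', 'H4'), ('E2', 'E3')])
print(conference_conflicts(flip_halves(bracket), conference_of))
# ([], [])
```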
To get a detailed blow-by-blow of how this all works, you can read my description of how the 1998 tournament was seeded, as well as a summary of the seeding decisions from 1996-1998. My prediction of the 1999 seedings ultimately guessed the committee's behavior incorrectly in a couple of places, but it contains a list of alternate possibilities for the dreaded "bubble identification" stage.
Finally, to learn interactively, there's the tournament selection script "You Are The Committee", which also offers a self-serve what-if interface that lets you change some of the results and see how that would have affected the ratings used for tournament selection.