Motivated by the results of the Men's event at this year's European Championships, earlier this year ISU President Ottavio Cinquanta asked the Technical Committees to review the current scoring system and to look into ways of "improving" it. The ordinal system is criticised by Cinquanta because it allows place switching for skaters who have already performed as the interim results are calculated and displayed. In addition, he has described it as too difficult for the public to understand, which he feels leads the public to question the integrity of the results.
At the upcoming Nebelhorn Trophy (Oberstdorf), a revised method of calculating results will be tested. This system, known as "OBO" (for "One-by-One"), is intended to eliminate place switching as interim results are calculated. The method works as follows:
The judges assign their marks for each performance as in the current ordinal method. The total marks (with tie breakers) for each skater are then compared to the marks for every other skater on a one-on-one (or head-to-head, if you prefer) basis. In each head-to-head comparison the number of judges favoring one of the two skaters over the other is counted, with ties counted as being in favor of both skaters. When a majority of the judges (5 or more on a 9 judge panel) favor a skater in the head-to-head comparison, a "win" is counted for that skater for the two performances compared.
The order of finish is determined by the number of wins scored by each skater; i.e., the skater with the largest number of wins places first. If two skaters have an equal number of wins, the total number of judges in favor of each skater is used to break the tie. If the total count of judges in favor is the same, the skaters are tied.
Consider now the following set of ordinals for a four skater competition. [Note: Although OBO does not use ordinals, comparing ordinals is mathematically equivalent to comparing total marks with tie breakers, and using ordinals simplifies comparison of the two systems.]
Skater | J1 | J2 | J3 | J4 | J5 | J6 | J7 | J8 | J9 |
Skater A | 3 | 2 | 4 | 3 | 3 | 4 | 2 | 3 | 3 |
Skater B | 4 | 3 | 3 | 4 | 4 | 3 | 4 | 4 | 4 |
Skater C | 1 | 1 | 2 | 2 | 1 | 1 | 1 | 2 | 1 |
Skater D | 2 | 4 | 1 | 1 | 2 | 2 | 3 | 1 | 2 |
Now compare every skater to every other skater. The following table shows this comparison as carried out under OBO. For each skater in the following table read along each row to see how that skater compares to the other skaters.
Skater | vs. A | vs. B | vs. C | vs. D | Wins | Judges in favor | Place |
Skater A | n.a. | 7 judges favor A, 1 win for A | 0 judges favor A, 0 wins for A | 2 judges favor A, 0 wins for A | 1 | 9 | 3 |
Skater B | 2 judges favor B, 0 wins for B | n.a. | 0 judges favor B, 0 wins for B | 1 judge favors B, 0 wins for B | 0 | 3 | 4 |
Skater C | 9 judges favor C, 1 win for C | 9 judges favor C, 1 win for C | n.a. | 6 judges favor C, 1 win for C | 3 | 24 | 1 |
Skater D | 7 judges favor D, 1 win for D | 8 judges favor D, 1 win for D | 3 judges favor D, 0 wins for D | n.a. | 2 | 18 | 2 |
Note that the above result is identical to that determined using the majority principle in the current ordinal method.
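The head-to-head tally above can be reproduced with a short script. This is only a sketch of the method as described in this article, not an official ISU calculation; the names `obo` and `obo_order` are ours, and ordinals stand in for total marks with tie breakers.

```python
from itertools import combinations

# Ordinals from the four-skater example above (judges J1..J9).
ordinals = {
    "A": [3, 2, 4, 3, 3, 4, 2, 3, 3],
    "B": [4, 3, 3, 4, 4, 3, 4, 4, 4],
    "C": [1, 1, 2, 2, 1, 1, 1, 2, 1],
    "D": [2, 4, 1, 1, 2, 2, 3, 1, 2],
}

def obo(ordinals):
    """Return {skater: (wins, judges_in_favor)} from head-to-head comparisons."""
    n_judges = len(next(iter(ordinals.values())))
    majority = n_judges // 2 + 1          # 5 on a 9 judge panel
    wins = {s: 0 for s in ordinals}
    favor = {s: 0 for s in ordinals}
    for x, y in combinations(ordinals, 2):
        # A lower ordinal is better; a tie counts in favor of both skaters.
        for_x = sum(a <= b for a, b in zip(ordinals[x], ordinals[y]))
        for_y = sum(b <= a for a, b in zip(ordinals[x], ordinals[y]))
        favor[x] += for_x
        favor[y] += for_y
        wins[x] += for_x >= majority
        wins[y] += for_y >= majority
    return {s: (wins[s], favor[s]) for s in ordinals}

def obo_order(ordinals):
    """Skaters sorted by wins, then by judges in favor (best first)."""
    result = obo(ordinals)
    return sorted(result, key=lambda s: result[s], reverse=True)

print(obo(ordinals))
# {'A': (1, 9), 'B': (0, 3), 'C': (3, 24), 'D': (2, 18)}
print(obo_order(ordinals))
# ['C', 'D', 'A', 'B']
```

Skaters who are equal on both wins and judges in favor are genuinely tied under OBO; the sort simply leaves them in their input order.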
The value of the OBO method, according to its proponents, is that place switching is virtually impossible. Its critics would argue, however, that this comes at the expense of OBO giving the wrong answer in close competitions. Proponents also claim this system is more understandable and more manageable than the current method for manual calculations. Others, however, might disagree. It is difficult to imagine, for example, manually calculating results in real time using this method with the same ease many spectators currently do using the majority principle.
Going back to our four skater example consider the following set of ordinals using the majority principle to determine place.
Skater | J1 | J2 | J3 | J4 | J5 | J6 | J7 | J8 | J9 | Majority | TOM | TO | Place |
Skater A | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 4 | 8/2 | 12 | 16 | 2 |
Skater B | 2 | 2 | 2 | 2 | 3 | 1 | 1 | 1 | 1 | 8/2 | 12 | 15 | 1 |
Skater C | 3 | 3 | 3 | 3 | 1 | 3 | 3 | 3 | 2 | 9/3 | | | 3 |
Skater D | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 9/4 | | | 4 |
Assuming the judges give the same ordinals as above regardless of skating order, if Skater C skates after both A and B, the interim results will show Skater A ahead of Skater B (with 5 firsts); but after Skater C performs the results of A and B will reverse. Using OBO, however, Skater A will be placed ahead of Skater B regardless of starting order. There will be no place switching.
The reason for this difference is the following. In the majority method, Skater A has a majority of firsts over Skater B before Skater C performs. However, because Judge 5 places Skater C first, Skater A loses his majority of firsts and neither A nor B ends up with a majority of firsts after Skater C performs. Further, because of the difference of opinion among the panel, Skaters A and B each end up with eight marks of second or better, as well as a tie in the Total Ordinals of the Majority (TOM). The final decision comes down to the difference between the one third place mark for Skater B and the fourth place mark for Skater A. At a more basic level, if we assume that all nine judges' opinions are equally valid, a simple average of the ordinals would say that Skater B should win (which is basically what the Total Ordinals (TO) tie breaker does).
Using OBO, Skater A wins because five of the nine judges placed A ahead of B. The ordinals/total marks used to place A ahead of B are irrelevant. The fourth place mark from Judge 9 is irrelevant, and the difference between that mark and the third place mark from Judge 5 is also irrelevant. OBO reduces the decision to a simple majority vote of the panel for A over B.
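The place switch under the majority principle, and its absence under OBO, can be checked with the sketch below. The majority rule is implemented here as a simplified sort key (place at which a majority is reached, size of majority, TOM, TO), which is our approximation of the current method and not the full rulebook procedure. The helper names (`rerank`, `majority_key`, `majority_order`, `obo_order`) are ours, and interim ordinals are derived on the assumption that each judge ranks the skaters identically regardless of skating order.

```python
from itertools import combinations

# Ordinals from the second four-skater example (judges J1..J9).
final = {
    "A": [1, 1, 1, 1, 2, 2, 2, 2, 4],
    "B": [2, 2, 2, 2, 3, 1, 1, 1, 1],
    "C": [3, 3, 3, 3, 1, 3, 3, 3, 2],
    "D": [4, 4, 4, 4, 4, 4, 4, 4, 3],
}

def rerank(ordinals, skated):
    """Ordinals each judge would show if only `skated` had performed."""
    n_judges = len(ordinals[skated[0]])
    out = {s: [] for s in skated}
    for j in range(n_judges):
        for s in skated:
            better = sum(ordinals[t][j] < ordinals[s][j] for t in skated)
            out[s].append(1 + better)
    return out

def majority_key(marks):
    """Simplified key: (place of majority, -size of majority, TOM, TO)."""
    need = len(marks) // 2 + 1
    for place in range(1, max(marks) + 1):
        within = [m for m in marks if m <= place]
        if len(within) >= need:
            return (place, -len(within), sum(within), sum(marks))

def majority_order(ordinals):
    return sorted(ordinals, key=lambda s: majority_key(ordinals[s]))

def obo_order(ordinals):
    need = len(next(iter(ordinals.values()))) // 2 + 1
    wins = {s: 0 for s in ordinals}
    favor = {s: 0 for s in ordinals}
    for x, y in combinations(ordinals, 2):
        for_x = sum(a <= b for a, b in zip(ordinals[x], ordinals[y]))
        for_y = sum(b <= a for a, b in zip(ordinals[x], ordinals[y]))
        favor[x] += for_x
        favor[y] += for_y
        wins[x] += for_x >= need
        wins[y] += for_y >= need
    return sorted(ordinals, key=lambda s: (wins[s], favor[s]), reverse=True)

interim = rerank(final, ["A", "B"])      # only A and B have skated

print(majority_order(interim))  # ['A', 'B']
print(majority_order(final))    # ['B', 'A', 'C', 'D']  (A and B switch)
print(obo_order(interim))       # ['A', 'B']
print(obo_order(final))         # ['A', 'B', 'C', 'D']  (no switch)
```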
OBO's approach of numerically ignoring differences in the marks that separate the skaters by several places is an extension of the idea behind the use of ordinals instead of total marks.
When describing the ordinal method it is typically pointed out that judges mark on somewhat different scales when assigning their marks, and that as an extreme example one judge might score an event using marks between 5.0 and 6.0 while another judge might use marks between 1.0 and 2.0. The use of ordinals compensates for this possibility. This example, however, is an oversimplification. First, judges don't actually give marks that differ from each other by that much. More importantly though, if the only differences between the judges' point scales were constant numerical shifts from one judge to the next, a simple average of the marks would work just fine. What the use of ordinals does, instead, is eliminate the variations from judge to judge in the point differences assigned between each place. In other words, it eliminates the numerical significance of one skater beating the next lowest place by 0.1 or 0.2 or 0.3, etc.
For example, suppose five of nine judges each place Skater-A 0.1 points above Skater-B, while the other four place Skater-B 0.2 points above Skater-A. Numerically, Skater-B should win (by a combined 0.3 points), but in using ordinals the actual numerical differences between the places are irrelevant and all that matters is that Skater-A has a five to four majority over Skater-B.
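The arithmetic of this example is small enough to check directly. The snippet below is illustrative only; the per-judge margins are the hypothetical ones from the paragraph above, and the variable names are ours.

```python
# Hypothetical margins (A's mark minus B's mark) for each of nine judges:
# five judges prefer A by 0.1, four prefer B by 0.2.
margins = [0.1] * 5 + [-0.2] * 4

total_margin = sum(margins)                   # about -0.3: B leads on points
judges_for_a = sum(m > 0 for m in margins)    # 5 of 9 judges prefer A

points_winner = "A" if total_margin > 0 else "B"
ordinal_winner = "A" if judges_for_a > len(margins) / 2 else "B"

print(points_winner, ordinal_winner)  # B A
```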
OBO carries this idea one step farther and ignores the point differences assigned between several places. In terms of ordinals, OBO says that not only should the numerical differences in the marks assigned the skaters not be trusted, the numerical differences between the ordinals should not be trusted either. [More accurately, the idea behind OBO is that numerical differences in total marks which result in place differences of several ordinals are no more believable than differences in the marks that result in only one place difference.] While there is some statistical evidence supporting the former assumption, the latter view is probably a bit extreme. Nevertheless, going back to the four skaters in our example above, if you believe that the actual numerical values of the ordinals are meaningful then OBO gives the wrong answer in a nearly evenly split panel. If you believe the actual ordinal values are not meaningful then OBO gives a right answer; though not, perhaps, the right answer.
Another approach to looking at the value of OBO is to subject it to the same kind of statistical analysis described in the May issue of ISIO for several other scoring methods. We have tested OBO in the same way and for the same scenarios as described there. OBO was tested with small, moderate and large random errors; with small systematic biases from 1 through 3 judges together with small and moderate random errors; and with large systematic biases from 1 through 3 judges together with small and moderate random errors. With random errors only, and with small biases with random errors it was found that OBO statistically performs nearly identically to the current method in terms of overall accuracy (within 1-2%), but generates ties about 2-3 times more frequently (up to about 6% of the time). For large systematic errors with either small or moderate random errors, OBO gives results that are 15-20% less accurate than the current method. No cases were found where OBO performed significantly better than the current method, nor did OBO perform as well as the Median Mark with Tie Breakers method or the Median Range method, both of which perform better than the current method.
We also tested the assertion that OBO gives the same results as the ordinal method most of the time. For various scenarios involving small, medium, and large random errors we compared the results using OBO and the ordinal method for the same sets of marks. For each scenario tested, groups of 1000 synthetic competitions were computed several times. For best case judging (small random errors), OBO and the ordinal method produced identical sheets 89 to 92% of the time for the scenarios tested. In terms of the placements of the skaters, 1-3% of the skaters were placed differently in the two methods. For typical judging situations (moderate random errors), 64% of the sheets were identical and 12.4% of the skaters were placed differently. For worst case judging (large random errors), 32 to 36% of the sheets produced identical results with 15 to 28% of the skaters placed differently in the two methods.
In summary:
Note-1 added after initial posting.
Following the posting of this article, Sandra Loosemore provided some examples that illustrate some of the issues discussed above. (The following comments are ours, not Sandra's.)
The proponents of OBO only claim that place switching is less frequent using OBO compared to the current system, not that it entirely eliminates it. The following case shows an example of ordinals that can lead to place switching under both OBO and the current system.
Skater | J1 | J2 | J3 | J4 | J5 | J6 | J7 | J8 | J9 | Majority |
Skater A | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 7/2 |
Skater B | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 1 | 1 | 6/2 |
Skater C | 3 | 3 | 3 | 3 | 1 | 1 | 1 | 2 | 2 | 5/2 |
If Skaters A and C skate before B, Skater C will be ahead of Skater A in the interim results, but after B skates, A and C will swap places using both OBO and the current system. In addition, Skaters B and C will end up tied with 1 "win" each, and 8 judges in favor of each using OBO. Skaters B and C are not, however, tied using the current system (although it is interesting to note that using both the median ordinal and the average ordinal B and C would be considered numerically tied).
Now add a fourth skater to the competition and have only one judge favor Skater D over C.
Skater | J1 | J2 | J3 | J4 | J5 | J6 | J7 | J8 | J9 | Majority |
Skater A | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 7/2 |
Skater B | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 1 | 1 | 6/2 |
Skater C | 4 | 3 | 3 | 3 | 1 | 1 | 1 | 2 | 2 | 5/2 |
Skater D | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 9/4 |
If Skaters A and C skate before B and D, Skater C is ahead of A in the interim results but ends up third in the final results using both OBO and the current method. By losing one judge to Skater D, Skater C is no longer tied with B using OBO.
None of the above combinations of ordinals is particularly far-fetched, and they thus show that while OBO may produce place switching less often (and we don't know how much less often) it still will occur for not-unusual cases of split panels. The first case also illustrates how it is that OBO produces ties more often than does the current method.
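Both cases in this note can be verified with a few lines of code. The sketch below is ours (the function name `obo` and its `skated` parameter are illustrative, not part of any official procedure). Because OBO only compares skaters pairwise, interim results can be computed by restricting the comparison to those who have skated, assuming each judge's relative ranking does not depend on skating order.

```python
from itertools import combinations

def obo(ordinals, skated=None):
    """Return {skater: (wins, judges_in_favor)} among the skaters in `skated`."""
    skated = list(ordinals) if skated is None else skated
    need = len(ordinals[skated[0]]) // 2 + 1
    score = {s: [0, 0] for s in skated}
    for x, y in combinations(skated, 2):
        # Lower ordinal is better; ties count in favor of both skaters.
        for_x = sum(a <= b for a, b in zip(ordinals[x], ordinals[y]))
        for_y = sum(b <= a for a, b in zip(ordinals[x], ordinals[y]))
        score[x][0] += for_x >= need
        score[y][0] += for_y >= need
        score[x][1] += for_x
        score[y][1] += for_y
    return {s: tuple(v) for s, v in score.items()}

three = {
    "A": [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "B": [2, 2, 2, 2, 3, 3, 3, 1, 1],
    "C": [3, 3, 3, 3, 1, 1, 1, 2, 2],
}
four = {
    "A": [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "B": [2, 2, 2, 2, 3, 3, 3, 1, 1],
    "C": [4, 3, 3, 3, 1, 1, 1, 2, 2],
    "D": [3, 4, 4, 4, 4, 4, 4, 4, 4],
}

print(obo(three, ["A", "C"]))  # {'A': (0, 4), 'C': (1, 5)}  C leads A
print(obo(three))  # {'A': (1, 11), 'B': (1, 8), 'C': (1, 8)}  B, C tied
print(obo(four))
# {'A': (2, 20), 'B': (2, 17), 'C': (2, 16), 'D': (0, 1)}
```

In the three-skater case every skater has one win, so the order is decided entirely by judges in favor, which is where the B and C tie arises.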
We thank SL for her input.
Note-2 added after initial posting.
The following additional comments are based on information provided to us by Steve Hazen following the initial posting. In addition, the text of the article has been edited to add the results of some calculations SH suggested we include.
As was the case here, most descriptions of OBO are presented in terms of comparing ordinals. This is the easiest way to explain the method and at the same time it allows a comparison of OBO with the ordinal method to see its similarities and differences. In terms of actual implementation, however, the accountants will use the total marks directly (with the standard tie breakers on the first or second mark) to determine the results. Ordinals will not be seen, or published in the protocol; they do not officially exist under OBO. Those curious to see the ordinals or to compare the results to the ordinal method will have to go back to the individual marks and recompute the results. Calculating results under OBO from total marks and calculating them from ordinals are mathematically equivalent.
One practical issue involving OBO which we only briefly hinted at is the time it takes to manually calculate the results. The ISU wants a system that is not excessively burdensome to use manually and still allows the required paper trail to audit the results. According to SH, previous tests of OBO using two accountants and two assistants required 2-3 hours to compute the final results. Apparently part of the Nebelhorn test is to improve upon that situation.
Another issue relating to OBO, which we did not consider, deals with judges' accountability. Current practice and ISU regulations make use of the judges' ordinals to identify "problem" placements. For this issue, first there is the short term problem of how to document the judges' performance using ordinals under a scoring system in which ordinals are not supposed to be computed. Then there is the long term question of what really is the best method to use to gauge judges' performance under OBO. It is not clear that a simple comparison of placements is the best way to go, but any departures from using ordinals for that purpose will require changes to ISU regulations.
We thank SH for his input.
Note: Following the posting of the statistical analysis in our May issue, an interested reader suggested we note our credentials for the benefit of those who are curious about our qualifications to write about such matters. We add here, then, that the editor is a PhD research scientist with roughly 25 years' experience in data analysis and other forms of number crunching.