CoP - The Ugly

These are the worst characteristics of CoP.  Any one of them is reason enough to reject the system.  Some of them have simple solutions, though they are solutions not likely to be found palatable by the ISU.  Others are such nasty technical problems it is unlikely they could be fixed before the Congress in June, if at all.  

Anonymity.

The advertised purpose of anonymity is to reduce misconduct.  There is no evidence, however, that it accomplishes that, and there is plenty of reason to believe it can have only limited effect.  So many forms of error and misconduct have nothing to do with external influences that anonymity does little more than sweep problems under the rug.

The idea of anonymity really seems to be "out of sight, out of mind."  Hide what the judges are doing from the public and the public will not be able to speak out when misconduct occurs, because there will be no way for the public to even guess that it has occurred.

When the ISU president says that the ISU will not allow what happened in Salt Lake City to occur again, I don't think he means a repetition of misconduct -- which is what at first one might assume "what happened" means.  I think he means a repetition of the public relations nightmare, his personal discomfort in having to face 400 snarling media in press conferences, and having to defend his organization in front of the IOC yet again.  Place a cloak of secrecy on the judging and all his problems go away.

The bottom line, however, is that the public will not have confidence in the integrity of the results so long as secrecy remains.  Without public confidence, attendance suffers and TV ratings suffer.  And when attendance suffers and ratings suffer, income to the ISU and the national federations suffers.  It is a straight line from secrecy to reduced funding of skaters and skating, and the overall financial health of the sport.

Random results.

Under CoP one starts with 14 judges, but after so much hocus-pocus only five judges' marks are used to determine the results.  Due to the random selection of judges (which has no demonstrable value of any kind) the five marks that go into the results need not reflect the overall view of the panel as a whole -- and it is the view of the panel as a whole that is always going to be the most accurate determination of who should win.
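To see how easily a random draw can change an outcome, consider the following minimal sketch (a hypothetical Python simulation; the panel size, subset size, and marks are invented for illustration and are not actual CoP data).  It counts how often a randomly selected subset of a panel reverses the ordering the panel as a whole would have produced when two skaters are close.

    import random

    def subset_flips(marks_a, marks_b, subset_size=5, trials=10000):
        """Fraction of random subsets that reverse the full-panel ordering."""
        full_margin = sum(marks_a) - sum(marks_b)
        flips = 0
        for _ in range(trials):
            idx = random.sample(range(len(marks_a)), subset_size)
            margin = sum(marks_a[i] for i in idx) - sum(marks_b[i] for i in idx)
            if margin * full_margin < 0:   # the subset disagrees with the full panel
                flips += 1
        return flips / trials

    # Two skaters separated by a small margin over a hypothetical 14-judge panel.
    skater_a = [7.2, 7.0, 7.4, 6.9, 7.1, 7.3, 7.0, 7.2, 6.8, 7.1, 7.3, 7.0, 7.2, 7.1]
    skater_b = [7.1, 7.2, 6.9, 7.0, 7.3, 7.0, 7.1, 6.9, 7.2, 7.0, 7.1, 7.2, 7.0, 7.1]
    print(subset_flips(skater_a, skater_b))   # a sizable fraction of draws reverse the order

Whenever two skaters are within the noise of the panel, which judges happen to be drawn can decide the placement.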

Random selection of the judges has been used since the 2002/03 season in the so-called "interim system" and in CoP.  During the past two seasons there have been many cases where medal results were contaminated by random selection of the judges.  These include several medals at the Grand Prix events of the last two seasons, a gold medal at 2003 Worlds, and medal results at the recent Four Continents Championships.

Any scoring system that assigns placements based on a flip of a coin is an insult to the skaters, the coaches who train them, and the fans from whom the financial health of skating ultimately derives.  Any scoring system that uses random selection of the winners (as well as all the other places) should be rejected out of hand, no matter what other redeeming qualities it might possess.

Statistical accuracy.

Big problem.  There isn't any.  Or more correctly, there isn't nearly enough.

In any complex activity judged by human beings there will always be honest differences of opinion and honest errors of judgment.  The only way to get an accurate decision for whatever is being judged is to combine the opinions of more than one person.  The number of judges required depends on the limitations on a single person getting the "right" answer.  The more uncertainty when one person tries to get the right answer, the more people you need to include to get a meaningful result.  The mathematical laws that govern this are unambiguous and well established.
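The governing relation is the standard square-root law for averaging independent judgments (a general statistical result, not anything specific to skating):

$$\sigma_{\text{panel}} = \frac{\sigma_{\text{judge}}}{\sqrt{N}}$$

where the numerator is the uncertainty in a single judge's mark, N is the number of judges whose marks are combined, and the left-hand side is the uncertainty in the combined result.  Doubling the accuracy of a result therefore requires four times as many judges.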

The ability of one person to get the right answer is limited by human nature and human perception.  When attempting to judge to an absolute standard humans only get things right to within about 15%.  For some things human judgment is known to be as poor as 25%, and in rare cases as good as 10%.  Relative judgment, on the other hand, is known to be accurate at the 1-2% level, and sometimes better.  Typically, relative judgment is 10 times more accurate and reliable than absolute judgment.

In the 6.0 systems of judging, the judges use relative judgment.  Analysis of the judges' marks shows that with a nine-judge panel, the statistical uncertainty in any one place due to random errors of judgment is 1/3 of a place.  A difference of one full place is therefore about twice the combined uncertainty of two placements, which corresponds to a 95% confidence level.  It means that random errors in the judging, at worst, result in only one place in 20 being statistically in error.

Analysis of the judges' marks at the Grand Prix shows that to achieve the 95% confidence level that currently exists in the 6.0 systems, the point difference between successive places must exceed 1 to 1.5 points.  Under CoP, skaters with point differences of less than 1 point are statistically tied.  CoP, however, calculates points to the nearest 0.01 point, which is mathematically absurd.

At the Grand Prix about 1/4 of the skaters received scores within 1 point of another skater.  Thus, in CoP about 1/4 of the official placements are statistically meaningless.

Within the framework of CoP there are two choices to correct this situation.  One is to increase the number of judges.  Unfortunately, to obtain a 95% confidence level for a 0.1 point difference would require increasing the number of judges by a factor of 100, and to get 95% confidence at the level of a 0.01 point difference would require increasing the number of judges by a factor of 10,000.  Neither of these choices, of course, is practical.
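(The factors of 100 and 10,000 are not arbitrary; they follow from the square-root law given earlier.  Because the panel uncertainty shrinks only as 1/sqrt(N), reducing it by a factor of 10 -- from roughly 1 point to 0.1 point -- requires 10^2 = 100 times as many judges, and reducing it by a factor of 100, to 0.01 point, requires 100^2 = 10,000 times as many.)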

The second choice is to accept the statistical limitations of the system and to round scores off to the nearest 1 point.  This will result in about 1/4 of the skaters being tied; an unpleasant situation, but the most realistic one that is consistent with the statistical properties of the system.

A third choice is to abandon absolute judging and somehow incorporate relative judging into CoP.  This, however, would be difficult to do and also lies outside the philosophical framework of CoP, which would make it unappealing to ISU management.

Values of elements are artificial.

The point values in CoP qualitatively follow the generally accepted order of difficulty for individual elements, but quantitatively are completely arbitrary and inconsistent among the different types of elements.  They are not based on previous results and experience in judging events of different levels of difficulty.  They are not based on any bio-kinetic or physiological analysis of the intrinsic difficulty of the elements.  They are arbitrary numbers pulled out of thin air.

Change the point values for a few elements by one-tenth of a point here, another tenth of a point somewhere else, and the calculated scores of the skaters move around and the order of finish changes a significant fraction of the time.  In math-speak, the results of the scoring calculation are extremely unstable with respect to small changes in the point values for the elements.  This says that the reliability of the scoring algorithm is extremely low.  All that can be said for CoP is that the best skaters will end up more or less at the top and the worst skaters will end up more or less at the bottom, but the confidence in the individual placements is extremely low, making the individual results nearly meaningless.
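A deliberately minimal toy example (a Python sketch with invented element names and base values, not actual CoP values) shows the mechanism: two skaters finish three-tenths of a point apart, then two base values are each shifted by a tenth of a point and the order of finish reverses.

    def total(program, values):
        """Sum the base values of the elements in a program."""
        return sum(values[e] for e in program)

    # Invented base values for three hypothetical elements.
    values = {"A": 7.5, "C": 4.4, "D": 3.0}
    skater_1 = ["A", "D", "D"]   # totals 13.5
    skater_2 = ["C", "C", "C"]   # totals 13.2 -- skater_1 leads by 0.3

    # One tenth of a point here, another tenth of a point there...
    tweaked = {"A": 7.4, "C": 4.5, "D": 3.0}

    print(round(total(skater_1, values), 2), round(total(skater_2, values), 2))     # 13.5 13.2
    print(round(total(skater_1, tweaked), 2), round(total(skater_2, tweaked), 2))   # 13.4 13.5 -- reversed

With a full program of elements and, as noted above, about a quarter of the skaters finishing within a point of one another, small shifts in the table of values propagate into the standings in just this way.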

The point values need to be thoroughly revised, and placed on a sound technical/physical footing.  To do this first requires undertaking a serious analysis of the intrinsic difficulty of the elements.  

Insufficient objectivity.

At the outset CoP was put forward as a completely objective system that would eliminate the ability of judges to manipulate the results.  The reality is that objectivity in CoP ranges from 0% to about 25% at best, depending on the event segment.  This is no better than the current system, and it leaves the judges plenty of opportunity to play games with the marks and place skaters where they wish.  The scoring of skating can never be completely objective due to the artistic component of the sport.  Nonetheless, a goal of a minimum of 50% objectivity should be achievable, and any point based system that does not reach that goal should not be used until it does.

Additive point model instead of multiplicative.

This is the second great mathematical blunder of CoP; the first being its lack of statistical accuracy.

In the long distant past, figures were for a time judged by taking a difficulty factor for each figure and multiplying it by a quality factor for execution (on a 6.0 scale) to determine the points earned for executing each figure.  Points for each figure were then summed.  Using this multiplicative process, the relative values of the figures remain the same for all qualities of execution.  For example, if one figure is rated intrinsically twice as difficult as another, then when those two figures are executed with the same quality factor the more difficult figure always earns twice as many points as the other -- for all quality factors.  This is the only sane and logical choice.

CoP uses an additive point model.  For example, the base values of the triple jumps span nearly a factor of two, but the quality factors for every jump are the same, -3 through +3.  Thus, the importance of a triple Axel compared to a triple toe loop depends on the quality of execution.  That is, the value of the triple Axel relative to the toe loop is different if both jumps are executed at quality -3, vs. -2, etc.  This means that there are double penalties and rewards built into the point model, depending on the elements executed and the quality of execution.

The solution to this is straightforward, but messy for the programmers -- change all the quality of execution factors to a series of percentages of the base values.
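The difference between the two models is easy to see in a small sketch (Python; the base values of 7.5 and 4.0 and the figure of 15% per grade of execution are illustrative assumptions, not the actual ISU scale of values).

    def additive(base, goe):
        """CoP-style: each grade of execution adds or subtracts a fixed amount."""
        return base + goe

    def multiplicative(base, goe):
        """Percentage alternative: each grade scales the base value (here by 15% per grade)."""
        return base * (1 + 0.15 * goe)

    axel, toe = 7.5, 4.0   # hypothetical base values for triple Axel and triple toe loop
    for goe in (-3, 0, 3):
        a_ratio = additive(axel, goe) / additive(toe, goe)
        m_ratio = multiplicative(axel, goe) / multiplicative(toe, goe)
        print(goe, round(a_ratio, 2), round(m_ratio, 2))
    # Additive model:       the Axel is worth 4.5 times the toe loop at -3 but only 1.5 times at +3.
    # Multiplicative model: the ratio stays at 7.5/4.0 = 1.88 for every grade of execution.

Under the percentage scheme the relative worth of the elements is preserved at every quality of execution, exactly as in the old multiplicative scoring of figures.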

Quantity trumps quality.

Intrinsic to any simple point-based system is the problem that quantity is often more important than quality.  The result of the men's event at the Grand Prix Final demonstrates this point nicely.  On a per-jump basis, Plushenko had better jumps than Sandhu.  Plushenko was also superior to Sandhu in three of the four other aspects of skating judged, and equal in the fourth, and yet Plushenko still lost.

One way to reduce this problem is to adjust the relative importance of the five general aspects of skating judged, but this alone cannot eliminate it.  Alternative ways of allowing a skater with fewer elements of higher quality to win, however, do exist.  For example, one could say that eight jump elements may be attempted, but only the six elements with the highest point values will count towards the point totals.  Another approach is to prorate the point totals for jumps and spins by the number of jump and spin elements actually executed.

Both these choices would be consistent with the concept still in the rules that "a fall is no bar to winning."  Currently under CoP, unfortunately, a fall most definitely is a bar to winning.
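As a sketch of the first of those alternatives (Python; the element scores are invented, and six-of-eight is simply the example used above, not an ISU rule), counting only the best six of eight attempted jump elements lets a fall or a weak extra attempt drop out of the total:

    def best_n_total(element_scores, n=6):
        """Sum only the n highest-scoring elements attempted."""
        return sum(sorted(element_scores, reverse=True)[:n])

    attempted = [8.2, 7.1, 0.0, 6.5, 5.9, 7.8, 4.3, 6.0]   # eight jump attempts, one a fall
    print(round(best_n_total(attempted), 2))   # 41.5 -- the fall and the weakest jump do not count

Under such a rule a skater who falls once can still outscore a clean skater with weaker content, which is what "a fall is no bar to winning" is supposed to mean.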

So many marks!

It is generally accepted that the evaluation of skating could benefit from the use of more than two marks, but is it really necessary to have 33 numbers go into determining the results of an event?  Sometimes all that one gets from adding complexity to a problem is more complexity.

The use of the five program component scores illustrates this.  In general, these five subjective marks end up being pretty much the same for each competitor.  Analysis of marks from the Grand Prix shows that the vast majority of the time there is no statistical difference between the five scores given by the judges.

One way of viewing this is that the judges are not yet trained well enough to use these marks correctly.  But the judges have had months to work with CoP and hundreds of pages of documentation to explain how to use the marks.  So maybe the problem is that it is humanly impossible to score these five marks independently.

Scores for the five program component marks look very much like the judging of dance events, where competitors tend to end up with more or less the same order of finish in each dance, and the same order of finish from each individual judge.

If the judges are incapable of using the marks independently then the number of marks should be reduced.  There is no reason why basic skating skills and transitions cannot be combined together into a single mark, and why the three presentation components cannot be combined into two, leaving a total of three program component scores which would be more judgable than the current five.

Insurmountable leads in the short programs.

Under the current 6.0 systems, the short program has 1/2 the value of the free skate.  CoP attempts to retain this balance but fails.  The 6.0 systems are like playing two football games, where the first game has half the value of the second game in counting wins and losses.  CoP is like playing two football games where the first game is played for 30 minutes and the second game is played for 60 minutes and the scores are added together.  These are not equivalent things.

Under the 6.0 approach, any of the top three skaters in the short program can always win the competition by winning the free skate.  Under CoP, it is easy to build up an insurmountable lead in the short program, such that even if the second place skater in the short program wins the free skate they still lose overall.  This gives the short program far more importance than it had previously.
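A hypothetical arithmetic illustration (invented point totals, not the results of any actual event): skater A builds a 7 point lead in the short program; skater B then wins the free skate by 4 points, yet still loses the event -- something that could not happen to the short program runner-up under the factored placements of the 6.0 system.

    # Invented totals, for illustration only.
    short = {"A": 45.0, "B": 38.0}    # A leads by 7 after the short program
    free  = {"A": 80.0, "B": 84.0}    # B wins the free skate by 4
    total = {s: short[s] + free[s] for s in short}
    print(total)    # {'A': 125.0, 'B': 122.0} -- B wins the free skate but loses the event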

Lack of testing and validation process.

Most of the problems with CoP can be traced to the lack of a rigorous testing and validation process prior to use of the system this season.  The failure to use well established engineering approaches for the development of CoP can be traced to the lack of system engineering expertise among the CoP development team, and a desire to get it done as quickly as possible regardless of the consequences.

At the Four Continents Championships, the ISU stated that many changes will be made to CoP in the near future.  That is encouraging, but they also said that after the numerous changes are made there will be no testing of the changes, and the system will be considered ready for immediate use.

This is like knowing that your car is not running particularly well before starting an important cross-country road trip -- say across Death Valley.  You take the car to your mechanic and he tells you he thinks he knows what the problems are.  He works on it for a while, but when he gives it back to you, you don't drive it around first to see whether any of the problems really were solved.  You just go off on your trip, fingers crossed, hoping nothing goes wrong.

This approach to revising CoP shows yet again that the CoP development team doesn't have the slightest glimmer of understanding of basic system engineering principles, and is primarily concerned with ramming something through the next Congress without taking pains to first ensure that it is correct.


Copyright 2004 by George S. Rossano