Measuring Skating

IJS attempts to measure the absolute merit of skating performances. Does it succeed?

The results of sporting contests are determined in one of two ways. Generally, in head-to-head competition (between individuals or teams), the contestants attempt to score arbitrarily defined points, goals, runs, etc. on offense, while trying to prevent the opposition from scoring through some form of defense. Each contestant has an equal opportunity at offense and defense, and the number of ways of scoring is generally limited. In this kind of contest, the fact that the point values are arbitrary is of no consequence.

For individual performance sports a second method is generally used. Each performance is measured against an absolute standard and the performance which is measured to be the best wins. The performance is often measured by a time (races), a distance (throwing and jumping sports) or a weight (strength sports). In this kind of contest, accurate, precise and repeatable measurements of a completely defined absolute standard are all required for the results of the competition to have legitimate meaning. Otherwise such a contest is a sham, an entertainment dressed up as a competition.

While the 6.0 system ranks programs to determine results, the bedrock design intent of the International Judging System (IJS) is that skating programs are instead measured to determine their absolute accomplishment, quantified using a system of points. As the ISU has explained from day one, these points are meant to be the equivalent of times in a race. They are meant to measure the technical difficulty and athletic quality of the performances, and also the skill, sophistication and effectiveness with which the programs' artistic concepts are presented.

Some critics of IJS have a hard time with that last idea. One cannot "measure" art, they say, and so trying to measure a skating program is a futile effort. Which is more artistic, more beautiful: a Brandenburg concerto, a Beethoven symphony, or a Rachmaninoff piano concerto? Which has superior artistic merit, the Mona Lisa or Guernica? You can't say, and you can't measure it.

Perhaps that is why, by choice or by accident, IJS does not really attempt to measure the artistic merit of a performance. Nowhere in the ISU or USFSA rulebooks will you find the words "art" or "artistic." Nowhere do the rules for the program components talk specifically about art. The terms "artistic merit" and "artistic interpretation" have not appeared in the rulebooks for over 20 years. The Program Components attempt to measure the skill and effectiveness with which skaters present their artistic visions, but they do not say to give a higher score to the skater with the superior artistic vision. That is a big difference, and it makes measuring the presentation aspects of programs feasible, in principle.

The idea that skating performances can be measured is the simple seed from which IJS grows. Getting from that seed to a plant that bears palatable fruit, however, is not so simple. The devil, as they say in systems engineering, is in the details, and the development of the details of IJS has been continuously beset by demons for the past eight years, resulting in a system that bears only a limited relation to its intended purpose. The resulting system provides measurements that are not accurate, precise or repeatable, and it lacks a completely defined absolute standard against which to score performances.

Consider the many ways in which IJS fails to reward the accomplishment of skating performances accurately and completely. Any one of these can cause a program to be under-valued or over-valued by anywhere from half a point to several points (usually under-valued). Several such errors in a single program can skew its measurement by many points.

The jump Scale of Values (SoV) does not accurately describe the relative difficulty of the jumps. In particular, the 0.5-point incremental increase in value for the triple jumps from triple toe loop through triple Lutz is patent nonsense. It is widely held that the true difficulty of the jumps increases exponentially. Under the current SoV, a skater executing a triple Salchow and a triple Lutz, for example, receives the same base value as a skater executing a triple loop and a triple flip. Under a more appropriate exponential jump SoV, the Salchow plus Lutz would have the higher value. Under the current SoV, then, a skater can execute an intrinsically more difficult group of jumps and receive equal or fewer base value points than a skater with an intrinsically less difficult group, and thus be under-scored for those elements. There are further problems in the jump SoV with the values of the triple Axel and the quads relative to the other jumps and, in reality, to a greater or lesser extent, for all the jumps from 1T to 4Lz.
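The Salchow-plus-Lutz comparison comes down to simple arithmetic, which the sketch below checks. It uses illustrative triple-jump values with the 0.5-point spacing described above, starting from an assumed 4.0 for the triple toe loop, and, for contrast, a hypothetical exponential scale with an assumed ratio of 1.15 between successive jumps. Neither set of numbers is taken from an official table.

```python
"""Compare two jump pairs under a linear SoV and a hypothetical exponential SoV."""

# Easiest to hardest, in the order given in the text.
JUMP_ORDER = ["3T", "3S", "3Lo", "3F", "3Lz"]


def linear_sov(start: float = 4.0, step: float = 0.5) -> dict[str, float]:
    """Illustrative linear scale: each jump is worth `step` more than the previous one."""
    return {jump: start + i * step for i, jump in enumerate(JUMP_ORDER)}


def exponential_sov(start: float = 4.0, ratio: float = 1.15) -> dict[str, float]:
    """Hypothetical exponential scale: each jump is worth `ratio` times the previous one."""
    return {jump: start * ratio**i for i, jump in enumerate(JUMP_ORDER)}


def pair_value(sov: dict[str, float], a: str, b: str) -> float:
    """Combined base value of two jumps."""
    return sov[a] + sov[b]


for name, sov in (("linear", linear_sov()), ("exponential", exponential_sov())):
    s_lz = pair_value(sov, "3S", "3Lz")
    lo_f = pair_value(sov, "3Lo", "3F")
    print(f"{name:12s} 3S+3Lz = {s_lz:.2f}  3Lo+3F = {lo_f:.2f}")
```

On any linear scale the two pairs tie, because the jump positions sum to the same total (1 + 4 = 2 + 3). On an exponential scale with any ratio above 1, the pair containing the hardest jump always comes out ahead, whatever starting value is assumed.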

Edge calls over-penalize skaters, since the skaters receive fewer points than the difficulty of the jump actually executed, on the edge actually used, warrants. Skaters who receive edge calls are thus under-scored for what they actually accomplish on the flutz and lip.

The harsh and abrupt loss of points for downgrades at the 1/4-turn mark does not accurately measure the gradual shades of gray as a jump goes from just short of full rotation to 1/2 turn under-rotated. Depending on the severity of the under-rotation, some jumps end up over-scored and others under-scored. A jump under-rotated by barely less than 1/4 turn and a jump under-rotated by barely more than 1/4 turn are negligibly different in execution, yet the latter receives less than half the points of the former. Failure to model the shades of gray in skating performances is a defect in many aspects of IJS.
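The cliff at the 1/4-turn mark can be made concrete with a toy model. The sketch assumes hypothetical base values of 6.0 for a fully rotated jump and 1.9 for the same jump downgraded to one fewer rotation. It contrasts the abrupt rule with a hypothetical graded rule that interpolates linearly between the two values up to 1/2 turn under-rotated. The graded rule only illustrates "shades of gray"; it is not an ISU rule or proposal.

```python
"""Toy comparison of an abrupt downgrade rule with a hypothetical graded rule."""

FULL_VALUE = 6.0        # hypothetical base value of the fully rotated jump
DOWNGRADED_VALUE = 1.9  # hypothetical base value of the jump with one fewer rotation
CUTOFF_TURNS = 0.25     # downgrade threshold described in the text


def step_score(under_rotation: float) -> float:
    """Abrupt rule: full value below the 1/4-turn cutoff, downgraded value at or above it."""
    return FULL_VALUE if under_rotation < CUTOFF_TURNS else DOWNGRADED_VALUE


def graded_score(under_rotation: float, max_under: float = 0.5) -> float:
    """Hypothetical rule: slide linearly from full to downgraded value by `max_under` turns."""
    fraction = min(max(under_rotation / max_under, 0.0), 1.0)
    return FULL_VALUE + fraction * (DOWNGRADED_VALUE - FULL_VALUE)


for turns_short in (0.24, 0.26):
    print(turns_short, step_score(turns_short), round(graded_score(turns_short), 2))
```

Going from 0.24 to 0.26 turns short cuts the abrupt score by more than half, while the graded score changes by less than 0.2 points.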

Jump sequences are under-valued. Executing several jumps in sequence is more difficult than executing those jumps individually at different times in a program, yet sequences do not get the full base value for the jumps executed. The skaters lose 20% of the base value of the two most difficult jumps in the sequence, and the difficulty of connecting the jumps is ignored.
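The sequence rule can be written as a one-line formula. The sketch assumes, as we read the paragraph above, that only the two highest-valued jumps in a sequence count and that their combined base value is multiplied by 0.8. The jump values are hypothetical.

```python
"""Base value of a jump sequence versus the same jumps executed separately."""


def sequence_base_value(jump_values: list[float], factor: float = 0.8) -> float:
    """Assumed rule: sum the two highest base values and multiply by `factor`."""
    top_two = sorted(jump_values, reverse=True)[:2]
    return factor * sum(top_two)


def separate_jumps_value(jump_values: list[float]) -> float:
    """Base value if the same jumps were executed separately."""
    return sum(jump_values)


jumps = [4.0, 3.5, 1.3]  # hypothetical base values for a three-jump sequence
print(f"as a sequence: {sequence_base_value(jumps):.2f}")
print(f"separately:    {separate_jumps_value(jumps):.2f}")
```

Nothing in the formula depends on how the jumps are connected, which is the omission the paragraph points to.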

Deductions for falls over-penalize all jumps less difficult than the double Axel, since a fall on a double Lutz or below results in a net loss of points for the attempt compared to attempting nothing. On the most difficult jumps, by contrast, a complete failure due to a fall still earns the skater substantial points: points for accomplishing effectively nothing. Skaters also suffer a net loss of points if they fall on some spins, sequences, lifts, etc. Depending on the situation, falls can cause a program to be under-valued or over-valued compared to what the skater actually accomplished.
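Whether a fall nets a loss comes down to one inequality: the attempt is worth less than nothing when the base value is smaller than the GoE reduction plus the fall deduction. The sketch assumes a 1.0-point fall deduction and that a fall earns the worst GoE. The base values and GoE reductions below are hypothetical; substitute the figures from whatever SoV is in force.

```python
"""Net effect of attempting a jump and falling, relative to not attempting it."""

FALL_DEDUCTION = 1.0  # assumed deduction per fall


def net_points_for_fall(base_value: float, goe_reduction: float,
                        fall_deduction: float = FALL_DEDUCTION) -> float:
    """`goe_reduction` is the positive number of points removed by the worst GoE."""
    return base_value - goe_reduction - fall_deduction


def break_even_base(goe_reduction: float, fall_deduction: float = FALL_DEDUCTION) -> float:
    """Base value at which a fall neither gains nor loses points overall."""
    return goe_reduction + fall_deduction


# Hypothetical figures, for illustration only.
low = net_points_for_fall(base_value=1.3, goe_reduction=0.9)
high = net_points_for_fall(base_value=9.8, goe_reduction=3.0)
print(f"low-value jump with fall:  {low:+.2f}")
print(f"high-value jump with fall: {high:+.2f}")
```

Any jump whose base value falls below `break_even_base` loses points on a fall, while a high-value jump keeps a large positive balance even when it is not landed.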

The SoVs for the other types of elements all have internal inconsistencies similar to those of the jump SoV. In addition, the values for the various element types are not consistent relative to each other, favoring skaters adept at one type of element over those strong in other types. Depending on each skater's relative strengths and weaknesses, some performances will be under-valued and others over-valued.

Elements are zeroed out even when something has actually been accomplished. While extra elements must go unscored to preserve a level playing field, giving zero value to one of the allowed elements when something, no matter how small, has been accomplished is incompatible with making absolute measurements of performances. Even if the credit ignored is only 0.1 point, in some competitions that is enough to change the placements and the medals.

If two skaters execute a given element scored with features (say a combination spin), and one skater executes one feature while the other skater does not attempt a feature, the two skaters receive the same points even though one skater has executed a more difficult element. Failure to give credit for a single feature results in elements often being under-valued.

The abrupt (quantized) feature model causes content to go unrewarded when partial features are executed. Ignoring shades of gray in features results in many elements being under-valued. For example, a skater could hold four spiral positions for not quite three seconds each for a total of nearly 12 seconds in position and get zero points, as though nothing was accomplished at all, while a skater who held one position for three seconds would get points and outscore the first skater, even though in reality the second skater clearly accomplished less in an absolute sense.
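The spiral example can be modeled by counting which positions clear the minimum hold time. The sketch uses the three-second minimum from the paragraph above and the hypothetical hold times it describes. It counts credited positions only and does not model how many points each credited position is worth.

```python
"""All-or-nothing crediting of held positions, per the spiral example."""

MIN_HOLD_SECONDS = 3.0  # minimum hold for a position to count


def credited_positions(hold_times: list[float]) -> int:
    """Number of positions that earn credit under the all-or-nothing rule."""
    return sum(1 for t in hold_times if t >= MIN_HOLD_SECONDS)


def time_in_position(hold_times: list[float]) -> float:
    """Total time actually spent holding positions."""
    return sum(hold_times)


skater_a = [2.9, 2.9, 2.9, 2.9]  # four positions, each just short of the minimum
skater_b = [3.0]                 # one position held exactly long enough

for name, holds in (("A", skater_a), ("B", skater_b)):
    print(name, credited_positions(holds), round(time_in_position(holds), 1))
```

Skater A holds positions for almost four times as long as skater B yet is credited with nothing.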

The difficulty of combining multiple features is not accurately captured in the scoring. There is no logic or consistency to the increase in base value when features are added. The base values do not capture the reality that executing certain combinations of features is more difficult than the sum of the individual features; that is, executing two features together (for example) is often more than twice as difficult as executing either feature alone, but this is not rewarded in the SoV. Thus, difficult combinations of features are often under-valued.

Arbitrarily defined features and feature values cause elements to be incorrectly valued. Features earn points based on the number executed, not their difficulty, even though the permitted features are not intrinsically of equal difficulty. In spins, for example, skaters generally try to pick the four easiest features that get them to level four. A skater who executes four more difficult features to reach level four receives no more points than a skater who executes the least difficult ones.

Grade of Execution (GoE) values are an even greater hodgepodge than the base values in the SoV. Because they are devoid of systematic logic and lack rigorous internal consistency, it is currently impossible to compare accurately elements of higher difficulty and lower quality with elements of lower difficulty and higher quality. For example, a level 3 step sequence with a GoE of -3 earns fewer points than a level 2 sequence with the same GoE. The level of difficulty increases, yet the points earned decrease! The GoE values also introduce other errors into the scores, since the increase in GoE for positive aspects is based on the number of positive aspects present in an element, not the intrinsic difficulty or merit of each specific aspect. The positive aspects are not of equal difficulty or merit, yet each contributes the same amount to the score.
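A reversal like the level 3 versus level 2 step sequence is the kind of inconsistency a mechanical check can catch. The sketch takes a table of base values and worst-GoE reductions by level and reports every pair of adjacent levels where the higher level earns fewer points at the same GoE. The numbers in `STEP_SEQUENCE` are hypothetical, chosen only to reproduce the reversal described above; they are not official values.

```python
"""Flag levels that score fewer points than the level below at the worst GoE."""

# Hypothetical (base value, reduction at GoE -3) by level; not official figures.
STEP_SEQUENCE: dict[int, tuple[float, float]] = {
    1: (1.8, 0.9),
    2: (2.6, 1.0),
    3: (3.3, 2.1),
    4: (3.9, 2.1),
}


def score_at_worst_goe(table: dict[int, tuple[float, float]], level: int) -> float:
    """Points earned at `level` with the worst GoE."""
    base, reduction = table[level]
    return base - reduction


def find_reversals(table: dict[int, tuple[float, float]]) -> list[tuple[int, int]]:
    """Adjacent (lower, higher) level pairs where the higher level earns fewer points."""
    levels = sorted(table)
    return [
        (lower, higher)
        for lower, higher in zip(levels, levels[1:])
        if score_at_worst_goe(table, higher) < score_at_worst_goe(table, lower)
    ]


print(find_reversals(STEP_SEQUENCE))
```

The same check could be run at each GoE grade and for each element type against whatever SoV table is current.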

Program components offer only a vague measure of the true accomplishment in each skill. This is not because the values for the component criteria are set incorrectly in the scoring system, but because they are not set at all, and because the requirements for reaching a given score in the excessively complex components are not well defined.

The Transitions component, for example, has four listed criteria (variety, difficulty, intricacy, and quality) and an implied fifth criterion: the fraction of the program outside of elements that is filled with transitions. How are strengths and weaknesses in these traded against each other numerically? What movements are required to satisfy the criteria for variety or intricacy? What strengths and weaknesses in the movements contribute to their quality? What is the absolute difficulty of each movement, and how does that translate into a numerical value? If a transition movement has the difficulty of a single or double jump, for example, should the Transitions mark go up by the 1-2 points that an absolute measurement system would require?

None of these questions has a documented quantitative answer. There are no rules specifying how the answers to these questions are turned into numbers with absolute, consistent meaning from one skater to the next, one competition to the next, one judge to the next. And Transitions has only four listed criteria! Other components have eight, some of them far more subjective than the fairly straightforward evaluation of the technical and athletic merit of transition movements.

Due to the many deficiencies in the way IJS attempts to measure skating performances, current IJS scores are only weakly connected to the true absolute merit of performances. If a skater blows out the competition with a 20-point victory, yes, that skater clearly was the best and we can all sleep soundly at night. But even with a three- or five-point victory there can often be serious doubt that the performance with the higher score was really the superior one, and skating results are often determined by margins far smaller than three to five points. Often they are determined by small fractions of a single point. The 2008 U.S. National Men's title was decided by a Free Skate tie-breaker after a tie in total points. Interestingly, if the scores of that competition are recalculated using the SoV as modified by the ISU a few months later, Johnny Weir scores higher than Evan Lysacek and wins the championship. This illustrates the importance of getting the details of the system right, and how even minor defects in determining the absolute merit of performances can significantly alter the results.

Many objected to the ISU approach of 'pass it now, fix it later' adopted when IJS was rammed through the 2004 Congress, fearing problems in the system would never be fixed once it was passed. That fear has been vindicated. The mathematical defects in the system have long been obvious, many before the system was adopted in 2004, others discovered since. For the past six years, however, the keepers of IJS have done nothing more than rearrange deck chairs on the Titanic as they superficially tweak the rules year after year. If I were the person who had pushed for the development and adoption of this scoring system, I would be really pissed by now that my brain-child has been so poorly developed, squandering its potential. Heads would have rolled years ago.
