This is the first of three articles on the use of computer technology in the judging of figure skating competitions. Part 1 discusses a rigorous approach to investigating the potential role of computer technology in the process of judging competitions, and describes the development of a computer model of the judging process as it currently exists. Part 2 describes many of the specific details of this computer model for the various figure skating events. Part 3 describes a software implementation of the model and includes a downloadable version of the annotation software developed with the model.
In previous articles I have written about various aspects of the proposed ISU judging system and the impact it is likely to have on the character of skating as a competitive sport. The heart and soul of this system is the "point model" -- the mathematical "formula" that says what aspects of skating are rewarded and by how much. Understanding the point model, and getting it right, is the essential part of developing a successful system.
There currently is, and always has been, a point model for skating. It is not written down per se, but it exists nonetheless in the description of skating in the rules, and in the mind of each judge. The hardest part of training and "calibrating" new judges has always been imparting this point model to them, so that their marks will be consistent with current skating standards and with the marks given by established judges.
The point model for skating is something that has intrigued me for many years because of its complexity and its dependence on the characteristics of human judgement. It first motivated research on how the judges use the marks, and over time this led to efforts to deduce the point model the judges actually use. In the course of this research effort, software tools were written to capture the judging process as it actually exists, something I term Computer Assisted Judging™, or CAJ™ for short.
The purpose of the research described in these articles was to understand the judging process as it currently exists and to determine the extent to which it can be captured in a computer algorithm. Comparing this research effort to the ISU's development of its proposed new judging system is instructive because it highlights the many weaknesses in the ISU's effort.
When attempting to develop any system as complex as a figure skating judging system, three important questions must be answered before starting work on the detailed implementation of the system: What is the goal of the system? What methodology will be used to develop it? And how will the finished system be tested and validated to demonstrate it meets the goal?
Experience in many fields has shown that when a complex system is developed ignoring this tried and true approach, the chances of a successful outcome are greatly diminished. The ISU effort must be viewed with suspicion right off the bat because it has not taken this approach, and has not adequately answered these questions before willy-nilly attempting to develop a working system.
Since the Winter Games, the ISU has offered many reasons why it is developing its new system. Upon close examination, all of these reasons are either specious, patently false, or inconsistent. The ISU has not enunciated a clear, compelling, and believable goal for its new judging system. The goal appears to be the complete revision of the rules and standards of skating, as that certainly is going to be the effect of the proposed system. The ISU, however, has never laid out a clear vision of what this "revolutionary" new form of skating should look like. One must deduce what this new form of skating will be from the point model they describe, by looking at what is rewarded and what is not. One will never know for sure whether the ISU system does what they want it to do, since they haven't said what they want it to do. This is an intellectually dishonest and risky approach. If the goal is to change the standards of skating, the correct and open approach would have been to first decide what this new form of skating should be, involving the entire skating community in the process, and only after that begin work towards implementing that vision of skating in a point model and software. The ISU has not involved the general skating community in the development of the proposed system. Its activities are closely guarded secrets held by a small group, most of whom have no experience as judges.
In trying to understand the point model judges actually use, the goal of the research described in these articles was simple: to capture in a computer program the full complexity and subtlety of the judging process for skating as it currently exists, in accordance with the current rules and standards of skating.
The ISU point model greatly oversimplifies the sport of skating. It casts off many aspects of skating that will no longer be judged at all, and significantly reduces the importance of some elements, fundamentally altering the identity of skating as a sport. In terms of this oversimplification, the problem with the ISU point model is not that it attempts too much, but rather that it attempts too little.
The sole justification for employing computer technology in the judging process, I believe, is this: judges can easily be trained to identify the elements and evaluate the quality of each element, but it is difficult to train judges so that they all combine that information in the same way, putting the same weight on each aspect of a skating performance (i.e., so that they use the same point model). Use of a computer-based point model ensures that the assessments from all the judges are combined in the same way, using the same point model. Most of the spread in placements for skaters in the middle half of an event is due to the judges using different point models. A computer-based point model eliminates this source of variation in placements completely.
Another source of inconsistency in the current judging process is that deductions are not uniformly applied by a panel of judges. Some judges take deductions that others do not, and vice versa. A computer-based point model, if implemented correctly, also eliminates this source of inconsistency.
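As an illustration of this point, here is a minimal sketch of a shared, computer-based point model: each judge supplies per-element assessments, but a single model applies the same weights and the same deduction table for the entire panel. The category weights and deduction values are hypothetical placeholders, not actual CAJ values.

```python
# Minimal sketch: each judge identifies elements and assesses quality, but one
# shared point model converts those assessments into points, so no judge can
# weight an aspect of the performance differently from the rest of the panel.
# All names and numbers below are hypothetical placeholders.

ELEMENT_WEIGHT = {"jumps": 1.0, "spins": 0.6, "footwork": 0.5}
DEDUCTION = {"costume violation": 0.1, "time violation": 0.2}  # applied uniformly

def technical_mark(element_points: dict, infractions: list) -> float:
    """Combine per-category element points with one shared weighting,
    then apply the same deduction table for every judge."""
    total = sum(ELEMENT_WEIGHT[cat] * pts for cat, pts in element_points.items())
    total -= sum(DEDUCTION[i] for i in infractions)
    return max(total, 0.0)

# Judges may differ in their quality assessments (the element points they
# award), but the weighting and deductions are identical across the panel.
print(technical_mark({"jumps": 4.2, "spins": 2.5, "footwork": 1.8},
                     ["time violation"]))
```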
In short, the goal of this study was to take skating as it is and determine the point model. The ISU approach picks an oversimplified point model and lets the chips fall where they may.
Having listened to many ISU presentations on the proposed judging system, it is clear the ISU did not begin with a thought-out methodology and still does not have one. They are just making it up as they go along, which is a risky way of trying to reach a successful conclusion in a project of this complexity; even worse, they are ignoring approaches that would help ensure a better final product, and are not investigating and comparing alternative approaches that are likely to prove superior. Sadly, the ISU method is to define a point model mostly out of thin air, hope it proves plausible, then declare victory and expect people to believe it works without offering real proof. In my day job, however, we have a saying: without data, it's just another opinion. Currently the ISU is offering only opinion.
To research the point model discussed in these articles, the following methodology was used, starting from the goal that the point model should reward skating skill in accordance with the rules as they exist. The actual content of World and Olympic programs was studied over many years to determine how the judges use the marks and what marks they give for various program content. Results from the U.S. National Championships and other USFSA competitions were used to extend the point model down to the Juvenile level. Competition data were also used to estimate the intrinsic ability of a human judge to evaluate elements, and to determine the consistency of that ability (see the sketch below). From these background studies a point model was developed. Numerical experiments were then used to investigate the performance of the point model under all conceivable situations, to ensure the point model gives correct results in every case. This is the process the ISU should have followed, but has not.
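A hedged sketch of that judge-consistency study: given a panel's marks for a set of skaters, one can remove each judge's systematic offset and pool the residual spread as an estimate of single-judge assessment noise. The marks below are invented for illustration.

```python
# Sketch of estimating the intrinsic consistency of human judging from
# competition data: the spread across a panel, after removing each judge's
# systematic offset, estimates the noise in a single judge's assessment.
# The marks below are invented for illustration.
import statistics

marks = {  # skater -> marks from a five-judge panel
    "Skater A": [5.8, 5.7, 5.8, 5.9, 5.7],
    "Skater B": [5.4, 5.5, 5.3, 5.5, 5.4],
    "Skater C": [5.0, 5.2, 5.1, 5.0, 5.1],
}

# Remove each judge's systematic offset (some judges mark high, some low)...
n_judges = len(next(iter(marks.values())))
judge_mean = [statistics.mean(m[j] for m in marks.values()) for j in range(n_judges)]
grand_mean = statistics.mean(judge_mean)
adjusted = {s: [m[j] - judge_mean[j] + grand_mean for j in range(n_judges)]
            for s, m in marks.items()}

# ...then pool the residual spread as an estimate of assessment noise.
spreads = [statistics.stdev(m) for m in adjusted.values()]
print(f"typical single-judge noise: ~{statistics.mean(spreads):.3f} marks")
```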
If the ISU wants to redefine the sport of skating it has the right to do so. Nevertheless, if the ISU is going to redefine skating, then once it has decided what skating should be, the development method should be tailored towards developing the best point model for that definition, and not the other way round. Because the ISU has not used a coherent method of development that examines and compares all the possibilities based on rigorous research, one has no way of knowing whether the final product will be an adequate product, much less the best product, or even an improvement on the current judging process.
The proposed ISU system involves several new concepts for evaluating figure skating. In order to demonstrate that the proposed system produces correct results at the confidence level required, a number of supporting studies need to be conducted to demonstrate these concepts work. The ISU has little idea with what accuracy and consistency judges can assess elements, little idea with what accuracy and consistency the spotters can do their job, no idea how sensitive the system will prove to quality-judgement errors or identification errors, and no idea how sensitive the proposed system will be to misconduct. All of these questions can be answered with properly conducted studies. All of these questions should have been answered before beginning work on the details of the point model. None of these studies appears to be taking place. Without them, any claims the ISU makes for its proposed system can only be considered opinion and wishful thinking, not demonstrated fact.
To test and validate its point model, the ISU has thus far relied on limited data from a few Grand Prix events this season. In looking at past data it has relied almost entirely on a study of Alexei Yagudin's long program from last season.
This is a woefully inadequate approach. You cannot calibrate a complex point model that depends on many, many factors based on the performance of a few top skaters. You cannot calibrate using only a few judges, a few Grand Prix skaters, and a small number of Grand Prix events. The point model must be valid and self-consistent for all combinations of tricks, all combinations of skill levels, all combinations of errors, and all combinations of quality factors, and all of these must be tested and verified. This requires a huge quantity of data, more data than the ISU has looked at thus far or plans to look at. Consequently, there will be a great deal of uncertainty in any point model the ISU adopts.
In math-speak, the point model is a multi-dimensional function of many variables whose general form must first be determined. The many parameters that define the function must then be fit (determined) by comparing the model to actual data. The point model has hundreds of degrees of freedom. For each degree of freedom you need at least one data point to determine the point model, assuming the data are error/noise free; in the presence of noise you need several data points per degree of freedom. Using this process one determines not only the values of the parameters that define the point model, but also the uncertainty in the parameters, and thus the uncertainty in the point model. The variation in the results predicted by the point model, due to that uncertainty, must be significantly less than the expected difference between places in actual competition if the point model is to be considered valid. Complicating this process is the fact that some tricks are performed so infrequently that little data is available to accurately determine how they should be incorporated into the point model, other than by analogy to other tricks.
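A toy sketch of this fitting process, shrunk to three parameters and five programs for readability (a real fit would involve hundreds of parameters and far more data, and all numbers here are invented): a least-squares fit recovers the trick values, and the covariance of the fit gives their uncertainties.

```python
# Sketch of fitting point-model parameters to competition data by least
# squares. Each row of X records how many times each trick appears in a
# program; y is the mark the panel gave. With N parameters you need at
# least N clean data points, and several times N noisy ones; the fit
# covariance gives the parameter uncertainty. All numbers are invented.
import numpy as np

X = np.array([  # programs x tricks (counts of trick 1, trick 2, trick 3)
    [2, 1, 0],
    [1, 2, 1],
    [0, 1, 2],
    [2, 0, 1],
    [1, 1, 1],
], dtype=float)
y = np.array([10.1, 11.9, 9.2, 9.8, 10.4])  # observed marks (noisy)

params, residuals, *_ = np.linalg.lstsq(X, y, rcond=None)

# Parameter covariance: sigma^2 * (X^T X)^-1, with sigma^2 estimated from
# the residuals and the remaining degrees of freedom.
dof = len(y) - len(params)
sigma2 = residuals[0] / dof
cov = sigma2 * np.linalg.inv(X.T @ X)
uncert = np.sqrt(np.diag(cov))

for value, err in zip(params, uncert):
    print(f"trick value: {value:.2f} +/- {err:.2f}")
```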
If you used every assessment from every judge for every skater at every senior-level ISU competition in one season, you would begin to approach the amount of data needed to accurately nail down the point model, at least in part. The ISU has not devoted anywhere near this level of effort, and has no intention of doing so. Note also that the ISU can determine the point model only in part by studying junior and senior level events: since most programs at those levels do not include most of the lesser-difficulty jumps and jump combinations, there is no way to calibrate the point model for those elements. And finally, more fundamentally, it is not even clear the ISU has looked into whether it has the correct mathematical form for the point model in the first place. Again, when the ISU is done, without data it will just be an opinion, and the validity of the point model will mostly be a guess.
There are several ways of testing the point model. One is to apply it to past competitions and see how it performs. Another is to run numerical simulations to test how it performs in all conceivable circumstances. A third is to conduct side-by-side tests in which test competitions are judged by two panels, one using the current system and one using the proposed system. At this time it appears the ISU will not be conducting any of these tests, and according to a reliable source the third approach, side-by-side comparison, has already been rejected outright by the ISU.
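The first two tests lend themselves to automation. As a minimal sketch of the retrospective approach, assuming one has a candidate point model and the results of past events: re-score each performance, then compare the model's placements with the panel's actual placements, here via Kendall's tau rank agreement (all scores invented).

```python
# Sketch of a retrospective test: re-score past performances with the
# candidate point model and compare the model's placements against the
# placements the judges actually produced. The scores below are invented.

def placements(scores):
    """Rank skaters, highest score = 1st place."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {skater: place for place, skater in enumerate(order, start=1)}

def kendall_tau(rank_a, rank_b):
    """Rank agreement over all pairs: +1 identical ordering, -1 reversed."""
    skaters = list(rank_a)
    concordant = discordant = 0
    for i in range(len(skaters)):
        for j in range(i + 1, len(skaters)):
            a = rank_a[skaters[i]] - rank_a[skaters[j]]
            b = rank_b[skaters[i]] - rank_b[skaters[j]]
            concordant += (a * b > 0)
            discordant += (a * b < 0)
    return (concordant - discordant) / (concordant + discordant)

actual = placements({"A": 5.9, "B": 5.7, "C": 5.5, "D": 5.4})
model  = placements({"A": 11.2, "B": 10.9, "C": 9.1, "D": 9.6})
print(f"rank agreement: {kendall_tau(actual, model):+.2f}")
```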
Impartial and unbiased testing of the type just described is essential to determining whether any system actually works. In the research activity described here, the point model developed was tested by comparison with previous competitions and through numerical simulations. The ISU point model, on the other hand, will not be tested or validated in any meaningful way to demonstrate it is a valid way of judging figure skating.
The CAJ point model developed in this research effort is based on more than ten years of data from ISU senior-level competition and lower-level USFSA competitions. It was normalized to maintain consistency and traceability back to the previous 6.0 marking system, to allow for historical comparison and for ease of understanding by the public and others in the skating community (skaters, parents, coaches and judges).
The CAJ and ISU point models are fundamentally different. The ISU uses a primarily linear model with linear increments for quality points, while CAJ uses a non-linear model with fractional quality factors. The ISU uses a system of plus/minus 1, 2 or 3 points for quality, while CAJ uses a system of plus/minus fractional scaling factors. The ISU point model introduces endless logical inconsistencies that are not present in the CAJ point model. The CAJ point model also does not have the problem, inherent in the proposed ISU system, where elements worth less than three points marked with a quality of -3 will receive negative points. The CAJ point model better conforms to the way judges actually mark at all levels than does the ISU point model -- but that should not be surprising, since that was the goal in determining the CAJ point model in the first place.
The ISU approach overlooks the fact that quality points and base mark points must be calibrated in a self-consistent way for the point model to make sense, and it also mixes up quality points and base mark points. Because the ISU point model oversimplifies the complexity of skating, tricks of different intrinsic difficulty will in many cases have the same base value. The judges could use the quality points to bring up the base mark, but even so, a more difficult trick done at +3 quality will still be underscored.
The CAJ point model was set up to be self-consistent with respect to trick type, difficulty and quality, and never to mix base mark points with quality points, because it was set up from the start to capture the full complexity and variety of skating elements. The CAJ point model does not include the "falling down points" present in the proposed ISU system, since elements with major errors do not receive credit under the current rules of skating, and the goal was to conform to the current rules of skating.
If an element has a major error in it, under the proposed ISU system it receives the base mark minus three quality points. In the CAJ point model it is marked as "Failed" and receives no points. For example, jumps are currently considered failed in the event of a fall, a two-foot takeoff or landing, a half rotation or more completed on the ice, or putting a hand or foot down to keep from falling. The two approaches are contrasted in the sketch below.
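A minimal sketch of the two philosophies side by side may make the contrast concrete. The ISU scheme of plus/minus whole quality points is taken from the proposal; the fractional factors standing in for the CAJ scheme are hypothetical illustrations, not the actual CAJ values.

```python
# Side-by-side sketch of additive (ISU-style) vs. multiplicative (CAJ-style)
# quality handling. Base values and fractional factors are illustrative only.

def isu_style(base: float, quality: int) -> float:
    """Base mark plus/minus 1, 2, or 3 whole quality points."""
    return base + quality            # goes negative when base < 3, quality = -3

CAJ_FACTOR = {-3: 0.55, -2: 0.70, -1: 0.85, 0: 1.00, 1: 1.15, 2: 1.30, 3: 1.45}

def caj_style(base: float, quality: int, failed: bool = False) -> float:
    """Base mark scaled by a fractional quality factor; failed elements
    (fall, two-foot landing, etc.) receive no credit at all."""
    if failed:
        return 0.0                   # "Failed": no points under current rules
    return base * CAJ_FACTOR[quality]

base = 2.0                           # an element worth less than three points
print(isu_style(base, -3))           # -1.0: negative points, a logical oddity
print(caj_style(base, -3))           # 1.1: scaled down, but never below zero
print(caj_style(base, -3, failed=True))  # 0.0: a failed element earns nothing
```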
The CAJ point model was constructed to include all common and uncommon tricks, and allows for newly invented tricks. The ISU system does not, and thus puts skating innovation in a straitjacket. The CAJ point model recognizes that there are thousands of possible jump combinations and sequences (consisting of two or three jumps) and hundreds of thousands of possible combination spins when you include variations within the basic positions; a back-of-the-envelope count appears below. The CAJ point model was set up with the flexibility to assign a different base mark to every possible jump combination, jump sequence, spin combination and footwork sequence (and lift combination in pairs and dance). That level of complexity may not be necessary, but it illustrates how simple-minded the ISU approach is, limiting all spins and footwork sequences to three levels of difficulty and three base marks.
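That back-of-the-envelope count supports these orders of magnitude. Assuming, purely for illustration, six jump types each attempted with one to four rotations:

```python
# Back-of-the-envelope count of jump combinations, under the illustrative
# assumption of 6 jump types (toe loop, salchow, loop, flip, lutz, axel),
# each performed with 1 to 4 rotations: 24 distinct jumps in all.
jump_types, rotations = 6, 4
distinct_jumps = jump_types * rotations

two_jump = distinct_jumps ** 2       # ordered pairs: the order matters
three_jump = distinct_jumps ** 3     # ordered triples
print(two_jump + three_jump)         # 14,400 -- thousands, as claimed
```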
In regard to program content, the CAJ point model incorporates the current definitions of a well-balanced program and does not diminish the value of spins and footwork as the proposed ISU system does.
Unlike the proposed ISU system, the CAJ point model retains the concept of a perfect program. In both the CAJ point model and the proposed ISU system there is a maximum number of points available in the second mark and in the auxiliary technical marks. In the CAJ point model the maximum for the second mark remains 6.000, consistent with current standards (CAJ calculates scores to three decimal places in both the technical and presentation marks).
Normalizing the first mark is more difficult, because the technical standard for a 6.0 has evolved over time, and a fixed point model cannot evolve in that way. Consequently, for the first mark, the CAJ point model is normalized so that the most difficult programs done today (men's programs for singles, gold medal programs for pairs and dance), if executed perfectly, would earn a 6.000. Less difficult programs earn lower marks, and more difficult programs in the future would earn higher marks.
In the CAJ point model, a performance that earns the maximum number of technical points possible for its program content is reported as a "technically perfect program." To earn that designation, a judge would have to award +3 quality for every trick, take no deductions, and give maximum credit for each of the auxiliary technical marks.
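In code, this normalization is a single scaling, sketched here under the assumption that the raw-point total of today's most difficult program, executed perfectly, is known (the benchmark value below is invented):

```python
# Sketch of normalizing raw technical points back to the familiar 6.0 scale.
# BENCHMARK_MAX stands for the raw-point total of today's most difficult
# program executed perfectly (+3 quality everywhere, no deductions, full
# auxiliary credit). The values are invented for illustration.
BENCHMARK_MAX = 85.0                 # hypothetical raw maximum, not a CAJ value

def first_mark(raw_points: float) -> float:
    """Scale raw technical points so the benchmark perfect program = 6.000."""
    return 6.0 * raw_points / BENCHMARK_MAX

for raw in (85.0, 63.2, 92.0):
    print(f"{first_mark(raw):.3f}")  # 6.000 benchmark; lower for easier
                                     # programs; above 6.000 for a harder
                                     # future program
```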
In the course of working on the CAJ point model, the question of the extent to which human judging could be entirely eliminated from skating competitions naturally came up. Although it may sound over-the-top, the potential exists that an artificial intelligence machine vision (AIMV) system could be developed in the next 6-10 years that would greatly reduce the role of human judges in competitions. This technology could already take over some of the judging chores.
The CAJ point model was structured to make partial use of this technology. In the CAJ point model, the judging of certain aspects of a performance could potentially be done automatically by an AIMV system. Speed of skating, variation of speed, and use of the ice can all currently be evaluated using an AIMV approach, in accordance with the way human judges currently evaluate those aspects of a performance. Under the proposed ISU system these factors remain subjectively marked by the judges. If the ISU really wants a "revolutionary" system that is free of human foibles, it should study the potential application of this technology in the development of its judging system and develop a research program to expand its scope.
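As a hedged illustration of how tractable some of this already is: given a time-stamped track of the skater's position (something existing camera-tracking systems can produce), average speed, speed variation, and use of the ice reduce to simple geometry. The track, grid size, and rink dimensions below are illustrative assumptions.

```python
# Sketch of machine-vision-assisted evaluation: given a time-stamped track of
# the skater's position on the ice (x, y in meters), compute average speed,
# speed variation, and the fraction of the rink surface used. The track and
# grid size are invented for illustration.
import math

track = [(0.0, 5.0, 10.0), (1.0, 9.0, 13.0), (2.0, 14.0, 13.5),
         (3.0, 18.0, 10.0), (4.0, 21.0, 5.0)]   # (t seconds, x, y)

speeds = []
cells = set()
for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
    speeds.append(math.hypot(x1 - x0, y1 - y0) / (t1 - t0))
for _, x, y in track:
    cells.add((int(x // 5), int(y // 5)))        # 5 m grid cells visited

mean = sum(speeds) / len(speeds)
variation = math.sqrt(sum((s - mean) ** 2 for s in speeds) / len(speeds))
rink_cells = (60 // 5) * (30 // 5)               # 60 x 30 m rink, 5 m grid
print(f"avg speed {mean:.1f} m/s, variation {variation:.1f} m/s, "
      f"ice used {len(cells) / rink_cells:.0%}")
```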
Despite unsupported declarations of success and victory, the development of the newest ISU judging system has thus far been an unmitigated disaster. It starts with unclear goals that are detrimental to the future of skating and to the safety of the athletes. Its development uses an ad hoc methodology that ignores the true complexity of skating and ignores other, potentially superior approaches. The accuracy of the point model is not being adequately tested, and the entire process is not taking advantage of the safety net offered by outside peer review. Adherence to an artificial and unrealistic schedule, despite comments to the effect that they will "take the time to get it right," does not bode well for a successful final product. At best the ISU will come up with "a solution," but it is unlikely to be even an adequate solution, much less the best solution.
In studying the CAJ point model, it was found that the current judging process, and the rules and standards of skating, can be captured almost entirely in a computer-based point model, with only minor departures from current standards and practices. In Part 2, some of the details of the point model specific to each skating event will be discussed.
Copyright 2003 by George S. Rossano