The controversy swirling around Rule 121 mainly involves the manner in which it came to be a rule at all, but it also involves what is meant by the word "test." The ISU says it intends to test the proposed system by using it as the official judging system at some Grand Prix competitions next season. This, however, is disingenuous.
The purpose of testing a system is first to prove that it works, and second to prove that it is superior to what it is intended to replace. You do not do that by simply using it at a competition to see what happens.
Long before you get to a competition, the system needs to be tested numerically. This involves activities such as applying the system to the evaluations from many previous competitions to see whether it gives the intended results, and conducting numerical simulations to see how the system performs under all conceivable situations. It is currently my understanding that the ISU is not conducting these types of tests.
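To make concrete what I mean by numerical simulation, such a test might look something like the following sketch. The panel size, the number of judges counted, and the distribution of marks are assumptions chosen purely for illustration; they are not the ISU's actual parameters.

```python
import random
import statistics

# A minimal sketch of the kind of numerical simulation described above.
# The panel size, the number of judges counted, and the mark distributions
# below are illustrative assumptions, not the ISU's actual parameters.

PANEL_SIZE = 14   # assumed size of the full judging panel
COUNTED = 7       # assumed number of judges randomly selected to count
TRIALS = 100_000  # number of simulated random selections

def reversal_rate(mean_a, mean_b, spread, trials=TRIALS):
    """Estimate how often a randomly selected subpanel orders two skaters
    differently than the full panel would, for the same set of marks."""
    reversals = 0
    for _ in range(trials):
        # One plausible mark from each judge for each of two skaters.
        marks_a = [random.gauss(mean_a, spread) for _ in range(PANEL_SIZE)]
        marks_b = [random.gauss(mean_b, spread) for _ in range(PANEL_SIZE)]
        full_a, full_b = statistics.mean(marks_a), statistics.mean(marks_b)
        # Randomly select which judges' marks count.
        counted = random.sample(range(PANEL_SIZE), COUNTED)
        sub_a = statistics.mean(marks_a[j] for j in counted)
        sub_b = statistics.mean(marks_b[j] for j in counted)
        # A reversal: the subpanel and the full panel disagree on the order.
        if (sub_a > sub_b) != (full_a > full_b):
            reversals += 1
    return reversals / trials

if __name__ == "__main__":
    # Two skaters separated by a margin that is small compared to the
    # judge-to-judge spread in marks (both values are assumptions).
    print(f"Estimated reversal rate: {reversal_rate(5.6, 5.5, 0.3):.1%}")
```

Running many such trials over a range of margins and spreads would show how often random selection of judges, by itself, can change an outcome, which is exactly the kind of behavior one should understand before the system is ever used at a competition.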
In order to test the hardware implementation and the operability of the final system, a complete dress rehearsal of the final form of the system is needed. This apparently is one of the goals of the tests at the Grand Prix next season. This is an unnecessarily risky approach, because if a major hardware or procedural problem appears, the competition involved will be a disaster. The prudent course of action would be to set up the complete system in a conference room before then, and bring in a complete panel of officials who would watch video tapes and exercise the system in detail (i.e., a complete competition's worth).
The final form of testing is to use the system in competition to see if the system gives the intended performance, and to determine if the actual performance is superior to what it is intended to replace. This test cannot be done by using the system alone, without some standard of reference to compare against.
In the above discussion, the key thing that makes a test a test is that before the test there is a predicted performance or behavior, and after the test there is a quantitative evaluation of how well the predictions were met. In addition, there are pre-determined success criteria, specified before the test takes place, against which the results are compared to determine success or failure.
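As an illustration of what comparison against a standard, judged against pre-determined success criteria, could look like, consider the following sketch. The agreement metric and the threshold are hypothetical choices made only for illustration; the actual criteria would have to be established and published before any test takes place.

```python
from itertools import combinations

# A minimal sketch of a side-by-side evaluation against a pre-determined
# success criterion.  The metric (pairwise agreement of placements) and the
# threshold are hypothetical choices, not criteria the ISU has published.

AGREEMENT_THRESHOLD = 0.90  # hypothetical pre-registered success criterion

def pairwise_agreement(current, proposed):
    """Fraction of skater pairs placed in the same order by both systems.
    Each argument maps a skater to that system's final placement (1 = first)."""
    pairs = list(combinations(current, 2))
    agree = sum((current[a] < current[b]) == (proposed[a] < proposed[b])
                for a, b in pairs)
    return agree / len(pairs)

if __name__ == "__main__":
    # Hypothetical placements from one event scored under both systems.
    current_system  = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
    proposed_system = {"A": 1, "B": 3, "C": 2, "D": 4, "E": 5}
    score = pairwise_agreement(current_system, proposed_system)
    verdict = "meets" if score >= AGREEMENT_THRESHOLD else "fails to meet"
    print(f"Pairwise agreement: {score:.0%} ({verdict} the pre-set criterion)")
```

The point is not this particular metric; it is that the target is written down before the data exist, so that success or failure cannot be declared after the fact.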
The ISU is not taking this approach. By implementing the system next season without side-by-side comparison with the current system (preferably with secrecy and random selection of judges removed), it will be impossible to determine whether the proposed system works as intended or is an improvement over the current method of evaluating skating. In the approach the ISU is taking, the intended performance has not been revealed, there are no success criteria specified against which to evaluate the performance of the new system, and there is no standard of reference to compare the proposed system to when it is first used. Under those circumstances, following the first use, the ISU can declare any performance of the system a success, so long as the hardware functions trouble-free. In reality, however, what one will have in that situation is an untested system of unknown utility. The only thing one can definitively learn from that kind of use is that if the judges push the buttons, some numbers will come out that may or may not mean anything.
Operation of the proposed system next season should at best be limited to side-by-side comparison with the official system, and then only if the necessary homework has been completed beforehand through the other forms of testing. Performance criteria and success criteria must be established before the first operational test of any system takes place. Using the proposed system as the official system in any competitions next season is a risky approach that will be unfair to the skaters, and from which little will be learned about the actual performance and utility of the system.
Copyright 2003 by George S. Rossano