A Reader Writes

I'm not a statistician -- my background is more in mathematics -- but I do have a question/possible problem with your statistical analysis of absolute point models.

Your example says, "Let us say for the moment that the point system will be set up so that the champion program will earn about 180 points and the last place performance about 90 points." You correctly point out, "in our example, a range of about 90 points spans the results from first to last."

But then you say, "If a single judge can only judge to an absolute accuracy of 5% at best, that corresponds to 9 point[s] on a 180 point scale." Here is where I lose you. The range on this scale is 90 points, isn't it? I mean, what's important isn't that the scale tops out at 180, but rather that the whole thing has a 90 point range. 90 - 180 is the same as 0 - 90, no?

In which case, the 5% error is just 4.5 points. Divided by 3 for a 9-judge panel gives an error of 1.5 points. Which is, assuming 3 points between skaters, a one-half place error. While that isn't as spiffy as the one-third place error you calculated for top spots under the current system, it certainly isn't the entire place you calculate it out to.

(I also have some questions about your underlying assumptions. Like when you say that under the current system, statistical analysis shows an uncertainty of one place in specifying top and bottom skaters "in a group," I'm wondering what size group. The comparison that you're doing to the 30 skaters/90 point scale obviously assumes a "group" of the entire competition, i.e., 30 skaters. But if the current analysis is talking about a smaller "group" of skaters -- e.g., like when skaters are broken down into groups of 6 or so, the comparison HEAVILY favors the absolute point model. Because then we'd be talking about 18 points TOTAL (6 skaters, separated roughly by 3 points each), with a 5% error for each judge of .9 points, dividing by 3 for a nine-judge panel gives an error of .3 points. When there are 3 points between skaters, an error of only .3 points is one-TENTH of a placement -- WAY less than the one-third of a placement error in the current system.)

I enjoy reading your perspective on the proposed scoring system changes. I just want to make sure you've got the math right.

Sharon

You are correct that there are some assumptions that I didn't point out, mainly because when you start talking math most people's eyes glaze over and then roll up into the tops of their heads.

As I hope was clear, we really don't know how well judges can mark on an absolute scale (i.e., what accuracy and what consistency are achievable), and thus I am assuming that in skating the judges can do no better than what has been demonstrated in the past in other situations and other environments: typically 5-10% accuracy at best. For a "typical" case I use 5%.

For a 5% accuracy I claim that any point value a judge comes up with can typically be expected to have an uncertainty of 5% of the value. That would be 9 points near the top of the scale, 4.5 points near the bottom, and roughly 7 points on average. If you take 5% of the range of marks (from 90 to 180 points) you are really talking about the accuracy of determining relative placement, but that is not what the judges are being asked to do. Independently of any other performance, they are being asked to come up with a number on some scale that is accurate and consistent over all time. In that situation each "measurement" is independent of the others and subject to its own 5% uncertainty on the point value, either up or down.
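As a rough sketch of that arithmetic, the snippet below compares 5% of each point value with 5% of the 90-point range. The function name and the dictionary layout are illustrative choices only; the 5% figure and the 90 to 180 point scale are the ones used in the example.

```python
def mark_uncertainty(points, accuracy=0.05):
    """Uncertainty of one absolute mark, taken as a fraction of the mark itself."""
    return accuracy * points

TOP, BOTTOM = 180.0, 90.0  # champion and last-place totals from the example

# 5% of the point value at the top, middle, and bottom of the scale
per_mark = {
    "top": mark_uncertainty(TOP),
    "midpoint": mark_uncertainty((TOP + BOTTOM) / 2),
    "bottom": mark_uncertainty(BOTTOM),
}

# 5% of the range instead, which is the reader's reading of the problem
range_based = mark_uncertainty(TOP - BOTTOM)

for where, err in per_mark.items():
    print(f"{where:>8}: +/- {err:.2f} points")
print(f"5% of the 90-point range: +/- {range_based:.2f} points")
```

The two readings agree only at the bottom of the scale; everywhere else 5% of the point value is the larger number.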

For example, say two skaters deserve 100 and 105 points according to some absolute truth (which we can never know, and which is sometimes called "God's Truth" in the world of system calibration for that reason). If you ask a judge to specify a point value for each independent of the other, and the accuracy of the assessment is plus/minus 5%, then the uncertainty in each value is about plus/minus 5 points. If you ask the judge to specify the point difference between the two, and not the point total, then you will get a much better result: a few percent of 5 at the most, or less than one point. That is why the current approach is better than what is proposed. It is a relative comparison system, which can be carried out more precisely than an absolute assessment.

In this example, if the assessments are truly independent and the errors are random, the first skater may end up with 105 points (or more) and the second skater with 100 points (or less) some fraction of the time, resulting in reversed placements that fraction of the time.
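To put a rough number on "some fraction of the time," here is a minimal sketch for a single judge. It assumes (my assumption for illustration, not something established above) that each mark's error is normally distributed with a standard deviation of 5 points and that the two errors are independent. The function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def reversal_probability(true_low, true_high, sigma_low, sigma_high):
    """Chance the skater who deserves fewer points is marked above the other.

    Assumes independent, normally distributed errors on each mark.
    """
    gap = true_high - true_low
    # Independent errors add in quadrature when two marks are differenced
    sigma_diff = math.hypot(sigma_low, sigma_high)
    return normal_cdf(-gap / sigma_diff)

p = reversal_probability(100.0, 105.0, 5.0, 5.0)
print(f"Single-judge chance of reversed placement: {p:.2f}")
```

Under these assumptions a single judge reverses the two skaters roughly one time in four. A different error distribution, or a different reading of "plus/minus 5%," would change the number but not the basic point.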

While my previous article mainly talked about the typical case and some worse cases, it is reasonable to ask, as you do, about the best case. Does the ISU have a prayer?

Suppose, as you suggest, that it is better to say the accuracy of each point total assigned is plus/minus 4.5 points for all skaters regardless of point total. Where does that leave you?

For nine judges, as you point out, the uncertainty will be half a place for 3 points between skaters. That sounds not too bad, but it corresponds to a confidence level of 65%, meaning about 1/3 of the placements will be wrong. Further, the ISU will be using only 7 judges at the most, which makes the uncertainty about 0.6 places, and you are approaching a confidence level of only 50%. So even in your example we are approaching 50% of the placements being incorrect.
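The place uncertainties for the two panel sizes can be sketched as follows, assuming independent judges so that the panel error shrinks as the square root of the number of judges (the same "divide by 3 for nine judges" step used in the letter). The function name is illustrative, and the sketch does not attempt to reproduce the confidence levels, which depend on further assumptions.

```python
import math

def panel_place_uncertainty(single_judge_error, n_judges, points_per_place):
    """Uncertainty in placement, in units of places, for a panel of judges.

    Assumes independent judges, so averaging n marks reduces the
    error by a factor of sqrt(n).
    """
    panel_error = single_judge_error / math.sqrt(n_judges)
    return panel_error / points_per_place

for n in (9, 7):
    places = panel_place_uncertainty(4.5, n, 3.0)
    print(f"{n} judges: about {places:.2f} places")
```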

Your last paragraph is also basically pointing out that you do better with relative placements and scores than with absolute ones. That is, in your example you really are using the numbers in a relative sense, not in an absolute, statistically independent sense. I think what your example again illustrates is that when a judge does relative assessments within the group you get a fairly nice result, but that is not what is being proposed. In addition, it may be true for the free skate that the last warm-up usually has the six best skaters, but that is not always true.

If you follow skating closely, I am sure you have seen examples where a skater in the last warm-up has fallen out of the top ten, in which case there would be a larger point spread to keep track of. Further, for the short program (and compulsory dance and original dance) both the best and worst skaters could be in the same warm-up. Sometimes the best skater skates first, and the second best skater skates last in these events. In these cases the judges will be asked to judge on an absolute scale, consistently applied through 30 skaters, each assessed independently of the others. It is not clear this is possible and it is incumbent on the ISU to prove it can be done.

Another point your last paragraph hints at is that the statistics depend on the size of the competition to some extent, and also on the skill level of the competition. The numbers I quoted in the previous article on the spread of the judges' marks are for large championship competitions; basically major competitions with 18 or more competitors. For lower level events of that size (novice and below) the spread in the judges' marks is even greater than I quoted. For small events (no more than six skaters total) the spread in the marks is generally much less. My focus in these discussions, not surprisingly, is on the large championship events, since all the fuss and the main priority is about getting the Olympics right.
