The ‘winter detector’ problem is common in big data analysis. A literal example, via computer scientist Sameer Singh, is the pattern-recognising algorithm that was shown many photos of wolves in the wild, and many photos of pet husky dogs. The algorithm seemed to be really good at distinguishing the two rather similar canines; it turned out that it was simply labelling any picture with snow as containing a wolf. An example with more serious implications was described by Janelle Shane in her book You Look Like a Thing and I Love You: an algorithm that was shown pictures of healthy skin and of skin cancer. The algorithm figured out the pattern: if there was a ruler in the photograph, it was cancer. If we don’t know why the algorithm is doing what it’s doing, we’re trusting our lives to a ruler detector.
I can think of nothing an audience won’t understand. The only problem is to interest them; once they are interested they understand anything in the world.
Social scientists have long understood that statistical metrics are at their most pernicious when they are being used to control the world, rather than try to understand it. Economists tend to cite their colleague Charles Goodhart, who wrote in 1975: ‘Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes. (Or, more pithily: ‘When a measure becomes a target, it ceases to be a good measure.’) Psychologists turn to Donald T. Campbell, who around the same time explained: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.
Goodhart and Campbell were on to the same basic problem: a statistical metric may be a pretty decent proxy for something that really matters, but it is almost always a proxy rather than the real thing.