Author: Gary Smith

01.11.19

09:00 am

The Exaggerated Promise of So-Called Unbiased Data Mining

Nobel laureate Richard Feynman once asked his Caltech students to calculate the probability that, if he walked outside the classroom, the first car in the parking lot would have a specific license plate, say 6ZNA74. Assuming every number and letter is equally likely and is determined independently, the students estimated the probability to be less than 1 in 17 million. When the students finished their calculations, Feynman revealed that the correct probability was 1: He had seen this license plate on his way into class. Something extremely unlikely is not unlikely at all if it has already happened.
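The students' figure can be reproduced with a short calculation. This sketch assumes the plate follows the pattern digit, letter, letter, letter, digit, digit (inferred from the example 6ZNA74; the article does not state the format) and, as in the anecdote, that each character is chosen independently and uniformly from 10 digits or 26 letters.

```python
from fractions import Fraction

# Plate pattern inferred from the example 6ZNA74 (an assumption):
# "D" = digit (10 choices), "L" = letter (26 choices).
PATTERN = "DLLLDD"
CHOICES = {"D": 10, "L": 26}


def plate_count(pattern: str) -> int:
    """Number of distinct plates matching the pattern, assuming every
    position is filled independently and uniformly."""
    total = 1
    for symbol in pattern:
        total *= CHOICES[symbol]
    return total


n_plates = plate_count(PATTERN)
p_before = Fraction(1, n_plates)  # probability computed before looking
p_after = Fraction(1)             # Feynman had already seen the plate

print(f"{n_plates:,} possible plates")        # 17,576,000 possible plates
print(f"P(before) = 1/{n_plates:,}, P(after) = {p_after}")
print(p_before < Fraction(1, 17_000_000))     # True
```

The same plate yields two very different answers: roughly 1 in 17.6 million when the calculation is made beforehand, and exactly 1 once the outcome is already known.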

Gary Smith is the Fletcher Jones Professor of Economics at Pomona College. He is the author of The AI Delusion and of Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie With Statistics.

The Feynman trap—ransacking data for patterns without any preconceived idea of what one is looking for—is the Achilles heel of studies based on data mining. Finding something unusual or surprising after it has already occurred is neither unusual nor surprising. Patterns are sure to be found, and are likely to be misleading, absurd, or worse.
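A small simulation illustrates why ransacking data is guaranteed to turn up something. This is a hypothetical illustration, not an analysis from the article: the choices of 20 observations, 1,000 candidate series, Gaussian noise, and Pearson correlation are all assumptions. Every series is pure random noise, so none of them is genuinely related to the target, yet searching through all of them and keeping the best one still produces an impressive-looking correlation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def best_spurious_correlation(n_obs=20, n_candidates=1000, seed=0):
    """Search n_candidates pure-noise series for the one most correlated
    with a pure-noise target. No series is related to any other, so any
    'pattern' found is produced by the search alone."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best_r, best_idx = 0.0, -1
    for i in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson(target, candidate)
        if abs(r) > abs(best_r):
            best_r, best_idx = r, i
    return best_idx, best_r


idx, r = best_spurious_correlation()
print(f"Best of 1,000 random series: #{idx}, r = {r:.2f}")
```

Judged in isolation, a correlation this large on 20 observations would look like a real finding. It is only the winner of a search through 1,000 meaningless candidates, which is the Feynman trap in miniature: the pattern was certain to be found before anyone looked.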

https://www.wired.com/story/the-exaggerated-promise-of-data-mining/