Sunday, May 23, 2010

Limits of randomized experiments

I just got back from the Mid-Atlantic Causal Inference Conference, the leading meeting for statisticians who look not just for associations, but for causality. Randomized experiments are the gold standard for causality because randomization ensures that, on average, the treatment and comparison groups are similar. Experiments do have limitations, however, that come primarily from their great expense: experiments may need to be small and of short duration, which weakens the chance that experimenters can see an effect. The study described in this article is a perfect example: 22 autistic children were randomized to a gluten-free, casein-free (GFCF) diet for 18 weeks and then given a "challenge" of these foods about 4 weeks into the trial; by the end of the trial, 8 of the subjects had dropped out.
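To get a feel for how much a small sample weakens the chance of seeing an effect, here is a rough simulation sketch. Everything in it is an assumption for illustration and not the design of the study above: a hypothetical two-arm trial, normally distributed outcomes with equal variance, a pooled two-sample t-test, and a made-up effect of half a standard deviation. The function name `simulated_power` is mine.

```python
import random
import statistics


def simulated_power(n_per_arm, effect_size, t_critical, n_sims=2000, seed=0):
    """Fraction of simulated two-arm trials whose pooled t-test rejects.

    Hypothetical setup: outcomes are N(0, 1) in the control arm and
    N(effect_size, 1) in the treated arm, with n_per_arm subjects each.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_arm)]
        # Pooled variance for equal-sized arms is the average of the two.
        pooled_var = (statistics.variance(control) + statistics.variance(treated)) / 2
        std_err = (2 * pooled_var / n_per_arm) ** 0.5
        t_stat = (statistics.mean(treated) - statistics.mean(control)) / std_err
        if abs(t_stat) > t_critical:
            rejections += 1
    return rejections / n_sims


# Two-sided 5% critical values of Student's t: 2.179 (12 df), 2.086 (20 df).
# 7 per arm mimics 14 completers split evenly; 11 per arm mimics 22 enrolled.
print("7 per arm: ", simulated_power(7, 0.5, 2.179))
print("11 per arm:", simulated_power(11, 0.5, 2.086))
```

Under these assumptions, both sample sizes leave a moderate true effect undetected in most simulated trials, and losing subjects makes it worse.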

Seemingly, there are hundreds of parents on internet mailing lists and websites putting their children on a GFCF diet. The GFCF diet is hard to implement: it takes weeks or months of practice to get right, and even then an errant crumb can disrupt the progress. It's also unclear how long kids need to be on the diet to see an improvement, because determining the starting point is so inexact. A parent can probably remove >90% of gluten and casein from their child's diet starting on day 1, but hunting down the remaining 10% to reach 100% adherence takes a long time. And 99.998% adherence may be exactly what's required: the FDA definition of gluten-free is 20 ppm, or 0.002%. Once the GFCF diet is in place, many parents say that their children improve. Now a randomized trial that started out with 22 participants and lost 8 of them comes into the news with the headline, "Eliminating Wheat, Milk From Diet Doesn't Help Autistic Kids."
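For anyone who wants to check what a 20 ppm threshold means as a percentage, here is the conversion as a minimal sketch. The helper names are mine, and treating a concentration limit as an "adherence" figure is only a loose analogy.

```python
def ppm_to_percent(ppm: float) -> float:
    """Convert parts per million to a percentage."""
    return ppm / 1_000_000 * 100


def free_percent(ppm: float) -> float:
    """Percentage of the total that is free of the substance."""
    return 100 - ppm_to_percent(ppm)


threshold_ppm = 20  # the gluten-free threshold cited in the post
print(f"{ppm_to_percent(threshold_ppm):.3f}% gluten allowed")
print(f"{free_percent(threshold_ppm):.3f}% gluten-free")
```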

An experiment doesn't have the luxury of refining the diet to make sure that it's being done correctly, or of figuring out how long the diet needs to continue before there's improvement. An experiment generally fixes the treatment in advance rather than arriving at it by trial and error, since adjusting the protocol in search of the best result is, to a certain extent, cheating (i.e., it risks a spuriously significant result that occurred simply by chance).
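The "cheating" risk can be put in numbers with a back-of-the-envelope calculation. This sketch assumes each variation of the treatment is tested independently at the usual 5% level and that none of them has any real effect; real trial-and-error attempts are rarely independent, so treat it as a rough guide. The function name `chance_of_false_positive` is mine.

```python
def chance_of_false_positive(alpha: float, n_attempts: int) -> float:
    """Probability that at least one of n independent null tests is 'significant'."""
    return 1 - (1 - alpha) ** n_attempts


for n_attempts in (1, 5, 10, 20):
    rate = chance_of_false_positive(0.05, n_attempts)
    print(f"{n_attempts:2d} attempts: {rate:.1%} chance of a spurious result")
```

The chance of at least one spurious "success" climbs quickly with the number of attempts, which is why the treatment is pinned down before the trial starts.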

A good experiment is an invaluable tool for understanding reality, but a so-so experiment is no better than a qualitative study of people on internet websites.
