Yihui asked this question yesterday. My supervisor Dr. Hau also criticized routine grouping discretization. I encountered two plausible reasons in 2007 classes, one negative, the other at least conditionally positive.
The first is a variant of the old Golden Hammer law -- if the only tool is ANOVA, every continuous predictor need discretization. The second reason is empirical -- ANOVA with discretization steals df(s). Let's demo it with a diagram.
The red are the population points, and the black are samples. Which predicts the population better--the green continuous line, or the discretized blue dashes? R simulation code is given.