Why practitioners discretize their continuous data

Yihui asked this question yesterday, and my supervisor Dr. Hau has also criticized the routine practice of discretizing continuous variables into groups. In classes in 2007 I came across two plausible reasons: one negative, the other at least conditionally positive.

The first is a variant of the old Golden Hammer law: if the only tool you have is ANOVA, every continuous predictor needs to be discretized. The second reason is empirical: ANOVA with discretization steals degrees of freedom (a model with k groups spends k - 1 of them, where a straight line spends only one). Let's demonstrate it with a diagram.
The red points are the population and the black points are a sample from it. Which predicts the population better: the green continuous line, or the discretized blue dashes? R simulation code is given below.
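
A minimal sketch of such a simulation; the data-generating model (a linear trend with Gaussian noise), the sample sizes, and the use of four intervals are my illustrative assumptions rather than the post's original code:

    ## Population (red), sample (black), continuous fit (green line),
    ## discretized fit (blue dashes); base R only.
    set.seed(2007)
    n_pop <- 500; n_samp <- 30
    x_pop <- runif(n_pop, 0, 10)
    y_pop <- 2 + 0.5 * x_pop + rnorm(n_pop)   # population points
    idx <- sample(n_pop, n_samp)
    x_s <- x_pop[idx]; y_s <- y_pop[idx]      # sampled points

    plot(x_pop, y_pop, col = "red", pch = ".", xlab = "x", ylab = "y")
    points(x_s, y_s, pch = 16)                # sample in black

    ## Continuous model: simple linear regression (one slope df)
    abline(lm(y_s ~ x_s), col = "green", lwd = 2)

    ## Discretized model: four equal-width groups; the group means are
    ## what a one-way ANOVA on the grouped x estimates (three between-group df)
    brks  <- seq(min(x_s), max(x_s), length.out = 5)
    grp   <- cut(x_s, breaks = brks, include.lowest = TRUE)
    means <- tapply(y_s, grp, mean)
    segments(brks[-5], means, brks[-1], means, col = "blue", lty = 2, lwd = 2)

The grouped fit spends three degrees of freedom where the line spends one, which is exactly the "stolen" df in the second reason above.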



4 thoughts on “Why practitioners discretize their continuous data”

  1. Pingback: Keep on Fighting!
  2. The discretization here is essentially a kind of local smoothing technique using a constant kernel function. Generally speaking, local modeling can effectively improve the fit (a lower error sum of squares), but we have to be careful to avoid overfitting. If you discretize x into more intervals, the fit will be even better.

  3. Residuals and errors are different things. As the number of intervals grows, the sum of squared residuals decreases while the sum of squared errors increases. So the black points themselves, i.e., discretization with the maximum number of intervals, predict the red population the worst (see the numerical sketch after these comments).

    Discretization fades out micro-level information (mostly error) while highlighting macro-level information (usually non-linear patterns). Once LOESS becomes popular enough, discretization will be abandoned; what practitioners really need is local smoothing to preview the macro-level models they care about.
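
To make comment 3's residual-versus-error distinction concrete, here is a minimal R sketch; the sinusoidal population curve, the noise level, and equal-width intervals are illustrative assumptions:

    ## Group-mean ("discretized") fits with more and more intervals:
    ## residuals compare the fit to the observed sample, errors compare
    ## it to the true population curve f.
    set.seed(1)
    f <- function(x) sin(x)               # assumed macro-level curve
    x <- runif(60, 0, 2 * pi)
    y <- f(x) + rnorm(60, sd = 0.5)       # sample = curve + micro noise

    for (k in c(2, 6, 60)) {              # 60 intervals: about one point per group
      grp  <- cut(x, breaks = k)
      yhat <- ave(y, grp)                 # discretized fit: group means
      cat(sprintf("k = %2d  residual SS = %5.1f  error SS = %5.1f\n",
                  k, sum((y - yhat)^2), sum((yhat - f(x))^2)))
    }

With most seeds the residual sum of squares keeps shrinking as k grows, while the maximal discretization, which nearly reproduces the sample itself, shows the largest error sum of squares against the true curve.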
