# Model Selection: Underfitting, Overfitting, and the Bias-Variance Tradeoff

The material in this post has been migrated, with Python implementations, to my GitHub Pages website.

Posted on April 21, 2013, in Regression, Simulations, Statistics, Theory and tagged bias-variance decomposition, bias-variance tradeoff, dependent variable, estimator, estimator bias, estimator variance, independent variable, learning curve, polyfit.m, polynomial model, Regression, Simulation, testing error, testing set, training error, training set.

Hi Dustin,

Thanks a lot for the amazing article/tutorial. I have a question regarding the calculation of bias and variance: is it necessary to know the true function f(x) in order to estimate the bias and variance of a model? I ask because the function f(…) appears in the calculation of biasSquared.

Generally, do we want to estimate the bias and variance of a known function in order to do model selection and support the choice of a prediction model?

Thanks in advance,

Dionysios

Thanks for reading; I hope the post was helpful. Regarding your question about the need for f(x) in model selection: performing model selection directly in terms of bias and variance is more of a theoretical perspective, used to motivate the Mean Squared Error (MSE) as a model selection criterion. You’re correct that you do need f(x) to determine the bias of the estimator (variance is defined only in terms of g(x), so f(x) is not needed there). However, by using the MSE, you get the information contained in the bias-variance definition of model accuracy without needing f(x) in hand (if f(x) were known, model estimation would be unnecessary and we’d be out of a job :^)). What’s nice is that the MSE is very easy to calculate from your observed data.
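Since the MSE needs only the observed responses and the model’s predictions, it can be computed directly from a held-out testing set. Here is a minimal sketch of that idea in Python, using `numpy.polyfit` as a stand-in for the post’s `polyfit.m`; the dataset and noise level are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: noisy samples of an unknown underlying function
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)

# Split the observations into a training set and a testing set
x_train, x_test = x[::2], x[1::2]
y_train, y_test = y[::2], y[1::2]

# Fit polynomial models of increasing complexity; compare testing MSE
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    pred = np.polyval(coeffs, x_test)
    mse = np.mean((y_test - pred) ** 2)
    print(f"degree {degree}: testing MSE = {mse:.3f}")
```

The degree with the smallest testing MSE indicates the “just right” complexity; the training MSE alone would keep shrinking as the degree grows, so it cannot be used for this comparison.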

Hi again; thanks a lot for your fast reply. Maybe I’m still missing something. I am currently trying to make a plot similar to yours for some prediction models that I developed, in order to show the “just right” complexity of the promising models. It is clear that the MSE can be computed easily, but what about the variance and bias terms?

Great post btw.

A bit late to the party, but I just want to highlight that the bias can be further decomposed into model bias and estimation bias. Certain methods like OLS are unbiased estimators in the sense that they do not have estimation bias. Trees, on the other hand, tend to be slightly biased in estimation.

Model bias exists because the underlying model deviates from the assumed form: e.g., in OLS we assume the underlying model is linear, that we have included all the necessary variables, that there are no omitted variables, etc. When these assumptions don’t hold, model bias appears. In practice there’s almost always model bias.
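The absence of estimation bias in OLS can be checked by simulation. The sketch below is a toy example with an assumed true linear model f(x) = 1 + 2x: it refits a straight line to many independently generated datasets and compares the average fitted slope to the true slope. Refitting a line to data from a nonlinear f would instead leave a persistent gap, which is model bias.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return 1.0 + 2.0 * x  # assumed true model: correctly specified for OLS

n_sims, n = 2000, 30
x = np.linspace(0, 1, n)
slopes = np.empty(n_sims)

# Repeat the whole estimation procedure on fresh data each time
for i in range(n_sims):
    y = f(x) + rng.normal(0, 0.5, n)
    slope, intercept = np.polyfit(x, y, 1)
    slopes[i] = slope

# Estimation bias of the OLS slope: mean fitted slope minus the true 2.0
print("estimated slope bias:", slopes.mean() - 2.0)  # close to zero
```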

Thanks a lot, Dustin… it helped me a lot.


We can create a graphical visualization of bias and variance using a bulls-eye diagram. Imagine that the center of the target is a model that perfectly predicts the correct values. As we move away from the bulls-eye, our predictions get worse and worse. Imagine we can repeat our entire model-building process to get a number of separate hits on the target.
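This repeated-model-building idea can be simulated directly: with a known f(x) (assumed here purely for illustration), regenerate the dataset many times, refit, and look at where the predictions at a fixed point land relative to the truth. The spread of the “hits” is the variance; their average offset from the bulls-eye is the bias:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    return np.sin(2 * np.pi * x)  # a known target, for illustration only

x = np.linspace(0, 1, 25)
x0 = 0.25  # the "bulls-eye": the point where we examine repeated predictions
preds = {1: [], 9: []}

# Repeat the entire model-building process to get many separate "hits"
for _ in range(500):
    y = f(x) + rng.normal(0, 0.3, x.size)
    for degree in preds:
        coeffs = np.polyfit(x, y, degree)
        preds[degree].append(np.polyval(coeffs, x0))

for degree, p in preds.items():
    p = np.array(p)
    print(f"degree {degree}: spread (variance) = {p.var():.4f}, "
          f"offset (bias) = {p.mean() - f(x0):.4f}")
```

The simple model’s hits cluster tightly but away from the center (low variance, high bias), while the flexible model’s hits center on the target but scatter more widely (low bias, high variance).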

Thanks a lot, Dustin. I was getting stuck and didn’t know what E[f(x)] is (since I thought we only define the expected value of a random variable, and in this case f is known and so is x…). Then I came across your wonderful post here. Thanks again!

Absolutely brilliant, elegantly explained

Thank you

Sean
