Residual Analysis
Plotting and Analysing Residuals
The residuals from a fitted model are defined as the differences between the response data and the fit to the response data at each predictor value.
residual = data – fit
You can display the residuals in the Curve Fitter app by clicking Residuals Plot in the Visualization section of the Curve Fitter tab.
Mathematically, the residual for a specific predictor value is the difference between the response value y and the predicted response value ŷ.
r = y – ŷ
Assuming the model you fit to the data is correct, the residuals approximate the random errors. Therefore, if the residuals appear to behave randomly, it suggests that the model fits the data well. However, if the residuals display a systematic pattern, it is a clear sign that the model fits the data poorly. Always bear in mind that many results of model fitting, such as confidence bounds, will be invalid should the model be grossly inappropriate for the data.
A graphical display of the residuals for a first-degree polynomial fit is shown below. The top plot shows that the residuals are calculated as the vertical distance from the data point to the fitted curve. The bottom plot displays the residuals relative to the fit, which is the zero line.
The residuals appear randomly scattered around zero indicating that the model describes the data well.
A graphical display of the residuals for a second-degree polynomial fit is shown below. The model includes only the quadratic term, and does not include a linear or constant term.
The residuals are systematically positive for much of the data range indicating that this model is a poor fit for the data.
Example: Residual Analysis
This example fits several polynomial models to generated data and evaluates how well those models fit the data and how precisely they can predict. The data is generated from a cubic curve, and there is a large gap in the range of the x variable where no data exist.
x = [1:0.1:3 9:0.1:10]'; c = [2.5 -0.5 1.3 -0.1]; y = c(1) + c(2)*x + c(3)*x.^2 + c(4)*x.^3 + (rand(size(x))-0.5);
Fit the data in the Curve Fitter app using a cubic polynomial and a fifth-degree polynomial. The data, fits, and residuals are shown below. You can display residuals in the Curve Fitter app by clicking Residuals Plot in the Visualization section of the Curve Fitter tab.
Both models appear to fit the data well, and the residuals appear to be randomly distributed around zero. Therefore, a graphical evaluation of the fits does not reveal any obvious differences between the two equations.
Look at the numerical fit results in the Results pane and compare the confidence bounds for the coefficients.
The results show that the cubic fit coefficients are accurately known (bounds are
small), while the quintic fit coefficients are not accurately known. As expected,
the fit results for poly3
are reasonable because the generated
data follows a cubic curve. The 95% confidence bounds on the fitted coefficients
indicate that they are acceptably precise. However, the 95% confidence bounds for
poly5
indicate that the fitted coefficients are not known
precisely.
The goodness-of-fit statistics are shown in the Table Of Fits pane. By default, the adjusted R-square and RMSE statistics are displayed in the table. The statistics do not reveal a substantial difference between the two equations. To choose statistics to display or hide, right-click the column headers.
The 95% nonsimultaneous prediction bounds for new observations are shown below. To
display prediction bounds in the Curve Fitter app, select
95%
from the Prediction Bounds
list in the Visualization section of the Curve
Fitter tab.
The prediction bounds for poly3
indicate that new observations
can be predicted with a small uncertainty throughout the entire data range. This is
not the case for poly5
. It has wider prediction bounds in the
area where no data exist, apparently because the data does not contain enough
information to estimate the higher degree polynomial terms accurately. In other
words, a fifth-degree polynomial overfits the data.
The 95% prediction bounds for the fitted function using poly5
are shown below. As you can see, the uncertainty in predicting the function is large
in the center of the data. Therefore, you would conclude that more data must be
collected before you can make precise predictions using a fifth-degree
polynomial.
In conclusion, you should examine all available goodness-of-fit measures before deciding on the fit that is best for your purposes. A graphical examination of the fit and residuals should always be your initial approach. However, some fit characteristics are revealed only through numerical fit results, statistics, and prediction bounds.