Performance measures in forecasting: when to use what

Bijula Ratheesh
4 min read · Aug 29, 2021

From my experience, I have realized that evaluating a model requires more focus and effort than building the model itself. The more complex the model, the more rigorous its evaluation metrics need to be. Models should be monitored regularly to capture drift in the data, features and concepts, as well as changes in model performance.

There should be a robust model evaluation framework that tests the assumptions (if any) behind the objective function or model and also evaluates the inputs and outputs. A typical model evaluation framework can be seen below.

Error Analysis Metrics

Model performance measures or error analysis metrics should be

  1. valid
  2. unbiased
  3. not highly sensitive to outliers

Error analysis can be scale dependent or scale independent, based on the problem we are trying to solve. Scale-dependent errors are useful when we are assessing the performance of a single series; scale-independent errors are useful when we are comparing across series.

Scale Dependent

Mean Absolute Error (MAE) - the average of the absolute errors. It is useful in assessing the cost of error and is commonly used in inventory control. The formula is given below, where y is the actual value and y tilde is the forecast value.
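In standard notation, over n forecast periods:

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \tilde{y}_t\right|$$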

Geometric Mean Absolute Error (GMAE) - this is the geometric mean of the absolute errors. It is more robust to outliers; however, it breaks down when any error is 0 (for example when both the forecast and the actual are 0), since the geometric mean then collapses to zero.
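In the same notation, GMAE is the n-th root of the product of the absolute errors:

$$\mathrm{GMAE} = \left(\prod_{t=1}^{n}\left|y_t - \tilde{y}_t\right|\right)^{1/n}$$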

Mean Squared Error (MSE) - an alternative to the mean absolute error in which more weight is placed on larger errors. It is less reliable in the presence of outliers and harder to interpret (the result is in squared units of the data), but it is still widely used by statisticians.
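Here the squared errors are averaged instead of the absolute errors:

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \tilde{y}_t\right)^2$$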

MAE is more robust since it is less sensitive to extreme values than MSE. However, when the data are very noisy even MAE can be affected; in that case we can use the Median Absolute Error (MdAE), which replaces the mean with the median. MdAE is, however, harder to interpret.
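MdAE simply takes the median of the absolute errors:

$$\mathrm{MdAE} = \operatorname{median}\left(\left|y_t - \tilde{y}_t\right|\right)$$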

Scale Independent

Mean Absolute Percentage Error (MAPE) - this is a widely used measure of error, although it has the downside of being undefined whenever an actual value is 0. Also, if the actual values are close to 0, MAPE can become very large. I have read in many articles that it gives more weight to negative errors; however, from the formula itself it is evident that it is not biased in that way.
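Expressed as a percentage, the absolute error in each period is scaled by the corresponding actual value:

$$\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \tilde{y}_t}{y_t}\right|$$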

Median Absolute Percentage Error (MdAPE) - this is the median-based counterpart of MAPE; it suffers from poor interpretability.

Relative Absolute Error (RAE) and Median Relative Absolute Error (MdRAE) - these are relative measures defined from the ratio of the errors of a given forecasting method to the errors produced by a naive forecast. If the series is volatile, this is the right measure, as it controls for the change in the series.
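Writing $e_t = y_t - \tilde{y}_t$ for the error of the candidate method and $e^{*}_t$ for the error of the naive (no-change) forecast in the same period, the relative error and its median aggregation are:

$$r_t = \frac{e_t}{e^{*}_t}, \qquad \mathrm{MdRAE} = \operatorname{median}\left(\left|r_t\right|\right)$$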

If the data values are large numbers, ranging in the thousands and above, it is better to avoid MAE and use MAPE instead. Similarly, if the data values are small numbers, avoid MAPE, since the chances of running into values extremely close to zero are high.

Using only one metric for comparison can be misleading and lead to incorrect judgements. I suggest using at least two measures to conclude the error analysis. Relative measures like RAE and its variants, such as MdRAE and CumMdRAE, can be combined with a percentage or scale-dependent error to make the analysis more robust.
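As a minimal sketch of how several of these measures can be computed side by side, the function and variable names below are my own (not from any particular library), and the benchmark is assumed to be a simple one-step naive forecast:

```python
import numpy as np

def forecast_errors(y_true, y_pred, y_naive):
    """Compute several of the error measures discussed above.

    y_true  : actual values
    y_pred  : forecasts from the candidate method
    y_naive : forecasts from the naive (no-change) benchmark
    """
    y_true, y_pred, y_naive = map(np.asarray, (y_true, y_pred, y_naive))
    e = y_true - y_pred          # errors of the candidate method
    e_star = y_true - y_naive    # errors of the naive benchmark

    return {
        "MAE":   np.mean(np.abs(e)),
        "MdAE":  np.median(np.abs(e)),
        "MSE":   np.mean(e ** 2),
        "GMAE":  np.exp(np.mean(np.log(np.abs(e)))),   # collapses to zero if any error is exactly 0
        "MAPE":  100 * np.mean(np.abs(e / y_true)),    # undefined if any actual value is 0
        "MdAPE": 100 * np.median(np.abs(e / y_true)),
        "MdRAE": np.median(np.abs(e / e_star)),        # undefined if any naive error is 0
    }

# Toy example: actuals, a candidate forecast, and the naive forecast
# (previous period's actual used as the prediction for the next period).
y = np.array([120.0, 130.0, 125.0, 140.0, 150.0])
f = np.array([118.0, 128.0, 130.0, 138.0, 147.0])
naive = np.array([115.0, 120.0, 130.0, 125.0, 140.0])

for name, value in forecast_errors(y, f, naive).items():
    print(f"{name}: {value:.3f}")
```

Reporting two or more of these together, for example MAE alongside MdRAE, guards against any single metric painting a misleading picture.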

References

Armstrong, J. Scott (ed.) (2001). Principles of Forecasting: A Handbook for Researchers and Practitioners. International Series in Operations Research & Management Science. Springer.

Hyndman, Rob J. "Another Look at Forecast-Accuracy Metrics for Intermittent Demand".
