Model Evaluation

With the ever-growing suite of models at the disposal of data scientists, the problems with selecting a model have grown similarly. ADS offers the Evaluation Class, a collection of tools, metrics, and charts concerned with the contradistinction of several models.

After working hard to architect and train your model, it’s important to understand how it performs across a series of benchmarks. Evaluation is a set of functions that convert the output of your test data into an interpretable, standardized series of scores and charts. From the accuracy of the ROC curve and residual QQ plots.

Evaluation can help machine learning developers to:

  • Quickly compare models across several industry-standard metrics.

    • For example, what’s the accuracy, and F1-Score of my binary classification model?

  • Discover where a model is failing to feedback into future model development.

    • For example, while accuracy is high, precision is low, which is why the examples I care about are failing.

  • Increase understanding of the trade-offs of various model types.

Evaluation can help users of machine learning algorithms to:

  • Understand visually and numerically where the model is likely to perform well, and where it is likely to fail.

    • For example, model A performs well when the weather is clear, but is much more uncertain during inclement conditions.

There are three types of ADS Evaluators: