evaluate

Score a model on validation data. Call it freely — it returns Metrics, not Evidence. The distinction matters: evaluate is the iterate zone. assess is the commit zone.
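As a quick illustration of that boundary (a hedged sketch: model_a, model_b, and s.test are hypothetical names, and assess is assumed to take (model, data) like evaluate does):

# Iterate zone: evaluate may be called repeatedly on validation data.
m_a = ml.evaluate(model_a, s.valid)   # returns Metrics
m_b = ml.evaluate(model_b, s.valid)   # returns Metrics
best = model_a if m_a["roc_auc"] >= m_b["roc_auc"] else model_b

# Commit zone: assess runs once on held-out test data and returns Evidence.
evidence = ml.assess(best, s.test)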

Signature

ml.evaluate(model, data, *, metrics=None, intervals=False, se=False)   # Python
ml_evaluate(model, data)                                               # R

Parameters

Parameter   Type          Default    Description
model       Model         required   A fitted model.
data        DataFrame     required   Validation data (must include the target column).
metrics     dict | None   None       Custom metrics. Default: accuracy, f1, precision, recall, roc_auc (classification) or rmse, mae, r2 (regression).
intervals   bool          False      Bootstrap 95% confidence intervals on each metric.
se          bool          False      Report standard errors.
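A minimal sketch of passing custom metrics, assuming the common convention that keys name the metric and values are callables of the form (y_true, y_pred); the reference above does not pin down the value type:

from sklearn.metrics import balanced_accuracy_score

# Assumed: keys become metric names in the result,
# values are callables scoring (y_true, y_pred).
custom = {"balanced_accuracy": balanced_accuracy_score}
metrics = ml.evaluate(model, s.valid, metrics=custom)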

Returns

Metrics — a dict-like object with metric names as keys.

—— Metrics [classification] ————————
  accuracy:     0.8244
  f1:           0.7579
  precision:    0.8000
  recall:       0.7200
  roc_auc:      0.8647
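Key lookup is confirmed by the examples below; if Metrics implements the full Mapping protocol (an assumption, not stated above), iteration works too:

acc = metrics["accuracy"]                # key lookup, as in the examples
for name, value in metrics.items():      # assumes Mapping-style items()
    print(f"{name}: {value:.4f}")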

Examples

Basic evaluation

Python

metrics = ml.evaluate(model, s.valid)
print(metrics["roc_auc"])

R

metrics <- ml_evaluate(model, s$valid)
metrics$roc_auc

With confidence intervals

metrics = ml.evaluate(model, s.valid, intervals=True)
# accuracy: 0.8244 [0.7786, 0.8664]

Confidence intervals are available in Python only; the R wrapper ml_evaluate does not accept intervals.
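For intuition about what intervals=True reports, here is a standalone sketch of a percentile bootstrap 95% interval (not the library's internals; assumes NumPy arrays and scikit-learn):

import numpy as np
from sklearn.metrics import accuracy_score

def bootstrap_ci(y_true, y_pred, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for accuracy; y_true, y_pred are NumPy arrays."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        stats.append(accuracy_score(y_true[idx], y_pred[idx]))
    return np.percentile(stats, [2.5, 97.5])  # (lower, upper)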

Notes

  • Always use validation data, never test data. Test data is reserved for assess.
  • Call evaluate as many times as you want: try different models, compare results, iterate (see the sketch after this list).
  • The grammar enforces the boundary: evaluate returns Metrics, assess returns Evidence. Different types.
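The iterate loop from the notes, sketched (candidates holds hypothetical fitted models; s.valid as in the examples above):

candidates = {"logreg": logreg, "forest": forest}  # hypothetical fitted models
results = {name: ml.evaluate(m, s.valid) for name, m in candidates.items()}
best = max(results, key=lambda name: results[name]["roc_auc"])
print(best, results[best]["roc_auc"])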