evaluate
Score a model on validation data. Call it freely — it returns Metrics, not Evidence. The distinction matters: evaluate is the iterate zone. assess is the commit zone.
Signature
Python:
```python
ml.evaluate(model, data, *, metrics=None, intervals=False, se=False)
```

R:
```r
ml_evaluate(model, data)
```
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | Model | — | A fitted model. |
| data | DataFrame | — | Validation data; must include the target column. |
| metrics | dict \| None | None | Custom metrics (see the sketch after this table). Default: accuracy, f1, precision, recall, roc_auc (classification) or rmse, mae, r2 (regression). |
| intervals | bool | False | Bootstrap 95% confidence intervals on each metric. |
| se | bool | False | Report standard errors. |
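The shape of a custom metrics dict isn't spelled out above. A minimal sketch, assuming each entry maps a metric name to a callable taking (y_true, y_pred) and returning a float — a common convention, not something this page confirms:

```python
from sklearn.metrics import balanced_accuracy_score, matthews_corrcoef

# Assumption: each value is a callable(y_true, y_pred) returning a float;
# this page does not document the exact contract for custom metrics.
custom = {
    "balanced_accuracy": balanced_accuracy_score,
    "mcc": matthews_corrcoef,
}
metrics = ml.evaluate(model, s.valid, metrics=custom)
```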
Returns
Metrics — a dict-like object with metric names as keys.
```
—— Metrics [classification] ————————
accuracy:  0.8244
f1:        0.7579
precision: 0.8000
recall:    0.7200
roc_auc:   0.8647
```
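Because the return value is dict-like, standard mapping access should work. A minimal sketch, assuming Metrics supports iteration like a plain dict and holds float values — the page above only shows key access:

```python
metrics = ml.evaluate(model, s.valid)

# Assumption: Metrics supports dict-style iteration; only key access
# (metrics["roc_auc"]) is documented on this page.
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```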
Examples

Basic evaluation

Python:
```python
metrics = ml.evaluate(model, s.valid)
print(metrics["roc_auc"])
```

R:
```r
metrics <- ml_evaluate(model, s$valid)
metrics$roc_auc
```

With confidence intervals

Python:
```python
metrics = ml.evaluate(model, s.valid, intervals=True)
# accuracy: 0.8244 [0.7786, 0.8664]
```

Confidence intervals are available in Python only.
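se=True is the lighter-weight alternative when you want uncertainty estimates without the bootstrap cost of intervals=True. A minimal sketch — how the standard errors are surfaced in the result is an assumption, since the page only documents the flag:

```python
metrics = ml.evaluate(model, s.valid, se=True)

# Assumption: point estimates remain accessible by key, with standard
# errors attached to each metric; the exact format isn't documented here.
print(metrics["accuracy"])
```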
Notes

- Always use validation data, never test data. Test data is reserved for assess.
- Call evaluate as many times as you want: try different models, compare results, iterate.
- The grammar enforces the boundary: evaluate returns Metrics, assess returns Evidence. Different types (see the sketch below).
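A minimal sketch of the workflow these notes describe: iterate freely with evaluate on validation data, then make one committed call to assess on test data. Here candidates and s.test are illustrative names, and assess's signature isn't shown on this page — a mirror of evaluate's is assumed:

```python
# Iterate zone: compare candidates on validation data as often as needed.
# "candidates" is a hypothetical list of fitted models.
best = max(candidates, key=lambda m: ml.evaluate(m, s.valid)["roc_auc"])

# Commit zone: one pass over the held-out test split.
# Assumption: ml.assess mirrors ml.evaluate's (model, data) signature.
evidence = ml.assess(best, s.test)  # returns Evidence, not Metrics
```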