drift
Detect distribution shift between reference data and new data. Two methods: statistical tests (KS/chi-squared per feature) or adversarial (train a classifier to distinguish old from new).
Signature
ml.drift(*, reference, new, method="statistical", threshold=0.05, seed=None)
ml_drift(reference, new, method = "statistical", threshold = 0.05, seed = NULL)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| reference | DataFrame | required | Original training data |
| new | DataFrame | required | New incoming data |
| method | str | "statistical" | "statistical" (KS/chi2) or "adversarial" (classifier-based) |
| threshold | float | 0.05 | p-value threshold for statistical method |
| seed | int or None | None | Random seed (adversarial method) |
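
To make the role of threshold concrete, here is a minimal sketch of a per-feature statistical check. It is not this library's implementation. It assumes a feature is flagged when its p-value falls below threshold, covers only numeric features with a two-sample KS test (the chi-squared branch for categorical features is omitted), and uses plain dicts of lists in place of DataFrames. The names ks_statistic, ks_pvalue, and statistical_drift are hypothetical, and the p-value uses a common asymptotic approximation rather than an exact computation.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every observation equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the series below is unreliable for very small lam
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


def statistical_drift(reference, new, threshold=0.05):
    """reference/new: dicts mapping feature name -> list of numeric values."""
    p_values = {}
    for name in reference:
        d = ks_statistic(reference[name], new[name])
        p_values[name] = ks_pvalue(d, len(reference[name]), len(new[name]))
    # Assumption: a feature counts as shifted when p < threshold.
    shifted = [name for name, p in p_values.items() if p < threshold]
    return p_values, shifted


# Toy data: "age" moves up by 20 years, "fare" barely changes.
reference = {"age": list(range(20, 60)), "fare": [10.0 + i for i in range(40)]}
new = {"age": list(range(40, 80)), "fare": [10.5 + i for i in range(40)]}

p_values, shifted = statistical_drift(reference, new, threshold=0.05)
print(shifted)
```

Lowering threshold makes the check stricter about what counts as drift. With many features, some will fall below 0.05 by chance alone, so a multiple-testing correction may be worth considering.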
Returns
DriftResult with:
- .shifted: True if drift detected
- .severity: "none", "low", "medium", or "high"
- .features_shifted: list of drifted feature names
- .features: dict of per-feature p-values
- .auc: adversarial AUC (adversarial method only)
Examples
Statistical drift detection
result = ml.drift(reference=s.train, new=new_data)
print(result.shifted) # True/False
print(result.severity) # "none", "low", "medium", "high"
print(result.features_shifted) # ["age", "fare"]

result <- ml_drift(s$train, new_data)
result$shifted
result$severity
result$features_shifted

Adversarial drift detection
# If a classifier can distinguish old from new, the data has drifted
result = ml.drift(reference=s.train, new=new_data, method="adversarial", seed=42)
print(result.auc) # ~0.5: old and new are indistinguishable; closer to 1.0: stronger drift

result <- ml_drift(s$train, new_data, method = "adversarial", seed = 42)
result$auc
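
The idea behind the adversarial method can be sketched in a few lines. This is an illustration under stated assumptions, not this library's implementation. Rows are labelled 0 (reference) or 1 (new), a classifier is trained on half of the shuffled rows, and the AUC is measured on the held-out half. The classifier is a hand-rolled logistic regression in plain Python so the block has no dependencies. The names auc_score, fit_logistic, and adversarial_auc are hypothetical.

```python
import math
import random


def auc_score(labels, scores):
    """AUC: chance that a random 'new' row outscores a random 'reference' row."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(rows, labels, epochs=200, lr=0.1):
    """Logistic regression trained with plain batch gradient descent."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for k, xk in enumerate(x):
                grad_w[k] += err * xk
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def adversarial_auc(reference, new, seed=None):
    """Label rows by origin, train on one half, report AUC on the other half."""
    rng = random.Random(seed)
    data = [(row, 0) for row in reference] + [(row, 1) for row in new]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]
    w, b = fit_logistic([r for r, _ in train], [y for _, y in train])
    scores = [b + sum(wi * xi for wi, xi in zip(w, r)) for r, _ in test]
    return auc_score([y for _, y in test], scores)


# Toy data: two numeric features; in "drifted" the first feature's mean moves by 2.
rng = random.Random(0)
reference = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
same = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
drifted = [[rng.gauss(2, 1), rng.gauss(0, 1)] for _ in range(300)]

auc_same = adversarial_auc(reference, same, seed=42)
auc_drifted = adversarial_auc(reference, drifted, seed=42)
print(round(auc_same, 2), round(auc_drifted, 2))
```

Scoring on held-out rows matters here: an AUC measured on the training rows can rise above 0.5 through overfitting even when the two datasets come from the same distribution. The seed fixes the shuffle so the split, and therefore the AUC, is reproducible.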