drift

Detects distribution shift between reference data and new data. Two methods are available: per-feature statistical tests (Kolmogorov-Smirnov or chi-squared), or an adversarial check that trains a classifier to distinguish reference rows from new rows.

Signature

Python:
ml.drift(*, reference, new, method="statistical", threshold=0.05, seed=None)

R:
ml_drift(reference, new, method = "statistical", threshold = 0.05, seed = NULL)

Parameters

Parameter   Type         Default         Description
reference   DataFrame    (required)      Original training data
new         DataFrame    (required)      New incoming data
method      str          "statistical"   "statistical" (KS/chi2) or "adversarial" (classifier-based)
threshold   float        0.05            p-value threshold for statistical method
seed        int | None   None            Random seed (adversarial method)

Returns

DriftResult with:

  • .shifted: True if drift detected
  • .severity: "none", "low", "medium", or "high"
  • .features_shifted: list of drifted feature names
  • .features: dict of per-feature p-values
  • .auc: adversarial AUC (adversarial method only)
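
The relationship between the per-feature p-values, the threshold, and the two "shifted" fields can be sketched as follows. This is an illustration under stated assumptions, not the library's code: DriftSummary and summarize are hypothetical names, and the rule "a feature is drifted when its p-value is below the threshold, and the dataset is shifted when any feature is" is an assumption. How .severity is derived is not documented here, so the sketch leaves it out.

```python
from dataclasses import dataclass, field


@dataclass
class DriftSummary:
    """Hypothetical stand-in for DriftResult; not the library's class."""
    features: dict                                   # feature name -> p-value
    features_shifted: list = field(default_factory=list)
    shifted: bool = False


def summarize(p_values, threshold=0.05):
    # Assumption: a feature counts as drifted when its p-value is below the threshold.
    drifted = [name for name, p in p_values.items() if p < threshold]
    # Assumption: the dataset is flagged as shifted when at least one feature drifted.
    return DriftSummary(features=dict(p_values),
                        features_shifted=drifted,
                        shifted=bool(drifted))


summary = summarize({"age": 0.001, "fare": 0.03, "sex": 0.62})
print(summary.shifted)           # True
print(summary.features_shifted)  # ['age', 'fare']
```

With many features, a per-feature threshold of 0.05 will flag some features by chance alone, so a lower threshold may be worth considering for wide datasets.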

Examples

Statistical drift detection

# Python
result = ml.drift(reference=s.train, new=new_data)
print(result.shifted)           # True/False
print(result.severity)          # "none", "low", "medium", "high"
print(result.features_shifted)  # e.g. ["age", "fare"]

# R
result <- ml_drift(s$train, new_data)
result$shifted
result$severity
result$features_shifted
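
To make the statistical method less of a black box, here is a minimal standard-library sketch of a per-feature two-sample Kolmogorov-Smirnov test for numeric columns. It is not the library's implementation: the function names are hypothetical, columns are plain dicts of lists rather than DataFrames, the p-value uses the asymptotic Kolmogorov approximation, and categorical features (which would use chi-squared) are not covered.

```python
import math
import random


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties)
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_p_value(d, n, m):
    """Asymptotic two-sided p-value; an approximation, rough for small samples."""
    if d == 0:
        return 1.0
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the true value is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return max(0.0, min(1.0, 2 * total))


def statistical_drift(reference, new, threshold=0.05):
    """reference/new: dict of column name -> list of numeric values."""
    p_values = {}
    for name in reference:
        d = ks_statistic(reference[name], new[name])
        p_values[name] = ks_p_value(d, len(reference[name]), len(new[name]))
    # Assumption: a feature is drifted when its p-value falls below the threshold
    return p_values, [name for name, p in p_values.items() if p < threshold]


rng = random.Random(0)
reference = {"age": [rng.gauss(30, 5) for _ in range(500)],
             "fare": [rng.gauss(10, 2) for _ in range(500)]}
new = {"age": [rng.gauss(40, 5) for _ in range(500)],  # mean moved by two standard deviations
       "fare": reference["fare"][:]}                    # identical copy, so no drift
p_values, drifted = statistical_drift(reference, new)
print(drifted)  # ['age']
```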

Adversarial drift detection

# Python
# If a classifier can distinguish old from new, the data has drifted
result = ml.drift(reference=s.train, new=new_data, method="adversarial", seed=42)
print(result.auc)  # near 0.5: old and new are indistinguishable; closer to 1.0: stronger drift

# R
result <- ml_drift(s$train, new_data, method = "adversarial", seed = 42)
result$auc
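
The adversarial idea can be sketched with the standard library alone: label reference rows 0 and new rows 1, fit a classifier on a shuffled split, and score it by AUC on the held-out rows. Everything here is an assumption for illustration. The documentation does not say which classifier or validation scheme the library uses, so this sketch swaps in a small logistic regression and a 70/30 split, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable for large |z|
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) - target
            grad_b += err
            for k, xk in enumerate(row):
                grad_w[k] += err * xk
        b -= lr * grad_b / len(X)
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
    return w, b


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def adversarial_auc(reference, new, seed=None):
    """reference/new: lists of numeric feature rows. Returns the held-out AUC."""
    rng = random.Random(seed)
    rows = [(r, 0) for r in reference] + [(r, 1) for r in new]
    rng.shuffle(rows)  # the seed only controls this shuffle and therefore the split
    split = int(0.7 * len(rows))
    train, test = rows[:split], rows[split:]
    w, b = fit_logistic([r for r, _ in train], [y for _, y in train])
    scores = [b + sum(wi * xi for wi, xi in zip(w, r)) for r, _ in test]
    return auc(scores, [y for _, y in test])


rng = random.Random(1)
old = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
same = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]   # same distribution
moved = [[rng.gauss(3, 1), rng.gauss(0, 1)] for _ in range(500)]  # first feature shifted
print(adversarial_auc(old, same, seed=42))   # close to 0.5
print(adversarial_auc(old, moved, seed=42))  # close to 1.0
```

Scoring on held-out rows matters: an AUC measured on the training rows can sit above 0.5 even when nothing has drifted, simply because the classifier has memorized noise.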