split
Stratified three-way split into train, valid, and test. Returns a split result with a .dev accessor (train + valid combined) for the final refit.
Signature
ml.split(data, target=None, *, ratio=(0.6, 0.2, 0.2), seed=42, stratify=True, groups=None)
ml_split(data, target = NULL, ratio = c(0.6, 0.2, 0.2), seed = NULL, stratify = TRUE, groups = NULL)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | DataFrame | — | Input data |
| target | str | — | Name of the target column |
| ratio | tuple | (0.6, 0.2, 0.2) | Train/valid/test proportions. Must sum to 1. |
| seed | int | 42 | Random seed for reproducibility. |
| stratify | bool | True | Stratify on target class distribution (classification only). |
| groups | str \| None | None | Column name for group-aware splitting. All rows with the same group value stay in the same partition. |
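
To make the roles of ratio, seed, and stratify concrete, here is a minimal pure-Python sketch of a stratified three-way split. It is not the library's implementation: the function name `stratified_three_way` and the per-class rounding rule are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_three_way(labels, ratio=(0.6, 0.2, 0.2), seed=42):
    """Return (train, valid, test) lists of row indices, stratified by label.

    Hypothetical helper for illustration only.
    """
    if abs(sum(ratio) - 1.0) > 1e-9:
        raise ValueError("ratio must sum to 1")
    rng = random.Random(seed)  # local RNG: the same seed gives the same split

    # Collect the row indices belonging to each class.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train, valid, test = [], [], []
    for y in sorted(by_class):  # fixed class order keeps the result deterministic
        idx = by_class[y]
        rng.shuffle(idx)
        n_train = round(len(idx) * ratio[0])
        n_valid = round(len(idx) * ratio[1])
        # Cut each class in the same proportions so class balance is preserved.
        train += idx[:n_train]
        valid += idx[n_train:n_train + n_valid]
        test += idx[n_train + n_valid:]
    return train, valid, test


labels = ["yes"] * 20 + ["no"] * 80
train, valid, test = stratified_three_way(labels)
print(len(train), len(valid), len(test))
```

Because each class is cut separately, the 20/80 class balance of `labels` carries over to every partition, which is the point of stratifying.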
Returns
A SplitResult with four accessors:
| Accessor | Description |
|---|---|
| .train | Training partition (60% by default) |
| .valid | Validation partition (20% by default) |
| .test | Test partition (20% by default), held out and used only by assess |
| .dev | Train + valid combined; use for the final refit before assessment |
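
The relationship between the four accessors can be sketched with a small container. The class name `SplitSketch` is hypothetical and stands in for the real SplitResult, whose internals are not described here; rows are plain lists rather than DataFrames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitSketch:
    """Hypothetical stand-in for a split result (illustration only)."""
    train: list
    valid: list
    test: list

    @property
    def dev(self):
        # Train + valid combined for the final refit; test is never included.
        return self.train + self.valid


rows = list(range(10))
s = SplitSketch(train=rows[:6], valid=rows[6:8], test=rows[8:])
print(len(s.dev), len(s.test))
```

Deriving dev from train and valid, rather than storing it separately, guarantees that it can never overlap the held-out test partition.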
Examples
Basic split
Python:
s = ml.split(data, "churn", seed=42)
print(len(s.train), len(s.valid), len(s.test))
R:
s <- ml_split(data, "churn", seed = 42)
c(nrow(s$train), nrow(s$valid), nrow(s$test))
Custom ratio
Python:
s = ml.split(data, "target", ratio=(0.8, 0.1, 0.1), seed=42)
R:
s <- ml_split(data, "target", ratio = c(0.8, 0.1, 0.1), seed = 42)
Grouped split
When rows belong to groups (e.g., multiple measurements per patient), set groups to keep all rows from the same group in the same partition. This prevents leakage across group boundaries.
Python:
s = ml.split(data, "outcome", groups="patient_id", seed=42)
R:
s <- ml_split(data, "outcome", groups = "patient_id", seed = 42)
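
The group constraint can be illustrated with a short pure-Python sketch that assigns whole groups, rather than individual rows, to partitions. This is an assumption about one reasonable way to do it, not the library's actual algorithm; `group_split` and `patient_ids` are hypothetical names.

```python
import random


def group_split(groups, ratio=(0.6, 0.2, 0.2), seed=42):
    """Assign whole groups to train/valid/test; return three row-index lists.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    unique = sorted(set(groups))  # sort first so the shuffle is reproducible
    rng.shuffle(unique)
    n_train = round(len(unique) * ratio[0])
    n_valid = round(len(unique) * ratio[1])

    # Map each group to exactly one partition: 0 = train, 1 = valid, 2 = test.
    part_of = {}
    for k, g in enumerate(unique):
        part_of[g] = 0 if k < n_train else 1 if k < n_train + n_valid else 2

    parts = ([], [], [])
    for i, g in enumerate(groups):
        parts[part_of[g]].append(i)  # every row follows its group
    return parts


patient_ids = [f"p{n // 3}" for n in range(30)]  # 10 patients, 3 rows each
train, valid, test = group_split(patient_ids)

# No patient appears in more than one partition.
train_patients = {patient_ids[i] for i in train}
test_patients = {patient_ids[i] for i in test}
print(train_patients.isdisjoint(test_patients))
```

In this sketch the ratio applies to the number of groups, so when group sizes vary, the row-level proportions only approximate the requested ratio.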