split

Splits data into train, valid, and test partitions, stratified on the target by default. Returns a split result with a .dev accessor (train + valid combined) for the final refit.

Signature

ml.split(data, target=None, *, ratio=(0.6, 0.2, 0.2), seed=42, stratify=True, groups=None)

ml_split(data, target = NULL, ratio = c(0.6, 0.2, 0.2), seed = NULL, stratify = TRUE, groups = NULL)

Parameters

Parameter	Type	Default	Description
`data`	DataFrame	—	Input data
`target`	str \| None	`None`	Name of the target column
`ratio`	tuple	`(0.6, 0.2, 0.2)`	Train/valid/test proportions. Must sum to 1.
`seed`	int	`42`	Random seed for reproducibility.
`stratify`	bool	`True`	Stratify on target class distribution (classification only).
`groups`	str \| None	`None`	Column name for group-aware splitting. All rows with the same group value stay in the same partition.
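
To make the ratio and stratify semantics concrete, here is a minimal pure-Python sketch of a stratified three-way split. It is an illustration only, not the implementation behind ml.split: the function name stratified_three_way is hypothetical, and it works on a plain list of labels rather than a DataFrame.

```python
import random
from collections import defaultdict


def stratified_three_way(labels, ratio=(0.6, 0.2, 0.2), seed=42):
    """Hypothetical helper: return (train, valid, test) lists of row indices.

    Each class is shuffled and cut separately, so every partition keeps
    roughly the same class proportions as the full data.
    """
    if abs(sum(ratio) - 1.0) > 1e-9:
        raise ValueError("ratio must sum to 1")

    rng = random.Random(seed)  # local RNG so the result depends only on seed

    # Collect the row indices belonging to each class.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train, valid, test = [], [], []
    for y in sorted(by_class):  # sorted for a deterministic class order
        idx = by_class[y]
        rng.shuffle(idx)
        n_train = round(len(idx) * ratio[0])
        n_valid = round(len(idx) * ratio[1])
        train += idx[:n_train]
        valid += idx[n_train:n_train + n_valid]
        test += idx[n_train + n_valid:]  # remainder goes to test
    return train, valid, test


# 20% positive class, 80% negative class
labels = ["yes"] * 20 + ["no"] * 80
train, valid, test = stratified_three_way(labels)
print(len(train), len(valid), len(test))
```

In this sketch the 20/80 class balance is preserved inside each partition, which is the property stratification is meant to provide.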

Returns

A SplitResult with four accessors:

Accessor	Description
`.train`	Training partition (60% by default)
`.valid`	Validation partition (20% by default)
`.test`	Test partition (20% by default), held out and used only by `assess`
`.dev`	Train + valid combined; use for the final refit before assessment
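
The relationship between the four accessors can be sketched with a small stand-in class. This is an assumption-laden illustration, not the library's SplitResult: the class name SplitSketch is hypothetical, and partitions are plain lists instead of DataFrames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitSketch:
    """Hypothetical stand-in for a split result; not the library's class."""
    train: list
    valid: list
    test: list

    @property
    def dev(self):
        # Train and valid combined, for the final refit before assessment.
        return self.train + self.valid


rows = list(range(10))
s = SplitSketch(train=rows[:6], valid=rows[6:8], test=rows[8:])
print(len(s.dev), len(s.test))
```

Because dev is derived from train and valid, it never overlaps with test, so a model refit on dev can still be assessed on held-out rows.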

Examples

Basic split

s = ml.split(data, "churn", seed=42)
print(len(s.train), len(s.valid), len(s.test))

s <- ml_split(data, "churn", seed = 42)
c(nrow(s$train), nrow(s$valid), nrow(s$test))

Custom ratio

s = ml.split(data, "target", ratio=(0.8, 0.1, 0.1), seed=42)

s <- ml_split(data, "target", ratio = c(0.8, 0.1, 0.1), seed = 42)

Grouped split

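When rows belong to groups (e.g., multiple measurements per patient), set groups to the name of the grouping column to keep all rows from the same group in the same partition. This prevents leakage across group boundaries, such as the same patient appearing in both the training and test partitions.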

s = ml.split(data, "outcome", groups="patient_id", seed=42)

s <- ml_split(data, "outcome", groups = "patient_id", seed = 42)
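
The guarantee described above can be illustrated with a minimal pure-Python sketch that assigns whole groups to partitions and then checks that no group appears in more than one. The function grouped_three_way is hypothetical and is not how ml.split is implemented; in particular, this sketch applies the ratio to the number of groups, so row-level proportions are only approximate when group sizes differ.

```python
import random


def grouped_three_way(groups, ratio=(0.6, 0.2, 0.2), seed=42):
    """Hypothetical helper: assign whole groups to train/valid/test.

    Returns a dict mapping partition name to a list of row indices.
    """
    rng = random.Random(seed)
    unique = sorted(set(groups))  # sorted first so the shuffle is reproducible
    rng.shuffle(unique)

    n_train = round(len(unique) * ratio[0])
    n_valid = round(len(unique) * ratio[1])

    # Decide the partition once per group, never per row.
    part_of = {}
    for pos, g in enumerate(unique):
        if pos < n_train:
            part_of[g] = "train"
        elif pos < n_train + n_valid:
            part_of[g] = "valid"
        else:
            part_of[g] = "test"

    parts = {"train": [], "valid": [], "test": []}
    for i, g in enumerate(groups):
        parts[part_of[g]].append(i)
    return parts


# 10 patients with 3 measurements each
groups = [f"p{k}" for k in range(10) for _ in range(3)]
parts = grouped_three_way(groups)

# Leakage check: the sets of patients in each partition must not overlap.
seen = {name: {groups[i] for i in idx} for name, idx in parts.items()}
assert seen["train"].isdisjoint(seen["valid"])
assert seen["train"].isdisjoint(seen["test"])
assert seen["valid"].isdisjoint(seen["test"])
print({name: len(idx) for name, idx in parts.items()})
```

The same disjointness check is a reasonable sanity test to run on any grouped split before fitting a model.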
