Algorithms

Eleven Rust-native algorithm families ship with zero external dependencies. Five more are available via optional packages.

Rust-native (no dependencies)

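| Algorithm | Classification | Regression | Key parameters |
| --- | --- | --- | --- |
| random_forest | Yes | Yes | n_estimators |
| extra_trees | Yes | Yes | n_estimators |
| decision_tree | Yes | Yes | max_depth |
| gradient_boosting | Yes | Yes | n_estimators, learning_rate |
| histgradient | Yes | Yes | n_estimators, max_bins |
| adaboost | Yes | | n_estimators |
| logistic | Yes | | C |
| linear | | Yes | |
| elastic_net | | Yes | alpha, l1_ratio |
| naive_bayes | Yes | | |
| knn | Yes | Yes | n_neighbors |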

External (optional packages)

| Algorithm | Classification | Regression | Install |
| --- | --- | --- | --- |
| xgboost | Yes | Yes | pip install "mlw[xgboost]" |
| lightgbm | Yes | Yes | pip install "mlw[lightgbm]" |
| catboost | Yes | Yes | pip install "mlw[catboost]" |
| svm | Yes | Yes | Included (linear) / sklearn (nonlinear) |

Auto selection

When algorithm="auto", the package selects an algorithm based on the task and data characteristics. The default is random_forest, a reliable baseline that performs well on most problems without tuning.
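The resolution step described above can be sketched in plain Python. This is an illustrative sketch only, not the package's implementation: the names `SUPPORTED_TASKS`, `DEFAULT_ALGORITHM`, and `resolve_algorithm` are hypothetical, the lookup covers only a subset of the Rust-native table, and the data-dependent part of the selection is not documented here, so `"auto"` simply resolves to the documented default.

```python
# Hypothetical sketch: none of these names belong to the package's API.
# Task support for a subset of the Rust-native algorithms listed above.
SUPPORTED_TASKS = {
    "random_forest": {"classification", "regression"},
    "decision_tree": {"classification", "regression"},
    "knn": {"classification", "regression"},
    "logistic": {"classification"},
    "linear": {"regression"},
    "elastic_net": {"regression"},
}

# The documented default when algorithm="auto".
DEFAULT_ALGORITHM = "random_forest"


def resolve_algorithm(algorithm: str, task: str) -> str:
    """Return the concrete algorithm name to use for the given task."""
    if algorithm == "auto":
        # Assumption: the real selection also looks at data characteristics;
        # this sketch only returns the documented default.
        return DEFAULT_ALGORITHM
    tasks = SUPPORTED_TASKS.get(algorithm)
    if tasks is None:
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    if task not in tasks:
        raise ValueError(f"{algorithm!r} does not support {task}")
    return algorithm
```

Calling `resolve_algorithm("auto", "regression")` yields `"random_forest"`, while `resolve_algorithm("logistic", "regression")` raises a `ValueError` because logistic is listed for classification only.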

Usage

# Python

# List all available algorithms
ml.algorithms()

# Classification only
ml.algorithms(task="classification")

# Use a specific one
model = ml.fit(s.train, "target", algorithm="xgboost", seed=42)

# R

# Use a specific one
model <- ml_fit(s$train, "target", algorithm = "xgboost", seed = 42)

Engine selection

Each algorithm can run on multiple backends:

| Engine | Description |
| --- | --- |
| "auto" | Uses Rust backend when available, falls back to sklearn/CRAN |
| "ml" | Rust backend (via PyO3). Zero external dependencies. |
| "sklearn" | scikit-learn backend (Python only) |
| "r" | CRAN packages (R only) |

# Python

# Force Rust backend
model = ml.fit(s.train, "target", engine="ml", seed=42)

# Force sklearn
model = ml.fit(s.train, "target", engine="sklearn", seed=42)

# R

# Force Rust backend
model <- ml_fit(s$train, "target", engine = "ml", seed = 42)

# Force CRAN packages
model <- ml_fit(s$train, "target", engine = "r", seed = 42)
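
The fallback behavior of engine="auto" can be pictured with a small, self-contained sketch. Everything here is an assumption made for illustration: `resolve_engine`, its `rust_available` flag, and its `language` argument are hypothetical names, and the choice to raise an error when a forced engine is unavailable is a guess rather than documented behavior.

```python
# Hypothetical sketch of engine resolution; not the package's actual code.
# The fallback engine for each host language, per the engine table above.
FALLBACK_ENGINE = {"python": "sklearn", "r": "r"}


def resolve_engine(engine: str = "auto", *, rust_available: bool,
                   language: str = "python") -> str:
    """Return the concrete engine name for a requested engine."""
    if language not in FALLBACK_ENGINE:
        raise ValueError(f"unknown language: {language!r}")
    fallback = FALLBACK_ENGINE[language]
    if engine == "auto":
        # Prefer the Rust backend, otherwise fall back to sklearn or CRAN.
        return "ml" if rust_available else fallback
    if engine == "ml":
        if not rust_available:
            # Assumption: forcing an unavailable backend is an error.
            raise RuntimeError("Rust backend is not available")
        return "ml"
    if engine in FALLBACK_ENGINE.values():
        if engine != fallback:
            # "sklearn" is Python only and "r" is R only.
            raise ValueError(f"engine {engine!r} is not usable from {language}")
        return engine
    raise ValueError(f"unknown engine: {engine!r}")
```

Under these assumptions, `resolve_engine("auto", rust_available=False)` resolves to `"sklearn"` from Python, and the same call with `language="r"` resolves to `"r"`.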