Citations

Every empirical claim on this site links here.

  1. Kapoor, S. & Narayanan, A. (2025). Leakage and the reproducibility crisis in ML-based science (living survey). reproducible.cs.princeton.edu
     648 papers as of 2025. Continuously updated.
  2. Kapoor, S. & Narayanan, A. (2023). Leakage and the reproducibility crisis in machine-learning-based science. Patterns, 4(9), 100804. doi:10.1016/j.patter.2023.100804
     294 papers across 17 fields. Peer-reviewed version.
  3. Kapoor, S. & Narayanan, A. (2022). Leakage and the reproducibility crisis in ML-based science. arXiv:2207.07048. arxiv.org/abs/2207.07048
     329 papers in the original preprint.
  4. Roth, S. (2026). A grammar of machine learning workflows. EPAGOGY. doi:10.5281/zenodo.19023838
     8 primitives, 4 constraints, 2,047 experimental instances. Python & R.
  5. Roth, S. (2022). Biased machines in the realm of politics. GSDS, University of Konstanz. KOPS Archive
     Causal decomposition of ML prediction errors.
  6. Wickham, H. (2010). A layered grammar of graphics. Journal of Computational and Graphical Statistics, 19(1), 3–28. doi:10.1198/jcgs.2009.07098
     The structural precedent for formal workflow composition.