beam
Benchmark Evaluation and Metrics
beam is a metric formalization layer for bioinformatics benchmarks. It ships metric cards (YAML, JSON Schema) with metadata grounded in measurement theory, a multi-criteria decision pipeline with sensitivity primitives, and diagnostics for method-dataset heterogeneity.
The metric cards are mapped to STATO, UO, OBI and HuggingFace evaluate where external terms exist; the OWL release artefact at docs/beam.owl.ttl is regenerated from the cards on each release.
Where to start
- Quick start. The five-line path from a CSV to an HTML report.
- How to run from beam.yaml. The declarative runner that produces a reproducible artefact.
- How to use beam from R. The R interface (reticulate-backed; same metric cards and pipeline).
- Cards and pipeline. What a metric card carries and how the pipeline consumes it.
- Comparing methods across datasets. Friedman-Nemenyi and Skillings-Mack.
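For orientation, the Friedman-Nemenyi comparison named in the last item can be sketched with SciPy alone. This is a minimal illustration and not beam's API: the score matrix is made up, and the critical value 2.343 is the commonly tabulated Nemenyi constant for three methods at alpha = 0.05, hard-coded here as an assumption. Skillings-Mack, which handles missing method-dataset cells, is not covered by this sketch.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Hypothetical scores: rows are datasets, columns are methods (higher is better).
scores = np.array([
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.95, 0.70, 0.75],
    [0.88, 0.82, 0.65],
    [0.91, 0.79, 0.70],
])
n_datasets, n_methods = scores.shape

# Friedman test: one sample per method, with datasets acting as blocks.
stat, p_value = friedmanchisquare(*scores.T)

# Mean rank per method (rank 1 = best within each dataset).
ranks = rankdata(-scores, axis=1)
mean_ranks = ranks.mean(axis=0)

# Nemenyi critical difference: two methods differ when their mean ranks
# are further apart than cd. q_alpha is an assumed tabulated constant
# for k = 3 methods at alpha = 0.05.
q_alpha = 2.343
cd = q_alpha * np.sqrt(n_methods * (n_methods + 1) / (6 * n_datasets))

print(f"Friedman chi2 = {stat:.2f}, p = {p_value:.4f}")
print("mean ranks:", mean_ranks)
print(f"critical difference = {cd:.3f}")
```

With this toy matrix the first and third methods are separated by more than the critical difference, while neighbouring methods are not.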
Worked vignettes
Each vignette doubles as a CI integration test (rendered on every push):
- Duo 2018 clustering: the canonical walkthrough, 14 methods on 12 single-cell RNA-seq datasets.
- Simulated scenarios: consistency checks against documented ground truth.
- Transportation modes: cross-domain example with partial coverage.
- M4 forecasting: 25 methods on six frequency bands.
- OpenProblems batch integration: MCDA contrasted with the platform’s own mean-of-scores rule, plus a Bradley-Terry tree on 50 spatial datasets.
- Cross-benchmark meta-analysis: four single-cell integration benchmarks under one consistent rule.
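To see why the choice of aggregation rule matters in vignettes such as the OpenProblems one, the toy example below contrasts a mean-of-scores rule with a simple rank-based rule. It is only a sketch under stated assumptions: the method names and scores are invented, and the mean-rank rule stands in for a generic alternative; it is not the MCDA rule beam implements.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical methods and scores: rows are methods, columns are metrics
# (all metrics on a 0-1 scale, higher is better).
methods = ["A", "B", "C"]
scores = np.array([
    [0.90, 0.50, 0.52],
    [0.60, 0.55, 0.55],
    [0.50, 0.60, 0.50],
])

# Rule 1: mean of scores. One large margin on a single metric can dominate.
mean_score = scores.mean(axis=1)

# Rule 2: mean rank across metrics (rank 1 = best per metric).
# Only the ordering within each metric matters, not the margin.
mean_rank = rankdata(-scores, axis=0).mean(axis=1)

winner_by_mean = methods[int(np.argmax(mean_score))]
winner_by_rank = methods[int(np.argmin(mean_rank))]

print("winner by mean of scores:", winner_by_mean)
print("winner by mean rank:", winner_by_rank)
```

Here method A wins under the mean of scores because of its lead on the first metric, whereas method B wins under mean rank because it is never worst on any metric.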
Source
github.com/imallona/beam. GPL-3 code, CC-BY-4.0 metric cards.