flowchart LR
A[metric cards<br/>YAML] -->|polarity| B[properties_for]
A -->|scale_type<br/>allowed_transformations| V[validate_for_aggregation]
A -->|range_lower<br/>range_upper| BO[bounded normalization]
A -->|recommended_normalization<br/>score_of_random_baseline| NS[per-metric strategy]
A -->|recommended_aggregation<br/>_across_datasets| CD[aggregate_across_datasets]
A -.->|meaningful_zero<br/>uncertainty_model| X((not yet<br/>enforced))
B --> RR[run_from_registry]
V --> RR
BO --> RR
NS --> RR
S[score matrix<br/>tool x metric] --> RR
W{weights<br/>name or array} --> RR
M{method<br/>saw or topsis} --> RR
RR --> N[normalize<br/>per-metric strategy]
N -->|0 to 1 matrix<br/>higher = better| AGG[saw or topsis]
WT[weight vector] --> AGG
AGG --> R[rank]
R --> RES[Result]
T[tool x dataset matrix<br/>per metric] --> CD
CD -->|per-tool vector| S
What the MCDA pipeline reads from the metric cards
A metric card declares fields covering identity, kind, inputs, output, semantics, comparability, implementations, examples, and provenance. The pipeline binds to a growing subset of these. In the current release the ontology-aware entry point run_from_registry reads per-metric polarity, declared range bounds, declared scale_type, the set of allowed_transformations, the comparability.recommended_normalization strategy, and semantics.score_of_random_baseline. The cross-dataset aggregation primitive reads comparability.recommended_aggregation_across_datasets. The remaining fields are kept as metadata and are not enforced.
For why the normalization strategy depends on the measurement scale of the metric, and where plain min-max scaling fails, see the page on normalization and scales.
Data flow
Solid edges mark fields read by the current pipeline. Dashed edges mark fields declared in the metric cards but not yet read.
Fields read
- polarity: passed to normalize, which inverts columns marked lower_is_better and rescales each column to [0, 1]. The output matrix is oriented so higher values mean better performance for every column.
- range_lower, range_upper: when both bounds are declared on a card, run_from_registry passes them to normalize. The min-max and baseline-relative strategies use the theoretical range rather than the empirical extrema, so two benchmarks that use the same metric on different score subsets produce comparable rescaled values. Observations outside the declared range raise, whatever the strategy.
- comparability.recommended_normalization: run_from_registry reads this per metric and rescales that column with the named strategy, defaulting to min_max. The options are min_max, log_min_max, rank, zscore, and baseline_relative. The choice depends on the measurement scale of the metric; see the page on normalization and scales. Runtime and peak memory use log_min_max; the Adjusted Rand Index uses baseline_relative.
- semantics.score_of_random_baseline: the chance-level value of a metric, read by the baseline_relative strategy so a method no better than chance maps to 0 rather than the column midpoint.
- scale_type: validate_for_aggregation refuses SAW or TOPSIS on columns whose declared scale type is nominal or ordinal. Only interval and ratio columns pass.
- allowed_transformations: validate_for_aggregation checks that the card permits the transform the chosen strategy applies. Min-max and baseline-relative need affine or min_max; log_min_max needs log; rank needs rank; zscore needs z_score or affine. This replaces the earlier blanket check for affine, so a ratio metric normalized by log_min_max is validated against log rather than against an affine grant it does not need.
- comparability.recommended_aggregation_across_datasets: aggregate_across_datasets reads this when reducing a tool by dataset matrix to a tool vector for one metric. Ratio metrics whose values span orders of magnitude (runtime, peak memory) declare geometric_mean per Smith 1988; bounded interval and ratio metrics declare arithmetic_mean.
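The field bindings above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline's code: the helper names check_card, normalize_column, and reduce_datasets are hypothetical, the cards are flattened into plain dicts (the real cards nest some fields under comparability and semantics), higher_is_better is assumed to be the counterpart of lower_is_better, the baseline-relative formula with clipping below chance is assumed, and the card values are illustrative rather than copied from the real YAML. The rank and zscore strategies are left out for brevity.

```python
import math
import statistics

# Transform grants each strategy needs on the card (any one in the set suffices).
REQUIRED_TRANSFORMS = {
    "min_max": {"affine", "min_max"},
    "baseline_relative": {"affine", "min_max"},
    "log_min_max": {"log"},
    "rank": {"rank"},
    "zscore": {"z_score", "affine"},
}


def check_card(card, strategy):
    """Hypothetical per-column version of the scale_type and transform checks."""
    if card["scale_type"] not in ("interval", "ratio"):
        raise ValueError(f"{card['scale_type']} columns cannot enter SAW or TOPSIS")
    if not REQUIRED_TRANSFORMS[strategy] & set(card["allowed_transformations"]):
        raise ValueError(f"card does not permit the transform behind {strategy}")


def normalize_column(values, card):
    """Rescale one column to [0, 1] with higher = better (sketch, not the real API)."""
    strategy = card.get("recommended_normalization", "min_max")
    check_card(card, strategy)
    lo, hi = card.get("range_lower"), card.get("range_upper")
    declared = lo is not None and hi is not None
    if declared and any(v < lo or v > hi for v in values):
        raise ValueError("observation outside the declared range")
    if not declared:  # no theoretical range: fall back to the empirical extrema
        lo, hi = min(values), max(values)
    higher = card["polarity"] != "lower_is_better"

    if strategy == "baseline_relative":
        # Assumed formula: chance maps to 0, the best end of the range to 1,
        # and anything worse than chance is clipped to 0.
        base = card["score_of_random_baseline"]
        best = hi if higher else lo
        return [min(1.0, max(0.0, (v - base) / (best - base))) for v in values]

    if strategy == "log_min_max":  # requires strictly positive values and bounds
        values, lo, hi = [math.log(v) for v in values], math.log(lo), math.log(hi)
    elif strategy != "min_max":
        raise NotImplementedError(strategy)  # rank and zscore omitted in this sketch
    if hi == lo:
        raise ValueError("column has no spread to rescale")
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher else [1.0 - s for s in scaled]


def reduce_datasets(tool_by_dataset, card):
    """Collapse each tool's row of per-dataset scores with the rule the card names."""
    reducers = {"geometric_mean": statistics.geometric_mean,
                "arithmetic_mean": statistics.fmean}
    reduce = reducers[card["recommended_aggregation_across_datasets"]]
    return [reduce(row) for row in tool_by_dataset]


# Illustrative cards; the values are assumptions for the example only.
ari = {"polarity": "higher_is_better", "scale_type": "interval",
       "allowed_transformations": ["affine"], "range_lower": -0.5, "range_upper": 1.0,
       "recommended_normalization": "baseline_relative", "score_of_random_baseline": 0.0}
runtime = {"polarity": "lower_is_better", "scale_type": "ratio",
           "allowed_transformations": ["log"], "recommended_normalization": "log_min_max",
           "recommended_aggregation_across_datasets": "geometric_mean"}

print(normalize_column([-0.1, 0.0, 0.5, 1.0], ari))    # [0.0, 0.0, 0.5, 1.0]
print(normalize_column([1.0, 10.0, 100.0], runtime))   # approximately [1.0, 0.5, 0.0]
print(reduce_datasets([[1.0, 100.0], [10.0, 10.0]], runtime))  # both close to 10.0
```

In this sketch the runtime card passes validation with only a log grant, and the chance-level ARI score lands at 0 instead of at an interior point of the declared range.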
run_from_registry also runs a guard after it picks the strategies. For any column still using min-max, it warns when a declared bound is missing (the scale rests on the data and shifts when the method set changes) and when the column is heavy-tailed (one outlier dominates the rescale). The warnings travel on the Result and do not block the run.
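A minimal sketch of such a guard, for columns that still use min-max. The helper name min_max_warnings is hypothetical, and the heavy-tail test is an assumption made for the example (largest deviation from the median compared with the median absolute deviation, with an arbitrary threshold); the source does not state which criterion the pipeline applies.

```python
import statistics


def min_max_warnings(name, values, card, tail_ratio=10.0):
    """Return warning strings for one min-max column; never raises (hypothetical helper)."""
    warnings = []
    if card.get("range_lower") is None or card.get("range_upper") is None:
        warnings.append(
            f"{name}: no declared range; the scale rests on the data and "
            "shifts when the method set changes"
        )
    # Assumed heavy-tail test: the largest deviation from the median is far
    # larger than the typical (median absolute) deviation.
    med = statistics.median(values)
    devs = [abs(v - med) for v in values]
    mad = statistics.median(devs)
    if mad > 0 and max(devs) / mad > tail_ratio:
        warnings.append(f"{name}: heavy-tailed; one outlier dominates the rescale")
    return warnings


# An unbounded column with one extreme value triggers both warnings.
print(min_max_warnings("peak_rss", [1, 2, 3, 4, 1000], {}))
# A bounded, evenly spread column triggers none.
print(min_max_warnings("accuracy", [1, 2, 3, 4, 5], {"range_lower": 0, "range_upper": 10}))  # []
```

Returning a list rather than raising mirrors the behavior described above: the warnings accompany the result and the run continues.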
Fields declared but not enforced
- meaningful_zero: declared on every card; no current reader.
- uncertainty_model: declared on derived metrics; the pipeline does not propagate uncertainty through aggregation.
- monotonic: declared on every card; no current reader.
- comparability.comparable_within and free-form aggregation_rules notes: declared on every card; read only by humans.
Planned enforcement
- Use the declared uncertainty_model to propagate standard errors through normalization and aggregation, so the composite carries a usable error bar.
- Enforce comparability.comparable_within to refuse cross-task aggregation when no card permits it.
- Turn free-form aggregation_rules notes into machine-readable constraints over time; the recommended_aggregation_across_datasets enum is the first such migration.
As each item lands, the matching edge in the diagram moves from dashed to solid.