The MCDA core in beam is not specific to bioinformatics. This vignette runs it on a made-up transportation example to illustrate three points that the bio scenarios state but do not show as sharply. The modes of transport play the role of the methods, and the terrains play the role of the datasets. Each mode is scored on a terrain by speed (km/h, higher is better), cost (per km, lower is better), and CO2 (g per km, lower is better). The numbers are illustrative, kept in a plausible range. The three metrics are bundled metric cards (speed, cost, co2), so this cross-domain example reads polarity and normalization from the same registry as the bioinformatics examples. Every example here goes through the cards, whether the metrics are biological or not.
The first point is a method-by-dataset interaction. No mode is fastest on every terrain. A train is fastest on the flat road and the urban hop, a motorcycle on mud and uphill, a boat on open water, and a small plane on the long distance. A single global ranking over all terrains hides this structure.
The second point is partial coverage. Some modes do not run on some terrains at all. A boat cannot go off-road, a small plane cannot do an urban hop, and the land modes cannot cross open water. Those cells are missing (NaN). Because no mode is feasible on every terrain, a single pooled ranking over all modes is not even well defined. The output must be per-terrain.
The third point is a crossover among the slower modes. Trail running is slower than road running on the flat road but faster on mud and uphill, where off-road traction helps. An e-bike is faster than a bicycle on the flat road and the urban hop but slower uphill, where its weight costs it. On the water the kayak is slower than the motorboat but cheaper and zero CO2, so it outranks the boat on cost and CO2 while trailing on speed. None of these crossovers appears in a single pooled order: the mode that is faster on one terrain is slower on another.
Set-up
The five aggregation methods used below (SAW, TOPSIS, VIKOR, PROMETHEE II, COMET) are wrapped from pymcdm. beam normalizes the scores, then calls pymcdm on the normalized matrix and keeps the higher-is-better convention. The weighting schemes are beam’s own.
The speed heatmap below shows the full mode-by-terrain table. Infeasible cells, where a mode cannot run on a terrain, are left blank and marked with a cross. A red box marks the fastest mode on each terrain: that is the documented ground truth. Reading down any column, the fastest mode changes from terrain to terrain. Reading across any row, no mode covers every terrain.
speed = tb.metric("speed")n_modes =len(tb.mode_names)n_terrains =len(tb.terrain_names)fig, ax = plt.subplots(figsize=(6.4, 3.6))masked = np.ma.masked_invalid(speed)cmap = plt.cm.viridis.copy()cmap.set_bad(color="#dddddd")im = ax.imshow(masked, cmap=cmap, aspect="auto", norm="log")ax.set_xticks(range(n_terrains))ax.set_xticklabels(tb.terrain_names, rotation=45, ha="right")ax.set_yticks(range(n_modes))ax.set_yticklabels(tb.mode_names)ax.set_xlabel("terrain (dataset axis)")ax.set_ylabel("transport mode (method axis)")ax.set_title("Speed by mode and terrain; red box marks the fastest mode")for t inrange(n_terrains): fastest =int(np.nanargmax(speed[:, t])) ax.add_patch( plt.Rectangle( (t -0.5, fastest -0.5), 1.0, 1.0, fill=False, edgecolor="#c44e52", linewidth=2.5, ) )for i inrange(n_modes):for j inrange(n_terrains):if np.isnan(speed[i, j]): ax.text(j, i, "x", ha="center", va="center", color="#888888")cbar = fig.colorbar(im, ax=ax)cbar.set_label("speed (km/h, log scale)")fig.tight_layout()plt.show()
The grey cells with a cross are the mode-terrain pairs that do not run. No row is free of them, so no mode is measured on every terrain. Any ranking that pools across terrains compares modes that were measured on different subsets of conditions, which is why a single pooled ranking over all modes is not well defined here.
feasible = tb.feasible()runs_everywhere = [ tb.mode_names[i] for i inrange(n_modes) if feasible[i].all()]print("modes that run on every terrain:", runs_everywhere or"none")print()print("fastest mode per terrain, by speed, over the feasible modes (the ground truth):")for t, terrain inenumerate(tb.terrain_names): best = tb.mode_names[int(np.nanargmax(speed[:, t]))]print(f" {terrain:15s}{best}")
modes that run on every terrain: none
fastest mode per terrain, by speed, over the feasible modes (the ground truth):
flat_road train
mud motorcycle
uphill motorcycle
open_water boat
long_distance plane
urban_hop train
Per-terrain MCDA with example-level NaN handling
Because no mode runs on every terrain, the analysis is done one terrain at a time. On each terrain the infeasible modes are dropped (their cells are NaN) and the MCDA pipeline runs over the modes that actually run there. This is the example-level NaN handling: no imputation and no pooling across terrains, just the modes that were measured on the terrain in question. The helper feasible_submatrix drops the infeasible rows and returns a dense, NaN-free score matrix. run_from_registry then reads polarity and normalization from the speed, cost and co2 cards.
The cards declare the normalization: speed and cost span orders of magnitude across modes, so they use log_min_max; CO2 has true zeros (walking, cycling), so it uses rank, which is defined when a column carries hard zeros.
A first pass uses equal weights and SAW.
print("per-terrain top-ranked mode, equal weights, SAW")print(f"{'terrain':15s}{'top-ranked':12s}{'fastest by speed':16s} same?")for terrain in tb.terrain_names: names, sub = tb.feasible_submatrix(terrain) out = run_from_registry(sub, list(tb.metric_names), method="saw") top = names[int(np.argmin(out.ranks))] fastest = names[int(np.argmax(sub[:, 0]))]print(f" {terrain:15s}{top:12s}{fastest:16s}{top == fastest}")
Under equal weights the cheap, low-CO2 modes rank near the top on the land terrains even when they are slow. Speed is only one of three equally weighted metrics, so it does not decide the order on its own. An equal-weight composite can reward a mode that looks good only because the weighting discounts how long the trip would actually take.
Now weight speed heavily (0.8 on speed, 0.1 on cost, 0.1 on CO2) and rerun. With speed in charge, the per-terrain top-ranked mode tracks the ground truth, and a different mode ranks first on different terrains.
speed_heavy = np.array([0.8, 0.1, 0.1])print("per-terrain top-ranked mode, speed-heavy weights (0.8, 0.1, 0.1), SAW")print(f"{'terrain':15s}{'top-ranked':12s}{'fastest by speed':16s} same?")for terrain in tb.terrain_names: names, sub = tb.feasible_submatrix(terrain) out = run_from_registry(sub, list(tb.metric_names), weights=speed_heavy, method="saw") top = names[int(np.argmin(out.ranks))] fastest = names[int(np.argmax(sub[:, 0]))]print(f" {terrain:15s}{top:12s}{fastest:16s}{top == fastest}")
The mode that ranks first now changes across terrains: train on the flat road and the urban hop, motorcycle on mud and uphill, boat on open water, plane on the long distance. That is the method-by-dataset interaction stated at the top. A single global ranking cannot report all four at once.
A crossover among the slower modes
The fastest modes lead on their own terrains, but the interaction is sharper among the slower land modes, where two modes change places across terrains. Trail running is slower than road running on the flat road, where road shoes and a hard surface help, but faster on mud and uphill, where off-road traction matters. An e-bike is faster than a bicycle on the flat road and the urban hop, but its weight slows it down uphill. These are crossovers, not feasibility gaps: both modes run on the terrains in question, and which one is faster flips from terrain to terrain.
speed = tb.metric("speed")m = tb.mode_names.indext = tb.terrain_names.indexprint("trail running vs road running, speed (km/h):")print(f"{'terrain':12s}{'road':>6s}{'trail':>6s} faster")for terrain in ["flat_road", "mud", "uphill"]: road = speed[m("running"), t(terrain)] trail = speed[m("trail_running"), t(terrain)] faster ="trail"if trail > road else"road"print(f"{terrain:12s}{road:6.1f}{trail:6.1f}{faster}")print()print("e-bike vs bicycle, speed (km/h):")print(f"{'terrain':12s}{'bike':>6s}{'ebike':>6s} faster")for terrain in ["flat_road", "urban_hop", "uphill"]: bike = speed[m("bicycle"), t(terrain)] ebike = speed[m("e_bike"), t(terrain)] faster ="ebike"if ebike > bike else"bike"print(f"{terrain:12s}{bike:6.1f}{ebike:6.1f}{faster}")
Trail running is slower than road running on the flat road and faster on mud and uphill, so neither mode is faster everywhere they both run. A pooled speed order over the two modes has to pick one ranking, which is wrong on at least one terrain. This is the cleanest illustration of method-by-dataset interaction in the example: a clustering method that leads on one tissue and trails on another behaves the same way, and a single pooled ranking would hide the flip.
Five aggregations on one terrain
On the long-distance terrain every mode is feasible, so it is a good place to compare aggregation methods on a full mode set. The five methods are compared under the speed-heavy weighting. They agree on the extremes and differ in the middle.
names, sub = tb.feasible_submatrix("long_distance")methods = ["saw", "topsis", "vikor", "promethee_ii", "comet"]ranks_by_method = {}print(f"{'method':14s} ranks by mode (1 = best)")for m in methods: out = run_from_registry(sub, list(tb.metric_names), weights=speed_heavy, method=m) ranks_by_method[m] = out.ranks by_mode = {n: int(r) for n, r inzip(names, out.ranks)}print(f" {m:14s}{by_mode}")print()print("top-ranked mode per method:")for m in methods:print(f" {m:14s}{names[int(np.argmin(ranks_by_method[m]))]}")
All five methods put the plane first on the long distance under speed-heavy weights, which is the fastest mode there. The methods reshuffle the middle of the table instead. SAW, TOPSIS, VIKOR and PROMETHEE II agree on the top of the order (plane, train, motorcycle) and differ only in where the boat and the light land modes land. COMET reads the table differently: it builds its ranking from characteristic objects rather than from a direct distance to the ideal, so it ranks the boat last and pushes the e-bike down, while still keeping the plane first. The top of the order is stable across methods; the middle is method-dependent, so a claim about the exact order of the middle modes should not lean on one method.
The same ranks as a heatmap. Each column is one mode, each row one aggregation method. A column of one colour means that mode keeps its rank across methods.
grid = np.array([ranks_by_method[m] for m in methods])fig, ax = plt.subplots(figsize=(6.4, 3.0))im = ax.imshow(grid, cmap="RdYlGn_r", aspect="auto", vmin=1, vmax=len(names))ax.set_xticks(range(len(names)))ax.set_xticklabels(names, rotation=45, ha="right")ax.set_yticks(range(len(methods)))ax.set_yticklabels(methods)ax.set_xlabel("transport mode")ax.set_ylabel("aggregation method")ax.set_title("Long-distance rank by mode and method (speed-heavy weights)")for i inrange(grid.shape[0]):for j inrange(grid.shape[1]): ax.text(j, i, int(grid[i, j]), ha="center", va="center", color="black")cbar = fig.colorbar(im, ax=ax, ticks=range(1, len(names) +1))cbar.set_label("rank (1 = best)")fig.tight_layout()plt.show()
Four weightings on the same terrain
Holding the method at SAW and the terrain at long distance, the data-driven weightings are compared against equal weights. Entropy, standard deviation, and CRITIC all read the spread and correlation of the columns rather than any preference for speed, so they keep the three metrics near a third each. Under all four the order stays close to the equal-weight order, with running or train at the top rather than the fastest mode.
print(f"{'weighting':10s}{'weights (speed, cost, co2)':28s} top-ranked")for wname in ["equal", "entropy", "std", "critic"]: out = run_from_registry(sub, list(tb.metric_names), weights=wname, method="saw") w_str = np.array2string(np.round(out.weights, 3), separator=", ") top = names[int(np.argmin(out.ranks))]print(f" {wname:10s}{w_str:28s}{top}")
MEREC is left out of the table above. It takes the logarithm of the normalized scores, so it cannot accept a column that carries a hard zero, and the card normalization here (rank on CO2) produces such a zero. This is the one place the example calls run directly to override the card default with a zero-free normalization. Under z-score normalization MEREC reads the removal effect of each metric and weights cost and CO2 above speed, so a cheap mode ends up at the top, the same result as the other data-driven weightings.
Across both comparisons, the choice of weighting changes which mode ranks first far more than the choice of aggregation method does. The data-driven weightings recover a cheap mode; only a deliberate emphasis on speed recovers the per-terrain fastest mode.
A critical-difference diagram on a common block
A Demsar critical-difference diagram needs a complete tool by dataset table with no missing cells. Because no mode runs on every terrain, the diagram is restricted to a block of modes and the terrains where all of them run. The four ground modes (foot, running, bicycle, motorcycle) are feasible on five common terrains (flat road, mud, uphill, long distance, urban hop); open water is excluded because no ground mode runs there. The helper common_feasible_block returns that block and its common terrains. The diagram is built on speed, the metric whose per-terrain ground truth is marked above.
block_modes = ("foot", "running", "bicycle", "motorcycle")common_terrains, block = tb.common_feasible_block(block_modes)print("block modes: ", block_modes)print("common terrains: ", common_terrains)cd = critical_difference( block, higher_is_better=True, tool_names=block_modes)print(f"Friedman statistic = {cd.friedman_statistic:.3f}, p = {cd.friedman_pvalue:.4f}")print(f"critical difference (alpha={cd.alpha}) = {cd.critical_difference:.3f}")print("average ranks (1 = best, fastest):")for i in cd.order:print(f" {block_modes[i]:12s}{cd.average_ranks[i]:.2f}")groups = [tuple(block_modes[i] for i in c) for c in cd.cliques]print("cliques (not separable at alpha):", groups or"none")
order = cd.ordernames_sorted = [block_modes[i] for i in order]ranks_sorted = [cd.average_ranks[i] for i in order]ypos =list(range(len(order)))best =min(ranks_sorted)fig, ax = plt.subplots(figsize=(5.8, 3.0))ax.barh(ypos, ranks_sorted, color="#3a7ca5")ax.axvspan(best, best + cd.critical_difference, color="#c44e52", alpha=0.15)ax.axvline(best + cd.critical_difference, color="#c44e52", linestyle="--")ax.set_yticks(ypos)ax.set_yticklabels(names_sorted)ax.invert_yaxis()ax.set_xlabel("average speed rank across the common terrains (1 = fastest)")ax.set_ylabel("transport mode (fastest at top)")ax.set_title(f"Friedman p = {cd.friedman_pvalue:.3f}; shaded band is the critical difference")fig.tight_layout()plt.show()
On this restricted block the Friedman test rejects (p around 0.002): the four ground modes are separable by speed across the five common terrains, in the fixed order motorcycle, bicycle, running, foot. The critical difference is wide enough that adjacent modes fall in the same clique, so neighbouring modes are not separable, but the extremes (motorcycle and foot) are. This is a different question from the per-terrain MCDA. It asks whether the ground modes have a stable speed ordering across the terrains they share, and here they do, because none of them has a terrain where it suddenly leads.
SMAA on one terrain
SMAA samples weight vectors from a Dirichlet over the three-metric simplex, runs the pipeline once per draw, and reports the share of draws in which each mode is top-ranked (its confidence factor). On the long-distance terrain with TOPSIS, no single mode owns the top rank across the weight space.
smaa_report = smaa( sub, tb.polarity, n_samples=500, method="topsis", seed=0,)print("confidence factor (share of sampled weightings ranking the mode first):")for n, c inzip(names, smaa_report.confidence_factor):print(f" {n:12s}{c:.3f}")most = names[int(np.argmax(smaa_report.confidence_factor))]print(f"\nmost confident top-ranked mode: {most} "f"({smaa_report.confidence_factor.max():.2f} of samples)")
confidence factor (share of sampled weightings ranking the mode first):
foot 0.000
running 0.178
trail_running 0.000
bicycle 0.102
e_bike 0.000
motorcycle 0.000
train 0.366
kayak 0.000
boat 0.000
plane 0.354
most confident top-ranked mode: train (0.37 of samples)
The confidence is split between the train (about 0.37) and the plane (about 0.35), with running and bicycle taking the rest. No mode crosses half, so the long-distance top rank depends on the weighting: depending on how a user trades speed against cost and CO2, either the train or the plane comes out first. The SMAA confidence summarizes that, rather than a single composite that would pick one and hide the split.
Weight perturbation on one terrain
smallest_weight_perturbation reports, for every pair of modes where one is ranked above the other under SAW, the smallest single-weight change that swaps them. It runs in closed form for SAW. Here it runs on the long-distance terrain with equal weights.
ts = smallest_weight_perturbation( sub, tb.polarity, weights="equal", method="saw",)print("base ranks under equal weights, SAW:")print(" ", {n: int(r) for n, r inzip(names, ts.base.ranks)})if ts.most_fragile_pair isnotNone: p = ts.most_fragile_pairprint(f"\nmost fragile pair: {names[p.higher]} ranked above {names[p.lower]}, "f"flips by changing the weight on {tb.metric_names[p.criterion]!r} "f"by {p.delta:+.4f}" )print(f"top rank is fragile: {ts.top_rank_is_fragile}")
base ranks under equal weights, SAW:
{'foot': 4, 'running': 2, 'trail_running': 3, 'bicycle': 5, 'e_bike': 6, 'motorcycle': 9, 'train': 1, 'kayak': 7, 'boat': 10, 'plane': 8}
most fragile pair: kayak ranked above plane, flips by changing the weight on 'speed' by +0.0053
top rank is fragile: False
The most fragile pair is the one that the smallest single-weight nudge can reorder. Here a change of about five thousandths to the speed weight is enough to swap the kayak and the plane, while the equal-weight top rank itself is not fragile under any single-weight change of that size. The pairing tells a user which two modes sit closest together under the current weighting, so a small revision to preferences would reorder them first.
Leave one dataset out: how much the pooled order leans on one terrain
The sections above keep the analysis per terrain because no mode runs on every terrain. On a complete block, where a set of modes does run on every terrain in the block, a pooled ranking is well defined, and the next question is how much that pooled ranking leans on any single terrain. Leave-one-dataset-out answers it: pool the block, rank it, then drop one terrain, pool the rest, and re-rank. The water modes show this sharply. The kayak, the motorboat and the small plane all run on two terrains, open water and the long distance, so they form a complete two-terrain block.
water_modes = ("kayak", "boat", "plane")water_terrains = ("open_water", "long_distance")mode_idx = [tb.mode_names.index(m) for m in water_modes]terrain_idx = [tb.terrain_names.index(t) for t in water_terrains]water_block = tb.scores[np.ix_(mode_idx, terrain_idx, range(len(tb.metric_names)))]print("speed (km/h) by mode and terrain:")print(f"{'mode':8s}{'open_water':>11s}{'long_distance':>13s}")for i, mode inenumerate(water_modes):print(f"{mode:8s}{water_block[i, 0, 0]:11.0f}{water_block[i, 1, 0]:13.0f}")
speed (km/h) by mode and terrain:
mode open_water long_distance
kayak 6 8
boat 30 40
plane 18 750
Pooling the two terrains with speed-heavy weights and SAW, then dropping each terrain in turn. The reduction across terrains is the arithmetic mean of the raw scores, one per metric.
rules = ("arithmetic_mean",) *len(tb.metric_names)lodo = leave_one_dataset_out( water_block, tb.polarity, rules, dataset_names=water_terrains, metric_ids=tb.metric_names, weights=speed_heavy, method="saw", normalization=list(tb.normalization),)print("pooled over both terrains:", {m: int(r) for m, r inzip(water_modes, lodo.base.ranks)})for d, r in lodo.leave_one_out.items():print(f"drop {water_terrains[d]:14s}:", {m: int(rk) for m, rk inzip(water_modes, r.ranks)})print()print("rank held across the leave-one-dataset-out runs:")for m, s inzip(water_modes, lodo.rank_stability):print(f" {m:8s}{s *100:5.0f}%")print(f"most influential terrain: {water_terrains[lodo.most_influential_dataset]} "f"(largest rank shift {lodo.max_rank_shift})")
pooled over both terrains: {'kayak': 3, 'boat': 2, 'plane': 1}
drop open_water : {'kayak': 3, 'boat': 2, 'plane': 1}
drop long_distance : {'kayak': 3, 'boat': 1, 'plane': 2}
rank held across the leave-one-dataset-out runs:
kayak 100%
boat 50%
plane 50%
most influential terrain: long_distance (largest rank shift 1)
order = np.argsort(lodo.base.ranks)fig, ax = plt.subplots(figsize=(5.8, 2.6))ax.barh(range(len(water_modes)), lodo.rank_stability[order] *100, color="#3a7ca5")ax.set_yticks(range(len(water_modes)))ax.set_yticklabels([water_modes[i] for i in order])ax.invert_yaxis()ax.set_xlim(0, 100)ax.set_xlabel("rank held across the 2 leave-one-dataset-out runs (percent)")ax.set_ylabel("transport mode (ordered by pooled rank)")ax.set_title("Water modes: leave-one-terrain-out rank stability (speed-heavy weights)")fig.tight_layout()plt.show()
Pooled over both terrains the plane ranks first, because its 750 km/h on the long leg pulls its mean speed far above the others. Drop the long distance and rank on open water alone, and the motorboat ranks first instead: it is the fastest mode on the water, where the plane is slow. Drop open water and the long-distance order stands, plane first. The plane’s first place is not a property of water travel; it is an artifact of including the long leg in the pool. The kayak ranks last in every run under speed-heavy weights. Leave-one-dataset-out turns that into a number: the plane and the motorboat each hold their rank in only half the runs, so the pooled recommendation between them hangs entirely on whether the long-distance terrain is in the pool. This is the same caution the per-terrain sections raise, now stated as a stability measure on a complete block.
The funky heatmap shows the complete water block as a glyph table, with the leave-one-dataset-out span and the aggregation span next to the order. The block is small (three modes, three metrics, two terrains), so the panels are short, but the dataset span panel records what the text above says: the plane and the motorboat each move by a rank when a terrain is dropped.
The mixed-effects model puts a number on what the per-terrain sections show. Fit on speed, with the mode as a fixed effect and the terrain as a random intercept, it splits the speed variance into a between-terrain shift and the rest. The fit needs R’s lme4, so the chunk runs only when it is available.
from beam.heterogeneity import mixed_effects_from_matrix, r_availablespeed = tb.scores[:, :, 0]keep =~np.isnan(speed).all(axis=1)speed_modes = [m for m, k inzip(tb.mode_names, keep) if k]if r_available(): me = mixed_effects_from_matrix(speed[keep], speed_modes, tb.terrain_names)print(f"terrain shift (ICC): {me.icc_dataset:.2f} of the speed variance")print(f"residual share: {me.residual_share:.2f} (mode-by-terrain interaction plus noise)")else:print("R with lme4 not available; skipping the mixed-effects fit.")
R with lme4 not available; skipping the mixed-effects fit.
The contrast with the other examples is clear. On M4 the band intercept takes most of the variance, so the bands differ mostly in difficulty. Here the terrain intercept takes very little: almost all of the speed variance is the mode-by-terrain interaction, because the fast mode on one terrain is the slow mode on another. That is why a single pooled speed ranking is misleading even where it can be computed. A Bradley-Terry tree on the six terrains has too few datasets to find a stable split, the same small-sample limit the Duo and M4 examples hit; the OpenProblems spatial task is where a split appears.
Why per-terrain is the informative output
This example was built so that no mode runs on every terrain and no mode is fastest on every terrain it does run on. Both facts push the analysis to be per-terrain. Partial coverage means a pooled ranking over all modes is not even well defined, since different modes are measured on different terrains. The method-by-dataset interaction means that even where a pooled ranking is computable, it reports a single order that is wrong on at least some terrains: the fast modes lead on their own terrains and trail on others.
This mirrors method-by-dataset interaction in bioinformatics benchmarks. A clustering method that leads on one tissue can trail on another, and a method that needs a feature one dataset lacks is simply not applicable there, which is the same role the NaN cells play here. The transportation numbers are made up, but the structure of the problem is the same. Report per dataset and mark which method leads on each. Then use the cross-dataset tests (the critical-difference diagram, SMAA, weight perturbation) to say how much of the apparent ordering survives a change of conditions or of preferences.
Source Code
---title: "Transportation: a cross-domain MCDA example"author: "Izaskun Mallona"date: todayformat: html: theme: cosmo toc: true toc-location: left embed-resources: true code-tools: true fig-width: 6 fig-height: 3.5---## GoalThe MCDA core in beam is not specific to bioinformatics. This vignette runs it on a made-up transportation example to illustrate three points that the bio scenarios state but do not show as sharply. The modes of transport play the role of the methods, and the terrains play the role of the datasets. Each mode is scored on a terrain by speed (km/h, higher is better), cost (per km, lower is better), and CO2 (g per km, lower is better). The numbers are illustrative, kept in a plausible range. The three metrics are bundled metric cards (`speed`, `cost`, `co2`), so this cross-domain example reads polarity and normalization from the same registry as the bioinformatics examples. Every example here goes through the cards, whether the metrics are biological or not.The first point is a method-by-dataset interaction. No mode is fastest on every terrain. A train is fastest on the flat road and the urban hop, a motorcycle on mud and uphill, a boat on open water, and a small plane on the long distance. A single global ranking over all terrains hides this structure.The second point is partial coverage. Some modes do not run on some terrains at all. A boat cannot go off-road, a small plane cannot do an urban hop, and the land modes cannot cross open water. Those cells are missing (NaN). Because no mode is feasible on every terrain, a single pooled ranking over all modes is not even well defined. The output must be per-terrain.The third point is a crossover among the slower modes. Trail running is slower than road running on the flat road but faster on mud and uphill, where off-road traction helps. An e-bike is faster than a bicycle on the flat road and the urban hop but slower uphill, where its weight costs it. On the water the kayak is slower than the motorboat but cheaper and zero CO2, so it outranks the boat on cost and CO2 while trailing on speed. None of these crossovers appears in a single pooled order: the mode that is faster on one terrain is slower on another.## Set-upThe five aggregation methods used below (SAW, TOPSIS, VIKOR, PROMETHEE II, COMET) are wrapped from pymcdm. beam normalizes the scores, then calls pymcdm on the normalized matrix and keeps the higher-is-better convention. The weighting schemes are beam's own.```{python}import numpy as npimport matplotlib.pyplot as pltfrom beam.mcda import ( critical_difference, leave_one_dataset_out, run, run_from_registry, smaa, smallest_weight_perturbation,)from beam.scenarios import transportation_benchmarktb = transportation_benchmark()print("modes: ", tb.mode_names)print("terrains:", tb.terrain_names)print("metrics: ", tb.metric_names)print("polarity:", tb.polarity)print("normalization:", tb.normalization)```## The data, and the partial-coverage problemThe speed heatmap below shows the full mode-by-terrain table. Infeasible cells, where a mode cannot run on a terrain, are left blank and marked with a cross. A red box marks the fastest mode on each terrain: that is the documented ground truth. Reading down any column, the fastest mode changes from terrain to terrain. Reading across any row, no mode covers every terrain.```{python}speed = tb.metric("speed")n_modes =len(tb.mode_names)n_terrains =len(tb.terrain_names)fig, ax = plt.subplots(figsize=(6.4, 3.6))masked = np.ma.masked_invalid(speed)cmap = plt.cm.viridis.copy()cmap.set_bad(color="#dddddd")im = ax.imshow(masked, cmap=cmap, aspect="auto", norm="log")ax.set_xticks(range(n_terrains))ax.set_xticklabels(tb.terrain_names, rotation=45, ha="right")ax.set_yticks(range(n_modes))ax.set_yticklabels(tb.mode_names)ax.set_xlabel("terrain (dataset axis)")ax.set_ylabel("transport mode (method axis)")ax.set_title("Speed by mode and terrain; red box marks the fastest mode")for t inrange(n_terrains): fastest =int(np.nanargmax(speed[:, t])) ax.add_patch( plt.Rectangle( (t -0.5, fastest -0.5), 1.0, 1.0, fill=False, edgecolor="#c44e52", linewidth=2.5, ) )for i inrange(n_modes):for j inrange(n_terrains):if np.isnan(speed[i, j]): ax.text(j, i, "x", ha="center", va="center", color="#888888")cbar = fig.colorbar(im, ax=ax)cbar.set_label("speed (km/h, log scale)")fig.tight_layout()plt.show()```The grey cells with a cross are the mode-terrain pairs that do not run. No row is free of them, so no mode is measured on every terrain. Any ranking that pools across terrains compares modes that were measured on different subsets of conditions, which is why a single pooled ranking over all modes is not well defined here.```{python}feasible = tb.feasible()runs_everywhere = [ tb.mode_names[i] for i inrange(n_modes) if feasible[i].all()]print("modes that run on every terrain:", runs_everywhere or"none")print()print("fastest mode per terrain, by speed, over the feasible modes (the ground truth):")for t, terrain inenumerate(tb.terrain_names): best = tb.mode_names[int(np.nanargmax(speed[:, t]))]print(f" {terrain:15s}{best}")```## Per-terrain MCDA with example-level NaN handlingBecause no mode runs on every terrain, the analysis is done one terrain at a time. On each terrain the infeasible modes are dropped (their cells are NaN) and the MCDA pipeline runs over the modes that actually run there. This is the example-level NaN handling: no imputation and no pooling across terrains, just the modes that were measured on the terrain in question. The helper `feasible_submatrix` drops the infeasible rows and returns a dense, NaN-free score matrix. `run_from_registry` then reads polarity and normalization from the `speed`, `cost` and `co2` cards.The cards declare the normalization: speed and cost span orders of magnitude across modes, so they use `log_min_max`; CO2 has true zeros (walking, cycling), so it uses `rank`, which is defined when a column carries hard zeros.A first pass uses equal weights and SAW.```{python}print("per-terrain top-ranked mode, equal weights, SAW")print(f"{'terrain':15s}{'top-ranked':12s}{'fastest by speed':16s} same?")for terrain in tb.terrain_names: names, sub = tb.feasible_submatrix(terrain) out = run_from_registry(sub, list(tb.metric_names), method="saw") top = names[int(np.argmin(out.ranks))] fastest = names[int(np.argmax(sub[:, 0]))]print(f" {terrain:15s}{top:12s}{fastest:16s}{top == fastest}")```Under equal weights the cheap, low-CO2 modes rank near the top on the land terrains even when they are slow. Speed is only one of three equally weighted metrics, so it does not decide the order on its own. An equal-weight composite can reward a mode that looks good only because the weighting discounts how long the trip would actually take.Now weight speed heavily (0.8 on speed, 0.1 on cost, 0.1 on CO2) and rerun. With speed in charge, the per-terrain top-ranked mode tracks the ground truth, and a different mode ranks first on different terrains.```{python}speed_heavy = np.array([0.8, 0.1, 0.1])print("per-terrain top-ranked mode, speed-heavy weights (0.8, 0.1, 0.1), SAW")print(f"{'terrain':15s}{'top-ranked':12s}{'fastest by speed':16s} same?")for terrain in tb.terrain_names: names, sub = tb.feasible_submatrix(terrain) out = run_from_registry(sub, list(tb.metric_names), weights=speed_heavy, method="saw") top = names[int(np.argmin(out.ranks))] fastest = names[int(np.argmax(sub[:, 0]))]print(f" {terrain:15s}{top:12s}{fastest:16s}{top == fastest}")```The mode that ranks first now changes across terrains: train on the flat road and the urban hop, motorcycle on mud and uphill, boat on open water, plane on the long distance. That is the method-by-dataset interaction stated at the top. A single global ranking cannot report all four at once.## A crossover among the slower modesThe fastest modes lead on their own terrains, but the interaction is sharper among the slower land modes, where two modes change places across terrains. Trail running is slower than road running on the flat road, where road shoes and a hard surface help, but faster on mud and uphill, where off-road traction matters. An e-bike is faster than a bicycle on the flat road and the urban hop, but its weight slows it down uphill. These are crossovers, not feasibility gaps: both modes run on the terrains in question, and which one is faster flips from terrain to terrain.```{python}speed = tb.metric("speed")m = tb.mode_names.indext = tb.terrain_names.indexprint("trail running vs road running, speed (km/h):")print(f"{'terrain':12s}{'road':>6s}{'trail':>6s} faster")for terrain in ["flat_road", "mud", "uphill"]: road = speed[m("running"), t(terrain)] trail = speed[m("trail_running"), t(terrain)] faster ="trail"if trail > road else"road"print(f"{terrain:12s}{road:6.1f}{trail:6.1f}{faster}")print()print("e-bike vs bicycle, speed (km/h):")print(f"{'terrain':12s}{'bike':>6s}{'ebike':>6s} faster")for terrain in ["flat_road", "urban_hop", "uphill"]: bike = speed[m("bicycle"), t(terrain)] ebike = speed[m("e_bike"), t(terrain)] faster ="ebike"if ebike > bike else"bike"print(f"{terrain:12s}{bike:6.1f}{ebike:6.1f}{faster}")```Trail running is slower than road running on the flat road and faster on mud and uphill, so neither mode is faster everywhere they both run. A pooled speed order over the two modes has to pick one ranking, which is wrong on at least one terrain. This is the cleanest illustration of method-by-dataset interaction in the example: a clustering method that leads on one tissue and trails on another behaves the same way, and a single pooled ranking would hide the flip.## Five aggregations on one terrainOn the long-distance terrain every mode is feasible, so it is a good place to compare aggregation methods on a full mode set. The five methods are compared under the speed-heavy weighting. They agree on the extremes and differ in the middle.```{python}names, sub = tb.feasible_submatrix("long_distance")methods = ["saw", "topsis", "vikor", "promethee_ii", "comet"]ranks_by_method = {}print(f"{'method':14s} ranks by mode (1 = best)")for m in methods: out = run_from_registry(sub, list(tb.metric_names), weights=speed_heavy, method=m) ranks_by_method[m] = out.ranks by_mode = {n: int(r) for n, r inzip(names, out.ranks)}print(f" {m:14s}{by_mode}")print()print("top-ranked mode per method:")for m in methods:print(f" {m:14s}{names[int(np.argmin(ranks_by_method[m]))]}")```All five methods put the plane first on the long distance under speed-heavy weights, which is the fastest mode there. The methods reshuffle the middle of the table instead. SAW, TOPSIS, VIKOR and PROMETHEE II agree on the top of the order (plane, train, motorcycle) and differ only in where the boat and the light land modes land. COMET reads the table differently: it builds its ranking from characteristic objects rather than from a direct distance to the ideal, so it ranks the boat last and pushes the e-bike down, while still keeping the plane first. The top of the order is stable across methods; the middle is method-dependent, so a claim about the exact order of the middle modes should not lean on one method.The same ranks as a heatmap. Each column is one mode, each row one aggregation method. A column of one colour means that mode keeps its rank across methods.```{python}grid = np.array([ranks_by_method[m] for m in methods])fig, ax = plt.subplots(figsize=(6.4, 3.0))im = ax.imshow(grid, cmap="RdYlGn_r", aspect="auto", vmin=1, vmax=len(names))ax.set_xticks(range(len(names)))ax.set_xticklabels(names, rotation=45, ha="right")ax.set_yticks(range(len(methods)))ax.set_yticklabels(methods)ax.set_xlabel("transport mode")ax.set_ylabel("aggregation method")ax.set_title("Long-distance rank by mode and method (speed-heavy weights)")for i inrange(grid.shape[0]):for j inrange(grid.shape[1]): ax.text(j, i, int(grid[i, j]), ha="center", va="center", color="black")cbar = fig.colorbar(im, ax=ax, ticks=range(1, len(names) +1))cbar.set_label("rank (1 = best)")fig.tight_layout()plt.show()```## Four weightings on the same terrainHolding the method at SAW and the terrain at long distance, the data-driven weightings are compared against equal weights. Entropy, standard deviation, and CRITIC all read the spread and correlation of the columns rather than any preference for speed, so they keep the three metrics near a third each. Under all four the order stays close to the equal-weight order, with running or train at the top rather than the fastest mode.```{python}print(f"{'weighting':10s}{'weights (speed, cost, co2)':28s} top-ranked")for wname in ["equal", "entropy", "std", "critic"]: out = run_from_registry(sub, list(tb.metric_names), weights=wname, method="saw") w_str = np.array2string(np.round(out.weights, 3), separator=", ") top = names[int(np.argmin(out.ranks))]print(f" {wname:10s}{w_str:28s}{top}")```MEREC is left out of the table above. It takes the logarithm of the normalized scores, so it cannot accept a column that carries a hard zero, and the card normalization here (`rank` on CO2) produces such a zero. This is the one place the example calls `run` directly to override the card default with a zero-free normalization. Under z-score normalization MEREC reads the removal effect of each metric and weights cost and CO2 above speed, so a cheap mode ends up at the top, the same result as the other data-driven weightings.```{python}merec = run( sub, tb.polarity, weights="merec", normalization="zscore", method="saw", metric_ids=list(tb.metric_names),)w_str = np.array2string(np.round(merec.weights, 3), separator=", ")print(f"merec (zscore) weights={w_str} top-ranked={names[int(np.argmin(merec.ranks))]}")```Across both comparisons, the choice of weighting changes which mode ranks first far more than the choice of aggregation method does. The data-driven weightings recover a cheap mode; only a deliberate emphasis on speed recovers the per-terrain fastest mode.## A critical-difference diagram on a common blockA Demsar critical-difference diagram needs a complete tool by dataset table with no missing cells. Because no mode runs on every terrain, the diagram is restricted to a block of modes and the terrains where all of them run. The four ground modes (foot, running, bicycle, motorcycle) are feasible on five common terrains (flat road, mud, uphill, long distance, urban hop); open water is excluded because no ground mode runs there. The helper `common_feasible_block` returns that block and its common terrains. The diagram is built on speed, the metric whose per-terrain ground truth is marked above.```{python}block_modes = ("foot", "running", "bicycle", "motorcycle")common_terrains, block = tb.common_feasible_block(block_modes)print("block modes: ", block_modes)print("common terrains: ", common_terrains)cd = critical_difference( block, higher_is_better=True, tool_names=block_modes)print(f"Friedman statistic = {cd.friedman_statistic:.3f}, p = {cd.friedman_pvalue:.4f}")print(f"critical difference (alpha={cd.alpha}) = {cd.critical_difference:.3f}")print("average ranks (1 = best, fastest):")for i in cd.order:print(f" {block_modes[i]:12s}{cd.average_ranks[i]:.2f}")groups = [tuple(block_modes[i] for i in c) for c in cd.cliques]print("cliques (not separable at alpha):", groups or"none")``````{python}order = cd.ordernames_sorted = [block_modes[i] for i in order]ranks_sorted = [cd.average_ranks[i] for i in order]ypos =list(range(len(order)))best =min(ranks_sorted)fig, ax = plt.subplots(figsize=(5.8, 3.0))ax.barh(ypos, ranks_sorted, color="#3a7ca5")ax.axvspan(best, best + cd.critical_difference, color="#c44e52", alpha=0.15)ax.axvline(best + cd.critical_difference, color="#c44e52", linestyle="--")ax.set_yticks(ypos)ax.set_yticklabels(names_sorted)ax.invert_yaxis()ax.set_xlabel("average speed rank across the common terrains (1 = fastest)")ax.set_ylabel("transport mode (fastest at top)")ax.set_title(f"Friedman p = {cd.friedman_pvalue:.3f}; shaded band is the critical difference")fig.tight_layout()plt.show()```On this restricted block the Friedman test rejects (p around 0.002): the four ground modes are separable by speed across the five common terrains, in the fixed order motorcycle, bicycle, running, foot. The critical difference is wide enough that adjacent modes fall in the same clique, so neighbouring modes are not separable, but the extremes (motorcycle and foot) are. This is a different question from the per-terrain MCDA. It asks whether the ground modes have a stable speed ordering across the terrains they share, and here they do, because none of them has a terrain where it suddenly leads.## SMAA on one terrainSMAA samples weight vectors from a Dirichlet over the three-metric simplex, runs the pipeline once per draw, and reports the share of draws in which each mode is top-ranked (its confidence factor). On the long-distance terrain with TOPSIS, no single mode owns the top rank across the weight space.```{python}smaa_report = smaa( sub, tb.polarity, n_samples=500, method="topsis", seed=0,)print("confidence factor (share of sampled weightings ranking the mode first):")for n, c inzip(names, smaa_report.confidence_factor):print(f" {n:12s}{c:.3f}")most = names[int(np.argmax(smaa_report.confidence_factor))]print(f"\nmost confident top-ranked mode: {most} "f"({smaa_report.confidence_factor.max():.2f} of samples)")```The confidence is split between the train (about 0.37) and the plane (about 0.35), with running and bicycle taking the rest. No mode crosses half, so the long-distance top rank depends on the weighting: depending on how a user trades speed against cost and CO2, either the train or the plane comes out first. The SMAA confidence summarizes that, rather than a single composite that would pick one and hide the split.## Weight perturbation on one terrain`smallest_weight_perturbation` reports, for every pair of modes where one is ranked above the other under SAW, the smallest single-weight change that swaps them. It runs in closed form for SAW. Here it runs on the long-distance terrain with equal weights.```{python}ts = smallest_weight_perturbation( sub, tb.polarity, weights="equal", method="saw",)print("base ranks under equal weights, SAW:")print(" ", {n: int(r) for n, r inzip(names, ts.base.ranks)})if ts.most_fragile_pair isnotNone: p = ts.most_fragile_pairprint(f"\nmost fragile pair: {names[p.higher]} ranked above {names[p.lower]}, "f"flips by changing the weight on {tb.metric_names[p.criterion]!r} "f"by {p.delta:+.4f}" )print(f"top rank is fragile: {ts.top_rank_is_fragile}")```The most fragile pair is the one that the smallest single-weight nudge can reorder. Here a change of about five thousandths to the speed weight is enough to swap the kayak and the plane, while the equal-weight top rank itself is not fragile under any single-weight change of that size. The pairing tells a user which two modes sit closest together under the current weighting, so a small revision to preferences would reorder them first.## Leave one dataset out: how much the pooled order leans on one terrainThe sections above keep the analysis per terrain because no mode runs on every terrain. On a complete block, where a set of modes does run on every terrain in the block, a pooled ranking is well defined, and the next question is how much that pooled ranking leans on any single terrain. Leave-one-dataset-out answers it: pool the block, rank it, then drop one terrain, pool the rest, and re-rank. The water modes show this sharply. The kayak, the motorboat and the small plane all run on two terrains, open water and the long distance, so they form a complete two-terrain block.```{python}water_modes = ("kayak", "boat", "plane")water_terrains = ("open_water", "long_distance")mode_idx = [tb.mode_names.index(m) for m in water_modes]terrain_idx = [tb.terrain_names.index(t) for t in water_terrains]water_block = tb.scores[np.ix_(mode_idx, terrain_idx, range(len(tb.metric_names)))]print("speed (km/h) by mode and terrain:")print(f"{'mode':8s}{'open_water':>11s}{'long_distance':>13s}")for i, mode inenumerate(water_modes):print(f"{mode:8s}{water_block[i, 0, 0]:11.0f}{water_block[i, 1, 0]:13.0f}")```Pooling the two terrains with speed-heavy weights and SAW, then dropping each terrain in turn. The reduction across terrains is the arithmetic mean of the raw scores, one per metric.```{python}rules = ("arithmetic_mean",) *len(tb.metric_names)lodo = leave_one_dataset_out( water_block, tb.polarity, rules, dataset_names=water_terrains, metric_ids=tb.metric_names, weights=speed_heavy, method="saw", normalization=list(tb.normalization),)print("pooled over both terrains:", {m: int(r) for m, r inzip(water_modes, lodo.base.ranks)})for d, r in lodo.leave_one_out.items():print(f"drop {water_terrains[d]:14s}:", {m: int(rk) for m, rk inzip(water_modes, r.ranks)})print()print("rank held across the leave-one-dataset-out runs:")for m, s inzip(water_modes, lodo.rank_stability):print(f" {m:8s}{s *100:5.0f}%")print(f"most influential terrain: {water_terrains[lodo.most_influential_dataset]} "f"(largest rank shift {lodo.max_rank_shift})")``````{python}order = np.argsort(lodo.base.ranks)fig, ax = plt.subplots(figsize=(5.8, 2.6))ax.barh(range(len(water_modes)), lodo.rank_stability[order] *100, color="#3a7ca5")ax.set_yticks(range(len(water_modes)))ax.set_yticklabels([water_modes[i] for i in order])ax.invert_yaxis()ax.set_xlim(0, 100)ax.set_xlabel("rank held across the 2 leave-one-dataset-out runs (percent)")ax.set_ylabel("transport mode (ordered by pooled rank)")ax.set_title("Water modes: leave-one-terrain-out rank stability (speed-heavy weights)")fig.tight_layout()plt.show()```Pooled over both terrains the plane ranks first, because its 750 km/h on the long leg pulls its mean speed far above the others. Drop the long distance and rank on open water alone, and the motorboat ranks first instead: it is the fastest mode on the water, where the plane is slow. Drop open water and the long-distance order stands, plane first. The plane's first place is not a property of water travel; it is an artifact of including the long leg in the pool. The kayak ranks last in every run under speed-heavy weights. Leave-one-dataset-out turns that into a number: the plane and the motorboat each hold their rank in only half the runs, so the pooled recommendation between them hangs entirely on whether the long-distance terrain is in the pool. This is the same caution the per-terrain sections raise, now stated as a stability measure on a complete block.The funky heatmap shows the complete water block as a glyph table, with the leave-one-dataset-out span and the aggregation span next to the order. The block is small (three modes, three metrics, two terrains), so the panels are short, but the dataset span panel records what the text above says: the plane and the motorboat each move by a rank when a terrain is dropped.```{python}import beamfrom beam.reporting import funky_heatmap_from_runwater = beam.Scores( values=water_block, tool_names=water_modes, metric_ids=tb.metric_names, dataset_names=water_terrains, layout="long",)water_run = beam.rank(water, weights="equal", method="saw")funky_heatmap_from_run(water_run, title="Transport water block: scores and rank robustness")```## Heterogeneity: the interaction dominates hereThe mixed-effects model puts a number on what the per-terrain sections show. Fit on speed, with the mode as a fixed effect and the terrain as a random intercept, it splits the speed variance into a between-terrain shift and the rest. The fit needs R's lme4, so the chunk runs only when it is available.```{python}from beam.heterogeneity import mixed_effects_from_matrix, r_availablespeed = tb.scores[:, :, 0]keep =~np.isnan(speed).all(axis=1)speed_modes = [m for m, k inzip(tb.mode_names, keep) if k]if r_available(): me = mixed_effects_from_matrix(speed[keep], speed_modes, tb.terrain_names)print(f"terrain shift (ICC): {me.icc_dataset:.2f} of the speed variance")print(f"residual share: {me.residual_share:.2f} (mode-by-terrain interaction plus noise)")else:print("R with lme4 not available; skipping the mixed-effects fit.")```The contrast with the other examples is clear. On M4 the band intercept takes most of the variance, so the bands differ mostly in difficulty. Here the terrain intercept takes very little: almost all of the speed variance is the mode-by-terrain interaction, because the fast mode on one terrain is the slow mode on another. That is why a single pooled speed ranking is misleading even where it can be computed. A Bradley-Terry tree on the six terrains has too few datasets to find a stable split, the same small-sample limit the Duo and M4 examples hit; the OpenProblems spatial task is where a split appears.## Why per-terrain is the informative outputThis example was built so that no mode runs on every terrain and no mode is fastest on every terrain it does run on. Both facts push the analysis to be per-terrain. Partial coverage means a pooled ranking over all modes is not even well defined, since different modes are measured on different terrains. The method-by-dataset interaction means that even where a pooled ranking is computable, it reports a single order that is wrong on at least some terrains: the fast modes lead on their own terrains and trail on others.This mirrors method-by-dataset interaction in bioinformatics benchmarks. A clustering method that leads on one tissue can trail on another, and a method that needs a feature one dataset lacks is simply not applicable there, which is the same role the NaN cells play here. The transportation numbers are made up, but the structure of the problem is the same. Report per dataset and mark which method leads on each. Then use the cross-dataset tests (the critical-difference diagram, SMAA, weight perturbation) to say how much of the apparent ordering survives a change of conditions or of preferences.