
Conversation

han-ol
Collaborator

@han-ol han-ol commented Jul 4, 2025

This draft-PR is the result of discussions with @elseml and @stefanradev93.

The goal is fast and convenient support of approximator ensembles; the first steps toward this have already been taken.

  1. We envision ApproximatorEnsemble as the abstraction at the heart of future workflows using ensembles.

    • fundamentally, it is a wrapper around a dictionary of arbitrary Approximator objects.
    • it overrides the central methods compute_metrics, build, and sample, passing inputs on to the respective ensemble members' methods.
  2. Since ensembles should cover the sensitivity with respect to all sources of randomness in approximators, which include not just initialization but also the random order of training batches, we need slightly modified datasets.

    • For now, only OfflineEnsembleDataset is implemented. It ensures that training batches have an additional dimension on the second axis, containing multiple independent random slices of the available offline samples.
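The wrapper idea above can be sketched roughly as follows. This is an illustrative assumption, not the PR's actual API: DummyApproximator and the exact sample signature are stand-ins for real bayesflow Approximator objects.

```python
import numpy as np


class ApproximatorEnsemble:
    """Sketch: wraps a dict of approximators and dispatches calls to each member."""

    def __init__(self, approximators: dict):
        self.approximators = approximators

    def sample(self, num_samples, conditions):
        # Forward the call to every ensemble member and collect the
        # results under the member's name.
        return {
            name: approx.sample(num_samples, conditions)
            for name, approx in self.approximators.items()
        }


class DummyApproximator:
    """Stand-in for a real bayesflow Approximator, used only for illustration."""

    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)

    def sample(self, num_samples, conditions):
        return self.rng.normal(size=(num_samples, 2))


ensemble = ApproximatorEnsemble({"a": DummyApproximator(0), "b": DummyApproximator(1)})
samples = ensemble.sample(num_samples=5, conditions=None)
# samples is a dict with one entry per ensemble member
```

The dict-of-results return value mirrors the dict-of-approximators input, so each member's output stays attributable to its name.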

A few things are missing, among them are

  • predict/estimate methods for ApproximatorEnsemble (currently sample exists)
  • tests for ApproximatorEnsemble
  • doc strings for ApproximatorEnsemble
  • OnlineEnsembleDataset
  • DiskEnsembleDataset
  • tests for ensemble datasets
  • some Workflow


codecov bot commented Jul 4, 2025

Codecov Report

❌ Patch coverage is 98.90110% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| bayesflow/approximators/approximator_ensemble.py | 98.61% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| bayesflow/__init__.py | 49.05% <100.00%> (ø) |
| bayesflow/approximators/__init__.py | 100.00% <100.00%> (ø) |
| bayesflow/approximators/approximator.py | 79.54% <100.00%> (-0.46%) ⬇️ |
| ...low/approximators/model_comparison_approximator.py | 85.79% <100.00%> (+0.59%) ⬆️ |
| bayesflow/datasets/__init__.py | 100.00% <100.00%> (ø) |
| bayesflow/datasets/offline_ensemble_dataset.py | 100.00% <100.00%> (ø) |
| bayesflow/approximators/approximator_ensemble.py | 98.61% <98.61%> (ø) |

@elseml
Member

elseml commented Jul 9, 2025

Good job Hans! FYI, commit 955ac79 uncovered a bug in ModelComparisonApproximator's build_from_data method, which d8d84c8 addresses.

@vpratz
Collaborator

vpratz commented Aug 5, 2025

Nice work :) While trying it out a bit, I was caught by surprise by the additional dimension for each ensemble member in the data, and it took me a while to figure out why a dimension seemed to be missing from my data during training.
I'm not sure what the best interface would be here, but I think it would be good to think about the design for a bit. Maybe the following questions will get us closer to what we want to have:

  1. Should the ordinary dataset classes be supported? If yes, which mode should they operate in (and does it need a warning)? If no, we might want to override fit to check for them and raise an error if they are passed.
  2. Should there be multiple "modes" for the ...EnsembleDatasets, i.e., identical data vs. different data? Might be especially relevant for online training, as different data increases the required compute.
  3. How do we handle custom dataset classes, what shapes do we expect from them?

As I did not take part in the discussions, maybe you have already talked this through. In any case, I would be happy to hear your thoughts on this.
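One way to picture the "raise an error" option from question 1 is a fail-fast type check in fit. This is a hedged sketch under assumptions: the EnsembleDataset marker base class and the fit signature are illustrative, not part of the PR.

```python
class EnsembleDataset:
    """Hypothetical marker base class for ensemble-aware datasets."""


class OfflineEnsembleDataset(EnsembleDataset):
    """Stand-in for the dataset class from this PR."""


class ApproximatorEnsemble:
    def fit(self, dataset, **kwargs):
        # Fail fast with a clear message instead of surfacing a confusing
        # shape mismatch later during training.
        if not isinstance(dataset, EnsembleDataset):
            raise TypeError(
                f"ApproximatorEnsemble.fit expects an ensemble-aware dataset, "
                f"got {type(dataset).__name__}; wrap your data in an "
                f"OfflineEnsembleDataset first."
            )
        return "ok"  # placeholder for the actual training loop
```

Checking against a common marker base class means new ensemble dataset variants are accepted automatically, at the cost of rejecting custom dataset classes that do not inherit from it (question 3).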

@vpratz
Collaborator

vpratz commented Aug 5, 2025

I have added serialization support, but deserialization fails when multiple approximators use the same weights, e.g. when they share a summary network. I'm not sure yet how this can be resolved, or whether we want to enable serialization if we cannot resolve it.

@paul-buerkner
Contributor

@han-ol could you perhaps provide a minimal code example of how the ensembles should work? On that basis, it might then also be easier to discuss interface questions, including those of @vpratz.

@vpratz
Collaborator

vpratz commented Aug 6, 2025

@paul-buerkner You can find one as part of the PR: https://github.com/bayesflow-org/bayesflow/blob/ensembles/examples/ApproximatorEnsemble%20example.ipynb

@vpratz
Collaborator

vpratz commented Aug 15, 2025

@han-ol and I had a discussion on the dataset question. One approach would be a general EnsembleDatasetWrapper with the following properties:

  • it takes in an arbitrary dataset, e.g. an instance of OnlineDataset or OfflineDataset
  • it determines the batch size either by reading dataset.batch_size or by sampling a batch from the dataset
  • it has a parameter like unique_data_fraction to control whether all ensemble members get the same data (0.0) or every member gets different data (1.0). For values in between, a bootstrap procedure can be used.
  • the required number of simulations can be obtained by repeatedly sampling batches from dataset or, for our own simulators, by changing the dataset.batch_size parameter. The latter would be a bit hacky; we would have to see if we want this.

In addition, in the ApproximatorEnsemble class, we can determine from the shape of inference_variables whether a standard dataset (like OnlineDataset) was used. If so, we default to showing the same data to all ensemble members.

This lets us keep using our existing datasets, and requires only this one additional class to add the capability of passing different data to different approximators.
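A minimal sketch of such a wrapper, assuming the wrapped dataset yields dict-of-arrays batches with the batch on the first axis. Only the name unique_data_fraction comes from the comment above; everything else (constructor arguments, ToyOfflineDataset, the exact bootstrap scheme) is an assumption for illustration.

```python
import numpy as np


class EnsembleDatasetWrapper:
    """Sketch of the proposed wrapper: inserts an ensemble axis at position 1
    and optionally bootstrap-resamples part of each batch per member."""

    def __init__(self, dataset, num_ensemble, unique_data_fraction=1.0, seed=None):
        self.dataset = dataset
        self.num_ensemble = num_ensemble
        # 0.0 = identical data for all members, 1.0 = fully resampled per member
        self.unique_data_fraction = unique_data_fraction
        self.rng = np.random.default_rng(seed)

    def __getitem__(self, idx):
        batch = self.dataset[idx]
        batch_size = next(iter(batch.values())).shape[0]
        num_resampled = round(self.unique_data_fraction * batch_size)
        # Draw one index mapping per member so that all keys (e.g. parameters
        # and observables) stay paired after resampling.
        member_indices = []
        for _ in range(self.num_ensemble):
            indices = np.arange(batch_size)
            if num_resampled > 0:
                rows = self.rng.choice(batch_size, size=num_resampled, replace=False)
                indices[rows] = self.rng.choice(batch_size, size=num_resampled, replace=True)
            member_indices.append(indices)
        # Resulting shape per key: (batch_size, num_ensemble, ...)
        return {
            key: np.stack([value[ind] for ind in member_indices], axis=1)
            for key, value in batch.items()
        }


class ToyOfflineDataset:
    """Stand-in for OfflineDataset: yields dict-of-arrays batches."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return {key: value[sl] for key, value in self.data.items()}


data = {"theta": np.arange(8.0).reshape(8, 1), "x": np.arange(16.0).reshape(8, 2)}
wrapper = EnsembleDatasetWrapper(
    ToyOfflineDataset(data, batch_size=4), num_ensemble=3, unique_data_fraction=1.0, seed=0
)
batch = wrapper[0]
# batch["theta"] now has shape (4, 3, 1): an ensemble axis at position 1
```

Because the index mapping is shared across keys, resampling never breaks the pairing between parameters and observables within a batch.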

@stefanradev93
Contributor

I am strongly in favor of this idea. We also discussed in the past that this is a more elegant and catch-all solution.
