Conversation

@dpbutter (Contributor) commented Aug 6, 2025

This is a reworking of a small part of Ex_comparator to make it faster. (I measure ~2.6x speedup on general workflows, up to 4x speedup on very elaborate substitutes with many rules. For example, one typical substitute went from 27 s to 7 s.)

The underlying comparator used a replacement_map, mapping Ex objects to Ex objects. These Ex objects are created and destroyed many times in involved computations, yet they only differ from existing objects by a small number of operations (modifying parent relations or eliminating children). To save on clock cycles, I've introduced Lazy_Ex, which is a glorified wrapper around an Ex::iterator and a flag indicating what modification to perform. The idea is to avoid Ex creation as much as possible.

The Lazy_Ex provides a resolve() routine which will apply the operation and return a new Ex only when needed.

However, most of the objects in replacement_map are just passed to subtree_compare, so I modified subtree_compare to natively accept Lazy_Ex objects and apply the flagged operation during the tree comparison. In practice, creation of such new Ex objects is therefore kept to a minimum.
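
Roughly, the idea looks like the following sketch (simplified; the operation names and the exact subtree_compare hookup are illustrative rather than the literal code in the PR):

```cpp
#include "Storage.hh"   // Ex, str_node
using namespace cadabra;

// Sketch only: operation names and details are illustrative.
class Lazy_Ex {
	public:
		enum class Op { none, strip_parent_rel, erase_children };

		Lazy_Ex(Ex::iterator it_, Op op_=Op::none)
			: it(it_), op(op_) {}

		// Materialise a real Ex only when a caller genuinely needs one.
		Ex resolve() const
			{
			Ex ret(it);                                   // copies the subtree
			switch(op) {
				case Op::strip_parent_rel:
					ret.begin()->fl.parent_rel = str_node::p_none;
					break;
				case Op::erase_children:
					ret.erase_children(ret.begin());
					break;
				case Op::none:
					break;
				}
			return ret;
			}

		Ex::iterator it;
		Op           op;
};

// subtree_compare can then take a Lazy_Ex directly and apply the pending
// operation while walking the tree: for erase_children it simply does not
// descend into the children of the wrapped node, and for strip_parent_rel
// it treats the top node's parent relation as p_none during the comparison,
// so no temporary Ex has to be allocated on the common path.
```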

@dpbutter (Contributor, Author) commented Aug 7, 2025

P.S. The penultimate commit includes an entire testing apparatus where I compared existing runs of compare with the old and new replacement maps being used side-by-side, to make sure nothing different happened.
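
In spirit, the check is just a differential wrapper of this shape (simplified; `old_path` and `new_path` stand in for compare() driven by the old and the new replacement map):

```cpp
#include <cassert>

// Run both code paths on the same inputs and insist they agree.
template<class OldPath, class NewPath, class... Args>
auto checked_compare(OldPath&& old_path, NewPath&& new_path, Args&&... args)
	{
	auto old_result = old_path(args...);
	auto new_result = new_path(args...);
	assert(old_result == new_result);   // any divergence means the rework changed behaviour
	return new_result;
	}

// e.g.  auto res = checked_compare(run_with_old_map, run_with_new_map, pattern_it, object_it);
// (the callable names here are hypothetical)
```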

@kpeeters (Owner) commented Aug 7, 2025

What does the typical Ex look like for which you are trying to optimise? Is it one head node with a few children? Or just single nodes without children at all?

@dpbutter (Contributor, Author) commented Aug 7, 2025

I'll take a deeper look at this later and try to profile them. (I could also send you the scripts I'm using if it would be useful.)

I admit I was quite surprised by the cost of Ex creation and by how much of a benefit revising replacement_map brought. My solution does seem like a kludge; there is probably a better/smarter way of circumventing it, but that would require a greater rewrite of your existing code logic, which I wanted to avoid.

One possibility, which I didn't explore, is that the real time issue is with multiplier insertion into rat_set. I ran perf report on a long workflow and a sizeable amount of time was spent in Multiplier::operator<. I suspect this may be due to calls to the str_node constructor, which then tries to insert an already existing multiplier into rat_set over and over again.
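
To illustrate the kind of pattern I suspect (toy types only, not the actual str_node / rat_set code):

```cpp
#include <set>

struct Multiplier {
	long num, den;
	bool operator<(const Multiplier& other) const
		{
		return num*other.den < other.num*den;   // this is where perf sees the time
		}
};

std::set<Multiplier> rat_set;                  // shared storage for multipliers

struct toy_node {
	std::set<Multiplier>::iterator multiplier;
	toy_node()
		{
		// Even though the value 1 is almost always in the set already,
		// every construction walks the set again just to find it.
		multiplier = rat_set.insert(Multiplier{1,1}).first;
		}
};
```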

@kpeeters (Owner) commented Aug 7, 2025

I made some changes recently to Multiplier which may have removed this bottleneck (it's in 2.5.14).

@dpbutter (Contributor, Author) commented Aug 7, 2025

The original data I compared against was indeed from 2.5.12. Repeating it with 2.5.14 helps a bit -- that takes the 27 s operation I mentioned above down to 23 s. So I guess Lazy_Ex circumvents more than just Multiplier, because it still brings that down to 7 s. I should benchmark how many Ex creations happen in my operation, but I haven't done so yet.

@dpbutter (Contributor, Author) commented Aug 15, 2025

I collected some statistics for typical Ex objects in a long run. Below I give a dictionary where the key is the size of the Ex object that would have been created and the value is the count of such objects. Unsurprisingly, it is highly dominated by objects with one node (presumably indices or objects with erased children).

1: 115746460
2: 2076
3: 43964
4: 34188
5: 64196
7: 7842
8: 16
9: 212
10: 82
11: 380
13: 248
14: 192
15: 764
16: 144
17: 620
18: 580
19: 1280
20: 544
21: 512
22: 644
23: 180
24: 250
25: 324
26: 112
27: 272
28: 188
29: 4
31: 216
32: 252
34: 72
85: 32
90: 32
96: 94
101: 94
625: 36
630: 36
932: 24
937: 24
1484: 4
1738: 4
3619: 12
3624: 12

@kpeeters (Owner) commented

That suggests that there is potentially a lot to be gained if we can make single-node Ex creation and destruction faster. The Ex copy constructor is pretty expensive right now for single nodes. First of all, it allocates not just 1 node but 3, as there are always the head and feet nodes of the tree. And then the copy_ function is not particularly clever, because it first sets up the top-level node and then replaces that node with a copy of itself (I think I wrote replace first and then realised that this was a quick way to get a full copy, but it's pretty stupid). That is another alloc/dealloc, so it does 5 alloc/dealloc operations in total, instead of just 1. I'll take a shot at removing that overhead, because it's idiotic (for what I should have realised will always be the most relevant case by a large margin).
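
Schematically, with a toy tree<std::string> standing in for Ex (illustrative only, not the actual copy_ code):

```cpp
#include <string>
#include "tree.hh"

// Roughly what the current path does for a single node: the tree constructor
// allocates the head/feet sentinels, set_head allocates a placeholder node,
// and replace() then allocates the copy and frees the placeholder -- five
// alloc/dealloc operations in total.
tree<std::string> copy_single_node_now(tree<std::string>::iterator src)
	{
	tree<std::string> ret;
	auto top = ret.set_head(std::string());
	ret.replace(top, src);
	return ret;
	}

// What the cheap path for a single node could look like: copy just the value,
// with only the unavoidable sentinel allocations on top.
tree<std::string> copy_single_node_direct(tree<std::string>::iterator src)
	{
	tree<std::string> ret;
	ret.set_head(*src);
	return ret;
	}
```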

Of course your PR avoids more overhead, in particular the overhead of the slow operator< for str_node, which needs to compare strings and multipliers, while you just compare pointers. So I think it's still useful to merge this. I just need to make sure that this is not going to introduce complexity which will make life harder further down the road.
