Compare lazy ex #376
…titute to use this.
…for debugging. Added an extra cell_id = 0 statement so the console debugger would work in Python.
After passing testing, removed all testing structure and renamed new_replacement_map to replacement_map.
P.S. The penultimate commit includes an entire testing apparatus where I compared existing runs of |
What does the typical Ex look like for which you are trying to optimise? Is it one head node with a few children? Or just single nodes without children at all?
I'll take a deeper look at this later and try to profile them. (I could also send you the scripts I'm using if it would be useful.) I admit I was quite surprised by the cost of Ex creation and how much of a benefit revising replacement_map did. My solution does seem a kludge; probably there is a better/smarter way of circumventing it, but would require a greater rewrite of your existing code logic, which I wanted to avoid. One possibility, which I didn't explore, is that the real time issue is with multiplier insertion into |
I made some changes recently to |
My original data that I compared against was indeed from 2.5.12. Repeating it for 2.5.14 helps a bit -- that takes the 27s operation I mentioned above to 23s. So I guess Lazy_Ex circumvents more than just Multiplier, because it still kicks that down to 7s. I should benchmark how many Ex creations happen in my operation, but I haven't done so yet.
I collected some statistics for typical Ex objects in a long run. Below I give a dictionary where the key is the size of the Ex object that would have been created and the value is the count of such objects. Unsurprisingly, it is highly dominated by objects with one node (presumably indices or objects with erased children).
1: 115746460
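For illustration, a size histogram of this kind could be gathered roughly as follows. This is only a sketch on a toy tree: Node, tree_size and SizeHistogram are hypothetical names, not part of the code base, and the real Ex is a tree container rather than this struct.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an expression tree (hypothetical; not the real Ex class).
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Number of nodes in the subtree rooted at n, including n itself.
std::size_t tree_size(const Node& n) {
    std::size_t total = 1;
    for (const Node& c : n.children) total += tree_size(c);
    return total;
}

// Maps "size of the object that would have been created" to "how often".
struct SizeHistogram {
    std::map<std::size_t, unsigned long long> counts;
    void record(const Node& n) { ++counts[tree_size(n)]; }
};

// Usage: record two single-node objects and one two-node object.
void example() {
    SizeHistogram h;
    Node leaf{"m", {}};
    Node parent{"A", {leaf}};
    h.record(leaf);
    h.record(leaf);
    h.record(parent);
    assert(h.counts[1] == 2);
    assert(h.counts[2] == 1);
}
```

Calling record at each point where an object would be constructed yields a dictionary of the same shape as the one quoted above.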
That suggests that there is potentially a lot to be gained if we can make single-node Of course your PR avoids more overhead, in particular the overhead of the slow |
This is a reworking of a small part of Ex_comparator to make it faster. (I measure ~2.6x speedup on general workflows, up to 4x speedup on very elaborate substitutes with many rules. For example, one typical substitute went from 27 s to 7 s.)
The underlying comparator used replacement_map, mapping Ex objects to Ex objects. These objects are created and destroyed many times in involved computations, yet they differ from existing objects only by a small number of operations (modifying parent relations or eliminating children). To save on clock cycles, I've introduced Lazy_Ex, which is a glorified wrapper around an Ex::iterator and a flag indicating what modification to perform. The idea is to avoid Ex creation as much as possible.
The Lazy_Ex provides a resolve() routine which will apply the operation and return a new Ex only when needed.
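To make the idea concrete, here is a minimal sketch of a lazy wrapper on a toy tree. It is not the code in this PR: Node, LazyOp and LazyNode are hypothetical stand-ins for Ex, the flag and Lazy_Ex, a raw pointer stands in for Ex::iterator, and only the "eliminate children" operation is modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an expression tree (hypothetical; not the real Ex class).
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Hypothetical flag naming the modification that has been deferred.
enum class LazyOp { none, erase_children };

// Toy lazy wrapper: a non-owning pointer into an existing tree plus a flag.
// Creating one copies nothing.
struct LazyNode {
    const Node* it;
    LazyOp op;

    // Build a real, modified copy only when a caller actually needs one.
    Node resolve() const {
        if (op == LazyOp::erase_children) return Node{it->name, {}};
        return *it;
    }
};

// Usage: the wrapper is cheap to create; the copy happens inside resolve().
void example() {
    Node a{"A", {Node{"m", {}}, Node{"n", {}}}};
    LazyNode lazy{&a, LazyOp::erase_children};
    Node bare = lazy.resolve();
    assert(bare.name == "A" && bare.children.empty());
    assert(a.children.size() == 2);  // the original tree is untouched
}
```

Because the wrapper is non-owning, the tree it points into must outlive it; that lifetime assumption belongs to this sketch and is not a statement about the PR's implementation.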
However, most of the objects in replacement_map are just passed to subtree_compare, so I modified subtree_compare to natively accept Lazy_Ex objects, applying the flagged operation during the tree comparison. In practice, creation of such a new Ex is therefore minimized.
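The comparison-side idea can be sketched on the same kind of toy tree. Everything here is hypothetical: same_tree returns a plain bool, whereas the real subtree_compare has a richer signature and return value, and only the "eliminate children" flag is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins (hypothetical; not the real Ex / Lazy_Ex classes).
struct Node {
    std::string name;
    std::vector<Node> children;
};
enum class LazyOp { none, erase_children };
struct LazyNode {
    const Node* it;
    LazyOp op;
};

// Plain structural equality of two trees.
bool same_tree(const Node& a, const Node& b) {
    if (a.name != b.name || a.children.size() != b.children.size()) return false;
    for (std::size_t i = 0; i < a.children.size(); ++i)
        if (!same_tree(a.children[i], b.children[i])) return false;
    return true;
}

// Compare as if the flagged operation had already been applied to lhs,
// without ever constructing the modified copy.
bool same_tree(const LazyNode& lhs, const Node& rhs) {
    if (lhs.op == LazyOp::erase_children)
        return lhs.it->name == rhs.name && rhs.children.empty();
    return same_tree(*lhs.it, rhs);
}

// Usage: "A with its children erased" matches a bare "A" with no copy made.
void example() {
    Node a{"A", {Node{"m", {}}}};
    Node bare{"A", {}};
    assert(!same_tree(a, bare));
    assert(same_tree(LazyNode{&a, LazyOp::erase_children}, bare));
}
```

The point of the overload is that the flag is interpreted inside the comparison itself, so a resolved copy is only needed on the comparatively rare paths that require a standalone object.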