Code Generation: Constant creation with NOT(0) + shifts#16729
Conversation
|
I meant to have all commits show as their original commit author. They do, but only if you click on them. Full credit for this optimization approach goes to @moh-eulith. |
|
I'll go through the failing tests and update them as needed. |
|
Stats from the output of Bytecode size
Gas
|
blishko
left a comment
There was a problem hiding this comment.
I have glanced over the code and I am generally in favour of this change.
It seems this could be further split into two separate changes and we could evaluate them separately.
One thing I don't like is the duplication of the code in the evmasm optimizer and in the Yul optimizer.
I think we would ideally implement this change only at the evmasm level.
(I think we could actually rip out the whole Yul constant optimizer, see #16738.)
| // pure negation can sometimes produce bad results | ||
| // example: 0xff00000000000000000000000000000000000000000000000000000000000000 | ||
| // 0xff at the most significant byte of u256 | ||
| // without the extra condition: not(sub(shl(0xf8, 0x01), 0x01)) | ||
| // the extra condition turns that into: shl(0xf8, 0xff) | ||
| if ( | ||
| numberEncodingSize(~_value) < numberEncodingSize(_value) && | ||
| (onesEnd < 256 || onesEnd - onesStart > 16) | ||
| ) |
There was a problem hiding this comment.
This is interesting!
We could try to separate this change and see what effect we get from this small change.
There was a problem hiding this comment.
this depends on the computed mask variables (onesEnd, onesStart), so separating it would require a chunk of code duplicated in both PRs.
There was a problem hiding this comment.
What I am suggesting is to keep the computation of onesEnd and onesStart, but skip the computation of newRoutine. Do you expect that would still provide some benefit?
There was a problem hiding this comment.
Yes, it would provide a tiny benefit for the rare cases like the given example (easy to see in a test case, hardly move the needle for real contracts). IMHO, not worth a separate PR.
P.S. Did you notice yul optimizer doesn't need this? It's because it's structurally better than the libevmasm version (yul compares different computations; libevasm compares a single computation with other choices, like DATACOPY, so the single computation has to have extra conditionals).
There was a problem hiding this comment.
P.S. Did you notice yul optimizer doesn't need this? It's because it's structurally better than the libevmasm version (yul compares different computations; libevasm compares a single computation with other choices, like DATACOPY, so the single computation has to have extra conditionals).
I definitely see some differences. If yul version does something better, the evmasm version could be improved to do the same thing, no?
In any case, no need to separate this into a new PR.
There was a problem hiding this comment.
Removing the yul optimizer on top of the not(0) constant optimizer makes the yul code even more optimized
That's weird. The tests included in this repo's CI tell a different story. When I added the not(0) optimizations to the yul path, there were improvements. Compare
#15935 (comment)
and
#15935 (comment)
For example, "colony" went from -1.91 to -2.33, etc.
There was a problem hiding this comment.
This is using the not(0) constant optimizer as the baseline here, so it's comparing the not(0) constant optimizer PR alone (0 changes by definition) vs having both the not(0) constant optimizer and removing the yul constant optimizer (the spat version).
The not(0) constant optimizer is definitely a win in either case.
There was a problem hiding this comment.
Okay, I now get it, and now I also understand why on the performance optimization side I've been seeing things that only speed up yul speed up the regular compiler too.
appendYulUtilityFunctions() is called by appendMissingFunctions(), and places yul code into contracts. Doing this runs the yul side, including yul optimizers, even on non ir-contracts.
There was a problem hiding this comment.
@moh-eulith Here's the chart, with both against a baseline of the current development branch. This is probably what you were expecting.
There was a problem hiding this comment.
I assume "splat" is without the yul optimizer. I think the code this is benchmarking against matters. I just tried this on the my own contracts: removing yul-constant-optimizer increased the contract total size from 61,593 to 61,903 (I compile with --via-ir --optimize --optimize-runs 2 so presumably similar to your via-ir-low-runs).
@blishko 's PR shows the same thing (#16738 (comment)) where ir-optimize-evm+yul has some regressions and some improvements.

Description
Right or left aligned constants in 12 gas, 5 bytes, vs current 18 gas, 9 bytes.
PR #15935 proposed using NOT(0) and then shifts to create constants. This is more bytecode efficient than the current computed constants in most situations. This results in 2.9% improvement in optimized (200 runs) bytecode size, as well as a slight average gas improvement. via-ir optimization results show a similar improvement.
The underlying concepts are sound, but I wanted the code to reflect that underlying simplicity. This PR:
The behavior should be identical between both PRs. Both PR's result in identical benchmark gas usage and bytecode size. I also wrote a separate harness (not included in PR) to test this algorithm for correctness against each possible contiguous run of 1s.
Why we should do this:
There are a lot of ways to optimize constant creation or masking operations. I think this is the one that should land immediately:
Checklist
AI Disclosure
Usage: