samyron
Contributor

@samyron samyron commented Aug 13, 2025

Changelog 📓

  • Use a segmented buffer for the OutputStream to reduce the number of System.arraycopy calls made each time the output buffer is resized.
  • Refactor StringEncoder#encode to include a SWAR-based (SIMD within a register) fast path for basic JSON encoding. The algorithm is from this post. It's the same as the vector-based algorithm in the C extension.

These features can be toggled with the system properties json.useSegmentedOutputStream and json.useSWARBasicEncoder, both of which default to true. I'm happy to remove these toggles; they made testing and benchmarking much easier.
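The first changelog item can be illustrated with a minimal sketch of a segmented buffer. This is not the PR's SegmentedByteListDirectOutputStream; the class name SegmentedBufferSketch is hypothetical, and it only assumes the sizing described later in this thread (a 1024-byte first segment, each later segment twice as large, at most 21 segments). When the buffer grows, bytes already written stay in place; they are copied once, when the segments are flattened.

```java
import java.io.OutputStream;
import java.util.Objects;

// Hypothetical sketch, not the PR's SegmentedByteListDirectOutputStream.
final class SegmentedBufferSketch extends OutputStream {
    private static final int FIRST_SEGMENT_SIZE = 1024;
    // 21 doubling segments starting at 1024 bytes hold 2^31 - 1024 bytes in total.
    private static final int MAX_SEGMENTS = 21;

    private final byte[][] segments = new byte[MAX_SEGMENTS][];
    private int segmentIndex = 0;      // segment currently being filled
    private int positionInSegment = 0; // next free slot in that segment
    private int totalLength = 0;

    SegmentedBufferSketch() {
        segments[0] = new byte[FIRST_SEGMENT_SIZE];
    }

    @Override
    public void write(int b) {
        ensureRoom();
        segments[segmentIndex][positionInSegment++] = (byte) b;
        totalLength++;
    }

    @Override
    public void write(byte[] src, int off, int len) {
        Objects.checkFromIndexSize(off, len, src.length);
        while (len > 0) {
            ensureRoom();
            byte[] current = segments[segmentIndex];
            int n = Math.min(len, current.length - positionInSegment);
            System.arraycopy(src, off, current, positionInSegment, n);
            positionInSegment += n;
            totalLength += n;
            off += n;
            len -= n;
        }
    }

    // When the current segment is full, allocate a new one twice as large.
    // Bytes already buffered are not moved.
    private void ensureRoom() {
        if (positionInSegment == segments[segmentIndex].length) {
            if (segmentIndex + 1 == MAX_SEGMENTS) {
                throw new IllegalStateException("buffer would exceed Integer.MAX_VALUE bytes");
            }
            int nextSize = segments[segmentIndex].length * 2;
            segments[++segmentIndex] = new byte[nextSize];
            positionInSegment = 0;
        }
    }

    int size() {
        return totalLength;
    }

    // Flatten into one array: buffered bytes are copied once here, never during growth.
    byte[] toByteArray() {
        byte[] out = new byte[totalLength];
        int written = 0;
        for (int i = 0; i <= segmentIndex; i++) {
            int n = (i == segmentIndex) ? positionInSegment : segments[i].length;
            System.arraycopy(segments[i], 0, out, written, n);
            written += n;
        }
        return out;
    }
}
```

With a single array that doubles on resize, every growth step copies everything written so far; here growth is just an allocation, and the one bulk copy happens in toByteArray.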

Benchmarks

SegmentedByteListDirectOutputStream + SWAR

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-realworld.rb

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.741k i/100ms
Calculating -------------------------------------
                json     18.378k (± 6.3%) i/s   (54.41 μs/i) -    182.805k in  10.011722s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    85.000 i/100ms
Calculating -------------------------------------
                json    857.615 (± 1.3%) i/s    (1.17 ms/i) -      8.585k in  10.012075s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   185.000 i/100ms
Calculating -------------------------------------
                json      1.849k (± 1.0%) i/s  (540.77 μs/i) -     18.500k in  10.005181s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.558k i/100ms
Calculating -------------------------------------
                json     25.217k (± 1.1%) i/s   (39.66 μs/i) -    253.242k in  10.043890s

ByteListDirectOutputStream + SWAR

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=false -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-realworld.rb

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.560k i/100ms
Calculating -------------------------------------
                json     15.622k (± 0.8%) i/s   (64.01 μs/i) -    157.560k in  10.086737s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    87.000 i/100ms
Calculating -------------------------------------
                json    875.692 (± 0.9%) i/s    (1.14 ms/i) -      8.787k in  10.035282s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   182.000 i/100ms
Calculating -------------------------------------
                json      1.818k (± 0.8%) i/s  (550.15 μs/i) -     18.200k in  10.013389s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.544k i/100ms
Calculating -------------------------------------
                json     25.319k (± 0.9%) i/s   (39.50 μs/i) -    254.400k in  10.048804s

ByteListDirectOutputStream + Scalar

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=false -Djson.useSWARBasicEncoder=false' ruby -I"lib" benchmark/encoder-realworld.rb

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.078k i/100ms
Calculating -------------------------------------
                json     10.829k (± 2.5%) i/s   (92.35 μs/i) -    108.878k in  10.062513s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    78.000 i/100ms
Calculating -------------------------------------
                json    810.901 (± 2.8%) i/s    (1.23 ms/i) -      8.112k in  10.013134s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   128.000 i/100ms
Calculating -------------------------------------
                json      1.269k (± 3.3%) i/s  (788.26 μs/i) -     12.672k in  10.001657s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.178k i/100ms
Calculating -------------------------------------
                json     21.633k (± 1.0%) i/s   (46.23 μs/i) -    217.800k in  10.068853s

SegmentedByteListDirectOutputStream + Scalar

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=false' ruby -I"lib" benchmark/encoder-realworld.rb

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.014k i/100ms
Calculating -------------------------------------
                json     10.203k (± 0.8%) i/s   (98.01 μs/i) -    102.414k in  10.037929s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    79.000 i/100ms
Calculating -------------------------------------
                json    814.479 (± 2.1%) i/s    (1.23 ms/i) -      8.216k in  10.092101s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   136.000 i/100ms
Calculating -------------------------------------
                json      1.358k (± 1.0%) i/s  (736.45 μs/i) -     13.600k in  10.016731s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.246k i/100ms
Calculating -------------------------------------
                json     21.987k (± 1.6%) i/s   (45.48 μs/i) -    220.108k in  10.013722s

master (as of commit 37e6890)

% ONLY=json ruby -I"lib" benchmark/encoder-realworld.rb 

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   951.000 i/100ms
Calculating -------------------------------------
                json      9.517k (± 0.8%) i/s  (105.08 μs/i) -     96.051k in  10.093716s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    84.000 i/100ms
Calculating -------------------------------------
                json    843.486 (± 1.1%) i/s    (1.19 ms/i) -      8.484k in  10.059526s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   145.000 i/100ms
Calculating -------------------------------------
                json      1.448k (± 0.8%) i/s  (690.73 μs/i) -     14.500k in  10.016276s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.342k i/100ms
Calculating -------------------------------------
                json     23.073k (± 0.8%) i/s   (43.34 μs/i) -    231.858k in  10.049473s

@byroot byroot requested a review from headius August 13, 2025 14:03
private static final int DEFAULT_CAPACITY = 1024;

private int totalLength;
private byte[][] segments = new byte[21][];
Contributor Author

Why 21? The minimum segment size is 1024 for the first segment. The code doubles the segment size for each additional segment. Based on this doubling, we only need 21 segments before we hit Integer.MAX_VALUE.
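A quick check of that arithmetic, assuming the limit applies to the combined size of all segments: after k doubling segments the capacity is 1024 * (2^k - 1) bytes, so 21 segments hold 2^31 - 1024 bytes, just under Integer.MAX_VALUE (2^31 - 1), while a 22nd segment would overshoot it. The helper below is hypothetical, not part of the PR.

```java
// Counts how many doubling segments, starting at firstSegmentSize bytes,
// fit before their combined capacity would exceed limit.
final class SegmentCount {
    static int segmentsNeeded(long firstSegmentSize, long limit) {
        long capacity = 0;
        long size = firstSegmentSize;
        int count = 0;
        while (capacity + size <= limit) {
            capacity += size;
            size *= 2;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(segmentsNeeded(1024, Integer.MAX_VALUE)); // prints 21
    }
}
```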

Contributor

Makes sense. 👏

Maybe a comment or well-named constant so nobody else asks that question in the future?

@samyron
Contributor Author

samyron commented Aug 14, 2025

Synthetic benchmarks of encoding an array of 128-byte ASCII strings.

benchmark_encoding "bytes.128.bestcase", ([("a" * 128)] * 10000)
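As a sanity check on the payload size reported below, assuming compact output with no whitespace: each element is 128 bytes plus two quotes, there are 9,999 commas between the 10,000 elements, and two brackets surround the array, giving 10,000 * 130 + 9,999 + 2 = 1,310,001 bytes. The helper name is hypothetical.

```java
// Predicts the compact JSON size of an array of `count` ASCII strings,
// each `stringLength` bytes long and needing no escaping.
final class PayloadSize {
    static long jsonArrayOfStringsSize(int count, int stringLength) {
        long strings = (long) count * (stringLength + 2); // each string plus its two quotes
        long commas = Math.max(0, count - 1);              // separators between elements
        return strings + commas + 2;                      // plus '[' and ']'
    }

    public static void main(String[] args) {
        System.out.println(jsonArrayOfStringsSize(10_000, 128)); // prints 1310001
    }
}
```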

SegmentedByteListDirectOutputStream + SWAR

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   256.000 i/100ms
Calculating -------------------------------------
                json      2.561k (± 0.9%) i/s  (390.48 μs/i) -     25.600k in   9.997219s

ByteListDirectOutputStream + Scalar (effectively the same code as master)

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=false -Djson.useSWARBasicEncoder=false' ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   137.000 i/100ms
Calculating -------------------------------------
                json      1.376k (± 1.2%) i/s  (726.60 μs/i) -     13.837k in  10.055507s

SegmentedByteListDirectOutputStream + Scalar

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=false' ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   141.000 i/100ms
Calculating -------------------------------------
                json      1.424k (± 0.8%) i/s  (702.28 μs/i) -     14.241k in  10.001896s

ByteListDirectOutputStream + SWAR

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=false -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   254.000 i/100ms
Calculating -------------------------------------
                json      2.558k (± 1.5%) i/s  (390.92 μs/i) -     25.654k in  10.030970s

Master

% ONLY=json ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   134.000 i/100ms
Calculating -------------------------------------
                json      1.334k (± 3.6%) i/s  (749.69 μs/i) -     13.400k in  10.062253s


if (pos + 4 <= len) {
int x = bb.getInt(ptr + pos);
int is_ascii = 0x808080 & ~x;
Member

This hex number only checks 3 bytes.
Maybe 0x808080 should be 0x80808080?

Contributor Author

Great catch, thank you! Late night coding without my glasses...

Interestingly no spec failed. I'll try to address that.
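To illustrate the bug and the corrected mask, here is a small sketch. Reading the bytes {0x80, 'A', 'A', 'A'} with the big-endian ByteBuffer#getInt gives 0x80414141, and the three-byte mask 0x808080 never looks at the top byte, so the non-ASCII byte slips through; 0x80808080 catches it. The same SWAR idea extends to 8-byte longs to detect bytes that basic JSON encoding must escape (control characters below 0x20, the double quote and the backslash). The helper names are hypothetical, this is not the PR's encodeBasicSWAR, and the sketch assumes non-ASCII bytes are passed through unescaped in basic mode.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers illustrating SWAR checks; not the PR's implementation.
final class SwarSketch {
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGHS = 0x8080808080808080L;

    // True if any byte selected by mask has its high bit set (a non-ASCII byte).
    // 0x80808080 covers all four bytes of an int; 0x808080 misses the top byte.
    static boolean hasNonAscii(int word, int mask) {
        return (word & mask) != 0;
    }

    // True if any unsigned byte of x is < n. Exact as a yes/no answer for 0 <= n <= 128,
    // but borrows can set flag bits above the first match, so it does not locate the byte.
    static boolean hasByteLessThan(long x, int n) {
        return ((x - ONES * n) & ~x & HIGHS) != 0;
    }

    // True if any byte of x equals b (XOR turns matching bytes into zero bytes).
    static boolean hasByte(long x, int b) {
        return hasByteLessThan(x ^ (ONES * b), 1);
    }

    // True if any of the 8 bytes needs escaping in basic JSON mode.
    // Assumption: bytes >= 0x80 (UTF-8 sequences) are passed through unescaped.
    static boolean needsEscape(long x) {
        return hasByteLessThan(x, 0x20) || hasByte(x, '"') || hasByte(x, '\\');
    }

    // Skips 8-byte chunks that need no escaping. Returns the start of the first chunk
    // containing a byte to escape, or of the tail shorter than 8 bytes,
    // where a scalar loop would take over.
    static int skipPlainChunks(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        int pos = 0;
        while (pos + 8 <= bytes.length && !needsEscape(bb.getLong(pos))) {
            pos += 8;
        }
        return pos;
    }
}
```

Because the word-level test only answers "is there such a byte somewhere in this word", finding the exact position is left to the scalar loop.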

@samyron
Contributor Author

samyron commented Aug 18, 2025

As of commit c3d02b08b0708b9fb6eec2fcd819224706418985, I refactored the SWAR implementation into its own subclass of StringEncoder. I did so after the JITWatch suggestions hinted that the encodeBasic and encodeBasicSWAR methods could not be inlined into StringEncoder#encode. That implied at least one conditional branch on every StringEncoder#encode call when the SWAR implementation was in use.
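A minimal sketch of that pattern, with hypothetical class names: the implementation is chosen once, when the encoder is created, so the hot scan method never tests a flag. At a call site that only ever sees one receiver type, HotSpot can usually devirtualize and inline the call, though whether it actually does depends on its profiling, which is what JITWatch helps confirm.

```java
import java.nio.ByteBuffer;

// Hypothetical names throughout; this only illustrates choosing the implementation once.
class StringEncoderSketch {
    // Factory: the flag is consulted here, once, instead of on every scan call.
    static StringEncoderSketch create(boolean useSwar) {
        return useSwar ? new SwarStringEncoderSketch() : new StringEncoderSketch();
    }

    // Scalar path: returns the index of the first byte at or after `from` that needs
    // escaping in basic JSON mode, or bytes.length if there is none.
    int scan(byte[] bytes, int from) {
        for (int i = from; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            if (b < 0x20 || b == '"' || b == '\\') {
                return i;
            }
        }
        return bytes.length;
    }
}

final class SwarStringEncoderSketch extends StringEncoderSketch {
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGHS = 0x8080808080808080L;

    // True iff some unsigned byte of x is < n (valid for 0 <= n <= 128).
    private static boolean hasByteLessThan(long x, int n) {
        return ((x - ONES * n) & ~x & HIGHS) != 0;
    }

    // Control characters, double quote, or backslash anywhere in the 8 bytes.
    private static boolean needsEscape(long x) {
        return hasByteLessThan(x, 0x20)
            || hasByteLessThan(x ^ (ONES * '"'), 1)
            || hasByteLessThan(x ^ (ONES * '\\'), 1);
    }

    // SWAR fast path: skip 8 clean bytes at a time, then let the scalar loop
    // find the exact byte (or handle the tail shorter than 8 bytes).
    @Override
    int scan(byte[] bytes, int from) {
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        int pos = from;
        while (pos + 8 <= bytes.length && !needsEscape(bb.getLong(pos))) {
            pos += 8;
        }
        return super.scan(bytes, pos);
    }
}
```

Both implementations return the same index for any input; the subclass only changes how quickly clean runs of bytes are skipped.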

Benchmarks as of this commit

SWAR + SegmentedByteListDirectOutputStream

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-realworld.rb
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.821k i/100ms
Calculating -------------------------------------
                json     18.262k (± 0.8%) i/s   (54.76 μs/i) -    183.921k in  10.071834s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    90.000 i/100ms
Calculating -------------------------------------
                json    904.604 (± 1.4%) i/s    (1.11 ms/i) -      9.090k in  10.050832s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   192.000 i/100ms
Calculating -------------------------------------
                json      1.867k (± 9.8%) i/s  (535.52 μs/i) -     18.432k in  10.061992s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.488k i/100ms
Calculating -------------------------------------
                json     26.728k (± 4.9%) i/s   (37.41 μs/i) -    266.216k in  10.000183s

SWAR + ByteListDirectOutputStream

Note: This did seem like a particularly good run, at least for the activitypub.json benchmark.

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.866k i/100ms
Calculating -------------------------------------
                json     18.740k (± 0.7%) i/s   (53.36 μs/i) -    188.466k in  10.057320s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    87.000 i/100ms
Calculating -------------------------------------
                json    875.255 (± 1.4%) i/s    (1.14 ms/i) -      8.787k in  10.041293s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   183.000 i/100ms
Calculating -------------------------------------
                json      1.829k (± 1.3%) i/s  (546.89 μs/i) -     18.300k in  10.009902s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.582k i/100ms
Calculating -------------------------------------
                json     25.316k (± 0.9%) i/s   (39.50 μs/i) -    255.618k in  10.097779s

@samyron
Contributor Author

samyron commented Aug 18, 2025

I'm happy to disable the SegmentedByteListDirectOutputStream by default or remove it from this PR entirely. On my MacBook Air M1 it does seem to help a bit with some benchmarks, and it also seems somewhat more resilient to changes in the order of the benchmarks. However, it doesn't seem to help as much on my MacBook Pro M4. I don't have current M4 benchmarks to post, but I will rerun them tomorrow as of the commit above.

@samyron
Contributor Author

samyron commented Aug 19, 2025

Benchmarks from a MacBook Pro M4. I ran these many times and the results vary a bit from run to run, so I grabbed a random sample. The big surprise is the activitypub.json result for SegmentedByteListDirectOutputStream + Scalar (18.096k i/s versus 9.393k i/s for ByteListDirectOutputStream + Scalar). I didn't expect it to make that big of a difference, especially since the other benchmarks were much closer.

Note: although I don't have the numbers here, if I run the citm_catalog benchmark before activitypub, the SegmentedByteListDirectOutputStream does perform on both of those. The data shape of citm_catalog is quite different from that of activitypub, so HotSpot is probably making different decisions about what to optimize and how. It's also possible I'm not running the benchmarks long enough for the results to stabilize.

SegmentedByteListDirectOutputStream + SWAR

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.311k i/100ms
Calculating -------------------------------------
                json     23.693k (± 1.0%) i/s   (42.21 μs/i) -    473.755k in  19.997219s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   130.000 i/100ms
Calculating -------------------------------------
                json      1.290k (± 1.2%) i/s  (775.31 μs/i) -     25.870k in  20.060511s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   256.000 i/100ms
Calculating -------------------------------------
                json      2.544k (± 1.0%) i/s  (393.06 μs/i) -     50.944k in  20.026085s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     3.477k i/100ms
Calculating -------------------------------------
                json     34.263k (± 0.8%) i/s   (29.19 μs/i) -    688.446k in  20.094387s

ByteListDirectOutputStream + SWAR

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.246k i/100ms
Calculating -------------------------------------
                json     22.857k (± 1.2%) i/s   (43.75 μs/i) -    458.184k in  20.048535s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   134.000 i/100ms
Calculating -------------------------------------
                json      1.324k (± 1.4%) i/s  (755.18 μs/i) -     26.532k in  20.040921s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   269.000 i/100ms
Calculating -------------------------------------
                json      2.710k (± 1.4%) i/s  (368.97 μs/i) -     54.338k in  20.053211s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     3.700k i/100ms
Calculating -------------------------------------
                json     37.012k (± 1.1%) i/s   (27.02 μs/i) -    743.700k in  20.095805s

SegmentedByteListDirectOutputStream + Scalar

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.798k i/100ms
Calculating -------------------------------------
                json     18.096k (± 1.1%) i/s   (55.26 μs/i) -    363.196k in  20.073377s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   118.000 i/100ms
Calculating -------------------------------------
                json      1.184k (± 0.9%) i/s  (844.54 μs/i) -     23.718k in  20.032471s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   197.000 i/100ms
Calculating -------------------------------------
                json      1.980k (± 1.0%) i/s  (505.07 μs/i) -     39.597k in  20.001127s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     3.070k i/100ms
Calculating -------------------------------------
                json     30.296k (± 0.9%) i/s   (33.01 μs/i) -    607.860k in  20.065977s

ByteListDirectOutputStream + Scalar

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   937.000 i/100ms
Calculating -------------------------------------
                json      9.393k (± 1.2%) i/s  (106.46 μs/i) -    188.337k in  20.052987s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json    96.000 i/100ms
Calculating -------------------------------------
                json    955.553 (± 0.9%) i/s    (1.05 ms/i) -     19.200k in  20.095038s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   182.000 i/100ms
Calculating -------------------------------------
                json      1.824k (± 1.3%) i/s  (548.28 μs/i) -     36.582k in  20.060380s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.721k i/100ms
Calculating -------------------------------------
                json     26.785k (± 1.0%) i/s   (37.33 μs/i) -    536.037k in  20.014396s

@headius
Contributor

headius commented Aug 19, 2025

@samyron Great results! I think we could go ahead with this any time, pending my couple of minor review comments that should be addressed. The segmented stream is consistently faster than the old logic, and coupled with SWAR it can be much faster. I'd like to see this land so we can get back to playing with the vector API.

@samyron
Contributor Author

samyron commented Aug 22, 2025

> @samyron Great results! I think we could go ahead with this any time, pending my couple of minor review comments that should be addressed. The segmented stream is consistently faster than the old logic, and coupled with SWAR it can be much faster. I'd like to see this land so we can get back to playing with the vector API.

@headius I'm happy to address the comments but I don't see any review comments on this PR...

Contributor

@headius headius left a comment

Only minor changes needed

@@ -114,6 +115,17 @@ class StringEncoder extends ByteListTranscoder {

protected final byte[] escapeTable;

private static final String USE_SWAR_BASIC_ENCODER_PROP = "json.useSWARBasicEncoder";
Contributor

Let's prefix this with "jruby." like other properties in JRuby and other libraries.
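For illustration, a sketch of reading these toggles under a "jruby." prefix. The exact property names here are hypothetical and the names finally adopted may differ. Because both toggles default to true, the sketch passes an explicit "true" fallback to System.getProperty; Boolean.getBoolean(name) would instead treat an unset property as false.

```java
// Hypothetical property names following the suggested "jruby." prefix.
final class JsonFlags {
    static final String USE_SWAR_BASIC_ENCODER_PROP = "jruby.json.useSWARBasicEncoder";
    static final String USE_SEGMENTED_OUTPUT_STREAM_PROP = "jruby.json.useSegmentedOutputStream";

    // Returns true unless the property is set to something other than "true"
    // (case-insensitive), so an unset property keeps the feature enabled.
    static boolean flag(String name) {
        return Boolean.parseBoolean(System.getProperty(name, "true"));
    }
}
```

With these names, passing -Djruby.json.useSWARBasicEncoder=false in JAVA_OPTS would disable the SWAR path.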

@headius
Contributor

headius commented Aug 22, 2025

@samyron D'oh, I had started a review but never submitted it. Just a couple of minor changes and we can merge.

@byroot byroot requested a review from headius August 27, 2025 18:44
@headius
Contributor

headius commented Aug 27, 2025

Ship it!
