
NoiseByNorthwest
Owner

The v0.5 branch is currently under development, but I’m making it publicly available in its current state so anyone can benefit from the ongoing improvements.

Most of the work so far has focused on deep optimizations within the profiler core. As a result, the overhead introduced by SPX is now lower than that of XHProf. Compared to SPX v0.4, the overhead in v0.5 has been reduced by a factor of 4.

While I haven’t had time to publish the benchmark I use to compare the overhead of Xdebug, SPX, and XHProf, I plan to make it public closer to the end of v0.5’s development. In the meantime, all details about the optimizations can be found in the commit log.

If you try out this branch, I’d be very interested in hearing your feedback (especially whether you notice improvements in performance and overhead).

The remaining work before v0.5 can be officially released is a full rewrite of the web interface. This rewrite aims to:

  • provide a more maintainable codebase
  • deliver a cleaner and more modern UI (with support for color themes)
  • improve performance to make large reports (10M+ calls) easier to work with
  • fix some precision issues that are hard to fix in the current codebase
  • enhance the analysis widgets, especially for reports with deep call stacks

Once this rewrite is complete, v0.5 will be released. Further improvements, mainly new analysis widgets and better support for modern application servers (such as FrankenPHP), are planned during the lifetime of v0.5.

This patch makes the full stats computation (required only for the "fp"
& "trace" reporters) optional, and disables it for the "full"
reporter which, being the one used by the web UI, is the most
common use case.
The per-function call overhead is thus significantly reduced, going
from ~565ns to ~460ns.
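A minimal sketch of how a reporter could opt out of stats aggregation, assuming each reporter exposes a capability flag that the per-call hook checks. The names `reporter_caps_t`, `needs_full_stats`, `find_reporter()` and `on_function_end()` are hypothetical and do not come from SPX's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reporter capability flag: only reporters that consume
 * aggregated per-function stats ask for them. */
typedef struct {
    const char *name;
    bool needs_full_stats;
} reporter_caps_t;

static const reporter_caps_t reporters[] = {
    { "full",  false }, /* web UI reporter: aggregation skipped */
    { "fp",    true  },
    { "trace", true  },
};

/* Hypothetical per-function aggregate. */
typedef struct {
    unsigned long calls;
    double inc_wall_time;
} func_stats_t;

static const reporter_caps_t *find_reporter(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof reporters / sizeof reporters[0]; i++) {
        if (strcmp(reporters[i].name, name) == 0) {
            return &reporters[i];
        }
    }
    return NULL; /* unknown reporter */
}

/* Called once per function call end: the aggregation cost is only paid
 * when the active reporter actually needs it. */
static void on_function_end(const reporter_caps_t *caps, func_stats_t *stats,
                            double elapsed)
{
    if (!caps->needs_full_stats) {
        return;
    }
    stats->calls++;
    stats->inc_wall_time += elapsed;
}
```

The design choice is simply that the most common case ("full") stops paying for work it never uses, while "fp" and "trace" keep the same behavior.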
As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X)
    and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 460ns.
- xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.

When writing the "full" report file, this patch uses
spx_str_builder_append_long() instead of
spx_str_builder_append_double() whenever possible.

The per-function call overhead is thus significantly reduced, going
from ~460ns to ~427ns.
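The idea of preferring an integer formatter whenever a value has no fractional part can be sketched as follows. This assumes a simple fixed-size string builder; `str_builder_t`, `sb_append_long()`, `sb_append_double()` and `sb_append_number()` are hypothetical stand-ins, not the actual `spx_str_builder_*` API. Hand-rolled integer formatting typically avoids the more expensive floating-point conversion performed by `snprintf()`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size string builder. */
typedef struct {
    char buf[256];
    size_t len;
} str_builder_t;

/* Integer path: simple digit loop, no floating-point formatting. */
static void sb_append_long(str_builder_t *sb, long v)
{
    char tmp[32];
    size_t n = 0;
    /* Unsigned negation also handles LONG_MIN safely. */
    unsigned long u = v < 0 ? 0UL - (unsigned long)v : (unsigned long)v;
    do {
        tmp[n++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (v < 0) {
        tmp[n++] = '-';
    }
    assert(sb->len + n < sizeof sb->buf); /* keep room for the NUL */
    while (n > 0) {
        sb->buf[sb->len++] = tmp[--n]; /* digits were produced in reverse */
    }
    sb->buf[sb->len] = '\0';
}

/* Generic path: full double formatting through snprintf(). */
static void sb_append_double(str_builder_t *sb, double v)
{
    int n = snprintf(sb->buf + sb->len, sizeof sb->buf - sb->len, "%.17g", v);
    assert(n > 0 && (size_t)n < sizeof sb->buf - sb->len);
    sb->len += (size_t)n;
}

/* Use the integer path when the value is integral and fits in a long. */
static void sb_append_number(str_builder_t *sb, double v)
{
    if (v >= (double)LONG_MIN && v < (double)LONG_MAX && (double)(long)v == v) {
        sb_append_long(sb, (long)v);
    } else {
        sb_append_double(sb, v);
    }
}
```

Metric values such as wall time or memory counters are often integral, so in a sketch like this most values would take the cheaper integer path.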

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X)
    and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 427ns.
- xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.

This patch uses Zstandard compression for the "full" report data file.
Zstandard is faster than zlib (DEFLATE) while achieving a better
compression ratio.

For instance, a 208MB report is compressed to:
- 49MB with zlib / level 6.
- 43MB with Zstandard / level 1.

Zstandard is also suitable for the web UI since both Firefox and
Chromium have supported it for some time.

The per-function call overhead is thus significantly reduced, going
from ~427ns to ~273ns.

This patch allows SPX to match the performance of Xhprof.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X)
    and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 273ns.
- Xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.

This patch makes the "full" report data file format lighter through
the use of several techniques, such as relative function IDs and
delta values for metrics.

For instance, the same report file weighs:
- 43MB without this patch.
- 14MB with this patch.

The per-function call overhead is thus significantly reduced, going
from 273ns to 221ns.
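A minimal sketch of delta encoding, under the assumption that "relative function IDs" means storing each function ID as a difference from the previous event's ID, and that metric values are stored as differences from the previous event's value. `event_t`, `encoded_event_t`, `encode_events()` and `decode_events()` are hypothetical. Consecutive events tend to carry close values, so the deltas are short numbers that take fewer characters in a text file and also compress better.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event: function ID plus the absolute metric value
 * (e.g. a wall-time counter in ns) at that event. */
typedef struct {
    int64_t func_id;
    int64_t wall_time;
} event_t;

/* Same event, stored relative to the previous one. */
typedef struct {
    int64_t func_id_delta;   /* relative to the previous event's ID */
    int64_t wall_time_delta; /* relative to the previous event's value */
} encoded_event_t;

static void encode_events(const event_t *in, encoded_event_t *out, size_t n)
{
    int64_t prev_id = 0, prev_wt = 0;
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        out[i].func_id_delta = in[i].func_id - prev_id;
        out[i].wall_time_delta = in[i].wall_time - prev_wt;
        prev_id = in[i].func_id;
        prev_wt = in[i].wall_time;
    }
}

/* Inverse transform: running sums restore the absolute values. */
static void decode_events(const encoded_event_t *in, event_t *out, size_t n)
{
    int64_t id = 0, wt = 0;
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        id += in[i].func_id_delta;
        wt += in[i].wall_time_delta;
        out[i].func_id = id;
        out[i].wall_time = wt;
    }
}
```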

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X)
    and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 221ns.
- Xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.

This patch adds `spx_output_stream_write()` and uses it when flushing
the "full" report's string buffer to its data file, instead of the
existing function taking a NUL-terminated string, which avoids a
useless call to `strlen()` on a big string.

The per-function call overhead is thus slightly reduced, going
from ~221ns to ~218ns.
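The difference between a NUL-terminated print function and a length-aware write function can be sketched on top of stdio. `output_stream_t`, `output_stream_print()` and `output_stream_write()` are hypothetical names; the point is that a caller flushing a string builder already knows the buffer length, so scanning the buffer again with `strlen()` is wasted work.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output stream wrapping a stdio FILE. */
typedef struct {
    FILE *file;
} output_stream_t;

/* NUL-terminated variant: must scan the whole string to find its length. */
static size_t output_stream_print(output_stream_t *s, const char *str)
{
    return fwrite(str, 1, strlen(str), s->file);
}

/* Length-aware variant: the caller passes a length it already tracks
 * (e.g. a string builder's current size), so no scan is needed. */
static size_t output_stream_write(output_stream_t *s, const char *buf, size_t len)
{
    assert(buf != NULL);
    return fwrite(buf, 1, len, s->file);
}
```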

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X)
    and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 218ns.
- Xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.

This patch adds various optimizations for the "full" report use case
and for the case where only "wt" & "zm" are selected as metrics (i.e.
the default ones).

The per-function call overhead is thus significantly reduced, going
from ~218ns to ~172ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X)
    and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 172ns.
- Xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.

When Zstandard is not available, SPX can still be built, but the
"full" report will fall back to zlib, significantly increasing SPX's
per-function call overhead (from 330ns to 495ns on my hardware).
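One way to express such a build-time fallback is a preprocessor switch. `HAVE_ZSTD`, `codec_t`, `full_report_codec()` and the file suffixes below are assumptions made for illustration; the actual SPX build macro and data file naming may differ.

```c
#include <assert.h>
#include <string.h>

typedef enum { CODEC_ZSTD, CODEC_ZLIB } codec_t;

/* HAVE_ZSTD is a hypothetical macro that a configure-time check for
 * libzstd would define; without it, the build silently uses zlib. */
static codec_t full_report_codec(void)
{
#ifdef HAVE_ZSTD
    return CODEC_ZSTD;
#else
    return CODEC_ZLIB;
#endif
}

/* Hypothetical data file suffix per codec. */
static const char *codec_file_suffix(codec_t c)
{
    switch (c) {
    case CODEC_ZSTD:
        return ".zst";
    case CODEC_ZLIB:
        return ".gz";
    }
    assert(!"unknown codec");
    return "";
}
```

Keeping the choice at compile time means the per-call hot path never has to test which codec is in use.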

This patch also improves the generated names of anonymous functions
for PHP 8.3 and earlier by replacing `{closure}` with
`{closure:<file>:<line>}`.

On the probing side, this patch has no noticeable impact on
performance.

Resolves #169
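Building the `{closure:<file>:<line>}` name only requires the closure's defining file and start line. The helper below is a hypothetical sketch (`format_closure_name()` is not an SPX function) using `snprintf()`; in an extension, the file and line would presumably come from the closure's op_array, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes "{closure:<file>:<line>}" into out (truncated if size is too
 * small) and returns the length the full name would have. */
static int format_closure_name(char *out, size_t size, const char *file,
                               unsigned line)
{
    assert(out != NULL && file != NULL);
    return snprintf(out, size, "{closure:%s:%u}", file, line);
}
```

Distinct names per closure mean that two closures defined in different places no longer collapse into a single `{closure}` entry in a report.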

This patch:
- adds support for the Zend Engine's observer API on PHP 8.2+ (it is
    not stable enough on PHP 8.0 & 8.1). The observer API is used by
    default (instead of execution hooks), bringing safer and
    JIT-compatible instrumentation in addition to a lower overhead. The
    use of the observer API can still be disabled via the
    `spx.use_observer_api` INI parameter.
- improves sampling profiler accuracy for long functions.
- improves and simplifies VM stack discovery as provided by
    spx_php.c, bringing better performance and a correct stack view in
    sampling mode when the sample is a function call end.
- adds various micro-optimizations on tracer & reporter side.

This patch significantly reduces the per-function call overhead:
- in tracing mode: from 172ns to 139ns.
- in sampling mode (amortized per-function overhead): from 44ns to
    2ns with SPX_SAMPLING_PERIOD=1000 (i.e. 1ms sampling period).
    For comparison, Excimer's (v1.2.3) amortized per-function
    overhead with the same sampling period is <1ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X)
    and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 139ns.
- Xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.

Resolves #215

This patch significantly reduces the per-function call overhead in
tracing mode, going from 139ns to 129ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X)
    and with my php-profiler-bench project.
- tracing mode per-function call overhead:
    - SPX v0.4.18: 565ns
    - SPX current: 129ns
    - Xhprof v2.3.10: 274ns
    - Xdebug v3.4.1: 1595ns
- sampling mode (1ms period) per-function call amortized overhead:
    - SPX v0.4.18: 45ns
    - SPX current: 2ns
    - Excimer v1.2.3: <1ns

This patch also fixes tests for old PHP versions such as 5.6 & 7.0.

As expected, this patch slightly increases the per-function call
overhead in tracing mode, from 129ns to 134ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X)
    and with my php-profiler-bench project.
- tracing mode per-function call overhead:
    - SPX v0.4.18: 565ns
    - SPX current: 134ns
    - Xhprof v2.3.10: 274ns
    - Xdebug v3.4.1: 1595ns
- sampling mode (1ms period) per-function call amortized overhead:
    - SPX v0.4.18: 45ns
    - SPX current: 2ns
    - Excimer v1.2.3: <1ns

All crashes were caused by SPX having an inconsistent view of the
stack.

This patch brings the following main changes:
- add dedicated stack tracking logic, with the ability to track
    special & frame-less (at ZE level) functions such as
    php_request_shutdown or zend_compile_file
- add workarounds for some ZE bugs (corrupted zend_execute_data
    records)
- fix sampling profiling mode's incorrect behavior when internal
    functions are not instrumented
- remove PHP 5 support since it is more than ever too costly to
    maintain
- remove uses of TSRMLS_* macros since PHP 5 is not supported anymore
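A dedicated stack that does not rely solely on `zend_execute_data` records can be sketched as a plain array of named entries, where frame-less entries such as `php_request_shutdown` or `zend_compile_file` are pushed explicitly. This is a hypothetical illustration (`tracked_stack_t`, `stack_push()`, `stack_pop()`, `MAX_DEPTH`), not SPX's implementation; it also shows one possible consistency check: refusing to pop when an end event does not match the tracked top entry, instead of acting on an inconsistent view.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 64 /* hypothetical fixed capacity */

typedef struct {
    const char *name;
    bool frameless; /* true when ZE provides no execute_data frame */
} stack_entry_t;

typedef struct {
    stack_entry_t entries[MAX_DEPTH];
    size_t depth;
} tracked_stack_t;

static void stack_push(tracked_stack_t *s, const char *name, bool frameless)
{
    assert(s->depth < MAX_DEPTH);
    s->entries[s->depth].name = name;
    s->entries[s->depth].frameless = frameless;
    s->depth++;
}

/* Pops only if the end event matches the tracked top entry; otherwise
 * returns false and leaves the stack untouched so the caller can
 * resynchronize. */
static bool stack_pop(tracked_stack_t *s, const char *name)
{
    if (s->depth == 0 || strcmp(s->entries[s->depth - 1].name, name) != 0) {
        return false;
    }
    s->depth--;
    return true;
}
```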

This patch also slightly increases the per-function call overhead in
tracing mode, from 134ns to 138ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X)
    and with my php-profiler-bench project.
- tracing mode per-function call overhead:
    - SPX v0.4.18: 565ns
    - SPX current: 138ns
    - Xhprof v2.3.10: 274ns
    - Xdebug v3.4.1: 1595ns
- sampling mode (1ms period) per-function call amortized overhead:
    - SPX v0.4.18: 45ns
    - SPX current: 2ns
    - Excimer v1.2.3: <1ns

This patch adds various optimizations, including the following:
- refactor spx_metric in order to increase the efficiency of
    metric-related processing (only process enabled metrics, process
    them sequentially, simplified & merged loops...)
- simplify hooks management in spx_php (makes the code clearer and
    may give the compiler more room for optimization)
- avoid the defensive use of strdup() in spx_profiler_tracer for
    empty strings
- remove a useless use of strlen() in spx_output_stream

Consequently, this patch significantly reduces the per-function
call overhead:
- in tracing mode: from 138ns to 123ns.
- in sampling mode (amortized per-function overhead): from 5ns to
    2ns with SPX_SAMPLING_PERIOD=1000 (i.e. 1ms sampling period).
    Sampling mode performance had indeed been degraded by the
    previous patch (from 2ns to 5ns).
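The first and third optimizations listed above (processing only enabled metrics, and avoiding `strdup()` for empty strings) can be sketched together. Assumptions: a small fixed metric enum where only "wt" and "zm" come from the text and the other two entries are placeholders, a compacted list of enabled metric indices built once at start-up, and a shared static empty string returned instead of a heap copy. All names (`enabled_metrics_t`, `compact_enabled()`, `compute_inclusive()`, `dup_or_empty()`, `release_dup()`) are hypothetical and do not reflect SPX's actual `spx_metric` or tracer code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical metric indices: WT and ZM stand for "wt" and "zm",
 * OTHER_A / OTHER_B are placeholders. */
enum { METRIC_WT, METRIC_OTHER_A, METRIC_ZM, METRIC_OTHER_B, METRIC_COUNT };

typedef struct {
    size_t count;
    int idx[METRIC_COUNT]; /* indices of enabled metrics, in order */
} enabled_metrics_t;

/* Built once at start-up from a per-metric on/off table. */
static enabled_metrics_t compact_enabled(const int enabled[METRIC_COUNT])
{
    enabled_metrics_t m = { 0 };
    for (int i = 0; i < METRIC_COUNT; i++) {
        if (enabled[i]) {
            m.idx[m.count++] = i;
        }
    }
    return m;
}

/* Hot path: one sequential loop over enabled metrics only; results are
 * stored densely in out[0..count-1]. */
static void compute_inclusive(const enabled_metrics_t *m,
                              const int64_t start[METRIC_COUNT],
                              const int64_t end[METRIC_COUNT],
                              int64_t out[METRIC_COUNT])
{
    assert(m != NULL);
    for (size_t i = 0; i < m->count; i++) {
        int k = m->idx[i];
        out[i] = end[k] - start[k];
    }
}

/* Shared empty string: "" never costs a heap allocation. */
static const char empty_str[] = "";

static const char *dup_or_empty(const char *s)
{
    size_t len = strlen(s);
    if (len == 0) {
        return empty_str;
    }
    char *copy = malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, s, len + 1); /* includes the terminating NUL */
    }
    return copy;
}

/* Frees only what dup_or_empty() actually allocated. */
static void release_dup(const char *s)
{
    if (s != empty_str) {
        free((void *)s);
    }
}
```

With the default "wt" + "zm" selection, the hot loop in this sketch runs two iterations instead of testing every known metric on each call.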

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X)
    and with my php-profiler-bench project.
- tracing mode per-function call overhead:
    - SPX v0.4.18: 565ns
    - SPX current: 123ns
    - Xhprof v2.3.10: 274ns
    - Xdebug v3.4.1: 1595ns
- sampling mode (1ms period) per-function call amortized overhead:
    - SPX v0.4.18: 45ns
    - SPX current: 2ns
    - Excimer v1.2.3: <1ns