v0.5 #301
Draft: NoiseByNorthwest wants to merge 15 commits into master from v0.5
This patch makes the full stats computation (required only by the "fp" & "trace" reporters) optional, and disables it for the "full" reporter which, being the one used by the web UI, is the most common use case.

The per-function call overhead is thus significantly reduced, going from ~565ns to ~460ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X) and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 460ns.
- Xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.
When writing to the "full" report file, this patch uses spx_str_builder_append_long() instead of spx_str_builder_append_double() whenever possible.

The per-function call overhead is thus significantly reduced, going from ~460ns to ~427ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X) and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 427ns.
- Xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.
This patch uses Zstandard compression for the "full" report data file. Zstandard is faster than zlib (DEFLATE) while achieving a better compression ratio.

For instance, a 208MB report is compressed to:
- 49MB with zlib / level 6.
- 43MB with Zstandard / level 1.

Zstandard is also OK for the web UI since both Firefox and Chromium have supported it for a few years.

The per-function call overhead is thus significantly reduced, going from ~427ns to ~273ns. This patch allows SPX to match the performance of Xhprof.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X) and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 273ns.
- Xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.
This patch makes the "full" report data file format lighter through several techniques, such as relative function IDs and delta values for metrics.

For instance, the same report file weighs:
- 43MB without this patch.
- 14MB with this patch.

The per-function call overhead is thus significantly reduced, going from 273ns to 221ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X) and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 221ns.
- Xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.
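As an illustration of why delta values make a text-based report lighter, here is a minimal Python sketch. It is not SPX's actual encoder (SPX is written in C and its on-disk format is not described here); the function names and the sample timestamps are hypothetical.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value as-is, then store each value as the difference from its predecessor."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum over the deltas."""
    return list(accumulate(deltas))


# Hypothetical monotonically increasing wall-time samples (in ns).
timestamps_ns = [1_000_000_120, 1_000_000_340, 1_000_000_910, 1_000_001_275]

encoded = delta_encode(timestamps_ns)

# Serialized as text, the deltas need far fewer characters than the raw values.
raw_text = ",".join(map(str, timestamps_ns))
delta_text = ",".join(map(str, encoded))
print(len(raw_text), len(delta_text))
```

Smaller numbers mean fewer bytes to format and write, and fewer bytes for the compressor to process, which is consistent with the file size and overhead both going down in this patch.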
This patch adds `spx_output_stream_write()` and uses it when flushing the "full" report's string buffer to its data file instead of `spx_output_stream_write()`, which avoids a useless call to `strlen()` on a big string.

The per-function call overhead is thus slightly reduced, going from ~221ns to ~218ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X) and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 218ns.
- Xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.
This patch adds various optimizations targeting the "full" report use case with only "wt" & "zm" as selected metrics (i.e. the default ones).

The per-function call overhead is thus significantly reduced, going from ~218ns to ~172ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X) and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 172ns.
- Xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.
When Zstandard is not available, SPX can still be built, but the "full" report will fall back to zlib, significantly increasing SPX's per-function call overhead (from 330ns to 495ns on my hardware).
This patch also improves the (generated) name of anonymous functions for PHP 8.3 and older by replacing `{closure}` with `{closure:<file>:<line>}`.

On the probing side, this patch does not have a noticeable impact on performance.

Resolves #169
This patch:
- adds Zend Engine's observer API support for PHP 8.2+ (not stable enough with PHP 8.0 & 8.1). The observer API is used by default (instead of execution hooks), bringing safer and JIT-compatible instrumentation in addition to a lower overhead. The use of the observer API can still be disabled via the `spx.use_observer_api` INI parameter.
- improves sampling profiler accuracy for long functions.
- improves and simplifies VM stack discovery as provided by spx_php.c, bringing better performance and a correct stack view in sampling mode when the sample is a function call end.
- adds various micro-optimizations on the tracer & reporter side.

This patch significantly reduces the per-function call overhead:
- in tracing mode: from 172ns to 139ns.
- in sampling mode (amortized per-function overhead): from 44ns to 2ns with SPX_SAMPLING_PERIOD=1000 (i.e. 1ms sampling period). For comparison, Excimer's (v1.2.3) amortized per-function overhead with the same sampling period is <1ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X) and with my php-profiler-bench project.
- SPX v0.4's (v0.4.18) per-function call overhead is 565ns.
- current SPX's per-function call overhead is 139ns.
- Xhprof's (v2.3.10) per-function call overhead is 274ns.
- Xdebug's (v3.4.1) per-function call overhead is 1595ns.

Resolves #215
This patch significantly reduces the per-function call overhead in tracing mode, going from 139ns to 129ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X) and with my php-profiler-bench project.
- tracing mode per-function call overhead:
  - SPX v0.4.18: 565ns
  - SPX current: 129ns
  - Xhprof v2.3.10: 274ns
  - Xdebug v3.4.1: 1595ns
- sampling mode (1ms period) per-function call amortized overhead:
  - SPX v0.4.18: 45ns
  - SPX current: 2ns
  - Excimer v1.2.3: <1ns
This patch also fixes tests for old PHP versions such as 5.6 & 7.0.

This patch slightly increases, as expected, the per-function call overhead in tracing mode, from 129ns to 134ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X) and with my php-profiler-bench project.
- tracing mode per-function call overhead:
  - SPX v0.4.18: 565ns
  - SPX current: 134ns
  - Xhprof v2.3.10: 274ns
  - Xdebug v3.4.1: 1595ns
- sampling mode (1ms period) per-function call amortized overhead:
  - SPX v0.4.18: 45ns
  - SPX current: 2ns
  - Excimer v1.2.3: <1ns
All crashes were caused by an inconsistent stack view from SPX's point of view.

This patch brings the following main changes:
- add dedicated stack tracking logic, with the ability to track special & frame-less (at ZE level) functions such as php_request_shutdown or zend_compile_file
- add workarounds for some ZE bugs (corrupted zend_execute_data records)
- fix sampling profiling mode's incorrect behavior when internal functions are not instrumented
- remove PHP 5 support since it is more than ever too costly to maintain
- remove uses of TSRMLS_* macros since PHP 5 is not supported anymore

This patch also slightly increases the per-function call overhead in tracing mode, from 134ns to 138ns.

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X) and with my php-profiler-bench project.
- tracing mode per-function call overhead:
  - SPX v0.4.18: 565ns
  - SPX current: 138ns
  - Xhprof v2.3.10: 274ns
  - Xdebug v3.4.1: 1595ns
- sampling mode (1ms period) per-function call amortized overhead:
  - SPX v0.4.18: 45ns
  - SPX current: 2ns
  - Excimer v1.2.3: <1ns
This patch adds various optimizations, including the following:
- refactor spx_metric to make metric-related processing more efficient (only process enabled metrics, process them sequentially, simplify & merge loops...)
- simplify hooks management in spx_php (makes the code clearer and may give the compiler more room for optimization)
- avoid the defensive use of strdup() in spx_profiler_tracer for empty strings
- remove a useless use of strlen() in spx_output_stream

Consequently, this patch significantly reduces the per-function call overhead:
- in tracing mode: from 138ns to 123ns.
- in sampling mode (amortized per-function overhead): from 5ns to 2ns with SPX_SAMPLING_PERIOD=1000 (i.e. 1ms sampling period). Sampling mode performance had indeed been degraded by the previous patch (from 2ns to 5ns).

As a reminder:
- timings have been measured on my hardware (AMD Ryzen 7 9700X) and with my php-profiler-bench project.
- tracing mode per-function call overhead:
  - SPX v0.4.18: 565ns
  - SPX current: 123ns
  - Xhprof v2.3.10: 274ns
  - Xdebug v3.4.1: 1595ns
- sampling mode (1ms period) per-function call amortized overhead:
  - SPX v0.4.18: 45ns
  - SPX current: 2ns
  - Excimer v1.2.3: <1ns
The v0.5 branch is currently under development, but I’m making it publicly available in its current state so anyone can benefit from the ongoing improvements.
Most of the work so far has focused on deep optimizations within the profiler core. As a result, the overhead introduced by SPX is now lower than that of XHProf. Compared to SPX v0.4, the overhead in v0.5 has been reduced by a factor of 4.
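To make these ratios concrete, here is a small Python sketch that recomputes them from the tracing-mode figures quoted in the latest commit message (measured on the author's hardware). The variable names are only illustrative.

```python
# Per-function call overhead in tracing mode, in nanoseconds,
# as reported in the commit log of this branch.
tracing_overhead_ns = {
    "SPX v0.4.18": 565,
    "SPX v0.5 (current)": 123,
    "XHProf v2.3.10": 274,
    "Xdebug v3.4.1": 1595,
}

baseline_ns = tracing_overhead_ns["SPX v0.5 (current)"]

# How many times more overhead each profiler adds compared to the current branch.
ratios = {name: ns / baseline_ns for name, ns in tracing_overhead_ns.items()}

for name, ratio in ratios.items():
    print(f"{name}: {tracing_overhead_ns[name]}ns ({ratio:.1f}x the current SPX overhead)")
```

With these figures, SPX v0.4.18 comes out at roughly 4.6 times the overhead of the current branch, and XHProf at roughly 2.2 times.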
While I haven’t had time to publish the benchmark I use to compare the overhead of Xdebug, SPX, and XHProf, I plan to make it public closer to the end of v0.5’s development. In the meantime, all details about the optimizations can be found in the commit log.
If you try out this branch, I’d be very interested in hearing your feedback (especially whether you notice improvements in performance and overhead).
The remaining work before v0.5 can be officially released is a full rewrite of the web interface. This rewrite aims to:
Once this rewrite is complete, v0.5 will be released. Further improvements, mainly new analysis widgets and better support for modern application servers (such as FrankenPHP), are planned during the lifetime of v0.5.