Skip to content

fix(snapshot): Replace sleeps with yield #5619

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

dranikpg
Copy link
Contributor

@dranikpg dranikpg commented Aug 4, 2025

Blocked on new background fibers from helio romange/helio#402.

Background tasks should be as cooperative as possible to not increase the latency. We yield after each bucket traversal either by pushing to the channel or by yielding explicitly

@dranikpg dranikpg marked this pull request as ready for review August 11, 2025 07:04
Comment on lines 541 to 432
fb2::Fiber::Opts fb_opts{.priority = fb2::FiberPriority::BACKGROUND, .name = "heatbeat"};
fiber_heartbeat_periodic_ =
MakeFiber([this, index = pb->GetPoolIndex(), period_ms, heartbeat]() mutable {
fb2::Fiber(fb_opts, [this, index = pb->GetPoolIndex(), period_ms, heartbeat]() mutable {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Heartbeat should be background
  2. If its background, it doesn't penalize other background fibers on idle
  3. We can split it into stages to increase simplicity and have dynamic load on it

Comment on lines 333 to 336
// Yield after possibly long cpu slice due to compression and serialization
if (ThisFiber::Priority() == fb2::FiberPriority::BACKGROUND)
ThisFiber::Yield();
// TODO: else Sleep() to provide write backpressure in advance?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we yielded after the write, now we yield before because that's the end of the "CPU" slice (that the scheduler masures). There shoudn't be any difference as long as writes are not async (not sure why we didn't add that yet) 🤔

@@ -164,8 +164,11 @@ class SliceSnapshot : public journal::JournalConsumerInterface {

// Used for sanity checks.
bool serialize_bucket_running_ = false;

std::string snapshot_fb_name_;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

helio fiber stores a string view, previously it pointed just to a stack placed string 😳

@romange romange added this to the v1.34 milestone Aug 11, 2025
@dranikpg
Copy link
Contributor Author

Will fix heartbeat after romange/helio#449

@dranikpg dranikpg force-pushed the replication-yields branch 2 times, most recently from 6925b6d to e3946d5 Compare August 12, 2025 13:27
@dranikpg
Copy link
Contributor Author

dranikpg commented Aug 13, 2025

image

@dranikpg
Copy link
Contributor Author

I didn't manage to find a "winning" strategy for finding the best scheduler configuration, so I just resorted to finding the best default value 😞 Parameters for one of cloud's snapshots

image

@romange
Copy link
Collaborator

romange commented Aug 24, 2025

Please rebase. Is it ready for review @dranikpg ?

@dranikpg dranikpg force-pushed the replication-yields branch from e3946d5 to c96b31f Compare August 24, 2025 17:20
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants