-
Notifications
You must be signed in to change notification settings - Fork 34
Fix backup retry behavior #543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fad38c0
to
f5033bb
Compare
c1a2031
to
c9afd03
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments
// retrying the bulk import in backup/restore logic is handled manually. | ||
// retrying bulk export is also handled manually, because the default behavior is | ||
// to start at the beginning of the stream, which produces duplicate values. | ||
selector.StreamClientInterceptor(retry.StreamClientInterceptor(retryOpts...), selector.MatchFunc(isNoneOf(importBulkRoute, exportBulkRoute))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is half of the fix - we don't want to automatically retry ExportBulk requests, because then we're not properly handling restarting the stream.
internal/cmd/backup.go
Outdated
// Clone the request to ensure that we are keeping all other fields the same | ||
newReq := req.CloneVT() | ||
newReq.OptionalCursor = lastResponse.AfterResultCursor | ||
|
||
relationshipStream, err = spiceClient.ExportBulkRelationships(ctx, newReq) | ||
log.Info().Err(err).Str("cursor token", lastResponse.AfterResultCursor.Token).Msg("encountered retryable error, resuming stream after token") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the other half of the fix - manually catching retryable errors and ensuring that we resume the stream in the correct spot.
func (m *mockClientForBackup) Recv() (*v1.ExportBulkRelationshipsResponse, error) { | ||
// If we've run through all our calls, return an EOF | ||
if m.recvCallIndex == len(m.recvCalls) { | ||
return nil, io.EOF | ||
} | ||
recvCall := m.recvCalls[m.recvCallIndex] | ||
m.recvCallIndex++ | ||
return recvCall() | ||
} | ||
|
||
func (m *mockClientForBackup) ExportBulkRelationships(_ context.Context, req *v1.ExportBulkRelationshipsRequest, _ ...grpc.CallOption) (grpc.ServerStreamingClient[v1.ExportBulkRelationshipsResponse], error) { | ||
if m.exportCalls == nil { | ||
// If the caller doesn't supply exportCalls, pass through | ||
return m, nil | ||
} | ||
if m.exportCallsIndex == len(m.exportCalls) { | ||
// If invoked too many times, fail the test | ||
m.t.FailNow() | ||
return m, nil | ||
} | ||
exportCall := m.exportCalls[m.exportCallsIndex] | ||
m.exportCallsIndex++ | ||
exportCall(m.t, req) | ||
return m, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a little awkward/verbose, but it felt like the best way to represent multiple calls to each of these functions.
@@ -183,9 +183,9 @@ func TestRestorer(t *testing.T) { | |||
} | |||
} | |||
|
|||
type mockClient struct { | |||
type mockClientForRestore struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renaming because I created a different one for Backup
9d3cef7
to
9a58bfc
Compare
9a58bfc
to
314ec3a
Compare
since at that point we would have not received any response and we wouldn't have a token. Also made sure to verify the tests failed if I removed the usage of the last known token, which led me to improve the assertions, as the tests were SIGSEV'ing too. Also inverted the !errors.Is(err, io.EOF) check, which is more idiomatic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took the liberty to push some changes for an edge case where zed could SIGSEV if the export call fails on the first attempt. Everything else looks good to me!
// last received response. | ||
|
||
// Clone the request to ensure that we are keeping all other fields the same | ||
newReq := req.CloneVT() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to clone the request? does req.OptionalCursor = lastResponse.AfterResultCursor
not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't strictly need to, but it's not expensive and I don't like mutating function parameters.
Fixes #542
Description
Currently,
ExportBulk
is on the list of retryable methods. That means that when an export stream errors for any reason, the request is re-issued. This is a problem because it starts at the beginning and the relations are appended naively to the output. Ideally we'd resume from the place where the error occurred, and that's what this PR implements.TODO
Changes
ExportBulk
to the list of things that doesn't get retriedTesting
Review. See that tests pass.