Open
Labels
Team:Elastic-Agent-Control-Plane, bug
Description
Background
Actions dequeued from the dispatcher action store and dispatched to the coordinator upgrader:
- are not retried on failure
- are not persisted across an agent restart (although fleet-server should presumably re-send them on the first check-in after restart; this needs confirming)
Note: this is probably worth tracking as a separate investigation issue.
Issue
If Elastic-Defend and agent tamper protection are enabled, an upgrade action can remain stale in bkgActions. Specifically, the handler adds the action to bkgActions when it invokes getAsyncContext, but if Elastic-Defend can't acknowledge the upgrade action, the handler returns an err without emptying bkgActions:
elastic-agent/internal/pkg/agent/application/actions/handlers/handler_action_upgrade.go
Lines 53 to 74 in 3bcb53f
asyncCtx, runAsync := h.getAsyncContext(ctx, a, ack)
if !runAsync {
	return nil
}
if h.tamperProtectionFn() {
	// Find inputs that want to receive UPGRADE action
	// Endpoint needs to receive a signed UPGRADE action in order to be able to uncontain itself
	state := h.coord.State()
	ucs := findMatchingUnitsByActionType(state, a.Type())
	if len(ucs) > 0 {
		h.log.Debugf("handlerUpgrade: proxy/dispatch action '%+v'", a)
		err := notifyUnitsOfProxiedAction(ctx, h.log, action, ucs, h.coord.PerformAction)
		h.log.Debugf("handlerUpgrade: after action dispatched '%+v', err: %v", a, err)
		if err != nil {
			return err
		}
	} else {
		// Log and continue
		h.log.Debugf("No components running for %v action type", a.Type())
	}
}
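The failure mode can be reproduced with a small self-contained model. This is a sketch under assumptions, not the agent's actual code: `upgradeAction`, `handler`, `clearBkgActions`, `handleBuggy`, `handleFixed` and `retryAfterFailure` are hypothetical names, the version string is illustrative, `notify` stands in for `notifyUnitsOfProxiedAction`, and the duplicate check in the model's `getAsyncContext` is assumed to match on version and source as described in this issue.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// upgradeAction is a hypothetical stand-in for the fleet upgrade action.
type upgradeAction struct {
	ID, Version, SourceURI string
}

// handler is a simplified model of the upgrade handler's background-action tracking.
type handler struct {
	mu         sync.Mutex
	bkgActions []upgradeAction
	notify     func(upgradeAction) error // stands in for notifyUnitsOfProxiedAction
}

// getAsyncContext registers the action and reports whether it should run.
// A duplicate (same version and source) of a tracked action is queued, not run.
func (h *handler) getAsyncContext(a upgradeAction) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if len(h.bkgActions) == 0 {
		h.bkgActions = append(h.bkgActions, a)
		return true
	}
	first := h.bkgActions[0]
	if first.Version == a.Version && first.SourceURI == a.SourceURI {
		h.bkgActions = append(h.bkgActions, a)
		return false
	}
	h.bkgActions = []upgradeAction{a} // different target: replace the tracked action
	return true
}

// clearBkgActions is a hypothetical cleanup helper.
func (h *handler) clearBkgActions() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.bkgActions = nil
}

// handleBuggy mirrors the early return in the snippet above.
func (h *handler) handleBuggy(a upgradeAction) (bool, error) {
	if !h.getAsyncContext(a) {
		return false, nil
	}
	if err := h.notify(a); err != nil {
		return false, err // bkgActions still holds a
	}
	h.clearBkgActions() // models cleanup once the upgrade itself has finished
	return true, nil
}

// handleFixed empties bkgActions before returning the notification error.
func (h *handler) handleFixed(a upgradeAction) (bool, error) {
	if !h.getAsyncContext(a) {
		return false, nil
	}
	if err := h.notify(a); err != nil {
		h.clearBkgActions()
		return false, err
	}
	h.clearBkgActions()
	return true, nil
}

// retryAfterFailure sends an upgrade whose notification fails, retries the same
// version and source with a working notifier, and reports whether the retry ran.
func retryAfterFailure(handle func(*handler, upgradeAction) (bool, error)) bool {
	h := &handler{notify: func(upgradeAction) error {
		return errors.New("endpoint did not acknowledge")
	}}
	_, _ = handle(h, upgradeAction{ID: "a1", Version: "9.0.0"})
	h.notify = func(upgradeAction) error { return nil }
	ran, _ := handle(h, upgradeAction{ID: "a2", Version: "9.0.0"})
	return ran
}

func main() {
	fmt.Println("buggy retry ran:", retryAfterFailure((*handler).handleBuggy)) // false
	fmt.Println("fixed retry ran:", retryAfterFailure((*handler).handleFixed)) // true
}
```

In the model, the only difference between the two handlers is the cleanup on the error path. A real fix would probably also need to cancel the async context and decide how to acknowledge any queued duplicate actions, which this sketch leaves out.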
Impact
- Upgrade actions may remain permanently stuck in bkgActions.
- Subsequent upgrade attempts with the same version and source are ignored.
- Likely the cause of multiple recent internal error reports.
For confirmed bugs, please report:
- Version: All active releases
- Operating System: All