Prompt injection detection (simplified - only pattern matching) #4237
Conversation
…canning will be added in follow-up PR
Force-pushed …-pattern-matching-v1 from 5d4f43d to 15fe965
… respected + add some more detail to ToolCall pop up for user
sorry, got to this late - it's already day for you, so sending you what I have
```rust
// Format the confirmation prompt
let prompt = "Goose would like to call the above tool, do you allow?".to_string();

// Get confirmation from user
let permission_result = cliclack::select(prompt)
    .item(Permission::AllowOnce, "Allow", "Allow the tool call once")
    .item(Permission::AlwaysAllow, "Always Allow", "Always allow the tool call")
```
did you mean to delete this?
We want to make sure users can't "always allow" a tool call. I wasn't entirely sure whether you had meant to remove it entirely, or just to remove that option when there are security findings. Let me know which you think is the better way.
I think we want to leave that in, as it is a common mode
Will revert these changes, thanks for clarifying
crates/goose/src/agents/agent.rs (outdated)
```rust
    permission_result
}
```
As discussed before, we need to move all of this out into its own infra rather than repeating the same logic for every inspector that decides whether to allow a tool call and then merges the results.
Have tried my best to address this comment, let me know if I've fully misunderstood what you meant here 🫠
```rust
/// Scan recipe components for security threats
/// This should be called when loading/applying recipes
pub async fn scan_recipe_components(
```
are we using this?
We're not; getting rid of this. It was probably copied over from the previous branch.
```rust
}

// Scan recipe context (additional context data)
if let Some(context_items) = &recipe.context {
```
This code is duplicated from above. Also, I don't think we use .context; we do use other fields though.
…an - hopefully correctly understanding the main comment on this PR
Overview
This PR implements prompt injection detection using pattern-matching techniques, providing users with security warnings when potentially malicious tool calls are detected.
Working design doc with requirements
These notes are a result of pairing sessions with Douwe: https://docs.google.com/document/d/1OWh7Ab_eu8STaoplPA6AKF6AnXQNqXc8JYiAwmUYDTs/edit?tab=t.0
Implementation approach
Implemented pattern-based detection for prompt injection, scanning tool calls for potentially dangerous commands. Pattern matching provides immediate protection while laying the groundwork for future ML-based scanning with BERT models, which will be the next iteration of this work.
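As a rough illustration of the pattern-matching approach (names, types, and patterns below are placeholders, not the actual code in crates/goose/src/security/), the scanner boils down to matching the serialized tool-call text against a list of known-dangerous expressions and reporting a confidence per hit:

```rust
// Minimal sketch, assuming the `regex` crate; the real pattern list and
// types in crates/goose/src/security/ will differ.
use regex::Regex;

/// One pattern hit on a tool call.
pub struct SecurityFinding {
    pub pattern: String,
    pub confidence: f32,
}

/// Scan the serialized tool-call arguments for known-dangerous commands
/// or injection phrasing.
pub fn scan_tool_call_text(text: &str) -> Vec<SecurityFinding> {
    // Illustrative patterns only.
    let patterns: &[(&str, f32)] = &[
        (r"rm\s+-rf\s+/", 0.9),
        (r"curl\s+[^|]+\|\s*(sh|bash)", 0.8),
        (r"(?i)ignore\s+(all\s+)?previous\s+instructions", 0.7),
    ];

    patterns
        .iter()
        .filter_map(|(pat, confidence)| {
            let re = Regex::new(pat).expect("static pattern should compile");
            re.is_match(text).then(|| SecurityFinding {
                pattern: (*pat).to_string(),
                confidence: *confidence,
            })
        })
        .collect()
}
```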
Security scanning was added in the reply_internal() method after permission checking, leveraging the existing tool approval workflow. The security scanner receives both proposed tool calls and full conversation history for contextual threat assessment (using BERT models and potentially other tools). In this iteration, the full conversation history is not considered, but we will do that once model scanning is integrated.
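The demotion step the Key changes below describe as apply_security_results_to_permissions() could look roughly like the sketch here; the types are placeholders (reusing SecurityFinding and scan_tool_call_text from the sketch above), not the actual agent.rs code:

```rust
// Sketch only: move security-flagged calls from "approved" back to
// "needs approval" so they go through the confirmation prompt.
pub struct ToolCall {
    pub name: String,
    pub arguments: String,
}

pub fn apply_security_results(
    approved: Vec<ToolCall>,
    needs_approval: &mut Vec<(ToolCall, Vec<SecurityFinding>)>,
    threshold: f32,
) -> Vec<ToolCall> {
    let mut still_approved = Vec::new();
    for call in approved {
        let findings: Vec<SecurityFinding> = scan_tool_call_text(&call.arguments)
            .into_iter()
            .filter(|f| f.confidence >= threshold)
            .collect();
        if findings.is_empty() {
            still_approved.push(call);
        } else {
            // Flagged calls carry their findings so the UI can show a warning
            // alongside the confirmation prompt.
            needs_approval.push((call, findings));
        }
    }
    still_approved
}
```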
Security scanning can be enabled/disabled via the security.enabled config parameter, with configurable confidence thresholds.
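A plausible shape for those settings is sketched below; only security.enabled is named in this PR, so the threshold field name and its default are assumptions for illustration:

```rust
// Sketch of the config described above, assuming serde for deserialization.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
pub struct SecurityConfig {
    /// Master switch for security scanning (the `security.enabled` key).
    #[serde(default)]
    pub enabled: bool,
    /// Findings below this confidence are not surfaced to the user
    /// (field name and default are illustrative).
    #[serde(default = "default_threshold")]
    pub threshold: f32,
}

fn default_threshold() -> f32 {
    0.7
}
```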
Key changes
- crates/goose/src/security/: New security module with scanner setup and patterns. Can be extended to support model-based scanning.
- agent.rs: Added apply_security_results_to_permissions() to move security-flagged tools from approved to needs_approval.
- tool_execution.rs: Enhanced handle_approval_tool_requests_with_security() to include security warnings in confirmation prompts.
- ToolCallConfirmation.tsx: Added support for custom security warning prompts.
- GooseMessage.tsx: Updated to pass security context from backend to UI components.
Configuration
Future work
ToolMonitor