Prompt injection detection (simplified - only pattern matching) #4237

dorien-koelemeijer · 2025-08-21T02:13:47Z

Overview

This PR implements prompt injection detection using pattern-matching techniques, providing users with security warnings when potentially malicious tool calls are detected.

Working design doc with requirements

These notes are a result of pairing sessions with Douwe: https://docs.google.com/document/d/1OWh7Ab_eu8STaoplPA6AKF6AnXQNqXc8JYiAwmUYDTs/edit?tab=t.0

Implementation approach

Implemented pattern-based detection for prompt injection, scanning tool calls for potentially dangerous commands. Pattern-matching provides immediate protection while laying groundwork for future ML-based scanning with BERT models, which will be the next iteration of this work.

Security scanning was added in the reply_internal() method after permission checking, leveraging the existing tool approval workflow. The security scanner's interface accepts both the proposed tool calls and the full conversation history, so that a later iteration can do contextual threat assessment (using BERT models and potentially other tools). In this iteration only the tool calls are pattern-matched; the conversation history is passed in but not yet inspected, and will be used once model scanning is integrated.
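To make the pattern-matching step concrete, here is a minimal sketch of a scanner over a tool call's command text. It is not the code from this PR: the `Finding` type, the `scan_tool_call` function, the pattern list, and the confidence scores are all hypothetical, and it uses plain substring matching rather than whatever matching the real module uses.

```rust
/// Hypothetical finding type for illustration; not taken from the PR.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub pattern: &'static str,
    pub confidence: f32,
}

/// Illustrative patterns with made-up confidence scores (assumptions).
const PATTERNS: &[(&str, f32)] = &[
    ("rm -rf /", 0.9),
    ("| sh", 0.8),
    ("base64 -d", 0.6),
    ("curl ", 0.4),
];

/// Return the highest-confidence pattern found in the command text, if any.
pub fn scan_tool_call(command: &str) -> Option<Finding> {
    // Lowercase first so that trivial case changes do not bypass the match.
    let normalized = command.to_lowercase();
    PATTERNS
        .iter()
        .filter(|entry| normalized.contains(entry.0))
        .map(|&(pattern, confidence)| Finding { pattern, confidence })
        .max_by(|a, b| a.confidence.total_cmp(&b.confidence))
}
```

Reporting only the strongest match keeps the warning shown to the user short; a real scanner might instead return every match.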

Security scanning can be enabled or disabled via the security.enabled config parameter, and the confidence threshold is configurable via security.confidence_threshold.

Key changes

crates/goose/src/security/: New security module with scanner setup and patterns. Can be extended to support model-based scanning.
agent.rs: Added apply_security_results_to_permissions() to move security-flagged tools from approved to needs_approval
tool_execution.rs: Enhanced handle_approval_tool_requests_with_security() to include security warnings in confirmation prompts
ToolCallConfirmation.tsx: Added support for custom security warning prompts
GooseMessage.tsx: Updated to pass security context from backend to UI components
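The agent.rs change above moves security-flagged tools from approved to needs_approval. A rough sketch of that step follows, assuming tool requests are identified by string ids. `PermissionCheckResult` here is a simplified stand-in and `apply_security_results` is a hypothetical name; the real signature of apply_security_results_to_permissions() is not shown in this PR description.

```rust
/// Simplified stand-in for the permission check result (assumption).
#[derive(Debug, Default)]
pub struct PermissionCheckResult {
    pub approved: Vec<String>,
    pub needs_approval: Vec<String>,
}

/// Move every approved request id that the scanner flagged into `needs_approval`.
pub fn apply_security_results(
    mut result: PermissionCheckResult,
    flagged: &[String],
) -> PermissionCheckResult {
    // Split the approved ids into flagged and unflagged, preserving order.
    let (flagged_now, still_approved): (Vec<String>, Vec<String>) = result
        .approved
        .drain(..)
        .partition(|id| flagged.contains(id));
    result.approved = still_approved;
    result.needs_approval.extend(flagged_now);
    result
}
```

Because flagged requests land in needs_approval, they go through the existing confirmation prompt instead of a separate code path.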

Configuration

security:
  enabled: true
  confidence_threshold: 0.5
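As an illustration of how these two settings could gate a warning, here is a small sketch. The `SecurityConfig` struct and `should_warn` method are hypothetical, and treating a score equal to the threshold as a warning (`>=`) is an assumption; the PR description does not say whether the comparison is inclusive.

```rust
/// Hypothetical in-memory form of the `security` config section.
#[derive(Debug, Clone, Copy)]
pub struct SecurityConfig {
    pub enabled: bool,
    pub confidence_threshold: f32,
}

impl SecurityConfig {
    /// Warn only when scanning is enabled and the finding's confidence
    /// reaches the user-defined threshold (inclusive by assumption).
    pub fn should_warn(&self, confidence: f32) -> bool {
        self.enabled && confidence >= self.confidence_threshold
    }
}
```

With the values shown above (enabled: true, confidence_threshold: 0.5), a finding scored 0.8 would trigger a warning and one scored 0.3 would not.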

Future work

Scan context using BERT model(s): research which model or models to use, and whether there is a workaround for the input token limit of these models.
Integrate security scanning into ToolMonitor
Make sure we cover all the different avenues for prompt injection (files and images, Google Drive links, recipes, MCP tool results, etc.)


DOsinga

sorry got to this late - already day for you so sending you what I have

DOsinga · 2025-08-21T23:59:35Z

crates/goose-cli/src/session/mod.rs

                                // Format the confirmation prompt
                                let prompt = "Goose would like to call the above tool, do you allow?".to_string();

                                // Get confirmation from user
                                let permission_result = cliclack::select(prompt)
-                                    .item(Permission::AllowOnce, "Allow", "Allow the tool call once")
-                                    .item(Permission::AlwaysAllow, "Always Allow", "Always allow the tool call")


did you mean to delete this?

We want to make sure users can't always allow a tool call. I wasn't entirely sure whether you had suggested removing the option entirely, or whether we only want to remove it for these security findings. Let me know which you think is better.

I think we want to leave that in, as it is a common mode

Will revert these changes, thanks for clarifying

DOsinga · 2025-08-22T00:01:28Z

crates/goose/src/agents/agent.rs

+
+        permission_result
+    }
+


As discussed before, we need to move all of this out into its own infra, rather than repeating the same logic for each of the inspectors that decide whether to allow a tool and then mixing the results here

Have tried my best to address this comment, let me know if I've fully misunderstood what you meant here 🫠
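One possible reading of the request above is a shared interface that every inspector implements, with a single place that combines their verdicts. The sketch below is only an illustration of that idea: the `ToolInspector` trait, `Verdict` enum, `PatternInspector`, and `run_inspectors` are all hypothetical names and do not come from this PR or the goose codebase.

```rust
/// Hypothetical outcome of inspecting one tool call.
#[derive(Debug, Clone, PartialEq)]
pub enum Verdict {
    Allow,
    NeedsApproval(String),
}

/// Hypothetical common interface for anything that vets a tool call.
pub trait ToolInspector {
    fn name(&self) -> &str;
    fn inspect(&self, tool_call: &str) -> Verdict;
}

/// Toy pattern-based inspector with a single illustrative rule.
pub struct PatternInspector;

impl ToolInspector for PatternInspector {
    fn name(&self) -> &str {
        "pattern"
    }

    fn inspect(&self, tool_call: &str) -> Verdict {
        if tool_call.contains("rm -rf") {
            Verdict::NeedsApproval("matched destructive pattern".to_string())
        } else {
            Verdict::Allow
        }
    }
}

/// Combine inspectors in one place: the first one that objects wins.
pub fn run_inspectors(inspectors: &[Box<dyn ToolInspector>], tool_call: &str) -> Verdict {
    for inspector in inspectors {
        if let Verdict::NeedsApproval(reason) = inspector.inspect(tool_call) {
            return Verdict::NeedsApproval(format!("{}: {}", inspector.name(), reason));
        }
    }
    Verdict::Allow
}
```

With this shape, the agent only calls `run_inspectors` and does not need to know which inspectors exist or how their results are merged.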

DOsinga · 2025-08-22T00:05:27Z

crates/goose/src/security/mod.rs

+
+    /// Scan recipe components for security threats
+    /// This should be called when loading/applying recipes
+    pub async fn scan_recipe_components(


are we using this?

We're not, getting rid of this, probably copied that over from the previous branch.

DOsinga · 2025-08-22T00:07:32Z

crates/goose/src/security/mod.rs

+        }
+
+        // Scan recipe context (additional context data)
+        if let Some(context_items) = &recipe.context {


this code is duplicated from above. also I don't think we use .context. we use other things though

Pattern-matching only version of prompt injection detection - model s…

2407eaf

…canning will be added in follow-up PR

dorien-koelemeijer mentioned this pull request Aug 21, 2025

feat: Prompt injection detection #4021

Closed

dorien-koelemeijer changed the title ~~Prompt injection (simplified - only pattern matching)~~ Prompt injection detection (simplified - only pattern matching) Aug 21, 2025

Add some more commands + try to include obfuscation

15fe965

dorien-koelemeijer force-pushed the feat/prompt-injection-pattern-matching-v1 branch from 5d4f43d to 15fe965 Compare August 21, 2025 02:28

Merge remote-tracking branch 'origin/main' into feat/prompt-injection…

52bfaea

…-pattern-matching-v1

dorien-koelemeijer marked this pull request as draft August 21, 2025 04:24

dorien-koelemeijer added 2 commits August 21, 2025 14:48

some cleanup after testing

85971b2

Some further improvements in making sure threshold defined by user is…

aa2d7d2

… respected + add some more detail to ToolCall pop up for user

dorien-koelemeijer marked this pull request as ready for review August 21, 2025 05:55

DOsinga reviewed Aug 22, 2025

View reviewed changes

dorien-koelemeijer added 3 commits August 22, 2025 15:52

Try to add security scanning to the ToolMonitor and keep agent.rs cle…

8c779de

…an - hopefully correctly understanding the main comment on this PR

Clean up println

48a9486

fix match on boolean

5290d5d
