How to enforce either a text response or tool_use, but not both at the same time? #382
-
Hey folks 👋

TL;DR: Since RubyLLM automatically handles tool invocations, does it make sense for the LLM to ever return a response that contains both a text part and a tool_use part?

Context:

Model:

Example query:

```ruby
RubyLLM.ask "What do you see on the image?", with: "url-to-image"
```

Sometimes the response is just text:

```json
"content": [
  {
    "type": "text",
    "text": "I see product X. Do you want a recommendation for a similar product?"
  }
]
```

Other times, it's both text and tool_use:

```json
"content": [
  {
    "type": "text",
    "text": "I see product X. Do you want a recommendation for a similar product?"
  },
  {
    "type": "tool_use",
    "id": "toolu_01WuLdB6aVijh3R78jai7eR9",
    "name": "ruby_llm_tools--product_recommender",
    "input": {
      "query": "Product X similar product"
    }
  }
]
```

Because RubyLLM handles tools automatically, the sequence looks like this under the hood:
From the user’s perspective, it looks like:
Ideally, what I'd like instead is one of two scenarios:

Scenario I: Explicit confirmation before tool use
Scenario II: Seamless tool use incorporated into the answer

So the question is: is it possible to configure the model so it always follows one of these two patterns, instead of mixing text and tool_use in a single response?
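Not specific to RubyLLM, but a quick way to see the two shapes side by side is to classify the raw content blocks of a response. This is a minimal plain-Ruby sketch; the helper name is mine, and the sample data mirrors the responses shown above:

```ruby
# Classify an Anthropic-style "content" array by the block types it contains.
# "text" and "tool_use" are Anthropic's block type names; :mixed is the
# problematic case this thread is about.
def classify_content(blocks)
  types = blocks.map { |b| b["type"] }.uniq
  if types == ["text"]
    :text_only
  elsif types == ["tool_use"]
    :tool_only
  else
    :mixed # both a text part and a tool_use part in one response
  end
end

text_only = [{ "type" => "text", "text" => "I see product X." }]
mixed = text_only + [{
  "type"  => "tool_use",
  "id"    => "toolu_01WuLdB6aVijh3R78jai7eR9",
  "name"  => "ruby_llm_tools--product_recommender",
  "input" => { "query" => "Product X similar product" }
}]

classify_content(text_only) # => :text_only
classify_content(mixed)     # => :mixed
```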
Replies: 4 comments 7 replies
-
You might try telling the LLM which pattern you'd like it to follow in your system prompt.
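A hedged sketch of that suggestion: the prompt wording below is only an example, and the commented `with_instructions` call is my assumption about how RubyLLM attaches a system prompt, so check the gem's docs for your version.

```ruby
# Example system prompt telling the model to emit either text OR a tool call,
# never both in one response. Tune the wording for your model.
SYSTEM_PROMPT = <<~PROMPT
  When you decide to call a tool, emit only the tool call with no
  accompanying text. When you answer without a tool, reply with text only.
  Never combine free-form text and a tool call in the same response.
PROMPT

# Assumed RubyLLM usage — verify against the gem's documentation:
# chat = RubyLLM.chat
# chat.with_instructions(SYSTEM_PROMPT)
# chat.ask "What do you see on the image?", with: "url-to-image"
```

Note that this is a soft constraint: the model can still ignore the instruction, which is why API-level options like tool_choice come up later in this thread.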
-
Hey @crmne and @tpaulshippy, thank you for your quick answers. They make sense; however, I'm not sure what the right way to implement them is.
Here's what I don't get: looking at the tool execution flow from the docs, this is what trips me up:
I expect the model's response to be something like:
instead, it's:
When you say:

I don't have confirmation code; I rely on the user to confirm whether they want to search for a product or not. Basically, the flow I want is either:
get the user's approval (Yes, search for similar products), respond with tool_use and give the final answer, or
I understand this is probably more of a question about how to properly build a prompt than about RubyLLM's capabilities, so if you feel that way, please disregard this.
The problem with this is that I don't want to always use a given tool; I need to offload the decision to the LLM, so enforcing a tool is not an option for me.
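For reference, Anthropic's Messages API expresses exactly that decision through `tool_choice`: `{ "type": "auto" }` (the default) leaves the call-or-answer decision to the model, while `{ "type": "any" }` or `{ "type": "tool", "name": ... }` forces a tool call, which is what you want to avoid here. A plain-Ruby sketch of the request bodies (the model id is only an example):

```ruby
# Build an Anthropic Messages API request body with a given tool_choice.
# Per Anthropic's docs, tool_choice can be:
#   { type: "auto" } — model decides whether to call a tool (default)
#   { type: "any"  } — model must call some tool
#   { type: "tool", name: "..." } — model must call that specific tool
def request_body(tool_choice:)
  {
    model: "claude-sonnet-4-20250514", # example model id — use your own
    max_tokens: 1024,
    tool_choice: tool_choice,
    messages: [{ role: "user", content: "What do you see on the image?" }]
  }
end

# Offloading the decision to the LLM means "auto", not a forced choice:
auto_body   = request_body(tool_choice: { type: "auto" })
forced_body = request_body(tool_choice: { type: "any" })
```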
Yes, after looking into Anthropic's API and RubyLLM, I tend to think that this is my only option.
-
@crmne @tpaulshippy I finally found a solution!!!
This results in exactly the flow I want - the second call to the LLM contains just the tool_use and the tool_result in the history, but not the intermediate text response of the LLM:
This results in the LLM having a response, which both:
Thank you very much, folks! 🙇 🙏
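The flow described above — keeping only the tool_use and tool_result in the history for the follow-up call, and dropping the model's intermediate text — can be sketched in plain Ruby. The message and block shapes here are illustrative, not RubyLLM internals; adapt the filter to however your chat history is stored:

```ruby
# Strip text blocks from assistant messages that also contain a tool_use
# block, so the follow-up call sees only tool_use + tool_result.
def prune_intermediate_text(messages)
  messages.map do |msg|
    next msg unless msg[:role] == "assistant"
    blocks = msg[:content]
    next msg unless blocks.is_a?(Array) && blocks.any? { |b| b[:type] == :tool_use }
    msg.merge(content: blocks.reject { |b| b[:type] == :text })
  end
end

history = [
  { role: "user", content: "What do you see on the image?" },
  { role: "assistant", content: [
    { type: :text, text: "I see product X. Do you want a recommendation?" },
    { type: :tool_use, id: "toolu_01", name: "product_recommender",
      input: { query: "Product X similar product" } }
  ] },
  { role: "user", content: [
    { type: :tool_result, tool_use_id: "toolu_01", content: "Product Y" }
  ] }
]

pruned = prune_intermediate_text(history)
pruned[1][:content].map { |b| b[:type] } # => [:tool_use]
```

User messages and tool results pass through untouched; only the assistant's mixed text-plus-tool_use turn is trimmed before it is replayed to the model.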