Feature Request: Implement Streaming for Ollama Models #3527

@AaronGoldsmith

Description

Please explain the motivation behind the feature request.

The primary motivation is to improve responsiveness for users interacting with Ollama models through Goose. Currently, Goose's Ollama integration waits for the entire model response to be generated before displaying any output.

This blocking, non-streaming behavior leads to:

  • Noticeable Delays: Users experience frustrating waits for full responses, especially with longer generations.
  • Poor Interactivity: It hinders real-time feedback and makes interactive sessions less fluid.
  • Perceived Unresponsiveness: The application can feel sluggish, as opposed to modern LLM interfaces that stream output as it's generated.

Implementing streaming would enable:

  • Enhanced Real-time Interaction: Providing immediate feedback and a more dynamic experience.
  • Improved User Perception: Making Goose feel faster and more responsive when using Ollama.
  • Alignment with Modern LLM Practices: Bringing the Ollama integration in line with industry standards for generative AI.

This enhancement will directly benefit users who have selected Ollama as their provider within Goose, making their AI interactions much more efficient and enjoyable.

Describe the solution you'd like

Update the Goose Ollama integration (ollama.rs) to support streaming responses from the Ollama API.

This involves:

  1. Including stream: true in the Ollama API request.
  2. Modifying the complete function to process the API's response as a stream: server-sent events (SSE) on Ollama's OpenAI-compatible endpoint, or newline-delimited JSON on the native /api/chat endpoint.
  3. Incrementally processing and displaying data chunks as they arrive, rather than waiting for the full response.
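
To illustrate steps 2 and 3, here is a minimal sketch of the incremental part. It assumes the request is sent with stream: true to Ollama's OpenAI-compatible chat completions endpoint, which answers with SSE `data:` lines and ends with a `data: [DONE]` sentinel. `SseLineBuffer` is a hypothetical helper, not existing Goose code; it uses only the standard library and leaves JSON decoding of each payload (for example with serde_json) to the caller.

```rust
/// Hypothetical helper (not part of Goose): buffers raw response bytes and
/// yields the payload of every complete SSE `data:` line.
#[derive(Default)]
pub struct SseLineBuffer {
    /// Bytes received so far that do not yet form a complete line.
    pending: Vec<u8>,
    /// Set once the `[DONE]` sentinel has been seen.
    done: bool,
}

impl SseLineBuffer {
    pub fn new() -> Self {
        Self::default()
    }

    /// True once the stream has signalled completion.
    pub fn is_done(&self) -> bool {
        self.done
    }

    /// Feed one network chunk and return the payloads it completed.
    /// Chunks may split a line (or a UTF-8 character) at any byte, so
    /// bytes are buffered and only whole lines are decoded.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        let mut payloads = Vec::new();
        if self.done {
            return payloads;
        }
        self.pending.extend_from_slice(chunk);

        // Consume one complete line at a time; a trailing partial line
        // stays in `pending` until the next chunk arrives.
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            let raw: Vec<u8> = self.pending.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&raw);
            let line = text.trim_end_matches(|c: char| c == '\n' || c == '\r');

            // Blank separator lines, `:` comments, and other SSE fields
            // are skipped in this sketch.
            let data = match line.strip_prefix("data:") {
                // Leading whitespace is dropped, which is harmless for JSON.
                Some(rest) => rest.trim_start(),
                None => continue,
            };

            if data == "[DONE]" {
                self.done = true;
                self.pending.clear();
                break;
            }
            payloads.push(data.to_string());
        }
        payloads
    }
}
```

A caller would push each chunk read from the HTTP response body, decode every returned payload as a JSON completion chunk, and render its text delta immediately, stopping once `is_done()` returns true. If the native /api/chat endpoint were used instead, the same buffering would apply, but each line would be a bare JSON object with no `data:` prefix.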

Describe alternatives you've considered

No viable alternatives exist for achieving true streaming without modifying the core ollama.rs logic. Pseudo-streaming via polling and routing requests through an external proxy are both inefficient, add unnecessary complexity, and fail to deliver a genuine real-time experience.

  • I have verified this does not duplicate an existing feature request

Metadata

Labels: enhancement (New feature or request)
