
Panicked with OpenAI-compatible mlx_lm.server #4276

@mkozjak

Description

Describe the bug

goose panics when querying a working OpenAI-compatible mlx_lm.server on macOS 15.6.1.

To Reproduce
Steps to reproduce the behavior:

  1. Run mlx_lm.server with lmstudio-community/Qwen3-4B-MLX-4bit (commands sketched below)
  2. Verify the server responds
  3. Configure goose to use the server
  4. Confirm goose lists the available models
  5. Send any prompt, e.g. "hello"
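
Concretely, the setup looked roughly like this. The flags and the curl check are from memory, and goose's interactive configuration is where the local endpoint gets set, so treat this as a sketch of my environment rather than an exact transcript:

```sh
# Start the OpenAI-compatible server (mlx_lm.server defaults to port 8080).
pip install mlx-lm
mlx_lm.server --model lmstudio-community/Qwen3-4B-MLX-4bit --port 8080

# Step 2: sanity-check the server from another shell.
curl http://localhost:8080/v1/models

# Step 3: point goose's OpenAI provider at the local endpoint
# (goose configure prompts for provider, host, and API key).
goose configure
```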

Expected behavior
I expected to get a response from the goose CLI instead of a panic.

Please provide the following information:

  • OS & Arch: macOS 15.6.1 on MacBook Pro M1 2021 13"
  • Interface: CLI
  • Version: v1.5.0
  • Provider & Model: OpenAI - lmstudio-community/Qwen3-4B-MLX-4bit

Additional context
Output with RUST_BACKTRACE=full:

starting session | provider: openai model: lmstudio-community/Qwen3-4B-MLX-4bit
    logging to /Users/mkozjak/.local/share/goose/sessions/20250822_124435.jsonl
    working directory: /Users/mkozjak

Goose is running! Enter your instructions, or try asking what goose can do.

Context: ○○○○○○○○○○ 0% (0/128000 tokens)
( O)> hello
◓  Mapping context vectors...
thread 'main' panicked at crates/goose/src/providers/formats/openai.rs:478:84:
index out of bounds: the len is 0 but the index is 0
stack backtrace:
   0:        0x1076fa18c - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::hca5ddac00d0fe81d
   1:        0x107715744 - core::fmt::write::h345c34c786f611a2
   2:        0x1076c8a1c - std::io::Write::write_fmt::hf3ba17a7aed57370
   3:        0x1076fa084 - std::sys::backtrace::BacktraceLock::print::h5f3c3a4eb4693f7a
   4:        0x1076e0528 - std::panicking::default_hook::{{closure}}::haaa8c3be8e963b10
   5:        0x1076e0424 - std::panicking::default_hook::hf46b416954f6334b
   6:        0x1076e0b04 - std::panicking::rust_panic_with_hook::hae736c323e602f79
   7:        0x1076fa698 - std::panicking::begin_panic_handler::{{closure}}::h0f4c8174a03fed47
   8:        0x1076fa3b0 - std::sys::backtrace::__rust_end_short_backtrace::hc96ad38e8f461380
   9:        0x1076e0628 - __rustc[75b86e412cc6dff0]::rust_begin_unwind
  10:        0x1077db68c - core::panicking::panic_fmt::h6f546b4a6f225fa9
  11:        0x1077db80c - core::panicking::panic_bounds_check::hd382327e3f27e417
  12:        0x1069e0fdc - <async_stream::async_stream::AsyncStream<T,U> as futures_core::stream::Stream>::poll_next::hfdbef9deedd00789
  13:        0x1069dd904 - <async_stream::async_stream::AsyncStream<T,U> as futures_core::stream::Stream>::poll_next::h5f992b557c4d9d6c
  14:        0x1051b93cc - <async_stream::async_stream::AsyncStream<T,U> as futures_core::stream::Stream>::poll_next::hd9f8b3b4fef7cac7
  15:        0x1051fad98 - goose::agents::agent::Agent::reply_internal::{{closure}}::{{closure}}::hc6aa0e48387db047
  16:        0x1051b64a8 - <async_stream::async_stream::AsyncStream<T,U> as futures_core::stream::Stream>::poll_next::h0584de3f3d89745c
  17:        0x104ffb108 - <core::future::poll_fn::PollFn<F> as core::future::future::Future>::poll::h79edfaa660883a4b
  18:        0x104f41208 - goose_cli::session::Session::process_agent_response::{{closure}}::h62f3e2efd1f5db36
  19:        0x104f383d8 - goose_cli::session::Session::interactive::{{closure}}::h9ed6dd2874531469
  20:        0x104f26e48 - goose_cli::cli::cli::{{closure}}::h5a957a9c79e6a21d
  21:        0x104f6ca0c - goose::main::{{closure}}::hf6ef7b5286b6dbd5
  22:        0x10501e5b8 - tokio::runtime::park::CachedParkThread::block_on::hbfb7e8eb15ce5b8d
  23:        0x10525e790 - goose::main::h25eb38abe2385176
  24:        0x10535aa1c - std::sys::backtrace::__rust_begin_short_backtrace::hf584e981f3000ed7
  25:        0x1052aa18c - std::rt::lang_start::{{closure}}::h60bce630afd490e8
  26:        0x1076deb84 - std::rt::lang_start_internal::h7aa982cc60ae55c0
  27:        0x10525ea4c - _main
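
I have not dug into goose's source beyond the panic location, but the message ("the len is 0 but the index is 0" at openai.rs:478) looks like the streaming handler indexes choices[0] on a chunk whose choices array is empty. Some OpenAI-compatible servers do emit such chunks (e.g. a trailing usage-only chunk), and mlx_lm.server may be one of them. A minimal sketch of that suspected failure mode and a defensive alternative, with all type names hypothetical rather than goose's actual ones:

```rust
// Hypothetical types approximating an OpenAI streaming chunk; this is a
// guess at the failure mode, not goose's real code. Requires serde
// (with the "derive" feature) and serde_json.
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Delta {
    content: Option<String>,
}

#[derive(Deserialize, Debug)]
struct Choice {
    delta: Delta,
}

#[derive(Deserialize, Debug)]
struct Chunk {
    choices: Vec<Choice>,
}

fn main() {
    // A chunk with no choices, as some OpenAI-compatible servers send
    // at the end of a stream.
    let chunk: Chunk = serde_json::from_str(r#"{"choices": []}"#).unwrap();

    // Direct indexing reproduces the reported panic:
    // "index out of bounds: the len is 0 but the index is 0"
    // let delta = &chunk.choices[0].delta;

    // Defensive version: skip chunks without choices.
    match chunk.choices.first() {
        Some(choice) => println!("delta: {:?}", choice.delta),
        None => println!("chunk had no choices; skipping"),
    }
}
```

If that guess is right, replacing the indexing with .first() (or .get(0)) and skipping empty chunks would turn the panic into a handled case.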

For comparison, the same model works fine via mlx_lm.chat:

[INFO] Starting chat session with lmstudio-community/Qwen3-4B-MLX-4bit.
The command list:
- 'q' to exit
- 'r' to reset the chat
- 'h' to display these commands
>> hello
<think>
Okay, the user said "hello". I need to respond appropriately. Since they're greeting me, I should respond with a friendly greeting. Maybe "Hi there! How can I assist you today?" That sounds good. It's welcoming and opens the door for them to ask questions. I should keep it simple and positive. Let me make sure there's no markdown and that the response is natural. Yep, that should work.
</think>

Hi there! How can I assist you today? 😊
