2 parents 4af66ce + 1640c9f commit 3931ca3
_posts/2025-08-05-gpt-oss-support.md
@@ -56,6 +56,9 @@ vLLM requires nightly built PyTorch to serve GPT models. To ensure compatibility
Install LMCache from source (this command may take a few minutes due to CUDA kernel compilations):

```bash
+git clone https://github.com/LMCache/LMCache.git
+cd LMCache
+
# In your virtual environment
ENABLE_CXX11_ABI=1 uv pip install -e . --no-build-isolation
```
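After the editable install finishes, a quick import check confirms the build succeeded. This is a hypothetical sanity check, assuming the package exposes an importable `lmcache` module with a `__version__` attribute:

```bash
# Hypothetical sanity check: confirm the editable install is importable
# (assumes the package is named `lmcache` and defines `__version__`)
python -c "import lmcache; print(lmcache.__version__)"
```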
@@ -84,7 +87,6 @@ max_local_cpu_size: 80

LMCACHE_CONFIG_FILE="./backend_cpu.yaml" \
LMCACHE_USE_EXPERIMENTAL=True \
-CUDA_VISIBLE_DEVICES=6,7 \
vllm serve \
  openai/gpt-oss-120b \
  --max-model-len 32768 \
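For context, the `backend_cpu.yaml` passed via `LMCACHE_CONFIG_FILE` enables CPU offloading of the KV cache. A minimal sketch follows, assuming LMCache's `chunk_size` and `local_cpu` config keys; the `max_local_cpu_size: 80` value comes from the hunk context above:

```bash
# Minimal sketch of the CPU-offload config referenced above.
# `chunk_size` and `local_cpu` are assumed LMCache config keys;
# `max_local_cpu_size: 80` (GB) is taken from the hunk header context.
cat > backend_cpu.yaml <<'EOF'
chunk_size: 256
local_cpu: True
max_local_cpu_size: 80
EOF
```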