How to use
igardev edited this page Aug 15, 2025 · 1 revision
llama-vscode is an extension for code completion, chat with AI, and agentic coding, focused on local model usage with llama.cpp.
- Install llama.cpp
- Show the llama-vscode menu by clicking "llama-vscode" in the status bar or pressing Ctrl+Shift+M, and select 'Install/upgrade llama.cpp' (a restart is sometimes needed so the path to llama-server is picked up)
- Select an env (a group of models) for your needs from the llama-vscode menu.
- This downloads the models (first time only) and runs llama.cpp servers locally, or uses external server endpoints, depending on the env
- Start using llama-vscode
- For code completion, just start typing (uses the completion model)
- To edit code with AI, select code, right-click, and choose 'llama-vscode Edit Selected Text with AI' (uses the chat model; no tools support required)
- To chat with AI (quick questions to a local AI instead of searching the web), select 'Chat with AI' from the llama-vscode menu (uses the chat model; no tools support required; a llama.cpp server should be running on the chat model endpoint)
- For agentic coding, select 'Show Llama Agent' from the llama-vscode menu (or press Ctrl+Shift+A) and type your questions or requests (uses the tools model, plus the embeddings model for some tools; this mode needs the most model intelligence, so local usage is supported but external paid providers may give better results)
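The chat and agent modes above talk to a running llama.cpp server over HTTP. As a rough illustration of what that looks like outside the editor, the sketch below builds an OpenAI-style chat request and sends it to a local llama-server instance. The host, port (llama-server's default is 8080), and the `/v1/chat/completions` path are assumptions based on llama-server's OpenAI-compatible API, not something this page specifies; adjust them to match your env.

```python
import json
import urllib.request

# Assumed defaults: llama-server listening on localhost:8080 and
# exposing its OpenAI-compatible chat endpoint (adjust to your setup).
SERVER_URL = "http://127.0.0.1:8080/v1/chat/completions"

def build_chat_payload(question, system_prompt="You are a helpful coding assistant."):
    """Build an OpenAI-style chat request body for llama-server."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        "temperature": 0.2,  # low temperature for focused answers
        "stream": False,     # single response instead of token streaming
    }

def ask(question):
    """POST the question to the local server and return the reply text."""
    body = json.dumps(build_chat_payload(question)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

With a server running (for example after selecting an env from the llama-vscode menu), calling `ask("What does the --port flag do?")` returns the model's answer as a string.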
If you want to use llama-vscode only for code completion, you can disable RAG from the llama-vscode menu to avoid indexing files.
If you are an existing user, you can continue using llama-vscode as before.
For more details, select 'View Documentation' from the llama-vscode menu.