Commit ddb7dcb

fix: bump docs
1 parent 9a44b3e commit ddb7dcb

1 file changed: +5 -5 lines changed

docs/source/reference/launcher.md

Lines changed: 5 additions & 5 deletions
@@ -58,8 +58,6 @@ Options:
 Quantization method to use for the model. It is not necessary to specify this option for pre-quantized models, since the quantization method is read from the model configuration.
 
 Marlin kernels will be used automatically for GPTQ/AWQ models.
-
-[env: QUANTIZE=]
 
 Possible values:
 - awq: 4 bit quantization. Requires a specific AWQ quantized model: <https://hf.co/models?search=awq>. Should replace GPTQ models wherever possible because of the better latency
@@ -72,6 +70,8 @@ Options:
 - bitsandbytes-nf4: Bitsandbytes 4bit. Can be applied on any model, will cut the memory requirement by 4x, but it is known that the model will be much slower to run than the native f16
 - bitsandbytes-fp4: Bitsandbytes 4bit. nf4 should be preferred in most cases but maybe this one has better perplexity performance for you model
 - fp8: [FP8](https://developer.nvidia.com/blog/nvidia-arm-and-intel-publish-fp8-specification-for-standardization-as-an-interchange-format-for-ai/) (e4m3) works on H100 and above This dtype has native ops should be the fastest if available. This is currently not the fastest because of local unpacking + padding to satisfy matrix multiplication limitations
+
+[env: QUANTIZE=]
 
 ```
 ## SPECULATE
@@ -456,14 +456,14 @@ Options:
 ```shell
 --usage-stats <USAGE_STATS>
 Control if anonymous usage stats are collected. Options are "on", "off" and "no-stack" Defaul is on
-
-[env: USAGE_STATS=]
-[default: on]
 
 Possible values:
 - on: Default option, usage statistics are collected anonymously
 - off: Disables all collection of usage statistics
 - no-stack: Doesn't send the error stack trace or error type, but allows sending a crash event
+
+[env: USAGE_STATS=]
+[default: on]
 
 ```
 ## PAYLOAD_LIMIT
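
The `[env: USAGE_STATS=]` and `[default: on]` annotations that this commit moves below the "Possible values" list suggest a lookup order of explicit flag value, then environment variable, then default. The following is only a sketch of that order under stated assumptions: `resolve_usage_stats` is a hypothetical helper written for illustration, not part of the launcher, and treating an empty value as unset is an assumption rather than documented behavior.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_usage_stats is a hypothetical helper, not launcher code.
# Assumed precedence: explicit CLI value, then $USAGE_STATS, then the default "on".
resolve_usage_stats() {
  local cli_value="${1:-}"
  # Assumption: an empty value is treated the same as an unset one.
  local value="${cli_value:-${USAGE_STATS:-on}}"
  case "$value" in
    on|off|no-stack)
      # One of the three values listed under "Possible values".
      printf '%s\n' "$value"
      ;;
    *)
      printf 'invalid usage-stats value: %s\n' "$value" >&2
      return 1
      ;;
  esac
}
```

Calling `resolve_usage_stats off` selects `off` regardless of the environment, while calling it with no argument falls back to `$USAGE_STATS` and then to `on`.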
