docs/source/reference/launcher.md
5 additions & 5 deletions
@@ -58,8 +58,6 @@ Options:
 Quantization method to use for the model. It is not necessary to specify this option for pre-quantized models, since the quantization method is read from the model configuration.
 
 Marlin kernels will be used automatically for GPTQ/AWQ models.
-
-[env: QUANTIZE=]
 
 Possible values:
 - awq: 4 bit quantization. Requires a specific AWQ quantized model: <https://hf.co/models?search=awq>. Should replace GPTQ models wherever possible because of the better latency
@@ -72,6 +70,8 @@ Options:
 - bitsandbytes-nf4: Bitsandbytes 4bit. Can be applied on any model, will cut the memory requirement by 4x, but it is known that the model will be much slower to run than the native f16
 - bitsandbytes-fp4: Bitsandbytes 4bit. nf4 should be preferred in most cases but maybe this one has better perplexity performance for your model
 - fp8: [FP8](https://developer.nvidia.com/blog/nvidia-arm-and-intel-publish-fp8-specification-for-standardization-as-an-interchange-format-for-ai/) (e4m3) works on H100 and above. This dtype has native ops and should be the fastest if available. It is currently not the fastest because of local unpacking + padding to satisfy matrix multiplication limitations
+
+[env: QUANTIZE=]
 
 ```
 ## SPECULATE
@@ -456,14 +456,14 @@ Options:
 ```shell
 --usage-stats <USAGE_STATS>
 Control if anonymous usage stats are collected. Options are "on", "off" and "no-stack". Default is on
-
-[env: USAGE_STATS=]
-[default: on]
 
 Possible values:
 - on: Default option, usage statistics are collected anonymously
 - off: Disables all collection of usage statistics
 - no-stack: Doesn't send the error stack trace or error type, but allows sending a crash event
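
For readers of this diff, the options touched above could be exercised roughly as follows. This is a minimal sketch, not part of the change itself: it only prints a command line instead of launching anything. The binary name `text-generation-launcher` and the `--quantize` flag spelling are assumptions (the hunks show only `--usage-stats` and the `QUANTIZE` / `USAGE_STATS` environment variables), and the helper functions `check_usage_stats` and `build_launcher_cmd` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: print (not run) a launcher command line for the documented options.
# ASSUMPTIONS: the binary name and the --quantize spelling are not shown in
# the diff; all other launcher flags are omitted here.

# Hypothetical helper: accept only the usage-stats values listed in the docs.
check_usage_stats() {
  case "$1" in
    on|off|no-stack) return 0 ;;
    *) echo "unsupported usage-stats value: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: build the command line as a single string.
build_launcher_cmd() {
  stats="${1:-on}"   # "on" is the documented default
  quant="${2:-}"     # empty: no quantize flag is added
  check_usage_stats "$stats" || return 1
  cmd="text-generation-launcher --usage-stats $stats"
  if [ -n "$quant" ]; then
    cmd="$cmd --quantize $quant"
  fi
  printf '%s\n' "$cmd"
}

# Prints: text-generation-launcher --usage-stats off --quantize awq
build_launcher_cmd off awq
```

Since both options list an `[env: ...]` entry, exporting `USAGE_STATS=off` or `QUANTIZE=awq` before starting the launcher should be equivalent to passing the flags; for pre-quantized models the quantize setting can be left out entirely, as the help text notes.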