@neuralmagic

Neural Magic

Neural Magic (acquired by Red Hat) empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration enable top performance with vLLM.

Pinned

  1. deepsparse Public archive

    Sparsity-aware deep learning inference runtime for CPUs

    Python · 3.2k stars · 190 forks

Repositories

Showing 10 of 77 repositories
  • vllm Public Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 13 stars · Apache-2.0 · 9,736 forks · 0 issues · 11 pull requests · Updated Aug 25, 2025
  • compressed-tensors Public

    A safetensors extension to efficiently store sparse quantized tensors on disk

    Python · 153 stars · Apache-2.0 · 22 forks · 5 issues · 19 pull requests · Updated Aug 24, 2025
  • research Public

    Repository to enable research flows

    Python · 1 star · 0 forks · 0 issues · 2 pull requests · Updated Aug 21, 2025
  • model-validation-configs

    0 stars · 0 forks · 0 issues · 1 pull request · Updated Aug 20, 2025
  • axolotl Public Forked from axolotl-ai-cloud/axolotl

    Go ahead and axolotl questions

    Python · 0 stars · Apache-2.0 · 1,130 forks · 0 issues · 5 pull requests · Updated Aug 10, 2025
  • nm-actions Public

    Neural Magic GitHub Actions (GHA)

    Python · 0 stars · Apache-2.0 · 0 forks · 0 issues · 4 pull requests · Updated Aug 7, 2025
  • lmms-eval Public Forked from EvolvingLMMs-Lab/lmms-eval

    Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module, lmms-eval.

    Python · 0 stars · 361 forks · 0 issues · 9 pull requests · Updated Aug 7, 2025
  • DeepEP Public Forked from deepseek-ai/DeepEP

    DeepEP: an efficient expert-parallel communication library

    Cuda · 0 stars · MIT · 910 forks · 0 issues · 0 pull requests · Updated Jul 28, 2025
  • flashinfer Public Forked from flashinfer-ai/flashinfer

    FlashInfer: Kernel Library for LLM Serving

    Cuda · 0 stars · Apache-2.0 · 458 forks · 0 issues · 0 pull requests · Updated Jul 18, 2025
  • DeepGEMM Public Forked from deepseek-ai/DeepGEMM

    DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

    Python · 0 stars · MIT · 684 forks · 0 issues · 0 pull requests · Updated Jul 18, 2025