https://blog.lmcache.ai/2025-05-08-mooncake/ #27
Replies: 2 comments
-
For the model deployment, did you deploy a single model copy across all 8 GPUs, or does each GPU get its own model copy? Also, how many nodes did you use to test this feature? If the GPUs span multiple nodes, e.g. 80 H100s (8 P5 instances), how does Mooncake transfer the KV cache between instances?
0 replies
-
What test code did you use to produce these numbers?
0 replies
-
https://blog.lmcache.ai/2025-05-08-mooncake/
Overview of the Collaboration LMCache and Mooncake have announced a strategic collaboration aimed at pioneering a KVCache-centric Large Language Model (LLM) serving system. This partnership seeks to significantly enhance the efficiency, scalability, and responsiveness of LLM applications. By combining LMCache’s advanced KVCache management techniques with Mooncake’s powerful and optimized backend...
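For context on what "KVCache-centric" serving means in practice, here is a minimal, purely illustrative Python sketch of prefix-keyed KV reuse. The class, method names, and storage scheme are assumptions for illustration only, not the actual LMCache or Mooncake implementation:

```python
import hashlib

# Illustrative sketch (NOT the LMCache/Mooncake API): a KVCache-centric
# server keys stored KV tensors by a hash of the token prefix, so a new
# request can reuse the longest cached prefix instead of recomputing it.

class PrefixKVStore:
    def __init__(self):
        self._store = {}  # prefix hash -> opaque KV blob

    @staticmethod
    def _key(tokens):
        # Hash the token prefix to get a stable lookup key.
        return hashlib.sha256(str(tokens).encode("utf-8")).hexdigest()

    def put(self, tokens, kv_blob):
        # Store the KV cache computed for this exact token prefix.
        self._store[self._key(tokens)] = kv_blob

    def longest_prefix_hit(self, tokens):
        # Walk from the full sequence down; return the longest cached prefix.
        for end in range(len(tokens), 0, -1):
            kv = self._store.get(self._key(tokens[:end]))
            if kv is not None:
                return end, kv
        return 0, None

store = PrefixKVStore()
store.put([1, 2, 3], "kv-for-123")
hit_len, kv = store.longest_prefix_hit([1, 2, 3, 4, 5])
# hit_len == 3: only the tokens after the cached prefix need prefill compute
```

In a real multi-node deployment, the blob would be a GPU KV tensor and the store would be backed by a distributed transfer engine rather than an in-process dict; this sketch only shows the prefix-lookup idea.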