Fix image_grid_thw to be in CPU #40394
base: main
Conversation
Force-pushed from 95e6e30 to 64526de
Signed-off-by: cyy <cyyever@outlook.com>
Force-pushed from 64526de to 3e489ac
[For maintainers] Suggested jobs to run (before merge): run-slow: glm4v, qwen2_vl
I don't think we need this. The model needs all of its inputs on the same device, and users who pass device='cuda' will expect the image grid tensors to be moved to CUDA during processing, so keeping them on CPU makes the API inconsistent. It would be better to have vLLM move the tensor back to CPU when needed, or to call the image processor with device="cpu". We might need to provide …
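A sketch of the second suggestion (calling the image processor with device="cpu"), assuming a fast image processor whose call accepts a device keyword; the checkpoint name and image path below are placeholders, not taken from this PR:

```python
from PIL import Image
from transformers import AutoImageProcessor

# Placeholder checkpoint; any Qwen2-VL-style checkpoint with a fast
# image processor should behave similarly (assumption, not verified
# against this PR).
processor = AutoImageProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", use_fast=True
)
image = Image.open("example.jpg")  # hypothetical local file

# Passing device="cpu" keeps processing, and hence image_grid_thw, on CPU.
inputs = processor(images=image, return_tensors="pt", device="cpu")
print(inputs["image_grid_thw"].device)  # expected: cpu
```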
@zucchini-nlp @qubvel This tensor is special because its main purpose is index manipulation. For tensors used as indices, moving them to the GPU provides no acceleration. The dilemma is that we want almost all tensors on CUDA except for a few special ones, so using …
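A hedged sketch of this point, with illustrative shapes and names rather than the models' actual code: the consumers of image_grid_thw need plain Python integers, so GPU residency adds a readback sync without any compute benefit.

```python
import torch

# Illustrative shapes only: 1564 patch embeddings for one image with a
# (t, h, w) = (1, 34, 46) grid.
pixel_values = torch.randn(1564, 1176, device="cuda")
image_grid_thw = torch.tensor([[1, 34, 46]])  # CPU: pure index metadata

# Downstream consumers need plain Python ints (e.g. for torch.split),
# so a CUDA copy of this tensor would only add a blocking readback:
split_sizes = image_grid_thw.prod(dim=-1).tolist()  # cheap on CPU
chunks = torch.split(pixel_values, split_sizes, dim=0)
print([c.shape for c in chunks])
```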
I see, though that would be quite breaking and, as noted above, inconsistent with what the image processing API does. IMO all returned inputs need to be on the same device and of the same dtype, as requested by the user. So I agree with @qubvel that both should be created on the same device as …
What does this PR do?
In vLLM inference and SFT training there are many blocking operations on image_grid_thw, such as torch.prod followed by .tolist(), so let's always keep image_grid_thw on CPU to avoid them. A simple grep gives the following examples:
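(The grep results themselves are not reproduced in this excerpt.) As a minimal, hedged illustration of why such calls block when the tensor lives on CUDA (values and variable names below are made up, not drawn from the grep output):

```python
import torch

# Made-up grid for one image: (t, h, w) = (1, 34, 46).
image_grid_thw = torch.tensor([[1, 34, 46]], device="cuda")

# .tolist() (like .item()) copies device -> host and synchronizes the
# CUDA stream, stalling the pipeline:
sizes = image_grid_thw.tolist()  # blocking sync

# torch.prod itself is queued asynchronously, but reading its result
# back on the host blocks again:
num_patches = torch.prod(image_grid_thw, dim=-1)  # async on GPU
total = int(num_patches.sum())                    # blocking sync

# If the tensor is created on (or moved once to) CPU, the same calls
# involve no CUDA synchronization at all:
grid_cpu = image_grid_thw.cpu()  # single transfer
sizes = grid_cpu.tolist()        # no sync
```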