Fix image_grid_thw to be in CPU #40394
base: main
Conversation
Force-pushed from 95e6e30 to 64526de
Signed-off-by: cyy <cyyever@outlook.com>
Force-pushed from 64526de to 3e489ac
[For maintainers] Suggested jobs to run (before merge): run-slow: glm4v, qwen2_vl
I don't think we need this. The model needs all of its inputs on the same device, and users who pass device='cuda' will expect the image grid tensors to be moved to CUDA during processing, so keeping them on CPU makes the API inconsistent. It would be better to have vLLM move the tensor back to CPU when needed, or to call the image processor with device="cpu". We might need to provide …
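A sketch of the second suggestion (calling the image processor with device="cpu"), assuming a fast image processor whose call accepts a device keyword; the checkpoint name and image path below are placeholders, not taken from this PR:

```python
from PIL import Image
from transformers import AutoImageProcessor

# Placeholder checkpoint; any Qwen2-VL-style checkpoint with a fast
# image processor should behave similarly (assumption, not verified
# against this PR).
processor = AutoImageProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", use_fast=True
)
image = Image.open("example.jpg")  # hypothetical local file

# Passing device="cpu" keeps processing, and hence image_grid_thw, on CPU.
inputs = processor(images=image, return_tensors="pt", device="cpu")
print(inputs["image_grid_thw"].device)  # expected: cpu
```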
@zucchini-nlp @qubvel This tensor is special because its main purpose is index manipulation. For tensors used as indices, moving them to the GPU provides no acceleration. The dilemma is that we want almost all tensors on CUDA except for a few special ones, so using …
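A hedged sketch of this point, with illustrative shapes and names rather than the models' actual code: the consumers of image_grid_thw need plain Python integers, so GPU residency adds a readback sync without any compute benefit.

```python
import torch

# Illustrative shapes only: 1564 patch embeddings for one image with a
# (t, h, w) = (1, 34, 46) grid.
pixel_values = torch.randn(1564, 1176, device="cuda")
image_grid_thw = torch.tensor([[1, 34, 46]])  # CPU: pure index metadata

# Downstream consumers need plain Python ints (e.g. for torch.split),
# so a CUDA copy of this tensor would only add a blocking readback:
split_sizes = image_grid_thw.prod(dim=-1).tolist()  # cheap on CPU
chunks = torch.split(pixel_values, split_sizes, dim=0)
print([c.shape for c in chunks])
```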
I see, though that would be quite breaking and, as noted above, inconsistent with what the image processing API does. IMO all returned inputs need to be on the same device and of the same dtype, as requested by the user. So I agree with @qubvel that both should be created on the same device as …
What does this PR do?
In vLLM inference and SFT training there are many blocking operations on image_grid_thw, such as torch.prod followed by .tolist(), so let's always keep image_grid_thw on CPU to avoid them. A simple grep gives the following examples:
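(The grep results themselves are not reproduced in this excerpt.) As a minimal, hedged illustration of why such calls block when the tensor lives on CUDA (values and variable names below are made up, not drawn from the grep output):

```python
import torch

# Made-up grid for one image: (t, h, w) = (1, 34, 46).
image_grid_thw = torch.tensor([[1, 34, 46]], device="cuda")

# .tolist() (like .item()) copies device -> host and synchronizes the
# CUDA stream, stalling the pipeline:
sizes = image_grid_thw.tolist()  # blocking sync

# torch.prod itself is queued asynchronously, but reading its result
# back on the host blocks again:
num_patches = torch.prod(image_grid_thw, dim=-1)  # async on GPU
total = int(num_patches.sum())                    # blocking sync

# If the tensor is created on (or moved once to) CPU, the same calls
# involve no CUDA synchronization at all:
grid_cpu = image_grid_thw.cpu()  # single transfer
sizes = grid_cpu.tolist()        # no sync
```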