Description
When converting the GPT-OSS model to the ONNX Runtime GenAI format, the conversion itself succeeds with Olive. However, running the converted model with AITK or Foundry Local fails:
- NPU: Error 5005 on Qualcomm Snapdragon (tracked in microsoft/Foundry-Local#67)
- GPU: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx.LoadModelAsync fails with the error below
Error for GPU:
2025-08-07 20:27:35.858 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1400] 2025-08-07T20:27:35.857568+08:00 Loading model:gpt-oss-20b-cuda-gpu
2025-08-07 20:27:35.865 [info] Error: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1402] 2025-08-07T20:27:35.8645887+08:00 Failed loading model:gpt-oss-20b-cuda-gpu error: [Load model from /home/lokinfey/.aitk/models/Microsoft/gpt-oss-20b-cuda-gpu/model.onnx failed:This is an invalid model. In Node, ("/model/layers.0/attn/GroupQueryAttention", GroupQueryAttention, "com.microsoft", -1) : ("/model/layers.0/attn/qkv_proj/Add/output_0": tensor(float16),"","","past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"","","model.layers.0.attn.sinks": tensor(float16),) -> ("/model/layers.0/attn/GroupQueryAttention/output_0": tensor(float16),"present.0.key": tensor(float16),"present.0.value": tensor(float16),) , Error Node(/model/layers.0/attn/GroupQueryAttention) with schema(com.microsoft::GroupQueryAttention:1) has input size 12 not in range [min=7, max=9]., at Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess(IntPtr) + 0x56
at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx.LoadModelAsync(String, String, String, CancellationToken) + 0x730
at Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderBase`1.<EnsureModelLoadedAsync>d__44.MoveNext() + 0x549]
2025-08-07 20:27:35.866 [info] Information: Microsoft.Neutron.OpenAI.Provider.OpenAIServiceProviderOnnx [1401] 2025-08-07T20:27:35.864944+08:00 Finish loading model:gpt-oss-20b-cuda-gpu elapsed time:00:00:00.0073828
2025-08-07 20:27:35.905 [error] Failed loading model gpt-oss-20b-cuda-gpu. Load model from /home/lokinfey/.aitk/models/Microsoft/gpt-oss-20b-cuda-gpu/model.onnx failed:This is an invalid model. In Node, ("/model/layers.0/attn/GroupQueryAttention", GroupQueryAttention, "com.microsoft", -1) : ("/model/layers.0/attn/qkv_proj/Add/output_0": tensor(float16),"","","past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),"","","model.layers.0.attn.sinks": tensor(float16),) -> ("/model/layers.0/attn/GroupQueryAttention/output_0": tensor(float16),"present.0.key": tensor(float16),"present.0.value": tensor(float16),) , Error Node(/model/layers.0/attn/GroupQueryAttention) with schema(com.microsoft::GroupQueryAttention:1) has input size 12 not in range [min=7, max=9].
(A second load attempt at 20:27:41 failed with the identical GroupQueryAttention error.)
To Reproduce
Steps to reproduce the behavior:
- Convert the GPT-OSS model using Olive (4 x H200 GPUs are required for the conversion)
- Run the converted model in the AITK Playground
- Observe the NPU or GPU error shown above
Expected behavior
The converted model should load and run in AITK and Foundry Local without errors.
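For triage, the structured fields in the runtime's failure text (offending node, actual input count, and the schema's accepted range) can be pulled out with a small parser. This is an illustrative helper assuming the message format shown in the logs above, not an API of onnxruntime or AITK:

```python
# Sketch: extract node name, input count, and accepted range from an
# onnxruntime "has input size N not in range" message (format as logged
# above; helper name is illustrative).
import re

_PATTERN = re.compile(
    r"Node\((?P<node>[^)]+)\) with schema\([^)]*\) "
    r"has input size (?P<size>\d+) not in range "
    r"\[min=(?P<min>\d+), max=(?P<max>\d+)\]"
)

def parse_input_size_error(message):
    """Return (node, input_size, min, max) or None if no match."""
    m = _PATTERN.search(message)
    if m is None:
        return None
    return m["node"], int(m["size"]), int(m["min"]), int(m["max"])
```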