Conversation

@girlsending0 commented Aug 27, 2025

Description:
This PR fixes an issue where stream_usage metadata was not being returned during invoke or stream calls for HuggingFace chat models.
I updated ChatHuggingFace (via ChatHuggingFaceWithUsage) to align with BaseChatOpenAI behavior, ensuring usage information is properly included in streaming outputs.

Issue: N/A (but addresses missing usage metadata in HuggingFace integration).

Dependencies: None

Twitter handle: None
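For illustration, a minimal sketch of the intended behavior after this change. The `repo_id` is a placeholder and `stream_usage` is the flag this PR introduces; the exact shape of the output may vary by provider:

```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

# Placeholder endpoint; any conversational model should work here.
llm = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="conversational",
)
model = ChatHuggingFace(llm=llm, stream_usage=True)

# With the fix, invoke() should populate usage_metadata
# (input/output/total token counts) instead of leaving it empty.
result = model.invoke("hello")
print(result.usage_metadata)
```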

vercel bot commented Aug 27, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| langchain | Ignored | Ignored | Preview | Aug 31, 2025 7:21am |


codspeed-hq bot commented Aug 27, 2025

CodSpeed WallTime Performance Report

Merging #32708 will not alter performance

Comparing girlsending0:fix/add_stream_usage (074af3b) with master (fcf7175)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 13 untouched benchmarks


codspeed-hq bot commented Aug 27, 2025

CodSpeed Instrumentation Performance Report

Merging #32708 will not alter performance

Comparing girlsending0:fix/add_stream_usage (074af3b) with master (fcf7175)

Summary

✅ 14 untouched benchmarks

@mdrxy changed the title to fix(huggingface): add stream_usage support for ChatHuggingFace invoke/stream on Aug 27, 2025
@mdrxy added the integration (Related to a provider partner package integration) label on Aug 27, 2025
@ccurme (Collaborator) left a comment


Would you mind sharing a reproducible snippet or adding a test to demonstrate the functionality?

It looks like token usage is already accessible via streaming when using HF Endpoints:

```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="openai/gpt-oss-120b",
    task="conversational",
    provider="fireworks-ai",
)

model = ChatHuggingFace(llm=llm)

full = None
for chunk in model.stream("hello"):
    full = chunk if full is None else full + chunk

full.usage_metadata
```

```diff
@@ -492,6 +492,9 @@ class GetPopulation(BaseModel):
     """Modify the likelihood of specified tokens appearing in the completion."""
     streaming: bool = False
     """Whether to stream the results or not."""
+    stream_usage: bool = False
```

Could we make this `stream_usage: Optional[bool] = None`?

(langchain-openai mistakenly did not do this)
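For reference, a minimal sketch of how an `Optional[bool]` field can be resolved at call time, modeled loosely on how `BaseChatOpenAI` resolves `stream_usage` (the class and helper names here are illustrative):

```python
from typing import Optional

from pydantic import BaseModel


class ChatModelSketch(BaseModel):
    # None means "unset by the user": a per-call value can override it,
    # and the fallback default lives in exactly one place.
    stream_usage: Optional[bool] = None

    def _should_stream_usage(self, stream_usage: Optional[bool] = None) -> bool:
        """Resolve precedence: per-call kwarg > instance field > default."""
        if stream_usage is not None:
            return stream_usage
        if self.stream_usage is not None:
            return self.stream_usage
        return False
```

With `bool = False`, there is no way to distinguish "user explicitly asked for False" from "user left the default", which is the pitfall the comment refers to.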

```python
def _stream(
    self,
    messages: list[BaseMessage],
    stop: Optional[list[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    *,
    stream_usage: Optional[bool] = True,
```

Could we implement this on `_astream` as well?
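For reference, a rough sketch of what a mirrored async path could look like, shown outside its class for brevity and mirroring the quoted `_stream` signature. `_should_stream_usage` and `_aiterate_chunks` are hypothetical stand-ins, and the `stream_options` mapping is an assumption borrowed from the OpenAI-style pattern, not code from this PR:

```python
from collections.abc import AsyncIterator
from typing import Any, Optional

from langchain_core.callbacks import AsyncCallbackManagerForLLMRun
from langchain_core.messages import BaseMessage
from langchain_core.outputs import ChatGenerationChunk


async def _astream(
    self,
    messages: list[BaseMessage],
    stop: Optional[list[str]] = None,
    run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
    *,
    stream_usage: Optional[bool] = None,
    **kwargs: Any,
) -> AsyncIterator[ChatGenerationChunk]:
    # Resolve the effective flag exactly as the sync _stream does;
    # _should_stream_usage is the hypothetical resolver sketched earlier.
    if self._should_stream_usage(stream_usage):
        # Assumption: the backend accepts an OpenAI-style request to
        # append a final usage chunk to the stream.
        kwargs["stream_options"] = {"include_usage": True}
    # _aiterate_chunks stands in for the async client loop that mirrors
    # the sync path, yielding chunks whose messages carry usage_metadata.
    async for chunk in self._aiterate_chunks(
        messages, stop=stop, run_manager=run_manager, **kwargs
    ):
        yield chunk
```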

@ccurme self-assigned this on Sep 9, 2025