fix(huggingface): add `stream_usage` support for ChatHuggingFace invoke/stream #32708
base: master
Conversation
CodSpeed WallTime Performance Report: merging #32708 will not alter performance.

CodSpeed Instrumentation Performance Report: merging #32708 will not alter performance.
Would you mind sharing a reproducible snippet or adding a test to demonstrate the functionality?
It looks like token usage is already accessible via streaming when using HF Endpoints:
```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="openai/gpt-oss-120b",
    task="conversational",
    provider="fireworks-ai",
)
model = ChatHuggingFace(llm=llm)

full = None
for chunk in model.stream("hello"):
    full = chunk if full is None else full + chunk
full.usage_metadata
```
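For reference, the aggregated `usage_metadata` is a `UsageMetadata` dict from `langchain_core`; a quick way to inspect it (the three keys below are the standard ones, actual counts vary by provider):

```python
# UsageMetadata is a TypedDict; these three keys are the standard ones.
usage = full.usage_metadata
if usage is not None:
    print(usage["input_tokens"], usage["output_tokens"], usage["total_tokens"])
```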
```diff
@@ -492,6 +492,9 @@ class GetPopulation(BaseModel):
     """Modify the likelihood of specified tokens appearing in the completion."""
     streaming: bool = False
     """Whether to stream the results or not."""
+    stream_usage: bool = False
```
Could we make this `stream_usage: Optional[bool] = None`? (langchain-openai mistakenly did not do this.)
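A minimal sketch of why `Optional[bool] = None` is preferable to `bool = False` here — `None` means "unset", so a per-call kwarg or a future library default can still win. The class and resolver names are hypothetical, not the actual langchain-huggingface code:

```python
from typing import Optional


class UsageConfig:
    """Toy model attribute: None means "unset", not "disabled"."""

    stream_usage: Optional[bool] = None

    def resolve_stream_usage(self, per_call: Optional[bool] = None) -> bool:
        # Precedence: explicit per-call kwarg > model attribute > library default.
        if per_call is not None:
            return per_call
        if self.stream_usage is not None:
            return self.stream_usage
        return False


cfg = UsageConfig()
assert cfg.resolve_stream_usage() is False               # nothing set: default off
assert cfg.resolve_stream_usage(per_call=True) is True   # call site can override
```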
```python
def _stream(
    self,
    messages: list[BaseMessage],
    stop: Optional[list[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    *,
    stream_usage: Optional[bool] = True,
```
Could we implement this on `_astream` as well?
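A hedged sketch of what that could look like, mirroring the `_stream` signature above. The `_should_stream_usage` helper and the `stream_options` plumbing are assumptions borrowed from the OpenAI-style pattern, not the actual langchain-huggingface internals:

```python
from collections.abc import AsyncIterator
from typing import Any, Optional

from langchain_core.callbacks import AsyncCallbackManagerForLLMRun
from langchain_core.messages import BaseMessage
from langchain_core.outputs import ChatGenerationChunk


async def _astream(
    self,
    messages: list[BaseMessage],
    stop: Optional[list[str]] = None,
    run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
    *,
    stream_usage: Optional[bool] = None,
    **kwargs: Any,
) -> AsyncIterator[ChatGenerationChunk]:
    # Hypothetical helper: resolve the per-call flag against the model attribute.
    if self._should_stream_usage(stream_usage, **kwargs):
        # OpenAI-compatible request for a trailing usage chunk; whether the HF
        # client forwards stream_options is an assumption of this sketch.
        kwargs["stream_options"] = {"include_usage": True}
    # Iterate the async client stream (details elided), yielding
    # ChatGenerationChunk objects and attaching usage_metadata to the chunk
    # that carries the server-reported usage.
    ...
```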
Description:
This PR fixes an issue where `stream_usage` metadata was not being returned during `invoke` or `stream` calls for HuggingFace chat models. I updated `ChatHuggingFace` (via `ChatHuggingFaceWithUsage`) to align with `BaseChatOpenAI` behavior, ensuring usage information is properly included in streaming outputs.

Issue: N/A (but addresses missing usage metadata in the HuggingFace integration).

Dependencies: None

Twitter handle: None
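A sketch of the intended end-state behavior described above, assuming the PR's `stream_usage` flag is wired through as proposed (token counts are provider-dependent):

```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="openai/gpt-oss-120b",
    task="conversational",
    provider="fireworks-ai",
)
model = ChatHuggingFace(llm=llm, stream_usage=True)

# invoke: usage should come back on the message itself.
msg = model.invoke("hello")
print(msg.usage_metadata)

# stream: usage should appear on the aggregated chunks.
full = None
for chunk in model.stream("hello"):
    full = chunk if full is None else full + chunk
print(full.usage_metadata)
```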