
Adjust HPU warmup: use dummy inputs with shapes closer to the real scenario #689

Merged
merged 1 commit into from
Aug 8, 2025

Conversation

kaixuanliu
Contributor

@kaixuanliu kaixuanliu commented Jul 29, 2025

The original implementation warms up with dummy inputs of shapes such as [1, 128], [1, 256], [2, 128], and [2, 256], aiming to generate the recipe cache during the warmup stage. In real serving, we pad the input_ids/attention_masks to the shapes cached during warmup. However, we found a precision issue with reranker models when following the TEI docs. We suspect that the wrong graph/recipe was being used during the replay stage. This PR therefore adjusts the create_warmup_batch function so that the dummy inputs are closer to the real scenario: during warmup, these inputs are now also padded in the Python backend, so the recipe cache/graph generated at warmup is the same as the one used in the serving stage. We ran several rounds of experiments, and the wrong-output issue disappears with this PR.
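To illustrate the idea, here is a minimal sketch of routing warmup inputs through the same padding path as serving inputs. It is not the TEI implementation: the bucket values are only the shapes quoted above, and `round_up_to_bucket`, `pad_batch`, and `make_warmup_batch` are hypothetical names that use plain Python lists instead of tensors. Padding the batch dimension with all-padding rows is also an assumption made for illustration.

```python
from bisect import bisect_left

# Hypothetical buckets, taken from the shapes mentioned in the description.
BATCH_BUCKETS = [1, 2]
SEQ_BUCKETS = [128, 256]


def round_up_to_bucket(value, buckets):
    """Return the smallest bucket >= value (buckets must be sorted ascending)."""
    idx = bisect_left(buckets, value)
    if idx == len(buckets):
        raise ValueError(f"{value} exceeds the largest bucket {buckets[-1]}")
    return buckets[idx]


def pad_batch(input_ids, pad_token_id=0):
    """Pad a ragged batch of token-id lists to a bucketed [batch, seq] shape.

    Returns (padded_input_ids, attention_mask) as nested lists.
    """
    batch = round_up_to_bucket(len(input_ids), BATCH_BUCKETS)
    seq = round_up_to_bucket(max(len(row) for row in input_ids), SEQ_BUCKETS)
    padded, mask = [], []
    for row in input_ids:
        pad = seq - len(row)
        padded.append(list(row) + [pad_token_id] * pad)
        mask.append([1] * len(row) + [0] * pad)
    # Assumption: missing batch rows are filled with all-padding rows.
    for _ in range(batch - len(input_ids)):
        padded.append([pad_token_id] * seq)
        mask.append([0] * seq)
    return padded, mask


def make_warmup_batch(batch_size, real_len, token_id=1):
    """Build a dummy batch at an unpadded length, then pad it exactly like a
    serving request, so warmup and serving see the same shapes and mask layout."""
    dummy = [[token_id] * real_len for _ in range(batch_size)]
    return pad_batch(dummy)
```

For example, `make_warmup_batch(1, 100)` yields a [1, 128] batch whose attention mask contains real padding, rather than a fully dense [1, 128] dummy input that never occurs at serving time.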

@kaixuanliu
Contributor Author

@regisss , pls help review, thx!

…nario to avoid wrong output from reranker model

Signed-off-by: Liu, Kaixuan <[email protected]>
Collaborator

@regisss regisss left a comment

LGTM

@regisss
Collaborator

regisss commented Aug 8, 2025

cc @Narsil
The methods warmup_hpu and create_warmup_batch are only used on HPU

@regisss regisss merged commit c8ff435 into huggingface:main Aug 8, 2025
2 of 13 checks passed
2 participants