This repository was archived by the owner on Jan 24, 2024. It is now read-only.

phlrain commented Apr 3, 2023

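In this PR, cinn/ir/fuse_block_model_fp16_test.cc is a test case for softmax in fp16.
Its kernel time is 86 µs, close to the phi kernel's 82 µs, but slower than torch's 77.47 µs.

The cause is that some for loops have not been merged and need further merging. After merging them by hand, the measured kernel time is 75 µs, which is on par with the torch implementation.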
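To illustrate what merging for loops means for softmax, here is a minimal CPU sketch in plain C++. It is not the kernel CINN generates and not the code in this PR: the function names `softmax_separate_loops` and `softmax_merged_loops` are hypothetical, `float` stands in for fp16, and the PR does not say which loops were merged by hand, so the choice below (folding the exponentiation into the sum reduction) is an assumption made only for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unmerged form: each elementwise step and each reduction has its own loop,
// and the exponentials go through a temporary buffer.
std::vector<float> softmax_separate_loops(const std::vector<float>& x) {
  assert(!x.empty());
  const std::size_t n = x.size();

  // Loop 1: row maximum, for numerical stability.
  float max_val = x[0];
  for (std::size_t i = 1; i < n; ++i) max_val = std::max(max_val, x[i]);

  // Loop 2: exponentials written to a temporary.
  std::vector<float> e(n);
  for (std::size_t i = 0; i < n; ++i) e[i] = std::exp(x[i] - max_val);

  // Loop 3: sum of the exponentials.
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) sum += e[i];

  // Loop 4: normalization.
  std::vector<float> y(n);
  for (std::size_t i = 0; i < n; ++i) y[i] = e[i] / sum;
  return y;
}

// Merged form (illustrative choice): loops 2 and 3 share one pass, so each
// exponential is accumulated right after it is computed and the temporary
// buffer disappears.
std::vector<float> softmax_merged_loops(const std::vector<float>& x) {
  assert(!x.empty());
  const std::size_t n = x.size();

  float max_val = x[0];
  for (std::size_t i = 1; i < n; ++i) max_val = std::max(max_val, x[i]);

  std::vector<float> y(n);
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    y[i] = std::exp(x[i] - max_val);
    sum += y[i];
  }

  for (std::size_t i = 0; i < n; ++i) y[i] /= sum;
  return y;
}
```

Both functions perform the same arithmetic in the same order, so they should return the same values; the merged form only removes one pass over the data and one temporary buffer. On a GPU the analogous saving would be fewer trips through memory and less loop overhead, though how much of the 86 µs to 75 µs gap any particular merge accounts for is not something this sketch can show.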

paddle-bot bot commented Apr 3, 2023

Thanks for your contribution!
