JIT-inline F64SIN, F64COS, and F64TAN for reduced precision x87 path#5343
pmatos wants to merge 3 commits into FEX-Emu:main
I didn't manage to replicate the exact algorithm in the advsimd routines due to a lack of registers, but I managed to rewrite it in a similar form with the available registers and got some good results. I am going to see whether I can get similar results by jitting other f64 operations.
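For readers unfamiliar with the shape of these routines, here is a minimal scalar sketch of the general approach: range reduction by pi followed by an odd polynomial. It is not the code from this PR or from the advsimd routines. The name `approx_sin` is hypothetical, and the coefficients are plain Taylor terms rather than tuned minimax coefficients, so it only illustrates the structure.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical scalar model of a reduced-precision sine.
// Not FEX's emitted code; coefficients are Taylor terms, an assumption
// made for illustration only.
double approx_sin(double x) {
  constexpr double InvPi = 0.318309886183790671538;  // 1/pi
  constexpr double PiHi = 3.141592653589793116;      // pi rounded to double
  constexpr double PiLo = 1.2246467991473532e-16;    // pi - PiHi

  // Range reduction: x = n*pi + r with r in roughly [-pi/2, pi/2].
  const double n = std::nearbyint(x * InvPi);
  double r = std::fma(-n, PiHi, x);
  r = std::fma(-n, PiLo, r);

  // sin(r) ~= r + r^3 * p(r^2), evaluated with Horner's scheme.
  const double r2 = r * r;
  double p = -1.0 / 1307674368000.0;          // -1/15!
  p = std::fma(p, r2, 1.0 / 6227020800.0);    //  1/13!
  p = std::fma(p, r2, -1.0 / 39916800.0);     // -1/11!
  p = std::fma(p, r2, 1.0 / 362880.0);        //  1/9!
  p = std::fma(p, r2, -1.0 / 5040.0);         // -1/7!
  p = std::fma(p, r2, 1.0 / 120.0);           //  1/5!
  p = std::fma(p, r2, -1.0 / 6.0);            // -1/3!
  const double y = std::fma(p * r2, r, r);

  // sin(n*pi + r) = (-1)^n * sin(r)
  const auto ni = static_cast<std::int64_t>(n);
  return (ni & 1) ? -y : y;
}
```

With these Taylor terms the truncation error on the reduced interval is on the order of 1e-11, well short of full double precision, and the sketch assumes inputs of moderate magnitude so that `n` fits comfortably in a 64-bit integer.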
Benchmark results (figures not preserved): AmpereA1A, before and after; Cortex-A720/Radxa, after.

Looks like Cortex-A720 is worse off in SINCOS because of the code emission. It would be interesting to keep it in the dispatcher and branch out to it so we don't annihilate the icache, kind of like the conversion operations.
Ufff - inside the dispatcher I could use more registers, so I tried to make the code closer to the advsimd routines in the Arm optimized-routines repo. I have done the same for other operations and will push them as separate PRs. If you run the same microbenchmarks as earlier, do you get better results?
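The icache argument behind moving the routine into the dispatcher can be sketched with a rough footprint model. Everything here is an assumption for illustration: `RoutineSizes`, `InlineFootprint`, and `SharedFootprint` are hypothetical names, and the byte counts used with them are placeholders rather than measurements from FEX.

```cpp
#include <cstddef>

// All sizes are hypothetical placeholders, not measurements from FEX.
struct RoutineSizes {
  std::size_t body_bytes;  // size of the sin/cos/tan instruction sequence
  std::size_t call_bytes;  // size of the branch sequence at each use site
};

// Inline emission: every use site gets its own copy of the routine body.
constexpr std::size_t InlineFootprint(RoutineSizes s, std::size_t sites) {
  return s.body_bytes * sites;
}

// Shared emission: one copy lives with the dispatcher and each use site
// only pays for the branch to it.
constexpr std::size_t SharedFootprint(RoutineSizes s, std::size_t sites) {
  return s.body_bytes + s.call_bytes * sites;
}
```

With a placeholder 256-byte body and 8-byte branch sequence, 100 use sites would cost 25600 bytes inline versus 1056 bytes shared. The shared form adds a branch per call, so which one wins on a given core still has to be measured.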
Much better! So I guess the main question now is whether the precision difference causes real problems.
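One way to put a number on the precision difference is to sweep an interval and record the worst-case distance in ULPs between the approximation and a reference. The sketch below is a generic harness under that assumption; `ordered_bits`, `ulp_distance`, and `max_ulp_error` are hypothetical helper names and not part of FEX or its test suite.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Map a finite double onto an integer line that preserves ordering,
// so adjacent doubles differ by exactly 1 and +0.0 and -0.0 coincide.
inline std::int64_t ordered_bits(double v) {
  std::int64_t i;
  std::memcpy(&i, &v, sizeof i);
  return i < 0 ? std::numeric_limits<std::int64_t>::min() - i : i;
}

// Number of representable doubles between a and b (finite inputs only).
inline std::uint64_t ulp_distance(double a, double b) {
  const std::int64_t ia = ordered_bits(a);
  const std::int64_t ib = ordered_bits(b);
  return ia > ib
             ? static_cast<std::uint64_t>(ia) - static_cast<std::uint64_t>(ib)
             : static_cast<std::uint64_t>(ib) - static_cast<std::uint64_t>(ia);
}

// Worst ULP distance between f and ref over steps+1 evenly spaced points
// in [lo, hi]. Requires steps >= 1.
template <typename F, typename Ref>
std::uint64_t max_ulp_error(F f, Ref ref, double lo, double hi, int steps) {
  std::uint64_t worst = 0;
  for (int i = 0; i <= steps; ++i) {
    const double x = lo + (hi - lo) * i / steps;
    worst = std::max(worst, ulp_distance(f(x), ref(x)));
  }
  return worst;
}
```

ULP distance is a harsh metric near the zero crossings of sine, where a tiny absolute error spans many representable values, so it may be worth reporting absolute error alongside it.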
Oops, looks like something about this implementation stops Mirror's Edge from running.
That's odd - thanks for pointing that out.