We observe graph breaks when running:

```sh
TORCH_LOGS="graph_breaks" python thunder/benchmarks/benchmark_inference.py --input-length 32 --output-length 32 --mode thunder --num-iterations 10
```

These graph breaks can result in increased latency in the decode stage.
Thanks @wujingyue for highlighting this.
Request: Investigate these graph breaks and the reported split reasons, and fix them where possible.
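For the investigation, one option beyond reading the `TORCH_LOGS="graph_breaks"` output is `torch._dynamo.explain`, which runs Dynamo without executing a backend and reports graph-break counts and reasons. A minimal sketch below; `ToyModel` and `x` are hypothetical stand-ins, not the module that `benchmark_inference.py` actually compiles:

```python
# Sketch: enumerating graph-break reasons with torch._dynamo.explain.
# ToyModel is a hypothetical stand-in; point this at the real compiled module.
import torch

class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(32, 32)

    def forward(self, x):
        y = self.linear(x)
        # A data-dependent Python branch like this .item() check is a
        # typical source of a graph break under torch.compile.
        if y.sum().item() > 0:
            return y * 2
        return y

model = ToyModel()
x = torch.randn(4, 32)

# explain() traces the function and reports where and why the FX graph was split.
explanation = torch._dynamo.explain(model)(x)
print("graph breaks:", explanation.graph_break_count)
for reason in explanation.break_reasons:
    print(reason)
```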
cc @IvanYashchuk for assignment.
cc @crcrpar