-
Notifications
You must be signed in to change notification settings - Fork 297
Open
Description
Describe the issue
I use torch.profiler to see every op's run time on XPU. here is my test code , i use
activities = [ProfilerActivity.CPU, ProfilerActivity.XPU]
but terminal only output CPU's information.
code
import torch
import torchvision.models as models
from torch.profiler import profile, ProfilerActivity, record_function
# import intel_extension_for_pytorch as ipex # 确保安装了Intel Extension for PyTorch
# print(f"IPEX version: {ipex.__version__}")
# from ipex_llm import optimize_model
# 检查XPU设备是否可用
if not hasattr(torch, 'xpu') or not torch.xpu.is_available():
print("Intel XPU device is not available. Please install Intel Extension for PyTorch.")
print("You can install it using: pip install intel-extension-for-pytorch")
exit(1)
# 设置设备为XPU
device = "xpu"
print(f"Using device: {device}")
# 创建模型并移动到XPU设备
model = models.resnet18()
model = model.to(device)
# model = optimize_model(model)
# model = ipex.optimize(model, dtype=torch.float32)
# 创建输入并移动到XPU设备
inputs = torch.randn(5, 3, 224, 224).to(device)
activities = [ProfilerActivity.CPU, ProfilerActivity.XPU]
# 1. 基本性能分析
print("===== Basic Profiling on XPU =====")
with profile(activities=activities, record_shapes=True) as prof:
with record_function("model_inference"):
model(inputs)
# 打印关键指标表格
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
# 2. 内存分析
print("\n===== Memory Profiling on XPU =====")
with profile(activities=activities, profile_memory=True, record_shapes=True) as prof:
model(inputs)
print("\n===== XPU Time Analysis =====")
print(prof.key_averages().table(sort_by="xpu_time_total", row_limit=15))
print("\n===== XPU Memory Analysis =====")
print(prof.key_averages().table(sort_by="self_xpu_memory_usage", row_limit=15))
# 3. 导出Chrome trace
print("\n===== Exporting Chrome Trace =====")
with profile(activities=activities) as prof:
model(inputs)
prof.export_chrome_trace("xpu_trace.json")
print("Trace saved to xpu_trace.json")
# 4. 长运行作业分析
print("\n===== Profiling Long-running Job on XPU =====")
def trace_handler(p):
output = p.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(f"Profiling step {p.step_num}:")
print(output)
p.export_chrome_trace(f"xpu_trace_step_{p.step_num}.json")
with profile(
activities=activities,
schedule=torch.profiler.schedule(wait=1, warmup=1, active=2),
on_trace_ready=trace_handler,
) as p:
for idx in range(8):
model(inputs)
p.step()
output
[W902 11:14:24.000000000 OperatorEntry.cpp:161] Warning: Warning only once for all operators, other operators may also be overridden.
Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::_validate_compressed_sparse_indices(bool is_crow, Tensor compressed_idx, Tensor plain_idx, int cdim, int dim, int nnz) -> ()
registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\build\aten\src\ATen\RegisterSchema.cpp:6
dispatch key: XPU
previous kernel: registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\build\aten\src\ATen\RegisterCPU.cpp:30477
new kernel: registered at G:\frameworks.ai.pytorch.ipex-gpu\build\Release\csrc\gpu\csrc\aten\generated\ATen\RegisterXPU.cpp:468 (function operator ())
Using device: xpu
===== Basic Profiling on XPU =====
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
model_inference 0.97% 8.983ms 100.00% 926.351ms 926.351ms 1
aten::conv2d 0.08% 753.988us 44.31% 410.499ms 20.525ms 20
aten::convolution 0.21% 1.990ms 44.23% 409.745ms 20.487ms 20
aten::_convolution 0.04% 392.795us 44.02% 407.755ms 20.388ms 20
aten::convolution_overrideable 43.39% 401.918ms 43.97% 407.362ms 20.368ms 20
aten::add_ 19.33% 179.090ms 19.33% 179.090ms 6.396ms 28
aten::adaptive_avg_pool2d 0.02% 184.097us 17.13% 158.657ms 158.657ms 1
aten::mean 17.11% 158.473ms 17.11% 158.473ms 158.473ms 1
aten::batch_norm 0.03% 312.396us 11.03% 102.222ms 5.111ms 20
aten::_batch_norm_impl_index 0.08% 719.290us 11.00% 101.910ms 5.095ms 20
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 926.351ms
===== Memory Profiling on XPU =====
===== XPU Time Analysis =====
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg CPU Mem Self CPU Mem # of Calls
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::conv2d 0.72% 66.599us 58.21% 5.361ms 268.065us 0 b 0 b 20
aten::convolution 2.04% 188.197us 57.48% 5.295ms 264.735us 0 b 0 b 20
aten::_convolution 1.26% 115.799us 55.44% 5.107ms 255.325us 0 b 0 b 20
aten::convolution_overrideable 53.28% 4.908ms 54.18% 4.991ms 249.535us 0 b 0 b 20
aten::empty 2.07% 190.898us 2.07% 190.898us 2.386us 0 b 0 b 80
[memory] 0.00% 0.000us 0.00% 0.000us 0.000us 0 b 0 b 40
aten::add_ 7.85% 723.487us 7.85% 723.487us 25.839us 0 b 0 b 28
aten::batch_norm 0.52% 48.197us 20.57% 1.895ms 94.758us 0 b 0 b 20
aten::_batch_norm_impl_index 1.11% 101.999us 20.05% 1.847ms 92.349us 0 b 0 b 20
aten::native_batch_norm 15.41% 1.420ms 18.45% 1.700ms 84.979us 0 b 0 b 20
aten::empty_like 0.52% 48.299us 1.44% 132.898us 6.645us 0 b 0 b 20
aten::empty_strided 0.92% 84.599us 0.92% 84.599us 4.230us 0 b 0 b 20
aten::reshape 0.48% 44.099us 0.85% 78.498us 1.962us 0 b 0 b 40
aten::view 0.50% 45.999us 0.50% 45.999us 0.754us 0 b 0 b 61
aten::relu_ 1.22% 112.597us 10.22% 941.084us 55.358us 0 b 0 b 17
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 9.211ms
===== XPU Memory Analysis =====
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg CPU Mem Self CPU Mem # of Calls
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::conv2d 0.72% 66.599us 58.21% 5.361ms 268.065us 0 b 0 b 20
aten::convolution 2.04% 188.197us 57.48% 5.295ms 264.735us 0 b 0 b 20
aten::_convolution 1.26% 115.799us 55.44% 5.107ms 255.325us 0 b 0 b 20
aten::convolution_overrideable 53.28% 4.908ms 54.18% 4.991ms 249.535us 0 b 0 b 20
aten::empty 2.07% 190.898us 2.07% 190.898us 2.386us 0 b 0 b 80
[memory] 0.00% 0.000us 0.00% 0.000us 0.000us 0 b 0 b 40
aten::add_ 7.85% 723.487us 7.85% 723.487us 25.839us 0 b 0 b 28
aten::batch_norm 0.52% 48.197us 20.57% 1.895ms 94.758us 0 b 0 b 20
aten::_batch_norm_impl_index 1.11% 101.999us 20.05% 1.847ms 92.349us 0 b 0 b 20
aten::native_batch_norm 15.41% 1.420ms 18.45% 1.700ms 84.979us 0 b 0 b 20
aten::empty_like 0.52% 48.299us 1.44% 132.898us 6.645us 0 b 0 b 20
aten::empty_strided 0.92% 84.599us 0.92% 84.599us 4.230us 0 b 0 b 20
aten::reshape 0.48% 44.099us 0.85% 78.498us 1.962us 0 b 0 b 40
aten::view 0.50% 45.999us 0.50% 45.999us 0.754us 0 b 0 b 61
aten::relu_ 1.22% 112.597us 10.22% 941.084us 55.358us 0 b 0 b 17
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 9.211ms
===== Exporting Chrome Trace =====
Trace saved to xpu_trace.json
===== Profiling Long-running Job on XPU =====
Profiling step 4:
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
ProfilerStep* 16.12% 2.507ms 100.00% 15.547ms 7.773ms 2
aten::batch_norm 0.85% 132.098us 34.00% 5.286ms 132.160us 40
aten::_batch_norm_impl_index 0.79% 122.897us 33.15% 5.154ms 128.858us 40
aten::native_batch_norm 28.94% 4.499ms 31.92% 4.962ms 124.061us 40
aten::conv2d 0.44% 68.598us 25.41% 3.951ms 98.781us 40
aten::convolution 1.50% 233.198us 24.97% 3.883ms 97.066us 40
aten::_convolution 0.99% 153.196us 23.47% 3.649ms 91.236us 40
aten::convolution_overrideable 21.46% 3.336ms 22.49% 3.496ms 87.406us 40
aten::add_ 15.52% 2.412ms 15.52% 2.412ms 43.074us 56
aten::relu_ 1.28% 198.794us 5.64% 877.582us 25.811us 34
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 15.547ms
Profiling step 8:
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
ProfilerStep* 19.75% 2.728ms 100.00% 13.811ms 6.905ms 2
aten::conv2d 0.51% 70.799us 29.17% 4.028ms 100.701us 40
aten::convolution 1.76% 243.296us 28.65% 3.957ms 98.931us 40
aten::batch_norm 1.40% 193.097us 28.63% 3.954ms 98.861us 40
aten::_batch_norm_impl_index 0.84% 115.900us 27.23% 3.761ms 94.034us 40
aten::_convolution 1.12% 155.194us 26.89% 3.714ms 92.849us 40
aten::native_batch_norm 22.41% 3.096ms 25.92% 3.580ms 89.506us 40
aten::convolution_overrideable 24.75% 3.419ms 25.77% 3.559ms 88.969us 40
aten::add_ 10.19% 1.408ms 10.19% 1.408ms 25.139us 56
aten::relu_ 1.86% 256.997us 8.26% 1.141ms 33.552us 34
---------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 13.811ms
Metadata
Metadata
Assignees
Labels
No labels