Skip to content

How to profile on XPU #856

@xiaohoua

Description

@xiaohoua

Describe the issue

I use torch.profiler to see every op's run time on XPU. here is my test code , i use
activities = [ProfilerActivity.CPU, ProfilerActivity.XPU] but terminal only output CPU's information.

code

import torch
import torchvision.models as models
from torch.profiler import profile, ProfilerActivity, record_function
# import intel_extension_for_pytorch as ipex  # 确保安装了Intel Extension for PyTorch
# print(f"IPEX version: {ipex.__version__}")
# from ipex_llm import optimize_model

# 检查XPU设备是否可用
if not hasattr(torch, 'xpu') or not torch.xpu.is_available():
    print("Intel XPU device is not available. Please install Intel Extension for PyTorch.")
    print("You can install it using: pip install intel-extension-for-pytorch")
    exit(1)

# 设置设备为XPU
device = "xpu"
print(f"Using device: {device}")

# 创建模型并移动到XPU设备
model = models.resnet18()
model = model.to(device)

# model = optimize_model(model)
# model = ipex.optimize(model, dtype=torch.float32)

# 创建输入并移动到XPU设备
inputs = torch.randn(5, 3, 224, 224).to(device)

activities = [ProfilerActivity.CPU, ProfilerActivity.XPU]

# 1. 基本性能分析
print("===== Basic Profiling on XPU =====")
with profile(activities=activities, record_shapes=True) as prof:
    with record_function("model_inference"):
        model(inputs)
        
# 打印关键指标表格
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

# 2. 内存分析
print("\n===== Memory Profiling on XPU =====")
with profile(activities=activities, profile_memory=True, record_shapes=True) as prof:
    model(inputs)
    
print("\n===== XPU Time Analysis =====")
print(prof.key_averages().table(sort_by="xpu_time_total", row_limit=15))

print("\n===== XPU Memory Analysis =====")
print(prof.key_averages().table(sort_by="self_xpu_memory_usage", row_limit=15))


# 3. 导出Chrome trace
print("\n===== Exporting Chrome Trace =====")
with profile(activities=activities) as prof:
    model(inputs)
    
prof.export_chrome_trace("xpu_trace.json")
print("Trace saved to xpu_trace.json")

# 4. 长运行作业分析
print("\n===== Profiling Long-running Job on XPU =====")
def trace_handler(p):
    output = p.key_averages().table(sort_by="cpu_time_total", row_limit=10)
    print(f"Profiling step {p.step_num}:")
    print(output)
    p.export_chrome_trace(f"xpu_trace_step_{p.step_num}.json")

with profile(
    activities=activities,
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=2),
    on_trace_ready=trace_handler,
) as p:
    for idx in range(8):
        model(inputs)
        p.step()
output

[W902 11:14:24.000000000 OperatorEntry.cpp:161] Warning: Warning only once for all operators,  other operators may also be overridden.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::_validate_compressed_sparse_indices(bool is_crow, Tensor compressed_idx, Tensor plain_idx, int cdim, int dim, int nnz) -> ()
    registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\build\aten\src\ATen\RegisterSchema.cpp:6
  dispatch key: XPU
  previous kernel: registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\build\aten\src\ATen\RegisterCPU.cpp:30477
       new kernel: registered at G:\frameworks.ai.pytorch.ipex-gpu\build\Release\csrc\gpu\csrc\aten\generated\ATen\RegisterXPU.cpp:468 (function operator ())
Using device: xpu
===== Basic Profiling on XPU =====
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                              Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                   model_inference         0.97%       8.983ms       100.00%     926.351ms     926.351ms             1
                      aten::conv2d         0.08%     753.988us        44.31%     410.499ms      20.525ms            20
                 aten::convolution         0.21%       1.990ms        44.23%     409.745ms      20.487ms            20
                aten::_convolution         0.04%     392.795us        44.02%     407.755ms      20.388ms            20
    aten::convolution_overrideable        43.39%     401.918ms        43.97%     407.362ms      20.368ms            20
                        aten::add_        19.33%     179.090ms        19.33%     179.090ms       6.396ms            28
         aten::adaptive_avg_pool2d         0.02%     184.097us        17.13%     158.657ms     158.657ms             1
                        aten::mean        17.11%     158.473ms        17.11%     158.473ms     158.473ms             1
                  aten::batch_norm         0.03%     312.396us        11.03%     102.222ms       5.111ms            20
      aten::_batch_norm_impl_index         0.08%     719.290us        11.00%     101.910ms       5.095ms            20
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 926.351ms


===== Memory Profiling on XPU =====

===== XPU Time Analysis =====
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                              Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                      aten::conv2d         0.72%      66.599us        58.21%       5.361ms     268.065us           0 b           0 b            20
                 aten::convolution         2.04%     188.197us        57.48%       5.295ms     264.735us           0 b           0 b            20
                aten::_convolution         1.26%     115.799us        55.44%       5.107ms     255.325us           0 b           0 b            20
    aten::convolution_overrideable        53.28%       4.908ms        54.18%       4.991ms     249.535us           0 b           0 b            20
                       aten::empty         2.07%     190.898us         2.07%     190.898us       2.386us           0 b           0 b            80
                          [memory]         0.00%       0.000us         0.00%       0.000us       0.000us           0 b           0 b            40
                        aten::add_         7.85%     723.487us         7.85%     723.487us      25.839us           0 b           0 b            28
                  aten::batch_norm         0.52%      48.197us        20.57%       1.895ms      94.758us           0 b           0 b            20
      aten::_batch_norm_impl_index         1.11%     101.999us        20.05%       1.847ms      92.349us           0 b           0 b            20
           aten::native_batch_norm        15.41%       1.420ms        18.45%       1.700ms      84.979us           0 b           0 b            20
                  aten::empty_like         0.52%      48.299us         1.44%     132.898us       6.645us           0 b           0 b            20
               aten::empty_strided         0.92%      84.599us         0.92%      84.599us       4.230us           0 b           0 b            20
                     aten::reshape         0.48%      44.099us         0.85%      78.498us       1.962us           0 b           0 b            40
                        aten::view         0.50%      45.999us         0.50%      45.999us       0.754us           0 b           0 b            61
                       aten::relu_         1.22%     112.597us        10.22%     941.084us      55.358us           0 b           0 b            17
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 9.211ms


===== XPU Memory Analysis =====
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                              Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                      aten::conv2d         0.72%      66.599us        58.21%       5.361ms     268.065us           0 b           0 b            20
                 aten::convolution         2.04%     188.197us        57.48%       5.295ms     264.735us           0 b           0 b            20
                aten::_convolution         1.26%     115.799us        55.44%       5.107ms     255.325us           0 b           0 b            20
    aten::convolution_overrideable        53.28%       4.908ms        54.18%       4.991ms     249.535us           0 b           0 b            20
                       aten::empty         2.07%     190.898us         2.07%     190.898us       2.386us           0 b           0 b            80
                          [memory]         0.00%       0.000us         0.00%       0.000us       0.000us           0 b           0 b            40
                        aten::add_         7.85%     723.487us         7.85%     723.487us      25.839us           0 b           0 b            28
                  aten::batch_norm         0.52%      48.197us        20.57%       1.895ms      94.758us           0 b           0 b            20
      aten::_batch_norm_impl_index         1.11%     101.999us        20.05%       1.847ms      92.349us           0 b           0 b            20
           aten::native_batch_norm        15.41%       1.420ms        18.45%       1.700ms      84.979us           0 b           0 b            20
                  aten::empty_like         0.52%      48.299us         1.44%     132.898us       6.645us           0 b           0 b            20
               aten::empty_strided         0.92%      84.599us         0.92%      84.599us       4.230us           0 b           0 b            20
                     aten::reshape         0.48%      44.099us         0.85%      78.498us       1.962us           0 b           0 b            40
                        aten::view         0.50%      45.999us         0.50%      45.999us       0.754us           0 b           0 b            61
                       aten::relu_         1.22%     112.597us        10.22%     941.084us      55.358us           0 b           0 b            17
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 9.211ms


===== Exporting Chrome Trace =====
Trace saved to xpu_trace.json

===== Profiling Long-running Job on XPU =====
Profiling step 4:
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                              Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                     ProfilerStep*        16.12%       2.507ms       100.00%      15.547ms       7.773ms             2
                  aten::batch_norm         0.85%     132.098us        34.00%       5.286ms     132.160us            40
      aten::_batch_norm_impl_index         0.79%     122.897us        33.15%       5.154ms     128.858us            40
           aten::native_batch_norm        28.94%       4.499ms        31.92%       4.962ms     124.061us            40
                      aten::conv2d         0.44%      68.598us        25.41%       3.951ms      98.781us            40
                 aten::convolution         1.50%     233.198us        24.97%       3.883ms      97.066us            40
                aten::_convolution         0.99%     153.196us        23.47%       3.649ms      91.236us            40
    aten::convolution_overrideable        21.46%       3.336ms        22.49%       3.496ms      87.406us            40
                        aten::add_        15.52%       2.412ms        15.52%       2.412ms      43.074us            56
                       aten::relu_         1.28%     198.794us         5.64%     877.582us      25.811us            34
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 15.547ms

Profiling step 8:
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  
                              Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                     ProfilerStep*        19.75%       2.728ms       100.00%      13.811ms       6.905ms             2
                      aten::conv2d         0.51%      70.799us        29.17%       4.028ms     100.701us            40
                 aten::convolution         1.76%     243.296us        28.65%       3.957ms      98.931us            40
                  aten::batch_norm         1.40%     193.097us        28.63%       3.954ms      98.861us            40
      aten::_batch_norm_impl_index         0.84%     115.900us        27.23%       3.761ms      94.034us            40
                aten::_convolution         1.12%     155.194us        26.89%       3.714ms      92.849us            40
           aten::native_batch_norm        22.41%       3.096ms        25.92%       3.580ms      89.506us            40
    aten::convolution_overrideable        24.75%       3.419ms        25.77%       3.559ms      88.969us            40
                        aten::add_        10.19%       1.408ms        10.19%       1.408ms      25.139us            56
                       aten::relu_         1.86%     256.997us         8.26%       1.141ms      33.552us            34
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 13.811ms


Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions