Commit b2fc9a2
[None][feat] AutoDeploy ONNX export
[none][feat] Add AutoDeploy export-onnx mode
Add a new mode "export-onnx" to AutoDeploy.
The mode is almost identical to the default one, with two differences:
1. Fuse torch_rope_with_explicit_cos_sin &
torch_cached_attention_with_cache into onnx_rope_attention
2. The result is not a TRT engine but a .onnx file
Files added:
- export_onnx.py: The transformation to fuse the ops
- graph_module_visualizer.py: Convert GraphModule to .dot
- examples/onnx_export_llm.py: Example usage
- onnx_driveos_llm.yaml: The new mode config file
- onnx_attention.py: The definition of the fused op
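A minimal sketch of what such a fusion pass can look like on a torch.fx graph. The toy rope_op / attn_op / fused_rope_attention functions below are stand-ins for the real custom ops (the actual names, signatures, and KV-cache plumbing live in the files listed above):

```python
import torch
import torch.fx as fx


def rope_op(q, cos, sin):  # stand-in for torch_rope_with_explicit_cos_sin
    return q * cos + q * sin


def attn_op(q, k, v):  # stand-in for torch_cached_attention_with_cache
    return torch.softmax(q @ k.transpose(-1, -2), dim=-1) @ v


def fused_rope_attention(q, cos, sin, k, v):  # stand-in for onnx_rope_attention
    return attn_op(rope_op(q, cos, sin), k, v)


class ToyModel(torch.nn.Module):
    def forward(self, q, k, v, cos, sin):
        return attn_op(rope_op(q, cos, sin), k, v)


def fuse_rope_attention(gm: fx.GraphModule) -> fx.GraphModule:
    """Replace every attn_op fed by a rope_op with one fused node."""
    for node in list(gm.graph.nodes):
        if node.op != "call_function" or node.target is not attn_op:
            continue
        rope = node.args[0]
        if not (isinstance(rope, fx.Node) and rope.target is rope_op):
            continue
        q, cos, sin = rope.args
        _, k, v = node.args
        with gm.graph.inserting_after(node):
            fused = gm.graph.call_function(fused_rope_attention, (q, cos, sin, k, v))
        node.replace_all_uses_with(fused)
        gm.graph.erase_node(node)   # attention node is now unused
        gm.graph.erase_node(rope)   # rope node lost its only user
    gm.recompile()
    return gm


gm = fuse_rope_attention(fx.symbolic_trace(ToyModel()))
print(gm.graph)  # now contains a single fused_rope_attention call
```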
[none][feat] Fix a small graphviz bug, remove unused code
[none][feat] Rename mode from onnx_driveos_llm to export_driveos_llm_onnx
[none][feat] Rename export_onnx.py to fuse_rope_attention.py
[none][feat] Annotate .meta['val'] with add_graph_input()
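For context, a hedged sketch of what annotating a fresh graph input with .meta["val"] boils down to in plain torch.fx; add_graph_input is the AutoDeploy helper, and the placeholder name, shape, and dtype here are made up:

```python
import torch
import torch.fx as fx


class M(torch.nn.Module):
    def forward(self, x):
        return x + 1


gm = fx.symbolic_trace(M())
first = next(iter(gm.graph.nodes))  # insert before the first placeholder
with gm.graph.inserting_before(first):
    new_in = gm.graph.placeholder("kv_cache")
# Shape propagation and the ONNX exporter read .meta["val"]; without it
# the new placeholder carries no shape/dtype information.
new_in.meta["val"] = torch.empty(1, 8, 128, 64, dtype=torch.float16, device="meta")
gm.recompile()
```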
[none][feat] Successfully export .onnx
[none][feat] Add set_kvcache_placeholder_metadata transform
[none][feat] Skip torch_cached_attention_prepare_metadata
[none][feat] Fix SetKVCachePlaceholderMetadata transform
[none][feat] Remove unused placeholder of prepare_metadata
[none][feat] Fix to run DeepSeek-R1
[none][feat] Add remove_graph_input, refactor remove_unused_placeholder()
[none][feat] Merge K&V cache placeholder
[none][feat] Replace sin_cos with input
[none][feat] Manually fuse rope & attn
[none][feat] Export torch_attention_bsnd_grouped_sdpa with dynamic shape
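The dynamic-shape part rides on the standard dynamic_axes argument of torch.onnx.export. A self-contained sketch, with a plain SDPA module standing in for torch_attention_bsnd_grouped_sdpa and all axis names illustrative:

```python
import torch


class TinySDPA(torch.nn.Module):
    def forward(self, q, k, v):
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)


q, k, v = (torch.randn(2, 8, 16, 64) for _ in range(3))
torch.onnx.export(
    TinySDPA(), (q, k, v), "sdpa.onnx",
    opset_version=14,
    input_names=["q", "k", "v"], output_names=["out"],
    dynamic_axes={
        "q": {0: "batch_size", 2: "seq_len"},
        "k": {0: "batch_size", 2: "seq_len"},
        "v": {0: "batch_size", 2: "seq_len"},
        "out": {0: "batch_size", 2: "seq_len"},
    },
)
```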
[none][feat] Manually match rope & attn, not replace yet
[none][feat] Successfully export ONNX with dynamic input
[none][feat] Hack out_spec to add graph output
[none][feat] Fix present_key_values shape
[none][feat] Fix input & output names
[none][feat] Change out_spec in add_graph_output
[none][feat] Fix export of torch_linear_simple
The original translation misses a transpose on the weight.
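A tiny check of why the transpose is required: nn.Linear stores its weight as (out_features, in_features) and F.linear computes x @ W.T, so translating it to a plain MatMul without transposing the weight is wrong:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 16)
w = torch.randn(32, 16)  # nn.Linear weight layout: (out_features, in_features)
assert torch.allclose(F.linear(x, w), x @ w.T)  # the translation needs W.T
# x @ w would not even matmul: (4, 16) @ (32, 16) is a shape mismatch
```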
[none][feat] Fix present_key_values shape
[none][feat] Rewire reshape's new shape as TRT-LLM edge
[none][feat] Fix non-text rebase conflicts
[none][feat] Fix AttentionPlugin domain: should be "" not "ai.onnx"
[none][feat] Enhance visualize, use .meta["val"] instead of .meta["tensor_meta"]
[none][feat] Fix visualize tensor width calculation
When calculating the width of a tensor, check whether each dimension is an int or a SymInt.
The original implementation accidentally introduced constraints on the symbolic int.
I don't know exactly how this happens; I don't think it should
introduce new constraints, but it does.
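A sketch of the SymInt-safe check, assuming the width logic only needs to compute with fully concrete shapes:

```python
import math
from typing import Optional

import torch


def static_numel(shape) -> Optional[int]:
    """Element count for fully static shapes; None if any dim is symbolic."""
    # Arithmetic or comparisons on a SymInt during tracing can specialize
    # it (i.e. add guards), so bail out instead of computing with it.
    if any(isinstance(d, torch.SymInt) for d in shape):
        return None
    return math.prod(int(d) for d in shape)
```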
[none][feat] Fix output dynamic batch_size
Originally the max batch size was 2; for reasons unknown, when it is set to 2 the batch_size collapses to the literal static int 2 even though we explicitly mark it as a dynamic axis.
Stranger still, when set to 13, the batch_size stays dynamic.
default=13,  # to enable dynamic batch_size, the batch size must be > 1
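One way to confirm the exported batch axis really stayed symbolic (file name and input index are illustrative): a dynamic dim is stored as dim_param (a name), a collapsed one as dim_value (a literal int):

```python
import onnx

model = onnx.load("model.onnx")
dim0 = model.graph.input[0].type.tensor_type.shape.dim[0]
assert dim0.dim_param == "batch_size", f"batch axis collapsed to {dim0.dim_value}"
```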
[none][feat] Rename fuse_rope_attention_manually to fuse_rope_attention
[none][feat] Remove fuse_rope_attention.py
[none][feat] Rewire reshape to make the graph like Luxiao's
[none][feat] Fix last_token_ids dtype from i32 to i64
[none][feat] Catch up with the latest DriveOS LLM
- Add placeholder kvcache_start_index
- AttentionPlugin add input kvcache_start_index
- Insert Unsqueeze -1 before GatherND
- rope_rotary_cos_sin dynamic axis name changed from
rope_max_position_length to max_position_embeddings
- logits' dtype should be float32, so insert a cast (a Cast-insertion sketch follows this list)
- Insert a cast to f16 before AttentionPlugin
- All casts to bf16 should be f16
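A hedged sketch of inserting such a Cast with stock onnx.helper; the tensor and node names ("logits", "logits_f16") are illustrative:

```python
import onnx
from onnx import TensorProto, helper

model = onnx.load("model.onnx")
graph = model.graph
# Rename the producer's output, then cast it back to the original name so
# the graph output "logits" (and any other consumer) sees float32.
for node in graph.node:
    for i, out in enumerate(node.output):
        if out == "logits":
            node.output[i] = "logits_f16"
graph.node.append(helper.make_node(
    "Cast", inputs=["logits_f16"], outputs=["logits"],
    to=TensorProto.FLOAT, name="cast_logits_to_f32",
))
onnx.save(model, "model_f32_logits.onnx")
```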
[none][feat] Catch up with the latest DriveOS LLM
- model.half() converts the whole model to f16, including weights
- Remove AttentionPlugin attributes kv_cache_capacity & max_batch_size
- AttentionPlugin output[1] shape is inferred from seq_len + past_len
- AttentionPlugin domain changed from `onnx.ai` to `trt` (a domain-switch sketch follows this list)
- Placeholder `kvcache_start_index` dynamic axes changed from `batch_size` to `kv_cache_start_batch_size`
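A sketch of the domain switch as a plain ONNX post-processing pass; note the model must also declare an opset import for any custom domain it uses (the version number here is an assumption):

```python
import onnx
from onnx import helper

model = onnx.load("model.onnx")
for node in model.graph.node:
    if node.op_type == "AttentionPlugin":
        node.domain = "trt"
# Declare the custom domain so checkers/parsers accept the node.
if not any(op.domain == "trt" for op in model.opset_import):
    model.opset_import.append(helper.make_opsetid("trt", 1))
onnx.save(model, "model_trt_domain.onnx")
```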
[none][feat] Catch up with the latest main
[none][feat] Add test for fuse_rope_attention transform
- Add test for fuse_rope_attention
- Enhance run_test_transformed_gm to support Modules with multiple inputs
- Fix add_graph_output for graph with only one _LEAF_SPEC
[none][feat] Add unit test for fuse_rope_attn
- Add a unit test (a generic sketch of this test shape follows this list)
- Fix add_graph_output when out_spec is _LEAF_SPEC
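For reference, a generic sketch of the shape these transform tests take: trace, transform, and compare against the eager module. identity_transform stands in for fuse_rope_attention, and the real test drives run_test_transformed_gm instead:

```python
import torch
import torch.fx as fx


def identity_transform(gm: fx.GraphModule) -> fx.GraphModule:
    gm.recompile()  # stand-in for the real fuse_rope_attention pass
    return gm


class M(torch.nn.Module):
    def forward(self, q, k):  # a module with multiple inputs
        return torch.softmax(q @ k.transpose(-1, -2), dim=-1)


def test_transform_preserves_outputs():
    m = M()
    q, k = torch.randn(2, 4, 8), torch.randn(2, 4, 8)
    gm = identity_transform(fx.symbolic_trace(m))
    torch.testing.assert_close(gm(q, k), m(q, k))
```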
[none][feat] Export .json files
[none][feat] Add AutoDeploy ONNX-export end-to-end test
[none][feat] Export ONNX on CPU to reduce GPU memory footprint
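A minimal sketch of the idea, with a toy Linear standing in for the real model (which would start out on GPU):

```python
import torch

model = torch.nn.Linear(16, 16).eval()  # stand-in for the real LLM
example = (torch.randn(2, 16),)

# Move the model and example inputs to CPU first so the export itself
# holds no GPU memory.
model = model.to("cpu")
example = tuple(t.to("cpu") for t in example)
with torch.inference_mode():
    torch.onnx.export(model, example, "model.onnx")
```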
[none][feat] Use model.config to get head_dim, instead of a hardcoded literal
Signed-off-by: Po-Han Huang <[email protected]>
Signed-off-by: yoco xiao <[email protected]>
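A sketch of the lookup, assuming a Hugging Face-style config; not every config carries an explicit head_dim, hence the derived fallback:

```python
from types import SimpleNamespace

# Stand-in for model.config (e.g. a Hugging Face PretrainedConfig).
config = SimpleNamespace(hidden_size=4096, num_attention_heads=32)

# Prefer an explicit head_dim if the config provides one; otherwise derive it.
head_dim = getattr(config, "head_dim", None)
if head_dim is None:
    head_dim = config.hidden_size // config.num_attention_heads
print(head_dim)  # 128
```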
[none][feat] Visualize graph only when env var AD_DEBUG_VISUALIZE_DIR is set
- We no longer visualize by default, only when AD_DEBUG_VISUALIZE_DIR is set (a sketch of the gating follows this commit)
- AD_DEBUG_VISUALIZE_DIR also serves as the output dir, so you can choose where the files go
- Simplify the logging messages; move many of them from info to debug
- Add .cursor to .gitignore
Signed-off-by: yoco xiao <[email protected]>
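A hedged sketch of the gating plus a bare-bones GraphModule-to-DOT dump; graph_module_visualizer holds the real implementation, so everything below is illustrative:

```python
import os

import torch.fx as fx


def maybe_visualize(gm: fx.GraphModule, name: str) -> None:
    out_dir = os.environ.get("AD_DEBUG_VISUALIZE_DIR")
    if not out_dir:
        return  # visualization is off by default
    lines = ["digraph G {"]
    for node in gm.graph.nodes:
        lines.append(f'  "{node.name}" [label="{node.op}\\n{node.name}"];')
        for user in node.users:
            lines.append(f'  "{node.name}" -> "{user.name}";')
    lines.append("}")
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, f"{name}.dot"), "w") as f:
        f.write("\n".join(lines))
```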
File tree (22 files changed: +3931 −16)
- docker/common
- examples/auto_deploy
- tensorrt_llm/_torch/auto_deploy
  - config
  - custom_ops
  - transform
    - library
  - utils
- tests/unittest/_torch/auto_deploy
  - _utils_test
  - unit/singlegpu
  - transformations/library