Commit f25cd52

Adding ccl_enabled flag during model loading and passing CCL lists during compilation process

Signed-off-by: Vahid Janfaza <[email protected]>

1 parent 0912c39 commit f25cd52

File tree

4 files changed: +7 -7 lines changed

examples/ccl_gpt_oss.py

Lines changed: 3 additions & 3 deletions

@@ -15,16 +15,16 @@
 ## Use the optional comp_ctx_lengths argument to provide two lists of context lengths for the prefilling and decoding processes. If comp_ctx_lengths=None, the model will run with its default context length.
 ## - The first list, comp_ctx_lengths_prefill, defines the compute-context-length values for the prefilling process.
 ## -- The process starts with the first value in the list and gradually increases the context length based on the position_id of the current prompt chunk.
-## - The second list, comp_ctx_lengths_decode, defines the compute-context-length values for the decoding process.
+## - The second list, comp_ctx_lengths_decode, defines the compute-context-length values for the decoding process.
 ## -- During decoding, the model selects an appropriate context length from the list based on the input prompt length and cache index.
 ## -- It starts from the correct value in the list and increases the context length dynamically when the cache index exceeds the current threshold.

 ctx_len = 4096
 # In moe models like gpt-oss, since prefill_seq_len=1 both comp_ctx_lengths_prefill and comp_ctx_lengths_decode can share similar lists.
 # Set the list of ccl during prefilling process
-comp_ctx_lengths_prefill = [512, ctx_len] #None #
+comp_ctx_lengths_prefill = [512, ctx_len] # None #
 # Set the list of ccl during decoding process
-comp_ctx_lengths_decode = [512, ctx_len] #None #
+comp_ctx_lengths_decode = [512, ctx_len] # None #


 qeff_model = QEFFAutoModelForCausalLM.from_pretrained(
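The comment block in the diff above describes how, during decoding, a compute-context-length (CCL) is picked from comp_ctx_lengths_decode based on the cache index, stepping up once the index exceeds the current threshold. A minimal sketch of that selection rule, assuming this behavior; `select_ccl` is a hypothetical helper, not QEfficient's actual implementation:

```python
def select_ccl(comp_ctx_lengths, cache_index):
    """Return the smallest compute-context-length that still covers cache_index.

    Hypothetical illustration of the selection rule described in the example
    comments: start from the smallest adequate value and move to the next one
    only when the cache index exceeds the current threshold.
    """
    for ccl in comp_ctx_lengths:
        if cache_index < ccl:
            return ccl
    # Cache index exceeds every threshold: fall back to the full context length.
    return comp_ctx_lengths[-1]


ctx_len = 4096
comp_ctx_lengths_decode = [512, ctx_len]

print(select_ccl(comp_ctx_lengths_decode, 100))  # 512: still within the first threshold
print(select_ccl(comp_ctx_lengths_decode, 600))  # 4096: cache index passed 512
```

With comp_ctx_lengths=None the model would skip this selection entirely and run at its default context length, as the example comments note.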

examples/ccl_llama4_CB_example_vision_lang.py

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@
 ## Use the optional comp_ctx_lengths argument to provide two lists of context lengths for the prefilling and decoding processes. If comp_ctx_lengths=None, the model will run with its default context length.
 ## - The first list, comp_ctx_lengths_prefill, defines the compute-context-length values for the prefilling process.
 ## -- The process starts with the first value in the list and gradually increases the context length based on the position_id of the current prompt chunk.
-## - The second list, comp_ctx_lengths_decode, defines the compute-context-length values for the decoding process.
+## - The second list, comp_ctx_lengths_decode, defines the compute-context-length values for the decoding process.
 ## -- During decoding, the model selects an appropriate context length from the list based on the input prompt length and cache index.
 ## -- It starts from the correct value in the list and increases the context length dynamically when the cache index exceeds the current threshold.
examples/ccl_llama4_multi_image_example.py

Lines changed: 1 addition & 1 deletion

@@ -21,7 +21,7 @@
 ## Use the optional comp_ctx_lengths argument to provide two lists of context lengths for the prefilling and decoding processes. If comp_ctx_lengths=None, the model will run with its default context length.
 ## - The first list, comp_ctx_lengths_prefill, defines the compute-context-length values for the prefilling process.
 ## -- The process starts with the first value in the list and gradually increases the context length based on the position_id of the current prompt chunk.
-## - The second list, comp_ctx_lengths_decode, defines the compute-context-length values for the decoding process.
+## - The second list, comp_ctx_lengths_decode, defines the compute-context-length values for the decoding process.
 ## -- During decoding, the model selects an appropriate context length from the list based on the input prompt length and cache index.
 ## -- It starts from the correct value in the list and increases the context length dynamically when the cache index exceeds the current threshold.
examples/ccl_qwen2_5_vl_CB.py

Lines changed: 2 additions & 2 deletions

@@ -22,7 +22,7 @@
 ## Use the optional comp_ctx_lengths argument to provide two lists of context lengths for the prefilling and decoding processes. If comp_ctx_lengths=None, the model will run with its default context length.
 ## - The first list, comp_ctx_lengths_prefill, defines the compute-context-length values for the prefilling process.
 ## -- The process starts with the first value in the list and gradually increases the context length based on the position_id of the current prompt chunk.
-## - The second list, comp_ctx_lengths_decode, defines the compute-context-length values for the decoding process.
+## - The second list, comp_ctx_lengths_decode, defines the compute-context-length values for the decoding process.
 ## -- During decoding, the model selects an appropriate context length from the list based on the input prompt length and cache index.
 ## -- It starts from the correct value in the list and increases the context length dynamically when the cache index exceeds the current threshold.

@@ -81,7 +81,7 @@
     processor=processor,
     images=image_urls,
     generation_len=100,
-    device_ids=[28,29,30,31],
+    device_ids=[28, 29, 30, 31],
 )
 print(output.generated_ids)
 print(tokenizer.batch_decode(output.generated_ids))
