[NPUW] Pass import config when deserializing compiled submodels #34174
darius-chirla wants to merge 5 commits into openvinotoolkit:master
@esmirno can you please have a look
@esmirno please review
if (ov::npuw::util::starts_with(device, "NPU")) {
    // Pass NPU_RUN_INFERENCES_SEQUENTIALLY if NPUW_UNFOLD_IREQS is enabled
    if (compiled->m_cfg.get<::intel_npu::NPUW_UNFOLD_IREQS>()) {
        import_config["NPU_RUN_INFERENCES_SEQUENTIALLY"] = "YES";
I think the source of the issue is that we have two places where this device config is initialized. For example, in compile_model() there is already one more option, EXCLUSIVE_ASYNC_REQUESTS, that is not deserialized; I'm not sure whether it is relevant to the current issue.
I would propose serializing the device config that was used to compile the model and avoiding ifs on the deserialization path. For that we would need to keep track of pairs {ov::compiled_model, extra-options}, since not all options matter. I'm not sure why a serialized model cannot be imported correctly, and why we would ever need the config again.
So please create a follow-up task to address that.
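For the follow-up, a minimal sketch of what storing the device options next to each compiled blob could look like. Everything here is an assumption for illustration: `SubmodelRecord`, `serialize`, `deserialize` and the length-prefixed layout are hypothetical and are not part of NPUW or OpenVINO, and the options are simplified to string pairs instead of ov::AnyMap.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>  // only needed for an in-memory round trip
#include <string>

// Simplified stand-in for ov::AnyMap (string values only).
using Options = std::map<std::string, std::string>;

// Hypothetical record: the exported blob plus the device options used to compile it.
struct SubmodelRecord {
    std::string blob;
    Options extra_options;
};

// Write a string as a 64-bit length followed by the raw bytes.
void write_string(std::ostream& os, const std::string& s) {
    const std::uint64_t n = s.size();
    os.write(reinterpret_cast<const char*>(&n), sizeof(n));
    os.write(s.data(), static_cast<std::streamsize>(n));
}

// Read a string written by write_string().
std::string read_string(std::istream& is) {
    std::uint64_t n = 0;
    is.read(reinterpret_cast<char*>(&n), sizeof(n));
    std::string s(n, '\0');
    is.read(&s[0], static_cast<std::streamsize>(n));
    return s;
}

// Store the blob, then the option count, then each key/value pair.
void serialize(std::ostream& os, const SubmodelRecord& rec) {
    write_string(os, rec.blob);
    const std::uint64_t count = rec.extra_options.size();
    os.write(reinterpret_cast<const char*>(&count), sizeof(count));
    for (const auto& kv : rec.extra_options) {
        write_string(os, kv.first);
        write_string(os, kv.second);
    }
}

// Restore the record; the options come back without any per-device ifs.
SubmodelRecord deserialize(std::istream& is) {
    SubmodelRecord rec;
    rec.blob = read_string(is);
    std::uint64_t count = 0;
    is.read(reinterpret_cast<char*>(&count), sizeof(count));
    for (std::uint64_t i = 0; i < count; ++i) {
        std::string key = read_string(is);
        rec.extra_options[key] = read_string(is);
    }
    return rec;
}
```

With a layout like this, the import path would read `extra_options` back and hand them to the import call as-is, so the options set at compile time and at import time could not drift apart.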
I think it is quite shady if we need to maintain the compilation config on import; I actually find it counter-intuitive.
If the model was compiled with a certain option, its compiled blob should preserve that option.
However, there's a distinction between compile-time and run-time options (even though they are all passed to .compile()), so we have this problem.
What's done in this PR is, I think, okay-ish, unless we come up with a better, generic solution.
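One possible shape for a more generic solution, as a rough sketch only: filter the original compile config down to the options that have to be passed again on import. `select_runtime_options` and the key `HYPOTHETICAL_COMPILE_ONLY_OPTION` are made up for illustration, and treating NPU_RUN_INFERENCES_SEQUENTIALLY as a run-time option is an assumption based on this thread, not on the plugin documentation.

```cpp
#include <map>
#include <set>
#include <string>

// Simplified stand-in for ov::AnyMap (string values only).
using Options = std::map<std::string, std::string>;

// Keep only the options whose keys are listed in runtime_keys.
// Which keys count as run-time is the caller's decision.
Options select_runtime_options(const Options& compile_config,
                               const std::set<std::string>& runtime_keys) {
    Options result;
    for (const auto& kv : compile_config) {
        if (runtime_keys.count(kv.first) != 0) {
            result.insert(kv);
        }
    }
    return result;
}
```

For example, calling it with a config that holds both NPU_RUN_INFERENCES_SEQUENTIALLY and HYPOTHETICAL_COMPILE_ONLY_OPTION, and a key set containing only the former, returns a one-entry map that could be used as the import config.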
dmatveev left a comment
Thanks @darius-chirla!
build_jenkins
When importing compiled models during deserialization, pass the import configuration to ensure consistent behavior.
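A self-contained sketch of the logic in the reviewed hunk, for readers without the NPUW sources at hand. The names `NpuwConfigSketch`, `ImportConfig` and `make_import_config` are hypothetical stand-ins: the real code reads the flag via compiled->m_cfg.get<::intel_npu::NPUW_UNFOLD_IREQS>() and, presumably, fills an ov::AnyMap; `starts_with` below is a local reimplementation assumed to match ov::npuw::util::starts_with.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the NPUW config object (compiled->m_cfg).
struct NpuwConfigSketch {
    bool unfold_ireqs;  // mirrors NPUW_UNFOLD_IREQS
};

// Simplified stand-in for the import config (string values only).
using ImportConfig = std::map<std::string, std::string>;

// Local prefix check; assumed equivalent to ov::npuw::util::starts_with.
bool starts_with(const std::string& str, const std::string& prefix) {
    return str.compare(0, prefix.size(), prefix) == 0;
}

// Build the config that accompanies the import call for one submodel.
ImportConfig make_import_config(const std::string& device, const NpuwConfigSketch& cfg) {
    ImportConfig import_config;
    if (starts_with(device, "NPU")) {
        // Pass NPU_RUN_INFERENCES_SEQUENTIALLY if NPUW_UNFOLD_IREQS is enabled
        if (cfg.unfold_ireqs) {
            import_config["NPU_RUN_INFERENCES_SEQUENTIALLY"] = "YES";
        }
    }
    return import_config;
}
```

For any device name that does not start with "NPU", or when the unfold flag is off, the returned map is empty, so the import behaves as it did before this change.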
Details:
- import_config for NPU devices during deserialization
- NPU_RUN_INFERENCES_SEQUENTIALLY when NPUW_UNFOLD_IREQS is enabled
- import_config parameter to all import_models() calls
Tickets: