Hello, your code is well-crafted. It uses the xoscar plugin for distributed scheduling. Architecturally, it relies only on the actor_ref, which is more elegant than interacting over HTTP. The specific code is here: https://github.com/xorbitsai/inference/blob/6135eb66f1595d41a7210f9f64c3db97adf0364b/xinference/client/oscar/actor_client.py#L432C14-L432C14
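To illustrate why calling through an actor reference can feel cleaner than HTTP, here is a minimal stdlib-only sketch of the pattern. It is not xoscar's or xinference's actual API: `WorkerActor`, `ActorRef`, `run_actor`, and the model names are hypothetical, and a real actor framework would add addressing, serialization, and process pools on top. The point is only that the caller writes ordinary method calls, with no routes, status codes, or JSON handling.

```python
import asyncio


class WorkerActor:
    """Hypothetical actor: owns its state and handles one message at a time."""

    def __init__(self):
        self.models = {}

    async def launch_model(self, name):
        self.models[name] = "ready"
        return f"{name} launched"

    async def list_models(self):
        return sorted(self.models)


class ActorRef:
    """Hypothetical reference: turns attribute access into mailbox messages."""

    def __init__(self, mailbox):
        self._mailbox = mailbox

    def __getattr__(self, method):
        # Only reached for names not found on the instance, i.e. remote methods.
        async def call(*args, **kwargs):
            reply = asyncio.get_running_loop().create_future()
            await self._mailbox.put((method, args, kwargs, reply))
            return await reply

        return call


async def run_actor(actor, mailbox):
    """Drain the mailbox and dispatch each message to the actor's method."""
    while True:
        method, args, kwargs, reply = await mailbox.get()
        try:
            reply.set_result(await getattr(actor, method)(*args, **kwargs))
        except Exception as exc:  # propagate failures back to the caller
            reply.set_exception(exc)


async def demo():
    mailbox = asyncio.Queue()
    server = asyncio.create_task(run_actor(WorkerActor(), mailbox))
    ref = ActorRef(mailbox)

    # The caller side looks like plain method calls on a local object.
    first = await ref.launch_model("llama-2")
    await ref.launch_model("chatglm")
    models = await ref.list_models()

    server.cancel()
    return first, models


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With an HTTP client, each of those calls would need an endpoint, a request body, and response parsing; with a reference, the method name and arguments are the whole interface.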
As for the RPC framework itself, roughly 40% of the code in xinference is dedicated to handling basic interactions: https://github.com/xorbitsai/inference/blob/main/xinference/core/supervisor.py and https://github.com/xorbitsai/inference/blob/main/xinference/core/worker.py
As a result, this framework seems fundamentally unable to resolve the conflicts that may arise when multiple LLMs collaborate: https://github.com/xorbitsai/inference/blob/main/xinference/model/core.py#L32
Judging from the code, they appear to be planning to put TensorRT-LLM support in a separate project: https://github.com/xorbitsai/inference/blob/main/xinference/model/llm/core.py#L31