# PyTorchEngine Multi-Node Deployment Guide

To support larger-scale models, PyTorchEngine provides multi-node deployment support. Below are the detailed steps for deploying a `tp=16` model across two 8-GPU nodes.

## 1. Create Docker Containers (Optional)

To ensure a consistent environment across the cluster, it is recommended to use Docker to set up the cluster. Create a container on each node as follows, where `$MODEL_PATH` is the model directory on the host and `$CONTAINER_MODEL_PATH` is the path it is mounted to inside the container:

```bash
docker run -it \
    --gpus all \
    --network host \
    -v $MODEL_PATH:$CONTAINER_MODEL_PATH \
    openmmlab/lmdeploy:latest
```

Note that `--gpus all` requires the NVIDIA Container Toolkit on the host so that the container can access the node's GPUs.
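
For illustration, the two mount placeholders might be set as follows on every node before running the command above; the paths are hypothetical, but they must be identical across nodes:

```bash
# Hypothetical paths: use your own, but keep them identical on every node.
export MODEL_PATH=/data/models/my-model          # model directory on the host
export CONTAINER_MODEL_PATH=/workspace/my-model  # mount point inside the container
```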

> \[!IMPORTANT\]
> Ensure that the model is placed in the same directory on all node containers.

## 2. Set Up the Cluster Using Ray

### 2.1 Start the Head Node

Select one node as the **head node** and run the following command in its container, where `$DRIVER_PORT` is a free port of your choosing:

```bash
ray start --head --port=$DRIVER_PORT
```

### 2.2 Join the Cluster

On the other nodes, use the following command in their containers to join the cluster created by the head node:

```bash
ray start --address=$DRIVER_NODE_ADDR:$DRIVER_PORT
```

Run `ray status` on the head node to verify the cluster; with both 8-GPU nodes joined, it should report 16 GPUs in total.
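
As an optional extra check, the same information is available programmatically through Ray's public Python API (assuming the `ray` package is available in the container, which the `ray` CLI implies):

```python
import ray

# Attach to the cluster that was started with `ray start` above.
ray.init(address='auto')

# With two 8-GPU nodes joined, the aggregate resources should include 16 GPUs.
print(ray.cluster_resources())
```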

> \[!IMPORTANT\]
> Ensure that `DRIVER_NODE_ADDR` is the address of the head node and `DRIVER_PORT` matches the port number used during head node initialization.

## 3. Use LMDeploy Interfaces

In the head node's container, you can use all functionalities of PyTorchEngine as usual.

### 3.1 Start the Server

```bash
lmdeploy serve api_server \
    $CONTAINER_MODEL_PATH \
    --backend pytorch \
    --tp 16
```
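
Once the server is up, it exposes an OpenAI-compatible HTTP API. As a quick smoke test (assuming the default server port `23333`; the served model name can be listed via `GET /v1/models`):

```bash
curl http://localhost:23333/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-served-model-name",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```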

### 3.2 Use the Pipeline

```python
from lmdeploy import pipeline, PytorchEngineConfig

# The `__main__` guard matters here: the engine may spawn worker processes
# that re-import this module.
if __name__ == '__main__':
    model_path = '/path/to/model'
    # tp=16 spans the 16 GPUs contributed by the two 8-GPU nodes.
    backend_config = PytorchEngineConfig(tp=16)
    with pipeline(model_path, backend_config=backend_config) as pipe:
        outputs = pipe('Hakuna Matata')
```

> \[!NOTE\]
> PyTorchEngine automatically chooses the appropriate launch method (single-node or multi-node) based on the `tp` parameter and the number of devices available in the cluster. If you want to enforce the use of the Ray cluster, configure `distributed_executor_backend='ray'` in `PytorchEngineConfig` or set the environment variable `LMDEPLOY_EXECUTOR_BACKEND=ray`.
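
For example, pinning the pipeline from section 3.2 to the Ray executor only requires the extra parameter named above (a sketch; everything else is unchanged):

```python
from lmdeploy import pipeline, PytorchEngineConfig

if __name__ == '__main__':
    # Same setup as in section 3.2, but pinned to the Ray executor backend.
    backend_config = PytorchEngineConfig(tp=16, distributed_executor_backend='ray')
    with pipeline('/path/to/model', backend_config=backend_config) as pipe:
        outputs = pipe('Hakuna Matata')
```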

______________________________________________________________________

By following the steps above, you can successfully deploy PyTorchEngine in a multi-node environment and leverage the Ray cluster for distributed computing.

> \[!WARNING\]
> To achieve better performance, we recommend configuring a high-quality network environment (such as [InfiniBand](https://en.wikipedia.org/wiki/InfiniBand)) to improve engine efficiency.
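
If the communication backend does not pick the fast interface on its own, NCCL can usually be directed to it through its standard environment variables; the device names below are illustrative:

```bash
# Illustrative values: substitute the interface/adapter names present on your nodes.
export NCCL_SOCKET_IFNAME=eth0   # network interface for NCCL bootstrap traffic
export NCCL_IB_HCA=mlx5_0        # InfiniBand adapter used for RDMA
```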