4 changes: 2 additions & 2 deletions README.md
@@ -13,7 +13,7 @@ OneFlow Backend For Triton Inference Server

## Get Started

Here is a [tutorial](./doc/tutorial.md) about how to export the model and how to deploy it. You can also follow the instructions below to get started.
Here is a [tutorial](./doc/tutorial.md) about how to export the model and how to deploy it. You can also follow the instructions below to get started. [Building the Docker image](./doc/build.md) is necessary before you start.

1. Download and save model

@@ -27,7 +27,7 @@
```
cd ../../ # back to the root of the serving repo
docker run --rm --runtime=nvidia --network=host -v$(pwd)/examples:/models \
oneflowinc/oneflow-serving
serving:final
curl -v localhost:8000/v2/health/ready # ready check
```

103 changes: 17 additions & 86 deletions doc/build.md
@@ -1,101 +1,32 @@
# Build From Source

To use the model server, you can simply pull the image `oneflowinc/oneflow-serving` from Docker Hub. You only need to build from source if you want to modify the source code.
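For the pull-only path, a minimal sketch looks like the following; the run command mirrors the one in the README, and the `./examples` model directory is an assumption about where your exported models live:

```
docker pull oneflowinc/oneflow-serving
# mount the directory containing your exported models as the model repository
docker run --rm --runtime=nvidia --network=host -v$(pwd)/examples:/models \
  oneflowinc/oneflow-serving
curl -v localhost:8000/v2/health/ready # ready check
```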

You can build on bare metal, or you can pull the devel docker image below and follow the instructions to build inside a docker container.

```
docker pull registry.cn-beijing.aliyuncs.com/oneflow/triton-devel
```
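If you take the in-container route, here is a hedged sketch of starting the devel container; the mount path and working directory are assumptions, not part of the original docs:

```
docker run -it --rm --runtime=nvidia --network=host \
  -v $(pwd):/workspace -w /workspace \
  registry.cn-beijing.aliyuncs.com/oneflow/triton-devel bash
```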

To build from source, you need to build liboneflow first.

1. Build liboneflow from source

1. Build base image
```
git clone https://github.com/Oneflow-Inc/oneflow --depth=1
cd oneflow
mkdir build && cd build
cmake -C ../cmake/caches/cn/cuda.cmake -DBUILD_CPP_API=ON -DWITH_MLIR=ON -G Ninja ..
ninja
docker build -t serving:base . -f docker/Dockerfile.base
```


2. Build oneflow backend from source

2. Build liboneflow in docker image

```
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
-DTRITON_ENABLE_GPU=ON -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
ninja
docker build -t serving:build_of . -f docker/Dockerfile.build_of
```
3. Build backends in docker image

```
docker build -t serving:final . -f docker/Dockerfile.serving
```
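As a quick sanity check (not part of the original steps), you can list the images produced by the three build stages:

```
docker images serving  # should show the base, build_of, and final tags
```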


3. Launch triton server
4. Launch triton server

```
cd ../ # back to the root of the serving repo
docker run --runtime=nvidia --rm --network=host \
-v$(pwd)/examples:/models \
-v$(pwd)/build/libtriton_oneflow.so:/backends/oneflow/libtriton_oneflow.so \
-v$(pwd)/oneflow/build/liboneflow_cpp/lib/:/mylib nvcr.io/nvidia/tritonserver:21.10-py3 \
bash -c 'LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mylib/ /opt/tritonserver/bin/tritonserver \
--model-repository=/models --backend-directory=/backends'
docker run --runtime=nvidia \
--rm \
--network=host \
-v$(pwd)/examples/cpp:/models \
serving:final
curl -v localhost:8000/v2/health/ready # ready check
```
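Once the server reports ready, you can also probe an individual model over Triton's standard HTTP/REST API. The model name below is a placeholder; substitute the name of a model directory you actually mounted under `/models`:

```
# model readiness and metadata (model name is a placeholder)
curl -v localhost:8000/v2/models/resnet50_oneflow/ready
curl localhost:8000/v2/models/resnet50_oneflow
```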


If you want to use XLA, TensorRT, or OpenVINO in OneFlow-Serving, build OneFlow-XRT first and then rebuild the oneflow backend.

4. Build OneFlow-XRT with XLA, TensorRT or OpenVINO

```shell
git clone https://github.com/Oneflow-Inc/oneflow-xrt.git
cd oneflow-xrt
mkdir build && cd build

# Build OneFlow-XRT XLA
cmake -G Ninja .. -DBUILD_XLA=ON && ninja

# Build OneFlow-XRT TensorRT
cmake -G Ninja .. -DBUILD_TENSORRT=ON -DTENSORRT_ROOT=/path/to/tensorrt && ninja

# Build OneFlow-XRT OpenVINO
cmake -G Ninja .. -DBUILD_OPENVINO=ON -DOPENVINO_ROOT=/path/to/openvino && ninja
```

5. Build oneflow backend from source

```shell
mkdir build && cd build

# Use TensorRT
cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
-DTRITON_ENABLE_GPU=ON -DUSE_TENSORRT=ON -DONEFLOW_XRT_ROOT=$(pwd)/oneflow-xrt/build/install -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
ninja

# Use XLA
cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
-DTRITON_ENABLE_GPU=ON -DUSE_XLA=ON -DONEFLOW_XRT_ROOT=$(pwd)/oneflow-xrt/build/install -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
ninja

# Use OpenVINO
cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
-DTRITON_ENABLE_GPU=ON -DUSE_OPENVINO=ON -DONEFLOW_XRT_ROOT=$(pwd)/oneflow-xrt/build/install -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
ninja
```

6. Launch triton server

```shell
cd ../ # back to the root of the serving repo
docker run --runtime=nvidia --rm --network=host \
-v$(pwd)/examples:/models \
-v$(pwd)/build/libtriton_oneflow.so:/backends/oneflow/libtriton_oneflow.so \
-v$(pwd)/oneflow/build/liboneflow_cpp/lib/:/mylib \
-v$(pwd)/oneflow-xrt/build/install/lib:/xrt_libs \
nvcr.io/nvidia/tritonserver:21.10-py3 \
bash -c 'LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mylib/:/xrt_libs /opt/tritonserver/bin/tritonserver \
--model-repository=/models --backend-directory=/backends'
curl -v localhost:8000/v2/health/ready # ready check
```