diff --git a/README.md b/README.md
index 74484b6..ac61fdd 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ OneFlow Backend For Triton Inference Server
 
 ## Get Started
 
-Here is a [tutorial](./doc/tutorial.md) about how to export the model and how to deploy it. You can also follow the instructions below to get started.
+Here is a [tutorial](./doc/tutorial.md) about how to export the model and how to deploy it. You can also follow the instructions below to get started. [Building the Docker image](./doc/build.md) is necessary before you start.
 
 1. Download and save model
 
@@ -27,7 +27,7 @@ Here is a [tutorial](./doc/tutorial.md) about how to export the model and how to
    ```
    cd ../../ # back to root of the serving
    docker run --rm --runtime=nvidia --network=host -v$(pwd)/examples:/models \
-     oneflowinc/oneflow-serving
+     serving:final
    curl -v localhost:8000/v2/health/ready # ready check
    ```
diff --git a/doc/build.md b/doc/build.md
index 0630f17..ead60e9 100644
--- a/doc/build.md
+++ b/doc/build.md
@@ -1,101 +1,32 @@
 # Build From Source
 
-To use the model server, you can just pull the image `oneflowinc/oneflow-serving` from docker hub. Only when you want to modify the source code, you need to build from source.
-
-You can build on bare metal, and you can also pull the docker image and follow the instructions below to build in docker container.
-
-```
-docker pull registry.cn-beijing.aliyuncs.com/oneflow/triton-devel
-```
-
 To build from source, you need to build liboneflow first.
 
-1. Build liboneflow from source
-
+1. Build base image
+
    ```
-   git clone https://github.com/Oneflow-Inc/oneflow --depth=1
-   cd oneflow
-   mkdir build && cd build
-   cmake -C ../cmake/caches/cn/cuda.cmake -DBUILD_CPP_API=ON -DWITH_MLIR=ON -G Ninja ..
-   ninja
+   docker build -t serving:base . -f docker/Dockerfile.base
    ```
-
-2. Build oneflow backend from source
-
+2. Build liboneflow in docker image
+
    ```
-   mkdir build && cd build
-   cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
-     -DTRITON_ENABLE_GPU=ON -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
-   ninja
+   docker build -t serving:build_of . -f docker/Dockerfile.build_of
+   ```
+3. Build backends in docker image
+
+   ```
+   docker build -t serving:final . -f docker/Dockerfile.serving
    ```
-
-3. Launch triton server
+4. Launch triton server
 
    ```
    cd ../ # back to root of the serving
-   docker run --runtime=nvidia --rm --network=host \
-     -v$(pwd)/examples:/models \
-     -v$(pwd)/build/libtriton_oneflow.so:/backends/oneflow/libtriton_oneflow.so \
-     -v$(pwd)/oneflow/build/liboneflow_cpp/lib/:/mylib nvcr.io/nvidia/tritonserver:21.10-py3 \
-     bash -c 'LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mylib/ /opt/tritonserver/bin/tritonserver \
-     --model-repository=/models --backend-directory=/backends'
+   docker run --runtime=nvidia \
+     --rm \
+     --network=host \
+     -v$(pwd)/examples/cpp:/models \
+     serving:final
    curl -v localhost:8000/v2/health/ready # ready check
    ```
-
-
-If you want to use XLA, TensorRT and OpenVINO in OneFlow-Serving, please build OneFlow-XRT and rebuild oneflow backend.
-
-4. Build OneFlow-XRT with XLA, TensorRT or OpenVINO
-
-   ```shell
-   git clone https://github.com/Oneflow-Inc/oneflow-xrt.git
-   cd oneflow-xrt
-   mkdir build && cd build
-
-   # Build OneFlow-XRT XLA
-   cmake -G Ninja .. -DBUILD_XLA=ON && ninja
-
-   # Build OneFlow-XRT TensorRT
-   cmake -G Ninja .. -DBUILD_TENSORRT=ON -DTENSORRT_ROOT=/path/to/tensorrt && ninja
-
-   # Build OneFlow-XRT OpenVINO
-   cmake -G Ninja .. -DBUILD_OPENVINO=ON -DOPENVINO_ROOT=/path/to/openvino && ninja
-   ```
-
-5. Build oneflow backend from source
-
-   ```shell
-   mkdir build && cd build
-
-   # Use TensorRT
-   cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
-     -DTRITON_ENABLE_GPU=ON -DUSE_TENSORRT=ON -DONEFLOW_XRT_ROOT=$(pwd)/oneflow-xrt/build/install -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
-   ninja
-
-   # Use XLA
-   cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
-     -DTRITON_ENABLE_GPU=ON -DUSE_XLA=ON -DONEFLOW_XRT_ROOT=$(pwd)/oneflow-xrt/build/install -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
-   ninja
-
-   # Use OpenVINO
-   cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
-     -DTRITON_ENABLE_GPU=ON -DUSE_OPENVINO=ON -DONEFLOW_XRT_ROOT=$(pwd)/oneflow-xrt/build/install -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
-   ninja
-   ```
-
-6. Launch triton server
-
-   ```shell
-   cd ../ # back to root of the serving
-   docker run --runtime=nvidia --rm --network=host \
-     -v$(pwd)/examples:/models \
-     -v$(pwd)/build/libtriton_oneflow.so:/backends/oneflow/libtriton_oneflow.so \
-     -v$(pwd)/oneflow/build/liboneflow_cpp/lib/:/mylib \
-     -v$(pwd)/oneflow-xrt/build/install/lib:/xrt_libs \
-     nvcr.io/nvidia/tritonserver:21.10-py3 \
-     bash -c 'LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mylib/:/xrt_libs /opt/tritonserver/bin/tritonserver \
-     --model-repository=/models --backend-directory=/backends'
-   curl -v localhost:8000/v2/health/ready # ready check
-   ```
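
After the `serving:final` container passes the ready check, the standard Triton v2 HTTP API can be used for a quick smoke test of the mounted model repository. The commands below are a minimal sketch; `<model_name>` is a placeholder for whichever model directory you mounted under `/models` (for example, a model exported by the tutorial), not a name defined by this repository.

```shell
# List everything Triton found in the mounted model repository
curl -X POST localhost:8000/v2/repository/index

# Check that a specific model loaded successfully (replace <model_name>)
curl -v localhost:8000/v2/models/<model_name>/ready

# Fetch the model's metadata (inputs, outputs, versions)
curl localhost:8000/v2/models/<model_name>
```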