4 changes: 2 additions & 2 deletions README.md
@@ -13,7 +13,7 @@ OneFlow Backend For Triton Inference Server

## Get Started

Here is a [tutorial](./doc/tutorial.md) about how to export the model and how to deploy it. You can also follow the instructions below to get started.
Here is a [tutorial](./doc/tutorial.md) about how to export the model and how to deploy it. You can also follow the instructions below to get started. [Building the Docker image](./doc/build.md) is necessary before you start.

1. Download and save model

@@ -27,7 +27,7 @@
```
cd ../../ # back to the root of the serving repo
docker run --rm --runtime=nvidia --network=host -v$(pwd)/examples:/models \
oneflowinc/oneflow-serving
serving:final
curl -v localhost:8000/v2/health/ready # ready check
```

103 changes: 17 additions & 86 deletions doc/build.md
@@ -1,101 +1,32 @@
# Build From Source

To use the model server, you can simply pull the image `oneflowinc/oneflow-serving` from Docker Hub. You only need to build from source if you want to modify the source code.
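For the pull-only path, a minimal sketch looks like the following; the run command mirrors the one in the README, and the `./examples` model directory is an assumption about where your exported models live:

```
docker pull oneflowinc/oneflow-serving
# mount the directory containing your exported models as the model repository
docker run --rm --runtime=nvidia --network=host -v$(pwd)/examples:/models \
  oneflowinc/oneflow-serving
curl -v localhost:8000/v2/health/ready # ready check
```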

You can build on bare metal, or you can pull the devel docker image below and follow the instructions to build inside a docker container.

```
docker pull registry.cn-beijing.aliyuncs.com/oneflow/triton-devel
```
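If you take the in-container route, here is a hedged sketch of starting the devel container; the mount path and working directory are assumptions, not part of the original docs:

```
docker run -it --rm --runtime=nvidia --network=host \
  -v $(pwd):/workspace -w /workspace \
  registry.cn-beijing.aliyuncs.com/oneflow/triton-devel bash
```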

To build from source, you need to build liboneflow first.

1. Build liboneflow from source

1. Build base image
```
git clone https://github.com/Oneflow-Inc/oneflow --depth=1
cd oneflow
mkdir build && cd build
cmake -C ../cmake/caches/cn/cuda.cmake -DBUILD_CPP_API=ON -DWITH_MLIR=ON -G Ninja ..
ninja
docker build -t serving:base . -f docker/Dockerfile.base
```


2. Build oneflow backend from source

2. Build liboneflow in docker image

```
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
-DTRITON_ENABLE_GPU=ON -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
ninja
docker build -t serving:build_of . -f docker/Dockerfile.build_of
```
3. Build backends in docker image

```
docker build -t serving:final . -f docker/Dockerfile.serving
```
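As a quick sanity check (not part of the original steps), you can list the images produced by the three build stages:

```
docker images serving  # should show the base, build_of, and final tags
```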


3. Launch triton server
4. Launch triton server

```
cd ../ # back to the root of the serving repo
docker run --runtime=nvidia --rm --network=host \
-v$(pwd)/examples:/models \
-v$(pwd)/build/libtriton_oneflow.so:/backends/oneflow/libtriton_oneflow.so \
-v$(pwd)/oneflow/build/liboneflow_cpp/lib/:/mylib nvcr.io/nvidia/tritonserver:21.10-py3 \
bash -c 'LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mylib/ /opt/tritonserver/bin/tritonserver \
--model-repository=/models --backend-directory=/backends'
docker run --runtime=nvidia \
--rm \
--network=host \
-v$(pwd)/examples/cpp:/models \
serving:final
curl -v localhost:8000/v2/health/ready # ready check
```
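Once the server reports ready, you can also probe an individual model over Triton's standard HTTP/REST API. The model name below is a placeholder; substitute the name of a model directory you actually mounted under `/models`:

```
# model readiness and metadata (model name is a placeholder)
curl -v localhost:8000/v2/models/resnet50_oneflow/ready
curl localhost:8000/v2/models/resnet50_oneflow
```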


If you want to use XLA, TensorRT, or OpenVINO in OneFlow-Serving, build OneFlow-XRT first and then rebuild the oneflow backend.

4. Build OneFlow-XRT with XLA, TensorRT or OpenVINO

```shell
git clone https://github.com/Oneflow-Inc/oneflow-xrt.git
cd oneflow-xrt
mkdir build && cd build

# Build OneFlow-XRT XLA
cmake -G Ninja .. -DBUILD_XLA=ON && ninja

# Build OneFlow-XRT TensorRT
cmake -G Ninja .. -DBUILD_TENSORRT=ON -DTENSORRT_ROOT=/path/to/tensorrt && ninja

# Build OneFlow-XRT OpenVINO
cmake -G Ninja .. -DBUILD_OPENVINO=ON -DOPENVINO_ROOT=/path/to/openvino && ninja
```

5. Build oneflow backend from source

```shell
mkdir build && cd build

# Use TensorRT
cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
-DTRITON_ENABLE_GPU=ON -DUSE_TENSORRT=ON -DONEFLOW_XRT_ROOT=$(pwd)/oneflow-xrt/build/install -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
ninja

# Use XLA
cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
-DTRITON_ENABLE_GPU=ON -DUSE_XLA=ON -DONEFLOW_XRT_ROOT=$(pwd)/oneflow-xrt/build/install -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
ninja

# Use OpenVINO
cmake -DCMAKE_PREFIX_PATH=/path/to/liboneflow_cpp/share -DTRITON_RELATED_REPO_TAG=r21.10 \
-DTRITON_ENABLE_GPU=ON -DUSE_OPENVINO=ON -DONEFLOW_XRT_ROOT=$(pwd)/oneflow-xrt/build/install -G Ninja -DTHIRD_PARTY_MIRROR=aliyun ..
ninja
```

6. Launch triton server

```shell
cd ../ # back to the root of the serving repo
docker run --runtime=nvidia --rm --network=host \
-v$(pwd)/examples:/models \
-v$(pwd)/build/libtriton_oneflow.so:/backends/oneflow/libtriton_oneflow.so \
-v$(pwd)/oneflow/build/liboneflow_cpp/lib/:/mylib \
-v$(pwd)/oneflow-xrt/build/install/lib:/xrt_libs \
nvcr.io/nvidia/tritonserver:21.10-py3 \
bash -c 'LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mylib/:/xrt_libs /opt/tritonserver/bin/tritonserver \
--model-repository=/models --backend-directory=/backends'
curl -v localhost:8000/v2/health/ready # ready check
```