This repository was archived by the owner on Nov 27, 2024. It is now read-only.

Commit 098b758

Merge pull request #41 from james-s-tayler/gpu_accelerated_tests
Run integration tests via CUDA Execution Provider
2 parents 6bb0e1a + e2a82e1 commit 098b758

10 files changed: +81 −14 lines

Dockerfile

Lines changed: 35 additions & 3 deletions

```diff
@@ -1,8 +1,10 @@
-FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build
+# Since we're using the nvidia/cuda base image, this requires nvidia-container-toolkit installed on the host system to pass through the drivers to the container.
+# see: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
+FROM nvidia/cuda:12.3.0-runtime-ubuntu22.04 AS final
 WORKDIR /app
 
 # Install Git and Git LFS
-RUN apt-get update && apt-get install -y curl
+RUN apt-get update && apt-get install -y curl wget
 RUN curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash && apt-get install -y git-lfs
 
 # Clone the Stable Diffusion 1.5 base model
@@ -11,7 +13,37 @@ RUN git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 -b onnx
 # Clone the LCM Dreamshaper V7 model
 RUN git clone https://huggingface.co/TheyCallMeHex/LCM-Dreamshaper-V7-ONNX
 
+# Need to install NVIDIA's gpg key before apt search will show up-to-date packages for CUDA.
+RUN wget -N -t 5 -T 10 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb \
+    && dpkg -i ./cuda-keyring_1.1-1_all.deb
+
+# Install the CUDA dependencies required according to `ldd libonnxruntime_providers_cuda.so`.
+RUN apt-get update \
+    && apt-get install -y libcublaslt11 libcublas11 libcudnn8=8.9.1.23-1+cuda11.8 libcufft10 libcudart11.0
+
+# According to `ldd libortextensions.so` it depends on SSL 1.1 to run, and the dotnet/runtime-deps base image installs it, which is why it works inside the dotnet base images.
+# Since we need access to the GPU to use the CUDA execution provider, we need to use the nvidia/cuda base image instead.
+# The nvidia/cuda base image doesn't contain SSL 1.1, hence we have to manually install it like this to satisfy the dependency.
+# This fixes the "The ONNX Runtime extensions library was not found" error.
+# See: https://stackoverflow.com/questions/72133316/libssl-so-1-1-cannot-open-shared-object-file-no-such-file-or-directory
+RUN wget http://nz2.archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.20_amd64.deb && dpkg -i libssl1.1_1.1.1f-1ubuntu2.20_amd64.deb
+
+# Need to install the dotnet SDK since we're not using the dotnet/sdk base image.
+# Note: icu is also installed to help with globalization https://learn.microsoft.com/en-us/dotnet/core/extensions/globalization-icu
+RUN apt-get update \
+    && apt-get install -y dotnet-sdk-7.0 icu-devtools
+
+ENV \
+    # Enable detection of running in a container
+    DOTNET_RUNNING_IN_CONTAINER=true \
+    # Do not generate certificate
+    DOTNET_GENERATE_ASPNET_CERTIFICATE=false \
+    # Do not show first run text
+    DOTNET_NOLOGO=true \
+    # Skip extraction of XML docs - generally not useful within an image/container - helps performance
+    NUGET_XMLDOC_MODE=skip
+
 COPY . .
 RUN dotnet build OnnxStackCore.sln
 
-ENTRYPOINT ["dotnet", "test", "OnnxStackCore.sln"]
+ENTRYPOINT ["sh", "-c", "nvidia-smi && dotnet test OnnxStackCore.sln"]
```
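The new `ENTRYPOINT` chains `nvidia-smi` ahead of `dotnet test` so the container fails fast when the GPU was not passed through. A minimal sketch of the same guard as a standalone script (the `dotnet test` call is commented out so the sketch runs on hosts without the solution):

```shell
#!/bin/sh
# Mirror the ENTRYPOINT guard: only proceed to the test suite when the GPU
# is actually visible inside the container via nvidia-smi.
if nvidia-smi >/dev/null 2>&1; then
    status="gpu"
    echo "GPU visible: running tests"
    # dotnet test OnnxStackCore.sln
else
    status="no-gpu"
    echo "GPU not visible: aborting before dotnet test"
fi
```

Because the two commands are joined with `&&` inside `sh -c`, a missing GPU surfaces as an immediate non-zero exit instead of a slow CPU fallback or a cryptic ONNX Runtime error later.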
Lines changed: 7 additions & 0 deletions

```diff
@@ -0,0 +1,7 @@
+namespace OnnxStack.IntegrationTests;
+
+/// <summary>
+/// All integration tests need to go in a single collection, so tests in different classes run sequentially and not in parallel.
+/// </summary>
+[CollectionDefinition("IntegrationTests")]
+public class IntegrationTestCollection { }
```

OnnxStack.IntegrationTests/OnnxStack.IntegrationTests.csproj

Lines changed: 1 addition & 1 deletion

```diff
@@ -16,7 +16,7 @@
     <PackageReference Include="Microsoft.Extensions.Logging.Abstractions" Version="8.0.0" />
     <PackageReference Include="Microsoft.Extensions.Logging.Console" Version="8.0.0" />
     <PackageReference Include="Microsoft.ML.OnnxRuntime" Version="1.16.2" />
-    <PackageReference Include="Microsoft.ML.OnnxRuntime.Extensions" Version="0.9.0" />
+    <PackageReference Include="Microsoft.ML.OnnxRuntime.Gpu" Version="1.16.2" />
     <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.8.0" />
     <PackageReference Include="xunit" Version="2.4.2" />
     <PackageReference Include="Xunit.Extensions.Logging" Version="1.1.0" />
```

OnnxStack.IntegrationTests/StableDiffusionTests.cs

Lines changed: 2 additions & 6 deletions

```diff
@@ -1,4 +1,3 @@
-using System.Security.Cryptography;
 using FluentAssertions;
 using FluentAssertions.Execution;
 using Microsoft.Extensions.DependencyInjection;
@@ -13,13 +12,10 @@
 namespace OnnxStack.IntegrationTests;
 
 /// <summary>
-/// These tests just run on CPU execution provider for now, but could switch it to CUDA and run on GPU
-/// if the necessary work is done to setup the docker container to allow GPU passthrough to the container.
-/// See https://blog.roboflow.com/use-the-gpu-in-docker/ for an example of how to do this.
-///
-/// Can then also setup a self-hosted runner in Github Actions to run the tests on your own GPU as part of the CI/CD pipeline.
+/// These tests could be run via a self-hosted runner in Github Actions to run the tests on your own GPU as part of the CI/CD pipeline.
 /// Maybe something like https://www.youtube.com/watch?v=rVq-SCNyxVc
 /// </summary>
+[Collection("IntegrationTests")]
 public class StableDiffusionTests
 {
     private readonly IStableDiffusionService _stableDiffusion;
```

OnnxStack.IntegrationTests/Usings.cs

Lines changed: 4 additions & 1 deletion

```diff
@@ -1 +1,4 @@
-global using Xunit;
+global using Xunit;
+
+// need all tests to run one at a time sequentially to not overwhelm the GPU
+[assembly: CollectionBehavior(DisableTestParallelization = true)]
```

OnnxStack.IntegrationTests/appsettings.json

Lines changed: 2 additions & 2 deletions

```diff
@@ -24,7 +24,7 @@
       "InterOpNumThreads": 0,
       "IntraOpNumThreads": 0,
       "ExecutionMode": "ORT_SEQUENTIAL",
-      "ExecutionProvider": "Cpu",
+      "ExecutionProvider": "Cuda",
       "ModelConfigurations": [
         {
           "Type": "Tokenizer",
@@ -65,7 +65,7 @@
       "InterOpNumThreads": 0,
       "IntraOpNumThreads": 0,
       "ExecutionMode": "ORT_SEQUENTIAL",
-      "ExecutionProvider": "Cpu",
+      "ExecutionProvider": "Cuda",
       "ModelConfigurations": [
         {
           "Type": "Tokenizer",
```

OnnxStackCore.sln

Lines changed: 1 addition & 0 deletions

```diff
@@ -13,6 +13,7 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "SolutionItems", "SolutionIt
 		.gitignore = .gitignore
 		docker-compose.yml = docker-compose.yml
 		README.md = README.md
+		run-integration-tests-cuda.sh = run-integration-tests-cuda.sh
 	EndProjectSection
 EndProject
 Global
```

README.md

Lines changed: 15 additions & 1 deletion

```diff
@@ -182,7 +182,21 @@ Other `Microsoft.ML.OnnxRuntime.*` executors like `Cuda` may work but are untest
 
 `DirectML` > 10GB VRAM
 
+## Troubleshooting
+
+- I'm running on Linux but it's not working, citing: `The ONNX Runtime extensions library was not found`?
+  - It's having a problem loading `libortextensions.so`.
+  - From the project root run `find -name "libortextensions.so"` to locate that file.
+  - Then run `ldd libortextensions.so` against it to see what dependencies it needs versus what your system has.
+  - It has a dependency on SSL 1.1, which was removed from Ubuntu-based OSes and causes this error.
+  - It can be remedied by manually installing the dependency.
+  - See: https://stackoverflow.com/questions/72133316/libssl-so-1-1-cannot-open-shared-object-file-no-such-file-or-directory
+- I've installed `Microsoft.ML.OnnxRuntime` and `Microsoft.ML.OnnxRuntime.Gpu` into my project and set the execution provider to `Cuda`, but it's complaining it can't find an entry point for CUDA?
+  - `System.EntryPointNotFoundException : Unable to find an entry point named 'OrtSessionOptionsAppendExecutionProvider_CUDA' in shared library 'onnxruntime'`
+  - Adding both `Microsoft.ML.OnnxRuntime` AND `Microsoft.ML.OnnxRuntime.Gpu` at the same time causes this.
+  - Remove `Microsoft.ML.OnnxRuntime` and try again.
+- I'm trying to run via the CUDA execution provider but it's complaining about missing `libcublaslt11`, `libcublas11`, or `libcudnn8`?
+  - Aside from just the NVIDIA drivers, you also need to install CUDA and cuDNN.
 
 ## Contribution
```
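The `find`/`ldd` diagnosis described in the README hunk above can be sketched as one script. This is a hedged sketch: it assumes you run it from the project root, and simply reports when the library isn't present rather than failing.

```shell
#!/bin/sh
# Locate libortextensions.so and list any shared-library dependencies the
# current system cannot resolve (these appear as "not found" in ldd output).
so_path="$(find . -name 'libortextensions.so' 2>/dev/null | head -n 1)"
if [ -n "$so_path" ]; then
    # e.g. "libssl.so.1.1 => not found" means SSL 1.1 must be installed.
    ldd "$so_path" | grep 'not found' || echo "all dependencies resolved"
    result="checked"
else
    echo "libortextensions.so not found under $(pwd)"
    result="missing"
fi
```

A `not found` line for `libssl.so.1.1` is exactly the symptom the Stack Overflow link addresses.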

docker-compose.yml

Lines changed: 7 additions & 0 deletions

```diff
@@ -3,5 +3,12 @@ version: '3.7'
 services:
   app:
     build: .
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
     volumes:
       - "./docker-test-output:/app/OnnxStack.IntegrationTests/bin/Debug/net7.0/images"
```
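The `deploy.resources.reservations.devices` stanza is Compose's way of requesting GPUs through nvidia-container-toolkit. With plain `docker run` the equivalent flag is `--gpus all`; the sketch below only prints the command, and the image name `onnxstack-app` is a placeholder for whatever tag your Compose build produces.

```shell
# Print the plain-docker equivalent of the compose GPU reservation above.
# "onnxstack-app" is a hypothetical image tag, not one defined by this repo.
cmd="docker run --rm --gpus all onnxstack-app"
echo "$cmd"
```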

run-integration-tests-cuda.sh

Lines changed: 7 additions & 0 deletions

```diff
@@ -0,0 +1,7 @@
+#!/bin/bash
+# running this requires:
+# - nvidia GPU with sufficient VRAM
+# - nvidia drivers installed on the host system
+# - nvidia-container-toolkit installed on the host system (see: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
+# - note: nvidia-smi reports peak VRAM close to 24GB while running the tests
+docker-compose up --build
```
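The script's comment block lists host-side prerequisites; a small preflight sketch can confirm the corresponding tools are on `PATH` before attempting the GPU run (it cannot verify driver versions or VRAM, only tool presence):

```shell
#!/bin/sh
# Check each host-side prerequisite tool before running the test container.
missing=0
for tool in docker docker-compose nvidia-smi; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool"
    else
        echo "missing: $tool"
        missing=$((missing + 1))
    fi
done
echo "$missing tool(s) missing"
```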
