Boltz 2.2.1 image throwing error #417

@JoseEspinosa

Description of the bug

I am trying to build an image for the latest boltz version (2.2.1). To do so, I updated the Dockerfile as follows:

FROM python:3.12-slim

LABEL authors="Ziad Al-Bkhetan <ziad.albkhetan@gmail.com>" \
    title="nfcore/proteinfold_boltz" \
    Version="1.2.0dev" \
    description="Docker image containing all software requirements to run boltz using the nf-core/proteinfold pipeline"

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential \
    procps \
    && apt-get autoremove -y \
    && rm -rf /var/lib/apt/lists/*

# Unsure about the triton version; the error message says to make sure 3.3.0 is used
RUN pip install --no-cache-dir boltz==2.2.1 \
    triton==3.3.0 \
    cuequivariance_ops_cu12==0.7.0 \
    cuequivariance_ops_torch_cu12==0.7.0 \
    cuequivariance_torch==0.7.0

The cuequivariance_* and triton libraries are installed explicitly because boltz complains otherwise.
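To confirm that the image really contains the pinned versions, a small check along these lines could be run inside the container. This is only a sketch: the `EXPECTED` table is copied from the Dockerfile above, and `check_pins` is a hypothetical helper name, not part of boltz or cuequivariance.

```python
from importlib import metadata

# Pins copied from the Dockerfile in this report.
EXPECTED = {
    "boltz": "2.2.1",
    "triton": "3.3.0",
    "cuequivariance_ops_cu12": "0.7.0",
    "cuequivariance_ops_torch_cu12": "0.7.0",
    "cuequivariance_torch": "0.7.0",
}


def check_pins(expected):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # distribution is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(EXPECTED)
    for name, (wanted, installed) in problems.items():
        print(f"{name}: expected {wanted}, found {installed}")
    if not problems:
        print("All pinned versions match.")
```

If pip resolved a different version of any of these packages than the one requested, it would show up here as a mismatch.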

When boltz is run, it throws this exception:

         ^^^^^^
    File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
      return forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/boltz/model/layers/pairformer.py", line 243, in forward
      z = z + dropout * self.tri_mul_out(
                        ^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
      return forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/boltz/model/layers/triangular_mult.py", line 92, in forward
      return kernel_triangular_mult(
             ^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
      return fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/boltz/model/layers/triangular_mult.py", line 22, in kernel_triangular_mult
      from cuequivariance_torch.primitives.triangle import triangle_multiplicative_update
    File "/usr/local/lib/python3.12/site-packages/cuequivariance_torch/__init__.py", line 26, in <module>
      from .primitives.transpose import TransposeSegments, TransposeIrrepsLayout
    File "/usr/local/lib/python3.12/site-packages/cuequivariance_torch/primitives/transpose.py", line 183, in <module>
      from cuequivariance_ops_torch import segmented_transpose
    File "/usr/local/lib/python3.12/site-packages/cuequivariance_ops_torch/__init__.py", line 39, in <module>
      from cuequivariance_ops_torch.fused_layer_norm_torch import layer_norm_transpose
    File "/usr/local/lib/python3.12/site-packages/cuequivariance_ops_torch/fused_layer_norm_torch.py", line 17, in <module>
      from cuequivariance_ops.triton import (
    File "/usr/local/lib/python3.12/site-packages/cuequivariance_ops/triton/__init__.py", line 24, in <module>
      from .tuning_decorator import autotune_aot
    File "/usr/local/lib/python3.12/site-packages/cuequivariance_ops/triton/tuning_decorator.py", line 17, in <module>
      from .cache_manager import get_cache_manager
    File "/usr/local/lib/python3.12/site-packages/cuequivariance_ops/triton/cache_manager.py", line 255, in <module>
      cache_manager = CacheManager()
                      ^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/cuequivariance_ops/triton/cache_manager.py", line 110, in __init__
      self.gpu_information = get_gpu_information()
                             ^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/cuequivariance_ops/triton/cache_manager.py", line 71, in get_gpu_information
      gpu_core_count = pynvml.nvmlDeviceGetNumGpuCores(handle)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/pynvml.py", line 5872, in nvmlDeviceGetNumGpuCores
      _nvmlCheckReturn(ret)
    File "/usr/local/lib/python3.12/site-packages/pynvml.py", line 1061, in _nvmlCheckReturn
      raise NVMLError(ret)
  pynvml.NVMLError_Unknown: Unknown Error
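The traceback shows the error being raised at module level while `cuequivariance_ops.triton` is imported, so it may be reproducible without running boltz at all. The sketch below imports the modules named in the traceback one by one and reports the first one that fails. `IMPORT_CHAIN` and `first_failing_import` are hypothetical names chosen for illustration.

```python
import importlib
import traceback

# Module names taken from the traceback above, innermost first.
IMPORT_CHAIN = [
    "pynvml",
    "cuequivariance_ops.triton",
    "cuequivariance_ops_torch",
    "cuequivariance_torch",
]


def first_failing_import(module_names):
    """Import modules in order; return (name, exception) for the first failure, else None."""
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # NVMLError is not an ImportError, so catch broadly
            return name, exc
    return None


if __name__ == "__main__":
    failure = first_failing_import(IMPORT_CHAIN)
    if failure is None:
        print("All modules imported cleanly.")
    else:
        name, exc = failure
        print(f"First failing import: {name}")
        traceback.print_exception(type(exc), exc, exc.__traceback__)
```

Running this inside `apptainer shell --nv` would show whether the failure depends on boltz or only on the cuequivariance import chain.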

The drivers installed on the HPC system are:

NVIDIA-SMI 575.57.08 Driver Version: 575.57.08 CUDA Version: 12.9

Steps to reproduce the error

apptainer pull --disable-cache --name quay.io-nf-core-proteinfold_boltz-test.img docker://quay.io/nf-core/proteinfold_boltz:test > /dev/null
apptainer shell --nv /users/cn/jespinosa/nxf_singuarlity_cachedir/quay.io-nf-core-proteinfold_boltz-test.img

python - << 'EOF'
import pynvml
pynvml.nvmlInit()
h = pynvml.nvmlDeviceGetHandleByIndex(0)
print("Name:", pynvml.nvmlDeviceGetName(h))
print("SM cores:", pynvml.nvmlDeviceGetNumGpuCores(h))
EOF
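A slightly more verbose variant of the snippet above may help narrow down which NVML query fails. It calls each query separately and records either the result or the error, so a failure in `nvmlDeviceGetNumGpuCores` does not hide the driver and NVML versions seen inside the container. This is a sketch, and `probe_nvml` is a hypothetical helper; the module is passed in as an argument so the function can also be exercised without a GPU.

```python
def probe_nvml(nvml, index=0):
    """Call several NVML queries one by one and record the result or error of each."""
    results = {}

    def record(label, fn, *args):
        try:
            value = fn(*args)
        except Exception as exc:
            results[label] = ("error", repr(exc))
            return None
        results[label] = ("ok", value)
        return value

    record("init", nvml.nvmlInit)
    if results["init"][0] == "error":
        return results  # nothing else can be queried without a successful init
    try:
        record("driver_version", nvml.nvmlSystemGetDriverVersion)
        record("nvml_version", nvml.nvmlSystemGetNVMLVersion)
        handle = record("handle", nvml.nvmlDeviceGetHandleByIndex, index)
        if results["handle"][0] == "ok":
            record("name", nvml.nvmlDeviceGetName, handle)
            record("num_gpu_cores", nvml.nvmlDeviceGetNumGpuCores, handle)
    finally:
        nvml.nvmlShutdown()
    return results


if __name__ == "__main__":
    try:
        import pynvml
    except ImportError:
        print("pynvml is not installed in this environment")
    else:
        for label, (status, value) in probe_nvml(pynvml).items():
            print(f"{label}: {status}: {value}")
```

Comparing the output inside the container with the same script run directly on the host could show whether the error is specific to the container environment.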

Labels: bug