Using GPUs with Fuzzball
Fuzzball intelligently passes GPUs from the host through to the containers that run your jobs. Once an administrator has added and configured resources that include GPUs, you can access them simply by requesting them in your Workflow.
Consider the following Fuzzfile:
version: v1
jobs:
  gpu-check:
    image:
      uri: docker://rockylinux:9
    command:
      - /bin/sh
      - '-c'
      - nvidia-smi
    resource:
      cpu:
        cores: 1
      memory:
        size: 1GB
      devices:
        nvidia.com/gpu: 1
Note that the gpu-check job in this workflow is based on the official Rocky Linux (v9) container on Docker Hub. This container does not have a GPU driver or any NVIDIA-related software installed. However, the Fuzzfile specifies that this job should run the program nvidia-smi to check the status of any GPUs that are present.
The resource section has a devices field that allows you to specify that the job should use GPUs. Since our administrator has configured GPUs under the name nvidia.com/gpu, we can request one. Fuzzball makes all of the driver-related software available inside the container at runtime. Note that the nvidia-smi command completes without trouble even though the Rocky Linux container has no NVIDIA-related software installed:
Thu Aug 15 20:27:38 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06 Driver Version: 555.42.06 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla V100-SXM2-16GB Off | 00000000:00:1E.0 Off | 0 |
| N/A 42C P0 40W / 300W | 1MiB / 16384MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
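If a node offers more than one GPU, you can request additional devices by raising the count in the devices field. The sketch below is a hypothetical variation of the Fuzzfile above: the job name multi-gpu-check is illustrative, it assumes the same nvidia.com/gpu device name and a node with at least two GPUs, and it uses nvidia-smi -L to list the devices passed through to the container.

version: v1
jobs:
  multi-gpu-check:
    image:
      uri: docker://rockylinux:9
    command:
      - /bin/sh
      - '-c'
      - nvidia-smi -L    # list each GPU visible inside the container
    resource:
      cpu:
        cores: 1
      memory:
        size: 1GB
      devices:
        nvidia.com/gpu: 2    # request two GPUs instead of one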
Not only is it unnecessary to install NVIDIA drivers into containers that will run GPU workflows, it is considered bad practice to do so. Under the hood, Fuzzball finds the libraries and binaries associated with the NVIDIA driver and injects them into the container. This ensures that the injected libraries match the kernel modules loaded on the host. If you install GPU drivers into the container, the software inside may find and use them instead, breaking the correspondence between the libraries and the installed kernel modules.
Note that the NVIDIA driver and CUDA are not the same thing, which can be confusing because CUDA is often packaged with the driver. You should install the version of CUDA that your GPU-accelerated software requires in your container, but you should not install the driver.
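For example, if your application needs the CUDA toolkit, you might start from one of NVIDIA's CUDA images on Docker Hub rather than installing anything driver-related yourself. The sketch below is a hypothetical Fuzzfile in the same format as above: the image tag is only an example (choose one that matches the CUDA version your software needs), and nvcc --version simply confirms that the toolkit shipped with the image, while Fuzzball still injects the host's driver libraries at runtime.

version: v1
jobs:
  cuda-check:
    image:
      uri: docker://nvidia/cuda:12.5.0-devel-ubuntu22.04    # example tag; match your software's CUDA version
    command:
      - /bin/sh
      - '-c'
      - nvcc --version && nvidia-smi    # toolkit comes from the image, driver is injected by Fuzzball
    resource:
      cpu:
        cores: 1
      memory:
        size: 1GB
      devices:
        nvidia.com/gpu: 1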