Fuzzball Documentation

Notes on Containers

All jobs in a given Fuzzball workflow run within containers. You can find containers for your jobs in public registries like Docker Hub, Amazon Elastic Container Registry (ECR) Public, or NVIDIA NGC.

You can also create your own containers using Apptainer, Docker, Podman or other container platforms. You can push those containers to public or private registries in either the Singularity Image File (SIF) format, or using the Open Container Initiative (OCI) standard.

Container URIs

Fuzzball locates and pulls containers for jobs based on URIs you provide in your Fuzzfile. The URIs you use to specify containers follow the format used by Apptainer. A fully specified container URI looks like one of the following:

[PROTOCOL]://<REGISTRY_HOST>/<NAMESPACE>/[REPOSITORY]:<TAG>

[PROTOCOL]://<REGISTRY_HOST>/<NAMESPACE>/[REPOSITORY]@sha256:<HASH>

In the formats above, components in square brackets are required and components in angle brackets are optional. Fuzzball accepts shorter versions of registry URIs and tries to supply sensible default values for the components you omit. The following sections describe each component.
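To make the URI layout concrete, here is a minimal Python sketch that splits a URI into the components named above. It is an illustration only, not Fuzzball's or Apptainer's parser. The names ContainerURI and parse_container_uri, the example image myorg/myimage, and the rule that the first path segment is a registry host when it contains a dot or a colon (or is "localhost") are all assumptions made for this example.

```python
from typing import NamedTuple, Optional


class ContainerURI(NamedTuple):
    """Hypothetical container for the URI components described above."""
    protocol: str
    registry_host: Optional[str]
    namespace: Optional[str]
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_container_uri(uri: str) -> ContainerURI:
    """Split a container URI into protocol, host, namespace, repository, tag, digest."""
    protocol, sep, rest = uri.partition("://")
    if not sep or not protocol or not rest:
        raise ValueError(f"missing protocol or image name in {uri!r}")

    # A digest reference (@sha256:<HASH>) follows an "@".
    digest = None
    if "@" in rest:
        rest, digest = rest.split("@", 1)

    # Look for a tag only in the last path segment, so that a registry
    # port such as localhost:5000 is not mistaken for a tag.
    tag = None
    head, _, last = rest.rpartition("/")
    if ":" in last:
        last, tag = last.split(":", 1)
    parts = [p for p in head.split("/") if p] + [last]

    # Assumption: treat the first segment as a registry host if it looks
    # like a hostname (contains "." or ":", or is "localhost").
    registry_host = None
    first = parts[0]
    if len(parts) > 1 and ("." in first or ":" in first or first == "localhost"):
        registry_host = parts.pop(0)

    repository = parts.pop()
    if not repository:
        raise ValueError(f"missing repository in {uri!r}")
    namespace = "/".join(parts) or None
    return ContainerURI(protocol, registry_host, namespace, repository, tag, digest)


if __name__ == "__main__":
    print(parse_container_uri("docker://docker.io/myorg/myimage:1.0"))
```

Components that were omitted come back as None, which leaves the choice of default values to the caller.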

PROTOCOL (required)

You must explicitly specify which protocol to use when pulling containers. Fuzzball currently supports two container-pulling protocols:

  • docker://: This protocol pulls containers in OCI format.
  • oras://: This protocol pulls containers in SIF format (using the OCI Registry as Storage, or ORAS, protocol).

REGISTRY_HOST

The registry host specifies the location from which Fuzzball will pull your container. If you omit the registry host when using the docker:// protocol, the registries listed under unqualified-search-registries in /etc/containers/registries.conf (on the Fuzzball Substrate node) are searched in order. On Rocky 9, the default search order is ["registry.access.redhat.com", "registry.redhat.io", "docker.io"]. The oras:// protocol provides no default registry, so you must always specify the complete registry host explicitly.
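The search order can be pictured with a short Python sketch that lists, in order, the fully qualified references that would be tried for an image given without a registry host. The function name candidate_references and the image myorg/myimage are hypothetical, and the sketch only models the ordering described above, not the actual resolution logic on the Substrate node.

```python
from typing import List

# Default search order on Rocky 9, as quoted in the text above.
ROCKY9_SEARCH_ORDER = [
    "registry.access.redhat.com",
    "registry.redhat.io",
    "docker.io",
]


def candidate_references(protocol: str, image: str,
                         search_registries: List[str]) -> List[str]:
    """Return the references tried, in order, for an image with no registry host."""
    if protocol == "oras":
        # No default registry exists for oras://, so there is nothing to search.
        raise ValueError("oras:// requires an explicit registry host")
    if protocol != "docker":
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return [f"docker://{registry}/{image}" for registry in search_registries]


if __name__ == "__main__":
    for ref in candidate_references("docker", "myorg/myimage:1.0", ROCKY9_SEARCH_ORDER):
        print(ref)
```

Writing the registry host explicitly avoids depending on the search order of a particular node.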

Here are some example registry hosts that are commonly used:

  • docker.io: This is the correct host for Docker Hub.
  • nvcr.io: The NVIDIA container registry hosts many HPC-specific containers, many of which are optimized to use GPUs.
  • ghcr.io: The GitHub container registry is a popular place for users to host their containers.
  • public.ecr.aws: This is the host for Amazon Elastic Container Registry (ECR) Public.
  • us-west1-docker.pkg.dev: This is just an example showing how you might pull a container from a private registry on Google Cloud. You can use any appropriate URI to pull from a private OCI registry.

NAMESPACE

OCI registries are organized into namespaces. In the case of common public registries like Docker Hub, these are usually the organization or user who pushed the image. On some registries like Docker Hub there are special official images that can be pulled without supplying a namespace.

REPOSITORY (required)

This is the name of the actual repository with the container image that you want to pull. Along with the protocol, this value is always required.

TAG

You can use this value to specify the version of the container image that you want to access. If you omit this value, the “latest” tag will be pulled. Some images (like the official Rocky Linux image) do not have a “latest” tag, so pulling them without supplying a tag fails with an error.

Later versions of the Fuzzball workflow editor will flag container URIs that do not include a tag.
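A check of this kind is easy to run yourself before submitting a workflow. The following Python sketch reports whether a URI carries an explicit tag or digest. It is our own illustration, not the workflow editor's implementation, and the function name has_explicit_version and the example image names are hypothetical.

```python
def has_explicit_version(uri: str) -> bool:
    """Return True if the URI ends in :<TAG> or @sha256:<HASH>."""
    # Drop the protocol prefix, if any.
    rest = uri.split("://", 1)[-1]
    if "@sha256:" in rest:
        return True
    # Only the last path segment can hold a tag; a colon earlier in the
    # URI belongs to a registry port such as localhost:5000.
    last_segment = rest.rsplit("/", 1)[-1]
    return ":" in last_segment


if __name__ == "__main__":
    for uri in ("docker://docker.io/myorg/myimage",
                "docker://docker.io/myorg/myimage:1.0"):
        status = "ok" if has_explicit_version(uri) else "no tag: 'latest' would be pulled"
        print(f"{uri} -> {status}")
```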

HASH

Instead of specifying a tag (which might be a moving target) you can pull your images by hash to help ensure that you always get a consistent container. The syntax @sha256:<HASH> allows you to do this. Pulling containers by hash is considered the best strategy for reproducibility.
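As an illustration of the @sha256:<HASH> syntax, the Python sketch below rewrites a tagged URI into its digest-pinned form. The function name pin_to_digest and the image myorg/myimage are hypothetical, and the digest in the usage line is a placeholder rather than the hash of a real image. The sketch only rewrites the string; obtaining the real digest from your registry is a separate step.

```python
import re


def pin_to_digest(uri: str, sha256_hex: str) -> str:
    """Replace any tag or digest in the URI with @sha256:<HASH>."""
    # A SHA-256 digest is 64 hexadecimal characters.
    if not re.fullmatch(r"[0-9a-f]{64}", sha256_hex):
        raise ValueError("expected 64 lowercase hexadecimal characters")
    protocol, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"missing protocol in {uri!r}")
    # Drop an existing digest, then an existing tag in the last segment.
    rest = rest.split("@", 1)[0]
    head, slash, last = rest.rpartition("/")
    last = last.split(":", 1)[0]
    return f"{protocol}://{head}{slash}{last}@sha256:{sha256_hex}"


if __name__ == "__main__":
    placeholder = "0" * 64  # placeholder digest, not a real image hash
    print(pin_to_digest("docker://docker.io/myorg/myimage:1.0", placeholder))
```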

Containers in SIF vs. OCI Format

Containers can be pushed to and stored on registries in OCI format or in SIF format. The latter uses the OCI Registry as Storage (ORAS) protocol that allows arbitrary data to be stored in container registries.

When you submit a workflow to Fuzzball, Orchestrate creates one or more stages to pull your container(s) using the URI(s) listed in your Fuzzfile. If you reference a container URI in SIF format (via the oras:// protocol), Fuzzball simply downloads the file.

If you reference a container URI in OCI format (via the docker:// protocol), Fuzzball must convert it to SIF format before running your jobs. The container layers (.tar files) are first downloaded and then extracted and combined into a single filesystem using OverlayFS. This, in turn, is converted into a compressed SquashFS image and finally added as a partition to the newly created SIF file.

SIF files are cached in shared storage served by NFS. The OCI .tar layers are not cached in this way. See the Appendix detailing Orchestrate dependencies for more information.

Because converting a container from OCI to SIF requires both data duplication and data compression, it can consume a lot of disk space and compute resources for large containers. OCI containers larger than 10 GB may run out of disk space during the conversion process.

For these reasons, it is considered a good practice to work with containers in SIF format whenever possible, and it is necessary to work with very large containers in SIF format. See the Apptainer documentation on building and pushing container files for information about creating and using SIF files.
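The guidance above can be summarized as a small pre-submission check. The Python sketch below is hypothetical: the function name recommend_format and the returned messages are our own, and reading "10 GB" as 10 × 10^9 bytes is an assumption. You would also need to supply the image size yourself, for example from your registry's listing.

```python
# Assumption: "10 GB" is interpreted as decimal gigabytes.
LARGE_OCI_IMAGE_BYTES = 10 * 10**9


def recommend_format(protocol: str, image_size_bytes: int) -> str:
    """Return a short recommendation for a container of the given size."""
    if protocol == "oras":
        # SIF images are downloaded as files; no conversion is needed.
        return "ok: SIF image, no conversion needed"
    if protocol != "docker":
        raise ValueError(f"unsupported protocol: {protocol!r}")
    if image_size_bytes > LARGE_OCI_IMAGE_BYTES:
        return "convert: OCI image over 10 GB, build and push a SIF image instead"
    return "consider: OCI image will be converted to SIF, SIF is preferred when possible"


if __name__ == "__main__":
    print(recommend_format("docker", 25 * 10**9))
```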