Glossary
The Fuzzball guides include many new terms that are specific to Fuzzball, as well as some familiar terms that are used in specific ways. This glossary is provided to minimize ambiguity and promote greater understanding.
A Fuzzball account is a subdivision of a Fuzzball Organization and represents a collection of users. Users can be members of one or more accounts. Accounts control access to things like Storage Volumes, resource definitions, and secrets. For example, an organization owner may create an account to provide access for the users of a single lab, working group, or team within the overall organization.
Group accounts are created by organization owners and can include more than one user. Group accounts provide a way for users to share Storage Volumes, Secrets, resource definitions, Fuzzfiles, and basic information about workflows.
This is a user’s personal account. When an organization owner first creates a user, a personal user account is also created by default. A user may also be added to one or more group accounts. The user account is helpful if the user needs to create Storage Account Volumes, Secrets, etc., and either does not have permission or does not wish to scope these items to a group account. It also lets users run workflows that are not visible to others and keep their Fuzzfiles private.
A computer program designed to carry out a specific task. Within HPC, applications often do things like process and analyze data, run simulations, or render/display results.
The Linux Foundation successor to Singularity, Apptainer is an open-source container platform purpose-built for HPC with a focus on running discrete jobs rather than ongoing services. Apptainer was originally built to address the specific security requirements of containers in multi-tenant HPC environments. It supports hardware/software components commonly found in an HPC cluster like GPUs and MPI. Apptainer uses the SIF container format that is also supported by Fuzzball.
The name “Apptainer” itself is a portmanteau of the words “application” and “container”.
A user “builds” a workflow by first thinking about the desired input and output of a computational task. From there, the user can start thinking about the jobs that must be completed, any dependencies between jobs, and the resources, environment, policies, etc. that each job requires. The user is then able to write these requirements in a Fuzzfile, or use the Fuzzball workflow editor to create a Fuzzfile. It is a common pattern to begin by using the workflow editor to build a high-level template Fuzzfile for a workflow and then edit the Fuzzfile directly to refine details.
In the most basic terms, “the cloud” refers to a pool of computational resources made commercially available to rent for generic computing tasks. Major providers include Amazon (AWS), Google (GCP), Microsoft (Azure), and Oracle (OCI). Some companies, such as DigitalOcean, Rackspace, and Vultr, also offer cloud resources as their primary business. Within HPC, the term “Cloud” usually denotes the opposite of on-premises (“on-prem”) computational resources (i.e., a cluster of computers in a physical location maintained and controlled by the organization that uses them).
Within HPC, “cluster” refers to a set of computers on a common network with software that allows their resources to be scheduled and pooled.
A human being responsible for administering a compute cluster. Within the context of Fuzzball, a cluster administrator is someone who is responsible for deploying, configuring, and maintaining a Fuzzball instance on a given cluster.
A text-based method for interacting with an application. A CLI runs within a terminal emulator.
Linux containers provide a lightweight method for swapping one (Linux) environment for another. They are similar in concept to very fast, lightweight virtual machines. Containers leverage Linux kernel features like namespaces and cgroups to provide a completely new environment to one or more applications. At minimum, the new environment consists of a new root file system, but it may also include things like:
- a different set of environment variables
- a new set of process IDs (PIDs)
- a new set of network interfaces
- access restricted to a subset of hardware through cgroups
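On Linux, a process’s namespace membership is visible under `/proc/<pid>/ns` as symbolic links whose targets encode the namespace type and a kernel identifier (for example `pid:[4026531836]`). The minimal sketch below, Linux only and using a hypothetical helper name `namespace_ids`, lists them for the current process. Two processes that report the same identifier for a given type share that namespace, so a process inside a container will typically report different `mnt` and `pid` identifiers than a process on the host.

```python
import os


def namespace_ids(pid="self"):
    """Map each namespace type (mnt, pid, net, ...) to its kernel identifier.

    Reads the symlinks in /proc/<pid>/ns, whose targets look like
    "pid:[4026531836]". Linux only.
    """
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Another user's process may not be inspectable without privileges.
            continue
    return ids


if __name__ == "__main__":
    for ns_type, ident in namespace_ids().items():
        print(f"{ns_type:<20} {ident}")
```

Running this once on the host and once inside a container makes the “new environment” concrete: the identifiers that differ are exactly the namespaces the container runtime replaced.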
Installing an application in a container allows you to bundle all of its dependencies into a single portable and reproducible unit. This portability is one of the innovations that allows users to run Fuzzball workflows on different clusters without modification.
Containers provide a portable and reproducible way to deploy applications. It is now standard practice in modern cloud architecture to deconstruct large monolithic applications into collections of small interacting services (called microservices) and run them in containers. The process of managing these containers and the interactions between them is referred to as “container orchestration”.
Widely-used container orchestration tools like Kubernetes have grown out of this cloud-native culture and focus heavily on orchestrating containerized services (i.e., applications that are expected to run indefinitely, awaiting requests to serve through a well-defined API). Because of this focus, they tend to prioritize features that are not useful when orchestrating containerized jobs (i.e., applications that run to a predetermined endpoint and produce some result). The HPC community has therefore found it difficult to adapt existing container orchestration tools to job orchestration (often referred to as pipelines or workflows), and these tools often add unacceptable computational overhead to HPC jobs. This is the need that Fuzzball fulfills: it provides a lightweight, job-centric approach to container orchestration that is well suited to HPC and PIC workflows.
A set of infrastructure tools centered around the Rocky Linux ecosystem. Depot can serve RPMs from repos, as well as OCI and SIF containers and other artifacts.
A type of computing in which a single application splits computations across multiple cores within one or more compute nodes. Distributed Computing usually relies on MPI or a similar technology to manage work across cores/nodes. Distributed Computing is somewhat synonymous with HPC and supercomputing generally. It is related to, but distinct from, Parallel Computing.
Refers to the process of moving data out of an environment. In the context of Fuzzball, data egress describes moving data from a Storage Account Volume used within a workflow to a personal location like an S3 bucket, FTP server, or GitHub/Bitbucket repo. Data egress can incur charges within a cloud environment like AWS or GCP.
Fuzzball is HPC 2.0.
From a user’s perspective, Fuzzball provides a simple, YAML-based method to describe and submit computational workflows made of one or more jobs. Users can access Fuzzball through a simple, powerful web-based GUI or a CLI, or use the API directly. It’s easy for users to access personal data and to use specialized hardware and software (e.g., GPUs, high-speed interconnects, MPI, etc.). And because jobs within workflows are container-based, the same workflow can run on an on-prem cluster, a colleague’s cluster at a different site, or in the cloud.
From an admin’s perspective, Fuzzball provides an easy way to grant users access to the hardware they need to run their workflows. Admins can organize users, Storage Volumes, and secrets into accounts and grant some users the power to appropriately administer these accounts.
At its heart, Fuzzball is a container orchestration engine that focuses on orchestrating containerized jobs, rather than containerized services, with bare-metal performance. This simple change in philosophy makes Fuzzball a natural fit for HPC, where workflows are built out of jobs that are expected to start, end, and produce results (which may be used by subsequent jobs within the workflow). Fuzzball itself is split into two main components. Fuzzball Orchestrate is designed as a cloud-native application composed of containerized microservices, and it therefore, unsurprisingly, runs on top of Kubernetes. However, the container platform, Fuzzball Substrate, executes jobs outside of the Kubernetes environment, avoiding the performance pitfalls of running HPC applications within Kubernetes. Fuzzball is therefore not an attempt to shoehorn Kubernetes into workflow management. While Fuzzball Orchestrate runs on Kubernetes, Fuzzball represents an entirely distinct container orchestration platform.
One discrete instance of a Fuzzball deployment running in an on-prem environment or a cloud provider.
A collection of metadata that allows the Fuzzball CLI to connect to and authenticate with a Fuzzball cluster. The CLI may be configured with more than one context, and may switch between them to interact with multiple Fuzzball clusters.
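The idea of several named contexts with one active at a time can be sketched as a small data structure. This is a hypothetical in-memory model only: the class names and fields such as `api_url` and `account` are assumptions for illustration and do not reflect the Fuzzball CLI’s actual configuration format or commands.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Context:
    """Connection details for one Fuzzball cluster (fields are illustrative)."""
    name: str
    api_url: str                      # hypothetical: cluster API endpoint
    account: Optional[str] = None     # hypothetical: account to act within


@dataclass
class ContextStore:
    contexts: Dict[str, Context] = field(default_factory=dict)
    active: Optional[str] = None

    def add(self, ctx: Context) -> None:
        self.contexts[ctx.name] = ctx
        if self.active is None:       # the first context added becomes the default
            self.active = ctx.name

    def use(self, name: str) -> None:
        """Switch the active context, e.g. from an on-prem cluster to a cloud one."""
        if name not in self.contexts:
            raise KeyError(f"unknown context: {name}")
        self.active = name

    def current(self) -> Context:
        if self.active is None:
            raise LookupError("no context configured")
        return self.contexts[self.active]
```

Keeping the active context as a name, rather than a copy of the connection details, means switching is a single assignment and every subsequent command reads the same source of truth.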
A single computational job that is part of a Fuzzball workflow. Jobs run in containers (which are pulled in an early stage of the workflow). A job may depend on one or more other jobs, which must complete before it can start. If jobs do not depend upon one another, Fuzzball will attempt to run them in parallel (assuming sufficient resources are available).
The heart of Fuzzball that runs on a server node within a Kubernetes deployment and awaits requests (in the form of Fuzzfiles) to allocate resources and orchestrate workflows.
A custom container runtime developed by CIQ. Substrate is the container runtime that a Fuzzball cluster uses to run containers. Some installations of Fuzzball use Substrate on the server node to run a containerized instance of Kubernetes, which then hosts Fuzzball Orchestrate.
A collection of stages and jobs arranged in a directed acyclic graph that Fuzzball Orchestrate will run to completion.
For example, a workflow may begin by creating an ephemeral volume and pulling a container image in which the job(s) will run. These stages can run in parallel. Once the volume has been created, a data ingress stage may copy input data from an S3 bucket to the new ephemeral volume. After the data ingress and image pull stages have completed, one or more jobs may start. Jobs that depend on those first jobs run afterward, and so on. Finally, a data egress stage may move analyzed data from the ephemeral volume back to an S3 bucket.
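The example workflow is a directed acyclic graph, and Python’s standard-library `graphlib.TopologicalSorter` can group its nodes into batches that are free to run in parallel. This is a sketch under stated assumptions: the node names are hypothetical, a second job `job-c` is added to show two jobs running side by side, and the batch-at-a-time loop is a simplification (a real scheduler can start a node as soon as its own dependencies finish). It is not how Fuzzball Orchestrate is implemented.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes that must finish before it can start.
# Names are hypothetical and mirror the example workflow described above.
workflow = {
    "create-volume": set(),
    "pull-image": set(),
    "ingress": {"create-volume"},
    "job-a": {"ingress", "pull-image"},
    "job-c": {"ingress", "pull-image"},
    "job-b": {"job-a"},
    "egress": {"job-b", "job-c"},
}


def parallel_batches(graph):
    """Group nodes into batches whose members have no unmet dependencies."""
    ts = TopologicalSorter(graph)
    ts.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        batches.append(ready)
        ts.done(*ready)  # mark the whole batch finished before moving on
    return batches


for step, batch in enumerate(parallel_batches(workflow), start=1):
    print(step, batch)
```

The batches come out as volume creation and image pull together, then ingress, then `job-a` and `job-c` together, then `job-b`, then egress, matching the order described in the prose.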
Workflows are defined by YAML documents called Fuzzfiles.
Fuzzball stages consist of any workflow operations that are not user-defined jobs. Stages may create or locate volumes, perform data ingress or egress, or pull container images to support jobs.
A YAML (or JSON) file written with specific syntax to specify the jobs, environment, requirements, dependencies, etc. of a Fuzzball workflow. Fuzzball Orchestrate parses a Fuzzfile and uses the fields to determine how resources and environments will be set up for a workflow to execute. Fuzzfiles may be written manually or generated automatically using the GUI Workflow Editor.
A collection of buttons, menus, text boxes, etc. allowing a user to interact with an application by pointing and clicking, scrolling, selecting menu items, and the like. GUIs are the most common way that users interact with applications on their computers. They contrast with CLIs, which require users to interact with applications by typing instructions. Web GUIs are simply GUIs designed to run in web browsers.
HPC can refer loosely to the process of using some form of dedicated computing resource(s) to accelerate a computational workload. However, the phrase usually carries a narrower connotation. Typically, HPC refers to the use of Beowulf-style compute clusters like those found in academic and research settings. This differentiates HPC from enterprise computing and PIC, which are generally carried out in corporate data centers or in the cloud. Some scientists and researchers narrow the definition of HPC even further, using the phrase exclusively to refer to distributed computing on a network supported by high-speed interconnects. This usage differentiates HPC from HTC, which is usually more closely associated with parallel computing.
In computing, “image” is often used to refer to a file that is used to place the entire system into a particular state. Often this is a file that contains an operating system or a root file system. For instance, the initramfs file that is loaded into memory to allow the computer to mount the root file system from disk during boot can be referred to as an image. Similarly, ISO files that contain live installations of an operating system and can be used to boot your computer from an optical disk or USB drive are referred to as images.
In the context of Linux containers, images are files that contain the root file systems, environment variables, metadata, etc. to create and run a container. In OCI platforms like Docker and Podman, there is a logical distinction between an image (that is a bundle of tarballs and a manifest saved to disk) and a container (that is created in memory from the image). In Apptainer, this distinction is a bit more blurry because SIF files share a 1:1 relationship with the container they produce.
Refers to the process of moving data into an environment. In the context of Fuzzball, data ingress describes moving data from an external location like an S3 bucket, FTP server, or GitHub/Bitbucket repo to a Storage Account Volume used within a workflow.
A specialized, high-speed, low-latency network capable of sharing data with the performance necessary for effective distributed computing. HPC interconnects can transfer data at hundreds of gigabits/second, and can provide exotic connections like direct GPU-to-storage or even direct GPU-to-GPU communication in some configurations. These networks usually provide remote direct memory access (RDMA) capabilities, allowing a program to bypass the compute node operating system network stack, significantly improving the speed of HPC workloads that use MPI or similar. Common implementations include InfiniBand, Omni-Path, RDMA over Converged Ethernet (RoCE), the AWS Elastic Fabric Adapter, and NVLink/NVSwitch. High-speed interconnects differ fundamentally from Ethernet and from one another, and they require specialized switches, network interface cards (NICs), and cabling.
A single, discrete task that is done on some set of HPC resources. A job has a start, an end, and produces some result. In traditional HPC, jobs are submitted to a workload manager that puts them into a queue and runs them when the requested resources are available. A single job may use many nodes (in the case of a “multinode” job) if it runs a distributed application, or it may be composed of many smaller jobs (in the case of a “task array”) that run in parallel. One job may use the output of another job as input, and a series of jobs that run with dependencies are referred to as a “pipeline” or “workflow”. In the context of Fuzzball, jobs are one of the atomic elements (along with stages) that make up workflows.
A widely-used container orchestration solution that underpins much of modern enterprise computing, such as websites, databases, and SaaS deployments. Most cloud providers offer dedicated managed Kubernetes services (e.g., AWS EKS). Kubernetes has not traditionally been used in HPC, since it focuses on orchestrating containerized services instead of jobs and adds significant overhead.
Small, lightweight compute services created to do one specific task. Microservices are often run within their own containers giving them portability and reproducibility. They are designed to interact with one another through API calls. Collections of microservices work together to implement large and complex applications, and can be orchestrated using a tool like Kubernetes. This “microservice architecture” is one of the core philosophies within the cloud-native community and contrasts with monolithic design in which all functionality is built into a single program and any failure may cause the entire application to fall over.
A standard that defines a number of different operations for intra- or inter-node communication between CPU cores. MPI-based communication is the main way that distributed computing is carried out within HPC, allowing a single program to be split across an arbitrary number of CPU cores to improve performance. Several organizations have built implementations of the MPI standard, but they implement common operations, such as data transfers between two nodes, data transfers from one node to every other node, data transfers from every node to every other node, etc. Applications must be written and compiled with support for a specific MPI library to take advantage of these gains.
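Two of the collective patterns named above, one-to-all (broadcast) and every-to-every (all-to-all), are easiest to grasp by looking at where the data ends up. The sketch below is a pure-Python toy model, not MPI: list index `i` stands in for MPI rank `i`, and no processes or network are involved. In a real program, mpi4py’s `comm.bcast(obj, root=0)` and `comm.alltoall(sendobj)` perform these exchanges across processes.

```python
# Toy model: index i in each list stands for MPI rank i. No processes or
# network are involved; this only shows where the data ends up.

def bcast(values, root=0):
    """One-to-all: every rank ends up with the root rank's value."""
    return [values[root] for _ in values]


def alltoall(send):
    """All-to-all: send[i][j] is the item rank i sends to rank j.

    Rank j receives one item from every rank, so recv[j][i] == send[i][j]
    (a transpose of the send matrix).
    """
    n = len(send)
    if any(len(row) != n for row in send):
        raise ValueError("each rank must send exactly one item to every rank")
    return [[send[i][j] for i in range(n)] for j in range(n)]


print(bcast(["a", "b", "c"], root=2))  # ['c', 'c', 'c']
print(alltoall([[0, 1], [2, 3]]))      # [[0, 2], [1, 3]]
```

Point-to-point transfers between two nodes correspond to a matched send on one rank and receive on another, which is the building block these collectives generalize.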
In some deployments, the Admin Node is used to deploy and configure Kubernetes and Fuzzball remotely (via SSH) on the Server Node. After deployment, the Admin Node can be used to monitor the health of the Fuzzball Cluster, control configuration, submit test workflows, etc.
Generally, the term node may refer to a server, computer, virtual machine, etc. More specifically, within the context of a traditional HPC cluster or a Fuzzball deployment, the phrase “compute node” refers to a resource (with CPU, memory, network connectivity, etc.) that can be allocated to perform work within a job.
In a Fuzzball deployment, the server node hosts Fuzzball Orchestrate, serves the Fuzzball GUI, and may also perform other administrative tasks (such as serving an NFS share to compute nodes). In a production deployment it is common to deploy 3 Server Nodes as a single Kubernetes cluster that hosts Fuzzball Orchestrate.
OCI (the Open Container Initiative) refers to an organization and to the set of standards it created, which specify how containers should be stored on disk, how a container runtime should operate, and how a container registry should store and serve containers. OCI standards were largely determined after Apptainer had been created and were based primarily on Docker. The Apptainer runtime therefore does not follow the OCI standard, nor do the SIF files that store Apptainer containers. Apptainer can, however, use OCI registries to run containers, and can even store SIF files in them through the OCI Registry As Storage (ORAS) project.
A cluster of computers in a physical location maintained and controlled by the organization that uses them. On-prem usually refers to traditional HPC resources bought, installed, maintained, and used by the same organization. On-prem is often seen as the opposite of the Cloud.
An organization in Fuzzball is the highest level of the account structure. An organization can be thought of as the overarching set of secrets, volumes, options, etc. that can be configured by admins and added to accounts. Organizations can contain multiple accounts and are managed from a special organization management panel via the Fuzzball Admin Node.
A type of computing in which a computation must be performed independently many times and can be split up to run across multiple cores/nodes at the same time. For example, a researcher might need to use a script to analyze 10 data sets. By using 10 cores or 10 nodes, the researcher can analyze each data set simultaneously and speed up processing dramatically. Parallel computing is somewhat synonymous with High Throughput Computing (HTC). It is related to, but distinct from, Distributed Computing.
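The 10-data-set example maps directly onto a fan-out pattern: the same function runs on each independent input, and no task needs to talk to any other (the key contrast with MPI-style distributed computing). The sketch below uses Python’s `concurrent.futures`; `analyze` is a hypothetical stand-in for the researcher’s script. Threads keep the example self-contained; for CPU-heavy pure-Python work, `ProcessPoolExecutor` offers the same `map` interface, and at cluster scale each data set would instead become its own job.

```python
import statistics
from concurrent.futures import ThreadPoolExecutor


def analyze(dataset):
    """Stand-in for a real per-dataset analysis script (hypothetical)."""
    return statistics.fmean(dataset)


def analyze_all(datasets, workers=10):
    """Run analyze() on every dataset concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze, datasets))


datasets = [[i, i + 1, i + 2] for i in range(10)]  # 10 independent data sets
print(analyze_all(datasets))
```

Because the tasks share nothing, adding workers (cores or nodes) scales throughput almost linearly until the work runs out.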
Enterprise organizations often require large computational resources to carry out tasks like analyzing large amounts of customer data, training AI models, serving large and growing data sets with high availability, etc. These types of tasks have many traits in common with traditional HPC workloads, but enterprise staff may not be exposed to prior art within the HPC community and may use commodity data centers or cloud resources instead of the types of HPC resources typically available in research or academic settings. While the needs of these computer scientists are highly similar to those of the HPC community, the two communities have developed solutions in parallel and somewhat in isolation from one another. The term PIC is therefore used to describe this community of users who are similar to but separate from the HPC community.
A resource is anything that can be used to run a compute workload. Resources may refer to CPUs, memory, storage, GPUs or other accelerators, bandwidth, nodes themselves, or even clusters. Resources may be physically located on-prem or reside in the cloud.
Within the context of a workflow management system like Fuzzball, resources take on a more specific meaning. To run jobs within workflows it is necessary to specify the types of resources they require. These requirements are specified within a Fuzzfile.
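To make the idea of a resource request concrete, the sketch below checks a job’s requirements against a list of nodes using a simple first-fit rule. It is a generic illustration under stated assumptions: the `Node` and `Request` classes and their fields are hypothetical, they are not Fuzzfile syntax, and first-fit is not a description of how Fuzzball places jobs.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Node:
    name: str
    cpus: int
    memory_gib: int
    gpus: int = 0


@dataclass(frozen=True)
class Request:
    """What a job asks for (hypothetical fields, not Fuzzfile syntax)."""
    cpus: int
    memory_gib: int
    gpus: int = 0


def fits(req: Request, node: Node) -> bool:
    """True if the node has at least as much of every requested resource."""
    return (req.cpus <= node.cpus
            and req.memory_gib <= node.memory_gib
            and req.gpus <= node.gpus)


def first_fit(req: Request, nodes: Iterable[Node]) -> Optional[Node]:
    """Return the first node that can satisfy the request, or None."""
    return next((n for n in nodes if fits(req, n)), None)


nodes = [Node("cpu-01", 32, 128), Node("gpu-01", 32, 256, gpus=4)]
print(first_fit(Request(cpus=8, memory_gib=64), nodes))          # a CPU node suffices
print(first_fit(Request(cpus=8, memory_gib=64, gpus=1), nodes))  # needs the GPU node
```

The point is that a job describes *what* it needs, and the system decides *where* it runs; the same request can then be satisfied on any cluster that has a matching node.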
Within Fuzzball, resource definitions allow administrators to specify compute resources and configurations that users can leverage for their jobs.
Within Fuzzball, a workflow is specified within a Fuzzfile. Submitting (or running) a workflow is therefore carried out by pointing Fuzzball to a Fuzzfile and requesting that it run. This can be done by executing a command in the CLI or clicking a button in the GUI.
SIF is a single-file format for Apptainer containers. The file contains a root file system saved as a squashfs image as well as several other partitions with metadata.
Administrators can add Storage Volumes as specific instances of a type of storage, based on a Storage Driver. A Storage Volume describes the top-level storage resource that users can then leverage.
A directory within a Storage Class that is allocated to a user’s workflow. Storage account volumes are the actual storage space that a user’s workflows can read from and write to. These may either be created by the user ahead of time, or can be created on-demand by Fuzzball when executing a workflow on behalf of the user.
Fuzzball uses the Kubernetes Container Storage Interface (CSI) to interact with different types of storage. CSI drivers can be used to prepare Fuzzball to use a specific type of storage through a Storage Class.
For the purposes of the Fuzzball guides, a user is anyone who uses Fuzzball. Within computer science, the term user is often used to distinguish an unprivileged persona (user) from a privileged one (administrator). But technically, an administrator may also be a user.
Volume creation refers to the act of making a directory for an account or user on a storage volume. Users with appropriate permissions may do this on their own, or workflows may create volumes when they run.
Ephemeral volumes are created on demand by Fuzzball when a workflow starts and are destroyed when the workflow ends. They are usually used in combination with data ingress and egress, so that input data is copied onto the volume before jobs run and results are copied off the volume when the workflow completes. Ephemeral volumes are configured at the Storage Volume level by setting the value “persistent” to false in the volume definition.
Volume mounting refers to the act of making a Storage Volume visible within a containerized Fuzzball job. Since the same volume might be mounted within multiple jobs, this provides a way for jobs to share data.
Persistent volumes are created the first time they are used by a Fuzzball workflow. They are usually not used in combination with data ingress and egress since data may already be located on the volume when it is used in the workflow, and can remain on the volume after the workflow is finished. Persistent volumes are configured at the Storage Volume level by setting the value “persistent” to true in the volume definition.
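The difference between the two lifecycles can be sketched with local directories standing in for storage volumes. In this minimal sketch the function names are hypothetical and the `job` callback plays the role of a containerized job with the volume mounted; it does not show the Fuzzfile or volume-definition syntax, only the order of ingress, job, egress, and destruction for an ephemeral volume versus reuse for a persistent one.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_ephemeral_volume(inputs: Path, outputs: Path, job) -> Path:
    """Ephemeral: create scratch space, ingress, run the job, egress, destroy."""
    with tempfile.TemporaryDirectory() as tmp:
        volume = Path(tmp)
        shutil.copytree(inputs, volume / "in")                        # data ingress
        (volume / "out").mkdir()
        job(volume / "in", volume / "out")                            # job uses the mounted volume
        shutil.copytree(volume / "out", outputs, dirs_exist_ok=True)  # data egress
    return volume  # already deleted here; returned only so callers can check


def run_with_persistent_volume(volume: Path, job) -> Path:
    """Persistent: created on first use, then reused with its contents intact."""
    volume.mkdir(parents=True, exist_ok=True)
    job(volume)
    return volume
```

Anything the job leaves on an ephemeral volume without egressing it is lost when the workflow ends, while files on a persistent volume are still there for the next workflow that mounts it.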