Overview / About these Guides
HPC clusters have changed little since the mid-1990s, when the first Beowulf cluster was built at NASA. As a result, HPC users rely on outdated, difficult-to-learn interfaces, and HPC workflows are nearly impossible to port to cloud environments.
Fuzzball was created to ease the use and administration of traditional HPC clusters, and overcome the challenges of running HPC workloads in the cloud.
At its heart, Fuzzball is a container orchestration platform that is geared toward jobs instead of services. This simple change in philosophy along with a clear administration layer and a thoughtful user interface allows Fuzzball to overcome the problems typically associated with traditional HPC.
Fuzzball has a clean, intuitive Graphical User Interface (GUI) and Command Line Interface (CLI), making it easy for users to submit and monitor workflows. Workflows themselves are specified in YAML documents called Fuzzfiles. These files are easy to generate via the Fuzzball GUI’s workflow editor. Every workflow is container-based, and Fuzzball abstracts hardware details. This ensures that workflows are portable and reproducible, so they can be run on-prem or in the cloud. Fuzzball also has a clear-cut administrative layer that allows admins to easily manage settings and permissions at the organization, group, and individual user levels.
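To give a rough idea of what a Fuzzfile might look like, here is a minimal sketch of a single-job workflow. The field names (`version`, `jobs`, `image.uri`, `command`, `resource`), the job name, and the container URI are assumptions chosen for illustration, not a verified schema; the Fuzzfile documentation is the authoritative source for the syntax.

```yaml
# Illustrative Fuzzfile sketch. Field names are assumptions, not a verified schema.
version: v1
jobs:
  hello-world:                      # hypothetical job name
    image:
      uri: docker://alpine:latest   # container referenced by URI
    command: ["/bin/sh", "-c", "echo Hello from Fuzzball"]
    resource:                       # hardware is requested abstractly, not by node name
      cpu:
        cores: 1
      memory:
        size: 1GB
```

The point of the sketch is that the document names a container and the resources a job needs rather than a particular machine, which is what lets the same workflow run on-prem or in the cloud.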
The Fuzzball solution is composed of two parts:
Orchestrate runs on a central server node, and accepts requests from users in the form of Fuzzfiles. These YAML documents specify everything needed to run a workflow. Orchestrate allocates computational resources appropriate for jobs within the workflow, downloads containers, sets up storage volumes and initiates jobs. Orchestrate itself runs on Kubernetes.
Substrate runs on the compute nodes. It is a purpose-built container platform that interacts with Orchestrate to set up the appropriate environment and run jobs within a workflow. Substrate is a lightweight service that runs directly on the host operating system (outside of Kubernetes) and incurs virtually no computational overhead. Conceptually, Substrate adopts much of the same design philosophy as the famous HPC container platform Apptainer (formerly known as Singularity).
Fuzzball has features that position it as HPC 2.0.
Fuzzball is completely API-driven and has an intuitive web-based Graphical User Interface (GUI).
There is no need for an HPC user to open a terminal emulator, use the secure shell (ssh) utility, understand login nodes vs compute nodes, learn the details of a batch scheduling system, or perform file management tasks with Linux commands. This greatly simplifies the onboarding process for new users and democratizes HPC.
Advanced users who prefer text-based interaction can use the Fuzzball Command Line Interface (CLI), which provides the same functionality as the GUI through a terminal.
Fuzzball is completely container-based and abstracts concepts like hardware and storage. This makes your workflows reproducible and lets you share them with colleagues, even if they are users on a different cluster.
The same innovations that make Fuzzball workflows reproducible also make them portable between on-prem and cloud environments. Tasks like provisioning and allocating compute nodes are abstracted away in both environments, leaving users free to concentrate on their workflows.
Containers are the most widely used deployment method for scientific software, AI frameworks, modeling suites, etc. Because Fuzzball is completely container-based, users can usually access the software they need simply by providing a URI. If the software does not exist in a container, it is usually much easier to create a new container with the required packages than to install and maintain software on an HPC system.
Fuzzball provides integrated MPI support for a broad range of common implementations. This means that users of distributed programs won’t be limited by the MPI implementation that is available on the system. And they won’t be required to recompile programs or to wait until their administrators update and recompile the software stack for them. Fuzzball MPI integration makes it easy to use MPI with containerized software that may otherwise prove difficult or impossible to run on multiple nodes.
Fuzzball has an easy management model allowing owners and users to be categorized at the organization, group, or individual user level. Within traditional HPC, user accounts are managed via the tools that are available in the Linux distribution as well as services like Active Directory. Integration between these services is usually the responsibility of the cluster administrators, and one-off bespoke solutions are common. File system access must be managed by Linux groups, and it is not unusual for users to accidentally set incorrect permissions, making them unable to access their own data or mistakenly making their data accessible to others. Fuzzball solves these time-consuming administrative hassles.
Fuzzball creates interoperability between workflows that have been created for on-prem systems and cloud environments. By re-imagining the idea of a batch scheduling system to encompass the capability of provisioning cloud resources on demand, workflows become flexible sets of instructions that can be executed anywhere.
These guides are designed to teach you everything you need to know to use and administer Fuzzball. In most cases, the examples are meant to work as written or with very little modification. You should be able to follow along and actually perform the examples within the guides.
The guides are broken into sections and arranged in order from most basic usage to the most advanced administrative concepts. This allows you to simply read what you need. Many users can get the general concepts from the Fuzzball in 15 Minutes guide and start running their workflows. Others will need to additionally learn about Fuzzfiles, Volumes, and Secrets to access their data and get the most from Fuzzball. More advanced users with specific HPC or AI/ML style workflows will benefit from the parallel and distributed docs and the advanced hardware section. Developers will be interested in the SDK section, and admins supporting on-prem installations of Fuzzball will find it useful to familiarize themselves with the Cluster Admin Guide.
Throughout the guides, technical terms are highlighted on their first use within any given page. Clicking on these terms will open the appropriate entry within the Glossary.