Fuzzball Documentation

Requirements

Before setting up the Slurm or PBS integration for Fuzzball, you should ensure you have met these requirements.

Please select either the Slurm or PBS tab to see the appropriate instructions for your environment.
Slurm

  1. Functional Slurm Cluster:

    • Slurm controller (slurmctld) running and accessible
    • One or more compute nodes with slurmd daemons
    • Standard Slurm commands available (sbatch, squeue, scancel, optionally sacct)
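    A quick way to sanity-check the cluster before proceeding is to submit a trivial job (a sketch; output and job IDs will differ on your system):

    $ sinfo                      # Confirm the controller responds and nodes are listed
    $ sbatch --wrap="hostname"   # Submit a trivial test job
    $ squeue -u $USER            # Watch the job run
    $ sacct -u $USER             # (Optional) Confirm accounting records the job, if sacct is configured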
  2. Fuzzball Substrate Components:

    • fuzzball-substrate binary installed (not running) on all compute nodes
    • fuzzball-substrate-orchestrate extension installed and configured
    • Substrate binaries accessible in the compute node PATH or at a known location
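    To confirm the binary is staged correctly, you can check that it is present on a compute node but not running (a sketch; <compute-node> is a placeholder for your node's hostname):

    $ ssh <compute-node> 'command -v fuzzball-substrate'                             # Prints the binary's location if it is on PATH
    $ ssh <compute-node> 'pidof fuzzball-substrate || echo "not running (expected)"' # The binary should be installed but inactive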
  3. Network and Authentication:

    • SSH access from Fuzzball Orchestrate to Slurm head node
    • Network connectivity from compute nodes to Fuzzball Orchestrate
    • Network connectivity from compute nodes to the public internet (e.g., through a NAT gateway) for image pulls. Configure this before installing Fuzzball if possible; if it is configured afterward, the firewall may block the network interfaces of the Fuzzball K8s pods. In that case, add them to the trusted zone as shown below. (Your IP addresses and interface names will differ.)
    # firewall-cmd --permanent --zone=trusted --add-interface=flannel.1  # Add Flannel overlay interface
    # firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16  # Add the pod network as a trusted source
    # firewall-cmd --permanent --zone=trusted --add-interface=enp8s0     # The internal interface (enp8s0) should also be trusted
    # firewall-cmd --permanent --zone=trusted --add-source=10.0.0.0/24   # Also trust the internal compute node network
    # firewall-cmd --reload                                              # Reload firewall configuration
    # firewall-cmd --zone=trusted --list-all                             # Verify configuration
    • Appropriate firewall rules for bidirectional communication
    • Hosts file configured so that compute nodes can contact substrate-bridge. This requires adding the IP addresses configured for MetalLB to /etc/hosts on the compute nodes as aliases for the Fuzzball Orchestrate node. For example:
    10.0.0.4 fuzzball-orchestrate-host
    10.0.0.149 substrate-bridge.10.0.0.149.nip.io
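    After editing /etc/hosts, you can verify from a compute node that the alias resolves (a sketch using the example addresses above):

    $ getent hosts substrate-bridge.10.0.0.149.nip.io   # Should print 10.0.0.149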
  4. Permissions:

    • A service account user with permission to SSH to compute nodes and submit and manage Slurm jobs
    • Passwordless sudo permission for Substrate binary execution (needed for container namespace isolation); see the sudoers sketch below
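    One way to grant this is a drop-in sudoers rule scoped to the Substrate binary. This is a sketch: the file name, service account name (fuzzball-svc), and binary path are assumptions to adjust for your site, and the file should be edited with visudo -f:

    # cat /etc/sudoers.d/fuzzball-substrate
    fuzzball-svc ALL=(root) NOPASSWD: /usr/local/bin/fuzzball-substrate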
  5. Proper type of root file system on compute nodes:

    • This can be an issue in Warewulf clusters, where the operating system is served from rootfs by default. If you are using the two-stage boot process, your nodes are using tmpfs, which is supported. To force your compute nodes to use tmpfs, run the following on your Warewulf head node and then reboot your compute nodes. You can verify the result as shown below.
    $ wwctl profile set default --root=tmpfs
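    You can confirm which root file system a compute node actually booted with by running findmnt on the node (a sketch):

    $ findmnt -n -o FSTYPE /   # Should report "tmpfs", not "rootfs"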
PBS

  1. Functional PBS Cluster:

    • PBS server running and accessible (PBS Professional or Torque)
    • One or more compute nodes with PBS mom daemons
    • Standard PBS commands available (qsub, qstat, qsig, optionally qdel)
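    A quick way to sanity-check the cluster before proceeding is to submit a trivial job (a sketch; output and job IDs will differ on your system):

    $ qstat -B                  # Confirm the PBS server responds
    $ echo "hostname" | qsub    # Submit a trivial test job read from stdin
    $ qstat -u $USER            # Watch the job run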
  2. Fuzzball Substrate Components:

    • fuzzball-substrate binary installed (not running) on all compute nodes
    • fuzzball-substrate-orchestrate extension installed and configured
    • Substrate binaries accessible in the compute node PATH or at a known location
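    As on the Slurm tab, you can confirm the binary is staged on a compute node but not running (a sketch; <compute-node> is a placeholder for your node's hostname):

    $ ssh <compute-node> 'command -v fuzzball-substrate'                             # Prints the binary's location if it is on PATH
    $ ssh <compute-node> 'pidof fuzzball-substrate || echo "not running (expected)"' # The binary should be installed but inactive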
  3. Network and Authentication:

    • SSH access from Fuzzball Orchestrate to PBS head node
    • Network connectivity from compute nodes to Fuzzball Orchestrate
    • Network connectivity from compute nodes to the public internet (e.g., through a NAT gateway) for image pulls. Configure this before installing Fuzzball if possible; if it is configured afterward, the firewall may block the network interfaces of the Fuzzball K8s pods. In that case, add them to the trusted zone as shown below. (Your IP addresses and interface names will differ.)
    # firewall-cmd --permanent --zone=trusted --add-interface=flannel.1  # Add Flannel overlay interface
    # firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16  # Add the pod network as a trusted source
    # firewall-cmd --permanent --zone=trusted --add-interface=enp8s0     # The internal interface (enp8s0) should also be trusted
    # firewall-cmd --permanent --zone=trusted --add-source=10.0.0.0/24   # Also trust the internal compute node network
    # firewall-cmd --reload                                              # Reload firewall configuration
    # firewall-cmd --zone=trusted --list-all                             # Verify configuration
    • Appropriate firewall rules for bidirectional communication
    • Hosts file configured so that compute nodes can contact substrate-bridge. This requires adding the IP addresses configured for MetalLB to /etc/hosts on the compute nodes as aliases for the Fuzzball Orchestrate node. For example:
    10.0.0.4 fuzzball-orchestrate-host
    10.0.0.149 substrate-bridge.10.0.0.149.nip.io
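    After editing /etc/hosts, you can verify from a compute node that the alias resolves (a sketch using the example addresses above):

    $ getent hosts substrate-bridge.10.0.0.149.nip.io   # Should print 10.0.0.149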
  4. Permissions:

    • A service account user with permission to SSH to compute nodes and submit and manage PBS jobs
    • Passwordless sudo permission for Substrate binary execution (needed for container namespace isolation); see the sudoers sketch below
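    One way to grant this is a drop-in sudoers rule scoped to the Substrate binary. This is a sketch: the file name, service account name (fuzzball-svc), and binary path are assumptions to adjust for your site, and the file should be edited with visudo -f:

    # cat /etc/sudoers.d/fuzzball-substrate
    fuzzball-svc ALL=(root) NOPASSWD: /usr/local/bin/fuzzball-substrate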
  5. Proper type of root file system on compute nodes:

    • This can be an issue in Warewulf clusters, where the operating system is served from rootfs by default. If you are using the two-stage boot process, your nodes are using tmpfs, which is supported. To force your compute nodes to use tmpfs, run the following on your Warewulf head node and then reboot your compute nodes. You can verify the result as shown below.

      $ wwctl profile set default --root=tmpfs
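    You can confirm which root file system a compute node actually booted with by running findmnt on the node (a sketch):

    $ findmnt -n -o FSTYPE /   # Should report "tmpfs", not "rootfs"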