Requirements
Before setting up the Slurm or PBS integration for Fuzzball, make sure your environment meets the following requirements.
The Slurm requirements are listed first, followed by the PBS requirements. Follow the set that
matches your environment.
Functional Slurm Cluster:
- Slurm controller (slurmctld) running and accessible
- One or more compute nodes with slurmd daemons
- Standard Slurm commands available (sbatch, squeue, scancel, optionally sacct)
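The command requirement above can be checked with a short script. This is a minimal sketch, assuming it is run on the host where jobs are submitted (taken here to be the Slurm head node) as the user that will submit them; the function name missing_commands is a hypothetical helper, not part of Fuzzball or Slurm.

```shell
# Print the name of every command in the argument list that is not found in PATH.
# missing_commands is a hypothetical helper name used only in this sketch.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Required commands, as listed in the requirements above.
missing=$(missing_commands sbatch squeue scancel)
if [ -z "$missing" ]; then
    echo "all required Slurm commands found"
else
    # $missing is intentionally unquoted so each name gets its own line.
    printf 'missing required Slurm command: %s\n' $missing
fi

# sacct is optional, so its absence is only reported as a note.
if [ -n "$(missing_commands sacct)" ]; then
    echo "note: optional command sacct not found"
fi
```

The same helper works for any other command list, since it only relies on the POSIX command -v builtin.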
Fuzzball Substrate Components:
- fuzzball-substrate binary installed (not running) on all compute nodes
- fuzzball-substrate-orchestrate extension installed and configured
- Substrate binaries accessible in the compute node PATH or at a known location
Network and Authentication:
- SSH access from Fuzzball Orchestrate to Slurm head node
- Network connectivity from compute nodes to Fuzzball Orchestrate
- Network connectivity from compute nodes to the internet (for example, through a NAT gateway) for image pulls. Configure this before installing Fuzzball if possible. If it is configured afterward, the firewall might block the network interfaces of the Fuzzball K8s pods. In that case, you can add them to the trusted zone as follows. (Your IP addresses and interfaces will differ.)
# firewall-cmd --permanent --zone=trusted --add-interface=flannel.1  # Add Flannel overlay interface
# firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16  # Add the pod network as a trusted source
# firewall-cmd --permanent --zone=trusted --add-interface=enp8s0  # The internal interface (enp8s0) should also be trusted
# firewall-cmd --permanent --zone=trusted --add-source=10.0.0.0/24  # Also trust the internal compute node network
# firewall-cmd --reload  # Reload firewall configuration
# firewall-cmd --zone=trusted --list-all  # Verify configuration
- Appropriate firewall rules for bidirectional communication
- Hosts file configured to allow compute nodes to contact substrate-bridge. This requires adding
the IP addresses configured for MetalLB to /etc/hosts as aliases for the Fuzzball Orchestrate
node. For example:
10.0.0.4 fuzzball-orchestrate-host
10.0.0.149 substrate-bridge.10.0.0.149.nip.io
Permissions:
- A service account user with permission to SSH to compute nodes and submit and manage Slurm jobs
- Passwordless sudo permissions for Substrate binary execution (needed for container namespace isolation)
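One way to confirm the sudo requirement is to run sudo in non-interactive mode, which fails instead of prompting when a password would be needed. The sketch below assumes it is run on a compute node as the service account; sudo_status is a hypothetical helper name, and its optional argument exists only so the check can be exercised without a real sudo.

```shell
# Print "ok" if a trivial command can be run through sudo without a password
# prompt, "unavailable" otherwise.
# $1: sudo executable to use (defaults to "sudo"); hypothetical hook for testing.
sudo_status() {
    if "${1:-sudo}" -n true >/dev/null 2>&1; then
        echo "ok"
    else
        echo "unavailable"
    fi
}

echo "passwordless sudo for $(id -un): $(sudo_status)"
```

This only shows that sudo works without a password for a trivial command. If your sudoers policy restricts the service account to specific commands, the Substrate binary itself still needs to be covered by that policy.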
Proper type of root file system on compute nodes:
- This can be an issue in Warewulf clusters, where the operating system is served from rootfs by default. If you are using the two-stage boot process, your nodes will use tmpfs, which is supported. To force your compute nodes to use tmpfs, run the following on your Warewulf head node and then reboot your compute nodes.
$ wwctl profile set default --root=tmpfs
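To see which root file system type a compute node ended up with after the reboot, you can read the kernel's mount table. This is a rough sketch, assuming a Linux node where /proc/mounts is available; root_fstype is a hypothetical helper name. When several entries are mounted at /, the sketch reports the last one, on the assumption that later mounts shadow earlier ones.

```shell
# Print the file system type of the last entry mounted at "/".
# $1: mounts table in /proc/mounts format (defaults to /proc/mounts).
# root_fstype is a hypothetical helper name used only in this sketch.
root_fstype() {
    awk '$2 == "/" { fstype = $3 } END { print fstype }' "${1:-/proc/mounts}"
}

fstype=$(root_fstype 2>/dev/null) || fstype="unknown"
if [ "$fstype" = "tmpfs" ]; then
    echo "root file system is tmpfs"
else
    echo "root file system type is '${fstype}', expected tmpfs"
fi
```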
Functional PBS Cluster:
- PBS server running and accessible (PBS Professional or Torque)
- One or more compute nodes with PBS mom daemons
- Standard PBS commands available (qsub, qstat, qsig, optionally qdel)
Fuzzball Substrate Components:
- fuzzball-substrate binary installed (not running) on all compute nodes
- fuzzball-substrate-orchestrate extension installed and configured
- Substrate binaries accessible in the compute node PATH or at a known location
Network and Authentication:
- SSH access from Fuzzball Orchestrate to PBS head node
- Network connectivity from compute nodes to Fuzzball Orchestrate
- Network connectivity from compute nodes to the internet (for example, through a NAT gateway) for image pulls. Configure this before installing Fuzzball if possible. If it is configured afterward, the firewall might block the network interfaces of the Fuzzball K8s pods. In that case, you can add them to the trusted zone as follows. (Your IP addresses and interfaces will differ.)
# firewall-cmd --permanent --zone=trusted --add-interface=flannel.1  # Add Flannel overlay interface
# firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16  # Add the pod network as a trusted source
# firewall-cmd --permanent --zone=trusted --add-interface=enp8s0  # The internal interface (enp8s0) should also be trusted
# firewall-cmd --permanent --zone=trusted --add-source=10.0.0.0/24  # Also trust the internal compute node network
# firewall-cmd --reload  # Reload firewall configuration
# firewall-cmd --zone=trusted --list-all  # Verify configuration
- Appropriate firewall rules for bidirectional communication
- Hosts file configured to allow compute nodes to contact substrate-bridge. This requires adding
the IP addresses configured for MetalLB to /etc/hosts as aliases for the Fuzzball Orchestrate
node. For example:
10.0.0.4 fuzzball-orchestrate-host
10.0.0.149 substrate-bridge.10.0.0.149.nip.io
Permissions:
- A service account user with permission to SSH to compute nodes and submit and manage PBS jobs
- Passwordless sudo permissions for Substrate binary execution (needed for container namespace isolation)
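The /etc/hosts aliases listed under Network and Authentication can be verified with a small awk check. This is a sketch only: the IP addresses and host names are the example values from this page and will differ in your environment, and hosts_has_entry and check_entry are hypothetical helper names.

```shell
# hosts_has_entry FILE IP NAME
# Succeeds if FILE has a non-comment line that starts with IP and lists NAME.
hosts_has_entry() {
    awk -v ip="$2" -v name="$3" '
        { sub(/#.*/, "") }   # strip comments; awk re-splits the fields afterward
        $1 == ip {
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1" 2>/dev/null
}

# check_entry FILE IP NAME: print a one-line report for a single mapping.
check_entry() {
    if hosts_has_entry "$1" "$2" "$3"; then
        echo "ok: $2 $3"
    else
        echo "missing: $2 $3"
    fi
}

# Example values from this page; replace them with your own addresses.
check_entry /etc/hosts 10.0.0.4 fuzzball-orchestrate-host
check_entry /etc/hosts 10.0.0.149 substrate-bridge.10.0.0.149.nip.io
```

Reading the file directly checks only what is written in /etc/hosts. It does not confirm that the addresses are reachable from the compute node.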
Proper type of root file system on compute nodes:
- This can be an issue in Warewulf clusters, where the operating system is served from rootfs by default. If you are using the two-stage boot process, your nodes will use tmpfs, which is supported. To force your compute nodes to use tmpfs, run the following on your Warewulf head node and then reboot your compute nodes.
$ wwctl profile set default --root=tmpfs