Complicated Job DAGs, Ingress, and Egress
We can continue to build upon this Fuzzfile to create a workflow with a more complicated Directed Acyclic Graph (DAG) of jobs and dependencies. We will also add data ingress and egress to demonstrate the process of importing and exporting data to and from your workflow.
Our new workflow should create multiple jobs that will run the Cowsay program on the input data using different options to produce different results. Instead of creating a fortune from scratch, we will use data ingress to import data to our workflow, and then add to the data using the fortune command. Finally, we will use a job to concatenate all of our results into a single file, and data egress will move the file to an S3 bucket where we have read/write access through a saved secret.
This example requires that you are able to create ephemeral volumes, you have access to an external S3 bucket, and you have configured a secret to access the bucket. You may need to spend some time reviewing the linked docs and/or you may need your administrator to set permissions for you.
You can open the workflow that we created in the last example in the Workflow Editor. If it is not already open, navigate to the Workflows tab (on the left), find the workflow that you executed in the last example containing the cowsay command displaying text created by the fortune command, open it in the workflow dashboard, and select “Open in Workflow Editor” in the upper right.
First, let’s add data ingress to the existing volume so that it starts with some data already saved on it. We’ll just grab a public file from GitHub containing the text “Hello World!”. Click on the vertical Volumes tab, and open the Volume we created to support the jobs (called testVolume in our example). Now click on the Add Ingress button:
A new dialog box opens to guide you through the process of creating ingress. Use the drop-down under the Location heading to select “https”. Now you can add the following text to the dialog box to get the README file from one of Octocat’s repos on GitHub.
raw.githubusercontent.com/octocat/Hello-World/master/README
Since this file will be used as the starting point for a new fortune, we will set the destination to be a file called fortune.txt. (Keeping the file name the same as in the last example will also help minimize the custom changes we have to make.)
After pressing “OK” you can double-check the changes in the new Ingress box in the Volume Configuration tab.
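Behind the scenes, the Workflow Editor records this ingress as part of the volume definition in the Fuzzfile. The resulting block should look something like this (field layout may vary slightly across Fuzzball versions):

```yaml
volumes:
  testVolume:
    ingress:
      - source:
          uri: https://raw.githubusercontent.com/octocat/Hello-World/master/README
        destination:
          uri: file://fortune.txt
```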
Now let’s make a small tweak to our fortune job. First click the job to open the Job tab in the job configuration menu. Add a second angle bracket (>) to the command so that standard output is appended to the end of the file instead of overwriting it. The command should look like this:
fortune >>/tmp/fortune.txt
Now the fortune command will append some text to the file we got from GitHub instead of overwriting it.
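If the difference between > and >> is new to you, here is a quick shell sketch of how the two redirection operators behave (using echo to stand in for the ingress file and the fortune command):

```shell
# '>' truncates the file before writing; '>>' appends to whatever is there
echo 'Hello World!' > /tmp/fortune.txt    # create/overwrite, like our ingressed file
echo 'a new fortune' >> /tmp/fortune.txt  # append, like the tweaked fortune job
cat /tmp/fortune.txt                      # the file now contains both lines
```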
Now let’s tweak our cowsay job so that it saves a file instead of echoing the standard output to the log. Add >/tmp/cow1.txt to the end of the command so that the full command looks like this:
cat /tmp/fortune.txt | cowsay >/tmp/cow1.txt
Next, let’s make a few more jobs that will run in parallel with the job that executes the cowsay command. These jobs will create a few different ASCII-art files. Use the button with the plus sign in the lower right corner and drag and drop 2 more jobs into the workflow grid. You can name them sheepsay and tuxsay. Finally, you can draw lines from the fortune job to these new jobs to indicate dependencies.
If you are comfortable editing the Fuzzfile directly, it might be easier to press the “e” key to open the editor and then copy text blocks under the jobs field to “clone” the cowsay job with all of its settings in place.
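For example, a cloned sheepsay job only needs its name and command changed; everything else (image, mount, dependency) can stay identical to the cowsay block:

```yaml
  sheepsay:
    image:
      uri: oras://godlovedc/lolcow:sif
    mounts:
      testVolume:
        location: /tmp
    command:
      - /bin/sh
      - '-c'
      - cat /tmp/fortune.txt | cowsay -f sheep >/tmp/cow2.txt
    requires:
      - fortune
```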
If your workflow looks a little more messy, you can use the “o” key to automatically organize the workflow jobs.
Now we need to configure the new jobs. You can simply copy all of the settings from the cowsay job to the newly created sheepsay and tuxsay jobs. Don’t forget to copy the Mounted Volume configuration as well.
The one change that we will make to the sheepsay and tuxsay jobs is to tweak their commands so that they produce different output and save it to different files. You can make their commands look like this:
cat /tmp/fortune.txt | cowsay -f sheep >/tmp/cow2.txt
cat /tmp/fortune.txt | cowsay -f tux >/tmp/cow3.txt
Now let’s create another job that will concatenate all of the results into a single file. Of course, this job will need to run after all of the other jobs have completed. The DAG will reflect this.
Use the drag and drop widget to create another job in your workflow grid. Name this job concatenate. Draw connections from all 3 *say jobs to the top of the new concatenate job. Your finished workflow should look like this in the editor.
In the Job tab, make sure to add the following command to the concatenate job.
cat /tmp/cow*txt >/tmp/output.txt
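The glob pattern /tmp/cow*txt matches all three result files, and the shell expands it in sorted order, so the pieces land in the output file deterministically. A quick sketch with stand-in files (echo takes the place of the cowsay jobs):

```shell
# create stand-in files named like the three cowsay job outputs
echo 'moo' > /tmp/cow1.txt
echo 'baa' > /tmp/cow2.txt
echo 'tux' > /tmp/cow3.txt
# the glob expands to cow1.txt cow2.txt cow3.txt, in that order
cat /tmp/cow*txt > /tmp/output.txt
cat /tmp/output.txt
```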
We don’t need any special programs to be installed in the container that supports this job, so you can use the lightweight Alpine container by heading to the Environment tab and adding the following URI:
docker://alpine
While you are there, you can go ahead and click the Add Mounted Volume button and bind the testVolume to /tmp as we did with the previous jobs. Then you can go to the Resources tab and allocate 1 core and 1GB of memory. Once you save your changes, the concatenate job should be fully configured.
To finish out the example, let’s make sure that the output.txt file gets saved in a location of our choice using data egress. Click on the vertical Volumes tab and then on the testVolume that we set up earlier. Then click the Add Data Egress button. In the dialog box that opens, add the appropriate values to access your S3 bucket. In my case these values will do the trick, but you will need to use something different to access your S3 bucket with your configured secret.
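In the Fuzzfile, the egress ends up alongside the ingress in the volume definition. The values below are the ones from my example; your bucket URI and secret name will differ:

```yaml
volumes:
  testVolume:
    egress:
      - source:
          uri: file://output.txt
        destination:
          uri: s3://co-ciq-misc-support/godloved/output.txt
          secret: secret://user/GODLOVED_S3
```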
And that’s it! After running the workflow and downloading the resulting output.txt file, you can see that it contains something like this. (Your fortune will probably be different.)
 _________________________________________
/ Hello World! If you sow your wild oats, \
\ hope for a crop failure.                /
 -----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
 _________________________________________
/ Hello World! If you sow your wild oats, \
\ hope for a crop failure.                /
 -----------------------------------------
  \
   \
       __
      UooU\.'@@@@@@`.
      \__/(@@@@@@@@@@)
           (@@@@@@@@)
           `YY~~~~YY'
            ||    ||
 _________________________________________
/ Hello World! If you sow your wild oats, \
\ hope for a crop failure.                /
 -----------------------------------------
   \
    \
        .--.
       |o_o |
       |:_/ |
      //   \ \
     (|     | )
    /'\_   _/`\
    \___)=(___/
As in the previous sections, you can see the Fuzzfile at any time from the Workflow Editor by clicking the ellipsis menu in the lower right of the workflow grid and selecting “Edit YAML” or by pressing “e” on your keyboard. You can also view the Fuzzfile by clicking on the “Definition” tab in the “Workflows” dashboard. The Fuzzfile that is generated by the Workflow Editor is now pretty complicated. But hopefully it is also approachable and understandable after we’ve gone through the exercise of creating it.
version: v1
jobs:
  cowsay:
    image:
      uri: oras://godlovedc/lolcow:sif
    mounts:
      testVolume:
        location: /tmp
    command:
      - /bin/sh
      - '-c'
      - cat /tmp/fortune.txt | cowsay >/tmp/cow1.txt
    requires:
      - fortune
    resource:
      cpu:
        cores: 1
      memory:
        size: 1GB
  tuxsay:
    image:
      uri: oras://godlovedc/lolcow:sif
    mounts:
      testVolume:
        location: /tmp
    command:
      - /bin/sh
      - '-c'
      - cat /tmp/fortune.txt | cowsay -f tux >/tmp/cow3.txt
    requires:
      - fortune
    resource:
      cpu:
        cores: 1
      memory:
        size: 1GB
  fortune:
    image:
      uri: oras://godlovedc/lolcow:sif
    mounts:
      testVolume:
        location: /tmp
    command:
      - /bin/sh
      - '-c'
      - fortune >>/tmp/fortune.txt
    resource:
      cpu:
        cores: 1
      memory:
        size: 1GB
  sheepsay:
    image:
      uri: oras://godlovedc/lolcow:sif
    mounts:
      testVolume:
        location: /tmp
    command:
      - /bin/sh
      - '-c'
      - cat /tmp/fortune.txt | cowsay -f sheep >/tmp/cow2.txt
    requires:
      - fortune
    resource:
      cpu:
        cores: 1
      memory:
        size: 1GB
  concatenate:
    image:
      uri: docker://alpine
    mounts:
      testVolume:
        location: /tmp
    command:
      - /bin/sh
      - '-c'
      - cat /tmp/cow*txt >/tmp/output.txt
    requires:
      - cowsay
      - sheepsay
      - tuxsay
    resource:
      cpu:
        cores: 1
      memory:
        size: 1GB
volumes:
  testVolume:
    egress:
      - source:
          uri: file://output.txt
        destination:
          uri: s3://co-ciq-misc-support/godloved/output.txt
          secret: secret://user/GODLOVED_S3
    ingress:
      - source:
          uri: https://raw.githubusercontent.com/octocat/Hello-World/master/README
        destination:
          uri: file://fortune.txt
    reference: volume://user/ephemeral
If you want to replicate this or any of the workflows in these examples, but you don’t want to manually recreate them using the Workflow Editor, you can always copy and paste this text into a file and open the file in the Workflow Editor. Or you can just press “e” to open the text editor window in the Workflow Editor and paste in this text!
The preceding sections have covered the major aspects of building workflows through the GUI. We will cover job arrays, distributed jobs, GPU jobs, and other advanced resource requests in other sections.