Fuzzball Documentation

Saving BLAST Results to AWS S3

The pre-installed BLAST workflow template obtains query sequences with efetch from NCBI or curl from any https URL and saves results to a configurable path under the persistent DataVolume. What if you instead want to fetch query sequences from, and save results to, AWS S3, and do not need the ability to create custom BLAST databases? To achieve that, first follow the workflow catalog documentation to create a copy of the BLAST workflow and call it Blast S3. Then open the detail view on that Application and make the following changes:

Replace the workflow template

Since we are removing some functionality, this template is simpler: instead of using a job to fetch the query data and saving results to a persistent volume, it uses the ingress and egress functionality of the volumes. You can use the following template:

{{- $dbpath := list "/data" .BlastDbPath | join "/"  }}
{{- $dbname := .BlastDbName }}

version: v1
volumes:
  data:
    reference: {{.DataVolume}}
  scratch:
    reference: {{.ScratchVolume}}
    ingress:
      - source:
          uri: "{{.S3Uri}}/{{.RunName}}.fa"
          secret: {{.S3Secret}}
        destination:
          uri: "file://{{.RunName}}.fa"
    egress:
      - source:
          uri: "file://{{.RunName}}.blast.out"
        destination:
          uri: "{{.S3Uri}}/{{.RunName}}.blast.out"
          secret: {{.S3Secret}}
jobs:
  fetch-db:
    image:
      uri: {{.WorkflowContainer}}
    mounts:
      data:
        location: /data
      scratch:
        location: /scratch
    command:
      - /bin/bash
      - "-c"
      - |
        mkdir -p "{{$dbpath}}" && cd "{{$dbpath}}" || exit 1
        # fix update_blastdb.pl if it's from a conda container
        ubdb="$(type -p update_blastdb.pl)"
        curl="$(type -p curl)"
        [[ -z "$ubdb" || -z "$curl" ]] && exit 1
        if [[ "$ubdb" =~ conda ]]; then
          sed "s:^my \\\$curl.*$:my \$curl = '$curl';:" "${ubdb}" > update_blastdb.pl
        else
          cp "${ubdb}"  update_blastdb.pl
        fi
        chmod 750 update_blastdb.pl

        if  ./update_blastdb.pl --showall | grep -q {{$dbname}} ; then
          echo "{{$dbname}} is a public database available from NCBI"
          now=$(date +%s)
          if [[ -e {{$dbname}}__ ]]; then
            last=$(cat {{$dbname}}__)
            if (( (now - last) < 86400 )) ; then
              echo "  {{$dbname}} is current - update skipped."
              exit
            fi
          fi
          echo $now > {{$dbname}}__
          echo "  updating/downloading {{$dbname}}"
          ./update_blastdb.pl --num_threads=2 --decompress {{$dbname}} && exit 0 || exit 1
        else
          echo "{{$dbname}} is not a public database"
          exit 1
        fi        
    resource:
      cpu:
        cores: 1
        threads: true
      memory:
        size: 4GiB
    policy:
      timeout:
        execute: {{.BlastFetchTimeout}}
  run-blast:
    image:
      uri: {{.WorkflowContainer}}
    mounts:
      data:
        location: /data
      scratch:
        location: /scratch
    cwd: /scratch
    command:
      - /bin/sh
      - "-c"
      - |
        {{.BlastCmd}} -num_threads {{.BlastCores}} -query "{{.RunName}}.fa" -db {{$dbpath}}/{{$dbname}} -out {{.RunName}}.blast.out {{.BlastOpts}} || exit 1
        cat {{.RunName}}.blast.out        
    resource:
      cpu:
        cores: {{.BlastCores}}
        affinity: NUMA
      memory:
        size: {{.BlastMemory}}
    policy:
      timeout:
        execute: {{.BlastQueryTimeout}}
    requires: [fetch-db]
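With this template, the query FASTA must already exist at `{{.S3Uri}}/{{.RunName}}.fa` before the workflow starts, because the scratch volume's ingress stanza pulls it down rather than a job fetching it. A minimal staging sketch using the AWS CLI follows; the bucket, prefix, and run name are hypothetical placeholders, and valid AWS credentials must be present in the environment for the upload to succeed:

```shell
#!/bin/sh
# Hypothetical values -- substitute your own bucket/prefix and run name.
S3_URI="s3://my-bucket/blast"
RUN_NAME="myrun"

# This is the object the ingress stanza will pull into the scratch volume.
QUERY_DEST="${S3_URI}/${RUN_NAME}.fa"
echo "staging query to: ${QUERY_DEST}"

# Upload the local query FASTA (requires AWS credentials in the environment).
if command -v aws >/dev/null 2>&1; then
  aws s3 cp "${RUN_NAME}.fa" "${QUERY_DEST}" \
    || echo "upload failed (check credentials and that ${RUN_NAME}.fa exists)" >&2
else
  echo "aws CLI not found; install awscli to stage the query" >&2
fi
```

Note that the run name doubles as the base name of both the input (`.fa`) and output (`.blast.out`) objects under the same S3 prefix.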

Adapt the parameters

  • Remove the following parameters: CustomBlastDbFetchCmd, CustomBlastDbOptions, CustomBlastDbName, CustomBlastDbTimeout, RetrieveQuerySequencesCmd, and BlastOutputPath
  • Add 2 new parameters:
    • Name: S3Uri Type: Text Description: AWS URI prefix for blast inputs and outputs. Default: A URI like s3://<bucket>/<path...>
    • Name: S3Secret Type: Text Description: Secret from the secret store with AWS credentials. Default: A URI like secret://<scope>/<name>
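For illustration, a run of Blast S3 might fill the two new parameters with values like the following (the bucket, prefix, scope, and secret name are hypothetical placeholders):

```yaml
S3Uri: s3://genomics-data/blast-runs   # no trailing slash; the template appends /<RunName>.fa
S3Secret: secret://mygroup/aws-creds   # must hold AWS credentials valid for that bucket
```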

Run the workflow

Once you save the modified BLAST workflow template, you can run it analogously to the stock BLAST workflow template, supplying the appropriate parameters.
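After a run completes, the egress stanza will have pushed the report to `{{.S3Uri}}/{{.RunName}}.blast.out`. It can then be retrieved with the AWS CLI; as before, the bucket, prefix, and run name below are hypothetical placeholders:

```shell
#!/bin/sh
# Hypothetical values matching the staging example.
S3_URI="s3://my-bucket/blast"
RUN_NAME="myrun"

# This is the object the egress stanza uploaded from the scratch volume.
REPORT="${S3_URI}/${RUN_NAME}.blast.out"
echo "fetching report: ${REPORT}"

# Download the BLAST report produced by the run-blast job.
if command -v aws >/dev/null 2>&1; then
  aws s3 cp "${REPORT}" . \
    || echo "download failed (has the workflow finished successfully?)" >&2
else
  echo "aws CLI not found; install awscli to fetch the report" >&2
fi
```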