BLAST
The Basic Local Alignment Search Tool (BLAST) uses a local alignment algorithm to compare protein or nucleotide queries against sequence databases and identify matches with local sequence similarity. It assigns each match a statistical significance score (the E-value), which estimates how many comparable matches would be expected by chance given the query and database. Matches can be used to identify families of similar sequences that may share similar functions.
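As a rough illustration of how such a significance score behaves, the sketch below uses the standard Karlin-Altschul relation E = K * m * n * exp(-lambda * S). The function names and the numeric values of lambda and K are assumptions chosen for illustration; BLAST itself derives these parameters from the scoring system and also applies effective-length corrections that are omitted here.

```python
import math

def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw alignment score S into bits: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect_value(raw_score: float, query_len: int, db_len: int,
                 lam: float, k: float) -> float:
    """Expected number of chance hits scoring at least S: E = K*m*n*exp(-lambda*S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

# Illustrative parameters only (assumed, not taken from the workflow template).
LAMBDA, K = 0.267, 0.041

# A hypothetical 250-residue query against a 50-million-residue database.
weak_hit = expect_value(40, query_len=250, db_len=50_000_000, lam=LAMBDA, k=K)
strong_hit = expect_value(100, query_len=250, db_len=50_000_000, lam=LAMBDA, k=K)
print(f"E(S=40)  = {weak_hit:.3g}")
print(f"E(S=100) = {strong_hit:.3g}")
```

A higher alignment score gives a smaller E-value, and a larger database gives a larger one, which is why the same hit can be more or less significant depending on what it was searched against.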
The workflow catalog provides a BLAST workflow template. It can either
- download, cache, and update public BLAST databases on a persistent volume, or
- create, and optionally save for reuse, a custom BLAST database from a set of sequences in blast format
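The two database options above can be sketched with the standard BLAST+ command-line tools. This is a minimal sketch, not the template's actual implementation: it assumes `update_blastdb.pl` and `makeblastdb` are installed, that the custom sequences are supplied as a FASTA file, and the helper names and paths are hypothetical.

```python
from pathlib import Path

def public_db_command(db_name: str) -> list[str]:
    """Command that downloads (or refreshes) a public NCBI database.

    Run it from inside the cache directory on the persistent volume,
    since update_blastdb.pl writes into the current working directory.
    """
    return ["update_blastdb.pl", "--decompress", db_name]

def custom_db_command(fasta: Path, out_prefix: Path, dbtype: str = "nucl") -> list[str]:
    """Command that builds a custom database from a sequence file (assumed FASTA)."""
    if dbtype not in {"nucl", "prot"}:
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return ["makeblastdb", "-in", str(fasta), "-dbtype", dbtype, "-out", str(out_prefix)]

# Hypothetical usage; the commands are only built here, not executed:
#   import subprocess
#   subprocess.run(public_db_command("swissprot"), cwd="/cache/blastdb", check=True)
#   subprocess.run(custom_db_command(Path("seqs.fasta"), Path("/cache/custom/mydb")), check=True)
```

Writing the custom database under the persistent volume is what makes it reusable by later runs; pointing `-out` at a scratch directory instead would give a throwaway database.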
Query sequences can be downloaded from NCBI via efetch or fetched from any publicly available FASTA file. Results are written to a configurable path on a persistent volume. The workflow can readily be adapted to other strategies for obtaining inputs and persisting results (e.g. pushing query results to cloud object storage). Because BLAST databases can become quite large, caching them on a persistent storage volume, as this workflow template does, reduces the time spent on data transfers, which pays off when BLAST queries are run frequently.
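The input and caching ideas described above might look roughly like the following. This is a sketch under stated assumptions: it calls the NCBI E-utilities EFetch endpoint over HTTP rather than the `efetch` command-line tool, and the function names, the accession, and the cache path are hypothetical rather than part of the workflow template.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_fasta_url(accession: str, db: str = "protein") -> str:
    """Build an EFetch URL that returns one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"

def fetch_cached(url: str, cache_dir: Path, filename: str) -> Path:
    """Download url into cache_dir unless the file is already there."""
    target = cache_dir / filename
    if target.exists():
        # Cache hit: skip the transfer entirely.
        return target
    cache_dir.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as response:
        target.write_bytes(response.read())
    return target

# Hypothetical usage (performs a network request on the first call only):
#   query = fetch_cached(efetch_fasta_url("NP_000509.1"),
#                        Path("/cache/queries"), "NP_000509.1.fasta")
```

The same `fetch_cached` helper works for any publicly available FASTA URL. It never refreshes an existing file, so a real workflow that needs to update cached databases would add a freshness check.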
The following section will walk you through: