p3-submit-proteome-comparison

Submit a Proteome Comparison Request

This script submits a request to compare proteins against a reference genome. In addition to the reference genome ID, it takes as input one or more protein feature sets. These can be feature groups, protein FASTA files, or other genomes.

Usage Synopsis

p3-submit-proteome-comparison [options] output-path output-name

Start a proteome comparison job, producing output in the specified workspace path, using the specified name for the base filename of the output files.

Command-Line Options

  • –workspace-path-prefix

Base workspace directory for relative workspace paths.

  • –workspace-upload-path

Name of workspace directory to which local files should be uplaoded.

  • –overwrite

If a file to be uploaded already exists and this parameter is specified, it will be overwritten; otherwise, the script will error out.

  • –genome-ids

Main list of genome IDs, comma-delimited. Alternatively, this can be a local file name. If specified, the file must be tab-delimited, with a header line, containing the genome IDs in the first column. The genome IDs in this file can optionally be enclosed in quotes, allowing a text file download of a BV-BRC genome group or genome display to be used.

  • –protein-fasta

List of protein fasta files. These operate as virtual genomes containing the proteins in the FASTA file. (They may, in fact, be the protein fasta files of real genomes.) For multiple values, specify the option multiple times.

  • –user-feature-group

List of BV-BRC feature group names. These are specified as workspace files, so they are modified by the workspace path prefix, but they should not have the ws: prefix. Each group is treated as a virtual genome containing the proteins in the group. For multiple groups, specify the option multiple times.

  • –reference-genome-id

ID of the reference genome. If omitted, the first genome in the --genome-ids list will be used.

The following parameters determine whether a match between two proteins is acceptable. The matches are performed by BLASTP, so most of these correspond to BLAST parameters.

  • –min-seq-cov

The minimum coverage of the sequences for the match to be accepted. The default is 0.30 (30%).

  • –max-e-val

The maximum e-value of the sequence match for the match to be accepted. The default is 1e-5.

  • –min-ident

The minimum fraction identity for a match to be accepted. The default is 0.1 (10%).

  • –min-positive

The minimum fraction for positive-scording positions in a match. The default is 0.2 (20%).

The following options are used for assistance and debugging.

  • –help

Display the command-line usage and exit.

  • –dry-run

Display the JSON submission string and exit without invoking the service or uploading files.