PredictStructure CLI Reference¶
The predict-structure command-line tool is the workhorse beneath the BV-BRC Protein Structure Prediction Service. It exposes the same five engines (Boltz-2, OpenFold 3, Chai-1, AlphaFold 2, ESMFold) through a single Click-based interface with shared global options and per-tool subcommands.
You can run it three ways:
Directly on a workstation with the tools installed (or via Docker / Apptainer / Singularity containers).
As a CWL workflow: every subcommand has a matching cwl/tools/<tool>.cwl definition.
Through BV-BRC AppService: the Perl wrapper service-scripts/App-PredictStructure.pl builds and executes the same CLI under the hood.
Source: CEPI-dxkb/PredictStructureApp · predict_structure/cli.py
Installation¶
Prerequisites¶
Python 3.10+ (3.12 recommended)
conda or miniconda
One or more of: Docker / Apptainer / Singularity (if you want to use the per-tool containers instead of native installs)
A GPU (required for every tool except ESMFold, which has a CPU-capable path)
Quick start¶
conda create -n predict-structure python=3.12 -y
conda activate predict-structure
git clone https://github.com/CEPI-dxkb/PredictStructureApp.git
cd PredictStructureApp
pip install -e ".[all]"
predict-structure --version
predict-structure --help
Optional dependency groups¶
| Group | Adds | Install |
|---|---|---|
| | PyArrow (A3M → Parquet MSA conversion) | |
| | PyTorch, Transformers, Accelerate | |
| | cwltool | |
| | pytest, black, ruff, mypy | |
| all | Everything above | pip install -e ".[all]" |
The prediction tools themselves (boltz, chai-lab, alphafold, etc.) are installed separately or run inside their Docker images.
Command structure¶
predict-structure [GLOBAL_FLAGS] <TOOL> [TOOL_FLAGS] --protein <FASTA> -o <DIR>
predict-structure --job jobs.yaml -o <DIR>
Where <TOOL> is one of auto, boltz, openfold, chai, alphafold, or esmfold. The CLI uses click.group(), so predict-structure <tool> --help shows the per-tool flag set.
Entity flags (inputs)¶
Inputs are specified with explicit entity flags. Every flag is repeatable to build multi-entity complexes.
| Flag | Type | Description |
|---|---|---|
| --protein | file path | Protein FASTA (a single multi-record FASTA is treated as one multi-chain complex, not separate jobs). Boltz YAML manifests pass through automatically. |
| --dna | file path | DNA FASTA |
| | file path | RNA FASTA |
| --ligand | string | Ligand CCD code (e.g. |
| --smiles | string | SMILES string for arbitrary small molecules |
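Because each entity flag can be repeated, a multi-entity complex is just a longer argument list. The sketch below shows one way to assemble such a command from Python. It uses only the flag spellings that appear in this reference's examples (--protein, --dna, --ligand, --smiles, -o); the build_command helper itself is hypothetical and not part of the package.

```python
import shlex
from typing import Sequence


def build_command(
    tool: str,
    out_dir: str,
    proteins: Sequence[str] = (),
    dna: Sequence[str] = (),
    ligands: Sequence[str] = (),
    smiles: Sequence[str] = (),
) -> list[str]:
    """Assemble an argv list, repeating each entity flag once per input.

    Hypothetical helper: it only mirrors the flag spellings used in the
    examples of this reference.
    """
    argv = ["predict-structure", tool]
    for flag, values in (
        ("--protein", proteins),
        ("--dna", dna),
        ("--ligand", ligands),
        ("--smiles", smiles),
    ):
        for value in values:
            # One flag occurrence per entity, e.g. two --protein flags
            # for a two-chain complex.
            argv += [flag, value]
    argv += ["-o", out_dir]
    return argv


if __name__ == "__main__":
    cmd = build_command(
        "chai",
        "output/",
        proteins=["chainA.fasta", "chainB.fasta"],
        ligands=["ATP"],
    )
    # Print a shell-safe rendering of the command for inspection.
    print(shlex.join(cmd))
```

Passing the list to subprocess.run avoids shell quoting problems with SMILES strings, which often contain characters such as parentheses and equals signs.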
Global options (every subcommand)¶
| Flag | Type | Default | Description |
|---|---|---|---|
| -o | path | required | Output directory |
| --num-samples | int | 1 | Number of structure samples (diffusion samples for Boltz/OpenFold/Chai) |
| | int | 3 | Recycling iterations |
| | int | (none) | Random seed |
| --msa | path | (none) | Pre-computed MSA ( |
| | enum | | |
| --verbose | flag | off | Verbose logging |
| --debug | flag | off | Print the command instead of executing it |
Execution options¶
| Flag | Type | Default | Description |
|---|---|---|---|
| | enum | | |
| | enum | | |
| | string | (per-tool) | Override Docker image (docker / apptainer backends) |
| | string | | CWL runner command (cwl backend only) |
| | path | (auto) | CWL tool definition path (cwl backend only) |
Tool-specific options¶
boltz — Boltz-2¶
predict-structure boltz --protein input.fasta -o output/ \
--num-samples 5 --sampling-steps 200 --use-potentials
| Flag | Type | Default | Description |
|---|---|---|---|
| --sampling-steps | int | 200 | Diffusion sampling steps |
| | flag | off | Use the ColabFold MSA server (MMseqs2 against UniRef + ColabFoldDB) instead of requiring a pre-computed MSA file |
| | string | (none) | Custom MSA server URL (implies |
| --use-potentials | flag | off | Enable potential terms (steered diffusion) |
openfold — OpenFold 3¶
| Flag | Type | Default | Description |
|---|---|---|---|
| | int | 5 | Diffusion samples per query |
| | int | 1 | Independent model seeds |
| | flag | on | ColabFold MSA server |
| | flag | on | Use template structures |
| | string | (latest) | Model checkpoint name |
chai — Chai-1¶
predict-structure chai --protein input.fasta -o output/ \
--num-samples 5 --use-msa-server --no-esm-embeddings
| Flag | Type | Default | Description |
|---|---|---|---|
| | int | 200 | Diffusion timesteps |
| --use-msa-server | flag | off | Use remote MSA server |
| | string | (none) | Custom MSA server URL |
| --no-esm-embeddings | flag | off | Disable ESM2 language-model embeddings |
| | flag | off | Use PDB template server |
| | path | (none) | Constraint JSON file |
| | path | (none) | Pre-computed template hits |
| | int | 1 | Independent trunk forward passes |
| | int | 0 | MSA subsample per recycle (0 = all) |
| | flag | off | Disable low-memory mode |
alphafold — AlphaFold 2¶
predict-structure alphafold --protein input.fasta -o output/ \
--af2-data-dir /databases --af2-model-preset monomer
| Flag | Type | Default | Description |
|---|---|---|---|
| --af2-data-dir | path | required | AlphaFold database directory (~2 TB) |
| | string | | |
| | string | | |
| | YYYY-MM-DD | | Maximum template date |
esmfold — ESMFold¶
predict-structure esmfold --protein input.fasta -o output/ --fp16 --device cpu
| Flag | Type | Default | Description |
|---|---|---|---|
| --fp16 | flag | off | Half-precision inference (faster, lower memory) |
| | int | (none) | Chunk size for long sequences |
| | int | (none) | Max tokens per batch |
Auto subcommand¶
predict-structure auto --protein input.fasta -o output/ runs the auto-selector. The selection algorithm:
if device == cpu and only protein:
    → ESMFold
for tool in [boltz, openfold, chai, esmfold, alphafold]:
    if tool in {alphafold, esmfold} and any non-protein entity:
        skip
    if tool in {boltz, openfold, chai} and protein and no MSA:
        skip  # diffusion tools need real MSA; dummy single-sequence MSA produces unusable output
    if tool == alphafold and AF database dir missing:
        skip
    if tool not installed:
        skip
    return tool
raise: no prediction tool found
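The selection rules above can be sketched in Python as follows. This is an illustrative transcription of the pseudocode, not the code in predict_structure/cli.py: the Request dataclass, its field names, and the is_installed callback are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable

# Priority order and tool groupings taken from the pseudocode above.
PRIORITY = ["boltz", "openfold", "chai", "esmfold", "alphafold"]
PROTEIN_ONLY = {"alphafold", "esmfold"}
DIFFUSION = {"boltz", "openfold", "chai"}


@dataclass
class Request:
    """Hypothetical summary of the inputs the selector needs."""
    has_protein: bool = True
    has_non_protein: bool = False  # any DNA, RNA, ligand, or SMILES entity
    has_msa: bool = False
    device: str = "gpu"
    af2_data_dir_exists: bool = False


def select_tool(req: Request, is_installed: Callable[[str], bool]) -> str:
    # CPU shortcut: protein-only input goes straight to ESMFold.
    if req.device == "cpu" and req.has_protein and not req.has_non_protein:
        return "esmfold"
    for tool in PRIORITY:
        # AlphaFold 2 and ESMFold accept protein entities only.
        if tool in PROTEIN_ONLY and req.has_non_protein:
            continue
        # Diffusion tools are skipped for proteins without a real MSA.
        if tool in DIFFUSION and req.has_protein and not req.has_msa:
            continue
        # AlphaFold 2 needs its database directory.
        if tool == "alphafold" and not req.af2_data_dir_exists:
            continue
        if not is_installed(tool):
            continue
        return tool
    raise RuntimeError("no prediction tool found")


if __name__ == "__main__":
    everything = lambda tool: True
    print(select_tool(Request(has_msa=True), everything))
    print(select_tool(Request(has_msa=False), everything))
```

One consequence worth noticing: a protein plus ligand request without an MSA matches no tool at all under these rules, because the diffusion tools are skipped for the missing MSA and the two remaining tools reject non-protein entities.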
Batch jobs (--job)¶
The --job flag runs multiple independent predictions from a YAML manifest. It is mutually exclusive with the subcommands — you cannot combine --job with boltz, chai, etc.
predict-structure --job jobs.yaml -o output/
Each job lands in its own numbered directory: output/job_000/, output/job_001/, and so on.
Job manifest schema:
- protein: [/path/to/protein1.fasta]
  options:
    num_samples: 5
    device: gpu
- protein: [/path/to/protein2.fasta]
  ligands: [ATP]
  tool: boltz
  options:
    num_samples: 3
    use_potentials: true
- protein: [/path/to/protein3.fasta]
  dna: [/path/to/dna.fasta]
  tool: chai
  options:
    sampling_steps: 100
| Key | Type | Description |
|---|---|---|
| protein | list of paths | Protein FASTA files |
| dna | list of paths | DNA FASTA files |
| | list of paths | RNA FASTA files |
| ligands | list of strings | Ligand CCD codes |
| | list of strings | SMILES strings |
| tool | string | Tool name (optional; auto-selects if omitted) |
| options | dict | Any shared or tool-specific option |
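To make the manifest layout concrete, the sketch below walks an already-parsed manifest (a list of dicts, as a YAML loader would return) and works out where each job's output would land and which tool it would use. The plan_jobs helper is hypothetical; the three-digit zero padding and the fallback to auto follow the description above, and only the manifest keys shown in the example are handled.

```python
from pathlib import Path

# Entity keys that appear in the manifest example above.
ENTITY_KEYS = ("protein", "dna", "ligands")


def plan_jobs(jobs: list[dict], out_dir: str) -> list[dict]:
    """Return one planning record per manifest entry (illustrative only)."""
    plan = []
    for index, job in enumerate(jobs):
        plan.append(
            {
                # output/job_000/, output/job_001/, ...
                "out_dir": Path(out_dir) / f"job_{index:03d}",
                # The tool key is optional; omitting it means auto-selection.
                "tool": job.get("tool", "auto"),
                "entities": {k: job[k] for k in ENTITY_KEYS if k in job},
                "options": dict(job.get("options", {})),
            }
        )
    return plan


if __name__ == "__main__":
    manifest = [
        {"protein": ["/path/to/protein1.fasta"],
         "options": {"num_samples": 5, "device": "gpu"}},
        {"protein": ["/path/to/protein2.fasta"], "ligands": ["ATP"],
         "tool": "boltz",
         "options": {"num_samples": 3, "use_potentials": True}},
    ]
    for record in plan_jobs(manifest, "output/"):
        print(record["out_dir"].as_posix(), record["tool"], record["options"])
```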
Examples¶
# Protein structure prediction with Boltz-2
predict-structure boltz --protein input.fasta -o output/
# Protein with ESMFold (CPU-capable, FP16)
predict-structure esmfold --protein input.fasta -o output/ --fp16
# Chai-1 with pre-computed MSA
predict-structure chai --protein input.fasta -o output/ --msa alignment.a3m
# AlphaFold 2 with local databases
predict-structure alphafold --protein input.fasta -o output/ \
--af2-data-dir /databases
# Auto: pick the best available tool
predict-structure auto --protein input.fasta -o output/
# Multi-entity protein–DNA complex
predict-structure boltz --protein protein.fasta --dna dna.fasta -o output/
# Protein–ligand with CCD code
predict-structure boltz --protein protein.fasta --ligand ATP -o output/
# Protein with SMILES ligand
predict-structure boltz --protein protein.fasta --smiles "CCO" -o output/
# Multi-chain protein with Chai
predict-structure chai --protein chainA.fasta --protein chainB.fasta \
--ligand ATP -o output/
# Dry-run — print the underlying command without executing
predict-structure boltz --protein input.fasta -o output/ --debug
Exit codes¶
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Generic failure (see logs) |
| 2 | Usage / argument error (Click) |
| 3 | Input validation error (missing required entity, conflicting flags) |
| 4 | Tool / dependency not found |
| 5 | Runtime error inside the underlying engine |
| 124 | Time-out (killed by external scheduler) |
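A wrapper script can turn these codes into readable messages. The following is a minimal sketch under the assumption that the table above is complete; the describe_exit and run_and_report helpers are hypothetical names, and only the standard-library subprocess module is used.

```python
import subprocess

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "Success",
    1: "Generic failure (see logs)",
    2: "Usage / argument error (Click)",
    3: "Input validation error (missing required entity, conflicting flags)",
    4: "Tool / dependency not found",
    5: "Runtime error inside the underlying engine",
    124: "Time-out (killed by external scheduler)",
}


def describe_exit(code: int) -> str:
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unrecognized exit code {code}")


def run_and_report(argv: list[str]) -> int:
    """Run a command, print what its exit code means, and return the code."""
    # check=False: a non-zero exit is reported rather than raised.
    completed = subprocess.run(argv, check=False)
    print(f"{argv[0]} exited with {completed.returncode}: "
          f"{describe_exit(completed.returncode)}")
    return completed.returncode


if __name__ == "__main__":
    for code in (0, 3, 124, 42):
        print(code, "->", describe_exit(code))
```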
Logging and debugging¶
--verbose enables INFO-level logging from the CLI and adapters.
--debug is a dry-run: prints the full underlying command line and exits 0. Use it to verify parameter mapping before submitting a long job.
Setting P3_DEBUG=1 and P3_LOG_LEVEL=DEBUG in the environment turns on the Perl AppService trace path (mirrors the Debug Mode checkbox in the BV-BRC form).
Every run writes metadata/runtime.json with the resolved command, environment, peak memory, and wall time. It is the first thing to look at when a job behaves strangely.
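For a quick look at the runtime metadata, something like the sketch below may help. It makes two assumptions that this reference does not confirm: that metadata/runtime.json sits inside the output directory passed with -o, and that its top level is a JSON object. No key names are assumed; the helper simply lists whatever keys are present.

```python
import json
from pathlib import Path


def load_runtime_metadata(out_dir: str) -> dict:
    """Load metadata/runtime.json, assumed to live under the output directory."""
    path = Path(out_dir) / "metadata" / "runtime.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def summarize(metadata: dict, width: int = 80) -> list[str]:
    """Render each top-level key on one line, truncating long values."""
    lines = []
    for key in sorted(metadata):
        text = json.dumps(metadata[key])
        if len(text) > width:
            text = text[: width - 3] + "..."
        lines.append(f"{key}: {text}")
    return lines


if __name__ == "__main__":
    try:
        for line in summarize(load_runtime_metadata("output/")):
            print(line)
    except FileNotFoundError:
        print("no metadata/runtime.json found under output/")
```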
CWL workflows¶
The CLI is mirrored by a set of CWL tool definitions for use in pipeline runners:
cwl/tools/
  predict-structure-app.cwl     # Entry point
  predict-structure.cwl         # Unified CLI wrapper
  boltz.cwl
  chai.cwl
  openfold.cwl
  alphafold.cwl
  esmfold.cwl
  ...
cwl/workflows/
  protein-structure-prediction.cwl
  multi-tool-comparison.cwl     # Run all engines side-by-side
  boltz-report.cwl
  alphafold-report.cwl
  ...
Run a workflow with cwltool or GoWe:
cwltool cwl/workflows/protein-structure-prediction.cwl cwl/jobs/test-predict-alphafold.yml
See docs/CWL_WORKFLOWS.md in the source repository for the full CWL reference.