# Protein Structure Prediction — Recipes

This is a cookbook for the Protein Structure Prediction service. If you're new to the service, start with the [Tutorial](/tutorial/predict_structure/predict_structure) — it walks you through a single-protein prediction end-to-end. Once you're comfortable with the form, the recipes here cover the patterns that go beyond the defaults: multi-chain complexes, ligands, MSAs you supply or have computed, and reruns.

For the full reference of every form field, see the [Quick Reference Guide](/quick_references/services/predict_structure_service).

## Choosing a recipe

| Your situation | Recipe |
|---|---|
| Single protein, just want a structure | [Tutorial](/tutorial/predict_structure/predict_structure) — uses Auto → ESMFold |
| Multi-chain protein complex | [Multi-chain complex with Boltz](#recipe-1-multi-chain-protein-complex-with-boltz) |
| Protein + a small-molecule cofactor (ATP, NAD, …) | [Protein + ligand by CCD code](#recipe-2-protein-with-a-cofactor-ccd-code) |
| Protein + a novel small molecule | [Protein + SMILES ligand](#recipe-3-protein-with-a-smiles-ligand) |
| Need an MSA but don't have one | [Let BV-BRC compute the MSA](#recipe-4-let-bv-brc-compute-the-msa-with-colabfold) |
| Have an MSA, want best-quality fold | [Boltz / OpenFold / Chai with uploaded MSA](#recipe-5-uploaded-msa-with-boltz-openfold-or-chai) |
| Already ran one — try a different engine | [Rerun with a different tool](#recipe-6-rerun-an-existing-job-with-a-different-tool) |

## Recipe 1 — Multi-chain protein complex with Boltz

**Goal:** predict the structure of a complex (e.g. an antibody Fv with heavy + light chains, or a homo-tetramer).

**Inputs:** one FASTA file with multiple `>` records — each record becomes one chain. Soft limit: 26 chains and 10,000 total residues per job.

```
>heavy_chain
QVQLVQSGAEVKKPGASVKVSCKASGYTFT...
>light_chain
DIQMTQSPSSLSASVGDRVTITCRASQDIS...
```
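
Before uploading, it can help to confirm locally that the FASTA stays within the soft limits above. The following is a minimal sketch, not part of the service: `parse_fasta` and `check_limits` are hypothetical helper names, and the sample sequences are the truncated ones from the example, so the counts are only illustrative.

```python
from typing import Dict, List

MAX_CHAINS = 26        # soft limit on chains per job, as stated in this recipe
MAX_RESIDUES = 10_000  # soft limit on total residues per job


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {record_id: sequence}; each '>' record becomes one chain."""
    records: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else f"chain_{len(records) + 1}"
            if current in records:
                raise ValueError(f"duplicate record id: {current}")
            records[current] = ""
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current] += line
    return records


def check_limits(records: Dict[str, str]) -> List[str]:
    """Return human-readable warnings; an empty list means within limits."""
    warnings = []
    total = sum(len(seq) for seq in records.values())
    if len(records) > MAX_CHAINS:
        warnings.append(f"{len(records)} chains exceeds the soft limit of {MAX_CHAINS}")
    if total > MAX_RESIDUES:
        warnings.append(f"{total} residues exceeds the soft limit of {MAX_RESIDUES}")
    return warnings


fasta_text = """>heavy_chain
QVQLVQSGAEVKKPGASVKVSCKASGYTFT
>light_chain
DIQMTQSPSSLSASVGDRVTITCRASQDIS
"""
records = parse_fasta(fasta_text)
print(len(records), "chains;", check_limits(records) or "within soft limits")
```

Because the limits are described as soft, the sketch returns warnings rather than raising an error.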

**Form settings:**

| Field | Value |
|---|---|
| Prediction Tool | `Boltz` (or Auto if you've uploaded an MSA) |
| Protein | the multi-chain FASTA |
| MSA Source | *Use MSA Server or Service* (recommended) or *Precomputed MSA from Workspace* if you already have one |
| Other inputs | leave empty |
| Job Name | something descriptive — e.g. `antibody-fv-boltz` |

**Why Boltz for complexes?** Boltz, OpenFold 3, and Chai-1 are all AF3-class diffusion models with strong multimer performance. Boltz is the auto-selector's first choice when an MSA is available; OpenFold and Chai are reasonable swaps if you want to compare.

## Recipe 2 — Protein with a cofactor (CCD code)

**Goal:** fold a protein with a known cofactor in place (e.g. an ATP-binding domain with ATP modeled).

**Inputs:** the protein FASTA plus a CCD code in the Ligands input.

**Form settings:**

| Field | Value |
|---|---|
| Prediction Tool | `Boltz` (or Auto) |
| Protein | your FASTA |
| Ligands → Notation | `CCD codes` |
| Ligands → input | `ATP` (one per line if you have more) |
| MSA Source | *Use MSA Server or Service* or *Precomputed* |

Common CCD codes: `ATP`, `ADP`, `GTP`, `NAD`, `FAD`, `HEM`, `MG`, `ZN`, `CA`. Glycans use their CCD codes too (`NAG`, `MAN`, `BMA`).

**Limitation:** the form's Notation selector carries one notation per submission. To mix CCD ligands and SMILES ligands in the same job, you need the CLI (see [CLI Reference](/quick_references/services/predict_structure_cli)).

## Recipe 3 — Protein with a SMILES ligand

**Goal:** fold a protein with a novel small molecule (no CCD code yet) in place — e.g. a drug candidate.

**Inputs:** the protein FASTA plus the molecule as a SMILES string.

**Form settings:**

| Field | Value |
|---|---|
| Prediction Tool | `Boltz` |
| Protein | your FASTA |
| Ligands → Notation | `SMILES strings` |
| Ligands → input | the SMILES (e.g. `CCO` for ethanol, one per line for multiple) |
| MSA Source | *Use MSA Server or Service* |

The form validates SMILES as you type and reports the first invalid line inline. Standard SMILES syntax applies: branch parentheses must balance, and atoms should come from the usual organic subset, including the halogens.

**Tip:** if you have the molecule's 2D structure in another tool (e.g. RDKit, ChemDraw), export to canonical SMILES first.
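
If you assemble a ligand list outside the form, a rough local pre-check can catch the most common typo before you paste. The sketch below only tests that parentheses and square brackets balance and that a line contains no internal whitespace; it is an assumption about what a useful pre-check looks like, not a description of the form's validator, and it does not verify chemistry. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def brackets_balanced(smiles: str) -> bool:
    """True if '()' and '[]' are balanced and properly nested."""
    pairs = {")": "(", "]": "["}
    stack: List[str] = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


def first_invalid_line(ligand_text: str) -> Optional[Tuple[int, str]]:
    """Return (1-based line number, text) of the first failing line, or None."""
    for number, line in enumerate(ligand_text.splitlines(), start=1):
        smiles = line.strip()
        if not smiles:
            continue  # ignore blank lines between entries
        if any(c.isspace() for c in smiles) or not brackets_balanced(smiles):
            return number, smiles
    return None


# Ethanol, aspirin, and a deliberately broken entry, one per line.
ligands = "CCO\nCC(=O)Oc1ccccc1C(=O)O\nCC(C"
print(first_invalid_line(ligands))  # prints (3, 'CC(C')
```

Reporting only the first failing line mirrors how the form surfaces errors, so fixing entries one at a time works the same way in both places.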

## Recipe 4 — Let BV-BRC compute the MSA with ColabFold

**Goal:** predict a structure that needs an MSA, but you don't have one and don't want to build it yourself.

**Form settings:**

| Field | Value |
|---|---|
| Prediction Tool | `Boltz`, `OpenFold`, or `Chai` |
| Protein | your FASTA |
| MSA Source | **Use MSA Server or Service** |

The service runs MMseqs2 against UniRef + ColabFoldDB and feeds the result to the selected engine. Expect 30 s – 3 min of extra wall time before folding starts.

**When *not* to use this:** if you already have an MSA from your own pipeline (JackHMMER, ColabFold local, etc.), upload it instead — your MSA is likely deeper / better filtered than the server's defaults.

## Recipe 5 — Uploaded MSA with Boltz, OpenFold, or Chai

**Goal:** highest-quality fold, with an MSA you've curated.

**Inputs:** a `.a3m`, `.sto`, or `.pqt` MSA file uploaded to your workspace.

**Form settings:**

| Field | Value |
|---|---|
| Prediction Tool | `Boltz` (default), or `OpenFold` / `Chai` to compare |
| Protein | the query FASTA matching the MSA's first sequence |
| MSA Source | **Precomputed MSA from Workspace** |
| MSA File | the uploaded `.a3m` / `.sto` / `.pqt` file |

The service auto-converts A3M ↔ Parquet for the engine that needs it. Stockholm is converted to A3M internally for Boltz / OpenFold.

**Tip:** a deeper MSA (more sequences) usually helps until ~256 effective sequences, after which gains flatten. ColabFold's default depth of about 5,000 raw sequences, which reduces to roughly 256 effective sequences, is a reasonable target.
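
The Protein row above asks for a query FASTA that matches the MSA's first sequence. A minimal sketch of that comparison for an `.a3m` file follows. It assumes a single-record query FASTA, treats lowercase letters as insertions and `-` as gaps (the usual A3M convention), and skips leading `#` lines that some tools write; the function names and toy sequences are hypothetical.

```python
def first_a3m_sequence(a3m_text: str) -> str:
    """Return the first record's sequence with gaps and insertions removed."""
    parts = []
    seen_header = False
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and leading comment lines
        if line.startswith(">"):
            if seen_header:
                break  # second record reached; the first one is complete
            seen_header = True
        elif seen_header:
            parts.append(line)
    raw = "".join(parts)
    # Keep only uppercase residues: drops '-' gaps and lowercase insertions.
    return "".join(c for c in raw if c.isupper())


def fasta_sequence(fasta_text: str) -> str:
    """Concatenate the sequence lines of a single-record FASTA."""
    return "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    ).upper()


def query_matches_msa(fasta_text: str, a3m_text: str) -> bool:
    """True if the query FASTA equals the first sequence of the A3M."""
    return fasta_sequence(fasta_text) == first_a3m_sequence(a3m_text)


# Toy data for illustration only.
query = ">query\nMKTAYIAK\n"
a3m = "#8\t1\n>query\nMKTAYIAK\n>hit1\nMKT-YIgAK\n"
print(query_matches_msa(query, a3m))
```

Stockholm and Parquet files would need their own readers; this sketch covers only A3M.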

## Recipe 6 — Rerun an existing job with a different tool

**Goal:** you've run a fold and want to compare a different engine on the same inputs.

1. Open the completed job in your workspace.
2. Click the **Rerun** button in the Action Bar.
3. The form re-opens pre-filled with the prior submission's parameters.
4. Change the **Prediction Tool** to the engine you want to compare against.
5. Update the **Job Name**; the pre-filled name will otherwise collide with the original job (for example, change `crambin-esmfold` to `crambin-boltz`).
6. **Submit**.

This is the cleanest way to A/B-test engines without rebuilding the form from scratch.

## What's not on the form

These advanced knobs are *not* surfaced on the web form. Reach them via the CLI or JSON-RPC API:

- `num_samples` — how many independent predictions to generate (default 5; useful for assessing ensemble disagreement)
- `num_recycles` — recycling iterations (engine-specific defaults; AlphaFold defaults to 3)
- `seed` — random seed for reproducibility (engine-specific)
- `output_format` — restrict output to a subset (`pdb`, `cif`, `report`, `confidence`, …)
- Per-engine flags — see each engine's adapter in `PredictStructureApp/adapters/`

See the [CLI Reference](/quick_references/services/predict_structure_cli) and [API Reference](/quick_references/services/predict_structure_api) for the full parameter surface.
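
As an illustration of how the parameters listed above might be collected before passing them to the CLI or API, here is a small sketch. Only the four parameter names and the `num_samples` default of 5 come from this guide; the helper `build_advanced_params`, the flat JSON layout, and the assumption that `output_format` takes a single string are all hypothetical, so check the references above for the actual request shape.

```python
import json

# Parameter names taken from the list above; nothing else is confirmed.
KNOWN_KEYS = {"num_samples", "num_recycles", "seed", "output_format"}


def build_advanced_params(**overrides) -> dict:
    """Collect advanced parameters, rejecting names this guide does not list."""
    unknown = set(overrides) - KNOWN_KEYS
    if unknown:
        raise KeyError(f"not an advanced parameter named in this guide: {sorted(unknown)}")
    params = {"num_samples": 5}  # default stated in this guide
    params.update(overrides)
    for key in ("num_samples", "num_recycles"):
        if key in params and (not isinstance(params[key], int) or params[key] < 1):
            raise ValueError(f"{key} must be a positive integer")
    return params


# seed=42 is an arbitrary example value; "pdb" assumes a single-string format.
params = build_advanced_params(seed=42, num_recycles=3, output_format="pdb")
print(json.dumps(params, indent=2, sort_keys=True))
```

Keeping the seed and sample count in one place makes it easier to reproduce a run or to compare engines on identical settings.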

## See also

- [Protein Structure Prediction Service Tutorial](/tutorial/predict_structure/predict_structure) — the novice walkthrough
- [Quick Reference Guide](/quick_references/services/predict_structure_service) — every form field, by section
- [Protein Structure Prediction Service](https://bv-brc.org/app/PredictStructure) — open the form