Protein Structure Prediction — Recipes
This is a cookbook for the Protein Structure Prediction service. If you’re new to the service, start with the Tutorial — it walks you through a single-protein prediction end-to-end. Once you’re comfortable with the form, the recipes here cover the patterns that go beyond the defaults: multi-chain complexes, ligands, MSAs you supply or have computed, and reruns.
For the full reference of every form field see the Quick Reference Guide.
Choosing a recipe
| Your situation | Recipe |
|---|---|
| Single protein, just want a structure | Tutorial (uses Auto → ESMFold) |
| Multi-chain protein complex | Recipe 1 |
| Protein + a small-molecule cofactor (ATP, NAD, …) | Recipe 2 |
| Protein + a novel small molecule | Recipe 3 |
| Need an MSA but don’t have one | Recipe 4 |
| Have an MSA, want best-quality fold | Recipe 5 |
| Already ran one, want to try a different engine | Recipe 6 |
Recipe 1 — Multi-chain protein complex with Boltz
Goal: predict the structure of a complex (e.g. an antibody Fv with heavy + light chains, or a homo-tetramer).
Inputs: one FASTA file with multiple > records — each record becomes one chain. Soft limit: 26 chains and 10,000 total residues per job.
>heavy_chain
QVQLVQSGAEVKKPGASVKVSCKASGYTFT...
>light_chain
DIQMTQSPSSLSASVGDRVTITCRASQDIS...
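Before uploading, it can help to check the FASTA locally against the soft limits quoted above. The following is a minimal sketch, not part of the service: the function names `parse_fasta` and `check_soft_limits` are hypothetical, and the only values taken from this page are the 26-chain and 10,000-residue limits.

```python
from typing import Dict, List

MAX_CHAINS = 26        # soft limit quoted in this recipe
MAX_RESIDUES = 10_000  # soft limit quoted in this recipe


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {record_id: sequence} in file order; each '>' record is one chain."""
    chains: Dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            # Use the first word of the header as the id, or a positional fallback.
            current = header.split()[0] if header else f"chain_{len(chains) + 1}"
            if current in chains:
                raise ValueError(f"duplicate record id: {current}")
            chains[current] = ""
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chains[current] += line
    return chains


def check_soft_limits(chains: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means within the limits."""
    problems: List[str] = []
    total = sum(len(seq) for seq in chains.values())
    if len(chains) > MAX_CHAINS:
        problems.append(f"{len(chains)} chains exceeds the soft limit of {MAX_CHAINS}")
    if total > MAX_RESIDUES:
        problems.append(f"{total} residues exceeds the soft limit of {MAX_RESIDUES}")
    for name, seq in chains.items():
        if not seq:
            problems.append(f"record '{name}' has no sequence")
    return problems
```

Calling `check_soft_limits(parse_fasta(open("complex.fasta").read()))` (the file name is a placeholder) returns an empty list when the file is within both limits.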
Form settings:
| Field | Value |
|---|---|
| Prediction Tool | Boltz |
| Protein | the multi-chain FASTA |
| MSA Source | Use MSA Server or Service (recommended), or Precomputed MSA from Workspace if you already have one |
| Other inputs | leave empty |
| Job Name | something descriptive |
Why Boltz for complexes? Boltz, OpenFold 3, and Chai-1 are all AF3-class diffusion models with strong multimer performance. Boltz is the auto-selector’s first choice when an MSA is available; OpenFold and Chai are reasonable swaps if you want to compare.
Recipe 2 — Protein with a cofactor (CCD code)
Goal: fold a protein with a known cofactor in place (e.g. an ATP-binding domain with ATP modeled).
Inputs: the protein FASTA plus a CCD code in the Ligands input.
Form settings:
| Field | Value |
|---|---|
| Prediction Tool | |
| Protein | your FASTA |
| Ligands → Notation | |
| Ligands → input | |
| MSA Source | Use MSA Server or Service, or Precomputed MSA from Workspace |
Common CCD codes: ATP, ADP, GTP, NAD, FAD, HEM, MG, ZN, CA. Glycans use their CCD codes too (NAG, MAN, BMA).
Limitation: the form’s Notation selector carries one notation per submission. To mix CCD ligands and SMILES ligands in the same job, you need the CLI (see CLI Reference).
Recipe 3 — Protein with a SMILES ligand
Goal: fold a protein with a novel small molecule (no CCD code yet) in place — e.g. a drug candidate.
Inputs: the protein FASTA plus the molecule as a SMILES string.
Form settings:
| Field | Value |
|---|---|
| Prediction Tool | |
| Protein | your FASTA |
| Ligands → Notation | |
| Ligands → input | the SMILES string |
| MSA Source | Use MSA Server or Service |
The form validates SMILES as you type; the first invalid line is reported inline. Standard SMILES syntax applies: branching parentheses must balance, and atoms come from the typical organic set plus halogens.
Tip: if you have the molecule’s 2D structure in another tool (e.g. RDKit, ChemDraw), export to canonical SMILES first.
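If you prepare ligands in a text file, a rough local pre-check can catch unbalanced parentheses before you paste them into the form. The sketch below is an illustration only: it is not the form's validator, the function names are hypothetical, and it assumes one SMILES string per line. It checks structure (parentheses and square brackets), not chemistry, so a string that passes may still be rejected.

```python
from typing import List, Optional, Tuple


def smiles_precheck(smiles: str) -> List[str]:
    """Cheap structural checks only; passing does not prove chemical validity."""
    s = smiles.strip()
    if not s:
        return ["empty string"]
    problems: List[str] = []
    if any(ch.isspace() for ch in s):
        problems.append("contains whitespace")
    depth = 0           # current nesting depth of branch parentheses
    in_bracket = False  # inside a [bracket atom]
    for pos, ch in enumerate(s):
        if ch == "[":
            if in_bracket:
                problems.append(f"nested '[' at position {pos}")
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                problems.append(f"unmatched ']' at position {pos}")
            in_bracket = False
        elif ch == "(" and not in_bracket:
            depth += 1
        elif ch == ")" and not in_bracket:
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched ')' at position {pos}")
                depth = 0
    if depth > 0:
        problems.append(f"{depth} unclosed '('")
    if in_bracket:
        problems.append("unclosed '['")
    return problems


def first_invalid_line(ligands_text: str) -> Optional[Tuple[int, List[str]]]:
    """Return (1-based line number, problems) for the first failing line, else None."""
    for number, line in enumerate(ligands_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        problems = smiles_precheck(line)
        if problems:
            return number, problems
    return None
```

For a full chemical check, a cheminformatics toolkit such as RDKit is the better tool; this sketch only mirrors the "first invalid line" behaviour described above.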
Recipe 4 — Let BV-BRC compute the MSA with ColabFold
Goal: predict a structure that needs an MSA, but you don’t have one and don’t want to build it yourself.
Form settings:
| Field | Value |
|---|---|
| Prediction Tool | |
| Protein | your FASTA |
| MSA Source | Use MSA Server or Service |
The service runs MMseqs2 against UniRef + ColabFoldDB and feeds the result to the selected engine. Expect 30 s – 3 min of extra wall time before folding starts.
When not to use this: if you already have an MSA from your own pipeline (JackHMMER, ColabFold local, etc.), upload it instead — your MSA is likely deeper / better filtered than the server’s defaults.
Recipe 5 — Uploaded MSA with Boltz, OpenFold, or Chai
Goal: highest-quality fold, with an MSA you’ve curated.
Inputs: a .a3m, .sto, or .pqt MSA file uploaded to your workspace.
Form settings:
| Field | Value |
|---|---|
| Prediction Tool | |
| Protein | the query FASTA matching the MSA’s first sequence |
| MSA Source | Precomputed MSA from Workspace |
| MSA File | the uploaded MSA file |
The service auto-converts A3M ↔ Parquet for the engine that needs it. Stockholm is converted to A3M internally for Boltz / OpenFold.
Tip: a deeper MSA (more sequences) usually helps until ~256 effective sequences, after which gains flatten. ColabFold’s default depth of 5,000 raw → ~256 effective is a reasonable target.
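Because the query FASTA has to match the MSA's first sequence, a quick local check before uploading can save a failed job. This is a minimal sketch for A3M files only, with hypothetical function names. It assumes the usual A3M conventions: lowercase letters are insertions relative to the query, and lines starting with `#` are comments. The count it reports is the raw number of records, not the effective sequence count, which depends on sequence weighting.

```python
from typing import List, Tuple


def read_a3m(text: str) -> List[Tuple[str, str]]:
    """Parse A3M text into (header, aligned_sequence) pairs, in file order."""
    records: List[Tuple[str, str]] = []
    header, parts = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def msa_matches_query(a3m_text: str, query_seq: str) -> Tuple[bool, int]:
    """Return (first sequence equals query, raw number of records in the MSA)."""
    records = read_a3m(a3m_text)
    if not records:
        return False, 0
    # Keep only match-state residues: drop lowercase insertions and gap characters.
    first = "".join(ch for ch in records[0][1] if ch.isupper())
    return first == query_seq.strip().upper(), len(records)
```

A `False` in the first position means the FASTA and the MSA were probably built from different sequences or different construct boundaries.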
Recipe 6 — Rerun an existing job with a different tool
Goal: you’ve run a fold and want to compare a different engine on the same inputs.
1. Open the completed job in your workspace.
2. Click the Rerun button in the Action Bar.
3. The form re-opens pre-filled with the prior submission’s parameters.
4. Change the Prediction Tool to the engine you want to compare against.
5. Update the Job Name; the default will collide otherwise (crambin-esmfold → crambin-boltz, etc.).
6. Submit.
This is the cleanest way to A/B-test engines without rebuilding the form from scratch.
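If you rerun many jobs, deriving the new name mechanically avoids collisions. The helper below is a hypothetical convenience, not part of the service; it assumes the `<target>-<engine>` naming pattern used in the crambin example and that engine names appear in lowercase.

```python
def rerun_job_name(old_name: str, old_engine: str, new_engine: str) -> str:
    """Swap a trailing '-<old_engine>' suffix for '-<new_engine>'.

    If the old name does not end with the old engine suffix, the new
    engine is simply appended so the result still differs from the original.
    """
    suffix = f"-{old_engine}"
    base = old_name[: -len(suffix)] if old_name.endswith(suffix) else old_name
    return f"{base}-{new_engine}"
```

For example, `rerun_job_name("crambin-esmfold", "esmfold", "boltz")` gives `"crambin-boltz"`.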
What’s not on the form
These advanced knobs are not surfaced on the web form. Reach them via the CLI or JSON-RPC API:
- num_samples: how many independent predictions to generate (default 5; useful for assessing ensemble disagreement)
- num_recycles: recycling iterations (engine-specific defaults; AlphaFold defaults to 3)
- seed: random seed for reproducibility (engine-specific)
- output_format: restrict output to a subset (pdb, cif, report, confidence, …)
- Per-engine flags: see each engine’s adapter in PredictStructureApp/adapters/
See the CLI Reference and API Reference for the full parameter surface.
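To make the shape of such a call concrete, here is a hedged sketch of assembling a request body that carries the advanced parameters listed above. Only the four parameter names come from this page. The JSON-RPC 2.0 envelope, the method name, the base parameter keys, and the value types are placeholders chosen for illustration; check the API Reference for the real ones.

```python
import itertools
import json

# Parameter names taken from the list above; everything else here is assumed.
ADVANCED_KEYS = {"num_samples", "num_recycles", "seed", "output_format"}

_request_ids = itertools.count(1)


def build_request(method: str, base_params: dict, **advanced) -> str:
    """Serialize a JSON-RPC 2.0 style request including the advanced knobs.

    `method` and the keys of `base_params` are hypothetical placeholders.
    """
    unknown = set(advanced) - ADVANCED_KEYS
    if unknown:
        raise ValueError(f"unknown advanced parameter(s): {sorted(unknown)}")
    params = dict(base_params)
    # Omit knobs left as None so engine-specific defaults apply.
    params.update({k: v for k, v in advanced.items() if v is not None})
    envelope = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    }
    return json.dumps(envelope, indent=2)


if __name__ == "__main__":
    # Placeholder method name and input key, for illustration only.
    print(build_request(
        "ExampleService.predict",
        {"protein": "query.fasta"},
        num_samples=5,
        seed=42,
    ))
```

Leaving a knob unset keeps it out of the payload, so engine-specific defaults such as the recycling count are not overridden by accident.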
See also
Protein Structure Prediction Service Tutorial — the novice walkthrough
Quick Reference Guide — every form field, by section
Protein Structure Prediction Service — open the form