Protein Structure Prediction Service
Overview
The Protein Structure Prediction Service predicts the 3D atomic structure of proteins, protein complexes, and protein–DNA / RNA / ligand assemblies from sequence input. It exposes five state-of-the-art folding engines through a single unified form:
| Engine | Family | Best for |
|---|---|---|
| Boltz-2 | Diffusion (co-folding) | Protein + nucleic-acid + ligand / SMILES; binding-affinity-aware |
| OpenFold 3 | Diffusion (AF3-class, open) | Same scope as Boltz, fully open weights |
| Chai-1 | Diffusion (AF3-class) | Multi-chain protein complexes, ligands by CCD code |
| AlphaFold 2 | Co-evolutionary (MSA + Evoformer) | High-accuracy monomer / multimer when MSA is rich |
| ESMFold | Single-sequence (protein language model) | Fast, CPU-capable monomers; orphan / designed sequences |
The service picks the best available engine automatically when Prediction Tool is left at Auto, or you can pin a specific one. Parameter mapping, format conversion (FASTA → YAML / JSON, mmCIF → PDB, A3M → Parquet), and confidence normalization are handled for you. The output is a ranked set of structures plus a unified confidence report.
Using the Protein Structure Prediction Service
The Protein Structure Prediction submenu option under the Services main menu (Protein Tools category) opens the prediction input form. Note: You must be logged into BV-BRC to use this service.
Options
The form is organized into five sections: Prediction Tool, Biomolecular Inputs, Ligands, Multiple Sequence Alignment, and Output. Fields that do not apply to the selected engine remain visible but are ignored; each field's help text notes which engines use it.
Prediction Tool
Choose the structure prediction engine. Leaving this at Auto lets the service pick the best available tool given your inputs.
| Tool | Entity support | MSA handling |
|---|---|---|
| Auto (auto-select) | all | Accepted; influences which engine is chosen |
| Boltz-2 | protein, DNA, RNA, CCD ligand, SMILES | Required: upload a precomputed MSA or use the MSA server |
| OpenFold 3 | protein, DNA, RNA, CCD ligand, SMILES | Required: upload a precomputed MSA or use the MSA server |
| Chai-1 | protein, DNA, RNA, CCD ligand | Required: upload a precomputed MSA or use the MSA server |
| AlphaFold 2 | protein only | Built from BV-BRC’s local databases |
| ESMFold | protein only | None (single-sequence) |
Auto-select priority (when Prediction Tool is Auto): Boltz → OpenFold → Chai → ESMFold → AlphaFold. The selector inspects your other inputs and falls back as follows:

| Protein | DNA / RNA / ligand / SMILES | MSA file | Auto picks |
|---|---|---|---|
| ✓ | ✗ | ✗ | ESMFold (fast single-sequence); AlphaFold if ESMFold is unavailable |
| ✓ | ✗ | ✓ | Boltz → OpenFold → Chai → ESMFold → AlphaFold |
| ✓ | ✓ | ✗ | Error: the diffusion tools need an MSA, and AlphaFold / ESMFold cannot use DNA / RNA / ligand inputs |
| ✓ | ✓ | ✓ | Boltz → OpenFold → Chai |
| ✗ | DNA / RNA only | any | Boltz → OpenFold → Chai |
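The fallback table above can be read as a small decision procedure. The Python sketch below encodes its five rows; the engine identifiers (boltz, openfold, chai, esmfold, alphafold), the Inputs fields, and the `available` set are hypothetical illustrations rather than the service's internal API, and any input combination the table does not list (for example, ligands without a protein) is handled here by assumption.

```python
from dataclasses import dataclass

# Hypothetical engine identifiers; the service's internal names may differ.
DIFFUSION = ["boltz", "openfold", "chai"]
PRIORITY = DIFFUSION + ["esmfold", "alphafold"]


@dataclass
class Inputs:
    protein: bool   # a protein FASTA was supplied
    nucleic: bool   # a DNA or RNA FASTA was supplied
    ligand: bool    # CCD codes or SMILES were entered
    msa_file: bool  # a precomputed MSA was selected


def candidates(inp: Inputs) -> list[str]:
    """Ordered engine candidates, following the Auto table row by row."""
    if not inp.protein:
        if not inp.nucleic:
            raise ValueError("supply at least one protein, DNA, or RNA FASTA file")
        return list(DIFFUSION)  # DNA / RNA only, any MSA
    co_fold = inp.nucleic or inp.ligand
    if not co_fold:
        # Protein only: ESMFold first without an MSA, full priority list with one.
        return list(PRIORITY) if inp.msa_file else ["esmfold", "alphafold"]
    if not inp.msa_file:
        raise ValueError("diffusion tools need an MSA; AlphaFold 2 / ESMFold "
                         "cannot use DNA / RNA / ligand inputs")
    return list(DIFFUSION)


def auto_pick(inp: Inputs, available: set[str]) -> str:
    """Return the first candidate engine that is currently available."""
    for tool in candidates(inp):
        if tool in available:
            return tool
    raise RuntimeError("no suitable engine is available for these inputs")
```

For example, a protein-only job with no MSA resolves to ESMFold when it is available and falls back to AlphaFold otherwise, matching the first row of the table.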
Biomolecular Inputs
Supply at least one FASTA file containing protein, DNA, or RNA sequences, using one file per sequence type. All files are selected from your workspace.
Protein
Protein sequence(s) in FASTA format (.fasta, .fa). For a multi-chain complex, place all chains in a single file; each FASTA record (a > header line plus its sequence) is one chain. The service enforces a soft limit of 26 chains and 10,000 total residues per job. Boltz also accepts a native YAML manifest in place of FASTA for full feature support (custom constraints, covalent bonds, modified residues).
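A rough pre-flight check against the chain and residue limits can be run locally before submitting. This is a minimal sketch: `parse_fasta` and `check_limits` are hypothetical helpers, and the service's own counting (for example, how it treats gap or non-standard characters) may differ.

```python
MAX_CHAINS = 26        # soft limit stated above
MAX_RESIDUES = 10_000  # soft limit stated above


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) records; one record = one chain."""
    records, header, parts = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header")
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def check_limits(text: str) -> list[str]:
    """Return human-readable warnings; an empty list means both limits are met."""
    chains = parse_fasta(text)
    total = sum(len(seq) for _, seq in chains)
    warnings = []
    if len(chains) > MAX_CHAINS:
        warnings.append(f"{len(chains)} chains exceeds the soft limit of {MAX_CHAINS}")
    if total > MAX_RESIDUES:
        warnings.append(f"{total} residues exceeds the soft limit of {MAX_RESIDUES}")
    return warnings
```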
DNA
DNA sequence(s) in FASTA format. Used as co-folding partners with proteins. Tools that support DNA: Boltz-2, OpenFold 3, Chai-1. Ignored by AlphaFold 2 and ESMFold.
RNA
RNA sequence(s) in FASTA format. Same engine support as DNA.
Ligands
Optional small-molecule ligands to co-fold with the proteins. Supported by Boltz-2, OpenFold 3, and Chai-1.
The form provides one ligand input with a Notation selector: choose CCD codes or SMILES strings, then enter one ligand per line. Entries are validated as you type, and the first invalid line is reported inline. A submission can carry only one notation, so if you need both standard cofactors and a novel small molecule, choose the notation that suits the less common entries (typically SMILES for novel molecules).

- CCD codes: Chemical Component Dictionary codes of up to three characters, e.g. ATP, NAD, HEM. Glycans also use their CCD codes here (NAG, MAN, BMA); there is no separate glycan input. Each entry must be 1–3 alphanumeric characters.
- SMILES strings: arbitrary small molecules expressed as SMILES (e.g. CCO for ethanol). Validation for SMILES is syntactic only.
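The per-line validation can be approximated as below. The CCD rule (1–3 alphanumeric characters) comes from the text above; the SMILES check is a deliberately rough syntactic test (balanced parentheses and brackets, paired ring-closure labels) written for illustration, not the form's actual validator, and it does not check chemistry. The notation values "ccd" / "smiles" and all function names are hypothetical. A full parser such as RDKit's `Chem.MolFromSmiles` is considerably stricter.

```python
import re

CCD_RE = re.compile(r"[A-Za-z0-9]{1,3}")  # used with fullmatch: 1-3 alphanumerics


def ccd_ok(entry: str) -> bool:
    return CCD_RE.fullmatch(entry) is not None


def smiles_ok(entry: str) -> bool:
    """Rough syntax check: balanced () and [], paired ring-closure labels."""
    depth, open_rings, i = 0, set(), 0
    while i < len(entry):
        c = entry[i]
        if c == "[":                    # bracket atom: skip to its closing ']'
            j = entry.find("]", i + 1)
            if j == -1:
                return False
            i = j + 1
            continue
        if c == "]" or c.isspace():
            return False
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False
        elif c == "%" or c.isdigit():   # ring-closure label: one digit or %nn
            label = entry[i:i + 3] if c == "%" else c
            if c == "%":
                if len(label) != 3 or not label[1:].isdigit():
                    return False
                i += 2
            open_rings ^= {label}       # first use opens the ring, second closes it
        i += 1
    return bool(entry) and depth == 0 and not open_rings


def first_invalid_line(text: str, notation: str):
    """Return (line_number, entry) for the first invalid entry, or None."""
    check = ccd_ok if notation == "ccd" else smiles_ok
    for n, raw in enumerate(text.splitlines(), start=1):
        entry = raw.strip()
        if entry and not check(entry):  # blank lines are skipped (assumption)
            return n, entry
    return None
```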
Multiple Sequence Alignment
The MSA Source selector controls how the multiple sequence alignment is supplied:
| Source | What happens | When to use |
|---|---|---|
| None | No MSA is supplied. | Default. Works with Auto (which picks ESMFold for single-protein, no-MSA inputs), ESMFold, and AlphaFold 2 (which generates its own MSA from BV-BRC’s local databases). |
| Precomputed MSA from Workspace | A workspace file selector appears; pick a precomputed MSA file in one of the accepted formats listed below. | Boltz-2, OpenFold 3, and Chai-1 need an MSA; choose this when you already have one. Generate it elsewhere (ColabFold’s MMseqs2 server, JackHMMER, or the AlphaFold preprocessing pipeline) and upload the result to your workspace. |
| Use MSA Server or Service | BV-BRC computes the MSA with ColabFold (MMseqs2 against UniRef + ColabFoldDB) and passes it to the selected engine. | When you don’t have a precomputed MSA and the selected tool needs one (Boltz, OpenFold, Chai). Adds 30 s – 3 min to the job. |
Accepted formats for uploaded MSAs:

| Format | Used by |
|---|---|
| A3M (FASTA-like with lowercase inserts) | Boltz, OpenFold, Chai (after conversion) |
| Stockholm | Boltz, OpenFold, Chai (after conversion) |
| Parquet (Chai-native) | Chai (no conversion) |
ESMFold ignores any MSA. AlphaFold 2 ignores uploaded MSAs and always builds its own from BV-BRC’s local databases.
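To show what "lowercase inserts" means in practice, the sketch below parses A3M text and drops insertion characters so that every row lines up column-for-column with the query (the first record). It is a hypothetical helper for inspecting an MSA before upload, not the conversion code the service runs; skipping lines that start with '#' assumes a ColabFold-style header line may be present.

```python
def parse_a3m(text: str) -> list[tuple[str, str]]:
    """Return (name, aligned_sequence) pairs with A3M insertions removed.

    In A3M, uppercase letters and '-' are columns aligned to the query;
    lowercase letters (and '.', if present) are insertions relative to it.
    """
    records, name, parts = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ColabFold-style files may begin with a '#' header line
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(parts)))
            name, parts = line[1:].strip(), []
        elif name is not None:
            parts.append(line)
    if name is not None:
        records.append((name, "".join(parts)))

    aligned = [(n, "".join(c for c in s if not (c.islower() or c == ".")))
               for n, s in records]
    if aligned:
        width = len(aligned[0][1])  # query length defines the column count
        bad = [n for n, s in aligned if len(s) != width]
        if bad:
            raise ValueError(f"rows not matching query length {width}: {bad}")
    return aligned
```

With insertions stripped, a hit such as MaKV collapses to MKV and aligns with a three-residue query MKV.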
Output
Every submission creates a job with the name you give it. A workspace object named after the job is created inside the Output Folder and holds all job-related information: parameters, status, logs, and the prediction results themselves. The full workspace path is shown in the form’s Result location bar:
<Output Folder>/<Job Name>
Output Folder
The workspace folder in which the job will be created. It must already exist; you can create one from the folder selector.
Job Name
Identifier for this run. Used as the workspace object name. Pre-filled with PredictStructure-<YYMMDD>-<HHMMSS> so a fresh form always has a unique default; replace it with something descriptive (e.g. crambin-esmfold) when it helps.
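As a sketch of the naming convention, the default name and the Result location path can be reproduced in Python. The helper names and the example workspace folder are hypothetical, and the use of local time for the timestamp is an assumption.

```python
from datetime import datetime
from typing import Optional


def default_job_name(now: Optional[datetime] = None) -> str:
    """PredictStructure-<YYMMDD>-<HHMMSS>, using local time (assumption)."""
    now = now or datetime.now()
    return now.strftime("PredictStructure-%y%m%d-%H%M%S")


def result_location(output_folder: str, job_name: str) -> str:
    """<Output Folder>/<Job Name>, without a doubled slash."""
    return f"{output_folder.rstrip('/')}/{job_name}"
```

For example, a job started on 7 March 2025 at 14:05:09 would default to PredictStructure-250307-140509.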
Output Results
Every job produces a normalized directory tree, regardless of the underlying engine. This makes downstream comparison and visualization possible without per-tool branching.
<job_name>/
├── results.json # CWL-style output manifest (paths + metadata)
├── report.html # Interactive HTML viewer (3Dmol.js)
├── inputs/
│ ├── query.fasta # Canonicalized input
│ ├── msa.a3m # MSA actually used (if any)
│ └── manifest.yaml
├── predictions/
│ ├── rank_1.pdb # Top-ranked structure
│ ├── rank_1.cif
│ ├── rank_2.pdb # ...if --num-samples > 1
│ └── ...
├── reports/
│ ├── confidence.json # Unified confidence schema
│ ├── plddt.csv # Per-residue pLDDT
│ └── pae.png # Predicted aligned error heatmap
├── metadata/
│ ├── tool.json # Tool name, version, parameters used
│ ├── runtime.json # Wall time, GPU model, peak memory
│ └── citations.bib
└── raw/ # Untouched tool-native output (for power users)
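Because ranks are numbered without zero-padding, a plain lexical sort would place rank_10 before rank_2. The sketch below lists the ranked structures in numeric order; it assumes a local copy of the job directory laid out as in the tree above, and the helper names are hypothetical.

```python
import re
from pathlib import Path

RANK_RE = re.compile(r"rank_(\d+)\.(pdb|cif)")


def ranked_names(names, ext="pdb"):
    """Keep rank_<N>.<ext> names and sort them by N numerically (rank_10 after rank_9)."""
    hits = []
    for name in names:
        m = RANK_RE.fullmatch(name)
        if m and m.group(2) == ext:
            hits.append((int(m.group(1)), name))
    return [name for _, name in sorted(hits)]


def ranked_structures(job_dir, ext="pdb"):
    """Paths under <job_dir>/predictions/, best-ranked first."""
    pred = Path(job_dir) / "predictions"
    return [pred / n for n in ranked_names((p.name for p in pred.iterdir()), ext)]
```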
Confidence metrics
The unified confidence.json contains the metrics common across tools, with tool-specific extras passed through:

| Metric | Range | Meaning |
|---|---|---|
| pLDDT | 0–100 | Per-residue confidence, averaged across the chain. >90 highly confident; 70–90 confident; 50–70 low; <50 likely disordered |
| pTM | 0–1 | Predicted TM-score for the global fold of the monomer / multimer |
| ipTM | 0–1 | Interface predicted TM-score (multi-chain / ligand interfaces) |
| Ranking score | 0–1 | Aggregate ranking score (tool-specific weighting of pTM, ipTM, pLDDT) |
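The pLDDT bands above can be applied to per-residue scores, for example values read from reports/plddt.csv. The sketch takes a plain sequence of scores because the CSV's column layout is not specified here. The helper names are hypothetical; scores above 90 count as highly confident, and assigning the shared boundaries at 70 and 50 to the upper band is an assumption.

```python
from statistics import mean

BANDS = ("highly confident", "confident", "low", "disordered")


def plddt_band(score: float) -> str:
    """Map one pLDDT value (0-100) to its confidence band."""
    if not 0 <= score <= 100:
        raise ValueError("pLDDT must be in the range 0-100")
    if score > 90:
        return "highly confident"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "disordered"


def summarize(per_residue):
    """Return (mean pLDDT, count of residues in each band)."""
    scores = list(per_residue)
    counts = {band: 0 for band in BANDS}
    for s in scores:
        counts[plddt_band(s)] += 1
    return mean(scores), counts
```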
Visualizing the structure
Open report.html in the workspace viewer. The page embeds 3Dmol.js with chain-colored, pLDDT-colored, and confidence-colored representations, plus a draggable PAE heatmap.
Resource estimates
These are conservative ceilings the AppService uses for queue scheduling. Actual runtime is usually well below the cap.

| Tool | CPUs | Memory | Default time-out | Typical runtime (200 aa monomer) |
|---|---|---|---|---|
| Boltz-2 | 8 | 64 GB | 4 h | 1–3 min |
| OpenFold 3 | 8 | 64 GB | 4 h | 2–5 min |
| Chai-1 | 8 | 64 GB | 4 h | 2–5 min |
| AlphaFold 2 | 16 | 96 GB | 8 h | 30–90 min (MSA + 5 models) |
| ESMFold | 8 | 32 GB | 1 h | 20–60 s (GPU) / 2–6 min (CPU) |
References
Jumper J et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). doi:10.1038/s41586-021-03819-2
Lin Z et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023). doi:10.1126/science.ade2574
Abramson J et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024). doi:10.1038/s41586-024-07487-w
Wohlwend J et al. Boltz-1: democratizing biomolecular interaction modeling. bioRxiv (2024). doi:10.1101/2024.11.19.624167
Passaro S et al. Boltz-2: towards accurate and efficient binding affinity prediction. bioRxiv (2025). PMC12262699
Chai Discovery. Chai-1: decoding the molecular interactions of life. bioRxiv (2024). doi:10.1101/2024.10.10.615955
Ahdritz G et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nature Methods 21, 1514–1524 (2024).
Mirdita M et al. ColabFold: making protein folding accessible to all. Nature Methods 19, 679–682 (2022).