Protein Structure Prediction Service

Overview

The Protein Structure Prediction Service predicts the 3D atomic structure of proteins, protein complexes, and protein–DNA / RNA / ligand assemblies from sequence input. It exposes five state-of-the-art folding engines through a single unified form:

| Engine | Family | Best for |
|---|---|---|
| Boltz-2 | Diffusion (co-folding) | Protein + nucleic-acid + ligand / SMILES; binding-affinity-aware |
| OpenFold 3 | Diffusion (AF3-class, open) | Same scope as Boltz, fully open weights |
| Chai-1 | Diffusion (AF3-class) | Multi-chain protein complexes, ligands by CCD code |
| AlphaFold 2 | Co-evolutionary (MSA + Evoformer) | High-accuracy monomer / multimer when MSA is rich |
| ESMFold | Single-sequence (protein language model) | Fast, CPU-capable monomers; orphan / designed sequences |

The service picks the best available engine automatically when Prediction Tool is left at Auto, or you can pin a specific one. Parameter mapping, format conversion (FASTA → YAML / JSON, mmCIF → PDB, A3M → Parquet), and confidence normalization are handled for you. The output is a ranked set of structures plus a unified confidence report.

Using the Protein Structure Prediction Service

The Protein Structure Prediction submenu option under the Services main menu (Protein Tools category) opens the prediction input form. Note: You must be logged into BV-BRC to use this service.

Options

The form is organized as Prediction Tool, Biomolecular Inputs, Ligands, Multiple Sequence Alignment, and Output. Fields that are not relevant to the selected engine are still visible but ignored (and the help text notes which engines use them).

Prediction Tool

Choose the structure prediction engine. Leaving this at Auto lets the service pick the best available tool given your inputs.

| Tool | Model/Version | Entity support | MSA handling |
|---|---|---|---|
| auto | (auto-select) | all | yes |
| boltz | Boltz-2 | protein, DNA, RNA, CCD ligand, SMILES | Upload required |
| openfold | OpenFold 3 | protein, DNA, RNA, CCD ligand, SMILES | Upload required |
| chai | Chai-1 | protein, DNA, RNA, CCD ligand | Upload required |
| alphafold | AlphaFold 2 | protein only | Built from BV-BRC’s local databases |
| esmfold | ESMFold | protein only | None (single-sequence) |

Auto-select priority (when tool = auto): Boltz → OpenFold → Chai → ESMFold → AlphaFold. The selector inspects your other inputs and falls back as follows:

| Protein | DNA / RNA / ligand / SMILES | MSA File | → Auto picks |
|---|---|---|---|
| yes | no | no | ESMFold (fast single-sequence); AlphaFold if ESMFold is unavailable |
| yes | no | yes | Boltz → OpenFold → Chai → ESMFold → AlphaFold |
| yes | yes | no | ERROR: diffusion tools need MSA; AF / ESMFold cannot use DNA / RNA / ligand |
| yes | yes | yes | Boltz → OpenFold → Chai |
| no | DNA / RNA only | any | Boltz → OpenFold → Chai |
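
The fallback rules above can be sketched in a few lines of Python. This is an illustration only, not the service's actual selector: the function name `auto_select`, its boolean arguments, and the `available` set are hypothetical, and the input condition attached to each outcome is this sketch's reading of the table.

```python
from typing import Optional

# Priority order quoted from the text above.
PRIORITY = ["boltz", "openfold", "chai", "esmfold", "alphafold"]
# Engines that accept DNA / RNA / ligand inputs, per the Prediction Tool table.
CO_FOLDING = {"boltz", "openfold", "chai"}


def auto_select(has_protein: bool, has_other: bool, has_msa: bool,
                available: Optional[set] = None) -> str:
    """Return the first available engine for the given inputs (hypothetical helper).

    has_other means any DNA, RNA, CCD ligand, or SMILES input is present.
    """
    available = set(PRIORITY) if available is None else available
    if not has_protein and not has_other:
        raise ValueError("supply at least one Protein, DNA, or RNA input")
    if has_protein and has_other and not has_msa:
        # Assumed reading of the ERROR row: protein plus partners, no MSA file.
        raise ValueError("diffusion tools need an MSA; AlphaFold 2 / ESMFold "
                         "cannot use DNA / RNA / ligand inputs")
    if has_other:
        candidates = [tool for tool in PRIORITY if tool in CO_FOLDING]
    elif has_msa:
        candidates = PRIORITY
    else:
        candidates = ["esmfold", "alphafold"]
    for tool in candidates:
        if tool in available:
            return tool
    raise RuntimeError("no suitable engine is available")


print(auto_select(has_protein=True, has_other=False, has_msa=False))
```

Passing a smaller `available` set shows the fallback: with only `{"alphafold"}` available, the same protein-only, no-MSA input resolves to `alphafold`.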

Biomolecular Inputs

Supply at least one FASTA file with Protein, DNA, or RNA sequences. One file per sequence type. Files come from your workspace.

Protein

Protein sequence(s) in FASTA format (.fasta, .fa). For a multi-chain complex, place all chains in a single file — each > record is one chain. The service enforces a soft limit of 26 chains and 10,000 total residues per job. Boltz also accepts a native YAML manifest in place of FASTA for full feature support (custom constraints, covalent bonds, modified residues).
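
Before uploading, a FASTA file can be checked against the soft limits quoted above. The following is a minimal sketch using only the standard library; `parse_fasta` and `check_limits` are hypothetical helper names, and the sketch looks at one file, whereas the service applies the limits per job.

```python
MAX_CHAINS = 26        # soft limit quoted in the text
MAX_RESIDUES = 10_000  # soft limit quoted in the text


def parse_fasta(text):
    """Return a list of (header, sequence) pairs; each '>' record is one chain."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_limits(text):
    """Return (chain_count, residue_total, warnings) for a FASTA string."""
    records = parse_fasta(text)
    total = sum(len(seq) for _, seq in records)
    warnings = []
    if len(records) > MAX_CHAINS:
        warnings.append(f"{len(records)} chains exceeds the soft limit of {MAX_CHAINS}")
    if total > MAX_RESIDUES:
        warnings.append(f"{total} residues exceeds the soft limit of {MAX_RESIDUES}")
    return len(records), total, warnings


# Two short made-up chains, for illustration only.
example = ">chain_A\nMKTAYIAK\nQRQISF\n>chain_B\nGSHMLE\n"
print(check_limits(example))  # (2, 20, [])
```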

DNA

DNA sequence(s) in FASTA format. Used as co-folding partners with proteins. Tools that support DNA: Boltz-2, OpenFold 3, Chai-1. Ignored by AlphaFold 2 and ESMFold.

RNA

RNA sequence(s) in FASTA format. Same engine support as DNA.

Ligands

Optional small-molecule ligands to co-fold with the proteins. Supported by Boltz-2, OpenFold 3, and Chai-1.

The form provides one ligand input with a Notation selector: pick CCD codes or SMILES strings and enter one ligand per line. Each notation is validated as you type, and the first invalid line is reported inline. A submission can carry only one notation at a time, so if you need both standard cofactors and a novel small molecule, pick the notation that matches your less-trivial entries (typically SMILES for the novel ones).

  • CCD codes: Chemical Component Dictionary codes, usually three characters, e.g. ATP, NAD, HEM. Glycans use their CCD codes here too (NAG, MAN, BMA); there is no separate glycan input. Each entry must be 1–3 alphanumeric characters.

  • SMILES strings: arbitrary small molecules expressed as SMILES (e.g. CCO for ethanol). Live syntax validation reports the first invalid line.
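
The CCD rule above (one entry per line, 1–3 alphanumeric characters, first invalid line reported) can be sketched as follows. This is not the form's actual validator: `first_invalid_ccd` is a hypothetical name, skipping blank lines is an assumption, and the check is purely syntactic, so it does not confirm that a code exists in the dictionary. SMILES validation needs a cheminformatics toolkit and is not shown.

```python
import re

# 1-3 alphanumeric characters, as stated in the text.
CCD_PATTERN = re.compile(r"[A-Za-z0-9]{1,3}")


def first_invalid_ccd(text):
    """Return (line_number, entry) for the first invalid line, or None if all pass."""
    for number, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue  # assumption: blank lines are ignored
        if not CCD_PATTERN.fullmatch(entry):
            return number, entry
    return None


print(first_invalid_ccd("ATP\nNAD\nHEM"))   # None
print(first_invalid_ccd("ATP\nHEME\nNAG"))  # (2, 'HEME')
```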

Multiple Sequence Alignment

The MSA Source selector controls how the multiple sequence alignment is supplied:

| Source | What happens | When to use |
|---|---|---|
| None | No MSA is supplied. | Default. Works with Auto (which will pick ESMFold for single-protein, no-MSA inputs), ESMFold, and AlphaFold 2 (which generates its own MSA from BV-BRC’s local databases). |
| Precomputed MSA from Workspace | A workspace file selector appears; pick a pre-computed .a3m, .sto, or .pqt file. The service uses it as-is. | Required for Boltz-2, OpenFold 3, and Chai-1. Generate the MSA elsewhere (ColabFold’s MMseqs2 server, JackHMMER, or the AlphaFold preprocessing pipeline) and upload the result to your workspace. |
| Use MSA Server or Service | BV-BRC computes the MSA with ColabFold (MMseqs2 against UniRef + ColabFoldDB) and feeds it to the selected engine. | When you don’t have a pre-computed MSA on hand and the selected tool needs one (Boltz, OpenFold, Chai). Adds 30 s – 3 min to the job. |

Accepted formats for uploaded MSAs:

| Extension | Format | Used by |
|---|---|---|
| .a3m | A3M (FASTA-like with lowercase inserts) | Boltz, OpenFold, Chai (after conversion) |
| .sto | Stockholm | Boltz, OpenFold, Chai (after conversion) |
| .pqt, .aligned.pqt | Parquet (Chai-native) | Chai (no conversion) |

ESMFold ignores any MSA. AlphaFold 2 ignores uploaded MSAs and always builds its own from BV-BRC’s local databases.

Output

Every submission creates a job with the name you give it. A workspace object named after the job is created inside the Output Folder and holds all job-related info — parameters, status, logs, and the prediction results themselves. The full workspace path is shown in the form’s Result location bar:

<Output Folder>/<Job Name>

Output Folder

The workspace folder where the job will be created. The folder must already exist; if it does not, create one from the folder selector.

Job Name

Identifier for this run. Used as the workspace object name. Pre-filled with PredictStructure-<YYMMDD>-<HHMMSS> so a fresh form always has a unique default; replace it with something descriptive (e.g. crambin-esmfold) when it helps.
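
As an illustration of the default naming pattern and the Result location path, here is a small sketch. The function names are hypothetical, the use of local time is an assumption (the text does not state a time zone), and the workspace path in the example is made up.

```python
from datetime import datetime


def default_job_name(now=None):
    """Build a PredictStructure-<YYMMDD>-<HHMMSS> name (local time is assumed)."""
    now = datetime.now() if now is None else now
    return now.strftime("PredictStructure-%y%m%d-%H%M%S")


def result_location(output_folder, job_name):
    """Join folder and job name as <Output Folder>/<Job Name>."""
    return f"{output_folder.rstrip('/')}/{job_name}"


stamp = datetime(2025, 3, 7, 14, 5, 9)
print(default_job_name(stamp))  # PredictStructure-250307-140509
# Hypothetical workspace folder, for illustration only.
print(result_location("/user@bvbrc/home/structures/", "crambin-esmfold"))
```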

Output Results

Every job produces a normalized directory tree, regardless of the underlying engine. This makes downstream comparison and visualization possible without per-tool branching.

<job_name>/
├── results.json              # CWL-style output manifest (paths + metadata)
├── report.html               # Interactive HTML viewer (3Dmol.js)
├── inputs/
│   ├── query.fasta           # Canonicalized input
│   ├── msa.a3m               # MSA actually used (if any)
│   └── manifest.yaml
├── predictions/
│   ├── rank_1.pdb            # Top-ranked structure
│   ├── rank_1.cif
│   ├── rank_2.pdb            # ...if --num-samples > 1
│   └── ...
├── reports/
│   ├── confidence.json       # Unified confidence schema
│   ├── plddt.csv             # Per-residue pLDDT
│   └── pae.png               # Predicted aligned error heatmap
├── metadata/
│   ├── tool.json             # Tool name, version, parameters used
│   ├── runtime.json          # Wall time, GPU model, peak memory
│   └── citations.bib
└── raw/                      # Untouched tool-native output (for power users)
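
Because the tree is the same for every engine, a downloaded job folder can be walked without per-tool branching. The sketch below lists the ranked structures in rank order; `ranked_structures` is a hypothetical helper, and it assumes the job folder has been downloaded to local disk with the `predictions/rank_<N>` naming shown above.

```python
import re
from pathlib import Path


def ranked_structures(job_dir, suffix=".pdb"):
    """Return predictions/rank_<N><suffix> paths sorted by numeric rank."""
    pattern = re.compile(r"rank_(\d+)" + re.escape(suffix) + r"$")
    found = []
    for path in (Path(job_dir) / "predictions").iterdir():
        match = pattern.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Sort on the integer rank so rank_10 follows rank_2 instead of rank_1.
    return [path for _, path in sorted(found)]
```

For example, `ranked_structures("crambin-esmfold")[0]` would return the path to `rank_1.pdb` in a local copy of a job named crambin-esmfold, and passing `suffix=".cif"` selects the mmCIF files instead.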

Confidence metrics

The unified confidence.json contains the metrics common across tools, with tool-specific extras passed through:

| Metric | Range | Meaning |
|---|---|---|
| plddt_mean | 0–100 | Per-residue confidence averaged across the chain. >90 highly confident; 70–90 confident; 50–70 low; <50 disordered |
| ptm | 0–1 | Predicted TM-score for monomer / multimer global fold |
| iptm | 0–1 | Interface predicted TM-score (multi-chain / ligand interfaces) |
| model_confidence | 0–1 | Aggregate ranking score (tool-specific weighting of pTM, ipTM, pLDDT) |
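
The pLDDT bands and the four shared metrics can be read programmatically. The sketch below assumes the metrics are top-level keys of a flat JSON object, which the text does not confirm, and it assigns the boundary values 90, 70, and 50 to the lower-confidence side of each cut as an arbitrary choice. The sample values are made up and are not real service output.

```python
import json


def plddt_band(plddt_mean):
    """Map a mean pLDDT (0-100) to the bands listed in the table."""
    if plddt_mean > 90:
        return "highly confident"
    if plddt_mean >= 70:
        return "confident"
    if plddt_mean >= 50:
        return "low"
    return "disordered"


def summarize(confidence_text):
    """Pull the four shared metrics out of a confidence.json string."""
    data = json.loads(confidence_text)
    keys = ("plddt_mean", "ptm", "iptm", "model_confidence")
    summary = {key: data.get(key) for key in keys}
    if summary["plddt_mean"] is not None:
        summary["plddt_band"] = plddt_band(summary["plddt_mean"])
    return summary


# Made-up values for illustration only.
sample = '{"plddt_mean": 84.2, "ptm": 0.81, "iptm": null, "model_confidence": 0.79}'
print(summarize(sample))
```

A monomer job has no interface, so the sketch tolerates a missing or null `iptm` rather than treating it as an error.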

Visualizing the structure

Open report.html in the workspace viewer. The page embeds 3Dmol.js with chain-colored, pLDDT-colored, and confidence-colored representations, plus a draggable PAE heatmap.

Action buttons

After selecting an output file by clicking it, the right-hand Action Bar offers:

  • Hide / Show — toggles the Details Pane

  • Guide — link to this Quick Reference

  • Download — downloads the selected file

  • View — opens the file (PDB → 3Dmol viewer, HTML → rendered, JSON → text)

  • Delete / Rename / Copy / Move — standard workspace operations

Resource estimates

These are conservative ceilings the AppService uses for queue scheduling. Actual runtime is usually well below the cap.

| Tool | CPUs | Memory | Default time-out | Typical 200 aa monomer |
|---|---|---|---|---|
| Boltz-2 | 8 | 64 GB | 4 h | 1–3 min |
| OpenFold 3 | 8 | 64 GB | 4 h | 2–5 min |
| Chai-1 | 8 | 64 GB | 4 h | 2–5 min |
| AlphaFold 2 | 16 | 96 GB | 8 h | 30–90 min (MSA + 5 models) |
| ESMFold | 8 | 32 GB | 1 h | 20–60 s (GPU) / 2–6 min (CPU) |

References

  • Jumper J et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). doi:10.1038/s41586-021-03819-2

  • Lin Z et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023). doi:10.1126/science.ade2574

  • Abramson J et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024). doi:10.1038/s41586-024-07487-w

  • Wohlwend J et al. Boltz-1: democratizing biomolecular interaction modeling. bioRxiv (2024). doi:10.1101/2024.11.19.624167

  • Passaro S et al. Boltz-2: towards accurate and efficient binding affinity prediction. bioRxiv (2025). PMC12262699

  • Chai Discovery. Chai-1: decoding the molecular interactions of life. bioRxiv (2024). doi:10.1101/2024.10.10.615955

  • Ahdritz G et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nature Methods 21, 1514–1524 (2024).

  • Mirdita M et al. ColabFold: making protein folding accessible to all. Nature Methods 19, 679–682 (2022).