# Protein Structure Prediction Service

## Overview

The Protein Structure Prediction Service predicts the 3D atomic structure of proteins, protein complexes, and protein–DNA / RNA / ligand assemblies from sequence input. It exposes five state-of-the-art folding engines through a single unified form:

| Engine | Family | Best for |
|---|---|---|
| **Boltz-2** | Diffusion (co-folding) | Protein + nucleic-acid + ligand / SMILES; binding-affinity-aware |
| **OpenFold 3** | Diffusion (AF3-class, open) | Same scope as Boltz, fully open weights |
| **Chai-1** | Diffusion (AF3-class) | Multi-chain protein complexes, ligands by CCD code |
| **AlphaFold 2** | Co-evolutionary (MSA + Evoformer) | High-accuracy monomer / multimer when the MSA is rich |
| **ESMFold** | Single-sequence (protein language model) | Fast, CPU-capable monomers; orphan / designed sequences |

The service picks the best available engine automatically when `Prediction Tool` is left at **Auto**, or you can pin a specific one. Parameter mapping, format conversion (FASTA → YAML / JSON, mmCIF → PDB, A3M → Parquet), and confidence normalization are handled for you. The output is a ranked set of structures plus a unified confidence report.

## See also

- [Protein Structure Prediction Service](https://bv-brc.org/app/PredictStructure)
- [Protein Structure Prediction Tutorial](/tutorial/predict_structure/predict_structure)

## Using the Protein Structure Prediction Service

The **Protein Structure Prediction** submenu option under the **Services** main menu (Protein Tools category) opens the prediction input form.

*Note: You must be logged into BV-BRC to use this service.*

## Options

The form is organized into **Prediction Tool**, **Biomolecular Inputs**, **Ligands**, **Multiple Sequence Alignment**, and **Output** sections. Fields that are not relevant to the selected engine remain visible but are ignored; the help text notes which engines use each field.

## Prediction Tool

Choose the structure prediction engine.
Leaving this at **Auto** lets the service pick the best available tool given your inputs.

| **Tool** | **Model/Version** | **Entity support** | **MSA handling** |
|:---|:---|---|---|
| `auto` | (auto-select) | all | Depends on the selected tool |
| `boltz` | Boltz-2 | protein, DNA, RNA, CCD ligand, SMILES | Required (upload or MSA server) |
| `openfold` | OpenFold 3 | protein, DNA, RNA, CCD ligand, SMILES | Required (upload or MSA server) |
| `chai` | Chai-1 | protein, DNA, RNA, CCD ligand | Required (upload or MSA server) |
| `alphafold` | AlphaFold 2 | protein only | Built from BV-BRC's local databases |
| `esmfold` | ESMFold | protein only | None (single-sequence) |

**Auto-select priority** (when `tool = auto`): Boltz → OpenFold → Chai → ESMFold → AlphaFold. The selector inspects your other inputs and falls back as follows:

| Protein | DNA / RNA / ligand / SMILES | MSA File | → Auto picks |
|:-:|:-:|:-:|---|
| ✓ | — | — | ESMFold (fast single-sequence); AlphaFold if ESMFold is unavailable |
| ✓ | — | ✓ | Boltz → OpenFold → Chai → ESMFold → AlphaFold |
| ✓ | ✓ | — | **ERROR**: diffusion tools need an MSA, and AlphaFold / ESMFold cannot use DNA / RNA / ligands |
| ✓ | ✓ | ✓ | Boltz → OpenFold → Chai |
| — | DNA / RNA only | any | Boltz → OpenFold → Chai |

## Biomolecular Inputs

Supply at least one FASTA file with **Protein**, **DNA**, or **RNA** sequences, one file per sequence type. Files are selected from your workspace.

### Protein

Protein sequence(s) in FASTA format (`.fasta`, `.fa`). For a multi-chain complex, place all chains in a single file; each `>` record is one chain. The service applies a soft limit of 26 chains and 10,000 total residues per job.

Boltz also accepts a native YAML manifest in place of FASTA for full feature support (custom constraints, covalent bonds, modified residues).

### DNA

DNA sequence(s) in FASTA format, used as co-folding partners for proteins.

**Tools that support DNA:** Boltz-2, OpenFold 3, Chai-1. Ignored by AlphaFold 2 and ESMFold.

### RNA

RNA sequence(s) in FASTA format. Same engine support as DNA.
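Taken together, the inputs above determine which engine **Auto** chooses, following the fallback table in the **Prediction Tool** section. The Python sketch below restates that table as a small decision procedure so the precedence is easy to check; it is not the service's implementation. `JobInputs`, `candidate_order`, and `auto_select` are hypothetical names, `has_msa` stands for the table's **MSA File** column, and rejecting combinations the table does not list (for example, ligands with no protein or nucleic acid) is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Hypothetical restatement of the Auto fallback table; not BV-BRC's code.
# Engine names match the `tool` values listed in this guide.
DIFFUSION: Tuple[str, ...] = ("boltz", "openfold", "chai")
PROTEIN_WITH_MSA: Tuple[str, ...] = DIFFUSION + ("esmfold", "alphafold")
PROTEIN_NO_MSA: Tuple[str, ...] = ("esmfold", "alphafold")


@dataclass(frozen=True)
class JobInputs:
    has_protein: bool
    has_nucleic_acid: bool  # DNA or RNA FASTA supplied
    has_ligand: bool        # CCD codes or SMILES supplied
    has_msa: bool           # the table's "MSA File" column


def candidate_order(job: JobInputs) -> Tuple[str, ...]:
    """Engines Auto would try, in priority order, for this input combination."""
    has_other = job.has_nucleic_acid or job.has_ligand
    if job.has_protein and not has_other:
        return PROTEIN_WITH_MSA if job.has_msa else PROTEIN_NO_MSA
    if job.has_protein:
        if not job.has_msa:
            raise ValueError("diffusion tools need an MSA; AlphaFold 2 and "
                             "ESMFold cannot use DNA, RNA, or ligands")
        return DIFFUSION
    if job.has_nucleic_acid and not job.has_ligand:
        return DIFFUSION  # DNA / RNA only: MSA optional
    # Assumption: combinations absent from the table (or no inputs) are rejected.
    raise ValueError("input combination not covered by the Auto table")


def auto_select(job: JobInputs, available: Iterable[str]) -> str:
    """Return the first engine in priority order that is currently available."""
    online = set(available)
    for tool in candidate_order(job):
        if tool in online:
            return tool
    raise RuntimeError("no suitable engine is available for these inputs")


# Protein-only job, no MSA, ESMFold offline: falls back to AlphaFold.
print(auto_select(JobInputs(True, False, False, False), ["boltz", "alphafold"]))
# prints "alphafold"
```

Keeping `candidate_order` separate from the availability check puts the documented precedence in one place; only the list of available engines changes when a tool is offline.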
## Ligands

Optional small-molecule ligands to co-fold with the proteins. Supported by Boltz-2, OpenFold 3, and Chai-1.

The form provides one ligand input with a **Notation** selector: pick **CCD codes** or **SMILES strings** and enter one ligand per line. Entries are validated as you type, and the first invalid line is reported inline. A submission can carry only one notation, so if you need both standard cofactors and a novel small molecule, pick the notation that covers the less routine entries; typically that is SMILES, since a novel compound has no CCD code while a standard cofactor can also be written as SMILES.

- **CCD codes** — Chemical Component Dictionary identifiers of 1–3 alphanumeric characters, e.g. `ATP`, `NAD`, `HEM`. Glycans use their CCD codes here too (`NAG`, `MAN`, `BMA`); there is no separate glycan input.
- **SMILES strings** — arbitrary small molecules expressed as SMILES (e.g. `CCO` for ethanol). Validation is syntactic only.

## Multiple Sequence Alignment

The **MSA Source** selector controls how the multiple sequence alignment is supplied:

| Source | What happens | When to use |
|---|---|---|
| **None** | No MSA is supplied. | Default. Works with Auto (which picks ESMFold for single-protein, no-MSA inputs), ESMFold, and AlphaFold 2 (which builds its own MSA from BV-BRC's local databases). |
| **Precomputed MSA from Workspace** | A workspace file selector appears; pick a precomputed `.a3m`, `.sto`, or `.pqt` file. The service uses it as-is. | When you already have an MSA for Boltz-2, OpenFold 3, or Chai-1, each of which requires one. Generate the MSA elsewhere (ColabFold's MMseqs2 server, JackHMMER, or the AlphaFold preprocessing pipeline) and upload the result to your workspace. |
| **Use MSA Server or Service** | BV-BRC computes the MSA with ColabFold (MMseqs2 against UniRef + ColabFoldDB) and feeds it to the selected engine. | When you don't have a precomputed MSA on hand and the selected tool needs one (Boltz, OpenFold, Chai). Adds 30 s – 3 min to the job. |

Accepted formats for uploaded MSAs:

| Extension | Format | Used by |
|---|---|---|
| `.a3m` | A3M (FASTA-like with lowercase inserts) | Boltz, OpenFold, Chai (after conversion) |
| `.sto` | Stockholm | Boltz, OpenFold, Chai (after conversion) |
| `.pqt`, `.aligned.pqt` | Parquet (Chai-native) | Chai (no conversion) |

ESMFold ignores any MSA. AlphaFold 2 ignores uploaded MSAs and always builds its own from BV-BRC's local databases.

## Output

Every submission creates a **job** with the name you give it. A workspace object named after the job is created inside the Output Folder and holds all job-related information: parameters, status, logs, and the prediction results themselves. The full workspace path is shown in the form's *Result location* bar:

```
/
```

### Output Folder

The workspace folder where the job will be created. It must already exist; you can create one from the folder selector.

### Job Name

Identifier for this run, used as the workspace object name. It is pre-filled with `PredictStructure--` so a fresh form always has a unique default; replace it with something descriptive (e.g. `crambin-esmfold`) when that helps.

## Output Results

Every job produces a normalized directory tree regardless of the underlying engine, so results from different tools can be compared and visualized without per-tool handling.

```
/
├── results.json          # CWL-style output manifest (paths + metadata)
├── report.html           # Interactive HTML viewer (3Dmol.js)
├── inputs/
│   ├── query.fasta       # Canonicalized input
│   ├── msa.a3m           # MSA actually used (if any)
│   └── manifest.yaml
├── predictions/
│   ├── rank_1.pdb        # Top-ranked structure
│   ├── rank_1.cif
│   ├── rank_2.pdb        # ...if --num-samples > 1
│   └── ...
├── reports/
│   ├── confidence.json   # Unified confidence schema
│   ├── plddt.csv         # Per-residue pLDDT
│   └── pae.png           # Predicted aligned error heatmap
├── metadata/
│   ├── tool.json         # Tool name, version, parameters used
│   ├── runtime.json      # Wall time, GPU model, peak memory
│   └── citations.bib
└── raw/                  # Untouched tool-native output (for power users)
```

### Confidence metrics

The unified `confidence.json` contains the metrics common across tools; tool-specific extras are passed through unchanged:

| Metric | Range | Meaning |
|---|---|---|
| `plddt_mean` | 0–100 | Per-residue confidence (pLDDT) averaged across the chain. >90 highly confident; 70–90 confident; 50–70 low; <50 likely disordered |
| `ptm` | 0–1 | Predicted TM-score for the global fold of the monomer / multimer |
| `iptm` | 0–1 | Interface predicted TM-score (multi-chain / ligand interfaces) |
| `model_confidence` | 0–1 | Aggregate ranking score (tool-specific weighting of pTM, ipTM, pLDDT) |

### Visualizing the structure

Open `report.html` in the workspace viewer. The page embeds [3Dmol.js](https://3dmol.csb.pitt.edu/) with chain-colored, pLDDT-colored, and confidence-colored representations, plus a draggable PAE heatmap.

### Action buttons

After you select an output file by clicking it, the Action Bar on the right offers:

- **Hide / Show** — toggles the Details Pane
- **Guide** — links to this Quick Reference
- **Download** — downloads the selected file
- **View** — opens the file (PDB → 3Dmol viewer, HTML → rendered page, JSON → text)
- **Delete / Rename / Copy / Move** — standard workspace operations

## Resource estimates

These are conservative ceilings the AppService uses for queue scheduling; actual runtime is usually well below them.
| Tool | CPUs | Memory | Default time-out | Typical runtime (200-aa monomer) |
|---|---|---|---|---|
| Boltz-2 | 8 | 64 GB | 4 h | 1–3 min |
| OpenFold 3 | 8 | 64 GB | 4 h | 2–5 min |
| Chai-1 | 8 | 64 GB | 4 h | 2–5 min |
| AlphaFold 2 | 16 | 96 GB | 8 h | 30–90 min (MSA + 5 models) |
| ESMFold | 8 | 32 GB | 1 h | 20–60 s (GPU) / 2–6 min (CPU) |

## References

- Jumper J et al. **Highly accurate protein structure prediction with AlphaFold.** *Nature* 596, 583–589 (2021). [doi:10.1038/s41586-021-03819-2](https://doi.org/10.1038/s41586-021-03819-2)
- Lin Z et al. **Evolutionary-scale prediction of atomic-level protein structure with a language model.** *Science* 379, 1123–1130 (2023). [doi:10.1126/science.ade2574](https://doi.org/10.1126/science.ade2574)
- Abramson J et al. **Accurate structure prediction of biomolecular interactions with AlphaFold 3.** *Nature* 630, 493–500 (2024). [doi:10.1038/s41586-024-07487-w](https://doi.org/10.1038/s41586-024-07487-w)
- Wohlwend J et al. **Boltz-1: democratizing biomolecular interaction modeling.** bioRxiv (2024). [doi:10.1101/2024.11.19.624167](https://doi.org/10.1101/2024.11.19.624167)
- Passaro S et al. **Boltz-2: towards accurate and efficient binding affinity prediction.** bioRxiv (2025). [PMC12262699](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12262699/)
- Chai Discovery. **Chai-1: decoding the molecular interactions of life.** bioRxiv (2024). [doi:10.1101/2024.10.10.615955](https://doi.org/10.1101/2024.10.10.615955)
- Ahdritz G et al. **OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization.** *Nature Methods* 21, 1514–1524 (2024).
- Mirdita M et al. **ColabFold: making protein folding accessible to all.** *Nature Methods* 19, 679–682 (2022).