BV-BRC Data Dictionary
antibiotics - Chemical, pharmacological and mechanistic details for antibacterial agents used in AMR analytics and clinical-isolate annotation.
bioset - Represents a contrast or set of statistically significant entities (genes, proteins, metabolites, etc.) derived from an experiment — typically a differential-expression list or time-point cluster.
bioset_result - Stores per-entity quantitative results (counts, log2 FC, p-values, z-scores…) that belong to a bioset so BV-BRC can render volcano plots, heat maps and enrichment tables.
enzyme_class_ref - Provides the authoritative mapping between EC numbers, their official textual descriptions, and corresponding GO terms, enabling enzyme look-ups and pathway annotation.
epitope - Captures experimentally verified immune epitopes mapped to BV-BRC proteins, including sequence, host context, assay counts and taxonomy.
epitope_assay - Holds detailed experimental assay records (B-cell, T-cell, MHC binding, etc.) linked to epitopes, with measurement values, host data and literature citations.
experiment - High-level metadata for functional-omics experiments (transcriptomics, proteomics, metabolomics, etc.), linking them to biosets, genomes, treatments and publications.
feature_sequence - Canonical raw sequence for every unique gene, RNA or protein encoded in BV-BRC genomes, keyed by MD5 for rapid identity checks and deduplication.
gene_ontology_ref - Maps GO identifiers to names, definitions and ontology namespaces so BV-BRC can annotate features and run GO-term enrichment.
genome - Rich sample, assembly and annotation metadata for every genome in BV-BRC, powering search, filters and downstream analyses across ∼1 M assemblies.
genome_amr - Antimicrobial-resistance phenotype and/or genotype evidence linked to each genome, including lab methods, computational predictions and literature support.
genome_feature - All annotated features—coding genes, RNAs, repeats—across BV-BRC genomes, with coordinates, functional annotation and family membership.
genome_sequence - Every nucleotide replicon (chromosome, plasmid, contig, viral segment, etc.) associated with a genome, with sequence, quality metrics and ACL information.
id_ref - Cross-walk table linking BV-BRC features to external identifier systems (NCBI GeneID, Ensembl, Pfam, etc.) for inter-database annotation and search.
misc_niaid_sgc - Tracks gene/protein targets selected by the NIAID Structural Genomics Centers, indicating clone/protein availability and project status.
pathway - Maps enzyme-encoding features in a genome to Pathway Tools / MetaCyc pathways, enabling metabolic reconstructions and presence/absence matrices.
pathway_ref - Master lookup linking each EC activity to its position on curated pathway diagrams plus global occurrence counts for enrichment statistics.
ppi - Pairwise protein–protein interaction metadata (detection methods, scores, literature) for network visualisation and comparative interactomics.
protein_feature - Domain, motif, signal-peptide and other protein-level annotations identified in CDS translations, with coordinates and statistical scores.
protein_family_ref - Lookup for every PATRIC protein-family accession (PGFam / PLFam) with descriptive name and namespace.
protein_structure - Metadata and file pointers for every PDB entry mapped to BV-BRC genomes, including experimental details and chain-to-gene mappings.
sequence_feature - Per-genome sequence features such as SNPs, indels or peptide motifs, linking them to canonical definitions, evidence codes and literature.
sequence_feature_vt - Tracks each specific variant form of a canonical sequence feature across genomes, recording prevalence and variant sequences.
serology - Laboratory serology results linked to host metadata and matching viral genomes, enabling antibody-prevalence and vaccine-impact studies.
sp_gene - Lists antimicrobial-resistance, virulence and other “specialty” genes detected in genomes, with evidence metrics and curated classifications.
sp_gene_ref - Master reference catalogue of specialty-gene alleles (AMR, virulence, toxins) with drug associations and literature support.
spike_lineage - Month-by-month statistics for SARS-CoV-2 Pango lineages (growth, prevalence, VOC status) and their defining spike mutations.
spike_variant - Monthly global statistics for notable SARS-CoV-2 spike-protein variants (D614G, etc.) including prevalence, growth rate and lineage diversity.
strain - Per-strain metadata for segmented viruses (especially influenza), linking segment accessions, host/geographic info and genome records.
structured_assertion - Machine-readable statements about genomic features (e.g. “gene confers ciprofloxacin resistance”) with evidence codes and citations.
subsystem - Links individual genes in a genome to SEED subsystems and roles, enabling subsystem heat-maps and completeness metrics.
subsystem_ref - Reference hierarchy for SEED subsystems, including descriptions, literature and role lists.
surveillance - Rich epidemiological, clinical and environmental metadata linking genomes to patient or wildlife surveillance information.
taxonomy - NCBI lineage metadata plus genome-derived statistics (genome count, GC content, etc.) for every taxonomic node represented in BV-BRC.
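
As an illustration of how the collection names listed above might be used, here is a minimal sketch that builds a query URL for one collection. It assumes the collections are exposed through the BV-BRC Data API at the base URL shown and that queries use RQL-style `eq(...)`, `select(...)` and `limit(...)` operators. The base URL, the helper `build_query_url`, and the field names in the usage example (`antibiotic`, `genome_id`, `resistant_phenotype`) are assumptions for illustration and are not taken from this dictionary. The sketch only constructs a string and makes no network request.

```python
from urllib.parse import quote

# Assumed base URL for the BV-BRC Data API; check the current API docs.
BASE_URL = "https://www.bv-brc.org/api"

# A handful of collection names copied from the dictionary above.
KNOWN_COLLECTIONS = {
    "genome", "genome_amr", "genome_feature", "genome_sequence",
    "sp_gene", "taxonomy", "pathway", "subsystem",
}


def build_query_url(collection, filters, fields=None, limit=25):
    """Build an RQL-style query URL for one collection (hypothetical helper).

    filters: dict of field -> value, combined as equality tests.
    fields:  optional list of field names to return.
    """
    if collection not in KNOWN_COLLECTIONS:
        raise ValueError(f"unknown collection: {collection}")
    # Percent-encode names and values so special characters stay literal.
    parts = [
        f"eq({quote(str(k), safe='')},{quote(str(v), safe='')})"
        for k, v in filters.items()
    ]
    if fields:
        parts.append("select(" + ",".join(fields) + ")")
    parts.append(f"limit({int(limit)})")
    return f"{BASE_URL}/{collection}/?" + "&".join(parts)


# Field names here are assumed, not confirmed by this dictionary.
url = build_query_url(
    "genome_amr",
    {"antibiotic": "ciprofloxacin"},
    fields=["genome_id", "resistant_phenotype"],
    limit=5,
)
print(url)
```

The collection name is validated against a small allow-list so that a typo fails early rather than producing a URL for a table that does not exist. Extend the set with other entries from the dictionary as needed.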