BV-BRC Data Dictionary

  • antibiotics - Chemical, pharmacological and mechanistic details for antibacterial agents used in AMR analytics and clinical-isolate annotation.

  • bioset - Represents a contrast or set of statistically significant entities (genes, proteins, metabolites, etc.) derived from an experiment — typically a differential-expression list or time-point cluster.

  • bioset_result - Stores per-entity quantitative results (counts, log2 FC, p-values, z-scores…) that belong to a bioset so BV-BRC can render volcano plots, heat maps and enrichment tables.

  • enzyme_class_ref - Provides the authoritative mapping between EC numbers, their official textual descriptions, and corresponding GO terms, enabling enzyme look-ups and pathway annotation.

  • epitope - Captures experimentally verified immune epitopes mapped to BV-BRC proteins, including sequence, host context, assay counts and taxonomy.

  • epitope_assay - Holds detailed experimental assay records (B-cell, T-cell, MHC binding, etc.) linked to epitopes, with measurement values, host data and literature citations.

  • experiment - High-level metadata for functional-omics experiments (transcriptomics, proteomics, metabolomics, etc.), linking them to biosets, genomes, treatments and publications.

  • feature_sequence - Canonical raw sequence for every unique gene, RNA or protein encoded in BV-BRC genomes, keyed by MD5 for rapid identity checks and deduplication.

  • gene_ontology_ref - Maps GO identifiers to names, definitions and ontology namespaces so BV-BRC can annotate features and run GO-term enrichment.

  • genome - Rich sample, assembly and annotation metadata for every genome in BV-BRC, powering search, filters and downstream analyses across ∼1 M assemblies.

  • genome_amr - Antimicrobial-resistance phenotype and/or genotype evidence linked to each genome, including lab methods, computational predictions and literature support.

  • genome_feature - All annotated features—coding genes, RNAs, repeats—across BV-BRC genomes, with coordinates, functional annotation and family membership.

  • genome_sequence - Every nucleotide replicon (chromosome, plasmid, contig, viral segment, etc.) associated with a genome, with sequence, quality metrics and ACL information.

  • id_ref - Cross-walk table linking BV-BRC features to external identifier systems (NCBI GeneID, Ensembl, Pfam, etc.) for inter-database annotation and search.

  • misc_niaid_sgc - Tracks gene/protein targets selected by the NIAID Structural Genomics Centers, indicating clone/protein availability and project status.

  • pathway Maps - enzyme-encoding features in a genome to Pathway Tools / MetaCyc pathways, enabling metabolic reconstructions and presence/absence matrices.

  • pathway_ref - Master lookup linking each EC activity to its position on curated pathway diagrams plus global occurrence counts for enrichment statistics.

  • ppi - Pairwise protein–protein interaction metadata (detection methods, scores, literature) for network visualisation and comparative interactomics.

  • protein_feature - Domain, motif, signal-peptide and other protein-level annotations identified in CDS translations, with coordinates and statistical scores.

  • protein_family_ref - Lookup for every PATRIC protein-family accession (PGFam / PLFam) with descriptive name and namespace.

  • protein_structure - Metadata and file pointers for every PDB entry mapped to BV-BRC genomes, including experimental details and chain-to-gene mappings.

  • sequence_feature - Per-genome sequence features such as SNPs, indels or peptide motifs, linking them to canonical definitions, evidence codes and literature.

  • sequence_feature_vt - Tracks each specific variant form of a canonical sequence feature across genomes, recording prevalence and variant sequences.

  • serology - Laboratory serology results linked to host metadata and matching viral genomes, enabling antibody-prevalence and vaccine-impact studies.

  • sp_gene - Lists antimicrobial-resistance, virulence and other “specialty” genes detected in genomes, with evidence metrics and curated classifications.

  • sp_gene_ref - Master reference catalogue of specialty-gene alleles (AMR, virulence, toxins) with drug associations and literature support.

  • spike_lineage - Month-by-month statistics for SARS-CoV-2 Pango lineages (growth, prevalence, VOC status) and their defining spike mutations.

  • spike_variant - Monthly global statistics for notable SARS-CoV-2 spike-protein variants (D614G, etc.) including prevalence, growth rate and lineage diversity.

  • strain - Per-strain metadata for segmented viruses (esp. influenza), linking segment accessions, host/geographic info and genome records.

  • structured_assertion - Machine-readable statements about genomic features (e.g. “gene confers ciprofloxacin resistance”) with evidence codes and citations.

  • subsystem - Links individual genes in a genome to SEED subsystems and roles, enabling subsystem heat-maps and completeness metrics.

  • subsystem_ref - Reference hierarchy for SEED subsystems, including descriptions, literature and role lists.

  • surveillance - Rich epidemiological, clinical and environmental metadata linking genomes to patient or wildlife surveillance information.

  • taxonomy - NCBI lineage metadata plus genome-derived statistics (genome count, GC content, etc.) for every taxonomic node represented in BV-BRC.

  • enzyme_class_ref - Authoritative mapping between EC numbers, textual descriptions and GO terms for enzyme look-ups and pathway annotation.

  • pathway Maps - enzyme-encoding features to pathways enabling metabolic reconstructions.