Genome Feature Data

Data Type: genome_feature

Primary Key: feature_id

Attributes

aa_length (integer) - Number of amino-acid residues in the translated product of a CDS. Used in quality checks, alignment trimming, and size filters. 328
aa_sequence_md5 (string) - 32-char MD5 hash for the amino-acid sequence; enables rapid duplicate detection without shipping the full sequence. "d41d8cd98f00b204e9800998ecf8427e"
accession (string) - GenBank/RefSeq accession of the replicon on which the feature resides. "NC_000913.3"
alt_locus_tag (string) - Historic or secondary locus tag from earlier annotations; improves cross-version mapping. "b0001_old"
annotation (string) - Label such as PATRIC, RASTtk, RefSeq; lets users compare pipelines. "PATRIC"
brc_id (string) - Internal monotonically increasing integer (string-typed) uniquely identifying the feature across clusters. "126547189"
classifier_round (integer) - Training round number used to generate the current classifier_score; supports auditing. 3
classifier_score (number) - Confidence (0–1) that the CDS is a true protein-coding gene according to BV-BRC’s machine-learning model. 0.97
codon_start (integer) - 1, 2 or 3 value sent to GenBank / tbl2asn to mark the first coding frame relative to feature start. 1
date_inserted (date) - ISO-8601 UTC date when the feature row entered BV-BRC. "2023-04-12T15:41:22Z"
date_modified (date) - Updated any time an attribute (location, product name, etc.) changes; drives incremental exports. "2025-05-03T07:18:51Z"
end (integer) - 1-based inclusive coordinate on the replicon. 40365
feature_id * (string) - Stable identifier (`fig|taxon.replicon.peg.N` for CDS; `.rna.`, `.repeat.`, etc. for others).
feature_type (string) - Controlled values: CDS, rRNA, tRNA, misc_feature, repeat_region, etc. "CDS"
figfam_id (string) - Protein family ID from legacy FIGfam scheme; enables compatibility with older PATRIC tools. "FIG00012345"
gene (case insensitive string) - Short locus name (recA, rpoB). Case-insensitive for search. "recA"
gene_id (number) - Numeric GeneID from NCBI Gene when available. 947015
genome_id (string) - Foreign key linking to the genome metadata record. "511145.183"
genome_name (case insensitive string) - De-normalised for quick display. "Escherichia coli K-12 MG1655"
go (array of case insensitive strings) - List of Gene Ontology identifiers assigned to the protein; supports functional enrichment. [ "GO:0003677", "GO:0006310" ]
location (string) - Concise PATRIC location format contig_start+len or contig_start-len (strand encoded by sign). "NC_000913.3_190..1188"
na_length (integer) - Feature span in nucleotides. 999
na_sequence_md5 (string) - Hash of the genomic DNA sequence underlying the feature; speeds up variant detection. "0cc175b9c0f1b6a831c399e269772661"
notes (array of strings) - Free-text remarks (e.g. “pseudogene fragment”, “frameshift at pos 678”). [ "frameshift at 675-677" ]
og_id (string) - Pan-domain orthologous group (e.g. eggNOG) used for cross-kingdom analyses. "COG0468"
owner (string) - BV-BRC user or group that controls write access. "patric_public"
p2_feature_id (number) - Numeric key from retired schema; retained for backward compatibility. 21987654
patric_id (string) - Historical alias identical to feature_id for CDS; kept to avoid breaking old APIs. `"fig
pdb_accession (array of strings) - List of matching PDB entries for the protein. [ "1A2B", "6VSB" ]
pgfam_id (string) - Global protein family ID (cross-genus, length-normalized clustering). "PGF_00001234"
plfam_id (string) - Local (within-genus) protein family ID; finer granularity than PGFam. "PLF_1234567"
prediction_method (string)
product (case insensitive string) - Curated or predicted functional description; shown in browsers and BLAST. "DNA repair protein RecA"
property (array of strings) - Extra structured qualifiers (e.g. signal_peptide, transmembrane). [ "transmembrane", "lipoprotein" ]
protein_id (string) - INSDC / RefSeq protein accession for the translated product. "WP_000011355.1"
public (boolean) - true if the feature is visible to all users; false for private genomes. true
refseq_locus_tag (string) - Official NCBI locus tag for RefSeq annotation sets. "ECOLI_RS00001"
segments (array of strings) - For spliced genes: array of start..end regions; used by GFF exporters. [ "190..350", "500..1188" ]
sequence_id (string) - Links to genome_sequence.sequence_id (chromosome, plasmid, or viral segment). "NC_000913.3"
sog_id (string) - Finer split of og_id based on bidirectional best hits; helps detect recent duplications. "SOG_987654"
start (integer) - 1-based inclusive start coordinate. 190
strand (string) - "+" or "-"; required for translation and plotting. "+"
taxon_id (integer) - Taxon of the parent genome; used for taxon-restricted searches. 562
uniprotkb_accession (string) - Primary accession mapping to UniProt for functional & structural metadata. "P0A7V8"
user_read (array of strings) - BV-BRC user/org IDs allowed to view the record (includes "public" for open data). [ "public" ]
user_write (array of strings) - User/org IDs permitted to edit the record. [ "maulik@bvbrc.org" ]
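
As a rough illustration of how a few of these attributes relate, the sketch below recomputes na_length from the 1-based inclusive start and end coordinates of a contiguous feature and derives a 32-character MD5 hex digest for a sequence. The helper names are hypothetical, and uppercasing the sequence before hashing is an assumption, not documented BV-BRC behavior.

```python
import hashlib


def span_length(start: int, end: int) -> int:
    """Length of a contiguous feature given 1-based inclusive coordinates."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def sequence_md5(sequence: str) -> str:
    """MD5 hex digest of a sequence (uppercased first; this is an assumption)."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


if __name__ == "__main__":
    # start=190 and end=1188 are the example values from the attribute list.
    print(span_length(190, 1188))
    # Any MD5 hex digest is 32 characters long, matching the *_md5 attributes.
    print(len(sequence_md5("ACGT")))
```

For spliced features described by segments, the span between start and end may not equal the summed segment lengths, so this check applies only to contiguous features.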

API

GET :feature_id

Retrieve a genome_feature data object by feature_id

EXAMPLE

https://www.bv-brc.org/api/genome_feature/RefSeq.1001732.3.AKUQ01000008.CDS.655540.656001.rev
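
A minimal Python sketch of the same lookup, using only the standard library. The function names are hypothetical; percent-encoding the ID and the assumption that the response body is a single JSON object are not confirmed by this page.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.bv-brc.org/api/genome_feature"


def feature_url(feature_id: str) -> str:
    """Build the GET URL for one feature, percent-encoding the ID."""
    return f"{BASE_URL}/{urllib.parse.quote(feature_id, safe='')}"


def fetch_feature(feature_id: str, timeout: float = 30.0) -> dict:
    """Fetch one genome_feature object (performs a network call)."""
    request = urllib.request.Request(
        feature_url(feature_id), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the body is a single JSON object for the feature.
        return json.load(response)
```

Calling `fetch_feature("RefSeq.1001732.3.AKUQ01000008.CDS.655540.656001.rev")` requests the example URL above.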

QUERY :query

Query for genome_feature data objects with an RQL Query

Return Formats

Requests may include an HTTP Accept header with one of the following values to receive the results in the corresponding format.

application/json - Returns results as an array of JSON objects
application/solr+json - Returns results in Solr JSON response format
text/csv - Returns results in comma-separated values (CSV) format. Columns are separated by ','. Multi-value columns are separated by ';'. Rows are separated by newlines
text/tsv - Returns results in tab-separated values (TSV) format. Columns are separated by a tab. Multi-value columns are separated by ';'. Rows are separated by newlines
application/vnd.openxmlformats - Returns objects as an MS Excel document
application/dna+fasta - Returns DNA sequences for queries in FASTA format
application/protein+fasta - Returns Protein sequences for queries in FASTA format
application/dna+jsonh+fasta - Returns DNA sequences for queries in JSONH-FASTA format
application/protein+jsonh+fasta - Returns Protein sequences for queries in JSONH-FASTA format
application/gff - Returns genomic features in GFF format
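
The sketch below shows one way a client might select a return format through the Accept header and then unpack a text/csv response, splitting ';'-separated multi-value columns into lists. The helper names are hypothetical, and it assumes the CSV header row uses the attribute names listed above.

```python
import csv
import io
import urllib.request
from typing import Dict, Iterable, List


def build_request(url: str, accept: str = "application/json") -> urllib.request.Request:
    """Create a request that asks for a return format via the Accept header."""
    return urllib.request.Request(url, headers={"Accept": accept})


def parse_csv(text: str, multi_value_columns: Iterable[str]) -> List[Dict]:
    """Parse CSV text; split ';'-separated multi-value columns into lists."""
    columns = list(multi_value_columns)
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for column in columns:
            if column in row:
                # An empty cell becomes an empty list rather than [""].
                row[column] = row[column].split(";") if row[column] else []
        rows.append(row)
    return rows
```

Passing the request to `urllib.request.urlopen` and decoding the body as text would yield the input for `parse_csv`; the same approach works for text/tsv by giving `csv.DictReader` the argument `delimiter="\t"`.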

EXAMPLES

Query for genome_feature data objects with a feature_id equal to RefSeq.1001732.3.AKUQ01000008.CDS.655540.656001.rev. Return results as a JSON Array.
```
https://www.bv-brc.org/api/genome_feature/?eq(feature_id,RefSeq.1001732.3.AKUQ01000008.CDS.655540.656001.rev)
```
Query for genome features of genome 90370.851, limited to 5 results. Return JSON data.
```
https://www.bv-brc.org/api/genome_feature/?eq(genome_id,90370.851)&limit(5)
```

Query for genome features of genome 90370.851 with PATRIC annotation, limited to 5 sequences. Return DNA FASTA.

https://www.bv-brc.org/api/genome_feature/?and(eq(annotation,PATRIC),eq(genome_id,90370.851))&limit(5)&http_accept=application/dna+fasta
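
The query URLs in these examples can be assembled programmatically. The following is a small sketch with hypothetical helper names that composes the `eq`, `and`, and `limit` operators and the `http_accept` parameter exactly as they appear above; it does not percent-encode values, so it only suits values without reserved characters.

```python
BASE_URL = "https://www.bv-brc.org/api/genome_feature/"


def eq(field: str, value: str) -> str:
    """RQL equality clause, e.g. eq(genome_id,90370.851)."""
    return f"eq({field},{value})"


def and_(*clauses: str) -> str:
    """RQL conjunction of two or more clauses."""
    return f"and({','.join(clauses)})"


def limit(count: int) -> str:
    """RQL limit operator restricting the number of results."""
    return f"limit({count})"


def query_url(*terms: str, http_accept: str = "") -> str:
    """Join RQL terms with '&' and optionally append http_accept."""
    parts = list(terms)
    if http_accept:
        parts.append(f"http_accept={http_accept}")
    return BASE_URL + "?" + "&".join(parts)


if __name__ == "__main__":
    print(
        query_url(
            and_(eq("annotation", "PATRIC"), eq("genome_id", "90370.851")),
            limit(5),
            http_accept="application/dna+fasta",
        )
    )
```

Keeping each operator as a tiny function makes nested queries read close to the RQL they produce.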
