Genome Sequence Data
Contig, chromosome, and plasmid nucleotide sequences for a genome
Data Type: genome_sequence
Primary Key: sequence_id
Attributes
-
_version_
(number)
-
accession
(string)
- GenBank/RefSeq accession (with or without version) for this replicon; authoritative link to external databases. "NC_000913.3"
-
chromosome
(case insensitive string)
- Label for chromosomal replicons; may be empty for plasmids or viral segments. "Chromosome"
-
date_inserted
(date)
- ISO-8601 UTC datetime when the sequence row entered BV-BRC. "2023-05-14T12:33:21Z"
-
date_modified
(date)
- Updated whenever any field in the row changes; enables cache invalidation and incremental syncs. "2025-04-02T08:11:05Z"
-
description
(case insensitive string)
- Full DEFINITION from the GenBank record or submitter-supplied description. "Escherichia coli K-12 MG1655 complete genome"
-
gc_content
(number)
- 100 × (G + C) / length, stored as float with one decimal. 50.8
-
genome_id
(string)
- Stable BV-BRC genome_id that owns this sequence; foreign-key join target. "511145.183"
-
genome_name
(case insensitive string)
- Scientific name + strain for convenience display; denormalised from the genome record. "Escherichia coli K-12 MG1655"
-
gi
(integer)
- Historic GenInfo Identifier; null for accessions created after NCBI retired GIs (2016). 556503834
-
length
(integer)
- Total number of nucleotides in the replicon. 4641652
-
mol_type
(case insensitive string)
- Values such as genomic DNA, viral cRNA, plasmid DNA. Mirrors the GenBank MOL_TYPE qualifier. "genomic DNA"
-
owner
(string)
- BV-BRC user/org that controls the record; governs default ACLs. "patric_public"
-
p2_sequence_id
(integer)
- Numeric key from retired PATRIC2 schema; retained for cross-reference. 1234567
-
plasmid
(case insensitive string)
- Name/ID of plasmid when sequence_type = plasmid. "pO157"
-
public
(boolean)
- true → sequence is accessible to all users; false → restricted to workspace owners. true
-
release_date
(date)
- Date the sequence became publicly available in BV-BRC (often matches GenBank release). "1997-09-05T00:00:00Z"
-
segment
(case insensitive string)
- Segment label for segmented viruses (PB2, HA, S, L etc.); empty for non-segmented genomes. "HA"
-
sequence
()
- Complete FASTA string (A,C,G,T,N) stored in compressed form; served on demand for downloads and BLAST. "ATGAC...TAA"
-
sequence_id
*
(string)
- BV-BRC unique identifier for the sequence row; usually equals accession but guaranteed unique even for private drafts. "NC_000913.3"
-
sequence_md5
(string)
- MD5 hash of the raw sequence; enables rapid identity checks and deduplication. "b3b2a5b1d5fbb5e5e3d5e5b1d5fbb5e5"
-
sequence_status
(string)
- Controlled terms: complete, partial, draft, degapped; guides quality filters in UI. "complete"
-
sequence_type
(case insensitive string)
- chromosome, plasmid, contig, scaffold, segment, etc.; determines iconography and default ordering. "chromosome"
-
taxon_id
(integer)
- Numeric taxon identifier inherited from parent genome; used for taxon-scoped searches. 562
-
topology
(case insensitive string)
- circular or linear; affects downstream tools like GC-skew plots. "circular"
-
user_read
(array of strings)
- BV-BRC user/org IDs allowed to view the record (includes "public" for public data). [ "public" ]
-
user_write
(array of strings)
- User/org IDs with edit rights to the record. [ "maulik@bvbrc.org" ]
-
version
(integer)
- Numeric suffix from INSDC accessions (NC_000913.3 → 3); enables tracking of updated sequences. 3
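Several attributes above (gc_content, sequence_md5, version) are derived from the sequence or the accession. The following is a minimal Python sketch of how such values could be recomputed client-side for sanity checks. The function names are hypothetical, and the normalisation applied before hashing (upper-casing, stripping whitespace) is an assumption; the source does not state how BV-BRC prepares the raw sequence before computing the MD5, so hashes produced here may not match stored values.

```python
import hashlib
import re


def gc_content(seq: str) -> float:
    """Return 100 * (G + C) / length, rounded to one decimal place."""
    if not seq:
        raise ValueError("sequence is empty")
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    return round(100.0 * gc / len(s), 1)


def sequence_md5(seq: str) -> str:
    """Return the MD5 hex digest of a sequence.

    Assumption: whitespace is removed and the sequence is upper-cased
    before hashing. BV-BRC's actual normalisation is not documented here.
    """
    normalised = "".join(seq.split()).upper()
    return hashlib.md5(normalised.encode("ascii")).hexdigest()


def accession_version(accession: str):
    """Return the integer version suffix of an accession, or None if absent."""
    match = re.fullmatch(r"(.+)\.(\d+)", accession)
    return int(match.group(2)) if match else None
```

For example, accession_version("NC_000913.3") yields 3, while an unversioned accession such as "NC_000913" yields None.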
API
GET :sequence_id
Retrieve a genome_sequence data object by sequence_id
EXAMPLE
https://www.bv-brc.org/api/genome_sequence/170673.13.con.0100
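The GET endpoint can also be called from a script. The sketch below uses only the Python standard library; the helper names (sequence_url, fetch_sequence) are hypothetical, and it assumes the endpoint returns a single JSON object when application/json is requested.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://www.bv-brc.org/api/genome_sequence"


def sequence_url(sequence_id: str) -> str:
    """Build the GET URL for one genome_sequence object."""
    # Percent-encode the ID so unusual characters cannot break the path.
    return f"{BASE_URL}/{quote(sequence_id, safe='')}"


def fetch_sequence(sequence_id: str, timeout: float = 30.0) -> dict:
    """Fetch one record as JSON (assumed response shape: a single object)."""
    request = Request(sequence_url(sequence_id),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_sequence("170673.13.con.0100") would request the URL shown in the example above.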
QUERY :query
Query for genome_sequence data objects with an RQL Query
Return Formats
Requests may include an HTTP Accept header from this list to receive the results in the requested format.
-
application/json - Returns results as an array of JSON objects
-
application/solr+json - Returns results in Solr JSON response format
-
text/csv - Returns results in comma-separated values (CSV) format. Columns are separated by ','. Multi-value columns are separated by ';'. Rows are separated by a newline
-
text/tsv - Returns results in tab-separated values (TSV) format. Columns are separated by a tab. Multi-value columns are separated by ';'. Rows are separated by a newline
-
application/vnd.openxmlformats - Returns objects as an MS Excel document
-
application/dna+fasta - Returns DNA sequences for queries in FASTA format
-
application/dna+jsonh+fasta - Returns DNA sequences for queries in JSONH-FASTA format
-
application/sralign+dna+fasta - Returns DNA sequences aligned from SRA data in FASTA format
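As an illustration of the return formats listed above, the following sketch sets the Accept header on a request and parses a text/csv response, splitting multi-value columns on ';'. The names RETURN_FORMATS, build_request, and parse_csv are hypothetical. The parser assumes the CSV response starts with a header row and that the caller knows which columns are multi-valued (here user_read and user_write, the array attributes in the table above).

```python
import csv
import io
from urllib.request import Request

# Short aliases for some of the media types listed above.
RETURN_FORMATS = {
    "json": "application/json",
    "solr": "application/solr+json",
    "csv": "text/csv",
    "tsv": "text/tsv",
    "excel": "application/vnd.openxmlformats",
    "fasta": "application/dna+fasta",
}


def build_request(url: str, fmt: str = "json") -> Request:
    """Create a request whose Accept header selects the return format."""
    return Request(url, headers={"Accept": RETURN_FORMATS[fmt]})


def parse_csv(text: str, multi_value_columns=("user_read", "user_write")) -> list:
    """Parse a CSV response into dicts, splitting multi-value cells on ';'.

    Assumption: the first row of the response is a header row.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for column in multi_value_columns:
            if column in row:
                row[column] = row[column].split(";") if row[column] else []
        rows.append(row)
    return rows
```

The request object can be passed to urllib.request.urlopen; the response body, decoded as text, can then be handed to parse_csv.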
EXAMPLES
- Query for genome_sequence data objects with a sequence_id equal to 170673.13.con.0100. Return results as a JSON Array.
https://www.bv-brc.org/api/genome_sequence/?eq(sequence_id,170673.13.con.0100)
- Query for genome sequences for genome 1765.317, limit to 5 sequences. Return DNA FASTA.
https://www.bv-brc.org/api/genome_sequence/?eq(genome_id,1765.317)&limit(5)&http_accept=application/dna+fasta
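The example URLs above can be assembled programmatically. This sketch covers only the two RQL operators that appear in the examples, eq and limit, plus the http_accept query parameter; the helper names are hypothetical and this is not a complete RQL builder. The http_accept value is appended as written in the example URL, without percent-encoding.

```python
from urllib.parse import quote

BASE_URL = "https://www.bv-brc.org/api/genome_sequence/"


def eq(field: str, value) -> str:
    """RQL equality term, e.g. eq(genome_id,1765.317)."""
    # Percent-encode the value so commas or parentheses cannot break the term.
    return f"eq({field},{quote(str(value), safe='')})"


def limit(count: int) -> str:
    """RQL limit term, e.g. limit(5)."""
    return f"limit({int(count)})"


def query_url(*terms: str, http_accept: str = None) -> str:
    """Join RQL terms with '&' and optionally request a return format."""
    parts = list(terms)
    if http_accept:
        parts.append(f"http_accept={http_accept}")
    return BASE_URL + "?" + "&".join(parts)


# Reproduces the second example above.
fasta_url = query_url(eq("genome_id", "1765.317"), limit(5),
                      http_accept="application/dna+fasta")
```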