.. _cli-getting-started:

==========================================
Using the BV-BRC Command-line Interface
==========================================

`BV-BRC `_ is an integration of different types of data and software tools that support research on bacterial pathogens. The typical biologist seeking access to the BV-BRC data and tools will usually explore the web-based user interface. However, there are many instances in which programmatic or command-line interfaces are more suitable.

For users who want command-line access to BV-BRC, we provide the tools described in this document. We call these tools the *P3-scripts*. They are intended to run on your machine, going over the network to access the services provided by BV-BRC.

Document Conventions
--------------------

In this document, we observe the following conventions. Text that you enter or type is shown in a white-background box.

::

    This is input.

Output is shown in a yellow-background box. In general, you will only see the top portion of the output, since the whole thing could be quite large.

::

    This is the top portion of the output.

Output is usually tab-delimited, and you will see columns separated by multiple spaces that don't always line up. If it is necessary to show multiple excerpts of a single large output stream, the missing parts will be shown with a gray bar.

::

    This is the top part.

    This is somewhere in the middle.

NOTE: we add new genomes to the BV-BRC database every week. Your results from the examples in this tutorial may not match ours.

Installing the CLI Release
~~~~~~~~~~~~~~~~~~~~~~~~~~

Since the CLI tools run on your computer, you will need to download and install a software package to use them.

We currently have macOS and Debian/Ubuntu releases of the BV-BRC Command Line Interface. A Windows version is in the works. The releases are available at the `BV-BRC github site `_.

Full installation instructions are available in :ref:`cli-installation`.

Command-Line Help ~~~~~~~~~~~~~~~~~ You can specify ``--help`` as a command-line option on any command to get a summary of the options and parameters, for example :: p3-match --help :: p3-match.pl [-bchiv] [long options...] match-value -i STR --input STR name of the input file (if not the standard input) -c STR --col STR column number (1-based) or name -b INT --batchSize INT input batch size --nohead file has no headers -v --invert --reverse output non-matching records --discards STR name of file to contain discarded records -h --help display usage information The BV-BRC Database ------------------- The main BV-BRC database is organized as a series of large, heavily-indexed relational tables. From the perspective of the CLI, there are five main tables representing objects of interest, connected by four relationships. .. figure:: images/P3DB.JPG :align: center :alt: The five entities are as follows. Genome A genome is a set of contigs and annotations representing our best estimate of the DNA sequence for an organism. Use **p3-all-genomes** to list all of the genomes or a subset. Given a list of genomes, - Use :ref:`cli::p3-get-genome-data` to retrieve data about the individual genomes. - Use :ref:`cli::p3-get-genome-features` to access the features of the genomes. - Use :ref:`cli::p3-get-genome-contigs` to access the genomes' sequences. - Use :ref:`cli::p3-get-genome-drugs` to access drug resistance data about the genomes. Fields from the Genome table appear in the output with a heading prefix of ``genome``. Thus, the *genome\_name* will be in a column named ``genome.genome_name``. Contig Represents one of the DNA sequences that comprise a genome. A contig can be a chromosome, a plasmid, or a fragment thereof. Contig data can be accessed from genome IDs using :ref:`cli::p3-get-genome-contigs`. Fields from the Contig table appear in the output with a heading prefix of ``contig``. Thus, the *length* will be in a column named ``contig.length``. 
Drug Represents an antimicrobial drug used for therapeutic treatment. This table is the anchor for all antimicrobial resistance data in BV-BRC. Use :ref:`cli::p3-all-drugs` to get a list of drugs. Use :ref:`cli::p3-get-drug-genomes` to get resistance data relating to specific drugs from a list. Fields from the Drug table appear in the output with a heading prefix of ``drug``. Thus, the *molecular\_formula* will be in a column named ``drug.molecular_formula``. Feature Represents a region of interest in a genome. This could be a CRISPR array, an RNA site, a protein encoding region, or a regulatory site, among others. A feature can be split across multiple regions, or even multiple contigs, but never multiple genomes. Given a list of genome IDs, use :ref:`cli::p3-get-genome-features` to access the features in the genomes. Given a list of family IDs, use :ref:`cli::p3-get-family-features` to access the features in the families. Given a list of feature IDs, use :ref:`cli::p3-get-feature-data` to access data about those features. **It is important to remember that the ID of the feature is called *patric\_id*, not *feature\_id*.** The internal feature ID is a long string with a lot of data packed into it that may change if the genome is re-annotated (e.g. ``PATRIC.269798.23.NC_008255.CDS.22581.24344.fwd``). The *patric\_id* value is shorter and more consistent (``fig|269798.23.peg.22``). Fields from the Feature table appear in the output with a heading prefix of ``feature``. Thus, the *location* will be in a column named ``feature.location``. Family Represents a protein family, which is a set of features believed to be isofunctional homologs. Given a list of family IDs, use :ref:`cli::p3-get-family-features` to get data about the features in the families or :ref:`cli::p3-get-family-data` to get data about the families themselves. Given a list of feature IDs, use :ref:`cli::p3-get-feature-data` to get the families to which the features belong. 
There are three types of protein families supported-- *local* families which are confined to one genus, *global* families which cross the entire database, and *figfams*, which are computed using a different method. Fields from the Family table appear in the output with a heading prefix of ``family``. Thus, the *product* will be in a column named ``family.product``. Files and Pipelines ------------------- The BV-BRC CLI operates on tab-delimited files. That is, each record is divided into fields or columns separated by tab characters. The first record in each file contains the name of each column. Typically, a column name consists of a record name, a dot, and a field name. For example, the following file fragment contains a column from the genome table followed by two columns from the feature table. :: genome.genome_id feature.patric_id feature.product 670.470 fig|670.470.repeat.1 repeat region 670.470 fig|670.470.repeat.2 repeat region 670.470 fig|670.470.repeat.3 repeat region 670.470 fig|670.470.rna.1 tRNA-Ala 670.470 fig|670.470.rna.2 tRNA-Ile 670.470 fig|670.470.repeat.4 repeat region 670.470 fig|670.470.rna.3 16S ribosomal RNA 670.470 fig|670.470.repeat.5 repeat region 670.470 fig|670.470.rna.4 tRNA-Val 670.470 fig|670.470.rna.5 tRNA-Ala 670.470 fig|670.470.repeat.6 repeat region The scripts are designed so they can be chained together in pipelines where the output of one becomes input to the next. For example, the above file was generated by the pipeline :: p3-all-genomes --eq genus,Methylobacillus | p3-get-genome-features --attr patric_id --attr product In the first command of this pipeline, the ``--eq`` command-line option was used to filter a query, while the ``--attr`` option in the second command was used to specify the output columns and the order in which they appear. These options are available on all of the database scripts. 
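Because these files are plain tab-delimited text with a header line, standard Unix tools can inspect them too. The following is a minimal sketch, not part of the P3-scripts: it recreates three records of the excerpt above in a hypothetical file named ``features.tbl`` and uses ``awk`` to find a column by its header name. The file name and the helper ``column_by_name`` are assumptions made for illustration only.

```shell
# Recreate a few records of the excerpt above as a tab-delimited file.
# "features.tbl" is a hypothetical file name used only for this sketch.
printf 'genome.genome_id\tfeature.patric_id\tfeature.product\n' > features.tbl
printf '670.470\tfig|670.470.repeat.1\trepeat region\n' >> features.tbl
printf '670.470\tfig|670.470.rna.1\ttRNA-Ala\n' >> features.tbl
printf '670.470\tfig|670.470.rna.3\t16S ribosomal RNA\n' >> features.tbl

# column_by_name FILE NAME: print the values (without the header) of the
# column whose header equals NAME. Illustrative helper, not a CLI tool.
column_by_name() {
    awk -F '\t' -v name="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        col     { print $col }
    ' "$1"
}

# List the column names with their 1-based positions.
head -n 1 features.tbl | tr '\t' '\n' | cat -n

# Print the product of every feature.
column_by_name features.tbl feature.product
```

Looking up the column by its full ``table.field`` name rather than by position keeps the sketch working if the column order changes.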
For get-type scripts (:ref:`cli::p3-get-genome-data`, :ref:`cli::p3-get-feature-data`, ...), you must supply the id of the object of interest, e.g., the genome id, feature id, etc. By default, the last column in the input file is used as the key field for these get-type scripts. You can modify this behavior using the ``--col`` command-line option. The special value ``0`` denotes the last column, but you can also use a 1-based column number (``1`` for the first, ``2`` for the second) or a column name. If the field-name portion of the column name is unique, you can leave off the table-name portion. So, if you want to get location information from the features output by the pipeline above (identified in column ``feature.patric_id``, which is the second one), you could use any of the three following commands :: p3-get-feature-data --col=feature.patric_id --attr sequence_id --attr location `_ and register now. Now that you have a working user name and password, you can use the :ref:`cli::p3-login` script to tell the CLI who you are. For example, if your name is ``rastuser25``, you would type :: p3-login rastuser25 The script asks you for your password and places a special file on your hard drive that can be used to get authorized access to your workspace data. To log out again, simply use :: p3-login --logout At any time, you can verify your login status using :: p3-login --status If you are logged out, it will respond :: You are currently logged out of BV-BRC. If you are logged in, you will get something like :: You are logged in as rastuser25@patricbrc.org. Working with Genome Groups ~~~~~~~~~~~~~~~~~~~~~~~~~~ Your workspace looks like a full-blown file system, but there are three special folders. - **Genome Groups** contains named lists of genomes. - **Feature Groups** contains named lists of features. - **QuickData** contains folders full of genomes you submitted through the CLI annotation interface. To create a genome group, you use :ref:`cli::p3-put-genome-group`. 
Say, for example, you want to examine Streptococcus pneumoniae genomes that are resistant to penicillin. The following query command will return this list of genomes (we will discuss all query commands in more detail later).

::

    p3-echo -t antibiotic penicillin | p3-get-drug-genomes --eq "genome_name,Streptococcus pneumoniae" --resistant --attr genome_id --attr genome_name >resist.tbl

This particular command asks for data from the antimicrobial resistance table. Each record in this table posits a relationship between a genome and an antibiotic drug. We approach the table from the drug side: given a drug, we find the resistant genomes. To do this, we need a file with a drug name in it. The :ref:`cli::p3-echo` command creates this file: the ``-t antibiotic`` parameter tells it we want a one-column file with a column header of ``antibiotic``. We put the single record ``penicillin`` in that column.

The antibiotic file is then piped into :ref:`cli::p3-get-drug-genomes`. Its parameters do the following.

``--eq "genome_name,Streptococcus pneumoniae"``
    Only include records for Streptococcus pneumoniae genomes. Because this is a string field, it does a substring match. A genome name including follow-on strain information (e.g. ``Streptococcus pneumoniae strain LMG2888``) will still match.

``--resistant``
    Only include records that state the genome is resistant. This is a special parameter for the :ref:`cli::p3-get-drug-genomes` and :ref:`cli::p3-get-genome-drugs` commands that is provided for convenience.

``--attr genome_id``
    Output the genome ID.

``--attr genome_name``
    Output the genome name.

When the command completes, the file **resist.tbl** will contain around 114 lines beginning with the following.

:: antibiotic genome_drug.genome_id genome_drug.genome_name penicillin 1313.7006 Streptococcus pneumoniae P310010-154 penicillin 1313.7016 Streptococcus pneumoniae P310937-212 penicillin 1313.7018 Streptococcus pneumoniae P311313-217 penicillin 760749.3 Streptococcus pneumoniae GA05248 penicillin 760763.3 Streptococcus pneumoniae GA11304 penicillin 760765.3 Streptococcus pneumoniae GA11663 penicillin 760766.3 Streptococcus pneumoniae GA11856 penicillin 760769.3 Streptococcus pneumoniae GA13338 penicillin 760771.3 Streptococcus pneumoniae GA13455 penicillin 760776.3 Streptococcus pneumoniae GA14373 penicillin 760777.3 Streptococcus pneumoniae GA14688 Now we want to create a group for these genomes called **resist\_strep**. We use the following command. :: p3-put-genome-group --col=2 resist_strep weak.tbl Now we pipe in *resist\_strep* (the interesting set) and specify *weak.tbl* as the source of group 2. :: p3-get-genome-group resist_strep | p3-signature-families --gs2=weak.tbl >families.tbl The output contains protein families that are common in the interesting set (resistant to penicillin) but not in the other set. If a set file is not specified, it is taken from the standard input. In this case, that would be the interesting set, since there is no **--gs1** parameter. Our signature families analysis script has no output, because we redirected it to *families.tbl*. We can peek at the results using the ``--all`` option of :ref:`cli::p3-extract`. :: p3-extract --all `_ and supported by most programming languages. Without even fully understanding the notation, you can still see in the above listing that various bits of key metadata (scientific name, taxonomy, ID) are present in the file, along with the ID and sequence of each contig and various pertinent data about each feature. You can use a minus sign (``-``) in the parameter list to specify that the genome list come from the standard input. 
The following creates GTOs for every genome in the genome group *weak\_strep*. :: p3-get-genome-group weak_strep | p3-gto - This capability can be mixed with explicit genome IDs. So the following script creates a GTO for *594.8*, all of the genomes in group *weak\_strep*, and then genome *149539.441*. :: p3-get-genome-group weak_strep | p3-gto 594.8 - 149539.441 You can also use the ``--outDir`` option to specify that the output be put in a different directory. The following creates a new subdirectory **PathogenGTO** in the current directory and puts all the GTOs in it. :: p3-get-genome-group weak_strep | p3-gto --outDir=PathogenGTO 594.8 - 149539.441 You are not required to write code to manipulate GTOs. Instead, we've included some useful scripts in the BV-BRC CLI. First and foremost is :ref:`cli::p3-gto-scan`. For example, if you run :: p3-gto-scan 1313.7001.gto you would see the following analysis :: Processing contigs of 1313.7001.gto. Processing features of 1313.7001.gto. All done. contigs 52 dna 2101113 features 3382 functionAnalyzed 1418 functionRead 3382 functionReused 1964 roleMatch 1492 roleProcessed 3478 This rather arcane output tells you several things. First, that there are 52 contigs and 2,101,113 base pairs in the genome. It has 3382 features containing 1418 distinct assigned functions (*functionAnalyzed*). 3382 features had assigned functions (*functionRead*). This means every feature had a valid functional assignment, which is usually the case. 1964 of the features had redundant functions, that is, functions also found earlier in the genome (*functionReused*). 3478 roles were found (*roleProcessed*) of which 1492 were distinct (*roleMatch*). If you want to see the actual roles, specify the command-line option ``--verbose``. 
::

    p3-gto-scan --verbose 1313.7001.gto

::

    Role name    1313.7001.gto
    (2E,6E)-farnesyl diphosphate synthase (EC 2.5.1.10)    1
    1,2-diacylglycerol 3-glucosyltransferase (EC 2.4.1.337)    1
    1,4-alpha-glucan (glycogen) branching enzyme, GH-13-type (EC 2.4.1.18)    1
    1-phosphofructokinase (EC 2.7.1.56)    1
    16S rRNA (cytidine(1402)-2'-O)-methyltransferase (EC 2.1.1.198)    1
    16S rRNA (cytosine(1402)-N(4))-methyltransferase EC 2.1.1.199)    1
    16S rRNA (cytosine(967)-C(5))-methyltransferase (EC 2.1.1.176)    1
    16S rRNA (guanine(1207)-N(2))-methyltransferase (EC 2.1.1.172)    1
    16S rRNA (guanine(527)-N(7))-methyltransferase (EC 2.1.1.170)    1
    16S rRNA (guanine(966)-N(2))-methyltransferase (EC 2.1.1.171)    1
    16S rRNA (uracil(1498)-N(3))-methyltransferase (EC 2.1.1.193)    1

In this table, each role name is shown along with the number of times it occurs in the genome. You can see the features as well by adding the ``--features`` command-line option.

::

    p3-gto-scan --verbose --features 1313.7001.gto

::

    Role name    1313.7001.gto    Features containing role
    (2E,6E)-farnesyl diphosphate synthase (EC 2.5.1.10)    1    fig|1313.7001.peg.1606
    1,2-diacylglycerol 3-glucosyltransferase (EC 2.4.1.337)    1    fig|1313.7001.peg.679
    1,4-alpha-glucan (glycogen) branching enzyme, GH-13-type (EC 2.4.1.18)    1    fig|1313.7001.peg.595
    1-phosphofructokinase (EC 2.7.1.56)    1    fig|1313.7001.peg.227
    16S rRNA (cytidine(1402)-2'-O)-methyltransferase (EC 2.1.1.198)    1    fig|1313.7001.peg.1813
    16S rRNA (cytosine(1402)-N(4))-methyltransferase EC 2.1.1.199)    1    fig|1313.7001.peg.503
    16S rRNA (cytosine(967)-C(5))-methyltransferase (EC 2.1.1.176)    1    fig|1313.7001.peg.1301
    16S rRNA (guanine(1207)-N(2))-methyltransferase (EC 2.1.1.172)    1    fig|1313.7001.peg.83
    16S rRNA (guanine(527)-N(7))-methyltransferase (EC 2.1.1.170)    1    fig|1313.7001.peg.1682
    16S rRNA (guanine(966)-N(2))-methyltransferase (EC 2.1.1.171)    1    fig|1313.7001.peg.145
    16S rRNA (uracil(1498)-N(3))-methyltransferase (EC 2.1.1.193)    1    fig|1313.7001.peg.728

Later on in this file you can see an example of a role that
occurs in multiple features. You will note that a double colon (``::``) is used to separate the individual feature IDs. :: 6-phospho-beta-galactosidase (EC 3.2.1.85) 1 fig|1313.7001.peg.607 6-phospho-beta-glucosidase (EC 3.2.1.86) 4 fig|1313.7001.peg.1031::fig|1313.7001.peg.1517::fig|1313.7001.peg.443::fig|1313.7001.peg.896 6-phosphofructokinase (EC 2.7.1.11) 1 fig|1313.7001.peg.1372 6-phosphogluconate dehydrogenase, decarboxylating (EC 1.1.1.44) 1 fig|1313.7001.peg.542 This is a common convention in the BV-BRC CLI-- when a single column contains multiple values, we use a double colon to separate them. You can use the ``--delim`` option to change this default. Supported alternate delimiters include ``space``, ``tab``, and ``comma``. For example, the following would show if you coded ``--delim=space``. :: 6-phospho-beta-galactosidase (EC 3.2.1.85) 1 fig|1313.7001.peg.607 6-phospho-beta-glucosidase (EC 3.2.1.86) 4 fig|1313.7001.peg.1031 fig|1313.7001.peg.1517 fig|1313.7001.peg.443 fig|1313.7001.peg.896 6-phosphofructokinase (EC 2.7.1.11) 1 fig|1313.7001.peg.1372 6-phosphogluconate dehydrogenase, decarboxylating (EC 1.1.1.44) 1 fig|1313.7001.peg.542 The true power in :ref:`cli::p3-gto-scan` comes when you use it to compare multiple GTO files. The following command displays a summary of the differences between **1313.7001.gto** and **1313.7016.gto**. :: p3-gto-scan 1313.7001.gto 1313.7016.gto :: Processing contigs of 1313.7001.gto. Processing features of 1313.7001.gto. Processing contigs of 1313.7016.gto. Processing features of 1313.7016.gto. 
Role name 1313.7001.gto 1313.7016.gto 2,3-butanediol dehydrogenase, R-alcohol forming, (R)- and (S)-acetoin-specific (EC 1.1.1.4) 0 1 2-isopropylmalate synthase (EC 2.3.3.13) 1 2 23S rRNA (adenine(2058)-N(6))-dimethyltransferase (EC 2.1.1.184) => Erm(B) 0 1 4-hydroxybenzoate polyprenyltransferase and related prenyltransferases 0 1 5S rRNA 2 3 6-phospho-beta-galactosidase (EC 3.2.1.85) 1 2 AAA superfamily ATPase 0 1 ABC transporter amino acid-binding protein 0 1 ABC transporter, ATP-binding protein 13 11 ABC transporter, ATP-binding protein (cluster 3, basic aa/glutamine/opines) 3 4 ABC transporter, permease protein (cluster 3, basic aa/glutamine/opines) 5 6 ABC transporter, substrate-binding protein PebA (cluster 3, basic aa/glutamine/opines) 2 1   weak similarity to aminoglycoside phosphotransferase 1 0 * Features 3382 3304 * DNA 2101113 2052306 All done. contigs 110 dna 4153419 features 6686 functionAnalyzed 1457 functionRead 6686 functionReused 5229 roleMatch 1356 roleMismatch 175 roleProcessed 6877 Only roles that differ between the two genomes are shown (175, the number in *roleMismatch*). For each, the role name is shown followed by the number of occurrences in 1313.7001.gto and then the number of occurrences in 1313.7016.gto. So, we can see that *2-isopropylmalate synthase* occurs once in 1313.7001 but twice in 1313.7016. At the end of the role listing, feature and DNA counts are shown. We see that 1313.7016 has 78 fewer features and around 50,000 fewer base pairs (48,807 to be exact). 1356 roles occurred the same number of times in both genomes (*roleMatch*). You can specify as many GTO file names as you wish in the parameter list for :ref:`cli::p3-gto-scan`. As with the single-genome case, ``--features`` causes the features to be listed in the last column. The ``--verbose`` option causes even the matching roles to be listed, so you can get counts for everything. 
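When ``--features`` is used, a role that occurs in several features has its feature IDs joined by the double-colon delimiter described earlier. The following is a minimal sketch of how such a cell could be split with standard tools. It uses one row from the single-genome example; the helper name ``split_values`` is hypothetical and is not one of the P3-scripts.

```shell
# One row from the --features output shown earlier: role, count, feature list.
row=$(printf '%s\t%s\t%s' \
    '6-phospho-beta-glucosidase (EC 3.2.1.86)' \
    '4' \
    'fig|1313.7001.peg.1031::fig|1313.7001.peg.1517::fig|1313.7001.peg.443::fig|1313.7001.peg.896')

# split_values: read tab-delimited rows on standard input and print each
# value of the last column on its own line. Illustrative helper only.
split_values() {
    awk -F '\t' '{ n = split($NF, ids, "::"); for (i = 1; i <= n; i++) print ids[i] }'
}

# Show one feature ID per line.
printf '%s\n' "$row" | split_values

# The number of IDs should agree with the count in the second column.
count=$(printf '%s\n' "$row" | split_values | wc -l | tr -d ' ')
echo "features listed: $count"
```

If the scan was run with a different ``--delim`` setting, the separator string passed to ``split`` would need to change to match.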
The status and statistical messages are sent to the standard error output, and the role table to the standard output. Thus, if you redirect these to separate files, the direct output from :ref:`cli::p3-gto-scan` can be used to get a convenient list of roles from the script. The file thus created is tab-delimited with headers, just like a normal CLI output file. The script :ref:`cli::p3-gto-fasta` creates FASTA files from a single GTO. Three command-line options (all mutually exclusive) are supported. --contig Output a DNA fasta for the genome's contigs. This is the default. --protein Output a protein fasta for the genome's features. Obviously, only protein-encoding features will be included. --feature Output a DNA fasta for the genome's features. All features are included. You specify the name of the GTO file as the first parameter of :ref:`cli::p3-gto-fasta`. :: p3-gto-fasta 1313.7001.gto >1313.7001.fna After this script, **1313.7001.fna** will look something like this. :: >1313.7001.con.0001 gaaaggacaaaatttgtcctttctcaagcttagctgacttcaacccactacagttgacaa agagcctgttttctcaataggattgtactcaggtgagtagggaggaagaggtaaaagttt atgcccaaactcttcacacaagagttctagcttacccattctatggaatcttgcattatc cataataataaccgatggtgtggttaatgttggtaagagaaatttctgaaaccatacttc aaaaaagtcgctcgtcatcgtctcttcgtaagtcattggagcgattaattcaccatttgt tagacctgcaaccaaagaaatcctctgatatcttcttccagatactttgcctcttcttaa ctgaccttttaatgagcgaccatattctcgataaaaataagtatcgaatcctgtttcgtc aatctaaacaggtgctaggtgctttaaactattaaaattcttaagaaataaggctactta tcgccctgaatatcaaaaaagaaaggacaaaatttgtcctttctcaagcttagctgactt caacccactacagttgacaaagagcctgttttctcaataggattgtactcaggtgagtag ggaggaagaggtaaaagtttatgcccaaactcttcacacaagagttctagcttacccatt In the feature-based fasta files, the functional assignment is included as a comment, as shown below. 
:: p3-gto-fasta --protein 1313.7001.gto :: >fig|1313.7001.peg.1182 beta-glycosyl hydrolase MKHEKQQRFSIRKYAVGAASVLIGFAFQAQTVAADGVTTTTENQPTIHTVSDSPQSSENR TEETPKAELQPETPATDKVASLPKTEEKPQEEVSSTPSDKAEVVTPTSAEKETANKKAEE ASPKKEEAKEVDSKESNTDKTDKDKPAKKDEAKAEADKPETEAGKERAATVNEKLAKKKI VSIDAGRKYFSPEQLKEIIDKAKHYGYTDLHLLVGNDGLRFMLDDMSITANGKTYASDDV KRAIEKGTNDYYNDPNGNHLTESQMTDLINYAKDKGIGLIPTVNSPGHMDAILNAMKELG IQNPNFSYFGKESARTVNLDNEQAVAFTKALIDKYAAYFAKKTEIFNIGLDEYANDATDA KGWSVLQADKYYPNEGYPVKGYEKFIAYANDLARIVKSHGLKPMAFNDGIYYNSDTSFGS FDKDIIVSMWTGGWGGYDVASSKLLAEKGHQILNTNDAWYYVLGRNADGQGWYNLDQGLN GIKNTPITSVPKTEGADIPIIGGMVAAWADTPSARYSPSRLFKLMRHFANANAEYFAADY ESAEQALNEVPKDLNRYTAESVAAVKEAEKAIRSLDSNLSRAQQDTIDQAIAKLQETVNN LTLTPEAQKEEEAKREVEKLAKNKVISIDAGRKYFTLNQLKRIVDKASELGYSDVHLLLG NDGLRFLLDDMTITANGKTYASDDVKKAIIEGTKAYYDDPNGTTLTQAEVTELIEYAKSK DIGLIPAINSPGHMDAMLVAMEKLGIKNPQAHFDKVSKTTMDLKNEEAMNFVKALIGKYM Only protein-encoding genes are output with the ``--protein`` option; however, you see all the features when you use the ``--feature`` option. :: p3-gto-fasta --feature 1313.7001.gto :: >fig|1313.7001.repeat.1 repeat region tgttttctcaataggattgtactcaggtgagtagggaggaagaggtaaaagtttatgccc aaactcttcacacaagagttctagcttacccattctatggaatcttgcattatccataat aataaccgatggtgtggttaatgttggtaagagaaatttctgaaaccatacttcaaaaaa gtcgctcgtcatcgtctcttcgtaagtcattggagcgattaattcaccatttgttagacc tgcaaccaaagaaatcctctgatatcttcttccagatactttgcctcttcttaactgacc ttttaatgagcgaccatattctcgataaaaataagtatcgaatcctgtttcgtcaatcta aacaggtgctaggtgctttaaactattaaaattcttaagaaataaggctactt >fig|1313.7001.repeat.2 repeat region tgttttctcaataggattgtactcaggtgagtagggaggaagaggtaaaagtttatgccc aaactcttcacacaagagttctagcttacccattctatggaatcttgcattatccataat Using RAST to Create New Genomes -------------------------------- If you have a DNA fasta file and you know the taxonomic ID with a certain degree of confidence, you can use the script :ref:`cli::p3-rast` to annotate the DNA and produce a new genome. The standard output of the script is a GTO. 
In almost every case, you will want to redirect this to a file. In addition, the new genome is stored in your workspace. It will appear in listings from :ref:`cli::p3-all-genomes`, and you can find its files via the web interface in your QuickData folders. To invoke :ref:`cli::p3-rast`, you specify a taxonomic ID or the ID of a genome with the same taxonomic ID plus the name to give to the new genome. The contigs should be in the form of a FASTA file via the standard input. All this data is submitted to the BV-BRC annotation service. When the service completes, it stores the new genome in your workspace and sends back a GTO. The example below shows a submission of sequences taken from a metagenomic sample named *SRS576036* chosen because they have a high similarity to sequences from Catenibacterium mitsuokai (taxon ID 100886). :: p3-rast 100886 "Catenibacterium from sample SRS576036" test.gto 2>test.log Now **test.gto** contains a GTO of the resulting genome and **test.log** contains information about the RAST job. If we use the ``--private`` option of :ref:`cli::p3-all-genomes`, we will see the new genome in the list. :: p3-all-genomes --private --attr genome_name :: genome.genome_id genome.genome_name 100886.26 Catenibacterium from sample SRS576036 The genome was assigned the ID *100886.26*. We can see this in the GTO file as well. :: { "genetic_code" : "11",   ], "family_assignments" : [], "type" : "CDS", "id" : "fig|100886.26.peg.1540" }, { "protein_translation" : "MLQIENASIAYGNDILFSGFNLQLERGEIASISGPSGCGKSSLLNAILGFTPLKEGRIVLNGILLDKGNVDVVRKQTAWIPQELALPLEWVKDMVQLPFGLKANRGTPFSETRLFACFEDLGLEQELYYKRVNEISGGQRQRMMIAVASMIGKPLTIVDEPTSALDSGSAEKVLSFFRRQTENGSAILTVSHDKRFANGCDRHIIMK", "aliases" : [], "location" : [ [ "100886.26.con.0010", "23684", "-", 624 ]   "type" : "CDS" } ], "id" : "100886.26", "contigs" : [ { "id" : "100886.26.con.0001",   } The genome ID appears as a part of every feature ID, as an ID in its own right, and as the first part of every contig ID. 
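The relationship between these IDs can be sketched with plain shell parameter expansion. The example below assumes the ID layouts visible in the excerpt above (``fig|<genome>.<type>.<number>`` for features and ``<genome>.con.<number>`` for contigs of a RAST-annotated genome); contigs of other genomes may be named differently. The helper names ``genome_of_feature`` and ``genome_of_contig`` are hypothetical.

```shell
# IDs taken from the GTO excerpt above.
feature_id='fig|100886.26.peg.1540'
contig_id='100886.26.con.0010'

# genome_of_feature ID: strip the "fig|" prefix, then the trailing
# ".<type>.<number>" portion (for example ".peg.1540").
genome_of_feature() {
    id=${1#"fig|"}
    id=${id%.*}      # drop the feature number
    id=${id%.*}      # drop the feature type
    printf '%s\n' "$id"
}

# genome_of_contig ID: strip the trailing ".con.<number>" portion.
# Assumes the RAST-style contig naming seen in the excerpt.
genome_of_contig() {
    printf '%s\n' "${1%.con.*}"
}

genome_of_feature "$feature_id"
genome_of_contig "$contig_id"
```

Both calls print the same genome ID for the IDs in the excerpt, which is a quick way to confirm that a feature and a contig belong to the same genome.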
As long as you are signed in, the genomes you create using :ref:`cli::p3-rast` will participate in all queries. :: p3-all-genomes --eq taxon_id,100886 --attr genome_name :: genome.genome_id genome.genome_name 100886.3 Catenibacterium mitsuokai 100886.26 Catenibacterium from sample SRS576036 However, just as you can restrict :ref:`cli::p3-all-genomes` to your own private genomes using the ``--private`` option, you can restrict it to public genomes only using the ``--public`` option. :: p3-all-genomes --public --eq taxon_id,100886 --attr genome_name :: genome.genome_id genome.genome_name 100886.3 Catenibacterium mitsuokai The GTO produced by :ref:`cli::p3-rast` has extra information in it describing the annotation process, but it is functionally equivalent to the output were you to re-fetch the genome using the standard script. :: p3-gto 100886.26 A :ref:`cli::p3-gto-scan` for **test.gto** would return the same role profile as for **100886.26.gto**. Customizing Your Toolkit ------------------------ The set of commands that we support via the p3-scripts offers a fairly broad set of capabilities. For example, say you want the name of a specific genome from the ID. You can do this easily using :: p3-echo -t genome_id 670.470 | p3-get-genome-data --attr genome_name :: genome_id genome.genome_name 670.470 Vibrio parahaemolyticus strain S176-10 If you do this a lot, you may find the extra typing tedious. It is worth, therefore, a brief discussion of how to create shortcut scripts. Custom Scripts in the BASH Environment ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In BASH (the most popular Unix shell) you can add functions to your **.bashrc** file, using ``$``-notation to indicate the incoming command-line variables. So, to create the command :: gn 670.470 You would use the function definition :: function gn { p3-all-genomes --eq=genome_id,$1 --attr genome_name } You must reload the shell to activate your changes to the **.bashrc** file. 
Use

::

    exec bash

to replace your current shell with a new instance.

In the function, the ``$1`` is replaced by the first parameter on the command line, which in our example is ``670.470``. If you type

::

    gn 1313.7001

the ``$1`` is replaced by ``1313.7001``, so the output would be

::

    genome_id    genome.genome_name
    1313.7001    Streptococcus pneumoniae P210774-233

You can have more than one parameter. The second is called ``$2``, the third ``$3``, and so on. The following function creates a genome group of everything resistant to a particular drug. The drug is the first parameter, the group name the second.

::

    function rg {
        p3-echo -t antibiotic $1 | p3-get-drug-genomes --resistant --attr genome_id | p3-put-genome-group $2
    }

Once the above definition is in place, the following command will put all the methicillin-resistant genomes into the group **meth\_resist**.

::

    rg methicillin meth_resist

Custom Scripts for the Windows CMD Shell
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In Windows, you create a file with the extension ``.cmd`` that has your script in it, and put the file somewhere in your path. The incoming command-line variables use ``%``-notation. The special command ``@echo off`` is normally put at the beginning of the file to keep the commands in the file from being echoed to the screen. So, to create the command

::

    gn 670.470

you would create the file

::

    @echo off
    p3-all-genomes --eq=genome_id,%1 --attr genome_name

and save it as **gn.cmd** in your script directory (which should be some directory you have defined and placed on your path).

In the script, the ``%1`` is replaced by the first parameter on the command line, which in our example is ``670.470``. If you type

::

    gn 1313.7001

the ``%1`` is replaced by ``1313.7001``, so the output would be

::

    genome_id    genome.genome_name
    1313.7001    Streptococcus pneumoniae P210774-233

You can have more than one parameter. The second is called ``%2``, the third ``%3``, and so on.
The following script creates a genome group of everything resistant to a particular drug. The drug is the first parameter, the group name the second.

::

    @echo off
    p3-echo -t antibiotic %1 | p3-get-drug-genomes --resistant --attr genome_id | p3-put-genome-group %2

Once the above is saved as **rg.cmd**, the following command will put all the methicillin-resistant genomes into the group **meth\_resist**.

::

    rg methicillin meth_resist

More Applications
-----------------

The following documents describe more applications for the BV-BRC CLI.

#. :ref:`cli-clustering`
#. :ref:`cli-signature-clusters`
#. :ref:`cli-common-tasks`
#. What Distinguishes One Set of Genomes from Another? (coming soon)
#. Uploading Genomes and Assembling Reads (coming soon)