.. _cli-common-tasks:

Common Tasks With P3 Scripts
============================

Here we present examples of common tasks and show how to accomplish them with the :doc:`command-line interface `.

Working with Taxonomic Groupings
--------------------------------

List Roles that are Found in One Species but not Another
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For our example, we will compare Vibrio campbellii with Vibrio alginolyticus.
To find the roles that occur in Vibrio campbellii but not in Vibrio alginolyticus, we need a file of the roles from Vibrio alginolyticus, which we then use to filter the roles from Vibrio campbellii.

The following pipe gets all the roles from Vibrio alginolyticus genomes and puts them in the file **aRoles.tbl**.

::

    p3-all-genomes --eq "genome_name,Vibrio alginolyticus" |
        p3-get-genome-features --attr product |
        p3-function-to-role |
        p3-sort --count feature.role >aRoles.tbl

There are a lot of pieces to this pipe.
First, :ref:`cli::p3-all-genomes` gets all the genome IDs for Vibrio alginolyticus.
Then :ref:`cli::p3-get-genome-features` finds all the features for those genomes and outputs the functional assignment (product) of each one.
Next, :ref:`cli::p3-function-to-role` converts the functions to roles and eliminates the hypothetical proteins.
Finally, :ref:`cli::p3-sort` with the ``--count`` option counts the number of occurrences of each role.
This takes a while to run, but the first few lines of output look something like this:
::

    feature.role	count
    (2E,6E)-farnesyl diphosphate synthase (EC 2.5.1.10)	34
    (3R)-hydroxymyristoyl-[ACP] dehydratase (EC 4.2.1.-)	34
    1,4-alpha-glucan (glycogen) branching enzyme, GH-13-type (EC 2.4.1.18)	37
    1,4-alpha-glucan branching enzyme (EC 2.4.1.18)	34
    1,4-dihydroxy-2-naphthoate polyprenyltransferase (EC 2.5.1.74)	34
    1,4-dihydroxy-2-naphthoyl-CoA hydrolase (EC 3.1.2.28) in menaquinone biosynthesis	34
    1,6-anhydro-N-acetylmuramyl-L-alanine amidase	35
    1-deoxy-D-xylulose 5-phosphate reductoisomerase (EC 1.1.1.267)	34
    1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7)	35
    1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (EC 1.17.7.1)	38
    1-phosphofructokinase (EC 2.7.1.56)	34

Now we perform the same exercise with Vibrio campbellii, putting the roles in **cRoles.tbl**.

::

    p3-all-genomes --eq "genome_name,Vibrio campbellii" |
        p3-get-genome-features --attr product |
        p3-function-to-role |
        p3-sort --count feature.role >cRoles.tbl

Again, the first few lines of output look something like this:

::

    feature.role	count
    (2E,6E)-farnesyl diphosphate synthase (EC 2.5.1.10)	25
    1,4-alpha-glucan (glycogen) branching enzyme, GH-13-type (EC 2.4.1.18)	27
    1,4-alpha-glucan branching enzyme (EC 2.4.1.18)	25
    1,4-dihydroxy-2-naphthoate polyprenyltransferase (EC 2.5.1.74)	27
    1,4-dihydroxy-2-naphthoyl-CoA hydrolase (EC 3.1.2.28) in menaquinone biosynthesis	28
    1,6-anhydro-N-acetylmuramyl-L-alanine amidase	25
    1-deoxy-D-xylulose 5-phosphate reductoisomerase (EC 1.1.1.267)	30
    1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7)	27
    1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (EC 1.17.7.1)	26
    1-phosphofructokinase (EC 2.7.1.56)	33
    16 kDa heat shock protein A	25

Now we filter **cRoles.tbl** by removing the records whose role appears in **aRoles.tbl**.
Note that we match on the key column ONLY: we don't care about the counts, only about which roles are in campbellii but not in alginolyticus.
The :ref:`cli::p3-file-filter` command performs this task.
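Conceptually, this filter is a set difference on the key column: keep every record of **cRoles.tbl** whose role never occurs in **aRoles.tbl**.
As a rough illustration only, here is a minimal shell sketch of that logic in plain ``awk``.
The function name ``filter_out_roles`` is hypothetical and is not part of the P3 scripts; the sketch assumes that both files are tab-delimited, start with a single header line, and hold the role in the first column, as in the sample output above::

    # Hypothetical helper, NOT a P3 script: print the records of INPUT whose
    # first column (feature.role) does not occur in the first column of FILTER.
    # Assumes tab-delimited files that each start with one header line.
    # Usage: filter_out_roles FILTER INPUT
    filter_out_roles() {
        awk -F'\t' '
            NR == FNR { if (FNR > 1) seen[$1] = 1; next }  # FILTER: remember its roles
            FNR == 1  { print; next }                      # INPUT: keep the header line
            !($1 in seen)                                  # INPUT: print roles not in FILTER
        ' "$1" "$2"
    }

For example, ``filter_out_roles aRoles.tbl cRoles.tbl >cOnlyRoles.tbl`` would write the roles found in campbellii but not in alginolyticus to ``cOnlyRoles.tbl`` (an arbitrary example name).
Because ``-F'\t'`` splits only on tabs, role names that contain spaces stay intact, and the counts in the second column are ignored.
Within the P3 toolkit, the same filter is a single command: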
::

    p3-file-filter --reverse --col=feature.role aRoles.tbl <cRoles.tbl

The result has the same columns as **cRoles.tbl**, but keeps only the roles that do not appear in **aRoles.tbl**.

List the Protein Sequences for the Genes in a Genome
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Our example is genome 1302.21 (Streptococcus gordonii strain DD07).

::

    p3-genome-fasta --protein 1302.21

The output looks something like this:

::

    >fig|1302.21.peg.966 putative Zn-dependent protease
    MRFLLNLFRFIWRMFWRLVWAGIVAFIILVSVLYLTNPSQTGLTAVRQAVQTAVNQLDTF
    LDQQGIHTGLGQNVQNLGEHLTDQHVASSDGARWENARATVYIETENSTFRAAYQEAIKS
    WNATGAFTFQLVEDKSQANIIATEMNDSTITAAGEAESQTNVLTKRFTKVTVRLNAYYLL
    NNYYGYSHERIVNTASHELGHAIGLDHNESESVMQSAGSFYSIQPIDIQAVKELYQD
    >fig|1302.21.peg.969 Putative metallopeptidase (Zinc) SprT family
    MNLNEYIKQVSLEDFGWEFRHQAFWNKRLRTTGGRFFPKDGHLDFNPKIYETFGLETFRK
    IVRHELAHYHLYYQGKGYRHKDRDFKELLKQVGGLRYAPGLPAKKLKLHYQCRSCCTDFY
    RQRRIEIKKYRCGRCKGKLRLLKQER

Given a List of Genomes, Produce a List of Pairs of Roles that are Implemented by Genes that are Close on the Chromosome, Sorted by Number of Occurrences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here we assume that our list of genomes is in the file **genomes.tbl**, which contains the following:

::

    genome_id
    1310696.14
    66976.17
    91890.5
    316273.25
    186497.12
    1353158.3
    135461.13
    1173954.3
    1176728.3

We use :ref:`cli::p3-get-genome-features` to get the feature and location data, :ref:`cli::p3-function-to-role` to convert the functions to roles, and :ref:`cli::p3-generate-close-roles` to compute the physically close roles.
Because we only want protein-encoding genes (pegs), we filter the genome features by type.
(If we didn't do this, the output would start with a large number of generic roles involving ribosomes and CRISPR repeats.)
The output of :ref:`cli::p3-generate-close-roles` is automatically sorted by decreasing number of occurrences.

::

    p3-get-genome-features --eq feature_type,CDS --attr sequence_id --attr location --attr product