p3-generate-close-roles

Find Roles That Occur Close Together

p3-generate-close-roles.pl [options] <roles.tbl >pairs.tbl

This script is part of a pipeline to compute functionally-coupled roles. It takes a file of locations and roles, then outputs a file of pairs of roles with the number of times features containing those two roles occur close together on the chromosome. Such roles typically have related functions in a genome.

The input file must contain the following four fields.

  • 1

genome ID

  • 2

contig (sequence) ID

  • 3

location in the sequence

  • 4

functional role

By default, the script assumes the four columns appear in that order. Each column position can be overridden with command-line options.

The input file must be sorted by genome ID and then by sequence ID within genome ID. Otherwise, the results will be incorrect. Use p3-sort to sort the file.
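The grouping requirement can be checked before running the script. The following is a minimal Python sketch, not part of the P3 tools. It assumes a tab-delimited file with one header line and the default column order, and it tests only that each genome and each genome/sequence pair forms a single contiguous run of rows, which is a weaker condition than a true sort. The function name find_regrouped_rows is hypothetical.

```python
def find_regrouped_rows(lines, genome_col=0, sequence_col=1):
    """Return 1-based data-row numbers where a genome or contig group restarts.

    Assumes tab-delimited input whose first line is a header (an assumption
    about the file layout). Column arguments are 0-based here.
    """
    rows = iter(lines)
    next(rows, None)  # skip the header line
    seen_genomes, seen_contigs = set(), set()
    prev_genome, prev_contig = None, None
    problems = []
    for row_number, line in enumerate(rows, start=1):
        fields = line.rstrip("\n").split("\t")
        genome = fields[genome_col]
        contig = (genome, fields[sequence_col])
        if genome != prev_genome and genome in seen_genomes:
            # This genome already appeared in an earlier, separate run of rows.
            problems.append(row_number)
        elif contig != prev_contig and contig in seen_contigs:
            # Same genome, but this sequence ID was already left and re-entered.
            problems.append(row_number)
        seen_genomes.add(genome)
        seen_contigs.add(contig)
        prev_genome, prev_contig = genome, contig
    return problems
```

An empty result means the rows are grouped consistently; any reported row number marks a place where the file should be re-sorted with p3-sort.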

The location is a BV-BRC location string, either of the form start..end or complement(left..right). Given a set of genome IDs in the file genomes.tbl, you can generate the proper file using the following pipe.

p3-get-genome-features --attr sequence_id --attr location --attr product <genomes.tbl | p3-function-to-role

(If BV-BRC does not yet have roles defined, you will need to use an additional command-line option on p3-function-to-role.)
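To make the two location forms described above concrete, here is a small Python sketch that parses them into a left position, a right position, and a strand. It is an illustration only and not the parser the script uses. The function name parse_location and the "+"/"-" strand labels are assumptions, and any other location syntax is rejected.

```python
import re

# Matches "start..end" or "complement(left..right)"; nothing else is accepted.
_LOCATION_RE = re.compile(r"(complement\()?(\d+)\.\.(\d+)(\))?")

def parse_location(text):
    """Parse a location string into (left, right, strand).

    Hypothetical helper: strand is "-" for the complement form, "+" otherwise.
    """
    match = _LOCATION_RE.fullmatch(text.strip())
    # Reject unmatched parentheses such as "complement(1..5" or "1..5)".
    if match is None or bool(match.group(1)) != bool(match.group(4)):
        raise ValueError(f"unrecognized location string: {text!r}")
    first, second = int(match.group(2)), int(match.group(3))
    strand = "-" if match.group(1) else "+"
    return min(first, second), max(first, second), strand
```

For example, parse_location("complement(300..450)") yields the positions 300 and 450 on the "-" strand.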

Parameters

There are no positional parameters.

The standard input can be overridden using the options in Input Options.

Additional command-line options are

  • genome

The index (1-based) or name of the column containing the genome ID. The default is 1.

  • sequence

The index (1-based) or name of the column containing the sequence ID. The default is 2.

  • location

The index (1-based) or name of the column containing the location string. The default is 3.

  • role

The index (1-based) or name of the column containing the role description. The default is 4.

  • maxGap

The maximum space between two features considered close. The default is 2000.

  • minOcc

The minimum number of occurrences for a pair to be considered significant. The default is 4.
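The interaction of maxGap and minOcc can be sketched in Python as follows. This is a simplified illustration, not the script's actual implementation. It assumes the gap is measured from the right edge of one feature to the left edge of the next on the same sequence, that overlapping features count as close, and that every close pair adds one to the count. The function name count_close_pairs and the tuple layout are hypothetical.

```python
from collections import Counter

def count_close_pairs(features, max_gap=2000, min_occ=4):
    """Count role pairs whose features lie within max_gap on the same contig.

    features: iterable of (genome, contig, left, right, role) tuples.
    Returns a dict mapping (role_a, role_b) to a count, keeping only
    pairs seen at least min_occ times. The gap definition is an assumption.
    """
    by_contig = {}
    for genome, contig, left, right, role in features:
        by_contig.setdefault((genome, contig), []).append((left, right, role))

    counts = Counter()
    for contig_features in by_contig.values():
        contig_features.sort()  # order by left position
        for i, (_, right_i, role_i) in enumerate(contig_features):
            for left_j, _, role_j in contig_features[i + 1:]:
                if left_j - right_i > max_gap:
                    break  # later features start even farther away
                # Store each pair with its roles in a fixed order.
                counts[tuple(sorted((role_i, role_j)))] += 1

    return {pair: n for pair, n in counts.items() if n >= min_occ}
```

Raising maxGap admits more distant neighbors, while raising minOcc discards pairs that co-occur only a few times across the input genomes.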

Example

This command is shown in the tutorial p3_common_tasks.html:

p3-get-genome-features --eq feature_type,CDS --attr sequence_id --attr location --attr product <genomes.tbl | p3-function-to-role | p3-generate-close-roles

role1	role2	count
Transposase, IS3/IS911 family	Mobile element protein	33
Mobile element protein	Mobile element protein	29
Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5)	Copper-translocating P-type ATPase (EC 3.6.3.4)	25
Potassium efflux system KefA protein	Small-conductance mechanosensitive channel	13
…