p3-genome-distance

Compute the Distance Between a Specified Genome and Genomes in a File

p3-genome-distance.pl [options] baseGenome

This script uses protein families to estimate how close each genome is to a single base genome. For every protein family, it counts the kmers that the base genome's family members have in common with the family members in each of the other genomes. The similarity score for a genome is the total number of kmers in common. The higher the similarity score, the closer the genome is to the base. The score is not scaled, so it can only be used to compare genomes relative to one another.
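The scoring idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the script's implementation: the function names, the in-memory layout (a dictionary from family ID to member sequences), the choice to count distinct kmers, and the toy sequences are all assumptions made for the example.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def similarity(base_families, other_families, k=8):
    """Total number of distinct kmers in common, summed over shared families.

    Both arguments map a family ID to the list of member sequences found
    in that genome (a hypothetical layout chosen for this sketch).
    """
    score = 0
    for family_id, base_members in base_families.items():
        other_members = other_families.get(family_id, [])
        if not other_members:
            continue  # family absent from the other genome: contributes nothing
        base_kmers = set().union(*(kmers(s, k) for s in base_members))
        other_kmers = set().union(*(kmers(s, k) for s in other_members))
        score += len(base_kmers & other_kmers)
    return score


# Toy data (made up): one shared family, differing only in the last residue.
base = {"FAM1": ["MKTAYIAKQR"], "FAM2": ["MSLLTEVETY"]}
other = {"FAM1": ["MKTAYIAKQL"], "FAM3": ["MSDNGPQNQR"]}
print(similarity(base, other, k=8))
```

Because the total grows with the number and length of shared proteins, a raw count like this is only useful for ranking genomes against the same base, which matches the note above that the number is not scaled.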

Parameters

The positional parameter is the ID of the base genome.

The standard input should contain the IDs of the genomes to compare. The standard input can be overridden using the options in Input Options, and the options in Column Options can be used to specify the column containing the genome IDs. If the base genome appears in the input, it is ignored.
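The input handling described above can be pictured with a small Python sketch. It assumes, for illustration only, that the input is tab-delimited with one header line; the function name, its parameters, and the genome IDs shown are hypothetical and are not part of the script.

```python
import csv
import io


def read_genome_ids(stream, base_genome, column=-1, has_header=True):
    """Collect genome IDs from one column, skipping the base genome.

    column is a zero-based index (-1 means the last column); this
    indexing convention is an assumption of the sketch.
    """
    reader = csv.reader(stream, delimiter="\t")
    if has_header:
        next(reader, None)  # discard the header line if present
    ids = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        genome_id = row[column]
        if genome_id != base_genome:
            ids.append(genome_id)
    return ids


# Made-up input: the base genome 1000.1 appears in the list and is dropped.
sample = io.StringIO(
    "genome_id\tname\n"
    "1000.1\tbase genome\n"
    "1000.2\tfirst neighbor\n"
    "1000.3\tsecond neighbor\n"
)
print(read_genome_ids(sample, "1000.1", column=0))
```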

The standard output will contain the genome IDs and their distances, expressed as the unscaled similarity score described above. The following additional options are supported.

  • dna

Use DNA kmers instead of protein kmers.

  • kmer

The kmer size to use. The default is 8 for proteins and 16 for DNA.

  • verbose

If specified, progress messages will be displayed on the standard error output.