p3-related-by-clusters

Compute Related Protein Families Based on Clusters

p3-related-by-clusters --gs1 Genome_set_1
                       --gs2 Genome_set_2
                       --sz1 Sample_size_for_gs1
                       --sz2 Sample_size_for_gs2
                       --iterations Number_random_sample_iterations
                       --family fam_type
                       --output Directory

This tool takes two genome sets as input. These will often be:

gs1    genomes for a specific species (e.g., Streptococcus pyogenes)
gs2    genomes from the same genus, but different species

The tool picks random subsets of gs1 and gs2, computes signature families for each pair of picks, then computes clusters of these families for each pick.
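The sampling step can be pictured with a short Python sketch. This is only an illustration and not the tool's own code: the function name `pick_samples` and the `seed` argument are hypothetical, and it assumes each pick is drawn without replacement, which the description above does not state.

```python
import random


def pick_samples(gs1_ids, gs2_ids, sz1, sz2, iterations, seed=None):
    """Return one (gs1 sample, gs2 sample) pair per iteration.

    Assumption: each sample is drawn without replacement, so sz1 and sz2
    must not exceed the sizes of gs1 and gs2.
    """
    rng = random.Random(seed)  # a seed makes the picks repeatable
    picks = []
    for _ in range(iterations):
        sample1 = rng.sample(gs1_ids, sz1)
        sample2 = rng.sample(gs2_ids, sz2)
        picks.append((sample1, sample2))
    return picks
```

Each returned pair corresponds to one iteration, that is, one computation of signature families and clusters.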

It does a set of iterations, saving the signature clusters for each iteration.

After running the set of iterations, it counts the number of times each pair of signature families occurred together in a signature cluster.

It outputs the pairs of co-occurring signature families, along with the signature clusters computed for each iteration.
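The counting step might look like the following Python sketch. It is not taken from the tool's source: the function name is hypothetical, each cluster is represented simply as a list of family ids, and the sketch assumes a pair is counted at most once per iteration, which the description does not specify.

```python
from collections import Counter
from itertools import combinations


def count_cooccurring_families(clusters_per_iteration):
    """Count how often each pair of families shares a signature cluster.

    clusters_per_iteration: one list of clusters per iteration, where each
    cluster is a list of family ids.
    Assumption: a pair is counted at most once per iteration.
    """
    counts = Counter()
    for clusters in clusters_per_iteration:
        pairs_this_iteration = set()
        for cluster in clusters:
            # Sort so that (A, B) and (B, A) are treated as the same pair.
            for fam1, fam2 in combinations(sorted(set(cluster)), 2):
                pairs_this_iteration.add((fam1, fam2))
        counts.update(pairs_this_iteration)
    # Descending by count, then by pair for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Two iterations with made-up family ids.
example = [
    [["F1", "F2", "F3"]],
    [["F1", "F2"], ["F3", "F4"]],
]
ranked = count_cooccurring_families(example)
```

In this example the pair (F1, F2) appears in a cluster in both iterations, so it ranks first with a count of 2.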

The output goes to a created directory. Within that directory, the subdirectory

CS

will contain the cluster signatures for each iteration, and

related.signature.families

contains the predicted functionally coupled pairs of families:

[occurrence-count,family1,family2] sorted into descending order based on count.
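A downstream script could load this file as in the sketch below. The column delimiter is not stated here, so the sketch assumes tab-separated columns in the order count, family1, family2 and no header line. Both function names are hypothetical.

```python
def parse_related_families(lines):
    """Parse lines of related.signature.families into (count, fam1, fam2).

    Assumption: columns are tab-separated and there is no header line.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        count, family1, family2 = line.split("\t")[:3]
        rows.append((int(count), family1, family2))
    return rows


def pairs_at_least(rows, min_count):
    """Keep only pairs seen together in at least min_count iterations."""
    return [row for row in rows if row[0] >= min_count]
```

For example, `pairs_at_least(parse_related_families(open(path)), 5)` would keep the pairs that co-occurred at least five times, where `path` points at the file inside the output directory.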

Each CS/n file contains entries of the form

famId1 peg1 func1
famId2 peg2 func2
.
.
.
//
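A reader for this layout could be sketched as follows. It assumes each entry line holds a family id, a feature (peg) id, and a function, separated by tabs, and that a line containing only `//` ends a cluster. If no tab is present it falls back to splitting on whitespace. The function name is hypothetical.

```python
def parse_clusters(lines):
    """Split the lines of a CS/n file into clusters of (fam_id, peg, func).

    Assumptions: fields are tab-separated (whitespace is used as a
    fallback), and a line holding only '//' terminates a cluster.
    """
    clusters = []
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip() == "//":
            if current:
                clusters.append(current)
            current = []
            continue
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t") if "\t" in line else line.split(None, 2)
        # Pad so that lines with fewer than three fields still unpack.
        fam_id, peg, func = (fields + ["", ""])[:3]
        current.append((fam_id, peg, func))
    if current:
        clusters.append(current)  # tolerate a missing final '//'
    return clusters
```

Each returned cluster is a list of tuples, so the family ids in a cluster can be collected with `[fam for fam, _, _ in cluster]`.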

Parameters

There are no positional parameters.

Standard input is not used.

The additional command-line options are as follows.

  • gs1

Genome set 1: a file containing genome ids in the first column. These genomes will be the ones containing signature families and clusters.

  • gs2

Genome set 2: a file containing genome ids in the first column.

  • sz1

For each iteration, pick a sample of sz1 genomes from gs1.

  • sz2

For each iteration, pick a sample of sz2 genomes from gs2.

  • iterations

Run this many iterations of random subsets of gs1 and gs2.

  • output

A created directory that will contain the output.

  • family

Type of protein family: local, global, or figfam.
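For scripted runs, the options above can be assembled into an argument list, as in this Python sketch. The helper `build_command` is hypothetical. It uses only the option names listed above, writes `--output` in lower case as in the parameter list, and assumes that `p3-related-by-clusters` is on the PATH.

```python
FAMILY_TYPES = ("local", "global", "figfam")


def build_command(gs1, gs2, sz1, sz2, iterations, family, output):
    """Build the argument list for one p3-related-by-clusters run."""
    if family not in FAMILY_TYPES:
        raise ValueError(f"family must be one of {FAMILY_TYPES}, got {family!r}")
    return [
        "p3-related-by-clusters",
        "--gs1", str(gs1),
        "--gs2", str(gs2),
        "--sz1", str(sz1),
        "--sz2", str(sz2),
        "--iterations", str(iterations),
        "--family", family,
        "--output", str(output),
    ]


# File names, sizes, and directory name below are made up for illustration.
cmd = build_command("gs1.tbl", "gs2.tbl", 20, 20, 10, "local", "related_out")
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when file names contain spaces.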

© Copyright 2023 | The BV-BRC Team.