TreeSort Service¶
Overview¶
The idea behind TreeSort is the observation that, if there is no reassortment, the evolutionary histories of different segments should be identical. TreeSort therefore uses a phylogenetic tree for one segment (e.g., the HA segment of influenza A virus) as an evolutionary hypothesis for another segment (e.g., the NA segment). We refer to the first segment as the reference and the second segment as the challenge. By trying to fit the sequence alignment of the challenge segment to the reference tree, TreeSort identifies points on that tree where this evolutionary hypothesis breaks. The “breaking” manifests as a mismatch between the divergence time on the reference tree (e.g., 1 year of divergence between sister clades) and the improbably high number of substitutions in the challenge segment that would be required to explain the reference tree topology under the null hypothesis of no reassortment.
TreeSort has demonstrated very high accuracy in reassortment inference in simulations (manuscript in preparation). TreeSort can process datasets with tens of thousands of virus strains in just a few minutes and can scale to very large datasets with hundreds of thousands of strains. (This overview is from https://github.com/flu-crew/TreeSort/blob/main/README.md)
NOTE¶
The current version of TreeSort is meant to be used for Influenza viruses ONLY. We hope to provide an updated version in the near future that can be used with common segmented viruses.
See Also¶
TreeSort Service Tutorial (TODO)
Using the TreeSort Service¶
The “TreeSort” submenu under the “TOOLS & SERVICES” main menu (Viral Tools category) opens the input form.
Note: You must be logged into BV-BRC to use this service.
Parameters¶
Input file¶
Select a FASTA file from your workspace. Note that the FASTA headers/deflines have very strict formatting requirements:
The segment name and sample date must be formatted like the following example (“|segment|sample date” are in yellow):
Only the following Influenza segment names are recognized: PB2, PB1, PA, HA, NP, NA, MP, and NS.
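As a rough pre-upload check, the sketch below scans a defline for a recognized segment name followed by a sample date. The exact layout is an assumption: it supposes the defline contains "|segment|sample date" with the date written as YYYY, YYYY-MM, or YYYY-MM-DD. The function name check_defline and the sample dates are hypothetical and not part of TreeSort or BV-BRC.

```python
import re

# Segment names recognized by the service (from the list above).
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "MP", "NS")

# ASSUMPTION: the defline contains "|<segment>|<date>" with an ISO-style date
# (YYYY, YYYY-MM, or YYYY-MM-DD). Adjust this to the format the service expects.
DEFLINE_RE = re.compile(
    r"\|(?P<segment>[^|]+)\|(?P<date>\d{4}(?:-\d{2}){0,2})(?:\||$)"
)

def check_defline(defline):
    """Return (segment, date) if the defline looks valid, else None."""
    match = DEFLINE_RE.search(defline.lstrip(">").strip())
    if match is None or match.group("segment") not in SEGMENTS:
        return None
    return match.group("segment"), match.group("date")

# Made-up deflines: the first uses a recognized segment, the second does not.
print(check_defline(">A/Texas/32/2021|HA|2021-03-15"))
print(check_defline(">A/Texas/32/2021|H3|2021-03-15"))
```

Running a check like this over every header before submitting can catch unrecognized segment names early, although only the service itself is authoritative on what it accepts.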
Output folder¶
The directory in your workspace where a directory will be created for the TreeSort results.
Output name¶
The name of the directory that will be created under “Output folder”. This name will also be used for the primary results filename (<output name>.tre).
Reference segment¶
Reassortment events are acquisitions of one or more novel segments relative to this (fixed) reference segment.
Segments¶
Select at least 2 segments to include in the analysis.
Advanced options¶
Match type¶
Strain: Match the names (deflines in FASTAs) across the segments based on the strain name. E.g., “A/Texas/32/2021” or “A/swine/A0229832/Iowa/2021”. Works for flu A, B, C, and D, and no pre-processing is needed to standardize the names before the analysis.
EPI_ISL_XXX: Segments are matched based on the “EPI_ISL_XXX” field (if present in deflines).
RegEx: Provide your own custom regular expression to match the segments across the alignments.
Match RegEx¶
When the RegEx match type is selected, enter your custom regular expression in this field.
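To illustrate what matching by regular expression means, here is a minimal sketch that groups deflines from two segments by the text a pattern matches. The pattern EPI_ISL_\d+, the sample deflines, and the helper match_key are illustrative assumptions; the service may apply your expression differently (for example, in how it treats capture groups).

```python
import re
from collections import defaultdict

# Hypothetical pattern: the text it matches is used as the shared strain key.
PATTERN = re.compile(r"EPI_ISL_\d+")

def match_key(defline):
    """Return the matched key for a defline, or None if nothing matches."""
    found = PATTERN.search(defline)
    return found.group(0) if found else None

# Made-up deflines for two segments of the same two strains.
deflines = {
    "HA": ["A/Texas/32/2021|EPI_ISL_100001|HA|2021-03-15",
           "A/Iowa/7/2021|EPI_ISL_100002|HA|2021-04-02"],
    "NA": ["A/Iowa/7/2021|EPI_ISL_100002|NA|2021-04-02",
           "A/Texas/32/2021|EPI_ISL_100001|NA|2021-03-15"],
}

strains = defaultdict(dict)
for segment, names in deflines.items():
    for name in names:
        key = match_key(name)
        if key is not None:
            strains[key][segment] = name

# Keep only strains for which every segment was found.
complete = {k: v for k, v in strains.items() if len(v) == len(deflines)}
print(sorted(complete))
```

A pattern that matches the same text in every segment of a strain, and different text for different strains, is what makes the grouping unambiguous.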
Inference method¶
local: (default)
mincut: The mincut method:
Always determines the most parsimonious reassortment placement, even in ambiguous circumstances.
Uses the reassortment test to cut the reference phylogeny into the optimum (smallest) number of non-reassorting parts with theoretical guarantees on optimality.
Is more robust than the current “local” method in many instances, and does not result in “uncertain” reassortment inferences with the ‘?’ annotation.
Reference tree inference method¶
The tool that will be used to infer the reference tree:
FastTree: (default)
Infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences.
Can handle alignments with up to a million sequences in a reasonable amount of time and memory.
IQ-Tree (recommended for better accuracy)
A fast search algorithm (Nguyen et al., 2015) to infer phylogenetic trees by maximum likelihood.
Allowed deviation¶
Maximum deviation from the estimated substitution rate within each segment. The default is 2, meaning that the substitution rate on a particular tree branch is allowed to be twice as high or twice as low as the estimated rate. The default value was estimated from empirical influenza A data.
P-value threshold¶
The cutoff p-value for the reassortment tests. The default is 0.001 (0.1 percent). Decrease this value for a more stringent analysis, or increase it for a more permissive one.
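The interplay between the allowed deviation and the p-value threshold can be illustrated with a simplified calculation. The sketch below is not TreeSort's actual statistical test: it assumes that substitutions on a branch follow a Poisson distribution under a strict molecular clock, and all names and numbers (reassortment_p_value, the rate, the sequence length) are hypothetical.

```python
import math

def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-mean)
    cumulative = 0.0
    for i in range(k):
        cumulative += term
        term *= mean / (i + 1)
    return max(0.0, 1.0 - cumulative)

def reassortment_p_value(observed_subs, years, rate, seq_len, deviation=2.0):
    """Tail probability of seeing at least `observed_subs` substitutions.

    ASSUMPTION (illustration only): the rate is inflated by `deviation`,
    the fastest rate still tolerated, so only substitution counts that
    exceed even that rate look surprising.
    """
    expected = rate * deviation * years * seq_len
    return poisson_upper_tail(observed_subs, expected)

# Made-up numbers: 1 year of divergence on the reference tree, a rate of
# 0.003 substitutions/site/year, and a 1400-nt challenge segment.
P_THRESHOLD = 0.001
for observed in (10, 40):
    p = reassortment_p_value(observed, years=1.0, rate=0.003, seq_len=1400)
    print(observed, "flagged" if p < P_THRESHOLD else "not flagged")
```

With these numbers, about 8 substitutions are expected at the fastest tolerated rate, so 10 observed substitutions are unremarkable while 40 fall far below the threshold. A larger allowed deviation or a smaller p-value threshold both make it harder for a branch to be flagged.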
Clades filename¶
The path to an output file where clades with evidence of reassortment will be saved.
Estimate molecular clock rates for different segments¶
Estimate molecular clock rates for different segments, assuming equal rates.
Collapse near-zero length branches into multifurcations¶
Collapse near-zero length branches into multifurcations (by default, TreeSort collapses all branches shorter than 10⁻⁷ (1e-7) and then optimizes the multifurcations).
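Collapsing can be pictured as removing a near-zero internal branch and attaching its children directly to the parent. The following is a minimal sketch on a hypothetical dictionary-based tree; the node layout, the choice to keep short leaf branches, and the function collapse_short_branches are assumptions for illustration, and TreeSort's subsequent optimization of the multifurcations is not shown.

```python
THRESHOLD = 1e-7  # branches shorter than this are collapsed

def collapse_short_branches(node, threshold=THRESHOLD):
    """Return a copy of `node` with near-zero internal branches removed.

    Each node is a dict: {"name": str, "length": float, "children": [nodes]}.
    ASSUMPTION: leaves are kept even when their branch is short; only
    internal branches are collapsed.
    """
    new_children = []
    for child in node["children"]:
        child = collapse_short_branches(child, threshold)
        if child["children"] and child["length"] < threshold:
            # Lift the grandchildren onto this node; the lost length is negligible.
            new_children.extend(child["children"])
        else:
            new_children.append(child)
    return {"name": node["name"], "length": node["length"], "children": new_children}

def leaf(name, length):
    """Build a leaf node (hypothetical helper)."""
    return {"name": name, "length": length, "children": []}

# ((A, B) on a near-zero stem, C) becomes the multifurcation (A, B, C).
tree = {"name": "root", "length": 0.0, "children": [
    {"name": "ab", "length": 1e-9, "children": [leaf("A", 0.01), leaf("B", 0.02)]},
    leaf("C", 0.03),
]}
collapsed = collapse_short_branches(tree)
print([c["name"] for c in collapsed["children"]])
```

The function builds a new tree rather than editing in place, so the original can still be compared against the collapsed version.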
Output Results¶
Clicking the Jobs indicator at the bottom of the BV-BRC page opens the Jobs Status page, which displays all current and previous service jobs and their statuses.
Once the job has completed, you can view the results by double-clicking the job or clicking the “View” button on the green vertical Action Bar on the right-hand side of the page.
The results page consists of a header describing the job and a list of output files, which are generated by the TreeSort service and saved in your Workspace.
Result files¶
TreeSort_analysis_results.html: An overview of the analysis results with links to all files generated by TreeSort, descriptions of the file types, and guidance on how to interpret the result data.
<output name>.tre: An annotated tree file in Nexus format where output name is the text entered in the Output name field.
Segment-specific files¶
These files are generated for every virus segment included in the analysis, where segment name is PB2, PB1, PA, HA, NP, NA, MP, or NS.
<segment name>-input.fasta.aln
<segment name>-input.fasta.aln.dates.csv
<segment name>-input.fasta.aln.rooted.tre
<segment name>-input.fasta.tre
<segment name>-input.fasta.aln.treetime
outliers.tsv
root_to_tip_regression.pdf
rtt.csv
References¶
Alexey Markin, Catherine A. Macken, Amy L. Baker, Tavis K. Anderson, “Revealing reassortment in influenza A viruses with TreeSort” bioRxiv 2024.11.15.623781; doi: https://doi.org/10.1101/2024.11.15.623781