The goal of Collecting and Identification the Outbreak Cluster (caIRA) is to find the cluster based tree and metadata of the data. The package is based on the paper:
Ragonnet-Cronin, M., Hodcroft, E., Hué, S. et al. Automated analysis of phylogenetic clusters. BMC Bioinformatics 14, 317 (2013). https://doi.org/10.1186/1471-2105-14-317
Hall M, Woolhouse M, Rambaut A (2015) Epidemic Reconstruction in a Phylogenetics Framework: Transmission Trees as Partitions of the Node Set. PLoS Comput Biol 11(12): e1004613. https://doi.org/10.1371/journal.pcbi.1004613
This package is the part of Dhihram Tenrisau, MSc Health Data Science summer project, ‘Phylodynamic of Norovirus in UK 2003-2023’. The project is supervised by Stéphane Hué.
By using this package, the hope will be emerging (not the the disease emerging) like the lyrics of the song ‘Ah Ça Ira’:
Ah ça ira, réjouis-toi! (It’ll be fine, rejoice!)
Ah ça ira, le bon temps viendra (It’ll be fine, good times will come)
You can install the development version of huebreaker from GitHub with:
This package needs the additional package tidyverse
,
ape
, treeio
, and dplyr
You need the 2 data in this file, the first data is tree files
(newick or nexus) and metadata file. The metadata file consists of
label
, location
, and date
columns. The label
column must be same with the label in
tree file.
#open data
tree <- read.tree("https://raw.githubusercontent.com/Dhihram/huebreaker/master/data/random_tree.nwk")
metat <- read.csv("https://raw.githubusercontent.com/Dhihram/huebreaker/master/data/random_metadata.csv")
#change date format
metat$date <- as.Date(metat$date, format = "%m/%d/%Y")
#check the data
str(tree)
#> List of 5
#> $ edge : int [1:38, 1:2] 21 22 23 24 24 23 25 26 27 27 ...
#> $ edge.length: num [1:38] 0.217 0.217 0.389 0.942 0.963 ...
#> $ Nnode : int 19
#> $ node.label : chr [1:19] "59" "87" "97" "100" ...
#> $ tip.label : chr [1:20] "t8" "t5" "t1" "t17" ...
#> - attr(*, "class")= chr "phylo"
#> - attr(*, "order")= chr "cladewise"
str(metat)
#> 'data.frame': 20 obs. of 4 variables:
#> $ X : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ label : chr "t8" "t5" "t1" "t17" ...
#> $ location: chr "Area1" "Area1" "Area1" "Area1" ...
#> $ date : Date, format: "2020-01-31" "2020-01-05" ...
The intial tree can be seen below with ggtree
package
This part of package, genclus
will utilize:
Finding and clustering the monophylectic groups in the tree
Add the parameter of the clusters:
bootstrap_treshold
, data_range
, and
samearea
Keep the maximum monophylectic groups in the cluster identify
The bootstrap_treshold
is the minimum bootstrap value to
be considered as a cluster. The data_range
is the range of
the days to be considered as a cluster. The samearea
is the
boolean value to consider the same area as a cluster.
res <- genclus(tree, metat, bootstrap_threshold = 80, date_range = 30, samearea = TRUE)
knitr::kable(res)
Group | Tips | Bootstrap | ParentNode | AreaName | MinDate | MaxDate | diff | NumTips |
---|---|---|---|---|---|---|---|---|
t1, t17 | t1, t17 | 93 | 27 | Area1 | 2020-01-20 | 2020-01-28 | 8 | 2 |
t14, t18 | t14, t18 | 84 | 39 | Area3 | 2020-01-23 | 2020-01-26 | 3 | 2 |
We can check the clusters in tree
Since BEAST tree has different structure of the tree, we developed
beastclus
. In BEAST tree, the bootstrapping value is
posterior probability with scale 0-1. The beast tree is exported using
read.beast
from treeio
In this example, I will
use the Monkeypox phylogenetic tree and metadata from GISAID:
#beast tree
beast_tree <- read.beast('https://raw.githubusercontent.com/Dhihram/caIRA/refs/heads/master/inst/extdata/pox_strict_comb.tree')
#metadata
metadata<-read.csv('https://raw.githubusercontent.com/Dhihram/caIRA/refs/heads/master/inst/extdata/metadata_samp.csv')
Generating cluster
res2 <- beastclus(beast_tree, metadata, post_threshold = 0.50, date_range = 90, samearea = TRUE)
knitr::kable(res2)
ParentNode | label | Posterior | AreaName | min_date | max_date | NumTips | dif |
---|---|---|---|---|---|---|---|
79 | hMpxV/Indonesia/BT-Biokes-MP303/2023, hMpxV/Indonesia/JK-NIHRD-MP015/2023 | 0.83 | Jabodetabek | 2023-10-20 | 2023-12-12 | 2 | 53 |
64 | hMpxV/Indonesia/JB-NIHRD-MP254/2023, hMpxV/Indonesia/JK-Biokes-MP310/2023, hMpxV/Indonesia/JK-NIHRD-MP022/2023, hMpxV/Indonesia/JK-NIHRD-MP039/2023, hMpxV/Indonesia/JK-NIHRD-MP080/2023, hMpxV/Indonesia/JK-NIHRD-MP081/2023, hMpxV/Indonesia/JK-NIHRD-MP159/2023, hMpxV/Indonesia/JK-NIHRD-MP218/2023, hMpxV/Indonesia/JK-NIHRD-MP225/2023, hMpxV/Indonesia/JK-NIHRD-MP246/2023, hMpxV/Indonesia/JK-NIHRD-MP252/2023 | 0.77 | Jabodetabek | 2023-10-23 | 2023-12-15 | 11 | 53 |
89 | hMpxV/Indonesia/JK-Biokes-MP010/2024, hMpxV/Indonesia/JK-Biokes-MP011/2024, hMpxV/Indonesia/JK-Biokes-MP020/2024 | 0.99 | Jabodetabek | 2024-01-22 | 2024-02-17 | 3 | 26 |
Also, we can check the clusters in tree