##Introduction to sensiPhy The sensiPhy package provides simple functions to perform sensitivity analyses in phylogenetic comparative methods. It uses several simulation methods to estimate the impact of different types of uncertainty on Phylogenetic comparative methods:
Functions for sensitivity analysis
sensiPhy functions use a common syntax that combines the type of uncertainty and the type of analysis:
where “xxx” can be one of the 5 sensiPhy methods (see Figure below):
sensiPhy workflow & functions
Additional functions
The influ
method performs leave-one-out deletion
analysis and detects influential species on parameter estimates.
library(sensiPhy)
# run analysis:
influ <- influ_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy[[1]],
data = alien$data, track=FALSE)
Note that track = TRUE
shows the progress of the
analysis
# Logtransform data
alien.data$logMass <- log(alien.data$adultMass)
# Run sensitivity analysis:
influ <- influ_physig("logMass", data = alien.data, phy = alien.phy[[1]], track=FALSE)
# You can change the method used to estimate signal:
influ2 <- influ_physig("logMass", data = alien.data, phy = alien.phy[[1]], method = "lambda", track=FALSE)
# To check summary results:
summary(influ)
# Most influential speciesL
influ$influential.species
# Visual diagnostics
sensi_plot(influ)
sensi_plot(influ2)
Continuous characters
# Load data:
data("primates")
# Model trait evolution accounting for influential species
adultMass<-primates$data$adultMass
names(adultMass)<-rownames(primates$data)
# Model trait evolution accounting for influential species
influ_cont<-influ_continuous(data = adultMass,phy = primates$phy[[1]],
model = "OU",cutoff = 2,n.cores = 2,track = FALSE)
# Print summary statistics for the transitions rates, aic-values and (if applicable)
# optimisation parameter
summary(influ_cont)
# Visual diagnostics
sensi_plot(influ_cont)
Different evolutionary models from fitContinuous
can be
used: BM
,OU
, EB
,
trend
, lambda
, kappa
,
delta
and drift
.
Discrete characters
# Load data:
data("primates")
# Create a binary trait factor
adultMass_binary<-ifelse(primates$data$adultMass > 7350, "big", "small")
adultMass_binary<-as.factor(as.factor(adultMass_binary))
names(adultMass_binary)<-rownames(primates$data)
# Model trait evolution accounting for influential species - symmetric model
influ_binary <- influ_discrete(data = adultMass_binary, phy = primates$phy[[1]],
model = "SYM", transform = "none", cutoff = 2, n.cores = 2, track = FALSE)
# Print summary statistics for the transition rates, aic-values and
# (if applicable) optimisation parameter
summary(influ_binary)
# Visual diagnostics - in symmetrical model q12 and q21 are, as expected,
# exactly the same
sensi_plot(influ_binary)
Different character models from fitDiscrete
can be fit
by changing the model
argument. These include
ER
(equal-rates), SYM
(symmetric),
ARD
(all-rates-different) and meristic
(stepwise fashion). Similarly, all transformations to the phylogenetic
tree from fitDiscrete
can be used: none
,
EB
, lambda
, kappa
and
delta
.
The clade
method can be used to estimate and test the
influence of specific clades on parameter estimates.
Additional arguments: clade.col: The name of a column in the provided data frame with clades specification (a character vector with clade names). n.species: Minimum number of species in the clade in order to include this clade in the leave-one-out deletion analysis. Default is 5. n.sim: The number of repetitions for the randomization test
data(primates)
# run analysis:
clade <- clade_phylm(log(sexMaturity) ~ log(adultMass), phy = primates$phy[[1]],
data = primates$data, clade.col = "family", track=FALSE)
# To check summary results and most influential clades:
summary(clade)
sensi_plot(clade, "Cebidae")
# Logtransform data
alien.data$logMass <- log(alien.data$adultMass)
# Run sensitivity analysis:
clade <- clade_physig(trait.col = "logMass", data = alien.data, n.sim = 100,
phy = alien.phy[[1]], clade.col = "family", method = "K", track=FALSE)
summary(clade)
sensi_plot(clade, "Bovidae")
sensi_plot(clade, "Sciuridae")
Continuous characters
# Load data:
data("primates")
# Model trait evolution accounting for influential clades
clade_cont <- clade_continuous(data=primates$data, phy = primates$phy[[1]],
model="OU", trait.col = "adultMass", clade.col="family", n.sim=30, n.species=10,
n.cores = 2, track=TRUE)
# Check summary statistics for the transitions rates, aic-values and (if applicable) optimisation parameter
summary(clade_cont)
# Visual diagnostics
sensi_plot(clade_cont,graph="all")
# Plot only one clade and one parameter
sensi_plot(clade_cont,clade="Cebidae",graph = "optpar")
Discrete characters
# Load data:
data("primates")
# Create a binary trait factor
primates$data$adultMass_binary<-ifelse(primates$data$adultMass > 7350, "big", "small")
# Model trait evolution accounting for influential clades
clade_disc <- clade_discrete(data=primates$data,phy = primates$phy[[1]],
model="ARD",transform="kappa",
trait.col = "adultMass_binary", clade.col = "family",
n.sim = 30, n.species=8,n.cores = 2)
# Check summary statistics:
summary(clade_disc)
# Visual diagnostics
sensi_plot(clade_disc)
The samp
method performs analyses of sensitivity to
species sampling by randomly removing species from the dataset and
detecting the effects on parameter estimates.
# run analysis:
samp <- samp_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy[[1]],
data = alien$data, n.sim = 50, track=FALSE)
# You can change the number of repetitions and break intervals:
samp2 <- samp_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy[[1]],
data = alien$data, n.sim = 100, breaks = c(0.1, 0.2, 0.3, 0.4), track=FALSE)
# You can change the phylogenetic model:
samp3 <- samp_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy[[1]],
data = alien$data, model = "kappa", track=FALSE)
# Check results:
summary(samp)
# Visual diagnostics
sensi_plot(samp)
sensi_plot(samp2)
sensi_plot(samp3)
# You can specify which graph and parameter ("slope" or "intercept") to print:
sensi_plot(samp2, graphs = 1)
sensi_plot(samp2, param = "intercept")
sensi_plot(samp2, graphs = 2, param = "estimate")
sensi_plot(samp)
sensi_plot(samp2, graphs = 1)
# Logtransform data
alien.data$logMass <- log(alien.data$adultMass)
# Run sensitivity analysis:
samp <- samp_physig(trait.col = "logMass", data = alien.data, n.sim = 30, phy = alien.phy[[1]], track=FALSE)
# You can change the method used to estimate signal:
samp2 <- samp_physig(trait.col = "logMass", data = alien.data, n.sim = 30,
phy = alien.phy[[1]], method = "lambda", track=FALSE)
# Check results:
summary(samp)
# Visual diagnostics
sensi_plot(samp)
Continuous characters
# Load data:
data("primates")
# Model trait evolution
adultMass<-primates$data$adultMass
names(adultMass)<-rownames(primates$data)
samp_cont<-samp_continuous(data = adultMass,phy = primates$phy[[1]],
model = "OU",n.sim=25,breaks=seq(.05,.2,.05),n.cores = 2, track = FALSE)
# Print summary statistics for the transitions rates, aic-values and (if applicable)
# optimisation parameter
summary(samp_cont)
# Visual diagnostics
sensi_plot(samp_cont)
Discrete characters
# Load data:
data("primates")
# Create a binary trait factor
adultMass_binary<-ifelse(primates$data$adultMass > 7350, "big", "small")
adultMass_binary<-as.factor(as.factor(adultMass_binary))
names(adultMass_binary)<-rownames(primates$data)
# Model trait evolution
samp_binary <- samp_discrete(data = adultMass_binary,phy = primates$phy[[1]], n.sim=25,
breaks = seq(.1,.3,.1),model = "ARD", transform = "lambda", n.cores = 2,
track = TRUE)
# Print summary statistics for the transitions rates, aic-values and (if applicable)
# optimisation parameter
summary(samp_binary)
# Visual diagnostics
sensi_plot(samp_binary)
The tree
method performs sensitivity analysis to
estimate the influence of phylogenetic uncertainty on parameter
estimates.
Additional arguments:
n.tree: Number of times to repeat the analysis with n
different trees picked randomly in the multiPhylo file.
tree
method for phylogenetic linear
regression# Load data
data(alien)
# This analysis needs a multiphylo file:
class(alien$phy)
alien$phy
# run PGLS accounting for phylogenetic uncertainty:
tree <- tree_phylm(log(gestaLen) ~ log(adultMass), phy = alien$phy,
data = alien$data, n.tree = 30, track=FALSE)
# To check summary results:
summary(tree)
# Visual diagnostics
sensi_plot(tree)
tree
method for phylogenetic
signaltree
method
for trait evolution of continuous and discrete charactersContinuous characters
# Load data:
data("primates")
# Model trait evolution accounting for phylogenetic uncertainty
adultMass<-primates$data$adultMass
names(adultMass)<-rownames(primates$data)
tree_cont<-tree_continuous(data = adultMass,phy = primates$phy,
model = "OU",n.tree=30,n.cores = 2,track = FALSE)
# Print summary statistics for the transitions rates, aic-values and (if applicable)
# optimisation parameter
summary(tree_cont)
```{r eval=FALSE}
# Plot only some parameters
sensi_plot(tree_cont,graphs="sigsq")
sensi_plot(tree_cont,graphs="optpar")
Discrete characters
# Load data:
data("primates")
# Create a binary trait factor
adultMass_binary<-ifelse(primates$data$adultMass > 7350, "big", "small")
adultMass_binary<-as.factor(as.factor(adultMass_binary))
names(adultMass_binary)<-rownames(primates$data)
# Model trait evolution accounting for phylogenetic uncertainty
tree_binary<-tree_discrete(data = adultMass_binary,phy = primates$phy,
model = "ARD",transform = "none",n.tree = 30,n.cores = 2,track = FALSE)
# Print summary statistics for the transitions rates, aic-values and (if applicable)
# optimisation parameter
summary(tree_binary)
```{r eval=FALSE}
## Visual diagnostics
sensi_plot(tree_binary,graphs="q12")
sensi_plot(tree_binary,graphs="q21")
The intra
method performs Sensitivity analysis for
intraspecific variation and measurement error
intra
method for phylogenetic linear
regressionAdditional arguments:
Vy: Name of the column containing the standard
deviation or the standard error of the response variable. When
information is not available for one taxon, the value can be 0 or
NA.
Vx: Name of the column containing the standard
deviation or the standard error of the predictor variable. When
information is not available for one taxon, the value can be 0 or
NA
x.transf: Transformation for the response variable
(e.g. log or sqrt). Please use this argument instead of transforming
data in the formula directly.
y.transf: Transformation for the predictor variable
(e.g. log or sqrt). Please use this argument instead of transforming
data in the formula directly.
distrib: A character string indicating which
distribution to use to generate a random value for the response and/or
predictor variables. Default is normal distribution: “normal” (function
rnorm). Uniform distribution: “uniform” (runif) Warning: we recommend to
use normal distribution with Vx or Vy = standard deviation of the
mean.
n.intra: Number of times to repeat the analysis with n
different trees picked randomly in the multiPhylo file. If NULL, times =
2
# run PGLS accounting for intraspecific variation:
intra <- intra_phylm(gestaLen ~ adultMass, phy = alien$phy[[1]],
data = alien$data, Vy = "SD_gesta", Vx = "SD_mass",
n.intra = 100, x.transf = log, y.transf = log, track=FALSE)
# To check summary results:
summary(intra)
# Visual diagnostics
sensi_plot(intra)
intra
method for phylogenetic
signalAdditional arguments:
V: Name of the column containing the standard deviation
or the standard error of the trait variable. When information is not
available for one taxon, the value can be 0 or NA.
The ‘sensiPhy’ package can perform sensitivity analysis by interacting multiple types of uncertainty (described above).
‘sensiPhy’ implements functions to study interactions of both
phylogenetic uncertainty (tree-functions) and data uncertainty
(intra-functions) with sampling uncertainty (clade
,
influ
, and samp
), as well as interactions
between data and phylogenetic uncertainty (tree_intra
).
Both intra
and tree
methods can be used
together with all other uncertainties to evaluate the interaction
between two types of uncertainty at the same time.
tree_influ
method for
phylogenetic linear regressiontree_clade
method for
phylogenetic linear regressiontree_samp
method for
phylogenetic linear regressiontree_intra
method for
phylogenetic linear regression# Load data:
data(alien)
# run PGLS accounting for intraspecific and phylogenetic variation:
intra.tree <- tree_intra_phylm(gestaLen ~ adultMass, data = alien$data, phy = alien$phy,
Vy = "SD_gesta", n.intra = 10, n.tree = 30,
y.transf = log, x.transf = log, track=FALSE)
# To check summary results:
summary(intra.tree)
# Visual diagnostics
sensi_plot(intra.tree, uncer.type = "all") #or uncer.type = "tree", uncer.type = "intra"
The function match_dataphy
combines phylogeny and data
to ensure that tips in phylogeny match data and that observations with
missing values are removed.
This function uses all variables provided in the ‘formula’ to match data and phylogeny. To avoid cropping the full dataset, ‘match_dataphy’ searches for NA values only on variables provided by formula. Missing values on other variables, not included in ‘formula’, will not be removed from data.
library(sensiPhy)
# Load data:
data(alien)
# Match data and phy based on model formula:
comp.data <- match_dataphy(gestaLen ~ homeRange, data = alien$data, alien$phy[[1]], track=FALSE)
# With a `multiphylo` tree:
comp.data2 <- match_dataphy(homeRange ~ homeRange, data = alien$data, alien$phy, track=FALSE)
# Check combined data:
knitr::kable(head(comp.data$data))
The supplementary function miss.phylo.d
tests if there
is a phylogenetic signal in missing data, using the D framework
developed in Fritz & Purvis (2010). A phylogenetic signal in missing
data can bias comparative analyses, because it means that absence of
data points is not independent of the phylogeny. This function can help
alert users to this problem, suggesting that more balanced data
collection would be required.
The function calculates D statistic for missing data. Missingness is recoded into a binary variable (1 = missing, 0 = non missing).
The funcion tree_bd
estimates diversification and
speciation rates evaluating uncertainty in tree topology. It estimates
net diversification rate using geiger::bd.ms
Magallon and
Sanderson (2000) method or speciation rate using
geiger::bd.km
Kendall-Moran method for n
trees, randomly picked from a multiPhylo
file.
To estimate diversification rate with Magallon and Sanderson method
library(sensiPhy)
# Load data:
data("primates")
# Run analysis
fit.ms <- tree_bd(phy = primates.phy, n.tree = 30, method = "ms")
# Check results
summary(fit.ms)
# Plot data
sensi_plot(fit.ms)
To estimate speciation rate Kendall-Moran method
sensiPhy leverages comparative methods implemented in specific R
packages:
1. phylolm
for pgls regressions
2. phytools
for phylogenetic signal estimates
3. geiger
for macroevolutionary models (continuous and
discrete trait evolution)
Nevertheless, users can still use sensiPhy to perform sensitivity analysis when they had performed their initial comparative analyses in another package.
For instance, if you have used caper
,
phytools
or gls
to fit your full models, you
can still use sensiPhy
to test the sensitivity of those
models to multiple types of uncertainty. In order to do this, users will
need to set arguments that mirror the specific parameters from their
full analysis. For example, to compare results one needs to use the same
evolutionary model in pgls (e.g “lambda”, “BM”, etc), or the same metric
for phylogenetic signal tests (K or lambda). See below for some
examples:
### prepare dataset for caper pgls:
library(caper)
library(phytools)
library(phylolm)
library(sensiPhy)
# selected variables from dataset:
alien.data <- alien.data[c("gestaLen", "adultMass")]
# create variable with species names:
alien.data$sp <- rownames(alien.data)
# caper
# prepare comparative dataset:
comp.dat <- comparative.data(data = alien.data, phy = alien.phy[[1]],
names.col = "sp")
# check comparative dataset:
print(comp.dat)
### Run PGLS analysis (full model)
# using caper (lambda)
fit.caper.lam <- pgls(log(gestaLen)~ log(adultMass), comp.dat, lambda="ML")
coef(fit.caper.lam)
# using caper (delta)
fit.caper.del <- pgls(log(gestaLen)~ log(adultMass), comp.dat, delta="ML")
coef(fit.caper.del)
# using phylolm (lambda)
fit.phylo.lam <- phylolm(log(gestaLen)~ log(adultMass), comp.dat$data, comp.dat$phy,
model = "lambda")
coef(fit.phylo.lam)
# using phylolm (del)
fit.phylo.del <- phylolm(log(gestaLen)~ log(adultMass), comp.dat$data, comp.dat$phy,
model = "delta")
coef(fit.phylo.del)
### run sensitivity analysis with sensiPhy (influential species):
library(sensiPhy)
# run analysis (lambda):
lam <- influ_phylm(log(gestaLen) ~ log(adultMass), phy = comp.dat$phy,
data = comp.dat$data, model = "lambda")
# check full model estimates and compare to initial estimates:
lam$full.model.estimates
# test for influential species:
summary(lam)
# run analysis (delta):
del <- influ_phylm(log(gestaLen) ~ log(adultMass), phy = comp.dat$phy,
data = comp.dat$data, model = "delta")
# check full model estimates and compare to initial estimates:
del$full.model.estimates
# test for influential species:
summary(del)
# 2. phylogenetic signal:-------------------------------------------------------
library(picante)
library(phytools)
library(sensiPhy)
### Estimate phylogenetic signal:
# using picante (K)
phylosignal(log(comp.dat$data$adultMass), comp.dat$phy, reps = 1000)
# using phytools (K):
phytools::phylosig(x = log(comp.dat$data$adultMass), tree = comp.dat$phy,
nsim = 1000,
method = "K", test = T)
# using phytools (lambda):
phytools::phylosig(x = log(comp.dat$data$adultMass), tree = comp.dat$phy,
nsim = 1000,
method = "lambda", test = T)
### run sensitivity analysis with sensiPhy (influential species):
library(sensiPhy)
# Load data:
data(alien)
# Logtransform data
comp.dat$data$logMass <- log(comp.dat$data$adultMass)
# Run sensitivity analysis (K):
influ.k <- influ_physig("logMass", data = comp.dat$data, phy = comp.dat$phy,
method = "K")
# check full model estimates and compare to initial estimates:
influ.k$full.data.estimates
# check for influential species:
summary(influ.k)
# Run sensitivity analysis (lambda):
influ.lam <- influ_physig("logMass", data = comp.dat$data, phy = comp.dat$phy,
method = "lambda")
# check full model estimates and compare to initial estimates:
influ.lam$full.data.estimates
# check for influential species:
summary(influ.lam)
Although most functions implemented in sensiPhy are reasonably fast
(runtime less them 1 min), it is important to be aware that some
functions might take a long time to run. This is more likely when
analyzing models with a high number of species (>1000) or number of
repetitions, and also when interacting two types of uncertainties
(e.g. functions: tree_clade
and tree_samp
). To
keep track of the time while running functions, we recommend users to
set the argument “track = TRUE”
which prints a progress bar
and gives a good idea of how long the analysis will take before it
finishes. As an indication, some examples of elapsed time on a standard
desktop computer [Linux, Intel i7, 2.7GHz, 12GB ram] are presented
below:
Analyses with single uncertainty:
tree_phylm
function | N species | N trees | time |
---|---|---|---|
tree_phylm | 100 | 30 | ~ 2 sec |
tree_phylm | 100 | 100 | ~ 5 sec |
Analyses with single uncertainty:
clade_phylm
function | N species | N sim | time |
---|---|---|---|
clade_phylm | 100 | 100 | ~ 7 sec |
clade_phylm | 100 | 300 | ~ 31 sec |
clade_phylm | 100 | 500 | ~ 61 sec |
Analyses with single uncertainty:
samp_phylm
function | N species | N sim | time |
---|---|---|---|
samp_phylm | 100 | 100 | ~ 11 sec |
samp_phylm | 100 | 300 | ~ 25 sec |
samp_phylm | 100 | 500 | ~ 43 sec |
samp_phylm | 500 | 100 | ~ 22 sec |
samp_phylm | 1000 | 100 | ~ 35 sec |
Analyses with single uncertainty:
influ_phylm
function | N species | time |
---|---|---|
influ_phylm | 100 | ~ 4 sec |
influ_phylm | 500 | ~ 43 sec |
influ_phylm | 1000 | ~ 148 sec |
Thanks for you interest in contributing with sensiPhy
.
First you need to register at Github.
You can help developing sensiPhy
in different ways:
If you have any problem running sensiPhy
functions,
please report the problem as a new issue. To
do this, follow the steps below:
sensiPhy
documentationIf you find a typo on sensiPhy
documentation or if you
want to make a small change on any file, you can do it online. Search
for the file you want to change at the master branch,
then:
send a pull request
If you want to make a bigger contribution and help us with
sensiPhy
code:
git checkout -b my-new-feature
)git commit -am 'Add some feature'
)git push origin my-new-feature
)If you need to ask something via email, send to [email protected]
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.