BD shifts


Making BD shift plots

Note: if you haven’t checked out the beta diversity ordinations vignette yet, I recommend looking at that one first.

Introduction

While the beta diversity ordinations of gradient fractions provide a nice overview of isotope incorporation at the whole-community level, they don’t give a good idea of the magnitude of the BD shift (i.e., was there a lot of isotope incorporation or a little for each labeled-treatment?).

Let’s assume that the gradient fraction communities of a labeled-treatment and its corresponding unlabeled-control would be (approximately) the same at the same buoyant densities if no incorporation occurred. If so, then the pairwise beta diversity between gradient fractions of the treatment vs control (e.g., the beta diversity between the 13C & 12C communities at a BD of 1.75 g ml^-1) would be ~0 (no differentiation) across the BD range. However, if some taxa incorporated isotope in the labeled-treatment, then they would shift to heavier buoyant densities. This would change the labeled-treatment communities in two places: at the buoyant densities where the taxa would have been if unlabeled, and at the heavier buoyant densities that the taxa shifted to due to isotope incorporation.

In other words, if we make pairwise treatment-vs-control beta diversity calculations between gradient fraction communities, then we should see evidence of community-level BD shifts in the form of ‘spikes’ in beta diversity.

The only major issue with this approach is that the BD range of each gradient fraction varies from gradient to gradient, so fractions from different gradients usually only partially overlap. To deal with this issue, we have taken the approach of weighting the beta diversity based on gradient fraction overlap. For instance, if 2 labeled-treatment fractions overlapped 1 control fraction by 40% and 60%, then the final beta diversity value for that control fraction would be the weighted average of its beta diversity to treatment fraction 1 (40% weight) and to treatment fraction 2 (60% weight). Note that this makes all beta diversity values (and their associated buoyant densities) relative to the control.
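As a quick illustration of the weighting, here is a minimal sketch in base R. The two distance values (0.20 and 0.30) are made up for the example; only the 40% and 60% overlap weights come from the text above.

```r
# Hypothetical beta diversity values between one control fraction and the
# two treatment fractions that overlap it (made-up numbers for illustration)
dist_to_control <- c(trt_frac1 = 0.20, trt_frac2 = 0.30)

# Percent overlap of each treatment fraction with the control fraction
perc_overlap <- c(trt_frac1 = 40, trt_frac2 = 60)

# Overlap-weighted mean beta diversity assigned to this control fraction
wmean_dist <- weighted.mean(dist_to_control, w = perc_overlap)
print(wmean_dist)
```

The result sits closer to the distance for treatment fraction 2 because that fraction covers more of the control fraction.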

The following analysis measures these community-wide BD shifts with these steps:

  1. The dataset is split into pairwise comparisons between each labeled-treatment and its corresponding unlabeled-control.
  2. The percent BD overlap of treatment gradient fractions relative to the control is calculated.
  3. For overlapping gradient fractions in each treatment-control comparison, beta diversity is calculated between the gradient fraction communities.
  4. The weighted mean beta diversity (weighted by % fraction overlap) is calculated.
  5. The resulting data.frame can then easily be plotted with ggplot.
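Step 2 can be sketched in base R as follows. The helper `perc_overlap()` is hypothetical and is not HTSSIP's internal code; it assumes that overlap is expressed as a percentage of the control fraction's BD range, and the BD values are made up.

```r
# Hypothetical helper (not part of HTSSIP): percent of a control fraction's
# BD range that is covered by each treatment fraction's BD range
perc_overlap <- function(ctrl_min, ctrl_max, trt_min, trt_max) {
  overlap <- pmin(ctrl_max, trt_max) - pmax(ctrl_min, trt_min)
  overlap <- pmax(overlap, 0)  # non-overlapping ranges get 0
  100 * overlap / (ctrl_max - ctrl_min)
}

# One control fraction spanning BD 1.700-1.704 compared with three
# treatment fractions (made-up values)
ovl <- perc_overlap(ctrl_min = 1.700, ctrl_max = 1.704,
                    trt_min = c(1.698, 1.702, 1.706),
                    trt_max = c(1.702, 1.706, 1.710))
print(round(ovl, 1))
```

In this toy case the first two treatment fractions each cover about half of the control fraction, and the third does not overlap it at all, so it would not contribute to the weighted mean.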

Moreover, a permutation test is conducted to identify “BD shift windows”, which are regions of high beta diversity that likely resulted from BD shifts of taxa in the treatment (and not in the unlabeled control). The method involves permuting OTU abundances (HTSSIP offers multiple permutation methods; see BD_shift()), and re-calculating weighted beta diversity values among overlapping fractions in the treatment versus the control.
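To give a feel for the idea, here is a toy permutation interval in base R. It is only a sketch under simplifying assumptions: it uses Bray-Curtis dissimilarity instead of weighted UniFrac, a plain shuffle of OTU abundances within the treatment fraction, and made-up counts. It is not how BD_shift() is implemented.

```r
set.seed(42)

# Bray-Curtis dissimilarity between two abundance vectors
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Made-up OTU counts for one control and one treatment fraction
ctrl <- c(50, 30, 10, 5, 5, 0)
trt  <- c(5, 10, 10, 5, 30, 40)

# Observed dissimilarity between the two fractions
observed <- bray_curtis(ctrl, trt)

# Null distribution: shuffle the treatment abundances among OTUs and
# recompute the dissimilarity each time
nperm <- 999
null_dist <- replicate(nperm, bray_curtis(ctrl, sample(trt)))

# Permutation-based interval; a fraction would be flagged if the observed
# value falls above the upper bound
ci <- quantile(null_dist, probs = c(0.025, 0.975))
is_shift <- observed > ci[[2]]
print(list(observed = observed, ci = ci, is_shift = is_shift))
```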

Dataset

First, let’s load some packages including HTSSIP.

library(dplyr)
library(tidyr)
library(ggplot2)
library(HTSSIP)

Also let’s get an overview of the phyloseq object that we’re going to use.

physeq_S2D2
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 139 samples ]
## sample_data() Sample Data:       [ 139 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]

Subsetting the phyloseq object

As with the beta diversity ordinations, we are going to split up the dataset into individual labeled-treatment + corresponding unlabeled-control comparisons. Treatment-control correspondence is based on the day from substrate addition. So, we have to parse the dataset by Substrate & Day.

params = get_treatment_params(physeq_S2D2, c('Substrate', 'Day'))
params = dplyr::filter(params, Substrate!='12C-Con')
ex = "(Substrate=='12C-Con' & Day=='${Day}') | (Substrate=='${Substrate}' & Day == '${Day}')"
physeq_S2D2_l = phyloseq_subset(physeq_S2D2, params, ex)
physeq_S2D2_l
## $`(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Cel' & Day == '14')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Glu' & Day == '14')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 47 samples ]
## sample_data() Sample Data:       [ 47 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Glu' & Day == '3')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]

Calculating BD shift

Now, let’s measure the BD shift for just 1 subset (1 item in the list of phyloseq objects).

Note: we are just going to use 10 permutations to speed up the analysis.

wmean1 = BD_shift(physeq_S2D2_l[[2]], nperm=10)
cat('Subset:', names(physeq_S2D2_l)[2], '\n')
## Subset: (Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Cel' & Day == '14')
wmean1 %>% head(n=3)
##   perm_id           sample.x  distance Replicate.x IS__CONTROL.x BD_min.x
## 1       0 12C-Con.D14.R1_F12 0.2587625           1          TRUE 1.736237
## 2       0 12C-Con.D14.R1_F13 0.2553421           1          TRUE 1.730773
## 3       0 12C-Con.D14.R1_F23 0.1449341           1          TRUE 1.692527
##   BD_max.x BD_range.x perc_overlap n_over_fracs wmean_dist
## 1 1.739516   0.003279    100.00000            1  0.2587625
## 2 1.736237   0.005464     19.98536            2  0.2658108
## 3 1.695805   0.003278     33.34350            2  0.1020456
##   wmean_dist_CI_low_global wmean_dist_CI_high_global wmean_dist_CI_low
## 1               0.07377982                 0.3344957        0.06284920
## 2               0.07377982                 0.3344957        0.06783138
## 3               0.07377982                 0.3344957        0.15831671
##   wmean_dist_CI_high
## 1          0.1054504
## 2          0.1028955
## 3          0.2076602

Note that the sample.x column contains only 12C-Con control samples, while the comparison column (sample.y) holds the treatment gradient fraction samples. The “wmean_dist_CI_[low/high]” columns list the per-fraction confidence interval (CI) bounds calculated by the permutation test. The “wmean_dist_CI_*_global” columns define the CI across all gradient fractions.

OK. Let’s plot the results!

x_lab = bquote('Buoyant density (g '* ml^-1*')')
y_lab = 'Weighted mean of\nweighted-Unifrac distances'
ggplot(wmean1, aes(BD_min.x, wmean_dist)) +
  geom_line(alpha=0.7) +
  geom_point() +
  labs(x=x_lab, y=y_lab, title='Beta diversity of 13C-treatment relative to 12C-Con') +
  theme_bw() 

Each point represents the weighted mean of beta diversity values between all 13C-treatment fractions that overlap a particular 12C-control fraction, so there should be 1 point per 12C-control gradient fraction.

Note the 2 spikes in beta diversity. The 2nd spike is larger than the first, which is likely because there are more taxa in the ‘light’ gradient fractions (1st spike), so the loss of a few taxa (due to BD shifting) impacts beta diversity less there than in the ‘heavy’ gradient fractions, where there are fewer taxa.

Identifying BD shift windows

Let’s identify the BD shift windows. “BD shift” fractions are those with a weighted mean beta diversity greater than the upper bound of the permutation-based CI. To reduce potential noise, I’m going to define BD shift windows as runs of at least 3 consecutive “BD shift” fractions.
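Before applying this rule to the real data, it may help to see it on a toy vector. The sketch below uses base R's rle() rather than the lag()/lead() approach in the code that follows, and `in_window()` is a hypothetical helper written only for this illustration. It assumes the flags are already sorted by buoyant density.

```r
# Hypothetical helper: TRUE only for fractions that sit inside a run of
# at least `min_len` consecutive TRUE flags
in_window <- function(flag, min_len = 3) {
  flag[is.na(flag)] <- FALSE            # treat missing flags as "no shift"
  runs <- rle(flag)                     # run lengths of TRUE/FALSE stretches
  keep <- runs$values & runs$lengths >= min_len
  rep(keep, times = runs$lengths)       # expand back to one value per fraction
}

# Toy "BD shift" flags for 10 fractions ordered by buoyant density
flag <- c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE)
win <- in_window(flag)
print(win)
```

Only the run of 4 consecutive TRUE values is kept; the run of 2 and the isolated TRUE at the end are treated as noise.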

wmean1_m = wmean1 %>%
  mutate(BD_shift = wmean_dist > wmean_dist_CI_high) %>%
  arrange(BD_min.x) %>%
  mutate(window = (BD_shift == TRUE & lag(BD_shift) == TRUE & lag(BD_shift, 2) == TRUE) |
                  (BD_shift == TRUE & lag(BD_shift) == TRUE & lead(BD_shift) == TRUE) |
                  (BD_shift == TRUE & lead(BD_shift) == TRUE & lead(BD_shift, 2) == TRUE),
         BD_shift = BD_shift == TRUE & window == TRUE,
         BD_shift = ifelse(is.na(BD_shift), FALSE, BD_shift))

wmean1_m %>% head(n=3)
##   perm_id           sample.x   distance Replicate.x IS__CONTROL.x BD_min.x
## 1       0 12C-Con.D14.R1_F27 0.06874209           1          TRUE 1.677228
## 2       0 12C-Con.D14.R1_F26 0.09574961           1          TRUE 1.681599
## 3       0 12C-Con.D14.R1_F25 0.15970518           1          TRUE 1.684878
##   BD_max.x BD_range.x perc_overlap n_over_fracs wmean_dist
## 1 1.681599   0.004371     24.98284            2 0.06516101
## 2 1.684878   0.003279     33.33333            2 0.07170448
## 3 1.688156   0.003278     33.34350            2 0.09908418
##   wmean_dist_CI_low_global wmean_dist_CI_high_global wmean_dist_CI_low
## 1               0.07377982                 0.3344957         0.2894824
## 2               0.07377982                 0.3344957         0.3259172
## 3               0.07377982                 0.3344957         0.3097431
##   wmean_dist_CI_high BD_shift window
## 1          0.3329659    FALSE  FALSE
## 2          0.3672786    FALSE  FALSE
## 3          0.3526238    FALSE  FALSE
x_lab = bquote('Buoyant density (g '* ml^-1*')')
y_lab = 'Weighted mean of\nweighted-Unifrac distances'
ggplot(wmean1_m, aes(BD_min.x, wmean_dist)) +
  geom_line(alpha=0.7) +
  geom_linerange(aes(ymin=wmean_dist_CI_low,
                     ymax=wmean_dist_CI_high),
                 alpha=0.3) +
  geom_point(aes(color=BD_shift)) +
  scale_color_discrete('Gradient\nfraction\nin BD shift\nwindow?') +
  labs(x=x_lab, y=y_lab, title='Beta diversity of 13C-treatment relative to 12C-Con') +
  theme_bw() 

The line ranges represent the permutation-based CIs. This permutation test helps to objectively identify BD shift windows, where beta diversity is higher than expected under the null model.

Note: more permutations should be used for real analyses.

Calculating BD shift for all treatments

Now let’s run BD_shift() on all phyloseq objects in our list. We’ll use plyr::ldply() for this because it preserves the list names in the resulting data.frame (list names are assigned to .id by default).

wmean = plyr::ldply(physeq_S2D2_l, BD_shift, nperm=5)
wmean %>% head(n=3)
##                                                                       .id
## 1 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
## 2 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
## 3 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
##   perm_id          sample.x   distance Replicate.x IS__CONTROL.x BD_min.x
## 1       0 12C-Con.D3.R3_F19 0.07506826           1          TRUE 1.705640
## 2       0 12C-Con.D3.R3_F13 0.11524238           1          TRUE 1.728588
## 3       0 12C-Con.D3.R3_F14 0.12153372           1          TRUE 1.724217
##   BD_max.x BD_range.x perc_overlap n_over_fracs wmean_dist
## 1 1.710011   0.004371     12.49142            2  0.1105012
## 2 1.732959   0.004371    100.00000            1  0.1152424
## 3 1.728588   0.004371     74.99428            2  0.1153831
##   wmean_dist_CI_low_global wmean_dist_CI_high_global wmean_dist_CI_low
## 1               0.06935558                 0.2789793        0.20042383
## 2               0.06935558                 0.2789793        0.05877514
## 3               0.06935558                 0.2789793        0.06521849
##   wmean_dist_CI_high
## 1          0.2742029
## 2          0.1222336
## 3          0.1305546

Alright, let’s plot the data!

# formatting the treatment names to look a bit better as facet labels
wmean = wmean %>%
  mutate(Substrate = gsub('.+(13C-[A-Za-z]+).+', '\\1', .id),
         Day = gsub('.+Day ==[ \']*([0-9]+).+', 'Day \\1', .id),
         Day = Day %>% reorder(gsub('Day ', '', Day) %>% as.numeric))

# calculating BD shift windows
wmean = wmean %>%
  mutate(BD_shift = wmean_dist > wmean_dist_CI_high) %>%
  # windows are defined within each treatment-control comparison (Substrate x Day)
  arrange(Substrate, Day, BD_min.x) %>%
  group_by(Substrate, Day) %>%
  mutate(window = (BD_shift == TRUE & lag(BD_shift) == TRUE & lag(BD_shift, 2) == TRUE) |
                  (BD_shift == TRUE & lag(BD_shift) == TRUE & lead(BD_shift) == TRUE) |
                  (BD_shift == TRUE & lead(BD_shift) == TRUE & lead(BD_shift, 2) == TRUE),
         BD_shift = BD_shift == TRUE & window == TRUE,
         BD_shift = ifelse(is.na(BD_shift), FALSE, BD_shift)) %>%
  ungroup()

# plotting, faceted by day and 13C-treatment
ggplot(wmean, aes(BD_min.x, wmean_dist)) +
  geom_line(alpha=0.7) +
  geom_linerange(aes(ymin=wmean_dist_CI_low,
                     ymax=wmean_dist_CI_high),
                 alpha=0.3) +
  geom_point(aes(color=BD_shift)) +
  labs(x=x_lab, y=y_lab, 
       title='Beta diversity of 13C-treatments relative to 12C-Con') +
  facet_grid(Day ~ Substrate) +
  theme_bw() +
  theme(axis.text.x = element_text(angle=45, hjust=1))
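The facet labels depend on regular expressions that pull the substrate and day out of each list name. A small check on a single name, in base R, shows what the patterns extract. The name is copied from the phyloseq_subset() output earlier in this vignette, and the substrate pattern here uses the letters-only class `[A-Za-z]`.

```r
# One list name as produced by phyloseq_subset() (copied from the output above)
id <- "(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Cel' & Day == '14')"

# Keep only the 13C substrate name
substrate <- gsub(".+(13C-[A-Za-z]+).+", "\\1", id)

# Keep only the day number from the "Day == '...'" clause
day <- gsub(".+Day ==[ ']*([0-9]+).+", "Day \\1", id)

print(c(substrate, day))
```

Note that the day pattern relies on the spaces around `==` in the second clause of the name, so it would need adjusting if the expression passed to phyloseq_subset() were formatted differently.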

As you can see, the ‘heavy’ beta diversity spike is stronger for 13C-Glucose at Day 3 versus 13C-Cellulose, but this pattern reverses at Day 14 of the substrate incubation. These results are to be expected, given that glucose is more labile than cellulose.

Session info

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] phyloseq_1.49.0 HTSSIP_1.4.1    ggplot2_3.5.1   tidyr_1.3.1    
## [5] dplyr_1.1.4     rmarkdown_2.28 
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.5            xfun_0.48               bslib_0.8.0            
##  [4] rhdf5_2.49.0            Biobase_2.65.1          lattice_0.22-6         
##  [7] rhdf5filters_1.17.0     vctrs_0.6.5             tools_4.4.1            
## [10] generics_0.1.3          biomformat_1.33.0       stats4_4.4.1           
## [13] parallel_4.4.1          tibble_3.2.1            fansi_1.0.6            
## [16] highr_0.11              cluster_2.1.6           pkgconfig_2.0.3        
## [19] Matrix_1.7-0            data.table_1.16.2       S4Vectors_0.43.2       
## [22] lifecycle_1.0.4         GenomeInfoDbData_1.2.13 farver_2.1.2           
## [25] compiler_4.4.1          stringr_1.5.1           Biostrings_2.73.2      
## [28] munsell_0.5.1           codetools_0.2-20        permute_0.9-7          
## [31] GenomeInfoDb_1.41.2     htmltools_0.5.8.1       sys_3.4.3              
## [34] buildtools_1.0.0        sass_0.4.9              lazyeval_0.2.2         
## [37] yaml_2.3.10             pillar_1.9.0            crayon_1.5.3           
## [40] jquerylib_0.1.4         MASS_7.3-61             cachem_1.1.0           
## [43] vegan_2.6-8             iterators_1.0.14        foreach_1.5.2          
## [46] nlme_3.1-166            tidyselect_1.2.1        digest_0.6.37          
## [49] stringi_1.8.4           reshape2_1.4.4          purrr_1.0.2            
## [52] labeling_0.4.3          splines_4.4.1           maketools_1.3.1        
## [55] ade4_1.7-22             fastmap_1.2.0           grid_4.4.1             
## [58] colorspace_2.1-1        cli_3.6.3               magrittr_2.0.3         
## [61] survival_3.7-0          utf8_1.2.4              ape_5.8                
## [64] withr_3.0.1             scales_1.3.0            UCSC.utils_1.1.0       
## [67] XVector_0.45.0          httr_1.4.7              multtest_2.61.0        
## [70] igraph_2.0.3            evaluate_1.0.1          knitr_1.48             
## [73] IRanges_2.39.2          mgcv_1.9-1              rlang_1.1.4            
## [76] Rcpp_1.0.13             glue_1.8.0              BiocGenerics_0.51.3    
## [79] jsonlite_1.8.9          Rhdf5lib_1.27.0         R6_2.5.1               
## [82] plyr_1.8.9              zlibbioc_1.51.1