Beta diversity ordinations


Beta diversity ordinations

Dataset

First, let’s load some packages including HTSSIP.

library(dplyr)
library(ggplot2)
library(HTSSIP)

Also let’s get an overview of the phyloseq object that we’re going to use.

physeq_S2D2
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 139 samples ]
## sample_data() Sample Data:       [ 139 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]

Parsing the dataset

See HTSSIP introduction vignette for a description on why dataset parsing is needed.

Let’s the parameters for parsing the dataset into individual treatment-control comparisons.

params = get_treatment_params(physeq_S2D2, c('Substrate', 'Day'))
params
##                    Substrate Day
## 12C-Con.D14.R1_F17   12C-Con  14
## 13C-Cel.D3.R1_F21    13C-Cel   3
## 13C-Cel.D14.R1_F12   13C-Cel  14
## 13C-Glu.D14.R1_F19   13C-Glu  14
## 12C-Con.D3.R3_F11    12C-Con   3
## 13C-Glu.D3.R1_F27    13C-Glu   3

We just need the parameters for each treatment, so let’s filter out the controls. In this case, the controls are all ‘12C-Con’.

params = dplyr::filter(params, Substrate!='12C-Con')
params
##                    Substrate Day
## 13C-Cel.D3.R1_F21    13C-Cel   3
## 13C-Cel.D14.R1_F12   13C-Cel  14
## 13C-Glu.D14.R1_F19   13C-Glu  14
## 13C-Glu.D3.R1_F27    13C-Glu   3

Now, we will use an expression that will subset the phyloseq object into the comparisons that we want to make.

ex is the expression that will be used for pruning the phyloseq object

ex = "(Substrate=='12C-Con' & Day=='${Day}') | (Substrate=='${Substrate}' & Day == '${Day}')"
physeq_S2D2_l = phyloseq_subset(physeq_S2D2, params, ex)
physeq_S2D2_l
## $`(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Cel' & Day == '14')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Glu' & Day == '14')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 47 samples ]
## sample_data() Sample Data:       [ 47 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Glu' & Day == '3')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]

Calculating ordinations

Now, let’s actually make ordinations of each treatment compared to the control. This will return a data.frame object for plotting.

# running in parallel
doParallel::registerDoParallel(2)
physeq_S2D2_l_df = SIP_betaDiv_ord(physeq_S2D2_l, parallel=TRUE)
## Run 0 stress 0.05999876 
## Run 1 stress 0.05999868 
## ... New best solution
## ... Procrustes: rmse 3.375356e-05  max resid 9.618937e-05 
## ... Similar to previous best
## Run 2 stress 0.05999857 
## ... New best solution
## ... Procrustes: rmse 0.0002732117  max resid 0.000775849 
## ... Similar to previous best
## Run 3 stress 0.05999877 
## ... Procrustes: rmse 8.500827e-05  max resid 0.0002406656 
## ... Similar to previous best
## Run 4 stress 0.05999872 
## ... Procrustes: rmse 0.0002915895  max resid 0.0008291217 
## ... Similar to previous best
## Run 5 stress 0.05999849 
## ... New best solution
## ... Procrustes: rmse 0.0001366149  max resid 0.0003880624 
## ... Similar to previous best
## Run 6 stress 0.05999881 
## ... Procrustes: rmse 0.0002198795  max resid 0.0006218413 
## ... Similar to previous best
## Run 7 stress 0.05999863 
## ... Procrustes: rmse 0.0001696464  max resid 0.0004818257 
## ... Similar to previous best
## Run 8 stress 0.0599986 
## ... Procrustes: rmse 0.0001507035  max resid 0.0004282821 
## ... Similar to previous best
## Run 9 stress 0.05999851 
## ... Procrustes: rmse 8.524178e-05  max resid 0.0002424407 
## ... Similar to previous best
## Run 10 stress 0.05999882 
## ... Procrustes: rmse 0.0001811825  max resid 0.0005133753 
## ... Similar to previous best
## Run 11 stress 0.05999878 
## ... Procrustes: rmse 0.0002258468  max resid 0.0006408356 
## ... Similar to previous best
## Run 12 stress 0.05999855 
## ... Procrustes: rmse 0.0001225478  max resid 0.0003482748 
## ... Similar to previous best
## Run 13 stress 0.0599988 
## ... Procrustes: rmse 0.0002325677  max resid 0.0006597193 
## ... Similar to previous best
## Run 14 stress 0.05999855 
## ... Procrustes: rmse 0.0001174833  max resid 0.0003373234 
## ... Similar to previous best
## Run 15 stress 0.05999879 
## ... Procrustes: rmse 0.0002314273  max resid 0.0006570317 
## ... Similar to previous best
## Run 16 stress 0.05999888 
## ... Procrustes: rmse 0.0002080716  max resid 0.0005929058 
## ... Similar to previous best
## Run 17 stress 0.05999872 
## ... Procrustes: rmse 0.0002079436  max resid 0.0005897428 
## ... Similar to previous best
## Run 18 stress 0.05999869 
## ... Procrustes: rmse 0.0001826661  max resid 0.000517696 
## ... Similar to previous best
## Run 19 stress 0.2366197 
## Run 20 stress 0.05999857 
## ... Procrustes: rmse 0.000132549  max resid 0.0003781502 
## ... Similar to previous best
## *** Best solution repeated 15 times
## Run 0 stress 0.0719636 
## Run 1 stress 0.1319373 
## Run 2 stress 0.1296168 
## Run 3 stress 0.1010428 
## Run 4 stress 0.07206107 
## ... Procrustes: rmse 0.005102571  max resid 0.02294879 
## Run 5 stress 0.07196353 
## ... New best solution
## ... Procrustes: rmse 4.210701e-05  max resid 0.0001059697 
## ... Similar to previous best
## Run 6 stress 0.1283983 
## Run 7 stress 0.1329551 
## Run 8 stress 0.1374125 
## Run 9 stress 0.1318293 
## Run 10 stress 0.1361472 
## Run 11 stress 0.1199624 
## Run 12 stress 0.1306889 
## Run 13 stress 0.07393819 
## Run 14 stress 0.1172224 
## Run 15 stress 0.07196353 
## ... New best solution
## ... Procrustes: rmse 1.499708e-05  max resid 7.828606e-05 
## ... Similar to previous best
## Run 16 stress 0.07196357 
## ... Procrustes: rmse 3.231024e-05  max resid 8.906342e-05 
## ... Similar to previous best
## Run 17 stress 0.07457532 
## Run 18 stress 0.07404651 
## Run 19 stress 0.1186496 
## Run 20 stress 0.07196364 
## ... Procrustes: rmse 0.0001605849  max resid 0.000469045 
## ... Similar to previous best
## *** Best solution repeated 3 times
## Run 0 stress 0.07255501 
## Run 1 stress 0.07255503 
## ... Procrustes: rmse 3.528305e-05  max resid 0.0001201565 
## ... Similar to previous best
## Run 2 stress 0.07255501 
## ... Procrustes: rmse 8.818126e-06  max resid 3.144746e-05 
## ... Similar to previous best
## Run 3 stress 0.07255501 
## ... Procrustes: rmse 8.016805e-06  max resid 2.850354e-05 
## ... Similar to previous best
## Run 4 stress 0.07255501 
## ... Procrustes: rmse 1.393761e-05  max resid 5.245768e-05 
## ... Similar to previous best
## Run 5 stress 0.07255501 
## ... Procrustes: rmse 6.882707e-06  max resid 2.856283e-05 
## ... Similar to previous best
## Run 6 stress 0.07255501 
## ... Procrustes: rmse 1.433811e-05  max resid 4.715108e-05 
## ... Similar to previous best
## Run 7 stress 0.07255501 
## ... Procrustes: rmse 2.148759e-06  max resid 1.131891e-05 
## ... Similar to previous best
## Run 8 stress 0.07255502 
## ... Procrustes: rmse 2.617178e-05  max resid 9.777244e-05 
## ... Similar to previous best
## Run 9 stress 0.07255501 
## ... Procrustes: rmse 3.286859e-06  max resid 1.474209e-05 
## ... Similar to previous best
## Run 10 stress 0.07255502 
## ... Procrustes: rmse 2.257142e-05  max resid 7.488549e-05 
## ... Similar to previous best
## Run 11 stress 0.07255501 
## ... Procrustes: rmse 4.89976e-06  max resid 1.788823e-05 
## ... Similar to previous best
## Run 12 stress 0.07255502 
## ... Procrustes: rmse 1.722529e-05  max resid 5.634782e-05 
## ... Similar to previous best
## Run 13 stress 0.07255501 
## ... Procrustes: rmse 3.915431e-06  max resid 1.688773e-05 
## ... Similar to previous best
## Run 14 stress 0.07255505 
## ... Procrustes: rmse 4.298506e-05  max resid 0.000145347 
## ... Similar to previous best
## Run 15 stress 0.07255501 
## ... Procrustes: rmse 1.006472e-05  max resid 3.997473e-05 
## ... Similar to previous best
## Run 16 stress 0.07255502 
## ... Procrustes: rmse 1.110403e-05  max resid 4.277427e-05 
## ... Similar to previous best
## Run 17 stress 0.07255501 
## ... Procrustes: rmse 6.615042e-06  max resid 2.476704e-05 
## ... Similar to previous best
## Run 18 stress 0.07255501 
## ... Procrustes: rmse 1.313579e-05  max resid 5.134526e-05 
## ... Similar to previous best
## Run 19 stress 0.07255502 
## ... Procrustes: rmse 2.212686e-05  max resid 7.671359e-05 
## ... Similar to previous best
## Run 20 stress 0.07255503 
## ... Procrustes: rmse 2.800639e-05  max resid 0.0001054077 
## ... Similar to previous best
## *** Best solution repeated 20 times
## Run 0 stress 0.1080812 
## Run 1 stress 0.10136 
## ... New best solution
## ... Procrustes: rmse 0.03470367  max resid 0.1337577 
## Run 2 stress 0.108081 
## Run 3 stress 0.10136 
## ... New best solution
## ... Procrustes: rmse 0.0004028729  max resid 0.001479001 
## ... Similar to previous best
## Run 4 stress 0.108081 
## Run 5 stress 0.1080812 
## Run 6 stress 0.1013599 
## ... New best solution
## ... Procrustes: rmse 2.982353e-05  max resid 0.0001104086 
## ... Similar to previous best
## Run 7 stress 0.1013598 
## ... New best solution
## ... Procrustes: rmse 0.0002389074  max resid 0.0008782957 
## ... Similar to previous best
## Run 8 stress 0.1080814 
## Run 9 stress 0.1013603 
## ... Procrustes: rmse 0.0002179867  max resid 0.0008009432 
## ... Similar to previous best
## Run 10 stress 0.1080813 
## Run 11 stress 0.1080813 
## Run 12 stress 0.1080813 
## Run 13 stress 0.1013598 
## ... Procrustes: rmse 1.866392e-05  max resid 6.641849e-05 
## ... Similar to previous best
## Run 14 stress 0.1080814 
## Run 15 stress 0.1080816 
## Run 16 stress 0.1080811 
## Run 17 stress 0.1013601 
## ... Procrustes: rmse 0.0001742879  max resid 0.0006388473 
## ... Similar to previous best
## Run 18 stress 0.1080813 
## Run 19 stress 0.1450322 
## Run 20 stress 0.1013601 
## ... Procrustes: rmse 0.0001467166  max resid 0.0005381576 
## ... Similar to previous best
## *** Best solution repeated 5 times
physeq_S2D2_l_df %>% head(n=3)
##                                                           phyloseq_subset
## 1 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
## 2 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
## 3 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
##         NMDS1       NMDS2          X.Sample primer_number fwd_barcode
## 1 -0.15454387 -0.03536686 13C-Cel.D3.R1_F21             7    GGATATCT
## 2 -0.20505811 -0.10166080 13C-Cel.D3.R1_F20             6    CGTGAGTG
## 3 -0.02323964 -0.00216143 13C-Cel.D3.R1_F15             1    ATCGTACG
##   rev_barcode Substrate Day Microcosm_replicate Fraction Buoyant_density
## 1    CGAGAGTT   13C-Cel   3                   1       21        1.705640
## 2    CGAGAGTT   13C-Cel   3                   1       20        1.706186
## 3    CGAGAGTT   13C-Cel   3                   1       15        1.725310
##   Sample_type
## 1     unknown
## 2     unknown
## 3     unknown
##                                                                                   library
## 1 /var/seq_data/fullCyc/MiSeq_16SrRNA/515f-806r/lib1-7rs/../150721_lib4/run1/metadata.txt
## 2 /var/seq_data/fullCyc/MiSeq_16SrRNA/515f-806r/lib1-7rs/../150721_lib4/run1/metadata.txt
## 3 /var/seq_data/fullCyc/MiSeq_16SrRNA/515f-806r/lib1-7rs/../150721_lib4/run1/metadata.txt
##   Exp_type Sample_location Sample_date Sample_treatment Sample_subtreatment
## 1      SIP              NA          NA               NA                  NA
## 2      SIP              NA          NA               NA                  NA
## 3      SIP              NA          NA               NA                  NA
##   core_dataset
## 1         TRUE
## 2         TRUE
## 3         TRUE

Each specific phyloseq subset (treatment-control comparison) is delimited with the “phyloseq_subset” column.

physeq_S2D2_l_df %>% .$phyloseq_subset %>% unique
## [1] (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')  
## [2] (Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Cel' & Day == '14')
## [3] (Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Glu' & Day == '14')
## [4] (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Glu' & Day == '3')  
## 4 Levels: (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3') ...

For clarity, I’m going edit these long strings to make them more readable.

physeq_S2D2_l_df = physeq_S2D2_l_df %>%
  dplyr::mutate(phyloseq_subset = gsub(' \\| ', '\n', phyloseq_subset),
                phyloseq_subset = gsub('\'3\'', '\'03\'', phyloseq_subset))
physeq_S2D2_l_df %>% .$phyloseq_subset %>% unique
## [1] "(Substrate=='12C-Con' & Day=='03')\n(Substrate=='13C-Cel' & Day == '03')"
## [2] "(Substrate=='12C-Con' & Day=='14')\n(Substrate=='13C-Cel' & Day == '14')"
## [3] "(Substrate=='12C-Con' & Day=='14')\n(Substrate=='13C-Glu' & Day == '14')"
## [4] "(Substrate=='12C-Con' & Day=='03')\n(Substrate=='13C-Glu' & Day == '03')"

OK, let’s plot the data!

phyloseq_ord_plot(physeq_S2D2_l_df)

As you can see, the ‘heavy’ gradient fraction ‘communities’ for the labeled-treatments tend to diverge from the unlabeled gradient fraction communities, but the amount of divergence in dependent on substrate and time point.

Session info

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] phyloseq_1.51.0 HTSSIP_1.4.1    ggplot2_3.5.1   tidyr_1.3.1    
## [5] dplyr_1.1.4     rmarkdown_2.29 
## 
## loaded via a namespace (and not attached):
##  [1] ade4_1.7-22                 tidyselect_1.2.1           
##  [3] farver_2.1.2                Biostrings_2.75.1          
##  [5] fastmap_1.2.0               lazyeval_0.2.2             
##  [7] digest_0.6.37               lifecycle_1.0.4            
##  [9] cluster_2.1.6               survival_3.7-0             
## [11] magrittr_2.0.3              compiler_4.4.2             
## [13] rlang_1.1.4                 sass_0.4.9                 
## [15] tools_4.4.2                 igraph_2.1.1               
## [17] utf8_1.2.4                  yaml_2.3.10                
## [19] data.table_1.16.2           knitr_1.49                 
## [21] S4Arrays_1.7.1              labeling_0.4.3             
## [23] DelayedArray_0.33.2         plyr_1.8.9                 
## [25] abind_1.4-8                 BiocParallel_1.41.0        
## [27] withr_3.0.2                 purrr_1.0.2                
## [29] BiocGenerics_0.53.3         sys_3.4.3                  
## [31] grid_4.4.2                  stats4_4.4.2               
## [33] fansi_1.0.6                 multtest_2.63.0            
## [35] biomformat_1.35.0           colorspace_2.1-1           
## [37] Rhdf5lib_1.29.0             scales_1.3.0               
## [39] iterators_1.0.14            MASS_7.3-61                
## [41] SummarizedExperiment_1.37.0 cli_3.6.3                  
## [43] vegan_2.6-8                 crayon_1.5.3               
## [45] generics_0.1.3              httr_1.4.7                 
## [47] reshape2_1.4.4              ape_5.8                    
## [49] cachem_1.1.0                rhdf5_2.51.0               
## [51] stringr_1.5.1               zlibbioc_1.52.0            
## [53] splines_4.4.2               parallel_4.4.2             
## [55] XVector_0.47.0              matrixStats_1.4.1          
## [57] vctrs_0.6.5                 Matrix_1.7-1               
## [59] jsonlite_1.8.9              IRanges_2.41.1             
## [61] S4Vectors_0.45.2            coenocliner_0.2-3          
## [63] maketools_1.3.1             locfit_1.5-9.10            
## [65] foreach_1.5.2               jquerylib_0.1.4            
## [67] glue_1.8.0                  codetools_0.2-20           
## [69] stringi_1.8.4               gtable_0.3.6               
## [71] GenomeInfoDb_1.43.0         GenomicRanges_1.59.0       
## [73] UCSC.utils_1.3.0            munsell_0.5.1              
## [75] tibble_3.2.1                pillar_1.9.0               
## [77] htmltools_0.5.8.1           rhdf5filters_1.19.0        
## [79] GenomeInfoDbData_1.2.13     R6_2.5.1                   
## [81] doParallel_1.0.17           evaluate_1.0.1             
## [83] lattice_0.22-6              Biobase_2.67.0             
## [85] bslib_0.8.0                 Rcpp_1.0.13-1              
## [87] SparseArray_1.7.2           nlme_3.1-166               
## [89] permute_0.9-7               mgcv_1.9-1                 
## [91] DESeq2_1.47.1               xfun_0.49                  
## [93] MatrixGenerics_1.19.0       buildtools_1.0.0           
## [95] pkgconfig_2.0.3