Quantitative stable isotope probing (q-SIP) attempts to deal with the compositional problems of HTS-SIP data by transforming relative abundances with qPCR counts of total gene copies.
More details can be found at:
Hungate, Bruce A., Rebecca L. Mau, Egbert Schwartz, J. Gregory Caporaso, Paul Dijkstra, Natasja van Gestel, Benjamin J. Koch, et al. 2015. “Quantitative Microbial Ecology Through Stable Isotope Probing.” Applied and Environmental Microbiology, August, AEM.02280–15.
HTSSIP
can both simulate q-SIP datasets and run the
q-SIP analysis (with parallel computation of bootstrap confidence
intervals).
First, let’s load some packages including HTSSIP
.
OK. We’re going to be using 2 data files:
We’ll be using the dataset that we simulated in the HTSSIP_sim vignette.
The phyloseq object is similar to the dataset in the other vignettes.
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 6 taxa and 144 samples ]
## sample_data() Sample Data: [ 144 samples by 5 sample variables ]
The associated qPCR data is, in this case a list of length = 2.
## [1] "raw" "summary"
Really, only the summary
table is needed for the next
step. You could just use the summary
data.frame object and
that would be fine. However, you should have just
1 qPCR value per sample. So, if you have ‘technical’
qPCR replicates (eg., triplicate qPCR reactions per DNA sample), then
you need to summarize those replicates to get one value (eg., the mean
of all replicate values).
## Buoyant_density IS_CONTROL qPCR_tech_rep1 qPCR_tech_rep2 qPCR_tech_rep3
## 1 1.668185 TRUE 374460160 326925692 191209173
## 2 1.680254 TRUE 418974851 846274648 1057133579
## 3 1.679431 TRUE 456070748 431437845 307250348
## Sample Gradient Fraction Treatment Replicate
## 1 12C-Con_rep1_1.668185_1 12C-Con_rep1 1 12C-Con 1
## 2 12C-Con_rep1_1.680254_2 12C-Con_rep1 2 12C-Con 1
## 3 12C-Con_rep1_1.679431_3 12C-Con_rep1 3 12C-Con 1
## IS_CONTROL Sample Buoyant_density qPCR_tech_rep_mean
## 1 FALSE 13C-Glu_rep1_1.671712_2 1.671712 46598172
## 2 FALSE 13C-Glu_rep1_1.671722_1 1.671722 42299449
## 3 FALSE 13C-Glu_rep1_1.680311_3 1.680311 53536519
## qPCR_tech_rep_sd Gradient Fraction Treatment Replicate
## 1 9055833 13C-Glu_rep1 2 13C-Glu 1
## 2 16369298 13C-Glu_rep1 1 13C-Glu 1
## 3 18265667 13C-Glu_rep1 3 13C-Glu 1
First, we’ll transform the OTU counts. The following function will do the following:
sample_idx
parameter determines how the two
datasets are matched to each other.## OTU Table: [5 taxa and 5 samples]
## taxa are rows
## 12C-Con_rep1_1.668185_1 12C-Con_rep1_1.680254_2 12C-Con_rep1_1.679431_3
## OTU.1 57 925 5809
## OTU.2 0 1 73
## OTU.3 19 492 5078
## OTU.4 0 0 0
## OTU.5 0 0 3
## 12C-Con_rep1_1.682305_4 12C-Con_rep1_1.689408_5
## OTU.1 14645 15089
## OTU.2 1077 7499
## OTU.3 21603 36426
## OTU.4 0 11
## OTU.5 43 1017
physeq_rep3_t = OTU_qPCR_trans(physeq_rep3, physeq_rep3_qPCR)
phyloseq::otu_table(physeq_rep3_t) %>% .[1:5, 1:5]
## OTU Table: [5 taxa and 5 samples]
## taxa are rows
## 12C-Con_rep1_1.668185_1 12C-Con_rep1_1.680254_2 12C-Con_rep1_1.679431_3
## OTU.1 223148756 504984567 207913325
## OTU.2 0 545929 2612786
## OTU.3 74382919 268597197 181749675
## OTU.4 0 0 0
## OTU.5 0 0 107375
## 12C-Con_rep1_1.682305_4 12C-Con_rep1_1.689408_5
## OTU.1 252808193 327552736
## OTU.2 18591630 162788652
## OTU.3 372920136 790737356
## OTU.4 0 238789
## OTU.5 742284 22077085
With the transformed OTU abundance counts, let’s calculate the BD
shift (Z) and atom fraction excess (A) for each OTU. We’ll be comparing
controls (designated by the control_expr
option) and the
labeled treatment (all samples not identified by the
control_expr
expression). In this case the
control_expr
is set to Treatment=="12C-Con"
,
which will select all samples where the Treatment
column in
the phyloseq sample_data
table equals
12C-Con
.
The treatment_rep
option designates which column in the
sample_data
table defines the replicate gradients (eg.,
12C-Con_rep1
, 12C-Con_rep2
, etc).
# The 'Treatment' column designates treatment vs control
# The 'Replicate' column designates treatment replicates
data.frame(sample_data(physeq_rep3)) %>%
dplyr::select(Treatment, Replicate) %>%
distinct
## Treatment Replicate
## 12C-Con_rep1_1.668185_1 12C-Con 1
## 12C-Con_rep2_1.669910_1 12C-Con 2
## 12C-Con_rep3_1.672233_1 12C-Con 3
## 13C-Glu_rep1_1.671722_1 13C-Glu 1
## 13C-Glu_rep2_1.670112_1 13C-Glu 2
## 13C-Glu_rep3_1.671153_1 13C-Glu 3
atomX = qSIP_atom_excess(physeq_rep3_t,
control_expr='Treatment=="12C-Con"',
treatment_rep='Replicate')
atomX %>% names
## [1] "W" "A"
The output is a list of data.frames. The W
table lists
the weighted mean BD for each OTU in each gradient. This can be
considered a sort-of ‘raw data’ table. The real meat of the output is
the A
table, which lists (among other things) the BD shift
(Z) and atom fraction excess (A) as defined by Hungate et al.,
(2015).
Note: a more complicated expression can be used to designate control samples. For example: ‘Treatment == “12C-Con” & Day == 1’
OK. The last step is to calculate atom % excess confidence intervals (CI). This is necessary for getting some sort of indication on how accurate the atom fraction excess estimations are. Also, Hungate et al., (2015) uses the CIs to call incorporators, in which an OTU was considered an incorporator if the CI was greater than, and did not span, zero.
We’ll use 20 bootstrap replicates for this tutorial, but I recommend 100 for a real analysis. Bootstrap replicates can be run in parallel (see the function documentation).
## Warning: `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## `cols` is now required when using `unnest()`.
## ℹ Please use `cols = c(ret)`.
## # A tibble: 6 × 11
## OTU Wlab Wlight Z Gi Mlight Mheavymax Mlab A A_CI_low
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 OTU.1 1.72 1.70 0.0222 0.636 308. 318. 312. 0.412 0.0692
## 2 OTU.2 1.71 1.69 0.0117 0.586 308. 318. 310. 0.218 0.159
## 3 OTU.3 1.73 1.70 0.0347 0.608 308. 318. 314. 0.644 0.340
## 4 OTU.4 1.73 1.70 0.0261 0.705 308. 318. 313. 0.484 0.318
## 5 OTU.5 1.73 1.71 0.0242 0.710 308. 318. 312. 0.449 0.187
## 6 OTU.6 1.73 1.70 0.0283 0.638 308. 318. 313. 0.526 0.325
## # ℹ 1 more variable: A_CI_high <dbl>
Let’s use the same threshold as Hungate et al., (2015) for defining incorporators.
CI_threshold = 0
df_atomX_boot = df_atomX_boot %>%
mutate(Incorporator = A_CI_low > CI_threshold,
OTU = reorder(OTU, -A))
How many incorporators?
n_incorp = df_atomX_boot %>%
filter(Incorporator == TRUE) %>%
nrow
cat('Number of incorporators:', n_incorp, '\n')
## Number of incorporators: 6
OK, let’s plot the results.
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] phyloseq_1.49.0 HTSSIP_1.4.1 ggplot2_3.5.1 tidyr_1.3.1
## [5] dplyr_1.1.4 rmarkdown_2.28
##
## loaded via a namespace (and not attached):
## [1] ade4_1.7-22 tidyselect_1.2.1
## [3] farver_2.1.2 Biostrings_2.73.2
## [5] fastmap_1.2.0 lazyeval_0.2.2
## [7] digest_0.6.37 lifecycle_1.0.4
## [9] cluster_2.1.6 survival_3.7-0
## [11] magrittr_2.0.3 compiler_4.4.1
## [13] rlang_1.1.4 sass_0.4.9
## [15] tools_4.4.1 igraph_2.0.3
## [17] utf8_1.2.4 yaml_2.3.10
## [19] data.table_1.16.2 knitr_1.48
## [21] S4Arrays_1.5.11 labeling_0.4.3
## [23] DelayedArray_0.31.14 plyr_1.8.9
## [25] abind_1.4-8 BiocParallel_1.39.0
## [27] withr_3.0.1 purrr_1.0.2
## [29] BiocGenerics_0.51.3 sys_3.4.3
## [31] grid_4.4.1 stats4_4.4.1
## [33] fansi_1.0.6 multtest_2.61.0
## [35] biomformat_1.33.0 colorspace_2.1-1
## [37] Rhdf5lib_1.27.0 scales_1.3.0
## [39] iterators_1.0.14 MASS_7.3-61
## [41] SummarizedExperiment_1.35.4 cli_3.6.3
## [43] vegan_2.6-8 crayon_1.5.3
## [45] generics_0.1.3 httr_1.4.7
## [47] reshape2_1.4.4 ape_5.8
## [49] cachem_1.1.0 rhdf5_2.49.0
## [51] stringr_1.5.1 zlibbioc_1.51.1
## [53] splines_4.4.1 parallel_4.4.1
## [55] XVector_0.45.0 matrixStats_1.4.1
## [57] vctrs_0.6.5 Matrix_1.7-0
## [59] jsonlite_1.8.9 IRanges_2.39.2
## [61] S4Vectors_0.43.2 coenocliner_0.2-3
## [63] maketools_1.3.1 locfit_1.5-9.10
## [65] foreach_1.5.2 jquerylib_0.1.4
## [67] glue_1.8.0 codetools_0.2-20
## [69] stringi_1.8.4 gtable_0.3.5
## [71] GenomeInfoDb_1.41.2 GenomicRanges_1.57.2
## [73] UCSC.utils_1.1.0 munsell_0.5.1
## [75] tibble_3.2.1 pillar_1.9.0
## [77] htmltools_0.5.8.1 rhdf5filters_1.17.0
## [79] GenomeInfoDbData_1.2.13 R6_2.5.1
## [81] doParallel_1.0.17 evaluate_1.0.1
## [83] lattice_0.22-6 Biobase_2.65.1
## [85] highr_0.11 bslib_0.8.0
## [87] Rcpp_1.0.13 SparseArray_1.5.45
## [89] nlme_3.1-166 permute_0.9-7
## [91] mgcv_1.9-1 DESeq2_1.45.3
## [93] xfun_0.48 MatrixGenerics_1.17.0
## [95] buildtools_1.0.0 pkgconfig_2.0.3