Vegan-specific signature implies healthier metabolic profile: findings from diet-related multi-omics observational study based on different European populations
Statistical report for pathway analysis
Authors and affiliations
Anna Ouradova1,*, Giulio Ferrero2,3,*, Miriam Bratova4, Nikola Daskova4, Alena Bohdanecka4,5, Klara Dohnalova6, Marie Heczkova4, Karel Chalupsky6, Maria Kralova7,8, Marek Kuzma9, István Modos4, Filip Tichanek4, Lucie Najmanova9, Barbara Pardini10, Helena Pelantová9, Sonia Tarallo10, Petra Videnska11, Jan Gojda1,#, Alessio Naccarati10,#, Monika Cahova4,#
* These authors have contributed equally to this work and share first authorship
# These authors have contributed equally to this work and share last authorship
1 Department of Internal Medicine, Kralovske Vinohrady University Hospital and Third Faculty of Medicine, Charles University, Prague, Czech Republic
2 Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
3 Department of Computer Science, University of Turin, Turin, Italy
4 Institute for Clinical and Experimental Medicine, Prague, Czech Republic
5 First Faculty of Medicine, Charles University, Prague, Czech Republic
6 Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
7 Ambis University, Department of Economics and Management, Prague, Czech Republic
8 Department of Informatics, Brno University of Technology, Brno, Czech Republic
9 Institute of Microbiology of the Czech Academy of Sciences, Prague, Czech Republic
10 Italian Institute for Genomic Medicine (IIGM), c/o IRCCS Candiolo, Turin, Italy
11 Mendel University, Department of Chemistry and Biochemistry, Brno, Czech Republic
This is a statistical report of the study A vegan diet signature from a multi-omics study on different European populations is related to favorable metabolic outcomes that is currenlty under review
When using this code or data, cite the original publication:
TO BE ADDED
BibTex citation for the original publication:
TO BE ADDED
Original GitHub repository: https://github.com/filip-tichanek/ItCzVegans
Statistical reports can be found on the reports hub.
Data analysis is described in detail in the statistical methods report.
1 Introduction
This project explores potential signatures of a vegan diet across the microbiome, metabolome, and lipidome. We used data from healthy vegan and omnivorous human subjects in two countries (Czech Republic and Italy), with subjects grouped by Country
and Diet
, resulting in four distinct groups.
To assess the generalizability of these findings, we validated our results with an independent cohort from the Czech Republic for external validation.
1.1 Statistical Methods
The statistical modeling approach is described in detail in this report. Briefly, the methods used included:
Multivariate analysis: We conducted multivariate analyses (PERMANOVA, PCA, correlation analyses) to explore the effects of
diet
,country
, and their possible interaction (diet : country
) on the microbiome, lipidome, and metabolome compositions in an integrative manner. This part of the analysis is not available on the GitHub page, but the code will be provided upon request.Linear models: Linear models were applied to estimate the effects of
diet
,country
, and their interaction (diet:country
) on individual lipids, metabolites, bacterial taxa and pathways (“features”). Features that significantly differed between diet groups (based on the estimated average conditional effect of diet across both countries, adjusted for multiple comparisons with FDR < 0.05) were further examined in the independent external validation cohort to assess whether these associations were reproducible. Next, we fit linear models restricted to vegan participants to test whether omics profiles varied with the duration of vegan diet. Fixed-effect predictors were diet duration (per 10 years), country, their interaction, and age (included due to correlation with diet duration).Predictive models (elastic net): We employed elastic net logistic regression (via the
glmnet
R package) to predict vegan status based on metabolome, lipidome, microbiome and pathways data (one model per dataset; four models in total). We considered three combinations of Lasso and Ridge penalties (alpha = 0, 0.2, 0.4). For each alpha, we selected the penalty strength (λ1se) using 10-fold cross-validation. This value corresponds to the most regularized model whose performance was within one standard error of the minimum deviance. The alpha–lambda pair with the lowest deviance was chosen to fit the final model, whose coefficients are reported.
To estimate model performance, we repeated the full modeling procedure (including hyperparameter tuning) 500 times on bootstrap resamples of the training data. In each iteration, the model was trained on the resampled data and evaluated on the out-of-bag subjects (i.e., those not included in the training set in that iteration). The mean, and 2.5th, and 97.5th percentiles of the resulting ROC-AUC values represent the estimated out-of-sample AUC and its 95% confidence interval.
Finally, the final model was applied to an independent validation cohort to generate predicted probabilities of vegan status. These probabilities were then used to assess external discrimination between diet groups (ROC-AUC in the independent validation cohort). The elastic net models were not intended for practical prediction, but to quantify the strength of the signal separating the dietary groups, with its uncertainty, by using all features of a given dataset jointly. It also offered a complementary perspective on which features are most clearly associated with diet
2 Initiation
2.1 Set home directory
Open code
setwd('/home/ticf/secured_data/GitRepo/ticf/478_MOCA_italian')
2.2 Upload initiation file
Open code
source('478_initiation.R')
3 Data
3.1 Upload and epxlore
3.1.1 Get pathways
Only these pathways that have non-zero values in all observation in at least one dataset will be chosen
Open code
<- read.delim(
data_path_originalCZ "gitignore/data/pathw/Pathway_abundance_MetaCyc_CZ_humann.tsv",
header = TRUE, sep = "\t"
%>%
) filter(!grepl("\\|", X..Pathway)) %>%
select(X..Pathway)
dim(data_path_originalCZ)
## [1] 580 1
<- read.delim(
data_path_originalIT "gitignore/data/pathw/Pathway_abundance_MetaCyc_IT_humann.tsv",
header = TRUE, sep = "\t"
%>%
) filter(!grepl("\\|", X..Pathway)) %>%
select(X..Pathway)
dim(data_path_originalIT)
## [1] 488 1
<- read.delim(
data_path_validation "gitignore/data/pathw/Pathway_abundance_MetaCyc_Validation_humann.tsv",
header = TRUE, sep = "\t"
%>%
) filter(!grepl("\\|", X..Pathway)) %>%
select(X..Pathway)
dim(data_path_validation)
## [1] 580 1
<- intersect(
tr $X..Pathway,
data_path_originalCZ$X..Pathway
data_path_originalIT
)
<- intersect(tr, data_path_validation$X..Pathway)
paths length(paths)
## [1] 481
3.1.2 Italian data
Open code
<- read.delim(
data_path_originalIT 'gitignore/data/pathw/Pathway_abundance_MetaCyc_IT_humann.tsv',
header = TRUE, sep = "\t"
%>%
) filter(X..Pathway %in% paths)
<- data_path_originalIT[,1]
features <- strsplit(features, ": ")
split_features <- sapply(split_features, `[`, 1)
feature_name <- sapply(split_features, `[`, 2)
feature_description <- gsub(" |-", "_", feature_name)
feature_name
row.names(data_path_originalIT) <- c(feature_name)
<- data.frame(t(data_path_originalIT[,-1]))
data_path_originalIT attr(data_path_originalIT, "description") <- feature_description
<- data_path_originalIT%>%
originalIT_features select(where(~ mean(. == 0) < 0.7)) %>%
select(-UNMAPPED, -UNINTEGRATED) %>%
colnames()
3.1.3 Czech data
Open code
<- read.delim(
data_path_originalCZ 'gitignore/data/pathw/Pathway_abundance_MetaCyc_CZ_humann.tsv',
header = TRUE, sep = "\t"
%>%
) filter(X..Pathway %in% paths)
<- data_path_originalCZ[,1]
features <- strsplit(features, ": ")
split_features <- sapply(split_features, `[`, 1)
feature_name <- sapply(split_features, `[`, 2)
feature_description <- gsub(" |-", "_", feature_name)
feature_name
row.names(data_path_originalCZ) <- c(feature_name)
<- data.frame(t(data_path_originalCZ[,-1]))
data_path_originalCZ attr(data_path_originalCZ, "description") <- feature_description
<- data_path_originalCZ %>%
originalCZ_features select(where(~ mean(. == 0) < 0.7)) %>%
select(-UNMAPPED, -UNINTEGRATED) %>%
colnames()
3.1.4 Validation data
Open code
<- read.delim(
data_pathways_validation 'gitignore/data/pathw/Pathway_abundance_MetaCyc_Validation_humann.tsv',
header = TRUE,
sep = "\t"
%>%
) filter(X..Pathway %in% paths)
<- data_pathways_validation[,1]
features <- strsplit(features, ": ")
split_features <- sapply(split_features, `[`, 1)
feature_name <- sapply(split_features, `[`, 2)
feature_description <- gsub(" |-", "_", feature_name)
feature_name
row.names(data_pathways_validation) <- c(feature_name)
<- data.frame(t(data_pathways_validation[,-1]))
data_pathways_validation attr(data_pathways_validation, "description") <- feature_description
row.names(data_pathways_validation) <- gsub("^X", "K", row.names(data_pathways_validation))
<- data_pathways_validation %>%
validation_features select(-UNMAPPED, -UNINTEGRATED) %>%
colnames()
3.1.5 Merging data
Modify data
Open code
# which taxa
set.seed(478)
<- intersect(originalIT_features, originalCZ_features)
features <- intersect(features, validation_features)
features
# CZ
<- data_path_originalCZ %>%
data_path_originalCZ_filtered select(any_of(features))
<- data_path_originalCZ_filtered/ rowSums(data_path_originalCZ_filtered)
data_path_originalCZ_filtered
<- data_path_originalCZ_filtered %>%
data_path_originalCZ_filtered mutate(
ID = row.names(.),
Country = 'CZ'
%>%
) select(ID, Country, any_of(features))
# IT
<- data_path_originalIT %>%
data_path_originalIT_filtered select(any_of(features))
<- data_path_originalIT_filtered/ rowSums(data_path_originalIT_filtered)
data_path_originalIT_filtered
<- data_path_originalIT_filtered %>%
data_path_originalIT_filtered mutate(
ID = row.names(.),
Country = 'IT'
%>%
) select(ID, Country, any_of(features))
# joining the table
<- bind_rows(data_path_originalIT_filtered,
data_path_original_filtered
data_path_originalCZ_filtered)
<- data_path_original_filtered %>%
bacteria_data select(all_of(features))
if (file.exists("gitignore/data_path_original_impCLR.RData") == FALSE) {
<- lrSVD(
bacteria_data_imp
bacteria_data,label = 0,
dl = NULL,
z.delete = FALSE,
ncp = 2
)
row.names( bacteria_data_imp) <- row.names(bacteria_data)
<- data.frame(clr(bacteria_data_imp)) %>%
bacteria_data mutate(ID = row.names(.))
<- read.xlsx("gitignore/data/lipidome_training_cohort.xlsx") %>%
training_metadata select(Sample, Diet) %>%
mutate(ID = Sample) %>%
select(-Sample)
<- data_path_original_filtered %>%
data_path_original_filtered select(ID, Country) %>%
left_join(bacteria_data, by = "ID")
<- data_path_original_filtered %>%
data_pathways_original_clr mutate(
Data = if_else(Country == "CZ", "CZ_tr", "IT_tr")
%>%
) left_join(training_metadata, by = "ID") %>%
select(ID, Diet, Country, Data, everything())
save(
data_pathways_original_clr,file = "gitignore/data_path_original_impCLR.RData"
)
}
load("gitignore/data_path_original_impCLR.RData")
if (file.exists("gitignore/data_pathways_original_impCLR.csv") == FALSE)
write.csv(data_pathways_original_clr,
"gitignore/data_pathways_original_impCLR.csv")
## Show variances of CLR proportions across samples
<- data_pathways_original_clr %>%
data_variance rowwise() %>%
mutate(variance = var(c_across(-(ID:Data)))) %>%
ungroup() %>%
select(ID, variance)
## Look at distribution
hist(data_variance$variance)
Open code
## Show extreme samples
%>% arrange(desc(variance))
data_variance ## # A tibble: 166 × 2
## ID variance
## <chr> <dbl>
## 1 T173 11.0
## 2 T181 9.32
## 3 T182 8.98
## 4 VOV004 8.57
## 5 VOV100 8.54
## 6 T176 8.49
## 7 VOV042 8.48
## 8 VOV088 8.48
## 9 VOV006 8.46
## 10 T189 8.45
## # ℹ 156 more rows
Get diet information
Diet` from another dataset for validating data
Open code
<- read.xlsx('gitignore/data/lipidome_validation_cohort.xlsx') %>%
validation_metadata select(X1, X2) %>%
mutate(ID = X1,
Diet = X2) %>%
select(-X1, -X2)
<- data_pathways_validation %>%
data_pathways_validation_filtered select(any_of(features))
set.seed(478)
<- (data_pathways_validation_filtered/
data_pathways_validation_filtered rowSums(data_pathways_validation_filtered ))
if (file.exists("gitignore/data_pathways_validation_impCLR.RData") == FALSE) {
<- lrSVD(
data_pathways_validation_filtered_imp
data_pathways_validation_filtered,label = 0, dl = NULL, z.delete = FALSE
)
row.names(data_pathways_validation_filtered_imp) <- row.names(data_pathways_validation_filtered)
<- data.frame(
data_pathways_validation_clr clr(data_pathways_validation_filtered_imp)
%>%
) mutate(
ID = row.names(.),
Country = "CZ",
Data = "valid"
%>%
) left_join(validation_metadata, by = "ID") %>%
select(ID, Diet, Country, Data, any_of(features)) %>%
filter(!is.na(Diet))
## Add Diet for K284 which has the diet missing
data_pathways_validation_clr[which(data_pathways_validation_clr$ID == "K284"), "Diet"
<- "VEGAN"
]
save(
data_pathways_validation_clr, file = "gitignore/data_pathways_validation_impCLR.RData"
)
}
load("gitignore/data_pathways_validation_impCLR.RData")
if (file.exists("gitignore/data_pathways_validation_impCLR.csv") == FALSE)
write.csv(data_pathways_validation_clr,
"gitignore/data_pathways_validation_impCLR.csv")
## Show variances of CLR proportions across samples
<- data_pathways_validation_clr %>%
data_variance rowwise() %>%
mutate(variance = var(c_across(-(ID:Data)))) %>%
ungroup() %>%
select(ID, variance)
## Look at distribution
hist(data_variance$variance)
Open code
## Show extreme samples
%>% arrange(desc(variance))
data_variance ## # A tibble: 101 × 2
## ID variance
## <chr> <dbl>
## 1 K16 7.90
## 2 K55 7.38
## 3 K74 7.36
## 4 K119 7.29
## 5 K38 7.27
## 6 K12 7.22
## 7 K61 7.13
## 8 K292 7.10
## 9 K82 7.07
## 10 K64 7.02
## # ℹ 91 more rows
3.1.6 Merge training and validation dataset
Open code
<- bind_rows(
data_merged
data_pathways_original_clr,
data_pathways_validation_clr )
3.2 Explore
3.2.0.1 Distributions
The following plot will show distribution of 36 randomly selected pathways
Open code
= c(6,6)
size <- data_pathways_original_clr[, 5:ncol(data_pathways_original_clr)]
check
<- check[, sample(1:ncol(check), size[1]*size[2])]
check
par(mfrow = c(size[1],size[2]))
par(mar=c(2,1.5,2,0.5))
for(x in 1:ncol(check)){
hist(check[,x],
16,
col='blue',
main = paste0(colnames(check)[x])
) }
Data seems to have relatively symmetric distribution
3.2.0.2 Pathways accross groups
Open code
set.seed(478)
<- c('#F9FFAF','#329243')
colo
<- data.frame(
outcomes variable = data_merged %>%
select(any_of(sample(features, 35))) %>%
colnames()
)
<- function(variable) {
boxplot_cond
<- ggboxplot(data_merged,
p x = 'Diet',
y = variable,
fill = 'Diet',
tip.length = 0.15,
palette = colo,
outlier.shape = 1,
lwd = 0.25,
outlier.size = 0.8,
facet.by = 'Data',
title = variable,
ylab = 'pathway level') +
theme(
plot.title = element_text(size = 10),
axis.title = element_text(size = 8),
axis.text.y = element_text(size = 7),
axis.text.x = element_blank(),
axis.title.x = element_blank()
)
return(p)
}
# Plot all outcomes
<- map(outcomes$variable, boxplot_cond)
plots
# Create a matrix of plots
<- ggarrange(plotlist = plots, ncol = 5, nrow = 7, common.legend = TRUE)
plots_arranged plots_arranged
4 Linear models across pathways
We will fit a feature-specific linear model where the clr-transformed pathway represents the outcome variable whereas country
(Italy vs Czech), diet
(vegan vs omnivore), and their interaction (country:diet
) all represent fixed-effects predictors. So, each model has following form
\[ clr(\text{pathway level}) = \alpha + \beta_{1} \times \text{country} + \beta_{2} \times \text{diet} + \beta_{3} \times \text{country:diet} + \epsilon \]
The variables were coded as follows: \(diet = -0.5\) for omnivores and \(diet = 0.5\) for vegans; \(country = -0.5\) for the Czech cohort and \(country = 0.5\) for the Italian cohort.
This parameterization allows us to interpret the linear model summary
output as presenting the conditional effects of diet
averaged across both countries and the conditional effects of country
averaged across both diet groups. We will then use the emmeans
package (Lenth 2024) to obtain specific estimates for the effect of diet
in the Italian and Czech cohorts separately, still from a single model.
pathways that will show a significant diet effect (average effect of diet
across both countries, adjusted for multiple comparisons with FDR < 0.05) will be then visualized using a forest plot, with country-specific diet effect along with diet effect based on independent validation cohort, to evaluate how generalizable are these findings.
4.1 Select and wrangle data
Open code
<- data_pathways_original_clr %>%
data_analysis_pathways na.omit() %>%
::mutate(
dplyrDiet_VEGAN = as.numeric(
::if_else(
dplyr== "VEGAN", 0.5, -0.5
Diet
)
),Country_IT = as.numeric(
::if_else(
dplyr== "IT", 0.5, -0.5
Country
)
),%>%
) ::select(
dplyr
ID,
Country,
Country_IT,
Diet,
Diet_VEGAN,::everything()
dplyr
)
summary(data_analysis_pathways[ , 1:12])
## ID Country Country_IT Diet
## Length:166 Length:166 Min. :-0.50000 Length:166
## Class :character Class :character 1st Qu.:-0.50000 Class :character
## Mode :character Mode :character Median :-0.50000 Mode :character
## Mean :-0.03012
## 3rd Qu.: 0.50000
## Max. : 0.50000
## Diet_VEGAN Data X1CMET2_PWY ALLANTOINDEG_PWY
## Min. :-0.50000 Length:166 Min. :1.273 Min. :-6.0996
## 1st Qu.:-0.50000 Class :character 1st Qu.:2.422 1st Qu.:-4.9392
## Median : 0.50000 Mode :character Median :2.728 Median :-3.4441
## Mean : 0.08434 Mean :2.663 Mean :-3.6060
## 3rd Qu.: 0.50000 3rd Qu.:3.016 3rd Qu.:-2.3188
## Max. : 0.50000 Max. :3.741 Max. :-0.6658
## ANAEROFRUCAT_PWY ANAGLYCOLYSIS_PWY ARG.POLYAMINE_SYN ARGININE_SYN4_PWY
## Min. :0.1183 Min. :1.490 Min. :-2.72033 Min. :-2.4001
## 1st Qu.:1.5725 1st Qu.:2.407 1st Qu.: 0.07355 1st Qu.: 0.7401
## Median :1.9835 Median :2.770 Median : 0.37284 Median : 1.4195
## Mean :1.8992 Mean :2.710 Mean : 0.35348 Mean : 1.3716
## 3rd Qu.:2.2510 3rd Qu.:3.023 3rd Qu.: 0.77630 3rd Qu.: 2.1978
## Max. :3.1912 Max. :4.664 Max. : 1.97414 Max. : 3.0797
1:5 , 1:8]
data_analysis_pathways[## ID Country Country_IT Diet Diet_VEGAN Data X1CMET2_PWY ALLANTOINDEG_PWY
## 1 VOV002 IT 0.5 OMNI -0.5 IT_tr 2.186503 -5.783155
## 2 VOV003 IT 0.5 OMNI -0.5 IT_tr 2.008170 -1.685330
## 3 VOV004 IT 0.5 OMNI -0.5 IT_tr 3.418194 -4.651552
## 4 VOV006 IT 0.5 OMNI -0.5 IT_tr 3.174203 -4.576071
## 5 VOV007 IT 0.5 OMNI -0.5 IT_tr 1.479598 -2.290716
4.1.1 Define number of pathways and covariates
Open code
<- 6
n_covarites <- ncol(data_analysis_pathways) - n_covarites n_features
4.1.2 Create empty objects
Open code
<- vector('double', n_features)
outcome <- vector('double', n_features)
est_VGdiet_inCZ <- vector('double', n_features)
est_VGdiet_inIT <- vector('double', n_features)
est_VGdiet_avg
<- vector('double', n_features)
est_ITcountry_avg <- vector('double', n_features)
diet_country_int
<- vector('double', n_features)
P_VGdiet_inCZ <- vector('double', n_features)
P_VGdiet_inIT <- vector('double', n_features)
P_VGdiet_avg
<- vector('double', n_features)
P_ITcountry_avg <- vector('double', n_features)
P_diet_country_int
<- vector('double', n_features)
CI_L_VGdiet_inCZ <- vector('double', n_features)
CI_L_VGdiet_inIT <- vector('double', n_features)
CI_L_VGdiet_avg
<- vector('double', n_features)
CI_U_VGdiet_inCZ <- vector('double', n_features)
CI_U_VGdiet_inIT <- vector('double', n_features) CI_U_VGdiet_avg
4.2 Run linear models over pathways
Open code
for (i in 1:n_features) {
## define variable
$outcome <- data_analysis_pathways[, (i + n_covarites)]
data_analysis_pathways
## fit model
<- lm(outcome ~ Country_IT * Diet_VEGAN, data = data_analysis_pathways)
model
## get contrast (effects of diet BY COUNTRY)
<- summary(
contrast_emm pairs(
emmeans(
model,specs = ~ Diet_VEGAN | Country_IT
),interaction = TRUE,
adjust = "none"
),infer = c(TRUE, TRUE)
)
## save results
<- names(data_analysis_pathways)[i + n_covarites]
outcome[i]
## country effect
<- summary(model)$coefficients[
est_ITcountry_avg[i] which(
names(model$coefficients) == "Country_IT"
1
),
]
<- summary(model)$coefficients[
P_ITcountry_avg[i] which(
names(model$coefficients) == "Country_IT"
4
),
]
## diet effect
<- confint(model)
tr
<- tr[which(row.names(tr) == 'Diet_VEGAN'),][1]
CI_L_VGdiet_avg[i] <- tr[which(row.names(tr) == 'Diet_VEGAN'),][2]
CI_U_VGdiet_avg[i]
<- summary(model)$coefficients[
est_VGdiet_avg[i] which(
names(model$coefficients) == "Diet_VEGAN"
1
),
]
<- summary(model)$coefficients[
P_VGdiet_avg[i] which(
names(model$coefficients) == "Diet_VEGAN"
4
),
]
<- -contrast_emm$estimate[1]
est_VGdiet_inCZ[i] <- contrast_emm$p.value[1]
P_VGdiet_inCZ[i] <- -contrast_emm$upper.CL[1]
CI_L_VGdiet_inCZ[i] <- -contrast_emm$lower.CL[1]
CI_U_VGdiet_inCZ[i]
<- -contrast_emm$estimate[2]
est_VGdiet_inIT[i] <- contrast_emm$p.value[2]
P_VGdiet_inIT[i] <- -contrast_emm$upper.CL[2]
CI_L_VGdiet_inIT[i] <- -contrast_emm$lower.CL[2]
CI_U_VGdiet_inIT[i]
## interaction
<- summary(model)$coefficients[
diet_country_int[i] which(
names(model$coefficients) == "Country_IT:Diet_VEGAN"
1
),
]
<- summary(model)$coefficients[
P_diet_country_int[i] which(
names(model$coefficients) == "Country_IT:Diet_VEGAN"
4
),
] }
4.3 Results table
Open code
<- data.frame(
result_pathways
outcome,
est_ITcountry_avg, P_ITcountry_avg,
est_VGdiet_avg, P_VGdiet_avg,
est_VGdiet_inCZ, P_VGdiet_inCZ,
est_VGdiet_inIT, P_VGdiet_inIT,
diet_country_int, P_diet_country_int,
CI_L_VGdiet_avg, CI_U_VGdiet_avg,
CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
CI_L_VGdiet_inIT, CI_U_VGdiet_inIT )
4.3.1 Adjust p values
Open code
<- result_pathways %>%
result_pathways ::mutate(
dplyrfdr_ITcountry_avg = p.adjust(P_ITcountry_avg, method = 'BH'),
fdr_VGdiet_avg = p.adjust(P_VGdiet_avg, method = 'BH'),
fdr_VGdiet_inCZ = p.adjust(P_VGdiet_inCZ, method = 'BH'),
fdr_VGdiet_inIT = p.adjust(P_VGdiet_inIT, method = 'BH'),
fdr_diet_country_int = p.adjust(P_diet_country_int, method = 'BH')
%>%
) ::select(
dplyr
outcome,
est_ITcountry_avg, P_ITcountry_avg, fdr_ITcountry_avg,
est_VGdiet_avg, P_VGdiet_avg, fdr_VGdiet_avg,
est_VGdiet_inCZ, P_VGdiet_inCZ, fdr_VGdiet_inCZ,
est_VGdiet_inIT, P_VGdiet_inIT, fdr_VGdiet_inIT,
diet_country_int, P_diet_country_int, fdr_diet_country_int,
CI_L_VGdiet_avg, CI_U_VGdiet_avg,
CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
CI_L_VGdiet_inIT, CI_U_VGdiet_inIT )
4.3.2 Result: show and save
Open code
::kable(result_pathways %>% filter(fdr_VGdiet_avg < 0.05),
kableExtracaption = "Result of linear models, modelling CLR-transformed relative level of given fcuntional pathway with `Diet`, `Country` and `Diet x Country` interaction as fixed-effect predictors. Only the pathways that differ between vegans and omnivores in training cohorts (FDR < 0.05, average diet effect over both countries) are shown. `est` prefix: denotes estimated effects (regression coefficient), i.e. how much clr-transformed relative pathway level differ in vegans compared to omnivores, and in Italian vs Czech cohort, respectively; `P`: p-value, `fdr`: p-value after adjustment for multiple comparison, `CI_L` and `CI_U`: lower and upper bounds of 95% confidence interval respectively. `avg` suffix shows effect averaged across subgroups, whereas `inCZ` and `inIT` shows effect in Czech or Italian cohort respectively. Interaction effects are reported as `diet_country_int` (difference in the effect of vegan diet between Italian and Czech cohorts; positive values indicate a stronger effect in Italian, negative values a stronger effect in Czech cohort) and `P_diet_country_int` (its p-value) All estimates in a single row are based on a single model"
)
outcome | est_ITcountry_avg | P_ITcountry_avg | fdr_ITcountry_avg | est_VGdiet_avg | P_VGdiet_avg | fdr_VGdiet_avg | est_VGdiet_inCZ | P_VGdiet_inCZ | fdr_VGdiet_inCZ | est_VGdiet_inIT | P_VGdiet_inIT | fdr_VGdiet_inIT | diet_country_int | P_diet_country_int | fdr_diet_country_int | CI_L_VGdiet_avg | CI_U_VGdiet_avg | CI_L_VGdiet_inCZ | CI_U_VGdiet_inCZ | CI_L_VGdiet_inIT | CI_U_VGdiet_inIT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CALVIN_PWY | -0.0296002 | 0.7062271 | 0.7756810 | 0.2613223 | 0.0010627 | 0.0250528 | 0.3326251 | 0.0031400 | 0.0456284 | 0.1900196 | 0.0882619 | 0.9858171 | -0.1426055 | 0.3644016 | 0.5988432 | 0.1065206 | 0.4161240 | 0.1135788 | 0.5516714 | -0.0287793 | 0.4088185 |
GALACTITOLCAT_PWY | -0.7049222 | 0.0030045 | 0.0114303 | -0.9807686 | 0.0000454 | 0.0022705 | -1.3052878 | 0.0001197 | 0.0038094 | -0.6562494 | 0.0489003 | 0.9858171 | 0.6490384 | 0.1673536 | 0.5246896 | -1.4428039 | -0.5187332 | -1.9590736 | -0.6515020 | -1.3092967 | -0.0032021 |
HEXITOLDEGSUPER_PWY | -0.6345150 | 0.0033358 | 0.0122899 | -0.9086781 | 0.0000337 | 0.0019655 | -1.2010968 | 0.0001017 | 0.0035588 | -0.6162595 | 0.0422636 | 0.9858171 | 0.5848373 | 0.1716696 | 0.5246896 | -1.3292685 | -0.4880877 | -1.7962374 | -0.6059562 | -1.2107279 | -0.0217911 |
HSERMETANA_PWY | 0.0746296 | 0.3489152 | 0.4504382 | 0.2473730 | 0.0021839 | 0.0388779 | 0.3473933 | 0.0023538 | 0.0446191 | 0.1473527 | 0.1912725 | 0.9858171 | -0.2000406 | 0.2098315 | 0.5283527 | 0.0904967 | 0.4042494 | 0.1254113 | 0.5693753 | -0.0743785 | 0.3690840 |
NONOXIPENT_PWY | -0.1079297 | 0.1953293 | 0.3065706 | 0.3293683 | 0.0001085 | 0.0047487 | 0.3660825 | 0.0021621 | 0.0446191 | 0.2926541 | 0.0136120 | 0.9858171 | -0.0734285 | 0.6588361 | 0.8034587 | 0.1654654 | 0.4932712 | 0.1341578 | 0.5980072 | 0.0609913 | 0.5243168 |
PROPFERM_PWY | -1.1733941 | 0.0000000 | 0.0000000 | -0.5452806 | 0.0023327 | 0.0388779 | -1.1841185 | 0.0000045 | 0.0002256 | 0.0935574 | 0.7077896 | 0.9858171 | 1.2776759 | 0.0003881 | 0.0445874 | -0.8933922 | -0.1971690 | -1.6767008 | -0.6915363 | -0.3984685 | 0.5855832 |
PWY_1269 | 0.2494440 | 0.0568017 | 0.1234820 | 0.4002225 | 0.0024462 | 0.0389173 | 0.3686893 | 0.0467370 | 0.1237939 | 0.4317557 | 0.0200073 | 0.9858171 | 0.0630665 | 0.8086764 | 0.8985293 | 0.1434741 | 0.6569709 | 0.0053870 | 0.7319915 | 0.0688638 | 0.7946476 |
PWY_5104 | -0.0029780 | 0.9880447 | 0.9965869 | -0.8870113 | 0.0000146 | 0.0018502 | -0.8735792 | 0.0022026 | 0.0446191 | -0.9004434 | 0.0015980 | 0.2796561 | -0.0268642 | 0.9461154 | 0.9714199 | -1.2788612 | -0.4951614 | -1.4280518 | -0.3191067 | -1.4542896 | -0.3465972 |
PWY_5367 | -0.1763380 | 0.3098164 | 0.4170606 | -0.8229389 | 0.0000044 | 0.0015281 | -1.3000263 | 0.0000004 | 0.0001260 | -0.3458515 | 0.1593626 | 0.9858171 | 0.9541748 | 0.0065146 | 0.2672971 | -1.1647303 | -0.4811475 | -1.7836654 | -0.8163873 | -0.8289442 | 0.1372413 |
PWY_5747 | 0.3801128 | 0.1486819 | 0.2513565 | -0.8724046 | 0.0010737 | 0.0250528 | -0.8978253 | 0.0165274 | 0.0736276 | -0.8469838 | 0.0234498 | 0.9858171 | 0.0508415 | 0.9228098 | 0.9699202 | -1.3896726 | -0.3551365 | -1.6297661 | -0.1658845 | -1.5780978 | -0.1158697 |
PWY_5971 | 0.2851474 | 0.1681036 | 0.2736570 | -0.7492655 | 0.0003689 | 0.0117385 | -1.2273875 | 0.0000420 | 0.0016315 | -0.2711434 | 0.3530071 | 0.9858171 | 0.9562441 | 0.0215055 | 0.3516863 | -1.1559662 | -0.3425647 | -1.8028741 | -0.6519009 | -0.8459800 | 0.3036931 |
PWY_6284 | -0.0763172 | 0.6416310 | 0.7174788 | -0.7170244 | 0.0000211 | 0.0018502 | -1.1472182 | 0.0000018 | 0.0002256 | -0.2868306 | 0.2167977 | 0.9858171 | 0.8603876 | 0.0094022 | 0.2672971 | -1.0402209 | -0.3938279 | -1.6045453 | -0.6898911 | -0.7436411 | 0.1699800 |
PWY_6293 | -0.3991880 | 0.0295207 | 0.0743327 | -0.7979908 | 0.0000204 | 0.0018502 | -1.2508209 | 0.0000027 | 0.0002256 | -0.3451607 | 0.1810402 | 0.9858171 | 0.9056603 | 0.0137467 | 0.2672971 | -1.1569706 | -0.4390110 | -1.7587819 | -0.7428600 | -0.8525478 | 0.1622265 |
PWY_6629 | -0.1491528 | 0.0501929 | 0.1140748 | 0.2410823 | 0.0017135 | 0.0374820 | 0.3128749 | 0.0039398 | 0.0492480 | 0.1692897 | 0.1150530 | 0.9858171 | -0.1435852 | 0.3436827 | 0.5866151 | 0.0918025 | 0.3903621 | 0.1016422 | 0.5241076 | -0.0417044 | 0.3802838 |
PWY_7237 | -0.2382669 | 0.0372708 | 0.0893478 | 0.3820200 | 0.0009489 | 0.0250528 | 0.5125117 | 0.0016953 | 0.0423814 | 0.2515284 | 0.1187112 | 0.9858171 | -0.2609833 | 0.2517765 | 0.5640539 | 0.1579766 | 0.6060635 | 0.1954874 | 0.8295360 | -0.0651378 | 0.5681946 |
PWY_801 | -0.4000136 | 0.0326876 | 0.0817191 | -0.7960755 | 0.0000309 | 0.0019655 | -1.2545161 | 0.0000040 | 0.0002256 | -0.3376348 | 0.2000867 | 0.9858171 | 0.9168813 | 0.0145845 | 0.2686626 | -1.1627289 | -0.4294220 | -1.7733353 | -0.7356969 | -0.8558680 | 0.1805984 |
PWY_8178 | -0.0688124 | 0.4122732 | 0.5197219 | 0.3181576 | 0.0002039 | 0.0079305 | 0.3622281 | 0.0026075 | 0.0446191 | 0.2740872 | 0.0217826 | 0.9858171 | -0.0881408 | 0.5992903 | 0.7464470 | 0.1528518 | 0.4834635 | 0.1283182 | 0.5961380 | 0.0404415 | 0.5077329 |
PWY_8188 | -1.1733941 | 0.0000000 | 0.0000000 | -0.5452806 | 0.0023327 | 0.0388779 | -1.1841185 | 0.0000045 | 0.0002256 | 0.0935574 | 0.7077896 | 0.9858171 | 1.2776759 | 0.0003881 | 0.0445874 | -0.8933922 | -0.1971690 | -1.6767008 | -0.6915363 | -0.3984685 | 0.5855832 |
PWY_8189 | -1.1733941 | 0.0000000 | 0.0000000 | -0.5452806 | 0.0023327 | 0.0388779 | -1.1841185 | 0.0000045 | 0.0002256 | 0.0935574 | 0.7077896 | 0.9858171 | 1.2776759 | 0.0003881 | 0.0445874 | -0.8933922 | -0.1971690 | -1.6767008 | -0.6915363 | -0.3984685 | 0.5855832 |
PWY0_1586 | 0.0341764 | 0.6869672 | 0.7608813 | 0.2681826 | 0.0018360 | 0.0377992 | 0.3571433 | 0.0033129 | 0.0456284 | 0.1792219 | 0.1361326 | 0.9858171 | -0.1779213 | 0.2949060 | 0.5689951 | 0.1010066 | 0.4353587 | 0.1205871 | 0.5936995 | -0.0570671 | 0.4155109 |
PWY0_42 | 0.3942904 | 0.0154267 | 0.0440852 | -0.5433769 | 0.0009276 | 0.0250528 | -0.4880001 | 0.0337446 | 0.1072755 | -0.5987536 | 0.0093549 | 0.9858171 | -0.1107535 | 0.7314183 | 0.8512073 | -0.8614198 | -0.2253340 | -0.9380348 | -0.0379654 | -1.0482800 | -0.1492273 |
TRPSYN_PWY | -0.2950693 | 0.0005635 | 0.0028177 | 0.3120611 | 0.0002732 | 0.0095630 | 0.3618949 | 0.0026771 | 0.0446191 | 0.2622274 | 0.0283480 | 0.9858171 | -0.0996675 | 0.5531917 | 0.7251576 | 0.1464543 | 0.4776680 | 0.1275591 | 0.5962306 | 0.0281564 | 0.4962985 |
Open code
if(file.exists('gitignore/lm_results/result_pathways_filt.csv') == FALSE){
write.table(result_pathways,
'gitignore/lm_results/result_pathways_filt.csv',
row.names = FALSE)
}
5 Elastic net
To assess the predictive power of pathways features on diet strategy, we employed Elastic Net logistic regression.
As we expected very high level of co-linearity, we allowed \(alpha\) to rather small (0, 0.2 or 0.4).
The performance of the predictive models was evaluated through their capacity of discriminate between vegan and omnivore diets, using out-of-sample area under ROC curve (AUC; estimated with out-of-bag bootstrap) as the measure of discriminatory capacity.
All features were transformed by 2 standard deviations (resulting in standard deviation of 0.5)
5.1 Prepare data for glmnet
Open code
<- data_pathways_original_clr %>%
data_pathways_glmnet ::mutate(
dplyrvegan = as.numeric(
::if_else(
dplyr== "VEGAN", 1, 0
Diet
)
)%>%
) ::mutate(
dplyr::across(all_of(features), ~ arm::rescale(.))
dplyr%>%
) ::select(
dplyrall_of(features)
ID, vegan, Country, )
5.2 Fit model
Open code
<- "elanet_pathways_filt"
modelac
assign(
modelac,run(
expr = clust_glmnet_sep(
data = data_pathways_glmnet,
outcome = "vegan",
clust_id = "ID",
sample_method = "oos_boot",
N = 500,
alphas = c(0, 0.2, 0.4),
family = "binomial",
seed = 478
),path = paste0("gitignore/run/", modelac)
) )
5.3 See results
5.3.1 Model summary
Open code
$model_summary
elanet_pathways_filt## alpha lambda auc auc_OutOfSample auc_oos_CIL auc_oos_CIU accuracy
## 1 0.4 0.04303069 0.9412819 0.7897421 0.6777675 0.8811993 0.873494
## accuracy_OutOfSample accuracy_oos_CIL accuracy_oos_CIU
## 1 0.7252609 0.6105935 0.8321154
$country_AUC
elanet_pathways_filt## auc_OutOfSample_IT auc_oos_CIL_IT auc_oos_CIU_IT auc_OutOfSample_CZ
## 1 0.6698679 0.4822149 0.8389606 0.8739966
## auc_oos_CIL_CZ auc_oos_CIU_CZ
## 1 0.7339286 0.9936312
5.3.2 ROC curve - internal validation
Open code
$plot elanet_pathways_filt
5.3.3 Estimated coefficients
Open code
<- data.frame(
tr label = row.names(
$betas
elanet_pathways_filt
)[which(
abs(
$betas
elanet_pathways_filt>0
)
)
],beta = elanet_pathways_filt$betas[
abs(
$betas
elanet_pathways_filt>0
)
]-1, ]
)[
$pathway <- attr(data_path_originalCZ,
tr"description")[
colnames(data_path_originalCZ) %in% tr$label]
::kable(tr %>% select(label, pathway, beta)) kableExtra
label | pathway | beta | |
---|---|---|---|
2 | CENTFERM_PWY | pyruvate fermentation to butanoate | -0.1635717 |
3 | FUCCAT_PWY | fucose degradation | 0.2839196 |
4 | GALACTITOLCAT_PWY | galactitol degradation | -0.1996850 |
5 | GLUCARDEG_PWY | D-glucarate degradation I | 0.2602967 |
6 | GOLPDLCAT_PWY | superpathway of glycerol degradation to 1,3-propanediol | 0.1236880 |
7 | HEXITOLDEGSUPER_PWY | superpathway of hexitol degradation (bacteria) | -0.2176896 |
8 | HSERMETANA_PWY | L-methionine biosynthesis III | 0.1004888 |
9 | LACTOSECAT_PWY | lactose and galactose degradation I | -0.0286681 |
10 | NONOXIPENT_PWY | pentose phosphate pathway (non-oxidative branch) I | 0.3841247 |
11 | PPGPPMET_PWY | ppGpp metabolism | 0.2606652 |
12 | PROPFERM_PWY | superpathway of L-alanine fermentation (Stickland reaction) | -0.1535219 |
13 | PWY_5104 | L-isoleucine biosynthesis IV | -0.2145823 |
14 | PWY_5136 | fatty acid β-oxidation II (plant peroxisome) | 0.1595753 |
15 | PWY_5345 | superpathway of L-methionine biosynthesis (by sulfhydrylation) | 0.0000036 |
16 | PWY_5367 | petroselinate biosynthesis | -0.3153989 |
17 | PWY_5494 | pyruvate fermentation to propanoate II (acrylate pathway) | -0.0597671 |
18 | PWY_5747 | 2-methylcitrate cycle II | -0.0426507 |
19 | PWY_622 | starch biosynthesis | -0.0278495 |
20 | PWY_6284 | superpathway of unsaturated fatty acids biosynthesis (E. coli) | -0.1552727 |
21 | PWY_6293 | superpathway of L-cysteine biosynthesis (fungi) | -0.2893336 |
22 | PWY_6470 | peptidoglycan biosynthesis V (β-lactam resistance) | -0.4052525 |
23 | PWY_6572 | chondroitin sulfate degradation I (bacterial) | 0.1353945 |
24 | PWY_6590 | superpathway of Clostridium acetobutylicum acidogenic fermentation | -0.1394431 |
25 | PWY_6807 | xyloglucan degradation II (exoglucanase) | 0.1192990 |
26 | PWY_6892 | thiazole component of thiamine diphosphate biosynthesis I | 0.0488946 |
27 | PWY_6895 | superpathway of thiamine diphosphate biosynthesis II | -0.2609380 |
28 | PWY_6936 | seleno-amino acid biosynthesis (plants) | 0.2567455 |
29 | PWY_6992 | 1,5-anhydrofructose degradation | -0.0422256 |
30 | PWY_7111 | pyruvate fermentation to isobutanol (engineered) | -0.0626192 |
31 | PWY_7185 | UTP and CTP dephosphorylation I | -0.1626717 |
32 | PWY_7210 | pyrimidine deoxyribonucleotides biosynthesis from CTP | -0.1586225 |
33 | PWY_7234 | inosine-5’-phosphate biosynthesis III | -0.0885173 |
34 | PWY_7237 | myo-, chiro- and scyllo-inositol degradation | 0.1260601 |
35 | PWY_7312 | dTDP-β-D-fucofuranose biosynthesis | -0.1364504 |
36 | PWY_7400 | L-arginine biosynthesis IV (archaebacteria) | -0.0556371 |
37 | PWY_7858 | (5Z)-dodecenoate biosynthesis II | 0.1653035 |
38 | PWY_801 | homocysteine and cysteine interconversion | -0.2465140 |
39 | PWY_8178 | pentose phosphate pathway (non-oxidative branch) II | 0.2210654 |
40 | PWY_8188 | L-alanine degradation VI (reductive Stickland reaction) | -0.1546633 |
41 | PWY_8189 | L-alanine degradation V (oxidative Stickland reaction) | -0.1541587 |
42 | PWY0_1477 | ethanolamine utilization | -0.0720184 |
43 | PWY0_42 | 2-methylcitrate cycle I | -0.0442314 |
44 | PWY66_389 | phytol degradation | 0.0854299 |
45 | PWY66_391 | fatty acid β-oxidation VI (mammalian peroxisome) | -0.0083984 |
46 | REDCITCYC | TCA cycle VI (Helicobacter) | 0.0710738 |
47 | SO4ASSIM_PWY | assimilatory sulfate reduction I | 0.0312470 |
48 | SULFATE_CYS_PWY | superpathway of sulfate assimilation and cysteine biosynthesis | 0.0208325 |
49 | TRPSYN_PWY | L-tryptophan biosynthesis | 0.1046697 |
50 | UDPNAGSYN_PWY | UDP-N-acetyl-D-glucosamine biosynthesis I | -0.0136180 |
5.3.4 Plot of coefficients
Open code
<- data.frame(
elacoef pathway = row.names(elanet_pathways_filt$betas),
beta_ela = elanet_pathways_filt$betas[, 1]
%>%
) arrange(abs(beta_ela)) %>%
filter(abs(beta_ela) > 0,
!grepl('Intercept', pathway)) %>%
mutate(pathway = factor(pathway, levels = pathway))
<- "elanet_beta_pathway"
plotac <- "gitignore/figures"
path
assign(plotac,
ggplot(elacoef,
aes(
x = pathway,
y = beta_ela
)+
) geom_point() +
geom_hline(yintercept = 0, color = "black") +
labs(
y = "Standardized beta coefficients",
x = "pathway"
+
) theme_minimal() +
coord_flip() +
theme(
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 7,
height = 12
)
}
get(plotac)
6 External validation
External validation was performed with an independent Czech cohort.
As a first step, we will use the previously developed and internally validated elastic net model to predict vegan status in the independent Czech cohort. The validation data will be standardized using the mean and standard deviation of each pathway level from the training cohort to ensure comparability across datasets. For each subject in the external validation cohort, we will estimate the predicted probability of being vegan using the elastic net model. This predicted probability will then be used as a variable to discriminate between the diet groups in the independent cohort.
In a 2nd step, we will look at pathways that significantly differed between diet groups (average vegan diet effect across both countries, FDR < 0.05) estimated with linear models (one per pathway) with training cohort. Then we will fit linear models also for external validation cohort. Effect of vegan diet on these pathways will be shown along with 95% confidence interval for all cohorts: training Czech and Italian cohorts, but also in Czech independent (validating) cohort
6.1 Diet discrimination (elastic net)
6.1.1 Get table of weights, means and SDs
Open code
<- get_coef(
coefs_pathways original_data = data_analysis_pathways,
glmnet_model = elanet_pathways_filt)
6.1.3 Standardize data in validation set
Open code
<- data_pathways_validation_clr %>%
data_pathways_validation_pred ::mutate(
dplyrvegan = if_else(
== 'VEGAN', 1, 0
Diet
)%>%
) ::select(
dplyr
vegan,::all_of(common_predictors)
dplyr%>%
) ::mutate(
dplyracross(
.cols = -vegan,
.fns = ~ .
- coefs_pathways$mean[
match(
cur_column(),
$predictor
coefs_pathways
)
]
)%>%
) ::mutate(
dplyracross(
.cols = -vegan,
.fns = ~ .
/ coefs_pathways$SD[
match(
cur_column(),
$predictor
coefs_pathways
)
]
) )
6.1.4 Get predicted value
Open code
$fit
elanet_pathways_filt##
## Call: glmnet::glmnet(x = original_predictors, y = original_outcome, family = family, alpha = optim_par$alpha[1], lambda = optim_par$lamb_1se[1], standardize = standardize)
##
## Df %Dev Lambda
## 1 49 37.63 0.04303
<- as.matrix(data_pathways_validation_pred[,-1])
newx
<- data_pathways_validation_pred %>%
tr ::mutate(
dplyrpredicted_logit = as.numeric(
predict(
$fit,
elanet_pathways_filtnewx = newx
)
)%>%
) ::mutate(
dplyrpredicted = inv_logit(predicted_logit)
)
6.2 Result of external validation
6.2.1 ROC curve
Open code
<- pROC::roc(
roc_pathway ~ predicted_logit,
vegan data = tr,
direction = "<",
levels = c(0, 1),
ci = TRUE
)
<- "roc_pathway_pl"
plotac <- "gitignore/figures"
path
assign(plotac, ggroc(roc_pathway))
get(plotac)
Open code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 6,
height = 4.5
) }
6.2.2 Table
Open code
<- elanet_pathways_filt
mod
<- mean(mod[["valid_performances"]]$auc_resamp_test)
trainAUC <- quantile(mod[["valid_performances"]]$auc_resamp_test,
trainCI probs = c(1 / 40, 39 / 40)
)
<- data.frame(
res alpha = c(mod$model_summary$alpha, rep("", 4)),
lambda = c(round(mod$model_summary$lambda, 4), rep("", 4)),
`performance type` = c(
"Training set AUC",
"Out-of-sample AUC (all)",
"Out-of-sample AUC (Czech)",
"Out-of-sample AUC (Italy)",
"External validation AUC"
),`performance [95% CI]` = c(
sprintf("%.3f [%.3f to %.3f]", trainAUC, trainCI[1], trainCI[2]),
sprintf(
"%.3f [%.3f to %.3f]",
$model_summary$auc_OutOfSample,
mod$model_summary$auc_oos_CIL,
mod$model_summary$auc_oos_CIU
mod
),sprintf(
"%.3f [%.3f to %.3f]",
$country_AUC$auc_OutOfSample_CZ,
mod$country_AUC$auc_oos_CIL_CZ,
mod$country_AUC$auc_oos_CIU_CZ
mod
),sprintf(
"%.3f [%.3f to %.3f]",
$country_AUC$auc_OutOfSample_IT,
mod$country_AUC$auc_oos_CIL_IT,
mod$country_AUC$auc_oos_CIU_IT
mod
),sprintf(
"%.3f [%.3f to %.3f]",
"ci"]][2],
roc_pathway[["ci"]][1],
roc_pathway[["ci"]][3]
roc_pathway[[
)
)
)
::kable(
kableExtra%>% mutate(across(where(is.numeric), ~ round(.x, 3))),
res caption = "Performance of the elastic‑net logistic regression model for discriminating vegan from omnivore status using CLR-transformed pathways based on bacteria reads. The model was developed on the combined training data (Czech and Italian cohorts), with the optimmized `alpha` (mixing parameter) and `lambda` (penalty strength) selected via 10-fold cross-validation. Internal validation employed 500 out‑of‑bag bootstrap resamples: the out‑of‑sample AUC is the mean across resamples, and its 95 % confidence interval (CI) is given by the 2.5th and 97.5th percentiles of the bootstrap distribution. The training‑set AUC and its CI were computed analogously from the in‑bag predictions. External validation was carried out on an independent Czech cohort; the reported AUC is the point estimate on that cohort, and its 95% CI was obtained with DeLong’s method."
)
alpha | lambda | performance.type | performance..95..CI. |
---|---|---|---|
0.4 | 0.043 | Training set AUC | 0.994 [0.975 to 1.000] |
Out-of-sample AUC (all) | 0.790 [0.678 to 0.881] | ||
Out-of-sample AUC (Czech) | 0.874 [0.734 to 0.994] | ||
Out-of-sample AUC (Italy) | 0.670 [0.482 to 0.839] | ||
External validation AUC | 0.774 [0.684 to 0.864] |
6.3 Diet effect across datasets (forest plot)
Similarly as in training data cohorts, we will fit linear model per each of the selected pathway level with a single fixed effect factor of diet
.
6.3.1 Linear model in validation cohort
Open code
<- data_pathways_validation_clr %>%
data_analysis_pathways ::mutate(
dplyrDiet_VEGAN = as.numeric(
::if_else(
dplyr== 'VEGAN', 1, 0
Diet
)%>%
) ) ::select(
dplyr
Diet_VEGAN,::everything()
dplyr
)
summary(data_analysis_pathways[, 1:12])
## Diet_VEGAN ID Diet Country
## Min. :0.0000 Length:101 Length:101 Length:101
## 1st Qu.:0.0000 Class :character Class :character Class :character
## Median :1.0000 Mode :character Mode :character Mode :character
## Mean :0.5743
## 3rd Qu.:1.0000
## Max. :1.0000
## Data X1CMET2_PWY ALLANTOINDEG_PWY ANAEROFRUCAT_PWY
## Length:101 Min. :1.820 Min. :-4.1711 Min. :1.139
## Class :character 1st Qu.:2.582 1st Qu.:-3.5853 1st Qu.:1.598
## Mode :character Median :2.730 Median :-3.3514 Median :1.842
## Mean :2.695 Mean :-2.8111 Mean :1.825
## 3rd Qu.:2.847 3rd Qu.:-2.0140 3rd Qu.:2.029
## Max. :3.105 Max. :-0.5367 Max. :2.748
## ANAGLYCOLYSIS_PWY ARG.POLYAMINE_SYN ARGININE_SYN4_PWY ARGSYN_PWY
## Min. :1.990 Min. :-1.4687 Min. :-1.4437 Min. :2.000
## 1st Qu.:2.777 1st Qu.: 0.2802 1st Qu.: 0.5786 1st Qu.:2.752
## Median :2.942 Median : 0.5856 Median : 0.9553 Median :2.907
## Mean :2.913 Mean : 0.5673 Mean : 0.9585 Mean :2.854
## 3rd Qu.:3.086 3rd Qu.: 0.8994 3rd Qu.: 1.4342 3rd Qu.:3.001
## Max. :3.541 Max. : 1.7341 Max. : 2.1724 Max. :3.400
6.3.1.1 Define number of pathways and covariates
Open code
<- 5
n_covarites <- ncol(data_analysis_pathways) - n_covarites n_features
6.3.1.2 Create empty objects
Open code
<- vector('double', n_features)
outcome <- vector('double', n_features)
est_VGdiet <- vector('double', n_features)
P_VGdiet <- vector('double', n_features)
CI_L_VGdiet <- vector('double', n_features) CI_U_VGdiet
6.3.1.3 Linear models per outcome
Open code
for (i in 1:n_features) {
## define variable
$outcome <- data_analysis_pathways[, (i + n_covarites)]
data_analysis_pathways
## fit model
<- lm(outcome ~ Diet_VEGAN, data = data_analysis_pathways)
model
## save results
<- names(data_analysis_pathways)[i + n_covarites]
outcome[i]
## extract diet effect
<- confint(model)
tr
<- tr[which(row.names(tr) == "Diet_VEGAN"), ][1]
CI_L_VGdiet[i] <- tr[which(row.names(tr) == "Diet_VEGAN"), ][2]
CI_U_VGdiet[i]
<- summary(model)$coefficients[
est_VGdiet[i] which(
names(model$coefficients) == "Diet_VEGAN"
1
),
]
<- summary(model)$coefficients[
P_VGdiet[i] which(
names(model$coefficients) == "Diet_VEGAN"
4
),
] }
6.3.1.4 Results table
Open code
## relevant pathways
<- result_pathways %>%
diet_sensitive_pathways filter(
< 0.05
fdr_VGdiet_avg %>%
) select(
outcome
)
<- data.frame(
result_pathways_val
outcome,
est_VGdiet, P_VGdiet,
CI_L_VGdiet, CI_U_VGdiet%>%
) filter(outcome %in% diet_sensitive_pathways$outcome)
<- attr(
tr
data_path_originalCZ,"description")[colnames(data_path_originalCZ) %in% result_pathways_val$outcome]
<- cbind(pathway = tr, result_pathways_val) %>%
val_res arrange(desc(est_VGdiet))
::kable(val_res,
kableExtracaption = 'Results of linear models estimating the effect of diet on the CLR-transformed proportion of a given functional pathway as the outcome. Only pathways whose proportion significantly differed between diet groups in training cohorts (FDR < 0.05, average effect across both training cohorts) were included. `est` represents the estimated effects (regression coefficient), indicating how much the CLR-transformed proportions differ between vegans and omnivores. `P`: p-value, `fdr`: p-value adjusted for multiple comparisons, and `CI_L` and `CI_U` represent the lower and upper bounds of the 95% confidence interval, respectively. All estimates in a single row are based on a single model')
pathway | outcome | est_VGdiet | P_VGdiet | CI_L_VGdiet | CI_U_VGdiet |
---|---|---|---|---|---|
peptidoglycan maturation (meso-diaminopimelate containing) | PWY0_1586 | 0.1535083 | 0.0140132 | 0.0317355 | 0.2752811 |
pentose phosphate pathway (non-oxidative branch) I | NONOXIPENT_PWY | 0.1334798 | 0.0209651 | 0.0205883 | 0.2463713 |
L-tryptophan biosynthesis | TRPSYN_PWY | 0.1327050 | 0.0186200 | 0.0226470 | 0.2427630 |
myo-, chiro- and scyllo-inositol degradation | PWY_7237 | 0.1258992 | 0.1422591 | -0.0429823 | 0.2947807 |
pentose phosphate pathway (non-oxidative branch) II | PWY_8178 | 0.1132709 | 0.0398410 | 0.0053646 | 0.2211772 |
superpathway of L-tryptophan biosynthesis | PWY_6629 | 0.0985685 | 0.0431588 | 0.0030925 | 0.1940445 |
L-methionine biosynthesis III | HSERMETANA_PWY | 0.0880344 | 0.0798108 | -0.0106562 | 0.1867249 |
Calvin-Benson-Bassham cycle | CALVIN_PWY | 0.0596844 | 0.2390520 | -0.0402962 | 0.1596651 |
galactitol degradation | GALACTITOLCAT_PWY | 0.0535177 | 0.8777221 | -0.6348888 | 0.7419241 |
superpathway of hexitol degradation (bacteria) | HEXITOLDEGSUPER_PWY | -0.0106956 | 0.9728494 | -0.6326706 | 0.6112795 |
2-methylcitrate cycle II | PWY_5747 | -0.0366123 | 0.7913669 | -0.3104904 | 0.2372658 |
palmitate biosynthesis (type II fatty acid synthase) | PWY_5971 | -0.0392706 | 0.7701391 | -0.3052183 | 0.2266771 |
2-methylcitrate cycle I | PWY0_42 | -0.0517980 | 0.7101465 | -0.3275435 | 0.2239474 |
CMP-3-deoxy-D-manno-octulosonate biosynthesis | PWY_1269 | -0.0634412 | 0.6797402 | -0.3674721 | 0.2405898 |
L-isoleucine biosynthesis IV | PWY_5104 | -0.2815587 | 0.1150244 | -0.6329268 | 0.0698094 |
L-alanine degradation V (oxidative Stickland reaction) | PWY_8189 | -0.4025018 | 0.0014258 | -0.6458656 | -0.1591379 |
L-alanine degradation VI (reductive Stickland reaction) | PWY_8188 | -0.4025018 | 0.0014258 | -0.6458656 | -0.1591379 |
superpathway of L-alanine fermentation (Stickland reaction) | PROPFERM_PWY | -0.4025018 | 0.0014258 | -0.6458656 | -0.1591379 |
superpathway of L-cysteine biosynthesis (fungi) | PWY_6293 | -0.4259077 | 0.0020150 | -0.6923133 | -0.1595022 |
homocysteine and cysteine interconversion | PWY_801 | -0.4273921 | 0.0020285 | -0.6949062 | -0.1598780 |
superpathway of unsaturated fatty acids biosynthesis (E. coli) | PWY_6284 | -0.9274652 | 0.0000322 | -1.3497696 | -0.5051608 |
petroselinate biosynthesis | PWY_5367 | -1.3126488 | 0.0000030 | -1.8382687 | -0.7870289 |
Open code
if(file.exists('gitignore/lm_results/result_pathways_validation_filt.csv') == FALSE){
write.table(val_res,
'gitignore/lm_results/result_pathways_validation_filt.csv',
row.names = FALSE)
}
6.3.2 Forest plot
6.3.2.1 Data preparation
Open code
<- nrow(diet_sensitive_pathways)
len
## subset result tables
<- result_pathways %>%
result_pathways_subset filter(outcome %in% diet_sensitive_pathways$outcome)
<- result_pathways_val %>%
result_pathways_val_subset filter(outcome %in% diet_sensitive_pathways$outcome)
## create a data frame
<- data.frame(
data_forest outcome = rep(diet_sensitive_pathways$outcome, 3),
beta = c(
$est_VGdiet_inCZ,
result_pathways_subset$est_VGdiet_inIT,
result_pathways_subset$est_VGdiet
result_pathways_val_subset
),lower = c(
$CI_L_VGdiet_inCZ,
result_pathways_subset$CI_L_VGdiet_inIT,
result_pathways_subset$CI_L_VGdiet
result_pathways_val_subset
),upper = c(
$CI_U_VGdiet_inCZ,
result_pathways_subset$CI_U_VGdiet_inIT,
result_pathways_subset$CI_U_VGdiet
result_pathways_val_subset
),dataset = c(
rep("CZ", len),
rep("IT", len),
rep("Validation", len)
)
)
<- data_forest %>%
validation_order left_join(
%>% select(outcome, pathway),
val_res by = 'outcome'
%>%
) group_by(outcome) %>%
summarise(beta_mean = mean(beta), .groups = "drop",
pathway = first(pathway)) %>%
arrange(beta_mean) %>%
pull(pathway)
<- data_forest %>%
up_winners pivot_wider(names_from = dataset,
values_from = c(beta, lower, upper)) %>%
left_join(
%>% mutate(outcome = pathway) %>% select(-pathway),
elacoef by = 'outcome') %>%
filter(beta_CZ > 0,
> 0,
beta_IT > 0,
lower_Validation > 0.1) %>%
beta_ela select(outcome)
<- data_forest %>%
down_winners pivot_wider(names_from = dataset,
values_from = c(beta, lower, upper)) %>%
left_join(elacoef %>% mutate(outcome = pathway) %>% select(-pathway),
by = 'outcome') %>%
filter(beta_CZ < 0,
< 0,
beta_IT < 0,
upper_Validation < -0.1) %>%
beta_ela select(outcome)
<- as.character(c(up_winners$outcome, down_winners$outcome))
winners
<- data_forest %>%
data_forest mutate(in_winner = if_else(outcome %in% winners, TRUE, FALSE, missing = FALSE)) %>%
left_join(
%>% select(outcome, pathway),
val_res by = 'outcome'
%>%
) left_join(
%>% mutate(outcome = pathway) %>% select(-pathway),
elacoef by = 'outcome') %>%
mutate(pathway = factor(pathway, levels = validation_order))
6.3.2.2 Plotting
Open code
<- "forest_pathway"
plotac <- "gitignore/figures"
path
<- c("CZ" = "#150999", "IT" = "#329243", "Validation" = "grey60")
colors
assign(
plotac,ggplot(data_forest,
aes(x = pathway,
y = beta,
ymin = lower,
ymax = upper,
color = dataset)) +
geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
geom_hline(yintercept = 0, color = "black") +
geom_errorbar(position = position_dodge(width = 0.5), width = 0.2) +
scale_color_manual(values = colors) +
labs(
y = "Effect of vegan diet on clr-trasformed pathway",
x = "Outcome",
color = "Dataset"
+
) theme_minimal() +
coord_flip() +
scale_x_discrete(
labels = setNames(
ifelse(data_forest$in_winner,
paste0("**", data_forest$pathway, "**"),
as.character(data_forest$pathway)
$pathway
), data_forest
)+
) theme(
axis.text.x = element_text(size = 10),
axis.text.y = ggtext::element_markdown(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
get(plotac)
Diet
, Country
, and the interaction term Diet:Country
as predictors. In the independent Czech validation cohort, Diet
was the only fixed-effect predictor. Patways validated in the linear model and showing predictive power in the elastic net model (|β| > 0.1) are boldOpen code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 10,
height = 12
) }
6.3.3 Boxplot
Open code
<- "boxplot_pathway"
plotac <- "gitignore/figures"
path
<- c("#F9FFAF", "#329243")
colo
<- function(variable) {
boxplot_cond <- ggboxplot(data_merged,
p x = "Diet",
y = variable,
fill = "Diet",
tip.length = 0.15,
palette = colo,
outlier.shape = 1,
lwd = 0.25,
outlier.size = 0.8,
facet.by = "Data",
title = variable,
ylab = "CLR(pathway proportion)"
+
)
theme(
plot.title = element_text(size = 10),
axis.title = element_text(size = 8),
axis.text.y = element_text(size = 7),
axis.text.x = element_blank(),
axis.title.x = element_blank()
)
return(p)
}
# Plot all outcomes
<- map(diet_sensitive_pathways$outcome, boxplot_cond)
plots
# Create a matrix of plots
<- ggarrange(plotlist = plots, ncol = 4, nrow = 6,
plots_arranged common.legend = TRUE)
assign(plotac, plots_arranged)
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 9,
height = 13
)
}
get(plotac)
7 Linear model VG duration
Next, we fit another series of linear models, this time modelling clr-transformed functional pathways (inferred from shotgun metagenomic sequencing) using the following fixed-effect predictors: duration of vegan status (Diet_duration
, scaled in tens of years), Country
, their interaction (Diet_duration × Country
), and Age
:
\[ \text{CLR(pathway proportion)} = \alpha + \beta_{1} \times \text{Country} + \beta_{2} \times \text{Diet duration} + \beta_{3} \times (\text{Country}:\text{Diet duration}) + \beta_{4} \times \text{Age} + \epsilon \]
This analysis includes only vegan participants, while omnivores are excluded. The aim was to test whether functional pathways that differ between vegans and omnivores also vary within the vegan group itself, depending on how long participants have been vegan. In other words, we asked whether long-term vegans show stronger up- or down-regulation of diet-sensitive pathways compared to those who adopted the diet more recently.
Because longer vegan duration is likely correlated with age (e.g. a 20-year-old cannot have 20 years of vegan history, unlike a 40- or 60-year-old), we also adjusted for age in the models.
7.1 Get data
7.1.1 Training
Open code
<- read.xlsx('gitignore/data/diet_duration_age.xlsx', sheet = 1)
meta_trainIT <- read.xlsx('gitignore/data/diet_duration_age.xlsx', sheet = 2) %>%
meta_trainCZ ::mutate(ID = paste0('T', ID),
dplyrSex = SEX) %>%
::select(-SEX)
dplyr
1:5,]
meta_trainIT[## ID COHORT GRP Diet_duration Age Sex
## 1 VOV002 IT_train OM 0 61.4 F
## 2 VOV003 IT_train OM 0 43.7 F
## 3 VOV004 IT_train OM 0 61.1 F
## 4 VOV006 IT_train OM 0 31.7 F
## 5 VOV007 IT_train OM 0 31.8 F
1:5,]
meta_trainCZ[## ID COHORT GRP Diet_duration Age Sex
## 1 T97 CZ_train OM 0 26.76438 M
## 2 T98 CZ_train OM 0 27.61370 F
## 3 T134 CZ_train OM 0 21.64384 F
## 4 T136 CZ_train OM 0 31.25479 M
## 5 T137 CZ_train OM 0 40.42192 F
<- bind_rows(meta_trainIT, meta_trainCZ) %>%
data_meta_original rename(`Sample` = `ID`)
1:5,]
data_meta_original[## Sample COHORT GRP Diet_duration Age Sex
## 1 VOV002 IT_train OM 0 61.4 F
## 2 VOV003 IT_train OM 0 43.7 F
## 3 VOV004 IT_train OM 0 61.1 F
## 4 VOV006 IT_train OM 0 31.7 F
## 5 VOV007 IT_train OM 0 31.8 F
<- data_pathways_original_clr %>%
data_pathways_original2 rename(`Sample` = `ID`) %>%
left_join(data_meta_original, by = 'Sample') %>%
select(Sample:Diet, COHORT:Sex, everything())
%>% dim()
data_pathways_original2 ## [1] 166 359
data_pathways_original2[1:4,
ncol(data_pathways_original2)-10):ncol(data_pathways_original2)
(
]## SO4ASSIM_PWY SULFATE_CYS_PWY TCA_GLYOX_BYPASS TCA THISYN_PWY
## 1 -0.1880565 0.4369528 -0.2921062 0.1011286 1.643560
## 2 -0.4462730 0.1693518 -0.5477345 0.1521391 1.478927
## 3 -2.4927554 -1.6876045 -2.1987741 -2.0383007 2.849858
## 4 -2.2425268 -1.4401533 -2.7500641 -2.5363305 2.799085
## THISYNARA_PWY THRESYN_PWY TRNA_CHARGING_PWY TRPSYN_PWY UDPNAGSYN_PWY
## 1 1.438471 1.880293 2.420052 1.424845 1.358719
## 2 0.884228 1.814359 2.601992 1.624480 1.603390
## 3 2.317259 3.015584 3.856117 1.818168 2.435054
## 4 2.327101 3.044502 3.956278 2.064342 3.331618
## VALSYN_PWY
## 1 2.479489
## 2 2.416387
## 3 3.539453
## 4 3.409071
1:4, 1:10]
data_pathways_original2[## Sample Diet COHORT GRP Diet_duration Age Sex Country Data X1CMET2_PWY
## 1 VOV002 OMNI IT_train OM 0 61.4 F IT IT_tr 2.186503
## 2 VOV003 OMNI IT_train OM 0 43.7 F IT IT_tr 2.008170
## 3 VOV004 OMNI IT_train OM 0 61.1 F IT IT_tr 3.418194
## 4 VOV006 OMNI IT_train OM 0 31.7 F IT IT_tr 3.174203
7.1.2 Validation
Open code
<- read.xlsx('gitignore/data/diet_duration_age.xlsx', sheet = 3) %>%
data_meta_valid rename(`Sample` = `ID`)
<- data_pathways_validation_clr %>%
data_pathways_valid2 rename(`Sample` = `ID`) %>%
left_join(data_meta_valid, by = 'Sample') %>%
select(Sample:Diet, COHORT:SEX, everything())
%>% dim()
data_pathways_valid2 ## [1] 101 359
data_pathways_valid2[1:4,
ncol(data_pathways_valid2)-10):ncol(data_pathways_valid2)
(
]## SO4ASSIM_PWY SULFATE_CYS_PWY TCA_GLYOX_BYPASS TCA THISYN_PWY
## 1 -0.8517185 -0.08304722 -3.365452 -3.033998 2.165097
## 2 -0.1495979 0.56899174 -3.314443 -2.782678 2.088872
## 3 -2.2760134 -1.47638331 -3.369634 -3.061441 2.328548
## 4 -3.5178606 -2.70918461 -3.528815 -3.179935 2.517770
## THISYNARA_PWY THRESYN_PWY TRNA_CHARGING_PWY TRPSYN_PWY UDPNAGSYN_PWY
## 1 2.410630 2.475191 3.147883 2.856516 2.047496
## 2 1.721339 2.516181 2.951190 2.587668 2.403907
## 3 2.304818 2.513833 3.060194 2.777382 2.104066
## 4 2.168090 2.770383 3.276645 3.078048 2.792120
## VALSYN_PWY
## 1 3.237373
## 2 3.172197
## 3 3.174123
## 4 3.403835
1:4, 1:10]
data_pathways_valid2[## Sample Diet COHORT GRP Diet_duration Age SEX Country Data X1CMET2_PWY
## 1 K4 VEGAN CZ_val VN <NA> 30.98904 M CZ valid 2.830535
## 2 K5 VEGAN CZ_val VN <NA> 30.66575 F CZ valid 2.582339
## 3 K7 VEGAN CZ_val VN <NA> 32.97808 F CZ valid 2.825255
## 4 K12 VEGAN CZ_val VN <NA> 27.89315 M CZ valid 3.022772
%>% select(Sample, Diet, GRP)
data_pathways_valid2 ## Sample Diet GRP
## 1 K4 VEGAN VN
## 2 K5 VEGAN VN
## 3 K7 VEGAN VN
## 4 K12 VEGAN VN
## 5 K13 VEGAN VN
## 6 K15 VEGAN VN
## 7 K16 VEGAN VN
## 8 K18 VEGAN VN
## 9 K19 VEGAN VN
## 10 K25 VEGAN VN
## 11 K26 VEGAN VN
## 12 K31 VEGAN VN
## 13 K32 VEGAN VN
## 14 K34 VEGAN VN
## 15 K35 VEGAN VN
## 16 K38 VEGAN VN
## 17 K39 VEGAN VN
## 18 K42 OMNI OM
## 19 K43 OMNI OM
## 20 K45 VEGAN VN
## 21 K46 VEGAN VN
## 22 K48 VEGAN VN
## 23 K49 VEGAN VN
## 24 K55 VEGAN VN
## 25 K56 VEGAN VN
## 26 K61 VEGAN VN
## 27 K62 VEGAN VN
## 28 K64 VEGAN VN
## 29 K65 VEGAN VN
## 30 K73 VEGAN VN
## 31 K74 VEGAN VN
## 32 K82 VEGAN VN
## 33 K85 VEGAN VN
## 34 K103 VEGAN VN
## 35 K106 VEGAN VN
## 36 K107 VEGAN VN
## 37 K114 VEGAN VN
## 38 K117 VEGAN VN
## 39 K119 VEGAN VN
## 40 K120 VEGAN VN
## 41 K123 OMNI OM
## 42 K124 OMNI OM
## 43 K126 OMNI OM
## 44 K127 OMNI OM
## 45 K130 OMNI OM
## 46 K131 OMNI OM
## 47 K143 OMNI OM
## 48 K150 OMNI OM
## 49 K151 OMNI OM
## 50 K154 OMNI OM
## 51 K155 OMNI OM
## 52 K175 OMNI OM
## 53 K176 OMNI OM
## 54 K178 OMNI OM
## 55 K179 OMNI OM
## 56 K182 OMNI OM
## 57 K183 OMNI OM
## 58 K198 OMNI OM
## 59 K199 OMNI OM
## 60 K201 OMNI OM
## 61 K202 OMNI OM
## 62 K205 VEGAN VN
## 63 K206 VEGAN VN
## 64 K209 OMNI OM
## 65 K210 OMNI OM
## 66 K216 VEGAN VN
## 67 K217 VEGAN VN
## 68 K219 VEGAN VN
## 69 K220 VEGAN VN
## 70 K222 OMNI OM
## 71 K223 OMNI OM
## 72 K226 VEGAN VN
## 73 K227 VEGAN VN
## 74 K230 OMNI OM
## 75 K231 OMNI OM
## 76 K234 OMNI OM
## 77 K235 OMNI OM
## 78 K238 OMNI OM
## 79 K239 OMNI OM
## 80 K250 VEGAN VN
## 81 K251 VEGAN VN
## 82 K253 OMNI OM
## 83 K254 OMNI OM
## 84 K258 VEGAN VN
## 85 K260 VEGAN VN
## 86 K261 VEGAN VN
## 87 K263 VEGAN VN
## 88 K264 VEGAN VN
## 89 K266 OMNI OM
## 90 K267 OMNI OM
## 91 K270 OMNI OM
## 92 K281 VEGAN VN
## 93 K285 VEGAN VN
## 94 K289 VEGAN VN
## 95 K291 VEGAN VN
## 96 K292 VEGAN VN
## 97 K298 OMNI OM
## 98 K299 OMNI OM
## 99 K318 OMNI OM
## 100 K329 OMNI OM
## 101 K330 OMNI OM
%>% select(Sample, Diet, GRP, Diet_duration) %>%
data_pathways_valid2 filter(
== 'VEGAN' & GRP == 'OM') | (Diet == 'OMNI' & GRP == 'VN')
(Diet
)## [1] Sample Diet GRP Diet_duration
## <0 rows> (or 0-length row.names)
7.2 Training cohort
7.2.1 Select data
Open code
<- data_pathways_original2 %>%
data_analysis mutate(Country_IT = as.numeric(
::if_else(
dplyr== "IT", 0.5, -0.5
Country
)
)%>%
) filter(Diet == 'VEGAN',
!is.na(Diet_duration)) %>%
select(1:8, Country_IT, all_of(diet_sensitive_pathways$outcome))
summary(data_analysis[ , 1:12])
## Sample Diet COHORT GRP
## Length:96 Length:96 Length:96 Length:96
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Diet_duration Age Sex Country
## Min. : 0.600 Min. :18.25 Length:96 Length:96
## 1st Qu.: 3.150 1st Qu.:27.57 Class :character Class :character
## Median : 5.000 Median :32.15 Mode :character Mode :character
## Mean : 5.492 Mean :34.97
## 3rd Qu.: 6.000 3rd Qu.:40.77
## Max. :20.000 Max. :60.70
## Country_IT CALVIN_PWY GALACTITOLCAT_PWY HEXITOLDEGSUPER_PWY
## Min. :-0.5000 Min. :1.496 Min. :-5.9819 Min. :-4.7341
## 1st Qu.:-0.5000 1st Qu.:2.395 1st Qu.:-4.9624 1st Qu.:-3.7146
## Median :-0.5000 Median :2.764 Median :-4.1624 Median :-2.9213
## Mean :-0.1042 Mean :2.698 Mean :-3.7681 Mean :-2.6037
## 3rd Qu.: 0.5000 3rd Qu.:2.928 3rd Qu.:-2.8106 3rd Qu.:-1.6531
## Max. : 0.5000 Max. :4.289 Max. : 0.3174 Max. : 0.8302
1:10, 1:12]
data_analysis[## Sample Diet COHORT GRP Diet_duration Age Sex Country Country_IT
## 1 VOV010 VEGAN IT_train VG 2.6 25.7 M IT 0.5
## 2 VOV012 VEGAN IT_train VG 3.0 55.7 F IT 0.5
## 3 VOV014 VEGAN IT_train VG 2.0 35.7 F IT 0.5
## 4 VOV018 VEGAN IT_train VG 3.0 33.5 F IT 0.5
## 5 VOV019 VEGAN IT_train VG 2.0 20.3 F IT 0.5
## 6 VOV020 VEGAN IT_train VG 5.0 24.7 M IT 0.5
## 7 VOV021 VEGAN IT_train VG 0.6 18.5 F IT 0.5
## 8 VOV029 VEGAN IT_train VG 7.0 52.3 F IT 0.5
## 9 VOV043 VEGAN IT_train VG 5.0 29.8 F IT 0.5
## 10 VOV044 VEGAN IT_train VG 5.0 32.2 F IT 0.5
## CALVIN_PWY GALACTITOLCAT_PWY HEXITOLDEGSUPER_PWY
## 1 2.011687 -3.3527234 -2.1492342
## 2 2.051551 -5.2216023 -3.9759989
## 3 2.105309 -5.8716201 -4.6238101
## 4 2.564119 -5.3142270 -4.0664170
## 5 2.460516 -2.8677805 -1.6570946
## 6 1.495842 -2.6777707 -1.4943911
## 7 2.047515 -0.4930437 0.1209159
## 8 2.789296 -4.8926198 -3.6448097
## 9 3.165458 -3.0580872 -1.8510735
## 10 2.246076 -4.0299535 -2.8231830
7.2.2 Define number of pathways and covariates
Open code
<- 9
n_covarites <- ncol(data_analysis) - n_covarites n_features
7.2.3 Create empty objects
Open code
<- vector('double', n_features)
outcome
<- vector('double', n_features)
est_Diet_duration_inCZ <- vector('double', n_features)
est_Diet_duration_inIT <- vector('double', n_features)
est_Diet_duration_avg
<- vector('double', n_features)
est_ITcountry_avg
<- vector('double', n_features)
diet_country_int
<- vector('double', n_features)
P_Diet_duration_inCZ <- vector('double', n_features)
P_Diet_duration_inIT <- vector('double', n_features)
P_Diet_duration_avg
<- vector('double', n_features)
P_ITcountry_avg
<- vector('double', n_features)
est_Age <- vector('double', n_features)
P_Age
<- vector('double', n_features)
P_diet_country_int
<- vector('double', n_features)
CI_L_Diet_duration_inCZ <- vector('double', n_features)
CI_L_Diet_duration_inIT <- vector('double', n_features)
CI_L_Diet_duration_avg
<- vector('double', n_features)
CI_U_Diet_duration_inCZ <- vector('double', n_features)
CI_U_Diet_duration_inIT <- vector('double', n_features) CI_U_Diet_duration_avg
7.2.4 Estimate over outcomes
Open code
if(file.exists('gitignore/lm_results/result_pathways_SGB30_VGdur_training.Rds') == FALSE){
for (i in 1:n_features) {
## define variable
$outcome <- data_analysis[, (i + n_covarites)]
data_analysis
## fit model
<- lm(outcome ~ Country_IT * Diet_duration + Age,
model data = data_analysis %>%
mutate(Diet_duration = (Diet_duration/10),
Age = (Age-33)/10))
## get contrast (effects of diet BY COUNTRY)
<- summary(
contrast_emm pairs(
emmeans(
model,specs = ~ Diet_duration | Country_IT,
at = list(Diet_duration = c(0, 1))
),interaction = TRUE,
adjust = "none"
),infer = c(TRUE, TRUE)
)
## save results
<- names(data_analysis)[i + n_covarites]
outcome[i]
## country and age effect
<- summary(model)$coefficients[
est_ITcountry_avg[i] which(
names(model$coefficients) == "Country_IT"
1
),
]
<- summary(model)$coefficients[
est_Age[i] which(
names(model$coefficients) == "Age"
1
),
]
<- summary(model)$coefficients[
P_Age[i] which(
names(model$coefficients) == "Age"
4
),
]
## diet effect
<- confint(model)
tr
<- tr[which(row.names(tr) == 'Diet_duration'),][1]
CI_L_Diet_duration_avg[i] <- tr[which(row.names(tr) == 'Diet_duration'),][2]
CI_U_Diet_duration_avg[i]
<- summary(model)$coefficients[
est_Diet_duration_avg[i] which(
names(model$coefficients) == "Diet_duration"
1
),
]
<- summary(model)$coefficients[
P_Diet_duration_avg[i] which(
names(model$coefficients) == "Diet_duration"
4
),
]
<- -contrast_emm[1,3]
est_Diet_duration_inCZ[i] <- contrast_emm$p.value[1]
P_Diet_duration_inCZ[i] <- -contrast_emm$upper.CL[1]
CI_L_Diet_duration_inCZ[i] <- -contrast_emm$lower.CL[1]
CI_U_Diet_duration_inCZ[i]
<- -contrast_emm[2,3]
est_Diet_duration_inIT[i] <- contrast_emm$p.value[2]
P_Diet_duration_inIT[i] <- -contrast_emm$upper.CL[2]
CI_L_Diet_duration_inIT[i] <- -contrast_emm$lower.CL[2]
CI_U_Diet_duration_inIT[i]
## interaction
<- summary(model)$coefficients[
diet_country_int[i] which(
names(model$coefficients) == "Country_IT:Diet_duration"
1
),
]
<- summary(model)$coefficients[
P_diet_country_int[i] which(
names(model$coefficients) == "Country_IT:Diet_duration"
4
),
]
}
<- data.frame(
result_pathways
outcome,
est_ITcountry_avg, P_ITcountry_avg,
est_Age, P_Age,
est_Diet_duration_avg, P_Diet_duration_avg,
est_Diet_duration_inCZ, P_Diet_duration_inCZ,
est_Diet_duration_inIT, P_Diet_duration_inIT,
diet_country_int, P_diet_country_int,
CI_L_Diet_duration_avg, CI_U_Diet_duration_avg,
CI_L_Diet_duration_inCZ, CI_U_Diet_duration_inCZ,
CI_L_Diet_duration_inIT, CI_U_Diet_duration_inIT
)
saveRDS(result_pathways,
'gitignore/lm_results/result_pathways_SGB30_VGdur_training.Rds')
}
<- readRDS('gitignore/lm_results/result_pathways_SGB30_VGdur_training.Rds') result_pathways
7.2.5 Show and save results
Open code
::kable(result_pathways %>%
kableExtra::select(
dplyr
outcome,
est_ITcountry_avg, P_ITcountry_avg,
est_Age, P_Age,
est_Diet_duration_avg, P_Diet_duration_avg,
est_Diet_duration_inCZ, P_Diet_duration_inCZ,
est_Diet_duration_inIT, P_Diet_duration_inIT,
diet_country_int, P_diet_country_int,
CI_L_Diet_duration_avg, CI_U_Diet_duration_avg,
CI_L_Diet_duration_inCZ, CI_U_Diet_duration_inCZ,
CI_L_Diet_duration_inIT, CI_U_Diet_duration_inIT%>%
) arrange(P_Diet_duration_avg),
caption = "Results of linear models, modelling clr-transformed pathways (functional pathways inferred from shotgun metagenomic sequencing, expressed as relative proportions) with vegan status duration (`Diet_duration`), `Country`, their interaction (`Diet_duration × Country`), and `Age` as predictors, using training data only (Czech and Italian vegan cohorts). Only pathways with clr-transformed levels differing by diet (FDR < 0.05, average effect across both countries) are shown. `est` prefix: denotes estimated effects (regression coefficients), i.e. expected change in clr-transforemd pathway proportion per 10 years of vegan diet or age, or for country (Italy vs Czech Republic). `P`: p-value. `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval. `avg` suffix: effect averaged across cohorts, whereas `inCZ` and `inIT` indicate effects in the Czech or Italian cohort, respectively. All estimates in a single row are based on a single model with interaction",
escape = FALSE
)
outcome | est_ITcountry_avg | P_ITcountry_avg | est_Age | P_Age | est_Diet_duration_avg | P_Diet_duration_avg | est_Diet_duration_inCZ | P_Diet_duration_inCZ | est_Diet_duration_inIT | P_Diet_duration_inIT | diet_country_int | P_diet_country_int | CI_L_Diet_duration_avg | CI_U_Diet_duration_avg | CI_L_Diet_duration_inCZ | CI_U_Diet_duration_inCZ | CI_L_Diet_duration_inIT | CI_U_Diet_duration_inIT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PWY0_42 | -0.2773973 | 0 | -0.1004082 | 0.3386596 | 0.8437349 | 0.0175709 | 0.1036314 | 0.7679825 | 1.5838384 | 0.0057482 | 1.4802070 | 0.0192055 | 0.1507842 | 1.5366857 | -0.5920523 | 0.7993152 | 0.4716463 | 2.6960305 |
PWY_5104 | -1.1008129 | 0 | -0.1255575 | 0.3748064 | 0.8819153 | 0.0640597 | -0.3115837 | 0.5111188 | 2.0754142 | 0.0072186 | 2.3869979 | 0.0053976 | -0.0526011 | 1.8164317 | -1.2497857 | 0.6266184 | 0.5755072 | 3.5753213 |
PWY0_1586 | -0.1532302 | 0 | -0.1244985 | 0.0668966 | 0.2778437 | 0.2187251 | 0.0725126 | 0.7482276 | 0.4831749 | 0.1829736 | 0.4106623 | 0.3064622 | -0.1677868 | 0.7234743 | -0.3748755 | 0.5199007 | -0.2320661 | 1.1984158 |
PWY_5747 | -0.5426096 | 0 | 0.1577512 | 0.3724702 | 0.6957188 | 0.2399696 | -0.1858384 | 0.7537062 | 1.5772759 | 0.0982099 | 1.7631142 | 0.0955870 | -0.4726613 | 1.8640988 | -1.3588264 | 0.9871497 | -0.2979845 | 3.4525362 |
PWY_5367 | 0.0661884 | 0 | -0.0863856 | 0.4895611 | 0.4896242 | 0.2423707 | 0.1337625 | 0.7495402 | 0.8454859 | 0.2087336 | 0.7117233 | 0.3390805 | -0.3368826 | 1.3161310 | -0.6960039 | 0.9635290 | -0.4810648 | 2.1720365 |
TRPSYN_PWY | -0.4380714 | 0 | -0.0620052 | 0.3493535 | 0.2170546 | 0.3270580 | 0.0714053 | 0.7475225 | 0.3627038 | 0.3076609 | 0.2912985 | 0.4594053 | -0.2205018 | 0.6546109 | -0.3678767 | 0.5106874 | -0.3395780 | 1.0649857 |
PWY_7237 | -0.4726868 | 0 | -0.1277910 | 0.1446405 | 0.2753235 | 0.3453469 | 0.0708106 | 0.8085468 | 0.4798364 | 0.3057300 | 0.4090259 | 0.4305587 | -0.3012190 | 0.8518660 | -0.5080058 | 0.6496269 | -0.4455194 | 1.4051922 |
PWY_6284 | 0.1978288 | 0 | -0.0630640 | 0.6001771 | 0.3609705 | 0.3700236 | 0.1088588 | 0.7872983 | 0.6130822 | 0.3429492 | 0.5042234 | 0.4813548 | -0.4349332 | 1.1568742 | -0.6901838 | 0.9079015 | -0.6643502 | 1.8905146 |
PWY_6629 | -0.2677474 | 0 | -0.0508378 | 0.3895271 | 0.1611869 | 0.4141907 | 0.0707759 | 0.7205976 | 0.2515979 | 0.4270884 | 0.1808220 | 0.6063980 | -0.2291330 | 0.5515069 | -0.3210834 | 0.4626353 | -0.3748691 | 0.8780649 |
PWY_1269 | 0.0542379 | 0 | -0.0743909 | 0.4739454 | 0.2771687 | 0.4248289 | -0.0012748 | 0.9970775 | 0.5556122 | 0.3193538 | 0.5568870 | 0.3678700 | -0.4095946 | 0.9639319 | -0.6907467 | 0.6881970 | -0.5466489 | 1.6578733 |
GALACTITOLCAT_PWY | -0.7587257 | 0 | -0.1317090 | 0.4444355 | 0.4407269 | 0.4438474 | -0.0974639 | 0.8658540 | 0.9789177 | 0.2900121 | 1.0763816 | 0.2940863 | -0.6976062 | 1.5790601 | -1.2402865 | 1.0453588 | -0.8481171 | 2.8059525 |
HEXITOLDEGSUPER_PWY | -0.6644502 | 0 | -0.0964017 | 0.5397155 | 0.3674010 | 0.4844670 | -0.0815175 | 0.8770494 | 0.8163194 | 0.3337190 | 0.8978369 | 0.3376733 | -0.6721866 | 1.4069885 | -1.1252051 | 0.9621701 | -0.8522278 | 2.4848666 |
NONOXIPENT_PWY | -0.1145726 | 0 | -0.0847598 | 0.2142740 | 0.1431280 | 0.5290129 | 0.0976397 | 0.6686444 | 0.1886162 | 0.6051177 | 0.0909765 | 0.8219570 | -0.3067688 | 0.5930247 | -0.3540314 | 0.5493108 | -0.5334720 | 0.9107045 |
PROPFERM_PWY | -0.6354697 | 0 | 0.0042961 | 0.9746762 | 0.2765237 | 0.5413465 | 0.1743986 | 0.7010282 | 0.3786488 | 0.6022046 | 0.2042502 | 0.7997380 | -0.6193986 | 1.1724460 | -0.7250571 | 1.0738544 | -1.0593144 | 1.8166120 |
PWY_8188 | -0.6354697 | 0 | 0.0042961 | 0.9746762 | 0.2765237 | 0.5413465 | 0.1743986 | 0.7010282 | 0.3786488 | 0.6022046 | 0.2042502 | 0.7997380 | -0.6193986 | 1.1724460 | -0.7250571 | 1.0738544 | -1.0593144 | 1.8166120 |
PWY_8189 | -0.6354697 | 0 | 0.0042961 | 0.9746762 | 0.2765237 | 0.5413465 | 0.1743986 | 0.7010282 | 0.3786488 | 0.6022046 | 0.2042502 | 0.7997380 | -0.6193986 | 1.1724460 | -0.7250571 | 1.0738544 | -1.0593144 | 1.8166120 |
PWY_6293 | -0.6016035 | 0 | 0.1982998 | 0.1229276 | -0.1430936 | 0.7374860 | -0.6189605 | 0.1508899 | 0.3327732 | 0.6273239 | 0.9517337 | 0.2121950 | -0.9885097 | 0.7023224 | -1.4677108 | 0.2297898 | -1.0241270 | 1.6896734 |
PWY_8178 | -0.0496325 | 0 | -0.0790724 | 0.2486419 | 0.0534566 | 0.8148275 | 0.0533209 | 0.8160012 | 0.0535923 | 0.8836815 | 0.0002714 | 0.9994669 | -0.3986201 | 0.5055333 | -0.4005387 | 0.5071806 | -0.6719948 | 0.7791794 |
PWY_801 | -0.6715556 | 0 | 0.2036234 | 0.1236193 | -0.0705987 | 0.8722625 | -0.6203486 | 0.1615842 | 0.4791513 | 0.4970850 | 1.0994999 | 0.1616967 | -0.9403352 | 0.7991378 | -1.4935153 | 0.2528181 | -0.9167835 | 1.8750860 |
CALVIN_PWY | -0.0743894 | 0 | -0.0533955 | 0.3990599 | 0.0301266 | 0.8865682 | 0.0180314 | 0.9322252 | 0.0422218 | 0.9008715 | 0.0241904 | 0.9486865 | -0.3882129 | 0.4484661 | -0.4019580 | 0.4380208 | -0.6292167 | 0.7136603 |
HSERMETANA_PWY | 0.0093006 | 0 | 0.0204975 | 0.7260141 | -0.0029857 | 0.9878091 | 0.0471015 | 0.8102860 | -0.0530730 | 0.8656333 | -0.1001744 | 0.7733788 | -0.3900740 | 0.3841025 | -0.3415134 | 0.4357163 | -0.6743529 | 0.5682070 |
PWY_5971 | 0.7845608 | 0 | -0.0940461 | 0.5469506 | 0.0002204 | 0.9996626 | -0.0740198 | 0.8875266 | 0.0744606 | 0.9290834 | 0.1484804 | 0.8728599 | -1.0323683 | 1.0328091 | -1.1106809 | 0.9626414 | -1.5828533 | 1.7317746 |
Open code
if(file.exists('gitignore/lm_results/result_pathways_VGdur_training.csv') == FALSE){
write.table(result_pathways,
'gitignore/lm_results/result_pathways_VGdur_training.csv', row.names = FALSE)
}
7.3 Validation cohort
Open code
<- data_pathways_valid2 %>%
data_analysis_pathways filter(Diet == 'VEGAN',
!is.na(Diet_duration)) %>%
::mutate(Diet_duration = as.numeric(as.character(Diet_duration))) %>%
dplyr::select(
dplyr
Diet_duration, Age,all_of(diet_sensitive_pathways$outcome)
)
summary(data_analysis_pathways[ , 1:12])
## Diet_duration Age CALVIN_PWY GALACTITOLCAT_PWY
## Min. : 0.170 Min. :21.54 Min. :2.242 Min. :-5.686
## 1st Qu.: 4.062 1st Qu.:30.05 1st Qu.:2.720 1st Qu.:-5.191
## Median : 6.000 Median :32.81 Median :2.883 Median :-4.437
## Mean : 6.790 Mean :32.37 Mean :2.865 Mean :-3.758
## 3rd Qu.:10.000 3rd Qu.:34.24 3rd Qu.:3.052 3rd Qu.:-2.130
## Max. :15.000 Max. :44.08 Max. :3.372 Max. : 1.556
## HEXITOLDEGSUPER_PWY HSERMETANA_PWY NONOXIPENT_PWY PROPFERM_PWY
## Min. :-4.440 Min. :1.852 Min. :1.932 Min. :-3.650
## 1st Qu.:-3.945 1st Qu.:2.325 1st Qu.:2.398 1st Qu.:-3.072
## Median :-3.199 Median :2.451 Median :2.629 Median :-2.934
## Mean :-2.646 Mean :2.443 Mean :2.610 Mean :-2.836
## 3rd Qu.:-1.018 3rd Qu.:2.602 3rd Qu.:2.832 3rd Qu.:-2.763
## Max. : 0.303 Max. :3.075 Max. :3.384 Max. :-1.320
## PWY_1269 PWY_5104 PWY_5367 PWY_5747
## Min. :-1.5780 Min. :-4.596 Min. :-5.0226 Min. :-6.481
## 1st Qu.: 0.3895 1st Qu.:-4.332 1st Qu.:-4.3303 1st Qu.:-6.231
## Median : 0.7839 Median :-4.173 Median :-2.6540 Median :-6.127
## Mean : 0.8536 Mean :-3.878 Mean :-2.6949 Mean :-5.946
## 3rd Qu.: 1.2113 3rd Qu.:-3.994 3rd Qu.:-1.8717 3rd Qu.:-6.027
## Max. : 2.3199 Max. :-1.523 Max. : 0.7546 Max. :-2.981
7.3.0.1 Define number of pathways and covariates
Open code
<- 2
n_covarites <- ncol(data_analysis_pathways) - n_covarites n_features
7.3.0.2 Create empty objects
Open code
<- vector('double', n_features)
outcome <- vector('double', n_features)
est_VGduration <- vector('double', n_features)
P_VGduration <- vector('double', n_features)
est_Age <- vector('double', n_features)
P_Age <- vector('double', n_features)
CI_L_VGduration <- vector('double', n_features) CI_U_VGduration
7.3.0.3 Estimate over outcomes
Open code
for (i in 1:n_features) {
## define variable
$outcome <- data_analysis_pathways[, (i + n_covarites)]
data_analysis_pathways
## fit model
<- lm(outcome ~ Diet_duration + Age,
model data = data_analysis_pathways %>%
mutate(Diet_duration = (Diet_duration)/10,
Age = (Age-33)/10))
## save results
<- names(data_analysis_pathways)[i + n_covarites]
outcome[i]
## diet effect
<- confint(model)
tr
<- tr[which(row.names(tr) == "Diet_duration"), ][1]
CI_L_VGduration[i] <- tr[which(row.names(tr) == "Diet_duration"), ][2]
CI_U_VGduration[i]
<- summary(model)$coefficients[
est_VGduration[i] which(
names(model$coefficients) == "Diet_duration"
1
),
]
<- summary(model)$coefficients[
P_VGduration[i] which(
names(model$coefficients) == "Diet_duration"
4
),
]
<- summary(model)$coefficients[
est_Age[i] which(
names(model$coefficients) == "Age"
1
),
]
<- summary(model)$coefficients[
P_Age[i] which(
names(model$coefficients) == "Age"
4
),
] }
7.3.0.4 Results table
Open code
<- data.frame(
result_pathways_val
outcome,
est_VGduration, P_VGduration,
CI_L_VGduration, CI_U_VGduration,
est_Age, P_Age
)
::kable(result_pathways_val %>%
kableExtraarrange(P_VGduration),
caption = "Results of linear models estimating the effect of vegan diet status duration and age on clr-transformed pathway proportions (functional pathways inferred from shotgun metagenomic sequencing). Only pathways with significant differences between vegans and omnivores in training cohorts (FDR < 0.05, average effect over both countries) were included. `est`: regression coefficient, i.e. expected change in clr-transformed pathway proportion per +10 years of vegan diet and +10 years of age respectively. `P`: p-value; `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval. All estimates in a single row are based on a single model"
)
outcome | est_VGduration | P_VGduration | CI_L_VGduration | CI_U_VGduration | est_Age | P_Age |
---|---|---|---|---|---|---|
PWY0_1586 | -0.2034972 | 0.1217437 | -0.4631027 | 0.0561082 | -0.1596314 | 0.1513086 |
PWY_5971 | 0.4404654 | 0.1852968 | -0.2180837 | 1.0990145 | 0.3086420 | 0.2720620 |
PWY_8188 | -0.2530201 | 0.2466083 | -0.6864001 | 0.1803598 | -0.0053484 | 0.9767893 |
PROPFERM_PWY | -0.2530201 | 0.2466083 | -0.6864001 | 0.1803598 | -0.0053484 | 0.9767893 |
PWY_8189 | -0.2530201 | 0.2466083 | -0.6864001 | 0.1803598 | -0.0053484 | 0.9767893 |
PWY_6284 | 0.4282847 | 0.4363589 | -0.6676821 | 1.5242515 | 0.3688943 | 0.4288990 |
PWY_5367 | 0.5092329 | 0.4401775 | -0.8048535 | 1.8233193 | 0.4477825 | 0.4232473 |
PWY_5104 | 0.2435981 | 0.4856992 | -0.4527699 | 0.9399660 | 0.1381028 | 0.6404712 |
PWY_5747 | -0.2012648 | 0.5319405 | -0.8433037 | 0.4407740 | 0.7033059 | 0.0123152 |
NONOXIPENT_PWY | -0.0511207 | 0.6947732 | -0.3111956 | 0.2089541 | -0.3182347 | 0.0055092 |
TRPSYN_PWY | -0.0291099 | 0.8042250 | -0.2636445 | 0.2054247 | -0.2272285 | 0.0258614 |
PWY_1269 | 0.0913651 | 0.8077977 | -0.6587243 | 0.8414546 | -0.5837282 | 0.0710437 |
PWY_8178 | -0.0294339 | 0.8122229 | -0.2768912 | 0.2180233 | -0.3027070 | 0.0055220 |
HSERMETANA_PWY | -0.0245491 | 0.8165366 | -0.2358853 | 0.1867871 | -0.2385405 | 0.0100427 |
PWY_801 | 0.0487496 | 0.8200686 | -0.3793120 | 0.4768112 | 0.2074197 | 0.2563364 |
PWY_6293 | 0.0484819 | 0.8204130 | -0.3780595 | 0.4750232 | 0.2063192 | 0.2571641 |
PWY_7237 | 0.0314140 | 0.8652072 | -0.3382422 | 0.4010702 | -0.4885971 | 0.0028790 |
PWY0_42 | 0.0433593 | 0.8785533 | -0.5234589 | 0.6101774 | 0.3927582 | 0.1068248 |
CALVIN_PWY | -0.0101717 | 0.9288856 | -0.2378529 | 0.2175096 | -0.2283578 | 0.0212871 |
PWY_6629 | 0.0075671 | 0.9437346 | -0.2066243 | 0.2217585 | -0.1913790 | 0.0391760 |
HEXITOLDEGSUPER_PWY | -0.0226809 | 0.9746535 | -1.4487920 | 1.4034301 | 0.1999101 | 0.7411747 |
GALACTITOLCAT_PWY | 0.0239274 | 0.9768329 | -1.6221390 | 1.6699939 | 0.2729324 | 0.6960884 |
Open code
if (file.exists("gitignore/result_pathways_VGdur_valid.csv") == FALSE) {
write.table(result_pathways_val,
"gitignore/result_pathways_VGdur_valid.csv",
row.names = FALSE
) }
7.4 Forest plot
7.4.1 Prepare data
Open code
## subset result tables
<- result_pathways %>%
result_pathways_subset filter(outcome %in% diet_sensitive_pathways$outcome)
<- result_pathways_val %>%
result_pathways_val_subset filter(outcome %in% diet_sensitive_pathways$outcome)
## create a data frame
<- data.frame(
data_forest outcome = rep(diet_sensitive_pathways$outcome, 3),
beta = c(
$est_Diet_duration_inCZ,
result_pathways_subset$est_Diet_duration_inIT,
result_pathways_subset$est_VGduration
result_pathways_val_subset
),lower = c(
$CI_L_Diet_duration_inCZ,
result_pathways_subset$CI_L_Diet_duration_inIT,
result_pathways_subset$CI_L_VGduration
result_pathways_val_subset
),upper = c(
$CI_U_Diet_duration_inCZ,
result_pathways_subset$CI_U_Diet_duration_inIT,
result_pathways_subset$CI_U_VGduration
result_pathways_val_subset
),dataset = c(
rep("CZ", len),
rep("IT", len),
rep("Validation", len)
)
)
<- data_forest %>%
data_forest mutate(in_winner = if_else(outcome %in% winners, TRUE, FALSE, missing = FALSE)) %>%
left_join(
%>% select(outcome, pathway),
val_res by = 'outcome') %>%
left_join(
%>% mutate(outcome = pathway) %>% select(-pathway),
elacoef by = 'outcome') %>%
mutate(outcome = factor(pathway, levels = validation_order))
7.4.1.1 Plotting
Open code
<- "forest_plot_pathways_VGduration"
plotac <- "gitignore/figures"
path
<- c("CZ" = "#150999", "IT" = "#329243", "Validation" = "grey60")
colors
assign(plotac, ggplot(
aes(x = outcome, y = beta, ymin = lower, ymax = upper, color = dataset)
data_forest, +
) geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
geom_hline(yintercept = 0, color = "black") +
geom_errorbar(position = position_dodge(width = 0.5), width = 0.2) +
scale_color_manual(values = colors) +
labs(
y = "Effect of vegan diet duration (+10y) on CLR(relative level)",
x = "Outcome",
color = "Dataset"
+
) theme_minimal() +
coord_flip() +
scale_x_discrete(
labels = setNames(
ifelse(data_forest$in_winner,
paste0("**", data_forest$outcome, "**"),
as.character(data_forest$outcome)
$outcome
), data_forest
)+
) theme(
axis.text.x = element_text(size = 10),
axis.text.y = ggtext::element_markdown(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
get(plotac)
Diet_duration
, Country
, their interaction (Diet_duration:Country
), and Age
as fixed-effect predictors. Age
was included as it correlates with Diet_duration
and could act as a confounder. In the Czech validation cohort, Diet_duration
and Age
were fixed-effect predictors. Diet-sensitive pathways are shown in bold, in the same order as in the forest plot of vegan–omnivore differencesOpen code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 9,
height = 13
) }
8 Summary info about Patways
8.1 Proportions of unmapped and unintegrated
8.1.1 Czech trainng cohort
Open code
<- read.delim(
StratfiedPaths_originalCZ "gitignore/data/pathw/Pathway_abundance_MetaCyc_CZ_humann.tsv",
header = TRUE, sep = "\t"
%>%
) filter(!grepl("\\|", X..Pathway))
dim(StratfiedPaths_originalCZ)
## [1] 580 89
<- colSums(StratfiedPaths_originalCZ[, -1])
all_paths
<- StratfiedPaths_originalCZ[1, -1] / all_paths
unmapped <- StratfiedPaths_originalCZ[2, -1] / all_paths
unintegrated <- 1 - unintegrated - unmapped
classified
::kable(
kableExtra::bind_rows(
dplyrquantile(as.numeric(unmapped), probs = c(0, 0.25, 0.5, 0.75, 1)),
quantile(as.numeric(unintegrated), probs = c(0, 0.25, 0.5, 0.75, 1)),
quantile(as.numeric(classified), probs = c(0, 0.25, 0.5, 0.75, 1))
%>%
) ::mutate(
dplyrcategory = c("unmapped", "unintegrated", "classified")
%>%
) ::select(category, everything()),
dplyrcaption = "Summary statistics (minimum, 25th, 50th, 75th, and maximum) of the proportions of unmapped, unintegrated, and classified pathway reads across samples from the Czech training cohort. 'Unmapped': reads not aligned to UniRef90. 'Unintegrated': reads aligned to UniRef90 gene families but not assigned to any MetaCyc pathway. 'Classified': reads assigned to known MetaCyc pathways (unstratified community-level abundances)"
)
category | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unmapped | 0.1463292 | 0.1816258 | 0.2156915 | 0.2538069 | 0.3634129 |
unintegrated | 0.5993433 | 0.7027560 | 0.7373852 | 0.7718285 | 0.8265473 |
classified | 0.0271235 | 0.0416706 | 0.0438855 | 0.0460305 | 0.0594788 |
8.1.2 Italian training cohort
Open code
<- read.delim(
StratifiedPaths_originalIT "gitignore/data/pathw/Pathway_abundance_MetaCyc_IT_humann.tsv",
header = TRUE, sep = "\t"
%>%
) filter(!grepl("\\|", X..Pathway))
dim(StratifiedPaths_originalIT)
## [1] 488 79
<- colSums(StratifiedPaths_originalIT[, -1])
all_paths
<- StratifiedPaths_originalIT[1, -1] / all_paths
unmapped <- StratifiedPaths_originalIT[2, -1] / all_paths
unintegrated <- 1 - unintegrated - unmapped
classified
::kable(
kableExtra::bind_rows(
dplyrquantile(as.numeric(unmapped), probs = c(0, 0.25, 0.5, 0.75, 1)),
quantile(as.numeric(unintegrated), probs = c(0, 0.25, 0.5, 0.75, 1)),
quantile(as.numeric(classified), probs = c(0, 0.25, 0.5, 0.75, 1))
%>%
) ::mutate(
dplyrcategory = c("unmapped", "unintegrated", "classified")
%>%
) ::select(category, everything()),
dplyrcaption = "Summary statistics (minimum, 25th, 50th, 75th, and maximum) of the proportions of unmapped, unintegrated, and classified pathway reads across samples from the Italian training cohort. 'Unmapped': reads not aligned to UniRef90. 'Unintegrated': reads aligned to UniRef90 gene families but not assigned to any MetaCyc pathway. 'Classified': reads assigned to known MetaCyc pathways (unstratified community-level abundances)"
)
category | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unmapped | 0.1291023 | 0.1754808 | 0.2275788 | 0.2617836 | 0.4390892 |
unintegrated | 0.5217834 | 0.6929692 | 0.7215643 | 0.7704262 | 0.8291034 |
classified | 0.0288690 | 0.0432038 | 0.0469259 | 0.0526642 | 0.0798983 |
8.1.3 Czech validation cohort
Open code
<- read.delim(
StratifiedPaths_validation "gitignore/data/pathw/Pathway_abundance_MetaCyc_Validation_humann.tsv",
header = TRUE, sep = "\t"
%>%
) filter(!grepl("\\|", X..Pathway))
dim(StratifiedPaths_validation)
## [1] 580 104
<- colSums(StratifiedPaths_validation[, -1])
all_paths
<- StratifiedPaths_validation[1, -1] / all_paths
unmapped <- StratifiedPaths_validation[2, -1] / all_paths
unintegrated <- 1 - unintegrated - unmapped
classified
::kable(
kableExtra::bind_rows(
dplyrquantile(as.numeric(unmapped), probs = c(0, 0.25, 0.5, 0.75, 1)),
quantile(as.numeric(unintegrated), probs = c(0, 0.25, 0.5, 0.75, 1)),
quantile(as.numeric(classified), probs = c(0, 0.25, 0.5, 0.75, 1))
%>%
) ::mutate(
dplyrcategory = c("unmapped", "unintegrated", "classified")
%>%
) ::select(category, everything()),
dplyrcaption = "Summary statistics (minimum, 25th, 50th, 75th, and maximum) of the proportions of unmapped, unintegrated, and classified pathway reads across samples from the Czech validation cohort. 'Unmapped': reads not aligned to UniRef90. 'Unintegrated': reads aligned to UniRef90 gene families but not assigned to any MetaCyc pathway. 'Classified': reads assigned to known MetaCyc pathways (unstratified community-level abundances)"
)
category | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unmapped | 0.1363081 | 0.1825077 | 0.2098586 | 0.2292549 | 0.3303064 |
unintegrated | 0.6290565 | 0.7209362 | 0.7419081 | 0.7664460 | 0.8230822 |
classified | 0.0369937 | 0.0444025 | 0.0488882 | 0.0517536 | 0.0616683 |
Open code
<- read.delim(
data_path_validation "gitignore/data/pathw/Pathway_abundance_MetaCyc_Validation_humann.tsv",
header = TRUE, sep = "\t"
%>%
) filter(!grepl("\\|", X..Pathway)) %>%
select(X..Pathway)
dim(data_path_validation)
## [1] 580 1
8.2 Taxon-assigned vs. taxon-unassigned pathways
This refers to the proportion of reads that were functionally assigned to a MetaCyc pathway but whose taxonomic origin could not be determined, from the all classified reads.
8.2.1 Czech trainng cohort
Open code
<- read.delim(
Paths_originalCZ "gitignore/data/pathw/Pathway_abundance_MetaCyc_CZ_humann.tsv",
header = TRUE, sep = "\t", check.names = FALSE
%>%
) filter(
!grepl("UNMAPPED", `# Pathway`),
!grepl("UNINTEGRATED", `# Pathway`)
)
<- names(Paths_originalCZ)[1]
name_col
<- Paths_originalCZ %>% filter(!grepl("\\|", `# Pathway`))
unstrat dim(unstrat)
## [1] 578 89
1:10, 1:3]
unstrat[## # Pathway
## 1 12DICHLORETHDEG-PWY: 1,2-dichloroethane degradation
## 2 1CMET2-PWY: folate transformations III (E. coli)
## 3 3-HYDROXYPHENYLACETATE-DEGRADATION-PWY: 4-hydroxyphenylacetate degradation
## 4 4-HYDROXYMANDELATE-DEGRADATION-PWY: 4-hydroxymandelate degradation
## 5 AEROBACTINSYN-PWY: aerobactin biosynthesis
## 6 ALLANTOINDEG-PWY: superpathway of allantoin degradation in yeast
## 7 ANAEROFRUCAT-PWY: homolactic fermentation
## 8 ANAGLYCOLYSIS-PWY: glycolysis III (from glucose)
## 9 ARG+POLYAMINE-SYN: superpathway of arginine and polyamine biosynthesis
## 10 ARGDEG-PWY: superpathway of L-arginine, putrescine, and 4-aminobutanoate degradation
## T36 T47
## 1 0.000 0.00000
## 2 7206.790 4995.96065
## 3 0.000 0.00000
## 4 0.000 0.00000
## 5 0.000 0.00000
## 6 0.000 22.69929
## 7 3694.568 2757.45106
## 8 5606.681 6779.42061
## 9 1129.840 467.06740
## 10 0.000 0.00000
<- Paths_originalCZ %>% filter(grepl("\\|", `# Pathway`))
strat <- strat
stratCZ
dim(strat)
## [1] 20794 89
1:10, 1:3]
strat[## # Pathway
## 1 1CMET2-PWY: folate transformations III (E. coli)|g__Akkermansia.s__Akkermansia_muciniphila
## 2 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_finegoldii
## 3 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_onderdonkii
## 4 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_putredinis
## 5 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_sp_CAG_268
## 6 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_timonensis
## 7 1CMET2-PWY: folate transformations III (E. coli)|g__Anaerostipes.s__Anaerostipes_hadrus
## 8 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_caccae
## 9 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_cellulosilyticus
## 10 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_clarus
## T36 T47
## 1 1365.836 0.000000
## 2 0.000 0.000000
## 3 0.000 0.000000
## 4 0.000 0.000000
## 5 0.000 0.000000
## 6 0.000 0.000000
## 7 0.000 13.076579
## 8 0.000 5.707278
## 9 0.000 0.000000
## 10 0.000 0.000000
<- strat %>%
strat_unclassified filter(grepl("\\|unclassified", `# Pathway`))
dim(strat_unclassified)
## [1] 425 89
1:10, 1:3]
strat_unclassified[## # Pathway
## 1 1CMET2-PWY: folate transformations III (E. coli)|unclassified
## 2 AEROBACTINSYN-PWY: aerobactin biosynthesis|unclassified
## 3 ALLANTOINDEG-PWY: superpathway of allantoin degradation in yeast|unclassified
## 4 ANAEROFRUCAT-PWY: homolactic fermentation|unclassified
## 5 ANAGLYCOLYSIS-PWY: glycolysis III (from glucose)|unclassified
## 6 ARG+POLYAMINE-SYN: superpathway of arginine and polyamine biosynthesis|unclassified
## 7 ARGDEG-PWY: superpathway of L-arginine, putrescine, and 4-aminobutanoate degradation|unclassified
## 8 ARGININE-SYN4-PWY: L-ornithine biosynthesis II|unclassified
## 9 ARGSYN-PWY: L-arginine biosynthesis I (via L-ornithine)|unclassified
## 10 ARGSYNBSUB-PWY: L-arginine biosynthesis II (acetyl cycle)|unclassified
## T36 T47
## 1 850.22127 735.377758
## 2 0.00000 0.000000
## 3 0.00000 0.000000
## 4 1236.08342 69.609446
## 5 1582.32882 1230.262897
## 6 53.57712 60.195263
## 7 0.00000 0.000000
## 8 17.32423 7.024764
## 9 1881.25353 1426.920220
## 10 1578.50191 1479.972027
<- quantile(
quants colSums(strat_unclassified[, -1]) / colSums(unstrat[, -1]),
probs = c(0, 0.25, 0.5, 0.75, 1)
)
<- as.data.frame(t(quants))
quants_df names(quants_df) <- c("0%", "25%", "50%", "75%", "100%")
$category <- "unclassified_within_taxon-assigned"
quants_df
::kable(
kableExtra%>% select(category, everything()),
quants_df caption = "Summary statistics (minimum, 25th, 50th, 75th, maximum) of the proportion of pathway signal that was taxonomically un-assigned, relative to all classified pathways (Czech training cohort). 'Unclassified pathways': MetaCyc pathway abundances that HUMAnN could not attribute to specific taxa (rows labeled '|unclassified' in the stratified output)"
)
category | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified_within_taxon-assigned | 0.024712 | 0.0665779 | 0.0909481 | 0.1279512 | 0.3327366 |
8.2.2 Italian trainng cohort
Open code
<- read.delim(
Paths_originalIT "gitignore/data/pathw/Pathway_abundance_MetaCyc_IT_humann.tsv",
header = TRUE, sep = "\t", check.names = FALSE
%>%
) filter(
!grepl("UNMAPPED", `# Pathway`),
!grepl("UNINTEGRATED", `# Pathway`)
)
<- names(Paths_originalCZ)[1]
name_col
<- Paths_originalIT %>% filter(!grepl("\\|", `# Pathway`))
unstrat dim(unstrat)
## [1] 486 79
1:10, 1:3]
unstrat[## # Pathway
## 1 12DICHLORETHDEG-PWY: 1,2-dichloroethane degradation
## 2 1CMET2-PWY: folate transformations III (E. coli)
## 3 3-HYDROXYPHENYLACETATE-DEGRADATION-PWY: 4-hydroxyphenylacetate degradation
## 4 AEROBACTINSYN-PWY: aerobactin biosynthesis
## 5 ALLANTOINDEG-PWY: superpathway of allantoin degradation in yeast
## 6 ANAEROFRUCAT-PWY: homolactic fermentation
## 7 ANAGLYCOLYSIS-PWY: glycolysis III (from glucose)
## 8 ARG+POLYAMINE-SYN: superpathway of arginine and polyamine biosynthesis
## 9 ARGDEG-PWY: superpathway of L-arginine, putrescine, and 4-aminobutanoate degradation
## 10 ARGININE-SYN4-PWY: L-ornithine biosynthesis II
## VOV002 VOV003
## 1 0.00000 0.00000
## 2 5831.38667 15090.47182
## 3 10.16048 0.00000
## 4 0.00000 27.10151
## 5 0.00000 375.52244
## 6 2392.63624 6594.16155
## 7 5156.72228 14176.11481
## 8 1649.41125 3106.04621
## 9 73.37596 0.00000
## 10 3006.12980 5661.73935
<- Paths_originalIT %>% filter(grepl("\\|", `# Pathway`))
strat <- strat
stratIT
dim(strat)
## [1] 18706 79
1:10, 1:3]
strat[## # Pathway
## 1 1CMET2-PWY: folate transformations III (E. coli)|g__Akkermansia.s__Akkermansia_muciniphila
## 2 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_finegoldii
## 3 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_onderdonkii
## 4 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_putredinis
## 5 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_sp_An66
## 6 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_sp_CAG_268
## 7 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_timonensis
## 8 1CMET2-PWY: folate transformations III (E. coli)|g__Anaerostipes.s__Anaerostipes_hadrus
## 9 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_caccae
## 10 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_cellulosilyticus
## VOV002 VOV003
## 1 0.00000 268.83988
## 2 49.94327 322.46788
## 3 16.31622 0.00000
## 4 389.90038 623.18145
## 5 0.00000 0.00000
## 6 843.30879 0.00000
## 7 0.00000 0.00000
## 8 0.00000 0.00000
## 9 36.86016 91.25667
## 10 0.00000 0.00000
<- strat %>%
strat_unclassified filter(grepl("\\|unclassified", `# Pathway`))
dim(strat_unclassified)
## [1] 440 79
1:10, 1:3]
strat_unclassified[## # Pathway
## 1 1CMET2-PWY: folate transformations III (E. coli)|unclassified
## 2 3-HYDROXYPHENYLACETATE-DEGRADATION-PWY: 4-hydroxyphenylacetate degradation|unclassified
## 3 ALLANTOINDEG-PWY: superpathway of allantoin degradation in yeast|unclassified
## 4 ANAEROFRUCAT-PWY: homolactic fermentation|unclassified
## 5 ANAGLYCOLYSIS-PWY: glycolysis III (from glucose)|unclassified
## 6 ARG+POLYAMINE-SYN: superpathway of arginine and polyamine biosynthesis|unclassified
## 7 ARGDEG-PWY: superpathway of L-arginine, putrescine, and 4-aminobutanoate degradation|unclassified
## 8 ARGININE-SYN4-PWY: L-ornithine biosynthesis II|unclassified
## 9 ARGSYN-PWY: L-arginine biosynthesis I (via L-ornithine)|unclassified
## 10 ARGSYNBSUB-PWY: L-arginine biosynthesis II (acetyl cycle)|unclassified
## VOV002 VOV003
## 1 592.735952 1831.3830
## 2 8.815428 0.0000
## 3 0.000000 0.0000
## 4 211.424264 819.0017
## 5 453.479555 1874.8780
## 6 65.875697 438.1747
## 7 20.241094 0.0000
## 8 192.409181 200.4070
## 9 603.938329 4149.4989
## 10 636.519706 4733.9368
<- quantile(
quants colSums(strat_unclassified[, -1]) / colSums(unstrat[, -1]),
probs = c(0, 0.25, 0.5, 0.75, 1)
)
<- as.data.frame(t(quants))
quants_df names(quants_df) <- c("0%", "25%", "50%", "75%", "100%")
$category <- "unclassified_within_taxon-assigned"
quants_df
::kable(
kableExtra%>% select(category, everything()),
quants_df caption = "Summary statistics (minimum, 25th, 50th, 75th, maximum) of the proportion of pathway signal that was taxonomically un-assigned, relative to all classified pathways (Italian training cohort). 'Unclassified pathways': MetaCyc pathway abundances that HUMAnN could not attribute to specific taxa (rows labeled '|unclassified' in the stratified output)"
)
category | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified_within_taxon-assigned | 0.0357561 | 0.0851309 | 0.1141805 | 0.1542638 | 0.4046326 |
8.2.3 Validation cohort
Open code
<- read.delim(
Paths_validation "gitignore/data/pathw/Pathway_abundance_MetaCyc_Validation_humann.tsv",
header = TRUE, sep = "\t", check.names = FALSE
%>%
) filter(
!grepl("UNMAPPED", `# Pathway`),
!grepl("UNINTEGRATED", `# Pathway`)
)
<- Paths_validation %>% filter(!grepl("\\|", `# Pathway`))
unstrat dim(unstrat)
## [1] 578 104
1:10, 1:3]
unstrat[## # Pathway
## 1 12DICHLORETHDEG-PWY: 1,2-dichloroethane degradation
## 2 1CMET2-PWY: folate transformations III (E. coli)
## 3 3-HYDROXYPHENYLACETATE-DEGRADATION-PWY: 4-hydroxyphenylacetate degradation
## 4 4-HYDROXYMANDELATE-DEGRADATION-PWY: 4-hydroxymandelate degradation
## 5 AEROBACTINSYN-PWY: aerobactin biosynthesis
## 6 ALLANTOINDEG-PWY: superpathway of allantoin degradation in yeast
## 7 ANAEROFRUCAT-PWY: homolactic fermentation
## 8 ANAGLYCOLYSIS-PWY: glycolysis III (from glucose)
## 9 ARG+POLYAMINE-SYN: superpathway of arginine and polyamine biosynthesis
## 10 ARGDEG-PWY: superpathway of L-arginine, putrescine, and 4-aminobutanoate degradation
## 4 5
## 1 0.0000 0.0000
## 2 6827.5671 6081.1138
## 3 0.0000 0.0000
## 4 0.0000 0.0000
## 5 0.0000 0.0000
## 6 0.0000 0.0000
## 7 2021.6986 4361.0989
## 8 8939.6788 8361.6527
## 9 796.0169 316.3769
## 10 0.0000 0.0000
<- Paths_validation %>% filter(grepl("\\|", `# Pathway`))
strat
<- strat
stratVAL
dim(strat)
## [1] 20794 104
1:10, 1:3]
strat[## # Pathway
## 1 1CMET2-PWY: folate transformations III (E. coli)|g__Akkermansia.s__Akkermansia_muciniphila
## 2 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_finegoldii
## 3 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_onderdonkii
## 4 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_putredinis
## 5 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_sp_CAG_268
## 6 1CMET2-PWY: folate transformations III (E. coli)|g__Alistipes.s__Alistipes_timonensis
## 7 1CMET2-PWY: folate transformations III (E. coli)|g__Anaerostipes.s__Anaerostipes_hadrus
## 8 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_caccae
## 9 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_cellulosilyticus
## 10 1CMET2-PWY: folate transformations III (E. coli)|g__Bacteroides.s__Bacteroides_clarus
## 4 5
## 1 30.77154 0.00000
## 2 0.00000 0.00000
## 3 0.00000 0.00000
## 4 73.75813 204.65122
## 5 0.00000 0.00000
## 6 0.00000 0.00000
## 7 54.62160 37.22825
## 8 0.00000 0.00000
## 9 0.00000 0.00000
## 10 0.00000 0.00000
<- strat %>%
strat_unclassified filter(grepl("\\|unclassified", `# Pathway`))
dim(strat_unclassified)
## [1] 425 104
1:10, 1:3]
strat_unclassified[## # Pathway
## 1 1CMET2-PWY: folate transformations III (E. coli)|unclassified
## 2 AEROBACTINSYN-PWY: aerobactin biosynthesis|unclassified
## 3 ALLANTOINDEG-PWY: superpathway of allantoin degradation in yeast|unclassified
## 4 ANAEROFRUCAT-PWY: homolactic fermentation|unclassified
## 5 ANAGLYCOLYSIS-PWY: glycolysis III (from glucose)|unclassified
## 6 ARG+POLYAMINE-SYN: superpathway of arginine and polyamine biosynthesis|unclassified
## 7 ARGDEG-PWY: superpathway of L-arginine, putrescine, and 4-aminobutanoate degradation|unclassified
## 8 ARGININE-SYN4-PWY: L-ornithine biosynthesis II|unclassified
## 9 ARGSYN-PWY: L-arginine biosynthesis I (via L-ornithine)|unclassified
## 10 ARGSYNBSUB-PWY: L-arginine biosynthesis II (acetyl cycle)|unclassified
## 4 5
## 1 822.03841 701.9697
## 2 0.00000 0.0000
## 3 0.00000 0.0000
## 4 179.78691 259.8946
## 5 845.21715 793.1012
## 6 0.00000 0.0000
## 7 0.00000 0.0000
## 8 89.80743 0.0000
## 9 1308.48866 1227.9350
## 10 1658.82435 1324.9933
<- quantile(
quants colSums(strat_unclassified[, -1]) / colSums(unstrat[, -1]),
probs = c(0, 0.25, 0.5, 0.75, 1)
)
<- as.data.frame(t(quants))
quants_df names(quants_df) <- c("0%", "25%", "50%", "75%", "100%")
$category <- "unclassified_within_taxon-assigned"
quants_df
::kable(
kableExtra%>% select(category, everything()),
quants_df caption = "Summary statistics (minimum, 25th, 50th, 75th, maximum) of the proportion of pathway signal that was taxonomically un-assigned, relative to all classified pathways (Czech validation cohort). 'Unclassified pathways': MetaCyc pathway abundances that HUMAnN could not attribute to specific taxa (rows labeled '|unclassified' in the stratified output)"
)
category | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified_within_taxon-assigned | 0.0259304 | 0.0801659 | 0.1083844 | 0.1322093 | 0.2306524 |
8.3 Taxa contribution to diet-sensitive pathways
8.3.1 PWY-5367: petroselinate biosynthesis
8.3.1.1 Czech training cohort
Open code
<- stratCZ %>%
res filter(grepl("PWY-5367", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-5367: petroselinate biosynthesis` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
g__Blautia.s__Ruminococcus_torques | 0 | 0.1836 | 0.9822 | 1.0000 | 1.0000 |
g__Escherichia.s__Escherichia_coli | 0 | 0.0000 | 0.0000 | 0.0596 | 1.0000 |
g__Citrobacter.s__Citrobacter_freundii | 0 | 0.0000 | 0.0000 | 0.0000 | 1.0000 |
unclassified | 0 | 0.0000 | 0.0000 | 0.0000 | 1.0000 |
g__Enterococcus.s__Enterococcus_faecalis | 0 | 0.0000 | 0.0000 | 0.0000 | 0.3800 |
g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0.0000 | 0.0000 | 0.0000 | 0.1882 |
g__Enterobacter.s__Enterobacter_roggenkampii | 0 | 0.0000 | 0.0000 | 0.0000 | 0.1190 |
g__Streptococcus.s__Streptococcus_pneumoniae | 0 | 0.0000 | 0.0000 | 0.0000 | 0.1078 |
g__Escherichia.s__Escherichia_fergusonii | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0843 |
g__Escherichia.s__Escherichia_marmotae | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0726 |
8.3.1.2 Italian training cohort
Open code
<- stratIT %>%
res filter(grepl("PWY-5367", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-5367: petroselinate biosynthesis` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
g__Escherichia.s__Escherichia_coli | 0 | 0 | 0.6316 | 1 | 1.0000 |
g__Blautia.s__Ruminococcus_torques | 0 | 0 | 0.0000 | 0 | 1.0000 |
g__Escherichia.s__Escherichia_marmotae | 0 | 0 | 0.0000 | 0 | 1.0000 |
g__Leclercia.s__Leclercia_adecarboxylata | 0 | 0 | 0.0000 | 0 | 1.0000 |
unclassified | 0 | 0 | 0.0000 | 0 | 1.0000 |
g__Enterobacter.s__Enterobacter_roggenkampii | 0 | 0 | 0.0000 | 0 | 0.9279 |
g__Enterobacter.s__Enterobacter_mori | 0 | 0 | 0.0000 | 0 | 0.7381 |
g__Citrobacter.s__Citrobacter_freundii | 0 | 0 | 0.0000 | 0 | 0.7378 |
g__Citrobacter.s__Citrobacter_braakii | 0 | 0 | 0.0000 | 0 | 0.4592 |
g__Klebsiella.s__Klebsiella_oxytoca | 0 | 0 | 0.0000 | 0 | 0.4285 |
8.3.1.3 Czech validation cohort
Open code
<- stratVAL %>%
res filter(grepl("PWY-5367", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-5367: petroselinate biosynthesis` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
g__Blautia.s__Ruminococcus_torques | 0 | 1 | 1 | 1 | 1.0000 |
unclassified | 0 | 0 | 0 | 0 | 1.0000 |
g__Enterococcus.s__Enterococcus_hirae | 0 | 0 | 0 | 0 | 0.8935 |
g__Enterobacter.s__Enterobacter_bugandensis | 0 | 0 | 0 | 0 | 0.8379 |
g__Escherichia.s__Escherichia_coli | 0 | 0 | 0 | 0 | 0.6883 |
g__Enterococcus.s__Enterococcus_faecium | 0 | 0 | 0 | 0 | 0.5535 |
g__Enterococcus.s__Enterococcus_durans | 0 | 0 | 0 | 0 | 0.4465 |
g__Klebsiella.s__Klebsiella_oxytoca | 0 | 0 | 0 | 0 | 0.3625 |
g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0 | 0 | 0.3454 |
g__Escherichia.s__Escherichia_fergusonii | 0 | 0 | 0 | 0 | 0.3117 |
8.3.2 NONOXIPENT-PWY: pentose phosphate pathway (non-oxidative branch) I
8.3.2.1 Czech training cohort
Open code
<- stratCZ %>%
res filter(grepl("NONOXIPENT-PWY", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `NONOXIPENT-PWY: pentose phosphate pathway (non-oxidative branch) I` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 0.0535 | 0.1265 | 0.2137 | 0.2827 | 0.6020 |
g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0125 | 0.0417 | 0.1063 | 0.2775 |
g__Blautia.s__Ruminococcus_torques | 0.0000 | 0.0131 | 0.0404 | 0.0732 | 0.1831 |
g__Eubacterium.s__Eubacterium_eligens | 0.0000 | 0.0000 | 0.0299 | 0.0727 | 0.4486 |
g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0123 | 0.0266 | 0.0518 | 0.1936 |
g__Anaerostipes.s__Anaerostipes_hadrus | 0.0000 | 0.0057 | 0.0208 | 0.0565 | 0.3011 |
g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0000 | 0.0160 | 0.0556 | 0.5527 |
g__Eubacterium.s__Eubacterium_hallii | 0.0000 | 0.0015 | 0.0136 | 0.0295 | 0.0907 |
g__Parabacteroides.s__Parabacteroides_distasonis | 0.0000 | 0.0000 | 0.0109 | 0.0300 | 0.2507 |
g__Collinsella.s__Collinsella_aerofaciens | 0.0000 | 0.0000 | 0.0108 | 0.0306 | 0.1724 |
8.3.2.2 Italian training cohort
Open code
<- stratIT %>%
res filter(grepl("NONOXIPENT-PWY", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `NONOXIPENT-PWY: pentose phosphate pathway (non-oxidative branch) I` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 0.0061 | 0.2064 | 0.3930 | 0.5533 | 0.8199 |
g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0024 | 0.0196 | 0.0485 | 0.4152 |
g__Parabacteroides.s__Parabacteroides_distasonis | 0.0000 | 0.0018 | 0.0169 | 0.0475 | 0.1819 |
g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0020 | 0.0167 | 0.0388 | 0.4557 |
g__Collinsella.s__Collinsella_aerofaciens | 0.0000 | 0.0007 | 0.0159 | 0.0401 | 0.3181 |
g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0000 | 0.0070 | 0.0316 | 0.2220 |
g__Flavonifractor.s__Flavonifractor_plautii | 0.0000 | 0.0000 | 0.0031 | 0.0200 | 0.0997 |
g__Agathobaculum.s__Agathobaculum_butyriciproducens | 0.0000 | 0.0000 | 0.0020 | 0.0121 | 0.0645 |
g__Escherichia.s__Escherichia_coli | 0.0000 | 0.0000 | 0.0000 | 0.0717 | 0.7774 |
g__Roseburia.s__Roseburia_hominis | 0.0000 | 0.0000 | 0.0000 | 0.0121 | 0.3341 |
8.3.2.3 Czech validation cohort
Open code
<- stratVAL %>%
res filter(grepl("NONOXIPENT-PWY", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `NONOXIPENT-PWY: pentose phosphate pathway (non-oxidative branch) I` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 0.0353 | 0.1040 | 0.1365 | 0.2026 | 0.4451 |
g__Blautia.s__Ruminococcus_torques | 0.0000 | 0.0403 | 0.0730 | 0.0974 | 0.2555 |
g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0330 | 0.0591 | 0.1089 | 0.2600 |
g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0047 | 0.0484 | 0.1113 | 0.3453 |
g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0196 | 0.0439 | 0.1011 | 0.3910 |
g__Collinsella.s__Collinsella_aerofaciens | 0.0000 | 0.0264 | 0.0424 | 0.0685 | 0.2160 |
g__Eubacterium.s__Eubacterium_hallii | 0.0000 | 0.0257 | 0.0365 | 0.0512 | 0.2285 |
g__Blautia.s__Blautia_obeum | 0.0003 | 0.0218 | 0.0343 | 0.0457 | 0.1634 |
g__Anaerostipes.s__Anaerostipes_hadrus | 0.0000 | 0.0198 | 0.0334 | 0.0540 | 0.2550 |
g__Blautia.s__Blautia_wexlerae | 0.0000 | 0.0114 | 0.0210 | 0.0359 | 0.1837 |
8.3.3 TRPSYN-PWY: L-tryptophan biosynthesis
8.3.3.1 Czech training cohort
Open code
<- stratCZ %>%
res filter(grepl("TRPSYN-PWY", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `TRPSYN-PWY: L-tryptophan biosynthesis` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 0.0171 | 0.0970 | 0.1647 | 0.2772 | 0.5479 |
g__Bacteroides.s__Bacteroides_vulgatus | 0.0000 | 0.0261 | 0.0950 | 0.2094 | 0.7458 |
g__Eubacterium.s__Eubacterium_eligens | 0.0000 | 0.0000 | 0.0312 | 0.0656 | 0.2780 |
g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0040 | 0.0247 | 0.0743 | 0.3175 |
g__Blautia.s__Ruminococcus_torques | 0.0000 | 0.0030 | 0.0209 | 0.0405 | 0.1779 |
g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0072 | 0.0179 | 0.0310 | 0.1868 |
g__Anaerostipes.s__Anaerostipes_hadrus | 0.0000 | 0.0000 | 0.0130 | 0.0348 | 0.2000 |
g__Ruminococcus.s__Ruminococcus_bromii | 0.0000 | 0.0000 | 0.0115 | 0.0505 | 0.3164 |
g__Blautia.s__Blautia_obeum | 0.0000 | 0.0030 | 0.0098 | 0.0234 | 0.2721 |
g__Firmicutes_unclassified.s__Firmicutes_bacterium_CAG_41 | 0.0000 | 0.0000 | 0.0095 | 0.0173 | 0.0639 |
8.3.3.2 Italian training cohort
Open code
<- stratIT %>%
res filter(grepl("TRPSYN-PWY", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `TRPSYN-PWY: L-tryptophan biosynthesis` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 0 | 0.1310 | 0.2364 | 0.3691 | 0.8064 |
g__Bacteroides.s__Bacteroides_vulgatus | 0 | 0.0213 | 0.0920 | 0.2311 | 0.7756 |
g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0 | 0.0000 | 0.0142 | 0.0382 | 0.2364 |
g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0 | 0.0000 | 0.0128 | 0.0387 | 0.4200 |
g__Lawsonibacter.s__Lawsonibacter_asaccharolyticus | 0 | 0.0000 | 0.0072 | 0.0258 | 0.4782 |
g__Firmicutes_unclassified.s__Firmicutes_bacterium_CAG_65_45_313 | 0 | 0.0000 | 0.0000 | 0.0340 | 0.4960 |
g__Faecalibacterium.s__Faecalibacterium_prausnitzii | 0 | 0.0000 | 0.0000 | 0.0207 | 0.4824 |
g__Roseburia.s__Roseburia_hominis | 0 | 0.0000 | 0.0000 | 0.0188 | 0.1981 |
g__Ruminococcus.s__Ruminococcus_bromii | 0 | 0.0000 | 0.0000 | 0.0187 | 0.0688 |
g__Eubacterium.s__Eubacterium_eligens | 0 | 0.0000 | 0.0000 | 0.0179 | 0.0849 |
8.3.3.3 Czech validation cohort
Open code
<- stratVAL %>%
res filter(grepl("TRPSYN-PWY", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `TRPSYN-PWY: L-tryptophan biosynthesis` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 0.0166 | 0.0867 | 0.1270 | 0.2093 | 0.4686 |
g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0276 | 0.0511 | 0.0903 | 0.2820 |
g__Blautia.s__Blautia_obeum | 0.0031 | 0.0310 | 0.0460 | 0.0701 | 0.2020 |
g__Blautia.s__Ruminococcus_torques | 0.0000 | 0.0291 | 0.0455 | 0.0712 | 0.1757 |
g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0195 | 0.0408 | 0.0849 | 0.4596 |
g__Bacteroides.s__Bacteroides_vulgatus | 0.0000 | 0.0135 | 0.0376 | 0.0684 | 0.2935 |
g__Blautia.s__Blautia_wexlerae | 0.0018 | 0.0208 | 0.0305 | 0.0449 | 0.0889 |
g__Anaerostipes.s__Anaerostipes_hadrus | 0.0000 | 0.0179 | 0.0293 | 0.0459 | 0.2038 |
g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0000 | 0.0289 | 0.0737 | 0.2440 |
g__Ruminococcus.s__Ruminococcus_bromii | 0.0000 | 0.0000 | 0.0204 | 0.1113 | 0.3487 |
8.3.4 PWY-8178: pentose phosphate pathway (non-oxidative branch) II
8.3.4.1 Czech training cohort
Open code
<- stratCZ %>%
res filter(grepl("PWY-8178", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-8178: pentose phosphate pathway (non-oxidative branch) II` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 0.0599 | 0.1435 | 0.2467 | 0.3486 | 0.6512 |
g__Blautia.s__Ruminococcus_torques | 0.0000 | 0.0119 | 0.0459 | 0.0664 | 0.1660 |
g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0110 | 0.0388 | 0.0930 | 0.2692 |
g__Eubacterium.s__Eubacterium_eligens | 0.0000 | 0.0044 | 0.0382 | 0.0794 | 0.4492 |
g__Anaerostipes.s__Anaerostipes_hadrus | 0.0000 | 0.0078 | 0.0237 | 0.0634 | 0.3114 |
g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0089 | 0.0205 | 0.0428 | 0.1478 |
g__Roseburia.s__Roseburia_inulinivorans | 0.0000 | 0.0000 | 0.0138 | 0.0365 | 0.2216 |
g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0000 | 0.0137 | 0.0492 | 0.5145 |
g__Blautia.s__Blautia_wexlerae | 0.0000 | 0.0056 | 0.0136 | 0.0266 | 0.2900 |
g__Eubacterium.s__Eubacterium_hallii | 0.0000 | 0.0000 | 0.0135 | 0.0278 | 0.0714 |
8.3.4.2 Italian training cohort
Open code
<- stratIT %>%
res filter(grepl("PWY-8178", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-8178: pentose phosphate pathway (non-oxidative branch) II` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 0.0088 | 0.2572 | 0.4697 | 0.6587 | 0.9114 |
g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0010 | 0.0133 | 0.0367 | 0.3171 |
g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0014 | 0.0132 | 0.0455 | 0.3451 |
g__Parabacteroides.s__Parabacteroides_distasonis | 0.0000 | 0.0000 | 0.0110 | 0.0337 | 0.2201 |
g__Collinsella.s__Collinsella_aerofaciens | 0.0000 | 0.0000 | 0.0088 | 0.0293 | 0.2724 |
g__Flavonifractor.s__Flavonifractor_plautii | 0.0000 | 0.0000 | 0.0052 | 0.0265 | 0.0888 |
g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0000 | 0.0048 | 0.0231 | 0.1822 |
g__Agathobaculum.s__Agathobaculum_butyriciproducens | 0.0000 | 0.0000 | 0.0028 | 0.0131 | 0.0538 |
g__Roseburia.s__Roseburia_hominis | 0.0000 | 0.0000 | 0.0027 | 0.0122 | 0.2948 |
g__Escherichia.s__Escherichia_coli | 0.0000 | 0.0000 | 0.0000 | 0.0374 | 0.7374 |
8.3.4.3 Czech validation cohort
Open code
<- stratVAL %>%
res filter(grepl("PWY-8178", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-8178: pentose phosphate pathway (non-oxidative branch) II` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 0.0413 | 0.1185 | 0.1669 | 0.2232 | 0.4912 |
g__Blautia.s__Ruminococcus_torques | 0.0000 | 0.0426 | 0.0673 | 0.0986 | 0.2491 |
g__Fusicatenibacter.s__Fusicatenibacter_saccharivorans | 0.0000 | 0.0291 | 0.0509 | 0.0966 | 0.2282 |
g__Roseburia.s__Roseburia_faecis | 0.0000 | 0.0019 | 0.0444 | 0.0978 | 0.3274 |
g__Lachnospiraceae_unclassified.s__Eubacterium_rectale | 0.0000 | 0.0178 | 0.0397 | 0.0863 | 0.3465 |
g__Blautia.s__Blautia_wexlerae | 0.0090 | 0.0227 | 0.0375 | 0.0535 | 0.2396 |
g__Anaerostipes.s__Anaerostipes_hadrus | 0.0000 | 0.0234 | 0.0360 | 0.0635 | 0.2427 |
g__Collinsella.s__Collinsella_aerofaciens | 0.0000 | 0.0210 | 0.0339 | 0.0548 | 0.1649 |
g__Blautia.s__Blautia_obeum | 0.0004 | 0.0213 | 0.0330 | 0.0462 | 0.1463 |
g__Eubacterium.s__Eubacterium_hallii | 0.0000 | 0.0222 | 0.0326 | 0.0457 | 0.2198 |
8.3.5 PWY-801: homocysteine and cysteine interconversion
8.3.5.1 Czech training cohort
Open code
<- stratCZ %>%
res filter(grepl("PWY-801", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-801: homocysteine and cysteine interconversion` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 0 | 1 | 1 | 1 | 1.0000 |
g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0 | 0 | 1.0000 |
g__Klebsiella.s__Klebsiella_variicola | 0 | 0 | 0 | 0 | 0.7376 |
8.3.5.2 Italian training cohort
Open code
<- stratIT %>%
res filter(grepl("PWY-801", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-801: homocysteine and cysteine interconversion` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 0 | 1 | 1 | 1 | 1.0000 |
g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0 | 0 | 1.0000 |
g__Escherichia.s__Escherichia_coli | 0 | 0 | 0 | 0 | 0.2929 |
g__Klebsiella.s__Klebsiella_oxytoca | 0 | 0 | 0 | 0 | 0.0383 |
8.3.5.3 Czech validation cohort
Open code
<- stratVAL %>%
res filter(grepl("PWY-801", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-801: homocysteine and cysteine interconversion` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 0 | 1 | 1 | 1 | 1.0000 |
g__Enterococcus.s__Enterococcus_gallinarum | 0 | 0 | 0 | 0 | 0.7631 |
g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0 | 0 | 0.2465 |
g__Enterococcus.s__Enterococcus_saccharolyticus | 0 | 0 | 0 | 0 | 0.2369 |
8.3.6 PWY-6293: superpathway of L-cysteine biosynthesis (fungi)
8.3.6.1 Czech training cohort
Open code
<- stratCZ %>%
res filter(grepl("PWY-6293", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-6293: superpathway of L-cysteine biosynthesis (fungi)` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 1 | 1 | 1 | 1 | 1 |
8.3.6.2 Italian training cohort
Open code
<- stratIT %>%
res filter(grepl("PWY-6293", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-6293: superpathway of L-cysteine biosynthesis (fungi)` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 1 | 1 | 1 | 1 | 1 |
8.3.6.3 Czech validation cohort
Open code
<- stratVAL %>%
res filter(grepl("PWY-6293", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-6293: superpathway of L-cysteine biosynthesis (fungi)` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
unclassified | 1 | 1 | 1 | 1 | 1 |
8.3.7 PWY-6284: superpathway of unsaturated fatty acids biosynthesis (E. coli)
8.3.7.1 Czech training cohort
Open code
<- stratCZ %>%
res filter(grepl("PWY-6284", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-6284: superpathway of unsaturated fatty acids biosynthesis (E. coli)` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
g__Escherichia.s__Escherichia_coli | 0 | 0 | 0.583 | 1 | 1.0000 |
unclassified | 0 | 0 | 0.000 | 1 | 1.0000 |
g__Enterococcus.s__Enterococcus_faecalis | 0 | 0 | 0.000 | 0 | 1.0000 |
g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0.000 | 0 | 0.6643 |
g__Escherichia.s__Escherichia_marmotae | 0 | 0 | 0.000 | 0 | 0.0502 |
8.3.7.2 Italian training cohort
Open code
<- stratIT %>%
res filter(grepl("PWY-6284", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-6284: superpathway of unsaturated fatty acids biosynthesis (E. coli)` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
g__Escherichia.s__Escherichia_coli | 0 | 0 | 1 | 1 | 1.0000 |
g__Citrobacter.s__Citrobacter_braakii | 0 | 0 | 0 | 0 | 1.0000 |
g__Klebsiella.s__Klebsiella_oxytoca | 0 | 0 | 0 | 0 | 1.0000 |
g__Leclercia.s__Leclercia_adecarboxylata | 0 | 0 | 0 | 0 | 1.0000 |
unclassified | 0 | 0 | 0 | 0 | 1.0000 |
g__Citrobacter.s__Citrobacter_freundii | 0 | 0 | 0 | 0 | 0.9148 |
g__Enterobacter.s__Enterobacter_mori | 0 | 0 | 0 | 0 | 0.6450 |
g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0 | 0 | 0.4175 |
g__Kosakonia.s__Kosakonia_sacchari | 0 | 0 | 0 | 0 | 0.2467 |
g__Escherichia.s__Escherichia_fergusonii | 0 | 0 | 0 | 0 | 0.0186 |
8.3.7.3 Czech validation cohort
Open code
<- stratVAL %>%
res filter(grepl("PWY-6284", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-6284: superpathway of unsaturated fatty acids biosynthesis (E. coli)` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
g__Escherichia.s__Escherichia_coli | 0 | 0 | 0.2299 | 1.0 | 1.0000 |
unclassified | 0 | 0 | 0.0000 | 0.5 | 1.0000 |
g__Streptococcus.s__Streptococcus_oralis | 0 | 0 | 0.0000 | 0.0 | 1.0000 |
g__Enterobacter.s__Enterobacter_bugandensis | 0 | 0 | 0.0000 | 0.0 | 0.7701 |
g__Enterococcus.s__Enterococcus_faecium | 0 | 0 | 0.0000 | 0.0 | 0.5860 |
g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0.0000 | 0.0 | 0.4828 |
g__Enterococcus.s__Enterococcus_durans | 0 | 0 | 0.0000 | 0.0 | 0.4140 |
8.3.8 PWY-5367: petroselinate biosynthesis
8.3.8.1 Czech training cohort
Open code
<- stratCZ %>%
res filter(grepl("PWY-5367", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-5367: petroselinate biosynthesis` in the Czech training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
g__Blautia.s__Ruminococcus_torques | 0 | 0.1836 | 0.9822 | 1.0000 | 1.0000 |
g__Escherichia.s__Escherichia_coli | 0 | 0.0000 | 0.0000 | 0.0596 | 1.0000 |
g__Citrobacter.s__Citrobacter_freundii | 0 | 0.0000 | 0.0000 | 0.0000 | 1.0000 |
unclassified | 0 | 0.0000 | 0.0000 | 0.0000 | 1.0000 |
g__Enterococcus.s__Enterococcus_faecalis | 0 | 0.0000 | 0.0000 | 0.0000 | 0.3800 |
g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0.0000 | 0.0000 | 0.0000 | 0.1882 |
g__Enterobacter.s__Enterobacter_roggenkampii | 0 | 0.0000 | 0.0000 | 0.0000 | 0.1190 |
g__Streptococcus.s__Streptococcus_pneumoniae | 0 | 0.0000 | 0.0000 | 0.0000 | 0.1078 |
g__Escherichia.s__Escherichia_fergusonii | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0843 |
g__Escherichia.s__Escherichia_marmotae | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0726 |
8.3.8.2 Italian training cohort
Open code
<- stratIT %>%
res filter(grepl("PWY-5367", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-5367: petroselinate biosynthesis` in the Italian training cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
g__Escherichia.s__Escherichia_coli | 0 | 0 | 0.6316 | 1 | 1.0000 |
g__Blautia.s__Ruminococcus_torques | 0 | 0 | 0.0000 | 0 | 1.0000 |
g__Escherichia.s__Escherichia_marmotae | 0 | 0 | 0.0000 | 0 | 1.0000 |
g__Leclercia.s__Leclercia_adecarboxylata | 0 | 0 | 0.0000 | 0 | 1.0000 |
unclassified | 0 | 0 | 0.0000 | 0 | 1.0000 |
g__Enterobacter.s__Enterobacter_roggenkampii | 0 | 0 | 0.0000 | 0 | 0.9279 |
g__Enterobacter.s__Enterobacter_mori | 0 | 0 | 0.0000 | 0 | 0.7381 |
g__Citrobacter.s__Citrobacter_freundii | 0 | 0 | 0.0000 | 0 | 0.7378 |
g__Citrobacter.s__Citrobacter_braakii | 0 | 0 | 0.0000 | 0 | 0.4592 |
g__Klebsiella.s__Klebsiella_oxytoca | 0 | 0 | 0.0000 | 0 | 0.4285 |
8.3.8.3 Czech validation cohort
Open code
<- stratVAL %>%
res filter(grepl("PWY-5367", `# Pathway`)) %>%
mutate(`# Pathway` = sub(".*\\|", "", `# Pathway`)) %>%
{<- select(., -`# Pathway`)
mat <- sweep(mat, 2, colSums(mat), "/")
mat_prop <- t(apply(mat_prop, 1, function(row) {
quants quantile(row, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}))cbind(`# Pathway` = .$`# Pathway`, quants)
%>%
} as.data.frame()
colnames(res)[-1] <- c("0%", "25%", "50%", "75%", "100%")
-1] <- lapply(res[, -1], as.numeric)
res[,
::kable(
kableExtra%>%
res arrange(desc(`50%`), desc(`75%`), desc(`100%`)) %>%
filter(`100%` > 0) %>%
head(10) %>%
mutate(across(-`# Pathway`, ~ round(., 4))),
caption = "Top 10 taxa contributing to pathway `PWY-5367: petroselinate biosynthesis` in the Czech validation cohort. Values represent the distribution (0th, 25th, 50th, 75th, 100th percentiles) of each taxon’s proportional contribution to the total pathway abundance across samples. The row 'unclassified' indicates the fraction of the pathway abundance that HUMAnN could not assign to specific taxa"
)
# Pathway | 0% | 25% | 50% | 75% | 100% |
---|---|---|---|---|---|
g__Blautia.s__Ruminococcus_torques | 0 | 1 | 1 | 1 | 1.0000 |
unclassified | 0 | 0 | 0 | 0 | 1.0000 |
g__Enterococcus.s__Enterococcus_hirae | 0 | 0 | 0 | 0 | 0.8935 |
g__Enterobacter.s__Enterobacter_bugandensis | 0 | 0 | 0 | 0 | 0.8379 |
g__Escherichia.s__Escherichia_coli | 0 | 0 | 0 | 0 | 0.6883 |
g__Enterococcus.s__Enterococcus_faecium | 0 | 0 | 0 | 0 | 0.5535 |
g__Enterococcus.s__Enterococcus_durans | 0 | 0 | 0 | 0 | 0.4465 |
g__Klebsiella.s__Klebsiella_oxytoca | 0 | 0 | 0 | 0 | 0.3625 |
g__Klebsiella.s__Klebsiella_pneumoniae | 0 | 0 | 0 | 0 | 0.3454 |
g__Escherichia.s__Escherichia_fergusonii | 0 | 0 | 0 | 0 | 0.3117 |
9 Reproducibility
Open code
sessionInfo()
## R version 4.4.3 (2025-02-28)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=cs_CZ.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=cs_CZ.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=cs_CZ.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Prague
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] vegan_2.6-6.1 lattice_0.22-5 permute_0.9-7
## [4] zCompositions_1.5.0-4 truncnorm_1.0-8 NADA_1.6-1.1
## [7] survival_3.7-0 MicrobiomeStat_1.2 glmnet_4.1-8
## [10] pROC_1.18.0 arm_1.12-2 lme4_1.1-35.5
## [13] Matrix_1.7-0 MASS_7.3-65 car_3.1-2
## [16] carData_3.0-5 emmeans_1.10.4 brms_2.21.0
## [19] Rcpp_1.0.13 rms_6.8-1 Hmisc_5.1-3
## [22] glmmTMB_1.1.9 ggtext_0.1.2 ggdist_3.3.2
## [25] cowplot_1.1.1 ggpubr_0.4.0 sjPlot_2.8.16
## [28] kableExtra_1.4.0 flextable_0.9.6 gtsummary_2.0.2
## [31] compositions_2.0-8 janitor_2.2.0 stringi_1.8.7
## [34] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1
## [37] dplyr_1.1.4 purrr_1.0.4 readr_2.1.5
## [40] tidyr_1.3.1 tibble_3.3.0 ggplot2_3.5.1
## [43] tidyverse_1.3.1 readxl_1.3.1 openxlsx_4.2.8
## [46] RJDBC_0.2-10 rJava_1.0-11 DBI_1.2.3
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.6 matrixStats_1.3.0 httr_1.4.2
## [4] insight_0.20.2 numDeriv_2016.8-1.1 tools_4.4.3
## [7] backports_1.5.0 utf8_1.2.6 sjlabelled_1.2.0
## [10] R6_2.6.1 mgcv_1.9-1 withr_3.0.2
## [13] Brobdingnag_1.2-7 prettyunits_1.2.0 gridExtra_2.3
## [16] bayesm_3.1-6 quantreg_5.98 cli_3.6.5
## [19] textshaping_0.3.6 performance_0.12.2 officer_0.6.6
## [22] sandwich_3.0-1 labeling_0.4.2 mvtnorm_1.1-3
## [25] robustbase_0.93-9 polspline_1.1.25 ggridges_0.5.3
## [28] askpass_1.1 QuickJSR_1.3.1 commonmark_1.9.1
## [31] systemfonts_1.0.4 StanHeaders_2.32.10 foreign_0.8-90
## [34] gfonts_0.2.0 svglite_2.1.3 rstudioapi_0.16.0
## [37] httpcode_0.3.0 generics_0.1.4 shape_1.4.6
## [40] distributional_0.4.0 zip_2.2.0 inline_0.3.19
## [43] loo_2.4.1 abind_1.4-5 lifecycle_1.0.4
## [46] multcomp_1.4-18 yaml_2.3.10 snakecase_0.11.1
## [49] grid_4.4.3 promises_1.2.0.1 crayon_1.5.3
## [52] haven_2.4.3 pillar_1.11.0 knitr_1.50
## [55] statip_0.2.3 boot_1.3-31 estimability_1.5.1
## [58] codetools_0.2-19 glue_1.7.0 V8_4.4.2
## [61] fontLiberation_0.1.0 data.table_1.15.4 vctrs_0.6.5
## [64] cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1
## [67] datawizard_0.12.2 xfun_0.52 mime_0.12
## [70] coda_0.19-4 modeest_2.4.0 timeDate_3043.102
## [73] iterators_1.0.14 statmod_1.4.36 TH.data_1.1-0
## [76] nlme_3.1-168 fontquiver_0.2.1 rstan_2.32.6
## [79] fBasics_4041.97 tensorA_0.36.2.1 TMB_1.9.14
## [82] rpart_4.1.24 colorspace_2.0-2 nnet_7.3-20
## [85] tidyselect_1.2.1 processx_3.8.4 timeSeries_4032.109
## [88] compiler_4.4.3 curl_6.4.0 rvest_1.0.2
## [91] htmlTable_2.4.0 SparseM_1.81 xml2_1.3.3
## [94] fontBitstreamVera_0.1.1 posterior_1.6.0 checkmate_2.3.2
## [97] scales_1.3.0 DEoptimR_1.0-10 callr_3.7.6
## [100] spatial_7.3-15 digest_0.6.37 minqa_1.2.4
## [103] rmarkdown_2.27 htmltools_0.5.8.1 pkgconfig_2.0.3
## [106] base64enc_0.1-3 stabledist_0.7-2 dbplyr_2.1.1
## [109] fastmap_1.2.0 rlang_1.1.6 htmlwidgets_1.6.4
## [112] shiny_1.9.1 farver_2.1.0 zoo_1.8-9
## [115] jsonlite_2.0.0 magrittr_2.0.3 Formula_1.2-4
## [118] bayesplot_1.8.1 munsell_0.5.0 gdtools_0.3.7
## [121] stable_1.1.6 plyr_1.8.6 pkgbuild_1.3.1
## [124] parallel_4.4.3 ggrepel_0.9.5 sjmisc_2.8.10
## [127] ggeffects_1.7.0 splines_4.4.3 gridtext_0.1.5
## [130] hms_1.1.3 sjstats_0.19.0 ps_1.7.7
## [133] uuid_1.0-3 markdown_1.13 ggsignif_0.6.3
## [136] stats4_4.4.3 rmutil_1.1.10 rstantools_2.1.1
## [139] crul_1.5.0 reprex_2.0.1 evaluate_1.0.4
## [142] RcppParallel_5.1.8 modelr_0.1.8 nloptr_2.0.0
## [145] tzdb_0.5.0 foreach_1.5.2 httpuv_1.6.5
## [148] MatrixModels_0.5-3 openssl_1.4.6 clue_0.3-65
## [151] broom_1.0.6 xtable_1.8-4 rstatix_0.7.0
## [154] later_1.3.0 viridisLite_0.4.0 ragg_1.4.0
## [157] lmerTest_3.1-3 cluster_2.1.8.1 timechange_0.3.0
## [160] bridgesampling_1.1-2