Vegan-specific signature implies healthier metabolic profile: findings from diet-related multi-omics observational study based on different European populations
Statistical report for microbiome analysis (SGB level, prevalence > 10% in training dataset)
Authors and affiliations
Anna Ouradova1,*, Giulio Ferrero2,3,*, Miriam Bratova4, Nikola Daskova4, Alena Bohdanecka4,5, Klara Dohnalova6, Marie Heczkova4, Karel Chalupsky6, Maria Kralova7,8, Marek Kuzma9, István Modos4, Filip Tichanek4, Lucie Najmanova9, Barbara Pardini10, Helena Pelantová9, Sonia Tarallo10, Petra Videnska11, Jan Gojda1,#, Alessio Naccarati10,#, Monika Cahova4,#
* These authors have contributed equally to this work and share first authorship
# These authors have contributed equally to this work and share last authorship
1 Department of Internal Medicine, Kralovske Vinohrady University Hospital and Third Faculty of Medicine, Charles University, Prague, Czech Republic
2 Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
3 Department of Computer Science, University of Turin, Turin, Italy
4 Institute for Clinical and Experimental Medicine, Prague, Czech Republic
5 First Faculty of Medicine, Charles University, Prague, Czech Republic
6 Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
7 Ambis University, Department of Economics and Management, Prague, Czech Republic
8 Department of Informatics, Brno University of Technology, Brno, Czech Republic
9 Institute of Microbiology of the Czech Academy of Sciences, Prague, Czech Republic
10 Italian Institute for Genomic Medicine (IIGM), c/o IRCCS Candiolo, Turin, Italy
11 Mendel University, Department of Chemistry and Biochemistry, Brno, Czech Republic
This is a statistical report of the study A vegan diet signature from a multi-omics study on different European populations is related to favorable metabolic outcomes that is currenlty under review
When using this code or data, cite the original publication:
TO BE ADDED
BibTex citation for the original publication:
TO BE ADDED
Original GitHub repository: https://github.com/filip-tichanek/ItCzVegans
Statistical reports can be found on the reports hub.
Data analysis is described in detail in the statistical methods report.
1 Introduction
This project explores potential signatures of a vegan diet across the microbiome, metabolome, and lipidome. We used data from healthy vegan and omnivorous human subjects in two countries (Czech Republic and Italy), with subjects grouped by Country
and Diet
, resulting in four distinct groups.
To assess the generalizability of these findings, we validated our results with an independent cohort from the Czech Republic for external validation.
1.1 Statistical Methods
The statistical modeling approach is described in detail in this report. Briefly, the methods used included:
Multivariate analysis: We conducted multivariate analyses (PERMANOVA, PCA, correlation analyses) to explore the effects of
diet
,country
, and their possible interaction (diet : country
) on the microbiome, lipidome, and metabolome compositions in an integrative manner. This part of the analysis is not available on the GitHub page, but the code will be provided upon request.Linear models: Linear models were applied to estimate the effects of
diet
,country
, and their interaction (diet:country
) on individual lipids, metabolites, bacterial taxa and pathways (“features”). Features that significantly differed between diet groups (based on the estimated average conditional effect of diet across both countries, adjusted for multiple comparisons with FDR < 0.05) were further examined in the independent external validation cohort to assess whether these associations were reproducible. Next, we fit linear models restricted to vegan participants to test whether omics profiles varied with the duration of vegan diet. Fixed-effect predictors were diet duration (per 10 years), country, their interaction, and age (included due to correlation with diet duration).Predictive models (elastic net): We employed elastic net logistic regression (via the
glmnet
R package) to predict vegan status based on metabolome, lipidome, microbiome and pathways data (one model per dataset; four models in total). We considered three combinations of Lasso and Ridge penalties (alpha = 0, 0.2, 0.4). For each alpha, we selected the penalty strength (λ1se) using 10-fold cross-validation. This value corresponds to the most regularized model whose performance was within one standard error of the minimum deviance. The alpha–lambda pair with the lowest deviance was chosen to fit the final model, whose coefficients are reported.
To estimate model performance, we repeated the full modeling procedure (including hyperparameter tuning) 500 times on bootstrap resamples of the training data. In each iteration, the model was trained on the resampled data and evaluated on the out-of-bag subjects (i.e., those not included in the training set in that iteration). The mean, and 2.5th, and 97.5th percentiles of the resulting ROC-AUC values represent the estimated out-of-sample AUC and its 95% confidence interval.
Finally, the final model was applied to an independent validation cohort to generate predicted probabilities of vegan status. These probabilities were then used to assess external discrimination between diet groups (ROC-AUC in the independent validation cohort). The elastic net models were not intended for practical prediction, but to quantify the strength of the signal separating the dietary groups, with its uncertainty, by using all features of a given dataset jointly. It also offered a complementary perspective on which features are most clearly associated with diet
2 Initiation
2.1 Set home directory
Open code
setwd('/home/ticf/secured_data/GitRepo/ticf/478_MOCA_italian')
2.2 Upload initiation file
Open code
source('478_initiation.R')
<- FALSE plotuj
3 Data
3.1 Upload all original data
3.1.1 Training set
3.1.1.1 Connect metadata from lipidom table
Open code
<- read.xlsx('gitignore/data/lipidome_training_cohort_rev.xlsx') %>%
training_metadata select(Sample, Country, Diet, Diet_duration, Age) %>%
mutate(ID = Sample)
3.1.1.2 Connect training data
Open code
<- read.table(
data_microbiome_original_raw 'gitignore/data/0_Data_Metaphlan4_SGB_subset.txt')
colnames(data_microbiome_original_raw) <- data_microbiome_original_raw[1,]
<- data_microbiome_original_raw %>%
data_microbiome_original_raw t() %>%
data.frame()
colnames(data_microbiome_original_raw) <- data_microbiome_original_raw[1,]
<- data_microbiome_original_raw[-1, ] %>%
data_microbiome_original_raw left_join(training_metadata, by = 'ID') %>%
mutate(Data = 'valid', Sample = ID) %>%
select(Sample, Data, Diet, Country, Diet_duration, Age, everything()) %>%
select(-ID)
dim(data_microbiome_original_raw)
## [1] 166 686
3.1.2 Validation set
3.1.2.1 Get metadata from lipidom table
Open code
<- read.xlsx(
data_lipids_validation 'gitignore/data/lipidome_validation_cohort_rev.xlsx'
%>%
) mutate(ID = X1) %>%
select(ID, X2, Diet_duration, Age)
3.1.2.2 Connect validation data
Open code
<- read.table(
data_microbiome_validation_raw 'gitignore/data/0_Data_Metaphlan4_SGB_subset_validation.txt')
colnames(data_microbiome_validation_raw) <- data_microbiome_validation_raw[1,]
<- data_microbiome_validation_raw %>%
data_microbiome_validation_raw t() %>%
data.frame()
colnames(data_microbiome_validation_raw) <- data_microbiome_validation_raw[1,]
<- data_microbiome_validation_raw[-1, ] %>%
data_microbiome_validation_raw mutate(
ID= paste0("K", gsub("\\..*", "", trimws(ID)))) %>%
left_join(data_lipids_validation, by = 'ID') %>%
mutate(Data = 'valid', Sample = ID, Diet = X2) %>%
select(Sample, Data, Diet, Diet_duration, Age, everything()) %>%
select(-ID, -X2)
dim(data_microbiome_validation_raw)
## [1] 103 685
3.1.3 Get center-log transformed value
Open code
set.seed(478)
## select these with >0.1 prevalence in training and > 0 in validation datasets
<- intersect(
names_shared colnames(
-c(1:6)] %>%
data_microbiome_original_raw[, mutate(across(everything(), as.numeric)) %>%
select(where(~ mean(. != 0) > 0.1))
),colnames(
-(1:5)] %>%
data_microbiome_validation_raw[, mutate(across(everything(), as.numeric)) %>%
select(where(~ mean(. != 0) > 0))
)
)
## Training data
<- data_microbiome_original_raw[, c(
metadata "Sample", "Country", "Diet", "Diet_duration", "Age")]
<- data_microbiome_original_raw[, -c(1:6)] %>%
bacteria_d mutate(across(everything(), as.numeric)) %>%
select(where(~ mean(. != 0) > 0.1)) %>%
select(all_of(names_shared))
dim(data_microbiome_original_raw[, -c(1:6)])
## [1] 166 680
dim(bacteria_d)
## [1] 166 655
<- bacteria_d / rowSums(bacteria_d)
bacteria_data dim(bacteria_data)
## [1] 166 655
if(file.exists('gitignore/data_microbiome_SGB10_training_impCLR.csv') == FALSE){
<- lrSVD(bacteria_data,
bacteria_data label = 0,
dl = NULL,
z.warning = 0.9,
z.delete = FALSE,
ncp = 1)
<- clr(bacteria_data)
clr_bacteria_data <- cbind(metadata, clr_bacteria_data)
data_microbiome_original
dim(data_microbiome_original)
write.csv(data_microbiome_original ,
'gitignore/data_microbiome_SGB10_training_impCLR.csv')
saveRDS(data_microbiome_original ,
'gitignore/data_microbiome_SGB10_training_impCLR.Rds')
}
<- readRDS(
data_microbiome_original 'gitignore/data_microbiome_SGB10_training_impCLR.Rds'
)
1:5, 1:10]
data_microbiome_original[## Sample Country Diet Diet_duration Age Alistipes_SGB2313|SGB2313
## 1 T36 CZ VEGAN 3.5 31.64932 -1.727885
## 2 T47 CZ VEGAN 9.0 43.79178 -2.003284
## 3 T55 CZ VEGAN 10.0 29.83014 -2.198744
## 4 T59 CZ VEGAN 8.0 30.24658 -1.172513
## 5 T60 CZ VEGAN 5.0 20.16986 -1.453814
## Bacteroides_stercoris|SGB1830 Alistipes_putredinis|SGB2318
## 1 -2.785048 3.096468
## 2 5.128612 4.807561
## 3 2.865400 4.658582
## 4 -2.229675 5.286802
## 5 6.341534 8.960465
## Candidatus_Cibionibacter_quicibialis|SGB15286 Bacteroides_uniformis|SGB1836
## 1 7.429837 3.509530
## 2 5.399752 7.017465
## 3 4.575829 4.056029
## 4 1.062247 5.309661
## 5 6.657670 8.961035
## Show variances of CLR proportions across samples
<- data_microbiome_original %>%
data_variance rowwise() %>%
mutate(variance = var(c_across(-(Sample:Diet)))) %>%
ungroup() %>%
select(Sample, variance)
## Look at distribution
hist(data_variance$variance)
Open code
## Validation data
<- data_microbiome_validation_raw[, c(
metadata "Sample", "Data", "Diet", "Diet_duration", "Age")]
<- data_microbiome_validation_raw[, -(1:5)] %>%
bacteria_d mutate(across(everything(), as.numeric)) %>%
select(all_of(names_shared))
<- bacteria_d / rowSums(bacteria_d)
bacteria_data min(colSums(bacteria_data))
## [1] 7.748698e-06
if(file.exists('gitignore/data_microbiome_SGB10_validation_impCLR.csv') == FALSE){
<- lrSVD(bacteria_data,
bacteria_data label = 0,
dl = NULL,
z.warning = 0.9,
z.delete = FALSE,
ncp = 1)
<- clr(bacteria_data)
clr_bacteria_data <- cbind(metadata, clr_bacteria_data)
data_microbiome_validation
## Add Diet for K284 which has the diet missing
data_microbiome_validation[which(data_microbiome_validation$Sample == 'K284'), 'Diet'
<- 'VEGAN'
]
$`Wujia_chipingensis|SGB5111`
data_microbiome_validation
# data_microbiome_validation <- data_microbiome_validation %>%
# rename("Wujia_chipingenss|SGB5111" = "Wujia_chipingensis|SGB5111")
write.csv(data_microbiome_validation,
'gitignore/data_microbiome_SGB10_validation_impCLR.csv')
saveRDS(data_microbiome_validation,
'gitignore/data_microbiome_SGB10_validation_impCLR.Rds')
}
<- readRDS(
data_microbiome_validation 'gitignore/data_microbiome_SGB10_validation_impCLR.Rds'
)
1:5, 1:9] %>% data.frame()
data_microbiome_validation[## Sample Data Diet Diet_duration Age Alistipes_SGB2313.SGB2313
## 1 K4 valid VEGAN <NA> 30.98904 1.558108
## 2 K5 valid VEGAN <NA> 30.66575 1.792161
## 3 K7 valid VEGAN <NA> 32.97808 1.996474
## 4 K12 valid VEGAN <NA> 27.89315 2.068428
## 5 K13 valid VEGAN 1 21.53699 1.487370
## Bacteroides_stercoris.SGB1830 Alistipes_putredinis.SGB2318
## 1 3.234670 5.044781
## 2 2.872535 6.346307
## 3 4.617832 6.151537
## 4 4.744165 0.735616
## 5 6.294363 5.927622
## Candidatus_Cibionibacter_quicibialis.SGB15286
## 1 5.303553
## 2 6.421573
## 3 5.902311
## 4 6.398875
## 5 3.932408
dim(data_microbiome_validation)
## [1] 103 660
3.1.4 Merge training and validation dataset
Open code
<- intersect(
common_microbiome colnames(data_microbiome_original),
colnames(data_microbiome_validation))[-c(1:4)]
<- data_microbiome_original %>%
tr1 mutate(Data = if_else(Country == 'CZ', 'CZ_tr', 'IT_tr')) %>%
select(Data, Diet, all_of(common_microbiome))
<- data_microbiome_validation %>%
tr2 mutate(Data = 'valid',
Diet = Diet) %>%
select(Data, Diet, all_of(common_microbiome))
<- bind_rows(tr1, tr2) data_merged
3.2 Explore
3.2.0.1 Distributions - clr transformed
Open code
if (plotuj) {
<- data_microbiome_original %>%
check ::select(-c(Sample:Diet)) %>%
dplyrna.omit()
<- c(8, 7)
size par(mfrow = c(size[1], size[2]))
par(mar = c(2, 1.5, 2, 0.5))
set.seed(16)
<- sample(1:ncol(check), size[1] * size[2], replace = FALSE)
ran for (x in ran) {
hist(check[, x],
16,
col = "blue",
main = paste0(colnames(check)[x])
)
} }
3.2.0.2 Taxon proportions accross groups
Open code
<- c("#329243", "#F9FFAF")
colo
<- na.omit(data_merged)
data_merged
if (plotuj) {
<- common_microbiome[
outcomes sample(
1:length(common_microbiome), 35,
replace = FALSE
)
]
<- function(variable) {
boxplot_cond <- ggboxplot(data_merged,
p x = "Diet",
y = variable,
fill = "Diet",
tip.length = 0.15,
palette = colo,
outlier.shape = 1,
lwd = 0.25,
outlier.size = 0.8,
facet.by = "Data",
title = variable,
ylab = "CLR(taxa proportion)"
+
)
theme(
plot.title = element_text(size = 10),
axis.title = element_text(size = 8),
axis.text.y = element_text(size = 7),
axis.text.x = element_blank(),
axis.title.x = element_blank()
)
return(p)
}
# Plot all outcomes
<- map(outcomes, boxplot_cond)
plots
# Create a matrix of plots
<- ggarrange(plotlist = plots, ncol = 5, nrow = 7, common.legend = TRUE)
plots_arranged
plots_arranged }
4 Linear models across taxa
We will fit a feature-specific linear model where the CLR-transformed count of bacteria reads represents the outcome variable whereas country
(Italy vs Czech), diet
(vegan vs omnivore), and their interaction (country:diet
) all represent fixed-effects predictors. So, each model has the following form
\[ CLR({N_i}) = \alpha + \beta_{1} \times country + \beta_{2} \times diet + \beta_{3} \times country:diet + \epsilon \] where \(N_i\) is read count of \(i\)-th bacteria taxa
The variables were coded as follows: \(diet = -0.5\) for omnivores and \(diet = 0.5\) for vegans; \(country = -0.5\) for the Czech cohort and \(country = 0.5\) for the Italian cohort.
This parameterization allows us to interpret the linear model summary
output as presenting the conditional effects of diet
averaged across both countries and the conditional effects of country
averaged across both diet groups. We will then use the emmeans
package (Lenth 2024) to obtain specific estimates for the effect of diet
in the Italian and Czech cohorts separately, still from a single model.
Taxa that will show a significant diet effect (average effect of diet
across both countries, adjusted for multiple comparisons with FDR < 0.05) will be then visualized using a forest plot, with country-specific diet effect along with diet effect based on independent validation cohort, to evaluate how generalizable these findings are (see external validation section).
Note that p-value for avg effects are the same as produced with car::Anova(model, type = 'III')
.
4.1 Select data
Open code
<- data_microbiome_original %>%
data_analysis na.omit() %>%
::mutate(
dplyrDiet_VEGAN = as.numeric(
::if_else(
dplyr== "VEGAN", 0.5, -0.5
Diet
)
),Country_IT = as.numeric(
::if_else(
dplyr== "IT", 0.5, -0.5
Country
)
)%>%
) ::select(
dplyr
Sample,
Country,
Country_IT,
Diet,
Diet_VEGAN,::everything()
dplyr
)
summary(data_analysis[ , 1:12])
## Sample Country Country_IT Diet
## Length:165 Length:165 Min. :-0.50000 Length:165
## Class :character Class :character 1st Qu.:-0.50000 Class :character
## Mode :character Mode :character Median :-0.50000 Mode :character
## Mean :-0.02727
## 3rd Qu.: 0.50000
## Max. : 0.50000
## Diet_VEGAN Diet_duration Age Alistipes_SGB2313|SGB2313
## Min. :-0.50000 Min. : 0.000 Min. :18.25 Min. :-4.0449
## 1st Qu.:-0.50000 1st Qu.: 0.000 1st Qu.:28.08 1st Qu.:-1.7471
## Median : 0.50000 Median : 3.000 Median :33.06 Median :-1.3158
## Mean : 0.08182 Mean : 3.195 Mean :35.87 Mean :-0.6187
## 3rd Qu.: 0.50000 3rd Qu.: 5.000 3rd Qu.:42.48 3rd Qu.:-0.7346
## Max. : 0.50000 Max. :20.000 Max. :64.10 Max. :10.2106
## Bacteroides_stercoris|SGB1830 Alistipes_putredinis|SGB2318
## Min. :-3.4103 Min. :-4.062
## 1st Qu.:-2.2880 1st Qu.: 4.937
## Median :-0.3814 Median : 7.344
## Mean : 1.7524 Mean : 5.925
## 3rd Qu.: 5.7035 3rd Qu.: 8.330
## Max. :11.5038 Max. :12.221
## Candidatus_Cibionibacter_quicibialis|SGB15286 Bacteroides_uniformis|SGB1836
## Min. :-0.6217 Min. :-2.378
## 1st Qu.: 5.4061 1st Qu.: 5.931
## Median : 6.5803 Median : 7.437
## Mean : 6.2568 Mean : 7.093
## 3rd Qu.: 7.4855 3rd Qu.: 8.472
## Max. :10.4162 Max. :11.715
4.1.1 Define number of microbiome and covariates
Open code
<- 7
n_covarites <- ncol(data_analysis) - n_covarites n_features
4.1.2 Create empty objects
Open code
<- vector('double', n_features)
outcome
<- vector('double', n_features)
est_VGdiet_inCZ <- vector('double', n_features)
est_VGdiet_inIT <- vector('double', n_features)
est_VGdiet_avg
<- vector('double', n_features)
est_ITcountry_avg <- vector('double', n_features)
diet_country_int
<- vector('double', n_features)
P_VGdiet_inCZ <- vector('double', n_features)
P_VGdiet_inIT <- vector('double', n_features)
P_VGdiet_avg
<- vector('double', n_features)
P_ITcountry_avg <- vector('double', n_features)
P_diet_country_int
<- vector('double', n_features)
CI_L_VGdiet_inCZ <- vector('double', n_features)
CI_L_VGdiet_inIT <- vector('double', n_features)
CI_L_VGdiet_avg
<- vector('double', n_features)
CI_U_VGdiet_inCZ <- vector('double', n_features)
CI_U_VGdiet_inIT <- vector('double', n_features) CI_U_VGdiet_avg
4.1.3 Estimate over outcomes
Open code
if(file.exists('gitignore/result_microbiome_SGB10.Rds') == FALSE){
for (i in 1:n_features) {
## define variable
$outcome <- data_analysis[, (i + n_covarites)]
data_analysis
## fit model
<- lm(outcome ~ Country_IT * Diet_VEGAN, data = data_analysis)
model
## get contrast (effects of diet BY COUNTRY)
<- summary(
contrast_emm pairs(
emmeans(
model,specs = ~ Diet_VEGAN | Country_IT
),interaction = TRUE,
adjust = "none"
),infer = c(TRUE, TRUE)
)
## save results
<- names(data_analysis)[i + n_covarites]
outcome[i]
## country effect
<- summary(model)$coefficients[
est_ITcountry_avg[i] which(
names(model$coefficients) == "Country_IT"
1
),
]
<- summary(model)$coefficients[
P_ITcountry_avg[i] which(
names(model$coefficients) == "Country_IT"
4
),
]
## diet effect
<- confint(model)
tr
<- tr[which(row.names(tr) == 'Diet_VEGAN'),][1]
CI_L_VGdiet_avg[i] <- tr[which(row.names(tr) == 'Diet_VEGAN'),][2]
CI_U_VGdiet_avg[i]
<- summary(model)$coefficients[
est_VGdiet_avg[i] which(
names(model$coefficients) == "Diet_VEGAN"
1
),
]
<- summary(model)$coefficients[
P_VGdiet_avg[i] which(
names(model$coefficients) == "Diet_VEGAN"
4
),
]
<- -contrast_emm[1,3]
est_VGdiet_inCZ[i] <- contrast_emm$p.value[1]
P_VGdiet_inCZ[i] <- -contrast_emm$upper.CL[1]
CI_L_VGdiet_inCZ[i] <- -contrast_emm$lower.CL[1]
CI_U_VGdiet_inCZ[i]
<- -contrast_emm[2,3]
est_VGdiet_inIT[i] <- contrast_emm$p.value[2]
P_VGdiet_inIT[i] <- -contrast_emm$upper.CL[2]
CI_L_VGdiet_inIT[i] <- -contrast_emm$lower.CL[2]
CI_U_VGdiet_inIT[i]
## interaction
<- summary(model)$coefficients[
diet_country_int[i] which(
names(model$coefficients) == "Country_IT:Diet_VEGAN"
1
),
]
<- summary(model)$coefficients[
P_diet_country_int[i] which(
names(model$coefficients) == "Country_IT:Diet_VEGAN"
4
),
]
}
<- data.frame(
result_microbiome
outcome,
est_ITcountry_avg, P_ITcountry_avg,
est_VGdiet_avg, P_VGdiet_avg,
est_VGdiet_inCZ, P_VGdiet_inCZ,
est_VGdiet_inIT, P_VGdiet_inIT,
diet_country_int, P_diet_country_int,
CI_L_VGdiet_avg, CI_U_VGdiet_avg,
CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
CI_L_VGdiet_inIT, CI_U_VGdiet_inIT
)
saveRDS(result_microbiome,
'gitignore/result_microbiome_SGB10.Rds')
}
<- readRDS('gitignore/result_microbiome_SGB10.Rds') result_microbiome
4.1.4 Adjust p values
Open code
<- result_microbiome %>%
result_microbiome ::mutate(
dplyrfdr_ITcountry_avg = p.adjust(P_ITcountry_avg, method = 'BH'),
fdr_VGdiet_avg = p.adjust(P_VGdiet_avg, method = 'BH'),
fdr_VGdiet_inCZ = p.adjust(P_VGdiet_inCZ, method = 'BH'),
fdr_VGdiet_inIT = p.adjust(P_VGdiet_inIT, method = 'BH'),
fdr_diet_country_int = p.adjust(P_diet_country_int, method = 'BH')
%>%
) ::select(
dplyr
outcome,
est_ITcountry_avg, P_ITcountry_avg, fdr_ITcountry_avg,
est_VGdiet_avg, P_VGdiet_avg, fdr_VGdiet_avg,
est_VGdiet_inCZ, P_VGdiet_inCZ, fdr_VGdiet_inCZ,
est_VGdiet_inIT, P_VGdiet_inIT, fdr_VGdiet_inIT,
diet_country_int, P_diet_country_int, fdr_diet_country_int,
CI_L_VGdiet_avg, CI_U_VGdiet_avg,
CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
CI_L_VGdiet_inIT, CI_U_VGdiet_inIT )
4.1.5 Show and save results
Open code
::kable(result_microbiome %>% filter(fdr_VGdiet_avg < 0.05),
kableExtracaption = "Results of linear models, modeling the CLR-transformed relative abundance of a given taxon as the outcome, with `Diet`, `Country`, and `Diet x Country` interaction as fixed-effect predictors. Only taxa whose CLR-transformed relative abundance differs between diets (FDR < 0.05, average diet effect across both countries) are shown. `est` prefix: denotes estimated effects (regression coefficients), i.e., how much CLR-transformed relative abundance differs in vegans compared to omnivores and in the Italian cohort vs. the Czech cohort, respectively; `P`: p-value; `fdr`: p-value after adjustment for multiple comparisons; `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval. `avg` suffix denotes effects averaged across subgroups, whereas `inCZ` and `inIT` denote effects in the Czech or Italian cohort, respectively. Interaction effects are reported as `diet_country_int` (difference in the effects of a vegan diet between the Italian and Czech cohorts; positive values indicate a stronger effect in the Italian, negative values a stronger effect in the Czech cohort) and `P_diet_country_int` (its p-value). All estimates in a single row are based on a single model")
outcome | est_ITcountry_avg | P_ITcountry_avg | fdr_ITcountry_avg | est_VGdiet_avg | P_VGdiet_avg | fdr_VGdiet_avg | est_VGdiet_inCZ | P_VGdiet_inCZ | fdr_VGdiet_inCZ | est_VGdiet_inIT | P_VGdiet_inIT | fdr_VGdiet_inIT | diet_country_int | P_diet_country_int | fdr_diet_country_int | CI_L_VGdiet_avg | CI_U_VGdiet_avg | CI_L_VGdiet_inCZ | CI_U_VGdiet_inCZ | CI_L_VGdiet_inIT | CI_U_VGdiet_inIT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GGB1420_SGB1957|SGB1957 | 1.4319289 | 0.0000546 | 0.0003030 | -1.4396187 | 0.0000500 | 0.0022745 | -0.8400132 | 0.0880130 | 0.3116857 | -2.0392242 | 0.0000471 | 0.0077123 | -1.1992110 | 0.0844620 | 0.5703361 | -2.1216742 | -0.7575632 | -1.8064948 | 0.1264684 | -3.0018832 | -1.0765653 |
Escherichia_coli|SGB10068 | 0.2851163 | 0.5382752 | 0.6345904 | -1.7882458 | 0.0001589 | 0.0041626 | -2.0426512 | 0.0021561 | 0.0371445 | -1.5338403 | 0.0199463 | 0.2214377 | 0.5088109 | 0.5828682 | 0.9328661 | -2.7011864 | -0.8753052 | -3.3363000 | -0.7490025 | -2.8223724 | -0.2453082 |
Bacteroides_clarus|SGB1832 | 1.9328621 | 0.0001797 | 0.0008289 | -1.9983796 | 0.0001100 | 0.0031989 | -1.8221069 | 0.0116575 | 0.1004691 | -2.1746522 | 0.0026162 | 0.0795584 | -0.3525453 | 0.7269701 | 0.9805656 | -2.9936247 | -1.0031345 | -3.2323822 | -0.4118317 | -3.5793496 | -0.7699548 |
Hydrogenoanaerobacterium_saccharovorans|SGB15350 | 1.5044079 | 0.0014215 | 0.0049448 | -1.4631658 | 0.0019003 | 0.0214598 | -1.4945391 | 0.0241604 | 0.1536413 | -1.4317926 | 0.0300247 | 0.2731417 | 0.0627465 | 0.9461059 | 0.9892080 | -2.3782786 | -0.5480531 | -2.7912658 | -0.1978123 | -2.7233906 | -0.1401947 |
GGB9775_SGB15395|SGB15395 | 1.6466070 | 0.0000162 | 0.0001063 | -1.1890855 | 0.0015990 | 0.0193953 | -0.8940011 | 0.0904025 | 0.3149663 | -1.4841700 | 0.0051047 | 0.1156461 | -0.5901689 | 0.4267652 | 0.9030772 | -1.9204687 | -0.4577023 | -1.9303806 | 0.1423785 | -2.5164505 | -0.4518895 |
Ruthenibacterium_lactatiformans|SGB15271 | 0.9242175 | 0.0034791 | 0.0105157 | -0.9667478 | 0.0022687 | 0.0244058 | -1.7480579 | 0.0001129 | 0.0050824 | -0.1854377 | 0.6738746 | 0.9552084 | 1.5626201 | 0.0131614 | 0.3918458 | -1.5821555 | -0.3513401 | -2.6200987 | -0.8760171 | -1.0540295 | 0.6831540 |
Lawsonibacter_asaccharolyticus|SGB15154 | 2.7259950 | 0.0000000 | 0.0000000 | -1.5993741 | 0.0000231 | 0.0016836 | -2.0267164 | 0.0001416 | 0.0056449 | -1.1720318 | 0.0249317 | 0.2401510 | 0.8546846 | 0.2457736 | 0.7988807 | -2.3238165 | -0.8749317 | -3.0532608 | -1.0001720 | -2.1945160 | -0.1495476 |
Clostridium_fessum|SGB4705 | -1.6111790 | 0.0002626 | 0.0011545 | -1.3504185 | 0.0020870 | 0.0231696 | -1.5214941 | 0.0138872 | 0.1095920 | -1.1793429 | 0.0546582 | 0.3663651 | 0.3421512 | 0.6924007 | 0.9805656 | -2.2028851 | -0.4979519 | -2.7294503 | -0.3135378 | -2.3825215 | 0.0238357 |
GGB9707_SGB15229|SGB15229 | 0.6838604 | 0.0414042 | 0.0844852 | 0.9496097 | 0.0048726 | 0.0451440 | 0.7241947 | 0.1263881 | 0.3782276 | 1.1750247 | 0.0133173 | 0.1691073 | 0.4508300 | 0.4989489 | 0.9032270 | 0.2927318 | 1.6064876 | -0.2066099 | 1.6549993 | 0.2479016 | 2.1021478 |
Collinsella_aerofaciens|SGB14535 | 0.1541214 | 0.5211416 | 0.6217628 | -0.7191625 | 0.0031260 | 0.0303141 | -0.7026985 | 0.0401537 | 0.2104052 | -0.7356265 | 0.0311310 | 0.2780840 | -0.0329280 | 0.9453232 | 0.9892080 | -1.1925098 | -0.2458152 | -1.3734378 | -0.0319592 | -1.4037130 | -0.0675401 |
GGB9509_SGB14906|SGB14906 | 1.1273207 | 0.0069225 | 0.0186593 | -1.8738306 | 0.0000106 | 0.0011605 | -1.9308514 | 0.0011639 | 0.0245923 | -1.8168099 | 0.0021176 | 0.0762611 | 0.1140415 | 0.8901160 | 0.9882489 | -2.6875900 | -1.0600713 | -3.0839590 | -0.7777438 | -2.9653567 | -0.6682631 |
GGB45432_SGB63101|SGB63101 | 0.7722693 | 0.0046470 | 0.0132918 | -0.8741661 | 0.0014079 | 0.0179357 | -1.0922149 | 0.0047246 | 0.0644715 | -0.6561174 | 0.0859067 | 0.4768551 | 0.4360974 | 0.4188344 | 0.9030772 | -1.4054316 | -0.3429007 | -1.8450249 | -0.3394048 | -1.4059499 | 0.0937151 |
GGB2653_SGB3574|SGB3574 | 0.0669449 | 0.8659025 | 0.9045712 | -1.6310665 | 0.0000603 | 0.0022745 | -2.1040281 | 0.0002450 | 0.0083391 | -1.1581050 | 0.0397597 | 0.3124007 | 0.9459230 | 0.2338711 | 0.7988807 | -2.4127083 | -0.8494248 | -3.2116246 | -0.9964315 | -2.2613208 | -0.0548892 |
Phocea_massiliensis|SGB14837 | 0.5351058 | 0.0683260 | 0.1282337 | -1.1542532 | 0.0001129 | 0.0031989 | -1.2673611 | 0.0025342 | 0.0381311 | -1.0411452 | 0.0123717 | 0.1652544 | 0.2262159 | 0.6985961 | 0.9805656 | -1.7300758 | -0.5784306 | -2.0833093 | -0.4514130 | -1.8538662 | -0.2284243 |
Guopingia_tenuis|SGB14127 | 1.9935885 | 0.0000000 | 0.0000000 | -0.7391309 | 0.0050783 | 0.0455657 | -0.7007529 | 0.0591097 | 0.2547162 | -0.7775089 | 0.0357602 | 0.3041940 | -0.0767560 | 0.8829099 | 0.9866642 | -1.2529025 | -0.2253592 | -1.4287740 | 0.0272682 | -1.5026505 | -0.0523672 |
Lachnospiraceae_bacterium|SGB4953 | -1.0338702 | 0.0219379 | 0.0502425 | 1.7643283 | 0.0001172 | 0.0031989 | 1.7752418 | 0.0056705 | 0.0675301 | 1.7534148 | 0.0060765 | 0.1159554 | -0.0218269 | 0.9805437 | 0.9969953 | 0.8819635 | 2.6466931 | 0.5249193 | 3.0255642 | 0.5080377 | 2.9987920 |
Holdemania_filiformis|SGB4046 | -0.7503859 | 0.0491812 | 0.0967377 | -1.4296652 | 0.0002237 | 0.0056347 | -2.3439700 | 0.0000223 | 0.0014449 | -0.5153605 | 0.3362691 | 0.7701268 | 1.8286095 | 0.0168572 | 0.3996216 | -2.1773279 | -0.6820026 | -3.4034177 | -1.2845222 | -1.5706179 | 0.5398969 |
Anaeromassilibacillus_senegalensis|SGB14894 | 1.0571683 | 0.0084418 | 0.0222063 | -1.4342467 | 0.0003972 | 0.0081294 | -1.7427021 | 0.0022684 | 0.0371445 | -1.1257914 | 0.0458831 | 0.3372370 | 0.6169107 | 0.4376640 | 0.9030772 | -2.2171229 | -0.6513706 | -2.8520479 | -0.6333563 | -2.2307495 | -0.0208333 |
GGB3061_SGB4063|SGB4063 | -0.1784761 | 0.5484410 | 0.6437793 | -1.1724902 | 0.0001164 | 0.0031989 | -2.3914873 | 0.0000001 | 0.0000078 | 0.0465070 | 0.9117339 | 0.9932053 | 2.4379943 | 0.0000635 | 0.0138704 | -1.7585810 | -0.5863994 | -3.2219857 | -1.5609890 | -0.7807066 | 0.8737205 |
Ruminococcus_sp_AF41_9|SGB25497 | -0.7637983 | 0.0561315 | 0.1087755 | 1.6855251 | 0.0000367 | 0.0021874 | 2.5935991 | 0.0000081 | 0.0006349 | 0.7774511 | 0.1672159 | 0.6072159 | -1.8161480 | 0.0234799 | 0.4250209 | 0.9015151 | 2.4695351 | 1.4826467 | 3.7045515 | -0.3291073 | 1.8840094 |
Eubacterium_ramulus|SGB4959 | -2.2415625 | 0.0000001 | 0.0000011 | -1.5009605 | 0.0002730 | 0.0061653 | -1.9091394 | 0.0010384 | 0.0234530 | -1.0927815 | 0.0566446 | 0.3710223 | 0.8163579 | 0.3129926 | 0.8506647 | -2.2973719 | -0.7045490 | -3.0376648 | -0.7806140 | -2.2168434 | 0.0312803 |
GGB3000_SGB3991|SGB3991 | -0.0498718 | 0.8832405 | 0.9182897 | -0.9667842 | 0.0049219 | 0.0451440 | -0.6402335 | 0.1845367 | 0.4586039 | -1.2933349 | 0.0076151 | 0.1246977 | -0.6531014 | 0.3369152 | 0.8525047 | -1.6363300 | -0.2972384 | -1.5889887 | 0.3085217 | -2.2383376 | -0.3483323 |
Youxingia_wuxianensis|SGB82503 | 0.3466622 | 0.3249701 | 0.4335141 | -1.1371719 | 0.0014584 | 0.0180232 | -1.8545785 | 0.0002674 | 0.0083391 | -0.4197654 | 0.3982314 | 0.8100670 | 1.4348131 | 0.0426614 | 0.5291961 | -1.8305594 | -0.4437844 | -2.8371176 | -0.8720393 | -1.3984184 | 0.5588876 |
Blautia_massiliensis|SGB4826 | -3.8343857 | 0.0000000 | 0.0000000 | -1.9262500 | 0.0005584 | 0.0104528 | -1.9864840 | 0.0112976 | 0.0986659 | -1.8660161 | 0.0167667 | 0.1996761 | 0.1204679 | 0.9124537 | 0.9892080 | -3.0064692 | -0.8460308 | -3.5171687 | -0.4557992 | -3.3906467 | -0.3413855 |
GGB38744_SGB14842|SGB14842 | 0.6031028 | 0.0173567 | 0.0411908 | -1.0940310 | 0.0000231 | 0.0016836 | -0.5164323 | 0.1482621 | 0.4159522 | -1.6716296 | 0.0000051 | 0.0017026 | -1.1551972 | 0.0226017 | 0.4250209 | -1.5894790 | -0.5985829 | -1.2184887 | 0.1856240 | -2.3709091 | -0.9723500 |
Dorea_formicigenerans|SGB4575 | -3.7022219 | 0.0000000 | 0.0000000 | -0.9355114 | 0.0027729 | 0.0279419 | -1.3628513 | 0.0021159 | 0.0371445 | -0.5081715 | 0.2439269 | 0.7132685 | 0.8546798 | 0.1670255 | 0.7047430 | -1.5434766 | -0.3275462 | -2.2243460 | -0.5013567 | -1.3662587 | 0.3499158 |
GGB9765_SGB15382|SGB15382 | 1.0673469 | 0.0011491 | 0.0040683 | -1.0948838 | 0.0008613 | 0.0137595 | -0.8888946 | 0.0534306 | 0.2397056 | -1.3008730 | 0.0048154 | 0.1156461 | -0.4119784 | 0.5237822 | 0.9032270 | -1.7315652 | -0.4582025 | -1.7910804 | 0.0132911 | -2.1994905 | -0.4022555 |
GGB9770_SGB15390|SGB15390 | 1.1985452 | 0.0018894 | 0.0062189 | -1.1558196 | 0.0027049 | 0.0276831 | -1.2498933 | 0.0213176 | 0.1410406 | -1.0617458 | 0.0490796 | 0.3494251 | 0.1881474 | 0.8044726 | 0.9863326 | -1.9050092 | -0.4066299 | -2.3115048 | -0.1882818 | -2.1191585 | -0.0043332 |
GGB34900_SGB14891|SGB14891 | 0.2152475 | 0.4304178 | 0.5474246 | -0.9159150 | 0.0009616 | 0.0143146 | -0.9716952 | 0.0127697 | 0.1058753 | -0.8601348 | 0.0265929 | 0.2488338 | 0.1115605 | 0.8379517 | 0.9863326 | -1.4536634 | -0.3781666 | -1.7336918 | -0.2096987 | -1.6191174 | -0.1011521 |
Mediterraneibacter_faecis|SGB4563 | -2.7423588 | 0.0000000 | 0.0000000 | -1.2812291 | 0.0016745 | 0.0197177 | -1.4053831 | 0.0143829 | 0.1121521 | -1.1570751 | 0.0424526 | 0.3196146 | 0.2483080 | 0.7571525 | 0.9839977 | -2.0727745 | -0.4896837 | -2.5270133 | -0.2837529 | -2.2742690 | -0.0398812 |
Ruminococcus_torques|SGB4608 | -2.4739912 | 0.0000004 | 0.0000037 | -3.2736948 | 0.0000000 | 0.0000000 | -4.5057027 | 0.0000000 | 0.0000001 | -2.0416869 | 0.0023286 | 0.0762611 | 2.4640158 | 0.0092320 | 0.3779338 | -4.1969767 | -2.3504129 | -5.8140053 | -3.1974001 | -3.3448149 | -0.7385589 |
Clostridium_sp_AM29_11AC|SGB4701 | -0.4618260 | 0.0573963 | 0.1105723 | -0.8479677 | 0.0005726 | 0.0104528 | -1.1204428 | 0.0012852 | 0.0261310 | -0.5754926 | 0.0929931 | 0.4953435 | 0.5449502 | 0.2604760 | 0.8120367 | -1.3244701 | -0.3714652 | -1.7956530 | -0.4452325 | -1.2480322 | 0.0970471 |
Lachnoclostridium_sp_An138|SGB4774 | 0.6048418 | 0.0372379 | 0.0782846 | -0.8925786 | 0.0022861 | 0.0244058 | -0.5215291 | 0.2030198 | 0.4858377 | -1.2636282 | 0.0022183 | 0.0762611 | -0.7420991 | 0.1993789 | 0.7592625 | -1.4612115 | -0.3239458 | -1.3272892 | 0.2842311 | -2.0662014 | -0.4610550 |
GGB9782_SGB15403|SGB15403 | -0.0300356 | 0.9226094 | 0.9457107 | -1.0147510 | 0.0012421 | 0.0166385 | -1.0541359 | 0.0170846 | 0.1257348 | -0.9753662 | 0.0265480 | 0.2488338 | 0.0787697 | 0.8986354 | 0.9882489 | -1.6243573 | -0.4051447 | -1.9179560 | -0.1903158 | -1.8357697 | -0.1149627 |
Agathobaculum_butyriciproducens|SGB14991 | -0.2565420 | 0.6019700 | 0.6881158 | 1.6523975 | 0.0009534 | 0.0143146 | 1.5020445 | 0.0323041 | 0.1793149 | 1.8027505 | 0.0101347 | 0.1443100 | 0.3007060 | 0.7597842 | 0.9847527 | 0.6829756 | 2.6218193 | 0.1283611 | 2.8757279 | 0.4345002 | 3.1710007 |
Clostridium_SGB48024|SGB48024 | 0.3647124 | 0.4433406 | 0.5595146 | 1.6969981 | 0.0004616 | 0.0091613 | 1.7940744 | 0.0084181 | 0.0810863 | 1.5999218 | 0.0180776 | 0.2077339 | -0.1941526 | 0.8381881 | 0.9863326 | 0.7597490 | 2.6342472 | 0.4659802 | 3.1221686 | 0.2770804 | 2.9227631 |
GGB9301_SGB14262|SGB14262 | -0.6057143 | 0.0087942 | 0.0230408 | -0.7214969 | 0.0018892 | 0.0214598 | -1.0116558 | 0.0021015 | 0.0371445 | -0.4313379 | 0.1827167 | 0.6245987 | 0.5803179 | 0.2057198 | 0.7662822 | -1.1724873 | -0.2705064 | -1.6507152 | -0.3725965 | -1.0678696 | 0.2051938 |
GGB9531_SGB14932|SGB14932 | 0.8221633 | 0.0125130 | 0.0315231 | -1.2192043 | 0.0002504 | 0.0058583 | -1.1578934 | 0.0130516 | 0.1067372 | -1.2805152 | 0.0059578 | 0.1159554 | -0.1226219 | 0.8508408 | 0.9863326 | -1.8620416 | -0.5763670 | -2.0688022 | -0.2469845 | -2.1878213 | -0.3732092 |
Oscillibacter_valericigenes|SGB15076 | 2.2580757 | 0.0000000 | 0.0000003 | -1.5899899 | 0.0000625 | 0.0022745 | -0.6063213 | 0.2701780 | 0.5556680 | -2.5736585 | 0.0000052 | 0.0017026 | -1.9673372 | 0.0119123 | 0.3918458 | -2.3536818 | -0.8262980 | -1.6884827 | 0.4758401 | -3.6515397 | -1.4957772 |
GGB9781_SGB66170|SGB66170 | 0.6273651 | 0.0125706 | 0.0315469 | -0.8169176 | 0.0012447 | 0.0166385 | -0.2531851 | 0.4732767 | 0.7469788 | -1.3806502 | 0.0001232 | 0.0161440 | -1.1274651 | 0.0246577 | 0.4250209 | -1.3077715 | -0.3260638 | -0.9487314 | 0.4423612 | -2.0734455 | -0.6878549 |
GGB9616_SGB15052|SGB15052 | 0.2265227 | 0.5569800 | 0.6503064 | 1.4021170 | 0.0003630 | 0.0076691 | 1.1112031 | 0.0432360 | 0.2161802 | 1.6930308 | 0.0021665 | 0.0762611 | 0.5818277 | 0.4508340 | 0.9032270 | 0.6420630 | 2.1621710 | 0.0341967 | 2.1882096 | 0.6202842 | 2.7657775 |
GGB9719_SGB15273|SGB15273 | 0.8922234 | 0.0029905 | 0.0093345 | -1.0846079 | 0.0003359 | 0.0073333 | -0.8610199 | 0.0416881 | 0.2128998 | -1.3081959 | 0.0020650 | 0.0762611 | -0.4471760 | 0.4510845 | 0.9032270 | -1.6690874 | -0.5001285 | -1.6892350 | -0.0328049 | -2.1331353 | -0.4832566 |
GGB58158_SGB79798|SGB79798 | 0.1593948 | 0.6041104 | 0.6893594 | -0.9500524 | 0.0023102 | 0.0244058 | -1.4215299 | 0.0013165 | 0.0261310 | -0.4785750 | 0.2707371 | 0.7480135 | 0.9429550 | 0.1263283 | 0.6652629 | -1.5559443 | -0.3441606 | -2.2800867 | -0.5629732 | -1.3337359 | 0.3765860 |
Enterocloster_hominis|SGB4721 | -0.3276233 | 0.3731204 | 0.4858725 | -1.7193896 | 0.0000059 | 0.0009605 | -2.2601527 | 0.0000243 | 0.0014449 | -1.1786265 | 0.0241348 | 0.2389575 | 1.0815263 | 0.1423888 | 0.6732469 | -2.4438001 | -0.9949791 | -3.2866519 | -1.2336535 | -2.2010657 | -0.1561873 |
Streptococcus_thermophilus|SGB8002 | -1.8386887 | 0.0000000 | 0.0000001 | -2.6383317 | 0.0000000 | 0.0000000 | -4.3108644 | 0.0000000 | 0.0000000 | -0.9657989 | 0.0223330 | 0.2321919 | 3.3450656 | 0.0000001 | 0.0000492 | -3.2240986 | -2.0525647 | -5.1409039 | -3.4808250 | -1.7925554 | -0.1390424 |
Blautia_obeum|SGB4810 | -1.1383366 | 0.0029791 | 0.0093345 | 1.5935042 | 0.0000404 | 0.0022062 | 2.4572472 | 0.0000087 | 0.0006349 | 0.7297612 | 0.1726515 | 0.6179602 | -1.7274860 | 0.0234188 | 0.4250209 | 0.8481024 | 2.3389060 | 1.4010031 | 3.5134913 | -0.3223053 | 1.7818277 |
GGB6601_SGB9333|SGB9333 | 0.4822516 | 0.2651432 | 0.3710871 | -1.3060574 | 0.0028644 | 0.0284265 | -1.7229312 | 0.0054173 | 0.0667807 | -0.8891836 | 0.1460214 | 0.5761689 | 0.8337476 | 0.3351856 | 0.8525047 | -2.1577352 | -0.4543795 | -2.9297698 | -0.5160925 | -2.0912490 | 0.3128818 |
GGB51884_SGB49168|SGB49168 | 0.2416861 | 0.4524916 | 0.5643997 | 1.0547797 | 0.0012444 | 0.0166385 | 1.0263175 | 0.0253612 | 0.1573078 | 1.0832419 | 0.0179329 | 0.2077339 | 0.0569244 | 0.9294397 | 0.9892080 | 0.4210183 | 1.6885411 | 0.1282693 | 1.9243657 | 0.1887457 | 1.9777381 |
Lachnospiraceae_bacterium_OF09_6|SGB4966 | -0.5582422 | 0.0094973 | 0.0246855 | 0.8470612 | 0.0001027 | 0.0031989 | 1.1090302 | 0.0003173 | 0.0090367 | 0.5850921 | 0.0529900 | 0.3663651 | -0.5239381 | 0.2197875 | 0.7798190 | 0.4271049 | 1.2670174 | 0.5139467 | 1.7041137 | -0.0076377 | 1.1778219 |
GGB3740_SGB5076|SGB5076 | -0.7648294 | 0.0481240 | 0.0955189 | 1.3485531 | 0.0005786 | 0.0104528 | 1.9636392 | 0.0004111 | 0.0112206 | 0.7334671 | 0.1779202 | 0.6231964 | -1.2301720 | 0.1112173 | 0.6121928 | 0.5901109 | 2.1069954 | 0.8889166 | 3.0383617 | -0.3370047 | 1.8039390 |
GGB4596_SGB6358|SGB6358 | -0.0686126 | 0.8863782 | 0.9200915 | 1.6018760 | 0.0010368 | 0.0150910 | 2.2961170 | 0.0009099 | 0.0212847 | 0.9076349 | 0.1816979 | 0.6245987 | -1.3884821 | 0.1495398 | 0.6849552 | 0.6551087 | 2.5486432 | 0.9545354 | 3.6376986 | -0.4286404 | 2.2439103 |
Roseburia_lenta|SGB4957 | -1.2101317 | 0.0000273 | 0.0001701 | 1.5546362 | 0.0000001 | 0.0000253 | 2.5269764 | 0.0000000 | 0.0000004 | 0.5822959 | 0.1428125 | 0.5761689 | -1.9446805 | 0.0006665 | 0.0630274 | 1.0013719 | 2.1079004 | 1.7429937 | 3.3109591 | -0.1985860 | 1.3631778 |
GGB3288_SGB4342|SGB4342 | -0.2254480 | 0.5526037 | 0.6475053 | 1.3280069 | 0.0005905 | 0.0104528 | 1.6574008 | 0.0023775 | 0.0379599 | 0.9986130 | 0.0636274 | 0.4007304 | -0.6587878 | 0.3858695 | 0.8722610 | 0.5798854 | 2.0761284 | 0.5973029 | 2.7174987 | -0.0572920 | 2.0545181 |
Clostridium_methylpentosum|SGB14962 | -0.2057003 | 0.1942039 | 0.2852097 | 0.5040337 | 0.0016858 | 0.0197177 | 0.6029955 | 0.0077422 | 0.0777972 | 0.4050719 | 0.0707806 | 0.4367662 | -0.1979236 | 0.5314189 | 0.9088234 | 0.1924399 | 0.8156275 | 0.1614630 | 1.0445280 | -0.0347143 | 0.8448581 |
Clostridium_SGB4751|SGB4751 | -0.6597956 | 0.0123001 | 0.0311063 | 0.8542261 | 0.0012805 | 0.0167745 | 1.5526183 | 0.0000432 | 0.0023606 | 0.1558339 | 0.6723502 | 0.9552084 | -1.3967843 | 0.0081258 | 0.3779338 | 0.3396190 | 1.3688332 | 0.8234133 | 2.2818233 | -0.5704869 | 0.8821548 |
GGB4604_SGB6371|SGB6371 | -0.4755155 | 0.1600789 | 0.2449806 | 1.1076774 | 0.0012408 | 0.0166385 | 1.8861285 | 0.0001164 | 0.0050824 | 0.3292262 | 0.4897372 | 0.8628944 | -1.5569024 | 0.0221334 | 0.4250209 | 0.4423075 | 1.7730472 | 0.9432908 | 2.8289663 | -0.6098825 | 1.2683348 |
GGB9545_SGB14952|SGB14952 | -0.2047745 | 0.4620639 | 0.5688945 | -0.9693579 | 0.0006237 | 0.0107498 | -0.9846295 | 0.0133625 | 0.1067372 | -0.9540863 | 0.0160424 | 0.1982593 | 0.0305432 | 0.9562227 | 0.9920937 | -1.5178948 | -0.4208210 | -1.7619134 | -0.2073455 | -1.7282959 | -0.1798766 |
GGB3363_SGB4448|SGB4448 | -0.0072223 | 0.9811593 | 0.9887067 | 0.8734580 | 0.0047915 | 0.0451440 | 1.1625032 | 0.0079741 | 0.0779558 | 0.5844128 | 0.1769958 | 0.6231964 | -0.5780904 | 0.3452676 | 0.8525047 | 0.2704404 | 1.4764756 | 0.3080194 | 2.0169870 | -0.2666914 | 1.4355170 |
GGB9634_SGB15097|SGB15097 | 0.6235922 | 0.0993983 | 0.1709022 | 1.3089666 | 0.0006478 | 0.0108798 | 1.0321693 | 0.0546291 | 0.2401482 | 1.5857639 | 0.0032679 | 0.0891857 | 0.5535946 | 0.4630128 | 0.9032270 | 0.5659230 | 2.0520101 | -0.0207332 | 2.0850717 | 0.5370259 | 2.6345019 |
Clostridiaceae_unclassified_SGB4771|SGB4771 | -0.0644846 | 0.8928803 | 0.9239125 | 1.9753772 | 0.0000578 | 0.0022745 | 1.6075161 | 0.0188364 | 0.1316514 | 2.3432383 | 0.0006624 | 0.0516373 | 0.7357222 | 0.4427805 | 0.9032270 | 1.0311990 | 2.9195555 | 0.2696032 | 2.9454291 | 1.0106171 | 3.6758596 |
GGB3627_SGB4906|SGB4906 | -0.1978070 | 0.5137310 | 0.6197025 | 0.9810423 | 0.0014239 | 0.0179357 | 0.4893157 | 0.2549213 | 0.5369277 | 1.4727690 | 0.0007095 | 0.0516373 | 0.9834533 | 0.1056955 | 0.6044502 | 0.3841932 | 1.5778915 | -0.3564274 | 1.3350588 | 0.6303710 | 2.3151670 |
GGB4682_SGB6472|SGB6472 | -0.3788127 | 0.1653363 | 0.2515762 | 1.1201390 | 0.0000602 | 0.0022745 | 1.4981716 | 0.0001465 | 0.0056449 | 0.7421063 | 0.0548149 | 0.3663651 | -0.7560653 | 0.1662035 | 0.7047430 | 0.5833697 | 1.6569083 | 0.7375625 | 2.2587807 | -0.0154944 | 1.4997071 |
GGB3491_SGB4665|SGB4665 | -1.4245623 | 0.0002684 | 0.0011633 | 1.1811718 | 0.0023606 | 0.0245432 | 1.5457577 | 0.0048927 | 0.0654019 | 0.8165859 | 0.1321344 | 0.5620002 | -0.7291719 | 0.3416679 | 0.8525047 | 0.4262265 | 1.9361171 | 0.4759904 | 2.6155251 | -0.2489503 | 1.8821221 |
Lachnospiraceae_bacterium_OM04_12BH|SGB4893 | -0.5399399 | 0.1494476 | 0.2336234 | 1.5007056 | 0.0000873 | 0.0030084 | 1.8062837 | 0.0007950 | 0.0192866 | 1.1951275 | 0.0244430 | 0.2389575 | -0.6111562 | 0.4135852 | 0.9029943 | 0.7645336 | 2.2368776 | 0.7631183 | 2.8494491 | 0.1560881 | 2.2341670 |
GGB9616_SGB15051|SGB15051 | 0.0855806 | 0.8487542 | 0.8923500 | 1.5428945 | 0.0007315 | 0.0119776 | 1.1902810 | 0.0626233 | 0.2614302 | 1.8955080 | 0.0031534 | 0.0891857 | 0.7052270 | 0.4324207 | 0.9030772 | 0.6581212 | 2.4276678 | -0.0634543 | 2.4440163 | 0.6467315 | 3.1442846 |
Clostridium_sp_AF32_12BH|SGB4710 | -0.5495881 | 0.0149283 | 0.0364852 | 0.9347848 | 0.0000468 | 0.0022745 | 1.6313134 | 0.0000007 | 0.0000691 | 0.2382561 | 0.4508937 | 0.8462332 | -1.3930574 | 0.0021545 | 0.1763958 | 0.4936932 | 1.3758763 | 1.0062809 | 2.2563459 | -0.3843043 | 0.8608165 |
Butyricicoccus_intestinisimiae|SGB14980 | -0.0672104 | 0.8306405 | 0.8769598 | 1.1788153 | 0.0002397 | 0.0058146 | 1.4023294 | 0.0019194 | 0.0359211 | 0.9553013 | 0.0324590 | 0.2834754 | -0.4470281 | 0.4772281 | 0.9032270 | 0.5592517 | 1.7983790 | 0.5243996 | 2.2802592 | 0.0808439 | 1.8297587 |
Kineothrix_SGB4886|SGB4886 | -0.2963857 | 0.4607430 | 0.5683364 | 1.8346091 | 0.0000094 | 0.0011605 | 2.1206344 | 0.0002616 | 0.0083391 | 1.5485839 | 0.0068939 | 0.1177032 | -0.5720505 | 0.4765385 | 0.9032270 | 1.0430109 | 2.6262074 | 0.9989293 | 3.2423394 | 0.4313154 | 2.6658524 |
Blautia_SGB4833|SGB4833 | -0.5845526 | 0.0142412 | 0.0350676 | 0.7945741 | 0.0009460 | 0.0143146 | 1.2653662 | 0.0002161 | 0.0078619 | 0.3237820 | 0.3322600 | 0.7701268 | -0.9415842 | 0.0476409 | 0.5291961 | 0.3287364 | 1.2604117 | 0.6052681 | 1.9254642 | -0.3337053 | 0.9812692 |
GGB3490_SGB4664|SGB4664 | -1.0161094 | 0.0017769 | 0.0059080 | 0.9107708 | 0.0049624 | 0.0451440 | 1.6723066 | 0.0003049 | 0.0090367 | 0.1492350 | 0.7412832 | 0.9655910 | -1.5230716 | 0.0183871 | 0.4152943 | 0.2794101 | 1.5421315 | 0.7776602 | 2.5669530 | -0.7418729 | 1.0403429 |
Clostridiaceae_bacterium_CLA_AA_H274|SGB4704 | -0.5586082 | 0.0165850 | 0.0397919 | 0.9909003 | 0.0000301 | 0.0019743 | 1.7004673 | 0.0000006 | 0.0000649 | 0.2813332 | 0.3889158 | 0.8021211 | -1.4191341 | 0.0024706 | 0.1798027 | 0.5352638 | 1.4465368 | 1.0548245 | 2.3461102 | -0.3617560 | 0.9244224 |
Lactococcus_lactis|SGB7985 | -1.3914763 | 0.0000000 | 0.0000000 | -0.8685948 | 0.0000166 | 0.0015579 | -1.7399436 | 0.0000000 | 0.0000005 | 0.0027540 | 0.9920545 | 0.9978148 | 1.7426976 | 0.0000157 | 0.0051437 | -1.2549385 | -0.4822512 | -2.2873977 | -1.1924896 | -0.5425347 | 0.5480427 |
GGB4569_SGB6312|SGB6312 | -0.4496405 | 0.1512086 | 0.2357020 | 0.9348036 | 0.0031471 | 0.0303141 | 1.0716343 | 0.0163903 | 0.1219959 | 0.7979729 | 0.0716452 | 0.4367662 | -0.2736614 | 0.6613542 | 0.9762868 | 0.3190793 | 1.5505280 | 0.1991449 | 1.9441238 | -0.0710657 | 1.6670115 |
Open code
if(file.exists('gitignore/lm_results/result_microbiome_SGB10.csv') == FALSE){
write.table(result_microbiome,
'gitignore/lm_results/result_microbiome_SGB10.csv', row.names = FALSE)
}
dim(result_microbiome %>% filter(fdr_VGdiet_avg < 0.05))
## [1] 73 22
5 Elastic net
To assess the predictive power of microbiome features to discriminate between diet strategy, we employed Elastic Net logistic regression.
As we expected very high level of co-linearity, we allowed \(alpha\) to rather small (0, 0.2 or 0.4).
The performance of the predictive models was evaluated through their capacity of discriminate between vegan and omnivore diets, using out-of-sample area under ROC curve (AUC; estimated with out-of-bag bootstrap) as the measure of discriminatory capacity.
All features were transformed by 2 standard deviations (resulting in standard deviation of 0.5).
5.1 Prepare data for glmnet
Open code
<- data_microbiome_original %>%
data_microbiome_glmnet na.omit() %>%
::mutate(
dplyrvegan = as.numeric(
::if_else(
dplyr== "VEGAN", 1, 0
Diet
)
),::across(
dplyr`Bacteroides_stercoris|SGB1830`:`Clostridiaceae_unclassified_SGB4771|SGB4771`,
~ arm::rescale(.)
)%>%
) ::select(
dplyr
vegan,Country,::everything()
dplyr%>%
) ::select(
dplyr
Sample, vegan, Country,`Bacteroides_stercoris|SGB1830`:`Clostridiaceae_unclassified_SGB4771|SGB4771`
)
5.2 Fit model
Open code
<- "elanet_microbiome_SGB10"
modelac
assign(
modelac,run(
expr = clust_glmnet_sep(
data = data_microbiome_glmnet,
outcome = "vegan",
clust_id = "Sample",
sample_method = "oos_boot",
N = 500,
alphas = c(0, 0.2, 0.4),
family = "binomial",
seed = 478
),path = paste0("gitignore/run/", modelac)
) )
5.3 Model summary
Open code
$model_summary
elanet_microbiome_SGB10## alpha lambda auc auc_OutOfSample auc_oos_CIL auc_oos_CIU accuracy
## 1 0.4 0.01486299 1 0.9162559 0.846903 0.9756075 1
## accuracy_OutOfSample accuracy_oos_CIL accuracy_oos_CIU
## 1 0.8313312 0.7377049 0.9200269
$country_AUC
elanet_microbiome_SGB10## auc_OutOfSample_IT auc_oos_CIL_IT auc_oos_CIU_IT auc_OutOfSample_CZ
## 1 0.8473722 0.7057108 0.9727431 0.961881
## auc_oos_CIL_CZ auc_oos_CIU_CZ
## 1 0.8905114 1
5.4 ROC curve - internal validation
Open code
$plot elanet_microbiome_SGB10
5.5 Estimated coefficients
Open code
data.frame(
microbiome = row.names(
$betas
elanet_microbiome_SGB10
)[which(
abs(
$betas
elanet_microbiome_SGB10>0
)
)
],beta = elanet_microbiome_SGB10$betas[
abs(
$betas
elanet_microbiome_SGB10>0
)
]%>%
) mutate(
is_in_ExtValCoh = if_else(
%in% names(data_microbiome_validation),
microbiome 1, 0
)
)## microbiome beta
## 1 (Intercept) 0.610441581
## 2 GGB9602_SGB15031|SGB15031 -0.070574298
## 3 Parabacteroides_distasonis|SGB1934 0.087653621
## 4 GGB9636_SGB15107|SGB15107 -0.168074116
## 5 Faecalibacterium_prausnitzii|SGB15339 0.025794395
## 6 Faecalibacterium_SGB15346|SGB15346 0.062953892
## 7 GGB1420_SGB1957|SGB1957 -0.272563443
## 8 Bacteroides_eggerthii|SGB1829 -0.118990440
## 9 Escherichia_coli|SGB10068 -0.071177360
## 10 GGB3175_SGB4191|SGB4191 -0.099298405
## 11 Oscillospiraceae_bacterium_CLA_AA_H250|SGB14861 -0.115460429
## 12 Bacteroides_finegoldii|SGB1862 -0.162879455
## 13 Lachnospira_pectinoschiza|SGB5075 -0.365148956
## 14 Bacteroides_clarus|SGB1832 -0.289508943
## 15 Anaerotignum_faecicola|SGB5190 -0.286519699
## 16 Hydrogenoanaerobacterium_saccharovorans|SGB15350 -0.232108131
## 17 Bacteroides_faecis|SGB1860 0.045904035
## 18 Roseburia_hominis|SGB4936 0.015862451
## 19 GGB9712_SGB15244|SGB15244 -0.097891044
## 20 Bacteroides_ovatus|SGB1871 0.183034474
## 21 GGB9614_SGB15049|SGB15049 0.029258824
## 22 Barnesiella_intestinihominis|SGB1965 -0.007740484
## 23 Agathobaculum_butyriciproducens|SGB14993 0.032484900
## 24 GGB6649_SGB9391|SGB9391 0.344150062
## 25 Ruminococcus_bromii|SGB4285 0.046253755
## 26 Lawsonibacter_asaccharolyticus|SGB15154 -0.030678589
## 27 Phocaeicola_massiliensis|SGB1812 -0.073558903
## 28 Clostridium_fessum|SGB4705 -0.381734689
## 29 Clostridium_sp_AF36_4|SGB4644 0.194737209
## 30 GGB9707_SGB15229|SGB15229 0.037661768
## 31 Roseburia_inulinivorans|SGB4940 -0.151998114
## 32 Phocaeicola_dorei|SGB1815 -0.063680552
## 33 GGB9502_SGB14899|SGB14899 -0.089915990
## 34 Lachnospiraceae_bacterium_AM48_27BH|SGB4706 0.060701908
## 35 GGB9619_SGB15067|SGB15067 -0.106146483
## 36 Bifidobacterium_catenulatum|SGB17241 -0.420349718
## 37 GGB9509_SGB14906|SGB14906 -0.358523335
## 38 Clostridiaceae_bacterium_Marseille_Q4143|SGB4768 -0.055607095
## 39 GGB2653_SGB3574|SGB3574 -0.159107570
## 40 Bacteroides_intestinalis|SGB1846 0.039513577
## 41 Lachnospiraceae_bacterium|SGB4781 -0.098557340
## 42 Phocea_massiliensis|SGB14837 -0.170687846
## 43 Bifidobacterium_pseudocatenulatum|SGB17237 -0.046789608
## 44 Guopingia_tenuis|SGB14127 -0.095718417
## 45 Blautia_SGB4815|SGB4815 -0.108580298
## 46 Holdemania_filiformis|SGB4046 -0.277067990
## 47 Anaeromassilibacillus_senegalensis|SGB14894 -0.040823316
## 48 Senegalimassilia_anaerobia|SGB14824_group -0.051929087
## 49 GGB3061_SGB4063|SGB4063 -0.011869279
## 50 Ruminococcus_sp_AF41_9|SGB25497 0.317161646
## 51 Eubacterium_ramulus|SGB4959 -0.256755135
## 52 Enterocloster_lavalensis|SGB4725 0.035375597
## 53 Youxingia_wuxianensis|SGB82503 -0.032674700
## 54 GGB9611_SGB15045|SGB15045 -0.182649714
## 55 Bianquea_renquensis|SGB4438 0.102744640
## 56 GGB38744_SGB14842|SGB14842 -0.500222676
## 57 Dorea_formicigenerans|SGB4575 -0.027329972
## 58 GGB9765_SGB15382|SGB15382 -0.161905786
## 59 Bacteroides_nordii|SGB1858 0.018517003
## 60 Coprobacter_secundus|SGB1962 -0.057869885
## 61 GGB3034_SGB4030|SGB4030 -0.184214318
## 62 GGB34900_SGB14891|SGB14891 -0.074206357
## 63 GGB3510_SGB4687|SGB4687 0.179292645
## 64 Mediterraneibacter_faecis|SGB4563 -0.448773880
## 65 Ruminococcus_torques|SGB4608 -0.680187035
## 66 Coprococcus_comes|SGB4577 -0.120546773
## 67 Eubacterium_ventriosum|SGB5045 0.278518059
## 68 Clostridium_sp_AM29_11AC|SGB4701 -0.015338712
## 69 GGB9715_SGB15260|SGB15260 0.268211671
## 70 GGB9758_SGB15368|SGB15368 -0.193072102
## 71 GGB4689_SGB6486|SGB6486 0.096498546
## 72 GGB2996_SGB3983|SGB3983 0.124127412
## 73 GGB6613_SGB9347|SGB9347 -0.214654462
## 74 GGB3304_SGB4367|SGB4367 0.013334299
## 75 GGB9782_SGB15403|SGB15403 -0.216010736
## 76 Agathobaculum_butyriciproducens|SGB14991 0.064163130
## 77 GGB9790_SGB15413|SGB15413 0.011216154
## 78 Clostridium_sp_AF12_28|SGB4715 -0.008211546
## 79 GGB9115_SGB14048|SGB14048 -0.147380413
## 80 GGB9642_SGB15119|SGB15119 -0.023157832
## 81 GGB9301_SGB14262|SGB14262 -0.152877745
## 82 GGB9238_SGB14180|SGB14180 -0.008415888
## 83 GGB9531_SGB14932|SGB14932 -0.266110535
## 84 Oscillibacter_valericigenes|SGB15076 -0.375649223
## 85 GGB9781_SGB66170|SGB66170 -0.269724544
## 86 GGB9524_SGB14924|SGB14924 0.019605114
## 87 Segatella_copri|SGB1626 0.034112520
## 88 Mediterraneibacter_butyricigenes|SGB25493 0.077104734
## 89 GGB9719_SGB15273|SGB15273 -0.359498807
## 90 Eubacteriaceae_bacterium|SGB5043 -0.095616540
## 91 Enterocloster_hominis|SGB4721 -0.318374333
## 92 Blautia_sp_OF03_15BH|SGB4779 -0.201845304
## 93 GGB9110_SGB14043|SGB14043 -0.178628970
## 94 GGB9695_SGB15210|SGB15210 -0.077590685
## 95 Streptococcus_thermophilus|SGB8002 -1.014644514
## 96 GGB781_SGB1024|SGB1024 0.065530468
## 97 GGB1627_SGB2230|SGB2230 -0.009525388
## 98 Christensenella_hongkongensis|SGB14143 0.015072123
## 99 GGB6612_SGB9346|SGB9346 -0.071779934
## 100 Blautia_obeum|SGB4810 0.235493349
## 101 GGB3480_SGB4648|SGB4648 -0.101711770
## 102 GGB6601_SGB9333|SGB9333 -0.228053801
## 103 GGB3352_SGB4436|SGB4436 0.269468285
## 104 GGB45491_SGB63163|SGB63163 -0.184463830
## 105 GGB9708_SGB15233|SGB15233 0.040526093
## 106 Lachnospiraceae_bacterium_OF09_6|SGB4966 0.100910774
## 107 Eubacterium_sp_AF16_48|SGB5042 0.006916398
## 108 Gemmiger_SGB15301|SGB15301 -0.025267298
## 109 Pseudoflavonifractor_gallinarum|SGB29328 -0.052736280
## 110 Clostridium_sp_AT4|SGB4753 -0.282544762
## 111 GGB3740_SGB5076|SGB5076 0.475850142
## 112 Hydrogeniiclostridium_mannosilyticum|SGB14890 0.021306856
## 113 GGB4596_SGB6358|SGB6358 0.322901276
## 114 Blautia_stercoris|SGB4788 0.162017063
## 115 Roseburia_lenta|SGB4957 0.539526518
## 116 GGB3288_SGB4342|SGB4342 0.295158480
## 117 GGB9574_SGB14987|SGB14987 0.312370613
## 118 Clostridium_methylpentosum|SGB14962 0.112475927
## 119 GGB1495_SGB2071|SGB2071 0.267922114
## 120 GGB3828_SGB5197|SGB5197 0.144942485
## 121 Clostridium_SGB4751|SGB4751 0.028098610
## 122 GGB3486_SGB4658|SGB4658 0.155462609
## 123 Monoglobus_pectinilyticus|SGB4166 0.264077066
## 124 Enterocloster_bolteae|SGB4757 0.031574722
## 125 GGB4604_SGB6371|SGB6371 0.035415533
## 126 Klebsiella_pneumoniae|SGB10115_group 0.032375569
## 127 GGB9258_SGB14205|SGB14205 0.100032971
## 128 GGB9545_SGB14952|SGB14952 -0.132446606
## 129 Eubacterium_sp_AF34_35BH|SGB5051 -0.104426809
## 130 GGB3363_SGB4448|SGB4448 0.335084056
## 131 Lachnoclostridium_pacaense|SGB4763 0.097758506
## 132 GGB9634_SGB15097|SGB15097 0.566674657
## 133 Paratractidigestivibacter_faecalis|SGB14370 0.001896832
## 134 Clostridiaceae_unclassified_SGB4771|SGB4771 0.263009292
## is_in_ExtValCoh
## 1 0
## 2 1
## 3 1
## 4 1
## 5 1
## 6 1
## 7 1
## 8 1
## 9 1
## 10 1
## 11 1
## 12 1
## 13 1
## 14 1
## 15 1
## 16 1
## 17 1
## 18 1
## 19 1
## 20 1
## 21 1
## 22 1
## 23 1
## 24 1
## 25 1
## 26 1
## 27 1
## 28 1
## 29 1
## 30 1
## 31 1
## 32 1
## 33 1
## 34 1
## 35 1
## 36 1
## 37 1
## 38 1
## 39 1
## 40 1
## 41 1
## 42 1
## 43 1
## 44 1
## 45 1
## 46 1
## 47 1
## 48 1
## 49 1
## 50 1
## 51 1
## 52 1
## 53 1
## 54 1
## 55 1
## 56 1
## 57 1
## 58 1
## 59 1
## 60 1
## 61 1
## 62 1
## 63 1
## 64 1
## 65 1
## 66 1
## 67 1
## 68 1
## 69 1
## 70 1
## 71 1
## 72 1
## 73 1
## 74 1
## 75 1
## 76 1
## 77 1
## 78 1
## 79 1
## 80 1
## 81 1
## 82 1
## 83 1
## 84 1
## 85 1
## 86 1
## 87 1
## 88 1
## 89 1
## 90 1
## 91 1
## 92 1
## 93 1
## 94 1
## 95 1
## 96 1
## 97 1
## 98 1
## 99 1
## 100 1
## 101 1
## 102 1
## 103 1
## 104 1
## 105 1
## 106 1
## 107 1
## 108 1
## 109 1
## 110 1
## 111 1
## 112 1
## 113 1
## 114 1
## 115 1
## 116 1
## 117 1
## 118 1
## 119 1
## 120 1
## 121 1
## 122 1
## 123 1
## 124 1
## 125 1
## 126 1
## 127 1
## 128 1
## 129 1
## 130 1
## 131 1
## 132 1
## 133 1
## 134 1
5.6 Plot beta coefficients
Open code
<- data.frame(
elacoef microbiome = row.names(elanet_microbiome_SGB10$betas),
beta_ela = elanet_microbiome_SGB10$betas[, 1]
%>%
) arrange(abs(beta_ela)) %>%
filter(abs(beta_ela) > 0,
!grepl('Intercept', microbiome)) %>%
mutate(microbiome = factor(microbiome, levels = microbiome))
<- "elanet_beta_microbiome_SGB10"
plotac <- "gitignore/figures"
path
assign(plotac,
ggplot(elacoef,
aes(
x = microbiome,
y = beta_ela
)+
) geom_point() +
geom_hline(yintercept = 0, color = "black") +
labs(
y = "Standardized beta coefficients",
x = "Bacteria species"
+
) theme_minimal() +
coord_flip() +
theme(
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
get(plotac)
Open code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 7,
height = 14
) }
6 External validation
External validation was performed with an independent Czech cohort.
As a first step, we will use the previously developed and internally validated elastic net model to predict vegan status in the independent Czech cohort. The validation data will be standardized using the mean and standard deviation of each taxa as taken from the training cohort to ensure comparability across datasets. For each subject in the external validation cohort, we will estimate the predicted probability of being vegan using the elastic net model. This predicted probability will then be used as a variable to discriminate between the diet groups in the independent cohort.
In a 2nd step, we will look at taxa that significantly differed between diet groups (average vegan diet effect across both countries, FDR < 0.05) estimated by linear models (one per a taxa) with data of training cohort. Then we will fit linear models also for external validation cohort. Effect of vegan diet on these taxa will be shown along with 95% confidence interval for all cohorts: training Czech and Italian cohorts, but also in Czech independent (validating) cohort
6.1 Prediction of diet (elastic net)
6.1.1 Get table of weights, means and SDs
Open code
<- get_coef(
coefs_microbiome_all original_data = data_analysis,
glmnet_model = elanet_microbiome_SGB10)
coefs_microbiome_all## # A tibble: 570 × 5
## predictor beta_scaled beta_OrigScale mean SD
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.610 NA NA NA
## 2 Bacteroides_stercoris|SGB1830 0 0 1.75 4.46
## 3 Alistipes_putredinis|SGB2318 0 0 5.93 3.94
## 4 Candidatus_Cibionibacter_quicibialis… 0 0 6.26 2.24
## 5 Bacteroides_uniformis|SGB1836 0 0 7.09 2.00
## 6 Eubacterium_siraeum|SGB4198 0 0 -0.162 3.77
## 7 GGB9634_SGB15093|SGB15093 0 0 0.190 2.83
## 8 GGB9602_SGB15031|SGB15031 -0.0706 -0.480 1.77 3.40
## 9 Phocaeicola_vulgatus|SGB1814 0 0 6.61 2.60
## 10 Faecalibacterium_prausnitzii|SGB15316 0 0 6.18 2.31
## # ℹ 560 more rows
6.1.3 Standardize data in validation set
Open code
<- data_microbiome_validation %>%
data_microbiome_validation_pred_all ::mutate(
dplyrvegan = if_else(
== "VEGAN", 1, 0
Diet
)%>%
) ::select(
dplyr
vegan,::all_of(common_predictors)
dplyr%>%
) ::mutate(
dplyracross(
.cols = -vegan,
.fns = ~ .
- coefs_microbiome_all$mean[
match(
cur_column(),
$predictor
coefs_microbiome_all
)
]
)%>%
) ::mutate(
dplyracross(
.cols = -vegan,
.fns = ~ .
/ coefs_microbiome_all$SD[
match(
cur_column(),
$predictor
coefs_microbiome_all
)
]
) )
6.1.4 Result
6.1.4.1 ROC curve
Open code
<- as.matrix(data_microbiome_validation_pred_all[,-1])
newx
<- predict(
predicted $fit,
elanet_microbiome_SGB10newx = newx)
<- data_microbiome_validation_pred_all %>%
tr ::mutate(
dplyrpredicted_logit = as.numeric(
predict(
$fit,
elanet_microbiome_SGB10newx = newx
)
)%>%
) ::mutate(
dplyrpredicted = inv_logit(predicted_logit)
)
<- pROC::roc(
roc_microbiome_all ~ predicted_logit,
vegan data = tr,
direction = "<",
levels = c(0, 1),
ci = TRUE
)
roc_microbiome_all##
## Call:
## roc.formula(formula = vegan ~ predicted_logit, data = tr, direction = "<", levels = c(0, 1), ci = TRUE)
##
## Data: predicted_logit in 43 controls (vegan 0) < 59 cases (vegan 1).
## Area under the curve: 0.8928
## 95% CI: 0.8283-0.9573 (DeLong)
<- "roc_microbiom_SGB10"
plotac <- "gitignore/figures"
path
assign(plotac, ggroc(roc_microbiome_all))
get(plotac)
Open code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 6,
height = 4.5
) }
6.1.4.2 Table
Open code
<- elanet_microbiome_SGB10
mod
<- mean(mod[["valid_performances"]]$auc_resamp_test)
trainAUC <- quantile(mod[["valid_performances"]]$auc_resamp_test,
trainCI probs = c(1 / 40, 39 / 40)
)
<- data.frame(
res alpha = c(mod$model_summary$alpha, rep("", 4)),
lambda = c(round(mod$model_summary$lambda, 4), rep("", 4)),
`performance type` = c(
"Training set AUC",
"Out-of-sample AUC (all)",
"Out-of-sample AUC (Czech)",
"Out-of-sample AUC (Italy)",
"External validation AUC"
),`performance [95% CI]` = c(
sprintf("%.3f [%.3f to %.3f]", trainAUC, trainCI[1], trainCI[2]),
sprintf(
"%.3f [%.3f to %.3f]",
$model_summary$auc_OutOfSample,
mod$model_summary$auc_oos_CIL,
mod$model_summary$auc_oos_CIU
mod
),sprintf(
"%.3f [%.3f to %.3f]",
$country_AUC$auc_OutOfSample_CZ,
mod$country_AUC$auc_oos_CIL_CZ,
mod$country_AUC$auc_oos_CIU_CZ
mod
),sprintf(
"%.3f [%.3f to %.3f]",
$country_AUC$auc_OutOfSample_IT,
mod$country_AUC$auc_oos_CIL_IT,
mod$country_AUC$auc_oos_CIU_IT
mod
),sprintf(
"%.3f [%.3f to %.3f]",
"ci"]][2],
roc_microbiome_all[["ci"]][1],
roc_microbiome_all[["ci"]][3]
roc_microbiome_all[[
)
)
)
::kable(
kableExtra%>% mutate(across(where(is.numeric), ~ round(.x, 3))),
res caption = "Performance of the elastic‑net logistic regression model for discriminating vegan from omnivore status using CLR-transformed counts of bacteria reads. The model was developed on the combined training data (Czech and Italian cohorts), with the optimmized `alpha` (mixing parameter) and `lambda` (penalty strength) selected via 10-fold cross-validation. Internal validation employed 500 out‑of‑bag bootstrap resamples: the out‑of‑sample AUC is the mean across resamples, and its 95 % confidence interval (CI) is given by the 2.5th and 97.5th percentiles of the bootstrap distribution. The training‑set AUC and its CI were computed analogously from the in‑bag predictions. External validation was carried out on an independent Czech cohort; the reported AUC is the point estimate on that cohort, and its 95% CI was obtained with DeLong’s method."
)
alpha | lambda | performance.type | performance..95..CI. |
---|---|---|---|
0.4 | 0.0149 | Training set AUC | 1.000 [1.000 to 1.000] |
Out-of-sample AUC (all) | 0.916 [0.847 to 0.976] | ||
Out-of-sample AUC (Czech) | 0.962 [0.891 to 1.000] | ||
Out-of-sample AUC (Italy) | 0.847 [0.706 to 0.973] | ||
External validation AUC | 0.893 [0.828 to 0.957] |
6.2 Diet effect across datasets
Similarly as in training data cohorts, we will fit linear model per each of the selected taxa (\(CLR\) - transformed), with a single fixed effect factor of diet
.
6.2.1 Linear models in validation cohort
Open code
<- result_microbiome %>%
diet_sensitive_taxa filter(fdr_VGdiet_avg < 0.05) %>%
pull(outcome)
<- length(diet_sensitive_taxa)
len
<- data_microbiome_validation %>%
data_analysis_microbiome ::mutate(
dplyrDiet_VEGAN = as.numeric(
::if_else(
dplyr== 'VEGAN', 1, 0
Diet
)
)%>%
) ::select(
dplyr
Diet_VEGAN,all_of(diet_sensitive_taxa)
)
6.2.1.1 Define number of microbiome and covariates
Open code
<- 1
n_covarites <- ncol(data_analysis_microbiome) - n_covarites n_features
6.2.1.2 Create empty objects
Open code
<- vector('double', n_features)
outcome <- vector('double', n_features)
est_VGdiet <- vector('double', n_features)
P_VGdiet <- vector('double', n_features)
CI_L_VGdiet <- vector('double', n_features) CI_U_VGdiet
6.2.1.3 Estimate over outcomes
Open code
for (i in 1:n_features) {
## define variable
$outcome <- data_analysis_microbiome[, (i + n_covarites)]
data_analysis_microbiome
## fit model
<- lm(outcome ~ Diet_VEGAN, data = data_analysis_microbiome)
model
## save results
<- names(data_analysis_microbiome)[i + n_covarites]
outcome[i]
## diet effect
<- confint(model)
tr
<- tr[which(row.names(tr) == "Diet_VEGAN"), ][1]
CI_L_VGdiet[i] <- tr[which(row.names(tr) == "Diet_VEGAN"), ][2]
CI_U_VGdiet[i]
<- summary(model)$coefficients[
est_VGdiet[i] which(
names(model$coefficients) == "Diet_VEGAN"
1
),
]
<- summary(model)$coefficients[
P_VGdiet[i] which(
names(model$coefficients) == "Diet_VEGAN"
4
),
] }
6.2.1.4 Results table
Open code
<- data.frame(
result_microbiome_val
outcome,
est_VGdiet, P_VGdiet,
CI_L_VGdiet, CI_U_VGdiet
)
::kable(result_microbiome_val,
kableExtracaption = "Results of linear models estimating the effect of diet on the CLR-transformed relative abundance of a given taxon as the outcome. Only taxa whose relative abundance significantly differed between diet groups in training cohorts (FDR < 0.05, average effect across both training cohorts) were included. `est` represents the estimated effects (regression coefficient), indicating how much the CLR-transformed relative abundnace differ between vegans and omnivores. `P`: p-value, `fdr`: p-value adjusted for multiple comparisons, and `CI_L` and `CI_U` represent the lower and upper bounds of the 95% confidence interval, respectively. All estimates in a single row are based on a single model"
)
outcome | est_VGdiet | P_VGdiet | CI_L_VGdiet | CI_U_VGdiet |
---|---|---|---|---|
GGB1420_SGB1957|SGB1957 | -1.6319786 | 0.0001833 | -2.4651743 | -0.7987830 |
Escherichia_coli|SGB10068 | -0.1364030 | 0.7504858 | -0.9850710 | 0.7122650 |
Bacteroides_clarus|SGB1832 | -0.5846336 | 0.0591555 | -1.1922920 | 0.0230247 |
Hydrogenoanaerobacterium_saccharovorans|SGB15350 | -0.0236533 | 0.9237105 | -0.5124669 | 0.4651603 |
GGB9775_SGB15395|SGB15395 | -0.0622451 | 0.8622175 | -0.7719738 | 0.6474835 |
Ruthenibacterium_lactatiformans|SGB15271 | -0.5610860 | 0.1251910 | -1.2809841 | 0.1588121 |
Lawsonibacter_asaccharolyticus|SGB15154 | -2.0443547 | 0.0000027 | -2.8593796 | -1.2293298 |
Clostridium_fessum|SGB4705 | -0.1266632 | 0.6413189 | -0.6644529 | 0.4111264 |
GGB9707_SGB15229|SGB15229 | -0.3170753 | 0.5258509 | -1.3052703 | 0.6711197 |
Collinsella_aerofaciens|SGB14535 | 0.8831037 | 0.1607185 | -0.3567223 | 2.1229298 |
GGB9509_SGB14906|SGB14906 | -0.6808254 | 0.0481071 | -1.3558713 | -0.0057794 |
GGB45432_SGB63101|SGB63101 | -0.4316762 | 0.2950971 | -1.2453732 | 0.3820207 |
GGB2653_SGB3574|SGB3574 | -0.7730105 | 0.0527748 | -1.5554614 | 0.0094403 |
Phocea_massiliensis|SGB14837 | -0.2945252 | 0.4199778 | -1.0160979 | 0.4270474 |
Guopingia_tenuis|SGB14127 | -0.2466821 | 0.5415046 | -1.0455355 | 0.5521714 |
Lachnospiraceae_bacterium|SGB4953 | 1.0640524 | 0.1217977 | -0.2887385 | 2.4168432 |
Holdemania_filiformis|SGB4046 | -0.9757476 | 0.0010138 | -1.5474194 | -0.4040757 |
Anaeromassilibacillus_senegalensis|SGB14894 | 0.1049346 | 0.7858395 | -0.6592016 | 0.8690709 |
GGB3061_SGB4063|SGB4063 | -0.4248798 | 0.1440428 | -0.9973650 | 0.1476053 |
Ruminococcus_sp_AF41_9|SGB25497 | 1.5114864 | 0.0062116 | 0.4388303 | 2.5841426 |
Eubacterium_ramulus|SGB4959 | -0.8189588 | 0.0014835 | -1.3160877 | -0.3218300 |
GGB3000_SGB3991|SGB3991 | -0.0994500 | 0.6181309 | -0.4940141 | 0.2951140 |
Youxingia_wuxianensis|SGB82503 | -0.8425429 | 0.0219470 | -1.5606907 | -0.1243952 |
Blautia_massiliensis|SGB4826 | -0.4043207 | 0.2994738 | -1.1734276 | 0.3647862 |
GGB38744_SGB14842|SGB14842 | -0.5994825 | 0.0627007 | -1.2313410 | 0.0323760 |
Dorea_formicigenerans|SGB4575 | 0.0818715 | 0.5861126 | -0.2154860 | 0.3792291 |
GGB9765_SGB15382|SGB15382 | -0.3957603 | 0.3593542 | -1.2484426 | 0.4569219 |
GGB9770_SGB15390|SGB15390 | -0.3093744 | 0.4051897 | -1.0436384 | 0.4248895 |
GGB34900_SGB14891|SGB14891 | -0.4120472 | 0.1278995 | -0.9445541 | 0.1204597 |
Mediterraneibacter_faecis|SGB4563 | -1.2259291 | 0.0000274 | -1.7790085 | -0.6728497 |
Ruminococcus_torques|SGB4608 | -2.0120926 | 0.0002102 | -3.0496158 | -0.9745694 |
Clostridium_sp_AM29_11AC|SGB4701 | 0.0928994 | 0.3642174 | -0.1093027 | 0.2951015 |
Lachnoclostridium_sp_An138|SGB4774 | 0.4975782 | 0.0087858 | 0.1282307 | 0.8669257 |
GGB9782_SGB15403|SGB15403 | -0.1246240 | 0.5085541 | -0.4972846 | 0.2480366 |
Agathobaculum_butyriciproducens|SGB14991 | 0.3688243 | 0.4169264 | -0.5288526 | 1.2665012 |
Clostridium_SGB48024|SGB48024 | 0.3968599 | 0.3624847 | -0.4638070 | 1.2575268 |
GGB9301_SGB14262|SGB14262 | -0.1795302 | 0.3499757 | -0.5588376 | 0.1997773 |
GGB9531_SGB14932|SGB14932 | -0.7159136 | 0.0477755 | -1.4246752 | -0.0071520 |
Oscillibacter_valericigenes|SGB15076 | -0.3354555 | 0.1998820 | -0.8512073 | 0.1802962 |
GGB9781_SGB66170|SGB66170 | -0.5684455 | 0.1097144 | -1.2672650 | 0.1303741 |
GGB9616_SGB15052|SGB15052 | 0.8326234 | 0.0290113 | 0.0869295 | 1.5783172 |
GGB9719_SGB15273|SGB15273 | 0.1656707 | 0.1447268 | -0.0579396 | 0.3892811 |
GGB58158_SGB79798|SGB79798 | -0.6709277 | 0.0785820 | -1.4199318 | 0.0780764 |
Enterocloster_hominis|SGB4721 | -2.6863647 | 0.0000000 | -3.5857630 | -1.7869663 |
Streptococcus_thermophilus|SGB8002 | -2.0254710 | 0.0000042 | -2.8510019 | -1.1999401 |
Blautia_obeum|SGB4810 | 1.9982699 | 0.0000426 | 1.0724358 | 2.9241040 |
GGB6601_SGB9333|SGB9333 | -0.3380189 | 0.2473263 | -0.9143156 | 0.2382778 |
GGB51884_SGB49168|SGB49168 | 0.0493839 | 0.8653658 | -0.5269997 | 0.6257675 |
Lachnospiraceae_bacterium_OF09_6|SGB4966 | 0.8338853 | 0.0480223 | 0.0073988 | 1.6603718 |
GGB3740_SGB5076|SGB5076 | 1.8911990 | 0.0003254 | 0.8834608 | 2.8989372 |
GGB4596_SGB6358|SGB6358 | 0.1742579 | 0.5315783 | -0.3764478 | 0.7249635 |
Roseburia_lenta|SGB4957 | 1.8756274 | 0.0004803 | 0.8448666 | 2.9063882 |
GGB3288_SGB4342|SGB4342 | 0.0860836 | 0.8435058 | -0.7768079 | 0.9489752 |
Clostridium_methylpentosum|SGB14962 | 0.0926049 | 0.4507257 | -0.1500412 | 0.3352509 |
Clostridium_SGB4751|SGB4751 | 0.6046968 | 0.0003366 | 0.2816354 | 0.9277581 |
GGB4604_SGB6371|SGB6371 | 0.3559856 | 0.0821071 | -0.0461582 | 0.7581294 |
GGB9545_SGB14952|SGB14952 | 0.0878718 | 0.6490821 | -0.2940953 | 0.4698388 |
GGB3363_SGB4448|SGB4448 | 0.2788593 | 0.0252614 | 0.0352946 | 0.5224240 |
GGB9634_SGB15097|SGB15097 | 0.1673147 | 0.1856037 | -0.0817328 | 0.4163622 |
Clostridiaceae_unclassified_SGB4771|SGB4771 | 0.3627317 | 0.3434905 | -0.3933569 | 1.1188202 |
GGB3627_SGB4906|SGB4906 | -0.0604011 | 0.8893944 | -0.9198865 | 0.7990842 |
GGB4682_SGB6472|SGB6472 | 0.2886792 | 0.2644873 | -0.2217099 | 0.7990682 |
GGB3491_SGB4665|SGB4665 | 0.4885714 | 0.1610792 | -0.1979513 | 1.1750941 |
Lachnospiraceae_bacterium_OM04_12BH|SGB4893 | -0.3828443 | 0.4320887 | -1.3457524 | 0.5800637 |
GGB9616_SGB15051|SGB15051 | 0.4658143 | 0.2161639 | -0.2766842 | 1.2083128 |
Clostridium_sp_AF32_12BH|SGB4710 | 1.1263953 | 0.0041614 | 0.3644904 | 1.8883003 |
Butyricicoccus_intestinisimiae|SGB14980 | 0.2683956 | 0.0166443 | 0.0497533 | 0.4870379 |
Kineothrix_SGB4886|SGB4886 | -0.2490503 | 0.5956760 | -1.1772444 | 0.6791439 |
Blautia_SGB4833|SGB4833 | 0.1893524 | 0.5947566 | -0.5145889 | 0.8932937 |
GGB3490_SGB4664|SGB4664 | 0.7333115 | 0.0346279 | 0.0540775 | 1.4125454 |
Clostridiaceae_bacterium_CLA_AA_H274|SGB4704 | 0.4592766 | 0.0005041 | 0.2058801 | 0.7126731 |
Lactococcus_lactis|SGB7985 | -1.7353981 | 0.0000017 | -2.4118186 | -1.0589776 |
GGB4569_SGB6312|SGB6312 | 0.0890736 | 0.5307595 | -0.1918632 | 0.3700104 |
Open code
if (file.exists("gitignore/lm_results/result_microbiom_validation_SGB10.csv") == FALSE) {
write.table(result_microbiome_val,
"gitignore/lm_results/result_microbiom_validation_SGB10.csv",
row.names = FALSE
) }
6.2.2 Forest plot
6.2.2.1 Prepare data
Open code
## subset result tables
<- result_microbiome %>%
result_microbiome_subset filter(outcome %in% diet_sensitive_taxa)
<- result_microbiome_val %>%
result_microbiome_val_subset filter(outcome %in% diet_sensitive_taxa)
## create a data frame
<- data.frame(
data_forest outcome = rep(diet_sensitive_taxa, 3),
beta = c(
$est_VGdiet_inCZ,
result_microbiome_subset$est_VGdiet_inIT,
result_microbiome_subset$est_VGdiet
result_microbiome_val_subset
),lower = c(
$CI_L_VGdiet_inCZ,
result_microbiome_subset$CI_L_VGdiet_inIT,
result_microbiome_subset$CI_L_VGdiet
result_microbiome_val_subset
),upper = c(
$CI_U_VGdiet_inCZ,
result_microbiome_subset$CI_U_VGdiet_inIT,
result_microbiome_subset$CI_U_VGdiet
result_microbiome_val_subset
),dataset = c(
rep("CZ", len),
rep("IT", len),
rep("Validation", len)
)
)
## define ordering
<- data_forest %>%
validation_order group_by(outcome) %>%
summarise(beta_mean = mean(beta), .groups = "drop") %>%
arrange(beta_mean) %>%
pull(outcome)
## Define 'winners'
<- data_forest %>%
up_winners pivot_wider(names_from = dataset,
values_from = c(beta, lower, upper)) %>%
left_join(elacoef %>% mutate(outcome = microbiome) %>% select(-microbiome),
by = 'outcome') %>%
filter(beta_CZ > 0,
> 0,
beta_IT > 0,
lower_Validation > 0.1) %>%
beta_ela pull(outcome)
<- data_forest %>%
down_winners pivot_wider(names_from = dataset,
values_from = c(beta, lower, upper)) %>%
left_join(elacoef %>% mutate(outcome = microbiome) %>% select(-microbiome),
by = 'outcome') %>%
filter(beta_CZ < 0,
< 0,
beta_IT < 0,
upper_Validation < -0.1) %>%
beta_ela pull(outcome)
<- c(up_winners, down_winners)
winners
<- data_forest %>%
data_forest mutate(in_winner = if_else(outcome %in% winners, TRUE, FALSE, missing = FALSE)) %>%
left_join(
%>% mutate(outcome = microbiome) %>% select(-microbiome),
elacoef by = 'outcome') %>%
mutate(outcome = factor(outcome, levels = validation_order))
6.2.2.2 Plotting
Open code
<- "forest_plot_microbiome_SGB10"
plotac <- "gitignore/figures"
path
<- c("CZ" = "#150999", "IT" = "#329243", "Validation" = "grey60")
colors
assign(plotac, ggplot(
aes(x = outcome, y = beta, ymin = lower, ymax = upper, color = dataset)
data_forest, +
) geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
geom_hline(yintercept = 0, color = "black") +
geom_errorbar(position = position_dodge(width = 0.5), width = 0.2) +
scale_color_manual(values = colors) +
labs(
y = "Effect of vegan diet on CLR-transformed relative abundance",
x = "Outcome",
color = "Dataset"
+
) theme_minimal() +
coord_flip() + # Flip coordinates to have outcomes on the y-axis
scale_x_discrete(
labels = setNames(
ifelse(data_forest$in_winner,
paste0("**", data_forest$outcome, "**"),
as.character(data_forest$outcome)
$outcome
), data_forest
)+
) theme(
axis.text.x = element_text(size = 10),
axis.text.y = ggtext::element_markdown(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
get(plotac)
Diet
, Country
, and the interaction term Diet x Country
as fixed-effect predictors. In the independent Czech validation cohort, Diet
was the only fixed-effect predictor. Taxa validated in the linear model and showing predictive power in the elastic net model (|β| > 0.1) are boldOpen code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 9,
height = 13
) }
6.2.3 Boxplot
Open code
<- "boxplot_microbiome_SGB10"
plotac <- "gitignore/figures"
path
<- c('#F9FFAF','#329243')
colo
<- function(variable) {
boxplot_cond
<- ggboxplot(data_merged,
p x = 'Diet',
y = variable,
fill = 'Diet',
tip.length = 0.15,
palette = colo,
outlier.shape = 1,
lwd = 0.25,
outlier.size = 0.8,
facet.by = 'Data',
title = variable,
ylab = 'CLR(taxa proportion)') +
theme(
plot.title = element_text(size = 10),
axis.title = element_text(size = 8),
axis.text.y = element_text(size = 7),
axis.text.x = element_blank(),
axis.title.x = element_blank()
) return(p)
}
# Plot all outcomes
<- map(diet_sensitive_taxa, boxplot_cond)
plots
# Create a matrix of plots
assign(plotac,
ggarrange(plotlist = plots, ncol = 7, nrow = 11, common.legend = TRUE)
)
get(plotac)
Open code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 9,
height = 12
) }
7 Linear model VG duration
Next, we fit another series of linear models, this time modelling clr-transformed taxa proportions (derived from shotgun metagenomic sequencing) using the following fixed-effect predictors: duration of vegan status (Diet_duration
, scaled in tens of years), Country
, their interaction (Diet_duration × Country
), and Age
:
\[ \text{CLR(taxa proportion)} = \alpha + \beta_{1} \times \text{Country} + \beta_{2} \times \text{Diet duration} + \beta_{3} \times (\text{Country}:\text{Diet duration}) + \beta_{4} \times \text{Age} + \epsilon \]
This analysis includes only vegan participants, while omnivores are excluded. The aim was to test whether specific taxa that differ between vegans and omnivores also vary within the vegan group itself, depending on how long participants have been vegan. In other words, we asked whether long-term vegans show stronger enrichment or depletion of diet-sensitive taxa compared to those who adopted the diet more recently.
Because longer vegan duration is correlated with age, we also adjusted for age in the models.
7.1 Training cohort
7.1.1 Select data
Open code
<- data_microbiome_original %>%
data_analysis na.omit() %>%
mutate(Country_IT = as.numeric(
::if_else(
dplyr== "IT", 0.5, -0.5
Country
)
)%>%
) filter(Diet == 'VEGAN',
!is.na(Diet_duration)) %>%
select(1:5, Country_IT, all_of(diet_sensitive_taxa))
summary(data_analysis[ , 1:12])
## Sample Country Diet Diet_duration
## Length:96 Length:96 Length:96 Min. : 0.600
## Class :character Class :character Class :character 1st Qu.: 3.150
## Mode :character Mode :character Mode :character Median : 5.000
## Mean : 5.492
## 3rd Qu.: 6.000
## Max. :20.000
## Age Country_IT GGB1420_SGB1957|SGB1957
## Min. :18.25 Min. :-0.5000 Min. :-3.2836
## 1st Qu.:27.57 1st Qu.:-0.5000 1st Qu.:-1.5577
## Median :32.15 Median :-0.5000 Median :-1.2513
## Mean :34.97 Mean :-0.1042 Mean :-0.8041
## 3rd Qu.:40.77 3rd Qu.: 0.5000 3rd Qu.:-0.7237
## Max. :60.70 Max. : 0.5000 Max. : 5.7246
## Escherichia_coli|SGB10068 Bacteroides_clarus|SGB1832
## Min. :-2.569 Min. :-3.267
## 1st Qu.:-1.673 1st Qu.:-2.635
## Median : 1.455 Median :-2.251
## Mean : 1.261 Mean :-1.010
## 3rd Qu.: 3.479 3rd Qu.:-1.399
## Max. : 8.535 Max. : 8.931
## Hydrogenoanaerobacterium_saccharovorans|SGB15350 GGB9775_SGB15395|SGB15395
## Min. :-2.933 Min. :-2.2689
## 1st Qu.:-1.696 1st Qu.:-1.6780
## Median : 2.875 Median :-1.2936
## Mean : 1.742 Mean :-0.6410
## 3rd Qu.: 4.252 3rd Qu.:-0.7807
## Max. : 7.594 Max. : 6.2006
## Ruthenibacterium_lactatiformans|SGB15271
## Min. :-2.921
## 1st Qu.: 2.375
## Median : 3.622
## Mean : 3.269
## 3rd Qu.: 4.595
## Max. : 8.096
7.1.2 Define number of microbiome and covariates
Open code
<- 6
n_covarites <- ncol(data_analysis) - n_covarites n_features
7.1.3 Create empty objects
Open code
<- vector('double', n_features)
outcome
<- vector('double', n_features)
est_Diet_duration_inCZ <- vector('double', n_features)
est_Diet_duration_inIT <- vector('double', n_features)
est_Diet_duration_avg
<- vector('double', n_features)
est_ITcountry_avg <- vector('double', n_features)
diet_country_int
<- vector('double', n_features)
est_Age
<- vector('double', n_features)
P_Diet_duration_inCZ <- vector('double', n_features)
P_Diet_duration_inIT <- vector('double', n_features)
P_Diet_duration_avg
<- vector('double', n_features)
P_ITcountry_avg <- vector('double', n_features)
P_Age
<- vector('double', n_features)
P_diet_country_int
<- vector('double', n_features)
CI_L_Diet_duration_inCZ <- vector('double', n_features)
CI_L_Diet_duration_inIT <- vector('double', n_features)
CI_L_Diet_duration_avg
<- vector('double', n_features)
CI_U_Diet_duration_inCZ <- vector('double', n_features)
CI_U_Diet_duration_inIT <- vector('double', n_features) CI_U_Diet_duration_avg
7.1.4 Estimate over outcomes
Open code
if(file.exists('gitignore/lm_results/result_microbiome_SGB10_VGdur_training.Rds') == FALSE){
for (i in 1:n_features) {
## define variable
$outcome <- data_analysis[, (i + n_covarites)]
data_analysis
## fit model
<- lm(outcome ~ Country_IT * Diet_duration + Age,
model data = data_analysis %>%
mutate(Diet_duration = (Diet_duration/10),
Age = (Age-33)/10))
## get contrast (effects of diet BY COUNTRY)
<- summary(
contrast_emm pairs(
emmeans(
model,specs = ~ Diet_duration | Country_IT,
at = list(Diet_duration = c(0, 1))
),interaction = TRUE,
adjust = "none"
),infer = c(TRUE, TRUE)
)
## save results
<- names(data_analysis)[i + n_covarites]
outcome[i]
## country and age effect
<- summary(model)$coefficients[
est_ITcountry_avg[i] which(
names(model$coefficients) == "Country_IT"
1
),
]
<- summary(model)$coefficients[
P_ITcountry_avg[i] which(
names(model$coefficients) == "Country_IT"
4
),
]
<- summary(model)$coefficients[
est_Age[i] which(
names(model$coefficients) == "Age"
1
),
]
<- summary(model)$coefficients[
P_Age[i] which(
names(model$coefficients) == "Age"
4
),
]
## diet effect
<- confint(model)
tr
<- tr[which(row.names(tr) == 'Diet_duration'),][1]
CI_L_Diet_duration_avg[i] <- tr[which(row.names(tr) == 'Diet_duration'),][2]
CI_U_Diet_duration_avg[i]
<- summary(model)$coefficients[
est_Diet_duration_avg[i] which(
names(model$coefficients) == "Diet_duration"
1
),
]
<- summary(model)$coefficients[
P_Diet_duration_avg[i] which(
names(model$coefficients) == "Diet_duration"
4
),
]
<- -contrast_emm[1,3]
est_Diet_duration_inCZ[i] <- contrast_emm$p.value[1]
P_Diet_duration_inCZ[i] <- -contrast_emm$upper.CL[1]
CI_L_Diet_duration_inCZ[i] <- -contrast_emm$lower.CL[1]
CI_U_Diet_duration_inCZ[i]
<- -contrast_emm[2,3]
est_Diet_duration_inIT[i] <- contrast_emm$p.value[2]
P_Diet_duration_inIT[i] <- -contrast_emm$upper.CL[2]
CI_L_Diet_duration_inIT[i] <- -contrast_emm$lower.CL[2]
CI_U_Diet_duration_inIT[i]
## interaction
<- summary(model)$coefficients[
diet_country_int[i] which(
names(model$coefficients) == "Country_IT:Diet_duration"
1
),
]
<- summary(model)$coefficients[
P_diet_country_int[i] which(
names(model$coefficients) == "Country_IT:Diet_duration"
4
),
]
}
<- data.frame(
result_microbiome
outcome,
est_ITcountry_avg, P_ITcountry_avg,
est_Age, P_Age,
est_Diet_duration_avg, P_Diet_duration_avg,
est_Diet_duration_inCZ, P_Diet_duration_inCZ,
est_Diet_duration_inIT, P_Diet_duration_inIT,
diet_country_int, P_diet_country_int,
CI_L_Diet_duration_avg, CI_U_Diet_duration_avg,
CI_L_Diet_duration_inCZ, CI_U_Diet_duration_inCZ,
CI_L_Diet_duration_inIT, CI_U_Diet_duration_inIT
)
saveRDS(result_microbiome,
'gitignore/lm_results/result_microbiome_SGB10_VGdur_training.Rds')
}
<- readRDS('gitignore/lm_results/result_microbiome_SGB10_VGdur_training.Rds') result_microbiome
7.1.5 Show and save results
Open code
::kable(result_microbiome %>%
kableExtra::select(
dplyr
outcome,
est_ITcountry_avg, P_ITcountry_avg,
est_Age, P_Age,
est_Diet_duration_avg, P_Diet_duration_avg,
est_Diet_duration_inCZ, P_Diet_duration_inCZ,
est_Diet_duration_inIT, P_Diet_duration_inIT,
diet_country_int, P_diet_country_int,
CI_L_Diet_duration_avg, CI_U_Diet_duration_avg,
CI_L_Diet_duration_inCZ, CI_U_Diet_duration_inCZ,
CI_L_Diet_duration_inIT, CI_U_Diet_duration_inIT%>% arrange(P_Diet_duration_avg),
) caption = "Results of linear models, modelling CLR-transformed relative abundances of taxa (inferred from shotgun metagenomic sequencing) as the outcome, with vegan status duration (`Diet_duration`), `Country`, their interaction (`Diet_duration × Country`), and `Age` as fixed-effect predictors, using training data only (Czech and Italian vegan cohorts). Only taxa with CLR-transformed relative abundances differing by diet (FDR < 0.05, average effect across both countries) were included. `est`: denotes estimated effects (regression coefficients), i.e. expected change in clr-transformed relative abundnace per 10 years of vegan diet or age, or for country (Italy vs Czechia). `P`: p-value; `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval. `avg` suffix: effect averaged across cohorts, whereas `inCZ` and `inIT` indicate effects in the Czech or Italian cohort, respectively. Interaction effects are reported as `diet_country_int` (difference in the effect of vegan diet duration between Italian and Czech cohorts; positive values indicate a stronger effect in Italian, negative values a stronger effect in Czech cohort) and `P_diet_country_int` (its p-value). All estimates in a single row are based on a single model with interaction",
escape = FALSE
)
outcome | est_ITcountry_avg | P_ITcountry_avg | est_Age | P_Age | est_Diet_duration_avg | P_Diet_duration_avg | est_Diet_duration_inCZ | P_Diet_duration_inCZ | est_Diet_duration_inIT | P_Diet_duration_inIT | diet_country_int | P_diet_country_int | CI_L_Diet_duration_avg | CI_U_Diet_duration_avg | CI_L_Diet_duration_inCZ | CI_U_Diet_duration_inCZ | CI_L_Diet_duration_inIT | CI_U_Diet_duration_inIT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Eubacterium_ramulus|SGB4959 | -2.2634807 | 0.0244721 | -0.0797104 | 0.7729521 | 2.2299335 | 0.0173995 | 1.5341864 | 0.1003445 | 2.9256805 | 0.0507039 | 1.3914941 | 0.3979552 | 0.4013854 | 4.0584816 | -0.3015733 | 3.3699462 | -0.0091553 | 5.8605164 |
GGB9634_SGB15097|SGB15097 | 1.7359384 | 0.1308107 | 0.5207463 | 0.1038708 | -2.3902075 | 0.0264429 | -1.0117831 | 0.3439351 | -3.7686319 | 0.0291457 | -2.7568488 | 0.1471384 | -4.4944164 | -0.2859986 | -3.1242909 | 1.1007246 | -7.1459058 | -0.3913581 |
Youxingia_wuxianensis|SGB82503 | 1.2702346 | 0.1389459 | 0.6703032 | 0.0057350 | -1.6583578 | 0.0389724 | -0.8482321 | 0.2886903 | -2.4684834 | 0.0551437 | -1.6202512 | 0.2532099 | -3.2309213 | -0.0857943 | -2.4269977 | 0.7305334 | -4.9924617 | 0.0554949 |
Bacteroides_clarus|SGB1832 | -1.1466953 | 0.3270097 | 0.0344925 | 0.9154410 | 2.2660085 | 0.0391244 | -0.6619102 | 0.5440442 | 5.1939272 | 0.0035974 | 5.8558374 | 0.0030987 | 0.1155224 | 4.4164946 | -2.8208777 | 1.4970572 | 1.7423780 | 8.6454764 |
GGB6601_SGB9333|SGB9333 | 1.5659553 | 0.0560300 | 0.4354908 | 0.0562942 | -1.4037929 | 0.0654179 | -0.3456940 | 0.6484441 | -2.4618919 | 0.0444814 | -2.1161979 | 0.1176887 | -2.8990249 | 0.0914390 | -1.8468230 | 1.1554351 | -4.8617524 | -0.0620314 |
Butyricicoccus_intestinisimiae|SGB14980 | -0.1686275 | 0.8628204 | 0.0144102 | 0.9577000 | 1.5314225 | 0.0941914 | 1.4874167 | 0.1052312 | 1.5754283 | 0.2811939 | 0.0880116 | 0.9565668 | -0.2671265 | 3.3299714 | -0.3182256 | 3.2930589 | -1.3112586 | 4.4621151 |
GGB9531_SGB14932|SGB14932 | 0.1331096 | 0.8540136 | -0.1859339 | 0.3569904 | 1.0435547 | 0.1234520 | 0.2239366 | 0.7403926 | 1.8631728 | 0.0870892 | 1.6392361 | 0.1733626 | -0.2896207 | 2.3767301 | -1.1144967 | 1.5623700 | -0.2765855 | 4.0029310 |
GGB9775_SGB15395|SGB15395 | 2.2015805 | 0.0048615 | 0.0708877 | 0.7392491 | -1.0199179 | 0.1540373 | -0.0709296 | 0.9209050 | -1.9689062 | 0.0872278 | -1.8979765 | 0.1363424 | -2.4293779 | 0.3895421 | -1.4859484 | 1.3440892 | -4.2311019 | 0.2932895 |
Lachnospiraceae_bacterium_OF09_6|SGB4966 | -1.7721252 | 0.0056195 | 0.0804810 | 0.6446641 | 0.8302965 | 0.1565775 | -0.0941024 | 0.8722446 | 1.7546954 | 0.0631838 | 1.8487978 | 0.0772546 | -0.3242759 | 1.9848688 | -1.2532283 | 1.0650235 | -0.0984034 | 3.6077942 |
GGB3627_SGB4906|SGB4906 | 0.7888994 | 0.3736836 | 0.1178995 | 0.6324511 | -1.1562688 | 0.1624408 | -0.4923438 | 0.5517794 | -1.8201937 | 0.1705712 | -1.3278499 | 0.3659226 | -2.7871264 | 0.4745889 | -2.1296334 | 1.1449459 | -4.4377347 | 0.7973472 |
Phocea_massiliensis|SGB14837 | 0.3300473 | 0.6109601 | -0.2128508 | 0.2400983 | 0.8269831 | 0.1725872 | 0.3016762 | 0.6186161 | 1.3522900 | 0.1647316 | 1.0506138 | 0.3290696 | -0.3679323 | 2.0218985 | -0.8979519 | 1.5013042 | -0.5655597 | 3.2701398 |
Roseburia_lenta|SGB4957 | -1.1485043 | 0.1939312 | -0.2188754 | 0.3727034 | 1.0908033 | 0.1849003 | 1.7917238 | 0.0314002 | 0.3898828 | 0.7667576 | -1.4018410 | 0.3372956 | -0.5310858 | 2.7126924 | 0.1634381 | 3.4200095 | -2.2132635 | 2.9930291 |
Blautia_obeum|SGB4810 | -0.8297276 | 0.4278795 | -0.7873716 | 0.0079402 | 1.2678961 | 0.1941701 | 1.7075163 | 0.0826944 | 0.8282758 | 0.5957624 | -0.8792405 | 0.6115544 | -0.6575878 | 3.1933800 | -0.2255615 | 3.6405942 | -2.2621429 | 3.9186945 |
GGB58158_SGB79798|SGB79798 | 1.0617257 | 0.1138545 | -0.2375929 | 0.2026724 | -0.7705849 | 0.2161940 | -0.4403313 | 0.4802370 | -1.1008385 | 0.2705809 | -0.6605072 | 0.5501655 | -1.9996780 | 0.4585082 | -1.6742719 | 0.7936093 | -3.0735438 | 0.8718668 |
GGB3061_SGB4063|SGB4063 | 1.1735315 | 0.0535761 | 0.0537967 | 0.7481819 | -0.6384007 | 0.2558494 | -0.4098133 | 0.4665708 | -0.8669881 | 0.3358475 | -0.4571747 | 0.6465657 | -1.7474133 | 0.4706119 | -1.5231998 | 0.7035731 | -2.6469630 | 0.9129869 |
GGB4596_SGB6358|SGB6358 | -1.9610659 | 0.1923950 | -0.2883472 | 0.4897080 | 1.5541227 | 0.2662506 | 0.0808441 | 0.9539071 | 3.0274014 | 0.1779337 | 2.9465572 | 0.2365261 | -1.2056178 | 4.3138632 | -2.6897806 | 2.8514688 | -1.4020065 | 7.4568092 |
Agathobaculum_butyriciproducens|SGB14991 | -1.1348889 | 0.4267109 | -0.5289592 | 0.1846672 | 1.4471894 | 0.2767172 | -0.0135419 | 0.9918845 | 2.9079207 | 0.1740670 | 2.9214626 | 0.2177453 | -1.1797851 | 4.0741638 | -2.6508770 | 2.6237931 | -1.3083965 | 7.1242378 |
GGB3000_SGB3991|SGB3991 | -0.5273550 | 0.4310127 | 0.0366138 | 0.8440666 | -0.6318298 | 0.3111029 | -0.6795404 | 0.2780696 | -0.5841193 | 0.5588552 | 0.0954211 | 0.9313164 | -1.8639887 | 0.6003291 | -1.9165588 | 0.5574781 | -2.5617452 | 1.3935067 |
Guopingia_tenuis|SGB14127 | 2.8001581 | 0.0000154 | 0.0137690 | 0.9358774 | -0.5656660 | 0.3239461 | 0.2917445 | 0.6116400 | -1.4230765 | 0.1235319 | -1.7148210 | 0.0946077 | -1.6986314 | 0.5672995 | -0.8456892 | 1.4291783 | -3.2414960 | 0.3953430 |
Streptococcus_thermophilus|SGB8002 | 0.1750854 | 0.8104295 | -0.1641650 | 0.4199343 | 0.6677841 | 0.3266550 | 0.8055637 | 0.2391067 | 0.5300045 | 0.6269561 | -0.2755592 | 0.8196616 | -0.6772625 | 2.0128307 | -0.5447877 | 2.1559151 | -1.6288072 | 2.6888162 |
GGB3491_SGB4665|SGB4665 | -0.8105088 | 0.4893210 | 0.4918365 | 0.1337120 | -0.9883638 | 0.3652932 | 0.3514820 | 0.7479662 | -2.3282096 | 0.1850790 | -2.6796916 | 0.1691351 | -3.1460973 | 1.1693697 | -1.8147614 | 2.5177255 | -5.7913909 | 1.1349718 |
GGB9781_SGB66170|SGB66170 | 0.6819760 | 0.1162203 | -0.1047004 | 0.3841127 | -0.3440829 | 0.3920305 | 0.2008890 | 0.6181777 | -0.8890549 | 0.1695806 | -1.0899440 | 0.1293301 | -1.1387940 | 0.4506282 | -0.5969563 | 0.9987344 | -2.1645733 | 0.3864634 |
GGB9545_SGB14952|SGB14952 | -0.4962381 | 0.2723944 | 0.1692543 | 0.1794619 | -0.3562489 | 0.3964232 | -0.4925713 | 0.2436705 | -0.2199265 | 0.7438712 | 0.2726448 | 0.7149381 | -1.1867691 | 0.4742712 | -1.3263670 | 0.3412244 | -1.5529187 | 1.1130657 |
Lachnospiraceae_bacterium_OM04_12BH|SGB4893 | -1.4886652 | 0.1852244 | 0.2179337 | 0.4844928 | 0.8749422 | 0.4012757 | 0.3228170 | 0.7573322 | 1.4270673 | 0.3937119 | 1.1042503 | 0.5513446 | -1.1859841 | 2.9358684 | -1.7462374 | 2.3918714 | -1.8807375 | 4.7348721 |
GGB9616_SGB15051|SGB15051 | -0.9627153 | 0.4614339 | 0.6873363 | 0.0610358 | -1.0152803 | 0.4040259 | -1.7605192 | 0.1510474 | -0.2700414 | 0.8898122 | 1.4904778 | 0.4910210 | -3.4208484 | 1.3902879 | -4.1755747 | 0.6545363 | -4.1309995 | 3.5909167 |
GGB4569_SGB6312|SGB6312 | 0.4874583 | 0.6156552 | -0.0842579 | 0.7551783 | 0.7495444 | 0.4072820 | 1.6127155 | 0.0777146 | -0.1136268 | 0.9374961 | -1.7263423 | 0.2841795 | -1.0388044 | 2.5378932 | -0.1826864 | 3.4081174 | -2.9839424 | 2.7566888 |
Holdemania_filiformis|SGB4046 | -0.6539918 | 0.4812044 | 0.2026012 | 0.4333201 | 0.7148807 | 0.4081703 | 0.0074997 | 0.9930908 | 1.4222616 | 0.3057263 | 1.4147618 | 0.3579607 | -0.9940102 | 2.4237715 | -1.7081308 | 1.7231303 | -1.3205233 | 4.1650465 |
GGB3288_SGB4342|SGB4342 | -1.3645286 | 0.2106593 | 0.8352144 | 0.0067638 | 0.8129747 | 0.4216145 | 0.5151959 | 0.6115844 | 1.1107534 | 0.4937041 | 0.5955575 | 0.7404558 | -1.1874330 | 2.8133823 | -1.4931012 | 2.5234931 | -2.0999186 | 4.3214254 |
Lactococcus_lactis|SGB7985 | -0.6887284 | 0.0367230 | -0.0770226 | 0.3966650 | 0.2416963 | 0.4259940 | 0.0081042 | 0.9787517 | 0.4752885 | 0.3298111 | 0.4671844 | 0.3874451 | -0.3586888 | 0.8420815 | -0.5946489 | 0.6108572 | -0.4883350 | 1.4389120 |
GGB9509_SGB14906|SGB14906 | 0.4730263 | 0.5953464 | -0.0859382 | 0.7287893 | -0.6463411 | 0.4358089 | -1.2941676 | 0.1219636 | 0.0014855 | 0.9991081 | 1.2956532 | 0.3803275 | -2.2865643 | 0.9938821 | -2.9408598 | 0.3525245 | -2.6310872 | 2.6340582 |
Oscillibacter_valericigenes|SGB15076 | 1.7124668 | 0.0435626 | -0.2328092 | 0.3202100 | -0.6090124 | 0.4360298 | -0.2865574 | 0.7147092 | -0.9314674 | 0.4578627 | -0.6449100 | 0.6427029 | -2.1552530 | 0.9372282 | -1.8388963 | 1.2657814 | -3.4131972 | 1.5502624 |
Clostridium_fessum|SGB4705 | -1.3253754 | 0.2870331 | 0.1178601 | 0.7330878 | 0.8712197 | 0.4512270 | 0.9598642 | 0.4085204 | 0.7825751 | 0.6729651 | -0.1772891 | 0.9312536 | -1.4159941 | 3.1584334 | -1.3363701 | 3.2560986 | -2.8884231 | 4.4535733 |
GGB9765_SGB15382|SGB15382 | 0.6611998 | 0.3365931 | 0.0512005 | 0.7887754 | -0.4551136 | 0.4766406 | -0.5584230 | 0.3847135 | -0.3518043 | 0.7314898 | 0.2066186 | 0.8557582 | -1.7200664 | 0.8098391 | -1.8283646 | 0.7115186 | -2.3820646 | 1.6784560 |
Hydrogenoanaerobacterium_saccharovorans|SGB15350 | 2.1010775 | 0.0999120 | -0.2633877 | 0.4561183 | -0.8148049 | 0.4901774 | -0.3713756 | 0.7538286 | -1.2582343 | 0.5067191 | -0.8868587 | 0.6727946 | -3.1508725 | 1.5212627 | -2.7166565 | 1.9739053 | -5.0076434 | 2.4911749 |
GGB51884_SGB49168|SGB49168 | -1.0422911 | 0.2405985 | 0.3586399 | 0.1477642 | 0.5163300 | 0.5309754 | -0.5280498 | 0.5233464 | 1.5607098 | 0.2393218 | 2.0887597 | 0.1562966 | -1.1144372 | 2.1470972 | -2.1652486 | 1.1091490 | -1.0566859 | 4.1781056 |
Ruthenibacterium_lactatiformans|SGB15271 | 2.2001055 | 0.0171456 | -0.3591971 | 0.1578951 | 0.4922494 | 0.5607303 | 0.6647315 | 0.4342578 | 0.3197673 | 0.8137106 | -0.3449643 | 0.8186839 | -1.1823323 | 2.1668311 | -1.0164546 | 2.3459177 | -2.3679511 | 3.0074857 |
GGB9770_SGB15390|SGB15390 | 1.0814142 | 0.2124915 | -0.2363568 | 0.3269031 | 0.4501346 | 0.5756883 | 0.0502011 | 0.9503817 | 0.8500682 | 0.5103293 | 0.7998672 | 0.5763122 | -1.1416525 | 2.0419217 | -1.5478639 | 1.6482660 | -1.7047641 | 3.4049006 |
Lachnospiraceae_bacterium|SGB4953 | -1.3531550 | 0.3151562 | -0.0480135 | 0.8978560 | -0.6962323 | 0.5778283 | -0.9387056 | 0.4551058 | -0.4537590 | 0.8210771 | 0.4849466 | 0.8274593 | -3.1721696 | 1.7797050 | -3.4244078 | 1.5469966 | -4.4276602 | 3.5201422 |
Escherichia_coli|SGB10068 | -0.7903037 | 0.5155974 | -0.1016893 | 0.7636000 | 0.6292480 | 0.5778320 | -0.7439081 | 0.5123601 | 2.0024041 | 0.2710178 | 2.7463122 | 0.1741629 | -1.6085021 | 2.8669981 | -2.9904837 | 1.5026676 | -1.5892046 | 5.5940128 |
Clostridium_SGB48024|SGB48024 | 0.4519603 | 0.7520495 | -0.2487415 | 0.5325875 | 0.6917612 | 0.6034111 | 0.6256543 | 0.6397302 | 0.7578680 | 0.7227787 | 0.1322137 | 0.9554800 | -1.9440511 | 3.3275734 | -2.0205534 | 3.2718620 | -3.4726340 | 4.9883700 |
GGB9616_SGB15052|SGB15052 | 1.5297420 | 0.1826374 | -0.4748580 | 0.1377653 | -0.5353202 | 0.6147084 | 0.1558757 | 0.8838517 | -1.2265162 | 0.4727315 | -1.3823918 | 0.4655389 | -2.6405352 | 1.5698947 | -1.9576421 | 2.2693934 | -4.6054047 | 2.1523724 |
Clostridium_sp_AM29_11AC|SGB4701 | -0.4190217 | 0.2697393 | 0.0651278 | 0.5368505 | -0.1747140 | 0.6199382 | -0.3270107 | 0.3559815 | -0.0224173 | 0.9683535 | 0.3045934 | 0.6271187 | -0.8721016 | 0.5226735 | -1.0271488 | 0.3731273 | -1.1417306 | 1.0968959 |
GGB45432_SGB63101|SGB63101 | 0.9606443 | 0.1324255 | 0.1924325 | 0.2775483 | 0.2893623 | 0.6242376 | 0.3556845 | 0.5487960 | 0.2230400 | 0.8139216 | -0.1326445 | 0.8995423 | -0.8800204 | 1.4587450 | -0.8183101 | 1.5296792 | -1.6538295 | 2.0999096 |
GGB9719_SGB15273|SGB15273 | -0.1292524 | 0.8290187 | 0.1266493 | 0.4478674 | 0.2630463 | 0.6368129 | -0.4259368 | 0.4467818 | 0.9520293 | 0.2882147 | 1.3779661 | 0.1666098 | -0.8398756 | 1.3659682 | -1.5332085 | 0.6813350 | -0.8181701 | 2.7222287 |
GGB34900_SGB14891|SGB14891 | 0.1088754 | 0.8259804 | -0.0920715 | 0.5047089 | 0.2077756 | 0.6521532 | -0.0259277 | 0.9552936 | 0.4414790 | 0.5508391 | 0.4674067 | 0.5689881 | -0.7047852 | 1.1203365 | -0.9420876 | 0.8902323 | -1.0231893 | 1.9061473 |
GGB9782_SGB15403|SGB15403 | -0.4238866 | 0.4276985 | -0.0155148 | 0.9168117 | 0.2200436 | 0.6577252 | -0.2174608 | 0.6627331 | 0.6575480 | 0.4100537 | 0.8750087 | 0.3232823 | -0.7632503 | 1.2033375 | -1.2046327 | 0.7697112 | -0.9206474 | 2.2357434 |
GGB4682_SGB6472|SGB6472 | -1.3962184 | 0.1254699 | 0.0227795 | 0.9279899 | -0.3514603 | 0.6766431 | -0.8935609 | 0.2921482 | 0.1906403 | 0.8878666 | 1.0842012 | 0.4702135 | -2.0200519 | 1.3171313 | -2.5687333 | 0.7816116 | -2.4874640 | 2.8687446 |
GGB9301_SGB14262|SGB14262 | -0.3951466 | 0.4059058 | -0.0363331 | 0.7833466 | -0.1777303 | 0.6874059 | -0.2561443 | 0.5637002 | -0.0993163 | 0.8885431 | 0.1568279 | 0.8418310 | -1.0523149 | 0.6968542 | -1.1341781 | 0.6218896 | -1.5030323 | 1.3043996 |
Kineothrix_SGB4886|SGB4886 | -1.0280994 | 0.4334623 | 0.4198416 | 0.2515053 | 0.4526965 | 0.7104982 | 0.2621930 | 0.8303957 | 0.6431999 | 0.7424452 | 0.3810068 | 0.8606287 | -1.9623068 | 2.8676998 | -2.1623349 | 2.6867209 | -3.2329017 | 4.5193015 |
Mediterraneibacter_faecis|SGB4563 | -2.9720856 | 0.0052955 | 0.4182300 | 0.1521180 | -0.3473345 | 0.7205050 | -0.3661211 | 0.7071842 | -0.3285480 | 0.8329586 | 0.0375731 | 0.9826451 | -2.2697283 | 1.5750592 | -2.2960966 | 1.5638544 | -3.4140069 | 2.7569109 |
Blautia_SGB4833|SGB4833 | -1.1961469 | 0.1175504 | 0.0179656 | 0.9322538 | 0.2519153 | 0.7214130 | 0.1050509 | 0.8822227 | 0.3987797 | 0.7250761 | 0.2937289 | 0.8152682 | -1.1471075 | 1.6509381 | -1.2994895 | 1.5095913 | -1.8466641 | 2.6442236 |
Anaeromassilibacillus_senegalensis|SGB14894 | 1.9796230 | 0.0444432 | -0.2719918 | 0.3171672 | -0.3204024 | 0.7237513 | 0.1088187 | 0.9047941 | -0.7496234 | 0.6065347 | -0.8584420 | 0.5948488 | -2.1154954 | 1.4746907 | -1.6933541 | 1.9109914 | -3.6307635 | 2.1315168 |
GGB3490_SGB4664|SGB4664 | -2.7584379 | 0.0051133 | 0.0617538 | 0.8180402 | 0.3064942 | 0.7326367 | -0.5998586 | 0.5058074 | 1.2128470 | 0.4004068 | 1.8127056 | 0.2578328 | -1.4701785 | 2.0831668 | -2.3835383 | 1.1838211 | -1.6387283 | 4.0644222 |
Dorea_formicigenerans|SGB4575 | -3.0065286 | 0.0004844 | 0.3806610 | 0.1031793 | 0.2616408 | 0.7357134 | 0.7312537 | 0.3484094 | -0.2079721 | 0.8672102 | -0.9392258 | 0.4964350 | -1.2733945 | 1.7966760 | -0.8098356 | 2.2723430 | -2.6717172 | 2.2557730 |
Clostridiaceae_unclassified_SGB4771|SGB4771 | -1.6108668 | 0.2219337 | 0.4368638 | 0.2340318 | 0.3543923 | 0.7718632 | -1.1816205 | 0.3367074 | 1.8904052 | 0.3363654 | 3.0720257 | 0.1601031 | -2.0663366 | 2.7751213 | -3.6118966 | 1.2486556 | -1.9948862 | 5.7756965 |
GGB4604_SGB6371|SGB6371 | -2.2092325 | 0.0449440 | 0.0402337 | 0.8944859 | 0.2484285 | 0.8064369 | -0.6418589 | 0.5286993 | 1.1387160 | 0.4845959 | 1.7805749 | 0.3250066 | -1.7596804 | 2.2565374 | -2.6578877 | 1.3741699 | -2.0843166 | 4.3617485 |
GGB1420_SGB1957|SGB1957 | -0.2994382 | 0.6458058 | 0.2849549 | 0.1184456 | 0.1397471 | 0.8175945 | -0.7433843 | 0.2235075 | 1.0228785 | 0.2942814 | 1.7662628 | 0.1039283 | -1.0603405 | 1.3398346 | -1.9482050 | 0.4614363 | -0.9032726 | 2.9490296 |
GGB38744_SGB14842|SGB14842 | -0.0481380 | 0.9024854 | 0.0638117 | 0.5599891 | -0.0783072 | 0.8303891 | -0.0967062 | 0.7921843 | -0.0599081 | 0.9186683 | 0.0367981 | 0.9548954 | -0.8023938 | 0.6457795 | -0.8236486 | 0.6302362 | -1.2220736 | 1.1022574 |
Clostridium_methylpentosum|SGB14962 | -0.5624150 | 0.1825842 | -0.0335368 | 0.7742541 | 0.0826684 | 0.8324354 | -0.1882149 | 0.6315304 | 0.3535518 | 0.5731934 | 0.5417666 | 0.4366670 | -0.6912261 | 0.8565630 | -0.9651616 | 0.5887319 | -0.8885559 | 1.5956594 |
GGB2653_SGB3574|SGB3574 | 1.5518418 | 0.1614485 | -0.4716452 | 0.1267593 | -0.2090488 | 0.8384994 | 0.4483001 | 0.6634331 | -0.8663977 | 0.5989242 | -1.3146978 | 0.4720136 | -2.2406286 | 1.8225310 | -1.5912921 | 2.4878923 | -4.1271013 | 2.3943058 |
Clostridium_sp_AF32_12BH|SGB4710 | -1.2624833 | 0.0779989 | -0.0853467 | 0.6661600 | -0.1088297 | 0.8691888 | -0.1678652 | 0.8002667 | -0.0497943 | 0.9625519 | 0.1180709 | 0.9200342 | -1.4177682 | 1.2001087 | -1.4819660 | 1.1462356 | -2.1506521 | 2.0510635 |
GGB3363_SGB4448|SGB4448 | -0.8722215 | 0.3655405 | -0.1996082 | 0.4566678 | -0.1472339 | 0.8693142 | -0.7976302 | 0.3756334 | 0.5031624 | 0.7261667 | 1.3007926 | 0.4149270 | -1.9197901 | 1.6253223 | -2.5771773 | 0.9819168 | -2.3418060 | 3.3481307 |
Clostridium_SGB4751|SGB4751 | -1.0418126 | 0.2114167 | 0.0237241 | 0.9182359 | 0.1219428 | 0.8745470 | 0.4186829 | 0.5894977 | -0.1747974 | 0.8878620 | -0.5934802 | 0.6660818 | -1.4079194 | 1.6518049 | -1.1172129 | 1.9545787 | -2.6302396 | 2.2806449 |
Ruminococcus_torques|SGB4608 | -1.4586922 | 0.2487537 | -0.0298792 | 0.9321257 | 0.1753698 | 0.8810979 | -0.0650268 | 0.9559408 | 0.4157663 | 0.8251480 | 0.4807931 | 0.8177957 | -2.1469926 | 2.4977321 | -2.3965484 | 2.2664948 | -3.3116457 | 4.1431784 |
Ruminococcus_sp_AF41_9|SGB25497 | -1.9581259 | 0.1159202 | 0.0704723 | 0.8378831 | -0.1681373 | 0.8838588 | -0.3703484 | 0.7486453 | 0.0340738 | 0.9852834 | 0.4044221 | 0.8435129 | -2.4480518 | 2.1117772 | -2.6592547 | 1.9185579 | -3.6252091 | 3.6933566 |
Clostridiaceae_bacterium_CLA_AA_H274|SGB4704 | -1.1697254 | 0.1183491 | 0.0227932 | 0.9123736 | -0.0800245 | 0.9079618 | 0.0352389 | 0.9595563 | -0.1952879 | 0.8604734 | -0.2305268 | 0.8515799 | -1.4511615 | 1.2911124 | -1.3413058 | 1.4117835 | -2.3959748 | 2.0053990 |
Enterocloster_hominis|SGB4721 | 0.3608653 | 0.6472807 | -0.0365065 | 0.8678887 | -0.0753199 | 0.9182015 | 0.0466289 | 0.9495035 | -0.1972688 | 0.8669150 | -0.2438977 | 0.8517928 | -1.5280934 | 1.3774535 | -1.4118742 | 1.5051320 | -2.5289830 | 2.1344455 |
Collinsella_aerofaciens|SGB14535 | 0.5811178 | 0.3323737 | -0.1499484 | 0.3687864 | -0.0489983 | 0.9298182 | 0.2714336 | 0.6271981 | -0.3694302 | 0.6792067 | -0.6408639 | 0.5179718 | -1.1510321 | 1.0530355 | -0.8349465 | 1.3778138 | -2.1382042 | 1.3993437 |
GGB3740_SGB5076|SGB5076 | -2.6975736 | 0.0270404 | -0.1589040 | 0.6355782 | -0.0706479 | 0.9497001 | -1.3934332 | 0.2171458 | 1.2521374 | 0.4866240 | 2.6455705 | 0.1865461 | -2.2890898 | 2.1477940 | -3.6206245 | 0.8337581 | -2.3084815 | 4.8127562 |
Lachnoclostridium_sp_An138|SGB4774 | 0.3378946 | 0.5408997 | -0.2007474 | 0.1935681 | -0.0254781 | 0.9604370 | -0.0567166 | 0.9124181 | 0.0057603 | 0.9944247 | 0.0624769 | 0.9455131 | -1.0429108 | 0.9919545 | -1.0781619 | 0.9647288 | -1.6272281 | 1.6387487 |
Lawsonibacter_asaccharolyticus|SGB15154 | 4.0450013 | 0.0004181 | -0.3407995 | 0.2705301 | -0.0429466 | 0.9667473 | 0.5696470 | 0.5820968 | -0.6555401 | 0.6918870 | -1.2251871 | 0.5045351 | -2.0836643 | 1.9977711 | -1.4791192 | 2.6184131 | -3.9309100 | 2.6198298 |
Blautia_massiliensis|SGB4826 | -4.4798630 | 0.0033905 | -0.5202501 | 0.2126433 | 0.0445363 | 0.9744217 | -0.9567490 | 0.4932159 | 1.0458216 | 0.6391951 | 2.0025707 | 0.4187699 | -2.7069985 | 2.7960710 | -3.7191357 | 1.8056376 | -3.3704159 | 5.4620592 |
GGB9707_SGB15229|SGB15229 | 0.6596547 | 0.4596332 | 0.0782624 | 0.7523781 | -0.0046849 | 0.9954895 | -0.1855402 | 0.8235574 | 0.1761703 | 0.8946372 | 0.3617105 | 0.8063177 | -1.6463606 | 1.6369907 | -1.8336905 | 1.4626101 | -2.4587336 | 2.8110742 |
Open code
if(file.exists(
'gitignore/lm_results/result_microbiome_SGB10_VGdur_training.csv'
== FALSE){
) write.table(result_microbiome,
'gitignore/result_microbiome_SGB10_VGdur_training.csv',
row.names = FALSE)
}
7.2 Validation cohort
Open code
<- data_microbiome_validation %>%
data_analysis_microbiome filter(Diet == 'VEGAN',
!is.na(Diet_duration)) %>%
::mutate(Diet_duration = as.numeric(as.character(Diet_duration))) %>%
dplyr::select(
dplyr
Diet_duration, Age,all_of(diet_sensitive_taxa)
)
summary(data_analysis_microbiome[ , 1:12])
## Diet_duration Age GGB1420_SGB1957|SGB1957
## Min. : 0.170 Min. :21.54 Min. :-2.0851
## 1st Qu.: 4.062 1st Qu.:30.05 1st Qu.:-1.7612
## Median : 6.000 Median :32.81 Median :-1.4597
## Mean : 6.790 Mean :32.37 Mean :-0.7887
## 3rd Qu.:10.000 3rd Qu.:34.24 3rd Qu.:-0.7294
## Max. :15.000 Max. :44.08 Max. : 4.5965
## Escherichia_coli|SGB10068 Bacteroides_clarus|SGB1832
## Min. :-2.9814 Min. :-1.40490
## 1st Qu.:-2.3623 1st Qu.:-0.81198
## Median :-1.8436 Median :-0.58475
## Mean :-0.8066 Mean :-0.17755
## 3rd Qu.: 0.7762 3rd Qu.: 0.03368
## Max. : 5.0127 Max. : 5.07320
## Hydrogenoanaerobacterium_saccharovorans|SGB15350 GGB9775_SGB15395|SGB15395
## Min. :-0.6425 Min. :-3.445
## 1st Qu.: 0.5164 1st Qu.:-2.783
## Median : 1.6681 Median :-2.307
## Mean : 1.4066 Mean :-1.645
## 3rd Qu.: 2.2456 3rd Qu.:-1.295
## Max. : 4.1991 Max. : 2.865
## Ruthenibacterium_lactatiformans|SGB15271
## Min. :-2.4003
## 1st Qu.:-1.2716
## Median : 0.5358
## Mean : 0.4248
## 3rd Qu.: 1.6461
## Max. : 6.8260
## Lawsonibacter_asaccharolyticus|SGB15154 Clostridium_fessum|SGB4705
## Min. :-4.1071 Min. :-0.3956
## 1st Qu.:-2.2461 1st Qu.: 2.8100
## Median : 0.3093 Median : 3.6723
## Mean :-0.1859 Mean : 3.3043
## 3rd Qu.: 1.6178 3rd Qu.: 4.0954
## Max. : 3.4482 Max. : 6.8045
## GGB9707_SGB15229|SGB15229 Collinsella_aerofaciens|SGB14535
## Min. :-3.3451 Min. :-1.7233
## 1st Qu.:-1.9104 1st Qu.: 0.3225
## Median : 2.1067 Median : 5.2318
## Mean : 0.9578 Mean : 3.6460
## 3rd Qu.: 2.9921 3rd Qu.: 5.7957
## Max. : 5.9439 Max. : 8.0904
7.2.0.1 Define number of microbiome and covariates
Open code
<- 2
n_covarites <- ncol(data_analysis_microbiome) - n_covarites n_features
7.2.0.2 Create empty objects
Open code
<- vector('double', n_features)
outcome <- vector('double', n_features)
est_VGduration <- vector('double', n_features)
P_VGduration <- vector('double', n_features)
est_Age <- vector('double', n_features)
P_Age <- vector('double', n_features)
CI_L_VGduration <- vector('double', n_features) CI_U_VGduration
7.2.0.3 Estimate over outcomes
Open code
for (i in 1:n_features) {
## define variable
$outcome <- data_analysis_microbiome[, (i + n_covarites)]
data_analysis_microbiome
## fit model
<- lm(outcome ~ Diet_duration + Age,
model data = data_analysis_microbiome %>%
mutate(Diet_duration = Diet_duration/10,
Age = (Age-33)/10))
## save results
<- names(data_analysis_microbiome)[i + n_covarites]
outcome[i]
## diet effect
<- confint(model)
tr
<- tr[which(row.names(tr) == "Diet_duration"), ][1]
CI_L_VGduration[i] <- tr[which(row.names(tr) == "Diet_duration"), ][2]
CI_U_VGduration[i]
<- summary(model)$coefficients[
est_VGduration[i] which(
names(model$coefficients) == "Diet_duration"
1
),
]
<- summary(model)$coefficients[
P_VGduration[i] which(
names(model$coefficients) == "Diet_duration"
4
),
]
<- summary(model)$coefficients[
est_Age[i] which(
names(model$coefficients) == "Age"
1
),
]
<- summary(model)$coefficients[
P_Age[i] which(
names(model$coefficients) == "Age"
4
),
] }
7.2.0.4 Results table
Open code
<- data.frame(
result_microbiome_val
outcome,
est_VGduration, P_VGduration,
CI_L_VGduration, CI_U_VGduration,
est_Age, P_Age
)
::kable(result_microbiome_val %>% arrange(P_VGduration),
kableExtracaption = "Results of linear models estimating the effect of vegan diet duration on CLR-transformed relative abundances of taxa (inferred from shotgun metagenomic sequencing) as the outcome. Only taxa with significant differences between vegans and omnivores in training cohorts (FDR < 0.05, average effect over both countries) were included. `est`: regression coefficient, i.e. expected change in CLR-transformed relative abundance per +10 years of vegan diet duration and age, respectively. `P`: p-value; `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval. All estimates in a single row are based on a single model"
)
outcome | est_VGduration | P_VGduration | CI_L_VGduration | CI_U_VGduration | est_Age | P_Age |
---|---|---|---|---|---|---|
Bacteroides_clarus|SGB1832 | -1.3116349 | 0.0256259 | -2.4569307 | -0.1663391 | 0.2864820 | 0.5560623 |
GGB1420_SGB1957|SGB1957 | -1.5377833 | 0.0404331 | -3.0058962 | -0.0696704 | 0.8709319 | 0.1659532 |
Roseburia_lenta|SGB4957 | 2.5621762 | 0.0479091 | 0.0244968 | 5.0998555 | -0.9569653 | 0.3758384 |
GGB9719_SGB15273|SGB15273 | -0.4464868 | 0.0489677 | -0.8908756 | -0.0020981 | 0.1788676 | 0.3447953 |
GGB3740_SGB5076|SGB5076 | -2.3724685 | 0.0695200 | -4.9414079 | 0.1964708 | 0.1074429 | 0.9214586 |
Lachnospiraceae_bacterium_OM04_12BH|SGB4893 | 1.8311048 | 0.0730549 | -0.1773846 | 3.8395941 | -1.4368845 | 0.0961959 |
GGB38744_SGB14842|SGB14842 | 1.0226307 | 0.0949474 | -0.1839934 | 2.2292548 | 0.0972689 | 0.8492989 |
Clostridiaceae_bacterium_CLA_AA_H274|SGB4704 | -0.5968727 | 0.1157674 | -1.3457747 | 0.1520293 | 0.3136219 | 0.3258177 |
GGB9775_SGB15395|SGB15395 | 1.1556507 | 0.1190987 | -0.3079122 | 2.6192136 | -0.6311460 | 0.3117642 |
Clostridium_methylpentosum|SGB14962 | -0.3300580 | 0.1433697 | -0.7758914 | 0.1157754 | -0.0422818 | 0.8231243 |
Ruminococcus_sp_AF41_9|SGB25497 | -1.7752602 | 0.1684136 | -4.3261294 | 0.7756089 | 1.4894188 | 0.1725989 |
Mediterraneibacter_faecis|SGB4563 | 0.8760938 | 0.1911025 | -0.4514032 | 2.2035908 | 0.3332206 | 0.5546844 |
Blautia_obeum|SGB4810 | -1.3132072 | 0.1919543 | -3.3069235 | 0.6805090 | 1.3406263 | 0.1173265 |
Phocea_massiliensis|SGB14837 | -1.0053122 | 0.1955656 | -2.5442082 | 0.5335838 | 0.2422542 | 0.7107319 |
GGB9634_SGB15097|SGB15097 | -0.3351536 | 0.2028299 | -0.8566955 | 0.1863883 | 0.0641197 | 0.7720292 |
GGB4569_SGB6312|SGB6312 | -0.3071262 | 0.2438905 | -0.8301199 | 0.2158675 | 0.3005928 | 0.1792893 |
Clostridium_fessum|SGB4705 | -0.8168644 | 0.2486182 | -2.2220521 | 0.5883233 | 0.5179845 | 0.3865860 |
GGB6601_SGB9333|SGB9333 | -0.5479593 | 0.2655503 | -1.5251707 | 0.4292520 | -0.0665058 | 0.8725455 |
Clostridium_SGB48024|SGB48024 | 1.0375110 | 0.2893262 | -0.9077645 | 2.9827866 | 0.0231073 | 0.9776587 |
Butyricicoccus_intestinisimiae|SGB14980 | -0.2834320 | 0.3030763 | -0.8303669 | 0.2635030 | -0.1620176 | 0.4859937 |
Dorea_formicigenerans|SGB4575 | -0.3862337 | 0.3224408 | -1.1622925 | 0.3898250 | -0.0016379 | 0.9960299 |
GGB9781_SGB66170|SGB66170 | 0.6408836 | 0.3395612 | -0.6936984 | 1.9754656 | 0.0388259 | 0.9453205 |
Oscillibacter_valericigenes|SGB15076 | -0.3615813 | 0.3398265 | -1.1149593 | 0.3917966 | 0.1055871 | 0.7412226 |
Hydrogenoanaerobacterium_saccharovorans|SGB15350 | 0.4645051 | 0.3403956 | -0.5044727 | 1.4334829 | -0.8286272 | 0.0480144 |
Clostridium_sp_AM29_11AC|SGB4701 | -0.1449633 | 0.3456100 | -0.4506773 | 0.1607508 | -0.0147491 | 0.9094505 |
GGB4604_SGB6371|SGB6371 | -0.4546108 | 0.3634033 | -1.4497721 | 0.5405505 | 0.2683480 | 0.5257854 |
Ruthenibacterium_lactatiformans|SGB15271 | 0.7927935 | 0.3884980 | -1.0371015 | 2.6226885 | -0.1093027 | 0.8880232 |
GGB3061_SGB4063|SGB4063 | -0.4736801 | 0.3994293 | -1.5927615 | 0.6454014 | 0.3599699 | 0.4495312 |
GGB3288_SGB4342|SGB4342 | 0.8072622 | 0.3995920 | -1.1005789 | 2.7151032 | 1.0061396 | 0.2172203 |
GGB9616_SGB15052|SGB15052 | 0.6446913 | 0.4526954 | -1.0656644 | 2.3550470 | -1.3333577 | 0.0705644 |
Holdemania_filiformis|SGB4046 | -0.4785331 | 0.4581304 | -1.7635596 | 0.8064934 | 0.2488215 | 0.6483724 |
Lawsonibacter_asaccharolyticus|SGB15154 | 0.7768444 | 0.4601655 | -1.3187770 | 2.8724657 | -1.1605956 | 0.1953730 |
Anaeromassilibacillus_senegalensis|SGB14894 | 0.6205170 | 0.4674442 | -1.0810253 | 2.3220594 | -0.2137758 | 0.7671808 |
Clostridium_SGB4751|SGB4751 | -0.3261207 | 0.4789721 | -1.2441147 | 0.5918733 | 0.1212707 | 0.7555793 |
Streptococcus_thermophilus|SGB8002 | 0.5809667 | 0.4861402 | -1.0815170 | 2.2434503 | -1.1984551 | 0.0937437 |
Clostridiaceae_unclassified_SGB4771|SGB4771 | 0.6025768 | 0.4925833 | -1.1476383 | 2.3527919 | -0.1871933 | 0.8009868 |
Guopingia_tenuis|SGB14127 | 0.6224428 | 0.4925961 | -1.1855279 | 2.4304135 | 0.3826205 | 0.6182632 |
GGB4596_SGB6358|SGB6358 | 0.3536710 | 0.4928491 | -0.6742188 | 1.3815607 | -0.3408037 | 0.4357955 |
Lactococcus_lactis|SGB7985 | -0.1897864 | 0.5116059 | -0.7662295 | 0.3866566 | -0.3153081 | 0.2008496 |
Blautia_SGB4833|SGB4833 | -0.4940465 | 0.5136020 | -2.0017777 | 1.0136846 | -0.9149797 | 0.1566206 |
Lachnospiraceae_bacterium|SGB4953 | -1.0071781 | 0.5303928 | -4.2079649 | 2.1936088 | 0.9428187 | 0.4884436 |
Collinsella_aerofaciens|SGB14535 | 0.8355618 | 0.5429085 | -1.9030123 | 3.5741360 | 1.2375478 | 0.2893924 |
GGB9545_SGB14952|SGB14952 | 0.2113411 | 0.5501333 | -0.4939692 | 0.9166514 | 0.1588378 | 0.5959824 |
GGB51884_SGB49168|SGB49168 | 0.3902238 | 0.5522435 | -0.9190182 | 1.6994658 | 0.0955302 | 0.8634413 |
GGB2653_SGB3574|SGB3574 | 0.4996257 | 0.5631667 | -1.2239706 | 2.2232220 | 0.6483113 | 0.3770504 |
Blautia_massiliensis|SGB4826 | -0.5656977 | 0.5760979 | -2.5839417 | 1.4525463 | -0.3690641 | 0.6666765 |
GGB9301_SGB14262|SGB14262 | -0.1436191 | 0.5893442 | -0.6743930 | 0.3871547 | 0.0772252 | 0.7317400 |
GGB9509_SGB14906|SGB14906 | 0.2756730 | 0.6089385 | -0.7994440 | 1.3507900 | -0.6722015 | 0.1446896 |
Escherichia_coli|SGB10068 | 0.4677874 | 0.6291247 | -1.4649917 | 2.4005666 | 0.2873403 | 0.7261320 |
GGB9770_SGB15390|SGB15390 | 0.3771049 | 0.6337826 | -1.2024667 | 1.9566765 | -0.1000911 | 0.8812607 |
GGB3490_SGB4664|SGB4664 | -0.3982692 | 0.6379584 | -2.0872922 | 1.2907539 | 0.2018839 | 0.7781896 |
GGB3627_SGB4906|SGB4906 | -0.4340199 | 0.6462370 | -2.3211128 | 1.4530729 | -0.6915129 | 0.3893794 |
GGB3363_SGB4448|SGB4448 | 0.1481787 | 0.6496693 | -0.5028794 | 0.7992368 | -0.3273313 | 0.2391208 |
Agathobaculum_butyriciproducens|SGB14991 | -0.4862146 | 0.6557303 | -2.6628755 | 1.6904462 | 1.2022907 | 0.1965371 |
Eubacterium_ramulus|SGB4959 | 0.2791725 | 0.6612571 | -0.9924393 | 1.5507843 | 0.2740675 | 0.6118263 |
GGB34900_SGB14891|SGB14891 | -0.2435558 | 0.6776238 | -1.4130127 | 0.9259010 | -0.1138686 | 0.8184873 |
Lachnoclostridium_sp_An138|SGB4774 | 0.1632043 | 0.7311783 | -0.7852630 | 1.1116716 | 0.2295562 | 0.5688969 |
Lachnospiraceae_bacterium_OF09_6|SGB4966 | -0.3084660 | 0.7447649 | -2.2004050 | 1.5834730 | 1.1604858 | 0.1522988 |
GGB9765_SGB15382|SGB15382 | 0.3028787 | 0.7507575 | -1.6010943 | 2.2068517 | 0.4884158 | 0.5460580 |
GGB9782_SGB15403|SGB15403 | -0.0967102 | 0.7546688 | -0.7146873 | 0.5212670 | -0.1567114 | 0.5506469 |
GGB4682_SGB6472|SGB6472 | -0.1737535 | 0.7857599 | -1.4503017 | 1.1027948 | -0.4738158 | 0.3833382 |
Youxingia_wuxianensis|SGB82503 | -0.1978751 | 0.7878078 | -1.6660328 | 1.2702826 | -0.2671297 | 0.6682418 |
GGB3491_SGB4665|SGB4665 | -0.2047389 | 0.7973044 | -1.7967477 | 1.3872700 | 0.4716595 | 0.4859371 |
GGB9707_SGB15229|SGB15229 | -0.2750773 | 0.8133382 | -2.6017973 | 2.0516427 | 1.6353199 | 0.1020206 |
Clostridium_sp_AF32_12BH|SGB4710 | -0.2020956 | 0.8290283 | -2.0712815 | 1.6670904 | -1.3693280 | 0.0886788 |
GGB45432_SGB63101|SGB63101 | -0.1802382 | 0.8577759 | -2.1891923 | 1.8287160 | 0.7776985 | 0.3633936 |
GGB9531_SGB14932|SGB14932 | -0.0881209 | 0.8877615 | -1.3353313 | 1.1590894 | -0.1611528 | 0.7607626 |
GGB3000_SGB3991|SGB3991 | -0.0464247 | 0.8958820 | -0.7550767 | 0.6622274 | 0.0881035 | 0.7695390 |
Ruminococcus_torques|SGB4608 | -0.1097039 | 0.9228469 | -2.3725500 | 2.1531421 | -0.4820386 | 0.6159588 |
GGB9616_SGB15051|SGB15051 | 0.0282394 | 0.9742853 | -1.7219390 | 1.7784178 | 0.5838507 | 0.4329963 |
GGB58158_SGB79798|SGB79798 | 0.0217278 | 0.9792870 | -1.6501596 | 1.6936153 | 0.5347863 | 0.4520518 |
Enterocloster_hominis|SGB4721 | -0.0102244 | 0.9884335 | -1.4191964 | 1.3987476 | -0.5078804 | 0.3971107 |
Kineothrix_SGB4886|SGB4886 | 0.0007642 | 0.9993285 | -1.8132847 | 1.8148132 | -0.9095313 | 0.2404009 |
Open code
if (file.exists(
"gitignore/lm_results/result_microbiom_SGB10_VGdur_valid.csv"
== FALSE) {
) write.table(result_microbiome_val,
"gitignore/lm_results/result_microbiom_SGB10_VGdur_valid.csv",
row.names = FALSE
) }
7.3 Forest plot
7.3.1 Prepare data
Open code
## subset result tables
<- result_microbiome %>%
result_microbiome_subset filter(outcome %in% diet_sensitive_taxa)
<- result_microbiome_val %>%
result_microbiome_val_subset filter(outcome %in% diet_sensitive_taxa)
## create a data frame
<- data.frame(
data_forest outcome = rep(diet_sensitive_taxa, 3),
beta = c(
$est_Diet_duration_inCZ,
result_microbiome_subset$est_Diet_duration_inIT,
result_microbiome_subset$est_VGduration
result_microbiome_val_subset
),lower = c(
$CI_L_Diet_duration_inCZ,
result_microbiome_subset$CI_L_Diet_duration_inIT,
result_microbiome_subset$CI_L_VGduration
result_microbiome_val_subset
),upper = c(
$CI_U_Diet_duration_inCZ,
result_microbiome_subset$CI_U_Diet_duration_inIT,
result_microbiome_subset$CI_U_VGduration
result_microbiome_val_subset
),dataset = c(
rep("CZ", len),
rep("IT", len),
rep("Validation", len)
)
)
<- data_forest %>%
data_forest mutate(in_winner = if_else(outcome %in% winners, TRUE, FALSE, missing = FALSE)) %>%
left_join(
%>% mutate(outcome = microbiome) %>% select(-microbiome),
elacoef by = 'outcome') %>%
mutate(outcome = factor(outcome, levels = validation_order))
7.3.1.1 Plotting
Open code
<- "forest_plot_microbiome_SGB10_VGduration"
plotac <- "gitignore/figures"
path
<- c("CZ" = "#150999", "IT" = "#329243", "Validation" = "grey60")
colors
assign(plotac, ggplot(
aes(x = outcome, y = beta, ymin = lower, ymax = upper, color = dataset)
data_forest, +
) geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
geom_hline(yintercept = 0, color = "black") +
geom_errorbar(position = position_dodge(width = 0.5), width = 0.2) +
scale_color_manual(values = colors) +
labs(
y = "Effect of vegan diet duration (+10y) on CLR(relative abundance)",
x = "Outcome",
color = "Dataset"
+
) theme_minimal() +
coord_flip() + # Flip coordinates to have outcomes on the y-axis
scale_x_discrete(
labels = setNames(
ifelse(data_forest$in_winner,
paste0("**", data_forest$outcome, "**"),
as.character(data_forest$outcome)
$outcome
), data_forest
)+
) theme(
axis.text.x = element_text(size = 10),
axis.text.y = ggtext::element_markdown(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
get(plotac)
Diet_duration
, Country
, their interaction (Diet_duration x Country
), and Age
as fixed-effect predictors. Age
was included as it correlates with Diet_duration
and could act as a confounder. In the Czech validation cohort, Diet_duration
and Age
were fixed-effect predictors. Diet-sensitive taxa are shown in bold, in the same order as in the forest plot of vegan–omnivore differencesOpen code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 9,
height = 13
) }
8 Reproducibility
Open code
sessionInfo()
## R version 4.4.3 (2025-02-28)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=cs_CZ.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=cs_CZ.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=cs_CZ.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Prague
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] vegan_2.6-6.1 lattice_0.22-5 permute_0.9-7
## [4] zCompositions_1.5.0-4 truncnorm_1.0-8 NADA_1.6-1.1
## [7] survival_3.7-0 MicrobiomeStat_1.2 glmnet_4.1-8
## [10] pROC_1.18.0 arm_1.12-2 lme4_1.1-35.5
## [13] Matrix_1.7-0 MASS_7.3-65 car_3.1-2
## [16] carData_3.0-5 emmeans_1.10.4 brms_2.21.0
## [19] Rcpp_1.0.13 rms_6.8-1 Hmisc_5.1-3
## [22] glmmTMB_1.1.9 ggtext_0.1.2 ggdist_3.3.2
## [25] cowplot_1.1.1 ggpubr_0.4.0 sjPlot_2.8.16
## [28] kableExtra_1.4.0 flextable_0.9.6 gtsummary_2.0.2
## [31] compositions_2.0-8 janitor_2.2.0 stringi_1.8.7
## [34] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1
## [37] dplyr_1.1.4 purrr_1.0.4 readr_2.1.5
## [40] tidyr_1.3.1 tibble_3.3.0 ggplot2_3.5.1
## [43] tidyverse_1.3.1 readxl_1.3.1 openxlsx_4.2.8
## [46] RJDBC_0.2-10 rJava_1.0-11 DBI_1.2.3
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.6 matrixStats_1.3.0 httr_1.4.2
## [4] insight_0.20.2 numDeriv_2016.8-1.1 tools_4.4.3
## [7] backports_1.5.0 utf8_1.2.6 sjlabelled_1.2.0
## [10] R6_2.6.1 mgcv_1.9-1 withr_3.0.2
## [13] Brobdingnag_1.2-7 prettyunits_1.2.0 gridExtra_2.3
## [16] bayesm_3.1-6 quantreg_5.98 cli_3.6.5
## [19] textshaping_0.3.6 performance_0.12.2 officer_0.6.6
## [22] sandwich_3.0-1 labeling_0.4.2 mvtnorm_1.1-3
## [25] robustbase_0.93-9 polspline_1.1.25 ggridges_0.5.3
## [28] askpass_1.1 QuickJSR_1.3.1 commonmark_1.9.1
## [31] systemfonts_1.0.4 StanHeaders_2.32.10 foreign_0.8-90
## [34] gfonts_0.2.0 svglite_2.1.3 rstudioapi_0.16.0
## [37] httpcode_0.3.0 generics_0.1.4 shape_1.4.6
## [40] distributional_0.4.0 zip_2.2.0 inline_0.3.19
## [43] loo_2.4.1 abind_1.4-5 lifecycle_1.0.4
## [46] multcomp_1.4-18 yaml_2.3.10 snakecase_0.11.1
## [49] grid_4.4.3 promises_1.2.0.1 crayon_1.5.3
## [52] haven_2.4.3 pillar_1.11.0 knitr_1.50
## [55] statip_0.2.3 boot_1.3-31 estimability_1.5.1
## [58] codetools_0.2-19 glue_1.7.0 V8_4.4.2
## [61] fontLiberation_0.1.0 data.table_1.15.4 vctrs_0.6.5
## [64] cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1
## [67] datawizard_0.12.2 xfun_0.52 mime_0.12
## [70] coda_0.19-4 modeest_2.4.0 timeDate_3043.102
## [73] iterators_1.0.14 statmod_1.4.36 TH.data_1.1-0
## [76] nlme_3.1-168 fontquiver_0.2.1 rstan_2.32.6
## [79] fBasics_4041.97 tensorA_0.36.2.1 TMB_1.9.14
## [82] rpart_4.1.24 colorspace_2.0-2 nnet_7.3-20
## [85] tidyselect_1.2.1 processx_3.8.4 timeSeries_4032.109
## [88] compiler_4.4.3 curl_6.4.0 rvest_1.0.2
## [91] htmlTable_2.4.0 SparseM_1.81 xml2_1.3.3
## [94] fontBitstreamVera_0.1.1 posterior_1.6.0 checkmate_2.3.2
## [97] scales_1.3.0 DEoptimR_1.0-10 callr_3.7.6
## [100] spatial_7.3-15 digest_0.6.37 minqa_1.2.4
## [103] rmarkdown_2.27 htmltools_0.5.8.1 pkgconfig_2.0.3
## [106] base64enc_0.1-3 stabledist_0.7-2 dbplyr_2.1.1
## [109] fastmap_1.2.0 rlang_1.1.6 htmlwidgets_1.6.4
## [112] shiny_1.9.1 farver_2.1.0 zoo_1.8-9
## [115] jsonlite_2.0.0 magrittr_2.0.3 Formula_1.2-4
## [118] bayesplot_1.8.1 munsell_0.5.0 gdtools_0.3.7
## [121] stable_1.1.6 plyr_1.8.6 pkgbuild_1.3.1
## [124] parallel_4.4.3 ggrepel_0.9.5 sjmisc_2.8.10
## [127] ggeffects_1.7.0 splines_4.4.3 gridtext_0.1.5
## [130] hms_1.1.3 sjstats_0.19.0 ps_1.7.7
## [133] uuid_1.0-3 markdown_1.13 ggsignif_0.6.3
## [136] stats4_4.4.3 rmutil_1.1.10 rstantools_2.1.1
## [139] crul_1.5.0 reprex_2.0.1 evaluate_1.0.4
## [142] RcppParallel_5.1.8 modelr_0.1.8 nloptr_2.0.0
## [145] tzdb_0.5.0 foreach_1.5.2 httpuv_1.6.5
## [148] MatrixModels_0.5-3 openssl_1.4.6 clue_0.3-65
## [151] broom_1.0.6 xtable_1.8-4 rstatix_0.7.0
## [154] later_1.3.0 viridisLite_0.4.0 ragg_1.4.0
## [157] lmerTest_3.1-3 cluster_2.1.8.1 timechange_0.3.0
## [160] bridgesampling_1.1-2