Vegan-specific signature implies healthier metabolic profile: findings from diet-related multi-omics observational study based on different European populations
Statistical report for metabolom analysis
Authors and affiliations
Anna Ouradova1,*, Giulio Ferrero2,3,*, Miriam Bratova4, Nikola Daskova4, Alena Bohdanecka4,5, Klara Dohnalova6, Marie Heczkova4, Karel Chalupsky6, Maria Kralova7,8, Marek Kuzma9, István Modos4, Filip Tichanek4, Lucie Najmanova9, Barbara Pardini10, Helena Pelantová9, Sonia Tarallo10, Petra Videnska11, Jan Gojda1,#, Alessio Naccarati10,#, Monika Cahova4,#
* These authors have contributed equally to this work and share first authorship
# These authors have contributed equally to this work and share last authorship
1 Department of Internal Medicine, Kralovske Vinohrady University Hospital and Third Faculty of Medicine, Charles University, Prague, Czech Republic
2 Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
3 Department of Computer Science, University of Turin, Turin, Italy
4 Institute for Clinical and Experimental Medicine, Prague, Czech Republic
5 First Faculty of Medicine, Charles University, Prague, Czech Republic
6 Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
7 Ambis University, Department of Economics and Management, Prague, Czech Republic
8 Department of Informatics, Brno University of Technology, Brno, Czech Republic
9 Institute of Microbiology of the Czech Academy of Sciences, Prague, Czech Republic
10 Italian Institute for Genomic Medicine (IIGM), c/o IRCCS Candiolo, Turin, Italy
11 Mendel University, Department of Chemistry and Biochemistry, Brno, Czech Republic
This is a statistical report of the study A vegan diet signature from a multi-omics study on different European populations is related to favorable metabolic outcomes that is currenlty under review
When using this code or data, cite the original publication:
TO BE ADDED
BibTex citation for the original publication:
TO BE ADDED
Original GitHub repository: https://github.com/filip-tichanek/ItCzVegans
Statistical reports can be found on the reports hub.
Data analysis is described in detail in the statistical methods report.
1 Introduction
This project explores potential signatures of a vegan diet across the microbiome, metabolome, and lipidome. We used data from healthy vegan and omnivorous human subjects in two countries (Czech Republic and Italy), with subjects grouped by Country
and Diet
, resulting in four distinct groups.
To assess the generalizability of these findings, we validated our results with an independent cohort from the Czech Republic for external validation.
1.1 Statistical Methods
The statistical modeling approach is described in detail in this report. Briefly, the methods used included:
Multivariate analysis: We conducted multivariate analyses (PERMANOVA, PCA, correlation analyses) to explore the effects of
diet
,country
, and their possible interaction (diet : country
) on the microbiome, lipidome, and metabolome compositions in an integrative manner. This part of the analysis is not available on the GitHub page, but the code will be provided upon request.Linear models: Linear models were applied to estimate the effects of
diet
,country
, and their interaction (diet:country
) on individual lipids, metabolites, bacterial taxa and pathways (“features”). Features that significantly differed between diet groups (based on the estimated average conditional effect of diet across both countries, adjusted for multiple comparisons with FDR < 0.05) were further examined in the independent external validation cohort to assess whether these associations were reproducible. Next, we fit linear models restricted to vegan participants to test whether omics profiles varied with the duration of vegan diet. Fixed-effect predictors were diet duration (per 10 years), country, their interaction, and age (included due to correlation with diet duration).Predictive models (elastic net): We employed elastic net logistic regression (via the
glmnet
R package) to predict vegan status based on metabolome, lipidome, microbiome and pathways data (one model per dataset; four models in total). We considered three combinations of Lasso and Ridge penalties (alpha = 0, 0.2, 0.4). For each alpha, we selected the penalty strength (λ1se) using 10-fold cross-validation. This value corresponds to the most regularized model whose performance was within one standard error of the minimum deviance. The alpha–lambda pair with the lowest deviance was chosen to fit the final model, whose coefficients are reported.
To estimate model performance, we repeated the full modeling procedure (including hyperparameter tuning) 500 times on bootstrap resamples of the training data. In each iteration, the model was trained on the resampled data and evaluated on the out-of-bag subjects (i.e., those not included in the training set in that iteration). The mean, and 2.5th, and 97.5th percentiles of the resulting ROC-AUC values represent the estimated out-of-sample AUC and its 95% confidence interval.
Finally, the final model was applied to an independent validation cohort to generate predicted probabilities of vegan status. These probabilities were then used to assess external discrimination between diet groups (ROC-AUC in the independent validation cohort). The elastic net models were not intended for practical prediction, but to quantify the strength of the signal separating the dietary groups, with its uncertainty, by using all features of a given dataset jointly. It also offered a complementary perspective on which features are most clearly associated with diet
2 Initiation
2.1 Set home directory
Open code
setwd('/home/ticf/1_ticf_sec/478_MOCA_italian/')
2.2 Upload initiation file
Open code
source('478_initiation.R')
3 Data
3.1 Upload all original data
3.1.1 Training set
Open code
<- read.xlsx('gitignore/data/serum_metabolome_training_cohort.xlsx') data_metabolites_original
3.1.2 Validation set
Open code
<- read.xlsx('gitignore/data/Serum_metabolome_Validation_cohort_new.xlsx') %>%
data_metabolites_validation select(-X38)
3.1.3 Merge training and validation dataset
Open code
<- data_metabolites_original %>%
tr1 mutate(Data = if_else(Country == 'CZ', 'CZ_tr', 'IT_tr')) %>%
select(Data, Diet, `formate`:`2-hydroxybutyrate`)
<- data_metabolites_validation %>%
tr2 mutate(Data = 'valid',
Diet = if_else(SKUPINA == 0 , 'OMNI', 'VEGAN')) %>%
select(Data, Diet, `formate`:`2-hydroxybutyrate`)
<- bind_rows(tr1, tr2) data_merged
3.2 Explore
3.2.1 Distributions - raw data
The following plot will show distribution of 36 randomly selected metabolites
Open code
<- data_metabolites_original %>%
check ::select(
dplyr`formate`: `2-hydroxybutyrate`
%>%
) na.omit()
= c(6,6)
size par(mfrow = c(size[1],size[2]))
par(mar=c(2,1.5,2,0.5))
set.seed(16)
for(x in 1:ncol(check)){
hist(check[,x],
16,
col='blue',
main = paste0(colnames(check)[x])
)
}
Data seems to be highly right-tailed
3.2.2 Distribution - Log2 transformed
Open code
par(mfrow = c(size[1],size[2]))
par(mar=c(2,1.5,2,0.5))
set.seed(16)
for(x in 1:ncol(check)){
hist(log2(check[,x]+1e-8),
16,
col='blue',
main = paste0('log2',colnames(check)[x])
) }
Seems more symmetrical and Gaussian
3.2.2.1 Comparison training vs validation cohort
Open code
<- data_metabolites_original %>%
tr1 select(formate:`2-hydroxybutyrate`) %>%
mutate(dataset = 'training')
<- data_metabolites_validation %>% select(formate:`2-hydroxybutyrate`) %>%
tr2 mutate(dataset = 'validation')
<- bind_rows(tr1, tr2)
tr
= c(6,6)
size par(mfrow = c(size[1],size[2]))
par(mar = c(2,1.5,2,0.5))
par(mgp = c(3, 0.5, 0 ))
for(x in 1:(ncol(tr)-1)){
plot(log2(tr[, x]) ~ factor(tr$dataset),
main = paste0(colnames(check)[x]),
ylim = c(-24, -10)
) }
3.2.2.2 Metabolites accross groups
Open code
<- c('#329243', '#F9FFAF')
colo
<- data.frame(
outcomes variable = data_merged %>%
select(formate:`2-hydroxybutyrate`) %>%
colnames())
<- function(variable) {
boxplot_cond
<- ggboxplot(data_merged,
p x = 'Diet',
y = variable,
fill = 'Diet',
tip.length = 0.15,
palette = colo,
outlier.shape = 1,
lwd = 0.25,
outlier.size = 0.8,
facet.by = 'Data',
title = variable,
ylab = 'Metabolite level') +
theme(
plot.title = element_text(size = 10),
axis.title = element_text(size = 8),
axis.text.y = element_text(size = 7),
axis.text.x = element_blank(),
axis.title.x = element_blank()
)
return(p)
}
# Plot all outcomes
<- map(outcomes$variable, boxplot_cond)
plots
# Create a matrix of plots
<- ggarrange(plotlist = plots, ncol = 5, nrow = 7, common.legend = TRUE)
plots_arranged plots_arranged
4 Linear models across metabolites
We will fit a feature-specific linear model where the log2-transformed metabolite represents the outcome variable whereas country
(Italy vs Czech), diet
(vegan vs omnivore), and their interaction (country:diet
) all represent fixed-effects predictors. So, each model has following form
\[ log_{2}(\text{metabolite level}) = \alpha + \beta_{1} \times \text{country} + \beta_{2} \times \text{diet} + \beta_{3} \times \text{country:diet} + \epsilon \]
The variables were coded as follows: \(diet = -0.5\) for omnivores and \(diet = 0.5\) for vegans; \(country = -0.5\) for the Czech cohort and \(country = 0.5\) for the Italian cohort.
This parameterization allows us to interpret the linear model summary
output as presenting the conditional effects of diet
averaged across both countries and the conditional effects of country
averaged across both diet groups. We will then use the emmeans
package (Lenth 2024) to obtain specific estimates for the effect of diet
in the Italian and Czech cohorts separately, still from a single model.
Metabolites that will show a significant diet effect (average effect of diet
across both countries, adjusted for multiple comparisons with FDR < 0.05) will be then visualized using a forest plot, with country-specific diet effect along with diet effect based on independent validation cohort, to evaluate how generalizable are these findings.
4.1 Define transformation function for each dataset
Given the distribution of the estimated metabolites concentrations, we will use log2-transformed values
Open code
<- function(x){
trans_metabolite log2(x + 1e-8)
}
4.2 Select and wrangle data
Open code
<- data_metabolites_original %>%
data_analysis_metabolom na.omit() %>%
::mutate(
dplyrDiet_VEGAN = as.numeric(
::if_else(
dplyr== "VEGAN", 0.5, -0.5
Diet
)
),Country_IT = as.numeric(
::if_else(
dplyr== "IT", 0.5, -0.5
Country
)
),::across(
dplyr`formate`: `2-hydroxybutyrate`, ~ trans_metabolite(.)
)%>%
) ::select(
dplyr
Sample,
Country,
Country_IT,
Diet,
Diet_VEGAN,
Group,::everything()
dplyr
)
summary(data_analysis_metabolom[ , 1:12])
## Sample Country Country_IT Diet
## Length:173 Length:173 Min. :-0.50000 Length:173
## Class :character Class :character 1st Qu.:-0.50000 Class :character
## Mode :character Mode :character Median :-0.50000 Mode :character
## Mean :-0.04913
## 3rd Qu.: 0.50000
## Max. : 0.50000
## Diet_VEGAN Group formate tryptophan
## Min. :-0.50000 Length:173 Min. :-20.03 Min. :-17.69
## 1st Qu.:-0.50000 Class :character 1st Qu.:-19.59 1st Qu.:-17.32
## Median : 0.50000 Mode :character Median :-19.25 Median :-17.17
## Mean : 0.07803 Mean :-18.98 Mean :-17.18
## 3rd Qu.: 0.50000 3rd Qu.:-18.30 3rd Qu.:-17.07
## Max. : 0.50000 Max. :-17.93 Max. :-16.10
## phenylalanine histidine tyrosine glucose
## Min. :-16.70 Min. :-17.31 Min. :-17.17 Min. :-13.00
## 1st Qu.:-16.34 1st Qu.:-16.93 1st Qu.:-16.55 1st Qu.:-12.61
## Median :-16.25 Median :-16.80 Median :-16.34 Median :-12.50
## Mean :-16.23 Mean :-16.80 Mean :-16.36 Mean :-12.47
## 3rd Qu.:-16.12 3rd Qu.:-16.67 3rd Qu.:-16.20 3rd Qu.:-12.34
## Max. :-15.76 Max. :-16.16 Max. :-15.25 Max. :-11.58
4.2.1 Define number of metabolites and covariates
Open code
<- 6
n_covarites <- ncol(data_analysis_metabolom) - n_covarites n_features
4.2.2 Create empty objects
Open code
<- vector('double', n_features)
outcome <- vector('double', n_features)
log2FD_VGdiet_inCZ <- vector('double', n_features)
log2FD_VGdiet_inIT <- vector('double', n_features)
log2FD_VGdiet_avg
<- vector('double', n_features)
log2FD_ITcountry_avg <- vector('double', n_features)
diet_country_int
<- vector('double', n_features)
P_VGdiet_inCZ <- vector('double', n_features)
P_VGdiet_inIT <- vector('double', n_features)
P_VGdiet_avg
<- vector('double', n_features)
P_ITcountry_avg <- vector('double', n_features)
P_diet_country_int
<- vector('double', n_features)
CI_L_VGdiet_inCZ <- vector('double', n_features)
CI_L_VGdiet_inIT <- vector('double', n_features)
CI_L_VGdiet_avg
<- vector('double', n_features)
CI_U_VGdiet_inCZ <- vector('double', n_features)
CI_U_VGdiet_inIT <- vector('double', n_features) CI_U_VGdiet_avg
4.3 Run linear models over metabolites
Open code
for (i in 1:n_features) {
## define variable
$outcome <- data_analysis_metabolom[, (i + n_covarites)]
data_analysis_metabolom
## fit model
<- lm(outcome ~ Country_IT * Diet_VEGAN, data = data_analysis_metabolom)
model
## get contrast (effects of diet BY COUNTRY)
<- summary(
contrast_emm pairs(
emmeans(
model,specs = ~ Diet_VEGAN | Country_IT
),interaction = TRUE,
adjust = "none"
),infer = c(TRUE, TRUE)
)
## save results
<- names(data_analysis_metabolom)[i + n_covarites]
outcome[i]
## country effect
<- summary(model)$coefficients[
log2FD_ITcountry_avg[i] which(
names(model$coefficients) == "Country_IT"
1
),
]
<- summary(model)$coefficients[
P_ITcountry_avg[i] which(
names(model$coefficients) == "Country_IT"
4
),
]
## diet effect
<- confint(model)
tr
<- tr[which(row.names(tr) == 'Diet_VEGAN'),][1]
CI_L_VGdiet_avg[i] <- tr[which(row.names(tr) == 'Diet_VEGAN'),][2]
CI_U_VGdiet_avg[i]
<- summary(model)$coefficients[
log2FD_VGdiet_avg[i] which(
names(model$coefficients) == "Diet_VEGAN"
1
),
]
<- summary(model)$coefficients[
P_VGdiet_avg[i] which(
names(model$coefficients) == "Diet_VEGAN"
4
),
]
<- -contrast_emm$estimate[1]
log2FD_VGdiet_inCZ[i] <- contrast_emm$p.value[1]
P_VGdiet_inCZ[i] <- -contrast_emm$upper.CL[1]
CI_L_VGdiet_inCZ[i] <- -contrast_emm$lower.CL[1]
CI_U_VGdiet_inCZ[i]
<- -contrast_emm$estimate[2]
log2FD_VGdiet_inIT[i] <- contrast_emm$p.value[2]
P_VGdiet_inIT[i] <- -contrast_emm$upper.CL[2]
CI_L_VGdiet_inIT[i] <- -contrast_emm$lower.CL[2]
CI_U_VGdiet_inIT[i]
## interaction
<- summary(model)$coefficients[
diet_country_int[i] which(
names(model$coefficients) == "Country_IT:Diet_VEGAN"
1
),
]
<- summary(model)$coefficients[
P_diet_country_int[i] which(
names(model$coefficients) == "Country_IT:Diet_VEGAN"
4
),
] }
4.4 Results table
Open code
<- data.frame(
result_metabolom
outcome,
log2FD_ITcountry_avg, P_ITcountry_avg,
log2FD_VGdiet_avg, P_VGdiet_avg,
log2FD_VGdiet_inCZ, P_VGdiet_inCZ,
log2FD_VGdiet_inIT, P_VGdiet_inIT,
diet_country_int, P_diet_country_int,
CI_L_VGdiet_avg, CI_U_VGdiet_avg,
CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
CI_L_VGdiet_inIT, CI_U_VGdiet_inIT )
4.4.1 Adjust p values
Open code
<- result_metabolom %>%
result_metabolom ::mutate(
dplyrfdr_ITcountry_avg = p.adjust(P_ITcountry_avg, method = 'BH'),
fdr_VGdiet_avg = p.adjust(P_VGdiet_avg, method = 'BH'),
fdr_VGdiet_inCZ = p.adjust(P_VGdiet_inCZ, method = 'BH'),
fdr_VGdiet_inIT = p.adjust(P_VGdiet_inIT, method = 'BH'),
fdr_diet_country_int = p.adjust(P_diet_country_int, method = 'BH')
%>%
) ::select(
dplyr
outcome,
log2FD_ITcountry_avg, P_ITcountry_avg, fdr_ITcountry_avg,
log2FD_VGdiet_avg, P_VGdiet_avg, fdr_VGdiet_avg,
log2FD_VGdiet_inCZ, P_VGdiet_inCZ, fdr_VGdiet_inCZ,
log2FD_VGdiet_inIT, P_VGdiet_inIT, fdr_VGdiet_inIT,
diet_country_int, P_diet_country_int, fdr_diet_country_int,
CI_L_VGdiet_avg, CI_U_VGdiet_avg,
CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
CI_L_VGdiet_inIT, CI_U_VGdiet_inIT )
4.4.2 Result: show and save
Open code
::kable(result_metabolom,
kableExtracaption = "Results of linear models, modeling the log2-transformed level of a given metabolite with `Diet`, `Country`, and `Diet:Country` interaction as fixed-effect predictors. Only metabolites with significantly different levels between diet groups are shown (FDR < 0.05, average effect across both countries). `log2FD` prefix: denotes estimated effects (regression coefficients), i.e., how much log2-transformed metabolite levels differ in vegans compared to omnivores, and between Italian and Czech cohorts, respectively; `P`: p-value; `fdr`: p-value after adjustment for multiple comparisons; `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval, respectively. `avg` suffix denotes effects averaged across subgroups, whereas `inCZ` and `inIT` denote effects in the Czech or Italian cohort, respectively. Interaction effects are reported as `diet_country_int` (difference in the effect of a vegan diet between Italian and Czech cohorts; positive values indicate a stronger effect in the Italian, negative values a stronger effect in the Czech cohort) and `P_diet_country_int` (its p-value). All estimates in a single row are based on a single model."
)
outcome | log2FD_ITcountry_avg | P_ITcountry_avg | fdr_ITcountry_avg | log2FD_VGdiet_avg | P_VGdiet_avg | fdr_VGdiet_avg | log2FD_VGdiet_inCZ | P_VGdiet_inCZ | fdr_VGdiet_inCZ | log2FD_VGdiet_inIT | P_VGdiet_inIT | fdr_VGdiet_inIT | diet_country_int | P_diet_country_int | fdr_diet_country_int | CI_L_VGdiet_avg | CI_U_VGdiet_avg | CI_L_VGdiet_inCZ | CI_U_VGdiet_inCZ | CI_L_VGdiet_inIT | CI_U_VGdiet_inIT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
formate | 1.2756132 | 0.0000000 | 0.0000000 | 0.0725783 | 0.0091069 | 0.0187830 | 0.0739017 | 0.0529521 | 0.1027894 | 0.0712550 | 0.0756434 | 0.1313806 | -0.0026467 | 0.9616865 | 0.9665166 | 0.0182748 | 0.1268819 | -0.0009521 | 0.1487555 | -0.0074370 | 0.1499469 |
tryptophan | 0.1587058 | 0.0000004 | 0.0000010 | -0.0568079 | 0.0599258 | 0.0988776 | -0.0129552 | 0.7543948 | 0.8584493 | -0.1006607 | 0.0217582 | 0.0578906 | -0.0877055 | 0.1455672 | 0.4003097 | -0.1160166 | 0.0024007 | -0.0945703 | 0.0686600 | -0.1864607 | -0.0148607 |
phenylalanine | 0.1149355 | 0.0000180 | 0.0000397 | -0.1006116 | 0.0001588 | 0.0004765 | -0.0957169 | 0.0084039 | 0.0240306 | -0.1055062 | 0.0057720 | 0.0183972 | -0.0097893 | 0.8511236 | 0.9665166 | -0.1520154 | -0.0492077 | -0.1665736 | -0.0248602 | -0.1799961 | -0.0310163 |
histidine | 0.1021990 | 0.0003020 | 0.0005862 | -0.0754526 | 0.0071189 | 0.0156616 | -0.0925386 | 0.0164038 | 0.0386661 | -0.0583666 | 0.1477098 | 0.2321153 | 0.0341720 | 0.5381061 | 0.8181237 | -0.1301250 | -0.0207801 | -0.1679009 | -0.0171763 | -0.1375931 | 0.0208600 |
tyrosine | 0.2172448 | 0.0000002 | 0.0000008 | -0.1764605 | 0.0000212 | 0.0000777 | -0.0757827 | 0.1746785 | 0.2834809 | -0.2771382 | 0.0000045 | 0.0000296 | -0.2013554 | 0.0135152 | 0.0637147 | -0.2560839 | -0.0968370 | -0.1855382 | 0.0339727 | -0.3925214 | -0.1617549 |
glucose | 0.2042069 | 0.0000000 | 0.0000000 | 0.0294804 | 0.2268282 | 0.3118888 | -0.0087509 | 0.7942477 | 0.8667242 | 0.0677117 | 0.0562146 | 0.1030602 | 0.0764626 | 0.1175782 | 0.3527345 | -0.0184981 | 0.0774589 | -0.0748861 | 0.0573842 | -0.0018145 | 0.1372379 |
mannose | -0.3851135 | 0.0000000 | 0.0000000 | -0.0075879 | 0.8573508 | 0.9126637 | 0.0116376 | 0.8414869 | 0.8677834 | -0.0268134 | 0.6612302 | 0.7212081 | -0.0384510 | 0.6488879 | 0.8922208 | -0.0907960 | 0.0756202 | -0.1030591 | 0.1263343 | -0.1473912 | 0.0937644 |
threonine | 0.1072997 | 0.0036942 | 0.0060954 | -0.0322902 | 0.3768737 | 0.4606234 | 0.0918177 | 0.0693572 | 0.1204626 | -0.1563982 | 0.0035036 | 0.0128466 | -0.2482159 | 0.0008255 | 0.0206291 | -0.1042358 | 0.0396553 | -0.0073544 | 0.1909898 | -0.2606553 | -0.0521410 |
lactate | 0.1141207 | 0.1079126 | 0.1424446 | -0.0025189 | 0.9715852 | 0.9982752 | -0.0570212 | 0.5587577 | 0.7091925 | 0.0519835 | 0.6120868 | 0.7212081 | 0.1090047 | 0.4412625 | 0.7664033 | -0.1419096 | 0.1368719 | -0.2491619 | 0.1351194 | -0.1500093 | 0.2539763 |
glycerol | -0.4783502 | 0.0000000 | 0.0000000 | 0.2280676 | 0.0020679 | 0.0052493 | -0.0065122 | 0.9484003 | 0.9484003 | 0.4626473 | 0.0000208 | 0.0001142 | 0.4691595 | 0.0015470 | 0.0206291 | 0.0841685 | 0.3719666 | -0.2048672 | 0.1918428 | 0.2541216 | 0.6711731 |
glycine | 0.0526110 | 0.3695555 | 0.4355475 | 0.3841430 | 0.0000000 | 0.0000000 | 0.4860745 | 0.0000000 | 0.0000003 | 0.2822116 | 0.0010648 | 0.0043925 | -0.2038629 | 0.0831272 | 0.3166966 | 0.2687063 | 0.4995798 | 0.3269527 | 0.6451962 | 0.1149308 | 0.4494924 |
ornithine | -0.0186175 | 0.6573762 | 0.6779192 | -0.0100253 | 0.8111914 | 0.8923105 | 0.1173914 | 0.0436688 | 0.0900669 | -0.1374420 | 0.0248718 | 0.0578906 | -0.2548334 | 0.0027346 | 0.0206291 | -0.0927415 | 0.0726909 | 0.0033727 | 0.2314101 | -0.2573070 | -0.0175770 |
lysine | -0.0777062 | 0.0216447 | 0.0340130 | -0.3392074 | 0.0000000 | 0.0000000 | -0.2376291 | 0.0000007 | 0.0000123 | -0.4407856 | 0.0000000 | 0.0000000 | -0.2031565 | 0.0028285 | 0.0206291 | -0.4053835 | -0.2730312 | -0.3288485 | -0.1464098 | -0.5366823 | -0.3448890 |
asparagine | 0.2022381 | 0.0000535 | 0.0001103 | 0.2937644 | 0.0000000 | 0.0000001 | 0.2917136 | 0.0000246 | 0.0001355 | 0.2958151 | 0.0000458 | 0.0002160 | 0.0041015 | 0.9665166 | 0.9665166 | 0.1974656 | 0.3900631 | 0.1589724 | 0.4244549 | 0.1562675 | 0.4353628 |
dimethylamine | -0.1062383 | 0.4391315 | 0.4830447 | 0.1352541 | 0.3249083 | 0.4123836 | 0.0444494 | 0.8141955 | 0.8667242 | 0.2260589 | 0.2564313 | 0.3679232 | 0.1816096 | 0.5083386 | 0.8181237 | -0.1351864 | 0.4056947 | -0.3283346 | 0.4172333 | -0.1658397 | 0.6179576 |
2-oxoisocaproate | 0.0644592 | 0.2543679 | 0.3108941 | -0.2423710 | 0.0000288 | 0.0000949 | -0.2580569 | 0.0010967 | 0.0045238 | -0.2266851 | 0.0061324 | 0.0183972 | 0.0313719 | 0.7811114 | 0.9546918 | -0.3536323 | -0.1311097 | -0.4114231 | -0.1046908 | -0.3879151 | -0.0654550 |
citrate | 0.0520531 | 0.4040612 | 0.4597938 | 0.3502631 | 0.0000001 | 0.0000004 | 0.1636893 | 0.0580469 | 0.1064194 | 0.5368368 | 0.0000000 | 0.0000002 | 0.3731476 | 0.0031256 | 0.0206291 | 0.2274184 | 0.4731077 | -0.0056437 | 0.3330222 | 0.3588213 | 0.7148524 |
glutamine | -0.1170194 | 0.0000073 | 0.0000172 | 0.1192102 | 0.0000050 | 0.0000208 | 0.1628214 | 0.0000061 | 0.0000519 | 0.0755991 | 0.0406035 | 0.0802104 | -0.0872223 | 0.0863718 | 0.3166966 | 0.0692988 | 0.1691217 | 0.0940218 | 0.2316209 | 0.0032719 | 0.1479264 |
pyruvate | 0.2836625 | 0.0000053 | 0.0000135 | 0.0001306 | 0.9982752 | 0.9982752 | 0.0366745 | 0.6597014 | 0.8063017 | -0.0364133 | 0.6774985 | 0.7212081 | -0.0730878 | 0.5454158 | 0.8181237 | -0.1189406 | 0.1192017 | -0.1274570 | 0.2008059 | -0.2089607 | 0.1361341 |
acetone | -1.4752586 | 0.0000000 | 0.0000000 | -0.0857535 | 0.5067286 | 0.5766222 | -0.1237970 | 0.4868652 | 0.6694397 | -0.0477100 | 0.7986838 | 0.8236427 | 0.0760870 | 0.7682203 | 0.9546918 | -0.3401821 | 0.1686751 | -0.4745096 | 0.2269155 | -0.4164055 | 0.3209855 |
proline | -0.0601208 | 0.0731143 | 0.1005321 | 0.0514584 | 0.1245750 | 0.1868625 | 0.0983343 | 0.0338050 | 0.0743709 | 0.0045826 | 0.9245413 | 0.9245413 | -0.0937517 | 0.1615414 | 0.4100667 | -0.0143548 | 0.1172717 | 0.0076151 | 0.1890534 | -0.0907882 | 0.0999534 |
acetate | 0.3309603 | 0.0000003 | 0.0000010 | 0.1071815 | 0.0868641 | 0.1365007 | 0.2629752 | 0.0025311 | 0.0092805 | -0.0486121 | 0.5905828 | 0.7212081 | -0.3115873 | 0.0132534 | 0.0637147 | -0.0156776 | 0.2300407 | 0.0936222 | 0.4323282 | -0.2266487 | 0.1294245 |
alanine | 0.4407706 | 0.0000000 | 0.0000000 | 0.1112648 | 0.0306749 | 0.0532774 | 0.0946521 | 0.1803970 | 0.2834809 | 0.1278775 | 0.0857030 | 0.1414099 | 0.0332254 | 0.7452635 | 0.9546918 | 0.0104878 | 0.2120418 | -0.0442621 | 0.2335663 | -0.0181596 | 0.2739146 |
3-hydroxybutyrate | -0.9479525 | 0.0000000 | 0.0000000 | 0.1342617 | 0.3112437 | 0.4108417 | 0.0713201 | 0.6959938 | 0.8202785 | 0.1972033 | 0.3047362 | 0.4057239 | 0.1258833 | 0.6345903 | 0.8922208 | -0.1266980 | 0.3952214 | -0.2883952 | 0.4310353 | -0.1809565 | 0.5753632 |
ethanol | -0.2601231 | 0.0000000 | 0.0000000 | 0.0923122 | 0.0297442 | 0.0532774 | 0.0591555 | 0.3096327 | 0.4442557 | 0.1254689 | 0.0413205 | 0.0802104 | 0.0663134 | 0.4321880 | 0.7664033 | 0.0091778 | 0.1754466 | -0.0554396 | 0.1737506 | 0.0049979 | 0.2459400 |
2-propanol | -0.0020939 | 0.9419149 | 0.9419149 | 0.0390849 | 0.1749891 | 0.2510713 | 0.0447128 | 0.2599048 | 0.3898572 | 0.0334570 | 0.4221865 | 0.5358521 | -0.0112558 | 0.8447466 | 0.9665166 | -0.0175624 | 0.0957323 | -0.0333717 | 0.1227974 | -0.0486314 | 0.1155454 |
2-oxoisovalerate | -0.1728101 | 0.0013380 | 0.0023239 | -0.1361641 | 0.0110225 | 0.0213967 | -0.1937298 | 0.0087384 | 0.0240306 | -0.0785983 | 0.3073666 | 0.4057239 | 0.1151315 | 0.2787369 | 0.6123968 | -0.2407424 | -0.0315857 | -0.3378839 | -0.0495757 | -0.2301440 | 0.0729474 |
3-methyl-2-oxovalerate | 0.1784157 | 0.0008928 | 0.0016367 | 0.0388866 | 0.4620118 | 0.5445139 | 0.0458970 | 0.5287354 | 0.6979307 | 0.0318761 | 0.6771906 | 0.7212081 | -0.0140209 | 0.8944265 | 0.9665166 | -0.0652427 | 0.1430158 | -0.0976381 | 0.1894321 | -0.1190188 | 0.1827711 |
3-hydroxyisobutyrate | 0.1641638 | 0.0458431 | 0.0657749 | -0.2772298 | 0.0008489 | 0.0023345 | -0.2834668 | 0.0126634 | 0.0321455 | -0.2709927 | 0.0231637 | 0.0578906 | 0.0124741 | 0.9391682 | 0.9665166 | -0.4383273 | -0.1161322 | -0.5055288 | -0.0614047 | -0.5044411 | -0.0375444 |
valine | 0.0182953 | 0.5905666 | 0.6286677 | -0.2501700 | 0.0000000 | 0.0000000 | -0.2117525 | 0.0000113 | 0.0000745 | -0.2885876 | 0.0000000 | 0.0000003 | -0.0768351 | 0.2592746 | 0.6111473 | -0.3171717 | -0.1831684 | -0.3041098 | -0.1193953 | -0.3856805 | -0.1914947 |
leucine | 0.0539532 | 0.1315259 | 0.1669368 | -0.2200006 | 0.0000000 | 0.0000000 | -0.1882463 | 0.0001764 | 0.0008316 | -0.2517549 | 0.0000024 | 0.0000201 | -0.0635086 | 0.3737022 | 0.7254219 | -0.2902826 | -0.1497186 | -0.2851253 | -0.0913673 | -0.3536014 | -0.1499084 |
isoleucine | 0.0721086 | 0.0328621 | 0.0492931 | -0.0953559 | 0.0049898 | 0.0117617 | -0.1304241 | 0.0053289 | 0.0175853 | -0.0602878 | 0.2162240 | 0.3243359 | 0.0701363 | 0.2969197 | 0.6123968 | -0.1615209 | -0.0291909 | -0.2216280 | -0.0392201 | -0.1561683 | 0.0355927 |
2-hydroxybutyrate | -0.2965328 | 0.0000000 | 0.0000000 | -0.2371837 | 0.0000029 | 0.0000138 | -0.3151485 | 0.0000063 | 0.0000519 | -0.1592189 | 0.0263139 | 0.0578906 | 0.1559296 | 0.1136270 | 0.3527345 | -0.3339619 | -0.1404054 | -0.4485507 | -0.1817463 | -0.2994614 | -0.0189764 |
Open code
if(file.exists('gitignore/lm_results/result_metabolom.csv') == FALSE){
write.table(result_metabolom,
'gitignore/lm_results/result_metabolom.csv',
row.names = FALSE)
}
5 Elastic net
To assess the predictive power of metabolome features on diet strategy, we employed Elastic Net logistic regression.
As we expected very high level of co-linearity, we allowed \(alpha\) to rather small (0, 0.2 or 0.4).
The performance of the predictive models was evaluated through their capacity of discriminate between vegan and omnivore diets, using out-of-sample area under ROC curve (AUC; estimated with out-of-bag bootstrap) as the measure of discriminatory capacity.
All features were transformed by 2 standard deviations (resulting in standard deviation of 0.5)
5.1 Prepare data for glmnet
Open code
<- data_metabolites_original %>%
data_metabolites_glmnet ::mutate(
dplyrvegan = as.numeric(
::if_else(
dplyr== "VEGAN", 1, 0
Diet
)
),::across(
dplyr`formate`:`2-hydroxybutyrate`, ~ arm::rescale(trans_metabolite(.))
)%>%
) ::select(
dplyr`formate`:`2-hydroxybutyrate`
Sample, vegan, Country,
)
dim(data_metabolites_glmnet)
## [1] 173 36
names(data_metabolites_glmnet)
## [1] "Sample" "vegan" "Country"
## [4] "formate" "tryptophan" "phenylalanine"
## [7] "histidine" "tyrosine" "glucose"
## [10] "mannose" "threonine" "lactate"
## [13] "glycerol" "glycine" "ornithine"
## [16] "lysine" "asparagine" "dimethylamine"
## [19] "2-oxoisocaproate" "citrate" "glutamine"
## [22] "pyruvate" "acetone" "proline"
## [25] "acetate" "alanine" "3-hydroxybutyrate"
## [28] "ethanol" "2-propanol" "2-oxoisovalerate "
## [31] "3-methyl-2-oxovalerate" "3-hydroxyisobutyrate" "valine"
## [34] "leucine" "isoleucine" "2-hydroxybutyrate"
summary(data_metabolites_glmnet)
## Sample vegan Country formate
## Length:173 Min. :0.000 Length:173 Min. :-0.8030
## Class :character 1st Qu.:0.000 Class :character 1st Qu.:-0.4692
## Mode :character Median :1.000 Mode :character Median :-0.2044
## Mean :0.578 Mean : 0.0000
## 3rd Qu.:1.000 3rd Qu.: 0.5165
## Max. :1.000 Max. : 0.8004
## tryptophan phenylalanine histidine tyrosine
## Min. :-1.21480 Min. :-1.2954 Min. :-1.352559 Min. :-1.36217
## 1st Qu.:-0.33244 1st Qu.:-0.3114 1st Qu.:-0.349523 1st Qu.:-0.32851
## Median : 0.03462 Median :-0.0632 Median :-0.006086 Median : 0.02603
## Mean : 0.00000 Mean : 0.0000 Mean : 0.000000 Mean : 0.00000
## 3rd Qu.: 0.28038 3rd Qu.: 0.2953 3rd Qu.: 0.351011 3rd Qu.: 0.26732
## Max. : 2.59340 Max. : 1.2582 Max. : 1.704357 Max. : 1.86951
## glucose mannose threonine lactate
## Min. :-1.44184 Min. :-1.57714 Min. :-1.91626 Min. :-0.98660
## 1st Qu.:-0.37787 1st Qu.:-0.30430 1st Qu.:-0.32355 1st Qu.:-0.32800
## Median :-0.08058 Median : 0.08151 Median :-0.04016 Median :-0.03976
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.34563 3rd Qu.: 0.38405 3rd Qu.: 0.27712 3rd Qu.: 0.20942
## Max. : 2.37649 Max. : 0.94062 Max. : 1.57739 Max. : 1.86847
## glycerol glycine ornithine lysine
## Min. :-1.86801 Min. :-2.19120 Min. :-1.5540 Min. :-1.82097
## 1st Qu.:-0.31759 1st Qu.:-0.28974 1st Qu.:-0.2927 1st Qu.:-0.30989
## Median : 0.02681 Median : 0.05298 Median : 0.0154 Median : 0.04735
## Mean : 0.00000 Mean : 0.00000 Mean : 0.0000 Mean : 0.00000
## 3rd Qu.: 0.30485 3rd Qu.: 0.30088 3rd Qu.: 0.3300 3rd Qu.: 0.35728
## Max. : 1.25029 Max. : 1.21701 Max. : 1.9208 Max. : 1.55589
## asparagine dimethylamine 2-oxoisocaproate citrate
## Min. :-1.19714 Min. :-1.0765 Min. :-1.21974 Min. :-1.55282
## 1st Qu.:-0.36044 1st Qu.:-0.3972 1st Qu.:-0.36202 1st Qu.:-0.26115
## Median : 0.02313 Median :-0.1378 Median : 0.01584 Median :-0.03012
## Mean : 0.00000 Mean : 0.0000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.32250 3rd Qu.: 0.4977 3rd Qu.: 0.35055 3rd Qu.: 0.28205
## Max. : 1.41408 Max. : 1.0813 Max. : 1.28311 Max. : 1.37139
## glutamine pyruvate acetone proline
## Min. :-1.83024 Min. :-1.3249 Min. :-1.55383 Min. :-1.52820
## 1st Qu.:-0.29243 1st Qu.:-0.3740 1st Qu.:-0.29407 1st Qu.:-0.32477
## Median : 0.03193 Median :-0.0523 Median : 0.07351 Median :-0.02069
## Mean : 0.00000 Mean : 0.0000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.33934 3rd Qu.: 0.3684 3rd Qu.: 0.31267 3rd Qu.: 0.32296
## Max. : 1.00878 Max. : 1.2458 Max. : 1.10845 Max. : 1.48558
## acetate alanine 3-hydroxybutyrate ethanol
## Min. :-0.980091 Min. :-1.602400 Min. :-1.3186 Min. :-1.13922
## 1st Qu.:-0.323562 1st Qu.:-0.411117 1st Qu.:-0.3224 1st Qu.:-0.37898
## Median :-0.001196 Median :-0.004515 Median :-0.1249 Median : 0.04808
## Mean : 0.000000 Mean : 0.000000 Mean : 0.0000 Mean : 0.00000
## 3rd Qu.: 0.276962 3rd Qu.: 0.337241 3rd Qu.: 0.2263 3rd Qu.: 0.33658
## Max. : 2.673149 Max. : 1.326617 Max. : 1.5052 Max. : 1.46759
## 2-propanol 2-oxoisovalerate 3-methyl-2-oxovalerate
## Min. :-1.339002 Min. :-1.352854 Min. :-1.59828
## 1st Qu.:-0.277101 1st Qu.:-0.349204 1st Qu.:-0.30949
## Median :-0.009172 Median :-0.003996 Median :-0.03688
## Mean : 0.000000 Mean : 0.000000 Mean : 0.00000
## 3rd Qu.: 0.304907 3rd Qu.: 0.340219 3rd Qu.: 0.32342
## Max. : 1.645117 Max. : 1.496604 Max. : 1.36931
## 3-hydroxyisobutyrate valine leucine isoleucine
## Min. :-1.42810 Min. :-1.08398 Min. :-1.393193 Min. :-1.33229
## 1st Qu.:-0.35778 1st Qu.:-0.38337 1st Qu.:-0.311148 1st Qu.:-0.33120
## Median :-0.02365 Median : 0.01007 Median : 0.005775 Median :-0.02574
## Mean : 0.00000 Mean : 0.00000 Mean : 0.000000 Mean : 0.00000
## 3rd Qu.: 0.28404 3rd Qu.: 0.33301 3rd Qu.: 0.343020 3rd Qu.: 0.29964
## Max. : 1.83272 Max. : 1.96794 Max. : 2.566032 Max. : 1.48801
## 2-hydroxybutyrate
## Min. :-0.96906
## 1st Qu.:-0.35015
## Median :-0.05846
## Mean : 0.00000
## 3rd Qu.: 0.32582
## Max. : 1.40274
5.2 Fit model
Open code
<- "elanet_metabolit_all"
modelac
assign(
modelac,run(
expr = clust_glmnet_sep(
data = data_metabolites_glmnet,
outcome = "vegan",
clust_id = "Sample",
sample_method = "oos_boot",
N = 500,
alphas = c(0, 0.2, 0.4),
family = "binomial",
seed = 478
),path = paste0("gitignore/run/", modelac)
) )
5.3 See results
5.3.1 Model summary
Open code
$model_summary
elanet_metabolit_all## alpha lambda auc auc_OutOfSample auc_oos_CIL auc_oos_CIU accuracy
## 1 0 0.02730688 0.9757534 0.9493977 0.9002391 0.9890267 0.9248555
## accuracy_OutOfSample accuracy_oos_CIL accuracy_oos_CIU
## 1 0.8897059 0.8175096 0.9516129
$country_AUC
elanet_metabolit_all## auc_OutOfSample_IT auc_oos_CIL_IT auc_oos_CIU_IT auc_OutOfSample_CZ
## 1 0.9502619 0.8583088 1 0.9497413
## auc_oos_CIL_CZ auc_oos_CIU_CZ
## 1 0.8826595 1
5.3.2 ROC curve - internal validation
Open code
$plot elanet_metabolit_all
5.3.3 Estimated coefficients
Open code
$betas
elanet_metabolit_all## 34 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) 0.51654945
## formate -0.23781875
## tryptophan 0.09514574
## phenylalanine -0.26041520
## histidine -0.46961530
## tyrosine -0.40207966
## glucose -0.06094302
## mannose 0.09576230
## threonine -0.29336072
## lactate -0.03716616
## glycerol 0.40862126
## glycine 0.89462176
## ornithine 0.15526294
## lysine -1.29145285
## asparagine 0.82351717
## dimethylamine 0.06858337
## 2-oxoisocaproate -0.19919398
## citrate 0.38672431
## glutamine 0.63824650
## pyruvate 0.06344946
## acetone -0.14041313
## proline 0.25009447
## acetate 0.29136570
## alanine 0.16148569
## 3-hydroxybutyrate 0.34663556
## ethanol 0.19167449
## 2-propanol -0.07389282
## 2-oxoisovalerate -0.01538734
## 3-methyl-2-oxovalerate 0.51986873
## 3-hydroxyisobutyrate -0.40782423
## valine -0.53956958
## leucine -0.36156524
## isoleucine 0.08659878
## 2-hydroxybutyrate -0.45713854
5.3.4 Plot of coefficients
Open code
<- data.frame(
elacoef metabolite = row.names(elanet_metabolit_all$betas),
beta_ela = elanet_metabolit_all$betas[, 1]
%>%
) arrange(abs(beta_ela)) %>%
filter(abs(beta_ela) > 0,
!grepl('Intercept', metabolite)) %>%
mutate(metabolite = factor(metabolite, levels = metabolite)) %>%
mutate(outcome = as.character(metabolite))
<- "elanet_beta_metabolite"
plotac <- "gitignore/figures"
path
assign(plotac, elacoef %>%
ggplot(
aes(
x = metabolite,
y = beta_ela
)+
) geom_point() +
geom_hline(yintercept = 0, color = "black") +
labs(
y = "Standardized beta coefficients",
x = "Metabolite"
+
) theme_minimal() +
coord_flip() +
theme(
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 5,
height = 6
)
}
get(plotac)
6 External validation
External validation was performed with an independent Czech cohort.
As a first step, we will use the previously developed and internally validated elastic net model to predict vegan status in the independent Czech cohort. The validation data will be standardized using the mean and standard deviation of each metabolite from the training cohort to ensure comparability across datasets. For each subject in the external validation cohort, we will estimate the predicted probability of being vegan using the elastic net model. This predicted probability will then be used as a variable to discriminate between the diet groups in the independent cohort.
In a 2nd step, we will look at metabolites that significantly differed between diet groups (average vegan diet effect across both countries, FDR < 0.05) estimated with linear models (one per metabolite) with training cohort. Then we will fit linear models also for external validation cohort. Effect of vegan diet on these metabolites will be shown along with 95% confidence interval for all cohorts: training Czech and Italian cohorts, but also in Czech independent (validating) cohort
6.1 Diet discrimination (elastic net)
6.1.0.1 Get table of weights, means and SDs
Open code
<- get_coef(
coefs_metabolom original_data = data_analysis_metabolom,
glmnet_model = elanet_metabolit_all)
coefs_metabolom## # A tibble: 34 × 5
## predictor beta_scaled beta_OrigScale mean SD
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.517 NA NA NA
## 2 formate -0.238 -0.312 -19.0 0.655
## 3 tryptophan 0.0951 0.0399 -17.2 0.209
## 4 phenylalanine -0.260 -0.0959 -16.2 0.184
## 5 histidine -0.470 -0.177 -16.8 0.189
## 6 tyrosine -0.402 -0.239 -16.4 0.297
## 7 glucose -0.0609 -0.0227 -12.5 0.186
## 8 mannose 0.0958 0.0632 -18.8 0.330
## 9 threonine -0.293 -0.143 -16.0 0.244
## 10 lactate -0.0372 -0.0336 -14.1 0.453
## # ℹ 24 more rows
6.1.0.3 Standardize data in validation set
Open code
<- data_metabolites_validation %>%
data_metabolites_validation_pred ::mutate(
dplyrvegan = if_else(
== 1, 1, 0
SKUPINA
)%>%
) ::select(
dplyr
vegan,::all_of(common_predictors)
dplyr%>%
) ::mutate(
dplyracross(
.cols = -vegan,
.fns = trans_metabolite
)%>%
) ::mutate(
dplyracross(
.cols = -vegan,
.fns = ~ .
- coefs_metabolom$mean[
match(
cur_column(),
$predictor
coefs_metabolom
)
]
)%>%
) ::mutate(
dplyracross(
.cols = -vegan,
.fns = ~ .
/ coefs_metabolom$SD[
match(
cur_column(),
$predictor
coefs_metabolom
)
]
)
)
%>% names()
data_metabolites_validation_pred ## [1] "vegan" "formate" "tryptophan"
## [4] "phenylalanine" "histidine" "tyrosine"
## [7] "glucose" "mannose" "threonine"
## [10] "lactate" "glycerol" "glycine"
## [13] "ornithine" "lysine" "asparagine"
## [16] "dimethylamine" "2-oxoisocaproate" "citrate"
## [19] "glutamine" "pyruvate" "acetone"
## [22] "proline" "acetate" "alanine"
## [25] "3-hydroxybutyrate" "ethanol" "2-propanol"
## [28] "2-oxoisovalerate " "3-methyl-2-oxovalerate" "3-hydroxyisobutyrate"
## [31] "valine" "leucine" "isoleucine"
## [34] "2-hydroxybutyrate"
6.1.0.4 Get predicted value
Open code
$fit
elanet_metabolit_all##
## Call: glmnet::glmnet(x = original_predictors, y = original_outcome, family = family, alpha = optim_par$alpha[1], lambda = optim_par$lamb_1se[1], standardize = standardize)
##
## Df %Dev Lambda
## 1 33 60.94 0.02731
<- as.matrix(data_metabolites_validation_pred[,-1])
newx
<- data_metabolites_validation_pred %>%
tr ::mutate(
dplyrpredicted_logit = as.numeric(
predict(
$fit,
elanet_metabolit_allnewx = newx
)
)%>%
) ::mutate(
dplyrpredicted = inv_logit(predicted_logit)
)
6.1.1 Result of external validation
6.1.1.1 ROC curve
Open code
<- pROC::roc(
roc_metabolite ~ predicted_logit,
vegan data = tr,
direction = "<",
levels = c(0, 1),
ci = TRUE
)
roc_metabolite##
## Call:
## roc.formula(formula = vegan ~ predicted_logit, data = tr, direction = "<", levels = c(0, 1), ci = TRUE)
##
## Data: predicted_logit in 49 controls (vegan 0) < 91 cases (vegan 1).
## Area under the curve: 0.9123
## 95% CI: 0.8588-0.9658 (DeLong)
<- "roc_metabolite_pl"
plotac <- "gitignore/figures"
path
assign(plotac, ggroc(roc_metabolite))
get(plotac)
Open code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 6,
height = 4.5
) }
6.1.1.2 Table
Open code
<- elanet_metabolit_all
mod
<- mean(mod[["valid_performances"]]$auc_resamp_test)
trainAUC <- quantile(mod[["valid_performances"]]$auc_resamp_test,
trainCI probs = c(1 / 40, 39 / 40))
<- data.frame(
res alpha = c(mod$model_summary$alpha, rep("", 4)),
lambda = c(round(mod$model_summary$lambda, 4), rep("", 4)),
`performance type` = c(
"Training set AUC",
"Out-of-sample AUC (all)",
"Out-of-sample AUC (Czech)",
"Out-of-sample AUC (Italy)",
"External validation AUC"
),`performance [95% CI]` = c(
sprintf("%.3f [%.3f to %.3f]", trainAUC, trainCI[1], trainCI[2]),
sprintf("%.3f [%.3f to %.3f]",
$model_summary$auc_OutOfSample,
mod$model_summary$auc_oos_CIL,
mod$model_summary$auc_oos_CIU),
modsprintf("%.3f [%.3f to %.3f]",
$country_AUC$auc_OutOfSample_CZ,
mod$country_AUC$auc_oos_CIL_CZ,
mod$country_AUC$auc_oos_CIU_CZ),
modsprintf("%.3f [%.3f to %.3f]",
$country_AUC$auc_OutOfSample_IT,
mod$country_AUC$auc_oos_CIL_IT,
mod$country_AUC$auc_oos_CIU_IT),
modsprintf("%.3f [%.3f to %.3f]",
"ci"]][2],
roc_metabolite[["ci"]][1],
roc_metabolite[["ci"]][3])
roc_metabolite[[
)
)
::kable(
kableExtra%>% mutate(across(where(is.numeric), ~ round(.x, 3))),
res caption = 'Performance of the elastic‑net logistic regression model for discriminating vegan from omnivore status using log2‑transformed metabolite levels. The model was developed on the combined training data (Czech and Italian cohorts), with the optimmized `alpha` (mixing parameter) and `lambda` (penalty strength) selected via 10-fold cross-validation. Internal validation employed 500 out‑of‑bag bootstrap resamples: the out‑of‑sample AUC is the mean across resamples, and its 95 % confidence interval (CI) is given by the 2.5th and 97.5th percentiles of the bootstrap distribution. The training‑set AUC and its CI were computed analogously from the in‑bag predictions. External validation was carried out on an independent Czech cohort; the reported AUC is the point estimate on that cohort, and its 95% CI was obtained with DeLong’s method.'
)
alpha | lambda | performance.type | performance..95..CI. |
---|---|---|---|
0 | 0.0273 | Training set AUC | 0.988 [0.970 to 1.000] |
Out-of-sample AUC (all) | 0.949 [0.900 to 0.989] | ||
Out-of-sample AUC (Czech) | 0.950 [0.883 to 1.000] | ||
Out-of-sample AUC (Italy) | 0.950 [0.858 to 1.000] | ||
External validation AUC | 0.912 [0.859 to 0.966] |
6.2 Diet effect across datasets (forest plot)
Similarly as in training data cohorts, we will fit linear model per each of the selected metabolite level (\(log_{2}\) - transformed), with a single fixed effect factor of diet
.
6.2.1 Linear model in validation cohort
Open code
<- data_metabolites_validation %>%
data_analysis_metabolom ::mutate(
dplyrDiet_VEGAN = as.numeric(
::if_else(
dplyr== 1, 1, 0
SKUPINA
)
),::across(
dplyr`formate`:`2-hydroxybutyrate`, ~ trans_metabolite(.)
)%>%
) ::select(
dplyr
Diet_VEGAN,::everything()
dplyr
)
summary(data_analysis_metabolom[, 1:12])
## Diet_VEGAN NazevSouboru CisloPacienta SkupinaNazev
## Min. :0.00 Length:140 Length:140 Length:140
## 1st Qu.:0.00 Class :character Class :character Class :character
## Median :1.00 Mode :character Mode :character Mode :character
## Mean :0.65
## 3rd Qu.:1.00
## Max. :1.00
## SKUPINA formate tryptophan phenylalanine
## Min. :0.00 Min. :-21.07 Min. :-19.59 Min. :-17.22
## 1st Qu.:0.00 1st Qu.:-18.04 1st Qu.:-16.79 1st Qu.:-15.66
## Median :1.00 Median :-17.90 Median :-16.67 Median :-15.57
## Mean :0.65 Mean :-17.93 Mean :-16.70 Mean :-15.56
## 3rd Qu.:1.00 3rd Qu.:-17.79 3rd Qu.:-16.57 3rd Qu.:-15.45
## Max. :1.00 Max. :-17.40 Max. :-16.14 Max. :-14.97
## histidine tyrosine glucose mannose
## Min. :-18.05 Min. :-19.05 Min. :-15.29 Min. :-20.47
## 1st Qu.:-16.63 1st Qu.:-15.92 1st Qu.:-12.24 1st Qu.:-17.69
## Median :-16.48 Median :-15.77 Median :-12.14 Median :-17.56
## Mean :-16.47 Mean :-15.79 Mean :-12.16 Mean :-17.54
## 3rd Qu.:-16.32 3rd Qu.:-15.60 3rd Qu.:-12.04 3rd Qu.:-17.35
## Max. :-15.75 Max. :-15.12 Max. :-11.77 Max. :-16.69
6.2.1.1 Define number of metabolites and covariates
Open code
<- 5
n_covarites <- ncol(data_analysis_metabolom) - n_covarites n_features
6.2.1.2 Create empty objects
Open code
<- vector('double', n_features)
outcome <- vector('double', n_features)
log2FD_VGdiet <- vector('double', n_features)
P_VGdiet <- vector('double', n_features)
CI_L_VGdiet <- vector('double', n_features) CI_U_VGdiet
6.2.1.3 Linear models per outcome
Open code
for (i in 1:n_features) {
## define variable
$outcome <- data_analysis_metabolom[, (i + n_covarites)]
data_analysis_metabolom
## fit model
<- lm(outcome ~ Diet_VEGAN, data = data_analysis_metabolom)
model
## save results
<- names(data_analysis_metabolom)[i + n_covarites]
outcome[i]
## extract diet effect
<- confint(model)
tr
<- tr[which(row.names(tr) == "Diet_VEGAN"), ][1]
CI_L_VGdiet[i] <- tr[which(row.names(tr) == "Diet_VEGAN"), ][2]
CI_U_VGdiet[i]
<- summary(model)$coefficients[
log2FD_VGdiet[i] which(
names(model$coefficients) == "Diet_VEGAN"
1
),
]
<- summary(model)$coefficients[
P_VGdiet[i] which(
names(model$coefficients) == "Diet_VEGAN"
4
),
] }
6.2.1.4 Results table
Open code
## relevant metabolites
<- result_metabolom %>%
diet_sensitive_metabolites filter(
< 0.05
fdr_VGdiet_avg %>%
) select(
outcome
)
<- nrow(diet_sensitive_metabolites)
len
<- data.frame(
result_metabolom_val
outcome,
log2FD_VGdiet, P_VGdiet,
CI_L_VGdiet, CI_U_VGdiet%>%
) filter(outcome %in% diet_sensitive_metabolites$outcome)
::kable(result_metabolom_val,
kableExtracaption = 'Results of linear models estimating the effect of diet on metabolite levels. Only metabolites that significantly differed between diet groups in training cohorts (FDR < 0.05, average effect across both training cohorts) were included. `log2FD` denotes the estimated effects (regression coefficient), indicating how much the log2-transformed metabolite levels differ between vegans and omnivores. `P`: p-value, `fdr`: p-value adjusted for multiple comparisons, and `CI_L` and `CI_U` represent the lower and upper bounds of the 95% confidence interval, respectively. All estimates in a single row are based on a single model')
outcome | log2FD_VGdiet | P_VGdiet | CI_L_VGdiet | CI_U_VGdiet |
---|---|---|---|---|
formate | 0.1397995 | 0.0134426 | 0.0294080 | 0.2501911 |
phenylalanine | -0.0239133 | 0.5569092 | -0.1042089 | 0.0563822 |
histidine | 0.1928632 | 0.0000376 | 0.1033398 | 0.2823867 |
tyrosine | -0.0529005 | 0.4108809 | -0.1797119 | 0.0739110 |
glycerol | 1.0869457 | 0.0000000 | 0.7652919 | 1.4085994 |
glycine | 0.5251078 | 0.0000000 | 0.3845024 | 0.6657132 |
lysine | -0.0603541 | 0.3180455 | -0.1794414 | 0.0587332 |
asparagine | 0.2260915 | 0.0000543 | 0.1187963 | 0.3333867 |
2-oxoisocaproate | -0.2146024 | 0.0001050 | -0.3208361 | -0.1083687 |
citrate | 0.4309305 | 0.0000001 | 0.2809046 | 0.5809563 |
glutamine | 0.3275636 | 0.0000000 | 0.2157616 | 0.4393657 |
2-oxoisovalerate | -0.1131207 | 0.1129090 | -0.2533195 | 0.0270781 |
3-hydroxyisobutyrate | -0.3644292 | 0.0013001 | -0.5838998 | -0.1449586 |
valine | -0.0850854 | 0.0666411 | -0.1760896 | 0.0059189 |
leucine | -0.0029199 | 0.9647790 | -0.1334316 | 0.1275918 |
isoleucine | 0.1097519 | 0.1853915 | -0.0532959 | 0.2727997 |
2-hydroxybutyrate | -0.0803085 | 0.3548272 | -0.2513515 | 0.0907345 |
Open code
if(file.exists('gitignore/lm_results/result_metabolom_validation.csv') == FALSE){
write.table(result_metabolom_val,
'gitignore/lm_results/result_metabolom_validation.csv',
row.names = FALSE)
}
6.2.2 Forest plot
6.2.2.1 Data preparation
Open code
## subset result tables
<- result_metabolom %>%
result_metabolom_subset filter(outcome %in% diet_sensitive_metabolites$outcome)
<- result_metabolom_val %>%
result_metabolom_val_subset filter(outcome %in% diet_sensitive_metabolites$outcome)
## create a data frame
<- data.frame(
data_forest outcome = rep(diet_sensitive_metabolites$outcome, 3),
beta = c(
$log2FD_VGdiet_inCZ,
result_metabolom_subset$log2FD_VGdiet_inIT,
result_metabolom_subset$log2FD_VGdiet
result_metabolom_val_subset
),lower = c(
$CI_L_VGdiet_inCZ,
result_metabolom_subset$CI_L_VGdiet_inIT,
result_metabolom_subset$CI_L_VGdiet
result_metabolom_val_subset
),upper = c(
$CI_U_VGdiet_inCZ,
result_metabolom_subset$CI_U_VGdiet_inIT,
result_metabolom_subset$CI_U_VGdiet
result_metabolom_val_subset
),dataset = c(
rep("CZ", len),
rep("IT", len),
rep("Validation", len)
)
)
<- data_forest %>%
validation_order group_by(outcome) %>%
summarise(beta_mean = mean(beta), .groups = "drop") %>%
arrange(beta_mean) %>%
pull(outcome)
<- data_forest %>%
up_winners pivot_wider(names_from = dataset,
values_from = c(beta, lower, upper)) %>%
left_join(elacoef, by = 'outcome') %>%
filter(beta_CZ > 0,
> 0,
beta_IT > 0,
lower_Validation > 0.1) %>%
beta_ela select(outcome)
<- data_forest %>%
down_winners pivot_wider(names_from = dataset,
values_from = c(beta, lower, upper)) %>%
left_join(elacoef, by = 'outcome') %>%
filter(beta_CZ < 0,
< 0,
beta_IT < 0,
upper_Validation < -0.1) %>%
beta_ela select(outcome)
<- as.character(c(up_winners$outcome, down_winners$outcome))
winners
<- data_forest %>%
data_forest mutate(in_winner = if_else(outcome %in% winners, TRUE, FALSE, missing = FALSE)) %>%
left_join(
%>% select(-metabolite),
elacoef by = 'outcome') %>%
mutate(outcome = factor(outcome, levels = validation_order))
6.2.2.2 Plotting
Open code
<- c("CZ" = "#150999", "IT" = "#329243", "Validation" = "grey60")
colors
<- "forest_metabolite"
plotac <- "gitignore/figures"
path
assign(
plotac, ggplot(
data_forest,aes(x = outcome, y = beta, ymin = lower, ymax = upper, color = dataset)) +
geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
geom_hline(yintercept = 0, color = "black") +
geom_errorbar(position = position_dodge(width = 0.5), width = 0.2) +
scale_color_manual(values = colors) +
labs(
y = "Effect of vegan duration (+10y) on log2 lipid level",
x = "Outcome",
color = "Dataset"
+
) theme_minimal() +
coord_flip() +
scale_x_discrete(
labels = setNames(
ifelse(data_forest$in_winner,
paste0("**", data_forest$outcome, "**"),
as.character(data_forest$outcome)
$outcome
), data_forest
)+
) theme(
axis.text.x = element_text(size = 10),
axis.text.y = ggtext::element_markdown(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 6,
height = 8
)
}
get(plotac)
Diet
, Country
, and the interaction term Diet x Country
as fixed-effect predictors. In the independent Czech validation cohort, Diet
was the only fixed-effect predictor. Metabolites validated in the linear model and showing predictive power in the elastic net model (|β| > 0.1) are bold6.2.3 Boxplot
Open code
<- "boxplot_metabolom"
plotac <- "gitignore/figures"
path
<- c('#F9FFAF','#329243')
colo
<- data_merged %>%
data_merged_log2 mutate(across(`formate`:`2-hydroxybutyrate`, ~ trans_metabolite(.)))
<- function(variable) {
boxplot_cond
<- ggboxplot(data_merged_log2,
p x = 'Diet',
y = variable,
fill = 'Diet',
tip.length = 0.15,
palette = colo,
outlier.shape = 1,
lwd = 0.25,
outlier.size = 0.8,
facet.by = 'Data',
title = variable,
ylab = 'Log2(metabolite level)') +
theme(
plot.title = element_text(size = 10),
axis.title = element_text(size = 8),
axis.text.y = element_text(size = 7),
axis.text.x = element_blank(),
axis.title.x = element_blank()
) return(p)
}
# Plot all outcomes
<- map(diet_sensitive_metabolites$outcome, boxplot_cond)
plots
# Create a matrix of plots
assign(plotac,
ggarrange(plotlist = plots, ncol = 3, nrow = 6, common.legend = TRUE)
)
get(plotac)
Open code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 6,
height = 10
) }
7 Linear model VG duration
Next, we fit another series of linear models, this time modelling log2 metabolite levels using the following fixed-effect predictors: duration of vegan status (Diet_duration
, scaled in tens of years), country
, their interaction (Diet_duration × country
), and age
:
\[ \log_{2}(\text{metabolite level}) = \alpha + \beta_{1} \times \text{country} + \beta_{2} \times \text{diet duration} + \beta_{3} \times (\text{country}:\text{diet duration}) + \epsilon \]
This analysis includes only vegan participants, while omnivores are excluded. The goal was to examine whether metabolites that differ between vegans and non-vegans also vary within the vegan group itself, depending on how long participants have been vegan. In other words, we asked whether long-term vegans show stronger up- or down-regulation of diet-sensitive metabolites compared to those who adopted the diet more recently.
Because longer vegan duration is likely correlated with age (e.g. a 20-year-old cannot have 20 years of vegan history, unlike a 40-year-old), we also adjusted for age in the models.
7.1 Get data
7.1.1 Training
Open code
<- read.xlsx('gitignore/data/diet_duration_age.xlsx', sheet = 1)
meta_trainIT <- read.xlsx('gitignore/data/diet_duration_age.xlsx', sheet = 2) %>%
meta_trainCZ ::mutate(ID = paste0('T', ID),
dplyrSex = SEX) %>%
::select(-SEX)
dplyr
1:5,]
meta_trainIT[## ID COHORT GRP Diet_duration Age Sex
## 1 VOV002 IT_train OM 0 61.4 F
## 2 VOV003 IT_train OM 0 43.7 F
## 3 VOV004 IT_train OM 0 61.1 F
## 4 VOV006 IT_train OM 0 31.7 F
## 5 VOV007 IT_train OM 0 31.8 F
1:5,]
meta_trainCZ[## ID COHORT GRP Diet_duration Age Sex
## 1 T97 CZ_train OM 0 26.76438 M
## 2 T98 CZ_train OM 0 27.61370 F
## 3 T134 CZ_train OM 0 21.64384 F
## 4 T136 CZ_train OM 0 31.25479 M
## 5 T137 CZ_train OM 0 40.42192 F
<- bind_rows(meta_trainIT, meta_trainCZ) %>%
data_meta_original rename(`Sample` = `ID`)
1:5,]
data_meta_original[## Sample COHORT GRP Diet_duration Age Sex
## 1 VOV002 IT_train OM 0 61.4 F
## 2 VOV003 IT_train OM 0 43.7 F
## 3 VOV004 IT_train OM 0 61.1 F
## 4 VOV006 IT_train OM 0 31.7 F
## 5 VOV007 IT_train OM 0 31.8 F
::tbl_summary(
gtsummary%>%
data_meta_original mutate(Category = paste0(COHORT, GRP)) %>%
select(Sex, Category),
by = 'Category'
)
Characteristic |
CZ_trainOM |
CZ_trainVG |
IT_trainOM |
IT_trainVG |
---|---|---|---|---|
Sex | ||||
F | 17 (52%) | 25 (40%) | 24 (60%) | 22 (58%) |
M | 16 (48%) | 37 (60%) | 16 (40%) | 16 (42%) |
1
n (%) |
Open code
<- data_metabolites_original %>%
data_metabolite_original2 left_join(data_meta_original, by = 'Sample') %>%
select(Sample:Diet, COHORT:Sex, everything())
%>% dim()
data_metabolite_original2 ## [1] 173 42
data_metabolite_original2[1:4,
ncol(data_metabolite_original2)-10):ncol(data_metabolite_original2)
(
]## alanine 3-hydroxybutyrate ethanol 2-propanol 2-oxoisovalerate
## 1 0.0001213836 2.387548e-05 8.178577e-06 2.724559e-06 5.733548e-06
## 2 0.0001159496 9.752541e-06 7.881878e-06 2.636666e-06 6.229350e-06
## 3 0.0001374516 9.122670e-06 7.059981e-06 2.858835e-06 5.990228e-06
## 4 0.0001086865 1.477925e-05 7.276468e-06 2.215811e-06 4.624115e-06
## 3-methyl-2-oxovalerate 3-hydroxyisobutyrate valine leucine
## 1 6.041008e-06 2.418543e-06 7.093858e-05 6.090779e-05
## 2 5.920311e-06 4.066098e-06 6.269643e-05 6.538483e-05
## 3 5.437201e-06 4.733488e-06 7.109963e-05 6.308931e-05
## 4 3.964752e-06 2.978589e-06 5.723841e-05 5.201635e-05
## isoleucine 2-hydroxybutyrate
## 1 1.384813e-05 7.989465e-06
## 2 1.612050e-05 1.078008e-05
## 3 1.423981e-05 1.042251e-05
## 4 1.250699e-05 1.032980e-05
1:4, 1:10]
data_metabolite_original2[## Sample Country Diet COHORT GRP Diet_duration Age Sex Group
## 1 T119 CZ VEGAN CZ_train VG 3.0 38.21644 M VEGAN_CZ
## 2 T120 CZ VEGAN CZ_train VG 5.0 31.00000 M VEGAN_CZ
## 3 T126 CZ VEGAN CZ_train VG 6.0 25.22740 M VEGAN_CZ
## 4 T127 CZ VEGAN CZ_train VG 3.2 22.48219 F VEGAN_CZ
## formate
## 1 1.363904e-06
## 2 1.579640e-06
## 3 1.065157e-06
## 4 1.337780e-06
7.1.2 Validation
Open code
<- read.xlsx('gitignore/data/diet_duration_age.xlsx', sheet = 3) %>%
data_meta_valid rename(`Sample` = `ID`)
::tbl_summary(
gtsummary%>%
data_meta_valid mutate(Category = GRP) %>%
select(SEX, Category),
by = 'Category'
)
Characteristic |
OM |
VN |
---|---|---|
SEX | ||
F | 25 (50%) | 47 (51%) |
M | 25 (50%) | 45 (49%) |
1
n (%) |
Open code
<- data_metabolites_validation %>%
data_metabolite_valid2 mutate(Diet = if_else(SKUPINA == 1, 'VEGAN', 'OMNI')) %>%
rename(`Sample` = `CisloPacienta`) %>%
select(Sample, Diet, everything()) %>%
select(-c(NazevSouboru:SKUPINA)) %>%
mutate(Sample = paste0('K', as.numeric(Sample))) %>%
left_join(data_meta_valid, by = 'Sample') %>%
select(Sample:Diet, COHORT:SEX, dplyr::everything())
<- data_metabolites_validation %>%
pats_val1 mutate(Diet = if_else(SKUPINA == 1, 'VEGAN', 'OMNI')) %>%
rename(`Sample` = `CisloPacienta`) %>%
select(Sample, Diet) %>%
mutate(Sample = paste0('K', as.numeric(Sample))) %>%
select(Sample) %>% pull
<- data_meta_valid$Sample
pats_val2
union(setdiff(pats_val1, pats_val2),
setdiff(pats_val2, pats_val1))
## [1] "K56" "K209"
%>% dim()
data_metabolite_valid2 ## [1] 140 40
data_metabolite_valid2[1:4,
ncol(data_metabolite_valid2)-10):ncol(data_metabolite_valid2)
(
]## alanine 3-hydroxybutyrate ethanol 2-propanol 2-oxoisovalerate
## 1 0.0001321337 6.622714e-06 1.322319e-06 1.862368e-06 3.882214e-06
## 2 0.0001666794 2.636257e-05 1.950937e-06 2.478307e-06 4.176860e-06
## 3 0.0001372282 2.689236e-06 1.276704e-06 1.305063e-06 5.873894e-06
## 4 0.0002100940 3.813916e-06 1.582076e-06 1.988745e-06 4.902592e-06
## 3-methyl-2-oxovalerate 3-hydroxyisobutyrate valine leucine
## 1 7.514717e-06 6.263071e-06 8.426543e-05 7.687952e-05
## 2 5.784112e-06 2.784464e-06 5.376007e-05 5.273321e-05
## 3 9.832289e-06 4.093783e-06 8.149183e-05 7.399149e-05
## 4 6.437421e-06 2.800246e-06 6.654833e-05 5.688137e-05
## isoleucine 2-hydroxybutyrate
## 1 1.584328e-05 1.076326e-05
## 2 1.230743e-05 9.433298e-06
## 3 1.906236e-05 7.830384e-06
## 4 1.341109e-05 7.831377e-06
1:4, 1:10]
data_metabolite_valid2[## Sample Diet COHORT GRP Diet_duration Age SEX formate tryptophan
## 1 K42 OMNI CZ_val OM <NA> 37.01096 M 3.411723e-06 9.782523e-06
## 2 K43 OMNI CZ_val OM <NA> 39.52603 F 3.632946e-06 7.101108e-06
## 3 K123 OMNI CZ_val OM <NA> 29.12603 M 4.213210e-06 1.087543e-05
## 4 K124 OMNI CZ_val OM <NA> 27.95616 F 3.643107e-06 9.685863e-06
## phenylalanine
## 1 1.977202e-05
## 2 2.134558e-05
## 3 2.377540e-05
## 4 1.942135e-05
%>% select(Sample, Diet, GRP)
data_metabolite_valid2 ## Sample Diet GRP
## 1 K42 OMNI OM
## 2 K43 OMNI OM
## 3 K123 OMNI OM
## 4 K124 OMNI OM
## 5 K126 OMNI OM
## 6 K127 OMNI OM
## 7 K130 OMNI OM
## 8 K131 OMNI OM
## 9 K143 OMNI OM
## 10 K144 OMNI OM
## 11 K150 OMNI OM
## 12 K151 OMNI OM
## 13 K154 OMNI OM
## 14 K155 OMNI OM
## 15 K175 OMNI OM
## 16 K176 OMNI OM
## 17 K178 OMNI OM
## 18 K179 OMNI OM
## 19 K182 OMNI OM
## 20 K183 OMNI OM
## 21 K194 OMNI OM
## 22 K195 OMNI OM
## 23 K198 OMNI OM
## 24 K199 OMNI OM
## 25 K201 OMNI OM
## 26 K202 OMNI OM
## 27 K210 OMNI OM
## 28 K222 OMNI OM
## 29 K223 OMNI OM
## 30 K230 OMNI OM
## 31 K231 OMNI OM
## 32 K234 OMNI OM
## 33 K235 OMNI OM
## 34 K238 OMNI OM
## 35 K239 OMNI OM
## 36 K253 OMNI OM
## 37 K254 OMNI OM
## 38 K266 OMNI OM
## 39 K267 OMNI OM
## 40 K269 OMNI OM
## 41 K270 OMNI OM
## 42 K298 OMNI OM
## 43 K299 OMNI OM
## 44 K309 OMNI OM
## 45 K310 OMNI OM
## 46 K318 OMNI OM
## 47 K319 OMNI OM
## 48 K329 OMNI OM
## 49 K330 OMNI OM
## 50 K4 VEGAN VN
## 51 K5 VEGAN VN
## 52 K7 VEGAN VN
## 53 K12 VEGAN VN
## 54 K13 VEGAN VN
## 55 K15 VEGAN VN
## 56 K16 VEGAN VN
## 57 K18 VEGAN VN
## 58 K19 VEGAN VN
## 59 K25 VEGAN VN
## 60 K26 VEGAN VN
## 61 K31 VEGAN VN
## 62 K32 VEGAN VN
## 63 K34 VEGAN VN
## 64 K35 VEGAN VN
## 65 K38 VEGAN VN
## 66 K39 VEGAN VN
## 67 K45 VEGAN VN
## 68 K46 VEGAN VN
## 69 K48 VEGAN VN
## 70 K49 VEGAN VN
## 71 K51 VEGAN VN
## 72 K52 VEGAN VN
## 73 K55 VEGAN VN
## 74 K61 VEGAN VN
## 75 K62 VEGAN VN
## 76 K64 VEGAN VN
## 77 K65 VEGAN VN
## 78 K70 VEGAN VN
## 79 K71 VEGAN VN
## 80 K73 VEGAN VN
## 81 K74 VEGAN VN
## 82 K81 VEGAN VN
## 83 K82 VEGAN VN
## 84 K85 VEGAN VN
## 85 K86 VEGAN VN
## 86 K92 VEGAN VN
## 87 K94 VEGAN VN
## 88 K95 VEGAN VN
## 89 K100 VEGAN VN
## 90 K101 VEGAN VN
## 91 K103 VEGAN VN
## 92 K104 VEGAN VN
## 93 K106 VEGAN VN
## 94 K107 VEGAN VN
## 95 K109 VEGAN VN
## 96 K110 VEGAN VN
## 97 K113 VEGAN VN
## 98 K114 VEGAN VN
## 99 K116 VEGAN VN
## 100 K117 VEGAN VN
## 101 K119 VEGAN VN
## 102 K120 VEGAN VN
## 103 K146 VEGAN VN
## 104 K147 VEGAN VN
## 105 K165 VEGAN VN
## 106 K166 VEGAN VN
## 107 K171 VEGAN VN
## 108 K172 VEGAN VN
## 109 K187 VEGAN VN
## 110 K188 VEGAN VN
## 111 K190 VEGAN VN
## 112 K191 VEGAN VN
## 113 K205 VEGAN VN
## 114 K206 VEGAN VN
## 115 K216 VEGAN VN
## 116 K217 VEGAN VN
## 117 K219 VEGAN VN
## 118 K220 VEGAN VN
## 119 K226 VEGAN VN
## 120 K227 VEGAN VN
## 121 K250 VEGAN VN
## 122 K251 VEGAN VN
## 123 K257 VEGAN VN
## 124 K258 VEGAN VN
## 125 K260 VEGAN VN
## 126 K261 VEGAN VN
## 127 K263 VEGAN VN
## 128 K264 VEGAN VN
## 129 K281 VEGAN VN
## 130 K282 VEGAN VN
## 131 K284 VEGAN VN
## 132 K285 VEGAN VN
## 133 K288 VEGAN VN
## 134 K289 VEGAN VN
## 135 K291 VEGAN VN
## 136 K292 VEGAN VN
## 137 K295 VEGAN VN
## 138 K296 VEGAN VN
## 139 K301 VEGAN VN
## 140 K302 VEGAN VN
%>% select(Sample, Diet, GRP, Diet_duration) %>%
data_metabolite_valid2 filter(
== 'VEGAN' & GRP == 'OM') | (Diet == 'OMNI' & GRP == 'VN')
(Diet
)## [1] Sample Diet GRP Diet_duration
## <0 rows> (or 0-length row.names)
7.2 Training cohort
7.2.1 Select data
Open code
<- data_metabolite_original2 %>%
data_analysis mutate(Country_IT = as.numeric(
::if_else(
dplyr== "IT", 0.5, -0.5
Country
)
)%>%
) filter(Diet == 'VEGAN',
!is.na(Diet_duration)) %>%
select(Sample:Group, Country_IT, all_of(diet_sensitive_metabolites$outcome))
summary(data_analysis[ , 1:11])
## Sample Country Diet COHORT
## Length:98 Length:98 Length:98 Length:98
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## GRP Diet_duration Age Sex
## Length:98 Min. : 0.600 Min. :18.25 Length:98
## Class :character 1st Qu.: 3.212 1st Qu.:27.70 Class :character
## Mode :character Median : 5.000 Median :32.56 Mode :character
## Mean : 5.472 Mean :35.00
## 3rd Qu.: 6.000 3rd Qu.:40.31
## Max. :20.000 Max. :60.70
## Group Country_IT formate
## Length:98 Min. :-0.5000 Min. :9.235e-07
## Class :character 1st Qu.:-0.5000 1st Qu.:1.265e-06
## Mode :character Median :-0.5000 Median :1.548e-06
## Mean :-0.1122 Mean :2.059e-06
## 3rd Qu.: 0.5000 3rd Qu.:3.096e-06
## Max. : 0.5000 Max. :3.915e-06
7.2.2 Define number of metabolite and covariates
Open code
<- 10
n_covarites <- ncol(data_analysis) - n_covarites n_features
7.2.3 Create empty objects
Open code
<- vector('double', n_features)
outcome
<- vector('double', n_features)
logFD_Diet_duration_inCZ <- vector('double', n_features)
logFD_Diet_duration_inIT <- vector('double', n_features)
logFD_Diet_duration_avg
<- vector('double', n_features)
logFD_ITcountry_avg <- vector('double', n_features)
diet_country_int
<- vector('double', n_features)
logFD_Age <- vector('double', n_features)
P_Age
<- vector('double', n_features)
P_Diet_duration_inCZ <- vector('double', n_features)
P_Diet_duration_inIT <- vector('double', n_features)
P_Diet_duration_avg
<- vector('double', n_features)
P_ITcountry_avg <- vector('double', n_features)
P_diet_country_int
<- vector('double', n_features)
CI_L_Diet_duration_inCZ <- vector('double', n_features)
CI_L_Diet_duration_inIT <- vector('double', n_features)
CI_L_Diet_duration_avg
<- vector('double', n_features)
CI_U_Diet_duration_inCZ <- vector('double', n_features)
CI_U_Diet_duration_inIT <- vector('double', n_features) CI_U_Diet_duration_avg
7.2.4 Estimate over outcomes
Open code
if(file.exists('gitignore/lm_results/result_metabolite_VGdur_training.Rds') == FALSE){
for (i in 1:n_features) {
## define variable
$outcome <- data_analysis[, (i + n_covarites)]
data_analysis
## fit model
<- lm(outcome ~ Country_IT * Diet_duration + Age,
model data = data_analysis %>%
mutate(Diet_duration = (Diet_duration/10),
outcome = trans_metabolite(outcome),
Age = (Age-33)/10))
## get contrast (effects of diet BY COUNTRY)
<- summary(
contrast_emm pairs(
emmeans(
model,specs = ~ Diet_duration | Country_IT,
at = list(Diet_duration = c(0, 1))
),interaction = TRUE,
adjust = "none"
),infer = c(TRUE, TRUE)
)
## save results
<- names(data_analysis)[i + n_covarites]
outcome[i]
## country and Age effect
<- summary(model)$coefficients[
logFD_ITcountry_avg[i] which(
names(model$coefficients) == "Country_IT"
1
),
]
<- summary(model)$coefficients[
P_ITcountry_avg[i] which(
names(model$coefficients) == "Country_IT"
4
),
]
<- summary(model)$coefficients[
logFD_Age[i] which(
names(model$coefficients) == "Age"
1
),
]
<- summary(model)$coefficients[
P_Age[i] which(
names(model$coefficients) == "Age"
4
),
]
## diet effect
<- confint(model)
tr
<- tr[which(row.names(tr) == 'Diet_duration'),][1]
CI_L_Diet_duration_avg[i] <- tr[which(row.names(tr) == 'Diet_duration'),][2]
CI_U_Diet_duration_avg[i]
<- summary(model)$coefficients[
logFD_Diet_duration_avg[i] which(
names(model$coefficients) == "Diet_duration"
1
),
]
<- summary(model)$coefficients[
P_Diet_duration_avg[i] which(
names(model$coefficients) == "Diet_duration"
4
),
]
<- -contrast_emm[1,3]
logFD_Diet_duration_inCZ[i] <- contrast_emm$p.value[1]
P_Diet_duration_inCZ[i] <- -contrast_emm$upper.CL[1]
CI_L_Diet_duration_inCZ[i] <- -contrast_emm$lower.CL[1]
CI_U_Diet_duration_inCZ[i]
<- -contrast_emm[2,3]
logFD_Diet_duration_inIT[i] <- contrast_emm$p.value[2]
P_Diet_duration_inIT[i] <- -contrast_emm$upper.CL[2]
CI_L_Diet_duration_inIT[i] <- -contrast_emm$lower.CL[2]
CI_U_Diet_duration_inIT[i]
## interaction
<- summary(model)$coefficients[
diet_country_int[i] which(
names(model$coefficients) == "Country_IT:Diet_duration"
1
),
]
<- summary(model)$coefficients[
P_diet_country_int[i] which(
names(model$coefficients) == "Country_IT:Diet_duration"
4
),
]
}
<- data.frame(
result_metabolite
outcome,
logFD_ITcountry_avg, P_ITcountry_avg,
logFD_Age, P_Age,
logFD_Diet_duration_avg, P_Diet_duration_avg,
logFD_Diet_duration_inCZ, P_Diet_duration_inCZ,
logFD_Diet_duration_inIT, P_Diet_duration_inIT,
diet_country_int, P_diet_country_int,
CI_L_Diet_duration_avg, CI_U_Diet_duration_avg,
CI_L_Diet_duration_inCZ, CI_U_Diet_duration_inCZ,
CI_L_Diet_duration_inIT, CI_U_Diet_duration_inIT
)
saveRDS(result_metabolite,
'gitignore/lm_results/result_metabolite_VGdur_training.Rds')
}
<- readRDS('gitignore/lm_results/result_metabolite_VGdur_training.Rds') result_metabolite
7.2.5 Show and save results
Open code
::kable(result_metabolite %>%
kableExtra::select(
dplyr
outcome,
logFD_ITcountry_avg, P_ITcountry_avg,
logFD_Diet_duration_avg, P_Diet_duration_avg,
logFD_Diet_duration_inCZ, P_Diet_duration_inCZ,
logFD_Diet_duration_inIT, P_Diet_duration_inIT,
diet_country_int, P_diet_country_int,
CI_L_Diet_duration_avg, CI_U_Diet_duration_avg,
CI_L_Diet_duration_inCZ, CI_U_Diet_duration_inCZ,
CI_L_Diet_duration_inIT, CI_U_Diet_duration_inIT%>% arrange(P_Diet_duration_avg),
) caption = "Result of linear models, modelling the log2-transformed level of given metabolite with vegan status duration (`Diet_duration`), `Country`, their interaction (`Diet_duration x Country`) and `Age` as predictors, using training data only (both Czech and Italian cohorts). Only metabolites whose log2-transformed levels differ between diet in training cohorts (FDR < 0.05, average diet effet across both countries) are shown. `logFD` prefix: denotes estimated effects (regression coefficient), i.e. expected change in log2 metabolite level per 10 years of vegan diet or age, or for country (Italy vs Czechia). `P`: p-value, `CI_L` and `CI_U`: lower and upper bounds of 95% confidence interval respectively. `avg` suffix shows effect averaged across subgroups, whereas `inCZ` and `inIT` show effect in Czech or Italian cohort respectively. Interaction effects are reported as `diet_country_int` (difference in the effect of vegan diet duration between Italian and Czech cohorts; positive values indicate a stronger effect in Italian, negative values a stronger effect in Czech cohort) and `P_diet_country_int` (its p-value). All estimates in a single row are based on a single model with interaction",
escape = FALSE
)
outcome | logFD_ITcountry_avg | P_ITcountry_avg | logFD_Diet_duration_avg | P_Diet_duration_avg | logFD_Diet_duration_inCZ | P_Diet_duration_inCZ | logFD_Diet_duration_inIT | P_Diet_duration_inIT | diet_country_int | P_diet_country_int | CI_L_Diet_duration_avg | CI_U_Diet_duration_avg | CI_L_Diet_duration_inCZ | CI_U_Diet_duration_inCZ | CI_L_Diet_duration_inIT | CI_U_Diet_duration_inIT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
citrate | 0.1143157 | 0.4930994 | 0.4022623 | 0.0110006 | 0.2482721 | 0.1130090 | 0.5562525 | 0.0282494 | 0.3079804 | 0.2687216 | 0.0943973 | 0.7101274 | -0.0598810 | 0.5564252 | 0.0606037 | 1.0519013 |
phenylalanine | 0.1657750 | 0.0166864 | -0.0801155 | 0.2099722 | -0.0086252 | 0.8922892 | -0.1516058 | 0.1412463 | -0.1429806 | 0.2101544 | -0.2061433 | 0.0459122 | -0.1347709 | 0.1175204 | -0.3545048 | 0.0512931 |
lysine | -0.2203053 | 0.0303230 | -0.0999406 | 0.2877428 | -0.0981387 | 0.2969192 | -0.1017424 | 0.5006567 | -0.0036037 | 0.9828180 | -0.2855565 | 0.0856753 | -0.2839283 | 0.0876508 | -0.4005756 | 0.1970907 |
histidine | 0.0900923 | 0.1934038 | -0.0439469 | 0.4951805 | -0.0612703 | 0.3426407 | -0.0266234 | 0.7972240 | 0.0346469 | 0.7630355 | -0.1713873 | 0.0834936 | -0.1888300 | 0.0662894 | -0.2317968 | 0.1785499 |
leucine | 0.0392781 | 0.6895034 | -0.0569630 | 0.5349177 | -0.0366777 | 0.6895893 | -0.0772482 | 0.6010874 | -0.0405705 | 0.8043271 | -0.2385798 | 0.1246538 | -0.2184645 | 0.1451090 | -0.3696431 | 0.2151466 |
isoleucine | 0.2262944 | 0.0138836 | -0.0517482 | 0.5403743 | 0.0399003 | 0.6370529 | -0.1433968 | 0.2929302 | -0.1832970 | 0.2258693 | -0.2189719 | 0.1154754 | -0.1274798 | 0.2072804 | -0.4126192 | 0.1258257 |
2-oxoisocaproate | 0.3461730 | 0.0298305 | -0.0890765 | 0.5444433 | 0.1254247 | 0.3943196 | -0.3035777 | 0.2010185 | -0.4290024 | 0.1041750 | -0.3798475 | 0.2016945 | -0.1656184 | 0.4164677 | -0.7717058 | 0.1645504 |
tyrosine | 0.0284829 | 0.7856861 | -0.0493386 | 0.6139168 | -0.1288198 | 0.1899391 | 0.0301426 | 0.8480927 | 0.1589624 | 0.3633615 | -0.2428935 | 0.1442162 | -0.3225558 | 0.0649162 | -0.2814720 | 0.3417571 |
2-hydroxybutyrate | -0.2117327 | 0.1272228 | 0.0390258 | 0.7618445 | 0.0333535 | 0.7958006 | 0.0446981 | 0.8292779 | 0.0113446 | 0.9606360 | -0.2159417 | 0.2939933 | -0.2218526 | 0.2885596 | -0.3657880 | 0.4551843 |
glycine | -0.1613635 | 0.2326582 | -0.0259395 | 0.8365036 | -0.1583859 | 0.2099492 | 0.1065069 | 0.5989009 | 0.2648928 | 0.2395579 | -0.2748467 | 0.2229678 | -0.4075260 | 0.0907543 | -0.2942225 | 0.5072363 |
formate | 1.2291700 | 0.0000000 | 0.0126308 | 0.8630555 | -0.0167243 | 0.8195217 | 0.0419858 | 0.7218102 | 0.0587101 | 0.6535405 | -0.1323831 | 0.1576446 | -0.1618738 | 0.1284252 | -0.1914799 | 0.2754516 |
asparagine | 0.0080122 | 0.9523680 | 0.0168787 | 0.8927392 | -0.1598072 | 0.2040944 | 0.1935645 | 0.3379878 | 0.3533717 | 0.1162509 | -0.2310177 | 0.2647751 | -0.4079356 | 0.0883212 | -0.2055375 | 0.5926665 |
glutamine | -0.2476564 | 0.0003372 | -0.0072353 | 0.9074647 | -0.0909137 | 0.1467927 | 0.0764431 | 0.4462767 | 0.1673568 | 0.1344316 | -0.1305073 | 0.1160367 | -0.2143011 | 0.0324737 | -0.1220193 | 0.2749055 |
glycerol | -0.2363158 | 0.2527762 | -0.0186424 | 0.9227130 | -0.0353652 | 0.8541236 | -0.0019196 | 0.9950489 | 0.0334456 | 0.9223394 | -0.3991932 | 0.3619083 | -0.4162720 | 0.3455416 | -0.6145891 | 0.6107498 |
valine | -0.0412983 | 0.6386216 | 0.0077384 | 0.9248301 | -0.0103518 | 0.8996564 | 0.0258287 | 0.8449300 | 0.0361805 | 0.8048743 | -0.1546900 | 0.1701668 | -0.1729322 | 0.1522285 | -0.2356737 | 0.2873310 |
2-oxoisovalerate | 0.0607476 | 0.6484160 | 0.0114058 | 0.9268662 | 0.1359799 | 0.2758019 | -0.1131683 | 0.5719322 | -0.2491482 | 0.2630375 | -0.2346857 | 0.2574973 | -0.1103418 | 0.3823016 | -0.5093644 | 0.2830278 |
3-hydroxyisobutyrate | 0.5652168 | 0.0075870 | -0.0094720 | 0.9610123 | 0.3205106 | 0.1008869 | -0.3394546 | 0.2780552 | -0.6599652 | 0.0588473 | -0.3932201 | 0.3742761 | -0.0635966 | 0.7046178 | -0.9572717 | 0.2783625 |
Open code
if(file.exists('gitignore/lm_results/result_metabolite_VGdur_training.csv') == FALSE){
write.table(result_metabolite,
'gitignore/lm_results/result_metabolite_VGdur_training.csv',
row.names = FALSE)
}
7.3 Validation cohort
Open code
<- data_metabolite_valid2 %>%
data_analysis_metabolite filter(Diet == 'VEGAN',
!is.na(Diet_duration)) %>%
::mutate(Diet_duration = as.numeric(as.character(Diet_duration))) %>%
dplyr::select(
dplyr
Diet_duration, Age,all_of(diet_sensitive_metabolites$outcome)
)
summary(data_analysis_metabolite[ , 1:11])
## Diet_duration Age formate phenylalanine
## Min. : 0.17 Min. :21.54 Min. :3.078e-06 Min. :1.480e-05
## 1st Qu.: 4.00 1st Qu.:31.56 1st Qu.:3.675e-06 1st Qu.:1.914e-05
## Median : 7.00 Median :34.17 Median :4.167e-06 Median :2.030e-05
## Mean : 7.19 Mean :34.57 Mean :4.149e-06 Mean :2.061e-05
## 3rd Qu.:10.00 3rd Qu.:37.22 3rd Qu.:4.458e-06 3rd Qu.:2.198e-05
## Max. :25.00 Max. :50.88 Max. :5.776e-06 Max. :3.111e-05
## histidine tyrosine glycerol
## Min. :8.295e-06 Min. :1.093e-05 Min. :1.851e-06
## 1st Qu.:1.041e-05 1st Qu.:1.569e-05 1st Qu.:4.560e-06
## Median :1.144e-05 Median :1.746e-05 Median :6.533e-06
## Mean :1.169e-05 Mean :1.764e-05 Mean :7.868e-06
## 3rd Qu.:1.285e-05 3rd Qu.:1.972e-05 3rd Qu.:9.496e-06
## Max. :1.813e-05 Max. :2.421e-05 Max. :5.184e-05
## glycine lysine asparagine
## Min. :3.435e-05 Min. :1.185e-05 Min. :2.839e-06
## 1st Qu.:6.676e-05 1st Qu.:1.719e-05 1st Qu.:5.024e-06
## Median :8.419e-05 Median :2.081e-05 Median :5.899e-06
## Mean :8.683e-05 Mean :2.201e-05 Mean :6.005e-06
## 3rd Qu.:1.024e-04 3rd Qu.:2.469e-05 3rd Qu.:6.876e-06
## Max. :1.421e-04 Max. :3.807e-05 Max. :1.028e-05
## 2-oxoisocaproate
## Min. :5.581e-06
## 1st Qu.:7.584e-06
## Median :8.774e-06
## Mean :9.136e-06
## 3rd Qu.:1.017e-05
## Max. :1.899e-05
7.3.0.1 Define number of metabolite and covariates
Open code
<- 2
n_covarites <- ncol(data_analysis_metabolite) - n_covarites n_features
7.3.0.2 Create empty objects
Open code
<- vector('double', n_features)
outcome <- vector('double', n_features)
logFD_VGduration <- vector('double', n_features)
P_VGduration <- vector('double', n_features)
logFD_Age <- vector('double', n_features)
P_Age <- vector('double', n_features)
CI_L_VGduration <- vector('double', n_features) CI_U_VGduration
7.3.0.3 Estimate over outcomes
Open code
for (i in 1:n_features) {
## define variable
$outcome <- data_analysis_metabolite[, (i + n_covarites)]
data_analysis_metabolite
## fit model
<- lm(outcome ~ Diet_duration + Age,
model data = data_analysis_metabolite %>%
mutate(Diet_duration = (Diet_duration)/10,
outcome = trans_metabolite(outcome),
Age = (Age-33)/10))
## save results
<- names(data_analysis_metabolite)[i + n_covarites]
outcome[i]
## diet effect
<- confint(model)
tr
<- tr[which(row.names(tr) == "Diet_duration"), ][1]
CI_L_VGduration[i] <- tr[which(row.names(tr) == "Diet_duration"), ][2]
CI_U_VGduration[i]
<- summary(model)$coefficients[
logFD_VGduration[i] which(
names(model$coefficients) == "Diet_duration"
1
),
]
<- summary(model)$coefficients[
P_VGduration[i] which(
names(model$coefficients) == "Diet_duration"
4
),
]
<- summary(model)$coefficients[
logFD_Age[i] which(
names(model$coefficients) == "Age"
1
),
]
<- summary(model)$coefficients[
P_Age[i] which(
names(model$coefficients) == "Age"
4
),
]
}
7.3.0.4 Results table
Open code
<- data.frame(
result_metabolite_val
outcome,
logFD_VGduration, P_VGduration,
CI_L_VGduration, CI_U_VGduration,
logFD_Age, P_Age
)
::kable(result_metabolite_val %>% arrange(P_VGduration),
kableExtracaption = "Results of linear models estimating the effect of vegan diet status duration and age on log2-trasformed metabolite levels. Only metabolites metabolites with significantly different levels between diet groups in training cohorts (FDR < 0.05, average effect across both training cohorts) were included. `logFD` denotes estimated effects (regression coefficient), i.e. expected change in log2 metabolite level per +10 years of vegan diet and age, respectively. `P`: p-value, and `CI_L` and `CI_U` represent the lower and upper bounds of the 95% confidence interval, respectively. All estimates in a single row are based on a single model"
)
outcome | logFD_VGduration | P_VGduration | CI_L_VGduration | CI_U_VGduration | logFD_Age | P_Age |
---|---|---|---|---|---|---|
valine | -0.2129785 | 0.0101981 | -0.3740814 | -0.0518756 | 0.1167345 | 0.0548426 |
glutamine | -0.1862766 | 0.0177533 | -0.3394330 | -0.0331202 | 0.0980499 | 0.0890129 |
lysine | -0.2883669 | 0.0178233 | -0.5256127 | -0.0511211 | 0.0840764 | 0.3435407 |
leucine | -0.2088764 | 0.0242676 | -0.3899040 | -0.0278487 | 0.1192279 | 0.0803475 |
glycine | 0.2873724 | 0.0291661 | 0.0298822 | 0.5448627 | -0.1421564 | 0.1415944 |
isoleucine | -0.1898327 | 0.0621485 | -0.3895297 | 0.0098643 | 0.0567564 | 0.4470274 |
2-hydroxybutyrate | -0.2585286 | 0.1101122 | -0.5769415 | 0.0598844 | 0.1207362 | 0.3110292 |
histidine | -0.0935354 | 0.1794716 | -0.2309640 | 0.0438932 | -0.0123405 | 0.8098482 |
2-oxoisovalerate | 0.1225618 | 0.2444741 | -0.0854190 | 0.3305426 | -0.0048163 | 0.9505117 |
3-hydroxyisobutyrate | 0.1733428 | 0.3357731 | -0.1828034 | 0.5294891 | 0.0445163 | 0.7377100 |
glycerol | 0.1667869 | 0.4642500 | -0.2844422 | 0.6180161 | -0.5576731 | 0.0013345 |
formate | 0.0411991 | 0.4950196 | -0.0783767 | 0.1607750 | -0.0127748 | 0.7746800 |
phenylalanine | 0.0329646 | 0.5615259 | -0.0795269 | 0.1454560 | 0.0425598 | 0.3120995 |
citrate | -0.0750249 | 0.6180556 | -0.3732167 | 0.2231669 | -0.0571242 | 0.6079397 |
tyrosine | -0.0243180 | 0.7274266 | -0.1626363 | 0.1140004 | 0.0738384 | 0.1550599 |
asparagine | 0.0205672 | 0.8156797 | -0.1543895 | 0.1955239 | 0.0684715 | 0.2958396 |
2-oxoisocaproate | 0.0135937 | 0.8932256 | -0.1872522 | 0.2144395 | 0.0170601 | 0.8199426 |
Open code
if (file.exists("gitignore/lm_results/result_metabolite_VGdur_valid.csv") == FALSE) {
write.table(result_metabolite_val,
"gitignore/lm_results/result_metabolite_VGdur_valid.csv",
row.names = FALSE
) }
7.4 Forest plot
7.4.1 Prepare data
Open code
## subset result tables
<- result_metabolite %>%
result_metabolite_subset filter(outcome %in% diet_sensitive_metabolites$outcome)
<- result_metabolite_val %>%
result_metabolite_val_subset filter(outcome %in% diet_sensitive_metabolites$outcome)
## create a data frame
<- data.frame(
data_forest outcome = rep(diet_sensitive_metabolites$outcome, 3),
beta = c(
$logFD_Diet_duration_inCZ,
result_metabolite_subset$logFD_Diet_duration_inIT,
result_metabolite_subset$logFD_VGduration
result_metabolite_val_subset
),lower = c(
$CI_L_Diet_duration_inCZ,
result_metabolite_subset$CI_L_Diet_duration_inIT,
result_metabolite_subset$CI_L_VGduration
result_metabolite_val_subset
),upper = c(
$CI_U_Diet_duration_inCZ,
result_metabolite_subset$CI_U_Diet_duration_inIT,
result_metabolite_subset$CI_U_VGduration
result_metabolite_val_subset
),dataset = c(
rep("CZ", len),
rep("IT", len),
rep("Validation", len)
)
)
<- data_forest %>%
data_forest mutate(in_winner = if_else(outcome %in% winners, TRUE, FALSE, missing = FALSE)) %>%
left_join(
%>% mutate(outcome = metabolite) %>% select(-metabolite),
elacoef by = 'outcome') %>%
mutate(outcome = factor(outcome, levels = validation_order))
7.4.1.1 Plotting
Open code
<- "forest_plot_metabolite_VGduration"
plotac <- "gitignore/figures"
path
<- c("CZ" = "#150999", "IT" = "#329243", "Validation" = "grey60")
colors
assign(plotac, ggplot(
aes(x = outcome, y = beta, ymin = lower, ymax = upper, color = dataset)
data_forest, +
) geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
geom_hline(yintercept = 0, color = "black") +
geom_errorbar(position = position_dodge(width = 0.5), width = 0.2) +
scale_color_manual(values = colors) +
labs(
y = "Effect of vegan duration (+10y) on log2 metabolite level",
x = "Outcome",
color = "Dataset"
+
) theme_minimal() +
coord_flip() +
scale_x_discrete(
labels = setNames(
ifelse(data_forest$in_winner,
paste0("**", data_forest$outcome, "**"),
as.character(data_forest$outcome)
$outcome
), data_forest
)+
) theme(
axis.text.x = element_text(size = 10),
axis.text.y = ggtext::element_markdown(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
get(plotac)
Diet_duration
, Country
, their interaction (Diet_duration x Country
), and Age
as fixed-effect predictors. Age
was included as it correlates with Diet_duration
and could act as a confounder. In the Czech validation cohort, Diet_duration
and Age
were fixed-effect predictors. Diet-sensitive metabolites are shown in bold, in the same order as in the forest plot of vegan–omnivore differencesOpen code
if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {
ggsave(
path = paste0(path),
filename = plotac,
device = "svg",
width = 9,
height = 13
) }
8 Reproducibility
Open code
sessionInfo()
## R version 4.4.3 (2025-02-28)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=cs_CZ.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=cs_CZ.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=cs_CZ.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Prague
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] vegan_2.6-6.1 lattice_0.22-5 permute_0.9-7
## [4] zCompositions_1.5.0-4 truncnorm_1.0-8 NADA_1.6-1.1
## [7] survival_3.7-0 MicrobiomeStat_1.2 glmnet_4.1-8
## [10] pROC_1.18.0 arm_1.12-2 lme4_1.1-35.5
## [13] Matrix_1.7-0 MASS_7.3-65 car_3.1-2
## [16] carData_3.0-5 emmeans_1.10.4 brms_2.21.0
## [19] Rcpp_1.0.13 rms_6.8-1 Hmisc_5.1-3
## [22] glmmTMB_1.1.9 ggtext_0.1.2 ggdist_3.3.2
## [25] cowplot_1.1.1 ggpubr_0.4.0 sjPlot_2.8.16
## [28] kableExtra_1.4.0 flextable_0.9.6 gtsummary_2.0.2
## [31] compositions_2.0-8 janitor_2.2.0 stringi_1.8.7
## [34] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1
## [37] dplyr_1.1.4 purrr_1.0.4 readr_2.1.5
## [40] tidyr_1.3.1 tibble_3.3.0 ggplot2_3.5.1
## [43] tidyverse_1.3.1 readxl_1.3.1 openxlsx_4.2.8
## [46] RJDBC_0.2-10 rJava_1.0-11 DBI_1.2.3
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.6 matrixStats_1.3.0 httr_1.4.2
## [4] insight_0.20.2 numDeriv_2016.8-1.1 tools_4.4.3
## [7] backports_1.5.0 utf8_1.2.6 sjlabelled_1.2.0
## [10] R6_2.6.1 mgcv_1.9-1 withr_3.0.2
## [13] Brobdingnag_1.2-7 prettyunits_1.2.0 gridExtra_2.3
## [16] bayesm_3.1-6 quantreg_5.98 cli_3.6.5
## [19] textshaping_0.3.6 performance_0.12.2 gt_0.11.0
## [22] officer_0.6.6 sandwich_3.0-1 sass_0.4.9
## [25] labeling_0.4.2 mvtnorm_1.1-3 robustbase_0.93-9
## [28] polspline_1.1.25 ggridges_0.5.3 askpass_1.1
## [31] QuickJSR_1.3.1 commonmark_1.9.1 systemfonts_1.0.4
## [34] StanHeaders_2.32.10 foreign_0.8-90 gfonts_0.2.0
## [37] svglite_2.1.3 rstudioapi_0.16.0 httpcode_0.3.0
## [40] generics_0.1.4 shape_1.4.6 distributional_0.4.0
## [43] zip_2.2.0 inline_0.3.19 loo_2.4.1
## [46] abind_1.4-5 lifecycle_1.0.4 multcomp_1.4-18
## [49] yaml_2.3.10 snakecase_0.11.1 grid_4.4.3
## [52] promises_1.2.0.1 crayon_1.5.3 haven_2.4.3
## [55] pillar_1.11.0 knitr_1.50 statip_0.2.3
## [58] boot_1.3-31 estimability_1.5.1 codetools_0.2-19
## [61] glue_1.7.0 V8_4.4.2 fontLiberation_0.1.0
## [64] data.table_1.15.4 vctrs_0.6.5 cellranger_1.1.0
## [67] gtable_0.3.0 assertthat_0.2.1 datawizard_0.12.2
## [70] xfun_0.52 mime_0.12 coda_0.19-4
## [73] modeest_2.4.0 timeDate_3043.102 iterators_1.0.14
## [76] statmod_1.4.36 TH.data_1.1-0 nlme_3.1-168
## [79] fontquiver_0.2.1 rstan_2.32.6 fBasics_4041.97
## [82] tensorA_0.36.2.1 TMB_1.9.14 rpart_4.1.24
## [85] colorspace_2.0-2 nnet_7.3-20 tidyselect_1.2.1
## [88] processx_3.8.4 timeSeries_4032.109 compiler_4.4.3
## [91] curl_6.4.0 rvest_1.0.2 htmlTable_2.4.0
## [94] SparseM_1.81 xml2_1.3.3 fontBitstreamVera_0.1.1
## [97] posterior_1.6.0 checkmate_2.3.2 scales_1.3.0
## [100] DEoptimR_1.0-10 callr_3.7.6 spatial_7.3-15
## [103] digest_0.6.37 minqa_1.2.4 rmarkdown_2.27
## [106] htmltools_0.5.8.1 pkgconfig_2.0.3 base64enc_0.1-3
## [109] stabledist_0.7-2 dbplyr_2.1.1 fastmap_1.2.0
## [112] rlang_1.1.6 htmlwidgets_1.6.4 shiny_1.9.1
## [115] farver_2.1.0 zoo_1.8-9 jsonlite_2.0.0
## [118] magrittr_2.0.3 Formula_1.2-4 bayesplot_1.8.1
## [121] munsell_0.5.0 gdtools_0.3.7 stable_1.1.6
## [124] plyr_1.8.6 pkgbuild_1.3.1 parallel_4.4.3
## [127] ggrepel_0.9.5 sjmisc_2.8.10 ggeffects_1.7.0
## [130] splines_4.4.3 gridtext_0.1.5 hms_1.1.3
## [133] sjstats_0.19.0 ps_1.7.7 uuid_1.0-3
## [136] markdown_1.13 ggsignif_0.6.3 stats4_4.4.3
## [139] rmutil_1.1.10 rstantools_2.1.1 crul_1.5.0
## [142] reprex_2.0.1 evaluate_1.0.4 RcppParallel_5.1.8
## [145] modelr_0.1.8 nloptr_2.0.0 tzdb_0.5.0
## [148] foreach_1.5.2 httpuv_1.6.5 cards_0.2.2
## [151] MatrixModels_0.5-3 openssl_1.4.6 clue_0.3-65
## [154] broom_1.0.6 xtable_1.8-4 rstatix_0.7.0
## [157] later_1.3.0 viridisLite_0.4.0 ragg_1.4.0
## [160] lmerTest_3.1-3 cluster_2.1.8.1 timechange_0.3.0
## [163] bridgesampling_1.1-2