Vegan-specific signature implies healthier metabolic profile: findings from diet-related multi-omics observational study based on different European populations

Statistical report for microbiome analysis (SGB level, prevalence > 10% in training dataset)


Authors and affiliations

Anna Ouradova1,*, Giulio Ferrero2,3,*, Miriam Bratova4, Nikola Daskova4, Alena Bohdanecka4,5, Klara Dohnalova6, Marie Heczkova4, Karel Chalupsky6, Maria Kralova7,8, Marek Kuzma9, István Modos4, Filip Tichanek4, Lucie Najmanova9, Barbara Pardini10, Helena Pelantová9, Sonia Tarallo10, Petra Videnska11, Jan Gojda1,#, Alessio Naccarati10,#, Monika Cahova4,#


* These authors have contributed equally to this work and share first authorship
# These authors have contributed equally to this work and share last authorship

1 Department of Internal Medicine, Kralovske Vinohrady University Hospital and Third Faculty of Medicine, Charles University, Prague, Czech Republic
2 Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
3 Department of Computer Science, University of Turin, Turin, Italy
4 Institute for Clinical and Experimental Medicine, Prague, Czech Republic
5 First Faculty of Medicine, Charles University, Prague, Czech Republic
6 Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
7 Ambis University, Department of Economics and Management, Prague, Czech Republic
8 Department of Informatics, Brno University of Technology, Brno, Czech Republic
9 Institute of Microbiology of the Czech Academy of Sciences, Prague, Czech Republic
10 Italian Institute for Genomic Medicine (IIGM), c/o IRCCS Candiolo, Turin, Italy
11 Mendel University, Department of Chemistry and Biochemistry, Brno, Czech Republic


This is a statistical report of the study A vegan diet signature from a multi-omics study on different European populations is related to favorable metabolic outcomes that is currenlty under review

When using this code or data, cite the original publication:

TO BE ADDED

BibTex citation for the original publication:

TO BE ADDED


Original GitHub repository: https://github.com/filip-tichanek/ItCzVegans

Statistical reports can be found on the reports hub.

Data analysis is described in detail in the statistical methods report.


1 Introduction

This project explores potential signatures of a vegan diet across the microbiome, metabolome, and lipidome. We used data from healthy vegan and omnivorous human subjects in two countries (Czech Republic and Italy), with subjects grouped by Country and Diet, resulting in four distinct groups.

To assess the generalizability of these findings, we validated our results with an independent cohort from the Czech Republic for external validation.

1.1 Statistical Methods

The statistical modeling approach is described in detail in this report. Briefly, the methods used included:

  • Multivariate analysis: We conducted multivariate analyses (PERMANOVA, PCA, correlation analyses) to explore the effects of diet, country, and their possible interaction (diet : country) on the microbiome, lipidome, and metabolome compositions in an integrative manner. This part of the analysis is not available on the GitHub page, but the code will be provided upon request.

  • Linear models: Linear models were applied to estimate the effects of diet, country, and their interaction (diet:country) on individual lipids, metabolites, bacterial taxa and pathways (“features”). Features that significantly differed between diet groups (based on the estimated average conditional effect of diet across both countries, adjusted for multiple comparisons with FDR < 0.05) were further examined in the independent external validation cohort to assess whether these associations were reproducible. Next, we fit linear models restricted to vegan participants to test whether omics profiles varied with the duration of vegan diet. Fixed-effect predictors were diet duration (per 10 years), country, their interaction, and age (included due to correlation with diet duration).

  • Predictive models (elastic net): We employed elastic net logistic regression (via the glmnet R package) to predict vegan status based on metabolome, lipidome, microbiome and pathways data (one model per dataset; four models in total). We considered three combinations of Lasso and Ridge penalties (alpha = 0, 0.2, 0.4). For each alpha, we selected the penalty strength (λ1se) using 10-fold cross-validation. This value corresponds to the most regularized model whose performance was within one standard error of the minimum deviance. The alpha–lambda pair with the lowest deviance was chosen to fit the final model, whose coefficients are reported.
    To estimate model performance, we repeated the full modeling procedure (including hyperparameter tuning) 500 times on bootstrap resamples of the training data. In each iteration, the model was trained on the resampled data and evaluated on the out-of-bag subjects (i.e., those not included in the training set in that iteration). The mean, and 2.5th, and 97.5th percentiles of the resulting ROC-AUC values represent the estimated out-of-sample AUC and its 95% confidence interval.
    Finally, the final model was applied to an independent validation cohort to generate predicted probabilities of vegan status. These probabilities were then used to assess external discrimination between diet groups (ROC-AUC in the independent validation cohort). The elastic net models were not intended for practical prediction, but to quantify the strength of the signal separating the dietary groups, with its uncertainty, by using all features of a given dataset jointly. It also offered a complementary perspective on which features are most clearly associated with diet

2 Initiation

2.1 Set home directory

Open code
setwd('/home/ticf/secured_data/GitRepo/ticf/478_MOCA_italian')

2.2 Upload initiation file

Open code
source('478_initiation.R')
plotuj <- FALSE

3 Data

3.1 Upload all original data

3.1.1 Training set

3.1.1.1 Connect metadata from lipidom table

Open code
training_metadata <- read.xlsx('gitignore/data/lipidome_training_cohort_rev.xlsx') %>% 
  select(Sample, Country, Diet, Diet_duration, Age) %>% 
  mutate(ID = Sample)

3.1.1.2 Connect training data

Open code
data_microbiome_original_raw  <- read.table(
  'gitignore/data/0_Data_Metaphlan4_SGB_subset.txt') 

colnames(data_microbiome_original_raw) <- data_microbiome_original_raw[1,]

data_microbiome_original_raw <- data_microbiome_original_raw %>% 
  t() %>% 
  data.frame()

colnames(data_microbiome_original_raw) <- data_microbiome_original_raw[1,]

data_microbiome_original_raw <- data_microbiome_original_raw[-1, ] %>% 
  left_join(training_metadata, by = 'ID') %>% 
  mutate(Data = 'valid', Sample = ID) %>% 
  select(Sample, Data, Diet, Country, Diet_duration, Age, everything()) %>% 
  select(-ID)

dim(data_microbiome_original_raw)
## [1] 166 686

3.1.2 Validation set

3.1.2.1 Get metadata from lipidom table

Open code
data_lipids_validation <- read.xlsx(
  'gitignore/data/lipidome_validation_cohort_rev.xlsx'
  ) %>% 
  mutate(ID = X1) %>% 
  select(ID, X2, Diet_duration, Age)

3.1.2.2 Connect validation data

Open code
data_microbiome_validation_raw <- read.table(
  'gitignore/data/0_Data_Metaphlan4_SGB_subset_validation.txt') 

colnames(data_microbiome_validation_raw) <- data_microbiome_validation_raw[1,]

data_microbiome_validation_raw <- data_microbiome_validation_raw %>% 
  t() %>% 
  data.frame()

colnames(data_microbiome_validation_raw) <- data_microbiome_validation_raw[1,]

data_microbiome_validation_raw <- data_microbiome_validation_raw[-1, ] %>% 
  mutate(
    ID= paste0("K", gsub("\\..*", "", trimws(ID)))) %>% 
  left_join(data_lipids_validation, by = 'ID') %>% 
  mutate(Data = 'valid', Sample = ID, Diet = X2) %>% 
  select(Sample, Data, Diet, Diet_duration, Age, everything()) %>% 
  select(-ID, -X2)

dim(data_microbiome_validation_raw)
## [1] 103 685

3.1.3 Get center-log transformed value

Open code

set.seed(478)

## select these with >0.1 prevalence in training and > 0 in validation datasets
names_shared <- intersect(
  colnames(
    data_microbiome_original_raw[, -c(1:6)] %>% 
      mutate(across(everything(), as.numeric)) %>% 
      select(where(~ mean(. != 0) > 0.1)) 
    ),
  colnames(
    data_microbiome_validation_raw[, -(1:5)] %>% 
      mutate(across(everything(), as.numeric)) %>% 
      select(where(~ mean(. != 0) > 0)) 
    )
  )

## Training data
metadata <- data_microbiome_original_raw[, c(
  "Sample", "Country", "Diet", "Diet_duration", "Age")]

bacteria_d <- data_microbiome_original_raw[, -c(1:6)] %>% 
  mutate(across(everything(), as.numeric)) %>% 
  select(where(~ mean(. != 0) > 0.1)) %>% 
  select(all_of(names_shared))

dim(data_microbiome_original_raw[, -c(1:6)])
## [1] 166 680
dim(bacteria_d)
## [1] 166 655

bacteria_data <- bacteria_d / rowSums(bacteria_d)
dim(bacteria_data)
## [1] 166 655

if(file.exists('gitignore/data_microbiome_SGB10_training_impCLR.csv') == FALSE){
    
bacteria_data <- lrSVD(bacteria_data, 
                        label = 0, 
                        dl = NULL,
                        z.warning = 0.9,
                        z.delete = FALSE,
                       ncp = 1)

clr_bacteria_data <- clr(bacteria_data)
data_microbiome_original <- cbind(metadata, clr_bacteria_data)

dim(data_microbiome_original)


  write.csv(data_microbiome_original ,
            'gitignore/data_microbiome_SGB10_training_impCLR.csv')
  
  saveRDS(data_microbiome_original ,
            'gitignore/data_microbiome_SGB10_training_impCLR.Rds')
  
  
}

data_microbiome_original <- readRDS(
  'gitignore/data_microbiome_SGB10_training_impCLR.Rds'
  )

data_microbiome_original[1:5, 1:10]
##   Sample Country  Diet Diet_duration      Age Alistipes_SGB2313|SGB2313
## 1    T36      CZ VEGAN           3.5 31.64932                 -1.727885
## 2    T47      CZ VEGAN           9.0 43.79178                 -2.003284
## 3    T55      CZ VEGAN          10.0 29.83014                 -2.198744
## 4    T59      CZ VEGAN           8.0 30.24658                 -1.172513
## 5    T60      CZ VEGAN           5.0 20.16986                 -1.453814
##   Bacteroides_stercoris|SGB1830 Alistipes_putredinis|SGB2318
## 1                     -2.785048                     3.096468
## 2                      5.128612                     4.807561
## 3                      2.865400                     4.658582
## 4                     -2.229675                     5.286802
## 5                      6.341534                     8.960465
##   Candidatus_Cibionibacter_quicibialis|SGB15286 Bacteroides_uniformis|SGB1836
## 1                                      7.429837                      3.509530
## 2                                      5.399752                      7.017465
## 3                                      4.575829                      4.056029
## 4                                      1.062247                      5.309661
## 5                                      6.657670                      8.961035

## Show variances of CLR proportions across samples
data_variance <- data_microbiome_original %>%
  rowwise() %>%   
  mutate(variance = var(c_across(-(Sample:Diet)))) %>%  
  ungroup() %>%       
  select(Sample, variance)  

## Look at distribution
hist(data_variance$variance)

Open code

## Validation data
metadata <- data_microbiome_validation_raw[, c(
  "Sample", "Data", "Diet", "Diet_duration", "Age")]

bacteria_d <- data_microbiome_validation_raw[, -(1:5)] %>% 
  mutate(across(everything(), as.numeric)) %>% 
  select(all_of(names_shared))

bacteria_data <- bacteria_d / rowSums(bacteria_d)
min(colSums(bacteria_data))
## [1] 7.748698e-06

if(file.exists('gitignore/data_microbiome_SGB10_validation_impCLR.csv') == FALSE){
bacteria_data <- lrSVD(bacteria_data, 
                        label = 0, 
                        dl = NULL,
                        z.warning = 0.9, 
                        z.delete = FALSE,
                       ncp = 1)



clr_bacteria_data <- clr(bacteria_data)
data_microbiome_validation <- cbind(metadata, clr_bacteria_data)

## Add Diet for K284 which has the diet missing
data_microbiome_validation[
  which(data_microbiome_validation$Sample == 'K284'), 'Diet'
  ] <- 'VEGAN'

data_microbiome_validation$`Wujia_chipingensis|SGB5111`

# data_microbiome_validation <- data_microbiome_validation %>% 
#   rename("Wujia_chipingenss|SGB5111" = "Wujia_chipingensis|SGB5111")

  write.csv(data_microbiome_validation,
            'gitignore/data_microbiome_SGB10_validation_impCLR.csv')
  
  saveRDS(data_microbiome_validation,
            'gitignore/data_microbiome_SGB10_validation_impCLR.Rds')
  
}

data_microbiome_validation <- readRDS(
  'gitignore/data_microbiome_SGB10_validation_impCLR.Rds'
  )

data_microbiome_validation[1:5, 1:9] %>% data.frame()
##   Sample  Data  Diet Diet_duration      Age Alistipes_SGB2313.SGB2313
## 1     K4 valid VEGAN          <NA> 30.98904                  1.558108
## 2     K5 valid VEGAN          <NA> 30.66575                  1.792161
## 3     K7 valid VEGAN          <NA> 32.97808                  1.996474
## 4    K12 valid VEGAN          <NA> 27.89315                  2.068428
## 5    K13 valid VEGAN             1 21.53699                  1.487370
##   Bacteroides_stercoris.SGB1830 Alistipes_putredinis.SGB2318
## 1                      3.234670                     5.044781
## 2                      2.872535                     6.346307
## 3                      4.617832                     6.151537
## 4                      4.744165                     0.735616
## 5                      6.294363                     5.927622
##   Candidatus_Cibionibacter_quicibialis.SGB15286
## 1                                      5.303553
## 2                                      6.421573
## 3                                      5.902311
## 4                                      6.398875
## 5                                      3.932408

dim(data_microbiome_validation)
## [1] 103 660

3.1.4 Merge training and validation dataset

Open code

common_microbiome <- intersect(
  colnames(data_microbiome_original), 
  colnames(data_microbiome_validation))[-c(1:4)]

tr1 <- data_microbiome_original %>% 
  mutate(Data = if_else(Country == 'CZ', 'CZ_tr', 'IT_tr')) %>% 
  select(Data, Diet, all_of(common_microbiome))

tr2 <- data_microbiome_validation %>% 
  mutate(Data = 'valid',
         Diet = Diet) %>% 
  select(Data, Diet, all_of(common_microbiome))

data_merged <- bind_rows(tr1, tr2)

3.2 Explore

3.2.0.1 Distributions - clr transformed

Open code
if (plotuj) {
  check <- data_microbiome_original %>%
    dplyr::select(-c(Sample:Diet)) %>%
    na.omit()


  size <- c(8, 7)
  par(mfrow = c(size[1], size[2]))
  par(mar = c(2, 1.5, 2, 0.5))
  set.seed(16)
  ran <- sample(1:ncol(check), size[1] * size[2], replace = FALSE)
  for (x in ran) {
    hist(check[, x],
      16,
      col = "blue",
      main = paste0(colnames(check)[x])
    )
  }
}

3.2.0.2 Taxon proportions accross groups

Open code
colo <- c("#329243", "#F9FFAF")

data_merged <- na.omit(data_merged)

if (plotuj) {
  outcomes <- common_microbiome[
    sample(
      1:length(common_microbiome), 35,
      replace = FALSE
    )
  ]

  boxplot_cond <- function(variable) {
    p <- ggboxplot(data_merged,
      x = "Diet",
      y = variable,
      fill = "Diet",
      tip.length = 0.15,
      palette = colo,
      outlier.shape = 1,
      lwd = 0.25,
      outlier.size = 0.8,
      facet.by = "Data",
      title = variable,
      ylab = "CLR(taxa proportion)"
    ) +

      theme(
        plot.title = element_text(size = 10),
        axis.title = element_text(size = 8),
        axis.text.y = element_text(size = 7),
        axis.text.x = element_blank(),
        axis.title.x = element_blank()
      )

    return(p)
  }

  # Plot all outcomes
  plots <- map(outcomes, boxplot_cond)

  # Create a matrix of plots
  plots_arranged <- ggarrange(plotlist = plots, ncol = 5, nrow = 7, common.legend = TRUE)
  plots_arranged
}

4 Linear models across taxa

We will fit a feature-specific linear model where the CLR-transformed count of bacteria reads represents the outcome variable whereas country (Italy vs Czech), diet (vegan vs omnivore), and their interaction (country:diet) all represent fixed-effects predictors. So, each model has the following form

\[ CLR({N_i}) = \alpha + \beta_{1} \times country + \beta_{2} \times diet + \beta_{3} \times country:diet + \epsilon \] where \(N_i\) is read count of \(i\)-th bacteria taxa

The variables were coded as follows: \(diet = -0.5\) for omnivores and \(diet = 0.5\) for vegans; \(country = -0.5\) for the Czech cohort and \(country = 0.5\) for the Italian cohort.
This parameterization allows us to interpret the linear model summary output as presenting the conditional effects of diet averaged across both countries and the conditional effects of country averaged across both diet groups. We will then use the emmeans package (Lenth 2024) to obtain specific estimates for the effect of diet in the Italian and Czech cohorts separately, still from a single model.

Taxa that will show a significant diet effect (average effect of diet across both countries, adjusted for multiple comparisons with FDR < 0.05) will be then visualized using a forest plot, with country-specific diet effect along with diet effect based on independent validation cohort, to evaluate how generalizable these findings are (see external validation section).

Note that p-value for avg effects are the same as produced with car::Anova(model, type = 'III').

4.1 Select data

Open code
data_analysis <- data_microbiome_original %>%
  na.omit() %>%
  dplyr::mutate(
    Diet_VEGAN = as.numeric(
      dplyr::if_else(
        Diet == "VEGAN", 0.5, -0.5
      )
    ),
    Country_IT = as.numeric(
      dplyr::if_else(
        Country == "IT", 0.5, -0.5
      )
    )
  ) %>%
  dplyr::select(
    Sample,
    Country,
    Country_IT,
    Diet,
    Diet_VEGAN,
    dplyr::everything()
  )

summary(data_analysis[ , 1:12])
##     Sample            Country            Country_IT           Diet          
##  Length:165         Length:165         Min.   :-0.50000   Length:165        
##  Class :character   Class :character   1st Qu.:-0.50000   Class :character  
##  Mode  :character   Mode  :character   Median :-0.50000   Mode  :character  
##                                        Mean   :-0.02727                     
##                                        3rd Qu.: 0.50000                     
##                                        Max.   : 0.50000                     
##    Diet_VEGAN       Diet_duration         Age        Alistipes_SGB2313|SGB2313
##  Min.   :-0.50000   Min.   : 0.000   Min.   :18.25   Min.   :-4.0449          
##  1st Qu.:-0.50000   1st Qu.: 0.000   1st Qu.:28.08   1st Qu.:-1.7471          
##  Median : 0.50000   Median : 3.000   Median :33.06   Median :-1.3158          
##  Mean   : 0.08182   Mean   : 3.195   Mean   :35.87   Mean   :-0.6187          
##  3rd Qu.: 0.50000   3rd Qu.: 5.000   3rd Qu.:42.48   3rd Qu.:-0.7346          
##  Max.   : 0.50000   Max.   :20.000   Max.   :64.10   Max.   :10.2106          
##  Bacteroides_stercoris|SGB1830 Alistipes_putredinis|SGB2318
##  Min.   :-3.4103               Min.   :-4.062              
##  1st Qu.:-2.2880               1st Qu.: 4.937              
##  Median :-0.3814               Median : 7.344              
##  Mean   : 1.7524               Mean   : 5.925              
##  3rd Qu.: 5.7035               3rd Qu.: 8.330              
##  Max.   :11.5038               Max.   :12.221              
##  Candidatus_Cibionibacter_quicibialis|SGB15286 Bacteroides_uniformis|SGB1836
##  Min.   :-0.6217                               Min.   :-2.378               
##  1st Qu.: 5.4061                               1st Qu.: 5.931               
##  Median : 6.5803                               Median : 7.437               
##  Mean   : 6.2568                               Mean   : 7.093               
##  3rd Qu.: 7.4855                               3rd Qu.: 8.472               
##  Max.   :10.4162                               Max.   :11.715

4.1.1 Define number of microbiome and covariates

Open code
n_covarites <- 7
n_features <- ncol(data_analysis) - n_covarites

4.1.2 Create empty objects

Open code
outcome <- vector('double', n_features)

est_VGdiet_inCZ <- vector('double', n_features)
est_VGdiet_inIT <- vector('double', n_features)
est_VGdiet_avg <- vector('double', n_features)

est_ITcountry_avg <- vector('double', n_features)
diet_country_int <- vector('double', n_features)


P_VGdiet_inCZ <- vector('double', n_features)
P_VGdiet_inIT <- vector('double', n_features)
P_VGdiet_avg <- vector('double', n_features)

P_ITcountry_avg <- vector('double', n_features)
P_diet_country_int <- vector('double', n_features)

CI_L_VGdiet_inCZ <- vector('double', n_features)
CI_L_VGdiet_inIT <- vector('double', n_features)
CI_L_VGdiet_avg <- vector('double', n_features)

CI_U_VGdiet_inCZ <- vector('double', n_features)
CI_U_VGdiet_inIT <- vector('double', n_features)
CI_U_VGdiet_avg <- vector('double', n_features)

4.1.3 Estimate over outcomes

Open code
if(file.exists('gitignore/result_microbiome_SGB10.Rds') == FALSE){
for (i in 1:n_features) {
  
  ## define variable
  data_analysis$outcome <- data_analysis[, (i + n_covarites)]

  ## fit model
  model <- lm(outcome ~ Country_IT * Diet_VEGAN, data = data_analysis)

  ## get contrast (effects of diet BY COUNTRY)
  contrast_emm <- summary(
    pairs(
      emmeans(
        model,
        specs = ~ Diet_VEGAN | Country_IT
        ),
      interaction = TRUE,
      adjust = "none"
      ),
    infer = c(TRUE, TRUE)
    )

  ## save results
  outcome[i] <- names(data_analysis)[i + n_covarites]
  
  ## country effect
  est_ITcountry_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT"
    ), 1
  ]

  P_ITcountry_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT"
    ), 4
  ]
  
  
  ## diet effect
  tr <- confint(model)
  
  CI_L_VGdiet_avg[i] <- tr[which(row.names(tr) == 'Diet_VEGAN'),][1]
  CI_U_VGdiet_avg[i] <- tr[which(row.names(tr) == 'Diet_VEGAN'),][2]
  
  est_VGdiet_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_VEGAN"
    ), 1
  ]

  P_VGdiet_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_VEGAN"
    ), 4
  ]
  
  est_VGdiet_inCZ[i] <- -contrast_emm[1,3]
  P_VGdiet_inCZ[i] <- contrast_emm$p.value[1]
  CI_L_VGdiet_inCZ[i] <- -contrast_emm$upper.CL[1]
  CI_U_VGdiet_inCZ[i] <- -contrast_emm$lower.CL[1]
  
  est_VGdiet_inIT[i] <- -contrast_emm[2,3]
  P_VGdiet_inIT[i] <- contrast_emm$p.value[2]
  CI_L_VGdiet_inIT[i] <- -contrast_emm$upper.CL[2]
  CI_U_VGdiet_inIT[i] <- -contrast_emm$lower.CL[2]
  
  ## interaction
  diet_country_int[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT:Diet_VEGAN"
    ), 1
  ]

  P_diet_country_int[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT:Diet_VEGAN"
    ), 4
  ]
}

  result_microbiome <- data.frame(
  outcome,
  est_ITcountry_avg, P_ITcountry_avg,
  est_VGdiet_avg, P_VGdiet_avg,
  est_VGdiet_inCZ, P_VGdiet_inCZ,
  est_VGdiet_inIT, P_VGdiet_inIT,
  diet_country_int, P_diet_country_int,
  CI_L_VGdiet_avg, CI_U_VGdiet_avg,
  CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
  CI_L_VGdiet_inIT, CI_U_VGdiet_inIT
)
  
  saveRDS(result_microbiome, 
              'gitignore/result_microbiome_SGB10.Rds')
}

result_microbiome <- readRDS('gitignore/result_microbiome_SGB10.Rds')

4.1.4 Adjust p values

Open code
result_microbiome <- result_microbiome %>% 
  dplyr::mutate(
    fdr_ITcountry_avg = p.adjust(P_ITcountry_avg, method = 'BH'),
    fdr_VGdiet_avg = p.adjust(P_VGdiet_avg, method = 'BH'),
    
    fdr_VGdiet_inCZ = p.adjust(P_VGdiet_inCZ, method = 'BH'),
    fdr_VGdiet_inIT = p.adjust(P_VGdiet_inIT, method = 'BH'),
    fdr_diet_country_int = p.adjust(P_diet_country_int, method = 'BH')
  ) %>% 
  dplyr::select(
    outcome,
    est_ITcountry_avg, P_ITcountry_avg, fdr_ITcountry_avg,
    est_VGdiet_avg, P_VGdiet_avg, fdr_VGdiet_avg,
    est_VGdiet_inCZ, P_VGdiet_inCZ, fdr_VGdiet_inCZ,
    est_VGdiet_inIT, P_VGdiet_inIT, fdr_VGdiet_inIT,
    diet_country_int, P_diet_country_int, fdr_diet_country_int,
    CI_L_VGdiet_avg, CI_U_VGdiet_avg,
    CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
    CI_L_VGdiet_inIT, CI_U_VGdiet_inIT
  )

4.1.5 Show and save results

Open code
kableExtra::kable(result_microbiome %>% filter(fdr_VGdiet_avg < 0.05),
  caption = "Results of linear models, modeling the CLR-transformed relative abundance of a given taxon as the outcome, with `Diet`, `Country`, and `Diet x Country` interaction as fixed-effect predictors. Only taxa whose CLR-transformed relative abundance differs between diets (FDR < 0.05, average diet effect across both countries) are shown. `est` prefix: denotes estimated effects (regression coefficients), i.e., how much CLR-transformed relative abundance differs in vegans compared to omnivores and in the Italian cohort vs. the Czech cohort, respectively; `P`: p-value; `fdr`: p-value after adjustment for multiple comparisons; `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval. `avg` suffix denotes effects averaged across subgroups, whereas `inCZ` and `inIT` denote effects in the Czech or Italian cohort, respectively. Interaction effects are reported as `diet_country_int` (difference in the effects of a vegan diet between the Italian and Czech cohorts; positive values indicate a stronger effect in the Italian, negative values a stronger effect in the Czech cohort) and `P_diet_country_int` (its p-value). All estimates in a single row are based on a single model")
Results of linear models, modeling the CLR-transformed relative abundance of a given taxon as the outcome, with Diet, Country, and Diet x Country interaction as fixed-effect predictors. Only taxa whose CLR-transformed relative abundance differs between diets (FDR < 0.05, average diet effect across both countries) are shown. est prefix: denotes estimated effects (regression coefficients), i.e., how much CLR-transformed relative abundance differs in vegans compared to omnivores and in the Italian cohort vs. the Czech cohort, respectively; P: p-value; fdr: p-value after adjustment for multiple comparisons; CI_L and CI_U: lower and upper bounds of the 95% confidence interval. avg suffix denotes effects averaged across subgroups, whereas inCZ and inIT denote effects in the Czech or Italian cohort, respectively. Interaction effects are reported as diet_country_int (difference in the effects of a vegan diet between the Italian and Czech cohorts; positive values indicate a stronger effect in the Italian, negative values a stronger effect in the Czech cohort) and P_diet_country_int (its p-value). All estimates in a single row are based on a single model
outcome est_ITcountry_avg P_ITcountry_avg fdr_ITcountry_avg est_VGdiet_avg P_VGdiet_avg fdr_VGdiet_avg est_VGdiet_inCZ P_VGdiet_inCZ fdr_VGdiet_inCZ est_VGdiet_inIT P_VGdiet_inIT fdr_VGdiet_inIT diet_country_int P_diet_country_int fdr_diet_country_int CI_L_VGdiet_avg CI_U_VGdiet_avg CI_L_VGdiet_inCZ CI_U_VGdiet_inCZ CI_L_VGdiet_inIT CI_U_VGdiet_inIT
GGB1420_SGB1957|SGB1957 1.4319289 0.0000546 0.0003030 -1.4396187 0.0000500 0.0022745 -0.8400132 0.0880130 0.3116857 -2.0392242 0.0000471 0.0077123 -1.1992110 0.0844620 0.5703361 -2.1216742 -0.7575632 -1.8064948 0.1264684 -3.0018832 -1.0765653
Escherichia_coli|SGB10068 0.2851163 0.5382752 0.6345904 -1.7882458 0.0001589 0.0041626 -2.0426512 0.0021561 0.0371445 -1.5338403 0.0199463 0.2214377 0.5088109 0.5828682 0.9328661 -2.7011864 -0.8753052 -3.3363000 -0.7490025 -2.8223724 -0.2453082
Bacteroides_clarus|SGB1832 1.9328621 0.0001797 0.0008289 -1.9983796 0.0001100 0.0031989 -1.8221069 0.0116575 0.1004691 -2.1746522 0.0026162 0.0795584 -0.3525453 0.7269701 0.9805656 -2.9936247 -1.0031345 -3.2323822 -0.4118317 -3.5793496 -0.7699548
Hydrogenoanaerobacterium_saccharovorans|SGB15350 1.5044079 0.0014215 0.0049448 -1.4631658 0.0019003 0.0214598 -1.4945391 0.0241604 0.1536413 -1.4317926 0.0300247 0.2731417 0.0627465 0.9461059 0.9892080 -2.3782786 -0.5480531 -2.7912658 -0.1978123 -2.7233906 -0.1401947
GGB9775_SGB15395|SGB15395 1.6466070 0.0000162 0.0001063 -1.1890855 0.0015990 0.0193953 -0.8940011 0.0904025 0.3149663 -1.4841700 0.0051047 0.1156461 -0.5901689 0.4267652 0.9030772 -1.9204687 -0.4577023 -1.9303806 0.1423785 -2.5164505 -0.4518895
Ruthenibacterium_lactatiformans|SGB15271 0.9242175 0.0034791 0.0105157 -0.9667478 0.0022687 0.0244058 -1.7480579 0.0001129 0.0050824 -0.1854377 0.6738746 0.9552084 1.5626201 0.0131614 0.3918458 -1.5821555 -0.3513401 -2.6200987 -0.8760171 -1.0540295 0.6831540
Lawsonibacter_asaccharolyticus|SGB15154 2.7259950 0.0000000 0.0000000 -1.5993741 0.0000231 0.0016836 -2.0267164 0.0001416 0.0056449 -1.1720318 0.0249317 0.2401510 0.8546846 0.2457736 0.7988807 -2.3238165 -0.8749317 -3.0532608 -1.0001720 -2.1945160 -0.1495476
Clostridium_fessum|SGB4705 -1.6111790 0.0002626 0.0011545 -1.3504185 0.0020870 0.0231696 -1.5214941 0.0138872 0.1095920 -1.1793429 0.0546582 0.3663651 0.3421512 0.6924007 0.9805656 -2.2028851 -0.4979519 -2.7294503 -0.3135378 -2.3825215 0.0238357
GGB9707_SGB15229|SGB15229 0.6838604 0.0414042 0.0844852 0.9496097 0.0048726 0.0451440 0.7241947 0.1263881 0.3782276 1.1750247 0.0133173 0.1691073 0.4508300 0.4989489 0.9032270 0.2927318 1.6064876 -0.2066099 1.6549993 0.2479016 2.1021478
Collinsella_aerofaciens|SGB14535 0.1541214 0.5211416 0.6217628 -0.7191625 0.0031260 0.0303141 -0.7026985 0.0401537 0.2104052 -0.7356265 0.0311310 0.2780840 -0.0329280 0.9453232 0.9892080 -1.1925098 -0.2458152 -1.3734378 -0.0319592 -1.4037130 -0.0675401
GGB9509_SGB14906|SGB14906 1.1273207 0.0069225 0.0186593 -1.8738306 0.0000106 0.0011605 -1.9308514 0.0011639 0.0245923 -1.8168099 0.0021176 0.0762611 0.1140415 0.8901160 0.9882489 -2.6875900 -1.0600713 -3.0839590 -0.7777438 -2.9653567 -0.6682631
GGB45432_SGB63101|SGB63101 0.7722693 0.0046470 0.0132918 -0.8741661 0.0014079 0.0179357 -1.0922149 0.0047246 0.0644715 -0.6561174 0.0859067 0.4768551 0.4360974 0.4188344 0.9030772 -1.4054316 -0.3429007 -1.8450249 -0.3394048 -1.4059499 0.0937151
GGB2653_SGB3574|SGB3574 0.0669449 0.8659025 0.9045712 -1.6310665 0.0000603 0.0022745 -2.1040281 0.0002450 0.0083391 -1.1581050 0.0397597 0.3124007 0.9459230 0.2338711 0.7988807 -2.4127083 -0.8494248 -3.2116246 -0.9964315 -2.2613208 -0.0548892
Phocea_massiliensis|SGB14837 0.5351058 0.0683260 0.1282337 -1.1542532 0.0001129 0.0031989 -1.2673611 0.0025342 0.0381311 -1.0411452 0.0123717 0.1652544 0.2262159 0.6985961 0.9805656 -1.7300758 -0.5784306 -2.0833093 -0.4514130 -1.8538662 -0.2284243
Guopingia_tenuis|SGB14127 1.9935885 0.0000000 0.0000000 -0.7391309 0.0050783 0.0455657 -0.7007529 0.0591097 0.2547162 -0.7775089 0.0357602 0.3041940 -0.0767560 0.8829099 0.9866642 -1.2529025 -0.2253592 -1.4287740 0.0272682 -1.5026505 -0.0523672
Lachnospiraceae_bacterium|SGB4953 -1.0338702 0.0219379 0.0502425 1.7643283 0.0001172 0.0031989 1.7752418 0.0056705 0.0675301 1.7534148 0.0060765 0.1159554 -0.0218269 0.9805437 0.9969953 0.8819635 2.6466931 0.5249193 3.0255642 0.5080377 2.9987920
Holdemania_filiformis|SGB4046 -0.7503859 0.0491812 0.0967377 -1.4296652 0.0002237 0.0056347 -2.3439700 0.0000223 0.0014449 -0.5153605 0.3362691 0.7701268 1.8286095 0.0168572 0.3996216 -2.1773279 -0.6820026 -3.4034177 -1.2845222 -1.5706179 0.5398969
Anaeromassilibacillus_senegalensis|SGB14894 1.0571683 0.0084418 0.0222063 -1.4342467 0.0003972 0.0081294 -1.7427021 0.0022684 0.0371445 -1.1257914 0.0458831 0.3372370 0.6169107 0.4376640 0.9030772 -2.2171229 -0.6513706 -2.8520479 -0.6333563 -2.2307495 -0.0208333
GGB3061_SGB4063|SGB4063 -0.1784761 0.5484410 0.6437793 -1.1724902 0.0001164 0.0031989 -2.3914873 0.0000001 0.0000078 0.0465070 0.9117339 0.9932053 2.4379943 0.0000635 0.0138704 -1.7585810 -0.5863994 -3.2219857 -1.5609890 -0.7807066 0.8737205
Ruminococcus_sp_AF41_9|SGB25497 -0.7637983 0.0561315 0.1087755 1.6855251 0.0000367 0.0021874 2.5935991 0.0000081 0.0006349 0.7774511 0.1672159 0.6072159 -1.8161480 0.0234799 0.4250209 0.9015151 2.4695351 1.4826467 3.7045515 -0.3291073 1.8840094
Eubacterium_ramulus|SGB4959 -2.2415625 0.0000001 0.0000011 -1.5009605 0.0002730 0.0061653 -1.9091394 0.0010384 0.0234530 -1.0927815 0.0566446 0.3710223 0.8163579 0.3129926 0.8506647 -2.2973719 -0.7045490 -3.0376648 -0.7806140 -2.2168434 0.0312803
GGB3000_SGB3991|SGB3991 -0.0498718 0.8832405 0.9182897 -0.9667842 0.0049219 0.0451440 -0.6402335 0.1845367 0.4586039 -1.2933349 0.0076151 0.1246977 -0.6531014 0.3369152 0.8525047 -1.6363300 -0.2972384 -1.5889887 0.3085217 -2.2383376 -0.3483323
Youxingia_wuxianensis|SGB82503 0.3466622 0.3249701 0.4335141 -1.1371719 0.0014584 0.0180232 -1.8545785 0.0002674 0.0083391 -0.4197654 0.3982314 0.8100670 1.4348131 0.0426614 0.5291961 -1.8305594 -0.4437844 -2.8371176 -0.8720393 -1.3984184 0.5588876
Blautia_massiliensis|SGB4826 -3.8343857 0.0000000 0.0000000 -1.9262500 0.0005584 0.0104528 -1.9864840 0.0112976 0.0986659 -1.8660161 0.0167667 0.1996761 0.1204679 0.9124537 0.9892080 -3.0064692 -0.8460308 -3.5171687 -0.4557992 -3.3906467 -0.3413855
GGB38744_SGB14842|SGB14842 0.6031028 0.0173567 0.0411908 -1.0940310 0.0000231 0.0016836 -0.5164323 0.1482621 0.4159522 -1.6716296 0.0000051 0.0017026 -1.1551972 0.0226017 0.4250209 -1.5894790 -0.5985829 -1.2184887 0.1856240 -2.3709091 -0.9723500
Dorea_formicigenerans|SGB4575 -3.7022219 0.0000000 0.0000000 -0.9355114 0.0027729 0.0279419 -1.3628513 0.0021159 0.0371445 -0.5081715 0.2439269 0.7132685 0.8546798 0.1670255 0.7047430 -1.5434766 -0.3275462 -2.2243460 -0.5013567 -1.3662587 0.3499158
GGB9765_SGB15382|SGB15382 1.0673469 0.0011491 0.0040683 -1.0948838 0.0008613 0.0137595 -0.8888946 0.0534306 0.2397056 -1.3008730 0.0048154 0.1156461 -0.4119784 0.5237822 0.9032270 -1.7315652 -0.4582025 -1.7910804 0.0132911 -2.1994905 -0.4022555
GGB9770_SGB15390|SGB15390 1.1985452 0.0018894 0.0062189 -1.1558196 0.0027049 0.0276831 -1.2498933 0.0213176 0.1410406 -1.0617458 0.0490796 0.3494251 0.1881474 0.8044726 0.9863326 -1.9050092 -0.4066299 -2.3115048 -0.1882818 -2.1191585 -0.0043332
GGB34900_SGB14891|SGB14891 0.2152475 0.4304178 0.5474246 -0.9159150 0.0009616 0.0143146 -0.9716952 0.0127697 0.1058753 -0.8601348 0.0265929 0.2488338 0.1115605 0.8379517 0.9863326 -1.4536634 -0.3781666 -1.7336918 -0.2096987 -1.6191174 -0.1011521
Mediterraneibacter_faecis|SGB4563 -2.7423588 0.0000000 0.0000000 -1.2812291 0.0016745 0.0197177 -1.4053831 0.0143829 0.1121521 -1.1570751 0.0424526 0.3196146 0.2483080 0.7571525 0.9839977 -2.0727745 -0.4896837 -2.5270133 -0.2837529 -2.2742690 -0.0398812
Ruminococcus_torques|SGB4608 -2.4739912 0.0000004 0.0000037 -3.2736948 0.0000000 0.0000000 -4.5057027 0.0000000 0.0000001 -2.0416869 0.0023286 0.0762611 2.4640158 0.0092320 0.3779338 -4.1969767 -2.3504129 -5.8140053 -3.1974001 -3.3448149 -0.7385589
Clostridium_sp_AM29_11AC|SGB4701 -0.4618260 0.0573963 0.1105723 -0.8479677 0.0005726 0.0104528 -1.1204428 0.0012852 0.0261310 -0.5754926 0.0929931 0.4953435 0.5449502 0.2604760 0.8120367 -1.3244701 -0.3714652 -1.7956530 -0.4452325 -1.2480322 0.0970471
Lachnoclostridium_sp_An138|SGB4774 0.6048418 0.0372379 0.0782846 -0.8925786 0.0022861 0.0244058 -0.5215291 0.2030198 0.4858377 -1.2636282 0.0022183 0.0762611 -0.7420991 0.1993789 0.7592625 -1.4612115 -0.3239458 -1.3272892 0.2842311 -2.0662014 -0.4610550
GGB9782_SGB15403|SGB15403 -0.0300356 0.9226094 0.9457107 -1.0147510 0.0012421 0.0166385 -1.0541359 0.0170846 0.1257348 -0.9753662 0.0265480 0.2488338 0.0787697 0.8986354 0.9882489 -1.6243573 -0.4051447 -1.9179560 -0.1903158 -1.8357697 -0.1149627
Agathobaculum_butyriciproducens|SGB14991 -0.2565420 0.6019700 0.6881158 1.6523975 0.0009534 0.0143146 1.5020445 0.0323041 0.1793149 1.8027505 0.0101347 0.1443100 0.3007060 0.7597842 0.9847527 0.6829756 2.6218193 0.1283611 2.8757279 0.4345002 3.1710007
Clostridium_SGB48024|SGB48024 0.3647124 0.4433406 0.5595146 1.6969981 0.0004616 0.0091613 1.7940744 0.0084181 0.0810863 1.5999218 0.0180776 0.2077339 -0.1941526 0.8381881 0.9863326 0.7597490 2.6342472 0.4659802 3.1221686 0.2770804 2.9227631
GGB9301_SGB14262|SGB14262 -0.6057143 0.0087942 0.0230408 -0.7214969 0.0018892 0.0214598 -1.0116558 0.0021015 0.0371445 -0.4313379 0.1827167 0.6245987 0.5803179 0.2057198 0.7662822 -1.1724873 -0.2705064 -1.6507152 -0.3725965 -1.0678696 0.2051938
GGB9531_SGB14932|SGB14932 0.8221633 0.0125130 0.0315231 -1.2192043 0.0002504 0.0058583 -1.1578934 0.0130516 0.1067372 -1.2805152 0.0059578 0.1159554 -0.1226219 0.8508408 0.9863326 -1.8620416 -0.5763670 -2.0688022 -0.2469845 -2.1878213 -0.3732092
Oscillibacter_valericigenes|SGB15076 2.2580757 0.0000000 0.0000003 -1.5899899 0.0000625 0.0022745 -0.6063213 0.2701780 0.5556680 -2.5736585 0.0000052 0.0017026 -1.9673372 0.0119123 0.3918458 -2.3536818 -0.8262980 -1.6884827 0.4758401 -3.6515397 -1.4957772
GGB9781_SGB66170|SGB66170 0.6273651 0.0125706 0.0315469 -0.8169176 0.0012447 0.0166385 -0.2531851 0.4732767 0.7469788 -1.3806502 0.0001232 0.0161440 -1.1274651 0.0246577 0.4250209 -1.3077715 -0.3260638 -0.9487314 0.4423612 -2.0734455 -0.6878549
GGB9616_SGB15052|SGB15052 0.2265227 0.5569800 0.6503064 1.4021170 0.0003630 0.0076691 1.1112031 0.0432360 0.2161802 1.6930308 0.0021665 0.0762611 0.5818277 0.4508340 0.9032270 0.6420630 2.1621710 0.0341967 2.1882096 0.6202842 2.7657775
GGB9719_SGB15273|SGB15273 0.8922234 0.0029905 0.0093345 -1.0846079 0.0003359 0.0073333 -0.8610199 0.0416881 0.2128998 -1.3081959 0.0020650 0.0762611 -0.4471760 0.4510845 0.9032270 -1.6690874 -0.5001285 -1.6892350 -0.0328049 -2.1331353 -0.4832566
GGB58158_SGB79798|SGB79798 0.1593948 0.6041104 0.6893594 -0.9500524 0.0023102 0.0244058 -1.4215299 0.0013165 0.0261310 -0.4785750 0.2707371 0.7480135 0.9429550 0.1263283 0.6652629 -1.5559443 -0.3441606 -2.2800867 -0.5629732 -1.3337359 0.3765860
Enterocloster_hominis|SGB4721 -0.3276233 0.3731204 0.4858725 -1.7193896 0.0000059 0.0009605 -2.2601527 0.0000243 0.0014449 -1.1786265 0.0241348 0.2389575 1.0815263 0.1423888 0.6732469 -2.4438001 -0.9949791 -3.2866519 -1.2336535 -2.2010657 -0.1561873
Streptococcus_thermophilus|SGB8002 -1.8386887 0.0000000 0.0000001 -2.6383317 0.0000000 0.0000000 -4.3108644 0.0000000 0.0000000 -0.9657989 0.0223330 0.2321919 3.3450656 0.0000001 0.0000492 -3.2240986 -2.0525647 -5.1409039 -3.4808250 -1.7925554 -0.1390424
Blautia_obeum|SGB4810 -1.1383366 0.0029791 0.0093345 1.5935042 0.0000404 0.0022062 2.4572472 0.0000087 0.0006349 0.7297612 0.1726515 0.6179602 -1.7274860 0.0234188 0.4250209 0.8481024 2.3389060 1.4010031 3.5134913 -0.3223053 1.7818277
GGB6601_SGB9333|SGB9333 0.4822516 0.2651432 0.3710871 -1.3060574 0.0028644 0.0284265 -1.7229312 0.0054173 0.0667807 -0.8891836 0.1460214 0.5761689 0.8337476 0.3351856 0.8525047 -2.1577352 -0.4543795 -2.9297698 -0.5160925 -2.0912490 0.3128818
GGB51884_SGB49168|SGB49168 0.2416861 0.4524916 0.5643997 1.0547797 0.0012444 0.0166385 1.0263175 0.0253612 0.1573078 1.0832419 0.0179329 0.2077339 0.0569244 0.9294397 0.9892080 0.4210183 1.6885411 0.1282693 1.9243657 0.1887457 1.9777381
Lachnospiraceae_bacterium_OF09_6|SGB4966 -0.5582422 0.0094973 0.0246855 0.8470612 0.0001027 0.0031989 1.1090302 0.0003173 0.0090367 0.5850921 0.0529900 0.3663651 -0.5239381 0.2197875 0.7798190 0.4271049 1.2670174 0.5139467 1.7041137 -0.0076377 1.1778219
GGB3740_SGB5076|SGB5076 -0.7648294 0.0481240 0.0955189 1.3485531 0.0005786 0.0104528 1.9636392 0.0004111 0.0112206 0.7334671 0.1779202 0.6231964 -1.2301720 0.1112173 0.6121928 0.5901109 2.1069954 0.8889166 3.0383617 -0.3370047 1.8039390
GGB4596_SGB6358|SGB6358 -0.0686126 0.8863782 0.9200915 1.6018760 0.0010368 0.0150910 2.2961170 0.0009099 0.0212847 0.9076349 0.1816979 0.6245987 -1.3884821 0.1495398 0.6849552 0.6551087 2.5486432 0.9545354 3.6376986 -0.4286404 2.2439103
Roseburia_lenta|SGB4957 -1.2101317 0.0000273 0.0001701 1.5546362 0.0000001 0.0000253 2.5269764 0.0000000 0.0000004 0.5822959 0.1428125 0.5761689 -1.9446805 0.0006665 0.0630274 1.0013719 2.1079004 1.7429937 3.3109591 -0.1985860 1.3631778
GGB3288_SGB4342|SGB4342 -0.2254480 0.5526037 0.6475053 1.3280069 0.0005905 0.0104528 1.6574008 0.0023775 0.0379599 0.9986130 0.0636274 0.4007304 -0.6587878 0.3858695 0.8722610 0.5798854 2.0761284 0.5973029 2.7174987 -0.0572920 2.0545181
Clostridium_methylpentosum|SGB14962 -0.2057003 0.1942039 0.2852097 0.5040337 0.0016858 0.0197177 0.6029955 0.0077422 0.0777972 0.4050719 0.0707806 0.4367662 -0.1979236 0.5314189 0.9088234 0.1924399 0.8156275 0.1614630 1.0445280 -0.0347143 0.8448581
Clostridium_SGB4751|SGB4751 -0.6597956 0.0123001 0.0311063 0.8542261 0.0012805 0.0167745 1.5526183 0.0000432 0.0023606 0.1558339 0.6723502 0.9552084 -1.3967843 0.0081258 0.3779338 0.3396190 1.3688332 0.8234133 2.2818233 -0.5704869 0.8821548
GGB4604_SGB6371|SGB6371 -0.4755155 0.1600789 0.2449806 1.1076774 0.0012408 0.0166385 1.8861285 0.0001164 0.0050824 0.3292262 0.4897372 0.8628944 -1.5569024 0.0221334 0.4250209 0.4423075 1.7730472 0.9432908 2.8289663 -0.6098825 1.2683348
GGB9545_SGB14952|SGB14952 -0.2047745 0.4620639 0.5688945 -0.9693579 0.0006237 0.0107498 -0.9846295 0.0133625 0.1067372 -0.9540863 0.0160424 0.1982593 0.0305432 0.9562227 0.9920937 -1.5178948 -0.4208210 -1.7619134 -0.2073455 -1.7282959 -0.1798766
GGB3363_SGB4448|SGB4448 -0.0072223 0.9811593 0.9887067 0.8734580 0.0047915 0.0451440 1.1625032 0.0079741 0.0779558 0.5844128 0.1769958 0.6231964 -0.5780904 0.3452676 0.8525047 0.2704404 1.4764756 0.3080194 2.0169870 -0.2666914 1.4355170
GGB9634_SGB15097|SGB15097 0.6235922 0.0993983 0.1709022 1.3089666 0.0006478 0.0108798 1.0321693 0.0546291 0.2401482 1.5857639 0.0032679 0.0891857 0.5535946 0.4630128 0.9032270 0.5659230 2.0520101 -0.0207332 2.0850717 0.5370259 2.6345019
Clostridiaceae_unclassified_SGB4771|SGB4771 -0.0644846 0.8928803 0.9239125 1.9753772 0.0000578 0.0022745 1.6075161 0.0188364 0.1316514 2.3432383 0.0006624 0.0516373 0.7357222 0.4427805 0.9032270 1.0311990 2.9195555 0.2696032 2.9454291 1.0106171 3.6758596
GGB3627_SGB4906|SGB4906 -0.1978070 0.5137310 0.6197025 0.9810423 0.0014239 0.0179357 0.4893157 0.2549213 0.5369277 1.4727690 0.0007095 0.0516373 0.9834533 0.1056955 0.6044502 0.3841932 1.5778915 -0.3564274 1.3350588 0.6303710 2.3151670
GGB4682_SGB6472|SGB6472 -0.3788127 0.1653363 0.2515762 1.1201390 0.0000602 0.0022745 1.4981716 0.0001465 0.0056449 0.7421063 0.0548149 0.3663651 -0.7560653 0.1662035 0.7047430 0.5833697 1.6569083 0.7375625 2.2587807 -0.0154944 1.4997071
GGB3491_SGB4665|SGB4665 -1.4245623 0.0002684 0.0011633 1.1811718 0.0023606 0.0245432 1.5457577 0.0048927 0.0654019 0.8165859 0.1321344 0.5620002 -0.7291719 0.3416679 0.8525047 0.4262265 1.9361171 0.4759904 2.6155251 -0.2489503 1.8821221
Lachnospiraceae_bacterium_OM04_12BH|SGB4893 -0.5399399 0.1494476 0.2336234 1.5007056 0.0000873 0.0030084 1.8062837 0.0007950 0.0192866 1.1951275 0.0244430 0.2389575 -0.6111562 0.4135852 0.9029943 0.7645336 2.2368776 0.7631183 2.8494491 0.1560881 2.2341670
GGB9616_SGB15051|SGB15051 0.0855806 0.8487542 0.8923500 1.5428945 0.0007315 0.0119776 1.1902810 0.0626233 0.2614302 1.8955080 0.0031534 0.0891857 0.7052270 0.4324207 0.9030772 0.6581212 2.4276678 -0.0634543 2.4440163 0.6467315 3.1442846
Clostridium_sp_AF32_12BH|SGB4710 -0.5495881 0.0149283 0.0364852 0.9347848 0.0000468 0.0022745 1.6313134 0.0000007 0.0000691 0.2382561 0.4508937 0.8462332 -1.3930574 0.0021545 0.1763958 0.4936932 1.3758763 1.0062809 2.2563459 -0.3843043 0.8608165
Butyricicoccus_intestinisimiae|SGB14980 -0.0672104 0.8306405 0.8769598 1.1788153 0.0002397 0.0058146 1.4023294 0.0019194 0.0359211 0.9553013 0.0324590 0.2834754 -0.4470281 0.4772281 0.9032270 0.5592517 1.7983790 0.5243996 2.2802592 0.0808439 1.8297587
Kineothrix_SGB4886|SGB4886 -0.2963857 0.4607430 0.5683364 1.8346091 0.0000094 0.0011605 2.1206344 0.0002616 0.0083391 1.5485839 0.0068939 0.1177032 -0.5720505 0.4765385 0.9032270 1.0430109 2.6262074 0.9989293 3.2423394 0.4313154 2.6658524
Blautia_SGB4833|SGB4833 -0.5845526 0.0142412 0.0350676 0.7945741 0.0009460 0.0143146 1.2653662 0.0002161 0.0078619 0.3237820 0.3322600 0.7701268 -0.9415842 0.0476409 0.5291961 0.3287364 1.2604117 0.6052681 1.9254642 -0.3337053 0.9812692
GGB3490_SGB4664|SGB4664 -1.0161094 0.0017769 0.0059080 0.9107708 0.0049624 0.0451440 1.6723066 0.0003049 0.0090367 0.1492350 0.7412832 0.9655910 -1.5230716 0.0183871 0.4152943 0.2794101 1.5421315 0.7776602 2.5669530 -0.7418729 1.0403429
Clostridiaceae_bacterium_CLA_AA_H274|SGB4704 -0.5586082 0.0165850 0.0397919 0.9909003 0.0000301 0.0019743 1.7004673 0.0000006 0.0000649 0.2813332 0.3889158 0.8021211 -1.4191341 0.0024706 0.1798027 0.5352638 1.4465368 1.0548245 2.3461102 -0.3617560 0.9244224
Lactococcus_lactis|SGB7985 -1.3914763 0.0000000 0.0000000 -0.8685948 0.0000166 0.0015579 -1.7399436 0.0000000 0.0000005 0.0027540 0.9920545 0.9978148 1.7426976 0.0000157 0.0051437 -1.2549385 -0.4822512 -2.2873977 -1.1924896 -0.5425347 0.5480427
GGB4569_SGB6312|SGB6312 -0.4496405 0.1512086 0.2357020 0.9348036 0.0031471 0.0303141 1.0716343 0.0163903 0.1219959 0.7979729 0.0716452 0.4367662 -0.2736614 0.6613542 0.9762868 0.3190793 1.5505280 0.1991449 1.9441238 -0.0710657 1.6670115
Open code

if(file.exists('gitignore/lm_results/result_microbiome_SGB10.csv') == FALSE){
  write.table(result_microbiome, 
              'gitignore/lm_results/result_microbiome_SGB10.csv', row.names = FALSE)
}

dim(result_microbiome %>% filter(fdr_VGdiet_avg < 0.05))
## [1] 73 22

5 Elastic net

To assess the predictive power of microbiome features to discriminate between diet strategy, we employed Elastic Net logistic regression.

As we expected very high level of co-linearity, we allowed \(alpha\) to rather small (0, 0.2 or 0.4).

The performance of the predictive models was evaluated through their capacity of discriminate between vegan and omnivore diets, using out-of-sample area under ROC curve (AUC; estimated with out-of-bag bootstrap) as the measure of discriminatory capacity.

All features were transformed by 2 standard deviations (resulting in standard deviation of 0.5).

5.1 Prepare data for glmnet

Open code
data_microbiome_glmnet <- data_microbiome_original %>%
  na.omit() %>%
  dplyr::mutate(
    vegan = as.numeric(
      dplyr::if_else(
        Diet == "VEGAN", 1, 0
      )
    ),
    dplyr::across(
      `Bacteroides_stercoris|SGB1830`:`Clostridiaceae_unclassified_SGB4771|SGB4771`, 
      ~ arm::rescale(.)
    )
  ) %>%
  dplyr::select(
    vegan,Country,
    dplyr::everything()
  ) %>%
  dplyr::select(
    Sample, vegan, Country,
    `Bacteroides_stercoris|SGB1830`:`Clostridiaceae_unclassified_SGB4771|SGB4771`
  )

5.2 Fit model

Open code
modelac <- "elanet_microbiome_SGB10"

assign(
  modelac,
  run(
    expr = clust_glmnet_sep(
      data = data_microbiome_glmnet,
      outcome = "vegan",
      clust_id = "Sample",
      sample_method = "oos_boot",
      N = 500,
      alphas = c(0, 0.2, 0.4),
      family = "binomial",
      seed = 478
    ),
    path = paste0("gitignore/run/", modelac)
  )
)

5.3 Model summary

Open code
elanet_microbiome_SGB10$model_summary
##   alpha     lambda auc auc_OutOfSample auc_oos_CIL auc_oos_CIU accuracy
## 1   0.4 0.01486299   1       0.9162559    0.846903   0.9756075        1
##   accuracy_OutOfSample accuracy_oos_CIL accuracy_oos_CIU
## 1            0.8313312        0.7377049        0.9200269
elanet_microbiome_SGB10$country_AUC
##   auc_OutOfSample_IT auc_oos_CIL_IT auc_oos_CIU_IT auc_OutOfSample_CZ
## 1          0.8473722      0.7057108      0.9727431           0.961881
##   auc_oos_CIL_CZ auc_oos_CIU_CZ
## 1      0.8905114              1

5.4 ROC curve - internal validation

Open code
elanet_microbiome_SGB10$plot

Receiver operating characteristic (ROC) curves from a random subset of out-of-bag bootstrap iterations, illustrating the model’s ability to discriminate between vegan and omnivore status based on standardized CLR-transformed count of bacteria reads. Each curve corresponds to an elastic net model trained on a bootstrap resample and evaluated on the subjects not included in that iteration (i.e., out-of-bag samples). The curve plots the true positive rate (sensitivity) against the false positive rate (1 – specificity) across thresholds of predicted vegan probability. The area under the curve (AUC) quantifies overall discriminatory performance, with values closer to 1 indicating stronger separation between groups.

5.5 Estimated coefficients

Open code
data.frame(
  microbiome = row.names(
    elanet_microbiome_SGB10$betas
    )[
      which(
        abs(
          elanet_microbiome_SGB10$betas
          )>0
        )
      ],
  beta = elanet_microbiome_SGB10$betas[
    abs(
      elanet_microbiome_SGB10$betas
      )>0
    ]
  ) %>% 
  mutate(
    is_in_ExtValCoh = if_else(
      microbiome %in% names(data_microbiome_validation),
      1, 0
      )
    )
##                                           microbiome         beta
## 1                                        (Intercept)  0.610441581
## 2                          GGB9602_SGB15031|SGB15031 -0.070574298
## 3                 Parabacteroides_distasonis|SGB1934  0.087653621
## 4                          GGB9636_SGB15107|SGB15107 -0.168074116
## 5              Faecalibacterium_prausnitzii|SGB15339  0.025794395
## 6                 Faecalibacterium_SGB15346|SGB15346  0.062953892
## 7                            GGB1420_SGB1957|SGB1957 -0.272563443
## 8                      Bacteroides_eggerthii|SGB1829 -0.118990440
## 9                          Escherichia_coli|SGB10068 -0.071177360
## 10                           GGB3175_SGB4191|SGB4191 -0.099298405
## 11   Oscillospiraceae_bacterium_CLA_AA_H250|SGB14861 -0.115460429
## 12                    Bacteroides_finegoldii|SGB1862 -0.162879455
## 13                 Lachnospira_pectinoschiza|SGB5075 -0.365148956
## 14                        Bacteroides_clarus|SGB1832 -0.289508943
## 15                    Anaerotignum_faecicola|SGB5190 -0.286519699
## 16  Hydrogenoanaerobacterium_saccharovorans|SGB15350 -0.232108131
## 17                        Bacteroides_faecis|SGB1860  0.045904035
## 18                         Roseburia_hominis|SGB4936  0.015862451
## 19                         GGB9712_SGB15244|SGB15244 -0.097891044
## 20                        Bacteroides_ovatus|SGB1871  0.183034474
## 21                         GGB9614_SGB15049|SGB15049  0.029258824
## 22              Barnesiella_intestinihominis|SGB1965 -0.007740484
## 23          Agathobaculum_butyriciproducens|SGB14993  0.032484900
## 24                           GGB6649_SGB9391|SGB9391  0.344150062
## 25                       Ruminococcus_bromii|SGB4285  0.046253755
## 26           Lawsonibacter_asaccharolyticus|SGB15154 -0.030678589
## 27                  Phocaeicola_massiliensis|SGB1812 -0.073558903
## 28                        Clostridium_fessum|SGB4705 -0.381734689
## 29                     Clostridium_sp_AF36_4|SGB4644  0.194737209
## 30                         GGB9707_SGB15229|SGB15229  0.037661768
## 31                   Roseburia_inulinivorans|SGB4940 -0.151998114
## 32                         Phocaeicola_dorei|SGB1815 -0.063680552
## 33                         GGB9502_SGB14899|SGB14899 -0.089915990
## 34       Lachnospiraceae_bacterium_AM48_27BH|SGB4706  0.060701908
## 35                         GGB9619_SGB15067|SGB15067 -0.106146483
## 36              Bifidobacterium_catenulatum|SGB17241 -0.420349718
## 37                         GGB9509_SGB14906|SGB14906 -0.358523335
## 38  Clostridiaceae_bacterium_Marseille_Q4143|SGB4768 -0.055607095
## 39                           GGB2653_SGB3574|SGB3574 -0.159107570
## 40                  Bacteroides_intestinalis|SGB1846  0.039513577
## 41                 Lachnospiraceae_bacterium|SGB4781 -0.098557340
## 42                      Phocea_massiliensis|SGB14837 -0.170687846
## 43        Bifidobacterium_pseudocatenulatum|SGB17237 -0.046789608
## 44                         Guopingia_tenuis|SGB14127 -0.095718417
## 45                           Blautia_SGB4815|SGB4815 -0.108580298
## 46                     Holdemania_filiformis|SGB4046 -0.277067990
## 47       Anaeromassilibacillus_senegalensis|SGB14894 -0.040823316
## 48         Senegalimassilia_anaerobia|SGB14824_group -0.051929087
## 49                           GGB3061_SGB4063|SGB4063 -0.011869279
## 50                   Ruminococcus_sp_AF41_9|SGB25497  0.317161646
## 51                       Eubacterium_ramulus|SGB4959 -0.256755135
## 52                  Enterocloster_lavalensis|SGB4725  0.035375597
## 53                    Youxingia_wuxianensis|SGB82503 -0.032674700
## 54                         GGB9611_SGB15045|SGB15045 -0.182649714
## 55                       Bianquea_renquensis|SGB4438  0.102744640
## 56                        GGB38744_SGB14842|SGB14842 -0.500222676
## 57                     Dorea_formicigenerans|SGB4575 -0.027329972
## 58                         GGB9765_SGB15382|SGB15382 -0.161905786
## 59                        Bacteroides_nordii|SGB1858  0.018517003
## 60                      Coprobacter_secundus|SGB1962 -0.057869885
## 61                           GGB3034_SGB4030|SGB4030 -0.184214318
## 62                        GGB34900_SGB14891|SGB14891 -0.074206357
## 63                           GGB3510_SGB4687|SGB4687  0.179292645
## 64                 Mediterraneibacter_faecis|SGB4563 -0.448773880
## 65                      Ruminococcus_torques|SGB4608 -0.680187035
## 66                         Coprococcus_comes|SGB4577 -0.120546773
## 67                    Eubacterium_ventriosum|SGB5045  0.278518059
## 68                  Clostridium_sp_AM29_11AC|SGB4701 -0.015338712
## 69                         GGB9715_SGB15260|SGB15260  0.268211671
## 70                         GGB9758_SGB15368|SGB15368 -0.193072102
## 71                           GGB4689_SGB6486|SGB6486  0.096498546
## 72                           GGB2996_SGB3983|SGB3983  0.124127412
## 73                           GGB6613_SGB9347|SGB9347 -0.214654462
## 74                           GGB3304_SGB4367|SGB4367  0.013334299
## 75                         GGB9782_SGB15403|SGB15403 -0.216010736
## 76          Agathobaculum_butyriciproducens|SGB14991  0.064163130
## 77                         GGB9790_SGB15413|SGB15413  0.011216154
## 78                    Clostridium_sp_AF12_28|SGB4715 -0.008211546
## 79                         GGB9115_SGB14048|SGB14048 -0.147380413
## 80                         GGB9642_SGB15119|SGB15119 -0.023157832
## 81                         GGB9301_SGB14262|SGB14262 -0.152877745
## 82                         GGB9238_SGB14180|SGB14180 -0.008415888
## 83                         GGB9531_SGB14932|SGB14932 -0.266110535
## 84              Oscillibacter_valericigenes|SGB15076 -0.375649223
## 85                         GGB9781_SGB66170|SGB66170 -0.269724544
## 86                         GGB9524_SGB14924|SGB14924  0.019605114
## 87                           Segatella_copri|SGB1626  0.034112520
## 88         Mediterraneibacter_butyricigenes|SGB25493  0.077104734
## 89                         GGB9719_SGB15273|SGB15273 -0.359498807
## 90                  Eubacteriaceae_bacterium|SGB5043 -0.095616540
## 91                     Enterocloster_hominis|SGB4721 -0.318374333
## 92                      Blautia_sp_OF03_15BH|SGB4779 -0.201845304
## 93                         GGB9110_SGB14043|SGB14043 -0.178628970
## 94                         GGB9695_SGB15210|SGB15210 -0.077590685
## 95                Streptococcus_thermophilus|SGB8002 -1.014644514
## 96                            GGB781_SGB1024|SGB1024  0.065530468
## 97                           GGB1627_SGB2230|SGB2230 -0.009525388
## 98            Christensenella_hongkongensis|SGB14143  0.015072123
## 99                           GGB6612_SGB9346|SGB9346 -0.071779934
## 100                            Blautia_obeum|SGB4810  0.235493349
## 101                          GGB3480_SGB4648|SGB4648 -0.101711770
## 102                          GGB6601_SGB9333|SGB9333 -0.228053801
## 103                          GGB3352_SGB4436|SGB4436  0.269468285
## 104                       GGB45491_SGB63163|SGB63163 -0.184463830
## 105                        GGB9708_SGB15233|SGB15233  0.040526093
## 106         Lachnospiraceae_bacterium_OF09_6|SGB4966  0.100910774
## 107                   Eubacterium_sp_AF16_48|SGB5042  0.006916398
## 108                       Gemmiger_SGB15301|SGB15301 -0.025267298
## 109         Pseudoflavonifractor_gallinarum|SGB29328 -0.052736280
## 110                       Clostridium_sp_AT4|SGB4753 -0.282544762
## 111                          GGB3740_SGB5076|SGB5076  0.475850142
## 112    Hydrogeniiclostridium_mannosilyticum|SGB14890  0.021306856
## 113                          GGB4596_SGB6358|SGB6358  0.322901276
## 114                        Blautia_stercoris|SGB4788  0.162017063
## 115                          Roseburia_lenta|SGB4957  0.539526518
## 116                          GGB3288_SGB4342|SGB4342  0.295158480
## 117                        GGB9574_SGB14987|SGB14987  0.312370613
## 118              Clostridium_methylpentosum|SGB14962  0.112475927
## 119                          GGB1495_SGB2071|SGB2071  0.267922114
## 120                          GGB3828_SGB5197|SGB5197  0.144942485
## 121                      Clostridium_SGB4751|SGB4751  0.028098610
## 122                          GGB3486_SGB4658|SGB4658  0.155462609
## 123                Monoglobus_pectinilyticus|SGB4166  0.264077066
## 124                    Enterocloster_bolteae|SGB4757  0.031574722
## 125                          GGB4604_SGB6371|SGB6371  0.035415533
## 126             Klebsiella_pneumoniae|SGB10115_group  0.032375569
## 127                        GGB9258_SGB14205|SGB14205  0.100032971
## 128                        GGB9545_SGB14952|SGB14952 -0.132446606
## 129                 Eubacterium_sp_AF34_35BH|SGB5051 -0.104426809
## 130                          GGB3363_SGB4448|SGB4448  0.335084056
## 131               Lachnoclostridium_pacaense|SGB4763  0.097758506
## 132                        GGB9634_SGB15097|SGB15097  0.566674657
## 133      Paratractidigestivibacter_faecalis|SGB14370  0.001896832
## 134      Clostridiaceae_unclassified_SGB4771|SGB4771  0.263009292
##     is_in_ExtValCoh
## 1                 0
## 2                 1
## 3                 1
## 4                 1
## 5                 1
## 6                 1
## 7                 1
## 8                 1
## 9                 1
## 10                1
## 11                1
## 12                1
## 13                1
## 14                1
## 15                1
## 16                1
## 17                1
## 18                1
## 19                1
## 20                1
## 21                1
## 22                1
## 23                1
## 24                1
## 25                1
## 26                1
## 27                1
## 28                1
## 29                1
## 30                1
## 31                1
## 32                1
## 33                1
## 34                1
## 35                1
## 36                1
## 37                1
## 38                1
## 39                1
## 40                1
## 41                1
## 42                1
## 43                1
## 44                1
## 45                1
## 46                1
## 47                1
## 48                1
## 49                1
## 50                1
## 51                1
## 52                1
## 53                1
## 54                1
## 55                1
## 56                1
## 57                1
## 58                1
## 59                1
## 60                1
## 61                1
## 62                1
## 63                1
## 64                1
## 65                1
## 66                1
## 67                1
## 68                1
## 69                1
## 70                1
## 71                1
## 72                1
## 73                1
## 74                1
## 75                1
## 76                1
## 77                1
## 78                1
## 79                1
## 80                1
## 81                1
## 82                1
## 83                1
## 84                1
## 85                1
## 86                1
## 87                1
## 88                1
## 89                1
## 90                1
## 91                1
## 92                1
## 93                1
## 94                1
## 95                1
## 96                1
## 97                1
## 98                1
## 99                1
## 100               1
## 101               1
## 102               1
## 103               1
## 104               1
## 105               1
## 106               1
## 107               1
## 108               1
## 109               1
## 110               1
## 111               1
## 112               1
## 113               1
## 114               1
## 115               1
## 116               1
## 117               1
## 118               1
## 119               1
## 120               1
## 121               1
## 122               1
## 123               1
## 124               1
## 125               1
## 126               1
## 127               1
## 128               1
## 129               1
## 130               1
## 131               1
## 132               1
## 133               1
## 134               1

5.6 Plot beta coefficients

Open code
elacoef <- data.frame(
  microbiome = row.names(elanet_microbiome_SGB10$betas),
  beta_ela = elanet_microbiome_SGB10$betas[, 1]
) %>%
  arrange(abs(beta_ela)) %>%
  filter(abs(beta_ela) > 0,
         !grepl('Intercept', microbiome)) %>%
  mutate(microbiome = factor(microbiome, levels = microbiome)) 

plotac <- "elanet_beta_microbiome_SGB10"
path <- "gitignore/figures"

assign(plotac, 
  ggplot(elacoef,
    aes(
      x = microbiome,
      y = beta_ela
    )
  ) +
  geom_point() +
  geom_hline(yintercept = 0, color = "black") +
  labs(
    y = "Standardized beta coefficients",
    x = "Bacteria species"
  ) +
  theme_minimal() +
  coord_flip() + 
  theme(
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    legend.position = "bottom"
  )
)

get(plotac)

Regression coefficients from the elastic net model predicting vegan diet strategy based on CLR-transformed and standardized proportion of bacterial taxa. Taxa are ordered by the magnitude of the standardized coefficients, indicating their relative importance in distinguishing between the diet groups. The sign of each coefficient indicates the direction of association with vegan diet status, with positive values indicating a higher likelihood of vegan status and negative values indicating omnivore status. Taxa whose effects were shrunk to zero are not shown.
Open code

if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {  
  ggsave(
    path = paste0(path),
    filename = plotac,
    device = "svg",
    width = 7,
    height = 14
  )
}

6 External validation

External validation was performed with an independent Czech cohort.

As a first step, we will use the previously developed and internally validated elastic net model to predict vegan status in the independent Czech cohort. The validation data will be standardized using the mean and standard deviation of each taxa as taken from the training cohort to ensure comparability across datasets. For each subject in the external validation cohort, we will estimate the predicted probability of being vegan using the elastic net model. This predicted probability will then be used as a variable to discriminate between the diet groups in the independent cohort.

In a 2nd step, we will look at taxa that significantly differed between diet groups (average vegan diet effect across both countries, FDR < 0.05) estimated by linear models (one per a taxa) with data of training cohort. Then we will fit linear models also for external validation cohort. Effect of vegan diet on these taxa will be shown along with 95% confidence interval for all cohorts: training Czech and Italian cohorts, but also in Czech independent (validating) cohort

6.1 Prediction of diet (elastic net)

6.1.1 Get table of weights, means and SDs

Open code

coefs_microbiome_all <- get_coef(
  original_data = data_analysis,
  glmnet_model = elanet_microbiome_SGB10)

coefs_microbiome_all
## # A tibble: 570 × 5
##    predictor                             beta_scaled beta_OrigScale   mean    SD
##    <chr>                                       <dbl>          <dbl>  <dbl> <dbl>
##  1 (Intercept)                                0.610          NA     NA     NA   
##  2 Bacteroides_stercoris|SGB1830              0               0      1.75   4.46
##  3 Alistipes_putredinis|SGB2318               0               0      5.93   3.94
##  4 Candidatus_Cibionibacter_quicibialis…      0               0      6.26   2.24
##  5 Bacteroides_uniformis|SGB1836              0               0      7.09   2.00
##  6 Eubacterium_siraeum|SGB4198                0               0     -0.162  3.77
##  7 GGB9634_SGB15093|SGB15093                  0               0      0.190  2.83
##  8 GGB9602_SGB15031|SGB15031                 -0.0706         -0.480  1.77   3.40
##  9 Phocaeicola_vulgatus|SGB1814               0               0      6.61   2.60
## 10 Faecalibacterium_prausnitzii|SGB15316      0               0      6.18   2.31
## # ℹ 560 more rows

6.1.2 Identify shared and missing predictors

Open code

## Which are missing in the validation set
missing <- setdiff(
  coefs_microbiome_all$predictor[-1], 
  colnames(
    data_microbiome_validation
    )
  )

missing
## character(0)

## Which are common with the validations et
common_predictors <- intersect(
  coefs_microbiome_all$predictor,
  colnames(data_microbiome_validation)
)

6.1.3 Standardize data in validation set

Open code
data_microbiome_validation_pred_all <- data_microbiome_validation %>%
  dplyr::mutate(
    vegan = if_else(
      Diet == "VEGAN", 1, 0
    )
  ) %>%
  dplyr::select(
    vegan,
    dplyr::all_of(common_predictors)
  ) %>% 
  dplyr::mutate(
    across(
      .cols = -vegan,
      .fns = ~ . 
      - coefs_microbiome_all$mean[
        match(
          cur_column(), 
          coefs_microbiome_all$predictor
          )
        ]
      )
    ) %>% 
  dplyr::mutate(
    across(
      .cols = -vegan,
      .fns = ~ . 
      / coefs_microbiome_all$SD[
        match(
          cur_column(), 
          coefs_microbiome_all$predictor
          )
        ]
      )
    ) 

6.1.4 Result

6.1.4.1 ROC curve

Open code

newx <- as.matrix(data_microbiome_validation_pred_all[,-1])

predicted <- predict(
  elanet_microbiome_SGB10$fit, 
  newx = newx)

tr <- data_microbiome_validation_pred_all %>% 
  dplyr::mutate(
    predicted_logit = as.numeric(
      predict(
        elanet_microbiome_SGB10$fit,
        newx = newx
        )
      )
    ) %>% 
  dplyr::mutate(
    predicted = inv_logit(predicted_logit)
  )

roc_microbiome_all <- pROC::roc(
      vegan ~ predicted_logit,
      data = tr,
      direction = "<",
      levels = c(0, 1),
      ci = TRUE
      )

roc_microbiome_all
## 
## Call:
## roc.formula(formula = vegan ~ predicted_logit, data = tr, direction = "<",     levels = c(0, 1), ci = TRUE)
## 
## Data: predicted_logit in 43 controls (vegan 0) < 59 cases (vegan 1).
## Area under the curve: 0.8928
## 95% CI: 0.8283-0.9573 (DeLong)

plotac <- "roc_microbiom_SGB10"
path <- "gitignore/figures"

assign(plotac, ggroc(roc_microbiome_all))
get(plotac)

Receiver Operating Characteristic (ROC) curve illustrating the model’s ability to distinguish between vegan and omnivore status based on CLR-transformed counts of bacteria reads in the external validation Czech cohort. The curve plots the true positive rate (sensitivity) against the true positive rate (specificity) at various thresholds of predicted vegan status, as estimated from the elastic net model developed on the training data. The area under the curve (AUC) represents the model’s overall performance, with values closer to 1 indicating stronger discrimination.
Open code

if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {  
  ggsave(
    path = paste0(path),
    filename = plotac,
    device = "svg",
    width = 6,
    height = 4.5
  )
}

6.1.4.2 Table

Open code
mod <- elanet_microbiome_SGB10

trainAUC <- mean(mod[["valid_performances"]]$auc_resamp_test)
trainCI <- quantile(mod[["valid_performances"]]$auc_resamp_test,
  probs = c(1 / 40, 39 / 40)
)

res <- data.frame(
  alpha = c(mod$model_summary$alpha, rep("", 4)),
  lambda = c(round(mod$model_summary$lambda, 4), rep("", 4)),
  `performance type` = c(
    "Training set AUC",
    "Out-of-sample AUC (all)",
    "Out-of-sample AUC (Czech)",
    "Out-of-sample AUC (Italy)",
    "External validation AUC"
  ),
  `performance [95% CI]` = c(
    sprintf("%.3f [%.3f to %.3f]", trainAUC, trainCI[1], trainCI[2]),
    sprintf(
      "%.3f [%.3f to %.3f]",
      mod$model_summary$auc_OutOfSample,
      mod$model_summary$auc_oos_CIL,
      mod$model_summary$auc_oos_CIU
    ),
    sprintf(
      "%.3f [%.3f to %.3f]",
      mod$country_AUC$auc_OutOfSample_CZ,
      mod$country_AUC$auc_oos_CIL_CZ,
      mod$country_AUC$auc_oos_CIU_CZ
    ),
    sprintf(
      "%.3f [%.3f to %.3f]",
      mod$country_AUC$auc_OutOfSample_IT,
      mod$country_AUC$auc_oos_CIL_IT,
      mod$country_AUC$auc_oos_CIU_IT
    ),
    sprintf(
      "%.3f [%.3f to %.3f]",
      roc_microbiome_all[["ci"]][2],
      roc_microbiome_all[["ci"]][1],
      roc_microbiome_all[["ci"]][3]
    )
  )
)


kableExtra::kable(
  res %>% mutate(across(where(is.numeric), ~ round(.x, 3))),
  caption = "Performance of the elastic‑net logistic regression model for discriminating vegan from omnivore status using CLR-transformed counts of bacteria reads. The model was developed on the combined training data (Czech and Italian cohorts), with the optimmized `alpha` (mixing parameter) and `lambda` (penalty strength) selected via 10-fold cross-validation. Internal validation employed 500 out‑of‑bag bootstrap resamples: the out‑of‑sample AUC is the mean across resamples, and its 95 % confidence interval (CI) is given by the 2.5th and 97.5th percentiles of the bootstrap distribution. The training‑set AUC and its CI were computed analogously from the in‑bag predictions. External validation was carried out on an independent Czech cohort; the reported AUC is the point estimate on that cohort, and its 95% CI was obtained with DeLong’s method."
)
Performance of the elastic‑net logistic regression model for discriminating vegan from omnivore status using CLR-transformed counts of bacteria reads. The model was developed on the combined training data (Czech and Italian cohorts), with the optimmized alpha (mixing parameter) and lambda (penalty strength) selected via 10-fold cross-validation. Internal validation employed 500 out‑of‑bag bootstrap resamples: the out‑of‑sample AUC is the mean across resamples, and its 95 % confidence interval (CI) is given by the 2.5th and 97.5th percentiles of the bootstrap distribution. The training‑set AUC and its CI were computed analogously from the in‑bag predictions. External validation was carried out on an independent Czech cohort; the reported AUC is the point estimate on that cohort, and its 95% CI was obtained with DeLong’s method.
alpha lambda performance.type performance..95..CI.
0.4 0.0149 Training set AUC 1.000 [1.000 to 1.000]
Out-of-sample AUC (all) 0.916 [0.847 to 0.976]
Out-of-sample AUC (Czech) 0.962 [0.891 to 1.000]
Out-of-sample AUC (Italy) 0.847 [0.706 to 0.973]
External validation AUC 0.893 [0.828 to 0.957]

6.2 Diet effect across datasets

Similarly as in training data cohorts, we will fit linear model per each of the selected taxa (\(CLR\) - transformed), with a single fixed effect factor of diet.

6.2.1 Linear models in validation cohort

Open code

diet_sensitive_taxa <- result_microbiome %>% 
  filter(fdr_VGdiet_avg < 0.05) %>% 
  pull(outcome)

len <- length(diet_sensitive_taxa)

data_analysis_microbiome <- data_microbiome_validation %>%
  dplyr::mutate(
    Diet_VEGAN = as.numeric(
      dplyr::if_else(
        Diet == 'VEGAN', 1, 0
      )
    )
  ) %>%
  dplyr::select(
    Diet_VEGAN,
    all_of(diet_sensitive_taxa)
  )

6.2.1.1 Define number of microbiome and covariates

Open code
n_covarites <- 1
n_features <- ncol(data_analysis_microbiome) - n_covarites

6.2.1.2 Create empty objects

Open code
outcome <- vector('double', n_features)
est_VGdiet <- vector('double', n_features)
P_VGdiet <- vector('double', n_features)
CI_L_VGdiet <- vector('double', n_features)
CI_U_VGdiet <- vector('double', n_features)

6.2.1.3 Estimate over outcomes

Open code
for (i in 1:n_features) {
  ## define variable
  data_analysis_microbiome$outcome <- data_analysis_microbiome[, (i + n_covarites)]

  ## fit model
  model <- lm(outcome ~ Diet_VEGAN, data = data_analysis_microbiome)

  ## save results
  outcome[i] <- names(data_analysis_microbiome)[i + n_covarites]

  ## diet effect
  tr <- confint(model)

  CI_L_VGdiet[i] <- tr[which(row.names(tr) == "Diet_VEGAN"), ][1]
  CI_U_VGdiet[i] <- tr[which(row.names(tr) == "Diet_VEGAN"), ][2]

  est_VGdiet[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_VEGAN"
    ), 1
  ]

  P_VGdiet[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_VEGAN"
    ), 4
  ]
}

6.2.1.4 Results table

Open code
result_microbiome_val <- data.frame(
  outcome,
  est_VGdiet, P_VGdiet,
  CI_L_VGdiet, CI_U_VGdiet
)

kableExtra::kable(result_microbiome_val,
  caption = "Results of linear models estimating the effect of diet on the CLR-transformed relative abundance of a given taxon as the outcome. Only taxa whose relative abundance significantly differed between diet groups in training cohorts (FDR < 0.05, average effect across both training cohorts) were included. `est` represents the estimated effects (regression coefficient), indicating how much the CLR-transformed relative abundnace differ between vegans and omnivores. `P`: p-value, `fdr`: p-value adjusted for multiple comparisons, and `CI_L` and `CI_U` represent the lower and upper bounds of the 95% confidence interval, respectively. All estimates in a single row are based on a single model"
)
Results of linear models estimating the effect of diet on the CLR-transformed relative abundance of a given taxon as the outcome. Only taxa whose relative abundance significantly differed between diet groups in training cohorts (FDR < 0.05, average effect across both training cohorts) were included. est represents the estimated effects (regression coefficient), indicating how much the CLR-transformed relative abundnace differ between vegans and omnivores. P: p-value, fdr: p-value adjusted for multiple comparisons, and CI_L and CI_U represent the lower and upper bounds of the 95% confidence interval, respectively. All estimates in a single row are based on a single model
outcome est_VGdiet P_VGdiet CI_L_VGdiet CI_U_VGdiet
GGB1420_SGB1957|SGB1957 -1.6319786 0.0001833 -2.4651743 -0.7987830
Escherichia_coli|SGB10068 -0.1364030 0.7504858 -0.9850710 0.7122650
Bacteroides_clarus|SGB1832 -0.5846336 0.0591555 -1.1922920 0.0230247
Hydrogenoanaerobacterium_saccharovorans|SGB15350 -0.0236533 0.9237105 -0.5124669 0.4651603
GGB9775_SGB15395|SGB15395 -0.0622451 0.8622175 -0.7719738 0.6474835
Ruthenibacterium_lactatiformans|SGB15271 -0.5610860 0.1251910 -1.2809841 0.1588121
Lawsonibacter_asaccharolyticus|SGB15154 -2.0443547 0.0000027 -2.8593796 -1.2293298
Clostridium_fessum|SGB4705 -0.1266632 0.6413189 -0.6644529 0.4111264
GGB9707_SGB15229|SGB15229 -0.3170753 0.5258509 -1.3052703 0.6711197
Collinsella_aerofaciens|SGB14535 0.8831037 0.1607185 -0.3567223 2.1229298
GGB9509_SGB14906|SGB14906 -0.6808254 0.0481071 -1.3558713 -0.0057794
GGB45432_SGB63101|SGB63101 -0.4316762 0.2950971 -1.2453732 0.3820207
GGB2653_SGB3574|SGB3574 -0.7730105 0.0527748 -1.5554614 0.0094403
Phocea_massiliensis|SGB14837 -0.2945252 0.4199778 -1.0160979 0.4270474
Guopingia_tenuis|SGB14127 -0.2466821 0.5415046 -1.0455355 0.5521714
Lachnospiraceae_bacterium|SGB4953 1.0640524 0.1217977 -0.2887385 2.4168432
Holdemania_filiformis|SGB4046 -0.9757476 0.0010138 -1.5474194 -0.4040757
Anaeromassilibacillus_senegalensis|SGB14894 0.1049346 0.7858395 -0.6592016 0.8690709
GGB3061_SGB4063|SGB4063 -0.4248798 0.1440428 -0.9973650 0.1476053
Ruminococcus_sp_AF41_9|SGB25497 1.5114864 0.0062116 0.4388303 2.5841426
Eubacterium_ramulus|SGB4959 -0.8189588 0.0014835 -1.3160877 -0.3218300
GGB3000_SGB3991|SGB3991 -0.0994500 0.6181309 -0.4940141 0.2951140
Youxingia_wuxianensis|SGB82503 -0.8425429 0.0219470 -1.5606907 -0.1243952
Blautia_massiliensis|SGB4826 -0.4043207 0.2994738 -1.1734276 0.3647862
GGB38744_SGB14842|SGB14842 -0.5994825 0.0627007 -1.2313410 0.0323760
Dorea_formicigenerans|SGB4575 0.0818715 0.5861126 -0.2154860 0.3792291
GGB9765_SGB15382|SGB15382 -0.3957603 0.3593542 -1.2484426 0.4569219
GGB9770_SGB15390|SGB15390 -0.3093744 0.4051897 -1.0436384 0.4248895
GGB34900_SGB14891|SGB14891 -0.4120472 0.1278995 -0.9445541 0.1204597
Mediterraneibacter_faecis|SGB4563 -1.2259291 0.0000274 -1.7790085 -0.6728497
Ruminococcus_torques|SGB4608 -2.0120926 0.0002102 -3.0496158 -0.9745694
Clostridium_sp_AM29_11AC|SGB4701 0.0928994 0.3642174 -0.1093027 0.2951015
Lachnoclostridium_sp_An138|SGB4774 0.4975782 0.0087858 0.1282307 0.8669257
GGB9782_SGB15403|SGB15403 -0.1246240 0.5085541 -0.4972846 0.2480366
Agathobaculum_butyriciproducens|SGB14991 0.3688243 0.4169264 -0.5288526 1.2665012
Clostridium_SGB48024|SGB48024 0.3968599 0.3624847 -0.4638070 1.2575268
GGB9301_SGB14262|SGB14262 -0.1795302 0.3499757 -0.5588376 0.1997773
GGB9531_SGB14932|SGB14932 -0.7159136 0.0477755 -1.4246752 -0.0071520
Oscillibacter_valericigenes|SGB15076 -0.3354555 0.1998820 -0.8512073 0.1802962
GGB9781_SGB66170|SGB66170 -0.5684455 0.1097144 -1.2672650 0.1303741
GGB9616_SGB15052|SGB15052 0.8326234 0.0290113 0.0869295 1.5783172
GGB9719_SGB15273|SGB15273 0.1656707 0.1447268 -0.0579396 0.3892811
GGB58158_SGB79798|SGB79798 -0.6709277 0.0785820 -1.4199318 0.0780764
Enterocloster_hominis|SGB4721 -2.6863647 0.0000000 -3.5857630 -1.7869663
Streptococcus_thermophilus|SGB8002 -2.0254710 0.0000042 -2.8510019 -1.1999401
Blautia_obeum|SGB4810 1.9982699 0.0000426 1.0724358 2.9241040
GGB6601_SGB9333|SGB9333 -0.3380189 0.2473263 -0.9143156 0.2382778
GGB51884_SGB49168|SGB49168 0.0493839 0.8653658 -0.5269997 0.6257675
Lachnospiraceae_bacterium_OF09_6|SGB4966 0.8338853 0.0480223 0.0073988 1.6603718
GGB3740_SGB5076|SGB5076 1.8911990 0.0003254 0.8834608 2.8989372
GGB4596_SGB6358|SGB6358 0.1742579 0.5315783 -0.3764478 0.7249635
Roseburia_lenta|SGB4957 1.8756274 0.0004803 0.8448666 2.9063882
GGB3288_SGB4342|SGB4342 0.0860836 0.8435058 -0.7768079 0.9489752
Clostridium_methylpentosum|SGB14962 0.0926049 0.4507257 -0.1500412 0.3352509
Clostridium_SGB4751|SGB4751 0.6046968 0.0003366 0.2816354 0.9277581
GGB4604_SGB6371|SGB6371 0.3559856 0.0821071 -0.0461582 0.7581294
GGB9545_SGB14952|SGB14952 0.0878718 0.6490821 -0.2940953 0.4698388
GGB3363_SGB4448|SGB4448 0.2788593 0.0252614 0.0352946 0.5224240
GGB9634_SGB15097|SGB15097 0.1673147 0.1856037 -0.0817328 0.4163622
Clostridiaceae_unclassified_SGB4771|SGB4771 0.3627317 0.3434905 -0.3933569 1.1188202
GGB3627_SGB4906|SGB4906 -0.0604011 0.8893944 -0.9198865 0.7990842
GGB4682_SGB6472|SGB6472 0.2886792 0.2644873 -0.2217099 0.7990682
GGB3491_SGB4665|SGB4665 0.4885714 0.1610792 -0.1979513 1.1750941
Lachnospiraceae_bacterium_OM04_12BH|SGB4893 -0.3828443 0.4320887 -1.3457524 0.5800637
GGB9616_SGB15051|SGB15051 0.4658143 0.2161639 -0.2766842 1.2083128
Clostridium_sp_AF32_12BH|SGB4710 1.1263953 0.0041614 0.3644904 1.8883003
Butyricicoccus_intestinisimiae|SGB14980 0.2683956 0.0166443 0.0497533 0.4870379
Kineothrix_SGB4886|SGB4886 -0.2490503 0.5956760 -1.1772444 0.6791439
Blautia_SGB4833|SGB4833 0.1893524 0.5947566 -0.5145889 0.8932937
GGB3490_SGB4664|SGB4664 0.7333115 0.0346279 0.0540775 1.4125454
Clostridiaceae_bacterium_CLA_AA_H274|SGB4704 0.4592766 0.0005041 0.2058801 0.7126731
Lactococcus_lactis|SGB7985 -1.7353981 0.0000017 -2.4118186 -1.0589776
GGB4569_SGB6312|SGB6312 0.0890736 0.5307595 -0.1918632 0.3700104
Open code

if (file.exists("gitignore/lm_results/result_microbiom_validation_SGB10.csv") == FALSE) {
  write.table(result_microbiome_val,
    "gitignore/lm_results/result_microbiom_validation_SGB10.csv",
    row.names = FALSE
  )
}

6.2.2 Forest plot

6.2.2.1 Prepare data

Open code

## subset result tables
result_microbiome_subset <- result_microbiome %>%
  filter(outcome %in% diet_sensitive_taxa)

result_microbiome_val_subset <- result_microbiome_val %>%
  filter(outcome %in% diet_sensitive_taxa)

## create a data frame
data_forest <- data.frame(
  outcome = rep(diet_sensitive_taxa, 3),
  beta = c(
    result_microbiome_subset$est_VGdiet_inCZ,
    result_microbiome_subset$est_VGdiet_inIT,
    result_microbiome_val_subset$est_VGdiet
  ),
  lower = c(
    result_microbiome_subset$CI_L_VGdiet_inCZ,
    result_microbiome_subset$CI_L_VGdiet_inIT,
    result_microbiome_val_subset$CI_L_VGdiet
  ),
  upper = c(
    result_microbiome_subset$CI_U_VGdiet_inCZ,
    result_microbiome_subset$CI_U_VGdiet_inIT,
    result_microbiome_val_subset$CI_U_VGdiet
  ),
  dataset = c(
    rep("CZ", len),
    rep("IT", len),
    rep("Validation", len)
  )
)

## define ordering
validation_order <- data_forest %>%
  group_by(outcome) %>%
  summarise(beta_mean = mean(beta), .groups = "drop") %>%
  arrange(beta_mean) %>%
  pull(outcome)


## Define 'winners'
up_winners <- data_forest %>% 
  pivot_wider(names_from = dataset,
              values_from = c(beta, lower, upper)) %>% 
  left_join(elacoef %>% mutate(outcome = microbiome) %>% select(-microbiome), 
            by = 'outcome') %>% 
  filter(beta_CZ > 0,
         beta_IT > 0,
         lower_Validation > 0,
         beta_ela > 0.1) %>% 
  pull(outcome)

down_winners <- data_forest %>% 
  pivot_wider(names_from = dataset,
              values_from = c(beta, lower, upper)) %>% 
  left_join(elacoef %>% mutate(outcome = microbiome) %>% select(-microbiome), 
            by = 'outcome') %>% 
  filter(beta_CZ < 0,
         beta_IT < 0,
         upper_Validation < 0,
         beta_ela < -0.1) %>% 
  pull(outcome)

winners <- c(up_winners, down_winners)

 data_forest <- data_forest %>%
  mutate(in_winner = if_else(outcome %in% winners, TRUE, FALSE, missing = FALSE)) %>%
  left_join(
    elacoef %>% mutate(outcome = microbiome) %>% select(-microbiome), 
    by = 'outcome') %>% 
   mutate(outcome = factor(outcome, levels = validation_order))

6.2.2.2 Plotting

Open code
plotac <- "forest_plot_microbiome_SGB10"
path <- "gitignore/figures"

colors <- c("CZ" = "#150999", "IT" = "#329243", "Validation" = "grey60")

assign(plotac, ggplot(
  data_forest, aes(x = outcome, y = beta, ymin = lower, ymax = upper, color = dataset)
) +
  geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
  geom_hline(yintercept = 0, color = "black") +
  geom_errorbar(position = position_dodge(width = 0.5), width = 0.2) +
  scale_color_manual(values = colors) +
  labs(
    y = "Effect of vegan diet on CLR-transformed relative abundance",
    x = "Outcome",
    color = "Dataset"
  ) +
  theme_minimal() +
  coord_flip() + # Flip coordinates to have outcomes on the y-axis
  scale_x_discrete(
    labels = setNames(
      ifelse(data_forest$in_winner, 
             paste0("**", data_forest$outcome, "**"), 
             as.character(data_forest$outcome)
      ), data_forest$outcome
    )
  ) +
  theme(
    axis.text.x = element_text(size = 10),
    axis.text.y = ggtext::element_markdown(size = 10),  
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    legend.position = "bottom"
  )
)

get(plotac)

The forest plot illustrates the effects of a vegan diet on the CLR-transformed relative abundances of selected taxa, along with their 95% confidence intervals, across two training cohorts (Czech and Italian) and one independent Czech cohort (validation). Green, blue, and grey points/lines represent differences in CLR-transformed relative abundance between vegans and omnivores within the Italian cohort, Czech cohort, and Czech validation cohort, respectively. Positive values suggest a higher relative abundance in vegans compared to omnivores. Only taxa whose relative abundances differed between vegan and omnivores (as an average effect across both training cohorts) were selected, and these effects were further validated in the independent cohort. The estimates for the training cohorts were obtained from a single linear model that included Diet, Country, and the interaction term Diet x Country as fixed-effect predictors. In the independent Czech validation cohort, Diet was the only fixed-effect predictor. Taxa validated in the linear model and showing predictive power in the elastic net model (|β| > 0.1) are bold
Open code

if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {  
  ggsave(
    path = paste0(path),
    filename = plotac,
    device = "svg",
    width = 9,
    height = 13
  )
}

6.2.3 Boxplot

Open code
plotac <- "boxplot_microbiome_SGB10"
path <- "gitignore/figures"

colo <- c('#F9FFAF','#329243')

boxplot_cond <- function(variable) {
  
  p <- ggboxplot(data_merged, 
                 x = 'Diet', 
                 y = variable, 
                 fill = 'Diet', 
                 tip.length = 0.15,
                 palette = colo,
                 outlier.shape = 1,
                 lwd = 0.25,
                 outlier.size = 0.8,
                 facet.by = 'Data',
                 title = variable,
                 ylab = 'CLR(taxa proportion)') +
    
    theme(
      plot.title = element_text(size = 10), 
      axis.title = element_text(size = 8),  
      axis.text.y = element_text(size = 7),
      axis.text.x = element_blank(),
      axis.title.x = element_blank()
    ) 
  return(p)
}

# Plot all outcomes
plots <- map(diet_sensitive_taxa, boxplot_cond)

# Create a matrix of plots
assign(plotac, 
       ggarrange(plotlist = plots, ncol = 7, nrow = 11,  common.legend = TRUE)
       )

get(plotac)

CLR-transformed relative abundances of selected taxa across all 3 cohorts (Czech and Italian training cohorts and an independent Czech valdiation cohort) and across dietary groups
Open code

if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {  
  ggsave(
    path = paste0(path),
    filename = plotac,
    device = "svg",
    width = 9,
    height = 12
  )
}

7 Linear model VG duration

Next, we fit another series of linear models, this time modelling clr-transformed taxa proportions (derived from shotgun metagenomic sequencing) using the following fixed-effect predictors: duration of vegan status (Diet_duration, scaled in tens of years), Country, their interaction (Diet_duration × Country), and Age:

\[ \text{CLR(taxa proportion)} = \alpha + \beta_{1} \times \text{Country} + \beta_{2} \times \text{Diet duration} + \beta_{3} \times (\text{Country}:\text{Diet duration}) + \beta_{4} \times \text{Age} + \epsilon \]

This analysis includes only vegan participants, while omnivores are excluded. The aim was to test whether specific taxa that differ between vegans and omnivores also vary within the vegan group itself, depending on how long participants have been vegan. In other words, we asked whether long-term vegans show stronger enrichment or depletion of diet-sensitive taxa compared to those who adopted the diet more recently.

Because longer vegan duration is correlated with age, we also adjusted for age in the models.

7.1 Training cohort

7.1.1 Select data

Open code
data_analysis <- data_microbiome_original %>%
  na.omit() %>%
    mutate(Country_IT = as.numeric(
      dplyr::if_else(
        Country == "IT", 0.5, -0.5
      )
    )
  ) %>%
  filter(Diet == 'VEGAN',
         !is.na(Diet_duration)) %>% 
  select(1:5, Country_IT, all_of(diet_sensitive_taxa))

summary(data_analysis[ , 1:12])
##     Sample            Country              Diet           Diet_duration   
##  Length:96          Length:96          Length:96          Min.   : 0.600  
##  Class :character   Class :character   Class :character   1st Qu.: 3.150  
##  Mode  :character   Mode  :character   Mode  :character   Median : 5.000  
##                                                           Mean   : 5.492  
##                                                           3rd Qu.: 6.000  
##                                                           Max.   :20.000  
##       Age          Country_IT      GGB1420_SGB1957|SGB1957
##  Min.   :18.25   Min.   :-0.5000   Min.   :-3.2836        
##  1st Qu.:27.57   1st Qu.:-0.5000   1st Qu.:-1.5577        
##  Median :32.15   Median :-0.5000   Median :-1.2513        
##  Mean   :34.97   Mean   :-0.1042   Mean   :-0.8041        
##  3rd Qu.:40.77   3rd Qu.: 0.5000   3rd Qu.:-0.7237        
##  Max.   :60.70   Max.   : 0.5000   Max.   : 5.7246        
##  Escherichia_coli|SGB10068 Bacteroides_clarus|SGB1832
##  Min.   :-2.569            Min.   :-3.267            
##  1st Qu.:-1.673            1st Qu.:-2.635            
##  Median : 1.455            Median :-2.251            
##  Mean   : 1.261            Mean   :-1.010            
##  3rd Qu.: 3.479            3rd Qu.:-1.399            
##  Max.   : 8.535            Max.   : 8.931            
##  Hydrogenoanaerobacterium_saccharovorans|SGB15350 GGB9775_SGB15395|SGB15395
##  Min.   :-2.933                                   Min.   :-2.2689          
##  1st Qu.:-1.696                                   1st Qu.:-1.6780          
##  Median : 2.875                                   Median :-1.2936          
##  Mean   : 1.742                                   Mean   :-0.6410          
##  3rd Qu.: 4.252                                   3rd Qu.:-0.7807          
##  Max.   : 7.594                                   Max.   : 6.2006          
##  Ruthenibacterium_lactatiformans|SGB15271
##  Min.   :-2.921                          
##  1st Qu.: 2.375                          
##  Median : 3.622                          
##  Mean   : 3.269                          
##  3rd Qu.: 4.595                          
##  Max.   : 8.096

7.1.2 Define number of microbiome and covariates

Open code
n_covarites <- 6
n_features <- ncol(data_analysis) - n_covarites

7.1.3 Create empty objects

Open code
outcome <- vector('double', n_features)

est_Diet_duration_inCZ <- vector('double', n_features)
est_Diet_duration_inIT <- vector('double', n_features)
est_Diet_duration_avg <- vector('double', n_features)

est_ITcountry_avg <- vector('double', n_features)
diet_country_int <- vector('double', n_features)

est_Age <- vector('double', n_features)

P_Diet_duration_inCZ <- vector('double', n_features)
P_Diet_duration_inIT <- vector('double', n_features)
P_Diet_duration_avg <- vector('double', n_features)

P_ITcountry_avg <- vector('double', n_features)
P_Age <- vector('double', n_features)

P_diet_country_int <- vector('double', n_features)

CI_L_Diet_duration_inCZ <- vector('double', n_features)
CI_L_Diet_duration_inIT <- vector('double', n_features)
CI_L_Diet_duration_avg <- vector('double', n_features)

CI_U_Diet_duration_inCZ <- vector('double', n_features)
CI_U_Diet_duration_inIT <- vector('double', n_features)
CI_U_Diet_duration_avg <- vector('double', n_features)

7.1.4 Estimate over outcomes

Open code
if(file.exists('gitignore/lm_results/result_microbiome_SGB10_VGdur_training.Rds') == FALSE){
for (i in 1:n_features) {
  
  ## define variable
  data_analysis$outcome <- data_analysis[, (i + n_covarites)]

  ## fit model
  model <- lm(outcome ~ Country_IT * Diet_duration + Age, 
              data = data_analysis %>% 
                mutate(Diet_duration = (Diet_duration/10),
                       Age = (Age-33)/10))

  ## get contrast (effects of diet BY COUNTRY)
  contrast_emm <- summary(
    pairs(
      emmeans(
        model,
        specs = ~ Diet_duration | Country_IT,
        at = list(Diet_duration = c(0, 1))
        ),
      interaction = TRUE,
      adjust = "none"
      ),
    infer = c(TRUE, TRUE)
    )

  ## save results
  outcome[i] <- names(data_analysis)[i + n_covarites]
  
  ## country and age effect
  est_ITcountry_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT"
    ), 1
  ]

  P_ITcountry_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT"
    ), 4
  ]
  
  est_Age[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Age"
    ), 1
  ]

  P_Age[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Age"
    ), 4
  ]
  
  ## diet effect
  tr <- confint(model)
  
  CI_L_Diet_duration_avg[i] <- tr[which(row.names(tr) == 'Diet_duration'),][1]
  CI_U_Diet_duration_avg[i] <- tr[which(row.names(tr) == 'Diet_duration'),][2]
  
  est_Diet_duration_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_duration"
    ), 1
  ]

  P_Diet_duration_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_duration"
    ), 4
  ]
  
  est_Diet_duration_inCZ[i] <- -contrast_emm[1,3]
  P_Diet_duration_inCZ[i] <- contrast_emm$p.value[1]
  CI_L_Diet_duration_inCZ[i] <- -contrast_emm$upper.CL[1]
  CI_U_Diet_duration_inCZ[i] <- -contrast_emm$lower.CL[1]
  
  est_Diet_duration_inIT[i] <- -contrast_emm[2,3]
  P_Diet_duration_inIT[i] <- contrast_emm$p.value[2]
  CI_L_Diet_duration_inIT[i] <- -contrast_emm$upper.CL[2]
  CI_U_Diet_duration_inIT[i] <- -contrast_emm$lower.CL[2]
  
  ## interaction
  diet_country_int[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT:Diet_duration"
    ), 1
  ]

  P_diet_country_int[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT:Diet_duration"
    ), 4
  ]
}

  result_microbiome <- data.frame(
  outcome,
  est_ITcountry_avg, P_ITcountry_avg,
  est_Age, P_Age,
  est_Diet_duration_avg, P_Diet_duration_avg,
  est_Diet_duration_inCZ, P_Diet_duration_inCZ,
  est_Diet_duration_inIT, P_Diet_duration_inIT,
  diet_country_int, P_diet_country_int,
  CI_L_Diet_duration_avg, CI_U_Diet_duration_avg,
  CI_L_Diet_duration_inCZ, CI_U_Diet_duration_inCZ,
  CI_L_Diet_duration_inIT, CI_U_Diet_duration_inIT
)
  
  saveRDS(result_microbiome, 
              'gitignore/lm_results/result_microbiome_SGB10_VGdur_training.Rds')
}

result_microbiome <- readRDS('gitignore/lm_results/result_microbiome_SGB10_VGdur_training.Rds')

7.1.5 Show and save results

Open code
kableExtra::kable(result_microbiome %>% 
  dplyr::select(
    outcome,
    est_ITcountry_avg, P_ITcountry_avg,
    est_Age, P_Age,
    est_Diet_duration_avg, P_Diet_duration_avg,
    est_Diet_duration_inCZ, P_Diet_duration_inCZ,
    est_Diet_duration_inIT, P_Diet_duration_inIT,
    diet_country_int, P_diet_country_int,
    CI_L_Diet_duration_avg, CI_U_Diet_duration_avg,
    CI_L_Diet_duration_inCZ, CI_U_Diet_duration_inCZ,
    CI_L_Diet_duration_inIT, CI_U_Diet_duration_inIT
  ) %>% arrange(P_Diet_duration_avg),
  caption = "Results of linear models, modelling CLR-transformed relative abundances of taxa (inferred from shotgun metagenomic sequencing) as the outcome, with vegan status duration (`Diet_duration`), `Country`, their interaction (`Diet_duration × Country`), and `Age` as fixed-effect predictors, using training data only (Czech and Italian vegan cohorts). Only taxa with CLR-transformed relative abundances differing by diet (FDR < 0.05, average effect across both countries) were included. `est`: denotes estimated effects (regression coefficients), i.e. expected change in clr-transformed relative abundnace per 10 years of vegan diet or age, or for country (Italy vs Czechia). `P`: p-value; `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval. `avg` suffix: effect averaged across cohorts, whereas `inCZ` and `inIT` indicate effects in the Czech or Italian cohort, respectively. Interaction effects are reported as `diet_country_int` (difference in the effect of vegan diet duration between Italian and Czech cohorts; positive values indicate a stronger effect in Italian, negative values a stronger effect in Czech cohort) and `P_diet_country_int` (its p-value). All estimates in a single row are based on a single model with interaction",
  escape = FALSE
)
Results of linear models, modelling CLR-transformed relative abundances of taxa (inferred from shotgun metagenomic sequencing) as the outcome, with vegan status duration (Diet_duration), Country, their interaction (Diet_duration × Country), and Age as fixed-effect predictors, using training data only (Czech and Italian vegan cohorts). Only taxa with CLR-transformed relative abundances differing by diet (FDR < 0.05, average effect across both countries) were included. est: denotes estimated effects (regression coefficients), i.e. expected change in clr-transformed relative abundnace per 10 years of vegan diet or age, or for country (Italy vs Czechia). P: p-value; CI_L and CI_U: lower and upper bounds of the 95% confidence interval. avg suffix: effect averaged across cohorts, whereas inCZ and inIT indicate effects in the Czech or Italian cohort, respectively. Interaction effects are reported as diet_country_int (difference in the effect of vegan diet duration between Italian and Czech cohorts; positive values indicate a stronger effect in Italian, negative values a stronger effect in Czech cohort) and P_diet_country_int (its p-value). All estimates in a single row are based on a single model with interaction
outcome est_ITcountry_avg P_ITcountry_avg est_Age P_Age est_Diet_duration_avg P_Diet_duration_avg est_Diet_duration_inCZ P_Diet_duration_inCZ est_Diet_duration_inIT P_Diet_duration_inIT diet_country_int P_diet_country_int CI_L_Diet_duration_avg CI_U_Diet_duration_avg CI_L_Diet_duration_inCZ CI_U_Diet_duration_inCZ CI_L_Diet_duration_inIT CI_U_Diet_duration_inIT
Eubacterium_ramulus|SGB4959 -2.2634807 0.0244721 -0.0797104 0.7729521 2.2299335 0.0173995 1.5341864 0.1003445 2.9256805 0.0507039 1.3914941 0.3979552 0.4013854 4.0584816 -0.3015733 3.3699462 -0.0091553 5.8605164
GGB9634_SGB15097|SGB15097 1.7359384 0.1308107 0.5207463 0.1038708 -2.3902075 0.0264429 -1.0117831 0.3439351 -3.7686319 0.0291457 -2.7568488 0.1471384 -4.4944164 -0.2859986 -3.1242909 1.1007246 -7.1459058 -0.3913581
Youxingia_wuxianensis|SGB82503 1.2702346 0.1389459 0.6703032 0.0057350 -1.6583578 0.0389724 -0.8482321 0.2886903 -2.4684834 0.0551437 -1.6202512 0.2532099 -3.2309213 -0.0857943 -2.4269977 0.7305334 -4.9924617 0.0554949
Bacteroides_clarus|SGB1832 -1.1466953 0.3270097 0.0344925 0.9154410 2.2660085 0.0391244 -0.6619102 0.5440442 5.1939272 0.0035974 5.8558374 0.0030987 0.1155224 4.4164946 -2.8208777 1.4970572 1.7423780 8.6454764
GGB6601_SGB9333|SGB9333 1.5659553 0.0560300 0.4354908 0.0562942 -1.4037929 0.0654179 -0.3456940 0.6484441 -2.4618919 0.0444814 -2.1161979 0.1176887 -2.8990249 0.0914390 -1.8468230 1.1554351 -4.8617524 -0.0620314
Butyricicoccus_intestinisimiae|SGB14980 -0.1686275 0.8628204 0.0144102 0.9577000 1.5314225 0.0941914 1.4874167 0.1052312 1.5754283 0.2811939 0.0880116 0.9565668 -0.2671265 3.3299714 -0.3182256 3.2930589 -1.3112586 4.4621151
GGB9531_SGB14932|SGB14932 0.1331096 0.8540136 -0.1859339 0.3569904 1.0435547 0.1234520 0.2239366 0.7403926 1.8631728 0.0870892 1.6392361 0.1733626 -0.2896207 2.3767301 -1.1144967 1.5623700 -0.2765855 4.0029310
GGB9775_SGB15395|SGB15395 2.2015805 0.0048615 0.0708877 0.7392491 -1.0199179 0.1540373 -0.0709296 0.9209050 -1.9689062 0.0872278 -1.8979765 0.1363424 -2.4293779 0.3895421 -1.4859484 1.3440892 -4.2311019 0.2932895
Lachnospiraceae_bacterium_OF09_6|SGB4966 -1.7721252 0.0056195 0.0804810 0.6446641 0.8302965 0.1565775 -0.0941024 0.8722446 1.7546954 0.0631838 1.8487978 0.0772546 -0.3242759 1.9848688 -1.2532283 1.0650235 -0.0984034 3.6077942
GGB3627_SGB4906|SGB4906 0.7888994 0.3736836 0.1178995 0.6324511 -1.1562688 0.1624408 -0.4923438 0.5517794 -1.8201937 0.1705712 -1.3278499 0.3659226 -2.7871264 0.4745889 -2.1296334 1.1449459 -4.4377347 0.7973472
Phocea_massiliensis|SGB14837 0.3300473 0.6109601 -0.2128508 0.2400983 0.8269831 0.1725872 0.3016762 0.6186161 1.3522900 0.1647316 1.0506138 0.3290696 -0.3679323 2.0218985 -0.8979519 1.5013042 -0.5655597 3.2701398
Roseburia_lenta|SGB4957 -1.1485043 0.1939312 -0.2188754 0.3727034 1.0908033 0.1849003 1.7917238 0.0314002 0.3898828 0.7667576 -1.4018410 0.3372956 -0.5310858 2.7126924 0.1634381 3.4200095 -2.2132635 2.9930291
Blautia_obeum|SGB4810 -0.8297276 0.4278795 -0.7873716 0.0079402 1.2678961 0.1941701 1.7075163 0.0826944 0.8282758 0.5957624 -0.8792405 0.6115544 -0.6575878 3.1933800 -0.2255615 3.6405942 -2.2621429 3.9186945
GGB58158_SGB79798|SGB79798 1.0617257 0.1138545 -0.2375929 0.2026724 -0.7705849 0.2161940 -0.4403313 0.4802370 -1.1008385 0.2705809 -0.6605072 0.5501655 -1.9996780 0.4585082 -1.6742719 0.7936093 -3.0735438 0.8718668
GGB3061_SGB4063|SGB4063 1.1735315 0.0535761 0.0537967 0.7481819 -0.6384007 0.2558494 -0.4098133 0.4665708 -0.8669881 0.3358475 -0.4571747 0.6465657 -1.7474133 0.4706119 -1.5231998 0.7035731 -2.6469630 0.9129869
GGB4596_SGB6358|SGB6358 -1.9610659 0.1923950 -0.2883472 0.4897080 1.5541227 0.2662506 0.0808441 0.9539071 3.0274014 0.1779337 2.9465572 0.2365261 -1.2056178 4.3138632 -2.6897806 2.8514688 -1.4020065 7.4568092
Agathobaculum_butyriciproducens|SGB14991 -1.1348889 0.4267109 -0.5289592 0.1846672 1.4471894 0.2767172 -0.0135419 0.9918845 2.9079207 0.1740670 2.9214626 0.2177453 -1.1797851 4.0741638 -2.6508770 2.6237931 -1.3083965 7.1242378
GGB3000_SGB3991|SGB3991 -0.5273550 0.4310127 0.0366138 0.8440666 -0.6318298 0.3111029 -0.6795404 0.2780696 -0.5841193 0.5588552 0.0954211 0.9313164 -1.8639887 0.6003291 -1.9165588 0.5574781 -2.5617452 1.3935067
Guopingia_tenuis|SGB14127 2.8001581 0.0000154 0.0137690 0.9358774 -0.5656660 0.3239461 0.2917445 0.6116400 -1.4230765 0.1235319 -1.7148210 0.0946077 -1.6986314 0.5672995 -0.8456892 1.4291783 -3.2414960 0.3953430
Streptococcus_thermophilus|SGB8002 0.1750854 0.8104295 -0.1641650 0.4199343 0.6677841 0.3266550 0.8055637 0.2391067 0.5300045 0.6269561 -0.2755592 0.8196616 -0.6772625 2.0128307 -0.5447877 2.1559151 -1.6288072 2.6888162
GGB3491_SGB4665|SGB4665 -0.8105088 0.4893210 0.4918365 0.1337120 -0.9883638 0.3652932 0.3514820 0.7479662 -2.3282096 0.1850790 -2.6796916 0.1691351 -3.1460973 1.1693697 -1.8147614 2.5177255 -5.7913909 1.1349718
GGB9781_SGB66170|SGB66170 0.6819760 0.1162203 -0.1047004 0.3841127 -0.3440829 0.3920305 0.2008890 0.6181777 -0.8890549 0.1695806 -1.0899440 0.1293301 -1.1387940 0.4506282 -0.5969563 0.9987344 -2.1645733 0.3864634
GGB9545_SGB14952|SGB14952 -0.4962381 0.2723944 0.1692543 0.1794619 -0.3562489 0.3964232 -0.4925713 0.2436705 -0.2199265 0.7438712 0.2726448 0.7149381 -1.1867691 0.4742712 -1.3263670 0.3412244 -1.5529187 1.1130657
Lachnospiraceae_bacterium_OM04_12BH|SGB4893 -1.4886652 0.1852244 0.2179337 0.4844928 0.8749422 0.4012757 0.3228170 0.7573322 1.4270673 0.3937119 1.1042503 0.5513446 -1.1859841 2.9358684 -1.7462374 2.3918714 -1.8807375 4.7348721
GGB9616_SGB15051|SGB15051 -0.9627153 0.4614339 0.6873363 0.0610358 -1.0152803 0.4040259 -1.7605192 0.1510474 -0.2700414 0.8898122 1.4904778 0.4910210 -3.4208484 1.3902879 -4.1755747 0.6545363 -4.1309995 3.5909167
GGB4569_SGB6312|SGB6312 0.4874583 0.6156552 -0.0842579 0.7551783 0.7495444 0.4072820 1.6127155 0.0777146 -0.1136268 0.9374961 -1.7263423 0.2841795 -1.0388044 2.5378932 -0.1826864 3.4081174 -2.9839424 2.7566888
Holdemania_filiformis|SGB4046 -0.6539918 0.4812044 0.2026012 0.4333201 0.7148807 0.4081703 0.0074997 0.9930908 1.4222616 0.3057263 1.4147618 0.3579607 -0.9940102 2.4237715 -1.7081308 1.7231303 -1.3205233 4.1650465
GGB3288_SGB4342|SGB4342 -1.3645286 0.2106593 0.8352144 0.0067638 0.8129747 0.4216145 0.5151959 0.6115844 1.1107534 0.4937041 0.5955575 0.7404558 -1.1874330 2.8133823 -1.4931012 2.5234931 -2.0999186 4.3214254
Lactococcus_lactis|SGB7985 -0.6887284 0.0367230 -0.0770226 0.3966650 0.2416963 0.4259940 0.0081042 0.9787517 0.4752885 0.3298111 0.4671844 0.3874451 -0.3586888 0.8420815 -0.5946489 0.6108572 -0.4883350 1.4389120
GGB9509_SGB14906|SGB14906 0.4730263 0.5953464 -0.0859382 0.7287893 -0.6463411 0.4358089 -1.2941676 0.1219636 0.0014855 0.9991081 1.2956532 0.3803275 -2.2865643 0.9938821 -2.9408598 0.3525245 -2.6310872 2.6340582
Oscillibacter_valericigenes|SGB15076 1.7124668 0.0435626 -0.2328092 0.3202100 -0.6090124 0.4360298 -0.2865574 0.7147092 -0.9314674 0.4578627 -0.6449100 0.6427029 -2.1552530 0.9372282 -1.8388963 1.2657814 -3.4131972 1.5502624
Clostridium_fessum|SGB4705 -1.3253754 0.2870331 0.1178601 0.7330878 0.8712197 0.4512270 0.9598642 0.4085204 0.7825751 0.6729651 -0.1772891 0.9312536 -1.4159941 3.1584334 -1.3363701 3.2560986 -2.8884231 4.4535733
GGB9765_SGB15382|SGB15382 0.6611998 0.3365931 0.0512005 0.7887754 -0.4551136 0.4766406 -0.5584230 0.3847135 -0.3518043 0.7314898 0.2066186 0.8557582 -1.7200664 0.8098391 -1.8283646 0.7115186 -2.3820646 1.6784560
Hydrogenoanaerobacterium_saccharovorans|SGB15350 2.1010775 0.0999120 -0.2633877 0.4561183 -0.8148049 0.4901774 -0.3713756 0.7538286 -1.2582343 0.5067191 -0.8868587 0.6727946 -3.1508725 1.5212627 -2.7166565 1.9739053 -5.0076434 2.4911749
GGB51884_SGB49168|SGB49168 -1.0422911 0.2405985 0.3586399 0.1477642 0.5163300 0.5309754 -0.5280498 0.5233464 1.5607098 0.2393218 2.0887597 0.1562966 -1.1144372 2.1470972 -2.1652486 1.1091490 -1.0566859 4.1781056
Ruthenibacterium_lactatiformans|SGB15271 2.2001055 0.0171456 -0.3591971 0.1578951 0.4922494 0.5607303 0.6647315 0.4342578 0.3197673 0.8137106 -0.3449643 0.8186839 -1.1823323 2.1668311 -1.0164546 2.3459177 -2.3679511 3.0074857
GGB9770_SGB15390|SGB15390 1.0814142 0.2124915 -0.2363568 0.3269031 0.4501346 0.5756883 0.0502011 0.9503817 0.8500682 0.5103293 0.7998672 0.5763122 -1.1416525 2.0419217 -1.5478639 1.6482660 -1.7047641 3.4049006
Lachnospiraceae_bacterium|SGB4953 -1.3531550 0.3151562 -0.0480135 0.8978560 -0.6962323 0.5778283 -0.9387056 0.4551058 -0.4537590 0.8210771 0.4849466 0.8274593 -3.1721696 1.7797050 -3.4244078 1.5469966 -4.4276602 3.5201422
Escherichia_coli|SGB10068 -0.7903037 0.5155974 -0.1016893 0.7636000 0.6292480 0.5778320 -0.7439081 0.5123601 2.0024041 0.2710178 2.7463122 0.1741629 -1.6085021 2.8669981 -2.9904837 1.5026676 -1.5892046 5.5940128
Clostridium_SGB48024|SGB48024 0.4519603 0.7520495 -0.2487415 0.5325875 0.6917612 0.6034111 0.6256543 0.6397302 0.7578680 0.7227787 0.1322137 0.9554800 -1.9440511 3.3275734 -2.0205534 3.2718620 -3.4726340 4.9883700
GGB9616_SGB15052|SGB15052 1.5297420 0.1826374 -0.4748580 0.1377653 -0.5353202 0.6147084 0.1558757 0.8838517 -1.2265162 0.4727315 -1.3823918 0.4655389 -2.6405352 1.5698947 -1.9576421 2.2693934 -4.6054047 2.1523724
Clostridium_sp_AM29_11AC|SGB4701 -0.4190217 0.2697393 0.0651278 0.5368505 -0.1747140 0.6199382 -0.3270107 0.3559815 -0.0224173 0.9683535 0.3045934 0.6271187 -0.8721016 0.5226735 -1.0271488 0.3731273 -1.1417306 1.0968959
GGB45432_SGB63101|SGB63101 0.9606443 0.1324255 0.1924325 0.2775483 0.2893623 0.6242376 0.3556845 0.5487960 0.2230400 0.8139216 -0.1326445 0.8995423 -0.8800204 1.4587450 -0.8183101 1.5296792 -1.6538295 2.0999096
GGB9719_SGB15273|SGB15273 -0.1292524 0.8290187 0.1266493 0.4478674 0.2630463 0.6368129 -0.4259368 0.4467818 0.9520293 0.2882147 1.3779661 0.1666098 -0.8398756 1.3659682 -1.5332085 0.6813350 -0.8181701 2.7222287
GGB34900_SGB14891|SGB14891 0.1088754 0.8259804 -0.0920715 0.5047089 0.2077756 0.6521532 -0.0259277 0.9552936 0.4414790 0.5508391 0.4674067 0.5689881 -0.7047852 1.1203365 -0.9420876 0.8902323 -1.0231893 1.9061473
GGB9782_SGB15403|SGB15403 -0.4238866 0.4276985 -0.0155148 0.9168117 0.2200436 0.6577252 -0.2174608 0.6627331 0.6575480 0.4100537 0.8750087 0.3232823 -0.7632503 1.2033375 -1.2046327 0.7697112 -0.9206474 2.2357434
GGB4682_SGB6472|SGB6472 -1.3962184 0.1254699 0.0227795 0.9279899 -0.3514603 0.6766431 -0.8935609 0.2921482 0.1906403 0.8878666 1.0842012 0.4702135 -2.0200519 1.3171313 -2.5687333 0.7816116 -2.4874640 2.8687446
GGB9301_SGB14262|SGB14262 -0.3951466 0.4059058 -0.0363331 0.7833466 -0.1777303 0.6874059 -0.2561443 0.5637002 -0.0993163 0.8885431 0.1568279 0.8418310 -1.0523149 0.6968542 -1.1341781 0.6218896 -1.5030323 1.3043996
Kineothrix_SGB4886|SGB4886 -1.0280994 0.4334623 0.4198416 0.2515053 0.4526965 0.7104982 0.2621930 0.8303957 0.6431999 0.7424452 0.3810068 0.8606287 -1.9623068 2.8676998 -2.1623349 2.6867209 -3.2329017 4.5193015
Mediterraneibacter_faecis|SGB4563 -2.9720856 0.0052955 0.4182300 0.1521180 -0.3473345 0.7205050 -0.3661211 0.7071842 -0.3285480 0.8329586 0.0375731 0.9826451 -2.2697283 1.5750592 -2.2960966 1.5638544 -3.4140069 2.7569109
Blautia_SGB4833|SGB4833 -1.1961469 0.1175504 0.0179656 0.9322538 0.2519153 0.7214130 0.1050509 0.8822227 0.3987797 0.7250761 0.2937289 0.8152682 -1.1471075 1.6509381 -1.2994895 1.5095913 -1.8466641 2.6442236
Anaeromassilibacillus_senegalensis|SGB14894 1.9796230 0.0444432 -0.2719918 0.3171672 -0.3204024 0.7237513 0.1088187 0.9047941 -0.7496234 0.6065347 -0.8584420 0.5948488 -2.1154954 1.4746907 -1.6933541 1.9109914 -3.6307635 2.1315168
GGB3490_SGB4664|SGB4664 -2.7584379 0.0051133 0.0617538 0.8180402 0.3064942 0.7326367 -0.5998586 0.5058074 1.2128470 0.4004068 1.8127056 0.2578328 -1.4701785 2.0831668 -2.3835383 1.1838211 -1.6387283 4.0644222
Dorea_formicigenerans|SGB4575 -3.0065286 0.0004844 0.3806610 0.1031793 0.2616408 0.7357134 0.7312537 0.3484094 -0.2079721 0.8672102 -0.9392258 0.4964350 -1.2733945 1.7966760 -0.8098356 2.2723430 -2.6717172 2.2557730
Clostridiaceae_unclassified_SGB4771|SGB4771 -1.6108668 0.2219337 0.4368638 0.2340318 0.3543923 0.7718632 -1.1816205 0.3367074 1.8904052 0.3363654 3.0720257 0.1601031 -2.0663366 2.7751213 -3.6118966 1.2486556 -1.9948862 5.7756965
GGB4604_SGB6371|SGB6371 -2.2092325 0.0449440 0.0402337 0.8944859 0.2484285 0.8064369 -0.6418589 0.5286993 1.1387160 0.4845959 1.7805749 0.3250066 -1.7596804 2.2565374 -2.6578877 1.3741699 -2.0843166 4.3617485
GGB1420_SGB1957|SGB1957 -0.2994382 0.6458058 0.2849549 0.1184456 0.1397471 0.8175945 -0.7433843 0.2235075 1.0228785 0.2942814 1.7662628 0.1039283 -1.0603405 1.3398346 -1.9482050 0.4614363 -0.9032726 2.9490296
GGB38744_SGB14842|SGB14842 -0.0481380 0.9024854 0.0638117 0.5599891 -0.0783072 0.8303891 -0.0967062 0.7921843 -0.0599081 0.9186683 0.0367981 0.9548954 -0.8023938 0.6457795 -0.8236486 0.6302362 -1.2220736 1.1022574
Clostridium_methylpentosum|SGB14962 -0.5624150 0.1825842 -0.0335368 0.7742541 0.0826684 0.8324354 -0.1882149 0.6315304 0.3535518 0.5731934 0.5417666 0.4366670 -0.6912261 0.8565630 -0.9651616 0.5887319 -0.8885559 1.5956594
GGB2653_SGB3574|SGB3574 1.5518418 0.1614485 -0.4716452 0.1267593 -0.2090488 0.8384994 0.4483001 0.6634331 -0.8663977 0.5989242 -1.3146978 0.4720136 -2.2406286 1.8225310 -1.5912921 2.4878923 -4.1271013 2.3943058
Clostridium_sp_AF32_12BH|SGB4710 -1.2624833 0.0779989 -0.0853467 0.6661600 -0.1088297 0.8691888 -0.1678652 0.8002667 -0.0497943 0.9625519 0.1180709 0.9200342 -1.4177682 1.2001087 -1.4819660 1.1462356 -2.1506521 2.0510635
GGB3363_SGB4448|SGB4448 -0.8722215 0.3655405 -0.1996082 0.4566678 -0.1472339 0.8693142 -0.7976302 0.3756334 0.5031624 0.7261667 1.3007926 0.4149270 -1.9197901 1.6253223 -2.5771773 0.9819168 -2.3418060 3.3481307
Clostridium_SGB4751|SGB4751 -1.0418126 0.2114167 0.0237241 0.9182359 0.1219428 0.8745470 0.4186829 0.5894977 -0.1747974 0.8878620 -0.5934802 0.6660818 -1.4079194 1.6518049 -1.1172129 1.9545787 -2.6302396 2.2806449
Ruminococcus_torques|SGB4608 -1.4586922 0.2487537 -0.0298792 0.9321257 0.1753698 0.8810979 -0.0650268 0.9559408 0.4157663 0.8251480 0.4807931 0.8177957 -2.1469926 2.4977321 -2.3965484 2.2664948 -3.3116457 4.1431784
Ruminococcus_sp_AF41_9|SGB25497 -1.9581259 0.1159202 0.0704723 0.8378831 -0.1681373 0.8838588 -0.3703484 0.7486453 0.0340738 0.9852834 0.4044221 0.8435129 -2.4480518 2.1117772 -2.6592547 1.9185579 -3.6252091 3.6933566
Clostridiaceae_bacterium_CLA_AA_H274|SGB4704 -1.1697254 0.1183491 0.0227932 0.9123736 -0.0800245 0.9079618 0.0352389 0.9595563 -0.1952879 0.8604734 -0.2305268 0.8515799 -1.4511615 1.2911124 -1.3413058 1.4117835 -2.3959748 2.0053990
Enterocloster_hominis|SGB4721 0.3608653 0.6472807 -0.0365065 0.8678887 -0.0753199 0.9182015 0.0466289 0.9495035 -0.1972688 0.8669150 -0.2438977 0.8517928 -1.5280934 1.3774535 -1.4118742 1.5051320 -2.5289830 2.1344455
Collinsella_aerofaciens|SGB14535 0.5811178 0.3323737 -0.1499484 0.3687864 -0.0489983 0.9298182 0.2714336 0.6271981 -0.3694302 0.6792067 -0.6408639 0.5179718 -1.1510321 1.0530355 -0.8349465 1.3778138 -2.1382042 1.3993437
GGB3740_SGB5076|SGB5076 -2.6975736 0.0270404 -0.1589040 0.6355782 -0.0706479 0.9497001 -1.3934332 0.2171458 1.2521374 0.4866240 2.6455705 0.1865461 -2.2890898 2.1477940 -3.6206245 0.8337581 -2.3084815 4.8127562
Lachnoclostridium_sp_An138|SGB4774 0.3378946 0.5408997 -0.2007474 0.1935681 -0.0254781 0.9604370 -0.0567166 0.9124181 0.0057603 0.9944247 0.0624769 0.9455131 -1.0429108 0.9919545 -1.0781619 0.9647288 -1.6272281 1.6387487
Lawsonibacter_asaccharolyticus|SGB15154 4.0450013 0.0004181 -0.3407995 0.2705301 -0.0429466 0.9667473 0.5696470 0.5820968 -0.6555401 0.6918870 -1.2251871 0.5045351 -2.0836643 1.9977711 -1.4791192 2.6184131 -3.9309100 2.6198298
Blautia_massiliensis|SGB4826 -4.4798630 0.0033905 -0.5202501 0.2126433 0.0445363 0.9744217 -0.9567490 0.4932159 1.0458216 0.6391951 2.0025707 0.4187699 -2.7069985 2.7960710 -3.7191357 1.8056376 -3.3704159 5.4620592
GGB9707_SGB15229|SGB15229 0.6596547 0.4596332 0.0782624 0.7523781 -0.0046849 0.9954895 -0.1855402 0.8235574 0.1761703 0.8946372 0.3617105 0.8063177 -1.6463606 1.6369907 -1.8336905 1.4626101 -2.4587336 2.8110742
Open code




if(file.exists(
  'gitignore/lm_results/result_microbiome_SGB10_VGdur_training.csv'
  ) == FALSE){
  write.table(result_microbiome, 
              'gitignore/result_microbiome_SGB10_VGdur_training.csv', 
              row.names = FALSE)
  }

7.2 Validation cohort

Open code

data_analysis_microbiome <- data_microbiome_validation %>%
  filter(Diet == 'VEGAN',
         !is.na(Diet_duration)) %>%
  dplyr::mutate(Diet_duration = as.numeric(as.character(Diet_duration))) %>% 
  dplyr::select(
    Diet_duration, Age,
    all_of(diet_sensitive_taxa)
  )

summary(data_analysis_microbiome[ , 1:12])
##  Diet_duration         Age        GGB1420_SGB1957|SGB1957
##  Min.   : 0.170   Min.   :21.54   Min.   :-2.0851        
##  1st Qu.: 4.062   1st Qu.:30.05   1st Qu.:-1.7612        
##  Median : 6.000   Median :32.81   Median :-1.4597        
##  Mean   : 6.790   Mean   :32.37   Mean   :-0.7887        
##  3rd Qu.:10.000   3rd Qu.:34.24   3rd Qu.:-0.7294        
##  Max.   :15.000   Max.   :44.08   Max.   : 4.5965        
##  Escherichia_coli|SGB10068 Bacteroides_clarus|SGB1832
##  Min.   :-2.9814           Min.   :-1.40490          
##  1st Qu.:-2.3623           1st Qu.:-0.81198          
##  Median :-1.8436           Median :-0.58475          
##  Mean   :-0.8066           Mean   :-0.17755          
##  3rd Qu.: 0.7762           3rd Qu.: 0.03368          
##  Max.   : 5.0127           Max.   : 5.07320          
##  Hydrogenoanaerobacterium_saccharovorans|SGB15350 GGB9775_SGB15395|SGB15395
##  Min.   :-0.6425                                  Min.   :-3.445           
##  1st Qu.: 0.5164                                  1st Qu.:-2.783           
##  Median : 1.6681                                  Median :-2.307           
##  Mean   : 1.4066                                  Mean   :-1.645           
##  3rd Qu.: 2.2456                                  3rd Qu.:-1.295           
##  Max.   : 4.1991                                  Max.   : 2.865           
##  Ruthenibacterium_lactatiformans|SGB15271
##  Min.   :-2.4003                         
##  1st Qu.:-1.2716                         
##  Median : 0.5358                         
##  Mean   : 0.4248                         
##  3rd Qu.: 1.6461                         
##  Max.   : 6.8260                         
##  Lawsonibacter_asaccharolyticus|SGB15154 Clostridium_fessum|SGB4705
##  Min.   :-4.1071                         Min.   :-0.3956           
##  1st Qu.:-2.2461                         1st Qu.: 2.8100           
##  Median : 0.3093                         Median : 3.6723           
##  Mean   :-0.1859                         Mean   : 3.3043           
##  3rd Qu.: 1.6178                         3rd Qu.: 4.0954           
##  Max.   : 3.4482                         Max.   : 6.8045           
##  GGB9707_SGB15229|SGB15229 Collinsella_aerofaciens|SGB14535
##  Min.   :-3.3451           Min.   :-1.7233                 
##  1st Qu.:-1.9104           1st Qu.: 0.3225                 
##  Median : 2.1067           Median : 5.2318                 
##  Mean   : 0.9578           Mean   : 3.6460                 
##  3rd Qu.: 2.9921           3rd Qu.: 5.7957                 
##  Max.   : 5.9439           Max.   : 8.0904

7.2.0.1 Define number of microbiome and covariates

Open code
n_covarites <- 2
n_features <- ncol(data_analysis_microbiome) - n_covarites

7.2.0.2 Create empty objects

Open code
outcome <- vector('double', n_features)
est_VGduration <- vector('double', n_features)
P_VGduration <- vector('double', n_features)
est_Age <- vector('double', n_features)
P_Age <- vector('double', n_features)
CI_L_VGduration <- vector('double', n_features)
CI_U_VGduration <- vector('double', n_features)

7.2.0.3 Estimate over outcomes

Open code
for (i in 1:n_features) {
  ## define variable
  data_analysis_microbiome$outcome <- data_analysis_microbiome[, (i + n_covarites)]

  ## fit model
  model <- lm(outcome ~ Diet_duration + Age, 
              data = data_analysis_microbiome %>% 
                mutate(Diet_duration = Diet_duration/10,
                       Age = (Age-33)/10))

  ## save results
  outcome[i] <- names(data_analysis_microbiome)[i + n_covarites]

  ## diet effect
  tr <- confint(model)

  CI_L_VGduration[i] <- tr[which(row.names(tr) == "Diet_duration"), ][1]
  CI_U_VGduration[i] <- tr[which(row.names(tr) == "Diet_duration"), ][2]

  est_VGduration[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_duration"
    ), 1
  ]

  P_VGduration[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_duration"
    ), 4
  ]
  
    
  est_Age[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Age"
    ), 1
  ]

  P_Age[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Age"
    ), 4
  ]
}

7.2.0.4 Results table

Open code
result_microbiome_val <- data.frame(
  outcome,
  est_VGduration, P_VGduration,
  CI_L_VGduration, CI_U_VGduration,
  est_Age, P_Age
)

kableExtra::kable(result_microbiome_val %>% arrange(P_VGduration),
  caption = "Results of linear models estimating the effect of vegan diet duration on CLR-transformed relative abundances of taxa (inferred from shotgun metagenomic sequencing) as the outcome. Only taxa with significant differences between vegans and omnivores in training cohorts (FDR < 0.05, average effect over both countries) were included. `est`: regression coefficient, i.e. expected change in CLR-transformed relative abundance per +10 years of vegan diet duration and age, respectively. `P`: p-value; `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval. All estimates in a single row are based on a single model"
)
Results of linear models estimating the effect of vegan diet duration on CLR-transformed relative abundances of taxa (inferred from shotgun metagenomic sequencing) as the outcome. Only taxa with significant differences between vegans and omnivores in training cohorts (FDR < 0.05, average effect over both countries) were included. est: regression coefficient, i.e. expected change in CLR-transformed relative abundance per +10 years of vegan diet duration and age, respectively. P: p-value; CI_L and CI_U: lower and upper bounds of the 95% confidence interval. All estimates in a single row are based on a single model
outcome est_VGduration P_VGduration CI_L_VGduration CI_U_VGduration est_Age P_Age
Bacteroides_clarus|SGB1832 -1.3116349 0.0256259 -2.4569307 -0.1663391 0.2864820 0.5560623
GGB1420_SGB1957|SGB1957 -1.5377833 0.0404331 -3.0058962 -0.0696704 0.8709319 0.1659532
Roseburia_lenta|SGB4957 2.5621762 0.0479091 0.0244968 5.0998555 -0.9569653 0.3758384
GGB9719_SGB15273|SGB15273 -0.4464868 0.0489677 -0.8908756 -0.0020981 0.1788676 0.3447953
GGB3740_SGB5076|SGB5076 -2.3724685 0.0695200 -4.9414079 0.1964708 0.1074429 0.9214586
Lachnospiraceae_bacterium_OM04_12BH|SGB4893 1.8311048 0.0730549 -0.1773846 3.8395941 -1.4368845 0.0961959
GGB38744_SGB14842|SGB14842 1.0226307 0.0949474 -0.1839934 2.2292548 0.0972689 0.8492989
Clostridiaceae_bacterium_CLA_AA_H274|SGB4704 -0.5968727 0.1157674 -1.3457747 0.1520293 0.3136219 0.3258177
GGB9775_SGB15395|SGB15395 1.1556507 0.1190987 -0.3079122 2.6192136 -0.6311460 0.3117642
Clostridium_methylpentosum|SGB14962 -0.3300580 0.1433697 -0.7758914 0.1157754 -0.0422818 0.8231243
Ruminococcus_sp_AF41_9|SGB25497 -1.7752602 0.1684136 -4.3261294 0.7756089 1.4894188 0.1725989
Mediterraneibacter_faecis|SGB4563 0.8760938 0.1911025 -0.4514032 2.2035908 0.3332206 0.5546844
Blautia_obeum|SGB4810 -1.3132072 0.1919543 -3.3069235 0.6805090 1.3406263 0.1173265
Phocea_massiliensis|SGB14837 -1.0053122 0.1955656 -2.5442082 0.5335838 0.2422542 0.7107319
GGB9634_SGB15097|SGB15097 -0.3351536 0.2028299 -0.8566955 0.1863883 0.0641197 0.7720292
GGB4569_SGB6312|SGB6312 -0.3071262 0.2438905 -0.8301199 0.2158675 0.3005928 0.1792893
Clostridium_fessum|SGB4705 -0.8168644 0.2486182 -2.2220521 0.5883233 0.5179845 0.3865860
GGB6601_SGB9333|SGB9333 -0.5479593 0.2655503 -1.5251707 0.4292520 -0.0665058 0.8725455
Clostridium_SGB48024|SGB48024 1.0375110 0.2893262 -0.9077645 2.9827866 0.0231073 0.9776587
Butyricicoccus_intestinisimiae|SGB14980 -0.2834320 0.3030763 -0.8303669 0.2635030 -0.1620176 0.4859937
Dorea_formicigenerans|SGB4575 -0.3862337 0.3224408 -1.1622925 0.3898250 -0.0016379 0.9960299
GGB9781_SGB66170|SGB66170 0.6408836 0.3395612 -0.6936984 1.9754656 0.0388259 0.9453205
Oscillibacter_valericigenes|SGB15076 -0.3615813 0.3398265 -1.1149593 0.3917966 0.1055871 0.7412226
Hydrogenoanaerobacterium_saccharovorans|SGB15350 0.4645051 0.3403956 -0.5044727 1.4334829 -0.8286272 0.0480144
Clostridium_sp_AM29_11AC|SGB4701 -0.1449633 0.3456100 -0.4506773 0.1607508 -0.0147491 0.9094505
GGB4604_SGB6371|SGB6371 -0.4546108 0.3634033 -1.4497721 0.5405505 0.2683480 0.5257854
Ruthenibacterium_lactatiformans|SGB15271 0.7927935 0.3884980 -1.0371015 2.6226885 -0.1093027 0.8880232
GGB3061_SGB4063|SGB4063 -0.4736801 0.3994293 -1.5927615 0.6454014 0.3599699 0.4495312
GGB3288_SGB4342|SGB4342 0.8072622 0.3995920 -1.1005789 2.7151032 1.0061396 0.2172203
GGB9616_SGB15052|SGB15052 0.6446913 0.4526954 -1.0656644 2.3550470 -1.3333577 0.0705644
Holdemania_filiformis|SGB4046 -0.4785331 0.4581304 -1.7635596 0.8064934 0.2488215 0.6483724
Lawsonibacter_asaccharolyticus|SGB15154 0.7768444 0.4601655 -1.3187770 2.8724657 -1.1605956 0.1953730
Anaeromassilibacillus_senegalensis|SGB14894 0.6205170 0.4674442 -1.0810253 2.3220594 -0.2137758 0.7671808
Clostridium_SGB4751|SGB4751 -0.3261207 0.4789721 -1.2441147 0.5918733 0.1212707 0.7555793
Streptococcus_thermophilus|SGB8002 0.5809667 0.4861402 -1.0815170 2.2434503 -1.1984551 0.0937437
Clostridiaceae_unclassified_SGB4771|SGB4771 0.6025768 0.4925833 -1.1476383 2.3527919 -0.1871933 0.8009868
Guopingia_tenuis|SGB14127 0.6224428 0.4925961 -1.1855279 2.4304135 0.3826205 0.6182632
GGB4596_SGB6358|SGB6358 0.3536710 0.4928491 -0.6742188 1.3815607 -0.3408037 0.4357955
Lactococcus_lactis|SGB7985 -0.1897864 0.5116059 -0.7662295 0.3866566 -0.3153081 0.2008496
Blautia_SGB4833|SGB4833 -0.4940465 0.5136020 -2.0017777 1.0136846 -0.9149797 0.1566206
Lachnospiraceae_bacterium|SGB4953 -1.0071781 0.5303928 -4.2079649 2.1936088 0.9428187 0.4884436
Collinsella_aerofaciens|SGB14535 0.8355618 0.5429085 -1.9030123 3.5741360 1.2375478 0.2893924
GGB9545_SGB14952|SGB14952 0.2113411 0.5501333 -0.4939692 0.9166514 0.1588378 0.5959824
GGB51884_SGB49168|SGB49168 0.3902238 0.5522435 -0.9190182 1.6994658 0.0955302 0.8634413
GGB2653_SGB3574|SGB3574 0.4996257 0.5631667 -1.2239706 2.2232220 0.6483113 0.3770504
Blautia_massiliensis|SGB4826 -0.5656977 0.5760979 -2.5839417 1.4525463 -0.3690641 0.6666765
GGB9301_SGB14262|SGB14262 -0.1436191 0.5893442 -0.6743930 0.3871547 0.0772252 0.7317400
GGB9509_SGB14906|SGB14906 0.2756730 0.6089385 -0.7994440 1.3507900 -0.6722015 0.1446896
Escherichia_coli|SGB10068 0.4677874 0.6291247 -1.4649917 2.4005666 0.2873403 0.7261320
GGB9770_SGB15390|SGB15390 0.3771049 0.6337826 -1.2024667 1.9566765 -0.1000911 0.8812607
GGB3490_SGB4664|SGB4664 -0.3982692 0.6379584 -2.0872922 1.2907539 0.2018839 0.7781896
GGB3627_SGB4906|SGB4906 -0.4340199 0.6462370 -2.3211128 1.4530729 -0.6915129 0.3893794
GGB3363_SGB4448|SGB4448 0.1481787 0.6496693 -0.5028794 0.7992368 -0.3273313 0.2391208
Agathobaculum_butyriciproducens|SGB14991 -0.4862146 0.6557303 -2.6628755 1.6904462 1.2022907 0.1965371
Eubacterium_ramulus|SGB4959 0.2791725 0.6612571 -0.9924393 1.5507843 0.2740675 0.6118263
GGB34900_SGB14891|SGB14891 -0.2435558 0.6776238 -1.4130127 0.9259010 -0.1138686 0.8184873
Lachnoclostridium_sp_An138|SGB4774 0.1632043 0.7311783 -0.7852630 1.1116716 0.2295562 0.5688969
Lachnospiraceae_bacterium_OF09_6|SGB4966 -0.3084660 0.7447649 -2.2004050 1.5834730 1.1604858 0.1522988
GGB9765_SGB15382|SGB15382 0.3028787 0.7507575 -1.6010943 2.2068517 0.4884158 0.5460580
GGB9782_SGB15403|SGB15403 -0.0967102 0.7546688 -0.7146873 0.5212670 -0.1567114 0.5506469
GGB4682_SGB6472|SGB6472 -0.1737535 0.7857599 -1.4503017 1.1027948 -0.4738158 0.3833382
Youxingia_wuxianensis|SGB82503 -0.1978751 0.7878078 -1.6660328 1.2702826 -0.2671297 0.6682418
GGB3491_SGB4665|SGB4665 -0.2047389 0.7973044 -1.7967477 1.3872700 0.4716595 0.4859371
GGB9707_SGB15229|SGB15229 -0.2750773 0.8133382 -2.6017973 2.0516427 1.6353199 0.1020206
Clostridium_sp_AF32_12BH|SGB4710 -0.2020956 0.8290283 -2.0712815 1.6670904 -1.3693280 0.0886788
GGB45432_SGB63101|SGB63101 -0.1802382 0.8577759 -2.1891923 1.8287160 0.7776985 0.3633936
GGB9531_SGB14932|SGB14932 -0.0881209 0.8877615 -1.3353313 1.1590894 -0.1611528 0.7607626
GGB3000_SGB3991|SGB3991 -0.0464247 0.8958820 -0.7550767 0.6622274 0.0881035 0.7695390
Ruminococcus_torques|SGB4608 -0.1097039 0.9228469 -2.3725500 2.1531421 -0.4820386 0.6159588
GGB9616_SGB15051|SGB15051 0.0282394 0.9742853 -1.7219390 1.7784178 0.5838507 0.4329963
GGB58158_SGB79798|SGB79798 0.0217278 0.9792870 -1.6501596 1.6936153 0.5347863 0.4520518
Enterocloster_hominis|SGB4721 -0.0102244 0.9884335 -1.4191964 1.3987476 -0.5078804 0.3971107
Kineothrix_SGB4886|SGB4886 0.0007642 0.9993285 -1.8132847 1.8148132 -0.9095313 0.2404009
Open code

if (file.exists(
  "gitignore/lm_results/result_microbiom_SGB10_VGdur_valid.csv"
  ) == FALSE) {
  write.table(result_microbiome_val,
    "gitignore/lm_results/result_microbiom_SGB10_VGdur_valid.csv",
    row.names = FALSE
  )
}

7.3 Forest plot

7.3.1 Prepare data

Open code

## subset result tables
result_microbiome_subset <- result_microbiome %>%
  filter(outcome %in% diet_sensitive_taxa)

result_microbiome_val_subset <- result_microbiome_val %>%
  filter(outcome %in% diet_sensitive_taxa)

## create a data frame
data_forest <- data.frame(
  outcome = rep(diet_sensitive_taxa, 3),
  beta = c(
    result_microbiome_subset$est_Diet_duration_inCZ,
    result_microbiome_subset$est_Diet_duration_inIT,
    result_microbiome_val_subset$est_VGduration
  ),
  lower = c(
    result_microbiome_subset$CI_L_Diet_duration_inCZ,
    result_microbiome_subset$CI_L_Diet_duration_inIT,
    result_microbiome_val_subset$CI_L_VGduration
  ),
  upper = c(
    result_microbiome_subset$CI_U_Diet_duration_inCZ,
    result_microbiome_subset$CI_U_Diet_duration_inIT,
    result_microbiome_val_subset$CI_U_VGduration
  ),
  dataset = c(
    rep("CZ", len),
    rep("IT", len),
    rep("Validation", len)
  )
)


 data_forest <- data_forest %>%
  mutate(in_winner = if_else(outcome %in% winners, TRUE, FALSE, missing = FALSE)) %>%
  left_join(
    elacoef %>% mutate(outcome = microbiome) %>% select(-microbiome), 
    by = 'outcome') %>% 
   mutate(outcome = factor(outcome, levels = validation_order))

7.3.1.1 Plotting

Open code

plotac <- "forest_plot_microbiome_SGB10_VGduration"
path <- "gitignore/figures"

colors <- c("CZ" = "#150999", "IT" = "#329243", "Validation" = "grey60")

assign(plotac, ggplot(
  data_forest, aes(x = outcome, y = beta, ymin = lower, ymax = upper, color = dataset)
) +
  geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
  geom_hline(yintercept = 0, color = "black") +
  geom_errorbar(position = position_dodge(width = 0.5), width = 0.2) +
  scale_color_manual(values = colors) +
  labs(
    y = "Effect of vegan diet duration (+10y) on CLR(relative abundance)",
    x = "Outcome",
    color = "Dataset"
  ) +
  theme_minimal() +
  coord_flip() + # Flip coordinates to have outcomes on the y-axis
  scale_x_discrete(
    labels = setNames(
      ifelse(data_forest$in_winner, 
             paste0("**", data_forest$outcome, "**"), 
             as.character(data_forest$outcome)
      ), data_forest$outcome
    )
  ) +
  theme(
    axis.text.x = element_text(size = 10),
    axis.text.y = ggtext::element_markdown(size = 10),  
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    legend.position = "bottom"
  )
)

get(plotac)

Forest plot shows the effect of duration of vegan diet (within the vegan sub-population) on CLR-transformed relative abundance of selected taxa (inferred from shotgun metagenomic sequencing), with 95% confidence intervals, across two training cohorts (Czech and Italian) and one independent Czech validation cohort. Green, blue, and grey points/lines represent the expected change in CLR-transformed relative abundance per +10 years of vegan diet duration within the Italian, Czech, and Czech validation cohorts, respectively. Positive values indicate higher relative abundances in long-term vegans compared to short-term vegans. Only taxa with significant differences between vegans and omnivores in training cohorts (average effect over the two countries) were included. Estimates for the training cohorts come from a single linear model with predictors Diet_duration, Country, their interaction (Diet_duration x Country), and Age as fixed-effect predictors. Age was included as it correlates with Diet_duration and could act as a confounder. In the Czech validation cohort, Diet_duration and Age were fixed-effect predictors. Diet-sensitive taxa are shown in bold, in the same order as in the forest plot of vegan–omnivore differences
Open code

if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {  
  ggsave(
    path = paste0(path),
    filename = plotac,
    device = "svg",
    width = 9,
    height = 13
  )
}

8 Reproducibility

Open code
sessionInfo()
## R version 4.4.3 (2025-02-28)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=cs_CZ.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=cs_CZ.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=cs_CZ.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Prague
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] vegan_2.6-6.1         lattice_0.22-5        permute_0.9-7        
##  [4] zCompositions_1.5.0-4 truncnorm_1.0-8       NADA_1.6-1.1         
##  [7] survival_3.7-0        MicrobiomeStat_1.2    glmnet_4.1-8         
## [10] pROC_1.18.0           arm_1.12-2            lme4_1.1-35.5        
## [13] Matrix_1.7-0          MASS_7.3-65           car_3.1-2            
## [16] carData_3.0-5         emmeans_1.10.4        brms_2.21.0          
## [19] Rcpp_1.0.13           rms_6.8-1             Hmisc_5.1-3          
## [22] glmmTMB_1.1.9         ggtext_0.1.2          ggdist_3.3.2         
## [25] cowplot_1.1.1         ggpubr_0.4.0          sjPlot_2.8.16        
## [28] kableExtra_1.4.0      flextable_0.9.6       gtsummary_2.0.2      
## [31] compositions_2.0-8    janitor_2.2.0         stringi_1.8.7        
## [34] lubridate_1.9.4       forcats_1.0.0         stringr_1.5.1        
## [37] dplyr_1.1.4           purrr_1.0.4           readr_2.1.5          
## [40] tidyr_1.3.1           tibble_3.3.0          ggplot2_3.5.1        
## [43] tidyverse_1.3.1       readxl_1.3.1          openxlsx_4.2.8       
## [46] RJDBC_0.2-10          rJava_1.0-11          DBI_1.2.3            
## 
## loaded via a namespace (and not attached):
##   [1] fs_1.6.6                matrixStats_1.3.0       httr_1.4.2             
##   [4] insight_0.20.2          numDeriv_2016.8-1.1     tools_4.4.3            
##   [7] backports_1.5.0         utf8_1.2.6              sjlabelled_1.2.0       
##  [10] R6_2.6.1                mgcv_1.9-1              withr_3.0.2            
##  [13] Brobdingnag_1.2-7       prettyunits_1.2.0       gridExtra_2.3          
##  [16] bayesm_3.1-6            quantreg_5.98           cli_3.6.5              
##  [19] textshaping_0.3.6       performance_0.12.2      officer_0.6.6          
##  [22] sandwich_3.0-1          labeling_0.4.2          mvtnorm_1.1-3          
##  [25] robustbase_0.93-9       polspline_1.1.25        ggridges_0.5.3         
##  [28] askpass_1.1             QuickJSR_1.3.1          commonmark_1.9.1       
##  [31] systemfonts_1.0.4       StanHeaders_2.32.10     foreign_0.8-90         
##  [34] gfonts_0.2.0            svglite_2.1.3           rstudioapi_0.16.0      
##  [37] httpcode_0.3.0          generics_0.1.4          shape_1.4.6            
##  [40] distributional_0.4.0    zip_2.2.0               inline_0.3.19          
##  [43] loo_2.4.1               abind_1.4-5             lifecycle_1.0.4        
##  [46] multcomp_1.4-18         yaml_2.3.10             snakecase_0.11.1       
##  [49] grid_4.4.3              promises_1.2.0.1        crayon_1.5.3           
##  [52] haven_2.4.3             pillar_1.11.0           knitr_1.50             
##  [55] statip_0.2.3            boot_1.3-31             estimability_1.5.1     
##  [58] codetools_0.2-19        glue_1.7.0              V8_4.4.2               
##  [61] fontLiberation_0.1.0    data.table_1.15.4       vctrs_0.6.5            
##  [64] cellranger_1.1.0        gtable_0.3.0            assertthat_0.2.1       
##  [67] datawizard_0.12.2       xfun_0.52               mime_0.12              
##  [70] coda_0.19-4             modeest_2.4.0           timeDate_3043.102      
##  [73] iterators_1.0.14        statmod_1.4.36          TH.data_1.1-0          
##  [76] nlme_3.1-168            fontquiver_0.2.1        rstan_2.32.6           
##  [79] fBasics_4041.97         tensorA_0.36.2.1        TMB_1.9.14             
##  [82] rpart_4.1.24            colorspace_2.0-2        nnet_7.3-20            
##  [85] tidyselect_1.2.1        processx_3.8.4          timeSeries_4032.109    
##  [88] compiler_4.4.3          curl_6.4.0              rvest_1.0.2            
##  [91] htmlTable_2.4.0         SparseM_1.81            xml2_1.3.3             
##  [94] fontBitstreamVera_0.1.1 posterior_1.6.0         checkmate_2.3.2        
##  [97] scales_1.3.0            DEoptimR_1.0-10         callr_3.7.6            
## [100] spatial_7.3-15          digest_0.6.37           minqa_1.2.4            
## [103] rmarkdown_2.27          htmltools_0.5.8.1       pkgconfig_2.0.3        
## [106] base64enc_0.1-3         stabledist_0.7-2        dbplyr_2.1.1           
## [109] fastmap_1.2.0           rlang_1.1.6             htmlwidgets_1.6.4      
## [112] shiny_1.9.1             farver_2.1.0            zoo_1.8-9              
## [115] jsonlite_2.0.0          magrittr_2.0.3          Formula_1.2-4          
## [118] bayesplot_1.8.1         munsell_0.5.0           gdtools_0.3.7          
## [121] stable_1.1.6            plyr_1.8.6              pkgbuild_1.3.1         
## [124] parallel_4.4.3          ggrepel_0.9.5           sjmisc_2.8.10          
## [127] ggeffects_1.7.0         splines_4.4.3           gridtext_0.1.5         
## [130] hms_1.1.3               sjstats_0.19.0          ps_1.7.7               
## [133] uuid_1.0-3              markdown_1.13           ggsignif_0.6.3         
## [136] stats4_4.4.3            rmutil_1.1.10           rstantools_2.1.1       
## [139] crul_1.5.0              reprex_2.0.1            evaluate_1.0.4         
## [142] RcppParallel_5.1.8      modelr_0.1.8            nloptr_2.0.0           
## [145] tzdb_0.5.0              foreach_1.5.2           httpuv_1.6.5           
## [148] MatrixModels_0.5-3      openssl_1.4.6           clue_0.3-65            
## [151] broom_1.0.6             xtable_1.8-4            rstatix_0.7.0          
## [154] later_1.3.0             viridisLite_0.4.0       ragg_1.4.0             
## [157] lmerTest_3.1-3          cluster_2.1.8.1         timechange_0.3.0       
## [160] bridgesampling_1.1-2

References

Lenth, Russell V. 2024. “Emmeans: Estimated Marginal Means, Aka Least-Squares Means.” https://CRAN.R-project.org/package=emmeans.