Vegan-specific signature implies healthier metabolic profile: findings from diet-related multi-omics observational study based on different European populations

Statistical report for microbiome analysis (SGB level, prevalence > 30% in training dataset)


Authors and affiliations

Anna Ouradova1,*, Giulio Ferrero2,3,*, Miriam Bratova4, Nikola Daskova4, Alena Bohdanecka4,5, Klara Dohnalova6, Marie Heczkova4, Karel Chalupsky6, Maria Kralova7,8, Marek Kuzma9, István Modos4, Filip Tichanek4, Lucie Najmanova9, Barbara Pardini10, Helena Pelantová9, Sonia Tarallo10, Petra Videnska11, Jan Gojda1,#, Alessio Naccarati10,#, Monika Cahova4,#


* These authors have contributed equally to this work and share first authorship
# These authors have contributed equally to this work and share last authorship

1 Department of Internal Medicine, Kralovske Vinohrady University Hospital and Third Faculty of Medicine, Charles University, Prague, Czech Republic
2 Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
3 Department of Computer Science, University of Turin, Turin, Italy
4 Institute for Clinical and Experimental Medicine, Prague, Czech Republic
5 First Faculty of Medicine, Charles University, Prague, Czech Republic
6 Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
7 Ambis University, Department of Economics and Management, Prague, Czech Republic
8 Department of Informatics, Brno University of Technology, Brno, Czech Republic
9 Institute of Microbiology of the Czech Academy of Sciences, Prague, Czech Republic
10 Italian Institute for Genomic Medicine (IIGM), c/o IRCCS Candiolo, Turin, Italy
11 Mendel University, Department of Chemistry and Biochemistry, Brno, Czech Republic


This is a statistical report of the study A vegan diet signature from a multi-omics study on different European populations is related to favorable metabolic outcomes that is currenlty under review

When using this code or data, cite the original publication:

TO BE ADDED

BibTex citation for the original publication:

TO BE ADDED


Original GitHub repository: https://github.com/filip-tichanek/ItCzVegans

Statistical reports can be found on the reports hub.

Data analysis is described in detail in the statistical methods report.


1 Introduction

This project explores potential signatures of a vegan diet across the microbiome, metabolome, and lipidome. We used data from healthy vegan and omnivorous human subjects in two countries (Czech Republic and Italy), with subjects grouped by Country and Diet, resulting in four distinct groups.

To assess the generalizability of these findings, we validated our results with an independent cohort from the Czech Republic for external validation.

1.1 Statistical Methods

The statistical modeling approach is described in detail in this report. Briefly, the methods used included:

  • Multivariate analysis: We conducted multivariate analyses (PERMANOVA, PCA, correlation analyses) to explore the effects of diet, country, and their possible interaction (diet : country) on the microbiome, lipidome, and metabolome compositions in an integrative manner. This part of the analysis is not available on the GitHub page, but the code will be provided upon request.

  • Linear models: Linear models were applied to estimate the effects of diet, country, and their interaction (diet:country) on individual lipids, metabolites, bacterial taxa and pathways (“features”). Features that significantly differed between diet groups (based on the estimated average conditional effect of diet across both countries, adjusted for multiple comparisons with FDR < 0.05) were further examined in the independent external validation cohort to assess whether these associations were reproducible. Next, we fit linear models restricted to vegan participants to test whether omics profiles varied with the duration of vegan diet. Fixed-effect predictors were diet duration (per 10 years), country, their interaction, and age (included due to correlation with diet duration).

  • Predictive models (elastic net): We employed elastic net logistic regression (via the glmnet R package) to predict vegan status based on metabolome, lipidome, microbiome and pathways data (one model per dataset; four models in total). We considered three combinations of Lasso and Ridge penalties (alpha = 0, 0.2, 0.4). For each alpha, we selected the penalty strength (λ1se) using 10-fold cross-validation. This value corresponds to the most regularized model whose performance was within one standard error of the minimum deviance. The alpha–lambda pair with the lowest deviance was chosen to fit the final model, whose coefficients are reported.
    To estimate model performance, we repeated the full modeling procedure (including hyperparameter tuning) 500 times on bootstrap resamples of the training data. In each iteration, the model was trained on the resampled data and evaluated on the out-of-bag subjects (i.e., those not included in the training set in that iteration). The mean, and 2.5th, and 97.5th percentiles of the resulting ROC-AUC values represent the estimated out-of-sample AUC and its 95% confidence interval.
    Finally, the final model was applied to an independent validation cohort to generate predicted probabilities of vegan status. These probabilities were then used to assess external discrimination between diet groups (ROC-AUC in the independent validation cohort). The elastic net models were not intended for practical prediction, but to quantify the strength of the signal separating the dietary groups, with its uncertainty, by using all features of a given dataset jointly. It also offered a complementary perspective on which features are most clearly associated with diet

2 Initiation

2.1 Set home directory

Open code
setwd('/home/ticf/secured_data/GitRepo/ticf/478_MOCA_italian')

2.2 Upload initiation file

Open code
source('478_initiation.R')

3 Data

3.1 Upload all original data

3.1.1 Training set

3.1.1.1 Connect metadata from lipidom table

Open code
training_metadata <- read.xlsx('gitignore/data/lipidome_training_cohort.xlsx') %>% 
  select(Sample, Country, Diet) %>% 
  mutate(ID = Sample)

3.1.1.2 Connect training data

Open code
data_microbiome_original_raw  <- read.table(
  'gitignore/data/0_Data_Metaphlan4_SGB_subset.txt') 

colnames(data_microbiome_original_raw) <- data_microbiome_original_raw[1,]

data_microbiome_original_raw <- data_microbiome_original_raw %>% 
  t() %>% 
  data.frame()

colnames(data_microbiome_original_raw) <- data_microbiome_original_raw[1,]

data_microbiome_original_raw <- data_microbiome_original_raw[-1, ] %>% 
  left_join(training_metadata, by = 'ID') %>% 
  mutate(Data = 'valid', Sample = ID) %>% 
  select(Sample, Data, Diet, Country, everything()) %>% 
  select(-ID)

dim(data_microbiome_original_raw)
## [1] 166 684

3.1.2 Validation set

3.1.2.1 Get metadata from lipidom table

Open code
data_lipids_validation <- read.xlsx('gitignore/data/lipidome_validation_cohort.xlsx') %>% 
  mutate(ID = X1) %>% 
  select(ID, X2)

3.1.2.2 Connect validation data

Open code
data_microbiome_validation_raw <- read.table(
  'gitignore/data/0_Data_Metaphlan4_SGB_subset_validation.txt') 

colnames(data_microbiome_validation_raw) <- data_microbiome_validation_raw[1,]

data_microbiome_validation_raw <- data_microbiome_validation_raw %>% 
  t() %>% 
  data.frame()

colnames(data_microbiome_validation_raw) <- data_microbiome_validation_raw[1,]

data_microbiome_validation_raw <- data_microbiome_validation_raw[-1, ] %>% 
  mutate(
    ID= paste0("K", gsub("\\..*", "", trimws(ID)))) %>% 
  left_join(data_lipids_validation, by = 'ID') %>% 
  mutate(Data = 'valid', Sample = ID, Diet = X2) %>% 
  select(Sample, Data, Diet, everything()) %>% 
  select(-ID, -X2)

dim(data_microbiome_validation_raw)
## [1] 103 683

3.1.3 Get center-log transformed value

Open code
set.seed(478)

## Training data
metadata <- data_microbiome_original_raw[, c("Sample", "Country", "Diet")]

bacteria_d <- data_microbiome_original_raw[, -c(1:4)] %>% 
  mutate(across(everything(), as.numeric)) %>% 
  select(where(~ mean(. != 0) >= 0.3)) 

dim(data_microbiome_original_raw[, -c(1:4)])
## [1] 166 680
dim(bacteria_d)
## [1] 166 299

rel_taxons <- c(colnames(bacteria_d))

bacteria_data <- bacteria_d / rowSums(bacteria_d)
dim(bacteria_data)
## [1] 166 299
    
bacteria_data <- lrSVD(bacteria_data, 
                        label = 0, 
                        dl = NULL,
                        z.warning = 0.9,
                        z.delete = FALSE,
                       ncp = 1)

clr_bacteria_data <- clr(bacteria_data)
data_microbiome_original <- cbind(metadata, clr_bacteria_data)

if(file.exists('gitignore/data_microbiome_SGB30_training_impCLR.csv') == FALSE){
  write.csv(data_microbiome_original ,
            'gitignore/data_microbiome_SGB30_training_impCLR.csv')
}

## Show variances of CLR proportions across samples
data_variance <- data_microbiome_original %>%
  rowwise() %>%   
  mutate(variance = var(c_across(-(Sample:Diet)))) %>%  
  ungroup() %>%       
  select(Sample, variance)  

## Look at distribution
hist(data_variance$variance)

Open code

## Show extreme samples

## Validation data
metadata <- data_microbiome_validation_raw[, c("Sample", "Data", "Diet")]
bacteria_d <- data_microbiome_validation_raw[, -(1:3)] %>% 
  mutate(across(everything(), as.numeric)) %>% 
  select(all_of(rel_taxons))

bacteria_data <- bacteria_d / rowSums(bacteria_d)
min(colSums(bacteria_data))
## [1] 0.0004796078
    
bacteria_data <- lrSVD(bacteria_data, 
                        label = 0, 
                        dl = NULL,
                        z.warning = 0.9, 
                        z.delete = FALSE,
                       ncp = 1)



clr_bacteria_data <- clr(bacteria_data)
data_microbiome_validation <- cbind(metadata, clr_bacteria_data)

## Add Diet for K284 which has the diet missing
data_microbiome_validation[
  which(data_microbiome_validation$Sample == 'K284'), 'Diet'
  ] <- 'VEGAN'

data_microbiome_validation$`Wujia_chipingensis|SGB5111`
##   [1] -2.2659270 -1.8366784 -1.5719579 -1.5649595 -1.3519655 -1.9273808
##   [7] -0.5872619 -1.9902827 -0.9982257 -1.6966210 -2.1930395 -0.1319288
##  [13] -2.1260143 -1.9245373 -2.0088636  0.5526297  3.9926563 -0.7907929
##  [19] -2.1891711 -1.9395276 -2.1353811 -2.1654279 -2.2360908 -0.3930113
##  [25] -2.3957911 -0.2977329 -1.4533578 -1.4072936 -2.0794670  1.8049062
##  [31]  1.5258843  3.6617309  3.8680214 -2.0412347  0.4626366  0.3866172
##  [37]  5.8044209 -2.2201075 -0.4444399 -2.1377634 -2.1399921 -2.0933531
##  [43]  4.5931692 -1.6871195  2.8164681 -2.5644101 -2.0585439 -2.6629664
##  [49]  0.6723405  4.4534434  4.1610574  5.3832639 -1.6933290 -2.5382040
##  [55] -2.4107280 -2.1198776  5.3413842 -1.8967301 -1.5487724 -2.1396917
##  [61] -1.7628668  4.1335815  5.0872100  3.6031291 -1.8040730  4.9799740
##  [67] -0.9850500 -1.5123666 -0.5770878 -1.7964678 -2.1354643  2.3798401
##  [73]  5.9776544 -2.2009657 -2.4395272  4.0615507 -1.3941153 -2.3882306
##  [79] -2.2799115  4.6054080 -1.5516203 -0.4367249 -1.5615449  4.0790976
##  [85]  1.0917225  3.7139508 -1.8139512 -1.1749955 -1.6831395 -1.5614355
##  [91] -1.9565894 -0.1425388 -1.0454574 -2.1128182  4.9182020 -1.6631032
##  [97] -2.3185501 -1.6734295 -1.6233100 -1.5638780  3.3149668  4.7828809
## [103] -1.5505336

# data_microbiome_validation <- data_microbiome_validation %>% 
#   rename("Wujia_chipingenss|SGB5111" = "Wujia_chipingensis|SGB5111")


if(file.exists('gitignore/data_microbiome_SGB30_validation_impCLR.csv') == FALSE){
  write.csv(data_microbiome_validation,
            'gitignore/data_microbiome_SGB30_validation_impCLR.csv')
}

3.1.4 Merge training and validation dataset

Open code

cbind(
  colnames(data_microbiome_original), 
  colnames(data_microbiome_validation))
##        [,1]                                                
##   [1,] "Sample"                                            
##   [2,] "Country"                                           
##   [3,] "Diet"                                              
##   [4,] "Bacteroides_stercoris|SGB1830"                     
##   [5,] "Alistipes_putredinis|SGB2318"                      
##   [6,] "Candidatus_Cibionibacter_quicibialis|SGB15286"     
##   [7,] "Bacteroides_uniformis|SGB1836"                     
##   [8,] "Eubacterium_siraeum|SGB4198"                       
##   [9,] "GGB9602_SGB15031|SGB15031"                         
##  [10,] "Phocaeicola_vulgatus|SGB1814"                      
##  [11,] "Faecalibacterium_prausnitzii|SGB15316"             
##  [12,] "Lachnospiraceae_bacterium_CLA_AA_H244|SGB4993"     
##  [13,] "Sutterella_wadsworthensis|SGB9283"                 
##  [14,] "Oscillibacter_sp_ER4|SGB15254"                     
##  [15,] "Parabacteroides_distasonis|SGB1934"                
##  [16,] "Parabacteroides_merdae|SGB1949"                    
##  [17,] "Oscillibacter_valericigenes|SGB15053"              
##  [18,] "GGB47687_SGB2286|SGB2286"                          
##  [19,] "Brotolimicola_acetigignens|SGB4914"                
##  [20,] "GGB9747_SGB15356|SGB15356"                         
##  [21,] "Faecalibacterium_prausnitzii|SGB15339"             
##  [22,] "Alistipes_shahii|SGB2295"                          
##  [23,] "Alistipes_onderdonkii|SGB2303"                     
##  [24,] "Faecalibacterium_SGB15346|SGB15346"                
##  [25,] "Dialister_invisus|SGB5825_group"                   
##  [26,] "Faecalibacterium_prausnitzii|SGB15332"             
##  [27,] "Gemmiger_formicilis|SGB15300"                      
##  [28,] "GGB33469_SGB15236|SGB15236"                        
##  [29,] "Bacteroides_eggerthii|SGB1829"                     
##  [30,] "Coprococcus_eutactus|SGB5117"                      
##  [31,] "Alistipes_communis|SGB2290"                        
##  [32,] "Faecalibacterium_prausnitzii|SGB15322"             
##  [33,] "Escherichia_coli|SGB10068"                         
##  [34,] "Faecalibacterium_prausnitzii|SGB15318"             
##  [35,] "GGB3175_SGB4191|SGB4191"                           
##  [36,] "Faecalibacterium_prausnitzii|SGB15342"             
##  [37,] "Fusicatenibacter_saccharivorans|SGB4874"           
##  [38,] "Intestinimonas_massiliensis|SGB15127"              
##  [39,] "Vescimonas_coprocola|SGB15089"                     
##  [40,] "Oscillibacter_sp_MSJ_31|SGB15249"                  
##  [41,] "Bacteroides_caccae|SGB1877"                        
##  [42,] "Oscillospiraceae_bacterium_CLA_AA_H250|SGB14861"   
##  [43,] "Bifidobacterium_longum|SGB17248"                   
##  [44,] "Alistipes_senegalensis|SGB2296"                    
##  [45,] "Roseburia_intestinalis|SGB4951"                    
##  [46,] "Dysosmobacter_welbionis|SGB15078"                  
##  [47,] "Eubacterium_rectale|SGB4933"                       
##  [48,] "Alistipes_ihumii|SGB2328"                          
##  [49,] "Roseburia_faecis|SGB4925"                          
##  [50,] "GGB9699_SGB15216|SGB15216"                         
##  [51,] "Oscillospiraceae_bacterium|SGB15225"               
##  [52,] "Faecalibacterium_prausnitzii|SGB15317"             
##  [53,] "GGB9715_SGB15267|SGB15267"                         
##  [54,] "GGB9667_SGB15164|SGB15164"                         
##  [55,] "Lachnospira_pectinoschiza|SGB5075"                 
##  [56,] "Bacteroides_clarus|SGB1832"                        
##  [57,] "Anaerotignum_faecicola|SGB5190"                    
##  [58,] "Hydrogenoanaerobacterium_saccharovorans|SGB15350"  
##  [59,] "Bacteroides_faecis|SGB1860"                        
##  [60,] "Roseburia_hominis|SGB4936"                         
##  [61,] "GGB9712_SGB15244|SGB15244"                         
##  [62,] "Ruminococcus_bicirculans|SGB4262"                  
##  [63,] "Bilophila_wadsworthia|SGB15452"                    
##  [64,] "Odoribacter_splanchnicus|SGB1790"                  
##  [65,] "GGB9365_SGB14341|SGB14341"                         
##  [66,] "GGB9715_SGB15265|SGB15265"                         
##  [67,] "GGB9730_SGB15291|SGB15291"                         
##  [68,] "Bifidobacterium_adolescentis|SGB17244"             
##  [69,] "Bacteroides_ovatus|SGB1871"                        
##  [70,] "GGB9614_SGB15049|SGB15049"                         
##  [71,] "Simiaoa_sunii|SGB4910"                             
##  [72,] "GGB9759_SGB15370|SGB15370"                         
##  [73,] "Barnesiella_intestinihominis|SGB1965"              
##  [74,] "Agathobaculum_butyriciproducens|SGB14993"          
##  [75,] "Lachnospira_eligens|SGB5082"                       
##  [76,] "Lacrimispora_amygdalina|SGB4716"                   
##  [77,] "Akkermansia_muciniphila|SGB9226"                   
##  [78,] "Clostridiales_bacterium|SGB15143"                  
##  [79,] "GGB6649_SGB9391|SGB9391"                           
##  [80,] "Ruminococcus_bromii|SGB4285"                       
##  [81,] "Gemmiger_formicilis|SGB15299"                      
##  [82,] "Alistipes_sp_AF17_16|SGB2326"                      
##  [83,] "Clostridiaceae_bacterium|SGB4269"                  
##  [84,] "GGB9760_SGB15373|SGB15373"                         
##  [85,] "Clostridiaceae_bacterium_AF18_31LB|SGB4767"        
##  [86,] "GGB9621_SGB15073|SGB15073"                         
##  [87,] "Bacteroides_cellulosilyticus|SGB1844"              
##  [88,] "Faecalibacterium_sp_CLA_AA_H233|SGB15315"          
##  [89,] "Ruthenibacterium_lactatiformans|SGB15271"          
##  [90,] "Lawsonibacter_asaccharolyticus|SGB15154"           
##  [91,] "Clostridium_sp_AF20_17LB|SGB4714"                  
##  [92,] "Phocaeicola_massiliensis|SGB1812"                  
##  [93,] "GGB9760_SGB15374|SGB15374"                         
##  [94,] "Clostridium_fessum|SGB4705"                        
##  [95,] "GGB33586_SGB53517|SGB53517"                        
##  [96,] "Clostridium_sp_AF36_4|SGB4644"                     
##  [97,] "Clostridium_SGB4909|SGB4909"                       
##  [98,] "GGB13404_SGB14252|SGB14252"                        
##  [99,] "GGB9627_SGB15081|SGB15081"                         
## [100,] "GGB9608_SGB15041|SGB15041"                         
## [101,] "GGB9707_SGB15229|SGB15229"                         
## [102,] "Parasutterella_excrementihominis|SGB9262"          
## [103,] "Roseburia_inulinivorans|SGB4940"                   
## [104,] "Phocaeicola_dorei|SGB1815"                         
## [105,] "Oscillibacter_valericigenes|SGB15124"              
## [106,] "GGB9502_SGB14899|SGB14899"                         
## [107,] "GGB9818_SGB15459|SGB15459"                         
## [108,] "GGB9767_SGB15385|SGB15385"                         
## [109,] "Clostridiaceae_bacterium|SGB4770"                  
## [110,] "Bacteroides_thetaiotaomicron|SGB1861"              
## [111,] "GGB9559_SGB14969|SGB14969"                         
## [112,] "Flavonifractor_plautii|SGB15132"                   
## [113,] "Fusicatenibacter_sp_CLA_AA_H277|SGB4780"           
## [114,] "Lachnospiraceae_bacterium_AM48_27BH|SGB4706"       
## [115,] "GGB9063_SGB13982|SGB13982"                         
## [116,] "Collinsella_aerofaciens|SGB14535"                  
## [117,] "GGB9522_SGB14921|SGB14921"                         
## [118,] "GGB9619_SGB15067|SGB15067"                         
## [119,] "Clostridium_leptum|SGB14853"                       
## [120,] "GGB52130_SGB14966|SGB14966"                        
## [121,] "Clostridium_sp_AF27_2AA|SGB4712"                   
## [122,] "Blautia_sp_MCC283|SGB4828"                         
## [123,] "Clostridiales_bacterium_KLE1615|SGB5090"           
## [124,] "Lachnospira_sp_NSJ_43|SGB5087"                     
## [125,] "Butyricimonas_virosa|SGB1784"                      
## [126,] "GGB9509_SGB14906|SGB14906"                         
## [127,] "Coprococcus_catus|SGB4670"                         
## [128,] "GGB4585_SGB6340|SGB6340"                           
## [129,] "Phascolarctobacterium_faecium|SGB5792"             
## [130,] "Clostridiaceae_bacterium_Marseille_Q4143|SGB4768"  
## [131,] "GGB45432_SGB63101|SGB63101"                        
## [132,] "Clostridium_sp_AM33_3|SGB4711"                     
## [133,] "GGB9342_SGB14306|SGB14306"                         
## [134,] "Faecalicatena_fissicatena|SGB4871"                 
## [135,] "Alistipes_finegoldii|SGB2301"                      
## [136,] "Oscillospiraceae_bacterium_Marseille_Q3528|SGB4778"
## [137,] "Faecalibacterium_sp_HTFF|SGB15340"                 
## [138,] "GGB3653_SGB4964|SGB4964"                           
## [139,] "Intestinimonas_butyriciproducens|SGB15126"         
## [140,] "GGB9719_SGB15272|SGB15272"                         
## [141,] "Blautia_glucerasea|SGB4816"                        
## [142,] "GGB2653_SGB3574|SGB3574"                           
## [143,] "Veillonella_dispar|SGB6952"                        
## [144,] "bacterium_210917_DFI_7_65|SGB14999"                
## [145,] "GGB9646_SGB15123|SGB15123"                         
## [146,] "GGB51441_SGB71759|SGB71759"                        
## [147,] "GGB33512_SGB15203|SGB15203"                        
## [148,] "Alistipes_indistinctus|SGB2325"                    
## [149,] "Candidatus_Borkfalkia_ceftriaxoniphila|SGB14027"   
## [150,] "GGB9062_SGB13981|SGB13981"                         
## [151,] "Lachnospiraceae_bacterium|SGB4781"                 
## [152,] "Phocea_massiliensis|SGB14837"                      
## [153,] "Guopingia_tenuis|SGB14127"                         
## [154,] "GGB3109_SGB4121|SGB4121"                           
## [155,] "GGB9534_SGB14937|SGB14937"                         
## [156,] "Anaerotruncus_rubiinfantis|SGB25416"               
## [157,] "Lachnospiraceae_bacterium|SGB4953"                 
## [158,] "Oscillibacter_sp_PC13|SGB7258"                     
## [159,] "Blautia_SGB4815|SGB4815"                           
## [160,] "Lawsonibacter_hominis|SGB15131"                    
## [161,] "Clostridium_SGB4750|SGB4750"                       
## [162,] "Holdemania_filiformis|SGB4046"                     
## [163,] "Anaeromassilibacillus_senegalensis|SGB14894"       
## [164,] "Lachnospira_pectinoschiza|SGB5089"                 
## [165,] "Anaerofilum_hominis|SGB79822"                      
## [166,] "Lachnotalea_sp_AF33_28|SGB5200"                    
## [167,] "Senegalimassilia_anaerobia|SGB14824_group"         
## [168,] "GGB9345_SGB14311|SGB14311"                         
## [169,] "Blautia_obeum|SGB4811"                             
## [170,] "Ligaoa_zhengdingensis|SGB14839"                    
## [171,] "Adlercreutzia_equolifaciens|SGB14797"              
## [172,] "Blautia_wexlerae|SGB4837"                          
## [173,] "GGB9787_SGB15410|SGB15410"                         
## [174,] "GGB36331_SGB15121|SGB15121"                        
## [175,] "Clostridium_sp_AM22_11AC|SGB4749"                  
## [176,] "Ruminococcus_sp_AF41_9|SGB25497"                   
## [177,] "Anaerotruncus_colihominis|SGB14963"                
## [178,] "Eubacterium_ramulus|SGB4959"                       
## [179,] "Enterocloster_lavalensis|SGB4725"                  
## [180,] "GGB2848_SGB3813|SGB3813"                           
## [181,] "Blautia_faecis|SGB4820"                            
## [182,] "Clostridiaceae_bacterium_Marseille_Q4145|SGB4769"  
## [183,] "Roseburia_sp_AF02_12|SGB4938"                      
## [184,] "Butyricimonas_faecihominis|SGB1786"                
## [185,] "Youxingia_wuxianensis|SGB82503"                    
## [186,] "Intestinimonas_gabonensis|SGB79840"                
## [187,] "GGB9064_SGB13983|SGB13983"                         
## [188,] "GGB45613_SGB63326|SGB63326"                        
## [189,] "GGB4552_SGB6276|SGB6276"                           
## [190,] "Blautia_massiliensis|SGB4826"                      
## [191,] "Lachnospiraceae_bacterium_CLA_AA_H215|SGB4777"     
## [192,] "Dorea_formicigenerans|SGB4575"                     
## [193,] "Clostridia_bacterium_UC5_1_1D1|SGB14995"           
## [194,] "GGB9765_SGB15382|SGB15382"                         
## [195,] "GGB9770_SGB15390|SGB15390"                         
## [196,] "Clostridium_sp_AM49_4BH|SGB4652"                   
## [197,] "GGB3537_SGB4727|SGB4727"                           
## [198,] "Wansuia_hejianensis|SGB25431"                      
## [199,] "Coprococcus_eutactus|SGB5121"                      
## [200,] "Coprobacter_secundus|SGB1962"                      
## [201,] "GGB3510_SGB4687|SGB4687"                           
## [202,] "Mediterraneibacter_faecis|SGB4563"                 
## [203,] "Ruminococcus_torques|SGB4608"                      
## [204,] "Coprococcus_comes|SGB4577"                         
## [205,] "Streptococcus_parasanguinis|SGB8071"               
## [206,] "Provencibacterium_massiliense|SGB14838"            
## [207,] "Eubacterium_ventriosum|SGB5045"                    
## [208,] "Enterocloster_citroniae|SGB4761"                   
## [209,] "GGB51647_SGB4348|SGB4348"                          
## [210,] "GGB9635_SGB15106|SGB15106"                         
## [211,] "GGB9758_SGB15368|SGB15368"                         
## [212,] "GGB9708_SGB15234|SGB15234"                         
## [213,] "GGB3304_SGB4367|SGB4367"                           
## [214,] "Wujia_chipingensis|SGB5111"                        
## [215,] "Clostridiaceae_bacterium_Marseille_Q4149|SGB15091" 
## [216,] "Paraprevotella_clara|SGB1798"                      
## [217,] "Agathobaculum_butyriciproducens|SGB14991"          
## [218,] "GGB9296_SGB14253|SGB14253"                         
## [219,] "Butyricimonas_paravirosa|SGB1785"                  
## [220,] "Dorea_longicatena|SGB4581"                         
## [221,] "Ruminococcus_lactaris|SGB4557"                     
## [222,] "Alistipes_dispar|SGB2311"                          
## [223,] "Dysosmobacter_SGB15077|SGB15077"                   
## [224,] "GGB13489_SGB15224|SGB15224"                        
## [225,] "Clostridium_sp_AF12_28|SGB4715"                    
## [226,] "GGB34797_SGB14322|SGB14322"                        
## [227,] "Anaerobutyricum_hallii|SGB4532"                    
## [228,] "Clostridium_SGB48024|SGB48024"                     
## [229,] "Slackia_isoflavoniconvertens|SGB14773"             
## [230,] "GGB3321_SGB4394|SGB4394"                           
## [231,] "GGB9237_SGB14179|SGB14179"                         
## [232,] "Anaerotruncus_massiliensis|SGB14965"               
## [233,] "GGB9531_SGB14932|SGB14932"                         
## [234,] "GGB3619_SGB4894|SGB4894"                           
## [235,] "Streptococcus_salivarius|SGB8007_group"            
## [236,] "Oscillibacter_valericigenes|SGB15076"              
## [237,] "Anaerostipes_hadrus|SGB4540"                       
## [238,] "Blautia_wexlerae|SGB4831"                          
## [239,] "GGB2970_SGB3952|SGB3952"                           
## [240,] "Lawsonibacter_SGB15145|SGB15145"                   
## [241,] "GGB9563_SGB14975|SGB14975"                         
## [242,] "Blautia_glucerasea|SGB4804"                        
## [243,] "Anaerostipes_hadrus|SGB4547"                       
## [244,] "Lachnospiraceae_bacterium|SGB4782"                 
## [245,] "GGB2998_SGB3988|SGB3988"                           
## [246,] "GGB9616_SGB15052|SGB15052"                         
## [247,] "GGB9524_SGB14924|SGB14924"                         
## [248,] "Segatella_copri|SGB1626"                           
## [249,] "Mediterraneibacter_butyricigenes|SGB25493"         
## [250,] "Haemophilus_parainfluenzae|SGB9712"                
## [251,] "Anaerosacchariphilus_sp_NSJ_68|SGB4772"            
## [252,] "Anaerotignum_sp_MSJ_24|SGB5180"                    
## [253,] "Dorea_sp_AF36_15AT|SGB4552"                        
## [254,] "GGB58158_SGB79798|SGB79798"                        
## [255,] "Pseudoflavonifractor_capillosus|SGB15140"          
## [256,] "Oliverpabstia_intestinalis|SGB4868"                
## [257,] "Veillonella_parvula|SGB6939"                       
## [258,] "Colidextribacter_sp_210702_DFI_3_9|SGB15146"       
## [259,] "Coprobacter_fastidiosus|SGB1963"                   
## [260,] "Veillonella_rogosae|SGB6956"                       
## [261,] "Eubacteriaceae_bacterium|SGB3958"                  
## [262,] "TM7_phylum_sp_oral_taxon_352|SGB19860_group"       
## [263,] "Enterocloster_hominis|SGB4721"                     
## [264,] "Rothia_mucilaginosa|SGB16971_group"                
## [265,] "Blautia_luti|SGB4832"                              
## [266,] "Blautia_sp_OF03_15BH|SGB4779"                      
## [267,] "Streptococcus_australis|SGB8059_group"             
## [268,] "Ruminococcus_gnavus|SGB4571"                       
## [269,] "Streptococcus_thermophilus|SGB8002"                
## [270,] "Roseburia_hominis|SGB4659"                         
## [271,] "Lentihominibacter_faecis|SGB3957"                  
## [272,] "Actinomyces_sp_ICM47|SGB17167_group"               
## [273,] "Evtepia_gabavorous|SGB15120"                       
## [274,] "Blautia_obeum|SGB4810"                             
## [275,] "GGB3480_SGB4648|SGB4648"                           
## [276,] "Ellagibacter_isourolithinifaciens|SGB14816_group"  
## [277,] "GGB2982_SGB3964|SGB3964"                           
## [278,] "Anaerobutyricum_soehngenii|SGB4537"                
## [279,] "Bacteroides_xylanisolvens|SGB1867"                 
## [280,] "Neobittarella_massiliensis|SGB7264"                
## [281,] "Faecalibacillus_intestinalis|SGB6754"              
## [282,] "GGB9708_SGB15233|SGB15233"                         
## [283,] "Enterocloster_asparagiformis|SGB4724"              
## [284,] "GGB51884_SGB49168|SGB49168"                        
## [285,] "Enterocloster_aldenensis|SGB4762"                  
## [286,] "Eggerthella_lenta|SGB14809"                        
## [287,] "Bacteroides_fragilis|SGB1855"                      
## [288,] "Veillonella_atypica|SGB6936"                       
## [289,] "Enterocloster_bolteae|SGB4758"                     
## [290,] "Romboutsia_timonensis|SGB6148"                     
## [291,] "Roseburia_lenta|SGB4957"                           
## [292,] "GGB3288_SGB4342|SGB4342"                           
## [293,] "GGB9574_SGB14987|SGB14987"                         
## [294,] "Dorea_longicatena|SGB4582"                         
## [295,] "Eubacterium_sp_AF34_35BH|SGB5051"                  
## [296,] "GGB9640_SGB15115|SGB15115"                         
## [297,] "Clostridiaceae_unclassified_SGB4771|SGB4771"       
## [298,] "GGB9766_SGB15383|SGB15383"                         
## [299,] "Lachnospiraceae_bacterium_OM04_12BH|SGB4893"       
## [300,] "Clostridium_SGB6173|SGB6173"                       
## [301,] "GGB9616_SGB15051|SGB15051"                         
## [302,] "Intestinibacter_bartlettii|SGB6140"                
##        [,2]                                                
##   [1,] "Sample"                                            
##   [2,] "Data"                                              
##   [3,] "Diet"                                              
##   [4,] "Bacteroides_stercoris|SGB1830"                     
##   [5,] "Alistipes_putredinis|SGB2318"                      
##   [6,] "Candidatus_Cibionibacter_quicibialis|SGB15286"     
##   [7,] "Bacteroides_uniformis|SGB1836"                     
##   [8,] "Eubacterium_siraeum|SGB4198"                       
##   [9,] "GGB9602_SGB15031|SGB15031"                         
##  [10,] "Phocaeicola_vulgatus|SGB1814"                      
##  [11,] "Faecalibacterium_prausnitzii|SGB15316"             
##  [12,] "Lachnospiraceae_bacterium_CLA_AA_H244|SGB4993"     
##  [13,] "Sutterella_wadsworthensis|SGB9283"                 
##  [14,] "Oscillibacter_sp_ER4|SGB15254"                     
##  [15,] "Parabacteroides_distasonis|SGB1934"                
##  [16,] "Parabacteroides_merdae|SGB1949"                    
##  [17,] "Oscillibacter_valericigenes|SGB15053"              
##  [18,] "GGB47687_SGB2286|SGB2286"                          
##  [19,] "Brotolimicola_acetigignens|SGB4914"                
##  [20,] "GGB9747_SGB15356|SGB15356"                         
##  [21,] "Faecalibacterium_prausnitzii|SGB15339"             
##  [22,] "Alistipes_shahii|SGB2295"                          
##  [23,] "Alistipes_onderdonkii|SGB2303"                     
##  [24,] "Faecalibacterium_SGB15346|SGB15346"                
##  [25,] "Dialister_invisus|SGB5825_group"                   
##  [26,] "Faecalibacterium_prausnitzii|SGB15332"             
##  [27,] "Gemmiger_formicilis|SGB15300"                      
##  [28,] "GGB33469_SGB15236|SGB15236"                        
##  [29,] "Bacteroides_eggerthii|SGB1829"                     
##  [30,] "Coprococcus_eutactus|SGB5117"                      
##  [31,] "Alistipes_communis|SGB2290"                        
##  [32,] "Faecalibacterium_prausnitzii|SGB15322"             
##  [33,] "Escherichia_coli|SGB10068"                         
##  [34,] "Faecalibacterium_prausnitzii|SGB15318"             
##  [35,] "GGB3175_SGB4191|SGB4191"                           
##  [36,] "Faecalibacterium_prausnitzii|SGB15342"             
##  [37,] "Fusicatenibacter_saccharivorans|SGB4874"           
##  [38,] "Intestinimonas_massiliensis|SGB15127"              
##  [39,] "Vescimonas_coprocola|SGB15089"                     
##  [40,] "Oscillibacter_sp_MSJ_31|SGB15249"                  
##  [41,] "Bacteroides_caccae|SGB1877"                        
##  [42,] "Oscillospiraceae_bacterium_CLA_AA_H250|SGB14861"   
##  [43,] "Bifidobacterium_longum|SGB17248"                   
##  [44,] "Alistipes_senegalensis|SGB2296"                    
##  [45,] "Roseburia_intestinalis|SGB4951"                    
##  [46,] "Dysosmobacter_welbionis|SGB15078"                  
##  [47,] "Eubacterium_rectale|SGB4933"                       
##  [48,] "Alistipes_ihumii|SGB2328"                          
##  [49,] "Roseburia_faecis|SGB4925"                          
##  [50,] "GGB9699_SGB15216|SGB15216"                         
##  [51,] "Oscillospiraceae_bacterium|SGB15225"               
##  [52,] "Faecalibacterium_prausnitzii|SGB15317"             
##  [53,] "GGB9715_SGB15267|SGB15267"                         
##  [54,] "GGB9667_SGB15164|SGB15164"                         
##  [55,] "Lachnospira_pectinoschiza|SGB5075"                 
##  [56,] "Bacteroides_clarus|SGB1832"                        
##  [57,] "Anaerotignum_faecicola|SGB5190"                    
##  [58,] "Hydrogenoanaerobacterium_saccharovorans|SGB15350"  
##  [59,] "Bacteroides_faecis|SGB1860"                        
##  [60,] "Roseburia_hominis|SGB4936"                         
##  [61,] "GGB9712_SGB15244|SGB15244"                         
##  [62,] "Ruminococcus_bicirculans|SGB4262"                  
##  [63,] "Bilophila_wadsworthia|SGB15452"                    
##  [64,] "Odoribacter_splanchnicus|SGB1790"                  
##  [65,] "GGB9365_SGB14341|SGB14341"                         
##  [66,] "GGB9715_SGB15265|SGB15265"                         
##  [67,] "GGB9730_SGB15291|SGB15291"                         
##  [68,] "Bifidobacterium_adolescentis|SGB17244"             
##  [69,] "Bacteroides_ovatus|SGB1871"                        
##  [70,] "GGB9614_SGB15049|SGB15049"                         
##  [71,] "Simiaoa_sunii|SGB4910"                             
##  [72,] "GGB9759_SGB15370|SGB15370"                         
##  [73,] "Barnesiella_intestinihominis|SGB1965"              
##  [74,] "Agathobaculum_butyriciproducens|SGB14993"          
##  [75,] "Lachnospira_eligens|SGB5082"                       
##  [76,] "Lacrimispora_amygdalina|SGB4716"                   
##  [77,] "Akkermansia_muciniphila|SGB9226"                   
##  [78,] "Clostridiales_bacterium|SGB15143"                  
##  [79,] "GGB6649_SGB9391|SGB9391"                           
##  [80,] "Ruminococcus_bromii|SGB4285"                       
##  [81,] "Gemmiger_formicilis|SGB15299"                      
##  [82,] "Alistipes_sp_AF17_16|SGB2326"                      
##  [83,] "Clostridiaceae_bacterium|SGB4269"                  
##  [84,] "GGB9760_SGB15373|SGB15373"                         
##  [85,] "Clostridiaceae_bacterium_AF18_31LB|SGB4767"        
##  [86,] "GGB9621_SGB15073|SGB15073"                         
##  [87,] "Bacteroides_cellulosilyticus|SGB1844"              
##  [88,] "Faecalibacterium_sp_CLA_AA_H233|SGB15315"          
##  [89,] "Ruthenibacterium_lactatiformans|SGB15271"          
##  [90,] "Lawsonibacter_asaccharolyticus|SGB15154"           
##  [91,] "Clostridium_sp_AF20_17LB|SGB4714"                  
##  [92,] "Phocaeicola_massiliensis|SGB1812"                  
##  [93,] "GGB9760_SGB15374|SGB15374"                         
##  [94,] "Clostridium_fessum|SGB4705"                        
##  [95,] "GGB33586_SGB53517|SGB53517"                        
##  [96,] "Clostridium_sp_AF36_4|SGB4644"                     
##  [97,] "Clostridium_SGB4909|SGB4909"                       
##  [98,] "GGB13404_SGB14252|SGB14252"                        
##  [99,] "GGB9627_SGB15081|SGB15081"                         
## [100,] "GGB9608_SGB15041|SGB15041"                         
## [101,] "GGB9707_SGB15229|SGB15229"                         
## [102,] "Parasutterella_excrementihominis|SGB9262"          
## [103,] "Roseburia_inulinivorans|SGB4940"                   
## [104,] "Phocaeicola_dorei|SGB1815"                         
## [105,] "Oscillibacter_valericigenes|SGB15124"              
## [106,] "GGB9502_SGB14899|SGB14899"                         
## [107,] "GGB9818_SGB15459|SGB15459"                         
## [108,] "GGB9767_SGB15385|SGB15385"                         
## [109,] "Clostridiaceae_bacterium|SGB4770"                  
## [110,] "Bacteroides_thetaiotaomicron|SGB1861"              
## [111,] "GGB9559_SGB14969|SGB14969"                         
## [112,] "Flavonifractor_plautii|SGB15132"                   
## [113,] "Fusicatenibacter_sp_CLA_AA_H277|SGB4780"           
## [114,] "Lachnospiraceae_bacterium_AM48_27BH|SGB4706"       
## [115,] "GGB9063_SGB13982|SGB13982"                         
## [116,] "Collinsella_aerofaciens|SGB14535"                  
## [117,] "GGB9522_SGB14921|SGB14921"                         
## [118,] "GGB9619_SGB15067|SGB15067"                         
## [119,] "Clostridium_leptum|SGB14853"                       
## [120,] "GGB52130_SGB14966|SGB14966"                        
## [121,] "Clostridium_sp_AF27_2AA|SGB4712"                   
## [122,] "Blautia_sp_MCC283|SGB4828"                         
## [123,] "Clostridiales_bacterium_KLE1615|SGB5090"           
## [124,] "Lachnospira_sp_NSJ_43|SGB5087"                     
## [125,] "Butyricimonas_virosa|SGB1784"                      
## [126,] "GGB9509_SGB14906|SGB14906"                         
## [127,] "Coprococcus_catus|SGB4670"                         
## [128,] "GGB4585_SGB6340|SGB6340"                           
## [129,] "Phascolarctobacterium_faecium|SGB5792"             
## [130,] "Clostridiaceae_bacterium_Marseille_Q4143|SGB4768"  
## [131,] "GGB45432_SGB63101|SGB63101"                        
## [132,] "Clostridium_sp_AM33_3|SGB4711"                     
## [133,] "GGB9342_SGB14306|SGB14306"                         
## [134,] "Faecalicatena_fissicatena|SGB4871"                 
## [135,] "Alistipes_finegoldii|SGB2301"                      
## [136,] "Oscillospiraceae_bacterium_Marseille_Q3528|SGB4778"
## [137,] "Faecalibacterium_sp_HTFF|SGB15340"                 
## [138,] "GGB3653_SGB4964|SGB4964"                           
## [139,] "Intestinimonas_butyriciproducens|SGB15126"         
## [140,] "GGB9719_SGB15272|SGB15272"                         
## [141,] "Blautia_glucerasea|SGB4816"                        
## [142,] "GGB2653_SGB3574|SGB3574"                           
## [143,] "Veillonella_dispar|SGB6952"                        
## [144,] "bacterium_210917_DFI_7_65|SGB14999"                
## [145,] "GGB9646_SGB15123|SGB15123"                         
## [146,] "GGB51441_SGB71759|SGB71759"                        
## [147,] "GGB33512_SGB15203|SGB15203"                        
## [148,] "Alistipes_indistinctus|SGB2325"                    
## [149,] "Candidatus_Borkfalkia_ceftriaxoniphila|SGB14027"   
## [150,] "GGB9062_SGB13981|SGB13981"                         
## [151,] "Lachnospiraceae_bacterium|SGB4781"                 
## [152,] "Phocea_massiliensis|SGB14837"                      
## [153,] "Guopingia_tenuis|SGB14127"                         
## [154,] "GGB3109_SGB4121|SGB4121"                           
## [155,] "GGB9534_SGB14937|SGB14937"                         
## [156,] "Anaerotruncus_rubiinfantis|SGB25416"               
## [157,] "Lachnospiraceae_bacterium|SGB4953"                 
## [158,] "Oscillibacter_sp_PC13|SGB7258"                     
## [159,] "Blautia_SGB4815|SGB4815"                           
## [160,] "Lawsonibacter_hominis|SGB15131"                    
## [161,] "Clostridium_SGB4750|SGB4750"                       
## [162,] "Holdemania_filiformis|SGB4046"                     
## [163,] "Anaeromassilibacillus_senegalensis|SGB14894"       
## [164,] "Lachnospira_pectinoschiza|SGB5089"                 
## [165,] "Anaerofilum_hominis|SGB79822"                      
## [166,] "Lachnotalea_sp_AF33_28|SGB5200"                    
## [167,] "Senegalimassilia_anaerobia|SGB14824_group"         
## [168,] "GGB9345_SGB14311|SGB14311"                         
## [169,] "Blautia_obeum|SGB4811"                             
## [170,] "Ligaoa_zhengdingensis|SGB14839"                    
## [171,] "Adlercreutzia_equolifaciens|SGB14797"              
## [172,] "Blautia_wexlerae|SGB4837"                          
## [173,] "GGB9787_SGB15410|SGB15410"                         
## [174,] "GGB36331_SGB15121|SGB15121"                        
## [175,] "Clostridium_sp_AM22_11AC|SGB4749"                  
## [176,] "Ruminococcus_sp_AF41_9|SGB25497"                   
## [177,] "Anaerotruncus_colihominis|SGB14963"                
## [178,] "Eubacterium_ramulus|SGB4959"                       
## [179,] "Enterocloster_lavalensis|SGB4725"                  
## [180,] "GGB2848_SGB3813|SGB3813"                           
## [181,] "Blautia_faecis|SGB4820"                            
## [182,] "Clostridiaceae_bacterium_Marseille_Q4145|SGB4769"  
## [183,] "Roseburia_sp_AF02_12|SGB4938"                      
## [184,] "Butyricimonas_faecihominis|SGB1786"                
## [185,] "Youxingia_wuxianensis|SGB82503"                    
## [186,] "Intestinimonas_gabonensis|SGB79840"                
## [187,] "GGB9064_SGB13983|SGB13983"                         
## [188,] "GGB45613_SGB63326|SGB63326"                        
## [189,] "GGB4552_SGB6276|SGB6276"                           
## [190,] "Blautia_massiliensis|SGB4826"                      
## [191,] "Lachnospiraceae_bacterium_CLA_AA_H215|SGB4777"     
## [192,] "Dorea_formicigenerans|SGB4575"                     
## [193,] "Clostridia_bacterium_UC5_1_1D1|SGB14995"           
## [194,] "GGB9765_SGB15382|SGB15382"                         
## [195,] "GGB9770_SGB15390|SGB15390"                         
## [196,] "Clostridium_sp_AM49_4BH|SGB4652"                   
## [197,] "GGB3537_SGB4727|SGB4727"                           
## [198,] "Wansuia_hejianensis|SGB25431"                      
## [199,] "Coprococcus_eutactus|SGB5121"                      
## [200,] "Coprobacter_secundus|SGB1962"                      
## [201,] "GGB3510_SGB4687|SGB4687"                           
## [202,] "Mediterraneibacter_faecis|SGB4563"                 
## [203,] "Ruminococcus_torques|SGB4608"                      
## [204,] "Coprococcus_comes|SGB4577"                         
## [205,] "Streptococcus_parasanguinis|SGB8071"               
## [206,] "Provencibacterium_massiliense|SGB14838"            
## [207,] "Eubacterium_ventriosum|SGB5045"                    
## [208,] "Enterocloster_citroniae|SGB4761"                   
## [209,] "GGB51647_SGB4348|SGB4348"                          
## [210,] "GGB9635_SGB15106|SGB15106"                         
## [211,] "GGB9758_SGB15368|SGB15368"                         
## [212,] "GGB9708_SGB15234|SGB15234"                         
## [213,] "GGB3304_SGB4367|SGB4367"                           
## [214,] "Wujia_chipingensis|SGB5111"                        
## [215,] "Clostridiaceae_bacterium_Marseille_Q4149|SGB15091" 
## [216,] "Paraprevotella_clara|SGB1798"                      
## [217,] "Agathobaculum_butyriciproducens|SGB14991"          
## [218,] "GGB9296_SGB14253|SGB14253"                         
## [219,] "Butyricimonas_paravirosa|SGB1785"                  
## [220,] "Dorea_longicatena|SGB4581"                         
## [221,] "Ruminococcus_lactaris|SGB4557"                     
## [222,] "Alistipes_dispar|SGB2311"                          
## [223,] "Dysosmobacter_SGB15077|SGB15077"                   
## [224,] "GGB13489_SGB15224|SGB15224"                        
## [225,] "Clostridium_sp_AF12_28|SGB4715"                    
## [226,] "GGB34797_SGB14322|SGB14322"                        
## [227,] "Anaerobutyricum_hallii|SGB4532"                    
## [228,] "Clostridium_SGB48024|SGB48024"                     
## [229,] "Slackia_isoflavoniconvertens|SGB14773"             
## [230,] "GGB3321_SGB4394|SGB4394"                           
## [231,] "GGB9237_SGB14179|SGB14179"                         
## [232,] "Anaerotruncus_massiliensis|SGB14965"               
## [233,] "GGB9531_SGB14932|SGB14932"                         
## [234,] "GGB3619_SGB4894|SGB4894"                           
## [235,] "Streptococcus_salivarius|SGB8007_group"            
## [236,] "Oscillibacter_valericigenes|SGB15076"              
## [237,] "Anaerostipes_hadrus|SGB4540"                       
## [238,] "Blautia_wexlerae|SGB4831"                          
## [239,] "GGB2970_SGB3952|SGB3952"                           
## [240,] "Lawsonibacter_SGB15145|SGB15145"                   
## [241,] "GGB9563_SGB14975|SGB14975"                         
## [242,] "Blautia_glucerasea|SGB4804"                        
## [243,] "Anaerostipes_hadrus|SGB4547"                       
## [244,] "Lachnospiraceae_bacterium|SGB4782"                 
## [245,] "GGB2998_SGB3988|SGB3988"                           
## [246,] "GGB9616_SGB15052|SGB15052"                         
## [247,] "GGB9524_SGB14924|SGB14924"                         
## [248,] "Segatella_copri|SGB1626"                           
## [249,] "Mediterraneibacter_butyricigenes|SGB25493"         
## [250,] "Haemophilus_parainfluenzae|SGB9712"                
## [251,] "Anaerosacchariphilus_sp_NSJ_68|SGB4772"            
## [252,] "Anaerotignum_sp_MSJ_24|SGB5180"                    
## [253,] "Dorea_sp_AF36_15AT|SGB4552"                        
## [254,] "GGB58158_SGB79798|SGB79798"                        
## [255,] "Pseudoflavonifractor_capillosus|SGB15140"          
## [256,] "Oliverpabstia_intestinalis|SGB4868"                
## [257,] "Veillonella_parvula|SGB6939"                       
## [258,] "Colidextribacter_sp_210702_DFI_3_9|SGB15146"       
## [259,] "Coprobacter_fastidiosus|SGB1963"                   
## [260,] "Veillonella_rogosae|SGB6956"                       
## [261,] "Eubacteriaceae_bacterium|SGB3958"                  
## [262,] "TM7_phylum_sp_oral_taxon_352|SGB19860_group"       
## [263,] "Enterocloster_hominis|SGB4721"                     
## [264,] "Rothia_mucilaginosa|SGB16971_group"                
## [265,] "Blautia_luti|SGB4832"                              
## [266,] "Blautia_sp_OF03_15BH|SGB4779"                      
## [267,] "Streptococcus_australis|SGB8059_group"             
## [268,] "Ruminococcus_gnavus|SGB4571"                       
## [269,] "Streptococcus_thermophilus|SGB8002"                
## [270,] "Roseburia_hominis|SGB4659"                         
## [271,] "Lentihominibacter_faecis|SGB3957"                  
## [272,] "Actinomyces_sp_ICM47|SGB17167_group"               
## [273,] "Evtepia_gabavorous|SGB15120"                       
## [274,] "Blautia_obeum|SGB4810"                             
## [275,] "GGB3480_SGB4648|SGB4648"                           
## [276,] "Ellagibacter_isourolithinifaciens|SGB14816_group"  
## [277,] "GGB2982_SGB3964|SGB3964"                           
## [278,] "Anaerobutyricum_soehngenii|SGB4537"                
## [279,] "Bacteroides_xylanisolvens|SGB1867"                 
## [280,] "Neobittarella_massiliensis|SGB7264"                
## [281,] "Faecalibacillus_intestinalis|SGB6754"              
## [282,] "GGB9708_SGB15233|SGB15233"                         
## [283,] "Enterocloster_asparagiformis|SGB4724"              
## [284,] "GGB51884_SGB49168|SGB49168"                        
## [285,] "Enterocloster_aldenensis|SGB4762"                  
## [286,] "Eggerthella_lenta|SGB14809"                        
## [287,] "Bacteroides_fragilis|SGB1855"                      
## [288,] "Veillonella_atypica|SGB6936"                       
## [289,] "Enterocloster_bolteae|SGB4758"                     
## [290,] "Romboutsia_timonensis|SGB6148"                     
## [291,] "Roseburia_lenta|SGB4957"                           
## [292,] "GGB3288_SGB4342|SGB4342"                           
## [293,] "GGB9574_SGB14987|SGB14987"                         
## [294,] "Dorea_longicatena|SGB4582"                         
## [295,] "Eubacterium_sp_AF34_35BH|SGB5051"                  
## [296,] "GGB9640_SGB15115|SGB15115"                         
## [297,] "Clostridiaceae_unclassified_SGB4771|SGB4771"       
## [298,] "GGB9766_SGB15383|SGB15383"                         
## [299,] "Lachnospiraceae_bacterium_OM04_12BH|SGB4893"       
## [300,] "Clostridium_SGB6173|SGB6173"                       
## [301,] "GGB9616_SGB15051|SGB15051"                         
## [302,] "Intestinibacter_bartlettii|SGB6140"

common_microbiome <- intersect(
  colnames(data_microbiome_original), 
  colnames(data_microbiome_validation))[-c(1:2)]

tr1 <- data_microbiome_original %>% 
  mutate(Data = if_else(Country == 'CZ', 'CZ_tr', 'IT_tr')) %>% 
  select(Data, Diet, all_of(common_microbiome))

tr2 <- data_microbiome_validation %>% 
  mutate(Data = 'valid',
         Diet = Diet) %>% 
  select(Data, Diet, all_of(common_microbiome))

data_merged <- bind_rows(tr1, tr2)

data_microbiome_validation$`Wujia_chipingenss|SGB5111`
## NULL
data_microbiome_validation$`Wujia_chipingensis|SGB5111`
##   [1] -2.2659270 -1.8366784 -1.5719579 -1.5649595 -1.3519655 -1.9273808
##   [7] -0.5872619 -1.9902827 -0.9982257 -1.6966210 -2.1930395 -0.1319288
##  [13] -2.1260143 -1.9245373 -2.0088636  0.5526297  3.9926563 -0.7907929
##  [19] -2.1891711 -1.9395276 -2.1353811 -2.1654279 -2.2360908 -0.3930113
##  [25] -2.3957911 -0.2977329 -1.4533578 -1.4072936 -2.0794670  1.8049062
##  [31]  1.5258843  3.6617309  3.8680214 -2.0412347  0.4626366  0.3866172
##  [37]  5.8044209 -2.2201075 -0.4444399 -2.1377634 -2.1399921 -2.0933531
##  [43]  4.5931692 -1.6871195  2.8164681 -2.5644101 -2.0585439 -2.6629664
##  [49]  0.6723405  4.4534434  4.1610574  5.3832639 -1.6933290 -2.5382040
##  [55] -2.4107280 -2.1198776  5.3413842 -1.8967301 -1.5487724 -2.1396917
##  [61] -1.7628668  4.1335815  5.0872100  3.6031291 -1.8040730  4.9799740
##  [67] -0.9850500 -1.5123666 -0.5770878 -1.7964678 -2.1354643  2.3798401
##  [73]  5.9776544 -2.2009657 -2.4395272  4.0615507 -1.3941153 -2.3882306
##  [79] -2.2799115  4.6054080 -1.5516203 -0.4367249 -1.5615449  4.0790976
##  [85]  1.0917225  3.7139508 -1.8139512 -1.1749955 -1.6831395 -1.5614355
##  [91] -1.9565894 -0.1425388 -1.0454574 -2.1128182  4.9182020 -1.6631032
##  [97] -2.3185501 -1.6734295 -1.6233100 -1.5638780  3.3149668  4.7828809
## [103] -1.5505336

3.2 Explore

3.2.0.1 Distributions - clr transformed

Open code
check <- data_microbiome_original %>% 
  dplyr::select(-c(Sample:Diet)
    ) %>% 
  na.omit()


size = c(8, 7)
par(mfrow = c(size[1],size[2]))
par(mar=c(2,1.5,2,0.5))
set.seed(16)
ran <- sample(1:ncol(check), size[1]*size[2], replace = FALSE)
for(x in ran){
  hist(check[,x], 
       16, 
       col='blue', 
       main = paste0(colnames(check)[x])
  )
}

Histogram of CLR-transformed proportions for a random selection of 56 bacterial taxa (proportion: number of sequences for each bacterium relative to the total library depth)

3.2.0.2 Taxon proportions accross groups

Open code
colo <- c('#329243', '#F9FFAF')

data_merged <- na.omit(data_merged)

outcomes <- common_microbiome[
    sample(
      1:length(common_microbiome), 35, replace = FALSE
      )
  ]

boxplot_cond <- function(variable) {
  
  
  p <- ggboxplot(data_merged, 
                 x = 'Diet', 
                 y = variable, 
                 fill = 'Diet', 
                 tip.length = 0.15,
                 palette = colo,
                 outlier.shape = 1,
                 lwd = 0.25,
                 outlier.size = 0.8,
                 facet.by = 'Data',
                 title = variable,
                 ylab = 'CLR(taxa proportion)') +
    
    theme(
      plot.title = element_text(size = 10), 
      axis.title = element_text(size = 8),  
      axis.text.y = element_text(size = 7),
      axis.text.x = element_blank(),
      axis.title.x = element_blank()
    ) 
    
  return(p)
}

# Plot all outcomes
plots <- map(outcomes, boxplot_cond)

# Create a matrix of plots
plots_arranged <- ggarrange(plotlist = plots, ncol = 5, nrow = 7,  common.legend = TRUE)
plots_arranged

CLR-trasformed counts in randomly chosen 35 bacterial taxa, across all 3 cohorts (Czech and Italian training cohorts and an independent Czech valdiation cohort) and both dietary groups

4 Linear models across taxa

We will fit a feature-specific linear model where the CLR-transformed count of bacteria reads represents the outcome variable whereas country (Italy vs Czech), diet (vegan vs omnivore), and their interaction (country:diet) all represent fixed-effects predictors. So, each model has the following form

\[ CLR({N_i}) = \alpha + \beta_{1} \times country + \beta_{2} \times diet + \beta_{3} \times country:diet + \epsilon \] where \(N_i\) is read count of \(i\)-th bacteria taxa

The variables were coded as follows: \(diet = -0.5\) for omnivores and \(diet = 0.5\) for vegans; \(country = -0.5\) for the Czech cohort and \(country = 0.5\) for the Italian cohort.
This parameterization allows us to interpret the linear model summary output as presenting the conditional effects of diet averaged across both countries and the conditional effects of country averaged across both diet groups. We will then use the emmeans package (Lenth 2024) to obtain specific estimates for the effect of diet in the Italian and Czech cohorts separately, still from a single model.

Taxa that will show a significant diet effect (average effect of diet across both countries, adjusted for multiple comparisons with FDR < 0.05) will be then visualized using a forest plot, with country-specific diet effect along with diet effect based on independent validation cohort, to evaluate how generalizable these findings are (see external validation section).

Note that p-value for avg effects are the same as produced with car::Anova(model, type = 'III').

4.1 Select data

Open code
data_analysis <- data_microbiome_original %>%
  na.omit() %>%
  dplyr::mutate(
    Diet_VEGAN = as.numeric(
      dplyr::if_else(
        Diet == "VEGAN", 0.5, -0.5
      )
    ),
    Country_IT = as.numeric(
      dplyr::if_else(
        Country == "IT", 0.5, -0.5
      )
    )
  ) %>%
  dplyr::select(
    Sample,
    Country,
    Country_IT,
    Diet,
    Diet_VEGAN,
    dplyr::everything()
  )

summary(data_analysis[ , 1:12])
##     Sample            Country            Country_IT           Diet          
##  Length:166         Length:166         Min.   :-0.50000   Length:166        
##  Class :character   Class :character   1st Qu.:-0.50000   Class :character  
##  Mode  :character   Mode  :character   Median :-0.50000   Mode  :character  
##                                        Mean   :-0.03012                     
##                                        3rd Qu.: 0.50000                     
##                                        Max.   : 0.50000                     
##    Diet_VEGAN       Bacteroides_stercoris|SGB1830 Alistipes_putredinis|SGB2318
##  Min.   :-0.50000   Min.   :-4.8322               Min.   :-5.438              
##  1st Qu.:-0.50000   1st Qu.:-3.6490               1st Qu.: 3.908              
##  Median : 0.50000   Median :-1.3277               Median : 6.130              
##  Mean   : 0.08434   Mean   : 0.5654               Mean   : 4.703              
##  3rd Qu.: 0.50000   3rd Qu.: 4.7830               3rd Qu.: 7.159              
##  Max.   : 0.50000   Max.   : 9.4417               Max.   : 9.897              
##  Candidatus_Cibionibacter_quicibialis|SGB15286 Bacteroides_uniformis|SGB1836
##  Min.   :-2.250                                Min.   :-3.273               
##  1st Qu.: 4.278                                1st Qu.: 4.743               
##  Median : 5.405                                Median : 6.158               
##  Mean   : 5.113                                Mean   : 5.964               
##  3rd Qu.: 6.447                                3rd Qu.: 7.395               
##  Max.   : 9.195                                Max.   :10.043               
##  Eubacterium_siraeum|SGB4198 GGB9602_SGB15031|SGB15031
##  Min.   :-5.694              Min.   :-4.2797          
##  1st Qu.:-4.539              1st Qu.:-2.6107          
##  Median :-2.937              Median : 0.3740          
##  Mean   :-1.269              Mean   : 0.5761          
##  3rd Qu.: 2.030              3rd Qu.: 3.4878          
##  Max.   : 6.338              Max.   : 8.0727          
##  Phocaeicola_vulgatus|SGB1814
##  Min.   :-3.802              
##  1st Qu.: 4.296              
##  Median : 5.775              
##  Mean   : 5.483              
##  3rd Qu.: 7.100              
##  Max.   :10.519

4.1.1 Define number of microbiome and covariates

Open code
n_covarites <- 5
n_features <- ncol(data_analysis) - n_covarites

4.1.2 Create empty objects

Open code
outcome <- vector('double', n_features)
est_VGdiet_inCZ <- vector('double', n_features)
est_VGdiet_inIT <- vector('double', n_features)
est_VGdiet_avg <- vector('double', n_features)

est_ITcountry_avg <- vector('double', n_features)
diet_country_int <- vector('double', n_features)


P_VGdiet_inCZ <- vector('double', n_features)
P_VGdiet_inIT <- vector('double', n_features)
P_VGdiet_avg <- vector('double', n_features)

P_ITcountry_avg <- vector('double', n_features)
P_diet_country_int <- vector('double', n_features)


CI_L_VGdiet_inCZ <- vector('double', n_features)
CI_L_VGdiet_inIT <- vector('double', n_features)
CI_L_VGdiet_avg <- vector('double', n_features)

CI_U_VGdiet_inCZ <- vector('double', n_features)
CI_U_VGdiet_inIT <- vector('double', n_features)
CI_U_VGdiet_avg <- vector('double', n_features)

4.1.3 Estimate over outcomes

Open code
for (i in 1:n_features) {
  
  ## define variable
  data_analysis$outcome <- data_analysis[, (i + n_covarites)]

  ## fit model
  model <- lm(outcome ~ Country_IT * Diet_VEGAN, data = data_analysis)

  ## get contrast (effects of diet BY COUNTRY)
  contrast_emm <- summary(
    pairs(
      emmeans(
        model,
        specs = ~ Diet_VEGAN | Country_IT
        ),
      interaction = TRUE,
      adjust = "none"
      ),
    infer = c(TRUE, TRUE)
    )

  ## save results
  outcome[i] <- names(data_analysis)[i + n_covarites]
  
  ## country effect
  est_ITcountry_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT"
    ), 1
  ]

  P_ITcountry_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT"
    ), 4
  ]
  
  
  ## diet effect
  tr <- confint(model)
  
  CI_L_VGdiet_avg[i] <- tr[which(row.names(tr) == 'Diet_VEGAN'),][1]
  CI_U_VGdiet_avg[i] <- tr[which(row.names(tr) == 'Diet_VEGAN'),][2]
  
  est_VGdiet_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_VEGAN"
    ), 1
  ]

  P_VGdiet_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_VEGAN"
    ), 4
  ]
  
  est_VGdiet_inCZ[i] <- -contrast_emm[1,3]
  P_VGdiet_inCZ[i] <- contrast_emm$p.value[1]
  CI_L_VGdiet_inCZ[i] <- -contrast_emm$upper.CL[1]
  CI_U_VGdiet_inCZ[i] <- -contrast_emm$lower.CL[1]
  
  est_VGdiet_inIT[i] <- -contrast_emm[2,3]
  P_VGdiet_inIT[i] <- contrast_emm$p.value[2]
  CI_L_VGdiet_inIT[i] <- -contrast_emm$upper.CL[2]
  CI_U_VGdiet_inIT[i] <- -contrast_emm$lower.CL[2]
  
  ## interaction
  diet_country_int[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT:Diet_VEGAN"
    ), 1
  ]

  P_diet_country_int[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT:Diet_VEGAN"
    ), 4
  ]
}

4.1.4 Results table

Open code
result_microbiome <- data.frame(
  outcome,
  est_ITcountry_avg, P_ITcountry_avg,
  est_VGdiet_avg, P_VGdiet_avg,
  est_VGdiet_inCZ, P_VGdiet_inCZ,
  est_VGdiet_inIT, P_VGdiet_inIT,
  diet_country_int, P_diet_country_int,
  CI_L_VGdiet_avg, CI_U_VGdiet_avg,
  CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
  CI_L_VGdiet_inIT, CI_U_VGdiet_inIT
)

4.1.5 Adjust p values

Open code
result_microbiome <- result_microbiome %>% 
  dplyr::mutate(
    fdr_ITcountry_avg = p.adjust(P_ITcountry_avg, method = 'BH'),
    fdr_VGdiet_avg = p.adjust(P_VGdiet_avg, method = 'BH'),
    
    fdr_VGdiet_inCZ = p.adjust(P_VGdiet_inCZ, method = 'BH'),
    fdr_VGdiet_inIT = p.adjust(P_VGdiet_inIT, method = 'BH'),
    fdr_diet_country_int = p.adjust(P_diet_country_int, method = 'BH')
  ) %>% 
  dplyr::select(
    outcome,
    est_ITcountry_avg, P_ITcountry_avg, fdr_ITcountry_avg,
    est_VGdiet_avg, P_VGdiet_avg, fdr_VGdiet_avg,
    est_VGdiet_inCZ, P_VGdiet_inCZ, fdr_VGdiet_inCZ,
    est_VGdiet_inIT, P_VGdiet_inIT, fdr_VGdiet_inIT,
    diet_country_int, P_diet_country_int, fdr_diet_country_int,
    CI_L_VGdiet_avg, CI_U_VGdiet_avg,
    CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
    CI_L_VGdiet_inIT, CI_U_VGdiet_inIT
  )

4.1.6 Show and save results

Open code
kableExtra::kable(result_microbiome %>% filter(fdr_VGdiet_avg < 0.05),
  caption = "Results of linear models, modeling the CLR-transformed relative abundance of a given taxon as the outcome, with `Diet`, `Country`, and `Diet x Country` interaction as fixed-effect predictors. Only taxa whose CLR-transformed relative abundance differs between diets (FDR < 0.05, average diet effect across both countries) are shown. `est` prefix: denotes estimated effects (regression coefficients), i.e., how much CLR-transformed relative abundance differs in vegans compared to omnivores and in the Italian cohort vs. the Czech cohort, respectively; `P`: p-value; `fdr`: p-value after adjustment for multiple comparisons; `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval. `avg` suffix denotes effects averaged across subgroups, whereas `inCZ` and `inIT` denote effects in the Czech or Italian cohort, respectively. Interaction effects are reported as `diet_country_int` (difference in the effects of a vegan diet between the Italian and Czech cohorts; positive values indicate a stronger effect in the Italian, negative values a stronger effect in the Czech cohort) and `P_diet_country_int` (its p-value). All estimates in a single row are based on a single model"
)
Results of linear models, modeling the CLR-transformed relative abundance of a given taxon as the outcome, with Diet, Country, and Diet x Country interaction as fixed-effect predictors. Only taxa whose CLR-transformed relative abundance differs between diets (FDR < 0.05, average diet effect across both countries) are shown. est prefix: denotes estimated effects (regression coefficients), i.e., how much CLR-transformed relative abundance differs in vegans compared to omnivores and in the Italian cohort vs. the Czech cohort, respectively; P: p-value; fdr: p-value after adjustment for multiple comparisons; CI_L and CI_U: lower and upper bounds of the 95% confidence interval. avg suffix denotes effects averaged across subgroups, whereas inCZ and inIT denote effects in the Czech or Italian cohort, respectively. Interaction effects are reported as diet_country_int (difference in the effects of a vegan diet between the Italian and Czech cohorts; positive values indicate a stronger effect in the Italian, negative values a stronger effect in the Czech cohort) and P_diet_country_int (its p-value). All estimates in a single row are based on a single model
outcome est_ITcountry_avg P_ITcountry_avg fdr_ITcountry_avg est_VGdiet_avg P_VGdiet_avg fdr_VGdiet_avg est_VGdiet_inCZ P_VGdiet_inCZ fdr_VGdiet_inCZ est_VGdiet_inIT P_VGdiet_inIT fdr_VGdiet_inIT diet_country_int P_diet_country_int fdr_diet_country_int CI_L_VGdiet_avg CI_U_VGdiet_avg CI_L_VGdiet_inCZ CI_U_VGdiet_inCZ CI_L_VGdiet_inIT CI_U_VGdiet_inIT
Escherichia_coli|SGB10068 0.2619264 0.5866636 0.6604881 -1.6056687 0.0010415 0.0124560 -1.7212276 0.0123629 0.1114726 -1.4901099 0.0297547 0.2433353 0.2311177 0.8103673 0.9929048 -2.5551229 -0.6562146 -3.0647170 -0.3777381 -2.8320819 -0.1481379
Bacteroides_clarus|SGB1832 1.9131572 0.0002848 0.0008431 -1.9852453 0.0001701 0.0031781 -1.6674895 0.0236093 0.1534603 -2.3030012 0.0018864 0.0961993 -0.6355117 0.5386671 0.9929048 -3.0036467 -0.9668440 -3.1085401 -0.2264388 -3.7424241 -0.8635783
Hydrogenoanaerobacterium_saccharovorans|SGB15350 1.5011518 0.0012291 0.0032813 -1.3582916 0.0033582 0.0313785 -1.3329832 0.0405517 0.2090509 -1.3835999 0.0334061 0.2561137 -0.0506167 0.9558340 0.9929048 -2.2592946 -0.4572885 -2.6079137 -0.0580527 -2.6570904 -0.1101095
GGB6649_SGB9391|SGB9391 0.2325600 0.5179518 0.6002620 1.0140390 0.0053203 0.0418620 1.7803147 0.0005900 0.0117612 0.2477634 0.6259425 0.9452363 -1.5325513 0.0342745 0.5717111 0.3052515 1.7228266 0.7773715 2.7832579 -0.7540470 1.2495737
Ruthenibacterium_lactatiformans|SGB15271 0.9036294 0.0030971 0.0073356 -0.8414746 0.0057914 0.0444010 -1.5432722 0.0003871 0.0100784 -0.1396771 0.7430182 0.9689172 1.4035951 0.0209147 0.5717111 -1.4356716 -0.2472776 -2.3840684 -0.7024759 -0.9795236 0.7001695
Lawsonibacter_asaccharolyticus|SGB15154 2.8249857 0.0000000 0.0000000 -1.5247873 0.0000971 0.0023730 -1.9303765 0.0004595 0.0105548 -1.1191981 0.0394843 0.2775605 0.8111784 0.2892321 0.9724308 -2.2780362 -0.7715384 -2.9962332 -0.8645198 -2.1838509 -0.0545454
Clostridium_fessum|SGB4705 -1.6158900 0.0000443 0.0001577 -1.1006841 0.0048020 0.0398833 -1.1368423 0.0384349 0.2063733 -1.0645259 0.0521082 0.3314966 0.0723164 0.9252760 0.9929048 -1.8607983 -0.3405699 -2.2124133 -0.0612713 -2.1388820 0.0098303
GGB9707_SGB15229|SGB15229 0.6452823 0.0445093 0.0756152 1.0866666 0.0008197 0.0106555 0.9548967 0.0357238 0.2054116 1.2184364 0.0075553 0.1328848 0.2635397 0.6797760 0.9929048 0.4574080 1.7159252 0.0644880 1.8453055 0.3290334 2.1078395
Lachnospiraceae_bacterium_AM48_27BH|SGB4706 -1.1131285 0.0045403 0.0102071 1.2676366 0.0012809 0.0147307 1.6523047 0.0029449 0.0463439 0.8829685 0.1082011 0.4901839 -0.7693361 0.3214104 0.9724308 0.5039082 2.0313650 0.5716194 2.7329899 -0.1964961 1.9624331
GGB9509_SGB14906|SGB14906 1.0817985 0.0089485 0.0182013 -1.6543521 0.0000804 0.0023730 -1.5741131 0.0072217 0.0835575 -1.7345911 0.0031102 0.0961993 -0.1604780 0.8446550 0.9929048 -2.4617079 -0.8469964 -2.7165316 -0.4316946 -2.8757193 -0.5934630
GGB45432_SGB63101|SGB63101 0.7232262 0.0088808 0.0181874 -0.7958911 0.0040635 0.0368175 -0.8984331 0.0212966 0.1415041 -0.6933491 0.0742736 0.3998957 0.2050839 0.7077520 0.9929048 -1.3350948 -0.2566874 -1.6614131 -0.1354531 -1.4554673 0.0687691
GGB2653_SGB3574|SGB3574 0.0113250 0.9753715 0.9753715 -1.4691557 0.0000921 0.0023730 -1.8428969 0.0004942 0.0105548 -1.0954145 0.0358765 0.2681768 0.7474824 0.3090608 0.9724308 -2.1924343 -0.7458771 -2.8663451 -0.8194487 -2.1177067 -0.0731223
Phocea_massiliensis|SGB14837 0.4892877 0.1130213 0.1698159 -1.1101840 0.0004001 0.0058102 -1.0621671 0.0155770 0.1164377 -1.1582010 0.0083931 0.1348712 -0.0960339 0.8759362 0.9929048 -1.7165656 -0.5038025 -1.9202046 -0.2041296 -2.0152693 -0.3011327
Lachnospiraceae_bacterium|SGB4953 -1.0110542 0.0196470 0.0376567 1.8996297 0.0000175 0.0007650 2.0042382 0.0011843 0.0208299 1.7950211 0.0035391 0.0961993 -0.2092171 0.8076874 0.9929048 1.0523587 2.7469007 0.8053392 3.2031372 0.5974763 2.9925660
Holdemania_filiformis|SGB4046 -0.8713113 0.0335927 0.0590836 -1.2995237 0.0016731 0.0185275 -2.2053986 0.0001804 0.0079526 -0.3936488 0.4942922 0.8905572 1.8117499 0.0272454 0.5717111 -2.1023519 -0.4966955 -3.3414106 -1.0693867 -1.5283776 0.7410800
Anaeromassilibacillus_senegalensis|SGB14894 1.0587520 0.0067348 0.0142817 -1.2016868 0.0021720 0.0223943 -1.3537201 0.0141471 0.1114726 -1.0496536 0.0559301 0.3474891 0.3040665 0.6939733 0.9929048 -1.9633423 -0.4400314 -2.4314720 -0.2759681 -2.1261882 0.0268810
Ruminococcus_sp_AF41_9|SGB25497 -0.8299026 0.0411850 0.0707719 1.7828783 0.0000179 0.0007650 2.7832380 0.0000025 0.0001522 0.7825187 0.1716609 0.6497039 -2.0007193 0.0141307 0.5281335 0.9865895 2.5791671 1.6564794 3.9099965 -0.3429672 1.9080046
Eubacterium_ramulus|SGB4959 -2.2432627 0.0000000 0.0000003 -1.3014646 0.0010277 0.0124560 -1.5154793 0.0066098 0.0835575 -1.0874499 0.0497902 0.3258371 0.4280295 0.5832078 0.9929048 -2.0701289 -0.5328003 -2.6031489 -0.4278097 -2.1738909 -0.0010088
Youxingia_wuxianensis|SGB82503 0.2836651 0.4199975 0.5187690 -1.0654123 0.0027893 0.0278003 -1.7207509 0.0006765 0.0126413 -0.4100738 0.4095044 0.8055383 1.3106771 0.0635956 0.7486670 -1.7582618 -0.3725628 -2.7011416 -0.7403602 -1.3893571 0.5692096
Blautia_massiliensis|SGB4826 -3.9054717 0.0000000 0.0000000 -1.7487130 0.0019052 0.0203449 -1.6636803 0.0353443 0.2054116 -1.8337457 0.0204108 0.2087421 -0.1700654 0.8782097 0.9929048 -2.8427577 -0.6546682 -3.2117672 -0.1155933 -3.3800841 -0.2874073
GGB9765_SGB15382|SGB15382 0.9724670 0.0052880 0.0115410 -1.0403415 0.0028965 0.0279370 -0.6868283 0.1601205 0.4949617 -1.3938547 0.0046947 0.1169756 -0.7070263 0.3055987 0.9724308 -1.7195795 -0.3611035 -1.6479586 0.2743020 -2.3538993 -0.4338100
GGB9770_SGB15390|SGB15390 1.1393138 0.0031158 0.0073356 -1.1023969 0.0042003 0.0369375 -1.1048957 0.0413047 0.2093238 -1.0998980 0.0419905 0.2853448 0.0049977 0.9947562 0.9950745 -1.8520565 -0.3527372 -2.1656735 -0.0441180 -2.1594776 -0.0403184
Ruminococcus_torques|SGB4608 -2.5450549 0.0000002 0.0000009 -2.9967166 0.0000000 0.0000002 -4.0037248 0.0000000 0.0000008 -1.9897083 0.0028797 0.0961993 2.0140166 0.0318587 0.5717111 -3.9152587 -2.0781744 -5.3034734 -2.7039763 -3.2879887 -0.6914278
Agathobaculum_butyriciproducens|SGB14991 -0.2799754 0.5736608 0.6497143 1.8298077 0.0003116 0.0049041 1.7918165 0.0116982 0.1114726 1.8677989 0.0085704 0.1348712 0.0759824 0.9391106 0.9929048 0.8492138 2.8104015 0.4042640 3.1793691 0.4818136 3.2537841
Clostridium_SGB48024|SGB48024 0.3495793 0.4685239 0.5603545 1.8824245 0.0001341 0.0027026 2.0858928 0.0025593 0.0425134 1.6789563 0.0145882 0.1671720 -0.4069364 0.6729259 0.9929048 0.9323470 2.8325021 0.7415212 3.4302643 0.3361033 3.0218094
GGB9531_SGB14932|SGB14932 0.6741149 0.0481296 0.0812355 -1.2220355 0.0004081 0.0058102 -1.0267774 0.0335676 0.2007343 -1.4172937 0.0035162 0.0961993 -0.3905163 0.5648881 0.9929048 -1.8905380 -0.5535330 -1.9727167 -0.0808380 -2.3621646 -0.4724228
Oscillibacter_valericigenes|SGB15076 2.2693025 0.0000001 0.0000007 -1.5455775 0.0002310 0.0038371 -0.3939028 0.4984511 0.7655611 -2.6972522 0.0000068 0.0020399 -2.3033494 0.0056168 0.3358826 -2.3558148 -0.7353402 -1.5403986 0.7525931 -3.8424531 -1.5520513
GGB9616_SGB15052|SGB15052 0.1922882 0.6243555 0.6939863 1.5604516 0.0001032 0.0023730 1.3778329 0.0139910 0.1114726 1.7430703 0.0019661 0.0961993 0.3652374 0.6418749 0.9929048 0.7865111 2.3343920 0.2826975 2.4729683 0.6491719 2.8369687
GGB58158_SGB79798|SGB79798 -0.0395133 0.9036721 0.9285153 -0.9249820 0.0051278 0.0414382 -1.2410264 0.0078830 0.0858020 -0.6089377 0.1881529 0.6772633 0.6320887 0.3337315 0.9724308 -1.5687010 -0.2812631 -2.1518967 -0.3301560 -1.5187792 0.3009038
Enterocloster_hominis|SGB4721 -0.4611836 0.2485952 0.3190128 -1.6735473 0.0000436 0.0014488 -2.1154645 0.0002425 0.0080565 -1.2316301 0.0301117 0.2433353 0.8838345 0.2688329 0.9724308 -2.4600398 -0.8870548 -3.2283612 -1.0025678 -2.3432697 -0.1199904
Streptococcus_thermophilus|SGB8002 -1.9892533 0.0000000 0.0000001 -2.6523655 0.0000000 0.0000000 -4.2589263 0.0000000 0.0000000 -1.0458046 0.0256343 0.2433353 3.2131217 0.0000024 0.0007185 -3.3010343 -2.0036966 -5.1768008 -3.3410518 -1.9626424 -0.1289669
Blautia_obeum|SGB4810 -1.2227533 0.0022206 0.0055796 1.8263749 0.0000071 0.0005271 2.8583481 0.0000008 0.0000597 0.7944017 0.1549579 0.6177656 -2.0639464 0.0095315 0.4749853 1.0496396 2.6031103 1.7592579 3.9574383 -0.3034471 1.8922505
GGB51884_SGB49168|SGB49168 0.1489066 0.6616867 0.7247045 1.2130993 0.0004677 0.0063559 1.3067445 0.0072659 0.0835575 1.1194540 0.0209440 0.2087421 -0.1872905 0.7831313 0.9929048 0.5423556 1.8838430 0.3576339 2.2558552 0.1714154 2.0674927
Roseburia_lenta|SGB4957 -1.4433630 0.0000086 0.0000362 1.7438636 0.0000001 0.0000112 2.9096930 0.0000000 0.0000001 0.5780342 0.1946353 0.6772633 -2.3316588 0.0002817 0.0421119 1.1237786 2.3639486 2.0322650 3.7871210 -0.2984027 1.4544712
GGB3288_SGB4342|SGB4342 -0.3043170 0.4247255 0.5187690 1.4470612 0.0002004 0.0035245 1.9437150 0.0004045 0.0100784 0.9504073 0.0788960 0.4138580 -0.9933077 0.1933818 0.9594504 0.6961415 2.1979808 0.8811543 3.0062757 -0.1109532 2.0117678
GGB9574_SGB14987|SGB14987 0.6979519 0.0760440 0.1235716 1.1288124 0.0044098 0.0376722 0.8497545 0.1264137 0.4308016 1.4078702 0.0117570 0.1671720 0.5581157 0.4763145 0.9929048 0.3569149 1.9007099 -0.2424901 1.9419991 0.3168594 2.4988811
Clostridiaceae_unclassified_SGB4771|SGB4771 -0.0888094 0.8522123 0.8878449 2.1375079 0.0000134 0.0007650 1.8328628 0.0072095 0.0835575 2.4421530 0.0003793 0.0567018 0.6092902 0.5230257 0.9929048 1.1976428 3.0773730 0.5029420 3.1627836 1.1137344 3.7705716
Lachnospiraceae_bacterium_OM04_12BH|SGB4893 -0.6037329 0.1247910 0.1811286 1.6753589 0.0000317 0.0011836 2.1178834 0.0001862 0.0079526 1.2328344 0.0271771 0.2433353 -0.8850489 0.2597427 0.9724308 0.9026890 2.4480288 1.0245459 3.2112209 0.1407319 2.3249370
GGB9616_SGB15051|SGB15051 0.0491406 0.9109482 0.9296024 1.7151507 0.0001356 0.0027026 1.5312508 0.0146741 0.1125016 1.8990506 0.0025687 0.0961993 0.3677999 0.6756214 0.9929048 0.8488687 2.5814327 0.3054509 2.7570507 0.6746353 3.1234660
Open code

if(file.exists('gitignore/lm_results/result_microbiome_SGB30.csv') == FALSE){
  write.table(result_microbiome, 
              'gitignore/lm_results/result_microbiome_SGB30.csv', row.names = FALSE)
}

dim(result_microbiome %>% filter(fdr_VGdiet_avg < 0.05))
## [1] 39 22

5 Elastic net

To assess the predictive power of microbiome features to discriminate between diet strategy, we employed Elastic Net logistic regression.

As we expected very high level of co-linearity, we allowed \(alpha\) to rather small (0, 0.2 or 0.4).

The performance of the predictive models was evaluated through their capacity of discriminate between vegan and omnivore diets, using out-of-sample area under ROC curve (AUC; estimated with out-of-bag bootstrap) as the measure of discriminatory capacity.

All features were transformed by 2 standard deviations (resulting in standard deviation of 0.5).

5.1 Prepare data for glmnet

Open code
data_microbiome_glmnet <- data_microbiome_original %>%
  na.omit() %>%
  dplyr::mutate(
    vegan = as.numeric(
      dplyr::if_else(
        Diet == "VEGAN", 1, 0
      )
    ),
    dplyr::across(
      `Bacteroides_stercoris|SGB1830`:`Clostridiaceae_unclassified_SGB4771|SGB4771`, 
      ~ arm::rescale(.)
    )
  ) %>%
  dplyr::select(
    vegan,Country,
    dplyr::everything()
  ) %>%
  dplyr::select(
    Sample, vegan, Country,
    `Bacteroides_stercoris|SGB1830`:`Clostridiaceae_unclassified_SGB4771|SGB4771`
  )

5.2 Fit model

Open code
modelac <- "elanet_microbiome_SGB30"

assign(
  modelac,
  run(
    expr = clust_glmnet_sep(
      data = data_microbiome_glmnet,
      outcome = "vegan",
      clust_id = "Sample",
      sample_method = "oos_boot",
      N = 500,
      alphas = c(0, 0.2, 0.4),
      family = "binomial",
      seed = 478
    ),
    path = paste0("gitignore/run/", modelac)
  )
)

5.3 Model summary

Open code
elanet_microbiome_SGB30$model_summary
##   alpha     lambda       auc auc_OutOfSample auc_oos_CIL auc_oos_CIU  accuracy
## 1   0.4 0.02773441 0.9952189        0.887315   0.8047796   0.9523366 0.9759036
##   accuracy_OutOfSample accuracy_oos_CIL accuracy_oos_CIU
## 1            0.7973382        0.6949153        0.8833333
elanet_microbiome_SGB30$country_AUC
##   auc_OutOfSample_IT auc_oos_CIL_IT auc_oos_CIU_IT auc_OutOfSample_CZ
## 1          0.8056122      0.6605469      0.9236888          0.9428794
##   auc_oos_CIL_CZ auc_oos_CIU_CZ
## 1       0.850783              1

5.4 ROC curve - internal validation

Open code
elanet_microbiome_SGB30$plot

Receiver operating characteristic (ROC) curves from a random subset of out-of-bag bootstrap iterations, illustrating the model’s ability to discriminate between vegan and omnivore status based on standardized CLR-transformed count of bacteria reads. Each curve corresponds to an elastic net model trained on a bootstrap resample and evaluated on the subjects not included in that iteration (i.e., out-of-bag samples). The curve plots the true positive rate (sensitivity) against the false positive rate (1 – specificity) across thresholds of predicted vegan probability. The area under the curve (AUC) quantifies overall discriminatory performance, with values closer to 1 indicating stronger separation between groups.

5.5 Estimated coefficients

Open code
data.frame(
  microbiome = row.names(
    elanet_microbiome_SGB30$betas
    )[
      which(
        abs(
          elanet_microbiome_SGB30$betas
          )>0
        )
      ],
  beta = elanet_microbiome_SGB30$betas[
    abs(
      elanet_microbiome_SGB30$betas
      )>0
    ]
  ) %>% 
  mutate(
    is_in_ExtValCoh = if_else(
      microbiome %in% names(data_microbiome_validation),
      1, 0
      )
    )
##                                          microbiome         beta
## 1                                       (Intercept)  0.564498328
## 2                         GGB9602_SGB15031|SGB15031 -0.050575163
## 3                      Phocaeicola_vulgatus|SGB1814  0.028735639
## 4                Parabacteroides_distasonis|SGB1934  0.090984199
## 5             Faecalibacterium_prausnitzii|SGB15339  0.128597925
## 6                   Dialister_invisus|SGB5825_group  0.051028446
## 7                        GGB33469_SGB15236|SGB15236 -0.021033530
## 8                     Bacteroides_eggerthii|SGB1829 -0.239198334
## 9                           GGB3175_SGB4191|SGB4191 -0.114908598
## 10                Lachnospira_pectinoschiza|SGB5075 -0.322160033
## 11                       Bacteroides_clarus|SGB1832 -0.385086576
## 12                   Anaerotignum_faecicola|SGB5190 -0.234888794
## 13 Hydrogenoanaerobacterium_saccharovorans|SGB15350 -0.306700839
## 14                       Bacteroides_faecis|SGB1860  0.034365203
## 15                        Roseburia_hominis|SGB4936  0.030152264
## 16                        GGB9712_SGB15244|SGB15244 -0.005110184
## 17                       Bacteroides_ovatus|SGB1871  0.259354872
## 18                        GGB9614_SGB15049|SGB15049  0.077383387
## 19             Barnesiella_intestinihominis|SGB1965 -0.099142678
## 20         Agathobaculum_butyriciproducens|SGB14993  0.121857938
## 21                          GGB6649_SGB9391|SGB9391  0.278958714
## 22                      Ruminococcus_bromii|SGB4285  0.079729289
## 23          Lawsonibacter_asaccharolyticus|SGB15154 -0.104282749
## 24                       Clostridium_fessum|SGB4705 -0.281417507
## 25                    Clostridium_sp_AF36_4|SGB4644  0.147235148
## 26                        GGB9707_SGB15229|SGB15229  0.161051651
## 27                        Phocaeicola_dorei|SGB1815 -0.052381703
## 28                        GGB9502_SGB14899|SGB14899 -0.107431320
## 29      Lachnospiraceae_bacterium_AM48_27BH|SGB4706  0.142752425
## 30                        GGB9619_SGB15067|SGB15067 -0.057328862
## 31                        GGB9509_SGB14906|SGB14906 -0.339977572
## 32                          GGB2653_SGB3574|SGB3574 -0.209202356
## 33                Lachnospiraceae_bacterium|SGB4781 -0.119275863
## 34                     Phocea_massiliensis|SGB14837 -0.137873995
## 35                        Guopingia_tenuis|SGB14127 -0.142604384
## 36                Lachnospiraceae_bacterium|SGB4953  0.182716684
## 37                          Blautia_SGB4815|SGB4815 -0.203040322
## 38                    Holdemania_filiformis|SGB4046 -0.162509820
## 39      Anaeromassilibacillus_senegalensis|SGB14894 -0.092177287
## 40                Lachnospira_pectinoschiza|SGB5089  0.062808523
## 41                  Ruminococcus_sp_AF41_9|SGB25497  0.416188098
## 42                      Eubacterium_ramulus|SGB4959 -0.266997099
## 43                 Enterocloster_lavalensis|SGB4725  0.134033402
## 44                   Youxingia_wuxianensis|SGB82503 -0.126338803
## 45               Intestinimonas_gabonensis|SGB79840  0.011811531
## 46                        GGB9765_SGB15382|SGB15382 -0.245596546
## 47                     Coprobacter_secundus|SGB1962 -0.017989664
## 48                          GGB3510_SGB4687|SGB4687  0.148626169
## 49                Mediterraneibacter_faecis|SGB4563 -0.303790368
## 50                     Ruminococcus_torques|SGB4608 -0.545900784
## 51                   Eubacterium_ventriosum|SGB5045  0.167531498
## 52                        GGB9758_SGB15368|SGB15368 -0.036613335
## 53                          GGB3304_SGB4367|SGB4367  0.054870058
## 54         Agathobaculum_butyriciproducens|SGB14991  0.195509836
## 55                        GGB9296_SGB14253|SGB14253  0.047061326
## 56            Slackia_isoflavoniconvertens|SGB14773  0.017678966
## 57              Anaerotruncus_massiliensis|SGB14965 -0.079783612
## 58                        GGB9531_SGB14932|SGB14932 -0.386681608
## 59             Oscillibacter_valericigenes|SGB15076 -0.350829352
## 60                      Anaerostipes_hadrus|SGB4547  0.024870355
## 61                        GGB9616_SGB15052|SGB15052  0.043708600
## 62                        GGB9524_SGB14924|SGB14924  0.022894888
## 63                          Segatella_copri|SGB1626  0.090057068
## 64        Mediterraneibacter_butyricigenes|SGB25493  0.121777045
## 65                    Enterocloster_hominis|SGB4721 -0.258505538
## 66                     Blautia_sp_OF03_15BH|SGB4779 -0.000955559
## 67               Streptococcus_thermophilus|SGB8002 -0.810455275
## 68                            Blautia_obeum|SGB4810  0.348649793
## 69                          GGB3480_SGB4648|SGB4648 -0.025623885
## 70               Anaerobutyricum_soehngenii|SGB4537  0.031514564
## 71                Bacteroides_xylanisolvens|SGB1867  0.043692859
## 72                        GGB9708_SGB15233|SGB15233  0.105543680
## 73                       GGB51884_SGB49168|SGB49168  0.186109970
## 74                          Roseburia_lenta|SGB4957  0.477317205
## 75                          GGB3288_SGB4342|SGB4342  0.316230925
## 76                        GGB9574_SGB14987|SGB14987  0.403453173
## 77                 Eubacterium_sp_AF34_35BH|SGB5051 -0.015643625
## 78      Clostridiaceae_unclassified_SGB4771|SGB4771  0.354566828
##    is_in_ExtValCoh
## 1                0
## 2                1
## 3                1
## 4                1
## 5                1
## 6                1
## 7                1
## 8                1
## 9                1
## 10               1
## 11               1
## 12               1
## 13               1
## 14               1
## 15               1
## 16               1
## 17               1
## 18               1
## 19               1
## 20               1
## 21               1
## 22               1
## 23               1
## 24               1
## 25               1
## 26               1
## 27               1
## 28               1
## 29               1
## 30               1
## 31               1
## 32               1
## 33               1
## 34               1
## 35               1
## 36               1
## 37               1
## 38               1
## 39               1
## 40               1
## 41               1
## 42               1
## 43               1
## 44               1
## 45               1
## 46               1
## 47               1
## 48               1
## 49               1
## 50               1
## 51               1
## 52               1
## 53               1
## 54               1
## 55               1
## 56               1
## 57               1
## 58               1
## 59               1
## 60               1
## 61               1
## 62               1
## 63               1
## 64               1
## 65               1
## 66               1
## 67               1
## 68               1
## 69               1
## 70               1
## 71               1
## 72               1
## 73               1
## 74               1
## 75               1
## 76               1
## 77               1
## 78               1

5.6 Plot beta coefficients

Open code
elacoef <- data.frame(
  microbiome = row.names(elanet_microbiome_SGB30$betas),
  beta_ela = elanet_microbiome_SGB30$betas[, 1]
) %>%
  arrange(abs(beta_ela)) %>%
  filter(abs(beta_ela) > 0,
         !grepl('Intercept', microbiome)) %>%
  mutate(microbiome = factor(microbiome, levels = microbiome)) 

plotac <- "elanet_beta_microbiome_SGB30"
path <- "gitignore/figures"

assign(plotac, 
  ggplot(elacoef,
    aes(
      x = microbiome,
      y = beta_ela
    )
  ) +
  geom_point() +
  geom_hline(yintercept = 0, color = "black") +
  labs(
    y = "Standardized beta coefficients",
    x = "Bacteria species"
  ) +
  theme_minimal() +
  coord_flip() + 
  theme(
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    legend.position = "bottom"
  )
)

get(plotac)

Regression coefficients from the elastic net model predicting vegan diet strategy based on CLR-transformed and standardized proportion of bacterial taxa. Taxa are ordered by the magnitude of the standardized coefficients, indicating their relative importance in distinguishing between the diet groups. The sign of each coefficient indicates the direction of association with vegan diet status, with positive values indicating a higher likelihood of vegan status and negative values indicating omnivore status. Taxa whose effects were shrunk to zero are not shown.
Open code

if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {  
  ggsave(
    path = paste0(path),
    filename = plotac,
    device = "svg",
    width = 7,
    height = 14
  )
}

6 External validation

External validation was performed with an independent Czech cohort.

As a first step, we will use the previously developed and internally validated elastic net model to predict vegan status in the independent Czech cohort. The validation data will be standardized using the mean and standard deviation of each taxa as taken from the training cohort to ensure comparability across datasets. For each subject in the external validation cohort, we will estimate the predicted probability of being vegan using the elastic net model. This predicted probability will then be used as a variable to discriminate between the diet groups in the independent cohort.

In a 2nd step, we will look at taxa that significantly differed between diet groups (average vegan diet effect across both countries, FDR < 0.05) estimated by linear models (one per a taxa) with data of training cohort. Then we will fit linear models also for external validation cohort. Effect of vegan diet on these taxa will be shown along with 95% confidence interval for all cohorts: training Czech and Italian cohorts, but also in Czech independent (validating) cohort

6.1 Prediction of diet (elastic net)

6.1.1 Get table of weights, means and SDs

Open code

coefs_microbiome_all <- get_coef(
  original_data = data_analysis,
  glmnet_model = elanet_microbiome_SGB30)

coefs_microbiome_all
## # A tibble: 295 × 5
##    predictor                             beta_scaled beta_OrigScale   mean    SD
##    <chr>                                       <dbl>          <dbl>  <dbl> <dbl>
##  1 (Intercept)                                0.564          NA     NA     NA   
##  2 Bacteroides_stercoris|SGB1830              0               0      0.565  4.53
##  3 Alistipes_putredinis|SGB2318               0               0      4.70   3.87
##  4 Candidatus_Cibionibacter_quicibialis…      0               0      5.11   2.15
##  5 Bacteroides_uniformis|SGB1836              0               0      5.96   2.01
##  6 Eubacterium_siraeum|SGB4198                0               0     -1.27   3.66
##  7 GGB9602_SGB15031|SGB15031                 -0.0506         -0.337  0.576  3.33
##  8 Phocaeicola_vulgatus|SGB1814               0.0287          0.151  5.48   2.63
##  9 Faecalibacterium_prausnitzii|SGB15316      0               0      5.08   2.09
## 10 Lachnospiraceae_bacterium_CLA_AA_H24…      0               0      3.30   1.72
## # ℹ 285 more rows

6.1.2 Identify shared and missing predictors

Open code

## Which are missing in the validation set
missing <- setdiff(
  coefs_microbiome_all$predictor[-1], 
  colnames(
    data_microbiome_validation
    )
  )

missing
## character(0)

## Which are common with the validations et
common_predictors <- intersect(
  coefs_microbiome_all$predictor,
  colnames(data_microbiome_validation)
)

6.1.3 Standardize data in validation set

Open code
data_microbiome_validation_pred_all <- data_microbiome_validation %>%
  dplyr::mutate(
    vegan = if_else(
      Diet == "VEGAN", 1, 0
    )
  ) %>%
  dplyr::select(
    vegan,
    dplyr::all_of(common_predictors)
  ) %>% 
  dplyr::mutate(
    across(
      .cols = -vegan,
      .fns = ~ . 
      - coefs_microbiome_all$mean[
        match(
          cur_column(), 
          coefs_microbiome_all$predictor
          )
        ]
      )
    ) %>% 
  dplyr::mutate(
    across(
      .cols = -vegan,
      .fns = ~ . 
      / coefs_microbiome_all$SD[
        match(
          cur_column(), 
          coefs_microbiome_all$predictor
          )
        ]
      )
    ) 

6.1.4 Result

6.1.4.1 ROC curve

Open code

newx <- as.matrix(data_microbiome_validation_pred_all[,-1])

predicted <- predict(
  elanet_microbiome_SGB30$fit, 
  newx = newx)

tr <- data_microbiome_validation_pred_all %>% 
  dplyr::mutate(
    predicted_logit = as.numeric(
      predict(
        elanet_microbiome_SGB30$fit,
        newx = newx
        )
      )
    ) %>% 
  dplyr::mutate(
    predicted = inv_logit(predicted_logit)
  )

roc_microbiome_all <- pROC::roc(
      vegan ~ predicted_logit,
      data = tr,
      direction = "<",
      levels = c(0, 1),
      ci = TRUE
      )

roc_microbiome_all
## 
## Call:
## roc.formula(formula = vegan ~ predicted_logit, data = tr, direction = "<",     levels = c(0, 1), ci = TRUE)
## 
## Data: predicted_logit in 43 controls (vegan 0) < 59 cases (vegan 1).
## Area under the curve: 0.8786
## 95% CI: 0.8062-0.951 (DeLong)

plotac <- "roc_microbiom_SGB30"
path <- "gitignore/figures"

assign(plotac, ggroc(roc_microbiome_all))
get(plotac)

Receiver Operating Characteristic (ROC) curve illustrating the model’s ability to distinguish between vegan and omnivore status based on CLR-transformed counts of bacteria reads in the external validation Czech cohort. The curve plots the true positive rate (sensitivity) against the true positive rate (specificity) at various thresholds of predicted vegan status, as estimated from the elastic net model developed on the training data. The area under the curve (AUC) represents the model’s overall performance, with values closer to 1 indicating stronger discrimination.
Open code

if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {  
  ggsave(
    path = paste0(path),
    filename = plotac,
    device = "svg",
    width = 6,
    height = 4.5
  )
}

6.1.4.2 Table

Open code
mod <- elanet_microbiome_SGB30

trainAUC <- mean(mod[["valid_performances"]]$auc_resamp_test)
trainCI <- quantile(mod[["valid_performances"]]$auc_resamp_test,
  probs = c(1 / 40, 39 / 40)
)

res <- data.frame(
  alpha = c(mod$model_summary$alpha, rep("", 4)),
  lambda = c(round(mod$model_summary$lambda, 4), rep("", 4)),
  `performance type` = c(
    "Training set AUC",
    "Out-of-sample AUC (all)",
    "Out-of-sample AUC (Czech)",
    "Out-of-sample AUC (Italy)",
    "External validation AUC"
  ),
  `performance [95% CI]` = c(
    sprintf("%.3f [%.3f to %.3f]", trainAUC, trainCI[1], trainCI[2]),
    sprintf(
      "%.3f [%.3f to %.3f]",
      mod$model_summary$auc_OutOfSample,
      mod$model_summary$auc_oos_CIL,
      mod$model_summary$auc_oos_CIU
    ),
    sprintf(
      "%.3f [%.3f to %.3f]",
      mod$country_AUC$auc_OutOfSample_CZ,
      mod$country_AUC$auc_oos_CIL_CZ,
      mod$country_AUC$auc_oos_CIU_CZ
    ),
    sprintf(
      "%.3f [%.3f to %.3f]",
      mod$country_AUC$auc_OutOfSample_IT,
      mod$country_AUC$auc_oos_CIL_IT,
      mod$country_AUC$auc_oos_CIU_IT
    ),
    sprintf(
      "%.3f [%.3f to %.3f]",
      roc_microbiome_all[["ci"]][2],
      roc_microbiome_all[["ci"]][1],
      roc_microbiome_all[["ci"]][3]
    )
  )
)


kableExtra::kable(
  res %>% mutate(across(where(is.numeric), ~ round(.x, 3))),
  caption = "Performance of the elastic‑net logistic regression model for discriminating vegan from omnivore status using CLR-transformed counts of bacteria reads. The model was developed on the combined training data (Czech and Italian cohorts), with the optimmized `alpha` (mixing parameter) and `lambda` (penalty strength) selected via 10-fold cross-validation. Internal validation employed 500 out‑of‑bag bootstrap resamples: the out‑of‑sample AUC is the mean across resamples, and its 95 % confidence interval (CI) is given by the 2.5th and 97.5th percentiles of the bootstrap distribution. The training‑set AUC and its CI were computed analogously from the in‑bag predictions. External validation was carried out on an independent Czech cohort; the reported AUC is the point estimate on that cohort, and its 95% CI was obtained with DeLong’s method."
)
Performance of the elastic‑net logistic regression model for discriminating vegan from omnivore status using CLR-transformed counts of bacteria reads. The model was developed on the combined training data (Czech and Italian cohorts), with the optimmized alpha (mixing parameter) and lambda (penalty strength) selected via 10-fold cross-validation. Internal validation employed 500 out‑of‑bag bootstrap resamples: the out‑of‑sample AUC is the mean across resamples, and its 95 % confidence interval (CI) is given by the 2.5th and 97.5th percentiles of the bootstrap distribution. The training‑set AUC and its CI were computed analogously from the in‑bag predictions. External validation was carried out on an independent Czech cohort; the reported AUC is the point estimate on that cohort, and its 95% CI was obtained with DeLong’s method.
alpha lambda performance.type performance..95..CI.
0.4 0.0277 Training set AUC 1.000 [0.999 to 1.000]
Out-of-sample AUC (all) 0.887 [0.805 to 0.952]
Out-of-sample AUC (Czech) 0.943 [0.851 to 1.000]
Out-of-sample AUC (Italy) 0.806 [0.661 to 0.924]
External validation AUC 0.879 [0.806 to 0.951]

6.2 Diet effect across datasets

Similarly as in training data cohorts, we will fit linear model per each of the selected taxa (\(CLR\) - transformed), with a single fixed effect factor of diet.

6.2.1 Linear models in validation cohort

Open code

diet_sensitive_taxa <- result_microbiome %>% 
  filter(fdr_VGdiet_avg < 0.05) %>% 
  pull(outcome)

len <- length(diet_sensitive_taxa)

data_analysis_microbiome <- data_microbiome_validation %>%
  dplyr::mutate(
    Diet_VEGAN = as.numeric(
      dplyr::if_else(
        Diet == 'VEGAN', 1, 0
      )
    )
  ) %>%
  dplyr::select(
    Diet_VEGAN,
    all_of(diet_sensitive_taxa)
  )

6.2.1.1 Define number of microbiome and covariates

Open code
n_covarites <- 1
n_features <- ncol(data_analysis_microbiome) - n_covarites

6.2.1.2 Create empty objects

Open code
outcome <- vector('double', n_features)
est_VGdiet <- vector('double', n_features)
P_VGdiet <- vector('double', n_features)
CI_L_VGdiet <- vector('double', n_features)
CI_U_VGdiet <- vector('double', n_features)

6.2.1.3 Estimate over outcomes

Open code
for (i in 1:n_features) {
  ## define variable
  data_analysis_microbiome$outcome <- data_analysis_microbiome[, (i + n_covarites)]

  ## fit model
  model <- lm(outcome ~ Diet_VEGAN, data = data_analysis_microbiome)

  ## save results
  outcome[i] <- names(data_analysis_microbiome)[i + n_covarites]

  ## diet effect
  tr <- confint(model)

  CI_L_VGdiet[i] <- tr[which(row.names(tr) == "Diet_VEGAN"), ][1]
  CI_U_VGdiet[i] <- tr[which(row.names(tr) == "Diet_VEGAN"), ][2]

  est_VGdiet[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_VEGAN"
    ), 1
  ]

  P_VGdiet[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_VEGAN"
    ), 4
  ]
}

6.2.1.4 Results table

Open code
result_microbiome_val <- data.frame(
  outcome,
  est_VGdiet, P_VGdiet,
  CI_L_VGdiet, CI_U_VGdiet
)

kableExtra::kable(result_microbiome_val,
  caption = "Results of linear models estimating the effect of diet on the CLR-transformed relative abundance of a given taxon as the outcome. Only taxa whose relative abundance significantly differed between diet groups in training cohorts (FDR < 0.05, average effect across both training cohorts) were included. `est` represents the estimated effects (regression coefficient), indicating how much the CLR-transformed relative abundnace of a given differ between vegans and omnivores. `P`: p-value, `fdr`: p-value adjusted for multiple comparisons, and `CI_L` and `CI_U` represent the lower and upper bounds of the 95% confidence interval, respectively. All estimates in a single row are based on a single model"
)
Results of linear models estimating the effect of diet on the CLR-transformed relative abundance of a given taxon as the outcome. Only taxa whose relative abundance significantly differed between diet groups in training cohorts (FDR < 0.05, average effect across both training cohorts) were included. est represents the estimated effects (regression coefficient), indicating how much the CLR-transformed relative abundnace of a given differ between vegans and omnivores. P: p-value, fdr: p-value adjusted for multiple comparisons, and CI_L and CI_U represent the lower and upper bounds of the 95% confidence interval, respectively. All estimates in a single row are based on a single model
outcome est_VGdiet P_VGdiet CI_L_VGdiet CI_U_VGdiet
Escherichia_coli|SGB10068 -0.0678001 0.8780534 -0.9422270 0.8066267
Bacteroides_clarus|SGB1832 -0.4474756 0.1501145 -1.0596258 0.1646745
Hydrogenoanaerobacterium_saccharovorans|SGB15350 0.0343774 0.8840093 -0.4319367 0.5006916
GGB6649_SGB9391|SGB9391 0.6707753 0.0619526 -0.0341736 1.3757242
Ruthenibacterium_lactatiformans|SGB15271 -0.5139319 0.1557949 -1.2268892 0.1990254
Lawsonibacter_asaccharolyticus|SGB15154 -2.0366512 0.0000022 -2.8405092 -1.2327932
Clostridium_fessum|SGB4705 0.0036823 0.9877139 -0.4695508 0.4769153
GGB9707_SGB15229|SGB15229 -0.2846691 0.5637032 -1.2596309 0.6902926
Lachnospiraceae_bacterium_AM48_27BH|SGB4706 0.2944198 0.1109060 -0.0687547 0.6575944
GGB9509_SGB14906|SGB14906 -0.5158608 0.1075919 -1.1461894 0.1144678
GGB45432_SGB63101|SGB63101 -0.3814408 0.3381234 -1.1676982 0.4048166
GGB2653_SGB3574|SGB3574 -0.7110515 0.0637825 -1.4636353 0.0415322
Phocea_massiliensis|SGB14837 -0.2249475 0.5453208 -0.9603579 0.5104629
Lachnospiraceae_bacterium|SGB4953 1.1093932 0.1054861 -0.2379349 2.4567214
Holdemania_filiformis|SGB4046 -0.7909502 0.0051249 -1.3392099 -0.2426906
Anaeromassilibacillus_senegalensis|SGB14894 0.1582383 0.6873362 -0.6195266 0.9360031
Ruminococcus_sp_AF41_9|SGB25497 1.5723968 0.0046260 0.4956389 2.6491548
Eubacterium_ramulus|SGB4959 -0.7821065 0.0023325 -1.2787575 -0.2854555
Youxingia_wuxianensis|SGB82503 -0.7728454 0.0336176 -1.4845820 -0.0611087
Blautia_massiliensis|SGB4826 -0.3538921 0.3462847 -1.0958595 0.3880752
GGB9765_SGB15382|SGB15382 -0.3380327 0.4376177 -1.1985609 0.5224954
GGB9770_SGB15390|SGB15390 -0.2149847 0.5654590 -0.9546180 0.5246486
Ruminococcus_torques|SGB4608 -1.9375993 0.0002924 -2.9616396 -0.9135591
Agathobaculum_butyriciproducens|SGB14991 0.4691300 0.2780519 -0.3842585 1.3225185
Clostridium_SGB48024|SGB48024 0.4767405 0.2762444 -0.3872161 1.3406970
GGB9531_SGB14932|SGB14932 -0.6312664 0.0872877 -1.3565383 0.0940056
Oscillibacter_valericigenes|SGB15076 -0.2233062 0.4190206 -0.7692669 0.3226545
GGB9616_SGB15052|SGB15052 0.9145023 0.0161248 0.1732221 1.6557826
GGB58158_SGB79798|SGB79798 -0.5931442 0.1208917 -1.3453877 0.1590993
Enterocloster_hominis|SGB4721 -2.5838231 0.0000001 -3.4848975 -1.6827488
Streptococcus_thermophilus|SGB8002 -1.9051306 0.0000122 -2.7259594 -1.0843017
Blautia_obeum|SGB4810 2.0476532 0.0000342 1.1115125 2.9837939
GGB51884_SGB49168|SGB49168 0.1576822 0.5954702 -0.4296599 0.7450243
Roseburia_lenta|SGB4957 1.9623313 0.0003619 0.9080098 3.0166528
GGB3288_SGB4342|SGB4342 0.1719754 0.6886263 -0.6770096 1.0209604
GGB9574_SGB14987|SGB14987 0.8228106 0.0037310 0.2731300 1.3724911
Clostridiaceae_unclassified_SGB4771|SGB4771 0.3951355 0.3029642 -0.3619755 1.1522465
Lachnospiraceae_bacterium_OM04_12BH|SGB4893 -0.3275102 0.5099089 -1.3100021 0.6549817
GGB9616_SGB15051|SGB15051 0.5125892 0.1770983 -0.2355437 1.2607221
Open code

if (file.exists("gitignore/lm_results/result_microbiom_validation_SGB30.csv") == FALSE) {
  write.table(result_microbiome_val,
    "gitignore/lm_results/result_microbiom_validation_SGB30.csv",
    row.names = FALSE
  )
}

6.2.2 Forest plot

6.2.2.1 Prepare data

Open code

## subset result tables
result_microbiome_subset <- result_microbiome %>%
  filter(outcome %in% diet_sensitive_taxa)

result_microbiome_val_subset <- result_microbiome_val %>%
  filter(outcome %in% diet_sensitive_taxa)

## create a data frame
data_forest <- data.frame(
  outcome = rep(diet_sensitive_taxa, 3),
  beta = c(
    result_microbiome_subset$est_VGdiet_inCZ,
    result_microbiome_subset$est_VGdiet_inIT,
    result_microbiome_val_subset$est_VGdiet
  ),
  lower = c(
    result_microbiome_subset$CI_L_VGdiet_inCZ,
    result_microbiome_subset$CI_L_VGdiet_inIT,
    result_microbiome_val_subset$CI_L_VGdiet
  ),
  upper = c(
    result_microbiome_subset$CI_U_VGdiet_inCZ,
    result_microbiome_subset$CI_U_VGdiet_inIT,
    result_microbiome_val_subset$CI_U_VGdiet
  ),
  dataset = c(
    rep("CZ", len),
    rep("IT", len),
    rep("Validation", len)
  )
)

## define ordering
validation_order <- data_forest %>%
  group_by(outcome) %>%
  summarise(beta_mean = mean(beta), .groups = "drop") %>%
  arrange(beta_mean) %>%
  pull(outcome)


## Define 'winners'
up_winners <- data_forest %>% 
  pivot_wider(names_from = dataset,
              values_from = c(beta, lower, upper)) %>% 
  left_join(elacoef %>% mutate(outcome = microbiome) %>% select(-microbiome), 
            by = 'outcome') %>% 
  filter(beta_CZ > 0,
         beta_IT > 0,
         lower_Validation > 0,
         beta_ela > 0.1) %>% 
  pull(outcome)

down_winners <- data_forest %>% 
  pivot_wider(names_from = dataset,
              values_from = c(beta, lower, upper)) %>% 
  left_join(elacoef %>% mutate(outcome = microbiome) %>% select(-microbiome), 
            by = 'outcome') %>% 
  filter(beta_CZ < 0,
         beta_IT < 0,
         upper_Validation < 0,
         beta_ela < -0.1) %>% 
  pull(outcome)

winners <- c(up_winners, down_winners)

 data_forest <- data_forest %>%
  mutate(in_winner = if_else(outcome %in% winners, TRUE, FALSE, missing = FALSE)) %>%
  left_join(
    elacoef %>% mutate(outcome = microbiome) %>% select(-microbiome), 
    by = 'outcome') %>% 
   mutate(outcome = factor(outcome, levels = validation_order))

6.2.2.2 Plotting

Open code
plotac <- "forest_plot_microbiome_SGB30"
path <- "gitignore/figures"

colors <- c("CZ" = "#150999", "IT" = "#329243", "Validation" = "grey60")

assign(plotac, ggplot(
  data_forest, aes(x = outcome, y = beta, ymin = lower, ymax = upper, color = dataset)
) +
  geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
  geom_hline(yintercept = 0, color = "black") +
  geom_errorbar(position = position_dodge(width = 0.5), width = 0.2) +
  scale_color_manual(values = colors) +
  labs(
    y = "Effect of vegan diet on CLR-transformed relative abundance",
    x = "Outcome",
    color = "Dataset"
  ) +
  theme_minimal() +
  coord_flip() + # Flip coordinates to have outcomes on the y-axis
  scale_x_discrete(
    labels = setNames(
      ifelse(data_forest$in_winner, 
             paste0("**", data_forest$outcome, "**"), 
             as.character(data_forest$outcome)
      ), data_forest$outcome
    )
  ) +
  theme(
    axis.text.x = element_text(size = 10),
    axis.text.y = ggtext::element_markdown(size = 10),  
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    legend.position = "bottom"
  )
)

get(plotac)

The forest plot illustrates the effects of a vegan diet on the CLR-transformed relative abundances of selected taxa, along with their 95% confidence intervals, across two training cohorts (Czech and Italian) and one independent Czech cohort (validation). Green, blue, and grey points/lines represent differences in CLR-transformed relative abundance between vegans and omnivores within the Italian cohort, Czech cohort, and Czech validation cohort, respectively. Positive values suggest a higher relative abundance in vegans compared to omnivores. Only taxa whose relative abundances differed between vegan and omnivores (as an average effect across both training cohorts) were selected, and these effects were further validated in the independent cohort. The estimates for the training cohorts were obtained from a single linear model that included Diet, Country, and the interaction term Diet x Country as fixed-effect predictors. In the independent Czech validation cohort, Diet was the only fixed-effect predictor. Taxa validated in the linear model and showing predictive power in the elastic net model (|β| > 0.1) are bold
Open code

if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {  
  ggsave(
    path = paste0(path),
    filename = plotac,
    device = "svg",
    width = 9,
    height = 13
  )
}

6.2.3 Boxplot

Open code
plotac <- "boxplot_microbiome_SGB30"
path <- "gitignore/figures"

colo <- c('#F9FFAF','#329243')

boxplot_cond <- function(variable) {
  
  p <- ggboxplot(data_merged, 
                 x = 'Diet', 
                 y = variable, 
                 fill = 'Diet', 
                 tip.length = 0.15,
                 palette = colo,
                 outlier.shape = 1,
                 lwd = 0.25,
                 outlier.size = 0.8,
                 facet.by = 'Data',
                 title = variable,
                 ylab = 'CLR(taxa proportion)') +
    
    theme(
      plot.title = element_text(size = 10), 
      axis.title = element_text(size = 8),  
      axis.text.y = element_text(size = 7),
      axis.text.x = element_blank(),
      axis.title.x = element_blank()
    ) 
  return(p)
}

# Plot all outcomes
plots <- map(diet_sensitive_taxa, boxplot_cond)

# Create a matrix of plots
assign(plotac, 
       ggarrange(plotlist = plots, ncol = 5, nrow = 8,  common.legend = TRUE)
       )

get(plotac)

CLR-transformed relative abundances of selected taxa across all 3 cohorts (Czech and Italian training cohorts and an independent Czech valdiation cohort) and across dietary groups
Open code

if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {  
  ggsave(
    path = paste0(path),
    filename = plotac,
    device = "svg",
    width = 9,
    height = 12
  )
}

7 Linear model VG duration

Next, we fit another series of linear models, this time modelling clr-transformed taxa proportions (derived from shotgun metagenomic sequencing) using the following fixed-effect predictors: duration of vegan status (Diet_duration, scaled in tens of years), Country, their interaction (Diet_duration × Country), and Age:

\[ \text{CLR(taxa proportion)} = \alpha + \beta_{1} \times \text{Country} + \beta_{2} \times \text{Diet duration} + \beta_{3} \times (\text{Country}:\text{Diet duration}) + \beta_{4} \times \text{Age} + \epsilon \]

This analysis includes only vegan participants, while omnivores are excluded. The aim was to test whether specific taxa that differ between vegans and omnivores also vary within the vegan group itself, depending on how long participants have been vegan. In other words, we asked whether long-term vegans show stronger enrichment or depletion of diet-sensitive taxa compared to those who adopted the diet more recently.

Because longer vegan duration is likely correlated with age (e.g. a 20-year-old cannot have 20 years of vegan history, unlike a 40- or 60-year-old), we also adjusted for age in the models.

7.1 Get data

7.1.1 Training

Open code
meta_trainIT <- read.xlsx('gitignore/data/diet_duration_age.xlsx', sheet = 1)
meta_trainCZ <- read.xlsx('gitignore/data/diet_duration_age.xlsx', sheet = 2) %>% 
  dplyr::mutate(ID = paste0('T', ID),
                Sex = SEX) %>% 
  dplyr::select(-SEX)

meta_trainIT[1:5,]
##       ID   COHORT GRP Diet_duration  Age Sex
## 1 VOV002 IT_train  OM             0 61.4   F
## 2 VOV003 IT_train  OM             0 43.7   F
## 3 VOV004 IT_train  OM             0 61.1   F
## 4 VOV006 IT_train  OM             0 31.7   F
## 5 VOV007 IT_train  OM             0 31.8   F
meta_trainCZ[1:5,]
##     ID   COHORT GRP Diet_duration      Age Sex
## 1  T97 CZ_train  OM             0 26.76438   M
## 2  T98 CZ_train  OM             0 27.61370   F
## 3 T134 CZ_train  OM             0 21.64384   F
## 4 T136 CZ_train  OM             0 31.25479   M
## 5 T137 CZ_train  OM             0 40.42192   F
data_meta_original <- bind_rows(meta_trainIT, meta_trainCZ) %>% 
  rename(`Sample` = `ID`)
  
data_meta_original[1:5,]
##   Sample   COHORT GRP Diet_duration  Age Sex
## 1 VOV002 IT_train  OM             0 61.4   F
## 2 VOV003 IT_train  OM             0 43.7   F
## 3 VOV004 IT_train  OM             0 61.1   F
## 4 VOV006 IT_train  OM             0 31.7   F
## 5 VOV007 IT_train  OM             0 31.8   F

data_microbiome_original2 <- data_microbiome_original %>% 
  left_join(data_meta_original, by = 'Sample') %>% 
  select(Sample:Diet, COHORT:Sex, everything())

data_microbiome_original2 %>% dim()
## [1] 166 307

data_microbiome_original2[
  1:4, 
  (ncol(data_microbiome_original2)-10):ncol(data_microbiome_original2)
  ]
##   GGB3288_SGB4342|SGB4342 GGB9574_SGB14987|SGB14987 Dorea_longicatena|SGB4582
## 1                3.521512                 -3.935543                 2.5467905
## 2                1.490726                 -4.410746                 0.4017447
## 3                1.916511                 -4.642201                 2.6654201
## 4               -2.628553                 -3.043946                 2.7020668
##   Eubacterium_sp_AF34_35BH|SGB5051 GGB9640_SGB15115|SGB15115
## 1                       -1.3341452                 2.5410401
## 2                       -5.0497228                -0.1233153
## 3                        0.5545071                 2.7650547
## 4                       -3.6829225                 0.1251354
##   Clostridiaceae_unclassified_SGB4771|SGB4771 GGB9766_SGB15383|SGB15383
## 1                                   0.5290295                 2.7333945
## 2                                   3.5708000                 1.4672546
## 3                                   1.4967928                 1.3702391
## 4                                   0.4902028                 0.4573094
##   Lachnospiraceae_bacterium_OM04_12BH|SGB4893 Clostridium_SGB6173|SGB6173
## 1                                    2.296050                    5.387591
## 2                                    1.496614                    4.887697
## 3                                    1.762727                   -4.964669
## 4                                   -2.970321                   -3.366413
##   GGB9616_SGB15051|SGB15051 Intestinibacter_bartlettii|SGB6140
## 1                  3.360962                           3.972138
## 2                  1.793992                           1.376463
## 3                  2.047487                          -3.402379
## 4                  3.922614                          -1.804123

data_microbiome_original2[1:4, 1:10]
##   Sample Country  Diet   COHORT GRP Diet_duration      Age Sex
## 1    T36      CZ VEGAN CZ_train  VG           3.5 31.64932   F
## 2    T47      CZ VEGAN CZ_train  VG           9.0 43.79178   M
## 3    T55      CZ VEGAN CZ_train  VG          10.0 29.83014   M
## 4    T59      CZ VEGAN CZ_train  VG           8.0 30.24658   M
##   Bacteroides_stercoris|SGB1830 Alistipes_putredinis|SGB2318
## 1                     -3.911130                     2.710308
## 2                      4.081174                     3.760123
## 3                      1.860335                     3.653518
## 4                     -3.019533                     4.473163

7.1.2 Validation

Open code
data_meta_valid <- read.xlsx('gitignore/data/diet_duration_age.xlsx', sheet = 3) %>% 
  rename(`Sample` = `ID`)

data_microbiome_valid2 <- data_microbiome_validation %>% 
  left_join(data_meta_valid, by = 'Sample') %>% 
  select(Sample:Diet, COHORT:SEX, everything())

data_microbiome_valid2 %>% dim()
## [1] 103 307

data_microbiome_valid2[
  1:4, 
  (ncol(data_microbiome_valid2)-10):ncol(data_microbiome_valid2)
  ]
##   GGB3288_SGB4342|SGB4342 GGB9574_SGB14987|SGB14987 Dorea_longicatena|SGB4582
## 1               -4.647459                 -4.312100                  1.533233
## 2               -4.218210                 -3.882851                 -1.275541
## 3               -3.953490                 -3.618131                  3.095991
## 4               -3.946492                 -3.611132                 -3.283779
##   Eubacterium_sp_AF34_35BH|SGB5051 GGB9640_SGB15115|SGB15115
## 1                         2.198076                 -2.747402
## 2                        -2.205050                  2.989142
## 3                        -1.940330                 -2.053433
## 4                        -1.933331                  1.229879
##   Clostridiaceae_unclassified_SGB4771|SGB4771 GGB9766_SGB15383|SGB15383
## 1                                    1.729338                 0.1257596
## 2                                    2.180441                 1.5688877
## 3                                   -2.451381                -2.0461635
## 4                                    2.165251                -2.0391650
##   Lachnospiraceae_bacterium_OM04_12BH|SGB4893 Clostridium_SGB6173|SGB6173
## 1                                  -2.0432565                   0.1535293
## 2                                   0.8699953                  -4.6655362
## 3                                  -4.3964080                   2.6017067
## 4                                  -4.3894096                  -4.3938173
##   GGB9616_SGB15051|SGB15051 Intestinibacter_bartlettii|SGB6140
## 1                  1.514270                         -0.6887873
## 2                  2.895254                         -3.1328092
## 3                 -1.555433                          2.2670364
## 4                 -1.548435                         -2.8610903

data_microbiome_valid2[1:4, 1:10]
##   Sample  Data  Diet COHORT GRP Diet_duration      Age SEX
## 1     K4 valid VEGAN CZ_val  VN          <NA> 30.98904   M
## 2     K5 valid VEGAN CZ_val  VN          <NA> 30.66575   F
## 3     K7 valid VEGAN CZ_val  VN          <NA> 32.97808   F
## 4    K12 valid VEGAN CZ_val  VN          <NA> 27.89315   M
##   Bacteroides_stercoris|SGB1830 Alistipes_putredinis|SGB2318
## 1                      2.108233                    3.9183436
## 2                      1.841974                    5.3157455
## 3                      3.812640                    5.3463451
## 4                      3.638903                   -0.3696454

data_microbiome_valid2 %>% select(Sample, Diet, GRP)
##     Sample  Diet GRP
## 1       K4 VEGAN  VN
## 2       K5 VEGAN  VN
## 3       K7 VEGAN  VN
## 4      K12 VEGAN  VN
## 5      K13 VEGAN  VN
## 6      K15 VEGAN  VN
## 7      K16 VEGAN  VN
## 8      K18 VEGAN  VN
## 9      K19 VEGAN  VN
## 10     K25 VEGAN  VN
## 11     K26 VEGAN  VN
## 12     K31 VEGAN  VN
## 13     K32 VEGAN  VN
## 14     K34 VEGAN  VN
## 15     K35 VEGAN  VN
## 16     K38 VEGAN  VN
## 17     K39 VEGAN  VN
## 18     K42  OMNI  OM
## 19     K43  OMNI  OM
## 20     K45 VEGAN  VN
## 21     K46 VEGAN  VN
## 22     K48 VEGAN  VN
## 23     K49 VEGAN  VN
## 24     K55 VEGAN  VN
## 25     K56 VEGAN  VN
## 26     K61 VEGAN  VN
## 27     K62 VEGAN  VN
## 28     K64 VEGAN  VN
## 29     K65 VEGAN  VN
## 30     K73 VEGAN  VN
## 31     K74 VEGAN  VN
## 32     K82 VEGAN  VN
## 33     K85 VEGAN  VN
## 34    K103 VEGAN  VN
## 35    K106 VEGAN  VN
## 36    K107 VEGAN  VN
## 37    K114 VEGAN  VN
## 38    K117 VEGAN  VN
## 39    K119 VEGAN  VN
## 40    K120 VEGAN  VN
## 41    K123  OMNI  OM
## 42    K124  OMNI  OM
## 43    K126  OMNI  OM
## 44    K127  OMNI  OM
## 45    K130  OMNI  OM
## 46    K131  OMNI  OM
## 47    K143  OMNI  OM
## 48    K144  <NA>  OM
## 49    K150  OMNI  OM
## 50    K151  OMNI  OM
## 51    K154  OMNI  OM
## 52    K155  OMNI  OM
## 53    K175  OMNI  OM
## 54    K176  OMNI  OM
## 55    K178  OMNI  OM
## 56    K179  OMNI  OM
## 57    K182  OMNI  OM
## 58    K183  OMNI  OM
## 59    K198  OMNI  OM
## 60    K199  OMNI  OM
## 61    K201  OMNI  OM
## 62    K202  OMNI  OM
## 63    K205 VEGAN  VN
## 64    K206 VEGAN  VN
## 65    K209  OMNI  OM
## 66    K210  OMNI  OM
## 67    K216 VEGAN  VN
## 68    K217 VEGAN  VN
## 69    K219 VEGAN  VN
## 70    K220 VEGAN  VN
## 71    K222  OMNI  OM
## 72    K223  OMNI  OM
## 73    K226 VEGAN  VN
## 74    K227 VEGAN  VN
## 75    K230  OMNI  OM
## 76    K231  OMNI  OM
## 77    K234  OMNI  OM
## 78    K235  OMNI  OM
## 79    K238  OMNI  OM
## 80    K239  OMNI  OM
## 81    K250 VEGAN  VN
## 82    K251 VEGAN  VN
## 83    K253  OMNI  OM
## 84    K254  OMNI  OM
## 85    K258 VEGAN  VN
## 86    K260 VEGAN  VN
## 87    K261 VEGAN  VN
## 88    K263 VEGAN  VN
## 89    K264 VEGAN  VN
## 90    K266  OMNI  OM
## 91    K267  OMNI  OM
## 92    K270  OMNI  OM
## 93    K281 VEGAN  VN
## 94    K284 VEGAN  VN
## 95    K285 VEGAN  VN
## 96    K289 VEGAN  VN
## 97    K291 VEGAN  VN
## 98    K292 VEGAN  VN
## 99    K298  OMNI  OM
## 100   K299  OMNI  OM
## 101   K318  OMNI  OM
## 102   K329  OMNI  OM
## 103   K330  OMNI  OM

data_microbiome_valid2 %>% select(Sample, Diet, GRP, Diet_duration) %>% 
  filter(
    (Diet == 'VEGAN' & GRP == 'OM') | (Diet == 'OMNI' & GRP == 'VN')
  )
## [1] Sample        Diet          GRP           Diet_duration
## <0 rows> (or 0-length row.names)

7.2 Training cohort

7.2.1 Select data

Open code
data_analysis <- data_microbiome_original2 %>%
    mutate(Country_IT = as.numeric(
      dplyr::if_else(
        Country == "IT", 0.5, -0.5
      )
    )
  ) %>%
  filter(Diet == 'VEGAN',
         !is.na(Diet_duration)) %>% 
  select(1:8, Country_IT, all_of(diet_sensitive_taxa))

summary(data_analysis[ , 1:12])
##     Sample            Country              Diet              COHORT         
##  Length:96          Length:96          Length:96          Length:96         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      GRP            Diet_duration         Age            Sex           
##  Length:96          Min.   : 0.600   Min.   :18.25   Length:96         
##  Class :character   1st Qu.: 3.150   1st Qu.:27.57   Class :character  
##  Mode  :character   Median : 5.000   Median :32.15   Mode  :character  
##                     Mean   : 5.492   Mean   :34.97                     
##                     3rd Qu.: 6.000   3rd Qu.:40.77                     
##                     Max.   :20.000   Max.   :60.70                     
##    Country_IT      Escherichia_coli|SGB10068 Bacteroides_clarus|SGB1832
##  Min.   :-0.5000   Min.   :-3.9601           Min.   :-6.302            
##  1st Qu.:-0.5000   1st Qu.:-2.9027           1st Qu.:-4.122            
##  Median :-0.5000   Median : 0.3207           Median :-3.449            
##  Mean   :-0.1042   Mean   : 0.1300           Mean   :-2.297            
##  3rd Qu.: 0.5000   3rd Qu.: 2.4967           3rd Qu.:-2.439            
##  Max.   : 0.5000   Max.   : 7.4579           Max.   : 7.912            
##  Hydrogenoanaerobacterium_saccharovorans|SGB15350
##  Min.   :-4.1492                                 
##  1st Qu.:-2.7727                                 
##  Median : 1.5574                                 
##  Mean   : 0.6297                                 
##  3rd Qu.: 3.1040                                 
##  Max.   : 6.0125

7.2.2 Define number of microbiome and covariates

Open code
n_covarites <- 9
n_features <- ncol(data_analysis) - n_covarites

7.2.3 Create empty objects

Open code
outcome <- vector('double', n_features)

est_Diet_duration_inCZ <- vector('double', n_features)
est_Diet_duration_inIT <- vector('double', n_features)
est_Diet_duration_avg <- vector('double', n_features)

est_ITcountry_avg <- vector('double', n_features)
diet_country_int <- vector('double', n_features)

est_Age <- vector('double', n_features)

P_Diet_duration_inCZ <- vector('double', n_features)
P_Diet_duration_inIT <- vector('double', n_features)
P_Diet_duration_avg <- vector('double', n_features)

P_ITcountry_avg <- vector('double', n_features)
P_diet_country_int <- vector('double', n_features)

P_Age <- vector('double', n_features)

CI_L_Diet_duration_inCZ <- vector('double', n_features)
CI_L_Diet_duration_inIT <- vector('double', n_features)
CI_L_Diet_duration_avg <- vector('double', n_features)

CI_U_Diet_duration_inCZ <- vector('double', n_features)
CI_U_Diet_duration_inIT <- vector('double', n_features)
CI_U_Diet_duration_avg <- vector('double', n_features)

7.2.4 Estimate over outcomes

Open code
if(file.exists('gitignore/lm_results/result_microbiome_SGB30_VGdur_training.Rds') == FALSE){
for (i in 1:n_features) {
  
  ## define variable
  data_analysis$outcome <- data_analysis[, (i + n_covarites)]

  ## fit model
  model <- lm(outcome ~ Country_IT * Diet_duration + Age, 
              data = data_analysis %>% 
                mutate(Diet_duration = (Diet_duration/10),
                       Age = (Age-33)/10))

  ## get contrast (effects of diet BY COUNTRY)
  contrast_emm <- summary(
    pairs(
      emmeans(
        model,
        specs = ~ Diet_duration | Country_IT,
        at = list(Diet_duration = c(0, 1)) 
        ),
      interaction = TRUE,
      adjust = "none"
      ),
    infer = c(TRUE, TRUE)
    )

  ## save results
  outcome[i] <- names(data_analysis)[i + n_covarites]
  
  ## country and Age effect
  est_ITcountry_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT"
    ), 1
  ]

  P_ITcountry_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT"
    ), 4
  ]
  
  est_Age[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Age"
    ), 1
  ]

  P_Age[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Age"
    ), 4
  ]
  
  ## diet effect
  tr <- confint(model)
  
  CI_L_Diet_duration_avg[i] <- tr[which(row.names(tr) == 'Diet_duration'),][1]
  CI_U_Diet_duration_avg[i] <- tr[which(row.names(tr) == 'Diet_duration'),][2]
  
  est_Diet_duration_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_duration"
    ), 1
  ]

  P_Diet_duration_avg[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_duration"
    ), 4
  ]
  
  est_Diet_duration_inCZ[i] <- -contrast_emm[1,3]
  P_Diet_duration_inCZ[i] <- contrast_emm$p.value[1]
  CI_L_Diet_duration_inCZ[i] <- -contrast_emm$upper.CL[1]
  CI_U_Diet_duration_inCZ[i] <- -contrast_emm$lower.CL[1]
  
  est_Diet_duration_inIT[i] <- -contrast_emm[2,3]
  P_Diet_duration_inIT[i] <- contrast_emm$p.value[2]
  CI_L_Diet_duration_inIT[i] <- -contrast_emm$upper.CL[2]
  CI_U_Diet_duration_inIT[i] <- -contrast_emm$lower.CL[2]
  
  ## interaction
  diet_country_int[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT:Diet_duration"
    ), 1
  ]

  P_diet_country_int[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Country_IT:Diet_duration"
    ), 4
  ]
}

  result_microbiome <- data.frame(
  outcome,
  est_ITcountry_avg, P_ITcountry_avg,
  est_Age, P_Age,
  est_Diet_duration_avg, P_Diet_duration_avg,
  est_Diet_duration_inCZ, P_Diet_duration_inCZ,
  est_Diet_duration_inIT, P_Diet_duration_inIT,
  diet_country_int, P_diet_country_int,
  CI_L_Diet_duration_avg, CI_U_Diet_duration_avg,
  CI_L_Diet_duration_inCZ, CI_U_Diet_duration_inCZ,
  CI_L_Diet_duration_inIT, CI_U_Diet_duration_inIT
)
  
  saveRDS(result_microbiome, 
              'gitignore/lm_results/result_microbiome_SGB30_VGdur_training.Rds')
}

result_microbiome <- readRDS('gitignore/lm_results/result_microbiome_SGB30_VGdur_training.Rds')

7.2.5 Show and save results

Open code
kableExtra::kable(result_microbiome %>% 
  dplyr::select(
    outcome,
    est_ITcountry_avg, P_ITcountry_avg,
    est_Age, P_Age,
    est_Diet_duration_avg, P_Diet_duration_avg,
    est_Diet_duration_inCZ, P_Diet_duration_inCZ,
    est_Diet_duration_inIT, P_Diet_duration_inIT,
    diet_country_int, P_diet_country_int,
    CI_L_Diet_duration_avg, CI_U_Diet_duration_avg,
    CI_L_Diet_duration_inCZ, CI_U_Diet_duration_inCZ,
    CI_L_Diet_duration_inIT, CI_U_Diet_duration_inIT
  ) %>% arrange(P_Diet_duration_avg),
  caption = "Results of linear models, modelling CLR-transformed relative abundances of taxa (inferred from shotgun metagenomic sequencing) as the outcome, with vegan status duration (`Diet_duration`), `Country`, their interaction (`Diet_duration × Country`), and `Age` as fixed-effect predictors, using training data only (Czech and Italian vegan cohorts). Only taxa with CLR-transformed relative abundances differing by diet (FDR < 0.05, average effect across both countries) were included. `est`: denotes estimated effects (regression coefficients), i.e. expected change in clr-transformed relative abundnace per 10 years of vegan diet or age, or for country (Italy vs Czechia). `P`: p-value; `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval. `avg` suffix: effect averaged across cohorts, whereas `inCZ` and `inIT` indicate effects in the Czech or Italian cohort, respectively. Interaction effects are reported as `diet_country_int` (difference in the effect of vegan diet duration between Italian and Czech cohorts; positive values indicate a stronger effect in Italian, negative values a stronger effect in Czech cohort) and `P_diet_country_int` (its p-value). All estimates in a single row are based on a single model with interaction",
  escape = FALSE
)
Results of linear models, modelling CLR-transformed relative abundances of taxa (inferred from shotgun metagenomic sequencing) as the outcome, with vegan status duration (Diet_duration), Country, their interaction (Diet_duration × Country), and Age as fixed-effect predictors, using training data only (Czech and Italian vegan cohorts). Only taxa with CLR-transformed relative abundances differing by diet (FDR < 0.05, average effect across both countries) were included. est: denotes estimated effects (regression coefficients), i.e. expected change in clr-transformed relative abundnace per 10 years of vegan diet or age, or for country (Italy vs Czechia). P: p-value; CI_L and CI_U: lower and upper bounds of the 95% confidence interval. avg suffix: effect averaged across cohorts, whereas inCZ and inIT indicate effects in the Czech or Italian cohort, respectively. Interaction effects are reported as diet_country_int (difference in the effect of vegan diet duration between Italian and Czech cohorts; positive values indicate a stronger effect in Italian, negative values a stronger effect in Czech cohort) and P_diet_country_int (its p-value). All estimates in a single row are based on a single model with interaction
outcome est_ITcountry_avg P_ITcountry_avg est_Age P_Age est_Diet_duration_avg P_Diet_duration_avg est_Diet_duration_inCZ P_Diet_duration_inCZ est_Diet_duration_inIT P_Diet_duration_inIT diet_country_int P_diet_country_int CI_L_Diet_duration_avg CI_U_Diet_duration_avg CI_L_Diet_duration_inCZ CI_U_Diet_duration_inCZ CI_L_Diet_duration_inIT CI_U_Diet_duration_inIT
Eubacterium_ramulus|SGB4959 -2.5175062 0.0086078 -0.0606638 0.8167213 2.2037084 0.0132447 1.3853024 0.1171102 3.0221144 0.0334956 1.6368120 0.2945036 0.4711830 3.9362339 -0.3540560 3.1246608 0.2413958 5.8028331
Youxingia_wuxianensis|SGB82503 1.0022451 0.2553363 0.7442438 0.0029686 -1.7165515 0.0378508 -0.9837263 0.2321514 -2.4493767 0.0642244 -1.4656504 0.3147611 -3.3346763 -0.0984266 -2.6082329 0.6407803 -5.0464813 0.1477280
Bacteroides_clarus|SGB1832 -1.5444133 0.2137562 0.0739828 0.8299100 2.3757015 0.0412789 -0.7634385 0.5092538 5.5148415 0.0035456 6.2782801 0.0027909 0.0960579 4.6553451 -3.0520728 1.5251958 1.8559935 9.1736896
GGB9531_SGB14932|SGB14932 -0.3439895 0.6726253 -0.1827125 0.4207511 1.2089138 0.1127821 0.1825335 0.8102317 2.2352941 0.0683331 2.0527606 0.1300695 -0.2907409 2.7085685 -1.3230357 1.6881027 -0.1716650 4.6422531
Blautia_obeum|SGB4810 -1.1168792 0.2948135 -0.7894953 0.0088519 1.3836493 0.1640112 1.7209547 0.0855576 1.0463438 0.5102474 -0.6746109 0.7016287 -0.5752914 3.3425899 -0.2457119 3.6876213 -2.0977732 4.1904609
Roseburia_lenta|SGB4957 -1.5866949 0.0991356 -0.2379291 0.3718854 1.1766007 0.1875325 1.7916602 0.0469512 0.5615412 0.6938721 -1.2301191 0.4374128 -0.5834517 2.9366530 0.0246664 3.5586541 -2.2633584 3.3864407
Phocea_massiliensis|SGB14837 -0.1082940 0.8799690 -0.1570641 0.4322242 0.8805241 0.1890172 0.1651280 0.8053045 1.5959201 0.1385152 1.4307921 0.2300992 -0.4411101 2.2021582 -1.1617186 1.4919746 -0.5253145 3.7171546
Agathobaculum_butyriciproducens|SGB14991 -1.4347503 0.3163834 -0.5179469 0.1947225 1.4617258 0.2728519 -0.1056128 0.9368928 3.0290645 0.1577653 3.1346773 0.1871032 -1.1702135 4.0936652 -2.7479324 2.5367067 -1.1952214 7.2533505
GGB58158_SGB79798|SGB79798 0.5795715 0.4367189 -0.2467789 0.2352844 -0.6992341 0.3137514 -0.5112545 0.4625641 -0.8872138 0.4253240 -0.3759593 0.7602957 -2.0703675 0.6718993 -1.8877956 0.8652866 -3.0878950 1.3134675
Streptococcus_thermophilus|SGB8002 -0.2730919 0.7494862 -0.0954289 0.6886222 0.7451119 0.3500872 0.6996705 0.3819823 0.7905532 0.5362238 0.0908827 0.9488201 -0.8306859 2.3209096 -0.8823421 2.2816831 -1.7386161 3.3197225
GGB3288_SGB4342|SGB4342 -1.7637274 0.1023633 0.8325136 0.0062810 0.8593587 0.3897714 0.4327941 0.6656752 1.2859234 0.4225326 0.8531293 0.6309620 -1.1159790 2.8346965 -1.5503343 2.4159224 -1.8845112 4.4563579
Lachnospiraceae_bacterium_OM04_12BH|SGB4893 -1.9393019 0.0943426 0.2585104 0.4203690 0.9207256 0.3905816 0.1859236 0.8626297 1.6555275 0.3363818 1.4696039 0.4411552 -1.1993066 3.0407577 -1.9424698 2.3143171 -1.7471429 5.0581979
Holdemania_filiformis|SGB4046 -1.0429956 0.3022942 0.2479234 0.3780889 0.8035828 0.3925567 -0.1483020 0.8748696 1.7554676 0.2453477 1.9037696 0.2558393 -1.0544847 2.6616504 -2.0136976 1.7170937 -1.2267472 4.7376824
GGB9616_SGB15051|SGB15051 -1.2688777 0.3124855 0.7172333 0.0420483 -0.9573359 0.4123114 -1.8425605 0.1178141 -0.0721113 0.9692515 1.7704492 0.3943728 -3.2662170 1.3515451 -4.1605477 0.4754266 -3.7778858 3.6336632
Clostridium_fessum|SGB4705 -1.4481563 0.1840616 0.1136871 0.7067272 0.8221975 0.4161731 0.8556781 0.3993688 0.7887170 0.6265918 -0.0669611 0.9702681 -1.1772935 2.8216885 -1.1516988 2.8630549 -2.4204837 3.9979176
Oscillibacter_valericigenes|SGB15076 1.3696332 0.1290475 -0.1987828 0.4266397 -0.6233810 0.4555874 -0.4613657 0.5820240 -0.7853962 0.5578465 -0.3240305 0.8272665 -2.2758700 1.0291080 -2.1203720 1.1976406 -3.4376556 1.8668633
GGB9574_SGB14987|SGB14987 2.9325371 0.0059197 -0.3303480 0.2571013 -0.7151715 0.4619429 0.9691408 0.3213127 -2.3994838 0.1259754 -3.3686246 0.0536394 -2.6380763 1.2077332 -0.9613477 2.8996293 -5.4857629 0.6867953
GGB9509_SGB14906|SGB14906 0.1274415 0.8837364 -0.0536453 0.8250255 -0.5942356 0.4642694 -1.3836615 0.0916902 0.1951903 0.8807757 1.5788518 0.2754943 -2.2003334 1.0118622 -2.9960937 0.2287707 -2.3826109 2.7729915
GGB6649_SGB9391|SGB9391 -0.5487023 0.5683906 0.6674328 0.0141599 0.6257762 0.4846128 1.0000263 0.2669072 0.2515262 0.8608850 -0.7485001 0.6383350 -1.1454956 2.3970480 -0.7782313 2.7782839 -2.5913807 3.0944331
Hydrogenoanaerobacterium_saccharovorans|SGB15350 1.9306349 0.1284094 -0.2616931 0.4569638 -0.7624752 0.5164885 -0.3889152 0.7414750 -1.1360351 0.5469424 -0.7471199 0.7207528 -3.0878896 1.5629393 -2.7235009 1.9456705 -4.8683459 2.5962756
GGB9770_SGB15390|SGB15390 0.8601455 0.3151323 -0.2405888 0.3128854 0.4894934 0.5382333 0.0096781 0.9903186 0.9693086 0.4478753 0.9596305 0.4979073 -1.0842811 2.0632678 -1.5703032 1.5896594 -1.5566132 3.4952305
GGB9765_SGB15382|SGB15382 0.3850034 0.6095156 0.0255118 0.9031750 -0.4302221 0.5397052 -0.5945364 0.3990217 -0.2659077 0.8131471 0.3286286 0.7922321 -1.8184684 0.9580242 -1.9882578 0.7991851 -2.4940553 1.9622398
Ruthenibacterium_lactatiformans|SGB15271 1.9157625 0.0287860 -0.3245451 0.1797661 0.4904124 0.5425467 0.5762602 0.4761631 0.4045645 0.7541026 -0.1716957 0.9045595 -1.1032342 2.0840590 -1.0236716 2.1761921 -2.1532524 2.9623814
Lachnospiraceae_bacterium|SGB4953 -1.5384905 0.2273424 -0.0204087 0.9539446 -0.6881779 0.5604316 -0.9798906 0.4093972 -0.3964652 0.8343328 0.5834253 0.7813856 -3.0275018 1.6511460 -3.3284405 1.3686594 -4.1511007 3.3581702
GGB51884_SGB49168|SGB49168 -1.3975208 0.1376682 0.3938725 0.1329399 0.5026362 0.5640386 -0.6400748 0.4646012 1.6453473 0.2407513 2.2854221 0.1425799 -1.2218437 2.2271162 -2.3713560 1.0912064 -1.1224583 4.4131528
Escherichia_coli|SGB10068 -0.9743327 0.4408773 -0.0785194 0.8232017 0.6451714 0.5830299 -0.8037128 0.4959521 2.0940557 0.2681592 2.8977685 0.1678368 -1.6809816 2.9713245 -3.1390400 1.5316144 -1.6394405 5.8275519
GGB9616_SGB15052|SGB15052 1.2429148 0.2793412 -0.4548368 0.1560029 -0.5710696 0.5922799 0.0193215 0.9855891 -1.1614608 0.4975851 -1.1807823 0.5339685 -2.6817664 1.5396272 -2.0996998 2.1383427 -4.5491478 2.2262263
GGB45432_SGB63101|SGB63101 0.6894832 0.3156702 0.2357121 0.2185483 0.3343624 0.6002401 0.3228673 0.6142055 0.3458574 0.7354482 0.0229900 0.9838359 -0.9285686 1.5972934 -0.9450446 1.5907793 -1.6811580 2.3728728
Clostridium_SGB48024|SGB48024 0.2171189 0.8801367 -0.2661515 0.5072019 0.6905252 0.6064651 0.5296547 0.6938131 0.8513957 0.6922241 0.3217409 0.8926571 -1.9629599 3.3440103 -2.1342955 3.1936050 -3.4074714 5.1102627
Lachnospiraceae_bacterium_AM48_27BH|SGB4706 -1.3455879 0.1717859 0.1539243 0.5728364 -0.3108993 0.7331069 0.0375800 0.9672409 -0.6593787 0.6523591 -0.6969587 0.6676280 -2.1164106 1.4946120 -1.7750521 1.8502121 -3.5572403 2.2384829
Anaeromassilibacillus_senegalensis|SGB14894 1.6877504 0.0686701 -0.2505919 0.3284149 -0.2854188 0.7384825 0.0306524 0.9715015 -0.6014900 0.6611912 -0.6321424 0.6778665 -1.9783857 1.4075482 -1.6689915 1.7302963 -3.3187169 2.1157369
Ruminococcus_torques|SGB4608 -1.8237641 0.1424335 -0.0285497 0.9338786 0.3375568 0.7691653 -0.0817489 0.9435506 0.7568624 0.6818961 0.8386113 0.6821488 -1.9404398 2.6155533 -2.3687297 2.2052319 -2.8993422 4.4130670
Clostridiaceae_unclassified_SGB4771|SGB4771 -1.9260704 0.1449088 0.4819145 0.1896637 0.3574092 0.7699899 -1.3152470 0.2852400 2.0300655 0.3021035 3.3453125 0.1264964 -2.0634645 2.7782830 -3.7456685 1.1151745 -1.8554582 5.9155892
Blautia_massiliensis|SGB4826 -4.7208984 0.0022978 -0.5028800 0.2331091 0.1197294 0.9320368 -0.9744908 0.4898790 1.2139496 0.5903534 2.1884404 0.3821309 -2.6612685 2.9007273 -3.7664568 1.8174751 -3.2495766 5.6774758
Lawsonibacter_asaccharolyticus|SGB15154 4.0489822 0.0005465 -0.3465494 0.2734601 0.0895930 0.9322623 0.6779310 0.5222179 -0.4987450 0.7681896 -1.1766759 0.5309623 -1.9983612 2.1775472 -1.4182580 2.7741199 -3.8499299 2.8524400
Ruminococcus_sp_AF41_9|SGB25497 -2.3465858 0.0611050 0.0860370 0.8033638 -0.0865262 0.9402601 -0.4673026 0.6869627 0.2942501 0.8738413 0.7615527 0.7110415 -2.3735930 2.2005406 -2.7633894 1.8287842 -3.3765123 3.9650125
GGB2653_SGB3574|SGB3574 1.2135554 0.2438004 -0.4592590 0.1142516 -0.0614921 0.9491968 0.4343717 0.6541055 -0.5573558 0.7190775 -0.9917275 0.5640635 -1.9732718 1.8502877 -1.4849480 2.3536913 -3.6257792 2.5110676
GGB9707_SGB15229|SGB15229 0.3590995 0.6760163 0.1205664 0.6143524 -0.0287736 0.9712761 -0.3127869 0.6967372 0.2552398 0.8422714 0.5680267 0.6897419 -1.6117094 1.5541622 -1.9019657 1.2763919 -2.2853862 2.7958657
Enterocloster_hominis|SGB4721 -0.1320591 0.8789731 0.0331574 0.8907714 -0.0233820 0.9768810 -0.1075774 0.8943511 0.0608135 0.9625450 0.1683909 0.9066593 -1.6216844 1.5749205 -1.7121834 1.4970286 -2.5044760 2.6261030
Open code

if(file.exists('gitignore/lm_results/result_microbiome_SGB30_VGdur_training.csv') == FALSE){
  write.table(result_microbiome, 
              'gitignore/lm_results/result_microbiome_SGB30_VGdur_training.csv', row.names = FALSE)
  }

7.3 Validation cohort

Open code
data_analysis_microbiome <- data_microbiome_valid2 %>%
  filter(Diet == 'VEGAN',
         !is.na(Diet_duration)) %>%
  dplyr::mutate(Diet_duration = as.numeric(as.character(Diet_duration))) %>% 
  dplyr::select(
    Diet_duration, Age,
    all_of(diet_sensitive_taxa)
  )

summary(data_analysis_microbiome[ , 1:12])
##  Diet_duration         Age        Escherichia_coli|SGB10068
##  Min.   : 0.170   Min.   :21.54   Min.   :-3.8524          
##  1st Qu.: 4.062   1st Qu.:30.05   1st Qu.:-3.2960          
##  Median : 6.000   Median :32.81   Median :-2.7510          
##  Mean   : 6.790   Mean   :32.37   Mean   :-1.6863          
##  3rd Qu.:10.000   3rd Qu.:34.24   3rd Qu.:-0.3101          
##  Max.   :15.000   Max.   :44.08   Max.   : 4.5859          
##  Bacteroides_clarus|SGB1832 Hydrogenoanaerobacterium_saccharovorans|SGB15350
##  Min.   :-2.1911            Min.   :-1.6474                                 
##  1st Qu.:-1.7408            1st Qu.:-0.1372                                 
##  Median :-1.3999            Median : 0.7572                                 
##  Mean   :-0.9279            Mean   : 0.6008                                 
##  3rd Qu.:-0.6137            3rd Qu.: 1.2692                                 
##  Max.   : 3.9937            Max.   : 3.4472                                 
##  GGB6649_SGB9391|SGB9391 Ruthenibacterium_lactatiformans|SGB15271
##  Min.   :-3.4825         Min.   :-3.3057                         
##  1st Qu.:-2.6367         1st Qu.:-1.8476                         
##  Median :-1.3468         Median :-0.3574                         
##  Mean   :-0.9723         Mean   :-0.4243                         
##  3rd Qu.: 0.6831         3rd Qu.: 0.6305                         
##  Max.   : 2.2045         Max.   : 6.3991                         
##  Lawsonibacter_asaccharolyticus|SGB15154 Clostridium_fessum|SGB4705
##  Min.   :-4.9579                         Min.   :-0.2847           
##  1st Qu.:-3.0046                         1st Qu.: 2.0317           
##  Median :-0.5985                         Median : 2.7407           
##  Mean   :-1.0778                         Mean   : 2.5503           
##  3rd Qu.: 0.8116                         3rd Qu.: 3.2138           
##  Max.   : 2.5010                         Max.   : 6.0140           
##  GGB9707_SGB15229|SGB15229 Lachnospiraceae_bacterium_AM48_27BH|SGB4706
##  Min.   :-4.57266          Min.   :-1.2222                            
##  1st Qu.:-2.51742          1st Qu.: 0.9084                            
##  Median : 1.21365          Median : 1.5063                            
##  Mean   : 0.09053          Mean   : 1.3817                            
##  3rd Qu.: 2.03840          3rd Qu.: 1.8131                            
##  Max.   : 5.20551          Max.   : 3.5986                            
##  GGB9509_SGB14906|SGB14906
##  Min.   :-3.269           
##  1st Qu.:-2.770           
##  Median :-2.334           
##  Mean   :-1.951           
##  3rd Qu.:-1.239           
##  Max.   : 1.554

7.3.0.1 Define number of microbiome and covariates

Open code
n_covarites <- 2
n_features <- ncol(data_analysis_microbiome) - n_covarites

7.3.0.2 Create empty objects

Open code
outcome <- vector('double', n_features)
est_VGduration <- vector('double', n_features)
P_VGduration <- vector('double', n_features)
CI_L_VGduration <- vector('double', n_features)
CI_U_VGduration <- vector('double', n_features)

7.3.0.3 Estimate over outcomes

Open code
for (i in 1:n_features) {
  ## define variable
  data_analysis_microbiome$outcome <- data_analysis_microbiome[, (i + n_covarites)]

  ## fit model
  model <- lm(outcome ~ Diet_duration + Age, 
              data = data_analysis_microbiome %>% 
                mutate(Diet_duration = (Diet_duration)/10,
                       Age = (Age-33)/10))

  ## save results
  outcome[i] <- names(data_analysis_microbiome)[i + n_covarites]

  ## diet effect
  tr <- confint(model)

  CI_L_VGduration[i] <- tr[which(row.names(tr) == "Diet_duration"), ][1]
  CI_U_VGduration[i] <- tr[which(row.names(tr) == "Diet_duration"), ][2]

  est_VGduration[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_duration"
    ), 1
  ]

  P_VGduration[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Diet_duration"
    ), 4
  ]
  
  est_Age[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Age"
    ), 1
  ]

  P_Age[i] <- summary(model)$coefficients[
    which(
      names(model$coefficients) == "Age"
    ), 4
  ]  
}

7.3.0.4 Results table

Open code
result_microbiome_val <- data.frame(
  outcome,
  est_VGduration, P_VGduration,
  CI_L_VGduration, CI_U_VGduration,
  est_Age, P_Age
)

kableExtra::kable(result_microbiome_val %>% arrange(P_VGduration),
  caption = "Results of linear models estimating the effect of vegan diet status duration on CLR-transformed relative abundances of taxa (inferred from shotgun metagenomic sequencing) as the outcome. Only taxa with significant differences between vegans and omnivores in training cohorts (FDR < 0.05, average effect over both countries) were included. `est`: regression coefficient, i.e. expected change in CLR-transformed relative abundance per +10 years of vegan diet duration and age, respectively. `P`: p-value; `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval. All estimates in a single row are based on a single model"
)
Results of linear models estimating the effect of vegan diet status duration on CLR-transformed relative abundances of taxa (inferred from shotgun metagenomic sequencing) as the outcome. Only taxa with significant differences between vegans and omnivores in training cohorts (FDR < 0.05, average effect over both countries) were included. est: regression coefficient, i.e. expected change in CLR-transformed relative abundance per +10 years of vegan diet duration and age, respectively. P: p-value; CI_L and CI_U: lower and upper bounds of the 95% confidence interval. All estimates in a single row are based on a single model
outcome est_VGduration P_VGduration CI_L_VGduration CI_U_VGduration est_Age P_Age
Bacteroides_clarus|SGB1832 -1.3786805 0.0227758 -2.5573389 -0.2000221 0.1687836 0.7358085
Roseburia_lenta|SGB4957 2.5454335 0.0541621 -0.0472436 5.1381107 -1.0798158 0.3284438
Clostridium_fessum|SGB4705 -1.0407376 0.0805208 -2.2124951 0.1310200 0.2250477 0.6510305
Lachnospiraceae_bacterium_OM04_12BH|SGB4893 1.7936186 0.0818178 -0.2346948 3.8219320 -1.5217930 0.0814501
GGB9574_SGB14987|SGB14987 -1.0027985 0.1063705 -2.2275285 0.2219314 -0.3432288 0.5097171
Ruminococcus_sp_AF41_9|SGB25497 -1.8699188 0.1481967 -4.4268409 0.6870034 1.4200471 0.1941426
Blautia_obeum|SGB4810 -1.4000327 0.1709467 -3.4239389 0.6238734 1.2841260 0.1389656
Phocea_massiliensis|SGB14837 -1.0701246 0.1711432 -2.6178350 0.4775858 0.1873452 0.7754469
Lachnospiraceae_bacterium_AM48_27BH|SGB4706 -0.5746767 0.1732985 -1.4100898 0.2607365 0.5744697 0.1094495
Oscillibacter_valericigenes|SGB15076 -0.5337953 0.2033099 -1.3653465 0.2977558 -0.0413347 0.9067171
Clostridium_SGB48024|SGB48024 0.9646814 0.3223077 -0.9731131 2.9024759 -0.0767947 0.9255659
Holdemania_filiformis|SGB4046 -0.5553577 0.3715054 -1.7919613 0.6812458 0.1293362 0.8052947
GGB3288_SGB4342|SGB4342 0.7153222 0.4455226 -1.1524687 2.5831131 0.8660687 0.2771325
Ruthenibacterium_lactatiformans|SGB15271 0.6679235 0.4595417 -1.1313562 2.4672033 -0.1650553 0.8288189
Blautia_massiliensis|SGB4826 -0.7122384 0.4602138 -2.6337861 1.2093094 -0.4223935 0.6047744
Lawsonibacter_asaccharolyticus|SGB15154 0.7182570 0.4860113 -1.3364840 2.7729979 -1.2183773 0.1661446
Hydrogenoanaerobacterium_saccharovorans|SGB15350 0.3133473 0.4926696 -0.5969685 1.2236631 -0.8623585 0.0291799
Lachnospiraceae_bacterium|SGB4953 -1.0714136 0.5020708 -4.2531166 2.1102895 0.8484183 0.5303788
Anaeromassilibacillus_senegalensis|SGB14894 0.5567021 0.5155737 -1.1502601 2.2636644 -0.2668066 0.7126884
GGB9616_SGB15052|SGB15052 0.5523505 0.5171969 -1.1478627 2.2525637 -1.3758725 0.0608253
GGB6649_SGB9391|SGB9391 0.5181804 0.5191072 -1.0841812 2.1205420 0.4120110 0.5451184
Agathobaculum_butyriciproducens|SGB14991 -0.6360667 0.5447521 -2.7303886 1.4582552 1.0255316 0.2514126
Streptococcus_thermophilus|SGB8002 0.4532587 0.5878069 -1.2149387 2.1214562 -1.2390842 0.0844543
Clostridiaceae_unclassified_SGB4771|SGB4771 0.4795630 0.5898911 -1.2953741 2.2545001 -0.2329909 0.7570774
GGB2653_SGB3574|SGB3574 0.4311344 0.5961702 -1.1919745 2.0542434 0.5449960 0.4300171
Youxingia_wuxianensis|SGB82503 -0.2847367 0.6907907 -1.7136416 1.1441682 -0.3091343 0.6104829
Escherichia_coli|SGB10068 0.3938668 0.6970811 -1.6260111 2.4137448 0.2178192 0.7993791
GGB9770_SGB15390|SGB15390 0.2993452 0.7041724 -1.2745428 1.8732332 -0.1609072 0.8095900
GGB51884_SGB49168|SGB49168 0.2531832 0.7096334 -1.1042940 1.6106605 -0.0402421 0.9442836
GGB9707_SGB15229|SGB15229 -0.3734095 0.7430763 -2.6480569 1.9012379 1.5778412 0.1064571
GGB45432_SGB63101|SGB63101 -0.2969978 0.7559417 -2.2050470 1.6110514 0.7055560 0.3851098
Eubacterium_ramulus|SGB4959 0.1723507 0.7900212 -1.1202429 1.4649442 0.1560246 0.7760604
GGB9765_SGB15382|SGB15382 0.2354550 0.8040675 -1.6600181 2.1309281 0.4150256 0.6061898
GGB9531_SGB14932|SGB14932 -0.1410040 0.8232285 -1.4016601 1.1196521 -0.2605440 0.6264921
GGB9509_SGB14906|SGB14906 0.0949757 0.8511105 -0.9157040 1.1056554 -0.6725130 0.1211102
Ruminococcus_torques|SGB4608 -0.1512453 0.8912781 -2.3615853 2.0590947 -0.5412532 0.5643781
GGB58158_SGB79798|SGB79798 -0.0686542 0.9342381 -1.7308059 1.5934974 0.4839345 0.4934665
Enterocloster_hominis|SGB4721 -0.0476938 0.9466370 -1.4712447 1.3758571 -0.5922229 0.3289834
GGB9616_SGB15051|SGB15051 0.0073253 0.9934289 -1.7695818 1.7842324 0.4521875 0.5492505
Open code

if (file.exists("gitignore/lm_results/result_microbiom_validation_SGB30_VGdur_valid.csv") == FALSE) {
  write.table(result_microbiome_val,
    "gitignore/lm_results/result_microbiom_validation_SGB30_VGdur_valid.csv",
    row.names = FALSE
  )
}

7.4 Forest plot

7.4.1 Prepare data

Open code

## subset result tables
result_microbiome_subset <- result_microbiome %>%
  filter(outcome %in% diet_sensitive_taxa)

result_microbiome_val_subset <- result_microbiome_val %>%
  filter(outcome %in% diet_sensitive_taxa)

## create a data frame
data_forest <- data.frame(
  outcome = rep(diet_sensitive_taxa, 3),
  beta = c(
    result_microbiome_subset$est_Diet_duration_inCZ,
    result_microbiome_subset$est_Diet_duration_inIT,
    result_microbiome_val_subset$est_VGduration
  ),
  lower = c(
    result_microbiome_subset$CI_L_Diet_duration_inCZ,
    result_microbiome_subset$CI_L_Diet_duration_inIT,
    result_microbiome_val_subset$CI_L_VGduration
  ),
  upper = c(
    result_microbiome_subset$CI_U_Diet_duration_inCZ,
    result_microbiome_subset$CI_U_Diet_duration_inIT,
    result_microbiome_val_subset$CI_U_VGduration
  ),
  dataset = c(
    rep("CZ", len),
    rep("IT", len),
    rep("Validation", len)
  )
)


 data_forest <- data_forest %>%
  mutate(in_winner = if_else(outcome %in% winners, TRUE, FALSE, missing = FALSE)) %>%
  left_join(
    elacoef %>% mutate(outcome = microbiome) %>% select(-microbiome), 
    by = 'outcome') %>% 
   mutate(outcome = factor(outcome, levels = validation_order))

7.4.1.1 Plotting

Open code

plotac <- "forest_plot_microbiome_SGB30_VGduration"
path <- "gitignore/figures"

colors <- c("CZ" = "#150999", "IT" = "#329243", "Validation" = "grey60")

assign(plotac, ggplot(
  data_forest, aes(x = outcome, y = beta, ymin = lower, ymax = upper, color = dataset)
) +
  geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
  geom_hline(yintercept = 0, color = "black") +
  geom_errorbar(position = position_dodge(width = 0.5), width = 0.2) +
  scale_color_manual(values = colors) +
  labs(
    y = "Effect of vegan diet duration (+10y) on CLR(relative abundance)",
    x = "Outcome",
    color = "Dataset"
  ) +
  theme_minimal() +
  coord_flip() + 
  scale_x_discrete(
    labels = setNames(
      ifelse(data_forest$in_winner, 
             paste0("**", data_forest$outcome, "**"), 
             as.character(data_forest$outcome)
      ), data_forest$outcome
    )
  ) +
  theme(
    axis.text.x = element_text(size = 10),
    axis.text.y = ggtext::element_markdown(size = 10),  
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    legend.position = "bottom"
  )
)

get(plotac)

Forest plot shows the effect of duration of vegan diet (within the vegan sub-population) on CLR-transformed relative abundance of selected taxa (inferred from shotgun metagenomic sequencing), with 95% confidence intervals, across two training cohorts (Czech and Italian) and one independent Czech validation cohort. Green, blue, and grey points/lines represent the expected change in CLR-transformed relative abundance per +10 years of vegan diet duration within the Italian, Czech, and Czech validation cohorts, respectively. Positive values indicate higher relative abundances in long-term vegans compared to short-term vegans. Only taxa with significant differences between vegans and omnivores in training cohorts (average effect over the two countries) were included. Estimates for the training cohorts come from a single linear model with predictors Diet_duration, Country, their interaction (Diet_duration x Country), and Age as fixed-effect predictors. Age was included as it correlates with Diet_duration and could act as a confounder. In the Czech validation cohort, Diet_duration and Age were fixed-effect predictors. Diet-sensitive taxa are shown in bold, in the same order as in the forest plot of vegan–omnivore differences.
Open code

if (file.exists(paste0(path, "/", plotac, ".svg")) == FALSE) {  
  ggsave(
    path = paste0(path),
    filename = plotac,
    device = "svg",
    width = 9,
    height = 13
  )
}

8 Reproducibility

Open code
sessionInfo()
## R version 4.4.3 (2025-02-28)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=cs_CZ.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=cs_CZ.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=cs_CZ.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Prague
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] vegan_2.6-6.1         lattice_0.22-5        permute_0.9-7        
##  [4] zCompositions_1.5.0-4 truncnorm_1.0-8       NADA_1.6-1.1         
##  [7] survival_3.7-0        MicrobiomeStat_1.2    glmnet_4.1-8         
## [10] pROC_1.18.0           arm_1.12-2            lme4_1.1-35.5        
## [13] Matrix_1.7-0          MASS_7.3-65           car_3.1-2            
## [16] carData_3.0-5         emmeans_1.10.4        brms_2.21.0          
## [19] Rcpp_1.0.13           rms_6.8-1             Hmisc_5.1-3          
## [22] glmmTMB_1.1.9         ggtext_0.1.2          ggdist_3.3.2         
## [25] cowplot_1.1.1         ggpubr_0.4.0          sjPlot_2.8.16        
## [28] kableExtra_1.4.0      flextable_0.9.6       gtsummary_2.0.2      
## [31] compositions_2.0-8    janitor_2.2.0         stringi_1.8.7        
## [34] lubridate_1.9.4       forcats_1.0.0         stringr_1.5.1        
## [37] dplyr_1.1.4           purrr_1.0.4           readr_2.1.5          
## [40] tidyr_1.3.1           tibble_3.3.0          ggplot2_3.5.1        
## [43] tidyverse_1.3.1       readxl_1.3.1          openxlsx_4.2.8       
## [46] RJDBC_0.2-10          rJava_1.0-11          DBI_1.2.3            
## 
## loaded via a namespace (and not attached):
##   [1] fs_1.6.6                matrixStats_1.3.0       httr_1.4.2             
##   [4] insight_0.20.2          numDeriv_2016.8-1.1     tools_4.4.3            
##   [7] backports_1.5.0         utf8_1.2.6              sjlabelled_1.2.0       
##  [10] R6_2.6.1                mgcv_1.9-1              withr_3.0.2            
##  [13] Brobdingnag_1.2-7       prettyunits_1.2.0       gridExtra_2.3          
##  [16] bayesm_3.1-6            quantreg_5.98           cli_3.6.5              
##  [19] textshaping_0.3.6       performance_0.12.2      officer_0.6.6          
##  [22] sandwich_3.0-1          labeling_0.4.2          mvtnorm_1.1-3          
##  [25] robustbase_0.93-9       polspline_1.1.25        ggridges_0.5.3         
##  [28] askpass_1.1             QuickJSR_1.3.1          commonmark_1.9.1       
##  [31] systemfonts_1.0.4       StanHeaders_2.32.10     foreign_0.8-90         
##  [34] gfonts_0.2.0            svglite_2.1.3           rstudioapi_0.16.0      
##  [37] httpcode_0.3.0          generics_0.1.4          shape_1.4.6            
##  [40] distributional_0.4.0    zip_2.2.0               inline_0.3.19          
##  [43] loo_2.4.1               abind_1.4-5             lifecycle_1.0.4        
##  [46] multcomp_1.4-18         yaml_2.3.10             snakecase_0.11.1       
##  [49] grid_4.4.3              promises_1.2.0.1        crayon_1.5.3           
##  [52] haven_2.4.3             pillar_1.11.0           knitr_1.50             
##  [55] statip_0.2.3            boot_1.3-31             estimability_1.5.1     
##  [58] codetools_0.2-19        glue_1.7.0              V8_4.4.2               
##  [61] fontLiberation_0.1.0    data.table_1.15.4       vctrs_0.6.5            
##  [64] cellranger_1.1.0        gtable_0.3.0            assertthat_0.2.1       
##  [67] datawizard_0.12.2       xfun_0.52               mime_0.12              
##  [70] coda_0.19-4             modeest_2.4.0           timeDate_3043.102      
##  [73] iterators_1.0.14        statmod_1.4.36          TH.data_1.1-0          
##  [76] nlme_3.1-168            fontquiver_0.2.1        rstan_2.32.6           
##  [79] fBasics_4041.97         tensorA_0.36.2.1        TMB_1.9.14             
##  [82] rpart_4.1.24            colorspace_2.0-2        nnet_7.3-20            
##  [85] tidyselect_1.2.1        processx_3.8.4          timeSeries_4032.109    
##  [88] compiler_4.4.3          curl_6.4.0              rvest_1.0.2            
##  [91] htmlTable_2.4.0         SparseM_1.81            xml2_1.3.3             
##  [94] fontBitstreamVera_0.1.1 posterior_1.6.0         checkmate_2.3.2        
##  [97] scales_1.3.0            DEoptimR_1.0-10         callr_3.7.6            
## [100] spatial_7.3-15          digest_0.6.37           minqa_1.2.4            
## [103] rmarkdown_2.27          htmltools_0.5.8.1       pkgconfig_2.0.3        
## [106] base64enc_0.1-3         stabledist_0.7-2        dbplyr_2.1.1           
## [109] fastmap_1.2.0           rlang_1.1.6             htmlwidgets_1.6.4      
## [112] shiny_1.9.1             farver_2.1.0            zoo_1.8-9              
## [115] jsonlite_2.0.0          magrittr_2.0.3          Formula_1.2-4          
## [118] bayesplot_1.8.1         munsell_0.5.0           gdtools_0.3.7          
## [121] stable_1.1.6            plyr_1.8.6              pkgbuild_1.3.1         
## [124] parallel_4.4.3          ggrepel_0.9.5           sjmisc_2.8.10          
## [127] ggeffects_1.7.0         splines_4.4.3           gridtext_0.1.5         
## [130] hms_1.1.3               sjstats_0.19.0          ps_1.7.7               
## [133] uuid_1.0-3              markdown_1.13           ggsignif_0.6.3         
## [136] stats4_4.4.3            rmutil_1.1.10           rstantools_2.1.1       
## [139] crul_1.5.0              reprex_2.0.1            evaluate_1.0.4         
## [142] RcppParallel_5.1.8      modelr_0.1.8            nloptr_2.0.0           
## [145] tzdb_0.5.0              foreach_1.5.2           httpuv_1.6.5           
## [148] MatrixModels_0.5-3      openssl_1.4.6           clue_0.3-65            
## [151] broom_1.0.6             xtable_1.8-4            rstatix_0.7.0          
## [154] later_1.3.0             viridisLite_0.4.0       ragg_1.4.0             
## [157] lmerTest_3.1-3          cluster_2.1.8.1         timechange_0.3.0       
## [160] bridgesampling_1.1-2

References

Lenth, Russell V. 2024. “Emmeans: Estimated Marginal Means, Aka Least-Squares Means.” https://CRAN.R-project.org/package=emmeans.