Department of Statistics

Publications

Publication Short description of the content
Grouped regression modeling of proteins
Jonas Heiner, Jan G. Hengstler, Andreas Groll
(2023): Grouped regression modeling of proteins, Proceedings of the 37th International Workshop on Statistical Modelling. Volume 1, 451-456.
https://iwsm2023.statistik.tu-dortmund.de/storages/iwsm2023-statistik/r/dokumente/IWSM_2023_Conference_Proceedings.pdf

part of research area "Integration"
In an organism, proteins arise from genes via transcription and translation. The expression of a gene and the resulting level of the corresponding protein are known to be positively correlated, but gene expression explains only a relatively small fraction of the variance of protein expression. This motivates the use of regression models to investigate the relationship between gene expression and protein levels. Co-expression analysis is used to group genes, so that the regression models can additionally include the grouped genes as covariates when modeling a protein’s expression. Quality measures are compared across the models and show a clear improvement in protein modeling when the grouping information of the genes is included.
logicDT: a procedure for identifying response-associated interactions between binary predictors
Michael Lau, Tamara Schikowski, Holger Schwender
Machine Learning 113, 933–992, 2024.
https://doi.org/10.1007/s10994-023-06488-6
Conventional statistical learning procedures that can autonomously detect interaction effects typically do not yield interpretable models, are limited to lower order interactions, or require modeling assumptions that might not hold. We propose a statistical learning method that fits a single interpretable and highly predictive decision tree by identifying important predictors and interactions between predictors. The method is, furthermore, accompanied by a variable importance measure for assessing the magnitude of marginal and interaction effects.
Benefit of using interaction effects for the analysis of high-dimensional time-response or dose-response data for two-group comparisons
Julia C. Duda, Carolin Drenda, Hue Kästel, Jörg Rahnenführer, Franziska Kappenberg. Scientific Reports [ISSN: 2045-2322], 13(1) (2023).
https://doi.org/10.1038/s41598-023-47057-0

part of research area "Prediction"
Based on our experience in collaborative projects, we identified a common experimental setup in toxicology for which a more promising analysis approach is often not considered.
In a two-group comparison scenario with two factors (e.g.: Treatment A and B and Genotype 0 and 1), it is typically of interest if the treatment effect differs w.r.t. the genotype.
For such a common experimental scenario, the statistical modeling of an interaction effect likely captures the research question very well, but is often not considered. This work targets practitioners and explains in detail the potential benefit of using interaction effects in such settings.
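As a minimal illustration of what an interaction effect measures in such a two-by-two setup, the sketch below computes the treatment effect separately for each genotype and then their difference. In a linear model with both main effects and their product term, the coefficient of the product term equals this difference of differences. The function name and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean

def interaction_effect(cells):
    """Difference-of-differences estimate for a 2x2 design.

    cells maps (treatment, genotype) -> list of responses,
    with treatment in {"A", "B"} and genotype in {0, 1}.
    """
    m = {key: mean(values) for key, values in cells.items()}
    effect_g0 = m[("B", 0)] - m[("A", 0)]  # treatment effect in genotype 0
    effect_g1 = m[("B", 1)] - m[("A", 1)]  # treatment effect in genotype 1
    return effect_g0, effect_g1, effect_g1 - effect_g0

# Hypothetical toy data, for illustration only.
toy_cells = {
    ("A", 0): [1.0, 1.2, 0.8],
    ("B", 0): [2.0, 2.1, 1.9],
    ("A", 1): [1.1, 0.9, 1.0],
    ("B", 1): [4.0, 4.2, 3.8],
}

effect_g0, effect_g1, interaction = interaction_effect(toy_cells)
print(effect_g0, effect_g1, interaction)
```

A nonzero interaction indicates that the treatment effect differs between the genotypes; whether it is statistically significant would have to be assessed with a proper model fit and test, which this sketch omits.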
Inhibition of the Renal Apical Sodium Dependent Bile Acid Transporter Prevents Cholemic Nephropathy in Mice with Obstructive Cholestasis

Ahmed Ghallab, Daniela González, Ellen Strängberg, Ute Hofmann, Maiju Myllys, Reham Hassan, Zaynab Hobloss, Lisa Brackhagen, Brigitte Begher-Tibbe, Julia C. Duda, Carolin Drenda, Franziska Kappenberg, Joerg Reinders, Adrian Friebel, Mihael Vucur, Monika Turajski, Abdel-latief Seddek, Tahany Abbas, Noha Abdelmageed, Samy A.F. Morad, Walaa Morad, Amira Hamdy, Wiebke Albrecht, Naim Kittana, Mohyeddin Assali, Nachiket Vartak, Christoph van Thriel, Ansam Sous, Patrick Nell, Maria Villar-Fernandez, Cristina Cadenas, Erhan Genc, Rosemarie Marchan, Tom Luedde, Peter Åkerblad, Jan Mattsson, Hanns-Ulrich Marschall, Stefan Hoehme, Guido Stirnimann, Matthias Schwab, Peter Boor, Kerstin Amann, Jessica Schmitz, Jan H. Bräsen, Jörg Rahnenführer, Karolina Edlund, Saul J. Karpen, Benedikt Simbrunner, Thomas Reiberger, Mattias Mandorfer, Michael Trauner, Paul A. Dawson, Erik Lindström, Jan G. Hengstler.
Journal of Hepatology, 80(2), 268-281, 2023.
https://doi.org/10.1016/j.jhep.2023.10.035


part of research area "Prediction"
Cholemic nephropathy (CN) is a complication of liver disease for which no treatment has been available so far. In this work, therapeutic strategies for CN were identified. The mechanism of CN was analyzed and confirmed experimentally: it is triggered by the accumulation of bile acids (BA) in kidney cells, which take up BA through certain transporters. A newly developed substance that blocks these transporters reduces the BA uptake, which in turn almost entirely cures CN in mouse models.
By providing tailored gene-expression analyses, the RTG helped to uncover the beneficial effects of the suggested treatment from a bioinformatical, high-dimensional perspective.
Information sharing in high-dimensional gene expression data for improved parameter estimation in concentration-response modelling

Franziska Kappenberg, Jörg Rahnenführer.
Plos one 18.10 (2023): e0293180.
https://doi.org/10.1371/journal.pone.0293180

part of research area "Prediction"
Determination of an alert concentration based on parametric modelling is a frequent goal in toxicology, and in gene expression experiments, concentration-response profiles can be measured for thousands of genes simultaneously. In this paper, we propose an empirical Bayes approach to information sharing across genes. In essence, for one parameter of a fitted model, a weighted mean of the individual estimate for a gene and the mean of the estimates across the entire set of genes is calculated. Results of a simulation study show an improvement in terms of the mean squared error (MSE) between estimate and true parameter value for many genes, although the MSE increases for some genes.
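The weighted-mean idea can be sketched as follows. This is a generic shrinkage sketch and not the estimator from the paper: the weights are an assumption here, taken from a simple normal-normal model in which the between-gene variance is estimated by the method of moments. All names and numbers are hypothetical.

```python
from statistics import mean, variance

def shrink_estimates(estimates, std_errors):
    """Shrink per-gene estimates toward the mean over all genes.

    Assumed weighting (not from the paper): w = tau2 / (tau2 + se^2),
    where tau2 is a method-of-moments estimate of the between-gene variance.
    """
    grand_mean = mean(estimates)
    mean_se2 = mean([se ** 2 for se in std_errors])
    tau2 = max(0.0, variance(estimates) - mean_se2)
    shrunk = []
    for est, se in zip(estimates, std_errors):
        denom = tau2 + se ** 2
        weight = tau2 / denom if denom > 0 else 1.0
        # Weighted mean of the individual estimate and the overall mean.
        shrunk.append(weight * est + (1.0 - weight) * grand_mean)
    return shrunk

# Hypothetical parameter estimates for four genes and their standard errors.
estimates = [1.0, 2.0, 3.5, 6.0]
std_errors = [0.5, 0.5, 2.0, 0.5]
shrunk = shrink_estimates(estimates, std_errors)
print(shrunk)
```

Under this assumed weighting, imprecise estimates (large standard error) are pulled more strongly toward the overall mean than precise ones. A gene whose true parameter lies far from the mean can therefore end up with a worse estimate after shrinkage, which is consistent with the increase in MSE reported for some genes.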
Designs for the simultaneous inference of concentration–response curves

Leonie Schürmeyer, Kirsten Schorning, Jörg Rahnenführer.
BMC Bioinformatics 24, 393 (2023).
https://doi.org/10.1186/s12859-023-05526-3

part of research area "Prediction"
We compare different design approaches for the simultaneous inference of concentration-response curves from microarray gene expression data. To this end, we developed a simultaneous D-optimal design. This design performs best, while commonly used designs such as the log-equidistant design are not appropriate for microarray gene expression data. Our findings are based on a theoretical analysis with D-efficiencies and an extensive simulation study.
Classification of hepatotoxicity of compounds based on cytotoxicity assays is improved by additional interpretable summaries of high-dimensional gene expression data

Marieke Stolte, Wiebke Albrecht, Tim Brecklinghaus, Lisa Gründler, Peng Chen, Jan G. Hengstler, Franziska Kappenberg, Jörg Rahnenführer.
Computational Toxicology 28 (2023): 100288.
https://doi.org/10.1016/j.comtox.2023.100288

part of research area "Prediction"
We propose several intuitive methods for dimension reduction of gene expression measurements toward interpretable variables and explore their relevance in predicting hepatotoxicity when combined with cytotoxicity data. Different advanced statistical learning algorithms are evaluated as classification methods and their performances are compared on a dataset of 60 compounds. It is shown that the simultaneous use of data from cytotoxicity assays and from gene expression variables summarized in different ways has a synergistic effect and leads to a better prediction of hepatotoxicity than both sets of variables individually.
Transcriptome-based prediction of drugs, inhibiting cardiomyogenesis in human induced pluripotent stem cells
Anna Cherianidou, Franziska Kappenberg, Florian Seidel, Aviseka Acharya, Panagiota Papazoglou, Sureshkumar Perumal Srinivasan, Jürgen Hescheler, Luying Peng, Marcel Leist, Jan G. Hengstler, Jörg Rahnenführer, Agapios Sachinidis.
Cell Death Discovery 9.1 (2023): 321
https://doi.org/10.1038/s41420-023-01616-6

part of research area "Prediction"
Human induced pluripotent stem cells (hiPSCs) are analysed to identify a key cardiomyogenesis gene signature that can be applied to identify compounds and stress factors compromising the cardiomyogenesis process. Three retinoids are identified that completely block the process of cardiomyogenesis in hiPSCs, and a gene signature consisting of 31 genes and associated biological processes that are affected by these retinoids are identified.
Risk assessment of parabens in a transcriptomics-based in vitro test
Florian Seidel, Franziska Kappenberg, Susann Fayyaz, Andreas Scholtz-Illigens, Anna Cherianidou, Katharina Derksen, Patrick Nell, Rosemarie Marchan, Karolina Edlund, Marcel Leist, Agapios Sachinidis, Jörg Rahnenführer, Reinhard Kreiling, Jan G. Hengstler.
Chemico-Biological Interactions 384 (2023): 110699
https://doi.org/10.1016/j.cbi.2023.110699

part of research area "Prediction"
Parabens have been used as preservatives in food, drugs, and cosmetics for decades, but the majority were banned in 2009 and 2014. Only four types of parabens are still available for use, two of which have been extensively tested in vivo with no resulting evidence for developmental and reproductive toxicity (DART). In this work, ethylparaben is analysed via toxicity and gene expression assays and the results are compared to those of the two parabens for which no evidence of DART could be found in vivo.
Guidance for statistical design and analysis of toxicological dose–response experiments, based on a comprehensive literature review
Franziska Kappenberg, Julia C. Duda, Leonie Schürmeyer, Onur Gül, Tim Brecklinghaus, Jan G. Hengstler, Kirsten Schorning, Jörg Rahnenführer. Archives of Toxicology. 2023, 1-21.
https://doi.org/10.1007/s00204-023-03561-w

part of research area "Prediction"
There is a discrepancy between the state of the art in statistical methodological research on the analysis of dose-response experiments and what is done in the published toxicological literature. In this paper, a comprehensive literature review is conducted to quantify the extent of this discrepancy with respect to three aspects of such experiments: biological background, statistical design, and statistical analysis. Based on the findings of the review, three selected issues are discussed critically in the context of statistical research, and concrete guidance for the planning, execution, and analysis of dose-response studies from a statistical viewpoint is proposed.
Estimating the Relative Contribution of Environmental and Genetic Risk Factors to Different Aging Traits by Combining Correlated Variables into Weighted Risk Scores
Claudia Wigmann, Anke Hüls, Jean Krutmann, Tamara Schikowski. International Journal of Environmental Research and Public Health. 2022; 19(24):16746.
https://doi.org/10.3390/ijerph192416746

part of research area "Integration"
Genetic and exposomal factors (e.g. air pollution, tobacco smoke) contribute to human aging. For prevention purposes, it is highly desirable to know the extent to which each category of exposomal and genetic factors contributes to the development of aging traits. We use weighted risk scores to assess the combined effects of categories of such predictors, and a measure of relative importance to estimate their relative contribution to lung and skin aging in a cohort of elderly Caucasian women. The proposed approach enables us to quantify and rank the contributions of categories of exposomal and genetic factors to human aging traits as well as to health outcomes in general.
Efficient gene-environment interaction testing through bootstrap aggregating
Michael Lau, Sara Kress, Tamara Schikowski, Holger Schwender. Scientific Reports 13.1 (2023): 937.
https://doi.org/10.1038/s41598-023-28172-4

part of project R3
Gene-environment (GxE) interactions are an important and sophisticated component in the development of complex phenotypes. Established methods for detecting GxE interactions suffer from a low statistical power due to limited modeling or the need of data splitting to avoid overfitting. We propose a GxE interaction testing procedure based on bagging (bootstrap aggregating) that utilizes the full data set for both fitting a genetic risk score and testing a GxE interaction by employing the out-of-bag prediction mechanism.
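The out-of-bag mechanism mentioned above can be sketched generically: each model is fitted on a bootstrap sample, and every observation is predicted only by those models that did not see it during fitting. The sketch below is not the authors' procedure. A simple least-squares line stands in for the genetic risk score model, and all function names and data are hypothetical.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Least-squares line; a stand-in for a real risk score model."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate bootstrap sample: predict the mean
        return lambda x: my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def oob_predictions(xs, ys, fit, n_bags=200, seed=0):
    """Average, per observation, the predictions of models that never saw it."""
    rng = random.Random(seed)
    n = len(xs)
    sums, counts = [0.0] * n, [0] * n
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        model = fit([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # observation i is out of bag for this model
                sums[i] += model(xs[i])
                counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical noiseless toy data on the line y = 2x + 1.
xs = [i / 10 for i in range(30)]
ys = [2 * x + 1 for x in xs]
preds = oob_predictions(xs, ys, fit_line)
```

Because each out-of-bag prediction comes from models fitted without the corresponding observation, the predictions can be used in a subsequent test on the full data set without a separate data split.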
Identifying alert concentrations using a model-based bootstrap approach
Kathrin Möllenhoff, Kirsten Schorning, Franziska Kappenberg.
Biometrics 79.3 (2023)
https://doi.org/10.1111/biom.13799

part of project P7
This paper proposes a new model-based method to identify alert concentrations, based on fitting a concentration-response curve and constructing a simultaneous confidence band for the difference between the response at a given concentration and the response of the control. The confidence bands are obtained using a bootstrap approach, which can be applied to any functional form of the concentration-response curve. In particular, this makes it possible to investigate situations where the concentration-response relationship is not monotone and to detect alerts at concentrations that were not measured during the study, providing a highly flexible framework for determining alert concentrations.
High Accuracy Classification of Developmental Toxicants by In Vitro Tests of Human Neuroepithelial and Cardiomyoblast Differentiation
Florian Seidel, Anna Cherianidou, Franziska Kappenberg, Miriam Marta, Nadine Dreser, Jonathan Blum, Tanja Waldmann, Nils Blüthgen, Johannes Meisig, Katrin Madjar, Margit Henry, Tamara Rotshteyn, Andreas Scholtz-Illigens, Rosemarie Marchan, Karolina Edlund, Marcel Leist, Jörg Rahnenführer, Agapios Sachinidis, Jan Georg Hengstler. Cells, 11(21), Article 3404, 2022
https://doi.org/10.3390/cells11213404

part of research area "Prediction"
Recently, the UKK2 in vitro test based on cardiac differentiation was proposed to predict the risk of developmental toxicity of compounds. Here, the new UKN1 assay modeling neuroepithelial differentiation is proposed and analyzed, including the benefit of combining the UKK2 and UKN1 assays for the classification of compounds. The UKN1 assay achieved accuracies between 87 and 90%, and the combination of both assays yielded even higher accuracies, with generally high congruence in compound classification and a high overlap of signaling pathways.
In vitro/in silico prediction of drug induced steatosis in relation to oral doses and blood concentrations by the Nile Red Assay
Tim Brecklinghaus, Wiebke Albrecht, Julia Duda, Franziska Kappenberg, Lisa Gründler, Karolina Edlund, Rosemarie Marchan, Ahmed Ghallab, Cristina Cadenas, Adrian Rieck, Nachiket Vartak, Laia Tolosa, José V. Castell, Iain Gardner, Emina Halilbasic, Michael Trauner, Anett Ullrich, Anja Zeigerer, Özlem Demirci Turgunbayer, Georg Damm, Daniel Seehofer, Jan G. Hengstler. Toxicology Letters 268: 33-46, 2022
https://doi.org/10.1016/j.toxlet.2022.08.006

part of project P1 and P2
The accumulation of lipid droplets, a key feature of drug-induced liver injury, is quantified by a new biological assay based on fluorescent dye Nile Red. The method aims at distinguishing between hepatotoxic and non-hepatotoxic compounds, based on the determination of alert concentration from toxicological data. Data from this assay were modeled in a flexible manner using the MCP-Mod (Multiple Comparison Procedure and Modeling) method to determine different alert concentrations. The new assay was combined with an existing assay, the CTB assay, resulting in improved discrimination between hepatotoxic and non-hepatotoxic compounds compared to using the assays alone.
An intuitive time-dose-response model for cytotoxicity data with varying exposure durations
Julia Duda, Jan G. Hengstler, Jörg Rahnenführer. Computational Toxicology 23 (2022): 100234
https://doi.org/10.1016/j.comtox.2022.100234

part of project P2
Statistical modeling approaches for dose-response analyses are often required in toxicological applications. By fitting a concentration-response curve, one can derive certain concentrations of interest. In practice, concentration-response data for different exposure durations might be available and the target concentration for each exposure duration is of interest. In this work, we propose a two-dimensional model that considers both the concentration and exposure duration to improve target concentration estimation.
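To illustrate how a concentration of interest is derived from a fitted curve, the sketch below inverts a four-parameter log-logistic model for a single exposure duration. The model choice, the parameter values, and the function names are assumptions made for illustration. The sketch does not implement the two-dimensional time-dose-response model proposed in the paper.

```python
def log_logistic(conc, b, c, d, e):
    """Four-parameter log-logistic curve (assumed model, conc > 0).

    b: slope, c and d: the two asymptotes, e: half-effect concentration.
    """
    return c + (d - c) / (1.0 + (conc / e) ** b)

def concentration_for_response(y, b, c, d, e):
    """Invert the curve: concentration at which the response equals y."""
    if not min(c, d) < y < max(c, d):
        raise ValueError("target response must lie strictly between the asymptotes")
    return e * ((d - c) / (y - c) - 1.0) ** (1.0 / b)

# Hypothetical fitted parameters: viability falls from 100 to 0, half effect at 5.
params = dict(b=2.0, c=0.0, d=100.0, e=5.0)

# EC10 here: concentration at which 10% of the span d - c is lost (response 90).
target = params["d"] - 0.10 * (params["d"] - params["c"])
ec10 = concentration_for_response(target, **params)
print(ec10, log_logistic(ec10, **params))
```

In practice the parameters would come from fitting the model to measured data, and a separate fit per exposure duration would give one target concentration per duration. The paper's contribution is to model concentration and exposure duration jointly instead.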
Classification of Developmental Toxicants in a Human iPSC Transcriptomics-Based Test
Anna Cherianidou, Florian Seidel, Franziska Kappenberg, Nadine Dreser, Jonathan Blum, Tanja Waldmann, Nils Blüthgen, Johannes Meisig, Katrin Madjar, Margit Henry, Tamara Rotshteyn, Rosemarie Marchan, Karolina Edlund, Marcel Leist, Jörg Rahnenführer, Agapios Sachinidis, and Jan G. Hengstler. Chemical Research in Toxicology 35(5): 760-773, 2022
https://doi.org/10.1021/acs.chemrestox.1c00392

part of research area "Prediction"
The assessment of the potential for developmental toxicity in drugs is usually based on in vivo testing, and thus comes with high costs and a high number of required animals. Here, the hiPSC-based UKK2 in vitro test, using genome-wide expression profiles, is proposed. Two classifiers were considered. For a set of 23 compounds that are known to cause developmental toxicity (teratogens) and 16 non-teratogens, the cross-validated AUC was 0.96 when information about cytotoxicity was included in the l1-penalized logistic regression-based classifier.
Evaluation of tree-based statistical learning methods for constructing genetic risk scores.
Michael Lau, Claudia Wigmann, Sara Kress, Tamara Schikowski, Holger Schwender. BMC Bioinformatics 23, 97, 2022.
https://doi.org/10.1186/s12859-022-04634-w

part of project R3
Genetic risk scores are a valuable tool for assessing individual disease risks and uncovering biological mechanisms. Thus far, mainly linear construction approaches not considering gene-gene interactions are employed. In simulations and a real data application, we show that tree-based approaches based on random forests and logic regression are able to yield superior genetic risk score models.
Influence of bile acids on the cytotoxicity of chemicals in cultivated human hepatocytes
Tim Brecklinghaus, Wiebke Albrecht, Franziska Kappenberg, Julia Duda, Mian Zhang, Iain Gardner, Rosemarie Marchan, Ahmed Ghallab, Özlem Demirci Turgunbayer, Jörg Rahnenführer, Jan G. Hengstler. Toxicology In Vitro 81: 105344, 2022
https://doi.org/10.1016/j.tiv.2022.105344

part of project P1 and P2
Bile acids are known to influence the susceptibility of hepatocytes to chemicals. Cytotoxicity of 18 compounds with known hepatotoxicity status was assessed with and without the addition of a bile acids mix. EC10 values of 7 compounds were notably decreased by the bile acids, and notably increased for 5 compounds. No improvement of the separation between hepatotoxic and non-hepatotoxic compounds, assessed by a recently introduced method, could be observed.
Model selection characteristics when using MCP-Mod for dose-response gene expression data.
Julia C Duda, Franziska Kappenberg, Jörg Rahnenführer. Biometrical Journal 64.5 (2022)
https://doi.org/10.1002/bimj.202000250

part of project P2
Advances in genomics bring forward increasingly large omics data sets, such that even concentration-resolved gene expression data are available. We transfer well-established dose-response theory from clinical research to toxicological gene expression data. Multiple Comparison Procedure and Modeling (MCP-Mod) is a relatively new dose-response modeling technique developed for Phase II clinical dose-finding trials that accounts for model uncertainty. By applying MCP-Mod to a concentration-resolved gene expression data set, we find that the commonly assumed monotonicity is not adequate and that model uncertainty should be considered.
The hepatocyte export carrier inhibition assay improves the separation of hepatotoxic from non-hepatotoxic compounds.
Tim Brecklinghaus, Wiebke Albrecht, Franziska Kappenberg, Julia Duda, Nachiket Vartak, Karolina Edlund, Rosemarie Marchan, Ahmed Ghallab, Cristina Cadenas, Georgia Günther, Marcel Leist, Mian Zhang, Iain Gardner, Jörg Reinders, Frans GM. Russel, Alison J. Foster, Dominic P. Williams, Amruta Damle-Vartak, Melanie Grandits, Gerhard Ecker, Naim Kittana, Jörg Rahnenführer, Jan G. Hengstler. Chemico-Biological Interactions 351: 109728, 2021
https://doi.org/10.1016/j.cbi.2021.109728

part of project P1 and P2.
The risk of drug-induced liver injury has recently been addressed by a new method aimed at distinguishing between hepatotoxic and non-hepatotoxic compounds, based on the determination of alert concentration from toxicological data. In this work, a new biological assay was used to calculate the alert concentration. Data from this assay were modeled in a flexible manner using the MCP-Mod (Multiple Comparison Procedure and Modeling) method to determine different alert concentrations, resulting in improved discrimination between hepatotoxic and non-hepatotoxic compounds.
Spatio-Temporal Multiscale Analysis of Western Diet-Fed Mice Reveals a Translationally Relevant Sequence of Events during NAFLD Progression.
Ahmed Ghallab, Maiju Myllys, Adrian Friebel, Julia Duda, Karolina Edlund, Emina Halilbasic, Mihael Vucur, Zaynab Hobloss, Lisa Brackhagen, Brigitte Begher-Tibbe, Reham Hassan, Michael Burke, Erhan Genc, Lynn Johann Frohwein, Ute Hofmann, Christian H. Holland, Daniela González, Magdalena Keller, Abdel-latif Seddek, Tahany Abbas, Elsayed S.I. Mohammed, Andreas Teufel, Timo Itzel, Sarah Metzler, Rosemarie Marchan, Cristina Cadenas, Carsten Watzl, Michael A. Nitsche, Franziska Kappenberg, Tom Luedde, Thomas Longerich, Jörg Rahnenführer, Stefan Hoehme, Michael Trauner, Jan G. Hengstler. Cells 10(10): 2516, 2021.
https://doi.org/10.3390/cells10102516

part of project P2.
Non-alcoholic fatty liver disease (NAFLD) is a chronic liver disease that affects more than one billion people worldwide, with an increasing incidence. We analyze mice that are fed a fast-food style 'Western Diet' (WD), a well-known contributor to human NAFLD. This work is the first time-resolved study of NAFLD in the sense that pathophysiological and transcriptomic changes of the mice are analyzed across 9 different feeding durations with WD. A series of key events that occur with prolonged WD, such as lipid droplet formation and hepatocellular cancer, were identified. These key events recapitulate many features of human disease and offer a basis for the identification of therapeutic targets.