Dear colleagues,
I would like to invite you to my defense* next Tuesday (16.12.) at 16:30 at Engehalde 8, Seminarraum 109. I would be glad if you also joined an Apéro at the Schmiede afterward. If the weather allows and people are in the mood, we may then go to the Sternenmarkt for a Glühwy.
Cheers
Alex
*Dissertation Title:
Bridging the Data Desert: Mitigating Challenges of Model Accessibility in Simulink Research
Abstract:
Simulink is the industry standard for model-driven development in safety-critical domains such as automotive, aerospace, and medical devices. However, empirical research in the context of Simulink faces a persistent challenge: a scarcity of high-quality, industry-representative models that are essential for rigorous tool evaluation, empirical validation, and reproducible studies. This scarcity not only slows down scientific progress but also contributes to a replication crisis in the field -- primarily due to the unavailability of experimental models.
This thesis addresses this challenge through three interconnected contributions, grounded in a multi-method approach that includes a systematic literature review, empirical case studies, community surveys, dataset analysis, and tool prototyping and validation:
* A diagnosis of model scarcity demonstrating that the lack of models limits the ability to conduct empirical research and also contributes to only 9% of Simulink tool studies meeting replicability criteria (i.e., all artifacts available).
* An assessment of existing open-source Simulink models and datasets, evaluating their suitability for empirical research and investigating their limitations in scale, complexity, and industrial realism. Through case studies -- including model matching, analyzing bus architecture of Simulink models, and investigating commenting practices -- we demonstrate that open-source models, while imperfect, can serve as valuable research subjects for empirical investigation when carefully selected and used appropriately.
* To address the lack of (i) large-scale and (ii) industry-representative models, we developed two novel tools: (i) GRANDSLAM, a linearly scaling synthesizer for Simulink that generates models with adjustable properties, enabling the synthesis of very large open-source models; (ii) SMOKE, a model anonymizer that removes sensitive information from Simulink models while preserving their structural properties, thus facilitating the sharing of real-world models without violating intellectual property constraints.
Our work complements and extends contemporary datasets by showing their suitability for empirical research and providing tools for their expansion. By lowering the barriers to data access, we advance open science in model-driven engineering, enabling replicable studies, specifically large-scale studies that were previously infeasible. The contributions of this thesis are foundational: they narrow the "data desert" in Simulink research and foster collaboration through shareable resources. Beyond immediate applications, our tools and findings support standardized benchmarks, comparative tool evaluations, and longitudinal studies of modeling practices -- ultimately strengthening the empirical rigor and industrial relevance of Simulink research.
In summary, this thesis provides both the evidence of a critical gap in Simulink research and practical solutions to address it, offering a pathway toward more transparent, reproducible, and impactful model-driven engineering.