CRC 1404/1: FONDA - Foundations of Workflows for Large-Scale Scientific Data Analysis
Facts
Materials Engineering
Materials Science
Systems Engineering
Medicine
Computer Science
DFG Collaborative Research Centre
Description
Scientific discoveries in the natural sciences rely on the computational analysis of large data sets, which is carried out by complex data analysis workflows (DAWs) executed on a distributed infrastructure. Most research on DAWs focuses on techniques for minimizing their runtime on a specific infrastructure, which leads to solutions that are difficult to maintain and that depend on highly specialized and scarce data engineers. However, in most data science projects, the decisive factor is not runtime but development time. FONDA set out in 2020 to address this long-standing and increasingly pressing problem. Our overarching goal is to research languages, technologies, and algorithms that increase human productivity when designing, maintaining, or reusing DAWs for large-scale scientific data analysis. In its first funding period, FONDA focused on three properties of DAWs that are directly linked to human productivity: portability, adaptability, and dependability. FONDA achieved groundbreaking results in these areas, such as improved portability through flexible interfaces between infrastructure components, improved adaptability via intelligent scheduling, and improved dependability through contract-driven DAW development. In its second phase, FONDA will further develop its research topics by lifting three restrictions we imposed on ourselves in phase I. First, we drop the assumption that DAWs are executed in a single data center hosting all necessary data and will study multi-site DAWs, i.e., DAWs whose sub-workflows are executed in different data centers. Second, we extend our scope in terms of the DAW lifecycle by addressing the usability of DAW systems, i.e., empirical investigations of human-computing interfaces and a systematic approach to DAW design. Third, we generalize from single workflows to workflow reuse by researching the technical sustainability of DAWs.
Furthermore, as human productivity in data analysis is increasingly threatened by excessive energy costs, we also focus on improving environmental sustainability. Besides its scientific results, FONDA's first phase also excelled in several overarching topics. With the recent founding of the new HPC@HU service, it had a long-lasting structural impact on the speaker university. The recognition of its highly important research topic at the interface between computer science and the natural sciences is reflected in many recent appointments in the region, which allowed a perfectly matching extension of our PI group. We are proud to have achieved an outstandingly high percentage of female PhD students (38%), and we are looking forward to the new edited book on "Workflows for Large-Scale Scientific Data Analysis", for which more than 100 authors from 15 countries have confirmed contributions and which will appear in summer 2024 with the newly created open-access publisher BerlinUP.
Topics
Organization entities
Department of Computer Science
Address
Johann von Neumann-Haus, Institutsgebäude, Rudower Chaussee 25, 12489 Berlin
General contact
Tel.: 030 2093-41140
Partners
- Cooperation partner, University, Germany
Charité – Berlin University Medicine
- Cooperation partner, Germany
Federal Institute for Materials Research and Testing
- Cooperation partner, University, Germany
Free University of Berlin
- Cooperation partner, Research institute, Germany
Hasso Plattner Institute for Digital Engineering
- Cooperation partner, Non-university research institution, Germany
Helmholtz Centre Potsdam – German Research Centre for Geosciences
- Cooperation partner, Non-university research institution, Germany
Max Delbrück Center for Molecular Medicine in the Helmholtz Association
- Cooperation partner, University, Germany
Technical University of Berlin
- Cooperation partner, University, Germany
Technical University of Darmstadt
- Cooperation partner, University, Germany
University of Potsdam
- Cooperation partner, Research institute, Germany
Zuse Institute Berlin
Child projects
- Project, DFG Collaborative Research Centre, 07/2020 - 06/2024
CRC 1404/1: Adapting Genomic Data Analysis Workflows for Different Data Access Patterns (SP A02)
Project management: Prof. Dr. Ulf Leser
- Project, DFG Collaborative Research Centre, 07/2020 - 06/2024
CRC 1404/1: Adaptive, Distributed and Scalable Analysis of Massive Satellite Data (SP B05)
Project management: Prof. Dr. Ulf Leser, Prof. Dr. Patrick Hostert
- Project, DFG Collaborative Research Centre, 07/2020 - 06/2024
CRC 1404/1: Data Analysis Workflows for Interactive Scientific Exploration (SP A06)
Project management: Prof. Dr. Matthias Weidlich
- Project, DFG Collaborative Research Centre, 07/2020 - 06/2024
CRC 1404/1: Deriving Trust Levels for Multi-Choice Data Analysis Workflows (SP A03)
Project management: Prof. Dr. Dr. h.c. Claudia Draxl, Prof. Dr. Lars Grunske
- Project, DFG Collaborative Research Centre, 07/2020 - 06/2024
CRC 1404/1: Distributed Run-Time Monitoring and Control of Data Analysis Workflows (SP B06)
Project management: Prof. Dr. Lars Grunske
- Project, DFG Collaborative Research Centre, 07/2020 - 06/2024
CRC 1404/1: Exploiting SDNs for Efficient Data Management in Next-Generation Data Analysis Workflows (SP B04)
Project management: Prof. Dr. Björn Scheuermann, Prof. Dr. Alexander Reinefeld
- Project, DFG Collaborative Research Centre, 07/2020 - 06/2024
CRC 1404/1: Foundations of Data Analysis Workflow Validation (SP A01)
Project management: Prof. Dr. Matthias Weidlich, Prof. Dr. Nicole Schweikardt
- Project, DFG Collaborative Research Centre, 07/2020 - 06/2024
CRC 1404/1: MGK: Integrated Research Training Group (SP S02)
Project management: Prof. Dr. Lars Grunske
- Project, DFG Collaborative Research Centre, 07/2020 - 06/2024
CRC 1404/1: Portable and Adaptive Data Analysis Workflows for Real-Time 3D Vision (SP B02)
Project management: Prof. Dr.-Ing. Peter Eisert, Prof. Christoph T. Koch, PhD
- Project, DFG Collaborative Research Centre, 07/2020 - 06/2024
CRC 1404/1: Scheduling and Adaptive Execution of Data Analysis Workflows Across Heterogeneous Infrastructures (SP B01)
Project management: Prof. Dr. Henning Meyerhenke
- Project, DFG Collaborative Research Centre, 07/2020 - 06/2024
CRC 1404/1: Testbeds and Repositories (S01)
Project management: Prof. Dr. Ulf Leser, Malte Dreyer
- Project, DFG Collaborative Research Centre, 07/2020 - 06/2024
CRC 1404/1: Debugging of Distributed Data Analysis Workflows (SP B03)
Project management: Prof. Dr. Timo Kehrer