Run-time Adaptation of Distributed Software Systems

Software applications increasingly run in highly dynamic and heterogeneous execution environments that require them to adapt their behavior and functionality to the user’s and the system’s current situation. Furthermore, such applications are often distributed over multiple interconnected devices that cooperate to achieve a common goal.
Adaptation must therefore be coordinated across these devices to ensure consistent system behavior in response to changes in the execution environment.

Existing approaches to distributed adaptable software systems focus on strategies and algorithms that calculate change prescriptions for distributed environments, e.g. DecAp [2], and on software architectures that allow for adaptation, e.g. Rainbow [1].
The execution of such change prescriptions, especially in unstable environments characterized by message loss or temporary partitioning of the system, has not been addressed by these systems.
When adaptation operations must be executed on multiple devices in a coordinated way to keep the system from reaching invalid configurations, adaptation in such unstable environments becomes a challenging task.

Our approach is based on the role concept [3] and uses a distributed middleware architecture of *execution runtimes* that perform adaptation operations on the adaptable software system.
We propose a set of adaptation operations that explicitly allows changing variable parts of the system on multiple devices, e.g. adding or removing system functionality, or migrating it from one device to another.
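To make such an operation set more concrete, the following Python sketch models a configuration as a mapping from devices to the roles (variable functionality) they currently host, with add, remove, and migrate operations. All names (`AddRole`, `RemoveRole`, `MigrateRole`, `apply`, `affected_devices`) are hypothetical and do not describe the actual execution runtimes; the sketch only illustrates why a migration touches two devices and therefore needs coordination.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union

# Hypothetical model: each device maps to the set of roles it currently hosts.
Configuration = Dict[str, Set[str]]


@dataclass(frozen=True)
class AddRole:
    role: str
    device: str


@dataclass(frozen=True)
class RemoveRole:
    role: str
    device: str


@dataclass(frozen=True)
class MigrateRole:
    role: str
    source: str
    target: str


Operation = Union[AddRole, RemoveRole, MigrateRole]


def apply(config: Configuration, op: Operation) -> Configuration:
    """Return a new configuration with `op` applied; the input is not modified."""
    new = {device: set(roles) for device, roles in config.items()}
    if isinstance(op, AddRole):
        new.setdefault(op.device, set()).add(op.role)
    elif isinstance(op, RemoveRole):
        if op.role not in new.get(op.device, set()):
            raise ValueError(f"{op.role} is not hosted on {op.device}")
        new[op.device].remove(op.role)
    elif isinstance(op, MigrateRole):
        # A migration spans two devices: remove on the source, then add on the target.
        new = apply(new, RemoveRole(op.role, op.source))
        new = apply(new, AddRole(op.role, op.target))
    else:
        raise TypeError(f"unknown operation: {op!r}")
    return new


def affected_devices(op: Operation) -> Set[str]:
    """Devices whose execution runtimes must take part in executing `op`."""
    if isinstance(op, MigrateRole):
        return {op.source, op.target}
    return {op.device}


config = {"phone": {"Display"}, "tv": set()}
migration = MigrateRole("Display", source="phone", target="tv")
print(apply(config, migration))             # {'phone': set(), 'tv': {'Display'}}
print(sorted(affected_devices(migration)))  # ['phone', 'tv']
```

Treating a migration as a remove on one device plus an add on another makes the coordination problem visible: if only one half takes effect, the functionality is either lost or duplicated.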
Complex adaptations, which comprise multiple adaptation operations affecting several devices, require support from the execution runtimes that goes beyond the coordination of single adaptation operations.
We therefore also propose a protocol that specifies the message exchange between the execution runtimes, so that not only a single operation but an entire set of operations performed on several devices can be coordinated.
Besides communication errors between execution runtimes caused by unstable network conditions, adaptation operations may also fail locally, which would result in an invalid system configuration if not handled at run time.
We investigate mechanisms to handle such error scenarios without reverting the entire set of adaptation operations in response, since a full rollback would adversely affect the system’s performance.
We continuously incorporate and test these findings in our protocol to improve the reliability of executing complex adaptations in distributed and unstable environments.
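The following Python sketch illustrates one way a coordinator could execute a set of operations on several devices without reverting everything when one operation fails. It is not the protocol described above: it substitutes a simple saga-style compensation in which operations are grouped into hypothetical atomic units (`Group`), lost messages are retried under the assumption that operations are idempotent, a group whose operation fails locally is undone on its own, and only groups depending on it are skipped. The names `run_adaptation`, `deliver`, `LocalFailure`, and the `send`/`undo` callbacks are assumptions; the sketch also treats an operation whose every message was lost as not executed and ignores failures during compensation, both of which a real protocol would have to resolve.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Send = Callable[[str, str], None]  # (device, operation); may raise


class LocalFailure(Exception):
    """Raised when an operation fails on the device that executes it."""


@dataclass
class Group:
    """Hypothetical atomic unit: steps that are applied or undone together."""
    name: str
    steps: List[Tuple[str, str]]  # (device, operation)
    depends_on: Set[str] = field(default_factory=set)


def deliver(send: Send, device: str, op: str, max_retries: int) -> bool:
    """Send one step; retry lost messages (assumes operations are idempotent)."""
    for _ in range(max_retries):
        try:
            send(device, op)
            return True
        except TimeoutError:
            continue  # message lost or runtime temporarily unreachable
        except LocalFailure:
            return False  # the runtime reported a local failure
    return False


def run_adaptation(groups: List[Group], send: Send, undo: Send,
                   max_retries: int = 3) -> Dict[str, str]:
    """Execute groups in dependency order; compensate failures per group."""
    outcome: Dict[str, str] = {}
    for group in groups:
        if any(outcome.get(dep) != "applied" for dep in group.depends_on):
            outcome[group.name] = "skipped"  # a prerequisite is not in place
            continue
        applied: List[Tuple[str, str]] = []
        for device, op in group.steps:
            if not deliver(send, device, op, max_retries):
                break
            applied.append((device, op))
        if len(applied) == len(group.steps):
            outcome[group.name] = "applied"
        else:
            # Undo only this group's completed steps, newest first.
            for device, op in reversed(applied):
                undo(device, op)
            outcome[group.name] = "compensated"
    return outcome


log: List[Tuple[str, str, str]] = []


def send(device: str, op: str) -> None:
    if (device, op) == ("tv", "add Display"):
        raise LocalFailure("display driver missing")
    log.append(("do", device, op))


def undo(device: str, op: str) -> None:
    log.append(("undo", device, op))


plan = [
    Group("migrate-display", [("phone", "remove Display"), ("tv", "add Display")]),
    Group("add-logger", [("server", "add Logger")]),
    Group("bind-remote", [("tv", "add RemoteControl")], depends_on={"migrate-display"}),
]
print(run_adaptation(plan, send, undo))
# {'migrate-display': 'compensated', 'add-logger': 'applied', 'bind-remote': 'skipped'}
```

In this run, the failed migration is compensated on the phone and the dependent binding is skipped, while the independent logger stays in place instead of being rolled back with the rest of the set.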