Transactional Workflows in Self-Organizing Information Systems

My research on Transactional Workflows in Self-Organizing Systems belongs to research field C (Distributed Information Systems) and subproject D1 of the Graduiertenkolleg Metrik.

Thesis Topics in Short

  • workflows and transactions in distributed systems
  • distribution/clustering algorithm for workflow activities/data
  • assignment of workflow activities/data using constraint-solving techniques
  • local failure handling of workflows/transactions (retriable activities)

People

Motivation

Recent disasters, such as the 2004 Asian tsunami, Hurricane Katrina in 2005, or the 2007 forest fires in Greece, exposed the shortcomings of existing information systems for disaster rescue and recovery. In particular, better preparation for future disaster events and efficient coordination of emergency processes shortly after a disaster are crucial for successful disaster management. Systems that support collaborative work among many distributed entities, such as people, components, software services, and machines, are referred to as groupware. The majority of existing groupware systems, such as Workflow Management Systems, lack support for adaptiveness. These process-aware information systems focus on well-known business processes that are executed in a static environment where resources are guaranteed to be available, communication is reliable, and potential partners are known beforehand.

However, workflows can also be applied in other, more dynamic application scenarios, such as wilderness exploration or disaster management. Our graduate school envisions a distributed information system that consists of (wirelessly) communicating sensor networks acting in two roles: (a) as an early-warning system before the disaster event and (b) as an additional information unit that provides humans (e.g. rescue forces) with important data shortly after a disaster event. Dynamic and unforeseen disaster events pose new challenges for the reliable execution of appropriate response workflows in such a distributed environment. The underlying network topology may change during execution; (physical) nodes, i.e. humans as well as devices, may fail, leaving workflow activities uncompleted. Communication links may be temporarily broken, preventing the flow of information and data from machines to machines, machines to humans, or humans to humans. Finally, the resource capabilities of devices and the capacity of humans are limited, which requires careful resource selection and assignment.

Problem Statement

The overall goal of my dissertation is to support the robust execution of workflows in such distributed, dynamically changing information systems. In our understanding, robustness is a system behavior that leads to an appropriate, self-organized reaction to certain unforeseen failure events. However, preventive strategies are required as a complement to a reactive strategy to achieve robustness. In my thesis, we focus on the following building blocks:

  • Designing a suitable process/coordination model for disaster events
  • Designing a model for distributing process activities (control data) and process data
  • Designing algorithms to perform recovery for process activities and process data

Process execution has to cope with heterogeneous and distributed participants, including backend servers, sensor motes, software services, distributed databases, and people. Furthermore, resource capabilities are limited and network changes have to be expected. Considering these characteristics, none of the existing approaches, neither industrial nor academic, has addressed all of these challenges at once.

Suggested Solutions

To address the three research challenges described in the problem statement, we focus on the following (possible) approaches:

  • Process and Network Model: First, we need a suitable model for describing processes in disaster management. Workflow concepts are well suited for scenarios where many distributed entities work together to achieve a common goal. Workflows are conventionally seen as a collection of activities executed in a specific temporal/causal order. Since we are more interested in the physical execution of workflows than in their analysis and verification on the organizational level, we decided to use graph-based representations for workflows (rather than Petri nets). In our model, we distinguish between a global workflow set and so-called local workflow schedules. The former contains multiple workflows, whereas the latter determine the concurrent execution of partial workflows. We assume that the network is divided into several groups (e.g. rescue forces, headquarters) to which the local schedules are assigned.
  • Distribution Issues: Given the process and network model, we are interested in a suitable distribution of workflow activities and data to workflow participants (e.g. servers, sensor motes, distributed databases, people) as part of a preventive strategy. The goal is to partition activities and data efficiently, on the one hand to extend the life span of the system, including people and devices, and on the other hand to support inter-process concurrency. Several research issues arise regarding the distribution process: How should we fragment a workflow set? How much should we fragment? What is a correct decomposition? How should we allocate? What information is necessary for fragmentation and allocation? To answer these questions, we propose a multi-stage procedure divided into a logical (clustering) step and a physical (allocation) step. Based on qualitative and quantitative attributes of the activities (control data), we first identify criteria (including the data flow) for good clusters that group similar activities together. In a second step, these clusters are assigned to the network groups, whereas the actual assignment of activities is conducted locally within each group using constraint-programming techniques.
Logical Partitioning and Physical Allocation of Workflows
  • Recovery Issues: Changes can occur dynamically on the workflow level as well as on the network level. Focusing on the latter case, we distinguish between different dynamic events that can trigger re-scheduling. We classify failure events into communication failures and node failures. For each of these events, we aim to provide reactive strategies to re-schedule activities. Re-scheduling is only possible if an alternative execution path exists. So-called retriable activities are activities that eventually succeed even in the case of a (temporary) node failure. We propose a recovery algorithm that finds a suitable alternative execution path locally within a group or, if re-scheduling within the group is not possible, among the network groups.
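
The allocation and recovery steps described above can be illustrated with a small Java sketch. It is not the thesis algorithm: the clustering step is omitted, the constraint solver is replaced by a plain backtracking search over a single capacity constraint, and all names (Node, Activity, allocate, reschedule, the group labels and the capacities) are hypothetical and chosen only for illustration. The sketch assumes that each activity has an integer resource demand and each node an integer capacity.

```java
import java.util.*;

public class WorkflowAllocationSketch {

    /** A network node (device or person) that belongs to a group and has limited capacity. */
    record Node(String id, String group, int capacity) {}

    /** A workflow activity with a resource demand; only retriable activities are re-scheduled. */
    record Activity(String id, int demand, boolean retriable) {}

    /**
     * Assigns every activity to a node by backtracking so that no capacity is exceeded.
     * Returns activity id -> node id, or null if no feasible assignment exists.
     */
    static Map<String, String> allocate(List<Activity> activities, List<Node> nodes) {
        Map<String, Integer> free = new HashMap<>();
        for (Node n : nodes) free.put(n.id(), n.capacity());
        Map<String, String> assignment = new LinkedHashMap<>();
        return solve(0, activities, nodes, free, assignment) ? assignment : null;
    }

    private static boolean solve(int i, List<Activity> acts, List<Node> nodes,
                                 Map<String, Integer> free, Map<String, String> out) {
        if (i == acts.size()) return true;            // all activities placed
        Activity a = acts.get(i);
        for (Node n : nodes) {
            int remaining = free.get(n.id());
            if (remaining >= a.demand()) {            // capacity constraint holds
                free.put(n.id(), remaining - a.demand());
                out.put(a.id(), n.id());
                if (solve(i + 1, acts, nodes, free, out)) return true;
                free.put(n.id(), remaining);          // undo and try the next node
                out.remove(a.id());
            }
        }
        return false;
    }

    /**
     * Moves the retriable activities of a failed node to surviving nodes: first within
     * the failed node's group, then in other groups. Activities that are not retriable
     * or cannot be placed are absent from the returned map.
     */
    static Map<String, String> reschedule(Node failed, List<Activity> activities,
                                          List<Node> nodes, Map<String, String> assignment) {
        Map<String, Activity> byId = new HashMap<>();
        for (Activity a : activities) byId.put(a.id(), a);
        // Remaining capacity of each surviving node under the current assignment.
        Map<String, Integer> free = new HashMap<>();
        for (Node n : nodes) if (!n.id().equals(failed.id())) free.put(n.id(), n.capacity());
        List<Activity> orphaned = new ArrayList<>();
        for (Map.Entry<String, String> e : assignment.entrySet()) {
            Activity a = byId.get(e.getKey());
            if (e.getValue().equals(failed.id())) {
                if (a.retriable()) orphaned.add(a);
            } else {
                free.merge(e.getValue(), -a.demand(), Integer::sum);
            }
        }
        Map<String, String> moved = new LinkedHashMap<>();
        for (Activity a : orphaned) {
            // Pass 0 considers the failed node's own group, pass 1 the other groups.
            for (int pass = 0; pass < 2 && !moved.containsKey(a.id()); pass++) {
                for (Node n : nodes) {
                    if (!free.containsKey(n.id())) continue;   // skip the failed node
                    boolean sameGroup = n.group().equals(failed.group());
                    if (sameGroup == (pass == 0) && free.get(n.id()) >= a.demand()) {
                        free.merge(n.id(), -a.demand(), Integer::sum);
                        moved.put(a.id(), n.id());
                        break;
                    }
                }
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("n1", "rescue", 2),
                new Node("n2", "rescue", 1),
                new Node("hq1", "headquarters", 4));
        List<Activity> acts = List.of(
                new Activity("detect", 2, true),
                new Activity("notify", 1, true),
                new Activity("report", 2, false));
        Map<String, String> assignment = allocate(acts, nodes);
        System.out.println("Initial assignment: " + assignment);
        System.out.println("Moved after n1 fails: "
                + reschedule(nodes.get(0), acts, nodes, assignment));
    }
}
```

In this example, the retriable activity hosted on the failed node cannot be placed on the remaining node of its own group, so the sketch falls back to a node in another group. This mirrors the local-first, then cross-group order described in the recovery bullet. A real implementation would also have to take data flow, communication failures, and alternative execution paths into account.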

A case study of several concrete emergency processes will be used to evaluate the suggested solutions and their applicability to real-world scenarios. Technically, we focus on an implementation based on OSGi (http://www.osgi.org), a Java-based middleware platform that provides a service-oriented, component-based environment for application development. The implementation may run either on top of embedded databases or as a stand-alone execution component of software services.

Related Work

Publications

  • Artin Avanes: Adaptive Workflow Scheduling Under Resource Allocation Constraints and Network Dynamics.
    The 34th International Conference on Very Large Data Bases (VLDB), PhD Workshop, Auckland, New Zealand, August 2008
  • Artin Avanes: An Adaptive Process and Data Infrastructure for Disaster Management.
    The 5th International Conference on Information Systems for Crisis Response and Management (ISCRAM), PhD Colloquium, Washington, D.C., USA, May 2008
  • Artin Avanes, Johann-Christoph Freytag, Christof Bornhoevd: Distributed Service Deployment in Mobile Ad-Hoc Networks.
    The 4th IEEE International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MOBIQUITOUS 2007), Philadelphia, USA, August 2007
  • Christof Bornhövd, Holger Ziekow, Artin Avanes: Service Composition and Deployment for a Smart Items Infrastructure.
    The 14th International Conference on Cooperative Information Systems (CoopIS), Montpellier, France, 2006
  • Timo Mika Gläßer, Markus Scheidgen, Artin Avanes: Self-Organizing Information Systems for Disaster Management.
    3. GI/ITG KuVS Fachgespräch "Ortsbezogene Anwendungen und Dienste", Berlin, Germany, September 2006