News

Paper and Poster Presentation at the CrossCloud Workshop in Belgrade

A paper and a poster about the FederatedCloudSim framework will be presented at the CrossCloud workshop on 23 April. The workshop is co-located with the EuroSys conference in Belgrade.

Title: Financial Evaluation of SLA-based VM Scheduling Strategies for Cloud Federations

Authors: Andreas Kohne, Marcel Krüger, Marco Pfahlberg, Olaf Spinczyk, Lars Nagel

Abstract: In recent years, cloud federations have gained popularity. Small as well as big cloud service providers (CSPs) join federations to reduce their costs, and also cloud management software like OpenStack offers support for federations. In a federation, individual CSPs cooperate such that they can move load to partner clouds at high peaks and possibly offer a wider range of services to their customers. Research in this area addresses the organization of such federations and strategies that CSPs can apply to increase their profit.

In this paper we present the latest extensions to the FederatedCloudSim framework that considerably improve the simulation and evaluation of cloud federations. These simulations include service-level agreements (SLAs), scheduling and brokering strategies on various levels, the use of real-world cloud workload traces and a fine-grained financial evaluation using the new CloudAccount module. We use FederatedCloudSim to compare scheduling and brokering strategies on the federation level. Among them are new strategies that conduct auctions or consult a reliance factor to select an appropriate federated partner for running outsourced virtual machines. Our results show that choosing the right strategy has a significant impact on SLA compliance and revenue.

Presentation at PODC: Randomised Parallel Allocation of Jobs to Servers

At the ACM Symposium on Principles of Distributed Computing (PODC) in Chicago we will present a theoretical paper on the distribution of jobs to servers. The talk will be given on 26/07/2016.

Title: Self-stabilizing Balls & Bins in Batches

Authors: Petra Berenbrink, Tom Friedetzky, Peter Kling, Frederik Mallmann-Trenn, Lars Nagel, Chris Wastell

Abstract: A fundamental problem in distributed computing is the distribution of requests to a set of uniform servers without a centralized controller. Classically, such problems are modelled as static balls into bins processes, where m balls (tasks) are to be distributed to n bins (servers). In a seminal work, Azar et al. proposed the sequential strategy Greedy[d] for n=m. When thrown, a ball queries the load of d random bins and is allocated to a least loaded of these. Azar et al. showed that d=2 yields an exponential improvement compared to d=1. Berenbrink et al. extended this to m >> n, showing that the maximal load difference is independent of m for d=2 (in contrast to d=1).

We propose a new variant of an infinite balls into bins process. In each round an expected number of c*n new balls arrive and are distributed (in parallel) to the bins and each non-empty bin deletes one of its balls. This setting models a set of servers processing incoming requests, where clients can query a server's current load but receive no information about parallel requests.
We study the Greedy[d] distribution scheme in this setting and show a strong self-stabilizing property: For any arrival rate c=c(n)<1, the system load is time-invariant. Moreover, for any (even super-exponential) round t, the maximum system load is (w.h.p.) O(1/(1-c) * log(n/(1-c))) for d=1 and O(log(n/(1-c))) for d=2. In particular, Greedy[2] has an exponentially smaller system load for high arrival rates.
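
The batched process described in this abstract can be simulated in a few lines. The following Python sketch is an illustration only, not the authors' code. It assumes that arrivals come from n independent coin flips with probability c (which gives c*n balls in expectation), that all balls of a round see the bin loads from the start of that round, and that ties are broken by sample order. The function name and its parameters are hypothetical.

```python
import random


def simulate_greedy_batches(n, c, d, rounds, seed=0):
    """Run the batched Greedy[d] process and return the final bin loads.

    n: number of bins, c: arrival rate (< 1), d: bins sampled per ball.
    """
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(rounds):
        # Assumed arrival model: n independent coin flips with probability c,
        # i.e. c*n new balls in expectation.
        arrivals = sum(1 for _ in range(n) if rng.random() < c)
        # Balls of one round are placed in parallel: each ball sees only the
        # loads from the start of the round, not the other new balls.
        snapshot = list(loads)
        for _ in range(arrivals):
            candidates = [rng.randrange(n) for _ in range(d)]
            target = min(candidates, key=snapshot.__getitem__)
            loads[target] += 1
        # Every non-empty bin deletes (serves) one ball.
        for b in range(n):
            if loads[b] > 0:
                loads[b] -= 1
    return loads


if __name__ == "__main__":
    for d in (1, 2):
        final = simulate_greedy_batches(n=200, c=0.9, d=d, rounds=500)
        print(f"d={d}: maximum load {max(final)}")
```

Running the script prints the maximum load after 500 rounds for d=1 and d=2, which gives a rough empirical feel for the gap stated in the abstract. A single run with small n is of course no substitute for the w.h.p. bounds.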

Publication on Acceleration of Migrations (VHPC Workshop)

FAST will present a new publication on the acceleration of migrations in virtual environments at the 11th Workshop on Virtualization in High-Performance Cloud Computing (VHPC). The VHPC workshop will be held on 23/6/2016 in Frankfurt, Germany, in conjunction with the International Supercomputing Conference - High Performance 2016 (ISC).

Title: Accelerating Application Migration in HPC

Authors: Ramy Gad, Simon Pickartz, Tim Süß, Lars Nagel, Stefan Lankes, André Brinkmann

Abstract: It is predicted that the number of cores per node will rapidly increase with the upcoming era of exascale supercomputers. As a result, multiple applications will have to share one node and compete for the (often scarce) resources available on this node. Furthermore, the growing number of hardware components causes a decrease in the mean time between failures. Application migration between nodes has been proposed as a tool to mitigate these two problems: Bottlenecks due to resource sharing can be addressed by load balancing schemes which migrate applications; and hardware errors can often be tolerated by the system if faulty nodes are detected and processes are migrated ahead of time.

VM migration currently seems to be the most promising technique for such approaches as it provides a strong level of isolation. However, the migration time of virtual machines is higher than the respective migration time on the process level. This can be explained by the additional virtualization layer in the memory hierarchy.

In this paper, we propose a technique for the acceleration of VM migration. We take advantage of the fact that freed memory regions within the guest system are not recognized by the hypervisor. Therefore, we fill them with zeros such that zero-page detection and compression can work more efficiently. We demonstrate that the approach reduces migration time by up to 8% with a negligible overhead for some applications.
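
The effect of zero-filling can be illustrated with a toy model. The Python sketch below is not the implementation from the paper and does not use any hypervisor API. It assumes a simplified migration that skips all-zero pages and compresses every other page with zlib, and it models freed guest memory as pages that still hold stale random data. All names (`PAGE_SIZE`, `is_zero_page`, `transfer_size`, `zero_fill`) are hypothetical.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


def is_zero_page(page):
    """Return True if the page consists of zero bytes only."""
    return not any(page)


def transfer_size(pages):
    """Bytes sent by the toy migration: zero pages are skipped,
    all other pages are sent zlib-compressed."""
    return sum(len(zlib.compress(p)) for p in pages if not is_zero_page(p))


def zero_fill(pages, freed):
    """Return a copy of the page list in which freed pages are zeroed."""
    zero = bytes(PAGE_SIZE)
    return [zero if i in freed else p for i, p in enumerate(pages)]


if __name__ == "__main__":
    # Stale contents of freed pages look like ordinary data to the hypervisor.
    memory = [os.urandom(PAGE_SIZE) for _ in range(8)]
    freed = {1, 3, 5}  # pages the guest has freed (hypothetical indices)
    print("without zero-filling:", transfer_size(memory))
    print("with zero-filling:   ", transfer_size(zero_fill(memory, freed)))
```

In this model the freed pages drop out of the transfer entirely once they are zeroed. How much a real migration gains depends on the workload and on the hypervisor's detection and compression mechanisms.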

Talk About Container Migration at VHPC Workshop in Frankfurt

The publication "Migrating LinuX Containers Using CRIU" will be presented at the 11th Workshop on Virtualization in High-Performance Cloud Computing (VHPC) on 23/6/2016. The workshop is co-located with the International Supercomputing Conference (ISC) in Frankfurt, an important international HPC conference.

Title: Migrating LinuX Containers Using CRIU

Authors: Simon Pickartz, Niklas Eiling, Stefan Lankes, Lukas Razik, Antonello Monti

Abstract: Process migration is one of the most important techniques in modern computing centers. It enables the implementation of load balancing strategies and eases system administration. As supercomputers continue to grow in size, such mechanisms become interesting to High-Performance Computing (HPC) as well.

Usually, migration is accomplished by means of hypervisor-based virtualization. However, container-based approaches are an attractive alternative for HPC to minimize the performance penalties. In contrast to virtual machine migration, the migration of operating system containers has so far remained mostly unexplored in the context of HPC.

In this paper we present a prototype implementation of a libvirt driver enabling the migration of LinuX Containers. We evaluate the driver in terms of overhead added by the additional software layer and compare its migration performance with that of virtual machines based on KVM.

Talk About Application Migration in HPC Environments at HPCS Conference in Innsbruck

FAST will present a paper on application migration in HPC environments at the 14th "International Conference on High Performance Computing & Simulation" (HPCS 2016) in Innsbruck, Austria.

Title: Application Migration in HPC -- A Driver of the Exascale Era?

Authors: Simon Pickartz, Carsten Clauss, Jens Breitbart, Stefan Lankes, Antonello Monti

Abstract: Application migration is valuable for modern computing centers. Apart from facilitating the maintenance process, it enables dynamic load balancing to improve the system’s efficiency. Although the concept is already widespread in cloud computing environments, it has not yet found wide adoption in HPC.
As major challenges of future exascale systems are resiliency, concurrency, and locality, we expect migration of applications to be one means to cope with these challenges. In this paper we investigate its viability for HPC by deriving respective requirements for this specific field of application. In doing so, we sketch example scenarios demonstrating its potential benefits. Furthermore, we discuss challenges that result from the migration of OS-bypass networks and present a prototype migration mechanism enabling the seamless migration of MPI processes in HPC systems.

Talk About Migration of MPI Processes at IPDRM Workshop

FAST will present a paper on the migration of MPI processes at the first "Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware" (IPDRM), co-located with the "IEEE International Parallel and Distributed Processing Symposium" (IPDPS) in Chicago, USA.

Title: Non-Intrusive Migration of MPI Processes in OS-bypass Networks

Authors: Simon Pickartz, Carsten Clauss, Stefan Lankes, Stephan Krempel, Thomas Moschny, Antonello Monti

Abstract: Load balancing, maintenance, and energy efficiency are key challenges for upcoming supercomputers. An indispensable tool for the accomplishment of these tasks is the ability to migrate applications during runtime. Especially in HPC, where any performance hit is frowned upon, such migration mechanisms have to come with minimal overhead. This constraint is usually not met by current practice adding further abstraction layers to the software stack.
In this paper, we propose a concept for the migration of MPI processes communicating over OS-bypass networks such as InfiniBand. While being transparent to the application, our solution minimizes the runtime overhead by introducing a protocol for the shutdown of individual connections prior to the migration. It is implemented on the basis of an MPI library and evaluated using virtual machines based on KVM.
Our evaluation reveals that the runtime overhead is negligibly small. The migration time itself is mainly determined by the particular migration mechanism, whereas the additional execution time of the presented protocol converges to 2 ms per connection if more than a few dozen connections are shut down at a time.

HPC Status Conference of the Gauß-Allianz

The fifth HPC Status Conference of the Gauß-Allianz took place from December 14th to December 15th at the Forschungskolleg Humanwissenschaften in Bad Homburg and was organized by the Goethe University Frankfurt am Main. The target audience of the conference was the High Performance Computing community in Germany. The conference focused on projects funded by the BMBF as part of the HPC Software Call, and nearly all projects of this funding initiative presented and discussed their interim results. The FAST project presented the integration of the batch environment Slurm with the agent-based environment and the mechanisms for transparent process migration developed in FAST.

COSH Workshop: Accepted Papers

The 1st COSH Workshop on Co-Scheduling of HPC Applications, organized by FAST member TU München, will be held in Prague on January 19, 2016, co-located with HiPEAC 2016. The following papers were accepted and will be presented at the workshop:

"A Resource-centric Application Classification Approach"
Alexandros-Herodotos Haritatos, Konstantinos Nikas, Georgios Goumas and Nectarios Koziris
School of ECE, NTUA, Athens, Greece

"Dynamic Process Management with Allocation-internal Co-Scheduling Towards Interactive Supercomputing"
Carsten Clauss, Thomas Moschny and Norbert Eicker
ParTec GmbH and Forschungszentrum Jülich, Germany

"Detailed Characterization of HPC Applications for Co-Scheduling"
Josef Weidendorfer and Jens Breitbart
Technische Universität München, Germany

"Terrible Twins: A Simple Scheme to Avoid Bad Co-Schedules"
Andreas de Blanche and Thomas Lundqvist
Department of Engineering Science, University West, Sweden

"Implications of Process-Migration in Virtualized Environments"
Simon Pickartz, Jens Breitbart and Stefan Lankes
RWTH Aachen and Technische Universität München, Germany

"Impact of the Scheduling Strategy in Heterogeneous Systems That Provide Co-Scheduling"
Tim Süß, Nils Döring, Ramy Gad, Lars Nagel, André Brinkmann, Dustin Feld, Eric Schricker and Thomas Soddemann
Johannes Gutenberg-Universität Mainz and Fraunhofer SCAI, Germany

COSH Workshop in Prague (Organized by FAST)

The 1st COSH Workshop on Co-Scheduling of HPC Applications will be hosted in Prague on January 19, 2016 by Carsten Trinitis and Josef Weidendorfer from the Technische Universität München. The workshop is co-located with HiPEAC 2016.

Overview

The task of a high performance computing system is to carry out its calculations (mainly scientific applications) with maximum performance and energy efficiency. Up until now, this goal could only be achieved by exclusively assigning an appropriate number of cores/nodes to parallel applications. As a consequence, applications had to be highly optimised to achieve even a fraction of a supercomputer's peak performance, which required huge effort on the programmer's side.

This problem is expected to become more serious on future exascale systems with millions of compute cores. Many of today's highly scalable applications will not be able to utilise an exascale system's extreme parallelism due to node-specific limitations such as I/O bandwidth. Therefore, to use future supercomputers efficiently, it will be necessary to run more than one application on a node simultaneously.

For co-scheduling to be efficient, applications must not slow each other down; suitable candidates for co-scheduling are, for example, a memory-bound and a compute-bound application. Within this context, it might also be necessary to dynamically migrate applications between nodes, for example when a new application is scheduled to the system. In order to monitor performance and energy efficiency during operation, additional sensors are required. Their readings need to be correlated with the running applications to deliver values for key performance indicators.
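
The idea of pairing memory-bound with compute-bound applications can be sketched as a simple heuristic. The Python example below is only an illustration and not a FAST scheduler. It assumes that each application is characterised by a single hypothetical metric, its memory-bandwidth demand as a fraction between 0 and 1, and it pairs the least memory-hungry application with the most memory-hungry one. The function name and the application names are made up.

```python
def pair_for_coscheduling(apps):
    """Pair applications with complementary memory-bandwidth demand.

    apps maps an application name to its (hypothetical) memory-bandwidth
    demand in [0, 1]. Returns the list of pairs and any unpaired leftovers.
    """
    # Sort from most compute-bound (low demand) to most memory-bound.
    ordered = sorted(apps, key=apps.get)
    pairs = []
    while len(ordered) >= 2:
        compute_bound = ordered.pop(0)
        memory_bound = ordered.pop()
        pairs.append((compute_bound, memory_bound))
    return pairs, ordered


if __name__ == "__main__":
    demand = {"stencil": 0.9, "montecarlo": 0.1, "fft": 0.7, "nbody": 0.2}
    pairs, leftover = pair_for_coscheduling(demand)
    print(pairs)     # [('montecarlo', 'stencil'), ('nbody', 'fft')]
    print(leftover)  # []
```

A real co-scheduler would rely on measured sensor data rather than a single static number, and would have to account for other shared resources such as caches and I/O.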

Main topics

Exascale architectures, supercomputers, scheduling, performance sensors, energy efficiency, task migration

Important Dates

Paper submission deadline: November 22, 2015
Notification: December 5, 2015
Camera ready: December 15, 2015
Workshop: January 19, 2016