Lightweight virtualization techniques have made virtual machines more
portable, more efficient, and easier to manage. Docker is an open-source,
lightweight container engine that enables developers to package their
applications and dependencies into a portable container and then publish it to
any popular Linux machine. Containers execute in user space on top of the OS
kernel. By convention, a Docker container runs a single main process. Although
the Docker container is flexible, lightweight, and easy to use, the Docker
engine (the runtime system) lacks a function that is very common in
conventional virtual machine managers (the entities that run regular virtual
machines): live migration of the container. Live migration moves a container
between Docker engines without shutting the container down and without any
user or software that accesses the container noticing the migration.
In this paper, we provide a live migration mechanism for containers running
cluster computing frameworks for large-scale data analytics in Docker. It uses
a checkpoint and restore strategy that stores checkpoints at specific
intervals and uses the most recent saved checkpoint to restore these
containers to their previous state, which allows their live migration into
other containers.
Keywords: Live migration, CRIU, Docker container, Checkpointing
Docker [1] has gained increasing popularity as a container engine on
industrial cloud platforms in recent years. Based on OS-level virtualization,
Docker serves as a composing engine for Linux containers, in which an
application runs in an isolated environment. Live migration of Docker
containers has been a topic of interest for several reasons, such as:
Maintenance without downtime: automating live migration from one container to
another during maintenance and replacement of hardware.
Load balancing: by implementing triggers or scheduling algorithms, we can
automate the migration of containers to rebalance the load on Docker containers.
High availability: achieving high availability in data centers and cloud
platforms using live migration.
Thus, in this paper we propose a solution for Docker live migration using
checkpoint and restore. CRIU [2] (Checkpoint/Restore In Userspace) is a
software tool that checkpoints and restores processes on Linux. With this tool
we can save the state of a running application so that it can later resume its
execution from the time of the checkpoint.
Figure 1: Docker container live migration using CRIU with the big data
systems MapReduce and Spark running inside it.
Figure 1 shows a use-case scenario of Docker container live migration using
CRIU. Using the checkpoint/restore approach we can achieve a high availability
solution, which allows us to checkpoint the state of a running container and
restore it later on the same or a different host.
By implementing this checkpoint/restore feature for Docker containers, big
data systems could be deployed more conveniently and with higher availability
in container-based cloud infrastructure. Moreover, we run the big data systems
MapReduce and Spark with a WordCount job on real-world datasets.
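As a rough illustration of the checkpoint/restore workflow, the following
Python sketch only assembles the command lines that a migration script might
run; it does not execute them. It assumes Docker's experimental
`docker checkpoint create` command and the `--checkpoint` option of
`docker start`, both of which rely on CRIU being installed. The container
name, checkpoint name, checkpoint directory, destination host, and the use of
`rsync` and `ssh` are hypothetical choices, and a container created from the
same image is assumed to exist already on the destination.

```python
import shlex


def checkpoint_cmd(container, checkpoint, checkpoint_dir, leave_running=False):
    """Build the argv for `docker checkpoint create` (experimental feature)."""
    cmd = ["docker", "checkpoint", "create", "--checkpoint-dir", checkpoint_dir]
    if leave_running:
        # Keep the source container running after the checkpoint is taken.
        cmd.append("--leave-running")
    return cmd + [container, checkpoint]


def restore_cmd(container, checkpoint, checkpoint_dir):
    """Build the argv that starts a container from a saved checkpoint."""
    return ["docker", "start", "--checkpoint", checkpoint,
            "--checkpoint-dir", checkpoint_dir, container]


def migration_plan(container, checkpoint, checkpoint_dir, dest_host):
    """Return the three steps of a checkpoint-based migration as argv lists."""
    return [
        # 1. Freeze the container state on the source host.
        checkpoint_cmd(container, checkpoint, checkpoint_dir),
        # 2. Copy the checkpoint images to the destination (hypothetical rsync).
        ["rsync", "-a", checkpoint_dir + "/", dest_host + ":" + checkpoint_dir + "/"],
        # 3. Resume the container on the destination from the checkpoint.
        ["ssh", dest_host] + restore_cmd(container, checkpoint, checkpoint_dir),
    ]


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    for step in migration_plan("spark-worker", "cp1", "/tmp/checkpoints", "node2"):
        print(" ".join(shlex.quote(arg) for arg in step))
```

Keeping the plan as data rather than running it directly makes it easy to log
or dry-run the steps before touching a live container.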
MapReduce [7] usually divides the input data set into separate blocks that
are processed in parallel by map tasks. Both the input and the output of a job
are stored in a file system. The framework is responsible for scheduling,
monitoring, and re-executing failed tasks. Apache Spark [8] is an open-source
cluster computing framework, which provides an interface for programming
clusters with fault tolerance and implicit data parallelism.
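To make the WordCount workload concrete, here is a minimal single-process
sketch of the MapReduce pattern in Python. It is not Hadoop or Spark code: the
function names and the `block_size` parameter are illustrative assumptions,
and the map tasks run sequentially here, whereas a real framework would
schedule them in parallel across the cluster.

```python
from collections import defaultdict


def split_into_blocks(lines, block_size):
    """Divide the input into separate blocks, one per map task."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_task(block):
    """Emit a (word, 1) pair for every word in the block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_blocks):
    """Group the emitted values by key across all map outputs."""
    groups = defaultdict(list)
    for pairs in mapped_blocks:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_task(word, counts):
    """Sum the partial counts of one word."""
    return word, sum(counts)


def word_count(lines, block_size=2):
    blocks = split_into_blocks(lines, block_size)
    mapped = [map_task(block) for block in blocks]  # parallel in a real cluster
    grouped = shuffle(mapped)
    return dict(reduce_task(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["live migration of containers", "live checkpoint restore"]))
```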
The contributions of this paper are mainly: 1) live migration of Docker
containers using checkpoint and restore; 2) a comparison of the performance
losses that live migration causes for different applications: Spark Streaming,
MapReduce, and Storm.
Live migration of containers has attracted attention in recent years. We
first discuss the different migration strategies for containers.
Live migration using the pre-copy approach [3] is a very common solution for
virtual machines and is also the default approach for Xen virtual systems. This
approach copies all memory pages from the source to the destination and then
repeatedly re-sends the pages that were dirtied in the meantime to the target
machine. The limitation of this approach is that the iteration never converges
when the rate at which pages are dirtied is higher than the transfer rate.
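The convergence problem can be seen in a small simulation. The sketch below is
a simplified model, not the Xen implementation: it assumes constant dirtying
and transfer rates (in pages per second), and the stop threshold and round
limit are illustrative parameters. Each round re-sends the pages dirtied while
the previous round was being copied.

```python
def precopy_rounds(total_pages, dirty_rate, transfer_rate,
                   stop_threshold=50, max_rounds=30):
    """Simulate iterative pre-copy.

    Returns (rounds, pages_left, converged). `pages_left` is what the final
    stop-and-copy phase would still have to send while the guest is paused.
    """
    remaining = total_pages  # the first round copies all memory pages
    for round_no in range(1, max_rounds + 1):
        # Copying `remaining` pages takes remaining / transfer_rate seconds;
        # during that time dirty_rate pages per second are modified again.
        dirtied = remaining * dirty_rate // transfer_rate
        remaining = min(total_pages, dirtied)  # cannot exceed total memory
        if remaining <= stop_threshold:
            return round_no, remaining, True
    return max_rounds, remaining, False


if __name__ == "__main__":
    # Dirtying slower than transfer: the dirty set shrinks every round.
    print(precopy_rounds(10000, dirty_rate=200, transfer_rate=1000))
    # Dirtying faster than transfer: the dirty set never shrinks.
    print(precopy_rounds(10000, dirty_rate=1200, transfer_rate=1000))
```

In this model the dirty set shrinks by the factor dirty_rate / transfer_rate
per round, so it only shrinks when that ratio is below one.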
In the post-copy approach [4], memory pages are transferred only when the
destination requires them. Although the post-copy approach is an improved
version of the pre-copy approach, it has several limitations such as
unreliability and high downtime.
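The on-demand transfer can be sketched as follows. This is a toy model under
stated assumptions: memory is a Python dictionary from page number to content,
the "network" is a plain dictionary lookup, and the class and attribute names
are hypothetical.

```python
class PostCopyDestination:
    """Destination that resumes first and fetches pages only when touched."""

    def __init__(self, source_memory):
        self._source = source_memory  # stands in for the remote source host
        self.local = {}               # pages already present on the destination
        self.remote_fetches = 0       # number of simulated network page faults

    def read(self, page_no):
        if page_no not in self.local:
            # Page fault: pull the missing page from the source on demand.
            self.local[page_no] = self._source[page_no]
            self.remote_fetches += 1
        return self.local[page_no]


if __name__ == "__main__":
    dest = PostCopyDestination({0: "kernel", 1: "heap", 2: "stack"})
    for page in (1, 1, 2):
        dest.read(page)
    print(dest.remote_fetches, sorted(dest.local))
```

After these reads, page 0 still exists only on the source, which hints at why
a source failure during post-copy migration is hard to recover from.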
The checkpoint and restore approach [5] for migration provides many benefits,
including fault recovery by rolling applications back to a previous checkpoint,
better response time by restarting applications from checkpoints instead of
from scratch, and better system utilization by suspending jobs on demand.
CRIU [2] provides support for live migration in container technologies such as
Docker, OpenVZ, and LXC. SpotOn [6] also achieved the lowest expected cost for
batch jobs by implementing fault tolerance using checkpoint and restore
technology.
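The rollback benefit can be illustrated with a toy job that saves its state at
fixed intervals. This is an application-level sketch, not CRIU: the job simply
sums a list of numbers, and the class name, the `interval` parameter, and the
injected failure point are assumptions made for the example.

```python
import copy


class CheckpointedJob:
    """Sums a list of numbers and checkpoints every `interval` steps."""

    def __init__(self, items, interval):
        self.items = list(items)
        self.interval = interval
        self.state = {"next_index": 0, "total": 0}
        self._checkpoint = copy.deepcopy(self.state)

    def done(self):
        return self.state["next_index"] >= len(self.items)

    def step(self):
        index = self.state["next_index"]
        self.state["total"] += self.items[index]
        self.state["next_index"] = index + 1
        if self.state["next_index"] % self.interval == 0:
            # Save a consistent snapshot of the whole job state.
            self._checkpoint = copy.deepcopy(self.state)

    def restore(self):
        # Roll back to the most recent snapshot instead of starting over.
        self.state = copy.deepcopy(self._checkpoint)


def run_with_failure(items, interval, fail_at_step):
    """Run the job, inject one failure, and return (total, steps_executed)."""
    job = CheckpointedJob(items, interval)
    steps_executed = 0
    failed = False
    while not job.done():
        if not failed and steps_executed == fail_at_step:
            failed = True
            job.restore()
            continue
        job.step()
        steps_executed += 1
    return job.state["total"], steps_executed


if __name__ == "__main__":
    print(run_with_failure(range(1, 11), interval=4, fail_at_step=6))
```

With ten items, a checkpoint every four steps, and a failure after six steps,
the job repeats only the two steps taken since the last checkpoint, so it
finishes in 12 steps rather than the 16 a restart from scratch would need.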
Since there is no official migration tool available for Docker, we use the
external tool CRIU in our work. However, the authors of [7] mention some
drawbacks of using CRIU for Docker live migration: 1) corruption of the layered
file system inside the container after restoration on the destination;
2) reduced efficiency and robustness of migration. In our work we will try to
evaluate this claim and also try to
[3] C. Clark, K. Fraser, S. Hand, J.G. Hansen, E. Jul, C. Limpach, I. Pratt
and A. Warfield, Live migration of virtual machines, USENIX Symposium on
Networked Systems Design & Implementation, 2 (2005) 273-286.
[4] M.R. Hines, U. Deshpande and K. Gopalan, Post-copy live migration of
virtual machines, ACM SIGOPS Operating Systems Review, 43 (2009) 14-26.
[5] Checkpoint and Restoration of Micro-service in Docker
[6] Supreeth Subramanya, Tian Guo, Prateek Sharma, David Irwin, and Prashant
Shenoy. 2015. SpotOn: a batch computing service for the spot market. In
Proceedings of the Sixth ACM Symposium on Cloud Computing (SoCC '15). ACM,
New York, NY, USA, 329-341.