Michael Crosby


Advanced Docker Volumes

I have noticed that many people have a hard time understanding what volumes are and how to use them effectively.

So what is a volume? A volume is a directory that lives outside of your container's root filesystem. Because it is kept separate, you can share it with other containers. You can also use volumes to mount directories from your host machine inside a container.

# create a new volume in /www
docker run -v /www ubuntu echo yo

# mount the host directory /host/logs into the container at /container/logs
docker run -v /host/logs:/container/logs ubuntu echo momma

So pretty simple, right? So what actually happens under the hood when you specify a volume?

Under the hood

When you run docker run -v /www ubuntu echo yo, Docker creates a container in /var/lib/docker/containers with its root filesystem set up as a copy-on-write (COW) layer on top of the ubuntu image. It then creates a new volume in /var/lib/docker/volumes to hold the data that the container places in /www. We can run docker inspect to see the volume configuration.

{
    ...
    "HostnamePath": "/var/lib/docker/containers/831d4ecbdf2e096475365b019b211f74aff7a3cea3c443570e1fd2f2bb0dc843/hostname",
    "HostsPath": "/var/lib/docker/containers/831d4ecbdf2e096475365b019b211f74aff7a3cea3c443570e1fd2f2bb0dc843/hosts",
    "Volumes": {
        "/www": "/var/lib/docker/volumes/ec3c543bc92f114c2c568733541e89381881e5a62996d7084e07793f86280535/layer"
    },
    "VolumesRW": {
        "/www": true
    }
}

By inspecting the container we can see that we have a new volume and the full path to the volume's data on disk. You will also notice the field VolumesRW denoting that this volume is Read/Write.
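The Volumes and VolumesRW fields above are what Docker reported when this post was written. Newer Docker releases report the same information under a Mounts array whose entries carry Source, Destination and RW fields. Assuming such a release, a small helper along these lines could print the host directory behind one volume. It is a sketch, not part of the original setup; the DOCKER variable is an assumption that only exists so the docker binary can be swapped out for a dry run.

```shell
#!/usr/bin/env bash
# Sketch: print the host directory that backs a volume inside a container.
# Assumes a Docker release that lists volumes under .Mounts (not the older
# .Volumes map shown above). $DOCKER defaults to docker; override it for a dry run.
DOCKER="${DOCKER:-docker}"

# volume_host_path CONTAINER DIR   (DIR is the path inside the container, e.g. /www)
volume_host_path() {
  local container="$1" dest="$2"
  # Go template: walk the mounts and print Source for the one mounted at DIR.
  "$DOCKER" inspect --format \
    "{{ range .Mounts }}{{ if eq .Destination \"$dest\" }}{{ .Source }}{{ end }}{{ end }}" \
    "$container"
}
```

For example, volume_host_path 831d4ecbdf2e /www would print the directory under /var/lib/docker/volumes that holds the container's /www data.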

This is simple from the user's perspective: you just pass -v with the directory that you want to use as a volume and you're done. Now what's up with mounting directories from the host system?

Host mounts

There are a few use cases where you want to mount a directory from your host system into the container. Just remember that this is not portable: a directory structure that exists on your development machine may not exist on your production servers, which causes issues when you try to move the container. Many people also feel that they need to mount the data directory of their database container to the host machine so that it can be backed up. However, as we saw above, all the files for your "normal" volumes are already stored in /var/lib/docker/volumes on the host filesystem. Unlike VMs, Docker containers do not lock your data into a disk image, so mounting to the host is usually not needed.
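As a hedged illustration of backing up without mounting the database's data directory to the host, one common pattern is a throwaway container that borrows the database container's volumes with --volumes-from, bind-mounts only a backup destination, and runs tar. The container name db, the /data path and the ubuntu image are placeholders; --rm (remove the container on exit) comes from Docker releases newer than the one this post targets, and DOCKER is an assumption for dry runs.

```shell
#!/usr/bin/env bash
# Sketch: archive one volume directory of a container into a host directory.
# Container name, paths and the ubuntu image are illustrative placeholders.
DOCKER="${DOCKER:-docker}"

# backup_volume CONTAINER DIR [HOST_OUT_DIR]
# Writes HOST_OUT_DIR/<basename of DIR>.tar; HOST_OUT_DIR defaults to $PWD
# and must be an absolute path.
backup_volume() {
  local container="$1" dir="$2" out="${3:-$PWD}"
  "$DOCKER" run --rm \
    --volumes-from "$container" \
    -v "$out":/backup \
    ubuntu tar cf "/backup/$(basename "$dir").tar" "$dir"
}
```

For example, backup_volume db /data /srv/backups would leave /srv/backups/data.tar on the host, while the database container itself keeps using its ordinary volume.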

Volumes From

OK, so now we have a container running with a volume specified. What can we do with it? For this example, let's say that container A is a web frontend that writes its logs to /logs/webapp. When we ran the frontend container we specified -v /logs/webapp to create a volume. Next we need some kind of process to collect the logs and push them into a central repository. We can start container B and have it reference the volume from A to gain access to its logs.

docker run --volumes-from A log-daemon collect

We use the --volumes-from flag and pass the container's ID to gain access to its volumes. Now our daemon can cd into /logs/webapp and start processing the logs. Easy, simple, and boring. Now let's do something cool.
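Put together, the two containers in this example might be started as sketched below. The image name webapp and the container name A are placeholders, log-daemon collect is taken from the command above, and the :ro suffix on --volumes-from (mount the borrowed volumes read-only) comes from later Docker releases than the one this post targets. DOCKER is an assumption for dry runs.

```shell
#!/usr/bin/env bash
# Sketch of the frontend + log collector pair described above.
# Image name "webapp" and container name "A" are placeholders.
DOCKER="${DOCKER:-docker}"

# start_frontend NAME: the frontend declares /logs/webapp as a volume.
start_frontend() {
  "$DOCKER" run -d --name "$1" -v /logs/webapp webapp
}

# start_collector NAME: borrow every volume of container NAME. The ":ro"
# suffix mounts them read-only, since the collector only reads the logs.
start_collector() {
  "$DOCKER" run -d --volumes-from "$1:ro" log-daemon collect
}
```

Usage would be start_frontend A followed by start_collector A. Making the collector's view read-only means a bug in the collector cannot truncate or overwrite the frontend's logs.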

The cool stuff

So how can we push volumes to the limit? I currently have Docker running a Minecraft server instance, and the directory with all the world data is located in /minecraft. There is another cool project called Overviewer that generates a "Google Maps"-like, uhh, map of your world as an HTML site. You give overviewer.py the path to your world files and an output directory to write the result to.

So this stack has three components: the actual game server, a worker that takes data from the server and generates HTML, and a web server that serves the static file output of the map. What Docker and volumes allow us to do is separate our concerns: the web server does not have to be in the same container as the game server, and the worker that generates the map does not have to run in the same container as either the web or the game server. Just like functions and classes in your code, containers should do one thing and do it well.

WebServer (web frontend)

So let's look at how the web server is built and configured. It is a static file server that serves the files in a directory on a specified IP and port. If you want to see the web server code, look here.

FROM busybox

ADD server server

VOLUME /www
USER daemon
EXPOSE 8000

ENTRYPOINT ["/server", "-h", "0.0.0.0", "-p", "8000", "-dir", "/www"]

The server knows nothing about the stack that it is running in or what it is serving. It just serves the files located in /www (which is a volume) and nothing more. Simple and clean.
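Because the server only knows about /www, you can point it at any directory. As a sketch, assuming the Dockerfile above and the compiled server binary sit in one directory, you could build the image under the tag mapserver (the name used in the Runtime section) and preview an arbitrary host directory with a bind mount. The -p 8000:8000 mapping and the function names are illustrative choices; DOCKER is an assumption for dry runs.

```shell
#!/usr/bin/env bash
# Sketch: build the web server image and serve a host directory with it.
DOCKER="${DOCKER:-docker}"

# build_mapserver DIR: DIR holds the Dockerfile above and the "server" binary.
build_mapserver() {
  "$DOCKER" build -t mapserver "$1"
}

# preview_site HOST_DIR: mount an absolute host path at /www and publish
# port 8000. The server cannot tell the difference: it just serves /www.
# The files must be readable by the "daemon" user the image runs as.
preview_site() {
  "$DOCKER" run -d -p 8000:8000 -v "$1":/www mapserver
}
```

After build_mapserver ./web and preview_site /tmp/site, the files in /tmp/site would be reachable on port 8000 of the host.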

Map Generator (worker)

Now let's look at our worker.

FROM ubuntu

RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get update
RUN apt-get upgrade -y

RUN echo "deb http://overviewer.org/debian ./" >> /etc/apt/sources.list
RUN apt-get update
RUN apt-get install -y --force-yes minecraft-overviewer wget

ENV VERSION 1.6.2

RUN wget --no-check-certificate https://s3.amazonaws.com/Minecraft.Download/versions/${VERSION}/${VERSION}.jar -P ~/.minecraft/versions/${VERSION}/

ENTRYPOINT ["overviewer.py"]
CMD ["/minecraft/world", "/www"]

The worker has a few more dependencies, but this container also does just its one thing: it gets the world files from /minecraft/world, creates the map, and writes the output to /www. The worker knows nothing about where or how it gets the files to process; it just turns world files on its local filesystem into maps.
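Since the worker only reads /minecraft/world and writes /www, it does not care whether those paths come from other containers or from the host. For a one-off render of a world copied onto your machine, a sketch with bind mounts could look like this, keeping in mind the portability caveat from the host mounts section. The tag mapgenerator matches the Runtime section; the host paths and the function name are placeholders, and DOCKER is an assumption for dry runs.

```shell
#!/usr/bin/env bash
# Sketch: render a world that lives on the host with the worker image.
DOCKER="${DOCKER:-docker}"

# render_local WORLD_DIR OUT_DIR (both absolute host paths).
# They appear at the paths the image's CMD already uses, so no
# arguments to overviewer.py need to be overridden.
render_local() {
  "$DOCKER" run --rm -v "$1":/minecraft/world -v "$2":/www mapgenerator
}
```

For example, render_local /srv/world /srv/map would write the generated site into /srv/map.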

Minecraft server

Finally, where the fun happens.

FROM ubuntu

RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get update
RUN apt-get upgrade -y

RUN apt-get install -y python-software-properties
RUN add-apt-repository ppa:webupd8team/java -y

RUN apt-get update
RUN echo oracle-java7-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections

RUN apt-get install -y oracle-java7-installer
RUN apt-get install -y wget

RUN mkdir /minecraft
RUN wget -O /minecraft/minecraft.jar https://s3.amazonaws.com/Minecraft.Download/versions/1.6.2/minecraft_server.1.6.2.jar
RUN chmod +x /minecraft/minecraft.jar

VOLUME /minecraft
WORKDIR /minecraft
EXPOSE 25565

CMD java -Xmx1600M -Xms768M -jar minecraft.jar nogui

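We run minecraft.jar with /minecraft as the working directory. The server writes its world data there (by default into /minecraft/world), which puts it inside the /minecraft volume.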
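To check that the world data really lands in the volume, you can list it from a throwaway container that borrows the server's volumes, without touching the game server itself. This sketch assumes the server container was given the hypothetical name minecraft (the Runtime commands in this post capture a container ID instead, which also works) and that the busybox image is available; DOCKER is an assumption for dry runs.

```shell
#!/usr/bin/env bash
# Sketch: peek into the game server's /minecraft volume.
DOCKER="${DOCKER:-docker}"

# list_world [CONTAINER]: CONTAINER is a name or ID, default "minecraft".
list_world() {
  "$DOCKER" run --rm --volumes-from "${1:-minecraft}" busybox ls -l /minecraft/world
}
```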


Now our containers are created and they have clear boundaries. Each one is concerned only with doing its one job and has no idea where it sits in the stack.

Runtime

So now how do we glue this stack together with volumes? We can start by running the minecraft and web servers.

MINECRAFT=$(docker run -d minecraft)
MAPSERVER=$(docker run -d mapserver)
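A variation on these two commands, sketched here as an assumption rather than the author's setup: give each server a fixed name with --name and publish its exposed port with -p. Later commands can then refer to minecraft and mapserver instead of saved IDs, and on current Docker releases EXPOSE alone only documents a port without publishing it on the host. DOCKER is an assumption for dry runs.

```shell
#!/usr/bin/env bash
# Sketch: start both servers under fixed names with published ports.
DOCKER="${DOCKER:-docker}"

# start_servers: publish the ports the Dockerfiles EXPOSE (25565 and 8000).
# The web server is only started if the game server started successfully.
start_servers() {
  "$DOCKER" run -d --name minecraft -p 25565:25565 minecraft &&
    "$DOCKER" run -d --name mapserver -p 8000:8000 mapserver
}
```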

Now that our servers are running, we use the worker to bridge them together with volumes. Because the MINECRAFT container exposes /minecraft (with the world data) as a volume and the MAPSERVER container exposes /www (where static files are served from), we need to tell the worker to use --volumes-from with both containers.

docker run --volumes-from $MINECRAFT --volumes-from $MAPSERVER mapgenerator

Cloud

Just like magic, the mapgenerator will pull the data out of /minecraft/world and write the output into /www. Once it is done we can remove the generator container, because it contains no data that we need. It is an ephemeral process that takes an input and produces an output with no state in the middle.

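If you want the map to be generated every hour (like I do), you can run the same command from a cron job. Keep in mind that cron starts jobs with a minimal environment, so MINECRAFT and MAPSERVER have to be set to the container IDs in the crontab itself (cron accepts NAME=value lines) rather than only in your interactive shell.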

@hourly docker run --volumes-from $MINECRAFT --volumes-from $MAPSERVER mapgenerator
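An alternative sketch keeps the cron job independent of container IDs, which change whenever the servers are recreated. It assumes the servers were started with the hypothetical names minecraft and mapserver via docker run --name, which this post's Docker version may not support. The file name regen-map.sh, the GAME and WEB overrides, and DOCKER (for dry runs) are all assumptions.

```shell
#!/usr/bin/env bash
# regen-map.sh (hypothetical file name): run the map generator once against
# the named game and web server containers. GAME and WEB override the names.
DOCKER="${DOCKER:-docker}"

regen_map() {
  local game="${GAME:-minecraft}" web="${WEB:-mapserver}"
  # --rm deletes the worker container after it exits; it holds no state.
  "$DOCKER" run --rm --volumes-from "$game" --volumes-from "$web" mapgenerator
}
```

A crontab entry could then source the file and call the function, for example @hourly bash -c '. /usr/local/bin/regen-map.sh && regen_map'.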

Using volumes with this stack allows me to not worry about the generator process locking up the main game server, causing crashes, leaking memory and leaving artifacts on disk. The game server runs and serves up super-fun-times. The web server just serves files from one directory and does not need to be bothered. And the generator just takes input and produces output.

Docker does all the work while each container interacts with its own filesystem and does not know the difference. We don't have to mess with or restart the web or game servers. It's almost like dependency injection, but with an entire filesystem, which is really cool.

Warning: this post uses features of Docker that are not in a current release. Look for these features in 0.6.2.
