Michael Crosby


Dockerfile Best Practices

Dockerfiles provide a simple syntax for building images. The following are a few tips and tricks to help you get the most out of Dockerfiles.

1: Use the cache

Each instruction in a Dockerfile commits its change as a new image, which then serves as the base for the next instruction. If an image already exists with the same parent and the same instruction (except for ADD), docker reuses that image instead of executing the instruction again. This is the cache.

To use the cache effectively, keep your Dockerfiles consistent and add alterations only at the end. All my Dockerfiles start with the same 5 lines.

FROM ubuntu
MAINTAINER Michael Crosby <michael@crosbymichael.com>

RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get update
RUN apt-get upgrade -y

Changing the MAINTAINER instruction will force docker to re-execute the RUN instructions that follow it to update apt instead of hitting the cache.

1. Keep common instructions at the top of the Dockerfile to utilize the cache.
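
As a rough sketch of that ordering, the stable preamble stays byte-for-byte identical across Dockerfiles and anything specific to one image goes below it. The nginx package here is only a placeholder for illustration; it is not taken from any of my images.

```dockerfile
# Shared preamble: same instructions, same order, in every Dockerfile,
# so each step can be served from the cache.
FROM ubuntu
MAINTAINER Michael Crosby <michael@crosbymichael.com>

RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get update
RUN apt-get upgrade -y

# Image-specific steps go last. Editing anything below this point
# leaves the cached steps above untouched.
# (nginx is a placeholder package, assumed for this example.)
RUN apt-get install -y nginx
```

Editing a line in the preamble instead, even just the MAINTAINER, invalidates the cache for every instruction after it.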

2: Use tags

Unless you are experimenting with docker, you should always pass the -t option to docker build so that the resulting image is tagged. A simple, human-readable tag will help you keep track of what each image was created for.

docker build -t="crosbymichael/sentry" .

2. Always pass -t to tag the resulting image.

3: EXPOSE-ing ports

Two of the core concepts of docker are repeatability and portability. Images should be able to run on any host and as many times as needed. With Dockerfiles you have the ability to map both the private and public ports; however, you should never map the public port in a Dockerfile. By mapping to the public port on your host, you will only be able to have one instance of your dockerized app running.

# private and public mapping
EXPOSE 80:8080

# private only
EXPOSE 80

If the consumer of the image cares which public port the container maps to, they will pass the -p option when running the image; otherwise, docker will automatically assign a port for the container.

3. Never map the public port in a Dockerfile.
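
A minimal sketch of how this plays out for the consumer of the image. The image name example/web is hypothetical, and the -P flag shown in the comments is how current docker releases request automatically assigned host ports; older releases may behave differently.

```dockerfile
FROM ubuntu

# Declare only the private (container-side) port.
EXPOSE 80

# The public (host-side) port is chosen by whoever runs the image:
#   docker run -p 8080:80 example/web   # host port 8080 -> container port 80
#   docker run -p 8081:80 example/web   # a second instance, different host port
#   docker run -P example/web           # let docker pick a free host port
```

Because the Dockerfile never pins a host port, the two -p invocations above can run side by side on the same host.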

4: CMD and ENTRYPOINT syntax

Both CMD and ENTRYPOINT are straightforward, but they have a hidden, err, "feature" that can cause issues if you are not aware of it. Two different syntaxes are supported for these instructions.

CMD /bin/echo
# or
CMD ["/bin/echo"]

This may not look like it would be an issue, but the devil is in the details and will trip you up. If you use the second syntax, where the CMD (or ENTRYPOINT) is an array, it acts exactly as you would expect. If you use the first syntax, without the array, docker prepends /bin/sh -c to your command. This has always been in docker as far as I can remember.

Prepending /bin/sh -c can cause unexpected issues and behavior that are hard to understand if you do not know that docker modified your CMD. Therefore, you should always use the array syntax for both instructions, so that both are executed exactly as you intended.

4. Always use the array syntax when using CMD and ENTRYPOINT.
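
To make the difference concrete, here is a small sketch using /bin/echo with a $HOME argument. The argument is an assumption picked for illustration because shell expansion makes the wrapping visible. The first form is left commented out, since only the last CMD in a Dockerfile takes effect.

```dockerfile
FROM ubuntu

# Non-array form (avoid). Docker would actually run:
#   /bin/sh -c "/bin/echo $HOME"
# so a shell becomes the container's main process and expands $HOME
# before echo ever sees it.
# CMD /bin/echo $HOME

# Array form (preferred). Docker runs /bin/echo directly with one argument.
# No shell is involved, so the literal string $HOME is printed.
CMD ["/bin/echo", "$HOME"]
```

The array is parsed as JSON, so the elements need double quotes rather than single quotes.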

5: CMD and ENTRYPOINT better together

In case you don't know, ENTRYPOINT makes your dockerized application behave like a binary. You can pass arguments to the ENTRYPOINT during docker run and not worry about it being overwritten (unlike CMD). ENTRYPOINT is even better when used with CMD. Let's check out my Rethinkdb Dockerfile and see how to use this.

# Dockerfile for Rethinkdb 
# http://www.rethinkdb.com/

FROM ubuntu

MAINTAINER Michael Crosby <michael@crosbymichael.com>

RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get update
RUN apt-get upgrade -y

RUN apt-get install -y python-software-properties
RUN add-apt-repository ppa:rethinkdb/ppa
RUN apt-get update
RUN apt-get install -y rethinkdb

# Rethinkdb process
EXPOSE 28015
# Rethinkdb admin console
EXPOSE 8080

# Create the /rethinkdb_data dir structure
RUN /usr/bin/rethinkdb create

ENTRYPOINT ["/usr/bin/rethinkdb"]

CMD ["--help"]

This is everything that is required to get Rethinkdb dockerized. We have my standard 5 lines at the top to make sure the base image is updated, the ports exposed, etc. With the ENTRYPOINT set, we know that whenever this image is run, all arguments passed during docker run will be arguments to the ENTRYPOINT (/usr/bin/rethinkdb).

I also have a default CMD set in the Dockerfile to --help. In case no arguments are passed during docker run, rethinkdb's default help output will be displayed to the user. This is the same functionality that you would expect when interacting with the rethinkdb binary.

docker run crosbymichael/rethinkdb

Output

Running 'rethinkdb' will create a new data directory or use an existing one,
  and serve as a RethinkDB cluster node.
File path options:
  -d [ --directory ] path           specify directory to store data and metadata
  --io-threads n                    how many simultaneous I/O operations can happen
                                    at the same time

Machine name options:
  -n [ --machine-name ] arg         the name for this machine (as will appear in
                                    the metadata).  If not specified, it will be
                                    randomly chosen from a short list of names.

Network options:
  --bind {all | addr}               add the address of a local interface to listen
                                    on when accepting connections; loopback
                                    addresses are enabled by default
  --cluster-port port               port for receiving connections from other nodes
  --driver-port port                port for rethinkdb protocol client drivers
  -o [ --port-offset ] offset       all ports used locally will have this value
                                    added
  -j [ --join ] host:port           host and port of a rethinkdb node to connect to
  .................

Now let's run the container with the --bind all argument.

docker run crosbymichael/rethinkdb --bind all

Output

info: Running rethinkdb 1.7.1-0ubuntu1~precise (GCC 4.6.3)...
info: Running on Linux 3.2.0-45-virtual x86_64
info: Loading data from directory /rethinkdb_data
warn: Could not turn off filesystem caching for database file: "/rethinkdb_data/metadata" (Is the file located on a filesystem that doesn't support direct I/O (e.g. some encrypted or journaled file systems)?) This can cause performance problems.
warn: Could not turn off filesystem caching for database file: "/rethinkdb_data/auth_metadata" (Is the file located on a filesystem that doesn't support direct I/O (e.g. some encrypted or journaled file systems)?) This can cause performance problems.
info: Listening for intracluster connections on port 29015
info: Listening for client driver connections on port 28015
info: Listening for administrative HTTP connections on port 8080
info: Listening on addresses: 127.0.0.1, 172.16.42.13
info: Server ready
info: Someone asked for the nonwhitelisted file /js/handlebars.runtime-1.0.0.beta.6.js, if this should be accessible add it to the whitelist.

And there it is: a full Rethinkdb instance running, with access to the db and admin console, by interacting with the image the same way you interact with the binary. Very powerful and yet extremely simple. I love simple.

5. ENTRYPOINT and CMD are better together.
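
The same pattern, boiled down to a sketch that does not depend on Rethinkdb. It assumes /bin/echo as a stand-in binary and a hypothetical image name, example/echo; neither comes from the Dockerfile above.

```dockerfile
FROM ubuntu

# The fixed part: always executed, whatever is passed to docker run.
ENTRYPOINT ["/bin/echo"]

# The default arguments: used only when docker run is given none.
CMD ["hello from the default CMD"]

# With the image built as example/echo (hypothetical name):
#   docker run example/echo           # runs /bin/echo "hello from the default CMD"
#   docker run example/echo goodbye   # runs /bin/echo goodbye
```

Arguments given to docker run replace the CMD entirely and are appended to the ENTRYPOINT, which is why the image feels like the binary itself.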


I hope this post helps you to get started working with Dockerfiles and building images that we all can use and benefit from. Going forward, I believe that Dockerfiles will be a very important part of what makes docker so simple and easy to use whether you are consuming or producing images. I plan to invest much of my time to provide a complete, powerful, yet simple solution to building docker images via the Dockerfile.
