Dockerfiles

Earlier, we got started with a base image that let us run RStudio from within Docker, and learned to modify the contents of that image using docker commit. This is an excellent technique for capturing what we’ve done so we can reproduce it later, but what if we want to be able to easily change the collection of things in our image, and have a clear record of just what went into it? This is useful when maintaining running environments that may change and evolve over a project, and is facilitated by Dockerfiles.

Dockerfiles are a set of instructions on how to add things to a base image. They build custom images up in a series of layers. In a new file called Dockerfile, put the following:

FROM rocker/verse:latest

This tells Docker to start with the rocker/verse base image - that’s what we’ve been using so far. The FROM command must always be the first thing in your Dockerfile; this is the bottom crust of the pie we are baking.

Next, let’s add another layer on top of our base, in order to have gapminder pre-installed and ready to go:

RUN R -e "install.packages('gapminder', repos = 'http://cran.us.r-project.org')"

RUN commands in your Dockerfile execute shell commands to build up your image, like putting the filling in our pie. In this example, we install gapminder from the command line using install.packages, which does the same thing as if we had done install.packages('gapminder') from within RStudio. Save your Dockerfile, and return to your docker terminal; we can now build our image by doing:

docker build -t my-r-image .

-t my-r-image gives our image a name (note image names are always all lower case), and the . says all the resources we need to build this image are in our current directory. List your images via:

docker images

and you should see my-r-image in the list. Launch your new image similarly to how we launched the base image:

docker run --rm -p 8787:8787 my-r-image

Then in the RStudio terminal, try gapminder again:

library('gapminder')
gapminder

And there it is - gapminder is pre-installed and ready to go in your new docker image.

Our pie is almost complete! All we need to finish it is the topping. In addition to R packages like gapminder, we may also want some some static files inside our Docker image - such as data. We can do this using the ADD command in your Dockerfile:

ADD data/gapminder-FiveYearData.csv /home/rstudio/

Rebuild your Docker image:

docker build -t my-r-image .

And launch it again:

docker run --rm -p 8787:8787 my-r-image

Go back to RStudio in the browser, and there gapminder-FiveYearData.csv will be, present in the files visible to RStudio. In this way, we can capture files as part of our Docker image, so they’re always available along with the rest of our image in the exact same state.

Protip: Cached Layers

While building and rebuilding your Docker image in this tutorial, you may have noticed lines like this:

Step 2 : RUN R -e "install.packages('gapminder', repos = 'http://cran.us.r-project.org')"
 ---> Using cache
 ---> fa9be67b52d1

Noting that a cached version of the commands was being used. When you rebuild an image, Docker checks the previous version(s) of that image to see if the same commands were executed previously; each of those steps is preserved as a separate layer, and Docker is smart enough to re-use those layers if they are unchanged and in the same order as previously. Therefore, once you’ve got part of your setup process figured out (particularly if it’s a slow part), leave it near the top of your Dockerfile and don’t put anything above or between those lines, particularly things that change frequently; this can substantially speed up your build process.

Summary

In this lesson, we learned how to compose a Dockerfile so that we can re-create our images at will. We learned three main commands:

FROM is always at the top of a Dockerfile, and specifies the image we want to start from.
RUN runs shell commands on top of our base image, and is used for doing things like downloads and installations.
ADD adds files from our computer to our new Docker image.

The image is built by running docker build -t my-r-image . in the same directory as our Dockerfile and any files we want to include with an ADD command.

Challenge Questions

Write a Dockerfile that makes an image like the one you made at the end of Section 3: built on rocker/verse, with gapminder and gsl installed. Also add a readme to your image describing what it contains.

Find the Dockerfile of the rocker/verse image through Docker Hub.

How does having a Dockerfile compare to a README file (like the one we prepared in lesson 3)?
Docker Hub can automatically build images when you store the file on Github or Bitbucket. What do you think are the benefits of doing this instead of commiting the image directly? Could it be negative?
Find the build details of the rocker/verse image. What do they tell us?

Go to Lesson 06 Share all your analysis or back to the
main page.