Speeding up Bundler in dockerized environments

While Docker makes a lot of things easier, it has also introduced some new challenges. Longer wait times and added complexity when installing gems are among them. In this article we will learn how to decrease the build time by putting a cache in place.

While this article focuses on Ruby and Bundler, the techniques used are universal and are also applicable to Node modules, Python eggs, etc.

Where we start

Let’s say this is our Dockerfile:

```
FROM ruby:2.6.4
RUN gem install bundler:2.0.2
WORKDIR /usr/src/app
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY . .
```

And our docker-compose.yml:

```
version: "3.7"

services:
  app:
    image: jfahrer/myapp:development
    build:
      context: .
    volumes:
      - ./:/usr/src/app:cached
    environment:
      - POSTGRES_HOST=db

  db:
    image: "postgres:10.6-alpine"
```

We copy the Gemfile and Gemfile.lock separately before we copy the rest of the source code. This is great because thanks to the Docker build cache we only install gems when the Gemfile or Gemfile.lock have changed.

That being said, whenever we add, remove or update our gems, these files will change and we have to trigger a rebuild with docker-compose build. When the build reaches the RUN bundle install instruction, we have to wait for all gems to be reinstalled. Docker does not (and should not) care about how Ruby and Bundler work. It is just executing a shell command (bundle install) in an environment that doesn’t contain any of the gems we want to install. So depending on which and how many gems we have in our Gemfile, reinstalling all of them can take quite a while.

We don’t face this problem in non-dockerized development environments. Once a gem is installed, it is available to the Ruby processes on your local machine.

The next two sections show you how I work around this issue. But before we get there, let’s look at the current workflow for adding, updating, and removing gems:

Update the Gemfile

Generate the Gemfile.lock using bundle lock

```
docker-compose run --rm app bundle lock
```

Rebuild the image
```
docker-compose build app
```
Restart the services
```
docker-compose up -d
```

Utilizing a cache image

Thanks to Docker’s ability to copy files from other images, we can build a cache image that contains our gems and copy them over:

```
FROM ruby:2.6.4 AS base
RUN gem install bundler:2.0.2
WORKDIR /usr/src/app
# Copy the gems from a dedicated cache image
COPY --from=jfahrer/myapp:gem-cache /usr/local/bundle /usr/local/bundle
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY . .
```

The interesting part is this line:

```
COPY --from=jfahrer/myapp:gem-cache /usr/local/bundle /usr/local/bundle
```

It copies the /usr/local/bundle directory from the jfahrer/myapp:gem-cache image to our new image.

This can be a little tricky because we can’t just execute docker-compose build anymore unless our jfahrer/myapp:gem-cache image exists. It also breaks the docker-compose build --pull command if the cache image can’t be pulled from Docker Hub.

We can of course create and push the cache image to Docker Hub. However, I generally don’t like this idea too much. The cache image might require frequent updates and a process to do so. Different team members might have different requirements or want to build their own cache. I would rather give the developers control over the cache on their local system.

So instead of creating and pushing the image, let’s combine multi-stage builds and build arguments to dynamically create and use a cache as needed:

```
ARG BASE_IMAGE=ruby:2.6.4
ARG CACHE_IMAGE=${BASE_IMAGE}

# Create a build stage for the gem cache
# Ensure that /usr/local/bundle exists in case we use an empty image as cache
FROM ${CACHE_IMAGE} AS gem-cache
RUN mkdir -p /usr/local/bundle

# Create an intermediate image that has bundler installed
FROM $BASE_IMAGE AS base
RUN gem install bundler:2.0.2
WORKDIR /usr/src/app

# Copy the gems from the gem-cache build stage, install missing gems and clean up
FROM base AS gems
COPY --from=gem-cache /usr/local/bundle /usr/local/bundle
COPY Gemfile Gemfile.lock ./
RUN bundle install && bundle clean

# Copy the gems from the gems build stage and get the source code in place
FROM base AS final
COPY --from=gems /usr/local/bundle /usr/local/bundle
COPY . .
```

As you can see, our Dockerfile has four build stages and defines two build arguments: BASE_IMAGE and CACHE_IMAGE. We set BASE_IMAGE to the actual base image we want to use. CACHE_IMAGE will fall back to BASE_IMAGE in case it is not set.

Our first build stage, called gem-cache, is created from the CACHE_IMAGE and ensures that the /usr/local/bundle directory exists. We can dynamically control which image will be used as a cache by setting the CACHE_IMAGE build argument.

The next build stage base acts as an intermediate build stage that we use in the following stages. It simply installs bundler and sets the working directory.

The build stage gems is where the magic happens. In here we copy the contents of /usr/local/bundle from the gem-cache stage, that is, from our CACHE_IMAGE. If CACHE_IMAGE is not set, this directory will be empty and we won’t copy any gems. If it is set to an image that contains our gems, however, we copy those over and don’t have to reinstall them. Once the gems are copied, we copy the Gemfile and Gemfile.lock and execute bundle install && bundle clean. This way we install any missing gems that were not copied from the gem-cache. We also remove everything that was copied but is not part of our Gemfile.lock.

In our last build stage final, we copy the final state of the /usr/local/bundle directory from the gems stage and our source code.

The whole shenanigans of creating an additional build stage for running bundle install && bundle clean is important to make sure our final image doesn’t contain any layers with gems we don’t use anymore.

We can now build the image with

```
docker-compose build app
```

Docker will use our BASE_IMAGE as the cache and this will just work.

Once we have built our image, we can tag it as our cache with

```
docker image tag jfahrer/myapp:development jfahrer/myapp:gem-cache
```

And there we go! In order to utilize the cache, we can pass in the build argument:

```
docker-compose build --build-arg CACHE_IMAGE=jfahrer/myapp:gem-cache app
```

If our list of gems changes, we don’t have to wait for all gems to be reinstalled, only for the ones that are missing in our CACHE_IMAGE.
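One thing to watch out for: if we pass a CACHE_IMAGE that doesn’t exist locally, Docker will try to pull it, and the build fails if it can’t be found in a registry. Here is a minimal sketch of a helper that only adds the build argument when the cache image is present. The function name cache_build_arg is made up for this example and is not part of Docker or docker-compose; it relies on docker image inspect exiting with a non-zero status for unknown images.

```shell
#!/bin/sh
# Hypothetical helper: print the --build-arg option only when the cache image
# exists locally. `docker image inspect` exits non-zero for unknown images.
cache_build_arg() {
  cache_image="${1:-jfahrer/myapp:gem-cache}"
  if docker image inspect "$cache_image" >/dev/null 2>&1; then
    printf '%s\n' "--build-arg CACHE_IMAGE=$cache_image"
  fi
}

# Usage (not executed here). The substitution is left unquoted on purpose so
# that it splits into two words, or disappears when there is no cache yet:
#   docker-compose build $(cache_build_arg) app
```

Without a cache image the command falls back to a plain docker-compose build app, which uses BASE_IMAGE as the cache.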

To be fair, copying hundreds of gems between build stages also takes a little bit of time. But as long as our CACHE_IMAGE doesn’t change, Docker can utilize its build cache and won’t execute the COPY instruction at all! If the CACHE_IMAGE changes, Docker will have to copy the gems which is still a lot faster than installing all gems from scratch.

Tip: You can set the CACHE_IMAGE build arg in your docker-compose.yml and utilize environment substitution to set the cache image.
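A minimal sketch of this tip, assuming we keep the build argument in a docker-compose.override.yml, which docker-compose merges with docker-compose.yml by default. The helper name write_cache_override is made up for this example, and the same args entry could just as well live directly under build in docker-compose.yml. The default after :- repeats the BASE_IMAGE from our Dockerfile, because an empty CACHE_IMAGE would leave the FROM instruction without an image.

```shell
#!/bin/sh
# Hypothetical helper: write a Compose override file that forwards the
# CACHE_IMAGE environment variable to the build as a build argument.
# The quoted 'YAML' delimiter keeps the shell from expanding ${CACHE_IMAGE...};
# docker-compose performs that substitution itself when it reads the file.
write_cache_override() {
  target="${1:-docker-compose.override.yml}"
  cat > "$target" <<'YAML'
version: "3.7"

services:
  app:
    build:
      args:
        CACHE_IMAGE: "${CACHE_IMAGE:-ruby:2.6.4}"
YAML
}

# Usage (not executed here):
#   write_cache_override
#   CACHE_IMAGE=jfahrer/myapp:gem-cache docker-compose build app
```

With the file in place, a plain docker-compose build app uses ruby:2.6.4 as the cache, and exporting CACHE_IMAGE switches to our gem cache without changing the command.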

Our new workflow

If we now add, update or remove gems, we have to

Update the Gemfile

Generate the Gemfile.lock using bundle lock

```
docker-compose run --rm app bundle lock
```

Rebuild the image
```
docker-compose build app
```

Optionally: Tag the newly built image as our cache image:
```
docker image tag jfahrer/myapp:development jfahrer/myapp:gem-cache
```

Restart the services
```
docker-compose up -d
```

I usually script the initial and successive builds, cache generation, and updating gems. Check out RailsWithDocker.com if you want to learn how.
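To give an idea of what such a script could look like, here is a rough sketch that strings together the steps of the workflow above. It is not the script from RailsWithDocker.com: the function names run and update_gems and the DRY_RUN switch are assumptions made for this example, and it assumes the jfahrer/myapp:gem-cache image has already been tagged once.

```shell
#!/bin/sh
# Hypothetical wrapper around the gem update workflow. The image names follow
# the article; the function names and the DRY_RUN switch are assumptions.
IMAGE="jfahrer/myapp:development"
CACHE_IMAGE="jfahrer/myapp:gem-cache"

# Print the command; execute it only when DRY_RUN is not set to 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

update_gems() {
  # 1. Regenerate Gemfile.lock inside the container.
  run docker-compose run --rm app bundle lock || return 1
  # 2. Rebuild the image, reusing gems from the cache image.
  run docker-compose build --build-arg "CACHE_IMAGE=$CACHE_IMAGE" app || return 1
  # 3. Tag the freshly built image as the new cache.
  run docker image tag "$IMAGE" "$CACHE_IMAGE" || return 1
  # 4. Restart the services.
  run docker-compose up -d
}

# Usage (not executed here), after editing the Gemfile:
#   update_gems
#   DRY_RUN=1 update_gems   # only print the commands
```

Each step stops the function when the previous command fails, so we never tag a broken build as our cache.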

Conclusion

Utilizing an image to cache gems can save quite some time in development. The same technique can also be used in CI/CD pipelines to reduce build times.

However, there are still some drawbacks. Every time we change our Gemfile, we have to create the Gemfile.lock by running bundle lock before we can re-build the image. This is somewhat unfortunate because it differs from a non-dockerized development workflow and hence creates additional friction and mental overhead. We also have to rebuild the image every time we make changes to our gems. This means we have to rebuild the image every time we switch between branches that use a different set of gems! In the next article we will utilize a Docker Volume to create a development workflow that does not suffer from those drawbacks! Stay tuned.

Interested in more tips and tricks around dockerizing your Ruby and Rails applications? Check out RailsWithDocker.com for a complete guide.