Re-usage of build cache is one of the most important things in Docker images creating.
To efficiently dockerize an app you need to split source code copying and dependencies installation in a few steps:
- Copy dependencies files.
- Install dependencies.
- Copy source code.
For a node.js application these steps look like:
COPY package.json yarn.lock ./ RUN yarn install COPY . .
However, this solution does not work with yarn workspaced application because the root
yarn.lock are not enough to install whole project dependencies.
When I faced this task the first time I thought: what if I find all nested
package.json files and copy them to a
COPY src/**/package.json src/
src/**/package.json pattern matches all
package.json’s that I need. But
COPY works as not I expected. And instead of the expected directories structure I’ve got a single file under the
Expand to see trees examples
# The original project's tree app ├── package.json ├── src │ ├── backend │ │ ├── backend.js │ │ └── package.json │ ├── notifier │ │ ├── notifier.js │ │ └── package.json │ └── scraper │ ├── package.json │ └── scraper.js └── yarn.lock # The expected tree app ├── package.json ├── src │ ├── backend │ │ └── package.json │ ├── notifier │ │ └── package.json │ └── scraper │ └── package.json └── yarn.lock # The result tree app ├── package.json ├── src │ └── package.json └── yarn.lock
For a second I thought I could replace the single pattern line with a
COPY operation for every workspace. But I wanted to have a scalable solution, a solution without duplication.
I’ve googled some alternative solutions. Commonly they suggest wrapping
docker build with a script that creates a
tmp folder, build the expected
package.json’s tree there and
COPY the folder in the image.
And the “shell solution” is much better than the previous “copy-paste” solution. But it did not make me feel pleased.
Multi-stage builds solution
At some point, I thought of multi-stage builds. I used it in another project to build a tiny production image. “What if I will prepare the tree on a first stage and copy it on a second stage?”
In addition to the root
yarn.lock files I copied the
src directory and removed all not
COPY package.json yarn.lock ./ COPY src src # Remove not "package.json" files RUN find src \! -name "package.json" \ -mindepth 2 \ -maxdepth 2 \ -print \ | xargs rm -rf
On a second stage I copied the tree and installed dependencies:
COPY --from=0 /app . RUN yarn install --frozen-lockfile --production=true
Under the hood
yarn workspaces use symlinks. So it’s important to create them after copying
COPY . . # Restore workspaces symlinks RUN yarn install --frozen-lockfile --production=true
The final solution Dockerfile
FROM node:14.15.0-alpine3.10 WORKDIR /app COPY package.json yarn.lock ./ COPY src src # Remove not "package.json" files RUN find src \! -name "package.json" -mindepth 2 -maxdepth 2 -print | xargs rm -rf FROM node:14.15.0-alpine3.10 ENV NODE_ENV production WORKDIR /app COPY --from=0 /app . RUN yarn install --frozen-lockfile --production=true COPY . . # Restore workspaces symlinks RUN yarn install --frozen-lockfile --production=true CMD ["yarn", "start"]
Join the discussion if you have any comments or suggestions