A Dockerfile is a plain text file, conventionally named Dockerfile with no extension, that contains a series of instructions defining how to build a Docker image: which base image to use, which files to copy, which commands to run, and how the container should start. Writing a Dockerfile is like scripting the blueprint for a containerized application: it tells Docker exactly how to build an image, step by step.
INSTRUCTION | PURPOSE OF THE INSTRUCTION |
FROM | Sets the base image |
WORKDIR | Sets the working directory inside the container |
COPY | Copies files from host to container |
RUN | Executes commands during image build |
ENV | Sets environment variables |
EXPOSE | Documents the port the container listens on |
USER | Sets the user to run the container |
CMD | Sets the default command to run |
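To see how the instructions in the table fit together, here is a minimal sketch of a Dockerfile for a small Python application. The file names app.py and requirements.txt and the port number are assumptions made for illustration, not part of any particular project.

```dockerfile
# Base image the build starts from
FROM python:3.12-slim

# Working directory for the instructions that follow
WORKDIR /app

# Copy files from the build context on the host into the image
# (app.py and requirements.txt are assumed to exist there)
COPY . .

# Run a command at build time
RUN pip install --no-cache-dir -r requirements.txt

# Environment variable available during the build and at runtime
ENV PORT=8080

# Document the port the application listens on (this does not publish it)
EXPOSE 8080

# Run the container as an unprivileged user that already exists in the base image
USER nobody

# Default command when a container starts
CMD ["python", "app.py"]
```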
TIPS TO REMEMBER
- Minimize layers by combining commands:
RUN apt-get update && apt-get install -y curl
- Pin versions of dependencies to ensure consistent builds.
- Use .dockerignore to exclude unnecessary files (like .git, __pycache__, etc.).
- Use non-root users for better security (USER instruction).
- Keep images small by using slim base images (e.g., python:3.12-slim).
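The tips above can be combined in a single Dockerfile. The following is only a sketch: the user name appuser and the application files are hypothetical, and requirements.txt is assumed to list exact versions.

```dockerfile
# Slim base image, pinned to a specific tag
FROM python:3.12-slim

# One RUN instruction: update, install, and clean up in the same layer
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Create an unprivileged user ("appuser" is a hypothetical name)
RUN useradd --create-home appuser

WORKDIR /home/appuser/app

# requirements.txt is assumed to pin exact versions (name==version)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Paths listed in .dockerignore (such as .git or __pycache__) are left out of this copy
COPY --chown=appuser:appuser . .

# Drop root privileges for the running container
USER appuser
CMD ["python", "app.py"]
```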
ARG AND ENV
ARG is short for argument. In a Dockerfile, ARG defines a build-time variable: a value available only while the image is being built. It lets you customize image builds by passing parameters with the --build-arg flag. Unlike ENV, ARG values are not set in the environment of containers started from the image. You can use ARG to control things like base image versions or conditional logic during builds, which makes it especially useful for creating flexible and reusable Dockerfiles.
ENV stands for environment variable. The ENV instruction in a Dockerfile sets environment variables that persist in the built image and are available at runtime, so applications running inside the container can read them. You define them like ENV PORT=8080, and they are accessible as $PORT in scripts or app configs. Unlike ARG, ENV values remain in the final image. They can also be overridden when running a container using -e. This makes ENV ideal for configuration settings like ports or environment modes. Sensitive values such as API keys are better passed at runtime with -e than written into the Dockerfile, because anything set with ENV is stored in the image.
DIFFERENCE BETWEEN ARG & ENV
FEATURES | ARG (ARGUMENT) | ENV (ENVIRONMENT VARIABLE) |
SCOPE | build time only | runtime and build-time |
PERSISTENCE | Not saved as a variable in the final image | Saved in the image and available to containers |
DEFAULT VALUE | ARG VERSION=1.0 | ENV MODE=production |
OVERRIDE METHOD | --build-arg during docker build | -e flag during docker run |
USE CASE | Customize builds (e.g., base image version) | Configure app behavior (e.g., ports, modes) |
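SECURITY | Not set in the container environment, but still visible via docker history; do not use for secrets | Less secure: values remain in the image and are visible to anyone who can inspect it |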
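The difference between the two instructions is easiest to see side by side. This is a small sketch with made-up variable names (PYTHON_VERSION, APP_VERSION, MODE); the docker build and docker run commands appear only as comments.

```dockerfile
# An ARG declared before FROM can parameterize the base image tag
ARG PYTHON_VERSION=3.12
FROM python:${PYTHON_VERSION}-slim

# Build-time variable; override with:
#   docker build --build-arg APP_VERSION=2.0 .
ARG APP_VERSION=1.0

# Runtime variable; override with:
#   docker run -e MODE=staging <image>
ENV MODE=production

# Copy the build argument into an ENV if the running container needs it
ENV APP_VERSION=${APP_VERSION}

# During the build, RUN can read both ARG and ENV values
RUN echo "building version ${APP_VERSION} in ${MODE} mode"

# In the running container only ENV values are set
CMD ["sh", "-c", "echo version=$APP_VERSION mode=$MODE"]
```

Without the second ENV line, APP_VERSION would be empty when the container runs, because a plain ARG is not carried into the container environment.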
MULTI-STAGE BUILDS
Multi-stage builds in Docker let you use multiple FROM statements in a single Dockerfile to create separate build stages. This allows you to compile or build your app in one stage and copy only the final output into a clean, minimal image. It helps reduce image size, improve security, and keep Dockerfiles organized. You can name stages with FROM ... AS and selectively copy artifacts using COPY --from= followed by the stage name. It is ideal for production-ready containers without unnecessary build tools or files.
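As an illustration, here is one possible two-stage Dockerfile for a Python application. It is a sketch under a few assumptions: the project has a requirements.txt and an app.py, and the stage name builder and the /opt/venv path are arbitrary choices.

```dockerfile
# Stage 1 ("builder"): full image with compilers and headers for building packages
FROM python:3.12 AS builder
WORKDIR /app
COPY requirements.txt .
# Install dependencies into a virtual environment that can be copied as one directory
RUN python -m venv /opt/venv \
    && /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# Stage 2: slim runtime image; nothing from the builder is included unless copied
FROM python:3.12-slim
# Copy only the finished virtual environment out of the builder stage
COPY --from=builder /opt/venv /opt/venv
# Put the virtual environment's interpreter and scripts first on PATH
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . .
CMD ["python", "app.py"]

# To build and inspect only the first stage:
#   docker build --target builder -t myapp-builder .
```

Both stages use the same Python version so that the copied virtual environment still points at a valid interpreter in the final image.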
REASONS TO USE MULTI-STAGE BUILDS-
- Better Security: Fewer packages mean a smaller attack surface.
- Cleaner Dockerfiles: No need for external scripts or complex cleanup commands.
- Smaller Images: Only the essentials go into the final image; no compilers, build tools, or temp files.
- Improved Caching: Each stage can be cached independently, speeding up rebuilds.
ADVANTAGES OF USING MULTI-STAGE BUILDS-
- Improved Security: By excluding build tools and dependencies from the final image, you reduce the attack surface.
- Cleaner Dockerfiles: You can separate build logic from runtime logic, making the Dockerfile easier to read and maintain.
- Smaller Image Size: Only the necessary artifacts are copied into the final image, reducing bloat and speeding up pushes and pulls.
- Better Caching: Each stage can be cached independently, speeding up rebuilds when only part of the Dockerfile changes.
- No Need for External Scripts: You can handle complex build workflows entirely within a single Dockerfile.
DISADVANTAGES OF USING MULTI-STAGE BUILDS-
- Longer Initial Build Time: The first build may take longer due to multiple stages and dependencies being installed.
- Debugging Challenges: Troubleshooting issues across stages can be harder, especially if intermediate stages are not preserved.
- Limited Visibility: Intermediate stages are discarded unless explicitly targeted, which can make it harder to inspect build artifacts.
- Increased Complexity: Managing multiple stages and copying artifacts between them can be confusing for beginners.
- Compatibility Issues: Docker versions older than 17.05 and some third-party tools do not support multi-stage builds.
DOCKER IMAGE LAYERS
Docker images are built in layers. Instructions that change the filesystem (such as RUN, COPY, and ADD) each create a new read-only layer on top of the layers inherited from the FROM base image, while instructions such as ENV or CMD only record metadata. These layers stack on top of each other to form the final image. They are immutable and reusable across different images. Docker uses a union filesystem to present them as a single coherent view. This layered approach improves modularity and efficiency, since unchanged layers are stored and transferred only once.
FEATURES OF DOCKER IMAGE LAYERS-
- Layer per Instruction: Each filesystem-changing instruction (RUN, COPY, ADD) creates a new layer on top of the base image layers brought in by FROM.
- Immutable Structure: Each layer is read-only and cannot be changed once created, ensuring consistency across builds.
- Layer Reuse: Common base layers (like OS or language runtimes) can be reused across multiple images, saving space.
- Union Filesystem: Layers are stacked using a union filesystem, presenting a single coherent view to the container.
- Efficient Distribution: Only changed layers are downloaded when pulling updated images, reducing bandwidth usage.
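The short Dockerfile below is annotated to show roughly where layers come from. It is an illustrative sketch (app.py is a hypothetical file); running docker history on the built image lists the actual result, one row per instruction with its size.

```dockerfile
# Base image: its layers are shared by every image built from the same tag
FROM python:3.12-slim

# Metadata only: stored in the image configuration, adds no files
ENV PORT=8080

# New layer containing the files installed by apt-get
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# New layer containing only app.py
COPY app.py /app/app.py

# Metadata only: records the default command
CMD ["python", "/app/app.py"]
```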
DOCKER CACHING
Docker caching stores previously built image layers to speed up future builds. If a Dockerfile instruction and its context haven’t changed, Docker reuses the cached layer. Once a layer changes, all subsequent layers are rebuilt. This makes build times faster and more efficient. Proper layer ordering in Dockerfiles helps maximize cache reuse.
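A common way to apply this ordering rule is to copy the dependency manifest before the rest of the source. The sketch below assumes a Python project with requirements.txt and app.py.

```dockerfile
FROM python:3.12-slim
WORKDIR /app

# Dependencies change rarely, so copy only the manifest first
COPY requirements.txt .

# This expensive step is reused from cache until requirements.txt changes
RUN pip install --no-cache-dir -r requirements.txt

# Source code changes often, so copy it last;
# an edit to app.py invalidates the cache only from this instruction onward
COPY . .
CMD ["python", "app.py"]
```

If COPY . . came before the pip install step, every source edit would force the dependencies to be reinstalled.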
FEATURES OF DOCKER CACHING-
- Build Acceleration: Speeds up image builds by skipping unchanged layers, especially useful in iterative development.
- Layer Invalidation Logic: If one layer changes, all subsequent layers are rebuilt—ordering matters!
- Context Sensitivity: Cache is sensitive to changes in files, environment variables, and build arguments.
- Instruction-Level Caching: Docker caches each instruction’s result, reusing it if the command and context haven’t changed.
- Custom Cache Control: Advanced features like --no-cache, --build-arg, and BuildKit cache mounts give fine-grained control.
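The cache controls mentioned in the last bullet might be used as follows. This is a sketch that assumes BuildKit is the active builder; CACHE_BUST is a hypothetical argument name, and the docker build commands are shown only as comments.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .

# BuildKit cache mount: pip's download cache persists between builds
# without being stored in an image layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Passing a new value forces the RUN steps after this point to rebuild:
#   docker build --build-arg CACHE_BUST=2 .
ARG CACHE_BUST=1
RUN echo "cache bust value: ${CACHE_BUST}"

COPY . .
CMD ["python", "app.py"]

# To ignore the layer cache for the whole build:
#   docker build --no-cache .
```

The ARG is declared after the dependency installation so that changing it does not invalidate the cached pip install step.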