Docker GPU Passthrough for ML

Reproducibility is the quiet reason so many machine learning projects stall. A model that trained cleanly last month fails today because a driver changed, a library version drifted, or the new cloud GPU instance ships a different CUDA build. Docker with GPU passthrough fixes most of this by packaging your code, libraries, and runtime into an image that behaves the same everywhere, while still reaching the host GPU. This tutorial walks through setting that up correctly.

How GPU passthrough works in Docker

A normal container is isolated from host hardware. To use a GPU, the container needs the GPU device files and the matching user-space driver libraries exposed to it. The NVIDIA Container Toolkit handles this. It injects the GPU devices, the driver libraries, and the necessary configuration into the container at runtime, so a process inside the container can use the GPU as if it were running on the host.

One subtlety matters above all others: the GPU driver lives on the host, not in the container. The container ships the CUDA toolkit and libraries, but it relies on the host driver. This split is what makes your images portable across instances, as long as the host driver is new enough for the CUDA version inside your image.

Prepare the host instance

Start with a clean cloud GPU instance. Many providers offer images with the GPU driver and Docker preinstalled, which saves time. If you are starting from a bare image, you need three things on the host.

The NVIDIA GPU driver, installed and confirmed with the driver query tool, nvidia-smi.
Docker Engine, installed and running.
The NVIDIA Container Toolkit, which bridges Docker and the GPU.

After installing the toolkit, register it with Docker (the toolkit's nvidia-ctk runtime configure command does this) and restart the Docker daemon. Then run a quick test container that prints the GPU status with nvidia-smi. If it lists your GPU, passthrough works and you can move on.
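
The host steps above can be sketched as a short Python helper that lists the commands in order. This is a minimal sketch, assuming a systemd-based Linux host with the toolkit's nvidia-ctk command on the path. The function name host_setup_commands is hypothetical and the image tag is only an example, so substitute the base image you actually use.

```python
import shlex

# Example base image used only for the smoke test (an assumption, not a recommendation).
TEST_IMAGE = "nvidia/cuda:12.4.1-base-ubuntu22.04"


def host_setup_commands(test_image: str = TEST_IMAGE) -> list[list[str]]:
    """Return the host-side commands in the order they should run."""
    return [
        # 1. Confirm the host driver is installed and sees the GPU.
        ["nvidia-smi"],
        # 2. Register the NVIDIA runtime in Docker's daemon configuration.
        ["sudo", "nvidia-ctk", "runtime", "configure", "--runtime=docker"],
        # 3. Restart the daemon so the new configuration takes effect.
        ["sudo", "systemctl", "restart", "docker"],
        # 4. Smoke test: a throwaway container that prints the GPU status.
        ["docker", "run", "--rm", "--gpus", "all", test_image, "nvidia-smi"],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in host_setup_commands():
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try on any machine. On a real host you would run each line yourself and stop at the first one that fails.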

Run your first GPU container

The key flag is --gpus, which requests GPUs for the container. You can ask for all GPUs, a specific count, or specific devices. Inside the container, the driver query tool (nvidia-smi) should now see the hardware. Start from an official CUDA base image that matches the toolkit version your framework expects, then layer your dependencies on top.

Request all GPUs when you want the container to see every device on the host.
Request a specific number when you want to pin a job to part of the machine.
Expose specific device indexes when you share one instance across several jobs.
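
The three request styles above map onto different values of the --gpus flag. The helper below is a rough sketch of that mapping. The names gpus_argument and docker_run are hypothetical, and the command is built as an argument list (as you would pass to subprocess) rather than as a shell string.

```python
from typing import Optional, Sequence


def gpus_argument(count: Optional[int] = None,
                  indexes: Optional[Sequence[int]] = None) -> str:
    """Build the value passed to `docker run --gpus`."""
    if count is not None and indexes is not None:
        raise ValueError("pass either count or indexes, not both")
    if indexes is not None:
        if not indexes:
            raise ValueError("indexes must not be empty")
        # Docker parses the --gpus value as CSV, so a comma-separated device
        # list is wrapped in literal double quotes to stay a single field.
        return '"device=' + ",".join(str(i) for i in indexes) + '"'
    if count is not None:
        if count < 1:
            raise ValueError("count must be at least 1")
        return str(count)
    return "all"


def docker_run(image: str, command: Sequence[str], **gpu_kwargs) -> list[str]:
    """Assemble a `docker run` argument list with the chosen GPU request."""
    return ["docker", "run", "--rm", "--gpus", gpus_argument(**gpu_kwargs),
            image, *command]
```

For example, docker_run(image, ["nvidia-smi"], indexes=[0, 2]) requests only devices 0 and 2. If you type the same command in a shell, the device list needs an extra layer of single quotes around the double-quoted value.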

Build a reproducible image

Passthrough is only half the story. The other half is an image that pins everything. A reproducible Dockerfile follows a few rules.

Start from a fixed base image tag, not a moving latest tag.
Pin every library to an exact version, including the deep learning framework and CUDA-dependent packages.
Use a lockfile so dependency resolution is deterministic across builds.
Record the build inputs (base image, lockfile, and build arguments) so anyone can rebuild the identical image later.
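
Two of these rules are easy to check automatically. The sketch below flags base images without a fixed tag and requirement lines without an exact version. It assumes a pip-style requirements file, and the function names lint_dockerfile and lint_requirements are hypothetical. It is a rough check, not a full Dockerfile parser.

```python
def lint_dockerfile(text: str) -> list[str]:
    """Flag FROM lines whose base image has no tag or uses the moving `latest` tag."""
    problems = []
    stages = set()  # stage names declared with "AS <name>" on earlier FROM lines
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=... to reach the image reference.
        refs = [p for p in parts[1:] if not p.startswith("--")]
        if not refs:
            continue
        image = refs[0]
        # Earlier build stages, scratch, and digest-pinned images need no tag.
        exempt = (image.lower() in stages or image == "scratch"
                  or "@sha256:" in image)
        if len(refs) >= 3 and refs[1].upper() == "AS":
            stages.add(refs[2].lower())
        if exempt:
            continue
        # Look for the tag after the last "/" so registry ports are not mistaken for tags.
        name_tag = image.rsplit("/", 1)[-1]
        tag = name_tag.split(":", 1)[1] if ":" in name_tag else ""
        if tag in ("", "latest"):
            problems.append(f"line {lineno}: base image '{image}' is not pinned to a fixed tag")
    return problems


def lint_requirements(text: str) -> list[str]:
    """Flag requirement lines that are not pinned with `==`."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        req = line.split("#", 1)[0].strip()
        if not req or req.startswith("-"):
            continue  # blank lines, comments, and pip options such as --hash
        if "==" not in req:
            problems.append(f"line {lineno}: '{req}' is not pinned to an exact version")
    return problems
```

Running both functions in CI before the image build is one way to keep a moving tag or a loose version range from slipping into the project.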

The pairing that catches people is the framework and CUDA version. A deep learning framework build is compiled against a specific CUDA version, which in turn needs a host driver at or above a minimum version. Document that minimum driver version alongside your image so whoever provisions the instance knows what to install.
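
A documented minimum is most useful when something checks it. The sketch below compares the host driver version against a minimum you record with the image. The value of MINIMUM_DRIVER is illustrative only, so look up the real minimum for your CUDA version in NVIDIA's release notes. The function names are hypothetical.

```python
import subprocess

# Illustrative placeholder: replace with the minimum documented for your image.
MINIMUM_DRIVER = "525.60.13"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '535.104.05' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(host_driver: str, minimum_driver: str) -> bool:
    """Return True when the host driver is at or above the documented minimum."""
    host, minimum = parse_version(host_driver), parse_version(minimum_driver)
    # Pad the shorter version with zeros so '525.60' compares equal to '525.60.0'.
    width = max(len(host), len(minimum))
    host += (0,) * (width - len(host))
    minimum += (0,) * (width - len(minimum))
    return host >= minimum


def query_host_driver() -> str:
    """Ask nvidia-smi for the driver version (requires a host with the driver installed)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()[0].strip()
```

On a provisioned instance, driver_is_new_enough(query_host_driver(), MINIMUM_DRIVER) gives a yes or no answer before you pull a large image.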

Match versions across the stack

Think of the stack as three layers that must agree.

Layer	Lives where	Pin it to
GPU driver	Host instance	A version at or above the framework minimum
CUDA toolkit and libraries	Container image	The version your framework was built against
ML framework and packages	Container image	Exact versions via a lockfile

When all three agree, your image runs identically on a laptop with a small GPU and on a rented data center GPU. When they drift, you get cryptic errors about missing symbols or unsupported architectures. Keeping this table in your project README saves hours of debugging on a fresh instance.

Operational tips for cloud GPUs

A few habits make container-based ML smoother on rented hardware. Mount your dataset and output directories as volumes so results survive container restarts and you do not bake large data into images. Push your built image to a registry so any new instance pulls the exact same artifact rather than rebuilding. Keep images lean by using multi-stage builds, since a smaller image pulls faster onto a freshly provisioned GPU node and gets you training sooner.
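
The volume habit can be sketched as a small command builder. This is an illustration under assumptions: the container paths /data and /outputs, the registry name in the usage note, and the function name training_run_command are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import Sequence


def training_run_command(image: str, data_dir: str, output_dir: str,
                         command: Sequence[str]) -> list[str]:
    """Build a `docker run` argument list with data and output bind mounts."""
    for path in (data_dir, output_dir):
        # Docker treats a non-absolute source as a named volume, not a host path.
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"host path must be absolute: {path}")
    return [
        "docker", "run", "--rm", "--gpus", "all",
        # Datasets are mounted read-only so a job cannot modify them.
        "-v", f"{data_dir}:/data:ro",
        # Outputs live on the host, so they survive container restarts.
        "-v", f"{output_dir}:/outputs",
        image, *command,
    ]
```

A call such as training_run_command("registry.example.com/team/trainer:1.4.2", "/mnt/datasets", "/mnt/outputs", ["python", "train.py"]) refers to the image by its registry tag, so a fresh instance pulls the pushed artifact instead of rebuilding it.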

Share GPUs and limit resources

One instance often hosts several jobs, and Docker gives you control over how they share the hardware. You can pin a container to specific GPU indexes so two jobs never collide on the same device, which is the cleanest way to split a multi-GPU node. Finer-grained memory and compute limits depend on what the GPU and driver support rather than on Docker itself (hardware partitioning such as NVIDIA MIG is one example), so full isolation usually means dedicating whole GPUs to whole containers.

Be deliberate about it. Two memory-hungry jobs sharing one GPU will fight over memory and may crash with out-of-memory errors that look mysterious until you realize they were never isolated. When in doubt, give each serious training job its own GPU and use index pinning to enforce it.
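
One way to enforce the one job per GPU rule is to assign indexes in code and refuse to start more jobs than there are devices. The sketch below assumes you already know the GPU count of the instance. The function names assign_gpus and pinned_commands are hypothetical.

```python
from typing import Mapping, Sequence


def assign_gpus(job_names: Sequence[str], gpu_count: int) -> dict[str, int]:
    """Give each job its own GPU index, or fail if there are not enough GPUs."""
    if len(job_names) > gpu_count:
        raise ValueError(
            f"{len(job_names)} jobs requested but only {gpu_count} GPUs available"
        )
    return {name: index for index, name in enumerate(job_names)}


def pinned_commands(image: str, jobs: Mapping[str, Sequence[str]],
                    gpu_count: int) -> dict[str, list[str]]:
    """Build one `docker run` argument list per job, each pinned to one GPU."""
    assignment = assign_gpus(list(jobs), gpu_count)
    return {
        name: ["docker", "run", "--rm", "--name", name,
               "--gpus", f"device={assignment[name]}", image, *command]
        for name, command in jobs.items()
    }
```

Failing early when the jobs outnumber the GPUs is a deliberate choice here: it surfaces the conflict before launch instead of as an out-of-memory crash mid-training.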

Debug common passthrough failures

When a GPU container fails, the cause is usually one of a small set of issues. Knowing them saves time on a fresh instance.

The container cannot see any GPU: the toolkit is not installed or the --gpus flag was omitted.
Errors about missing symbols or unsupported architecture: the host driver is too old for the CUDA version in the image, or the framework build does not support this GPU's architecture.
The GPU appears but the framework ignores it: a framework build without GPU support, or a version mismatch.
Permission errors on device files: the toolkit configuration did not apply, often fixed by restarting the Docker daemon.

Work the diagnosis from the bottom up. First confirm the host sees the GPU, then confirm a plain CUDA container sees it, and only then test your full application image. Isolating which layer fails turns a vague crash into a specific, fixable problem.
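
The bottom-up order can be captured in a small checker that reports the first layer that fails. This is a sketch under assumptions: the image names and the application check command are placeholders you supply, and the names default_checks, first_failing_layer, and run_ok are hypothetical.

```python
import subprocess
from typing import Callable, Optional, Sequence

Check = tuple[str, list[str]]


def run_ok(cmd: Sequence[str]) -> bool:
    """Run a command quietly and report whether it exited with status 0."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except OSError:
        return False  # the program is not installed or cannot be started
    return result.returncode == 0


def default_checks(cuda_image: str, app_image: str,
                   app_check: Sequence[str]) -> list[Check]:
    """List the checks from the lowest layer (host) to the highest (application)."""
    gpu_run = ["docker", "run", "--rm", "--gpus", "all"]
    return [
        ("host driver", ["nvidia-smi"]),
        ("plain CUDA container", [*gpu_run, cuda_image, "nvidia-smi"]),
        ("application image", [*gpu_run, app_image, *app_check]),
    ]


def first_failing_layer(checks: Sequence[Check],
                        runner: Callable[[Sequence[str]], bool] = run_ok
                        ) -> Optional[str]:
    """Return the name of the first failing layer, or None if every check passes."""
    for name, cmd in checks:
        if not runner(cmd):
            return name
    return None
```

Because the checks stop at the first failure, a result of "plain CUDA container" tells you the host driver works and the problem sits in the toolkit or runtime setup, not in your application image. The runner parameter exists so the logic can be tested without a GPU.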

Conclusion

Docker GPU passthrough turns a flaky, machine-specific ML setup into a portable artifact. Install the NVIDIA Container Toolkit on the host, request GPUs at runtime, and build images that pin the framework, CUDA, and driver expectations explicitly. The payoff is that the same image trains the same way on your laptop, a colleague's box, and whatever cloud GPU you rent this week, which is exactly the reproducibility serious ML work depends on.
