Architect for Low Data Transfer

Compute gets all the attention on a cloud bill, but data transfer is where many teams bleed money without noticing. Egress fees, cross-region replication, and cross-availability-zone chatter accumulate quietly until they rival the cost of the GPUs themselves. The root cause is almost always the same: compute and the data it needs are too far apart. This article is about data gravity, the principle that data is heavy and expensive to move, and how to architect so that work happens next to the data instead of dragging the data to the work.

Understanding Data Gravity

Data gravity is the observation that as a dataset grows, it becomes harder and costlier to move, so applications and services tend to accumulate around it. In cloud economics this is literal: moving a large dataset out of a region or out of a provider triggers transfer charges, while reading it locally is cheap or free. The practical takeaway is that the location of your data should drive the location of your compute, not the other way around.

The Three Layers of Transfer Cost

Egress to the internet. Data leaving the provider's network, including every response served to end users. This is typically the most expensive category.
Cross-region transfer. Data moving between regions, common in replication and multi-region designs.
Cross-zone transfer. Data moving between availability zones within a region. The per-gigabyte price is low, but the charges add up quickly at scale.

Each layer is priced differently, and a system can be efficient at one while wasteful at another. A well-architected pipeline keeps as much traffic as possible in the cheapest layer.
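To make the layers concrete, here is a minimal sketch of a per-layer cost estimate. The per-gigabyte rates and the monthly volumes are hypothetical placeholders chosen for illustration, not any provider's price list; substitute the numbers from your own bill.

```python
# Hypothetical per-GB rates in USD. Real prices vary by provider,
# region pair, and volume tier, so treat these as placeholders.
RATES_PER_GB = {
    "intra_zone": 0.00,
    "cross_zone": 0.01,
    "cross_region": 0.02,
    "internet_egress": 0.09,
}


def monthly_transfer_cost(gb_by_layer: dict[str, float]) -> dict[str, float]:
    """Return the estimated cost per layer for monthly volumes given in GB."""
    return {
        layer: round(gb * RATES_PER_GB[layer], 2)
        for layer, gb in gb_by_layer.items()
    }


# Illustrative monthly volumes in GB (assumed, not measured).
volumes = {
    "intra_zone": 50_000,
    "cross_zone": 20_000,
    "cross_region": 5_000,
    "internet_egress": 2_000,
}
costs = monthly_transfer_cost(volumes)
total = round(sum(costs.values()), 2)
```

Under these assumed rates, the largest volume (intra-zone) costs nothing, while the much smaller cross-zone and egress volumes make up most of the total. That is the pattern to aim for: bulk traffic in the cheapest layer.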

Pattern One: Co-Locate Compute and Storage

The foundational move is to run your GPUs in the same region, and ideally the same zone, as the data they read. Training a model on a dataset in one region while the GPUs sit in another means every epoch pulls data across an expensive boundary. Place the compute where the data already lives. If the data must be in a particular region for residency reasons, that region also dictates where you rent GPUs.

Layout	Transfer profile
Compute and data in same zone	Cheapest, intra-zone traffic only
Compute and data in same region, different zone	Cross-zone fees on every read
Compute and data in different regions	Cross-region fees, often heavy
Compute reads data from another provider	Egress fees, usually the worst case
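The table can be expressed as a small placement check. This is a sketch under the assumption that each resource is described by a provider, region, and zone; the `Location` type, the function name, and the example names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    provider: str
    region: str
    zone: str


def transfer_boundary(compute: Location, data: Location) -> str:
    """Name the most expensive boundary a read from data to compute crosses."""
    if compute.provider != data.provider:
        return "internet_egress"
    if compute.region != data.region:
        return "cross_region"
    if compute.zone != data.zone:
        return "cross_zone"
    return "intra_zone"


# Hypothetical provider, region, and zone names.
dataset = Location("provider-a", "region-1", "zone-a")
gpus_same_zone = Location("provider-a", "region-1", "zone-a")
gpus_other_zone = Location("provider-a", "region-1", "zone-b")
gpus_other_region = Location("provider-a", "region-2", "zone-a")
gpus_other_provider = Location("provider-b", "region-1", "zone-a")
```

A check like this could run in a deployment pipeline to warn when a job is scheduled across a boundary that its dataset does not require.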

Pattern Two: Push Computation to the Data

Instead of pulling raw data to a central compute cluster, move the processing to where the data sits. Pre-filter, aggregate, or sample at the storage tier so only the small, relevant result travels. If a job needs a tenth of a dataset, selecting that tenth near the source and transferring only it beats shipping the whole thing and discarding most of it. The same logic applies to inference: serving from the region closest to your users avoids dragging requests and responses across the globe.
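The difference between the two orders of operation can be sketched with an in-memory stand-in for the storage tier. The record shape, the predicate, and the use of JSON size as a proxy for bytes on the wire are all assumptions made for illustration.

```python
import json


def serialized_size(records: list[dict]) -> int:
    """Approximate bytes on the wire as the JSON-encoded size of the records."""
    return sum(len(json.dumps(r).encode("utf-8")) for r in records)


def pull_then_filter(source: list[dict], predicate) -> tuple[list[dict], int]:
    """Ship everything across the boundary, then discard what is not needed."""
    moved = serialized_size(source)
    return [r for r in source if predicate(r)], moved


def filter_then_pull(source: list[dict], predicate) -> tuple[list[dict], int]:
    """Select next to the data, then ship only the selected records."""
    selected = [r for r in source if predicate(r)]
    return selected, serialized_size(selected)


def is_cat(record: dict) -> bool:
    return record["label"] == "cat"


# Hypothetical dataset in which one record in ten is relevant to the job.
source = [{"id": i, "label": "cat" if i % 10 == 0 else "dog"} for i in range(1000)]

result_a, moved_a = pull_then_filter(source, is_cat)
result_b, moved_b = filter_then_pull(source, is_cat)
```

Both functions return the same records; only the number of bytes that cross the boundary differs. In a real system the near-data step would be a query pushed down to the storage service or a small job running in the data's region.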

Pattern Three: Cache and Stage Strategically

When data must cross a boundary, do it once and reuse the result. Staging a dataset into local fast storage at the start of a multi-run experiment turns repeated cross-region reads into a single transfer plus many cheap local reads. The same applies to model weights, container images, and reference datasets. A read-through cache near the compute can collapse thousands of remote fetches into a handful.

Identify data that is read repeatedly across jobs.
Stage it once into storage co-located with compute.
Point all jobs at the local copy.
Refresh the staged copy only when the source changes.
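The four steps above can be sketched with local files standing in for remote and co-located storage. The `stage_once` helper and the version-token convention are hypothetical; in practice the token might be an object version or checksum reported by the storage service, which lets you detect changes without re-reading the source.

```python
import shutil
import tempfile
from pathlib import Path


def stage_once(source: Path, local_dir: Path, source_version: str) -> tuple[Path, bool]:
    """Copy source into local_dir unless the staged copy already matches
    source_version. Returns the local path and whether a copy happened."""
    local_dir.mkdir(parents=True, exist_ok=True)
    local_copy = local_dir / source.name
    marker = local_dir / (source.name + ".version")
    if local_copy.exists() and marker.exists() and marker.read_text() == source_version:
        return local_copy, False  # reuse the staged copy, no transfer
    shutil.copyfile(source, local_copy)  # the single expensive transfer
    marker.write_text(source_version)
    return local_copy, True


# Demonstration with temporary directories standing in for the two sides.
with tempfile.TemporaryDirectory() as tmp:
    remote = Path(tmp) / "remote" / "dataset.bin"
    remote.parent.mkdir()
    remote.write_bytes(b"training data v1")
    staging = Path(tmp) / "staging"

    _, first_copied = stage_once(remote, staging, "v1")
    _, second_copied = stage_once(remote, staging, "v1")

    remote.write_bytes(b"training data v2")
    local_path, third_copied = stage_once(remote, staging, "v2")
    staged_bytes = local_path.read_bytes()
```

The first call transfers the data, the second reuses the staged copy, and the third refreshes it because the version token changed.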

Pattern Four: Mind the Egress on Outputs

Transfer is not only about inputs. Generated artifacts, model checkpoints, logs shipped to external tools, and responses sent to users can all leave the provider's network and incur egress. Compress outputs, batch log shipping, and serve user-facing content through a content delivery network so repeated requests hit cached copies at the edge rather than your origin. For large model artifacts, keep them in the provider where they will be consumed rather than copying them everywhere by default.
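As one illustration of compressing and batching, the sketch below packs many log records into a single gzip-compressed payload instead of shipping each record separately. The record fields and function names are hypothetical, and the actual upload step is left out.

```python
import gzip
import json


def pack_log_batch(records: list[dict]) -> bytes:
    """Serialize records as newline-delimited JSON and gzip the whole batch."""
    payload = "\n".join(json.dumps(r, separators=(",", ":")) for r in records)
    return gzip.compress(payload.encode("utf-8"))


def unpack_log_batch(blob: bytes) -> list[dict]:
    """Inverse of pack_log_batch, for the receiving side."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


# Hypothetical, highly repetitive log records, as application logs often are.
records = [
    {"level": "INFO", "service": "trainer", "step": i, "msg": "checkpoint saved"}
    for i in range(500)
]
blob = pack_log_batch(records)
uncompressed_bytes = sum(len(json.dumps(r).encode("utf-8")) for r in records)
```

Repetitive logs compress well, so the batch that leaves the network is much smaller than the sum of the individual records, and one request replaces hundreds.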

Designing the System End to End

Put the patterns together into a simple discipline. Decide where the data must live first, driven by residency, source, and size. Place compute in that location. Process near the source so only small results move. Cache anything read more than once. Watch the transfer on outputs as carefully as the transfer on inputs. The architecture that results keeps the bulk of traffic in the cheapest possible layer and reserves expensive movement for the rare cases that genuinely require it.

Content Delivery and the Edge

For user-facing traffic, the most effective transfer reduction often happens outside your own infrastructure entirely, at the edge. A content delivery network places copies of your static and cacheable content in points of presence around the world, so a user in one region is served from a nearby edge node rather than from your origin across an expensive boundary. The first request populates the edge cache, and every subsequent request for the same content is served locally, collapsing repeated origin egress into a single transfer.

The same edge thinking applies to API responses and even some model outputs. Anything that is identical across users and changes slowly is a candidate for edge caching, which both lowers your egress and speeds up the experience. Set cache headers deliberately so the edge knows what it may store and for how long, and version your assets so a content change invalidates cleanly rather than serving stale copies. For dynamic, per-user content the edge cannot help directly, but you can still terminate connections close to the user and keep the long-haul path internal to the provider's backbone, which is usually cheaper and faster than routing user traffic across the public internet to a distant origin.
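Here is a minimal sketch of deliberate cache headers and versioned asset names. The content categories, lifetimes, and helper names are assumptions for illustration; the `Cache-Control` directives themselves are standard HTTP.

```python
import hashlib


def versioned_name(filename: str, content: bytes) -> str:
    """Embed a content hash in the filename so changed content gets a new URL."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"


def cache_headers(kind: str) -> dict[str, str]:
    """Pick Cache-Control by content category (categories are hypothetical)."""
    if kind == "versioned_asset":
        # The name changes whenever the content does, so cache for a year.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    if kind == "shared_slow_changing":
        # Identical across users: short browser lifetime, longer at the edge.
        return {"Cache-Control": "public, max-age=60, s-maxage=3600"}
    # Per-user or sensitive responses must not be stored by shared caches.
    return {"Cache-Control": "private, no-store"}


old_name = versioned_name("app.js", b"console.log('v1');")
new_name = versioned_name("app.js", b"console.log('v2');")
```

Because the hash is part of the name, publishing new content produces a new URL and the old cached copy is simply never requested again, which avoids explicit invalidation.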

Measuring and Catching Drift

Transfer waste tends to creep back in as systems evolve. A new service reads from the wrong region, a replication rule fans out more widely than needed, a debugging log starts shipping verbose data to an external endpoint. Instrument transfer cost by category and region, and review it on the same cadence as compute. When a transfer line item grows, trace it back to the boundary it crosses and ask whether the compute could move closer to the data instead.
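A drift check along these lines could be run against each billing export. The line-item fields, the 25 percent threshold, and the sample figures are hypothetical; adapt them to whatever your provider's cost report actually contains.

```python
from collections import defaultdict


def totals_by_boundary(line_items: list[dict]) -> dict[tuple[str, str], float]:
    """Sum cost per (transfer category, region) pair."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for item in line_items:
        totals[(item["category"], item["region"])] += item["cost_usd"]
    return dict(totals)


def flag_growth(previous: dict, current: dict, threshold: float = 0.25) -> list:
    """Return (key, growth) for entries that grew by more than threshold.
    Growth is None for entries that did not exist in the previous period."""
    flagged = []
    for key in sorted(current):
        before = previous.get(key, 0.0)
        now = current[key]
        if before == 0.0:
            if now > 0.0:
                flagged.append((key, None))
        elif (now - before) / before > threshold:
            flagged.append((key, round((now - before) / before, 2)))
    return flagged


# Hypothetical billing line items for two consecutive months.
last_month = totals_by_boundary([
    {"category": "cross_region", "region": "us-east", "cost_usd": 60.0},
    {"category": "cross_region", "region": "us-east", "cost_usd": 40.0},
    {"category": "cross_zone", "region": "us-east", "cost_usd": 40.0},
])
this_month = totals_by_boundary([
    {"category": "cross_region", "region": "us-east", "cost_usd": 150.0},
    {"category": "cross_zone", "region": "us-east", "cost_usd": 42.0},
    {"category": "internet_egress", "region": "eu-west", "cost_usd": 30.0},
])
alerts = flag_growth(last_month, this_month)
```

Each flagged entry names the boundary and region to investigate, which is the starting point for asking whether the compute could move closer to the data.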

Egress and cross-region fees are some of the least glamorous numbers on a cloud bill and some of the most rewarding to attack, because the fix is architectural and durable rather than a recurring manual chore. Respect data gravity, keep compute close to data, and move only what you must. The result is a system that is both cheaper and, as a happy side effect, usually faster, because the shortest data path is almost always the quickest one too.
