GPU Cloud Glossary: 40 Terms Every Buyer Should Know

Jun 20, 2026

A beginner-friendly glossary of 40 essential GPU cloud terms covering hardware, networking, pricing, and operations to help buyers read quotes and specs.

GPU cloud quotes and datasheets are dense with jargon. Before you can compare providers fairly, you need to read the language. This glossary defines 40 terms that show up again and again across hardware specs, networking diagrams, pricing pages, and billing dashboards. It is organized into four groups so you can skim to the part you need. Keep it open the next time you read a provider quote, and the numbers will start to make sense.

Hardware and accelerators

  • GPU: graphics processing unit, the parallel accelerator that runs AI and high-performance compute workloads.
  • HBM: high-bandwidth memory stacked on the GPU package, offering far more bandwidth than ordinary memory. Versions include HBM2e, HBM3, and HBM3e.
  • VRAM: the GPU's onboard memory, which limits the size of the model and batch you can hold.
  • Tensor core: specialized GPU unit that accelerates the matrix math behind deep learning.
  • FP32, FP16, BF16, FP8: floating-point formats that use 32, 16, 16, and 8 bits per value. Fewer bits trade precision for speed and a smaller memory footprint, so lower precision usually runs faster.
  • FLOPS: floating point operations per second, a raw measure of compute throughput.
  • TDP: thermal design power, the maximum sustained heat, in watts, that a chip's cooling is designed to remove. It influences power draw, rack density, and cost.
  • SXM: a GPU board form factor that supports NVLink and high power, used in dense server nodes.
  • PCIe: the standard bus connecting GPUs to the host, slower than NVLink for GPU-to-GPU traffic.
  • ECC: error-correcting code memory, which detects and corrects bit errors, important for reliable training.
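
The link between numeric format and VRAM can be made concrete with a little arithmetic. The sketch below is a rough estimate only: it counts model weights at the byte width of each format and reserves an assumed 20% of VRAM as headroom for activations and other runtime state. The function names and the headroom figure are illustrative assumptions, not a provider's sizing rule.

```python
# Bytes per value for the numeric formats named in the glossary.
BYTES_PER_VALUE = {"FP32": 4, "FP16": 2, "BF16": 2, "FP8": 1}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Memory needed for the model weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_VALUE[fmt] / 1e9


def fits_in_vram(num_params: float, fmt: str, vram_gb: float,
                 headroom: float = 0.2) -> bool:
    """True if the weights fit while leaving `headroom` of VRAM free.

    The headroom fraction is an assumed allowance for activations,
    caches, and framework overhead; real needs vary by workload.
    """
    return weight_memory_gb(num_params, fmt) <= vram_gb * (1 - headroom)


if __name__ == "__main__":
    params = 7e9  # a hypothetical 7-billion-parameter model
    for fmt in BYTES_PER_VALUE:
        size = weight_memory_gb(params, fmt)
        verdict = "fits" if fits_in_vram(params, fmt, vram_gb=24) else "does not fit"
        print(f"{fmt}: {size:.1f} GB of weights, {verdict} in 24 GB")
```

Training usually needs several times more memory than this, because gradients and optimizer state sit alongside the weights, so treat the result as a lower bound.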

GPU models and families

  • H100: NVIDIA Hopper datacenter GPU widely used for training and inference.
  • H200: a Hopper variant with more and faster HBM3e memory for memory-heavy work.
  • A100: the prior-generation Ampere datacenter GPU, still common and cost-effective.
  • B200: a newer Blackwell-generation GPU aimed at large-scale AI.
  • MI300X: AMD's datacenter accelerator competing in the AI space, notable for large memory.
  • RTX 4090: a high-end consumer GPU offered cheaply by some clouds for single-GPU work.
  • MIG: multi-instance GPU, splitting one physical GPU into isolated smaller slices.

Networking and interconnect

  • NVLink: NVIDIA's high-speed direct link between GPUs, far faster than PCIe.
  • NVSwitch: a switch fabric letting every GPU in a node reach every other at full NVLink bandwidth.
  • InfiniBand: a low-latency, high-bandwidth network fabric common in GPU clusters.
  • RDMA: remote direct memory access, moving data between machines with minimal CPU overhead.
  • RoCE: RDMA over Converged Ethernet, bringing RDMA benefits to Ethernet networks.
  • East-west traffic: data moving between nodes inside the cluster, critical for distributed training.
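  • Bisection bandwidth: the total bandwidth available between two equal halves of the cluster network, measured across the worst-case split. It indicates how well traffic that spans the whole cluster will scale.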
  • Topology: the physical arrangement of GPUs and links, which shapes scaling efficiency.
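
To see why interconnect bandwidth matters for distributed training, it helps to estimate how long one gradient synchronization takes. The sketch below uses the standard bandwidth-only cost model for a ring all-reduce, in which each GPU sends and receives about 2 × (N − 1) / N times the payload. It ignores latency, overlap with compute, and topology effects, and the bandwidth figures in the example are placeholders rather than vendor specifications.

```python
def ring_allreduce_seconds(payload_gb: float, num_gpus: int,
                           link_gb_per_s: float) -> float:
    """Bandwidth-only estimate of one ring all-reduce.

    payload_gb     size of the data being reduced (for example, gradients)
    num_gpus       number of GPUs taking part
    link_gb_per_s  usable per-GPU link bandwidth in GB/s (an input you
                   should take from a measurement or a quote)
    """
    if num_gpus < 2:
        return 0.0  # nothing to synchronize with a single GPU
    traffic_gb = 2 * (num_gpus - 1) / num_gpus * payload_gb
    return traffic_gb / link_gb_per_s


if __name__ == "__main__":
    # Placeholder bandwidths chosen only to show the effect of a slower link.
    for label, bandwidth in [("fast link", 400.0), ("slow link", 25.0)]:
        t = ring_allreduce_seconds(payload_gb=10, num_gpus=8,
                                   link_gb_per_s=bandwidth)
        print(f"{label}: {t:.3f} s per all-reduce")
```

If that time is a large share of each training step, the cluster will scale poorly no matter how fast the individual GPUs are.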

Pricing, billing, and operations

  • On-demand: pay-as-you-go pricing with no commitment and the highest hourly rate.
  • Reserved: committed capacity for a term in exchange for a lower rate.
  • Spot or preemptible: cheap capacity the provider can reclaim with little notice.
  • Egress: the cost of moving data out of a provider's network.
  • Ingress: data coming into the provider, usually free.
  • Object storage: scalable storage for datasets and checkpoints, billed by volume and requests.
  • Block storage: persistent disk volumes attached to instances.
  • Cold start: the time from requesting a GPU to having it ready to run work.
  • Provisioning: the process of allocating and configuring an instance before use.
  • Utilization: the share of time a rented GPU is actually doing useful work.
  • Idle time: hours when a GPU is rented but not working. They are billed like any other hour but produce nothing.
  • Quota: the cap a provider places on how many GPUs you can launch.
  • Region: a geographic location where the provider runs hardware.
  • Availability zone: an isolated location within a region for fault tolerance.
  • Neocloud: a provider specialized in GPU compute, often pricing below hyperscalers.
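
Utilization and idle time translate directly into money. As a minimal sketch, the effective price of a useful GPU hour is the billed rate divided by utilization, and the cost of idle time is the billed spend multiplied by the idle share. The function names and the example figures are hypothetical.

```python
def effective_rate(hourly_rate: float, utilization: float) -> float:
    """Price paid per hour of useful work, given utilization in (0, 1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return hourly_rate / utilization


def idle_cost(hourly_rate: float, billed_hours: float,
              utilization: float) -> float:
    """Spend attributable to hours that were billed but not used."""
    if not 0 <= utilization <= 1:
        raise ValueError("utilization must be in the range [0, 1]")
    return hourly_rate * billed_hours * (1 - utilization)


if __name__ == "__main__":
    rate = 2.00   # hypothetical price per GPU hour
    hours = 720   # roughly one month of continuous rental
    for util in (0.9, 0.5):
        print(f"utilization {util:.0%}: "
              f"effective rate {effective_rate(rate, util):.2f}/h, "
              f"idle cost {idle_cost(rate, hours, util):.2f}")
```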

How to use this glossary when buying

When you read a quote, work through it in layers. First check the GPU model and its HBM capacity, since that sets what workloads fit. Next check the interconnect, since NVLink and InfiniBand decide multi-GPU scaling. Then read the pricing model and confirm whether the rate is on-demand, reserved, or spot. Finally look for the hidden costs: egress, storage, and any quota or minimum commitment. A provider that looks cheap per GPU hour can become expensive once egress and idle time are counted.

Layer | Key terms to check
What fits | GPU model, HBM, VRAM, MIG
How it scales | NVLink, NVSwitch, InfiniBand, topology
What it costs | On-demand, reserved, spot, egress, storage
How fast you start | Cold start, provisioning, quota, region
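
The layered reading above can be turned into a simple side-by-side calculation. The sketch below adds egress and storage to the compute bill for two made-up quotes. Every price, field name, and usage figure is a hypothetical placeholder; real quotes often include further line items such as minimum commitments or request fees.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    name: str
    gpu_hour_rate: float         # price per GPU per hour
    egress_per_gb: float         # price per GB moved out of the provider
    storage_per_gb_month: float  # price per GB stored for a month


def monthly_cost(quote: Quote, gpus: int, hours: float,
                 egress_gb: float, storage_gb: float) -> float:
    """Compute plus egress plus storage for one month of usage."""
    compute = quote.gpu_hour_rate * gpus * hours
    egress = quote.egress_per_gb * egress_gb
    storage = quote.storage_per_gb_month * storage_gb
    return compute + egress + storage


if __name__ == "__main__":
    # Two invented quotes: A is cheaper per GPU hour but charges for egress.
    quotes = [
        Quote("A", gpu_hour_rate=2.00, egress_per_gb=0.10,
              storage_per_gb_month=0.02),
        Quote("B", gpu_hour_rate=2.20, egress_per_gb=0.00,
              storage_per_gb_month=0.02),
    ]
    for q in quotes:
        total = monthly_cost(q, gpus=8, hours=500,
                             egress_gb=20_000, storage_gb=5_000)
        print(f"Quote {q.name}: {total:,.2f} per month")
```

With these placeholder numbers, the quote with the lower GPU-hour rate ends up costing more once egress is included, which is the pattern to watch for.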

Terms that trip up new buyers

A handful of these terms cause the most confusion and the most expensive mistakes, so they are worth a closer look.

  • VRAM versus system RAM: these are different pools. A GPU can have ample VRAM while the host is starved of system RAM, which bottlenecks data loading. Read both numbers on a quote.
  • On-demand versus spot: the price gap is large, but spot capacity can vanish mid-job. The deciding factor is whether your work checkpoints and can resume. If it can, spot is usually the better buy.
  • Egress versus ingress: pushing data in is typically free, pulling it out is not. Teams that move large datasets or artifacts repeatedly get surprised here, so estimate egress before you commit to a provider.
  • FLOPS versus real throughput: peak FLOPS is a theoretical ceiling. Memory bandwidth, interconnect, and software efficiency mean your actual throughput is often well below the headline number. Benchmark on your own workload.
  • Cold start versus steady-state cost: a cheap hourly rate means little if provisioning is slow and you over-provision to compensate. Factor time-to-ready into the true cost.
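
The on-demand versus spot decision can be sketched with a simple expected-cost model. It assumes that each interruption loses, on average, half a checkpoint interval of work plus a fixed restart time, and that the number of interruptions is known in advance. Both are simplifying assumptions, and the rates below are invented for illustration.

```python
def on_demand_job_cost(work_hours: float, hourly_rate: float) -> float:
    """Cost of a job that runs uninterrupted at the on-demand rate."""
    return work_hours * hourly_rate


def spot_job_cost(work_hours: float, spot_rate: float, interruptions: int,
                  checkpoint_interval_h: float, restart_h: float) -> float:
    """Estimated cost of the same job on spot capacity.

    Each interruption is assumed to waste half a checkpoint interval
    (work done since the last checkpoint) plus the time to restart.
    """
    lost_per_interruption = checkpoint_interval_h / 2 + restart_h
    billed_hours = work_hours + interruptions * lost_per_interruption
    return spot_rate * billed_hours


if __name__ == "__main__":
    work = 100.0  # hours of useful compute the job needs
    print("on-demand:", on_demand_job_cost(work, hourly_rate=2.50))
    print("spot, hourly checkpoints:",
          spot_job_cost(work, spot_rate=1.00, interruptions=4,
                        checkpoint_interval_h=1.0, restart_h=0.25))
    print("spot, checkpoints every 50 h:",
          spot_job_cost(work, spot_rate=1.00, interruptions=4,
                        checkpoint_interval_h=50.0, restart_h=0.25))
```

The wider the checkpoint interval, the more work each interruption destroys, so infrequent checkpoints erode the spot discount quickly.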

Putting the vocabulary to work

Knowing the words is the start; using them to interrogate a quote is the payoff. When a provider advertises a GPU, ask which HBM generation it uses and how much. When they advertise a cluster, ask about NVLink within the node and InfiniBand or RoCE between nodes. When they advertise a low price, ask whether it is on-demand, reserved, or spot, and what egress and storage will add. Each question maps to a term in this glossary, and together they turn a glossy marketing page into a comparison you can actually trust.

Vocabulary is the foundation of smart purchasing. Once these 40 terms feel familiar, provider comparisons stop being a wall of acronyms and become a clear, side-by-side decision. Bookmark this glossary, and revisit it whenever a new spec or pricing line leaves you guessing.