NVIDIA A100 GPU Price Guide (2025) - Cloud Rental & Purchase Costs

Vishnu Subramanian
Founder @JarvisLabs.ai

The NVIDIA A100 is available to rent on JarvisLabs starting from competitive hourly rates, or $5,000-$20,000 to purchase depending on configuration and condition. Check our pricing page for current rates. Industry reports in 2024 indicated NVIDIA was winding down A100 production, but the GPU remains widely available through existing inventory and cloud providers.

Quick Price Reference

| Option | A100 40GB | A100 80GB |
|---|---|---|
| Purchase (PCIe) | $8,000-$10,000 | $9,500-$14,000 |
| Purchase (SXM) | $12,000-$15,000 | $18,000-$20,000 |
| Used/Refurbished | $5,000-$8,000 | $7,000-$12,000 |

These are street prices observed via resellers. Actual pricing varies by region, warranty status, and seller. For cloud rental rates, check our pricing page.

Cloud GPU Rental

Renting A100s makes more sense than purchasing for most teams. You skip the upfront capital, avoid maintenance overhead, and don't take on depreciation risk. When you're done with a job, you stop paying.
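
The rent-vs-buy tradeoff comes down to utilization. A rough sketch of the break-even point, using entirely illustrative numbers (the $10,000 card price, $1.50/hr rental rate, and $1,200/yr power-and-hosting overhead are assumptions, not quoted rates):

```python
def breakeven_hours(purchase_price: float, hourly_rate: float,
                    yearly_overhead: float = 0.0, lifetime_years: float = 3.0) -> float:
    """Hours of rental at which total rental cost equals total ownership cost."""
    total_ownership = purchase_price + yearly_overhead * lifetime_years
    return total_ownership / hourly_rate

hours = breakeven_hours(purchase_price=10_000, hourly_rate=1.50, yearly_overhead=1_200)
print(f"Break-even at ~{hours:,.0f} GPU-hours "
      f"(~{hours / (3 * 8760):.0%} utilization over 3 years)")
```

Unless you expect to keep the card busy a large fraction of the time for years, renting usually wins.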

JarvisLabs offers A100 instances with minute-level billing, so you pay for actual usage rather than rounding up to the nearest hour. Instances spin up in under 90 seconds (often faster for vanilla templates), and your workspace volume persists between sessions. There's no long-term commitment required.

Check our pricing page for current A100 rates and other GPU options.

A100 Purchase Prices (2025)

If you need to own hardware, here's what the market looks like:

New Hardware

| Configuration | Street Price | Notes |
|---|---|---|
| A100 40GB PCIe | $8,000-$10,000 | Standard datacenter card |
| A100 80GB PCIe | $9,500-$14,000 | Higher memory bandwidth |
| A100 40GB SXM | $12,000-$15,000 | Requires HGX baseboard |
| A100 80GB SXM | $18,000-$20,000 | Maximum performance variant |
| DGX A100 (8x GPUs) | Starts at $199,000 | Complete turnkey system |

Used Market

With enterprises upgrading to H100 and H200, plenty of A100s are hitting the secondary market:

| Condition | 40GB Price | 80GB Price |
|---|---|---|
| Certified Refurbished | $6,000-$8,000 | $8,000-$12,000 |
| Used (Good Condition) | $5,000-$7,000 | $7,000-$10,000 |

Before buying used, ask for serial number and warranty status, request datacenter pull documentation if available, and plan to run burn-in and memory tests yourself (DCGM diagnostics or stress tests). Not all sellers will have detailed usage history, so verification on your end matters.

A100 40GB vs 80GB

The 80GB variant has double the memory and meaningfully higher bandwidth, which matters for memory-bound workloads.

Specifications

| Specification | A100 40GB PCIe | A100 80GB PCIe | A100 80GB SXM |
|---|---|---|---|
| Memory | 40GB HBM2 | 80GB HBM2e | 80GB HBM2e |
| Memory Bandwidth | Up to 1,555 GB/s | Up to 1,935 GB/s | Up to 2,039 GB/s |
| CUDA Cores | 6,912 | 6,912 | 6,912 |
| Tensor Cores | 432 (3rd gen) | 432 (3rd gen) | 432 (3rd gen) |
| TDP | 250W | 300W | 400W |
| FP32 | 19.5 TFLOPS | 19.5 TFLOPS | 19.5 TFLOPS |
| FP16/BF16 (dense) | 312 TFLOPS | 312 TFLOPS | 312 TFLOPS |

When to Choose Each

The 40GB variant handles most common AI workloads. It works well for LoRA and QLoRA fine-tuning of 7B-13B parameter models, inference on quantized models, and any workload that fits comfortably in 40GB of VRAM.

The 80GB makes sense when you're training or fine-tuning models in the 13B-65B range, running multiple models simultaneously, working with large batch sizes for inference throughput, or when memory bandwidth is limiting your performance. If you're doing serious LLM work, the extra memory and bandwidth usually justify the price difference.
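
A back-of-envelope memory estimate makes the 40GB/80GB decision concrete. The ~16 bytes/parameter rule of thumb below (fp16 weights + fp16 gradients + fp32 master weights + fp32 Adam moments) and the flat activation multiplier are rough assumptions; real usage depends on batch size, context length, and framework:

```python
def training_vram_gb(params_billions: float, bytes_per_param: float = 16.0,
                     activation_overhead: float = 1.2) -> float:
    """Back-of-envelope VRAM for full fine-tuning with AdamW in mixed precision.

    ~16 bytes/param covers fp16 weights (2) + fp16 grads (2) + fp32 master
    weights (4) + fp32 Adam moments (8). Activations vary wildly with batch
    size and sequence length, so they're folded into a crude multiplier.
    """
    return params_billions * bytes_per_param * activation_overhead

for size in (1, 3, 7):
    gb = training_vram_gb(size)
    verdict = "fits 80GB" if gb <= 80 else "needs multi-GPU or offloading"
    print(f"{size}B full fine-tune: ~{gb:.0f} GB -> {verdict}")
```

This is why full fine-tuning of even a 7B model pushes past a single 80GB card without optimizer offloading, while parameter-efficient methods fit comfortably.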

A100 vs H100

Specs Comparison

| Metric | A100 80GB SXM | H100 80GB SXM |
|---|---|---|
| CUDA Cores | 6,912 | 16,896 |
| Tensor Cores | 432 (3rd gen) | 528 (4th gen) |
| Memory Bandwidth | 2.0 TB/s | 3.35 TB/s |
| FP16 Tensor (dense) | 312 TFLOPS | 989 TFLOPS |
| FP16 Tensor (with sparsity) | 624 TFLOPS | 1,979 TFLOPS |

NVIDIA typically publishes Tensor Core peaks with sparsity enabled. Dense performance is roughly half.

Real-World Performance

The H100 is often materially faster than A100, commonly 1.5-3x in many LLM inference setups. But results vary significantly based on precision (FP16 vs FP8), model architecture, batch size, sequence length, and framework optimizations.

For training, speedups depend on whether your workload is compute-bound or memory-bound. The H100's advantage is largest with FP8 inference and transformer-heavy workloads where its dedicated Transformer Engine shines.

Which One Makes Sense?

The key metric is cost per completed task, not cost per hour.

For workloads where H100 delivers significant speedups, faster completion can offset the higher hourly rate. If a training run takes 10 hours on A100 but only 4 hours on H100, the total cost might be similar despite H100's higher rate. For inference, higher throughput means more tokens per dollar.
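
The arithmetic is simple enough to sketch. The rates ($1.50/hr and $3.50/hr) and the 2.5x speedup below are illustrative assumptions, not quoted prices:

```python
def job_cost(hours: float, hourly_rate: float) -> float:
    """Total cost of one training run or batch job."""
    return hours * hourly_rate

a100_hours, a100_rate = 10.0, 1.50   # hypothetical A100 run
h100_speedup, h100_rate = 2.5, 3.50  # hypothetical H100 speedup and rate

h100_hours = a100_hours / h100_speedup
print(f"A100: {a100_hours:.1f} h x ${a100_rate:.2f}/h = ${job_cost(a100_hours, a100_rate):.2f}")
print(f"H100: {h100_hours:.1f} h x ${h100_rate:.2f}/h = ${job_cost(h100_hours, h100_rate):.2f}")
```

With these numbers the H100 run costs slightly less despite a rate more than twice as high; flip the speedup to 1.5x and the A100 wins. Measure your own workload before deciding.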

A100 makes sense for batch processing, experimentation, budget-conscious production, and workloads where A100 performance is sufficient. H100 makes sense for latency-sensitive inference, training from scratch, FP8 workloads, and scenarios where you're optimizing for time rather than cost.

Check our pricing page to compare current rates.

Full Technical Specifications

| Specification | Value |
|---|---|
| Architecture | NVIDIA Ampere (GA100) |
| Manufacturing Process | TSMC 7nm |
| Transistors | 54.2 billion |
| CUDA Cores | 6,912 |
| Tensor Cores | 432 (3rd generation) |
| Memory | 40GB HBM2 or 80GB HBM2e |
| Memory Interface | 5120-bit |
| L2 Cache | 40MB |
| TDP | 250W (40GB PCIe) / 300W (80GB PCIe) / 400W (SXM) |
| Form Factors | PCIe, SXM4 |
| NVLink | 3rd generation, 600 GB/s |
| PCIe | Gen 4, 64 GB/s |
| MIG Support | Up to 7 instances |

Best Use Cases

LLM Fine-tuning

A100 40GB is commonly used for parameter-efficient fine-tuning (LoRA, QLoRA) of 7B-13B class models. Exact limits depend on context length, batch size, precision, and whether you're using optimizer offloading. The 80GB variant extends this to larger models and enables full fine-tuning of smaller models where 40GB would require memory optimization tricks.
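
To see why QLoRA fits so comfortably on a 40GB card, here is a very rough footprint estimate. The 1% trainable-parameter fraction and the flat 8GB activation allowance are assumptions for illustration; real usage depends on LoRA rank, context length, and batch size:

```python
def qlora_vram_gb(params_billions: float, lora_fraction: float = 0.01,
                  base_bits: int = 4, activation_gb: float = 8.0) -> float:
    """Rough QLoRA footprint: 4-bit quantized base weights, plus fp16 LoRA
    adapters with fp32 Adam states on the adapters only, plus a flat
    activation allowance."""
    base = params_billions * base_bits / 8                        # quantized weights, GB
    adapters = params_billions * lora_fraction * (2 + 2 + 4 + 8)  # trainable slice, GB
    return base + adapters + activation_gb

for size in (7, 13):
    print(f"{size}B QLoRA: ~{qlora_vram_gb(size):.1f} GB")
```

Both estimates land well under 40GB, which matches the common experience that 7B-13B QLoRA runs fit on the 40GB variant with headroom for longer contexts.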

Training

A100 handles computer vision models (ResNet, EfficientNet, ViT), transformer models (BERT, GPT-style architectures), reinforcement learning experiments, and diffusion model training without issues.

Inference at Scale

For batch processing where latency isn't critical, A100 works well. You can run multiple smaller models on one GPU via MIG partitioning, deploy quantized models efficiently, and build cost-optimized production inference pipelines.
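
MIG capacity math is easy to sketch. The profile names and sizes below match NVIDIA's published A100 80GB profiles, but real placement rules are stricter than this simple slice-and-memory check, so treat it as a first-pass estimate only:

```python
# A100 80GB MIG profiles: (compute slices, memory in GB)
PROFILES = {
    "1g.10gb": (1, 10),
    "2g.20gb": (2, 20),
    "3g.40gb": (3, 40),
    "4g.40gb": (4, 40),
    "7g.80gb": (7, 80),
}

def fits_on_a100_80gb(requested: list[str]) -> bool:
    """First-pass check: do the requested instances fit within the GPU's
    7 compute slices and 80 GB of memory? (Actual MIG placement has
    additional constraints this ignores.)"""
    slices = sum(PROFILES[p][0] for p in requested)
    memory = sum(PROFILES[p][1] for p in requested)
    return slices <= 7 and memory <= 80

print(fits_on_a100_80gb(["3g.40gb", "2g.20gb", "2g.20gb"]))  # 7 slices, 80 GB: fits
print(fits_on_a100_80gb(["4g.40gb", "4g.40gb"]))             # 8 slices: does not fit
```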

Scientific Computing

Molecular dynamics simulations, climate modeling, financial modeling, and drug discovery all run well on A100. The FP64 performance matters for scientific workloads that need double precision.

A100 Market Status

Multiple industry reports in 2024 indicated NVIDIA was winding down A100 production. But A100 remains widely available through existing inventory and across all major cloud providers.

What this means practically: significant inventory exists through NVIDIA partners and distributors, software support (CUDA, drivers, frameworks) continues unchanged, cloud providers continue offering A100 instances, and the used market is growing as enterprises upgrade to newer hardware.

What it doesn't mean: that the A100 is obsolete or unsupported. Software compatibility will continue for years, and it remains a solid choice for the right use cases.

NVIDIA hasn't published a specific end-of-support date for A100. In practice, datacenter GPUs remain supported for years after production winds down, and A100 continues to work with current CUDA and driver releases.

FAQ

How much does an NVIDIA A100 cost?

Cloud rental rates vary by provider. Check our pricing page for current JarvisLabs rates. Purchase prices range from $8,000-$20,000 for new units depending on memory (40GB vs 80GB) and form factor (PCIe vs SXM). Used A100s are available from $5,000-$12,000.

Is the A100 still worth buying in 2025?

For many use cases, yes. The A100 handles most practical AI workloads at roughly half the price of an H100. Software support continues, and for batch processing, fine-tuning, and cost-optimized inference, A100 delivers strong value.

What's the difference between A100 40GB and 80GB?

The 80GB has double the memory, uses faster HBM2e instead of HBM2, and provides higher memory bandwidth (up to ~1.9 TB/s on PCIe and 2.0 TB/s on SXM, vs ~1.6 TB/s for the 40GB PCIe variant). Choose 80GB for large language models, multi-model serving, or when memory bandwidth limits your throughput.
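
Bandwidth matters because single-stream LLM decoding is typically memory-bound: each generated token streams the full set of weights from HBM once, so bandwidth divided by model size gives a theoretical tokens/s ceiling. A sketch using the two PCIe variants (a 13B fp16 model is an illustrative choice; real throughput lands below this ceiling):

```python
def decode_tokens_per_s(params_billions: float, bytes_per_weight: float,
                        bandwidth_gbps: float) -> float:
    """Upper bound on single-stream decode speed: every token must stream
    all model weights from HBM once, so throughput <= bandwidth / model size."""
    model_gb = params_billions * bytes_per_weight
    return bandwidth_gbps / model_gb

# 13B model in fp16 (2 bytes/weight) on the two PCIe variants:
for name, bw in [("40GB PCIe", 1555), ("80GB PCIe", 1935)]:
    print(f"A100 {name}: <= {decode_tokens_per_s(13, 2, bw):.0f} tokens/s (theoretical)")
```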

Which should I rent, 40GB or 80GB?

Start with 40GB unless you know you need more. It handles LoRA/QLoRA fine-tuning of models up to 13B parameters and inference for most quantized LLMs. Choose 80GB for training larger models, running multiple models simultaneously, or workloads that need the extra bandwidth.

How does A100 compare to H100 for LLM inference?

H100 is often 1.5-3x faster for LLM inference, with larger gains when using FP8 precision. But A100 costs significantly less per hour. For latency-sensitive applications, H100 is worth the premium. For batch processing or cost optimization, A100 often offers better value per dollar.

Can I still buy new A100 GPUs?

Yes, new A100 GPUs are available through NVIDIA partners and distributors. Supply is gradually decreasing as it's being replaced by H100/H200, but inventory remains available.

What models can I run on an A100 40GB?

LoRA/QLoRA fine-tuning works for models up to roughly 13B parameters, depending on context length and batch size. For inference, you can run most models up to 30B with quantization. Common examples include Llama 2 7B/13B, Mistral 7B, CodeLlama, and Stable Diffusion XL.

How long will NVIDIA support the A100?

NVIDIA hasn't published a specific end-of-support date for A100. Datacenter GPUs typically remain supported for years after production winds down, and A100 continues to work with current CUDA and driver releases. Framework support (PyTorch, TensorFlow) follows NVIDIA's lead.

Bottom Line

The A100 remains a strong choice for AI compute in 2025. It offers proven performance, a mature software ecosystem, and prices that have settled at attractive levels compared to when it was the flagship product.

For teams that don't need H100's cutting-edge performance, A100 delivers solid value for fine-tuning, inference, and training workloads.

Check our pricing page for current A100 rates and availability.

Build & Deploy Your AI in Minutes

Get started with JarvisLabs today and experience the power of cloud GPU infrastructure designed specifically for AI development.
