NVIDIA A100 GPU Price Guide (2025) - Cloud Rental & Purchase Costs
The NVIDIA A100 is available to rent on JarvisLabs at competitive hourly rates, or to purchase for $5,000-$20,000 depending on configuration and condition. Check our pricing page for current rates. Industry reports in 2024 indicated NVIDIA was winding down A100 production, but the GPU remains widely available through existing inventory and cloud providers.
Quick Price Reference
| Option | A100 40GB | A100 80GB |
|---|---|---|
| Purchase (PCIe) | $8,000-$10,000 | $9,500-$14,000 |
| Purchase (SXM) | $12,000-$15,000 | $18,000-$20,000 |
| Used/Refurbished | $5,000-$8,000 | $7,000-$12,000 |
These are street prices observed via resellers. Actual pricing varies by region, warranty status, and seller. For cloud rental rates, check our pricing page.
Cloud GPU Rental
Renting A100s makes more sense than purchasing for most teams. You skip the upfront capital, avoid maintenance overhead, and don't take on depreciation risk. When you're done with a job, you stop paying.
JarvisLabs offers A100 instances with minute-level billing, so you pay for actual usage rather than rounding up to the nearest hour. Instances spin up in under 90 seconds (often faster for vanilla templates), and your workspace volume persists between sessions. There's no long-term commitment required.
Check our pricing page for current A100 rates and other GPU options.
A100 Purchase Prices (2025)
If you need to own hardware, here's what the market looks like:
New Hardware
| Configuration | Street Price | Notes |
|---|---|---|
| A100 40GB PCIe | $8,000-$10,000 | Standard datacenter card |
| A100 80GB PCIe | $9,500-$14,000 | Higher memory bandwidth |
| A100 40GB SXM | $12,000-$15,000 | Requires HGX baseboard |
| A100 80GB SXM | $18,000-$20,000 | Maximum performance variant |
| DGX A100 (8x GPUs) | Starts at $199,000 | Complete turnkey system |
Used Market
With enterprises upgrading to H100 and H200, plenty of A100s are hitting the secondary market:
| Condition | 40GB Price | 80GB Price |
|---|---|---|
| Certified Refurbished | $6,000-$8,000 | $8,000-$12,000 |
| Used (Good Condition) | $5,000-$7,000 | $7,000-$10,000 |
Before buying used, ask for the serial number and warranty status, request datacenter pull documentation if available, and plan to run burn-in and memory tests yourself (DCGM diagnostics or stress tests). Not all sellers keep a detailed usage history, so verification on your end matters.
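As a concrete starting point, here's a minimal burn-in sketch in PyTorch: it loops large half-precision matmuls and flags any checksum drift, which usually points at memory or compute faults. It assumes a working CUDA device and is a quick sanity check, not a substitute for NVIDIA's DCGM diagnostics (`dcgmi diag`).

```python
# Minimal GPU burn-in sketch (illustrative, not an official tool).
# Identical inputs through the same matmul kernel should reproduce the
# same checksum; drift or NaNs suggest memory or compute errors.
import time
import torch

def burn_in(minutes: float = 10.0, size: int = 8192) -> None:
    assert torch.cuda.is_available(), "no CUDA device found"
    torch.manual_seed(0)
    a = torch.randn(size, size, device="cuda", dtype=torch.float16)
    b = torch.randn(size, size, device="cuda", dtype=torch.float16)
    reference = (a @ b).float().sum().item()  # checksum from the first pass

    deadline = time.time() + minutes * 60
    iters = 0
    while time.time() < deadline:
        value = (a @ b).float().sum().item()
        if value != value or abs(value - reference) > 1e-3 * abs(reference):
            raise RuntimeError(f"checksum drift at iteration {iters}")
        iters += 1

    print(f"OK after {iters} iterations; peak VRAM "
          f"{torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")

if __name__ == "__main__":
    burn_in()
```

For a fuller test, scale `size` up until the card is near its VRAM limit and let it run for a few hours while watching temperatures with `nvidia-smi`.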
A100 40GB vs 80GB
The 80GB variant has double the memory and meaningfully higher bandwidth, which matters for memory-bound workloads.
Specifications
| Specification | A100 40GB PCIe | A100 80GB PCIe | A100 80GB SXM |
|---|---|---|---|
| Memory | 40GB HBM2 | 80GB HBM2e | 80GB HBM2e |
| Memory Bandwidth | Up to 1,555 GB/s | Up to 1,935 GB/s | Up to 2,039 GB/s |
| CUDA Cores | 6,912 | 6,912 | 6,912 |
| Tensor Cores | 432 (3rd gen) | 432 (3rd gen) | 432 (3rd gen) |
| TDP | 250W | 300W | 400W |
| FP32 | 19.5 TFLOPS | 19.5 TFLOPS | 19.5 TFLOPS |
| FP16/BF16 (dense) | 312 TFLOPS | 312 TFLOPS | 312 TFLOPS |
When to Choose Each
The 40GB variant handles most common AI workloads. It works well for LoRA and QLoRA fine-tuning of 7B-13B parameter models, inference on quantized models, and any workload that fits comfortably in 40GB of VRAM.
The 80GB makes sense when you're training or fine-tuning models in the 13B-65B range, running multiple models simultaneously, working with large batch sizes for inference throughput, or when memory bandwidth is limiting your performance. If you're doing serious LLM work, the extra memory and bandwidth usually justify the price difference.
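For a rough sense of what fits, the standard back-of-the-envelope math is bytes per parameter times parameter count, plus gradients and optimizer state if you're training. The sketch below uses common rules of thumb (2 bytes/param for FP16 weights, about 16 bytes/param for unoptimized full fine-tuning with Adam in mixed precision); these are estimates only, and activations, KV cache, and framework overhead come on top, so leave 20-30% headroom.

```python
# Back-of-the-envelope VRAM estimates. Rules of thumb only: real usage
# adds activations, KV cache, and framework overhead on top.

def weights_gib(params_b: float, bytes_per_param: float) -> float:
    """GiB needed just to hold the weights."""
    return params_b * 1e9 * bytes_per_param / 2**30

def full_finetune_gib(params_b: float) -> float:
    """~16 bytes/param: fp16 weights + grads (4) plus fp32 Adam states
    and master weights (12), before any memory optimizations."""
    return params_b * 1e9 * 16 / 2**30

for p in (7, 13, 70):
    print(f"{p:>2}B | fp16 weights {weights_gib(p, 2):6.1f} GiB | "
          f"4-bit weights {weights_gib(p, 0.5):5.1f} GiB | "
          f"full FT ~{full_finetune_gib(p):7.1f} GiB")
```

By this math a 13B model in FP16 (about 24 GiB of weights) fits on a 40GB card for inference, while unoptimized full fine-tuning of even a 7B model exceeds a single 80GB card; in practice, 8-bit optimizers and gradient checkpointing cut that figure substantially.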
A100 vs H100
Specs Comparison
| Metric | A100 80GB SXM | H100 80GB SXM |
|---|---|---|
| CUDA Cores | 6,912 | 16,896 |
| Tensor Cores | 432 (3rd gen) | 528 (4th gen) |
| Memory Bandwidth | 2.0 TB/s | 3.35 TB/s |
| FP16 Tensor (dense) | 312 TFLOPS | 989 TFLOPS |
| FP16 Tensor (with sparsity) | 624 TFLOPS | 1,979 TFLOPS |
NVIDIA typically publishes Tensor Core peaks with sparsity enabled. Dense performance is roughly half.
Real-World Performance
The H100 is often materially faster than the A100, commonly 1.5-3x in LLM inference setups, but results vary significantly with precision (FP16 vs FP8), model architecture, batch size, sequence length, and framework optimizations.
For training, speedups depend on whether your workload is compute-bound or memory-bound. The H100's advantage is largest with FP8 inference and transformer-heavy workloads where its dedicated Transformer Engine shines.
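One way to see why memory-bound gains sit toward the lower end of that range is a simple roofline estimate: at batch size 1, each decoded token has to stream every weight from memory once, so the bandwidth ratio caps the speedup. A rough sketch (our approximation; it ignores KV-cache traffic and kernel overhead):

```python
# Roofline ceiling for memory-bound LLM decode at batch size 1:
# tokens/s <= memory bandwidth / model size in bytes.

def decode_ceiling_tok_s(params_b: float, bytes_per_param: float,
                         bandwidth_tb_s: float) -> float:
    model_bytes = params_b * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / model_bytes

for name, bw in [("A100 80GB SXM", 2.0), ("H100 80GB SXM", 3.35)]:
    ceiling = decode_ceiling_tok_s(13, 2, bw)  # 13B model, FP16 weights
    print(f"{name}: ~{ceiling:.0f} tok/s ceiling")
# Bandwidth ratio 3.35/2.0 ~ 1.7x, which is why memory-bound decode
# gains land near the low end of the 1.5-3x range quoted above.
```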
Which One Makes Sense?
The key metric is cost per completed task, not cost per hour.
For workloads where H100 delivers significant speedups, faster completion can offset the higher hourly rate. If a training run takes 10 hours on A100 but only 4 hours on H100, the total cost might be similar despite H100's higher rate. For inference, higher throughput means more tokens per dollar.
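The arithmetic is worth making explicit. A tiny sketch using hypothetical hourly rates (placeholders for illustration, not actual JarvisLabs pricing):

```python
# Cost per completed task, not cost per hour. Rates below are
# hypothetical placeholders, not real pricing.
def job_cost(hours: float, rate_per_hour: float) -> float:
    return hours * rate_per_hour

a100 = job_cost(hours=10, rate_per_hour=1.50)  # assumed A100 rate
h100 = job_cost(hours=4, rate_per_hour=3.50)   # assumed H100 rate
print(f"A100 run: ${a100:.2f}   H100 run: ${h100:.2f}")
# -> A100 run: $15.00   H100 run: $14.00
# The GPU with the higher hourly rate can still be the cheaper run.
```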
A100 makes sense for batch processing, experimentation, budget-conscious production, and workloads where A100 performance is sufficient. H100 makes sense for latency-sensitive inference, training from scratch, FP8 workloads, and scenarios where you're optimizing for time rather than cost.
Check our pricing page to compare current rates.
Full Technical Specifications
| Specification | Value |
|---|---|
| Architecture | NVIDIA Ampere (GA100) |
| Manufacturing Process | TSMC 7nm |
| Transistors | 54.2 billion |
| CUDA Cores | 6,912 |
| Tensor Cores | 432 (3rd generation) |
| Memory | 40GB HBM2 or 80GB HBM2e |
| Memory Interface | 5120-bit |
| L2 Cache | 40MB |
| TDP | 250W (40GB PCIe) / 300W (80GB PCIe) / 400W (SXM) |
| Form Factors | PCIe, SXM4 |
| NVLink | 3rd generation, 600 GB/s |
| PCIe | Gen 4, 64 GB/s |
| MIG Support | Up to 7 instances |
Best Use Cases
LLM Fine-tuning
A100 40GB is commonly used for parameter-efficient fine-tuning (LoRA, QLoRA) of 7B-13B class models. Exact limits depend on context length, batch size, precision, and whether you're using optimizer offloading. The 80GB variant extends this to larger models and enables full fine-tuning of smaller models where 40GB would require memory optimization tricks.
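As an illustration, a typical QLoRA setup for a 7B-class model on a single A100 40GB looks something like the sketch below. It assumes the `transformers`, `peft`, and `bitsandbytes` packages are installed; the model name and LoRA hyperparameters are examples, not recommendations.

```python
# QLoRA sketch: 4-bit quantized base model + trainable LoRA adapters.
# Hyperparameters and model choice are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # NF4-quantized base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",            # example 7B model
    quantization_config=bnb,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of weights
```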
Training
A100 handles computer vision models (ResNet, EfficientNet, ViT), transformer models (BERT, GPT-style architectures), reinforcement learning experiments, and diffusion model training without issues.
Inference at Scale
For batch processing where latency isn't critical, A100 works well. You can run multiple smaller models on one GPU via MIG partitioning, deploy quantized models efficiently, and build cost-optimized production inference pipelines.
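For MIG specifically, partitioning is driven through `nvidia-smi`. Below is a sketch of the usual workflow, assuming root access and a MIG-capable driver; profile names differ between the 40GB and 80GB cards, so check `nvidia-smi mig -lgip` for what your GPU supports.

```python
# MIG workflow sketch, driving nvidia-smi from Python. Run with root
# privileges; enabling MIG mode may require a GPU reset first.
import subprocess

def run(cmd: str) -> None:
    print(f"$ {cmd}")
    subprocess.run(cmd.split(), check=True)

run("nvidia-smi -i 0 -mig 1")    # enable MIG mode on GPU 0
run("nvidia-smi mig -lgip")      # list GPU instance profiles for this card
# Create three 2g.10gb instances plus compute instances (-C); "2g.10gb"
# is an A100 40GB profile, and seven 1g slices is the maximum split.
run("nvidia-smi mig -cgi 2g.10gb,2g.10gb,2g.10gb -C")
run("nvidia-smi -L")             # MIG devices now appear with UUIDs
```

Each MIG instance then shows up as its own CUDA device, so separate model servers can be pinned to separate slices via `CUDA_VISIBLE_DEVICES`.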
Scientific Computing
Molecular dynamics simulations, climate modeling, financial modeling, and drug discovery all run well on A100. The FP64 performance matters for scientific workloads that need double precision.
A100 Market Status
Multiple industry reports in 2024 indicated NVIDIA was winding down A100 production. Even so, the A100 remains widely available through existing inventory and across all major cloud providers.
What this means practically: significant inventory exists through NVIDIA partners and distributors, software support (CUDA, drivers, frameworks) continues unchanged, cloud providers continue offering A100 instances, and the used market is growing as enterprises upgrade to newer hardware.
What it doesn't mean: that the A100 is obsolete or about to lose support. Software compatibility will continue for years, and it remains a solid choice for the right use cases.
NVIDIA hasn't published a specific end-of-support date for A100. In practice, datacenter GPUs remain supported for years after production winds down, and A100 continues to work with current CUDA and driver releases.
FAQ
How much does an NVIDIA A100 cost?
Cloud rental rates vary by provider. Check our pricing page for current JarvisLabs rates. Purchase prices range from $8,000-$20,000 for new units depending on memory (40GB vs 80GB) and form factor (PCIe vs SXM). Used A100s are available from $5,000-$12,000.
Is the A100 still worth buying in 2025?
For many use cases, yes. The A100 handles most practical AI workloads at roughly half the price of an H100. Software support continues, and for batch processing, fine-tuning, and cost-optimized inference, A100 delivers strong value.
What's the difference between A100 40GB and 80GB?
The 80GB has double the memory, uses faster HBM2e instead of HBM2, and delivers higher memory bandwidth (roughly 1.9-2.0 TB/s depending on form factor, vs about 1.6 TB/s for the 40GB PCIe variant). Choose 80GB for large language models, multi-model serving, or when memory bandwidth limits your throughput.
Which should I rent, 40GB or 80GB?
Start with 40GB unless you know you need more. It handles LoRA/QLoRA fine-tuning of models up to 13B parameters and inference for most quantized LLMs. Choose 80GB for training larger models, running multiple models simultaneously, or workloads that need the extra bandwidth.
How does A100 compare to H100 for LLM inference?
H100 is often 1.5-3x faster for LLM inference, with larger gains when using FP8 precision. But A100 costs significantly less per hour. For latency-sensitive applications, H100 is worth the premium. For batch processing or cost optimization, A100 often offers better value per dollar.
Can I still buy new A100 GPUs?
Yes, new A100 GPUs are available through NVIDIA partners and distributors. Supply is gradually decreasing as the product line is superseded by the H100 and H200, but inventory remains available.
What models can I run on an A100 40GB?
LoRA/QLoRA fine-tuning works for models up to roughly 13B parameters, depending on context length and batch size. For inference, you can run most models up to 30B with quantization. Common examples include Llama 2 7B/13B, Mistral 7B, CodeLlama, and Stable Diffusion XL.
How long will NVIDIA support the A100?
NVIDIA hasn't published a specific end-of-support date for A100. Datacenter GPUs typically remain supported for years after production winds down, and A100 continues to work with current CUDA and driver releases. Framework support (PyTorch, TensorFlow) follows NVIDIA's lead.
Bottom Line
The A100 remains a strong choice for AI compute in 2025. It offers proven performance, a mature software ecosystem, and prices that have settled at attractive levels compared to when it was the flagship product.
For teams that don't need H100's cutting-edge performance, A100 delivers solid value for fine-tuning, inference, and training workloads.
Check our pricing page for current A100 rates and availability.
Build & Deploy Your AI in Minutes
Get started with JarvisLabs today and experience the power of cloud GPU infrastructure designed specifically for AI development.
Related Articles
Should I Run Llama-405B on an NVIDIA H100 or A100 GPU?
Practical comparison of H100, A100, and H200 GPUs for running Llama 405B models. Get performance insights, cost analysis, and real-world recommendations from a technical founder's perspective.
NVIDIA A100 vs H100 vs H200: Which GPU Should You Choose?
Compare NVIDIA A100, H100, and H200 GPUs for AI training and inference. Detailed specs, memory bandwidth, and practical guidance on picking the right datacenter GPU for your workload.
Should I run Llama 70B on an NVIDIA H100 or A100?
Should you run Llama 70B on H100 or A100? Compare 2–3× performance gains, memory + quantization trade-offs, cloud pricing, and get clear guidance on choosing the right GPU.
What are the Differences Between NVIDIA A100 and H100 GPUs?
Compare NVIDIA A100 vs H100 GPUs across architecture, performance, memory, and cost. Learn when to choose each GPU for AI workloads and get practical guidance from a technical founder.
Which AI Models Can I Run on an NVIDIA A6000 GPU?
Discover which AI models fit on an A6000's 48GB VRAM, from 13B parameter LLMs at full precision to 70B models with quantization, plus practical performance insights and cost comparisons.