NVIDIA A100 40GB GPU
From $1.29/hr — billed by the minute
The data center workhorse for AI training and fine-tuning. 40GB HBM2e memory, 1,555 GB/s bandwidth, and third-generation Tensor Cores deliver 312 TFLOPS of FP16 compute — at 70% less than AWS.
Powering teams that push boundaries
Trusted by companies including: Tesla, Hugging Face, Kaggle, Zoho, Weights & Biases, upGrad, Saama
The gold standard for AI training
The A100 40GB delivers the raw compute and memory bandwidth needed for serious AI training workloads.
NVIDIA Ampere Architecture
Third-generation data center GPU architecture with up to 20x higher AI performance than the prior Volta generation. Structural sparsity support doubles effective throughput for compatible models.
Third-Gen Tensor Cores
432 Tensor Cores supporting TF32, FP16, BF16, INT8, and FP64. TF32 mode delivers 156 TFLOPS without code changes — just set mixed precision and train.
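"Just set mixed precision" amounts to two global flags plus an autocast context in PyTorch (our framework choice for illustration; the page does not prescribe a stack). A minimal sketch that falls back to CPU so it runs anywhere:

```python
import torch

# TF32 for FP32 matmuls on Ampere: no model code changes, just two flags.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

device = "cuda" if torch.cuda.is_available() else "cpu"  # CPU fallback for illustration
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = torch.nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 1024, device=device)
with torch.autocast(device_type=device, dtype=amp_dtype):
    loss = model(x).pow(2).mean()  # FP16 here is what engages the 312 TFLOPS path

scaler.scale(loss).backward()  # loss scaling protects FP16 gradients from underflow
scaler.step(optimizer)
scaler.update()
```

On an A100, the TF32 flags alone speed up unmodified FP32 training; the autocast context is the extra step for the full FP16 Tensor Core rate.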
40GB HBM2e Memory
High Bandwidth Memory at 1,555 GB/s keeps the Tensor Cores fed during training. Fine-tune models up to 13B parameters at full precision, or 30B+ with LoRA/QLoRA.
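The 13B and 30B+ fits follow from simple weight-footprint arithmetic. A back-of-envelope helper (function name ours, and a floor only: gradients, optimizer state, and activations add on top):

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Raw weight footprint only; gradients, optimizer states, and
    activations come on top, so leave headroom on a 40 GB card."""
    return params_billion * bytes_per_param  # 1e9 params * bytes / 1e9

print(weights_gb(13, 2.0))   # 13B in FP16/BF16 -> 26.0 GB
print(weights_gb(30, 0.5))   # 30B quantized to 4-bit (QLoRA base) -> 15.0 GB
```

A 13B model's FP16 weights leave ~14 GB of the 40 GB for activations and adapter state; quantizing a 30B base to 4 bits is what makes QLoRA fit.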
Multi-Instance GPU (MIG)
Partition into up to 7 isolated GPU instances, each with dedicated memory and compute. Run multiple inference workloads on one GPU without contention.
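MIG is configured on the host with `nvidia-smi`. A hedged sketch of the commands involved (helper name and profile table are ours; verify the profiles your driver exposes with `nvidia-smi mig -lgip` before running):

```python
# Standard A100 40GB MIG profiles -> max instances per GPU
# (assumption: confirm against `nvidia-smi mig -lgip` on your instance).
A100_40GB_PROFILES = {"1g.5gb": 7, "2g.10gb": 3, "3g.20gb": 2, "7g.40gb": 1}

def mig_setup_commands(profile: str, count: int) -> list[str]:
    """Host-side nvidia-smi invocations to carve GPU 0 into MIG slices."""
    limit = A100_40GB_PROFILES[profile]
    if count > limit:
        raise ValueError(f"{profile} supports at most {limit} instances")
    spec = ",".join([profile] * count)
    return [
        "sudo nvidia-smi -i 0 -mig 1",          # enable MIG mode on GPU 0
        f"sudo nvidia-smi mig -cgi {spec} -C",  # create GPU + compute instances
        "nvidia-smi -L",                        # list the resulting MIG devices
    ]

for cmd in mig_setup_commands("1g.5gb", 7):
    print(cmd)
```

Each resulting MIG device shows up with its own UUID, so separate inference processes can be pinned to separate slices via `CUDA_VISIBLE_DEVICES`.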
Save 70% vs. AWS on A100 compute
Transparent, per-minute billing with no hidden fees. Pause anytime — only pay for active minutes.
| Provider | A100 40GB $/hr | Billing | Min. GPUs | Pre-configured |
|---|---|---|---|---|
| Jarvislabs | $1.29 | Per minute | 1 | ✓ |
| AWS (p4d.24xlarge) | ~$4.10/GPU | Per second | 8 (bundled) | — |
| Google Cloud | ~$3.67 | Per second | 1 | — |
| Azure (NC24ads) | ~$3.67 | Per second | 1 | — |
| RunPod (Secure) | $1.39–1.49 | Per millisecond | 1 | — |
| Lambda | $1.79 | Per hour | 1 | — |
AWS requires renting 8x A100s at $32.77/hr minimum. On Jarvislabs, rent a single A100 for $1.29/hr. You save 70% per GPU-hour and can scale from 1 to 4 GPUs on demand.
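The savings figure is just the table's numbers; rounding aside:

```python
AWS_P4D_HOURLY = 32.77  # 8x A100 bundled (p4d.24xlarge)
JARVIS_HOURLY = 1.29    # single A100, per-minute billing

aws_per_gpu_hour = AWS_P4D_HOURLY / 8            # ~$4.10
savings = 1 - JARVIS_HOURLY / aws_per_gpu_hour   # ~0.69, i.e. roughly 70%

# Per-minute billing: a 90-minute fine-tune bills exactly 1.5 hours.
job_cost = (90 / 60) * JARVIS_HOURLY
print(f"{savings:.0%} cheaper per GPU-hour; 90-minute job: ${job_cost:.2f}")
```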
What developers build on the A100 40GB
From fine-tuning LLMs to training custom models from scratch.
Fine-tune LLMs
Fine-tune 7B–13B models at full precision, or 30B+ with LoRA/QLoRA. A 7B fine-tune typically completes in 2–4 hours.
Train from Scratch
312 TFLOPS FP16 + 1.5 TB/s bandwidth. CV models, recommendation systems, custom architectures. Scale to 4 GPUs.
Batch Processing & Inference
Large-scale batch inference, embeddings, classification. MIG partitions one A100 into up to 7 instances for multi-model serving.
ML Research
Reproduce papers, ablation studies, architecture iteration. Professional hardware at accessible pricing. Pause between experiments.
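The LoRA numbers above follow from how few parameters the adapters actually train. A rough count, assuming the common setup of low-rank factors on the four attention projections of a Llama-2-7B-like model (function name, shapes, and targets are illustrative, not a specific library's API):

```python
def lora_trainable_params(d_model: int, n_layers: int, rank: int,
                          targets_per_layer: int = 4) -> int:
    """Each targeted d x d projection gains two low-rank factors,
    A (rank x d) and B (d x rank): 2 * rank * d trainable params.
    targets_per_layer=4 assumes the q/k/v/o attention projections."""
    return n_layers * targets_per_layer * 2 * rank * d_model

# 7B-class model: 32 layers, d_model 4096, rank 16
added = lora_trainable_params(d_model=4096, n_layers=32, rank=16)
print(f"{added / 1e6:.1f}M trainable params, "
      f"{added / 7e9:.2%} of a 7B base model")
```

Training well under 1% of the parameters is why optimizer state and gradients stay tiny, leaving the 40 GB budget to the (frozen, optionally quantized) base weights.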
Technical specifications
Complete hardware specifications for the NVIDIA A100 40GB data center GPU.
| Specification | Value | Great for |
|---|---|---|
| Architecture | NVIDIA Ampere | 20x AI performance vs. prior gen |
| CUDA Cores | 6,912 | General-purpose GPU compute |
| Tensor Cores | 432 (3rd gen) | TF32/FP16/BF16/INT8 acceleration |
| VRAM | 40 GB HBM2e (ECC) | Models up to 13B full-precision |
| Memory Bandwidth | 1,555 GB/s | Feeding Tensor Cores during training |
| FP32 Performance | 19.5 TFLOPS | Traditional compute / simulation |
| TF32 Tensor | 156 TFLOPS | Training with auto mixed precision |
| FP16 Tensor | 312 TFLOPS | Mixed-precision training / inference |
| FP64 Tensor | 19.5 TFLOPS | Scientific computing / HPC |
| INT8 Tensor | 624 TOPS | Quantized inference serving |
| MIG Support | Up to 7 instances | Multi-tenant / multi-model serving |
| NVLink | 600 GB/s (SXM) | Multi-GPU training interconnect |
| PCIe | Gen4 x16 | Host-to-GPU data transfer |
| TDP | 250W (PCIe) / 400W (SXM) | Balanced performance |
Launch your A100 instance in seconds
Three simple steps from sign-up to a running GPU instance.
Choose Template
PyTorch 2.x (includes Transformers, DeepSpeed, PEFT), TensorFlow, JAX, or clean CUDA.
Configure & Launch
Select A100 40GB, 1–4 GPUs, set storage. Templates ready in seconds, VMs in under a minute.
Train & Iterate
Upload data, start training. Pause = GPU billing stops. Resume where you left off.
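Once the instance is up, a quick sanity check (helper name ours) confirms PyTorch sees the card before you launch a long run:

```python
import torch

def gpu_summary() -> str:
    """One-line check that the instance sees the GPU before training."""
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    props = torch.cuda.get_device_properties(0)
    return f"{props.name}, {props.total_memory / 2**30:.0f} GiB"

print(gpu_summary())
```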
Frequently asked questions
Everything you need to know about renting the NVIDIA A100 40GB on Jarvislabs.
Should I choose the A100 40GB or the 80GB?
The 80GB model doubles memory and raises bandwidth to 2,039 GB/s. Choose the 40GB ($1.29/hr) for models up to 13B or batch inference; choose the 80GB ($1.49/hr) for 30B+ models or larger batch sizes.
Start training on the NVIDIA A100 in seconds
$1.29/hr with per-minute billing. Pre-configured PyTorch & TensorFlow. No commitments.