Ampere Architecture

NVIDIA A100 40GB GPU

From $1.29/hr — billed by the minute

The data center workhorse for AI training and fine-tuning. 40GB HBM2e memory, 1,555 GB/s bandwidth, and third-generation Tensor Cores deliver 312 TFLOPS of FP16 compute — at 70% less than AWS.

View All GPU Pricing
A100 40GB: $1.29/hr · Per-minute billing · No commitments
Trusted worldwide

Powering teams that push boundaries

27,000+ AI developers
50M+ GPU hours served
99.9% uptime SLA
<90s instance launch

Trusted by companies including: Tesla, Hugging Face, Kaggle, Zoho, Weights & Biases, upGrad, Saama

Why A100

The gold standard for AI training

The A100 40GB delivers the raw compute and memory bandwidth needed for serious AI training workloads.

NVIDIA Ampere Architecture

NVIDIA's data center GPU architecture, delivering up to 20x higher AI performance than the prior generation. Fine-grained structured sparsity support doubles effective throughput for compatible models.

Third-Gen Tensor Cores

432 Tensor Cores supporting TF32, FP16, BF16, INT8, and FP64. TF32 mode delivers 156 TFLOPS with no code changes: enable mixed precision and train.

40GB HBM2e Memory

High Bandwidth Memory at 1,555 GB/s keeps Tensor Cores fed during training. Fine-tune models up to 13B parameters full precision, or 30B+ with LoRA/QLoRA.

Multi-Instance GPU (MIG)

Partition into up to 7 isolated GPU instances, each with dedicated memory and compute. Run multiple inference workloads on one GPU without contention.
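As a rough sketch of what MIG partitioning offers, the snippet below tabulates common A100 40GB MIG profiles. Profile names follow NVIDIA's "slices-g.memory-gb" convention; verify the exact profiles available on a live instance with `nvidia-smi mig -lgip`.

```python
# Back-of-envelope view of common MIG profiles on an A100 40GB.
# Each profile trades instance count against per-instance memory.
profiles = {
    "1g.5gb":  {"instances": 7, "memory_gb": 5},   # max isolation, 7 tenants
    "2g.10gb": {"instances": 3, "memory_gb": 10},
    "3g.20gb": {"instances": 2, "memory_gb": 20},
    "7g.40gb": {"instances": 1, "memory_gb": 40},  # full GPU, MIG disabled in effect
}

for name, p in profiles.items():
    print(f"{name}: up to {p['instances']} instances x {p['memory_gb']} GB each")
```

Seven 1g.5gb instances let you serve seven small models concurrently with hardware-level memory and fault isolation, rather than sharing one CUDA context.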

Specs

Key specs at a glance

40 GB VRAM · HBM2e with ECC
1,555 GB/s memory bandwidth · 5x faster than L4
312 TFLOPS Tensor performance · FP16/BF16
6,912 CUDA cores · FP32 compute
Pricing

Save 70% vs. AWS on A100 compute

Transparent, per-minute billing with no hidden fees. Pause anytime — only pay for active minutes.

Jarvislabs: $1.29/hr (per-minute billing · min 1 GPU · pre-configured)
AWS (p4d.24xlarge): ~$4.10/GPU/hr (per-second billing · min 8 GPUs, bundled)
Google Cloud: ~$3.67/hr (per-second billing · min 1 GPU)
Azure (NC24ads): ~$3.67/hr (per-second billing · min 1 GPU)
RunPod (Secure): $1.39–1.49/hr (per-millisecond billing · min 1 GPU)
Lambda: $1.79/hr (per-hour billing · min 1 GPU)

AWS requires renting 8x A100s at $32.77/hr minimum. On Jarvislabs, rent a single A100 for $1.29/hr. You save 70% per GPU-hour and can scale from 1 to 4 GPUs on demand.
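The per-GPU-hour arithmetic behind the savings claim can be checked directly. The rates below are the on-demand prices quoted above; cloud pricing changes often, so treat them as a point-in-time snapshot.

```python
# Per-GPU-hour comparison using the on-demand rates quoted above.
jarvislabs = 1.29        # $/GPU/hr, per-minute billing, single GPU
aws_p4d = 32.77 / 8      # p4d.24xlarge bundles 8x A100 at $32.77/hr total

savings = 1 - jarvislabs / aws_p4d
print(f"AWS effective rate: ${aws_p4d:.2f}/GPU/hr")
print(f"Savings vs AWS: {savings:.0%}")
```

The bundling matters as much as the rate: on AWS the smallest A100 unit is the full 8-GPU instance, so a single-GPU experiment still pays for eight.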

Use Cases

What developers build on the A100 40GB

From fine-tuning LLMs to training custom models from scratch.

Fine-tune LLMs

Fine-tune 7B–13B models at full precision, or 30B+ with LoRA/QLoRA. A typical 7B fine-tune finishes in 2–4 hours.

Llama 3 8B/13B · Mistral 7B · CodeLlama · Phi-3
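A quick sanity check on what fits in 40 GB: the sketch below estimates weight memory alone at common precisions (roughly params × bytes per parameter). Activations, KV cache, gradients, and optimizer state are extra, which is why 30B+ models rely on a 4-bit quantized base plus small LoRA adapters.

```python
# Weight-only memory footprint per precision.
# Rule of thumb: 1B parameters at fp16 occupy ~2 GB.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "nf4": 0.5}

def weight_gb(params_billion, dtype):
    """Approximate GB needed just to hold the model weights."""
    return params_billion * BYTES_PER_PARAM[dtype]

for size in (7, 13, 30):
    row = ", ".join(f"{d}: {weight_gb(size, d):.1f} GB" for d in BYTES_PER_PARAM)
    print(f"{size}B -> {row}")
```

A 13B model at fp16 needs about 26 GB for weights, leaving headroom on 40 GB; a 30B model at 4-bit (~15 GB) leaves room for QLoRA adapters and activations.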

Train from Scratch

312 TFLOPS FP16 + 1.5 TB/s bandwidth. CV models, recommendation systems, custom architectures. Scale to 4 GPUs.

PyTorch · TensorFlow · JAX · DeepSpeed

Batch Processing & Inference

Large-scale batch inference, embeddings, classification. MIG partitions one A100 into up to 7 instances for multi-model serving.

vLLM · TGI · Triton

ML Research

Reproduce papers, ablation studies, architecture iteration. Professional hardware at accessible pricing. Pause between experiments.

JupyterLab · VS Code · SSH
Full Specs

Technical specifications

Complete hardware specifications for the NVIDIA A100 40GB data center GPU.

Architecture: NVIDIA Ampere (20x AI performance vs. prior gen)
CUDA Cores: 6,912 (general-purpose GPU compute)
Tensor Cores: 432, 3rd gen (TF32/FP16/BF16/INT8 acceleration)
VRAM: 40 GB HBM2e with ECC (models up to 13B full precision)
Memory Bandwidth: 1,555 GB/s (feeds Tensor Cores during training)
FP32 Performance: 19.5 TFLOPS (traditional compute / simulation)
TF32 Tensor: 156 TFLOPS (training with automatic mixed precision)
FP16 Tensor: 312 TFLOPS (mixed-precision training / inference)
FP64 Tensor: 19.5 TFLOPS (scientific computing / HPC)
INT8 Tensor: 624 TOPS (quantized inference serving)
MIG Support: up to 7 instances (multi-tenant / multi-model serving)
NVLink: 600 GB/s, SXM variant (multi-GPU training interconnect)
PCIe: Gen4 x16 (host-to-GPU data transfer)
TDP: 250 W (PCIe) / 400 W (SXM)
Get Started

Launch your A100 instance in seconds

Three simple steps from sign-up to a running GPU instance.

01

Choose Template

PyTorch 2.x (includes Transformers, DeepSpeed, PEFT), TensorFlow, JAX, or clean CUDA.

02

Configure & Launch

Select A100 40GB, 1–4 GPUs, set storage. Templates ready in seconds, VMs in under a minute.

03

Train & Iterate

Upload data, start training. Pause = GPU billing stops. Resume where you left off.

27,343+ AI developers trust Jarvislabs
50M+ GPU hours served
99.9% uptime SLA
FAQ

Frequently asked questions

Everything you need to know about renting the NVIDIA A100 40GB on Jarvislabs.

What's the difference between the A100 40GB and the A100 80GB?

The 80GB variant doubles the memory and raises bandwidth to 2,039 GB/s. Choose the 40GB ($1.29/hr) for models up to 13B or batch inference; choose the 80GB ($1.49/hr) for 30B+ models or larger batch sizes.

Start training on the NVIDIA A100 in seconds

$1.29/hr with per-minute billing. Pre-configured PyTorch & TensorFlow. No commitments.

Compare All GPUs