NVIDIA A100 40GB GPU
From $1.29/hr — billed by the minute
The data center workhorse for AI training and fine-tuning. 40GB HBM2e memory, 1,555 GB/s bandwidth, and third-generation Tensor Cores deliver 312 TFLOPS of FP16 compute — at 70% less than AWS.
Powering teams that push boundaries
Trusted by companies including: Tesla, Hugging Face, Kaggle, Zoho, Weights & Biases, upGrad, Saama
The gold standard for AI training
The A100 40GB delivers the raw compute and memory bandwidth needed for serious AI training workloads.
NVIDIA Ampere Architecture
Third-generation data center GPU architecture with up to 20x higher AI performance than the prior Volta generation. Structural sparsity support doubles effective throughput for compatible models.
Third-Gen Tensor Cores
432 Tensor Cores supporting TF32, FP16, BF16, INT8, and FP64. TF32 mode delivers 156 TFLOPS without code changes — just set mixed precision and train.
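"Just set mixed precision" amounts to two global flags plus an autocast context in PyTorch (our framework choice for illustration; the page does not prescribe a stack). A minimal sketch that falls back to CPU so it runs anywhere:

```python
import torch

# TF32 for FP32 matmuls on Ampere: no model code changes, just two flags.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

device = "cuda" if torch.cuda.is_available() else "cpu"  # CPU fallback for illustration
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = torch.nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 1024, device=device)
with torch.autocast(device_type=device, dtype=amp_dtype):
    loss = model(x).pow(2).mean()  # FP16 here is what engages the 312 TFLOPS path

scaler.scale(loss).backward()  # loss scaling protects FP16 gradients from underflow
scaler.step(optimizer)
scaler.update()
```

On an A100, the TF32 flags alone speed up unmodified FP32 training; the autocast context is the extra step for the full FP16 Tensor Core rate.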
40GB HBM2e Memory
High Bandwidth Memory at 1,555 GB/s keeps the Tensor Cores fed during training. Fine-tune models up to 13B parameters at full precision, or 30B+ with LoRA/QLoRA.
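The 13B and 30B+ fits follow from simple weight-footprint arithmetic. A back-of-envelope helper (function name ours, and a floor only: gradients, optimizer state, and activations add on top):

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Raw weight footprint only; gradients, optimizer states, and
    activations come on top, so leave headroom on a 40 GB card."""
    return params_billion * bytes_per_param  # 1e9 params * bytes / 1e9

print(weights_gb(13, 2.0))   # 13B in FP16/BF16 -> 26.0 GB
print(weights_gb(30, 0.5))   # 30B quantized to 4-bit (QLoRA base) -> 15.0 GB
```

A 13B model's FP16 weights leave ~14 GB of the 40 GB for activations and adapter state; quantizing a 30B base to 4 bits is what makes QLoRA fit.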
Multi-Instance GPU (MIG)
Partition into up to 7 isolated GPU instances, each with dedicated memory and compute. Run multiple inference workloads on one GPU without contention.
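MIG is configured on the host with `nvidia-smi`. A hedged sketch of the commands involved (helper name and profile table are ours; verify the profiles your driver exposes with `nvidia-smi mig -lgip` before running):

```python
# Standard A100 40GB MIG profiles -> max instances per GPU
# (assumption: confirm against `nvidia-smi mig -lgip` on your instance).
A100_40GB_PROFILES = {"1g.5gb": 7, "2g.10gb": 3, "3g.20gb": 2, "7g.40gb": 1}

def mig_setup_commands(profile: str, count: int) -> list[str]:
    """Host-side nvidia-smi invocations to carve GPU 0 into MIG slices."""
    limit = A100_40GB_PROFILES[profile]
    if count > limit:
        raise ValueError(f"{profile} supports at most {limit} instances")
    spec = ",".join([profile] * count)
    return [
        "sudo nvidia-smi -i 0 -mig 1",          # enable MIG mode on GPU 0
        f"sudo nvidia-smi mig -cgi {spec} -C",  # create GPU + compute instances
        "nvidia-smi -L",                        # list the resulting MIG devices
    ]

for cmd in mig_setup_commands("1g.5gb", 7):
    print(cmd)
```

Each resulting MIG device shows up with its own UUID, so separate inference processes can be pinned to separate slices via `CUDA_VISIBLE_DEVICES`.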
Save 70% vs. AWS on A100 compute
Transparent, per-minute billing with no hidden fees. Pause anytime — only pay for active minutes.
| Provider | A100 40GB $/hr | Billing | Min. GPUs | Pre-configured |
|---|---|---|---|---|
| Jarvislabs | $1.29 | Per minute | 1 | ✓ |
| AWS (p4d.24xlarge) | ~$4.10/GPU | Per second | 8 (bundled) | — |
| Google Cloud | ~$3.67 | Per second | 1 | — |
| Azure (NC24ads) | ~$3.67 | Per second | 1 | — |
| RunPod (Secure) | $1.39–1.49 | Per millisecond | 1 | — |
| Lambda | $1.79 | Per hour | 1 | — |
AWS requires renting 8x A100s at $32.77/hr minimum. On Jarvislabs, rent a single A100 for $1.29/hr. You save 70% per GPU-hour and can scale from 1 to 4 GPUs on demand.
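The savings figure is just the table's numbers; rounding aside:

```python
AWS_P4D_HOURLY = 32.77  # 8x A100 bundled (p4d.24xlarge)
JARVIS_HOURLY = 1.29    # single A100, per-minute billing

aws_per_gpu_hour = AWS_P4D_HOURLY / 8            # ~$4.10
savings = 1 - JARVIS_HOURLY / aws_per_gpu_hour   # ~0.69, i.e. roughly 70%

# Per-minute billing: a 90-minute fine-tune bills exactly 1.5 hours.
job_cost = (90 / 60) * JARVIS_HOURLY
print(f"{savings:.0%} cheaper per GPU-hour; 90-minute job: ${job_cost:.2f}")
```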
What developers build on the A100 40GB
From fine-tuning LLMs to training custom models from scratch.
Fine-tune LLMs
Fine-tune 7B–13B models at full precision, or 30B+ with LoRA/QLoRA. A 7B fine-tune typically completes in 2–4 hours.
Train from Scratch
312 TFLOPS FP16 + 1.5 TB/s bandwidth. CV models, recommendation systems, custom architectures. Scale to 4 GPUs.
Batch Processing & Inference
Large-scale batch inference, embeddings, classification. MIG partitions one A100 into up to 7 instances for multi-model serving.
ML Research
Reproduce papers, ablation studies, architecture iteration. Professional hardware at accessible pricing. Pause between experiments.
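The LoRA numbers above follow from how few parameters the adapters actually train. A rough count, assuming the common setup of low-rank factors on the four attention projections of a Llama-2-7B-like model (function name, shapes, and targets are illustrative, not a specific library's API):

```python
def lora_trainable_params(d_model: int, n_layers: int, rank: int,
                          targets_per_layer: int = 4) -> int:
    """Each targeted d x d projection gains two low-rank factors,
    A (rank x d) and B (d x rank): 2 * rank * d trainable params.
    targets_per_layer=4 assumes the q/k/v/o attention projections."""
    return n_layers * targets_per_layer * 2 * rank * d_model

# 7B-class model: 32 layers, d_model 4096, rank 16
added = lora_trainable_params(d_model=4096, n_layers=32, rank=16)
print(f"{added / 1e6:.1f}M trainable params, "
      f"{added / 7e9:.2%} of a 7B base model")
```

Training well under 1% of the parameters is why optimizer state and gradients stay tiny, leaving the 40 GB budget to the (frozen, optionally quantized) base weights.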
Technical specifications
Complete hardware specifications for the NVIDIA A100 40GB data center GPU.
| Specification | Value | Great for |
|---|---|---|
| Architecture | NVIDIA Ampere | 20x AI performance vs. prior gen |
| CUDA Cores | 6,912 | General-purpose GPU compute |
| Tensor Cores | 432 (3rd gen) | TF32/FP16/BF16/INT8 acceleration |
| VRAM | 40 GB HBM2e (ECC) | Models up to 13B full-precision |
| Memory Bandwidth | 1,555 GB/s | Feeding Tensor Cores during training |
| FP32 Performance | 19.5 TFLOPS | Traditional compute / simulation |
| TF32 Tensor | 156 TFLOPS | Training with auto mixed precision |
| FP16 Tensor | 312 TFLOPS | Mixed-precision training / inference |
| FP64 Tensor | 19.5 TFLOPS | Scientific computing / HPC |
| INT8 Tensor | 624 TOPS | Quantized inference serving |
| MIG Support | Up to 7 instances | Multi-tenant / multi-model serving |
| NVLink | 600 GB/s (SXM) | Multi-GPU training interconnect |
| PCIe | Gen4 x16 | Host-to-GPU data transfer |
| TDP | 250W (PCIe) / 400W (SXM) | Balanced performance |
Launch your A100 instance in seconds
Three simple steps from sign-up to a running GPU instance.
Choose Template
PyTorch 2.x (includes Transformers, DeepSpeed, PEFT), TensorFlow, JAX, or clean CUDA.
Configure & Launch
Select A100 40GB, 1–4 GPUs, set storage. Templates ready in seconds, VMs in under a minute.
Train & Iterate
Upload data, start training. Pause = GPU billing stops. Resume where you left off.
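Once the instance is up, a quick sanity check (helper name ours) confirms PyTorch sees the card before you launch a long run:

```python
import torch

def gpu_summary() -> str:
    """One-line check that the instance sees the GPU before training."""
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    props = torch.cuda.get_device_properties(0)
    return f"{props.name}, {props.total_memory / 2**30:.0f} GiB"

print(gpu_summary())
```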
Frequently asked questions
Everything you need to know about renting the NVIDIA A100 40GB on Jarvislabs.
Should I choose the A100 40GB or the 80GB?
The 80GB model doubles memory and raises bandwidth to 2,039 GB/s. Choose the 40GB ($1.29/hr) for models up to 13B or batch inference; choose the 80GB ($1.49/hr) for 30B+ models or larger batch sizes.
Start training on the NVIDIA A100 in seconds
$1.29/hr with per-minute billing. Pre-configured PyTorch & TensorFlow. No commitments.