Hopper Architecture · 141GB HBM3e

NVIDIA H200 GPU

From $3.80/hr — billed by the minute

The highest-memory Hopper GPU. 141GB HBM3e at 4.8 TB/s bandwidth. Run Llama 70B in full FP16 on a single GPU. No quantization compromises.

View All GPU Pricing
H200: $3.80/hr·Per-minute billing·No commitments
Trusted worldwide

Powering teams that push boundaries

27,000+ AI developers
50M+ GPU hours served
99.9% Uptime SLA
<90s Instance launch

Trusted by companies including: Tesla, Hugging Face, Kaggle, Zoho, Weights & Biases, upGrad, Saama

Why H200

Maximum memory meets maximum bandwidth

The H200 delivers the memory and bandwidth needed for the largest AI inference and training workloads — without compromises.

141GB HBM3e Memory

76% more than H100. Fit 70B models in FP16 without quantization. Run multiple models simultaneously.

4.8 TB/s Memory Bandwidth

43% faster than H100. Token generation scales with bandwidth — faster inference for every LLM.

Zero-Compromise Inference

No quantization needed for 70B models. Full FP16 precision means maximum output quality.

Same Hopper Ecosystem

Identical CUDA/software stack as H100. Same Transformer Engine, same NVLink. Migration is seamless.
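
The bandwidth claim above can be sanity-checked with a back-of-envelope roofline: single-stream LLM decoding is memory-bandwidth bound, because every generated token must stream all model weights from HBM at least once. A minimal sketch, assuming a 70B-parameter model in FP16 (2 bytes/param) and ignoring KV-cache reads:

```python
# Upper bound on batch-size-1 decode throughput: bandwidth / bytes-per-token.
# Assumptions (illustrative, not measured): 70e9 params, FP16 weights.

WEIGHT_BYTES = 70e9 * 2  # ~140 GB of FP16 weights streamed per token

def decode_ceiling(bandwidth_bytes_per_s, weight_bytes=WEIGHT_BYTES):
    """Theoretical tokens/sec ceiling for single-sequence decoding."""
    return bandwidth_bytes_per_s / weight_bytes

h100_tps = decode_ceiling(3.35e12)  # H100: 3.35 TB/s
h200_tps = decode_ceiling(4.8e12)   # H200: 4.8 TB/s
print(f"H100 ceiling ≈ {h100_tps:.1f} tok/s, H200 ceiling ≈ {h200_tps:.1f} tok/s")
```

The ratio of the two ceilings is exactly the 43% bandwidth advantage, which is why token generation "scales with bandwidth" at batch size 1; real throughput sits below these ceilings.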

Specs

Key specs at a glance

141 GB VRAM · HBM3e with ECC
4,800 GB/s Memory Bandwidth · 43% faster than H100
989 TFLOPS Tensor Performance · FP16 / BF16
1,128 GB Multi-GPU Memory · 8x H200 via NVLink

Pricing

141GB of VRAM — rent a single GPU, not eight

Transparent, per-minute billing with no hidden fees. Pause anytime — only pay for active minutes.

Jarvislabs: $3.80/hr · Per-minute billing · Min 1 GPU · Pre-configured
AWS: ~$15–20/GPU/hr · Per-second billing · Min 8 GPUs (bundled)
Google Cloud: varies · Per-second billing · Min varies
RunPod: $4.49–5.49/hr · Per-second billing · Min 1 GPU
Lambda: varies · Per-hour billing · Min 1 GPU

H200's 141GB eliminates the need for model quantization on 70B LLMs. Run at full FP16 precision with memory to spare for large KV caches and batch sizes.

Use Cases

What developers build on the H200

From full-precision LLM serving to large-scale training without compromises.

Full-Precision LLM Serving

Serve Llama 70B in FP16 on a single GPU. 141GB means no quantization. Maximum quality output for production.

vLLM · TGI · TensorRT-LLM

Long-Context Inference

141GB handles massive KV caches. Serve 100K+ token contexts without memory pressure.

Claude · GPT-4 class · Llama 3
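
The "massive KV caches" claim can be quantified. A sketch of KV-cache sizing for a grouped-query-attention transformer, using an assumed Llama-70B-style configuration (80 layers, 8 KV heads, head dim 128, FP16 cache; these numbers are illustrative, not taken from this page):

```python
# KV-cache memory per sequence for a GQA transformer.
# Assumed config (illustrative): 80 layers, 8 KV heads, head_dim 128, FP16.

def kv_cache_bytes(seq_len, layers=80, kv_heads=8, head_dim=128, dtype_bytes=2):
    # K and V each store layers * kv_heads * head_dim values per token.
    return 2 * layers * kv_heads * head_dim * dtype_bytes * seq_len

per_token = kv_cache_bytes(1)             # 327,680 bytes ≈ 320 KiB per token
ctx_100k_gb = kv_cache_bytes(100_000) / 1e9
print(f"{per_token} bytes/token, ≈{ctx_100k_gb:.1f} GB at a 100K-token context")
```

Under these assumptions a single 100K-token sequence needs roughly 33 GB of cache on top of the model weights, which is why high-capacity GPUs matter for long-context serving.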

Multi-Model Serving

Load multiple models simultaneously. Run a 70B model + embedding model + reranker on one GPU.

LLM + embeddings · Multi-model pipelines
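
A rough VRAM budget shows how such a pipeline can co-locate on one card. The model sizes below are illustrative assumptions (a 70B LLM served in FP8 plus small FP16 auxiliary models), counting weights only; activations, KV cache, and framework overhead need their own headroom:

```python
# Sketch: weight-only VRAM budget for co-locating models on a 141 GB GPU.
# Sizes are assumptions for illustration, not measurements.

GPU_VRAM_GB = 141

models_bytes = {
    "llm_70b_fp8":        70e9 * 1,    # 70B params at 1 byte/param (FP8)
    "embedder_1b_fp16":    1e9 * 2,    # 1B-param embedding model, FP16
    "reranker_0.5b_fp16":  0.5e9 * 2,  # 0.5B-param reranker, FP16
}

used_gb = sum(models_bytes.values()) / 1e9
headroom_gb = GPU_VRAM_GB - used_gb
print(f"weights ≈ {used_gb:.1f} GB, headroom ≈ {headroom_gb:.1f} GB")
```
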

Large Model Training

Train 70B+ models with larger batch sizes. 8x H200 = 1,128GB for massive training runs.

LLaMA · Mixtral · Custom architectures
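
The 1,128GB figure can be put in context with a standard rule of thumb: mixed-precision Adam training holds roughly 16 bytes of state per parameter (FP16 weights 2 + FP16 gradients 2 + FP32 master weights 4 + two FP32 Adam moments 8). A sketch for a 70B model:

```python
# Rule-of-thumb training-state footprint for mixed-precision Adam
# (~16 bytes/param). Activations come on top and scale with batch
# size and sequence length, so this is a lower bound, not a budget.

PARAMS = 70e9
BYTES_PER_PARAM = 16
state_gb = PARAMS * BYTES_PER_PARAM / 1e9  # 1,120 GB of optimizer + weight state

POOL_GB = 8 * 141                          # 1,128 GB across 8x H200
print(f"training state ≈ {state_gb:.0f} GB vs {POOL_GB} GB pooled memory")
```

Since the state alone nearly fills the 8-GPU pool, it must be sharded across GPUs (e.g. with DeepSpeed ZeRO, which the launch templates mention) rather than replicated on each card.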
Full Specs

Technical specifications

Complete hardware specifications for the NVIDIA H200 data center GPU.

Architecture: NVIDIA Hopper (enhanced) · Next-gen AI performance
CUDA Cores: 16,896 · General-purpose GPU compute
Tensor Cores: 528 (4th gen) · FP8/FP16/BF16/TF32/INT8/FP64
VRAM: 141 GB HBM3e (ECC) · 70B models in FP16 on a single GPU
Memory Bandwidth: 4,800 GB/s · 43% faster than H100
FP16 Tensor: 989 TFLOPS · Mixed-precision training & inference
BF16 Tensor: 989 TFLOPS · LLM training (preferred)
TF32 Tensor: 989 TFLOPS · Auto mixed-precision training
FP8 Tensor: 1,979 TFLOPS · Transformer Engine optimized
INT8 Tensor: 1,979 TOPS · Quantized inference
NVLink: 900 GB/s bidirectional · Multi-GPU scaling
PCIe: Gen5 x16 · Host data transfer
TDP: 700W (SXM) · Maximum performance
Manufacturing: TSMC 4N · Advanced process node
Multi-GPU: Up to 8x per instance · 1,128GB unified memory
Get Started

Launch your H200 instance in seconds

Three simple steps from sign-up to a running GPU instance.

01

Choose Template

PyTorch 2.x (with Transformers, DeepSpeed, PEFT, Accelerate), TensorFlow, JAX, or clean CUDA.

02

Configure & Launch

Select H200, 1–8 GPUs, allocate storage. Templates ready in seconds, VMs in under a minute.

03

Train at Scale

DeepSpeed and PyTorch DDP pre-configured for multi-GPU. Pause when idle, resume from checkpoint.

Manage via CLI

Create and manage H200 instances from your terminal.

jl create --gpu H200
Explore CLI
FAQ

Frequently asked questions

Everything you need to know about renting the NVIDIA H200 on Jarvislabs.

How does the H200 differ from the H100?

76% more memory (141GB vs 80GB) and 43% more bandwidth (4.8 vs 3.35 TB/s). Same compute. Choose H200 when memory or bandwidth is your bottleneck.

Start running on the NVIDIA H200 in seconds

$3.80/hr with per-minute billing. 141GB HBM3e. Up to 8 GPUs. No commitments.

Compare All GPUs