NVIDIA H200 GPU
From $3.80/hr — billed by the minute
The highest-memory Hopper GPU. 141GB HBM3e at 4.8 TB/s bandwidth. Run Llama 70B in full FP16 on a single GPU. No quantization compromises.
Powering teams that push boundaries
Trusted by companies including: Tesla, Hugging Face, Kaggle, Zoho, Weights & Biases, upGrad, Saama
Maximum memory meets maximum bandwidth
The H200 delivers the memory and bandwidth needed for the largest AI inference and training workloads — without compromises.
141GB HBM3e Memory
76% more than H100. Fit 70B models in FP16 without quantization. Run multiple models simultaneously.
4.8 TB/s Memory Bandwidth
43% faster than H100. LLM decode is memory-bandwidth bound, so token generation scales with bandwidth; see the back-of-envelope sketch after these highlights. Faster inference for every LLM.
Zero-Compromise Inference
No quantization needed for 70B models. Full FP16 precision means maximum output quality.
Same Hopper Ecosystem
Identical CUDA/software stack as H100. Same Transformer Engine, same NVLink. Migration is seamless.
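Why bandwidth sets the ceiling: at batch size 1, generating each token streams every model weight through the GPU once, so decode throughput tops out near bandwidth divided by model size. A back-of-envelope sketch in Python, using nominal figures rather than measured benchmarks:

```python
# Roofline estimate: single-stream decode reads all weights per token,
# so tokens/s is capped at memory bandwidth / bytes of weights.

def decode_ceiling_tokens_per_sec(params_billions: float,
                                  bytes_per_param: int,
                                  bandwidth_gb_s: float) -> float:
    model_gb = params_billions * bytes_per_param  # weight footprint in GB
    return bandwidth_gb_s / model_gb

# 70B-class model in FP16 (2 bytes per parameter)
print(f"H200: {decode_ceiling_tokens_per_sec(70, 2, 4800):.0f} tok/s ceiling")
print(f"H100: {decode_ceiling_tokens_per_sec(70, 2, 3350):.0f} tok/s ceiling")
```

The ~43% gap between the two ceilings mirrors the bandwidth gap; real throughput also depends on batch size, kernels, and KV-cache traffic.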
Key specs at a glance
141GB of VRAM — rent a single GPU, not eight
Transparent, per-minute billing with no hidden fees. Pause anytime — only pay for active minutes.
| Provider | H200 $/hr | Billing | Min. GPUs | Pre-configured |
|---|---|---|---|---|
| Jarvislabs | $3.80 | Per minute | 1 | ✓ |
| AWS | ~$15–20/GPU | Per second | 8 (bundled) | — |
| Google Cloud | Varies | Per second | Varies | — |
| RunPod | $4.49–5.49 | Per second | 1 | — |
| Lambda | Varies | Per hour | 1 | — |
H200's 141GB eliminates the need for model quantization on 70B LLMs. Run at full FP16 precision with memory to spare for large KV caches and batch sizes.
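The arithmetic behind that claim, sketched in Python (parameter counts are nominal; real checkpoints vary by a few percent):

```python
# FP16 stores 2 bytes per parameter, so the weights-only footprint
# is simply params x 2.
def fp16_weight_gb(params_billions: float) -> float:
    return params_billions * 2  # GB of weights at 2 bytes/param

print(fp16_weight_gb(70))         # ~140 GB of weights alone
print(fp16_weight_gb(70) <= 141)  # True: within the H200's 141GB
print(fp16_weight_gb(70) <= 80)   # False: an 80GB H100 forces quantization or sharding
```

The VRAM left after weights is what holds activations and the KV cache, so exact headroom depends on the checkpoint's true parameter count.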
What developers build on the H200
From full-precision LLM serving to large-scale training without compromises.
Full-Precision LLM Serving
Serve Llama 70B in FP16 on a single GPU. 141GB means no quantization. Maximum quality output for production; see the serving sketch after these use cases.
Long-Context Inference
141GB handles massive KV caches. Serve 100K+ token contexts without memory pressure.
Multi-Model Serving
Load multiple models simultaneously. Run a 70B model + embedding model + reranker on one GPU.
Large Model Training
Train 70B+ models with larger batch sizes. 8x H200 = 1,128GB for massive training runs.
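A minimal single-GPU serving sketch using Hugging Face Transformers (one possible stack; the model ID is illustrative, and it assumes your chosen checkpoint's FP16 weights fit alongside runtime overhead):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-chat-hf"  # illustrative 70B checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # full FP16, no quantization
    device_map="auto",          # place all weights on the single H200
)

inputs = tok("Explain HBM3e in one sentence.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```

For production throughput you would typically front this with a dedicated serving engine such as vLLM or TGI; the memory math stays the same.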
Technical specifications
Complete hardware specifications for the NVIDIA H200 data center GPU.
| Specification | Value | Great for |
|---|---|---|
| Architecture | NVIDIA Hopper (enhanced) | Next-gen AI performance |
| CUDA Cores | 16,896 | General-purpose GPU compute |
| Tensor Cores | 528 (4th gen) | FP8/FP16/BF16/TF32/INT8/FP64 |
| VRAM | 141 GB HBM3e (ECC) | 70B models in FP16 on single GPU |
| Memory Bandwidth | 4,800 GB/s | 43% faster than H100 |
| FP16 Tensor | 989 TFLOPS | Mixed-precision training & inference |
| BF16 Tensor | 989 TFLOPS | LLM training (preferred) |
| TF32 Tensor | 989 TFLOPS | Auto mixed-precision training |
| FP8 Tensor | 1,979 TFLOPS | Transformer Engine optimized |
| INT8 Tensor | 1,979 TOPS | Quantized inference |
| NVLink | 900 GB/s bidirectional | Multi-GPU scaling |
| PCIe | Gen5 x16 | Host data transfer |
| TDP | 700W (SXM) | Maximum performance |
| Manufacturing | TSMC 4N | Advanced process node |
| Multi-GPU | Up to 8x per instance | 1,128GB combined HBM3e |
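Once an instance is running, a short PyTorch check (standard API, nothing Jarvislabs-specific) confirms the hardware matches the table above:

```python
import torch

props = torch.cuda.get_device_properties(0)
print(props.name)                                   # e.g. "NVIDIA H200"
print(f"{props.total_memory / 1024**3:.0f} GiB")    # ~141 GB of HBM3e
print(torch.cuda.device_count(), "GPU(s) visible")  # 1-8 depending on config
```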
Launch your H200 instance in seconds
Three simple steps from sign-up to a running GPU instance.
Choose Template
PyTorch 2.x (with Transformers, DeepSpeed, PEFT, Accelerate), TensorFlow, JAX, or clean CUDA.
Configure & Launch
Select H200, 1–8 GPUs, allocate storage. Templates ready in seconds, VMs in under a minute.
Train at Scale
DeepSpeed and PyTorch DDP pre-configured for multi-GPU. Pause when idle, resume from checkpoint.
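A minimal DDP skeleton of the kind the templates are wired for; the model and training loop are placeholders. On an 8x H200 instance you would launch it with, for example, torchrun --nproc_per_node=8 train.py:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")  # NCCL rides NVLink between GPUs
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(4096, 4096).cuda(rank)  # stand-in for a real model
    model = DDP(model, device_ids=[rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                        # stand-in training loop
        x = torch.randn(32, 4096, device=rank)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                        # gradients all-reduced across GPUs
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```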
Manage via CLI
Create and manage H200 instances from your terminal.
```
jl create --gpu H200
```

Frequently asked questions
Everything you need to know about renting the NVIDIA H200 on Jarvislabs.
How does the H200 compare to the H100?
76% more memory (141GB vs 80GB) and 43% more bandwidth (4.8 vs 3.35 TB/s). Same compute throughput. Choose H200 when memory or bandwidth is your bottleneck.
Start running on the NVIDIA H200 in seconds
$3.80/hr with per-minute billing. 141GB HBM3e. Up to 8 GPUs. No commitments.