NVIDIA H100 GPU
From $2.69/hr — billed by the minute
The flagship datacenter GPU for AI training and inference. 4th-gen Tensor Cores with FP8 support, Transformer Engine, and 3.35 TB/s bandwidth. Train and serve the largest LLMs.
Powering teams that push boundaries
Trusted by companies including: Tesla, Hugging Face, Kaggle, Zoho, Weights & Biases, upGrad, Saama
The fastest GPU for AI training and inference
The H100 delivers breakthrough performance with Hopper architecture, Transformer Engine, and native FP8 support.
Transformer Engine
Dynamic FP8/FP16 mixed precision per layer, per iteration. Near-FP16 quality at half the memory footprint. Automatic — no code changes needed.
3.35 TB/s Memory Bandwidth
64% faster than A100 80GB (3,350 vs 2,039 GB/s). Keeps 528 Tensor Cores saturated. Eliminates memory bottlenecks on large model training.
8-GPU NVLink Scaling
900 GB/s per GPU with 4th-gen NVLink. Up to 8 GPUs per instance with 640GB unified memory — enough for training 70B+ parameter models.
Native FP8 Support
Halve model memory footprint and double inference throughput with hardware-native FP8 compute. First GPU with dedicated FP8 Tensor Core instructions.
78% less than AWS — rent a single GPU, not eight
Transparent, per-minute billing with no hidden fees. Pause anytime — only pay for active minutes.
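Per-minute billing is easy to reason about: cost is the hourly rate divided by 60, times active minutes. A minimal sketch (the `billed_cost` helper is illustrative, not an official API):

```python
def billed_cost(rate_per_hour: float, active_minutes: float) -> float:
    """Cost under per-minute billing: only active minutes are charged."""
    return rate_per_hour / 60 * active_minutes

# A 90-minute fine-tuning run on a single H100 at $2.69/hr
# costs about $4.04; a paused instance costs nothing.
print(round(billed_cost(2.69, 90), 2))
print(billed_cost(2.69, 0))
```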
| Provider | H100 $/hr | Billing | Min. GPUs | Pre-configured |
|---|---|---|---|---|
| Jarvislabs | $2.69 | Per minute | 1 | ✓ |
| AWS (p5.48xlarge) | ~$12.29/GPU | Per second | 8 (bundled) | — |
| Azure (ND H100 v5) | ~$9.74/GPU | Per second | 8 (bundled) | — |
| Google Cloud (a3-highgpu) | ~$5.07/GPU | Per second | 1 | — |
| RunPod (Secure) | $3.29–3.89 | Per second | 1 | — |
| Lambda | $2.49 | Per hour | 1 | — |
AWS charges $98.32/hr for 8x H100 (p5.48xlarge). On Jarvislabs, rent 1 GPU for $2.69/hr or 8 for $21.52/hr. Save 78% per GPU-hour vs. AWS.
What developers build on the H100
From training 70B+ LLMs to production inference with FP8.
Train Large Language Models
Train and fine-tune 70B+ models with FP8 mixed precision. 8x H100 handles 400B+ parameter models with FSDP/DeepSpeed. Transformer Engine optimizes precision automatically.
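A rough sense of why multi-GPU sharding matters at this scale (a back-of-envelope sketch; the 16 bytes/param figure assumes BF16 weights and gradients with FP32 master weights and Adam state, and ignores activations):

```python
PARAMS = 70e9
BYTES_PER_PARAM = 16  # 2 (bf16 weights) + 2 (bf16 grads) + 12 (fp32 master + Adam m, v)

total_gb = PARAMS * BYTES_PER_PARAM / 1e9  # 1120 GB of training state, unsharded
per_gpu_gb = total_gb / 8                  # 140 GB per GPU when fully sharded

print(total_gb, per_gpu_gb)
```

Fully sharded across 8 GPUs, the per-GPU share drops from 1,120 GB to 140 GB; FP8 weights, 8-bit optimizer states, activation checkpointing, and offload are what close the remaining gap to the 80 GB card.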
Production LLM Inference
Serve 70B models on a single GPU with FP8 quantization via TensorRT-LLM, vLLM. 2-3x faster than A100. Native FP8 eliminates quantization overhead.
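The single-GPU claim is mostly weights arithmetic: FP8 stores one byte per parameter, so a 70B model's weights drop from ~140 GB (FP16) to ~70 GB, inside the 80 GB card (a sketch; KV cache and runtime overhead are extra):

```python
def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB for a model of the given size."""
    return params_billions * 1e9 * bytes_per_param / 1e9

fp16 = weight_gb(70, 2)  # 140 GB: needs two 80 GB GPUs
fp8 = weight_gb(70, 1)   # 70 GB: fits one H100, leaving ~10 GB for KV cache
print(fp16, fp8)
```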
Image & Video Generation
FLUX, Stable Diffusion, and video models at maximum speed. 80GB fits multiple models simultaneously. 3.35 TB/s bandwidth eliminates pipeline stalls.
Research & Experimentation
Fastest iteration cycles. Train, evaluate, iterate. FP8 training cuts memory in half, Transformer Engine optimizes precision automatically per layer.
Technical specifications
Complete hardware specifications for the NVIDIA H100 data center GPU.
| Specification | Value | Great for |
|---|---|---|
| Architecture | NVIDIA Hopper | Next-gen AI performance |
| CUDA Cores | 16,896 | General-purpose GPU compute |
| Tensor Cores | 528 (4th gen) | FP8/FP16/BF16/TF32/FP64 |
| VRAM | 80 GB HBM3 (ECC) | 70B models in FP8 on one GPU |
| Memory Bandwidth | 3,350 GB/s | 64% faster than A100 80GB |
| FP32 Performance | 67 TFLOPS | Traditional compute |
| TF32 Tensor | 495 TFLOPS | Auto mixed-precision training |
| FP16/BF16 Tensor | 989 TFLOPS | Mixed-precision training |
| FP8 Tensor | 1,979 TFLOPS | Via Transformer Engine |
| INT8 Tensor | 1,979 TOPS | Quantized inference |
| Transformer Engine | Yes (1st gen FP8/FP16) | Automatic mixed precision |
| NVLink | 900 GB/s bidirectional (4th gen) | Multi-GPU training |
| PCIe | Gen5 x16 | Host data transfer |
| TDP | 700W (SXM) | Maximum performance |
| Multi-GPU | Up to 8x per instance | 640GB unified memory |
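The Tensor Core figures in the table follow a simple doubling ladder: each halving of precision roughly doubles peak throughput (dense figures, no sparsity):

```python
tf32, fp16, fp8 = 495, 989, 1979  # peak dense TFLOPS from the spec table

# Each step down in precision is ~2x the throughput.
print(fp16 / tf32)  # ~2.0
print(fp8 / fp16)   # ~2.0
```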
Launch your H100 instance in seconds
Three simple steps from sign-up to a running GPU instance.
Choose Template
PyTorch 2.x (with Transformers, DeepSpeed, PEFT, Accelerate), TensorFlow, JAX, or clean CUDA. Transformer Engine pre-configured.
Configure & Launch
Select H100, 1–8 GPUs, allocate storage. Templates ready in seconds, VMs in under a minute.
Train at Scale
DeepSpeed, FSDP, and Transformer Engine pre-configured for multi-GPU. Pause when idle, resume from checkpoint.
Manage via CLI
Create and manage H100 instances from your terminal.
jl create --gpu H100

Frequently asked questions
Everything you need to know about renting the NVIDIA H100 on Jarvislabs.
What can I train on the H100?
Fine-tune 70B-parameter models in FP8 on a single GPU. 8x H100 (640GB) handles 180B+ full fine-tuning. Common workloads: LLaMA 3 70B, Mixtral 8x22B, SDXL training.
Start training on the NVIDIA H100 in seconds
$2.69/hr with per-minute billing. 80GB HBM3. Up to 8 GPUs. No commitments.