NVIDIA H100 GPU
From $2.69/hr — billed by the minute
The flagship datacenter GPU for AI training and inference. 4th-gen Tensor Cores with FP8 support, Transformer Engine, and 3.35 TB/s bandwidth. Train and serve the largest LLMs.
Powering teams that push boundaries
Trusted by companies including: Tesla, Hugging Face, Kaggle, Zoho, Weights & Biases, upGrad, Saama
The fastest GPU for AI training and inference
The H100 delivers breakthrough performance with Hopper architecture, Transformer Engine, and native FP8 support.
Transformer Engine
Dynamic FP8/FP16 mixed precision per layer, per iteration. Near-FP16 quality at half the memory footprint. Automatic — no code changes needed.
3.35 TB/s Memory Bandwidth
64% faster than A100 80GB (3,350 vs 2,039 GB/s). Keeps 528 Tensor Cores saturated. Eliminates memory bottlenecks on large model training.
8-GPU NVLink Scaling
900 GB/s per GPU with 4th-gen NVLink. Up to 8 GPUs per instance with 640GB unified memory — enough for training 70B+ parameter models.
Native FP8 Support
Halve model memory footprint and double inference throughput with hardware-native FP8 compute. First GPU with dedicated FP8 Tensor Core instructions.
78% less than AWS — rent a single GPU, not eight
Transparent, per-minute billing with no hidden fees. Pause anytime — only pay for active minutes.
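Per-minute billing is easy to reason about: cost is the hourly rate divided by 60, times active minutes. A minimal sketch (the `billed_cost` helper is illustrative, not an official API):

```python
def billed_cost(rate_per_hour: float, active_minutes: float) -> float:
    """Cost under per-minute billing: only active minutes are charged."""
    return rate_per_hour / 60 * active_minutes

# A 90-minute fine-tuning run on a single H100 at $2.69/hr
# costs about $4.04; a paused instance costs nothing.
print(round(billed_cost(2.69, 90), 2))
print(billed_cost(2.69, 0))
```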
| Provider | H100 $/hr | Billing | Min. GPUs | Pre-configured |
|---|---|---|---|---|
| Jarvislabs | $2.69 | Per minute | 1 | ✓ |
| AWS (p5.48xlarge) | ~$12.29/GPU | Per second | 8 (bundled) | — |
| Azure (ND H100 v5) | ~$9.74/GPU | Per second | 8 (bundled) | — |
| Google Cloud (a3-highgpu) | ~$5.07/GPU | Per second | 1 | — |
| RunPod (Secure) | $3.29–3.89 | Per second | 1 | — |
| Lambda | $2.49 | Per hour | 1 | — |
AWS charges $98.32/hr for 8x H100 (p5.48xlarge). On Jarvislabs, rent 1 GPU for $2.69/hr or 8 for $21.52/hr. Save 78% per GPU-hour vs. AWS.
What developers build on the H100
From training 70B+ LLMs to production inference with FP8.
Train Large Language Models
Train and fine-tune 70B+ models with FP8 mixed precision. 8x H100 handles 400B+ parameter models with FSDP/DeepSpeed. Transformer Engine optimizes precision automatically.
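A rough sense of why multi-GPU sharding matters at this scale (a back-of-envelope sketch; the 16 bytes/param figure assumes BF16 weights and gradients with FP32 master weights and Adam state, and ignores activations):

```python
PARAMS = 70e9
BYTES_PER_PARAM = 16  # 2 (bf16 weights) + 2 (bf16 grads) + 12 (fp32 master + Adam m, v)

total_gb = PARAMS * BYTES_PER_PARAM / 1e9  # 1120 GB of training state, unsharded
per_gpu_gb = total_gb / 8                  # 140 GB per GPU when fully sharded

print(total_gb, per_gpu_gb)
```

Fully sharded across 8 GPUs, the per-GPU share drops from 1,120 GB to 140 GB; FP8 weights, 8-bit optimizer states, activation checkpointing, and offload are what close the remaining gap to the 80 GB card.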
Production LLM Inference
Serve 70B models on a single GPU with FP8 quantization via TensorRT-LLM, vLLM. 2-3x faster than A100. Native FP8 eliminates quantization overhead.
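The single-GPU claim is mostly weights arithmetic: FP8 stores one byte per parameter, so a 70B model's weights drop from ~140 GB (FP16) to ~70 GB, inside the 80 GB card (a sketch; KV cache and runtime overhead are extra):

```python
def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB for a model of the given size."""
    return params_billions * 1e9 * bytes_per_param / 1e9

fp16 = weight_gb(70, 2)  # 140 GB: needs two 80 GB GPUs
fp8 = weight_gb(70, 1)   # 70 GB: fits one H100, leaving ~10 GB for KV cache
print(fp16, fp8)
```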
Image & Video Generation
FLUX, Stable Diffusion, and video models at maximum speed. 80GB fits multiple models simultaneously. 3.35 TB/s bandwidth eliminates pipeline stalls.
Research & Experimentation
Fastest iteration cycles. Train, evaluate, iterate. FP8 training cuts memory in half, Transformer Engine optimizes precision automatically per layer.
Technical specifications
Complete hardware specifications for the NVIDIA H100 data center GPU.
| Specification | Value | Great for |
|---|---|---|
| Architecture | NVIDIA Hopper | Next-gen AI performance |
| CUDA Cores | 16,896 | General-purpose GPU compute |
| Tensor Cores | 528 (4th gen) | FP8/FP16/BF16/TF32/FP64 |
| VRAM | 80 GB HBM3 (ECC) | 70B models in FP8 on one GPU |
| Memory Bandwidth | 3,350 GB/s | 64% faster than A100 80GB |
| FP32 Performance | 67 TFLOPS | Traditional compute |
| TF32 Tensor | 495 TFLOPS | Auto mixed-precision training |
| FP16/BF16 Tensor | 989 TFLOPS | Mixed-precision training |
| FP8 Tensor | 1,979 TFLOPS | Via Transformer Engine |
| INT8 Tensor | 1,979 TOPS | Quantized inference |
| Transformer Engine | Yes (1st gen FP8/FP16) | Automatic mixed precision |
| NVLink | 900 GB/s bidirectional (4th gen) | Multi-GPU training |
| PCIe | Gen5 x16 | Host data transfer |
| TDP | 700W (SXM) | Maximum performance |
| Multi-GPU | Up to 8x per instance | 640GB unified memory |
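The Tensor Core figures in the table follow a simple doubling ladder: each halving of precision roughly doubles peak throughput (dense figures, no sparsity):

```python
tf32, fp16, fp8 = 495, 989, 1979  # peak dense TFLOPS from the spec table

# Each step down in precision is ~2x the throughput.
print(fp16 / tf32)  # ~2.0
print(fp8 / fp16)   # ~2.0
```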
Launch your H100 instance in seconds
Three simple steps from sign-up to a running GPU instance.
Choose Template
PyTorch 2.x (with Transformers, DeepSpeed, PEFT, Accelerate), TensorFlow, JAX, or clean CUDA. Transformer Engine pre-configured.
Configure & Launch
Select H100, 1–8 GPUs, allocate storage. Templates ready in seconds, VMs in under a minute.
Train at Scale
DeepSpeed, FSDP, and Transformer Engine pre-configured for multi-GPU. Pause when idle, resume from checkpoint.
Manage via CLI
Create and manage H100 instances from your terminal.
jl create --gpu H100

Frequently asked questions
Everything you need to know about renting the NVIDIA H100 on Jarvislabs.
What can I train on the H100?
Fine-tune 70B-parameter models in FP8 on a single GPU. 8x H100 (640GB) handles 180B+ full fine-tuning. Common workloads: LLaMA 3 70B, Mixtral 8x22B, SDXL training.
Start training on the NVIDIA H100 in seconds
$2.69/hr with per-minute billing. 80GB HBM3. Up to 8 GPUs. No commitments.