Blackwell Architecture · 192GB HBM3e

NVIDIA B200 GPU

Coming Soon

NVIDIA's next-generation datacenter GPU. 192GB HBM3e at 8 TB/s bandwidth with 2nd-gen Transformer Engine and native FP4 support. Up to 4x faster LLM inference vs H100.

View Available GPUs
Coming soon · Not yet available on JarvisLabs
Trusted worldwide

Powering teams that push boundaries

27,000+ AI developers
50M+ GPU hours served
99.9% Uptime SLA
<90s Instance launch

Trusted by companies including: Tesla, Hugging Face, Kaggle, Zoho, Weights & Biases, upGrad, Saama

Why B200

Next-generation performance at every level

The B200 delivers generational leaps in memory, bandwidth, and compute for the largest AI workloads.

192GB HBM3e Memory

2.4x H100's 80GB. Run Llama 70B in full FP16 with 52GB to spare. Serve multiple large models simultaneously.
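As a quick sanity check on that claim, here is the arithmetic in a few lines of Python (illustrative numbers; a real deployment also needs room for the KV cache and activations):

```python
# Back-of-envelope check: a 70B-parameter model in FP16 (2 bytes per
# parameter) against 192 GB of HBM3e.
params = 70e9
bytes_per_param = 2                           # FP16
weights_gb = params * bytes_per_param / 1e9   # ~140 GB of weights
headroom_gb = 192 - weights_gb                # ~52 GB left for KV cache etc.
print(f"weights: {weights_gb:.0f} GB, headroom: {headroom_gb:.0f} GB")
```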

8 TB/s Memory Bandwidth

2.4x H100's 3.35 TB/s. Memory-bound workloads (most LLM inference) scale almost linearly with bandwidth.
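A rough roofline sketch makes that scaling concrete: in single-stream decode, every generated token reads all the weights once, so peak tokens/s is bounded by bandwidth divided by weight bytes. The figures below are theoretical upper bounds, not measured throughput:

```python
# Roofline for batch-1 decode: each new token streams every weight once,
# so peak tokens/s ≈ memory bandwidth / weight bytes. Kernels never hit
# 100% of peak; these are illustrative upper bounds only.
def peak_tokens_per_s(bandwidth_gbs: float, weights_gb: float) -> float:
    return bandwidth_gbs / weights_gb

llama70b_fp16 = 140  # GB of FP16 weights
for gpu, bw in [("H100", 3350), ("H200", 4800), ("B200", 8000)]:
    print(f"{gpu}: ~{peak_tokens_per_s(bw, llama70b_fp16):.0f} tokens/s upper bound")
```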

2nd-Gen Transformer Engine

Native FP4 precision. Half the memory of FP8, enabling even larger models or bigger batch sizes.
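The footprint math behind "half the memory of FP8" is straightforward; the sketch below compares weight sizes at each precision for a 70B model (illustrative: real quantized checkpoints carry some extra overhead for scale factors):

```python
# Weight footprint of the same 70B model at different precisions.
params = 70e9
for fmt, bytes_per_param in [("FP16", 2.0), ("FP8", 1.0), ("FP4", 0.5)]:
    print(f"{fmt}: {params * bytes_per_param / 1e9:.0f} GB")
```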

1.8 TB/s NVLink

2x H100's NVLink bandwidth. Faster distributed training, more efficient tensor parallelism.
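To see why link bandwidth matters here, consider a ring all-reduce, which moves roughly 2(n-1)/n times the tensor size per GPU. The tensor size in this sketch is a hypothetical placeholder, not a measured figure:

```python
# Sketch of why NVLink bandwidth matters for tensor parallelism: a ring
# all-reduce moves ~2*(n-1)/n times the tensor size over each GPU's links.
def allreduce_ms(tensor_gb: float, n_gpus: int, link_gbs: float) -> float:
    traffic_gb = 2 * (n_gpus - 1) / n_gpus * tensor_gb
    return traffic_gb / link_gbs * 1e3

activations_gb = 0.5  # hypothetical per-step activation tensor
for gpu, bw in [("H100", 900), ("B200", 1800)]:
    print(f"{gpu}: ~{allreduce_ms(activations_gb, 8, bw):.2f} ms per all-reduce")
```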

Specs

Key specs at a glance

192 GB VRAM (HBM3e)
8,000 GB/s Memory Bandwidth (2.4x H100)
~2,000+ TFLOPS Tensor Performance (FP16)
1,000W TDP (maximum power)

Available Now

Available today on JarvisLabs

While the B200 is coming soon, H100 and H200 instances are ready for immediate use with per-minute billing.

Use Cases

What B200 enables

From next-gen training to ultra-scale inference and multi-modal AI.

Next-Gen LLM Training

Train 200B+ models with FP4/FP8 mixed precision. 4x speedup vs H100 reduces training costs proportionally.

LLaMA 400B+ · GPT-scale · Mixed Precision FP4
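For a sense of scale, the standard estimate of training compute is roughly 6 × parameters × tokens FLOPs; the sustained-throughput numbers below are assumptions for illustration, not measured figures:

```python
# Training-cost estimate: total FLOPs ≈ 6 * params * tokens (forward +
# backward). Sustained TFLOPS values are assumed for illustration.
params, tokens = 200e9, 2e12          # hypothetical 200B model, 2T tokens
total_flops = 6 * params * tokens     # 2.4e24 FLOPs
for gpu, sustained_tflops in [("H100 (FP8, assumed)", 600),
                              ("B200 (FP4, assumed)", 2400)]:
    gpu_hours = total_flops / (sustained_tflops * 1e12) / 3600
    print(f"{gpu}: ~{gpu_hours/1e6:.2f}M GPU-hours")
```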

Ultra-Scale Inference

Serve 70B models with massive batch sizes. 192GB handles multiple concurrent models per GPU.

vLLM · TensorRT-LLM · Multi-model serving
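A minimal vLLM sketch for single-GPU 70B serving might look like the following, assuming the FP16 weights (~140 GB) fit in a 192 GB-class GPU; the model name is only an example, substitute the checkpoint you actually use:

```python
# Minimal vLLM setup for single-GPU 70B serving (sketch, not a tuned config).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example checkpoint
    tensor_parallel_size=1,       # fits on one 192 GB GPU in FP16
    gpu_memory_utilization=0.90,  # leave headroom for CUDA graphs etc.
)
params = SamplingParams(max_tokens=256, temperature=0.7)
outputs = llm.generate(["Summarize the Blackwell architecture."], params)
print(outputs[0].outputs[0].text)
```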

Long-Context Applications

192GB accommodates enormous KV caches for 200K+ token contexts without memory pressure.

200K+ tokens · RAG pipelines · Document analysis
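KV-cache size per token is 2 × layers × KV heads × head_dim × bytes per element (the 2 covers keys and values). With Llama-70B-style shapes, assumed here for illustration, a 200K-token context works out as follows:

```python
# KV-cache sizing with Llama-70B-style shapes assumed for illustration
# (80 layers, 8 GQA KV heads, head_dim 128); adjust for your model.
def kv_cache_gb(tokens: int, layers=80, kv_heads=8, head_dim=128, elem_bytes=2):
    return 2 * layers * kv_heads * head_dim * elem_bytes * tokens / 1e9

print(f"200K-token context: ~{kv_cache_gb(200_000):.0f} GB in FP16")
print(f"                    ~{kv_cache_gb(200_000, elem_bytes=1):.0f} GB with an FP8 KV cache")
```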

Multi-Modal AI

Run image, video, and language models simultaneously. The memory headroom enables complex multi-modal pipelines.

Vision-Language · Video generation · Multi-modal

Comparison

B200 vs H100 vs H200

How Blackwell compares to the current Hopper generation.

                     B200                H200               H100
Memory               192 GB HBM3e        141 GB HBM3e       80 GB HBM3
Bandwidth            8,000 GB/s          4,800 GB/s         3,350 GB/s
Tensor Perf (FP16)   ~2,000+ TFLOPS      989 TFLOPS         989 TFLOPS
NVLink               1,800 GB/s          900 GB/s           900 GB/s
TDP                  1,000W              700W               700W
Transformer Engine   2nd gen (FP4/FP8)   1st gen (FP8)      1st gen (FP8)
Architecture         Blackwell           Hopper             Hopper
Full Specs

Technical specifications

Complete hardware specifications for the NVIDIA B200 data center GPU.

Architecture: NVIDIA Blackwell (next-gen datacenter GPU)
Manufacturing: TSMC 4NP (advanced process node)
Transistors: 208 billion (2.6x H100's 80 billion)
VRAM: 192 GB HBM3e (2.4x H100's 80GB)
Memory Bandwidth: 8,000 GB/s, i.e. 8 TB/s (2.4x H100)
Tensor Cores: 5th gen with native FP4 (FP4/FP8/FP16/BF16)
FP8 Tensor: 2nd generation (higher throughput than Hopper)
FP16 Tensor: ~2,000+ TFLOPS, estimated (mixed-precision training)
Transformer Engine: 2nd generation (dynamic FP4/FP8/FP16)
NVLink: 5th gen, 1,800 GB/s (2x H100's NVLink)
TDP: up to 1,000W (43% more than H100)

B200 is NVIDIA's next-generation datacenter GPU

For current AI workloads, JarvisLabs offers H100 and H200 with per-minute billing. Launch an instance in seconds and start training immediately.

27,343+ AI developers trust JarvisLabs
50M+ GPU hours served
99.9% Uptime SLA
FAQ

Frequently asked questions

Everything you need to know about the NVIDIA B200 and when it's coming to JarvisLabs.

When will B200 be available on JarvisLabs?

We'll add B200 instances as hardware becomes available. H100 and H200 are available today for immediate use.

Get notified when B200 is available

192GB HBM3e. 8 TB/s bandwidth. 2nd-gen Transformer Engine. Sign up to be the first to know.

View Available GPUs