Best GPU for Stable Diffusion: SDXL, SD 1.5, and FLUX (2026 Guide)

Vishnu Subramanian
Founder @JarvisLabs.ai

For Stable Diffusion XL inference, an RTX 4090 (24GB, $0.59/hr on JarvisLabs) is the sweet spot — fast generation, enough VRAM for high-resolution outputs, and great value. For SD 1.5, even an L4 (24GB, $0.44/hr) works well. For FLUX, see our dedicated FLUX GPU guide. For training or fine-tuning custom models, an A100 80GB gives you room for larger batch sizes and LoRA training at higher resolutions.

VRAM Requirements by Model

| Model | Minimum VRAM | Recommended VRAM | Notes |
|---|---|---|---|
| SD 1.5 | 4GB | 8-12GB | Runs on almost anything |
| SDXL | 8GB | 12-24GB | 1024×1024 base resolution |
| SDXL + Refiner | 12GB | 24GB | Two models loaded simultaneously |
| SD 3 Medium | 8GB | 16-24GB | MMDiT architecture |
| FLUX.1 Dev | 12GB | 24GB+ | See FLUX GPU guide |
| FLUX.1 Schnell | 10GB | 16-24GB | Faster, lighter FLUX variant |

These are for inference with default settings. Higher resolutions, larger batch sizes, ControlNet, IP-Adapter, and other extensions increase VRAM requirements.
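
If you want to see what your own settings actually consume, here is a minimal sketch that measures peak VRAM for a single SDXL generation using the diffusers library; the model ID is the standard SDXL base checkpoint, and the prompt is just an example.

```python
# Minimal sketch: measure peak VRAM for one SDXL generation (assumes diffusers is installed).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

torch.cuda.reset_peak_memory_stats()
image = pipe("a mountain lake at sunrise", num_inference_steps=20).images[0]
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM during generation: {peak_gb:.1f} GB")
```

Add ControlNet, higher resolutions, or larger batches to the call and rerun to see how the peak moves.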

GPU Recommendations

Best Overall: NVIDIA RTX 4090 (24GB)

The RTX 4090 hits the sweet spot for Stable Diffusion:

  • 24GB VRAM — handles SDXL at full resolution with room for ControlNet, LoRAs, and upscaling
  • Fast generation — SDXL 1024×1024 in ~3-5 seconds per image (20 steps, Euler sampler)
  • Good tensor performance — 4th-gen Tensor Cores accelerate both inference and training
  • $0.59/hr on JarvisLabs — best price-to-performance for image generation

For most Stable Diffusion users — whether generating images, running ComfyUI workflows, or fine-tuning LoRAs — the RTX 4090 is the GPU to pick. Check our pricing page.
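
To reproduce the timing quoted above on your own instance, here is a rough sketch that swaps in the Euler sampler and times one 1024×1024, 20-step generation; the warm-up call excludes one-time setup cost from the measurement.

```python
# Timing sketch for the settings quoted above (20 steps, Euler sampler, 1024x1024).
import time
import torch
from diffusers import EulerDiscreteScheduler, StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

pipe("warm-up", num_inference_steps=1)  # exclude one-time setup from the timing

start = time.perf_counter()
image = pipe("a cyberpunk street at night", num_inference_steps=20,
             height=1024, width=1024).images[0]
print(f"1024x1024, 20 steps: {time.perf_counter() - start:.1f}s")
```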

Best Budget: NVIDIA L4 (24GB)

The L4 is surprisingly capable for image generation:

  • 24GB VRAM — same memory as RTX 4090, so it runs SDXL and FLUX without issues
  • Slower generation — roughly 2-3x slower than RTX 4090 for inference
  • $0.44/hr on JarvisLabs — cheapest option that comfortably runs SDXL
  • Low power — 72W TDP, efficient for batch generation

Best for: batch generation jobs where speed per image matters less than cost per image. If you're generating thousands of images overnight, the L4's lower hourly rate can be more cost-effective despite slower generation.

Best for Training: NVIDIA A100 80GB

For fine-tuning Stable Diffusion models or training LoRAs:

  • 80GB VRAM — large batch sizes, high-resolution training (1024×1024+), multiple models in memory
  • High memory bandwidth (~2.0 TB/s) — keeps the compute units fed during training, shortening each step
  • $1.49/hr on JarvisLabs — reasonable for training sessions that take hours
  • Multi-GPU support — scale to 2-4 A100s for faster training

Fine-tuning SDXL LoRAs works on an RTX 4090 (24GB) for most cases, but full model fine-tuning and DreamBooth training at high resolutions benefit from the A100's extra memory.

Best for Production: NVIDIA H100 (80GB)

For production image generation APIs serving many users:

  • Highest throughput — fastest single-GPU generation speeds
  • 80GB VRAM — serve multiple models simultaneously (SD, SDXL, FLUX)
  • FP8 support — run quantized models with minimal quality loss for higher throughput
  • $2.69/hr on JarvisLabs — worth it when throughput per dollar matters

The H100 makes sense when you're running a production service and need maximum images-per-second. For individual generation, the RTX 4090 is more cost-effective.

GPU Comparison for SDXL

| GPU | VRAM | SDXL Speed (approx) | Price/hr | Cost per 1,000 images |
|---|---|---|---|---|
| H100 | 80GB | ~1.5-2s/image | $2.69 | ~$1.50-2.00 |
| A100 80GB | 80GB | ~3-4s/image | $1.49 | ~$1.50-2.00 |
| RTX 4090 | 24GB | ~3-5s/image | $0.59 | ~$0.80-1.20 |
| RTX 6000 Ada | 48GB | ~3-4s/image | $0.99 | ~$1.00-1.50 |
| A6000 | 48GB | ~5-7s/image | $0.79 | ~$1.20-1.80 |
| L4 | 24GB | ~8-12s/image | $0.44 | ~$1.20-1.80 |
| A5000 | 24GB | ~8-10s/image | $0.49 | ~$1.40-1.80 |
| RTX 3090 | 24GB | ~5-7s/image | $0.29 | ~$0.60-0.90 |

Speeds are approximate for SDXL 1024×1024, 20 steps, Euler sampler, batch size 1. Actual performance varies by sampler, step count, extensions, and software stack.

Key insight: The RTX 3090 ($0.29/hr) offers the lowest cost per image for batch workloads where you don't need the latest architecture features. The RTX 4090 offers the best speed-per-dollar for interactive generation.
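
As a back-of-envelope check, this small helper computes raw generation cost from seconds-per-image and an hourly rate. The rates come from the table above; the per-image speeds are illustrative midpoints, and the result ignores instance boot and model-loading overhead, so it lands at or below the table ranges.

```python
# Back-of-envelope cost helper; rates from the table above, speeds are illustrative.
def cost_per_1000_images(seconds_per_image: float, price_per_hour: float) -> float:
    hours = seconds_per_image * 1000 / 3600
    return hours * price_per_hour

for gpu, secs, rate in [("RTX 4090", 4, 0.59), ("RTX 3090", 6, 0.29), ("L4", 10, 0.44)]:
    print(f"{gpu}: ${cost_per_1000_images(secs, rate):.2f} per 1,000 images")
```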

Optimizing Stable Diffusion Performance

Software Stack Matters

The right software stack can 2-5x your generation speed regardless of GPU (a short sketch follows this list):

  • xformers — memory-efficient attention, reduces VRAM usage and speeds up generation
  • torch.compile — PyTorch 2.0+ compilation can significantly speed up repeated inference
  • TensorRT — NVIDIA's inference optimizer, provides the fastest generation but requires model conversion
  • SDPA (Scaled Dot Product Attention) — built into PyTorch 2.0+, automatic optimization
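
To make the list concrete, here is a minimal sketch of enabling xformers and torch.compile on an SDXL diffusers pipeline; treat it as a starting point rather than a definitive configuration, and benchmark each option on your own workload.

```python
# Sketch of the optimizations above (assumes xformers and PyTorch 2.x are installed).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Option 1: xformers memory-efficient attention (lower VRAM, faster)
pipe.enable_xformers_memory_efficient_attention()

# Option 2: compile the UNet (PyTorch 2.0+); the first call is slow, repeats are faster
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

# SDPA is the default attention backend on PyTorch 2.0+ when xformers is not enabled.
```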

ComfyUI vs Automatic1111

ComfyUI is generally more memory-efficient than Automatic1111 WebUI, especially for complex workflows with multiple models. If you're hitting VRAM limits, switching to ComfyUI can help.

Batch Generation Tips

For generating large batches of images (see the sketch after this list):

  1. Use the largest batch size that fits in VRAM — GPU utilization improves with larger batches
  2. Keep models loaded — loading/unloading models between generations wastes time
  3. Use FP16 — half precision is the standard for inference, with negligible quality loss vs FP32
  4. Consider L4 or RTX 3090 for overnight batches — lower hourly cost matters more than speed for batch jobs
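
A minimal sketch of tips 1-3 using diffusers; the prompts and batch size are placeholders to adjust for your own VRAM budget.

```python
# Batch sketch: load once, stay in FP16, and batch images per prompt.
import torch
from diffusers import StableDiffusionXLPipeline

# Tip 2: load once and keep the model resident for the whole batch job
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,  # Tip 3: FP16 inference
).to("cuda")

prompts = ["a red fox in snow", "a lighthouse in fog", "a desert at dusk"]
for n, prompt in enumerate(prompts):
    # Tip 1: num_images_per_prompt batches the UNet pass; raise it until VRAM is full
    images = pipe(prompt, num_inference_steps=20, num_images_per_prompt=4).images
    for i, img in enumerate(images):
        img.save(f"batch_{n}_{i}.png")
```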

Fine-Tuning and Training

LoRA Training

LoRA fine-tuning is the most common way to customize Stable Diffusion. VRAM requirements:

| Training Task | Minimum GPU | Recommended GPU |
|---|---|---|
| SD 1.5 LoRA (512×512) | 8GB+ GPU | RTX 4090 (24GB) |
| SDXL LoRA (1024×1024) | 16GB+ GPU | RTX 4090 (24GB) or A100 |
| DreamBooth (SD 1.5) | 16GB+ GPU | RTX 4090 (24GB) |
| DreamBooth (SDXL) | 24GB+ GPU | A100 80GB |
| Full fine-tune (SDXL) | 48GB+ GPU | A100 80GB or 2x RTX 4090 |
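
To see why LoRA fits in 24GB while a full fine-tune does not, this sketch freezes the SDXL UNet and attaches LoRA adapters with peft, the same setup the diffusers LoRA training scripts use, then counts trainable parameters. The rank and target modules are illustrative defaults, not a recommendation.

```python
# LoRA setup sketch: only a small adapter trains, the base UNet stays frozen.
import torch
from diffusers import UNet2DConditionModel
from peft import LoraConfig

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)
unet.requires_grad_(False)  # freeze the ~2.6B-parameter base UNet

unet.add_adapter(LoraConfig(
    r=16, lora_alpha=16, init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
))

trainable = sum(p.numel() for p in unet.parameters() if p.requires_grad)
total = sum(p.numel() for p in unet.parameters())
print(f"Trainable: {trainable / 1e6:.0f}M of {total / 1e9:.2f}B parameters")
```

Gradients and optimizer state only exist for the adapter weights, which is what keeps LoRA training within a 24GB card.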

Training Time Estimates

LoRA training for 1,000 steps on SDXL:

| GPU | Approximate Time |
|---|---|
| H100 | 5-10 minutes |
| A100 80GB | 10-15 minutes |
| RTX 4090 | 10-20 minutes |
| A6000 | 15-25 minutes |
| L4 | 25-40 minutes |

Times vary significantly based on resolution, batch size, optimizer, and whether gradient checkpointing is enabled.

Which Resolution Needs Which GPU?

| Resolution | VRAM Needed (SDXL) | Minimum GPU |
|---|---|---|
| 512×512 | ~6-8GB | Any 8GB+ GPU |
| 768×768 | ~8-10GB | 12GB+ GPU |
| 1024×1024 (native SDXL) | ~10-12GB | 12GB+ GPU |
| 1024×1024 + ControlNet | ~14-18GB | 24GB GPU |
| 1536×1536 | ~18-22GB | 24GB GPU |
| 2048×2048 | ~28-35GB | 48GB+ GPU (A6000, RTX 6000 Ada) |

For resolutions above 1024×1024, consider using SDXL's native 1024×1024 generation followed by an upscaler (Real-ESRGAN, 4x-UltraSharp) rather than generating at high resolution directly.

FAQ

What is the minimum GPU for Stable Diffusion?

SD 1.5 runs on 4GB+ GPUs. SDXL needs at least 8GB (tight) and is comfortable with 12-24GB. For a good experience with SDXL and extensions, 24GB (RTX 4090, L4) is recommended.
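
If you're below those thresholds, diffusers offers offloading and tiling options that trade speed for VRAM; a minimal sketch of the low-VRAM path:

```python
# Low-VRAM sketch: offloading keeps SDXL usable on smaller cards, at reduced speed.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
# Note: no .to("cuda") here; offloading manages device placement itself.
pipe.enable_model_cpu_offload()  # keeps only the active component on the GPU
pipe.enable_vae_tiling()         # decodes the image in tiles to cap VAE memory

image = pipe("a watercolor of a harbor", num_inference_steps=20).images[0]
```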

Is RTX 4090 or A100 better for Stable Diffusion?

RTX 4090 for inference — it's faster per dollar for image generation. A100 for training — the 80GB VRAM enables larger batch sizes and higher-resolution training. For most users generating images, RTX 4090 is the pick.

Can I run SDXL and ControlNet together?

Yes, with 24GB VRAM (RTX 4090, L4, or better). SDXL base model (~6GB) plus ControlNet (~2-4GB) plus generation buffers fit comfortably in 24GB. Running SDXL + refiner + ControlNet simultaneously may need more.
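
A minimal sketch of the combination in diffusers, using a public canny ControlNet checkpoint; the edge-map URL is a placeholder to replace with your own preprocessed control image.

```python
# SDXL + ControlNet sketch (canny conditioning).
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet, torch_dtype=torch.float16,
).to("cuda")

canny = load_image("https://example.com/canny_edges.png")  # placeholder: precomputed edge map
image = pipe("a glass sculpture of a horse", image=canny,
             controlnet_conditioning_scale=0.5).images[0]
```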

How much does it cost to generate 10,000 images?

On JarvisLabs with an RTX 4090 ($0.59/hr) generating SDXL at ~4 seconds per image: 10,000 images ≈ 11 hours ≈ $6.50. With an RTX 3090 ($0.29/hr) at ~6 seconds per image: ~17 hours ≈ $4.90. Check our pricing page.

Should I buy a GPU or rent cloud GPUs for Stable Diffusion?

If you generate images daily for hours, buying an RTX 4090 ($1,599) pays for itself within a few months versus cloud rental. If you generate intermittently or need multiple GPUs for batch jobs, cloud rental is more flexible and avoids the upfront cost.

What GPU do I need for FLUX?

FLUX requires more VRAM than SDXL — 12GB minimum, 24GB recommended. See our complete FLUX GPU guide for detailed recommendations.
