Local LLM Performance Architect

Local AI VRAM Calculator

Calculate the model weight and KV cache footprint for local AI runs. Compare how a model fits across GGUF and EXL2 quantizations, view matching local GPU setups, and check rental economics.

Grouped-Query Attention (GQA) Supported | Static client-side app | Quantization-aware
Fits

VRAM OK. Ready for local execution.

0.0 GB of 16.0 GB VRAM utilized

📊 VRAM Allocation Blueprint

Interactive graphical breakdown of where every megabyte goes.

Model Weights 0.0 GB
KV Cache 0.0 GB
Engine Overhead 0.0 GB
VRAM Limit 16.0 GB
- Model Weights: Params × Bitrate / 8
- KV Cache Size: 2 × L × KV_H × (H / Q_H) × Ctx × Bytes
- Total Required: Weights + KV + Overhead
- Inference Job: execution duration per run
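
The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code. It assumes that L is the layer count, KV_H the number of key/value heads, H the hidden size, Q_H the number of query heads (so H / Q_H is the per-head dimension), Ctx the context length in tokens, and Bytes the size of one cache element. The names `ModelConfig`, `weights_bytes`, `kv_cache_bytes`, and `fits` are hypothetical, and the 4.5-bit quantization and 0.5 GB overhead in the example are placeholder inputs, not values taken from the tool.

```python
from dataclasses import dataclass

GB = 1e9  # decimal gigabytes (assumption: the page does not say GB vs GiB)


@dataclass
class ModelConfig:
    params: float      # total parameter count
    layers: int        # L
    kv_heads: int      # KV_H
    hidden_size: int   # H
    query_heads: int   # Q_H


def weights_bytes(cfg: ModelConfig, bits_per_weight: float) -> float:
    """Model Weights = Params x Bitrate / 8."""
    return cfg.params * bits_per_weight / 8


def kv_cache_bytes(cfg: ModelConfig, context_tokens: int,
                   bytes_per_element: int = 2) -> int:
    """KV Cache = 2 x L x KV_H x (H / Q_H) x Ctx x Bytes.

    The leading 2 accounts for storing both keys and values.
    """
    head_dim = cfg.hidden_size // cfg.query_heads
    return (2 * cfg.layers * cfg.kv_heads * head_dim
            * context_tokens * bytes_per_element)


def fits(total_bytes: float, vram_limit_bytes: float) -> bool:
    """True when the total requirement is within the VRAM limit."""
    return total_bytes <= vram_limit_bytes


# Approximate shape of a Llama 3 8B class model (GQA: 8 KV heads, 32 query heads).
llama3_8b_like = ModelConfig(params=8.03e9, layers=32, kv_heads=8,
                             hidden_size=4096, query_heads=32)

weights = weights_bytes(llama3_8b_like, bits_per_weight=4.5)  # placeholder bitrate
kv = kv_cache_bytes(llama3_8b_like, context_tokens=8192)      # 16-bit cache
overhead = 0.5 * GB                                           # placeholder overhead
total = weights + kv + overhead

print(f"weights {weights / GB:.2f} GB, KV {kv / GB:.2f} GB, "
      f"total {total / GB:.2f} GB, fits 16 GB: {fits(total, 16 * GB)}")
```

With grouped-query attention the cache scales with KV_H rather than Q_H, which is why the formula multiplies the per-head dimension by the KV head count instead of using the full hidden size.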

💻 Local GPU Compatibility Recommender

See how physical consumer GPUs and cloud hardware fit your configured parameters.
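
A compatibility check of this kind reduces to comparing the total requirement against each card's memory. The sketch below is a hypothetical illustration, not the recommender's implementation: the `GPUS` list is a small hand-picked sample rather than the tool's catalog, and `compatible_gpus` is an assumed helper name.

```python
# Illustrative sample of cards and their VRAM in GB (not the tool's catalog).
GPUS = [
    ("RTX 3060 12GB", 12),
    ("RTX 4080", 16),
    ("RTX 3090", 24),
    ("RTX 4090", 24),
    ("A100 80GB", 80),
]


def compatible_gpus(required_gb, gpus=GPUS):
    """Return (name, vram_gb, headroom_gb) for every card that can hold the job."""
    return [
        (name, vram_gb, vram_gb - required_gb)
        for name, vram_gb in gpus
        if vram_gb >= required_gb
    ]


for name, vram_gb, headroom_gb in compatible_gpus(required_gb=20):
    print(f"{name}: {vram_gb} GB, {headroom_gb} GB headroom")
```

Reporting headroom alongside the yes/no result makes it easier to see which cards leave room for a longer context.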

💰 Buy vs. Cloud Rental Economics

Compares the cost of running jobs on rented cloud GPUs with the cost of purchasing dedicated hardware.

- Estimated cost per cloud API run.
- Cloud rental spend per month.
- Hardware payback timeline.
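
The three figures above can be related with simple arithmetic. The sketch below is one plausible way to compute them, not the page's exact method: it assumes hourly cloud billing prorated by run duration and ignores electricity, resale value, and idle time. All function names and the example inputs (hourly rate, run length, run count, hardware price) are hypothetical.

```python
def cost_per_run(hourly_rate_usd, run_seconds):
    """Cloud cost of one run, prorated from an hourly rental rate."""
    return hourly_rate_usd * run_seconds / 3600


def monthly_rental_spend(hourly_rate_usd, run_seconds, runs_per_month):
    """Total cloud spend for a month of runs."""
    return cost_per_run(hourly_rate_usd, run_seconds) * runs_per_month


def payback_months(hardware_price_usd, monthly_spend_usd):
    """Months of cloud spend needed to equal the hardware purchase price."""
    if monthly_spend_usd <= 0:
        return float("inf")  # no cloud spend means the purchase never pays back
    return hardware_price_usd / monthly_spend_usd


# Placeholder inputs: $0.50/hour rental, 90-second runs, 2000 runs per month,
# and a $1600 GPU.
per_run = cost_per_run(0.50, 90)
monthly = monthly_rental_spend(0.50, 90, 2000)
months = payback_months(1600, monthly)

print(f"per run ${per_run:.4f}, per month ${monthly:.2f}, "
      f"payback {months:.0f} months")
```

A short payback period favors buying; a long one suggests that renting remains cheaper at the given usage level.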

Llama 3 8B: 5.8 GB / 16 GB (fits)