Local LLM Performance Architect
Local AI VRAM Calculator
Accurately calculate the weight and KV cache footprint for local AI runs. Compare how a model fits across GGUF and EXL2 quantizations, view matching local GPU setups, and check rental economics.
Fits
VRAM OK. Ready for local execution.
0.0 GB of 16.0 GB VRAM utilized
Model Weights
-
Params × Bits per weight / 8
KV Cache Size
-
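2 × L × KV_H × (H / Q_H) × Ctx × Bytes, where L is the layer count, KV_H the number of KV heads, H / Q_H the head dimension (hidden size over query heads), Ctx the context length, and Bytes the bytes per cached element. The leading 2 covers keys and values.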
Total Required
-
Weights + KV + Overhead
Inference Job
-
Execution duration per run
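The three memory formulas above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the calculator's actual code: the names `ModelConfig`, `weights_bytes`, `kv_cache_bytes`, and `total_bytes` are hypothetical, the example model shape is made up for the arithmetic, and the 1 GB overhead default is an assumed placeholder because the page does not state how overhead is computed. Gigabytes are treated as decimal (10^9 bytes), which is also an assumption.

```python
from dataclasses import dataclass

GB = 1e9  # assumption: decimal gigabytes; the page does not say GB vs GiB


@dataclass
class ModelConfig:
    """Hypothetical container for the symbols used in the formulas."""
    params: float      # total parameter count
    layers: int        # L
    hidden_size: int   # H
    q_heads: int       # Q_H (query attention heads)
    kv_heads: int      # KV_H (key/value heads; fewer than Q_H under GQA)


def weights_bytes(params: float, bits_per_weight: float) -> float:
    """Weights: parameters times bits per weight, divided by 8 bits per byte."""
    return params * bits_per_weight / 8


def kv_cache_bytes(cfg: ModelConfig, ctx: int, bytes_per_elem: float = 2.0) -> float:
    """KV cache: 2 (keys and values) x L x KV_H x head_dim x Ctx x Bytes."""
    head_dim = cfg.hidden_size / cfg.q_heads  # H / Q_H
    return 2 * cfg.layers * cfg.kv_heads * head_dim * ctx * bytes_per_elem


def total_bytes(cfg: ModelConfig, bits_per_weight: float, ctx: int,
                bytes_per_elem: float = 2.0,
                overhead_bytes: float = 1.0 * GB) -> float:
    """Total: weights + KV cache + overhead (overhead value is an assumption)."""
    return (weights_bytes(cfg.params, bits_per_weight)
            + kv_cache_bytes(cfg, ctx, bytes_per_elem)
            + overhead_bytes)


# Made-up example shape, chosen only to make the arithmetic easy to follow.
example = ModelConfig(params=8e9, layers=32, hidden_size=4096,
                      q_heads=32, kv_heads=8)
required = total_bytes(example, bits_per_weight=4.5, ctx=8192)
fits = required <= 16.0 * GB  # compare against a 16 GB card
print(f"{required / GB:.2f} GB required, fits in 16 GB: {fits}")
```

With these inputs the weights come to 4.5 GB and the FP16 KV cache at 8192 tokens to about 1.07 GB, so the total depends mostly on the quantization bit width until the context grows large.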
Local GPU Compatibility Recommender
See which consumer GPUs and cloud hardware fit your configured parameters.
Buy vs. Cloud Rental Economics
Compares the cost of running on rented cloud GPUs with the cost of purchasing dedicated hardware.
-
Estimated cost per cloud API run.
-
Cloud rental spend per month.
-
Hardware payback timeline.
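One plausible way to derive the three economics figures is sketched below. The page does not state its formulas, so everything here is an assumption: billing is taken to be a flat hourly rate prorated by run duration, the payback timeline is taken to be the hardware price divided by monthly rental spend, and electricity and resale value are ignored. The function names and parameters are hypothetical.

```python
def cost_per_run(rate_per_hour: float, run_seconds: float) -> float:
    """Assumed model: hourly rental rate prorated by the run's duration."""
    return rate_per_hour * run_seconds / 3600.0


def monthly_rental(rate_per_hour: float, run_seconds: float,
                   runs_per_month: float) -> float:
    """Monthly cloud spend: cost of one run times the number of runs."""
    return cost_per_run(rate_per_hour, run_seconds) * runs_per_month


def payback_months(hardware_price: float, monthly_rental_spend: float) -> float:
    """Months until buying hardware matches cumulative rental spend.

    Returns infinity when nothing is being spent on rental, since the
    purchase would then never pay for itself under this simple model.
    """
    if monthly_rental_spend <= 0:
        return float("inf")
    return hardware_price / monthly_rental_spend


# Example with made-up numbers: 30-minute runs at 2.0 per hour, 100 runs a month.
per_run = cost_per_run(2.0, 1800)
per_month = monthly_rental(2.0, 1800, 100)
months = payback_months(1200.0, per_month)
print(per_run, per_month, months)
```

Under this model the payback timeline shortens in direct proportion to usage, so light or occasional workloads tend to favor renting.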