// Mac Device Check

What can your Mac actually run?

Most people know their Mac by name — MacBook Air, MacBook Pro — not by chip or memory. This page bridges the gap. Pick your device and memory; get a straight answer.

> The constraint on Mac is RAM, not chip generation. Here's what yours unlocks.

Not sure which model? Check Apple menu → About This Mac

Not sure how much memory? Check Apple menu → About This Mac and look for "Memory" or "Unified Memory"

On Mac, memory (RAM) is the key variable: two MacBook Pros from the same year can land in completely different tiers. Chip generation matters less than you'd think; unified memory architecture means even older M1 chips use their memory efficiently for AI.

The Mac AI story nobody tells you

iPhone AI gets all the coverage. Mac can go significantly further — and for different reasons than you might expect.

// No jetsam
Apps don't get killed silently
iOS enforces a per-app memory budget and kills apps silently when models exceed it. macOS doesn't. If memory gets tight, it slows down via SSD swap — the model keeps running.
→ Slower, not dead
// Unified memory
CPU, GPU, and Neural Engine share one pool
Apple Silicon's unified memory means the GPU can work on model weights that were just used by the CPU, with no copy between separate memory pools. For LLM inference, that gives the GPU access to far more memory than most discrete cards carry as VRAM.
→ Memory works harder here
// SSD as fallback
Models larger than RAM can still load
When a model exceeds physical RAM, macOS pages parts of it out to the SSD instead of refusing to load. A 7B model on 8 GB of RAM is slow (~5 tok/s) but functional. On iPhone, it crashes.
→ 8 GB Macs have more reach than you'd expect
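
A rough way to reason about "fits in RAM" versus "runs via swap" is to estimate the size of the quantized weights and compare it with what is left after the system takes its share. The sketch below is illustrative only: the function names are hypothetical, and the 4 GB reserve for macOS and other apps is an assumption chosen to agree with the tiers on this page, not an Apple figure.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def placement(model_gb: float, ram_gb: float, reserved_gb: float = 4.0) -> str:
    """Say whether a model likely stays in memory or spills to SSD swap.

    reserved_gb is an assumed allowance for macOS and open apps.
    """
    budget = ram_gb - reserved_gb
    return "in memory" if model_gb <= budget else "needs swap"


# An 8B model at roughly 4.7 bits per weight is about 4.7 GB on disk.
size = weights_gb(8, 4.7)
print(f"{size:.1f} GB on an 8 GB Mac: {placement(size, 8)}")
print(f"{size:.1f} GB on a 16 GB Mac: {placement(size, 16)}")
```

Real headroom also depends on context length and on whatever else is running, so treat the result as a first guess rather than a guarantee.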
// Background inference
Generate while you work
On macOS, a model can generate in the background while you write in another window. On iPhone, switching apps suspends inference. Mac lets the model finish its thought.
→ Real multitasking
// Context depth
Much longer conversations
Without a per-app memory cap, you can push 32K–128K token context windows with the right models. That means feeding an entire codebase or document into a single prompt and getting coherent answers.
→ Whole-project reasoning
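
Long contexts cost memory on top of the model weights, because the runtime keeps a key/value cache for every token. A minimal sketch of that arithmetic follows, assuming a 16-bit cache and the published shape of Llama 3.1 8B (32 layers, 8 KV heads, head dimension 128) as defaults. The function name is hypothetical, and runtimes that quantize the cache will use less.

```python
def kv_cache_gib(
    tokens: int,
    n_layers: int = 32,
    n_kv_heads: int = 8,
    head_dim: int = 128,
    bytes_per_value: int = 2,
) -> float:
    """Estimate key/value cache size in GiB for a given context length."""
    # One key tensor and one value tensor per layer, hence the factor of 2.
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return tokens * per_token_bytes / 2**30


for context in (32 * 1024, 128 * 1024):
    print(f"{context} tokens: {kv_cache_gib(context):.0f} GiB of cache")
```

Under these assumptions a 32K context needs about 4 GiB of cache and a 128K context about 16 GiB, which is why long-context work pushes you toward the higher memory tiers.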
// Network serve
Serve your iPhone and iPad too
With a local inference server (Ollama, or Mulberry IDE's network mode), your Mac can serve AI responses to your iPhone and iPad over Wi-Fi — one model, multiple devices, still private.
→ Your Mac as a local AI hub
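
As an illustration of the network-serve idea, here is a minimal client sketch for Ollama's HTTP API, using its `/api/generate` endpoint on the default port 11434. It assumes the Mac's Ollama server has been configured to listen on the local network (for example via the `OLLAMA_HOST` environment variable); the hostname `my-mac.local` is a placeholder. Mulberry IDE's network mode is not covered here.

```python
import json
import urllib.request


def build_request(host: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for an Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"http://{host}:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(host: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text."""
    request = build_request(host, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Example call (needs a reachable server, so it is left commented out):
# print(ask("my-mac.local", "llama3.1:8b", "Summarize unified memory in one line."))
```

Any device on the same Wi-Fi network that can make an HTTP request can use the same endpoint, and the prompt never leaves your network.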

Not sure how much memory you have?

Most people can't name their RAM without looking it up — that's normal. Apple puts it exactly one click away.

  • 01 Click the Apple menu in the top-left corner of your screen
  • 02 Select About This Mac
  • 03 Look for "Memory" or "Unified Memory" — it will say something like 16 GB
  • 04 Come back and select that number in Step 2 above
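
If you prefer the terminal, the same number can be read programmatically. The sketch below shells out to macOS's `sysctl` and reads the `hw.memsize` key, which reports installed memory in bytes. The helper names are hypothetical, and the function returns `None` on systems where the command or key is unavailable.

```python
import subprocess
from typing import Optional


def bytes_to_gb(n_bytes: int) -> int:
    """Convert a byte count to whole gigabytes as Apple labels them (2**30 bytes)."""
    return round(n_bytes / 2**30)


def installed_memory_gb() -> Optional[int]:
    """Return installed memory in GB on macOS, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", "hw.memsize"],
            capture_output=True,
            text=True,
            check=True,
        )
        return bytes_to_gb(int(result.stdout.strip()))
    except (OSError, subprocess.CalledProcessError, ValueError):
        # Not macOS, sysctl missing, or unexpected output.
        return None


print(installed_memory_gb())
```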

What your memory unlocks

Entry
8 GB Unified Memory
MacBook Air M1/M2/M3 · Mac mini M1/M2 · iMac M1/M3 (base)
The most misunderstood tier. 4B-class models (Phi-4 mini, Qwen3 4B, Gemma 3 4B) run at full speed with headroom to spare. 7B–8B models load and run too, via SSD swap at ~5 tok/s. That suits patient tasks, not interactive writing or coding.
  • Phi-4 mini · 2.2 GB ★
  • Qwen3 4B · 2.5 GB ★
  • Gemma 3 4B · 2.5 GB
  • Llama 3.2 3B · 2 GB
  • Llama 3.1 8B · 4.7 GB ⚠ slow via swap
Standard
16–18 GB Unified Memory
MacBook Air M1/M2/M3/M4 · MacBook Pro M1/M2 14" · Mac mini M2/M4 · iMac M3/M4
The everyday workhorse. Llama 3.1 8B is your default: full in-memory speed (~15–25 tok/s on M2+) and long sessions without slowdown. Gemma 3 12B adds a quality step for complex reasoning. 18 GB configs (M3 Pro) handle Qwen 2.5 14B cleanly.
  • Llama 3.1 8B · 4.7 GB ★
  • Gemma 3 12B · 7 GB ★
  • Qwen 2.5 7B · 4.5 GB
  • Mistral 7B · 4.5 GB
  • Qwen 2.5 14B · 8 GB ⚠ tight on 16 GB
Pro
24–36 GB Unified Memory
MacBook Air M2/M3/M4 (max) · MacBook Pro M3 Pro / M4 Pro · Mac mini M4 Pro
Serious AI hardware. Qwen 2.5 32B and Gemma 3 27B run cleanly; this is where local AI starts competing with cloud API quality. 36 GB configs (such as M3 Pro) can also push Llama 3.1 70B at Q2 quantization.
  • Qwen 2.5 32B · ~20 GB ★
  • Gemma 3 27B · ~15 GB ★
  • Qwen 2.5 14B · 8 GB
  • Llama 3.1 70B Q2 · ~30 GB ⚠ 36 GB only
Max
48 GB+ Unified Memory
MacBook Pro M3 Max / M4 Max · Mac Studio (Max / Ultra chips) · Mac Pro
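No practical ceiling for most models. Llama 3.1 70B Q4 runs at speed, and several large models can stay loaded at once. You can also serve inference to other devices on your network. At 192 GB and above (Mac Studio Ultra, Mac Pro), 405B-class models become possible, though a ~200 GB Q4 build needs a configuration with more than 192 GB.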
  • Llama 3.1 70B Q4 · ~40 GB ★
  • Qwen 72B · ~42 GB ★
  • Mistral Large · ~40 GB
  • Multiple simultaneous models
  • 405B Q4 · ~200 GB ⚠ 192 GB+ only
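
The tier boundaries above reduce to a small lookup. The sketch below is one possible encoding, with thresholds and starred picks taken from this page; the `TIERS` table and the `tier_for` function are hypothetical names, not part of any tool.

```python
from typing import List, Optional, Tuple

# (minimum GB, tier name, starred picks from this page), highest tier first.
TIERS: List[Tuple[int, str, List[str]]] = [
    (48, "Max", ["Llama 3.1 70B Q4", "Qwen 72B"]),
    (24, "Pro", ["Qwen 2.5 32B", "Gemma 3 27B"]),
    (16, "Standard", ["Llama 3.1 8B", "Gemma 3 12B"]),
    (8, "Entry", ["Phi-4 mini", "Qwen3 4B"]),
]


def tier_for(memory_gb: int) -> Optional[Tuple[str, List[str]]]:
    """Return (tier name, starred models) for a memory size, or None below 8 GB."""
    for minimum_gb, name, picks in TIERS:
        if memory_gb >= minimum_gb:
            return name, picks
    return None


for gb in (8, 18, 36, 64):
    print(gb, "GB ->", tier_for(gb))
```

Sizes that fall between the listed ranges (for example 32 GB) resolve to the tier whose minimum they meet, which is a simplifying assumption of this sketch.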

Whatever tier your Mac runs, the model lives on your machine. No cloud round-trip, no account, no telemetry. Your prompts, your documents, and your code never leave the device.