// Mac Device Check

What can your Mac actually run?

Most people know their Mac by name — MacBook Air, MacBook Pro — not by chip or memory. This page bridges the gap. Pick your device and memory; get a straight answer.

> The constraint on Mac is RAM, not chip generation. Here's what yours unlocks.


01 Which Mac?

Not sure which model? Check Apple menu → About This Mac

02 How much memory?

Find it at Apple menu → About This Mac — look for "Memory" or "Unified Memory"

On Mac, memory (RAM) is the key variable: two MacBook Pros from the same year can land in completely different tiers. Chip generation matters less than you'd think; thanks to unified memory, even older M1 chips use their memory efficiently for AI.

// Why Mac is different from iPhone

The Mac AI story nobody tells you

iPhone AI gets all the coverage. Mac can go significantly further — and for different reasons than you might expect.

// No jetsam

Apps don't get killed silently

iOS enforces a per-app memory budget and silently kills any app that exceeds it, which is exactly what happens when a model is too large. macOS has no such cap. When memory gets tight, it compresses memory and swaps to the SSD, so the model slows down but keeps running.

→ Slower, not dead

// Unified memory

CPU, GPU, and Neural Engine share one pool

Apple Silicon's unified memory means the GPU can work on model weights that were just used by the CPU, with no copy overhead. For LLM inference the bigger win is capacity: a model can use most of system memory instead of being capped by the VRAM on a discrete graphics card.

→ Memory works harder here

// SSD as fallback

Models larger than RAM can still load

When a model exceeds physical RAM, macOS pages layers to the SSD instead of refusing to load. A 7B model on 8GB RAM is slow (~5 tok/s) but functional. On iPhone, it crashes.

→ 8 GB Macs have more reach than you'd expect
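To see why a 7B–8B model spills past 8 GB while 4B models do not, a rough rule is that weight size ≈ parameter count × bits per weight ÷ 8. The sketch below assumes about 4.5 bits per weight for a typical 4-bit quantization and a hypothetical 4 GB reserve for macOS and other apps. Real usage is higher once the context (KV cache) and runtime overhead are added.

```python
# Rough sketch: estimate a quantized model's weight size and whether it is
# likely to fit in RAM without swapping.
# Assumptions (not from the page): weights dominate memory use, and a fixed
# 4 GB is reserved for macOS and other apps.

def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters x bits / 8."""
    return params_billions * bits_per_weight / 8


def fits_in_ram(params_billions: float, bits_per_weight: float,
                ram_gb: float, reserved_gb: float = 4.0) -> bool:
    """True if the weights fit in the RAM left over after the assumed reserve."""
    return weights_gb(params_billions, bits_per_weight) <= ram_gb - reserved_gb


if __name__ == "__main__":
    # 4.5 bits/weight is an assumed average for a common 4-bit quantization.
    for params in (4, 8, 14):
        size = weights_gb(params, 4.5)
        print(f"{params}B @ 4.5 bits: ~{size:.1f} GB, "
              f"fits in 8 GB without swap: {fits_in_ram(params, 4.5, 8)}")
```

Under these assumptions a 4B model (~2.3 GB) fits comfortably on an 8 GB Mac, while an 8B model (~4.5 GB) does not and falls back to swap.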

// Background inference

Generate while you work

On macOS, a model can generate in the background while you write in another window. On iPhone, switching apps suspends inference. Mac lets the model finish its thought.

→ Real multitasking

// Context depth

Much longer conversations

Without a per-app memory cap, you can push 32K–128K token context windows with models that support them, provided you have the RAM for the growing context. That means feeding a whole document, or a small-to-medium codebase, into a single prompt and getting coherent answers.

→ Whole-project reasoning
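Long context is not free: the model keeps a key/value (KV) cache that grows linearly with context length, on top of the weights. The sketch below uses the standard size formula (2 × layers × KV heads × head dimension × bytes per value × tokens) with Llama 3.1 8B's published architecture (32 layers, 8 KV heads, head dimension 128). An unquantized fp16 cache is an assumption; some runtimes can quantize the cache to shrink it.

```python
# Rough sketch: KV cache memory at a given context length.
# Formula: 2 (keys + values) x layers x KV heads x head dim x bytes x tokens.
# Defaults below assume an fp16 cache (2 bytes per value).

def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


if __name__ == "__main__":
    # Llama 3.1 8B: 32 layers, 8 KV heads, head dim 128.
    for ctx in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(ctx, layers=32, kv_heads=8, head_dim=128) / 2**30
        print(f"{ctx:>7} tokens: {gib:.0f} GiB of KV cache")
```

At 128 KiB per token, a full 128K-token window adds roughly 16 GiB to the ~4.7 GB of weights, which is why the longest contexts realistically call for a Pro or Max tier Mac even with an 8B model.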

// Network serve

Serve your iPhone and iPad too

With a local inference server (Ollama, or Mulberry IDE's network mode), your Mac can serve AI responses to your iPhone and iPad over Wi-Fi — one model, multiple devices, still private.

→ Your Mac as a local AI hub
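As a minimal sketch of the Ollama route, here is how another device could query the Mac over Wi-Fi. It assumes the Mac runs Ollama listening on the network (for example started with `OLLAMA_HOST=0.0.0.0 ollama serve`) on Ollama's default port 11434, and uses Ollama's `/api/generate` endpoint with streaming off. The IP address and model tag are placeholders, and Mulberry IDE's network mode is not covered here.

```python
# Sketch: query an Ollama server on your Mac from another device on the LAN.
# Assumes Ollama listens beyond localhost (OLLAMA_HOST=0.0.0.0) on port 11434.
import json
import urllib.request


def build_generate_request(host: str, model: str, prompt: str,
                           port: int = 11434) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"http://{host}:{port}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(host: str, model: str, prompt: str) -> str:
    """Send the request and return the generated text from the JSON reply."""
    req = build_generate_request(host, model, prompt)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Usage, with placeholder values: `generate("192.168.1.20", "llama3.1:8b", "Summarize this note.")`. The model runs on the Mac; the calling device only sends the prompt and receives text.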

// Find your RAM

Not sure how much memory you have?

Most people can't name their RAM without looking it up — that's normal. Apple puts it exactly one click away.

01 Click the Apple menu (the Apple logo) in the top-left corner of your screen
02 Select About This Mac
03 Look for "Memory" or "Unified Memory" — it will say something like 16 GB
04 Come back and select that number in Step 2 above
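If you prefer the command line, macOS reports installed memory in bytes via `sysctl -n hw.memsize` in Terminal. The small Python sketch below wraps that command and converts the result to the "GB" figure shown in About This Mac, which counts in units of 2^30 bytes. The `hw.memsize` key is macOS-specific and does not exist on Linux.

```python
# Sketch: read installed memory without opening About This Mac.
# `sysctl -n hw.memsize` prints total physical memory in bytes (macOS only).
import subprocess


def bytes_to_gb(n_bytes: int) -> int:
    """Convert bytes to the whole-number 'GB' Apple displays (2**30 bytes)."""
    return round(n_bytes / 2**30)


def installed_memory_gb() -> int:
    """Run sysctl and return installed memory, e.g. 16 on a 16 GB Mac."""
    out = subprocess.run(["sysctl", "-n", "hw.memsize"],
                         capture_output=True, text=True, check=True).stdout
    return bytes_to_gb(int(out.strip()))
```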

// The four Mac tiers

What your memory unlocks

Entry

8 GB Unified Memory

MacBook Air M1/M2/M3 · Mac mini M1/M2 · iMac M1/M3 (base)

The most misunderstood tier. 4B-class models (Phi-4 mini, Qwen3 4B, Gemma 3 4B) run at full speed with headroom to spare. 7B–8B models load and run too, via SSD swap at ~5 tok/s. That is useful for patient tasks, but not the right choice for interactive writing or coding.

Phi-4 mini · 2.2 GB ★
Qwen3 4B · 2.5 GB ★
Gemma 3 4B · 2.5 GB
Llama 3.2 3B · 2 GB
Llama 3.1 8B · 4.7 GB ⚠ slow via swap

Standard

16–18 GB Unified Memory

MacBook Air M1/M2/M3/M4 · MacBook Pro M1/M2 14" · Mac mini M2/M4 · iMac M3/M4

The everyday workhorse. Llama 3.1 8B is your default: it runs entirely in memory at full speed (~15–25 tok/s on M2 and later) and holds up over long sessions without slowdown. Gemma 3 12B adds a quality step for complex reasoning. 18 GB configs (M3 Pro) handle Qwen 2.5 14B cleanly.

Llama 3.1 8B · 4.7 GB ★
Gemma 3 12B · 7 GB ★
Qwen 2.5 7B · 4.5 GB
Mistral 7B · 4.5 GB
Qwen 2.5 14B · 8 GB ⚠ tight on 16 GB
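Where does a figure like ~15–25 tok/s come from? Generating each token reads roughly every weight once, so decode speed is usually capped by memory bandwidth: tokens/s ≤ bandwidth ÷ model size. The sketch below applies that back-of-envelope bound to the 4.7 GB Llama 3.1 8B. The bandwidth numbers are Apple's published base-chip specs as we understand them and should be treated as assumptions; real throughput lands below this ceiling.

```python
# Back-of-envelope upper bound on generation speed:
# each token reads ~all weights once, so tok/s <= bandwidth / model size.

def max_tokens_per_second(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    model_gb = 4.7  # Llama 3.1 8B, as listed in the Standard tier
    # Assumed base-chip memory bandwidths in GB/s; check your own chip's spec.
    for chip, bw in {"M1": 68.25, "M2": 100.0, "M4": 120.0}.items():
        print(f"{chip}: <= {max_tokens_per_second(bw, model_gb):.0f} tok/s")
```

The same bound explains why the Pro and Max chips, with much higher bandwidth, keep larger models interactive.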

Pro

24–36 GB Unified Memory

MacBook Air M2/M3/M4 (max) · MacBook Pro M3 Pro / M4 Pro · Mac mini M4 Pro

Serious AI hardware. Qwen 2.5 32B and Gemma 3 27B run cleanly; this is where local AI starts competing with cloud API quality. 36 GB configs (such as the M3 Pro) can also push Llama 3.1 70B at Q2 quantization.

Qwen 2.5 32B · ~20 GB ★
Gemma 3 27B · ~15 GB ★
Qwen 2.5 14B · 8 GB
Llama 3.1 70B Q2 · ~30 GB ⚠ 36 GB only

Max

48 GB+ Unified Memory

MacBook Pro M3 Max / M4 Max · Mac Studio M2 Max / M2 Ultra / M4 Max / M3 Ultra · Mac Pro

No practical ceiling. Llama 3.1 70B Q4 runs at speed. Multiple large models simultaneously. Serve inference to other devices on your network. At 192 GB+ (Mac Studio Ultra, Mac Pro), 405B-class models become possible.

Llama 3.1 70B Q4 · ~40 GB ★
Qwen 72B · ~42 GB ★
Mistral Large · ~40 GB
Multiple simultaneous models
405B Q4 · ~200 GB ⚠ 192 GB+ only
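The four tiers boil down to a simple lookup on unified memory. The sketch below mirrors that mapping; the thresholds come from the tier table, while bucketing sizes that fall between listed ranges into the lower tier is our own assumption, not something the page specifies.

```python
# Sketch of the tier lookup: map unified memory (GB) to a tier name.
# Each tier starts at its lowest listed size; in-between sizes round down
# to the lower tier (an assumption).

TIERS = [  # (minimum GB, tier name), checked from largest to smallest
    (48, "Max"),
    (24, "Pro"),
    (16, "Standard"),
    (0, "Entry"),
]


def tier_for_memory(gb: int) -> str:
    for minimum, name in TIERS:
        if gb >= minimum:
            return name
    raise ValueError("memory must be non-negative")


if __name__ == "__main__":
    for gb in (8, 16, 18, 24, 36, 64, 192):
        print(f"{gb} GB -> {tier_for_memory(gb)}")
```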


Whatever tier your Mac runs, the model lives on your machine. No cloud round-trip, no account, no telemetry. Your prompts, your documents, and your code never leave the device.
