Local AI Is the Holy Grail. So Where Is It?

Imagine Claude 4.7 running on your Mac mini with no login, no subscription, no usage caps. Just there, all the time, yours. That is the dream. We’re not quite there yet. But we are closer than many people realize, and the gap is shrinking faster than anyone expected.

Author: Arrow & Bell
Published: May 5, 2026

What is local AI?

Local AI is a setup where the language model itself runs on hardware you own. Your laptop, your phone, a server in the closet. The thinking happens in your house, not in a data center somewhere in Utah.

That is the core distinction. When you talk to ChatGPT or Claude, your prompt travels to OpenAI or Anthropic, gets processed on a rack of GPUs that costs more than your house, and the answer travels back. When you talk to a local model, none of that happens. The brain is on your desk, or in your pocket.

This matters for three reasons. Privacy: your data never leaves the machine. Cost: no per-token bill, no monthly subscription, no rate limits. Independence: your AI works when the internet is down, when an API gets deprecated, when a company decides to change its terms of service. You own the thing.

What does Clawdbot have to do with this?

Clawdbot (now officially called OpenClaw) is an open-source personal AI assistant that has gotten a lot of attention lately. People love it because you run it yourself, on your own machine, and it can actually do things: send emails, manage your calendar, control your browser, run scripts, all from inside WhatsApp or Telegram or Discord.

Here is the thing people get confused about. Clawdbot is not an AI. It is a body without a brain.

Think of it as a robot butler shell. It has hands (sends messages, controls your browser, reads emails, runs scripts). It has ears (listens on WhatsApp, Telegram, Discord, Signal). It has a memory drawer (remembers your preferences). It has a to-do list (cron jobs, reminders, scheduled tasks). What it does not have is the ability to actually think. For that, it needs a brain plugged into it.

The brain is the language model. And you pick which one to plug in.

Plug in a cloud brain like Claude or GPT, and Clawdbot calls Anthropic or OpenAI every time it needs to think. Smart, but you pay per use and your data takes a trip to Utah and back. Plug in a local brain like Llama or Qwen running through Ollama or Docker Model Runner, and the brain sits on your machine: free after setup, fully private, and a lot dumber than the frontier cloud models. You can also mix the two. Cheap local model for simple stuff, Claude for the hard stuff.
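
Here is a rough sketch of what "mix the two" can look like in practice. It is not how OpenClaw does its routing; the keyword list, the length cutoff, and the function names are all made up for illustration. The only real-world pieces are Ollama's default local address and its `/api/generate` endpoint, and the sketch assumes you have already pulled a small model under the tag `llama3.2:1b`.

```python
import json
import urllib.request

# Hypothetical settings for illustration only; none of these are OpenClaw options.
HARD_TASK_HINTS = ("contract", "refactor", "debug", "analyze", "legal")
MAX_LOCAL_PROMPT_CHARS = 400
LOCAL_MODEL = "llama3.2:1b"  # assumes this tag has been pulled into Ollama
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def pick_brain(prompt: str) -> str:
    """Return "local" for short, simple prompts and "cloud" for everything else."""
    if len(prompt) > MAX_LOCAL_PROMPT_CHARS:
        return "cloud"
    text = prompt.lower()
    if any(hint in text for hint in HARD_TASK_HINTS):
        return "cloud"
    return "local"


def build_local_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming request for a local Ollama server."""
    body = json.dumps({"model": LOCAL_MODEL, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(prompt: str) -> str:
    """Send the prompt to the local model; needs Ollama running on this machine."""
    with urllib.request.urlopen(build_local_request(prompt), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

A real router would use something smarter than keyword matching, but the shape is the same: decide first, then send the prompt to whichever brain fits. The cloud branch is left out on purpose, since it depends on which provider you use.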

This is why Clawdbot keeps getting called “local AI” when it isn’t quite. The body is local. Whether the brain is local depends entirely on what you plug into it. And that choice is where the real local-AI question lives.

Why doesn’t everyone just run AI locally?

Because the best models in the world right now, Claude Opus 4.7 and GPT-5.5, are not models you can run on a Mac mini. They are not models you can run on a server rack in your basement either. They are massive: hundreds of billions of parameters, distributed across clusters of specialized hardware, drawing the kind of power your house is not zoned for.

You cannot just “download” GPT-5.5. It is not a file. It is a system. The version you can talk to through ChatGPT is the public-facing endpoint of an enormous, expensive, ongoing operation that probably costs OpenAI more per query than they charge you for it.

So when people ask “can I run AI locally instead of paying $20 a month for ChatGPT,” the honest answer is: yes, you can run AI locally. You just cannot run THAT AI locally. Not yet.

What can you actually run on a Raspberry Pi?

A Raspberry Pi 5 with 8GB of RAM can run a small local model. Llama 3.2 1B, Qwen 2.5 1.5B, Gemma 3 1B. Pair it with whisper.cpp for voice-to-text and you have a fully self-contained AI device for about $130 in parts.

Here is the catch: at this size, the AI is not very smart. We mean that with affection. It can answer simple questions. It can have a basic conversation. It can be told to act like a particular character and mostly stick with it.

What it cannot do is reason. It cannot do math beyond basic arithmetic. It will get confused by anything more than a few turns deep. And the latency, the time between you finishing your sentence and the model starting to respond, is somewhere between four and eight seconds.

To put it in perspective: if you are building a kid’s computer where pressing a button lets a child ask a question, what you are realistically delivering is a friendly little device that, after a thoughtful pause of eight seconds, might moo like a cow. For a four-year-old, that is delightful. For a small business trying to automate customer support, it is a disaster.

The Pi version of local AI is great for kid stuff and voice toys, home automation triggers (“turn off the lights when I say good night”), single-purpose devices with very narrow scope, and tinkering. It is not great for any task you would actually use ChatGPT for.

What if I have a serious computer?

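Now we are getting somewhere. A Mac Studio with an M3 Ultra and 96GB or more of unified memory, or a desktop PC with an RTX 4090, can run much larger models. Llama 3.3 70B, Qwen 2.5 72B, the 70B distilled version of DeepSeek R1. One caveat on the PC side: the RTX 4090 has 24GB of VRAM, so a 70B model has to be heavily quantized and partly offloaded to system RAM, which slows it down. These are real models, the kind that hold up in actual work.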
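
Why does memory matter so much? A back-of-the-envelope estimate helps. The sketch below multiplies parameter count by bits per weight to get the size of the weights alone. The function names are ours, and the 1.2 overhead factor for context cache and runtime buffers is an assumed rule of thumb, not a measured number.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9


def fits_in_memory(params_billions: float, bits_per_weight: float,
                   memory_gb: float, overhead: float = 1.2) -> bool:
    """True if weights plus a rough overhead allowance fit in memory_gb.

    The default overhead factor is an assumption for illustration.
    """
    return weights_gb(params_billions, bits_per_weight) * overhead <= memory_gb


if __name__ == "__main__":
    for name, params, bits in [("1B at 4-bit", 1, 4),
                               ("70B at 4-bit", 70, 4),
                               ("70B at 16-bit", 70, 16)]:
        print(f"{name}: about {weights_gb(params, bits):.1f} GB of weights")
```

By this estimate a 70B model quantized to 4 bits needs about 35GB for its weights, which is more than the 24GB of VRAM on an RTX 4090 and far more than a Raspberry Pi has. A 1B model at the same precision needs about half a gigabyte, which is why it fits on the Pi.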

Performance-wise, you are looking at something roughly equivalent to GPT-4 from a couple of years ago. Not bleeding-edge frontier, but capable enough to write code, summarize documents, draft emails, and have substantive conversations. Tokens per second on a Mac Studio M3 Ultra running a quantized 70B model land around 15 to 25, which feels close to real-time.
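
To turn tokens per second into something you can feel, divide the length of the answer by the generation rate. This is a minimal sketch; the 300-token answer length used in the note below is an assumed example, and the optional first-token delay is a placeholder you would measure on your own machine.

```python
def response_seconds(output_tokens: int, tokens_per_sec: float,
                     first_token_delay: float = 0.0) -> float:
    """Seconds until the full answer has been generated."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return first_token_delay + output_tokens / tokens_per_sec
```

At 15 tokens per second a 300-token answer takes 20 seconds to finish; at 25 it takes 12. Because the text streams as it is generated, you start reading long before the last token arrives.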

The price tag for this kind of setup is real: $5,000 to $10,000 depending on configuration. But once you have it, you can run that model forever. No tokens, no subscriptions, no rate limits, no internet dependency. For a small business that runs a lot of AI workload, the math starts to make sense compared to the daily cost of running an AI agent on the cloud.
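
The "math" here is a simple break-even calculation: hardware cost divided by what you save each month. The sketch below is ours, not a pricing model. The monthly cloud bill and the electricity cost are inputs you would have to supply, and the $500 figure in the note that follows is a hypothetical example.

```python
import math


def breakeven_months(hardware_cost: float, monthly_cloud_cost: float,
                     monthly_power_cost: float = 0.0) -> float:
    """Months until local hardware has paid for itself versus the cloud.

    Returns infinity when running locally saves nothing each month.
    """
    monthly_savings = monthly_cloud_cost - monthly_power_cost
    if monthly_savings <= 0:
        return math.inf
    return hardware_cost / monthly_savings
```

Replacing a single $20 subscription with a $5,000 machine takes 250 months to pay off, which is to say it never does. A business spending a hypothetical $500 a month on API calls pays off a $10,000 machine in 20 months. The workload, not the hardware, decides whether local makes financial sense.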

This is the level where local AI stops being a toy and starts being a tool. It is still a compromise compared to Claude 4.7 or GPT-5.5, but the gap has narrowed from “comically inferior” to “noticeably less polished but workable for most tasks.”

How do the local AI tiers compare today?

Tier | Hardware | Models | Speed | What it can do | Cost
--- | --- | --- | --- | --- | ---
Pi-class | Raspberry Pi 5, 8GB | Llama 3.2 1B, Qwen 2.5 1.5B | 5 to 15 tok/sec | Voice toys, simple Q&A, kid devices, narrow automation | ~$130 in parts
Prosumer | Mac Studio M3 Ultra, RTX 4090 desktop | Llama 3.3 70B, Qwen 2.5 72B | 15 to 25 tok/sec | Code, drafts, summaries, real knowledge work | $5,000 to $10,000
Frontier (cloud only) | Data center GPU clusters | Claude Opus 4.7, GPT-5.5 | 50+ tok/sec, streaming | Anything frontier AI can do | $20/mo subscription or per-token API

The frontier row is included for context. You cannot run those models locally on any consumer hardware that exists today.

When will local AI catch up to the cloud?

The honest answer is it never quite will, because the frontier keeps moving. Whatever runs on your hardware in 2030 will be impressive, but the cloud version of the same year will be more impressive. That is the nature of the curve.

But “good enough” is a different question. Here is what we think happens.

Within two to three years, a high-end consumer machine will run something equivalent to today’s Claude 4.7 or GPT-5.5. That alone is enough to change how a lot of work gets done. A small business could run an always-on AI agent on a single mid-range server, processing hundreds of tasks a day in private, with no API bills.

Within five years, a phone will run something equivalent to today’s frontier. That changes everything. An AI assistant that lives in your pocket, knows your patterns, has access to your full context, and never sends a byte to anyone. Truly personal AI.

Picture what your morning looks like in that world. You wake up and your phone has already drafted responses to overnight emails, queued up the calendar shifts you will probably want, summarized the news that actually matters to you, and prepped a grocery list based on what is left in the fridge. Not because some cloud company knows all that about you, but because the model on your device does, and that data never went anywhere.

The infrastructure is already being built. Apple’s Neural Engine, Qualcomm’s NPUs, the new generation of efficient models like Phi and Gemma that are designed to run well on modest hardware rather than to top benchmark charts. The pieces are coming together. It is not a question of if, just when.

What does this mean for me right now?

If you are running a small business and you want to use AI today, use the cloud. That is what works, that is what is cost-effective, that is what gets the job done. Pay the $20 a month or the API fees. The tokens are cheap (see our breakdown of AI token economics), the models are excellent, and the productivity gains are real.

If you are building something where privacy is non-negotiable, where data cannot leave your premises, where a regulator would have a problem with cloud APIs: a local model on a serious machine is now viable for many use cases. Not for everything. But more than people think. The compliance picture lines up with what we covered in The Honest State of AI Agents in April 2026.

If you are tinkering, building a kid’s device, experimenting with personal AI agents, exploring what comes next: a Raspberry Pi running a small model is a fantastic playground. It is also the most direct way to actually understand how this stuff works, because when the model is sitting on your desk, all the abstractions disappear.

And if you are watching the trajectory and trying to figure out where to bet: pay attention to the local models. Llama 3.3 was a big step. Qwen 2.5 was another. Whatever ships six months from now will be better. The cloud will always be ahead, but the floor keeps rising, and at some point the floor is high enough that the cloud is no longer necessary for most things.

That is the future Clawdbot is gesturing at. Not the future where the agent is local, which is already here. The future where the brain is local too. We are not there yet. But it is coming.
