The Race to Build a Personal AI Agent (And Why Nobody Has Won Yet)
Everyone wants to build an AI chief of staff. Here's my honest take on the pros and cons of OpenClaw, Hermes, Claude Code, Codex, and Gemini
Dear subscribers,
Today, I want to talk about the race to build a personal AI agent.
Everyone wants an AI chief of staff that can triage emails, book meetings, edit documents, and take care of work that we don’t want to do.
I've spent the last few months testing OpenClaw, Hermes, Claude Code, Codex, and Gemini as personal agents. In this post, I'll share:
10 capabilities a great personal agent needs
An honest take on where each product stands today
The personal agent stack that I use right now
10 capabilities that a personal agent needs
I think a great personal agent needs to be able to:
Manage email, calendar, docs, sheets, and slides
Integrate with any API, MCP, or CLI
Run recurring and triggered tasks
Remember things about you
Work across web and mobile with no friction
Switch between text and voice replies
Have a personality that’s fun to talk to
Use your computer and browser
Stay reliable
Keep your data safe
The reality is that no product checks all 10 boxes today. Here’s how each stacks up:
OpenClaw: The most flexible personal agent, but reliability is a real problem
As the original personal agent, OpenClaw still gets a lot right:
Lives inside your messaging apps. Talking to my OpenClaw in Telegram just feels more personal than talking to another agent in a separate app or terminal.
Easily switches between text and voice. I primarily talk to my OpenClaw using voice replies, which is convenient and fast.
Incredibly flexible. You can customize OpenClaw’s memory, capabilities, and integrations more deeply than any other option in this post.
But OpenClaw’s critical flaw is reliability. I estimate that 10% of my time with OpenClaw is spent fixing it instead of using it. Examples:
It forgot it had access to edit Google Docs.
It randomly started using a robot voice instead of the one I like.
It breaks after roughly half of its updates.
Having to use Claude Code or Codex to fix my OpenClaw doesn’t feel great.
My verdict: OpenClaw is still the most powerful and flexible personal agent. But the maintenance tax is real and reliability is a major concern.
Hermes: I was skeptical at first, but it breaks less often than OpenClaw
I’ll be honest, I avoided Hermes for a long time. Too many people were shilling it on X, and the luxury handbag name felt off. What changed my mind was watching AI builders I trust quietly migrate over from OpenClaw.
I’ve been testing Hermes for the past week. Early impressions:
It’s more reliable than OpenClaw. Hermes works more consistently. It even fixed some of the cron jobs that OpenClaw on GPT 5.5 broke.
It communicates more about what it’s doing. It tells me in Telegram when it completes a cron job and how long a task is taking.
It tends to work more independently. When it spots a repetitive workflow, it turns it into a reusable skill automatically. This can be a pro or a con depending on how much control you want to give up.
OpenClaw still feels more “alive” thanks to its transparent heartbeat and memory system, but I’d take reliability over those features any day.
My verdict: If OpenClaw’s maintenance tax is wearing you down, give Hermes a try. A week in, it’s been more reliable for me.
Claude Code: The best personality, but availability and rate limits are a concern
Over the past few months, Anthropic has shipped many features to replicate OpenClaw’s functionality, including routines, remote control, and chat channels. As a personal agent, Claude Code has real strengths:
Best model personality. Opus feels the most like talking to a trusted friend who will support and challenge you.
Strong agentic capabilities. Opus is also, along with GPT 5.5, the best model at just figuring things out, no matter how wild the request.
A rabbit hole in the best way. Using Claude Code feels like playing a video game where you’re always discovering new shortcuts.
But Claude Code also has major reliability and usability issues:
98% uptime isn’t great. It feels like Claude is randomly unavailable every other week. Key features like routines also break silently without sending any alerts.
Mobile integration is manual. You have to type /remote-control to continue your chats on mobile, and it sometimes disconnects after a while.
Rate limits are stricter. You can’t get as much usage out of the equivalent Claude plans as you can from Codex (the next option).
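For a sense of scale, here's a quick back-of-the-envelope sketch of what a 98% uptime figure works out to in hours. It assumes downtime is spread evenly over the period, which real outages aren't, and the function and constant names are just illustrative:

```python
def downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Expected hours of downtime in a period at a given uptime percentage.

    Assumption: downtime is spread evenly across the period.
    """
    return period_hours * (1 - uptime_percent / 100)


HOURS_PER_WEEK = 7 * 24
HOURS_PER_YEAR = 365 * 24

# 98% is the uptime figure quoted in the bullet above.
weekly_hours = downtime_hours(98.0, HOURS_PER_WEEK)
yearly_days = downtime_hours(98.0, HOURS_PER_YEAR) / 24

print(f"98% uptime: {weekly_hours:.2f} hours down per week")
print(f"98% uptime: {yearly_days:.1f} days down per year")
```

That works out to roughly 3.4 hours a week, or a little over 7 days a year, when the agent simply isn't there.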
My verdict: Claude has a great personality and strong agentic capabilities. But Anthropic needs to prioritize fixing reliability and scaling compute.
Codex: An incredible desktop app and generous limits, but no mobile yet
As a long-time Claude subscriber, I’m genuinely impressed by how much progress the Codex team has made. They’ve shipped:
A beautiful desktop app. The app is simple and intuitive to use. You can hook it up to other popular apps with a few simple clicks instead of having to remember slash commands.
GPT 5.5 with generous rate limits. GPT 5.5 is a great model and you can also get more usage out of the Codex plan than the equivalent Claude plan.
Best-in-class browser and computer use. GPT’s strengths here help your personal agent complete workflows in products that don’t have great API support.
Codex has one big gap: you can’t talk to it on mobile yet. 80% of my conversations with my personal agents happen on mobile, so this is a deal breaker.
My verdict: Codex has won me over for coding. It has the potential to be a great personal agent once the mobile app ships (probably very soon).
Gemini: This is Google’s race to lose
Gemini is best positioned to be the go-to personal agent for the masses, but it just doesn’t have all the capabilities yet. What Gemini does well:
Native Workspace access. The Gemini app reads emails, schedules meetings, and creates docs by default. But it can’t edit docs yet (more on this below).
Best live voice and video conversations. Gemini Live lets you talk back and forth and even supports camera and screen sharing.
But the Gemini app still can’t edit Google Docs, Sheets, or Slides. All of these products are owned by Google, so it’s not great that Gemini is the only AI app that doesn’t support this. Codex and Claude Code do it natively with a few simple connections.
Overall, Google seems to be adding chat windows to all of its individual products instead of making the Gemini app the most capable personal agent.
My verdict: Google has all the data and products to win this race. But basic capabilities are still missing, and time is running out.
My personal agent stack right now
As of this writing, I use:
Hermes for everyday tasks. Emails, calendar, Google Doc edits, and honestly, therapy through voice replies. I switched over to Hermes because it’s more reliable than OpenClaw.
Codex and Claude Code for building things. When I’m coding, writing, or producing real work, I live in these apps. If I had to pick one, it’d be Codex due to the more generous rate limits (but you know how fast these things change).
The important thing is to not get stuck like me, migrating from one personal agent to another instead of doing real work. Pick one or two agents that work for you based on the pros and cons above and just commit.
One thing I can promise you:
Once you have an agent that’s available 24/7 and can actually get work done for you, you’ll never go back to a regular AI chat interface again.

I’d probably frame one part slightly differently: I’m not sure “AI chief of staff” is the end state people actually want.
Most users don’t want a highly autonomous operator making decisions. They want a low-friction cognitive extension that reduces coordination and context-switching without feeling invasive or unpredictable.
That’s why reliability matters so much more than demos right now. The trust threshold for delegation is dramatically higher than the trust threshold for chat.
Peter, I’m pretty stoked to say that my “Claude-Claw” setup (entirely Claude Code) is ticking each of your 10 boxes.
Reliable as in, it hasn’t ghosted me or frozen since setup.
Voice / text replies via ElevenLabs / Telegram: any message below 200 characters is a voice note.