I've actually built this and have added a lot more to it.
It listens to my Fathom transcripts, schedules follow-ups, replies to emails I don't need to answer myself (though I approve everything in Slack), creates monthly reporting decks that curate all of my business-critical metrics, creates briefs / drafts / pushes to the website, creates training pairs for editorial feedback, tracks URLs in my SEO tools and in Google Analytics, updates my workflow tools, updates statuses, keeps a daily log of activities...
That only scratches the surface.
I’d probably frame one part slightly differently: I’m not sure “AI chief of staff” is the end state people actually want.
Most users don’t want a highly autonomous operator making decisions. They want a low-friction cognitive extension that reduces coordination and context-switching without feeling invasive or unpredictable.
That’s why reliability matters so much more than demos right now. The trust threshold for delegation is dramatically higher than the trust threshold for chat.
The personal agent problem is mostly unsolved at the context layer, not the model layer. A model can reason and act. What it can’t do reliably is know enough about you to take the right action. That’s the hard part.
The race has been on in this space ever since Claude Code and OpenClaw launched. I disagree with the “it’s Google’s race to lose” framing. Google has a reputation that others don’t quite live up to and, frankly, can’t. I think they’re already learning from all these use cases and waiting for the opportune moment to drop a big release that will shift things heavily in Google’s favour.
I’ve been thinking of trying Hermes but the potential bot upvoting on X and Reddit kept me skeptical. You’ve inspired me to try it! Have you migrated from OC to Hermes completely or running in parallel?
I've migrated to Hermes completely. Honestly I was a little skeptical too (the profile is an anime girl after all) but it just works more reliably than OpenClaw right now. I'm actually working on an opinionated Hermes guide here that'll hopefully go live by mid-June: https://www.behindthecraft.com/
The personal AI agent race will not be won by whoever builds the best chatbot with more tools.
It will be won by whoever solves the operating layer:
memory,
permissions,
workflow state,
context routing,
verification,
rollback,
and user trust.
Most “AI chief of staff” demos look impressive because they operate in clean toy environments.
Real work is messy.
Emails contradict calendars.
Docs have hidden context.
Tasks depend on judgment.
People change priorities.
Systems fail silently.
And one bad autonomous action can destroy trust instantly.
The real product is not an agent.
It is a personal operating system where AI can act without becoming dangerous, annoying, or wrong at scale.
Nobody has won yet because autonomy is easy to demo and hard to trust.
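To make a few of those operating-layer pieces concrete, here is a minimal sketch of an approval gate with rollback. It is only an illustration: the names Action, Operator, approve, and rollback are hypothetical and do not come from any of the products discussed in this thread.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One side effect the agent wants to perform (hypothetical shape)."""
    name: str
    run: Callable[[], None]    # performs the side effect
    undo: Callable[[], None]   # reverses it
    needs_approval: bool = True


@dataclass
class Operator:
    """Runs actions only after approval and remembers them for rollback."""
    approve: Callable[[Action], bool]  # e.g. a human clicking "yes" in chat
    history: List[Action] = field(default_factory=list)

    def execute(self, action: Action) -> bool:
        # Permission check: gated actions need an explicit yes.
        if action.needs_approval and not self.approve(action):
            return False
        action.run()
        self.history.append(action)
        return True

    def rollback(self) -> None:
        # Undo completed actions in reverse order.
        while self.history:
            self.history.pop().undo()


# Usage: an in-memory list stands in for a real calendar.
calendar: List[str] = []
op = Operator(approve=lambda a: a.name != "delete_everything")
op.execute(Action("add_followup",
                  run=lambda: calendar.append("followup"),
                  undo=lambda: calendar.remove("followup")))
```

The point of the shape is that every action carries its own undo, so a bad autonomous step can be reversed instead of quietly eroding trust.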
Could not agree more with this perspective. This is the current move all big AI players are pivoting towards.
A few comments from the field:
1. Gemini is actually great with emails, calendar, chats etc. Of course, it is limited to Google Workspace. At this point their integration of Gemini within Workspace + the addition of Gems makes it a great competitor in this race.
2. Claude Cowork. The first product designed to work well on the 10 points you chose. A fork of Claude Code, especially designed for business.
3. Copilot Cowork from Microsoft. Microsoft, within their Copilot stack (Copilot M365, Copilot Studio and Agents365), is the closest to checking all those boxes, even for enterprise customers and highly regulated industries.
Peter, I’m pretty stoked to say that my “Claude-Claw” setup (entirely Claude Code) is ticking each of your 10 boxes.
Reliable as in, it hasn’t ghosted me or frozen since setup.
Voice / text replies via ElevenLabs / Telegram: any message below 200 characters is a voice note.
His name’s Watney. Based on Mark Watney’s personality from The Martian: a fixer with impeccable optimism and humor.
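For anyone wanting to copy the length-based rule above, a minimal sketch of the routing decision might look like this. The function name reply_channel and the constant VOICE_MAX_CHARS are assumptions made for illustration, and the actual ElevenLabs and Telegram calls are deliberately left out.

```python
VOICE_MAX_CHARS = 200  # threshold taken from the comment above


def reply_channel(message: str, max_chars: int = VOICE_MAX_CHARS) -> str:
    """Return "voice" for replies shorter than max_chars, otherwise "text"."""
    return "voice" if len(message) < max_chars else "text"


# Usage: decide the channel first, then hand off to whatever sends the reply.
channel = reply_channel("On it. Your 3pm moved to Thursday.")
```

Keeping the decision in one small function makes the threshold easy to tune without touching the sending code.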
Great framing: personal AI agents will only win when they understand context, trust, and daily workflows better than a normal chatbot.
Are you also using Hermes to orchestrate software development in Claude Code and potentially other setups like Qwen (local compute)?
Such a good overview! Thanks for sharing this 💫
It seems to me that prompt injection is still a huge blocker for productizing these tools for everyday use. Do you think that will be solved or mitigated any time soon?
https://substack.com/@csarticles/note/p-196851209?r=8co1m9
Therapy through voice replies is interesting… why that over text replies? (given your voice input)
Voice just sounds more personal and is easier to use while out on a walk
Have you defined a personality or soul.md for your Hermes agent?
No, I just copied my OpenClaw soul.md over. It seems to work the same.
Hey Peter! Great post, thanks for sharing. I always love to hear your thoughts on this, have been following you since the first OpenClaw outburst and really appreciate this comparative rundown.
Quite curious to hear your thoughts about Notion AI. I have been using it for a couple of months now, alongside Claude, and I really think it is the underappreciated racer in this battle. Did you get the opportunity to test it already?
Cheers from Brazil!
I have something good cooking with Notion, will share more soon!
ALREADY ANXIOUS 🫠
I’d say Hermes has won