I've actually built this and have added a lot more to it.
It listens to my Fathom transcripts, schedules follow-ups, replies to emails I don't need to answer myself (though I approve everything in Slack), creates monthly reporting decks that curate all of my business-critical metrics, creates briefs / drafts / pushes to the website, creates training pairs for editorial feedback, tracks URLs in my SEO tools and in Google Analytics, updates my workflow tools, updates statuses, keeps a daily log of activities...
That only scratches the surface.
I’d probably frame one part slightly differently: I’m not sure “AI chief of staff” is the end state people actually want.
Most users don’t want a highly autonomous operator making decisions. They want a low-friction cognitive extension that reduces coordination and context-switching without feeling invasive or unpredictable.
That’s why reliability matters so much more than demos right now. The trust threshold for delegation is dramatically higher than the trust threshold for chat.
I’ve been thinking of trying Hermes but the potential bot upvoting on X and Reddit kept me skeptical. You’ve inspired me to try it! Have you migrated from OC to Hermes completely or running in parallel?
I've migrated to Hermes completely. Honestly, I was a little skeptical too (the profile is an anime girl after all), but it just works more reliably than OpenClaw right now. I'm actually working on an opinionated Hermes guide here that'll hopefully go live by mid-June: https://www.behindthecraft.com/
The personal AI agent race will not be won by whoever builds the best chatbot with more tools.
It will be won by whoever solves the operating layer:
memory,
permissions,
workflow state,
context routing,
verification,
rollback,
and user trust.
Most “AI chief of staff” demos look impressive because they operate in clean toy environments.
Real work is messy.
Emails contradict calendars.
Docs have hidden context.
Tasks depend on judgment.
People change priorities.
Systems fail silently.
And one bad autonomous action can destroy trust instantly.
The real product is not an agent.
It is a personal operating system where AI can act without becoming dangerous, annoying, or wrong at scale.
Nobody has won yet because autonomy is easy to demo and hard to trust.
Could not agree more with this perspective. This is the current move all big AI players are pivoting towards.
A few comments from the field:
1. Gemini is actually great with emails, calendar, chats, etc. Of course, it is limited to Google Workspace. At this point, their integration of Gemini within Workspace plus the addition of Gems makes it a strong competitor in this race.
2. Claude Cowork. The first product that, by design, works toward the 10 points you chose. A fork of Claude Code, designed especially for business.
3. Copilot Cowork from Microsoft. Microsoft, with its Copilot stack (Copilot M365, Copilot Studio and Agents365), is the closest to checking all those boxes, even for enterprise customers and highly regulated industries.
Peter, I’m pretty stoked to say that my “Claude-Claw” setup (entirely Claude Code) is ticking each of your 10 boxes.
Reliable as in, it hasn’t ghosted me or frozen since set-up.
Voice / text replies via ElevenLabs / Telegram: any message below 200 characters is a voice note.
His name’s Watney. Based on Mark Watney’s personality from The Martian: a fixer with impeccable optimism and humor.
Great framing, personal AI agents will only win when they understand context, trust, and daily workflows better than a normal chatbot.
Are you also using Hermes to orchestrate software development in Claude Code and potentially other setups like Qwen (local compute)?
Such a good overview! Thanks for sharing this 💫
It seems to me that prompt injection is still a huge blocker for productizing these tools for everyday use. Do you think that will be solved or mitigated any time soon?
https://substack.com/@csarticles/note/p-196851209?r=8co1m9
Therapy through voice replies is interesting… why that over text replies? (given your voice input)
Voice just sounds more personal and is easier to use while out on a walk
Have you defined a personality or soul.md for your Hermes agent?
No I just copied my OpenClaw soul.md over. It seems to work the same.
Hey Peter! Great post, thanks for sharing. I always love to hear your thoughts on this. I have been following you since the first OpenClaw outburst and really appreciate this comparative rundown.
Quite curious to hear your thoughts about Notion AI. I have been using it for a couple of months now, alongside Claude, and I really think it is the underappreciated racer in this battle. Have you had the opportunity to test it yet?
Cheers from Brazil!
I have something good cooking with Notion, will share more soon!
ALREADY ANXIOUS 🫠
I’d say Hermes has won
Google is terrible at most things it tries beyond basic search. Every good idea they come up with is eventually murdered by their own inability to implement well. Remember Google+? Great idea. Dead.
I think they'll pull through eventually, it just takes a while.
Never betting against Demis.