I’d probably frame one part slightly differently: I’m not sure “AI chief of staff” is the end state people actually want.
Most users don’t want a highly autonomous operator making decisions. They want a low-friction cognitive extension that reduces coordination and context-switching without feeling invasive or unpredictable.
That’s why reliability matters so much more than demos right now. The trust threshold for delegation is dramatically higher than the trust threshold for chat.
Peter, I’m pretty stoked to say that my “Claude-Claw” setup (entirely Claude Code) is ticking each of your 10 boxes.
Reliable as in, hasn't ghosted me or frozen since set-up.
Voice / text replies via ElevenLabs / Telegram: any message below 200 characters is a voice note.
His name's Watney. Based on Mark Watney’s personality from The Martian: a fixer with impeccable optimism and humor.
Great framing. Personal AI agents will only win when they understand context, trust, and daily workflows better than a normal chatbot.
Are you also using Hermes to orchestrate software development in Claude Code and potentially other setups like Qwen (local compute)?
Such a good overview! Thanks for sharing this 💫
It seems to me that prompt injection is still a huge blocker for productizing these tools for everyday use. Do you think that will be solved or mitigated any time soon?
https://substack.com/@csarticles/note/p-196851209?r=8co1m9
Therapy through voice replies is interesting… why that over text replies? (given your voice input)
Voice just sounds more personal and is easier to use while out on a walk
Have you defined a personality or soul.md for your Hermes agent?
No, I just copied my OpenClaw soul.md over. It seems to work the same.
Hey Peter! Great post, thanks for sharing. I always love to hear your thoughts on this. I have been following you since the first OpenClaw outburst and really appreciate this comparative rundown.
Quite curious to hear your thoughts about Notion AI. I have been using it for a couple of months now, alongside Claude, and I really think it is the underappreciated contender in this battle. Have you had the opportunity to test it yet?
Cheers from Brazil!
I have something good cooking with Notion, will share more soon!
ALREADY ANXIOUS 🫠
I’d say Hermes has won
Google is terrible at most things it tries beyond basic search. Every good idea they come up with is eventually murdered by their own inability to implement well. Remember Google+? Great idea. Dead.
I think they'll pull through eventually, it just takes a while.
Never betting against Demis.