Is This the First AI Analyst That Actually Works? | James Evans (Amplitude)
Real talk about where AI is useful for analytics and where it falls short. Plus a live demo of AI agents that can monitor your product 24/7 and auto-create experiments.
Dear subscribers,
Today, I want to share a new episode with James Evans.
James is the Head of AI for Amplitude, the leading product analytics platform. We had a great chat about why nobody has built a good AI analyst yet and why he’s betting on AI agents that can monitor your product 24/7. James also shared some real talk about whether AI will displace PM and data scientist jobs.
Watch now on YouTube, Apple, and Spotify.
James and I talked about:
(00:00) Why nobody has built a good AI data analyst yet
(03:32) The biggest problem with product analytics today
(06:13) Live demo: AI agents that monitor your product 24/7
(21:48) The hardest parts of building an AI analytics product
(25:21) How to evaluate AI agents that run experiments
(32:53) AI product pricing strategies that actually work
(35:42) Should PMs and data scientists worry about their jobs?
(38:04) Non-obvious advice for building AI products
This post is brought to you by…Vanta
To scale your company, you need compliance. And by investing in compliance early, you protect sensitive data and simplify the process of meeting industry standards—ensuring long-term trust and security.
Vanta helps growing companies achieve compliance quickly and painlessly by automating 35+ frameworks—including SOC 2, ISO 27001, HIPAA, and more.
Start with Vanta’s Compliance for Startups Bundle with key resources for free.
Why AI hasn’t cracked analytics yet
So AI has transformed coding with tools like Cursor. Why hasn't anyone figured out how to build an AI analyst yet?
For analytics specifically, it's a much more multimodal job than text generation. Just like in traditional analytics, data quality is the bottleneck for good insights.
There are hard data science problems people have been working on for a long time that underlie that constraint. On the action side, when it comes to generating experiments or shipping surveys, you need the human workflow and existing software infrastructure first.
You can't just rely on an AI agent to generate code and hope it works.
What's the biggest problem with product analytics today?
The biggest problem is it takes time. There aren't that many people whose jobs are to live in tools like Amplitude and squeeze out insights.
Some Amplitude customers are insanely good at using our platform. A large part of our job is taking what we see those customers doing and getting all our customers to use Amplitude the same way—to know when to use it, make sure their taxonomy is great, know what kinds of actions tend to drive impact.
My hot take is that:
AI analytics should focus on providing unlimited time instead of unlimited intelligence.
There are some things these agents find that humans may not have stumbled upon, but most of it is just doing the same things you could do in Amplitude yourself if you had 100 hours in a week.
AI agents that monitor your website 24/7
Our AI agents are basically AI users of Amplitude that have a specific goal.
We had a choice between two approaches:
A monolithic agent that optimizes across all your pages and KPIs at once. This would be like having one super-agent handling everything.
Goal-specific agents that map to how humans work. Each agent does something specific like website conversion optimization.
We chose the goal-specific path because it makes iteration much easier (see the sketch after this list). We have agents that help with things like:
Website conversion for marketing
Cart abandonment conversion for e-commerce
Feature adoption for product
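To make the design choice concrete, here's a rough sketch of what goal-specific agents could look like as data structures: each agent owns one narrow goal plus the metrics and playbook it watches. This is an illustration of the idea, not Amplitude's implementation; every name in it is made up.

```python
from dataclasses import dataclass, field

@dataclass
class GoalAgent:
    """A hypothetical goal-specific agent: one narrow objective, its own metrics and playbook."""
    name: str
    goal: str
    metrics: list[str]
    playbook: list[str] = field(default_factory=list)  # best practices the agent is seeded with

    def describe(self) -> str:
        return f"{self.name}: optimizes '{self.goal}' by watching {', '.join(self.metrics)}"

# Each agent maps to one job a human would own, rather than one super-agent for everything.
AGENTS = [
    GoalAgent("website-conversion", "marketing site conversion",
              ["visit->signup rate"], ["check for dead clicks", "check CTA visibility"]),
    GoalAgent("cart-abandonment", "e-commerce checkout completion",
              ["add-to-cart->purchase rate"], ["check shipping-cost surprise", "check form friction"]),
    GoalAgent("feature-adoption", "product feature adoption",
              ["feature activation rate"], ["check discoverability", "check onboarding prompts"]),
]

if __name__ == "__main__":
    for agent in AGENTS:
        print(agent.describe())
```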
Can you share an end-to-end example of how an agent works?
Sure, let me show you how to create a website conversion agent for our Amplitude pricing page. It works in four steps (sketched in the code below):
Understand best practices. This agent knows how website conversion works and what the specific friction points are.
Pull and analyze data. Behind the scenes, it’s writing queries to grab and analyze data for the pricing page.
Extract insights. Based on the data, it’s concluded that there are dead clicks on the headers of each pricing option.
It knows about dead clicks because this is the webpage optimization template, and dead clicks are a classic problem. We give it that context instead of relying on the LLM.
Suggest actions. Based on this insight, it’s suggesting that we run an A/B test to make the headers clickable.
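Here is a minimal sketch of those four steps as a pipeline, with stubbed data standing in for the real queries. None of these function names are Amplitude APIs; they just mark where the actual queries and LLM calls would go.

```python
# Hypothetical sketch of the four-step agent run described above; the data is stubbed
# and the functions are placeholders, not Amplitude's API.

def load_playbook(template: str) -> dict:
    """Step 1: seed the agent with best practices for its template (e.g. dead clicks for web pages)."""
    playbooks = {"webpage-optimization": {"known_issues": ["dead_clicks", "rage_clicks", "slow_load"]}}
    return playbooks[template]

def pull_page_data(page: str) -> dict:
    """Step 2: behind the scenes, query click and conversion data for the page (stubbed here)."""
    return {"page": page, "dead_click_selectors": [".pricing-card h3"], "conversion_rate": 0.031}

def extract_insights(playbook: dict, data: dict) -> list[str]:
    """Step 3: compare observed behavior against the playbook's known friction points."""
    insights = []
    if "dead_clicks" in playbook["known_issues"] and data["dead_click_selectors"]:
        insights.append(f"Dead clicks on {data['dead_click_selectors']} of {data['page']}")
    return insights

def suggest_actions(insights: list[str]) -> list[str]:
    """Step 4: turn each insight into a proposed experiment for a human to approve."""
    return [f"A/B test: make the element clickable ({insight})" for insight in insights]

if __name__ == "__main__":
    playbook = load_playbook("webpage-optimization")
    data = pull_page_data("/pricing")
    for action in suggest_actions(extract_insights(playbook, data)):
        print(action)
```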
Does the agent run continuously, or do you have to trigger it manually each time?
The agents are persistent. This is the first run, but now that it exists and it's monitoring this page, I can tell it to monitor other pages too.
It could surface a Slack message like: "Hey, I think I found dead clicks happening at a specific part of the pricing page. This is a conversion opportunity and I created this experiment. Can I run it?"
There's a whole art to deciding how the agent is triggered to look for opportunities (a minimal sketch follows this list):
Regular monitoring. It looks for problems based on conversion best practices.
Anomaly-triggered investigation. New anomalies trigger it to investigate deeper. For example, the classic anomaly would be conversion rate dropping.
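As a minimal sketch of the anomaly-triggered path, the check below flags a sharp drop in a page's conversion rate against its recent baseline and drafts the kind of Slack message described above. The z-score threshold, sample data, and notification stub are illustrative assumptions, not Amplitude's implementation.

```python
import statistics

def conversion_anomaly(history: list[float], today: float, z_threshold: float = 3.0) -> bool:
    """Flag today's conversion rate if it falls well below the recent baseline (simple z-score check)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return False
    z = (today - mean) / stdev
    return z < -z_threshold  # only drops trigger a deeper investigation

def notify(message: str) -> None:
    """Placeholder for a Slack webhook post; printing keeps the sketch self-contained."""
    print(f"[slack] {message}")

if __name__ == "__main__":
    last_30_days = [0.041, 0.043, 0.040, 0.042, 0.044, 0.041, 0.043] * 4 + [0.042, 0.040]
    today = 0.029
    if conversion_anomaly(last_30_days, today):
        notify("Pricing page conversion dropped sharply today. Investigating dead clicks; "
               "I drafted an experiment to make the headers clickable. Can I run it?")
```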
Why wouldn't I just turn these AI agents on for every page on my website?
I think it suffers from a similar problem to anomaly detectors. If you turn it on for every page, you'll just get spammed with notifications and probably stop paying attention to them.
This product is intentionally high friction. The expectation should be that the first time you create an agent, it's probably giving you a pretty obvious insight or something you're aware of, and you have to give it feedback. Our bet is our AI agents will get better the more feedback the user provides.
You should create an agent to outsource a problem, not for every little question.
The hardest part of building AI analytics
What was the hardest part about building this product?
Everyone has high expectations for what they want AI to do for them. There are tons of data quality use cases, instrumentation use cases, and data science use cases customers want.
But in my experience, customers don't have a lot of intuition about what AI is actually good at. They'll describe what they want as "the crystal ball," and that's not an easy product to build today.
You have to be less focused on traditional B2B enterprise customer discovery and more on building an MVP and testing it.
I think that's hard for bigger companies—they're not used to building product that way.
How do you evaluate whether Amplitude’s AI agents are suggesting the right insights and actions?
We do some evaluation, but the challenge is that it's not clear what a "good" experiment really means. In the early days, the product was producing very subtle border radius changes—the kind of thing that you could run a thousand experiments for and it wouldn't do anything.
It’s also not always clear that an experiment that fails isn’t valuable.
We're prioritizing one success metric: are people clicking "yes, I approve this experiment"? If you fail 10 experiments but customers still keep clicking approve, that's a win for us.
We're also working on proactive clarification—the agent can follow up and ask "did this create value for you?"
Trading off AI model functionality and cost
Do you have approval to upgrade to whatever the latest model is or is cost a factor?
I think people get way too lost in the sauce on pricing for AI products. Just do rate limiting and fair use—that eliminates most nightmare cost scenarios.
At my previous startup Command AI, we priced by MTUs (monthly tracked users), not by chats. We took a risk because if a user asks many complicated questions, we could lose money. But we don't want people thinking "is it worth rolling out the chatbot to these users because it costs us X?" We just want users to use our product maximally.
The reality is it's hard for AI products to find product-market fit. If you have people using it too much, that's a post-success problem.
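For readers wondering what "rate limiting and fair use" looks like in practice, here is a minimal token-bucket sketch: steady-state usage is effectively unlimited, but burst cost is capped per customer. The limits and class name are illustrative, not Command AI's or Amplitude's actual billing logic.

```python
import time

class FairUseLimiter:
    """Token bucket per customer: generous steady-state usage, with a cap on burst cost."""
    def __init__(self, tokens_per_hour: float, burst: int):
        self.rate = tokens_per_hour / 3600.0  # tokens replenished per second
        self.capacity = burst
        self.buckets: dict[str, tuple[float, float]] = {}  # customer -> (tokens, last_refill_ts)

    def allow(self, customer: str, cost: float = 1.0) -> bool:
        now = time.monotonic()
        tokens, last = self.buckets.get(customer, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= cost:
            self.buckets[customer] = (tokens - cost, now)
            return True
        self.buckets[customer] = (tokens, now)
        return False

if __name__ == "__main__":
    limiter = FairUseLimiter(tokens_per_hour=100, burst=20)
    allowed = sum(limiter.allow("acme-corp") for _ in range(30))
    print(f"{allowed} of 30 back-to-back requests allowed")  # the burst cap absorbs the rest
```

A token bucket fits the fair-use framing because normal usage never hits the limit; only pathological bursts (the "nightmare cost scenarios") get throttled.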
How do you handle roadmap planning when models are improving so rapidly?
Part of the magic is having intuition for where models are getting better in your specific domain, then building with the expectation that the models will improve. Enterprise products are built on a longer timeline than the model release cycle.
If you plan to build a product on January 1st and release it in September, it might be terrible in January, but by summer, the models are actually good enough to support the use case. We've seen dramatically improved performance since April.
Who knows, maybe one day a model like GPT-5 will be good enough to solve the "here's the raw data, find the insights" use case.
But we’re not there yet.
Should PMs and data scientists be worried about their jobs?
So should we be worried about our jobs as PMs and data scientists?
There are so many experiments to run that just aren't being run today. So it's not about replacing humans with AI agents.
Most companies aren't willing to let AI agents run experiments autonomously for anything but toy use cases.
So we're definitely taking the Iron Man suit approach: making PMs and data scientists better at their jobs rather than replacing them.
Our agents today aren't deciding what new products you should ship or what new markets to enter. It's more about informing the board deck, then the board makes decisions.
Non-obvious advice for building AI products
Any non-obvious advice for folks building AI products?

Here’s what I emphasize:
Prototype with AI first to discover what's possible. We have this tremendous ability to prototype anything—paste it into ChatGPT, see if it does a reasonable job, and if so, you can probably build a good product with some eval and RAG work (see the sketch after this list). Building good AI products requires taking a "wouldn't it be cool if" mindset rather than traditional customer discovery.
Don't be afraid to test and iterate until it's actually good. A good example is Lovable and Bolt for UI prototyping. I wasn't hearing "UI prototyping is the number one thing we think AI will be good at." But these products figured out models were good at that, made it exceptional, and now everyone loves them. Don't be afraid to build and iterate in the open until it's good.
The text box is underrated. There's this debate about how much of AI products should be chat-based, but I think a text box lets the user tell you exactly what they’re trying to do with your product. With traditional software, it’s much more difficult to reach through the screen and ask "Hey, what are you trying to do?"
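As a concrete starting point for the "prototype with AI first" advice above, here is a tiny harness that runs a couple of examples through a model and prints the outputs for eyeballing, before any eval or RAG work. It assumes the openai Python client is installed and an OPENAI_API_KEY is set in the environment; the prompt, examples, and model name are placeholders, not a recommendation.

```python
# Minimal "prototype first" harness: run a few real examples through a model and eyeball
# the outputs before investing in evals or RAG. The prompt, examples, and model name are
# placeholders; assumes `pip install openai` and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

PROMPT = ("You are a product analyst. Given a funnel summary, name the single biggest "
          "drop-off and one experiment to fix it.")

EXAMPLES = [
    "Pricing page: 10,000 visits, 420 plan clicks, 95 signups.",
    "Checkout: 5,200 carts, 3,900 shipping steps, 1,100 purchases.",
]

for example in EXAMPLES:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model; swap for whatever you have access to
        messages=[{"role": "system", "content": PROMPT},
                  {"role": "user", "content": example}],
    )
    print(f"--- {example}\n{response.choices[0].message.content}\n")
```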
Thank you James! If you enjoyed this interview, follow James on LinkedIn and check out Amplitude's AI agents.