The AI era requires a different kind of experimentation.
I bet your 2022 experimentation playbook is obsolete.
Experimentation is my favorite. I just love it. But this dear old friend of mine is quite different from even a few years ago.
And maybe that’s not such a bad thing. The old way of experimenting had its limits:
Focus on minor tweaks instead of big bets
Optimizing superficial surfaces instead of foundational complexity
Avoiding monetization (‘It takes 6 months of decision approvals & build out. If you are lucky.’)
Product mostly testing through pre/post comparisons, instead of actual A/B tests or feature flags
Focusing on fast wins - results measured in 2 weeks
Some of this made sense, back when product development was slow. There was so much bloat and so many surfaces that just creating awareness of features and explaining how to use them was a win.
But even then, many of those experiments were just pulling revenue forward (if anything). They weren’t actually delivering incremental gains. By improving the experience a bit or creating some urgency, we made this thing (which would have happened anyway) happen sooner. This is why a lot of experiments do deliver value to the bottom line, but they don’t deliver a fundamental, step-function change.
So there’s been room for improvement for a while. Now, things have completely changed. Here are the 4 biggest shifts and 3 things I’d do about them. Plus… why maybe you just shouldn’t test some stuff at all.
Sponsors: Elena’s Picks
You don’t want generic ads, right? To make things more valuable for you, I’m featuring products and services that I personally recommend or am trying out:
→ Databox’s MCP & Skills Marketplace: I’m exploring this one because LLMs without the right data sources are a hot mess. So, unless your team is pumped about creating (and maintaining) dozens of separate integrations, you’ll need something like Databox’s new MCP - which gives you 130+ integrations in one place. Plus… not sure what to do with all that yummy data? They’ve got a skills marketplace with tons of pre-built reports and workflows.
🎁 Offer [Exclusive]: Get 30 days on the Analyst plan, for free. An exclusive offer for Growth Scoop readers! Make sure to use the link - offer ends July 31, 2026.
→ Clarify’s new agents: I’m trying this one out because your CRM seems like exactly the kind of product that would ideally do the work for you, so you never have to touch it. Clarify has been automating stuff for a while and they just announced their agents. Especially for founders and sales leaders… imagine having all of your contact data auto-cleaned and enriched, with your pipeline reports accessible via MCP or sent to you in Slack.
🎁 Offer [Exclusive]: Get 50% off paid plans for your first 12 months, with code ELENA!
→ Lovable: What can I say? Yes, I work here. But… I work here because I was obsessed with the product, first. It’s the best way to build an app or site, even if you have no technical experience. Try it out!
🎁 Offer [Exclusive]: Get a year of Lovable for free! For paid subscribers to Growth Scoop.
4 ways things have changed
A few foundational blocks have changed in the last 2 years or so.
1. Surfaces are collapsing.
New product workflows (esp in AI) are more conversational and prompt-based, so there are fewer surfaces that need awareness. Your UI just doesn’t need to be optimized to infinity and beyond, anymore. That’s not the primary way of interacting with most of the features.
For example: Granola. I use it pretty much every single day, maybe every single (working) hour. And yet… I don’t even know if I ever actually open the Granola UI? It jumps into my meetings and generates the transcript, but then I converse with the transcript through prompts in other products, and then I’m done. So, what UI is Granola going to A/B test for me here?
2. Product development velocity is accelerating.
The impact of a successful experiment on any given surface is shorter-lived because of how quickly surfaces change. Improving any given funnel by 1% or 2% just seems silly when it’s gonna change in a couple of months.
Especially if that change takes you 2-3 weeks to measure. Most teams don’t have the time or the traffic (and usually they lack both) to figure something out quickly enough for it to make sense.
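To put rough numbers on the traffic problem, here’s a minimal sketch using the standard two-proportion sample-size approximation. The 5% baseline conversion rate, the 2% relative lift, and the 20,000 daily visitors are all made-up assumptions for illustration, not numbers from any real product.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a relative lift in a
    conversion rate with a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # power threshold
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_to_fill(n_per_arm, daily_visitors, arms=2):
    """Days of traffic needed to fill every arm, assuming an even split."""
    return ceil(n_per_arm * arms / daily_visitors)


if __name__ == "__main__":
    # Hypothetical inputs: 5% baseline conversion, 2% relative lift,
    # 20,000 visitors per day entering the experiment.
    n = sample_size_per_arm(baseline_rate=0.05, relative_lift=0.02)
    print("users per arm:", n)
    print("days of traffic:", days_to_fill(n, daily_visitors=20_000))
```

With inputs like these, a 1-2% relative lift needs hundreds of thousands of users per arm or more, which is well beyond a 2-3 week window for most teams. A bigger expected lift shrinks the requirement dramatically, which is one more argument for bigger bets.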
3. Personalization is unnecessary.
One of the biggest reasons for minor tweaks and experimentation was personalization. Trying to provide more context or use more relevant language. This was always painful: (a) By default, you’re limiting your audience, (b) personalizing for one segment doesn’t always scale to other segments, and (c) static personalization had to be hard-coded. So much debt, and you didn’t know the shelf-life.
Now, AI makes so much of that irrelevant. The whole innovation of ChatGPT was that it just talks to you, so it’s always very personalized. The need for testing out these minor tweaks is gone.
4. The cost of product usage is increasing.
Because many products burn LLM tokens to run, usage isn’t free anymore. This means that monetization strikes earlier in the funnel than before AND each action the user performs is a lot costlier.
Note: This is why not touching monetization in your experiments is actually the biggest negative that you can possibly deliver to your business.
Guess what? Nobody has their monetization of AI locked in completely. Even Anthropic and OpenAI are still changing their own monetization models constantly, so to approach this area as something static and not worth experimenting on is crazy. This is one of the biggest growth levers you can be pulling, but most teams are refusing to see it.
3 things you should be doing differently
There’s a lot you can be doing here, but the most important stuff is:
1. Stop focusing on minor optimizations (esp with eng resources!!)
In the past, I used engineering hours on every single optimization. But now I can handle increasingly complex, end-to-end items myself as a PM / Hi-C. Some of these I’ll run as a real A/B test. For other tweaks… it’s seriously not worth it. Either way, I’ll tell you what I will absolutely not do: Use those precious growth eng team resources on interface tweaks. I’m telling you: Using them on UI optimizations is a CRIME.
Making these tweaks can be empowering and exciting for Growth PMs and Marketers to do, and miserable for most engineers. If you can take that stuff on, it frees them up to take on harder engineering challenges - everyone’s creating more value, everyone wins.
2. Start taking much bigger swings (especially on monetization)
Eventually, we’ll get back to the point when a lot of these transformations have settled out and we can go back to tweaking more stable platforms. But right now, big things are moving fast, so you need to be engaging at that level.
And yeah, I’m talking about monetization. At the end of the day, this is the biggest lever you can pull: to ungate engagement, to get the right margin profile, and to drive the right level of expansion and retention in your products.
At Lovable, we do a ton of monetization experimentation… but it’s not geared at revenue increases. Let me say that again: Our monetization experiments are NOT focused on increasing revenue.
Actually, revenue-neutral experiments on monetization are our biggest winners. This is because they increase engagement, which then translates into further revenue down the line. Maybe it’s a month, 2 months, 3 months, but as long as it’s not deteriorating our business into unsustainable territory and it’s increasing engagement, that’s a win.
At Lovable, our monetization bets are not focused on revenue increases. They are focused on engagement lift.
When most teams think about revenue, they immediately go to free-to-paid conversion rates or trying to improve renewal rates. But that’s not the point, now. We’re in the adoption cycle of this new technology, so everything should be focused on your North Star metric or another engagement metric, as long as you’re staying within the bounds of what’s acceptable from a margin perspective.
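One way to make “engagement first, within margin bounds” concrete is a simple decision rule. This is a sketch under stated assumptions, not how Lovable actually decides: the `ArmResult` fields, the 50% margin floor, and the 5% revenue-drop tolerance are all hypothetical, and it ignores statistical significance entirely.

```python
from dataclasses import dataclass


@dataclass
class ArmResult:
    """Per-user averages for one experiment arm (hypothetical schema)."""
    engagement_per_user: float  # your North Star or other engagement metric
    revenue_per_user: float
    cost_per_user: float        # e.g. LLM token cost to serve the user

    def gross_margin(self) -> float:
        if self.revenue_per_user <= 0:
            return float("-inf")  # no revenue means no margin to protect
        return (self.revenue_per_user - self.cost_per_user) / self.revenue_per_user


def decide(control, variant, min_margin=0.5, max_revenue_drop=0.05):
    """Return 'ship', 'hold', or 'stop' for a monetization experiment.

    Assumed thresholds: the margin must stay at or above min_margin and
    revenue may dip by at most max_revenue_drop (i.e. roughly neutral).
    """
    engagement_lift = variant.engagement_per_user / control.engagement_per_user - 1
    revenue_change = variant.revenue_per_user / control.revenue_per_user - 1

    if variant.gross_margin() < min_margin:
        return "stop"  # guardrail: unsustainable margin
    if revenue_change < -max_revenue_drop:
        return "stop"  # guardrail: not revenue-neutral anymore
    return "ship" if engagement_lift > 0 else "hold"


if __name__ == "__main__":
    control = ArmResult(engagement_per_user=10.0, revenue_per_user=20.0, cost_per_user=6.0)
    variant = ArmResult(engagement_per_user=12.0, revenue_per_user=20.0, cost_per_user=8.0)
    print(decide(control, variant))
```

The design choice here is that revenue and margin act as guardrails, while engagement is the metric that actually decides the outcome.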
3. Bigger swings mean longer tests.
Your changes should be having a much bigger impact on the entire product system, which means you have to run them for much longer. Those 2-week blips that every executive loves are actually the worst thing you can do right now.
I mean, if you have so much low-hanging fruit that you can meaningfully improve your product in 2 weeks, then by all means: unleash Claude Code and go fix it all. But for mature products, that shouldn’t be happening. Instead, you should be focused on the big stuff: What are your freemium boundaries? How does your credit system work? What types of features are in each plan? How do users engage with different features when placed in different plans?
All of these things have serious, system-level impact. And they’ll have trickle-down effects. So run them for 1-2 weeks… but then monitor the cohorts for 1-2 months! Don’t you dare make a decision as soon as the initial split is done running.
For example: Right now, we’re testing 5 credits vs. 10 credits on the first day of the free plan. Immediately, it looks terrible. Like, really bad. Why would we just give away more? But when you look at the longer term (2 months later), the impact in terms of engagement and retention actually does catch up. In the first 30 days, so bad. But the overall, cascading impacts over the long-term are so positive that it’s worth the up-front hit.
It was the same with our top-ups experiment. It looked horrible in the first 30 days, and then it became wildly positive. But it took 30 days for the cohorts to mature, and even longer (45+ days) for it to actually materialize into us being able to measure the impact.
So, switch the experiments on and then give them time to show impact.
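Here’s a minimal sketch of what that kind of cohort readout could look like. The numbers are invented purely to show the shape of the pattern (bad early, good later) and are not Lovable’s data; the checkpoint days and the per-user metric are assumptions too.

```python
def lift_by_checkpoint(control, variant):
    """Relative lift of variant over control at each shared checkpoint day.

    Both inputs map 'days since signup' to a cumulative per-user metric
    (for example engagement or revenue) for a cohort.
    """
    return {
        day: variant[day] / control[day] - 1
        for day in sorted(control)
        if day in variant
    }


def first_positive_checkpoint(lifts):
    """First checkpoint day where the variant is ahead, or None if never."""
    for day, lift in lifts.items():  # insertion order is ascending by day
        if lift > 0:
            return day
    return None


if __name__ == "__main__":
    # Invented cumulative per-user values at days 14, 30, 45, and 60.
    control = {14: 4.0, 30: 7.0, 45: 9.0, 60: 10.0}
    variant = {14: 3.0, 30: 6.3, 45: 9.45, 60: 11.5}

    lifts = lift_by_checkpoint(control, variant)
    for day, lift in lifts.items():
        print(f"day {day}: {lift:+.0%}")
    print("variant first ahead at day:", first_positive_checkpoint(lifts))
```

If you had called this test at day 14 or day 30, you would have killed it. The same readout also catches the opposite failure, where an early “win” fades to zero because it only pulled forward something that would have happened anyway.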
And to have that impact, you actually need real, significant bets. I’m not talking about changing the pricing page from ‘$25/mo’ to ‘$24.99/mo’. That’s the kind of stuff you should just hand off to Claude. Let it apply the best practices for those types of tweaks, because they’re right 80% of the time.
Maybe you just shouldn’t even test it?
I’ll leave you with maybe my most contrarian take - you don’t have to test everything.
Yep, in this wild new world, you can skip that step in some areas.
I do think everything should be measured and everything should be noted when you’re making changes. That usually sits somewhere within your code base, which is particularly helpful when an AI agent can go in and retrieve what changes were made on what surfaces at what times. The tracking is important: How many clicks, how many interactions, etc.
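As a rough illustration of “noted, even when not tested,” here’s a sketch of a change log kept as JSON lines in the repo, which a person or an AI agent could query later. The `ChangeRecord` fields, the surface names, and the dates are hypothetical examples, not an existing tool or format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class ChangeRecord:
    """One shipped change (hypothetical schema)."""
    surface: str     # e.g. "pricing_page"
    shipped_on: str  # ISO date, e.g. "2026-03-02"
    summary: str
    ab_tested: bool  # False means shipped straight to everyone


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(record)) for record in records)


def changes_on_surface(jsonl_text, surface, since):
    """Return changes to one surface shipped on or after an ISO date."""
    cutoff = date.fromisoformat(since)
    matches = []
    for line in jsonl_text.splitlines():
        record = json.loads(line)
        shipped = date.fromisoformat(record["shipped_on"])
        if record["surface"] == surface and shipped >= cutoff:
            matches.append(record)
    return matches


if __name__ == "__main__":
    log = to_jsonl([
        ChangeRecord("pricing_page", "2026-01-10", "Reordered plan cards", False),
        ChangeRecord("free_plan", "2026-02-01", "Show paid features with upgrade trigger", False),
        ChangeRecord("pricing_page", "2026-03-02", "New credit top-up option", True),
    ])
    for change in changes_on_surface(log, "pricing_page", since="2026-02-01"):
        print(change["shipped_on"], change["summary"])
```

Pair a log like this with your usual click and interaction tracking, and you can still line up metric movements against changes you never formally tested.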
But with the rise of Average Intelligence, you really should not be starting from scratch. You don’t need some growth product person A/B testing every element to arrive at a highly optimized pricing page. Seriously, just ask AI and implement that as a baseline.
(Or read this article, if you want: The DNA of a Great Pricing Page.)
AI has accumulated a ton of industry standards, and you should start there. Unless you’re truly a unique outlier or snowflake that has never existed, just snap into the default as your starting point.
As one example: One thing I really, deeply, truly believe as a product growth principle is that higher plan features should be available and seen in the lower-tier plan. Feature-level paywalls should be pretty self-explanatory. But a couple weeks ago, I found out that Lovable has like 10 paid plan features that are not visible on the free plan!
To me… this is a bug. So, I just Lovable’d an implementation together, with the features visible and the upgrade trigger showing. I submitted it to a dev for review.
He asked if we should A/B test it.
I’m glad that he was able to ask me that and push back, but in this case… no! Based on thousands of other companies that have gone through this with their freemium experience, this does not need to be tested. It’s so basic that it’s pretty much a bug that it wasn’t there in the first place. And then, yeah, if you went to ask AI what it should look like, it would give you this exact presentation.
It’s not something you need to validate or try out, from scratch. Maybe in the past, when 10 engineering teams would have needed to be pulled in to implement it on their features, we might have needed to vet this. But now, it takes almost no time to implement, and it’s been validated over and over. So in this case, just use what already works, so that everyone can focus on testing out the big stuff.
Experimentation is ded. Long live experimentation!
So, yes: The UI that we all used to tweak is disappearing. Changes are happening too fast to test. Personalization is automatic. And the cost of product usage itself is blowing up. Which means that the old way of testing things out isn’t so smart anymore. You don’t need to (and can’t) test every little thing these days. But if you skip the minor optimizations, take bigger swings, and let your tests run longer, your experimentation system will be more important than ever.
And if you take one thing away from all of this - go run some experimentation on your monetization system! I promise you there’s room for growth.
Edited by Jonathan Yagel.






