A lot of the writing about AI treats it like a magic button: type a wish, receive a finished product. That has not been my experience, and if you do conversion work for real clients on live sites, it won't be yours either.
What I've found instead is that Claude is an excellent working partner — fast, tireless, and genuinely good at the engineering that sits between a good test idea and a live experiment. But it works best inside a tight, disciplined loop, with someone who knows the terrain steering it.
The work itself is far from trivial. Over the course of these projects I've rebuilt entire navigation mega menus; replaced a platform's native variant selectors with custom dropdowns wired straight into the Shopify variant API; restructured product pages to pull the price and CTA above the fold; built site-wide promotional banner systems; and stood up a complete reporting pipeline that turns raw test data into dashboards and decks. These are substantial builds. The point of this guide is that ambitious work and disciplined execution aren't opposites. The discipline is how you ship the ambitious stuff without it falling over.
If you're a CRO specialist, a freelancer, or an in-house marketer who's curious about where AI actually earns its keep, this is the honest version.
The terrain I'm working in
Most of my testing happens on live ecommerce sites, and the constraints are real:
- Tests are injected as JavaScript through an A/B testing platform (I use VWO), so I'm not touching the theme directly — I'm overlaying changes onto a page someone else built.
- The sites typically run on a hosted platform like Shopify, which means the DOM structure and section stylesheets are not mine to control.
- Some of the most important page elements — like a third-party quote configurator living inside an iframe — are completely sealed off. No script of mine can reach inside them.
- Traffic is often modest. Statistical purity is a luxury I rarely get.
These constraints matter, because they shape everything about how AI is useful here. You're not generating code in a vacuum. You're writing code that has to behave politely inside someone else's house.
The working loop
The single biggest shift that made AI useful to me was treating it as an iterative collaborator, not an oracle.
In practice that means I give short, precise corrections rather than long briefs. I'll confirm a selector from DevTools, paste it in, and say "use this exact element." When something renders wrong, I describe what I'm seeing in one line — "it's doubling up on the page," "the grid isn't applying" — and let Claude apply a targeted fix and confirm exactly what it changed.
The rhythm looks like this: I describe the change and hand over any confirmed facts. Claude builds the smallest version that could work. I test it live in preview and report back precisely what happened. We adjust. Repeat until it's clean.
The hard-won technical lessons
These are the things that took real tests breaking before they stuck.
Lesson 1. Start from confirmed selectors, not guesses.
The earliest tests I built failed for one boring reason: the code was looking for elements that didn't exist, or its selectors matched far more of the page than intended. Letting an AI "search the DOM programmatically" sounds clever but is fragile. The reliable move is to open DevTools, confirm the exact selector on the live page, and feed that in. A confirmed fieldset.product-form__input--box beats a hopeful guess every time.
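A minimal sketch of how a script can refuse to run on a bad selector. The helper name findExactlyOne is hypothetical, and it assumes the target should appear exactly once on the page; adjust the check if your element legitimately repeats.

```javascript
// Hypothetical helper: returns the single element matching `selector`,
// or null when the match count is anything other than exactly one.
// `doc` is the page's document object, passed in so the function is easy to test.
function findExactlyOne(doc, selector) {
  var matches = doc.querySelectorAll(selector);
  if (matches.length !== 1) {
    return null;
  }
  return matches[0];
}

// In a test script (browser only):
// var fieldset = findExactlyOne(document, 'fieldset.product-form__input--box');
// if (!fieldset) { /* selector missing or too broad: do nothing */ }
```

Returning null for both "not found" and "matched too many" means the variation simply does nothing when the page differs from what was confirmed in DevTools, which is usually safer than modifying the wrong element.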
Lesson 2. Build ambitiously, but ship in disciplined increments.
The biggest builds succeed because they're assembled in small, verifiable steps. Where I've come unstuck is trying to restructure a whole page section in one sweep: brittle, and a nightmare to debug. So I build the ambitious thing as a series of focused, testable pieces — get the element injecting, then the layout, then the styling, then the tracking — each one confirmed live before the next goes on top.
Lesson 3. On themed platforms, inline styles win.
On hosted-platform pages, the parent theme's stylesheets routinely override any CSS classes you inject — especially for layout, positioning and background colour. A grid that should obviously be two columns simply won't render until you apply display:grid inline. For anything load-bearing, I now write inline styles rather than relying on injected classes. Less elegant, far more dependable.
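A rough sketch of what "write it inline" can look like in practice. The helper name applyInlineStyles and its useImportant flag are my own illustration, not part of any platform API; the flag is there as an assumption for themes whose own rules are marked !important.

```javascript
// Hypothetical helper: writes each declaration straight onto the element's
// style attribute using the standard CSSStyleDeclaration.setProperty call.
// `useImportant` is optional; pass true only if plain inline styles still lose.
function applyInlineStyles(el, declarations, useImportant) {
  var prop;
  for (prop in declarations) {
    if (Object.prototype.hasOwnProperty.call(declarations, prop)) {
      el.style.setProperty(prop, declarations[prop], useImportant ? 'important' : '');
    }
  }
  return el;
}

// In a test script (browser only):
// applyInlineStyles(gridEl, {
//   'display': 'grid',
//   'grid-template-columns': '1fr 1fr'
// });
```

Property names are given in their CSS form (grid-template-columns rather than gridTemplateColumns) because that is what setProperty expects.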
Lesson 4. Respect the testing tool's constraints.
A/B platforms often run their own JavaScript validators, and they're pickier than a browser. The pattern I've settled on for VWO: no template literals, no CSS comments inside style blocks, var instead of let/const, and string concatenation instead of fancy interpolation. It looks old-fashioned. It passes validation every time.
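For illustration, here is a small snippet written to those rules. The function name buildBannerHtml and the cro-banner class are made up for the example, and it assumes the message and link are trusted strings you wrote yourself (nothing is escaped).

```javascript
// Follows the conventions listed above: var only, no template literals,
// plain string concatenation, and no comments inside the CSS string.
// All names here are illustrative.
function buildBannerHtml(message, linkUrl) {
  var css = '.cro-banner{padding:12px;text-align:center;}' +
            '.cro-banner a{text-decoration:underline;}';
  var html = '<style>' + css + '</style>' +
             '<div class="cro-banner">' +
             '<a href="' + linkUrl + '">' + message + '</a>' +
             '</div>';
  return html;
}
```

The same markup would be shorter with a template literal, which is exactly the trade being made: a few extra plus signs in exchange for code the validator accepts.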
Lesson 5. Guard against double-execution in the DOM, not in memory.
Testing tools can run your script more than once on a page, which leads to the classic "everything is doubled" bug. A flag held in a script-local variable won't save you, because it is re-initialised on each run. The fix that actually holds is a DOM-based guard: check for the injected element and bail out early if it already exists.
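A minimal sketch of a DOM-based guard. The names injectOnce and cro-test-banner are hypothetical; the only real requirement is that the injected element carries an id (or other marker) the next run can look for.

```javascript
// Injects an element at most once. The guard checks the DOM for the marker id,
// so a second run of the same script finds the element and exits early.
// `buildElement` is a caller-supplied function that creates the element.
function injectOnce(doc, markerId, buildElement) {
  if (doc.getElementById(markerId)) {
    return false; // already injected by an earlier run
  }
  var el = buildElement(doc);
  el.id = markerId;
  doc.body.appendChild(el);
  return true;
}

// In a test script (browser only):
// injectOnce(document, 'cro-test-banner', function (d) {
//   var div = d.createElement('div');
//   div.textContent = 'Free shipping over $50';
//   return div;
// });
```

Because the check reads the page rather than a variable, it holds however many times the tool re-runs the script.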
Lesson 6. Poll for what you need, with a ceiling.
Pages load asynchronously, so the element you want often isn't there on first run. My consistent pattern is a setInterval at 400ms with a maximum number of tries before it gives up. It waits patiently for slow-loading elements without spinning forever.
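One way to sketch that pattern. The function names createPoller and pollEvery are my own, and the ceiling of 25 tries in the usage comment is an assumed value, not a recommendation from any tool; pick whatever suits how slowly your pages load.

```javascript
// Builds a "tick" function that runs `check` once per call.
// tick() returns true when polling should stop: either `check` returned
// something truthy (which is handed to `onFound`), or `maxTries` was reached.
function createPoller(check, onFound, maxTries) {
  var tries = 0;
  return function tick() {
    var result = check();
    tries += 1;
    if (result) {
      onFound(result);
      return true;
    }
    return tries >= maxTries;
  };
}

// Runs `tick` on a fixed interval and clears the timer once it reports done.
function pollEvery(intervalMs, tick) {
  var timer = setInterval(function () {
    if (tick()) {
      clearInterval(timer);
    }
  }, intervalMs);
  return timer;
}

// In a test script (browser only):
// pollEvery(400, createPoller(
//   function () { return document.querySelector('fieldset.product-form__input--box'); },
//   function (el) { /* build the variation on top of el */ },
//   25
// ));
```

Splitting the counting logic from the timer keeps the stop conditions easy to reason about: the interval is cleared in exactly one place, whether the element turned up or the tries ran out.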
Lesson 7. Keep your tracking lean.
It's tempting to track everything. Resist it. I keep goals minimal — typically two per test: one for the product or card clicks, one for the primary CTA. It conserves goal slots, keeps reporting clean, and forces clarity about what success actually looks like before the test goes live.
Turning results into decisions
Building the test is half the job. The other half is making the outcome legible to the people who decide what happens next — usually a busy owner or leadership team who do not want p-values.
I've built a repeatable reporting workflow around this. I hand over the raw results, the control and variation pages, and a one-line description of what was tested. Out the other side comes an interactive dashboard for sharing digitally and a short slide deck for presenting.
The crucial framing decision — and this is true for most small and mid-sized businesses — is directional evidence over hard statistical thresholds. On a lower-traffic site you will rarely hit textbook 95% significance, and pretending otherwise just stalls every decision forever. So I report honestly: here's the uplift, here's the confidence so far, here's whether the trend has stabilised, and here's the pragmatic call.
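To make "uplift" and "confidence so far" concrete, here is a rough arithmetic sketch. It assumes a simple pooled two-proportion z-test with a one-sided reading, which is a textbook method I've swapped in for illustration and may not match what your testing platform reports; the function names and the sample numbers are invented.

```javascript
// Relative uplift of the variation over control, as a percentage.
function relativeUplift(cVisitors, cConversions, vVisitors, vConversions) {
  var cRate = cConversions / cVisitors;
  var vRate = vConversions / vVisitors;
  return ((vRate - cRate) / cRate) * 100;
}

// Pooled two-proportion z-score (positive when the variation is ahead).
function zScore(cVisitors, cConversions, vVisitors, vConversions) {
  var cRate = cConversions / cVisitors;
  var vRate = vConversions / vVisitors;
  var pooled = (cConversions + vConversions) / (cVisitors + vVisitors);
  var se = Math.sqrt(pooled * (1 - pooled) * (1 / cVisitors + 1 / vVisitors));
  return (vRate - cRate) / se;
}

// Standard normal CDF via the Abramowitz and Stegun polynomial approximation.
function normalCdf(x) {
  var ax = Math.abs(x);
  var t = 1 / (1 + 0.2316419 * ax);
  var density = Math.exp(-ax * ax / 2) / Math.sqrt(2 * Math.PI);
  var poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937 +
             t * (-1.821255978 + t * 1.330274429))));
  var upper = 1 - density * poly;
  return x >= 0 ? upper : 1 - upper;
}

// One-sided confidence that the variation beats control, between 0 and 1.
function confidenceVariationWins(cVisitors, cConversions, vVisitors, vConversions) {
  return normalCdf(zScore(cVisitors, cConversions, vVisitors, vConversions));
}
```

With made-up numbers of 1,000 visitors and 50 conversions on control against 1,000 visitors and 60 conversions on the variation, this gives a 20% relative uplift at roughly 0.84 confidence: well short of 95%, but a clear direction to report alongside whether the trend has stabilised.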
What AI is genuinely good at here — and what it isn't
Where it earns its place:
- Building genuinely complex things fast — full menu systems, custom selectors wired to platform APIs, multi-placement banner systems — once you've handed it the real selectors and constraints.
- Remembering and reapplying your established patterns so every test is consistent.
- Turning messy results into clean dashboards and decks in minutes.
- Being an infinitely patient debugging partner at 9pm when a script is misbehaving.
Where you still have to drive:
- Knowing what to test and why. The hypothesis is yours.
- Confirming the ground truth — real selectors, real page behaviour, real constraints.
- Judging when a result is good enough to act on.
- Catching when a "clever" solution is actually fragile.
The mental model I'd offer: AI is a brilliant, fast, slightly over-eager junior who has read everything and forgotten nothing, but has never seen your client's site. Your job is to give it the ground truth and the judgement. Its job is to execute, tirelessly and consistently.
If you want to start doing this
- Always confirm your selectors live before you build anything.
- Build ambitiously, but assemble it in small, verifiable steps — inject, then layout, then style, then track.
- Feed the AI your constraints explicitly — the platform, the tool's validator quirks, the iframe you can't touch.
- Build a consistent code pattern and reuse it, so everything is debuggable.
- Correct precisely and confirm what changed. Tight feedback beats long instructions.
- Report in plain English, framed around the decision your client actually needs to make.
None of this replaces knowing your craft. If anything, it rewards it — because the better you understand conversion, the front-end, and your own tools, the more leverage you get from an assistant that can execute as fast as you can think.
That's the real promise. Not a magic button. A very capable partner, working inside a loop you control.