A/B testing is a method for validating hypotheses about your marketing campaigns by comparing the performance of two or more variants, or by comparing a variant against a control group. Instead of guessing what works, you let real customer behavior decide. Performance is measured through metrics such as average order value, conversion rate, and—in some test types—average revenue per user (ARPU) and click-through conversion.

How an A/B test runs

Every A/B test in Maestra follows the same four stages:
  1. Identify the audience. Choose who participates in the test: a segment, all visitors to a site, or users of a mobile app.
  2. Split the audience into groups. Divide participants into two or more groups, one per variant. You can also include a control group that receives nothing.
  3. Run the experiment. Each group experiences its assigned variant. Maestra tracks behavior throughout.
  4. Compare results. Compare each group’s behavior against the metrics you defined for success and decide whether your hypothesis holds up.
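
To make the comparison step concrete, here is a minimal sketch of how the metrics named earlier could be computed per group from raw order data. The `Participant` record and the `summarize` function are hypothetical names for illustration, and the metric definitions (conversion rate as the share of participants with at least one order, ARPU as revenue divided by all participants) are common conventions assumed here, not Maestra's documented formulas.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    variant: str                # group the participant was assigned to
    order_totals: list[float]   # revenue of each order placed during the test


def summarize(participants):
    """Return conversion rate, average order value, and ARPU per variant."""
    groups = {}
    for p in participants:
        groups.setdefault(p.variant, []).append(p)

    report = {}
    for variant, members in groups.items():
        orders = [total for p in members for total in p.order_totals]
        buyers = sum(1 for p in members if p.order_totals)
        revenue = sum(orders)
        report[variant] = {
            "participants": len(members),
            # Assumed definition: share of participants with at least one order.
            "conversion_rate": buyers / len(members),
            # Revenue per order; 0.0 when the group placed no orders.
            "average_order_value": revenue / len(orders) if orders else 0.0,
            # Revenue per participant, buyers and non-buyers alike.
            "arpu": revenue / len(members),
        }
    return report
```

Two groups can have the same ARPU while differing in conversion rate and order size, which is one reason to decide on the success metric before the test starts.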

Where you can run A/B tests

Maestra supports A/B testing across a range of channels and mechanics:
  • Workflow scenarios
  • Mobile apps (in-app messages)
  • Website personalization—popups, embedded blocks, and recommendation widgets
  • Website visitors as a whole
  • Customer segments, which you can plug into any marketing mechanic

What you configure in a test

Every test has the same core settings:
| Setting | What it does |
| --- | --- |
| Hypothesis | The statement you want to prove or disprove. |
| Participants | The segment, site, or app whose audience joins the test. |
| Traffic distribution | The share of participants assigned to each variant, expressed as proportions (for example, 50/50 or 75/25). |
| Analytics metrics | The KPIs Maestra uses to decide which variant wins. |
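
As an illustration of what a traffic distribution means in practice, the sketch below assigns participants to variants in proportion to a set of weights. It uses a hash of the test and participant identifiers so the same participant always lands in the same variant. This is a generic technique, not a description of how Maestra assigns traffic internally; `assign_variant`, `test_id`, and `weights` are hypothetical names.

```python
import hashlib


def assign_variant(participant_id: str, test_id: str, weights: dict[str, float]) -> str:
    """Map a participant to a variant, deterministically, in proportion to weights."""
    total = sum(weights.values())
    digest = hashlib.sha256(f"{test_id}:{participant_id}".encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a point on the unit interval.
    point = int.from_bytes(digest[:8], "big") / 2**64

    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return variant
    # Floating-point rounding can leave the point at the very top edge;
    # fall back to the last variant in that case.
    return list(weights)[-1]
```

Calling `assign_variant("customer-42", "spring-popup", {"A": 75, "B": 25})` returns the same variant every time for that customer, and across many customers roughly three quarters end up in A.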

Things to keep in mind when reading results

Wholesale buyers and other outliers. When a test uses average order value or ARPU, Maestra excludes unusually high-revenue customers—wholesale buyers, for example—from the calculation so they don’t distort the result.
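
The rule Maestra uses to identify unusually high-revenue customers is not stated here. As a rough sketch of the general idea, the example below drops customers whose revenue lies above a chosen quantile before computing ARPU. The quantile cutoff, `trim_high_revenue`, and `keep_share` are assumptions made for illustration only.

```python
import math


def trim_high_revenue(revenue_by_customer: dict[str, float], keep_share: float = 0.99) -> dict[str, float]:
    """Drop customers whose revenue is above the keep_share quantile (nearest-rank)."""
    if not revenue_by_customer:
        return {}
    ranked = sorted(revenue_by_customer.values())
    # Nearest-rank quantile: the smallest value with at least
    # keep_share of customers at or below it.
    rank = max(1, math.ceil(keep_share * len(ranked)))
    cutoff = ranked[rank - 1]
    return {c: r for c, r in revenue_by_customer.items() if r <= cutoff}


def arpu(revenue_by_customer: dict[str, float]) -> float:
    """Average revenue per customer; expects at least one customer."""
    return sum(revenue_by_customer.values()) / len(revenue_by_customer)
```

With nine customers who each spent 100 and one wholesale buyer who spent 50,000, the untrimmed ARPU is dominated by the single large account, while the trimmed figure reflects the typical customer.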
Multiple variants and uneven splits. Tests with more than two variants need proportionally more participants. They can also end without a clear winner if variant pairs disagree with one another. Uneven traffic splits (such as 75/25) take longer to reach significance than balanced splits (50/50), because the smaller variant accumulates participants more slowly.
Device-based assignment for site and personalization tests. Website and personalization tests assign participants by device. A customer who visits from multiple devices can land in different variants on each one and counts as a participant in every branch they hit. Orders are attributed to the device used during the customer’s most recent site visit.
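
The device-based rules can be sketched in a few lines. The example below counts a customer once in every branch that any of their devices was assigned to, and credits an order to the branch of the device used in the most recent visit. The data shapes and the names `participants_per_branch` and `branch_for_order` are hypothetical and only illustrate the behavior described above.

```python
from collections import defaultdict


def participants_per_branch(device_branch: dict[str, str], device_owner: dict[str, str]) -> dict[str, int]:
    """Count customers per branch; a customer counts in every branch their devices hit."""
    customers = defaultdict(set)
    for device_id, branch in device_branch.items():
        # Devices with no known owner are treated as separate anonymous participants.
        customers[branch].add(device_owner.get(device_id, device_id))
    return {branch: len(members) for branch, members in customers.items()}


def branch_for_order(visits: list[tuple[int, str]], device_branch: dict[str, str]) -> str:
    """Credit the order to the branch of the device used in the most recent visit.

    visits holds (timestamp, device_id) pairs for one customer.
    """
    _, last_device = max(visits)
    return device_branch[last_device]
```

A customer whose phone is in branch A and whose laptop is in branch B appears in both participant counts, so the branch totals can add up to more than the number of distinct customers.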

How to make tests finish faster

How quickly a test wraps up depends on three things:
  1. Traffic volume. The more traffic to the surface you’re testing, the faster participants accumulate.
  2. Number of variants. Fewer variants mean fewer participants needed overall.
  3. Even distribution. Balanced splits (50/50, 33/33/34) fill every variant at the same rate—uneven splits drag out the slower branch.
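
A back-of-the-envelope estimate shows how these three factors interact. The sketch below assumes, for simplicity, that every variant needs the same fixed number of participants and that traffic is constant from day to day. A real power calculation is more involved, and `days_to_fill` and its parameters are hypothetical names.

```python
import math


def days_to_fill(daily_participants: int, shares: list[float], needed_per_variant: int) -> int:
    """Days until the slowest-filling variant reaches needed_per_variant participants."""
    total = sum(shares)
    # The variant with the smallest share accumulates participants most slowly.
    slowest_daily = daily_participants * min(shares) / total
    return math.ceil(needed_per_variant / slowest_daily)


balanced = days_to_fill(1000, [50, 50], 5000)       # both variants fill at 500 per day
skewed = days_to_fill(1000, [75, 25], 5000)         # the 25% variant fills at 250 per day
three_way = days_to_fill(1000, [33, 33, 34], 5000)  # each variant gets about a third
```

Under these assumptions, the 75/25 split takes twice as long as the 50/50 split, and adding a third variant also lengthens the test because each branch receives a smaller share of the same traffic.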

Running multiple tests at once

Running several A/B tests on the exact same audience at the same time creates interaction effects. You won’t get false winners, but each test will take longer to reach significance, and tests with similar hypotheses across different channels can still muddy the interpretation.
As a rule, keep only one site-wide personalization test running at any given time. Otherwise, the control group gets fragmented across overlapping tests and you can’t trust the comparison.

Audience fatigue

If you run A/B tests on the same audience back-to-back, give that audience a cooldown period between tests. Audiences that are tested too often—either in parallel or in rapid succession—produce unreliable results and hide real differences between variants.

Reading the report

A test is meant to run until it reaches statistical significance. Maestra doesn’t stop it automatically: you get a notification when significance is reached so you can decide what to do next. Reports become available within 24 hours of launch, and an estimated finish date appears about a week after the test starts. Each report includes:
  • A graph for every metric in the test
  • Configuration details—segment, participant count, hypothesis
  • A date range and aggregation period you can adjust
  • A variant comparison table that includes revenue figures (in $)
  • A statistical-significance indicator for each metric
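
For readers who want a feel for what a significance indicator measures, here is a minimal sketch of a two-sided, two-proportion z-test on conversion rate. This is a standard textbook test chosen for illustration; the statistical method Maestra applies is not described here and may differ. The function name and arguments are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rate, B minus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (nobody or everybody converted): no evidence of a difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value (commonly below 0.05) suggests the observed difference is unlikely to be due to chance alone. With 100 conversions out of 1,000 in one variant and 150 out of 1,000 in the other, the p-value is well below that threshold.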

When there’s no winner

Sometimes a test ends without a clear winner. Common reasons:
  • The metric you chose isn’t sensitive enough to pick up the difference.
  • The mechanic only works for a narrower slice of the audience than the one you tested.
  • You didn’t have enough participants—especially likely with multi-variant tests or heavily skewed splits.
  • The mechanic was turned off before the test finished.
  • Seasonality or an outside event interfered with the results.
  • The mechanic genuinely has little or no effect.
A “no winner” result is still a result. It tells you the variants are interchangeable for this audience, on this metric, in this window—which is useful information when you’re deciding where to invest next.