A/B tests by customer segments let you measure the impact of almost any campaign on Maestra Platform — email and SMS blasts, push notifications, media placements, promotional offers, on-site personalization, loyalty mechanics, and more. Instead of splitting traffic inside a single campaign, you split your audience into segments up front, run the campaign variants against each segment, and compare the results. Use this approach whenever you want to:
  • Compare two or more campaign concepts head-to-head (for example, a 10% discount vs. free shipping).
  • Validate a hypothesis about how a specific mechanic changes customer behavior.
  • Establish a clean control group that doesn’t receive the campaign at all, so you can isolate net incremental revenue.

How it works

Running an A/B test by segments takes three steps:
  1. Create an A/B test. Define the participant segment, the number of variants, how traffic is split between them, and the metrics you want to track (conversion rate, average order value, or ARPU).
  2. Wire each variant into your campaigns. Use a filter in every campaign mechanic so that only customers from the matching test variant receive that version.
  3. Wait for results. Maestra Platform monitors statistical significance and notifies you when the test reaches a conclusion — either confirming the hypothesis or rejecting it.
Each customer is assigned to exactly one variant for the lifetime of the test. That means a customer in variant A will never accidentally receive variant B, no matter how many campaigns the test feeds.
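One common way to get this kind of sticky assignment is to hash the customer ID together with the test ID and map the result onto the traffic shares. The sketch below illustrates the idea only; it is not a description of how Maestra Platform assigns customers internally, and the names `assign_variant`, `customer-42`, and `offer-test` are hypothetical.

```python
import hashlib


def assign_variant(customer_id: str, test_id: str, shares: list[float]) -> int:
    """Deterministically map a customer to a variant index (illustrative only)."""
    digest = hashlib.sha256(f"{test_id}:{customer_id}".encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a point between 0 and 1.
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for variant, share in enumerate(shares):
        cumulative += share
        if point < cumulative:
            return variant
    # Guard against floating-point rounding at the upper edge.
    return len(shares) - 1


shares = [0.25, 0.25, 0.25, 0.25]
first = assign_variant("customer-42", "offer-test", shares)
second = assign_variant("customer-42", "offer-test", shares)
print(first == second)  # the same customer always lands in the same variant
```

Because the result depends only on the customer ID, the test ID, and the shares, the same customer gets the same variant in every campaign that uses the test.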

Step 1. Create the A/B test

Open the A/B tests section and create a new test. You’ll be asked to configure the following fields.

Hypothesis

A short statement of what you expect the test to prove or disprove — for example, “Customers who receive a 10% discount email will place 15% more orders than customers who receive a free shipping email.” A clearly written hypothesis keeps the team aligned and makes it easier to interpret the report later.

Participants

The segment of customers eligible for the test. Maestra Platform will randomly assign every customer in this segment to one of the variants. A few rules of thumb:
  • The smaller the segment, the longer the test will take to reach significance.
  • The segment should be large enough that each variant ends up with a representative sample.
  • Avoid changing the participant segment after the test has launched — it invalidates the random assignment.

Number of variants and traffic distribution

Choose how many variants you want to test and how to split traffic between them. Shares are entered as fractions and automatically recalculated into percentages. A typical setup:
Variant   Share   Purpose
0         0.25    Control group (receives nothing)
1         0.25    Discount email
2         0.25    Free shipping email
3         0.25    Loyalty points bonus
The more variants you add, and the more unevenly traffic is split, the longer the test needs to run before it can detect a meaningful difference. Stick to two or three variants when you can, and keep splits as even as possible.
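To see what a given split means in practice, you can convert the shares to percentages and to expected group sizes before you launch. The sketch below assumes shares are treated as relative weights and normalized by their sum, which is an assumption rather than documented platform behavior. The audience size of 80,000 and the helper name `describe_split` are hypothetical.

```python
def describe_split(shares: list[float], audience_size: int) -> list[tuple[int, float, int]]:
    """Return (variant, percentage, expected customers) for each share."""
    total = sum(shares)
    if total <= 0 or any(share < 0 for share in shares):
        raise ValueError("shares must be non-negative and sum to a positive number")
    rows = []
    for variant, share in enumerate(shares):
        fraction = share / total  # assumption: shares are normalized by their sum
        rows.append((variant, fraction * 100, round(fraction * audience_size)))
    return rows


even_split = describe_split([0.25, 0.25, 0.25, 0.25], audience_size=80_000)
for variant, percent, customers in even_split:
    print(f"variant {variant}: {percent:.1f}% (~{customers} customers)")
```

The smallest group in the split is the one that limits how quickly the test can reach significance, which is why uneven splits and extra variants slow a test down.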

Control group

If you want to measure the net incremental effect of a campaign — that is, how much extra revenue it actually generates compared with doing nothing — designate variant 0 as your control group and don’t assign any campaign mechanic to it. The report will then compare every other variant against variant 0 as the baseline.
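A rough way to think about net incremental revenue is to take the ARPU gap between a variant and the control and multiply it by the number of customers in the variant. The sketch below shows that arithmetic with made-up figures. The function name and all numbers are hypothetical, and the report may compute its figures differently.

```python
def incremental_revenue(
    variant_revenue: float,
    variant_customers: int,
    control_revenue: float,
    control_customers: int,
) -> float:
    """Estimate the extra revenue a variant earned compared with doing nothing."""
    variant_arpu = variant_revenue / variant_customers
    control_arpu = control_revenue / control_customers
    # The control ARPU is what the variant's customers would be expected
    # to spend without the campaign.
    return (variant_arpu - control_arpu) * variant_customers


# Hypothetical figures: 10,000 customers in the variant, 5,000 in the control.
uplift = incremental_revenue(60_000.0, 10_000, 20_000.0, 5_000)
print(uplift)
```

Using ARPU rather than total revenue keeps the comparison fair when the control group is smaller than the campaign variants.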

Metrics

Pick a primary metric and, optionally, a secondary metric:
  • Conversion rate — the share of customers in each variant who completed the target action (a purchase, a registration, etc.).
  • Average order value (AOV) — average revenue per order placed by customers in the variant.
  • ARPU — average revenue per customer in the variant, regardless of whether they ordered.
The primary metric is what Maestra Platform uses to decide whether the test is statistically significant.
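The three metrics differ only in what they divide by. The sketch below computes each one from per-variant totals, following the definitions above; the `VariantStats` structure and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    customers: int   # customers assigned to the variant
    converted: int   # customers who completed the target action
    orders: int      # orders placed by customers in the variant
    revenue: float   # total revenue from those orders


def conversion_rate(stats: VariantStats) -> float:
    return stats.converted / stats.customers


def average_order_value(stats: VariantStats) -> float:
    # No orders means there is no order value to average.
    return stats.revenue / stats.orders if stats.orders else 0.0


def arpu(stats: VariantStats) -> float:
    # Divides by all customers in the variant, including those who did not order.
    return stats.revenue / stats.customers


sample = VariantStats(customers=1000, converted=50, orders=60, revenue=3000.0)
print(conversion_rate(sample), average_order_value(sample), arpu(sample))
```

A campaign can raise conversion rate while lowering AOV (for example, by attracting many small orders), so ARPU is often the safer primary metric when revenue is the goal.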

Advanced settings

These come pre-filled with sensible defaults, but you can adjust them if you know what you’re doing:
  • Expected lift: the minimum improvement you want to be able to detect. Smaller expected lifts require larger samples.
  • Statistical power: the probability that the test will detect a real effect of at least the expected lift, if one exists. Default is 80%.
  • Confidence level: how strictly the test guards against false positives. At the default of 95%, a test with no real difference between variants will still be flagged as significant about 5% of the time.
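To get a feel for how these three settings interact, you can estimate the sample size each variant needs with the standard two-proportion approximation. This is a textbook formula for a conversion-rate metric and is not necessarily the calculation Maestra Platform uses. The function name and the 5% baseline conversion rate are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(
    baseline_rate: float,
    expected_lift: float,
    power: float = 0.80,
    confidence: float = 0.95,
) -> int:
    """Approximate customers needed per variant for a two-sided conversion test."""
    target_rate = baseline_rate * (1 + expected_lift)  # lift is relative, e.g. 0.10 = +10%
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = baseline_rate * (1 - baseline_rate) + target_rate * (1 - target_rate)
    effect = target_rate - baseline_rate
    return ceil((z_alpha + z_power) ** 2 * variance / effect**2)


# Hypothetical baseline: 5% of customers convert without the campaign.
needed_for_10_percent_lift = sample_size_per_variant(0.05, 0.10)
needed_for_5_percent_lift = sample_size_per_variant(0.05, 0.05)
print(needed_for_10_percent_lift, needed_for_5_percent_lift)
```

Halving the expected lift roughly quadruples the required sample, which is why small expected lifts need large participant segments.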

Step 2. Add each variant to your campaigns

Once the test is saved, each variant becomes a filter you can use anywhere on Maestra Platform. For every campaign mechanic that’s part of the test:
  1. Open the campaign (email, SMS, push, pop-up, promo, loyalty rule, etc.).
  2. In the Recipients or Audience filter, add the condition: Customer is in A/B test variant X.
  3. Save and launch the campaign.
Repeat for each variant, plugging the matching variant filter into the matching mechanic. Leave the control group (variant 0) without any campaign — that’s what makes it a control.
You can run a single A/B test across multiple channels at once. For example, variant 1 could include both an email and a push, while variant 2 includes only an email. Just add the variant filter to every mechanic you want bundled into that arm of the test.
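Conceptually, the variant filter works like an equality check between the customer's assigned variant and the variant the campaign targets. The sketch below models the multi-channel example above with hypothetical customer names and campaign names; it illustrates the logic only and is not platform code.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    name: str
    variant: int  # the "Customer is in A/B test variant X" condition


def recipients(campaign: Campaign, assignments: dict[str, int]) -> list[str]:
    """Return the customers whose assigned variant matches the campaign filter."""
    return sorted(c for c, v in assignments.items() if v == campaign.variant)


# Hypothetical assignments: customer -> variant.
assignments = {"alice": 0, "bob": 1, "carol": 2, "dave": 1}

campaigns = [
    Campaign("discount email", variant=1),
    Campaign("discount push", variant=1),       # second mechanic in the same arm
    Campaign("free shipping email", variant=2),
]

audiences = {c.name: recipients(c, assignments) for c in campaigns}
print(audiences)
```

No campaign targets variant 0, so the control customer (`alice` here) appears in no audience.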

Step 3. Monitor and stop the test

While the test is running, Maestra Platform tracks the chosen metrics for each variant and updates the report at least once every 24 hours. The report shows:
  • The number of customers in each variant.
  • The value of each metric per variant.
  • The percentage difference vs. the baseline (variant 0 when it serves as the control; otherwise the lowest-numbered variant).
  • A statistical significance indicator that turns green once the result is reliable.
Tests do not stop automatically when significance is reached. You decide when to call the test and switch off the losing variants. Stopping a test too early — before significance — produces unreliable conclusions; leaving it running far past significance just delays the rollout of the winning variant.
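If you want to sanity-check a report by hand, the percentage difference and a basic significance check for conversion rate can be computed as follows. The sketch uses a pooled two-proportion z-test, a standard textbook method that may differ from the one behind the platform's significance indicator. The function names and the sample counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def relative_difference(variant_value: float, baseline_value: float) -> float:
    """Percentage difference of a variant's metric vs. the baseline."""
    return (variant_value - baseline_value) / baseline_value * 100


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical report: control converts 500 of 10,000; variant 1 converts 600 of 10,000.
lift = relative_difference(600 / 10_000, 500 / 10_000)
p_value = two_proportion_p_value(500, 10_000, 600, 10_000)
print(f"lift: {lift:.1f}%, significant at 95%: {p_value < 0.05}")
```

A p-value below 0.05 corresponds to the default 95% confidence level. Checking it repeatedly and stopping at the first significant reading inflates the false-positive rate, which is one reason to let the test run its planned duration.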

Examples

Compare two offer mechanics

You want to know whether a flat 10% discount or free shipping drives more revenue.
  • Participants: all customers who opened at least one email in the last 30 days.
  • Variants: 0 (control, 25%), 1 (10% discount email, 37.5%), 2 (free shipping email, 37.5%).
  • Primary metric: ARPU.
  • Hypothesis: free shipping will outperform the 10% discount by at least $1 ARPU.
Add variant 1’s filter to the discount email, variant 2’s filter to the free shipping email, and leave variant 0 alone. After two weeks, the report shows free shipping’s ARPU is $1.40 higher than the discount at 95% confidence — hypothesis confirmed.

Validate a new welcome series

You want to know whether your redesigned welcome series actually lifts first-order conversion.
  • Participants: new subscribers from the last 60 days.
  • Variants: 0 (control, 50%, no series), 1 (new welcome series, 50%).
  • Primary metric: first-order conversion rate.
Add variant 1’s filter to every email in the new series. Variant 0 receives nothing. When the report reaches significance, the gap between variant 1 and variant 0 is the incremental first-order conversion you can attribute to the series.

Best practices

  • Write the hypothesis first. If you can’t state what you expect to learn, you’re not ready to run the test.
  • Don’t change the test mid-flight. Adding variants, shifting traffic, or editing the participant segment after launch breaks the math.
  • Run one test at a time per audience. Overlapping tests on the same customers contaminate each other’s results.
  • Give it enough time. Most retail tests need at least one full purchase cycle (typically 2–4 weeks) before results stabilize.
  • Always keep a control group when you care about incremental revenue. Comparing two campaigns tells you which one is better; comparing both against a control tells you whether either one is worth running at all.