How to Set Up an A/B Test for Your Fall Appeal



4 October 2022


By David Allen, Development for Conservation


“That doesn’t work here.”

“We tried a four-page letter, and it bombed.”

“The two-page letter we did this year did much better.”


I tend to take these comments with a grain of salt. No one wants to believe that four-pagers work better, so any evidence that supports that belief gets used to justify the position.

I’m OK with that. And it’s not worth arguing.


But it’s not science. And we should stop pretending it is.


Over the course of my career, I have set up three different A/B tests on the number of pages in appeal letters. Actually, let’s call it two, because with the first one I didn’t know what I was doing. The most recent of these tests was conducted about six years ago.

I’ve also tested different ask amounts and envelope styles. I don’t have a large volume of experience, but I do have some, and I’ve paid attention when others tell me about their experiments. I also pay attention to the Agitator blog and Jeff Brooks’ Future Fundraising Now blog. Both include direct mail fundraising experiment data from time to time.


In scientific A/B tests it’s important to hold everything constant except for the variable you wish to test. For example:

  • Use the same letter with the same inserts, mailed at exactly the same date and time. Change just one thing – say, the ask amount – from a control (A) to a test (B). For example, test asking for $50 against asking for $100.
  • Or the page color, white versus blue.
  • Or a teaser on the envelope versus a plain one.
  • Or including a PS note versus not.
  • Or including photos versus not.


You can literally test for anything. And you should. How else will you know what works and what doesn’t?

[If your answer to that last question has something to do with the gut instincts of your Board Chair, I’m not talking to you anymore.]


It turns out that testing for number of pages is one of the more difficult tests to perform, because it’s harder than it looks to keep the content of the two letters the same. More often the “tests” I hear about compare one letter mailed this year against a different letter mailed last year. Different letters, different years, and different mailing lists – the lists won’t be exactly the same from year to year either.

That’s not a test.

In that scenario, there is literally no way to say that one year’s letter was better or worse than the other.


If you want to try a test using number of pages, consider this:

  • Write the longer letter first. Follow all the rules and put your best foot forward with the content. For help, see also A dozen Rules for Writing Better Fundraising Letters and One way to Tackle Writing an Appeal Letter.
  • Then write the short letter from the longer content. Change only the words and phrases you absolutely must to make it work. If the first letter had a PS, include a PS in the second. Same with the Board list on the first page. Same with the ask amount. Same with the margins and the overall look and feel. You’ll have to cut the word count, but that’s all that should change.
  • Next, make sure the two lists are statistically equivalent. If you are using a mail house, they should be able to randomly assign recipients to an A list and a B list. If you are doing it in-house and merging from an Excel file, Excel has a random number generator. You can also sort the lines by street address within zip code and get pretty close to random by assigning the odd lines to A and the even lines to B.
  • Remember to code the response cards so that you know which list the responses came from. And you’ll need a list master with the assigned codes so you can track online responses.
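The random-split step above can be sketched in a few lines of code. This is a minimal illustration, not a tool I’m recommending – the donor names and the seed are hypothetical, and a mail house or Excel will do the same job.

```python
import random

def assign_ab(recipients, seed=42):
    """Randomly split a mailing list into two statistically equivalent groups."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = recipients[:]    # copy, so the original list order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "A": shuffled[:half],   # control: e.g., the four-page letter
        "B": shuffled[half:],   # test: e.g., the two-page letter
    }

# Hypothetical mailing list of ten donors
mailing_list = [f"donor_{i}" for i in range(10)]
groups = assign_ab(mailing_list)
print(len(groups["A"]), len(groups["B"]))  # 5 5
```

The point is simply that every recipient lands in exactly one group, and nothing about the person (giving history, zip code, last name) influences which one.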


I mentioned that my most recent organized test for number of pages was six years ago. The results were not ambiguous. The four-pager pulled 34 percent more responses and 31 percent more money.
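If you want to check whether a difference like that is bigger than chance, a standard two-proportion z-test will do it. The counts below are hypothetical (my actual list sizes aren’t in front of me) – they just illustrate a 34 percent lift in responses at a 5 percent base rate.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is the response-rate difference bigger than noise?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled response rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical counts: 2,000 letters per group,
# 100 responses to the two-pager vs. 134 to the four-pager (34% more)
z = two_proportion_z(100, 2000, 134, 2000)
print(round(z, 2))  # 2.29 — above 1.96, so significant at roughly the 95% level
```

With smaller lists the same 34 percent lift might not clear the bar – which is exactly why repeating the test matters.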

The other thing about the scientific method is that when other experimenters perform the same experiment, they get similar results. So far, the experiments I have heard about that followed these principles got similar results.

That doesn’t mean you will.

But it’s worth testing.


Cheers, and have a great week!




PS: Your comments on these posts are welcomed and warmly requested. If you have not posted a comment before, or if you are using a new email address, please know that there may be a delay in seeing your posted comment. That’s my SPAM defense at work. I approve all comments as soon as I am able during the day.


Photo by Ted Erski courtesy of Pixabay.



  • Katie C
    Posted at 08:49h, 04 October

    I have heard that an A/B test on a list smaller than 5,000 recipients isn’t necessarily worth it (advice from MailChimp–not sure why exactly, but I assume it has to do with statistics). I’m curious what your thoughts are on that?

    • David Allen
      Posted at 10:14h, 04 October


      Thank you for the question. I am not a professional statistician, but I believe that testing is valuable regardless of the size of your universe. MailChimp is probably talking about large-scale inferences based on relatively small sample sizes. But your universe is not the U.S. population or even the population of your county. It’s a universe made up of people who already give you money. Statistical relevance isn’t a threshold. It’s a continuum. The greater the sample size, the more likely the results are to be repeatable. But that alone wouldn’t invalidate your testing.
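      To make that “continuum” point concrete, here is a rough sketch (hypothetical numbers, and a simplified 95% criterion) of how the smallest detectable difference shrinks as group size grows. A small list doesn’t make a test worthless; it just means only larger differences will stand out from the noise.

```python
import math

def detectable_lift(base_rate, n_per_group, z=1.96):
    """Rough smallest response-rate difference distinguishable from
    noise at about the 95% level, for a given group size."""
    se = math.sqrt(2 * base_rate * (1 - base_rate) / n_per_group)
    return z * se

# Hypothetical 5% base response rate at several list sizes
for n in (250, 1000, 2500, 10000):
    print(n, round(detectable_lift(0.05, n), 4))
# The detectable lift shrinks as n grows:
# roughly 3.8 points at 250 per group, down to about 0.6 points at 10,000
```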

      But there’s another reason to test as well. It gets us out of the thinking trap that what we did last year was “good enough.” We can always get better. We can always learn. Let’s conduct a whole bunch of small-size tests and share the results within our community. Collectively, we may be able to get closer to 5,000 than any one of us might be able to do independently.

      Again – thank you so much for the question.