Testing procmail and fdm configurations for local email delivery

I periodically download email messages from a webmail provider using the POP3 protocol so I can back them up locally. Once the messages are on my local system, I want to file them by sender so I have some hope of finding what I'm looking for later. Some messages don't have any long-term value (e.g. automated reminders), so I want to delete those instead.

The usual way to process local email on a Unix system is with a Message Delivery Agent[1] like procmail or maildrop. The user writes rules that match individual messages (often using regular expressions that are matched against a message's header fields) and specify actions to perform on them (like saving, forwarding, or deleting the message).

Over the years, I've amassed a large number of rules for filtering messages. I was scared by how easy it would be for me to make a typo in a new rule that would silently divert all mail to /dev/null. The difference between

match '^From:.*(bozo@example\.org|jerk@example\.com)' in headers
      action drop

and

match '^From:.*(bozo@example\.org|jerk@example\.com)?' in headers
      action drop

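doesn't exactly jump out at me, especially in the middle of a long list of similar rules. The trailing ? in the second rule makes the whole parenthesized group optional, so the pattern matches any message with a From: header and the rule drops everything.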
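To make the effect concrete, here is a small bash sketch that runs both patterns against a From: line from an unrelated sender. It approximates fdm's matching with grep -E -i, on the assumption that fdm uses case-insensitive POSIX extended regular expressions by default; the matches helper and the sender address are made up for illustration.

```shell
#!/bin/bash
# Patterns from the two rules above; the second has the stray trailing '?'.
good='^From:.*(bozo@example\.org|jerk@example\.com)'
typo='^From:.*(bozo@example\.org|jerk@example\.com)?'
# A made-up header line from a sender that neither rule is meant to drop.
from='From: Friend <friend@example.net>'

# Hypothetical helper: succeeds if extended regex $1 matches string $2,
# ignoring case (an approximation of fdm's matching, not fdm itself).
matches() {
  printf '%s\n' "$2" | grep -E -i -q -- "$1"
}

# ${!name} expands to the value of the variable whose name is in $name.
for name in good typo; do
  if matches "${!name}" "$from"; then
    echo "${name} rule: drop"
  else
    echo "${name} rule: keep"
  fi
done
```

The first rule keeps the message, while the second drops it: with the group optional, the pattern effectively reduces to ^From:.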

To catch errors before they make me lose mail, I put together a script that tests my current MDA configuration using a corpus of messages. I initially wrote the script for procmail, but I was able to update it for fdm with minimal changes.

MRAs (Mail Retrieval Agents) and MDAs

For a long time, I used fetchmail to download messages and procmail to process them. Due to fetchmail's many flaws, I eventually replaced it with getmail while still using procmail as my MDA. More recently, I read Enrico Zini's Migrating from procmail to sieve post, which led me to Anarcat's procmail considered harmful page, which pointed me to Nathan Willis's Reports of procmail's death are not terribly exaggerated LWN article from 2010. I don't like using unmaintained code, so ditching procmail seemed wise.

After looking at the various options out there, I eventually settled on fdm, which merges the jobs of fetching and delivering mail into a single program. It supports POP3-over-SSL/TLS and has a reasonable syntax for its configuration files that made it not too painful to manually port over my existing procmail config.

Running MDAs in test mode

The main requirements for testing an MDA's behavior are getting it to accept mail via stdin (so you can pass it messages from the corpus) and convincing it to deliver mail to an alternate location (so you can verify what it did with each message in the corpus). This assumes that your configuration only delivers messages to local mailboxes, as opposed to forwarding them to external recipients or passing them to other programs.

procmail already reads a single message from stdin by default. It also accepts environment variable assignments as NAME=VALUE command-line arguments, so you can add a line like the following near the top of ~/.procmailrc (assuming that mail is normally delivered under ~/Mail):

MAILDIR=${TEST_MAILDIR:-$HOME/Mail}

and then override the default base directory via the command line when testing:

procmail TEST_MAILDIR=/some/test/dir ...

fdm usually fetches messages itself rather than accepting them from a different program, but it fortunately supports a stdin account type. You can add a line like the following to ~/.fdm.conf:

account "stdin" disabled stdin

This account will be ignored by default when running fdm fetch, but -a stdin can be passed on the command line to use it instead of polling any other accounts declared in the config.

To override fdm's delivery location for testing, you can update your configuration to define a $base variable if it isn't already defined and then refer to it in the action portion of your rules:

ifndef $base
  $base = "%h/Mail"
endif

# ...

match '^From:.*me@example\.org' in headers
      action maildir "${base}/sent"

The variable can be overridden on the command line when testing by passing -D '$base=/some/test/dir'. Put the assignment in single quotes, or backslash-escape the dollar sign if you use double quotes; otherwise your shell will treat $base as one of its own variables and expand it before fdm sees it.
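The quoting pitfall is easy to see by printing the arguments a program would actually receive. In this bash sketch, the show helper and the pre-set base shell variable are hypothetical, there only to make an accidental expansion visible.

```shell
#!/bin/bash
# A shell variable that happens to share the fdm macro's name (hypothetical).
base=/home/me/oops

# Print each argument in brackets, exactly as a program would receive it.
show() {
  printf '[%s]' "$@"
  printf '\n'
}

show -D '$base=/some/test/dir'  # single quotes: passed through literally
show -D "\$base=/some/test/dir" # escaped dollar sign: same result
show -D "$base=/some/test/dir"  # unescaped: the shell expands $base first
```

The first two lines print [-D][$base=/some/test/dir], while the third prints [-D][/home/me/oops=/some/test/dir], which would define nothing useful for fdm.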

The script

The final piece is a test_fdm.sh script that I wrote to feed messages from the corpus to fdm and then check what it did:

#!/bin/bash -e

# Get the script's directory: https://stackoverflow.com/a/246128
BASE_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)

# Corpus of messages to be kept. Subdirectories correspond to expected
# destination Maildirs under $MAIL_DIR, e.g. keep/inbox/foo.txt should be
# delivered to $MAIL_DIR/inbox/new.
KEEP_DIR=${BASE_DIR}/keep

# Corpus of messages to be dropped or delivered to $TRASH. Files should be at
# the top level, e.g. drop/foo.txt.
DROP_DIR=${BASE_DIR}/drop

TMP_DIR=${BASE_DIR}/tmp
LOG_FILE=${TMP_DIR}/log.txt
MAIL_DIR=${TMP_DIR}/Mail
TRASH=${MAIL_DIR}/trash
KEPT_FILE=${TMP_DIR}/kept.txt

# Set to 1 if testing failed.
ERROR=0

die() {
  echo "$1" >&2
  exit 1
}

# Extracts the Message-Id header value from the passed message file.
# (We can't use SHA1s since messages may be modified during delivery.)
msgid() {
  local id
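  # This gnarly command is based on https://stackoverflow.com/a/54491870.
  # Headers can be folded over multiple lines, and sed makes it hard to do
  # multi-line matches. Only the header section (up to the first blank line)
  # is searched, and only the first ID is kept, so Message-Id lines quoted in
  # a message's body (e.g. in a bounce) are ignored.
  id=$(sed '/^\r\?$/q' <"$1" | sed '/^\S/h;G;/^Message-Id:/IMP;d' |
    sed -nr 's/.*<([^>]+)>.*/\1/p' | head -n 1)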
  [ -z "$id" ] && die "Didn't get Message-Id from ${1}"
  echo -n "$id"
}

# Enable '**' in globs so we don't need to use find.
shopt -s globstar

if [ ! -d "$KEEP_DIR" ] || [ ! -d "$DROP_DIR" ]; then
  die "${KEEP_DIR} and ${DROP_DIR} must exist"
fi

rm -rf "$TMP_DIR"
mkdir -p "$MAIL_DIR"
touch "$KEPT_FILE" # Ensure the file exists even if no messages are kept.

# Process the corpus.
for f in "${KEEP_DIR}"/** "${DROP_DIR}"/**; do
  [ -f "$f" ] || continue
  fdm -D "\$base=$MAIL_DIR" -a stdin -m -v fetch <"$f" >>"$LOG_FILE" 2>&1
done

# Write a file listing dirs and IDs of kept messages.
for f in "${MAIL_DIR}"/**; do
  { [ -f "$f" ] && [[ "$f" != "$TRASH"/* ]]; } || continue
  rel=${f#${MAIL_DIR}/} # trim MAIL_DIR prefix
  dir=${rel%%/new/*}    # trim new/filename suffix
  id=$(msgid "$f")
  echo "${dir} <${id}>" >>"$KEPT_FILE"
done

# Check that all the good messages were kept.
for f in "${KEEP_DIR}"/**; do
  [ -f "$f" ] || continue
  rel=${f#${KEEP_DIR}/} # trim KEEP_DIR prefix
  dir=${rel%%/*}        # trim filename suffix
  id=$(msgid "$f")
  # -x: match whole lines so that e.g. "inbox" doesn't match "lists/inbox".
  if ! grep -q -x -F "${dir} <${id}>" "$KEPT_FILE"; then
    wrong=$(grep -F " <${id}>" "$KEPT_FILE" | cut -f 1 -d ' ')
    if [ -n "$wrong" ]; then
      echo "Good message ${rel} <${id}> went to wrong folder (${wrong})"
    else
      echo "Good message ${rel} <${id}> was dropped"
    fi
    ERROR=1
  fi
done

# Check that all the bad messages were dropped.
for f in "${DROP_DIR}"/**; do
  [ -f "$f" ] || continue
  rel=${f#${DROP_DIR}/} # trim DROP_DIR prefix
  id=$(msgid "$f")
  wrong=$(grep -F " <${id}>" "$KEPT_FILE" | cut -f 1 -d ' ')
  if [ -n "$wrong" ]; then
    echo "Bad message ${rel} <${id}> was kept (${wrong})"
    ERROR=1
  fi
done

# Sanity check the total count.
keep_count=$(find "$KEEP_DIR" -type f -printf . | wc -c)
kept_count=$(wc -l <"$KEPT_FILE")
if [ "$keep_count" -ne "$kept_count" ]; then
  echo "Expected to keep ${keep_count} messages but kept ${kept_count}"
  ERROR=1
fi

exit $ERROR
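The folded-header handling in msgid is the least obvious part of the script. This standalone bash sketch applies the same sed technique as msgid, limited to the header section, to a made-up message whose Message-ID sits on a continuation line and whose body quotes another message's header. Like the script, it assumes GNU sed; header_msgid is a hypothetical name.

```shell
#!/bin/bash
# Print the first Message-Id from a message on stdin, handling folded headers.
header_msgid() {
  # 1. Stop at the blank line that ends the header section.
  # 2. Keep the last line that starts a header (non-whitespace) in the hold
  #    space, append it to every line, and print the current line if either
  #    it or the header it belongs to starts with "Message-Id:".
  # 3. Extract the text between angle brackets and keep the first match.
  sed '/^\r\?$/q' |
    sed '/^\S/h;G;/^Message-Id:/IMP;d' |
    sed -nr 's/.*<([^>]+)>.*/\1/p' |
    head -n 1
}

# Made-up message: the ID is on a continuation line, and the body quotes
# another message's Message-ID that should be ignored.
msg=$'From: someone@example.net\nMessage-ID:\n <abc123@example.net>\nSubject: Hi\n\nMessage-ID: <quoted@example.org>\n'

printf '%s' "$msg" | header_msgid
```

This prints abc123@example.net: the continuation line is kept because the held header line above it starts with Message-ID:, and the quoted ID in the body is never reached.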

The script lives in its own directory, alongside keep/ and drop/ subdirectories containing the message corpus:

keep/bandcamp/order_confirmation.txt
keep/inbox/unfiltered_message.txt
keep/school/assignment.txt
drop/calendar_reminder.txt
drop/useless_message.txt
...

Each of those files contains an RFC 822 email message with full headers (i.e. what you get if you copy a message file out of a Maildir or click "Show original" and then "Download Original" in Gmail). Each message under keep/ is expected to be delivered to the Maildir named by its subdirectory (e.g. inbox), while the messages under drop/ should either end up in the trash Maildir or be deleted entirely.
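Because the script tells messages apart by Message-Id, a corpus file without one stops the run, and two files that share an ID can make the results misleading. A hypothetical bash helper like check_corpus_ids could catch both before running the test. It is a simplification: it only recognizes unfolded Message-Id headers in the header section, and it assumes GNU sed and bash 4 or later.

```shell
#!/bin/bash
# Hypothetical corpus check: every file under the given directories needs a
# Message-Id, and no two files may share one. Returns nonzero on problems.
check_corpus_ids() {
  local dir f id status=0
  local -A seen=() # Message-Id -> first file that used it
  shopt -s globstar
  for dir in "$@"; do
    for f in "$dir"/**; do
      [ -f "$f" ] || continue
      # Stop at the end of the headers; only unfolded headers are recognized.
      id=$(sed -n '/^\r\?$/q;s/^Message-Id:[[:space:]]*<\([^>]*\)>.*/\1/Ip' <"$f" | head -n 1)
      if [ -z "$id" ]; then
        echo "No Message-Id in ${f}" >&2
        status=1
      elif [ -n "${seen[$id]}" ]; then
        echo "<${id}> appears in both ${seen[$id]} and ${f}" >&2
        status=1
      else
        seen[$id]=$f
      fi
    done
  done
  return "$status"
}
```

Run from the script's directory, check_corpus_ids keep drop would check both halves of the corpus together, since a message shouldn't appear in both.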

I try to keep a good mix of different types of messages in the corpus. When I make changes to fdm's configuration, I run the script to make sure that I didn't cause any unexpected changes to message delivery.

When I was running the script against procmail instead of fdm, it was almost the same, except the fdm command was replaced with something along the lines of:

procmail TEST_MAILDIR="$MAIL_DIR" <"$f" >>"$LOG_FILE" 2>&1
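Rather than editing the command, the script could select the MDA at run time. Here is a sketch of a hypothetical run_mda function that does this, assuming an MDA environment variable (my own invention, not part of the script above) that defaults to fdm; the two invocations are the ones already shown.

```shell
#!/bin/bash
# Hypothetical helper: deliver message file $1 under test directory $2 using
# the MDA named by $MDA (default: fdm). Both MDAs read the message on stdin.
run_mda() {
  local msg=$1 mail_dir=$2
  case ${MDA:-fdm} in
    fdm) fdm -D "\$base=${mail_dir}" -a stdin -m -v fetch <"$msg" ;;
    procmail) procmail TEST_MAILDIR="$mail_dir" <"$msg" ;;
    *)
      echo "Unknown MDA: ${MDA}" >&2
      return 1
      ;;
  esac
}
```

In the corpus loop, the fdm line could then become run_mda "$f" "$MAIL_DIR" >>"$LOG_FILE" 2>&1, and running MDA=procmail ./test_fdm.sh would test a procmail configuration instead.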

rendmail

One thing that's given me some trouble with both procmail and fdm is writing rules that match From and Subject header fields containing non-ASCII characters. RFC 2047 specifies a way to encode text in arbitrary character sets within header fields, but the MDAs that I've used don't seem to provide an easy way to decode these strings, so I often ended up needing to write regular expressions like ^Subject: =\?UTF-8\?Q\?Confirmaci=C3=B3n_de_Pago\?=.

To make this less ugly, I wrote a small program called rendmail. I've configured fdm to pipe messages through rendmail before evaluating my rules, and rendmail accepts a -decode-subject flag that instructs it to add ASCII header fields like X-Rendmail-Subject: Confirmacion de Pago that are easier to match.
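To illustrate what such a decoder has to undo, here is a minimal bash sketch that decodes a single Q-encoded word like the one above. It is not rendmail's implementation: decode_q_word is a hypothetical name, it ignores the declared charset (so the output is only readable on a terminal using the same one, typically UTF-8), it doesn't transliterate to ASCII, and it doesn't handle B (base64) encoded words or headers containing several encoded words.

```shell
#!/bin/bash
# Hypothetical helper: decode one RFC 2047 Q-encoded word such as
# =?UTF-8?Q?Confirmaci=C3=B3n_de_Pago?= and print the resulting bytes.
decode_q_word() {
  local escaped
  # 1. Strip the =?charset?Q? prefix and the ?= suffix.
  # 2. Double literal backslashes so printf's %b leaves them alone.
  # 3. Turn '_' into spaces.
  # 4. Rewrite each =HH as \xHH for %b to expand into a byte.
  escaped=$(printf '%s' "$1" | sed -E \
    -e 's/^=\?[^?]*\?[Qq]\?(.*)\?=$/\1/' \
    -e 's/\\/\\\\/g' \
    -e 's/_/ /g' \
    -e 's/=([0-9A-Fa-f]{2})/\\x\1/g')
  printf '%b\n' "$escaped"
}

decode_q_word '=?UTF-8?Q?Confirmaci=C3=B3n_de_Pago?='
```

On a UTF-8 terminal this prints Confirmación de Pago; getting from there to the ASCII Confirmacion de Pago that rendmail adds would take an additional transliteration step.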

rendmail also supports removing binary attachments from messages, which I use to make sure that bulky images don't end up in my backups.


  1. Sometimes called a Mail Delivery Agent instead, but henceforth just referred to as an MDA.