kaashif's blog

Programming, software freedom and Unix


Sorting a ton of mail

Migrating mail servers is a tricky business, especially when one server doesn’t have IMAP set up. The easiest way is to download all the mail and reupload it to the new mail server.

This seems simple enough, but I ran into problems. After all, I was going from an IMAP server somewhere to a maildir (no IMAP sync tool supports mbox for some reason) to an mbox through procmail to a directory of mboxes. Not trivial.

The Problem

I am a user of Zoho’s mail service. It serves me well, but there are usage limits. I set up my own mail server so I wouldn’t have to deal with those limits. But how do I get my mail from an IMAP server to a non-IMAP server?

I’m obviously not going to set up IMAP, another public service that increases my attack surface, just to migrate mail.

The Solution Part 1: Downloading the mail

This was supposed to be simple. I installed isync and wrote an .mbsyncrc which would fetch mail and deliver to a maildir. But there are usage limits, and I ran into them while fetching my 100k messages.

IMAP command 'AUTHENTICATE PLAIN <authdata>' returned an error: NO [ALERT] Your account is currently not accessible via IMAP due to excessive usage. Kindly try after some time.
*** IMAP ALERT *** Your account is currently not accessible via IMAP due to excessive usage. Kindly try after some time.

Argh! So annoying. I noticed that I could try again after a few minutes. So what if I slowed down mbsync to the point where it takes longer to hit the usage limit than it takes for the limit to reset? This turned out to be simple: I just added this line to the account section of my .mbsyncrc:

PipelineDepth 1

This entirely disables pipelining, i.e. only one IMAP command can be in flight at a time, whereas before the number was unlimited.

Converting to mbox

So I have all my mail in ~/.mail. But now what? My mailserver deals in mboxes, not maildir.

I came across a Python script that converts a maildir to an mbox:

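#!/usr/bin/python
# -*- coding: utf-8 -*-
# Usage: mailconv.py <maildir> <output mbox>  (Python 2)

import mailbox
import sys
import email

# Parse each maildir file with the email package.
mdir = mailbox.Maildir(sys.argv[-2], email.message_from_file)
outfile = open(sys.argv[-1], 'w')

for mdir_msg in mdir:
    # parse the message:
    msg = email.message_from_string(str(mdir_msg))
    # Under Python 2, str() on a message includes the "From " separator line.
    outfile.write(str(msg))
    outfile.write('\n')

outfile.close()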

It’s short, which speaks to the simplicity of the mbox format. So I ran it:

$ cd mail
$ for d in *; do python2.7 mailconv.py "$d" "${d}.mbox"; done

It eventually completed and I was left with a mess of disjointed mailboxes.
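The script above targets Python 2. For anyone on Python 3, here is a rough sketch of the same conversion that uses the standard mailbox module on both sides, so the module writes the "From " separator lines itself. The function name maildir_to_mbox is made up for illustration, and I haven't run this against a 100k-message maildir.

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Append every message in maildir_path to mbox_path; return the count."""
    # create=False: raise an error instead of silently creating an empty maildir
    source = mailbox.Maildir(maildir_path, create=False)
    dest = mailbox.mbox(mbox_path)
    count = 0
    dest.lock()
    try:
        for message in source:
            # mbox.add() writes the "From " separator line and escapes
            # body lines that begin with "From "
            dest.add(message)
            count += 1
    finally:
        # close() flushes pending writes and releases the lock
        dest.close()
    return count
```

Calling maildir_to_mbox("INBOX", "INBOX.mbox") for each directory would stand in for the shell loop above.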

Sorting

Now I had to sort them. Mail normally arrives one message at a time, in order, so if my final intention was to pipe all of it through procmail to apply my new filters, I had to smush the mailboxes together first.

Funnily enough, I came across a Python script that sorts an mbox:

#!/usr/bin/env python2.7
from email.utils import parsedate
import mailbox, sys

def extract_date(email):
    date = email.get('Date')
    return parsedate(date)

the_mailbox = mailbox.mbox(sys.argv[1])
sorted_mails = sorted(the_mailbox, key=extract_date)
the_mailbox.update(enumerate(sorted_mails))
the_mailbox.flush()

Using this script:

$ cat *.mbox > all.mbox
$ python2.7 sort.py all.mbox

And there we have it, a sorted mbox! That script uses a lot of memory and CPU, since it loads every message into memory before sorting. Python isn’t the best language for this.
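One caveat with sorting on the Date header: parsedate ignores the timezone offset, and some messages have no usable Date at all. A Python 3 sketch that handles both might look like the following. The names date_key and sort_mbox are my own, and sorting undated messages first is an arbitrary choice.

```python
import mailbox
from email.utils import mktime_tz, parsedate_tz


def date_key(message):
    """Return the Date header as a UTC timestamp, or 0 if it is unusable."""
    parsed = parsedate_tz(str(message.get("Date", "")))
    if parsed is None:
        return 0  # undated or unparseable messages sort first
    try:
        return mktime_tz(parsed)  # applies the timezone offset
    except (ValueError, OverflowError):
        return 0


def sort_mbox(path):
    """Rewrite the mbox at path with its messages in date order."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        # Like the script above, this holds every message in memory
        ordered = sorted(box, key=date_key)
        box.update(enumerate(ordered))
        box.flush()
    finally:
        box.close()
```

It still reads the whole mailbox into memory, so it is no lighter on resources than the original.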

Getting it to the mail server

This wasn’t too difficult. First I sent the blob:

$ gzip all.mbox
$ scp all.mbox.gz mail.kaashif.co.uk:~/

Then, on the server, I used a nifty program called formail, which comes with procmail and can run a command on each message in an mbox:

$ gunzip all.mbox.gz
$ formail -ds procmail < all.mbox

The mail was filed according to my procmail rules, and it’s all there!
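For the curious, the idea behind splitting an mbox and feeding each message to a command can be sketched in Python 3 as well. This is only an illustration of the concept, not a reimplementation of formail or its options; pipe_each_message is a hypothetical name, and the command is whatever program you want to receive each message on stdin.

```python
import mailbox
import subprocess


def pipe_each_message(mbox_path, command):
    """Run command once per message, feeding the message on stdin.

    Returns a list of exit codes, one per message.
    """
    exit_codes = []
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            # Rebuild the mbox "From " line so the command sees it too
            from_line = b"From " + message.get_from().encode() + b"\n"
            data = from_line + message.as_bytes()
            result = subprocess.run(command, input=data)
            exit_codes.append(result.returncode)
    finally:
        box.close()
    return exit_codes
```

Something like pipe_each_message("all.mbox", ["procmail"]) would then deliver the messages one at a time, though formail is the better tool for the job.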