Sorting a ton of mail
2017-08-04
Migrating mail servers is a tricky business, especially when one server doesn't have IMAP set up. The easiest way is to download all the mail and reupload it to the new mail server.
This seems simple enough, but I ran into problems. After all, I was going from an IMAP server somewhere to a maildir (no IMAP sync tool supports mbox for some reason) to an mbox through procmail to a directory of mboxes. Not trivial.
The Problem
I am a user of Zoho's mail service. It serves me well, but there are usage limits. I set up my own mail server so I wouldn't have to deal with those limits. But how do I get my mail from an IMAP server to a non-IMAP server?
I'm obviously not going to set up IMAP, another public service that increases my attack surface, just to migrate mail.
The Solution Part 1: Downloading the mail
This was supposed to be simple. I installed isync and wrote an .mbsyncrc which would fetch mail and deliver it to a maildir. But there are usage limits, and I ran into them while fetching my 100k messages.
IMAP command 'AUTHENTICATE PLAIN <authdata>' returned an error: NO [ALERT] Your account is currently not accessible via IMAP due to excessive usage. Kindly try after some time.
*** IMAP ALERT *** Your account is currently not accessible via IMAP due to excessive usage. Kindly try after some time.
Argh! So annoying. I noticed that I could try again after a few minutes. So what if I slowed down mbsync to the point where it takes longer to hit the usage limit than it takes for the limit to reset? This turned out to be simple: I just added this line to the account section of my .mbsyncrc:
PipelineDepth 1
This disables pipelining entirely: only one IMAP command can be in flight at a time, whereas by default the number is unlimited.
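For context, here is a rough sketch of where that line sits in a complete .mbsyncrc. This is not my actual config: the account and store names (zoho, zoho-remote, zoho-local), the address, the host name and the password command are all placeholders, so substitute your own. It uses the Master/Slave keywords; isync 1.4 and later call them Far/Near instead.

```
IMAPAccount zoho
# Host, User and PassCmd below are placeholders, not real values
Host imap.zoho.com
User you@example.com
PassCmd "pass show mail/zoho"
SSLType IMAPS
# One command in flight at a time, to stay under the usage limit
PipelineDepth 1

IMAPStore zoho-remote
Account zoho

MaildirStore zoho-local
Path ~/.mail/
Inbox ~/.mail/INBOX
SubFolders Verbatim

Channel zoho
Master :zoho-remote:
Slave :zoho-local:
Patterns *
Create Slave
Sync Pull
SyncState *
```

With a config along these lines, running mbsync zoho would pull every folder into ~/.mail, one command at a time.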
Converting to mbox
So I have all my mail in ~/.mail. But now what? My mail server deals in mboxes, not maildirs.
I came across a Python script that converts a maildir to an mbox:
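#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 only: there, str(msg) includes the "From " separator line.
import mailbox
import sys
import email

# Usage: mailconv.py <maildir> <output mbox>
mdir = mailbox.Maildir(sys.argv[-2], email.message_from_file)
outfile = open(sys.argv[-1], 'w')
for mdir_msg in mdir:
    # parse the message:
    msg = email.message_from_string(str(mdir_msg))
    # write it out, followed by the blank line that separates mbox entries
    outfile.write(str(msg))
    outfile.write('\n')
outfile.close()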
It's short, which speaks to the simplicity of the mbox format. So I ran it:
$ cd mail
$ for d in *; do python2.7 mailconv.py "$d" "${d}.mbox"; done
It eventually completed and I was left with a mess of disjointed mailboxes.
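The script above leans on Python 2 behaviour (str() on a message emitting the "From " line). For anyone on Python 3, here is a minimal sketch of the same conversion that lets the standard library's mailbox.mbox do the writing. The function name maildir_to_mbox is my own invention, and I have not run this against the 100k messages from this migration, so treat it as a starting point rather than a drop-in replacement.

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Copy every message from a maildir into an mbox; return the count."""
    # create=False: raise an error for a wrong path instead of
    # silently creating an empty maildir there.
    source = mailbox.Maildir(maildir_path, factory=None, create=False)
    dest = mailbox.mbox(mbox_path)
    count = 0
    dest.lock()
    try:
        for message in source:
            # add() writes the "From " separator line and escapes any
            # body line that itself starts with "From ".
            dest.add(message)
            count += 1
    finally:
        # close() flushes pending writes and releases the lock.
        dest.close()
        source.close()
    return count
```

Calling maildir_to_mbox("INBOX", "INBOX.mbox") for each directory would stand in for the shell loop above. Note that maildir flags (seen, replied and so on) are not carried over in this sketch.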
Sorting
Now I had to sort them. But of course, mails get sent one at a time, so if my final intention is to pipe all of them through procmail to apply my new filters, I had to smush them all together first.
Funnily enough, I came across a Python script that sorts mbox:
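#!/usr/bin/env python2.7
from email.utils import parsedate
import mailbox, sys

def extract_date(message):
    # Sort key: the Date header parsed into a time tuple
    date = message.get('Date')
    return parsedate(date)

the_mailbox = mailbox.mbox(sys.argv[1])
# Reads every message into memory, ordered by date
sorted_mails = sorted(the_mailbox, key=extract_date)
# Write the messages back over keys 0..n-1 in sorted order
the_mailbox.update(enumerate(sorted_mails))
the_mailbox.flush()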
Using this script:
$ cat *.mbox > all.mbox
$ python2.7 sort.py all.mbox
And there we have it, a sorted mbox! That script loads every message into memory before sorting, so it uses a lot of memory and CPU. Python isn't the best language for this.
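If memory is the problem, one option is to keep only (date, key) pairs in memory and copy the messages one at a time into a new mbox. Below is a sketch of that idea for Python 3. The names date_key and sort_mbox are made up for this example, and putting undated mail at the front is an arbitrary choice on my part. It parses each message twice, so it trades CPU for memory.

```python
import mailbox
from email.utils import mktime_tz, parsedate_tz


def date_key(message):
    """Return a UTC timestamp for sorting; undated mail gets 0.0."""
    parsed = parsedate_tz(str(message.get("Date", "")))
    if parsed is None:
        return 0.0
    try:
        return float(mktime_tz(parsed))
    except (ValueError, OverflowError):
        # Nonsense dates are treated like missing ones.
        return 0.0


def sort_mbox(src_path, dest_path):
    """Write the messages of src_path to dest_path in date order."""
    src = mailbox.mbox(src_path, create=False)
    dest = mailbox.mbox(dest_path)
    # Only timestamps and integer keys are held in memory here.
    order = sorted((date_key(src[key]), key) for key in src.keys())
    dest.lock()
    try:
        for _, key in order:
            # Re-read each message from disk and append it to the output.
            dest.add(src[key])
    finally:
        # close() flushes pending writes and releases the lock.
        dest.close()
        src.close()
    return len(order)
```

Usage would be something like sort_mbox("all.mbox", "sorted.mbox"), leaving the original file untouched.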
Getting it to the mail server
This wasn't too difficult. First I sent the blob:
$ gzip all.mbox
$ scp all.mbox.gz mail.kaashif.co.uk:~/
Then, on the server, I used a nifty program called formail, which comes with procmail and applies a command to each mail in an mbox:
$ gunzip all.mbox.gz
$ formail -ds procmail < all.mbox
And the mail was sorted as I specified and it's all there!
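To make the "apply a command to each mail" idea concrete, here is a rough Python 3 illustration of that splitting step: walk the mbox and feed each message to a command on stdin. It is only a sketch of the concept, not a replacement for formail, and the function name pipe_each_message is hypothetical.

```python
import mailbox
import subprocess


def pipe_each_message(mbox_path, command):
    """Run `command` once per message, feeding the message on stdin.

    Returns the exit codes, one per message, in mbox order.
    """
    box = mailbox.mbox(mbox_path, create=False)
    exit_codes = []
    try:
        for key in box.keys():
            # from_=True keeps the "From " separator line on each message.
            raw = box.get_bytes(key, from_=True)
            result = subprocess.run(command, input=raw)
            exit_codes.append(result.returncode)
    finally:
        box.close()
    return exit_codes
```

For example, pipe_each_message("all.mbox", ["procmail"]) would hand every message to procmail in turn, and a non-zero entry in the returned list points at a message that was not delivered cleanly.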