De-duping attachments

Rob Mueller robm at fastmail.fm
Tue Sep 14 22:13:03 EDT 2010


> How difficult or easy would it be to modify Cyrus to strip all
> attachments from emails and store them separately in files? In the
> message file, replace the attachment with a special tag which will point
> to the attachment file. Whenever the message is fetched for any reason,
> the original MIME-encoded message will be re-constructed and delivered.

Like anything, it's doable, but it would be quite a lot of work.

Cyrus likes to mmap the whole message file so it can just offset into it to
extract whichever part is requested. In IMAP, a client can request any
arbitrary byte range of the raw RFC822 message using the BODY[]<start.length>
construct, so if you remove attachments you have to be able to reconstruct
the original email byte for byte.
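
To make the byte-range point concrete, here is a minimal sketch in Python (not Cyrus's actual C code) of serving a partial fetch by slicing into an mmapped message file. The helper name fetch_partial and the sample message are made up for illustration.

```python
import mmap
import os
import tempfile

def fetch_partial(path, start, length):
    """Return `length` bytes starting at offset `start` of the raw message file.

    Hypothetical helper: it mimics what a BODY[]<start.length> request asks
    for, assuming the file on disk is the exact message that was delivered.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[start:start + length]

# Made-up message, written to a temporary file so the sketch is runnable.
raw = b"From: a@example.com\r\nSubject: hi\r\n\r\nbody text\r\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name
try:
    # Roughly what a client asking for BODY[]<6.13> expects back.
    chunk = fetch_partial(path, 6, 13)
finally:
    os.unlink(path)

print(chunk)
```

If the stored file had an attachment replaced by a pointer tag, every offset after the tag would shift, so a slice like this would only be correct after the full original message had been rebuilt.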

Consider the problem of transfer encoding. Say you have a base64-encoded
attachment (which basically all are). When storing and deduping, you'd want
to base64 decode it to get the underlying binary data. But depending on the
line length of the base64-encoded data, the same file can be encoded in a
large number of different ways. When you reconstruct the base64 data, you
have to be byte accurate so that your offsets are correct and so that any
signing of the message (e.g. DKIM) isn't broken.
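
As a rough illustration of the encoding problem, the Python sketch below wraps the same binary payload at two different line lengths. The function encode_wrapped and the payload are invented for the example. It assumes every line has the same width and the same line ending; a real mail client may not be that regular, in which case even more layout detail would have to be stored.

```python
import base64

def encode_wrapped(data, width, eol=b"\r\n"):
    """Base64-encode `data` and wrap it at `width` characters per line.

    Hypothetical helper: assumes uniform line width and line ending.
    """
    b64 = base64.b64encode(data)
    lines = [b64[i:i + width] for i in range(0, len(b64), width)]
    return eol.join(lines) + eol

payload = bytes(range(256)) * 4          # stand-in for an attachment

as_76 = encode_wrapped(payload, 76)      # one sender's wrapping
as_64 = encode_wrapped(payload, 64)      # another sender's wrapping

# Both decode to the same binary, so they would dedupe to one stored blob
# (b64decode skips the line breaks by default).
same_binary = base64.b64decode(as_76) == base64.b64decode(as_64) == payload

# The encoded text differs, so byte offsets and signatures differ too.
same_text = as_76 == as_64

# Rebuilding the exact original text needs the original wrapping parameters.
rebuilt_ok = encode_wrapped(base64.b64decode(as_76), 76) == as_76
rebuilt_wrong = encode_wrapped(base64.b64decode(as_76), 64) == as_76

print(same_binary, same_text, rebuilt_ok, rebuilt_wrong)
```

So alongside the deduped binary you would need to keep, per message, at least the line width and line-ending style used by the sender, and possibly more if the wrapping was irregular.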

Once you've solved those problems, the rest is pretty straightforward :)

Rob


