De-duping attachments
Rob Mueller
robm at fastmail.fm
Tue Sep 14 22:13:03 EDT 2010
> How difficult or easy would it be to modify Cyrus to strip all
> attachments from emails and store them separately in files? In the
> message file, replace the attachment with a special tag which will point
> to the attachment file. Whenever the message is fetched for any reason,
> the original MIME-encoded message will be re-constructed and delivered.
Like anything, doable, but quite a lot of work.
Cyrus likes to mmap the whole file so it can just offset into it to extract
whichever part is requested. In IMAP, you can request any arbitrary byte
range from the raw RFC822 message using the BODY[]<start.length> construct,
so if you remove attachments you have to be able to reconstruct the original
email byte for byte.
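To make the offset problem concrete, here is a minimal Python sketch (not Cyrus code). The helper name partial_fetch and the sample message are made up for illustration; the slice only approximates what a BODY[]<start.length> fetch returns. It shows that a rebuilt message which differs only in line endings already hands back the wrong bytes for the same range.

```python
def partial_fetch(raw: bytes, start: int, length: int) -> bytes:
    """Hypothetical stand-in for BODY[]<start.length>: `length` bytes at offset `start`."""
    return raw[start:start + length]


# Sample message, invented for this example.
original = (
    b"Subject: test\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello world\r\n"
)

# A "reconstruction" that is semantically the same but uses LF instead of CRLF.
rebuilt = original.replace(b"\r\n", b"\n")

# The body starts at offset 43 in the original (15 + 26 + 2 header bytes).
print(partial_fetch(original, 43, 5))  # b'hello'
print(partial_fetch(rebuilt, 43, 5))   # b'lo wo' (every later offset has shifted)
```

A client that cached the first half of the message and then asks for the second half by offset would stitch together garbage in the second case.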
Consider the problem of transfer encoding. Say you have a base64 encoded
attachment (which basically all are). When storing and deduping, you'd want
to base64 decode it to get the underlying binary data. But depending on the
line length of the base64 encoded data, the same file can be encoded in a
large number of different ways. When you reconstruct the base64 data, you
have to be byte accurate in your reconstruction so your offsets are correct,
and so any signing of the message (e.g. DKIM) isn't broken.
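A rough Python sketch of the encoding problem, under some loud assumptions: attachments are wrapped at one fixed line width with CRLF line endings, and the names encode_wrapped, try_strip and the in-memory store dict are all hypothetical. The idea is to dedupe on a hash of the decoded data, remember the line width per message, and only strip the attachment if re-encoding reproduces the original bytes exactly. Anything that doesn't round-trip is left in the message as is.

```python
import base64
import hashlib


def encode_wrapped(data: bytes, width: int) -> bytes:
    """Base64-encode `data`, wrapping at `width` characters with CRLF line endings."""
    b64 = base64.b64encode(data)
    lines = [b64[i:i + width] for i in range(0, len(b64), width)]
    return b"\r\n".join(lines) + b"\r\n"


def try_strip(encoded: bytes, store: dict):
    """Return (digest, width) if `encoded` can be rebuilt byte for byte, else None."""
    width = len(encoded.split(b"\r\n", 1)[0])  # guess the width from the first line
    if width == 0:
        return None
    try:
        data = base64.b64decode(encoded)  # default mode skips the CR/LF characters
    except ValueError:
        return None
    if encode_wrapped(data, width) != encoded:
        return None  # odd wrapping or line endings: keep the attachment inline
    digest = hashlib.sha256(data).hexdigest()
    store.setdefault(digest, data)  # one stored copy per distinct payload
    return digest, width


payload = bytes(range(256)) * 4
enc76 = encode_wrapped(payload, 76)
enc72 = encode_wrapped(payload, 72)

store = {}
ref76 = try_strip(enc76, store)
ref72 = try_strip(enc72, store)

print(enc76 == enc72)        # False: same file, different bytes on the wire
print(ref76[0] == ref72[0])  # True: both dedupe to the same stored payload
print(encode_wrapped(store[ref72[0]], ref72[1]) == enc72)  # True: exact rebuild
```

Real mail is messier than this (uneven line lengths, bare LF, trailing whitespace), which is why the round-trip check and the fallback matter more than the happy path.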
Once you've solved those problems, the rest is pretty straightforward :)
Rob