safe to modify messages stored in filesystem

Bron Gondwana brong at fastmail.fm
Tue Dec 2 15:41:47 EST 2014


On Wed, Dec 3, 2014, at 07:21 AM, David Mansfield wrote:
> Hi All:
> 
> Would it be "safe" to modify the message files in the /var/spool/imap 
> directory, say to strip out attachments?

No, don't do it.  You'd have to know a ton about how Cyrus internals
work to even try.
 
> Could it be done "online" or would I need to shut down the imapd 
> temporarily?
> 
> Or should it be done via IMAP protocol?  I'd be afraid of losing 
> internaldate or uid or something and having people's mailboxes go 
> squirrelly.

It's best done via IMAP for sure.  The mailboxes will still get messed up a
bit, because re-injected messages get new UIDs and end up out of order unless
you re-inject EVERY email, whether or not you changed it, which keeps the
ordering.

Their clients will have to download everything again though.

> Assuming some indexes are out-of-sync, would "reconstruct" be helpful? 
> I need to avoid blowing away seen/deleted/flagged flags etc.

Reconstruct may have worked for this in Cyrus 2.3, but only because it changed
uidvalidity rather than changing UIDs.  Either way, clients have to download
everything.

> I'd like to de-duplicate attachments by removing them from the stored 
> email, (storing them in another location), then modifying the message to 
> indicate what attachments were stripped and where to find them.
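
For illustration only, here is a minimal sketch of the stripping step described
above, using Python's standard email package.  The function name
strip_attachments and the in-memory dict used as a content-addressed store are
hypothetical; nothing like this ships with Cyrus, and a real store would be a
directory or object store.  Attached messages (message/rfc822) are not removed
as a whole; only the attachments inside them are.

```python
import hashlib
from email import policy
from email.parser import BytesParser


def strip_attachments(raw: bytes, store: dict) -> bytes:
    """Return a copy of the message with each attachment replaced by a note.

    `store` is a hypothetical content-addressed store mapping a SHA-256 hex
    digest to the attachment bytes, so identical attachments are kept once.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    for part in msg.walk():
        # Skip containers; only leaf parts marked as attachments are stripped.
        if part.is_multipart() or not part.is_attachment():
            continue
        payload = part.get_payload(decode=True) or b""
        digest = hashlib.sha256(payload).hexdigest()
        store.setdefault(digest, payload)  # de-duplicate by content hash
        name = part.get_filename() or "unnamed"
        # Drop the payload and its Content-* headers, then leave a text note
        # saying what was removed and under which key it can be found.
        part.clear_content()
        part.set_content(f"Attachment removed: {name}\nsha256: {digest}\n")
    # Serialize with CRLF line endings, as IMAP APPEND expects.
    return msg.as_bytes(policy=policy.SMTP)
```

The result is a new message, so it still has to be put back into the mailbox
through IMAP rather than written over the file in the spool.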

Sounds like you'd be better off buying some more disks, honestly.
Really.  We've thought about this over the years, and come to the conclusion
that it's going to be horrible either way.

We did write a Perl script that can talk IMAP to the server and inject a copy
of the message without the attachment, and then delete the old copy.
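
The same inject-then-delete idea can be sketched in Python with the standard
imaplib module.  This is not the Perl script mentioned above, just a rough
illustration: replace_message and the rewrite callback are hypothetical names,
conn is assumed to be a logged-in imaplib.IMAP4 connection, and the server is
assumed to support UIDPLUS (UID EXPUNGE).  Flags and INTERNALDATE are copied
across, but the new copy still gets a new UID.

```python
import imaplib
import re


def replace_message(conn, mailbox, uid, rewrite):
    """Append a rewritten copy of message `uid`, then delete the original.

    `rewrite` is a hypothetical callback taking and returning the raw
    message bytes (for example, one that strips attachments).
    """
    conn.select(mailbox)
    typ, data = conn.uid("FETCH", str(uid), "(FLAGS INTERNALDATE BODY.PEEK[])")
    if typ != "OK" or not data or not isinstance(data[0], tuple):
        raise RuntimeError(f"could not fetch UID {uid}")
    raw = data[0][1]
    # Items may arrive before or after the literal, so join all the
    # non-literal pieces of the response before parsing them.
    meta = b" ".join(d[0] if isinstance(d, tuple) else d for d in data if d)
    # \Recent is set by the server and cannot be given to APPEND.
    flags = [f.decode() for f in imaplib.ParseFlags(meta) if f != b"\\Recent"]
    date = re.search(rb'INTERNALDATE ("[^"]*")', meta)
    if date is None:
        raise RuntimeError(f"no INTERNALDATE for UID {uid}")
    # Append first: the mailbox briefly holds both copies, so this can hit
    # quota, but the original is only removed once the copy is stored.
    typ, _ = conn.append(mailbox, "(" + " ".join(flags) + ")",
                         date.group(1).decode(), rewrite(raw))
    if typ != "OK":
        raise RuntimeError(f"APPEND failed for UID {uid}")
    conn.uid("STORE", str(uid), "+FLAGS.SILENT", "(\\Deleted)")
    conn.uid("EXPUNGE", str(uid))  # UIDPLUS: expunge only this message
```

To keep the mailbox in order you would call this for every UID in ascending
order, passing a rewrite that returns unchanged messages as they are.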

I'm working on the protocol lists to get an APPEND REPLACE command
implemented which allows you to append a new message which replaces a
previous one, so there's no risk of going over quota as you do this.

> This is against about 10 years of accumulated cruft in users' mailboxes, 
> not against incoming messages.

Yeah, but still.  Disk sizes are still growing fast enough that you can just
keep the lot for non-insane prices.

The work involved in doing this right is going to well outstrip the benefit unless
your archive is really massive - in which case, you're welcome to put the
effort into a Cyrus backend change that allows Cyrus to deduplicate attachments
inside its data model, and benefit everybody at the same time.

Regards,

Bron.

-- 
  Bron Gondwana
  brong at fastmail.fm

