Re: Backup compaction optimization in a block-level replication environment

ellie timoney ellie at fastmail.com
Thu Nov 7 17:13:14 EST 2019


I'm not sure if I'm just not understanding, but if the chunk offsets were to remain the same, then there's no benefit to compaction? A (say) 2 GB file full of zeroes between small chunks still occupies the same 2 GB on disk as one that's never been compacted at all!

And if you don't use the compaction feature, you might as well skip the backups system entirely: have your backup server just be a normal replica that doesn't accept client traffic (maybe with a very long cyr_expire -D time?), and shut it down on schedule for safe block/file-system backups to your offsite location.
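For the cyr_expire part, that would be an EVENTS entry in cyrus.conf along these lines (the 365-day delayed-delete window is purely illustrative; pick whatever retention actually suits you):

```
EVENTS {
  # Run expiry nightly, but keep delayed-delete mailboxes around for
  # ~a year so the replica stays useful for restores.
  delprune cmd="cyr_expire -E 3 -D 365" at=0400
}
```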

> Right now I've set backup_compact_minsize and backup_compact_maxsize to 
> zero but I'm not sure if even that is sufficient to prevent chunk 
> offsets moving.  Perhaps I need to disable the compaction event in 
> cyrus.conf entirely.

I don't have this system entirely in my head at the moment so I'm kinda just reading documentation here, but these settings are about optimising the gzip compression.  Each chunk is compressed separately, and the tradeoff here is that bigger chunks compress better, but if the file becomes corrupted somehow you lose entire chunks, so smaller chunks are safer.
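To illustrate the compression side of that tradeoff (just a sketch, not Cyrus code; zlib standing in for the gzip stream, and the sample data is made up):

```python
# Why bigger chunks compress better: each chunk starts with a cold
# compressor, so redundancy shared across chunks gets stored again
# in every one of them.
import zlib

data = b"Subject: hello\r\nFrom: someone@example.com\r\n" * 5000

# One big chunk: the compressor sees all the redundancy at once.
one_chunk = len(zlib.compress(data))

# Many small chunks: the same redundancy is re-learned (and
# re-stored) per chunk, so the total is larger.
small = [data[i:i + 4096] for i in range(0, len(data), 4096)]
many_chunks = sum(len(zlib.compress(c)) for c in small)

print(one_chunk, many_chunks)  # the single big chunk wins on size
```

The flip side, as above, is blast radius: corrupt one big chunk and you lose everything in it.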

> A compromise would need to be 
> struck between keeping chunk offsets fixed and wasted fragmented space 
> between chunks as they shrink.

This setting might be helpful:

>           backup_compact_work_threshold: 1
>               The  number of chunks that must obviously need compaction before the com‐
>               pact tool will go ahead with the compaction.  If set to  less  than  one,
>               the value is treated as being one.

If you set backup_compact_minsize/maxsize to a size that's comfortable/practical for your block backup algorithm, but then set a very lax backup_compact_work_threshold, you might be able to find a sweet spot where you get the benefits of compaction eventually, but aren't constantly changing every block in the file (until you do).  The default (1) basically makes compaction occur as soon as there's anything to compact out, just because the default had to be something, and without experiential data any other value would just be a hat rabbit.  But this sounds like a case where a big number would play nicer.
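Concretely, I'm picturing an imapd.conf fragment something like this (all three numbers are illustrative guesses, not recommendations, and I'm assuming the sizes are in kB of uncompressed data; check imapd.conf(5) for the exact units):

```
# Illustrative values only -- tune against your own backup files.
backup_compact_minsize: 12
backup_compact_maxsize: 48
# Don't bother compacting until lots of chunks would benefit, so most
# block-level backup runs see a purely append-only file.
backup_compact_work_threshold: 256
```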

I guess I'd try to target a minimum size of 1 disk block per chunk, and a maximum of (fair dice roll) 4 disk blocks? But you'd need some experimentation to figure out ballpark numbers, and you won't be able to tune it to exact block sizes, because the configured thresholds are the uncompressed data size, not the compressed chunk size on disk.
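Back-of-envelope version of that sizing, with made-up numbers (the 3:1 compression ratio in particular is a pure guess; measure it from your own backup files before trusting any of this):

```python
# Rough sizing sketch: convert a target of 1-4 *compressed* disk
# blocks per chunk into the *uncompressed* sizes the config options
# actually use.  Every constant here is a guess to be replaced with
# measurements.
DISK_BLOCK = 4096         # bytes per filesystem block on the backup volume
COMPRESSION_RATIO = 3.0   # assumed uncompressed:compressed ratio for mail data

min_uncompressed = int(1 * DISK_BLOCK * COMPRESSION_RATIO)
max_uncompressed = int(4 * DISK_BLOCK * COMPRESSION_RATIO)

print(min_uncompressed, max_uncompressed)  # e.g. 12288 49152 bytes
```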

On Wed, Nov 6, 2019, at 8:20 PM, Deborah Pickett wrote:
> (Sorry, that's a lot of big words.  I'll try explaining what I want to do.)
> 
> On my LAN I have a Cyrus IMAP server (3.0.11), and a dedicated Cyrus 
> backup server (patched with Ellie's shared-mailbox and 64-bit fseek 
> fixes).  These are connected by a nice fat link so backups happen fast 
> and often.  A scheduled compaction occurs each morning thanks to an 
> event in cyrus.conf.
> 
> I now want to back up the backups to an off-site server over a much 
> slower link.  The off-site server doesn't speak the Cyrus sync 
> protocol.  What it does do well is block-level backups: if only a part 
> of a file has changed, only that part needs to be transferred over the 
> slow link.  [I haven't decided whether my technology will be the rsync 
> --checksum protocol, or Synology NAS XFS replication, or Microsoft 
> Server VFS snapshots.  They all do block-level backups well.]
> 
> Since Cyrus backup files are append-only, they should behave well with 
> block-level backups. But—correct me if I'm wrong—compaction is going to 
> ruin my day because a reduction in the size of chunk (say) 5 moves the 
> start offset of chunk 6 (and so on).  Even if chunk 6 doesn't change 
> it'll have to be retransmitted in its entirety.
> 
> Right now I've set backup_compact_minsize and backup_compact_maxsize to 
> zero but I'm not sure if even that is sufficient to prevent chunk 
> offsets moving.  Perhaps I need to disable the compaction event in 
> cyrus.conf entirely.
> 
> I really want compaction, though, or else my backups are going to get 
> very, very big.
> 
> Which leads me to my idea.  What if compaction could be friendlier 
> towards block-level backups, by deliberately avoiding changing chunk 
> offsets in the backup file, even if that means gaps of unused bytes when 
> (the aforementioned) chunk 5 shrinks?  It won't always work out, for 
> instance when a chunk grows in size. A compromise would need to be 
> struck between keeping chunk offsets fixed and wasted fragmented space 
> between chunks as they shrink.
> 
> I haven't collected enough data to know if I am making the right 
> assumptions about how chunk size evolves over time and how effective 
> compaction is at removing cruft from a backup file.  Has anyone thought 
> about doing something like this with Cyrus backups?
> 
> -- 
> Deborah Pickett
> System Administrator
> Polyfoam Australia Pty Ltd
> ----
> Cyrus Home Page: http://www.cyrusimap.org/
> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
> To Unsubscribe:
> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus

