Re: Backup compaction optimization in a block-level replication environment

ellie timoney ellie at fastmail.com
Mon Nov 18 00:36:00 EST 2019


> Related: I had to apply the patch described in
> (https://www.mail-archive.com/info-cyrus@lists.andrew.cmu.edu/msg47320.html),
>  "backupd IOERROR reading backup files larger than 2GB", because during
> initial population of my backup, chunks tended to be multiple GB in size
> (my %SHARED user backup is 23 GB, compressed).  Will this patch be
> merged into the mainline?

Those were on master, but I'm not sure why I didn't cherry-pick them back to 3.0.  Anyway, I've done that now; they'll be in the next release.

> Progress report: I started with very large chunks (minimum 64 MB,
> maximum 1024 MB) and a threshold of 8 chunks but I found that compaction
> was running every time, even on a backup file that hardly changed.  Not
> certain why this would be; my current theory is that in chunks that size
> there is almost always some benefit to compacting, so the threshold is
> passed easily.  There were 41 chunks in my %SHARED backup.

Hmm.  Yeah, the threshold is "number of chunks that would benefit from compaction", so the larger the chunks, the more likely any given chunk is to benefit from compaction, and the more likely you are to trip that threshold.
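To put rough numbers on it from your report: with 41 chunks in the %SHARED backup and a threshold of 8, compaction kicks in as soon as about one chunk in five has anything at all to reclaim, and a multi-GB chunk holding that many messages almost always does, which is why it ran every time.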

On Sat, Nov 16, 2019, at 12:10 PM, Deborah Pickett wrote:
> Further progress report: with small chunks, compaction takes about 15 
> times longer.  It's almost as if there is an O(n^2) complexity 
> somewhere, looking at the rate that the disk file grows.  (Running perf 
> on a compaction suggests that 90% of the time ctl_backups is doing 
> compression, decompression, or calculating SHA1 hashes.) So I'm going 
> back to large-ish chunks again.  Current values:
> 
> backup_compact_minsize: 1024
> backup_compact_maxsize: 65536
> backup_compact_work_threshold: 10
> 
> The compression ratio was hardly any different (less than 1%) with many 
> small chunks compared with huge chunks.

That's really interesting to hear.  It sounds like the startup and teardown of a gzip stream may be more expensive than the compression/decompression work itself, so it's cheaper to aim for fewer, larger chunks than many small ones.

zlib's compression level runs from 0 (prioritise speed) to 9 (prioritise size), with 6 as the default, but the backup system doesn't expose it as a tunable option; it just uses zlib's default.  With enough data it might be interesting to make it tunable and see what impact it has, but I don't think we're at the stage of needing to care about it that much yet.
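
If anyone ever wants to experiment, making it tunable would just mean passing a configured level where the compression stream is initialised.  In plain zlib terms it looks roughly like this (a standalone sketch, not the actual backup code):

    #include <string.h>
    #include <zlib.h>

    /* Sketch only, not the backup code: gzip-compress `in` into `out`
     * at the given level (0 = fastest, 9 = smallest; zlib's default,
     * Z_DEFAULT_COMPRESSION, currently maps to level 6).  A tunable
     * option would simply feed a configured value into `level` here. */
    static int gzip_buffer(const unsigned char *in, unsigned long inlen,
                           unsigned char *out, unsigned long *outlen,
                           int level)
    {
        z_stream strm;
        memset(&strm, 0, sizeof(strm));

        /* windowBits of 15+16 asks zlib for a gzip wrapper */
        int r = deflateInit2(&strm, level, Z_DEFLATED, 15 + 16, 8,
                             Z_DEFAULT_STRATEGY);
        if (r != Z_OK) return r;

        strm.next_in = (unsigned char *) in;
        strm.avail_in = (uInt) inlen;
        strm.next_out = out;
        strm.avail_out = (uInt) *outlen;

        r = deflate(&strm, Z_FINISH);    /* single shot: all input at once */
        *outlen = strm.total_out;
        deflateEnd(&strm);

        return (r == Z_STREAM_END) ? Z_OK : Z_BUF_ERROR;
    }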

> Setting the work threshold to a number greater than 1 is only helping a 
> bit.  I think that the huge disparity between my smaller and larger user 
> backups is hurting me here.  Whatever I set the threshold to, it is 
> going to be simultaneously too large for most users, and too small for 
> the huge %SHARED user.

Food for thought.  Maybe instead of having one "%SHARED" backup, having one "%SHARED.foo" backup per top-level shared folder would be a better implementation?  I haven't seen shared folders used much in practice, so it's interesting to hear about it.

Looking at your own data, if you had one "%SHARED.foo" backup per top-level shared folder, would they be roughly user-sized pieces, or still too big?  If too big, how deep would you need to go down the tree before the worst offenders are a manageable size?  (If I make it split shared folders like this, maybe "how-deep-to-split-shared-folders" needs to be a configuration parameter, because I guess it'll vary from installation to installation.)

> Confession time: having inspected the source of ctl_backups, I admit to 
> misunderstanding what happens to chunks when compaction is triggered.  I 
> thought that each chunk was examined, and either the chunk is compacted, 
> or it is not (and the bytes in the chunk are copied from old to new 
> unchanged).  But compaction happens to the entire file: every chunk in 
> turn is inflated to /tmp and then deflated again from /tmp, minus any 
> messages that may have expired, so the likelihood of the compressed byte 
> stream being the same is slim.  That will confound the rsync rolling 
> checksum algorithm and the entire backup file will likely have to be 
> transmitted again.

Yeah, these files are append-only even within the backup system's own tooling.  Compacting a backup file to be smaller is literally re-streaming it to a new file, minus bits that aren't needed anymore, and then (if all goes well) renaming it back over the original.  It's meant to be atomic -- either it works, and you get the updated file, or something goes wrong, and the file is unchanged.  It's never modified in place!  (There's a note about this somewhere in the documentation, with regard to needing enough free disk space to write the second file in order to compact the first.)
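
If it helps to picture it, the general shape is the usual write-a-new-file-then-rename-over-it pattern.  A standalone sketch of that pattern (illustrative only, not the real ctl_backups code):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Illustrative only: the general "write a new file, then rename it
     * over the old one" pattern the compact step follows.  The original
     * is never modified in place; either the rename lands and you get
     * the compacted file, or it doesn't and the original is untouched.
     * This is also why you need free space for both copies at once. */
    static int rewrite_atomically(const char *path, const char *tmppath,
                                  const void *data, size_t len)
    {
        int fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) return -1;

        if (write(fd, data, len) != (ssize_t) len  /* stream the kept chunks */
            || fsync(fd) < 0) {                    /* make sure it hit disk  */
            close(fd);
            unlink(tmppath);
            return -1;
        }
        close(fd);

        /* rename(2) is atomic within one filesystem: readers see either
         * the old file or the new one, never a half-written mixture. */
        if (rename(tmppath, path) < 0) {
            unlink(tmppath);
            return -1;
        }
        return 0;
    }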
 
> With that in mind I've decided that I'll make compaction a weekend-only 
> task, take it out of cyrus.conf EVENTS and put a weekly cron/systemd job 
> in place.  During the week backups will be append-only, to keep rsync 
> happy.  At weekends, compaction will combine the last week of small 
> chunks, and I've got all weekend to transmit the hundred GB of backup 
> files offsite.

That sounds like a pretty sensible approach.
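
For reference, the cron version is basically the same ctl_backups invocation you'd otherwise have had in EVENTS.  Something like the following (a sketch only; adjust the user, binary path and mode flags for your installation, and check the ctl_backups man page to be sure):

    # /etc/cron.d/cyrus-backup-compact  (sketch; verify path and flags locally)
    # Compact all backups early on Saturday morning, as the cyrus user.
    30 2 * * 6   cyrus   /usr/lib/cyrus/bin/ctl_backups compact -A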

Cheers,

ellie

