Backup compaction optimization in a block-level replication environment
Deborah Pickett
debbiep at polyfoam.com.au
Fri Nov 15 20:10:45 EST 2019
Further progress report: with small chunks, compaction takes about 15
times longer. Judging by the rate at which the file on disk grows, it's
almost as if there is O(n^2) complexity somewhere. (Running perf on a
compaction suggests that ctl_backups spends 90% of its time
compressing, decompressing, or calculating SHA1 hashes.) So I'm going
back to large-ish chunks. Current values:
backup_compact_minsize: 1024
backup_compact_maxsize: 65536
backup_compact_work_threshold: 10
The compression ratio barely changed (less than 1% difference) between
many small chunks and a few huge ones.
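One way to put a number on the "almost O(n^2)" impression is to sample the size of the file being written at regular intervals and fit the slope of log(size) against log(time). The following is a minimal Python sketch, not something ctl_backups provides: the file path, polling interval and sample count are hypothetical, and the fit assumes growth follows a single power law over the sampling window. If writing n bytes costs time proportional to n^2, the size grows like the square root of elapsed time, so the slope comes out near 0.5. Linear work gives a slope near 1.0.

```python
import math
import os
import time


def sample_growth(path, interval_s=10.0, count=30):
    """Poll the size of a file that is being written.

    `path`, `interval_s` and `count` are hypothetical; point it at whatever
    file the compaction is currently writing.
    """
    start = time.monotonic()
    samples = []
    for _ in range(count):
        time.sleep(interval_s)
        size = os.path.getsize(path)
        if size > 0:  # log(0) is undefined, so skip empty samples
            samples.append((time.monotonic() - start, size))
    return samples


def growth_exponent(samples):
    """Least-squares slope of log(bytes) against log(seconds).

    samples: (elapsed_seconds, file_bytes) pairs.
    ~1.0 means the file grows linearly in time (O(n) work overall);
    ~0.5 means the time needed to write n bytes grows like n**2.
    """
    xs = [math.log(t) for t, _ in samples]
    ys = [math.log(b) for _, b in samples]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# Hypothetical usage while a compaction runs:
#   print(growth_exponent(sample_growth("/path/to/file/being/written")))
```

A log-log fit is used rather than eyeballing the growth rate because the exponent is independent of the absolute speed of the machine.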
Setting the work threshold to a number greater than 1 helps only a
bit. I think the huge disparity in size between my smaller and larger
user backups is hurting me here. Whatever I set the threshold to, it is
going to be simultaneously too large for most users and too small for
the huge %SHARED user.
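To make the size disparity concrete, here is a toy model in Python. It is an assumption, not ctl_backups' actual logic: it reads backup_compact_work_threshold as "the number of chunks that obviously need compaction before a compaction runs", counts a chunk as needing work if it falls below backup_compact_minsize or above backup_compact_maxsize (both in kilobytes), and assumes, as the next paragraph explains, that a compaction that does run rewrites the whole file. The two backups are invented for illustration, not measurements from this server.

```python
# Values from the post; imapd.conf gives these sizes in kilobytes.
MINSIZE_KB = 1024
MAXSIZE_KB = 65536
WORK_THRESHOLD = 10


def chunks_needing_work(chunk_sizes_kb, minsize_kb=MINSIZE_KB, maxsize_kb=MAXSIZE_KB):
    """Hypothetical, simplified rule: a chunk 'obviously needs work' if it is
    below minsize (a candidate for joining) or above maxsize (for splitting)."""
    return sum(1 for s in chunk_sizes_kb if s < minsize_kb or s > maxsize_kb)


def kb_rewritten(chunk_sizes_kb, threshold=WORK_THRESHOLD):
    """KB rewritten by one compaction pass under this model: nothing if the
    threshold isn't reached, otherwise the whole file, since every chunk is
    inflated and deflated again."""
    # Assumed: thresholds below 1 behave as 1.
    if chunks_needing_work(chunk_sizes_kb) < max(threshold, 1):
        return 0
    return sum(chunk_sizes_kb)


# Invented example backups.
small_user = [40_000] + [200] * 9        # ~40 MB compacted + 9 small daily chunks
shared = [60_000] * 1000 + [800] * 10    # ~60 GB compacted + 10 small daily chunks

print(chunks_needing_work(small_user), kb_rewritten(small_user))
print(chunks_needing_work(shared), kb_rewritten(shared))
```

With these made-up numbers the small user is left with nine unmerged chunks, while %SHARED gets its whole ~60 GB rewritten just to fold in 8 MB of new mail: a count-based threshold ignores file size, so one value is too high for one backup and too low for the other.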
Confession time: having inspected the source of ctl_backups, I admit to
misunderstanding what happens to chunks when compaction is triggered. I
thought that each chunk was examined and then either compacted or left
alone (with its bytes copied unchanged from the old file to the new).
But compaction applies to the entire file: every chunk in turn is
inflated to /tmp and then deflated again from /tmp, minus any messages
that have expired, so the chance of the compressed byte stream coming
out the same is slim. That will confound rsync's rolling checksum
algorithm, and the entire backup file will likely have to be
transmitted again.
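A minimal Python sketch of why a recompressed file defeats rsync, under stated assumptions: the "messages" are synthetic, zlib's deflate stands in for the chunk compression, the block size is hypothetical, and an exact block lookup replaces rsync's rolling weak checksum plus strong hash. The matching rule, any block of the old file at any offset in the new file, is the same as in rsync's delta transfer. The sketch drops the oldest message, as expiry would, and measures how much of the new file would have to be sent literally, with and without compression.

```python
import random
import zlib

BLOCK = 1024  # hypothetical block size; real rsync derives one from the file size


def make_messages(n, seed=1):
    """Synthetic 'messages': random words from a small vocabulary, so the
    data is compressible but not trivially repetitive."""
    rng = random.Random(seed)
    vocab = [f"word{i}" for i in range(500)]
    return [
        (f"Message-ID: <{i}@example.test>\n"
         + " ".join(rng.choice(vocab) for _ in range(60)) + "\n").encode()
        for i in range(n)
    ]


def literal_fraction(old: bytes, new: bytes, block: int = BLOCK) -> float:
    """Fraction of `new` an rsync-style delta would send as literal data.

    Simplification: old blocks are looked up by exact content rather than via
    rsync's rolling weak checksum plus strong hash, but the matching rule is
    the same: any block-aligned piece of `old` may match at any offset in `new`.
    """
    known = {old[i:i + block] for i in range(0, len(old) - block + 1, block)}
    literal = 0
    i = 0
    while i < len(new):
        if new[i:i + block] in known:
            i += block      # matched: sent as a reference to an old block
        else:
            literal += 1    # unmatched: this byte travels as literal data
            i += 1
    return literal / len(new)


msgs = make_messages(400)
old_raw = b"".join(msgs)
new_raw = b"".join(msgs[1:])  # the oldest message has expired
old_gz, new_gz = zlib.compress(old_raw), zlib.compress(new_raw)

raw_frac = literal_fraction(old_raw, new_raw)
gz_frac = literal_fraction(old_gz, new_gz)
print(f"uncompressed: {raw_frac:.1%} of the new file sent literally")
print(f"recompressed: {gz_frac:.1%} of the new file sent literally")
```

On the uncompressed data only the bytes near the removed message and the short tail of the file go across literally. Once the stream is recompressed, every byte after the change differs, so essentially the whole file has to be sent, which is the effect described above.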
With that in mind, I've decided to make compaction a weekend-only
task: I'll take it out of the EVENTS section of cyrus.conf and put a
weekly cron/systemd job in its place. During the week, backups will be
append-only, to keep rsync happy. At weekends, compaction will combine
the past week's small chunks, and I'll have all weekend to transmit the
hundred GB of backup files offsite.
--
Deborah Pickett
System Administrator
Polyfoam Australia Pty Ltd