Backup compaction optimization in a block-level replication environment

Deborah Pickett debbiep at polyfoam.com.au
Fri Nov 15 20:10:45 EST 2019


Further progress report: with small chunks, compaction takes about 15 
times longer.  Judging by the rate at which the disk file grows, it's 
almost as if there is an O(n^2) complexity somewhere.  (Running perf on 
a compaction suggests that 90% of the time ctl_backups is doing 
compression, decompression, or calculating SHA1 hashes.)  So I'm going 
back to large-ish chunks again.  Current values:

backup_compact_minsize: 1024
backup_compact_maxsize: 65536
backup_compact_work_threshold: 10

The compression ratio was hardly any different (less than 1%) with many 
small chunks compared with huge chunks.

Setting the work threshold to a number greater than 1 helps only a 
little.  I think that the huge disparity between my smaller and larger user 
backups is hurting me here.  Whatever I set the threshold to, it is 
going to be simultaneously too large for most users, and too small for 
the huge %SHARED user.

Confession time: having inspected the source of ctl_backups, I admit to 
misunderstanding what happens to chunks when compaction is triggered.  I 
thought that each chunk was examined and either compacted or left alone 
(with its bytes copied from old to new unchanged).  In fact, compaction 
rewrites the entire file: every chunk in 
turn is inflated to /tmp and then deflated again from /tmp, minus any 
messages that may have expired, so the likelihood of the compressed byte 
stream being the same is slim.  That will confound the rsync rolling 
checksum algorithm and the entire backup file will likely have to be 
transmitted again.
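
To illustrate why recompression defeats rsync while appending does not, 
here is a minimal sketch in Python.  It is a simplified model, not the 
ctl_backups code: it assumes a backup file is just zlib-compressed 
chunks laid end to end, the message contents and the 64-byte block size 
are made up, and shared_blocks() is a hypothetical stand-in for rsync's 
block matching.

```python
import hashlib
import zlib

BLOCK = 64  # hypothetical block size, only for this illustration


def make_messages(n):
    """Build n fake messages with some incompressible content each."""
    return [
        f"Message {i}: ".encode()
        + hashlib.sha1(str(i).encode()).hexdigest().encode() * 4
        + b"\n"
        for i in range(n)
    ]


def chunk(messages):
    """Model one backup chunk as a single zlib stream."""
    return zlib.compress(b"".join(messages))


def shared_blocks(old, new, block=BLOCK):
    """Fraction of old's fixed-size blocks found at any offset in new."""
    blocks = [old[i:i + block] for i in range(0, len(old) - block + 1, block)]
    if not blocks:
        return 0.0
    return sum(b in new for b in blocks) / len(blocks)


msgs = make_messages(220)

# Last week's backup file: one chunk holding the first 200 messages.
old = chunk(msgs[:200])

# Append-only update: a new chunk is added, old bytes are untouched.
appended = old + chunk(msgs[200:])

# Compaction: the first message has expired, everything is deflated again.
recompacted = chunk(msgs[1:200])

print("reusable after append:    ", shared_blocks(old, appended))
print("reusable after compaction:", shared_blocks(old, recompacted))
```

In the append case every old block is still present, so only the new 
tail would need to be sent.  After recompression the old file is no 
longer a prefix of the new one, and the fraction of reusable blocks 
drops accordingly.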

With that in mind I've decided to make compaction a weekend-only task: 
I'll take it out of cyrus.conf EVENTS and put a weekly cron/systemd job 
in place.  During the week backups will be append-only, to keep rsync 
happy.  At weekends, compaction will combine the last week of small 
chunks, and I've got all weekend to transmit the hundred GB of backup 
files offsite.

-- 
Deborah Pickett
System Administrator
Polyfoam Australia Pty Ltd


