Backup compaction optimization in a block-level replication environment
Deborah Pickett
debbiep at polyfoam.com.au
Thu Nov 7 21:35:40 EST 2019
On 2019-11-08 09:13, ellie timoney wrote:
> I'm not sure if I'm just not understanding, but if the chunk offsets were to remain the same, then there's no benefit to compaction? A (say) 2 GB file full of zeroes between small chunks is still the same 2 GB on disk as one that's never been compacted at all!
That's true. I suppose I'm imagining a threshold: if the file hits,
say, 20% wasted space, I can "defrag" the file and recover the lost
space, on the understanding that the next sync will then have to copy
the entire file again.
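That decision rule is simple enough to sketch. Nothing below is Cyrus code; the function name and parameters are illustrative, and it only shows the arithmetic behind "defrag once waste crosses a threshold":

```python
# Sketch of the "defrag when waste exceeds a threshold" rule described
# above. Names are illustrative, not part of any Cyrus API.

def should_compact(total_bytes: int, live_bytes: int,
                   threshold: float = 0.20) -> bool:
    """Return True when the fraction of dead space in the backup file
    meets or exceeds the threshold, signalling that a full-copy
    compaction (and hence a full re-sync of the file) is worth it."""
    if total_bytes == 0:
        return False
    wasted = (total_bytes - live_bytes) / total_bytes
    return wasted >= threshold

# A 2 GiB file holding only 1.5 GiB of live chunks is 25% waste.
print(should_compact(2 * 1024**3, int(1.5 * 1024**3)))  # True
```

The trade-off is exactly as described: below the threshold you keep block-level sync cheap; above it you pay for one full copy to reclaim the space.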
But you mentioned:
> And if you don't use the compaction feature, you might as well skip the backups system entirely, and have your backup server just be a normal replica that doesn't accept client traffic (maybe with a very long cyr_expire -D time?), and then you shut it down on schedule for safe block/file system backups to your offsite location.
... and that seems a more reasonable approach. I didn't know if copying
the filesystem of a (paused) Cyrus replica was a supported way of
backing up, but now I do. Is there a list of which database and index
files I need to copy apart from the files inside the partition structure?
> This setting might be helpful:
>
>> backup_compact_work_threshold: 1
>> The number of chunks that must obviously need compaction before the
>> compact tool will go ahead with the compaction. If set to less than
>> one, the value is treated as being one.
> If you set your backup_compact_min/max_sizes to a size that's comfortable/practical for your block backup algorithm, but then set a very lax backup_compact_work_threshold, you might be able to find a sweet spot where you're getting the benefits of compaction eventually, but are not constantly changing every block in the file (until you do). The default (1) is basically for compaction to occur as soon as there's something to compact out, just because the default had to be something, and without experiential data any other value would just be a hat rabbit. But this sounds like a case where a big number would play nicer.
>
> I guess I'd try to target a minimum size of 1 disk block per chunk, and a maximum of (fair dice roll) 4 disk blocks? But you'd need some experimentation to figure out ballpark numbers, and won't be able to tune it to exact block sizes, because the configured thresholds are the uncompressed data size, not the compressed chunk size on disk.
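Putting the quoted suggestions together, an imapd.conf fragment might look something like this. The option names are as documented in imapd.conf(5); the numeric values are purely illustrative starting points for experimentation, and since the size thresholds apply to uncompressed data they won't map directly onto on-disk block counts:

```
# Hypothetical tuning for a filesystem with 4 KiB blocks.
# Sizes are in kB of *uncompressed* chunk data, so expect the
# compressed on-disk chunks to come out smaller; tune empirically.
backup_compact_minsize: 4
backup_compact_maxsize: 16
# A deliberately lax threshold so compaction (and the resulting
# whole-file re-copy) only happens once plenty of waste accumulates.
backup_compact_work_threshold: 32
```

The idea being that between compactions most chunks stay byte-identical, so the block-level replication only ships the tail of the file.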
Thanks, I saw that setting but didn't really think through how it would
help me. I'll experiment with it and report back.
--
*Deborah Pickett*
System Administrator
*Polyfoam Australia Pty Ltd*