Re: [RFC] multiplexing cyrus replication with log/log-run sharding & multiple sync_client

Bron Gondwana brong at fastmailteam.com
Thu Nov 21 10:37:49 EST 2019


Oh, I should add one other thing you might be interested in, which I've made some initial stabs at: synchronous replication. The idea is to embed the sync_client logic into mailbox commit, so that any action that writes to a mailbox does a pass through, creates a "SYNC APPLY MAILBOX" dlist stanza, and shoves it down the wire at a replica. There are a couple of bits missing so far. It needs a way to upload the message content as well, which I'm probably just going to embed directly in the RECORD component; that's kind of ugly, but it keeps everything in a single DLIST. And obviously it needs to not fail entirely if the replica is down, so it might be fire-and-forget, or it might have a timeout after which it syslogs but returns.

It builds on top of the existing SINCE_MODSEQ and SINCE_UIDNEXT logic that's already in master, and will also want a "sync cache", which will store the remote MAILBOX line for each mailbox, so you can generate a SYNC APPLY without first having to do a SYNC GET to find the current remote state. Assuming the replica hasn't changed in the meantime, this will allow a single round trip to apply changes, rather than the current 4 round trips for an APPEND.

(truly, it's 4 round trips!)

S0 SYNCGET MAILBOX user.cassandane
S1 SYNCAPPLY RESERVE
S2 SYNCAPPLY MESSAGE
S3 SYNCAPPLY MAILBOX

In my plan, the "SYNCGET MAILBOX" would not be needed, because you'd already know the remote state. The "RESERVE" would not be needed because you'd already know from the local conversations.db that this message wasn't listed in any other mailbox or with a UID less than the UIDNEXT of the remote mailbox from that known remote state, so all you'd have left is the SYNCAPPLY MESSAGE and the SYNCAPPLY MAILBOX. So it's just a matter of merging those into a single round trip with some nice combined format, and you're done :)
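
A rough sketch of that planning logic, for illustration only. This is not Cyrus code: `SyncCache`, `plan_append`, the `maybe_on_replica` flag and the "COMBINED APPLY" label are all hypothetical names, and the cached remote state is reduced to a single UIDNEXT value per mailbox.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

FULL_SEQUENCE = [
    "SYNCGET MAILBOX",
    "SYNCAPPLY RESERVE",
    "SYNCAPPLY MESSAGE",
    "SYNCAPPLY MAILBOX",
]


@dataclass
class SyncCache:
    """Hypothetical cache of the last known remote state, keyed by mailbox."""
    remote_uidnext: Dict[str, int] = field(default_factory=dict)

    def lookup(self, mailbox: str) -> Optional[int]:
        return self.remote_uidnext.get(mailbox)


def plan_append(cache: SyncCache, mailbox: str, maybe_on_replica: bool) -> List[str]:
    """Return the round trips needed to replicate one APPEND.

    maybe_on_replica stands in for the conversations.db check: True means
    the message might already exist on the replica, so RESERVE is kept.
    """
    if cache.lookup(mailbox) is None:
        # No cached remote state: fall back to the current four round trips.
        return list(FULL_SEQUENCE)
    steps = []
    if maybe_on_replica:
        steps.append("SYNCAPPLY RESERVE")
    # Hypothetical combined stanza carrying message content and mailbox state.
    steps.append("COMBINED APPLY (MESSAGE + MAILBOX)")
    return steps
```

With an empty cache this returns all four steps; once the remote state is cached and the message is known to be new, it returns a single step.
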

Bron.

On Fri, Nov 22, 2019, at 02:25, Bron Gondwana wrote:
> Wow, interesting. That definitely works, though I'd probably normalise everything to the user ID so that the seen and mailbox events for the same user got the same channel.
> 
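
A minimal sketch of what normalising to the user ID could look like. It assumes mailbox arguments have the form "domain!user.name.Folder" and user arguments (as on SEEN lines) the form "name@domain"; `user_key` is a hypothetical helper, not something in Cyrus.

```python
def user_key(first_arg: str) -> str:
    """Map a sync log argument to 'name@domain' (an assumed canonical form)."""
    if "!" in first_arg:
        # Mailbox form: 'domain!user.name[.Folder...]'
        domain, _, local = first_arg.partition("!")
        parts = local.split(".")
        if parts[0] == "user" and len(parts) >= 2:
            return f"{parts[1]}@{domain}"
        # Shared or otherwise non-user mailbox: leave unchanged.
        return first_arg
    # User form is assumed to already be 'name@domain'.
    return first_arg
```

Hashing this key rather than the raw argument would put a user's SEEN and MAILBOX events on the same channel.
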
> We're looking at similar things for our setup too, either sharding or even per-user logs with a daemon which farms users out to multiple channels.
> 
> As for when we'd look at a sync daemon: probably next year. We're planning to land uuid based storage soon, which means that renaming users and mailboxes is really fast, then looking at replication channels on top of that would make more sense, because otherwise user renames become tricky.
> 
> I'll have a look at the diff when it isn't 11:30pm for me.
> 
> Cheers,
> 
> Bron
> 
> On Thu, Nov 21, 2019, at 18:50, Thomas Cataldo wrote:
>> Hi,
>> 
>> In our workload, cyrus replication latency is pretty critical as we serve most read requests from the replica.
>> Having a single network channel between master & replica is a big issue for us.
>> 
>> Trying to improve our latency, we implemented the following approach : instead of writing “channel/log” we write “channel/log.<shard_index>”.
>> We compute our shard key this way :
>> 
>> # cat log.0 
>> APPEND devenv.blue!user.tom.Sent
>> MAILBOX devenv.blue!user.tom.Sent
>> 
>> # cat log.2 
>> SEEN tom@devenv.blue 9f799278-a6cd-45b7-9546-0e861d5e15d6
>> 
>> # cat log.3 
>> APPEND devenv.blue!user.sga
>> MAILBOX devenv.blue!user.sga
>> 
>> We compute a hash code of the first argument. We normalize it so that devenv.blue!user.tom.Sent and devenv.blue!user.tom have the same hash code, then we take “hashcode % shard_count” to figure out which log file to use.
>> We patched sync_client to add a “-i <shard_index>” option. sync_client -i 0 will process log.0 and use log-run.0, etc.
>> 
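
The shard selection described above could be sketched as follows. This is an illustration, not the attached patch: `shard_root` and `shard_index` are hypothetical names, CRC32 is a stand-in because the hash function actually used is not stated, and "." is assumed to be the hierarchy separator.

```python
import zlib


def shard_root(mailbox: str) -> str:
    """Reduce 'domain!user.name.Folder' to 'domain!user.name'.

    Names that do not start with the 'user' component are returned unchanged.
    """
    domain, sep, local = mailbox.rpartition("!")
    parts = local.split(".")
    if parts[0] == "user" and len(parts) >= 2:
        local = ".".join(parts[:2])
    return domain + sep + local


def shard_index(first_arg: str, shard_count: int) -> int:
    """Pick the log.<index> file for a sync log line's first argument."""
    # CRC32 is an assumption; any stable hash would do for this sketch.
    return zlib.crc32(shard_root(first_arg).encode("utf-8")) % shard_count
```

A stable hash matters here: Python's built-in hash() is salted per process for strings, so it would send the same mailbox to different shards after a restart.
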
>> We don’t spawn sync_client from cyrus.conf; we prefer systemd tricks:
>> 
>> /lib/systemd/system/bm-cyrus-syncclient@.service is a template, and we then run:
>> systemctl enable bm-cyrus-syncclient@{0..3} to spawn 4 sync_client instances.
>> 
>> 
>> Attached diff of what we changed. 
>> 
>> As a side note, our usage forbids moving a mailbox folder into another mailbox (i.e. moving user.tom.titi into user.sga.stuff is forbidden in our setup). I guess this approach would be problematic when moving a mailbox subfolder to another mailbox, as the two might be sharded to separate log files.
>> 
>> Any feedback on this approach ? I read that you planned to turn sync_client into a sync daemon. Any schedule estimate on that ?
>> 
>> Regards,
>> Thomas.
>> 
>> 
>> sync_client systemd configuration template :
>> /lib/systemd/system/bm-cyrus-syncclient@.service (%i is expanded to 42 by systemd when you enable syncclient@42)
>> [Unit]
>> Description=BlueMind Cyrus sync_client service
>> After=bm-cyrus-imapd.service
>> PartOf=bm-cyrus-imapd.service
>> ConditionPathExists=!/etc/bm/bm-cyrus-imapd.disabled
>> 
>> [Service]
>> Type=forking
>> Environment=CONF=/etc/imapd.conf
>> ExecStartPre=/usr/bin/find /var/lib/cyrus/sync -name 'log*.%i' -type f -exec rm -f {} \;
>> ExecStart=/usr/sbin/sync_client -C $CONF -t 1800 -n core -i %i -l -r
>> SuccessExitStatus=75
>> RemainAfterExit=no
>> Restart=always
>> RestartSec=5s
>> TimeoutStopSec=20s
>> 
>> [Install]
>> WantedBy=bm-cyrus-imapd.service
>> 
>> 
>> 
>> 
>> 
>> Thomas Cataldo
>> Directeur Technique
>> 
>> (+33) 6 42 25 91 38
>> 
>> BlueMind
>> +33 (0)5 81 91 55 60
>> Hotel des Télécoms, 40 rue du village d'entreprises
>> 31670 Labège, France
>> www.bluemind.net / https://blog.bluemind.net/fr/
>> 
>> 
>> 
>> 
>> *Attachments:*
>>  * replication_multiplexing.diff
> 
> --
>  Bron Gondwana, CEO, Fastmail Pty Ltd
>  brong at fastmailteam.com
> 

--
 Bron Gondwana, CEO, Fastmail Pty Ltd
 brong at fastmailteam.com



More information about the Cyrus-devel mailing list