Split conversations and cost
Bron Gondwana
brong at fastmail.fm
Thu Apr 20 22:28:59 EDT 2017
So running commands on alh's massively split folder in the new conversation-splitting-world that we have on master, I get this annotated callgrind.
Commands:
. login
. select INBOX.alerts
. xconvsort (reverse arrival) (conversations position (691 30)) utf-8 all
. logout
1 OK Completed (in 92.756 secs) (it's about 15 times slower than running without callgrind - this is a 6s command usually)
16,154,183,725 /usr/src/cyrus-next-build/cyrus/master/service.c:main [/usr/cyrus-next/libexec/imapd]
16,148,984,824 /usr/src/cyrus-next-build/cyrus/imap/imapd.c:service_main [/usr/cyrus-next/libexec/imapd]
16,148,289,216 /usr/src/cyrus-next-build/cyrus/imap/imapd.c:cmdloop [/usr/cyrus-next/libexec/imapd]
15,826,028,681 /usr/src/cyrus-next-build/cyrus/imap/mailbox.c:mailbox_reload_index_record [/usr/cyrus-next/lib/libcyrus_imap.so.0.0.0]
15,820,943,656 /usr/src/cyrus-next-build/cyrus/imap/mailbox.c:mailbox_read_index_record [/usr/cyrus-next/lib/libcyrus_imap.so.0.0.0]
15,447,150,662 /usr/src/cyrus-next-build/cyrus/imap/annotate.c:annotatemore_msg_lookup [/usr/cyrus-next/lib/libcyrus_imap.so.0.0.0]
And there we have it. We're spending 95% of that time doing annotation lookups for split conversations:
if ((record->system_flags & FLAG_SPLITCONVERSATION)) {
struct buf annotval = BUF_INITIALIZER;
annotatemore_msg_lookup(mailbox->name, record->uid, IMAP_ANNOT_NS "basethrid", "", &annotval);
if (annotval.len == 16) {
const char *p = buf_cstring(&annotval);
/* we have a new canonical CID */
r = parsehex(p, &p, 16, &record->basecid);
}
buf_free(&annotval);
}
Which is all well and good, and we could possibly optimise this a bit more, but over 50% is just seeking to find the record in the twoskip file. That's not going to optimise much. Random access lookups on every read.
Alternative pathway ahead here is to change EVERYONE's cyrus.index files to have space to store a separate basecid field. Downside, it adds 8 bytes to every single cyrus.index record, regardless of whether they have any split conversations. Upside, fast access and simplicity on the split conversation handling.
(we also want to add some other interesting fields at the same time, for example a CREATEDDATE which will allow us to implement purge based on when a message was added to a folder by a move rather than on its INTERNALDATE).
... another alternative here is to delay reading basecid until we actually need it. There aren't many places that request it. If we had rsto's soon-to-land wrapper around index records, we could probably delay reading until someone asks for the basecid.
Bron.
--
Bron Gondwana
brong at fastmail.fm
More information about the Cyrus-devel
mailing list