Cyrus bulk-loading with APPEND (was: Re: imclient >4k literal error)

dkaiser at som.llu.edu dkaiser at som.llu.edu
Mon Apr 7 14:40:17 EDT 2003


Hi,

I've just written a similar script, except that it runs via command-line PHP
(although you could wrap the user-copy function in a web form).  My script
runs an LDAP query, grabs the user list, does the mail migration for each user,
then leaves a "bookmark" attribute in LDAP so that, if I have to restart the
script, I won't re-copy users that are already done.  (I'm running against a
temporary LDAP server where I've set each user's password to something like
"default", and will point cyrus-imap to the real LDAP server following the
migration.)
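
The restart logic described above can be sketched roughly as follows.  This is
not the PHP script from this post; it is a minimal Python illustration in which
`directory` (a plain dict standing in for LDAP), the attribute name
`mailMigrated`, and the `copy_mailbox` callback are all hypothetical.

```python
def migrate_all(users, directory, copy_mailbox, bookmark_attr="mailMigrated"):
    """Copy each user's mail once, recording a per-user bookmark.

    `directory` is a hypothetical stand-in for LDAP: a dict mapping a
    username to a dict of attributes. `copy_mailbox` is whatever function
    actually moves one user's mail (assumed to raise on failure).
    """
    copied = []
    for user in users:
        attrs = directory.setdefault(user, {})
        if attrs.get(bookmark_attr):
            continue  # bookmark present: this user was finished in an earlier run
        copy_mailbox(user)            # if this raises, no bookmark is written
        attrs[bookmark_attr] = True   # set only after a successful copy
        copied.append(user)
    return copied
```

Because the bookmark is written only after a user's copy succeeds, a run that
dies partway through can simply be started again and will pick up at the first
user without a bookmark.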

(Elmo, I'm going to e-mail you off-list and exchange scripts, if you don't
mind...)

This script is pretty efficient, except that I'm coming to the conclusion that
Cyrus IMAP is not really set up for bulk loading via APPEND.

[root@sweeper imap]# pwd
/usr/src/redhat/BUILD/cyrus-imapd-2.1.12/imap

[root@sweeper imap]# grep -c fsync * | grep -v ":0$"
append.c:4
mailbox.c:11
message.c:1
reconstruct.c:1
seen_local.c:3

The reason for all those fsync calls is simple.  I would claim that Cyrus has
the inherent design goal of guaranteeing that an incoming message (delivered
asynchronously from any number of lmtp/deliver agents) is written to disk ASAP,
to minimize the potential data loss due to a system outage.

Since my goal is a bulk migration, and I also own the source of the messages
and can guarantee the delivery time (I run the script), it is to my advantage
to find a way to minimize the amount of blocking due to the fsync() calls.

I am trying to patch Cyrus to fsync only at the end of each mailbox's worth of
APPEND calls, if that's even possible (I haven't really dug into the code
yet).  My average mailbox has ~1500 messages in it, and my server can easily
buffer that and then do a single batch of disk activity to write it to
storage.

Right now I'm looking at the possibility of doing something like:
*  remove the fsync() calls from append_commit in imap/append.c
*  call fsync() inside of mailbox_close() in imap/mailbox.c
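
To illustrate the idea behind that change (sync once at close instead of once
per append), here is a toy sketch in Python.  It is not Cyrus code and does not
mirror Cyrus's on-disk layout: `ToySpool` and `load` are hypothetical names,
and the "mailbox" is just a single append-only file.

```python
import os
import tempfile


class ToySpool:
    """Append-only file that syncs either per record or once at close."""

    def __init__(self, path, sync_each_append):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.sync_each_append = sync_each_append
        self.fsync_calls = 0
        self.dirty = False  # True while written data has not been fsync'ed

    def append(self, record):
        os.write(self.fd, record)
        self.dirty = True
        if self.sync_each_append:
            self._sync()  # current behaviour: block on disk for every record

    def _sync(self):
        os.fsync(self.fd)
        self.fsync_calls += 1
        self.dirty = False

    def close(self):
        if self.dirty:
            self._sync()  # deferred behaviour: one sync for the whole batch
        os.close(self.fd)


def load(path, records, sync_each_append):
    """Write all records and return how many fsync() calls were made."""
    spool = ToySpool(path, sync_each_append)
    for record in records:
        spool.append(record)
    spool.close()
    return spool.fsync_calls
```

The trade-off is the obvious one: with the deferred variant, a crash before
close can lose everything appended since the last sync.  That is only
tolerable in a case like this migration, where the source messages still
exist and the copy can be re-run.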

I have a dual Xeon (2.4 GHz) with 4 GB of RAM, multiple gigabit Ethernet
interfaces, etc...  this thing is fast.  When running my mail migration script
from a similarly fast server, the most I get is about 3.5 MB/minute copied.
I'm also getting about 2500 - 3000 interrupts per second on the RAID
controller.  :)

I don't have a whole week to migrate mail (5500 users) whilst watching a server
run 99% idle because it is syncing after every 20k of data.
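
For a rough sense of scale, the observed rate can be turned into a weekly
total.  This is only back-of-the-envelope arithmetic: it assumes the rate
quoted above is in megabytes per minute and stays constant around the clock,
and the variable names are illustrative.

```python
RATE_MB_PER_MIN = 3.5   # observed copy rate from this post (assumed megabytes)
USERS = 5500            # number of users to migrate, from this post

MINUTES_PER_WEEK = 7 * 24 * 60

# Total data copied in one week of non-stop running at the observed rate.
week_total_mb = RATE_MB_PER_MIN * MINUTES_PER_WEEK

# What that weekly total amounts to if spread evenly across all users.
per_user_mb = week_total_mb / USERS

print(f"{week_total_mb:.0f} MB per week, about {per_user_mb:.1f} MB per user")
# prints "35280 MB per week, about 6.4 MB per user"
```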

Is anyone else attempting to do a large-scale bulk migration via the IMAP APPEND
command?  If so, have you done any speed testing/measuring, and/or attempts at
increasing the server speed?  I would be interested in hearing from anyone,
success or failure.

Thanks!

On Wed, 2 Apr 2003, E.M. Recio wrote:
> I wrote a PHP script that does just that. It comes up with a form for old
> mail username/pass, new mail username/pass and click "go" and it transfers
> the mail. Let me know if you want a copy... this is what we are using to
> migrate from OpenMail to Cyrus-IMAPd for 2500 users.

--
David Kaiser <dkaiser at som.llu.edu>
Loma Linda University School of Medicine Information Systems

