Updating /seen from concurrent sessions

Lawrence Greenfield leg+ at andrew.cmu.edu
Thu Nov 14 21:16:41 EST 2002


--On Friday, November 15, 2002 12:52 PM +1100 Andrew McNamara 
<andrewm at object-craft.com.au> wrote:

> What's the general feeling on the skiplist implementation used in
> conjunction with Sun and NetApp's NFS (we're locked in to using this
> combination for various reasons)? Would you be more or less likely to
> trust it over db3?

In general, nothing in Cyrus is guaranteed to work over NFS. If you're only 
accessing the NFS store from a single client, things have a much better 
chance of working, but I really don't know what semantics Sun's NFS client 
and NetApp's NFS filer guarantee with regard to mmap() and write(). If the 
combination doesn't make changes made by write() visible through mmap() 
immediately (Cyrus tests for this in the configure script, but the configure 
script is probably not running on an NFS partition), you need to use 
map_nommap, which is very slow.
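
For anyone who wants to check their own NFS partition, here is a rough 
sketch of that kind of test. It is not the check from the Cyrus configure 
script; the function name mmap_sees_write is made up for illustration, and 
a single passing run does not prove the filesystem is coherent under load.

```python
import mmap
import os
import tempfile

def mmap_sees_write(directory):
    """Return True if a shared mapping reflects a later write() to the same file."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"A" * mmap.PAGESIZE)   # give the file one page of known content
        m = mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_READ)
        try:
            before = m[0:1]                  # touch the page so it is really mapped in
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, b"B")               # change the file via write(), not the mapping
            after = m[0:1]                   # a coherent system shows the new byte at once
            return before == b"A" and after == b"B"
        finally:
            m.close()
    finally:
        os.close(fd)
        os.unlink(path)
```

Point it at a directory on the partition that will hold the mail store. If 
it returns False there, that suggests map_nommap is needed.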

Berkeley db makes no guarantees of working over NFS. skiplist should work 
over NFS with a single client and map_nommap.

> Another question - it looks to me like I have to recompile to switch
> database types - is this true? The code looks like it would be flexible
> enough to allow a run-time config option to choose the method with very
> little modification?

It probably could be made a run-time option. Since you need to convert all 
of the different files, making it an easy run-time switch has never been a 
priority.

>> It would be possible to flush the seen state more often; it's just a
>> question of how often and when should other imapds look for it.
>
> If the imapd already can cope with asynchronous events, I would flush the
> state after a second or two of inactivity from the client. Failing that,
> I would probably flush the state before replying to the client (yes,
> this would hurt performance, although probably not much, particularly
> if we skip the fsync()).

You can't skip the fsync() because the fsync()s are what guarantee that 
the files will be in a consistent form if the system crashes. (The fsync()s 
are needed to guarantee the order in which operations reach the disk. This 
is true for Berkeley db, skiplist, flat files, whatever.)
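
To illustrate the ordering point, here is a minimal sketch of a log that 
writes a record, then a commit marker, with an fsync() between them. The 
file format and the names append_committed and committed_records are 
invented for this example (this is not the skiplist format), and it assumes 
records contain no newlines and are never the literal text COMMIT.

```python
import os

def append_committed(path, record):
    """Append record, then a commit marker, forcing each to disk in that order."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, record + b"\n")
        os.fsync(fd)                # the record is on disk before the marker is written
        os.write(fd, b"COMMIT\n")
        os.fsync(fd)                # the marker is on disk before we report success
    finally:
        os.close(fd)

def committed_records(path):
    """Return only the records that are followed by a commit marker."""
    with open(path, "rb") as f:
        lines = f.read().split(b"\n")
    records = []
    pending = None
    for line in lines:
        if line == b"COMMIT":
            if pending is not None:
                records.append(pending)
            pending = None
        elif line:
            pending = line          # not trusted until its marker shows up
    return records
```

Without the first fsync(), the system is free to write the marker to disk 
before the record, so a crash could leave a marker vouching for data that 
never arrived.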

> But this just fixes the OE problem - Cyrus would still have a problem
> (as far as I can see): all the other copies accessing that mailbox
> will still have their old seen files open (maybe using skiplist fixes
> this). The flat-file seen implementation needs to check to see if the
> file has been renamed under it (and do what?).

The flat file database layer (cyrusdb_flat) already knows how to do this at 
the appropriate time. The caching is being implemented in the seen layer 
(seen_db.c), not in the flat file implementation.
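
For reference, one common way to notice that a file has been renamed out 
from under an open descriptor is to compare device and inode numbers. This 
is only a sketch of the general technique; it is not taken from 
cyrusdb_flat, and reopen_if_replaced is a made-up name.

```python
import os

def reopen_if_replaced(fd, path):
    """Return a descriptor for the file now at path, reopening if fd is stale."""
    held = os.fstat(fd)                # the file we actually have open
    current = os.stat(path)            # the file the name points at right now
    if (held.st_dev, held.st_ino) == (current.st_dev, current.st_ino):
        return fd                      # same file, keep using it
    os.close(fd)                       # someone renamed a new file into place
    return os.open(path, os.O_RDONLY)
```

A reader would call this before trusting anything it has cached from the 
old descriptor.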

> To be honest, the flat file seen implementation is way more complicated
> than I would have thought was worthwhile. My preference would be to
> not hold the file open, and simply re-write the whole file each time we
> updated it, renaming the replacement into place (to make the operation
> atomic - this is also the only synchronous operation). My experience has
> been that unix is quite happy doing naive things like this while the
> file remains small (say less than 10k).

Whenever there is a change, the flat file does rewrite the entire file. The 
database layer holds the file open because it assumes that other operations 
(reads on other keys, things like that) will follow. Updates are very 
frequent, which is why the skiplist implementation can perform better.

However, updates can be an order of magnitude more frequent if we're going 
to write for every flag change. Cyrus is written with the expectation that 
you will have thousands of simultaneous clients working on tens or hundreds 
of thousands of mailboxes.

> I implemented a Postfix map that works this way - for lookups, it simply
> does a linear read/search of the file. For update, it writes a new file,
> and moves it into place. Generally this performed much better than
> more complex schemes such as the Sleepycat DB's - particularly when you
> consider memory footprint (this was on a machine with about 100k users,
> handling 10's of messages per second).

It doesn't scale when there are frequent updates. That's why we have the 
database abstraction, so we can choose the file format that does the job 
most effectively. cyrusdb_flat does exactly this, and it works ok when you 
don't need frequent updates. Seen state has frequent updates.
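
The scheme under discussion (linear scan for lookups, rewrite and rename 
for updates) looks roughly like the sketch below. It is an illustration 
only, not the cyrusdb_flat code: the tab-separated line format, the ".NEW" 
suffix and the function names are assumptions, and it does no locking, so 
two concurrent writers could lose an update.

```python
import os

def flat_lookup(path, key):
    """Linear scan for key in a file of tab-separated key/value lines."""
    try:
        with open(path, "rb") as f:
            for line in f:
                k, _, v = line.rstrip(b"\n").partition(b"\t")
                if k == key:
                    return v
    except FileNotFoundError:
        pass
    return None

def flat_store(path, key, value):
    """Rewrite the whole file with key set to value, then rename it into place."""
    entries = {}
    try:
        with open(path, "rb") as f:
            for line in f:
                k, _, v = line.rstrip(b"\n").partition(b"\t")
                entries[k] = v
    except FileNotFoundError:
        pass
    entries[key] = value
    tmp = path + ".NEW"
    with open(tmp, "wb") as f:
        for k in sorted(entries):
            f.write(k + b"\t" + entries[k] + b"\n")
        f.flush()
        os.fsync(f.fileno())       # new contents reach disk before the rename
    os.replace(tmp, path)          # readers see the old file or the new one, never a mix
```

Every flat_store() call reads and rewrites every entry and pays for an 
fsync(), so the cost of one update grows with the size of the file. That is 
the part that hurts when updates are frequent.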

Larry





More information about the Info-cyrus mailing list