Updating /seen from concurrent sessions

Andrew McNamara andrewm at object-craft.com.au
Thu Nov 14 23:44:45 EST 2002


>A lot of problems also result when people try to run the application on
>more than one computer hitting the same NFS server. But the thing that
>drives us application writers mad is the idea that rename() can return
>failure but have actually happened; and if you're trying to write a
>reliable application, you don't want to rely on the chance of this being
>minimized, since you know it's going to happen and you're going to be sorry.

That's certainly the NFS flaw that comes to mind. I happen to agree with
you that it's not enough to simply minimise the chances of something
untoward happening. 
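
For what it's worth, the defensive idiom I've seen for that case - purely
a sketch here, not something Cyrus actually does - is to treat a failed
rename() over NFS as suspect and look at the destination before believing
the error. The failure usually comes from a lost reply: the first request
does the rename, its reply goes missing, and the retransmission then fails
with ENOENT because the source is already gone.

    /* Hypothetical sketch, not Cyrus code: tolerate NFS's "rename()
     * failed but actually happened" case. */
    #include <errno.h>
    #include <stdio.h>
    #include <sys/stat.h>

    int careful_rename(const char *src, const char *dst)
    {
        struct stat st;
        int saved;

        if (rename(src, dst) == 0)
            return 0;
        saved = errno;

        /* Source gone but destination present: assume the earlier
         * (lost-reply) attempt did the work and call it success. */
        if (saved == ENOENT &&
            stat(src, &st) < 0 && errno == ENOENT &&
            stat(dst, &st) == 0)
            return 0;

        errno = saved;
        return -1;
    }

It's only a heuristic, of course - the destination may simply have existed
already - which is part of why the semantics are so maddening.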

>I would hope it would work with a single server with multiple processes. 
>But I really haven't thought about all the possibilities with NFS. (The 
>"return error and succeed" problem is just one that springs to mind, and 
>I've never audited the code thinking about that.)

Okay. Your comments are valued.

>Great, now I need to do bookkeeping to do this. Plus, on most Unix
>filesystems, rename() is a more expensive operation than 1 fsync() and
>probably even 2 fsync()s. And how am I supposed to programmatically
>determine whether or not a given version is valid?

Mmm. It was a half-baked idea that came from the observation that the
flat-file \Seen code was doing rename()s anyway.
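
For context, the pattern in question is the usual write-a-temp-file-then-
rename() update. A rough sketch (the function name and ".NEW" suffix are
invented for illustration, not Cyrus's actual code):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int replace_file(const char *path, const char *data, size_t len)
    {
        char tmp[1024];
        int fd;

        snprintf(tmp, sizeof(tmp), "%s.NEW", path);
        fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
            return -1;

        if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        if (close(fd) < 0 || rename(tmp, path) < 0) {
            unlink(tmp);
            return -1;
        }
        return 0;             /* rename() gives atomic replacement */
    }

The rename() is what buys the atomic replacement; the fsync() is what the
rest of this thread is arguing about.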

>Linux ext2 has this metadata problem. ext3 and reiserfs are both supposed
>to force metadata to disk when fsync() is called, similar to softupdates
>on BSD, Veritas, and most other modern filesystems. I'm willing to bet
>that I've wasted more time than you have worrying about the semantics of
>fsync() on various Unix filesystems.

Quite possibly. I've certainly wasted enough time on them over the years.
It's hard to prove that what a given O/S is doing is correct, even when
you have inside knowledge.
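
The usual belt-and-braces answer on filesystems with the ext2-style
problem - offered only as a sketch, not as what Cyrus should do - is to
fsync() the containing directory as well as the file, so the directory
entry created by the rename() also reaches disk:

    #include <fcntl.h>
    #include <unistd.h>

    /* Force directory metadata (e.g. the entry a rename() just created)
     * to disk by fsync()ing the directory itself.  Opening a directory
     * O_RDONLY for this works on the Unixes I know of. */
    int sync_dir(const char *dirpath)
    {
        int dfd = open(dirpath, O_RDONLY);

        if (dfd < 0)
            return -1;
        if (fsync(dfd) < 0) {
            close(dfd);
            return -1;
        }
        return close(dfd);
    }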

>You need to do the stat() regardless if you want the latest data. By
>keeping the file open, you potentially amortize the cost of an open(),
>another fstat() (to find the size of the file you've just opened) and an
>mmap(). All of these have different costs depending on your platform and
>your Unix.

Mmap is the killer - it often involves a lot of expensive setup within the
kernel. I'd tend to think that if you were using mmap() for read access to
the file, it should probably be modified in place, rather than replaced via
rename(). The flat-file \Seen implementation both mmap()s and rename()s the
file, and that looks to me like the source of its pain. But then you need
some sort of cheap synchronization scheme.
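
To make the amortisation concrete, here's roughly what I have in mind - a
sketch only, with invented names - assuming you keep the file open and
mmap'ed and use stat() to decide whether the mapping is stale:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    struct seen_map {
        int    fd;          /* -1 until first open */
        void  *base;        /* NULL until first mmap */
        size_t len;
        ino_t  ino;
        time_t mtime;
    };

    int seen_refresh(struct seen_map *m, const char *path)
    {
        struct stat st;

        if (stat(path, &st) < 0)
            return -1;
        if (m->base && st.st_ino == m->ino && st.st_mtime == m->mtime)
            return 0;                           /* mapping still current */

        if (m->base) {
            munmap(m->base, m->len);
            m->base = NULL;
        }
        if (m->fd < 0 || st.st_ino != m->ino) { /* file rename()d into place */
            if (m->fd >= 0)
                close(m->fd);
            if ((m->fd = open(path, O_RDONLY)) < 0)
                return -1;
        }
        m->base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, m->fd, 0);
        if (m->base == MAP_FAILED) {
            m->base = NULL;
            return -1;
        }
        m->len   = st.st_size;
        m->ino   = st.st_ino;
        m->mtime = st.st_mtime;
        return 0;
    }

Initialise it as { -1, NULL, 0, 0, 0 }; the inode comparison is what
notices a rename()d-in replacement, and the mtime comparison notices
in-place writes (with the usual one-second-granularity caveat).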

BTW, have you looked at Andrew Tridgell's Trivial Database? It uses mmap'ed
files and spin-locks to achieve good write performance, although I don't
think resilience in the face of crashes was a high priority. However, the
architecture-dependent spin-lock code may be handy if you ever decide to
follow this route.
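
The general shape of that locking, for illustration only (using GCC-style
atomic builtins here rather than TDB's own per-architecture assembly), is
a test-and-set word living inside the shared mapping:

    #include <sched.h>

    typedef volatile int spinlock_t;       /* lives inside the mapping */

    static void spin_lock(spinlock_t *lock)
    {
        while (__sync_lock_test_and_set(lock, 1))
            sched_yield();                 /* back off until released */
    }

    static void spin_unlock(spinlock_t *lock)
    {
        __sync_lock_release(lock);
    }

A crashed holder leaves the lock word set forever, which is presumably
part of why crash resilience wasn't the priority.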

>You have one database and weren't fsync()ing the data. Cyrus has thousands 
>of active databases and cares about the reliability of the data.

As it should.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
