Cyrus crashed on redundant platform - need better availability?

Wed Sep 15 11:04:29 EDT 2004

Hi,

--On Wednesday, 15 September 2004 13:38 +0200 Paul Dekkers 
<Paul.Dekkers at surfnet.nl> wrote:

>>> You are not using a clustered filesystem,
>>> right?
>>
>> No.
>
> I can imagine that would be one of the advantages of RH's clustering,
> since you don't have to mount a filesystem in that case for a machine
> that just crashed - it would save time...

I'm not sure if Red Hat even supports a clustered FS at this time. It 
certainly didn't when we set up the system more than two years ago.

> But I suppose RH's cluster manager takes care of mounting the partitions
> and checking them if there are any errors.

Right. The unmounting/mounting of partitions usually works fine, but there 
have been problems at times. The worst one was causing alternating crashes 
of both nodes:

sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block = 225139276, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block = 1919637002, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block = 894788200, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block = 1883792719, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block = 1347113037, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block = 829312330, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block = 893538370, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block = 1450341715, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block = 909390198, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block = 1366706293, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block = 846548333, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block = 1630746450, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block = 860649837, count = 1
EXT3-fs error (device sd(8,73)

leading to this:

Assertion failure in journal_forget_Rsmp_094dfde7() at transaction.c:1226: "!jh->b_committed_data"
------------[ cut here ]------------
kernel BUG at transaction.c:1226!
invalid operand: 0000
Kernel 2.4.9-e.38enterprise
CPU:    3
EIP:    0010:[<f885b636>]    Not tainted
EFLAGS: 00010282
EIP is at journal_forget_Rsmp_094dfde7 [jbd] 0xd6
eax: 00000025   ebx: ce6e8c10   ecx: c02f7f84   edx: 0008dad9
esi: cd95f3e0   edi: cd7a3094   ebp: cd7a3000   esp: cb947d70
ds: 0018   es: 0018   ss: 0018
Process ctl_cyrusdb (pid: 4500, stackpage=cb947000)
Stack: f8863f30 000004ca e7b08b20 cd95f3e0 cd7a3000 0000000b cd95f3e0 f885ee69
       ce14ac40 cd95f3e0 cd95f3e0 cd95f3e0 cab35900 ce14ac40 f886bc8c ce14ac40
       00020000 cd95f3e0 cd95f3e0 cd6ad000 cd6ae000 cdd93000 cd95f3e0 00020000
Call Trace: [<f8863f30>] .LC7 [jbd] 0x0 (0xcb947d70)
[<f885ee69>] journal_revoke_Rsmp_56fa5ece [jbd] 0xf9 (0xcb947d8c)
[<f886bc8c>] ext3_forget [ext3] 0x7c (0xcb947da8)
[<f886df3a>] ext3_free_branches [ext3] 0xda (0xcb947dd8)
[<f886df2c>] ext3_free_branches [ext3] 0xcc (0xcb947e30)
[<f886e2ec>] ext3_truncate [ext3] 0x2bc (0xcb947e74)
[<f885a285>] start_this_handle [jbd] 0x125 (0xcb947eac)
[<f885a38f>] journal_start_Rsmp_ec53be73 [jbd] 0xbf (0xcb947ec4)
[<f886bd5e>] start_transaction [ext3] 0x4e (0xcb947ee4)
[<f886bee7>] ext3_delete_inode [ext3] 0xe7 (0xcb947f08)
[<f887a080>] ext3_sops [ext3] 0x0 (0xcb947f28)
[<c015dd1c>] iput_free [kernel] 0x14c (0xcb947f2c)
[<f886f9c3>] ext3_lookup [ext3] 0x73 (0xcb947f40)
[<c015addb>] dentry_iput [kernel] 0x4b (0xcb947f50)
[<c01541ab>] vfs_unlink [kernel] 0x1eb (0xcb947f60)
[<c0152c41>] lookup_hash [kernel] 0x91 (0xcb947f6c)
[<c015427a>] sys_unlink [kernel] 0x9a (0xcb947f88)
[<c01181c0>] do_page_fault [kernel] 0x0 (0xcb947fb0)
[<c01073e3>] system_call [kernel] 0x33 (0xcb947fc0)

Code: 0f 0b 59 58 53 e8 40 03 00 00 8b 43 24 c7 43 14 00 00 00 00
 <0>Kernel panic: not continuing

I had to intercept the boot process manually before the cluster software 
starts and fsck the partition. Not good. But this problem has been fixed in 
a kernel update.

>>>> It's good but not perfect. We recently installed a huge SAN and are
>>>> now in the process of moving over the mail data to reside there.
>>>> Fibrechannel seems to be much more error tolerant than SCSI.
>>>
> Were you working with a "multi-initiator environment" (as RH calls it) or
> "single initiator" (e.g. with 2 machines on exactly the same SCSI bus, or
> two separate interfaces on your array's SCSI controller)?
> I think with a multi-initiator environment (as we have it) there is a very
> limited chance of failures.

I'm not sure about the terminology, but we have two separate SCSI busses on 
the RAID, one for each host. I thought that was "single initiator"? The 
problem that regularly occurred is the following: the cluster software 
requires a raw partition that's mounted by both nodes, called the "quorum 
partition". Each node regularly writes a timestamp on the quorum partition 
to prove it's alive. This is in addition to heartbeat channels over serial 
lines and ethernet. When one of the nodes doesn't write to the quorum 
partition for more than an adjustable period of time, the other node 
"shoots it in the head". That happened several times, even though the slow 
node hadn't actually crashed.
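
To make the failure mode concrete, here is a rough sketch of the timestamp 
check in Python. It is not Red Hat's implementation: the names (QuorumSlot, 
write_timestamp, nodes_to_fence) and the in-memory model of the quorum 
partition are assumptions made purely for illustration. It only shows why a 
node that is alive but too slow to write gets treated like a dead one.

```python
from dataclasses import dataclass


@dataclass
class QuorumSlot:
    """One node's slot on the shared quorum partition.

    Hypothetical in-memory stand-in; the real thing is a raw disk area.
    """
    node: str
    last_timestamp: float  # seconds, as last written by that node


def write_timestamp(slot: QuorumSlot, now: float) -> None:
    """Called periodically by a node to prove it is alive."""
    slot.last_timestamp = now


def is_stale(slot: QuorumSlot, now: float, timeout: float) -> bool:
    """True if the node has not written within the adjustable timeout."""
    return (now - slot.last_timestamp) > timeout


def nodes_to_fence(slots: list[QuorumSlot], now: float, timeout: float) -> list[str]:
    """Names of nodes the surviving peer would 'shoot in the head'."""
    return [s.node for s in slots if is_stale(s, now, timeout)]


# Example: node "b" is running but its write was delayed past the timeout.
node_a = QuorumSlot("a", last_timestamp=0.0)
node_b = QuorumSlot("b", last_timestamp=0.0)
write_timestamp(node_a, now=10.0)  # "a" writes on schedule
# "b" is stalled (e.g. by slow I/O) and does not write.
victims = nodes_to_fence([node_a, node_b], now=12.0, timeout=5.0)
```

In this sketch the check cannot tell a stalled node from a crashed one; all 
it sees is the age of the last timestamp, which is why the timeout value 
matters so much.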

>>> Hmm, I don't expect the problems to be SCSI-related. Maybe it has to
>>> do...
>>
>> That's not what I was talking about. We have a similar setup, yet
>> still there were instances when Red Hat's cluster software failed to
>> write to the shared storage. I guess this was caused by the slow-downs
>> connected to the memory management, but Red Hat support indicated that
>> shared storage connected via FibreChannel would not have been as
>> susceptible to these problems.
>
> Do you think using RH's cluster software is a valuable consideration for
> this kind of clustering setup?

Yes, I do.

> Using FreeBSD there are not that many
> clustering solutions for now, and if it's advisable to at least consider
> using RH here (although I have no experience with RH) we can certainly
> look at it. (Any idea how fast RH would "recover services"?)

That depends on how you configure it, but usually within a minute.

Cheers, Sebastian Hagedorn
--
Sebastian Hagedorn M.A. - RZKR-R1 (Gebäude 52), Zimmer 18
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587