My cassandane tests break. Yours too?

Bron Gondwana brong at fastmail.fm
Sat Dec 5 05:30:46 EST 2015


On Fri, Dec 4, 2015, at 13:04, ellie timoney via Cyrus-devel wrote:
> On Fri, Dec 4, 2015, at 12:20 PM, ellie timoney via Cyrus-devel wrote:
> > 
> > But at this rate running it single threaded will take the rest of the day.
> 
> ... or not.  It just "finished", in dramatic style.
> 
> It reached a point where it started rapidly outputting result messages,
> interspersed with syslog terminal spam.  I have my terminal set to save
> 100,000 lines of scrollback.  I was watching it fly past, crossing my
> fingers that the point where it started going pear shaped would still be
> in the buffer by the time it finished...
> 
> and then the oom killer got my window manager, shortly followed by xorg.
> Doh!
> 
> So that was frustrating and, in the end, uninformative.  I've saved the
> contents of the log directory so I can dig through that later at some
> point, but in the meantime I'm going back to my notifyd-less branch.

Yeah, ok, running without -j gives you a different run mode.  Annoying.
I always run it with -j.  So it's failing to read from the temp file because...

brong at wot:~/src/cassandane$ sudo -u cyrus strace -o /tmp/st.out -s 5000 perl testrunner.pl -f pretty ACL
Cyrus::ACL.delete                                                       [  OK  ]
Perl exception: Cannot open /tmp/4LGRwctL8O for reading: No such file or directory
 at Cassandane/Unit/TestCase.pm line 82.
	Cassandane::Unit::TestCase::annotate_from_file(Cassandane::Cyrus::ACL=HASH(0x38fd568), "/tmp/4LGRwctL8O") called at Cassandane/Unit/TestPlan.pm line 756

...

stat("/tmp/", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=57344, ...}) = 0
stat("/tmp/", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=57344, ...}) = 0
open("/tmp/4LGRwctL8O", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW, 0600) = 6
ioctl(6, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7ffd5bfc3030) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(6, 0, SEEK_CUR)                   = 0
fstat(6, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
fcntl(6, F_SETFD, FD_CLOEXEC)           = 0
chmod("/tmp/4LGRwctL8O", 0600)          = 0
dup(6)                                  = 7
ioctl(7, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7ffd5bfc30a0) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(7, 0, SEEK_CUR)                   = 0
fstat(7, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
dup2(7, 1)                              = 1
close(7)                                = 0
fcntl(1, F_SETFD, 0)                    = 0
dup(6)                                  = 7
ioctl(7, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7ffd5bfc30a0) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(7, 0, SEEK_CUR)                   = 0
fstat(7, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
dup2(7, 2)                              = 2
close(7)                                = 0
fcntl(2, F_SETFD, 0)                    = 0
close(6)                                = 0
write(2, "=====> Cyrus::TestCase[302] ---------- BEGIN test_delete ----------\n", 68) = 68

Nothing too exciting yet - it's dup'd 6 to 7 and then to 1 and then again to 2, then closed 6 - leaving the file open and connected to both 1 and 2 (STDOUT and STDERR) - I could already tell that from reading the code.

And then nothing.  No mention of unlinking.  Oh, hang on - running it with -f:

21910 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f165f5c79d0) = 21912
21910 write(2, "=====> Instance[801] started notifyd for /tmpfs/cas/1027481 as 21912\n", 69 <unfinished ...>

21912 open("/tmp/Jdwi3ZiOiS", O_RDONLY) = 4
...
21912 read(4, "=====> Cyrus::TestCase[302] ---------- BEGIN test_delete ----------\n=====> Instance[862] start main "..., 8192) = 2185
...
21912 read(4, "", 8192)                 = 0
21912 close(4)                          = 0
21912 lstat("/tmp/Jdwi3ZiOiS", {st_mode=S_IFREG|0600, st_size=2185, ...}) = 0
21912 unlink("/tmp/Jdwi3ZiOiS") 

So it smells like the notifyd isn't dying correctly when it finishes running, and is instead continuing to run the same codepath as the master, hence cleaning up the file, and the master then can't find it.

Damn.

I guess I'll be fixing that now :)

Bron.


-- 
  Bron Gondwana
  brong at fastmail.fm


More information about the Cyrus-devel mailing list