Memory management bug in squatter

David Carter dpc22 at cam.ac.uk
Mon Dec 31 10:02:09 EST 2007


Given the following set of 4-byte input "words":

   0bcd
   1abc
   1cde

squatter builds an all-documents trie for initial character "0", with 
singleton sub-tries for "bcd", "cd" and "d". It frees the leaf sub-trie for 
"d", but fails to unlink and free the other two sub-tries. The top-level 
trie is then reused for the next initial character, which is "1", so the 
stale branches left over from "0" show up under "1" as well.
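
The reuse problem can be illustrated with a minimal sketch. The node
layout and the names trie_insert and trie_clear below are hypothetical
and are not taken from the squatter source; the actual change is in the
patch linked at the end of this message. The point is only that every
sub-trie has to be freed and its slot reset before the top-level node is
reused for the next initial character.

```c
#include <stdlib.h>

/* Hypothetical node type for illustration; NOT squatter's real layout. */
struct node {
    struct node *child[256];
};

/* Insert the bytes s[0..len-1] below root, allocating nodes as needed.
 * Returns 0 on success, -1 if an allocation fails. */
static int trie_insert(struct node *root, const unsigned char *s, size_t len)
{
    struct node *n = root;
    for (size_t i = 0; i < len; i++) {
        if (n->child[s[i]] == NULL) {
            /* calloc leaves every child slot zeroed (NULL in practice) */
            n->child[s[i]] = calloc(1, sizeof(struct node));
            if (n->child[s[i]] == NULL)
                return -1;
        }
        n = n->child[s[i]];
    }
    return 0;
}

/* Free every sub-trie below n and reset the child slots. The node n
 * itself stays allocated so that the caller can reuse it. */
static void trie_clear(struct node *n)
{
    for (int i = 0; i < 256; i++) {
        if (n->child[i] != NULL) {
            trie_clear(n->child[i]);
            free(n->child[i]);
            n->child[i] = NULL;
        }
    }
}
```

In this sketch, inserting "bcd", calling trie_clear, and then inserting
"abc" and "cde" leaves no "b" branch at the top level. Skipping the
clear, or freeing only the leaf, would leave the "b" branch in place for
the next initial character.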

This is most obvious if you try to do a depth-first recursive scan
of an existing cyrus.squat file. The output looks something like:

   0bcd
   1abc
   1b^Eb
   1b^F^B
   1cde
   . . .

^E, ^F and ^B are ASCII chars 5, 6 and 2 respectively. These will be 
variable-length integer offsets encoded by squatter, which the scan 
misreads as branch characters.
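
To see why small offsets come out as control characters, here is a
sketch of one common variable-length integer scheme: 7 bits per byte,
most significant group first, with the high bit set on every byte except
the last. This layout is an assumption made for illustration, and the
name varint_encode is hypothetical; the exact encoding squatter uses
should be checked against its source.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode v in base 128, most significant group first. Every byte except
 * the last has its high bit set. Writes at most 5 bytes to out and
 * returns the number of bytes written.
 * Assumed scheme for illustration only, not confirmed as squatter's. */
static size_t varint_encode(uint32_t v, unsigned char *out)
{
    unsigned char groups[5];
    size_t n = 0;

    /* Split v into 7-bit groups, least significant first. */
    do {
        groups[n++] = (unsigned char)(v & 0x7f);
        v >>= 7;
    } while (v != 0);

    /* Emit the groups in reverse order, flagging all but the last byte. */
    for (size_t i = 0; i < n; i++) {
        unsigned char b = groups[n - 1 - i];
        if (i + 1 < n)
            b |= 0x80;
        out[i] = b;
    }
    return n;
}
```

Under this scheme any value below 128 is a single byte equal to the
value itself, so offsets 5, 6 and 2 are written as ^E, ^F and ^B.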

With small amounts of data I can get away with ignoring any branch node 
which branches on characters < 32. I suspect that very large tries will 
have offsets which fall into the printable ASCII range, where this 
workaround can no longer tell them apart from real data.
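
The workaround amounts to a one-line filter in the scanning code. The
function name below is hypothetical and the sketch is only meant to show
where the heuristic stops working.

```c
#include <stdbool.h>

/* Workaround heuristic: treat a branch byte below 32 (an ASCII control
 * character) as a stale offset rather than as real indexed text. */
static bool is_suspect_branch(unsigned char c)
{
    return c < 32;
}
```

The filter catches ^E (5), ^F (6) and ^B (2). Assuming that an offset
below 128 is stored as a single byte equal to its value, an offset of 98
would be stored as the byte "b" and would pass the filter as if it were
ordinary text.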

Here is a fix:

http://www-uxsup.csx.cam.ac.uk/~dpc22/cyrus/patches/2.3cvs/squat.patch

-- 
David Carter                             Email: David.Carter at ucs.cam.ac.uk
University Computing Service,            Phone: (01223) 334502
New Museums Site, Pembroke Street,       Fax:   (01223) 334679
Cambridge UK. CB2 3QH.