Memory management bug in squatter
David Carter
dpc22 at cam.ac.uk
Mon Dec 31 10:02:09 EST 2007
Given the following set of 4-byte input "words":
0bcd
1abc
1cde
squatter builds an all-documents trie for initial character "0", with
singleton sub-tries for "bcd", "cd" and "d". It frees the leaf sub-trie for
"d", but fails to unlink and free the other two sub-tries. The top-level
trie is then reused for the next initial character, which is "1", so the
stale "b" branch left over from "0" ends up in the data for "1".
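To make the failure mode concrete, here is a minimal C sketch of emptying a reusable top-level trie before moving on to the next initial character. The node layout and the names node_new, trie_insert, trie_clear and trie_count are hypothetical, chosen for illustration only; they are not squatter's actual data structures, nor the contents of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical trie node: one child slot per byte value.
 * This is NOT squatter's real node layout. */
typedef struct node {
    struct node *child[256];
} node;

/* Allocate an empty node; returns NULL if allocation fails. */
static node *node_new(void)
{
    return calloc(1, sizeof(node));
}

/* Add the len bytes at s as a path below root.
 * Returns 0 on success, -1 if a node could not be allocated. */
static int trie_insert(node *root, const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        node **slot = &root->child[(unsigned char)s[i]];
        if (*slot == NULL && (*slot = node_new()) == NULL)
            return -1;
        root = *slot;
    }
    return 0;
}

/* Free every sub-trie below n and clear the pointers to them.
 * n itself stays allocated so that it can be reused. */
static void trie_clear(node *n)
{
    for (int c = 0; c < 256; c++) {
        if (n->child[c] != NULL) {
            trie_clear(n->child[c]);   /* free grandchildren first */
            free(n->child[c]);
            n->child[c] = NULL;        /* leave no stale branch behind */
        }
    }
}

/* Count the nodes below n (n itself is not counted). */
static size_t trie_count(const node *n)
{
    size_t total = 0;
    for (int c = 0; c < 256; c++) {
        if (n->child[c] != NULL)
            total += 1 + trie_count(n->child[c]);
    }
    return total;
}
```

With the three words above, inserting "bcd" below the top-level node for "0" creates three nested nodes. Calling trie_clear before inserting "abc" and "cde" for "1" leaves no 'b' branch behind; freeing only the deepest node would not.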
This is most obvious if you try to do a depth-first recursive scan
of an existing cyrus.squat file. The output looks something like:
0bcd
1abc
1b^Eb
1b^F^B
1cde
. . .
^E, ^F and ^B are ASCII characters 5, 6 and 2 respectively. These will be
variable-length integer offsets encoded by squatter.
With small amounts of data I can get away with ignoring any branch node
which branches on characters < 32. I suspect that very large tries will
have offsets whose encoded bytes fall into the normal printable ASCII range.
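The following C sketch illustrates why small offsets show up as control characters and why filtering on characters < 32 is fragile. It assumes a common 7-bits-per-byte scheme (most significant group first, high bit set on every byte except the last); whether squatter uses exactly this layout is an assumption, and the names varint_encode and varint_decode are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode v into out, 7 bits per byte, most significant group first.
 * Every byte except the last has its high bit set.
 * out must have room for 10 bytes. Returns the number of bytes written.
 * ASSUMED format, for illustration; not taken from the squatter source. */
static size_t varint_encode(uint64_t v, unsigned char *out)
{
    unsigned char tmp[10];
    size_t n = 0;

    /* Split v into 7-bit groups, least significant first. */
    do {
        tmp[n++] = (unsigned char)(v & 0x7F);
        v >>= 7;
    } while (v != 0);

    /* Emit the groups in reverse order, flagging all but the last. */
    for (size_t i = 0; i < n; i++) {
        unsigned char b = tmp[n - 1 - i];
        if (i + 1 < n)
            b |= 0x80;
        out[i] = b;
    }
    return n;
}

/* Decode one integer from in into *v. Returns the number of bytes read. */
static size_t varint_decode(const unsigned char *in, uint64_t *v)
{
    uint64_t acc = 0;
    size_t n = 0;
    unsigned char b;

    do {
        b = in[n++];
        acc = (acc << 7) | (uint64_t)(b & 0x7F);
    } while (b & 0x80);

    *v = acc;
    return n;
}
```

Under this assumed scheme, 5, 6 and 2 each encode as a single byte with that value, which is what the ^E, ^F and ^B above look like. An offset of 65, however, encodes as the single byte 'A', and 300 encodes as 0x82 followed by 0x2C (','), so a "characters < 32" test would not catch either.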
Here is a fix:
http://www-uxsup.csx.cam.ac.uk/~dpc22/cyrus/patches/2.3cvs/squat.patch
--
David Carter Email: David.Carter at ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.