Precipoint VMIC driver and DeepZoom

Benjamin Gilbert bgilbert at cs.cmu.edu
Fri May 26 02:46:47 EDT 2017


On Thu, Mar 23, 2017 at 02:43:24PM +0100, Markus Pöpping wrote:
> With the given versions of libzip, as soon as zip_open is called, the
> *whole* zip directory is always read into memory.  For any slides larger
> than tiny, the overhead would be immense.  For slides with the size of
> 12 GB and 600k zip entries, it can already take 2 seconds on a 10-year-old
> computer, possibly longer if opened from a NAS.  And reading the zip
> directory also allocates tens of megabytes on the heap.

OK.  Your compromise approach sounds reasonable.

>> If we wanted to be rigorous, we'd have to search the last 64 KB of the
>> member for an End of Central Directory record.  It's probably best to
>> keep the check as is, but it does impose an additional constraint on the
>> format of the slide file.
>
> As for the zip magic number, I have not yet found any archive that lacks
> it as its first bytes.

They do exist, but typically only in special cases (such as embedding a ZIP
file inside another type of file).  If it's safe to assume that VMIC will
never create such a file as the inner archive, and since we have to accept a
file called "Image" (which is too generic to be a sufficient check on its
own), it seems reasonable to leave the magic number check.
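
To make the trade-off concrete, here is a rough Python sketch (not OpenSlide
code) contrasting the cheap leading magic number check with the rigorous End
of Central Directory search.  The function names and the demo_archive helper
are made up for illustration.  The signatures and the 22-byte record size come
from the ZIP format; requiring the archive comment to run exactly to the end
of the stream is a simplifying assumption.

```python
import io
import struct
import zipfile

LOCAL_HEADER_MAGIC = b"PK\x03\x04"  # local file header signature
EOCD_MAGIC = b"PK\x05\x06"          # End of Central Directory signature
EOCD_MIN_SIZE = 22                  # fixed part of the EOCD record
MAX_COMMENT = 0xFFFF                # comment length is a 16-bit field


def has_leading_magic(f):
    """Cheap check: does the stream start with a local file header?"""
    f.seek(0)
    return f.read(4) == LOCAL_HEADER_MAGIC


def find_eocd(f):
    """Rigorous check: return the EOCD record offset, or None if absent."""
    f.seek(0, io.SEEK_END)
    size = f.tell()
    # The record can only live in the last 22 + 65535 bytes (about 64 KB).
    tail_len = min(size, EOCD_MIN_SIZE + MAX_COMMENT)
    f.seek(size - tail_len)
    tail = f.read(tail_len)
    pos = tail.rfind(EOCD_MAGIC)
    while pos != -1:
        if pos + EOCD_MIN_SIZE <= tail_len:
            # The comment length sits 20 bytes into the record.
            (comment_len,) = struct.unpack_from("<H", tail, pos + 20)
            # Assumption: the comment must end exactly at end of stream.
            if pos + EOCD_MIN_SIZE + comment_len == tail_len:
                return size - tail_len + pos
        # Otherwise keep scanning backwards for an earlier candidate.
        pos = tail.rfind(EOCD_MAGIC, 0, pos)
    return None


def demo_archive(prefix=b""):
    """Hypothetical helper: tiny in-memory ZIP with optional prepended bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("Image", b"data")
    return prefix + buf.getvalue()
```

An archive built with demo_archive(prefix=b"#!/bin/sh\n") fails
has_leading_magic but is still found by find_eocd, which is the extra
constraint the cheap check places on the file format.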

> For more constraints, a check for the .vmic file name suffix may be
> possible but is not being done in the current implementation.
> This would probably speed things up when the library is used to scan
> a tree of files which are zips but not vmics.  Can such a situation
> occur?

Yes.  The VIPS image-processing library uses OpenSlide as a high-priority
loader, so OpenSlide will end up inspecting a lot of files that it will not
be able to open.  We need to reject such files efficiently to avoid causing
performance problems for VIPS.

> And does it even happen that slide file suffixes are altered anywhere
> other than in test environments?

Probably not.  OpenSlide generally tries not to rely on file suffixes, but
there is precedent for doing so (MIRAX).  Since reading the ZIP directory is
expensive and there appears to be little alternative, we'll probably need to
depend on the file extension.
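
As a rough illustration of the ordering I have in mind (cheapest test first,
ZIP directory never touched for files we can rule out), here is a Python
sketch.  The name might_be_vmic is hypothetical and this is not the driver's
actual detection code; it only assumes the ".vmic" suffix and the usual ZIP
local file header signature.

```python
import os

ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header signature


def might_be_vmic(path):
    """Return False quickly for files that cannot be VMIC slides."""
    # 1. Suffix test: needs no I/O at all.
    if os.path.splitext(path)[1].lower() != ".vmic":
        return False
    # 2. Magic number test: a single 4-byte read.
    try:
        with open(path, "rb") as f:
            return f.read(4) == ZIP_MAGIC
    except OSError:
        return False
    # Only callers that get True here would go on to read the ZIP directory.
```

With this ordering, a tree full of ordinary .zip files is rejected on the
suffix alone, without opening a single file.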

> I've been going through the code style checklist several times - the
> restriction to 80 chars per line can be a bitch, to not say more.

Thanks for doing that.  There are elements of the coding style which are not
my preference either.  :-)

> Despite the above-mentioned performance issues, on which I will try
> to improve wherever possible, this driver is already running in at
> least two different production environments. Hopefully the given
> fixes are ok to consider. Once asked to do so, I'll join the git
> commits.

Great, thanks for the update.  I'll take a look at the code when I have some
time.

--Benjamin Gilbert

