Precipoint VMIC driver and DeepZoom

Sat Oct 1 20:28:10 EDT 2016

On Wed, Sep 21, 2016 at 04:05:08PM +0200, Markus Pöpping via openslide-users wrote:
> The notion to move to a simple grid is valid.  However, after very long
> pondering, I left the strategic approach as is, first because it's been
> tested well, and the performance is good enough.  Secondly, I don't know
> if a simple grid can be used because truncated tiles do appear in VMIC and
> general DeepZoom on right and bottom borders.

The grid itself doesn't care about truncated tiles on the right/bottom
edges.  The read function would need to check for those edges and adjust the
size of the tile it expects to read.
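
As a rough illustration of that edge check, here is a minimal sketch in
Python (not OpenSlide's C).  It assumes non-overlapping square tiles; the
function and parameter names are hypothetical and not taken from the driver.

```python
def tile_dimensions(col, row, image_w, image_h, tile_size):
    """Return (width, height) of the tile at grid position (col, row).

    Assumes non-overlapping tiles.  Interior tiles are tile_size square;
    tiles in the last column or row are truncated to what remains.
    """
    tiles_across = -(-image_w // tile_size)  # ceiling division
    tiles_down = -(-image_h // tile_size)
    if not (0 <= col < tiles_across and 0 <= row < tiles_down):
        raise IndexError("tile outside grid")
    width = min(tile_size, image_w - col * tile_size)
    height = min(tile_size, image_h - row * tile_size)
    return width, height
```

The grid lookup stays uniform; only the expected size passed to the tile
reader changes on the right and bottom edges.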

> Thirdly, this also would require specialized code paths for overlapping
> and non-overlapping DZ.

You said earlier that there are no overlapping VMICs; is that still true? 
In general we try not to ship code that is more flexible than we really
need.

> > The existing drivers re-open their data files on every
> > openslide_read_region().  That avoids this kind of thread-safety issue
> > (provided that libzip doesn't have any global state) and also prevents
> > an idle openslide_t from consuming file handles.  If creating a ZIP
> > handle is expensive, you can use a handle cache + your own zip_source,
> > similar to the approach used by _openslide_tiffcache.
> 
> The nefarious global GMutex has been removed; instead, the zip_t* handle
> is now wrapped by the _openslide_ziphandle structure, so we can have one
> mutex for each instance of a zip archive.

That's still inconsistent with OpenSlide's typical approach.  We generally
try to avoid shared mutable state, and if we're going to add more, there
should be a good reason.

> Also, there are now wrapper functions for zip_open, zip_open_from_source,
> zip_close, zip_fopen and zip_fclose.  This is simpler than a handle cache
> and apparently enough to allow multithreading.

Yes, but not enough to allow parallel I/O.  If multiple threads were using
the same openslide_t, only one would be able to do I/O at a time.

> Reopening the zip for every image access would of course be out of the
> question.

Why?
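
For comparison, the reopen-per-read pattern looks roughly like the sketch
below.  It uses Python's zipfile module as a stand-in for libzip, and
read_member, demo, and the tile names are hypothetical.  Each call opens its
own handle, so threads share no mutable state and need no lock.

```python
import os
import tempfile
import zipfile
from concurrent.futures import ThreadPoolExecutor


def read_member(archive_path, member):
    """Open the archive, read one member, and close the archive again."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.read(member)


def demo():
    """Read several members from worker threads, one handle per read."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "slide.zip")
        tiles = {f"tile_{i}.bin": bytes([i]) * 1024 for i in range(8)}
        with zipfile.ZipFile(path, "w") as zf:
            for name, payload in tiles.items():
                zf.writestr(name, payload)
        with ThreadPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(lambda n: read_member(path, n), tiles))
        return results == list(tiles.values())
```

Whether the per-read open cost is acceptable for VMIC is exactly the
question; the sketch shows only the structure, not the performance.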

> The slide has a title, which can be found in vendor-specific property
> "PreciPoint.ScanData.Name". As of now, the slide title is copied into
> the "openslide.comment" tag. Is there a better place? I cannot find any
> property similar to "openslide.title" or "openslide.name".

There's no generic "slide name" property, and openslide.comment is not
well-defined.  It's as good a place as any, I suppose.

> > What are VMIC's exact rules on the name of the inner archive?  All three
> > sample files use a filename of exactly "Image.vmici".  If the name check
> > could be restricted further (in particular, if the .vmici extension is
> > always present), it might be reasonable to drop the check for the inner ZIP
> > magic number.
> 
> Basically, the inner archive is supposed to be named either
> "Image.vmici", or, for older slides up to 2015, "Image".
> Since we do want to support older slides, and the name "Image" is very
> generic, I believe dropping the check for the magic number is not a good
> idea, and I see no problem because the check is inexpensive.

ZIP doesn't have a magic number in a fixed location.  You're checking for
the first local file header, but there can be arbitrary amounts of data
prepended to a ZIP, so the header may not be at offset 0.  If we wanted to
be rigorous, we'd have to search the last 64 KB of the member for an End of
Central Directory record.  It's probably best to keep the check as is, but
it does impose an additional constraint on the format of the slide file.
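
For reference, the rigorous check would look something like this Python
sketch.  The End of Central Directory record starts with the signature
"PK\x05\x06", is at least 22 bytes long, and ends with a comment of up to
65535 bytes, so it has to start within the last 22 + 65535 bytes.  find_eocd
is a hypothetical name, and the sketch assumes nothing follows the archive
comment.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_MIN_SIZE = 22       # fixed-size part of the record
MAX_COMMENT = 0xFFFF     # comment length is a 16-bit field


def find_eocd(data):
    """Return the offset of the EOCD record in data, or -1 if not found."""
    tail_start = max(0, len(data) - (EOCD_MIN_SIZE + MAX_COMMENT))
    pos = len(data) - EOCD_MIN_SIZE
    while pos >= tail_start:
        # Search backwards for a signature starting at or before pos.
        pos = data.rfind(EOCD_SIGNATURE, tail_start, pos + 4)
        if pos < 0:
            return -1
        comment_len = int.from_bytes(data[pos + 20:pos + 22], "little")
        # Accept only a record whose comment ends exactly at end of data.
        if pos + EOCD_MIN_SIZE + comment_len == len(data):
            return pos
        pos -= 1
    return -1


# Usage: a ZIP with 100 bytes prepended has no local file header at
# offset 0, but the EOCD record is still found at the end.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example.txt", "hello")
padded = b"\x00" * 100 + buf.getvalue()
```

With `padded`, a check for the local file header at offset 0 fails, while
find_eocd still locates the record.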

--Benjamin Gilbert