<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">Hello Benjamin, hello OpenSlide

      community,<br>

      <br>

      A lot of changes to the VMIC driver have been made.<br>

      <br>

      The code is in the repository

      <a class="moz-txt-link-freetext" href="https://github.com/Markus-PP/openslide-vmic">https://github.com/Markus-PP/openslide-vmic</a> <br>

      <br>

      <u>Lots of technical details follow.</u><br>

      <br>

      <blockquote type="cite">

        <pre wrap="">If there are Deep Zoom features which are not used by

VMIC, you can just check for unsupported parameters and fail if present.

(Would that allow you to switch to the simple grid?)

</pre>

      </blockquote>

      <blockquote type="cite">

        <pre wrap="">The grid itself doesn't care about truncated tiles on the right/bottom

edges.  The read function would need to check for those edges and adjust the

size of the tile it expects to read.</pre>

      </blockquote>

      <br>

      I'm convinced and have moved to simple grid. Now, the read_tile

      function has it's own way to know the tile size. The compelling

      reason for this is memory foot print on large slides - a million

      heap allocations of 64 bit chunks which the tile grid does on

      large slides would create a system load nobody wants.<br>

    </div>

    <blockquote cite="mid:20161002002809.GF3744@hyliota.hl.backtick.net"

      type="cite">

      <pre wrap="">You said earlier that there are no overlapping VMICs; is that still true? 

</pre>

    </blockquote>

    Yes it's still true the VMIC deepzoom images don't overlap. One can

    never be 100% sure, but the precipoint team stated they are not

    planning to alter any file internals without a *very* important

    reason. So, yes, the simple grid will do.<br>

    <br>

    The following issues are not entirely solved:<br>

    <blockquote cite="mid:20161002002809.GF3744@hyliota.hl.backtick.net"

      type="cite"><br>

      <blockquote type="cite">

        <blockquote type="cite">

          <pre wrap="">The existing drivers re-open their data files on every

openslide_read_region().  That avoids this kind of thread-safety issue

(provided that libzip doesn't have any global state) and also prevents

an idle openslide_t from consuming file handles.  If creating a ZIP

handle is expensive, you can use a handle cache + your own zip_source,

similar to the approach used by _openslide_tiffcache.

</pre>

        </blockquote>

        <pre wrap="">

The nefarious global GMutex has been removed, instead, the zip_t* handle

got wrapped by the _openslide_ziphandle structure, so we can have one

mutex for each instance of a zip archive.

</pre>

      </blockquote>

      <pre wrap="">

That's still inconsistent with OpenSlide's typical approach.  We generally

try to avoid shared mutable state, and if we're going to add more, there

should be a good reason.

</pre>

      <blockquote type="cite">

        <pre wrap="">Also, there are now wrapper functions for zip_open, zip_open_from_source,

zip_close, zip_fopen and zip_fclose.  This is simpler than a handle cache

and apparently enough to allow multithreading.

</pre>

      </blockquote>

      <pre wrap="">

Yes, but not enough to allow parallel I/O.  If multiple threads were using

the same openslide_t, only one would be able to do I/O at a time.

</pre>

    </blockquote>

    Parallel I/O appears kind of hard to implement under the given

    circumstances. At this iteration, a handle queue similar to the one

    in the tiffcache is implemented. BUT......<br>

    <blockquote cite="mid:20161002002809.GF3744@hyliota.hl.backtick.net"

      type="cite">

      <blockquote type="cite">

        <pre wrap="">Reopening the zip for every image access would of course be out of

question.

</pre>

      </blockquote>

      <pre wrap="">

Why?</pre>

    </blockquote>

    <br>

    With the given versions of libzip, as soon as zip_open is called,

    the *whole* zip directory is always read into memory. For any slides

    larger than tiny, the overhead would be immense. For slides with the

    size of 12GB  and 600k zip entries, it already can take a 2 seconds

    on a 10 year old computer, possibly longer if opened from a NAS. And

    reading the zip directory also allocates tens of megabytes on the

    heap. The problem arises only really for large slides, smaller

    slides up to 2GB size open very fast.<br>

    <br>

    On the other hand, as stated, we want to avoid to put any of the

    threads in a wait state as much as possible.<br>

    <br>

    The compromise solution is in the functions

    vmic_handlecache_create(), vmic_handle_get() and vmic_handle_put()<br>

    <br>

    * Calculate max number M from <font size="-1">[constant] /

      [expected resident memory allocation per archive instance]</font><br>

    * Open up to M instances of the zip in parallel, as required by

    multithreading request.<br>

    * When more than M of these are in use simultaneously, and a new

    thread requests a tile, wait for one of the instances to be

    released. Each thread is locked at maximum for the duration of the

    zip_fopen/zip_fread/zip_fclose sequence, so it's basically only the

    file read and zip decoding which is serialized. The JPEG decoding

    and cairo rendering still works in parallel threads.<br>

    <br>

    The performance of zip_open for large slides > 10GB is not

    entirely what we would wish for. Alternative solutions may include

    improvements of the libzip code itself, or move to a different

    library, if such exists in the standard distribution packages.<br>

    <blockquote cite="mid:20161002002809.GF3744@hyliota.hl.backtick.net"

      type="cite">

      <blockquote type="cite">

        <blockquote type="cite">

          <pre wrap="">What are VMIC's exact rules on the name of the inner archive?  All three

sample files use a filename of exactly "Image.vmici".  If the name check

could be restricted further (in particular, if the .vmici extension is

always present), it might be reasonable to drop the check for the inner ZIP

magic number.

</pre>

        </blockquote>

        <pre wrap="">Basically, the inner archive is supposed to be named either

"Image.vmici", or, for older slides up to 2015, "Image".

Since we do want to support older slides, and the name "Image" is very

generic, I believe, dropping the check for the magic number is no good

idea, and I see no problem because the check is inexpensive.

</pre>

      </blockquote>

      <pre wrap="">

ZIP doesn't have a magic number in a fixed location.  You're checking for

the first local file header, but there can be arbitrary amounts of data

prepended to a ZIP, so the header may not be at offset 0.  If we wanted to

be rigorous, we'd have to search the last 64 KB of the member for an End of

Central Directory record.  It's probably best to keep the check as is, but

it does impose an additional constraint on the format of the slide file.

</pre>

    </blockquote>

    The checking code has been made a bit shorter, the name now must be

    exactly "Image" or "Image.vmici" - anything else would be rejected.<br>

    <br>

    As for the zip magic number, I have not found any archive with none

    as the first bytes of an archive yet.<br>

    Scanning ancient backup drives for random zip archives with <font

      size="-1"><i>find -name "*.zip" -exec hexdump</i></font> keeps

    giving zip magic numbers "PK34", or "PK00PK" for the rare multiple

    volume archives.<br>

    <br>

    For more constraints, a check for the .vmic file name suffix may be

    possible but is not being done in the current implementation.<br>

    This would probably speed things up when the library is used to scan

    a tree of files which are zips but not vmics. - Can such a situation

    occur ? And does it even happen that slide files suffixes are

    altered in any other than test environments ?<br>

    <blockquote type="cite">

      <pre wrap="">In C, a cast is a warning sign that something unusual is going on.  In

general, don't explicitly cast pointers unless the compiler complains.

</pre>

    </blockquote>

    Those casts between pointers have been introduced to prevent

    compiler warnings, such as signed/unsigned mismatch in an equality

    comparision, or ensuring correct word lengths in printf-like

    statements.<br>

    <blockquote type="cite">

      <blockquote type="cite">

        <pre wrap="">[...] make a format such as VMIC optional, e.g. for reasons to avoid the libzip dependency [...]

</pre>

      </blockquote>

      <pre wrap="">I've tried to avoid making formats optional, since it complicates testing

and support.  However, in this case it's unavoidable.  Debian stable, Ubuntu

Xenial (the current LTS release), and RHEL/CentOS 7 all ship versions of

libzip older than 1.1, and people often want to compile new releases of

OpenSlide on older Linux distributions.

In existing cases when an optional format is disabled, detection still

succeeds, but opening the file returns an error.  That's not possible here,

or at least it's not easy.  We'll probably want to implement the optionality

by setting an AC_DEFINE and an AM_CONDITIONAL in configure.ac based on the

pkg-config detection results, conditionally compiling decode-zip and

format-vmic (using the AM_CONDITIONAL), and wrapping the formats array

member in openslide.c in an #ifdef (using the AC_DEFINE).  If you're not

comfortable doing this yourself, you can implement the driver as a mandatory

dependency, and I'll fix up the build system myself.

(GitHub recently added a feature allowing me to push commits into OpenSlide

pull requests opened by other people, so I may sometimes do that instead of

asking you to fix a complex problem yourself.)

</pre>

      <br>

    </blockquote>

    I would happily like to take this offer, I'm entirely not acquainted

    with automake syntax.<br>

    <br>

    We do need a libzip version >= 1.1 because<br>

    <br>

    * libzip 0.11.2 does not even compile, because aliases such as zip_t

    are undefined and required features are missing.<br>

    * libzip 1.0.1 compiles ok, but zip_open_from_source returns a zip

    error 28 "Operation not supported". It must be avoided.<br>

    * libzip 1.1, 1.1.2 and 1.1.3 and 1.2 are fully functional. <br>

    <br>

    <blockquote type="cite">

      <pre wrap="">[...] code formatting [...]</pre>

    </blockquote>

    <br>

    I've been going through the code style checklist several times - the

    restriction to 80 chars per line can be a bitch, to not say more. <br>

    <br>

    Despite the above mentioned performance issues, on which I will try

    to improve whereever possible, this driver is already running in at

    least two different productive environments. Hopefully the given

    fixes are ok to consider. Once asked to do so, I'll join the git

    commits.<br>

    <br>

    Markus Pöpping<br>

    <br>

  </body>

</html>