A portable OpenSlide viewer for Windows: Smart Zoom Viewer

Benjamin Gilbert bgilbert at cs.cmu.edu
Mon Nov 16 22:33:59 EST 2015


On 2015-11-12 12:49, John Cupitt via openslide-users wrote:
> It's just deepzoom in an uncompressed zip container. The idea is that
> deepzoom is great, because it can be served extremely quickly, but
> also very annoying, since each 256x256 tile is a separate JPEG file.
> If your slide is 100,000 x 100,000 pixels, your highest-resolution
> directory will contain 150,000 files.

So SZI is intended to hold a single pyramidal image with no metadata, 
rather than an entire slide?

> This isn't so bad on Linux hosts, but Windows really struggles with
> large directories. Very large numbers of small files can also be
> rather inefficient in disk usage: many filesystems will allocate a
> separate 4kb page for each file, so for 1kb JPEGs, 3kb will be wasted 
> per tile.
> Huge directory trees are also rather slow to copy about between hosts,
> especially on Windows.

What advantages does this format have over pyramidal tiled TIFF?  I'm 
seeing a couple of disadvantages:

- File size.  I used VIPS d88304a2 to make an SZI file and a TIFF file 
like this:

vips dzsave "CMU-3-40x - 2010-01-12 13.57.09.vms" cmu-3.szi \
    --suffix .jpeg[Q=80] --overlap=0
vips extract_band "CMU-3-40x - 2010-01-12 13.57.09.vms" \
    cmu-3.tiff[tile,pyramid,tile-width=256,tile-height=256,compression=jpeg,bigtiff,Q=80] \
    0 --n 3

The output files break down as follows:

TIFF:
    38.0 MB - JPEG quantization tables [*]
   827.2 MB - Other JPEG headers and data
     4.5 MB - Remainder of file (TIFF metadata)
   869.7 MB - Total size

ZIP:
     4.5 MB - JPEG JFIF headers [*]
    49.9 MB - JPEG EXIF headers [*]
    38.0 MB - JPEG quantization tables
   950.7 MB - Other JPEG headers and data
    18.3 MB - ZIP member filenames (two copies per member)
    32.3 MB - Remainder of file (ZIP metadata)
  1093.8 MB - Total size

Entries marked [*] are caused by bugs and are not fundamental
requirements of the formats.  ZIP metadata is 50.6 MB (4.9% of the file,
excluding bugs) vs. 4.5 MB (0.5%) for TIFF.  Also, the ZIP has to
include JPEG quantization tables with every tile, while TIFF can
consolidate these into one set of tables per pyramid level.  (Supporting
consolidated tables would require a little extra code in the tile
server, but not much, I think.)  The per-tile tables cost 38.0 MB for
this sample, so in total the ZIP has 88.6 MB (8.5%) of overhead.
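
For anyone who wants to check the percentages, here is a minimal sketch 
of the arithmetic in Python.  The sizes are copied from the two 
breakdowns above; the dictionary keys and the pct() helper are just 
names picked for illustration.  It assumes "excluding bugs" means 
dividing by the file size with the [*] rows subtracted, which is the 
reading that reproduces the figures quoted.

```python
# Sizes in MB, copied from the breakdowns above.
tiff = {
    "quant_tables": 38.0,    # marked [*]: caused by a bug
    "jpeg_data": 827.2,
    "tiff_metadata": 4.5,
}
zip_ = {
    "jfif_headers": 4.5,     # marked [*]
    "exif_headers": 49.9,    # marked [*]
    "quant_tables": 38.0,    # required per tile in the ZIP case
    "jpeg_data": 950.7,
    "filenames": 18.3,
    "zip_metadata": 32.3,
}


def pct(part, whole):
    """Percentage of `whole` taken by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Assumption: "excluding bugs" means the [*] rows are removed from the
# file size before taking the ratio.
tiff_total = sum(tiff.values()) - tiff["quant_tables"]
zip_total = sum(zip_.values()) - zip_["jfif_headers"] - zip_["exif_headers"]

# Container metadata: member filenames plus the remaining ZIP structures.
zip_meta = zip_["filenames"] + zip_["zip_metadata"]
# Total overhead also counts the per-tile quantization tables.
zip_overhead = zip_meta + zip_["quant_tables"]

tiff_meta_pct = pct(tiff["tiff_metadata"], tiff_total)
zip_meta_pct = pct(zip_meta, zip_total)
zip_overhead_pct = pct(zip_overhead, zip_total)

print(f"TIFF metadata: {tiff['tiff_metadata']:.1f} MB ({tiff_meta_pct}%)")
print(f"ZIP metadata:  {zip_meta:.1f} MB ({zip_meta_pct}%)")
print(f"ZIP overhead:  {zip_overhead:.1f} MB ({zip_overhead_pct}%)")
```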

- From an interoperability perspective, ZIP is not ideal.  The spec is 
large, occasionally ambiguous, and has many optional features.  For SZI 
to be well-defined, a profile of ZIP would need to be specified.  (E.g., 
is central directory encryption allowed?  Should ZIP64 be enabled 
conditionally or unconditionally?)  The ZIP format also contains 
redundancy (between the local file headers and the central directory) 
which tends to lead to implementation errors.  I have, on several 
different occasions, encountered interoperability problems between 
widely-deployed ZIP writers and readers.
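
The redundancy is easy to see with a small experiment.  The sketch 
below uses Python's standard zipfile module (not VIPS) to write one 
uncompressed member into an in-memory archive and then counts how often 
the member name appears in the raw bytes.  The tile path and payload are 
made up for illustration; a real SZI would hold actual JPEG tiles.

```python
import io
import zipfile

# Hypothetical Deep Zoom style tile path and a stand-in payload.
name = "cmu-3_files/16/0_0.jpeg"
payload = b"\xff\xd8 not a real JPEG \xff\xd9"

# Write a single stored (uncompressed) member to an in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr(name, payload)

raw = buf.getvalue()

# The name is stored once in the local file header and once more in the
# central directory entry for the same member.
copies = raw.count(name.encode("ascii"))

# Everything in the archive that is not tile data.
overhead = len(raw) - len(payload)

print(f"copies of member name: {copies}")
print(f"container overhead for one tile: {overhead} bytes")
```

A reader that trusts the local header and one that trusts the central 
directory will only agree as long as the writer kept the two copies 
consistent.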

--Benjamin Gilbert


