A portable OpenSlide viewer for Windows: Smart Zoom Viewer
Benjamin Gilbert
bgilbert at cs.cmu.edu
Mon Nov 16 22:33:59 EST 2015
On 2015-11-12 12:49, John Cupitt via openslide-users wrote:
> It's just deepzoom in an uncompressed zip container. The idea is that
> deepzoom is great, because it can be served extremely quickly, but
> also very annoying, since each 256x256 tile is a separate JPEG file.
> If your slide is 100,000 x 100,000 pixels, your highest-resolution
> directory will contain 150,000 files.
So SZI is intended to hold a single pyramidal image with no metadata,
rather than an entire slide?
> This isn't so bad on Linux hosts, but Windows really struggles with
> large directories. Very large numbers of small files can also be
> rather inefficient in disk usage: many filesystems will allocate a
> separate 4kb page for each file, so for 1kb JPEGs, 3kb will be wasted
> per tile.
> Huge directory trees are also rather slow to copy about between hosts,
> especially on Windows.
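The two figures quoted above (tile count and per-file slack) can be checked with a little arithmetic. This is a minimal sketch, not part of any tool: the function names are made up for illustration, and it assumes non-overlapping 256x256 tiles and a filesystem that allocates whole 4 KiB blocks per file.

```python
def full_res_tile_count(width, height, tile=256):
    """Tiles in the highest-resolution level, assuming no overlap."""
    cols = -(-width // tile)   # ceiling division
    rows = -(-height // tile)
    return cols * rows


def slack_bytes(file_size, block=4096):
    """Bytes left unused in the last allocated block (assumed 4 KiB blocks)."""
    return -file_size % block


tiles = full_res_tile_count(100_000, 100_000)
wasted_per_tile = slack_bytes(1024)

print(tiles)             # 391 * 391 = 152881, i.e. roughly 150,000
print(wasted_per_tile)   # 3072 bytes, i.e. 3 KiB per 1 KiB JPEG
```

Lower pyramid levels add more files on top of this, so the total for a whole deepzoom tree is somewhat larger.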
What advantages does this format have over pyramidal tiled TIFF? I'm
seeing a couple of disadvantages:
- File size. I used VIPS d88304a2 to make an SZI file and a TIFF file
like this:

    vips dzsave "CMU-3-40x - 2010-01-12 13.57.09.vms" cmu-3.szi \
        --suffix .jpeg[Q=80] --overlap=0

    vips extract_band "CMU-3-40x - 2010-01-12 13.57.09.vms" \
        cmu-3.tiff[tile,pyramid,tile-width=256,tile-height=256,compression=jpeg,bigtiff,Q=80] \
        0 --n 3

The output files break down as follows:

TIFF:
      38.0 MB - JPEG quantization tables [*]
     827.2 MB - Other JPEG headers and data
       4.5 MB - Remainder of file (TIFF metadata)
     869.7 MB - Total size

ZIP:
       4.5 MB - JPEG JFIF headers [*]
      49.9 MB - JPEG EXIF headers [*]
      38.0 MB - JPEG quantization tables
     950.7 MB - Other JPEG headers and data
      18.3 MB - ZIP member filenames (two copies per member)
      32.3 MB - Remainder of file (ZIP metadata)
    1093.8 MB - Total size

Entries marked [*] are caused by bugs, and are not fundamental
requirements of the formats. ZIP metadata is 50.6 MB (4.9% of the file
size once the bug-induced entries are excluded) vs. 4.5 MB (0.5%) for
TIFF. Also, the ZIP has to include JPEG quantization tables with every
tile, while TIFF can consolidate these into one set of tables per
pyramid level. (Serving tiles from a TIFF with consolidated tables
would require a little extra code in the tile server, but not much, I
think.) The per-tile tables cost 38.0 MB for this sample, so in total
the ZIP has 88.6 MB (8.5%) of overhead.
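
The percentages above can be reproduced from the size breakdown. This is a sketch under one assumption of mine: each percentage is taken relative to the file size with the [*] (bug) entries subtracted, which is the reading that matches the quoted figures. The variable and function names are illustrative only.

```python
# Sizes in MB, copied from the breakdown; True marks a [*] (bug) entry.
tiff_rows = [
    ("JPEG quantization tables", 38.0, True),
    ("Other JPEG headers and data", 827.2, False),
    ("Remainder of file (TIFF metadata)", 4.5, False),
]
zip_rows = [
    ("JPEG JFIF headers", 4.5, True),
    ("JPEG EXIF headers", 49.9, True),
    ("JPEG quantization tables", 38.0, False),
    ("Other JPEG headers and data", 950.7, False),
    ("ZIP member filenames", 18.3, False),
    ("Remainder of file (ZIP metadata)", 32.3, False),
]


def size_without_bugs(rows):
    """Total size if the bug-induced entries were not present."""
    return sum(mb for _, mb, is_bug in rows if not is_bug)


def percent(part, whole):
    return round(100 * part / whole, 1)


tiff_base = size_without_bugs(tiff_rows)
zip_base = size_without_bugs(zip_rows)

zip_metadata = 18.3 + 32.3            # filenames + other ZIP metadata
zip_overhead = zip_metadata + 38.0    # plus per-tile quantization tables

tiff_meta_pct = percent(4.5, tiff_base)
zip_meta_pct = percent(zip_metadata, zip_base)
zip_overhead_pct = percent(zip_overhead, zip_base)

print(tiff_meta_pct, zip_meta_pct, zip_overhead_pct)
```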
- From an interoperability perspective, ZIP is not ideal. The spec is
large, occasionally ambiguous, and has many optional features. For SZI
to be well-defined, a profile of ZIP would need to be specified. (E.g.,
is central directory encryption allowed? Should ZIP64 be enabled
conditionally or unconditionally?) The ZIP format also contains
redundancy (between the local file headers and the central directory)
which tends to lead to implementation errors. I have, on several
different occasions, encountered interoperability problems between
widely-deployed ZIP writers and readers.
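
To illustrate the redundancy mentioned above, here is a small sketch that builds an uncompressed ZIP in memory with Python's zipfile module and reads each member name twice: once from the central directory and once from the local file header. The member names are hypothetical and only imitate a deepzoom layout; this is not a description of how any SZI reader or writer actually works.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"


def local_header_name(data, offset):
    """Read the member name stored in the local file header at offset."""
    if data[offset:offset + 4] != LOCAL_HEADER_SIG:
        raise ValueError("no local file header at this offset")
    # The name and extra-field lengths sit at bytes 26..29 of the
    # 30-byte fixed part of the header; the name follows directly.
    name_len, _extra_len = struct.unpack_from("<HH", data, offset + 26)
    start = offset + 30
    return data[start:start + name_len].decode("utf-8")


# Build a tiny uncompressed archive (hypothetical member names).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("cmu-3_files/0/0_0.jpeg", b"fake tile")
    zf.writestr("cmu-3_files/1/0_0.jpeg", b"another fake tile")
data = buf.getvalue()

# zipfile takes names from the central directory; compare them with
# the copies stored in the local headers.
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    pairs = [
        (info.filename, local_header_name(data, info.header_offset))
        for info in zf.infolist()
    ]

for central_name, local_name in pairs:
    print(central_name, local_name, central_name == local_name)
```

A reader that trusts one copy and a writer that only keeps the other copy correct can disagree about the same archive, which is the kind of implementation error described above.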
--Benjamin Gilbert