A portable OpenSlide viewer for Windows: Smart Zoom Viewer

Priv.-Doz. Dr. M. Weihrauch martin.weihrauch at uni-koeln.de
Tue Nov 17 03:11:27 EST 2015


John will be able to answer that question better than I can as he knows
more about the TIFF format in comparison.

However, our requirements were a) to replace tiled images on a server
with a container, b) high speed of serving tiles and c) low server CPU
load. Thus, we put the deepzoom image pyramid (zoomify is also
supported) in an uncompressed ZIP. The beauty of it is that you can also
take existing pyramids and just zip them (uncompressed).
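As a rough illustration of "just zip them (uncompressed)", here is a minimal Python sketch using the standard zipfile module in stored mode. The function name and the idea of packing the pyramid directory as-is are assumptions for illustration only; any further layout rules that SZI may impose are not covered here.

```python
import os
import zipfile


def zip_pyramid_uncompressed(pyramid_dir, out_path):
    """Pack every file under pyramid_dir into a ZIP without compression.

    Hypothetical helper: it only shows the "stored" (uncompressed) mode
    and does not enforce any SZI-specific layout.
    """
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_STORED,
                         allowZip64=True) as zf:
        for root, _dirs, files in os.walk(pyramid_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # ZIP member names always use forward slashes.
                arcname = os.path.relpath(full, pyramid_dir).replace(os.sep, "/")
                zf.write(full, arcname)
    return out_path
```

Because the members are stored rather than deflated, each tile sits in the container as the same bytes it had on disk, so a server can hand it out without spending CPU time on decompression.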

To achieve high speed, we want as few disk operations as possible to
retrieve a tile from the container. The minimum required number is 2,
which we meet with the following strategy:
For each szi we create a corresponding szd (Smart Zoom directory) file,
which is just a header plus a hash table of offsets/lengths. With that,
we have an offset at which to read the information about the tile from
the szd (first disk access for each tile). The second disk access then
reads the tile from the ZIP at offset X with length Y, and we serve the
tile to the client.

Currently, at Smart In Media, we are using this format for numerous
universities that use our cloud platform, serving thousands of
customers (mostly medical students, but also physicians and biologists).
We can move slides quickly between servers (as each is in 1 file, in
contrast to billions of tiles), and our servers have never shown any
performance problems. Of course, we have to convert the original images
from some format into SZI, but we only have to do that once and can use
it forever.
The serving speed of 1 tile out of an SZI over the internet is only a
little slower than from a tiled image (approx. 10 ms more, i.e. from 25
to 35 ms or so). Of course, we also add caching to speed up frequently
used tiles and to decrease server load.
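To make the caching remark concrete, here is a minimal Python sketch of a least-recently-used tile cache. The LRU policy, the class name TileCache and the max_tiles parameter are all assumptions chosen for illustration; the kind of caching actually used is not specified here.

```python
from collections import OrderedDict


class TileCache:
    """Keep the most recently used tiles in memory (hypothetical sketch)."""

    def __init__(self, loader, max_tiles=1024):
        self._loader = loader        # function: tile key -> tile bytes
        self._max = max_tiles
        self._tiles = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._tiles:
            # Mark as most recently used and skip the disk entirely.
            self._tiles.move_to_end(key)
            self.hits += 1
            return self._tiles[key]
        self.misses += 1
        data = self._loader(key)
        self._tiles[key] = data
        if len(self._tiles) > self._max:
            # Evict the least recently used tile.
            self._tiles.popitem(last=False)
        return data
```

The loader would be whatever reads a tile from the container; low-resolution tiles that every viewer requests first would then mostly be served from memory.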

Again, I don't know too much about TIFF, but if I remember correctly,
you would have at least 3 read accesses instead of 2...

Best regards

Martin

On 17.11.2015 at 04:33, Benjamin Gilbert via openslide-users wrote:
> On 2015-11-12 12:49, John Cupitt via openslide-users wrote:
>> It's just deepzoom in an uncompressed zip container. The idea is that
>> deepzoom is great, because it can be served extremely quickly, but
>> also very annoying, since each 256x256 tile is a separate JPEG file.
>> If your slide is 100,000 x 100,000 pixels, your highest-resolution
>> directory will contain 150,000 files.
>
> So SZI is intended to hold a single pyramidal image with no metadata,
> rather than an entire slide?
>
>> This isn't so bad on Linux hosts, but Windows really struggles with
>> large directories. Very large numbers of small files can also be
>> rather inefficient in disk usage: many filesystems will allocate a
>> separate 4kb page for each file, so for 1kb JPEGs, 3kb will be wasted
>> per tile.
>> Huge directory trees are also rather slow to copy about between hosts,
>> especially on Windows.
>
> What advantages does this format have over pyramidal tiled TIFF?  I'm
> seeing a couple disadvantages:
>
> - File size.  I used VIPS d88304a2 to make an SZI file and a TIFF file
> like this:
>
> vips dzsave "CMU-3-40x - 2010-01-12 13.57.09.vms" cmu-3.szi \
>     --suffix .jpeg[Q=80] --overlap=0
> vips extract_band "CMU-3-40x - 2010-01-12 13.57.09.vms" \
>     cmu-3.tiff[tile,pyramid,tile-width=256,tile-height=256,compression=jpeg,bigtiff,Q=80] \
>     0 --n 3
>
> The output files break down as follows:
>
> TIFF:
>    38.0 MB - JPEG quantization tables [*]
>   827.2 MB - Other JPEG headers and data
>     4.5 MB - Remainder of file (TIFF metadata)
>   869.7 MB - Total size
>
> ZIP:
>     4.5 MB - JPEG JFIF headers [*]
>    49.9 MB - JPEG EXIF headers [*]
>    38.0 MB - JPEG quantization tables
>   950.7 MB - Other JPEG headers and data
>    18.3 MB - ZIP member filenames (two copies per member)
>    32.3 MB - Remainder of file (ZIP metadata)
>  1093.8 MB - Total size
>
> [*] entries are caused by bugs, and are not fundamental requirements
> of the formats.  ZIP metadata is 50.6 MB (4.9% excluding bugs) vs. 4.5
> MB (0.5%) for TIFF.  Also, the ZIP has to include JPEG quantization
> tables with every tile, while TIFF can consolidate these into one set
> of tables per pyramid level.  (Supporting that would require a little
> extra code in the tile server, but not much, I think.)  This costs
> 38.0 MB for this sample, so in total the ZIP has 88.6 MB (8.5%) of
> overhead.
>
> - From an interoperability perspective, ZIP is not ideal.  The spec is
> large, occasionally ambiguous, and has many optional features.  For
> SZI to be well-defined, a profile of ZIP would need to be specified. 
> (E.g., is central directory encryption allowed?  Should ZIP64 be
> enabled conditionally or unconditionally?)  The ZIP format also
> contains redundancy (between the local file headers and the central
> directory) which tends to lead to implementation errors.  I have, on
> several different occasions, encountered interoperability problems
> between widely-deployed ZIP writers and readers.
>
> --Benjamin Gilbert
>
> _______________________________________________
> openslide-users mailing list
> openslide-users at lists.andrew.cmu.edu
> https://lists.andrew.cmu.edu/mailman/listinfo/openslide-users
>


-- 
---------------------------------
Priv.-Doz. Dr. med. Martin Weihrauch
Specialist in Internal Medicine,
Hematology and Medical Oncology
Medical Director of the
MVZ der Uniklinik Köln
50924 Köln
Tel: 0221-47886615
Fax: 0221-47886616


