Introduction to MIRAX/MRXS

Tue Jul 24 08:29:01 EDT 2012

Awesome, thanks Benjamin!

-----Original Message-----
From: openslide-users-bounces+jminnie=live.com at lists.andrew.cmu.edu
[mailto:openslide-users-bounces+jminnie=live.com at lists.andrew.cmu.edu] On
Behalf Of Benjamin Gilbert
Sent: Tuesday, July 24, 2012 12:17 AM
To: OpenSlide Users
Subject: Introduction to MIRAX/MRXS

I've been asked to write some introductory material on the MIRAX format,
since the documentation on the OpenSlide website is incomplete and
occasionally cryptic.  Here it is.  Comments welcome.

There are two parts to this story: the tile structure of MIRAX slides and
the layout of the data on disk.  Both parts can be fully appreciated only
with a certain sense of humor or a certain type of beverage. 
Throughout, I will be using the CMU-1.mrxs slide to illustrate.

A note on terminology: the vendor calls the format MRXS.  For historical
reasons, OpenSlide calls it MIRAX.

Tile structure
--------------

The scanner works by taking multiple photos of the slide as the camera moves
past the glass.  (Or as the glass moves past the camera; I'm not sure
which.)  The scanner tries to overlap these photos by an amount specified in
the OVERLAP_X and OVERLAP_Y Slidedat keys, stated in pixels.  However, the
camera's movements are not very precise, so in fact the position of each
photo will be slightly different than the nominal overlap values would
suggest.

On older scanners, the hardware knew the position of the camera with high
precision, even though it couldn't move it very accurately.  These positions
were recorded in the slide file in the VIMSLIDE_POSITION_BUFFER.default
non-hierarchical section and used to properly position each photo.  I
suspect that newer scanners cannot detect the position of the camera,
relying instead on post-processing to detect the degree of overlap between
adjacent photos.  This would explain the format change between version 1.9
and 2.2 slides.

The camera's photos are fairly high-resolution, too large to be practically
used as image tiles.  So, on disk, they are broken up into multiple JPEGs, N
on a side.  In CMU-1.mrxs and many other slides, N is 4 (it's the
GENERAL.CameraImageDivisionsPerSide value in the Slidedat), so there are 16
tiles per camera position.  This is in level 0, the highest-resolution
level.

Each numerically higher (lower-resolution) level concatenates four tiles
from the previous level in a 2x2 grid and scales the image down by a factor
of 2.  So level 1 has 4 tiles per camera position and level 2 has 1.

Level 3 then has to concatenate tiles corresponding to *different* camera
positions.  And indeed it does, in the exact same way: a 2x2 grid.  But
those camera positions overlap!  So in the middle of the new concatenated
tile is a block of garbage: 15 pixels, nominally, which are redundant with
the 15 pixels next to them.  Of course the actual number of pixels of
garbage depends on how much the camera positions overlap, which varies from
photo to photo.

This problem gets worse and worse as we move through the levels.  By the
time we get to level 9, each 340x256 pixel tile has 127 blocks of garbage in
each dimension, each of which is nominally 0.234375 pixels wide.  In order
to render this tile, we have to separately extract the pixels corresponding
to each camera position -- many of which are fractional pixels due to
repeated downsampling -- and render them in their correct positions at
sub-pixel resolution.  By the nature of sub-pixel image manipulation, the
result can only be an approximation of a cleanly-downsampled image.
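
The arithmetic behind those numbers can be sketched as follows.  This is
only an illustration, not OpenSlide code: the function names are made up,
and the level 0 overlap of 120 pixels is an assumption back-calculated from
the nominal 15 pixels at level 3 (15 * 2^3), not a value read from the
slide.

```python
def camera_positions_per_axis(level, divisions_per_side=4):
    """Camera positions covered by one tile along one axis at `level`.

    Below level 2 (for N = 4) a tile is only a fraction of one photo,
    so the count never drops under 1.
    """
    return max(1, 2 ** level // divisions_per_side)


def seams_per_axis(level, divisions_per_side=4):
    """Overlap regions ("blocks of garbage") inside one tile, per axis."""
    return camera_positions_per_axis(level, divisions_per_side) - 1


def nominal_overlap(level, level0_overlap=120.0):
    """Nominal overlap width in pixels after `level` halvings.

    level0_overlap=120.0 is an assumed value; see the note above.
    """
    return level0_overlap / 2 ** level


for level in (3, 6, 9):
    print(level, seams_per_axis(level), nominal_overlap(level))
```

The seam count is simply one less than the number of camera positions that
the tile spans along that axis.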

All other slide formats supported by OpenSlide process any overlaps during
the scanning process, before generating reduced-resolution levels.  MIRAX is
the only supported format which defers the processing of image overlaps to
the viewer application, and it is what drove OpenSlide to depend so
extensively on the Cairo graphics library.

On-disk format
--------------

The MIRAX on-disk format is complicated, full of
things-pointing-to-other-things.  The format stores two types of data: 
hierarchical data (that is, pyramidal images: the actual slide data, plus
some other stuff we don't decode), and non-hierarchical data (thumbnail
images, etc.).  Each type of data is stored in a tree structure dedicated to
that type, and finding a block of data requires us to traverse a lot of
pointers.

Let's say we want to draw a single JPEG tile from the seventh pyramid level
of CMU-1.mrxs.  We do the following:

1.  We start with the [HIERARCHICAL] section in Slidedat.ini.  We want to
read the image pyramid, which is hierarchical data, so we look at the
HIER_* keys.  HIER_COUNT is 3, so there are three hier trees.  We read each
HIER_%d_NAME key, for %d from 0 to 2, until we find one with a value of
"Slide zoom level".  We've now discovered that we want HIER_0.

2.  HIER_0_COUNT is 10, so this hier tree has ten leaves, each corresponding
to a pyramid level.  We want to read from the seventh pyramid level, so we
read the HIER_0_VAL_6_SECTION key to get the name of a different Slidedat
section: in this case, LAYER_0_LEVEL_6_SECTION.

3.  We look at LAYER_0_LEVEL_6_SECTION.  There we find some values that may
be useful: the nominal camera position overlap for this level (1.875
pixels), MICROMETER_PER_PIXEL values, etc.  But this doesn't help us find
the image data.

4.  To locate the image data, we need to look at the Index.dat. 
Index.dat begins with a version string and a UUID.  Immediately after that
are two 4-byte pointer values (little-endian) which we call the hier_root
and the nonhier_root.  They give the locations within the index file of,
respectively, the hierarchical and non-hierarchical offset tables.  We seek
to the location specified by the hier_root.

5.  The offset table is an array of, again, 4-byte little-endian pointers.
Now we need to determine which entry to read.  If we were to build a flat
list of all of the HIER_0 sections in numerical order, followed by the
HIER_1 sections, etc., the entry we need would correspond to our section's
position in that list.  In this case we need the seventh entry.  We seek to
that location.

6.  Here we find a linked list of data pages.  Each page begins with two
4-byte values (little-endian as always): the number of data entries in the
page and the address of the next page (or 0 if this is the end of the list).
For some reason, the initial page in the list always has 0 data entries.  We
follow the pointer to the next page.

7.  Now we have a page with entries in it.  Each entry consists of four
4-byte integers: the tile index, offset, length, and file number.  The file
number is an index into the array of filenames formed by the [DATAFILE]
Slidedat section, and tells us which file to read.  The offset and length
tell us what bytes to read out of that file.  So all we have to do is
traverse the linked list until we find the tile index we want.  Now we need
to calculate that tile index.

8.  The tile index is defined as (y * tiles_across + x), where tiles_across
is really GENERAL.IMAGENUMBER_X from the Slidedat file (352 for this slide).
That's fine for level 0.  In higher levels, x and y are always multiples of
2^level to account for the lower number of JPEG tiles.  So if we want the
tile at position (3, 4) within level 6, we need tile index
(4 << 6) * 352 + (3 << 6) = 90304.  (This tile may not even exist.  If the
scanning software determines that a particular tile is blank, it omits the
tile entirely.  At higher levels, a tile exists if any of its constituent
level 0 tiles exists.)

9.  Suppose the tile does exist.  Now we can finally read out the data for a
single 340x256 JPEG.  Hooray!  Now all we need to do is extract and render
256 subtiles to account for the 15 overlapped regions on each axis of the
tile.  Of course, in order to know exactly *where* to render those subtiles
within the output image, we need to know the exact position of the camera
when it produced each subtile.

10.  The camera position map is stored in a non-hierarchical section called
"default" in a tree called "VIMSLIDE_POSITION_BUFFER".  To find it, we need
to start all the way back at the top, in the Slidedat file.
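
The Index.dat walk in steps 4 through 8 can be sketched roughly as below.
This is a minimal illustration under stated assumptions, not OpenSlide's
implementation.  All function names are hypothetical.  The caller must
supply roots_offset (the combined length of the version string and UUID,
which is not given above).  The 4-byte values are treated as unsigned,
which is also an assumption.

```python
import struct


def read_u32(f):
    """Read one 4-byte little-endian value (assumed unsigned)."""
    return struct.unpack("<I", f.read(4))[0]


def read_roots(f, roots_offset):
    """Step 4: return (hier_root, nonhier_root).

    roots_offset is the length of the version string plus the UUID.
    """
    f.seek(roots_offset)
    return read_u32(f), read_u32(f)


def iter_entries(f, table_root, record, fields=4):
    """Steps 5-7: yield every data entry in the linked list for `record`.

    `record` is the position in the offset table; `fields` is the number
    of 4-byte values per entry (four for hierarchical entries).
    """
    f.seek(table_root + 4 * record)
    page = read_u32(f)                 # address of the list head
    while page != 0:                   # 0 marks the end of the list
        f.seek(page)
        count = read_u32(f)            # the head page has 0 entries
        next_page = read_u32(f)
        data = f.read(4 * fields * count)
        yield from struct.iter_unpack("<%dI" % fields, data)
        page = next_page


def tile_index(x, y, level, tiles_across):
    """Step 8: x and y are tile coordinates within `level`."""
    return (y << level) * tiles_across + (x << level)


def find_tile(f, hier_root, record, wanted):
    """Return (offset, length, file number) for a tile, or None."""
    for index, offset, length, fileno in iter_entries(f, hier_root, record):
        if index == wanted:
            return offset, length, fileno
    return None                        # blank tiles are simply absent
```

For the example in step 8, tile_index(3, 4, 6, 352) gives 90304, and
find_tile() returning None corresponds to the case where the scanner
omitted the tile.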

Aside: Reading non-hierarchical sections
----------------------------------------

10a.  We again start with the Slidedat [HIERARCHICAL] section.  By
traversing NONHIER_COUNT, NONHIER_%d_NAME, NONHIER_%d_COUNT, and
NONHIER_%d_VAL_%d, we eventually find our nonhier section at
NONHIER_3_VAL_0.  So the index into the nonhier offset table is
NONHIER_0_COUNT + NONHIER_1_COUNT + NONHIER_2_COUNT + 0 = 12.

10b.  We read the Index.dat as before: nonhier_root to nonhier offset table
to linked list head.  Again the first page in the linked list has no data
entries.  The second page has one entry and a 0 next pointer. 
The data entry itself is an array of five 4-byte values: 0, 0, offset,
length, and file number.  Good enough!  Now we can look up the file number
in the [DATAFILE] Slidedat section, read out the data, and if we were
reading the nonhier section for a thumbnail or barcode image, we'd be done.
But we're not.  We still need to decode the slide position file.
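
The offset-table index used in steps 5 and 10a is the same calculation for
both trees, and can be sketched with Python's configparser.  This is an
illustration only.  The function name is made up, and the sample below is a
synthetic excerpt: apart from "Slide zoom level", HIER_COUNT, HIER_0_COUNT
and "VIMSLIDE_POSITION_BUFFER", every name and count in it is invented so
that the three preceding nonhier counts add up to 12.

```python
import configparser


def flat_index(section, prefix, tree_name, leaf):
    """Offset-table index of leaf number `leaf` in the tree `tree_name`.

    `section` is the [HIERARCHICAL] section; `prefix` is "HIER" or
    "NONHIER".
    """
    index = 0
    for tree in range(int(section[f"{prefix}_COUNT"])):
        if section[f"{prefix}_{tree}_NAME"] == tree_name:
            return index + leaf
        # Skip over every leaf of the trees that come before ours.
        index += int(section[f"{prefix}_{tree}_COUNT"])
    raise KeyError(tree_name)


# Synthetic excerpt; most values here are made up for illustration.
SAMPLE = """
[HIERARCHICAL]
HIER_COUNT = 3
HIER_0_NAME = Slide zoom level
HIER_0_COUNT = 10
HIER_1_NAME = Hypothetical tree X
HIER_1_COUNT = 1
HIER_2_NAME = Hypothetical tree Y
HIER_2_COUNT = 1
NONHIER_COUNT = 4
NONHIER_0_NAME = Hypothetical tree A
NONHIER_0_COUNT = 4
NONHIER_1_NAME = Hypothetical tree B
NONHIER_1_COUNT = 5
NONHIER_2_NAME = Hypothetical tree C
NONHIER_2_COUNT = 3
NONHIER_3_NAME = VIMSLIDE_POSITION_BUFFER
NONHIER_3_COUNT = 1
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
hier = parser["HIERARCHICAL"]

print(flat_index(hier, "HIER", "Slide zoom level", 6))             # 6
print(flat_index(hier, "NONHIER", "VIMSLIDE_POSITION_BUFFER", 0))  # 12
```

Index 6 is the seventh entry of the hier offset table from step 5, and 12
is the nonhier index from step 10a.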

Tile decoding, part II
----------------------

11.  The slide position file is an array of 9-byte entries, one for each
camera position, in row-major order.  Each entry consists of a flag byte of
unknown purpose (which is always 0 or 1) and two 4-byte signed integers
representing the level 0 X and Y pixel coordinates of the camera position.
(Negative coordinate values do occasionally occur.) If a camera position was
omitted from the slide file because its region was empty, its coordinate
values will be garbage or 0.  So, to finally draw our tile, all we need to
do is read the camera positions for each of its 256 subtiles which have
corresponding tiles in level 0, divide their coordinates by 2^level, and
render away!
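
Decoding the slide position file might look something like this.  It is a
sketch under assumptions, not OpenSlide's code: the function names are
hypothetical, the integers are assumed to be little-endian like those in
Index.dat, and positions_across (the number of camera positions per row)
has to be supplied by the caller.  For CMU-1.mrxs that would presumably be
IMAGENUMBER_X divided by CameraImageDivisionsPerSide.

```python
import struct


def parse_positions(data, positions_across):
    """Map (column, row) of each camera position to (flag, x, y).

    Each entry is 9 bytes: one flag byte, then two signed 4-byte
    integers holding level 0 pixel coordinates.  Entries are in
    row-major order.  Coordinates of omitted positions may be garbage
    or 0, so check against the level 0 tiles before trusting them.
    """
    positions = {}
    for i, (flag, x, y) in enumerate(struct.iter_unpack("<Bii", data)):
        column, row = i % positions_across, i // positions_across
        positions[(column, row)] = (flag, x, y)
    return positions


def position_at_level(x, y, level):
    """Scale level 0 coordinates to `level`; results may be fractional."""
    return x / 2 ** level, y / 2 ** level
```

The "<" prefix in the format string disables padding, so each "<Bii"
record is exactly 9 bytes; iter_unpack raises struct.error if the data
length is not a multiple of that.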

Epilogue
--------

With MIRAX, as with all formats, OpenSlide actually loads all of the 
pertinent slide metadata before openslide_open() returns.  At runtime 
(that is, during openslide_read_region()) it can simply look up subtile 
positions in memory, do lots and lots of compositing, and return the 
desired pixels.

MRXS files that are generated by the Export function of the vendor's 
viewer application don't have any overlaps, because the viewer is kind 
enough to preprocess them away.  In this case there is no 
VIMSLIDE_POSITION_BUFFER, no nominal overlaps, and OpenSlide skips the 
subtile processing for greater performance.  The application also has a 
"Save" command which can produce a downsampled version of a slide; the 
resulting slide simply omits the requisite number of bottom levels, 
divides all of the coordinate values in the slide position file by 
2^levels_skipped, and updates the IMAGE_CONCAT_FACTOR of the now-lowest 
level to reflect the number of levels that were skipped.

Please use caution when depending on any of the details described above, 
as some of them are from memory and may have shifted during flight. 
Almost all of the above was discovered by Adam Goode, who has more 
patience than I do; all errors of narrative are mine; all design choices 
are the original vendor's.  Now, if you'll excuse me, I need to go find 
a certain type of beverage.

--Benjamin Gilbert
