Introduction to MIRAX/MRXS

Tue Jul 24 01:16:31 EDT 2012

I've been asked to write some introductory material on the MIRAX format, 
since the documentation on the OpenSlide website is incomplete and 
occasionally cryptic.  Here it is.  Comments welcome.

There are two parts to this story: the tile structure of MIRAX slides 
and the layout of the data on disk.  Both parts can be fully appreciated 
only with a certain sense of humor or a certain type of beverage. 
Throughout, I will be using the CMU-1.mrxs slide to illustrate.

A note on terminology: the vendor calls the format MRXS.  For historical 
reasons, OpenSlide calls it MIRAX.

Tile structure
--------------

The scanner works by taking multiple photos of the slide as the camera 
moves past the glass.  (Or as the glass moves past the camera; I'm not 
sure which.)  The scanner tries to overlap these photos by an amount 
specified in the OVERLAP_X and OVERLAP_Y Slidedat keys, stated in 
pixels.  However, the camera's movements are not very precise, so in 
fact the position of each photo will differ slightly from what the 
nominal overlap values would suggest.

On older scanners, the hardware knew the position of the camera with 
high precision, even though it couldn't move it very accurately.  These 
positions were recorded in the slide file in the 
VIMSLIDE_POSITION_BUFFER.default non-hierarchical section and used to 
properly position each photo.  I suspect that newer scanners cannot 
detect the position of the camera, relying instead on post-processing to 
detect the degree of overlap between adjacent photos.  This would 
explain the format change between version 1.9 and 2.2 slides.

The camera's photos are fairly high-resolution, too large to be 
practically used as image tiles.  So, on disk, they are broken up into 
multiple JPEGs, N on a side.  In CMU-1.mrxs and many other slides, N is 
4 (it's the GENERAL.CameraImageDivisionsPerSide value in the Slidedat), 
so there are 16 tiles per camera position.  This is in level 0, the 
highest-resolution level.

Each numerically higher (lower-resolution) level concatenates four tiles 
from the previous level in a 2x2 grid and scales the image down by a 
factor of 2.  So level 1 has 4 tiles per camera position and level 2 has 1.

Level 3 then has to concatenate tiles corresponding to *different* 
camera positions.  And indeed it does, in the exact same way: a 2x2 
grid.  But those camera positions overlap!  So in the middle of the new 
concatenated tile is a block of garbage: 15 pixels, nominally, which are 
redundant with the 15 pixels next to them.  Of course the actual number 
of pixels of garbage depends on how much the camera positions overlap, 
which varies from photo to photo.

This problem gets worse and worse as we move through the levels.  By the 
time we get to level 9, each 340x256 pixel tile has 127 blocks of 
garbage in each dimension, each of which is nominally 0.234375 pixels 
wide.  In order to render this tile, we have to separately extract the 
pixels corresponding to each camera position -- many of which are 
fractional pixels due to repeated downsampling -- and render them in 
their correct positions at sub-pixel resolution.  By the nature of 
sub-pixel image manipulation, the result can only be an approximation of 
a cleanly-downsampled image.
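
To make the arithmetic above concrete, here is a small Python sketch of 
the nominal geometry of one stored tile at a given level.  It assumes 
N = 4 and a nominal level 0 overlap of 120 pixels.  The 120 is not read 
from any file: it is inferred from the 15-pixel figure at level 3 
(15 * 2^3), and a real reader would take it from OVERLAP_X.  The function 
name level_geometry is made up for this example.

```python
from fractions import Fraction


def level_geometry(level, divisions_per_side=4, overlap_level0=120):
    """Nominal geometry of one stored tile at the given pyramid level.

    Returns (camera positions per tile side, overlap seams per axis,
    nominal seam width in pixels at this level).  overlap_level0 is an
    assumed value standing in for OVERLAP_X.
    """
    # Number of level 0 tiles spanned by one tile of this level, per axis.
    tiles_per_side = 2 ** level
    # Below level 2 (for N = 4) a tile covers only part of one photo.
    positions = max(1, tiles_per_side // divisions_per_side)
    seams = positions - 1
    # Each level halves the image, so the overlap shrinks by 2 per level.
    width = Fraction(overlap_level0, 2 ** level)
    return positions, seams, width


for level in (2, 3, 6, 9):
    positions, seams, width = level_geometry(level)
    print(level, positions, seams, float(width))
```

Fractions are used only so that the seam widths stay exact; the point 
is that the seam count grows as the seam width shrinks below one pixel.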

All other slide formats supported by OpenSlide process any overlaps 
during the scanning process, before generating reduced-resolution 
levels.  MIRAX is the only supported format which defers the processing 
of image overlaps to the viewer application, and it is what drove 
OpenSlide to depend so extensively on the Cairo graphics library.

On-disk format
--------------

The MIRAX on-disk format is complicated, full of 
things-pointing-to-other-things.  The format stores two types of data: 
hierarchical data (that is, pyramidal images: the actual slide data, 
plus some other stuff we don't decode), and non-hierarchical data 
(thumbnail images, etc.).  Each type of data is stored in a tree 
structure dedicated to that type, and finding a block of data requires 
us to traverse a lot of pointers.

Let's say we want to draw a single JPEG tile from the seventh pyramid 
level of CMU-1.mrxs.  We do the following:

1.  We start with the [HIERARCHICAL] section in Slidedat.ini.  We want 
to read the image pyramid, which is hierarchical data, so we look at the 
HIER_* keys.  HIER_COUNT is 3, so there are three hier trees.  We read 
each HIER_%d_NAME key, for %d from 0 to 2, until we find one with a 
value of "Slide zoom level".  We've now discovered that we want HIER_0.

2.  HIER_0_COUNT is 10, so this hier tree has ten leaves, each 
corresponding to a pyramid level.  We want to read from the seventh 
pyramid level, so we read the HIER_0_VAL_6_SECTION key to get the name 
of a different Slidedat section: in this case, LAYER_0_LEVEL_6_SECTION.

3.  We look at LAYER_0_LEVEL_6_SECTION.  There we find some values that 
may be useful: the nominal camera position overlap for this level (1.875 
pixels), MICROMETER_PER_PIXEL values, etc.  But this doesn't help us 
find the image data.

4.  To locate the image data, we need to look at the Index.dat. 
Index.dat begins with a version string and a UUID.  Immediately after 
that are two 4-byte pointer values (little-endian) which we call the 
hier_root and the nonhier_root.  They give the locations within the 
index file of, respectively, the hierarchical and non-hierarchical 
offset tables.  We seek to the location specified by the hier_root.

5.  The offset table is an array of, again, 4-byte little-endian 
pointers.  Now we need to determine which entry to read.  If we were to 
build a flat list of all of the HIER_0 sections in numerical order, 
followed by the HIER_1 sections, etc., the entry we need would 
correspond to our section's position in that list.  In this case we need 
the seventh entry.  We seek to the location it points to.

6.  Here we find a linked list of data pages.  Each page begins with two 
4-byte values (little-endian as always): the number of data entries in 
the page and the address of the next page (or 0 if this is the end of 
the list).  For some reason, the initial page in the list always has 0 
data entries.  We follow the pointer to the next page.

7.  Now we have a page with entries in it.  Each entry consists of four 
4-byte integers: the tile index, offset, length, and file number.  The 
file number is an index into the array of filenames formed by the 
[DATAFILE] Slidedat section, and tells us which file to read.  The 
offset and length tell us what bytes to read out of that file.  So all 
we have to do is traverse the linked list until we find the tile index 
we want.  Now we need to calculate that tile index.

8.  The tile index is defined as (y * tiles_across + x), where 
tiles_across is really GENERAL.IMAGENUMBER_X from the Slidedat file (352 
for CMU-1.mrxs).  That's fine for level 0.  In higher levels, x and y 
are always multiples of 2^level to account for the lower number of JPEG 
tiles.  So if we want the tile at position (3, 4) within level 6, we 
need tile index (4 << 6) * 352 + (3 << 6) = 90304.  (This tile may not 
even exist.  If the scanning software determines that a particular tile 
is blank, it omits the tile entirely.  At higher levels, a tile exists 
if any of its constituent level 0 tiles exists.)

9.  Suppose the tile does exist.  Now we can finally read out the data 
for a single 340x256 JPEG.  Hooray!  Now all we need to do is extract 
and render 256 subtiles (a level 6 tile spans 16 camera positions on 
each axis) to account for the 15 overlapped regions on each axis of the 
tile.  Of course, in order to know exactly *where* to render those 
subtiles within the output image, we need to know the exact position of 
the camera when it produced each subtile.

10.  The camera position map is stored in a non-hierarchical section 
called "default" in a tree called "VIMSLIDE_POSITION_BUFFER".  To find 
it, we need to start all the way back at the top, in the Slidedat file.
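
Before that detour, steps 4 through 8 can be sketched in Python roughly 
as follows.  This is an illustration, not OpenSlide's code.  It assumes 
the caller already knows the combined length of the version string and 
UUID (header_len), that page entries directly follow the two-value page 
header, and that all values can be read as unsigned.  The function names 
and the tiny synthetic index built by build_demo_index are invented for 
the example; the offset, length, and file number in it are arbitrary.

```python
import struct


def read_u32(buf, pos):
    """Read one 4-byte little-endian integer at byte offset pos."""
    return struct.unpack_from("<I", buf, pos)[0]


def tile_index(x, y, level, tiles_across):
    """Step 8: x and y are tile coordinates within the given level."""
    return (y << level) * tiles_across + (x << level)


def find_tile(index, header_len, section, wanted):
    """Steps 4-7: return (offset, length, file_number), or None."""
    hier_root = read_u32(index, header_len)  # nonhier_root follows it
    page = read_u32(index, hier_root + 4 * section)
    while page != 0:
        count = read_u32(index, page)
        next_page = read_u32(index, page + 4)
        for i in range(count):
            # Assumed layout: entries start right after the page header.
            tile, offset, length, fileno = struct.unpack_from(
                "<4I", index, page + 8 + 16 * i)
            if tile == wanted:
                return offset, length, fileno
        page = next_page
    return None


def build_demo_index():
    """Build a tiny synthetic index (not real slide data)."""
    header = b"VVVVV" + b"U" * 32      # placeholder version and UUID
    hier_root = len(header) + 8        # table follows the two roots
    sections = 10
    head_page = hier_root + 4 * sections
    data_page = head_page + 8
    table = [0] * sections
    table[6] = head_page               # only level 6 is populated
    buf = header
    buf += struct.pack("<2I", hier_root, 0)       # nonhier_root unused
    buf += struct.pack("<10I", *table)
    buf += struct.pack("<2I", 0, data_page)       # empty initial page
    buf += struct.pack("<2I", 1, 0)               # one entry, end of list
    buf += struct.pack("<4I", 90304, 4096, 2048, 1)
    return buf, len(header)


index, header_len = build_demo_index()
wanted = tile_index(3, 4, 6, 352)
print(wanted, find_tile(index, header_len, 6, wanted))
```

A missing tile simply falls off the end of the list, which is why 
find_tile returns None rather than raising an error.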

Aside: Reading non-hierarchical sections
----------------------------------------

10a.  We again start with the Slidedat [HIERARCHICAL] section.  By 
traversing NONHIER_COUNT, NONHIER_%d_NAME, NONHIER_%d_COUNT, and 
NONHIER_%d_VAL_%d, we eventually find our nonhier section at 
NONHIER_3_VAL_0.  So the index into the nonhier offset table is 
NONHIER_0_COUNT + NONHIER_1_COUNT + NONHIER_2_COUNT + 0 = 12.

10b.  We read the Index.dat as before: nonhier_root to nonhier offset 
table to linked list head.  Again the first page in the linked list has 
no data entries.  The second page has one entry and a 0 next pointer. 
The data entry itself is an array of five 4-byte values: 0, 0, offset, 
length, and file number.  Good enough!  Now we can look up the file 
number in the [DATAFILE] Slidedat section, read out the data, and if we 
were reading the nonhier section for a thumbnail or barcode image, we'd 
be done.  But we're not.  We still need to decode the slide position file.
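
The offset-table index used in steps 5 and 10a is just a running sum of 
the COUNT keys, which a short Python sketch can show.  The Slidedat text 
below is a synthetic stand-in, not the real CMU-1.mrxs file: the 
placeholder tree names, NONHIER_COUNT, and the individual NONHIER counts 
are invented, chosen only so that they add up to the 12 quoted above. 
The helper name flat_index is also made up.

```python
import configparser

# Synthetic stand-in for Slidedat.ini.  Only the keys needed here are
# present; placeholder names and the NONHIER counts are invented.
SAMPLE = """\
[HIERARCHICAL]
HIER_COUNT = 3
HIER_0_NAME = Slide zoom level
HIER_0_COUNT = 10
HIER_0_VAL_6_SECTION = LAYER_0_LEVEL_6_SECTION
HIER_1_NAME = Placeholder hier tree 1
HIER_1_COUNT = 1
HIER_2_NAME = Placeholder hier tree 2
HIER_2_COUNT = 1
NONHIER_COUNT = 4
NONHIER_0_NAME = Placeholder nonhier tree 0
NONHIER_0_COUNT = 4
NONHIER_1_NAME = Placeholder nonhier tree 1
NONHIER_1_COUNT = 5
NONHIER_2_NAME = Placeholder nonhier tree 2
NONHIER_2_COUNT = 3
NONHIER_3_NAME = VIMSLIDE_POSITION_BUFFER
NONHIER_3_COUNT = 1
NONHIER_3_VAL_0 = default
"""


def flat_index(section, prefix, tree_name, value_index):
    """Offset-table index of entry value_index in the named tree.

    prefix is "HIER" or "NONHIER".  Entries of earlier trees come
    first, so their counts are summed until the tree is found.
    """
    index = 0
    for tree in range(int(section[f"{prefix}_COUNT"])):
        if section[f"{prefix}_{tree}_NAME"] == tree_name:
            return index + value_index
        index += int(section[f"{prefix}_{tree}_COUNT"])
    raise KeyError(tree_name)


parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep key names case-sensitive
parser.read_string(SAMPLE)
hier = parser["HIERARCHICAL"]

print(flat_index(hier, "HIER", "Slide zoom level", 6))
print(flat_index(hier, "NONHIER", "VIMSLIDE_POSITION_BUFFER", 0))
print(hier["HIER_0_VAL_6_SECTION"])
```

The first call gives index 6 (the seventh entry) for pyramid level 6, 
and the second gives the sum of the three preceding NONHIER counts.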

Tile decoding, part II
----------------------

11.  The slide position file is an array of 9-byte entries, one for each 
camera position, in row-major order.  Each entry consists of a flag byte 
of unknown purpose (which is always 0 or 1) and two 4-byte signed 
integers representing the level 0 X and Y pixel coordinates of the 
camera position.  (Negative coordinate values do occasionally occur.) 
If a camera position was omitted from the slide file because its region 
was empty, its coordinate values will be garbage or 0.  So, to finally 
draw our tile, all we need to do is read the camera positions for each 
of its 256 subtiles which have corresponding tiles in level 0, divide 
their coordinates by 2^level, and render away!
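
Decoding the slide position file described in step 11 might look like 
this in Python.  The sketch assumes the two integers are little-endian 
like everything in Index.dat, which the description above does not state 
outright.  The positions_across parameter (camera positions per row) is 
a hypothetical input the caller must supply, and the sample bytes are 
invented rather than taken from a real slide.

```python
import struct

# Flag byte, then X and Y as signed 32-bit integers.  The "<" prefix
# disables alignment padding, so the entry is exactly 9 bytes.
ENTRY = struct.Struct("<Bii")


def read_positions(buf, positions_across):
    """Return {(col, row): (x, y)} in level 0 pixel coordinates.

    Entries are in row-major order.  The flag byte is ignored because
    its purpose is unknown.
    """
    result = {}
    for n, (_flag, x, y) in enumerate(ENTRY.iter_unpack(buf)):
        col, row = n % positions_across, n // positions_across
        result[(col, row)] = (x, y)
    return result


def position_at_level(pos, level):
    """Scale a level 0 position down to the given level (fractional)."""
    x, y = pos
    return x / 2 ** level, y / 2 ** level


# Invented 2x2 grid of camera positions, including a negative value.
demo = b"".join(ENTRY.pack(*e) for e in [
    (1, 0, 0), (1, 1290, -3),
    (0, 0, 0), (1, 5, 970),
])
positions = read_positions(demo, positions_across=2)
print(positions[(1, 0)], position_at_level(positions[(1, 0)], 6))
```

The scaled coordinates are deliberately left as floats, since the 
whole point is that subtiles land on fractional pixel positions.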

Epilogue
--------

With MIRAX, as with all formats, OpenSlide actually loads all of the 
pertinent slide metadata before openslide_open() returns.  At runtime 
(that is, during openslide_read_region()) it can simply look up subtile 
positions in memory, do lots and lots of compositing, and return the 
desired pixels.

MRXS files that are generated by the Export function of the vendor's 
viewer application don't have any overlaps, because the viewer is kind 
enough to preprocess them away.  In this case there is no 
VIMSLIDE_POSITION_BUFFER, no nominal overlaps, and OpenSlide skips the 
subtile processing for greater performance.  The application also has a 
"Save" command which can produce a downsampled version of a slide; the 
resulting slide simply omits the requisite number of bottom levels, 
divides all of the coordinate values in the slide position file by 
2^levels_skipped, and updates the IMAGE_CONCAT_FACTOR of the now-lowest 
level to reflect the number of levels that were skipped.

Please use caution when depending on any of the details described above, 
as some of them are from memory and may have shifted during flight. 
Almost all of the above was discovered by Adam Goode, who has more 
patience than I do; all errors of narrative are mine; all design choices 
are the original vendor's.  Now, if you'll excuse me, I need to go find 
a certain type of beverage.

--Benjamin Gilbert