Introduction to MIRAX/MRXS
Benjamin Gilbert
bgilbert at cs.cmu.edu
Tue Jul 24 01:16:31 EDT 2012
I've been asked to write some introductory material on the MIRAX format,
since the documentation on the OpenSlide website is incomplete and
occasionally cryptic. Here it is. Comments welcome.
There are two parts to this story: the tile structure of MIRAX slides
and the layout of the data on disk. Both parts can be fully appreciated
only with a certain sense of humor or a certain type of beverage.
Throughout, I will be using the CMU-1.mrxs slide to illustrate.
A note on terminology: the vendor calls the format MRXS. For historical
reasons, OpenSlide calls it MIRAX.
Tile structure
--------------
The scanner works by taking multiple photos of the slide as the camera
moves past the glass. (Or as the glass moves past the camera; I'm not
sure which.) The scanner tries to overlap these photos by an amount
specified in the OVERLAP_X and OVERLAP_Y Slidedat keys, stated in
pixels. However, the camera's movements are not very precise, so in
fact the position of each photo will be slightly different than the
nominal overlap values would suggest.
On older scanners, the hardware knew the position of the camera with
high precision, even though it couldn't move it very accurately. These
positions were recorded in the slide file in the
VIMSLIDE_POSITION_BUFFER.default non-hierarchical section and used to
properly position each photo. I suspect that newer scanners cannot
detect the position of the camera, relying instead on post-processing to
detect the degree of overlap between adjacent photos. This would
explain the format change between version 1.9 and 2.2 slides.
The camera's photos are fairly high-resolution, too large to be
practically used as image tiles. So, on disk, they are broken up into
multiple JPEGs, N on a side. In CMU-1.mrxs and many other slides, N is
4 (it's the GENERAL.CameraImageDivisionsPerSide value in the Slidedat),
so there are 16 tiles per camera position. This is in level 0, the
highest-resolution level.
Each numerically higher (lower-resolution) level concatenates four tiles
from the previous level in a 2x2 grid and scales the image down by a
factor of 2. So level 1 has 4 tiles per camera position and level 2 has 1.
Level 3 then has to concatenate tiles corresponding to *different*
camera positions. And indeed it does, in the exact same way: a 2x2
grid. But those camera positions overlap! So in the middle of the new
concatenated tile is a block of garbage: 15 pixels, nominally, which are
redundant with the 15 pixels next to them. Of course the actual number
of pixels of garbage depends on how much the camera positions overlap,
which varies from photo to photo.
This problem gets worse and worse as we move through the levels. By the
time we get to level 9, each 340x256 pixel tile has 127 blocks of
garbage in each dimension, each of which is nominally 0.234375 pixels
wide. In order to render this tile, we have to separately extract the
pixels corresponding to each camera position -- many of which are
fractional pixels due to repeated downsampling -- and render them in
their correct positions at sub-pixel resolution. By the nature of
sub-pixel image manipulation, the result can only be an approximation of
a cleanly-downsampled image.
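The arithmetic in the last few paragraphs can be checked with a few lines of Python. This is only a sketch: the function name is made up, N is taken to be 4 as in CMU-1.mrxs, and the level 0 nominal overlap of 120 pixels is an assumption inferred from the 15-pixel figure quoted for level 3 (15 * 2^3), not a value read from the slide.

```python
def overlap_stats(level, divisions=4, overlap0=120.0):
    """Return (camera positions per tile side, overlap blocks per side,
    nominal overlap width in pixels) for one tile at the given level."""
    # One tile at this level spans 2**level level-0 tiles per side, and
    # `divisions` level-0 tiles per side belong to one camera position.
    # Below level 2 a tile lies inside a single camera position.
    positions = max(1, 2 ** level // divisions)
    # Adjacent camera positions overlap, so there is one block of
    # redundant pixels between each neighboring pair.
    blocks = positions - 1
    # Each level halves the image, and with it the nominal overlap.
    width = overlap0 / 2 ** level
    return positions, blocks, width


for level in (2, 3, 6, 9):
    print(level, overlap_stats(level))
```

For levels 0 through 2 the block count is 0, so the overlap width returned there is purely nominal.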
All other slide formats supported by OpenSlide process any overlaps
during the scanning process, before generating reduced-resolution
levels. MIRAX is the only supported format which defers the processing
of image overlaps to the viewer application, and it is what drove
OpenSlide to depend so extensively on the Cairo graphics library.
On-disk format
--------------
The MIRAX on-disk format is complicated, full of
things-pointing-to-other-things. The format stores two types of data:
hierarchical data (that is, pyramidal images: the actual slide data,
plus some other stuff we don't decode), and non-hierarchical data
(thumbnail images, etc.). Each type of data is stored in a tree
structure dedicated to that type, and finding a block of data requires
us to traverse a lot of pointers.
Let's say we want to draw a single JPEG tile from the seventh pyramid
level of CMU-1.mrxs. We do the following:
1. We start with the [HIERARCHICAL] section in Slidedat.ini. We want
to read the image pyramid, which is hierarchical data, so we look at the
HIER_* keys. HIER_COUNT is 3, so there are three hier trees. We read
each HIER_%d_NAME key, for %d from 0 to 2, until we find one with a
value of "Slide zoom level". We've now discovered that we want HIER_0.
2. HIER_0_COUNT is 10, so this hier tree has ten leaves, each
corresponding to a pyramid level. We want to read from the seventh
pyramid level, so we read the HIER_0_VAL_6_SECTION key to get the name
of a different Slidedat section: in this case, LAYER_0_LEVEL_6_SECTION.
3. We look at LAYER_0_LEVEL_6_SECTION. There we find some values that
may be useful: the nominal camera position overlap for this level (1.875
pixels), MICROMETER_PER_PIXEL values, etc. But this doesn't help us
find the image data.
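Steps 1 and 2 amount to a couple of key lookups. Here is a rough Python sketch using the standard configparser module. The SAMPLE text is a hand-written, heavily abridged stand-in for Slidedat.ini, not a copy of the real file: the names of the second and third trees and their counts are invented, and find_level_section is a hypothetical helper.

```python
import configparser

# Abridged, hand-written stand-in for Slidedat.ini (illustration only).
SAMPLE = """\
[HIERARCHICAL]
HIER_COUNT = 3
HIER_0_NAME = Slide zoom level
HIER_0_COUNT = 10
HIER_0_VAL_6_SECTION = LAYER_0_LEVEL_6_SECTION
HIER_1_NAME = Hypothetical second tree
HIER_1_COUNT = 1
HIER_2_NAME = Hypothetical third tree
HIER_2_COUNT = 1
"""


def find_level_section(cp, level, tree_name="Slide zoom level"):
    """Return (tree number, Slidedat section name) for a pyramid level."""
    hier = cp["HIERARCHICAL"]
    for tree in range(int(hier["HIER_COUNT"])):
        if hier[f"HIER_{tree}_NAME"] != tree_name:
            continue
        if not 0 <= level < int(hier[f"HIER_{tree}_COUNT"]):
            raise IndexError(f"no level {level} in HIER_{tree}")
        return tree, hier[f"HIER_{tree}_VAL_{level}_SECTION"]
    raise KeyError(tree_name)


cp = configparser.ConfigParser(interpolation=None)
cp.optionxform = str  # keep key names case-sensitive
cp.read_string(SAMPLE)
print(find_level_section(cp, 6))
```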
4. To locate the image data, we need to look at the Index.dat.
Index.dat begins with a version string and a UUID. Immediately after
that are two 4-byte pointer values (little-endian) which we call the
hier_root and the nonhier_root. They give the locations within the
index file of, respectively, the hierarchical and non-hierarchical
offset tables. We seek to the location specified by the hier_root.
5. The offset table is an array of, again, 4-byte little-endian
pointers. Now we need to determine which entry to read. If we were to
build a flat list of all of the HIER_0 sections in numerical order,
followed by the HIER_1 sections, etc., the entry we need would
correspond to our section's position in that list. In this case we need
the seventh entry. We seek to that location.
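Steps 4 and 5 might look like the following in Python. Treat it as a sketch under assumptions: the lengths of the version string and UUID are not given above, so the caller passes the combined header length as a hypothetical header_len parameter, and the pointers are assumed to be unsigned. The index buffer here is synthetic, built only to exercise the function.

```python
import struct


def read_u32(buf, pos):
    """Read one 4-byte little-endian value (assumed unsigned)."""
    return struct.unpack_from("<I", buf, pos)[0]


def offset_table_entry(index, header_len, entry, hierarchical=True):
    """Return the pointer stored in slot `entry` of an offset table."""
    # hier_root and nonhier_root sit back to back right after the header.
    root_pos = header_len + (0 if hierarchical else 4)
    table = read_u32(index, root_pos)
    # The offset table is a plain array of 4-byte pointers.
    return read_u32(index, table + 4 * entry)


# Synthetic Index.dat: 37 filler bytes standing in for version + UUID,
# the two roots, then a ten-slot hier offset table with made-up pointers.
header = b"X" * 37
hier_table_pos = len(header) + 8
slots = struct.pack("<10I", *range(1000, 1010))
index = header + struct.pack("<II", hier_table_pos,
                             hier_table_pos + len(slots)) + slots

# Slot 6 is the seventh entry, the one for pyramid level 6.
print(offset_table_entry(index, len(header), 6))
```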
6. Here we find a linked list of data pages. Each page begins with two
4-byte values (little-endian as always): the number of data entries in
the page and the address of the next page (or 0 if this is the end of
the list).  For some reason, the initial page in the list always has 0
data entries.  We follow the pointer to the next page.
7. Now we have a page with entries in it. Each entry consists of four
4-byte integers: the tile index, offset, length, and file number. The
file number is an index into the array of filenames formed by the
[DATAFILE] Slidedat section, and tells us which file to read. The
offset and length tell us what bytes to read out of that file. So all
we have to do is traverse the linked list until we find the tile index
we want. Now we need to calculate that tile index.
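The page walk in steps 6 and 7 can be sketched as below. The layout follows the description above (two 4-byte header values, then 16-byte entries); treating the values as unsigned is an assumption, find_entry is a made-up name, and the buffer is synthetic.

```python
import struct


def find_entry(index, first_page, tile_index):
    """Walk the linked list of pages looking for one tile index.

    Returns (offset, length, file number), or None if the tile is absent.
    """
    page = first_page
    while page != 0:  # a next pointer of 0 ends the list
        count, next_page = struct.unpack_from("<II", index, page)
        for i in range(count):
            idx, offset, length, fileno = struct.unpack_from(
                "<4I", index, page + 8 + 16 * i)
            if idx == tile_index:
                return offset, length, fileno
        page = next_page
    return None


# Synthetic data: 8 bytes of padding so that no page sits at address 0,
# an empty initial page at 8, then a page at 16 holding two entries.
buf = (b"\0" * 8
       + struct.pack("<II", 0, 16)
       + struct.pack("<II", 2, 0)
       + struct.pack("<4I", 90240, 0, 100, 1)
       + struct.pack("<4I", 90304, 100, 200, 1))

print(find_entry(buf, 8, 90304))
```

Returning None for a missing tile index matches the case where the scanning software decided the tile was blank and left it out.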
8. The tile index is defined as (y * tiles_across + x), where
tiles_across is really GENERAL.IMAGENUMBER_X from the Slidedat file.
That's fine for level 0. In higher levels, x and y are always multiples
of 2^level to account for the lower number of JPEG tiles. So if we want
the tile at position (3, 4) within level 6, we need tile index (4 << 6)
* 352 + (3 << 6) = 90304. (This tile may not even exist. If the
scanning software determines that a particular tile is blank, it omits
the tile entirely.  At higher levels, a tile exists if any of its
constituent level 0 tiles exists.)
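The tile index calculation from step 8, as a tiny Python function (the name and argument order are mine; tiles_across is GENERAL.IMAGENUMBER_X, which is 352 in the worked example):

```python
def tile_index(x, y, level, tiles_across):
    """Index of the tile at (x, y) within the given pyramid level."""
    # Coordinates are expressed in level-0 tile units, so a position at
    # level n is first scaled up by 2**n.
    return (y << level) * tiles_across + (x << level)


print(tile_index(3, 4, 6, 352))
```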
9. Suppose the tile does exist. Now we can finally read out the data
for a single 340x256 JPEG. Hooray! Now all we need to do is extract
and render 256 subtiles to account for the 15 overlapped regions on
each axis of the tile. Of course, in order to know exactly *where* to
render those subtiles within the output image, we need to know the exact
position of the camera when it produced each subtile.
10. The camera position map is stored in a non-hierarchical section
called "default" in a tree called "VIMSLIDE_POSITION_BUFFER". To find
it, we need to start all the way back at the top, in the Slidedat file.
Aside: Reading non-hierarchical sections
----------------------------------------
10a. We again start with the Slidedat [HIERARCHICAL] section. By
traversing NONHIER_COUNT, NONHIER_%d_NAME, NONHIER_%d_COUNT, and
NONHIER_%d_VAL_%d, we eventually find our nonhier section at
NONHIER_3_VAL_0. So the index into the nonhier offset table is
NONHIER_0_COUNT + NONHIER_1_COUNT + NONHIER_2_COUNT + 0 = 12.
10b. We read the Index.dat as before: nonhier_root to nonhier offset
table to linked list head. Again the first page in the linked list has
no data entries. The second page has one entry and a 0 next pointer.
The data entry itself is an array of five 4-byte values: 0, 0, offset,
length, and file number. Good enough! Now we can look up the file
number in the [DATAFILE] Slidedat section, read out the data, and if we
were reading the nonhier section for a thumbnail or barcode image, we'd
be done. But we're not. We still need to decode the slide position file.
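The offset table index used in step 10a (and in step 5) is just a running sum of section counts. A minimal sketch follows. The nonhier counts are invented so that the first three add up to 12; I'm not claiming they are the real CMU-1.mrxs values. In the hier list only the first count (10) comes from the text above. parse_nonhier_entry assumes unsigned little-endian values.

```python
import struct


def flat_index(counts, tree, val):
    """Position of section `val` of tree `tree` in the flat section list."""
    # Every section of every earlier tree comes first.
    return sum(counts[:tree]) + val


def parse_nonhier_entry(buf, pos=0):
    """Decode the five-value nonhier data entry described in step 10b."""
    zero_a, zero_b, offset, length, fileno = struct.unpack_from(
        "<5I", buf, pos)
    return offset, length, fileno


nonhier_counts = [4, 5, 3, 1]  # made up; the first three sum to 12
hier_counts = [10, 1, 1]       # HIER_0_COUNT from step 2; rest made up

print(flat_index(nonhier_counts, 3, 0))  # NONHIER_3_VAL_0
print(flat_index(hier_counts, 0, 6))     # HIER_0_VAL_6, the seventh entry
print(parse_nonhier_entry(struct.pack("<5I", 0, 0, 4096, 2048, 2)))
```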
Tile decoding, part II
----------------------
11. The slide position file is an array of 9-byte entries, one for each
camera position, in row-major order. Each entry consists of a flag byte
of unknown purpose (which is always 0 or 1) and two 4-byte signed
integers representing the level 0 X and Y pixel coordinates of the
camera position. (Negative coordinate values do occasionally occur.)
If a camera position was omitted from the slide file because its region
was empty, its coordinate values will be garbage or 0. So, to finally
draw our tile, all we need to do is read the camera positions for each of
its 256 subtiles which have corresponding tiles in level 0, divide
their coordinates by 2^level, and render away!
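Decoding the slide position file from step 11 could look like this in Python. It is a sketch: little-endian byte order is assumed here by analogy with Index.dat, parse_positions is a hypothetical name, and the sample buffer is fabricated.

```python
import struct


def parse_positions(buf, level):
    """Decode 9-byte entries into (flag, x, y), scaled to `level`.

    x and y are level-0 pixel coordinates divided by 2**level, so they
    are generally fractional.
    """
    scale = 2 ** level
    positions = []
    # "<Bii": one flag byte, then two signed 4-byte integers, unpadded.
    for flag, x, y in struct.iter_unpack("<Bii", buf):
        positions.append((flag, x / scale, y / scale))
    return positions


# Two fabricated camera positions; negative coordinates are allowed.
sample = struct.pack("<Bii", 1, 640, -128) + struct.pack("<Bii", 0, 0, 0)
print(parse_positions(sample, 6))
```

Entries for camera positions that were omitted from the slide still appear in the array, so a real reader would first have to check which positions have level 0 tiles before trusting the coordinates.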
Epilogue
--------
With MIRAX, as with all formats, OpenSlide actually loads all of the
pertinent slide metadata before openslide_open() returns. At runtime
(that is, during openslide_read_region()) it can simply look up subtile
positions in memory, do lots and lots of compositing, and return the
desired pixels.
MRXS files that are generated by the Export function of the vendor's
viewer application don't have any overlaps, because the viewer is kind
enough to preprocess them away. In this case there is no
VIMSLIDE_POSITION_BUFFER, no nominal overlaps, and OpenSlide skips the
subtile processing for greater performance. The application also has a
"Save" command which can produce a downsampled version of a slide; the
resulting slide simply omits the requisite number of bottom levels,
divides all of the coordinate values in the slide position file by
2^levels_skipped, and updates the IMAGE_CONCAT_FACTOR of the now-lowest
level to reflect the number of levels that were skipped.
Please use caution when depending on any of the details described above,
as some of them are from memory and may have shifted during flight.
Almost all of the above was discovered by Adam Goode, who has more
patience than I do; all errors of narrative are mine; all design choices
are the original vendor's. Now, if you'll excuse me, I need to go find
a certain type of beverage.
--Benjamin Gilbert