OpenSlide Usage

Mon Nov 10 04:50:28 EST 2014

On 11/07/2014 08:55 AM, Nikolay Kladt wrote:
> The scn file has multiple associated images (scanned ROIs) that have
> been scanned with identical resolution. How do I access individual
> images or get their bounds? It seems that the properties do not show the
> required information but the file format description includes multiple
> images.

OpenSlide doesn't currently expose that information.  In the long run, 
we will probably need to support multiple image pyramids per file (e.g. 
for Leica slides with ROIs scanned at different resolutions).  At the 
moment, however, the philosophy is to merge multiple ROIs into one 
unified image pyramid when possible, and reject the slide otherwise.

For now, your choices are to do a brute-force search for non-transparent 
pixels as you described, or to extract and parse the ImageDescription 
XML yourself as John proposed.  If you try the brute-force search, you 
can restrict it to the region enclosed by the openslide.bounds-[xywh] 
properties.
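
As a rough sketch of the scanning half of the brute-force approach: 
assume the pixels have already been read with openslide_read_region() 
into a buffer of premultiplied ARGB values, so that a zero alpha byte 
means nothing was scanned at that position.  The helper name 
find_nonempty_bbox is hypothetical and not part of OpenSlide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of OpenSlide.  Scans a w x h buffer of
 * premultiplied ARGB pixels (row-major, no padding) and reports the
 * bounding box of all pixels whose alpha byte is nonzero.  Returns
 * false if every pixel is fully transparent. */
bool find_nonempty_bbox(const uint32_t *buf, int64_t w, int64_t h,
                        int64_t *x0, int64_t *y0,
                        int64_t *bw, int64_t *bh) {
  assert(buf && x0 && y0 && bw && bh);
  int64_t min_x = w, min_y = h, max_x = -1, max_y = -1;

  for (int64_t y = 0; y < h; y++) {
    for (int64_t x = 0; x < w; x++) {
      /* The top byte is alpha; nonzero means some image data exists. */
      if (buf[y * w + x] >> 24) {
        if (x < min_x) min_x = x;
        if (x > max_x) max_x = x;
        if (y < min_y) min_y = y;
        if (y > max_y) max_y = y;
      }
    }
  }

  if (max_x < 0) {
    return false;  /* nothing but transparent pixels */
  }
  *x0 = min_x;
  *y0 = min_y;
  *bw = max_x - min_x + 1;
  *bh = max_y - min_y + 1;
  return true;
}
```

The returned box is relative to the buffer's origin, so the x/y passed 
to openslide_read_region() must be added to get slide coordinates.  The 
openslide.bounds-[xywh] values can be read with 
openslide_get_property_value().  For a large slide it is probably more 
practical to read a lower-resolution level, or to scan the bounds region 
in chunks, than to read the whole region at level 0.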

We could expose ROI information through the API in several ways:

1. Once we get the multiple-pyramid support mentioned above, we could 
always expose each ROI as a separate pyramid.  This doesn't help for 
MIRAX, whose on-disk format doesn't distinguish between ROIs.

2. There's a longstanding issue 
(https://github.com/openslide/openslide/issues/35) proposing a new API 
to list individual tile positions for every image tile in the slide.  
That assumes the information can be generalized into a consistent and 
useful format, which I think is probably not the case.  I'm also not 
convinced an application should ever have to work with tile information 
at this level of detail.

3. We could expose bounding boxes for each of the ROIs.  This could be 
an array of x/y/w/h properties similar to openslide.bounds-[xywh], 
either in the leica or openslide namespaces, or perhaps a real API call.  
For Leica this could be okay, if verbose.  The problem is MIRAX and 
formats like it, where ROIs are not rectangular: we could compute a 
bounding box but it might still contain significant blank areas.

4. Provide an API call returning a bitmap image depicting non-empty 
slide regions.  The call could allow the user to specify a "virtual 
tile" size and perhaps a step size, and then return a white pixel for 
each populated virtual tile and a black pixel for each empty one.  This 
would produce similar results to your brute-force search, but much more 
efficiently because OpenSlide has access to the underlying slide 
metadata.  It should also generalize to arbitrary slide formats.

e.g.:

bool openslide_get_presence_bitmap(openslide_t *osr, uint8_t *out_buf,
                                    int64_t x, int64_t y,
                                    int64_t w, int64_t h,
                                    int64_t tile_w, int64_t tile_h,
                                    int64_t step_x, int64_t step_y);

However, this might be hard for applications to use, and we'd have to 
provide a utility function to calculate the correct size for *out_buf.

5. The simpler version of #4: provide an API call returning whether a 
particular region is non-empty.  e.g.:

bool openslide_is_present(openslide_t *osr,
                           int64_t x, int64_t y,
                           int64_t w, int64_t h);

6. Others?
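
To make options 4 and 5 a little more concrete, here is a minimal 
sketch of how they relate.  Everything in it is an assumption for 
illustration: neither proposed call exists in OpenSlide, the bitmap is 
taken to be one byte per virtual tile in row-major order, and a virtual 
tile is counted if its origin falls inside the requested region.  The 
names presence_bitmap_size, region_present_fn, fill_presence_bitmap and 
mock_is_present are all hypothetical; the mock treats the slide as a 
single rectangular ROI so the code can run without a slide file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of virtual-tile origins in [0, extent) when stepping by step.
 * Requires extent > 0 and step > 0. */
static int64_t tiles_along(int64_t extent, int64_t step) {
  return (extent - 1) / step + 1;
}

/* Hypothetical utility for option 4: how big must out_buf be? */
bool presence_bitmap_size(int64_t w, int64_t h,
                          int64_t step_x, int64_t step_y,
                          int64_t *cols, int64_t *rows, int64_t *bytes) {
  assert(cols && rows && bytes);
  if (w <= 0 || h <= 0 || step_x <= 0 || step_y <= 0) {
    return false;
  }
  int64_t c = tiles_along(w, step_x);
  int64_t r = tiles_along(h, step_y);
  if (c > INT64_MAX / r) {
    return false;  /* cols * rows would overflow */
  }
  *cols = c;
  *rows = r;
  *bytes = c * r;  /* one byte per virtual tile */
  return true;
}

/* Stand-in for the proposed openslide_is_present() from option 5;
 * ctx plays the role of the openslide_t pointer. */
typedef bool (*region_present_fn)(void *ctx, int64_t x, int64_t y,
                                  int64_t w, int64_t h);

/* Mock predicate: ctx points to one rectangular ROI, and a region is
 * "present" if it overlaps that rectangle. */
struct rect { int64_t x, y, w, h; };

bool mock_is_present(void *ctx, int64_t x, int64_t y,
                     int64_t w, int64_t h) {
  const struct rect *roi = ctx;
  return x < roi->x + roi->w && roi->x < x + w &&
         y < roi->y + roi->h && roi->y < y + h;
}

/* Builds the option 4 bitmap by asking the option 5 predicate about
 * each virtual tile: 0xFF (white) if populated, 0x00 (black) if not. */
void fill_presence_bitmap(region_present_fn is_present, void *ctx,
                          uint8_t *out_buf,
                          int64_t x, int64_t y,
                          int64_t cols, int64_t rows,
                          int64_t tile_w, int64_t tile_h,
                          int64_t step_x, int64_t step_y) {
  assert(is_present && out_buf);
  for (int64_t r = 0; r < rows; r++) {
    for (int64_t c = 0; c < cols; c++) {
      bool present = is_present(ctx, x + c * step_x, y + r * step_y,
                                tile_w, tile_h);
      out_buf[r * cols + c] = present ? 0xFF : 0x00;
    }
  }
}
```

A real implementation inside OpenSlide would presumably answer the 
predicate from the slide's tile metadata rather than from pixel data, 
which is where the efficiency gain over a brute-force search would come 
from.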

What are you trying to accomplish with the ROI information?  Are you 
looking for pixel-precise bounds for the ROIs, or perhaps even some 
additional information about them (like which is the first, second, 
third ROI in the file)?  Or are you just trying to avoid doing lots of 
image processing on empty parts of the slide?

> If this is a problem of having test data, we can easily generate data
> sets that can be shared.

I do have a couple of slides with multiple ROIs, but nothing in the 
public dataset.  If you'd be willing to provide scans for the public 
dataset, that would be very helpful.

--Benjamin Gilbert