Openslide Usage
Benjamin Gilbert
bgilbert at cs.cmu.edu
Mon Nov 10 04:50:28 EST 2014
On 11/07/2014 08:55 AM, Nikolay Kladt wrote:
> The scn file has multiple associated images (scanned ROIs) that have
> been scanned with identical resolution. How do I access individual
> images or get their bounds? It seems that the properties do not show the
> required information but the file format description includes multiple
> images.
OpenSlide doesn't currently expose that information. In the long run,
we will probably need to support multiple image pyramids per file (e.g.
for Leica slides with ROIs scanned at different resolutions). At the
moment, however, the philosophy is to merge multiple ROIs into one
unified image pyramid when possible, and reject the slide otherwise.
For now, your choices are to do a brute-force search for non-transparent
pixels as you described, or to extract and parse the ImageDescription
XML yourself as John proposed. If you try the brute-force search, you
can restrict it to the region enclosed by the openslide.bounds-[xywh]
properties.
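For the brute-force route, the core test is simple. A pixel that openslide_read_region() leaves fully transparent (alpha byte zero in its premultiplied ARGB output) has no slide data behind it. The sketch below scans one such buffer and reports the bounding box of the non-transparent pixels in slide coordinates. It assumes the buffer was read at level 0 and that the caller passes the x/y it read from, for example the openslide.bounds-x/-y values when those properties exist. The names find_nonempty_bounds and roi_box are made up for illustration. The function returns one box around all content. Splitting that box into separate ROIs would still need something like connected-component labeling, which isn't shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: bounding box in level-0 slide coordinates. */
typedef struct {
    int64_t x, y, w, h;
} roi_box;

/*
 * Hypothetical helper: scan a w x h buffer of premultiplied ARGB pixels
 * (the layout openslide_read_region() fills in) whose top-left corner is
 * at level-0 slide coordinates (origin_x, origin_y).  Pixels with zero
 * alpha are treated as empty.  Returns false if every pixel is empty.
 * Assumes the buffer was read at level 0; for a downsampled level the
 * pixel offsets would have to be scaled by that level's downsample.
 */
static bool find_nonempty_bounds(const uint32_t *buf, int64_t w, int64_t h,
                                 int64_t origin_x, int64_t origin_y,
                                 roi_box *out)
{
    int64_t min_x = w, min_y = h, max_x = -1, max_y = -1;

    for (int64_t row = 0; row < h; row++) {
        for (int64_t col = 0; col < w; col++) {
            uint32_t alpha = buf[row * w + col] >> 24;  /* A is the top byte */
            if (alpha == 0)
                continue;
            if (col < min_x) min_x = col;
            if (col > max_x) max_x = col;
            if (row < min_y) min_y = row;
            if (row > max_y) max_y = row;
        }
    }
    if (max_x < 0)
        return false;  /* nothing but transparent pixels */

    out->x = origin_x + min_x;
    out->y = origin_y + min_y;
    out->w = max_x - min_x + 1;
    out->h = max_y - min_y + 1;
    return true;
}
```

In practice you would probably run this over chunks of a low-resolution level rather than reading the whole bounds region at level 0 in one call.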
We could expose ROI information through the API in several ways:
1. Once we get the multiple-pyramid support mentioned above, we could
always expose each ROI as a separate pyramid. This doesn't help for
MIRAX, whose on-disk format doesn't distinguish between ROIs.
2. There's a longstanding issue
(https://github.com/openslide/openslide/issues/35) proposing new API to
list individual tile positions for every image tile in the slide. That
assumes the information can be generalized into a consistent and useful
format, which I think is probably not the case. I'm also not convinced
an application should ever have to work with tile information at this
level of detail.
3. We could expose bounding boxes for each of the ROIs. This could be
an array of x/y/w/h properties similar to openslide.bounds-[xywh],
either in the leica or openslide namespaces, or perhaps a real API call.
For Leica this could be okay, if verbose. The problem is MIRAX and
formats like it, where ROIs are not rectangular: we could compute a
bounding box but it might still contain significant blank areas.
4. Provide an API call returning a bitmap image depicting non-empty
slide regions. The call could allow the user to specify a "virtual
tile" size and perhaps a step size, and then return a white pixel for
each populated virtual tile and a black pixel for each empty one. This
would produce similar results to your brute-force search, but much more
efficiently because OpenSlide has access to the underlying slide
metadata. It should also generalize to arbitrary slide formats.
e.g.:
bool openslide_get_presence_bitmap(openslide_t *osr, uint8_t *out_buf,
                                   int64_t x, int64_t y,
                                   int64_t w, int64_t h,
                                   int64_t tile_w, int64_t tile_h,
                                   int64_t step_x, int64_t step_y);
However, this might be hard for applications to use, and we'd have to
provide a utility function to calculate the correct size for *out_buf.
5. The simpler version of #4: provide an API call returning whether a
particular region is non-empty. e.g.:
bool openslide_is_present(openslide_t *osr,
                          int64_t x, int64_t y,
                          int64_t w, int64_t h);
6. Others?
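To make options 4 and 5 more concrete, here is a rough sketch of their semantics, assuming the format driver can supply a list of stored tile rectangles in level-0 coordinates. The tile_rect type, the tile-list parameters standing in for openslide_t, and the function names are hypothetical. So is the convention that virtual tiles start every step_x/step_y pixels inside the region and may extend past its right and bottom edges. Option 5 reduces to a rectangle-overlap test, and option 4 can be built by calling it once per virtual tile, which also yields the out_buf size helper mentioned in option 4.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-tile metadata: one stored tile, level-0 coordinates. */
typedef struct {
    int64_t x, y, w, h;
} tile_rect;

/* Option 5 (hypothetical): does region (x, y, w, h) overlap any stored tile? */
static bool is_present(const tile_rect *tiles, size_t n_tiles,
                       int64_t x, int64_t y, int64_t w, int64_t h)
{
    for (size_t i = 0; i < n_tiles; i++) {
        const tile_rect *t = &tiles[i];
        /* Half-open rectangles overlap iff they overlap on both axes. */
        if (x < t->x + t->w && t->x < x + w &&
            y < t->y + t->h && t->y < y + h)
            return true;
    }
    return false;
}

/* Virtual tiles along one axis: one per step that starts inside the region. */
static int64_t virtual_tile_count(int64_t extent, int64_t step)
{
    return (extent + step - 1) / step;
}

/* Hypothetical utility from option 4: bytes needed for out_buf. */
static size_t presence_bitmap_size(int64_t w, int64_t h,
                                   int64_t step_x, int64_t step_y)
{
    return (size_t)(virtual_tile_count(w, step_x) *
                    virtual_tile_count(h, step_y));
}

/* Option 4 (hypothetical): one byte per virtual tile, row-major,
 * 255 (white) = populated, 0 (black) = empty. */
static bool get_presence_bitmap(const tile_rect *tiles, size_t n_tiles,
                                uint8_t *out_buf,
                                int64_t x, int64_t y, int64_t w, int64_t h,
                                int64_t tile_w, int64_t tile_h,
                                int64_t step_x, int64_t step_y)
{
    if (w <= 0 || h <= 0 || tile_w <= 0 || tile_h <= 0 ||
        step_x <= 0 || step_y <= 0)
        return false;

    int64_t cols = virtual_tile_count(w, step_x);
    int64_t rows = virtual_tile_count(h, step_y);
    for (int64_t r = 0; r < rows; r++) {
        for (int64_t c = 0; c < cols; c++) {
            bool hit = is_present(tiles, n_tiles,
                                  x + c * step_x, y + r * step_y,
                                  tile_w, tile_h);
            out_buf[r * cols + c] = hit ? 255 : 0;
        }
    }
    return true;
}
```

Because both functions consult only tile metadata and never decode pixels, this is where the efficiency advantage over a brute-force pixel scan would come from.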
What are you trying to accomplish with the ROI information? Are you
looking for pixel-precise bounds for the ROIs, or perhaps even some
additional information about them (like which is the first, second,
third ROI in the file)? Or are you just trying to avoid doing lots of
image processing on empty parts of the slide?
> If this is a problem of having test data, we can easily generate data
> sets that can be shared.
I do have a couple of slides with multiple ROIs, but nothing in the public
dataset. If you'd be willing to provide scans for the public dataset, that
would be very helpful.
--Benjamin Gilbert