Openslide Usage

Mon Nov 17 03:49:49 EST 2014

Dear Benjamin,

Sorry for the late response; I wanted to test the different approaches
before mailing again.

I have two main goals I want to achieve with the ROI information:

1. We have user requests to convert the .scn files into something more
flexible for viewing/sharing/processing. To decrease overall file size, I
would like to write a small Python UI that lets users batch-convert files
to .png, exporting each acquired ROI as an individual PNG instead of one
big whole-slide file. In addition, I would create a small overview image
identifying the specific ROIs (already possible with OpenSlide).
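
To make goal 1 a bit more concrete, here is a minimal sketch of the
bookkeeping I have in mind. It assumes the openslide.bounds-[xywh]
properties are available as a string-to-string mapping, and it splits a
region into tiles so that no single PNG becomes too large. The helper
names and the 4096 pixel tile size are my own placeholders, not part of
OpenSlide.

```python
from typing import Iterator, Mapping, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in level-0 pixels


def bounds_from_properties(props: Mapping[str, str]) -> Rect:
    """Read the openslide.bounds-[xywh] properties into an integer rectangle."""
    keys = ("x", "y", "width", "height")
    x, y, w, h = (int(props["openslide.bounds-" + k]) for k in keys)
    return (x, y, w, h)


def tile_rects(rect: Rect, tile: int = 4096) -> Iterator[Rect]:
    """Yield rectangles of at most tile x tile pixels that cover rect.

    Tiles are produced row by row; tiles on the right and bottom edges
    are clipped so that nothing outside rect is exported.
    """
    x0, y0, w, h = rect
    for y in range(y0, y0 + h, tile):
        for x in range(x0, x0 + w, tile):
            yield (x, y, min(tile, x0 + w - x), min(tile, y0 + h - y))
```

With openslide-python, each rectangle could then presumably be passed to
read_region((x, y), 0, (w, h)) and the resulting image saved as a PNG,
but I have not tested that part yet.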

2. Right now, it seems to be very common to analyze tissue sections using
only small, hand-picked rectangular regions (judging from a few
publications I looked at). However, as we already have all the data, I
would like to provide a pipeline for users that allows them (me) to
process everything within an ROI.

Following your great suggestions, I looked at the TIFF 6.0 specification
and the .scn file itself. The XML information seems to be located near the
end of the file. However, after reading the TIFF header and the pointer to
the first IFD, I have not yet been able to get the byte location of the
XML. I started looking into the OpenSlide code to see how you obtain this
information, but have not yet had the time to understand it in detail.
Secondly, I wrote proof-of-concept code for a brute-force line parser to
obtain the ROI boundaries; I am very confident that this will work without
much trouble.

Regarding your suggestions on API changes: I think that providing
bounding box information might be a good generic option that could also be
of use for other formats.
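
As an illustration of how bounding boxes relate to the presence bitmap you
proposed in option 4, here is a small sketch. It assumes the bitmap is
already available as a grid of booleans (one cell per virtual tile) and
reports one box per 4-connected group of populated cells. The function
name and the grid representation are my own assumptions, not an existing
OpenSlide API.

```python
from collections import deque
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in grid cells


def roi_bounding_boxes(grid: Sequence[Sequence[bool]]) -> List[Box]:
    """Return the bounding box of every 4-connected group of populated cells.

    Boxes are listed in the order their first cell is met when scanning
    the grid row by row.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this cell, tracking the extent.
            seen[r][c] = True
            queue = deque([(r, c)])
            r0 = r1 = r
            c0 = c1 = c
            while queue:
                cr, cc = queue.popleft()
                r0, r1 = min(r0, cr), max(r1, cr)
                c0, c1 = min(c0, cc), max(c1, cc)
                for nr, nc in ((cr + 1, cc), (cr - 1, cc),
                               (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((c0, r0, c1 - c0 + 1, r1 - r0 + 1))
    return boxes
```

Multiplying each box by the virtual tile size would give approximate
level-0 bounds. For non-rectangular ROIs, as with MIRAX, such a box could
of course still contain blank areas, as you pointed out.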

Last but not least, I talked to our technician; we will generate a few
standard slides that can be shared openly. However, this will take some
time, as we first need to obtain suitable left-over tissue
(kidney/lung/cortex/heart, something like this).

Thanks a lot for your help,
Niko

On Mon, Nov 10, 2014 at 10:50 AM, Benjamin Gilbert <bgilbert at cs.cmu.edu>
wrote:

> On 11/07/2014 08:55 AM, Nikolay Kladt wrote:
> > The scn file has multiple associated images (scanned ROIs) that have
> > been scanned with identical resolution. How do I access individual
> > images or get their bounds? It seems that the properties do not show the
> > required information but the file format description includes multiple
> > images.
>
> OpenSlide doesn't currently expose that information.  In the long run,
> we will probably need to support multiple image pyramids per file (e.g.
> for Leica slides with ROIs scanned at different resolutions).  At the
> moment, however, the philosophy is to merge multiple ROIs into one
> unified image pyramid when possible, and reject the slide otherwise.
>
> For now, your choices are to do a brute-force search for non-transparent
> pixels as you described, or to extract and parse the ImageDescription
> XML yourself as John proposed.  If you try the brute-force search, you
> can restrict it to the region enclosed by the openslide.bounds-[xywh]
> properties.
>
> We could expose ROI information through the API in several ways:
>
> 1. Once we get the multiple-pyramid support mentioned above, we could
> always expose each ROI as a separate pyramid.  This doesn't help for
> MIRAX, whose on-disk format doesn't distinguish between ROIs.
>
> 2. There's a longstanding issue
> (https://github.com/openslide/openslide/issues/35) proposing a new API to
> list individual tile positions for every image tile in the slide.  That
> assumes the information can be generalized into a consistent and useful
> format, which I think is probably not the case.  I'm also not convinced
> an application should ever have to work with tile information at this
> level of detail.
>
> 3. We could expose bounding boxes for each of the ROIs.  This could be
> an array of x/y/w/h properties similar to openslide.bounds-[xywh],
> either in the leica or openslide namespaces, or perhaps a real API call.
>   For Leica this could be okay, if verbose.  The problem is MIRAX and
> formats like it, where ROIs are not rectangular: we could compute a
> bounding box but it might still contain significant blank areas.
>
> 4. Provide an API call returning a bitmap image depicting non-empty
> slide regions.  The call could allow the user to specify a "virtual
> tile" size and perhaps a step size, and then return a white pixel for
> each populated virtual tile and a black pixel for each empty one.  This
> would produce similar results to your brute-force search, but much more
> efficiently because OpenSlide has access to the underlying slide
> metadata.  It should also generalize to arbitrary slide formats.
>
> e.g.:
>
> bool openslide_get_presence_bitmap(openslide_t *osr, uint8_t *out_buf,
>                                     int64_t x, int64_t y,
>                                     int64_t w, int64_t h,
>                                     int64_t tile_w, int64_t tile_h,
>                                     int64_t step_x, int64_t step_y);
>
> However, this might be hard for applications to use, and we'd have to
> provide a utility function to calculate the correct size for *out_buf.
>
> 5. The simpler version of #4: provide an API call returning whether a
> particular region is non-empty.  e.g.:
>
> bool openslide_is_present(openslide_t *osr,
>                            int64_t x, int64_t y,
>                            int64_t w, int64_t h);
>
> 6. Others?
>
>
> What are you trying to accomplish with the ROI information?  Are you
> looking for pixel-precise bounds for the ROIs, or perhaps even some
> additional information about them (like which is the first, second,
> third ROI in the file)?  Or are you just trying to avoid doing lots of
> image processing on empty parts of the slide?
>
> > If this is a problem of having test data, we can easily generate data
> > sets that can be shared.
>
> I do have a couple slides with multiple ROIs, but nothing in the public
> dataset.  If you'd be willing to provide scans for the public dataset, that
> would be very helpful.
>
> --Benjamin Gilbert

-- 
Dr. Nikolay Kladt
Image and Data Analyst, CECAD Imaging Facility
kladtn at uni-koeln.de
++49 221 478 84028
https://www.linkedin.com/in/kladtn

CECAD Cologne - Excellent in Aging Research
Universität zu Köln
Joseph-Stelzmann Str. 26
50931 Cologne