Add support for DICOM (aka Supp 145) #157

Mon Oct 3 10:49:49 EDT 2016

Hi Benjamin and Mathieu: 

According to Supplement 145/122, a baseline DICOM image (without
resolution reduction) corresponds to Case 1: a multiframe image in which
every frame has the same size. A per-frame sequence indicates the
spatial position of each frame in the frame of reference (there is no
column / row correspondence). If there are other, lower-resolution
images, they are other instances in the same Series (even the label).

Case 2 is what Aperio (now Leica) patented
(http://www.google.com/patents/US8086077), and there are some legal
problems with implementing it (one of the reasons Supplement 145 is
still not in use today).

I have used GDCM (in Python) to generate a Supplement 145 image (I hope
it will be on the WG26 FTP as soon as possible), but as Mathieu said
there is a memory problem (418 MB is a "small" image, as you know).

Another problem is the correspondence between column / row and frame
number, and between level and instance number. Supplement 145 gives no
indication of this (you can even have frames in random order). So the
only data we have so far is the relative position of the top-left pixel
of each frame and the total dimensions of the level. With that it is
possible to work out the correspondence with some calculations.
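
As a rough illustration of that calculation, here is a minimal Python
sketch. It assumes that each frame's top-left position is known as a
1-based (column, row) pixel offset within the total pixel matrix of the
level, that all frames of the level share one tile size, and that frames
are aligned to a regular grid. The function name and the example values
are hypothetical; nothing here is GDCM or OpenSlide API.

```python
import math


def build_tile_index(frame_positions, tile_w, tile_h, total_w, total_h):
    """Map (tile_col, tile_row) -> frame number for one resolution level.

    frame_positions: list of (x, y) top-left pixel positions, one entry per
    frame in storage order, assumed 1-based within the level's pixel matrix.
    """
    # Grid size of the level; edge tiles may be only partially covered.
    cols = math.ceil(total_w / tile_w)
    rows = math.ceil(total_h / tile_h)
    index = {}
    for frame_number, (x, y) in enumerate(frame_positions, start=1):
        col, x_rem = divmod(x - 1, tile_w)
        row, y_rem = divmod(y - 1, tile_h)
        if x_rem or y_rem:
            raise ValueError(f"frame {frame_number} is not grid-aligned")
        if not (0 <= col < cols and 0 <= row < rows):
            raise ValueError(f"frame {frame_number} lies outside the level")
        index[(col, row)] = frame_number
    return index, cols, rows


# Hypothetical level of 600x400 pixels cut into 256x256 frames,
# with the frames stored in arbitrary order.
positions = [(257, 1), (1, 1), (513, 257), (1, 257), (513, 1), (257, 257)]
index, cols, rows = build_tile_index(positions, 256, 256, 600, 400)
```

With such a table, a reader could look up the frame for any tile
coordinate without relying on the order in which frames are stored.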

Let me know if I can help with this support.

Best  

On 03/10/16 at 14:18:14, Mathieu Malaterre via openslide-users wrote:

> Hi Benjamin !
>
> On Mon, Oct 3, 2016 at 9:24 AM, Benjamin Gilbert via openslide-users
> wrote:
>
>> Hi Mathieu,
>>
>> Thanks for the detailed response!
>>
>> On Wed, Sep 14, 2016 at 02:32:51PM +0200, Mathieu Malaterre via
>> openslide-users wrote:
>>
>>> 1. OpenSlide needs to handle a very particular subset of the DICOM
>>> Transfer Syntax(s). Because of some low level (boring) details, some
>>> complex parsing issues are totally avoided in that subset. What this
>>> means is that the (limited) parser can be much smaller in code size
>>> compared to a full implementation.
>>
>> Sure, but there are other cases where OpenSlide uses small parts of
>> large libraries. The wasted address space doesn't bother me if it
>> reduces the amount of code we have to maintain.
>
> Right !
> 
>>> 2. OpenSlide needs a particular DICOM parsing behavior (typically SAX
>>> or StaX in the XML world), with an optimization toward reading images
>>> out of DICOM file.
>>
>> I'm not thrilled with the streaming model: it may be more efficient,
>> but at the cost of some indirection and lack of clarity. We clearly
>> want to defer reading the image data, but is the metadata large enough
>> that reading it into memory would really be costly? We'll probably
>> need most of it anyway when generating OpenSlide properties.
>
> Hum indeed this is actually a very good point. I was trying to be
> smart with my fseek (NFS scenario) but indeed loading the whole file
> may just work.
> 
> I did some quick tests.
>
> Case 1: A single file is the concatenation of multiple JPEG streams.
>
> This is the case for
> ftp://medical.nema.org/MEDICAL/Dicom/DataSets/WG26/Hamamatsu/Human_15x15_20x.dcm [2].
> In this case the DICOM header is ~696K (file is 418 MB).
>
> Case 2: A single file contains a single JPEG stream
>
> I did not have any dataset, so I used GDCM to split the above dataset
> into individual files. In this case the header is 104K (x 4824 files).
> This is nasty mostly because the ICC profile is repeated in every
> single file (that may explain why no vendor chose to implement this
> option).
>
> So even in Case 2, this represents ~512 MB in memory. Does that
> correspond to other slide formats?
>
>>> I do know of vtk-dicom, but this library does pull in an insane
>>> amount of dependencies, which I believe is not a good thing for
>>> OpenSlide
>>
>> Yeah, we should try to avoid that.
>>
>>> I could also build some kind of abstract level on top of this library
>>> and only use that abstract level within the core openslide
>>> implementation (eg. parse_header_dicom(), read_ ). This would make
>>> transition to another DICOM library trivial (tm) in the future.
>>
>> Not worth it, I think.
>
> OK.
>
>>> since I failed to understand what you meant.
>>
>> I was thinking of the requirement for the user to generate a DICOMDIR
>> if one doesn't exist, but I now understand that issue better and it's
>> not relevant here.
>
> Keep in mind my early implementation was done rather quickly (proof of
> concept). I assumed a DICOMDIR would be available but if you tell me
> how to handle the other case in the openslide framework, I can adapt
> the code. So the only remains is the loading of the complete dataset
> in memory (as discussed above).
>
> Cheers,

-- 
David de Mena García
Anatomía Patológica
H.U. de Jerez

Links:
------
[1] mailto:openslide-users at lists.andrew.cmu.edu
[2] ftp://medical.nema.org/MEDICAL/Dicom/DataSets/WG26/Hamamatsu/Human_15x15_20x.dcm