Add support for DICOM (aka Supp 145) #157

Mon Oct 3 03:24:18 EDT 2016

Hi Mathieu,

Thanks for the detailed response!

On Wed, Sep 14, 2016 at 02:32:51PM +0200, Mathieu Malaterre via openslide-users wrote:
> 1. OpenSlide needs to handle a very particular subset of the DICOM
> Transfer Syntaxes. Because of some low-level (boring) details, some
> complex parsing issues are totally avoided in that subset. What this
> means is that the (limited) parser can be much smaller in code size
> compared to a full implementation.

Sure, but there are other cases where OpenSlide uses small parts of large
libraries.  The wasted address space doesn't bother me if it reduces the
amount of code we have to maintain.

> 2. OpenSlide needs a particular DICOM parsing behavior (typically SAX
> or StAX in the XML world), with an optimization toward reading images
> out of a DICOM file.

I'm not thrilled with the streaming model: it may be more efficient, but at
the cost of some indirection and lack of clarity.  We clearly want to defer
reading the image data, but is the metadata large enough that reading it
into memory would really be costly?  We'll probably need most of it anyway
when generating OpenSlide properties.
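
For illustration only, here is a minimal sketch of the non-streaming
alternative: read every metadata element into memory, but record only the
offset and length of the pixel data so it can be read later.  It is in
Python rather than OpenSlide's C for brevity, and it assumes Explicit VR
Little Endian encoding with defined lengths only.  read_metadata() is a
hypothetical name, not an existing OpenSlide or DICOM toolkit function.

```python
import io
import struct

# VRs that use 2 reserved bytes plus a 4-byte length in Explicit VR
# Little Endian; all other VRs use a 2-byte length.
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OW",
            b"SQ", b"UC", b"UN", b"UR", b"UT"}
PIXEL_DATA = (0x7FE0, 0x0010)
UNDEFINED_LENGTH = 0xFFFFFFFF


def read_metadata(f):
    """Hypothetical helper: eager metadata, deferred pixel data.

    f is a seekable binary file positioned at the first data element.
    Returns (metadata, pixel_location), where metadata maps
    (group, element) -> (VR, raw value bytes) and pixel_location is
    (offset, length) of the Pixel Data value, or None if absent.
    """
    metadata = {}
    pixel_location = None
    while True:
        header = f.read(6)
        if not header:
            break  # clean end of input
        if len(header) < 6:
            raise ValueError("truncated element header")
        group, element, vr = struct.unpack("<HH2s", header)
        if vr in LONG_VRS:
            rest = f.read(6)
            if len(rest) < 6:
                raise ValueError("truncated element length")
            (length,) = struct.unpack("<2xI", rest)
        else:
            rest = f.read(2)
            if len(rest) < 2:
                raise ValueError("truncated element length")
            (length,) = struct.unpack("<H", rest)
        if length == UNDEFINED_LENGTH:
            # Assumption: sequences and encapsulated pixel data are
            # out of scope for this sketch.
            raise ValueError("undefined length not handled here")
        if (group, element) == PIXEL_DATA:
            # Remember where the pixels are, but do not read them.
            pixel_location = (f.tell(), length)
            f.seek(length, io.SEEK_CUR)
        else:
            value = f.read(length)
            if len(value) < length:
                raise ValueError("truncated element value")
            metadata[(group, element)] = (vr, value)
    return metadata, pixel_location
```

A real reader would also have to handle undefined-length elements
(sequences and encapsulated pixel data), which this sketch rejects.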

> I know of two relatively good generic C++ toolkits: GDCM & DCMTK.  As
> upstream author of GDCM, I am in a position to say that GDCM also does not
> make a good fit here.  That leaves us with DCMTK.  What I do know is that
> the code is very complex, in part because of legacy code and because
> DCMTK is a generic DICOM toolkit.  So IMHO DCMTK is also not a good fit
> here, especially because of point (2), which is something very special in
> the DICOM world.

What are the problems specifically?  Performance, reliability, features,
ability to work around bugs in DICOM files?

> I do know of vtk-dicom, but this library does pull in an insane number of
> dependencies, which I believe is not a good thing for OpenSlide.

Yeah, we should try to avoid that.

> I could also build some kind of abstraction layer on top of this library
> and only use that abstraction layer within the core openslide
> implementation (e.g. parse_header_dicom(), read_tile_dicom(), ...). This
> would make transition to another DICOM library trivial (tm) in the
> future.

Not worth it, I think.

> even if OpenSlide and FFmpeg do not share a common DICOM library, they
> would share a common code base.

If we do ship our own parser, I'd prefer that it conform completely to
OpenSlide's coding conventions, rather than trying to stay synchronized with
ffmpeg.  Copy-pasted code tends to diverge anyway, and we'd still need to be
able to maintain it.

> I did not describe the issue with DICOMDIR here, since I failed to
> understand what you meant.

I was thinking of the requirement for the user to generate a DICOMDIR if one
doesn't exist, but I now understand that issue better and it's not relevant
here.

--Benjamin Gilbert