Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(exports): interpret PN tags as union between object and string #364

Merged
merged 7 commits into from
Oct 24, 2023

Conversation

dmlambo
Copy link
Contributor

@dmlambo dmlambo commented Aug 24, 2023

the JSON standard represents PeopleNames as an object, rather than a delineated string. this caused SRs derived from metadata to have [object Object] for the patient name, notable in Google's Healthcare dicom store, which seems to pick a random patient name when searching for series. much of the change is attempting to communicate this union somewhat opaquely, and ensuring objects with additional properties aren't destroyed when being naturalized. tested with OHIF master branch.

@dmlambo
Copy link
Contributor Author

dmlambo commented Aug 24, 2023

A little addendum, you can reproduce the issue in OHIF by creating a single measurement and exporting a structured report. Set a breakpoint here (where it does the denaturalizing) and look at the '00100010' (PatientName) tag. It will be "[object Object]".

Finally, this implies a bit of a paradigm shift. In OHIF there are places where it's clear PNs are not handled correcly. Here, for example. With this change, users will have to be careful not to do something like dataSet.PatientName = "John Doe" since it will break the accessors. They will have to instead use them, ie, dataSet.PatientName.Alphabetic = "John Doe". Similarly, the accessors should be used for getting the components in a naturalized dataset. I fear this change could affect people in subtle and hard-to-discover ways.

My change is... kinda sloppy. That's not to say it's sloppily done, rather it reduces cleanliness a fair amount, since it's breaking through some conceptual boundaries. (For instance adding denaturalizing code to ValueRepresentation, sneaking toString and toJSON into native types.) If anyone has a better idea on how to fix it (that doesn't require an epic refactor) I'm all ears.

@pieper
Copy link
Collaborator

pieper commented Aug 25, 2023

Thanks for working on this 👍

I'm trying to get my head around the issue - can you elaborate on your comment about derived SR having [object Object] as the Patient Name?

@dmlambo
Copy link
Contributor Author

dmlambo commented Aug 25, 2023

Sure thing!

What's happening is when OHIF tries to create a Structured Report, it is pulling instance metadata for the image you are annotating, which is drawn from DICOMWeb as application/dicom+json. The json representation of PN (person name) is not the same as in the binary format (part 10? I'm not super familiar with the naming conventions), as described here.

The Value Field elements of a DICOM attribute with a VR of PN. The non-empty name components of each element are encoded as a JSON strings with the following names:

  • Alphabetic
  • Ideographic
  • Phonetic

Which is to say, when you load a dataset, the Value field of the tag will be different depending on where you load it from. In the case of .dcm, the value will be:

data['00100010'].Value = ["Doe^John==doé^yon"],

whereas if loaded from json, it'll be:

data['00100010'].Value = [{Alphabetic: "Doe^John", Phonetic: "doé^yon"}]

So, in the latter circumstance, if you were write out the dcm file, eventually PersonName.write() would get called, which is derived from EncodedStringRepresentation, which calls toString() on the Value, and thus it writes out "[object Object]", since it is indeed an object, and not a string.

This is a problem because OHIF hydrates its structured reports with json data, thus all PN tags become "[object Object]".

A smaller change would be to just detect the type of Value when writing it from the PersonName class, and that would indeed fix binary output, but json output would be broken in the event someone loaded a binary dcm and wrote json.

I hope it's clearer now!

@pieper
Copy link
Collaborator

pieper commented Aug 25, 2023

Thanks for the additional context. This is sounding like more of an OHIF issue than something that should be changed in dcmjs. The reason I say that is that by design a dataset object in dcmjs should be exactly in DICOM JSON Model, so we should be using the object form with the fields for PersonName and not the ^ separated version used in the Part 10 binary.

Can you point to where in OHIF the SR is being created / used that leads to the reported issue?

In dcmjs, the SR derivation is done by constructing a StructuredReport datasets being referenced as a argument:

export default class StructuredReport extends DerivedDataset {

This inherits from DerivedDataset, where the PatientName is just copied over:

@dmlambo
Copy link
Contributor Author

dmlambo commented Aug 25, 2023

I believe they pull the metadata from here and create the SR just below it, here.

They also do some work[1] to detect[2] what kind of PN tag they're dealing with.

a dataset object in dcmjs should be exactly in DICOM JSON Model

Not according to the link I posted. I can see, for example, in the test mock for a minimum viable naturalized dataset the name is just a string, which is not what the standard describes. And similarly there is currently no conversion done between the object representation and a plain string of the format "A=B=C" when the dataset is written. Indeed, that's what I'm trying to do with my change.

@pieper
Copy link
Collaborator

pieper commented Aug 26, 2023

Gotcha, thanks for the links. It does seem that PN doesn't follow the intended pattern, so that's a bug in the original implementation.

Do your suggested changes alter the API or are both string and object values are now supported?

Also, is it possible to separate the style changes so the functional changes are in a single commit that's easier to review?

@dmlambo
Copy link
Contributor Author

dmlambo commented Aug 27, 2023

My pleasure! Thanks for the review!

The API hasn't changed, and indeed strings from part10 data and objects from json are supported by the change.

It should be noted in future documentation that after ingestion (before and after naturalization) one should always use the json accessors to PN tags. After denaturalization, the value in the dict will take on this^string=format.

I'll separate out the format changes (thanks, lint-on-save 😄 ) and think about adding a couple more tests.

@pieper
Copy link
Collaborator

pieper commented Aug 27, 2023

thanks again for working on this.

Does anyone else want to have a look before this gets finalized? @sedghi maybe?

@sedghi
Copy link
Collaborator

sedghi commented Aug 28, 2023

@wayfarer3130 any comment here?

data.Value.__pnDcm ||
(data.Value &&
Array.isArray(data.Value) &&
data.Value[0] == "string")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you meant typeof data.Value[0] === "string"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed!

// object, and if it's a json object it mocks a string by overriding
// toString. The latter ensures the ValueRepresentation output is
// correct.
if (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a utility which converts/handles name parsing/generation for patient names in both directions.
This code is good, but it would be helpful to have it defined by itself so it can be tested more easily and so that it can be documented sufficiently by itself.

data.Value[0] == "string")
) {
data.Value.__objectLike = true;
Object.defineProperty(data.Value, "Alphabetic", {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the default encoding is as a List of the typed object, and not as a simple object. However, making the change on the list itself is probably ok, as it makes it easy to access the data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought given it was a special case we may as well enforce this "objectlike" behaviour.

const measurement = toolClass.getMeasurementData(
measurementGroup
);
const measurement =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a lot of formatting changes - if you want to run the formatter, do it as a separate PR that is just that change so that the review only has the substantive changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the reformatting, I have no idea what was actually changed in teh tests, so it becomes hard to figure out how to review it. Please commit separately in a different PR, and then continue on that branch that has been reformatted

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmlambo did already split the PR into 3 commits so you should be able to review those independently. Maybe you are right though @wayfarer3130 and we should think whether it should be a different PR. Is there a practical benefit down the line to having two PRs?

@dmlambo
Copy link
Contributor Author

dmlambo commented Sep 8, 2023

Any more comments? 🙂

@pieper
Copy link
Collaborator

pieper commented Sep 8, 2023

I'll defer to @wayfarer3130 for final review.

*/
function pnDenaturalize(value) {
if (value && typeof value === "object") {
return `${value.Alphabetic ?? ""}=${value.Ideographic ?? ""}=${
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should only get the ideographic/phonetic separators if one of them exists. That is:

if( value.Ideographic || value.Phonetic ) return your full string
return value.Alphabetic

That avoids you creating a string with the = in it when it isn't necessary (which changes the appearance in some systems).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice the .replace just below which removes all end-of-string adjacent =. Thus if the full string is "ABC==" it becomes "ABC", "ABC=DEF=" becomes "ABC=DEF", and "ABC==GHI" stays the same.

if (
data.Value.__pnDcm ||
(data.Value &&
Array.isArray(data.Value) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't deal with tags such as ConsultingPhysician's Name which can contain 1...n instances. I know that is really a pain to have to deal with, as it will have to be decided how that works, but my suggestion is to:

  1. If the VM !== 1 or 0 or 1, then leave the object as an array of objects all the time.
  2. For other VM sizes, apply the change to the individual strings in each item, as a new String(originalValue)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my. I indeed didn't know PN could have VM > 1. I'll do another pass with that in mind, thanks for the heads up.

var newValue;
if (tagString in tagNamesToReplace) {
newValue = [tagNamesToReplace[tagString]];
if (Array.isArray(dict[tagString].Value)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right behaviour because it modifies the original string, which is going to cause problems. You do NOT want to keep the references here because of that. Also, you don't want to clear every single array element there is, only the ones which have tag names to replace. The other ones should be left alone.

{
Alphabetic: components[0],
Ideographic: components[1],
Phonetic: components[2]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a map for all the elements of the array in case the vm>1. What I would do is check if the base value is a string, and if it is, then convert it into an array and perform the mapping function on the result. Probably having a function like toArray = (value) => Array.isArray(value) ? value : [value]
You might need to undefined check as well.

Array.isArray(data.Value) &&
typeof data.Value[0] == "string")
) {
data.Value.__objectLike = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this a hidden property so it doesn't show up on iteration?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the changes to the anonyization above, please add a test with a sequence (array) of values that is not anonymized, and ensure that sequence is unchanged in the original object after you anonymize it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a multiple name example here please? That allows you to add the appropriate tests to handle multiple names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added it to multiplicity.dcm, but it requires an update of the file in the test artifacts repo.

@dmlambo
Copy link
Contributor Author

dmlambo commented Sep 13, 2023

Incoming. Heh, this one is rather large, though some changes were reverted from before.

Note that as it stands one test will fail, because multiplicity.dcm does not have the correct tags, and I don't have access to the test data repo. Let me know where to send it to get it released!

This change may take a little while to explain and to understand, so take your time looking through it, and I'll answer any questions.

Basically whenever a tag root gets created (in the form of { vr: vr, values: values } or naturalDataset) or when a tag itself gets assigned (the values or Value properties, as well as any natural name property) some accessors will get placed on the object to allow one value to be represented in another way, by way of toJSON and toString. It's a similar concept to the previous changes, but wider scoped.

For instance:

    static naturalizeDataset(dataset) {
        const naturalDataset = ValueRepresentation.addTagAccessors({
            _vrMap: {}
        });

which replaces the dataset with a Proxy which intercepts any property assignment, looks up the propert's ValueRepresentation, and sets additional accessors on the value:

                set(target, prop, value) {
                    var vrType;
                    if (
                        ["values", "Value"].includes(prop) &&
                        target.vr &&
                        ValueRepresentation.hasValueAccessors(target.vr)
                    ) {
                        vrType = ValueRepresentation.createByTypeString(
                            target.vr
                        );
                    } else if (prop in DicomMetaDictionary.nameMap) {
                        vrType = ValueRepresentation.createByTypeString(
                            DicomMetaDictionary.nameMap[prop].vr
                        );
                    } else {
                        target[prop] = value;
                        return true;
                    }

                    target[prop] = vrType.addValueAccessors(value);

                    return true;
                }

This is to ensure if someone assigns a PN incorrectly (IE, without using the json model), we can still output correct JSON, as well as allowing us to convert to part10 given the same scenario.

I'll likely have a few smaller changes coming through, but let me know what you think in the meantime.

@dmlambo
Copy link
Contributor Author

dmlambo commented Sep 20, 2023

In the end other changes worked out in the wash, it seems pretty stable to me. Does anyone have any more comments on this change? Anything I can explain better? Thanks!

@pieper
Copy link
Collaborator

pieper commented Sep 25, 2023

Thanks for your work on this @dmlambo 👍

From a read-through I believe I'm fine with this but would appreciate if @wayfarer3130 can look at the unresolved conversations.

Also I see a test failure now: https://github.com/dcmjs-org/dcmjs/actions/runs/6178725114/job/17111489516?pr=364

@dmlambo
Copy link
Contributor Author

dmlambo commented Sep 27, 2023

Also I see a test failure now: https://github.com/dcmjs-org/dcmjs/actions/runs/6178725114/job/17111489516?pr=364

This is because the multiplicity dcm in the test data repo doesn't have my changes. I've added a PN tag with multiplicity. I don't have permissions to update that file, so someone else would have to do it for me. 🙂

multiplicity.zip

Or if someone wants to modify the existing file, I used DICOM Editor Tool to add this tag:

(0008,1070)Operators' NamePN2Doe^John\Doe^Jane

Thanks!

// dicomDict.dict.upsertTag('00101001', 'PN', [{Alphabetic:'Doe^John'}]);
// naturalizedDataset.OperatorsName = [{Alphabetic:'Doe^John'},{Alphabetic:'Doe^Jane'}];
// TODO: refactor with addAccessors.js in mind
const handler = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you define the const handler statically? Creating a new instance of it every time isn't needed, as the values inside it are static.

return true;
}

addValueAccessors(value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here explaining that it is adding accessors on a String object, for the Alphabetic etc components?

this.padByte = PADDING_SPACE;
}

static checkComponentLengths(components) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see value in not writing non-conformant JSON, but reading it should be fully supported - can you test reading non-conformant data and then indicate here that this function only prevents one from creating it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it stands it's only checked on write. I'll add a comment about it.

{ Alphabetic: "Doe^Jane", Ideographic: "Janie", Phonetic: "Jayne" }
])
);
expect(otherPatientNamesIDValue.toString()).toEqual(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding tests here for the patient name handling - that is definitely worthwhile having this tested.

@wayfarer3130
Copy link
Contributor

This is looking good now - much happier with it as updated. Could you move the handler out to a single const rather than redeclaring it in the creator? Then I'm happy with it.

@pieper
Copy link
Collaborator

pieper commented Sep 28, 2023

Also I see a test failure now: https://github.com/dcmjs-org/dcmjs/actions/runs/6178725114/job/17111489516?pr=364

This is because the multiplicity dcm in the test data repo doesn't have my changes. I've added a PN tag with multiplicity. I don't have permissions to update that file, so someone else would have to do it for me. 🙂

multiplicity.zip

Or if someone wants to modify the existing file, I used DICOM Editor Tool to add this tag:

(0008,1070) Operators' Name PN 2 Doe^John\Doe^Jane
Thanks!

I'm happy to put new testing data in the repository, but do you really need to change and existing file? Wouldn't it be better to add a new file and leave existing tests/data in place?

@dmlambo
Copy link
Contributor Author

dmlambo commented Sep 28, 2023

I'm happy to put new testing data in the repository, but do you really need to change and existing file? Wouldn't it be better to add a new file and leave existing tests/data in place?

That's up to you; I saw there was a file that dealt with multiplicity, so I used it. If you want to add another file, it would suffice to use the one I attached and rename it to something more relavent, at which point I'll update the PR.

@pieper
Copy link
Collaborator

pieper commented Sep 28, 2023

Okay, data published as an asset here: https://github.com/dcmjs-org/data/releases/tag/multiplicity2

yarn run format
the JSON standard represents PeopleNames as an object, rather than a delineated string.  this caused SRs derived
from metadata to have [object Object] for the patient name, notable in Google's Healthcare dicom store, which
seems to pick a random patient name when searching for series. much of the change is attempting to communicate
this union somewhat opaquely, and ensuring objects with additional properties aren't destroyed when being
naturalized. tested with OHIF master branch.
Since the sample dcm has image data in it, and we don't support serializing
InlineBinary, I've added a simple SR derived from a TCIA frame for testing.

Adds test for equality between the SR dcm and the same SR recreated from the
json equivalent.
for easier testing and cleanliness
And also adds the concept of tag and value accessors to
ValueRepresentation. Whenever a tag is created, it is assigned to a
Proxy which allows values to be represented differently according to
whether it's being output to JSON or being interacted with. This allows
a user to assign a string to a PN tag, and have that tag be output as a
JSON object conformant to spec.
@dmlambo
Copy link
Contributor Author

dmlambo commented Oct 6, 2023

Rebased on top of master. Should be good to go unless there are any more notes I missed! Thanks for all the feedback everyone!

const file = await promisify(fs.readFile)(dcmPath);
const dicomDict = DicomMessage.readFile(file.buffer);

expect(dicomDict.dict["00101020"].Value).toEqual([1, 2]);
expect(dicomDict.dict["0018100B"].Value).toEqual(["1.2", "3.4"]);
});

it("Reads DICOM with PersonName multiplicity", async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for testing this.

@pieper
Copy link
Collaborator

pieper commented Oct 23, 2023

@wayfarer3130 is this ready to merge? Sounds like it's needed for OHIF/Viewers#3739

@sedghi @jbocce we can merge this as far as I'm concerned.

@sedghi
Copy link
Collaborator

sedghi commented Oct 23, 2023

Looks like good to go

@wayfarer3130
Copy link
Contributor

I had approved the PR, yes, I think this is good to go.

@pieper pieper merged commit f0dc199 into dcmjs-org:master Oct 24, 2023
9 checks passed
@github-actions
Copy link

🎉 This PR is included in version 0.29.11 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants