Attempt to implement alternative proposal for namespaceMap #491
Signed-off-by: Gary O'Neall <[email protected]>
@maxhbr - I tried to implement your proposal as a pull request. This is based on a branch in the GitHub repo, so feel free to update the branch to better represent your proposal before the tech call.
Re: X-Collection.md:
Information we wish to preserve about the serialized data itself are stored as properties of this class.
The use case for ExternalMap still needs to be considered. It must be removed from ElementCollection if we want to avoid locking SBOMs to a single serialized data instance. That means it must be put somewhere else, and X-Collection is a candidate for holding it. In that case, X-Collection instances would be serialized by ConsumerProducers who reference external serialized data from their serialized data.
I started it at some point in #479
must be represented in that format "native" to the serialization.
The NamespaceMap itself will never be serialized as part of SPDX data if the serialization format support namespaces or prefixes.
If the serialization format does not support prefixes, then the full URI's for the elements must be used and the namespace map will not be preserved.
Any custom serialization format SHOULD implement namespaces in order to preserve the namespace map.
I think this part should be described in X-Collection.md. The X-Collection should contain a description how it is expected to be created when deserializing a blob.
- The NamespaceMap itself will never be serialized as part of SPDX data if the serialization format support namespaces or prefixes.
- If the serialization format does not support prefixes, then the full URI's for the elements must be used and the namespace map will not be preserved.
The first requirement can be terminated at "The NamespaceMap itself will never be serialized in an X-collection element." The "if the serialization format supports prefixes" part is rendered moot by the second requirement.
The X element created by a Consumer upon parsing a payload A may be serialized to allow Consumer's payload B to reference elements in payload A instead of serializing copies of them from Consumer's model store.
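The two requirements discussed in this thread (use prefixes when the serialization format supports them, otherwise write full URIs) can be illustrated with a minimal sketch. It assumes a namespace map is a plain prefix-to-namespace-IRI dictionary; `compact_iri` and `expand_iri` are hypothetical helper names, not part of any SPDX library, and the example IRIs are invented.

```python
from __future__ import annotations


def compact_iri(iri: str, namespace_map: dict[str, str]) -> str:
    """Replace the longest matching namespace IRI with its prefix."""
    best_prefix, best_ns = None, ""
    for prefix, ns in namespace_map.items():
        if iri.startswith(ns) and len(ns) > len(best_ns):
            best_prefix, best_ns = prefix, ns
    if best_prefix is None:
        return iri  # no matching namespace: the full IRI is written
    return f"{best_prefix}:{iri[len(best_ns):]}"


def expand_iri(value: str, namespace_map: dict[str, str]) -> str:
    """Turn 'prefix:local' back into a full IRI when the prefix is known."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in namespace_map:
        return namespace_map[prefix] + local
    return value  # already a full IRI, or an unknown prefix


# Hypothetical namespace map and element IRI, for illustration only.
ns_map = {"ex": "https://example.org/sbom1/"}
full = "https://example.org/sbom1/pkg-42"
short = compact_iri(full, ns_map)
round_tripped = expand_iri(short, ns_map)
```

With a format that supports prefixes, the serializer would write `short` and declare the prefix natively; with a format that does not, it would write `full` and the map would not survive the round trip.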
Signed-off-by: Gary O'Neall <[email protected]>
September 6 - What we agree on:
Bob articulated the distinction between capturing prefixes in model data after serialization or before:
The identical model can support both 1 and 2:
The distinction is operational. In case (2) generating an outer X element, those elements will continue to accumulate in the model data as collections use other collections: a producer creates Sbom1 with namespaceMap, another producer creates an Sbom2 with namespaceMap, a third producer creates a Bundle / Sbom3 that includes Sbom1 and Sbom2 with a third namespaceMap, and as the model graph continues to grow, all of the previously outer collections become inner when they are added to a new outer collection. In case (1) SpdxDocument elements are created only as needed when Sbom3 references a document containing Sbom1. Sbom1 doesn't have to contain namespaceMap at all, so nested namespaceMap data doesn't build up in the model graph as reference chains grow in length.

In the harmonized model namespaceMap is an optional property of ElementCollection and an optional property of SpdxDocument, and if present their instances are hints to be considered when serializing documents and creating SpdxDocument elements. The term "X" isn't a new box in the model, it is a reference to either SpdxDocument (operational case 1) or ElementCollection (operational case 2). If namespaceMap isn't populated in any element, round-tripping still works across all serialization formats that support it, and for all serialization formats period if every format is required to define its representation.

I suspect that pre-serialization NamespaceMaps (case 2) will lean toward case 3 - short prefix strings mapped to short URIs that require long local names, while post-serialization NamespaceMaps for case 1 will use short prefix strings mapped to long URIs that allow short local names within the serialized document, and are thus significantly more effective at shortening those documents.
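The suspicion above about short versus long namespace URIs can be checked with a rough sketch. Everything here is an assumption for illustration: the element IRIs, the two namespace maps, and the helper names `compact` and `serialized_size` are invented, and the size counts only the identifiers, not the prefix declaration itself.

```python
import json


def compact(iri, namespace_map):
    """Rewrite an IRI as 'prefix:local' using the first matching namespace."""
    for prefix, ns in namespace_map.items():
        if iri.startswith(ns):
            return f"{prefix}:{iri[len(ns):]}"
    return iri


def serialized_size(ids, namespace_map):
    """Length in characters of the identifiers serialized as a JSON array."""
    return len(json.dumps([compact(i, namespace_map) for i in ids]))


# Hypothetical element IRIs sharing one long namespace.
element_ids = [
    f"https://example.org/spdx/acme/sbom1/element-{n}" for n in range(3)
]

# Short prefix mapped to a long URI: local names stay short.
long_ns = {"s1": "https://example.org/spdx/acme/sbom1/"}
# Short prefix mapped to a short URI: local names stay long.
short_ns = {"ex": "https://example.org/"}

size_long_ns = serialized_size(element_ids, long_ns)
size_short_ns = serialized_size(element_ids, short_ns)
size_no_map = serialized_size(element_ids, {})
```

In this toy case the map with the long namespace URI yields the smallest output, the map with the short URI lands in between, and no map at all yields the largest.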
Payload: Sean objects to the term payload because of the implication that it means a file. But we have been careful to articulate that payload includes all methods of transferring data including streaming sessions and online data stores. Nodes in the element graph are immutable, but nodes will be added over time. If "serialization" is used to refer to database storage, then a serialized data unit is a specific storage transaction, and prefixes can be used to reduce stored data and/or transmission size just as they reduce the size of data serialized into files. The write transaction entry is the database equivalent of a filesystem file entry or inode, and the bytes returned by reading the database transaction are equivalent to the bytes returned by reading a filesystem file. The namespaceMap used in a write transaction is returned when reading it. If the database software does not support immutable state as of a specific transaction, then it cannot implement an immutable element graph, with or without prefixed IRI compaction.
I do not understand the issue. Even in a moving and alive DB every element is immutable and therefore every
The definition of Payload is trivial:
Sean objects to the terms Payload and Unit of Transfer for some reason that he'll have to explain, but "Byte Sequence" doesn't require anything to be "transferred", so whatever use cases he has in mind (databases were mentioned) are addressed by a sequence of bytes, and whatever name he wants to put on the box called "Payload" that means "Serialized Data" is fine with me. But to consider databases, there is a difference between OLTP and OLAP. Transaction processing databases optimize for write speed and modify existing data. Analytic databases are optimized for read speed, but I don't know if they can guarantee WORM behavior. Does a specific database support creation of an "immutable" (reproducible) subgraph by parsing serialized data? And does the database support metadata such as NamespaceMap that can be read by applications separately from graph content? If so, then the database supports "Payloads".
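The two questions above (reproducible bytes per write, and a NamespaceMap readable separately from graph content) can be made concrete with a small in-memory sketch. This is only an illustration under stated assumptions: `Payload` and `PayloadStore` are hypothetical names, not classes from the SPDX model or from any database product.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Payload:
    """A byte sequence plus the namespace map used when it was written."""
    data: bytes
    namespace_map: MappingProxyType  # read-only view of a private copy


class PayloadStore:
    """Append-only store: each write is one immutable transaction."""

    def __init__(self):
        self._transactions = []

    def write(self, data, namespace_map):
        # Copy both inputs so later changes by the caller cannot leak in.
        payload = Payload(bytes(data), MappingProxyType(dict(namespace_map)))
        self._transactions.append(payload)
        return len(self._transactions) - 1  # transaction id

    def read(self, transaction_id):
        return self._transactions[transaction_id]


store = PayloadStore()
ns_map = {"ex": "https://example.org/"}
tid = store.write(b'{"@id": "ex:a"}', ns_map)
ns_map["other"] = "https://other.example/"  # does not affect the stored map
stored = store.read(tid)
```

Reading transaction `tid` returns the same bytes and the same namespace map every time, which is the behavior a store would need in order to support "Payloads" in the sense described here.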
Per review comments
This is a reply to @davaya's comment in issue #478 - putting it here since it is more related to the namespaceMap discussion than the "where do we start" issue.
I don't know of anything in the native serialization JSON-LD format that can serve as the rootElement. If we think of the "X-Collection" as the creator's expression of what a serialized blob of data is about, it doesn't feel as artificial. The challenge only comes in when the "X-Collection" conflicts with something supported by the native serialization - such as prefixes we use in the namespaceMap. What if we think about it this way:
@goneall: I like Bob Martin's terminology: Solution A defines a But
The Playground derived from the Model notes use cases include simple examples such as:
The use cases can be extended to all of the flows/issues discussed here, such as:
And by looking at serialized instances in all formats of these use cases understand the difference between |
@goneall - You're the facilitator, but I don't think separate issues (like NamespaceMap, or Where we start) can be decided independently - the solutions interact holistically. Bob provisionally accepted Solution B as a way to make progress toward RC2, and Solution A as refined by Bob to a specific collection of elements, not "intent" applicable to future collections of elements by the same or different producers, was a critical breakthrough in discovering how to apply Solution A to #478.
PR #500 includes support for both where to start (rootElement) and moving from element-level to document-level dataLicense.
This is an attempt to implement the proposal documented in issue #489 as a pull request on top of pull request #490
I took the approach of a pull request on top of the alternative pull request since the structure is basically the same. It is just the descriptions which are different.
Note that the basic structure is also the same as pull request #403 - the main difference being the names of the properties and classes.