The SPARQL Remote DataGraph Feature #233

Merged
merged 3 commits into from
Jul 24, 2024

Conversation

ashleysommer
Collaborator

@ashleysommer ashleysommer commented Jun 2, 2024

This is something I've been thinking about for a long time, and is finally available for PySHACL.

Enabling sparql_mode allows you to validate against a datagraph on a remote SPARQL endpoint.

To use it on the CLI:

  • use the -q (or --sparql-mode) switch
  • and supply an HTTP/HTTPS query endpoint string as the "DataGraph" value

To use it in the library:

  • Enable sparql_mode with the sparql_mode=True argument on validate()
  • Pass in an HTTP/HTTPS query endpoint string as the data_graph argument.
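
A minimal sketch of the library usage described above. The helper name `validate_remote`, the Turtle shapes file, and the endpoint URL in the usage comment are illustrative assumptions; only `sparql_mode=True` and the endpoint string as `data_graph` come from this PR description.

```python
def validate_remote(endpoint_url, shapes_path):
    """Validate the data behind a SPARQL query endpoint against local shapes.

    Hypothetical helper, not part of pyshacl. The imports are done inside
    the function so that this sketch can be loaded without pyshacl installed.
    """
    from pyshacl import validate
    from rdflib import Graph

    # The shapes graph is still loaded locally; only the data graph is remote.
    shapes = Graph().parse(shapes_path, format="turtle")

    # validate() returns (conforms, results_graph, results_text).
    conforms, _report_graph, report_text = validate(
        endpoint_url,        # HTTP/HTTPS query endpoint string as data_graph
        shacl_graph=shapes,
        sparql_mode=True,    # enable the SPARQL Remote Graph Mode
    )
    return conforms, report_text


# Example call (assumed endpoint and file name):
# conforms, text = validate_remote("https://example.org/sparql", "shapes.ttl")
```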

In this mode, PySHACL operates strictly in read-only mode, and does not modify the remote data graph.
Some features are disabled when using the SPARQL Remote Graph Mode:

  • A local working copy of the datagraph is not created (one is created in regular operation)
  • "rdfs" and "owl" inferencing is not allowed (because the remote graph is read-only, it cannot be expanded)
  • Extra Ontology file (Inoculation or Mix-In mode) is disabled (because the remote graph is read-only, and we do not take a local working copy)
  • SHACL Rules (Advanced mode SPARQL-Rules) are not allowed (because the remote graph is read-only)
  • All SHACL-JS features are disabled (this is not safe when operating on a remote graph)
  • "inplace" mode is disabled (this is a technicality, actually all operations on the remote data graph are inherently performed in-place)
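
As a rough illustration of the restrictions listed above, a caller could check its own options before enabling `sparql_mode`. The function `sparql_mode_conflicts` is a hypothetical helper, not part of pyshacl, and how pyshacl itself reacts to these combinations is not shown here. The parameter names mirror `validate()` keyword arguments; SHACL Rules are left out because they are declared in the shapes graph rather than passed as an option.

```python
def sparql_mode_conflicts(inference=None, ont_graph=None, js=False, inplace=False):
    """Return the names of options that the list above says are unavailable
    in SPARQL Remote Graph Mode. Hypothetical pre-check, not pyshacl's own."""
    conflicts = []
    if inference not in (None, "none"):
        conflicts.append("inference")   # remote graph cannot be expanded
    if ont_graph is not None:
        conflicts.append("ont_graph")   # no local working copy to mix into
    if js:
        conflicts.append("js")          # SHACL-JS is disabled in this mode
    if inplace:
        conflicts.append("inplace")     # disabled as a technicality
    return conflicts


print(sparql_mode_conflicts(inference="rdfs", js=True))
```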

This is implemented with the built-in RDFLib sparql-store plugin, but may require the use of the SPARQLWrapper library in the future if we need more features.

There are further options that can be tweaked with Environment Variables:

  • PYSHACL_SPARQL_USERNAME - HTTP BASIC Username for query endpoint
  • PYSHACL_SPARQL_PASSWORD - HTTP BASIC Password for query endpoint
  • PYSHACL_SPARQL_METHOD (default is GET)
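
One way to supply these variables is to build the environment for a child process that runs the CLI. This is a sketch under assumptions: `sparql_env` is a hypothetical helper, the credentials, endpoint URL and shapes file name are placeholders, and which values besides `GET` are accepted for `PYSHACL_SPARQL_METHOD` is not stated in this PR.

```python
import os


def sparql_env(username, password, method="GET", base=None):
    """Return a copy of the environment with the PYSHACL_SPARQL_* variables set.

    Hypothetical helper. `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    env["PYSHACL_SPARQL_USERNAME"] = username   # HTTP BASIC username
    env["PYSHACL_SPARQL_PASSWORD"] = password   # HTTP BASIC password
    env["PYSHACL_SPARQL_METHOD"] = method       # default is GET
    return env


# Possible usage with the CLI switch described earlier (not executed here):
# import subprocess
# subprocess.run(
#     ["pyshacl", "-q", "-s", "shapes.ttl", "https://example.org/sparql"],
#     env=sparql_env("alice", "secret"),
# )
```

Passing the variables through the child's environment avoids relying on the point at which pyshacl reads them.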

The major things this mode does differently:

  • Searching for Focus nodes in the data graph using the Targeting rules now uses a single SPARQL query (per constraint) rather than many direct rdflib store operations.
  • Collecting Value nodes from the data graph from the Focus nodes now uses a single SPARQL query (per constraint) rather than many direct rdflib store operations.
  • All Constraint evaluations that originally used datagraph lookup operations now use a single SPARQL query rather than many direct rdflib store operations.
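
To illustrate the idea of replacing many store lookups with one query, the sketch below builds a single SELECT that finds focus nodes for a set of `sh:targetClass` values. It is not the query PySHACL actually emits; the function name and query shape are assumptions made for illustration only.

```python
def focus_node_query(target_classes):
    """Build one SPARQL SELECT that finds instances of any given target class.

    Illustrative only: matches nodes whose rdf:type is the class itself or
    one of its subclasses via rdfs:subClassOf.
    """
    if not target_classes:
        raise ValueError("at least one target class IRI is required")
    values = " ".join(f"<{iri}>" for iri in target_classes)
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?focus WHERE {\n"
        f"  VALUES ?cls {{ {values} }}\n"
        "  ?focus rdf:type/rdfs:subClassOf* ?cls .\n"
        "}"
    )


print(focus_node_query(["http://example.org/Person"]))
```

Sent once to the endpoint, a query of this shape stands in for what would otherwise be a separate triple-pattern lookup per class and per subclass level, each costing an HTTP round trip.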

This results in fewer HTTP calls to the SPARQL endpoint and in some cases offloads some workload to the datagraph host.

Note: the number of SPARQL queries is reduced as much as possible in this first pass, but a full validation still emits a lot of them. I'm still looking for other ways of combining SPARQL queries to further reduce the number of lookups.

Fixes #174 #226

README.md Outdated
@@ -185,6 +185,33 @@ You can get an equivalent of the Command Line Tool using the Python3 executable
$ python3 -m pyshacl
```

## Errors
Under certain circumstances pySHACL can produce a `Validation Failure`. This is a formal error defined by the SHACL specification and is required to be produced as a result of specific conditions within the SHACL graph.
Contributor

Would this be an appropriate link for "Validation failures"?

https://www.w3.org/TR/shacl/#failures

I mostly remembered the difference between "Validation failure" and sh:ValidationResult, but still felt a need to check. It might be worth linking the failures section in case someone didn't know about this term nuance yet.

Collaborator Author

Yes, that is the correct link. I agree I should probably put in a link to the relevant section in the spec.

Note, these descriptions are not new, they were/are already in the README file, this diff is simply moving the errors section to make a place for the Content for the SPARQL Remote Store.

Contributor

I did catch that this diff was moved text. Eventually. Unfortunately, my curiosity was quicker than my scrolling.

README.md Outdated
- `ShapeLoadError`: This error is thrown when a SHACL Shape in the SHACL graph is in an invalid state and cannot be loaded into the validation engine.
- `ConstraintLoadError`: This error is thrown when a SHACL Constraint Component is in an invalid state and cannot be loaded into the validation engine.
- `ReportableRuntimeError`: An error occurred for a different reason, and the reason should be communicated back to the user of the validator.
- `RuntimeError`: The validator encountered a situation that caused it to throw an error, but the reason does concern the user.
Contributor

Double-checking this remark: "...but the reason does concern the user." Should this be "and the reason does concern the user," "but the reason does not concern the user," or as-is?

Collaborator Author

Thanks. It should be "does not".

Member

@nicholascar nicholascar left a comment

Impressive code this Ashley! Just that simple comment about errors and all good

@nicholascar nicholascar self-requested a review June 24, 2024 02:19
Successfully merging this pull request may close these issues.

pySHACL just a 'driver'?
3 participants