Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: updated branch with changes to support linkml schema #456

Merged
merged 52 commits into from
Jun 14, 2024
Merged

Conversation

satra
Copy link
Contributor

@satra satra commented Feb 17, 2024

@djarecka - adding an updated branch here.

i was testing the pydantic side with a snippet like this. for this snippet i copied the pydantic python file into the reproschema python library as model.py.

from reproschema.model import Protocol, Item, Activity
from reproschema.jsonldutils import to_newformat
import json
data = json.loads(to_newformat("examples/protocols/protocol1.jsonld", format="jsonld", contextfile="contexts/reproschema_new"))
del data["@context"]
Protocol(**data)

i have updated the context in various places to match the current context and also refactored the yaml quite a bit. some major changes:

  • dictionary to tuple form of langString (this forms the prefixes in linkml)
  • multivalued + inlined for most langString slots
  • some classes moved to enums
  • descriptions as strings

for now i have left ui as a jsonld @nest still. but did not update the shacl to make corresponding changes. the shacl generation is not deterministic and we should figure a way for patching shacl perhaps using rdflib or something that we can query the object and alter it. also because we are using various classes in linkml, some expressions which should be simply rdf:langString as datatype are turning into class constraints. we may want to ask if there is someway to tell linkml generator that if a class has class_uri: rdf:langString treat it as datatype.

anyway, i hope this gets us closer. the python part also requires some patching (for the enum values), but i think the base libary works.

@satra satra marked this pull request as draft February 17, 2024 22:19
@satra
Copy link
Contributor Author

satra commented Feb 17, 2024

here is a comparison to @djarecka 's branch for clarity:
djarecka/reproschema@linkml_new_tmp...ReproNim:reproschema:ref/linkml

@djarecka
Copy link
Member

I also had some updates, but not sure if I will be able to merge with yours...

@djarecka
Copy link
Member

I looked quickly at the context and yaml, and I noticed, that:

  • sometimes you changed my langString to string, e.g. in description, but I took langString from your shacl
  • if you use proper @vocab, like here:
    "@vocab": "http://schema.repronim.org/",
    , you could remove "@id": "reproschema:landingPage",

@djarecka
Copy link
Member

@satra - your branch gives shacl errors under validations, let me know if you want me to fix it, and work on your branch

@djarecka
Copy link
Member

btw. for the langString and marking this as a class, I was thinking for creating langString under types in linkml, but was trying multiple ways without success. Was planning to create an issue with questions

@satra
Copy link
Contributor Author

satra commented Feb 17, 2024 via email

@satra
Copy link
Contributor Author

satra commented Feb 18, 2024 via email

@djarecka
Copy link
Member

I don't think that I changed description to string. Let's make sure we are looking at the same branch. Regarding id in landingpage it's for a couple of reasons. That one is to indicate that the URL is the identifier for that page.

Yes, I was wrong about description, but in some other places, e.g. schema:schemaVersion. I think it make sense, but I was just taking it from previous shacl

@satra
Copy link
Contributor Author

satra commented Feb 18, 2024

i adjusted it to only be in places where having multiple languages made some sense. regarding the shacl part, as long as it is not just pure hand edit. perhaps first try to read all the protocols/activities/items from the library using the pydantic model and if no model changes are required then generate shacl and edit it.

@djarecka
Copy link
Member

That's a good idea. Having that as a built in type would be nice.

On Sat, Feb 17, 2024, 6:20 PM Dorota Jarecka @.> wrote: btw. for the langString and marking this as a class, I was thinking for creating langString under types in linkml, but was trying multiple ways without success. Was planning to create an issue with questions — Reply to this email directly, view it on GitHub <#456 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABM573XLMHP3W3NWZJSOG3YUE3KDAVCNFSM6AAAAABDNT2UK2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNJQGUZDSMJVGE . You are receiving this because you were mentioned.Message ID: @.>

if I use types python will want to use some basic type, and shacl ignores it (perhaps this is a bug), see example:

  • yaml:
name: reproschema
id: http://schema.repronim.org/ # todo: change this
imports:
- linkml:types
prefixes:
  rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
  linkml: https://w3id.org/linkml/
  reproschema: http://schema.repronim.org/
default_range: string

types:
  type1:
    description: testing type
    base: string
    uri: rdf:langString


classes:
  Participant:
    description: A participant in a study
    class_uri: reproschema:Participant
    slots:
      - slot1


slots:
  slot1:
    description: slot1 description
    range: type1
    slot_uri: reproschema:slot1
  • python:
...
class Participant(ConfiguredBaseModel):
    """
    A participant in a study
    """
    slot1: Optional[string] = Field(None, description="""slot1 description""")
  • shacl:
reproschema:Participant a sh:NodeShape ;
    sh:closed true ;
    sh:description "A participant in a study" ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ sh:description "slot1 description" ;
            sh:maxCount 1 ;
            sh:order 0 ;
            sh:path reproschema:slot1 ] ;
    sh:targetClass reproschema:Participant .

class_uri: schema:VideoObject

enums:
AllowedType:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this list should also contain Skipped or the activity1.jsonld form the examples should be changed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one should really be AllowSkip and the other one in missing enum Skipped. activity1 should be adjusted for the new flag.

btw, for any of the enums, the autogenerated python needs to be updated to have reproschema: in front of the values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that I understand, so let me confirm:

  • in AllowedType enum I should add one more permissible value: AllowSkip
  • MissingType should leave as it is with Skipped
  • in the activity1 I should use AllowSkip

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed that in the pydantic you're changing lang String to str, so perhaps I should move lang String to types as in the example I posted yesterday. However we will loose the check from shacl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirming the first question.

pydantic was auto generated. the only thing i changed were the enums. the class langString in the model now has slots similar to prefix in the linkml model as chris had suggested. so technically it should be a class in pydantic with two fields. can you give an example of where you are seeing the change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's very inconsistent..

but actually having type Dict[str, str] makes this version valid with the data. If you use really langString it will fail since the keys do not match

Copy link
Member

@djarecka djarecka Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we will simply create a class that we list all possible languages we use as attributes, so you can get in pydantic something like this:

class LangString(ConfiguredBaseModel):
    en: str = Field(..., description="""The english version of the string""")
    es: str = Field(..., description="""The spanish version of the sting.""")

This would allow to keep you short version of langstring in the jsonld

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i understand we can also set a handful of languages in yaml and add extra=Extra.allow to the pydantic version

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw. I think you had Dict[str, str] instead of langString because you used key: true. I haven't used it before, but in a way it makes sense that it translated to Dcit[str, str]. Still should be consistent though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't know how we would deal with hyphenated langs like the ones here: https://github.com/ReproNim/reproschema-library/blob/9be140d0297ae0671d50466448ae67dea86d3bfd/activities/PHQ-9/PHQ9_schema#L13

for now let's keep it as key, value pairs like linkml prefixes and then we can adjust later.

@djarecka
Copy link
Member

djarecka commented Feb 21, 2024

you want to keep this still as a separate branch?

@satra
Copy link
Contributor Author

satra commented Feb 21, 2024

we can merge this, but if we do, we should remove/filter out the python file, unless there is a clear reason to keep one in this repo. we should either remove the shacl file and add a set of TODOs to the README before we release.

@djarecka
Copy link
Member

so what about I will try to fix the shacl and in the meantime i will create a new branch in repronim-py with the model.py?

@djarecka
Copy link
Member

@satra - working on test for reproschema-py I got some additional errors from the model from Responses files. Both Participant and SoftwareAgent doesn't have id in the new model, so pydantic complain, and that make me wondering whether I should id to all classes in the model?

Some of the classes, e.g. Activity, Item, have id and category fields because they inherit from CreativeWork, but other classes do not have id.

@satra
Copy link
Contributor Author

satra commented Apr 11, 2024

all classes should inherit from creativework/namedthing .

@satra
Copy link
Contributor Author

satra commented May 15, 2024

@djarecka - can we make citation just a string? is there a reason it's a langstring? also could we make it be any of url or string (so we can point to dois, urls, etc.,.)

@djarecka djarecka mentioned this pull request Jun 14, 2024
@djarecka djarecka marked this pull request as ready for review June 14, 2024 13:51
@djarecka
Copy link
Member

@satra @yibeichan - i think this can be merged, but unfortunately the python package workflow depends on reproschema-py and will be failing with old version of validate

@yibeichan yibeichan merged commit 0c23758 into main Jun 14, 2024
3 of 4 checks passed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation folder is gone.
This is giving us a dead link in the doc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants