Nextprot dataset and protein examples #423

Open: 2 commits to merge into base branch master

Conversation

kwsamarasinghe

Examples

  • dataset with multiple distribution formats
  • protein example

"@id": "https://www.nextprot.org/entry/NX_P52701",
"includedInDataset": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
"citation": {
"@id": "",
Author

I guess the citations are not mandatory?

Member

AlasdairGray left a comment

A good start, but there are some things I would suggest changing.

@@ -0,0 +1,43 @@
{
"@type": "Dataset",
Member

This should have an @id property to identify the dataset.

Contributor

I agree; @id is important for a Dataset.

"name": "Creative Commons CC BY 4.0 Attribution",
"url": "https://creativecommons.org/licenses/by/4.0/"
}
}
Member

The identifier property is missing.

Contributor

And it is a minimum property according to https://bioschemas.org/profiles/Dataset/0.3-RELEASE-2019_06_14/.
By the way, I did not see a dct:conformsTo linking to the corresponding Bioschemas profile version; you should add it if you do not have it yet.
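
A small check script may help catch the gaps raised in this thread before the next push. The following is only a sketch: `missing_properties` and `SUGGESTED` are hypothetical names, and the property list covers just the three items mentioned in these comments, not the full Dataset profile.

```python
# Properties the reviewers asked for on the Dataset markup in this thread.
# This is NOT the complete Bioschemas profile; extend the tuple as needed.
SUGGESTED = ("@id", "identifier", "http://purl.org/dc/terms/conformsTo")


def missing_properties(markup, wanted=SUGGESTED):
    """Return the wanted keys that are absent or empty in a JSON-LD dict."""
    return [key for key in wanted if markup.get(key) in (None, "", [], {})]


# A cut-down version of the Dataset markup under review.
dataset = {
    "@type": "Dataset",
    "name": "neXtProt entries",
    "url": "https://www.nextprot.org",
}
print(missing_properties(dataset))
```

Running it against the markup as submitted should list all three properties; once they are added with real values the list comes back empty.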

@@ -0,0 +1,43 @@
{
"@type": "Dataset",
"name": "neXtProt entries",
Member

Is 'entries' part of the dataset name?

Contributor

It does not seem to be, from what I saw on their website.

"description": "The collection of neXtProt entries for human proteins",
"url": "https://www.nextprot.org",
"keywords": "nextprot,Human,Proteins,Proteome,Proteomics,protein database,protein knowledgebase,protein resource,human protein,human proteome,function,medical,disease,expression,interactions,sequence,isoform,mutation,variant,phenotypes,proteomics,peptide,structure,3D,annotation,biocuration,chromosomes,protein validation,protein-coding genes,post-translational modifications,ptm,data integration,systems biology,genetic variations,UniProt",
"distribution": [
Member

This looks to be correct

{
"@context": "http://schema.org",
"@type": "DataRecord",
"@id": "https://www.nextprot.org/entry/NX_P52701",
Member

This @id should be different from the value for the main entity. You may just want to add #DR onto the end of this one.

"@context": "http://schema.org",
"@type": "DataRecord",
"@id": "https://www.nextprot.org/entry/NX_P52701",
"includedInDataset": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
Member

The value of this should be the value of the @id in your Dataset markup. It should point to the description of the dataset rather than the download file.
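
The two linking points raised on this DataRecord (an @id distinct from the main entity, and includedInDataset pointing at the Dataset's @id) can be checked mechanically. The sketch below assumes includedInDataset is given as a plain string; `check_record_links` is a hypothetical helper and `DATASET_ID` is a placeholder, not a real neXtProt IRI.

```python
def check_record_links(record, dataset_id):
    """Return a list of linking problems found in a DataRecord dict."""
    problems = []
    main_entity = record.get("mainEntity", {})
    if record.get("@id") == main_entity.get("@id"):
        problems.append("DataRecord @id equals mainEntity @id")
    if record.get("includedInDataset") != dataset_id:
        problems.append("includedInDataset does not point at the Dataset @id")
    return problems


ENTRY = "https://www.nextprot.org/entry/NX_P52701"
DATASET_ID = "https://www.nextprot.org/#dataset"  # placeholder value, assumption

record = {
    "@type": "DataRecord",
    "@id": ENTRY + "#DR",  # suffix suggested in the review
    "includedInDataset": DATASET_ID,
    "mainEntity": {"@id": ENTRY, "@type": "Protein"},
}
print(check_record_links(record, DATASET_ID))
```

With the markup as submitted (same @id on both nodes and the FTP download URL in includedInDataset) both problems would be reported.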

"@type": "DataRecord",
"@id": "https://www.nextprot.org/entry/NX_P52701",
"includedInDataset": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
"citation": {
Member

Omit this property if there isn't a value for it.

You may also want to add a citation property into your dataset markup

"mainEntity": {
"@id": "https://www.nextprot.org/entry/NX_P52701",
"@type": "Protein",
"http://purl.org/dc/terms/conformsTo": "https://bioschemas.org/specifications/Protein/0.9-DRAFT",
Member

Please update to 0.11-RELEASE

"http://purl.org/dc/terms/conformsTo": "https://bioschemas.org/specifications/Protein/0.9-DRAFT",
"identifier": "NX_P52701",
"name": "DNA mismatch repair protein Msh6",
"description": "",
Member

Omit properties for which there is no data. However, in this case you probably want to include the text from the overview section of the webpage.
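
If the markup is generated from a template, empty values could be stripped just before serialisation. This is a minimal sketch, assuming the markup is held as nested Python dicts and lists; `prune_empty` is a hypothetical helper name, not part of any Bioschemas tooling.

```python
EMPTY = ("", None, [], {})


def prune_empty(node):
    """Recursively drop empty strings, None, empty lists and empty dicts."""
    if isinstance(node, dict):
        cleaned = {key: prune_empty(value) for key, value in node.items()}
        return {key: value for key, value in cleaned.items() if value not in EMPTY}
    if isinstance(node, list):
        cleaned = [prune_empty(item) for item in node]
        return [item for item in cleaned if item not in EMPTY]
    return node


# Cut-down Protein markup with the two empty properties flagged in this review.
protein = {
    "@type": "Protein",
    "identifier": "NX_P52701",
    "description": "",
    "citation": {"@id": ""},
}
print(prune_empty(protein))
```

Because the pruning runs bottom-up, a `citation` object whose only member is an empty `@id` is removed entirely rather than left behind as `{}`.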

Comment on lines +21 to +31
"isEncodedByBioChemEntity": {
"@type": "Gene",
"name": "MSH6",
"identifier": "HGNC:7329",
"hasRepresentation": "2p16.3"
},
"taxonomicRange": {
"@id": "https://identifiers.org/taxonomy:9606",
"@type": "Taxon",
"name": "Human"
}
Member

Do you intend that these are embedded within the hasBioChemEntityPart rather than properties of the protein directly?

Contributor

According to the profile, these two properties can be used directly for a Protein.

Contributor

ljgarcia left a comment

I agree with @AlasdairGray's comments. I have added a couple more; please have a look. Thanks.

},
{
"@type": "DataDownload",
"fileFormat": "RDF"
Contributor

Aren't you missing the download URL here?
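
To spot distributions without a download link across the whole list, something like the sketch below might be useful. It assumes the download URL is meant to go in schema.org's `contentUrl` property; `downloads_without_url` is a hypothetical helper, and the XML entry in the sample data is an assumption built from the FTP URL used elsewhere in this pull request.

```python
def downloads_without_url(dataset):
    """Return the fileFormat of each distribution entry lacking a contentUrl."""
    distributions = dataset.get("distribution", [])
    if isinstance(distributions, dict):  # a single distribution given as an object
        distributions = [distributions]
    return [d.get("fileFormat", "?") for d in distributions if not d.get("contentUrl")]


dataset = {
    "@type": "Dataset",
    "distribution": [
        {
            # Assumed entry, for illustration only.
            "@type": "DataDownload",
            "fileFormat": "XML",
            "contentUrl": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
        },
        # As in the diff above: no download URL given.
        {"@type": "DataDownload", "fileFormat": "RDF"},
    ],
}
print(downloads_without_url(dataset))
```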

3 participants