Nextprot dataset and protein examples #423

Open: wants to merge 2 commits into base: master
43 changes: 43 additions & 0 deletions Dataset/examples/0.3-RELEASE_examples/nextprot_jsonld.json
@@ -0,0 +1,43 @@
{
"@type": "Dataset",
Member: This should have an @id property to identify the dataset.

Contributor: I agree, @id is important for a Dataset.

"name": "neXtProt entries",
Member: Is 'entries' part of the dataset name?

Contributor: It does not seem to be, from what I saw on their website.

"description": "The collection of neXtProt entries for human proteins",
"url": "https://www.nextprot.org",
"keywords": "nextprot,Human,Proteins,Proteome,Proteomics,protein database,protein knowledgebase,protein resource,human protein,human proteome,function,medical,disease,expression,interactions,sequence,isoform,mutation,variant,phenotypes,proteomics,peptide,structure,3D,annotation,biocuration,chromosomes,protein validation,protein-coding genes,post-translational modifications,ptm,data integration,systems biology,genetic variations,UniProt",
"distribution": [
Member: This looks correct.

{
"@type": "DataDownload",
"contentUrl": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
"fileFormat": "XML"
},
{
"@type": "DataDownload",
"fileFormat": "RDF"
Contributor: Aren't you missing the download URL (contentUrl) here?
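A quick way to catch this kind of omission is to scan the distribution array for DataDownload entries without a contentUrl. The following is a minimal sketch, not part of the PR; the helper name downloads_missing_url is made up for illustration, and the array is a trimmed copy of the one under review.

```python
import json

# Trimmed copy of the "distribution" array from the example under review.
distribution = json.loads("""
[
  {"@type": "DataDownload",
   "contentUrl": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
   "fileFormat": "XML"},
  {"@type": "DataDownload", "fileFormat": "RDF"},
  {"@type": "DataDownload",
   "contentUrl": "https://api.nextprot.org/export/entries/all.fasta",
   "fileFormat": "FASTA"}
]
""")


def downloads_missing_url(entries):
    """Return the fileFormat of every DataDownload lacking a non-empty contentUrl."""
    return [
        entry.get("fileFormat", "?")
        for entry in entries
        if entry.get("@type") == "DataDownload" and not entry.get("contentUrl")
    ]


missing = downloads_missing_url(distribution)
print(missing)
```

Only the RDF entry is reported, which matches the entry flagged in this comment.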

},
{
"@type": "DataDownload",
"contentUrl": "ftp://ftp.nextprot.org/pub/current_release/peff/nextprot_all.peff.gz",
"fileFormat": "PEFF"
},
{
"@type": "DataDownload",
"contentUrl": "ftp://ftp.nextprot.org/pub/current_release/md5/nextprot_sequence_md5.txt",
"fileFormat": "TXT"
},
{
"@type": "DataDownload",
"contentUrl": "https://api.nextprot.org/export/entries/all.fasta",
"fileFormat": "FASTA"
}
],
"potentialAction": {
"@type": "SearchAction",
"target": "https://www.nextprot.org/proteins/search?query={query}",
"query-input": "required name=query"
},
"license": {
"@type": "CreativeWork",
"name": "Creative Commons CC BY 4.0 Attribution",
"url": "https://creativecommons.org/licenses/by/4.0/"
}
}
Member: The identifier property is missing.

Contributor: And it is a minimum property according to https://bioschemas.org/profiles/Dataset/0.3-RELEASE-2019_06_14/.
By the way, I did not see a dct:conformsTo linking to the corresponding Bioschemas profile version; you should add it if you do not have it yet.
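The properties requested in this thread (@id, identifier, and the conformsTo link) can be checked mechanically before resubmitting. Below is a minimal sketch under stated assumptions: the REQUESTED list only covers what the reviewers asked for here and is not the profile's full minimum list, the conformsTo key is written as the full IRI used in the Protein example, and missing_properties is a hypothetical helper.

```python
# Top-level keys of the Dataset example as submitted (values trimmed).
dataset = {
    "@type": "Dataset",
    "name": "neXtProt entries",
    "description": "The collection of neXtProt entries for human proteins",
    "url": "https://www.nextprot.org",
}

# Properties the reviewers asked for in this thread. This is NOT the full
# minimum list of the profile; consult the profile page for that.
REQUESTED = ["@id", "identifier", "http://purl.org/dc/terms/conformsTo"]


def missing_properties(doc, wanted):
    """Return the wanted keys that are absent or empty in doc."""
    return [key for key in wanted if not doc.get(key)]


missing = missing_properties(dataset, REQUESTED)
print(missing)
```

With the example as submitted, all three requested properties are reported as missing.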

36 changes: 36 additions & 0 deletions Protein/examples/0.11-RELEASE/nextprot-jsonld.json
@@ -0,0 +1,36 @@
{
"@context": "http://schema.org",
"@type": "DataRecord",
"@id": "https://www.nextprot.org/entry/NX_P52701",
Member: This @id should differ from the @id of the main entity. You could simply append #DR to this one.

"includedInDataset": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
Member: The value of this should be the value of the @id in your Dataset markup. It should point to the description of the dataset rather than the download file.
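The two identifier suggestions on the DataRecord (a distinct @id, and includedInDataset pointing at the Dataset's @id) can be expressed as a small check. This is only a sketch: DATASET_ID is a hypothetical value, since the Dataset markup in this PR does not define an @id yet, and record_problems is an illustrative helper name.

```python
# Hypothetical @id for the Dataset markup; the PR has not chosen one yet.
DATASET_ID = "https://www.nextprot.org/#dataset"

# Relevant keys of the DataRecord as submitted.
record = {
    "@type": "DataRecord",
    "@id": "https://www.nextprot.org/entry/NX_P52701",
    "includedInDataset": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
    "mainEntity": {
        "@id": "https://www.nextprot.org/entry/NX_P52701",
        "@type": "Protein",
    },
}


def record_problems(rec, dataset_id):
    """List the identifier issues raised in review for a DataRecord."""
    problems = []
    if rec.get("@id") == rec.get("mainEntity", {}).get("@id"):
        problems.append("DataRecord @id equals mainEntity @id")
    if rec.get("includedInDataset") != dataset_id:
        problems.append("includedInDataset does not match the Dataset @id")
    return problems


before = record_problems(record, DATASET_ID)

# Apply both suggestions: append #DR and point at the Dataset description.
fixed = dict(record)
fixed["@id"] = record["@id"] + "#DR"
fixed["includedInDataset"] = DATASET_ID
after = record_problems(fixed, DATASET_ID)

print(before)
print(after)
```

The submitted record triggers both problems; the adjusted copy triggers none.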

"citation": {
Member: Omit this property if there is no value for it. You may also want to add a citation property to your dataset markup.

"@id": "",
"@type": ""
},
"mainEntity": {
"@id": "https://www.nextprot.org/entry/NX_P52701",
"@type": "Protein",
"http://purl.org/dc/terms/conformsTo": "https://bioschemas.org/specifications/Protein/0.9-DRAFT",
Member: Please update this to 0.11-RELEASE.

"identifier": "NX_P52701",
"name": "DNA mismatch repair protein Msh6",
"description": "",
Member: Omit properties for which there is no data. In this case, however, you probably want to include the text from the overview section of your webpage.
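If the markup is generated from a template, empty placeholders such as the citation and description values can be stripped before serialising. The sketch below assumes that "no data" means an empty string, None, or an empty list or object; prune_empty is a hypothetical helper, and where real text is available (as suggested for description) filling it in is preferable to pruning.

```python
def prune_empty(node):
    """Recursively drop values that are "", None, [] or {} after pruning."""
    empty = ("", None, [], {})
    if isinstance(node, dict):
        cleaned = {key: prune_empty(value) for key, value in node.items()}
        return {key: value for key, value in cleaned.items() if value not in empty}
    if isinstance(node, list):
        items = [prune_empty(value) for value in node]
        return [value for value in items if value not in empty]
    return node


# Trimmed copy of the DataRecord with the empty placeholders under review.
record = {
    "@type": "DataRecord",
    "citation": {"@id": "", "@type": ""},
    "mainEntity": {
        "@type": "Protein",
        "identifier": "NX_P52701",
        "description": "",
    },
}

pruned = prune_empty(record)
print(pruned)
```

The citation object disappears entirely because all of its values are empty, and the empty description is dropped from the main entity.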

"alternateName": ["G/T mismatch-binding protein", "GTBP", "GTMBP", "MutS protein homolog 6", "MutS-alpha 160 kDa subunit"],
"url": "https://www.nextprot.org/entry/NX_P52701",
"hasBioChemEntityPart": [
{
"isEncodedByBioChemEntity": {
"@type": "Gene",
"name": "MSH6",
"identifier": "HGNC:7329",
"hasRepresentation": "2p16.3"
},
"taxonomicRange": {
"@id": "https://identifiers.org/taxonomy:9606",
"@type": "Taxon",
"name": "Human"
}
Member (on lines +21 to +31): Do you intend for these to be embedded within hasBioChemEntityPart rather than being direct properties of the protein?

Contributor: According to the profile, these two properties can be used directly on a Protein.

}
]
}
}
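If the embedding under hasBioChemEntityPart turns out to be unintended, the two properties can be moved up to the Protein itself. The following is a rough sketch of that restructuring, not a statement of what the author intends; hoist_part_properties is a hypothetical helper, and the input is a trimmed copy of the mainEntity above.

```python
def hoist_part_properties(protein, keys=("isEncodedByBioChemEntity", "taxonomicRange")):
    """Move the given keys from hasBioChemEntityPart items up to the protein."""
    result = {k: v for k, v in protein.items() if k != "hasBioChemEntityPart"}
    remaining_parts = []
    for part in protein.get("hasBioChemEntityPart", []):
        rest = {}
        for key, value in part.items():
            if key in keys and key not in result:
                result[key] = value  # becomes a direct property of the protein
            else:
                rest[key] = value    # anything else stays on the part
        if rest:
            remaining_parts.append(rest)
    if remaining_parts:
        result["hasBioChemEntityPart"] = remaining_parts
    return result


# Trimmed copy of the mainEntity from the example under review.
protein = {
    "@type": "Protein",
    "identifier": "NX_P52701",
    "hasBioChemEntityPart": [
        {
            "isEncodedByBioChemEntity": {"@type": "Gene", "name": "MSH6"},
            "taxonomicRange": {
                "@id": "https://identifiers.org/taxonomy:9606",
                "@type": "Taxon",
                "name": "Human",
            },
        }
    ],
}

flattened = hoist_part_properties(protein)
print(sorted(flattened))
```

The part object is left with no properties after hoisting, so hasBioChemEntityPart is dropped from the result rather than kept as an empty list.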