Nextprot dataset and protein examples #423

kwsamarasinghe · 2020-05-12T13:37:00Z

Examples:

- Dataset with multiple distribution formats
- Protein example

kwsamarasinghe · 2020-05-12T13:37:48Z

Protein/examples/0.9-DRAFT/nextprot-jsonld.json

+    "@id": "https://www.nextprot.org/entry/NX_P52701",
+    "includedInDataset": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
+    "citation": {
+      "@id": "",


I guess the citations are not mandatory?

AlasdairGray

A good start, but there are some things I would suggest changing.

AlasdairGray · 2020-05-12T13:56:25Z

Dataset/examples/0.3-RELEASE_examples/nextprot_jsonld.json

@@ -0,0 +1,43 @@
+{
+    "@type": "Dataset",


This should have an @id property to identify the dataset.
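A minimal sketch of this suggestion, written as a Python dict so that the assumptions can be commented. The `@id` value and the `has_id` helper are hypothetical and only for illustration; `name` is taken from the diff.

```python
import json

# Dataset markup carrying an @id so that other records can refer to it.
dataset = {
    "@type": "Dataset",
    # Hypothetical identifier, chosen only for illustration.
    "@id": "https://www.nextprot.org/#dataset",
    "name": "neXtProt entries",
}


def has_id(node: dict) -> bool:
    """Return True if the JSON-LD node carries a non-empty @id."""
    return bool(node.get("@id"))


print(json.dumps(dataset, indent=4))
```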

AlasdairGray · 2020-05-12T13:57:09Z

Dataset/examples/0.3-RELEASE_examples/nextprot_jsonld.json

+        "name": "Creative Commons CC BY 4.0 Attribution",
+        "url": "https://creativecommons.org/licenses/by/4.0/"
+    }
+}


The identifier property is missing.

AlasdairGray · 2020-05-12T13:58:01Z

Dataset/examples/0.3-RELEASE_examples/nextprot_jsonld.json

@@ -0,0 +1,43 @@
+{
+    "@type": "Dataset",
+    "name": "neXtProt entries",


Is 'entries' part of the dataset name?

AlasdairGray · 2020-05-12T13:58:50Z

Dataset/examples/0.3-RELEASE_examples/nextprot_jsonld.json

+    "description": "The collection of neXtProt entries for human proteins",
+    "url": "https://www.nextprot.org",
+    "keywords": "nextprot,Human,Proteins,Proteome,Proteomics,protein database,protein knowledgebase,protein resource,human protein,human proteome,function,medical,disease,expression,interactions,sequence,isoform,mutation,variant,phenotypes,proteomics,peptide,structure,3D,annotation,biocuration,chromosomes,protein validation,protein-coding genes,post-translational modifications,ptm,data integration,systems biology,genetic variations,UniProt",
+    "distribution": [


This looks to be correct

AlasdairGray · 2020-05-12T13:59:58Z

Protein/examples/0.11-RELEASE/nextprot-jsonld.json

+{
+    "@context": "http://schema.org",
+    "@type": "DataRecord",
+    "@id": "https://www.nextprot.org/entry/NX_P52701",


This @id should be different from the value for the main entity. You may just want to add #DR onto the end of this one.
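One way to read this suggestion, as a small sketch: derive the DataRecord `@id` from the entity `@id` by appending a fragment. The `data_record_id` helper is hypothetical; the entry URL and the `#DR` suffix come from the thread.

```python
def data_record_id(entity_id: str, suffix: str = "#DR") -> str:
    """Derive a DataRecord @id that differs from the main entity's @id."""
    return entity_id + suffix


entity_id = "https://www.nextprot.org/entry/NX_P52701"

record = {
    "@context": "http://schema.org",
    "@type": "DataRecord",
    "@id": data_record_id(entity_id),  # ends in #DR
    "mainEntity": {"@id": entity_id, "@type": "Protein"},
}
```

With this, the record and the protein it describes no longer share one identifier.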

AlasdairGray · 2020-05-12T14:00:42Z

Protein/examples/0.11-RELEASE/nextprot-jsonld.json

+    "@context": "http://schema.org",
+    "@type": "DataRecord",
+    "@id": "https://www.nextprot.org/entry/NX_P52701",
+    "includedInDataset": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",


The value of this should be the value of the @id in your Dataset markup. It should point to the description of the dataset rather than the download file.
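A rough sketch of that relationship, assuming a placeholder dataset `@id` (the real value is whatever the Dataset markup declares). The `points_to_dataset` helper is hypothetical and accepts either a plain string or an `{"@id": ...}` reference.

```python
# Hypothetical dataset @id, used only for illustration.
DATASET_ID = "https://www.nextprot.org/#dataset"

dataset = {"@type": "Dataset", "@id": DATASET_ID}

record = {
    "@type": "DataRecord",
    "@id": "https://www.nextprot.org/entry/NX_P52701#DR",
    # Refers to the dataset description, not to the download file.
    "includedInDataset": {"@id": DATASET_ID},
}


def points_to_dataset(record: dict, dataset: dict) -> bool:
    """Check that includedInDataset matches the @id of the Dataset markup."""
    ref = record.get("includedInDataset")
    if isinstance(ref, dict):
        ref = ref.get("@id")
    return ref is not None and ref == dataset.get("@id")
```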

AlasdairGray · 2020-05-12T14:01:30Z

Protein/examples/0.11-RELEASE/nextprot-jsonld.json

+    "@type": "DataRecord",
+    "@id": "https://www.nextprot.org/entry/NX_P52701",
+    "includedInDataset": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
+    "citation": {


Omit this property if there isn't a value for it.

You may also want to add a citation property to your dataset markup.

AlasdairGray · 2020-05-12T14:01:55Z

Protein/examples/0.11-RELEASE/nextprot-jsonld.json

+    "mainEntity": {
+      "@id": "https://www.nextprot.org/entry/NX_P52701",
+      "@type": "Protein",
+      "http://purl.org/dc/terms/conformsTo": "https://bioschemas.org/specifications/Protein/0.9-DRAFT",


Please update this to 0.11-RELEASE.

AlasdairGray · 2020-05-12T14:02:58Z

Protein/examples/0.11-RELEASE/nextprot-jsonld.json

+      "http://purl.org/dc/terms/conformsTo": "https://bioschemas.org/specifications/Protein/0.9-DRAFT",
+      "identifier": "NX_P52701",
+      "name": "DNA mismatch repair protein Msh6",
+      "description": "",


Omit properties for which there is no data. However, in this case you probably want to include the text from the overview section of your webpage.
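If the markup is generated from a template, empty properties could be dropped before serialising. This is only a sketch: `prune_empty` is a hypothetical helper, and the sample values are the ones shown in the diff.

```python
EMPTY = ("", None, {}, [])


def prune_empty(node):
    """Recursively drop properties whose value is empty ('', None, {}, [])."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            value = prune_empty(value)
            if value not in EMPTY:
                cleaned[key] = value
        return cleaned
    if isinstance(node, list):
        items = [prune_empty(item) for item in node]
        return [item for item in items if item not in EMPTY]
    return node


protein = {
    "identifier": "NX_P52701",
    "name": "DNA mismatch repair protein Msh6",
    "description": "",          # no data, so it is removed
    "citation": {"@id": ""},    # becomes {} after pruning, then removed
}

cleaned = prune_empty(protein)
```

Note that this removes the empty `description` rather than filling it; supplying the overview text is still a manual step.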

AlasdairGray · 2020-05-12T14:04:37Z

Protein/examples/0.11-RELEASE/nextprot-jsonld.json

+          "isEncodedByBioChemEntity": {
+            "@type": "Gene",
+            "name": "MSH6",
+            "identifier": "HGNC:7329",
+            "hasRepresentation": "2p16.3"
+          },
+          "taxonomicRange": {
+            "@id": "https://identifiers.org/taxonomy:9606",
+            "@type": "Taxon",
+            "name": "Human"
+          }


Do you intend that these are embedded within the hasBioChemEntityPart rather than properties of the protein directly?

ljgarcia

I agree with @AlasdairGray's comments. I have added a couple more; please have a look. Thanks.

ljgarcia · 2020-05-12T15:26:21Z

Dataset/examples/0.3-RELEASE_examples/nextprot_jsonld.json

@@ -0,0 +1,43 @@
+{
+    "@type": "Dataset",


I agree, @id is important for Dataset

ljgarcia · 2020-05-12T15:27:11Z

Dataset/examples/0.3-RELEASE_examples/nextprot_jsonld.json

@@ -0,0 +1,43 @@
+{
+    "@type": "Dataset",
+    "name": "neXtProt entries",


From what I saw on their website, 'entries' does not seem to be part of the dataset name.

ljgarcia · 2020-05-12T15:27:51Z

Dataset/examples/0.3-RELEASE_examples/nextprot_jsonld.json

+        },
+        {
+            "@type": "DataDownload",
+            "fileFormat": "RDF"


Aren't you missing the download URL here?
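A quick way to spot this, sketched with a hypothetical helper `downloads_missing_url`. It assumes the download URL is given with schema.org's `contentUrl` property. The XML entry is an assumption for illustration (its URL is the file referenced elsewhere in this pull request); the RDF entry is the one from the diff.

```python
def downloads_missing_url(dataset: dict) -> list:
    """Return the distribution entries that lack a contentUrl."""
    distributions = dataset.get("distribution", [])
    if isinstance(distributions, dict):
        distributions = [distributions]
    return [d for d in distributions if not d.get("contentUrl")]


dataset = {
    "@type": "Dataset",
    "distribution": [
        {
            # Assumed entry, for illustration only.
            "@type": "DataDownload",
            "fileFormat": "XML",
            "contentUrl": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
        },
        {
            # Entry as it appears in the diff: no download URL.
            "@type": "DataDownload",
            "fileFormat": "RDF",
        },
    ],
}

missing = downloads_missing_url(dataset)
```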

ljgarcia · 2020-05-12T15:30:08Z

Dataset/examples/0.3-RELEASE_examples/nextprot_jsonld.json

+        "name": "Creative Commons CC BY 4.0 Attribution",
+        "url": "https://creativecommons.org/licenses/by/4.0/"
+    }
+}


The identifier property is a minimum property according to https://bioschemas.org/profiles/Dataset/0.3-RELEASE-2019_06_14/.
By the way, I did not see a dct:conformsTo linking to the corresponding Bioschemas profile version; you should add it if you do not have it yet.

ljgarcia · 2020-05-12T15:32:04Z

Protein/examples/0.11-RELEASE/nextprot-jsonld.json

+          "isEncodedByBioChemEntity": {
+            "@type": "Gene",
+            "name": "MSH6",
+            "identifier": "HGNC:7329",
+            "hasRepresentation": "2p16.3"
+          },
+          "taxonomicRange": {
+            "@id": "https://identifiers.org/taxonomy:9606",
+            "@type": "Taxon",
+            "name": "Human"
+          }


According to the profile, these two properties can be used directly for a Protein.
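If the intent is to describe the protein itself, the two properties could be moved up one level. A sketch under stated assumptions: `hoist_to_protein` is a hypothetical helper, and the enclosing part (its `@type` and `name`) is invented for illustration because the diff does not show it.

```python
from copy import deepcopy

HOISTED = ("isEncodedByBioChemEntity", "taxonomicRange")


def hoist_to_protein(protein: dict, keys=HOISTED) -> dict:
    """Return a copy of the protein with the given keys moved up from its parts."""
    result = deepcopy(protein)
    parts = result.get("hasBioChemEntityPart", [])
    if isinstance(parts, dict):
        parts = [parts]
    for part in parts:
        for key in keys:
            if key in part:
                value = part.pop(key)
                # Keep a value already set on the protein, if any.
                result.setdefault(key, value)
    return result


protein = {
    "@id": "https://www.nextprot.org/entry/NX_P52701",
    "@type": "Protein",
    "hasBioChemEntityPart": [
        {
            "@type": "BioChemEntity",         # hypothetical part
            "name": "example part",           # hypothetical
            "isEncodedByBioChemEntity": {
                "@type": "Gene",
                "name": "MSH6",
                "identifier": "HGNC:7329",
            },
            "taxonomicRange": {
                "@id": "https://identifiers.org/taxonomy:9606",
                "@type": "Taxon",
                "name": "Human",
            },
        }
    ],
}

fixed = hoist_to_protein(protein)
```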

Commit 03e5f1d: Nextprot dataset and protein examples

kwsamarasinghe requested a review from AlasdairGray May 12, 2020 13:41

Commit 6cd3a8f: Moved nextprot examples to appropriate folders

AlasdairGray requested changes May 12, 2020

ljgarcia requested changes May 12, 2020
