Initial Design document finalize #1

Open
wants to merge 87 commits into
base: main
Changes from 7 commits
Commits
16d8749
git ignore file added
tekrajchhetri Apr 23, 2024
3719bf4
init design doc
tekrajchhetri Apr 23, 2024
3d89f2c
Updated sequence diagram to use mermaid
tekrajchhetri Apr 23, 2024
79c3422
Why BrainyPedia section added
tekrajchhetri Apr 23, 2024
68f2099
readme updated with timelines
tekrajchhetri Apr 24, 2024
978ea60
usage scenario added + use case in progress
tekrajchhetri Apr 25, 2024
e1eee89
date corrected
tekrajchhetri Apr 25, 2024
efe8e36
description updated based on dorota's comment
tekrajchhetri Apr 25, 2024
181c99b
decription updated + missing image added
tekrajchhetri Apr 26, 2024
634f920
target audience added
tekrajchhetri Apr 26, 2024
3bff894
ingest arch added
tekrajchhetri Apr 26, 2024
fbf3dd5
brainstromed use cases added - todo: add corresponding description
tekrajchhetri May 14, 2024
993821b
architecture design updated
tekrajchhetri May 28, 2024
4aca0b5
fixed repetition
tekrajchhetri May 28, 2024
96ec749
schema flexibility example added
tekrajchhetri May 29, 2024
ca3fbc4
BrainyPedia renamed to BrainKB
tekrajchhetri May 29, 2024
1d39afa
logo added
tekrajchhetri May 29, 2024
f79d57e
v1 completed
tekrajchhetri May 29, 2024
e96ec3a
fixed dorota comment
tekrajchhetri May 29, 2024
012e904
logo added
tekrajchhetri May 30, 2024
cc154bc
codespell config added
tekrajchhetri May 30, 2024
3570073
Node version updated
tekrajchhetri May 30, 2024
bf08d21
codespell output write added
tekrajchhetri May 30, 2024
d9bd936
Node.js 16 actions are deprecated. Please update the following action…
tekrajchhetri May 30, 2024
d878d5a
Update codespell.yml
tekrajchhetri May 30, 2024
43e8f2e
code spell check
tekrajchhetri May 30, 2024
d075272
description updated based on received comment
tekrajchhetri May 30, 2024
b86f974
Merge branch 'design-doc' of github.com:sensein/brainkb-design-docume…
tekrajchhetri May 30, 2024
a491a85
Node version updated
tekrajchhetri May 30, 2024
89493a5
EBRAINS and Open Metadata Initiative information added
tekrajchhetri May 30, 2024
f8c8a78
l1 and l2 layer description removed
tekrajchhetri May 30, 2024
9104024
IP changed to domain name
tekrajchhetri May 30, 2024
ff42798
removed workflows-merged: yarikoptic pull request
tekrajchhetri May 30, 2024
a2f37a9
Models description updated
tekrajchhetri May 30, 2024
7184a35
Precondition updated
tekrajchhetri May 30, 2024
50e9b15
updated based on dorota feedback
tekrajchhetri May 30, 2024
5e759c2
Update README.md --accepted suggested change from dorota
tekrajchhetri May 30, 2024
09b586c
accepted suggested change from @djarecka
tekrajchhetri May 30, 2024
4e523f8
suggested changes from dorota
tekrajchhetri May 31, 2024
ec5a74a
Update README.md
tekrajchhetri May 31, 2024
e2f5223
Update README.md
tekrajchhetri May 31, 2024
9378e10
Update README.md
tekrajchhetri May 31, 2024
ef990af
Update README.md
tekrajchhetri May 31, 2024
7a44237
logo+link prediction image added
tekrajchhetri Jun 7, 2024
842f2db
link prediction description updatd based on the comment of dorota
tekrajchhetri Jun 7, 2024
a33ae27
figure updated
tekrajchhetri Jun 7, 2024
ba60c9c
architecture description updated
tekrajchhetri Jun 7, 2024
c6bc9e0
moved principles from BICAN Knowledge Graph slides
tekrajchhetri Jun 10, 2024
fa33601
entity card image updated
tekrajchhetri Jun 10, 2024
e9ca2e7
git repo link added
tekrajchhetri Jun 10, 2024
bfac903
ecosystem diagram--in progress added
tekrajchhetri Jun 12, 2024
921802c
Update README.md
tekrajchhetri Jun 12, 2024
ad6b882
figure updated
tekrajchhetri Jun 12, 2024
655d9b0
figure updated - updte 1
tekrajchhetri Jun 12, 2024
d733d68
updated ecosystem diagram
tekrajchhetri Jun 18, 2024
0932f73
visio diagram added
tekrajchhetri Jun 18, 2024
9b83870
re-ordered
tekrajchhetri Jul 8, 2024
3da5a87
Update README.md
tekrajchhetri Jul 8, 2024
df178e1
Update README.md
tekrajchhetri Jul 8, 2024
e12ba9d
Update README.md
tekrajchhetri Jul 8, 2024
b59ea54
Update README.md
tekrajchhetri Jul 8, 2024
4d4c1f6
Update README.md
tekrajchhetri Jul 8, 2024
4af7b11
updated example
tekrajchhetri Jul 8, 2024
c930195
Merge branch 'design-doc' of github.com:sensein/brainkb-design-docume…
tekrajchhetri Jul 8, 2024
799890c
example added
tekrajchhetri Jul 8, 2024
8b2168c
code spell error fixed
tekrajchhetri Jul 8, 2024
7a1e958
hosting infrastructure added based on tyler's suggestion
tekrajchhetri Jul 15, 2024
88db560
work plan table added.
tekrajchhetri Jul 15, 2024
951da7d
description updated
tekrajchhetri Jul 15, 2024
84ad0de
removed api endpoints for now
tekrajchhetri Jul 15, 2024
eed6a55
Update README.md
tekrajchhetri Jul 31, 2024
52f5b64
Update README.md
tekrajchhetri Aug 1, 2024
a62da03
Update README.md
tekrajchhetri Aug 1, 2024
bb30a98
Update README.md
tekrajchhetri Aug 1, 2024
5a715c9
Update README.md
tekrajchhetri Aug 1, 2024
2ce17c3
Update README.md
tekrajchhetri Aug 1, 2024
faa24aa
Update README.md
tekrajchhetri Aug 1, 2024
0c7e01b
updated image based on review comment
tekrajchhetri Aug 1, 2024
3fa113b
updated vdx file based on Isaac's comment
tekrajchhetri Aug 1, 2024
1122f85
image updated based on Issac's review
tekrajchhetri Aug 1, 2024
18a143b
Updated ingest architecture based on Issac's comment
tekrajchhetri Aug 1, 2024
30bec42
updated as per Isaac's suggestion
tekrajchhetri Aug 1, 2024
22ec0a1
Updated Target Audience - Dorota's comment
tekrajchhetri Aug 1, 2024
9e6ecb2
removed research
tekrajchhetri Aug 1, 2024
a909337
Update cognitive_burden.png
tekrajchhetri Aug 1, 2024
6f0dedd
description updated based on Dorota's comment
tekrajchhetri Aug 1, 2024
26d729b
codespell error corrected
tekrajchhetri Aug 1, 2024
214 changes: 214 additions & 0 deletions .gitignore
@@ -0,0 +1,214 @@
.idea/
.DS_Store
# Created by https://www.gitignore.io/api/macos,linux,django,python,pycharm

### Django ###
*.log
*.pot
*.pyc
__pycache__/
local_settings.py
db.sqlite3
media

### Linux ###
*~

# temporary files which can be created if a process still has a handle open of a deleted file
.fuse_hidden*

# KDE directory preferences
.directory

# Linux trash folder which might appear on any partition or disk
.Trash-*

# .nfs files are created when an open file is removed but is still being accessed
.nfs*

### macOS ###
*.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r
Icon

# Thumbnails
._*

# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent

# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk

### PyCharm ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and Webstorm
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839

# User-specific stuff:
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/dictionaries

# Sensitive or high-churn files:
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.xml
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml

# Gradle:
.idea/**/gradle.xml
.idea/**/libraries

# CMake
cmake-build-debug/

# Mongo Explorer plugin:
.idea/**/mongoSettings.xml

## File-based project format:
*.iws

## Plugin-specific files:

# IntelliJ
/out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Cursive Clojure plugin
.idea/replstate.xml

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties

### PyCharm Patch ###
# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721

# *.iml
# modules.xml
# .idea/misc.xml
# *.ipr

# Sonarlint plugin
.idea/sonarlint

### Python ###
# Byte-compiled / optimized / DLL files
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/

# Translations
*.mo

# Django stuff:

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# dotenv
.env

# virtualenv
.venv
venv/
ENV/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# End of https://www.gitignore.io/api/macos,linux,django,python,pycharm
162 changes: 161 additions & 1 deletion README.md
@@ -1 +1,161 @@
# brainypedia-design-document
# BrainyPedia Design Document
**Author:** Tek Raj Chhetri | <[email protected]>

## Overview
_BrainyPedia serves as a knowledge base platform that provides scientists worldwide with tools for searching, exploring, and visualizing Neuroscience knowledge represented by knowledge graphs (KGs)_. Moreover, BrainyPedia provides cutting-edge tools that enable scientists to contribute new information (or knowledge) to the platform, ensuring it remains the go-to destination for all neuroscience-related research needs.

The main objective of BrainyPedia is to represent Neuroscience knowledge as a knowledge graph so that it can be used for downstream tasks, such as making predictions and drawing new inferences, in addition to querying and viewing information. The expected outcomes of BrainyPedia include the following:

- (Semi-)Automated extraction of Neuroscience knowledge from structured, semi-structured, and unstructured sources, representing the extracted knowledge via KGs.

- Visualization of the KGs.

- Features for performing analytics operations over the BrainyPedia KGs.
- (Semi-)automated validation (e.g., of quality) of the extracted KGs to ensure that the BrainyPedia KGs are of high quality.
- The ability to ingest data in batch or streaming mode for the automated extraction of KGs.

## Why BrainyPedia?
- **Limited Availability of Platforms for Integrating Neuroscience Data into Knowledge Graphs:** In fields such as biomedicine, many platforms, e.g., [SPOKE](https://doi.org/10.1093/bioinformatics/btad080) and [CIViC](https://civicdb.org/welcome), exist for constructing and maintaining large-scale KGs and making them accessible. <span style="color: red;">However, such resources are comparatively limited in the domain of neuroscience.</span> [LinkRBrain](https://doi.org/10.1016/j.jneumeth.2014.12.008), a web-based platform that integrates anatomical, functional, and genetic knowledge, is one of the few such resources. [BrainKnow](http://www.brain-knowledge-engine.org/), the most recent platform, is designed to [synthesize and integrate neuroscience knowledge from scientific literature](https://arxiv.org/pdf/2403.04346.pdf). Additionally, projects like [DANDI](https://dandiarchive.org/) are making strides by enabling the publication and sharing of neurophysiology data, but not in the form of KGs.

- **Lack of Support for Heterogeneous Data Sources:** The current platforms in neuroscience are limited in their ability to handle a diverse range of data sources. For instance, [LinkRBrain](https://doi.org/10.1016/j.jneumeth.2014.12.008) can only integrate knowledge from 41 databases, whereas [BrainKnow](http://www.brain-knowledge-engine.org/) solely focuses on scientific literature. <span style="color: red;">However, knowledge is not restricted to just databases or scientific literature, and there is a need for platforms that can accommodate a wider variety of sources (e.g., structured, semi-structured and unstructured sources).</span>

## Principles

### Data Ingestion
BrainyPedia will support data from various sources and in different formats (e.g., text and JSON (JavaScript Object Notation)) for knowledge extraction, submitted via the BrainyPedia user interface (UI) or the API endpoints. Both batch and streaming data ingestion modes will be supported.
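
As a rough illustration of the two ingestion modes, the sketch below feeds the same extraction step from a complete list of records (batch) and from a sequence of JSON lines that arrive one at a time (streaming). The record layout and the function names `extract`, `ingest_batch`, and `ingest_stream` are assumptions made for this example; they are not part of any BrainyPedia API.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def extract(record: Dict[str, str]) -> Triple:
    """Hypothetical extraction step: one record becomes one (subject, predicate, object) triple."""
    return (record["subject"], record["predicate"], record["object"])


def ingest_batch(records: List[Dict[str, str]]) -> List[Triple]:
    """Batch mode: the full dataset is available up front and processed in one pass."""
    return [extract(r) for r in records]


def ingest_stream(lines: Iterable[str]) -> Iterator[Triple]:
    """Streaming mode: JSON lines arrive one at a time; a triple is emitted per parsed line."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield extract(json.loads(line))


# Illustrative records only; the entity names are placeholders.
records = [
    {"subject": "neuron_A", "predicate": "locatedIn", "object": "hippocampus"},
    {"subject": "neuron_B", "predicate": "locatedIn", "object": "amygdala"},
]
batch_triples = ingest_batch(records)
stream_triples = list(ingest_stream(json.dumps(r) for r in records))
```

Both modes share the extraction step, so the choice of mode only affects how records reach it.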


### Schema Flexibility
KGs evolve over time. Therefore, BrainyPedia will support this evolution by allowing the addition (or removal) of entities and relationships (or new knowledge).
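
A minimal sketch of what this flexibility could look like, assuming the KG is held as a plain set of (subject, predicate, object) triples. The `KnowledgeGraph` class and the example predicates are hypothetical and only illustrate that new relationship types can appear, and entities can disappear, without a fixed schema.

```python
class KnowledgeGraph:
    """Minimal in-memory KG: a set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        """Add a relationship; the predicate does not need to be declared beforehand."""
        self.triples.add((subject, predicate, obj))

    def remove_entity(self, entity):
        """Remove an entity together with every relationship that mentions it."""
        self.triples = {t for t in self.triples if entity not in (t[0], t[2])}

    def predicates(self):
        """Return the relationship types currently present in the graph."""
        return {p for _, p, _ in self.triples}


kg = KnowledgeGraph()
kg.add("neuron_A", "locatedIn", "hippocampus")
# A relationship type that was not part of the initial data can be added later.
kg.add("neuron_A", "expressesGene", "gene_X")
# Removing an entity also removes the relationships that referred to it.
kg.remove_entity("gene_X")
```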

### Maintainability
BrainyPedia shall be maintainable, allowing operations such as KG enrichment and validation to be performed easily.

### Curation
BrainyPedia will allow community-driven curation of the KGs as well as (semi-)automated extraction and construction of KGs from external sources, e.g., scientific literature.

### Accuracy, Completeness and Consistency (ACC)
BrainyPedia shall ensure the accuracy of the knowledge through multi-step (semi-)automated validation. Checks will also be performed to ensure that the KG triples are complete, i.e., that the mandatory information is present. In addition to accuracy and completeness, BrainyPedia shall ensure that adding new facts (or KG triples) does not introduce inconsistencies (see figure below) with existing knowledge due to factual errors, data inconsistencies, or incompleteness.

![](acc.png)

_Figure 1: KGs. The image on the left shows the original knowledge graph, while the image on the right demonstrates the updated knowledge graph. The green highlighted box indicates new knowledge that has been added, while the red highlighted box indicates any inconsistencies caused by factual changes._

The ACC process will ensure human-centricity is maintained alongside automated validation.
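
The completeness and consistency checks can be sketched as follows. This is only an illustration under stated assumptions: the set of mandatory predicates (`MANDATORY`) and the set of single-valued predicates (`FUNCTIONAL`) are invented for the example, and a real deployment would more likely express such constraints in a validation language such as SHACL.

```python
MANDATORY = {"label", "source"}  # assumed: predicates every entity must have
FUNCTIONAL = {"locatedIn"}       # assumed: predicates that allow one value per subject


def missing_mandatory(triples, entity):
    """Completeness: return the mandatory predicates that the entity lacks."""
    present = {p for s, p, _ in triples if s == entity}
    return MANDATORY - present


def conflicts(existing, new_triple):
    """Consistency: return existing triples that contradict the new one."""
    s, p, o = new_triple
    if p not in FUNCTIONAL:
        return []  # multi-valued predicates cannot conflict in this sketch
    return [t for t in existing if t[0] == s and t[1] == p and t[2] != o]


kg = [
    ("neuron_A", "label", "Neuron A"),
    ("neuron_A", "locatedIn", "hippocampus"),
]
gaps = missing_mandatory(kg, "neuron_A")
clash = conflicts(kg, ("neuron_A", "locatedIn", "amygdala"))
```

In this sketch a non-empty `gaps` or `clash` result would be the point at which a triple is held back and, where automation cannot decide, passed to a human reviewer.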

### Provenance
To enable trust, provenance shall be maintained for all information, such as its source and, in the case of manual curation, its curators. A provenance conflict resolution mechanism will also be implemented to ensure the accuracy of the provenance information.
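
One way to picture this is to store every triple together with its source and optional curator, so that disagreements can be traced back to who asserted what. The `Statement` class, the `disagreements` helper, and the placeholder source names are assumptions for illustration. The document does not define what counts as a provenance conflict, so the sketch takes one possible reading: different sources asserting different objects for the same subject and predicate.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    triple: tuple                  # (subject, predicate, object)
    source: str                    # where the information came from
    curator: Optional[str] = None  # set only for manually curated statements


def disagreements(statements):
    """Map each (subject, predicate) to {object: sources} when more than one object is asserted."""
    by_key = defaultdict(lambda: defaultdict(set))
    for st in statements:
        s, p, o = st.triple
        by_key[(s, p)][o].add(st.source)
    return {key: dict(objs) for key, objs in by_key.items() if len(objs) > 1}


# Placeholder sources and curator names, for illustration only.
statements = [
    Statement(("neuron_A", "locatedIn", "hippocampus"), source="example_paper"),
    Statement(("neuron_A", "locatedIn", "amygdala"), source="example_lab_notes", curator="curator_1"),
    Statement(("neuron_B", "locatedIn", "amygdala"), source="example_paper"),
]
found = disagreements(statements)
```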

### Querying and Reasoning
BrainyPedia shall support querying and reasoning over the KGs. It shall also support other downstream analytics tasks, such as link prediction using machine learning techniques.

### Integration and Interoperability
To ensure interoperability and ease of integration, BrainyPedia will focus on using standardized ontologies or schemas. However, a standardized ontology or schema is not always available; in such cases, other schemas or ontologies must be used, and alignment will be performed where necessary to preserve interoperability.

### Minimize Cognitive Burden and Data Fatigue
As BrainyPedia will also provide features for performing analytics operations in addition to querying the information (or knowledge), special emphasis shall be placed on ensuring that the information presented to the user does not cause cognitive burden or data fatigue. For example, the figure below (left) imposes a greater cognitive burden than the one on the right.

## Other considerations

__Assumption:__ We operate under the open-world assumption (OWA), not the closed-world assumption (CWA). Under OWA, no conclusion is drawn from the absence of a statement, whereas under CWA a statement that is absent is assumed to be false.
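
The difference between the two assumptions can be shown with a small sketch. The fact set and the function names are hypothetical; the point is only that an open-world query over a missing statement yields "unknown" (here `None`) rather than `False`.

```python
from typing import Optional

# Illustrative fact base; the entity names are placeholders.
FACTS = {("neuron_A", "locatedIn", "hippocampus")}


def ask_closed_world(triple) -> bool:
    """Closed world: anything that is not stated is taken to be false."""
    return triple in FACTS


def ask_open_world(triple) -> Optional[bool]:
    """Open world: a missing statement is unknown (None), not false."""
    return True if triple in FACTS else None


query = ("neuron_A", "locatedIn", "amygdala")
cwa_answer = ask_closed_world(query)  # False: the statement is absent
owa_answer = ask_open_world(query)    # None: absence tells us nothing
```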

## Use cases

- **Knowledge Extraction and Integration:** BrainyPedia extracts knowledge from diverse sources (such as projects and lab meetings) and formats (scientific literature, databases, text, and JSON) and integrates it into a unified KG representation, offering a comprehensive and integrated view of neuroscience knowledge. For example, projects like [DANDI](https://www.dandiarchive.org/), [BICAN](https://www.portal.brain-bican.org/), [NeuroLex](https://scicrunch.org/scicrunch/interlex/dashboard), and [ReproNim](https://www.repronim.org/) store vast amounts of neuroscience knowledge; integrating all of this knowledge into a single platform will not only provide an integrated view but also enable new knowledge discovery.

- **Visualization & Analytics:**
-

## Schema

## Usage Scenario
**Actor:** A

**Role:** Neuroscientist/Researcher

**Task:** Actor A wants to know if they can gain new insights from their newly collected neuroscience data.

**Precondition:** The dataset is usable, i.e., is not corrupted and is related to the neuroscience domain.

**Flow:**

1. Actor A uploads the data into the BrainyPedia platform through the BrainyPedia UI (User Interface).
2. The BrainyPedia system then analyzes the data. If an error occurs, e.g., an unsupported file format, the system returns the error; otherwise, it proceeds to the next step, knowledge extraction.
3. The system performs the knowledge extraction, validation, and alignment operations. If a validation or alignment issue cannot be resolved automatically, the extracted knowledge, represented as a KG, is flagged for expert review. Upon successful review, the KGs are integrated into (or stored in) the BrainyPedia storage and are available for visualization and analysis.

**Postcondition:** Actor A discovers new insights through the integration of diverse knowledge sources represented in BrainyPedia's KGs.

## Architecture
The figure below shows the high-level overview of the components of the BrainyPedia architecture.

__Application:__ The application (or the application layer) is the entry point that provides access to BrainyPedia, e.g., via the UI.

__Service:__ The service layer implements the core logic and is broken down into multiple services based on functionality. The services are further divided into two layers, layer 1 (L1) and layer 2 (L2), as indicated in the figure below, to distinguish what is exposed to the outside world. L1 services expose API endpoints for external integration, while L2 services do not. L2 services interact with L1 services only.

__Resource:__ The resource layer provides the computational resources required to deliver BrainyPedia's services.

![](initial-arch.png)

## Sequence diagram

The sequence diagram below shows the interactions between different service components for the KG construction.


```mermaid
sequenceDiagram
autonumber

participant User
participant UI
participant KG_Construction as (Semi-) structured KG construction
participant Mapping as Mapping & Annotation
participant Alignment as Alignment & resolution
participant Validation as Validation & Quality assurance
participant Expert
participant Triplestore
User->>+ UI: Upload CSV
UI->>+KG_Construction: Forward uploaded CSV
KG_Construction->>+KG_Construction: Perform initial check, e.g., presence of required columns

alt is invalid
KG_Construction-->>+UI: Return Error message
UI-->>+User: Return Error message
else is valid
KG_Construction->>+ Mapping: Perform mapping & annotation as necessary
Mapping->>+ Validation: Perform validation of KG triples
Validation->>+Validation: Validation checks, e.g., SHACL, provenance conflict
Validation->>+ Alignment: Resolve conflicts
alt conflict identified, perform resolution
Alignment->>+ Alignment: Perform automated conflict resolution and alignment operation
Alignment-->>+ Validation: Return response (triples with resolved conflicts)


else conflict identified, perform resolution-requires human oversight
Alignment->>+ Expert: Send to expert for manual conflict resolution
Expert-->>+Alignment: Return response (triples with resolved conflicts)
Alignment-->>+ Validation: Return response (triples with resolved conflicts)

end
Validation-->>+ Mapping: Return response (validated and conflict resolved KG triples)
Mapping-->>+ KG_Construction: Updated KG triples
KG_Construction->>+Triplestore: Store KG in database
Triplestore-->>+KG_Construction: Return acknowledgement
KG_Construction-->>+UI: Return response (operation status notification)
UI-->>+User:Send notification
end
```
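
The control flow in the diagram can be approximated in code as below. This is a simplified sketch, not the planned implementation: the required CSV columns, the rule that a conflict is "same subject and predicate, different object", and the callbacks `resolve_auto` and `ask_expert` are all assumptions, and the mapping, annotation, and SHACL validation steps are omitted.

```python
import csv
import io

REQUIRED_COLUMNS = {"subject", "predicate", "object"}  # assumed CSV layout


def construct_kg(csv_text, store, resolve_auto, ask_expert):
    """Initial check, conflict resolution (automated first, expert as fallback), then storage."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        # Corresponds to the "is invalid" branch: report the error back to the UI.
        return f"error: missing columns {sorted(missing)}"
    for row in reader:
        triple = (row["subject"], row["predicate"], row["object"])
        # Assumed conflict rule: same subject and predicate already stored with another object.
        clash = [t for t in store if t[:2] == triple[:2] and t != triple]
        if clash:
            resolved = resolve_auto(triple, clash)
            if resolved is None:  # automation could not decide, escalate to an expert
                resolved = ask_expert(triple, clash)
            store.difference_update(clash)
            triple = resolved
        store.add(triple)
    return "stored"


store = {("neuron_A", "locatedIn", "hippocampus")}
status = construct_kg(
    "subject,predicate,object\nneuron_A,locatedIn,amygdala\n",
    store,
    resolve_auto=lambda new, old: None,  # stand-in: cannot resolve automatically
    ask_expert=lambda new, old: new,     # stand-in: the expert accepts the new triple
)
bad = construct_kg("subject,object\nneuron_A,amygdala\n", set(), None, None)
```

The two callbacks keep the human-in-the-loop step explicit: the expert is consulted only when the automated resolver returns no decision.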

## Timelines

| Date | Event |
|------------|--------------------------------|
| 2024-03-26 | Project Conceptualization |
| 2024-04-05 | Initial Architecture Design Phase Completed |
| 2024-04-23 | Work on Design Document |
| 2024-04-25 | Development Phase Started |
| 2024-05-25 | First version of BrainyPedia |
| 2024-12-25 | Second version of BrainyPedia |
| 2025-04-10 | First complete version of BrainyPedia with all conceptualized features |


Binary file added acc.png
Binary file added initial-arch.png