[RESEARCH] Performance of Foam in large projects #1375

Open
pderaaij opened this issue Jul 24, 2024 · 5 comments

Comments

@pderaaij
Collaborator

Describe the bug

Various users reported performance issues with large notes or projects with many notes. This issue serves as a collection of those reports. It acts as the documentation of ongoing research on performance.

Small Reproducible Example

No response

Steps to Reproduce the Bug or Issue

..

Expected behavior

We want to optimise the performance of Foam, even in large projects.

Screenshots or Videos

No response

Operating System Version

All OS

Visual Studio Code Version

Latest at least

Additional context

No response

@pderaaij
Collaborator Author

For now, I am doing some tests and research with https://github.com/github/docs, a relatively large project with many markdown files.

@pderaaij
Collaborator Author

Just came across https://github.com/rcvd/interconnected-markdown/tree/main/Markdown. It not only contains many notes, but the notes are also highly linked. Great for researching the performance of Foam.

@pderaaij
Collaborator Author

Some initial investigation points to the function listByIdentifier in workspace.ts. This function is used both in find and in wikilink-diagnostics.ts.

The function is defined as:

  public listByIdentifier(identifier: string): Resource[] {
    const needle = normalize('/' + identifier);
    const mdNeedle =
      getExtension(needle) !== this.defaultExtension
        ? needle + this.defaultExtension
        : undefined;
    const resources: Resource[] = [];
    for (const key of this._resources.keys()) {
      if (key.endsWith(mdNeedle) || key.endsWith(needle)) {
        resources.push(this._resources.get(normalize(key)));
      }
    }
    return resources.sort(Resource.sortByPath);
  }

For my test repo, this._resources is a Map of 10,000 entries. This function is called every time a wikilink is processed, both on boot and on graph update. Each call scans every key in the Map, so the total work grows with the number of links multiplied by the number of notes. I suspect the for loop is too inefficient for large projects, whether they have many notes or many links.

I will do some experiments with optimising the for loop in this area and see if that boosts performance.
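One possible direction for that experiment, as a minimal sketch and not Foam's actual code: keep a secondary index from the last path segment to the full paths that end in it, so a lookup only inspects a small bucket instead of all keys. The class name BasenameIndex and its methods are hypothetical. The sketch models resources as plain path strings, skips the normalize call, and approximates the getExtension check with endsWith.

```typescript
// Hypothetical helper, not part of Foam: indexes paths by their last segment
// so that a suffix lookup does not have to scan every stored path.
class BasenameIndex {
  // last path segment -> all full paths ending in that segment
  private byBasename = new Map<string, Set<string>>();
  private defaultExtension: string;

  constructor(defaultExtension: string = '.md') {
    this.defaultExtension = defaultExtension;
  }

  private static basename(path: string): string {
    return path.slice(path.lastIndexOf('/') + 1);
  }

  add(path: string): void {
    const base = BasenameIndex.basename(path);
    let bucket = this.byBasename.get(base);
    if (!bucket) {
      bucket = new Set<string>();
      this.byBasename.set(base, bucket);
    }
    bucket.add(path);
  }

  delete(path: string): void {
    const base = BasenameIndex.basename(path);
    const bucket = this.byBasename.get(base);
    if (!bucket) return;
    bucket.delete(path);
    if (bucket.size === 0) this.byBasename.delete(base);
  }

  // Same matching rule as the loop above (path ends with the needle, with or
  // without the default extension), but only within the matching buckets.
  listByIdentifier(identifier: string): string[] {
    const needle = '/' + identifier;
    const needles = needle.endsWith(this.defaultExtension)
      ? [needle]
      : [needle, needle + this.defaultExtension];
    const found = new Set<string>();
    for (const n of needles) {
      const bucket = this.byBasename.get(BasenameIndex.basename(n));
      if (!bucket) continue;
      bucket.forEach(path => {
        if (path.endsWith(n)) found.add(path);
      });
    }
    return Array.from(found).sort();
  }
}

// Usage: the index has to be kept in sync wherever resources are added or removed.
const index = new BasenameIndex();
index.add('/notes/a/foo.md');
index.add('/notes/b/foo.md');
index.add('/notes/a/bar.md');
console.log(index.listByIdentifier('a/foo'));
```

The trade-off is extra memory and the need to update the index on every add, rename, and delete. Whether this actually helps would have to be measured against the test repositories mentioned above.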

@DrakeWhu

I am interested in this. I have a graph of approximately 3k notes and 8k links. I also have some Python scripts that I use to run community detection on the graph, which yields around 50 to 100 or more communities. In the visualization, everything shows without problems.

The problem arises if I want to explore only one community or a couple of them. The community of each note is saved as its type, so I can toggle communities in the Foam viz. If I show only one community of around 50 notes, it is displayed, but then the physics break: the force-directed approach can't handle it for some reason. If I try to select two communities, the moment I select the second one, the physics break and the links disappear. I don't know what the reason might be, but maybe the hidden nodes are still loaded and the physics engine tries to calculate them anyway? Whatever it is, the more types exist in the graph, the worse the performance gets.

I've been thinking about solutions to this for some time and have come up with a couple:

  • There could be a command where you just plot a set of types.
  • A force-directed approach is not ideal for big networks. Asking the webview to calculate approximately 1600 forces each frame and apply the corresponding displacements is overkill. We could try a better-suited approach, such as clustering far-away nodes or something similar. Even getting rid of the physics altogether might be a solution, although they are always nice to have. Something like a statistical approach would make things smoother.
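The first suggestion could be sketched roughly as follows: reduce the graph to the selected types before it reaches the layout, so that hidden nodes cannot take part in the force calculation. The GraphNode, GraphLink, and GraphData shapes and the filterByTypes function are assumptions made for illustration, not Foam's actual data model, and whether hidden nodes really are the cause of the broken physics is still unverified.

```typescript
// Hypothetical graph shapes; Foam's real data model may differ.
interface GraphNode {
  id: string;
  type: string;
}

interface GraphLink {
  source: string;
  target: string;
}

interface GraphData {
  nodes: GraphNode[];
  links: GraphLink[];
}

// Returns a new graph holding only nodes of the selected types and the links
// whose two endpoints both survive the filter.
function filterByTypes(graph: GraphData, types: ReadonlySet<string>): GraphData {
  const nodes = graph.nodes.filter(node => types.has(node.type));
  const visibleIds = new Set(nodes.map(node => node.id));
  const links = graph.links.filter(
    link => visibleIds.has(link.source) && visibleIds.has(link.target)
  );
  return { nodes, links };
}

// Usage: plot a single community (stored as the note type) on its own.
const fullGraph: GraphData = {
  nodes: [
    { id: 'a', type: 'community-1' },
    { id: 'b', type: 'community-1' },
    { id: 'c', type: 'community-2' },
  ],
  links: [
    { source: 'a', target: 'b' },
    { source: 'b', target: 'c' },
  ],
};
const subgraph = filterByTypes(fullGraph, new Set(['community-1']));
console.log(subgraph.nodes.length, subgraph.links.length);
```

Dropping links that cross the selection boundary is a design choice: it keeps the layout from referencing nodes it was never given, at the cost of hiding connections between communities.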

I can share some plots I've made, though they were made with Python. Not that we should change the viz, but the physics engine behind those plots is less realistic and aimed more at aesthetics than at physical realism. I know that D3.js has a lot of options for visualization and that Foam uses force-graph, which in turn uses D3, but I've never personally used it.

Anyway I will gladly help to research this.

@pderaaij
Collaborator Author

pderaaij commented Sep 4, 2024

I've been looking at the initial workspace loading time. At first sight, there is not much to be gained in this area. Most of the time is spent reading the files from the datastore. Perhaps the parser could eventually be made more efficient, but I don't see an opportunity here in the short term.
