Skip to content
This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

A proposal of an all_simple_paths function implementation #1540

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

i-aki-y
Copy link

@i-aki-y i-aki-y commented Feb 20, 2021

This PR is an implementation proposal for a function that searches all simple paths.
See also related issues in #1521

I'm not familiar with the coding manner of this project, so any feedback or suggestion is welcome.

I would appreciate it if you could check it out.

Overview

In this implementation, I followed the same strategy implemented in NetworkX (Python's Network Analysis package).
Here, I provide all_simple_paths() function that is intended to be a public interface.
This function returns an iterator that generates simple paths (that means no repeated nodes).
By the word of Iterator, I mean objects that comply with the iteration protocol discussed in this document https://docs.julialang.org/en/v1/manual/interfaces/ .

In this PR I provide the following iteration methods:

Required methods

  • iterate(iter)
  • iterate(iter, state)

Some important optional methods

  • collect
  • length

Note that the main logic of path search exists in the iterate() function.
The all_simple_paths() is intended to be the only entry point. Other struct and functions are intended to be used implicitly or internally.

Example

This is a simple example.

julia> using LightGraphs
julia> g = complete_graph(4)
julia> collect(all_simple_paths(g, 1, 4))
5-element Array{Array{Int64,1},1}:
 [1, 4]
 [1, 3, 4]
 [1, 3, 2, 4]
 [1, 2, 4]
 [1, 2, 3, 4]

Test cases

I picked up related test cases from NetowrkX's and rewrote them to fit the LightGraphs' usage.

Performance

I compared this implementation and the original NetworkX's implementation through PyCall package.
I observed approximately 10x faster results.
It seems not so bad, but when I apply the same NetworkX method in a python environment directory, I got 3-4x faster results than this implementation.
I have little experience with Julia performance tuning and have no idea the origin of the difference.
So, I am happy to hear any advice or idea to improve performance.

The followings are actual codes used for performance checks.

using LightGraphs
using PyCall

function countpath(iter)
    c = 0
    for p in iter
        c += 1
    end
    return c
end

niter = 10
graphsize = 10
g_jl = complete_graph(graphsize)
@time (for i in 1:niter; countpath(all_simple_paths(g_jl, 1, [graphsize-1])); end)

nx = pyimport_conda("networkx", "networkx")
g_py = nx.complete_graph(graphsize)

@time (for i in 1:niter; countpath(nx.algorithms.simple_paths.all_simple_paths(g_py, source=1, target=[graphsize-1])); end)

I got the following result.

  1.725490 seconds (16.08 M allocations: 1.772 GiB, 9.16% gc time)
 23.670451 seconds (44.40 M allocations: 1.560 GiB, 4.19% gc time)

The followings are python codes that can be run on IPython.

import networkx as nx

niter = 10
graphsize = 10
G = nx.complete_graph(graphsize)
%time len([p for p in nx.algorithms.simple_paths.all_simple_paths(G, 1, graphsize-1, niter)])

The result is followings.

CPU times: user 592 ms, sys: 8.43 ms, total: 601 ms
Wall time: 601 ms

Add a function that finds all simple paths between two nodes
in a graph.
@i-aki-y i-aki-y changed the title Add all_simple_paths function A proposal of an all_simple_paths function implementation Feb 20, 2021
@codecov
Copy link

codecov bot commented Feb 20, 2021

Codecov Report

Merging #1540 (2f13330) into master (05c80bb) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 2f13330 differs from pull request most recent head ead91c4. Consider uploading reports for the commit ead91c4 to get more accurate results

@@           Coverage Diff           @@
##           master    #1540   +/-   ##
=======================================
  Coverage   99.44%   99.44%           
=======================================
  Files         106      107    +1     
  Lines        5551     5604   +53     
=======================================
+ Hits         5520     5573   +53     
  Misses         31       31           

@lassepe
Copy link

lassepe commented May 6, 2021

I'd be interested in this feature. What is the state here? Are there plans to merge it? Or should this go into a standalone package?

@i-aki-y
Copy link
Author

i-aki-y commented May 9, 2021

@lassepe Thank you for your comment. I’m hoping that this PR will be merged.

@sbromberger If there are any problems or missing information for you or members to start your review of this PR, please feel free to point them out.

g::AbstractGraph
source::T # Starting node
targets::Set{T} # Target nodes
cutoff::Union{Int,Nothing} # Max length of resulting paths
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a union, keep this as T with the default as typmax(T).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I hadn't thought of that fix.

[1, 2, 3, 4]
```
"""
function all_simple_paths(g::AbstractGraph, source::T, targets::Vector{T}; cutoff::Union{Int,Nothing}=nothing) where T <: Integer
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

targets is a Vector here, but in the struct, it's a Set. Perhaps we should not type it here, and construct a Set out of it in these functions instead of in the inner constructor of the struct.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the typecasting from Structure to all_simple_paths function. If I misunderstood your point let me know.

SimplePathIterator's state.
"""
mutable struct SimplePathIteratorState{T <: Integer}
stack::Stack{Vector{T}} # Store child nodes
Copy link
Owner

@sbromberger sbromberger May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the struct O(|V|^2) in memory, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if the graph is dense and the path is long, the stack can grow O(|V|^2).

A helper function that updates iterator state.
For internal use only.
"""
function stepback!(state::SimplePathIteratorState)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's for internal use, perhaps prefix it with an underscore.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the function name: _stepback!

Copy link
Owner

@sbromberger sbromberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think we need to scrutinize the addition of these functions in LightGraphs base. The function will fail on large graphs of the type that LightGraphs is generally used to analyze and therefore functions like these have limited utility.

Our precedent has been to encourage the development of "expensive" algorithms outside the core LightGraphs code.

I'd like to get @jpfairbanks's and @simonschoelly 's opinions on this, though.

@jpfairbanks
Copy link
Contributor

I think that as long as we are iterating over the paths and not realizing them all in memory, it is a fine addition. I draw that distinction because users can CTRL-C interrupt things that take too long, but things that use too much space hit swap and make the system unresponsive. The docstring can note that the number of paths vastly exceeds the number edges and that they should be careful when using the function on even medium sized graphs.

@sbromberger
Copy link
Owner

I think that as long as we are iterating over the paths and not realizing them all in memory, it is a fine addition. I draw that distinction because users can CTRL-C interrupt things that take too long, but things that use too much space hit swap and make the system unresponsive.

I think this is a good compromise, but I note that this implementation appears to be |V|^2 + c|V| in memory. That seems to be a bit too much. On the other hand, we do have some functions that are |V|^2 (I avoid using these since they break on large graphs).

Do you have a specific recommendation with respect to including this code in base?

@jpfairbanks
Copy link
Contributor

I think including it with that complexity warning in the docstring is the right move. Since the algorithm can change without breaking the interface (when you iterate the iterator you get all the paths) then I think the memory use can get fixed if people use this and want to scale it up further. The set of paths is really big so I imagine that scaling won't be a priority unless someone's research needs it.

I am in favor of inclusion with appropriate warnings.

@i-aki-y
Copy link
Author

i-aki-y commented May 17, 2021

@sbromberger @jpfairbanks Many thanks for your comment and discussion!

I have fixed the code.

I think this is a good compromise, but I note that this implementation appears to be |V|^2 + c|V| in memory. That seems to be a bit too much. On the other hand, we do have some functions that are |V|^2 (I avoid using these since they break on large graphs).

It's just an idea at the moment, but I'm thinking that replacing the vectors in the stack with iterator state or something might improve the memory footprint. I will try this issue furthermore.

To improve memory effeciency, make the stack store only parent node and
index.
@i-aki-y
Copy link
Author

i-aki-y commented May 18, 2021

@sbromberger I rewrote the stack usage to improve memory efficiency.
Now, each vector of Stack{Vector{T}} has only two elements (parent node and index of children).
I still use a vector as a stack element to make the element mutable.
I think the total memory usage will be O(|V|).

@Shmuma
Copy link

Shmuma commented Aug 21, 2021

@sbromberger any news about merging this functionality into LightGraphs? It still could be very useful for small graphs. For example, recently I started to rewrite Dagitty R package for causal graph analisys to Julia and there are lots of dependencies here on all paths enumeration. It would be great to have this logic in LightGraphs instead of reimplementing the wheel in other packages.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants