support heterogenous fanout type #4608

Open
wants to merge 70 commits into
base: branch-24.10
0adb2fd
support heterogenous fanout type
jnke2016 Aug 13, 2024
bb5a3e2
remove unusued code
jnke2016 Aug 13, 2024
10fa86d
fix style
jnke2016 Aug 13, 2024
f904350
create one API for both uniform and biased neighborhood sampling
jnke2016 Aug 20, 2024
1fc32c3
use the same function for both uniform and biased nieghborhood sampling
jnke2016 Aug 20, 2024
8fc21f8
add support for heterogenous fanout support at the plc layer and cons…
jnke2016 Aug 20, 2024
01a57f3
remove outdated codes
jnke2016 Aug 20, 2024
3a6aeb2
add flag differentiating between biased and uniform sampling
jnke2016 Aug 21, 2024
d2f6467
update docstrings and rename variable
jnke2016 Aug 21, 2024
5d25155
rename variable
jnke2016 Aug 21, 2024
80f8b86
create new tuple type
jnke2016 Aug 21, 2024
50e0fc5
remove unnecessary check
jnke2016 Aug 21, 2024
9f455bf
add constructor converting from array_view_t to array_t
jnke2016 Aug 21, 2024
d114534
leverage new constructor and remove unnecessary code
jnke2016 Aug 21, 2024
cf4a3ae
ensure edge types are ordered in increasing order
jnke2016 Aug 21, 2024
bc87b50
update docstrings
jnke2016 Aug 21, 2024
3013684
update docstrings
jnke2016 Aug 21, 2024
d6b6234
undo changes to uniform neighbor sample
jnke2016 Aug 22, 2024
068b0a3
undo changes to uniform neighbor sample
jnke2016 Aug 22, 2024
6920f65
update docstrings
jnke2016 Aug 22, 2024
760c5cd
re-order arguments
jnke2016 Aug 22, 2024
1e0ef27
remove outdated comments
jnke2016 Aug 22, 2024
de79620
add arguments and type check
jnke2016 Aug 23, 2024
8c17009
rename variable for consistency
jnke2016 Aug 23, 2024
7b95c5e
update neighbor sample API
jnke2016 Aug 30, 2024
19fc765
remove outdated code
jnke2016 Aug 30, 2024
e30766c
remove outdated comment
jnke2016 Aug 30, 2024
5dd66f2
first cut at new sampling function definition to clean up things befo…
ChuckHastings Sep 4, 2024
4b2764c
updates to remove builder pattern, also rename functions and mark old…
ChuckHastings Sep 5, 2024
4c1c610
add implementation of heterogeneous neighborhood sampling
jnke2016 Sep 9, 2024
fe35c80
add exit condition
jnke2016 Sep 9, 2024
a658b29
remove comments
jnke2016 Sep 10, 2024
e52a38a
Add Implementation
ChuckHastings Sep 11, 2024
c416439
call heterogeneous renumbering
jnke2016 Sep 13, 2024
98d6c57
update branch and call heterogneous renumbering
jnke2016 Sep 13, 2024
d7165af
update heterogeneous renumbering call
jnke2016 Sep 17, 2024
579fd0a
create a csr data structure to efficiently store vertex and label
jnke2016 Sep 17, 2024
5cdf40a
update API and docstring
jnke2016 Sep 17, 2024
a8fbd9d
remove unsued variable
jnke2016 Sep 17, 2024
9d5b3dd
update C++ API for neighbor sampling
jnke2016 Sep 20, 2024
0358c6e
add fixme for deprecated flags
jnke2016 Sep 20, 2024
799c35d
update CAPI
jnke2016 Sep 20, 2024
ab8aa72
undo changes to k-truss
jnke2016 Sep 21, 2024
7d8b5ad
undo changes to tests
jnke2016 Sep 21, 2024
f2190ba
clean up code
jnke2016 Sep 21, 2024
1e96dcf
update docs
jnke2016 Sep 23, 2024
36c25ad
fix typo
jnke2016 Sep 23, 2024
4857b36
call scatter instead of gather and fix type bug
jnke2016 Sep 23, 2024
263b6ac
fix typo
jnke2016 Sep 23, 2024
9dff3ab
update neighbor sample API
jnke2016 Sep 24, 2024
33c8b3d
update CAPI
jnke2016 Sep 25, 2024
e357f42
remove unsued code
jnke2016 Sep 25, 2024
6081978
remove outdated comment
jnke2016 Sep 25, 2024
73b3ffe
remove unnecessary copy
jnke2016 Sep 25, 2024
ea972f3
remove outdate arguments
jnke2016 Sep 26, 2024
8822192
fix typo
jnke2016 Sep 27, 2024
e02a513
update plc API of heterogeneous neighbor sample
jnke2016 Sep 27, 2024
d6cb1d5
fix typo
jnke2016 Sep 27, 2024
54fa155
change back the fanout type from a sparse to a dense structure
jnke2016 Sep 27, 2024
499e041
fix typo
jnke2016 Sep 27, 2024
b571deb
add implementation of heterogeneous/homogeneous biased/uniform neighb…
jnke2016 Sep 27, 2024
f6c4ce3
properly handle edge types
jnke2016 Sep 27, 2024
e71660d
add tests for 'homogeneous_uniform_neighbor_sampling'
jnke2016 Sep 27, 2024
4e2c8cf
add tests for homogeneous_biased_neighbor_sampling.cpp
jnke2016 Sep 27, 2024
2458149
update type combination
jnke2016 Sep 27, 2024
df3e4ff
add tests for heterogeneous uniform/biased neighborhood sampling
jnke2016 Sep 28, 2024
d4847e4
properly sample with edge types
jnke2016 Sep 28, 2024
dc2c9ba
remove outdated tests
jnke2016 Sep 28, 2024
c01f4e4
add SG python implementation of neighborhood sampling both homogeneou…
jnke2016 Sep 30, 2024
dabd0c8
remove unused argument
jnke2016 Sep 30, 2024
13 changes: 7 additions & 6 deletions cpp/CMakeLists.txt
@@ -188,6 +188,7 @@ set(CUGRAPH_SOURCES
src/detail/groupby_and_count_mg_v64_e64.cu
src/detail/collect_comm_wrapper_mg_v32_e32.cu
src/detail/collect_comm_wrapper_mg_v64_e64.cu
src/sampling/detail/conversion_utilities.cu
src/sampling/random_walks_mg_v64_e64.cu
src/sampling/random_walks_mg_v32_e32.cu
src/sampling/random_walks_mg_v32_e64.cu
@@ -326,12 +327,12 @@ set(CUGRAPH_SOURCES
src/sampling/detail/shuffle_and_organize_output_mg_v64_e64.cu
src/sampling/detail/shuffle_and_organize_output_mg_v32_e32.cu
src/sampling/detail/shuffle_and_organize_output_mg_v32_e64.cu
- src/sampling/neighbor_sampling_mg_v32_e64.cpp
- src/sampling/neighbor_sampling_mg_v32_e32.cpp
- src/sampling/neighbor_sampling_mg_v64_e64.cpp
- src/sampling/neighbor_sampling_sg_v32_e64.cpp
- src/sampling/neighbor_sampling_sg_v32_e32.cpp
- src/sampling/neighbor_sampling_sg_v64_e64.cpp
+ src/sampling/neighbor_sampling_mg_v32_e64.cu
+ src/sampling/neighbor_sampling_mg_v32_e32.cu
+ src/sampling/neighbor_sampling_mg_v64_e64.cu
+ src/sampling/neighbor_sampling_sg_v32_e64.cu
+ src/sampling/neighbor_sampling_sg_v32_e32.cu
+ src/sampling/neighbor_sampling_sg_v64_e64.cu
src/sampling/negative_sampling_sg_v32_e64.cu
src/sampling/negative_sampling_sg_v32_e32.cu
src/sampling/negative_sampling_sg_v64_e64.cu
23 changes: 23 additions & 0 deletions cpp/include/cugraph/detail/shuffle_wrappers.hpp
@@ -142,6 +142,29 @@ shuffle_ext_vertex_value_pairs_to_local_gpu_by_vertex_partitioning(
rmm::device_uvector<vertex_t>&& vertices,
rmm::device_uvector<value_t>&& values);

/**
* @brief Shuffle external (i.e. before renumbering) vertices and their two associated value
* arrays to their local GPU based on vertex partitioning.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam value0_t Type of the first values.
* @tparam value1_t Type of the second values.
*
* @param[in] handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator,
* and handles to various CUDA libraries) to run graph algorithms.
* @param[in] vertices Vertices to shuffle.
* @param[in] values_0 First values to shuffle.
* @param[in] values_1 Second values to shuffle.
*
* @return Tuple of vectors storing the shuffled vertices, first values, and second values.
*/
template <typename vertex_t, typename value0_t, typename value1_t>
std::tuple<rmm::device_uvector<vertex_t>, rmm::device_uvector<value0_t>, rmm::device_uvector<value1_t>>
shuffle_ext_vertex_values_pairs_to_local_gpu_by_vertex_partitioning(
raft::handle_t const& handle,
rmm::device_uvector<vertex_t>&& vertices,
rmm::device_uvector<value0_t>&& values_0,
rmm::device_uvector<value1_t>&& values_1);
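A minimal host-side sketch of what this shuffle does conceptually: each vertex travels together with both of its values to the rank that owns the vertex. Everything here is illustrative and assumed, not the library's implementation: `triple_buffer_t`, `owner_rank`, and `shuffle_triples_by_owner` are hypothetical names, `std::vector` stands in for `rmm::device_uvector`, and the modulo ownership rule is a placeholder for the real vertex partitioning.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side stand-in for one rank's share of (vertex, value_0, value_1) triples.
template <typename vertex_t, typename value0_t, typename value1_t>
struct triple_buffer_t {
  std::vector<vertex_t> vertices;
  std::vector<value0_t> values_0;
  std::vector<value1_t> values_1;
};

// Hypothetical ownership rule: vertex v belongs to rank (v % num_ranks).
// Assumes non-negative vertex IDs; the real partitioning is defined by the library.
template <typename vertex_t>
std::size_t owner_rank(vertex_t v, std::size_t num_ranks)
{
  return static_cast<std::size_t>(v) % num_ranks;
}

// Route every triple to the buffer of the rank that owns its vertex,
// keeping the three arrays aligned element by element.
template <typename vertex_t, typename value0_t, typename value1_t>
std::vector<triple_buffer_t<vertex_t, value0_t, value1_t>> shuffle_triples_by_owner(
  std::vector<vertex_t> const& vertices,
  std::vector<value0_t> const& values_0,
  std::vector<value1_t> const& values_1,
  std::size_t num_ranks)
{
  assert(num_ranks > 0);
  assert(vertices.size() == values_0.size() && vertices.size() == values_1.size());

  std::vector<triple_buffer_t<vertex_t, value0_t, value1_t>> per_rank(num_ranks);
  for (std::size_t i = 0; i < vertices.size(); ++i) {
    auto& dst = per_rank[owner_rank(vertices[i], num_ranks)];
    dst.vertices.push_back(vertices[i]);
    dst.values_0.push_back(values_0[i]);
    dst.values_1.push_back(values_1[i]);
  }
  return per_rank;
}
```

The point of the sketch is the invariant: after the shuffle, index `i` in all three output arrays of a rank still refers to the same input triple.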

/**
* @brief Permute a range.
*
16 changes: 16 additions & 0 deletions cpp/include/cugraph/detail/utility_wrappers.hpp
@@ -65,6 +65,22 @@ void uniform_random_fill(rmm::cuda_stream_view const& stream_view,
template <typename value_t>
void scalar_fill(raft::handle_t const& handle, value_t* d_value, size_t size, value_t value);


/**
* @brief Increment the values of a buffer by a constant value
*
* @tparam value_t type of the value to operate on
*
* @param[in] handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator,
* and handles to various CUDA libraries) to run graph algorithms.
* @param[in,out] d_value device array to update in place
* @param[in] size number of elements in array
* @param[in] value value to be added to each element of the buffer
*
*/
template <typename value_t>
void transform_increment(raft::handle_t const& handle, value_t* d_value, size_t size, size_t value);
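For reference, a host-side analogue of the in-place increment described above. This is only a sketch under assumptions: `increment_in_place` is a hypothetical name, it runs on the CPU with `std::transform` rather than on the device, and it takes the increment as `value_t` instead of `size_t`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Add `value` to each of the `size` elements starting at `data`, in place.
template <typename value_t>
void increment_in_place(value_t* data, std::size_t size, value_t value)
{
  assert(data != nullptr || size == 0);
  std::transform(data, data + size, data, [value](value_t x) { return x + value; });
}
```

Calling `increment_in_place(buf, 3, 10)` on `{1, 2, 3}` leaves `{11, 12, 13}` in `buf`.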

/**
* @brief Fill a buffer with a sequence of values
*
376 changes: 374 additions & 2 deletions cpp/include/cugraph/sampling_functions.hpp

Large diffs are not rendered by default.

211 changes: 206 additions & 5 deletions cpp/include/cugraph_c/sampling_algorithms.h
@@ -199,6 +199,13 @@ typedef struct {
int32_t align_;
} cugraph_sampling_options_t;

/**
* @brief Opaque sampling flags type
*/
typedef struct {
int32_t align_;
} sampling_flags_t;

/**
* @brief Enumeration for prior sources behavior
*/
@@ -320,8 +327,15 @@ void cugraph_sampling_set_dedupe_sources(cugraph_sampling_options_t* options, bo
*/
void cugraph_sampling_options_free(cugraph_sampling_options_t* options);

/**
* @brief Opaque neighborhood sampling heterogeneous fan_out type
*/


/**
* @brief Uniform Neighborhood Sampling
*
* @deprecated This API will be deleted, use cugraph_homogeneous_uniform_neighbor_sample
*
* Returns a sample of the neighborhood around specified start vertices. Optionally, each
* start vertex can be associated with a label, allowing the caller to specify multiple batches
@@ -348,8 +362,8 @@ void cugraph_sampling_options_free(cugraph_sampling_options_t* options);
* label_to_comm_rank[i]. If not specified then the output data will not be shuffled between ranks.
* @param [in] label_offsets Device array of the offsets for each label in the seed list. This
* parameter is only used with the retain_seeds option.
- * @param [in] fanout Host array defining the fan out at each step in the sampling algorithm.
- * We only support fanout values of type INT32
+ * @param [in] fan_out Host array defining the fan out at each step in the sampling algorithm.
+ * We only support fan_out values of type INT32
* @param [in,out] rng_state State of the random number generator, updated with each call
* @param [in] sampling_options
* Opaque pointer defining the sampling options.
@@ -377,7 +391,9 @@ cugraph_error_code_t cugraph_uniform_neighbor_sample(

/**
* @brief Biased Neighborhood Sampling
*
*
* @deprecated This API will be deleted, use cugraph_homogeneous_biased_neighbor_sample.
*
* Returns a sample of the neighborhood around specified start vertices. Optionally, each
* start vertex can be associated with a label, allowing the caller to specify multiple batches
* of sampling requests in the same function call - which should improve GPU utilization.
@@ -406,8 +422,8 @@ cugraph_error_code_t cugraph_uniform_neighbor_sample(
* label_to_comm_rank[i]. If not specified then the output data will not be shuffled between ranks.
* @param [in] label_offsets Device array of the offsets for each label in the seed list. This
* parameter is only used with the retain_seeds option.
- * @param [in] fanout Host array defining the fan out at each step in the sampling algorithm.
- * We only support fanout values of type INT32
+ * @param [in] fan_out Host array defining the fan out at each step in the sampling algorithm.
+ * We only support fan_out values of type INT32
* @param [in,out] rng_state State of the random number generator, updated with each call
* @param [in] sampling_options
* Opaque pointer defining the sampling options.
@@ -434,6 +450,190 @@ cugraph_error_code_t cugraph_biased_neighbor_sample(
cugraph_sample_result_t** result,
cugraph_error_t** error);

/**
* @brief Heterogeneous Uniform Neighborhood Sampling
*
* Returns a sample of the neighborhood around specified start vertices and heterogeneous
* fan_out types. The neighborhood is sampled uniformly.
* Optionally, each start vertex can be associated with a label, allowing the caller to specify
* multiple batches of sampling requests in the same function call - which should improve GPU
* utilization.
*
* If label is NULL then all start vertices will be considered part of the same batch and the
* return value will not have a label column.
*
* @param [in] handle Handle for accessing resources
* @param [in,out] rng_state State of the random number generator, updated with each call
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
* needs to be transposed
* @param [in] start_vertices Device array of start vertices for the sampling
* @param [in] start_vertex_offsets Device array of the offsets for each label in the seed list. This
* parameter is only used with the retain_seeds option.
* @param [in] fan_out Host array defining the fan out at each step in the sampling algorithm.
* We only support fan_out values of type INT32
* @param [in] num_edge_types Number of edge types where a value of 1 translates to homogeneous neighbor
* sample whereas a value greater than 1 translates to heterogeneous neighbor
* sample.
* @param [in] options
* Opaque pointer defining the sampling options.
* @param [in] do_expensive_check
* A flag to run expensive checks for input arguments (if set to true)
* @param [out] result Output from the heterogeneous uniform neighbor sample call
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_heterogeneous_uniform_neighbor_sample(
const cugraph_resource_handle_t* handle,
cugraph_rng_state_t* rng_state,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* start_vertices,
const cugraph_type_erased_device_array_view_t* start_vertex_offsets,
const cugraph_type_erased_host_array_view_t* fan_out,
int num_edge_types,
const cugraph_sampling_options_t* options,
bool_t do_expensive_check,
cugraph_sample_result_t** result,
cugraph_error_t** error);
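The header text does not spell out how a single `fan_out` host array encodes one value per hop and per edge type. The sketch below assumes a dense, hop-major layout, where the entry for hop `h` and edge type `t` sits at index `h * num_edge_types + t`. That layout is an assumption made for illustration, and `num_hops` and `fan_out_at` are hypothetical helper names that are not part of the C API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout (not confirmed by the header text): all edge types of hop 0
// come first, then all edge types of hop 1, and so on.
inline std::size_t num_hops(std::vector<std::int32_t> const& fan_out, int num_edge_types)
{
  assert(num_edge_types >= 1);
  assert(fan_out.size() % static_cast<std::size_t>(num_edge_types) == 0);
  return fan_out.size() / static_cast<std::size_t>(num_edge_types);
}

// Look up the fan-out for a given hop and edge type under the assumed layout.
inline std::int32_t fan_out_at(std::vector<std::int32_t> const& fan_out,
                               int num_edge_types,
                               std::size_t hop,
                               int edge_type)
{
  assert(edge_type >= 0 && edge_type < num_edge_types);
  assert(hop < num_hops(fan_out, num_edge_types));
  return fan_out[hop * static_cast<std::size_t>(num_edge_types) +
                 static_cast<std::size_t>(edge_type)];
}
```

With `num_edge_types == 1` the same array degenerates to one entry per hop, which matches the parameter description of a value of 1 meaning homogeneous sampling.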


/**
* @brief Heterogeneous Biased Neighborhood Sampling
*
* Returns a sample of the neighborhood around specified start vertices and heterogeneous
* fan_out types. The neighborhood is sampled with biases.
* Optionally, each start vertex can be associated with a label, allowing the caller to specify
* multiple batches of sampling requests in the same function call - which should improve GPU
* utilization.
*
* If label is NULL then all start vertices will be considered part of the same batch and the
* return value will not have a label column.
*
* @param [in] handle Handle for accessing resources
* @param [in,out] rng_state State of the random number generator, updated with each call
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
* needs to be transposed
* @param [in] edge_biases Device array of edge biases to use for sampling. If NULL
* use the edge weight as the bias. If set to NULL, edges will be sampled uniformly.
* @param [in] start_vertices Device array of start vertices for the sampling
* @param [in] start_vertex_offsets Device array of the offsets for each label in the seed list. This
* parameter is only used with the retain_seeds option.
* @param [in] fan_out Host array defining the fan out at each step in the sampling algorithm.
* We only support fan_out values of type INT32
* @param [in] num_edge_types Number of edge types where a value of 1 translates to homogeneous neighbor
* sample whereas a value greater than 1 translates to heterogeneous neighbor
* sample.
* @param [in] options
* Opaque pointer defining the sampling options.
* @param [in] do_expensive_check
* A flag to run expensive checks for input arguments (if set to true)
* @param [out] result Output from the heterogeneous biased neighbor sample call
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_heterogeneous_biased_neighbor_sample(
const cugraph_resource_handle_t* handle,
cugraph_rng_state_t* rng_state,
cugraph_graph_t* graph,
const cugraph_edge_property_view_t* edge_biases,
const cugraph_type_erased_device_array_view_t* start_vertices,
const cugraph_type_erased_device_array_view_t* start_vertex_offsets,
const cugraph_type_erased_host_array_view_t* fan_out,
int num_edge_types,
const cugraph_sampling_options_t* options,
bool_t do_expensive_check,
cugraph_sample_result_t** result,
cugraph_error_t** error);
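The `start_vertex_offsets` parameter is described as the offsets for each label in the seed list. A small sketch of how such an offsets array maps back to one label per seed, assuming a CSR-style array with `num_labels + 1` entries that starts at 0 (an assumption, since the header does not state the exact shape); `expand_offsets_to_labels` is a hypothetical helper and works on host vectors rather than device arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Seeds in [offsets[l], offsets[l + 1]) belong to label l.
// Returns one label per seed, in seed order.
std::vector<std::int32_t> expand_offsets_to_labels(std::vector<std::size_t> const& offsets)
{
  assert(!offsets.empty() && offsets.front() == 0);
  assert(std::is_sorted(offsets.begin(), offsets.end()));

  std::vector<std::int32_t> labels;
  labels.reserve(offsets.back());
  for (std::size_t label = 0; label + 1 < offsets.size(); ++label) {
    std::size_t const count = offsets[label + 1] - offsets[label];
    labels.insert(labels.end(), count, static_cast<std::int32_t>(label));
  }
  return labels;
}
```

For offsets `{0, 2, 5}` this yields labels `{0, 0, 1, 1, 1}`: the first two seeds form batch 0 and the next three form batch 1.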

/**
* @brief Homogeneous Uniform Neighborhood Sampling
*
* Returns a sample of the neighborhood around specified start vertices, using one fan_out value
* per hop for all edge types. The neighborhood is sampled uniformly.
* Optionally, each start vertex can be associated with a label, allowing the caller to specify
* multiple batches of sampling requests in the same function call - which should improve GPU
* utilization.
*
* If label is NULL then all start vertices will be considered part of the same batch and the
* return value will not have a label column.
*
* @param [in] handle Handle for accessing resources
* @param [in,out] rng_state State of the random number generator, updated with each call
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
* needs to be transposed
* @param [in] start_vertices Device array of start vertices for the sampling
* @param [in] start_vertex_offsets Device array of the offsets for each label in the seed list. This
* parameter is only used with the retain_seeds option.
* @param [in] fan_out Host array defining the fan out at each step in the sampling algorithm.
* We only support fan_out values of type INT32
* @param [in] options
* Opaque pointer defining the sampling options.
* @param [in] do_expensive_check
* A flag to run expensive checks for input arguments (if set to true)
* @param [out] result Output from the homogeneous uniform neighbor sample call
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_homogeneous_uniform_neighbor_sample(
const cugraph_resource_handle_t* handle,
cugraph_rng_state_t* rng_state,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* start_vertices,
const cugraph_type_erased_device_array_view_t* start_vertex_offsets,
const cugraph_type_erased_host_array_view_t* fan_out,
const cugraph_sampling_options_t* options,
bool_t do_expensive_check,
cugraph_sample_result_t** result,
cugraph_error_t** error);
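To make the role of `fan_out` concrete, here is a minimal CPU sketch of multi-hop uniform neighbor sampling on a CSR graph. It is not the cuGraph implementation: `csr_graph_t`, `sample_neighbors_uniform`, and `sample_uniform` are hypothetical names, sampling is done without replacement, and only non-negative fan-out values are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical host-side CSR graph: neighbors of v are indices[offsets[v] .. offsets[v + 1]).
struct csr_graph_t {
  std::vector<std::size_t> offsets;
  std::vector<int> indices;
};

// Pick up to `k` distinct neighbors of `v` uniformly at random (no replacement).
std::vector<int> sample_neighbors_uniform(csr_graph_t const& g,
                                          int v,
                                          std::size_t k,
                                          std::mt19937& rng)
{
  assert(v >= 0 && static_cast<std::size_t>(v) + 1 < g.offsets.size());
  std::vector<int> picked(g.indices.begin() + static_cast<std::ptrdiff_t>(g.offsets[v]),
                          g.indices.begin() + static_cast<std::ptrdiff_t>(g.offsets[v + 1]));
  std::shuffle(picked.begin(), picked.end(), rng);
  if (picked.size() > k) { picked.resize(k); }
  return picked;
}

// One entry of `fan_out` per hop; the destinations sampled at hop h seed hop h + 1.
// Returns the sampled edges as (source, destination) pairs.
std::vector<std::pair<int, int>> sample_uniform(csr_graph_t const& g,
                                                std::vector<int> const& start_vertices,
                                                std::vector<std::int32_t> const& fan_out,
                                                std::mt19937& rng)
{
  std::vector<std::pair<int, int>> edges;
  std::vector<int> frontier = start_vertices;
  for (std::int32_t k : fan_out) {
    assert(k >= 0);  // this sketch only handles non-negative fan-out values
    std::vector<int> next;
    for (int src : frontier) {
      for (int dst : sample_neighbors_uniform(g, src, static_cast<std::size_t>(k), rng)) {
        edges.emplace_back(src, dst);
        next.push_back(dst);
      }
    }
    frontier = std::move(next);
  }
  return edges;
}
```

The sketch leaves out everything the sampling options control in the real API (for example source deduplication), so it should be read only as an illustration of how the per-hop fan-out bounds the number of sampled edges.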

/**
* @brief Homogeneous Biased Neighborhood Sampling
*
* Returns a sample of the neighborhood around specified start vertices, using one fan_out value
* per hop for all edge types. The neighborhood is sampled with biases.
* Optionally, each start vertex can be associated with a label, allowing the caller to specify
* multiple batches of sampling requests in the same function call - which should improve GPU
* utilization.
*
* If label is NULL then all start vertices will be considered part of the same batch and the
* return value will not have a label column.
*
* @param [in] handle Handle for accessing resources
* @param [in,out] rng_state State of the random number generator, updated with each call
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
* needs to be transposed
* @param [in] edge_biases Device array of edge biases to use for sampling. If NULL
* use the edge weight as the bias. If set to NULL, edges will be sampled uniformly.
* @param [in] start_vertices Device array of start vertices for the sampling
* @param [in] start_vertex_offsets Device array of the offsets for each label in the seed list. This
* parameter is only used with the retain_seeds option.
* @param [in] fan_out Host array defining the fan out at each step in the sampling algorithm.
* We only support fan_out values of type INT32
* @param [in] options
* Opaque pointer defining the sampling options.
* @param [in] do_expensive_check
* A flag to run expensive checks for input arguments (if set to true)
* @param [out] result Output from the homogeneous biased neighbor sample call
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_homogeneous_biased_neighbor_sample(
const cugraph_resource_handle_t* handle,
cugraph_rng_state_t* rng_state,
cugraph_graph_t* graph,
const cugraph_edge_property_view_t* edge_biases,
const cugraph_type_erased_device_array_view_t* start_vertices,
const cugraph_type_erased_device_array_view_t* start_vertex_offsets,
const cugraph_type_erased_host_array_view_t* fan_out,
const cugraph_sampling_options_t* options,
bool_t do_expensive_check,
cugraph_sample_result_t** result,
cugraph_error_t** error);
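For contrast with uniform sampling, a minimal CPU sketch of biased selection for a single vertex: each outgoing edge is chosen with probability proportional to its bias. The names `biased_csr_graph_t` and `sample_neighbors_biased` are hypothetical, the sketch samples with replacement via `std::discrete_distribution`, and it assumes at least one strictly positive bias per sampled vertex; none of this is taken from the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical host-side CSR graph with one non-negative bias per edge.
struct biased_csr_graph_t {
  std::vector<std::size_t> offsets;
  std::vector<int> indices;
  std::vector<double> biases;
};

// Draw `k` neighbors of `v` with replacement; edge i is picked with probability
// biases[i] / (sum of the biases of all edges leaving v).
std::vector<int> sample_neighbors_biased(biased_csr_graph_t const& g,
                                         int v,
                                         std::size_t k,
                                         std::mt19937& rng)
{
  assert(g.biases.size() == g.indices.size());
  assert(v >= 0 && static_cast<std::size_t>(v) + 1 < g.offsets.size());

  std::size_t const first = g.offsets[v];
  std::size_t const last  = g.offsets[v + 1];
  std::vector<int> picked;
  if (first == last) { return picked; }  // no outgoing edges, nothing to sample

  auto const bias_first = g.biases.begin() + static_cast<std::ptrdiff_t>(first);
  auto const bias_last  = g.biases.begin() + static_cast<std::ptrdiff_t>(last);
  assert(std::accumulate(bias_first, bias_last, 0.0) > 0.0);

  std::discrete_distribution<std::size_t> pick(bias_first, bias_last);
  for (std::size_t i = 0; i < k; ++i) {
    picked.push_back(g.indices[first + pick(rng)]);
  }
  return picked;
}
```

With biases `{3.0, 1.0, 0.0}` on the three edges of a vertex, the first neighbor is drawn about three times as often as the second, and the third is effectively excluded.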


/**
* @deprecated This call should be replaced with cugraph_sample_result_get_majors
* @brief Get the source vertices from the sampling algorithm result
@@ -668,6 +868,7 @@ cugraph_error_code_t cugraph_test_uniform_neighborhood_sample_result_create(
* not CUGRAPH_SUCCESS
* @return error code
*/

cugraph_error_code_t cugraph_select_random_vertices(const cugraph_resource_handle_t* handle,
const cugraph_graph_t* graph,
cugraph_rng_state_t* rng_state,
21 changes: 21 additions & 0 deletions cpp/src/c_api/array.hpp
@@ -125,6 +125,27 @@ struct cugraph_type_erased_host_array_t {
std::copy(vec.begin(), vec.end(), reinterpret_cast<T*>(data_.get()));
}

cugraph_type_erased_host_array_t(cugraph_type_erased_host_array_view_t const* view_p)
: data_(std::make_unique<std::byte[]>(view_p->num_bytes_)),
size_(view_p->size_),
num_bytes_(view_p->num_bytes_),
type_(view_p->type_)
{
std::copy(view_p->data_, view_p->data_ + num_bytes_, data_.get());
}

template <typename T>
T* as_type()
{
return reinterpret_cast<T*>(data_.get());
}

template <typename T>
T const* as_type() const
{
return reinterpret_cast<T const*>(data_.get());
}
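The pattern added above (an owning, type-erased byte buffer built from a non-owning view, plus typed accessors) can be sketched in isolation as follows. `host_array_view_t` and `host_array_t` are simplified, hypothetical stand-ins for the C API types; the type tag is omitted and the view is taken by reference rather than by pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Non-owning view over raw bytes (simplified stand-in, no type tag).
struct host_array_view_t {
  std::byte* data_;
  std::size_t size_;       // number of elements
  std::size_t num_bytes_;  // size of the buffer in bytes
};

// Owning array that deep-copies the bytes referenced by a view.
struct host_array_t {
  std::unique_ptr<std::byte[]> data_;
  std::size_t size_;
  std::size_t num_bytes_;

  explicit host_array_t(host_array_view_t const& view)
    : data_(std::make_unique<std::byte[]>(view.num_bytes_)),
      size_(view.size_),
      num_bytes_(view.num_bytes_)
  {
    std::copy(view.data_, view.data_ + view.num_bytes_, data_.get());
  }

  // Typed access; the caller is responsible for choosing the matching T.
  template <typename T>
  T* as_type()
  {
    return reinterpret_cast<T*>(data_.get());
  }

  template <typename T>
  T const* as_type() const
  {
    return reinterpret_cast<T const*>(data_.get());
  }
};
```

Because the constructor copies the bytes, later changes to the buffer behind the view do not affect the owning array.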

auto view()
{
return new cugraph_type_erased_host_array_view_t{data_.get(), size_, num_bytes_, type_};
2 changes: 1 addition & 1 deletion cpp/src/c_api/graph_functions.cpp
@@ -84,7 +84,7 @@ struct create_vertex_pairs_functor : public cugraph::c_api::abstract_functor {
std::nullopt,
std::nullopt);
}

// FIXME: use std::tuple (template) instead.
result_ = new cugraph::c_api::cugraph_vertex_pairs_t{
new cugraph::c_api::cugraph_type_erased_device_array_t(first_copy, graph_->vertex_type_),
new cugraph::c_api::cugraph_type_erased_device_array_t(second_copy, graph_->vertex_type_)};