In Memory Graph Store #395

Merged
merged 58 commits into from
Aug 17, 2023
Changes from 56 commits
Commits
fabf2e7
inmem_graph_store initial impl
yashpatel007 Jun 21, 2023
211f010
Merge branch 'main' of https://github.com/microsoft/DiskANN into pate…
yashpatel007 Jul 18, 2023
80646ce
barebones of in mem graph store
yashpatel007 Jul 19, 2023
f83195b
refactoring index to use index factory
yashpatel007 Jul 19, 2023
17d7707
clang format fix
yashpatel007 Jul 19, 2023
fbf74f6
Merge branch 'main' of https://github.com/microsoft/DiskANN into pate…
yashpatel007 Jul 19, 2023
3009d3a
making enum to enum class (c++ 11 style) for scope resolution with sa…
yashpatel007 Jul 20, 2023
785fb4d
cleaning up API for GraphSore
yashpatel007 Jul 21, 2023
67b3454
moving _nd back to index class
yashpatel007 Jul 25, 2023
9b08870
resolving PR comments
yashpatel007 Aug 1, 2023
826efa9
error fix
yashpatel007 Aug 1, 2023
cd5f534
error fix for dynamic
yashpatel007 Aug 2, 2023
d9647ce
resolving PR comments
yashpatel007 Aug 3, 2023
3977719
removing _num_frozen_point from graph store
yashpatel007 Aug 3, 2023
30e1f6d
minor fix
yashpatel007 Aug 3, 2023
21daaf8
moving _start back to main + minor update in graph store api to suppo…
yashpatel007 Aug 4, 2023
3556497
adding requested changes from Gopal
yashpatel007 Aug 8, 2023
5c63323
removing reservations
yashpatel007 Aug 9, 2023
0af5f34
Merge branch 'main' into patelyash/inmem_graph_store
yashpatel007 Aug 9, 2023
df14ae0
resolving namespace resolution for defaults after build failure
yashpatel007 Aug 9, 2023
c20b407
minor update
yashpatel007 Aug 9, 2023
56ece97
minor update
yashpatel007 Aug 9, 2023
16b09df
speeding up location update logic while repositioning
yashpatel007 Aug 9, 2023
9df5576
updated with reserving mem for graph neighbours upfront
yashpatel007 Aug 9, 2023
cc90e30
build error fix
yashpatel007 Aug 9, 2023
421f525
minor update in assert
yashpatel007 Aug 10, 2023
d53b322
initial commit
yashpatel007 Aug 10, 2023
6df72ba
updating python bindings to use new ctor
yashpatel007 Aug 10, 2023
c7a382a
python binding error fix
yashpatel007 Aug 10, 2023
f4d256f
error fix
yashpatel007 Aug 10, 2023
75d1680
reverting some changes -> experiment
yashpatel007 Aug 10, 2023
b3413e3
removing redundnt code from native index
yashpatel007 Aug 10, 2023
00ea657
python build error fix
yashpatel007 Aug 10, 2023
8808029
tyring to resolve python build error
yashpatel007 Aug 11, 2023
c3e064f
attempt at python build fix
yashpatel007 Aug 11, 2023
f4d4a3b
adding IndexSearchParams
yashpatel007 Aug 11, 2023
8fd6a68
setting search threads to non zero
yashpatel007 Aug 11, 2023
42e43ba
minor check removed
yashpatel007 Aug 11, 2023
1ab5bdc
eperiment 3-> making distance fully owned by data_store
yashpatel007 Aug 11, 2023
9e2a01d
exp 3 clang fix
yashpatel007 Aug 11, 2023
8267402
exp 4
yashpatel007 Aug 11, 2023
406862e
making distance as unique_ptr
yashpatel007 Aug 11, 2023
df0310c
trying to fix build
yashpatel007 Aug 11, 2023
9c7f5ca
finally fixing problem
yashpatel007 Aug 11, 2023
e829125
some minor fix
yashpatel007 Aug 12, 2023
e37505d
adding dll export to index_factory static function
yashpatel007 Aug 12, 2023
78d778c
adding dll export for static fn in index_factory
yashpatel007 Aug 12, 2023
3574428
code cleanup
yashpatel007 Aug 14, 2023
d1f595c
merging code from MergeIndexCtor
yashpatel007 Aug 15, 2023
decb877
resolving errors after merge
yashpatel007 Aug 15, 2023
c9b790d
resolving build errors
yashpatel007 Aug 15, 2023
ae4da8f
rebasing from main
yashpatel007 Aug 15, 2023
1af58fe
fixing build error for stitched index
yashpatel007 Aug 15, 2023
f4b4027
resolving build errors
yashpatel007 Aug 16, 2023
9e4d9aa
removing max_observed_degree set()
yashpatel007 Aug 16, 2023
bfcd5e1
removing comments + typo fix
yashpatel007 Aug 16, 2023
2edd45a
replacing add_neighbour with set_neighbours where we can
yashpatel007 Aug 17, 2023
881d8db
error fix
yashpatel007 Aug 17, 2023
28 changes: 15 additions & 13 deletions apps/build_memory_index.cpp
@@ -120,32 +120,34 @@ int main(int argc, char **argv)
     size_t data_num, data_dim;
     diskann::get_bin_metadata(data_path, data_num, data_dim);

-    auto index_build_params = diskann::IndexWriteParametersBuilder(L, R)
-        .with_filter_list_size(Lf)
-        .with_alpha(alpha)
-        .with_saturate_graph(false)
-        .with_num_threads(num_threads)
-        .build();
-
-    auto build_params = diskann::IndexBuildParamsBuilder(index_build_params)
-        .with_universal_label(universal_label)
-        .with_label_file(label_file)
-        .with_save_path_prefix(index_path_prefix)
-        .build();
     auto config = diskann::IndexConfigBuilder()
         .with_metric(metric)
         .with_dimension(data_dim)
         .with_max_points(data_num)
-        .with_data_load_store_strategy(diskann::MEMORY)
+        .with_data_load_store_strategy(diskann::DataStoreStrategy::MEMORY)
+        .with_graph_load_store_strategy(diskann::GraphStoreStrategy::MEMORY)
         .with_data_type(data_type)
         .with_label_type(label_type)
         .is_dynamic_index(false)
         .with_index_write_params(index_build_params)
         .is_enable_tags(false)
         .is_use_opq(use_opq)
         .is_pq_dist_build(use_pq_build)
         .with_num_pq_chunks(build_PQ_bytes)
         .build();
+
+    auto index_build_params = diskann::IndexWriteParametersBuilder(L, R)
+        .with_filter_list_size(Lf)
+        .with_alpha(alpha)
+        .with_saturate_graph(false)
+        .with_num_threads(num_threads)
+        .build();
+
+    auto build_params = diskann::IndexBuildParamsBuilder(index_build_params)
+        .with_universal_label(universal_label)
+        .with_label_file(label_file)
+        .with_save_path_prefix(index_path_prefix)
+        .build();
     auto index_factory = diskann::IndexFactory(config);
     auto index = index_factory.create_instance();
     index->build(data_path, data_num, build_params);
1 change: 1 addition & 0 deletions apps/build_stitched_index.cpp
@@ -285,6 +285,7 @@ void prune_and_save(path final_index_path_prefix, path full_index_path_prefix, p
     auto pruning_index_timer = std::chrono::high_resolution_clock::now();

     diskann::get_bin_metadata(input_data_path, number_of_label_points, dimension);
+
     diskann::Index<T> index(diskann::Metric::L2, dimension, number_of_label_points, nullptr, nullptr, 0, false, false);

     // not searching this index, set search_l to 0
3 changes: 2 additions & 1 deletion apps/search_memory_index.cpp
@@ -74,7 +74,8 @@ int search_memory_index(diskann::Metric &metric, const std::string &index_path,
         .with_metric(metric)
         .with_dimension(query_dim)
         .with_max_points(0)
-        .with_data_load_store_strategy(diskann::MEMORY)
+        .with_data_load_store_strategy(diskann::DataStoreStrategy::MEMORY)
+        .with_graph_load_store_strategy(diskann::GraphStoreStrategy::MEMORY)
         .with_data_type(diskann_type_to_name<T>())
         .with_label_type(diskann_type_to_name<LabelT>())
         .with_tag_type(diskann_type_to_name<TagT>())
3 changes: 2 additions & 1 deletion apps/test_insert_deletes_consolidate.cpp
@@ -162,7 +162,8 @@ void build_incremental_index(const std::string &data_path, diskann::IndexWritePa
         .with_index_search_params(index_search_params)
         .with_data_type(data_type)
         .with_tag_type(tag_type)
-        .with_data_load_store_strategy(diskann::MEMORY)
+        .with_data_load_store_strategy(diskann::DataStoreStrategy::MEMORY)
+        .with_graph_load_store_strategy(diskann::GraphStoreStrategy::MEMORY)
         .is_enable_tags(enable_tags)
         .is_concurrent_consolidate(concurrent)
         .build();
3 changes: 2 additions & 1 deletion apps/test_streaming_scenario.cpp
@@ -215,7 +215,8 @@ void build_incremental_index(const std::string &data_path, const uint32_t L, con
         .with_data_type(diskann_type_to_name<T>())
         .with_index_write_params(params)
         .with_index_search_params(index_search_params)
-        .with_data_load_store_strategy(diskann::MEMORY)
+        .with_data_load_store_strategy(diskann::DataStoreStrategy::MEMORY)
+        .with_graph_load_store_strategy(diskann::GraphStoreStrategy::MEMORY)
         .build();

     diskann::IndexFactory index_factory = diskann::IndexFactory(index_config);
49 changes: 42 additions & 7 deletions include/abstract_graph_store.h
@@ -5,7 +5,6 @@

 #include <string>
 #include <vector>
-
 #include "types.h"

 namespace diskann
@@ -14,18 +13,54 @@ namespace diskann
 class AbstractGraphStore
 {
   public:
-    AbstractGraphStore(const size_t max_pts) : _capacity(max_pts)
+    AbstractGraphStore(const size_t total_pts, const size_t reserve_graph_degree)
+        : _capacity(total_pts), _reserve_graph_degree(reserve_graph_degree)
     {
     }

+    // returns tuple of <nodes_read, start, num_frozen_points>
+    virtual std::tuple<uint32_t, uint32_t, size_t> load(const std::string &index_path_prefix,
+                                                        const size_t num_points) = 0;
+    virtual int store(const std::string &index_path_prefix, const size_t num_points, const size_t num_fz_points,
+                      const uint32_t start) = 0;
+
+    // not synchronised, user should use lock when necessary.
+    virtual const std::vector<location_t> &get_neighbours(const location_t i) const = 0;
+    virtual void add_neighbour(const location_t i, location_t neighbour_id) = 0;
+    virtual void clear_neighbours(const location_t i) = 0;
+    virtual void swap_neighbours(const location_t a, location_t b) = 0;
+
+    virtual void set_neighbours(const location_t i, std::vector<location_t> &neighbours) = 0;
+
+    virtual size_t resize_graph(const size_t new_size) = 0;
+    virtual void clear_graph() = 0;
+
+    virtual uint32_t get_max_observed_degree() = 0;
+
+    // set during load
+    virtual size_t get_max_range_of_graph() = 0;
+
+    // Total internal points _max_points + _num_frozen_points
+    size_t get_total_points()
+    {
+        return _capacity;
+    }
+
-    virtual int load(const std::string &index_path_prefix) = 0;
-    virtual int store(const std::string &index_path_prefix) = 0;
+  protected:
+    // Internal function, changes total points when resize_graph is called.
+    void set_total_points(size_t new_capacity)
+    {
+        _capacity = new_capacity;
+    }

-    virtual void get_adj_list(const location_t i, std::vector<location_t> &neighbors) = 0;
-    virtual void set_adj_list(const location_t i, std::vector<location_t> &neighbors) = 0;
+    size_t get_reserve_graph_degree()
+    {
+        return _reserve_graph_degree;
+    }

   private:
     size_t _capacity;
+    size_t _reserve_graph_degree;
 };

-} // namespace diskann
+} // namespace diskann
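For a concrete picture of the neighbour-level API declared above, here is a minimal sketch of an adjacency-list store. It is not the PR's InMemGraphStore: the class name SketchGraphStore, the local location_t alias, and the way max observed degree is tracked are assumptions made for illustration. No locking is done, in line with the header's note that callers synchronise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using location_t = uint32_t; // assumption: stands in for the alias from types.h

// Hypothetical stand-in for a graph store; not the PR's implementation.
class SketchGraphStore
{
  public:
    SketchGraphStore(size_t total_pts, size_t reserve_graph_degree)
        : _reserve_graph_degree(reserve_graph_degree)
    {
        resize_graph(total_pts);
    }

    // Read-only view of node i's adjacency list (bounds-checked).
    const std::vector<location_t> &get_neighbours(location_t i) const
    {
        return _graph.at(i);
    }

    // Appends one edge and updates the largest degree seen so far.
    void add_neighbour(location_t i, location_t neighbour_id)
    {
        _graph.at(i).push_back(neighbour_id);
        track_degree(_graph[i].size());
    }

    // Replaces node i's adjacency list in one call.
    void set_neighbours(location_t i, std::vector<location_t> &neighbours)
    {
        _graph.at(i).assign(neighbours.begin(), neighbours.end());
        track_degree(_graph[i].size());
    }

    void clear_neighbours(location_t i)
    {
        _graph.at(i).clear();
    }

    void swap_neighbours(location_t a, location_t b)
    {
        _graph.at(a).swap(_graph.at(b));
    }

    // Grows or shrinks the node array; new nodes get capacity reserved upfront.
    size_t resize_graph(size_t new_size)
    {
        const size_t old_size = _graph.size();
        _graph.resize(new_size);
        for (size_t i = old_size; i < new_size; i++)
            _graph[i].reserve(_reserve_graph_degree);
        return _graph.size();
    }

    uint32_t get_max_observed_degree() const
    {
        return _max_observed_degree;
    }

  private:
    void track_degree(size_t degree)
    {
        _max_observed_degree = std::max<uint32_t>(_max_observed_degree, static_cast<uint32_t>(degree));
    }

    size_t _reserve_graph_degree;
    uint32_t _max_observed_degree = 0;
    std::vector<std::vector<location_t>> _graph;
};
```

A caller would construct it with the total point count and a reserve degree, for example SketchGraphStore g(3, 4), then use add_neighbour for single edges and set_neighbours to replace a whole list after pruning.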
38 changes: 33 additions & 5 deletions include/in_mem_graph_store.h
@@ -11,13 +11,41 @@ namespace diskann
 class InMemGraphStore : public AbstractGraphStore
 {
   public:
-    InMemGraphStore(const size_t max_pts);
+    InMemGraphStore(const size_t total_pts, const size_t reserve_graph_degree);

-    int load(const std::string &index_path_prefix);
-    int store(const std::string &index_path_prefix);
+    // returns tuple of <nodes_read, start, num_frozen_points>
+    virtual std::tuple<uint32_t, uint32_t, size_t> load(const std::string &index_path_prefix,
+                                                        const size_t num_points) override;
+    virtual int store(const std::string &index_path_prefix, const size_t num_points, const size_t num_frozen_points,
+                      const uint32_t start) override;

-    void get_adj_list(const location_t i, std::vector<location_t> &neighbors);
-    void set_adj_list(const location_t i, std::vector<location_t> &neighbors);
+    virtual const std::vector<location_t> &get_neighbours(const location_t i) const override;
+    virtual void add_neighbour(const location_t i, location_t neighbour_id) override;
+    virtual void clear_neighbours(const location_t i) override;
+    virtual void swap_neighbours(const location_t a, location_t b) override;
+
+    virtual void set_neighbours(const location_t i, std::vector<location_t> &neighbors) override;
+
+    virtual size_t resize_graph(const size_t new_size) override;
+    virtual void clear_graph() override;
+
+    virtual size_t get_max_range_of_graph() override;
+    virtual uint32_t get_max_observed_degree() override;
+
+  protected:
+    virtual std::tuple<uint32_t, uint32_t, size_t> load_impl(const std::string &filename, size_t expected_num_points);
+#ifdef EXEC_ENV_OLS
+    virtual std::tuple<uint32_t, uint32_t, size_t> load_impl(AlignedFileReader &reader, size_t expected_num_points);
+#endif
+
+    int save_graph(const std::string &index_path_prefix, const size_t active_points, const size_t num_frozen_points,
+                   const uint32_t start);
+
+  private:
+    size_t _max_range_of_graph = 0;
+    uint32_t _max_observed_degree = 0;
+
+    std::vector<std::vector<uint32_t>> _graph;
 };

 } // namespace diskann
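The load/store pair above takes start and num_frozen_points as arguments and hands them back in a tuple instead of keeping them as graph-store members. The sketch below illustrates that contract with free functions over a standard stream. The stream layout (a small header followed by length-prefixed adjacency lists) and the names sketch_store and sketch_load are assumptions for illustration only, not the PR's file format or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <tuple>
#include <vector>

using Graph = std::vector<std::vector<uint32_t>>;

// Hypothetical layout: [start][num_frozen_points][num_points],
// then for each node [degree][neighbour ids...].
inline void sketch_store(std::ostream &out, const Graph &graph, size_t num_points, size_t num_frozen_points,
                         uint32_t start)
{
    const uint64_t frozen = num_frozen_points;
    const uint64_t n = num_points;
    out.write(reinterpret_cast<const char *>(&start), sizeof(start));
    out.write(reinterpret_cast<const char *>(&frozen), sizeof(frozen));
    out.write(reinterpret_cast<const char *>(&n), sizeof(n));
    for (size_t i = 0; i < num_points; i++)
    {
        const uint32_t degree = static_cast<uint32_t>(graph[i].size());
        out.write(reinterpret_cast<const char *>(&degree), sizeof(degree));
        out.write(reinterpret_cast<const char *>(graph[i].data()), degree * sizeof(uint32_t));
    }
}

// Returns <nodes_read, start, num_frozen_points>, mirroring the tuple
// documented in the header comment.
inline std::tuple<uint32_t, uint32_t, size_t> sketch_load(std::istream &in, Graph &graph)
{
    uint32_t start = 0;
    uint64_t frozen = 0;
    uint64_t n = 0;
    in.read(reinterpret_cast<char *>(&start), sizeof(start));
    in.read(reinterpret_cast<char *>(&frozen), sizeof(frozen));
    in.read(reinterpret_cast<char *>(&n), sizeof(n));

    graph.assign(static_cast<size_t>(n), std::vector<uint32_t>());
    uint32_t nodes_read = 0;
    for (uint64_t i = 0; i < n && in; i++)
    {
        uint32_t degree = 0;
        in.read(reinterpret_cast<char *>(&degree), sizeof(degree));
        if (!in)
            break; // truncated stream: stop counting nodes
        graph[i].resize(degree);
        in.read(reinterpret_cast<char *>(graph[i].data()), degree * sizeof(uint32_t));
        if (in)
            nodes_read++;
    }
    return std::make_tuple(nodes_read, start, static_cast<size_t>(frozen));
}
```

The caller unpacks the result with std::tie (or structured bindings in C++17) and keeps start and the frozen-point count on its own side, which matches the commits that moved _start and _num_frozen_pts back into the index class.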
19 changes: 9 additions & 10 deletions include/index.h
@@ -19,6 +19,7 @@
 #include "windows_customizations.h"
 #include "scratch.h"
 #include "in_mem_data_store.h"
+#include "in_mem_graph_store.h"
 #include "abstract_index.h"

 #define OVERHEAD_FACTOR 1.1
@@ -58,9 +59,8 @@ template <typename T, typename TagT = uint32_t, typename LabelT = uint32_t> clas
                             const bool pq_dist_build = false, const size_t num_pq_chunks = 0,
                             const bool use_opq = false);

-    // This is called by IndexFactory which returns AbstractIndex's simplified API
-    DISKANN_DLLEXPORT Index(const IndexConfig &index_config, std::unique_ptr<AbstractDataStore<T>> data_store
-                            /* std::unique_ptr<AbstractGraphStore> graph_store*/);
+    DISKANN_DLLEXPORT Index(const IndexConfig &index_config, std::unique_ptr<AbstractDataStore<T>> data_store,
+                            std::unique_ptr<AbstractGraphStore> graph_store);

     DISKANN_DLLEXPORT ~Index();

@@ -327,10 +327,11 @@ template <typename T, typename TagT = uint32_t, typename LabelT = uint32_t> clas

     // Data
     std::unique_ptr<AbstractDataStore<T>> _data_store;
-    char *_opt_graph = nullptr;

     // Graph related data structures
-    std::vector<std::vector<uint32_t>> _final_graph;
+    std::unique_ptr<AbstractGraphStore> _graph_store;
+
+    char *_opt_graph = nullptr;

     T *_data = nullptr; // coordinates of all base points
     // Dimensions
@@ -344,15 +345,13 @@ template <typename T, typename TagT = uint32_t, typename LabelT = uint32_t> clas
     // needed for a dynamic index. The frozen points have consecutive locations.
     // See also _start below.
     size_t _num_frozen_pts = 0;
-    size_t _max_range_of_loaded_graph = 0;
     size_t _node_size;
     size_t _data_len;
     size_t _neighbor_len;

-    uint32_t _max_observed_degree = 0;
-    // Start point of the search. When _num_frozen_pts is greater than zero,
-    // this is the location of the first frozen point. Otherwise, this is a
-    // location of one of the points in index.
+    // Start point of the search. When _num_frozen_pts is greater than zero,
+    // this is the location of the first frozen point. Otherwise, this is a
+    // location of one of the points in index.
     uint32_t _start = 0;

     bool _has_built = false;
13 changes: 11 additions & 2 deletions include/index_config.h
@@ -3,14 +3,16 @@

 namespace diskann
 {
-enum DataStoreStrategy
+enum class DataStoreStrategy
 {
     MEMORY
 };
+
-enum GraphStoreStrategy
+enum class GraphStoreStrategy
 {
     MEMORY
 };
+
 struct IndexConfig
 {
     DataStoreStrategy data_strategy;
@@ -201,6 +203,13 @@ class IndexConfigBuilder
             throw ANNException("Error: please pass initial_search_list_size for building dynamic index.", -1);
         }

+        // sanity check
+        if (_dynamic_index && _num_frozen_pts == 0)
+        {
+            diskann::cout << "_num_frozen_pts passed as 0 for dynamic_index. Setting it to 1 for safety." << std::endl;
+            _num_frozen_pts = 1;
+        }
+
         return IndexConfig(_data_strategy, _graph_strategy, _metric, _dimension, _max_points, _num_pq_chunks,
                            _num_frozen_pts, _dynamic_index, _enable_tags, _pq_dist_build, _concurrent_consolidate,
                            _use_opq, _data_type, _tag_type, _label_type, _index_write_params, _index_search_params);
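Two changes in this header can be illustrated with a small example. Scoped enums let DataStoreStrategy and GraphStoreStrategy each declare a MEMORY enumerator in the same namespace, which is why call sites now spell out diskann::DataStoreStrategy::MEMORY; with unscoped enums the second MEMORY would be a redefinition. The builder also raises the frozen-point count to 1 when a dynamic index is requested with 0. The following is a minimal sketch in which MiniConfig and MiniConfigBuilder are hypothetical stand-ins for IndexConfig and IndexConfigBuilder, reduced to the fields needed here.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

namespace sketch
{
// Both enums can own a MEMORY enumerator because each is its own scope.
enum class DataStoreStrategy
{
    MEMORY
};

enum class GraphStoreStrategy
{
    MEMORY
};

// Hypothetical, trimmed-down config; not the PR's IndexConfig.
struct MiniConfig
{
    DataStoreStrategy data_strategy;
    GraphStoreStrategy graph_strategy;
    bool dynamic_index;
    size_t num_frozen_pts;
};

// Hypothetical builder that mirrors only the sanity check added in build().
class MiniConfigBuilder
{
  public:
    MiniConfigBuilder &with_data_load_store_strategy(DataStoreStrategy s)
    {
        _data_strategy = s;
        return *this;
    }

    MiniConfigBuilder &with_graph_load_store_strategy(GraphStoreStrategy s)
    {
        _graph_strategy = s;
        return *this;
    }

    MiniConfigBuilder &is_dynamic_index(bool dynamic_index)
    {
        _dynamic_index = dynamic_index;
        return *this;
    }

    MiniConfig build()
    {
        // A dynamic index needs at least one frozen point as a search entry.
        if (_dynamic_index && _num_frozen_pts == 0)
        {
            std::cout << "num_frozen_pts passed as 0 for dynamic index, using 1" << std::endl;
            _num_frozen_pts = 1;
        }
        return MiniConfig{_data_strategy, _graph_strategy, _dynamic_index, _num_frozen_pts};
    }

  private:
    DataStoreStrategy _data_strategy = DataStoreStrategy::MEMORY;
    GraphStoreStrategy _graph_strategy = GraphStoreStrategy::MEMORY;
    bool _dynamic_index = false;
    size_t _num_frozen_pts = 0;
};
} // namespace sketch
```

Because the enumerators no longer leak into the enclosing namespace, a bare sketch::MEMORY does not compile, which is the same reason the app files in this PR had to replace diskann::MEMORY.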
12 changes: 7 additions & 5 deletions include/index_factory.h
@@ -12,15 +12,17 @@ class IndexFactory

     // Construct a data store with distance function emplaced within
     template <typename T>
-    DISKANN_DLLEXPORT static std::unique_ptr<AbstractDataStore<T>> construct_datastore(DataStoreStrategy stratagy,
-                                                                                       size_t num_points,
-                                                                                       size_t dimension, Metric m);
+    DISKANN_DLLEXPORT static std::unique_ptr<AbstractDataStore<T>> construct_datastore(const DataStoreStrategy stratagy,
+                                                                                       const size_t num_points,
+                                                                                       const size_t dimension,
+                                                                                       const Metric m);

+    DISKANN_DLLEXPORT static std::unique_ptr<AbstractGraphStore> construct_graphstore(
+        const GraphStoreStrategy stratagy, const size_t size, const size_t reserve_graph_degree);
+
   private:
     void check_config();

-    std::unique_ptr<AbstractGraphStore> construct_graphstore(GraphStoreStrategy stratagy, size_t size);
-
     template <typename data_type, typename tag_type, typename label_type>
     std::unique_ptr<AbstractIndex> create_instance();

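With this change construct_graphstore becomes a public static member, so it can be called without a factory instance and its result passed to the new Index constructor that takes a graph store. A minimal sketch of that hand-off follows. GraphStoreBase, VectorGraphStore, make_graph_store and MiniIndex are hypothetical stand-ins, and throwing std::invalid_argument for an unknown strategy is an assumption rather than the PR's behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

enum class GraphStoreStrategy
{
    MEMORY
};

// Hypothetical abstract store, reduced to one query for the example.
class GraphStoreBase
{
  public:
    virtual ~GraphStoreBase() = default;
    virtual size_t get_total_points() const = 0;
};

// Hypothetical in-memory store: one adjacency vector per point.
class VectorGraphStore : public GraphStoreBase
{
  public:
    VectorGraphStore(size_t total_pts, size_t reserve_graph_degree) : _graph(total_pts)
    {
        for (auto &adjacency : _graph)
            adjacency.reserve(reserve_graph_degree);
    }

    size_t get_total_points() const override
    {
        return _graph.size();
    }

  private:
    std::vector<std::vector<uint32_t>> _graph;
};

// Free-function analogue of a static factory method: picks the concrete
// store from the strategy and returns it behind the abstract interface.
inline std::unique_ptr<GraphStoreBase> make_graph_store(GraphStoreStrategy strategy, size_t size,
                                                        size_t reserve_graph_degree)
{
    switch (strategy)
    {
    case GraphStoreStrategy::MEMORY:
        return std::unique_ptr<GraphStoreBase>(new VectorGraphStore(size, reserve_graph_degree));
    }
    throw std::invalid_argument("unsupported graph store strategy");
}

// Hypothetical index that takes ownership of the store it is given.
class MiniIndex
{
  public:
    explicit MiniIndex(std::unique_ptr<GraphStoreBase> graph_store) : _graph_store(std::move(graph_store))
    {
    }

    size_t capacity() const
    {
        return _graph_store->get_total_points();
    }

  private:
    std::unique_ptr<GraphStoreBase> _graph_store;
};
```

Passing the store as a std::unique_ptr makes the ownership transfer explicit: after std::move the caller's pointer is null and the index alone controls the graph's lifetime.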
6 changes: 4 additions & 2 deletions src/filter_utils.cpp
@@ -45,8 +45,10 @@ void generate_label_indices(path input_data_path, path final_index_path_prefix,

     size_t number_of_label_points, dimension;
     diskann::get_bin_metadata(curr_label_input_data_path, number_of_label_points, dimension);
-    diskann::Index<T> index(diskann::Metric::L2, dimension, number_of_label_points, nullptr, nullptr, 0, false,
-                            false);
+
+    diskann::Index<T> index(diskann::Metric::L2, dimension, number_of_label_points,
+                            std::make_shared<diskann::IndexWriteParameters>(label_index_build_parameters), nullptr,
+                            0, false, false);

     auto index_build_timer = std::chrono::high_resolution_clock::now();
     index.build(curr_label_input_data_path.c_str(), number_of_label_points, label_index_build_parameters);