[improvement]: more precise CheckMem function #306

Closed
9 changes: 7 additions & 2 deletions src/hnsw/utils.c
@@ -78,8 +78,13 @@ void CheckMem(int limit, Relation index, usearch_index_t uidx, uint32 n_nodes, c
  double M = ldb_HnswGetM(index);
  double mL = 1 / log(M);
  metadata_t meta = usearch_index_metadata(uidx, &error);
- // todo:: update sizeof(float) to correct vector size once #19 is merged
- node_size = UsearchNodeBytes(&meta, meta.dimensions * sizeof(float), (int)round(mL + 1));
+ int vector_bytes_num = divide_round_up(meta.dimensions * GetUsearchBitsPerScalar(GetUsearchScalarKindFromIndexMeta(meta)), 8);
+
+ // Use the node size at level `(int)round(mL + 1)` as the average node size.
+ // Node sizes at different levels are not actually linearly related, but
+ // since nodes are exponentially distributed across levels and dominated by the bottom level,
+ // this is a reasonably good approximation.
+ node_size = UsearchNodeBytes(&meta, vector_bytes_num, (int)round(mL + 1));
  }
  // todo:: figure out a way to check this in pg <= 12
  #if PG_VERSION_NUM >= 130000
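The comment in this hunk argues that nodes are exponentially distributed across levels and dominated by the bottom level. A minimal sketch of that claim follows. It assumes the standard HNSW level rule, level = floor(-ln(U) * mL) with mL = 1 / ln(M) and U uniform in (0, 1], under which the probability that a node reaches level l or higher is M^-l. The function names `level_fraction` and `expected_level` are hypothetical and are not part of this PR or of usearch.

```c
#include <assert.h>

/* Fraction of nodes whose top level is exactly `level`.
 * Assumption: level = floor(-ln(U) * mL), mL = 1 / ln(M), so
 * P(top level >= l) = M^-l and P(top level == l) = (1 - 1/M) * M^-l. */
double level_fraction(double M, int level)
{
    assert(M > 1.0 && level >= 0);
    double fraction = 1.0 - 1.0 / M;
    for (int i = 0; i < level; i++) {
        fraction /= M; /* each level up is M times less likely */
    }
    return fraction;
}

/* Expected top level under the same assumption:
 * E[level] = sum over l >= 1 of P(top level >= l) = 1 / (M - 1). */
double expected_level(double M)
{
    assert(M > 1.0);
    return 1.0 / (M - 1.0);
}
```

For M = 16, `level_fraction(16.0, 0)` is 0.9375, so under this assumption roughly 94% of nodes live only on the bottom level, which is consistent with the comment's reasoning that the bottom level dominates the average node size.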
19 changes: 19 additions & 0 deletions src/hnsw/utils.h
@@ -17,6 +17,25 @@ uint32 EstimateRowCount(Relation heap);
  int32 GetColumnAttributeNumber(Relation rel, const char *columnName);
  usearch_metric_kind_t GetMetricKindFromStr(char *metric_kind_str);
 
+ inline size_t divide_round_up(size_t num, size_t denominator) {
+     return (num + denominator - 1) / denominator;
+ }
+
+ inline size_t GetUsearchBitsPerScalar(usearch_scalar_kind_t scalar_kind) {
+     switch (scalar_kind) {
+         case usearch_scalar_f64_k: return 64;
+         case usearch_scalar_f32_k: return 32;
+         case usearch_scalar_f16_k: return 16;
+         case usearch_scalar_i8_k: return 8;
+         case usearch_scalar_b1_k: return 1;
+         default: return 0;
+     }
+ }
+
+ inline usearch_scalar_kind_t GetUsearchScalarKindFromIndexMeta(metadata_t meta) {
+     return meta.init_options.quantization;
+ }
+
  // hoping to throw the error via an assertion, if those are on, before elog(ERROR)-ing as a last resort
  // We prefer Assert() because this function is used in contexts where the stack contains non-POD types
  // in which case elog-s long jumps cause undefined behaviour.
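The helpers added in this hunk convert a dimension count and a scalar kind into a byte count by rounding the total bit count up to whole bytes. The sketch below reproduces that arithmetic in a self-contained form so it can be compiled without the usearch headers. The enum `scalar_kind_t` and the names `div_round_up`, `bits_per_scalar`, and `vector_bytes` are hypothetical stand-ins, not the PR's or usearch's identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for usearch_scalar_kind_t (illustration only). */
typedef enum { SK_F64, SK_F32, SK_F16, SK_I8, SK_B1, SK_UNKNOWN } scalar_kind_t;

/* Integer division rounded up; the denominator must be non-zero. */
size_t div_round_up(size_t num, size_t denominator)
{
    assert(denominator != 0);
    return (num + denominator - 1) / denominator;
}

/* Bits used by one scalar of the given kind; 0 for an unrecognised kind. */
size_t bits_per_scalar(scalar_kind_t kind)
{
    switch (kind) {
        case SK_F64: return 64;
        case SK_F32: return 32;
        case SK_F16: return 16;
        case SK_I8:  return 8;
        case SK_B1:  return 1;
        default:     return 0;
    }
}

/* Bytes needed to store one vector: total bits rounded up to whole bytes. */
size_t vector_bytes(size_t dimensions, scalar_kind_t kind)
{
    return div_round_up(dimensions * bits_per_scalar(kind), 8);
}
```

With these definitions, a 128-dimensional `SK_F32` vector takes 512 bytes and a 10-dimensional `SK_B1` vector takes 2 bytes, since 10 bits round up to two whole bytes. An unrecognised kind yields 0 bytes, so a caller may want to treat that result as an error rather than as a valid size. The sketch uses ordinary functions; helpers defined in a header shared by several C translation units are commonly declared `static inline` to avoid linkage problems.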