Tree-based APC storage engine #37

Closed
wants to merge 4 commits

Conversation

thedeacon
Contributor

Initial pass at implementing the directed acyclic graph described in #36. This code also puts an upper bound on how many failed apcu_cas() calls will be tolerated in a tight loop before an exception is thrown, to prevent an infinite loop arising from some unforeseen APCu misbehavior. I'm not strongly attached to this loop-catcher for the general case if anybody is opposed, but for my needs it was important to have as a defensive-coding measure.
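
A minimal sketch of that bounded-retry pattern, assuming an illustrative constant, helper name, and exception type (the actual names and limit in the PR may differ):

```php
<?php
// Illustrative only: the constant, helper name, and exception type are
// assumptions, not the PR's actual code.
const MAX_CAS_ATTEMPTS = 1000; // upper bound on failed apcu_cas() calls

/**
 * Atomically add $delta to the integer stored at $key, retrying on CAS
 * contention but never looping forever.
 */
function boundedIncrement(string $key, int $delta): void
{
    for ($attempt = 0; $attempt < MAX_CAS_ATTEMPTS; $attempt++) {
        $old = apcu_fetch($key);
        if ($old === false) {
            // Key missing: try to create it. apcu_add() fails if another
            // process created it first, in which case we retry via CAS.
            if (apcu_add($key, $delta)) {
                return;
            }
            continue;
        }
        // apcu_cas() only succeeds if the stored value still equals $old.
        if (apcu_cas($key, $old, $old + $delta)) {
            return;
        }
    }
    throw new RuntimeException(
        'apcu_cas() failed ' . MAX_CAS_ATTEMPTS . " times for key '$key'"
    );
}
```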

The DAG is a tree structure. It starts with a root node, which contains a serialized array of all the metadata keys stored in APCu. Each metadata key holds the typical metadata (name, help, metric type), an array of labelNames associated with the metric being measured, and, if the metric is a histogram, an array of buckets. The metadata APCu key names end with ":meta".
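
For illustration, the root node and a ":meta" key might look roughly like this; the key prefixes, delimiters, and field names are assumptions based on the description above, not the PR's actual serialization:

```php
<?php
// Root node: a serialized array of every metadata key currently in APCu.
apcu_store('prom:root', serialize([
    'prom:http_requests_total:meta',
    'prom:request_duration_seconds:meta',
]));

// One metadata key per metric, ending in ":meta".
apcu_store('prom:request_duration_seconds:meta', serialize([
    'name'       => 'request_duration_seconds',
    'help'       => 'Request duration in seconds',
    'type'       => 'histogram',
    'labelNames' => ['method', 'status'],
    'buckets'    => [0.1, 0.5, 1.0, 5.0],   // histograms only
]));
```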

For each labelNames item, a ":label" APCu key is created, listing every label value observed for that labelName. Then, for each permutation of label values, a ":value" APCu key is created once the label-tuple has a value. Similarly, for each histogram bucket assigned to a label-tuple, a bucket-specific ":value" APCu key is created once the bucket contains a value. The same applies to the special buckets "sum" and "+Inf", which are not stored as part of the metric's metadata APCu key.
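
Continuing the hypothetical key scheme above, the ":label" and ":value" nodes for one label-tuple of that histogram could look like this (delimiters and exact key names are again assumptions):

```php
<?php
// One ":label" key per labelName, listing every label value observed so far.
apcu_store('prom:request_duration_seconds:method:label', serialize(['GET', 'POST']));
apcu_store('prom:request_duration_seconds:status:label', serialize(['200', '500']));

// ":value" keys exist only once the label-tuple (and bucket) has data.
apcu_store('prom:request_duration_seconds:GET|200:0.5:value', 12);   // bucket 0.5
apcu_store('prom:request_duration_seconds:GET|200:+Inf:value', 17);  // special "+Inf"
apcu_store('prom:request_duration_seconds:GET|200:sum:value', 42);   // special "sum"
```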

The described structure allows all APCu keys containing values to be quickly enumerated, simply by reading the root node, loading each metadata APCu key referenced by the root node, and programmatically generating every ":label" and ":value" APCu key that could exist based on the contents of that metadata key. Wiping all data follows a similar pattern: enumerate all keys which contain data and delete them, starting at the leaf nodes and working backward toward the root, deleting the root node last of all.
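
Roughly, the collect and wipe walks described above could be sketched like this; key generation is heavily simplified, and the function names are illustrative rather than the PR's actual code:

```php
<?php
// Enumerate every key reachable from the root node.
function enumerateChildKeys(): array
{
    $keys = [];
    $root = apcu_fetch('prom:root');
    if ($root === false) {
        return $keys;
    }
    foreach (unserialize($root) as $metaKey) {
        $raw = apcu_fetch($metaKey);
        if ($raw === false) {
            continue;
        }
        $meta = unserialize($raw);
        foreach ($meta['labelNames'] as $labelName) {
            $keys[] = str_replace(':meta', ":$labelName:label", $metaKey);
        }
        // ... ":value" keys would be generated here from the observed label
        // permutations and, for histograms, the bucket list ...
    }
    return $keys;
}

// Wipe leaf nodes first, then metadata, and the root node last of all, so a
// concurrent reader never follows a pointer to an already-deleted parent.
function wipeStorage(): void
{
    foreach (enumerateChildKeys() as $key) {
        apcu_delete($key);
    }
    $root = apcu_fetch('prom:root');
    if ($root !== false) {
        foreach (unserialize($root) as $metaKey) {
            apcu_delete($metaKey);
        }
    }
    apcu_delete('prom:root');
}
```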

Note there is a small race condition in addItemToKey(): between reading the array and writing it back, another thread could add an element to the key's contents. For my purposes this is acceptable, because if a metadata pointer gets deleted from the root node, for instance, the next thread to write to that metric will re-create the missing array pointer and will likely succeed. When hundreds of threads per second are writing metrics, the window where this pointer is missing will be exceedingly small, and when traffic is light, chances are good that this race condition will never be triggered. It could still be improved, though; I'm thinking about ways to serialize access to the critical part of that function, and ideas are welcome!
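
A minimal sketch of that window, assuming a simplified body for addItemToKey() (the function name comes from the PR; the body here is an illustration only):

```php
<?php
/**
 * Append $item to the serialized array stored at $key.
 *
 * Between apcu_fetch() and apcu_store() another process can append its own
 * item, and one of the two writes silently wins. apcu_cas() cannot close
 * this gap because it only operates on integer values, not arrays.
 */
function addItemToKey(string $key, string $item): void
{
    $raw   = apcu_fetch($key);
    $items = ($raw === false) ? [] : unserialize($raw);

    if (!in_array($item, $items, true)) {
        $items[] = $item;
        apcu_store($key, serialize($items));   // <-- lost-update window ends here
    }
}
```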

@LKaemmerling
Member

I would prefer to make this a dedicated Storage type (and deprecate the old one) because of the changed storage layout. Changing the existing one in place might require a new major release, which I would like to avoid.

Just out of curiosity, do you have any benchmarks? I would like to see the difference between the old implementation and the new one when adding new keys (with N keys already inserted).

Otherwise, the code looks fine at first glance. Really good job, thank you!

@LKaemmerling
Member

Hey @thedeacon,

ping in case you missed my comment :)

@LKaemmerling
Member

One more thing: once #45 is merged, I would like to issue a new major release (because of the breaking change in MR #45).

@thedeacon
Contributor Author

thedeacon commented Oct 28, 2021

Hey @LKaemmerling I'm still alive!
I do have an improved and battle-tested version of the engine which introduces a new Storage type due to the changed layout, and fixes a number of minor issues we found while operating it in production. I also did some benchmarking: depending on the number of other records in APC and the number of metrics being tracked, collecting metrics is between 2.8x and ~50x faster, while other operations (create, increment, wipeStorage) perform about the same.

Given how much time has passed and the number of changes, I'm debating whether to submit these as a new PR (and abandon this one) or to update this one. Do you have a preference?

Update: I have created a new, clean PR #72 which supersedes this PR. I will keep this PR open for a little while so the existing discussion is not lost, but new discussion should happen on PR #72.

@LKaemmerling
Member

As #72 will be merged, I will close this MR :)
