Tree-based APC storage engine #37

Closed
wants to merge 4 commits

Conversation

thedeacon
Contributor

Initial pass at implementing the directed acyclic graph described in #36. This code also puts an upper bound on how many failed apcu_cas() calls will be tolerated in a tight loop before an exception is thrown, to prevent an infinite loop arising from some unforeseen APCu misbehavior. I'm not strongly attached to this loop-catcher for the general case if anybody is opposed, but for my needs it was important to have as a defensive-coding measure.
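
A minimal sketch of that bounded-retry pattern, assuming an illustrative constant, helper name, and exception type (the actual names and limit in the PR may differ):

```php
<?php
// Illustrative only: the constant, helper name, and exception type are
// assumptions, not the PR's actual code.
const MAX_CAS_ATTEMPTS = 1000; // upper bound on failed apcu_cas() calls

/**
 * Atomically add $delta to the integer stored at $key, retrying on CAS
 * contention but never looping forever.
 */
function boundedIncrement(string $key, int $delta): void
{
    for ($attempt = 0; $attempt < MAX_CAS_ATTEMPTS; $attempt++) {
        $old = apcu_fetch($key);
        if ($old === false) {
            // Key missing: try to create it. apcu_add() fails if another
            // process created it first, in which case we retry via CAS.
            if (apcu_add($key, $delta)) {
                return;
            }
            continue;
        }
        // apcu_cas() only succeeds if the stored value still equals $old.
        if (apcu_cas($key, $old, $old + $delta)) {
            return;
        }
    }
    throw new RuntimeException(
        'apcu_cas() failed ' . MAX_CAS_ATTEMPTS . " times for key '$key'"
    );
}
```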

The DAG is a tree structure. It starts with a root node, which contains a serialized array of all the metadata keys stored in APCu. Each metadata key holds the typical metadata (name, help, metric type), an array of labelNames associated with the metric being measured, and, if the metric is a histogram, an array of buckets. The metadata APCu key names end with ":meta".
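
For illustration, the root node and a ":meta" key might look roughly like this; the key prefixes, delimiters, and field names are assumptions based on the description above, not the PR's actual serialization:

```php
<?php
// Root node: a serialized array of every metadata key currently in APCu.
apcu_store('prom:root', serialize([
    'prom:http_requests_total:meta',
    'prom:request_duration_seconds:meta',
]));

// One metadata key per metric, ending in ":meta".
apcu_store('prom:request_duration_seconds:meta', serialize([
    'name'       => 'request_duration_seconds',
    'help'       => 'Request duration in seconds',
    'type'       => 'histogram',
    'labelNames' => ['method', 'status'],
    'buckets'    => [0.1, 0.5, 1.0, 5.0],   // histograms only
]));
```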

For each labelNames item, a ":label" APCu key is created, listing every label value observed for that labelName. Then, for each permutation of label values, a ":value" APCu key is created once the label-tuple has a value. Similarly, for each histogram bucket assigned to a label-tuple, a bucket-specific ":value" APCu key is created once the bucket contains a value. The same applies to the special buckets "sum" and "+Inf", which are not stored as part of the metric's metadata APCu key.
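
Continuing the hypothetical key scheme above, the ":label" and ":value" nodes for one label-tuple of that histogram could look like this (delimiters and exact key names are again assumptions):

```php
<?php
// One ":label" key per labelName, listing every label value observed so far.
apcu_store('prom:request_duration_seconds:method:label', serialize(['GET', 'POST']));
apcu_store('prom:request_duration_seconds:status:label', serialize(['200', '500']));

// ":value" keys exist only once the label-tuple (and bucket) has data.
apcu_store('prom:request_duration_seconds:GET|200:0.5:value', 12);   // bucket 0.5
apcu_store('prom:request_duration_seconds:GET|200:+Inf:value', 17);  // special "+Inf"
apcu_store('prom:request_duration_seconds:GET|200:sum:value', 42);   // special "sum"
```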

The described structure allows all APCu keys containing values to be quickly enumerated, simply by reading the root node, loading each metadata APCu key referenced by the root node, and programmatically generating every ":label" and ":value" APCu key that could exist based on the contents of that metadata key. Wiping all data follows a similar pattern: enumerate all keys which contain data and delete them, starting at the leaf nodes and working backward toward the root, deleting the root node last of all.
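
Roughly, the collect and wipe walks described above could be sketched like this; key generation is heavily simplified, and the function names are illustrative rather than the PR's actual code:

```php
<?php
// Enumerate every key reachable from the root node.
function enumerateChildKeys(): array
{
    $keys = [];
    $root = apcu_fetch('prom:root');
    if ($root === false) {
        return $keys;
    }
    foreach (unserialize($root) as $metaKey) {
        $raw = apcu_fetch($metaKey);
        if ($raw === false) {
            continue;
        }
        $meta = unserialize($raw);
        foreach ($meta['labelNames'] as $labelName) {
            $keys[] = str_replace(':meta', ":$labelName:label", $metaKey);
        }
        // ... ":value" keys would be generated here from the observed label
        // permutations and, for histograms, the bucket list ...
    }
    return $keys;
}

// Wipe leaf nodes first, then metadata, and the root node last of all, so a
// concurrent reader never follows a pointer to an already-deleted parent.
function wipeStorage(): void
{
    foreach (enumerateChildKeys() as $key) {
        apcu_delete($key);
    }
    $root = apcu_fetch('prom:root');
    if ($root !== false) {
        foreach (unserialize($root) as $metaKey) {
            apcu_delete($metaKey);
        }
    }
    apcu_delete('prom:root');
}
```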

Note there is a small race condition in addItemToKey(): between reading the array and writing it back, another thread could add an element to the key's contents. For my purposes this is acceptable, because if a metadata pointer gets deleted from the root node, for instance, the next thread to write to that metric will re-create the missing array pointer and will likely succeed. When hundreds of threads per second are writing metrics, the window where this pointer is missing will be exceedingly small, and when traffic is light, chances are good that this race condition will never be triggered. It could still be improved, though; I'm thinking about ways to serialize access to the critical part of that function, and ideas are welcome!
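
A minimal sketch of that window, assuming a simplified body for addItemToKey() (the function name comes from the PR; the body here is an illustration only):

```php
<?php
/**
 * Append $item to the serialized array stored at $key.
 *
 * Between apcu_fetch() and apcu_store() another process can append its own
 * item, and one of the two writes silently wins. apcu_cas() cannot close
 * this gap because it only operates on integer values, not arrays.
 */
function addItemToKey(string $key, string $item): void
{
    $raw   = apcu_fetch($key);
    $items = ($raw === false) ? [] : unserialize($raw);

    if (!in_array($item, $items, true)) {
        $items[] = $item;
        apcu_store($key, serialize($items));   // <-- lost-update window ends here
    }
}
```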

@LKaemmerling
Member

I would prefer to make this a dedicated Storage type (and deprecate the old one) because of the changed storage layout. Changing the existing one in place might require a new major release, which I would like to avoid.

Just out of curiosity, do you have any benchmarks? I would like to see the difference between the old implementation and the new one when adding new keys (with N keys already inserted).

Otherwise, the code looks fine at first glance. Really good job, thank you!

@LKaemmerling
Member

Hey @thedeacon,

ping in case you missed my comment :)

@LKaemmerling
Member

One more thing: once #45 is merged, I would like to issue a new major release (because of the breaking change in MR #45).

@thedeacon
Contributor Author

thedeacon commented Oct 28, 2021

Hey @LKaemmerling I'm still alive!
I do have an improved and battle-tested version of the engine which introduces a new Storage type due to the changed layout, and fixes a number of minor issues we found while operating it in production. I also did some benchmarking: depending on the number of other records in APC and the number of metrics being tracked, collecting metrics is between 2.8x and ~50x faster, while other operations (create, increment, wipeStorage) perform about the same.

Given how much time has passed and the number of changes, I'm debating whether to submit these as a new PR (and abandon this one) or to update this one. Do you have a preference?

Update: I have created a new, clean PR #72 which supersedes this PR. I will keep this PR open for a little while so the existing discussion is not lost, but new discussion should happen on PR #72.

@LKaemmerling
Member

As #72 will be merged, I will close this MR :)
