Fix broken metrics part 1 #31

ximon18 · 2023-10-10T17:27:19Z

This PR replaces accidentally closed PR #27.

There will be a "part 2" soon, but for now this PR is stopping here.

Add unit or integration tests (as relevant) for all metrics showing that they update as expected.

Status:

BgpTcpInMetrics: ✅ (incomplete)
BmpInMetrics: ✅
BmpMetrics: TODO
BmpProxyMetrics: TODO
BmpTcpInMetrics: TODO
GateMetrics: TODO
MqttMetrics: TODO
RibUnitMetrics: TODO
RibMergeUpdateStatistics: TODO
RotoFilterMetrics: ✅
TokioTaskMetrics: TODO
TrackingAllocator: ✅

This PR fixes the following issues:

Fixes an issue where metrics for a router ID may not exist and cause an HTTP 404 Not Found error in the HTML UI.
Fixes an issue where a BMP re-initiate with a new sysName / sysDesc may not be detected.
Fixes an issue where a non-fatal I/O error would wrongly terminate a BMP input stream.
Fixes a typo in metric name bgp_tcp_in_diconnect_count.
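
To illustrate the fatal vs non-fatal I/O error distinction behind the BMP input stream fix, here is a minimal sketch. It is not the code from this PR: `is_fatal` and `read_all` are hypothetical names, and the set of `ErrorKind` values treated as non-fatal is an assumption chosen for illustration.

```rust
use std::io::{Error, ErrorKind};

/// Hypothetical helper: decides whether an I/O error should end the read loop.
/// Which kinds count as non-fatal is an assumption made for this sketch.
fn is_fatal(err: &Error) -> bool {
    !matches!(
        err.kind(),
        ErrorKind::Interrupted | ErrorKind::WouldBlock | ErrorKind::TimedOut
    )
}

/// Walks a sequence of read results, skipping non-fatal errors and stopping
/// at the first fatal one. Returns the number of messages read and the kind
/// of the fatal error, if any.
fn read_all<I>(results: I) -> (usize, Option<ErrorKind>)
where
    I: IntoIterator<Item = Result<Vec<u8>, Error>>,
{
    let mut count = 0;
    for res in results {
        match res {
            Ok(_msg) => count += 1,
            Err(err) if is_fatal(&err) => return (count, Some(err.kind())),
            Err(_) => continue, // non-fatal: keep reading from the stream
        }
    }
    (count, None)
}

fn main() {
    // Simulated read results standing in for a real TCP stream.
    let results = vec![
        Ok(vec![1u8]),
        Err(Error::from(ErrorKind::Interrupted)),
        Ok(vec![2u8]),
        Err(Error::from(ErrorKind::ConnectionReset)),
        Ok(vec![3u8]),
    ];
    let (count, fatal) = read_all(results);
    println!("read {count} messages, stopped on {fatal:?}");
}
```

The point of the sketch is only that a non-fatal error is skipped rather than ending the loop, so later messages on the same stream are still processed.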

And this PR also:

Introduces Target::Test for obtaining metric snapshots in a form more conducive to testing (only compiled in test mode, not production mode).
Introduces fn AnyStatusReporter::metrics() for easier access to metrics for testing.
Extends Filterable::filter() to take a filtered_fn(SourceId) callback to make it easier to know when payloads are filtered out, especially when filtering on a collection of payloads rather than a single payload.
Marks each metric with a comment in the style // TEST STATUS: [ ] makes sense? [ ] passes tests? to track whether the metric has been assessed as making sense and whether it is covered by passing tests.
Where needed, code has been refactored to make it testable.
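
As a rough sketch of why a filtered_fn(SourceId) callback helps when filtering a collection of payloads, consider the following. This is not the actual Filterable::filter() signature: `Payload`, `filter_payloads` and the shape of `SourceId` are hypothetical stand-ins used only to show the idea of counting rejected payloads via a callback.

```rust
/// Hypothetical stand-in for the SourceId type named in the description.
#[derive(Clone, Debug, PartialEq, Eq)]
struct SourceId(String);

/// Hypothetical payload carrying the id of the source it came from.
#[derive(Clone, Debug, PartialEq)]
struct Payload {
    source_id: SourceId,
    value: u32,
}

/// Keeps the payloads accepted by `keep` and invokes `filtered_fn` once
/// for each rejected payload, passing the id of its source.
fn filter_payloads<K, F>(payloads: Vec<Payload>, keep: K, mut filtered_fn: F) -> Vec<Payload>
where
    K: Fn(&Payload) -> bool,
    F: FnMut(SourceId),
{
    let mut kept = Vec::new();
    for p in payloads {
        if keep(&p) {
            kept.push(p);
        } else {
            filtered_fn(p.source_id);
        }
    }
    kept
}

fn main() {
    let payloads = vec![
        Payload { source_id: SourceId("router-a".into()), value: 1 },
        Payload { source_id: SourceId("router-b".into()), value: 20 },
        Payload { source_id: SourceId("router-a".into()), value: 30 },
    ];
    // The callback is where a "message filtered" metric could be incremented.
    let mut num_filtered = 0usize;
    let kept = filter_payloads(payloads, |p| p.value < 10, |_source_id| num_filtered += 1);
    println!("kept {}, filtered {}", kept.len(), num_filtered);
}
```

With a callback, the caller learns about every rejected item even when the return value only contains the items that passed, which is what makes a per-source filtered count straightforward to maintain.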

13/78 TEST STATUS comments are "green", so 65/78 remain to do, although not all are equally important.

When run without --config the embedded copy of etc/rotonda.conf is used instead, and targets.proxy is removed if --proxy-destination is not supplied on the command line. If --bgp-listen or --bmp-listen are supplied on the command line they override the BMP and BGP listen settings in the embedded config.
- Allow BGP peers to be missing.

…ined in roto scripts loaded from a roto script directory.
- Refactors roto script file reading from the units/targets to the Manager.
- Checks script loading, compilation and valid filter names at config load time instead of at filter execution time.
- Add MVP behaviour tests (WIP).
- Introduces a FilterName type.

…ts_dir` dir `etc` doesn't exist. This change had several knock-on effects:
- Introduced RotoError::LoadError.
- Upgrade clap crate to fix bug with clap switching help on own line or not even when disabled.
- Upgrade toml crate to fix inconsistent line breaks in diagnostic dump of post-processed config TOML.
- Adjust position handling in `Marked` due to changes in serde_spanned pulled in by other crate upgrades.
- Added new cmd line arg --print-config-and-exit and introduced `Terminated` for use in cases where `ExitError` wrongly limits us to just an error exit rather than a normal exit.
- Various minor tweaks to the clap config for improved readability and usability.
- Log when exiting.

* Add a special metric output format for use by unit tests, and a callback fired on each VM exec that rejects the input (to enable easy counting of filtered out input messages), and remove support for metric types we don't need. Use the new callback to invoke the until now unused message_filtered() status reporter fn.
* Extend filter unit tests to show that the filtered message count metric now works.
* Remove the bmp_in_connection_count metric. Connections are handled by bmp_tcp_in while this metric is based on BMP initiation messages received, so it is (a) misleading and (b) doesn't work anyway: when the sys name changes so can the router id, so the changed metric would be in a different metric set and there'd be no effect on the previous metric.
* Follow changes in router id caused by a new sysName received by a subsequent BMP Initiation Message.
* Make test metrics easier to query, and add initial tests for the one and only remaining bmp-in metric. TODO: add a test showing bmp_in_num_invalid_bmp_messages increasing.
* Remove the footgun of having to initialize metrics per router, just ensure they are initialized on first use, otherwise metrics get lost.
* Add a test showing that invalid BMP messages cause the bmp_in_num_invalid_bmp_messages counter to increase.
* Test the custom allocator metrics.
* Refactor bgp_tcp_in to be testable and add first metric test.
* Sort test metric output for easier reading when inspecting the contents manually during development.
* Metric name typo correction.
* Mock the TcpStream as well so that we can accept a simulated connection to test the `bgp_tcp_in_connection_accepted_count` metric.
* Also test the connection lost and disconnected bgp-tcp-in metrics by using a mock BGP `Session`.
* Improved naming of RIB merge update statistic metrics.
* Introduce the concept of fatal vs non-fatal I/O errors and don't drop the BMP TCP input receiver on non-fatal errors (otherwise loss of the receiver causes the router read loop to abort anyway).
* FIX: Don't abort BMP TCP input reading on non-fatal I/O errors.
* Add a test for some of the bmp-tcp-in metrics.
* Import SeqCst directly to be consistent with other usages and to be less verbose.
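
The "initialized on first use" and "sort test metric output" points above can be sketched as follows. This is an assumption-laden illustration, not the implementation in this PR: `Metrics`, `RouterMetrics` and `snapshot` are hypothetical, and `router_metrics` only borrows the name of the fn mentioned in the commit log, not its real signature or return type.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use std::sync::{Arc, Mutex};

/// Hypothetical per-router counters; the field name mirrors the
/// bmp_in_num_invalid_bmp_messages metric mentioned above.
#[derive(Default)]
struct RouterMetrics {
    num_invalid_bmp_messages: AtomicUsize,
}

/// Hypothetical metrics registry keyed by router id.
#[derive(Default)]
struct Metrics {
    routers: Mutex<HashMap<String, Arc<RouterMetrics>>>,
}

impl Metrics {
    /// Returns the metrics for `router_id`, creating them on first use so
    /// that callers never have to initialize a router explicitly.
    fn router_metrics(&self, router_id: &str) -> Arc<RouterMetrics> {
        let mut routers = self.routers.lock().unwrap();
        Arc::clone(routers.entry(router_id.to_string()).or_default())
    }

    /// Snapshot sorted by router id, for easier reading in test output.
    fn snapshot(&self) -> Vec<(String, usize)> {
        let routers = self.routers.lock().unwrap();
        let mut out: Vec<_> = routers
            .iter()
            .map(|(id, m)| (id.clone(), m.num_invalid_bmp_messages.load(SeqCst)))
            .collect();
        out.sort();
        out
    }
}

fn main() {
    let metrics = Metrics::default();
    // No per-router setup step: the first access creates the counters.
    metrics.router_metrics("router-b").num_invalid_bmp_messages.fetch_add(1, SeqCst);
    metrics.router_metrics("router-a").num_invalid_bmp_messages.fetch_add(2, SeqCst);
    println!("{:?}", metrics.snapshot());
}
```

Creating the entry inside the accessor means an update for a previously unseen router id cannot be lost, and a lookup for such an id cannot fail because the entry is missing.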

ximon18 added 30 commits September 13, 2023 12:44

Document the bgp-tcp-in unit settings, and cleanup dangling footnotes…

b832532

…, fix typos and make various improvements to the config file text and structure.

BGP, not BMP.

88ead40

Typo.

4753271

bmp-asn-filter.roto fixes:

505486e

- Add missing semi-colon. - Filter out not in.

Better message on roto script load failure.

eabc6b7

Make http_listen overridable as well for the MVP.

43e7142

Log at info rather than debug level that we are listening for HTTP co…

94a03f9

…nnections on a particular socket.

Clippy.

20d2879

Oops, invert the logic of the default bmp asn filter.

64f7d7c

FIX: Store HTTP provider relative base URLs so that the router info p…

b4c84c4

…age can link to the actual RIB query HTTP API rather than incorrectly assume a fixed path of /prefixes/.

Add missing changes.

dd1a128

Add missing import.

67b76d6

Merge branch 'blocks-wip' into add-bgp-tcp-in-settings-to-rotonda-conf

b59fe39

Merge branch 'add-bgp-tcp-in-settings-to-rotonda-conf' into add-defau…

321d2f0

…lt-mvp-config

Merge branch 'add-default-mvp-config' into fix-broken-router-info-pag…

5096f6c

…e-prefixes-query-link

Merge branch 'blocks-wip' into add-default-mvp-config

8dabed6

Fix copy-pasted misleading fn name.

3b5c78b

Cleanup, refactoring and Clippy suggestions.

c4af03a

Remove possibly incorrect usage of tokio-metrics instrument.

6f396ec

And mark the process_metrics as unused, but leave them there for now …

a57ec2a

…in case I return to this later.

Cleanup, refactoring and Clippy suggestions.

77884bb

Remove unnecessary generic introduced recently in this PR.

f66a9f5

More user friendly error message.

e4d5e55

Add TODO comment.

d1f6f98

Less childish name.

312480a

Permit filters to be missing in MVP mode, as the user may not have th…

4b6cf0c

…e required .roto files, e.g. if running from `cargo install`. Cleanup config initialisation and standard variable naming.

Remove errant blank line in config.

9b543e7

ximon18 added 7 commits October 5, 2023 18:54

s/think/thin/

a7e355d

Also test the connection lost and disconnected bgp-tcp-in metrics by …

f918b9c

…using a mock BGP `Session`.

Merge branch 'blocks-wip' into fix-broken-router-info-page-prefixes-q…

8aa601b

…uery-link

Merge branch 'blocks-wip' into fix-broken-router-info-page-prefixes-q…

48fd954

…uery-link

Merge branch 'blocks-wip' into fix-broken-metrics

d881c89

Remove accidentally re-added file.

2ea5c25

Fix broken links in bmp-in HTML interface, and name the RIB instances…

2de35f7

… whose APIs are linked to.

ximon18 changed the base branch from blocks-wip to fix-broken-bmp-routers-html-links October 12, 2023 07:45

ximon18 added 4 commits October 12, 2023 09:48

Merge branch 'fix-broken-router-info-page-prefixes-query-link' into f…

1b30295

…ix-broken-metrics

Reintroduce change lost during merge.

e6ae4e5

Fix merged change to follow change to router_metrics() fn return type.

3db6f53

Merge branch 'fix-broken-bmp-routers-html-links' into fix-broken-metrics

c30420a

Base automatically changed from fix-broken-bmp-routers-html-links to blocks-wip October 12, 2023 07:55

ximon18 added 7 commits October 12, 2023 10:03

Improved naming of RIB merge update statistic metrics.

75345a5

Fix: completely follow the rename.

b0abd6e

Rename fn to make it clearer that its results are only for now, and l…

0cda43b

…ater results may be different if called again.

Introduce the concept of fatal vs non-fatal I/O errors and don't drop…

f423b75

… the BMP TCP input receiver on non-fatal errors (otherwise loss of the receiver causes the router read loop to abort anyway).

FIX: Don't abort BMP TCP input reading on non-fatal I/O errors.

90f4844

Add a test for some of the bmp-tcp-in metrics.

e636a71

Test one more bmp-tcp-in metric.

f42e5a8

ximon18 changed the title ~~Fix broken metrics~~ Fix broken metrics part 1 Oct 12, 2023

ximon18 requested review from DRiKE and density215 October 12, 2023 13:15

ximon18 marked this pull request as ready for review October 12, 2023 13:15

density215 reviewed Oct 19, 2023

src/units/bmp_in/http/router_info/request.rs (outdated, resolved)

ximon18 added 2 commits October 25, 2023 21:51

Merge branch 'blocks-wip' into fix-broken-metrics

8aa7755

Import SeqCst directly to be consistent with other usages and to be l…

01ffb8f

…ess verbose.

ximon18 merged commit e250a6d into blocks-wip Oct 25, 2023
1 check failed

ximon18 deleted the fix-broken-metrics branch October 25, 2023 20:10
