feat: Collect metrics for direct connections & add opt-in push metrics #2805

Open: 10 commits to merge into base branch main

@matheus23 (Contributor) commented Oct 15, 2024

New metrics

  1. nodes_contacted: Number of nodes we have attempted to contact.
  2. nodes_contacted_directly: Number of nodes we have managed to contact directly.
  3. connection_handshake_success: Number of connections with a successful handshake.
  4. connection_became_direct: Number of connections that became direct.

This sets us up to compute some core metrics regarding direct connections:

  1. nodes_contacted_directly / nodes_contacted: Ratio of node pairs that manage to connect directly.
  2. connection_became_direct / connection_handshake_success: Ratio of connections that can communicate directly.

Both of these ratios give us an idea of the share of network conditions that allow us to avoid relay traffic.
Our hypothesis is that this number is around 90% and that it's similar for both ratios, but it's worth capturing both to see if we're right.

We also already capture enough information to get numbers on direct vs. relayed traffic volume:

recv_data_relay / (recv_data_ipv4 + recv_data_ipv6): Ratio of relay data vs. direct data.

This metric will help us explain how much bandwidth you can save by using iroh-net.
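
As a rough illustration of how these three ratios could be derived from a snapshot of the counters, here is a minimal Rust sketch. `CounterSnapshot`, `ratio`, and the method names are hypothetical and not part of iroh-net. The counters are modelled as plain `u64` values, which is an assumption about how a snapshot would be read out, not a description of the actual `MagicsockMetrics` fields.

```rust
/// Hypothetical snapshot of the counters named in this PR.
/// Assumption: values are read out as plain `u64`s.
#[derive(Debug, Default, Clone, Copy)]
pub struct CounterSnapshot {
    pub nodes_contacted: u64,
    pub nodes_contacted_directly: u64,
    pub connection_handshake_success: u64,
    pub connection_became_direct: u64,
    pub recv_data_relay: u64,
    pub recv_data_ipv4: u64,
    pub recv_data_ipv6: u64,
}

/// Returns `numerator / denominator`, or `None` when the denominator is zero.
fn ratio(numerator: u64, denominator: u64) -> Option<f64> {
    if denominator == 0 {
        None
    } else {
        Some(numerator as f64 / denominator as f64)
    }
}

impl CounterSnapshot {
    /// nodes_contacted_directly / nodes_contacted
    pub fn direct_node_ratio(&self) -> Option<f64> {
        ratio(self.nodes_contacted_directly, self.nodes_contacted)
    }

    /// connection_became_direct / connection_handshake_success
    pub fn direct_connection_ratio(&self) -> Option<f64> {
        ratio(self.connection_became_direct, self.connection_handshake_success)
    }

    /// recv_data_relay / (recv_data_ipv4 + recv_data_ipv6)
    pub fn relay_to_direct_data_ratio(&self) -> Option<f64> {
        let direct = self.recv_data_ipv4.saturating_add(self.recv_data_ipv6);
        ratio(self.recv_data_relay, direct)
    }
}
```

Returning `None` for a zero denominator avoids reporting NaN or a misleading value before any node has been contacted or any direct data has been received.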

Opt-in Push Metrics Exporter

The iroh::NodeConfig now has a section metrics_exporter_config. This section looks like this:

[metrics_exporter_config]
interval = 5 # number, push interval in seconds
endpoint = "" # string, URL of endpoint
service_name = "" # string, an identifier for the particular service/application (e.g. iroh-drop) (?)
instance_name = "" # string, not sure
username = "" # string, optional
password = "" # string, required
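
To make the shape of this section concrete, here is a hedged Rust sketch of a struct mirroring these keys. `ExporterConfigSketch`, `push_interval`, and `validate` are hypothetical names chosen for illustration; they are not the types this PR adds to `iroh::NodeConfig`. The field types and the validation rules are assumptions inferred only from the comments in the config snippet above.

```rust
use std::time::Duration;

/// Hypothetical mirror of the `[metrics_exporter_config]` section.
/// Field types are assumptions based on the comments in the snippet above.
#[derive(Debug, Clone, PartialEq)]
pub struct ExporterConfigSketch {
    /// Seconds between two pushes.
    pub interval: u64,
    /// URL of the endpoint that receives the metrics.
    pub endpoint: String,
    /// Identifier for the service or application, e.g. "iroh-drop".
    pub service_name: String,
    /// Identifier for this instance (semantics still open in the PR).
    pub instance_name: String,
    /// Marked optional in the PR description.
    pub username: Option<String>,
    /// Marked as required in the PR description.
    pub password: String,
}

impl ExporterConfigSketch {
    /// The push interval as a `Duration`.
    pub fn push_interval(&self) -> Duration {
        Duration::from_secs(self.interval)
    }

    /// Assumed sanity checks; the actual exporter may validate differently.
    pub fn validate(&self) -> Result<(), String> {
        if self.interval == 0 {
            return Err("interval must be greater than zero".to_string());
        }
        if self.endpoint.trim().is_empty() {
            return Err("endpoint must not be empty".to_string());
        }
        if self.password.is_empty() {
            return Err("password must not be empty".to_string());
        }
        Ok(())
    }
}
```

Modelling `username` as `Option<String>` is one way to express "optional". An empty string, as in the snippet above, would be another.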

Breaking Changes

  • MagicsockMetrics is now #[non_exhaustive]. This allows us to add
    more metrics without breaking backwards compatibility in the future. The
    struct is not meant to be constructed outside of iroh-net anyway.
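
For readers unfamiliar with the attribute, the following sketch shows what `#[non_exhaustive]` permits and forbids for code in other crates. The struct here is a simplified stand-in with only two fields and plain `u64` counters (an assumption, not the real definition). `record_contact` and `contacted` are hypothetical helpers.

```rust
/// Simplified stand-in for the metrics struct. The real one has more fields,
/// and modelling the counters as `u64` is an assumption.
#[non_exhaustive]
#[derive(Debug, Default, Clone, PartialEq)]
pub struct MagicsockMetrics {
    pub nodes_contacted: u64,
    pub nodes_contacted_directly: u64,
}

/// Still allowed from another crate: obtain a value (here via `Default`)
/// and read or update its public fields.
pub fn record_contact(metrics: &mut MagicsockMetrics, direct: bool) {
    metrics.nodes_contacted += 1;
    if direct {
        metrics.nodes_contacted_directly += 1;
    }
}

/// Patterns written in other crates must end in `..`, so adding a field
/// later does not break them.
pub fn contacted(metrics: &MagicsockMetrics) -> u64 {
    let MagicsockMetrics { nodes_contacted, .. } = metrics;
    *nodes_contacted
}

// Rejected by the compiler when written in a *different* crate, because a
// struct literal would stop compiling as soon as a field is added:
// let m = MagicsockMetrics { nodes_contacted: 0, nodes_contacted_directly: 0 };
```

Inside the defining crate the attribute has no effect, which is why this single-file sketch compiles even though it defines and uses the struct in the same place.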

Notes & open questions

  • When do we merge this? (I'd say we merge sooner rather than later. Otherwise this will churn with the iroh refactoring.)
  • Who will opt in? Will we add the opt-in to iroh-drop and test this ourselves?
  • Do we talk to partners to see if they're interested in asking their users whether they will opt in?
  • Do we protect our metrics collection from abuse? (I honestly doubt we need to do that. If it happens, we'll see it in the data and treat the data as poisoned until we fix it.)

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
    (We tested this manually at the retreat at least. We don't have metrics tests at the moment.)
  • All breaking changes documented.

Arqu and others added 3 commits October 7, 2024 15:33
---------

Co-authored-by: Ruediger Klaehn <[email protected]>
## Description

New metrics:
- `nodes_contacted`
- `nodes_contacted_directly`

## Change checklist

- [x] Self-review.
- [x] Documentation updates following the [style
guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text),
if relevant.
- ~~[ ] Tests if relevant.~~
- ~~[ ] All breaking changes documented.~~

---------

Co-authored-by: Diva M <[email protected]>
Co-authored-by: Divma <[email protected]>
Co-authored-by: Floris Bruynooghe <[email protected]>
## Description

Adds four new magic socket metrics:

1. `nodes_contacted`: Number of nodes we have attempted to contact.
2. `nodes_contacted_directly`: Number of nodes we have managed to
contact directly.
3. `connection_handshake_success`: Number of connections with a
successful handshake.
4. `connection_became_direct`: Number of connections that became direct.

This sets us up to compute some core metrics regarding direct connections:
1. `nodes_contacted_directly` / `nodes_contacted`: Ratio of node pairs
that manage to connect directly.
2. `connection_became_direct` / `connection_handshake_success`: Ratio of
connections that can communicate directly.

Both of these ratios give us an idea of the share of network
conditions that allow us to avoid relay traffic.
Our hypothesis is that this number is around 90% and that it's similar
for both ratios, but it's worth capturing both to see if we're
right.

We also already capture enough information to get numbers on direct vs.
relayed traffic volume:

`recv_data_relay` / (`recv_data_ipv4` + `recv_data_ipv6`): Ratio of
relay data vs. direct data.

This metric will help us explain how much bandwidth you can save by
using iroh-net.

## Breaking Changes

- `MagicsockMetrics` is now `#[non_exhaustive]`. This allows us to add
more metrics without breaking backwards compatibility in the future. The
struct is not meant to be constructed outside of `iroh-net` anyway.

## Change checklist

- [x] Self-review.
- [x] Documentation updates following the [style
guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text),
if relevant.
- [x] Tests if relevant.
  No tests, but we verified this works manually at the retreat.
- [x] All breaking changes documented.
@matheus23 matheus23 self-assigned this Oct 15, 2024

github-actions bot commented Oct 15, 2024

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/2805/docs/iroh/

Last updated: 2024-10-18T07:47:31Z

github-actions bot commented Oct 17, 2024

Netsim report & logs for this PR have been generated and are available at: LOGS
This report will remain available for 3 days.

Last updated for commit: 8a196f2
