[Testnet] Setup prometheus server #18

Open
thomas-nguy opened this issue Jul 10, 2023 · 14 comments

@thomas-nguy
Member

thomas-nguy commented Jul 10, 2023

  • Set up a Prometheus server for metrics gathering
  • Configure the zkserver, explorer, and prover servers (when ready) to send metrics to the server
@thomas-nguy
Member Author

We can use 10.202.4.155.

JayT106 self-assigned this Jul 10, 2023
@JayT106
Contributor

JayT106 commented Jul 10, 2023

Does "explorer" here mean the L2 explorer in #13?

@JayT106
Contributor

JayT106 commented Jul 10, 2023

@thomas-nguy
Member Author

Yes, it is the explorer in #13.

@thomas-nguy
Member Author

Thanks. Could you also install Grafana on this machine and configure it so that we can visualize the Prometheus metrics?
https://prometheus.io/docs/visualization/grafana/

@JayT106
Contributor

JayT106 commented Jul 11, 2023

sure

@thomas-nguy
Member Author

thomas-nguy commented Jul 11, 2023

It seems we already have a Prometheus server and Grafana set up internally.

I'll change the requirements to:

  • Connect the zkserver to our Prometheus server. By default it uses pull mode (https://github.com/cronos-labs/zksync-era/blob/internal/core/bin/zksync_core/src/lib.rs#L262), and the metrics exporter listens on port 3312.

  • Investigate which metrics are collected and which of them can help meet these 3 basic requirements:
    1/ Get an alert when the server is down (we can use the health check running on port 3071)
    2/ Get an alert when the eth_sender wallet is almost empty
    3/ Get an alert when the circuit breaker is triggered (the server is unable to run)

  • Based on the collected metrics, draft some ideas for dashboards we could build that might be useful.
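Pull mode means Prometheus fetches the metrics itself with a periodic HTTP GET against the exporter, so one way to start the investigation is to do the same by hand and list which metric names come back. A minimal Python sketch, assuming the exporter answers on a `/metrics` path (the thread only gives port 3312); `scrape` and `metric_names` are hypothetical helper names:

```python
import urllib.request

# Port taken from the thread; the "/metrics" path is an assumption.
EXPORTER_PORT = 3312
EXPORTER_PATH = "/metrics"


def scrape(host: str, port: int = EXPORTER_PORT, path: str = EXPORTER_PATH,
           timeout: float = 5.0) -> str:
    """Fetch the exposition text with a plain HTTP GET, as a pull-mode scrape does."""
    url = f"http://{host}:{port}{path}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(text: str) -> set[str]:
    """Return the distinct metric names found in Prometheus text exposition format."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the label block "{...}" or at the space before the value.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names
```

Calling `sorted(metric_names(scrape("<zkserver-host>")))` with the real host gives a quick inventory of what the zkserver exposes, which is the starting point for mapping metrics to the three requirements.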

@calvinaco @ivanslwong-crypto-com @henrywong-crypto

@thomas-nguy
Member Author

fee_monitor_balances{account="fee_account_l1"} 0.5
fee_monitor_balances{account="operator_l1"} 1.7533511106980662
fee_monitor_balances{account="fee_account_l2"} 1.8844532495

operator_l1 gives you the balance of the eth_sender wallet.
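To alert on the eth_sender wallet, its balance has to be picked out of these lines by the `account` label. A minimal Python sketch, assuming the exact line shape shown above (a single `account` label and no trailing timestamp); `parse_balances` is a hypothetical name:

```python
import re

# Matches lines such as:
#   fee_monitor_balances{account="operator_l1"} 1.7533511106980662
# Assumes a single `account` label and no timestamp after the value.
BALANCE_LINE = re.compile(
    r'^fee_monitor_balances\{account="(?P<account>[^"]+)"\}\s+(?P<value>\S+)$'
)


def parse_balances(text: str) -> dict[str, float]:
    """Map each account label to its balance from fee_monitor_balances lines."""
    balances = {}
    for line in text.splitlines():
        match = BALANCE_LINE.match(line.strip())
        if match:
            balances[match.group("account")] = float(match.group("value"))
    return balances


sample = """\
fee_monitor_balances{account="fee_account_l1"} 0.5
fee_monitor_balances{account="operator_l1"} 1.7533511106980662
fee_monitor_balances{account="fee_account_l2"} 1.8844532495
"""

# operator_l1 is the eth_sender wallet, per the comment above.
eth_sender_balance = parse_balances(sample)["operator_l1"]
```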

@calvinaco
Collaborator

calvinaco commented Jul 13, 2023

> fee_monitor_balances{account="fee_account_l1"} 0.5
> fee_monitor_balances{account="operator_l1"} 1.7533511106980662
> fee_monitor_balances{account="fee_account_l2"} 1.8844532495
>
> operator_l1 gives you the balance for the eth_sender wallet

@thomas-nguy @JayT106 What threshold should count as a low balance?

@thomas-nguy
Member Author

Maybe below 0.5, to be conservative?
For testnet, 0.1 is fine.
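The thresholds proposed here can be captured in a small check. A minimal Python sketch; the network names used as keys, and treating 0.5 as the non-testnet value, are assumptions for illustration:

```python
# Thresholds from the discussion: 0.5 to be conservative, 0.1 for testnet.
# The "mainnet"/"testnet" keys are assumed names, not taken from any config.
LOW_BALANCE_THRESHOLDS = {"mainnet": 0.5, "testnet": 0.1}


def is_low_balance(balance: float, network: str) -> bool:
    """Return True when the eth_sender (operator_l1) balance is below the threshold."""
    try:
        threshold = LOW_BALANCE_THRESHOLDS[network]
    except KeyError:
        raise ValueError(f"unknown network: {network!r}") from None
    return balance < threshold
```

In a Prometheus alerting rule, the same testnet condition would be the PromQL expression `fee_monitor_balances{account="operator_l1"} < 0.1`.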

@calvinaco
Collaborator

> 3/ Get alert when the circuit breaker is triggered (server is unable to run)

Which metric should I use to detect that "the circuit breaker is triggered"?

@thomas-nguy
Member Author

> which metrics should I use to monitor " the circuit breaker is triggered"?

I checked, but unfortunately there is no metric for it.

I guess that if the circuit breaker is triggered, the server will shut down and be unable to restart because of the exception.

The health check can be a good indicator that the circuit breaker has been triggered:
-> if the server is down, look at what the health check reports.

I guess we can remove this requirement from the alerting system.

@thomas-nguy
Member Author

Health check:

http://10.202.3.175:3071/health
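Because there is no circuit-breaker metric, probing this endpoint is the fallback signal discussed above: if the server has shut down, the request fails. A minimal Python sketch, assuming that an HTTP 200 response means healthy (the response body format is not described in this thread); `is_healthy` is a hypothetical name:

```python
import urllib.request

# URL from the thread; treating "HTTP 200" as healthy is an assumption.
HEALTH_URL = "http://10.202.3.175:3071/health"


def is_healthy(url: str = HEALTH_URL, timeout: float = 3.0) -> bool:
    """Return True if the health endpoint answers HTTP 200 within the timeout.

    A refused connection, a timeout, or an error status all count as "down",
    which may also mean the circuit breaker stopped the server.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx statuses) and socket timeouts are all OSError subclasses.
        return False
```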

@calvinaco
Collaborator

calvinaco commented Jul 28, 2023

@thomas-nguy I have already implemented:

  • a node-down check (more precisely, a check for the Prometheus service being down)
  • a low-balance alert

Both send alerts to #blockchain-scaling-team.

The alerting part can be treated as done.
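The thread does not say how the node-down check was built. One common approach, sketched here as an assumption, relies on the `up` series that Prometheus records for every scrape target (1 if the last scrape succeeded, 0 if it failed), i.e. the PromQL expression `up == 0`. The Python below reads the JSON that the Prometheus HTTP API returns for an instant query such as `/api/v1/query?query=up` and lists the targets that are down; `down_targets`, the sample response, and its `instance`/`job` values are hypothetical:

```python
import json


def down_targets(api_response: dict) -> list[str]:
    """Return the `instance` label of every `up` series whose value is 0."""
    if api_response.get("status") != "success":
        raise ValueError(f"query failed: {api_response.get('error')}")
    down = []
    # An instant-vector result is a list of {"metric": {...}, "value": [ts, "v"]}.
    for series in api_response["data"]["result"]:
        _timestamp, value = series["value"]
        if float(value) == 0:
            down.append(series["metric"].get("instance", "<unknown>"))
    return down


# Hypothetical response body: one unreachable target, one healthy target.
SAMPLE = json.loads("""
{"status": "success",
 "data": {"resultType": "vector",
          "result": [
            {"metric": {"__name__": "up", "instance": "zkserver:3312", "job": "zkserver"},
             "value": [1690000000.0, "0"]},
            {"metric": {"__name__": "up", "instance": "explorer:9100", "job": "explorer"},
             "value": [1690000000.0, "1"]}]}}
""")
```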

5 participants