helm test fails to collect logs from test job pod #662

Open
maxlemieux opened this issue Jul 30, 2024 · 5 comments

Comments

@maxlemieux

When I run helm test --logs as described in the documentation, collecting logs from the test job pod fails with an error.

helm test -n grafana grafana-k8s-monitoring --logs

Result:

Error: unable to get pod logs for test-grafana-k8s-monitoring: pods "test-grafana-k8s-monitoring" not found

Checking events suggests the pod has a slightly different name:

30m         Normal   SuccessfulCreate   job/test-grafana-k8s-monitoring         Created pod: test-grafana-k8s-monitoring-ss4p9
30m         Normal   Completed          job/test-grafana-k8s-monitoring         Job completed

I would expect the test routine to collect logs regardless of what the generated pod is called.

Cluster info:

  • Google Kubernetes Engine (standard nodes)
  • Kubernetes v1.29.6-gke.1038001
@maxlemieux
Author

Just to confirm: the logs are actually there. Example of retrieving them:

k logs -n grafana test-grafana-k8s-monitoring-ss4p9
Running PromQL query: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/api/v1/query?query=up{cluster="my-cluster"}...
Running PromQL query: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/api/v1/query?query=alloy_build_info{cluster="my-cluster"}...
Running PromQL query: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/api/v1/query?query=kubernetes_build_info{cluster="my-cluster"}...
Running PromQL query: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/api/v1/query?query=machine_memory_bytes{cluster="my-cluster"}...
Running PromQL query: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/api/v1/query?query=kube_node_info{cluster="my-cluster"}...
Running PromQL query: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/api/v1/query?query=node_exporter_build_info{cluster="my-cluster"}...
Running PromQL query: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/api/v1/query?query=opencost_build_info{cluster="my-cluster"}...
Running PromQL query: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/api/v1/query?query=grafana_kubernetes_monitoring_build_info{cluster="my-cluster"}...
All queries passed!

@petewall
Collaborator

The issue is that the test runs in a Kubernetes Job, not a Pod, and helm test cannot get logs from Jobs (helm/helm#11236). Why not just make it a Pod? Because then it always fails on the first run, since there are no restart policies. This test only passes after Alloy has had a chance to gather metrics, logs, traces, and profiles and send them to their data stores, which is often only true after a few minutes of runtime.
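
In the meantime, the logs can be pulled from the Job's pod directly, without knowing the generated suffix. This is a minimal sketch, assuming kubectl access to the namespace and relying on the job-name label that the Job controller puts on the pods it creates. The helper names job_selector and job_logs are made up for illustration and are not part of Helm or kubectl.

```shell
# Hypothetical helpers (not part of Helm or kubectl).

# Build the label selector that matches pods created by a given Job.
# The Job controller labels its pods with job-name=<job name>.
job_selector() {
  printf 'job-name=%s' "$1"
}

# Print the full logs of every pod created by the Job.
# --tail=-1 is needed because kubectl limits output to the last
# 10 lines per pod when a selector is used.
job_logs() {
  namespace="$1"
  job="$2"
  kubectl logs --namespace "$namespace" \
    --selector "$(job_selector "$job")" \
    --tail=-1
}

# For the release in this issue:
#   job_logs grafana test-grafana-k8s-monitoring
```

Selecting by label rather than by name means the command keeps working if the Job retries and creates more than one pod.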

@petewall
Collaborator

So, options:

  1. Drop this test. It's only truly useful in CI/CD, because anything else requires a read path to the databases (the access policy tokens created by Grafana Cloud in k8s-monitoring only give metrics:read scopes). I like the idea, but having it be the built-in helm test is a bit much. Helm test would be replaced by something else that simply validates that the pods are online.
    CI/CD would use a custom test routine, rather than helm test directly.
  2. Convert to a Pod, deal with the failures in CI/CD somehow.
  3. Keep as-is

I'm leaning towards 1, personally.
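
For option 1, the pod-health check could be roughly as small as the sketch below. It assumes the release's pods carry the conventional app.kubernetes.io/instance=<release> label, which I have not verified for every pod in this chart. wait_for_release_pods is a hypothetical helper name, and the 120s default timeout is an arbitrary choice.

```shell
# Hypothetical helper (not part of the chart): block until every pod
# belonging to a release reports the Ready condition, or time out.
wait_for_release_pods() {
  namespace="$1"
  release="$2"
  timeout="${3:-120s}"   # arbitrary default, override as third argument

  # Assumption: pods are labelled app.kubernetes.io/instance=<release>.
  kubectl wait pods \
    --namespace "$namespace" \
    --selector "app.kubernetes.io/instance=$release" \
    --for=condition=Ready \
    --timeout="$timeout"
}

# For the release in this issue:
#   wait_for_release_pods grafana grafana-k8s-monitoring
```

One caveat: pods that have already run to completion (such as a finished test Job pod) never become Ready, so if they match the same label the selector would need to exclude them.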

@maxlemieux
Author

Option 1 sounds good to me, since most installation problems will result in unhealthy or unavailable pods.

@petewall
Collaborator

Most likely I'll keep things as they are for now and do something smarter in 2.0.

I do want helm test to do valid work; I'm just trying to capture what that might be.
