
The agent doesn’t send its own logs #944

Closed
chisuzume opened this issue Oct 4, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@chisuzume

chisuzume commented Oct 4, 2023

What happened?

Description

Field set:
logsEngine: otel

Setting the value values.logsCollection.containers.excludeAgentLogs: false doesn’t work.
The agent sends all logs except logs from: /var/log/pods/*/otel-collector/*.log.

After that I tried another option: I copied the default filelog receiver and created an additional one with include: [/var/log/pods/otel*/otel-collector/*.log]. I noticed that operators[14] doesn't work in that receiver:

- field: resource["com.splunk.sourcetype"]
  type: add
  value: EXPR("kube:container:"+resource["k8s.container.name"])

If you remove this field or replace it with

- from: resource["k8s.container.name"]
  to: resource["com.splunk.sourcetype"]
  type: copy

then the agent logs are sent to Splunk.

But I would like to receive logs with correct sourcetype.
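For reference, the failing `add` operator above is only meant to derive the sourcetype from the container name. Here is a minimal Python sketch of that mapping (the `sourcetype_for` helper and the `resource` dict are hypothetical illustrations of the intent, not collector code):

```python
# Hypothetical illustration of what the stanza `add` operator with EXPR
# is intended to compute: a Splunk sourcetype derived from the
# k8s.container.name attribute on the log's resource.
def sourcetype_for(resource: dict) -> str:
    return "kube:container:" + resource["k8s.container.name"]

resource = {"k8s.container.name": "otel-collector"}
print(sourcetype_for(resource))  # kube:container:otel-collector
```

With this mapping, the agent's own logs would arrive with sourcetype kube:container:otel-collector, which is what the Splunk search later in this thread looks for.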

I also noticed that I can't change resource["com.splunk.sourcetype"] at all.
It is not possible to change this field using the attributes, resource, or transform processors: when I set a value with a string or an expression, no logs arrive.
Changing the sourcetype also fails if you specify it in

filelog:
  resource:
    com.splunk.sourcetype: "custom_value"

as described by the link

Expected Result

logs from agent (/var/log/pods/*/otel-collector/*)

Actual Result

No logs

Chart version

0.85.0

Environment information

Environment

Cloud: "EKS"
k8s version: 1.27.0

Chart configuration

No response

Log output

No response

Additional context

No response

@chisuzume chisuzume added the bug Something isn't working label Oct 4, 2023
@matthewmodestino

I have never had issues with setting excludeAgentLogs to false.

You should not need to create your own filelog receiver; that gets into advanced pipeline building, which is error-prone and could explain the operator and processor not working.

I will test that version to confirm.

@chisuzume
Author

chisuzume commented Oct 5, 2023

I have never had issues with setting excludeAgentLogs to false.

You should not need to create your own filelog receiver; that gets into advanced pipeline building, which is error-prone and could explain the operator and processor not working.

I will test that version to confirm.

@matthewmodestino A custom receiver and pipeline were created later, as a separate option.

@omrozowicz-splunk
Contributor

I've just tested this with 0.85.0 (though not on EKS) and it works fine.

@chisuzume at the beginning your values.yaml contained just the default options and values.logsCollection.containers.excludeAgentLogs: false? No custom changes?

@chisuzume
Author

@omrozowicz-splunk, no special changes, but clusterReceiver is enabled and the splunk_hec/platform_logs exporter is used in the logs pipeline.

@omrozowicz-splunk
Contributor

@omrozowicz-splunk, no special changes, but clusterReceiver is enabled and the splunk_hec/platform_logs exporter is used in the logs pipeline.

splunk_hec/platform_logs is something enabled by default. Can you send your anonymized values.yaml? We've tested it on EKS as well and it also works fine.

@matthewmodestino

matthewmodestino commented Oct 5, 2023

@matthewmodestino A custom receiver and pipeline were created later, as a separate option.

Right, which was also unsuccessful, so let's focus on how you are deploying the Helm chart and whether your settings are actually making it into the ConfigMap. If not, something is wrong with your values.yaml.

@chisuzume
Author

Hi @matthewmodestino @omrozowicz-splunk, I tried simplifying values.yaml. The ordinary container logs were received, as before, but the agent's own logs were not. I mean the logs you see when running kubectl logs -n <ns> <otel collector pod name>, i.e. the logs from /var/log/pods/*/otel-collector/*.log, not /var/log/pods/*/migration-checkpoint/*.log.

values.yaml:

clusterName: <CUSTOM>
logsEngine: otel
splunkPlatform:
  endpoint: <CUSTOM>
  index: <CUSTOM>
secret:
  create: <CUSTOM>
  name: <CUSTOM>
  validateSecret: <CUSTOM>
cloudProvider: "aws"
distribution: "eks"
image:
  otelcol:
    repository: <CUSTOM>
logsCollection:
  containers:
    enabled: true
    excludeAgentLogs: false

@matthewmodestino

matthewmodestino commented Oct 13, 2023

Perhaps it's how you are searching for them? Are you sending to Splunk Enterprise on-prem or Splunk Cloud? Testing again...

@atoulme
Contributor

atoulme commented Oct 13, 2023

Nothing in your values.yaml points to misconfiguration. We have tests covering this functionality, showing we can get agent logs when excludeAgentLogs is set to false. See https://github.com/signalfx/splunk-otel-collector-chart/blob/main/test/k8s_agent_pod_tests/test_agent_correctness_tests.py#L67 for the query we use to search for them.

@atoulme atoulme changed the title The agent doesn’t send it’s own logs The agent doesn’t send its own logs Oct 13, 2023
@matthewmodestino

matthewmodestino commented Oct 13, 2023

Tested on 0.85.0 with the following config:

values.yaml

clusterName: "mattymo-test"
endpoint: "https://foo.bar.baz:8088/services/collector"
token: "00000000-0000-0000-0000-000000000000"
insecureSkipVerify: true
logsEngine: otel
containerRuntime: "containerd"
excludeAgentLogs: false

helm -n otel install mattymo-test -f values.yaml splunk-otel-collector-chart/splunk-otel-collector --version 0.85.0
kubectl get pods -w -n otel
NAME                                             READY   STATUS            RESTARTS   AGE
mattymo-test-splunk-otel-collector-agent-ctr4g   0/1     PodInitializing   0          33s
mattymo-test-splunk-otel-collector-agent-ctr4g   0/1     Running           0          34s
mattymo-test-splunk-otel-collector-agent-ctr4g   1/1     Running           0          35s
kubectl -n otel logs -f mattymo-test-splunk-otel-collector-agent-ctr4g 
2023/10/13 23:20:33 settings.go:399: Set config to [/conf/relay.yaml]
2023/10/13 23:20:33 settings.go:452: Set ballast to 165 MiB
2023/10/13 23:20:33 settings.go:468: Set memory limit to 450 MiB
2023-10-13T23:20:34.017Z	info	service/telemetry.go:84	Setting up own telemetry...
2023-10-13T23:20:34.017Z	info	service/telemetry.go:201	Serving Prometheus metrics	{"address": "0.0.0.0:8889", "level": "Basic"}
2023-10-13T23:20:34.018Z	info	kube/client.go:107	k8s filtering	{"kind": "processor", "name": "k8sattributes", "pipeline": "logs", "labelSelector": "", "fieldSelector": "spec.nodeName=so1"}
2023-10-13T23:20:34.018Z	info	[email protected]/memorylimiter.go:102	Memory limiter configured	{"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "limit_mib": 450, "spike_limit_mib": 90, "check_interval": 2}
2023-10-13T23:20:34.036Z	info	service/service.go:138	Starting otelcol...	{"Version": "v0.85.0", "NumCPU": 4}
2023-10-13T23:20:34.036Z	info	extensions/extensions.go:31	Starting extensions...
2023-10-13T23:20:34.036Z	info	extensions/extensions.go:34	Extension is starting...	{"kind": "extension", "name": "k8s_observer"}
2023-10-13T23:20:34.036Z	info	extensions/extensions.go:38	Extension started.	{"kind": "extension", "name": "k8s_observer"}
2023-10-13T23:20:34.036Z	info	extensions/extensions.go:34	Extension is starting...	{"kind": "extension", "name": "memory_ballast"}
2023-10-13T23:20:34.039Z	info	[email protected]/memory_ballast.go:41	Setting memory ballast	{"kind": "extension", "name": "memory_ballast", "MiBs": 165}
2023-10-13T23:20:34.116Z	info	extensions/extensions.go:38	Extension started.	{"kind": "extension", "name": "memory_ballast"}
2023-10-13T23:20:34.116Z	info	extensions/extensions.go:34	Extension is starting...	{"kind": "extension", "name": "zpages"}
2023-10-13T23:20:34.116Z	info	[email protected]/zpagesextension.go:53	Registered zPages span processor on tracer provider	{"kind": "extension", "name": "zpages"}
2023-10-13T23:20:34.116Z	info	[email protected]/zpagesextension.go:63	Registered Host's zPages	{"kind": "extension", "name": "zpages"}
2023-10-13T23:20:34.117Z	info	[email protected]/zpagesextension.go:75	Starting zPages extension	{"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2023-10-13T23:20:34.117Z	info	extensions/extensions.go:38	Extension started.	{"kind": "extension", "name": "zpages"}
2023-10-13T23:20:34.117Z	info	extensions/extensions.go:34	Extension is starting...	{"kind": "extension", "name": "file_storage"}
2023-10-13T23:20:34.117Z	info	extensions/extensions.go:38	Extension started.	{"kind": "extension", "name": "file_storage"}
2023-10-13T23:20:34.117Z	info	extensions/extensions.go:34	Extension is starting...	{"kind": "extension", "name": "health_check"}
2023-10-13T23:20:34.117Z	info	[email protected]/healthcheckextension.go:35	Starting health_check extension	{"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2023-10-13T23:20:34.117Z	warn	[email protected]/warning.go:40	Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks	{"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-10-13T23:20:34.117Z	info	extensions/extensions.go:38	Extension started.	{"kind": "extension", "name": "health_check"}
2023-10-13T23:20:34.117Z	info	internal/resourcedetection.go:125	began detecting resource information	{"kind": "processor", "name": "resourcedetection", "pipeline": "logs"}
2023-10-13T23:20:34.117Z	info	internal/resourcedetection.go:139	detected resource information	{"kind": "processor", "name": "resourcedetection", "pipeline": "logs", "resource": {"host.name":"so1","os.type":"linux"}}
2023-10-13T23:20:34.117Z	info	adapter/receiver.go:45	Starting stanza receiver	{"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-10-13T23:20:34.123Z	warn	[email protected]/warning.go:40	Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks	{"kind": "receiver", "name": "otlp", "data_type": "logs", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-10-13T23:20:34.123Z	info	[email protected]/otlp.go:83	Starting GRPC server	{"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:4317"}
2023-10-13T23:20:34.123Z	warn	[email protected]/warning.go:40	Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks	{"kind": "receiver", "name": "otlp", "data_type": "logs", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-10-13T23:20:34.123Z	info	[email protected]/otlp.go:101	Starting HTTP server	{"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:4318"}
2023-10-13T23:20:34.123Z	info	healthcheck/handler.go:132	Health Check state change	{"kind": "extension", "name": "health_check", "status": "ready"}
2023-10-13T23:20:34.123Z	info	service/service.go:161	Everything is ready. Begin running and processing data.
2023-10-13T23:20:34.323Z	info	fileconsumer/file.go:194	Started watching file	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/kube-system_calico-kube-controllers-6c99c8747f-2mr29_fc0baf10-376a-438b-a967-70c4ad5d3c39/calico-kube-controllers/0.log"}

And in Splunk:

index=main sourcetype="kube:container:otel-collector" | reverse

working as expected.

Here's the rendered configmap for good measure:

kubectl -n otel get cm mattymo-test-splunk-otel-collector-otel-agent -o yaml
<truncated>
  receivers:
      filelog:
        encoding: utf-8
        exclude: null
        fingerprint_size: 1kb
        force_flush_period: "0"
        include:
        - /var/log/pods/*/*/*.log
        include_file_name: false
        include_file_path: true
        max_concurrent_files: 1024
        max_log_size: 1MiB
        operators:
<truncated>

Note that it is not excluding the log path for the otel-collector container.
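Conceptually, excludeAgentLogs only controls whether the agent's own log path ends up in the filelog receiver's exclude list. Here is a rough Python sketch of that behavior (a simplified illustration under that assumption, not the chart's actual Helm templating logic):

```python
# Simplified illustration (not the chart's Helm templating): when
# excludeAgentLogs is true, the agent's own container log path is added
# to the filelog receiver's `exclude` list; when false, nothing is excluded,
# so the agent tails its own logs along with everything else.
AGENT_LOG_GLOB = "/var/log/pods/*/otel-collector/*.log"

def filelog_exclude(exclude_agent_logs: bool) -> list:
    return [AGENT_LOG_GLOB] if exclude_agent_logs else []

print(filelog_exclude(True))   # ['/var/log/pods/*/otel-collector/*.log']
print(filelog_exclude(False))  # []
```

An empty (or null) exclude list in the rendered ConfigMap, as shown above, is therefore the expected result when excludeAgentLogs is false.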

Working as expected. It must be how they are being searched, or perhaps how they are routed by props and transforms once they hit Splunk. Keep in mind the collector doesn't log much when it's working, so make sure you are searching over a long enough time span.

@atoulme
Contributor

atoulme commented Nov 1, 2023

Closing this issue as inactive. Please reopen if more work is needed.

@atoulme atoulme closed this as completed Nov 1, 2023