[Question] Trouble with HA for LAPI Pod #181

ImranR98 · 2024-08-26T00:20:03Z

I've been trying to get this to work in a small testing environment with Traefik. My current config seems to work fine with a single LAPI pod backed by a Postgres DB and connected to 2 agents on 2 nodes.

But if I try setting the lapi.replicas value to 2, I get the following error in one of the two pods when I try to run a cscli command (like cscli decisions list):
level=fatal msg="unable to retrieve decisions: performing request: Get \"http://localhost:8080/v1/alerts?has_active_decision=true&include_capi=false&limit=100\": API error: incorrect Username or Password"
command terminated with exit code 1

This is my values.yaml:

config:
  config.yaml.local: |
    db_config:
      type:     postgresql
      user:     ${DB_USERNAME}
      password: ${DB_PASSWORD}
      db_name:  ${DB_NAME}
      host:     crowdsec-db.production.svc.cluster.local
      port:     5432
      sslmode:  disable

container_runtime: containerd

agent:
  acquisition:
    - namespace: production
      podName: traefik-*
      program: traefik
  env:
    - name: COLLECTIONS
      value: "crowdsecurity/traefik"
    - name: LEVEL_DEBUG
      value: "false"

lapi:
  replicas: 2 # Seems to not work with multiple replicas
  dashboard:
    enabled: true
  env:
    - name: BOUNCER_KEY_traefik
      value: "<some long value>"
    - name: DB_NAME
      valueFrom:
        secretKeyRef:
          name: crowdsec-db-secret
          key: POSTGRES_DB
    - name: DB_USERNAME
      valueFrom:
        secretKeyRef:
          name: crowdsec-db-secret
          key: POSTGRES_USER
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: crowdsec-db-secret
          key: POSTGRES_PASSWORD
  persistentVolume:
    config:
      enabled: false
    data:
      enabled: false
  secrets:
    csLapiSecret: "<some long value>" # I set this to try and fix the issue (it didn't)

My assumption was that since I have disabled persistent volumes and configured a DB instead, both LAPI instances would connect to the same DB and have no issues. But I've clearly misunderstood how everything fits together. Would appreciate anyone pointing me in the right direction!


github-actions · 2024-08-26T00:20:14Z

@ImranR98: Thanks for opening an issue, it is currently awaiting triage.

If you haven't already, please provide the following information:

kind : bug, enhancement or documentation
area : agent, appsec, configuration, cscli, local-api

In the meantime, you can:

Check Crowdsec Documentation to see if your issue can be self resolved.
You can also join our Discord.
Check Releases to make sure your agent is on the latest version.

Details

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the forked project rr404/oss-governance-bot repository.

github-actions · 2024-08-26T00:20:15Z

@ImranR98: There is no 'kind' label on this issue. You need a 'kind' label to start the triage process.

/kind bug
/kind documentation
/kind enhancement


ImranR98 · 2024-08-26T00:21:52Z

/kind documentation
/area local-api

he2ss · 2024-08-27T09:14:09Z

Hi, the solution is to check in the chart whether replicas are enabled (more than 1) and, if so, suffix the CUSTOM_HOSTNAME env var with an index.

Discussed with @blotus.

ImranR98 · 2024-08-29T22:44:51Z

I'm not sure I understand, but glad to see there's a PR to fix it 🚀
Just to clarify, does this mean that - even without the PR you made - Crowdsec is actually working as expected aside from cscli availability? I assumed the lack of cscli access meant there was something else wrong with the pod.

LaurenceJJones · 2024-08-30T07:58:56Z

> I'm not sure I understand, but glad to see there's a PR to fix it 🚀 Just to clarify, does this mean that - even without the PR you made - Crowdsec is actually working as expected aside from cscli availability? I assumed the lack of cscli access meant there was something else wrong with the pod.

So, a not-so-TL;DR:

When the LAPI pods come up, they need working credentials, so they execute a direct machine add command. By default the container chooses the name "localhost", which is the default value for CUSTOM_HOSTNAME. Since both LAPIs use the same name, the startup script on each one deletes the credentials the other LAPI just registered (it believes itself to be unique, so if the name already exists it assumes the LAPI pod was deleted and its credentials were lost). That is why you have one LAPI that works with cscli and another that does not.

The side effect is that one of the LAPIs will work for a couple of hours because its JWT token is still valid. Once the token expires, that LAPI will start to get authentication errors, since the previously registered username and password no longer exist in the database.

The fix: we now force each LAPI to have a unique name by using the randomly generated pod name from the pod metadata, which stops the name collision.
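For anyone on a chart version without the fix, the same idea can be sketched in values.yaml. This is a minimal sketch and not necessarily what the PR does: it assumes the chart passes lapi.env entries through to the LAPI container unchanged (as the valueFrom entries for DB_NAME and friends in the original values suggest) and that the chart does not set CUSTOM_HOSTNAME itself. The fieldRef on metadata.name is the standard Kubernetes downward API for exposing the pod name.

```yaml
lapi:
  replicas: 2
  env:
    # Assumption: this entry reaches the LAPI container as-is and is not
    # overridden by a chart-level CUSTOM_HOSTNAME.
    - name: CUSTOM_HOSTNAME
      valueFrom:
        fieldRef:
          # Kubernetes downward API: each replica gets its own pod name,
          # so the two LAPIs no longer register under the same machine name.
          fieldPath: metadata.name
```

With unique names, each replica should register its own machine entry rather than replacing the other's, which is the collision described above.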

ImranR98 · 2024-08-30T18:42:00Z

Okay that makes sense, thanks for the explanation!

github-actions bot assigned blotus Aug 26, 2024

github-actions bot added the needs/triage Needs triage label Aug 26, 2024

github-actions bot added the needs/kind Kind label required label Aug 26, 2024

github-actions bot added kind/documentation Improvements or additions to documentation and removed needs/kind Kind label required labels Aug 26, 2024

crowdsecurity deleted a comment Aug 26, 2024

he2ss mentioned this issue Aug 27, 2024

fix(lapi): lapi should have unique internal machine for cscli #186

Merged

he2ss closed this as completed in #186 Sep 3, 2024
