
Nifi scale up #139

Open
iordaniordanov opened this issue Oct 13, 2021 · 4 comments
Assignees: erdrix
Labels: community, help wanted

Comments

@iordaniordanov

iordaniordanov commented Oct 13, 2021

Type of question

About general context and help around nifikop

Question

What did you do?
Increased number of nodes in the nificluster CR from 3 to 6
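
Concretely, the change was to extend spec.nodes in the NifiCluster CR (full config further below) from 3 entries to 6, roughly like this (the new ids and the reuse of default_group are illustrative, matching the existing entries):

  nodes:
  - id: 0
    nodeConfigGroup: default_group
  - id: 1
    nodeConfigGroup: default_group
  - id: 2
    nodeConfigGroup: default_group
  - id: 3
    nodeConfigGroup: default_group
  - id: 4
    nodeConfigGroup: default_group
  - id: 5
    nodeConfigGroup: default_group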

What did you expect to see?
3 new nodes to be simultaneously created and joined in the cluster

What did you see instead? Under which circumstances?
3 new nodes were created simultaneously and joined the cluster, but after that they were re-created one by one, and only then was the cluster fully functional. This leads to a linear increase in the time needed to scale the cluster up: if adding one node takes 5 minutes, adding 2 nodes takes ~10 minutes, and so on. Is this the expected behavior, or is it an issue with our configuration/environment?

Environment

  • nifikop version:

    v0.6.0

  • Kubernetes version information:

    Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.17-eks-087e67", GitCommit:"087e67e479962798594218dc6d99923f410c145e", GitTreeState:"clean", BuildDate:"2021-07-31T01:39:55Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:
    EKS

  • NiFi version:
    1.12.1

Additional context
Nifi cluster config

apiVersion: nifi.orange.com/v1alpha1
kind: NifiCluster
metadata:
  name: <name>
  namespace: <namespace>
spec:
  clusterImage: <image> # Nifi 1.12.1 image
  externalServices:
  - name: clusterip
    spec:
      portConfigs:
      - internalListenerName: http
        port: 8080
      type: ClusterIP
  initContainerImage: <busybox image>
  listenersConfig:
    internalListeners:
    - containerPort: 8080
      name: http
      type: http
    - containerPort: 6007
      name: cluster
      type: cluster
    - containerPort: 10000
      name: s2s
      type: s2s
    - containerPort: 9090
      name: prometheus
      type: prometheus
  nifiClusterTaskSpec:
    retryDurationMinutes: 10
  nodeConfigGroups:
    default_group:
      isNode: true
      resourcesRequirements:
        limits:
          cpu: "2"
          memory: 6Gi
        requests:
          cpu: "2"
          memory: 6Gi
      serviceAccountName: default
      storageConfigs:
      - mountPath: /opt/nifi/data
        name: data
        pvcSpec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 30Gi
          storageClassName: general
      - mountPath: /opt/nifi/content_repository
        name: content-repository
        pvcSpec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 2Gi
          storageClassName: general
      - mountPath: /opt/nifi/flowfile_repository
        name: flowfile-repository
        pvcSpec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 2Gi
          storageClassName: general
      - mountPath: /opt/nifi/provenance_repository
        name: provenance-repository
        pvcSpec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 2Gi
          storageClassName: general
      - mountPath: /opt/nifi/nifi-current/work
        name: work
        pvcSpec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
          storageClassName: general
  nodes:
  - id: 0
    nodeConfigGroup: default_group
  - id: 1
    nodeConfigGroup: default_group
  - id: 2
    nodeConfigGroup: default_group
  oneNifiNodePerNode: true
  propagateLabels: true
  readOnlyConfig:
    bootstrapProperties:
      nifiJvmMemory: 2g
      overrideConfigs: |
        java.arg.debug=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000
        conf.dir=./conf
    nifiProperties:
      overrideConfigs: |
        nifi.nar.library.autoload.directory=./extensions
        nifi.web.http.network.interface.default=eth0
        nifi.web.http.network.interface.lo=lo
        nifi.web.proxy.context.path=<proxy_path>
        nifi.database.directory=/opt/nifi/data/database_repository
        nifi.flow.configuration.archive.dir=/opt/nifi/data/archive
        nifi.flow.configuration.file=/opt/nifi/data/flow.xml.gz
        nifi.templates.directory=/opt/nifi/data/templates
        nifi.provenance.repository.max.storage.size=2GB
        nifi.provenance.repository.indexed.attributes=te$containerId,te$id
      webProxyHosts:
      - <proxy_host>
    zookeeperProperties: {}
  service:
    headlessEnabled: true
  zkAddress: <zk_addr>
  zkPath: <zk_path>
erdrix self-assigned this Oct 13, 2021
erdrix added the community and help wanted labels Oct 13, 2021
@iordaniordanov
Author

Hello, any info here?

@erdrix
Contributor

erdrix commented Nov 12, 2021

Hello, yes, this is the expected behaviour. We are forced to do it this way because, if all of the initial cluster nodes were down, a newly joining node could decide that it is the reference, in which case all information on the other nodes would be erased once they rejoin ...
So we have an init script specific to new nodes, and once a node has explicitly joined the cluster, we need to restart its pod with a "non-joining" script: https://github.com/Orange-OpenSource/nifikop/blob/master/pkg/resources/nifi/pod.go#L392
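
To illustrate the idea (this is not the actual code behind the pod.go link above; the function and script names are made up): the operator effectively picks a different startup command depending on whether the node still has to join the cluster, which is why each new node boots once with the joining logic and is then recreated without it.

// Illustrative sketch only -- nifikop's real logic is in pkg/resources/nifi/pod.go.
// The script paths below are hypothetical.
package main

import "fmt"

// startupCommand returns the command a NiFi pod would run.
// A node that has never joined the cluster starts with a "joining" script so it
// fetches the existing flow from the cluster instead of electing itself as the
// reference; once it has joined, the operator recreates the pod with the plain
// start script, which is what causes the one-by-one restarts seen during scale-up.
func startupCommand(hasJoinedCluster bool) []string {
	if !hasJoinedCluster {
		return []string{"/opt/nifi/scripts/joining_start.sh"}
	}
	return []string{"/opt/nifi/scripts/default_start.sh"}
}

func main() {
	// First boot of a brand-new node vs. any later boot.
	fmt.Println(startupCommand(false)) // joining start
	fmt.Println(startupCommand(true))  // plain start
}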

@iordaniordanov
Author

Okay, thanks for the clarification :)

@iordaniordanov
Author

I'm sure you thought it through, but just a suggestion: maybe before scaling up you could check whether the cluster reports itself as healthy and, if it does not, abort the scale operation. Otherwise, if someone wants to add, say, 50 nodes because of a spike in usage, they would have to wait multiple hours before all of the nodes successfully join the cluster ...
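
For illustration, such a pre-flight check could be as simple as asking NiFi how many nodes are currently connected before applying the new node count. A minimal sketch, external to nifikop, against NiFi's REST API /nifi-api/controller/cluster endpoint, assuming an unsecured HTTP cluster like the one in the config above (field names follow the ClusterEntity response and are worth verifying against your NiFi version):

// Illustrative pre-flight check, not part of nifikop: count CONNECTED nodes
// via NiFi's REST API before deciding to scale up.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type clusterEntity struct {
	Cluster struct {
		Nodes []struct {
			Status string `json:"status"`
		} `json:"nodes"`
	} `json:"cluster"`
}

// connectedNodes calls GET <baseURL>/nifi-api/controller/cluster and returns
// how many cluster nodes report the CONNECTED status.
func connectedNodes(baseURL string) (int, error) {
	resp, err := http.Get(baseURL + "/nifi-api/controller/cluster")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	var ce clusterEntity
	if err := json.NewDecoder(resp.Body).Decode(&ce); err != nil {
		return 0, err
	}
	connected := 0
	for _, node := range ce.Cluster.Nodes {
		if node.Status == "CONNECTED" {
			connected++
		}
	}
	return connected, nil
}

func main() {
	// Placeholder URL: point this at the cluster's HTTP service (port 8080 above).
	n, err := connectedNodes("http://<nifi-http-service>:8080")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d nodes connected\n", n)
}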
