Nifi scale up #50

Closed
juldrixx opened this issue Mar 24, 2022 · 4 comments
Labels: community, help wanted

Comments

@juldrixx (Contributor)

From nifikop created by iordaniordanov: Orange-OpenSource/nifikop#139

Type of question

About general context and help around nifikop

Question

What did you do?
Increased the number of nodes in the NifiCluster CR from 3 to 6.
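
For concreteness, the change only extends spec.nodes in the CR (shown in full under Additional context); a sketch of what the scaled-up list looks like, assuming the new nodes reuse default_group (the new ids are illustrative):

nodes:
- id: 0
  nodeConfigGroup: default_group
# ... existing ids 1-2 unchanged ...
- id: 3
  nodeConfigGroup: default_group
- id: 4
  nodeConfigGroup: default_group
- id: 5
  nodeConfigGroup: default_group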

What did you expect to see?
The 3 new nodes to be created simultaneously and to join the cluster.

What did you see instead? Under which circumstances?
The 3 new nodes were created simultaneously and joined the cluster, but after that they were re-created one by one, and only then was the cluster fully functional. This leads to a linear increase in the time needed to scale the cluster up: if adding one node takes 5 min, adding 2 nodes takes ~10 min, and so on. Is this the expected behavior, or is it an issue with our configuration/environment?

Environment

  • nifikop version:

    v0.6.0

  • Kubernetes version information:

    Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.17-eks-087e67", GitCommit:"087e67e479962798594218dc6d99923f410c145e", GitTreeState:"clean", BuildDate:"2021-07-31T01:39:55Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:
    EKS

  • NiFi version:
    1.12.1

Additional context
NiFi cluster config

apiVersion: nifi.orange.com/v1alpha1
kind: NifiCluster
metadata:
  name: <name>
  namespace: <namespace>
spec:
  clusterImage: <image> # Nifi 1.12.1 image
  externalServices:
  - name: clusterip
    spec:
      portConfigs:
      - internalListenerName: http
        port: 8080
      type: ClusterIP
  initContainerImage: <busybox image>
  listenersConfig:
    internalListeners:
    - containerPort: 8080
      name: http
      type: http
    - containerPort: 6007
      name: cluster
      type: cluster
    - containerPort: 10000
      name: s2s
      type: s2s
    - containerPort: 9090
      name: prometheus
      type: prometheus
  nifiClusterTaskSpec:
    retryDurationMinutes: 10
  nodeConfigGroups:
    default_group:
      isNode: true
      resourcesRequirements:
        limits:
          cpu: "2"
          memory: 6Gi
        requests:
          cpu: "2"
          memory: 6Gi
      serviceAccountName: default
      storageConfigs:
      - mountPath: /opt/nifi/data
        name: data
        pvcSpec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 30Gi
          storageClassName: general
      - mountPath: /opt/nifi/content_repository
        name: content-repository
        pvcSpec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 2Gi
          storageClassName: general
      - mountPath: /opt/nifi/flowfile_repository
        name: flowfile-repository
        pvcSpec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 2Gi
          storageClassName: general
      - mountPath: /opt/nifi/provenance_repository
        name: provenance-repository
        pvcSpec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 2Gi
          storageClassName: general
      - mountPath: /opt/nifi/nifi-current/work
        name: work
        pvcSpec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
          storageClassName: general
  nodes:
  - id: 0
    nodeConfigGroup: default_group
  - id: 1
    nodeConfigGroup: default_group
  - id: 2
    nodeConfigGroup: default_group
  oneNifiNodePerNode: true
  propagateLabels: true
  readOnlyConfig:
    bootstrapProperties:
      nifiJvmMemory: 2g
      overrideConfigs: |
        java.arg.debug=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000
        conf.dir=./conf
    nifiProperties:
      overrideConfigs: |
        nifi.nar.library.autoload.directory=./extensions
        nifi.web.http.network.interface.default=eth0
        nifi.web.http.network.interface.lo=lo
        nifi.web.proxy.context.path=<proxy_path>
        nifi.database.directory=/opt/nifi/data/database_repository
        nifi.flow.configuration.archive.dir=/opt/nifi/data/archive
        nifi.flow.configuration.file=/opt/nifi/data/flow.xml.gz
        nifi.templates.directory=/opt/nifi/data/templates
        nifi.provenance.repository.max.storage.size=2GB
        nifi.provenance.repository.indexed.attributes=te$containerId,te$id
      webProxyHosts:
      - <proxy_host>
    zookeeperProperties: {}
  service:
    headlessEnabled: true
  zkAddress: <zk_addr>
  zkPath: <zk_path>
@juldrixx added the community and help wanted labels on Mar 24, 2022
@juldrixx (Contributor, Author)

Okay, thanks for the clarification :)

@juldrixx (Contributor, Author)

I'm sure you thought it through, but just a suggestion: maybe before scaling up you could check whether the cluster reports itself as healthy and, if it does not, abort the scale operation. Otherwise, if someone wants to add, let's say, 50 nodes because of a spike in usage, they would need to wait multiple hours before all nodes successfully join the cluster...
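
As an illustration of that pre-scale health check, here is a minimal sketch (not nifikop code) that calls NiFi's REST endpoint GET /nifi-api/controller/cluster, reachable for example through the clusterip external service on port 8080 from the CR above, and treats the cluster as healthy only when every node reports CONNECTED. The service URL and the exact JSON field subset are assumptions based on the NiFi REST API:

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Minimal subset of the response from GET /nifi-api/controller/cluster
// (field names follow the NiFi REST API's ClusterEntity/NodeDTO).
type clusterEntity struct {
	Cluster struct {
		Nodes []struct {
			Address string `json:"address"`
			Status  string `json:"status"` // e.g. "CONNECTED", "DISCONNECTED"
		} `json:"nodes"`
	} `json:"cluster"`
}

// clusterHealthy returns true only if the cluster reports at least one
// node and every known node is CONNECTED.
func clusterHealthy(baseURL string) (bool, error) {
	resp, err := http.Get(baseURL + "/nifi-api/controller/cluster")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var ce clusterEntity
	if err := json.NewDecoder(resp.Body).Decode(&ce); err != nil {
		return false, err
	}
	for _, n := range ce.Cluster.Nodes {
		if n.Status != "CONNECTED" {
			return false, nil
		}
	}
	return len(ce.Cluster.Nodes) > 0, nil
}

func main() {
	// "clusterip" is the externalServices entry from the CR above;
	// <namespace> is a placeholder, as in the CR.
	ok, err := clusterHealthy("http://clusterip.<namespace>.svc:8080")
	if err != nil {
		fmt.Println("health check failed:", err)
		return
	}
	fmt.Println("safe to scale up:", ok)
}

An operator-side version of this could gate the node-addition reconcile step; that is a design suggestion, not current nifikop behaviour.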

@juldrixx (Contributor, Author)

Hello, yes, this is the expected behaviour. We are forced to do it this way because, if all of the initial cluster nodes were down, the newly joining node could decide that it is the reference, in which case all information would be erased from the other nodes once they rejoin...
So we have an init script specific to new nodes, and once a node has explicitly joined the cluster, we need to restart the pod with a "non-joining" script: https://github.com/Orange-OpenSource/nifikop/blob/master/pkg/resources/nifi/pod.go#L392
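
For context, the "reference" mentioned here is the flow chosen by NiFi's flow election when nodes (re)join. The election is governed by standard nifi.properties settings, which in this CR would go under readOnlyConfig.nifiProperties.overrideConfigs; the values below are illustrative, and tuning them does not replace the operator's join-then-restart sequence:

nifiProperties:
  overrideConfigs: |
    # How long the cluster waits before electing a flow as the authoritative one.
    nifi.cluster.flow.election.max.wait.time=5 mins
    # Elect immediately once this many nodes have voted.
    nifi.cluster.flow.election.max.candidates=3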

@juldrixx (Contributor, Author)

Hello, any info here?

@erdrix closed this as completed on Aug 19, 2022