This repository has been archived by the owner on Mar 27, 2022. It is now read-only.

Stability of script for HDP platform #53

Open
stev-0 opened this issue Aug 10, 2015 · 1 comment

stev-0 commented Aug 10, 2015

This isn't meant as a criticism, as I realise there are 1,000 possible things that could be going wrong, but this script seems to deploy a cluster successfully in only around 1 in 5 attempts.

The exception seems to be different each time, but the most common ones are:

at upload of config scripts:

Uploading   ...20150811-000113-Hq6/install-ambari-components.sh: 3.9 KiB/3.9 KiB
CommandException: 1 files/objects could not be transferred.

when running deploy scripts on master / workers:

Mon, Aug 10, 2015 11:55:27 PM: Exited 1 : gcloud --project=yyyy --quiet --verbosity=info compute ssh hadoop-w-1 --command=sudo su -l -c "cd ${PWD} && ./ambari-setup.sh" 2>>ambari-setup_deploy.stderr 1>>ambari-setup_deploy.stdout --ssh-flag=-tt --ssh-flag=-oServerAliveInterval=60 --ssh-flag=-oServerAliveCountMax=3 --ssh-flag=-oConnectTimeout=30 --zone=europe-west1-b
 Mon, Aug 10, 2015 11:55:28 PM: Fetching on-VM logs from hadoop-w-1
 Warning: Permanently added 'x.y.z.m' (RSA) to the list of known hosts.
...Mon, Aug 10, 2015 11:57:43 PM: Command failed: wait ${SUBPROC} on line 326.

during the ambari-components install:

 Mon, Aug 10, 2015 11:43:54 PM: Step 'deploy-client-nfs-setup,deploy-client-nfs-setup' done...

Mon, Aug 10, 2015 11:43:54 PM: Invoking on master: ./install-ambari-components.sh
../bdutil: line 318: 10548 Segmentation fault sleep '0.5'

These failures are intermittent and so hard to reproduce: I am running the same script each time, yet the outcome varies.

@dennishuo (Contributor)

Thanks, every report helps :)

The "Segmentation fault" error is something we've never seen before; do you happen to know if the errors you're hitting are specific to ambari_env.sh, or do they also happen when you try to deploy default bdutil clusters?
