
node[:fqdn] is blank for knife ec2 bootstrap #30

Open
fletchowns opened this issue Sep 25, 2014 · 16 comments

Comments

@fletchowns

OS is RHEL6. This hostname cookbook is the first recipe in my chef run. I have set_fqdn = "*.mydomain.com" in my attributes, and I'm running knife ec2 server create --node-name nexus. The run fails because the nexus cookbook (https://github.com/RiotGames/nexus-cookbook) uses node[:fqdn] in the nginx config, and node[:fqdn] is blank at that point. The config becomes malformed as a result, nginx fails to start, and the run fails.

From the output of the hostname portion, it looks like everything should work fine. hostname -f after the failed run returns the correct value (mynode.mydomain.com). What would cause node[:fqdn] to be blank on the bootstrap run?

Recipe: hostname::default
  * ruby_block[Update /etc/sysconfig/network] action run
    - execute the ruby block Update /etc/sysconfig/network
  * ohai[reload] action reload
    - re-run ohai and merge results into node attributes
  * ruby_block[Update /etc/sysctl.conf] action run
    - execute the ruby block Update /etc/sysctl.conf
  * ohai[reload] action reload
    - re-run ohai and merge results into node attributes
  * execute[hostname nexus] action run
    - execute hostname nexus
  * ohai[reload] action reload
    - re-run ohai and merge results into node attributes
  * service[network] action restart
    - restart service service[network]
  * hostsfile_entry[localhost] action append
  Recipe: <Dynamically Defined Resource>
    * file[/etc/hosts] action create
      - update content in file /etc/hosts from 498f49 to cc06b0
      --- /etc/hosts    2010-01-12 08:28:22.000000000 -0500
      +++ /tmp/.hosts20140924-1366-k5b3cu   2014-09-24 20:15:17.823261121 -0400
      @@ -1,3 +1,12 @@
      -127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
      -::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
      +#
      +# This file is managed by Chef, using the hostsfile cookbook.
      +# Editing this file by hand is highly discouraged!
      +#
      +# Comments containing an @ sign should not be modified or else
      +# hostsfile will be unable to guarantee relative priority in
      +# future Chef runs!
      +#
      +
      +127.0.0.1    localhost localhost.localdomain localhost4 localhost4.localdomain4
      +::1  localhost localhost.localdomain localhost6 localhost6.localdomain6
      - restore selinux security context
    - Append hostsfile_entry[localhost]
Recipe: hostname::default
  * hostsfile_entry[set hostname] action create
  Recipe: <Dynamically Defined Resource>
    * file[/etc/hosts] action create
      - update content in file /etc/hosts from cc06b0 to db56ba
      --- /etc/hosts    2014-09-24 20:15:17.823261121 -0400
      +++ /tmp/.hosts20140924-1366-y8dngg   2014-09-24 20:15:18.262261122 -0400
      @@ -8,5 +8,6 @@
       #

       127.0.0.1    localhost localhost.localdomain localhost4 localhost4.localdomain4
      +127.0.1.1    nexus.mydomain.com nexus
       ::1  localhost localhost.localdomain localhost6 localhost6.localdomain6
      - restore selinux security context
    - Create hostsfile_entry[set hostname]
Recipe: hostname::default
  * ohai[reload] action reload
    - re-run ohai and merge results into node attributes
  * ohai[reload] action nothing (skipped due to action :nothing)
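For reference, the log above is consistent with the * in set_fqdn being replaced by the node name: the recipe runs "hostname nexus" and adds "127.0.1.1 nexus.mydomain.com nexus" to /etc/hosts. A minimal pure-Ruby sketch of that derivation follows. The helper name expand_fqdn and the "substitute *, then cut at the first dot" rule are illustrative assumptions inferred from the log, not the hostname cookbook's actual code.

```ruby
# Hypothetical helper: expand a set_fqdn-style pattern such as "*.mydomain.com"
# by substituting the node name for "*", then split the result into the short
# hostname (text before the first dot) and the domain (everything after it).
def expand_fqdn(pattern, node_name)
  fqdn = pattern.sub('*', node_name)
  short, domain = fqdn.split('.', 2) # domain is nil if there is no dot
  { fqdn: fqdn, hostname: short, domain: domain }
end

names = expand_fqdn('*.mydomain.com', 'nexus')
puts names[:fqdn]     # nexus.mydomain.com
puts names[:hostname] # nexus
puts names[:domain]   # mydomain.com
```

This only shows what the values should be; the problem reported here is about when other cookbooks read them, not how they are computed.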
@xamebax
Contributor

xamebax commented Sep 26, 2014

Hi! Do you have RHEL6 on your node or on your workstation? Sorry if that's a silly question.

@fletchowns
Author

Not a silly question at all! RHEL6 is the OS on the node. ami-fe393ebb on EC2 to be more specific.

@xamebax
Contributor

xamebax commented Sep 26, 2014

Ok, thanks! Could you (obviously omitting sensitive data) paste your run list and the exact error output where the run fails? The cookbooks in between could reset some attributes, either in the recipes or in the attribute files. Nexus works with Ubuntu and CentOS, so RHEL should theoretically be fine here. So yeah, lots of moving parts. :)

@fletchowns
Author

Thank you! I'll do another chef run and get the log posted here.

@fletchowns
Author

I think the issue might be the value of node[:fqdn] during the compile phase vs. the execution phase. I have a related thread going on the Chef mailing list. The last response is from an Opscode employee saying I shouldn't use this hostname cookbook. I'm still not sure whether that's what I should do, but I thought you should be aware of it.
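The compile-vs-execution distinction can be illustrated with a toy model in plain Ruby. This is not Chef's implementation: the ToyRun class, its methods, and the "server_name" line standing in for the nginx config are all hypothetical. The model assumes only Chef's two-pass behavior, where recipe code runs first and queues resources, and the resources' actions run later. A value read from the node while a recipe is being compiled is captured before the hostname action has had a chance to change it.

```ruby
# Toy model (not Chef itself) of a two-pass run: recipe code is evaluated
# first (compile phase) and queues actions; the actions run afterwards
# (converge/execution phase).
class ToyRun
  attr_reader :node, :rendered

  def initialize
    @node = { 'fqdn' => '' } # assumption: FQDN is blank on the fresh instance
    @actions = []
    @rendered = []
  end

  # Compile phase: store the block; it is not run yet.
  def resource(&action)
    @actions << action
  end

  # Converge phase: run the stored blocks in the order they were declared.
  def converge
    @actions.each(&:call)
  end
end

run = ToyRun.new

# hostname recipe (first in the run list): sets the FQDN, but only at converge time.
run.resource { run.node['fqdn'] = 'nexus.mydomain.com' }

# nexus-style recipe: this read happens while the recipe is compiled,
# so it sees the value from before the hostname action has run.
server_name = run.node['fqdn']
run.resource { run.rendered << "server_name #{server_name};" }

run.converge
puts run.node['fqdn']   # nexus.mydomain.com
puts run.rendered.first # server_name ;
```

In this model the hostname is correct by the end of the run (much like hostname -f after the failed run), yet the rendered config still contains the blank value captured at compile time.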

@flaccid

flaccid commented Oct 7, 2014

@fletchowns I'd be curious to see if you have the problem by using https://github.com/xhost-cookbooks/system where the hostname recipe uses an LWRP to set the hostname/domain name via node attributes.

@fletchowns
Author

@flaccid I ran into the same issue with the system cookbook (after scratching my head for a while over xhost-cookbooks/system#6).

@flaccid

flaccid commented Oct 14, 2014

I'm either a complete n00b missing something or there is a server-side issue with Supermarket; see xhost-cookbooks/system#6 (comment). I've emailed Chef. I want to get xhost-cookbooks/system#6 fixed too so we can move on.
@fletchowns, xhost-cookbooks/system#6 (comment) is a basic workaround for now, though.

@flaccid

flaccid commented Nov 1, 2014

@fletchowns, can you re-test the system cookbook for this issue? xhost-cookbooks/system#6 has been resolved.

@fletchowns
Author

@flaccid I didn't see any changes to xhost-cookbooks/system aimed at ensuring the FQDN is set at compile time instead of converge time, so I don't think the test would yield any different results. See this post on the Chef mailing list (it was in response to me saying I had tried the system cookbook and it had the same issue).

@flaccid

flaccid commented Nov 9, 2014

@fletchowns this issue actually goes way back; see https://tickets.opscode.com/browse/OHAI-389?focusedCommentId=48141&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-48141. Let me test it out with the nexus cookbook; perhaps there is still some kind of race condition.

@flaccid

flaccid commented Nov 9, 2014

It looks like the nexus cookbook requires Chef Server and data bags because of https://github.com/RiotGames/nexus-cookbook/blob/master/recipes/app_server_proxy.rb#L47. As a result, testing in Vagrant with chef-solo fails with 'can't convert Array into String', so I can't really test it quickly.

I did notice that using this cookbook triggers some Ohai-related resources:

==> default: [2014-11-09T02:54:51+00:00] INFO: ohai plugins will be at: /etc/chef/ohai_plugins
==> default: [2014-11-09T02:54:51+00:00] INFO: remote_directory[/etc/chef/ohai_plugins for cookbook ohai] created directory /etc/chef/ohai_plugins
==> default: [2014-11-09T02:54:51+00:00] INFO: remote_directory[/etc/chef/ohai_plugins for cookbook ohai] mode changed to 755
==> default: [2014-11-09T02:54:51+00:00] INFO: cookbook_file[/etc/chef/ohai_plugins/README] created file /etc/chef/ohai_plugins/README
==> default: [2014-11-09T02:54:51+00:00] INFO: cookbook_file[/etc/chef/ohai_plugins/README] updated file contents /etc/chef/ohai_plugins/README
==> default: [2014-11-09T02:54:51+00:00] INFO: cookbook_file[/etc/chef/ohai_plugins/README] mode changed to 644
==> default: [2014-11-09T02:54:52+00:00] INFO: ohai[custom_plugins] reloaded

In the system cookbook I don't reload Ohai at all, because in OHAI-389 we discovered "it's not what you think it is". All that is required is this (example):

  node.automatic_attrs['fqdn'] = fqdn
  node.automatic_attrs['hostname'] = new_resource.short_hostname

Perhaps take the issue upstream with that cookbook. In my testing, system::hostname works OK.
This line, https://github.com/xhost-cookbooks/system/blob/master/providers/hostname.rb#L112, which logs node['fqdn'], prints the new hostname correctly:

==> default: [2014-11-09T03:11:40+00:00] INFO: == New host/node information ==
==> default: [2014-11-09T03:11:40+00:00] INFO: Hostname: foo.bar.suf
==> default: [2014-11-09T03:11:40+00:00] INFO: Network node hostname: foo.bar.suf
==> default: [2014-11-09T03:11:40+00:00] INFO: Alias names of host: foo
==> default: [2014-11-09T03:11:40+00:00] INFO: Short host name (cut from first dot of hostname): foo
==> default: [2014-11-09T03:11:40+00:00] INFO: Domain of hostname: bar.suf
==> default: [2014-11-09T03:11:40+00:00] INFO: FQDN of host: foo.bar.suf
==> default: [2014-11-09T03:11:40+00:00] INFO: IP address(es) for the hostname: 10.0.2.15
==> default: [2014-11-09T03:11:40+00:00] INFO: Current FQDN in node object: foo.bar.suf

What I don't know is whether setting it in a provider, as I do in the system cookbook, still happens before things like template parsing/rendering. Either way, the likely root cause is one of two things: what the nexus cookbook does with ohai/reload, or templates being rendered too early with the initial value, before node.automatic_attrs is updated. Note also that this cookbook does it all within a recipe, not a provider.

@xamebax
Contributor

xamebax commented Nov 20, 2014

Hi @flaccid, thank you so much for helping out @fletchowns with this issue. I'm very sorry for not responding in a more timely manner. Is there anything left here I can do to help?

@flaccid

flaccid commented Nov 20, 2014

@xamebax See the previous comments: basically, test the system cookbook to see whether it has the same issue. It doesn't do anything with Ohai, but we need to see whether using node['fqdn'] in a template picks up the new hostname.

@fletchowns
Author

@flaccid I'm also seeing a few issues with the system cookbook, filed an issue for you over in that repo.

@xamebax I set up a repo so you can reproduce the issue I experienced: https://github.com/fletchowns/hostname-playground/tree/hostname-cookbook

After a vagrant up I'm seeing the following (the same tests I used for the system cookbook issue):

[root@chef ~]# hostname -f
chef
[root@chef ~]# hostname -a
localhost localhost.localdomain localhost4 localhost4.localdomain4 chef chef.mycorp.com

[root@chef ~]# cat /template_test
Is chef the same as chef ?

[root@chef ~]# ls -la /etc/pki/tls/certs/
total 1220
drwxr-xr-x. 2 root root   4096 Jan 23 20:34 .
drwxr-xr-x. 5 root root   4096 Mar  7  2014 ..
-rw-r--r--. 1 root root 244954 Jan 29  2014 ca-bundle.crt
-rw-r--r--. 1 root root 978662 Sep  3  2013 ca-bundle.trust.crt
-rw-r--r--. 1 root root   1710 Jan 23 20:34 chef.pem
-rwxr-xr-x. 1 root root    610 Nov 22  2013 make-dummy-cert
-rw-r--r--. 1 root root   2242 Nov 22  2013 Makefile
-rwxr-xr-x. 1 root root    829 Nov 22  2013 renew-dummy-cert

[root@chef ~]# grep ssl_certificate /var/opt/chef-server/nginx/etc/chef_https_lb.conf
  ssl_certificate /etc/pki/tls/certs/chef.pem;
  ssl_certificate_key /etc/pki/tls/private/chef.key;

And then after a subsequent vagrant provision I see this:

[root@chef ~]# hostname -f
chef
[root@chef ~]# hostname -a
localhost localhost.localdomain localhost4 localhost4.localdomain4 chef chef.mycorp.com

[root@chef ~]# cat /template_test
Is chef the same as chef ?

[root@chef ~]# ls -la /etc/pki/tls/certs/
total 1220
drwxr-xr-x. 2 root root   4096 Jan 23 20:34 .
drwxr-xr-x. 5 root root   4096 Mar  7  2014 ..
-rw-r--r--. 1 root root 244954 Jan 29  2014 ca-bundle.crt
-rw-r--r--. 1 root root 978662 Sep  3  2013 ca-bundle.trust.crt
-rw-r--r--. 1 root root   1710 Jan 23 20:34 chef.pem
-rwxr-xr-x. 1 root root    610 Nov 22  2013 make-dummy-cert
-rw-r--r--. 1 root root   2242 Nov 22  2013 Makefile
-rwxr-xr-x. 1 root root    829 Nov 22  2013 renew-dummy-cert

[root@chef ~]# grep ssl_certificate /var/opt/chef-server/nginx/etc/chef_https_lb.conf
  ssl_certificate /etc/pki/tls/certs/chef.pem;
  ssl_certificate_key /etc/pki/tls/private/chef.key;

I would have expected the fqdn here to be 'chef.mycorp.com' for everything, but it's just using 'chef'.

The results start to get even more interesting when you spin up an EC2 instance with vagrant up --provider aws:

[root@chef ~]# hostname -f
chef.mycorp.com
[root@chef ~]# hostname -a
chef

[root@chef ~]# cat /template_test
Is  the same as chef.mycorp.com ?

[root@chef ~]# ls -la /etc/pki/tls/certs/
total 1720
drwxr-xr-x. 2 root root   4096 Jan 23 15:57 .
drwxr-xr-x. 5 root root   4096 Jun  9  2014 ..
-rw-r--r--. 1 root root 757191 Dec 17  2013 ca-bundle.crt
-rw-r--r--. 1 root root 978662 Dec 17  2013 ca-bundle.trust.crt
-rwxr-xr-x. 1 root root    610 Jun  2  2014 make-dummy-cert
-rw-r--r--. 1 root root   2242 Jun  2  2014 Makefile
-rw-r--r--. 1 root root   1710 Jan 23 15:57 .pem
-rwxr-xr-x. 1 root root    829 Jun  2  2014 renew-dummy-cert

[root@chef ~]# grep ssl_certificate /var/opt/chef-server/nginx/etc/chef_https_lb.conf
  ssl_certificate /etc/pki/tls/certs/.pem;
  ssl_certificate_key /etc/pki/tls/private/.key;

The AWS stuff is on a separate branch: https://github.com/fletchowns/hostname-playground/tree/hostname-aws
Here's the debug log for the AWS run: https://gist.github.com/fletchowns/3aa7650fdcb4de2d560c

The packer definition for the AMI that was used is also included in the repo.

@flaccid

flaccid commented Feb 25, 2015

Basically, you need to lazy load (xhost-cookbooks/system#7 (comment)). For node['fqdn'] to return the new hostname when referenced in recipe code, I think this cookbook also needs to update automatic_attrs; see https://github.com/xhost-cookbooks/system/blob/master/providers/hostname.rb#L34. You'll notice that I don't reload Ohai there either.
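A minimal sketch of the lazy-loading idea in plain Ruby, assuming the same two-pass model (values computed in recipe code are fixed at compile time; actions run at converge time). The node hash, converge_queue, and the resolve helper are hypothetical stand-ins, not Chef's API. The lambda plays the role of Chef's lazy { } wrapper, and the direct hash write mirrors node.automatic_attrs['fqdn'] = fqdn rather than an Ohai reload.

```ruby
# Hypothetical stand-ins for the node object and Chef's converge queue.
node = { 'fqdn' => '' } # blank before the hostname step has run
converge_queue = []
rendered = []

# Return the value as-is, or call it if it was deferred (a Proc/lambda).
def resolve(value)
  value.respond_to?(:call) ? value.call : value
end

# Hostname step: writes the new FQDN straight into the node data at converge
# time (analogous to node.automatic_attrs['fqdn'] = fqdn; no Ohai reload).
converge_queue << -> { node['fqdn'] = 'nexus.mydomain.com' }

# Consumer recipe: one value read immediately, one deferred until converge.
eager    = node['fqdn']        # evaluated at "compile" time
deferred = -> { node['fqdn'] } # evaluated when the action runs

converge_queue << -> { rendered << "eager=#{resolve(eager).inspect} lazy=#{resolve(deferred).inspect}" }

# Converge phase: run the queued actions in order.
converge_queue.each(&:call)
puts rendered.first # eager="" lazy="nexus.mydomain.com"
```

Deferring the read only helps if something updates the node data before the consuming action runs, which is why the lazy read and the automatic_attrs update go together in this sketch.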
