TLDR
- Cloud Init is an invaluable resource for Cloud Engineers and Software Developers alike.
- It’s a straightforward service on the surface but is highly customizable to whatever needs an org may have.
- Cloud Init isn’t only AWS EC2 user data; it does network configuration, vendor configuration, and provides metadata services.
You’re probably using Cloud Init and don’t even realize it. Created by Canonical in the early days of EC2, it helped revolutionize how we treat our servers and how runtime initialization is conducted. Since its inception, it has been one of the primary methods of early configuration for our infrastructure. It also runs on every major public cloud provider and in many private cloud environments such as LXD, KVM, and OpenStack.
Cloud Init allows engineers to reduce or even eliminate package installs or configurations during application deployment. “Why should I need to install ImageMagick on every single Rails deployment?” Similarly, Cloud Init can provide breathing room between OS image builds: because security patching can happen as part of Cloud Init, you can rotate your AMI on a more manageable schedule, such as a weekly cadence.
How Does Cloud Init Work
Cloud Init Stages
Cloud Init works in five stages.
First, for systemd machines, is the Generator stage. If you're unfamiliar with systemd, a generator is a binary executed early in the boot process to dynamically generate unit files, symlinks, and more. Cloud Init's generator determines if the rest of the Cloud Init process should continue. If so, Cloud Init is included in the list of boot goals for the system.
Next is the Local phase. This phase runs the cloud-init-local.service systemd service as early as possible. Its purpose is to locate data sources and generate (or apply) networking configuration for the system. It's worth noting that this phase blocks much of the boot process, including network initialization.
The Network phase continues the Cloud Init boot. This phase relies on networking being up (and, by association, on the Local phase having completed). It runs the modules listed under cloud_init_modules in the Cloud Init configuration, such as mounts and bootcmd.
After the Network phase comes the Config phase, which runs the modules that don't affect any other stage. Specifically, it runs the modules listed under cloud_config_modules in the Cloud Init configuration. runcmd is included in this step.
Cloud Init closes out with the Final phase, which runs as late as possible and executes the modules listed under cloud_final_modules. This is the stage that includes any user data scripts and configuration management tooling (Puppet, Chef, etc.).
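On a systemd host, each stage after the generator runs as its own unit, so the stage list above can be checked against a live machine. The sketch below assumes the unit names cloud-init-local.service, cloud-init.service, cloud-config.service, and cloud-final.service; newer Cloud Init releases may name the Network unit differently, so treat the mapping as an assumption. stage_unit is a hypothetical helper, not part of Cloud Init.

```shell
#!/bin/sh
# Map a stage name to the systemd unit assumed to run it.
stage_unit() {
  case "$1" in
    local)   echo "cloud-init-local.service" ;;
    network) echo "cloud-init.service" ;;
    config)  echo "cloud-config.service" ;;
    final)   echo "cloud-final.service" ;;
    *)       echo "unknown stage: $1" >&2; return 1 ;;
  esac
}

# Report each stage's unit state, but only where systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
  for stage in local network config final; do
    unit=$(stage_unit "$stage")
    state=$(systemctl is-active "$unit" 2>/dev/null) || true
    printf '%-8s %-26s %s\n' "$stage" "$unit" "${state:-unknown}"
  done
fi
```

A unit that reports something other than active is a reasonable first place to look when a stage seems to have been skipped.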
Instance Metadata
Each server using Cloud Init also has a collection of data that Cloud Init uses to configure the instance. This includes what we generally think of as instance metadata on EC2 instances but also more.
Some providers will create or attach a config drive containing metadata service information files. OpenStack is an example of one such provider.
While we interact with user data, Cloud providers can also implement vendor data. The idea here is the same as user data; it exists to allow the cloud provider to customize the image at runtime. Some potential vendor data tasks might involve setting the instance’s hostname or configuring package repository paths. Vendor data can be disabled if desired. It’s also worth mentioning that user data overwrites vendor data when Cloud Init determines the final configuration.
Getting Started with Cloud Init
Cloud Init can be given instructions in two common ways: a shell script or a YAML-formatted cloud-config file. Both approaches are pretty straightforward:
#!/bin/sh
sudo yum --assumeyes --security update-minimal
Or, the equivalent cloud-config:
#cloud-config
runcmd:
  - sudo yum --assumeyes --security update-minimal
The script option is pretty easy to understand. As mentioned above, it’s executed in the Final phase. The cloud-config option is more interesting since you can set up modules to run in the different phases, such as the bootcmd option. Check out the module reference page for a complete list of available modules. There is also a great list of example configurations on the cloud-config examples page.
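To make the point about phases concrete, here is a rough sketch that writes a cloud-config using both bootcmd and runcmd, then validates it locally. The file name, the log path, and the echo commands are illustrative only. The schema subcommand is assumed to be available as cloud-init schema; on older releases it may live under cloud-init devel schema or be missing entirely.

```shell
#!/bin/sh
# Write an illustrative cloud-config to a scratch file (the name is arbitrary).
config_file="${TMPDIR:-/tmp}/user-data.yaml"

cat > "$config_file" <<'EOF'
#cloud-config
# bootcmd runs early, on every boot.
bootcmd:
  - echo "bootcmd ran" >> /var/log/stage-demo.log
# runcmd is executed late, during the first boot of the instance.
runcmd:
  - echo "runcmd ran" >> /var/log/stage-demo.log
EOF

# Validate the file when a cloud-init with the schema subcommand is present.
if command -v cloud-init >/dev/null 2>&1; then
  cloud-init schema --config-file "$config_file" \
    || echo "schema check failed or is unsupported on this version" >&2
fi
```

Validating before launch is cheaper than discovering a YAML mistake in the logs of a freshly booted instance.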
Disabling Cloud Init
If you need to, you can prevent Cloud Init from running. This can be accomplished in a couple of different ways. The easiest is to add a file at AMI build time:
touch /etc/cloud/cloud-init.disabled
You can also add a parameter to the kernel command line, which then shows up in /proc/cmdline:
cloud-init=disabled
It’s also possible to disable only the user data by setting the allow_userdata parameter in /etc/cloud/cloud.cfg:
allow_userdata: false
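When an instance seems to ignore its user data, it can help to confirm whether one of the first two switches is in effect. The following is a minimal sketch that checks only the marker file and the kernel command line described above; DISABLE_FILE, CMDLINE_FILE, and cloud_init_disable_reason are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Paths are overridable so the check can be tried against sample files.
DISABLE_FILE="${DISABLE_FILE:-/etc/cloud/cloud-init.disabled}"
CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"

# Print which of the two mechanisms is in effect, if any.
cloud_init_disable_reason() {
  if [ -e "$DISABLE_FILE" ]; then
    echo "disabled by $DISABLE_FILE"
  elif [ -r "$CMDLINE_FILE" ] \
    && tr ' ' '\n' < "$CMDLINE_FILE" | grep -qx 'cloud-init=disabled'; then
    echo "disabled by kernel command line"
  else
    echo "no disable marker found"
  fi
}

cloud_init_disable_reason
```

This does not inspect allow_userdata, which only switches off user data handling and leaves the rest of Cloud Init running.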
Troubleshooting Cloud Init
Logs
Occasionally, you may want to dig deeper into Cloud Init. Maybe your user data isn’t executing how you expect, or it is taking longer than expected. Fortunately, Cloud Init tracks a lot of details for debugging.
The main logs are:
/var/log/cloud-init.log
/var/log/cloud-init-output.log
These logs can be examined with the analyze sub-command of the cloud-init command, which parses them into a more usable format.
There are also logs in the /run/cloud-init directory. These relate more to the inner workings and decisions of Cloud Init.
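As a starting point for log triage, the sketch below pulls warning and error lines out of the main log and then asks the analyze sub-command for timing summaries. It assumes log lines carry the level in square brackets (for example [WARNING]), which may vary with local logging configuration; log_problems is a hypothetical helper.

```shell
#!/bin/sh
# Print log lines at WARNING level or above; the bracketed format is an assumption.
log_problems() {
  grep -E '\[(WARNING|ERROR|CRITICAL)\]' "$1"
}

log_file="/var/log/cloud-init.log"
if [ -r "$log_file" ]; then
  log_problems "$log_file" || echo "no warnings or errors logged"
fi

# The analyze sub-commands summarize timing from the same log.
if command -v cloud-init >/dev/null 2>&1; then
  cloud-init analyze show  || true   # per-stage timeline
  cloud-init analyze blame || true   # events sorted by time taken
fi
```

The blame view is usually the quickest way to spot which module is responsible for a slow boot.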
Data Files
The /var/lib/cloud/ directory is where the data files are kept. A handy file here is data/status.json, which lists the stages that ran and the start and finish times for each one (in epoch format).
[ec2-user@ip-10-0-0-60 data]$ cat /var/lib/cloud/data/status.json
{
  "v1": {
    "datasource": "DataSourceEc2",
    "init": {
      "errors": [],
      "finished": 1655096178.478916,
      "start": 1655096152.503821
    },
    "init-local": {
      "errors": [],
      "finished": 1655096151.389412,
...File snipped for brevity
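Because the timestamps are epoch seconds, a stage's duration is simply finished minus start. Here is a small sketch of that arithmetic using awk; stage_seconds is a hypothetical helper, and the two values are copied from the init stage in the output above.

```shell
#!/bin/sh
# Seconds elapsed between a stage's start and finished epoch timestamps.
stage_seconds() {
  awk -v start="$1" -v finished="$2" 'BEGIN { printf "%.3f\n", finished - start }'
}

# Values copied from the "init" stage in the output above.
stage_seconds 1655096152.503821 1655096178.478916   # prints 25.975
```

So the init stage on this instance took roughly 26 seconds.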
Configuration Files
Config files are kept in /etc/cloud/cloud.cfg and the /etc/cloud/cloud.cfg.d/ directory.
Useful Cloud Init Commands to Know
Systems equipped with Cloud Init come with a binary used to interact with it. The command to use is cloud-init.
One of the most useful commands is cloud-init status, which returns the status of the Cloud Init run. An optional --long flag gives more detail:
[ec2-user@ip-10-0-0-41 ~]$ sudo cloud-init status
status: running
[ec2-user@ip-10-0-0-41 ~]$ sudo cloud-init status --long
status: done
time: Mon, 13 Jun 2022 04:47:45 +0000
detail:
DataSourceEc2
The cloud-init status command also has another great flag: --wait. This flag waits until Cloud Init has completed before returning. It's helpful if you are using AWS CodeDeploy or a configuration management system that phones home on startup but isn't tied to Cloud Init for some reason. There is a very real chance that CodeDeploy may start before Cloud Init is finished, which means any configuration, binaries, or environment variables set by your user data script would not yet be available.
[ec2-user@ip-10-0-0-41 ~]$ sudo cloud-init status --wait
..................
status: done
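One way to apply this is to gate the start of a deployment step on --wait. The sketch below is a rough illustration, not a CodeDeploy-specific recipe: wait_for_cloud_init and the CLOUD_INIT_BIN override are hypothetical names, and it assumes a non-zero exit from cloud-init status --wait indicates a failed run.

```shell
#!/bin/sh
# CLOUD_INIT_BIN is a hypothetical override, handy for dry runs; it defaults to the real CLI.
wait_for_cloud_init() {
  bin="${CLOUD_INIT_BIN:-cloud-init}"
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "cloud-init not found; skipping wait" >&2
    return 0
  fi
  # Blocks until Cloud Init finishes; a non-zero exit is treated as a failed run.
  "$bin" status --wait >/dev/null
}

if wait_for_cloud_init; then
  echo "cloud-init finished; safe to start the deployment"
else
  echo "cloud-init reported a problem; check /var/log/cloud-init.log" >&2
fi
```

Placing a call like this at the top of a deployment hook keeps the application from starting before its dependencies are in place.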
Another useful command is cloud-init query, which references the cached instance metadata that was captured by Cloud Init:
[ec2-user@ip-10-0-0-41 ~]$ sudo cloud-init query cloud_name
aws
[ec2-user@ip-10-0-0-41 ~]$ sudo cloud-init query availability_zone
us-west-2b
Wrap Up
Knowing more about Cloud Init and how to properly leverage it can be extremely advantageous to multiple facets of an org. It can make Cloud Engineers and System Administrators’ lives easier by reducing the need for configuration tooling and AMI rotations. It can also speed up application deployments.
The documentation for Cloud Init is pretty in-depth and a valuable resource. It has great details on many of the cloud providers’ implementations of the metadata service. The documentation also has information about creating custom modules that can be injected and executed just like runcmd or mounts.
Now that you know, go take advantage of it!