Linux’s Cloud Init — Benefits, Quirks, and Drawbacks

TLDR

Cloud Init is an invaluable resource for Cloud Engineers and Software Developers alike.
It’s a straightforward service on the surface but can be customized to fit whatever needs an org may have.
Cloud Init isn’t only AWS EC2 user data; it does network configuration, vendor configuration, and provides metadata services.

You’re probably using Cloud Init and don’t even realize it. Created by Canonical in the early days of EC2, it helped revolutionize how we treat our servers and how runtime initialization is conducted. Since its inception, it has been one of the primary methods of early configurations for our infrastructure. It’s also run in the stacks of every major public cloud provider and many private cloud environments like LXD, KVM, and OpenStack.

Cloud Init allows engineers to reduce or even eliminate package installs or configurations during application deployment. “Why should I need to install ImageMagick on every single Rails deployment?” Similarly, Cloud Init can provide breathing room between OS image builds. Because security patching can run as a part of Cloud Init, you can rotate your AMI on a more manageable basis, such as a weekly cadence.

How Does Cloud Init Work

Cloud Init Stages

Cloud Init works in five stages.

First, for systemd machines, is the Generator stage. If you're unfamiliar with systemd, a generator is a binary executed early in the boot process to dynamically generate unit files, symlinks, and more. Cloud Init's generator determines if the rest of the Cloud Init process should continue. If so, Cloud Init is included in the list of boot goals for the system.
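
To see what the generator decided on a running system, one option is to look for the marker files it leaves behind. This is a minimal sketch, assuming the generator records its decision as an "enabled" or "disabled" file under /run/cloud-init, which is how the cloud-init packages I have looked at behave. The function name and its optional directory argument are made up for illustration.

```shell
#!/bin/sh
# Report what the systemd generator decided, based on marker files.
# Assumption: the generator leaves "enabled" or "disabled" in /run/cloud-init.
generator_decision() {
    run_dir=${1:-/run/cloud-init}   # argument exists only so sample dirs can be tried
    if [ -e "$run_dir/enabled" ]; then
        echo "enabled"
    elif [ -e "$run_dir/disabled" ]; then
        echo "disabled"
    else
        echo "unknown"
    fi
}

generator_decision
```

On a machine without Cloud Init, or where the packaging differs, the function simply reports "unknown".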

Next is the Local phase. This phase runs the cloud-init-local.service systemd service as early as possible. Essentially, its entire purpose is to locate data sources and generate (or apply) networking configurations for the system. It's worth noting that this phase blocks much of the boot process, including the network initialization.

The Network phase continues the Cloud Init boot. This phase relies on networking being up (and, by association, the Local phase). This stage runs the modules listed under cloud_init_modules. These might be things such as the mounts and bootcmd options.

After the Network phase is the Config phase, which runs the modules that don't affect any other stages. Specifically, it runs the modules listed under cloud_config_modules in the Cloud Init configuration. runcmd is included in this step.
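
If you are curious which modules a given image runs in each stage, the module lists can be read straight out of the main config file. The sketch below assumes the lists live under the cloud_init_modules, cloud_config_modules, and cloud_final_modules keys of /etc/cloud/cloud.cfg as simple YAML lists. It is a plain text scan rather than a real YAML parser, and list_stage_modules is a name invented for this example.

```shell
#!/bin/sh
# Print the list items found under one top-level key of a cloud.cfg-style file.
# Usage: list_stage_modules cloud_config_modules [path-to-cloud.cfg]
list_stage_modules() {
    key=$1
    cfg=${2:-/etc/cloud/cloud.cfg}
    awk -v key="$key:" '
        index($0, key) == 1      { in_list = 1; next }          # start of the wanted list
        in_list && /^[ \t]*-/    { sub(/^[ \t]*-[ \t]*/, ""); print; next }
        in_list && /^[^ \t#]/    { in_list = 0 }                # next top-level key ends it
    ' "$cfg"
}

if [ -r /etc/cloud/cloud.cfg ]; then
    for key in cloud_init_modules cloud_config_modules cloud_final_modules; do
        echo "== $key"
        list_stage_modules "$key"
    done
fi
```

Entries that carry a frequency, such as a bracketed pair, are printed as they appear in the file.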

Cloud Init closes out with the Final phase. Running any cloud_final_modules, this phase runs as late as possible. It is the stage that includes any user data scripts and configuration management tooling (Puppet, Chef, etc.).
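
On systemd machines, each stage after the generator is backed by its own unit, so you can check on them individually. The unit names below are an assumption based on what many distributions ship; newer Cloud Init releases may use a different name for the Network stage unit, so treat the list as a starting point rather than a reference.

```shell
#!/bin/sh
# Units that commonly back the Local, Network, Config, and Final stages.
# Assumption: these names match your distribution's cloud-init packaging.
STAGE_UNITS="cloud-init-local.service cloud-init.service cloud-config.service cloud-final.service"

show_stage_units() {
    for unit in $STAGE_UNITS; do
        if command -v systemctl >/dev/null 2>&1; then
            # is-active prints a state such as "active" or "inactive"
            state=$(systemctl is-active "$unit" 2>/dev/null || true)
        else
            state=""
        fi
        printf '%s: %s\n' "$unit" "${state:-unknown}"
    done
}

show_stage_units
```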

Instance Metadata

Each server using Cloud Init also has a collection of data that Cloud Init uses to configure the instance. This includes what we generally think of as instance metadata on EC2 instances but also more.

Some providers will create or attach a config drive containing metadata service information files. OpenStack is an example of one such provider.

While we interact with user data, cloud providers can also implement vendor data. The idea here is the same as user data; it exists to allow the cloud provider to customize the image at runtime. Some potential vendor data tasks might involve setting the instance’s hostname or configuring package repository paths. Vendor data can be disabled if desired. It’s also worth mentioning that user data overrides vendor data when Cloud Init determines the final configuration.
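
As a rough illustration of disabling vendor data, the sketch below writes a user data file that turns it off. It assumes the vendor_data key with an enabled flag, as described in the cloud-config examples; the function name and the temporary output path are placeholders for this example only.

```shell
#!/bin/sh
# Write a cloud-config user data file that asks Cloud Init to skip vendor data.
write_no_vendor_userdata() {
    out=$1
    cat > "$out" <<'EOF'
#cloud-config
vendor_data:
  enabled: false
EOF
}

userdata=$(mktemp)              # throwaway location, for illustration only
write_no_vendor_userdata "$userdata"
cat "$userdata"
```

You would then pass the resulting file as the instance's user data when launching it.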

Getting Started with Cloud Init

Cloud Init can be given instructions in two common ways: a shell script or a YAML-formatted cloud-config file. Both approaches are pretty straightforward:

#!/bin/sh

sudo yum --assumeyes --security update-minimal

Or, the equivalent cloud-config:

#cloud-config
runcmd:
 - sudo yum --assumeyes --security update-minimal

The script option is pretty easy to understand. As mentioned above, it’s executed in the Final phase. The cloud-config option is more interesting since you can set up modules to run in the different phases, such as the bootcmd option. Check out the module reference page for a complete list of available modules. There is also a great list of example configurations on the cloud-config examples page.
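
To make the difference between phases concrete, here is a sketch of a cloud-config that uses both bootcmd and runcmd, followed by an optional syntax check. The log path in the bootcmd entry is hypothetical. The check assumes a Cloud Init release that provides the "cloud-init schema" sub-command; older releases expose it differently, so the script only warns if the call fails.

```shell
#!/bin/sh
# Write a sample cloud-config that touches two different phases.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
#cloud-config
bootcmd:
  # runs early in the boot; the log path is a made-up example
  - echo "bootcmd ran" >> /var/log/example-bootcmd.log
runcmd:
  # user data runs as root, so sudo is not required here
  - yum --assumeyes --security update-minimal
EOF

# Validate the file if the tooling is available (assumed sub-command).
if command -v cloud-init >/dev/null 2>&1; then
    cloud-init schema --config-file "$cfg" || echo "schema check failed or unsupported" >&2
else
    echo "cloud-init not installed; wrote $cfg without validating"
fi
```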

Disabling Cloud Init

If you ever need to, you can prevent Cloud Init from running. There are a couple of ways to do this. The easiest is to add a file at AMI build time:

touch /etc/cloud/cloud-init.disabled

You can also add a parameter to the kernel command line (visible in /proc/cmdline):

cloud-init=disabled

It’s also possible to disable only the user data by setting the allow_userdata parameter in /etc/cloud/cloud.cfg:

allow_userdata: false
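
When a machine unexpectedly skips Cloud Init, it can help to check whether one of the first two switches is set. The helper below is only a sketch: the function name is invented, and the two path arguments exist so it can be tried against sample files instead of the live system.

```shell
#!/bin/sh
# Report whether the disable file or the kernel command line switch is set.
cloud_init_disabled_by() {
    marker=${1:-/etc/cloud/cloud-init.disabled}
    cmdline=${2:-/proc/cmdline}
    if [ -e "$marker" ]; then
        echo "file"
    elif [ -r "$cmdline" ] && tr ' ' '\n' < "$cmdline" | grep -qx 'cloud-init=disabled'; then
        echo "kernel-cmdline"
    else
        echo "none"
    fi
}

cloud_init_disabled_by
```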

Troubleshooting Cloud Init

Logs

Occasionally, you may want to dig deeper into Cloud Init. Maybe your user data isn’t executing how you expect or possibly taking longer than expected. Fortunately, Cloud Init tracks a lot of details for debugging.

The main logs are:

/var/log/cloud-init.log
/var/log/cloud-init-output.log

The cloud-init command can read these logs through its analyze sub-command, which parses them into a more usable format.
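
A quick way to try the analyze sub-command is a small wrapper like the one below. It is a sketch that uses the show and blame reports; the wrapper name is invented, and reading the logs may require root on your system.

```shell
#!/bin/sh
# Summarize where boot time went, if cloud-init is installed.
analyze_boot() {
    if ! command -v cloud-init >/dev/null 2>&1; then
        echo "cloud-init not found" >&2
        return 0
    fi
    # Timeline of events, grouped by stage
    cloud-init analyze show || echo "analyze show failed" >&2
    # Events ordered by how long they took
    cloud-init analyze blame || echo "analyze blame failed" >&2
}

analyze_boot
```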

There are also logs in the /run/cloud-init directory. These logs are more related to some of the inner workings and decisions of Cloud Init.

Data Files

The /var/lib/cloud/ directory is where the data files are kept. A handy file in this directory is data/status.json. It lists the stages that ran and the start/finish times for each one (in epoch format).

[ec2-user@ip-10-0-0-60 data]$ cat /var/lib/cloud/data/status.json
{
 "v1": {
  "datasource": "DataSourceEc2",
  "init": {
   "errors": [],
   "finished": 1655096178.478916,
   "start": 1655096152.503821
  },
  "init-local": {
   "errors": [],
   "finished": 1655096151.389412,
...File snipped for brevity
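
Since the times are epoch values, a little arithmetic turns them into stage durations. The sketch below assumes the pretty-printed layout shown in the excerpt, with one key per line, and uses awk rather than a real JSON parser. stage_duration is a name made up for this example.

```shell
#!/bin/sh
# Print how many seconds a stage took, based on its start/finished values.
# Usage: stage_duration init [path-to-status.json]
stage_duration() {
    stage=$1
    file=${2:-/var/lib/cloud/data/status.json}
    awk -v stage="\"$stage\":" '
        $1 == stage                        { in_stage = 1; next }
        in_stage && $1 == "\"finished\":"  { gsub(/,/, "", $2); finished = $2 }
        in_stage && $1 == "\"start\":"     { gsub(/,/, "", $2); start = $2 }
        in_stage && $1 ~ /^}/              { in_stage = 0 }
        END {
            # Print nothing if either value is missing or not a number
            if (start ~ /^[0-9.]+$/ && finished ~ /^[0-9.]+$/)
                printf "%.3f\n", finished - start
        }
    ' "$file"
}

if [ -r /var/lib/cloud/data/status.json ]; then
    stage_duration init
fi
```

With the init values from the excerpt, the difference works out to roughly 26 seconds.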

Configuration Files

Config files are kept in /etc/cloud/cloud.cfg and the /etc/cloud/cloud.cfg.d/ directory.
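
To see every config file that applies on a machine, you can list the main file followed by the drop-ins. This sketch relies on my understanding that the drop-in directory is read in sorted order, with later files taking precedence, and that only files ending in .cfg are picked up. The function name and its base-directory argument are illustrative.

```shell
#!/bin/sh
# List cloud.cfg and then the drop-in files in the order the shell sorts them.
list_cloud_cfg_files() {
    base=${1:-/etc/cloud}
    if [ -f "$base/cloud.cfg" ]; then
        echo "$base/cloud.cfg"
    fi
    for f in "$base"/cloud.cfg.d/*.cfg; do
        # Skip the unexpanded pattern when the directory has no .cfg files
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
}

list_cloud_cfg_files
```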

Useful Cloud Init Commands to Know

Systems equipped with Cloud Init come with a binary used to interact with it. The command to use is cloud-init.

One of the most useful commands is cloud-init status, which returns the status of the Cloud Init run. An optional --long flag gives more detail:

[ec2-user@ip-10-0-0-41 ~]$ sudo cloud-init status
status: running
[ec2-user@ip-10-0-0-41 ~]$ sudo cloud-init status --long
status: done
time: Mon, 13 Jun 2022 04:47:45 +0000
detail:
DataSourceEc2

The cloud-init status command also has another great flag: --wait. This flag waits until Cloud Init is completed before returning. It's helpful if you are using AWS CodeDeploy or a configuration management system that phones home on startup but isn't tied to Cloud Init for some reason. There is a very real chance that the CodeDeploy agent will start before Cloud Init has finished, which means any configuration, binaries, or environment variables set by your user data script would not be available yet.

[ec2-user@ip-10-0-0-41 ~]$ sudo cloud-init status --wait
..................
status: done
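
One way to put --wait to work is a small wrapper that holds a command back until Cloud Init is done. The sketch below is an assumption about how you might wire it up, not something CodeDeploy provides; run_after_cloud_init is a made-up name, and the echo at the end stands in for whatever you actually need to start.

```shell
#!/bin/sh
# Block until Cloud Init has finished, then run the given command.
run_after_cloud_init() {
    if command -v cloud-init >/dev/null 2>&1; then
        # --wait returns once Cloud Init completes; warn but continue on error
        cloud-init status --wait >/dev/null 2>&1 \
            || echo "warning: cloud-init reported a problem" >&2
    fi
    "$@"
}

# Placeholder command; replace with your agent or deployment step
run_after_cloud_init echo "cloud-init finished, starting the application"
```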

Another useful command is cloud-init query, which reads the cached instance metadata that was captured by Cloud Init:

[ec2-user@ip-10-0-0-41 ~]$ sudo cloud-init query cloud_name
aws
[ec2-user@ip-10-0-0-41 ~]$ sudo cloud-init query availability_zone
us-west-2b

Wrap Up

Knowing more about Cloud Init and how to properly leverage it can be extremely advantageous to multiple facets of an org. It can make life easier for Cloud Engineers and System Administrators by reducing the need for configuration tooling and AMI rotations. It can also speed up application deployments.

The documentation for Cloud Init is pretty in-depth and a valuable resource. It has great details on many of the cloud providers’ implementations of the metadata service. The documentation also has information about creating custom modules that can be injected and executed just like runcmd or mounts.

Now that you know, go take advantage of it!