In order to understand what image caching is and why it is beneficial, it helps to be familiar with the process by which an instance is booted from a given base image. When a new instance is created on a compute node, the following general steps are performed by the compute manager in conjunction with the virt driver:

1. Download the base image from glance to local storage on the compute node
2. Copy or COW the base image to create a new root disk image for the instance
3. Boot the instance using the new root disk image
The first step involves downloading the entire base image to the local disk on the compute node, which can mean many gigabytes of network traffic and storage, and many minutes of latency between the start of the boot process and actually running the instance.

When the virt driver supports image caching, step #1 above may be skipped if the base image is already present on the compute node. This is most often the case when another instance has recently been booted on that node from the same base image. If the image is present, the download operation can be skipped, which greatly reduces the time-to-boot for the second and subsequent instances that use the same base image, and avoids load on the glance server and the network connection.
By default, the compute node periodically scans its cached images, looking for base images that are not used by any instance on the node and are older than a configured lifetime (24 hours by default). Those unused images are deleted from the cache directory and will be downloaded again if they are needed later.
For more information about configuring image cache behavior, see the documentation for the following configuration options:
image_cache_subdirectory_name
image_cache_manager_interval
remove_unused_base_images
remove_unused_original_minimum_age_seconds
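As a rough illustration, these options might be set in nova.conf as in the minimal sketch below. The values shown are the documented defaults; note that recent releases group these options under an [image_cache] section with shortened names, so the [DEFAULT] placement here is an assumption to verify against your release:

    [DEFAULT]
    # Subdirectory of instances_path where cached base images are kept
    # (the default is "_base", e.g. /var/lib/nova/instances/_base)
    image_cache_subdirectory_name = _base

    # Seconds between runs of the image cache manager (default 2400)
    image_cache_manager_interval = 2400

    # Whether unused base images should be removed at all
    remove_unused_base_images = true

    # Minimum age, in seconds, an unused base image must reach before it
    # is removed; 86400 matches the 24 hour default lifetime noted above
    remove_unused_original_minimum_age_seconds = 86400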
Note
Some ephemeral backend drivers may not use or need image caching, or may not behave in the same way as others. For example, when using the rbd backend with the libvirt driver and a shared pool with glance, images are COW'd at the storage level and thus need not be downloaded (and thus cached) at the compute node at all.
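To make the rbd example concrete, a minimal sketch of the relevant libvirt driver settings is shown below. The pool name and path are placeholders, and this assumes glance stores its images in the same Ceph cluster so that storage-level COW clones are possible:

    [libvirt]
    # Store instance disks in Ceph rather than on local compute node disk
    images_type = rbd
    # Pool for instance disks (placeholder name); glance must be backed by
    # the same Ceph cluster for COW cloning to work
    images_rbd_pool = vms
    images_rbd_ceph_conf = /etc/ceph/ceph.conf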
Generally the size of the image cache is not part of the data Nova includes when reporting available or consumed disk space. This means that when nova-compute reports 100G of total disk space, the scheduler will assume that 100G of instances may be placed there. Usually disk is the most plentiful resource and thus the last to be exhausted, so this is often not problematic. However, if many instances are booted from distinct images, all of which need to be cached in addition to the disk space used by the instances themselves, Nova may overcommit the disk unintentionally by failing to consider the size of the image cache.
There are two approaches to addressing this situation, sketched in the configuration example below:

1. A static amount of disk space can be reserved on each compute node to account for the expected size of the image cache, so that space is excluded from the total reported by nova-compute and accurately considered by the scheduler.

2. Enabling the workarounds.reserve_disk_resource_for_image_cache option will cause nova-compute to periodically update the reserved disk amount to include the statically configured value, as well as the amount currently consumed by the image cache. This will cause the scheduler to see the available disk space decrease as the image cache grows. This is not updated synchronously and thus is not a perfect solution, but should vastly increase the scheduler's visibility, resulting in better decisions. (Note this solution is currently libvirt-specific.)

As above, not all backends and virt drivers use image caching, and thus a third option may be to consider alternative infrastructure to eliminate this problem altogether.
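As a rough sketch, combining the first two approaches in nova.conf might look like the following; the reservation value is an arbitrary example to be tuned per deployment:

    [DEFAULT]
    # Statically reserve 20 GiB of disk on this node to account for the
    # expected image cache size (example value)
    reserved_host_disk_mb = 20480

    [workarounds]
    # Additionally have nova-compute periodically grow the reserved disk
    # amount to include the actual image cache size (libvirt-specific)
    reserve_disk_resource_for_image_cache = true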