Non-uniform memory access (NUMA) is the phenomenon that memory at various points in the address space of a processor has different performance characteristics. At current processor speeds, the signal path length from the processor to memory plays a significant role. Increased signal path length not only increases latency to memory but also quickly becomes a throughput bottleneck if the signal path is shared by multiple processors.
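The effect can be observed directly by timing access to memory placed on the local node versus a remote node. The following is only a rough sketch using libnuma (compile with -lnuma); it assumes a machine with at least two NUMA nodes, and the simple strided loop gives just a crude indication, since caches and prefetching hide part of the latency.

/* Crude comparison of access time to memory on the local node versus a
 * remote node. Assumes at least two NUMA nodes; link with -lnuma. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE (256UL * 1024 * 1024)    /* 256MB working set */

static double touch(volatile char *buf, size_t size)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < size; i += 64)  /* step by one cache line */
        buf[i]++;
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}

int main(void)
{
    if (numa_available() < 0 || numa_max_node() < 1) {
        fprintf(stderr, "need a NUMA system with at least two nodes\n");
        return 1;
    }
    numa_run_on_node(0);                            /* execute on node 0 */

    char *local  = numa_alloc_onnode(BUF_SIZE, 0);  /* memory on node 0 */
    char *remote = numa_alloc_onnode(BUF_SIZE, 1);  /* memory on node 1 */
    if (!local || !remote)
        return 1;
    memset(local, 0, BUF_SIZE);                     /* fault the pages in */
    memset(remote, 0, BUF_SIZE);

    printf("local node:  %.3f s\n", touch(local, BUF_SIZE));
    printf("remote node: %.3f s\n", touch(remote, BUF_SIZE));

    numa_free(local, BUF_SIZE);
    numa_free(remote, BUF_SIZE);
    return 0;
}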
How Does Linux Handle NUMA?
Linux manages memory in zones. In a non-NUMA Linux system, zones are used to describe memory ranges required to support devices that are not able to perform DMA (direct memory access) to all memory locations. Zones are also used to mark memory for other special needs, such as movable memory or memory that requires explicit mappings for access by the kernel (HIGHMEM), but that is not relevant to the discussion here. When NUMA is enabled, additional memory zones are created, and each zone is associated with a NUMA node.
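The per-node zones can be seen on a running system in /proc/zoneinfo, where every zone header names the node it belongs to. A minimal sketch that prints just those headers:

/* Print the zone headers from /proc/zoneinfo (e.g. "Node 0, zone Normal")
 * to show which memory zones exist and which NUMA node each belongs to. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/zoneinfo", "r");
    if (!f) {
        perror("/proc/zoneinfo");
        return 1;
    }
    char line[256];
    while (fgets(line, sizeof(line), f))
        if (strncmp(line, "Node", 4) == 0)   /* zone headers start with "Node" */
            fputs(line, stdout);
    fclose(f);
    return 0;
}

On a typical two-node x86-64 system this shows zones such as DMA and DMA32 on node 0 and a Normal zone on each node.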
NUMA scheduling.
The Linux scheduler had no notion of the page placement of memory in a process until Linux 3.8. Decisions about migrating processes were made based on an estimate of the cache hotness of a process’s memory. If the Linux scheduler moved the execution of a process to a different NUMA node, the performance of that process could be significantly impacted because its memory would now require access via the cross-connect. Once that move was complete, the scheduler would estimate that the process’s memory was cache hot on the remote node and leave the process there as long as possible. As a result, administrators who wanted the best performance felt it best not to let the Linux scheduler interfere with memory placement.
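In practice that meant binding both the execution and the memory of a process to one node, either with the numactl utility (for example, numactl --cpunodebind=0 --membind=0 ./app) or from within the program through libnuma. A minimal sketch of the latter, assuming node 0 exists; link with -lnuma:

/* Pin both CPU placement and memory allocation to NUMA node 0 so the
 * scheduler cannot separate the process from its memory. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    struct bitmask *node0 = numa_parse_nodestring("0");
    if (!node0)
        return 1;
    numa_bind(node0);              /* restrict CPUs and memory to node 0 */
    numa_free_nodemask(node0);

    /* Memory the process touches from here on is allocated on node 0,
     * and the process runs only on node 0's CPUs. */
    char *buf = malloc(64 * 1024 * 1024);
    if (!buf)
        return 1;
    /* ... application work ... */
    free(buf);
    return 0;
}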
NUMA support has been around for a while in various operating systems. NUMA support in Linux has been available since early 2000 and is being continually refined. Frequently, kernel NUMA support will optimize process execution without the need for user intervention, and in most use cases an operating system can simply be run on a NUMA system, providing decent performance for typical applications. Special NUMA configuration through tools and kernel configuration comes into play when the heuristics provided by the operating system do not provide satisfactory application performance to the end user.