No pressure, Mon!
Handling low memory conditions in iOS and Mavericks
Jonathan Levin, http://newosxbook.com/ - 11/03/13
Memory pressure in OS X and iOS is a very important aspect of virtual memory management which has been explored little in my book [1]. While I refer to Jetsam/memorystatus, the mechanism has undergone significant changes over time, culminating in a few very important sysctls and system calls recently introduced in Mavericks. While working on my version of Process Explorer for OS X and iOS, I got to encounter these new additions head on - and am therefore documenting them here. This is meant as an addendum to chapter 12 of the book, but can be read on its own, as well.
Why should you care? (Target Audience)
Physical memory (RAM), alongside CPU, is the scarcest resource in the system, and the one most likely to cause contention as apps vie for every bit available. More memory for an app directly correlates to better performance - usually at the cost of others. In iOS, where there is no swap space to fall back on, memory is an even more critical resource. This article is meant to make you think twice before the next time you call malloc() or mmap(), as well as elucidate the most common cause of crashes on iOS - those corresponding to low system memory.
Prerequisite: Virtual Memory in a nutshell
Whatever it is an application is programmed for, it must operate in a memory space. This space is where an application may hold its own code, data, and state. Naturally, one benefits if such a space is isolated from other applications, to provide for more security and stability. We call this space the virtual memory of the application, and it is one of the defining characteristics of the application as a process: All of an application's threads will share the same virtual memory space, and are thus defined to be in the same process.
The term "virtual" in virtual memory implies that the memory space, while very tangible to the process in question, does not exactly correspond to real memory on the system. This manifests itself in several ways:
- The virtual memory space can exceed the amount of real memory available - Depending on the processor word size and OS in question, the virtual memory space can be up to 4GB (32 bits), or 256TB (64 bits) [1]. This, especially in the latter case, can far exceed the amount of actual memory available.
- Virtual memory can, in fact, not exist at all - Given such a huge memory space which exceeds the physical memory backing capabilities, the system will only bother backing virtual memory with physical memory if an application explicitly requested it (that is, allocated it). A process's virtual memory image is, therefore, quite sparse, with "islands" of memory inside a vast ocean of nothingness.
- Even when allocated, virtual memory may still be quite virtual - Just because you call malloc(3) doesn't mean the system should jump to it and physically commit your memory by finding the appropriate amount of RAM to back it. Most often, programmers allocate far more than they need. The malloc(3) operation, therefore, only allocates page table entries, but seldom commits the memory itself. It is actually accessing the memory (say, by memset(3)-ing it) which will cause the physical allocation.
- The system may back up memory on the disk or network - Otherwise known as "swapping" memory out to a backing store. OS X traditionally uses swap files (in /var/vm). iOS has no swap.
- Virtual memory you use may or may not be shared - The operating system reserves the right to implicitly share your virtual memory with other processes. This applies to file-backed memory you use (that is, memory claimed by a call to mmap(2)). If your process and another process mmap(2) the same file, the OS can give you each your private virtual copy, which is in fact backed by a single physical copy. Said physical copy will be marked unwritable. So long as everyone reads from the memory, a single copy suffices. If anyone, however, writes to such implicitly shared memory, the writing process will trigger a page fault, which will cause the kernel to perform a copy-on-write (COW), which produces a new physical copy whose contents may be modified.
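The lazy commitment described in the list above can be observed from user space. What follows is a minimal sketch, not a definitive tool: it assumes mincore(2) reports residency the usual way (low bit of each vector byte), and the names resident_pages and lazy_commit_demo are made up for this illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes mincore() and MAP_ANON on glibc; harmless elsewhere */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages in [addr, addr+len) that are resident in RAM, or -1 on error. */
long resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = calloc(npages, 1);
    long count = -1;

    if (vec == NULL)
        return -1;
    /* The vector's declared type differs between Darwin and Linux, hence the void * cast. */
    if (mincore(addr, len, (void *)vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;   /* low bit set: page is in core */
    }
    free(vec);
    return count;
}

/* Map npages of anonymous memory and report residency before and after touching it.
 * Returns 0 on success, -1 on failure. */
int lazy_commit_demo(size_t npages, long *before, long *after)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    *before = resident_pages(p, len);   /* allocated, but typically not yet backed by RAM */
    memset(p, 0xA5, len);               /* touching every page forces physical allocation */
    *after = resident_pages(p, len);
    munmap(p, len);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

Calling lazy_commit_demo(64, &before, &after) should normally show far fewer resident pages before the memset(3) than after it, though the exact "before" figure depends on the kernel.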
Putting the above together, we can arrive at the following "formula":
VSS = RSS + LSS + SwSS
- VSS: Virtual Set Size, as reported by top(1), ps(1), and others
- RSS: Resident Set Size - the actual RAM footprint of the process. Also shown in top(1), ps(1), etc.
- LSS: "Lazy" Set Size - memory which the system has agreed to allocate, but has not yet allocated
- SwSS: "Swap" Set Size - memory which was previously in RAM, but has been pushed out to swap. In iOS, this is always 0
All the above can be demonstrated succinctly by a simple example - using vmmap(1) on any random process, in this case the shell itself:
Throughout this article, the following terms are used:
- Page - The basic unit of memory management. In Intel and ARM, commonly 4k (4096), or 16K in ARM64. You can use the pagesize(1) command on OS X (or sysctl hw.pagesize on either OS) to figure out what the default page size is. Intel architectures support super pages (8k) and huge pages (2MB), but in practice those are relatively few and far between.
- Physical Memory/RAM - The finite amount of memory installed on a host (Mac or i-Device). You can use the hostinfo(1) command to obtain this value.
- Virtual Memory - Memory allocated by programs or the system itself, usually by a call to mmap(2), or higher level calls (e.g. Objective-C's [ alloc], etc). Virtual memory may be private (owned by a single process) or shared (owned by 2+ processes). Shared memory may be either explicitly or implicitly shared.
- Page Fault - occurs when the memory management unit (MMU) detects access to virtual memory which is a violation, namely one of:
- Accessing unallocated memory: Dereferencing a pointer to memory which has not previously been allocated - XNU translates that to an EXC_BAD_ACCESS exception, and the process receives a segmentation fault (SIGSEGV, Signal #11).
- Accessing allocated, but not committed memory: Dereferencing a pointer to memory which has previously been allocated, but not yet used (or madvise(2)d accordingly) - XNU intercepts that and realizes that it can no longer procrastinate, and must allocate the physical page(s). The thread which caused the fault is frozen while those pages are allocated.
- Accessing memory, but failing to comply with its permissions: Memory pages are protected by r/w/x in a similar manner to standard UNIX file permissions. Attempting to write to a read-only (r-- or r-x) page will cause a page fault which XNU will either translate to a Bus Fault (SIGBUS, Signal #10) or force a Copy-On-Write (COW) operation (if implicitly shared).
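The copy-on-write case can be demonstrated with two private mappings of one file. This is a minimal sketch under a few assumptions: a writable /tmp, standard POSIX mmap(2) semantics for MAP_PRIVATE, and a function name (cow_write_stays_private) invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map the same file twice with MAP_PRIVATE, write through one mapping, and check
 * that neither the other mapping nor the file itself sees the change.
 * Returns 1 if the write stayed private, 0 if it leaked, -1 on error. */
int cow_write_stays_private(void)
{
    char path[] = "/tmp/cow_demo_XXXXXX";
    const char original[] = "original";
    int result = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);   /* the file disappears once the descriptor is closed */
    if (write(fd, original, sizeof original) != (ssize_t)sizeof original) {
        close(fd);
        return -1;
    }

    char *a = mmap(NULL, sizeof original, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    char *b = mmap(NULL, sizeof original, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (a != MAP_FAILED && b != MAP_FAILED) {
        char on_disk[sizeof original] = {0};

        a[0] = 'O';   /* write fault: the kernel gives mapping a its own copy of the page */
        assert(a[0] == 'O');
        if (pread(fd, on_disk, sizeof on_disk, 0) == (ssize_t)sizeof on_disk)
            result = (b[0] == 'o' && on_disk[0] == 'o');
    }
    if (a != MAP_FAILED)
        munmap(a, sizeof original);
    if (b != MAP_FAILED)
        munmap(b, sizeof original);
    close(fd);
    return result;
}
```

Until the write, both mappings can be served by a single physical page; only the writer pays for a copy.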
Tools: Apple provides several important tools to inspect virtual memory:
- vmmap(1) - Inspects the virtual memory of a single process, laying out its "map" in a manner akin to Linux's /proc/<pid>/maps.
- vm_stat(1) - Provides statistics on virtual memory from a system-wide perspective. This is essentially just a wrapper over a call to the Mach host_statistics64 API, which prints out the vm_statistics64_t (from <mach/vm_statistics.h>).
- top(1) - Provides system-wide and per-process statistics relating to performance. In it, the MemRegions, PhysMem and VM statistics pertain to virtual memory.
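To give a feel for what vm_stat(1) wraps, here is a minimal sketch of the host_statistics64 call. It assumes the Mach headers from the OS X/iOS SDK; the function name free_page_count is hypothetical, and on non-Darwin systems the sketch simply returns -1.

```c
#include <assert.h>
#ifdef __APPLE__
#include <mach/mach.h>
#endif

/* Return the number of free physical pages as Mach reports it, or -1 if unavailable. */
long long free_page_count(void)
{
#ifdef __APPLE__
    vm_statistics64_data_t vmstat;
    mach_msg_type_number_t count = HOST_VM_INFO64_COUNT;

    /* HOST_VM_INFO64 asks for the system-wide virtual memory counters. */
    if (host_statistics64(mach_host_self(), HOST_VM_INFO64,
                          (host_info64_t)&vmstat, &count) != KERN_SUCCESS)
        return -1;
    return (long long)vmstat.free_count;
#else
    return -1;   /* host_statistics64 is Mach-specific */
#endif
}
```

Multiplying the result by the page size (sysctl hw.pagesize) gives the free RAM in bytes.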
Memory Pressure
Memory pressure is defined by two counters Mach keeps internally:
- vm_page_free_count: How many pages of RAM are presently free
- vm_page_free_target: How many pages of RAM, at a minimum, should optimally be free.
If the number of free pages falls below the target amount, we have a pressure situation (there are other potential cases, but I'm omitting them here for the sake of simplicity [2]). You can also use sysctl(8) to query the value of vm.memory_pressure. In OS X 10.9 and later, you can also query kern.memorystatus_vm_pressure_level, which is 1 (NORMAL), 2 (WARN) or 4 (CRITICAL).
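The same query can be made programmatically through sysctlbyname(3). The sketch below assumes the sysctl name and the 1/2/4 values quoted above; the helper names are invented here, and the code falls back to -1 where the sysctl does not exist (including non-Darwin systems).

```c
#include <assert.h>
#include <stddef.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Translate the numeric pressure level into the names used in the text. */
const char *pressure_level_name(int level)
{
    switch (level) {
    case 1:  return "NORMAL";
    case 2:  return "WARN";
    case 4:  return "CRITICAL";
    default: return "UNKNOWN";
    }
}

/* Return the current pressure level, or -1 if it cannot be queried. */
int current_pressure_level(void)
{
#ifdef __APPLE__
    int level = 0;
    size_t len = sizeof level;

    if (sysctlbyname("kern.memorystatus_vm_pressure_level", &level, &len, NULL, 0) == 0)
        return level;
#endif
    return -1;
}
```

A caller would typically print pressure_level_name(current_pressure_level()) and treat "UNKNOWN" as "not supported on this system".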
Following kernel initialization, the main thread becomes vm_pageout, and spawns a dedicated thread, aptly called vm_pressure_thread, to monitor pressure events. This thread is idle (blocking on its own continuation). The thread will be woken up from vm_pageout when pressure is detected. This behavior has been modified in XNU 2422/3 (OSX 10.9/iOS 7) (most notably packaged in vm_pressure_response).
As a side note, VM pressure handling is conditionally compiled into XNU, assuming VM_PRESSURE_EVENTS is #defined (which it is). If it isn't (say, by custom-compiling), vm_pressure_thread does nothing in 2050, and will not even be started in 2422/3. Also, in iOS kernels, defining CONFIG_JETSAM changes some of the behavior by dispatching memory handling to the memorystatus thread more frequently, as well as updating its counters (more on that later).
XNU exports the undocumented system call #296, vm_pressure_monitor (bsd/vm/vm_unix.c), which is a wrapper over mach_vm_pressure_monitor (osfmk/vm/vm_pageout.c). The system call (and, consequently, the internal Mach call) is defined as follows:
int vm_pressure_monitor(int wait_for_pressure, int nsecs_monitored, uint32_t *pages_reclaimed);
The call will either return immediately, or block (if wait_for_pressure is non-zero). It will return in pages_reclaimed how many physical pages were freed in the count of nsecs_monitored (not really nsecs so much as loop iterations). As its return value, it will provide how many pages were wanted (vm.page_free_wanted in the sysctl(8) output, above). Calling the system call is straightforward, and does not require root privileges. (Again, note you can use sysctl(8) to query vm.memory_pressure, as well, though that will not wait for pressure.)
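A minimal sketch of a one-shot call follows. It assumes the system call number (296) and argument order given above, and invokes it through the generic syscall(2) entry point, which newer SDKs mark as deprecated. The macro and function names are made up for this example, and the call is undocumented, so treat this as an experiment rather than a supported interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __APPLE__
#include <unistd.h>
#endif

#define VM_PRESSURE_MONITOR_SYSCALL 296   /* number quoted in the text; undocumented */

/* Query pressure without blocking. Returns the number of pages wanted and stores the
 * number of pages reclaimed in *reclaimed; returns -1 on failure or on non-Darwin systems. */
int pressure_oneshot(int nsecs_monitored, uint32_t *reclaimed)
{
    assert(reclaimed != NULL);
    *reclaimed = 0;
#ifdef __APPLE__
    /* First argument 0 = do not wait for pressure. */
    return syscall(VM_PRESSURE_MONITOR_SYSCALL, 0, nsecs_monitored, reclaimed);
#else
    (void)nsecs_monitored;
    return -1;
#endif
}
```

Passing a non-zero first argument instead would make the call block until pressure is detected, which is why a monitoring tool would run it in a separate thread.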
You can run process explorer with the "vmmon" argument to try this system call (otherwise, process explorer will do this for you in a separate thread when in interactive mode, to show pressure warnings). Specifying an additional parameter of "oneshot" will run the call without waiting for pressure. Otherwise, the call will wait until pressure is detected:
But how does the system actually reclaim the memory? For that, we need to involve memorystatus.
MemoryStatus and Jetsam
When XNU was ported for iOS, Apple encountered a significant challenge which arose from the mobile device constraints - no swap space. Unlike a desktop, wherein virtual memory can "spill over" to external storage, the same does not hold true here (largely due to limitations of flash memory). Memory, therefore, has become an even more important (and more scarce) resource.
Enter: MemoryStatus. This mechanism, originally introduced in iOS, is a kernel thread responsible for handling low RAM events in the only way iOS deems possible: Jettison (eject) as much RAM as possible in order to free it up for applications - even if it means killing applications along the way. This is what iOS refers to as jetsam, and can be seen in the XNU source code as #if CONFIG_JETSAM. In OS X, memorystatus instead kills only those processes marked for idle exit, which is a somewhat more gentle approach, more suitable for a desktop environment [3]. You can probably see memorystatus in action if you use dmesg, with grep:
The memorystatus thread is a separate thread (that is, not directly related to vm_pressure_thread), which is started in the BSD portion of XNU (by a call from bsd/kern/bsd_init.c). If CONFIG_JETSAM is defined (iOS), memorystatus starts another thread, memorystatus_jetsam_thread, which will essentially run in a blocking loop, waking up when necessary to kill the top processes on the memory list, as long as memorystatus_available_pages <= memorystatus_available_pages_critical, before blocking again.
In iOS, memorystatus/jetsam does not print out messages, but certainly leaves a trail of its victims' carcasses in /Library/Logs/CrashReporter/LowMemory-YYYY-MM-DD-hhmmss.plist - These logs are generated by the CrashReporter, and similar to crash logs they contain a dump. If you have a jailbroken device, an easy way to force mass executions by jetsam is to run a small binary which keeps on allocating and memset()ing memory in chunks of 8MB (left as an exercise for the avid reader). You will see applications die, until the offending binary is (eventually) slain. The logs will look something like:
(Note that you can do this on a non-jailbroken device as well: if you've configured it for development, you can create a simple iOS app in Objective-C which does the same allocations, then collect the logs via Xcode's Organizer.)
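For readers who would rather not do the exercise, the allocation loop might look roughly like this. It is a sketch only: the 8MB chunk size comes from the text, while the function names and the cap on the number of chunks are additions made here so the code can be tried safely on a desktop.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BYTES (8u * 1024u * 1024u)   /* 8MB, as suggested in the text */

/* Allocate and touch up to max_chunks chunks, recording each pointer in chunks[].
 * Returns how many chunks were actually committed. */
size_t hog_memory(size_t max_chunks, void **chunks)
{
    size_t n = 0;

    assert(chunks != NULL);
    while (n < max_chunks) {
        void *p = malloc(CHUNK_BYTES);
        if (p == NULL)
            break;                        /* allocation refused outright */
        memset(p, 0xFF, CHUNK_BYTES);     /* touching the pages forces them into RAM */
        chunks[n++] = p;
    }
    return n;
}

/* Give the memory back (something a real hog would never do). */
void release_memory(void **chunks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(chunks[i]);
}
```

To provoke jetsam on a test device, one would raise the cap far beyond the installed RAM and skip the release step; expect the process to be killed rather than to see malloc(3) fail.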
It should be noted that outright killing a process with Jetsam, while ruthless, is not all that unusual: Linux (and, by inheritance, Android) has a similar mechanism in its "OOM" (out-of-memory) killer, which keeps a (possibly adjustable) score for each process, and kills processes with a high score when a memory shortage is encountered. In desktop Linux, OOM wakes up when the system runs out of swap; in Android, a lot sooner, when RAM is running low. Whereas Android's method is score driven (the score, in effect, being a heuristic of how much RAM was used, and how frequently), iOS's approach is priority based.
As of XNU 2423, Jetsam uses "priority bands" (q.v. <sys/kern_memorystatus.h> JETSAM_PRIORITY constants), which is another way of saying that jetsam tracked processes are maintained in an array of 21 linked lists in kernel space (memstat_bucket). Jetsam will pick the first process in the lowest priority bucket (starting with 0, or JETSAM_PRIORITY_IDLE), moving to the next priority list if the current priority is empty (q.v. memorystatus_get_first_proc_locked, in bsd/kern/kern_memorystatus.c). The default priority for processes is set at 18, allowing for jetsam to choose idle and background processes before interactive and potentially important ones. This is shown in the figure below:
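The selection logic can be modeled in a few lines of user-space C. This is a simplified illustration of the idea and not XNU's code: the struct and function names are hypothetical, and only the band count (21), the idle band (0) and the default band (18) are taken from the text.

```c
#include <assert.h>
#include <stddef.h>

#define BAND_COUNT   21   /* number of linked lists, per the text */
#define BAND_IDLE     0   /* lowest priority band: first to be chosen */
#define BAND_DEFAULT 18   /* default priority for processes */

struct tracked_proc {
    int pid;
    struct tracked_proc *next;
};

struct band_table {
    struct tracked_proc *bands[BAND_COUNT];   /* one list head per priority band */
};

/* Append a process to the tail of the given band's list. */
void band_insert(struct band_table *t, struct tracked_proc *p, int band)
{
    assert(band >= 0 && band < BAND_COUNT);
    struct tracked_proc **link = &t->bands[band];

    while (*link != NULL)
        link = &(*link)->next;
    p->next = NULL;
    *link = p;
}

/* Return the first process of the lowest non-empty band, or NULL if nothing is tracked. */
struct tracked_proc *band_first_victim(const struct band_table *t)
{
    for (int b = BAND_IDLE; b < BAND_COUNT; b++)
        if (t->bands[b] != NULL)
            return t->bands[b];
    return NULL;
}
```

With one process in the default band and two in the idle band, band_first_victim returns the idle process that was inserted first, which mirrors the "idle and background before interactive" ordering.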
Jetsam has another modus operandi, which uses a process memory "high water mark", and will outright kill processes exceeding their HWM. The HWM mode in Jetsam is triggered when a task's RSS ledger exceeds a system wide limit (more accurately, this would be the task phys_footprint ledger, which accounts for RSS, but also compressed and I/O Kit related memory). The HWM can be set with memorystatus_control operation #5 (MEMORYSTATUS_CMD_SET_JETSAM_HIGH_WATER_MARK, discussed later).
On iOS, Launchd can set jetsam priority bands. Originally this was done on a per daemon basis (i.e. in its plist). It seems that nowadays the settings have been moved to
Killing a process outright because of RAM consumption may seem overly harsh, but for lack of swap, there is really little else which can be done. Prior to killing a process with Jetsam, however, memorystatus does allow a process to "redeem itself", and avoid untimely termination, by getting the memorystatus thread to first send a kernel note (a.k.a kevent) to processes which are "candidates" for termination. This knote (NOTE_VM_PRESSURE, <sys/event.h>) will be picked up by EVFILT_VM kevent() filters, like what UIKit translates to the didReceiveMemoryWarning notification, which is undoubtedly familiar to (and loathed by) iOS App developers. Darwin's LibC, LibCache and GCD are all laced with memory pressure handlers, specifically:
- Darwin's LibC (<malloc/malloc.h>) defines malloc_zone_pressure_relief (as of OS X 10.7/iOS 4.3)
- LibCache (<cache.h>) defines a cache cost (for cache_set_and_retain), which allows caches to be purged automatically when a pressure event is encountered
- GCD (<dispatch/source.h>) defines DISPATCH_SOURCE_TYPE_MEMORYPRESSURE (as of OS X 10.9)
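Two of the handlers listed above can be combined: a GCD memory pressure source whose handler asks malloc to give memory back. This is a minimal sketch in plain C (function-pointer handlers rather than blocks), assuming OS X 10.9 or later; install_pressure_handler and on_pressure are names invented for the example, and on other systems the installer just returns -1.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __APPLE__
#include <dispatch/dispatch.h>
#include <malloc/malloc.h>
#include <stdio.h>

/* Runs on a GCD queue whenever the kernel reports WARN or CRITICAL pressure. */
static void on_pressure(void *ctx)
{
    assert(ctx != NULL);
    dispatch_source_t src = (dispatch_source_t)ctx;
    unsigned long level = dispatch_source_get_data(src);

    /* Ask every malloc zone (NULL) to release as much as it can (goal 0). */
    size_t freed = malloc_zone_pressure_relief(NULL, 0);
    fprintf(stderr, "pressure flags 0x%lx, malloc released %zu bytes\n", level, freed);
}
#endif

/* Install the handler for the lifetime of the process. Returns 0 on success, -1 otherwise. */
int install_pressure_handler(void)
{
#ifdef __APPLE__
    dispatch_source_t src = dispatch_source_create(
        DISPATCH_SOURCE_TYPE_MEMORYPRESSURE, 0,
        DISPATCH_MEMORYPRESSURE_WARN | DISPATCH_MEMORYPRESSURE_CRITICAL,
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0));

    if (src == NULL)
        return -1;
    dispatch_set_context(src, src);                     /* hand the source to its own handler */
    dispatch_source_set_event_handler_f(src, on_pressure);
    dispatch_resume(src);                               /* sources are created suspended */
    return 0;
#else
    return -1;   /* GCD memory pressure sources are assumed to be Darwin-only here */
#endif
}
```

A real application would release its own caches in the handler as well; calling malloc_zone_pressure_relief alone only returns memory that malloc was already holding in reserve.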
Controlling memorystatus
Having a thread which can randomly decide on killing processes could be a bit dangerous. Apple therefore uses several APIs to "rein in" Jetsam/memorystatus. Naturally, these are private and undocumented (and Apple will likely kill *your* developer account if you use them in your apps...), but nonetheless, here they are:
- Using sysctl kern.memorystatus_jetsam_change: Jetsam's priority list can be changed from userspace. This is a bit like Linux's oom_adj, which enables processes to escape the OOM's wrath by specifying a negative adjustment number (effectively reducing their score). Likewise in iOS, launchd (which starts all apps) can set the Jetsam priority list. (As an example, q.v. com.apple.voiced.plist, which specifies JetSamMemoryLimit (8000) and JetsamPriority (-49).) The sysctl internally calls memorystatus_list_change (in bsd/kern/kern_memorystatus.c), which sets the priority and state flags (active, foreground, etc). Again - similar to what Linux would do, in this case Android's "Low Memory Killer" (which enables the runtime to tweak the OOM_ADJ according to the application/activity's foreground status, thus preferring to kill backgrounded apps first). This method works up till iOS 6.x.
- Using the memorystatus_control (#440) system call: Introduced somewhere around xnu 2107 (that is, as early as iOS 6 but not until OS X 10.9), this (undocumented) syscall enables you to control both memorystatus and jetsam (the latter, on iOS) with one of several "commands", as shown in the following table:
MEMORYSTATUS_CMD_ const availability usage
OS X 10.9, iOS 6+ Get priority list - array of memorystatus_priority_entry from <sys/kern_memorystatus.h> Example code can be seen Here
iOS only (or CONFIG_JETSAM) Update properties for a given process
iOS only (or CONFIG_JETSAM) Get Jetsam snapshot - array of memorystatus_jetsam_snapshot_t entries (from <sys/kern_memorystatus.h>)
iOS (or CONFIG_JETSAM) Privileged call: returns 1 if memorystatus_vm_pressure_level is not normal
iOS (or CONFIG_JETSAM) Sets the maximum memory utilization for a given PID, after which it may be killed. Used by launchd for processes with a memory limit
iOS 8 (or CONFIG_JETSAM) Sets the maximum memory utilization for a given PID, after which it will be killed. Used by launchd for processes with a memory limit
iOS 9 (or CONFIG_JETSAM) Sets memory limits + attributes
iOS 9 (or CONFIG_JETSAM) Retrieves memory limits + attributes
Xnu-3247 (10.11, iOS 9) Registers self to receive memory notifications
Stops self receiving memory notifications
CONFIG_JETSAM && (DEVELOPMENT || DEBUG) Test Jetsam, kill specific processes (Debug/Development kernels only)
iOS 9 && (DEVELOPMENT || DEBUG) Test Jetsam sorting (Debug/Development kernels only)
CONFIG_JETSAM && (DEVELOPMENT || DEBUG) Alter Jetsam's panic settings (Debug/Development kernels only)
- Using posix_spawnattr_setjetsam: From the posix_spawnattr family of functions, but undocumented and present only in iOS (this is how launchd handles Jetsam as of iOS 7).
- Using sysctl kern.memorypressure_manual_trigger: Used for simulating memory pressure levels, without actually hogging memory - used by OS X 10.9's memory_pressure utility (-S). This is a value from <sys/event.h>, NOTE_MEMORYSTATUS_PRESSURE_[NORMAL|WARN|CRITICAL].
Other memorystatus configurable values:
- Using sysctl kern.memorystatus_purge_on_* values (OS X): These values don't affect memorystatus so much as the pageout daemon, forcing it to purge on warning (2), urgent (5) or critical (8) values. Setting these values to 0 will disable purging.
- Using memorystatus_get_level (#453): This system call returns (into an int *) a number between 0 and 100 specifying the %-age of free memory. Diagnostics only. Used by Activity Monitor (and my Process Explorer) to show pressure in Mavericks and later.
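As a rough sketch of the last item, memorystatus_get_level can be reached through the generic syscall(2) entry point. The number (453) and the int * argument are taken from the description above; the macro and function names are invented here, syscall(2) is deprecated in newer SDKs, and the call is private, so this is suitable for experimentation only.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __APPLE__
#include <unistd.h>
#endif

#define MEMORYSTATUS_GET_LEVEL_SYSCALL 453   /* number quoted in the text; undocumented */

/* Return the percentage of free memory (0-100) as reported by the kernel,
 * or -1 if the call fails or is not available on this system. */
int free_memory_percent(void)
{
#ifdef __APPLE__
    int level = -1;

    if (syscall(MEMORYSTATUS_GET_LEVEL_SYSCALL, &level) == 0)
        return level;
#endif
    return -1;
}
```

A monitoring tool could poll this value periodically and pair it with the pressure level to decide when to warn the user.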
iOS reintroduced ledgers around iOS 5 (or 5.1?), and the concept has since been ported to OS X as well. I say "reintroduced", because ledgers have been around since the original design of Mach, but have never really been properly implemented until that point.
Ledgers can help solve the problem of excessive resource utilization. Unlike the classic UN*X model (setrlimit(2), known to users as
Down the road, it makes sense for Apple to shift entirely to a ledger based mechanism for RAM management, especially with RAM being such a scarce resource in iOS (and no swap, to boot). Jetsam will likely remain as a method of last resort.
- Mac OS X and iOS Internals, J Levin
- 3/1/2014 - Added jetsam properties plist from iPhone5s, and note about ledgers
- 2/10/2016 - Added jetsam/memorystatus commands for xnu 32xx (iOS 9, OS X 10.11). Also updated procexp to show mem limits on iOS
- In an effort to maintain simplicity, let's ignore the fact that some of the virtual memory provided for any given process is actually reserved and mapped for kernel use only. That 256TB for 64-bit, incidentally, is due to hardware imposed limits (plus the fact that nobody would actually use it all, much less 16EB of a full 64-bits). Mac OS X caps user space virtual memory at 47 bits (0x7fffffffffff) for 128TB, and the topmost (technically, 0xffffffff8...) 128TB are reserved for the kernel.
- Again, to simplify, I'm not going into the actual conditions.
- I'm not going into the process of idle demotion, wherein (as of 10.9) processes may be moved to the idle band so they can be candidates for idle exit. A process can call proc_info with PROC_INFO_CALL_DIRTYCONTROL to have the kernel track its state, seek protection from killing when "dirty" and voluntarily consent to killing when "clean" (idle). This is used with the vproc mechanism (<vproc.h>).