In addition to exporting these services, UVM has two kernel-level processes: pagedaemon and swapper. The pagedaemon process sleeps until physical memory becomes scarce. When that happens, pagedaemon is awoken. It scans physical memory, paging out and freeing memory that has not been recently used. The swapper process swaps in runnable processes that are currently swapped out, if there is room.
There are also several miscellaneous functions.
void
uvm_init(void);
void
uvm_init_limits(struct lwp *l);
void
uvm_setpagesize(void);
void
uvm_swap_init(void);
uvm_init() sets up the UVM system at system boot time, after the console has been set up. It initializes global state, the page, map, kernel virtual memory state, machine-dependent physical map, kernel memory allocator, pager and anonymous memory sub-systems, and then enables paging of kernel objects.
uvm_init_limits() initializes process limits for the named process. This is for use by the system startup for process zero, before any other processes are created.
uvm_setpagesize() initializes the uvmexp members pagesize (if not already done by machine-dependent code), pageshift and pagemask. It should be called by machine-dependent code early in the pmap_init() call (see pmap(9)).
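The relationship among the three members can be sketched in user-space C. This is a minimal sketch, not the kernel's code: the struct page_params type and the derive_page_params() function are hypothetical names, and the sketch assumes that pagemask is pagesize - 1 and that pageshift is the base-2 logarithm of pagesize, which fits the requirement that the page size be a power of 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the three uvmexp members discussed above. */
struct page_params {
	int pagesize;
	int pagemask;
	int pageshift;
};

/*
 * Derive pagemask and pageshift from an already-set pagesize.
 * Returns false if pagesize is not a positive power of two.
 */
static bool
derive_page_params(struct page_params *pp)
{
	int shift = 0;

	assert(pp != NULL);
	if (pp->pagesize <= 0 || (pp->pagesize & (pp->pagesize - 1)) != 0)
		return false;
	while ((1 << shift) != pp->pagesize)
		shift++;
	pp->pageshift = shift;
	pp->pagemask = pp->pagesize - 1;	/* assumed definition */
	return true;
}
```

With pagesize set to 4096, this yields a pageshift of 12 and a pagemask of 4095.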
uvm_swap_init() initializes the swap sub-system.
int
uvm_map(struct vm_map *map, vaddr_t *startp, vsize_t size, struct uvm_object *uobj, voff_t uoffset, vsize_t align, uvm_flag_t flags);
void
uvm_unmap(struct vm_map *map, vaddr_t start, vaddr_t end);
int
uvm_map_pageable(struct vm_map *map, vaddr_t start, vaddr_t end, bool new_pageable, int lockflags);
bool
uvm_map_checkprot(struct vm_map *map, vaddr_t start, vaddr_t end, vm_prot_t protection);
int
uvm_map_protect(struct vm_map *map, vaddr_t start, vaddr_t end, vm_prot_t new_prot, bool set_max);
int
uvm_deallocate(struct vm_map *map, vaddr_t start, vsize_t size);
struct vmspace *
uvmspace_alloc(vaddr_t min, vaddr_t max, int pageable);
void
uvmspace_exec(struct lwp *l, vaddr_t start, vaddr_t end);
struct vmspace *
uvmspace_fork(struct vmspace *vm);
void
uvmspace_free(struct vmspace *vm1);
void
uvmspace_share(struct proc *p1, struct proc *p2);
void
uvmspace_unshare(struct lwp *l);
bool
uvm_uarea_alloc(vaddr_t *uaddrp);
void
uvm_uarea_free(vaddr_t uaddr);
uvm_map() establishes a valid mapping in map map, which must be unlocked. The new mapping has size size, which must be a multiple of PAGE_SIZE. The uobj and uoffset arguments can have four meanings:
When uobj is NULL and uoffset is UVM_UNKNOWN_OFFSET, uvm_map() does not use the machine-dependent PMAP_PREFER function.
When uobj is NULL and uoffset is any other value, it is used as the hint to PMAP_PREFER.
When uobj is not NULL and uoffset is UVM_UNKNOWN_OFFSET, uvm_map() finds the offset based upon the virtual address, passed as startp.
When uobj is not NULL and uoffset is any other value, a normal mapping is made at this offset.
The start address of the mapping will be returned in startp.
align specifies the alignment of the mapping unless UVM_FLAG_FIXED is specified in flags. align must be a power of 2.
The flags passed to uvm_map() are typically created using the UVM_MAPFLAG(vm_prot_t prot, vm_prot_t maxprot, vm_inherit_t inh, int advice, int flags) macro, which uses the following values.
The values that prot and maxprot can take are:
#define UVM_PROT_MASK 0x07 /* protection mask */
#define UVM_PROT_NONE 0x00 /* protection none */
#define UVM_PROT_ALL 0x07 /* everything */
#define UVM_PROT_READ 0x01 /* read */
#define UVM_PROT_WRITE 0x02 /* write */
#define UVM_PROT_EXEC 0x04 /* exec */
#define UVM_PROT_R 0x01 /* read */
#define UVM_PROT_W 0x02 /* write */
#define UVM_PROT_RW 0x03 /* read-write */
#define UVM_PROT_X 0x04 /* exec */
#define UVM_PROT_RX 0x05 /* read-exec */
#define UVM_PROT_WX 0x06 /* write-exec */
#define UVM_PROT_RWX 0x07 /* read-write-exec */
The values that inh can take are:
#define UVM_INH_MASK 0x30 /* inherit mask */
#define UVM_INH_SHARE 0x00 /* "share" */
#define UVM_INH_COPY 0x10 /* "copy" */
#define UVM_INH_NONE 0x20 /* "none" */
#define UVM_INH_DONATE 0x30 /* "donate" << not used */
The values that advice can take are:
#define UVM_ADV_NORMAL 0x0 /* 'normal' */
#define UVM_ADV_RANDOM 0x1 /* 'random' */
#define UVM_ADV_SEQUENTIAL 0x2 /* 'sequential' */
#define UVM_ADV_MASK 0x7 /* mask */
The values that flags can take are:
#define UVM_FLAG_FIXED 0x010000 /* find space */
#define UVM_FLAG_OVERLAY 0x020000 /* establish overlay */
#define UVM_FLAG_NOMERGE 0x040000 /* don't merge map entries */
#define UVM_FLAG_COPYONW 0x080000 /* set copy_on_write flag */
#define UVM_FLAG_AMAPPAD 0x100000 /* for bss: pad amap to reduce malloc() */
#define UVM_FLAG_TRYLOCK 0x200000 /* fail if we can not lock map */
The UVM_MAPFLAG macro arguments can be combined with a bitwise OR operator. There are several special-purpose macros for checking protection combinations, e.g., the UVM_PROT_WX macro. There are also some additional macros to extract bits from the flags. The UVM_PROTECTION, UVM_INHERIT, UVM_MAXPROTECTION and UVM_ADVICE macros return the protection, inheritance, maximum protection and advice, respectively.
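The packing and extraction described above can be illustrated with a small user-space sketch. The constant values are copied from the listings above, but the shift amounts used here (maximum protection in bits 8 to 10, advice in bits 12 to 14, inheritance extracted by shifting right by 4) are an assumption chosen to be consistent with those masks; the authoritative definitions live in the kernel headers. The example_mapflags() helper is a hypothetical name.

```c
#include <assert.h>

/* Values copied from the tables above. */
#define UVM_PROT_MASK		0x07
#define UVM_PROT_RW		0x03
#define UVM_PROT_RWX		0x07
#define UVM_INH_MASK		0x30
#define UVM_INH_COPY		0x10
#define UVM_ADV_MASK		0x7
#define UVM_ADV_RANDOM		0x1
#define UVM_FLAG_COPYONW	0x080000

/* Assumed bit layout; see the lead-in paragraph. */
#define UVM_MAPFLAG(prot, maxprot, inh, advice, flags) \
	(((maxprot) << 8) | (prot) | (inh) | ((advice) << 12) | (flags))
#define UVM_PROTECTION(x)	((x) & UVM_PROT_MASK)
#define UVM_INHERIT(x)		(((x) & UVM_INH_MASK) >> 4)
#define UVM_MAXPROTECTION(x)	(((x) >> 8) & UVM_PROT_MASK)
#define UVM_ADVICE(x)		(((x) >> 12) & UVM_ADV_MASK)

/*
 * Build flags for a read-write, copy-on-write mapping that may later be
 * raised to read-write-exec, is copied on fork, and is accessed randomly.
 */
static unsigned int
example_mapflags(void)
{
	unsigned int f = UVM_MAPFLAG(UVM_PROT_RW, UVM_PROT_RWX,
	    UVM_INH_COPY, UVM_ADV_RANDOM, UVM_FLAG_COPYONW);

	/* The current protection must not exceed the maximum protection. */
	assert((UVM_PROTECTION(f) & ~UVM_MAXPROTECTION(f)) == 0);
	return f;
}
```

Under this assumed layout the fields occupy disjoint bits, so each extraction macro recovers exactly the value that was passed to UVM_MAPFLAG.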
uvm_map() returns a standard UVM return value.
uvm_unmap() removes a valid mapping, from start to end, in map map, which must be unlocked.
uvm_map_pageable() changes the pageability of the pages in the range from start to end in map map to new_pageable. uvm_map_pageable() returns a standard UVM return value.
uvm_map_checkprot() checks the protection of the range from start to end in map map against protection. This returns either true or false.
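As an illustration of what such a check involves, the following user-space model walks a sorted list of entries. It is a sketch under stated assumptions, not the kernel implementation: the struct prot_entry type and check_prot() function are hypothetical, and the model assumes that a range passes only if it is fully covered by entries and every requested protection bit is present in each covering entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UVM_PROT_READ	0x01
#define UVM_PROT_WRITE	0x02
#define UVM_PROT_RW	0x03

/* Hypothetical, simplified map entry covering [start, end). */
struct prot_entry {
	unsigned long start;
	unsigned long end;
	int protection;
};

/*
 * Entries must be sorted by address and must not overlap.
 * Returns true if [start, end) is covered without holes and every
 * covering entry grants all bits in protection.
 */
static bool
check_prot(const struct prot_entry *e, size_t n, unsigned long start,
    unsigned long end, int protection)
{
	unsigned long addr = start;

	assert(start <= end);
	for (size_t i = 0; i < n && addr < end; i++) {
		if (e[i].end <= addr)
			continue;		/* entry lies below the range */
		if (e[i].start > addr)
			return false;		/* hole in the range */
		if ((e[i].protection & protection) != protection)
			return false;		/* a requested bit is missing */
		addr = e[i].end;
	}
	return addr >= end;
}
```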
uvm_map_protect() changes the protection of the range from start to end in map map to new_prot, also setting the maximum protection of the region to new_prot if set_max is true. This function returns a standard UVM return value.
uvm_deallocate() deallocates kernel memory in map map from address start to start + size.
uvmspace_alloc() allocates and returns a new address space, with ranges from min to max, setting the pageability of the address space to pageable.
uvmspace_exec() either reuses the address space of lwp l if there are no other references to it, or creates a new one with uvmspace_alloc(). The range of valid addresses in the address space is reset to start through end.
uvmspace_fork() creates and returns a new address space based upon the vm address space, typically used when allocating an address space for a child process.
uvmspace_free() lowers the reference count on the address space vm1, freeing the data structures if there are no other references.
uvmspace_share() causes process p2 to share the address space of p1.
uvmspace_unshare() ensures that lwp l has its own, unshared address space, by creating a new one if necessary by calling uvmspace_fork().
uvm_uarea_alloc() allocates virtual space for a u-area (i.e., a kernel stack) and stores its virtual address in *uaddrp. The return value is true if the u-area is already backed by wired physical memory, otherwise false.
uvm_uarea_free() frees a u-area allocated with uvm_uarea_alloc(), freeing both the virtual space and any physical pages which may have been allocated to back that virtual space later.
int
uvm_fault(struct vm_map *orig_map, vaddr_t vaddr, vm_prot_t access_type);
uvm_fault() is the main entry point for faults. It takes orig_map as the map the fault originated in, vaddr as the address within the map at which the fault occurred, and access_type describing the type of access requested. uvm_fault() returns a standard UVM return value.
void
uvm_vnp_setsize(struct vnode *vp, voff_t newsize);
void *
ubc_alloc(struct uvm_object *uobj, voff_t offset, vsize_t *lenp, int advice, int flags);
void
ubc_release(void *va, int flags);
int
ubc_uiomove(struct uvm_object *uobj, struct uio *uio, vsize_t todo, int advice, int flags);
uvm_vnp_setsize() sets the size of vnode vp to newsize. The caller must hold a reference to the vnode. If the vnode shrinks, pages no longer used are discarded.
ubc_alloc() creates a kernel mapping of uobj starting at offset offset. The desired length of the mapping is pointed to by lenp, but the actual mapping may be smaller than this. lenp is updated to contain the actual length mapped.
advice is the access pattern hint, which must be one of UVM_ADV_NORMAL, UVM_ADV_RANDOM or UVM_ADV_SEQUENTIAL. The possible flags are UBC_READ, UBC_WRITE and UBC_FAULTBUSY.
Once the mapping is created, it must be accessed only by methods that can handle faults, such as uiomove() or kcopy(). Page faults on the mapping will result in the object's pager method being called to resolve the fault.
ubc_release() frees the mapping at va for reuse. The mapping may be cached to speed future accesses to the same region of the object. The flags can be any of UBC_UNMAP.
ubc_uiomove() allocates a UBC memory window, performs I/O on it and unmaps the window. The advice parameter takes the same values as the respective parameter in ubc_alloc(), and the flags parameter takes the same arguments as ubc_alloc() and ubc_release(). Additionally, the flag UBC_PARTIALOK can be provided to indicate that it is acceptable to return if an error occurs mid-transfer.
int
uvm_io(struct vm_map *map, struct uio *uio);
uvm_io() performs the I/O described in uio on the memory described in map.
vaddr_t
uvm_km_alloc(struct vm_map *map, vsize_t size, vsize_t align, uvm_flag_t flags);
void
uvm_km_free(struct vm_map *map, vaddr_t addr, vsize_t size, uvm_flag_t flags);
struct vm_map *
uvm_km_suballoc(struct vm_map *map, vaddr_t *min, vaddr_t *max, vsize_t size, bool pageable, bool fixed, struct vm_map *submap);
uvm_km_alloc() allocates size bytes of kernel memory in map map. The first address of the allocated memory range will be aligned according to the align argument (specify 0 if no alignment is necessary). The alignment must be a multiple of the page size. The flags argument is a bitwise inclusive OR of the allocation type and operation flags.
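The allocation type should be one of:
UVM_KMF_WIRED: wired memory.
UVM_KMF_PAGEABLE: demand-paged zero-filled memory.
UVM_KMF_VAONLY: virtual address only; no physical pages are mapped in the allocated region.
The following operation flags are available:
UVM_KMF_CANFAIL: can fail even if UVM_KMF_NOWAIT is not specified and UVM_KMF_WAITVA is specified.
UVM_KMF_ZERO: request zero-filled memory. Only supported for UVM_KMF_WIRED. Shouldn't be used with other types.
UVM_KMF_TRYLOCK: fail if the map cannot be locked.
UVM_KMF_NOWAIT: fail immediately if no memory is available.
UVM_KMF_WAITVA: sleep to wait for virtual address resources if needed.
(If neither UVM_KMF_NOWAIT nor UVM_KMF_CANFAIL is specified and UVM_KMF_WAITVA is specified, uvm_km_alloc() will never fail, but rather sleep indefinitely until the allocation succeeds.)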
Pageability of the pages allocated with UVM_KMF_PAGEABLE can be changed by uvm_map_pageable(). In that case, the entire range must be changed atomically. Changing a part of the range is not supported.
uvm_km_free() frees the memory range allocated by uvm_km_alloc(). addr must be an address returned by uvm_km_alloc(). map and size must be the same as the ones used for the corresponding uvm_km_alloc(). flags must be the allocation type used for the corresponding uvm_km_alloc().
uvm_km_free() is the only way to free memory ranges allocated by uvm_km_alloc(). uvm_unmap() must not be used.
uvm_km_suballoc() allocates a submap from map, creating a new map if submap is NULL. The addresses of the submap can be specified exactly by setting the fixed argument to true, which causes the min argument to specify the beginning of the address range in the submap. If fixed is false, any address range of size size will be allocated from map and the start and end addresses returned in min and max. If pageable is true, entries in the map may be paged out.
struct vm_page *
uvm_pagealloc(struct uvm_object *uobj, voff_t off, struct vm_anon *anon, int flags);
void
uvm_pagerealloc(struct vm_page *pg, struct uvm_object *newobj, voff_t newoff);
void
uvm_pagefree(struct vm_page *pg);
int
uvm_pglistalloc(psize_t size, paddr_t low, paddr_t high, paddr_t alignment, paddr_t boundary, struct pglist *rlist, int nsegs, int waitok);
void
uvm_pglistfree(struct pglist *list);
void
uvm_page_physload(vaddr_t start, vaddr_t end, vaddr_t avail_start, vaddr_t avail_end, int free_list);
uvm_pagealloc() allocates a page of memory at offset off in either the object uobj or the anonymous memory anon, which must be locked by the caller. Only one of uobj and anon can be non-NULL. Returns NULL when no page can be found. The flags can be any of
#define UVM_PGA_USERESERVE 0x0001 /* ok to use reserve pages */
#define UVM_PGA_ZERO 0x0002 /* returned page must be zero'd */
UVM_PGA_USERESERVE means to allocate a page even if that will result in the number of free pages being lower than uvmexp.reserve_pagedaemon (if the current thread is the pagedaemon) or uvmexp.reserve_kernel (if the current thread is not the pagedaemon). UVM_PGA_ZERO causes the returned page to be filled with zeroes, either by allocating it from a pool of pre-zeroed pages or by zeroing it in-line as necessary.
uvm_pagerealloc() reallocates page pg to a new object newobj, at a new offset newoff.
uvm_pagefree() frees the physical page pg. If the content of the page is known to be zero-filled, the caller should set PG_ZERO in pg->flags so that the page allocator will use the page to serve future UVM_PGA_ZERO requests efficiently.
uvm_pglistalloc() allocates a list of pages for size size bytes under various constraints. low and high describe the lowest and highest addresses acceptable for the list. If alignment is non-zero, it describes the required alignment of the list, in power-of-two notation. If boundary is non-zero, no segment of the list may cross this power-of-two boundary, relative to zero. nsegs is the maximum number of physically contiguous segments. If waitok is non-zero, the function may sleep until enough memory is available. (It also may give up in some situations, so a non-zero waitok does not imply that uvm_pglistalloc() cannot return an error.) The allocated memory is returned in the rlist list; the caller has to provide storage only, as the list is initialized by uvm_pglistalloc().
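The address constraints can be made concrete with a user-space predicate for a single physically contiguous segment. This is only a sketch: segment_ok() is a hypothetical helper, not part of UVM, and it assumes that low and high are inclusive bounds and that alignment and boundary are given as byte values that are powers of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Check whether the segment [addr, addr + size) satisfies the
 * low/high, alignment and boundary constraints described above.
 * A zero alignment or boundary means "no constraint".
 */
static bool
segment_ok(uint64_t addr, uint64_t size, uint64_t low, uint64_t high,
    uint64_t alignment, uint64_t boundary)
{
	uint64_t last;

	/* Both constraints, when present, are assumed to be powers of two. */
	assert(alignment == 0 || (alignment & (alignment - 1)) == 0);
	assert(boundary == 0 || (boundary & (boundary - 1)) == 0);

	if (size == 0)
		return false;
	last = addr + size - 1;		/* last byte of the segment */
	if (addr < low || last > high)
		return false;
	if (alignment != 0 && (addr & (alignment - 1)) != 0)
		return false;		/* start is misaligned */
	if (boundary != 0 &&
	    (addr & ~(boundary - 1)) != (last & ~(boundary - 1)))
		return false;		/* segment crosses a boundary */
	return true;
}
```

For example, a two-page segment starting at 0xf000 with a 64 KB boundary is rejected, because its first and last bytes fall on different sides of 0x10000.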
uvm_pglistfree() frees the list of pages pointed to by list. If the content of the page is known to be zero-filled, the caller should set PG_ZERO in pg->flags so that the page allocator will use the page to serve future UVM_PGA_ZERO requests efficiently.
uvm_page_physload() loads physical memory segments into VM space on the specified free_list. It must be called at system boot time to set up physical memory management pages. The arguments describe the start and end of the physical addresses of the segment, and the available start and end addresses of pages not already in use.
void
uvm_pageout(void);
void
uvm_scheduler(void);
void
uvm_swapin(struct lwp *l);
uvm_pageout() is the main loop for the page daemon.
uvm_scheduler() is the process zero main loop, which is to be called after the system has finished starting other processes. It handles the swapping in of runnable, swapped out processes in priority order.
uvm_swapin() swaps in the named lwp.
int
uvm_loan(struct vm_map *map, vaddr_t start, vsize_t len, void *v, int flags);
void
uvm_unloan(void *v, int npages, int flags);
uvm_loan() loans pages in a map out to anons or to the kernel. map should be unlocked, and start and len should be multiples of PAGE_SIZE. The flags argument should be one of
#define UVM_LOAN_TOANON 0x01 /* loan to anons */
#define UVM_LOAN_TOPAGE 0x02 /* loan to kernel */
v should be a pointer to an array of pointers to struct anon or struct vm_page, as appropriate. The caller has to allocate memory for the array and ensure it is big enough to hold len / PAGE_SIZE pointers. Returns 0 for success, or an appropriate error number otherwise. Note that wired pages can't be loaned out and uvm_loan() will fail in that case.
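The sizing rule for the array can be sketched as follows. This is a user-space illustration only: EXAMPLE_PAGE_SIZE is an assumed page size of 4096 bytes, loan_array_entries() and alloc_loan_array() are hypothetical helpers, and calloc(3) stands in for whatever kernel allocator a real caller would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define EXAMPLE_PAGE_SIZE 4096	/* assumed; the kernel uses PAGE_SIZE */

/* Number of pointers the array must hold for a loan of len bytes. */
static size_t
loan_array_entries(size_t len)
{
	/* len is expected to be a multiple of the page size. */
	assert(len % EXAMPLE_PAGE_SIZE == 0);
	return len / EXAMPLE_PAGE_SIZE;
}

/*
 * Allocate a zeroed array large enough to receive one pointer per page.
 * The element type is void * here because the array holds either anon
 * or page pointers, depending on the loan type.
 */
static void **
alloc_loan_array(size_t len)
{
	return calloc(loan_array_entries(len), sizeof(void *));
}
```

The same entry count is what a caller would later pass as npages when ending the loan.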
uvm_unloan() kills loans on pages or anons. v must point to the array of pointers initialized by a previous call to uvm_loan(). npages should match the number of pages allocated for the loan, which is also the number of items in the array. The flags argument should be one of
#define UVM_LOAN_TOANON 0x01 /* loan to anons */
#define UVM_LOAN_TOPAGE 0x02 /* loan to kernel */
and should match what was used for the previous call to uvm_loan().
struct uvm_object *
uao_create(vsize_t size, int flags);
void
uao_detach(struct uvm_object *uobj);
void
uao_reference(struct uvm_object *uobj);
bool
uvm_chgkprot(void *addr, size_t len, int rw);
void
uvm_kernacc(void *addr, size_t len, int rw);
int
uvm_vslock(struct vmspace *vs, void *addr, size_t len, vm_prot_t prot);
void
uvm_vsunlock(struct vmspace *vs, void *addr, size_t len);
void
uvm_meter(void);
void
uvm_fork(struct lwp *l1, struct lwp *l2, bool shared);
int
uvm_grow(struct proc *p, vaddr_t sp);
void
uvn_findpages(struct uvm_object *uobj, voff_t offset, int *npagesp, struct vm_page **pps, int flags);
void
uvm_swap_stats(int cmd, struct swapent *sep, int sec, register_t *retval);
The uao_create(), uao_detach(), and uao_reference() functions operate on anonymous memory objects, such as those used to support System V shared memory. uao_create() returns an object of size size with flags:
#define UAO_FLAG_KERNOBJ 0x1 /* create kernel object */
#define UAO_FLAG_KERNSWAP 0x2 /* enable kernel swap */
which can only be used once each at system boot time. uao_reference() creates an additional reference to the named anonymous memory object. uao_detach() removes a reference from the named anonymous memory object, destroying it if removing the last reference.
uvm_chgkprot() changes the protection of kernel memory from addr to addr + len to the value of rw. This is primarily useful for debuggers, for setting breakpoints. This function is only available with options KGDB.
uvm_kernacc() checks the access at address addr to addr + len for rw access in the kernel address space.
uvm_vslock() and uvm_vsunlock() control the wiring and unwiring of pages in the address space vs from addr to addr + len. These functions are normally used to wire memory for I/O.
uvm_meter() calculates the load average and wakes up the swapper if necessary.
uvm_fork() forks a virtual address space for the (old) lwp l1 and the (new) lwp l2. If the shared argument is true, l1 shares its address space with l2; otherwise a new address space is created. This function currently has no return value, and thus cannot fail. In the future, this function will be changed to allow it to fail in low memory conditions.
uvm_grow() increases the stack segment of process p to include sp.
uvn_findpages() looks up or creates pages in uobj at offset offset, marks them busy and returns them in the pps array. Currently uobj must be a vnode object. The number of pages requested is pointed to by npagesp, and this value is updated with the actual number of pages returned. The flags can be
#define UFP_ALL 0x00 /* return all pages requested */
#define UFP_NOWAIT 0x01 /* don't sleep */
#define UFP_NOALLOC 0x02 /* don't allocate new pages */
#define UFP_NOCACHE 0x04 /* don't return pages which already exist */
#define UFP_NORDONLY 0x08 /* don't return PG_READONLY pages */
UFP_ALL is a pseudo-flag meaning all requested pages should be returned. UFP_NOWAIT means that we must not sleep. UFP_NOALLOC causes any pages which do not already exist to be skipped. UFP_NOCACHE causes any pages which do already exist to be skipped. UFP_NORDONLY causes any pages which are marked PG_READONLY to be skipped.
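The skipping rules can be modeled as a small predicate. This is a sketch of the rules as stated above, not of the kernel code: ufp_page_returned() is a hypothetical helper, the exists and readonly parameters stand in for the state of a page at the requested offset, and UFP_NOWAIT is left out because it affects sleeping rather than which pages are skipped.

```c
#include <assert.h>
#include <stdbool.h>

#define UFP_ALL		0x00	/* return all pages requested */
#define UFP_NOALLOC	0x02	/* don't allocate new pages */
#define UFP_NOCACHE	0x04	/* don't return pages which already exist */
#define UFP_NORDONLY	0x08	/* don't return PG_READONLY pages */

/*
 * Decide whether a page would be returned or skipped.
 * exists:   a page is already present at the offset
 * readonly: the existing page is marked PG_READONLY
 */
static bool
ufp_page_returned(bool exists, bool readonly, int flags)
{
	/* A page that does not exist cannot be marked read-only. */
	assert(exists || !readonly);

	if (!exists && (flags & UFP_NOALLOC) != 0)
		return false;	/* would need a new page */
	if (exists && (flags & UFP_NOCACHE) != 0)
		return false;	/* already cached */
	if (exists && readonly && (flags & UFP_NORDONLY) != 0)
		return false;	/* read-only page not wanted */
	return true;
}
```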
uvm_swap_stats() implements the SWAP_STATS and SWAP_OSTATS operations of the swapctl(2) system call. cmd is the requested command, SWAP_STATS or SWAP_OSTATS. The function will copy no more than sec entries into the array pointed to by sep. On return, retval holds the actual number of entries copied into the array.
UVM provides support for the CTL_VM domain of the sysctl(3) hierarchy. It handles the VM_LOADAVG, VM_METER, VM_UVMEXP, and VM_UVMEXP2 nodes, which return the current load averages, the current VM totals, the uvmexp structure, and a kernel-version-independent view of the uvmexp structure, respectively. It also exports a number of tunables that control how much VM space is allowed to be consumed by various tasks. The load averages are typically accessed from userland using the getloadavg(3) function.
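A userland program might read the load averages along these lines. This is a minimal sketch: print_load_averages() is a hypothetical helper, and the _DEFAULT_SOURCE definition is an assumption that only matters on glibc-based systems, where it exposes the getloadavg() declaration under strict compiler modes.

```c
#define _DEFAULT_SOURCE		/* assumed to be needed on glibc only */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Print the 1, 5 and 15 minute load averages.
 * Returns the number of samples retrieved, or -1 if the load
 * average could not be obtained.
 */
static int
print_load_averages(void)
{
	double avg[3];
	int n = getloadavg(avg, 3);

	if (n == -1) {
		fprintf(stderr, "load average could not be obtained\n");
		return -1;
	}
	assert(n <= 3);
	for (int i = 0; i < n; i++)
		printf("%.2f%s", avg[i], i + 1 < n ? " " : "\n");
	return n;
}
```

getloadavg() may return fewer samples than requested, which is why the loop is bounded by its return value rather than by the array size.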
The uvmexp structure has all global state of the UVM system,
and has the following members:
/* vm_page constants */
int pagesize; /* size of a page (PAGE_SIZE): must be power of 2 */
int pagemask; /* page mask */
int pageshift; /* page shift */
/* vm_page counters */
int npages; /* number of pages we manage */
int free; /* number of free pages */
int active; /* number of active pages */
int inactive; /* number of pages that we free'd but may want back */
int paging; /* number of pages in the process of being paged out */
int wired; /* number of wired pages */
int reserve_pagedaemon; /* number of pages reserved for pagedaemon */
int reserve_kernel; /* number of pages reserved for kernel */
/* pageout params */
int freemin; /* min number of free pages */
int freetarg; /* target number of free pages */
int inactarg; /* target number of inactive pages */
int wiredmax; /* max number of wired pages */
/* swap */
int nswapdev; /* number of configured swap devices in system */
int swpages; /* number of PAGE_SIZE'ed swap pages */
int swpginuse; /* number of swap pages in use */
int nswget; /* number of times fault calls uvm_swap_get() */
int nanon; /* number total of anon's in system */
int nfreeanon; /* number of free anon's */
/* stat counters */
int faults; /* page fault count */
int traps; /* trap count */
int intrs; /* interrupt count */
int swtch; /* context switch count */
int softs; /* software interrupt count */
int syscalls; /* system calls */
int pageins; /* pagein operation count */
/* pageouts are in pdpageouts below */
int swapins; /* swapins */
int swapouts; /* swapouts */
int pgswapin; /* pages swapped in */
int pgswapout; /* pages swapped out */
int forks; /* forks */
int forks_ppwait; /* forks where parent waits */
int forks_sharevm; /* forks where vmspace is shared */
/* fault subcounters */
int fltnoram; /* number of times fault was out of ram */
int fltnoanon; /* number of times fault was out of anons */
int fltpgwait; /* number of times fault had to wait on a page */
int fltpgrele; /* number of times fault found a released page */
int fltrelck; /* number of times fault relock called */
int fltrelckok; /* number of times fault relock is a success */
int fltanget; /* number of times fault gets anon page */
int fltanretry; /* number of times fault retrys an anon get */
int fltamcopy; /* number of times fault clears "needs copy" */
int fltnamap; /* number of times fault maps a neighbor anon page */
int fltnomap; /* number of times fault maps a neighbor obj page */
int fltlget; /* number of times fault does a locked pgo_get */
int fltget; /* number of times fault does an unlocked get */
int flt_anon; /* number of times fault anon (case 1a) */
int flt_acow; /* number of times fault anon cow (case 1b) */
int flt_obj; /* number of times fault is on object page (2a) */
int flt_prcopy; /* number of times fault promotes with copy (2b) */
int flt_przero; /* number of times fault promotes with zerofill (2b) */
/* daemon counters */
int pdwoke; /* number of times daemon woke up */
int pdrevs; /* number of times daemon rev'd clock hand */
int pdswout; /* number of times daemon called for swapout */
int pdfreed; /* number of pages daemon freed since boot */
int pdscans; /* number of pages daemon scanned since boot */
int pdanscan; /* number of anonymous pages scanned by daemon */
int pdobscan; /* number of object pages scanned by daemon */
int pdreact; /* number of pages daemon reactivated since boot */
int pdbusy; /* number of times daemon found a busy page */
int pdpageouts; /* number of times daemon started a pageout */
int pdpending; /* number of times daemon got a pending pageout */
int pddeact; /* number of pages daemon deactivates */
uvm_chgkprot() is only available if the kernel has been compiled with options KGDB.
All structures and types whose names begin with ``vm_'' will be renamed to ``uvm_''.
UVM appeared in NetBSD 1.4.
Matthew Green <mrg@eterna.com.au> wrote the swap-space management code and handled the logistical issues involved with merging UVM into the NetBSD source tree.
Chuck Silvers <chuq@chuq.com> implemented the aobj pager, thus allowing UVM to support System V shared memory and process swapping. He also designed and implemented the UBC part of UVM, which uses UVM pages to cache vnode data rather than the traditional buffer cache buffers.