-------------------------------------------------------------------------

OpenBSD Security Advisory

February 20, 1998

4.4BSD mmap() Vulnerability

-------------------------------------------------------------------------

SYNOPSIS

Due to a 4.4BSD VM system problem, it is possible to memory-map a
read-only descriptor to a character device in read-write mode. This
allows group "kmem" programs to become root, and root to lower the
system securelevel, both by writing to the kernel memory device.

-------------------------------------------------------------------------

AFFECTED SYSTEMS

This vulnerability has been confirmed against OpenBSD 2.2 (and below),
FreeBSD 2.2.5 (and below), and BSDI 3.0. NetBSD-current (without UVM)
and below are also affected.

-------------------------------------------------------------------------

DETAILS

The 4.4BSD VM system allows files to be "memory mapped", which causes
the specified contents of a file to be made available to a process via
its address space. Manipulations of that file can then be performed
simply by manipulating memory, rather than using filesystem I/O calls.
This technique is used to simplify code, speed up access to files, and
provide interprocess communication.

Memory mappings can be "private" or "shared". In a private memory
mapping, changes to the mapped memory are not committed back to the
original file. Multiple processes with private mappings of the same
file will not see each other's changes. In a shared mapping, changes to
the mapped memory are reflected in the original file, and all processes
mapping the same file see each other's changes.

In order to create a writeable mapping for a file descriptor, that file
descriptor must be open in read-write mode. This prevents users from
using read-only access to system files to change the system
configuration (by taking the read-only descriptors and mapping them
read-write).

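The mapping semantics described above can be observed from user space.
The following is a minimal sketch for a POSIX system, not code taken
from the 4.4BSD sources; the function names and the "/tmp" scratch path
are assumptions made for illustration. It writes through a private and
a shared mapping of a scratch file, and then asks for a shared writable
mapping of a descriptor that was opened read-only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a scratch file holding "AAAA", store 'B' in its first byte
 * through a mapping created with map_flags, and return the first byte
 * as read back from the file itself.  Returns -1 on setup failure.
 */
int first_byte_after_mapped_write(int map_flags)
{
	char path[] = "/tmp/mmap_demo_XXXXXX";	/* hypothetical scratch path */
	char byte = 0;
	int fd;

	assert(map_flags == MAP_PRIVATE || map_flags == MAP_SHARED);
	fd = mkstemp(path);			/* descriptor is read-write */
	if (fd == -1)
		return -1;
	unlink(path);				/* file goes away on close */
	if (write(fd, "AAAA", 4) != 4) {
		close(fd);
		return -1;
	}
	char *p = mmap(NULL, 4, PROT_READ | PROT_WRITE, map_flags, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}
	p[0] = 'B';				/* modify the mapped memory */
	msync(p, 4, MS_SYNC);			/* flush a shared mapping */
	munmap(p, 4);
	if (pread(fd, &byte, 1, 0) != 1)
		byte = -1;
	close(fd);
	return byte;
}

/*
 * Ask for a shared, writable mapping of a read-only descriptor.
 * Returns the errno set by mmap(), 0 if the mapping was granted,
 * or -1 on setup failure.
 */
int shared_write_map_of_readonly_fd(void)
{
	char path[] = "/tmp/mmap_demo_XXXXXX";	/* hypothetical scratch path */
	int rw = mkstemp(path);

	if (rw == -1)
		return -1;
	if (write(rw, "AAAA", 4) != 4) {
		close(rw);
		unlink(path);
		return -1;
	}
	close(rw);
	int ro = open(path, O_RDONLY);		/* reopen without write access */
	unlink(path);
	if (ro == -1)
		return -1;
	errno = 0;
	void *p = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_SHARED, ro, 0);
	int err = (p == MAP_FAILED) ? errno : 0;
	if (p != MAP_FAILED)
		munmap(p, 4);
	close(ro);
	return err;
}
```

On a system that enforces the access check, the private write is
expected to leave 'A' in the file, the shared write is expected to
leave 'B', and the request against the read-only descriptor is expected
to fail with EACCES.
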
The 4.4BSD VM system verifies that an open file descriptor is
read-write before allowing a shared read-write mapping. 4.4BSD does not
perform this access check when the mapping is not shared; a process
with a private mapping cannot modify the original file, so the
potential for danger is minimized.

Unfortunately, the 4.4BSD VM system automatically changes any private
mapping of a character device to "shared", regardless of the flags
passed to mmap(), after the access check is performed. This allows a
user with read-only access to a character device to create a read-write
mapping to that device, and thus write to the device.

This can be used against the raw memory device ("/dev/mem") to write
arbitrary bytes directly to physical memory; if a process has read-only
access to "/dev/mem" (processes in group "kmem" have this access), it
can become "root" by altering kernel data structures.

Furthermore, a process with a read-write mapping on "/dev/mem" can
rewrite the system securelevel back to zero after it has been raised.
This allows an attacker to bypass the "immutable" and "append-only"
filesystem flags, along with any other securelevel protections.

-------------------------------------------------------------------------

TECHNICAL DETAILS

The code exhibiting this problem is located in "sys/vm/vm_mmap.c", in
the functions "mmap()" (the mmap system call handler) and "vm_mmap()"
(the VM function that actually performs memory mapping). The problem is
due to a faulty access check in mmap(), combined with a side-effect of
character device mapping in vm_mmap().

The mmap() system call handler performs a read-write access check by
examining the file descriptor passed in as an argument to the system
call.
Before allowing a shared read-write mapping, the system verifies that
the file being mapped is open in write mode:

	if (flags & MAP_SHARED) {
		if (fp->f_flag & FWRITE)
			maxprot |= VM_PROT_WRITE;
		else if (prot & PROT_WRITE)
			return (EACCES);
	}

If the requested mapping is not shared, the access check against the
file (the check for FWRITE in fp->f_flag, where fp is the file
structure for the descriptor passed to mmap()) is not performed. For
regular files, this check is sufficient; a non-shared mapping will not
allow a process to write to the actual file, only to a private copy in
memory.

The vm_mmap() kernel VM function handles memory mapping for all of the
kernel facilities that require this capability, including execve(),
System V shared memory, and the mmap() system call. vm_mmap() checks
whether a requested mapping is associated with a character device and,
if so, automatically creates a shared mapping (comments from original
source code):

	if (vp->v_type == VCHR) {
		type = OBJT_DEVICE;
		handle = (caddr_t) vp->v_rdev;
	}

	...

	/*
	 * Force device mappings to be shared.
	 */
	if (type == OBJT_DEVICE) {
		flags &= ~(MAP_PRIVATE|MAP_COPY);
		flags |= MAP_SHARED;
	}

As a result of this code, it is possible to request a non-shared
mapping of a character device (which will appear innocuous to the
mmap() access checking code), and receive a shared, writeable mapping.
This can be used to obtain write access to any readable character
device.

This problem is particularly serious when a hostile process has read
access to kernel memory devices. The system status utilities "ps",
"netstat", "systat", and others run setgid "kmem", allowing them to use
the KVM library to directly access kernel memory. A bug in any of these
programs can allow an attacker to trivially obtain root access, by
mmap()'ing a read-only descriptor to "/dev/mem" and altering process
credential structures.

This issue also directly subverts the system securelevel.
4.4BSD has a facility called "securelevels" which adds restrictions to
the kernel that take effect only when a flag in the kernel (the
"securelevel") is set. These restrictions include "immutable" files,
which cannot be altered (even by root), and "append-only" files, to
which data can only be appended. The former is useful for system
binaries (to prevent attackers from backdooring libraries and
executables), and the latter is useful for logs (to prevent attackers
from covering their tracks by deleting log data).

The 4.4BSD securelevel features are active when the securelevel is
nonzero. The securelevel is set using the "sysctl" facility. The system
does not allow the securelevel to be lowered once it is nonzero; if an
attacker can lower the securelevel, she can evade securelevel
protections by turning them off.

The 4.4BSD kernel does not allow processes to write directly to kernel
memory when the securelevel is nonzero; this prevents "root" from
bypassing the securelevel simply by writing to "/dev/kmem". This is
controlled by an access check in "sys/miscfs/specfs/spec_vnops.c",
which provides vnode operations (open, read, write, etc.) for special
files (like character devices).

The access check is performed in the "spec_open()" function, which
handles the "open" system call for special files. When the securelevel
is nonzero, the system explicitly checks for attempts to open devices
in read-write mode, and prevents read-write opens for disk and kernel
memory devices.

Unfortunately, the mmap() bug allows a process to write to a descriptor
even if it is open read-only; the assumption made in spec_open() thus
fails to catch attempts to reset the securelevel using mmap().

-------------------------------------------------------------------------

RESOLUTION

This is a kernel problem that can only be fixed by patching or
upgrading the problematic system code. Patches for the OpenBSD
operating system are provided in this advisory.
The problem is fixed in OpenBSD-current and must be patched in versions
2.2 and below.

The attached OpenBSD patch causes any attempt to create a private
mapping of a character device to fail, and enhances access checking in
mmap() to explicitly verify that the mapping requested is consistent
with the open mode on the file descriptor being mapped.

Accelerated X from X Inside relies on this bug to operate correctly;
this patch thus breaks the Accelerated X server. Contact your
Accelerated X vendor for more information about this. XFree86 is not
believed to be affected by the problem.

More information about the OpenBSD resolution to the problem is
available at "http://www.openbsd.org/errata.html".

-------------------------------------------------------------------------

CREDITS

Documentation and testing of this problem were conducted by Theo de
Raadt and Chuck Cranor. Theo de Raadt, Chuck Cranor, and Niklas
Hallqvist of the OpenBSD project provided the OpenBSD patch for the
problem.

The developers at OpenBSD would like to extend their gratitude to Perry
"Scare Bear" Metzger for his continued support of their efforts.

-------------------------------------------------------------------------

OPENBSD PATCH

Index: vm_mmap.c
===================================================================
RCS file: /cvs/src/sys/vm/vm_mmap.c,v
retrieving revision 1.10
retrieving revision 1.13
diff -u -9 -u -r1.10 -r1.13
--- vm_mmap.c	1997/11/14 20:56:08	1.10
+++ vm_mmap.c	1998/02/25 22:13:46	1.13
@@ -1,10 +1,10 @@
-/*	$OpenBSD: mmap.txt,v 1.1 1999/09/28 21:11:35 deraadt Exp $	*/
+/*	$OpenBSD: mmap.txt,v 1.1 1999/09/28 21:11:35 deraadt Exp $	*/
 /*	$NetBSD: vm_mmap.c,v 1.47 1996/03/16 23:15:23 christos Exp $	*/
 
 /*
  * Copyright (c) 1988 University of Utah.
  * Copyright (c) 1991, 1993
  *	The Regents of the University of California.  All rights reserved.
  *
  * This code is derived from software contributed to Berkeley by
  * the Systems Programming Group of the University of Utah Computer
@@ -207,48 +207,60 @@
 		 * Mapping file, get fp for validation.
 		 * Obtain vnode and make sure it is of appropriate type.
 		 */
 		if (((unsigned)fd) >= fdp->fd_nfiles ||
 		    (fp = fdp->fd_ofiles[fd]) == NULL)
 			return (EBADF);
 		if (fp->f_type != DTYPE_VNODE)
 			return (EINVAL);
 		vp = (struct vnode *)fp->f_data;
-		if (vp->v_type != VREG && vp->v_type != VCHR)
-			return (EINVAL);
+
 		/*
 		 * XXX hack to handle use of /dev/zero to map anon
 		 * memory (ala SunOS).
 		 */
 		if (vp->v_type == VCHR && iszerodev(vp->v_rdev)) {
 			flags |= MAP_ANON;
 			goto is_anon;
 		}
+
+		/*
+		 * Only files and cdevs are mappable, and cdevs does not
+		 * provide private mappings of any kind.
+		 */
+		if (vp->v_type != VREG &&
+		    (vp->v_type != VCHR || (flags & (MAP_PRIVATE|MAP_COPY))))
+			return (EINVAL);
 		/*
 		 * Ensure that file and memory protections are
 		 * compatible.  Note that we only worry about
 		 * writability if mapping is shared; in this case,
 		 * current and max prot are dictated by the open file.
 		 * XXX use the vnode instead?  Problem is: what
 		 * credentials do we use for determination?
 		 * What if proc does a setuid?
 		 */
 		maxprot = VM_PROT_EXECUTE;	/* ??? */
 		if (fp->f_flag & FREAD)
 			maxprot |= VM_PROT_READ;
 		else if (prot & PROT_READ)
+			return (EACCES);
+
+		/*
+		 * If we are sharing potential changes (either via MAP_SHARED
+		 * or via the implicit sharing of character device mappings),
+		 * and we are trying to get write permission although we
+		 * opened it without asking for it, bail out.
+		 */
+		if (((flags & MAP_SHARED) != 0 || vp->v_type == VCHR) &&
+		    (fp->f_flag & FWRITE) == 0 && (prot & PROT_WRITE) != 0)
 			return (EACCES);
-		if (flags & MAP_SHARED) {
-			if (fp->f_flag & FWRITE)
-				maxprot |= VM_PROT_WRITE;
-			else if (prot & PROT_WRITE)
-				return (EACCES);
-		} else
+		else
 			maxprot |= VM_PROT_WRITE;
 		handle = (caddr_t)vp;
 	} else {
 		/*
 		 * (flags & MAP_ANON) == TRUE
 		 * Mapping blank space is trivial.
 		 */
 		if (fd != -1)
 			return (EINVAL);
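
The effect of the patch on the access check can be illustrated with a
small user-space model. The sketch below is not kernel code: the
M_-prefixed constants, "struct request", and all function names are
hypothetical stand-ins chosen for this example. It models only the
request-time checks in mmap() and the forced sharing of character
device mappings in vm_mmap(); it does not model maxprot, MAP_COPY, or
the /dev/zero special case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel constants; the values are arbitrary. */
enum { M_FREAD = 1, M_FWRITE = 2 };		/* models fp->f_flag */
enum { M_PROT_READ = 1, M_PROT_WRITE = 2 };	/* models prot */
enum { M_MAP_SHARED = 1, M_MAP_PRIVATE = 2 };	/* models flags */
enum m_vtype { M_VREG, M_VCHR };		/* models vp->v_type */

struct request {
	enum m_vtype vtype;	/* regular file or character device */
	int fflag;		/* open mode of the descriptor */
	int prot;		/* protection requested from mmap() */
	int flags;		/* M_MAP_SHARED or M_MAP_PRIVATE */
};

typedef int (*check_fn)(const struct request *);

/* Model of the revision 1.10 check: 0 if allowed, else an errno value. */
int old_check(const struct request *r)
{
	if (!(r->fflag & M_FREAD) && (r->prot & M_PROT_READ))
		return EACCES;
	/* Write access is only verified for explicitly shared mappings. */
	if ((r->flags & M_MAP_SHARED) &&
	    !(r->fflag & M_FWRITE) && (r->prot & M_PROT_WRITE))
		return EACCES;
	return 0;
}

/* Model of the revision 1.13 check from the patch above. */
int patched_check(const struct request *r)
{
	/* Character devices no longer provide private mappings. */
	if (r->vtype == M_VCHR && (r->flags & M_MAP_PRIVATE))
		return EINVAL;
	if (!(r->fflag & M_FREAD) && (r->prot & M_PROT_READ))
		return EACCES;
	/* Device mappings are treated as shared for the write check. */
	if (((r->flags & M_MAP_SHARED) || r->vtype == M_VCHR) &&
	    !(r->fflag & M_FWRITE) && (r->prot & M_PROT_WRITE))
		return EACCES;
	return 0;
}

/* vm_mmap() forces every character device mapping to be shared. */
bool ends_up_shared(const struct request *r)
{
	return r->vtype == M_VCHR || (r->flags & M_MAP_SHARED) != 0;
}

/*
 * A "hole" is a request that passes the check and yields a shared,
 * writable mapping although the descriptor was not opened for writing.
 */
bool is_hole(check_fn check, const struct request *r)
{
	return check(r) == 0 && ends_up_shared(r) &&
	    (r->prot & M_PROT_WRITE) != 0 && (r->fflag & M_FWRITE) == 0;
}

/* Count the holes over every request the model can express. */
int count_holes(check_fn check)
{
	static const int map_flags[] = { M_MAP_SHARED, M_MAP_PRIVATE };
	int holes = 0;

	assert(check != NULL);
	for (int vt = M_VREG; vt <= M_VCHR; vt++)
		for (int fflag = 0; fflag <= 3; fflag++)
			for (int prot = 0; prot <= 3; prot++)
				for (size_t i = 0; i < 2; i++) {
					struct request r = {
						(enum m_vtype)vt, fflag,
						prot, map_flags[i]
					};
					if (is_hole(check, &r))
						holes++;
				}
	return holes;
}
```

Within this model, a private read-write request against a character
device opened read-only passes old_check() and is a hole, while
patched_check() rejects it with EINVAL. count_holes() reports holes
only for old_check(), all of them private mappings of character
devices.
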