Reverse engineering the fix for CVE-2020-9771

When I originally found the mount_apfs bug back in December 2019, I honestly had no idea what its root cause was, nor did I have a clue how to even start looking into it. The only thing I knew for sure was that the answer lay within the kernel. My macOS knowledge was still quite fresh at that time (and still is today), and I was busy with so many other things that I never found the time to start looking into it.

Spring came, and Apple fixed the bug in macOS Catalina 10.15.4, attributing the issue to the “Sandbox”. I changed jobs, summer came, and I still didn’t have time for it.

Now it has been a year since I originally reported this. Recently I came across an article about APFS snapshots, written back in 2017 by Adam Leventhal for Ars Technica, titled “Testing out snapshots in Apple’s next-generation APFS file system”. It showed some system calls that can be used to create APFS snapshots, and it finally gave me a boost and a pointer to where to start looking.

I previously wrote about this vulnerability, which allowed someone to mount an APFS snapshot and gain read access to all files on the system, including those protected by the macOS privacy controls. The bug was fixed by allowing only apps that have the “Full Disk Access” right to perform this mount operation.

So I finally decided to sit down, take an in-depth look at how Apple fixed it, and find out what’s going on.

Chapter 1 - The Source

Although Apple doesn’t open source everything, a decent amount of source code is still available, most notably the XNU kernel, which is very useful for this kind of investigation. Since mounting is done via system calls, the XNU source code is a natural place to look.

During the past year Apple released the XNU source code up to macOS 10.15.6, so I decided to start looking there. I keep a copy of all the tarballs, so I extracted an earlier version of XNU, 6153.41.3, and also 6153.141.1, a later release that includes the macOS 10.15.4 changes. If any changes were made in the open-source part, we should be able to spot them.

We will start by tracing how snapshot mounting happens through system calls, using xnu-6153.41.3. The file ./xnu-6153.41.3/libsyscall/wrappers/fs_snapshot.c contains wrappers for various snapshot-related system calls. The one that interests us is fs_snapshot_mount.

int
fs_snapshot_mount(int dirfd, const char *dir, const char *snapshot,
    uint32_t flags)
{
	return __fs_snapshot(SNAPSHOT_OP_MOUNT, dirfd, snapshot, dir,
	           NULL, flags);
}

This function calls __fs_snapshot (the raw system call) with the opcode SNAPSHOT_OP_MOUNT. This eventually leads to the kernel’s fs_snapshot function, which can be found in ./xnu-6153.41.3/bsd/vfs/vfs_syscalls.c. Here is a small section of it.

/*
 * FS snapshot operations dispatcher
 */
int
fs_snapshot(__unused proc_t p, struct fs_snapshot_args *uap,
    __unused int32_t *retval)
{
...
	case SNAPSHOT_OP_MOUNT:
		error = snapshot_mount(uap->dirfd, uap->name1, uap->name2,
		    uap->data, uap->flags, ctx);
		break;
...

It’s not a very long function, and at the very end it has a switch statement on the opcode that was passed in. In our case it leads to snapshot_mount, which is in the same file and looks like the following.

static int
snapshot_mount(int dirfd, user_addr_t name, user_addr_t directory,
    __unused user_addr_t mnt_data, __unused uint32_t flags, vfs_context_t ctx)
{
	vnode_t rvp, snapdvp, snapvp, vp, pvp;
	int error;
	struct nameidata *snapndp, *dirndp;
	/* carving out a chunk for structs that are too big to be on stack. */
	struct {
		struct nameidata snapnd;
		struct nameidata dirnd;
	} * __snapshot_mount_data;

	MALLOC(__snapshot_mount_data, void *, sizeof(*__snapshot_mount_data),
	    M_TEMP, M_WAITOK);
	snapndp = &__snapshot_mount_data->snapnd;
	dirndp = &__snapshot_mount_data->dirnd;

	error = vnode_get_snapshot(dirfd, &rvp, &snapdvp, name, snapndp, LOOKUP,
	    OP_LOOKUP, ctx);
	if (error) {
		goto out;
	}

	snapvp  = snapndp->ni_vp;
	if (!vnode_mount(rvp) || (vnode_mount(rvp) == dead_mountp)) {
		error = EIO;
		goto out1;
	}

	/* Get the vnode to be covered */
	NDINIT(dirndp, LOOKUP, OP_MOUNT, FOLLOW | AUDITVNPATH1 | WANTPARENT,
	    UIO_USERSPACE, directory, ctx);
	error = namei(dirndp);
	if (error) {
		goto out1;
	}

	vp = dirndp->ni_vp;
	pvp = dirndp->ni_dvp;

	if ((vp->v_flag & VROOT) && (vp->v_mount->mnt_flag & MNT_ROOTFS)) {
		error = EINVAL;
	} else {
		mount_t mp = vnode_mount(rvp);
		struct fs_snapshot_mount_args smnt_data;

		smnt_data.sm_mp  = mp;
		smnt_data.sm_cnp = &snapndp->ni_cnd;
		error = mount_common(mp->mnt_vfsstat.f_fstypename, pvp, vp,
		    &dirndp->ni_cnd, CAST_USER_ADDR_T(&smnt_data), flags & MNT_DONTBROWSE,
		    KERNEL_MOUNT_SNAPSHOT, NULL, FALSE, ctx);
	}

	vnode_put(vp);
	vnode_put(pvp);
	nameidone(dirndp);
out1:
	vnode_put(snapvp);
	vnode_put(snapdvp);
	vnode_put(rvp);
	nameidone(snapndp);
out:
	FREE(__snapshot_mount_data, M_TEMP);
	return error;
}
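Before following snapshot_mount further, here is a small user-space model of the path from the libsyscall wrapper to it. Everything in it is a mock: the mock_ functions, struct snap_args and the opcode values are hypothetical stand-ins, and only the names they imitate (fs_snapshot_mount, fs_snapshot, snapshot_mount, SNAPSHOT_OP_MOUNT) come from XNU. The point it illustrates is the argument order: the wrapper passes the snapshot name as name1 and the mount directory as name2, which snapshot_mount then receives as name and directory.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative opcode values; see XNU's sys/snapshot.h for the real ones. */
enum snapshot_op {
    SNAPSHOT_OP_CREATE = 1,
    SNAPSHOT_OP_DELETE,
    SNAPSHOT_OP_RENAME,
    SNAPSHOT_OP_MOUNT,
};

/* Simplified stand-in for struct fs_snapshot_args (hypothetical layout). */
struct snap_args {
    uint32_t op;
    int dirfd;
    const char *name1;
    const char *name2;
    const void *data;
    uint32_t flags;
};

/* What the mock snapshot_mount received, kept so it can be inspected. */
struct mount_request {
    const char *snapshot;
    const char *directory;
};
struct mount_request last_request;

/* Mock of snapshot_mount(dirfd, name, directory, ...): records its inputs. */
int mock_snapshot_mount(int dirfd, const char *name, const char *directory)
{
    (void)dirfd;
    last_request.snapshot = name;
    last_request.directory = directory;
    return 0;
}

/* Mock of the fs_snapshot dispatcher: a switch on the opcode. */
int mock_fs_snapshot(const struct snap_args *uap)
{
    switch (uap->op) {
    case SNAPSHOT_OP_MOUNT:
        return mock_snapshot_mount(uap->dirfd, uap->name1, uap->name2);
    default:
        return EINVAL;   /* placeholder; the real default case may differ */
    }
}

/* Mirrors the libsyscall wrapper: snapshot becomes name1, dir becomes name2. */
int mock_fs_snapshot_mount(int dirfd, const char *dir, const char *snapshot,
                           uint32_t flags)
{
    struct snap_args a = { SNAPSHOT_OP_MOUNT, dirfd, snapshot, dir, NULL, flags };
    return mock_fs_snapshot(&a);
}
```

Calling mock_fs_snapshot_mount with "/tmp/snap" as dir and a snapshot name leaves last_request.snapshot pointing at the snapshot name and last_request.directory at "/tmp/snap", mirroring how the real wrapper swaps dir and snapshot before entering the kernel.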

snapshot_mount eventually calls mount_common, which is also in the same file and is responsible for the actual mount. Let’s see how snapshot_mount has changed; for that we will use the code from xnu-6153.141.1, specifically the file ./xnu-6153.141.1/bsd/vfs/vfs_syscalls.c. I won’t paste the entire function again, just the important new part.

#if CONFIG_MACF
	error = mac_mount_check_snapshot_mount(ctx, rvp, vp, &dirndp->ni_cnd, snapndp->ni_cnd.cn_nameptr,
	    mp->mnt_vfsstat.f_fstypename);
	if (error) {
		goto out2;
	}
#endif

We can see that a callout to a MACF (Mandatory Access Control Framework) policy check was added. At a very high level, MACF is an extensible framework through which policies can enforce restrictions and authorize or reject certain operations. On macOS the largest MACF policy extension is the Sandbox. There are others, like AppleMobileFileIntegrity, but the Sandbox is by far the largest, hooking more than 100 system calls.

This is where things start to make sense: Apple attributed this bug to the “Sandbox”, and the Sandbox is a MACF policy, so a new MACF callout is exactly where we would expect the fix to be.

So the bug was apparently a missing Sandbox check. If we search for mac_mount_* in the old source code, we find many callouts, but none for snapshot mounting; mac_mount_check_snapshot_mount simply doesn’t exist in earlier versions.

Let’s move on. This function is defined in two places, first in xnu-6153.141.1/security/mac_base.c.

int mac_mount_check_snapshot_mount(vfs_context_t ctx, struct vnode *rvp, struct vnode *vp, struct componentname *cnp,
    const char *name, const char *vfc_name);
int
mac_mount_check_snapshot_mount(vfs_context_t ctx __unused, struct vnode *rvp __unused, struct vnode *vp __unused,
    struct componentname *cnp __unused, const char *name __unused, const char *vfc_name __unused)
{
	return 0;
}

Here the function simply returns 0, which in MACF means “allow”. I’m guessing here, but judging by the __unused parameters this looks like a default-allow stub, most likely the one used when the kernel is built without MACF support (CONFIG_MACF), so that the operation is simply allowed.

The other implementation is in xnu-6153.141.1/security/mac_vfs.c.

int
mac_mount_check_snapshot_mount(vfs_context_t ctx, struct vnode *rvp, struct vnode *vp, struct componentname *cnp,
    const char *name, const char *vfc_name)
{
	kauth_cred_t cred;
	int error;

#if SECURITY_MAC_CHECK_ENFORCE
	/* 21167099 - only check if we allow write */
	if (!mac_vnode_enforce) {
		return 0;
	}
#endif
	cred = vfs_context_ucred(ctx);
	if (!mac_cred_check_enforce(cred)) {
		return 0;
	}
	VFS_KERNEL_DEBUG_START1(92, vp);
	MAC_CHECK(mount_check_snapshot_mount, cred, rvp, vp, cnp, name, vfc_name);
	VFS_KERNEL_DEBUG_END1(92, vp);
	return error;
}

This function uses the MAC_CHECK macro, which is defined in ./xnu-6153.141.1/security/mac_internal.h.

#define MAC_CHECK(check, args...) do {                                  \
	struct mac_policy_conf *mpc;                                    \
	u_int i;                                                        \
                                                                        \
	error = 0;                                                      \
	for (i = 0; i < mac_policy_list.staticmax; i++) {               \
	        mpc = mac_policy_list.entries[i].mpc;                   \
	        if (mpc == NULL)                                        \
	                continue;                                       \
                                                                        \
	        if (mpc->mpc_ops->mpo_ ## check != NULL)                \
	                error = mac_error_select(                       \
	                    mpc->mpc_ops->mpo_ ## check (args),         \
	                    error);                                     \
	}                                                               \
	if (mac_policy_list_conditional_busy() != 0) {                  \
	        for (; i <= mac_policy_list.maxindex; i++) {            \
	                mpc = mac_policy_list.entries[i].mpc;           \
	                if (mpc == NULL)                                \
	                        continue;                               \
                                                                        \
	                if (mpc->mpc_ops->mpo_ ## check != NULL)        \
	                        error = mac_error_select(               \
	                            mpc->mpc_ops->mpo_ ## check (args), \
	                            error);                             \
	        }                                                       \
	        mac_policy_list_unbusy();                               \
	}                                                               \
} while (0)

This macro iterates over all registered MACF policy extensions and calls each one’s hook for the given check, if the policy implements it. The individual results are combined with the mac_error_select function, which is implemented in ./xnu-6153.141.1/security/mac_base.c.

/*
 * Define an error value precedence, and given two arguments, selects the
 * value with the higher precedence.
 */
int
mac_error_select(int error1, int error2)
{
	/* Certain decision-making errors take top priority. */
	if (error1 == EDEADLK || error2 == EDEADLK) {
		return EDEADLK;
	}

	/* Invalid arguments should be reported where possible. */
	if (error1 == EINVAL || error2 == EINVAL) {
		return EINVAL;
	}

	/* Precedence goes to "visibility", with both process and file. */
	if (error1 == ESRCH || error2 == ESRCH) {
		return ESRCH;
	}

	if (error1 == ENOENT || error2 == ENOENT) {
		return ENOENT;
	}

	/* Precedence goes to DAC/MAC protections. */
	if (error1 == EACCES || error2 == EACCES) {
		return EACCES;
	}

	/* Precedence goes to privilege. */
	if (error1 == EPERM || error2 == EPERM) {
		return EPERM;
	}

	/* Precedence goes to error over success; otherwise, arbitrary. */
	if (error1 != 0) {
		return error1;
	}
	return error2;
}

It compares the error values returned by the various MACF policy extensions, so that if any of them didn’t allow the operation, the corresponding error is returned. The errors have a fixed priority, and the “highest” one wins. This also means that if any of the extensions denies the action, it will be denied.
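Putting MAC_CHECK and mac_error_select together, the decision logic can be modelled in user space. This is a simplified sketch, not XNU code: struct policy_ops, policy_list, POLICY_CHECK and the sample policies are hypothetical, the busy/unbusy handling of the dynamically loaded policy slots is left out, and error_select expresses mac_error_select’s chain of if statements as an ordered table, which should be equivalent.

```c
#include <errno.h>
#include <stddef.h>

/* Precedence used by mac_error_select, highest first. */
static const int error_precedence[] = { EDEADLK, EINVAL, ESRCH, ENOENT, EACCES, EPERM };

/* Table-driven rewrite of mac_error_select; intended to behave identically. */
int error_select(int error1, int error2)
{
    for (size_t i = 0; i < sizeof error_precedence / sizeof error_precedence[0]; i++) {
        if (error1 == error_precedence[i] || error2 == error_precedence[i])
            return error_precedence[i];
    }
    /* Error over success; between two other errors the choice is arbitrary. */
    return error1 != 0 ? error1 : error2;
}

/* Hypothetical, trimmed-down analogue of struct mac_policy_ops. */
struct policy_ops {
    int (*mpo_mount_check_snapshot_mount)(const char *name, const char *vfc_name);
};

/* Hypothetical registry standing in for mac_policy_list.entries[]. */
#define MAX_POLICIES 4
const struct policy_ops *policy_list[MAX_POLICIES];

/* Like MAC_CHECK: paste the check name onto "mpo_", ask every policy, fold. */
#define POLICY_CHECK(check, ...) do {                                  \
    error = 0;                                                         \
    for (size_t i_ = 0; i_ < MAX_POLICIES; i_++) {                     \
        const struct policy_ops *ops_ = policy_list[i_];               \
        if (ops_ != NULL && ops_->mpo_ ## check != NULL)               \
            error = error_select(ops_->mpo_ ## check(__VA_ARGS__),     \
                error);                                                \
    }                                                                  \
} while (0)

int check_snapshot_mount(const char *name, const char *vfc_name)
{
    int error;
    POLICY_CHECK(mount_check_snapshot_mount, name, vfc_name);
    return error;
}

/* Sample policies with made-up verdicts, used to exercise the fold. */
static int allow_all(const char *name, const char *vfc_name)
{
    (void)name; (void)vfc_name;
    return 0;
}

static int deny_eperm(const char *name, const char *vfc_name)
{
    (void)name; (void)vfc_name;
    return EPERM;
}

static int deny_eacces(const char *name, const char *vfc_name)
{
    (void)name; (void)vfc_name;
    return EACCES;
}

const struct policy_ops permissive = { allow_all };
const struct policy_ops strict = { deny_eperm };
const struct policy_ops stricter = { deny_eacces };
```

With only the permissive policy registered the check returns 0; adding a policy that answers EPERM turns the result into EPERM, and a third one answering EACCES wins over that because EACCES sits higher in the precedence table. A single denying policy is therefore enough to block the mount, whatever the others say.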

The fact that 0 is expected as the return value for an allowed operation is documented at the mpo_mount_check_snapshot_mount_t type definition inside ./xnu-6153.141.1/security/mac_policy.h.

/**
 *  @brief Access control check for fs_snapshot_mount
 *  @param cred Subject credential
 *  @param rvp Vnode of either the root directory of the
 *  filesystem to mount snapshot of, or the device from
 *  which to mount the snapshot.
 *  @param vp Vnode that is to be the mount point
 *  @param cnp Component name for vp
 *  @param name Name of snapshot to mount
 *  @param vfc_name Filesystem type name
 *
 *  Determine whether the subject identified by the credential can
 *  mount the named snapshot from the filesystem at the given
 *  directory.
 *
 *  @return Return 0 if access is granted, otherwise an appropriate value
 *  for errno should be returned.
 */
typedef int mpo_mount_check_snapshot_mount_t(
	kauth_cred_t cred,
	struct vnode *rvp,
	struct vnode *vp,
	struct componentname *cnp,
	const char *name,
	const char *vfc_name
	);

It says “@return Return 0 if access is granted”. This is generally true for all MACF checks, not just this one.
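As an illustration of that contract, here is what a hook matching the typedef could look like. The credential type is a hypothetical user-space stand-in (the real kauth_cred_t is opaque), and the has_full_disk_access field and the EPERM answer are assumptions made for the example; this is not how the Sandbox actually reaches its decision, which is examined in the next chapter.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Opaque kernel types, forward-declared so they can appear in prototypes. */
struct vnode;
struct componentname;

/* Hypothetical stand-in for the kernel credential. */
struct ucred_mock {
    bool has_full_disk_access;   /* assumption, for illustration only */
};
typedef struct ucred_mock *kauth_cred_t;

/* Same shape as the typedef in security/mac_policy.h. */
typedef int mpo_mount_check_snapshot_mount_t(
    kauth_cred_t cred, struct vnode *rvp, struct vnode *vp,
    struct componentname *cnp, const char *name, const char *vfc_name);

/* Declaring the hook through the typedef makes the compiler check its signature. */
mpo_mount_check_snapshot_mount_t example_check_snapshot_mount;

int example_check_snapshot_mount(kauth_cred_t cred, struct vnode *rvp,
    struct vnode *vp, struct componentname *cnp, const char *name,
    const char *vfc_name)
{
    (void)rvp; (void)vp; (void)cnp; (void)name; (void)vfc_name;
    /* 0 grants access; any other value is an errno describing the denial. */
    return (cred != NULL && cred->has_full_disk_access) ? 0 : EPERM;
}
```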

An interesting thing I found is that in order to use the fs_snapshot system call, which is also used by snapUtil, without getting denied, you need the com.apple.developer.vfs.snapshot entitlement.

Now that we know there is a MACF call, it’s time to take a look at the Sandbox. For this we will use the Sandbox kernel extension from Big Sur instead of Catalina.

Chapter 2 - The Sandbox

The mount_check_snapshot_mount Sandbox hook is implemented by the hook_mount_check_snapshot_mount function.

int _hook_mount_check_snapshot_mount(int arg0, int arg1, int arg2, int arg3, int arg4, int arg5) {
    var_30 = **qword_44080;
    var_40 = 0xaaaaaaaaaaaaaaaa;
    *(&var_40 + 0x8) = 0xaaaaaaaaaaaaaaaa;
    ___strlcpy_chk(&var_40, arg5, 0x10, 0x10, arg4, arg5);
    ___bzero(&var_1C8, 0x188);
    *(int32_t *)(&var_1C8 + 0xa8) = 0x1;
    *(&var_1C8 + 0xb0) = arg2;
    *(&var_1C8 + 0x118) = &var_40;
    *(&var_1C8 + 0x120) = arg4;
    rax = _cred_sb_evaluate(arg0, 0x2c, &var_1C8, 0x10, arg4, arg5);
    if (**qword_44080 != var_30) {
            rax = ___stack_chk_fail();
    }
    return rax;
}

This is a short function that calls cred_sb_evaluate with the opcode 0x2c. It’s very common for Sandbox hook functions to be short and to hand the evaluation off to cred_sb_evaluate with a given opcode.

int _cred_sb_evaluate(int arg0, int arg1, int arg2, int arg3, int arg4, int arg5) {
    r9 = arg5;
    r8 = arg4;
    rdi = arg0;
    r15 = arg2;
    r14 = arg1;
    *(arg2 + 0x20) = rdi;
    if (rdi < 0x1) {
            rbx = 0x0;
    }
    else {
            rbx = _label_get_sandbox(*(rdi + 0x78));
    }
    var_50 = *qword_44008;
    *(&var_50 + 0x8) = 0x40000000;
    *(&var_50 + 0x10) = ___sb_evaluate_block_invoke;
    *(&var_50 + 0x18) = ___block_descriptor_tmp.16;
    *(&var_50 + 0x20) = rbx;
    *(int32_t *)(&var_50 + 0x30) = r14;
    *(&var_50 + 0x28) = r15;
    _sb_evaluate_internal(rbx, r14, r15, &var_50, r8, r9);
    _sandbox_release(rbx, r14);
    rax = rax;
    return rax;
}
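Between them, the hook and cred_sb_evaluate fill in a 0x188-byte argument block on the stack before the evaluation. Reading the offsets off the decompiled code, that block can be sketched as a C struct. This is a reconstruction under assumptions: the field names and SB_OP_MOUNT_CHECK_SNAPSHOT_MOUNT are made up, everything these two functions do not write is left as opaque padding, and Hopper’s arg0 to arg5 are assumed to map to the hook’s parameters in the order of mpo_mount_check_snapshot_mount_t (cred, rvp, vp, cnp, name, vfc_name).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Operation number the hook passes to cred_sb_evaluate (hypothetical name). */
#define SB_OP_MOUNT_CHECK_SNAPSHOT_MOUNT 0x2c

/*
 * Hypothetical reconstruction of the block zeroed by ___bzero(&var_1C8, 0x188).
 * Pointers are stored as uint64_t so the layout matches the 64-bit kernel
 * regardless of the host.
 */
struct sb_eval_args {
    uint8_t  unknown0[0x20];
    uint64_t cred;              /* 0x20: stored by cred_sb_evaluate (its arg0) */
    uint8_t  unknown1[0xa8 - 0x28];
    int32_t  field_a8;          /* 0xa8: always set to 1 by the hook */
    uint8_t  unknown2[0x4];
    uint64_t vnode;             /* 0xb0: hook arg2, the mount-point vnode (vp) */
    uint8_t  unknown3[0x118 - 0xb8];
    uint64_t fstype;            /* 0x118: &var_40, a 16-byte copy of vfc_name */
    uint64_t snapshot_name;     /* 0x120: hook arg4, the snapshot name */
    uint8_t  unknown4[0x188 - 0x128];
};

_Static_assert(offsetof(struct sb_eval_args, cred) == 0x20, "cred");
_Static_assert(offsetof(struct sb_eval_args, field_a8) == 0xa8, "field_a8");
_Static_assert(offsetof(struct sb_eval_args, vnode) == 0xb0, "vnode");
_Static_assert(offsetof(struct sb_eval_args, fstype) == 0x118, "fstype");
_Static_assert(offsetof(struct sb_eval_args, snapshot_name) == 0x120, "snapshot_name");
_Static_assert(sizeof(struct sb_eval_args) == 0x188, "matches the bzero size");

/* Re-enacts the hook's setup before it calls cred_sb_evaluate. */
void fill_args(struct sb_eval_args *args, char fstype_buf[16],
               const void *vp, const char *name, const char *vfc_name)
{
    /* ___strlcpy_chk(&var_40, vfc_name, 0x10, ...): at most 15 chars + NUL. */
    size_t n = 0;
    while (n < 15 && vfc_name[n] != '\0')
        n++;
    memcpy(fstype_buf, vfc_name, n);
    fstype_buf[n] = '\0';

    memset(args, 0, sizeof *args);      /* ___bzero(&var_1C8, 0x188) */
    args->field_a8 = 1;
    args->vnode = (uint64_t)(uintptr_t)vp;
    args->fstype = (uint64_t)(uintptr_t)fstype_buf;
    args->snapshot_name = (uint64_t)(uintptr_t)name;
}
```

The _Static_assert lines pin every named field to the offset seen in the decompilation, so a mistake in the padding arithmetic fails at compile time instead of silently shifting fields.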

cred_sb_evaluate is also short, and it calls into sb_evaluate_internal, passing along the opcode. That is a massive and very complex function, and ultimately it is what determines whether the operation is allowed. Most hook operations end up calling into it. Even decompiled, it is over 1000 lines of code.

I decided not to fully reverse engineer it, as that alone would be weeks of work, but I still wanted to see the decision-making process, so I jumped into kernel debugging. I wrote earlier about how to do kernel debugging on macOS; the post can be found on my company’s website: Kernel Debugging macOS with SIP.

A small detour: kernel debugging on Big Sur with VMware Fusion 12.1 doesn’t work well if the VM has more than 1 vCPU. Whether we use VMware’s GDB stub or proper XNU kernel debugging, when we “continue” after a breakpoint, the VM hangs forever and the “vmware-vmx” process spikes in CPU usage. This can also happen with 1 vCPU, but rarely.

Once kernel debugging is up and running, we start by setting a breakpoint on hook_mount_check_snapshot_mount and continuing.

(lldb) b hook_mount_check_snapshot_mount
Breakpoint 1: where = Sandbox`hook_mount_check_snapshot_mount, address = 0xffffff801021a49c
(lldb) c
Process 1 resuming

Then we issue a snapshot mount to trigger the breakpoint, first from a Terminal that has no Full Disk Access.

mount_apfs -o noowners,ro -s com.apple.TimeMachine.2020-12-04-045333.local /System/Volumes/Data /tmp/snap

We will hit our breakpoint.

Process 1 stopped
* thread #3, name = '0xffffff86a42b4000', queue = 'cpu-0', stop reason = breakpoint 1.1
    frame #0: 0xffffff801021a49c Sandbox`hook_mount_check_snapshot_mount
Sandbox`hook_mount_check_snapshot_mount:
->  0xffffff801021a49c <+0>: int3   
    0xffffff801021a49d <+1>: mov    rbp, rsp
    0xffffff801021a4a0 <+4>: push   r15
    0xffffff801021a4a2 <+6>: push   r14
Target 0: (kernel) stopped.

We can step through the execution until sb_evaluate_internal returns, and then check the value of RAX, which contains the return value, that is, the “decision” of the evaluation.

(lldb)  
Process 1 stopped
* thread #3, name = '0xffffff86a42b4000', queue = 'cpu-0', stop reason = instruction step over
    frame #0: 0xffffff8010217f57 Sandbox`cred_sb_evaluate + 119
Sandbox`cred_sb_evaluate:
->  0xffffff8010217f57 <+119>: mov    r14, rax
    0xffffff8010217f5a <+122>: mov    rdi, rbx
    0xffffff8010217f5d <+125>: call   0xffffff801021713d        ; sandbox_release
    0xffffff8010217f62 <+130>: mov    rax, r14
Target 0: (kernel) stopped.
(lldb) register read $rax
     rax = 0x00000001ffffffff

The return value is 0x00000001ffffffff, which is definitely not 0, meaning the operation is not allowed. This value is eventually returned upwards.

If we check the backtrace, we see that we got here through a slightly different system call than the one we examined in the XNU source code.

(lldb) bt
* thread #3, name = '0xffffff86a42b4000', queue = 'cpu-0', stop reason = instruction step over
  * frame #0: 0xffffff8010217f6c Sandbox`cred_sb_evaluate + 140
    frame #1: 0xffffff801021a535 Sandbox`hook_mount_check_snapshot_mount + 153
    frame #2: 0xffffff800daafe3f kernel`mac_mount_check_snapshot_mount(ctx=<unavailable>, rvp=0xffffff869e187200, vp=0xffffff86ae81bd00, cnp=0x0000000000000000, name="com.apple.TimeMachine.2020-12-11-135636.local", vfc_name="apfs") at mac_vfs.c:2376:2 [opt]
    frame #3: 0xffffff80102eb5fb apfs`handle_snapshot_mount + 4808
    frame #4: 0xffffff80102e5524 apfs`apfs_vfsop_mount + 9469
    frame #5: 0xffffff800d54f198 kernel`mount_common(fstypename=<unavailable>, pvp=0xffffff869e54d200, vp=<unavailable>, cnp=<unavailable>, fsmountargs=140732678379552, flags=2097177, internal_flags=0, labelstr=0x0000000000000000, kernelmount=0, ctx=0xffffff86a3f57438) at vfs_syscalls.c:1220:11 [opt]
    frame #6: 0xffffff800d551145 kernel`__mac_mount(p=<unavailable>, uap=0xffffffa04e8fbf18, retval=<unavailable>) at vfs_syscalls.c:596:10 [opt]
    frame #7: 0xffffff800d550cfe kernel`mount(p=<unavailable>, uap=<unavailable>, retval=<unavailable>) at vfs_syscalls.c:356:9 [opt]
    frame #8: 0xffffff800d969ceb kernel`unix_syscall64(state=<unavailable>) at systemcalls.c:412:10 [opt]
    frame #9: 0xffffff800d2621f6 kernel`hndl_unix_scall64 + 22

We can see that mount_apfs called the mount system call instead of fs_snapshot, and mount_common eventually called into the apfs kernel extension to handle the mount. apfs then called mac_mount_check_snapshot_mount, which we saw earlier.

If we open the apfs kernel extension in Hopper and navigate to handle_snapshot_mount, we find this callout.

int _handle_snapshot_mount(int arg0, int arg1, int arg2, int arg3, int arg4, int arg5) {
...
rax = _mac_mount_check_snapshot_mount(var_528, rsi, rdx, rcx, r8, r9);
...
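Going back to the backtrace, mount_common received flags=2097177, which is 0x200019 in hex. Decoding it is a quick sanity check against the options given to mount_apfs. The sketch below hard-codes a partial table of flag values from macOS &lt;sys/mount.h&gt; so it also builds on other systems; decode_mnt_flags is a hypothetical helper, not a system API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A partial table of mount flags, values taken from macOS <sys/mount.h>. */
static const struct {
    uint32_t bit;
    const char *name;
} mnt_flag_names[] = {
    { 0x00000001, "MNT_RDONLY" },
    { 0x00000004, "MNT_NOEXEC" },
    { 0x00000008, "MNT_NOSUID" },
    { 0x00000010, "MNT_NODEV" },
    { 0x00100000, "MNT_DONTBROWSE" },
    { 0x00200000, "MNT_IGNORE_OWNERSHIP" },
};

/*
 * Writes the names of the known bits in flags into out as "A|B|...",
 * truncating if out is too small, and returns the bits it did not recognise.
 */
uint32_t decode_mnt_flags(uint32_t flags, char *out, size_t outlen)
{
    uint32_t unknown = flags;

    if (outlen > 0)
        out[0] = '\0';
    for (size_t i = 0; i < sizeof mnt_flag_names / sizeof mnt_flag_names[0]; i++) {
        if ((flags & mnt_flag_names[i].bit) == 0)
            continue;
        unknown &= ~mnt_flag_names[i].bit;
        if (outlen == 0)
            continue;
        if (out[0] != '\0')
            strncat(out, "|", outlen - strlen(out) - 1);
        strncat(out, mnt_flag_names[i].name, outlen - strlen(out) - 1);
    }
    return unknown;
}
```

For 2097177 it produces MNT_RDONLY|MNT_NOSUID|MNT_NODEV|MNT_IGNORE_OWNERSHIP and returns 0, so no bits are left unexplained. ro and noowners show up as MNT_RDONLY and MNT_IGNORE_OWNERSHIP; nosuid and nodev were not requested on the command line and were presumably added along the way.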

Now let’s give Terminal “Full Disk Access” rights and try mounting again.

(lldb) b 0xffffff8010217f5a
Breakpoint 4: where = Sandbox`cred_sb_evaluate + 122, address = 0xffffff8010217f5a
(lldb) c
Process 1 resuming
Process 1 stopped
* thread #12, name = '0xffffff86a00a36c0', queue = 'cpu-0', stop reason = breakpoint 4.1
    frame #0: 0xffffff8010217f5a Sandbox`cred_sb_evaluate + 122
Sandbox`cred_sb_evaluate:
->  0xffffff8010217f5a <+122>: int3   
    0xffffff8010217f5b <+123>: mov    edi, ebx
    0xffffff8010217f5d <+125>: call   0xffffff801021713d        ; sandbox_release
    0xffffff8010217f62 <+130>: mov    rax, r14
Target 0: (kernel) stopped.
(lldb) bt
* thread #12, name = '0xffffff86a00a36c0', queue = 'cpu-0', stop reason = breakpoint 4.1
  * frame #0: 0xffffff8010217f5a Sandbox`cred_sb_evaluate + 122
    frame #1: 0xffffff801021a535 Sandbox`hook_mount_check_snapshot_mount + 153
    frame #2: 0xffffff800daafe3f kernel`mac_mount_check_snapshot_mount(ctx=<unavailable>, rvp=0xffffff869e187200, vp=0xffffff86ae81bd00, cnp=0x0000000000000000, name="com.apple.TimeMachine.2020-12-11-135636.local", vfc_name="apfs") at mac_vfs.c:2376:2 [opt]
    frame #3: 0xffffff80102eb5fb apfs`handle_snapshot_mount + 4808
    frame #4: 0xffffff80102e5524 apfs`apfs_vfsop_mount + 9469
    frame #5: 0xffffff800d54f198 kernel`mount_common(fstypename=<unavailable>, pvp=0xffffff869e54d200, vp=<unavailable>, cnp=<unavailable>, fsmountargs=140732897724560, flags=2097177, internal_flags=0, labelstr=0x0000000000000000, kernelmount=0, ctx=0xffffff86a35383b0) at vfs_syscalls.c:1220:11 [opt]
    frame #6: 0xffffff800d551145 kernel`__mac_mount(p=<unavailable>, uap=0xffffffa04e7e3f18, retval=<unavailable>) at vfs_syscalls.c:596:10 [opt]
    frame #7: 0xffffff800d550cfe kernel`mount(p=<unavailable>, uap=<unavailable>, retval=<unavailable>) at vfs_syscalls.c:356:9 [opt]
    frame #8: 0xffffff800d969ceb kernel`unix_syscall64(state=<unavailable>) at systemcalls.c:412:10 [opt]
    frame #9: 0xffffff800d2621f6 kernel`hndl_unix_scall64 + 22
(lldb) register read $rax
     rax = 0x0000000000000000

Once the hook breakpoint is hit, we set a new breakpoint right after sb_evaluate_internal returns (cred_sb_evaluate + 122) and continue. Checking RAX there shows a return value of 0, so the operation will indeed succeed. This confirms that this is where the decision happens, and that this check was missing prior to Catalina 10.15.4.

If we diff the Sandbox kernel extension binaries from macOS 10.15.3 and 10.15.4, we see that the function hook_mount_check_snapshot_mount was introduced in 10.15.4.

Conclusion

In this rather long post we explored the secrets behind the fix for CVE-2020-9771. We started by reviewing the changes in the XNU source code and found that a new MACF policy check, mac_mount_check_snapshot_mount, was introduced for snapshot mounting. Then we moved over to the Sandbox, where we examined the hook_mount_check_snapshot_mount function, which implements this check by calling into sb_evaluate_internal through cred_sb_evaluate. Lastly, we used kernel debugging to confirm the decision made by the Sandbox and compared the results with and without FDA rights.

I hope this was a useful article; I certainly learned a lot going through this process. I think I will always feel like a n00b no matter what, but at least now I’m one step further.