On Android, MTE is always enabled in Zygote, and is disabled after fork for apps that didn't opt-in
to MTE.
Depends on the slab canary adjustments in the previous commit.
- tag slab allocations with [1..14] tags
- tag freed slab allocations with the "15" tag value to detect accesses to freed slab memory
- when generating tag value for a slab slot, always exclude most recent tag value for that slot
(to make use-after-free detection more reliable) and most recent tag values of its immediate
neighbors (to detect linear overflows and underflows)
If N_ARENA is greater than 1 `thread_arena` is initially to N_ARENA,
which is an invalid index into `ro.size_class_metadata[]`.
The actual used arena is computed in init().
Ensure init() is called if a new thread is only using realloc() to avoid
UB, e.g. pthread_mutex_lock() might crash due the memory not holding an
initialized mutex.
Affects mesa 23.2.0~rc4.
Example back trace using glmark2 (note `arena=4` with the default
N_ARENA being 4):
Program terminated with signal SIGSEGV, Segmentation fault.
#0 ___pthread_mutex_lock (mutex=0x7edff8d3f200) at ./nptl/pthread_mutex_lock.c:80
type = <optimized out>
__PRETTY_FUNCTION__ = "___pthread_mutex_lock"
id = <optimized out>
#1 0x00007f0ab62091a6 in mutex_lock (m=0x7edff8d3f200) at ./mutex.h:21
No locals.
#2 0x00007f0ab620c9b5 in allocate_small (arena=4, requested_size=24) at h_malloc.c:517
info = {size = 32, class = 2}
size = 32
c = 0x7edff8d3f200
slots = 128
slab_size = 4096
metadata = 0x0
slot = 0
slab = 0x0
p = 0x0
#3 0x00007f0ab6209809 in allocate (arena=4, size=24) at h_malloc.c:1252
No locals.
#4 0x00007f0ab6208e26 in realloc (old=0x72b138199120, size=24) at h_malloc.c:1499
vma_merging_reliable = false
old_size = 16
new = 0x0
copy_size = 139683981990973
#5 0x00007299f919e556 in attach_shader (ctx=0x7299e9ef9000, shProg=0x7370c9277d30, sh=0x7370c9278230) at ../src/mesa/main/shaderapi.c:336
n = 1
#6 0x00007299f904223e in _mesa_unmarshal_AttachShader (ctx=<optimized out>, cmd=<optimized out>) at src/mapi/glapi/gen/marshal_generated2.c:1539
program = <optimized out>
shader = <optimized out>
cmd_size = 2
#7 0x00007299f8f2e3b2 in glthread_unmarshal_batch (job=job@entry=0x7299e9ef9168, gdata=gdata@entry=0x0, thread_index=thread_index@entry=0) at ../src/mesa/main/glthread.c:139
cmd = 0x7299e9ef9180
batch = 0x7299e9ef9168
ctx = 0x7299e9ef9000
pos = 0
used = 3
buffer = 0x7299e9ef9180
shared = <optimized out>
lock_mutexes = <optimized out>
batch_index = <optimized out>
#8 0x00007299f8ecc2d9 in util_queue_thread_func (input=input@entry=0x72c1160e5580) at ../src/util/u_queue.c:309
job = {job = 0x7299e9ef9168, global_data = 0x0, job_size = 0, fence = 0x7299e9ef9168, execute = <optimized out>, cleanup = <optimized out>}
queue = 0x7299e9ef9058
thread_index = 0
#9 0x00007299f8f1bcbb in impl_thrd_routine (p=<optimized out>) at ../src/c11/impl/threads_posix.c:67
pack = {func = 0x7299f8ecc190 <util_queue_thread_func>, arg = 0x72c1160e5580}
#10 0x00007f0ab5aa63ec in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139683974242608, 2767510063778797177, -168, 11, 140727286820160, 126005371879424, -4369625917767903623, -2847048016936659335}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0,
0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#11 0x00007f0ab5b26a2c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
This needs to be disabled for compatibility with the exploit protection
compatibility mode on GrapheneOS. hardened_malloc shouldn't be trying to
initialize itself when exploit protection compatibility mode is enabled.
This has to be handled in our Bionic integration instead.
Calls to malloc with a zero size are extremely rare relative to normal
usage of the API. It's generally only done by inefficient C code with
open coded dynamic array implementations where they aren't handling zero
size as a special case for their usage of malloc/realloc. Efficient code
wouldn't be making these allocations. It doesn't make sense to optimize
for the performance of rare edge cases caused by inefficient code.