hardened_malloc/README.md

This project currently aims to support Android, musl and glibc. It may support
non-Linux operating systems in the future. For Android and musl, there
will be custom integration and other hardening features. The glibc support will
be limited to replacing the malloc implementation because musl is a much more
robust and cleaner base to build on and can cover the same use cases.

Debian stable determines the most ancient set of supported dependencies:

* glibc 2.24
* Linux 4.9
* Clang 3.8 or GCC 6.3

However, using more recent releases is highly recommended. Older versions of
the dependencies may be compatible at the moment but are not tested and will
explicitly not be supported.

For external malloc replacement with musl, musl 1.1.20 is required. However,
there will be custom integration offering better performance in the future
along with other hardening for the C standard library implementation.

Major releases of Android will be supported until tags stop being pushed to
the Android Open Source Project (AOSP). Google supports each major release
with security patches for 3 years, but tagged releases of the Android Open
Source Project are more than just security patches and are no longer pushed
once no officially supported devices are using them anymore. For example, at
the time of writing (September 2018), AOSP only has tagged releases for 8.1
(Nexus 5X, Nexus 6P, Pixel C) and 9.0 (Pixel, Pixel XL, Pixel 2, Pixel 2 XL).
There are ongoing security patches for 6.0, 6.0.1, 7.0, 7.1.1, 7.1.2, 8.0, 8.1
and 9.0 but only the active AOSP branches (8.1 and 9.0) are supported by this
project and it doesn't make much sense to use much older releases with far
less privacy and security hardening.

# Testing

The `preload.sh` script can be used for testing with dynamically linked
executables using glibc or musl:

    ./preload.sh krita --new-image RGBA,U8,500,500

It can be necessary to substantially increase the `vm.max_map_count` sysctl to
accommodate the large number of mappings caused by guard slabs and large
allocation guard regions. There will be a configuration option in `config.h`
for tuning the proportion of slabs to guard slabs too, since the default 1:1
proportion makes the address space quite sparse.

The allocator can offer slightly better performance when integrated into the C
standard library, and there are other opportunities for similar hardening within
C standard library and dynamic linker implementations. For example, a library
region can be implemented to offer similar isolation for dynamic libraries as
this allocator offers across different size classes. The intention is that this
will be offered as part of hardened variants of the Bionic and musl C standard
libraries.

# Configuration

You can set some configuration options at compile-time via arguments to the
make command as follows:

    make CONFIG_EXAMPLE=false

The available configuration options are the following:

* `CONFIG_CXX_ALLOCATOR`: `true` (default) or `false` to control whether the
  C++ allocator is replaced for slightly improved performance and detection of
  mismatched sizes for sized deallocation (often type confusion bugs). This
  will result in linking against the C++ standard library.
* `CONFIG_SEAL_METADATA`: `true` or `false` (default) to control whether Memory
  Protection Keys are used to disable access to all writable allocator state
  outside of the memory allocator code. It's currently disabled by default due
  to being extremely experimental and having a significant performance cost for
  this use case on current generation hardware, although the cost may become
  drastically lower in the future. Whether or not this feature is enabled, the
  metadata is all contained within an isolated memory region with high entropy
  random guard regions around it.

Additional compile-time configuration is available in the `config.h` file for
controlling the balance between security and performance / memory usage. By
default, all the optional security features are enabled. Options are only
provided for the features with a significant performance or memory usage cost.

```
#define WRITE_AFTER_FREE_CHECK true
#define SLOT_RANDOMIZE true
#define ZERO_ON_FREE true
#define SLAB_CANARY true
#define GUARD_SLABS_INTERVAL 1
#define GUARD_SIZE_DIVISOR 2
#define REGION_QUARANTINE_RANDOM_SIZE 128
#define REGION_QUARANTINE_QUEUE_SIZE 1024
#define REGION_QUARANTINE_SKIP_THRESHOLD (32 * 1024 * 1024)
#define FREE_SLABS_QUARANTINE_RANDOM_SIZE 32
```

There will be more control over enabled features in the future along with
control over fairly arbitrarily chosen values like the size of empty slab
caches (making them smaller improves security), the maximum size of guard
regions for large allocations and the proportion of slabs to guard slabs.

# Basic design

The current design is very simple and will become a bit more sophisticated as
the basic features are completed and the implementation is hardened and
optimized. The allocator is exclusive to 64-bit platforms in order to take full
advantage of the abundant address space without being constrained by needing to
keep the design compatible with 32-bit.

Small allocations are always located in a large memory region reserved for slab
allocations. It can be determined that an allocation is one of the small size
classes from the address range. Each small size class has a separate reserved
region within the larger region, and the size of a small allocation can simply
be determined from the range. Each small size class has a separate out-of-line
metadata array outside of the overall allocation region, with the index of the
metadata struct within the array mapping to the index of the slab within the
dedicated size class region. Slabs are a multiple of the page size and are
page aligned. The entire small size class region starts out memory protected
and becomes readable / writable as it gets allocated, with idle slabs beyond
the cache limit having their pages dropped and the memory protected again.
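
The address arithmetic described above can be sketched roughly as follows. This
is only an illustration under simplifying assumptions: `struct slab_region`,
`N_SIZE_CLASSES`, `CLASS_REGION_SIZE`, the two small tables and the function
names are all hypothetical, and the random base of each size class region is
ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout constants, chosen only for this sketch. */
#define N_SIZE_CLASSES 4
#define CLASS_REGION_SIZE ((uint64_t)1 << 32) /* address space per size class */

static const size_t class_sizes[N_SIZE_CLASSES] = {16, 32, 48, 64};
static const size_t slab_sizes[N_SIZE_CLASSES] = {4096, 4096, 4096, 4096};

struct slab_region {
    uint64_t start; /* base address of the whole slab allocation region */
};

/* An address is a small allocation iff it lies inside the slab region. */
int is_small_allocation(const struct slab_region *r, uint64_t addr) {
    return addr >= r->start &&
           addr - r->start < N_SIZE_CLASSES * CLASS_REGION_SIZE;
}

/* The size class follows directly from which sub-region the address is in. */
size_t size_class_index(const struct slab_region *r, uint64_t addr) {
    assert(is_small_allocation(r, addr));
    return (size_t)((addr - r->start) / CLASS_REGION_SIZE);
}

size_t small_allocation_size(const struct slab_region *r, uint64_t addr) {
    return class_sizes[size_class_index(r, addr)];
}

/* Index of the slab within its size class region. The same index would
 * select the entry in that class's out-of-line metadata array. */
size_t slab_index(const struct slab_region *r, uint64_t addr) {
    size_t class_index = size_class_index(r, addr);
    uint64_t offset = (addr - r->start) % CLASS_REGION_SIZE;
    return (size_t)(offset / slab_sizes[class_index]);
}
```

No header or pointer stored next to the allocation is consulted at any point,
which is what makes fully out-of-line metadata possible.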

Large allocations are tracked via a global hash table mapping their address to
their size and guard size. They're simply memory mappings and get mapped on
allocation and then unmapped on free.
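
A table of this kind could look something like the sketch below. It is not the
project's implementation: the fixed `TABLE_SIZE`, the tombstone scheme, the hash
function and all names are assumptions made to keep the example short, and a
real table would also need locking and resizing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 256          /* power of two, fixed for this sketch */
#define TOMBSTONE ((void *)1)   /* marks a deleted entry */

struct region_info {
    void *p;           /* NULL means the slot was never used */
    size_t size;       /* usable size of the mapping */
    size_t guard_size; /* size of the guard region on each side */
};

static struct region_info table[TABLE_SIZE];

/* Mappings are page aligned, so drop the low 12 bits before hashing. */
size_t hash_page(const void *p) {
    uint64_t x = (uint64_t)(uintptr_t)p >> 12;
    x *= 0x9E3779B97F4A7C15ULL; /* multiplicative (Fibonacci) hashing */
    return (size_t)(x >> 32) & (TABLE_SIZE - 1);
}

/* Record a new mapping; returns 0 on success, -1 if the table is full. */
int regions_insert(void *p, size_t size, size_t guard_size) {
    assert(p != NULL && p != TOMBSTONE);
    size_t i = hash_page(p);
    for (size_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        if (table[i].p == NULL || table[i].p == TOMBSTONE) {
            table[i] = (struct region_info){p, size, guard_size};
            return 0;
        }
    }
    return -1;
}

/* Look up a mapping by its start address; NULL if it is not tracked. */
struct region_info *regions_find(const void *p) {
    size_t i = hash_page(p);
    for (size_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        if (table[i].p == NULL) {
            return NULL; /* end of the probe chain */
        }
        if (table[i].p == p) {
            return &table[i];
        }
    }
    return NULL;
}

/* Forget a mapping once it has been freed. */
void regions_delete(struct region_info *entry) {
    entry->p = TOMBSTONE;
}
```

A failed lookup on free is what allows an invalid free of a large allocation to
be detected deterministically.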

This allocator is aimed at production usage, not aiding with finding and fixing
memory corruption bugs for software development. It does find many latent bugs
but won't include features like the option of generating and storing stack
traces for each allocation to include the allocation site in related error
messages. The design choices are based around minimizing overhead and
maximizing security which often leads to different decisions than a tool
attempting to find bugs. For example, it uses zero-based sanitization on free
and doesn't minimize slack space from size class rounding between the end of an
allocation and the canary / guard region. Zero-based filling has the least
chance of uncovering latent bugs, but also the best chance of mitigating
vulnerabilities. The canary feature is primarily meant to act as padding
absorbing small overflows to render them harmless, so slack space is helpful
rather than harmful despite not detecting the corruption on free. The canary
needs detection on free in order to have any hope of stopping other kinds of
issues like a sequential overflow, which is why it's included. It's assumed
that an attacker can figure out the allocator is in use so the focus is
explicitly not on detecting bugs that are impossible to exploit with it in use
like an 8 byte overflow. The design choices would be different if performance
were a bit less important and if a core goal were finding latent bugs.
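
To make the zero-based sanitization and canary checks more concrete, here is a
minimal sketch. The function names, the 8 byte `CANARY_SIZE` and the placement
of the canary at the very end of the slot are assumptions for illustration, not
a description of the actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CANARY_SIZE sizeof(uint64_t) /* assumed canary width */

/* On free: wipe the slot so stale data cannot leak into later allocations. */
void sanitize_on_free(unsigned char *slot, size_t size) {
    memset(slot, 0, size);
}

/* On reuse: if the zero fill is no longer intact, something wrote to the
 * slot after it was freed. Returns 1 when the slot is still all zero. */
int slot_is_zero(const unsigned char *slot, size_t size) {
    for (size_t i = 0; i < size; i++) {
        if (slot[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Zero the first byte in memory (independent of endianness) so that a C
 * string overflow running into the canary is terminated by it. */
uint64_t make_canary(uint64_t random_value) {
    unsigned char bytes[sizeof random_value];
    memcpy(bytes, &random_value, sizeof bytes);
    bytes[0] = 0;
    memcpy(&random_value, bytes, sizeof bytes);
    return random_value;
}

/* Store the canary in the last CANARY_SIZE bytes of the slot. */
void set_canary(unsigned char *slot, size_t slot_size, uint64_t canary) {
    assert(slot_size >= CANARY_SIZE);
    memcpy(slot + slot_size - CANARY_SIZE, &canary, CANARY_SIZE);
}

/* On free: returns 1 if the canary is unchanged, 0 if it was overwritten. */
int canary_intact(const unsigned char *slot, size_t slot_size, uint64_t canary) {
    uint64_t stored;
    memcpy(&stored, slot + slot_size - CANARY_SIZE, CANARY_SIZE);
    return stored == canary;
}
```

A bug-finding tool would more likely fill freed memory with a non-zero pattern
so that stale reads misbehave visibly. Zero filling makes the opposite
trade-off, as explained above.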

# Security properties

* Fully out-of-line metadata
* Deterministic detection of any invalid free (unallocated, unaligned, etc.)
    * Validation of the size passed for C++14 sized deallocation by `delete`
      (detects type confusion if the size is different) and various containers
      using the allocator API directly
* Isolated memory region for slab allocations
    * Divided up into isolated inner regions for each size class
        * High entropy random base for each size class region
        * No deterministic / low entropy offsets between allocations with
          different size classes
    * Metadata is completely outside the slab allocation region
        * No references to metadata within the slab allocation region
        * No deterministic / low entropy offsets to metadata
    * Entire slab region starts out non-readable and non-writable
    * Slabs beyond the cache limit are purged and become non-readable and
      non-writable memory again
        * Placed into a queue for reuse in FIFO order to maximize the time
          spent memory protected
        * Randomized array is used to add a random delay for reuse
* Fine-grained randomization within memory regions
    * Randomly sized guard regions for large allocations
    * Random slot selection within slabs
    * [in-progress] Randomized delayed free for slab allocations
    * [in-progress] Randomized allocation of slabs
    * [more randomization coming as the implementation is matured]
* Slab allocations are zeroed on free
* Large allocations are purged and memory protected on free with the memory
  mapping kept reserved in a quarantine to detect use-after-free
    * The quarantine is primarily based on a FIFO ring buffer, with the oldest
      mapping in the quarantine being unmapped to make room for the most
      recently freed mapping
    * Another layer of the quarantine swaps with a random slot in an array to
      randomize the number of large deallocations required to push mappings out
      of the quarantine
* Detection of write-after-free by verifying zero filling is intact
* Memory in fresh allocations is consistently zeroed due to it either being
  fresh pages or zeroed on free after previous usage
* [in-progress] Delayed free via a combination of FIFO and randomization for
  slab allocations
* Random canaries placed after each slab allocation to *absorb*
  and then later detect overflows/underflows
    * High entropy per-slab random values
    * Leading byte is zeroed to contain C string overflows
    * [in-progress] Mangled into a unique value per slab slot (although not
      with a strong keyed hash due to performance limitations)
* Possible slab locations are skipped and remain memory protected, leaving slab
  size class regions interspersed with guard pages
* Zero size allocations are memory protected
* Protected allocator state (including all metadata)
    * Address space for state is entirely reserved during initialization and
      never reused for allocations or anything else
    * State within global variables is entirely read-only after initialization
      with pointers to the isolated allocator state so leaking the address of
      the library doesn't leak the address of writable state
    * Allocator state is located within a dedicated region with high entropy
      randomly sized guard regions around it
    * Protection via Memory Protection Keys (MPK) on x86\_64
    * [future] Protection via MTE on ARMv8.5+
* Extension for retrieving the size of allocations with fallback
  to a sentinel for pointers not managed by the allocator
    * Can also return accurate values for pointers *within* small allocations
    * The same applies to pointers within the first page of large allocations,
      otherwise it currently has to return a sentinel
* No alignment tricks interfering with ASLR like jemalloc, PartitionAlloc, etc.
* No usage of the legacy brk heap
* Aggressive sanity checks
    * Errors other than ENOMEM from mmap, munmap, mprotect and mremap treated
      as fatal, which can help to detect memory management gone wrong elsewhere
      in the process.
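
The two-layer quarantine for large allocations listed above can be sketched as
follows. This is an assumption-laden illustration: the struct, the tiny
`RANDOM_SIZE` and `QUEUE_SIZE` values and the order of the two layers are
guesses made for readability, and the caller is expected to supply a uniformly
random index from a CSPRNG.

```c
#include <assert.h>
#include <stddef.h>

#define RANDOM_SIZE 4 /* compare REGION_QUARANTINE_RANDOM_SIZE */
#define QUEUE_SIZE 8  /* compare REGION_QUARANTINE_QUEUE_SIZE */

struct quarantine {
    void *random[RANDOM_SIZE]; /* randomized layer */
    void *queue[QUEUE_SIZE];   /* FIFO ring buffer layer */
    size_t queue_index;        /* position of the oldest queue entry */
};

/* Quarantine the freed mapping p. Returns the mapping that was pushed out
 * and should now be unmapped by the caller, or NULL if nothing left the
 * quarantine. random_index must be uniform in [0, RANDOM_SIZE). */
void *quarantine_push(struct quarantine *q, void *p, size_t random_index) {
    assert(random_index < RANDOM_SIZE);

    /* Layer 1: swap with a random slot, so the number of frees needed to
     * push a given mapping onward is unpredictable. */
    void *swapped = q->random[random_index];
    q->random[random_index] = p;
    if (swapped == NULL) {
        return NULL;
    }

    /* Layer 2: FIFO ring buffer; the oldest entry makes room. */
    void *oldest = q->queue[q->queue_index];
    q->queue[q->queue_index] = swapped;
    q->queue_index = (q->queue_index + 1) % QUEUE_SIZE;
    return oldest;
}
```

While a mapping sits in either layer it stays reserved and memory protected, so
a use-after-free faults instead of touching a newer allocation.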

# Randomness

The current implementation of random number generation for randomization-based
mitigations is based on generating a keystream from a stream cipher (ChaCha8)
in small chunks. A separate CSPRNG is used for each small size class, large
allocations, etc. in order to fit into the existing fine-grained locking model
without needing to waste memory per thread by having the CSPRNG state in Thread
Local Storage. Similarly, it's protected via the same approach taken for the
rest of the metadata. The stream cipher is regularly reseeded from the OS to
provide backtracking and prediction resistance with a negligible cost. The
reseed interval simply needs to be adjusted to the point that it stops
registering as having any significant performance impact. The performance
impact on recent Linux kernels is primarily from the high cost of system calls
and locking since the kernel's CSPRNG is quite efficient (ChaCha20), especially
for just generating the key and nonce for another stream cipher (ChaCha8).

ChaCha8 is a great fit because it's extremely fast across platforms without
relying on hardware support or complex platform-specific code. The security
margins of ChaCha20 would be completely overkill for the use case. Using
ChaCha8 avoids needing to resort to a non-cryptographically secure PRNG or
something without a lot of scrutiny. The current implementation is simply the
reference implementation of ChaCha8 converted into a pure keystream by ripping
out the XOR of the message into the keystream.

The random range generation functions are a highly optimized implementation
too. Traditional uniform random number generation within a range is very high
overhead and can easily dwarf the cost of an efficient CSPRNG.
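
The text above does not say which technique is used. One widely known way to
avoid the division in the common case is Lemire's multiply-and-reject method,
sketched here for 16-bit values. The function names and the callback-based
random source are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Map one uniformly random 16-bit value into [0, bound). Returns -1 when the
 * sample has to be rejected and a fresh random value drawn, which keeps the
 * accepted results exactly uniform. */
int32_t map_u16_to_range(uint16_t random, uint16_t bound) {
    assert(bound != 0);
    uint32_t product = (uint32_t)random * bound;
    uint16_t low = (uint16_t)product;
    if (low < bound) {
        /* Slow path, rarely taken for small bounds: the modulo is needed
         * only here. */
        uint16_t threshold = (uint16_t)(65536u % bound);
        if (low < threshold) {
            return -1;
        }
    }
    return (int32_t)(product >> 16);
}

/* Draw from next_u16 (any source of uniform 16-bit values, for example a
 * CSPRNG keystream) until a sample is accepted. */
uint16_t random_u16_uniform(uint16_t (*next_u16)(void *), void *state,
                            uint16_t bound) {
    int32_t value;
    do {
        value = map_u16_to_range(next_u16(state), bound);
    } while (value < 0);
    return (uint16_t)value;
}
```

Compared with the traditional `random % bound` plus rejection loop, the common
case here costs one multiplication and one comparison.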

# Size classes

The zero byte size class is a special case of the smallest regular size class. It's allocated in a
separate region with the memory left non-readable and non-writable.
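
As a rough sketch of the idea, a region mapped with `PROT_NONE` can hand out
unique addresses that fault on any access. The struct, the function names and
the 16 byte spacing (borrowed from the smallest regular size class) are
assumptions for this example only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc and musl */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define ZERO_SIZE_SLOT 16 /* assumed spacing between zero size allocations */

struct zero_size_region {
    unsigned char *base;
    size_t size;
    size_t next; /* offset of the next unused slot */
};

/* Reserve address space that is neither readable nor writable.
 * Returns 0 on success, -1 on failure. */
int zero_size_region_init(struct zero_size_region *r, size_t size) {
    assert(size != 0);
    void *p = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    r->base = p;
    r->size = size;
    r->next = 0;
    return 0;
}

/* Hand out a unique pointer. Dereferencing it faults because the pages
 * are never made accessible. Returns NULL when the region is exhausted. */
void *alloc_zero_size(struct zero_size_region *r) {
    if (r->size - r->next < ZERO_SIZE_SLOT) {
        return NULL;
    }
    void *p = r->base + r->next;
    r->next += ZERO_SIZE_SLOT;
    return p;
}
```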

The slab slot count for each size class is not yet finely tuned beyond choosing values avoiding
internal fragmentation for slabs (i.e. avoiding wasted space due to page size rounding).

The choice of size classes is the same as jemalloc, but with a much different approach to the
slabs containing them:

> size classes are multiples of the quantum [16], spaced such that there are four size classes for
> each doubling in size, which limits internal fragmentation to approximately 20% for all but the
> smallest size classes
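
The spacing rule in the quote can be expressed as a small rounding function.
This is a sketch with a hypothetical name (`size_class_for`). The 16 byte
quantum and the 16384 byte maximum are taken from the table below.

```c
#include <assert.h>
#include <stddef.h>

#define QUANTUM 16
#define MAX_SMALL_SIZE 16384

/* Round a request up to its size class. Returns 0 when the request is too
 * large for a slab allocation. */
size_t size_class_for(size_t size) {
    assert(size != 0); /* the zero byte class is handled separately */
    if (size > MAX_SMALL_SIZE) {
        return 0;
    }
    if (size <= 128) {
        /* Up to 128 bytes, classes are plain multiples of the quantum. */
        return (size + QUANTUM - 1) & ~(size_t)(QUANTUM - 1);
    }
    /* Find the doubling (base, 2 * base] that contains the request. */
    size_t base = 128;
    while (base * 2 < size) {
        base *= 2;
    }
    /* Four classes per doubling means a spacing of base / 4. */
    size_t spacing = base / 4;
    return (size + spacing - 1) / spacing * spacing;
}
```

For example, a 129 byte request falls into the 160 byte class and a 257 byte
request into the 320 byte class.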

| size class | worst case internal fragmentation | slab slots | slab size | worst case internal fragmentation for slabs |
| - | - | - | - | - |
| 16 | 100% | 256 | 4096 | 0.0% |
| 32 | 46.875% | 128 | 4096 | 0.0% |
| 48 | 31.25% | 85 | 4096 | 0.390625% |
| 64 | 23.4375% | 64 | 4096 | 0.0% |
| 80 | 18.75% | 51 | 4096 | 0.390625% |
| 96 | 15.625% | 42 | 4096 | 1.5625% |
| 112 | 13.392857142857139% | 36 | 4096 | 1.5625% |
| 128 | 11.71875% | 64 | 8192 | 0.0% |
| 160 | 19.375% | 51 | 8192 | 0.390625% |
| 192 | 16.145833333333343% | 64 | 12288 | 0.0% |
| 224 | 13.839285714285708% | 54 | 12288 | 1.5625% |
| 256 | 12.109375% | 64 | 16384 | 0.0% |
| 320 | 19.6875% | 64 | 20480 | 0.0% |
| 384 | 16.40625% | 64 | 24576 | 0.0% |
| 448 | 14.0625% | 64 | 28672 | 0.0% |
| 512 | 12.3046875% | 64 | 32768 | 0.0% |
| 640 | 19.84375% | 64 | 40960 | 0.0% |
| 768 | 16.536458333333343% | 64 | 49152 | 0.0% |
| 896 | 14.174107142857139% | 64 | 57344 | 0.0% |
| 1024 | 12.40234375% | 64 | 65536 | 0.0% |
| 1280 | 19.921875% | 16 | 20480 | 0.0% |
| 1536 | 16.6015625% | 16 | 24576 | 0.0% |
| 1792 | 14.229910714285708% | 16 | 28672 | 0.0% |
| 2048 | 12.451171875% | 16 | 32768 | 0.0% |
| 2560 | 19.9609375% | 8 | 20480 | 0.0% |
| 3072 | 16.634114583333343% | 8 | 24576 | 0.0% |
| 3584 | 14.2578125% | 8 | 28672 | 0.0% |
| 4096 | 12.4755859375% | 8 | 32768 | 0.0% |
| 5120 | 19.98046875% | 8 | 40960 | 0.0% |
| 6144 | 16.650390625% | 8 | 49152 | 0.0% |
| 7168 | 14.271763392857139% | 8 | 57344 | 0.0% |
| 8192 | 12.48779296875% | 8 | 65536 | 0.0% |
| 10240 | 19.990234375% | 6 | 61440 | 0.0% |
| 12288 | 16.658528645833343% | 5 | 61440 | 0.0% |
| 14336 | 14.278738839285708% | 4 | 57344 | 0.0% |
| 16384 | 12.493896484375% | 4 | 65536 | 0.0% |
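
The two fragmentation columns can be recomputed with the arithmetic below. The
function names are hypothetical, and the first formula assumes the worst case
is a request one byte larger than the previous size class. It reproduces, for
example, the 32, 48 and 160 byte rows.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage of a slot wasted by the smallest request that still rounds up
 * to size_class, i.e. one byte more than the previous class. */
double worst_case_fragmentation(size_t prev_class, size_t size_class) {
    assert(prev_class < size_class);
    return 100.0 * (double)(size_class - (prev_class + 1)) / (double)size_class;
}

/* Percentage of a slab left unused after packing `slots` objects into it. */
double slab_fragmentation(size_t size_class, size_t slots, size_t slab_size) {
    assert(size_class * slots <= slab_size);
    return 100.0 * (double)(slab_size - size_class * slots) / (double)slab_size;
}
```

For the 48 byte class, 85 slots occupy 4080 of the 4096 bytes in the slab,
leaving 16 bytes or 0.390625% unused.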
expand documentation on supported environments 2018-09-02 06:05:37 -04:00			`This project currently aims to support Android, musl and glibc. It may support`
			`other non-Linux operating systems in the future. For Android and musl, there`
			`will be custom integration and other hardening features. The glibc support will`
			`be limited to replacing the malloc implementation because musl is a much more`
			`robust and cleaner base to build on and can cover the same use cases.`

document minimum versions for dependencies 2018-08-24 18:58:55 -04:00			`Debian stable determines the most ancient set of supported dependencies:`

			`* glibc 2.24`
			`* Linux 4.9`
			`* Clang 3.8 or GCC 6.3`

			`However, using more recent releases is highly recommended. Older versions of`
			`the dependencies may be compatible at the moment but are not tested and will`
			`explicitly not be supported.`
add some basic initial documentation to the README 2018-08-26 06:23:24 -04:00
document minimum musl version for external use 2018-09-06 19:52:09 -04:00			`For external malloc replacement with musl, musl 1.1.20 is required. However,`
			`there will be custom integration offering better performance in the future`
			`along with other hardening for the C standard library implementation.`

expand documentation on supported environments 2018-09-02 06:05:37 -04:00			`Major releases of Android will be supported until tags stop being pushed to`
			`the Android Open Source Project (AOSP). Google supports each major release`
			`with security patches for 3 years, but tagged releases of the Android Open`
			`Source Project are more than just security patches and are no longer pushed`
			`once no officially supported devices are using them anymore. For example, at`
			`the time of writing (September 2018), AOSP only has tagged releases for 8.1`
			`(Nexus 5X, Nexus 5X, Pixel C) and 9.0 (Pixel, Pixel XL, Pixel 2, Pixel 2 XL).`
			`There are ongoing security patches for 6.0, 6.0.1, 7.0, 7.1.1, 7.1.2, 8.0, 8.1`
			`and 9.0 but only the active AOSP branches (8.1 and 9.0) are supported by this`
			`project and it doesn't make much sense to use much older releases with far`
			`less privacy and security hardening.`
expand documentation in the README 2018-08-30 11:07:20 -04:00
add documentation on testing 2018-10-04 03:27:30 -04:00			`# Testing`

			The `preload.sh` script can be used for testing with dynamically linked
			`executables using glibc or musl:`

			`./preload.sh krita --new-image RGBA,U8,500,500`

document tuning vm.max_map_count 2018-10-04 03:44:19 -04:00			It can be necessary to substantially increase the `vm.max_map_count` sysctl to
			`accomodate the large number of mappings caused by guard slabs and large`
			allocation guard regions. There will be a configuration option in `config.h`
			`for tuning the proportion of slabs to guard slabs too, since the default 1:1`
			`proportion makes the address space quite sparse.`

add documentation on testing 2018-10-04 03:27:30 -04:00			`It can offer slightly better performance when integrated into the C standard`
			`library and there are other opportunities for similar hardening within C`
			`standard library and dynamic linker implementations. For example, a library`
			`region can be implemented to offer similar isolation for dynamic libraries as`
			`this allocator offers across different size classes. The intention is that this`
			`will be offered as part of hardened variants of the Bionic and musl C standard`
			`libraries.`

add basic documentation on configuration 2018-10-04 03:15:55 -04:00			`# Configuration`

support replacing C++ new/delete implementation This adds support for sanity checks based on sized deallocation and will reduce the overhead of calls through the C++ allocator. 2018-09-19 13:57:35 -04:00			`You can set some configuration options at compile-time via arguments to the`
			`make command as follows:`

			`make CONFIG_EXAMPLE=false`

			`The available configuration options are the following:`

			* `CONFIG_CXX_ALLOCATOR`: `true` (default) or `false` to control whether the
explain the purpose of CONFIG_CXX_ALLOCATOR 2018-10-18 14:57:05 -04:00			`C++ allocator is replaced for slightly improved performance and detection of`
			`mismatched sizes for sized deallocation (often type confusion bugs). This`
			`will result in linking against the C++ standard library.`
guard metadata with Memory Protection Keys (MPK) 2018-10-19 21:29:40 -04:00			* `CONFIG_SEAL_METADATA`: `true` or `false` (default) to control whether Memory
			`Protection Keys are used to disable access to all writable allocator state`
			`outside of the memory allocator code. It's currently disabled by default due`
document relatively high performance cost of MPK 2018-10-23 19:38:00 -04:00			`to being extremely experimental and a significant performance cost for this`
			`use case on current generation hardware, which may become drastically lower`
some clarifications for the README 2018-10-28 20:28:10 -04:00			`in the future. Whether or not this feature is enabled, the metadata is all`
			`contained within an isolated memory region with high entropy random guard`
			`regions around it.`

			Additional compile-time configuration is available in the `config.h` file for
			`controlling the balance between security and performance / memory usage. By`
			`default, all the optional security features are enabled. Options are only`
			`provided for the features with a significant performance or memory usage cost.`
add basic documentation on configuration 2018-10-04 03:15:55 -04:00
			```
			`#define WRITE_AFTER_FREE_CHECK true`
			`#define SLOT_RANDOMIZE true`
			`#define ZERO_ON_FREE true`
			`#define SLAB_CANARY true`
update configuration in README 2018-10-06 15:37:50 -04:00			`#define GUARD_SLABS_INTERVAL 1`
			`#define GUARD_SIZE_DIVISOR 2`
update configuration documentation again 2018-10-18 20:18:55 -04:00			`#define REGION_QUARANTINE_RANDOM_SIZE 128`
			`#define REGION_QUARANTINE_QUEUE_SIZE 1024`
add quarantine for large allocations 2018-10-08 15:50:31 -04:00			`#define REGION_QUARANTINE_SKIP_THRESHOLD (32 * 1024 * 1024)`
update configuration documentation again 2018-10-18 20:18:55 -04:00			`#define FREE_SLABS_QUARANTINE_RANDOM_SIZE 32`
add basic documentation on configuration 2018-10-04 03:15:55 -04:00			```

			`There will be more control over enabled features in the future along with`
			`control over fairly arbitrarily chosen values like the size of empty slab`
			`caches (making them smaller improves security), the maximum size of guard`
			`regions for large allocations and the proportion of slabs to guard slabs.`

add some basic initial documentation to the README 2018-08-26 06:23:24 -04:00			`# Basic design`

			`The current design is very simple and will become a bit more sophisticated as`
			`the basic features are completed and the implementation is hardened and`
			`optimized. The allocator is exclusive to 64-bit platforms in order to take full`
			`advantage of the abundant address space without being constrained by needing to`
			`keep the design compatible with 32-bit.`

			`Small allocations are always located in a large memory region reserved for slab`
			`allocations. It can be determined that an allocation is one of the small size`
			`classes from the address range. Each small size class has a separate reserved`
			`region within the larger region, and the size of a small allocation can simply`
			`be determined from the range. Each small size class has a separate out-of-line`
			`metadata array outside of the overall allocation region, with the index of the`
			`metadata struct within the array mapping to the index of the slab within the`
			`dedicated size class region. Slabs are a multiple of the page size and are`
fix another typo in the README 2018-10-04 04:12:21 -04:00			`page aligned. The entire small size class region starts out memory protected`
add some basic initial documentation to the README 2018-08-26 06:23:24 -04:00			`and becomes readable / writable as it gets allocated, with idle slabs beyond`
			`the cache limit having their pages dropped and the memory protected again.`

			`Large allocations are tracked via a global hash table mapping their address to`
			`their size and guard size. They're simply memory mappings and get mapped on`
			`allocation and then unmapped on free.`

explain the production-oriented design rationale 2018-10-15 04:04:51 -04:00			`This allocator is aimed at production usage, not aiding with finding and fixing`
			`memory corruption bugs for software development. It does find many latent bugs`
			`but won't include features like the option of generating and storing stack`
			`traces for each allocation to include the allocation site in related error`
			`messages. The design choices are based around minimizing overhead and`
			`maximizing security which often leads to different decisions than a tool`
			`attempting to find bugs. For example, it uses zero-based sanitization on free`
			`and doesn't minimize slack space from size class rounding between the end of an`
			`allocation and the canary / guard region. Zero-based filling has the least`
			`chance of uncovering latent bugs, but also the best chance of mitigating`
			`vulnerabilities. The canary feature is primarily meant to act as padding`
			`absorbing small overflows to render them harmless, so slack space is helpful`
			`rather than harmful despite not detecting the corruption on free. The canary`
			`needs detection on free in order to have any hope of stopping other kinds of`
			`issues like a sequential overflow, which is why it's included. It's assumed`
			`that an attacker can figure out the allocator is in use so the focus is`
			`explicitly not on detecting bugs that are impossible to exploit with it in use`
			`like an 8 byte overflow. The design choices would be different if performance`
			`was a bit less important and if a core goal was finding latent bugs.`

add some basic initial documentation to the README 2018-08-26 06:23:24 -04:00			`# Security properties`

			`* Fully out-of-line metadata`
			`* Deterministic detection of any invalid free (unallocated, unaligned, etc.)`
document sized deallocation validation 2018-10-10 18:48:45 -04:00			* Validation of the size passed for C++14 sized deallocation by `delete`
			`(detects type confusion if the size is different) and various containers`
			`using the allocator API directly`
add some basic initial documentation to the README 2018-08-26 06:23:24 -04:00			`* Isolated memory region for slab allocations`
expand documentation in the README 2018-08-30 11:07:20 -04:00			`* Divided up into isolated inner regions for each size class`
			`* High entropy random base for each size class region`
			`* No deterministic / low entropy offsets between allocations with`
			`different size classes`
add missing word 2018-08-26 07:11:22 -04:00			`* Metadata is completely outside the slab allocation region`
expand documentation in the README 2018-08-30 11:07:20 -04:00			`* No references to metadata within the slab allocation region`
			`* No deterministic / low entropy offsets to metadata`
			`* Entire slab region starts out non-readable and non-writable`
			`* Slabs beyond the cache limit are purged and become non-readable and`
			`non-writable memory again`
document free slabs quarantine 2018-10-14 22:19:10 -04:00			`* Placed into a queue for reuse in FIFO order to maximize the time`
			`spent memory protected`
			`* Randomized array is used to add a random delay for reuse`
add some basic initial documentation to the README 2018-08-26 06:23:24 -04:00			`* Fine-grained randomization within memory regions`
			`* Randomly sized guard regions for large allocations`
			`* Random slot selection within slabs`
			`* [in-progress] Randomized delayed free for slab allocations`
			`* [in-progress] Randomized allocation of slabs`
			`* [more randomization coming as the implementation is matured]`
add quarantine for large allocations 2018-10-08 15:50:31 -04:00			`* Slab allocations are zeroed on free`
			`* Large allocations are purged and memory protected on free with the memory`
			`mapping kept reserved in a quarantine to detect use-after-free`
document the randomized array in the quarantine 2018-10-12 15:10:35 -04:00			`* The quarantine is primarily based on a FIFO ring buffer, with the oldest`
			`mapping in the quarantine being unmapped to make room for the most`
			`recently freed mapping`
			`* Another layer of the quarantine swaps with a random slot in an array to`
			`randomize the number of large deallocations required to push mappings out`
			`of the quarantine`
aggressive write-after-free check is implemented 2018-09-11 05:17:22 -04:00			`* Detection of write-after-free by verifying zero filling is intact`
clarify memory is zero on init without more writes 2018-08-27 01:14:15 -04:00			`* Memory in fresh allocations is consistently zeroed due to it either being`
			`fresh pages or zeroed on free after previous usage`
add some basic initial documentation to the README 2018-08-26 06:23:24 -04:00			`* [in-progress] Delayed free via a combination of FIFO and randomization for`
			`slab allocations`
document that slab canaries are implemented 2018-09-05 00:19:27 -04:00			`* Random canaries placed after each slab allocation to absorb`
add some basic initial documentation to the README 2018-08-26 06:23:24 -04:00			`and then later detect overflows/underflows`
			`* High entropy per-slab random values`
zero leading byte of canaries 2018-10-03 17:09:57 -04:00			`* Leading byte is zeroed to contain C string overflows`
document that slab canaries are implemented 2018-09-05 00:19:27 -04:00			`* [in-progress] Mangled into a unique value per slab slot (although not`
			`with a strong keyed hash due to performance limitations)`
add initial guard slabs implementation 2018-09-06 18:53:06 -04:00			`* Possible slab locations are skipped and remain memory protected, leaving slab`
			`size class regions interspersed with guard pages`
* Zero size allocations are memory protected
* Protected allocator state (including all metadata)
    * Address space for state is entirely reserved during initialization and
      never reused for allocations or anything else
    * State within global variables is entirely read-only after initialization
      with pointers to the isolated allocator state so leaking the address of
      the library doesn't leak the address of writable state
    * Allocator state is located within a dedicated region with high entropy
      randomly sized guard regions around it
    * Protection via Memory Protection Keys (MPK) on x86\_64
    * [future] Protection via MTE on ARMv8.5+
* Extension for retrieving the size of allocations with fallback
  to a sentinel for pointers not managed by the allocator
    * Can also return accurate values for pointers within small allocations
    * The same applies to pointers within the first page of large allocations,
      otherwise it currently has to return a sentinel
* No alignment tricks interfering with ASLR like jemalloc, PartitionAlloc, etc.
* No usage of the legacy brk heap
* Aggressive sanity checks
    * Errors other than ENOMEM from mmap, munmap, mprotect and mremap treated
      as fatal, which can help to detect memory management gone wrong elsewhere
      in the process.
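
The policy of treating errors other than ENOMEM as fatal can be sketched
roughly as follows. This is a minimal illustration rather than the project's
code: the wrapper names `checked_map`, `checked_protect` and `checked_unmap`
and the use of `abort` as the fatal handler are assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

// Hypothetical wrapper: reserve inaccessible address space.
// Returns NULL only for ENOMEM; any other error aborts the process.
static void *checked_map(size_t size) {
    assert(size != 0);
    void *p = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED) {
        if (errno != ENOMEM) {
            abort(); // e.g. EINVAL points at a bug, not at memory pressure
        }
        return NULL;
    }
    return p;
}

// Hypothetical wrapper: returns 0 on success, 1 on ENOMEM, aborts otherwise.
static int checked_protect(void *p, size_t size, int prot) {
    if (mprotect(p, size, prot) != 0) {
        if (errno != ENOMEM) {
            abort();
        }
        return 1;
    }
    return 0;
}

// Hypothetical wrapper: returns 0 on success, 1 on ENOMEM, aborts otherwise.
// munmap can fail with ENOMEM when splitting a mapping would exceed the
// kernel's limit on the number of mappings.
static int checked_unmap(void *p, size_t size) {
    if (munmap(p, size) != 0) {
        if (errno != ENOMEM) {
            abort();
        }
        return 1;
    }
    return 0;
}
```

With wrappers like these, callers only need to handle the out-of-memory case.
An unexpected error such as EINVAL from a corrupted pointer or size stops the
process right away.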

# Randomness

The current implementation of random number generation for randomization-based
mitigations is based on generating a keystream from a stream cipher (ChaCha8)
in small chunks. A separate CSPRNG is used for each small size class, large
allocations, etc. in order to fit into the existing fine-grained locking model
without needing to waste memory per thread by having the CSPRNG state in Thread
Local Storage. The CSPRNG state is protected via the same approach taken for
the rest of the metadata. The stream cipher is regularly reseeded from the OS
to provide backtracking and prediction resistance with a negligible cost. The
reseed interval simply needs to be adjusted to the point that it stops
registering as having any significant performance impact. The performance
impact on recent Linux kernels is primarily from the high cost of system calls
and locking since the kernel's implementation is quite efficient (ChaCha20),
especially for just generating the key and nonce for another stream cipher
(ChaCha8).
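
The scheme described above can be sketched as a small keystream generator. This
is a minimal illustration under stated assumptions, not the project's code: the
names `rng_state`, `rng_seed` and `rng_bytes`, the cache size and the reseed
interval are all hypothetical, and seeding uses `getrandom` from
`<sys/random.h>`, which needs glibc 2.25 or later. The caller is assumed to
hold the lock of the size class that owns the state.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/random.h>

#define CACHE_SIZE 256        // keystream bytes generated per refill (assumed)
#define RESEED_INTERVAL 65536 // keystream bytes between reseeds (assumed)
static_assert(CACHE_SIZE % 64 == 0, "cache must hold whole ChaCha blocks");

// Hypothetical per-size-class generator state.
struct rng_state {
    uint32_t input[16];        // constants, key, block counter, nonce
    uint8_t cache[CACHE_SIZE]; // buffered keystream
    size_t index;              // next unread byte in the cache
    size_t since_reseed;       // keystream bytes generated since last reseed
};

static uint32_t rotl32(uint32_t v, int n) { return (v << n) | (v >> (32 - n)); }

static void quarter_round(uint32_t *x, int a, int b, int c, int d) {
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);
}

// Emit one 64-byte keystream block (8 rounds) and advance the block counter.
// No message is XORed in: the output is the raw keystream.
static void chacha8_block(uint32_t input[16], uint8_t out[64]) {
    uint32_t x[16];
    memcpy(x, input, sizeof(x));
    for (int round = 0; round < 8; round += 2) {
        quarter_round(x, 0, 4, 8, 12);  // column rounds
        quarter_round(x, 1, 5, 9, 13);
        quarter_round(x, 2, 6, 10, 14);
        quarter_round(x, 3, 7, 11, 15);
        quarter_round(x, 0, 5, 10, 15); // diagonal rounds
        quarter_round(x, 1, 6, 11, 12);
        quarter_round(x, 2, 7, 8, 13);
        quarter_round(x, 3, 4, 9, 14);
    }
    for (int i = 0; i < 16; i++) {
        uint32_t v = x[i] + input[i];
        out[4 * i] = (uint8_t)v;
        out[4 * i + 1] = (uint8_t)(v >> 8);
        out[4 * i + 2] = (uint8_t)(v >> 16);
        out[4 * i + 3] = (uint8_t)(v >> 24);
    }
    if (++input[12] == 0) {
        input[13]++;
    }
}

// Fetch a fresh 32-byte key and 8-byte nonce from the OS.
static void rng_seed(struct rng_state *s) {
    static const uint32_t sigma[4] = {0x61707865, 0x3320646e, 0x79622d32, 0x6b206574};
    uint8_t seed[40];
    if (getrandom(seed, sizeof(seed), 0) != (ssize_t)sizeof(seed)) {
        abort();
    }
    memcpy(s->input, sigma, sizeof(sigma));
    memcpy(&s->input[4], seed, 32);
    s->input[12] = 0;
    s->input[13] = 0;
    memcpy(&s->input[14], seed + 32, 8);
    s->index = CACHE_SIZE; // force a refill on the next request
    s->since_reseed = 0;
}

// Copy len keystream bytes to buf, refilling and reseeding as needed.
static void rng_bytes(struct rng_state *s, void *buf, size_t len) {
    uint8_t *out = buf;
    while (len != 0) {
        if (s->index == CACHE_SIZE) {
            if (s->since_reseed >= RESEED_INTERVAL) {
                rng_seed(s);
            }
            for (size_t off = 0; off < CACHE_SIZE; off += 64) {
                chacha8_block(s->input, s->cache + off);
            }
            s->index = 0;
            s->since_reseed += CACHE_SIZE;
        }
        size_t n = CACHE_SIZE - s->index;
        if (n > len) {
            n = len;
        }
        memcpy(out, s->cache + s->index, n);
        s->index += n;
        out += n;
        len -= n;
    }
}
```

A larger `RESEED_INTERVAL` means fewer system calls, and a smaller one shortens
the window covered by a leaked key. That is the tuning trade-off the paragraph
above refers to.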

ChaCha8 is a great fit because it's extremely fast across platforms without
relying on hardware support or complex platform-specific code. The security
margins of ChaCha20 would be completely overkill for the use case. Using
ChaCha8 avoids needing to resort to a non-cryptographically secure PRNG or
something without a lot of scrutiny. The current implementation is simply the
reference implementation of ChaCha8 converted into a pure keystream by ripping
out the XOR of the message into the keystream.

The random range generation functions are highly optimized too. Traditional
uniform random number generation within a range has very high overhead and can
easily dwarf the cost of an efficient CSPRNG.
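
One well-known way to cut that overhead is the multiply-and-shift method with
rejection described by Daniel Lemire, which avoids a division in the common
case. Whether it matches the implementation referred to above is an assumption.
The sketch below only illustrates the technique for 16-bit values. The names
`u16_source`, `counter_ctx`, `counter_source` and `random_u16_uniform` are
hypothetical, and the counter-based source is there purely to make the
behaviour easy to check.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical source of uniformly distributed 16-bit values (e.g. a CSPRNG).
typedef uint16_t (*u16_source)(void *ctx);

// Deterministic stand-in source for demonstration: yields 0, 1, 2, ...
struct counter_ctx {
    uint32_t next;
};

static uint16_t counter_source(void *ctx) {
    struct counter_ctx *c = ctx;
    return (uint16_t)c->next++;
}

// Return a uniform value in [0, bound) without modulo bias.
static uint16_t random_u16_uniform(u16_source next, void *ctx, uint16_t bound) {
    assert(bound != 0);
    // Map the 16-bit input onto [0, bound) via the high half of the product.
    uint32_t product = (uint32_t)next(ctx) * bound;
    uint16_t leftover = (uint16_t)product;
    if (leftover < bound) {
        // Slow path, rarely taken: reject the 2^16 mod bound inputs that
        // would make some outputs more likely than others.
        uint16_t threshold = (uint16_t)(65536u % bound);
        while (leftover < threshold) {
            product = (uint32_t)next(ctx) * bound;
            leftover = (uint16_t)product;
        }
    }
    return (uint16_t)(product >> 16);
}
```

For a bound of 6, four of the 65536 possible inputs are rejected and each of
the six outputs is produced by exactly 10922 of the remaining inputs. The
division only runs on the slow path.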

# Size classes

The zero byte size class is a special case of the smallest regular size class. It's allocated in a
separate region with the memory left non-readable and non-writable.
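
The effect can be illustrated with a simplified sketch. It uses a bump
allocator over a `PROT_NONE` mapping instead of the slab machinery, is not
thread-safe and never frees. The names `alloc_zero_size` and `is_zero_size`,
the region size and the slot spacing are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define ZERO_REGION_SIZE ((size_t)1 << 20) // assumed size of the reserved region
#define ZERO_SLOT_SIZE 16                  // spacing that keeps each pointer unique
static_assert(ZERO_REGION_SIZE % ZERO_SLOT_SIZE == 0, "region must hold whole slots");

static char *zero_region; // PROT_NONE mapping that is never made accessible
static size_t zero_used;  // bytes of the region handed out so far

// Hypothetical allocator for malloc(0): every result is a unique address
// inside memory that faults on any read or write.
// Returns NULL when the region is exhausted or cannot be mapped.
static void *alloc_zero_size(void) {
    if (zero_region == NULL) {
        void *p = mmap(NULL, ZERO_REGION_SIZE, PROT_NONE,
                       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED) {
            return NULL;
        }
        zero_region = p;
    }
    if (zero_used == ZERO_REGION_SIZE) {
        return NULL;
    }
    void *slot = zero_region + zero_used;
    zero_used += ZERO_SLOT_SIZE;
    return slot;
}

// Report whether a pointer came from the zero size region.
static int is_zero_size(const void *p) {
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)zero_region;
    return zero_region != NULL && addr >= base && addr - base < ZERO_REGION_SIZE;
}
```

Because the region is never given read or write permission, dereferencing a
pointer returned for a zero byte request faults instead of touching another
allocation.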

The slab slot count for each size class is not yet finely tuned beyond choosing values avoiding
internal fragmentation for slabs (i.e. avoiding wasted space due to page size rounding).

The choice of size classes is the same as jemalloc's, but with a much different approach to the
slabs containing them:

> size classes are multiples of the quantum [16], spaced such that there are four size classes for
> each doubling in size, which limits internal fragmentation to approximately 20% for all but the
> smallest size classes

| size class | worst case internal fragmentation | slab slots | slab size | worst case internal fragmentation for slabs |
| - | - | - | - | - |
| 16 | 100% | 256 | 4096 | 0.0% |
| 32 | 46.875% | 128 | 4096 | 0.0% |
| 48 | 31.25% | 85 | 4096 | 0.390625% |
| 64 | 23.4375% | 64 | 4096 | 0.0% |
| 80 | 18.75% | 51 | 4096 | 0.390625% |
| 96 | 15.625% | 42 | 4096 | 1.5625% |
| 112 | 13.392857142857139% | 36 | 4096 | 1.5625% |
| 128 | 11.71875% | 64 | 8192 | 0.0% |
| 160 | 19.375% | 51 | 8192 | 0.390625% |
| 192 | 16.145833333333343% | 64 | 12288 | 0.0% |
| 224 | 13.839285714285708% | 54 | 12288 | 1.5625% |
| 256 | 12.109375% | 64 | 16384 | 0.0% |
| 320 | 19.6875% | 64 | 20480 | 0.0% |
| 384 | 16.40625% | 64 | 24576 | 0.0% |
| 448 | 14.0625% | 64 | 28672 | 0.0% |
| 512 | 12.3046875% | 64 | 32768 | 0.0% |
| 640 | 19.84375% | 64 | 40960 | 0.0% |
| 768 | 16.536458333333343% | 64 | 49152 | 0.0% |
| 896 | 14.174107142857139% | 64 | 57344 | 0.0% |
| 1024 | 12.40234375% | 64 | 65536 | 0.0% |
| 1280 | 19.921875% | 16 | 20480 | 0.0% |
| 1536 | 16.6015625% | 16 | 24576 | 0.0% |
| 1792 | 14.229910714285708% | 16 | 28672 | 0.0% |
| 2048 | 12.451171875% | 16 | 32768 | 0.0% |
| 2560 | 19.9609375% | 8 | 20480 | 0.0% |
| 3072 | 16.634114583333343% | 8 | 24576 | 0.0% |
| 3584 | 14.2578125% | 8 | 28672 | 0.0% |
| 4096 | 12.4755859375% | 8 | 32768 | 0.0% |
| 5120 | 19.98046875% | 8 | 40960 | 0.0% |
| 6144 | 16.650390625% | 8 | 49152 | 0.0% |
| 7168 | 14.271763392857139% | 8 | 57344 | 0.0% |
| 8192 | 12.48779296875% | 8 | 65536 | 0.0% |
| 10240 | 19.990234375% | 6 | 61440 | 0.0% |
| 12288 | 16.658528645833343% | 5 | 61440 | 0.0% |
| 14336 | 14.278738839285708% | 4 | 57344 | 0.0% |
| 16384 | 12.493896484375% | 4 | 65536 | 0.0% |
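
The columns of the table can be reproduced with a few lines of arithmetic. The
sketch below is only an illustration: the helper names are hypothetical, the
4096 byte page size is an assumption that matches the slab sizes listed, and
the spacing rule is inferred from the quoted description (multiples of 16 up
to 128, then four classes per doubling).

```c
#include <assert.h>
#include <stddef.h>

#define SLAB_PAGE_SIZE 4096  // assumed page size, matching the table
#define MAX_SIZE_CLASS 16384 // largest size class listed in the table

// Round a request of 1..16384 bytes up to its size class.
static size_t size_class_for(size_t size) {
    assert(size >= 1 && size <= MAX_SIZE_CLASS);
    // Classes in (4 * spacing, 8 * spacing] are multiples of spacing,
    // which gives four classes per doubling above 128 bytes.
    size_t spacing = 16;
    while (spacing * 8 < size) {
        spacing *= 2;
    }
    return (size + spacing - 1) & ~(spacing - 1);
}

// Worst case internal fragmentation in percent: the wasted share of a slot
// for a request one byte larger than the previous size class.
static double worst_case_fragmentation(size_t class_size, size_t prev_class_size) {
    return 100.0 * (double)(class_size - (prev_class_size + 1)) / (double)class_size;
}

// Slab size in bytes: the slots rounded up to whole pages.
static size_t slab_size(size_t class_size, size_t slots) {
    size_t bytes = class_size * slots;
    return (bytes + SLAB_PAGE_SIZE - 1) & ~(size_t)(SLAB_PAGE_SIZE - 1);
}

// Share of a slab, in percent, lost to page size rounding.
static double slab_fragmentation(size_t class_size, size_t slots) {
    size_t total = slab_size(class_size, slots);
    return 100.0 * (double)(total - class_size * slots) / (double)total;
}
```

For example, `size_class_for(129)` gives 160, `worst_case_fragmentation(160, 128)`
gives 19.375 and `slab_fragmentation(48, 85)` gives 0.390625, matching the
corresponding rows. The 100% figure in the 16 byte row does not follow from
this formula (it would give 93.75%). It presumably accounts for zero byte
requests, which this sketch does not model.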