BBRv1 significantly improves throughput in some cases, but it also
significantly reduces it in others. We've run into too many network
conditions it handles quite poorly. There's also a bad interaction
between BBR and synproxy which cripples the initial throughput of
connections established via synproxy. This means a basic SYN flood
attack could cripple initial TCP throughput for most connections.

Android doesn't enable ECN for outbound connections yet and we don't
want to deviate from that, so ECN mainly gets activated only for macOS
and iOS clients. The Linux kernel's approach to ECN hasn't been
modernized and there are fierce debates about how it should work. ECN
can cause issues and it seems best to avoid it until Android enables it.
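
For reference, a minimal sketch of how the classic values of the Linux
net.ipv4.tcp_ecn sysctl map onto the outbound vs. inbound distinction
described above. It assumes this sysctl is the setting involved, which
the message doesn't state. The helper names are hypothetical and newer
kernels may define additional values not covered here.

```python
from pathlib import Path

# Classic net.ipv4.tcp_ecn values from the kernel's ip-sysctl docs.
TCP_ECN_MODES = {
    0: "disabled: neither request nor accept ECN",
    1: "request ECN on outgoing connections and accept it on incoming ones",
    2: "accept ECN when an incoming connection requests it, never request it",
}


def describe_tcp_ecn(raw: str) -> str:
    """Translate the raw sysctl text (for example '2') into a description."""
    mode = int(raw.strip())
    return TCP_ECN_MODES.get(mode, f"unknown mode {mode}")


def requests_ecn_outbound(raw: str) -> bool:
    """True only for mode 1, the one mode that asks for ECN on outbound SYNs."""
    return int(raw.strip()) == 1


if __name__ == "__main__":
    # Assumed path: the usual procfs location on Linux; skipped if absent.
    path = Path("/proc/sys/net/ipv4/tcp_ecn")
    if path.exists():
        print(describe_tcp_ecn(path.read_text()))
```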

The default key exchange algorithm was switched from
sntrup761x25519-sha512@openssh.com to mlkem768x25519-sha256 in OpenSSH
10.0. The new default is much faster and also matches the new default
TLS key exchange algorithm in OpenSSL 3.5.0.
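
As a rough illustration of why the order of the algorithm list matters,
here is a simplified sketch of SSH key exchange selection: the first
algorithm in the client's list that the server also supports wins. The
function name and the abbreviated lists are assumptions made for the
example, not actual OpenSSH defaults, and the real negotiation has
additional compatibility conditions that are left out.

```python
from typing import Optional, Sequence


def negotiate_kex(client_prefs: Sequence[str],
                  server_algs: Sequence[str]) -> Optional[str]:
    """Return the first client-preferred algorithm the server supports."""
    supported = set(server_algs)
    for alg in client_prefs:
        if alg in supported:
            return alg
    return None  # no common algorithm, the connection would fail


# Hypothetical, abbreviated lists for illustration only.
client = [
    "mlkem768x25519-sha256",
    "sntrup761x25519-sha512@openssh.com",
]
server = [
    "sntrup761x25519-sha512@openssh.com",
    "mlkem768x25519-sha256",
]

if __name__ == "__main__":
    print(negotiate_kex(client, server))
```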

Using RAID 1 for the ESP is the normal approach used by typical
automatic installs on dedicated servers. systemd discourages it since
they can't know whether out-of-band writes could happen, such as a
Windows install seeing it and mounting it. That's not a problem for us
and we want to do things the normal way instead of using a more
error-prone approach of syncing changes without RAID 1.

CAKE wasn't initially enabled because we were concerned about a
potential bottleneck due to it being single-threaded. We expect the
Ryzen 9950X will be more than powerful enough for CAKE at 25 Gbps.
Compared to fq_codel, it does appear to help substantially with
maintaining high throughput across problematic connections, especially
when combined with BBR. We'll likely switch to BBR for congestion
control across the servers, especially with BBRv3 on the horizon.

This is needed because mjolnir connecting directly to synapse causes it
to disconnect roughly once an hour, likely due to an issue with
keepalive.