13 commits

Author SHA1 Message Date
Daniel Micay
a999a00c88 split metal and mdraid server types 2025-11-06 11:59:13 -05:00
Daniel Micay
e57096dfec disable TCP Fast Open on BuyVM for now 2025-09-30 16:56:21 -04:00
Daniel Micay
47062b9c68 raise wmem_max/rmem_max for non-autotuned buffers
Unbound now requests 4M for the send buffer by default and we might as
well permit that for both the send and receive buffers. We set the max
auto-tuned send buffer size on a per-server basis but don't currently
have much use for tuning the maximum manually specified buffer size
across servers. It can be moved in the future if needed.
2025-09-18 13:56:46 -04:00
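As a rough illustration of the limit described in the commit above, the sketch below renders the two sysctl lines that cap manually requested socket buffers. It assumes "4M" means 4 MiB (4194304 bytes) and that one value is shared by `net.core.rmem_max` and `net.core.wmem_max`; the helper name `render_buffer_limits` is hypothetical and the repository's actual file layout is not shown in this log.

```python
# Assumption: "4M" is read as 4 MiB; the commit does not spell out the unit.
BUFFER_MAX_BYTES = 4 * 1024 * 1024


def render_buffer_limits(limit: int = BUFFER_MAX_BYTES) -> str:
    """Return sysctl.conf-style lines capping manually requested socket buffers.

    These limits apply to buffer sizes set explicitly via SO_RCVBUF/SO_SNDBUF,
    not to the kernel's auto-tuned TCP buffers.
    """
    return (
        f"net.core.rmem_max = {limit}\n"
        f"net.core.wmem_max = {limit}\n"
    )


if __name__ == "__main__":
    print(render_buffer_limits(), end="")
```

The auto-tuned maximum mentioned in the commit is a separate setting and is intentionally left out of this sketch.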
Daniel Micay
b2c15916cc no need to override default qdisc since we set it 2025-09-17 19:23:26 -04:00
Daniel Micay
35ca9a2a19 allow server TCP Fast Open and rotate the keys
This needs to be configured by specific services to have any effect. For
now, we're only enabling it for the PowerDNS Authoritative Server and
dnsdist since it's recommended by RFC 9210 and actively used by various
recursive resolver servers when falling back to TCP. TCP Fast Open is
rarely used from end user devices due to it enabling tracking and having
issues with middleboxes. We aren't going to start using it anywhere in
GrapheneOS but may have more server-side uses for it. This functionality
is built into QUIC without the same downsides but QUIC support in the
software we use is not ready for us to enable it, especially the very
primitive support in nginx.

For most servers, a new random TCP Fast Open key is created on a daily
basis and the previous key continues to be accepted. For DNS servers,
the new key is generated via a keyed hash of the current date in order
to keep it consistent across servers providing an anycast IP without it
needing regular synchronization.
2025-09-15 21:10:39 -04:00
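The date-derived key scheme for DNS servers in the commit above could look roughly like the following sketch. The commit does not state the hash function, the date format, or how the shared secret is stored, so HMAC-SHA256, ISO 8601 UTC dates, and the function names `tfo_key_for` and `tfo_sysctl_value` are all assumptions made for illustration. The output follows the `net.ipv4.tcp_fastopen_key` format of four dash-separated 8-digit hex words, with an optional comma-separated backup key on kernels that support one.

```python
import hashlib
import hmac
from datetime import date, datetime, timedelta, timezone


def tfo_key_for(secret: bytes, day: date) -> str:
    """Derive a 16-byte TCP Fast Open key from a shared secret and a date.

    Assumption: HMAC-SHA256 truncated to 16 bytes; the real scheme may differ.
    """
    digest = hmac.new(secret, day.isoformat().encode("ascii"), hashlib.sha256).digest()
    hexkey = digest[:16].hex()
    # Format as xxxxxxxx-xxxxxxxx-xxxxxxxx-xxxxxxxx for the sysctl.
    return "-".join(hexkey[i:i + 8] for i in range(0, 32, 8))


def tfo_sysctl_value(secret: bytes, today: date) -> str:
    """Return "primary,backup": today's key plus yesterday's as the backup.

    Keeping yesterday's key as backup mirrors "the previous key continues to
    be accepted"; whether the DNS servers do exactly this is an assumption.
    """
    yesterday = today - timedelta(days=1)
    return f"{tfo_key_for(secret, today)},{tfo_key_for(secret, yesterday)}"


if __name__ == "__main__":
    # Hypothetical secret; a real deployment would load it from protected storage.
    shared_secret = b"example-secret-shared-by-anycast-nodes"
    # Use the UTC date so every server computes the same key on the same day.
    utc_today = datetime.now(timezone.utc).date()
    print(f"net.ipv4.tcp_fastopen_key = {tfo_sysctl_value(shared_secret, utc_today)}")
```

Because every server holding the same secret computes the same key for a given UTC date, servers behind one anycast IP can accept each other's Fast Open cookies without exchanging keys.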
Daniel Micay
28106192b1 reduce conntrack TCP established timeout to 1 hour
Nothing we run depends on anywhere close to 1 hour of idle time, so we
could reduce this significantly further.
2025-09-01 19:35:03 -04:00
Daniel Micay
efced81f5f add ordering prefix to relevant configuration 2025-08-16 13:01:44 -04:00
Daniel Micay
b01dfbb947 switch to fq as the default qdisc 2025-08-14 16:57:48 -04:00
Daniel Micay
54d41f25fa switch congestion control back to BBRv1 from CUBIC
BBRv1 provides much better throughput in many cases and is particularly
useful for our update servers. The fairness issues based on round trip
time are not a major issue for us. The fairness issues when competing
with traditional loss-based congestion control are relevant to us but
BBRv1 seems to benefit us more than it hurts us. BBRv3 will fix most of this
while preserving nearly all the benefits and will likely be shipped as a
replacement for BBRv1 in the Linux kernel rather than another option.

The reason we rolled it back last time was seeing cases of the initial
bandwidth estimate being overly low combined with a very bad interaction
with synproxy causing low bandwidth initially. We've partially addressed
the synproxy issue by raising the synproxy threshold based on conntrack
table size which we're now fully scaling based on available memory. If
we decide this is still a significant issue, we can limit using BBRv1 to
our update servers where it has massive benefits and the least downside
due to initial bandwidth not being as important. BBRv3 will help with
this by probing Round Trip Time every 5 seconds instead of 10 seconds
but still has similar issues.
2025-07-23 00:26:41 -04:00
Daniel Micay
d14c4cccc6 use default conntrack UDP stream timeout
This is relevant to zerotier and will be relevant to QUIC once we begin
using it.
2025-07-23 00:26:41 -04:00
Daniel Micay
39b5148808 switch back to CUBIC from BBRv1 and keep ECN off
BBRv1 significantly improves throughput in some cases but it also
significantly reduces it in others. We've run into too many network
conditions it handles quite poorly. There's also a bad interaction
between BBR and synproxy where it will cripple the initial throughput
for connections established via synproxy. This means a basic SYN flood
attack could cripple initial TCP throughput for most connections.

Android doesn't enable ECN for outbound connections yet and we don't
want to deviate from that so it mainly gets activated for macOS and
iOS clients. The Linux kernel's approach to ECN hasn't been modernized and
there are fierce debates about how it should work. It can cause issues
and it seems best to avoid it until Android enables it.
2025-04-25 13:34:33 -04:00
Daniel Micay
e38b248b47 raise RAID resync limit for bare metal servers 2025-04-23 21:10:49 -04:00
Daniel Micay
1f4d7316b8 reorganize configurations into etc directory 2025-04-15 12:53:49 -04:00