32 commits

Author SHA1 Message Date
Daniel Micay
b2cc89768a switch from CAKE to mq fq_codel for update servers
CAKE was causing a bottleneck due to being single threaded.
2025-07-09 15:35:32 -04:00
Daniel Micay
e617cfe441 unbound: enable infra-keep-probing 2025-07-01 14:34:46 -04:00
Daniel Micay
45b8e80e31 switch congestion control back to BBRv1 from CUBIC
BBRv1 provides much better throughput in many cases and is particularly
useful for our update servers. The fairness issues based on round trip
time are not a major issue for us. The fairness issues for competing
with traditional loss-based congestion control are relevant to us, but
BBRv1 seems to benefit us more than it hurts us. BBRv3 will fix most of this
while preserving nearly all the benefits and will likely be shipped as a
replacement for BBRv1 in the Linux kernel rather than another option.

The reason we rolled it back last time was seeing cases of the initial
bandwidth estimate being overly low combined with a very bad interaction
with synproxy causing low bandwidth initially. We've partially addressed
the synproxy issue by raising the synproxy threshold based on conntrack
table size which we're now fully scaling based on available memory. If
we decide this is still a significant issue, we can limit using BBRv1 to
our update servers where it has massive benefits and the least downside
due to initial bandwidth not being as important. BBRv3 will help with
this by probing round-trip time every 5 seconds instead of every 10 seconds
but still has similar issues.
2025-07-01 10:13:05 -04:00
Daniel Micay
dfa2f48ae1 move zerotier-one to port 999 2025-06-27 14:11:44 -04:00
Daniel Micay
ac0dc27596 move dnsdist control socket to port 55
This avoids unnecessary overlap with our ephemeral port range.
2025-06-27 13:39:43 -04:00
Daniel Micay
3b2f6d546c nftables: simplify nameserver control socket rules 2025-06-27 13:10:16 -04:00
Daniel Micay
8b87654075 scale synproxy threshold based on conntrack max 2025-06-22 22:27:48 -04:00
Daniel Micay
5b9e9fe712 use default conntrack UDP stream timeout
This is relevant to zerotier and will be relevant to QUIC once we begin
using it.
2025-06-22 22:08:34 -04:00
Daniel Micay
6b2e72e935 sshd: reduce LoginGraceTime to 5s 2025-06-06 11:01:01 -04:00
Daniel Micay
57a5209d8b integrate dnsdist in session ticket keys management 2025-05-27 15:40:54 -04:00
Daniel Micay
94a2567b15 add tls group for session ticket keys 2025-05-27 15:40:52 -04:00
Daniel Micay
44f6e6021a make session ticket management more generic 2025-05-27 14:23:23 -04:00
Daniel Micay
7cb75131dc drop executable bit for regular files in FAT32 ESP 2025-05-21 20:00:08 -04:00
Daniel Micay
5c41418606 nftables: add support for dnsdist control socket 2025-05-16 13:19:38 -04:00
Daniel Micay
e75172d57c replace nginx with dnsdist for DNS-over-TLS 2025-05-13 21:42:53 -04:00
Daniel Micay
f9f3cdab05 add 1.ns1.grapheneos.org server 2025-05-08 22:26:56 -04:00
Daniel Micay
7095105832 add 3.ns1.grapheneos.org server 2025-05-08 22:26:56 -04:00
Daniel Micay
90a7780b5e migrate to new tlsserver Let's Encrypt profile
We can no longer use OCSP stapling and Must-Staple. These will soon be
obsolete once the `shortlived` profile is available for public use, since
it will provide certificates with a lifetime similar to OCSP responses.

In the meantime, we've moved to the `tlsserver` profile stripping legacy
features to prepare for the `shortlived` profile which will be identical
to `tlsserver` but with a validity period of 6 days.

The certificate for SUPL is still temporarily using the classic profile
to work around the older generations of end-of-life Snapdragon Pixels
not having support for SNI. We can eventually drop support for these
devices from the SUPL service, allowing us to disable TLSv1.1 and DHE and
move to the `tlsserver` or `shortlived` profile.

The certificate for SMTP is still temporarily using the classic profile
to avoid potential compatibility issues with servers supporting TLSv1.2
but not yet supporting SNI.
2025-05-08 22:26:43 -04:00
Daniel Micay
a6d1e00d07 drop SSH connections to new anycast IPs 2025-05-05 17:29:56 -04:00
Daniel Micay
029882f051 set up certificate replication for ns1 replicas 2025-05-05 17:29:54 -04:00
Daniel Micay
c7cb5d025e add 2.ns1.grapheneos.org server 2025-05-04 16:01:04 -04:00
Daniel Micay
2784008a65 nftables: add support for rage4 anycast for ns1 2025-05-03 18:13:20 -04:00
Daniel Micay
566f1a10d2 rename ns1.grapheneos.org to 0.ns1.grapheneos.org 2025-05-03 18:13:18 -04:00
Daniel Micay
7861ef2c30 remove legacy OVH update servers 2025-04-30 23:27:40 -04:00
Daniel Micay
39b5148808 switch back to CUBIC from BBRv1 and keep ECN off
BBRv1 significantly improves throughput in some cases but it also
significantly reduces it in others. We've run into too many network
conditions it handles quite poorly. There's also a bad interaction
between BBR and synproxy where it will cripple the initial throughput
for connections established via synproxy. This means a basic SYN flood
attack could cripple initial TCP throughput for most connections.

Android doesn't enable ECN for outbound connections yet and we don't
want to deviate from that, so it mainly gets activated for macOS and
iOS clients. The Linux kernel's approach to ECN hasn't been modernized and
there are fierce debates about how it should work. It can cause issues
and it seems best to avoid it until Android enables it.
2025-04-25 13:34:33 -04:00
Daniel Micay
9556ca4b79 use 4.releases.grapheneos.org as primary instance 2025-04-25 00:47:28 -04:00
Daniel Micay
9290c1fd90 add new ReliableSite update servers 2025-04-24 01:15:39 -04:00
Daniel Micay
e38b248b47 raise RAID resync limit for bare metal servers 2025-04-23 21:10:49 -04:00
Daniel Micay
687fd3ddc5 drop unused DHCP configuration for 4.releases.grapheneos.org 2025-04-23 21:07:05 -04:00
Daniel Micay
250d813c56 add IPv4 gateway route for 4.releases.grapheneos.org 2025-04-23 21:07:05 -04:00
Daniel Micay
1f4d7316b8 reorganize configurations into etc directory 2025-04-15 12:53:49 -04:00
Daniel Micay
b5fd158374 add cpupower configuration for bare metal 2025-04-15 12:30:33 -04:00
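
Commits 39b5148808 and 45b8e80e31 above switch the TCP congestion control algorithm between CUBIC and BBRv1. As a rough illustration only, and not part of this repository, the following sketch shows one way to check which algorithm a Linux host is currently using. It reads the standard procfs sysctl files; the function names `read_words` and `check_congestion_control` are hypothetical, and it assumes the kernel's name for BBRv1 is `bbr`.

```python
from pathlib import Path

# Standard Linux procfs locations for the TCP congestion control sysctls.
ACTIVE = Path("/proc/sys/net/ipv4/tcp_congestion_control")
AVAILABLE = Path("/proc/sys/net/ipv4/tcp_available_congestion_control")


def read_words(path):
    """Return the whitespace-separated tokens stored in a sysctl file."""
    return Path(path).read_text().split()


def check_congestion_control(expected, active_path=ACTIVE, available_path=AVAILABLE):
    """Hypothetical helper: report the state of one congestion control algorithm.

    Returns (is_active, is_available), where is_active means `expected` is the
    system default and is_available means the kernel currently offers it.
    This only reads state; it never changes any setting.
    """
    active = read_words(active_path)
    available = read_words(available_path)
    return (active == [expected], expected in available)
```

Calling `check_congestion_control("bbr")` on a host where the second commit has been applied would be expected to return a tuple whose first element is true. The path parameters exist only so the check can be pointed at test files instead of the live system.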