Commit graph

741 commits

Author SHA1 Message Date
Daniel Micay
3691bd8e51 fetch-info: enable standard error detection setup 2025-09-25 15:50:14 -04:00
Daniel Micay
173822655c switch to xxd for converting random bytes to hex 2025-09-23 19:56:42 -04:00
Daniel Micay
d125eb96ca improve tls group configuration 2025-09-20 14:49:41 -04:00
Daniel Micay
47062b9c68 raise wmem_max/rmem_max for non-autotuned buffers
Unbound now requests 4M for the send buffer by default and we might as
well permit that for both the send and receive buffers. We set the max
auto-tuned send buffer size on a per-server basis but don't currently
have much use for tuning the maximum manually specified buffer size
across servers. It can be moved in the future if needed.
2025-09-18 13:56:46 -04:00
Daniel Micay
348cdf9d74 update systemd configuration 2025-09-18 11:17:05 -04:00
Daniel Micay
c6156ebed7 switch from shaped CAKE to FQ for BuyVM servers
These servers originally only had the 1Gbps base bandwidth and shaping
it with CAKE worked well to make the most of it during traffic spikes
for the web servers. It has little value for the nameservers since the
only potentially high throughput service is non-interactive SSH.

These servers now have 10Gbps burst available but are heavily limited by
their single virtual core and unable to use all of it in practice. CAKE
can only provide significant value when it's the bottleneck which isn't
the case when the workload is CPU limited. We don't want to keep around
the artificially low 1Gbps limit and it can't do much more.

Unlike OVH, the practical bottleneck is the CPU and FQ has the lowest
CPU usage in practice due to being very performance-oriented with a FIFO
fast path and offloading TCP pacing from the TCP stack to itself. On the
DNS servers, the fast path is always used in practice. Our OVH servers
have a much lower enforced bandwidth limit and the way they implement it
ruins fairness across flows. We definitely want to stick with CAKE for
our VPS instances on OVH but it doesn't make sense on BuyVM anymore.
2025-09-18 01:26:39 -04:00
Daniel Micay
b2c15916cc no need to override default qdisc since we set it 2025-09-17 19:23:26 -04:00
Daniel Micay
7d55588972 nftables: preserve connlimit sets across reloads 2025-09-17 19:23:22 -04:00
Daniel Micay
f3156e641d nftables: reorder network server UDP notrack 2025-09-16 18:19:33 -04:00
Daniel Micay
78bd96f4ae nftables: move listening ports to constants 2025-09-16 18:19:31 -04:00
Daniel Micay
d923bc7e24 use monotonic timer for session ticket key rotation
It makes more sense to rotate session ticket keys every 8 hours instead
of doing it at 3 specific times each day where the initial rotation will
happen earlier than necessary. It makes little difference due to keeping
the previous 3 session tickets valid but is cleaner.
2025-09-15 21:10:42 -04:00
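
The rotation scheme described in d923bc7e24 can be sketched roughly as follows. This is an illustration only, not the repository's script: the function name `rotate_keys` and the 80-byte key size are assumptions, and the 8-hour monotonic timer that would call it is outside the sketch.

```python
import os

KEY_SIZE = 80      # assumption: key size in bytes, not stated in the commit
KEEP_PREVIOUS = 3  # from the commit message: the previous 3 stay valid


def rotate_keys(keys: list[bytes]) -> list[bytes]:
    """Return a new key list with a fresh key first and the oldest dropped."""
    fresh = os.urandom(KEY_SIZE)
    # The fresh key is used for new tickets; up to 3 older keys remain
    # accepted so that existing tickets keep working.
    return [fresh] + keys[:KEEP_PREVIOUS]
```

Calling this on a fixed interval since the last run, rather than at fixed times of day, means the first rotation after setup is not triggered earlier than a full interval.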
Daniel Micay
5f1b0c886d nftables: replace magic numbers with constants 2025-09-15 21:10:42 -04:00
Daniel Micay
8bf64de00d add hosts arrays for ns1 and ns2 2025-09-15 21:10:42 -04:00
Daniel Micay
35ca9a2a19 allow server TCP Fast Open and rotate the keys
This needs to be configured by specific services to have any effect. For
now, we're only enabling it for the PowerDNS Authoritative Server and
dnsdist since it's recommended by RFC 9210 and actively used by various
recursive resolver servers when falling back to TCP. TCP Fast Open is
rarely used from end user devices due to it enabling tracking and having
issues with middleboxes. We aren't going to start using it anywhere in
GrapheneOS but may have more server-side uses for it. This functionality
is built into QUIC without the same downsides but QUIC support in the
software we use is not ready for us to enable it, especially the very
primitive support in nginx.

For most servers, a new random TCP Fast Open key is created on a daily
basis and the previous key continues to be accepted. For DNS servers,
the new key is generated via a keyed hash of the current date in order
to keep it consistent across servers providing an anycast IP without it
needing regular synchronization.
2025-09-15 21:10:39 -04:00
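
A minimal sketch of the date-based key derivation described in 35ca9a2a19 for DNS servers, under stated assumptions: the commit only says "keyed hash", so HMAC-SHA256, the ISO date string as input, and the names `tfo_key` and `tfo_key_pair` are all hypothetical. The output uses four dash-separated 32-bit hex words per key with the previous key after a comma, which is the layout the Linux `net.ipv4.tcp_fastopen_key` sysctl is understood to accept.

```python
import hashlib
import hmac
from datetime import date, timedelta


def tfo_key(secret: bytes, day: date) -> str:
    """Derive a 16-byte key for a given day as 4 dash-separated hex words."""
    # Assumption: HMAC-SHA256 over the ISO date, truncated to 16 bytes.
    digest = hmac.new(secret, day.isoformat().encode(), hashlib.sha256).digest()
    hex_key = digest[:16].hex()
    return "-".join(hex_key[i:i + 8] for i in range(0, 32, 8))


def tfo_key_pair(secret: bytes, today: date) -> str:
    """Current key first, then the previous day's key as the backup."""
    yesterday = today - timedelta(days=1)
    return tfo_key(secret, today) + "," + tfo_key(secret, yesterday)
```

Because the key depends only on the shared secret and the date, every server behind the same anycast IP arrives at the same key without exchanging it each day.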
Daniel Micay
b2cb800512 re-enable generating fallback initramfs 2025-09-12 17:54:21 -04:00
Daniel Micay
46fe2fd36c add CAP_CHOWN to certbot-renew.service for dnsdist 2025-09-05 02:06:01 -04:00
Daniel Micay
defb596ac1 raise journal file size for relevant servers 2025-09-04 23:19:40 -04:00
Daniel Micay
9952c02e43 add ethtool to virtual servers too 2025-09-04 17:08:59 -04:00
Daniel Micay
ca22d4a0a3 enable adaptive-rx on ReliableSite update servers
This is fully supported by the Broadcom NIC used for both servers but
not enabled by default. It's already enabled by default for the Intel
NIC used by the Macarne update server.
2025-09-04 16:48:17 -04:00
Daniel Micay
ece7064674 raise NIC channels to number of threads
1.releases.grapheneos.org and 2.releases.grapheneos.org were ending up
with only 6 channels by default despite the hardware being capable of
far more. This raises it to match the 24 CPU threads.

0.releases.grapheneos.org is already using 32 channels by default which
matches the 32 CPU threads.
2025-09-04 01:00:22 -04:00
Daniel Micay
925b54eaf6 DSCP debugging replaced with counter on map 2025-09-04 00:53:20 -04:00
Daniel Micay
e9fda8e7a1 map packet priority 4 to the high priority fq band 2025-09-01 19:35:49 -04:00
Daniel Micay
97d650c7ed nftables: use DSCP to assign packets to fq bands 2025-09-01 19:35:49 -04:00
Daniel Micay
676763b8a5 nftables: split out update servers
This will be used for fq-specific configuration.
2025-09-01 19:35:49 -04:00
Daniel Micay
adf8269ac2 switch CAKE to diffserv4 now that DSCP marks are correct 2025-09-01 19:35:49 -04:00
Daniel Micay
41174c2a08 clean inbound DSCP
This avoids setting outbound DSCP for echo-reply, TCP RST for TCP
sockets in the Time-Wait state and potentially other cases. We don't
want it to be possible for inbound packets to determine our outbound
traffic classification even to a small extent.
2025-09-01 19:35:47 -04:00
Daniel Micay
28106192b1 reduce conntrack TCP established timeout to 1 hour
We have nothing depending on having even anywhere close to 1 hour of
idle time so we could reduce this significantly more.
2025-09-01 19:35:03 -04:00
Daniel Micay
e5ae9ca13b raise tcp_wmem[2] for update servers
Linux recently raised the default tcp_rmem[2] to 32MiB so it makes sense
to match it on the sending side to maximize bandwidth.
2025-09-01 19:35:03 -04:00
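
To make the sizes in e5ae9ca13b concrete, the arithmetic below converts 32MiB to bytes and gives a rough per-connection throughput ceiling for a buffer of that size. It is a simplified estimate that ignores kernel bookkeeping overhead; the function name and the 100 ms round-trip time in the usage note are illustrative assumptions.

```python
MIB = 1024 * 1024
WMEM_MAX = 32 * MIB  # 33554432 bytes, matching the tcp_rmem[2] value in the commit


def max_throughput_bps(buffer_bytes: int, rtt_seconds: float) -> float:
    """Rough upper bound in bits per second when one buffer of data is in flight per RTT."""
    return buffer_bytes * 8 / rtt_seconds
```

For example, `max_throughput_bps(WMEM_MAX, 0.1)` is about 2.68 Gbit/s, so on a 100 ms path a smaller send buffer would cap a single connection below that.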
Daniel Micay
04479af3ad sort gitignore 2025-08-29 10:38:33 -04:00
Daniel Micay
3d0e2ffb23 expand SSH connection limit allowlist 2025-08-29 10:38:31 -04:00
Daniel Micay
f3ae87143f set handle for CAKE 2025-08-28 20:06:46 -04:00
Daniel Micay
92da7251ef switch pacman mirror due to mirror server issues 2025-08-28 11:27:38 -04:00
Daniel Micay
cb01ad4f20 nftables: block IPv6 for forum web server
We used to have this but it was lost during changes to our firewall
rules. We don't have an AAAA record for discuss.grapheneos.org to avoid
IPv6 connections but should also be explicitly blocking it. We're doing
this due to reliance on IP bans for registration to block spammers and
having IPv6 would greatly weaken it even if banning based on /64.
2025-08-28 11:25:11 -04:00
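
The "/64" banning mentioned in cb01ad4f20 can be illustrated with Python's standard `ipaddress` module. This is a sketch, not code from the repository; the function name `ban_prefix` is hypothetical. It maps an IPv6 address to its enclosing /64 and leaves IPv4 addresses as single hosts.

```python
import ipaddress


def ban_prefix(address: str) -> str:
    """Return the network a ban on this address would cover."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6:
        # strict=False masks off the host bits instead of raising an error.
        return str(ipaddress.ip_network(f"{ip}/64", strict=False))
    return str(ipaddress.ip_network(f"{ip}/32"))
```

Two addresses such as `2001:db8:1:2::1` and `2001:db8:1:2:ffff::9` fall under the same `2001:db8:1:2::/64` ban, but a client that can move to a different /64 gets past it, which is the weakness the commit message refers to.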
Daniel Micay
e77a5fb357 adjust DSCP configuration
AFx1 is classified as low priority traffic by the legacy TOS handling.
2025-08-25 18:51:07 -04:00
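
The note in e77a5fb357 about AFx1 follows from how DSCP values overlap the legacy TOS bits, which can be checked with a little bit arithmetic. The sketch below is illustrative: the function names are hypothetical, the AF code point formula comes from RFC 2597, and 0x1E is the legacy TOS mask. Every AFx1 code point lands on legacy TOS value 0x08 ("maximize throughput"), which Linux's legacy TOS-to-priority mapping is understood to treat as bulk traffic.

```python
IPTOS_TOS_MASK = 0x1E  # legacy TOS bits within the 8-bit TOS byte


def af_dscp(af_class: int, drop: int) -> int:
    """DSCP code point for AF<class><drop>: 8 * class + 2 * drop."""
    return 8 * af_class + 2 * drop


def legacy_tos_bits(dscp: int) -> int:
    """Legacy TOS bits as seen by code that predates DSCP."""
    # DSCP occupies the top 6 bits of the TOS byte, hence the shift by 2.
    return (dscp << 2) & IPTOS_TOS_MASK
```

For AF11, AF21, AF31 and AF41 the DSCP values are 10, 18, 26 and 34, and `legacy_tos_bits` returns 0x08 for all four.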
Daniel Micay
110dfe1a8f update python dependencies 2025-08-24 09:34:50 -04:00
Daniel Micay
0a810fd38f switch SSH IPv6 connection limit to /64 2025-08-23 22:21:27 -04:00
Daniel Micay
b4e1c96d74 nftables: drop obsolete synapse workaround 2025-08-23 21:05:28 -04:00
Daniel Micay
f54010112e switch to Unix socket for synapse 2025-08-22 16:59:05 -04:00
Daniel Micay
247f709df5 nftables: drop obsolete postgres stat collector rules
PostgreSQL 15 removed the UDP-based statistics collector and replaced it
with a shared memory implementation.
2025-08-22 13:14:17 -04:00
Daniel Micay
66d5c7602d nftables: mjolnir no longer connecting directly 2025-08-22 13:04:15 -04:00
Daniel Micay
4bf3955b38 nftables: pdns webserver moved to Unix socket 2025-08-22 12:43:38 -04:00
Daniel Micay
124dd54ef5 more frequent rotation for shorter log retention 2025-08-17 03:17:51 -04:00
Daniel Micay
931c72f9f5 raise journal size for relevant servers 2025-08-17 03:07:20 -04:00
Daniel Micay
1fc89bbeb4 add --copy-links to certbot dnsdist deployment 2025-08-17 03:03:33 -04:00
Daniel Micay
efced81f5f add ordering prefix to relevant configuration 2025-08-16 13:01:44 -04:00
Daniel Micay
b01dfbb947 switch to fq as the default qdisc 2025-08-14 16:57:48 -04:00
Daniel Micay
2db3740436 rotate-session-ticket-keys: improve error handling 2025-08-11 00:00:57 -04:00
Daniel Micay
c5a724ea7e drop code for toggling OVH permanent mitigation
This functionality was deprecated in July and is being removed in
September.
2025-08-09 17:41:33 -04:00
Daniel Micay
d0662589bf update python dependencies 2025-08-09 17:33:48 -04:00
Daniel Micay
274b5d60cb disable automatic xfs_fsr.service for now 2025-08-07 19:04:08 -04:00