graphene-os-server-infrastructure

Git-Mirrors/graphene-os-server-infrastructure

mirror of https://github.com/GrapheneOS/infrastructure.git synced 2025-11-24 16:53:07 -05:00

Author	SHA1	Message	Date
Daniel Micay	5256f2e4a4	replace 1.ns1.grapheneos.org server with sea.ns1.grapheneos.org	2025-11-21 11:31:48 -05:00
Daniel Micay	f95fa51821	add lax.ns1.grapheneos.org server	2025-11-21 11:31:48 -05:00
Daniel Micay	951662aeca	replace 0.ns1.grapheneos.org server with nyc.ns1.grapheneos.org	2025-11-21 11:31:48 -05:00
Daniel Micay	4aba8d355a	add mia.ns1.grapheneos.org server	2025-11-21 11:31:48 -05:00
Daniel Micay	ebd44c9253	grapheneos.org: switch to location-based server names	2025-11-21 11:31:48 -05:00
Daniel Micay	e3bcb9e87f	ns2.grapheneos.org: switch to location-based server names	2025-11-21 11:31:48 -05:00
Daniel Micay	93e1d3866b	releases.grapheneos.org: switch to location-based server names	2025-11-21 11:31:48 -05:00
Daniel Micay	8af52e3498	journald: revert back to default SystemMaxFiles This was raised to 10000 to work around 2 separate journald bugs causing premature rotation which have been resolved for a long time.	2025-11-04 13:45:16 -05:00
Daniel Micay	7f0982f9d7	journald: disable ForwardToWall	2025-11-04 11:51:00 -05:00
Daniel Micay	f1ff8ac931	phase out 2.releases.grapheneos.org	2025-11-04 11:19:13 -05:00
Daniel Micay	8697cf2a2d	switch back to unified journald rotation/retention Since we're no longer storing nginx logs in journald, we no longer need to use journald configuration to control nginx log rotation/retention. We switched from nginx to dnsdist for the authoritative DNS servers and are therefore no longer logging any of the queries persistently since we can rely on the PowerDNS and dnsdist in-memory buffers and stats. We can use nginx-specific logrotate configuration on a per-server basis based on balancing the usefulness of access logs with storage space and getting rid of slightly sensitive data faster (mainly IP addresses).	2025-11-03 20:03:59 -05:00
Daniel Micay	2caa67529a	set up syslog-ng for nginx access log This sets up the infrastructure for moving from storing nginx access logs in journald to plain text files written by syslog-ng and rotated by logrotate. This works around the poor performance, poor space efficiency and lack of archived log compression for journald. Unlike writing access logs directly with nginx, this continues avoiding blocking writes in the event loop and sticks to asynchronous sends through a socket. Since nginx only supports syslog via the RFC 3164 protocol rather than the more modern RFC 5424 protocol, this leaves formatting timestamps up to nginx rather than using the ones provided via the syslog protocol.	2025-11-03 00:33:28 -05:00
Daniel Micay	a346146625	reorder update servers	2025-11-01 20:04:51 -04:00
Daniel Micay	01305667bd	remove legacy 2.releases.grapheneos.org IPv6 address	2025-10-31 00:38:22 -04:00
Daniel Micay	7fa179260f	phase in new IPv6 address for 2.releases.grapheneos.org	2025-10-30 20:11:17 -04:00
Daniel Micay	0d1705320f	use consistent naming for session ticket key scripts/units	2025-10-30 17:06:07 -04:00
Daniel Micay	9fde84c877	add initial session ticket key synchronization	2025-10-30 14:22:55 -04:00
Daniel Micay	f9430a1aeb	add script for deploying certbot replication setup	2025-10-30 14:22:32 -04:00
Daniel Micay	8340cf2813	add workaround for system encrypted swap race This appeared to be solved a while ago but ended up returning.	2025-10-29 22:36:11 -04:00
Daniel Micay	0b519d6f5e	set AccuracySec=1us for tcp-fastopen-rotate-keys	2025-10-28 12:33:10 -04:00
Daniel Micay	9ed61cef61	reduce TLS session ticket key interval from 8h to 6h	2025-10-27 22:50:32 -04:00
Daniel Micay	ce0942702e	add RemainAfterExit=yes to create-session-ticket-keys.service	2025-10-27 22:11:22 -04:00
Daniel Micay	448565de54	update description for rotate-session-ticket-keys.timer	2025-10-27 21:19:32 -04:00
Daniel Micay	c4af821eda	always create /var/cache/nginx for web servers This avoids needing to restart nginx for ReadWritePaths to kick in after creating it.	2025-10-27 20:52:34 -04:00
Daniel Micay	f8a1d381e7	mdmonitor.service: use syslog reporting	2025-10-19 16:16:33 -04:00
Daniel Micay	f2a4df1d0f	add another IPv6 address for 0.releases.grapheneos.org This will be used to send more traffic to it via DNS RRset load balancing.	2025-10-11 15:31:09 -04:00
Daniel Micay	5ea8e202a1	0.releases.grapheneos.org IPv4 update The main IPv4 address has changed and we're now using an additional IPv4 address to send more traffic to it via DNS RRset load balancing.	2025-10-11 15:30:35 -04:00
Daniel Micay	02b7e4e5c1	add 3.releases.grapheneos.org server	2025-10-09 09:06:31 -04:00
Daniel Micay	48d939d39d	adjust IPv6 subnet size for ReliableSite servers	2025-10-05 00:50:18 -04:00
Daniel Micay	348cdf9d74	update systemd configuration	2025-09-18 11:17:05 -04:00
Daniel Micay	c6156ebed7	switch from shaped CAKE to FQ for BuyVM servers These servers originally only had the 1Gbps base bandwidth and shaping it with CAKE worked well to make the most of it during traffic spikes for the web servers. It has little value for the nameservers since the only potentially high throughput service is non-interactive SSH. These servers now have 10Gbps burst available but are heavily limited by their single virtual core and unable to use all of it in practice. CAKE can only provide significant value when it's the bottleneck which isn't the case when the workload is CPU limited. We don't want to keep around the artificially low 1Gbps limit and it can't do much more. Unlike OVH, the practical bottleneck is the CPU and FQ has the lowest CPU usage in practice due to being very performance-oriented with a FIFO fast path and offloading TCP pacing from the TCP stack to itself. On the DNS servers, the fast path is always used in practice. Our OVH servers have a much lower enforced bandwidth limit and the way they implement it ruins fairness across flows. We definitely want to stick with CAKE for our VPS instances on OVH but it doesn't make sense on BuyVM anymore.	2025-09-18 01:26:39 -04:00
Daniel Micay	d923bc7e24	use monotonic timer for session ticket key rotation It makes more sense to rotate session ticket keys every 8 hours instead of doing it at 3 specific times each day where the initial rotation will happen earlier than necessary. It makes little difference due to keeping the previous 3 session tickets valid but is cleaner.	2025-09-15 21:10:42 -04:00
Daniel Micay	35ca9a2a19	allow server TCP Fast Open and rotate the keys This needs to be configured by specific services to have any effect. For now, we're only enabling it for the PowerDNS Authoritative Server and dnsdist since it's recommended by RFC 9210 and actively used by various recursive resolver servers when falling back to TCP. TCP Fast Open is rarely used from end user devices due to it enabling tracking and having issues with middleboxes. We aren't going to start using it anywhere in GrapheneOS but may have more server-side uses for it. This functionality is built into QUIC without the same downsides but QUIC support in the software we use is not ready for us to enable it, especially the very primitive support in nginx. For most servers, a new random TCP Fast Open key is created on a daily basis and the previous key continues to be accepted. For DNS servers, the new key is generated via a keyed hash of the current date in order to keep it consistent across servers providing an anycast IP without it needing regular synchronization.	2025-09-15 21:10:39 -04:00
Daniel Micay	46fe2fd36c	add CAP_CHOWN to certbot-renew.service for dnsdist	2025-09-05 02:06:01 -04:00
Daniel Micay	ca22d4a0a3	enable adaptive-rx on ReliableSite update servers This is fully supported by the Broadcom NIC used for both servers but not enabled by default. It's already enabled by default for the Intel NIC used by the Macarne update server.	2025-09-04 16:48:17 -04:00
Daniel Micay	ece7064674	raise NIC channels to number of threads 1.releases.grapheneos.org and 2.releases.grapheneos.org were ending up with only 6 channels by default despite the hardware being capable of far more. This raises it to match the 24 CPU threads. 0.releases.grapheneos.org is already using 32 channels by default which matches the 32 CPU threads.	2025-09-04 01:00:22 -04:00
Daniel Micay	e9fda8e7a1	map packet priority 4 to the high priority fq band	2025-09-01 19:35:49 -04:00
Daniel Micay	adf8269ac2	switch CAKE to diffserv4 now that DSCP marks are correct	2025-09-01 19:35:49 -04:00
Daniel Micay	f3ae87143f	set handle for CAKE	2025-08-28 20:06:46 -04:00
Daniel Micay	124dd54ef5	more frequent rotation for shorter log retention	2025-08-17 03:17:51 -04:00
Daniel Micay	274b5d60cb	disable automatic xfs_fsr.service for now	2025-08-07 19:04:08 -04:00
Daniel Micay	785ad04bbf	rename update servers	2025-08-03 21:45:34 -04:00
Daniel Micay	04100dca2c	use no-split-gso for CAKE across the board Based on the CAKE statistics during load testing, the latency benefits of GSO splitting are minimal for our servers and the increased CPU usage can increase latency.	2025-07-31 12:18:52 -04:00
Daniel Micay	01bb6a5504	set CAKE flow isolation mode to dual-dsthost We have no use case for fairness based on source address.	2025-07-30 18:45:03 -04:00
Daniel Micay	b669c4ce61	relax PrivateUsers for certbot-renew.service This was preventing using the dnsdist group for the nameservers.	2025-07-27 13:08:48 -04:00
Daniel Micay	a1336fba2f	switch from CAKE to mq fq_codel for update servers CAKE was causing a bottleneck due to being single threaded.	2025-07-23 00:26:41 -04:00
Daniel Micay	57a5209d8b	integrate dnsdist in session ticket keys management	2025-05-27 15:40:54 -04:00
Daniel Micay	94a2567b15	add tls group for session ticket keys	2025-05-27 15:40:52 -04:00
Daniel Micay	44f6e6021a	make session ticket management more generic	2025-05-27 14:23:23 -04:00
Daniel Micay	f9f3cdab05	add 1.ns1.grapheneos.org server	2025-05-08 22:26:56 -04:00

1 2

59 commits