Since nginx logs are no longer stored in journald, we don't need
journald configuration to control nginx log rotation/retention.
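For context, the kind of journald tuning this removes the need for
looked roughly like this (the option names are the standard
journald.conf ones; the values are illustrative, not our actual ones):

    # /etc/systemd/journald.conf.d/retention.conf (illustrative values)
    [Journal]
    # cap the total size of the persistent journal
    SystemMaxUse=1G
    # drop entries past this age regardless of size
    MaxRetentionSec=2week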
We switched from nginx to dnsdist for the authoritative DNS servers and
are therefore no longer logging any of the queries persistently since we
can rely on the PowerDNS and dnsdist in-memory buffers and stats.
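As a sketch of what those in-memory buffers provide, the dnsdist
console can query the ring buffers and stats on demand (assuming
console access is configured; these are standard dnsdist console
functions, with example arguments):

    # connect to the running dnsdist instance's console
    dnsdist -c
    -- inside the (Lua) console:
    grepq("example.org")  -- recent ring buffer entries matching a domain
    topQueries(10)        -- most frequent recent query names
    showServers()         -- status and stats for the PowerDNS backends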
We can now tune the nginx-specific logrotate configuration per server,
balancing the usefulness of access logs against storage space and
getting rid of mildly sensitive data (mainly IP addresses) faster.
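A minimal sketch of what such a per-server policy can look like,
assuming syslog-ng is the process writing the files and with retention
values that are illustrative rather than our actual ones:

    # /etc/logrotate.d/nginx (illustrative retention)
    /var/log/nginx/*.log {
        daily
        rotate 7
        compress
        delaycompress
        missingok
        notifempty
        sharedscripts
        postrotate
            # syslog-ng writes these files, so tell it to reopen them
            systemctl kill -s HUP syslog-ng.service
        endscript
    }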
The error log is fairly quiet during regular use but can end up logging
one or more lines per request during DDoS attacks. Errors are logged for
worker_connections depletion and limit_conn rejections. There's also
currently an nginx bug with modern TLS and OpenSSL that causes some
client-side TLS errors to be logged at crit instead of info.
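For reference, the limit_conn side of that logging is controlled by
standard nginx directives; a minimal sketch with an illustrative zone
size and connection limit:

    # illustrative zone size and per-address connection limit
    limit_conn_zone $binary_remote_addr zone=peraddr:10m;

    server {
        limit_conn peraddr 32;
        # rejections are written to the error log at this level
        limit_conn_log_level error;
    }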
This provides more redundancy for both services by running 2 instances
in each region. The network services have much higher bandwidth usage
and load, so this will also delay the need to obtain new servers by
making better use of the ones we have.
This sets up the infrastructure for moving from storing nginx access
logs in journald to plain text files written by syslog-ng and rotated by
logrotate. This works around journald's poor performance, poor space
efficiency and lack of compression for archived logs. Unlike writing
access logs directly from nginx, this keeps blocking writes out of the
event loop and sticks to asynchronous sends through a socket.
Since nginx only supports syslog via the legacy RFC 3164 protocol,
whose timestamps lack a year, timezone and sub-second precision, rather
than the more modern RFC 5424 protocol, this leaves formatting
timestamps up to nginx rather than using the ones provided via the
syslog protocol.
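A sketch of the two halves, assuming a local syslog-ng with a source
named s_local; the log_format shown is illustrative and carries an
nginx-formatted timestamp since the RFC 3164 header timestamp is too
coarse to rely on:

    # nginx: asynchronous sends to the local syslog socket
    log_format main '$remote_addr [$time_iso8601] "$request" '
                    '$status $body_bytes_sent "$http_user_agent"';
    access_log syslog:server=unix:/dev/log,facility=local7,tag=nginx,severity=info main;

    # syslog-ng: write the nginx-tagged messages out as plain text
    destination d_nginx_access {
        file("/var/log/nginx/access.log" template("${MESSAGE}\n"));
    };
    filter f_nginx { program("nginx"); };
    log { source(s_local); filter(f_nginx); destination(d_nginx_access); };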
The previous commit works around a long-standing systemd bug which
recently began impacting us again. If the workaround stops working, the
failure mode should not be a boot that stalls forever. Swap isn't
needed for our servers to function, so it shouldn't break them if it
can't be set up.
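As a sketch of that intent (not necessarily the actual workaround),
marking the swap entry nofail keeps its activation from being a hard
dependency of boot:

    # /etc/fstab (sketch): boot proceeds even if swap can't be activated
    /swapfile none swap defaults,nofail 0 0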
Unbound now requests 4M for the send buffer by default and we might as
well permit that for both the send and receive buffers. We set the max
auto-tuned send buffer size on a per-server basis but don't currently
have much use for tuning the maximum manually specified buffer size
across servers. That setting can be moved to per-server configuration
in the future if needed.
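A sketch of the sysctl side, with the net.core values matching the 4M
request and an illustrative per-server ceiling for the auto-tuned TCP
send buffer:

    # /etc/sysctl.d/net-buffers.conf (sketch)
    # allow explicit SO_SNDBUF/SO_RCVBUF requests up to 4MiB
    net.core.wmem_max = 4194304
    net.core.rmem_max = 4194304

    # per-server (illustrative): max auto-tuned TCP send buffer size
    net.ipv4.tcp_wmem = 4096 16384 8388608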
These servers originally only had the 1Gbps base bandwidth and shaping
it with CAKE worked well to make the most of it during traffic spikes
for the web servers. It has little value for the nameservers since the
only potentially high-throughput service is non-interactive SSH.
These servers now have 10Gbps burst available but are heavily limited by
their single virtual core and unable to use all of it in practice. CAKE
can only provide significant value when it's the bottleneck, which
isn't the case when the workload is CPU-bound. We don't want to keep
the artificially low 1Gbps limit around, and CAKE can't do much above
it when the CPU is the constraint.
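Roughly what this amounts to, assuming eth0 and the previous 1Gbps
shaping (device name and values illustrative):

    # previously: shape to the 1Gbps base bandwidth with CAKE
    tc qdisc replace dev eth0 root cake bandwidth 1gbit

    # now: drop the shaper and fall back to the default qdisc
    tc qdisc delete dev eth0 root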
Unlike on OVH, the practical bottleneck here is the CPU, and FQ has the lowest
CPU usage in practice due to being very performance-oriented with a FIFO
fast path and offloading TCP pacing from the TCP stack to itself. On the
DNS servers, the fast path is always used in practice. Our OVH servers
have a much lower enforced bandwidth limit and the way they implement it
ruins fairness across flows. We definitely want to stick with CAKE for
our VPS instances on OVH but it doesn't make sense on BuyVM anymore.
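A sketch of the resulting split, with the device name and the OVH
bandwidth value illustrative:

    # BuyVM: plain fq, no artificial limit; pacing handled by the qdisc
    tc qdisc replace dev eth0 root fq

    # OVH: keep CAKE shaped below the enforced limit to preserve
    # fairness across flows (bandwidth value illustrative)
    tc qdisc replace dev eth0 root cake bandwidth 500mbit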