118 commits

Author SHA1 Message Date
Daniel Micay
6c8ddbe012 drop unnecessary inclusion of / in fstab 2025-11-21 03:46:12 -05:00
Daniel Micay
1427e0c7c4 add mkinitcpio.conf for servers with mdraid 2025-11-21 03:46:12 -05:00
Daniel Micay
50729cadb9 split metal and mdraid server types 2025-11-21 03:46:07 -05:00
Daniel Micay
76b88bbffa update mkinitcpio.conf 2025-11-06 11:59:13 -05:00
Daniel Micay
c9b84fdb79 logrotate: use better size+time rotation approach 2025-11-06 11:58:40 -05:00
Daniel Micay
5f2e4a45c3 logrotate: preserve existing file owner/group/mode
wtmp and btmp are reliably created by systemd at boot with the proper
permissions which also means missingok can be dropped.
2025-11-05 23:45:10 -05:00
Daniel Micay
eeb00c5bda logrotate: default to delayed compression with opt-in to no delay 2025-11-05 23:32:48 -05:00
Daniel Micay
a0563b249b ssh: use AcceptEnv for COLORTERM 2025-11-05 20:23:39 -05:00
Daniel Micay
8af52e3498 journald: revert back to default SystemMaxFiles
This was raised to 10000 to work around 2 separate journald bugs causing
premature rotation which have been resolved for a long time.
2025-11-04 13:45:16 -05:00
Daniel Micay
7f0982f9d7 journald: disable ForwardToWall 2025-11-04 11:51:00 -05:00
Daniel Micay
f1ff8ac931 phase out 2.releases.grapheneos.org 2025-11-04 11:19:13 -05:00
Daniel Micay
8697cf2a2d switch back to unified journald rotation/retention
Since we're no longer storing nginx logs in journald, we no longer need
to use journald configuration to control nginx log rotation/retention.

We switched from nginx to dnsdist for the authoritative DNS servers and
are therefore no longer logging any of the queries persistently since we
can rely on the PowerDNS and dnsdist in-memory buffers and stats.

We can use nginx-specific logrotate configuration on a per-server basis
based on balancing the usefulness of access logs with storage space and
getting rid of slightly sensitive data faster (mainly IP addresses).
2025-11-03 20:03:59 -05:00
Daniel Micay
9d68a079db logrotate: use specific log file paths
This avoids ending up with the glob path in the logrotate state file
when nothing matches the glob pattern.
2025-11-03 12:54:18 -05:00
Daniel Micay
39b6de58dd syslog-ng: add socket for nginx error logs
The error log is fairly quiet during regular use but can end up logging
one or more lines per request during DDoS attacks. Errors are logged for
worker_connections depletion and limit_conn rejections. There's also
currently an nginx bug with modern TLS and OpenSSL causing some client
side TLS errors to be logged as crit instead of info.
2025-11-03 12:53:24 -05:00
Daniel Micay
386d332aaf remove unused logrotate configurations 2025-11-03 00:33:30 -05:00
Daniel Micay
934c5dbd53 logrotate: remove notifempty for nginx 2025-11-03 00:33:30 -05:00
Daniel Micay
b61c76c324 logrotate: remove nocreate for letsencrypt 2025-11-03 00:33:30 -05:00
Daniel Micay
39e701e9fb update pacreport.conf 2025-11-03 00:33:30 -05:00
Daniel Micay
944b4679c1 merge website and network servers
This provides more redundancy for both services through having 2
instances in each region. The network services have much higher
bandwidth usage and load so this will also delay us needing to obtain
new servers by making better use of the ones we have.
2025-11-03 00:33:30 -05:00
Daniel Micay
2caa67529a set up syslog-ng for nginx access log
This sets up the infrastructure for moving from storing nginx access
logs in journald to plain text files written by syslog-ng and rotated by
logrotate. This works around the poor performance, poor space efficiency
and lack of archived log compression for journald. Unlike writing access
logs directly with nginx, this continues avoiding blocking writes in the
event loop and sticks to asynchronous sends through a socket.

Since nginx only supports syslog via the RFC 3164 protocol rather than
the more modern RFC 5424 protocol, this leaves formatting timestamps up
to nginx rather than using the ones provided via the syslog protocol.
2025-11-03 00:33:28 -05:00
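To illustrate the timestamp limitation noted in the commit above, here is a minimal Python sketch of the two wire formats. The host name `web1`, the tag `nginx`, the local7/info priority and the sample message are assumptions for illustration and are not taken from the repository's configuration. The RFC 3164 timestamp carries no year, no timezone and only second resolution, which is presumably why timestamp formatting is left to nginx.

```python
from datetime import datetime, timezone

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def priority(facility: int, severity: int) -> int:
    """PRI value shared by both formats: facility * 8 + severity."""
    return facility * 8 + severity


def rfc3164_line(ts, host, tag, msg, facility=23, severity=6):
    # "Mmm dd hh:mm:ss": space-padded day, no year, no timezone, no fraction.
    stamp = f"{MONTHS[ts.month - 1]} {ts.day:2d} {ts.strftime('%H:%M:%S')}"
    return f"<{priority(facility, severity)}>{stamp} {host} {tag}: {msg}"


def rfc5424_line(ts, host, app, msg, facility=23, severity=6):
    # Version 1 header with a full date, fractional seconds and UTC offset.
    # PROCID, MSGID and structured data are left as the nil value "-".
    stamp = ts.isoformat(timespec="microseconds")
    return f"<{priority(facility, severity)}>1 {stamp} {host} {app} - - - {msg}"


if __name__ == "__main__":
    ts = datetime(2025, 11, 3, 5, 33, 28, 123456, tzinfo=timezone.utc)
    print(rfc3164_line(ts, "web1", "nginx", "GET / 200"))
    print(rfc5424_line(ts, "web1", "nginx", "GET / 200"))
```

Both functions take a timezone-aware `datetime`; only the RFC 5424 line preserves the offset and the microseconds.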
Daniel Micay
3c4380370e logrotate: use zstd for compression 2025-11-01 20:04:53 -04:00
Daniel Micay
a346146625 reorder update servers 2025-11-01 20:04:51 -04:00
Daniel Micay
01305667bd remove legacy 2.releases.grapheneos.org IPv6 address 2025-10-31 00:38:22 -04:00
Daniel Micay
7fa179260f phase in new IPv6 address for 2.releases.grapheneos.org 2025-10-30 20:11:17 -04:00
Daniel Micay
4e771284f5 expand pacreport.conf 2025-10-30 17:09:11 -04:00
Daniel Micay
0d1705320f use consistent naming for session ticket key scripts/units 2025-10-30 17:06:07 -04:00
Daniel Micay
9fde84c877 add initial session ticket key synchronization 2025-10-30 14:22:55 -04:00
Daniel Micay
f9430a1aeb add script for deploying certbot replication setup 2025-10-30 14:22:32 -04:00
Daniel Micay
e6db6a15e6 add swap device timeout as a fallback
The previous commit works around a long term systemd bug which recently
began impacting us again. If the workaround stops working, the behavior
should not be stalling boot forever. Swap isn't needed for our servers
to function so it shouldn't break them if it can't be set up.
2025-10-29 22:47:01 -04:00
Daniel Micay
8340cf2813 add workaround for systemd encrypted swap race
This appeared to be solved a while ago but ended up returning.
2025-10-29 22:36:11 -04:00
Daniel Micay
85c5ccc613 update IP addresses for 0.releases.grapheneos.org 2025-10-28 15:25:16 -04:00
Daniel Micay
0b519d6f5e set AccuracySec=1us for tcp-fastopen-rotate-keys 2025-10-28 12:33:10 -04:00
Daniel Micay
9ed61cef61 reduce TLS session ticket key interval from 8h to 6h 2025-10-27 22:50:32 -04:00
Daniel Micay
ce0942702e add RemainAfterExit=yes to create-session-ticket-keys.service 2025-10-27 22:11:22 -04:00
Daniel Micay
448565de54 update description for rotate-session-ticket-keys.timer 2025-10-27 21:19:32 -04:00
Daniel Micay
c4af821eda always create /var/cache/nginx for web servers
This avoids needing to restart nginx for ReadWritePaths to kick in after
creating it.
2025-10-27 20:52:34 -04:00
Daniel Micay
048ccb3fba allow powerdns user to query pdns over loopback
This is being used by the pdns-trigger-health-checks script.
2025-10-23 14:11:56 -04:00
Daniel Micay
9c2183c794 stop blacklisting tls module
It no longer gets autoloaded by default due to Linux kernel changes.
2025-10-22 17:36:06 -04:00
Daniel Micay
178791ffd8 update pacreport.conf 2025-10-21 14:11:46 -04:00
Daniel Micay
f8a1d381e7 mdmonitor.service: use syslog reporting 2025-10-19 16:16:33 -04:00
Daniel Micay
f2a4df1d0f add another IPv6 address for 0.releases.grapheneos.org
This will be used to send more traffic to it via DNS RRset load
balancing.
2025-10-11 15:31:09 -04:00
Daniel Micay
5ea8e202a1 0.releases.grapheneos.org IPv4 update
The main IPv4 address has changed and we're now using an additional IPv4
address to send more traffic to it via DNS RRset load balancing.
2025-10-11 15:30:35 -04:00
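The DNS RRset load balancing referred to in the two commits above can be sketched as follows, assuming clients pick uniformly among the addresses returned for a name. Under that assumption, a server's share of new connections is proportional to how many of the RRset's addresses point at it. The addresses (documentation ranges) and the server labels are hypothetical and are not the project's real records.

```python
from collections import Counter
from fractions import Fraction


def expected_share(rrset):
    """Map each server to the fraction of lookups expected to land on it.

    Assumes every address in the RRset is equally likely to be chosen.
    """
    counts = Counter(rrset.values())
    total = sum(counts.values())
    return {server: Fraction(n, total) for server, n in counts.items()}


# Hypothetical RRset: address -> server behind that address.
RRSET = {
    "192.0.2.10": "server-0",
    "192.0.2.11": "server-0",  # the additional address for the same server
    "198.51.100.20": "server-1",
    "203.0.113.30": "server-2",
}

if __name__ == "__main__":
    for server, share in expected_share(RRSET).items():
        print(server, share)
```

Real resolvers and clients do not always choose uniformly, so the actual split can differ from this estimate.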
Daniel Micay
02b7e4e5c1 add 3.releases.grapheneos.org server 2025-10-09 09:06:31 -04:00
Daniel Micay
48d939d39d adjust IPv6 subnet size for ReliableSite servers 2025-10-05 00:50:18 -04:00
Daniel Micay
e57096dfec disable TCP Fast Open on BuyVM for now 2025-09-30 16:56:21 -04:00
Daniel Micay
d125eb96ca improve tls group configuration 2025-09-20 14:49:41 -04:00
Daniel Micay
47062b9c68 raise wmem_max/rmem_max for non-autotuned buffers
Unbound now requests 4M for the send buffer by default and we might as
well permit that for both the send and receive buffers. We set the max
auto-tuned send buffer size on a per-server basis but don't currently
have much use for tuning the maximum manually specified buffer size
across servers. It can be moved in the future if needed.
2025-09-18 13:56:46 -04:00
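As a rough illustration of why the limits in the commit above matter, this Python sketch requests a 4M send buffer on a UDP socket and reports what the kernel granted. On Linux, a plain `SO_SNDBUF` request is capped at `net.core.wmem_max`, and the kernel doubles the stored value for bookkeeping, so the reported number is not expected to equal the request. The helper names are hypothetical, and the `/proc` path only exists on Linux.

```python
import socket


def granted_send_buffer(requested: int) -> int:
    """Request a UDP send buffer size and return what the kernel reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


def read_wmem_max(path="/proc/sys/net/core/wmem_max"):
    """Return net.core.wmem_max on Linux, or None where it is unavailable."""
    try:
        with open(path) as f:
            return int(f.read())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    requested = 4 * 1024 * 1024  # 4M, the Unbound default mentioned above
    print("wmem_max:", read_wmem_max())
    print("requested:", requested, "granted:", granted_send_buffer(requested))
```

If the granted size is well below the request, the manually specified maximum is the limiting factor rather than the application.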
Daniel Micay
348cdf9d74 update systemd configuration 2025-09-18 11:17:05 -04:00
Daniel Micay
c6156ebed7 switch from shaped CAKE to FQ for BuyVM servers
These servers originally only had the 1Gbps base bandwidth and shaping
it with CAKE worked well to make the most of it during traffic spikes
for the web servers. It has little value for the nameservers since the
only potentially high throughput service is non-interactive SSH.

These servers now have 10Gbps burst available but are heavily limited by
their single virtual core and unable to use all of it in practice. CAKE
can only provide significant value when it's the bottleneck which isn't
the case when the workload is CPU limited. We don't want to keep around
the artificially low 1Gbps limit and it can't do much more.

Unlike OVH, the practical bottleneck is the CPU and FQ has the lowest
CPU usage in practice due to being very performance-oriented with a FIFO
fast path and offloading TCP pacing from the TCP stack to itself. On the
DNS servers, the fast path is always used in practice. Our OVH servers
have a much lower enforced bandwidth limit and the way they implement it
ruins fairness across flows. We definitely want to stick with CAKE for
our VPS instances on OVH but it doesn't make sense on BuyVM anymore.
2025-09-18 01:26:39 -04:00
Daniel Micay
b2c15916cc no need to override default qdisc since we set it 2025-09-17 19:23:26 -04:00