Due to the table locks taken out by the naive upsert, the table
statistics may be out of date. During deduplication it is important that
the correct index is used as otherwise a full table scan may be
incorrectly used, which can end up thrashing the database badly.
The background update to remove duplicate rows naively deleted and
reinserted the duplicates. For large tables with a large number of
duplicates this causes a lot of bloat (with postgres), as the inserted
rows are appended to the table, since deleted rows will not be
overwritten until a VACUUM has happened.
This should hopefully also help ensure that the query in the last batch
uses the correct index, as inserting a large number of new rows without
analyzing will upset the query planner.
Rearrange the comments to try to clarify them, and expand on what some of it
means.
Use a sensible default 'bind_addresses' setting.
For the insecure port, only bind to localhost, and enable x_forwarded, since
apparently it's for use behind a load-balancer.
Add more tables to the list of tables which need a background update to
complete before we can upsert into them, which fixes a race against the
background updates.
A surprising number of people are using the well-known method, and are
simply copying the example configuration. This is problematic as the
example includes an explicit port, which causes inbound federation
requests to have the HTTP Host header include the port, upsetting some
reverse proxies.
Given that, we update the well-known example to be more explicit about
the various ways you can set it up, and the consequence of using an
explict port.
A surprising number of people are using the well-known method, and are
simply copying the example configuration. This is problematic as the
example includes an explicit port, which causes inbound federation
requests to have the HTTP Host header include the port, upsetting some
reverse proxies.
Given that, we update the well-known example to be more explicit about
the various ways you can set it up, and the consequence of using an
explict port.