set_subkey corrupts record_data_size in a Record struct by calling
set_record_data_size with a value that depends only on the length of
the new subkey value. This leads to various undesirable outcomes, such
as: applications can write more than MAX_RECORD_DATA_SIZE without
encountering the intended "veilid.error.VeilidAPIErrorGeneric:
label='Generic' message='dht record too large'" error message, and
"panicked at 'attempt to subtract with overflow'" (i.e., an attempt to
set a negative value of a size) if a subkey's new length is less than
a subkey's old length. Typically, record_data_size in a Record struct
will be incorrect if a value was set for more than one subkey. Some
users might want to start over with a table_store that doesn't have
any incorrect record_data_size values.
The issue begins here:
6f71c6a00a/veilid-core/src/storage_manager/record_store.rs (L583-L586)
and is triggered here:
6f71c6a00a/veilid-core/src/storage_manager/record_store.rs (L613-L615)
It should be clear that new_record_data_size is only related to the
subkey that is currently being set. The amount of data in the record,
before set_subkey is called, is ignored. It appears that
new_total_size, not new_record_data_size, was intended to be used for
set_record_data_size, and this change succeeds for me in limited
testing but I don't have a comprehensive test suite.
One way to reproduce is by running the code from
https://gitlab.com/vatueil/veilid-file on a greater than 1 MB file
while watching variable values within
veilid-core/src/storage_manager/record_store.rs. For example: "poetry
run file put /usr/bin/tcpdump" (1.3 MB on Ubuntu 23.04). With the
original Veilid code, each of the dozens of subkey writes is checking
whether a roughly 32K value is greater than 1048576, it never is, and
thus there is never a "dht record too large" error. With the patch in
this MR, each of the dozens of subkey writes is checking whether an
ever-increasing value is greater than 1048576, it eventually is, and
the "dht record too large" error is printed. With the patch, one can
work with smaller files, e.g., do "poetry run file put /usr/bin/ssh"
(0.8 MB) followed by "poetry run file get VLD0:<_insert_key_here_>
ssh-copy" and the retrieved file ssh-copy is identical to
/usr/bin/ssh.
The more detailed behavior is that the modified code has
record.total_size of 350 on the first iteration, then 33596, 66842,
100088, etc. The original code also has record.total_size of 350 on
the first iteration, but then stays at 33246 forever (33246 is the
user-supplied subkey size of 32768, plus 128, plus the minimum record
size of 350),
The original thinking was that if len(b) < FIXED_SIZE, then that would
be picked up later by the "invalid member length" check. In that case,
this only really *needs* to make sure that the check after this for
"wrong fourcc" wouldn't fail. But if len(b) < FIXED_SIZE, it really is
an invalid size, and should get that error message before even starting
to validate its other qualities.