rsync_ext: Unicode decode errors block all subsequent lines #31
Problem
When rsync_ext() encounters a line with invalid UTF-8 bytes, it gets stuck and skips all remaining lines, even valid ones.
Root Cause
The problem is in lines 123-128 of curateipsum/fs.py. The continue on line 128 skips:
- yield (expected, we want to skip the invalid line)
- prev_line = line (NOT expected, causes the bug)
Example
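If rsync outputs:
- b"\xff\xfe invalid" (invalid UTF-8)
- b">f+++++++++ valid1.txt\n"
- b">f+++++++++ valid2.txt\n"
Actual behavior: the first line fails to decode and prev_line is left unchanged. When valid1.txt is read, prev_line is STILL invalid; when valid2.txt is read, prev_line is STILL invalid. Both valid lines are skipped.
Expected behavior: only the invalid line is skipped, and valid1.txt and valid2.txt are processed normally.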
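The example can be reproduced with a simplified model of the loop. This is a sketch under assumptions, not the code in curateipsum/fs.py: it assumes the loop decodes prev_line one iteration behind the line being read, and both the name parse_lines_buggy and the final flush of the last buffered line are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def parse_lines_buggy(lines: Iterable[bytes]) -> Iterator[str]:
    """Hypothetical stand-in for the loop in rsync_ext(), not the real code."""
    prev_line: Optional[bytes] = None
    for line in lines:
        if prev_line is None:
            prev_line = line
            continue
        try:
            decoded = prev_line.decode("utf-8").strip()
        except UnicodeDecodeError:
            # Bug being modelled: prev_line is not advanced here, so the
            # same invalid bytes are decoded again on every later iteration.
            continue
        yield decoded
        prev_line = line
    # Assumed flush of the last buffered line.
    if prev_line is not None:
        try:
            yield prev_line.decode("utf-8").strip()
        except UnicodeDecodeError:
            pass


rsync_output = [
    b"\xff\xfe invalid",
    b">f+++++++++ valid1.txt\n",
    b">f+++++++++ valid2.txt\n",
]
print(list(parse_lines_buggy(rsync_output)))  # prints []
```

Under these assumptions, a single undecodable line at the start is enough to suppress every later line, which matches the symptom described in this issue.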
Impact
If a backup contains files with names in non-UTF-8 encoding (e.g., legacy Windows-1251 Cyrillic filenames), rsync_ext() will fail to process ANY files after the first invalid filename, silently losing sync data.
Solution
Update line 128 to restore the invariant before continuing:
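A minimal sketch of what such a change could look like, assuming the loop decodes prev_line one iteration behind the line being read. The function name parse_lines_fixed, the simplified parsing, and the final flush are hypothetical and only illustrate the idea; they are not the code in curateipsum/fs.py.

```python
from typing import Iterable, Iterator, Optional


def parse_lines_fixed(lines: Iterable[bytes]) -> Iterator[str]:
    """Hypothetical model of the loop with the invariant restored."""
    prev_line: Optional[bytes] = None
    for line in lines:
        if prev_line is None:
            prev_line = line
            continue
        try:
            decoded = prev_line.decode("utf-8").strip()
        except UnicodeDecodeError:
            # Skip the undecodable line, but still advance prev_line so
            # the next iteration works on fresh input.
            prev_line = line
            continue
        yield decoded
        prev_line = line
    # Assumed flush of the last buffered line.
    if prev_line is not None:
        try:
            yield prev_line.decode("utf-8").strip()
        except UnicodeDecodeError:
            pass


rsync_output = [
    b"\xff\xfe invalid",
    b">f+++++++++ valid1.txt\n",
    b">f+++++++++ valid2.txt\n",
]
print(list(parse_lines_fixed(rsync_output)))
# prints ['>f+++++++++ valid1.txt', '>f+++++++++ valid2.txt']
```

The only difference from the failing shape is the assignment inside the except branch: the invalid line is still dropped, but it no longer poisons the lines that follow it.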
Discovery
Found during test implementation for the rsync_ext() function. See test_handles_unicode_decode_error in tests/test_fs.py, which documents this actual behavior.