rsync() creates complete source file map in memory #13

Open
opened 2025-11-15 03:44:15 +00:00 by snegov · 0 comments
Owner

rsync() creates complete source file map in memory

Priority: Low
Component: fs.py
Type: Performance

Description

The rsync() function builds a dictionary with one entry per source file before it processes anything, so memory use grows linearly with the number of files. This could be problematic for directory trees with millions of files.

Location

curateipsum/fs.py:266-269

Current Code

# Create source map {rel_path: dir_entry}
src_files_map = {
    ent.path[len(src_root_abs) + 1:]: ent for ent in scantree(src_root_abs)
}

Problem

For a source directory with 1M files, this creates a dictionary with 1M entries in memory before processing begins.
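The cost can be measured rather than guessed. The sketch below builds the same kind of map over a small temporary tree and reports peak allocation with `tracemalloc`. The `scantree` here is a hypothetical stand-in that yields only files (the real helper in `curateipsum/fs.py` is not shown in this issue and may behave differently), and `measure_map_memory` and `make_sample_tree` are names invented for illustration.

```python
import os
import tempfile
import tracemalloc


def scantree(path):
    """Hypothetical stand-in: recursively yield os.DirEntry objects for files."""
    with os.scandir(path) as it:
        for ent in it:
            if ent.is_dir(follow_symlinks=False):
                yield from scantree(ent.path)
            else:
                yield ent


def measure_map_memory(src_root_abs):
    """Build the source map and return (entry_count, peak_bytes)."""
    tracemalloc.start()
    # Same expression as in the issue: {rel_path: dir_entry}
    src_files_map = {
        ent.path[len(src_root_abs) + 1:]: ent for ent in scantree(src_root_abs)
    }
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(src_files_map), peak


def make_sample_tree(file_count):
    """Create a throwaway tree with file_count small files; return its root."""
    root = tempfile.mkdtemp()
    for i in range(file_count):
        sub = os.path.join(root, f"dir{i % 10}")
        os.makedirs(sub, exist_ok=True)
        with open(os.path.join(sub, f"file{i}.txt"), "w") as fh:
            fh.write("x")
    return root
```

Dividing the reported peak by the entry count gives a rough per-file figure that can be scaled to the expected tree size. The result depends on path lengths and the Python build, so it is an estimate only.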

Consideration

The algorithm needs the complete source map to detect deletions, so this is more of a limitation to document than a bug to fix.
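A minimal sketch of why deletion detection leans on the full map: every destination entry is looked up by relative path, and anything missing from the source map is treated as deleted. This is an illustration under assumptions, not the code in `fs.py`. `find_deleted` is a hypothetical name, and `scantree` is again a stand-in that yields only files.

```python
import os
import tempfile


def scantree(path):
    """Hypothetical stand-in: recursively yield os.DirEntry objects for files."""
    with os.scandir(path) as it:
        for ent in it:
            if ent.is_dir(follow_symlinks=False):
                yield from scantree(ent.path)
            else:
                yield ent


def find_deleted(src_root_abs, dst_root_abs):
    """Return relative paths present in the destination but not in the source."""
    # The whole source tree must be known before the destination walk starts.
    src_files_map = {
        ent.path[len(src_root_abs) + 1:]: ent for ent in scantree(src_root_abs)
    }
    deleted = []
    for dst_ent in scantree(dst_root_abs):
        rel_path = dst_ent.path[len(dst_root_abs) + 1:]
        if rel_path not in src_files_map:
            deleted.append(rel_path)
    return sorted(deleted)
```

Each lookup is a constant-time dictionary check, which is what the in-memory map buys in exchange for its size.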

Proposed Solution

Document the memory requirements in the function docstring, or consider an alternative approach for very large directory trees (e.g., a database-backed index).
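One possible shape for the database-backed option, sketched with the standard-library `sqlite3` module: relative paths are streamed into an on-disk table instead of a dictionary, and membership is checked with a query. The table name `src_files` and the functions `build_source_index` and `in_source` are assumptions made for this example, as is the `scantree` stand-in. Nothing here exists in the project.

```python
import os
import sqlite3
import tempfile


def scantree(path):
    """Hypothetical stand-in: recursively yield os.DirEntry objects for files."""
    with os.scandir(path) as it:
        for ent in it:
            if ent.is_dir(follow_symlinks=False):
                yield from scantree(ent.path)
            else:
                yield ent


def build_source_index(src_root_abs, db_path):
    """Stream relative source paths into an on-disk SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS src_files (rel_path TEXT PRIMARY KEY)"
    )
    # A generator keeps only one row in Python memory at a time.
    rows = (
        (ent.path[len(src_root_abs) + 1:],) for ent in scantree(src_root_abs)
    )
    with conn:  # commit once after all rows are inserted
        conn.executemany("INSERT OR REPLACE INTO src_files VALUES (?)", rows)
    return conn


def in_source(conn, rel_path):
    """Return True if rel_path was recorded during the source scan."""
    cur = conn.execute(
        "SELECT 1 FROM src_files WHERE rel_path = ?", (rel_path,)
    )
    return cur.fetchone() is not None
```

This trades memory for disk I/O and slower lookups, and it stores only paths, so any file metadata needed for comparison would have to be added as columns or re-read with `os.stat`. File names that are not valid UTF-8 would also need extra handling before being bound as text.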

Impact

Low - Only affects users backing up extremely large directory trees.

Reference: snegov/cura-te-ipsum#13