
Bug: stage-packages re-downloaded on every run due to python-apt _file_is_same() bug


Metadata

Project: craft-parts
Number: #1495
Type: issue
State: open
Author: EddyPronk
Labels: (none listed)
Created: 2026-03-13 09:57:14+00:00
Updated: 2026-04-21 15:29:04+00:00
Closed: (not set)


Issue body

## Summary

`craft-parts` re-downloads every stage-package on every run, even when an up-to-date `.deb` file is already present in the cache directory.

## Affected versions

- python-apt 2.7.7+ubuntu5.2 (shipped in snapcraft snap revision 17123)
- python-apt 3.1.0 (latest Debian, confirmed at https://sources.debian.org/src/python-apt/3.1.0/apt/package.py/#L53)

## Root cause

`apt.package._file_is_same()` in python-apt always returns `False` due to two bugs:

**Bug 1: file opened in text mode**

```python
with open(path) as fobj:  # should be open(path, "rb")
```

`.deb` files are binary. On Linux this is harmless in practice because the `apt_pkg` C extension reads the underlying file descriptor directly, but it is incorrect.

**Bug 2: order-sensitive hash comparison (the real culprit)**

```python
return apt_pkg.Hashes(fobj).hashes == hashes
```

`HashStringList.__eq__` compares element-by-element (list semantics). The hash order returned by `apt_pkg.Hashes(file)` differs from the order stored in the package records:

| Source | Order |
|---|---|
| `apt_pkg.Hashes(file)` | MD5Sum, SHA1, SHA256, SHA512 |
| `package._records.hashes` | SHA512, SHA256, SHA1, MD5Sum |

The values are identical but the order differs, so `==` returns `False`. `Version.fetch_binary()` therefore triggers a network download on every call.

## Impact

Every call to `craft_parts.LifecycleManager` that processes `stage-packages` re-downloads all packages from the network, even after a successful previous run with a warm cache. This affects every craft-parts consumer (snapcraft, charmcraft, rockcraft, etc.).
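To make the order sensitivity described under Bug 2 concrete, here is a minimal sketch that does not need python-apt. The two lists are stand-ins made of plain `"TYPE:value"` strings with invented values; they are not real `apt_pkg.HashStringList` objects, and only mirror the two orderings from the table above.

```python
# Stand-in data: plain strings, not real apt_pkg.HashStringList objects.
# The hash values are made up; only the ordering matters here.
from_file = ["MD5Sum:aa", "SHA1:bb", "SHA256:cc", "SHA512:dd"]
from_records = ["SHA512:dd", "SHA256:cc", "SHA1:bb", "MD5Sum:aa"]

# Element-by-element (list-style) comparison fails because positions differ.
ordered_equal = from_file == from_records

# Set comparison ignores position and only checks membership.
unordered_equal = set(from_file) == set(from_records)

print(ordered_equal, unordered_equal)
```

The list comparison reports a mismatch even though both sides hold the same four entries, which is the behaviour that makes the cache check fail.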
## How to reproduce

The minimal parts specification is in `parts.yaml` next to this file:

```yaml
parts:
  demo:
    plugin: nil
    stage-packages:
      - libpng16-16
```

### Manual reproduction

From the `craft-parts` repository root, run the pull step twice, keeping the same cache directory but wiping the work directory between runs:

```bash
# First run: cold cache, packages are downloaded
rm -rf /tmp/cp-cache /tmp/cp-work
python -m craft_parts -f tests/integration/packages/parts.yaml \
    --cache-dir /tmp/cp-cache --work-dir /tmp/cp-work --verbose pull
ls -l /tmp/cp-cache/download/*.deb   # note the timestamps

# Wipe work-dir state so craft-parts re-executes the pull step,
# but leave the deb cache intact
rm -rf /tmp/cp-work

# Second run: cache should be used, .deb timestamps must not change
python -m craft_parts -f tests/integration/packages/parts.yaml \
    --cache-dir /tmp/cp-cache --work-dir /tmp/cp-work --verbose pull
ls -l /tmp/cp-cache/download/*.deb   # compare timestamps with run 1
```

**Expected:** timestamps unchanged on the second `ls`; the `.deb` files were served from cache, with no network activity.

**Actual (with bug):** timestamps updated; every `.deb` was re-downloaded from the network despite being present in `/tmp/cp-cache/download/`.

Note: craft-parts logs `Downloading package: <name>` before calling `fetch_binary()`, so those lines appear on both runs regardless and are not a reliable indicator of actual network activity. Use the `.deb` file timestamps or the automated test below.

### Automated test

An automated regression test is provided in `test_deb_cache.py` alongside this file.
Run it with:

```bash
pytest tests/integration/packages/test_deb_cache.py -v
```

The test fails with the bug present and lists every package that was unnecessarily re-downloaded:

```
AssertionError: The following packages were re-downloaded on the second run
even though their .deb files were already present in .../cache/download:
['libpng16-16', 'libc6', 'libcrypt1', 'zlib1g', 'gcc-12-base', 'libgcc-s1']
```

## Proposed fix

Change `_file_is_same` in `apt/package.py` to use set comparison:

```python
def _file_is_same(path: str, size: int, hashes: apt_pkg.HashStringList) -> bool:
    if os.path.exists(path) and os.path.getsize(path) == size:
        with open(path, "rb") as fobj:
            return set(str(h) for h in apt_pkg.Hashes(fobj).hashes) == set(
                str(h) for h in hashes
            )
    return False
```

The fix addresses both bugs: binary file mode and order-independent comparison.

Until python-apt is patched, the fix can be applied as a monkey-patch in craft-parts' `apt_cache.py` before calling `fetch_archives()`.

Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1130691
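As a rough illustration of the monkey-patch idea mentioned above, the sketch below shows one way the replacement could be installed. The names `make_file_is_same`, `apply_patch`, `demo_hashes`, and `fake_module` are hypothetical and not part of craft-parts or python-apt. The demonstration uses `hashlib` and a `SimpleNamespace` as stand-ins so it runs without python-apt, and it assumes that `fetch_binary()` looks up `_file_is_same` as a module-level name at call time.

```python
import hashlib
import os
import tempfile
from types import SimpleNamespace


def make_file_is_same(hash_file):
    """Build an order-independent replacement for _file_is_same.

    hash_file is a hypothetical callable that takes a binary file object
    and returns an iterable of hash entries.
    """

    def _file_is_same(path, size, hashes):
        if os.path.exists(path) and os.path.getsize(path) == size:
            with open(path, "rb") as fobj:
                # Compare as sets of strings so that ordering is irrelevant.
                return {str(h) for h in hash_file(fobj)} == {str(h) for h in hashes}
        return False

    return _file_is_same


def apply_patch(package_module, hash_file):
    """Replace the module-level helper so later lookups see the new version."""
    package_module._file_is_same = make_file_is_same(hash_file)


def demo_hashes(fobj):
    """Stand-in hasher for the demo; not the apt_pkg implementation."""
    data = fobj.read()
    return [
        "SHA256:" + hashlib.sha256(data).hexdigest(),
        "SHA512:" + hashlib.sha512(data).hexdigest(),
    ]


# Stand-in for the module to patch; its helper always reports a mismatch.
fake_module = SimpleNamespace(_file_is_same=lambda path, size, hashes: False)
apply_patch(fake_module, demo_hashes)

# Create a small "cached" file to check against.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
    cached_path = tmp.name

# Recorded hashes: same values as the file's hashes, but in reversed order.
with open(cached_path, "rb") as fobj:
    recorded = list(reversed(demo_hashes(fobj)))

cache_hit = fake_module._file_is_same(
    cached_path, os.path.getsize(cached_path), recorded
)
size_mismatch = fake_module._file_is_same(cached_path, 1, recorded)
os.remove(cached_path)
```

With python-apt installed, the equivalent call would presumably be `apply_patch(apt.package, lambda fobj: apt_pkg.Hashes(fobj).hashes)`, reusing the expression from the proposed fix. That has not been verified against craft-parts here.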
