Bug: stage-packages re-downloaded on every run due to python-apt _file_is_same() bug
## Summary
`craft-parts` re-downloads every stage-package on every run, even when an
up-to-date `.deb` file is already present in the cache directory.
## Affected versions
- python-apt 2.7.7+ubuntu5.2 (shipped in snapcraft snap revision 17123)
- python-apt 3.1.0 (latest Debian — confirmed at
https://sources.debian.org/src/python-apt/3.1.0/apt/package.py/#L53)
## Root cause
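`apt.package._file_is_same()` in python-apt returns `False` even when the
cached file matches the package record. The function contains two bugs, and
the second one causes the failure: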
**Bug 1 — file opened in text mode**
```python
with open(path) as fobj: # should be open(path, "rb")
```
`.deb` files are binary. On Linux this is harmless in practice because the
`apt_pkg` C extension reads the underlying file descriptor directly, but it is
incorrect.
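The difference between reading through the text-mode file API and reading from the underlying descriptor can be illustrated with a minimal sketch. It does not use python-apt: the helper names and the fake `.deb` payload are hypothetical, and `os.read()` on `fileno()` merely stands in for what the C extension is described as doing above.

```python
import os
import tempfile


def make_fake_deb():
    """Write a few non-UTF-8 bytes to a temp file (a stand-in for a real .deb)."""
    payload = b"!<arch>\n" + bytes([0xFF, 0xFE, 0x00, 0x80])
    with tempfile.NamedTemporaryFile(suffix=".deb", delete=False) as tmp:
        tmp.write(payload)
    return tmp.name, payload


def read_via_text_api(path):
    """Read through the text-mode API, which tries to decode the bytes."""
    with open(path, encoding="utf-8") as fobj:
        return fobj.read()  # raises UnicodeDecodeError on binary data


def read_via_descriptor(path, count=64):
    """Open in text mode, but read raw bytes from the file descriptor."""
    with open(path, encoding="utf-8") as fobj:
        return os.read(fobj.fileno(), count)
```

In this sketch `read_via_text_api()` fails on the binary payload while `read_via_descriptor()` returns it unchanged, which is consistent with the text-mode `open()` being harmless only as long as nothing reads through the Python file object.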
**Bug 2 — order-sensitive hash comparison (the real culprit)**
```python
return apt_pkg.Hashes(fobj).hashes == hashes
```
`HashStringList.__eq__` compares element-by-element (list semantics). The hash
order returned by `apt_pkg.Hashes(file)` differs from the order stored in the
package records:
| Source | Order |
|---|---|
| `apt_pkg.Hashes(file)` | MD5Sum, SHA1, SHA256, SHA512 |
| `package._records.hashes` | SHA512, SHA256, SHA1, MD5Sum |
The values are identical but the order differs, so `==` returns `False`.
`Version.fetch_binary()` therefore triggers a network download on every call.
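The effect of list semantics can be shown without python-apt. The sketch below is an illustration under assumptions: plain Python lists of `"Type:hexdigest"` strings stand in for `HashStringList`, and `hash_entries` is a hypothetical helper, not part of any library.

```python
import hashlib

# Map apt-style hash names to hashlib constructors.
ALGORITHMS = {
    "MD5Sum": hashlib.md5,
    "SHA1": hashlib.sha1,
    "SHA256": hashlib.sha256,
    "SHA512": hashlib.sha512,
}


def hash_entries(data, order):
    """Return "Type:hexdigest" strings for data, in the requested order."""
    return [f"{name}:{ALGORITHMS[name](data).hexdigest()}" for name in order]


payload = b"pretend this is a .deb file"

# Same content, hashed in the two orders listed in the table above.
from_file = hash_entries(payload, ["MD5Sum", "SHA1", "SHA256", "SHA512"])
from_records = hash_entries(payload, ["SHA512", "SHA256", "SHA1", "MD5Sum"])

list_equal = from_file == from_records           # element-by-element
set_equal = set(from_file) == set(from_records)  # order-independent

print(list_equal, set_equal)  # False True
```

The two lists hold exactly the same four entries, so only the comparison strategy decides whether the cached file is considered valid.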
## Impact
Every call to `craft_parts.LifecycleManager` that processes `stage-packages`
re-downloads all packages from the network, even after a successful previous
run with a warm cache. This affects every craft-parts consumer (snapcraft,
charmcraft, rockcraft, etc.).
## How to reproduce
The minimal parts specification is in `parts.yaml` next to this file:
```yaml
parts:
  demo:
    plugin: nil
    stage-packages:
      - libpng16-16
```
### Manual reproduction
From the `craft-parts` repository root, run the pull step twice, keeping the
same cache directory but wiping the work directory between runs:
```bash
# First run: cold cache, packages are downloaded
rm -rf /tmp/cp-cache /tmp/cp-work
python -m craft_parts -f tests/integration/packages/parts.yaml \
    --cache-dir /tmp/cp-cache --work-dir /tmp/cp-work --verbose pull
ls -l /tmp/cp-cache/download/*.deb  # note the timestamps

# Wipe work-dir state so craft-parts re-executes the pull step,
# but leave the deb cache intact
rm -rf /tmp/cp-work

# Second run: cache should be used, .deb timestamps must not change
python -m craft_parts -f tests/integration/packages/parts.yaml \
    --cache-dir /tmp/cp-cache --work-dir /tmp/cp-work --verbose pull
ls -l /tmp/cp-cache/download/*.deb  # compare timestamps with run 1
```
**Expected:** timestamps unchanged on the second `ls` — `.deb` files were
served from cache, no network activity.
**Actual (with bug):** timestamps updated — every `.deb` was re-downloaded
from the network despite being present in `/tmp/cp-cache/download/`.
Note: craft-parts logs `Downloading package: <name>` before calling
`fetch_binary()`, so those lines appear on both runs regardless and are not
a reliable indicator of actual network activity. Use the `.deb` file
timestamps or the automated test below.
### Automated test
An automated regression test is provided in `test_deb_cache.py` alongside this
file. Run it with:
```bash
pytest tests/integration/packages/test_deb_cache.py -v
```
The test fails with the bug present and lists every package that was
unnecessarily re-downloaded:
```
AssertionError: The following packages were re-downloaded on the second run
even though their .deb files were already present in .../cache/download:
['libpng16-16', 'libc6', 'libcrypt1', 'zlib1g', 'gcc-12-base', 'libgcc-s1']
```
## Proposed fix
Change `_file_is_same` in `apt/package.py` to use set comparison:
```python
def _file_is_same(path: str, size: int, hashes: apt_pkg.HashStringList) -> bool:
    if os.path.exists(path) and os.path.getsize(path) == size:
        with open(path, "rb") as fobj:
            return set(str(h) for h in apt_pkg.Hashes(fobj).hashes) == set(
                str(h) for h in hashes
            )
    return False
```
The fix addresses both bugs: binary file mode and order-independent comparison.
Until python-apt is patched, the fix can be applied as a monkey-patch in
craft-parts' `apt_cache.py` before calling `fetch_archives()`.
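One possible shape for such a monkey-patch is sketched below. It is an assumption-laden illustration, not the craft-parts implementation: `make_file_is_same` and `apply_patch` are hypothetical names, and the hashing step is passed in as a callable so the sketch can run without python-apt installed.

```python
import os


def make_file_is_same(hash_file):
    """Build an order-independent replacement for _file_is_same.

    hash_file is a hypothetical hook: it receives the open binary file
    and returns the hash entries computed from it.
    """

    def _file_is_same(path, size, hashes):
        if os.path.exists(path) and os.path.getsize(path) == size:
            with open(path, "rb") as fobj:
                computed = {str(h) for h in hash_file(fobj)}
            # Compare as sets so the order of the entries is irrelevant.
            return computed == {str(h) for h in hashes}
        return False

    return _file_is_same


def apply_patch(package_module, hash_file):
    """Replace the module-level _file_is_same on the given module object."""
    package_module._file_is_same = make_file_is_same(hash_file)


# Intended use with python-apt (not executed here; assumes apt is importable):
#   import apt.package
#   import apt_pkg
#   apply_patch(apt.package, lambda fobj: apt_pkg.Hashes(fobj).hashes)
```

Replacing the module attribute only takes effect if the caller looks up `_file_is_same` as a module global at call time, so this should be verified against the installed python-apt version before relying on it.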
Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1130691