
Investigation: Spread test failures are due to craft_providers snapd initialization issue


Metadata

Project: charmcraft
Number: #2509
Type: pull request
State: open
Author: Copilot
Labels: (none)
Created: 2025-12-06 03:19:43+00:00
Updated: 2025-12-08 21:24:25+00:00
Closed: (not closed)


Issue body

This PR documents the investigation into spread test failures that show "daemon is stopping to wait for socket activation" and "Failed to wait for snap refreshes to complete" errors in smoketests/reactive and smoketests/different-dir.

## Investigation Results

The failures occur **inside** LXD containers created by the craft_providers library during charm builds, not on the host system where spread tests run. The error happens when craft_providers sets up base LXD containers and runs `snap watch --last=auto-refresh?` inside those containers while snapd is in a transitional state.

## Root Cause

The issue is in the craft_providers library's container initialization code (specifically in `craft_providers/base.py` in the `_disable_and_wait_for_snap_refresh` method around line 616). The spread.yaml prepare sections only configure the host system (Google Cloud instances) and cannot affect the internal setup of LXD containers created by charmcraft during builds.

## Bug Report for craft-providers

A comprehensive bug report has been prepared and documented in `craft-providers-bug-report.md` that should be filed at: https://github.com/canonical/craft-providers/issues

The report includes:

- Complete error details and stack traces from the failing builds
- Root cause analysis of the snapd transitional state issue
- Three proposed solutions with detailed code examples:
  1. Retry logic with exponential backoff (recommended)
  2. Enhanced snapd readiness checks before snap operations
  3. Graceful degradation with warnings
- Impact assessment on charmcraft and other craft-providers consumers
- Steps to reproduce the issue

### Recommended Fix

The bug report recommends implementing retry logic with exponential backoff in the `_disable_and_wait_for_snap_refresh` method to handle the transient "daemon is stopping to wait for socket activation" error gracefully. This approach would allow the container initialization to succeed even when snapd is temporarily in a transitional state.
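As a rough illustration of the recommended approach, the sketch below shows generic retry logic with exponential backoff in Python. It is not the craft_providers implementation: `retry_with_backoff` and `TransientSnapdError` are hypothetical names introduced here, and the attempt count and delay values are assumptions chosen only for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientSnapdError(Exception):
    """Hypothetical stand-in for a transient snapd failure (not a craft_providers class)."""


def retry_with_backoff(
    operation: Callable[[], T],
    *,
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``operation`` until it succeeds or ``attempts`` calls have failed.

    The wait doubles after each failure (base_delay, 2x, 4x, ...) and is
    capped at ``max_delay``. The final failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TransientSnapdError:
            if attempt == attempts - 1:
                # Out of retries: surface the original error to the caller.
                raise
            sleep(min(base_delay * 2**attempt, max_delay))
    raise AssertionError("unreachable")
```

In a real fix, `operation` would presumably wrap the snap command that currently fails, and only errors known to be transient would be retried, so that genuine failures still surface quickly. The `sleep` parameter is injectable so the logic can be tested without real delays.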
The fix needs to be made in the craft_providers library, as this is outside the scope of what can be fixed through spread.yaml modifications in the charmcraft repository.

---

- [ ] I've followed the [contribution guidelines](https://github.com/canonical/charmcraft/blob/main/CONTRIBUTING.md).
- [ ] I've signed the [CLA](http://www.ubuntu.com/legal/contributors/).
- [ ] I've successfully run `make lint && make test`.
- [ ] I've added or updated any relevant documentation.
- [ ] I've updated the relevant release notes.

<details>

<summary>Original prompt</summary>

> # Fix Spread Test Job Failure: LXD Snap Refresh
>
> Recent workflow runs ([see logs](https://github.com/canonical/charmcraft/actions/runs/19978096550/job/57299831626)) show that spread tests are failing due to errors like:
>
> ```
> error: daemon is stopping to wait for socket activation
> craft_providers.lxd.errors.LXDError: Failed to wait for snap refreshes to complete.
> ```
> This problem occurs during container setup when waiting for snapd refreshes in LXD containers, causing build/test failures in jobs such as `smoketests/reactive:stable` and `smoketests/different-dir`.
>
> ## Solution
> Modify the setup steps in `.github/workflows/spread.yaml` to ensure snapd is running and healthy before spread/charmcraft commands are executed in the test container. This can be done by adding commands to restart snapd and wait for its readiness inside the container.
>
> ### Example fix (snippet):
> Add before running charmcraft/spread tests:
> ```yaml
> - name: Ensure snapd is running in the LXD container
>   run: |
>     lxc exec <container-name> -- systemctl restart snapd
>     lxc exec <container-name> -- snap wait system seed.loaded
> ```
> Replace `<container-name>` with the actual name of the LXD instance used in the workflow setup.
>
> ### Screenshots/logs referenced for failure
> Refer to the job log for the full error trace and explanation: https://github.com/canonical/charmcraft/actions/runs/19978096550/job/57299831626
>
> # Images
> None attached.

</details>

*This pull request was created as a result of the prompt above from Copilot chat.*