8000
Skip to content

lxd: Resolve the intermittent "file already closed" errors in distributeImage#15674

Merged
tomponline merged 9 commits intocanonical:mainfrom
tomponline:tp-distribute-image
Jun 3, 2025
Merged

lxd: Resolve the intermittent "file already closed" errors in distributeImage#15674
tomponline merged 9 commits intocanonical:mainfrom
tomponline:tp-distribute-image

Conversation

@tomponline
Copy link
Copy Markdown
Member
@tomponline tomponline commented May 28, 2025

By switching to a per-cluster member local function with defer for closing file handles, rather than bundling them up until the end of the overall function using the reverter (its cleaner too).

Also improves logging and extends image auto update tests to check for correct image file distribution (with comments justifying why LXD has the behaviour it has).

@tomponline tomponline self-assigned this May 28, 2025
@tomponline tomponline force-pushed the tp-distribute-image branch 2 times, most recently from a98e6e9 to 628063e Compare May 28, 2025 13:16
@tomponline tomponline requested a review from simondeziel May 28, 2025 22:04
Copy link
Copy Markdown
Member
@simondeziel simondeziel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it but it'd be best to ask someone else to review the Go part. It certainly read better to my untrained eyes :)

Comment thread lxd/images.go Outdated
Comment thread test/suites/clustering.sh Outdated
Comment thread test/suites/clustering.sh Outdated
Comment thread test/suites/clustering.sh Outdated
…ributeImage

Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
… in distributeImage

Allows removal of reverter package usage and simplifies file closing logic.

With the added benefit that image file handles are closed at the end of each member update.

Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
@tomponline tomponline force-pushed the tp-distribute-image branch from 628063e to d7763b5 Compare June 3, 2025 07:39
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
@tomponline tomponline marked this pull request as ready for review June 3, 2025 07:51
…age file distribution

Signed-off-by: Thomas Parrott <thomas.parrott@canonical.com>
@tomponline tomponline changed the title lxd: Attempt to resolve the intermittent "file already closed" errors in distributeImage lxd: Resolve the intermittent "file already closed" errors in distributeImage Jun 3, 2025
Copy link
Copy Markdown
Contributor
@roosterfish roosterfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@tomponline tomponline merged commit a161565 into canonical:main Jun 3, 2025
28 checks passed
@tomponline tomponline deleted the tp-distribute-image branch June 3, 2025 10:48
@simondeziel
Copy link
Copy Markdown
Member

I am really happy that you've been able to put this bug to bed for good (fingers crossed)! Thanks

@simondeziel
Copy link
Copy Markdown
Member

Sigh, seems that it didn't fully fix the problem: https://github.com/canonical/lxd/actions/runs/15428110615/job/43420377825?pr=15711#step:15:32370

@tomponline
Copy link
8000 Copy Markdown
Member Author

Sigh, seems that it didn't fully fix the problem: https://github.com/canonical/lxd/actions/runs/15428110615/job/43420377825?pr=15711#step:15:32370

Ill take a look. Its a different error than before:

time="2025-06-03T21:37:57Z" level=error msg="Error deleting old image from database" ID=4 err="database is locked" fingerprint=119e2f4d376a0d9c3c3576bc6ef6e77e8e4146c6b1edf2f0b8dded3354cfed57
time="2025-06-03T21:37:57Z" level=error msg="Error deleting old image from database" ID=5 err="database is locked" fingerprint=119e2f4d376a0d9c3c3576bc6ef6e77e8e4146c6b1edf2f0b8dded3354cfed57

@tomponline
Copy link
Copy Markdown
Member Author

I've identified the issue i think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

0