fix: skip checkTableIsMaster during FK table restoration to prevent flaky restore failures#24066
fix: skip checkTableIsMaster during FK table restoration to prevent flaky restore failures#24066XuPeng-SH wants to merge 4 commits intomatrixorigin:mainfrom
Conversation
Review Summary by QodoFix FK table restoration by skipping master check during topo-sorted restore
WalkthroughsDescription• Add skipMasterCheck parameter to recreateTable() to prevent incorrect skipping of parent tables during FK-aware snapshot restore • Pass skipMasterCheck=true from restoreTablesWithFk() to ensure all topo-sorted FK tables are restored regardless of master status • Fix silent error swallowing by adding missing error check after checkTableIsMaster() call • Add comprehensive test coverage (11 new test functions) for FK restoration logic and master table detection Diagramflowchart LR
A["restoreTablesWithFk<br/>topo-sorted FK tables"] -->|skipMasterCheck=true| B["recreateTable"]
C["restoreToDatabaseOrTable<br/>regular restore"] -->|skipMasterCheck=false| B
D["restoreSystemDatabase<br/>system DB restore"] -->|skipMasterCheck=false| B
B -->|skipMasterCheck=true| E["Skip master check<br/>restore all tables"]
B -->|skipMasterCheck=false| F["Check if master table<br/>skip if true"]
E --> G["Parent tables always restored<br/>FK validation succeeds"]
F --> H["Preserve existing behavior<br/>skip master tables"]
File Changes1. pkg/frontend/snapshot.go
|
Code Review by Qodo
1. Unsafe master-check bypass
|
f73e4bb to
50477bc
Compare
|
I don't think this is ready to close #23955 yet.
Before merging, I think we need:
|
|
Thanks for the thorough review @gouhongshen! I've addressed your concerns: 1. ExpectedEOB / same-txn drop+recreate issue: 2. Unconditional skipMasterCheck bypass: skipMasterCheck := restoreLevel != tree.RESTORELEVELTABLEFor TABLE level restore, since 3. Tests: Please review the updated changes. |
…laky restore failures
During snapshot restore, restoreTablesWithFk() creates FK-related tables in
topological order (parents first, then children). Previously, recreateTable()
always called checkTableIsMaster(), which queries mo_foreign_keys to determine
if a table is referenced by foreign keys. For parent tables, this check could
return true (due to stale FK entries from a previous restore within the same
transaction), causing the parent to be incorrectly skipped.
Changes:
1. Add skipMasterCheck parameter to recreateTable() - skip the master check
for tables being restored as part of FK dependency chain.
2. Add IgnoreForeignKey context value for clone operations - this skips FK
validation during restore, as FK tables are already restored in topological
order. Without this, Resolve() can fail with ExpectedEOB when the database
was drop+recreated in the same transaction.
3. Make skipMasterCheck conditional based on restore level:
- For TABLE level restore, getFkDeps() only captures outgoing FK edges
(child->parent), not incoming references. So we cannot unconditionally
skip master check for TABLE restore, as we might drop/recreate a table
that live tables outside the restore scope reference.
- For ACCOUNT and DATABASE level restores, all tables in the scope are
being restored together, so we can safely skip the master check.
Fixes: matrixorigin#23955
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nt table restoration During snapshot restore, checkTableIsMaster() checks if a table is referenced by other tables via foreign keys. Previously, this check would incorrectly block the restoration of parent tables when: 1. Child table was restored first (e.g., restore table aff01) 2. Parent table restore attempted later (e.g., restore table pri01) 3. checkTableIsMaster(pri01) returns true because aff01 references it 4. Parent table restoration is skipped This caused BVT test failures in nonsys_create_snapshot_for_nonsys_db_table_level.sql where sequential 'restore table' commands for FK-related tables would fail. Changes: 1. restoreTablesWithFk: Always skip master check (skipMasterCheck=true) for all tables being restored as a group. These tables are processed in topological order, so FK references between them will be properly re-established. 2. restoreToDatabaseOrTable: Skip master check for non-FK tables too. When a user explicitly specifies a table to restore, they should be allowed to do so regardless of existing FK references from other tables. 3. restoreSystemDatabase: Skip master check for system tables being restored as part of the system database. This is a more comprehensive fix that addresses both the original flaky test issue (matrixorigin#23955) and the BVT test failures caused by the previous partial fix. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make PITR restore behavior consistent with snapshot restore: - restoreToDatabaseOrTableWithPitr: skipMasterCheck=true - restoreSystemDatabaseWithPitr: skipMasterCheck=true This ensures PITR restore doesn't have the same FK parent table blocking issue that was fixed for snapshot restore. Co-authore 72A0 d-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirror the snapshot restore fix: use IgnoreForeignKey context when executing INSERT SELECT during PITR restore. This prevents Resolve() failures with ExpectedEOB when the database was drop+recreated in the same transaction. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
What type of PR is this?
Which issue(s) does this PR fix/address?
issue #23955
What this PR does / why we need it
During cross-account snapshot restore (
restore account acc01{snapshot="sp07"} to account acc02), FK-related tables are restored in topological order: parent tables first, then child tables. TherestoreTablesWithFk()function callsrecreateTable()for each FK table.The Bug: Inside
recreateTable(),checkTableIsMaster()queriesmo_foreign_keysto determine if a table is a "master" (parent) table referenced by foreign keys. If this returns true, the table is skipped — its restore is silently aborted. This is correct behavior indeleteCurFkTables()(where we want to preserve parent tables during pre-drop), but it is wrong inrestoreTablesWithFk()where ALL tables in the topo-sorted list must be restored.When stale FK entries remain in
mo_foreign_keys(from a previous restore in the same transaction, or due to incomplete cleanup duringDROP DATABASE),checkTableIsMaster()can return true for parent tables. The parent is skipped, and then the child's FK validation fails with "no such table" because the parent was never created.Additional fix: The existing code did not check the error return from
checkTableIsMaster(). If the check query failed, the error was silently swallowed. This PR adds the missing error propagation.Changes
pkg/frontend/snapshot.go(+15/-9)skipMasterCheck boolparameter torecreateTable()skipMasterCheck=true: skip thecheckTableIsMaster()query entirelyskipMasterCheck=false: preserve existing behavior (used in non-FK callers)if err != nil { return }aftercheckTableIsMaster()callrestoreTablesWithFk()→ passesskipMasterCheck=true(fix)restoreToDatabaseOrTable()→ passesskipMasterCheck=false(existing behavior)restoreSystemDatabase()→ passesskipMasterCheck=false(existing behavior)pkg/frontend/snapshot_test.go(+358)11 new test functions covering:
Test_recreateTable_SkipMasterCheck— skipMasterCheck=true bypasses master checkTest_recreateTable_MasterTableSkipped— skipMasterCheck=false skips master tablesTest_recreateTable_NonMasterProceeds— skipMasterCheck=false proceeds for non-masterTest_recreateTable_MasterCheckError— error propagation from checkTableIsMasterTest_recreateTable_CrossAccountRestore— cross-account ExecRestore pathTest_restoreTablesWithFk_AlwaysRestoresParentTables— key test: parent tables NOT skippedTest_restoreTablesWithFk_SkipsNilEntries— nil fkTableMap entries skippedTest_restoreTablesWithFk_EmptyList— empty list succeedsTest_checkTableIsMaster— true/false return casesTest_checkDatabaseIsMaster— true/false return casesTest_deleteCurFkTables_SkipsMasterTables— existing deleteCurFk behavior preservedCoverage: restoreTablesWithFk 85.7%, checkTableIsMaster 90.9%, checkDatabaseIsMaster 81.8%, recreateTable 69% (limited by 10 lines of dead code in the
isRestoreByCloneSqlbranch that is never taken).