Error when restoring from a remote archive wal #654

@Dsolik

Good afternoon!

I use pg_probackup to take FULL and DELTA backups to a remote server, and a remote WAL archive is also configured. The FULL and DELTA backups are taken from a replica; WAL archiving runs on the leader.

==============================================================================================================================================
Instance  Version  ID      Recovery Time           Mode   WAL Mode  TLI  Time    Data    WAL    Zratio  Start LSN       Stop LSN        Status
==============================================================================================================================================
replica   15       T08FTP  2025-07-31 02:25:25+03  DELTA  STREAM    9/8  1h:24m  229GB   80GB   2.63    26353/4371AF20  26367/23FFE570  OK
replica   15       T06L5P  2025-07-30 02:28:27+03  DELTA  STREAM    8/8  1h:27m  260GB   80GB   2.55    26221/991DCA28  26235/ACEC6550  OK
replica   15       T04QHP  2025-07-29 02:20:50+03  DELTA  STREAM    8/8  1h:19m  218GB   73GB   2.63    260D9/58567E80  260EB/908AD6B8  OK
replica   15       T02VTP  2025-07-28 02:20:13+03  DELTA  STREAM    8/8  1h:19m  252GB   68GB   2.58    25FB7/3AC43DF8  25FC8/26E9CD10  OK
replica   15       SZZ6HP  2025-07-26 14:39:53+03  FULL   STREAM    8/0  14h:7m  7042GB  541GB  2.41    25E53/3A642E50  25EDA/5CD0BF68  OK

When restoring to the point in time 2025-07-28 16:00:00+03:
[backup_user@backup ~]$ time pg_probackup-15 restore -B /mnt/backup_store/pg_probackup -D /backup/restore --instance=replica -j8 --restore-as-replica --recovery-target-time='2025-07-28 16:00:00+03' --no-validate --remote-proto=ssh --remote-host=replica --remote-user=postgres --db-include=postgres -i T02VTP

I got the following error:
2025-07-30 22:37:04 MSK [47440]: [1-1]: INFO: pg_probackup archive-get used prefetched WAL segment 0000000800025FF000000092, prefetch state: 7/8
2025-07-30 22:37:04 MSK [47440]: [1-1]: INFO: pg_probackup archive-get completed successfully, fetched: 0/8, time elapsed: 14ms
2025-07-30 22:37:04 MSK [10116]: [14690-1] app=,user=,db=,client= LOG: restored log file "0000000800025FF000000092" from archive
2025-07-30 22:37:04 MSK [47443]: [1-1]: INFO: pg_probackup archive-get WAL file: 0000000800025FF000000093, remote: ssh, threads: 1/1, batch: 8
2025-07-30 22:37:04 MSK [47443]: [1-1]: INFO: pg_probackup archive-get used prefetched WAL segment 0000000800025FF000000093, prefetch state: 6/8
2025-07-30 22:37:04 MSK [47443]: [1-1]: INFO: pg_probackup archive-get completed successfully, fetched: 0/8, time elapsed: 13ms
2025-07-30 22:37:04 MSK [10116]: [14691-1] app=,user=,db=,client= LOG: restored log file "0000000800025FF000000093" from archive
2025-07-30 22:37:05 MSK [47447]: [1-1]: INFO: pg_probackup archive-get WAL file: 0000000800025FF000000094, remote: ssh, threads: 1/1, batch: 8
2025-07-30 22:37:05 MSK [47447]: [1-1]: INFO: pg_probackup archive-get used prefetched WAL segment 0000000800025FF000000094, prefetch state: 5/8
2025-07-30 22:37:05 MSK [47447]: [1-1]: INFO: pg_probackup archive-get completed successfully, fetched: 0/8, time elapsed: 14ms
2025-07-30 22:37:05 MSK [10116]: [14692-1] app=,user=,db=,client= LOG: restored log file "0000000800025FF000000094" from archive
2025-07-30 22:37:05 MSK [10116]: [14693-1] app=,user=,db=,client= WARNING: page 105 of relation base/217902/488441176 is uninitialized
2025-07-30 22:37:05 MSK [10116]: [14694-1] app=,user=,db=,client= CONTEXT: WAL redo at 25FF0/9439A4E0 for Hash/DELETE: clear_dead_marking F, is_primary F; blkref #0: rel 1663/217902/488441176, blk 105; blkref #1: rel 1663/217902/488441176, blk 176 FPW
2025-07-30 22:37:05 MSK [10116]: [14695-1] app=,user=,db=,client= PANIC: WAL contains references to invalid pages
2025-07-30 22:37:05 MSK [10116]: [14696-1] app=,user=,db=,client= CONTEXT: WAL redo at 25FF0/9439A4E0 for Hash/DELETE: clear_dead_marking F, is_primary F; blkref #0: rel 1663/217902/488441176, blk 105; blkref #1: rel 1663/217902/488441176, blk 176 FPW
2025-07-30 22:37:05 MSK [10111]: [8-1] app=,user=,db=,client= LOG: startup process (PID 10116) was terminated by signal 6: Aborted
2025-07-30 22:37:05 MSK [10111]: [9-1] app=,user=,db=,client= LOG: terminating any other active server processes
2025-07-30 22:37:05 MSK [10111]: [10-1] app=,user=,db=,client= LOG: shutting down due to startup process failure
2025-07-30 22:37:05 MSK [10111]: [11-1] app=,user=,db=,client= LOG: database system is shut down
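
For reference, the `blkref` entries in the CONTEXT lines above can be unpacked mechanically. The sketch below is only an illustration: the helper name `parse_blkrefs` and the regular expression are my own, not part of pg_probackup or PostgreSQL. It reads each reference as tablespace OID / database OID / relfilenode plus a block number. OID 1663 is the default `pg_default` tablespace, and the `Hash/DELETE` record type suggests the relation is a hash index.

```python
import re

# Matches block references such as "blkref #0: rel 1663/217902/488441176, blk 105"
# (hypothetical helper pattern, written for the log format shown above)
BLKREF_RE = re.compile(r"blkref #(\d+): rel (\d+)/(\d+)/(\d+), blk (\d+)( FPW)?")


def parse_blkrefs(context_line):
    """Return one dict per block reference found in a WAL redo CONTEXT line."""
    refs = []
    for m in BLKREF_RE.finditer(context_line):
        refs.append({
            "ref": int(m.group(1)),
            "tablespace_oid": int(m.group(2)),
            "database_oid": int(m.group(3)),
            "relfilenode": int(m.group(4)),
            "block": int(m.group(5)),
            # "FPW" marks a reference that carries a full-page image
            "full_page_image": m.group(6) is not None,
        })
    return refs


line = (
    "WAL redo at 25FF0/9439A4E0 for Hash/DELETE: clear_dead_marking F, is_primary F; "
    "blkref #0: rel 1663/217902/488441176, blk 105; "
    "blkref #1: rel 1663/217902/488441176, blk 176 FPW"
)
for ref in parse_blkrefs(line):
    print(ref)
```

Here the page reported as uninitialized (block 105) is reference #0, which has no full-page image, while block 176 does carry one.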

Could you tell me whether the problem is with WAL segment 0000000800025FF000000094? At what stage could it have been corrupted?
Or is the problem with relation base/217902/488441176? I currently cannot find relation base/217902/488441176 in the database.
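
To double-check that the failing redo position 25FF0/9439A4E0 really lies in segment 0000000800025FF000000094, the LSN can be mapped to a WAL file name. This is a minimal sketch that assumes the default 16 MB `wal_segment_size` and timeline 8 (taken from the segment names in the log); the function name `wal_file_name` is my own.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size (16 MB)


def wal_file_name(timeline, lsn, segment_size=WAL_SEGMENT_SIZE):
    """Map an LSN written as 'HI/LO' (hex) to the WAL segment file that contains it."""
    hi_str, lo_str = lsn.split("/")
    position = (int(hi_str, 16) << 32) | int(lo_str, 16)  # 64-bit byte position
    segno = position // segment_size
    segments_per_xlogid = 0x100000000 // segment_size  # segments per 4 GB "log id"
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )


print(wal_file_name(8, "25FF0/9439A4E0"))  # prints 0000000800025FF000000094
```

So the record that triggers the PANIC is indeed inside the last segment restored before the failure.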

Archiving log for WAL segment 0000000800025FF000000094 on the leader:
2025-07-28 05:21:36 MSK [59369]: [1-1]: INFO: pg_probackup archive-push WAL file: 0000000800025FF000000092, threads: 1/4, batch: 1/8, compression: zlib
2025-07-28 05:21:37 MSK [59369]: [1-1]: INFO: pg_probackup archive-push completed successfully, pushed: 1, skipped: 0, time elapsed: 593ms
2025-07-28 05:21:44 MSK [59957]: [1-1]: INFO: pg_probackup archive-push WAL file: 0000000800025FF000000093, threads: 1/4, batch: 1/8, compression: zlib
2025-07-28 05:21:44 MSK [59957]: [1-1]: INFO: pg_probackup archive-push completed successfully, pushed: 1, skipped: 0, time elapsed: 564ms
2025-07-28 05:21:50 MSK [60240]: [1-1]: INFO: pg_probackup archive-push WAL file: 0000000800025FF000000094, threads: 1/4, batch: 1/8, compression: zlib
2025-07-28 05:21:51 MSK [60240]: [1-1]: INFO: pg_probackup archive-push completed successfully, pushed: 1, skipped: 0, time elapsed: 586ms
2025-07-28 05:21:53 MSK [60336]: [1-1]: INFO: pg_probackup archive-push WAL file: 0000000800025FF000000095, threads: 1/4, batch: 1/8, compression: zlib
2025-07-28 05:21:54 MSK [60336]: [1-1]: INFO: pg_probackup archive-push completed successfully, pushed: 1, skipped: 0, time elapsed: 985ms
