postgresql连续归档及时间点恢复的操作

合集下载

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

postgresql连续归档及时间点恢复的操作
简介
前⾯我们介绍了通过pgsql的流复制在⽣产环境中搭建⾼可⽤环境来保证服务的可持续性；我们也要对数据库进⾏周期备份，来防⽌数据的丢失，这就需要连续归档，它不仅可以⽤于⼤型数据库的增量备份和恢复，也可以⽤于搭建standby镜像备份。

PostgreSQL默认处于⾮归档模式。

开启归档模式，主要涉及到三个参数：wal_level,archive_mode和archive_commandwal_level参数默认为mininal,设置此参数为archive或者之上的级别都可以打开归档。

当postgresql需要传输归档⽇志时，会调⽤archive_command指定的shell命令。

归档⽂件传输成功时，shell命令要返回0，此时，postgresql会认为归档⽂件已经传输成功，因此可以删除或者重新循环利⽤归档⽂件。

当shell命令返回⾮0值时，postgresql会保留所有未成功传输的归档⽇志，并不断尝试重新传输，直到成功。

如果归档命令⼀直不成功，pg_xlog⽬录会持续增长，有耗尽服务器存储空间的可能，此时postgresql会PANIC关闭，直到释放存储空间。

另外将归档WAL⽇志存储在本机上是风险极⾼，不被推荐的。

postgresql通过archive_command提供了存储WAL⽇志的灵活性，可以将归档⽇志存储到挂装的NFS⽬录，磁带，刻录到光盘，也可以将WAL⽇志通过ssh/scp，rsync传输到异机保存。

**注意:**archive_command及restore_command命令将以运⾏PostgreSQL的系统⽤户的⾝份运⾏。

Centos系统⾥，这个系统⽤户是postges。

环境说明
Role IP系统数据库
源库10.10.10.60Centos6.5postgresql 9.2
备份库10.10.10.61Centos6.5postgresql 9.2
需求说明：源库产⽣归档⽇志，并传输到备份库上的归档⽬录/data/pg_archive；备份库利⽤归档⽇志，恢复⾄源库的任意时间点的数据。

注意:基础环境我们基于postgresql流复制，但是备份库作为⼀个独⽴的库，此时请保证recovery.conf中的standby_mode=off
环境配置
1.ssh⽆密码登录
由于我们备份和还原过程中所⽤的archive_command和restore_command命令都以postgres⽤户运⾏，因此我们需要针对postgres⽤户实现ssh⽆密码登录。

#源库
ssh-ketgen -t rsa
scp id_rsa.pub postgres@10.10.10.60:/var/lib/pgsql/.ssh/authorized_keys
#备份库
ssh-ketgen -t rsa
scp id_rsa.pub postgres@10.10.10.61:/var/lib/pgsql/.ssh/authorized_keys
**注意:**yum安装postgresql时，默认⽣成的postgres⽤户的家⽬录在/var/lib/pgsql
2.配置备份库的归档⽬录
#备份库
mkdir -p /data/pg_archive
chmod postgres.postgres /data/pg_archive
说明：源库产⽣的归档⽇志，要存到到异地备份库的/data/pg_archive下。

3.修改源库的postgresql.conf
在postgresql.conf中添加以下⼏⾏
#开启归档模式
archive_mode = on
archive_command = 'ssh 10.10.10.60 test ! -f /data/pg_archive/%f && scp %p 10.10.10.60:/data/pg_archive/%f'
其中： %p表⽰wal⽇志⽂件的路径，%f表⽰wal⽇志⽂件名称。

archive_command表⽰先验证备份库的归档⽬录下是否存在同名⽂件，以免发⽣覆盖丢失数据，若不存在将源库上产⽣的归档⽇志保存到备份库的/data/pg_archive⽬录下。

注意:
(a)archive_timeout强制N秒以后进⾏⼀次归档,若设置太⼩，很快就会超过wal_keep_segments = 16，导致数据覆盖丢失，因此不要盲⽬设置。

(b)归档模式的开启，只有在wal_level = hot_standby或archive
4.重载源库并查看
pg_ctl reload -D /data/pgsql/data
postgres=# show archive_mode;
archive_mode
--------------
on
(1 row)
模拟归档备份
1.查看源库上的pg_xlog⽬录
-bash-4.2$ ll pg_xlog
total 16388
-rw-------. 1 postgres postgres 16777216 Apr 21 13:42 000000010000000000000001
drwx------. 2 postgres postgres 4096 Apr 21 13:36 archive_status
此时archive_status⽬录为存放归档⽇志的状态，若归档已经产⽣，但没有传输成功则为xxx.ready，并且⼀直会保留直⾄传输成功，然后状态变为xxx.done;此时⽬录为空
2.在源库上添加数据
此时由于数据库为空，我们来创建testdb库，并添加数据
postgres=# create database testdb;
CREATE DATABASE
postgres=# create table t1(id int4,create_time timestamp(0) without time zone);
CREATE TABLE
postgres=# insert into t1 values(1,now());
INSERT 0 1
postgres=# insert into t1 values(2,now());
INSERT 0 1
postgres=# select * from t1;
id | create_time
----+---------------------
1 | 2016-04-21 13:49:34
2 | 2016-04-21 13:49:48
(2 rows)
3.在源库上⼿动切换归档
postgres=# select pg_switch_xlog();
pg_switch_xlog
----------------
0/1821010
(1 row)
正常情况下，wal⽇志段在达到16M后会⾃动归档，由于测试我们使⽤⼿动切换归档。

4.查看源库pg_xlog⽬录
-bash-4.2$ ll pg_xlog/
total 16388
-rw-------. 1 postgres postgres 16777216 Apr 21 13:42 000000010000000000000001
drwx------. 2 postgres postgres 4096 Apr 21 13:36 archive_status
-bash-4.2$ ls pg_xlog/
000000010000000000000001 000000010000000000000002 archive_status
-bash-4.2$ ls pg_xlog/archive_status/
000000010000000000000001.ready
此时归档⽇志的状态为ready，说明此⽇志没有传输成功，查看⽇志
vim /data/pgsql/pg_log/postgresql-Thu.log
ssh: connect to host 10.10.10.60 port 22: Connection timed out^M
FATAL: archive command failed with exit code 255
DETAIL: The failed archive command was: ssh 10.10.10.68 test ! -f /data/pg_archive/000000010000000000000001 && scp pg_xlog/000000010000000000000001 10.10.10.60:/data/pg_archive/000000010000000000000001 LOG: archiver process (PID 22284) exited with exit code 1
原来是由于ip地址错误导致⽆法通过ssh传输，更改ip为10.10.10.61后，再次产⽣归档才能再次重新传输。

注意：触发归档有三种⽅式：
1.⼿动切换wal⽇志，select pg_switch_xlog()
2.wal⽇志写满后触发归档，配置⽂件默认达到16M后就会触发归档，wal_keep_segments = 16
3.归档超时触发归档，archive_timeout
在此我们使⽤的是⼿挡切换归档。

postgres=# insert into t1 values(3,now());
INSERT 0 1
postgres=# insert into t1 values(4,now());
INSERT 0 1
postgres=# select pg_switch_xlog();
pg_switch_xlog
----------------
0/2000310
(1 row)
postgres=# select pg_switch_xlog();
pg_switch_xlog
----------------
0/3000000
(1 row)
postgres=# select pg_switch_xlog();
pg_switch_xlog
----------------
0/30000D8
(1 row)
再次查看pg_xlog⽬录
-bash-4.2$ ll pg_xlog/archive_status/
total 0
-rw-------. 1 postgres postgres 0 Apr 21 13:51 000000010000000000000001.done
-rw-------. 1 postgres postgres 0 Apr 21 14:00 000000010000000000000002.done
-rw-------. 1 postgres postgres 0 Apr 21 14:04 000000010000000000000003.done
5.查看备份库上的归档⽬录
-bash-4.2$ ll /data/pg_archive/
total 49152
-rw-------. 1 postgres postgres 16777216 Apr 21 14:04 000000010000000000000001
-rw-------. 1 postgres postgres 16777216 Apr 21 14:04 000000010000000000000002
-rw-------. 1 postgres postgres 16777216 Apr 21 14:04 000000010000000000000003
⾄此，归档备份已经完成，下⾯我们要介绍利⽤归档进⾏恢复。

模拟从归档进⾏PITR恢复
PITR恢复是基于⽂件系统备份和wal⽂件的备份，因此⾸先我们需要个基础备份，然后在此基础备份上对wal归档⽇志进⾏回放。

具体步骤如下:
1.使⽤pg_basebackup进⾏基础备份
pg_basebackup使⽤replication复制协议，因此需要在源库上配置pg_hba.conf⽂件以允许replication，⽆论是本地还是通过⽹络。

vim pg_hba.conf
#添加以下两⾏，允许本地和⽹络上的replication⽤于pg_basebackup
host replication rep 127.0.0.1/32 md5
host replication rep 10.10.10.61/8 md5
#重载
pg_ctl reload -D /data/pgsql/data
添加完毕后请重载pgsql
在备份库上执⾏pg_basebackup进⾏远程的基础备份
-bash-4.2$ pg_basebackup -D /data/pgsql/data -Fp -Xs -v -P -h 10.10.10.61 -p 5432 -U rep
Password:
transaction log start point: 0/5000020
pg_basebackup: starting background WAL receiver
26664/26664 kB (100%), 1/1 tablespace
transaction log end point: 0/50000E0
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: base backup completed
-D 表⽰接受基础备份的⽬录，我们将基础备份放到/data/pgsql/data
-X 参数，在备份完成之后，会到主库上收集 pg_basebackup 执⾏期间产⽣的 WAL ⽇志，在 9.2 版本之后⽀持 -Xs 即stream 形式，这种模式不需要收集主库的 WAL ⽂件，⽽能以stream 复制⽅式直接追赶主库。

2.修改备库上配置⽂件
由于所有的配置⽂件是从源库上的备份过来的，因此我们需要修改：
vim postgresql.conf
屏蔽以下两⾏
#archive_mode = on
#archive_command = 'ssh 192.168.3.139 test ! -f /data/pg_archive/%f && scp %p 192.168.3.139:/data/pg_archive/%f'
3.查看源库上的时间确认需要的恢复时间点
postgres=# select * from t1;
id | create_time
----+---------------------
1 | 2016-04-21 13:49:34
2 | 2016-04-21 13:49:48
3 | 2016-04-21 14:00:22
4 | 2016-04-21 14:00:25
5 | 2016-04-21 14:49:11
6 | 2016-04-21 14:49:14
7 | 2016-04-21 14:49:17
(4 rows)
由于此次基础备份是在“ 4 | 2016-04-21 14:00:25”这条记录后归档，⽽后⾯的5，6，7三条记录是在基础备份后⽣成的，因此若恢复5，6，7中的记录需要在基础备份上通过回放
5，6，7的归档⽇志达到。

在此我们要将数据恢复到6这条记录下，需要在recovery.conf中做如下设置：
cp /usr/share/pgsql/recovery.conf.sample /data/pgsql/data/recovery.conf
vim recovery.conf
restore_command = 'cp /data/pg_archive/%f %p'
recovery_target_time = '2016-04-21 14:49:14'
**注意：**recovery.conf中standby_mode要为off，否则备份库将会以备库⾝份启动，⽽不是即时恢复。

4.启动备份库
备份库启动过程中，会进⾏PITR恢复到指定的时间点
pg_ctl start -D /data/pgsql/data
#查看⽇志
vim /data/pgsql/pg_log/postgresql-Thu.log
LOG: database system was interrupted; last known up at 2016-04-21 14:34:29 CST
LOG: starting point-in-time recovery to 2016-04-21 14:49:14+08
LOG: restored log file "000000010000000000000005" from archive
LOG: redo starts at 0/5000020
LOG: consistent recovery state reached at 0/50000E0
LOG: restored log file "000000010000000000000006" from archive
LOG: recovery stopping before commit of transaction 1898, time 2016-04-21 14:49:16.635744+08
LOG: redo done at 0/6000398
LOG: last completed transaction was at log time 2016-04-21 14:49:13.786388+08
cp: cannot stat ‘/data/pg_archive/00000002.history': No such file or directory
LOG: selected new timeline ID: 2
cp: cannot stat ‘/data/pg_archive/00000001.history': No such file or directory
LOG: archive recovery complete
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
#查看数据
postgres=# select * from t1;
id | create_time
----+---------------------
1 | 2016-04-21 13:49:34
2 | 2016-04-21 13:49:48
3 | 2016-04-21 14:00:22
4 | 2016-04-21 14:00:25
5 | 2016-04-21 14:49:11
6 | 2016-04-21 14:49:14
(6 rows)
7.查看备份库pg_xlog
-bash-4.2$ ll pg_xlog
total 49160
-rw-------. 1 postgres postgres 16777216 Apr 21 15:00 000000010000000000000005
-rw-------. 1 postgres postgres 16777216 Apr 21 15:00 000000010000000000000006
-rw-------. 1 postgres postgres 16777216 Apr 21 15:00 000000020000000000000006
-rw-------. 1 postgres postgres 64 Apr 21 15:00 00000002.history
drwx------. 2 postgres postgres 4096 Apr 21 15:00 archive_status
-bash-4.2$ cat pg_xlog/00000002.history
1 000000010000000000000006 before 2016-04-21 14:49:16.635744+08
从pg_xlog我们看到设置好recovery.conf⽂件后，启动数据库，将会产⽣新的timeline，id=2，⽽且会⽣成⼀个新的history⽂件00000002.history，⾥⾯记录的是新时间线2从什么时间哪个时间线什么原因分出来的，该⽂件可能含有多⾏记录。

另外，恢复的默认⾏为是沿着与当前基本备份相同的时间线恢复。

如果你想恢复到某些时间线，你需要指定的recovery.conf⽬标时间线recovery_target_timeline，不能恢复到早于基本备份分⽀的时间点。

注意:如果恢复过⼀次，并重新设置recovery_target_time，重新启动触发恢复，并不会基于时间线1进⾏恢复，⽽是基于时间线2进⾏恢复的，但是此时间线上在/data/pg_archive/并没有时间线为2的归档⽇志，因此会报错。

补充：postgres修改归档模式
步骤⼀：
修改postgresql的配置⽂件(postgresql.conf)
wal_level=hot_standby
archive_mode =on
archive_command ='DATE=`date +%Y%m%d`;DIR="/home/postgres/arch/$DATE";(test -d $DIR || mkdir -p $DIR)&& cp %p $DIR/%f' ps：%p 是指相对路径 %f是指⽂件名
步骤⼆：创建归档路径
mkdir -p /home/postgres/arch
chown -R postgres:postgres /home/postgres/arch
步骤三：重启数据库
步骤四：验证归档是否正常
postgres=# checkpoint;
CHECKPOINT
postgres=# select pg_switch_xlog();
pg_switch_xlog
----------------
1/760000E8
(1 row)
postgres@ubuntu:~$ cd /home/postgres/data/data_1999/arch/
postgres@ubuntu:~/data/data_1999/arch$ ls
20150603
postgres@ubuntu:~/data/data_1999/arch$ cd 20150603/
postgres@ubuntu:~/data/data_1999/arch/20150603$ ls
000000010000000100000074 000000010000000100000075 000000010000000100000076
以上为个⼈经验，希望能给⼤家⼀个参考，也希望⼤家多多⽀持。

如有错误或未考虑完全的地⽅，望不吝赐教。