HACMP 5.1 Configuration Methods


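HACMP Configuration Document

1 HA and Cold Standby: Installation, Configuration, and Verification
1.1 HACMP Configuration
1.1.1 HACMP Installation

1. Check that every machine has detected all of its disks (including internal disks).

lspv
lsdev -Cc disk

If a machine does not see the shared disks, run cfgmgr.

cfgmgr

2. Add TTYs to the system.

If an 8-port card is installed, there should be five TTYs in total, on sa0 through sa4; add them in order (0 to 4).

smitty tty -> Add a TTY -> rs232 -> saX -> Port Number: 0

3. Test the TTYs attached to the 8-port card.

Run the following on both machines at the same time:

stty </dev/ttyX

X should be 4 (when using the 8-port card) or 3 (when using serial port 4). Output should then appear on both sides.

4. Install HACMP/ES 5.4.1 and its fixes.

Insert the HACMP/ES CD and install the HA filesets.

Install every fileset except cluster.haview, cluster.hativoli, and the msg language filesets.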
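The exclusion rule above can be sketched as a small filter. This is only an illustration: `select_hacmp_filesets` is a hypothetical helper name, the sample fileset names are placeholders, and it assumes the message filesets are the ones named `cluster.msg.*`.

```shell
# Hypothetical helper: read fileset names on stdin (one per line) and drop
# the filesets the text says not to install.
# Assumption: the message-language filesets are named cluster.msg.*
select_hacmp_filesets() {
    grep -v -E '^cluster\.(haview|hativoli|msg)(\.|$)'
}

# Illustrative input only; the real list comes from the installation media.
printf '%s\n' \
    cluster.es.server.rte \
    cluster.haview.client \
    cluster.hativoli.server \
    cluster.msg.en_US.es.server \
    cluster.es.cspoc.cmds | select_hacmp_filesets
```

The remaining names are the ones to select in the SMIT installation panel.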

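Install the 5.4.1 fix pack SP06: from /eserver/support/fixes/fixcentral/psearch?searchstring=latest+ha+R54+fixes&searchtype=apar&release=53 select IZ57986, then download and install it. Reboot the system after installation.

Verify the fix installation:

lslpp -l | grep cluster

and check that the fileset levels have gone up.

1.1.2 HACMP Configuration

1. Configure the IP addresses. The following is an example.

Machine A (production)
IP        IP_label
1.1.1.1   A_svc
1.1.1.2   A_boot
1.1.2.1   A_stdby
1.1.1.3   B_svc
1.1.1.4   B_boot
1.1.2.2   B_stdby

Machine B (backup)
IP        IP_label
1.1.1.1   A_svc
1.1.1.2   A_boot
1.1.2.1   A_stdby
1.1.1.3   B_svc
1.1.1.4   B_boot
1.1.2.2   B_stdby

After boot and before HA is started, the boot and standby addresses are active. Once HA starts, the service address replaces the boot address. If the adapter carrying the service address fails, the adapter carrying the standby address takes over.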
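In the example plan, the boot and service addresses share 1.1.1.x while the standby addresses use 1.1.2.x. A minimal sketch for checking that separation follows. It assumes a 255.255.255.0 netmask, which the plan does not state, and `net24` and `different_subnet24` are hypothetical helper names.

```shell
# Print the /24 network part of a dotted-quad IPv4 address.
# Assumption: a 255.255.255.0 netmask, so the network is the first three octets.
net24() {
    echo "${1%.*}"
}

# Succeed (exit status 0) when the two addresses are in different /24 subnets.
different_subnet24() {
    [ "$(net24 "$1")" != "$(net24 "$2")" ]
}

# A_boot versus A_stdby from the example plan.
if different_subnet24 1.1.1.2 1.1.2.1; then
    echo "boot and standby are on different subnets"
fi
```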

Configuring HACMP with SSA Target Mode

As shown in the figure: [Figure: lab environment topology]

We run the following commands in turn on the F85:

Planning the cluster

Our cluster plan is as follows:

                     43P140                                   F85
Cluster name         db2_cluster                              db2_cluster
Node name            db2_140                                  db2_85
Network              net_ether_01                             net_ether_01
Network              test140_boot, test140_stb, test140_svc   test85_boot, test85_stb, test85_svc
Non-IP network       net_tmssa_01                             net_tmssa_01
Non-IP network       tmssa_140 /dev/tmssa140                  tmssa_85 /dev/tmssa85
Resource group
Group name           db2_gr1                                  db2_gr2
Cluster mode         cascading                                cascading
Cluster nodes        db2_140, db2_85                          db2_85, db2_140
Service IP label     test140_svc                              test85_svc
Shared VG            db2_vg                                   db2_vg
Shared LV            db2_lv                                   db2_lv
Shared FS            db2_fs                                   db2_fs
Application server   db2_svr                                  db2_svr

Network failure

If all K-A (keep-alive) packets sent to the service and standby adapters on node1 are lost while K-A packets on the non-TCP/IP network are still arriving, HACMP concludes that node1 is still healthy and that the network has failed.

HACMP then runs a network_down event.

Network adapter cable failure: run # ps -ef | grep cluster to confirm that HACMP is running on all nodes.

Run # errclear 0 to clear the system error log.

Run # tail -f /tmp/hacmp.out to monitor HACMP activity.
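When reviewing a test afterwards, it can help to pull only the event names out of the log. The sketch below assumes that hacmp.out marks each event with a line containing `EVENT START: <event name>`; `list_started_events` is a hypothetical helper name.

```shell
# List the names of the cluster events started in an hacmp.out-style log.
# Assumption: each event is announced by a line containing
# "EVENT START: <event name> <arguments...>".
# $1: path to the log file (for example a copy of /tmp/hacmp.out)
list_started_events() {
    sed -n 's/.*EVENT START: \([^ ]*\).*/\1/p' "$1"
}
```

After a cable-pull test, running `list_started_events /tmp/hacmp.out` should show whether a network_down or adapter-level event was triggered.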

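HACMP Configuration and Maintenance Manual

POWER HA 5.5 Configuration and Maintenance Manual
September 2, 2010

Contents
Chapter 1 HACMP Configuration of the Integrated System
§1.1 System structure diagram
§1.2 Topology planning
§1.3 Disk resource planning
§1.4 Application planning
§1.5 Operating system requirements
Chapter 2 Routine HACMP Maintenance
§2.1 Normal start and stop of HACMP services
§2.1.1 Starting HACMP
§2.1.2 Stopping HA
§2.2 Checking HACMP cluster service status
§2.2.1 Checking HACMP service status
§2.2.2 Checking resource group status
§2.2.3 Checking HACMP cluster status
Chapter 3 System Switchover Plan
§3.1 Application failure on rlw1: HACMP resource switchover
§3.1.1 Switching the rlw1_apprg resource group
§3.1.2 Restoring the rlw1_apprg resource group
§3.2 Application failure on hg2: HACMP resource switchover
§3.2.1 Switching the rlw2_orarg resource group
§3.2.2 Restoring the rlw2_orarg resource group
Chapter 4 HACMP Switchover Tests
§4.1 Simulated network adapter failure
§4.2 Simulated single-node failure of rlw1
§4.3 Simulated single-node failure of rlw2
§4.4 Manual HA switchover test on rlw1
§4.5 Manual HA switchover test on rlw2

Chapter 1 HACMP Configuration of the Integrated System

§1.1 System structure diagram

§1.2 Topology planning

[Diagram labels: P780 (1) host (LPAR rlw1) with ent0 and ent2; P780 (2) host (LPAR rlw2) with ent0 and ent2; shared disk array (HDS USPV storage system); heartbeat line (tty0); Service Network; Persistent Network]

On rlw1, the boot1 address is configured on the first external adapter (en0), the boot2 address on the second external adapter (en2), and the persistent IP address is bound to the second external adapter (en2). rlw2 is configured the same way: boot1 on en0, boot2 on en2, and the persistent IP address on en2.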

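Changing the Boot IP, Persistent IP, and Service IP in HACMP (Very Detailed)

Environment: AIX 5307; HACMP 5.4.1
Problem: the boot IP, persistent IP, and service IP addresses have to change.

Stop HACMP, then do the following.

Changing the boot IP
1. Run smit tcpip and change the adapter address.
2. Change the boot address in /etc/hosts.
3. Update the HACMP configuration:
smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Change/Show a Communication Interface -> Change/Show Communication Interfaces/Devices
Select the adapter you changed (ent0 in the original screenshot) and press Enter. On the panel that appears, make no changes and press Enter again. Change the other boot addresses the same way.

Changing the persistent IP and service IP
1. On all nodes (fep1 and fep2), edit /etc/hosts and change the persistent IPs and shared IPs to the new addresses:
#persist ip
139.115.4.111 fep1_per_ip --> 139.115.40.131 fep1_per_ip
139.115.4.112 fep2_per_ip --> 139.115.40.132 fep2_per_ip
#share ip
139.115.4.110 fep_share_ip --> 139.115.40.130 fep_share_ip
139.115.4.130 oia_share_ip --> 139.115.40.133 oia_share_ip
2. Update the HACMP persistent IP configuration:
smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Persistent Node IP Label/Addresses -> Change / Show a Persistent Node IP Label/Address
Select fep1_per_ip and press Enter. On the panel that appears, make no changes and press Enter again. Do the same for fep2_per_ip.
3. Update the HACMP shared (service) IP configuration:
smitty hacmp -> Extended Configuration -> Extended Resource Configuration -> HACMP Extended Resources Configuration -> Configure HACMP Service IP Labels/Addresses
Select fep_share_ip and press Enter. On the panel that appears, make no changes and press Enter again. Do the same for oia_share_ip.
4. Update the resource group configuration:
smitty hacmp -> Extended Configuration -> Extended Resource Configuration -> HACMP Extended Resources Configuration -> Change/Show Resources and Attributes for a Resource Group
A list of resource groups appears (two on this system). Select the first one and press Enter. On the panel that appears, make no changes and press Enter again. Do the same for the second resource group.

After all changes are made:
1. Synchronize HACMP:
smitty hacmp -> Extended Configuration -> Extended Verification and Synchronization
2. Start HACMP:
smitty clstart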
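The /etc/hosts edits in this procedure can be scripted so that every node receives identical changes. The sketch below is one possible approach, not part of the original procedure: `set_host_ip` is a hypothetical helper that prints a modified copy of a hosts-format file and leaves the input untouched.

```shell
# Rewrite the address of one IP label in a hosts-format file.
# $1: hosts file, $2: IP label, $3: new address.
# The result goes to stdout; the input file is not modified.
set_host_ip() {
    awk -v label="$2" -v ip="$3" '
        # Pass comments and blank lines through unchanged.
        /^[ \t]*(#|$)/ { print; next }
        {
            # Replace the address when any name on the line matches the label.
            for (i = 2; i <= NF; i++) {
                if ($i == label) { $1 = ip; break }
            }
            print
        }
    ' "$1"
}
```

For example, `set_host_ip /etc/hosts fep1_per_ip 139.115.40.131 > /tmp/hosts.new` produces a candidate file that can be reviewed before it replaces /etc/hosts on both nodes.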

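HACMP Certification Study Series

Topics: node sizing considerations, cluster hardware planning, software planning, storage planning, disaster recovery planning.

Note: Planning is half of a successful implementation, and for HACMP the importance of proper planning cannot be overstated.

If planning is done poorly, you may later find yourself boxed in by restrictions that are painful to escape.

So take your time and use the planning worksheets that ship with the product. They are valuable for any migration or problem determination, and as a record of the plan.

Planning considerations

When planning a high availability cluster, consider the sizing of nodes, storage, networks, and so on, so that the resources the applications need to run correctly are available even in a takeover situation.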

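Sizing: choosing the nodes in the cluster

Before starting the cluster implementation, you should know how many nodes are needed and which node types to use.

The node type matters because of the resources the applications require.

Node sizing should cover the following:
- CPU (number and speed of CPUs)
- Amount of random access memory (RAM) in each node
- Disk storage (internal)
- Number of communication and disk adapters in each node
- Node reliability

The number of nodes in the cluster depends on the number of applications to be made highly available and on the degree of availability required.

Providing more than one standby node for each application in the cluster increases the overall availability of the application.

Note: The maximum number of nodes in an HACMP V5.1 cluster is 32.

HACMP V5.1 supports a wide range of nodes, from desktop systems to high-end servers.

SP nodes and logical partitions (LPARs) are also supported.

For further information, see the HACMP for AIX 5L V5.1 Planning and Installation Guide (SC23-4861-02).

Cluster resources are shared according to application requirements.

Nodes whose tasks are not directly related to the applications being made highly available, and that do not need to share resources with the application nodes, should be configured in a separate cluster to simplify implementation and management.

All nodes should provide enough resources (CPU, memory, and adapters) to sustain all designated applications during failover (taking over the resources of a failed node).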

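HACMP Configuration Document

HACMP Configuration and Test Document

I. Hardware
● Servers: 7026-B80 (192.168.1.80/24; 192.168.2.80/24), 7029-6C3 (192.168.1.61/24; 192.168.2.61/24)
● Storage: DS4300
● Network switches: MightyHub-12, H3C S3100
● Fibre Channel switch: Brocade SilkWorm-3800

II. Preparation

1. Install HACMP, preferably from CD.

Notes:
1. Do not install cluster.hativoli and cluster.haview from the HACMP CD.
2. The four filesets bos.data, rsct.basic.hacmp, rsct.basic.rte, and rsct.opt.storagerm must be installed from the first AIX CD.

2. Connect the two servers through the network switch and make sure they can ping each other.

Note: if ping fails, the switch may not support auto-negotiation or 100 Mbps. In that case fix the adapter speed at 10 Mbps full duplex:

smit dev -> Communication -> Ethernet Adapter -> Adapter -> Change / Show Characteristics of an Ethernet Adapter

Set the media speed to 10 Mbps full duplex.

3. Use the serial ports of the two servers for heartbeat, and make sure the tty test succeeds:
On the B80, wait for data on serial port tty1 (cat < /dev/tty1).
On the p615, send data to serial port tty1 (cat .rhosts > /dev/tty1).

Note: the serial cable must match the sa adapter. The port attribute is irrelevant.

For example, if tty1 is created on sa2 with port set to 0, the serial cable goes to serial port 3 on the server (sa0, sa1, and sa2 correspond to serial ports 1, 2, and 3).

The console port is generally not used for heartbeat. Create a new tty instead (smit tty -> Add a TTY -> select tty rs232 Asynchronous Terminal, then select the sa device for the tty). The original serial port may be held by the console, which keeps it in login state and makes serial communication fail.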

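Intermediary Business Platform HACMP Installation and Configuration Guide

I. HACMP two-node system configuration plan

Before configuring the HACMP environment for the intermediary business platform, first draw up a configuration plan.

IBM's HACMP configuration guide recommends planning worksheets. Fill in these tables while working through the configuration decisions.

The HACMP environments of branch intermediary business platforms usually share a similar topology: two networks of type public, one providing the intermediary business service (using IP aliasing) and the other connecting to the AS/400 over SNA (which must use IP replacement).

Cluster Worksheet
Cluster Name: xxibp_cluster
Node Name: xxMID_PRD, xxMID_BAK

Configure the topology on the primary node and then synchronize it to the other node. The network topology is as follows.

(1) Network overview

Network Name   Network Type   Network Attribute   Network Mask    Node Names
net_ibp        Ether          public              255.255.255.0   xxMID_PRD, xxMID_BAK
net_sna        Ether          public              255.255.255.0   xxMID_PRD, xxMID_BAK
net_rs232_01   RS-232         serial              N/A             xxMID_PRD, xxMID_BAK

(2) Network details

Network net_ibp
Service address: ibp_svc 10.1.7.33
Boot addresses: ibp_boot2 172.16.101.1
                ibp_boot1 172.16.100.1

Network sna_net
Service address: sna_svc 172.16.120.3
Boot address: sna_boot 172.16.120.1
Standby address: sna_stb 172.16.121.1

The IP addresses on the SNA network exist only so that the adapters can take over from each other and are not otherwise used. Private addresses can be used, such as any address in the 172 range, but both adapters must be on the same network segment.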

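Installing and Setting Up HACMP High Availability Cluster Software (Essentials)

1. Install the HACMP software packages on both machines
smitty install_update
Select the installation device: /dev/cd0
Select the software components to install
2. Configure the network adapters on each machine
3. Configure the heartbeat line on each machine
4. Edit /.rhosts and /etc/hosts
5. Define the cluster topology
Define the cluster ID and name
Define the cluster nodes and their node names
Configure the adapters used by the cluster (boot, service, standby, and the heartbeat line)
6. Define the cluster resource groups
Define the resource group name, node relationship, and participating nodes
Configure the application server name and its start and stop scripts
Define the file systems, volume groups, and other resources shared by the resource group.

Write the script files
Synchronize the cluster topology and the cluster resource configuration.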

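HACMP Configuration Steps

Edit /etc/hosts as follows:
192.168.1.1 p630a_boot p630a
192.168.1.2 p630b_boot p630b
192.168.2.1 p630_srv
192.168.3.1 p630a_stdby
192.168.3.2 p630b_stdby

The conventional pair of service addresses
192.168.2.1 p630a_srv
192.168.2.2 p630b_srv
is not used here because there is only one application server, so only one service IP label/address is needed.

Extra IP addresses serve no purpose, and cluster verification would warn that some service IP labels are not assigned.

It is advisable to configure one service IP label per application server. If several application servers share one service IP label, a problem with one application server may, through that coupling, make the others unavailable as well.

Edit /etc/hosts.equiv and /.rhosts to contain:
p630a_boot
p630b_boot
p630a
p630b
p630a_stdby
p630b_stdby

1. smitty hacmp -> Initialization and Standard Configuration -> Add Nodes to an HACMP Cluster
Set the cluster name and the boot IP names.

2. smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure an HACMP Cluster -> Add/Change/Show an HACMP Cluster
Review the automatically discovered information.

3. smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Nodes -> Change/Show a Node in the HACMP Cluster
Select each discovered node in turn and review its information.

4. smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Networks -> Change/Show a Network in the HACMP Cluster
The discovered networks are listed (with their network addresses shown below). Select one to review its details.

5. smitty hacmp (add a serial heartbeat network) -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Networks -> Add a Network to the HACMP Cluster
Under Discovered Serial Device Types select rs232, then change the network name.

6. smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Networks -> Change/Show a Network in the HACMP Cluster
Review the network just added. The next step is to configure the interfaces and devices in the networks.

7. smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Communication Interfaces/Devices -> Add Communication Interfaces/Devices
Select Add Discovered Communication Interfaces and Devices -> Communication Devices, then select the devices that appear. Both must be selected.

8. smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Communication Interfaces/Devices -> Change/Show Communication Interfaces/Devices
Review the communication interfaces and devices.

9. smitty hacmp -> Extended Configuration -> Extended Resource Configuration -> HACMP Extended Resource Configuration -> Configure HACMP Applications -> Configure HACMP Application Servers -> Add an Application Server
Enter the Server Name, Start Script, and Stop Script (for example: test_app, /etc/start.sh, /etc/stop.sh). The scripts and the application server name can be changed to suit the environment, but the scripts must be at the same path and have the same name on both nodes.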
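The label list placed in /etc/hosts.equiv and /.rhosts earlier in this section can be derived from /etc/hosts rather than typed by hand. The following is a sketch under that assumption; `hosts_labels` is a hypothetical helper name.

```shell
# Print every host name and alias from a hosts-format file, one per line,
# skipping comments, blank lines, and the loopback entry.
# $1: path to the hosts file
hosts_labels() {
    awk '
        /^[ \t]*(#|$)/ { next }
        $1 == "127.0.0.1" { next }
        { for (i = 2; i <= NF; i++) print $i }
    ' "$1"
}
```

The list shown in this section omits the service label, so a filter such as `hosts_labels /etc/hosts | grep -v '_srv$'` would be needed to reproduce it exactly.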

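HACMP configuration steps

Configure nodes a and b in turn, adding one persistent IP to each (0.60 or 0.61)
###***) The topology part ends here
###***) Resource configuration begins
###***) Configuring resources through the top-level Init... menu item is simpler
#07# Configure resources to make highly available
#01# Create the cluster
->Extended Configuration->Extended Topology Configuration->Configure an HACMP Cluster->Add/Change/Show an HACMP Cluster
#02# Create two nodes and set the "communication path"; with a communication path set, the name in the third column of hosts is not required
### Check
netstat -i
netstat -in
Create one more diskhb network (optional; it is generated automatically when the interfaces are added below)
#05# Configure interfaces/devices
->Extended Configuration->Extended Topology Configuration->Configure HACMP Communication Interfaces/Devices->Add Communication Interfaces/Devices->select Add Discovered Communication Interface and Devices->select en5 and en6 on both nodes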
Server Name=QHL_Clu_3_svc
Start Script=/usr/hacmp/start.sh

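HACMP Installation, Configuration, and Storage Configuration Document

HACMP (High Availability Cluster Multi-Processing) two-node hot standby software mainly improves the reliability of the customer's computer system and its applications as a whole, rather than the reliability of a single host.

1. Both servers of the pair (hosts A and B) run HACMP at the same time.
2. Besides running its own applications, each server also acts as the backup host for the other.
3. Throughout operation, the two hosts monitor each other over the heartbeat line (hardware and software operation, network communication, application status, and so on).
4. When one host detects that the other is not operating normally (has failed), the applications on the failed machine stop immediately. The surviving host (the backup for the failed one) starts them locally and takes over the applications and their resources (including IP addresses and disk space), so that they continue to run.
5. Takeover of applications and resources is performed automatically by the HA software, without manual intervention.
6. While both hosts are working normally, the applications on one machine can also be switched manually to the other (backup) machine as needed.

1. Decide clearly which applications each server will run (for example, A runs the application and B is standby).
2. Assign a Service_ip, Standby_ip, boot_ip, and heartbeat tty to each application (group), for example:

Host A (runs the application):
Service_ip: 172.16.1.1
Standby_ip: 172.16.2.1
Boot_ip: 172.16.1.3

Host B (standby):
Service_ip: 172.16.1.2
Standby_ip: 172.16.2.2
Boot_ip: 172.16.1.4

The HACMP installation and configuration steps are as follows.

I. Install the HACMP software on both servers
#smit installp

II. Check on each host that the software installed successfully
#/usr/sbin/cluster/diag/clverify
software
cluster
clverify> software
Valid Options are:
lpp
clverify.software> lpp

If no error appears, the installation succeeded.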

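HACMP Part 1: Getting Started

What is HACMP?

Before explaining what HACMP is, we must define the concept of high availability.

High availability

In today's complex environments, providing continuous service for applications is a key part of a successful IT implementation.

High availability, which masks or eliminates planned and unplanned system and application downtime, is one of the components that help provide continuous service to application clients.

This is achieved by eliminating hardware and software single points of failure (SPOFs).

A high availability solution ensures that the failure of any component of the solution, whether hardware, software, or system management, does not make the application and its data unavailable to users.

A high availability solution should eliminate single points of failure through appropriate design, planning, hardware selection, software configuration, and carefully controlled change management procedures.

Downtime

Downtime is the period during which an application cannot serve its clients.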

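Downtime can be divided into:
∙ Planned downtime:
  ∙ Hardware upgrades
  ∙ Repairs
  ∙ Software updates/upgrades
  ∙ Backups (offline backups)
  ∙ Testing (periodic testing is required for cluster validation)
  ∙ Development
∙ Unplanned downtime:
  ∙ Administrator errors
  ∙ Application failures
  ∙ Hardware failures
  ∙ Environmental disasters

IBM's high availability solution for AIX, High Availability Cluster Multi Processing, is based on proven IBM clustering technology and consists of two components:
∙ High availability: the process of ensuring that an application is available for use by means of duplicated and/or shared resources.
∙ Cluster multi-processing: multiple applications running on the same nodes with shared or concurrent access to the data.

An HACMP-based high availability solution provides automated failure detection, diagnosis, application recovery, and node reintegration.

With an appropriate application, HACMP can also provide concurrent data access for parallel processing applications, and so offers excellent horizontal scalability.

Figure 1 shows a typical HACMP environment.

Figure 1. HACMP cluster

History and evolution

IBM High Availability Cluster Multi-Processing goes back to the early 1990s.

HACMP development started in 1990, with the goal of providing a high availability solution for applications running on RS/6000 servers.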

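HA 5.1 Installation Document

I. Hardware and HA installation

1. Mount the two IBM 650 machines in the rack and power them on.
2. Insert the AIX 5L operating system CD into each machine and install the operating system.
3. After the operating system is installed, install the following prerequisite filesets for the HA software:
bos.adt.lib
bos.adt.libm
bos.adt.syscalls
.tcp.client
.tcp.server
bos.rte.SRC
bos.rte.libc
bos.rte.libcfg
bos.rte.libcur
bos.rte.libpthreads
bos.rte.odm
bos.data.*pat*
pat.basic.hacmp.2.2.1.30
pat.clients.hacmp.2.2.1.30
rsct.basic.rte.2.2.1.31
4. Mount the FastT600 and the 3534 switch in the rack and connect the required fibre cables.
5. From a PC running Windows 2000 with the Storage Manager software, telnet to the FastT600 controllers at 192.168.128.101 (255.255.255.0) and 192.168.128.102 (255.255.255.0) and configure the FastT600.

Configure the four disks in the FastT600, 72.8 GB each, as RAID 10. The usable capacity of the FastT600 is then about 145.6 GB.

Divide the 145.6 GB into two logical drives, one of 100 GB and one of 45.6 GB.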
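The usable capacity quoted above follows from RAID 10 mirroring every disk, so half of the raw capacity is available. A quick check of the arithmetic, with `raid10_usable_gb` as a hypothetical helper name:

```shell
# Usable capacity of a RAID 10 array: half of the raw capacity,
# because every disk is mirrored.
# $1: number of disks (even), $2: size of each disk in GB
raid10_usable_gb() {
    awk -v n="$1" -v size="$2" 'BEGIN { printf "%.1f\n", n * size / 2 }'
}

# Four 72.8 GB disks, as in the FastT600 example.
raid10_usable_gb 4 72.8
```

The logical drives carved from the array must add up to no more than this figure.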

6. On the p630 (cqti-cy-db1), create a 100 GB volume group. For datavg and 650b_datavg, use importvg and exportvg to synchronize the volume group information across the two p630 machines (the other is cqti-cy-app1).

9. On host cqti-cy-db1:

A. Edit /etc/hosts as follows:
10.11.10.1 cqti-cy-db1_service cqti-cy-db1
192.168.1.1 cqti-cy-db1_boot1 cqti-cy-db1
192.168.2.1 cqti-cy-db1_boot2 cqti-cy-db1
10.11.10.2 cqti-cy-app1_service cqti-cy-app1
192.168.1.2 cqti-cy-app1_boot1 cqti-cy-app1
192.168.2.2 cqti-cy-app1_boot2 cqti-cy-app1

B. Edit /usr/hascript/dbstart.sh, /usr/hascript/dbstop.sh, /usr/hascript/appstart.sh, and /usr/hascript/appstop.sh and set their permissions to 755. These four files are the scripts that start and stop the applications.

10. On host cqti-cy-app1, repeat step 9 as performed on cqti-cy-db1.

11. Create the db2 group, user, and file systems on each of the two 650 hosts.

12. Install the HACMP software on both hosts.

13. Configure HACMP. The following steps are run on cqti-cy-db1 only.

A. Define the nodes
#smitty hacmp -> Initialization and Standard Configuration -> Add Nodes to an HACMP Cluster
* Cluster Name [cqgs650net]
New Nodes (via selected communication paths) [cqti-cy-db1 cqti-cy-app1]

B. Define the service IP addresses
#smitty hacmp -> Initialization and Standard Configuration -> Configure Resources to Make Highly Available -> Configure Service IP Labels/Addresses -> Add a Service IP Label/Address
IP Label/Address cqti-cy-db1_service
IP Label/Address cqti-cy-app1_service

C. Define the application servers
#smitty hacmp -> Initialization and Standard Configuration -> Configure Resources to Make Highly Available -> Configure Application Servers -> Add an Application Server
* Server Name [cqti-cy-db1_app]
* Start Script [/usr/hascript/dbstart.sh]
Stop Script [/usr/hascript/dbstop.sh]
* Server Name [cqti-cy-app1_app]
* Start Script [/usr/hascript/appstart.sh]
Stop Script [/usr/hascript/appstop.sh]

D. Configure the serial network
# smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Networks -> Add a Network to the HACMP Cluster
* Network Name cqgs_rs232
# smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Communication Interfaces/Devices -> Add Communication Interfaces/Devices
Device Name cqti-cy-db1_tty
* Node Name [cqti-cy-db1]
* Device Path [/dev/tty0]
Network Type rs232
Device Name cqti-cy-app1_tty
* Node Name [cqti-cy-app1]
* Device Path [/dev/tty0]
Network Type rs232

E. Add the resource groups
#smitty hacmp -> Initialization and Standard Configuration -> Configure HACMP Resource Groups
Resource Group Name cqti-cy-db1_res
Participating Node Names (Default Node Priority) cqti-cy-db1 cqti-cy-app1
Resource Group Name cqti-cy-app1_res
Participating Node Names (Default Node Priority) cqti-cy-app1 cqti-cy-db1

F. Change the attributes of resource groups cqti-cy-db1_res and cqti-cy-app1_res
#smitty hacmp -> Initialization and Standard Configuration -> Configure HACMP Resource Groups -> Change/Show Resources for a Resource Group (standard)
Resource Group Name cqti-cy-db1_res
Participating Node Names (Default Node Priority) cqti-cy-db1 cqti-cy-app1
* Service IP Labels/Addresses [cqti-cy-db1_service] +
Volume Groups [datavg] +
Filesystems (empty is ALL for VGs specified) [] +
Application Servers [cqti-cy-db1_app] +
Resource Group Name cqti-cy-app1_res
Participating Node Names (Default Node Priority) cqti-cy-app1 cqti-cy-db1
* Service IP Labels/Addresses [cqti-cy-app1_service] +
Volume Groups [650b_datavg] +
Filesystems (empty is ALL for VGs specified) [] +
Application Servers [cqti-cy-app1_app] +

G. Synchronize the resource groups
#smitty hacmp -> Initialization and Standard Configuration -> Verify and Synchronize HACMP Configuration
Ignore Cluster Verification Errors [No]
Un/Configure Cluster Resource? [Yes]
* Emulate or Actual? [Actual]

14. Starting and stopping HACMP services

Start:
#smitty clstart
* Start now, on system restart or both [now]
Broadcast message from (tty) at ...true...

Stop:
#smitty clstop
* Stop now, on system restart or both [now]
BROADCAST cluster shutdown? [true]
* Shutdown mode [graceful]

Starting and stopping HACMP services through the SMIT menus

Start:
#smitty -> Communication Application and Services -> HACMP for AIX -> cluster Services -> Start Cluster Services
* Start now, on system restart or both [now]
Broadcast message from (tty) at ...true...

Stop:
#smitty -> Communication Application and Services -> HACMP for AIX -> cluster Services -> Stop Cluster Services
* Stop now, on system restart or both [now]
BROADCAST cluster shutdown? [true]
Shutdown mode [graceful]

Note: sometimes only one HA process is found. In that case run /usr/sbin/snmpv3_ssw -1 (note: not snmpdv3).

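HACMP Configuration Guide (IBM)

HACMP Configuration Guide

Contents
Chapter 1 Hardware Preparation Before Configuring HACMP
Chapter 2 HACMP Software Installation
2.1 Installing operating system fixes and required packages
2.2 Installing the HACMP software packages
2.3 Updating HACMP to the latest fix level
2.4 Rebooting the system
Chapter 3 HACMP Configuration
3.1 HACMP configuration in active/standby mode
3.1.1 IP address planning
3.1.2 Setting the boot addresses
3.1.3 Configuring the serial communication port
3.1.4 Creating the cluster and adding HA nodes
3.1.5 Adding a service label
3.1.6 Adding an application server
3.1.7 Creating a resource group
3.1.8 Changing resource group attributes
3.1.9 Configuring the serial network
3.1.10 Adding serial devices
3.1.11 Adding persistent addresses
3.1.12 Tuning HA-related parameters
3.1.13 Configuring the reservation-release script for third-party storage
3.1.14 Synchronization and verification
3.1.15 Starting HA
3.1.16 Stopping HA
3.2 HACMP configuration in mutual takeover mode
3.2.1 IP address planning
3.2.2 HA configuration
3.3 HACMP configuration in concurrent mode
3.3.1 Resource group configuration in concurrent mode

Chapter 1 Hardware Preparation Before Configuring HACMP

Before installing the HA software, first connect the system hardware, including the heartbeat cable and the storage devices.

Then import the volume groups that will be added to resource groups on every node in the cluster.

Chapter 2 HACMP Software Installation

2.1 Installing operating system fixes and required packages

AIX requires the following packages, and the AIX fix level should be current.

bos.clvm.enh
bos.data
rsct.basic (rsct.basic.hacmp, rsct.basic.rte, rsct.basic.sp)
pt.basic (pat.basic.hacmp, pat.basic.rte, pat.basic.sp)
pat.clients (pat.clients.hacmp, pat.clients.rte, pat.clients.sp)
bos.perf.tools
perfagent.tools
bos.adt.syscalls
bos.adt.libm

To use concurrent resource groups, also install the following packages:
bos.rte.lvm
bos.clvm.enh

2.2 Installing the HACMP software packages

Load the HACMP 5.3 CD on the server and install it through SMITTY.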

Experience Sharing: HACMP Planning, Implementation, and Configuration

Li Yifeng, IBM Systems and Technology Group
liyifeng@

Agenda
• Key HACMP concepts
• HACMP configuration essentials
• HACMP testing and troubleshooting essentials
• HACMP configuration screens
• HACMP and Oracle 9i RAC

High availability and fault tolerance

Solution                     Downtime                        Data availability               Relative cost
Standalone                   Couple of days                  Good as your last full backup   1
High Availability Clusters   Depends, but typically 3 mins   Last transaction                2-3
Fault Tolerant Computers     In theory, none                 No loss of data                 10+

Availability benefits
• Standalone: Journaled File System, Dynamic CPU Deallocation, Service Processor, Redundant Power, Redundant Cooling, ECC Memory, Hot Swap Adapters, Dynamic Kernel
• High Availability Clusters: Redundant Servers, Redundant Networks, Redundant Network Adapters, Heartbeat Monitoring, Failure Detection, Failure Diagnosis, Automated Fallover, Automated Reintegration
• Fault Tolerant Computers: Lock Step CPUs, Hardened Operating System, Hot Swap Everything, Continuous Restart

Failover possibilities

Environments where HACMP is not a good fit
• You cannot suffer any downtime
  - Failovers will cause at least some downtime
• Your environment is not stable
  - HACMP depends on stable software levels and stable configuration
  - HACMP is susceptible to the "fiddle factor"
• Your application needs manual intervention to recover from a failure
  - Manual reset of a device, etc.

Considerations when using HACMP
• Application must be able to recover from a stop/restart operation
  - Must release all resources when stopped, either normally or abnormally
  - Must tolerate a loss of memory contents
  - Must tolerate a loss of processor state
  - Must perform a restart from a checkpoint
  - Must recover from partial data writes
  - Must operate in a "transactional" protocol
• There must not be a single point of failure in the HA cluster
  - Shared power supply, non-protected disk, etc.
  - HACMP is a software solution

What is HACMP
• High Availability Cluster Multi Processing
• Allows a set of applications to move quickly to a standby processing system.
  - Heartbeat monitor
  - IP Address takeover
  - Resource Grouping
  - Shared I/O

Software layers on an HACMP node
• Application
  - Uses the services made highly available by HACMP
• HACMP
  - Makes services highly available for applications
  - Co-ordinates resource availability through the cluster
• RSCT
  - Provides reliable communication between nodes
  - Co-ordination of subsystems
• AIX
  - Operating system services
• LVM
  - Logical storage management
• TCP/IP
  - Manages communications at a logical layer

Components of HACMP
• HACMP has a number of components that make up a comprehensive high availability package for AIX
• HACMP is an application which:
  - Monitors cluster components,
  - Detects status changes,
  - Diagnoses and recovers from failures and...
  - Reintegrates previously failed components back into the cluster upon recovery.

Two-node HACMP topology diagram
[Figure labels: Network Clients; IP Network; Service & Standby Network Adapters; pSeries Cluster Node (two); Serial Heartbeat; IP Heartbeats; Shared Disk]

Cluster Nodes
• Since the cluster is treated as a single entity, we refer to the individual computers as nodes.
• Each node is an independent system
• Inter node communication is defined when the cluster is initialized.

Service IP aliases
• "Service Address" or "Service Label" is the connection to the computer
• AIX allows many addresses on a single adapter
• Does not affect the original configuration
• Allows separation of services
• Faster to move if necessary

IP address takeover (IPAT), method 1 (replacement)
[Table: boot/service and standby adapter addresses on Node A and Node B at system boot, with HACMP running, after adapter failure, and after node failure]
• Two logical IP networks (Netmask 255.255.255.0)
• One physical network
• Clients always access 192.168.0.6
• MAC address takeover or ARP cache update is also needed

IP address takeover (IPAT), method 2 (aliasing)
[Table: adapter addresses on Node A and Node B at system boot, with HACMP running, after adapter failure, and after node failure]
• Initially configured addresses (Boot IP)
• Persistent IP addresses - useful for applications like Tivoli
• Service IP addresses - used by clients to access the cluster - multiple are allowed

How Volume Groups are Handled
• Two types:
  - Shared
  - Non-shared
• Shared volume groups can "migrate"
• Non-shared volume groups are node bound
• Application data must be on a shared volume group to be "moved"
• Application code may be on either type of disk

Application Server Scripts
• "Application server", a name given to a series of scripts:
  - Start the application
  - Stop the application
  - Monitor the application (optional)
  - Re-start the application (optional)
• Applications must be able to be started from a previously unknown state by a script
• Applications must be able to be stopped by a script

Resource Groups
• Logical constructs that group related attributes together
• The "container" used by HACMP to "move" resources
• Participating node list
  - default node priorities
  - Home node
• Have policies on:
  - Start up
  - Fall over
  - Fall back
  - Distribution policy
  - Dependent resource groups

Resource Group Policies: startup
• Resource group start up occurs:
  - during initial cluster start up
  - initial acquisition of the resource group
  - May be modified by a "settling" timer
• Online on Home Node Only (OHNO)
  - only start on the highest priority node
• Online on First Available Node (OFAN)
  - will start on any one node
• Online on All Available Nodes (OAAN)
  - The resource groups will start on all nodes
• Online Using Distribution Policy (OUDP)
  - One resource group per network or node depending on the distribution policy

Resource Group Policies: Fallover
• Resource group fallover occurs:
  - When the current node can no longer support the resource group and it is "moved" to another node
    • Failure has occurred
    • Graceful shutdown with takeover of the current node
• Fallover to Next Priority Node (FNPN)
  - Resource group is moved to the next node in the resource group's node list
• Fallover using Dynamic Node Priority (FDNP)
  - Resource group is moved to the next node in the resource group's node list as recalculated based on the dynamic node criteria policy
• Bring Offline on Error Node (BOEN)
  - Resource group is set to an offline state on this node only

Resource Group Policies: Fallback
• Resource group fallback occurs:
  - The resource group is not on its home node
  - A higher priority node becomes available
  - Can be modified by a fallback timer
• Fallback to a Higher Priority Node (FHPN)
  - When the higher priority node is available and/or the optional timer expires, the resource group moves
• Never Fallback (NFB)
  - Regardless of whether a higher priority node becomes available, the resource group will not move

HACMP resource groups (Cascading Resource Group)
[Diagram: A owns the resource group and B is backup for A. If System A fails, System B takes over the resource group; when System A returns to the cluster, System B releases the resource group (simple standby operation). If System B fails or returns to the cluster, there is no activity.]

HACMP resource groups (Rotating Resource Group)
[Diagram: A owns the resource group and B is backup for A. If System A fails, System B takes over the resource group and keeps it after System A returns to the cluster. If System B then fails, System A takes over the resource group.]

HACMP resource groups (Concurrent Resource Group)
[Diagram: A and B both own the resource group. When either system fails or returns to the cluster, there is no activity on the other.]

Custom Resource Groups (HACMP 5.2, 5.3, 5.4)
• Relatively same as HACMP v5.1
• Custom Resource Groups are the only option
• "Types" of Cascading, Rotating, Concurrent by name no longer exist.
• All previous configuration options can be created via policies:
  Startup - what happens when the cluster first starts
  Fallover - what happens when a failure occurs
  Fallback - what happens when a node rejoins the cluster

Custom Resource Groups (HACMP 5.2, 5.3)
• Startup Policy (select one)
  - Online On Home Node Only (highest priority to be available)
  - Online On First Available Node (like rotating resource group)
  - Online Using Distribution Policy
  - Online On All Available Nodes (like concurrent resource group)
• Fallover Policy (select one)
  - Fallover to Next Priority Node in the List (like cascading)
  - Fallover Using Dynamic Node Priority (user defined priority policy)
  - Bring Offline (On Error Node Only) (offline resource during an error condition)
• Fallback Policy (select one)
  - Fallback To Higher Priority Node in the List (like cascading)
  - Never Fallback (like rotating resource)

C-SPOC
• Cluster-Single Point of Control, the system management interface in HACMP
• Cluster wide, cluster aware tool
  - Add, change, delete users
  - Add, change, delete file systems
  - Add, change, delete logical volumes
  - Add, change, delete physical volumes
  - Start and stop the cluster
  - Manage log files
  - File collections

Cluster Communication
• TCP/IP based communication
  - All network adapters
    • Use separate logical subnets
    • Use single subnet with heartbeat over IP aliasing
• Non-TCP/IP based communication
  - Serial (RS232) connection
  - Target mode
  - Disk heartbeat
• A non-TCP/IP based communication network is highly recommended

HACMP monitors three types of failure
• Node Failures
  - Processor hardware or operating system failures
  - One or more surviving nodes can acquire resources
• Network Adapter Failures
  - Move IP address to standby network adapter in same node
• Network Failure
  - Message displayed on console and event is logged
  - As every site's network configurations are unique, no other default action is taken
  - Action to be taken in response to network failures is customizable

Other types of failure
• Disk Drive Failures
  - LVM Mirroring
  - RAID Disk Devices
• Other Hardware Failure
  - Application Failure (customization needed, SRC)
• HACMP Failure
  - Promoted to node failure
• Power Failure
  - Avoid common power supplies across replicated devices
  - Use a UPS

Network availability
[Diagram labels: Hub, Hub, Dual homing, Dual networks, Router, Router]
• Eliminates hubs as SPOF
• Where are clients attached?
• Routing is trickier
• Use dual homing and dual networks in a cluster backbone with intelligent routers to provide network availability

HACMP 2 Node Cluster SAN Example

HACMP configuration essentials
• 1. Network
  - 1) TCP/IP network: adapter, boot IP address
  - 2) Modify /etc/hosts and /.rhosts or /usr/es/sbin/cluster/etc/rhosts
  - 3) Non-TCP/IP network:
    • Target mode SCSI/SSA
    • Disk Heartbeat
    • RS232 (define the device)
• 2. Storage
  - 1) Internal disk
  - 2) SSA disk, Fibre disk
  - 3) VG/LV/FS (if you use a concurrent VG, you must install the package bos.clvm)
• 3. Application
  - 1) Client/Server
  - 2) Client/Application Server/DB Server
  - 3) Informix / Informix HDR
  - 4) Oracle or RAC
  - 5) DB2 / UDB EEE
• 4. Resource Planning
  - 1) Volume group
  - 2) Disk drive
  - 3) File system
  - 4) NFS
  - 5) IP Address
  - 6) Application
• 5. Resource policies
  - Startup
  - Fallover
  - Fallback

HACMP testing essentials
• Power off the box
• Unplug the network cable
• ifconfig en# down
• Stop the cluster in takeover mode, e.g.:
  - # clstop -gr
• Shutdown without takeover
• Monitoring cluster takeover results:
  - ifconfig -a --> Service IP takeover?
  - lsvg -o --> VG takeover or varyon?
  - df --> FS mounted?
  - ps -ef --> application started?
  - tail -f /tmp/hacmp.out

HACMP troubleshooting essentials
• Cluster Log Files
• Cluster Daemons
• Monitoring Cluster:
  • clstat/xclstat
  • check log files
  • check daemons by lssrc -g cluster or ps -ef
  • lsvg -o
  • ifconfig -a
  • netstat -in
  • lslpp -l cluster.*
• Config_too_long
• Deadman Switch
• CDE and HACMP
• Apply patch

HACMP-related log files
• /tmp/clstrmgr.debug
  • Generated by the clstrmgr daemon
• /usr/es/adm/cluster.log
  • Generated by cluster scripts and daemons
• /usr/es/sbin/cluster/history/cluster.mmddyyyy
  • Cluster history files generated daily
• /tmp/cl_sm.log
  • Generated by the cluster Shared Memory library
• /tmp/cspoc.log
  • Generated by CSPOC commands
• /tmp/dms_loads.out
  • Generated by deadman's switch activity
• /tmp/emuhacmp.out
  • Generated by the event emulator scripts
• /tmp/hacmp.out
  • Generated by event scripts and utilities
• /var/adm/clavan.log
  • Generated by Application Availability Analysis tool
• /var/hacmp/clverify/clverify.log
  • Generated by Cluster Verification utility.
• /var/hacmp/clcomd/clcomd.log
  • Generated by clcomd daemon
• /var/hacmp/clcomd/clcomddiag.log
  • Generated by clcomd daemon, debug information
• /var/hacmp/log/clconfigassist.log
  • Generated by Two-Node Cluster Configuration Assistant
• /var/hacmp/log/clutils.log
  • Generated by cluster utilities and file propagation.
• /var/hacmp/log/cl_testtool.log
  • Generated by the Cluster Test Tool.
• system error log
  • errpt -a

Deadman Switch
• AIX kernel extension
• Reset by clstrmgr daemon
• Tune the system using I/O pacing
  smit chgsys to change
  high water mark 0 --> 33
  low water mark 0 --> 24
• Increase the syncd frequency
  /sbin/rc.boot
  default 60, change to 45, 30 or 20
• Increase the memory size used by the communication subsystem
  no -a
  no -o thewall=mem_size
• Tuning Virtual Memory Management
  increasing minfree/maxfree
• Change the Failure Detection Rate
  smit hacmp --> Extended Configuration > Extended Topology Configuration > Configure HACMP Network Modules
  Fast, normal --> slow
• Deadman Switch Time to Trigger
  Running the /usr/es/sbin/rsct/bin/hatsdmsinfo command

Change Shared LVM Components
• Add/Change/Remove VG/LV/FS
• Manual update
• Lazy update
  - Automatic export and import by HACMP during takeover after a failure
  - compare time stamp between the VGDA on disk and the /usr/es/sbin/cluster/etc/vg file
  - extends takeover time
• C-SPOC
  - real time update
  - perform on only one node

HACMP-related AIX filesets
• bos.adt.lib
• bos.adt.libm
• bos.adt.syscalls
• .tcp.client
• .tcp.server
• bos.rte.SRC
• bos.rte.libc
• bos.rte.libcfg
• bos.rte.libcur
• bos.rte.libpthreads
• bos.rte.odm
• pat.basic.hacmp
• pat.clients.hacmp
• rsct.core.sec
Concurrent Logical Volume Manager for concurrent access
• bos.rte.lvm.rte
• bos.clvm.enh
• After both RSCT and HACMP have been installed successfully on all the nodes, all the machines have to be rebooted before going on with HACMP configuration.

HACMP terminology

HACMP configuration menus
smitty hacmp configuration management
[Menu tree: Extended Configuration (1, 2, 3); Extended Topology Configuration (1.1 to 1.5); Extended Resource Configuration (2.1, 2.2); Extended Resources Configuration (2.1.1, 2.1.2); Extended Resource Group Configuration (2.2.1, 2.2.2)]

Starting and stopping HACMP services

HACMP implementation "case"

The HACMP environment required by Oracle RAC.

HACMP Configuration

Copyright © UNIS Software System CO., LTD. All Rights Reserved

This document is proprietary to UNIS Software System CO., LTD., which regards information contained herein as its intellectual property. Under the copyright laws, no part of this document may be copied, translated, or reduced to any electronic medium or machine readable form, in whole or in part, without prior written consent of UNIS Software System CO., LTD.

Revision history

1. Software environment
AIX 5300-05
HACMP 5.4

2. Hardware environment
Two P630 servers
One DS4300 storage unit
One serial cable

3. System environment

3.1 /etc/hosts
127.0.0.1 loopback localhost p630a (p630b on the other node)
192.168.1.100 p630a_stb
192.168.1.200 p630b_stb
192.168.2.100 p630a_boot
192.168.2.200 p630b_boot
192.168.3.100 p630a_svc
192.168.3.200 p630b_svc

3.2 /.rhosts
192.168.1.100
192.168.1.200
192.168.2.100
192.168.2.200
192.168.3.100
192.168.3.200

3.3 clcomdES
/usr/bin/startsrc -s clcomdES

3.4 Confirm that the network adapters can reach each other

Confirm the serial cable connection:
smitty tty -> Change / add a TTY -> rs232 -> sa -> port number: 0
Check:
host1: cat /etc/hosts > /dev/tty0
host2: cat < /dev/tty0
The contents of /etc/hosts from host1 should appear on host2.

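HACMP: Network Configuration and Storage Configuration

3.1. Network configuration
Cluster nodes communicate with one another over the cluster communication networks.

If one network adapter of a node fails on a network, the cluster continues to communicate through another adapter on that node.

If a node loses its connections altogether, HACMP moves the resources owned by that node to another available node.

In addition, HACMP (through RSCT Topology Services) exchanges heartbeat messages between nodes to check the availability of the cluster nodes and of their communication interfaces.

If HACMP detects that a node has stopped sending heartbeats, the node is considered failed and its resources are moved automatically to another available node.

Configuring several communication paths between cluster nodes is recommended, because it prevents the cluster from becoming partitioned.

The danger of a partitioned cluster is that nodes in different partitions may access the same data at the same time without coordination, which corrupts the data.

3.1.1. Network types
This section discusses the following network types:

Physical and logical networks
A physical network connects two or more physical network interfaces.

There are many kinds of physical networks; HACMP divides them into two groups:
➢ IP-based networks, such as Ethernet and Token Ring
➢ Device-based networks, such as RS-232 and target mode SSA
In HACMP, the interfaces in one logical network can communicate directly with each other, and HACMP gives each logical network a name (for example net_ether_01).

A logical network in HACMP can contain one or more subnets; RSCT manages the heartbeat packets within each logical subnet.

Global networks
Several HACMP networks can be combined into a global network.

An HACMP network is a collection of different physical and/or logical networks that share a collision domain, for example Ethernet.

HACMP treats such a combined global network as a single network.

RSCT handles the routing inside the global network.

3.1.2. TCP/IP networks
HACMP supports the following IP-based networks:
➢ ether (Ethernet)
➢ atm (Asynchronous Transfer Mode, ATM)
➢ fddi (Fiber Distributed Data Interface, FDDI)
➢ hps (SP Switch)
➢ token (Token Ring)
HACMP monitors these networks through RSCT Topology Services.

Heartbeating over IP aliases
In HACMP you can configure heartbeating to run over IP aliases.

In earlier HACMP versions, heartbeats could be exchanged only over service/non-service IP addresses/labels (base or boot IP addresses/labels).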

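IBM HACMP Configuration

IBM HACMP series: Installation and Configuration, Part 1
Planning is half of a successful implementation, and for HACMP the importance of proper planning cannot be overstated.

If the planning is done incorrectly, you may later find yourself caught in all kinds of restrictions, and getting out of them can be a very painful experience.

So take your time and use the planning worksheets shipped with the product; they are very valuable for any migration or problem determination situation and for documenting the plan.

I. HACMP software installation
The HACMP software provides a range of functions that can be used to make applications highly available.

Keep in mind that not every system or application component is protected by HACMP.

For example, if all the data of a critical application resides on a single disk and that disk fails, the disk becomes a single point of failure for the whole cluster, and it is not protected by HACMP.

In that case the AIX Logical Volume Manager or the protection features of the storage subsystem must be used.

HACMP only provides disk takeover on the backup node so that the data remains available.

This is why HACMP planning is so important: the main goal throughout the planning process is to eliminate single points of failure.

A single point of failure exists whenever a critical cluster function is provided by a single component.

If that component fails, the cluster has no other way to provide the function, and the applications or services that depend on the component become unavailable.

Remember also that a well-planned cluster is much easier to install, provides higher application availability, performs as expected, and needs less maintenance than a poorly planned one.

1.1 Checking the prerequisites
After completing the planning worksheets, verify that your systems meet the requirements of HACMP; this extra work eliminates many potential errors.

HACMP V5.1 requires one of the following operating system components:
(1) AIX 5L V5.1 ML5 with RSCT V2.2.1.30 or later.

(2) AIX 5L V5.2 ML2 with RSCT V2.3.1.0 or later (2.3.1.1 is recommended).

(3) For C-SPOC vpath support: SDD 1.3.1.3 or later.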
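The minimum levels above are dotted version numbers, which cannot be compared reliably as plain strings (2.2.1.4 is older than 2.2.1.30). A small sketch of a field-by-field numeric comparison follows; version_ge is a hypothetical helper name, and how the installed RSCT level is read from the system (for example from lslpp output) is deliberately left out.

```shell
#!/bin/sh
# version_ge INSTALLED REQUIRED  (hypothetical helper)
# Succeeds (exit status 0) when INSTALLED is at least REQUIRED, comparing
# the dot-separated fields numerically from left to right. Missing
# trailing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Example with an assumed installed level of 2.2.1.4:
if version_ge 2.2.1.4 2.2.1.30; then
    echo "RSCT level is sufficient"
else
    echo "RSCT level is too old"
fi
```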

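For the latest information on prerequisites and APARs, refer to the README file shipped with the product and to the following IBM website: /server/cluster/

1.2 New installation
HACMP supports the Network Installation Management (NIM) program, including the Alternate Disk Migration option.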


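HACMP 5.1 Configuration Method
Compared with earlier versions, HACMP 5.1 has changed considerably in both its interface and its configuration procedure.

The details can be found in the IBM HACMP 5.1 Redbook.

For most HACMP users, however, what matters most is a set of quick-start steps.

This article therefore walks through a basic HACMP installation and configuration.

I. Installing the HACMP software

1. Prerequisites
If your operating system is AIX 5.1, you need maintenance level ML03 or later (the highest level at the time of writing is ML05), and you must also install RSCT 2.2.1.30 or later.

The following filesets must also be installed:
• bos.adt.lib
• bos.adt.libm
• bos.adt.syscalls
• bos.net.tcp.client
• bos.net.tcp.server
• bos.rte.SRC
• bos.rte.libc
• bos.rte.libcfg
• bos.rte.libcur
• bos.rte.libpthreads
• bos.rte.odm
If you plan to use concurrent resource groups, also install:
• bos.rte.lvm.rte 5.1.0.25 or higher
• bos.clvm.enh

2. Installation
As a rule, install all HACMP filesets except the haview and netview (Tivoli) ones.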
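The prerequisite check in step 1 can be scripted. The sketch below compares a list of installed fileset names (one per line in a file) against the required names; the names are taken from the list above, reading the two TCP entries as bos.net.tcp.client and bos.net.tcp.server. The helper name missing_filesets and the file /tmp/installed.txt are illustrative, and the lslpp pipeline in the comment is an assumption about the colon-separated output of lslpp -Lc that should be checked on your AIX level.

```shell
#!/bin/sh
# Filesets required before installing HACMP 5.1 (from the list above).
REQUIRED_FILESETS="bos.adt.lib bos.adt.libm bos.adt.syscalls
bos.net.tcp.client bos.net.tcp.server bos.rte.SRC bos.rte.libc
bos.rte.libcfg bos.rte.libcur bos.rte.libpthreads bos.rte.odm"

# missing_filesets FILE  (hypothetical helper)
# FILE holds one installed fileset name per line. Prints each required
# fileset that does not appear in FILE as an exact, whole-line match.
missing_filesets() {
    for fs in $REQUIRED_FILESETS; do
        grep -Fqx "$fs" "$1" || echo "$fs"
    done
}

# Possible use on an AIX node (assumes field 2 of "lslpp -Lc" output is
# the fileset name):
#   lslpp -Lc | awk -F: '{ print $2 }' | sort -u > /tmp/installed.txt
#   missing_filesets /tmp/installed.txt
```

Exact matching (grep -Fx) is used so that, for instance, bos.adt.libm being installed is not mistaken for bos.adt.lib.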

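3. Apply patches.

Note that customers routinely skip the step of patching HACMP.

Patches matter a great deal for HACMP: many known defects have already been fixed in them.

Some customers install and configure HACMP strictly by the book and then run into all sorts of odd problems (takeover failures, IP takeover failures, machines going down by themselves) that turn out to be patch-related.

So be sure not to skip this step.

The latest HACMP patch at present is:
IY53044 - Latest HACMP for AIX R510 Fixes as of January 2004
It can be downloaded from the IBM website or requested through the 800-810-1818 hotline.

4. Reboot the machines.

For security reasons, HACMP 5.1 no longer uses the /.rhosts file to control the exchange of commands and data between the two machines; it introduces a new daemon, clcomd, instead.

If you look at /etc/inittab after installing HACMP, you will find that a line has been appended at the end:
clcomdES:2:once:startsrc -s clcomdES >/dev/console 2>&1

After the reboot, ps -ef | grep clcomd therefore shows:
root 12908 6478 0 Apr 12 - 0:21 /usr/es/sbin/cluster/clcomd -d
which confirms that the daemon has started.

HACMP 5.1 uses the /usr/es/sbin/cluster/etc/rhosts file in place of /.rhosts.

Note: if communication between the two nodes runs into problems, check the rhosts file, or edit it to add the network addresses of both nodes.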
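The two checks described here (is clcomd running, and does the HACMP rhosts file list both nodes) can be sketched as small shell functions. Both function names are hypothetical, and the sketch assumes the rhosts file holds one address per line; the addresses in the usage comment are the boot addresses used later in this article.

```shell
#!/bin/sh
# clcomd_running  (hypothetical helper)
# Reads "ps -ef" output on stdin and succeeds if a clcomd process is
# listed. The [c] bracket keeps a grep process from matching itself.
clcomd_running() {
    grep -q '/usr/es/sbin/cluster/[c]lcomd'
}

# missing_rhosts_entries FILE ADDR...  (hypothetical helper)
# Prints each ADDR that is not present as a whole line in FILE.
missing_rhosts_entries() {
    rhosts_file=$1
    shift
    for addr in "$@"; do
        grep -Fqx "$addr" "$rhosts_file" || echo "$addr"
    done
}

# Possible use on a node:
#   ps -ef | clcomd_running || echo "clcomd is not running"
#   missing_rhosts_entries /usr/es/sbin/cluster/etc/rhosts \
#       100.1.0.1 100.1.0.2 192.168.0.1 192.168.0.2
```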

II. Configuring HACMP 5.1
We use two machines as an example, test1 and test2, which share three 7133 disks.

1. First configure the IP addresses and volume groups on both machines, together with /etc/hosts and the application start/stop scripts.

test1:/> netstat -in
Name  Mtu   Network    Address          Ipkts  Ierrs  Opkts  Oerrs  Coll
en0   1500  link#2     0.4.ac.49.f2.d5  77960  0      47805  0      0
en0   1500  100.1      100.1.0.1        77960  0      47805  0      0
en1   1500  link#3     0.6.29.ec.44.d6  33     0      11     0      0
en1   1500  192.168.0  192.168.0.1      33     0      11     0      0

test2:/> netstat -in
Name  Mtu   Network    Address          Ipkts  Ierrs  Opkts  Oerrs  Coll
en0   1500  link#2     0.4.ac.49.60.23  31138  0      82582  0      0
en0   1500  100.1      100.1.0.2        31138  0      82582  0      0
en1   1500  link#3     0.4.ac.3e.b9.4b  36     0      13     0      0
en1   1500  192.168.0  192.168.0.2      36     0      13     0      0

Disks on test1:
hdisk0  0004383268b07574  rootvg   active
hdisk3  000438325e22bca7  test1vg
hdisk4  00043832125e5aa8  None
hdisk5  000438323d0e4487  None

Disks on test2:
hdisk0  000d29574085126d  rootvg   active
hdisk5  000438325e22bca7  test1vg
hdisk6  00043832125e5aa8  None
hdisk7  000438323d0e4487  None

/etc/hosts
100.1.0.2    test2_boot1 test2
100.1.0.1    test1_boot1 test1
192.168.0.1  test1_boot2
192.168.0.2  test2_boot2
10.1.0.1     test1_svc
10.1.0.2     test2_svc
10.1.0.5     test1_per
10.1.0.6     test2_per

test2:/ha51> ls -l
-rwxr-xr-x 1 root system 65 Apr 13 13:51 start
-rw-r--r-- 1 root system 31 Apr 13 11:49 start.log
-rwxr-xr-x 1 root system 66 Apr 13 14:01 start1
-rw-r--r-- 1 root system 31 Apr 13 14:01 start1.log
-rwxrwxrwx 1 root system 64 Apr 13 11:48 stop
-rw-r--r-- 1 root system 31 Apr 13 11:48 stop.log
-rwxr-xr-x 1 root system 66 Apr 13 14:01 stop1
-rw-r--r-- 1 root system 31 Apr 13 14:01 stop1.log

vi start
date >> /ha51/start.log
banner " start app1 " >> /tmp/hacmp.out

vi stop
date >> /ha51/stop.log
banner "stop app1 " >> /tmp/hacmp.out

vi start1
date >> /ha51/start1.log
banner " start app2 " >> /tmp/hacmp.out

vi stop1
date >> /ha51/stop1.log
banner "stop app2 " >> /tmp/hacmp.out

Note: make sure that /etc/hosts and the start/stop scripts exist, with the same content, on both nodes.
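In the disk listings above, the same shared disk has a different hdisk name on each node (hdisk3 on test1 is hdisk5 on test2); only the PVID in the second column identifies it. A minimal sketch of pairing the disks by PVID follows. It assumes the lspv output of each node has been saved to a file, and the helper name shared_pvids and the file names are made up for illustration.

```shell
#!/bin/sh
# shared_pvids FILE_A FILE_B  (hypothetical helper)
# FILE_A and FILE_B hold saved "lspv" output from two nodes. For every
# PVID present in both, print: PVID, hdisk name on node A, hdisk name on
# node B. Disks without a PVID (shown as "none") are skipped.
shared_pvids() {
    awk '
        NR == FNR { if ($2 != "none") disk[$2] = $1; next }
        ($2 in disk) { print $2, disk[$2], $1 }
    ' "$1" "$2"
}

# Possible use:
#   on test1:  lspv > /tmp/lspv.test1
#   on test2:  lspv > /tmp/lspv.test2   (then copy one file across)
#   shared_pvids /tmp/lspv.test1 /tmp/lspv.test2
```

With the listings shown above, the three 7133 disks are reported as pairs and the two rootvg disks are not, since their PVIDs differ.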

2. Configure HACMP with smitty hacmp

Add the cluster and the nodes

smitty hacmp
  Initialization and Standard Configuration
  Extended Configuration
  System Management (C-SPOC)
  Problem Determination Tools

  Add Nodes to an HACMP Cluster
  Configure Resources to Make Highly Available
  Configure HACMP Resource Groups
  Verify and Synchronize HACMP Configuration
  Display HACMP Configuration

* Cluster Name                                        [ha51tsc]
  New Nodes (via selected communication paths)        [test2_boot1 test1_boot1]
  Currently Configured Node(s)

This step is important. We normally use each node's boot1 address as the communication path. New nodes can be added all at once or one at a time.

After you press Enter, the system discovers the HACMP resources by itself and displays the following:

IP Network Discovery completed normally
Current cluster configuration:
No resource groups defined
Cluster Description of Cluster: ha51tsc
Cluster Security Level: Standard
There are 2 node(s) and 1 network(s) defined
NODE test1:
    Network net_ether_02
        test1_boot1 100.1.0.1
        test1_boot2 192.168.0.1
NODE test2:
    Network net_ether_02
        test2_boot1 100.1.0.2
        test2_boot2 192.168.0.2

Add the highly available resources (service IP, application server, VG and JFS)

Add the service IP addresses

  Add Nodes to an HACMP Cluster
  Configure Resources to Make Highly Available
  Configure HACMP Resource Groups
  Verify and Synchronize HACMP Configuration
  Display HACMP Configuration

  Configure Service IP Labels/Addresses
  Configure Application Servers
  Configure Volume Groups, Logical Volumes and Filesystems
  Configure Concurrent Volume Groups and Logical Volumes

  Add a Service IP Label/Address
  Change/Show a Service IP Label/Address
  Remove Service IP Label(s)/Address(es)

* IP Label/Address    [test1_svc]
  Network Name        [net_ether_02]

* IP Label/Address    [test2_svc]
  Network Name        [net_ether_02]

Add the application servers

  Configure Service IP Labels/Addresses
  Configure Application Servers
  Configure Volume Groups, Logical Volumes and Filesystems
  Configure Concurrent Volume Groups and Logical Volumes

  Add an Application Server
  Change/Show an Application Server
  Remove an Application Server

* Server Name     [app1]
* Start Script    [/ha51/start]
* Stop Script     [/ha51/stop]

* Server Name     [app2]
* Start Script    [/ha51/start1]
* Stop Script     [/ha51/stop1]

Add the shared VG and JFS

Note that, as seen in the earlier steps, one shared VG, test1vg, already exists. It was created the traditional way:
1. Create test1vg, the LV and the JFS on node test1
2. varyoffvg
3. importvg on test2
4. varyoffvg
Now let us try creating test2vg with the HACMP facilities.

  Configure Service IP Labels/Addresses
  Configure Application Servers
  Configure Volume Groups, Logical Volumes and Filesystems
  Configure Concurrent Volume Groups and Logical Volumes

  Shared Volume Groups
  Shared Logical Volumes
  Shared File Systems
  Synchronize Shared LVM Mirrors
  Synchronize a Shared Volume Group Definition

  List All Shared Volume Groups
  Create a Shared Volume Group
  Create a Shared Volume Group with Data Path Devices
  Set Characteristics of a Shared Volume Group
  Import a Shared Volume Group
  Mirror a Shared Volume Group
  Unmirror a Shared Volume Group

In the selection list, use F7 to select both test1 and test2:
> test1
> test2
Then select the PVID:
00043832125e5aa8

  Node Names                             test1,test2
  PVID                                   00043832125e5aa8
  VOLUME GROUP name                      [test2vg]
  Physical partition SIZE in megabytes   4
  Volume group MAJOR NUMBER              [49]

test2:/ha51> lspv
hdisk0  000d29574085126d  rootvg   active
hdisk5  000438325e22bca7  test1vg
hdisk6  00043832125e5aa8  test2vg
hdisk7  000438323d0e4487  None

test1:/ha51> lspv
hdisk0  0004383268b07574  rootvg   active
hdisk3  000438325e22bca7  test1vg
hdisk4  00043832125e5aa8  test2vg
hdisk5  000438323d0e4487  None

In the same way you can create a JFS on both nodes at once.

  Shared Volume Groups
  Shared Logical Volumes
  Shared File Systems
  Synchronize Shared LVM Mirrors
  Synchronize a Shared Volume Group Definition

  Journaled File Systems
  Enhanced Journaled File Systems

  Add a Journaled File System
  Add a Journaled File System on a Previously Defined Logical Volume
  List All Shared File Systems
  Change / Show Characteristics of a Shared File System
  Remove a Shared File System

  Add a Standard Journaled File System
  Add a Compressed Journaled File System
  Add a Large File Enabled Journaled File System

  test1vg    test1,test2
  test2vg    test1,test2

  Node Names                        test1,test2
  Volume group name                 test1vg
* SIZE of file system               [10]
* MOUNT POINT                       [/test1jfs]
  PERMISSIONS                       read/write
  Mount OPTIONS                     []
  Start Disk Accounting?            no
  Fragment Size (bytes)             4096
  Number of bytes per inode         4096
  Allocation Group Size (MBytes)    8

The system automatically adds the /test1jfs file system on test1 and updates both nodes. In my own experience, however, it is still better to create the VG, LV and JFS on one node the traditional way and then import them on the other node.
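The traditional method recommended here (create on one node, vary off, import on the other, vary off again) could look roughly like the command sequence printed by the sketch below. It is a dry run that only prints the commands for review rather than executing them. The function name, the major number 60 and the size 32768 are illustrative values, not taken from this article; the use of -V to force the same major number on both nodes is an assumption modeled on the MAJOR NUMBER field in the C-SPOC screen.

```shell
#!/bin/sh
# print_traditional_vg_steps VG MAJOR DISK_A DISK_B MOUNTPOINT SIZE
# (hypothetical helper) Prints, without running them, the AIX commands
# for the four traditional steps. DISK_A and DISK_B are the names of the
# same shared disk on the first and second node; SIZE is passed to crfs
# unchanged.
print_traditional_vg_steps() {
    vg=$1
    major=$2
    disk_a=$3
    disk_b=$4
    mount=$5
    size=$6
    cat <<EOF
# Steps 1-2, on the first node: create the VG and file system, vary off
mkvg -y $vg -V $major $disk_a
crfs -v jfs -g $vg -m $mount -a size=$size -A no
varyoffvg $vg
# Steps 3-4, on the second node: import the definition, vary off
importvg -V $major -y $vg $disk_b
varyoffvg $vg
EOF
}

# Example with assumed values (hdisk3 on test1 is hdisk5 on test2):
print_traditional_vg_steps test1vg 60 hdisk3 hdisk5 /test1jfs 32768
```

Printing the commands first makes it easy to check the disk names against the PVIDs on each node before anything is changed; -A no keeps the file system from being mounted automatically at boot, leaving that to HACMP.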