Sun Cluster
SUN System Operation and Maintenance Document 1
System Operation and Maintenance Document
1. Cluster operations
Checking cluster status: use the hastat command
# hostname
China168-1
# hastat
Getting Information from all the nodes ......
HIGH AVAILABILITY CONFIGURATION AND STATUS
------------------------------------------
LIST OF NODES CONFIGURED IN <China168> CLUSTER
China168-1 China168-2
CURRENT MEMBERS OF THE CLUSTER
China168-1 is a cluster member
China168-2 is a cluster member
CONFIGURATION STATE OF THE CLUSTER
Configuration State on China168-1: Stable
Configuration State on China168-2: Stable
UPTIME OF NODES IN THE CLUSTER
uptime of China168-1: 3:17pm up 1 day(s), 2:30, 3 users, load average: 0.13, 0.41, 0.50
uptime of China168-2: 3:18pm up 1 day(s), 2:30, 3 users, load average: 0.08, 0.18, 0.24
LOGICAL HOSTS MASTERED BY THE CLUSTER MEMBERS
Logical Hosts Mastered on China168-1:
None
Logical Hosts for which China168-1 is Backup Node:
None
Logical Hosts Mastered on China168-2:
None
Logical Hosts for which China168-2 is Backup Node:
None
LOGICAL HOSTS IN MAINTENANCE STATE
None
STATUS OF PRIVATE NETS IN THE CLUSTER
Status of Interconnects on China168-1:
interconnect0: selected
interconnect1: up
Status of private nets on China168-1:
To China168-1 - UP
To China168-2 - UP
Status of Interconnects on China168-2:
interconnect0: selected
interconnect1: up
Status of private nets on China168-2:
To China168-1 - UP
To China168-2 - UP
STATUS OF PUBLIC NETS IN THE CLUSTER
Status of Public Network On China168-1:
bkggrp r_adp status fo_time live_adp
nafo0 hme0:hme1 OK 89755 hme1
Status of Public Network On China168-2:
bkggrp r_adp status fo_time live_adp
nafo0 hme0:hme1 OK NEVER hme0
…………………….
The highlighted portion above (CURRENT MEMBERS OF THE CLUSTER) shows that both machines are currently members of the cluster.
A Study of Sun Cluster Database Hot-Standby (Two-Node) Failover
Taking Sun Cluster as the example, this paper studies the working principles and implementation of the software and carries out hot-standby switchover experiments. The experiments show that, with hot standby in place, Sun Cluster can switch Oracle from the failed server to the standby server within a very short time when the Oracle database server fails, ensuring continuous operation of the application system.
0 Introduction
In recent years, with the rapid development of information technology, all industries have placed ever higher demands on the stability of their database systems. How quickly normal operation can be restored after a failure has become a key measure of a database system's stability and reliability. Industries with large transaction volumes require that their application systems and database systems recover quickly from failures.
1 Cluster Classification
A cluster is a computer system in which a group of loosely integrated computers, connected by software or hardware, cooperate closely to perform computing work. From the user's point of view they can be regarded as a single computer. An individual computer in a cluster is called a node, and nodes are usually connected by a local area network. Clusters are generally used to improve the computing speed or reliability of a single computer, and in most cases a cluster offers a much better price/performance ratio than a single computer (workstation or supercomputer). As a result, clusters are favored by government, enterprise, education, and other sectors. Each node can carry part of the processing load, and the load can be dynamically distributed among the nodes to achieve load balancing.
Sun StorageTek Storage Archive Manager (SAM) Installation and Upgrade
Sun Microsystems, Inc. Please submit comments about this document to: /hwdocs/feedback. Sun StorageTek™ Storage Archive Manager (SAM) Installation and Upgrade Guide, Release 4 Update 6, Part No. 820-1734-10, May 2007, Revision A. Please recycle. Copyright 2007 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
Common Sun Cluster Commands
Remove an adapter from a NAFO group
pnmset
pnmset -c nafo0 -o remove qfe0
Switching the active adapter
pnmset
pnmset -c nafo0 -o switch qfe0 (adapter to switch to)
Check status
pnmset
pnmset -l
Create a resource group
scrgadm
scrgadm -a -g my-rg -h nodeA,nodeB [ -y Pathprefix=/my-directory ]
Add a resource to the group
scrgadm
scrgadm -a -j my-res -g my-rg -t SUNW.nfs -x ServicePaths=/my-directory
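After the group and its resource are created, the group still has to be brought online. A minimal sketch using the same hypothetical my-rg group as above (on Sun Cluster 3.x, scswitch -Z manages the group and brings it online on its primary node):
scswitch -Z -g my-rg
scstat -g   (verify the group and its resources show Online on the intended node)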
Register change to disk group
scconf
scconf -c -D name=mydg,sync
Switch device group to a node
scswitch
scswitch -z -D mydg -h nodeA (node to switch to)
Check status
scstat
Shutting down a node
1. scswitch
scswitch -S -h nodeA (switch resources off the node)
2. shutdown
shutdown -y -g30 -i0
Rebooting a node
1. scswitch
scswitch -S -h nodeA (switch resources off the node)
2. shutdown
shutdown -y -g30 -i6
Add a quorum device
scconf
scconf -a -q globaldev=d12
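To confirm the new quorum device is registered and holding a vote, a quick check (a sketch; d12 is the example device used above):
scstat -q   (lists node votes, quorum-device votes, and the votes needed/present)
scconf -p | grep -i quorum   (shows the quorum devices recorded in the cluster configuration)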
Sun Cluster 3.x Training (Planning, Installation, Configuration and Management)
Sun Cluster 3.x
Software Architecture and Failover Principles
Key modules and concepts: Persistent Reservations and Reservation Keys
Each node is assigned a 64-bit key on the quorum device.
Failover principle: if node 2 leaves the cluster for any reason, node 1 removes node 2's 64-bit key from the quorum device.
Planning a Two-Node System
1. Hardware interconnect planning
A key problem the hardware interconnect must solve: the scsi-initiator-id
(illustrated below using a single-ported T3 disk array as the example)
Redundant Servers (host-node redundancy)
Some components within each node are themselves redundant; the two nodes also back each other up.
The network ports through which each node provides external service are redundant
Redundant Public Networks
The network ports over which the two nodes communicate with each other are redundant
Redundant Private Networks
2. If the private interconnect fails and each node tries to keep running the cluster on its own, the cluster partitions; this is called split-brain operation.
Sun Cluster 3.x
Software Architecture and Failover Principles
Key modules and concepts: the voting (quorum) mechanism
The Sun Cluster architecture introduces a "voting system", in which:
1. Each node has one vote.
2. A shared disk designated as a quorum device has one vote.
3. The cluster can form and run normally only when it holds a majority of votes (more than 50 percent of all possible votes present).
When the interconnect is lost, the quorum race decides membership:
1. The two nodes both try to reserve the designated quorum device.
2. The first node to acquire the quorum device remains in the cluster as a member.
3. The node that loses the race for the quorum device leaves the cluster.
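A worked example of the majority rule for the common two-node layout described above: 2 node votes + 1 quorum-device vote = 3 possible votes, and a majority needs at least 2. A surviving node that wins the quorum device therefore holds 1 + 1 = 2 votes and keeps the cluster running, while the losing node holds only 1 vote and must leave. Without a quorum device there would be only 2 possible votes, and a single isolated node with 1 vote could never reach a majority, so neither half of a split-brain could continue.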
Solaris 11.3 + Sun Cluster 4.3 + VirtualBox
Solaris 11.3 + Sun Cluster 4.3 + VirtualBox
I. Preparation:
One laptop: 8 GB RAM, Core i3 or better CPU, SSD recommended
Operating system: Solaris 11.3 x86 installation package plus patches (repo + SRU)
HA software: Sun Cluster 4.3 installation package
Virtual machine requirements:
Node1: 30 GB OS disk (dynamically allocated); IP 192.168.56.10/24; three network ports (1 for the OS/public network, 2 for heartbeats)
Node2: 30 GB OS disk (dynamically allocated); IP 192.168.56.11/24; three network ports (1 for the OS/public network, 2 for heartbeats)
Quorum disk: 1 GB
Data disks: 2 GB x 2
II. Install the OS, patches and cluster software
1. Install Solaris 11.3 (steps omitted here).
2. Apply the patches and install the graphical desktop
# pkg set-publisher -G '*' -g /sol-11-3-SRU/publisher/solaris/ solaris
# pkg set-publisher -g file:///mnt/repo/ solaris
# pkg update
# pkg install --accept solaris-desktop
3. Install Sun Cluster 4.3
# mount -F hsfs /root/osc-4_3-bldnum-repo-full.iso /mnt
# pkg set-publisher -G '*' -g file:///mnt/repo ha-cluster
# pkg install ha-cluster-full
# pkg list ha-cluster-full
III. Add the shared disks
1. Create the disks:
D:\virtual+vmware\VirtualBox>VBoxManage.exe createhd -filename D:\virtual+vmware\quorum.vdi -size 1024 --format VDI --variant Fixed
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Medium created. UUID: 7d05bce2-7d62-4fdf-80cb-b37138c4e496
D:\virtual+vmware\VirtualBox>VBoxManage.exe createhd -filename D:\virtual+vmware\data1.vdi -size 2048 --format VDI --variant Fixed
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Medium created. UUID: 55b08a9b-6043-47c1-9ba4-4215c2680de5
D:\virtual+vmware\VirtualBox>VBoxManage.exe createhd -filename D:\virtual+vmware\data2.vdi -size 2048 --format VDI --variant Fixed
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Medium created. UUID: 51d55226-ac4f-4eed-b2ab-567d9c905312
2. Attach the disks to virtual machines node1 and node2:
D:\virtual+vmware\VirtualBox>VBoxManage.exe storageattach solaris-node2 --storagectl "SATA" --port 1 --device 0 --type hdd --medium D:\virtual+vmware\quorum.vdi --mtype shareable
D:\virtual+vmware\VirtualBox>VBoxManage.exe storageattach solaris-node2 --storagectl "SATA" --port 2 --device 0 --type hdd --medium D:\virtual+vmware\data1.vdi --mtype shareable
D:\virtual+vmware\VirtualBox>VBoxManage.exe storageattach solaris-node2 --storagectl "SATA" --port 3 --device 0 --type hdd --medium D:\virtual+vmware\data2.vdi --mtype shareable
D:\virtual+vmware\VirtualBox>VBoxManage.exe storageattach solaris-node1 --storagectl "SATA" --port 1 --device 0 --type hdd --medium D:\virtual+vmware\quorum.vdi --mtype shareable
D:\virtual+vmware\VirtualBox>VBoxManage.exe storageattach solaris-node1 --storagectl "SATA" --port 2 --device 0 --type hdd --medium D:\virtual+vmware\data1.vdi --mtype shareable
D:\virtual+vmware\VirtualBox>VBoxManage.exe storageattach solaris-node1 --storagectl "SATA" --port 3 --device 0 --type hdd --medium D:\virtual+vmware\data2.vdi --mtype shareable
3. Make the disks shareable:
D:\virtual+vmware\VirtualBox>VBoxManage.exe modifyhd d:\virtual+vmware\quorum.vdi --type shareable
D:\virtual+vmware\VirtualBox>VBoxManage.exe modifyhd d:\virtual+vmware\data1.vdi --type shareable
D:\virtual+vmware\VirtualBox>VBoxManage.exe modifyhd d:\virtual+vmware\data2.vdi --type shareable
IV. Configure the cluster:
1. Initial configuration
# scinstall
*** Main Menu ***
Please select from one of the following (*) options:
* 1) Create a new cluster or add a cluster node
2) Configure a cluster to be JumpStarted from this install server
3) Manage a dual-partition upgrade
4) Upgrade this cluster node
* 5) Print release information for this cluster node
* ?) Help with menu options
* q) Quit
Option: 1
*** New Cluster and Cluster Node Menu ***
Please select from any one of the following options:
1) Create a new cluster
2) Create just the first node of a new cluster on this machine
3) Add this machine as a node in an existing cluster
?) Help with menu options
q) Return to the Main Menu
Option: 2
*** Establish Just the First Node of a New Cluster ***
This option is used to establish a new cluster using this machine as the first node in that cluster.
Before you select this option, the Oracle Solaris Cluster framework software must already be installed.
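Before configuring the cluster it is worth confirming that both VMs really see the same shared disks. A quick check (a sketch, not part of the original procedure; paths and VM names follow the examples above):
D:\virtual+vmware\VirtualBox>VBoxManage.exe showmediuminfo D:\virtual+vmware\quorum.vdi   (the Type field should report shareable)
On each Solaris node:
# echo | format   (the 1 GB quorum disk and the two 2 GB data disks should appear on both node1 and node2 with the same sizes)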
Use the Oracle Solaris Cluster installation media or the IPS packaging system to install Oracle Solaris Cluster software.Press Control-d at any time to return to the Main Menu.Do you want to continue (yes/no) [yes]?>>> Typical or Custom Mode <<<This tool supports two modes of operation, Typical mode and Custom.For most clusters, you can use Typical mode. However, you might need to select the Custom mode option if not all of the Typical defaultscan be applied to your cluster.For more information about the differences between Typical and Custom modes, select the Help option from the menu.Please select from one of the following options:1) Typical2) Custom?) Helpq) Return to the Main MenuOption [1]: 2>>> Cluster Name <<<Each cluster has a name assigned to it. The name can be made up of any characters other than whitespace. Each cluster name should be unique within the namespace of your enterprise.What is the name of the cluster you want to establish? yx_cluster >>> Check <<<This step allows you to run cluster check to verify that certain basic hardware and software pre-configuration requirements have been met. If cluster check detects potential problems with configuring this machineas a cluster node, a report of violated checks is prepared andavailable for display on the screen.Do you want to run cluster check (yes/no) [no]?>>> Cluster Nodes <<<This Oracle Solaris Cluster release supports a total of up to 16nodes.Please list the names of the other nodes planned for the initialcluster configuration. List one node name per line. When finished,type Control-D:Node name (Control-D to finish): node1Node name (Control-D to finish): node2Node name (Control-D to finish): ^DThis is the complete list of nodes:node1node2Is it correct (yes/no) [yes]?>>> Authenticating Requests to Add Nodes <<<Once the first node establishes itself as a single node cluster, other nodes attempting to add themselves to the cluster configuration must be found on the list of nodes you just provided. You can modify thislist by using claccess(1CL) or other tools once the cluster has been established.By default, nodes are not securely authenticated as they attempt to add themselves to the cluster configuration. This is generally considered adequate, since nodes which are not physically connected to the private cluster interconnect will never be able to actually jointhe cluster. However, DES authentication is available. If DES authentication is selected, you must configure all necessary encryption keys before any node will be allowed to join the cluster (seekeyserv(1M), publickey(4)).Do you need to use DES authentication (yes/no) [no]?>>> Minimum Number of Private Networks <<<Each cluster is typically configured with at least two private networks. Configuring a cluster with just one private interconnect provides less availability and will require the cluster to spend moretime in automatic recovery if that private interconnect fails.Should this cluster use at least two private networks (yes/no) [yes]? >>> Point-to-Point Cables <<<The two nodes of a two-node cluster may use a directly-connected interconnect. That is, no cluster switches are configured. However, when there are greater than two nodes, this interactive form of scinstall assumes that there will be exactly one switch for eachprivate network.Does this two-node cluster use switches (yes/no) [yes]? 
no >>> Cluster Transport Adapters and Cables <<<Transport adapters are the adapters that attach to the private cluster interconnect.Select the first cluster transport adapter:1) net02) net14) net115) net36) net47) net58) net69) net710) net9n) Next >Option: 9Adapter "net5" is an Ethernet adapter.Searching for any unexpected network traffic on "net5" ... doneVerification completed. No traffic was detected over a 10 second sample period.The "dlpi" transport type will be set for this cluster.Select the second cluster transport adapter:1) net02) net14) net115) net36) net47) net58) net69) net710) net9n) Next >Option: 3Adapter "net11" is an Ethernet adapter.Searching for any unexpected network traffic on "net11" ... doneVerification completed. No traffic was detected over a 10 secondsample period.The "dlpi" transport type will be set for this cluster.>>> Network Address for the Cluster Transport <<<The cluster transport uses a default network address of 172.16.0.0. Ifthis IP address is already in use elsewhere within your enterprise,specify another address from the range of recommended privateaddresses (see RFC 1918 for details).The default netmask is 255.255.240.0. You can select another netmask,as long as it minimally masks all bits that are given in the networkaddress.The default private netmask and network address result in an IPaddress range that supports a cluster with a maximum of 64 nodes, 10private networks, and 12 virtual clusters.Is it okay to accept the default network address (yes/no) [yes]?Is it okay to accept the default netmask (yes/no) [yes]?Plumbing network address 172.16.0.0 on adapter bge0 >>NOT DUPLICATE ... done Plumbing network address 172.16.0.0 on adapter bge1 >> NOT DUPLICATE ... done>>> Global Devices File System <<<Each node in the cluster must have a local file system mounted on/global/.CFDEVices/node@<nodeID> before it can successfully participateas a cluster member. Since the "nodeID" is not assigned untilscinstall is run, scinstall will set this up for you.You must supply the name of either an already-mounted file system or araw disk partition which scinstall can use to create the globalCFDEVices file system. This file system or partition should be at least512 MB in size.Alternatively, you can use a loopback file (lofi), with a new filesystem, and mount it on /global/.CFDEVices/node@<nodeid>.If an already-mounted file system is used, the file system must be empty. If a raw disk partition is used, a new file system will becreated for you.If the lofi method is used, scinstall creates a new 100 MB file system from a lofiCFDEVice by using the file /.globalCFDEVices. The lofi method is typically preferred, since it does not require the allocation of a dedicated disk slice.The default is to use /globalCFDEVices.Is it okay to use this default (yes/no) [yes]?>>> Set Global Fencing <<<Fencing is a mechanism that a cluster uses to protect data integrity when the cluster interconnect between nodes is lost. By default,fencing is turned on for global fencing, and each disk uses the global fencing setting. This screen allows you to turn off the globalfencing.Most of the time, leave fencing turned on. 
However, turn off fencing when at least one of the following conditions is true: 1) Your shared storageCFDEVices, such as Serial Advanced Technology Attachment (SATA) disks, do not support SCSI; 2) You want to allow systems outside your cluster to access storage CFDEVices attached to your cluster; 3) Sun Microsystems has not qualified the SCSI persistent group reservation (PGR) support for your shared storage CFDEVices.If you choose to turn off global fencing now, after your clusterstarts you can still use the cluster(1CL) command to turn on global fencing.Do you want to turn off global fencing (yes/no) [no]?>>> Quorum Configuration <<<Every two-node cluster requires at least one quorum CFDEVice. By default, scinstall selects and configures a shared disk quorum CFDEVice for you.This screen allows you to disable the automatic selection and configuration of a quorum CFDEVice.You have chosen to turn on the global fencing. If your shared storage CFDEVices do not support SCSI, such as Serial Advanced TechnologyAttachment (SATA) disks, or if your shared disks do not supportSCSI-2, you must disable this feature.If you disable automatic quorum CFDEVice selection now, or if you intendto use a quorum CFDEVice that is not a shared disk, you must instead useclsetup(1M) to manually configure quorum once both nodes have joinedthe cluster for the first time.Do you want to disable automatic quorum CFDEVice selection (yes/no) [no]? yes>>> Automatic Reboot <<<Once scinstall has successfully initialized the Oracle Solaris Clustersoftware for this machine, the machine must be rebooted. After thereboot, this machine will be established as the first node in the newcluster.Do you want scinstall to reboot for you (yes/no) [yes]?>>> Confirmation <<<Your responses indicate the following options to scinstall:scinstall -i \-C sap-cluster \-F \-T node=Node1,node=Node2,authtype=sys \-wnetaddr=172.16.0.0,netmask=255.255.240.0,maxnodes=32,maxprivatenets=10,numvirtualclusters=12, numxipvirtualclusters=3 \-A trtype=dlpi,name=net7 -A trtype=dlpi,name=net11 \-B type=direct \-P task=security,state=SECUREAre these the options you want to use (yes/no) [yes]?Do you want to continue with this configuration step (yes/no) [yes]?Initializing cluster name to "sap-cluster" ... doneInitializing authentication options ... doneInitializing configuration for adapter "net7" ... doneInitializing configuration for adapter "net11" ... doneInitializing private network address options ... doneSetting the node ID for "node1" ... done (id=1)Checking for global CFDEVices global file system ... doneUpdating vfstab ... doneVerifying that NTP is configured ... doneInitializing NTP configuration ... doneUpdating nsswitch.conf ... doneAdding cluster node entries to /etc/inet/hosts ... doneConfiguring IP multipathing groups ...doneEnsure that the EEPROM parameter "local-mac-address?" is set to "true" ... doneEnsure network routing is disabled ... doneNetwork routing has been disabled on this node by creating /etc/notrouter. Having a cluster node act as a router is not supported by Oracle Solaris Cluster. Please do not re-enable network routing.Log file - /var/cluster/logs/install/scinstall.log.1686Rebooting ...节点2-node2root@Node2 # scinstall*** Main Menu ***Please select from one of the following (*) options:* 1) Create a new cluster or add a cluster node2) Configure a cluster to be JumpStarted from this install server3) Manage a dual-partition upgrade4) Upgrade this cluster node* 5) Print release information for this cluster node* ?) 
Help with menu options* q) QuitOption: 1*** New Cluster and Cluster Node Menu ***Please select from any one of the following options:1) Create a new cluster2) Create just the first node of a new cluster on this machine3) Add this machine as a node in an existing cluster?) Help with menu optionsq) Return to the Main MenuOption: 3*** Add a Node to an Existing Cluster ***This option is used to add this machine as a node in an already established cluster. If this is a new cluster, there may only be asingle node which has established itself in the new cluster.Before you select this option, the Oracle Solaris Cluster framework software must already be installed. Use the Oracle Solaris Cluster installation media or the IPS packaging system to install Oracle Solaris Cluster software.Press Control-d at any time to return to the Main Menu.Do you want to continue (yes/no) [yes]?>>> Typical or Custom Mode <<<This tool supports two modes of operation, Typical mode and Custom.For most clusters, you can use Typical mode. However, you might need to select the Custom mode option if not all of the Typical defaultscan be applied to your cluster.For more information about the differences between Typical and Custom modes, select the Help option from the menu.Please select from one of the following options:1) Typical2) Custom?) Helpq) Return to the Main MenuOption [1]: 2>>> Sponsoring Node <<<For any machine to join a cluster, it must identify a node in thatcluster willing to "sponsor" its membership in the cluster. When configuring a new cluster, this "sponsor" node is typically the firstnode used to build the new cluster. However, if the cluster is already established, the "sponsoring" node can be any node in that cluster.Already established clusters can keep a list of hosts which are ableto configure themselves as new cluster members. This machine should be in the join list of any cluster which it tries to join. If the listdoes not include this machine, you may need to add it by using claccess(1CL) or other tools.And, if the target cluster uses DES to authenticate new machines attempting to configure themselves as new cluster members, the necessary encryption keys must be configured before any attempt to join.What is the name of the sponsoring node? Node1 1>>> Cluster Name <<<Each cluster has a name assigned to it. When adding a node to the cluster, you must identify the name of the cluster you are attemptingto join. A sanity check is performed to verify that the "sponsoring"node is a member of that cluster.What is the name of the cluster you want to join? crm-clusterAttempting to contact "Node1" ... doneCluster name "crmjkdb_cluster" is correct.Press Enter to continue:>>> Check <<<This step allows you to run cluster check to verify that certain basic hardware and software pre-configuration requirements have been met. If cluster check detects potential problems with configuring this machineas a cluster node, a report of violated checks is prepared andavailable for display on the screen.Do you want to run cluster check (yes/no) [no]?>>>Autodiscovery of Cluster Transport <<<If you are using Ethernet or Infiniband adapters as the cluster transport adapters, autodiscovery is the best method for configuringthe cluster transport.Do you want to use autodiscovery (yes/no) [yes]?Probing ......................The following connection was discovered:Node1:bge1 -Node1:bge1Probes were sent out from all transport adapters configured forcluster node "Node1". 
But, they were only received on less than 2of the network adapters on this machine ("Node1"). This may be dueto any number of reasons, including improper cabling, an improper configuration for "Node1", or a switch which was confused by theprobes.You can either attempt to correct the problem and try the probes again or manually configure the transport. To correct the problem mightinvolve re-cabling, changing the configuration for "Node1", orfixing hardware. You must configure the transport manually toconfigure tagged VLAN adapters and non tagged VLAN adapters on the same private interconnect VLAN.Do you want to try again (yes/no) [yes]?Probing .........The following connections were discovered:Node1:net3 -Node1:net3Node1:net7 -Node1:net7Is it okay to configure these connections (yes/no) [yes]?>>> Global Devices File System <<<Each node in the cluster must have a local file system mounted on/global/.CFDEVices/node@<nodeID> before it can successfully participate as a cluster member. Since the "nodeID" is not assigned untilscinstall is run, scinstall will set this up for you.You must supply the name of either an already-mounted file system or a raw disk partition which scinstall can use to create the globalCFDEVices file system. This file system or partition should be at least 512 MB in size.Alternatively, you can use a loopback file (lofi), with a new file system, and mount it on /global/.CFDEVices/node@<nodeid>.If an already-mounted file system is used, the file system must be empty. If a raw disk partition is used, a new file system will becreated for you.If the lofi method is used, scinstall creates a new 100 MB file system from a lofiCFDEVice by using the file /.globalCFDEVices. The lofi methodis typically preferred, since it does not require the allocation of a dedicated disk slice.The default is to use /globalCFDEVices.Is it okay to use this default (yes/no) [yes]?>>> Automatic Reboot <<<Once scinstall has successfully initialized the Oracle Solaris Cluster software for this machine, the machine must be rebooted. The rebootwill cause this machine to join the cluster for the first time.Do you want scinstall to reboot for you (yes/no) [yes]?>>> Confirmation <<<Your responses indicate the following options to scinstall:scinstall -i \-C crmjkdb_cluster \-N Node1 \-A trtype=dlpi,name=bge0 -A trtype=dlpi,name=bge1 \-B type=direct \-m endpoint=:bge0,endpoint=Node1:bge0 \-m endpoint=:bge1,endpoint=Node1:bge1Are these the options you want to use (yes/no) [yes]?Do you want to continue with this configuration step (yes/no) [yes]?Checking CFDEVice to use for global CFDEVices file system ... doneAdding node "Node1" to the cluster configuration ... doneAdding adapter "bge0" to the cluster configuration ... doneAdding adapter "bge1" to the cluster configuration ... doneAdding cable to the cluster configuration ... doneAdding cable to the cluster configuration ... doneCopying the config from "Node1" ... doneCopying the postconfig file from "Node1" if it exists ... doneNo postconfig file found on "Node1", continuingSetting the node ID for "Node1" ... done (id=2)Verifying the major number for the "did" driver with "Node1" ... doneChecking for global CFDEVices global file system ... doneUpdating vfstab ... doneVerifying that NTP is configured ... doneInitializing NTP configuration ... doneUpdating nsswitch.conf ... doneAdding cluster node entries to /etc/inet/hosts ... doneConfiguring IP multipathing groups ...doneEnsure that the EEPROM parameter "local-mac-address?" is set to "true" ... 
done
Ensure network routing is disabled ... done
Network routing has been disabled on this node by creating /etc/notrouter. Having a cluster node act as a router is not supported by Oracle Solaris Cluster. Please do not re-enable network routing.
Updating file ("ntp.conf.cluster") on node node1 ... done
Updating file ("hosts") on node Node1 ... done
Log file - /var/cluster/logs/install/scinstall.log.1561
Rebooting ...
2. Register the resource types:
# scrgadm -a -t SUNW.HAStoragePlus
# scrgadm -a -t SUNW.gds:6
Note: these registrations only need to be run on one node.
# scrgadm -pv   (check that they are registered)
3. Add the quorum device
# devfsadm -C
# scdidadm -C
# scdidadm -r
# scdidadm -ui
# clquorum add d7
# scconf -c -q reset
4. Create the resource group and IP resource
# clsetup
*** Main Menu ***
Please select from one of the following options:
1) Quorum
2) Resource groups
3) Data Services
4) Cluster interconnect
5) Device groups and volumes
6) Private hostnames
7) New nodes
8) Zone Cluster
9) Other cluster tasks
?) Help with menu options
q) Quit
Option:
Option: 2
*** Resource Group Menu ***
Please select from one of the following options:
1) Create a resource group
2) Add a network resource to a resource group
3) Add a data service resource to a resource group
4) Resource type registration
5) Online/Offline or Switchover a resource group
6) Suspend/Resume recovery for a resource group
7) Enable/Disable a resource
8) Change properties of a resource group
9) Change properties of a resource
10) Remove a resource from a resource group
11) Remove a resource group
12) Clear the stop_failed error flag from a resource
?) Help
s) Show current status
q) Return to the main menu
Option: 1
>>> Create a Resource Group <<<
Use this option to create a new resource group. You can also use this option to create new resources for the new group.
A resource group is a container into which you can place resources of various types, such as network and data service resources. The cluster uses resource groups to manage its resource types. There are two types of resource groups, failover and scalable.
Only failover resource groups may contain network resources. A network resource is either a LogicalHostname or SharedAddress resource.
It is important to remember that each scalable resource group depends upon one or more failover resource groups which contains one or more SharedAddress network resources.
Is it okay to continue (yes/no) [yes]?
Select the type of resource group you want to add:
1) Failover Group
2) Scalable Group
Option: 1
What is the name of the group you want to add? data-rg
Do you want to add an optional description (yes/no) [yes]? no
Because this cluster has two nodes, the new resource group will be configured to be hosted by both cluster nodes.
At this time, you may select one node to be the preferred node for hosting this group. Or, you may allow the system to select a preferred node on an arbitrary basis.
Do you want to specify a preferred node (yes/no) [yes]?
Select the preferred node or zone for hosting this group:
1) Node1
2) Node2
Option: 1
Some types of resources (for example, HA for NFS) require the use of an area in a global file system for storing configuration data.
If anyof the resources that will be added to this group require such support, you can specify the full directory path name now.Do you want to specify such a directory now (yes/no) [no]?Is it okay to proceed with the update (yes/no) [yes]?/usr/cluster/bin/clresourcegroupcreate -n Node1,Node2 data-rgCommand completed successfully.Press Enter to continue: Jun 10 19:37:30 Node1 cl_runtime: NOTICE: Received non-interrupt heartbeat on Node1:net7 - Node2:net7.Do you want to add any network resources now (yes/no) [yes]?Select the type of network resource you want to add:1) LogicalHostname2) SharedAddressOption: 1If a failover resource group contains LogicalHostname resources, the most common configuration is to have one LogicalHostname resource for each subnet.How many LogicalHostname resources would you like to create [1]?Each network resource manages a list of one or more logical hostnames for a single subnet. This is true whether the resource is aLogicalHostname or SharedAddress resource type. The most commonconfiguration is to assign a single logical hostname to each networkresource for each subnet. Therefore, clsetup(1M) only supports thisconfiguration. If you need to support more than one hostname for agiven subnet, add the additional support using clresourcegroup(1M). Before clsetup(1M) can create a network resource for any logical hostname, that hostname must be specified in the hosts(4) file on eachnode in the cluster. In addition, the required network adapters mustbe actively available on each of the nodes.What logical hostname do you want to add? CFSAPIs it okay to proceed with the update (yes/no) [yes]?/usr/cluster/bin/clreslogicalhostname create -g data-rg -p R_description="LogicalHostname resource for CFSAP" CFSAPclreslogicalhostname: Failed to retrieve netmask for the given hostname(s)/IP(s). Will try again when the resource being brought online.Command completed successfully.Press Enter to continue:Do you want to add any additional network resources (yes/no) [no]?Do you want to add any data service resources now (yes/no) [yes]? noDo you want to manage and bring this resource group online now (yes/no) [yes]?/usr/cluster/bin/clresourcegroup online -M data-rgCommand completed successfully.Press Enter to continue:*** Resource Group Menu ***Please select from one of the following options:1) Create a resource group2) Add a network resource to a resource group3) Add a data service resource to a resource group4) Resource type registration5) Online/Offline or Switchover a resource group6) Suspend/Resume recovery for a resource group7) Enable/Disable a resource。
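The same resource-group work can be done without the clsetup menus. A minimal sketch using the command-line equivalents that clsetup itself runs in the transcript above (data-rg and the logical hostname CFSAP are the names used in this example; CFSAP must already resolve in /etc/inet/hosts on both nodes):
# clresourcegroup create -n Node1,Node2 data-rg
# clreslogicalhostname create -g data-rg CFSAP
# clresourcegroup online -M data-rg
# clresourcegroup status data-rg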
Usage and collocations of "cluster"
I. Basic usage: "cluster" can be used as a noun or as a verb.
1. As a noun it means "group, bunch, clump", e.g. a cluster of stars; a cluster of grapes. It can also mean "a group (of people or things)", e.g. a cluster of tourists.
2. As a verb it means "to gather, to crowd together", e.g. People clustered around the stage.
II. Fixed collocations
1. "cluster together": to gather together, e.g. The sheep clustered together to keep warm.
2. "cluster around/round": to surround, to crowd around, e.g. The children clustered around the magician, eager to see his next trick.
III. Example sentences
1. I saw a beautiful cluster of wildflowers in the meadow.
2. A cluster of balloons floated into the sky.
3. They found a cluster of ancient ruins in the forest.
4. The bees clustered on the hive.
5. Look! There's a cluster of small shops over there.
6. She noticed a cluster of people whispering in the corner.
Sun Cluster 3.1 Two-Node Command Introduction
Sun Cluster 3.1 Two-Node Command Introduction
Contents
Sun Cluster 3.1 Two-Node Command Introduction
1 The scswitch command
1.1 scswitch -z -g
1.2 scswitch -z -D
1.3 scswitch -S -h
1.4 scswitch -R -h
1.5 scswitch -m -D
1.6 Combined command 1
1.7 Combined command 2
1.8 Combined command 3
1.9 Combined command 4
2 The scrgadm command
3 The scstat command
4 Shutdown commands
4.1 Shutting down both nodes
4.2 Shutting down a single node
5 Booting a node into non-cluster mode
Before introducing the commands, the following concepts are explained:
node: one of the two servers that make up the cluster pair.
resource_grp: resource group. Resources that must run together on the same machine are placed in one group so that they switch over together during a failover.
1 The scswitch command
1.1 scswitch -z -g
Command: scswitch -z -g resource_grp -h node
Description: moves the resource group resource_grp to the new node node.
1.2 scswitch -z -D
Command: scswitch -z -D device_group_name[,...] -h node
Description: moves the device group (for example a disk-array group) device_group_name to node node.
1.3 scswitch -S -h
Command: scswitch -S -h from_node
Description: shuts down all resources on node from_node. All of that node's resources then switch to the other node.
1.4 scswitch -R -h
Command: scswitch -R -h node[,...] -g resource_grp[,...]
Description: restarts the resource group resource_grp on node.
SUN System Training Document
Basic BootPROM Commands
Command: boot - starts the system and boots it into the OS environment.
Format: boot device [filename] [options]
device: disk, net, cdrom
About the options:
-a  Performs an interactive boot; you can specify the root and swap devices used at boot time and some important files such as /etc/system.
-r  Performs a reconfiguration boot. The system probes all devices and updates /devices, /dev, and /etc/path_to_inst.
-s  Boots the system to run level S (single-user state). This is very useful when dealing with system boot problems.
-v  Displays more detailed information during boot. This boot mode can sometimes help in diagnosing boot problems.
Systems with multiple system boards
Basic BootPROM Commands
Command: banner - lists essential information such as the OBP version, total memory, and the hostid.
Ok banner
Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 300MHz), Keyboard Present
OpenBoot 3.11, 256 MB memory installed, Serial #10088994.
Ethernet address 8:0:20:99:f2:22, Host ID: 8099f222.
Ok
Resetting all variables to their default values
The set-defaults command resets all parameters in NVRAM to their default values:
ok set-defaults
Setting NVRAM parameters to default values.
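A related sketch using standard OpenBoot commands (not from the original slides): before resetting everything, individual NVRAM variables can be inspected and changed one at a time.
ok printenv auto-boot?        (show the current and default value of one variable)
ok setenv auto-boot? false    (change a single variable)
ok set-default auto-boot?     (restore just that one variable to its default)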
Sun Cluster 3.1 Installation Steps
Installing Sun Cluster 3.1
1. Environment preparation
Hardware: two Sun SPARC servers and one Sun storage array
Software: Solaris 10 11/06, Sun Cluster 3.1, EIS DVD 3.0.14'
Host names of the two servers: ecard-master, ecard-standby
Plan: install Oracle
Points to note:
1. Servers (whether SPARC or x86) should not have a keyboard, mouse, and monitor attached. The Sun Cluster software has a fairly obvious bug: if a server has a keyboard, mouse, and monitor attached and Sun Cluster is installed, ordinary operations can easily trigger pm_tick timeout errors, the system panics and then reboots, and the file systems may be damaged.
2. The OS versions on the two servers must be identical, otherwise problems can occur.
3. ZFS is not supported in Sun Cluster 3.1; it is only supported from Sun Cluster 3.2.
4. If Solaris 10 is installed, then when the installer asks whether network services should be enabled, you must choose enable; otherwise, during Sun Cluster configuration, the second server cannot detect the first one. If network services were mistakenly disabled during installation, they can be re-opened with the # netservices open command. If they are not opened, services such as the X server and WBEM can only be reached locally, not over the network.
5. Decide whether the IPMP configuration will use test addresses. If test addresses are used, the cluster needs four extra IP addresses and the gateway must answer ping, otherwise the whole server's public network fails. This adds complexity, and if the gateway has a problem the cluster becomes unusable. Without test addresses the configuration is simpler, complexity is reduced, and fewer IP addresses are needed, but during boot the server will report that there is no test address and disable test-address probing.
6. When choosing network ports for the public and private networks, spread both the public and the private network across different physical network cards wherever possible; this increases availability.
7. If the storage is a SCSI device, then when one server reboots, SCSI device warnings will appear in the /var/adm/messages file on the other server.
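As a sketch of point 5 (assumed interface name ce0 and the ecard-master hostname from this example; the real interface names depend on the hardware), a single-adapter IPMP group without test addresses on Solaris 10 only needs the group keyword in the interface's hostname file:
# cat /etc/hostname.ce0
ecard-master netmask + broadcast + group sc_ipmp0 up
After a reboot (or an ifconfig with the same options), the adapter joins the IPMP group and Sun Cluster can monitor the public network through that group without any test addresses.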
SUN Server System Administration
System administration tools
In the graphical user interface (GUI) you can use the window/menu based administration tool smc, which can manage users, software packages, serial ports, printers, and so on.
Character-terminal mode: the console, serial terminals, and network terminal sessions:
Example: login: root
Password:
Last login: Thu Feb 24 14:03:36 from 10.72.88.88
➢ 1973
The 4th edition of UNIX appeared, with the kernel and shell rewritten in C.
➢ 1982
Bill Joy left Berkeley and co-founded Sun, which developed and released SunOS based on BSD, and later released Solaris.
Common SUN Solaris versions and kernels
OS versions and kernel architectures:
➢ 1. SPARC platform
Solaris 2.7 (7) 64-bit + 32-bit
Solaris 5.8 (8) 64-bit + 32-bit
Solaris 5.9 (9) 64-bit + 32-bit
➢ 2. Intel x86 platform
Solaris 5.8 (8) 32-bit
Solaris 5.9 (9) 32-bit
Installation steps: insert the installation CD into the CD-ROM drive and, at the ok prompt, type
ok >boot cdrom
which boots from the CD-ROM and starts the OS installation.
Then set the system parameters as prompted by the installer.
System startup and shutdown
Starting a Solaris system:
Common ways to start the host system:
– Power on the host; the system boots automatically.
– Ok > boot
Boots the system from the default device.
May
Sun Microsystems Inc. SunOS 5.9
Installation and Troubleshooting Template (metaset recovery)
Template number: UEC/D-TSC-0014-1210  Metaset group data recovery  Document classification: public
北京荣之联科技股份有限公司 - all rights reserved; infringement will be prosecuted.
I. Document classification: public
II. Document ID
III. Document name: Metaset group recovery
IV. Abstract: Metaset SunCluster Recover
V. Updated/written: 2012-11-5
VI. Products involved: SunCluster 3.x, SVM
VII. Technical areas involved: SunCluster 3.x, SVM
VIII. Reference documents
IX. [Brief problem description]
1. Sun Cluster cannot import the original disk set.
2. None of the Sun Cluster nodes can boot; another stand-alone system is used to take over the storage.
3. Migrating from a single node to a Sun Cluster environment while keeping the existing metaset data.
4. If a metaset group is renamed and rebuilt in the way described here, the data is not lost either.
X. [Original problem or fault information]
Sun Cluster cannot import the original disk set. None of the Sun Cluster nodes can boot; another stand-alone system is used to take over the storage.
XI. [Problem analysis and cause]
After a metaset group is created it is registered in Sun Cluster automatically. Once DID devices have been added to the metaset, the data lives on slice 0 of the DID devices, while the metadb holds only state information. If the metadb is lost, you only need to recreate the metadb and then reconfigure the metaset in the same way and in the same order as before; the data is then recovered.
XII. [Solution / workaround / not solvable]
1. Save the current node's metaset information, and be sure to keep an explorer output.
# /usr/sbin/prtvtoc /dev/global/rdsk/d#s2 > /etc/lvm/d#.vtoc
# /bin/cp /etc/lvm/md.tab /etc/lvm/md.tab_ORIGINAL
# /usr/sbin/metastat -p -s [setname] >> /etc/lvm/md.tab
2. Remove this metaset group's configuration from Sun Cluster.
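A minimal sketch of the rebuild that the analysis in XI describes (hypothetical set name oradg and DID devices d5/d6; the disks must be added back in the same order as before, and the md.tab saved in step 1 supplies the metadevice layout):
# metaset -s oradg -a -h node1 node2                       (recreate the set with both hosts)
# metaset -s oradg -a /dev/did/rdsk/d5 /dev/did/rdsk/d6    (add the same DID disks in the original order)
# metainit -s oradg -a                                     (rebuild the metadevices from /etc/lvm/md.tab)
# metastat -s oradg -p                                     (compare against the saved configuration)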
Sun Cluster: three quorum disks damaged and the nodes cannot join the cluster - recovery procedure
Three quorum disks are damaged, so not enough votes can be obtained to form the cluster.
1. Boot all nodes into single-user mode.
2. Edit the infrastructure file and remove all of the damaged quorum disks.
/etc/cluster/ccr/global/infrastructureroot@guosol10a3#cat infrastructureccr_gennum 28ccr_checksum FC57DE024BB398C8C3F70C95E02EF969 guohuiclcluster.state enabledcluster.properties.cluster_id 0x4F5E0FF9cluster.properties.installmode disabledcluster.properties.private_net_number 172.16.0.0cluster.properties.cluster_netmask 255.255.240.0cluster.properties.private_netmask 255.255.248.0cluster.properties.private_subnet_netmask 255.255.255.128 cluster.properties.private_user_net_number 172.16.4.0cluster.properties.private_user_netmask 255.255.254.0cluster.properties.private_maxnodes 64cluster.properties.private_maxprivnets 10cluster.properties.zoneclusters 12cluster.properties.auth_joinlist_type syscluster.properties.auth_joinlist_hostslist .cluster.properties.transport_heartbeat_timeout 10000cluster.properties.transport_heartbeat_quantum 1000cluster.properties.udp_session_timeout 480cluster.properties.cmm_version 1 guosol10a4cluster.nodes.1.state enabledcluster.nodes.1.properties.private_hostname clusternode1-priv cluster.nodes.1.properties.quorum_vote 1cluster.nodes.1.properties.quorum_resv_key 0x4F5E0FF900000001 e1000g1cluster.nodes.1.adapters.1.state enabledcluster.nodes.1.adapters.1.properties.device_name e1000g cluster.nodes.1.adapters.1.properties.device_instance 1cluster.nodes.1.adapters.1.properties.transport_type dlpizy_free 1cluster.nodes.1.adapters.1.properties.dlpi_heartbeat_timeout 10000 cluster.nodes.1.adapters.1.properties.dlpi_heartbeat_quantum 1000 cluster.nodes.1.adapters.1.properties.nw_bandwidth 80cluster.nodes.1.adapters.1.properties.bandwidth 70cluster.nodes.1.adapters.1.properties.ip_address 172.16.0.129 mask 255.255.255.128 0cluster.nodes.1.adapters.1.ports.1.state enabled e1000g2cluster.nodes.1.adapters.2.state enabledcluster.nodes.1.adapters.2.properties.device_name e1000g cluster.nodes.1.adapters.2.properties.device_instance 2cluster.nodes.1.adapters.2.properties.transport_type dlpizy_free 1cluster.nodes.1.adapters.2.properties.dlpi_heartbeat_timeout 10000cluster.nodes.1.adapters.2.properties.dlpi_heartbeat_quantum 1000 cluster.nodes.1.adapters.2.properties.nw_bandwidth 80cluster.nodes.1.adapters.2.properties.bandwidth 70cluster.nodes.1.adapters.2.properties.ip_address 172.16.1.1 mask 255.255.255.128 0cluster.nodes.1.adapters.2.ports.1.state enabledcluster.nodes.1.cmm_version 1 guosol10a3cluster.nodes.2.state enabledcluster.nodes.2.properties.quorum_vote 1cluster.nodes.2.properties.quorum_resv_key 0x4F5E0FF900000002 cluster.nodes.2.properties.private_hostname clusternode2-priv e1000g1cluster.nodes.2.adapters.1.properties.device_name e1000g cluster.nodes.2.adapters.1.properties.device_instance 1cluster.nodes.2.adapters.1.properties.transport_type dlpizy_free 1cluster.nodes.2.adapters.1.properties.dlpi_heartbeat_timeout 10000 cluster.nodes.2.adapters.1.properties.dlpi_heartbeat_quantum 1000 cluster.nodes.2.adapters.1.properties.nw_bandwidth 80cluster.nodes.2.adapters.1.properties.bandwidth 70cluster.nodes.2.adapters.1.properties.ip_address 172.16.0.130 mask 255.255.255.128 cluster.nodes.2.adapters.1.state enabled 0cluster.nodes.2.adapters.1.ports.1.state enabled e1000g2cluster.nodes.2.adapters.2.properties.device_name e1000g cluster.nodes.2.adapters.2.properties.device_instance 2cluster.nodes.2.adapters.2.properties.transport_type dlpizy_free 1cluster.nodes.2.adapters.2.properties.dlpi_heartbeat_timeout 10000 cluster.nodes.2.adapters.2.properties.dlpi_heartbeat_quantum 1000 cluster.nodes.2.adapters.2.properties.nw_bandwidth 
80cluster.nodes.2.adapters.2.properties.bandwidth 70cluster.nodes.2.adapters.2.properties.ip_address 172.16.1.2 mask 255.255.255.128 cluster.nodes.2.adapters.2.state enabled 0cluster.nodes.2.adapters.2.ports.1.state enabledcluster.nodes.2.cmm_version 1 switch1cluster.blackboxes.1.state enabledcluster.blackboxes.1.properties.type switch 1cluster.blackboxes.1.ports.1.state enabled 2cluster.blackboxes.1.ports.2.state enabled switch2cluster.blackboxes.2.state enabledcluster.blackboxes.2.properties.type switch 1cluster.blackboxes.2.ports.1.state enabled 2cluster.blackboxes.2.ports.2.state enabledcluster.cables.1.properties.end1 cluster.nodes.1.adapters.1.ports.1 cluster.cables.1.properties.end2 cluster.blackboxes.1.ports.1 cluster.cables.1.state enabledcluster.cables.2.properties.end1 cluster.nodes.1.adapters.2.ports.1 cluster.cables.2.properties.end2 cluster.blackboxes.2.ports.1 cluster.cables.2.state enabledcluster.cables.3.properties.end1 cluster.nodes.2.adapters.1.ports.1 cluster.cables.3.properties.end2 cluster.blackboxes.1.ports.2 cluster.cables.3.state enabledcluster.cables.4.properties.end1 cluster.nodes.2.adapters.2.ports.1 cluster.cables.4.properties.end2 cluster.blackboxes.2.ports.2 cluster.cables.4.state enabledcluster.quorum_ d4cluster.quorum_devices.1.state enabledcluster.quorum_devices.1.properties.votecount 1cluster.quorum_devices.1.properties.gdevname /dev/did/rdsk/d4s2 cluster.quorum_devices.1.properties.path_1 enabledcluster.quorum_devices.1.properties.path_2 enabledcluster.quorum_devices.1.properties.access_mode scsi2cluster.quorum_devices.1.properties.type shared_disk cluster.quorum_ d5cluster.quorum_devices.2.state enabledcluster.quorum_devices.2.properties.votecount 1cluster.quorum_devices.2.properties.gdevname /dev/did/rdsk/d5s2 cluster.quorum_devices.2.properties.path_1 enabledcluster.quorum_devices.2.properties.path_2 enabledcluster.quorum_devices.2.properties.access_mode scsi2cluster.quorum_devices.2.properties.type shared_disk cluster.quorum_ d6cluster.quorum_devices.3.state enabledcluster.quorum_devices.3.properties.votecount 1cluster.quorum_devices.3.properties.gdevname /dev/did/rdsk/d6s2cluster.quorum_devices.3.properties.path_1 enabledcluster.quorum_devices.3.properties.path_2 enabledcluster.quorum_devices.3.properties.access_mode scsi2cluster.quorum_devices.3.properties.type shared_diskcluster.quorum_ d7cluster.quorum_devices.4.state enabledcluster.quorum_devices.4.properties.votecount 1cluster.quorum_devices.4.properties.gdevname /dev/did/rdsk/d7s2cluster.quorum_devices.4.properties.path_1 enabledcluster.quorum_devices.4.properties.path_2 enabledcluster.quorum_devices.4.properties.access_mode scsi2cluster.quorum_devices.4.properties.type shared_disk把损坏的硬盘都删除掉d4,d5,d6在guosol10a3,guosol10a4 上都这样。
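Once the damaged quorum entries have been removed and the nodes can form the cluster again, a new quorum device should be configured so the two-node cluster is not left running without one. A sketch built from commands that appear elsewhere in this document (d8 is a hypothetical healthy shared DID device):
# scdidadm -L                  (pick a healthy shared disk seen by both nodes)
# scconf -a -q globaldev=d8    (register it as the new quorum device)
# scconf -c -q reset           (reset the quorum configuration if needed)
# scstat -q                    (confirm the expected votes are present)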
SUN 3510 Disk Fault Handling Procedure
SUN 3510 disk failure - disk replacement procedure. Replacing a disk in a 3510 JBOD in a Sun Cluster environment: the environment is a Sun Cluster made up of one 3510 JBOD plus two V890 servers, and when a disk in the 3510 JBOD fails it is not easy to locate the slot of the faulty disk.
The 3510 JBOD was not designed for stand-alone use; it is normally an expansion tray for a 3510 RAID, attached to and controlled by the RAID controller. In that case, when a disk fails, target IDs such as 0-11 and 16-31 map directly to the disk slots, which makes handling straightforward.
In addition, many disks carry a label with the serial number on the tray handle, so the disk position can also be found with the iostat -En command; on the 3510, however, all the disk serial numbers happen to be stuck on the disk body, where the handle covers them.
When the tray is used as a JBOD, the order in which the operating system recognizes the disks almost never matches the slot numbers, which makes identification difficult.
Because a failing disk can behave in many different ways, the amber LED does not necessarily light, and even when it does, it does not necessarily mean that disk is the one to replace or the one that has failed, so work very carefully.
Before doing anything, make sure the integrator or the customer has backed up all important data.
The 3510 JBOD with the failed disk is fully populated with 12 x 146 GB disks: ten disks form a RAID 0+1, one is a hot spare, and one is the quorum disk.
At the moment the green LEDs of disks 0, 2, 3, 4, 5, 7, 8, 10 and 11 are blinking, indicating read/write activity - nine disks in total.
If the hot spare had taken over for a failed disk, ten disks should be blinking.
In addition, disks 1 and 9 are solid green (not blinking) and disk 6 shows an amber LED. Before any disk is replaced, the relationship between these three disks must be established: which is the failed disk, which is the quorum (voting) disk, and which is the hot spare (a hot spare that has taken over should be blinking with I/O).
Normal disk ordering: see the chart below for an array with a boxid set to 0.
targets
0 3 6 9
1 4 7 10
2 5 8 11
If the boxid is modified, just subtract (16*boxid) from the target to get the position in the chart:
boxid = 1 target from 16 to 27
boxid = 2 target from 32 to 43
boxid = 3 target from 48 to 59
boxid = 4 target from 64 to 75
Current SVM RAID group configuration on the system:
root@JT-SJ3-APP1 # metastat -s se3510 -t
se3510/d209: Soft Partition
Device: se3510/d100
State: Okay
Size: 104857600 blocks (50 GB)
Extent Start Block Block count
0 943718720 104857600
se3510/d100: Mirror
Submirror 0: se3510/d101
State: Okay Tue Sep 29 05:31:31 2009
Submirror 1: se3510/d102
State: Okay Thu Sep 15 06:58:50 2011
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 1433391360 blocks (683 GB)
se3510/d101: Submirror of se3510/d100
State: Okay Tue Sep 29 05:31:31 2009
Hot spare pool: se3510/hsp001
Size: 1433391360 blocks (683 GB)
Stripe 0: (interlace: 32 blocks)
Device Start Dbase State Reloc Hot Spare Time
d24s0 0 No Okay No Tue Sep 29 05:31:31 2009
d8s0 0 No Okay No Fri May 23 15:09:55 2008
d23s0 0 No Okay No Sat Mar 7 05:39:38 2009
d10s0 0 No Okay No Fri May 23 15:09:55 2008
d11s0 0 No Okay No Fri May 23 15:09:55 2008
se3510/d102: Submirror of se3510/d100
State: Okay Thu Sep 15 06:58:50 2011
Hot spare pool: se3510/hsp001
Size: 1433391360 blocks (683 GB)
Stripe 0: (interlace: 32 blocks)
Device Start Dbase State Reloc Hot Spare Time
d12s0 0 No Okay No Fri May 23 15:09:59 2008
d13s0 0 No Okay No d17s0 Thu Sep 15 06:58:50 2011
d14s0 0 No Okay No Fri May 23 15:09:59 2008
d7s0 0 No Okay No Wed Dec 29 23:02:40 2010
d16s0 0 No Okay No Fri May 23 15:09:59 2008
root@JT-SJ3-APP2 # scstat -q
-- Quorum Summary --
Quorum votes possible: 3
Quorum votes needed: 2
Quorum votes present: 3
-- Quorum Votes by Node --
Node Name Present Possible Status
--------- ------- -------- ------
Node votes: JT-SJ3-APP1 1 1 Online
Node votes: JT-SJ3-APP2 1 1 Online
-- Quorum Votes by Device --
Device Name Present Possible Status
----------- ------- -------- ------
Device votes: /dev/did/rdsk/d6s2 1 1 Online
When dd was used to read the data on d6s2, the disk in slot 1 blinked, and when the dd was interrupted with Ctrl+C, disk 1 immediately stopped blinking; this shows that slot 1 corresponds to the quorum disk d6s2. When dd was then used to read every remaining DID device, disk 6 stayed amber and never blinked, disk 9 never blinked either, and the other nine disks all blinked continuously; this shows that either disk 6 or disk 9 must be the failed disk.
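A sketch of the dd trick described above (d6s2 is the quorum DID device from this example; reading the raw device generates I/O, so the corresponding slot LED blinks, and Ctrl+C stops it):
# dd if=/dev/did/rdsk/d6s2 of=/dev/null bs=1024k
Repeating the same command against each /dev/did/rdsk/d#s2 in turn, and watching which slot LED starts blinking, maps DID device names to physical slots without touching any data.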
Sun Cluster System Fault Report
XX City Television Station - Sun and Oracle Database System Fault Report, No. 20071115, November 2007, Sichuan XX Information Industry Co., Ltd.
Contents: Chapter 1 Platform composition (1.1 Hardware platform, 1.2 Operating system platform, 1.3 Database platform, 1.4 Cluster software); Chapter 2 Symptoms and analysis (2.1 Symptom description, 2.2 System error logs, 2.3 Symptom analysis); Chapter 3 Fault investigation; Chapter 4 Fault characterization and localization; Chapter 5 Fault prevention, elimination, and implementation conditions
Preface: At 4 pm on 15 November 2007 the system stopped working normally.
Engineers from Sichuan XX Information Industry Co., Ltd., together with the relevant technical staff of the XX Television Station information center, collected detailed information on the Sun servers, the two disk arrays, the two databases, the Sun Cluster system and RAC, and carried out fault analysis and investigation.
A summary follows. Chapter 1 Platform composition. 1.1 The hardware platform consists of two Sun Fire V490 servers and two Sun StorEdge 3310 disk arrays.
The topology is shown below. Server configuration - each Sun Fire V490: 2 CPUs; CPU type Sun UltraSPARC IV dual-core; CPU speed 1050 MHz with 16 MB L2 cache per CPU; 8 GB of memory; two internal 72 GB FC-AL disks; a DVD drive; four Gigabit Ethernet ports and one dual-channel Ultra320 SCSI card.
Two power supplies.
(Topology: two V490 servers, Ethernet IPMP, SCSI cables to the 3310 disk arrays.) Disk array configuration: two Sun StorEdge 3310 SCSI arrays, each configured as follows: a single array controller with 512 MB cache; 5 x 72 GB 10K rpm SCSI disks on channel 0, IDs 8-12, of which IDs 8-11 form a RAID 5 volume and disk 12 is the hot spare.
Two power supplies.
1.2 Operating system platform: the systems run the 64-bit Sun Solaris 9 operating system, SunOS 5.9, kernel version Generic_118558-17.
SUN Server Information Commands
The following commands are used to view information about SUN servers:
1. Check the model: the model of a Sun server is printed on the front panel, e.g. NETRA T 1125, Ultra 5, Ultra 10, and so on.
2. Check the number of CPUs. (Note: top is not reliable here, because top on Sun does not show the full CPU picture and its usage differs from HP.)
# top
CPU states: 99.3% idle, 0.1% user, 0.6% kernel, 0.0% iowait, 0.0% swap
This appears to show only one CPU. The correct method is to use dmesg | grep cpu, which shows the real number of CPUs.
3. Check memory
# dmesg | grep mem
mem = 2097152K (0x80000000)
avail mem = 2087739392
4. Check the number of disks
# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c0t0d0s2 sliced - - error
c0t0d0s7 simple c0t0d0s7 rootdg online
c1t0d0s2 sliced - - online
c1t1d0s2 sliced smpdg2 smpdg online
c1t2d0s2 sliced smpdbdg1 smpdbdg online
c2t0d0s2 sliced - - online
c2t1d0s2 sliced smpdg1 smpdg online
c2t2d0s2 sliced smpdbdg2 smpdbdg online
5. Check the file systems
# df -k
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s0 4032142 1050675 2941146 27% /
/proc 0 0 0 0% /proc
fd 0 0 0 0% /dev/fd
/dev/dsk/c0t0d0s6 7304977 29 7231899 1% /home
/dev/dsk/c0t0d0s5 4032142 402929 3588892 11% /opt
swap 3418392 32 3418360 1% /tmp
/vol/dev/dsk/c0t6d0/informix 201730 201730 0 100% /cdrom/informix
/dev/vx/dsk/smpdg/smpdg-stat 1055 9 941 1% /smpwork
/dev/vx/dsk/smpdg/lv_smp 17336570 128079 17035126 1% /sms
6. Check the location of volume groups and logical volumes
# cd /dev/vx/dsk/
Disk groups such as smpdg are all under this directory; enter a disk-group directory to see the logical volumes in that group.
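A more direct way to count and inspect CPUs on Solaris than grepping dmesg (a sketch; psrinfo and prtconf are part of the base OS):
# psrinfo            (one line per processor, with its state)
# psrinfo -v         (adds clock speed and processor type)
# psrinfo | wc -l    (quick processor count)
Similarly, prtconf | grep Memory reports the installed memory size directly.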
A Detailed Look at the Metro Fare-Clearing System
The Guangzhou Metro automatic fare collection (AFC) system and its clearing system have performed well in actual operation; the system has met its goal of uninterrupted operation and provides continuous, stable service to the whole Guangzhou Metro AFC system.
As information technology is applied more and more deeply, the demand for high availability in computer systems keeps growing; in particular, for database-backed business-critical enterprise systems, the goal is not only to protect the integrity of key business data but also to keep online applications running without interruption, or to recover them automatically in the shortest possible time.
Overview of the AFC system and its clearing system
The Guangzhou Metro automatic fare collection system (AFC) is an automated system based on computer, communications, networking, and automatic-control technology that covers the whole urban rail transit process of ticket sales, ticket checking, fare calculation, fare collection, clearing, and management.
At present the Guangzhou Metro AFC system is organized in five layers: tickets, station terminal equipment, station computer systems, line central computer systems, and the clearing system (as shown in the table).
The clearing system also connects the Guangzhou Metro AFC system with the citywide transit-card clearing system, and defines the technical requirements for ticket management, fare management, operations management, and system maintenance management.
It is mainly used for settlement and account apportionment between the various Guangzhou rail lines and with the bus system, banking system, and other related systems, and for the processing and statistical analysis of ticket transaction data.
It also provides operational management of the line automatic fare collection (AFC) equipment.
In the long term it is positioned as the clearing center and AFC operations-management center for the whole of Guangzhou and for intercity rail in the Pearl River Delta.
Solution choice and current system status
Several approaches to high availability are available, such as hardware-based fault-tolerant machines, clustering (two-node or multi-node cluster systems), and data replication.
The widely used cluster approach works roughly as follows: all critical business data of the nodes in a cluster is stored on a shared disk group, usually a disk array; when a failed node is replaced by another node, the data devices it managed (part of the shared disk group) are taken over; the timing of node replacement/takeover is decided by the monitoring software running in the cluster; each node runs a database management system that manages the data on the devices it controls; and client applications can use one or more of the cluster's database servers.
Replacing a node means switching the database management system processes running on that node; all of this happens in the background on the servers and is transparent to the front-end applications.
Installing and Configuring Hitachi HDLM in a Sun Cluster Environment
HDLM (Hitachi Dynamic Link Manager) is software developed by Hitachi for managing multipathed I/O devices; it optimizes and balances I/O load and ensures that an I/O device with redundant paths does not become unreadable because a single path fails. HDLM is mainly used with Hitachi's own mid-range and high-end storage, such as the HDS 9900 series.
Main HDLM functions:
o Load balancing
o Smooth path switchover
o Path failback: when a failed I/O path is repaired, HDLM puts the path back into service and it again takes part in failover and load balancing.
o Path health checking: HDLM automatically checks I/O path status to make sure every path is healthy.
o Error management
HDLM supports Sun Cluster, VERITAS Cluster, VxVM, SDS, SVM, and similar software environments, but not every I/O device can be managed by HDLM. HDLM can only manage sd or ssd devices that a Hitachi storage system maps into the OS. HDLM does not support the following I/O devices:
o Hitachi storage system command devices, such as CCI command devices
o sd or ssd devices mapped to the OS from non-Hitachi storage
o disks internal to the host
o non-disk devices such as tapes
o the system boot disk
When HDLM is installed and active, it removes the original sd or ssd logical devices and creates new logical device names. For example: the host has two paths to the Hitachi storage. Suppose one LU in the storage is seen by the host as c2t1d1s0 and c3t2d1s0 (because of multipathing, these two LUNs are actually the same LU in the array). Before HDLM is installed the OS uses those device names to access the storage; after HDLM is installed it deletes those device files and creates a new one, for example c4t1d1s0, which contains both physical paths to the storage.
When HDLM and VxVM DMP coexist in a cluster environment, DMP cannot be disabled, but the HDLM-managed I/O devices must be excluded at the DMP layer so that DMP sees only one primary path.
The following steps describe installing and configuring HDLM in a Sun Cluster environment.
Hardware environment: nodes: V890, each with two QLogic HBAs; storage: SE9980V, MicroCode: 21-07-29-00/00
SE9980V mode settings for SC 3.x: Mode 185 must be set to "on"; Mode 186 must not be set to "on".
Software environment: Cluster: SC 3.1 U4; VxVM: 4.0; HDLM: 05-09 (version 5.9)
Assigned LUNs: 6 LUNs, each with 2 paths; the corresponding logical device names in the OS are c2* and c4*:
# format
12. c2t50060E80039CA210d0 <HITACHI-OPEN-V-SUN-2114 cyl 36359 alt 2 hd 15 sec 128> /pci@9,600000/SUNW,qlc@1/fp@0,0/ssd@w50060e80039ca210,0
…
24. c4t50060E80039CA200d0 <HITACHI-OPEN-V-SUN-2114 cyl 36359 alt 2 hd 15 sec 128> /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w50060e80039ca200,0
…
The basic software installation/configuration order is:
1 Install/configure the cluster --> 2 Install Volume Manager and encapsulate the root disk --> 3 Install the VxVM ASL for Hitachi package --> 4 Install and configure HDLM --> 5 Exclude the HDLM-managed multipath I/O devices at the VxVM DMP layer
Steps 1 and 2 follow the normal procedure. After the cluster and VM installation are complete, the quorum disk is placed on a shared Hitachi LUN. The procedure below covers step 3 onward.
State of the system before HDLM is installed:
- The LUNs assigned from the storage to the host have been correctly recognized by the OS.
- The cluster and VxVM are installed.
- All LUNs are managed by VxVM DMP.
- The quorum device has been configured and install mode has been disabled.
step1. Install the VxVM ASL for Hitachi package.
# pkgadd -d . HTC-ASL
step2. The HDLM installation requires that all disks it will manage are unused (completely idle); in a cluster environment this means no SCSI keys may remain on those disks, so the quorum disk must be removed from the cluster and any SCSI keys on the other disks cleared. To remove the last quorum device in a cluster, follow these steps:
(1) On the primary node (assume it is node 1, here syclt1), put the cluster in install mode
# scconf -c -q installmode
(2) Bring the other node (node 2) down to the OBP
root@syclt2 # sync;sync;shutdown -y -g0 -i0
(3) On the primary node, put node 2 into maintenance state and remove the last quorum device
# scconf -c -q node=<node2>,maintstate
# scconf -r -q globaldev=d36
(4) Boot node 2 and let it rejoin the cluster
The cluster state is then:
step3. Install the HDLM 5.9 packages on both nodes. It is recommended to save the HDLM license file as /var/tmp/hdlm_license.
step4. On each node run the dlmsetconf command to configure the LUNs from the storage, then reboot.
# /opt/DynamicLinkManager/bin/dlmsetconf
This command scans all Hitachi LUNs in the system and generates the following two configuration files:
o /kernel/drv/dlmfdrv.conf: holds the configuration of the target devices managed by HDLM
o /opt/DynamicLinkManager/config/dlmfdrv.unconf: if some LUN should not be managed by HDLM, remove that LUN's configuration entries from this file, then run "dlmsetconf -u" and reboot.
# reboot -- -r
Node boot order: first bring node 2 down to the OBP, then reboot node 1; after it has come up, boot node 2, i.e.:
root@syclt2 # sync;sync;init 0
# sync;sync;reboot -- -r
After node 1 (syclt1) is up, boot node 2:
ok boot -r
step5. After the nodes boot normally, HDLM will have removed the system's original device files (those beginning with c2 and c4) and created its own virtual device files (c6* and c7*).
step6. Because HDLM now manages these LUNs, they must be isolated from VxVM DMP - that is, kept out of DMP's control - to avoid conflicts (do this on both nodes). As shown above, the devices beginning with c2, c4, and c7 are phantom devices and must be excluded from VxVM DMP; for the remaining devices beginning with c6, one path must be removed from VxVM DMP.
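A quick way to confirm that HDLM has taken over the paths (a sketch; dlnkmgr is HDLM's command-line utility and the exact output columns vary by HDLM release):
# /opt/DynamicLinkManager/bin/dlnkmgr view -path
Each LU should show two paths (one per HBA) in the Online state; if a path shows an error state, check the cabling and the dlmfdrv.conf generated by dlmsetconf.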
Sun server hardware inspection
In the Status column of the output, every value should normally be "ONLINE" or "STAND-BY"; any other value is abnormal.
#sccli> show logical-drives
In the Status column of the output, the value should normally be "Good"; any other value is abnormal.
#sccli>show enclosure-status
prtdiag -v
dmesg | more
uptime
date
Disk array management
#sccli
sccli: selected device /dev/rdsk/c2t0d0s2 [SUN StorEdge yyyy SN#08472F]
#sccli> show disks
Check the Status column of the output; any value other than the normal state is abnormal.
#sccli>show FRUs
In the FRU Status lines of the output, every value should normally be "OK"; any other value is abnormal.
#sccli> show peripheral-device-status
prtvtoc /dev/dsk/c0t0d0s0
more /etc/vfstab
more /etc/system
metadb   (check the SVM state database / volume information)
metastat -p
metastat
vmstat 2 5
VxVM management
1. df -k - check the file systems
2. prtdiag -v | more - check the devices
3. vxprint - check the VxVM volumes
4. swap -s - check the swap space
5. prtconf - check the configuration
6. pkginfo -l - check installed software
9. uname -a - host name
10. date - time
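A consolidated daily health-check sketch built only from commands that already appear in this checklist and document (metaset name and prompts are examples; run on each cluster node):
# scstat -q                 (quorum votes present vs. needed)
# scstat -g                 (resource groups online on the expected node)
# metastat -p               (SVM metadevices all Okay)
# df -k                     (file systems not full)
#sccli> show disks          (array disk Status ONLINE or STAND-BY)
#sccli> show FRUs           (FRU Status OK)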
What follows is just my old notes from setting up a cluster; I was lazy at the time, so this is all that remains.
Clarifying a few concepts:
The shared disks must be visible to both hosts.
The heartbeat links are two network interfaces dedicated to cluster communication; a quad-port network card is even better.
The quorum device is another mechanism, besides the heartbeats, for guaranteeing uniqueness; the cluster writes information to it, and carving out a separate partition for it is enough. (If you run into problems such as minidevice errors after startup, go into the configuration file ccd.??? and fix it there directly.)
II: Installing the Volume Manager 3.0 volume manager
Different Volume Manager versions have somewhat different installation and management interfaces; the following is based on Volume Manager 3.0.
1. Unpack the tar file from the Volume Manager CD onto the hard disk
cd pkgs
pkgadd -d .
Install all of the packages and patches. The installation places the executables under /usr/sbin; you can use
vxdctl enable
rm /etc/vx/reconfig.d/stanstard/   (volume management)
After a normal boot the VM server is already running; ps shows that the two VM processes have started.
If it has not started, troubleshoot the problem using the manual.
Run vmsa
After vmsa starts it connects to the local VM server; enter the root password to reach the management interface.
Sun Cluster Installation Steps
Author: kdkboy   Posted: 2002/05/15 10:46am
Sun Cluster service is billed as Sun's gold-level service, priced at more than 400,000-500,000 RMB, and supposedly only Sun engineers can do it. Once you have installed it successfully you will see that cluster is actually not that hard (the service is where the money is). The difficulty lies in gaps in the related skills, for example how to connect the array and how to create volumes, and in an unclear understanding of cluster concepts.
vxconfigd -m disable
vxdctl init   (reinitializes /etc/vx/volboot)
vxdctl init rootdg
vxdctl add disk c0t0d0s7 type=simple
vxdisk -f init c0t0d0s7 type=simple
vxdg adddisk c0t0d0s7
ls /usr/sbin/vx*   (to verify)
2. Disk layout
Prepare the disks that Volume Manager will manage as follows, so that Volume Manager can recognize them.
Leave the first 1024 cylinders of the D1000 disks empty, or create slices 3 and 4 so Volume Manager can recognize them: slice 3 is Volume Manager's private region, used for management (3-4 MB is enough), and slice 4 is the public region. Volume Manager can then recognize the disks; if not, check carefully.
2:6 " scsi-initiator-id" integer-property   (changes the scsi-initiator-id to 6)
3:device-end
4:cd /pci@4,4000/scsi@3,1
5:6 “ scsi-initiator-id” integer-property
On the second node
scadmin startnode hostname
Stopping HA
scadmin stopnode
Switching HA over (manual failover)
scadmin switch clustname dest-host logical-hosts
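A usage sketch with the example names used later in these notes (cluster name cluster, logical host log-datahost, target node datahost1):
scadmin switch cluster datahost1 log-datahost
This moves the logical host log-datahost, together with the disk groups and data services tied to it, onto datahost1.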
Next, choose whether to install the HA tool packages
Choose to install Sun Cluster HA for Oracle
After the installation finishes, install the patches
install_scpatches
Reboot the machine
Starting HA
On the master machine
scadmin startcluster hostname clustername
for example: scadmin startcluster datahost2 cluster
Cluster 2.2 has quite a few bugs, so apply the patches. After a successful installation, read the startup messages carefully when it starts; they are very detailed and very useful for troubleshooting.
Installing the SUN Two-Node Cluster
I. Preparation before installation
1. Environment
Two SUN E450 hosts with identical configurations, each with one 9 GB disk; the SCSI slots used must be the same on both hosts.
One SUN D1000 array containing eight 4 GB disks.
In the management interface create the disk group oracle, and inside the oracle group create two volumes, tentatively named vol01 and vol02; choose not to mount them automatically at boot.
Then deport the oracle group from one host and import it on the other, making sure both machines can use the disk group correctly.
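A sketch of that deport/import check, using the hypothetical oracle disk group and volumes named above:
On the first host:
# vxdg deport oracle
On the second host:
# vxdg import oracle
# vxvol -g oracle startall
# vxprint -g oracle
If vol01 and vol02 show ENABLED/ACTIVE on whichever host currently owns the group, the shared disk group is usable from both machines.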
III: Installing HA
Copy the Cluster software into the /tmp directory,
Enter the first interface node 1 uses for internal (private) communication: qfe0
Enter the second interface node 1 uses for internal communication: qfe1
Enter node 1's MAC address:
Enter node 0's public interface: hme0
Enter node 1's public interface: hme0
Whether to select a second public subnet: no
Enter the logical host name: log-datahost
Whether to fail back automatically after a failure: choose as needed
Select the quorum device: choose a shared disk
cd Sun_suncluster_sc_2_2/Sun_Cluster2.2/Tools
./scinstall
Choose 4 to install server and client
In the next step choose automatic installation
Then choose the volume manager in use: 3 Sun StorEdge Volume Manager (SSVM)
2. Host and array installation process
Install the operating system and the recommended patches on both hosts.
Connect the hosts and the array correctly; both hosts connect to the D1000 with SCSI cables.
Change the scsi-initiator-id on one of the hosts; leave the other host powered off for now.
Procedure: at the ok prompt
ok setenv auto-boot? True
ok reset-all
Enter the cluster name: cluster
Enter the number of nodes in the cluster: 2
Enter the number of nodes to initialize at the start: 2
Enter the type of private interconnect: ether
Enter the name of node 0: datahost2
Enter the first interface node 0 uses for internal communication: hme1
Enter the second interface node 0 uses for internal communication: hme2
Enter node 0's MAC address:
Enter the name of node 1: datahost1
ok probe-scsi-all   Record the physical paths of the array controllers that are displayed, e.g. /pci@4,4000/scsi@3 and /pci@4,4000/scsi@3,1; the following uses these as the example.
ok nvedit
0:probe-all install-console banner
1:cd /pci@4,4000/scsi@3
The root disk does not have to be treated this way, but it is best to leave slice 7 free for Volume Manager's use.
3. Initialization
You can run vxinstall as described in the installation guide and simply follow the options; the main point is that you can choose not to encapsulate the root partition.
An alternative method:
vxlicense -c   generates a key file; enter the serial number. Note that to activate all VM features, the corresponding serial numbers must be entered.
6:device-end
7:banner (ctrl+c)
ok nvstore
ok setenv use-nvramrc? True
ok setenv auto-boot? True
ok reset-all
At this point both hosts can be powered on into the operating system at the same time; the format command shows 9 disks, and you must make sure the 8 disks in the array have exactly the same physical paths and logical device names on both hosts.