Sunday, October 7, 2012

CLUSTER ADMINISTRATION




OCR Updates: Three utilities can perform OCR updates
1.       SRVCTL (recommended) – remote administration utility
2.       DBCA (up to 10.2)
3.       OEM
SRVCTL – Service Control:
-          It is the most widely used administration utility in a RAC environment
-          It is used to administer & control the cluster resources registered in the OCR file

Registration sequence of services into the OCR:
1.       Node applications                (automatically done in 11.2)
2.       ASM instances                      (automatically done in 11.2)
3.       Databases
4.       Database instances
5.       Database services.

Note: To unregister, follow the reverse order.
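The registration and reverse unregistration order above can be sketched with SRVCTL. This is a dry-run: the commands are only echoed, not executed, and the names used (racdb, racdb1, node1, oltp, the Oracle home path) are hypothetical.

```shell
#!/bin/sh
# Echo each srvctl command instead of running it, so the ordering is
# visible without a live cluster.
run() { echo "srvctl $*"; }

# Register top-down: database, then instance, then service
run add database -d racdb -o /u01/app/oracle/product/11.2.0/db_1
run add instance -d racdb -i racdb1 -n node1
run add service  -d racdb -s oltp -r racdb1

# Unregister in the reverse order: service, then instance, then database
run remove service  -d racdb -s oltp
run remove instance -d racdb -i racdb1
run remove database -d racdb
```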





OLR – Oracle Local Registry

-          Both the OLR & the GPnP profile are needed by the lower/HAS stack, whereas the OCR & VD are needed by the upper/CRS stack.
-          If the OLR or GPnP profile gets corrupted, only the corresponding node goes down, whereas if the OCR or VD gets corrupted the complete cluster goes down.
-          Every daemon of a node communicates with its peer (same) daemon on the other nodes.
-          Oracle automatically performs an OLR backup at the time of execution of the root.sh script during Grid Infrastructure installation & stores it in "$GRID_HOME/cdata/<hostname>/backup_<date>_<time>.olr".
-          The default location of the OLR file is "$GRID_HOME/cdata/<hostname>.olr".

OLR Backup: Using the root user

$ G_H# ./ocrconfig -local -manualbackup
$ G_H# ./ocrconfig -local -backuploc <backup_location>
$ G_H# ./ocrcheck -local

Restoring OLR:
-          Bring the run level down to either init 1 or init 2
-          Stop the cluster stack on the specific node
-          Restore the OLR from the backup location: "# ./ocrconfig -local -restore <backup_file>"
-          Start the cluster stack
-          Change the run level back to either 3 or 5 (init 3 for CLI and init 5 for GUI mode)
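The restore steps above can be laid out as a dry-run script. The commands are only echoed, not executed, and the backup file name and hostname (node1) are hypothetical.

```shell
#!/bin/sh
# Echo each step of the OLR restore sequence instead of running it.
step() { echo "# $*"; }

step init 1                                  # drop to single-user run level
step crsctl stop crs                         # stop the stack on this node only
step ocrconfig -local -restore /u01/app/11.2.0/grid/cdata/node1/backup_20121007_120000.olr
step crsctl start crs                        # bring the stack back up
step init 3                                  # return to multi-user (CLI) run level
```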



OCR – Oracle Cluster Registry (or Repository)


It is a critical & shared Clusterware file that contains the complete cluster information: cluster node names, their corresponding IPs, CSS parameters, OCR autobackup information & registered resources such as nodeapps, ASM instances with their corresponding node names, databases, database instances & database services.

The CRSD daemon is responsible for updating the OCR file whenever utilities like SRVCTL, DBCA, OEM, NETCA, etc. make configuration changes.

The CRSD daemon automatically brings online all the cluster resources that are registered in the OCR file.

To know the OCR location:
# ./ocrcheck                                                 // shows the OCR location
# cat /etc/oracle/ocr.loc                                    // in Linux & HP-UX
# cat /var/opt/oracle/ocr.loc                                // in Solaris & IBM-AIX
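The OS-specific lookup above can be sketched as follows. The path selection mirrors the list, and the parsing runs against a sample ocr.loc written to a temp file (the +DATA diskgroup name is hypothetical), so the snippet works without a real cluster.

```shell
#!/bin/sh
# Pick the platform-specific ocr.loc path
case "$(uname -s)" in
  Linux|HP-UX) OCRLOC=/etc/oracle/ocr.loc ;;
  SunOS|AIX)   OCRLOC=/var/opt/oracle/ocr.loc ;;
esac

# Sample ocr.loc contents, standing in for the real file
sample=$(mktemp)
cat > "$sample" <<'EOF'
ocrconfig_loc=+DATA
local_only=FALSE
EOF

# Extract the OCR location the same way you would from the real file
ocr=$(sed -n 's/^ocrconfig_loc=//p' "$sample")
echo "OCR is stored at: $ocr"
rm -f "$sample"
```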

OCR Backup methods: 3 ways to perform a backup
1.       Automatic
2.       Physical
3.       Logical

1.       Automatic:
Oracle automatically performs an OCR backup at a regular interval of 4 hours from the CRS start time and stores it on the master node.

Identifying the master node:
# vi $ G_H/log/<hostname>/crsd/crsd.log

I AM THE NEW OCR MASTER
OR
THE NEW OCR MASTER NODE IS <node_number>
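Searching crsd.log for those messages can be sketched like this. The two sample log lines are hypothetical (the exact message format varies by version), written to a temp file so the grep runs anywhere.

```shell
#!/bin/sh
# Sample stands in for $GRID_HOME/log/<hostname>/crsd/crsd.log
log=$(mktemp)
cat > "$log" <<'EOF'
2012-10-07 04:00:01.123: [  OCRMAS][1] th_master: NEW OCR MASTER IS 2
2012-10-07 09:30:15.456: [  OCRMAS][1] th_master: I AM THE NEW OCR MASTER
EOF

# The last matching line reflects the most recent master election
master=$(grep -i "OCR MASTER" "$log" | tail -1)
echo "$master"
rm -f "$log"
```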

Backup location:
$ G_H/cdata/<cluster_name>/
                Backup00.ocr (latest)
                Backup01.ocr
                Backup02.ocr
                Day.ocr
                Week.ocr

Oracle retains the latest three 4-hour backups, plus one latest day backup and one latest week backup, purging all older backups.
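The rotation of the three 4-hour backups can be sketched with plain shell variables. This is only a model of the naming scheme (the letters A–D stand in for backup timestamps), not Oracle's actual purge mechanism.

```shell
#!/bin/sh
# On each 4-hour cycle the newest backup becomes backup00.ocr and the
# older copies shift down; only three are kept, so the oldest is purged.
rotate() {
  backup02=$backup01
  backup01=$backup00
  backup00=$1
}

rotate A; rotate B; rotate C; rotate D    # four cycles
echo "$backup00 $backup01 $backup02"      # A has been purged
```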

Note: It is not possible to change the automatic backup interval.

Manual Backup:

# ./ocrconfig -manualbackup
                (it will create a backup in the default location $ G_H/cdata/<cluster_name>/backup_<date>_<time>.ocr)
# ./ocrconfig -backuploc <backup_location> (a shared storage location is recommended)

Restoring OCR:
-          Stop the complete cluster on all the nodes: "# ./crsctl stop crs"
-          Identify the latest backup (e.g. backup00.ocr)
-          Restore the backup: "# ./ocrconfig -restore <backup_file>"
-          Start the cluster on all the nodes
-          Check the integrity of the restored OCR: "# ./cluvfy comp ocr -n all -verbose"
2.       Physical backup: Oracle supports an image (sector-level) backup of the OCR using the dd utility (if the OCR is on raw devices) or cp (if the OCR is on a general file system).

# cp <OCR_file> <backup_file>
# dd if=<OCR_raw_device> of=<backup_file>          // if: input file, of: output file

Restoring:
# cp <backup_file> <OCR_file>
# dd if=<backup_file> of=<OCR_raw_device>          // if: input file, of: output file
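The dd backup/restore cycle can be demonstrated end to end with a small scratch file standing in for the OCR raw device, so it is safe to run anywhere.

```shell
#!/bin/sh
# Scratch files stand in for the OCR raw device and the backup destination.
ocr=$(mktemp)
bak=$(mktemp)
printf 'cluster registry contents' > "$ocr"

dd if="$ocr" of="$bak" 2>/dev/null        # backup: block-by-block image copy

: > "$ocr"                                # simulate OCR loss (truncate)
dd if="$bak" of="$ocr" 2>/dev/null        # restore from the image copy

restored=$(cat "$ocr")
echo "$restored"
rm -f "$ocr" "$bak"
```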

3.       Logical backup:
# ./ocrconfig -export <export_file>
# ./ocrconfig -import <export_file>

Note: Oracle recommends taking a backup of the OCR file whenever the cluster configuration is modified (e.g. adding or deleting a node).

OCR Multiplexing: To avoid losing the OCR and bringing the complete cluster down due to a single point of failure (SPF), Oracle supports OCR multiplexing: from 10.2 onwards in a maximum of 2 locations (1 primary, the other a mirror copy), and from 11.2 onwards in a maximum of 5 locations (1 primary and the remaining mirror copies).

Note: From 11.2 onwards, Oracle supports storing the OCR in ASM diskgroups, which provides mirroring depending on the redundancy level.

GPNP – Grid Plug n Play Profile:

-          It contains basic cluster information like the location of the voting disk, the ASM spfile location, and all the IP addresses with their subnet masks
-          It is a node-specific file
-          It is an XML-formatted file
Backup loc: $ G_H/gpnp/<hostname>/profiles/peer/profile.xml
Actual loc: $ G_H/gpnp/profiles/peer/profile.xml
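Since profile.xml is plain XML, its contents can be inspected with standard tools. The fragment below is a trimmed, hypothetical sketch of a GPnP profile (real profiles carry more attributes and a signature), written to a temp file so the extraction runs anywhere.

```shell
#!/bin/sh
# Hypothetical, trimmed profile.xml fragment
profile=$(mktemp)
cat > "$profile" <<'EOF'
<gpnp:GPnP-Profile ClusterName="mycluster">
  <orcl:ASM-Profile DiscoveryString="/dev/mapper/*" SPFile="+DATA/mycluster/asmparameterfile/spfile.ora"/>
</gpnp:GPnP-Profile>
EOF

# Pull the ASM discovery string out of the profile
disc=$(sed -n 's/.*DiscoveryString="\([^"]*\)".*/\1/p' "$profile")
echo "ASM discovery string: $disc"
rm -f "$profile"
```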

Voting Disk (VD):


-          It is another critical & shared Clusterware file which contains the node membership information of all the nodes within the cluster
-          The CSSD daemon is responsible for sending heartbeat messages to the other nodes every 1 sec and writing the responses into the VD

VD Backup:
-          Oracle supports only the physical method to take a backup of the VD.
-          From 11.2 onwards, Oracle does not recommend taking a backup of the VD because it automatically maintains the VD backup inside the OCR file

Restoring VD:
1.       Stop the CRS on all the nodes
2.       Restore the VD: "# ./crsctl -restore vdisk"
3.       Start the CRS on all the nodes
4.       Check the integrity of the restored VD: "# ./cluvfy comp vdisk -n all -verbose"

VD Multiplexing: To avoid losing the VD and bringing the complete cluster down due to an SPF of the VD, Oracle supports multiplexing of the VD: from 10.2 onwards in a maximum of 31 locations, but from 11.2 onwards in a maximum of 15 locations.

Node Eviction:






It is the process of automatically rebooting a cluster node due to private network or VD access failure, in order to avoid data corruption.
If node1 & node2 can communicate with each other but not with node3 through the private network, a split-brain syndrome can occur: 2 sub-clusters form and each tries to master a single resource, thereby causing data corruption. To avoid this split-brain syndrome, the master node evicts the affected node based on the node membership information in the VD.

CSS Parameters:
1.       misscount (default 30 sec): The maximum private network latency to wait before the master node triggers the node eviction process.
2.       disktimeout (default 200 sec): The maximum VD access latency; if it elapses, the master node triggers the node eviction process.
3.       reboottime (default 3 sec): The affected node waits until the reboot time elapses before the actual reboot (this is to let 3rd-party applications go down properly)
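How the three timeouts interact can be sketched with simple shell arithmetic. The 35-second silence value is hypothetical; the defaults match the list above.

```shell
#!/bin/sh
# Default CSS timeout values
misscount=30       # max seconds without network heartbeat before eviction
disktimeout=200    # max seconds without VD access before eviction
reboottime=3       # seconds the evicted node waits before actually rebooting

silent_for=35      # hypothetical: seconds since the node's last network heartbeat

if [ "$silent_for" -ge "$misscount" ]; then
  msg="evict: node silent ${silent_for}s >= misscount ${misscount}s; reboot in ${reboottime}s"
  echo "$msg"
fi
```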
