Sunday, October 7, 2012

CLUSTER ADMINISTRATION




OCR Updation: Three utilities to perform OCR updates
1.       SRVCTL (recommended) – remote administration utility
2.       DBCA (till 10.2)
3.       OEM
SRVCTL – Service Control:
-          It is the most widely used utility in a RAC environment
-          It is used to perform administration & control of the resources registered in the OCR file

Registry sequence of services into OCR:
1.       Node applications                (automatically done in 11.2)
2.       ASM instances                      (automatically done in 11.2)
3.       Databases
4.       Database instances
5.       Database services.

Note: To unregister, you have to follow the reverse order (see the srvctl sketch below).
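For illustration, a possible srvctl sequence following this order; the database name RACDB, instance names RACDB1/RACDB2, service name oltp, node names node1/node2 and the ORACLE_HOME path are assumed placeholders, not values from these notes:

// register in forward order
$ srvctl add database -d RACDB -o /u01/app/oracle/product/11.2.0/db_1    // database
$ srvctl add instance -d RACDB -i RACDB1 -n node1                        // database instances
$ srvctl add instance -d RACDB -i RACDB2 -n node2
$ srvctl add service -d RACDB -s oltp -r RACDB1,RACDB2                   // database service
// unregister in reverse order
$ srvctl remove service -d RACDB -s oltp
$ srvctl remove instance -d RACDB -i RACDB1
$ srvctl remove database -d RACDB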





OLR – Oracle Local Registry

-          Both the OLR & the GPnP profile are needed by the lower/HAS stack, whereas the OCR & VD are needed by the upper/CRS stack.
-          If the OLR or GPnP profile gets corrupted, only the corresponding node goes down, whereas if the OCR or VD gets corrupted the complete cluster goes down.
-          Every daemon on a node communicates with its peer (the same) daemon on the other nodes.
-          Oracle automatically performs an OLR backup during execution of the root.sh script of the Grid Infrastructure installation & stores it in “$ GRID_HOME/cdata/<hostname>/backup_<date>_<time>.olr”.
-          The default location of the OLR file is “$ GRID_HOME/cdata/<hostname>.olr”.

OLR Backup: Using root user

$ G_H# ./ocrconfig -local -manualbackup              // take a manual OLR backup
$ G_H# ./ocrconfig -local -backuploc <new_location>  // change the OLR backup location
$ G_H# ./ocrcheck -local                             // check OLR integrity & location

Restoring OLR:
-          Bring the init level to either init 1 or init 2
-          Stop the cluster on the specific node
-          Restore the OLR from the backup location: “# ./ocrconfig -local -restore <backup_file>”
-          Start the cluster
-          Change the init level back to either 3 or 5 (init 3 for CLI and init 5 for GUI mode); see the consolidated sketch below
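A consolidated sketch of the above steps, run as root on the affected node; the backup file name is an assumed placeholder:

# init 2                                                  // lower the run level
# $GRID_HOME/bin/crsctl stop crs                          // stop the cluster stack on this node
# $GRID_HOME/bin/ocrconfig -local -restore $GRID_HOME/cdata/<hostname>/backup_<date>_<time>.olr
# $GRID_HOME/bin/crsctl start crs                         // start the cluster stack
# $GRID_HOME/bin/ocrcheck -local                          // verify the restored OLR
# init 5                                                  // back to GUI run level (init 3 for CLI)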



OCR – Oracle cluster registry or repository


It is a critical & shared Clusterware file that contains the complete cluster information: cluster node names, their corresponding IPs, CSS parameters, OCR autobackup information & registered resources such as nodeapps, ASM instances with their corresponding node names, databases, database instances & database services.

The CRSD daemon is responsible for updating the OCR file whenever utilities like srvctl, DBCA, OEM, NETCA, etc. make configuration changes.

The CRSD daemon automatically brings online all the cluster resources that are registered in the OCR file.

To know the OCR location:
# ./ocrcheck                                                        // shows the OCR disk location
# cat /etc/oracle/ocr.loc                                    // in Linux & HP-UX
# cat /var/opt/oracle/ocr.loc                            // in Solaris & IBM-AIX

OCR Backup method: 3 ways to perform backup
1.       Automatic
2.       Physical
3.       Logical

1.       Automatic:
Oracle automatically performs an OCR backup at a regular interval of 4 hours from the CRS start time and stores it on the master node.

Identifying the master node:
# vi $ G_H/log/<hostname>/crsd/crsd.log

I AM THE NEW OCR MASTER
OR
THE NEW OCR MASTER NODE IS

Backup location:
$ G_H/cdata/<cluster_name>/
                Backup00.ocr (latest)
                Backup01.ocr
                Backup02.ocr
                Day.ocr
                Week.ocr

Oracle retains the latest three 4-hour backups, plus one latest day backup and one latest week backup, purging all the remaining backups.

Note: It is not possible to change the automatic backup interval.
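The existing automatic (and manual) backups can be listed with ocrconfig as root; a sample invocation:

# $GRID_HOME/bin/ocrconfig -showbackup          // lists backup00/01/02, day and week backups with their node & path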

Manual Backup:

# ./ocrconfig -manualbackup
                (it will create the backup in the default location $ G_H/cdata/<cluster_name>/backup_<date>_<time>.ocr)
# ./ocrconfig -backuploc <new_location>              (a shared storage location is recommended)

Restoring OCR:
-          Stop the complete cluster on all the nodes: “# ./crsctl stop crs”
-          Identify the latest backup (backup00.ocr)
-          Restore the backup: “# ./ocrconfig -restore <backup_file>”
-          Start the cluster on all the nodes
-          Check the integrity of the restored OCR: “# ./cluvfy comp ocr -n all -verbose” (see the consolidated sketch below)
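A consolidated sketch of the restore sequence, run as root (the stop/start steps go on every node; the backup path is an assumed placeholder):

# $GRID_HOME/bin/crsctl stop crs                                  // on ALL nodes
# $GRID_HOME/bin/ocrconfig -showbackup                            // identify the latest backup (backup00.ocr)
# $GRID_HOME/bin/ocrconfig -restore $GRID_HOME/cdata/<cluster_name>/backup00.ocr
# $GRID_HOME/bin/crsctl start crs                                 // on ALL nodes
# $GRID_HOME/bin/cluvfy comp ocr -n all -verbose                  // verify integrity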
2.       Physical backup: Oracle supports an image/sector-level backup of the OCR using the dd utility (if the OCR is on raw devices) & cp (if the OCR is on a general file system).

# cp <ocr_file> <backup_file>
# dd if=<ocr_device> of=<backup_file>                // if: input file, of: output file

Restoring:
# cp <backup_file> <ocr_file>
# dd if=<backup_file> of=<ocr_device>                // if: input file, of: output file

3.       Logical backup:
# ./ocrconfig -export <export_file>
# ./ocrconfig -import <export_file>

Note: Oracle recommends taking a backup of the OCR file whenever the cluster configuration is modified (ex: adding/deleting a node).

OCR Multiplexing: To avoid losing the OCR and the complete cluster going down due to a single point of failure (SPF) of the OCR, Oracle supports OCR multiplexing: from 10.2 onwards in a maximum of 2 locations (1 primary, the other a mirror copy), and from 11.2 onwards in a maximum of 5 locations (1 primary and the remaining as mirror copies).

Note: From 11.2 onwards, Oracle supports storage of the OCR in ASM disk groups, so mirroring is provided according to the disk group's redundancy level (see the sketch below).
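For illustration, possible 11.2 commands to add or drop an OCR mirror as root; the +DATA disk group name is an assumption:

# $GRID_HOME/bin/ocrconfig -add +DATA           // add an OCR copy in an ASM disk group
# $GRID_HOME/bin/ocrconfig -delete +DATA        // drop that OCR copy
# $GRID_HOME/bin/ocrcheck                       // shows all configured OCR locations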

GPNP – Grid Plug n Play Profile:

-          It contains basic cluster information like the location of the voting disk, the ASM spfile location, and all the IP addresses with their subnet masks
-          This is a node-specific file
-          It is an XML-formatted file
Backup loc: $ G_H/gpnp/<hostname>/profiles/peer/profile.xml
Actual loc: $ G_H/gpnp/profiles/peer/profile.xml
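The profile can be inspected with the gpnptool utility shipped with Grid Infrastructure; a sample read-only invocation:

$ $GRID_HOME/bin/gpnptool get                   // dumps the profile XML (VD & ASM spfile discovery strings, network info)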

Voting Disk (VD):


-          It is another critical & shared Clusterware file, which contains the node membership information of all the nodes within the cluster
-          The CSSD daemon is responsible for sending heartbeat messages to the other nodes every 1 sec and writing the responses into the VD

VD Backup:
-          Oracle supports only the physical method to take a backup of the VD (see the sketch below).
-          From 11.2 onwards, Oracle does not recommend taking a backup of the VD because its data is automatically maintained as part of the OCR backups.
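For illustration, the 10.2-style physical backup with dd, plus the 11.2 check of where the voting disks live; the device and backup paths are assumed placeholders:

# dd if=/dev/raw/raw3 of=/backup/votedisk.bak   // 10.2: image copy of the VD device
# $GRID_HOME/bin/crsctl query css votedisk      // 11.2: lists the current voting disk locations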

Restoring VD:
1.       Stop the CRS on all the nodes
2.       Restore the VD (see the sketch below: in 10.2 this is a dd restore of the image copy, while in 11.2 the VD is typically re-created with “# ./crsctl replace votedisk”)
3.       Start the CRS on all the nodes
4.       Check the integrity of the restored VD: “# ./cluvfy comp vdisk -n all -verbose”
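A consolidated sketch of the two variants, run as root; the device path, backup file and +DATA disk group name are assumptions:

# crsctl stop crs                               // on ALL nodes
// 10.2 method: put back the image copy of the VD device
# dd if=/backup/votedisk.bak of=/dev/raw/raw3
// 11.2 method: start one node in exclusive mode and re-create the VD in ASM
# crsctl start crs -excl
# crsctl replace votedisk +DATA
# crsctl stop crs
# crsctl start crs                              // on ALL nodes, normal mode
# cluvfy comp vdisk -n all -verbose             // verify integrity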

VD Multiplexing: To avoid losing the VD and the complete cluster going down due to an SPF of the VD, Oracle supports multiplexing of the VD from 10.2 onwards in a maximum of 31 locations, but from 11.2 it supports a maximum of 15 locations (see the sketch below).
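For illustration, adding/removing a VD mirror on non-ASM storage as root; the raw device path is an assumption:

# crsctl add css votedisk /dev/raw/raw4         // add one more VD location
# crsctl delete css votedisk /dev/raw/raw4      // remove a VD location (11.2 identifies it by the GUID shown in “crsctl query css votedisk”)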

Node Eviction:






It is the process of automatically rebooting a cluster node due to private network or VD access failure, in order to avoid data corruption.
If node1 & node2 can communicate with each other but not with node3 through the private network, a split-brain syndrome can occur: 2 sub-clusters form and both try to master a single resource, thereby causing data corruption. To avoid this split-brain syndrome, the master node evicts the affected node based on the node membership information in the VD.

CSS Parameters (see the crsctl sketch after this list):
1.       misscount: default 30 sec: It specifies the maximum private network latency to wait before the master node triggers the node eviction process.
2.       disktimeout: default 200 sec: It specifies the maximum VD access latency; if it elapses, the master node triggers the node eviction process.
3.       reboottime: default 3 sec: The affected node waits until the reboot time has elapsed before the actual node reboot (this is to let 3rd-party applications go down properly).
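These values can be read (and, with care, changed) via crsctl; a sample session as root:

# crsctl get css misscount                      // shows 30 by default
# crsctl get css disktimeout                    // shows 200 by default
# crsctl get css reboottime                     // shows 3 by default
# crsctl set css misscount 45                   // change only under vendor/Oracle guidance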

ASM – Automatic Storage Management


File Systems:



Disadvantages of Raw Devices:
1.       A raw device supports the storage of only 1 file. Hence archived redo log files & flashback logs, which are generated in large numbers, are not suitable members for raw devices.
2.       General O/S commands like cp, ls, mv, du, etc. will not work on raw devices.
3.       Only dd (disk dump) can be used to format, back up, and restore raw devices.
4.       Raw devices do not support the collection of I/O statistics.
5.       They cannot be resized online.
6.       In a Linux environment, out of 15 partitions we can use only 14 for the creation of raw devices; in Solaris we can use only 6 out of 7 partitions per disk.


To overcome all these disadvantages we use an LVM (Logical Volume Manager):
1.       It is a logical storage area created from a collection of multiple disk partitions, onto which we can create any type of file system.
2.       It supports the storage of multiple files in a single volume.
3.       Online resizing is possible.
4.       It supports the collection of I/O statistics.
5.       It improves I/O performance & availability with the help of software-level RAID techniques.

Types of LVMs & Vendors:

LVMs                                      Vendors
1.       VERITAS Volume Manager           Symantec
2.       Tivoli Volume Manager            IBM
3.       Sun Volume Manager (SVM)         Oracle/Sun
4.       ASM (from Oracle 10g)            Oracle


ASM:
-         It is a type of LVM supported from Oracle 10g onwards, with a special type of instance (INSTANCE_TYPE=ASM) and a small SGA footprint of about 100-128 MB.
-          It supports the creation of logical volumes known as disk groups, internally using both striping and mirroring.
-          It does not have any control file to mount, so its last and highest startup stage is NOMOUNT; instead, it mounts the disk groups.
-          A disk group is a logical storage area created from a collection of multiple disk partitions.
-         ASM supports the storage of multiple database-related files like control files, redo logs, data files, archived logs, flashback logs, RMAN backup pieces, spfile, etc., but it does not support the storage of static files like pfile, listener.ora, tnsnames.ora, sqlnet.ora, etc.
-         From 11.2 onwards, by using ADVM (ASM Dynamic Volume Manager) & ACFS (ASM Cluster File System) we can store static files also.

Note: Sometimes the ASM instance may contain a large pool also.
-          1 ASM instance supports the creation of multiple disk groups and provides services to multiple clients (see the disk group creation sketch below).
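A minimal sketch of creating a disk group from the ASM instance; the SID +ASM1, the redundancy level and the raw device paths are assumptions:

$ export ORACLE_SID=+ASM1
$ sqlplus / as sysasm <<'EOF'
CREATE DISKGROUP data NORMAL REDUNDANCY
  DISK '/dev/raw/raw5', '/dev/raw/raw6';    -- two-way mirrored disk group from two partitions
EOF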

ASM Clients: These are the regular DB instances that depend on the ASM instance in order to access the disk groups.




ASM Instance Background processes:

RBAL – Rebalance Master: It is responsible for managing and coordinating the disk group activities, and also for generating the plans for even distribution of ASM extents (for better load balancing) whenever a disk is added or removed.

ARBn – ASM Rebalancer: It is a slave process of the RBAL background process and is responsible for the actual rebalancing (extent movement) across the ASM disks.


ASMB – ASM Background: It is responsible for the successful establishment of the communication channel between the ASM instance & its ASM clients.

GMON – Disk Group Monitor: It is responsible for maintaining disk membership within the disk groups whenever a disk group goes offline or is dropped.

KATE – Konductor of ASM Temporary Errands: It works when disks are offlined (performing proxy I/O on their behalf) and is involved in bringing them back online.

ASM client Background Processes:

RBAL – Rebalance Master: On the client side, it is responsible for the successful opening and closing of the disk groups whenever read or write operations occur.

PZ9n (PZ99, PZ98, …): It is responsible for gathering dynamic view information globally across all the instances of the database.

ASM related dynamic views:
In a RAC environment, all these dynamic views start with gv$; in non-RAC they start with v$ (see the query sketch after this list).
1.       gv$asm_disk
2.       gv$asm_diskgroup
3.       gv$asm_disk_iostat
4.       gv$asm_client
5.       gv$asm_template (total 19 views)
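A sample query sketch against two of these views, run from the ASM instance; the chosen columns are illustrative:

$ sqlplus / as sysasm <<'EOF'
SELECT inst_id, name, state, total_mb, free_mb FROM gv$asm_diskgroup;  -- disk group status on every instance
SELECT inst_id, instance_name, db_name FROM gv$asm_client;             -- DB instances currently served by ASM
EOF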


  
ASM in RAC Environment:

CLUSTER COMPONENTS




OHASD - Oracle High Availability Services Daemon:
-          It is the first daemon to be started, directly by the parent init process, and it is in turn responsible for starting the other agents & daemons by reading the OLR (Oracle Local Registry) file
-          It needs access to the OLR file, which contains the startup sequence of the other child daemons.

CRSD – Cluster Ready Services Daemon:
-          It is responsible for maintaining the cluster configuration and HA (High Availability) operations by reading the OCR (Oracle Cluster Registry) file
-          The OCR file contains the complete cluster information required by the CRSD daemon.

CSSD – Cluster Synchronization Services Daemon:
-          It is responsible for updating the node membership of all the nodes within the cluster into the VD (Voting Disk)
-          In a non-RAC environment, the CSSD daemon is responsible for maintaining the communication between the ASM instance and its ASM client database instances.

VD – Voting Disk:
-          It contains the up-to-date membership information of all the cluster nodes.
-          Both the OCR & VD require 280 MB of space.

EVMD – Event Manager Daemon:
It is responsible for publishing & subscribing the events generated by the CRSD daemon to the other nodes.


OCTSSD – Oracle Cluster Time Synchronization Services Daemon:
-          It is responsible for maintaining time consistency across the cluster nodes
-          It has two modes
o    Observer : if NTP (network time protocol) is enabled
o    Active : if NTP is disabled

GPNPD – Grid Plug and Play Daemon: It provides access to the GPnP profile and keeps it consistent across all the nodes of the cluster.

GSD – Global Services Daemon: From 10g onwards it is deprecated; it is responsible for performing the administrative tasks whenever a GUI application like NETCA or DBCA is invoked.

ONS – Oracle Notification Service: It is responsible for publishing notification events through FAN (Fast Application Notification).

VIP – Virtual IP:
-          It is registered as a resource in the OCR and its status is maintained in the OCR.
-          From 11g Release 2 onwards, every node requires one private IP on one subnet, one public IP & one VIP on another subnet, and 3 unused SCAN VIPs on the same subnet as the public IP (see the sketch below).
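The VIP and SCAN configuration registered in the OCR can be checked with srvctl; node1 is an assumed node name:

$ srvctl status vip -n node1                    // is the VIP running, and on which node
$ srvctl config vip -n node1                    // VIP address & subnet details
$ srvctl config scan                            // the 3 SCAN VIPs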


Saturday, October 6, 2012

RAC Architecture




Non-RAC Specific Processes          RAC Specific Processes
MMON                                LMSn – Global Cache Service Process
SMON                                LMON – Global Enqueue Service Monitor
PMON                                LMD – Global Enqueue Service Daemon
CKPT                                LCK – Lock Process
LGWR                                DIAG – Diagnostic
DBWR
RECO
ARCH
MMAN
MMNL

Private Interconnect: It is a high-bandwidth & low-latency communication setup used for transferring cluster-specific heartbeat messages.
In a RAC environment, the cluster interconnect is also used for some high-level operations (see the oifcfg sketch after this list):
1.       Identifying the health and status of, and synchronizing messages between, all the nodes of the cluster.
2.       Maintaining the global resource lock requests within the cluster.
3.       Heavily used for transferring Oracle data blocks from one instance's buffer cache to another instance's buffer cache.
4.       The more inter-instance updates there are, the more traffic on the network; that is why a high bandwidth (which may be vendor-specific) is recommended for the cluster interconnect.
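The interfaces designated as public and cluster interconnect can be listed with oifcfg; the interface names and subnets in this sample output are illustrative:

$ oifcfg getif
eth0  192.168.1.0  global  public
eth1  10.0.0.0     global  cluster_interconnect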

Global Resource Directory – GRD:
1.       It is a new memory structure, part of the shared pool, found only in RAC-specific instances.
2.       Oracle automatically maintains GRD consistency among all the nodes of the cluster.
3.       The GRD maintains metadata information of the blocks which are present in the database buffer cache (DBBC).
4.       GRD information includes:
a.        SCN #
b.       DBA- Data Block Address/ Data Block Identifier
                                                               i.      It is a combination of file id + block id
c.        Location of the most recent version of the block.
d.       Mode of the block:
                                                               i.      NULL (N): It indicates that no access rights are currently held on the block.
                                                              ii.      SHARED (S): It indicates that read access rights are available and the block can be accessed by multiple instances.
                                                             iii.      EXCLUSIVE (X): It indicates that the block can be accessed by only one instance exclusively, basically during DML operations.
e.       Role of the block:
                                                               i.      LOCAL(L): It indicates that the block image is present in only one instance
                                                              ii.      GLOBAL (G): it indicates that the block image is present in multiple instances.
f.         Type of the block image:
                                                               i.      Current (CURR)
                                                              ii.      Consistent read (CR)
                                                            iii.      Past image (PI)


GRD Example:

Row 1 – U1 (user1: SELECT statement from instance 1) on N1 (node1):
    SCN#: 125 | DBA: 1521 (file id: 15, block id: 21) | Location: N1 – Node1 | Mode of the block: NULL (N) | Role of the block: Local (L) | Type of the image: CURR

Row 2 – U1 (user1: issued an UPDATE statement):
    SCN#: 128 | DBA: 1521 | Location: N1 | Mode: N → X (null to exclusive) | Role: L | Image: CURR → CR, with a PI (past image) retained

    -- cache fusion --

Row 3 – U2 (user2: UPDATE statement on the same table from node2):
    SCN#: 135 | DBA: 1521 | Location: N2 | Mode: X → S → X | Role: G | Image: CURR → CR
The GRD is internally maintained through coordination between GES & GCS.

Global Enqueue Service – GES: It coordinates the global resource lock requests and non-cache-fusion operations (on structures other than the DBBC) by using the LMD & LCK background processes.

Global Cache Service – GCS: It coordinates the cache fusion operations with the help of the LMSn background processes.

Cache Fusion:
-          It is the process of transferring data blocks from one instance's buffer cache to another instance's buffer cache
-          It is done to avoid the more costly, hardware-intensive disk I/O:
o    Disk read → milliseconds
o    Cache read → nanoseconds
-          It has two phases:
o    Cache fusion phase 1 was supported in Oracle 8i (OPS) for read operations only; disk I/O still had to happen for DML operations
o    From 9i RAC onwards, cache fusion phase 2 was introduced for both read & DML operations.

LMSn – Global Cache Service Process:
-          It is responsible for transferring data blocks from one instance's buffer cache to another instance's buffer cache.
-          The number of these processes is controlled by the parameter GCS_SERVER_PROCESSES (see the sketch below):
GCS_SERVER_PROCESSES   = 1 (default)
                                                maximum = 10 (9.2)
                                                maximum = 20 (10.1)
                                                maximum = 36 (10.2) [LMS0-9 & LMSa-z]
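A quick sketch of checking the current value from a database instance:

$ sqlplus / as sysdba <<'EOF'
SHOW PARAMETER gcs_server_processes
EOF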
               
Note: Cache fusion operations are also called the cache-coherency technique or a “soft ping”, whereas a disk read is called a “hard ping”.

LMON – Global Enqueue Service Monitor:
-          It is responsible for maintaining the consistency of the GRD among all the instances of a database
-          It is also responsible for GRD recovery for a failed instance.

LMD – Global Enqueue Service Daemon:
-          It is responsible for managing global resource lock requests, i.e., requests for resources coming from the other instances; it processes them and maintains a queue.

LCK – Lock Process: It is responsible for non-cache-fusion operations.

DIAG – Diagnostic: It is responsible for writing diagnostic information to the alert log & trace files whenever a server process or any other background process needs diagnosis.