Thursday, June 25, 2009

ORACLE 10g DATAGUARD


 Agenda
Ø  Physical vs. Logical Standby
Ø  Standby Protection Modes
Ø  Log Transport Attributes
Ø  Standby Redo Logs
Ø  Setup Physical Standby step-by-step
Ø  Managing and Monitoring standby
Ø  Role Transition: Switchover/Failover

Purpose
To provide an efficient disaster recovery solution by maintaining transactionally consistent copies of the production database at a remote site.

Physical Standby
Ø  Kept in sync with the primary by using media recovery to apply redo generated on primary
Ø  Used for business continuity planning (BCP)
Ø  Can be opened in read-only mode, but redo is not applied while it is open read-only

Logical Standby
Ø  Kept in sync with the primary by transforming redo data received from primary into logical SQL statements and then executing those SQLs against the standby database.
Ø  Used to offload reporting from the primary database
Ø  Can be opened in read-only mode while the changes are being applied

Protection Modes
Decide on Standby Protection Mode before setting it up:
  1. MAXIMUM PROTECTION
Pre-requisites
Ø  Redo must be transported synchronously, using LGWR SYNC AFFIRM.
Ø  Standby redo logs (SRLs) need to be created on standby site.
Ø  At least one standby must be available for the primary database to function.
Ø  Need high speed network.
Pros
Ø  Zero data loss
Cons
Ø  The primary shuts down if it cannot write redo synchronously to at least one standby, for example because of network issues.

2. MAXIMUM AVAILABILITY
Pre-requisites
Ø  Redo must be transported synchronously, using LGWR SYNC AFFIRM.
Ø  Standby redo logs (SRLs) need to be created on standby site.
Features
Ø  If there are network issues, the primary temporarily behaves as in maximum performance mode, and switches back to maximum availability once the fault is corrected.
Ø  Data loss occurs only if the primary loses its redo logs.
SQL> alter database set standby to maximize availability;

3. MAXIMUM PERFORMANCE
Ø  Asynchronous redo shipping using ARCH or LGWR ASYNC.
Ø  No impact on the primary's performance, even if there are network issues.
Ø  No need to create SRLs unless real-time apply is needed on the standby site.
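The prerequisites of the three modes can be captured in a small lookup table. The following Python sketch is only an illustration of the rules listed above; the function name `unmet_requirements` and the table layout are hypothetical and not part of any Oracle tool.

```python
# Requirements per protection mode, as summarised in the section above.
# "tokens" lists the transport attributes the mode expects; an empty set
# means the text places no restriction (ARCH or LGWR ASYNC are both fine).
MODE_REQUIREMENTS = {
    "MAXIMUM PROTECTION": {"tokens": {"LGWR", "SYNC", "AFFIRM"}, "needs_srl": True},
    "MAXIMUM AVAILABILITY": {"tokens": {"LGWR", "SYNC", "AFFIRM"}, "needs_srl": True},
    "MAXIMUM PERFORMANCE": {"tokens": set(), "needs_srl": False},
}


def unmet_requirements(mode, transport, has_srl):
    """Return a list of problems; an empty list means the prerequisites are met."""
    requirement = MODE_REQUIREMENTS[mode.strip().upper()]
    given = set(transport.upper().split())
    problems = []
    missing = requirement["tokens"] - given
    if missing:
        problems.append("missing transport attributes: " + " ".join(sorted(missing)))
    if requirement["needs_srl"] and not has_srl:
        problems.append("standby redo logs (SRLs) are required")
    return problems


if __name__ == "__main__":
    print(unmet_requirements("maximum availability", "LGWR SYNC AFFIRM", True))
    print(unmet_requirements("maximum protection", "LGWR ASYNC", False))
```

A check like this could be run against a planned configuration before changing the protection mode, since the two stricter modes depend on both the transport attributes and the SRLs being in place.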

Log Transport Services
Log Transport Service Attributes are defined on primary in log_archive_dest_2
ARCH (default)
Ø  An ARCn process first archives the online redo log to the local destination on the primary. A second ARCn process is then spawned and writes the archive to the remote standby.
Ø  By default, log_archive_local_first=true in init.ora on primary. DO NOT CHANGE IT.
LGWR
Ø  In contrast to ARCH, which transmits redo to the standby only at log switch time, the LGWR attribute instructs the LGWR process to transmit redo to the standby at the same time as it writes the redo to the online redo logs.
Ø  Transmission of redo can be done synchronously (SYNC) or asynchronously (ASYNC)
AFFIRM
Ø  All redo disk I/O at the standby is performed synchronously, so the redo is on disk at the standby before the write is acknowledged.
SYNC
Ø  By default, LGWR archives synchronously. Once I/O is initiated, archiving must wait for I/O to complete. This means transaction is not committed on primary database until redo data necessary to recover that transaction is received by the destination.
ASYNC
Ø  LGWR does not wait for the I/O to complete. The LGWR network server process (LNS) performs the actual network I/O.
Ø  A user-configurable buffer accepts outbound redo data from LGWR. The value is given in 512-byte blocks: ASYNC=20480 indicates a 10MB buffer. The maximum is 50MB.
MAX_FAILURE
Ø  Defines number of times to retry a destination that has been closed due to a failure
NET_TIMEOUT
Ø  Used with the LGWR attribute (SYNC or ASYNC).
Ø  Defines how many seconds to wait before giving up on a network connection.
REOPEN
Ø  Determines how long the primary waits before retrying a connection
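The ASYNC buffer arithmetic mentioned above can be checked with a few lines of Python. This sketch assumes the ASYNC value counts 512-byte blocks, which is what makes 20480 blocks equal 10MB; the function names are hypothetical.

```python
BLOCK_SIZE_BYTES = 512  # assumption: ASYNC=n is expressed in 512-byte blocks


def async_buffer_bytes(blocks):
    """Size in bytes of the buffer requested by ASYNC=<blocks>."""
    if blocks < 0:
        raise ValueError("block count cannot be negative")
    return blocks * BLOCK_SIZE_BYTES


def async_blocks_for_megabytes(megabytes):
    """Block count to specify for a buffer of the given size in MB."""
    return megabytes * 1024 * 1024 // BLOCK_SIZE_BYTES


if __name__ == "__main__":
    print(async_buffer_bytes(20480) // (1024 * 1024), "MB")  # 10 MB
    print(async_blocks_for_megabytes(50), "blocks")          # 102400 blocks
```

Under the same assumption, the 50MB maximum corresponds to ASYNC=102400.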

DATAGUARD SETUP

Creating a physical standby
Both primary and standby systems must be identical with regard to operating system, platform architecture and database version. The hardware configuration may differ.

1. Enable archiving on primary
log_archive_dest_1='LOCATION=
log_archive_format=%t_%s_%r.dbf
log_archive_start=true (deprecated as of the 10g release)
SQL> shutdown immediate;
SQL> startup mount;
SQL> alter database archivelog;
SQL> alter database open;

2. Enable force logging on primary
SQL> alter database force logging;
This is required because NOLOGGING operations would otherwise not be recorded in the redo stream, and so would never reach the standby.
In force logging mode, NOLOGGING operations are still permitted to run, but their changes are written to redo.

3. Creating password file on primary and standby
Create a password file (if not created yet)
orapwd file=orapw password=
remote_login_passwordfile=exclusive
SYS password must be identical on both primary and standby for log transport services to function.

4. Creating standby controlfile on primary
SQL> alter database create standby controlfile as '<../path/standby.ctl>';

5. Take a hot backup of the primary and copy the datafiles, archive logs and standby controlfile to the standby
(NOTE: do not copy the redo logs, since the standby will create its own)

6. Create tnsnames.ora aliases for primary and standby on both primary and standby

7. Prepare init.ora on primary
db_name='TEST'
db_unique_name='PRI'
service_names='PRI_SERVICE'
log_archive_config='DG_CONFIG=(PRI,STDBY)'
log_archive_dest_1='LOCATION=
log_archive_dest_state_1=enable
log_archive_dest_2='SERVICE= ARCH ASYNC reopen=300 max_failure=0
net_timeout=60 VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=STDBY'
log_archive_dest_state_2=enable
log_archive_min_succeed_dest=1
log_archive_max_processes=2
standby_file_management=auto
fal_server=
fal_client=

8. Prepare init.ora on standby
db_name='TEST'
db_unique_name='STDBY'
service_names='STDBY_SERVICE'
log_archive_config='DG_CONFIG=(PRI,STDBY)'
log_archive_dest_1='LOCATION=
log_archive_dest_state_1=enable
log_archive_dest_2='SERVICE= ARCH ASYNC reopen=300 max_failure=0
net_timeout=60 VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=PRI'
log_archive_dest_state_2=enable
log_archive_min_succeed_dest=1
log_archive_max_processes=2
standby_file_management=auto
fal_server=
fal_client=

9. Mount the standby and start applying the changes
SQL> startup mount;
SQL> alter database recover managed standby database disconnect;
To put the standby in read-write mode (this activates it as a primary):
SQL> alter database activate standby database;
To stop the apply:
SQL> alter database recover managed standby database cancel immediate;
To start real-time apply:
SQL> alter database recover managed standby database using current logfile disconnect;
This requires standby redo logs (SRLs) to have been created.
To put the standby in read-only mode, stop the apply first and then open the database:
SQL> alter database recover managed standby database cancel;
SQL> alter database open read only;
Note: Once the standby is made primary (read-write), verify redo logs and tempfiles.
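The init.ora files from steps 7 and 8 mirror each other: db_name is the same on both sites, db_unique_name differs, and both unique names appear in DG_CONFIG. A rough Python sketch of a consistency check for that pairing follows; the helper names and the dictionary layout are hypothetical, and only the three parameters shown are inspected.

```python
import re


def dg_config_members(log_archive_config):
    """Extract the names listed in DG_CONFIG=(A,B,...), upper-cased."""
    match = re.search(r"DG_CONFIG\s*=\s*\(([^)]*)\)", log_archive_config, re.IGNORECASE)
    if not match:
        return []
    return [name.strip().upper() for name in match.group(1).split(",") if name.strip()]


def check_pair(primary, standby):
    """Return a list of problems found in a primary/standby parameter pair."""
    problems = []
    if primary["db_name"] != standby["db_name"]:
        problems.append("db_name must be the same on both sites")
    if primary["db_unique_name"] == standby["db_unique_name"]:
        problems.append("db_unique_name must differ between sites")
    for site in (primary, standby):
        members = dg_config_members(site["log_archive_config"])
        for name in (primary["db_unique_name"], standby["db_unique_name"]):
            if name.upper() not in members:
                problems.append(f"{name} missing from DG_CONFIG on {site['db_unique_name']}")
    return problems


if __name__ == "__main__":
    pri = {"db_name": "TEST", "db_unique_name": "PRI",
           "log_archive_config": "DG_CONFIG=(PRI,STDBY)"}
    stdby = {"db_name": "TEST", "db_unique_name": "STDBY",
             "log_archive_config": "DG_CONFIG=(PRI,STDBY)"}
    print(check_pair(pri, stdby))  # [] for the values used in this post
```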

Monitoring
Ø  On primary check if archive logs are getting copied to standby:
SQL> select status from v$archive_dest where dest_id=2;
Ø  On Standby monitor MRP process:
SQL> select status from v$managed_standby where process like '%MRP%';
Status must be “APPLYING_LOG” or “WAIT_FOR_LOG”
ps -ef|grep mrp
Ø  On standby detect archive gap
SQL> select * from v$archive_gap;
This will return records if MRP status is “WAIT_FOR_GAP”
Ø  With 10gR2, v$dataguard_stats was introduced to monitor redo transport/apply progress:
SQL> select value from v$dataguard_stats where name='apply lag';
SQL> select value from v$dataguard_stats where name='transport lag';
Ø  Note: If the standby is a RAC database, the MRP process applies redo on only one of the nodes. If that node crashes, MRP must be started on the surviving node to which the VIP of the crashed node has failed over.
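The checks above lend themselves to a small monitoring helper. The Python sketch below classifies the MRP status values named in this section and converts a lag value to seconds. It assumes the lag is reported as a day-to-second string such as '+00 00:01:30'; that format, and both function names, are assumptions to verify against your own v$dataguard_stats output.

```python
import re

# Status values this post treats as healthy for the MRP process.
HEALTHY_MRP_STATES = {"APPLYING_LOG", "WAIT_FOR_LOG"}


def classify_mrp_status(status):
    """Map a v$managed_standby status to 'ok', 'gap' or 'investigate'."""
    status = status.strip().upper()
    if status in HEALTHY_MRP_STATES:
        return "ok"
    if status == "WAIT_FOR_GAP":
        return "gap"  # v$archive_gap should list the missing sequences
    return "investigate"


def lag_seconds(value):
    """Convert a lag string like '+00 00:01:30' (assumed format) to seconds."""
    match = re.fullmatch(r"\+(\d+) (\d{2}):(\d{2}):(\d{2})", value.strip())
    if match is None:
        raise ValueError(f"unexpected lag format: {value!r}")
    days, hours, minutes, seconds = (int(part) for part in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    print(classify_mrp_status("WAIT_FOR_GAP"))  # gap
    print(lag_seconds("+00 00:01:30"))          # 90
```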

STANDBY REDO LOGS
Guidelines when creating standby redo logs:
Ø  The number of standby redo log groups should be one more than the number of online redo log groups.
Ø  Standby redo logs should be exactly the same size as the online redo logs.
Ø  SRLs should be created on both primary and standby to facilitate seamless role changes.
Ø  In a RAC environment, all SRLs should be on a shared disk and may be thread specific.
Ø  Required for the maximum protection and maximum availability modes, and when real-time apply is used.
How SRLs work
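As a quick illustration of the sizing guideline, the sketch below computes how many standby redo log groups to create. For RAC it assumes the "plus one" rule is applied per thread, that is (groups per thread + 1) x threads; the function name is hypothetical.

```python
def recommended_srl_groups(online_groups_per_thread, threads=1):
    """Number of standby redo log groups: one more than the online groups, per thread."""
    if online_groups_per_thread < 1 or threads < 1:
        raise ValueError("group and thread counts must be at least 1")
    return (online_groups_per_thread + 1) * threads


if __name__ == "__main__":
    print(recommended_srl_groups(3))             # single instance, 3 online groups: 4
    print(recommended_srl_groups(3, threads=2))  # two-node RAC, 3 groups per thread: 8
```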
  1. LGWR process on primary initiates a connection with standby.
  2. Standby listener responds by spawning a process called RFS(remote file server)
  3. The RFS process creates a network connection with processes on the primary and waits for data to arrive.
  4. Once data comes, RFS places it into standby redo logs.
  5. When log switch occurs on primary, standby redo logs are switched and RFS will go to next available standby redo log.

SWITCHOVER
Ø  Switchover allows a primary and standby to reverse roles without any data loss.
Ø  No need to re-create the old primary. Performed for planned maintenance.
Steps:
1. Verify if primary can be switched over to standby
SQL> select switchover_status from v$database;
If the value returned is “TO STANDBY”, it is safe to switch the primary to the standby role.
2. Convert primary to standby
SQL> alter database commit to switchover to physical standby;
If value is “SESSIONS ACTIVE” from step 1, then
SQL> alter database commit to switchover to physical standby with session shutdown;
3. Shut down and restart the old primary as a standby
SQL> shutdown immediate;
SQL> startup mount;
At this point, both databases are in the standby role.
4. On the target standby database, verify the switchover status. If the value is “TO PRIMARY”, then
SQL> alter database commit to switchover to primary;
If the value is “SESSIONS ACTIVE”, append “WITH SESSION SHUTDOWN” to the above command.
5. Shut down and restart the new primary database
SQL> shutdown immediate;
SQL> startup;
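The decision made in steps 1, 2 and 4 (which statement to run for a given switchover_status) can be summarised in code. This is a hedged Python sketch of that decision only: it does not connect to a database, the function name is hypothetical, and it accepts the status written with either spaces or underscores.

```python
def next_switchover_statement(current_role, switchover_status):
    """Return the SQL to run for this role and status, or None if not ready."""
    role = current_role.strip().upper()
    status = switchover_status.strip().upper().replace("_", " ")
    if role == "PRIMARY":
        ready, target = "TO STANDBY", "physical standby"
    elif role == "PHYSICAL STANDBY":
        ready, target = "TO PRIMARY", "primary"
    else:
        raise ValueError(f"unsupported role: {current_role!r}")
    statement = f"alter database commit to switchover to {target}"
    if status == ready:
        return statement + ";"
    if status == "SESSIONS ACTIVE":
        return statement + " with session shutdown;"
    return None  # any other status: do not attempt the switchover yet


if __name__ == "__main__":
    print(next_switchover_statement("PRIMARY", "SESSIONS ACTIVE"))
    print(next_switchover_statement("PHYSICAL STANDBY", "TO PRIMARY"))
```

Returning None for any other status keeps the helper conservative: a status that is neither "ready" nor "SESSIONS ACTIVE" should be investigated rather than forced.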


FAILOVER
Ø  Failover can involve data loss, depending on the protection mode, and can result in the need to re-create the old primary.
Steps:
1. Identify and resolve any gaps that may exist on standby.
SQL> select * from v$archive_gap;
Copy the missing archives from the primary to the standby and register them in the standby controlfile.
SQL> alter database register physical logfile '';
2. If standby redo logs are configured and active,
SQL> alter database recover managed standby database finish;
If there are no SRLs, or they are not active,
SQL> alter database recover managed standby database finish skip standby logfile;
3. Convert standby to primary;
SQL> alter database commit to switchover to primary;
4. Restart new primary
SQL> shutdown immediate;
SQL> startup;
Note: Once the standby is made primary (read-write), verify redo logs and tempfiles.


DATA GUARD BROKER

Agenda
Ø  DG Broker Concepts & Advantages
Ø  Setup using CLI (DGMGRL)
Ø  Useful broker commands
Ø  Switchover

Concepts
Ø  Any Data Guard configuration consists of one primary database and up to nine standby databases.
Ø  The Data Guard broker logically groups these primary and standby databases into a broker configuration so as to manage and monitor them together as an integrated unit.
Ø  Data Guard broker is a centralized framework to manage entire Data Guard configuration through a client connection to any database in the configuration.
Ø  Accessed either locally or remotely using either of the two clients: the command-line interface (DGMGRL) or the Data Guard page from the GUI (OEM Grid Control)
Ø  DGMGRL does not have the ability to create standby (GUI can do it). CLI is used mostly for configuration and management.
Ø  Easy switchover/failover with one command thereby minimizing overall downtime associated with the planned/unplanned outage
Ø  Integrated with CRS so that database role changes occur smoothly and seamlessly.
Ø  Instead of managing primary and standby databases with various SQL*Plus statements, broker provides a single unified configuration

DMON
Ø  Data Guard monitor process (DMON) runs for every database instance that is managed by the broker and maintains the broker configuration in a binary configuration file
Whenever a broker command is issued, DMON process:
Ø  Carries out the request on primary database
Ø  Coordinates with DMON process for each of the other databases
Ø  Updates its local configuration file
Ø  Communicates with DMON process for each of the other databases to update their copies of the configuration file

Prerequisites
Set up following parameters on primary and standby:
DG_BROKER_START=TRUE
DG_BROKER_CONFIG_FILE1='
DG_BROKER_CONFIG_FILE2='
LOCAL_LISTENER
GLOBAL_DBNAME in listener.ora as db_unique_name_DGMGRL.db_domain
Ø  To enable DGMGRL to restart instances, a service with a specific name must be statically registered with the local listener of each instance.
Ø  For RAC, ensure dg_broker_config_files are on shared storage and accessible to all instances.
Ø  START_OPTIONS for RAC database must be set to MOUNT in OCR using SRVCTL (For switchover/Failover operations for broker and CRS to coordinate while restarting instances and database role reversal)
Ø  SPFILE must be used
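The static service name follows the pattern db_unique_name_DGMGRL.db_domain given in the prerequisites. A minimal Python sketch of that naming rule is shown here; the function name is hypothetical, and it assumes the domain suffix is simply left off when db_domain is not set.

```python
def broker_global_dbname(db_unique_name, db_domain=""):
    """Build the GLOBAL_DBNAME value that DGMGRL expects in listener.ora."""
    name = f"{db_unique_name}_DGMGRL"
    # Assumption: with no db_domain the suffix is omitted entirely.
    return f"{name}.{db_domain}" if db_domain else name


if __name__ == "__main__":
    print(broker_global_dbname("TESTPRI"))                 # TESTPRI_DGMGRL
    print(broker_global_dbname("TESTPRI", "example.com"))  # TESTPRI_DGMGRL.example.com
```

The domain "example.com" is a placeholder used only for illustration.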

Switchover
Once SWITCHOVER is issued, the broker does the following:
  1. Verifies that the primary and standby databases are enabled and in the ONLINE state
  2. Shuts down all RAC instances except one
  3. Switches roles between the primary and standby databases.
  4. Updates the  broker configuration file to record the changes in roles
  5. Restarts the new standby database and any RAC instances that were shutdown prior to switchover.
  6. Restarts the new primary database, opens it in read-write mode, and starts log transport services transmitting redo data to the archived redo log files for standby database
  7. After switchover completes, the overall Data Guard protection mode remains at the same protection level as it was before the switchover.
  8. For DGMGRL to restart instances automatically, you must connect to the database as SYSDBA using the username and password specified in the remote password file before beginning the switchover.

Dataguard Broker setup
1. Set up init parameters on primary to enable broker
Note: For RAC, ensure dg_broker_config_files are on shared storage and accessible to all the instances.
Note: Broker config files are named dr1<<db_unique_name>>.dat and dr2<<db_unique_name>>.dat
SQL> alter system set dg_broker_start=false sid='*';
System altered.

SQL> alter system set dg_broker_config_file1='/n01/dg_broker_config_files/dr1TESTPRI.dat' sid='*';
System altered.

SQL> alter system set dg_broker_config_file2='/n01/dg_broker_config_files/dr2TESTPRI.dat' sid='*';
System altered.

SQL> alter system set dg_broker_start=true  sid='*';
System altered.
2. Verify if DMON process has started on all the instances of primary. Example:
$ ps -ef|grep dmon|grep -v grep
oracle   16190     1  0 08:53 ?        00:00:00 ora_dmon_TESTPRIR1

$ ps -ef|grep dmon|grep -v grep
oracle   29723     1  0 08:53 ?        00:00:00 ora_dmon_TESTPRIR2
3. Set up init parameters on standby
SQL> alter system set dg_broker_start=false sid='*';
System altered.

SQL> alter system set dg_broker_config_file1='/export/crawlspace/dg_broker_config_files/dr1TESTDG.dat' sid='*';
System altered.

SQL> alter system set dg_broker_config_file2='/export/crawlspace/dg_broker_config_files/dr2TESTDG.dat' sid='*';
System altered.

SQL> alter system set dg_broker_start=true  sid='*';
System altered.
4. GLOBAL_DBNAME should be set to <<db_unique_name>>_DGMGRL.<<db_domain>> in listener.ora on all instances of both primary and standby.
This is important; otherwise you will get a TNS-12154 error during the switchover operation.
Example:
SID_LIST_LISTENER_TESTPRI =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /apps/oracle/product/10g/db)
      (PROGRAM = extproc)
    )
    (SID_DESC =
      (SID_NAME = TESTPRIR1)
      (GLOBAL_DBNAME = TESTPRI_DGMGRL)
      (ORACLE_HOME = /apps/oracle/product/10g/db)
    )
  )
5. DGMGRL Configuration
5.1 Connect
DGMGRL> CONNECT sys/sys
Connected.

5.2 Create Configuration
DGMGRL> CREATE CONFIGURATION 'DG_TEST' AS PRIMARY DATABASE IS 'TESTPRI' CONNECT IDENTIFIER IS TESTPRI;
Configuration "DG_TEST" created with primary database "TESTPRI".

5.3 Verify configuration
DGMGRL> SHOW CONFIGURATION;
Configuration
  Name:            DG_TEST
  Enabled:         NO
  Protection Mode: MaxPerformance
  Databases:
    TESTPRI - Primary database

Current status for "DG_TEST":
DISABLED

5.4 Verify database; if RAC, verify that all instances are validated
DGMGRL> show database 'TESTPRI';
Database
  Name:            TESTPRI
  Role:            PRIMARY
  Enabled:         NO
  Intended State:  ONLINE
  Instance(s):
    TESTPRIR1
    TESTPRIR2

Current status for "TESTPRI":
DISABLED

5.5 Add standby database to the configuration
DGMGRL> ADD DATABASE 'TESTDG' AS CONNECT IDENTIFIER IS TESTDG MAINTAINED AS PHYSICAL;
Database "TESTDG" added.

5.6 Enable the broker
DGMGRL> ENABLE CONFIGURATION;
Enabled.

5.7 Verifying again
DGMGRL> SHOW CONFIGURATION;
Configuration
  Name:            DG_TEST
  Enabled:         YES
  Protection Mode: MaxPerformance
  Databases:
    TESTPRI - Primary database
    TESTDG  - Physical standby database

Current status for "DG_TEST":
SUCCESS

6. Troubleshooting
Let us look at some sample issues and their fixes.
Issue
DGMGRL> CONNECT sys/sys
ORA-16525: the Data Guard broker is not yet available

Fix
Set dg_broker_start=true

Issue
After enabling the configuration, on issuing SHOW CONFIGURATION, this error comes 
Warning: ORA-16608: one or more sites have warnings

Fix
To see the details of the error, check the broker log, which is generated in the bdump directory and named drc{DB_NAME}.log. Alternatively, several monitorable properties can be used to query the database status and assist in further troubleshooting.

A few monitorable properties for troubleshooting
DGMGRL> SHOW DATABASE 'TESTPRI' 'StatusReport';
DGMGRL> SHOW DATABASE 'TESTPRI' 'LogXptStatus';
DGMGRL> SHOW DATABASE 'TESTPRI' 'InconsistentProperties';
DGMGRL> SHOW DATABASE 'TESTPRI' 'InconsistentLogXptProps';
DGMGRL> SHOW DATABASE 'TESTDG' 'StatusReport';
DGMGRL> SHOW DATABASE 'TESTDG' 'LogXptStatus';
DGMGRL> SHOW DATABASE 'TESTDG' 'InconsistentProperties';
DGMGRL> SHOW DATABASE 'TESTDG' 'InconsistentLogXptProps';
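When a configuration has several databases, the same four properties are queried for each one, so the command list can be generated. The Python sketch below only builds the DGMGRL command text shown above; it does not run DGMGRL, and the function name is hypothetical.

```python
# The four monitorable properties used in the commands above.
MONITORABLE_PROPERTIES = [
    "StatusReport",
    "LogXptStatus",
    "InconsistentProperties",
    "InconsistentLogXptProps",
]


def show_commands(databases):
    """Return one SHOW DATABASE command per database and property."""
    return [
        f"SHOW DATABASE '{database}' '{prop}';"
        for database in databases
        for prop in MONITORABLE_PROPERTIES
    ]


if __name__ == "__main__":
    for command in show_commands(["TESTPRI", "TESTDG"]):
        print(command)
```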

Issue
DGMGRL> SHOW DATABASE 'TESTPRI' 'StatusReport';
STATUS REPORT
       INSTANCE_NAME   SEVERITY ERROR_TEXT
          TESTPRIR2    WARNING ORA-16714: The value of property ArchiveLagTarget is inconsistent with the database setting.
          TESTPRIR2    WARNING ORA-16714: The value of property LogArchiveMaxProcesses is inconsistent with the database setting.

Issue
DGMGRL> SHOW DATABASE 'TESTPRI' 'InconsistentProperties';
INCONSISTENT PROPERTIES
   INSTANCE_NAME        PROPERTY_NAME         MEMORY_VALUE         SPFILE_VALUE         BROKER_VALUE 
      TESTPRIR2     ArchiveLagTarget                    0                                         0 
      TESTPRIR2 LogArchiveMaxProcesses                    4                    2                    4 

Example
DGMGRL> SHOW DATABASE 'TESTPRI' 'LogArchiveMaxProcesses';
  LogArchiveMaxProcesses = '4'

Fix
DGMGRL> EDIT DATABASE 'TESTPRI' SET PROPERTY 'LogArchiveMaxProcesses'=2;

or

SQL> alter system set log_archive_max_processes=4 scope=spfile sid='*';
System altered.

DGMGRL> SHOW DATABASE 'TESTPRI' 'LogArchiveMaxProcesses';
  LogArchiveMaxProcesses = '4'
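The InconsistentProperties report compares three values per property: memory, spfile and broker. Judging from the sample report, a property seems to be flagged whenever the three do not agree, including when the spfile value is blank. The Python sketch below encodes that reading; it is an inference from the output above rather than documented broker behaviour, and the function name is hypothetical.

```python
def is_inconsistent(memory_value, spfile_value, broker_value):
    """True when the memory, spfile and broker values are not all identical."""
    values = {str(value).strip() for value in (memory_value, spfile_value, broker_value)}
    return len(values) > 1


if __name__ == "__main__":
    # Rows from the sample report above.
    print(is_inconsistent("0", "", "0"))   # ArchiveLagTarget, blank spfile value: True
    print(is_inconsistent("4", "2", "4"))  # LogArchiveMaxProcesses: True
    print(is_inconsistent("4", "4", "4"))  # after setting the spfile value to 4: False
```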

More commands
DGMGRL> SHOW DATABASE VERBOSE 'dbname';
This will show all property values in detail

DGMGRL> HELP;
List of all broker commands with usage help
Equivalent Broker Commands to 'ALTER SYSTEM'
SQL> alter database recover managed standby database cancel;
DGMGRL> edit database 'stby_dbname' set state='LOG-APPLY-OFF';

SQL> alter database recover managed standby database disconnect;
DGMGRL> edit database 'stby_dbname' set state='ONLINE';

SQL> alter system set log_archive_max_processes=4;
DGMGRL> edit database 'dbname' set property 'LogArchiveMaxProcesses'=4;

SQL> alter system set log_archive_dest_state_2='enable' scope=both;
DGMGRL> edit database 'stby_dbname' set property 'LogShipping'='ON';

SQL> alter system set log_archive_dest_state_2='defer' scope=both;
DGMGRL> edit database 'stby_dbname' set property 'LogShipping'='OFF';

DGMGRL> edit database 'pri_dbname' set state='LOG-TRANSPORT-OFF';
This will defer all standby databases

