Avoiding Data Guard SWITCHOVER messages in the alert log

Today we'll look at a simple case that can give you a scare if you don't know where it comes from.

Lately you may have seen that, in the alert log of databases with Data Guard enabled, there are some suspicious SWITCHOVER-related messages like these:

  Mem# 0: +REDO1/ORCL_SITE2/ONLINELOG/group_12.430.1176371311
  Mem# 1: +REDO2/ORCL_SITE2/ONLINELOG/group_12.448.1176371317
2024-12-09T12:04:01.654207+01:00
ARC4 (PID:22664): Archived Log entry 1874 added for T-6.S-29707 ID 0xf4a75427 LAD:1
2024-12-09T12:04:02.545651+01:00
 rfs (PID:3350): Selected LNO:10 for T-5.S-22376 dbid 4104620583 branch 722623079
2024-12-09T12:04:02.595323+01:00
PR00 (PID:22983): Media Recovery Waiting for T-5.S-22376 (in transit)
2024-12-09T12:04:02.609412+01:00
Recovery of Online Redo Log: Thread 5 Group 10 Seq 22376 Reading mem 0
  Mem# 0: +REDO1/ORCL_SITE2/ONLINELOG/group_10.419.1176369743
  Mem# 1: +REDO2/ORCL_SITE2/ONLINELOG/group_10.437.1176369749
2024-12-09T12:04:02.973552+01:00
ARC1 (PID:22656): Archived Log entry 1875 added for T-5.S-22375 ID 0xf4a75427 LAD:1
2024-12-09T12:47:21.593967+01:00
 rfs (PID:5113): krsr_rfs_atc: Identified database type as 'PHYSICAL STANDBY': Client is Foreground (PID:26765)
2024-12-09T12:47:23.731609+01:00
SWITCHOVER VERIFY BEGIN
SWITCHOVER VERIFY COMPLETE
2024-12-09T13:32:28.283411+01:00
 rfs (PID:5336): krsr_rfs_atc: Identified database type as 'PHYSICAL STANDBY': Client is Foreground (PID:29093)
2024-12-09T13:32:30.524186+01:00
SWITCHOVER VERIFY BEGIN
SWITCHOVER VERIFY COMPLETE
2024-12-09T14:17:29.627321+01:00
 rfs (PID:3408): krsr_rfs_atc: Identified database type as 'PHYSICAL STANDBY': Client is Foreground (PID:540)
2024-12-09T14:17:31.443773+01:00
SWITCHOVER VERIFY BEGIN
SWITCHOVER VERIFY COMPLETE

The first question we ask ourselves is:

Who on earth is running SWITCHOVER VERIFY on our Data Guard?

The answer is that Oracle itself is doing it.
It seems an unannounced change slipped out of Oracle that makes TFA periodically run DGMGRL VALIDATE DATABASE, which is what produces these messages in the alert.log.
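
For reference, you can reproduce one of these bursts by hand by running the same check TFA runs. A minimal sketch, assuming OS authentication and using the standby db_unique_name ORCL_SITE2 that appears in the log above:

$ dgmgrl /
DGMGRL> validate database 'ORCL_SITE2';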

This is expected to be fixed in AHF 24.8, where the validate command will no longer be in the TFA schedule.

Do we have to wait for that release?

Fortunately not, since we can remove that check from our execution profile with the command:

# tfactl modifyprofile db_dataguard disable 

Block corruption not detected by validate

Today we'll look at a case that can give us a bit of a headache.

Suppose we have the typical query that returns a block corruption error, ORA-01578: ORACLE data block corrupted:

SQL> select something from sometable  where file_name='whatever';
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 377, block # 2818432)
ORA-01110: data file 377:
'+DATA/TESTDB/DATAFILE/TESTDB.20221119.110001.377.dbf' 

Faced with this error, our usual steps are clear (see the sketch after the list):

  • Look for the corrupt blocks with RMAN validate or, failing that, with DBVERIFY (dbv)
  • Check the V$DATABASE_BLOCK_CORRUPTION view
  • Recover from backup: recover datafile 377 block 2818432;
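
A minimal sketch of those steps from the shell, reusing the file and block numbers from the ORA-01578 above (it assumes a usable RMAN backup; the findings of validate can then be checked in V$DATABASE_BLOCK_CORRUPTION from any SQL*Plus session):

rman target / <<'EOF'
# 1) Look for corrupt blocks; findings are recorded in V$DATABASE_BLOCK_CORRUPTION
validate datafile 377;
# 2) Block media recovery of the block reported by ORA-01578
recover datafile 377 block 2818432;
EOF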

But what happens if, after running validate or dbv, they report no corrupt blocks at all?

If we look at the alert.log during the validation, we will see lines with this content:

TESTDB(3):Completely zero block found during validation

This tells us the error is in a block that contains nothing but zeroes.
By design, Oracle never writes all-zero blocks, so this error is inherited from the operating system or the storage layer.

To fix this problem we will have to recover that datafile from a backup using any of the supported methods; see the sketch below.
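
A minimal sketch of that recovery, again with file 377 from the example (it assumes a valid backup and the archived logs needed for recovery):

rman target / <<'EOF'
# Take the affected datafile offline, restore and recover it, then bring it back online
sql 'alter database datafile 377 offline';
restore datafile 377;
recover datafile 377;
sql 'alter database datafile 377 online';
EOF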

More information, as always, on Oracle Support:

  • Physical Corrupted Blocks consisting of all Zeroes indicate a problem with OS, HW or Storage (Doc ID 1545366.1)

Sleep loops in PL/SQL code

A super quick and super simple entry for dummies.

How do we introduce a wait in PL/SQL code?

The answer is very simple: with the DBMS_LOCK.SLEEP procedure.
Let's see, for example, how to force two 1-minute waits in order to take successive hanganalyze dumps:

$ORACLE_HOME/bin/sqlplus -s "/as sysdba" << EOF
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
-- Wait 1 minute before taking the second hanganalyze
exec dbms_lock.sleep(60);
oradebug hanganalyze 3
-- Wait 1 minute before taking the third hanganalyze
exec dbms_lock.sleep(60);
oradebug hanganalyze 3
oradebug tracefile_name
EOF
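
And, as the title promises, a sleep inside a loop. A minimal sketch of a wait loop using DBMS_SESSION.SLEEP, which from 18c onward is available to any session without extra grants (on older versions, swap in dbms_lock.sleep); the 3 iterations and the 10-second wait are just example values:

$ORACLE_HOME/bin/sqlplus -s "/as sysdba" << EOF
set serveroutput on
begin
  for i in 1 .. 3 loop
    dbms_output.put_line('iteration ' || i);
    -- wait 10 seconds between iterations
    dbms_session.sleep(10);
  end loop;
end;
/
EOF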

Using environment variables in RMAN scripts

Today, a very quick entry for dummies.

We often want the RMAN log file name to carry an environment variable (usually the date), doing something like this:

#!/bin/bash
HORA=`date +%Y%m%d_%H:%M:%S`
rman  cmdfile restore_${ORACLE_SID}.cmd  log logs/${HORA}_restore_${ORACLE_SID}.log

But when we go to the logs subdirectory, we find it has created a file literally named ${HORA}_restore_${ORACLE_SID}.log

How do we fix this?

The fix is not to play with the quoting, but to use the MSGLOG parameter instead.
We simply change our script to:

#!/bin/bash
HORA=`date +%Y%m%d_%H:%M:%S`
rman  cmdfile restore_${ORACLE_SID}.cmd  MSGLOG logs/${HORA}_restore_${ORACLE_SID}.log

And it will work just as we want.
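
For completeness, a slightly fuller sketch of the same wrapper (the target / connection, the mkdir and the cmdfile name are illustrative assumptions):

#!/bin/bash
# Dated RMAN log via MSGLOG; takes the ORACLE_SID as its only argument
export ORACLE_SID=${1:?usage: $0 <ORACLE_SID>}
HORA=$(date +%Y%m%d_%H%M%S)
mkdir -p logs
rman target / cmdfile restore_${ORACLE_SID}.cmd MSGLOG logs/${HORA}_restore_${ORACLE_SID}.log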

Removing a node from the RAC

Today we'll look at how to remove a node from a RAC cleanly.
We'll assume we want to remove the node we call rac1 from our cluster. The steps to carry out are:

Remove the databases running on the node

Suppose we have this database:

ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/db
ORACLE_SID=TEST1
DB_NAME=TEST
INSTANCE_NAME=TEST1
NODE_NAME=rac1

For each database, we stop its local instance with:

 srvctl stop instance  -db $DB_NAME -i $INSTANCE_NAME
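
If several databases run on the node, a loop like this sketch stops all of them (the <DB_NAME>1 instance-naming convention is an assumption taken from the example above; adjust it to your own naming):

#!/bin/bash
# Stop the rac1 instance of every database registered in the cluster
for DB in $(srvctl config database); do
  srvctl stop instance -db "$DB" -instance "${DB}1"
done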

And afterwards we remove it. First we review its configuration:

rac1.pamplona.name:oracle (TEST:/u01/app/19c/grid) srvctl config database -db $DB_NAME
Database unique name: TEST
Database name:
Oracle home: /u01/app/oracle/product/12.1.0.2/db
Oracle user: oracle
Spfile: +DATA/TEST/PARAMETERFILE/spfile.357.935866381
Password file:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools:
Disk Groups: +DATA,+REDO,+FRA
Mount point paths:
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: osoper
Database instances: TEST1,TEST2
Configured nodes: rac1,rac2
Database is administrator managed

Then we delete the instance with dbca:

dbca -silent -deleteInstance -nodeList rac1 -gdbName $DB_NAME -instanceName $INSTANCE_NAME -sysDBAUserName sys -sysDBAPassword syspass

After this, we check that no databases are left on this node with crsctl stat res -t.

Remove the node's cluster resources

Once we have removed the databases, we remove the cluster resources.
The first step is to relocate the mgmtdb and stop the ASM proxy:

srvctl relocate mgmtdb -n rac2
srvctl stop asm -proxy -n rac1
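
To confirm the relocation, srvctl can report the node where the management database now runs:

srvctl status mgmtdb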

We disable and stop the listener:

srvctl disable listener -l LISTENER -n rac1
srvctl stop listener -l LISTENER -n rac1

Uninstalling the CRS binaries

With the ASM environment loaded, we proceed to uninstall the RAC binaries. For this we use the deinstall command from the node's $ORACLE_HOME, which takes care of a clean uninstall, removing:

  • Configuration
  • Binaries
  • Virtual network interfaces

$ORACLE_HOME/deinstall/deinstall -local

rac1.pamplona.name:oracle (+ASM1:/u01/app/19c/grid/deinstall) $ORACLE_HOME/deinstall/deinstall -local
Checking for required files and bootstrapping ...
Please wait ...
Location of logs /u01/app/oraInventory/logs/

############ ORACLE DECONFIG TOOL START ############
######################### DECONFIG CHECK OPERATION START #########################
## [START] Install check configuration ##
Checking for existence of the Oracle home location /u01/app/19c/grid
Oracle Home type selected for deinstall is: Oracle Grid Infrastructure for a Cluster
Oracle Base selected for deinstall is: /u01/app/oracle
Checking for existence of central inventory location /u01/app/oraInventory
Checking for existence of the Oracle Grid Infrastructure home /u01/app/19c/grid
The following nodes are part of this cluster: rac1,rac2
Checking for sufficient temp space availability on node(s) : 'rac1'
## [END] Install check configuration ##

Traces log file: /u01/app/oraInventory/logs/crsdc_2021-03-23_09-16-26-PM.log
Network Configuration check config START
Network de-configuration trace file location: /u01/app/oraInventory/logs/netdc_check9_09-16-27PM.log
Network Configuration check config END
Asm Check Configuration START
ASM de-configuration trace file location: /u01/app/oraInventory/logs/asmcadc_check2021-03-23_09-16-27PM.log
Database Check Configuration START
Database de-configuration trace file location: /u01/app/oraInventory/logs/databasedc_check2021-03-23_09-16-27PM.log
Oracle Grid Management database was found in this Grid Infrastructure home
Database Check Configuration END
######################### DECONFIG CHECK OPERATION END #########################

####################### DECONFIG CHECK OPERATION SUMMARY #######################
Oracle Grid Infrastructure Home is: /u01/app/19c/grid
The following nodes are part of this cluster: rac1,rac2
The cluster node(s) on which the Oracle home deinstallation will be performed are:rac1
Oracle Home selected for deinstall is: /u01/app/19c/grid
Inventory Location where the Oracle home registered is: /u01/app/oraInventory
Option -local will not modify any ASM configuration.
Oracle Grid Management database was found in this Grid Infrastructure home
Oracle Grid Management database will be relocated to another node during deconfiguration of local node
Do you want to continue (y - yes, n - no)? [n]: y
A log of this session will be written to: '/u01/app/oraInventory/logs/deinstall_deconfig2021-03-23_09-16-21-PM.out'
Any error messages from this session will be written to: '/u01/app/oraInventory/logs/deinstall_deconfig2021-03-23_09-16-21-PM.err'

######################## DECONFIG CLEAN OPERATION START ########################
Database de-configuration trace file location: /u01/app/oraInventory/logs/databasedc_clean2021-03-23_09-16-27PM.log
ASM de-configuration trace file location: /u01/app/oraInventory/logs/asmcadc_clean2021-03-23_09-16-27PM.log
ASM Clean Configuration END

Network Configuration clean config START
Network de-configuration trace file location: /u01/app/oraInventory/logs/netdc_clean2021-03-23_09-16-27PM.log
Network Configuration clean config END
Run the following command as the root user or the administrator on node "rac1".
/u01/app/19c/grid/crs/install/rootcrs.sh -force  -deconfig -paramfile "/tmp/deinstall2021-03-23_09-15-25PM/response/deinstall_OraGI19Home1.rsp"
Press Enter after you finish running the above commands
<----------------------------------------

We run the command as root:

[oracle@rac1 ~]$ sudo /u01/app/19c/grid/crs/install/rootcrs.sh -force -deconfig -paramfile "/tmp/deinstall2021-03-23_09-15-25PM/response/deinstall_OraGI19Home1.rsp"
Using configuration parameter file: /tmp/deinstall2021-03-23_09-15-25PM/response/deinstall_OraGI19Home1.rsp
The log of current session can be found at:
  /u01/app/oraInventory/logs/crsdeconfig_rac1_2021-03-23_09-19-35PM.log
Redirecting to /bin/systemctl restart rsyslog.service
2021/04/09 21:23:06 CLSRSC-336: Successfully deconfigured Oracle Clusterware stack on this node

And we continue the process by pressing [ ENTER ]:

######################### DECONFIG CLEAN OPERATION END #########################
####################### DECONFIG CLEAN OPERATION SUMMARY #######################
Local node configuration of Oracle Grid Management database was removed successfully
Oracle Clusterware is stopped and successfully de-configured on node "rac1"
Oracle Clusterware is stopped and de-configured successfully.
#######################################################################
############# ORACLE DECONFIG TOOL END #############
Using properties file /tmp/deinstall2021-03-23_09-15-25PM/response/deinstall_2021-03-23_09-16-21-PM.rsp
Location of logs /u01/app/oraInventory/logs/
############ ORACLE DEINSTALL TOOL START ############
####################### DEINSTALL CHECK OPERATION SUMMARY #######################
A log of this session will be written to: '/u01/app/oraInventory/logs/deinstall_deconfig2021-03-23_09-16-21-PM.out'
Any error messages from this session will be written to: '/u01/app/oraInventory/logs/deinstall_deconfig2021-03-23_09-16-21-PM.err'
######################## DEINSTALL CLEAN OPERATION START ########################
## [START] Preparing for Deinstall ##
Setting LOCAL_NODE to rac1
Setting CLUSTER_NODES to rac1
Setting CRS_HOME to true
Setting oracle.installer.invPtrLoc to /tmp/deinstall2021-03-23_09-15-25PM/oraInst.loc
Setting oracle.installer.local to true
## [END] Preparing for Deinstall ##
Setting the force flag to false
Setting the force flag to cleanup the Oracle Base
Oracle Universal Installer clean START
Detach Oracle home '/u01/app/19c/grid' from the central inventory on the local node : Done
Delete directory '/u01/app/19c/grid' on the local node : Done
The Oracle Base directory '/u01/app/oracle' will not be removed on local node. The directory is in use by Oracle Home '/u01/app/oracle/product/12.1.0.2/db'.
Oracle Universal Installer cleanup was successful.
Oracle Universal Installer clean END
## [START] Oracle install clean ##
## [END] Oracle install clean ##
######################### DEINSTALL CLEAN OPERATION END #########################

####################### DEINSTALL CLEAN OPERATION SUMMARY #######################
Successfully detached Oracle home '/u01/app/19c/grid' from the central inventory on the local node.
Successfully deleted directory '/u01/app/19c/grid' on the local node.
Oracle Universal Installer cleanup was successful.

Review the permissions and contents of '/u01/app/oracle' on nodes(s) 'rac1'.
If there are no Oracle home(s) associated with '/u01/app/oracle', manually delete '/u01/app/oracle' and its contents.
Oracle deinstall tool successfully cleaned up temporary directories.
#######################################################################
############# ORACLE DEINSTALL TOOL END #############

Now we go to one of the remaining nodes and remove node 1:

[root@rac2 ~]$ . oraenv
ORACLE_SID = [root] ? +ASM2
ORACLE_BASE environment variable is not being set since this
information is not available for the current user ID root.
You can set ORACLE_BASE manually if it is required.
Resetting ORACLE_BASE to its previous value or ORACLE_HOME
The Oracle base has been set to /u01/app/19c/grid

[root@rac2 ~]$  $ORACLE_HOME/bin/crsctl delete node -n  rac1
CRS-4661: Node rac1 successfully deleted.

As the oracle user, we check the number of nodes again:

rac2:oracle (+ASM2:/home/oracle) olsnodes -s -t
rac2  Active  Unpinned

More information at:

  • How to Add Node/Instance or Remove Node/Instance with Oracle Clusterware and RAC (Doc ID 1332451.1)
  • https://oracledbwr.com/step-by-step-deleting-node-in-oracle-rac-12c-release-1-environment

More entries for dummies about RAC:
Comandos basicos en Oracle RAC
Comandos basicos del RAC II
Eliminar un nodo del rac