Thursday, February 19, 2015

RAC - Error Diagnosis

Checking your RAC isn't difficult, but there are a lot of commands to choose from.

Example 1 - Server Down

Logging into my main server I check that the cluster is running ok.

If OEM is running, log in and click on <Cluster> (top right tab) and the Hosts item (2nd in the list) shows 1/1 indicating that one is down. Click on the 2 (2 nodes in my setup) and you can see on the next page which node is unavailable.

If OEM isn't running (and even if it is), it's quicker to start a terminal session and run crsctl.

This gives an overall general view of things.

     grid_env
     crsctl check crs

     CRS-4638: Oracle High Availability Services is online
     CRS-4537: Cluster Ready Services is online
     CRS-4529: Cluster Synchronization Services is online
     CRS-4533: Event Manager is online

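It isn't very helpful on its own though, as it doesn't show which node or resource is in trouble. For that I use the resource listing.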
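If you want to script that first check, here is a minimal sketch. It assumes the output looks like the four lines above, one CRS- message per daemon with "is online" for each healthy one. The function name crs_stack_ok is my own invention, not part of crsctl.

```shell
# Hypothetical helper: reads `crsctl check crs` output on stdin and
# succeeds only if every CRS- message line reports "is online".
crs_stack_ok() {
    awk '
        /CRS-[0-9]+:/ { total++; if ($0 ~ /is online/) online++ }
        END { exit ((total > 0 && total == online) ? 0 : 1) }
    '
}

# Only runs where crsctl is on the PATH (i.e. after grid_env).
if command -v crsctl >/dev/null 2>&1; then
    if crsctl check crs | crs_stack_ok; then
        echo "CRS stack online"
    else
        echo "CRS stack has a problem"
    fi
fi
```

Like the command itself, this only gives a yes/no for the local stack.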

     crsctl status resource -t
     --------------------------------------------------------------------------------
     NAME                       TARGET   STATE         SERVER        STATE_DETAILS
     --------------------------------------------------------------------------------
     Local Resources
     --------------------------------------------------------------------------------
     ora.DATA.dg                ONLINE   ONLINE        ol5-112-rac1
     ora.LISTENER.lsnr          ONLINE   ONLINE        ol5-112-rac1
     ora.asm                    ONLINE   ONLINE        ol5-112-rac1  Started
     ora.eons                   ONLINE   ONLINE        ol5-112-rac1
     ora.gsd                    OFFLINE  OFFLINE       ol5-112-rac1
     ora.net1.network           ONLINE   ONLINE        ol5-112-rac1
     ora.ons                    ONLINE   ONLINE        ol5-112-rac1
     --------------------------------------------------------------------------------
     Cluster Resources
     --------------------------------------------------------------------------------
     ora.LISTENER_SCAN1.lsnr 1  ONLINE   ONLINE        ol5-112-rac1
     ora.oc4j                1  OFFLINE  OFFLINE
     ora.ol5-112-rac1.vip    1  ONLINE   ONLINE        ol5-112-rac1
     ora.ol5-112-rac2.vip    1  ONLINE   INTERMEDIATE  ol5-112-rac1  FAILED OVER
     ora.orcl.db             1  ONLINE   ONLINE        ol5-112-rac1  Open
                             2  ONLINE   OFFLINE
     ora.scan1.vip           1  ONLINE   ONLINE        ol5-112-rac1

I've jiggled the output slightly to make it more readable.

GSD and OC4J are normally down in this installation so no worries there.

We can also see that instance 2 of ora.orcl.db (orcl is my database) is offline, and that the rac2 VIP has failed over to rac1, indicating that the rac2 server is unavailable.
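Picking those rows out by eye gets tedious on a bigger cluster, so here is a rough sketch of a filter. It assumes a problem row is one where TARGET is ONLINE but STATE is OFFLINE or INTERMEDIATE, which is why GSD and OC4J (target OFFLINE) are skipped. It also assumes resource names start with "ora." as they do here. The name problem_resources is mine, not an Oracle tool.

```shell
# Hypothetical helper: reads `crsctl status resource -t` output on stdin
# and prints rows whose TARGET is ONLINE but whose STATE is not.
problem_resources() {
    awk '
        # Remember the most recent resource name, so that rows carrying
        # only an instance number can still be labelled.
        $1 ~ /^ora\./ { name = $1 }
        {
            for (i = 1; i < NF; i++)
                if ($i == "ONLINE" && ($(i+1) == "OFFLINE" || $(i+1) == "INTERMEDIATE")) {
                    $1 = $1    # collapse the column padding to single spaces
                    print ($1 == name ? $0 : name " " $0)
                    break
                }
        }
    '
}

# Only runs where crsctl is on the PATH (i.e. after grid_env).
if command -v crsctl >/dev/null 2>&1; then
    crsctl status resource -t | problem_resources
fi
```

Fed the listing above, it should print just the failed-over rac2 VIP row and "ora.orcl.db 2 ONLINE OFFLINE". Tracking the last name seen means it should also cope with the unjiggled layout, where the resource name sits on a line by itself.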

You can also use crs_stat, but the output is less friendly and the failover isn't as obvious.

So the cluster is running and access to the database is possible but the high availability option of the 2nd instance is gone. I restart the other server and the status is corrected.

     ora.ol5-112-rac1.vip    1  ONLINE   ONLINE        ol5-112-rac1
     ora.ol5-112-rac2.vip    1  ONLINE   ONLINE        ol5-112-rac2
     ora.orcl.db             1  ONLINE   ONLINE        ol5-112-rac1  Open
                             2  ONLINE   ONLINE        ol5-112-rac2  Open
     ora.scan1.vip           1  ONLINE   ONLINE        ol5-112-rac1





<u>Clusterware ALERT Log (alert<nodename>.log)</u>



RAC has its own alert log, and there are logs for everything else, as you would expect. They are all under $GRID_HOME (or whatever you call it)/log/<nodename>
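To save typing the path each time, something like the sketch below works. It assumes GRID_HOME is set in your environment and that the node name is the short hostname, which may not hold on your setup. The function name clusterware_alert_log is made up for this example.

```shell
# Hypothetical helper: builds the alert log path from the Grid home ($1)
# and the node name ($2), following the layout described above.
clusterware_alert_log() {
    printf '%s/log/%s/alert%s.log\n' "$1" "$2" "$2"
}

# Show the last 50 lines if GRID_HOME is set and the log is readable.
if [ -n "${GRID_HOME:-}" ]; then
    log=$(clusterware_alert_log "$GRID_HOME" "$(hostname -s)")
    if [ -r "$log" ]; then
        tail -n 50 "$log"
    fi
fi
```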





<u>crsctl options</u>



crsctl check crs - checks the viability of the CRS stack

crsctl check cssd - checks the viability of CSS

crsctl check crsd - checks the viability of CRS

crsctl check evmd - checks the viability of EVM

crsctl set css <parm> <value> - sets a parameter override

crsctl get css <parm> - gets the value of a CSS parameter

crsctl unset css <parm> - sets CSS parameter to its default

crsctl query css votedisk - lists the voting disks used by CSS

crsctl add css votedisk <path> - adds a new voting disk

crsctl delete css votedisk <path> - removes a voting disk

crsctl enable crs - enables startup for all CRS daemons

crsctl disable crs - disables startup for all CRS daemons

crsctl start crs - starts all CRS daemons

crsctl stop crs - stops all CRS daemons

crsctl start resources - starts CRS resources

crsctl stop resources - stops CRS resources



to be continued......



Happyjohn
