Wednesday, September 28, 2011

Oracle ASM: ORA-01078: failure in processing system parameters and ORA-29701: unable to connect to Cluster Synchronization Service

Recently, a power outage in our lab took down all the Linux servers running Oracle Database, along with the switches and storage. When power was restored, the servers, storage, and switches were all brought up cleanly.

I noticed that the Oracle ASM instance was down, so I tried to start it manually and got the ORA-01078 and ORA-29701 errors:



sh-3.2$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.1.0 Production on Wed Sep 28 15:20:59 2011

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup;
ORA-01078: failure in processing system parameters
ORA-29701: unable to connect to Cluster Synchronization Service
SQL>
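ORA-29701 means the ASM instance could not reach Cluster Synchronization Services (CSS), which Oracle Restart runs as the ora.cssd resource; the crsctl output further down shows that ora.cssd was not running after the outage. A minimal pre-flight check is sketched below under stated assumptions: the function name `css_is_online` and the `CRSCTL` override (there only so the function can be exercised without Grid Infrastructure installed) are hypothetical, and because the exact wording of the `crsctl check css` status line may differ between versions, the text match is deliberately loose.

```shell
# Hypothetical pre-flight check: is CSS up before running "startup" in ASM?
# CRSCTL may be overridden (e.g. with a stub); it defaults to crsctl on PATH.
css_is_online() {
  local out
  # Capture the status line(s); a non-zero exit from crsctl counts as "not online".
  out=$("${CRSCTL:-crsctl}" check css 2>&1) || return 1
  # Loose, case-insensitive match on the status text (assumed wording).
  grep -qi 'Cluster Synchronization Services is online' <<<"$out"
}
```

Usage, with the grid environment loaded: `css_is_online || echo "CSS is down; start the Oracle Restart resources first"`.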



-bash-3.2$ . ./.bash_profile_grid
-bash-3.2$ echo $ORACLE_HOME
/u01/app/oracle/product/11.2.0/grid
-bash-3.2$ which crsctl
/u01/app/oracle/product/11.2.0/grid/bin/crsctl

I then tried to use crsctl to start all the Oracle resources:

-bash-3.2$ crsctl start resource -all
CRS-5702: Resource 'ora.LISTENER.lsnr' is already running on 'isvx7'
CRS-2672: Attempting to start 'ora.cssd' on 'isvx7'
CRS-2679: Attempting to clean 'ora.diskmon' on 'isvx7'
CRS-2681: Clean of 'ora.diskmon' on 'isvx7' succeeded
CRS-2672: Attempting to start 'ora.diskmon' on 'isvx7'
CRS-2676: Start of 'ora.diskmon' on 'isvx7' succeeded
CRS-2676: Start of 'ora.cssd' on 'isvx7' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'isvx7'
CRS-2676: Start of 'ora.asm' on 'isvx7' succeeded
CRS-2672: Attempting to start 'ora.DATA.dg' on 'isvx7'
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
CRS-2674: Start of 'ora.DATA.dg' on 'isvx7' failed
CRS-2679: Attempting to clean 'ora.DATA.dg' on 'isvx7'
CRS-2681: Clean of 'ora.DATA.dg' on 'isvx7' succeeded
CRS-4000: Command Start failed, or completed with errors.
-bash-3.2$
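CSS and ASM came up this time, but the DATA diskgroup did not, and the telling message is ORA-15063: ASM discovered too few disks for DATA, which points at the disks rather than at ASM itself. When crsctl prints a long log like this, a pair of small filters can pull out the failed resources and the distinct ORA- codes. Both function names are hypothetical; they only rely on the `CRS-2674: Start of '<resource>' on '<node>' failed` line format shown above.

```shell
# Hypothetical helper: read crsctl output on stdin and print the name of every
# resource whose start failed, based on lines such as:
#   CRS-2674: Start of 'ora.DATA.dg' on 'isvx7' failed
list_failed_resources() {
  sed -n "s/^CRS-2674: Start of '\([^']*\)' on '[^']*' failed.*/\1/p"
}

# Companion helper: print each distinct ORA-nnnnn code found on stdin, sorted.
list_ora_errors() {
  grep -o 'ORA-[0-9]\{5\}' | sort -u
}
```

Usage: `crsctl start resource -all 2>&1 | tee /tmp/crs_start.log | list_failed_resources`, then `list_ora_errors < /tmp/crs_start.log` (the log path is just an example).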

On the Linux server running Oracle, I checked the multipath devices that hold the ASM disks:

[root@isvx7 ~]# multipath -d -l
mulipath.conf line 111, invalid keyword: prio
mpath144 (36005076802828000c000000000000050) dm-3 IBM,2145
[size=300G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 5:0:3:3 sdaf 65:240 [active][undef]
 \_ 6:0:3:3 sdag 66:0   [active][undef]
 \_ 5:0:0:3 sdh  8:112  [active][undef]
 \_ 6:0:0:3 sdi  8:128  [active][undef]
 \_ 6:0:1:3 sdp  8:240  [active][undef]
 \_ 5:0:1:3 sdq  65:0   [active][undef]
 \_ 5:0:2:3 sdx  65:112 [active][undef]
 \_ 6:0:2:3 sdy  65:128 [active][undef]
mpath143 (36005076802828000c000000000000053) dm-2 IBM,2145
[size=300G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 5:0:3:2 sdad 65:208 [active][undef]
 \_ 6:0:3:2 sdae 65:224 [active][undef]
 \_ 5:0:0:2 sdf  8:80   [active][undef]
 \_ 6:0:0:2 sdg  8:96   [active][undef]
 \_ 5:0:1:2 sdn  8:208  [active][undef]
 \_ 6:0:1:2 sdo  8:224  [active][undef]
 \_ 5:0:2:2 sdv  65:80  [active][undef]
 \_ 6:0:2:2 sdw  65:96  [active][undef]
mpath142 (36005076802828000c000000000000052) dm-1 IBM,2145
[size=300G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 5:0:3:1 sdab 65:176 [active][undef]
 \_ 6:0:3:1 sdac 65:192 [active][undef]
 \_ 5:0:0:1 sdc  8:32   [active][undef]
 \_ 6:0:0:1 sde  8:64   [active][undef]
 \_ 6:0:1:1 sdl  8:176  [active][undef]
 \_ 5:0:1:1 sdm  8:192  [active][undef]
 \_ 5:0:2:1 sdt  65:48  [active][undef]
 \_ 6:0:2:1 sdu  65:64  [active][undef]
mpath141 (36005076802828000c000000000000051) dm-0 IBM,2145
[size=300G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 5:0:3:0 sdaa 65:160 [active][undef]
 \_ 5:0:0:0 sdb  8:16   [active][undef]
 \_ 6:0:0:0 sdd  8:48   [active][undef]
 \_ 6:0:1:0 sdj  8:144  [active][undef]
 \_ 5:0:1:0 sdk  8:160  [active][undef]
 \_ 6:0:2:0 sdr  65:16  [active][undef]
 \_ 5:0:2:0 sds  65:32  [active][undef]
 \_ 6:0:3:0 sdz  65:144 [active][undef]
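With four maps and eight paths each, it helps to reduce this listing to just the map names and their dm devices, which are what ASM actually opens under /dev/mapper. A minimal awk filter over the header lines is sketched below; the function name is hypothetical, and it assumes the header layout shown above (`name (wwid) dm-N vendor,product`), skipping path lines and the multipath.conf warning.

```shell
# Hypothetical helper: read "multipath -l" style output on stdin and print
# "<map name> <dm device>" for each map header line, e.g.
#   mpath144 (36005076802828000c000000000000050) dm-3 IBM,2145
# Path lines (starting with "\_") and warnings do not match and are skipped.
list_mpath_maps() {
  awk '$2 ~ /^\(/ && $3 ~ /^dm-[0-9]+$/ { print $1, $3 }'
}
```

Usage: `multipath -l | list_mpath_maps` prints one line per map, such as `mpath141 dm-0`.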

When I looked at the owner and group of the disks, I noticed that they had changed to owner root and group disk. No wonder Oracle could not see them: with brw-rw---- permissions owned by root:disk, the oracle user that runs ASM cannot open the devices.

[root@isvx7 ~]# ls -l /dev/mapper/mpath141
brw-rw---- 1 root disk 253, 0 Sep 28 09:57 /dev/mapper/mpath141
[root@isvx7 ~]# ls -l /dev/mapper/mpath142
brw-rw---- 1 root disk 253, 1 Sep 28 09:57 /dev/mapper/mpath142
[root@isvx7 ~]# ls -l /dev/mapper/mpath143
brw-rw---- 1 root disk 253, 2 Sep 28 09:57 /dev/mapper/mpath143
[root@isvx7 ~]# ls -l /dev/mapper/mpath144
brw-rw---- 1 root disk 253, 3 Sep 28 09:57 /dev/mapper/mpath144
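Rather than running `ls -l` on each volume, the ownership check can be done for all of them at once. The sketch below is a hypothetical helper (not something used in this post) built on GNU `stat -c '%U:%G'`; it prints each device whose owner:group differs from the expected value and returns non-zero if any device is wrong or missing.

```shell
# Hypothetical helper: check_disk_owner OWNER:GROUP PATH...
# Prints "<path> <actual owner:group>" for each mismatch and returns 1 if any
# path is missing or mismatched; returns 0 when every path matches.
check_disk_owner() {
  local expected=$1; shift
  local dev actual rc=0
  for dev in "$@"; do
    if [ ! -e "$dev" ]; then
      echo "$dev missing" >&2
      rc=1
      continue
    fi
    actual=$(stat -c '%U:%G' "$dev")   # GNU stat: %U owner name, %G group name
    if [ "$actual" != "$expected" ]; then
      echo "$dev $actual"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `check_disk_owner oracle:dba /dev/mapper/mpath141 /dev/mapper/mpath142 /dev/mapper/mpath143 /dev/mapper/mpath144` would have reported all four volumes as `root:disk` here.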
I changed the owner and group of the volumes to oracle and dba:

[root@isvx7 ~]# chown -R oracle:dba /dev/mapper/mpath141
[root@isvx7 ~]# chown -R oracle:dba /dev/mapper/mpath142
[root@isvx7 ~]# chown -R oracle:dba /dev/mapper/mpath143
[root@isvx7 ~]# chown -R oracle:dba /dev/mapper/mpath144
[root@isvx7 ~]# ls -l /dev/mapper/mpath143
brw-rw---- 1 oracle dba 253, 2 Sep 28 09:57 /dev/mapper/mpath143
[root@isvx7 ~]#
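Note that this chown only fixes the running system. Device nodes under /dev are recreated by udev at every boot, so the volumes will probably revert to root:disk after the next reboot, which is likely what happened after the power outage. The usual permanent fixes on Linux are a udev rule or ASMLib; short of that, the manual step can at least be wrapped in a script to rerun as root after a reboot, before starting ASM. The function below is a hypothetical sketch of that; it also drops `-R`, which does nothing useful on a device node, and applies mode 0660 to match the brw-rw---- permissions shown above.

```shell
# Hypothetical helper: set_disk_owner OWNER:GROUP PATH...
# (Re)applies owner:group and mode 0660 to each path. Must run as root for
# real devices. It does NOT persist across reboots by itself; it only
# automates the manual chown shown above.
set_disk_owner() {
  local owner=$1; shift
  local dev rc=0
  for dev in "$@"; do
    if [ ! -e "$dev" ]; then
      echo "skipping $dev: not found" >&2
      rc=1
      continue
    fi
    # No -R: a block device node is a single file, not a directory tree.
    chown "$owner" "$dev" && chmod 0660 "$dev" || rc=1
  done
  return "$rc"
}
```

Usage: `set_disk_owner oracle:dba /dev/mapper/mpath141 /dev/mapper/mpath142 /dev/mapper/mpath143 /dev/mapper/mpath144`.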

When I brought up asmca, I saw that my +DATA diskgroup was now visible but not mounted. After I mounted the diskgroup, everything worked fine.
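asmca is a GUI; the same mount can be done from the command line with `ALTER DISKGROUP ... MOUNT` in a SYSASM session. The function below is a minimal sketch under stated assumptions: `mount_diskgroup` is a hypothetical name, it assumes the grid environment (ORACLE_HOME, and ORACLE_SID set to the ASM instance, typically +ASM) is already loaded, for example via .bash_profile_grid as above, and the `SQLPLUS` override exists only so it can be tested without Oracle installed.

```shell
# Hypothetical helper: mount an ASM diskgroup from a SYSASM session.
# Assumes the grid environment (ORACLE_HOME, ORACLE_SID such as +ASM) is set.
mount_diskgroup() {
  local dg=$1
  # Accept only plain identifiers so the name is safe to interpolate into SQL.
  if ! [[ $dg =~ ^[A-Za-z][A-Za-z0-9_]*$ ]]; then
    echo "invalid diskgroup name: $dg" >&2
    return 2
  fi
  # -S suppresses the banner; WHENEVER SQLERROR makes a failed mount return
  # a non-zero exit status instead of sqlplus exiting 0.
  "${SQLPLUS:-sqlplus}" -S / as sysasm <<EOF
WHENEVER SQLERROR EXIT FAILURE
ALTER DISKGROUP $dg MOUNT;
EXIT;
EOF
}
```

Usage: `mount_diskgroup DATA`.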