December 25, 2011

Forcing the issue


Some ASM commands have the "force" option that allows the administrator to override a default behaviour. While some uses of the force option are perfectly safe and indeed required, some may render your disk group unusable. Let's have a closer look.

Mount force

The force option becomes a must when a disk group mount reports missing disks. This is one of the cases when it's safe and required to use the force option. Provided we are not missing too many disks, the mount force should succeed. Basically, at least one partner disk - from every disk partnership in the disk group - must be available.
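To make the "at least one partner from every partnership" rule concrete, here is a minimal Python sketch. It is a simplified model, not how ASM stores partnerships internally: the function name `can_mount_force` and the list of partnership pairs are my own assumptions for illustration.

```python
def can_mount_force(partnerships, missing_disks):
    """Return True if every partnership still has at least one available disk.

    partnerships:  iterable of collections of disk numbers that mirror each other
    missing_disks: collection of disk numbers that ASM cannot find
    """
    missing = set(missing_disks)
    # A partnership survives if removing the missing disks leaves a non-empty set.
    return all(set(partners) - missing for partners in partnerships)


# Hypothetical partnerships for a three-disk normal redundancy disk group
# in which every disk partners with the other two.
play_partnerships = [(0, 1), (0, 2), (1, 2)]

print(can_mount_force(play_partnerships, {0}))     # one disk missing
print(can_mount_force(play_partnerships, {0, 1}))  # both disks of one pair missing
```

With one disk missing, every pair still has a surviving member, so the check passes. With disks 0 and 1 both missing, the pair (0, 1) has no survivor and the check fails.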

Let's look at one example. I have created a normal redundancy disk group PLAY with three disks:

SQL> create diskgroup PLAY disk '/dev/ASMPLAY01','/dev/ASMPLAY02','/dev/ASMPLAY03';

Diskgroup created.

I then dismounted the disk group and deleted disk /dev/ASMPLAY01. After that, my disk group mount fails, telling me that a disk is missing:

SQL> alter diskgroup PLAY mount;
alter diskgroup PLAY mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "0" is missing from group number "2"

As I am missing only one disk, I should be able to mount force the disk group:

SQL> alter diskgroup PLAY mount force;

Diskgroup altered.

ASM will now do some cleanup - it will offline the missing disk and eventually drop it from the disk group. These actions are logged in the ASM alert log for all to see:

SQL> alter diskgroup PLAY mount force
NOTE: cache registered group PLAY number=2 incarn=0xb71d3834
NOTE: cache began mount (first) of group PLAY number=2 incarn=0xb71d3834
NOTE: Assigning number (2,2) to disk (/dev/ASMPLAY03)
NOTE: Assigning number (2,1) to disk (/dev/ASMPLAY02)
...
NOTE: process _user5733_+asm (5733) initiating offline of disk 0.3916286251 () with mask 0x7e in group 2
NOTE: checking PST: grp = 2
GMON checking disk modes for group 2 at 29 for pid 19, osid 5733
NOTE: checking PST for grp 2 done.
WARNING: Disk 0 () in group 2 mode 0x7f is now being offlined
...
SUCCESS: diskgroup PLAY was mounted
SUCCESS: alter diskgroup PLAY mount force
...
WARNING: PST-initiated drop of 1 disk(s) in group 2(.3072145460))
SQL> alter diskgroup PLAY drop disk PLAY_0000 force /* ASM SERVER */
...
NOTE: starting rebalance of group 2/0xb71d3834 (PLAY) at power 1
Starting background process ARB0
SUCCESS: alter diskgroup PLAY drop disk PLAY_0000 force /* ASM SERVER */
ARB0 started with pid=21, OS id=5762
NOTE: assigning ARB0 to group 2/0xb71d3834 (PLAY) with 1 parallel I/O
SUCCESS: PST-initiated drop disk in group 2(3072145460))
NOTE: F1X0 copy 1 relocating from 0:2 to 2:2 for diskgroup 2 (PLAY)
NOTE: F1X0 copy 3 relocating from 2:2 to 65534:4294967294 for diskgroup 2 (PLAY)
NOTE: Attempting voting file refresh on diskgroup PLAY
...
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 2/0xb71d3834 (PLAY)
...
SUCCESS: grp 2 disk _DROPPED_0000_PLAY going offline

Interestingly, ASM used the force option with the DROP DISK operation. More on that later.

In a clustered environment, the mount force operation fails if the ASM instance is not the first to mount the disk group.

The disk group mount force behavior changed in ASM version 11.2.0.3. On Exadata and Oracle Database Appliance, a disk group mount succeeds without the force option, as long as the result leaves more than one failgroup for a normal redundancy disk group, or more than two failgroups for a high redundancy disk group.

It is important to understand that this discussion only applies to normal and high redundancy disk groups. An external redundancy disk group cannot be mounted if it has missing disks.

Disk force

The CREATE DISKGROUP command does not have the force option. But if I am creating a disk group with disks that are not CANDIDATE, PROVISIONED or FORMER, I have to add force next to the disk name. Here is an example.

SQL> create diskgroup PLAY disk '/dev/ASMPLAY01','/dev/ASMPLAY02','/dev/ASMPLAY03';
create diskgroup PLAY disk '/dev/ASMPLAY01','/dev/ASMPLAY02','/dev/ASMPLAY03'
*
ERROR at line 1:
ORA-15018: diskgroup cannot be created
ORA-15033: disk '/dev/ASMPLAY01' belongs to diskgroup "PLAY"

SQL> select disk_number, path, header_status from v$asm_disk where path like '%PLAY%';

DISK_NUMBER PATH             HEADER_STATUS
----------- ---------------- ----------------
          0 /dev/ASMPLAY01   MEMBER
          2 /dev/ASMPLAY02   FORMER
          1 /dev/ASMPLAY03   FORMER

SQL>

If I am 100% sure that it is safe to (re)use disk '/dev/ASMPLAY01', I can specify the force option for that disk in my CREATE DISKGROUP statement:

SQL> create diskgroup PLAY disk
'/dev/ASMPLAY01' FORCE,
'/dev/ASMPLAY02',
'/dev/ASMPLAY03';

Diskgroup created.

Let me say that again. My confidence that the disk can be reused has to be 100%. Anything less is not acceptable, as I will be destroying the content on that disk and taking it away from the disk group it belongs to.

The same applies to an ADD DISK operation of the ALTER DISKGROUP command. If the disk to be added to a disk group is not CANDIDATE, PROVISIONED or FORMER, I have to specify force next to the disk name.

This behavior has an interesting and time-consuming consequence. The other day I had to recreate a disk group in an Exadata environment. As it was a full rack, I had 168 disks for that disk group. That would normally be a trivial operation with a create disk group statement like this:

create diskgroup RECO
disk 'o/*/RECO*'
attribute
'compatible.asm'='11.2.0.0.0',
'compatible.rdbms'='11.2.0.0.0',
'au_size'='4M',
'cell.smart_scan_capable'='TRUE';

For reasons beyond the scope of this post, some disks had the header marked as MEMBER and some as FORMER. So I had to compile a complete list of MEMBER disks and then specify every single disk in the CREATE DISKGROUP statement, making sure to specify FORCE next to each MEMBER disk and not to specify anything next to any FORMER disk. The create disk group statement then looked like this:

create diskgroup RECO disk
'o/192.168.10.1/RECO_CD_00_exacel01',
'o/192.168.10.1/RECO_CD_01_exacel01',
'o/192.168.10.1/RECO_CD_02_exacel01',
'o/192.168.10.1/RECO_CD_03_exacel01',
'o/192.168.10.1/RECO_CD_04_exacel01' FORCE,
'o/192.168.10.1/RECO_CD_05_exacel01',
'o/192.168.10.1/RECO_CD_06_exacel01',
'o/192.168.10.1/RECO_CD_07_exacel01',
'o/192.168.10.1/RECO_CD_08_exacel01',
'o/192.168.10.1/RECO_CD_09_exacel01',
'o/192.168.10.1/RECO_CD_10_exacel01',
'o/192.168.10.1/RECO_CD_11_exacel01' FORCE,
'o/192.168.10.2/RECO_CD_00_exacel02',
'o/192.168.10.2/RECO_CD_01_exacel02',
'o/192.168.10.2/RECO_CD_02_exacel02',
'o/192.168.10.2/RECO_CD_03_exacel02' FORCE,
'o/192.168.10.2/RECO_CD_04_exacel02' FORCE,
'o/192.168.10.2/RECO_CD_05_exacel02',
'o/192.168.10.2/RECO_CD_06_exacel02',
'o/192.168.10.2/RECO_CD_07_exacel02' FORCE,
'o/192.168.10.2/RECO_CD_08_exacel02',
'o/192.168.10.2/RECO_CD_09_exacel02',
'o/192.168.10.2/RECO_CD_10_exacel02',
'o/192.168.10.2/RECO_CD_11_exacel02',
'o/192.168.10.3/RECO_CD_00_exacel03',
'o/192.168.10.3/RECO_CD_01_exacel03',
'o/192.168.10.3/RECO_CD_02_exacel03' FORCE,
'o/192.168.10.3/RECO_CD_03_exacel03',
...
'o/192.168.10.14/RECO_CD_11_exacel14'
attribute
'compatible.asm'='11.2.0.0.0',
'compatible.rdbms'='11.2.0.0.0',
'au_size'='4M',
'cell.smart_scan_capable'='TRUE';
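Typing 168 disk names by hand is error prone, so a statement like this is better generated. Below is a minimal Python sketch, assuming the (path, header_status) pairs have already been selected from v$asm_disk. The function name `build_create_diskgroup` and its parameters are hypothetical. Raising an error for any status other than MEMBER, CANDIDATE, PROVISIONED or FORMER is my own cautious choice, not an ASM rule.

```python
def build_create_diskgroup(name, disks, attributes=None):
    """Build a CREATE DISKGROUP statement, adding FORCE only to MEMBER disks.

    name:       disk group name
    disks:      iterable of (path, header_status) pairs, e.g. from v$asm_disk
    attributes: optional dict of disk group attributes
    """
    reusable = {"CANDIDATE", "PROVISIONED", "FORMER"}
    clauses = []
    for path, header_status in disks:
        status = header_status.upper()
        if status in reusable:
            clauses.append(f"'{path}'")
        elif status == "MEMBER":
            clauses.append(f"'{path}' FORCE")
        else:
            # Assumption: anything else deserves a human look before reuse.
            raise ValueError(f"unexpected header status {header_status!r} for {path}")

    lines = [f"create diskgroup {name} disk", ",\n".join(clauses)]
    if attributes:
        lines.append("attribute")
        lines.append(",\n".join(f"'{k}'='{v}'" for k, v in attributes.items()))
    return "\n".join(lines) + ";"


# Two of the disks from the statement above.
disks = [
    ("o/192.168.10.1/RECO_CD_03_exacel01", "FORMER"),
    ("o/192.168.10.1/RECO_CD_04_exacel01", "MEMBER"),
]
print(build_create_diskgroup("RECO", disks, {"au_size": "4M"}))
```

The generated statement still has to be reviewed before it is run. The script only saves typing; it does not decide whether a MEMBER disk is safe to reuse.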

Forcing disk drop

As we have seen in the ASM alert log above, a forced disk drop is required when the disk fails or is not accessible by ASM for any reason.

When we issue the ALTER DISKGROUP ... DROP DISK command (without the FORCE option), ASM moves the data from the disk to be dropped to the remaining disks in the disk group. It then marks the disk as FORMER, updates the Partnership and Status Table (PST) and drops the disk.

If ASM cannot access the disk to be dropped, for whatever reason, we have to use DROP DISK FORCE. In that case ASM has to copy the data from the mirror copies on the partner disks. Once the data redundancy has been re-established, ASM simply updates the PST to say that the disk is no longer a member of that disk group. As ASM cannot access the disk, it is not able to mark its disk header as FORMER.
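The choice between the two forms of the command can be sketched in a few lines of Python. This is only an illustration: the function name `build_drop_disk` and the `accessible` flag are hypothetical, and deciding whether a disk is really inaccessible is left to the administrator.

```python
def build_drop_disk(diskgroup, disk_name, accessible):
    """Build an ALTER DISKGROUP ... DROP DISK statement.

    accessible: True if ASM can still read the disk (hypothetical flag
                supplied by the caller). FORCE is added only when it cannot.
    """
    statement = f"alter diskgroup {diskgroup} drop disk {disk_name}"
    if not accessible:
        statement += " force"
    return statement + ";"


print(build_drop_disk("PLAY", "PLAY_0000", accessible=False))
print(build_drop_disk("PLAY", "PLAY_0001", accessible=True))
```

The first call produces the same statement that ASM issued for itself in the alert log above, apart from the comment ASM added.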

Forcing disk group drop

To drop a disk group I have to mount it first. If I cannot mount a disk group, but must drop it, I can use the force option of the DROP DISKGROUP statement, like this:

SQL> drop diskgroup PLAY force including contents;

Diskgroup dropped.

If ASM determines that the disk group is mounted anywhere (in the clustered environment), this operation fails.

Forcing disk group dismount

ASM does not allow a disk group to be dismounted if it's still being accessed. But I can force the disk group dismount even if some files in the disk group are open. Here is an example:

SQL> alter diskgroup PLAY dismount;
alter diskgroup PLAY dismount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15027: active use of diskgroup "PLAY" precludes its dismount

Yes, a database is using the disk group:

SQL> select group_number, db_name, status from v$asm_client;

GROUP_NUMBER DB_NAME  STATUS
------------ -------- ------------
           1 BR       CONNECTED
           2 BR       CONNECTED

But I am not very considerate today, so I will dismount the disk group anyway:

SQL> alter diskgroup PLAY dismount force;

Diskgroup altered.

Note that the forced disk group dismount will cause that database's datafiles in the disk group to go offline, which means they will need recovery (and restore if I drop disk group PLAY).

Undrop disks

The UNDROP DISKS clause of the ALTER DISKGROUP statement cancels all pending drops of disks within disk groups. But UNDROP DISKS cannot be used to restore disks that are being dropped as the result of a DROP DISKGROUP statement, or disks that are being dropped with the force clause.
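The rule can be expressed as a simple filter. The Python sketch below is a toy model, assuming I track pending drops myself: the `PendingDrop` class and its fields are hypothetical and do not correspond to any ASM view.

```python
from dataclasses import dataclass


@dataclass
class PendingDrop:
    disk_name: str
    forced: bool = False               # dropped with the force clause
    from_drop_diskgroup: bool = False  # dropped as part of DROP DISKGROUP


def cancellable_drops(pending):
    """Return the names of disks whose pending drop UNDROP DISKS would cancel."""
    return [
        d.disk_name
        for d in pending
        if not d.forced and not d.from_drop_diskgroup
    ]


pending = [
    PendingDrop("PLAY_0001"),
    PendingDrop("PLAY_0000", forced=True),
    PendingDrop("RECO_0003", from_drop_diskgroup=True),
]
print(cancellable_drops(pending))
```

Only PLAY_0001 qualifies here. The other two disks are gone for good, which is one more reason to think twice before typing force.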

Command line force

The equivalent of the force option in asmcmd is the -f flag on the command line and the FORCE keyword in the XML configuration file.

asmcmd has an additional feature relevant to this discussion. The asmcmd lsdsk command with the -M flag displays the disks that are visible to some but not all active instances, as explained by asmcmd itself:

$ asmcmd help lsdsk
lsdsk

List Oracle ASM disks.

lsdsk [-kptgMI][-G diskgroup ] [--suppressheader] [ --member|--candidate] [--discovery][--statistics][pattern]

The options for the lsdsk command are described below.
...

-M - Displays the disks that are visible to some but not all active instances. These are disks that, if included in a disk group, will cause the mount of that disk group to fail on the instances where the disks are not visible.
...
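What "visible to some but not all active instances" means can be shown with plain set arithmetic. The Python sketch below is not how asmcmd implements -M. The function name `partially_visible_disks` and the instance-to-disks mapping are assumptions for illustration.

```python
def partially_visible_disks(visibility):
    """Return the disks seen by at least one instance but not by all of them.

    visibility: dict mapping an instance name to the set of disk paths
                that instance discovers
    """
    views = [set(disks) for disks in visibility.values()]
    if not views:
        return set()
    seen_by_any = set().union(*views)
    seen_by_all = set.intersection(*views)
    return seen_by_any - seen_by_all


# Hypothetical two-node cluster where the second node cannot see one disk.
visibility = {
    "+ASM1": {"/dev/ASMPLAY01", "/dev/ASMPLAY02", "/dev/ASMPLAY03"},
    "+ASM2": {"/dev/ASMPLAY01", "/dev/ASMPLAY02"},
}
print(partially_visible_disks(visibility))
```

In this example /dev/ASMPLAY03 is reported. If it were part of a disk group, that is the disk that would stop the disk group from mounting on the second node.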

Conclusion

It is important to understand the power of the force and use it wisely.