The views expressed on this blog are my own and do not necessarily reflect the views of Oracle

January 4, 2012

ASM file number 4


Some long-running ASM operations, like rebalance, drop disk, create/delete/resize file, cannot be described by a single record in the active change directory. For those operations ASM uses the Continuing Operations Directory (COD) - ASM file number 4. There is one COD per disk group.

If a process performing the long-running operation dies before completing it, a recovery process will look at the entry and either complete or roll back the operation. There are two types of continuing operations - background and rollback.

Background operation

A background operation is performed by an ASM instance background process. It is done as part of disk group maintenance and it continues until it is either completed or the ASM instance dies. If the instance dies, then the recovering instance needs to resume the background operation. Disk group rebalance is the best example of a background operation.

Let's query X$KFFXP to find the COD allocation units for disk group 3 (group_kffxp=3). COD is ASM file number 4, hence number_kffxp=4 in our query:

SQL> SELECT x.xnum_kffxp "Extent",
x.au_kffxp "AU",
x.disk_kffxp "Disk #",
d.name "Disk name"
FROM x$kffxp x, v$asm_disk_stat d
WHERE x.group_kffxp=d.group_number
and x.disk_kffxp=d.disk_number
and x.group_kffxp=3
and x.number_kffxp=4
ORDER BY 1, 2;

    Extent         AU     Disk # Disk name
---------- ---------- ---------- ------------------------------
         0          8          0 ASMDISK5

This is telling us the COD is in allocation unit 8 on disk ASMDISK5. Let's have a closer look (note the AU size of 4 MB for this disk group):

$ kfed read /dev/oracleasm/disks/ASMDISK5 ausz=4m aun=8 blkn=0 | more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            9 ; 0x002: KFBTYP_COD_BGO
...
kfrcbg.size:                          0 ; 0x000: 0x0000
kfrcbg.op:                            0 ; 0x002: 0x0000
kfrcbg.inum:                          0 ; 0x004: 0x00000000
kfrcbg.iser:                          0 ; 0x008: 0x00000000

This shows the COD block for a background operation (kfbh.type=KFBTYP_COD_BGO) with not much happening at the moment - all kfrcbg fields are 0. Most notably, the operation code (kfrcbg.op) is 0, which means there are no background operations in progress. An op code of 1 would indicate a disk rebalance in progress.
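The same check can be scripted against saved kfed output. The following is a minimal sketch, not an Oracle tool: the helper names parse_kfed and background_op_in_progress are made up for illustration, and the regular expression assumes the "name: value ; offset: hex" line layout seen in the output above.

```python
import re

# Matches kfed lines such as "kfrcbg.op:   0 ; 0x002: 0x0000".
# Assumption: every field line has the form "name: decimal ; offset: hex".
KFED_LINE = re.compile(r"^(\S+):\s+(\d+)\s*;")

def parse_kfed(text):
    """Return {field name: decimal value} for each recognised kfed line."""
    fields = {}
    for line in text.splitlines():
        match = KFED_LINE.match(line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

def background_op_in_progress(fields):
    """True when kfrcbg.op is non-zero, i.e. a background operation is recorded."""
    return fields.get("kfrcbg.op", 0) != 0

# Sample text copied from the kfed output above (elided lines left out).
sample = """\
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            9 ; 0x002: KFBTYP_COD_BGO
kfrcbg.size:                          0 ; 0x000: 0x0000
kfrcbg.op:                            0 ; 0x002: 0x0000
kfrcbg.inum:                          0 ; 0x004: 0x00000000
kfrcbg.iser:                          0 ; 0x008: 0x00000000
"""

fields = parse_kfed(sample)
print(background_op_in_progress(fields))
```

For the block shown above this reports no background operation, matching the manual reading of kfrcbg.op.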

Rollback operation

A rollback operation is similar to a database transaction. It is started at the request of an ASM foreground process. To begin a rollback operation, a slot must be found in the rollback directory – block 1 of the continuing operations directory. If all slots are busy, the operation sleeps until one is free. During the operation the disk group is in an inconsistent state. The operation needs to either complete or roll back all its changes to the disk group. The foreground process is usually performing the operation on behalf of a database instance. If the database instance dies, the ASM foreground process dies, or an unrecoverable error occurs, then the operation must be terminated.

Creating a file is a good example of a rollback operation. If an error occurs while allocating the space for the file, then the partially created file must be deleted. If the database instance does not commit the creation, the file must be automatically deleted. If the ASM instance dies then this must be done by the recovering instance.

Let's have a look at block 1 of the COD:

$ kfed read /dev/oracleasm/disks/ASMDISK5 ausz=4m aun=8 blkn=1 | more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           15 ; 0x002: KFBTYP_COD_RBO
...
kfrcrb10[0].opcode:                   1 ; 0x000: 0x0001
kfrcrb10[0].inum:                     1 ; 0x002: 0x0001
kfrcrb10[0].iser:                     1 ; 0x004: 0x00000001
kfrcrb10[0].pnum:                    18 ; 0x008: 0x00000012
kfrcrb10[1].opcode:                   0 ; 0x00c: 0x0000
kfrcrb10[1].inum:                     0 ; 0x00e: 0x0000
kfrcrb10[1].iser:                     0 ; 0x010: 0x00000000
kfrcrb10[1].pnum:                     0 ; 0x014: 0x00000000
...

Fields kfrcrb10[i] track the active rollback operations. We see that there is one operation in progress (the kfrcrb10[0] fields have non-zero values), and from the opcode list we know this is a file create operation. The value kfrcrb10[0].inum=1 means that the operation is running in ASM instance 1.
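The offsets printed by kfed suggest that each slot occupies 12 bytes: opcode at 0x000, inum at 0x002, iser at 0x004 and pnum at 0x008, with the next slot starting at 0x00c. The sketch below unpacks slots under that assumption. The field widths are inferred from the offsets only, little-endian byte order is assumed, and the test bytes are built by hand rather than read from a real disk.

```python
import struct

# Assumed slot layout, inferred from the kfed offsets above:
# opcode (2 bytes), inum (2 bytes), iser (4 bytes), pnum (4 bytes).
# "<" selects little-endian with no padding, giving 12 bytes per slot.
SLOT = struct.Struct("<HHII")
SLOT_FIELDS = ("opcode", "inum", "iser", "pnum")

def unpack_slots(raw):
    """Split a byte string into consecutive 12-byte slots and decode each one."""
    slots = []
    for offset in range(0, len(raw) - SLOT.size + 1, SLOT.size):
        values = SLOT.unpack_from(raw, offset)
        slots.append(dict(zip(SLOT_FIELDS, values)))
    return slots

# Hand-built bytes mirroring kfrcrb10[0] and kfrcrb10[1] from the output above.
raw = SLOT.pack(1, 1, 1, 18) + SLOT.pack(0, 0, 0, 0)
slots = unpack_slots(raw)
print(slots[0])
```

The second slot decodes to all zeros, which corresponds to the unused kfrcrb10[1] entry in the kfed output.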

The rollback operation opcodes are:

1 - Create a file
2 - Delete a file
3 - Resize a file
4 - Drop alias entry
5 - Rename alias entry
6 - Rebalance space COD
7 - Drop disks force
8 - Attribute drop
9 - Disk Resync
10 - Disk Repair Time
11 - Volume create
12 - Volume delete
13 - Attribute directory creation
14 - Set zone attributes
15 - User drop
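When reading many slots it can help to translate the numbers automatically. Here is a small lookup sketch built from the list above. The function name describe_slot is hypothetical, and treating opcode 0 as a free slot is an assumption based on the all-zero kfrcrb10[1] entry shown earlier.

```python
# Opcode names copied from the list above.
ROLLBACK_OPCODES = {
    1: "Create a file",
    2: "Delete a file",
    3: "Resize a file",
    4: "Drop alias entry",
    5: "Rename alias entry",
    6: "Rebalance space COD",
    7: "Drop disks force",
    8: "Attribute drop",
    9: "Disk Resync",
    10: "Disk Repair Time",
    11: "Volume create",
    12: "Volume delete",
    13: "Attribute directory creation",
    14: "Set zone attributes",
    15: "User drop",
}

def describe_slot(opcode, inum):
    """Return a readable description of one rollback directory slot."""
    if opcode == 0:
        # Assumption: an opcode of 0 marks a slot that is not in use.
        return "free slot"
    name = ROLLBACK_OPCODES.get(opcode, f"unknown opcode {opcode}")
    return f"{name} (ASM instance {inum})"

# Values taken from kfrcrb10[0] in the kfed output above.
print(describe_slot(1, 1))
```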

Conclusion

ASM file number 4 - the continuing operations directory (COD) - keeps track of long-running ASM operations. In case of any problems, COD entries can be used to either continue or terminate and roll back the operation. The operation cleanup is performed by another ASM instance (in a cluster environment) or by the same ASM instance - usually after the instance restart.