Some long-running ASM operations, such as a disk group rebalance, dropping a disk, or creating, deleting or resizing a file, cannot be described by a single record in the ASM active change directory (ACD). Those operations are tracked in the ASM continuing operations directory (COD), which is ASM file number 4. There is one COD per disk group.
If the process performing a long-running operation dies before completing it, a recovery process looks at the COD entry and either completes the operation or rolls it back. There are two types of continuing operations: background and rollback.
Background operation
A background operation is performed by an ASM instance background process. It is done as part of disk group maintenance, and it continues until it either completes or the ASM instance dies. If the instance dies, the recovering instance needs to resume the background operation. The disk group rebalance is the best example of a background operation.
Let's query the X$KFFXP view to find the COD allocation units for disk group 3 (group_kffxp=3). COD is ASM file number 4, hence number_kffxp=4 in the query:
SQL> SELECT x.xnum_kffxp "Extent",
x.au_kffxp "AU",
x.disk_kffxp "Disk #",
d.name "Disk name"
FROM x$kffxp x, v$asm_disk_stat d
WHERE x.group_kffxp=d.group_number
and x.disk_kffxp=d.disk_number
and x.group_kffxp=3
and x.number_kffxp=4
ORDER BY 1, 2;
Extent AU Disk # Disk name
---------- ---------- ---------- ------------------------------
0 8 0 ASMDISK5
SQL>
This tells us that the COD is in allocation unit 8 on disk ASMDISK5. Let's have a closer look (note the AU size of 4 MB for this disk group):
$ kfed read /dev/oracleasm/disks/ASMDISK5 ausz=4m aun=8 blkn=0 | more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 9 ; 0x002: KFBTYP_COD_BGO
...
kfrcbg.size: 0 ; 0x000: 0x0000
kfrcbg.op: 0 ; 0x002: 0x0000
kfrcbg.inum: 0 ; 0x004: 0x00000000
kfrcbg.iser: 0 ; 0x008: 0x00000000
$
This is the COD block for background operations (kfbh.type=KFBTYP_COD_BGO), and not much is happening at the moment: all kfrcbg fields are 0. Most notably, the operation code (kfrcbg.op) is 0, which means that there are no active background operations. An op code of 1 would indicate an active disk rebalance operation.
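Because kfed prints one `name: value ; offset: raw` line per field, checking a COD block can be scripted. The following is a minimal Python sketch, assuming exactly the layout shown in the dump above (other kfed versions may format differently). `parse_kfed` and `describe_background` are hypothetical helpers, not Oracle tools, and only op codes 0 and 1 are mapped because those are the only ones described here.

```python
import re

# Matches kfed lines such as "kfrcbg.op: 0 ; 0x002: 0x0000"
# and captures the field name and its decimal value.
KFED_LINE = re.compile(r"^\s*([\w.\[\]]+):\s+(\S+)\s*;")

def parse_kfed(text):
    """Return a dict of field name -> value string for every kfed field line."""
    fields = {}
    for line in text.splitlines():
        m = KFED_LINE.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields

# Only the two op codes mentioned in the text; anything else is reported as unknown.
BGO_OPS = {0: "no active background operation", 1: "disk rebalance"}

def describe_background(fields):
    op = int(fields["kfrcbg.op"])
    return BGO_OPS.get(op, f"unknown background op {op}")

sample = """\
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 9 ; 0x002: KFBTYP_COD_BGO
kfrcbg.size: 0 ; 0x000: 0x0000
kfrcbg.op: 0 ; 0x002: 0x0000
kfrcbg.inum: 0 ; 0x004: 0x00000000
kfrcbg.iser: 0 ; 0x008: 0x00000000
"""

fields = parse_kfed(sample)
print(fields["kfbh.type"])          # 9
print(describe_background(fields))  # no active background operation
```

In practice the `sample` string would be replaced by the captured output of the `kfed read ... blkn=0` command shown above.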
Rollback operation
A rollback operation is similar to a database transaction. It is started at the request of an ASM foreground process. To begin a rollback operation, a slot must be found in the rollback directory, which is block 1 of the ASM continuing operations directory. If all slots are busy, the operation sleeps until one is free. During the operation the disk group is in an inconsistent state, so the operation needs to either complete or roll back all its changes to the disk group. The foreground is usually performing the operation on behalf of a database instance. If the database instance dies, the ASM foreground process dies, or an unrecoverable error occurs, then the operation must be terminated.
Creating a file is a good example of a rollback operation. If an error occurs while allocating space for the file, the partially created file must be deleted. If the database instance does not commit the file creation, the file must be deleted automatically. If the ASM instance dies, this must be done by the recovering instance.
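The slot rule for rollback operations behaves like a bounded pool guarded by a condition variable: claim a free slot or sleep until one is released. The Python sketch below is a toy model of that rule only, not ASM's implementation. `RollbackDirectory` is a hypothetical class, the slot count is a parameter because the real number of slots in COD block 1 is not given here, and each slot simply stores the operation's opcode.

```python
import threading

class RollbackDirectory:
    """Toy model of the COD rollback directory: a fixed number of slots.

    Hypothetical: the slot count is a parameter, not the real ASM value.
    """

    def __init__(self, nslots):
        self._slots = [None] * nslots  # None marks a free slot
        self._cond = threading.Condition()

    def begin(self, opcode):
        """Claim a free slot for a rollback operation, sleeping while all are busy."""
        with self._cond:
            while None not in self._slots:
                self._cond.wait()
            idx = self._slots.index(None)
            self._slots[idx] = opcode
            return idx

    def end(self, idx):
        """Free the slot once the operation has completed or been rolled back."""
        with self._cond:
            self._slots[idx] = None
            self._cond.notify()  # one slot freed, so wake one sleeping operation

    def active(self):
        """List (slot, opcode) pairs for the operations currently in progress."""
        with self._cond:
            return [(i, op) for i, op in enumerate(self._slots) if op is not None]

rbd = RollbackDirectory(nslots=2)
a = rbd.begin(1)  # create a file
b = rbd.begin(3)  # resize a file
# A third begin() would now block until end(a) or end(b) is called.
rbd.end(a)
c = rbd.begin(2)  # delete a file, reusing the slot freed by end(a)
print(c, rbd.active())  # 0 [(0, 2), (1, 3)]
```

A condition variable is used rather than polling so that a waiting operation sleeps until a slot is actually released, which matches the "sleeps until one is free" behavior described above.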
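Since the rollback directory is block 1 of the COD, its position on the disk follows from the extent map queried earlier: COD extent 0 is in AU 8 on ASMDISK5, and this disk group uses a 4 MB AU size. The short Python calculation below assumes a 4 KB ASM metadata block size; `cod_block_offset` is a hypothetical helper used only to show the arithmetic behind the `aun=8 blkn=...` arguments passed to kfed.

```python
MiB = 1024 * 1024
ASM_BLOCK_SIZE = 4096  # assumed ASM metadata block size (4 KB)

def cod_block_offset(au_number, au_size, block_number, block_size=ASM_BLOCK_SIZE):
    """Byte offset on the disk of block `block_number` inside allocation unit `au_number`."""
    return au_number * au_size + block_number * block_size

# COD extent 0 is in AU 8 on ASMDISK5; the AU size is 4 MB.
bgo = cod_block_offset(8, 4 * MiB, 0)  # block 0: background operations
rbo = cod_block_offset(8, 4 * MiB, 1)  # block 1: rollback directory
print(bgo, rbo)  # 33554432 33558528
```

Counted in 4 KB units, block 1 of the COD on this disk is therefore block 8193 (8 × 1024 + 1).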
Let's have a look at block 1 of the COD:
$ kfed read /dev/oracleasm/disks/ASMDISK5 ausz=4m aun=8 blkn=1 | more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 15 ; 0x002: KFBTYP_COD_RBO
...
kfrcrb10[0].opcode: 1 ; 0x000: 0x0001
kfrcrb10[0].inum: 1 ; 0x002: 0x0001
kfrcrb10[0].iser: 1 ; 0x004: 0x00000001
kfrcrb10[0].pnum: 18 ; 0x008: 0x00000012
kfrcrb10[1].opcode: 0 ; 0x00c: 0x0000
kfrcrb10[1].inum: 0 ; 0x00e: 0x0000
kfrcrb10[1].iser: 0 ; 0x010: 0x00000000
kfrcrb10[1].pnum: 0 ; 0x014: 0x00000000
...
$
Fields kfrcrb10[i] track the active rollback operations. There is one operation in progress (kfrcrb10[0] has non-zero values), and the opcode list below tells us that opcode 1 is a file create operation. The value kfrcrb10[0].inum=1 means that the operation is running on ASM instance 1.
The rollback operation opcodes are:
1 - Create a file
2 - Delete a file
3 - Resize a file
4 - Drop alias entry
5 - Rename alias entry
6 - Rebalance space COD
7 - Drop disks force
8 - Attribute drop
9 - Disk Resync
10 - Disk Repair Time
11 - Volume create
12 - Volume delete
13 - Attribute directory creation
14 - Set zone attributes
15 - User drop
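With the opcode list above, the rollback directory dump can be decoded mechanically. The Python sketch below assumes the `kfrcrb10[i].field: value` layout shown in the block 1 dump and treats opcode 0 as a free slot, as slot 1 in that dump suggests. `active_rollback_ops` is a hypothetical helper, not an Oracle tool.

```python
import re

# Rollback operation opcodes, copied from the list above.
RBO_OPCODES = {
    1: "Create a file", 2: "Delete a file", 3: "Resize a file",
    4: "Drop alias entry", 5: "Rename alias entry", 6: "Rebalance space COD",
    7: "Drop disks force", 8: "Attribute drop", 9: "Disk Resync",
    10: "Disk Repair Time", 11: "Volume create", 12: "Volume delete",
    13: "Attribute directory creation", 14: "Set zone attributes", 15: "User drop",
}

# Matches lines such as "kfrcrb10[0].opcode: 1 ; 0x000: 0x0001"
SLOT_LINE = re.compile(r"^\s*kfrcrb10\[(\d+)\]\.(\w+):\s+(\d+)\s*;")

def active_rollback_ops(text):
    """Return (slot, operation name, instance number) for every in-use slot."""
    slots = {}
    for line in text.splitlines():
        m = SLOT_LINE.match(line)
        if m:
            idx, field, value = int(m.group(1)), m.group(2), int(m.group(3))
            slots.setdefault(idx, {})[field] = value
    active = []
    for idx in sorted(slots):
        s = slots[idx]
        op = s.get("opcode", 0)
        if op != 0:  # opcode 0 is treated as a free slot
            name = RBO_OPCODES.get(op, f"unknown opcode {op}")
            active.append((idx, name, s.get("inum")))
    return active

sample = """\
kfrcrb10[0].opcode: 1 ; 0x000: 0x0001
kfrcrb10[0].inum: 1 ; 0x002: 0x0001
kfrcrb10[0].iser: 1 ; 0x004: 0x00000001
kfrcrb10[0].pnum: 18 ; 0x008: 0x00000012
kfrcrb10[1].opcode: 0 ; 0x00c: 0x0000
kfrcrb10[1].inum: 0 ; 0x00e: 0x0000
kfrcrb10[1].iser: 0 ; 0x010: 0x00000000
kfrcrb10[1].pnum: 0 ; 0x014: 0x00000000
"""

print(active_rollback_ops(sample))  # [(0, 'Create a file', 1)]
```

The result matches the reading of the dump: a file create operation in slot 0, running on ASM instance 1.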
Conclusion
The ASM continuing operations directory (COD) keeps track of long-running ASM operations. If something goes wrong, the COD entries are used to either continue or roll back the operation. The cleanup is performed by another ASM instance (in a cluster environment) or by the same ASM instance, usually after the instance restarts.