April 26, 2010

kfed - ASM metadata editor


The kfed is an undocumented ASM utility that can be used to read and modify ASM metadata blocks. It is a standalone utility, independent of the ASM instance, so it can be used with either mounted or dismounted disk groups. Its most powerful feature is the ability to fix corrupt ASM metadata.

The kfed binary is present in the recent ASM versions, but if you don't see it in your $ORACLE_HOME/bin directory (e.g. it may not be present in version 10.1), it can be built as follows:

$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins* ikfed

kfed read

With the kfed read command we can read a single ASM metadata block. The syntax is:

$ kfed read [aun=ii aus=jj blkn=kk dev=]asm_disk_name

Where the command line parameters are
  • aun - Allocation Unit (AU) number to read from. Default is AU0, or the very beginning of the ASM disk.
  • aus - AU size. Default is 1048576 (1MB). Specify the aus when reading from a disk group with non-default AU size.
  • blkn - block number to read. Default is block 0, or the very first block of the AU.
  • dev - ASM disk or device name. Note that the keyword dev can be omitted, but the ASM disk name is mandatory.

Use kfed to read ASM disk header block

The following is an example of using the kfed utility to read the ASM disk header from ASM disk /dev/sda1.

$ kfed read /dev/sda1 | more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3102721733 ; 0x00c: 0xb8efc6c5
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
...
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:               DATA_0000 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0000 ; 0x068: length=9
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize:                   12284 ; 0x0c4: 0x00002ffc
...

Note that the above kfed command is equivalent to this one (with all parameters explicitly set to their default values):

$ kfed read aun=0 aus=1048576 blkn=0 dev=/dev/sda1

We see that the above kfed output is nicely formatted and human readable (sort of). The fields are grouped based on the actual content of the ASM metadata block.

In this example, the kfbh fields show the block header data, and the most important one is kfbh.type, which says KFBTYP_DISKHEAD, meaning the ASM disk header. This is the expected block type for an ASM disk header.

We then see the actual content of the ASM disk header metadata block - the kfdhdb fields. Some of those are the disk number (kfdhdb.dsknum), 0 in this case, the group redundancy type (kfdhdb.grptyp), normal redundancy in this case, the disk header status (kfdhdb.hdrsts), member in this case, the disk name (kfdhdb.dskname) - DATA_0000, etc.

Please see ASM disk header for the complete explanation of kfdhdb fields.
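
As an aside on the read parameters, the following is a minimal sketch of how aun, aus and blkn translate into a byte offset on the disk. It assumes 4096-byte ASM metadata blocks (256 blocks in a 1MB AU). The function name kfed_block_offset is made up for this illustration and is not part of kfed.

```shell
#!/bin/sh
# Hypothetical helper (not part of kfed): byte offset of a metadata block.
# Assumption: ASM metadata blocks are 4096 bytes each.
kfed_block_offset() {
    off_aun=${1:-0}          # allocation unit number, default AU0
    off_blkn=${2:-0}         # block number within the AU, default block 0
    off_aus=${3:-1048576}    # AU size in bytes, default 1MB
    echo $(( off_aun * off_aus + off_blkn * 4096 ))
}

kfed_block_offset 0 0     # the disk header: offset 0
kfed_block_offset 10 1    # block 1 of AU10 with the default 1MB AU size
```

With all defaults the offset is 0, which is why a plain "kfed read /dev/sda1" lands on the disk header.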

Use kfed to read any ASM metadata block

The next example shows how to read an ASM File Directory block. To do that we would use the following kfed command:

$ kfed read aun=10 blkn=1 dev=/dev/sda1 | more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR
...

Note that I had to specify AU10 and block 1 to read a File Directory block. Have a look at the ASM File Directory post to learn how to locate a File Directory block.

Is my ASM metadata block corrupt

If you see kfbh.type=KFBTYP_INVALID in the header of a disk you believe belongs to an ASM disk group, that indicates that the ASM disk header is corrupt. But don't jump to conclusions! Are you looking at the right disk? Is this the right disk partition? Can you access that disk via some other name - in a multipath setup? If you are not sure, or if the disk header is in fact damaged, contact Oracle Support for assistance.

Note that this applies to any ASM metadata block. If ASM expects to find a metadata block and instead finds a block that is zeroed out or contains rubbish, it will report the block as KFBTYP_INVALID, and an error (usually ORA-15196) will be reported in the ASM and/or database alert log (depends on which instance discovers the problem).
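
When checking several disks or paths, it can help to pull just the block type out of the kfed read output. This is a small sketch that relies only on the output layout shown earlier in this post; the function name kfed_block_type is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the block type name (the last field of the
# kfbh.type line) from `kfed read` output supplied on standard input.
kfed_block_type() {
    awk '$1 == "kfbh.type:" { print $NF; exit }'
}

# Usage (the device path is the example disk from this post):
#   kfed read /dev/sda1 | kfed_block_type
```

For a healthy disk header this should print KFBTYP_DISKHEAD. If it prints KFBTYP_INVALID, go through the questions above before assuming corruption.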

kfed write

With the kfed write command we can write to a single ASM metadata block. The syntax is:

$ kfed write [aun=ii aus=jj blkn=kk dev=]asm_disk_name text=new_contents chksum=yes

Where the new command line parameters are
  • text - a text file with the new block contents
  • chksum=yes - calculate and write the correct checksum. Note that the checksum in the text file with the new content does not have to be correct.

Use kfed to write the correct checksum to ASM metadata block

An ASM metadata block may look fine, but in fact be corrupt. For example, the block checksum (kfbh.check) could be wrong, in which case it would need to be corrected. If the only problem is an incorrect checksum, it can be corrected simply by reading the block and then writing it back. The kfed will calculate the new checksum and write the block back with the correct checksum.

Here are the complete steps to correct the bad checksum for block 2 in AU0 on disk /dev/sda1:

$ kfed read aun=0 blkn=2 dev=/dev/sda1 > /tmp/aun0_blkn2_sda1.kfed
$ kfed write aun=0 blkn=2 dev=/dev/sda1 text=/tmp/aun0_blkn2_sda1.kfed chksum=yes

NOTE: Please seek Oracle Support assistance with any suspected ASM metadata block corruption.
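
Since kfed write changes the disk in place, one cautious variation is to take a raw copy of the block with dd before writing. The dd step is an addition of this sketch and not part of the steps above. It assumes 4096-byte metadata blocks, and the function name kfed_fix_checksum_plan is hypothetical. The function only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints, but does not run, the commands.
# Assumption: ASM metadata blocks are 4096 bytes each.
kfed_fix_checksum_plan() {
    fix_dev=$1
    fix_aun=${2:-0}
    fix_blkn=${3:-0}
    fix_aus=${4:-1048576}
    fix_tag="aun${fix_aun}_blkn${fix_blkn}_$(basename "$fix_dev")"
    # Position of the block on the disk, counted in 4096-byte blocks.
    fix_skip=$(( fix_aun * (fix_aus / 4096) + fix_blkn ))
    # Raw copy of the block, so the original bytes can be put back if needed.
    echo "dd if=$fix_dev of=/tmp/$fix_tag.raw bs=4096 skip=$fix_skip count=1"
    echo "kfed read aun=$fix_aun aus=$fix_aus blkn=$fix_blkn dev=$fix_dev > /tmp/$fix_tag.kfed"
    echo "kfed write aun=$fix_aun aus=$fix_aus blkn=$fix_blkn dev=$fix_dev text=/tmp/$fix_tag.kfed chksum=yes"
}

kfed_fix_checksum_plan /dev/sda1 0 2
```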

kfed find

The kfed find will examine all blocks in an allocation unit and report back on the block types found. The syntax is:

$ kfed find [aun=ii aus=jj dev=]asm_disk_name

We see that the find command parameters are the same as for the read command, but the difference is that the find operates on all blocks in an allocation unit.

Use kfed find command to verify blocks in AU0

This is an example of using the kfed find to verify that all blocks in AU0 have the expected ASM metadata.

$ kfed find /dev/sda1

The expected result is type 1 for block 0, type 2 for block 1 and type 3 for all other blocks, i.e.:

$ kfed find /dev/sda1
Block 0 has type 1
Block 1 has type 2
Block 2 has type 3
Block 3 has type 3
Block 4 has type 3
...
Block 255 has type 3


If you see anything else in the output, that indicates a corrupted ASM metadata block. In that case please seek assistance from Oracle Support.
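
With a few hundred lines of output, checking by eye is error prone. The following sketch reads kfed find output for AU0 on standard input and lists any block whose type differs from the expected one (type 1 for block 0, type 2 for block 1, type 3 for the rest). It relies only on the "Block N has type T" line format shown above; the function name check_au0_types is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report blocks in AU0 with an unexpected type.
# Returns non-zero if at least one unexpected block type was seen.
check_au0_types() {
    awk '
        $1 == "Block" && $3 == "has" && $4 == "type" {
            want = ($2 == 0) ? 1 : (($2 == 1) ? 2 : 3)
            if ($5 + 0 != want) {
                printf "Block %d: type %s, expected %d\n", $2, $5, want
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}

# Usage (the device path is the example disk from this post):
#   kfed find /dev/sda1 | check_au0_types && echo "AU0 block types look fine"
```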

Note that my allocation unit size is 1MB, so there are 256 blocks in the AU (numbered 0 to 255). If your allocation unit size is 4MB, the same command should return block type information for 1024 blocks.

I should also point out that with the above find command we only looked at the expected ASM metadata block types. We did not look at the actual metadata block contents. Some ASM metadata block corruptions are in the block contents, i.e. the block type is correct, but the contents are wrong. Such corruptions are only detected when ASM reads the corrupt block, in which case an ORA-15196 error will be reported. Please seek assistance from Oracle Support if you are unfortunate enough to encounter that error.

Conclusion

The kfed is an unassuming but very powerful utility. While I have shown only a few commands, the kfed can also format an empty ASM file, perform a sanity check on an ASM metadata block, display data structure sizes and perform a few other, more obscure operations.

29 comments:

  1. Where is the backup header located, which can be restored in case the disk header is corrupted?

    Replies
    1. A backup copy of ASM disk header is in the second last block of allocation unit 1. If the allocation unit (AU) size is 1 MB there will be 256 blocks per AU. In that case the copy of the disk header will be in block 254 (note that blocks go from 0 to 255) of AU 1. The command to read that block would be:

      $ kfed read aun=1 blkn=254

      If the AU size is 4 MB, there will be 1024 blocks per AU. In that case the copy of the disk header will be in block 1022 (blocks now go from 0 to 1023) of AU 1. The command to read that block would be:

      $ kfed read ausz=4194304 aun=1 blkn=1022

      Note that this time I had to specify the AU size in the kfed command as AU had a non default size.
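
The arithmetic behind those two block numbers can be sketched as follows, assuming 4096-byte metadata blocks as in the examples above. The function name backup_header_blkn is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: block number of the disk header copy in AU 1,
# i.e. the second-last 4096-byte block of the allocation unit.
backup_header_blkn() {
    bh_aus=${1:-1048576}    # AU size in bytes, default 1 MB
    echo $(( bh_aus / 4096 - 2 ))
}

backup_header_blkn            # 1 MB AU: prints 254
backup_header_blkn 4194304    # 4 MB AU: prints 1022
```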

      While we are on this topic, if the only problem is the disk header corruption, that can easily be repaired with 'kfed repair' command. I didn't give an example for that in the original post as you should really consult Oracle Support for assistance with that kind of problem.

      Cheers,
      Bane

    2. Hi Bane,

      thanks for the kfed repair command :-) Just dug me out of a hole.

      We tried using the kfed merge option based on other postings and in the end ran the repair command. This brought the ASM diskgroup online and I was able to bring up 5 of the 7 development databases. Still working on the other 2.

      Cheers

      Dave - Reading, UK

  2. Hi Dave,

    Good to hear you found the repair option useful. Let me know if you get stuck or if you have any questions on this.

    Cheers,
    Bane

  3. Hi Bane,
    I was looking for ASM stuffs to understand it more on google and found your article. It's really very informative. I can easily understand each and every step as it has been explained very clearly.

    Thanks for your article.

    Best Regards,
    Ramakant

    Replies
    1. Cool! Thanks for your kind words Ramakant.
      Cheers,
      Bane

  4. Bane,

    A few weeks back we had a production issue where blocks 40 and 41 on one of the ASM disks, in the area of the ASM allocation table, were corrupted. During a rebalance the corruption was detected and the disk group went offline. We opened an SR with Oracle and they reviewed logs and asked for DMP files, IMG and kfed output. I have two questions. First, what can you see in the kfed output that would tell you whether a kfed repair can resolve the problem? Second, have you ever come across a case where something has zeroed consecutive blocks and none of the vendors involved can find what caused the issue? We finally resolved the issue, but Oracle Support made us recreate the disk group and recover the database. Lucky for us we had a standby and good backups. If possible I would like to pick your brain some more about our issues, since no one has been able to give us a good root cause analysis.

    Replies
    1. Hi Javier,

      There are two types of repairs we can do with the kfed. The first one is the actual 'kfed repair' command, that fixes the corrupt or lost disk header block only.

      The other type of repair is manual editing of the damaged block. We basically read the damaged block and a couple of blocks around it (as the damaged block may well be all zeroed out, as it was in your case), and then see if we can repair or reconstruct the block. This type of repair is of course more challenging - you need to understand the structure of the block, you need to know what data it had and finally understand if it can be manually repaired or not.

      Now to the specific metadata block in your case - allocation table block. It's much easier to fix a partially corrupt block than a completely zeroed out block. With the completely zeroed blocks we can recreate them as empty blocks. That way they have incorrect contents, but they are valid as far as the structure goes. That allows us to mount the disk group and make sure it stays mounted. We would then attempt disk group repair (ALTER DISKGROUP REPAIR) to see if ASM can fix this (using the file directory data that we hope is not damaged). If that works we are good to go. If not, we can attempt to find what should have been in those blocks (by querying X$KFFXP, which again needs a good file directory). With that info we then attempt to recreate the blocks...

      Now, none of this is trivial and even if you know exactly what you are doing, it can take hours. The biggest challenge may be finding the person that can do this for you. And finally, there is no guarantee that the problem can be fixed. All this is on a best-effort basis, as patching is not a supported method of data recovery. I know customers expect this type of service as a matter of course, but in reality this is out of scope for Oracle Support. The only time you can insist on Oracle fixing this (or at least attempting to fix it) is when it's clear that the problem is caused by an Oracle bug...

      Back to the root cause question. Yes, I have seen this type of problem, but it is rare that we can tell with 100% confidence what caused the problem. The reason we claim it's not Oracle/ASM is because there is no routine/function that writes zeros to ASM metadata blocks. When we write empty blocks, sure they are empty, but they are formatted - they have the header/tail and the check-sum - they are never zeroed out blocks. That is our justification that this type of change came from outside.

      And yes, there is no substitute for backups. Unfortunately some people still don't appreciate that...

      Cheers,
      Bane

  5. Hi Bane,

    Is there a way to use kfed to read the entire AU at once ? (not individual blocks).

    regards,
    VK

    Replies
    1. Hi VK,
      No. That's why I used shell scripting to read multiple blocks in my examples.
      Cheers,
      Bane
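
A minimal sketch of that kind of shell scripting, for readers who want to dump a range of blocks: the function below prints one kfed read command per block, so the list can be reviewed and then piped to sh. The function name kfed_read_range_cmds is hypothetical, and the device path in the usage line is just the example disk from the post.

```shell
#!/bin/sh
# Hypothetical helper: print one `kfed read` command for each block in a range.
kfed_read_range_cmds() {
    rr_dev=$1      # ASM disk or device name
    rr_aun=$2      # allocation unit number
    rr_blkn=$3     # first block to read
    rr_last=$4     # last block to read
    while [ "$rr_blkn" -le "$rr_last" ]; do
        echo "kfed read aun=$rr_aun blkn=$rr_blkn dev=$rr_dev"
        rr_blkn=$(( rr_blkn + 1 ))
    done
}

# Usage: dump all 256 blocks of AU0 (1 MB AU size) into one file
#   kfed_read_range_cmds /dev/sda1 0 0 255 | sh > /tmp/au0_sda1.kfed
```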

  6. Okay, thanks Bane for an instant reply. I think I can give it a try using AMDU then.

    VK

  7. Hi Bane,

    I am getting some conflicting information when I query the v$asm_disk view. The header status for all 114 disks should be "Member". But with the exception of three disks, all disks appear as "Candidate".

    There are no errors in either the RDBMS or the ASM alert logs.

    The database is working fine for the time being.
    Is it a case of disk header corruption? or simple misreporting? And more importantly, how can I fix the problem?

    The details are as follows

    O/S: AIX 5.3.12.1
    Database: 11.2.0.3

    select DISK_NUMBER,HEADER_STATUS,substr(PATH,1,20),label
    from v$asm_disk;

    *Output:*

    17 CANDIDATE /dev/rhdisk10
    18 CANDIDATE /dev/rhdisk100
    19 CANDIDATE /dev/rhdisk101
    20 CANDIDATE /dev/rhdisk102
    21 CANDIDATE /dev/rhdisk103
    22 CANDIDATE /dev/rhdisk104
    23 CANDIDATE /dev/rhdisk105
    24 CANDIDATE /dev/rhdisk106
    ...
    112 MEMBER /dev/rhdisk147
    113 MEMBER /dev/rhdisk148
    114 MEMBER /dev/rhdisk149

    Replies
    1. Hmm, interesting...

      To get complete information about your setup, please connect to ASM with 'sqlplus / as sysasm' and run the following:
      spool /tmp/asm_gv.html
      set markup HTML on
      break on INST_ID on GROUP_NUMBER
      alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS';
      select SYSDATE "Date and Time" from DUAL;
      select * from GV$ASM_OPERATION order by 1;
      select * from V$ASM_DISKGROUP order by 1, 2;
      select * from V$ASM_DISK order by 1, 2, 3;
      select * from V$ASM_ATTRIBUTE where NAME not like 'template%' order by 1;
      select * from V$VERSION where BANNER like '%Database%' order by 1;
      select * from V$ASM_CLIENT order by 1, 2;
      show parameter asm
      show parameter cluster
      show parameter instance
      show parameter spfile
      show sga
      spool off
      exit

      To determine if the problem is with disk headers, please do the following:
      kfed read /dev/rhdisk10 blkn=0 > /tmp/rhdisk10.kfed
      kfed read /dev/rhdisk10 blkn=1 >> /tmp/rhdisk10.kfed
      kfed read /dev/rhdisk100 blkn=0 > /tmp/rhdisk100.kfed
      kfed read /dev/rhdisk100 blkn=1 >> /tmp/rhdisk100.kfed
      kfed read /dev/rhdisk147 blkn=0 > /tmp/rhdisk147.kfed
      kfed read /dev/rhdisk147 blkn=1 >> /tmp/rhdisk147.kfed

      That will show me block 0 (disk header) and block 1 (free space table) for those 3 disks. If the problem is with the disk header block only, this will be easy to fix.

      Now send me /tmp/asm_gv.html, /tmp/rhdisk10.kfed, /tmp/rhdisk100.kfed and /tmp/rhdisk147.kfed (email to bane dot radulovic at gmail dot com). Once we sort it out, we can post the solution here.

      Cheers,
      Bane

  8. Hi Bane,

    Went through your article and must tell you how good it is... I ran into issues with disk group corruption and had to consult Oracle Support. They told us to use KFED but I never really got what they were up to, until today.

    Keep up.

    Regards
    Hardik
    http://handsonoracle.blogspot.in/

  9. thank you for all the great info on asm and kfed. I have a question.

    I have installed 11gR2 Grid Infrastructure, and cleared one ASM disk group. Two other disk groups are still available from a 10gR2 installation of ASM. I used asmca to mount these other disk groups. All is well. Is this an okay path to using older disk groups?

    ASMCA allows me to edit the ASM compat and DB compat. kfed shows the header block to be in good shape, as well as the backup header blocks.

    Thanks, Paul

    Replies
    1. Thanks Paul,
      I am not sure how you 'cleared' the disk group. Maybe you meant to say that you created one new disk group and that you have two existing disk groups. If so, then yes - you are good to go.
      In any case, based on what you are saying, it all sounds fine.
      Cheers,
      Bane

  10. Hi Bane,

    Would it be possible to use the kfed repair command against ASM disks used for 10g ASM? Let's say they got corrupted while in ASM 10g; will the 11g kfed help to recover the metadata? Also, is there any backup of the disk header on a 10g ASM disk, maybe not a full one, but at least something to compare with?

    Thank you,
    Andrey

    Replies
    1. Hi Andrey,

      No, new version of kfed will not help, as it will not find the disk header backup. There is no such backup in 10g :(

      We might be able to repair the damage if the problem is with the disk header only. Get me the same info as in the post above, dated March 20, 2013 at 8:00 AM - the result of those queries and two kfed dumps for one good disk and two kfed dumps for the bad disk:
      kfed read [path to good disk] blkn=0 > /tmp/good.kfed
      kfed read [path to good disk] blkn=1 >> /tmp/good.kfed
      kfed read [path to bad disk] blkn=0 > /tmp/bad.kfed
      kfed read [path to bad disk] blkn=1 >> /tmp/bad.kfed

      Email those to bane.radulovic at gmail.com and I will see what can be done.

      Cheers,
      Bane

  11. Thank you Bane!
    unfortunately all disks were corrupted
    and based on metadata one of the wrong parameters was endian and it was set to 0 on Linux.

    It was changed to 1 and disk groups were mounted, but looks like corruption went beyond that
    and database open failed, had to use hidden parameters.

    Thanks very much for very useful information, I learnt a lot over weekend when was dealing with the issue.

    Cheers,
    Andrey

  12. Hello Bane,

    I have a problem where storage is taken offline and restored and for some reason, the beginning of the disk has "moved" and I have garbage characters before the ASM header. Using /sbin/parted in linux (or dd which is a long way around), I can fix the offset and kfed repair/read works. When I use kfed read on a later block, I can read the partition but not repair it with the same blkn/ausz options. I would like a way to fix a disk with garbage data before the oracle datafile header without having to fix it at the OS level. Is that possible with kfed or some other oracle utility?

    Replies
    1. No, you cannot use kfed to correct that. The parted is the right tool to fix this and that is all you need to do.

      There is no reason for concern. Your ASM disk probably did not have a partition table before the move. Now that the disk was restored, it was assumed that the disk has the partition table, and everything has 'moved'.

      All ASM disks should have the partition table, so all is well now.

      Cheers,
      Bane

    2. Actually, the disk did have a partition table before, and after storage was restored and the partition restored, the ASM disk/datafile appeared "moved" ahead (not overwritten by the partition table). In one case, Linux LVM had been used to create the partitions and after LVM was restored and turned back on, the ASM datafile/disk was ahead of the partition. Using parted for that wasn't possible, I think, because the partition was under LVM control. The only way to fix it was dirty: using od -c, I determined the offset was 249 bytes (in that case) and dumped the disk and restored with: dd if=/dev/ skip=1 ibs=249b but it took hours. I hoped there was an easier way. Looks like I did it the only way I could... Thanks for your help in validating my approach!

    3. Thanks for sharing this info.
      Cheers,
      Bane

  13. Bane,

    How are you doing? I wanted to follow up on a comment I added to this post in 2012, in which blocks 40 and 41 were corrupted. Now that 12c is out I tested the same type of corruption issue and 12c is handling the issue well. Have you had a chance to deep dive into ASM 12c? I created a blog post on my test.

    ASM 12c External Redundancy Diskgroup Handles Corrupted Blocks Better http://db12c.blogspot.com/2013/07/asm-12c-external-redundancy-diskgroup.html

    Replies
    1. Hi Javier,

      Thanks for sharing that post and demonstrating the new feature. Now that ASM version 12 is out, I am preparing some posts. The first one will be about new features and later on I will talk about each feature in more details.

      Cheers,
      Bane

  14. Bane,
    The new ASM book is finally published (mostly covers 11gr2 ASM as it relates to Cloud Storage):
    http://www.amazon.com/Database-Cloud-Storage-Essential-Management/dp/0071790152/ref=sr_1_3?s=books&ie=UTF8&qid=1375416266&sr=1-3

    Thank you for all your contributions.

    Replies
    1. Cool! Thanks for letting me know Nitin.
      Cheers,
      Bane

  15. Recently I had issues with a CRS upgrade and was advised to deconfig and run root.sh. As part of this, the disks of the data disk group became FORMER. How can we change them back to MEMBER and mount the disk group again? The AU size is 4 MB.
