The views expressed on this blog are my own and do not necessarily reflect the views of Oracle

April 26, 2010

kfed - ASM metadata editor


kfed is an undocumented ASM utility that can be used to read and modify ASM metadata blocks. It is a standalone utility, independent of the ASM instance, so it can be used with either mounted or dismounted disk groups. The most powerful kfed feature is its ability to fix corrupt ASM metadata.

The kfed binary is present in the recent ASM versions, but if you don't see it in your $ORACLE_HOME/bin directory (e.g. it may not be present in version 10.1), it can be built as follows:

$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins* ikfed

kfed read

With the kfed read command we can read a single ASM metadata block. The syntax is:

$ kfed read [aun=ii aus=jj blkn=kk dev=]asm_disk_name

Where the command line parameters are
  • aun - Allocation Unit (AU) number to read from. Default is AU0, or the very beginning of the ASM disk.
  • aus - AU size. Default is 1048576 (1MB). Specify the aus when reading from a disk group with non-default AU size.
  • blkn - block number to read. Default is block 0, or the very first block of the AU.
  • dev - ASM disk or device name. Note that the keyword dev can be omitted, but the ASM disk name is mandatory.

Use kfed to read ASM disk header block

The following is an example of using the kfed utility to read the ASM disk header from ASM disk /dev/sda1.

$ kfed read /dev/sda1 | more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3102721733 ; 0x00c: 0xb8efc6c5
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
...
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:               DATA_0000 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0000 ; 0x068: length=9
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.dsksize:                   12284 ; 0x0c4: 0x00002ffc
...

Note that the above kfed command is equivalent to this one (with all parameters explicitly set to their default values):

$ kfed read aun=0 aus=1048576 blkn=0 dev=/dev/sda1

We see that the above kfed output is nicely formatted and human readable (sort of). The fields are grouped based on the actual content of the ASM metadata block.

In this example, the kfbh fields show the block header data, and the most important one is kfbh.type, which says KFBTYP_DISKHEAD, meaning the ASM disk header. This is the expected block type for an ASM disk header.

We then see the actual content of the ASM disk header metadata block - the kfdhdb fields. Some of those are the disk number (kfdhdb.dsknum), 0 in this case, the group redundancy type (kfdhdb.grptyp), normal redundancy in this case, the disk header status (kfdhdb.hdrsts), member in this case, the disk name (kfdhdb.dskname) - DATA_0000, etc.

Please see ASM disk header for the complete explanation of kfdhdb fields.
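
To pick a single value out of this output in a script, a small wrapper can help. The sketch below is only an illustration: kfed_field is a hypothetical helper name (not part of kfed), and it assumes the "name: value ; offset: comment" line layout shown above.

```shell
# Print the value of one field from "kfed read" output supplied on stdin.
# kfed_field is a hypothetical helper, not a kfed command.
kfed_field() {
    # Lines look like:  kfdhdb.grpname:   DATA ; 0x048: length=4
    # The first word is the field name plus a colon, the second is the value.
    awk -v name="$1:" '$1 == name { print $2; exit }'
}
```

For the disk header shown above, piping the kfed read output into kfed_field kfdhdb.grpname would print DATA. A value that contains spaces would be cut at the first space, so treat this as a convenience for simple fields only.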

Use kfed to read any ASM metadata block

The next example shows how to read an ASM File Directory block. To do that we would use the following kfed command:

$ kfed read aun=10 blkn=1 dev=/dev/sda1 | more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR
...

Note that I had to specify AU10 and block 1 to read a File Directory block. Have a look at the ASM File Directory post to learn how to locate a File Directory block.
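
It can help to know where such a block physically sits on the disk. Assuming 4096-byte metadata blocks (an assumption, consistent with 1024 blocks in a 4MB allocation unit), the byte offset is aun * aus + blkn * 4096. A minimal sketch, with kfed_block_offset as a hypothetical helper name:

```shell
# Byte offset of a metadata block from the start of the ASM disk.
# Assumes 4096-byte metadata blocks; kfed_block_offset is a hypothetical helper.
kfed_block_offset() {
    aun=$1                 # allocation unit number
    blkn=$2                # block number within the allocation unit
    aus=${3:-1048576}      # AU size in bytes, defaulting to 1MB as kfed does
    echo $(( aun * aus + blkn * 4096 ))
}
```

With the default 1MB AU size, kfed_block_offset 10 1 prints 10489856, which is where block 1 of AU10 starts.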

Is my ASM metadata block corrupt

If you see kfbh.type=KFBTYP_INVALID in the disk header of a disk you believe belongs to an ASM disk group, that indicates that the ASM disk header is corrupt. But don't jump to conclusions! Are you looking at the right disk? Is this the right disk partition? Can you access that disk via some other name - in a multipath setup? If you are not sure, or if the disk header is in fact damaged, contact Oracle Support for assistance.

Note that this applies to any ASM metadata block. If ASM expects to find a metadata block and instead finds a block that is zeroed out or contains rubbish, it will report the block as KFBTYP_INVALID, and an error (usually ORA-15196) will be reported in the ASM and/or database alert log (depends on which instance discovers the problem).
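
A quick scripted check of the block type could look like the sketch below. The function names are hypothetical, and the check only looks at the symbolic type that kfed prints at the end of the kfbh.type line; it says nothing about the rest of the block.

```shell
# Print the symbolic block type (e.g. KFBTYP_DISKHEAD) from "kfed read" output on stdin.
# Both functions are hypothetical helpers, not kfed commands.
kfed_block_type() {
    # The type name is the last word on the kfbh.type line.
    awk '$1 == "kfbh.type:" { print $NF; exit }'
}

# Succeed only if the block supplied on stdin reports itself as an ASM disk header.
is_disk_header() {
    [ "$(kfed_block_type)" = "KFBTYP_DISKHEAD" ]
}
```

A failed check is a reason to investigate (right disk, right partition, right path), not proof of corruption.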

kfed write

With the kfed write command we can write to a single ASM metadata block. The syntax is:

$ kfed write [aun=ii aus=jj blkn=kk dev=]asm_disk_name text=new_contents chksum=yes

Where the new command line parameters are
  • text - a text file with the new block contents
  • chksum=yes - calculate and write the correct checksum. Note that the checksum in the text file with the new content does not have to be correct.

Use kfed to write the correct checksum to ASM metadata block

An ASM metadata block may look fine, but in fact be corrupt. For example, the block checksum (kfbh.check) could be wrong, in which case it would need to be corrected. Indeed, if the only problem is an incorrect checksum, that can be easily corrected by simply reading the block and then writing it back! The kfed will calculate the new checksum and write the block back with the correct checksum.

Here are the complete steps to correct the bad checksum for block 2 in AU0 on disk /dev/sda1:

$ kfed read aun=0 blkn=2 dev=/dev/sda1 > /tmp/aun0_blkn2_sda1.kfed
$ kfed write aun=0 blkn=2 dev=/dev/sda1 text=/tmp/aun0_blkn2_sda1.kfed chksum=yes

NOTE: Please seek Oracle Support assistance with any suspected ASM metadata block corruption.
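
Before writing to a disk it may be worth saving a raw copy of the block. The sketch below only prints the commands, it does not run them. The dd backup step and the file names are my additions for illustration, kfed_checksum_plan is a hypothetical helper name, and the dd skip value assumes 4096-byte metadata blocks.

```shell
# Print (do not run) the commands to back up one block and rewrite its checksum.
# Assumes 4096-byte metadata blocks; the /tmp file names are illustrative only.
kfed_checksum_plan() {
    dev=$1; aun=$2; blkn=$3
    aus=${4:-1048576}                          # AU size in bytes (1MB default)
    tag="aun${aun}_blkn${blkn}_$(basename "$dev")"
    skip=$(( aun * (aus / 4096) + blkn ))      # block index from the start of the disk
    echo "dd if=$dev of=/tmp/$tag.bak bs=4096 skip=$skip count=1"
    echo "kfed read aun=$aun aus=$aus blkn=$blkn dev=$dev > /tmp/$tag.kfed"
    echo "kfed write aun=$aun aus=$aus blkn=$blkn dev=$dev text=/tmp/$tag.kfed chksum=yes"
}
```

Running kfed_checksum_plan /dev/sda1 0 2 prints three lines: a dd backup of the block, followed by the same read and write commands as above with the default AU size spelled out.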

kfed find

The kfed find will examine all blocks in an allocation unit and report back on the block types found. The syntax is:

$ kfed find [aun=ii aus=jj dev=]asm_disk_name

We see that the find command parameters are the same as for the read command, but the difference is that the find operates on all blocks in an allocation unit.

Use kfed find command to verify blocks in AU0

This is an example of using the kfed find to verify that all blocks in AU0 have the expected ASM metadata.

$ kfed find /dev/sda1

The expected result is type 1 for block 0, type 2 for block 1 and type 3 for all other blocks, i.e.:

$ kfed find /dev/sda1
Block 0 has type 1
Block 1 has type 2
Block 2 has type 3
Block 3 has type 3
Block 4 has type 3
...
Block 255 has type 3


If you see anything else in the output, that indicates a corrupted ASM metadata block. In that case please seek assistance from Oracle Support.
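
Scanning a few hundred lines by eye is error prone, so the expected layout can be checked with a small filter. This is a sketch under the assumption that the output has exactly the "Block N has type T" form shown above; check_au0_types is a hypothetical helper name.

```shell
# Read "kfed find" output for AU0 on stdin and print every block whose type
# differs from the expected layout (block 0 -> 1, block 1 -> 2, others -> 3).
# Exit status is 0 when nothing unexpected was found, 1 otherwise.
check_au0_types() {
    awk '$1 == "Block" && $3 == "has" && $4 == "type" {
             want = ($2 == 0) ? 1 : ($2 == 1) ? 2 : 3
             if ($5 != want) {
                 print "Block " $2 ": type " $5 ", expected " want
                 bad = 1
             }
         }
         END { exit bad }'
}
```

The output of kfed find /dev/sda1 would be piped into check_au0_types; no output and a zero exit status mean all block types in AU0 are as expected.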

Note that my allocation unit size is 1MB, so there are 256 blocks in the AU (numbered 0 to 255). If your allocation unit size is 4MB, the same command should return block type information for 1024 blocks.

I should also point out that with the above find command we only looked at the expected ASM metadata block types. We did not look at the actual metadata block contents. Some ASM metadata block corruptions are indeed in the block contents, i.e. the block type is correct, but the contents are wrong. Such corruptions are only detected when ASM reads the corrupt block, in which case an ORA-15196 error will be reported. Please seek assistance from Oracle Support if you are unfortunate enough to encounter that error.

Conclusion

The kfed is an unassuming but very powerful utility. While I have shown only a few commands, the kfed can also format an empty ASM file, perform a sanity check on an ASM metadata block, display data structure sizes and perform a few other more obscure operations.

ASM metadata


An ASM instance manages metadata needed to make ASM files available to Oracle databases and ASM clients. ASM metadata is stored in disk groups – in metadata blocks.

Some ASM metadata is at a fixed position on every ASM disk, and is referred to as physically addressed metadata. Other ASM metadata is organised in files (directories) and is referred to as virtually addressed metadata. The virtually addressed metadata is managed like all other ASM files – it is mirrored as per the file type redundancy policy, is subject to rebalance and can grow as needed.

Each ASM disk has ASM metadata, with some of this metadata relevant to that disk only and some relevant to the whole disk group. For example, the ASM disk header is relevant to that disk only, while  the Partnership and Status Table (PST) is relevant to the whole disk group.

Physically addressed metadata

The physical ASM metadata are the following structures:
Allocation unit 0 (AU0) on every ASM disk always has the disk header (block 0), the Free Space Table (block 1) and the Allocation Table in the remaining blocks of the allocation unit.

Allocation unit 1 (AU1) on every ASM disk is always reserved for the Partnership and Status Table. While AU1 on every disk will be reserved - only some disks will have the actual PST data.

Virtually addressed metadata

The virtually addressed metadata are the following structures:

ASM metadata lives in ASM disk groups

ASM metadata is stored in disk groups - in other words if there are no disk groups there is no ASM metadata.  This sounds obvious, but the point is that ASM does not store anything outside of its disk groups.

Each ASM disk has ASM metadata. Some of this metadata is relevant to that disk only and some is relevant to the whole disk group. For example, the ASM disk header is relevant to that disk only, but the partnership and status table (PST) is relevant to the whole disk group.

Some metadata will be on every disk - e.g. a disk header and an allocation table. Other metadata will be on a subset of disks - e.g. allocation unit 1 on every ASM disk will be reserved for the PST, but only a subset of disks will actually have the PST data.

Some metadata will not be present at all - e.g. in a 10.2 disk group there will be no staleness directory, as that feature is only relevant to 11.1 and later versions. And even in 11.1 - an external redundancy disk group will not have the staleness directory, as that feature is relevant to normal and high redundancy disk groups only.

ASM metadata blocks

ASM metadata is organized in ASM metadata blocks. For a complete discussion on this topic please see the ASM metadata blocks post.

ASM metadata structures consist of one or more ASM metadata blocks - where the block type will match the ASM metadata type.  For example an ASM disk header will consist of exactly one metadata block of type KFBTYP_DISKHEAD; an allocation table will consist of a number of metadata blocks, all of type KFBTYP_ALLOCTBL, etc.

April 25, 2010

About ASM Allocation Units, Extents, Mirroring and Failgroups



ASM Allocation Units

An ASM allocation unit (AU) is the fundamental space unit within an ASM disk group. Every ASM disk is divided into allocation units.

When a disk group is created, the allocation unit size can be set with the  disk group attribute AU_SIZE (in ASM versions 11.1 and later). The AU size can be 1, 2, 4, 8, 16, 32 or 64 MB. If not explicitly set, the AU size defaults to 1 MB (4MB in Exadata).

AU size is a disk group attribute, so each disk group can have a different AU size.
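
When scripting disk group creation, the AU size can be validated up front against the values listed above. A trivial sketch, with valid_au_size_mb as a hypothetical helper name:

```shell
# Succeed if the argument (in MB) is one of the AU sizes listed above.
# valid_au_size_mb is a hypothetical helper, not an ASM command.
valid_au_size_mb() {
    case "$1" in
        1|2|4|8|16|32|64) return 0 ;;
        *)                return 1 ;;
    esac
}
```

For example, valid_au_size_mb 4 succeeds, while valid_au_size_mb 3 fails.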

ASM Extents

An ASM extent consists of one or more allocation units. An ASM file consists of one or more ASM extents.

We distinguish between physical and virtual extents. A virtual extent, or an extent set, consists of one physical extent in an external redundancy disk group, at least two physical extents in a normal redundancy disk group and at least three physical extents in a high redundancy disk group.

Before ASM version 11.1 we had uniform extent size. ASM version 11.1 introduced the variable sized extents that enable support for larger data files, reduce (ASM and database) SGA memory requirements for very large databases, and improve performance for file create and open operations. The initial extent size equals the disk group AU_SIZE and it increases by a factor of 4 or 16 at predefined thresholds. This feature is automatic for newly created and resized data files with disk group compatibility attributes COMPATIBLE.ASM and COMPATIBLE.RDBMS set to 11.1 or higher.

The extent size of a file varies as follows:

  • Extent size always equals the disk group AU_SIZE for the first 20,000 extent sets (0 - 19,999)
  • Extent size equals 4*AU_SIZE for the next 20,000 extent sets (20,000 - 39,999)
  • Extent size equals 16*AU_SIZE for all extent sets after that (40,000 and higher)
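
The thresholds above can be expressed as a small function. This is a sketch that assumes extent sets are numbered from 0; asm_extent_size is a hypothetical helper name.

```shell
# Extent size in bytes for a given extent set number and AU size.
# Assumes extent sets are numbered from 0; asm_extent_size is a hypothetical helper.
asm_extent_size() {
    xnum=$1                # extent set number within the file
    aus=${2:-1048576}      # disk group AU_SIZE in bytes (1MB default)
    if   [ "$xnum" -lt 20000 ]; then mult=1
    elif [ "$xnum" -lt 40000 ]; then mult=4
    else                             mult=16
    fi
    echo $(( aus * mult ))
}
```

With a 1MB AU size, asm_extent_size 19999 prints 1048576, asm_extent_size 20000 prints 4194304 and asm_extent_size 40000 prints 16777216.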

There is a nasty bug (8898852) related to this feature. See more on that in MOS Doc ID 965751.1.

ASM Mirroring

ASM mirroring protects data integrity by storing multiple copies of the same data on different disks. When a disk group is created, the ASM administrator can specify the disk group redundancy as follows:

  • External – no ASM mirroring
  • Normal – 2-way mirroring
  • High – 3-way mirroring
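
For rough capacity planning, the redundancy levels translate into the number of copies kept of each extent. The sketch below uses the default mirroring for each disk group type and ignores ASM metadata overhead and any per-file redundancy settings; both function names are hypothetical.

```shell
# Number of copies of each extent at the default mirroring for a redundancy level.
# Both functions are hypothetical helpers for illustration.
asm_mirror_copies() {
    case "$1" in
        external|EXTERNAL) echo 1 ;;
        normal|NORMAL)     echo 2 ;;
        high|HIGH)         echo 3 ;;
        *) echo "unknown redundancy: $1" >&2; return 1 ;;
    esac
}

# Raw space (MB) consumed by a file of the given size (MB), metadata overhead ignored.
asm_raw_space_mb() {
    copies=$(asm_mirror_copies "$2") || return 1
    echo $(( $1 * copies ))
}
```

For example, asm_raw_space_mb 100 normal prints 200.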

ASM mirrors extents – it does not mirror disks or blocks. ASM file mirroring is the result of mirroring the extents that constitute the file. In ASM we can specify the redundancy level per file. For example, one file in a normal redundancy disk group can have its extents mirrored once (default behavior). Another file, in the same disk group, can be triple mirrored – provided there are at least three failgroups in the disk group. In fact all ASM metadata files are triple mirrored in a normal redundancy disk group – provided there are at least three failgroups.

ASM Failgroups

ASM disks within a disk group are partitioned into failgroups (also referred to as failure groups or fail groups). The failgroups are defined at the time the disk group is created.  If we omit the failgroup specification, then ASM automatically places each disk into its own failgroup. The only exception is Exadata, where all disks from the same storage cell are automatically placed in the same failgroup.

Normal redundancy disk groups require at least two failgroups. High redundancy disk groups require at least three failgroups. Disk groups with external redundancy do not have failgroups.

When an extent is allocated for a mirrored file, ASM allocates a primary copy and a mirror copy. The primary copy is stored on one disk and the mirror copy on another disk in a different failgroup.
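
The failgroup rule can be illustrated with a toy filter: given the disk holding the primary copy, list the disks that are allowed to hold the mirror copy. This is only a sketch of the "different failgroup" rule, not of how ASM actually chooses a disk; the function name and the disk and failgroup names in the example are made up.

```shell
# Given a primary disk and a list of "disk:failgroup" pairs, print the disks that
# could hold the mirror copy, i.e. every disk in a different failgroup.
# mirror_candidates is a hypothetical helper; it does not model ASM's real choice.
mirror_candidates() {
    primary=$1; shift
    pfg=
    # Find the failgroup of the primary disk.
    for pair in "$@"; do
        if [ "${pair%%:*}" = "$primary" ]; then pfg=${pair#*:}; fi
    done
    [ -n "$pfg" ] || { echo "unknown disk: $primary" >&2; return 1; }
    # Print every disk whose failgroup differs from the primary's.
    for pair in "$@"; do
        if [ "${pair#*:}" != "$pfg" ]; then echo "${pair%%:*}"; fi
    done
    return 0
}
```

With the made-up layout D1:FG1 D2:FG1 D3:FG2 D4:FG2, calling mirror_candidates D1 D1:FG1 D2:FG1 D3:FG2 D4:FG2 prints D3 and D4, one per line.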

When adding disks to an ASM disk group for which failgroups are manually specified, it is imperative to add the disks to the correct failgroup.

About ASM disk groups, disks and files


Oracle ASM uses disk groups to store data files. An ASM disk group is a collection of disks managed as a unit. Within a disk group, ASM exposes a file system interface for Oracle database files. The content of files that are stored in a disk group is evenly distributed to eliminate hot spots and to provide uniform performance across the disks. The performance is comparable to the performance of raw devices. [From Oracle® Automatic Storage Management Administrator's Guide 11g Release 2].


ASM Disk Groups


An ASM disk group consists of one or more disks and is the fundamental object that ASM manages. Each disk group is self contained and has its own ASM metadata. It is that ASM metadata that an ASM instance manages.


The idea with ASM is to have a small number of disk groups. In ASM versions before 11.2, two disk groups should be sufficient – one for datafiles and one for backups/archive logs. In 11.2 you would want to create a separate disk group for the ASM spfile, Oracle Cluster Registry (OCR) and voting disks – provided you opt to place those objects in an ASM disk group.


ASM Disks


Disks to be used by ASM have to be set up and provisioned by the OS/storage administrator before ASM installation/setup. Disks can be local physical devices (IDE, SATA, SCSI, etc), SAN based LUNs (iSCSI, FC, FCoE, etc) or NAS/NFS based disks. Disks to be used for ASM should be partitioned. Even if the whole disk is to be used by ASM, it should have a single partition.


The above is true for all environments except for Exadata – where ASM makes use of grid disks, created from cell disks and presented to ASM via LIBCELL interface.


An ASM disk group can have up to 10,000 disks. Maximum size for an individual ASM disk is 2 TB. Due to bug 6453944, it is possible to add disks over 2 TB to an ASM disk group. The fix for bug 6453944 is in 10.2.0.4, 11.1.0.7 and 11.2. MOS Doc ID 736891.1 has more on that.

ASM looks for disks in the OS location specified by the ASM_DISKSTRING initialization parameter. All platforms have a default value, so this parameter does not have to be specified. In a cluster, ASM disks can have different OS names on different nodes. In fact, ASM does not care about the OS disk names, as those are not kept in ASM metadata.


ASM Files


Any ASM file is allocated from and completely contained within a single disk group. However, a disk group might contain files belonging to several databases and a single database can have files in multiple disk groups.

ASM can store all Oracle database file types – datafiles, control files, redo logs, backup sets, data pump files, etc – but not binaries or text files. In addition to that, ASM also stores its metadata files within the disk group. ASM has its own file numbering scheme – independent of database file numbering.  ASM file numbers under 256 are reserved for ASM metadata files.
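
The file number rule is easy to encode when post-processing lists of ASM file numbers. A trivial sketch, with is_asm_metadata_file as a hypothetical helper name:

```shell
# Succeed if the ASM file number falls in the range reserved for ASM metadata files.
# is_asm_metadata_file is a hypothetical helper for illustration.
is_asm_metadata_file() {
    [ "$1" -lt 256 ]
}
```

For example, is_asm_metadata_file 1 succeeds, while is_asm_metadata_file 256 fails.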


ASM Cluster File System (ACFS), introduced in 11.2, extends ASM support to database and application binaries, trace and log files, and in fact any files that can be stored on a traditional file system. And most importantly, the ACFS is a cluster file system.

April 18, 2010

Set up ASM with a single ASMCA command


ASM Configuration Assistant (ASMCA) was introduced in ASM version 11.2. It is used to configure ASM instances, and to create and manage disk groups, volumes and ASM cluster file systems (ACFS). ASMCA can be used in GUI or command-line mode.

In this post I will show how to use the ASMCA - in a non-cluster environment - to create and start an ASM  instance, create a disk group and start related/required services. I will use the ASMCA in a command line mode with the silent option.

Perform the Grid Infrastructure software only installation

You may want to do this with ASM job role separation option, in which case you should perform all steps as OS user grid. Otherwise, perform all steps as OS user oracle.

Set up the disks to be used by ASM

I used ASMLIB to create 4 disks for ASM, and I can see them as follows:

$ oracleasm listdisks
DISK1
DISK2
DISK3
DISK4

Configure ASM

Run the following command:

$ asmca -silent -configureASM -sysAsmPassword s3kr3t1 -asmsnmpPassword s3kr3t2 -diskString 'ORCL:*' -diskGroupName DATA -disk 'ORCL:*' -redundancy EXTERNAL

As I used ASMLIB disks, I specified 'ORCL:*' for the ASM discovery string. Make sure you specify the correct value for your environment.

On a successful run, the above command should have returned:

ASM created and started successfully.
DiskGroup DATA created successfully.

And it should have performed the following:
  • Start the cluster synchronisation services daemon – ocssd.bin
  • Start three agents – cssdagent, oraagent.bin and orarootagent.bin
  • Start the disk monitor – diskmon.bin
  • Create and start ASM instance +ASM
  • Create the external redundancy disk group DATA
  • Create ASM spfile in disk group DATA

I have also published this in MOS Doc ID 1068788.1.
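
To confirm that the daemons and agents listed above are actually running, a process listing can be checked against the expected names. This is a minimal sketch: missing_asm_processes is a hypothetical helper name, it matches names as plain substrings, and it reads the listing from stdin so that it can be fed from ps -eo args.

```shell
# Read a process listing on stdin (for example from "ps -eo args") and print
# the expected process names that do not appear anywhere in it.
# missing_asm_processes is a hypothetical helper; names come from the list above.
missing_asm_processes() {
    listing=$(cat)
    for name in ocssd.bin cssdagent oraagent.bin orarootagent.bin diskmon.bin; do
        # -F treats the name as a fixed string, so the dots are not wildcards.
        printf '%s\n' "$listing" | grep -F -q -- "$name" || echo "$name"
    done
    return 0
}
```

No output from ps -eo args | missing_asm_processes means every listed process was found.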

April 17, 2010

ORA-4031 and ORA-600 [723] in ASM alert log on rebalance operation

Had an interesting support call the other day. Customer reported ORA-4031 and ORA-600 [723] in ASM alert log on rebalance operation. Not a big deal, right?

A closer look at the ORA-600 [723] trace file showed a memory leak with "KFNS Locked AU " chunks. The ASM alert log also had ORA-600 [kfnsMasterWait01] and ORA-600 [kfgscDelete00] errors. The ASM version was 11.1.0.7.1, i.e. 11.1 with PSU 1.

A search for known issues led me to a couple of very bad bugs (9217933 and 9250812), already alerted on Metalink/MOS:
Note:985150.1 - ORA-600 [KFNSMASTERWAIT01] occurs after applying PSU 11.1.0.7.1.
Note:985183.1 - New patch required after installing PSU 11.1.0.7.1 to ASM environments.
Note:9209238.8 - Bug 9209238 - 11.1.0.7.2 Patch Set Update (PSU 2).

In short, if you are on 11.1.0.7.1, you had better apply 11.1.0.7.2 (11.1 PSU 2).

Oracle ASM Job Role Separation Option with SYSASM


The SYSASM privilege (introduced in 11.1) is fully separated from the SYSDBA privilege in 11.2. If you choose to use this optional feature, and designate different operating system groups as the OSASM and the OSDBA groups, then the SYSASM administrative privilege is available only to members of the OSASM group. [From Oracle® Grid Infrastructure Installation Guide 11g Release 2].

To set up ASM admin and DBA job role separation, you need at least two OS users – one for the database, typically called oracle, and another one for the Grid Infrastructure, typically called grid.

The database OS user has to be in the software install group (oinstall), OSDBA group (dba) and OSDBA for ASM group (asmdba). [OSDBA group is designated at the installation time, which makes it SS_DBA_GRP in $ORACLE_HOME/rdbms/lib/config.c].

In my case that OS user is called oracle and the OSDBA group is called dba:

$ id oracle
uid=502(oracle) gid=500(oinstall) groups=500(oinstall),502(dba),506(asmdba)

$ grep "define SS_DBA_GRP" $ORACLE_HOME/rdbms/lib/config.c
#define SS_DBA_GRP "dba"

The Grid Infrastructure OS user has to be in the software install group (oinstall), OSASM group (asmadmin) and OSDBA for ASM group (asmdba). [OSASM and OSDBA for ASM groups are designated at the Grid Infrastructure installation time, which makes them SS_ASM_GRP and SS_DBA_GRP in $GRID_HOME/rdbms/lib/config.c].

In my case the OS user is called grid, the OSASM group is called asmadmin and the OSDBA for ASM group is called asmdba:

$ id grid
uid=1100(grid) gid=500(oinstall) groups=500(oinstall),502(dba),506(asmdba),1000(asmadmin),1301(asmoper)

$ egrep "define SS_DBA_GRP|define SS_ASM_GRP" $ORACLE_HOME/rdbms/lib/config.c
#define SS_DBA_GRP "asmdba"
#define SS_ASM_GRP "asmadmin"

To administer ASM the OS user grid should connect to ASM instance as SYSASM, as follows:

$ sqlplus / as sysasm

Given my OS user names and groups, the ownership of ASM disks has to be grid:asmadmin. In my Linux environment, with ASMLIB, my disk ownership is as follows:

$ ls -l /dev/oracleasm/disks/
total 0
brw-rw---- 1 grid asmadmin 8, 5 Mar 1 15:05 DISK1
brw-rw---- 1 grid asmadmin 8, 6 Mar 1 15:05 DISK2
brw-rw---- 1 grid asmadmin 8, 7 Mar 1 15:05 DISK3
...

The ownership is correct, as I specified the correct user and group at the time ASMLIB was installed. That can be verified as follows (note that this is ASMLIB specific):

$ egrep "^ORACLEASM_UID|^ORACLEASM_GID" /etc/sysconfig/oracleasm
ORACLEASM_UID=grid
ORACLEASM_GID=asmadmin

Finally, and this is very important, the correct ownership of the oracle binary – in my database home – has to be oracle:asmadmin:

$ ls -l $ORACLE_HOME/bin/oracle
-r-xr-s--x 1 oracle asmadmin 173515991 Apr 8 12:10 /u01/app/oracle/product/11.2.0/dbhome_2/bin/oracle

With all this in place we have the correct setup for the Oracle ASM job role separation feature.
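
The group membership requirements above can be checked in a script. A minimal sketch, assuming the group names used in this post; has_groups is a hypothetical helper name, and it takes the group list as a string so that it can be fed from id -Gn.

```shell
# Succeed if the space-separated group list in $1 contains every group named
# in the remaining arguments; report the first missing group on stderr.
# has_groups is a hypothetical helper for illustration.
has_groups() {
    have=" $1 "; shift
    for g in "$@"; do
        case "$have" in
            *" $g "*) ;;                                   # group present
            *) echo "missing group: $g" >&2; return 1 ;;   # group absent
        esac
    done
    return 0
}
```

With my users, has_groups "$(id -Gn oracle)" oinstall dba asmdba and has_groups "$(id -Gn grid)" oinstall asmadmin asmdba should both succeed.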

In version 11.2 ASM runs from the Grid Infrastructure home

Starting with version 11.2, ASM runs from the Grid Infrastructure home in both single instance and cluster installs. This is very different to earlier versions so I would encourage you to invest some time in understanding the differences. Here is a quick overview of some of the more interesting features.

Integration of Oracle Clusterware (Cluster Ready Services) and ASM

Oracle Cluster Registry (OCR), voting disks and ASM spfile can now be stored in an ASM disk group. ASM instances, disk groups and other ASM related objects are now resources managed by Clusterware and stored in OCR.

ASM job role and user separation option

I have a separate post on that – please see ASM job role and user separation option.

ORACLE_BASE

ORACLE_BASE directory cannot be shared in cluster installs, but it can be shared in single instance installations.

No localconfig in single instance installs

Although the localconfig utility is mentioned in the 11.2 documentation, in the context of Cluster Synchronization Service (CSS) and host name change, there is no localconfig in 11.2. Instead we now have roothas.pl. For more info on this please see MOS Doc ID 986740.1.

ASM can be set up during the install or later with ASMCA

ASM Configuration Assistant (ASMCA), a new tool in 11.2, is used to create and configure ASM instances, disk groups, volumes and ASM Cluster File Systems (ACFS). ASMCA can be used in GUI and command line modes. In addition to that, a silent mode can be used to automate ASM setup.

About ASM Support Guy blog


This blog is (mostly) about Oracle Automatic Storage Management (ASM) and my passion for it. Most of the posts are about ASM features and functionality, in the form of bite-sized lessons. Some topics are simple, while others cover advanced subjects like ASM metadata and ASM functionality that may not be obvious or even documented.

While I am still around, this blog is not active any more, as I have moved on from the ASM.

Cheers,
Bane