April 25, 2010

About ASM Allocation Units, Extents, Mirroring and Failgroups



ASM Allocation Units

An ASM allocation unit (AU) is the fundamental space unit within an ASM disk group. Every ASM disk is divided into allocation units.

When a disk group is created, the allocation unit size can be set with the disk group attribute AU_SIZE (in ASM versions 11.1 and later). The AU size can be 1, 2, 4, 8, 16, 32 or 64 MB. If not explicitly set, the AU size defaults to 1 MB (4 MB in Exadata).

AU size is a disk group attribute, so each disk group can have a different AU size.

ASM Extents

An ASM extent consists of one or more allocation units. An ASM file consists of one or more ASM extents.

We distinguish between physical and virtual extents. A virtual extent, or an extent set, consists of one physical extent in an external redundancy disk group, at least two physical extents in a normal redundancy disk group and at least three physical extents in a high redundancy disk group.

Before ASM version 11.1 we had uniform extent size. ASM version 11.1 introduced the variable sized extents that enable support for larger data files, reduce (ASM and database) SGA memory requirements for very large databases, and improve performance for file create and open operations. The initial extent size equals the disk group AU_SIZE and it increases by a factor of 4 or 16 at predefined thresholds. This feature is automatic for newly created and resized data files with disk group compatibility attributes COMPATIBLE.ASM and COMPATIBLE.RDBMS set to 11.1 or higher.

The extent size of a file varies as follows:

  • Extent size always equals the disk group AU_SIZE for the first 20,000 extent sets
  • Extent size equals 4*AU_SIZE for the next 20,000 extent sets
  • Extent size equals 16*AU_SIZE for all extent sets beyond the first 40,000
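
The thresholds above can be written down as a small lookup. This is only a sketch of the rule as stated in the list, not ASM code; the function name and the zero-based extent set index are assumptions made for illustration.

```python
def extent_size_mb(extent_set_index: int, au_size_mb: int = 1) -> int:
    """Extent size in MB for a zero-based extent set index.

    Follows the three tiers listed above: 1x, 4x and 16x the AU size.
    """
    if extent_set_index < 0:
        raise ValueError("extent set index must be non-negative")
    if extent_set_index < 20_000:      # first 20,000 extent sets
        return au_size_mb
    if extent_set_index < 40_000:      # next 20,000 extent sets
        return 4 * au_size_mb
    return 16 * au_size_mb             # everything beyond the first 40,000


if __name__ == "__main__":
    for index in (0, 19_999, 20_000, 39_999, 40_000):
        print(index, extent_size_mb(index, au_size_mb=4))
```

With a 4 MB AU size, for example, extent set 20,000 (the 20,001st) would be 16 MB under this rule.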

There is a nasty bug, 8898852, related to this feature. See MOS Doc ID 965751.1 for more on that.

ASM Mirroring

ASM mirroring protects data integrity by storing multiple copies of the same data on different disks. When a disk group is created, the ASM administrator can specify the disk group redundancy as follows:

  • External – no ASM mirroring
  • Normal – 2-way mirroring
  • High – 3-way mirroring

ASM mirrors extents – it does not mirror disks or blocks. ASM file mirroring is the result of mirroring the extents that constitute the file. In ASM we can specify the redundancy level per file. For example, one file in a normal redundancy disk group can have its extents mirrored once (the default behavior). Another file in the same disk group can be triple mirrored – provided there are at least three failgroups in the disk group. In fact, all ASM metadata files are triple mirrored in a normal redundancy disk group – provided there are at least three failgroups.

ASM Failgroups

ASM disks within a disk group are partitioned into failgroups (also referred to as failure groups or fail groups). The failgroups are defined at the time the disk group is created. If we omit the failgroup specification, then ASM automatically places each disk into its own failgroup. The only exception is Exadata, where all disks from the same storage cell are automatically placed in the same failgroup.

Normal redundancy disk groups require at least two failgroups. High redundancy disk groups require at least three failgroups. Disk groups with external redundancy do not have failgroups.

When an extent is allocated for a mirrored file, ASM allocates a primary copy and a mirror copy. The primary copy is stored on one disk and the mirror copy on some other disk in a different failgroup.

When adding disks to an ASM disk group for which failgroups are manually specified, it is imperative to add the disks to the correct failgroup.

19 comments:

  1. Bane Radulović

    Thanks for your nice post

    I am clear enough about the relationship between ASM allocation units and ASM extents after reading this post. But can you please go further? I need to know the relationship between traditional Oracle data blocks and extents, and ASM allocation units and ASM extents.

    Replies
    1. Thanks Sadock,

      This question comes up every now and then, so I should write a separate post on that. Short answer to your question is as follows.

      To find out where a particular oracle data block is, you first use something like DBMS_ROWID.ROWID_BLOCK_NUMBER to locate the block within a data file. Say you do that and determine your data is in block 10 (of a particular file).

      An oracle datafile will have a structure like this:

      |File header||--||--||...||---||...
       B0           B1  B2       B10

      You then query X$KFFXP (in ASM) to find the allocation units for that file. How many data blocks fit in one allocation unit will depend on the data block size and the AU size. As our data is in block 10, it will always be in the first AU for that data file. Say the first AU for that file is AU 100 (of a particular disk).

      An ASM disk will have a structure like this:

      |Disk header||---||---||...||-----||...
       AU0          AU1  AU2       AU100
                                    / \
                                   /   \
                                  /     \
                                 /       \
                             |--||...||---||...
                              B0       B10

      So our data block is 10 blocks into AU100.
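
The arithmetic behind "which AU, and how far into it" can be sketched as below. The function name and the default sizes (8 KB data blocks, 1 MB AU) are assumptions for illustration, and the sketch assumes that each extent is a single AU and that coarse striping is in use. The AU index it returns is relative to the file; mapping it to a disk and a disk AU (such as AU 100 above) still needs the X$KFFXP lookup.

```python
def locate_block(block_number: int,
                 block_size: int = 8192,
                 au_size: int = 1024 * 1024) -> tuple[int, int]:
    """Return (file-relative AU index, block offset inside that AU)."""
    if au_size % block_size != 0:
        raise ValueError("AU size must be a multiple of the block size")
    blocks_per_au = au_size // block_size   # 128 for 8 KB blocks in a 1 MB AU
    return divmod(block_number, blocks_per_au)


if __name__ == "__main__":
    # Block 10 falls in the file's first AU, 10 blocks in.
    print(locate_block(10))
```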

      As I said, I will write a detailed post on this with actual examples and queries.

      Cheers,
      Bane

    2. Bane

      Once again thanks for your fast response.

      I wish I could be the first one to read the post you are going to write on this frequently asked question. However, the explanation you have given is also very helpful, and by and large I have the idea now. I am putting this blog in my favorites list.

      Regards

      Sadock

    3. No worries,
      This is yet another incentive to write it.
      Cheers,
      Bane

    4. Check this out: http://asmsupportguy.blogspot.com.au/2012/10/where-is-my-data.html

  2. As you said above " If we omit the failgroup specification, then ASM automatically places each disk into its own failgroup."
    Can you please explain this a little more? In the case of normal redundancy, where two failgroups are required, if we add disks without providing a failgroup name, which failgroup will the disk be added to? How will ASM handle the other failgroup?

    Thanks.

    Replies
    1. Let's say you have two SANs with disk PLAY01 provisioned from SAN1 and disk PLAY02 from SAN2. The correct way to create the disk group is like this:
      create diskgroup DATA normal redundancy
      failgroup F1 disk 'ORCL:PLAY01'
      failgroup F2 disk 'ORCL:PLAY02';

      If you now add PLAY03, provisioned from SAN1, like this:
      alter diskgroup DATA add disk 'ORCL:PLAY03';

      You now have 3 failgroups - F1, F2 and PLAY03.

      Let's say a primary copy of your data is on disk PLAY01. A copy of that data will be placed in some other failgroup, say PLAY03. You now have both copies of your data in SAN1, and if SAN1 goes down, your disk group goes down.

      So the correct way to add that disk is:
      alter diskgroup DATA add failgroup F1 disk 'ORCL:PLAY03';

      Now ASM cannot place all copies of your data in the same SAN, as it has to keep primary and mirror in different failgroups.
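
One way to sanity-check a failgroup layout before adding a disk is to look for two different failgroups that draw disks from the same SAN, since a primary and its mirror could then both end up behind that SAN. The sketch below is a hypothetical helper working on a hand-written inventory; it does not query ASM, and the function name and dictionary layout are assumptions for illustration.

```python
from itertools import combinations


def risky_failgroup_pairs(disks: dict[str, tuple[str, str]]) -> list[tuple[str, str]]:
    """disks maps disk name -> (failgroup, SAN).

    Returns pairs of distinct failgroups that share at least one SAN.
    """
    sans_by_failgroup: dict[str, set[str]] = {}
    for failgroup, san in disks.values():
        sans_by_failgroup.setdefault(failgroup, set()).add(san)
    return [
        (a, b)
        for a, b in combinations(sorted(sans_by_failgroup), 2)
        if sans_by_failgroup[a] & sans_by_failgroup[b]
    ]


if __name__ == "__main__":
    # PLAY03 added without a failgroup clause: it becomes its own failgroup.
    wrong = {"PLAY01": ("F1", "SAN1"), "PLAY02": ("F2", "SAN2"),
             "PLAY03": ("PLAY03", "SAN1")}
    # PLAY03 added to failgroup F1, matching its SAN.
    right = {"PLAY01": ("F1", "SAN1"), "PLAY02": ("F2", "SAN2"),
             "PLAY03": ("F1", "SAN1")}
    print(risky_failgroup_pairs(wrong))
    print(risky_failgroup_pairs(right))
```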

      Please let me know if this answers your question.

      Cheers,
      Bane

  3. Hi Bane,

    Thanks, you answered my doubt exactly. :)
    I could not find this kind of clarification in any other sites. Thanks a lot.
    Now I have another question regarding rebalancing while adding or dropping a disk in a disk group. E.g.
    CREATE DISKGROUP data NORMAL REDUNDANCY
    FAILGROUP controller1 DISK
    '/devices/diska1' NAME diska1,
    '/devices/diska2' NAME diska2,
    '/devices/diska3' NAME diska3,
    '/devices/diska4' NAME diska4
    FAILGROUP controller2 DISK
    '/devices/diskb1' NAME diskb1,
    '/devices/diskb2' NAME diskb2,
    '/devices/diskb3' NAME diskb3,
    '/devices/diskb4' NAME diskb4
    ;
    Now if diskb1 is dropped from the controller2 failgroup then, as per my understanding, the contents of diskb1 will be copied over to diskb2, diskb3 and diskb4 during rebalancing. But now diska1 from controller1 will not have a partner disk, and this is called reduced redundancy, meaning that mirroring for the contents of diska1 is not available. Later, when we add diskb1 back to controller2 after repairing the issue, rebalancing will take place again, copy the data back to diskb1, and ASM will again make it a partner for diska1 in controller1.
    Kindly confirm if my understanding is correct otherwise explain how it works internally.

    Another scenario is
    CREATE DISKGROUP data NORMAL REDUNDANCY
    FAILGROUP controller1 DISK
    '/devices/diska1' NAME diska1,
    '/devices/diska2' NAME diska2,
    '/devices/diska3' NAME diska3,
    '/devices/diska4' NAME diska4
    FAILGROUP controller2 DISK
    '/devices/diskb1' NAME diskb1,
    '/devices/diskb2' NAME diskb2,
    '/devices/diskb3' NAME diskb3,
    '/devices/diskb4' NAME diskb4
    ;
    Now we add another disk, say diskb5, to the controller2 failgroup. How will Oracle do the rebalancing, and how will it partner the new disk with the controller1 disks, given that controller1 has 4 disks but controller2 has 5?

    Kindly clarify. Thanks a ton.

    Replies
    1. 1. "...if diskb1 is dropped from controller2 failgroup then the contents of diskb1 will be copied over to diskb2, diskb3 and diskb4 during rebalancing..."

      Correct.

      2. "...But now diska1 from controller1 will not be having the partnered disk and this is called reduced redundancy..."

      Each disk from failgroup controller1 partners with each disk in failgroup controller2. And each disk from controller2 partners with each disk in controller1. When you drop diskb1, the partnership changes so that disks from controller1 will have 3 partners instead of 4.

      Reduced redundancy is something else. If you drop (or lose) all disks from controller2, that will be reduced redundancy as you are now left with a single copy of your data.

      You can end up with reduced redundancy after dropping (or losing) a single disk, but only if your disk group was full. In that case, there will be no room for the rebalance and some data will not be mirrored. That's why you should not fill your disk group to the brim and have REQUIRED_MIRROR_FREE_MB spare in your disk group.

      3. "...we add another disk in controller2 failgroup say diskb5 then how oracle will do the rebalancing and how oracle will make it a partnered disk with controller1 disks as controller1 has 4 disks but controller2 has 5 disks..."

      As per the above, all disks in controller1 will have 5 partner disks.

      Have a look at http://asmsupportguy.blogspot.com.au/2011/07/how-many-partners.html for detailed discussion on this topic.

      Cheers,
      Bane

  4. Hi, Bane,

    I have a question. I have a RAC database using ASM, and I have created multiple smaller data files, say 16 GB each (the max is 32 GB), for one tablespace. Will this setting give better performance? At least it has fewer extents.

    Thanks,

    Hank

    Replies
    1. Hi Hank,

      No, the file size doesn't really have much to do with the performance. I assume you are worried about I/O performance.

      The (excessive) number of extents does slow down file open operations (a bigger extent map has to be loaded into the ASM SGA and sent over to the database SGA), but this is insignificant compared to day-to-day I/O performance (I guess that is what you are really concerned with). 'Excessive' here is over 20000 extents (a 20 GB file with 1 MB AU size, or an 80 GB file with 4 MB AU size, etc). That is why we introduced the variable extent size feature.

      Also larger extent maps need more (SGA) memory, but again, this has nothing to do with I/O performance.
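
To get a feel for how much the variable extent size feature shrinks the extent map, here is a rough sketch that counts extent sets for a given file size, using the 1x/4x/16x tiers of 20,000 extent sets each. The function name and parameters are made up for illustration, and it ignores any per-file overhead.

```python
import math


def extent_sets(file_size_mb: int, au_size_mb: int = 1, variable: bool = True) -> int:
    """Approximate number of extent sets needed for a file of the given size."""
    if not variable:
        # Uniform extents: every extent is one AU.
        return math.ceil(file_size_mb / au_size_mb)
    count = 0
    remaining = file_size_mb
    # First two tiers are capped at 20,000 extent sets each.
    for multiplier, limit in ((1, 20_000), (4, 20_000)):
        extent_mb = multiplier * au_size_mb
        used = min(limit, math.ceil(remaining / extent_mb))
        count += used
        remaining -= used * extent_mb
        if remaining <= 0:
            return count
    # Whatever is left is covered by 16x extents.
    return count + math.ceil(remaining / (16 * au_size_mb))


if __name__ == "__main__":
    for size_mb in (20_000, 100_000, 200_000):
        print(size_mb, extent_sets(size_mb), extent_sets(size_mb, variable=False))
```

Under these assumptions a 200,000 MB file with a 1 MB AU needs 46,250 extent sets instead of 200,000.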

      Cheers,
      Bane

  5. Hi Bane,
    Nice article on disk partnership. But at the same time I am surprised how ASM mirroring and striping works.
    As per Oracle documentation "When Oracle ASM allocates an extent for a mirrored file, Oracle ASM allocates a primary copy and a mirror copy. Oracle ASM chooses the disk on which to store the mirror copy in a different failure group than the primary copy".
    Here Oracle clearly said that it stores the mirror copy in different failure group.
    But in your article you mentioned that "In a normal redundancy disk group with 3 disks - and no manually specified failgroups - every disk would have two partners".
    It means that even with no failgroups, mirroring happens as usual among disk0, disk1 and disk2 in your article.
    These two statements are a bit conflicting. Can you please explain this?

    Also, in the same situation ("In a normal redundancy disk group with 3 disks - and no manually specified failgroups"), how will striping work?

    Earlier I used to think that Oracle does the mirroring across different failgroups and the striping within the same failgroup (among the disks in that failgroup).
    E.g.
    CREATE DISKGROUP data NORMAL REDUNDANCY
    FAILGROUP controller1 DISK
    '/devices/diska1' NAME diska1,
    '/devices/diska2' NAME diska2,
    '/devices/diska3' NAME diska3,
    '/devices/diska4' NAME diska4
    FAILGROUP controller2 DISK
    '/devices/diskb1' NAME diskb1,
    '/devices/diskb2' NAME diskb2,
    '/devices/diskb3' NAME diskb3,
    '/devices/diskb4' NAME diskb4
    ;

    In the above disk group, the contents of diska1 will be striped across diska2, diska3 and diska4 and mirrored with diskb1, diskb2, diskb3 and diskb4 (as you taught me that there are up to 8 partners). Kindly clarify if my understanding is correct; otherwise please explain how mirroring and striping will work here.

    Thanks a ton for helping people.

    Replies
    1. If you don't specify failgroups when creating a redundant disk group, each disk becomes a failgroup. That is why a normal redundancy disk group with 3 disks (and no explicitly named failgroups) will have 3 failgroups. So each disk will have two partners.

      The file striping is not per failgroup - it is per disk group. Each file will be striped across all disks in the disk group. Let's say your AU size is 1MB and you create an 8MB file. That file will have 8 primary extents and 8 extent mirrors. Primary extent 1 will be placed on diska1, copy of extent 1 will be placed on diskb1, extent 2 on diska2, copy of extent 2 on diskb2, extent 3 on diska3, copy of extent 3 on diskb3, extent 4 on diska4, copy of extent 4 on diskb4, extent 5 on diskb1, copy of extent 5 on diska1, extent 6 on diskb2, copy of extent 6 on diska2, extent 7 on diskb3, copy of extent 7 on diska3, extent 8 on diskb4 and finally copy of extent 8 on diska4.

      You see that the file is spread (striped) across all disks, and that extent copies are always in a different failgroup. If the file were larger, we would keep doing the same with the rest of the extents - extent 9 on diska1, copy of extent 9 on diskb1, etc...
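
The placement in this 8MB example can be reproduced with a few lines of code. This is only a sketch of the simplified pattern in the example (two failgroups of four disks, primaries alternating between failgroups every four extents); the function name is made up, and real ASM allocation is driven by disk partnerships and free space rather than a fixed round-robin.

```python
def example_layout(n_extents: int, disks_per_failgroup: int = 4,
                   failgroups: tuple[str, str] = ("a", "b")) -> list[tuple[int, str, str]]:
    """Return (extent number, primary disk, mirror disk) for each extent."""
    layout = []
    for i in range(n_extents):
        slot = i % disks_per_failgroup + 1           # disk 1..4 within a failgroup
        primary_fg = (i // disks_per_failgroup) % 2  # switch failgroup every 4 extents
        mirror_fg = 1 - primary_fg                   # mirror goes to the other failgroup
        layout.append((i + 1,
                       f"disk{failgroups[primary_fg]}{slot}",
                       f"disk{failgroups[mirror_fg]}{slot}"))
    return layout


if __name__ == "__main__":
    for extent, primary, mirror in example_layout(9):
        print(f"extent {extent}: primary on {primary}, copy on {mirror}")
```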

      Please let me know if this makes sense.

      Cheers,
      Bane

  6. Hi Bane,
    Fantastic, very nicely explained. The concept of striping is clear now.
    I also learnt that ASM can write primary and secondary extents in any failgroup, so there is no fixed failgroup responsible for storing all the primary extents? Please correct me if I am wrong.

    As Oracle says we can have more failgroups than required, I added 4 failgroups for normal redundancy.
    Oracle says that in normal redundancy ASM will have only one mirror copy (apart from the one primary copy).
    So how will ASM do the mirroring and striping if we have additional failgroups, like controller3 and controller4 in the example below?
    Kindly explain this scenario also.

    CREATE DISKGROUP data NORMAL REDUNDANCY
    FAILGROUP controller1 DISK
    '/devices/diska1' NAME diska1,
    '/devices/diska2' NAME diska2,
    '/devices/diska3' NAME diska3,
    '/devices/diska4' NAME diska4
    FAILGROUP controller2 DISK
    '/devices/diskb1' NAME diskb1,
    '/devices/diskb2' NAME diskb2,
    '/devices/diskb3' NAME diskb3,
    '/devices/diskb4' NAME diskb4
    FAILGROUP controller3 DISK
    '/devices/diskc1' NAME diskc1,
    '/devices/diskc2' NAME diskc2,
    '/devices/diskc3' NAME diskc3,
    '/devices/diskc4' NAME diskc4
    FAILGROUP controller4 DISK
    '/devices/diskd1' NAME diskd1,
    '/devices/diskd2' NAME diskd2,
    '/devices/diskd3' NAME diskd3,
    '/devices/diskd4' NAME diskd4
    ;

  7. 1. "...ASM can write the primary extent and secondary extents in any failgroup...".
    Correct, although I would say in any disk.

    2. How ASM will do the mirroring and striping if we have additional fail groups?
    The number of failgroups is immaterial to striping and mirroring.

    All files will be striped across all disks.

    As for the mirroring, a primary extent in say diska1, will have a copy in one of the partner disks, and the partner disk will be in some other failgroup.

    Now that we have 16 disks, diska1 will have up to 8 partners. Same with all other disks - each disk will have up to 8 partners and the extent from a disk will have a copy on one (and only one) partner disk.

    Replies
    1. Thanks Bane.
      Almost all doubts are clear now, will ping you again if get stuck anywhere in ASM.

      Thanks a ton.

  8. Hi Bane,

    I am Shaik Mujeeb from Hyderabad,India.

    You have explained in very nice and understandable way.

    I have One Query.

    I am not clear on below:
    ------------------------------------------------

    Let's say your AU size is 1MB and you create an 8MB file. That file will have 8 primary extents and 8 extent mirrors.

    Primary
    extent 1 on diska1, copy of extent 1 on diskb1,
    extent 2 on diska2, copy of extent 2 on diskb2,
    extent 3 on diska3, copy of extent 3 on diskb3,
    extent 4 on diska4, copy of extent 4 on diskb4,

    extent 5 on diskb1, copy of extent 5 on diska1,
    extent 6 on diskb2, copy of extent 6 on diska2,
    extent 7 on diskb3, copy of extent 7 on diska3,
    extent 8 on diskb4 copy of extent 8 on diska4.
    -----------------------------------------------

    Until now I thought that


    Primary

    extent 1 on diska1, copy of extent 1 on diskb1,
    extent 2 on diska2, copy of extent 2 on diskb2,
    extent 3 on diska3, copy of extent 3 on diskb3,
    extent 4 on diska4, copy of extent 4 on diskb4,

    extent 5 on diska1, copy of extent 5 on diskb1,
    extent 6 on diska2, copy of extent 6 on diskb2,
    extent 7 on diska3, copy of extent 7 on diskb3,
    extent 8 on diska4 copy of extent 8 on diskb4.

    and thought that whatever data comes in goes to the 1st extent, then the 2nd extent, the 3rd extent, the 4th extent and so on.

    For the next extent, that is extent 5, ASM would again write the data to diska1 through diska4, not to diskb1 through diskb4 of the other failure group.

    Also, can you clarify where the data is read from whenever a user requests it?

    Is it from the primary disk group or the failure disk group?

    Can you please clarify on this.

  9. Hi,

    Excellent insight into ASM internals. However, I have one query: how does the AU size impact performance, and in which scenarios should we use a larger AU size?
