November 27, 2011

Rebalancing act

ASM ensures that file extents are evenly distributed across all disks in a disk group. This is true for the initial file creation and for file resize operations. That means we should always have a balanced space distribution across all disks in a disk group.

Rebalance operation

Disk group rebalance is triggered automatically on ADD, DROP and RESIZE disk operations and on moving a file between hot and cold regions. Running rebalance by explicitly issuing ALTER DISKGROUP ... REBALANCE is called a manual rebalance. We might want to do that to change the rebalance power for example. We can also run the rebalance manually if a disk group becomes unbalanced for any reason.

The POWER clause of the ALTER DISKGROUP ... REBALANCE statement specifies the degree of parallelism of the rebalance operation. The minimum value is 0, which halts the current rebalance until the statement is re-run, either implicitly or explicitly. Higher values may reduce the total time it takes to complete the rebalance operation.

The ALTER DISKGROUP ... REBALANCE command by default returns immediately so that we can run other commands while the rebalance operation takes place in the background. To check the progress of the rebalance operation we can query the V$ASM_OPERATION view.
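
As a rough illustration of the statements described above, a manual rebalance might look like the following. The disk group name DATA and the power value 4 are placeholders picked for the example, not recommendations.

```sql
-- Manual rebalance of a (hypothetical) disk group DATA at power 4.
-- By default the statement returns immediately and the rebalance
-- continues in the background.
ALTER DISKGROUP data REBALANCE POWER 4;

-- Alternative: add WAIT to make the statement block until the
-- rebalance has finished.
ALTER DISKGROUP data REBALANCE POWER 4 WAIT;

-- Check the progress from the ASM instance.
SELECT group_number, operation, state, power, est_minutes
FROM   v$asm_operation;
```

When no rebalance (or other long running operation) is active, the query simply returns no rows.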

Three phase power

The rebalance operation has three distinct phases. First, ASM has to come up with the rebalance plan. That will depend on the rebalance reason, disk group size, number of files in the disk group, whether or not partnership has to be modified, etc. In any case this shouldn't take more than a couple of minutes.

The second phase is moving (relocating) the extents among the disks in the disk group. This is where the bulk of the time will be spent. As this phase progresses, ASM keeps track of the number of extents moved and the actual I/O performance. Based on that it calculates the estimated time to completion (GV$ASM_OPERATION.EST_MINUTES). Keep in mind that this is an estimate and that the actual time may change depending on the overall (mostly disk related) load. If the reason for the rebalance was one or more failed disks in a redundant disk group, data mirroring is fully re-established at the end of this phase.
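
A query along these lines can be used to watch the second phase. It is only a sketch: the PCT_DONE column is derived here for convenience (it is not a column of the view), and like EST_MINUTES it should be read as an estimate.

```sql
-- Cluster-wide view of running rebalance operations.
-- SOFAR and EST_WORK are counted in allocation units; PCT_DONE is a
-- derived, approximate figure. NULLIF avoids a divide-by-zero error
-- when EST_WORK is 0.
SELECT inst_id,
       group_number,
       operation,
       state,
       power,
       sofar,
       est_work,
       est_rate,
       est_minutes,
       ROUND(100 * sofar / NULLIF(est_work, 0), 1) AS pct_done
FROM   gv$asm_operation
WHERE  operation = 'REBAL';
```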

The third phase is disk compacting (ASM version 11.1.0.7 and later). The idea of the compacting phase is to move the data as close to the outer tracks of the disks as possible. Note that at this stage of the rebalance, EST_MINUTES will keep showing 0. This is a 'feature' that will hopefully be addressed in the future. The time to complete this phase will again depend on the number of disks, reason for rebalance, etc. Overall, the time should be a fraction of that spent in the second phase.

Some notes about rebalance operations
  • Rebalance is a per-file operation.
  • An ongoing rebalance is restarted if the storage configuration changes either when we alter the configuration, or if the configuration changes due to a failure or an outage. If the new rebalance fails because of a user error a manual rebalance may be required.
  • There can be one rebalance operation per disk group per ASM instance in a cluster.
  • Rebalancing continues across a failure of the ASM instance performing the rebalance.
  • The REBALANCE clause (with its associated POWER and WAIT/NOWAIT keywords) can also be used in ALTER DISKGROUP commands for ADD, DROP or RESIZE disks.
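
To illustrate the last point, here is a sketch of a single statement that adds one disk, drops another and sets the rebalance power, so that only one rebalance runs. The disk group name, disk path and disk names are made up for the example.

```sql
-- One statement, one rebalance: add a new disk and drop an old one.
-- '/dev/mapper/new_lun1', DATA_NEW1 and DATA_0000 are hypothetical names.
ALTER DISKGROUP data
  ADD DISK '/dev/mapper/new_lun1' NAME data_new1
  DROP DISK data_0000
  REBALANCE POWER 8 WAIT;
```

Because of the WAIT keyword, the statement returns only after the rebalance has completed; with NOWAIT (the default) it returns immediately.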

Tuning rebalance operations

If the POWER clause is not specified in an ALTER DISKGROUP statement, or when rebalance is implicitly run by ADD/DROP/RESIZE disk, then the rebalance power defaults to the value of the ASM_POWER_LIMIT initialization parameter. We can adjust the value of this parameter dynamically. A higher power limit should result in a shorter time to complete the rebalance, but this is by no means linear and it will depend on the (storage system) load, available throughput and underlying disk response times.

The power can be changed for a rebalance that is in progress. We just need to issue another ALTER DISKGROUP ... REBALANCE command with a different value for POWER. This interrupts the current rebalance and restarts it with the modified POWER.
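
The two ways of controlling the power might be sketched as follows. The values and the disk group name DATA are examples only.

```sql
-- Show the current instance-wide default (SQL*Plus command).
SHOW PARAMETER asm_power_limit

-- Change the default dynamically. It applies to rebalance operations
-- that do not specify POWER explicitly.
ALTER SYSTEM SET asm_power_limit = 4;

-- Change the power of a rebalance that is already running on the
-- (hypothetical) disk group DATA. The running rebalance is interrupted
-- and restarted with the new power.
ALTER DISKGROUP data REBALANCE POWER 8;
```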

Relevant initialization parameters and disk group attributes

ASM_POWER_LIMIT

The ASM_POWER_LIMIT initialization parameter specifies the default power for disk rebalancing in a disk group. The range of values is 0 to 11 in versions prior to 11.2.0.2. Since version 11.2.0.2 the range of values is 0 to 1024, but that still depends on the disk group compatibility (see the notes below). The default value is 1. A value of 0 disables rebalancing.
  • For disk groups with COMPATIBLE.ASM set to 11.2.0.2 or greater, the operational range of values is 0 to 1024 for the rebalance power.
  • For disk groups that have COMPATIBLE.ASM set to less than 11.2.0.2, the operational range of values is 0 to 11 inclusive.
  • Specifying 0 for the POWER in the ALTER DISKGROUP REBALANCE command will stop the current rebalance operation (unless you hit bug 7257618).
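
To see which range applies to a given disk group, the compatibility settings can be checked with a query like the one below. The column aliases are chosen for readability and are not part of the view.

```sql
-- COMPATIBILITY holds the COMPATIBLE.ASM setting of each disk group.
-- Disk groups at 11.2.0.2 or higher accept POWER values up to 1024;
-- the others accept values up to 11.
SELECT name,
       compatibility          AS compatible_asm,
       database_compatibility AS compatible_rdbms
FROM   v$asm_diskgroup;
```

Keep in mind that advancing COMPATIBLE.ASM cannot be reversed, so it should not be raised only for the sake of a higher rebalance power.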

_DISABLE_REBALANCE_COMPACT

Setting initialization parameter _DISABLE_REBALANCE_COMPACT=TRUE will disable the compacting phase of the disk group rebalance - for all disk groups.

_REBALANCE_COMPACT

This is a hidden disk group attribute. Setting _REBALANCE_COMPACT=FALSE will disable the compacting phase of the disk group rebalance - for that disk group only.
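
A sketch of how these two settings might be applied is shown below. Both are hidden (underscore) settings, so treat this as an illustration to be confirmed with Oracle Support for your version rather than as a recommended procedure. In particular, whether the instance-wide parameter takes effect without a restart is an assumption to verify. The disk group name DATA is hypothetical.

```sql
-- Instance-wide: disable the compacting phase for all disk groups.
-- Hidden parameters must be enclosed in double quotes.
ALTER SYSTEM SET "_disable_rebalance_compact" = TRUE;

-- Per disk group: disable the compacting phase for DATA only.
ALTER DISKGROUP data SET ATTRIBUTE '_rebalance_compact' = 'FALSE';
```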

_ASM_IMBALANCE_TOLERANCE

This initialization parameter controls the percentage of imbalance between disks. Default value is 3%.

Processes

The following table has a brief summary of the background processes involved in the rebalance operation.


Process  Description
ARBn     ASM Rebalance Process. Rebalances data extents within an ASM disk group. Possible processes are ARB0-ARB9 and ARBA.
RBAL     ASM Rebalance Master Process. Coordinates rebalance activity. In an ASM instance, it coordinates rebalance activity for disk groups. In a database instance, it manages ASM disk groups.
Xnnn     Exadata only - ASM Disk Expel Slave Process. Performs ASM post-rebalance activities. This process expels dropped disks at the end of an ASM rebalance.

When a rebalance operation is in progress, the ARBn processes will generate trace files in the background dump destination directory, showing the rebalance progress.

Views

In an ASM instance, V$ASM_OPERATION displays one row for every active long running ASM operation executing in the current ASM instance. GV$ASM_OPERATION will show cluster wide operations.

During the rebalance, OPERATION will show REBAL, STATE will show the state of the rebalance operation, POWER will show the rebalance power and EST_MINUTES will show the estimated time remaining for the operation.

In an ASM instance, V$ASM_DISK displays information about ASM disks. During the rebalance, the STATE will show the current state of the disks involved in the rebalance operation.
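
For example, a query like the following lists the disks that belong to a disk group together with their state. While a disk is being dropped, its STATE would be expected to show DROPPING until the rebalance completes.

```sql
-- Disks that are members of a disk group (group_number 0 means the
-- disk is not part of any mounted disk group).
SELECT group_number,
       disk_number,
       name,
       state,
       mode_status,
       mount_status,
       total_mb,
       free_mb
FROM   v$asm_disk
WHERE  group_number <> 0
ORDER  BY group_number, disk_number;
```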

Is your disk group balanced?

Run the following query in your ASM instance to get the report on the disk group imbalance.

SQL> column "Diskgroup" format A30
SQL> column "Imbalance" format 99.9 Heading "Percent|Imbalance"
SQL> column "Variance" format 99.9 Heading "Percent|Disk Size|Variance"
SQL> column "MinFree" format 99.9 Heading "Minimum|Percent|Free"
SQL> column "DiskCnt" format 9999 Heading "Disk|Count"
SQL> column "Type" format A10 Heading "Diskgroup|Redundancy"

SQL> SELECT g.name "Diskgroup",
  100*(max((d.total_mb-d.free_mb)/d.total_mb)-min((d.total_mb-d.free_mb)/d.total_mb))/max((d.total_mb-d.free_mb)/d.total_mb) "Imbalance",
  100*(max(d.total_mb)-min(d.total_mb))/max(d.total_mb) "Variance",
  100*(min(d.free_mb/d.total_mb)) "MinFree",
  count(*) "DiskCnt",
  g.type "Type"
FROM v$asm_disk d, v$asm_diskgroup g
WHERE d.group_number = g.group_number and
  d.group_number <> 0 and
  d.state = 'NORMAL' and
  d.mount_status = 'CACHED'
GROUP BY g.name, g.type;

                                           Percent Minimum
                                 Percent Disk Size Percent  Disk Diskgroup
Diskgroup                      Imbalance  Variance    Free Count Redundancy
------------------------------ --------- --------- ------- ----- ----------
ACFS                                  .0        .0    12.5     2 NORMAL
DATA                                  .0        .0    48.4     2 EXTERN
PLAY                                 3.3        .0    98.1     3 NORMAL
RECO                                  .0        .0    82.9     2 EXTERN

NOTE: The above query is from Oracle Press book Oracle Automatic Storage Management, Under-the-Hood & Practical Deployment Guide, by Nitin Vengurlekar, Murali Vallath and Rich Long.
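
If the report shows an imbalance, a per-disk breakdown can help to find the disk that is out of line. The query below is a simple sketch and is not taken from the book; it uses the same filters as the report and the PCT_USED column is derived here.

```sql
-- Space usage per disk. In a balanced disk group with equally sized
-- disks, PCT_USED should be about the same for every disk.
SELECT g.name AS diskgroup,
       d.name AS disk,
       d.total_mb,
       d.free_mb,
       ROUND(100 * (d.total_mb - d.free_mb) / d.total_mb, 1) AS pct_used
FROM   v$asm_disk d
       JOIN v$asm_diskgroup g ON d.group_number = g.group_number
WHERE  d.group_number <> 0
AND    d.state = 'NORMAL'
AND    d.mount_status = 'CACHED'
ORDER  BY g.name, pct_used DESC;
```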







38 comments:

  1. Hello,

    Our client has implemented a Two Node Oracle 10g R2 RAC on HP-UX v2. The Database is on ASM and on HP EVA 4000 SAN. The database size is around 1.2 TB.
    Now the requirement is to migrate the Database and Clusterware files to a New SAN (EVA 6400).

    SAN to SAN migration can't be done as the customer didn't get license for such storage migration.

    My immediate suggestion was to connect the New SAN and present the LUNs and add the Disks from New SAN and wait for rebalance to complete. Then drop the Old Disks which are on Old SAN.
    [Exact Steps To Migrate ASM Diskgroups To Another SAN Without Downtime. (Doc ID 837308.1).]

    The client wants us to suggest alternate solutions as they are worried that presenting LUNs from Old SAN and New SAN at the same time may cause some issues, and that a failed re-balance may affect the database. Also they are not able to estimate the time to re-balance a 1.2 TB database across Disks from 2 different SANs. Downtime window is only 48 hours.

    Is it possible to roughly estimate the time to re-balance a 1 TB of Banking Solution Database?

    Rgds,
    Thiru

  2. Hi Thiru,

    Your suggestion is fine. I would just add that it is best to connect both SANs and then ADD new disks and DROP existing disks in the same alter diskgroup command. That way, there will be only one rebalance operation.

    There is really no need for downtime here, unless for some reason they need to shut down servers so they can connect to new SAN.

    If the customer is worried they need to make sure they have current backups, and if they cannot afford downtime, they need to have some sort of standby system.

    Now to your question about estimating the rebalance time. There is no easy answer to that :( You may want to consider the following to get an idea about the time it will take:
    1. Review their ASM alert logs and look for past rebalance operations. How long did those take? What was the operation? If they added a new disk/LUN, the alert log will show you how long it took. Note the rebalance power used.
    2. Use the new SAN to create a disk group on another server. Restore the production database there. Now add 1TB worth of disks and see how long it takes to rebalance. Now drop 1TB and see how long it takes. This will not be the same as in production as you will have two different SANs, with different I/O characteristics, but it will give you an idea. You only need a single instance setup for this test, as the rebalance runs on one node anyway.

    Hope this helps.

    Cheers,
    Bane

  3. Hi,

    There are a few other questions interesting to me - how are file extents being "evenly distributed across all disks in a disk group":
    1) There used to be some level of imbalance - at least up to 11.1 a parameter existed _asm_imbalance_tolerance which wasn't 0 by default. (Probably for performance reasons).

    2) Considering that very often LUNs of an ASM diskgroup are created as volumes from same RAID Array striped on hardware level (let's say it's RAID10 or RAID0). Due to the double striping it's possible two adjacent extents of same file to get placed on same physical disk (even though they belong to different LUNs) - thus discarding the idea behind the striping.
    So I've been wondering - how big is this issue, what is the probability of it to happen, and is ASM doing something special to try to avoid such situations.

  4. Hi Yavor,
    File extents are evenly distributed to all disks by placing file extents on all disks in a disk group. Let's say we have 5 disks in an external redundancy disk group with 1MB allocation unit size. And let's say we create a 10 MB file. Each extent will be 1MB, so there will be 2 extents per disk. But if our file is say 11MB, there will be an extra extent on one of the disks - hence the imbalance.
    1) When extents of a new file are being placed on disks in a disk group, a starting disk is picked at random. Following my example above, if we now create a 3 MB file and ASM picks as the starting disk the one that has an extra extent, that disk will end up with two extra extents compared to some other disks. Hidden parameter _asm_imbalance_tolerance was created so that ASM performs an extra check and makes sure the imbalance is within the limit set by that parameter. Yes, that extra check and the possible extra extent moves have a small performance hit.
    2) You are correct - that might be an issue and ASM cannot do anything about it. That being said, in my experience I haven't seen this being an issue, or at least I haven't had a case where that was the cause of any problems.
    Cheers,
    Bane

  5. Hi

    Excellent explanation Bane

    Suppose we are doing a rebalance of a disk group with power 6, we feel that the rebalance is slow, and we would like to increase the power limit. How do we do it?
    Do we have to stop the current rebalance with power 0 and then start the rebalance manually with the required power limit, say 100?

    In that case, will ASM start the rebalance from the beginning or continue from where it left off? Is there any mark on the disks of the disk group?

    Thanks
    Raghu

    1. Thanks Raghu,

      To change the rebalance power you just run the rebalance command with a new power level. There is no need for the intermediate (power 0) step.

      When you change the power in the middle of a running rebalance, the current operation stops and the new rebalance continues where the previous one ended. ASM is not going to put all the extents back where they were before the initial rebalance started.

      How does ASM know where the previous rebalance got to?

      First, ASM tracks the rebalance in the Continuing Operations Directory (see http://asmsupportguy.blogspot.com.au/2012/01/asm-file-number-4.html). This is more important for forcibly aborted rebalance operations, e.g. when ASM instance 1 crashes in the middle of a rebalance, a second ASM instance can use that information to take over and complete the rebalance.

      Second, at the start of every rebalance, ASM looks at the extent distribution map and creates the rebalance plan. That way a new rebalance knows what it needs to do to get the disk group in a balanced state.

      Please let me know if that answers your questions.

      Cheers,
      Bane

  6. Hello,
    So glad I came upon this page!
    I am currently troubleshooting an ASM anomaly.
    This ASM environment holds two databases, one over 1TB. There are 12 disks in the diskgroup, all presented from a SAN Raid5 and all 500GB in size.

    From this snapshot of the iostat you can see that of the 12 LUNs in the Datagroup, two in question are the only ones consistently hitting high await, svctm and %util(emcpowers and emcpowerw) during normal business user processing.

    I suspect it is a matter of how ASM has striped the data for this database which includes one large 'blob'(700GB).

    Is it possible we are hitting Bug 7699985: UNBALANCED DISTRIBUTION OF FILES ACROSS DISKS.
    Although the imbalance report does not seem too bad -

    -----------------output------------------

    Columns Described in Script Minimum
    Percent Percent Disk Diskgroup
    Diskgroup Name Imbalance Variance Free Count Redundancy
    ------------------------------ --------- --------------- ------- ----- ----------
    ASM_FRA .3 0 56.7 45 EXTERN
    ASM_DATA1 .7 0 44.2 108 EXTERN

    2 rows selected.


    Is there any way to verify this?

    Thanks,
    Michele

    Linux OSW v3.0
    zzz ***Thu Aug 23 14:00:49 EDT 2012
    avg-cpu: %user %nice %system %iowait %steal %idle
    0.63 0.00 0.46 11.67 0.00 87.24

    Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
    emcpowerk 0.00 0.00 30.33 0.67 11744.00 21.33 379.53 1.62 52.19 10.38 32.17
    emcpowers 0.00 0.00 30.00 0.00 13616.00 0.00 453.87 4.33 150.81 33.19 99.57
    emcpowerw 0.00 0.00 22.33 0.00 10213.33 0.00 457.31 3.88 175.52 44.79 100.03
    emcpowerz 0.00 0.00 32.00 0.00 13098.67 0.00 409.33 1.04 32.50 5.44 17.40
    emcpoweraa 0.00 0.00 29.00 0.00 13194.67 0.00 454.99 1.63 56.10 9.70 28.13
    emcpowerab 0.00 0.00 28.33 4.67 12416.00 20.67 376.87 0.64 19.35 4.42 14.60
    emcpowerac 0.00 0.00 32.00 0.00 12629.00 0.00 394.66 0.93 29.07 6.79 21.73
    emcpowerad 0.00 0.00 31.00 0.33 11984.00 10.67 382.81 0.74 23.49 6.23 19.53
    emcpowerae 0.00 0.00 25.67 0.00 10965.33 0.00 427.22 1.29 50.22 11.96 30.70
    emcpoweraf 0.00 0.00 28.67 0.00 12587.33 0.00 439.09 0.70 24.05 5.03 14.43
    emcpowero 0.00 0.00 13.29 1.99 4826.58 31.89 317.91 0.41 26.89 6.02 9.20
    emcpowerh 0.00 0.00 11.96 2.33 5464.45 51.16 386.09 0.30 20.81 7.47 10.66



    1. You didn't tell me your ASM version. Bug 7699985 is fixed in 11.2. If you are on an earlier version the workaround is to set init parameter _asm_imbalance_tolerance to 0 (in ASM instance) and restart ASM.

      It's easy to find out if your disks are unbalanced. Just select total_mb, free_mb from v$asm_disk and see if free_mb is similar for all 12 disks. If not, your disks are not balanced.

      But the problem is more likely to be a hot block(s) issue. While ASM spreads the files across all disks, the minimum storage unit is 1MB (I don't know your allocation unit size, but 1MB is default). We can fit quite a few oracle blocks in that 1MB, so if you have hot blocks on those two disks, we would see something like that.

      The other thing you can do is request another 500GB LUN from your storage admin. Then drop emcpowerw and add that new LUN (at the same time so that there is only one rebalance). Return 500GB (emcpowerw) to the storage admin. Now check if high q/await/%util migrated to some other disk or went away. If it went away, do the same with emcpowers. If the issue simply migrates to another disk, you have hot blocks and you need to chase those down.

      Cheers,
      Bane

    3. I am facing this same issue, i.e.: some disks in a diskgroup are hot as reported by iostats and OEM,
      only it is an online redo log disk group (one of three; +LOG1,+LOG2,+LOG3).
      +LOG1 is very unbalanced as to IO, while +LOG2 and +LOG3 are very evenly balanced.
      Two disks have over a billion IOs while many others are less than 1 million.

      ASM version is 11.2.0.1.0
      Linux RedHat 5.8
      EMC power SAN

      Columns Described in Script Percent Minimum
      Percent Disk Size Percent Disk Diskgroup
      Diskgroup Name Imbalance Variance Free Count Redundancy
      --------------- --------- --------- ------- ----- ----------
      LOG1____________1.3_______.0________84.6____16____EXTERN
      LOG2____________1.3_______.0________84.6____16____EXTERN
      LOG3_____________.8_______.0________69.6____16____EXTERN
      ARCHLOG_________1.5_______.0________99.6_____8____EXTERN
      CRS_VOTE_________.0_______.0________91.4_____1____EXTERN
      DATAFILE_________.7_______.0__________.7____38____EXTERN
      TEMPFILE_________.1_______.0__________.6_____6____EXTERN

      => OEM Disk Group I/O Cumulative Statistics (zero relative) <=
      Disk Group___Avg__Avg__Total______Read_______Write
      _____________Resp_Thru_I/O Call___Total______Total
      __LOG1_0000__0.01_0.01_13475148___5425489____8049659
      __LOG1_0001__0____0____3029180____3020449____8731
      __LOG1_0002__0.01_0____445260_____444938_____322
      __LOG1_0003__0.01_0____534222_____534222_____0
      __LOG1_0004__0____0.02_30350066___30325422___24644
      __LOG1_0005__0.02_0____446791_____446721_____70
      __LOG1_0006__0.01_0____461979_____461886_____93
      __LOG1_0007__0____0.25_386913815__386781883__131932
      __LOG1_0008__0____1.02_1588774475_1582051461_6723014
      __LOG1_0009__0____0.22_339324949__339181388__143561
      __LOG1_0010__0.01_0____444896_____444868_____28
      __LOG1_0011__0____0.18_280314002__280149889__164113
      __LOG1_0012__0____0.5__778163759__772053271__6110488
      __LOG1_0013__0.02_0____444044_____444035_____9
      __LOG1_0014__0.02_0____438811_____438783_____28
      __LOG1_0015__0____0.87_1342205588_1329654489_12551099

      => iostats (one relative) <=
      ASM Disk___SAN Disk____Rds___Wrts_______R/sec__W/sec___Util
      LOG3_0005 "emcpowerad"_0.4___25.8_______3.2____1651.2__2.32
      LOG2_0004 "emcpowerao"_1.6___38.2______12.8____1638.4__2.46
      LOG2_0016 "emcpowerap"_1.6___23.8______12.8____1638.4__1.92
      LOG2_0008 "emcpowerp"__0.4___33.2_______3.2____1638.4__2.24
      LOG1_0016 "emcpowerat"_420.8__0.4___13456________12.8_17.4
      LOG1_0014 "emcpoweraa"_0.4__ _0_________3.2_______0____0.02
      LOG2_0011 "emcpowerv"__1.6___14.6______12.8_____647.8__1.12
      LOG1_0011 "emcpowerc"__1.6____0________12.8_______0____0.06
      LOG1_0015 "emcpowerd"__1.6____0________12.8_______0____0.02
      LOG1_0013 "emcpoweri"__420.8__0_____13427.2_______0___17.2
      LOG1_0012 "emcpowerz"__3.4____0_______70.4________0____0.06
      LOG2_0006 "emcpoweru"__1.6___23.8______12.8____1638.4__1.94
      LOG2_0007 "emcpowerq"__1.6____0________12.8_______0____0.04
      LOG2_0005 "emcpowero"__0.4____0_________3.2_______0____0
      LOG2_0010 "emcpowern"__0.4___54.2_______3.2____1638.4__3.06
      LOG2_0002 "emcpowerm"__0.4___29.4_______3.2____1638.4__1.7
      LOG2_0001 "emcpowers"__1.6____0.4______12.8_______3.2__0.02
      LOG2_0014 "emcpowerar"_20.4__26______9830.4____1638.4__5.84
      LOG3_0004 "emcpowerx"__0______0_________0_________0____0
      LOG2_0013 "emcpoweram"_17_____0______8192_________0____3
      LOG3_0007 "emcpoweral"_0______0.4_______0_______87.2___0.08
      LOG3_0002 "emcpowerak"_0_____31_________0_____1638.4___3.08
      LOG3_0006 "emcpoweraf"_0______0_________0________0_____0
      LOG2_0009 "emcpoweraq"_18.6___0______9011________0_____3.38
      LOG2_0015 "emcpoweraj"_20.4___8.6____9830.4____535.4___4.28
      LOG3_0003 "emcpowerah"_0_____22_________0____1638.4____1.68
      LOG2_0012 "emcpowerw"__20.4__29.2____9830.4___1651.2___6.7
      LOG2_0003 "emcpowerae"_20.6___1______9836.8____106.4___3.54

    4. Hi Don,

      Very interesting. Given that LOG disk groups have redo logs only, hot blocks are out of the question. I assume all LOG disk groups are for the same database, right? Not sure what LOG1_0016 is doing here as LOG1 shows 16 disks (LOG1_0000-LOG1_0015)?

      Can you get me the following please and I will have a look:

      1. spool /tmp/asm_gv.html
      set markup HTML on
      break on INST_ID on GROUP_NUMBER
      alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS';
      select SYSDATE "Date and Time" from DUAL;
      select * from GV$ASM_OPERATION order by 1;
      select * from V$ASM_DISKGROUP_STAT order by 1, 2;
      select * from V$ASM_DISK_STAT order by 1, 2, 3;
      select * from V$ASM_ATTRIBUTE where NAME not like 'template%' order by 1;
      select * from V$VERSION where BANNER like '%Database%' order by 1;
      select * from V$ASM_CLIENT order by 1, 2;
      show parameter asm
      show parameter cluster
      show parameter instance
      show parameter spfile
      show sga
      spool off
      exit

      2. The result of
      asmcmd find +LOG1 "*"
      asmcmd find +LOG2 "*"
      asmcmd find +LOG3 "*"

      It may be best to email me that at bane.radulovic at gmail.com and once we find out what is going on, we can post back here.

      Cheers,
      Bane

    5. Hi Bane,

      I surely have too many disks in my log groups (16). It's nice to see someone confirm that craziness; since I did not set it up, I do not know why it was done.

      What’s even more interesting is that OEM and Unix iostat show the same anomaly on LOG1 group disks.

      Since LOG2 and LOG3 stats look good, LOG3 is already shared between nodes, and there is plenty of space, I plan to put LOG1 files on LOG2 and be done with LOG1 as you suggested.

      Thanks a lot,
      Don C.
      PS: Thanks for the new command 'asmcmd find "*"'. Very cool.

  7. Bane,
    Thank you for the quick response. We are using 11.1.07 version. I am the Sys Admin, so I asked the DBA to check v$asm_disk for me and they look pretty balanced -
    TOTAL_MB FREE_MB
    --------------- ---------------
    511993 228048
    511993 227985
    511993 228039
    511993 226152
    511993 228002
    511993 228014
    511993 228021
    511993 228003
    511993 228061
    511993 226119
    511993 226159
    511993 226175

    This same symptom occurs in our acceptance environment with similar data (a large blob), so you are very likely to be correct about the hot blocks.
    How do we investigate and mitigate a hot block issue?
    I will also talk to storage admin to see if we can do the test with a new LUN.

    Thank you,
    Michele

    1. Hi Michele,

      Yes, as expected the disk group is balanced. Your DBA should know how to track down hot blocks. If not they should engage Oracle Support.

      Cheers,
      Bane

  8. Thank you again for your help!

  9. Bane,

    Nice blog...
    I have a few more questions regarding ASM
    Can you explain the memory initialization parameters of Oracle ASM, like
    DB_CACHE_SIZE
    LARGE_POOL_SIZE
    SHARED_POOL_SIZE

    How does ASM use the memory that is allocated to them?


    Thanks
    naveen

    1. Hi Naveen,

      An ASM instance manages ASM metadata, so the buffer cache in an ASM instance will have ASM metadata blocks. There will be no Oracle database blocks in ASM SGA at any time.

      In the shared pool of an ASM instance you will find SQL statements, but also the file extent maps that need to be passed to the database instances.

      Large pool would be used for ASM related packages, like dbms_diskgroup.

      The SGA requirements for an ASM instance will depend on number of databases it serves, the size of the files it manages, the number of files, etc.

      Unfortunately, there is no good info on SGA tuning for ASM...

      Feel free to let me know if you have a more specific question or concern.

      Cheers,
      Bane

    2. Thanks Bane,

      Are there any books that discuss more about these parameters??

      Thanks
      naveen

    3. Not really. My favourite book, Oracle Automatic Storage Management: Under-the-Hood & Practical Deployment Guide, barely mentions them. And with the push for automatic memory management, I don't expect much on this topic in the future either.

      Cheers,
      Bane

  10. Hi,

    I need some clarification during Disk drop from ASM Disk group on windows.

    Currently we have 4 and 2 disks on disk groups.


    SQL> select group_number, name, TOTAL_MB, FREE_MB from V$asm_disk_stat order by name;

    GROUP_NUMBER NAME TOTAL_MB FREE_MB
    ------------ ------------------------------ ---------- ----------
    1 DATA1_0000 255997 244604
    1 DATA1_0001 255997 244550
    1 DATA1_0002 255997 244590
    1 DATA1_0003 255997 244524
    2 DATA2_0000 255997 235618
    2 DATA2_0001 255997 235642
    2 DATA2_0002 255997 235626
    2 DATA2_0003 255997 235621
    3 DATA3_0000 255997 236167
    3 DATA3_0001 255997 236172
    4 FLASH_0000 255997 252834
    4 FLASH_0001 255997 252829

    And I am going to use below drop command to release DISKS -

    alter diskgroup FLASH drop disk FLASH_0001;
    alter diskgroup DATA3 drop disk DATA3_0001;

    alter diskgroup DATA2 drop disk DATA2_0003;
    alter diskgroup DATA2 drop disk DATA2_0002;
    alter diskgroup DATA2 drop disk DATA2_0001;

    alter diskgroup DATA1 drop disk DATA1_0003;
    alter diskgroup DATA1 drop disk DATA1_0002;
    alter diskgroup DATA1 drop disk DATA1_0001;

    I need some clarification -

    1. How can we speed up the execution of the above commands? The ASM_POWER_LIMIT value is 1.

    2. What is the actual command to execute this?

    3. For Rollback, Is it right operation -


    ALTER DISKGROUP FLASH ADD DISK '\\.\ORCLDISKFLASH1' NAME FLASH_0001 NOFORCE ;

    4. And importantly - how can I estimate the duration before executing the drop disk command?

    Thanks in advance.

    1. Hi Anil,

      1. How can we speed up the execution of the above commands? The ASM_POWER_LIMIT value is 1.

      Use the REBALANCE POWER clause like this:

      SQL> alter diskgroup FLASH drop disk FLASH_0001 rebalance power 10;

      If you have RAC (say 4 nodes), you can drop disks from all 4 disk groups at the same time - one per ASM instance. If you are running a single instance, you have to do this serially. Drop disks from FLASH first, wait for the rebalance to complete, drop disks from DATA1, wait for the rebalance to complete, etc.

      2. What is the actual command to execute this?

      The commands would be:

      SQL> alter diskgroup FLASH drop disk FLASH_0001 rebalance power 10;

      If you are running a single instance ASM, wait for the rebalance to complete, then proceed to drop disks from DATA1. If you are in RAC, you can run the next command immediately:

      SQL> alter diskgroup DATA1 drop disk DATA1_0001, DATA1_0002, DATA1_0003 rebalance power 10;

      Note that all 3 disks are dropped in a single command. That is better than 3 separate DROP DISK commands as there will be only one rebalance operation - saving you time to complete the drop.

      SQL> alter diskgroup DATA2 drop disk DATA2_0001, DATA2_0002, DATA2_0003 rebalance power 10;

      Same here - drop all 3 disks in a single command. And finally, the last one:

      SQL> alter diskgroup DATA3 drop disk DATA3_0001 rebalance power 10;

      3. For Rollback, Is it right operation -
      ALTER DISKGROUP FLASH ADD DISK '\\.\ORCLDISKFLASH1' NAME FLASH_0001 NOFORCE ;

      Not sure what you mean by rollback. If you want to cancel the DROP DISK command for some reason, you can do this:

      SQL> alter diskgroup FLASH undrop disks;

      That can be done BEFORE the drop completes, i.e. while the rebalance is still running.

      If you want to add the disk back, once it is dropped, you do this:

      SQL> alter diskgroup FLASH add disk '\\.\ORCLDISKFLASH1';
      or
      SQL> alter diskgroup FLASH add disk '\\.\ORCLDISKFLASH1' rebalance power 10;

      The second ADD DISK should complete faster as it has the higher rebalance power. Note that there is no need to specify the disk name and there is no need for NOFORCE option.

      4. How can I estimate time duration before execution of drop disk command.

      There is no way to estimate the time before you start the drop. Once you issue DROP DISK, run the following query:

      SQL> select GROUP_NUMBER, INST_ID, OPERATION, STATE, SOFAR, EST_WORK, EST_MINUTES from GV$ASM_OPERATION;

      The EST_MINUTES will give you an indication of the time to complete. For a complete discussion on that topic have a look at
      http://asmsupportguy.blogspot.com.au/2012/07/when-will-my-rebalance-complete.html.

      Cheers,
      Bane

  11. Thanks a lot... Bane.
    It is really very helpful to me.

  12. Does the rbal background process only kick in when adding/dropping ASM disks?

    When there is no rebalancing to be done, does the rbal process read any kind of ASM metadata?

    1. The main role of RBAL is to coordinate the rebalance: come up with the plan, make sure the rebalance power is observed, etc. It also takes care of interrupted rebalance operations.

      So, yes it will 'kick in' during disk drop and add, but remember that other operations can also trigger the rebalance - disk resize, file zone change, manual rebalance, etc.

      Cheers,
      Bane

  13. Hi Bane,

    I've been doing some research on ASM high redundancy and noticed that the imbalance was very high - 50%.

    Digging further into x$kffxp I can see that the primary extents are evenly distributed but the mirror and 2nd mirror copy are not...

    Is this expected behaviour?

    I'm running 11.2.0.3.

    Chris

    1. Hi Chris,

      No, this is not expected. Everything should be evenly balanced.

      Would you be able to share your findings to see if I can spot anything unusual? You can post here or if you want to provide more details, feel free to email me at bane.radulovic at gmail.com.

      Cheers,
      Bane

  14. Hi Bane,

    Thanks for getting back to me.

    Here's some analysis:

    1) The imbalance is 50%
                                               Percent Minimum
                                     Percent Disk Size Percent  Disk Diskgroup
    Diskgroup                      Imbalance  Variance    Free Count Redundancy
    ------------------------------ --------- --------- ------- ----- ----------
    DATA                                50.2        .0    95.9    18 HIGH
    RECO                                50.1        .0    84.2    17 HIGH
    REDO                                  .3        .0    91.0     4 HIGH

    2) Analysis of an example file from x$kffxp. As you can see, the primary extents are appropriately distributed but the mirror and 2nd mirror extents are not, which confirms the imbalance finding.
    select DISK_KFFXP,LXN_KFFXP,count(1)
    from x$kffxp
    where GROUP_KFFXP=1
    and NUMBER_KFFXP=262
    group by DISK_KFFXP, LXN_KFFXP
    order by LXN_KFFXP, DISK_KFFXP;

    DISK_KFFXP LXN_KFFXP COUNT(1)
    ---------- ---------- ----------
    2 0 401
    3 0 402
    4 0 402
    5 0 400
    6 0 401
    7 0 401
    8 0 401
    9 0 400
    10 0 401
    11 0 400
    12 0 402
    13 0 401
    14 0 401
    15 0 401
    16 0 401
    17 0 400
    18 0 401
    19 0 401
    2 1 196
    3 1 224
    4 1 677
    5 1 523
    6 1 192
    7 1 196
    8 1 415
    9 1 396
    10 1 397
    11 1 425
    12 1 396
    13 1 396
    14 1 389
    15 1 389
    16 1 689
    17 1 518
    18 1 415
    19 1 384
    2 2 205
    3 2 175
    4 2 525
    5 2 679
    6 2 207
    7 2 205
    8 2 386
    9 2 407
    10 2 405
    11 2 377
    12 2 408
    13 2 404
    14 2 412
    15 2 414
    16 2 515
    17 2 686
    18 2 389
    19 2 418

    Prior to digging into this I attempted a rebalance operation and an ASM disk check, which passed.

    Thanks again,

    Chris.

    1. Hi Chris,

      Thanks for posting the details.

      There are actually two separate issues here.

      First, your disk groups are probably well balanced, but that imbalance query is not accurate for disk groups with a lot of disks and a lot of free space. I will add that to my post, but for now run this query, which should give a more accurate (im)balance figure for your disk groups:

      select g.name "Diskgroup",
      100*(max((d.total_mb-d.free_mb + (128*g.allocation_unit_size/1048576))/(d.total_mb + (128*g.allocation_unit_size/1048576)))
          -min((d.total_mb-d.free_mb + (128*g.allocation_unit_size/1048576))/(d.total_mb + (128*g.allocation_unit_size/1048576))))
          /max((d.total_mb-d.free_mb + (128*g.allocation_unit_size/1048576))/(d.total_mb + (128*g.allocation_unit_size/1048576))) "Imbalance",
      100*(max(d.total_mb)-min(d.total_mb))/max(d.total_mb) "Variance",
      100*(min(d.free_mb/d.total_mb)) "MinFree",
      count(*) "DiskCnt",
      g.type "Type"
      from &asm_disk d , &asm_diskgroup g
      where
      d.group_number = g.group_number and
      d.group_number <> 0 and
      d.state = 'NORMAL' and
      d.mount_status = 'CACHED'
      group by g.name, g.type;

      For the record, and credits, that query is from MOS Doc ID 1271089.1.

      BTW, your disk group REDO also has a lot of free space but not many disks, so for that one the original query returns the correct result.

      The second issue, the secondary/tertiary extent imbalance, is a bug. I did not know about it until you pointed it out. I tested it and managed to reproduce it straight away. I then checked whether someone had logged a bug for it, and they sure had. Someone from the Exadata team actually spotted this. The bug is unpublished, with no public entries, so there is really no point in giving you the bug number.

      Note that the second issue is not actually causing the disk group imbalance. While the extents for that one file are not evenly balanced, some other file will have more extents on disks where this one has many, so the disk group should be balanced. The new imbalance query should confirm that.
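
      As a rough cross-check, the per-file query from the comment above can be widened to count extents per disk across all files and all mirror levels. This is only a sketch, assuming disk group number 1 as in that query. The 65534 filter is an assumption (taken here to be the disk number of placeholder extents), and since extents can differ in size, the counts are an indication rather than an exact space figure:

      ```sql
      -- Sketch only: extents per disk for the whole disk group (all files, all copies).
      -- Assumption: DISK_KFFXP = 65534 marks placeholder extents that should be excluded.
      select DISK_KFFXP, count(1)
      from x$kffxp
      where GROUP_KFFXP=1
      and DISK_KFFXP <> 65534
      group by DISK_KFFXP
      order by DISK_KFFXP;
      ```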

      Cheers,
      Bane

  15. Thanks Bane, this environment happens to be an ODA loaner which I'm currently assessing.

    Using the query in Doc ID 1271089.1 I'm getting similar results:
    Columns Described in Script                         Minimum
                                     Percent            Percent  Disk Diskgroup
    Diskgroup Name                 Imbalance   Varience    Free Count Redundancy
    ------------------------------ --------- ---------- ------- ----- ----------
    DATA                                47.6          0    96.0    18 HIGH
    RECO                                49.6          0    84.2    17 HIGH
    REDO                                  .3          0    91.0     4 HIGH


    Columns Described in Script      Partner   Partner              Inactive
                                       Count   Space % Failgroup Partnership
    Diskgroup Name                 Imbalance Imbalance     Count       Count
    ------------------------------ --------- --------- --------- -----------
    DATA                                   2      50.0        18           0
    RECO                                   2      50.0        17           0
    REDO                                   0        .0         4           0

    It would be great if you could upload the bug number for my records.

    Thanks,

    Chris.

    1. Sure,
      Please email me bane.radulovic at gmail.com.
      Cheers,
      Bane

  16. Alejandro, thank you for this very informative article on ASM rebalancing! We recently added 4TB of LUNs to an ASM disk group, set for external redundancy and originally 28TB in size, on a Fiber Channel connected NetApp FAS6080 (aka IBM N7900). The rebalance spent more than 3 hours in the second phase (extents rebalance) and an additional 3 hours in the third phase (compacting). Each LUN is 2TB in size, and the new LUNs were added together in one command at a rebalance power of 10. All LUNs in the disk group are the same size. The third phase was not a “fraction of the second phase” in duration. This is an 11.2.0.3 grid infrastructure with the April 2013 PSU applied. ASM compatibility is 11.2.0.0. No databases were served by the particular ASM instance used to add the LUNs.

    It would seem unlikely that the ASM instance knows the physical geometry of the NetApp filer LUNs to move data to the outside edge of the physical spindles in this third phase. This appears best suited for JBOD implementations. Should we, as a best practice, simply use “_DISABLE_REBALANCE_COMPACT=TRUE”? And if so, is that set at the ASM instance or the database instances served?

    Thanks again!

  17. Bane, apologies for referring to you as Alejandro above – cut and pasted the wrong name :’)

    1. No worries at all.

      I agree that the third phase may not take just a "fraction of the second phase". In fact, I have also seen this take hours. I guess I need to correct that in the post.

      You are also right about the compacting being a waste of time on non-JBOD systems. You can set _DISABLE_REBALANCE_COMPACT to TRUE in ASM instances. Now, that is not what Oracle calls the best practice, but I think it should be.
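
      For reference, a sketch of how that could be set. This is a hidden (underscore) parameter, so its behaviour may differ between versions and it is worth confirming with Oracle Support before changing it:

      ```sql
      -- Sketch only: run in the ASM instance, not in the database instances.
      -- Hidden (underscore) parameters have to be enclosed in double quotes.
      alter system set "_disable_rebalance_compact"=TRUE;
      ```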

      Cheers,
      Bane

  18. Hello,

    How can I accurately determine the rebalance power performance impact on the application databases when swapping out (45) 1088GB devices for (90) 500GB devices?

    1. There is no formula or procedure to do that. The most important factors will be the size of the disk group (that would be a known value) and the actual I/O load at the time you are going to do this (that could be known if your load is very stable or if you are going to shut down all databases, but it sounds like you want to do it with no downtime). Any spare I/O bandwidth can be used to perform the disk swap.

      If you have no record of your previous disk add/drop stats, you can start the process with power 5 and see how it goes. You can always change it to 1 if it slows things down significantly or go higher if it's fine.
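
      A sketch of that approach, assuming a disk group named DATA (the name and the power values are only examples):

      ```sql
      -- Start (or restart) the rebalance with power 5
      alter diskgroup DATA rebalance power 5;

      -- Watch the progress and the estimated time to completion
      select GROUP_NUMBER, INST_ID, OPERATION, STATE, POWER, SOFAR, EST_WORK, EST_MINUTES from GV$ASM_OPERATION;

      -- If the databases slow down significantly, go down to power 1 ...
      alter diskgroup DATA rebalance power 1;

      -- ... or go higher if there is spare I/O bandwidth
      alter diskgroup DATA rebalance power 8;
      ```

      Re-running the REBALANCE statement with a different POWER changes the power of the rebalance that is already in progress, so there is no need to wait for it to finish first.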

      Cheers,
      Bane

  19. Hi Bane. We recently added around 3TB of space to our ASM disk named DATA01. While running the rebalance step, I found that the compact phase of the rebalance ran for almost 6-7 hours. This is the first time we have had this issue.

    Also, I now see a lot of trace files being generated in the Management database, i.e. MGMTDB.

    What could be the reason, and is there any relation between the space addition and the traces being generated in MGMTDB?
