July 22, 2012

When will my rebalance complete


This is the top ASM questions people ask. But if you expect me to respond with a number of minutes, you will be disappointed.

After all, ASM has given you an estimate, and you still want to know when exactly is that rebalance going to finish. Instead, I will show you how to check if the rebalance is actually progressing, what phase it is in, and if there is a reason for concern.

Understanding the rebalance

As explained in the rebalancing act, the rebalance operation has three phases - planning, extents relocation and compacting. As far as the overall time to complete is concerned, the planing phase time is insignificant so there is no need to worry about it. The extent relocation phase will take most of the time, so the main focus will be on that.I will also show what is going on during the compacting phase.

It is important to know why the rebalance is running. If you are adding a new disk, say to increase the available disk group space, it doesn't really matter how long it will take for the rebalance to complete. OK maybe it does, if your database is hung because you ran out of space in your archive log destination. Similarly if you are resizing or dropping disk(s), to adjust the disk group space, you are generally not concerned with the time it takes for the rebalance to complete.

But if a disk has failed and ASM has initiated rebalance, there may be legitimate reason for concern. If your disk group is normal redundancy AND if another disk fails AND it's the partner of that disk that has already failed, your disk group will be dismounted, all your databases that use that disk group will crash and you may lose data. In such cases I understand that you want to know when that rebalance will complete. Actually, you want to see the relocation phase completed, as once it does, all your data is fully redundant again.

Extents relocation

To have a closer look at the extents relocation phase, I drop one of the disks with the default rebalance power:

SQL> show parameter power

NAME                                 TYPE        VALUE
------------------------------------ ----------- ----------------
asm_power_limit                      integer     1

SQL> set time on
16:40:57 SQL> alter diskgroup DATA1 drop disk DATA1_CD_06_CELL06;

Diskgroup altered.

Initial estimated time to complete is 26 minutes:

16:41:21 SQL> select INST_ID, OPERATION, STATE, POWER, SOFAR, EST_WORK, EST_RATE, EST_MINUTES from GV$ASM_OPERATION where GROUP_NUMBER=1;

   INST_ID OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
         3 REBAL WAIT          1
         2 REBAL RUN           1        516      53736       2012          26
         4 REBAL WAIT          1

About 10 minutes into the rebalance, the estimate is 24 minutes:

16:50:25 SQL> /

   INST_ID OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
         3 REBAL WAIT          1
         2 REBAL RUN           1      19235      72210       2124          24
         4 REBAL WAIT          1

While that EST_MINUTES doesn't give me much confidence, I see that the SOFAR (number of allocation units moved so far) is going up, which is a good sign.

ASM alert log shows the time of the drop disk, the OS process ID of the ARB0 doing all the work, and most importantly - that there are no errors:

Wed Jul 11 16:41:15 2012
SQL> alter diskgroup DATA1 drop disk DATA1_CD_06_CELL06
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
...
NOTE: starting rebalance of group 1/0x6ecaf3e6 (DATA1) at power 1
Starting background process ARB0
Wed Jul 11 16:41:24 2012
ARB0 started with pid=41, OS id=58591
NOTE: assigning ARB0 to group 1/0x6ecaf3e6 (DATA1) with 1 parallel I/O
NOTE: F1X0 copy 3 relocating from 0:2 to 55:35379 for diskgroup 1 (DATA1)
...

ARB0 trace file should show which file extents are being relocated. It does, and that is how I know that ARB0 is doing what it is supposed to do:

$ tail -f /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_arb0_58591.trc
...
ARB0 relocating file +DATA1.282.788356359 (120 entries)
*** 2012-07-11 16:48:44.808
ARB0 relocating file +DATA1.283.788356383 (120 entries)
...
*** 2012-07-11 17:13:11.761
ARB0 relocating file +DATA1.316.788357201 (120 entries)
*** 2012-07-11 17:13:16.326
ARB0 relocating file +DATA1.316.788357201 (120 entries)
...

Note that there may be lot of arb0 trace files in the trace directory, so that's why we need to know the OS process ID of the ARB0 actually doing the rebalance. That information is in the alert log of the ASM instance performing the rebalance.

I can also look at the pstack of the ARB0 process to see what is going on. It does show me that ASM is relocating extents (key functions on the stack being kfgbRebalExecute - kfdaExecute - kffRelocate):

# pstack 58591
#0  0x0000003957ccb6ef in poll () from /lib64/libc.so.6
...
#9  0x0000000003d711e0 in kfk_reap_oss_async_io ()
#10 0x0000000003d70c17 in kfk_reap_ios_from_subsys ()
#11 0x0000000000aea50e in kfk_reap_ios ()
#12 0x0000000003d702ae in kfk_io1 ()
#13 0x0000000003d6fe54 in kfkRequest ()
#14 0x0000000003d76540 in kfk_transitIO ()
#15 0x0000000003cd482b in kffRelocateWait ()
#16 0x0000000003cfa190 in kffRelocate ()
#17 0x0000000003c7ba16 in kfdaExecute ()
#18 0x0000000003d4beaa in kfgbRebalExecute ()
#19 0x0000000003d39627 in kfgbDriver ()
#20 0x00000000020e8d23 in ksbabs ()
#21 0x0000000003d4faae in kfgbRun ()
#22 0x00000000020ed95d in ksbrdp ()
#23 0x0000000002322343 in opirip ()
#24 0x0000000001618571 in opidrv ()
#25 0x0000000001c13be7 in sou2o ()
#26 0x000000000083ceba in opimai_real ()
#27 0x0000000001c19b58 in ssthrdmain ()
#28 0x000000000083cda1 in main ()

After about 35 minutes the EST_MINUTES dropps to 0:

17:16:54 SQL> /

   INST_ID OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
         2 REBAL RUN           1      74581      75825       2129           0
         3 REBAL WAIT          1
         4 REBAL WAIT          1

And soon after that, the ASM alert log shows:
  • Disk emptied
  • Disk header erased
  • PST update completed successfully
  • Disk closed
  • Rebalance completed
Wed Jul 11 17:17:32 2012
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
Wed Jul 11 17:17:41 2012
GMON updating for reconfiguration, group 1 at 20 for pid 38, osid 93832
NOTE: group 1 PST updated.
SUCCESS: grp 1 disk DATA1_CD_06_CELL06 emptied
NOTE: erasing header on grp 1 disk DATA1_CD_06_CELL06
NOTE: process _x000_+asm2 (93832) initiating offline of disk 0.3916039210 (DATA1_CD_06_CELL06) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 0/0xe96a042a, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 21 for pid 38, osid 93832
NOTE: PST update grp = 1 completed successfully
NOTE: initiating PST update: grp = 1, dsk = 0/0xe96a042a, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 22 for pid 38, osid 93832
NOTE: cache closing disk 0 of grp 1: DATA1_CD_06_CELL06
NOTE: PST update grp = 1 completed successfully
GMON updating for reconfiguration, group 1 at 23 for pid 38, osid 93832
NOTE: cache closing disk 0 of grp 1: (not open) DATA1_CD_06_CELL06
NOTE: group 1 PST updated.
Wed Jul 11 17:17:41 2012
NOTE: membership refresh pending for group 1/0x6ecaf3e6 (DATA1)
GMON querying group 1 at 24 for pid 19, osid 38421
GMON querying group 1 at 25 for pid 19, osid 38421
NOTE: Disk  in mode 0x8 marked for de-assignment
SUCCESS: refreshed membership for 1/0x6ecaf3e6 (DATA1)
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0x6ecaf3e6 (DATA1)
NOTE: Attempting voting file refresh on diskgroup DATA1

So the estimated time was 26 minutes and the rebalance actually took about 36 minutes (in this particular case the compacting took less than a minute so I have ignored it). That is why it is more important to understand what is going on, then to know when will the rebalance complete.

Note that the estimated time may also be increasing. If the system is under heavy load, the rebalance will take more time - especially with the rebalance power 1. For a large disk group (many TB) and large number of files, the rebalance can take hours and possibly days.

If you want to get an idea how long will a drop disk take in your environment, you need to test it. Just drop one of the disks, while your system is under normal/typical load. Your data is fully redundant during such disk drop, so you are not exposed to a disk group dismount in case its partner disk fails during the rebalance.

Compacting

In another example, to look at the compacting phase, I add the same disk back, with rebalance power 10:

17:26:48 SQL> alter diskgroup DATA1 add disk '/o/*/DATA1_CD_06_celll06' rebalance power 10;

Diskgroup altered.

Initial estimated time to complete is 6 minutes:

17:27:22 SQL> select INST_ID, OPERATION, STATE, POWER, SOFAR, EST_WORK, EST_RATE, EST_MINUTES from GV$ASM_OPERATION where GROUP_NUMBER=1;

   INST_ID OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
         2 REBAL RUN          10        489      53851       7920           6
         3 REBAL WAIT         10
         4 REBAL WAIT         10

After about 10 minutes, the EST_MINUTES drops to 0:

17:39:05 SQL> /

   INST_ID OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
         3 REBAL WAIT         10
         2 REBAL RUN          10      92407      97874       8716           0
         4 REBAL WAIT         10

And I see the following in the ASM alert log

Wed Jul 11 17:39:49 2012
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
Wed Jul 11 17:39:58 2012
GMON updating for reconfiguration, group 1 at 31 for pid 43, osid 115117
NOTE: group 1 PST updated.
Wed Jul 11 17:39:58 2012
NOTE: membership refresh pending for group 1/0x6ecaf3e6 (DATA1)
GMON querying group 1 at 32 for pid 19, osid 38421
SUCCESS: refreshed membership for 1/0x6ecaf3e6 (DATA1)
NOTE: Attempting voting file refresh on diskgroup DATA1

That means ASM has completed the second phase of the rebalance and is compacting now. If that is true, the pstack should show kfdCompact() function. Indeed it does:

# pstack 103326
#0  0x0000003957ccb6ef in poll () from /lib64/libc.so.6
...
#9  0x0000000003d711e0 in kfk_reap_oss_async_io ()
#10 0x0000000003d70c17 in kfk_reap_ios_from_subsys ()
#11 0x0000000000aea50e in kfk_reap_ios ()
#12 0x0000000003d702ae in kfk_io1 ()
#13 0x0000000003d6fe54 in kfkRequest ()
#14 0x0000000003d76540 in kfk_transitIO ()
#15 0x0000000003cd482b in kffRelocateWait ()
#16 0x0000000003cfa190 in kffRelocate ()
#17 0x0000000003c7ba16 in kfdaExecute ()
#18 0x0000000003c4b737 in kfdCompact ()
#19 0x0000000003c4c6d0 in kfdExecute ()
#20 0x0000000003d4bf0e in kfgbRebalExecute ()
#21 0x0000000003d39627 in kfgbDriver ()
#22 0x00000000020e8d23 in ksbabs ()
#23 0x0000000003d4faae in kfgbRun ()
#24 0x00000000020ed95d in ksbrdp ()
#25 0x0000000002322343 in opirip ()
#26 0x0000000001618571 in opidrv ()
#27 0x0000000001c13be7 in sou2o ()
#28 0x000000000083ceba in opimai_real ()
#29 0x0000000001c19b58 in ssthrdmain ()
#30 0x000000000083cda1 in main ()

The tail on ARB0 trace file now shows relocating just 1 entry at the time (another sign of compacting):

$ tail -f /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_arb0_103326.trc
ARB0 relocating file +DATA1.321.788357323 (1 entries)
ARB0 relocating file +DATA1.321.788357323 (1 entries)
ARB0 relocating file +DATA1.321.788357323 (1 entries)
...

The V$ASM_OPERATION keeps showing EST_MINUTES=0 (compacting):

17:42:39 SQL> /

   INST_ID OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
         3 REBAL WAIT         10
         4 REBAL WAIT         10
         2 REBAL RUN          10      98271      98305       7919           0

The X$KFGMG shows REBALST_KFGMG=2 (compacting):

17:42:50 SQL> select NUMBER_KFGMG, OP_KFGMG, ACTUAL_KFGMG, REBALST_KFGMG from X$KFGMG;

NUMBER_KFGMG   OP_KFGMG ACTUAL_KFGMG REBALST_KFGMG
------------ ---------- ------------ -------------
           1          1           10             2

Once the compacting phase completes, the alert log shows "stopping process ARB0" and "rebalance completed":

Wed Jul 11 17:43:48 2012
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0x6ecaf3e6 (DATA1)

In this case, the extents relocation took about 12 minutes and the compacting took about 4 minutes.

The compacting phase can actually take significant amount of time. In one case I have seen the extents relocation run for 60 minutes and the compacting after that took another 30 minutes. But it doesn't really matter how long it takes for the compacting to complete, because as soon as the second phase of the rebalance (extent relocation) completes, all data is fully redundant and we are not exposed to disk group dismount due to partner disk failure.

Changing the power

Rebalance power can be changed dynamically, i.e. during the rebalance, so if your rebalance with the default power is 'too slow', you can increase it. How much? Well, do you understand your I/O load, your I/O throughput and most importantly your limits? If not, increase the power to 5 (just run 'ALTER DISKGROUP ... REBALANCE POWER 5;') and see if it makes a difference. Give it 10-15 minutes, before you jump to conclusions. Should you go higher? Again, as long as you are not adversely impacting your database I/O performance, you can keep increasing the power. But I haven't seen much improvement beyond power 30.

The testing is the key here. You really need to test this under your regular load and in your production environment. There is no point testing with no databases running and on a system that  runs off different storage system.

13 comments:

  1. Thanks for sharing, Bane

    Will the described dynamic Rebalance power changes really modify current operation attribute or rebalance operation re-starts after the ALTER DISKGROUP ... REBALANCE POWER.. command?
    Does this require compatible.asm=11.2?
    "But I haven't seen much improvement beyond power 30" - can't this limit depend on the diskgroup disk count?

    Sincerely, Igor

    ReplyDelete
    Replies
    1. Hi Igor,

      The rebalance processes actually stop and then start again, with the new power. We say that the rebalance continues because it does not not roll back what it did up to that point. It simply continues the work until the disk group is balances.

      No, this does not require any particular compatible.asm value.

      With higher power the disk I/O will be more aggressive, i.e. we will do more I/O at the same time, so the limitation will be overall I/O rate, not the number of disks.

      Cheers,
      Bane

      Delete
  2. Can I upgrade ASM from 10205 50 11203 while ASM rebalancing is in progress?

    ReplyDelete
  3. Hi Bane,

    I have multiple servers running ASM 10205 and 11203. (AIX). Queries on ASM views are slow on 2 servers as compared to others.
    Select * from v$asm_disk takes 1 second on all servers where these 2 serves take 2 minutes. One of them running 10205 asm and other one running 11203 asm
    Comparison Truss output from RBAL process on good vs. bad server:
    KIOCTL function takes 0.4 seconds on bad servers as compared to .0002 seconds on good server.... rest all looks similar.

    For e.g
    bad server
    0.0002: kopen("/dev/ora_rdo4", O_RDONLY) = 85
    0.4295: kioctl(85, 65281, 0x0FFFFFFFFFFF2A58, 0x0000000000000000) = 0
    0.0005: close(85) = 0

    Good server
    0.0008: kopen("/dev/ora_data06", O_RDONLY|O_LARGEFILE) = 12
    0.0003: kioctl(12, 65281, 0x0FFFFFFFFFFFC780, 0x0000000000000000) = 0
    0.0009: close(12)


    Any suggestions?

    ReplyDelete
    Replies
    1. Not sure what is going on. It looks like you found the culprit, but to confirm that the problem is only when you try to access disks, you can query _stat views - but in 11.2 only. E.e. query from v$asm_disk_stat is from memory and should come back straight away.
      Is there anything common (storage wise) to those two servers that have the problem - same (storage) network, switch, SAN, etc? I'd try to work out why just those two have the problem.
      Cheers,
      Bane

      Delete
  4. Hi Bane,

    if some blocks of a disk fail(can not read or write) in the second phase of the rebalance (extent relocation) ,and the diskgroup is external redundancy. what will happen???
    how can i demo it? i just have a esxi machine...

    Any suggestions?
    Thank You!

    ReplyDelete
    Replies
    1. The fact that we are in the middle of a rebalance doesn't change the way ASM works. If we cannot read/write to a disk in an external redundancy disk group, the disk goes offline and the disk group gets dismounted.

      More interesting question is what will happen when you try to mount the disk group back. If the problem was with the access to ASM metadata you will not be able to mount the disk group any more. In that case you would have to fix the issue - e.g. if the whole disk went missing, put it back, fix permissions, etc. If you cannot fix it, you would have to recreate the disk group and restore database data from backups.

      If the problem was with database data, the disk group would mount, but the disk group will be dismounted again when the rebalance continues or when the database tries to access the data. In that case you should be able to stop the rebalance, drop the 'offending' file(s), restore/recreate those files and move on. That would work only if the read/write issue was with a limited part of one disk. If the problem was with the whole disk, you would not be able to do much, as we spread all files across all disks. In that case you would be looking at recreating the disk group and restoring data...

      Cheers,
      Bane

      Delete
  5. Hi Bane,

    I have ACFS with 12 TB size, All the backups goes to ACFS and when ever the ACFS file system reaches 90%, Backups will slow down tremendously. If the filesystem is less than 50% then my backup completes in 4 hours and if the file system is 70% when the backups are kicked off the same backup runs for 12 hrs. Another strange thing is even if the file system is 100% full the backup never fails it hangs. Database and GI version is 11203 on RHEL5,5.
    Did you ever notice this kind of behaviour in ACFS?

    ReplyDelete
    Replies
    1. No, I haven't seen this issue. These days I work with Exadata that does not support ACFS [yet], so I am not exposed to all issues.

      Having said that, there is bug 12836020 that you may be hitting. It is about the poor performance and in particular when files are frequently extended.

      Get the patch for bug 12836020 and see if it helps.

      Cheers,
      Bane

      Delete
  6. This comment has been removed by a blog administrator.

    ReplyDelete
  7. Sometimes, the most difficult problems we face with QuickBooks are errors and issues that result from them. At Quickbooks Customer Service, we provide advanced technical solutions for all setting of QuickBooks to help you clear your errors and avoid new ones. Our executives are qualified, experienced and will help you correct the issues in your Quickbook system.

    ReplyDelete