ASM Support Guy: Rebalancing act

November 27, 2011

Rebalancing act

ASM ensures that file extents are evenly distributed across all disks in a disk group. This is true for the initial file creation and for file resize operations. That means we should always have a balanced space distribution across all disks in a disk group.

Rebalance operation

Disk group rebalance is triggered automatically on ADD, DROP and RESIZE disk operations and on moving a file between hot and cold regions. Running rebalance by explicitly issuing ALTER DISKGROUP ... REBALANCE is called a manual rebalance. We might want to do that to change the rebalance power, for example. We can also run the rebalance manually if a disk group becomes unbalanced for any reason.

The POWER clause of the ALTER DISKGROUP ... REBALANCE statement specifies the degree of parallelism of the rebalance operation. The minimum value is 0, which halts the current rebalance until the statement is re-run, either implicitly or explicitly. Higher values may reduce the total time it takes to complete the rebalance operation.

The ALTER DISKGROUP ... REBALANCE command by default returns immediately so that we can run other commands while the rebalance operation takes place in the background. To check the progress of the rebalance operation, we can query the V$ASM_OPERATION view.
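
As a minimal sketch of the statements described above, assuming a disk group named DATA (a hypothetical name) and an ASM instance at version 11.2 or later for the column list:

```sql
-- Manual rebalance of the hypothetical disk group DATA with power 4.
-- By default the statement returns immediately and the work continues
-- in the background.
ALTER DISKGROUP data REBALANCE POWER 4;

-- Check the progress of the running operation.
SELECT group_number, operation, state, power, actual,
       sofar, est_work, est_rate, est_minutes
FROM   v$asm_operation;

-- POWER 0 halts the rebalance until the statement is re-run.
ALTER DISKGROUP data REBALANCE POWER 0;
```

If the query returns no rows, there is no long running operation active in that ASM instance.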

Three phase power

The rebalance operation has three distinct phases. First, ASM has to come up with the rebalance plan. That will depend on the rebalance reason, disk group size, number of files in the disk group, whether or not partnership has to be modified, etc. In any case this shouldn't take more than a couple of minutes.

The second phase is moving (relocating) the extents among the disks in the disk group. This is where the bulk of the time will be spent. As this phase progresses, ASM keeps track of the number of extents moved and the actual I/O performance. Based on that, it calculates the estimated time to completion (GV$ASM_OPERATION.EST_MINUTES). Keep in mind that this is an estimate and that the actual time may change depending on the overall (mostly disk related) load. If the reason for the rebalance was a failed disk (or disks) in a redundant disk group, the data mirroring is fully re-established at the end of this phase.

The third phase is disk compacting (ASM version 11.1.0.7 and later). The idea of the compacting phase is to move the data as close to the outer tracks of the disks as possible. Note that at this stage of the rebalance, EST_MINUTES will keep showing 0. This is a 'feature' that will hopefully be addressed in the future. The time to complete this phase will again depend on the number of disks, the reason for the rebalance, etc. The overall time should be a fraction of the second phase.

Some notes about rebalance operations

Rebalance is a per-file operation.
An ongoing rebalance is restarted if the storage configuration changes, either when we alter the configuration or when the configuration changes due to a failure or an outage. If the new rebalance fails because of a user error, a manual rebalance may be required.
There can be one rebalance operation per disk group per ASM instance in a cluster.
Rebalancing continues across a failure of the ASM instance performing the rebalance.
The REBALANCE clause (with its associated POWER and WAIT/NOWAIT keywords) can also be used in ALTER DISKGROUP commands for ADD, DROP or RESIZE disks.
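
To illustrate the last note, here is a sketch of a single statement that adds and drops disks and sets the rebalance options at the same time. The disk group name DATA, the device paths and the disk names are all hypothetical.

```sql
-- One statement, hence one rebalance: add two new disks, drop two old ones.
-- WAIT makes the statement return only after the rebalance has finished;
-- NOWAIT (the default) returns immediately.
ALTER DISKGROUP data
  ADD DISK '/dev/mapper/new_lun1', '/dev/mapper/new_lun2'
  DROP DISK data_0000, data_0001
  REBALANCE POWER 8 WAIT;
```

Combining the ADD and DROP in one statement avoids running two separate rebalance operations back to back.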

Tuning rebalance operations

If the POWER clause is not specified in an ALTER DISKGROUP statement, or when rebalance is implicitly run by ADD/DROP/RESIZE disk, then the rebalance power defaults to the value of the ASM_POWER_LIMIT initialization parameter. We can adjust the value of this parameter dynamically. A higher power limit should result in a shorter time to complete the rebalance, but this is by no means linear and it will depend on the (storage system) load, available throughput and underlying disk response times.

The power can be changed for a rebalance that is in progress. We just need to issue another ALTER DISKGROUP ... REBALANCE command with a different value for POWER. This interrupts the current rebalance and restarts it with the modified POWER.
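
A short sketch of both adjustments, again assuming a hypothetical disk group named DATA; the power values are picked only for illustration:

```sql
-- Show the default power used when no POWER clause is given.
SELECT name, value
FROM   v$parameter
WHERE  name = 'asm_power_limit';

-- Change the default dynamically, in the ASM instance.
ALTER SYSTEM SET asm_power_limit = 4;

-- Change the power of a rebalance that is already in progress.
-- The running rebalance is interrupted and restarted with the new power.
ALTER DISKGROUP data REBALANCE POWER 8;
```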

Relevant initialization parameters and disk group attributes

ASM_POWER_LIMIT

The ASM_POWER_LIMIT initialization parameter specifies the default power for disk rebalancing in a disk group. The range of values is 0 to 11 in versions prior to 11.2.0.2. Since version 11.2.0.2 the range of values is 0 to 1024, but that still depends on the disk group compatibility (see the notes below). The default value is 1. A value of 0 disables rebalancing.

For disk groups with COMPATIBLE.ASM set to 11.2.0.2 or greater, the operational range of values is 0 to 1024 for the rebalance power.
For disk groups that have COMPATIBLE.ASM set to less than 11.2.0.2, the operational range of values is 0 to 11 inclusive.
Specifying 0 for the POWER in the ALTER DISKGROUP REBALANCE command will stop the current rebalance operation (unless you hit bug 7257618).
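
The following sketch shows how we might check which power range applies to a disk group. The disk group name DATA is hypothetical. Note that advancing COMPATIBLE.ASM cannot be reversed, so the second statement is shown only as an illustration and should not be run casually.

```sql
-- Check the compatibility settings of all mounted disk groups.
SELECT name, compatibility, database_compatibility
FROM   v$asm_diskgroup;

-- Advance the ASM compatibility of the disk group (irreversible).
ALTER DISKGROUP data SET ATTRIBUTE 'compatible.asm' = '11.2.0.2';

-- With COMPATIBLE.ASM at 11.2.0.2 or greater, a power above 11 is accepted.
ALTER DISKGROUP data REBALANCE POWER 32;
```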

_DISABLE_REBALANCE_COMPACT

Setting initialization parameter _DISABLE_REBALANCE_COMPACT=TRUE will disable the compacting phase of the disk group rebalance - for all disk groups.

_REBALANCE_COMPACT

This is a hidden disk group attribute. Setting _REBALANCE_COMPACT=FALSE will disable the compacting phase of the disk group rebalance - for that disk group only.
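
For reference, a sketch of how these two settings would typically be applied. The disk group name DATA is hypothetical, and the syntax shown is the generic form for hidden parameters and disk group attributes, not something confirmed for every version. Hidden parameters and attributes should only be changed on the advice of Oracle Support.

```sql
-- Instance wide: disable the compacting phase for all disk groups.
-- Hidden parameter names must be enclosed in double quotes.
ALTER SYSTEM SET "_disable_rebalance_compact" = TRUE;

-- Single disk group: disable the compacting phase for DATA only.
ALTER DISKGROUP data SET ATTRIBUTE '_rebalance_compact' = 'FALSE';
```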

_ASM_IMBALANCE_TOLERANCE

This initialization parameter controls the percentage of imbalance between disks. Default value is 3%.

Processes

The following table has a brief summary of the background processes involved in the rebalance operation.

Process	Description
ARBn	ASM Rebalance Process. Rebalances data extents within an ASM disk group. Possible processes are ARB0-ARB9 and ARBA.
RBAL	ASM Rebalance Master Process. Coordinates rebalance activity. In an ASM instance, it coordinates rebalance activity for disk groups. In a database instance, it manages ASM disk groups.
Xnnn	Exadata only - ASM Disk Expel Slave Process. Performs ASM post-rebalance activities. This process expels dropped disks at the end of an ASM rebalance.

When a rebalance operation is in progress, the ARBn processes will generate trace files in the background dump destination directory, showing the rebalance progress.

Views

In an ASM instance, V$ASM_OPERATION displays one row for every active long running ASM operation executing in the current ASM instance. GV$ASM_OPERATION will show cluster wide operations.

During the rebalance, OPERATION will show REBAL, STATE will show the state of the rebalance operation, POWER will show the rebalance power and EST_MINUTES will show an estimated time the operation should take.

In an ASM instance, V$ASM_DISK displays information about ASM disks. During the rebalance, the STATE will show the current state of the disks involved in the rebalance operation.
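
A sketch of the two queries we might run while a rebalance is in progress. Only columns available in 11.2 are selected.

```sql
-- Cluster wide view of active ASM operations, one row per instance.
SELECT inst_id, group_number, operation, state, power, est_minutes
FROM   gv$asm_operation;

-- State of the disks that belong to a mounted disk group.
-- Disks being added or dropped show a STATE other than NORMAL.
SELECT group_number, disk_number, name, state, total_mb, free_mb
FROM   v$asm_disk
WHERE  group_number <> 0
ORDER  BY group_number, disk_number;
```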

Is your disk group balanced?

Run the following query in your ASM instance to get the report on the disk group imbalance.

SQL> column "Diskgroup" format A30
SQL> column "Imbalance" format 99.9 Heading "Percent|Imbalance"
SQL> column "Variance" format 99.9 Heading "Percent|Disk Size|Variance"
SQL> column "MinFree" format 99.9 Heading "Minimum|Percent|Free"
SQL> column "DiskCnt" format 9999 Heading "Disk|Count"
SQL> column "Type" format A10 Heading "Diskgroup|Redundancy"

SQL> SELECT g.name "Diskgroup",
100*(max((d.total_mb-d.free_mb)/d.total_mb)-min((d.total_mb-d.free_mb)/d.total_mb))/max((d.total_mb-d.free_mb)/d.total_mb) "Imbalance",
100*(max(d.total_mb)-min(d.total_mb))/max(d.total_mb) "Variance",
100*(min(d.free_mb/d.total_mb)) "MinFree",
count(*) "DiskCnt",
g.type "Type"
FROM v$asm_disk d, v$asm_diskgroup g
WHERE d.group_number = g.group_number and
d.group_number <> 0 and
d.state = 'NORMAL' and
d.mount_status = 'CACHED'
GROUP BY g.name, g.type;

                                           Percent Minimum
                                 Percent Disk Size Percent  Disk Diskgroup
Diskgroup                      Imbalance  Variance    Free Count Redundancy
------------------------------ --------- --------- ------- ----- ----------
ACFS                                  .0        .0    12.5     2 NORMAL
DATA                                  .0        .0    48.4     2 EXTERN
PLAY                                 3.3        .0    98.1     3 NORMAL
RECO                                  .0        .0    82.9     2 EXTERN

NOTE: The above query is from Oracle Press book Oracle Automatic Storage Management, Under-the-Hood & Practical Deployment Guide, by Nitin Vengurlekar, Murali Vallath and Rich Long.

38 comments:

Thiru, December 1, 2011 at 1:17 AM
Hello,

Our client has implemented a two-node Oracle 10g R2 RAC on HP-UX v2. The database is on ASM and on an HP EVA 4000 SAN. The database size is around 1.2 TB.
Now the requirement is to migrate the Database and Clusterware files to a New SAN (EVA 6400).

SAN to SAN migration can't be done as the customer didn't get license for such storage migration.

My immediate suggestion was to connect the New SAN and present the LUNs and add the Disks from New SAN and wait for rebalance to complete. Then drop the Old Disks which are on Old SAN.
[Exact Steps To Migrate ASM Diskgroups To Another SAN Without Downtime. (Doc ID 837308.1).]

The client wants us to suggest alternative solutions, as they are worried that presenting LUNs from the old SAN and the new SAN at the same time may cause issues, and that a failed rebalance may affect the database. They are also unable to estimate the time to rebalance a 1.2 TB database across disks from two different SANs. The downtime window is only 48 hours.

Is it possible to roughly estimate the time to re-balance a 1 TB of Banking Solution Database?

Rgds,
Thiru

Bane Radulovic, December 1, 2011 at 9:14 AM
Hi Thiru,

Your suggestion is fine. I would just add that it is best to connect both SANs and then ADD new disks and DROP existing disks in the same alter diskgroup command. That way, there will be only one rebalance operation.

There is really no need for downtime here, unless for some reason they need to shut down servers so they can connect to new SAN.

If the customer is worried they need to make sure they have current backups, and if they cannot afford downtime, they need to have some sort of standby system.

Now to your question about estimating the rebalance time. There is no easy answer to that :( You may want to consider the following to get an idea about the time it will take:
1. Review their ASM alert logs and look for past rebalance operations. How long did those take? What was the operation? If they added a new disk/LUN, the alert log will show you how long it took. Note the rebalance power used.
2. Use the new SAN to create a disk group on another server. Restore the production database there. Now add 1TB worth of disks and see how long it takes to rebalance. Now drop 1TB and see how long it takes. This will not be the same as in production as you will have two different SANs, with different I/O characteristics, but it will give you an idea. You only need a single instance setup for this test, as the rebalance runs on one node anyway.

Hope this helps.

Cheers,
Bane

Yavor Nikolov, January 31, 2012 at 8:16 AM
Hi,

There are a few other questions interesting to me - how are file extents being "evenly distributed across all disks in a disk group":
1) There used to be some level of imbalance - at least up to 11.1 a parameter existed _asm_imbalance_tolerance which wasn't 0 by default. (Probably for performance reasons).

2) Considering that very often LUNs of an ASM diskgroup are created as volumes from same RAID Array striped on hardware level (let's say it's RAID10 or RAID0). Due to the double striping it's possible two adjacent extents of same file to get placed on same physical disk (even though they belong to different LUNs) - thus discarding the idea behind the striping.
So I've been wondering - how big is this issue, what is the probability of it to happen, and is ASM doing something special to try to avoid such situations.

Bane Radulovic, January 31, 2012 at 9:14 PM
Hi Yavor,
File extents are evenly distributed to all disks by placing file extents on all disks in a disk group. Let's say we have 5 disks in an external redundancy disk group with 1MB allocation unit size. And let's say we create a 10 MB file. Each extent will be 1MB, so there will be 2 extents per disk. But if our file is say 11MB, there will be an extra extent on one of the disks - hence the imbalance.
1) When extents of a new file are being placed on disks in a disk group, a starting disk is picked at random. Following my example above, if we now create a 3 MB file and ASM picks as the starting disk the one that has an extra extent, that disk will end up with two extra extents compared to some other disks. Hidden parameter _asm_imbalance_tolerance was created so that ASM performs an extra check and makes sure the imbalance stays within the limit set by that parameter. Yes, that extra check and the possible extra extent moves carry a small performance hit.
2) You are correct - that might be an issue and ASM cannot do anything about it. That being said, in my experience I haven't seen this being an issue, or at least I haven't had a case where that was the cause of any problems.
Cheers,
Bane

Anonymous, June 21, 2012 at 11:39 PM
Hi

Excellent explanation Bane

Suppose we are rebalancing a disk group with power 6, feel that the rebalance is slow, and would like to increase the power. How do we do it?
Do we have to stop the current rebalance with power 0 and then start the rebalance manually with the required power, say 100?

In that case, will ASM start the rebalance from the beginning, or will it resume from where it left off? Is there any marker on the disks of the disk group?

Thanks
Raghu

Anonymous, August 24, 2012 at 5:56 AM
Hello,
So glad I came upon this page!
I am currently troubleshooting an ASM anomaly.
This ASM environment holds two databases, one over 1TB. There are 12 disks in the diskgroup, all presented from a SAN Raid5 and all 500GB in size.

From this snapshot of the iostat you can see that of the 12 LUNs in the Datagroup, two in question are the only ones consistently hitting high await, svctm and %util(emcpowers and emcpowerw) during normal business user processing.

I suspect it is a matter of how ASM has striped the data for this database which includes one large 'blob'(700GB).

Is it possible we are hitting Bug 7699985: UNBALANCED DISTRIBUTION OF FILES ACROSS DISKS.
Although the imbalance report does not seem too bad -

-----------------output------------------

Columns Described in Script Minimum
Percent Percent Disk Diskgroup
Diskgroup Name Imbalance Variance Free Count Redundancy
------------------------------ --------- --------------- ------- ----- ----------
ASM_FRA .3 0 56.7 45 EXTERN
ASM_DATA1 .7 0 44.2 108 EXTERN

2 rows selected.

Is there anyway to verify this?

Thanks,
Michele

Linux OSW v3.0
zzz ***Thu Aug 23 14:00:49 EDT 2012
avg-cpu: %user %nice %system %iowait %steal %idle
0.63 0.00 0.46 11.67 0.00 87.24

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
emcpowerk 0.00 0.00 30.33 0.67 11744.00 21.33 379.53 1.62 52.19 10.38 32.17
emcpowers 0.00 0.00 30.00 0.00 13616.00 0.00 453.87 4.33 150.81 33.19 99.57
emcpowerw 0.00 0.00 22.33 0.00 10213.33 0.00 457.31 3.88 175.52 44.79 100.03
emcpowerz 0.00 0.00 32.00 0.00 13098.67 0.00 409.33 1.04 32.50 5.44 17.40
emcpoweraa 0.00 0.00 29.00 0.00 13194.67 0.00 454.99 1.63 56.10 9.70 28.13
emcpowerab 0.00 0.00 28.33 4.67 12416.00 20.67 376.87 0.64 19.35 4.42 14.60
emcpowerac 0.00 0.00 32.00 0.00 12629.00 0.00 394.66 0.93 29.07 6.79 21.73
emcpowerad 0.00 0.00 31.00 0.33 11984.00 10.67 382.81 0.74 23.49 6.23 19.53
emcpowerae 0.00 0.00 25.67 0.00 10965.33 0.00 427.22 1.29 50.22 11.96 30.70
emcpoweraf 0.00 0.00 28.67 0.00 12587.33 0.00 439.09 0.70 24.05 5.03 14.43
emcpowero 0.00 0.00 13.29 1.99 4826.58 31.89 317.91 0.41 26.89 6.02 9.20
emcpowerh 0.00 0.00 11.96 2.33 5464.45 51.16 386.09 0.30 20.81 7.47 10.66


Anonymous, August 24, 2012 at 11:36 PM
Bane,
Thank you for the quick response. We are using 11.1.07 version. I am the Sys Admin, so I asked the DBA to check v$asm_disk for me and they look pretty balanced -
TOTAL_MB FREE_MB
--------------- ---------------
511993 228048
511993 227985
511993 228039
511993 226152
511993 228002
511993 228014
511993 228021
511993 228003
511993 228061
511993 226119
511993 226159
511993 226175

This same symptom occurs in our acceptance environment with similar data (a large blob), so you are very likely to be correct about the hot blocks.
How do we investigate and mitigate a hot block issue?
I will also talk to storage admin to see if we can do the test with a new LUN.

Thank you,
Michele

Anonymous, August 27, 2012 at 10:15 PM
Thank you again for your help!

Anonymous, September 12, 2012 at 8:44 AM
Bane,

Nice blog...
I have a few more questions regarding ASM.
Can you explain the memory initialization parameters of Oracle ASM, like
DB_CACHE_SIZE
LARGE_POOL_SIZE
SHARED_POOL_SIZE

How does ASM use the memory that is allocated to them?

Thanks
naveen

Anil, October 16, 2012 at 4:05 AM
Hi,

I need some clarification during Disk drop from ASM Disk group on windows.

Currently we have 4 and 2 disks on disk groups.

SQL> select group_number, name, TOTAL_MB, FREE_MB from V$asm_disk_stat order by name;

GROUP_NUMBER NAME TOTAL_MB FREE_MB
------------ ------------------------------ ---------- ----------
1 DATA1_0000 255997 244604
1 DATA1_0001 255997 244550
1 DATA1_0002 255997 244590
1 DATA1_0003 255997 244524
2 DATA2_0000 255997 235618
2 DATA2_0001 255997 235642
2 DATA2_0002 255997 235626
2 DATA2_0003 255997 235621
3 DATA3_0000 255997 236167
3 DATA3_0001 255997 236172
4 FLASH_0000 255997 252834
4 FLASH_0001 255997 252829

And I am going to use below drop command to release DISKS -

alter diskgroup FLASH drop disk FLASH_0001;
alter diskgroup DATA3 drop disk DATA3_0001;

alter diskgroup DATA2 drop disk DATA2_0003;
alter diskgroup DATA2 drop disk DATA2_0002;
alter diskgroup DATA2 drop disk DATA2_0001;

alter diskgroup DATA1 drop disk DATA1_0003;
alter diskgroup DATA1 drop disk DATA1_0002;
alter diskgroup DATA1 drop disk DATA1_0001;

I need some clarification -

1. How can we speed up the execution of the above commands? The ASM_POWER_LIMIT value is 1.

2. What is the actual command to execute this?

3. For rollback, is this the right operation -

ALTER DISKGROUP FLASH ADD DISK '\\.\ORCLDISKFLASH1' NAME FLASH_0001 NOFORCE ;

4. And most important - how can I estimate the duration before executing the drop disk command?

Thanks in advance.

Anil, October 16, 2012 at 11:11 PM
Thanks a lot... Bane.
It will be really very helpful to me.

Anonymous, November 16, 2012 at 3:02 AM
Does the rbal background process only kick in when adding/dropping ASM disks ?

When there is no rebalancing to be done does the rbal process read any kind of ASM metadata ?

Anonymous, June 19, 2013 at 12:43 AM
Hi Bane,

I've been doing some research on ASM high redundancy and noticed that the imbalance was very high - 50%.

Digging further into x$kffxp I can see that the primary extents are evenly distributed but the mirror and 2nd mirror copy are not...

Is this expected behaviour?

I'm running 11.2.0.3.

Chris

Anonymous, June 20, 2013 at 5:43 PM
Hi Bane,

Thanks for getting back to me.

Here's some analysis:

1) The imbalance is 50%
                                           Percent Minimum
                                 Percent Disk Size Percent  Disk Diskgroup
Diskgroup                      Imbalance  Variance    Free Count Redundancy
------------------------------ --------- --------- ------- ----- ----------
DATA                                50.2        .0    95.9    18 HIGH
RECO                                50.1        .0    84.2    17 HIGH
REDO                                  .3        .0    91.0     4 HIGH

2) Analysis of an example file from x$kffxp, as you can see the primary extents are appropriately distributed, the mirror/2nd mirror are not which confirms the imbalance finding.
select DISK_KFFXP,LXN_KFFXP,count(1)
from x$kffxp
where GROUP_KFFXP=1
and NUMBER_KFFXP=262
group by DISK_KFFXP, LXN_KFFXP
order by LXN_KFFXP, DISK_KFFXP

DISK_KFFXP LXN_KFFXP COUNT(1)
---------- ---------- ----------
2 0 401
3 0 402
4 0 402
5 0 400
6 0 401
7 0 401
8 0 401
9 0 400
10 0 401
11 0 400
12 0 402
13 0 401
14 0 401
15 0 401
16 0 401
17 0 400
18 0 401
19 0 401
2 1 196
3 1 224
4 1 677
5 1 523
6 1 192
7 1 196
8 1 415
9 1 396
10 1 397
11 1 425
12 1 396
13 1 396
14 1 389
15 1 389
16 1 689
17 1 518
18 1 415
19 1 384
2 2 205
3 2 175
4 2 525
5 2 679
6 2 207
7 2 205
8 2 386
9 2 407
10 2 405
11 2 377
12 2 408
13 2 404
14 2 412
15 2 414
16 2 515
17 2 686
18 2 389
19 2 418

Prior to digging into this I attempted a rebalance operation & a asm disk check which passed.

Thanks again,

Chris.

Anonymous, June 20, 2013 at 9:56 PM
Thanks Bane, this environment happens to be an ODA loaner which I'm currently assessing.

Using the query in Doc ID 1271089.1 I'm getting similar results:
Columns Described in Script Minimum
Percent Percent Disk Diskgroup
Diskgroup Name Imbalance Varience Free Count Redundancy
------------------------------ --------- ---------- ------- ----- ----------
DATA 47.6 0 96.0 18 HIGH
RECO 49.6 0 84.2 17 HIGH
REDO .3 0 91.0 4 HIGH

Columns Described in Script Partner Partner Inactive
Count Space % Failgroup Partnership
Diskgroup Name Imbalance Imbalance Count Count
------------------------------ --------- --------- --------- -----------
DATA 2 50.0 18 0
RECO 2 50.0 17 0
REDO 0 .0 4 0

It would be great if you could upload the bug number for my records.

Thanks,

Chris.

Anonymous, June 30, 2013 at 2:59 AM
Alejandro, thank you for this very informative article on ASM rebalancing! We recently had a Fiber Channel connected NetApp FAS6080 (aka IBM N7900) with an ASM disk group, set for external redundancy and originally 28TB in size, spend more than 3 hours in the second phase (extents rebalance) and an additional 3 hours in the third phase (compacting) while adding 4TB of LUNs. Each LUN is 2TB in size, and the new LUNs were added together in one command at a rebalance power of 10. All LUNs in the disk group are the same size. The third phase was not a “fraction of the second phase” in duration. This is an 11.2.0.3 grid infrastructure with the April 2013 PSU applied. ASM compatibility is 11.2.0.0. No databases were served by the particular ASM instance used to add the LUNs.

It would seem unlikely that the ASM instance knows the physical geometry of the NetApp filer LUNs to move data to the outside edge of the physical spindles in this third phase. This appears best suited for JBOD implementations. Should we, as a best practice, simply use “_DISABLE_REBALANCE_COMPACT=TRUE”? And if so, is that set at the ASM instance or the database instances served?

Thanks again!

Anonymous, June 30, 2013 at 3:05 AM
Bane, apologies for referring to you as Alejandro above – cut and pasted the wrong name :’)

Anonymous, August 14, 2013 at 1:06 AM
Hello,

How can I accurately determine the rebalance power performance impact on the application databases when swapping out (45) 1088GB devices for (90) 500GB devices?

Unknown, October 5, 2021 at 5:20 AM
Hi Bane.... So we have recently added around 3TB of space to our ASM disk named DATA01, and while running the rebalance step I found that the compact part of the rebalance ran for almost 6-7 hrs. This is the first time we have had this issue.

Also, I now see that a lot of trace files are being generated in the Management database, i.e. MGMTDB.

What could be the reason, and is there any relation between the space addition and the traces being generated in MGMTDB?