Rebalance operation
Disk group rebalance is triggered automatically on ADD, DROP and RESIZE disk operations and on moving a file between hot and cold regions. Running rebalance by explicitly issuing ALTER DISKGROUP ... REBALANCE is called a manual rebalance. We might want to do that, for example, to change the rebalance power. We can also run the rebalance manually if a disk group becomes unbalanced for any reason.
The POWER clause of the ALTER DISKGROUP ... REBALANCE statement specifies the degree of parallelism of the rebalance operation. It can be set to a minimum value of 0, which halts the current rebalance until the statement is implicitly or explicitly re-run. A higher value may reduce the total time it takes to complete the rebalance operation.
The ALTER DISKGROUP ... REBALANCE command by default returns immediately, so that we can run other commands while the rebalance operation takes place in the background. To check the progress of the rebalance operation we can query the V$ASM_OPERATION view.
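For illustration, a manual rebalance and a quick progress check might look something like this (the disk group name DATA and the power value are just placeholders for this sketch):
SQL> alter diskgroup DATA rebalance power 5;
SQL> select group_number, operation, state, power, sofar, est_work, est_minutes from v$asm_operation;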
Three phase power
The rebalance operation has three distinct phases. First, ASM has to come up with the rebalance plan. That will depend on the rebalance reason, disk group size, number of files in the disk group, whether or not disk partnership has to be modified, etc. In any case this shouldn't take more than a couple of minutes.
The second phase is the relocation of extents among the disks in the disk group. This is where the bulk of the time will be spent. As this phase progresses, ASM keeps track of the number of extents moved and the actual I/O performance. Based on that, it calculates the estimated time to completion (GV$ASM_OPERATION.EST_MINUTES). Keep in mind that this is an estimate and that the actual time may change depending on the overall (mostly disk related) load. If the reason for the rebalance was one or more failed disks in a redundant disk group, at the end of this phase the data mirroring is fully re-established.
The third phase is disk compacting (ASM version 11.1.0.7 and later). The idea of the compacting phase is to move the data as close to the outer tracks of the disks as possible. Note that at this stage of the rebalance, the EST_MINUTES will keep showing 0. This is a 'feature' that will hopefully be addressed in the future. The time to complete this phase will again depend on the number of disks, the reason for the rebalance, etc. Overall time should be a fraction of the second phase.
Some notes about rebalance operations
- Rebalance is a per-file operation.
- An ongoing rebalance is restarted if the storage configuration changes, whether because we alter it explicitly or because of a failure or an outage. If the new rebalance fails because of a user error, a manual rebalance may be required.
- There can be one rebalance operation per disk group per ASM instance in a cluster.
- Rebalancing continues across a failure of the ASM instance performing the rebalance.
- The REBALANCE clause (with its associated POWER and WAIT/NOWAIT keywords) can also be used in ALTER DISKGROUP commands for ADD, DROP or RESIZE disks.
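To illustrate the last point, an ADD DISK that runs the rebalance at an explicit power and waits for it to finish might look something like this (the disk group name, disk path and power value are assumptions for the sketch):
SQL> alter diskgroup DATA add disk '/dev/asmdisk5' rebalance power 8 wait;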
Tuning rebalance operations
If the POWER clause is not specified in an ALTER DISKGROUP statement, or when rebalance is run implicitly by an ADD/DROP/RESIZE disk operation, the rebalance power defaults to the value of the ASM_POWER_LIMIT initialization parameter. We can adjust the value of this parameter dynamically. A higher power limit should result in a shorter time to complete the rebalance, but this is by no means linear and it will depend on the (storage system) load, available throughput and underlying disk response times.
The power can be changed for a rebalance that is in progress. We just need to issue another ALTER DISKGROUP ... REBALANCE command with a different value for POWER. This interrupts the current rebalance and restarts it with the modified POWER.
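For example, assuming a disk group named DATA is rebalancing at the default power, something like this would restart the rebalance at a higher power (the name and the value are arbitrary):
SQL> alter diskgroup DATA rebalance power 32;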
Relevant initialization parameters and disk group attributes
ASM_POWER_LIMIT
The ASM_POWER_LIMIT initialization parameter specifies the default power for disk rebalancing in a disk group. The range of values is 0 to 11 in versions prior to 11.2.0.2. Since version 11.2.0.2 the range of values is 0 to 1024, but that still depends on the disk group compatibility (see the notes below). The default value is 1. A value of 0 disables rebalancing.
- For disk groups with COMPATIBLE.ASM set to 11.2.0.2 or greater, the operational range of values is 0 to 1024 for the rebalance power.
- For disk groups that have COMPATIBLE.ASM set to less than 11.2.0.2, the operational range of values is 0 to 11 inclusive.
- Specifying 0 for the POWER in the ALTER DISKGROUP REBALANCE command will stop the current rebalance operation (unless you hit bug 7257618).
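As a sketch, the default power can be checked and adjusted dynamically in the ASM instance like this (the value 4 is arbitrary):
SQL> show parameter asm_power_limit
SQL> alter system set asm_power_limit=4;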
_DISABLE_REBALANCE_COMPACT
Setting initialization parameter _DISABLE_REBALANCE_COMPACT=TRUE will disable the compacting phase of the disk group rebalance - for all disk groups.
_REBALANCE_COMPACT
This is a hidden disk group attribute. Setting _REBALANCE_COMPACT=FALSE will disable the compacting phase of the disk group rebalance - for that disk group only.
_ASM_IMBALANCE_TOLERANCE
This initialization parameter controls the percentage of imbalance between disks. Default value is 3%.
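For illustration only - these are underscore (hidden and unsupported) settings, so treat the following as a sketch and check with Oracle Support before using them in production. The first command sets the initialization parameter in the ASM instance spfile (affecting all disk groups); the second sets the hidden attribute on a single disk group, where DATA is just a placeholder name:
SQL> alter system set "_disable_rebalance_compact"=true scope=spfile;
SQL> alter diskgroup DATA set attribute '_rebalance_compact'='FALSE';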
Processes
The following table has a brief summary of the background processes involved in the rebalance operation.
Process | Description |
---|---|
ARBn | ASM Rebalance Process. Rebalances data extents within an ASM disk group. Possible processes are ARB0-ARB9 and ARBA. |
RBAL | ASM Rebalance Master Process. Coordinates rebalance activity. In an ASM instance, it coordinates rebalance activity for disk groups. In a database instance, it manages ASM disk groups. |
Xnnn | Exadata only - ASM Disk Expel Slave Process. Performs ASM post-rebalance activities. This process expels dropped disks at the end of an ASM rebalance. |
When a rebalance operation is in progress, the ARBn processes will generate trace files in the background dump destination directory, showing the rebalance progress.
Views
In an ASM instance, V$ASM_OPERATION displays one row for every active long-running ASM operation executing in the current ASM instance. GV$ASM_OPERATION will show cluster-wide operations.
During the rebalance, OPERATION will show REBAL, STATE will show the state of the rebalance operation, POWER will show the rebalance power and EST_MINUTES will show the estimated time for the operation to complete.
In an ASM instance, V$ASM_DISK displays information about ASM disks. During the rebalance, the STATE will show the current state of the disks involved in the rebalance operation.
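As a rough sketch, the kind of monitoring queries this refers to might be (column lists trimmed for readability):
SQL> select group_number, operation, state, power, sofar, est_work, est_minutes from gv$asm_operation;
SQL> select group_number, name, state, mount_status from v$asm_disk order by group_number, name;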
Is your disk group balanced
Run the following query in your ASM instance to get the report on the disk group imbalance.
SQL> column "Diskgroup" format A30
SQL> column "Imbalance" format 99.9 Heading "Percent|Imbalance"
SQL> column "Variance" format 99.9 Heading "Percent|Disk Size|Variance"
SQL> column "MinFree" format 99.9 Heading "Minimum|Percent|Free"
SQL> column "DiskCnt" format 9999 Heading "Disk|Count"
SQL> column "Type" format A10 Heading "Diskgroup|Redundancy"
SQL> SELECT g.name "Diskgroup",
100*(max((d.total_mb-d.free_mb)/d.total_mb)-min((d.total_mb-d.free_mb)/d.total_mb))/max((d.total_mb-d.free_mb)/d.total_mb) "Imbalance",
100*(max(d.total_mb)-min(d.total_mb))/max(d.total_mb) "Variance",
100*(min(d.free_mb/d.total_mb)) "MinFree",
count(*) "DiskCnt",
g.type "Type"
FROM v$asm_disk d, v$asm_diskgroup g
WHERE d.group_number = g.group_number and
d.group_number <> 0 and
d.state = 'NORMAL' and
d.mount_status = 'CACHED'
GROUP BY g.name, g.type;
Percent Minimum
Percent Disk Size Percent Disk Diskgroup
Diskgroup Imbalance Variance Free Count Redundancy
------------------------------ --------- --------- ------- ----- ----------
ACFS .0 .0 12.5 2 NORMAL
DATA .0 .0 48.4 2 EXTERN
PLAY 3.3 .0 98.1 3 NORMAL
RECO .0 .0 82.9 2 EXTERN
SQL> column "Imbalance" format 99.9 Heading "Percent|Imbalance"
SQL> column "Variance" format 99.9 Heading "Percent|Disk Size|Variance"
SQL> column "MinFree" format 99.9 Heading "Minimum|Percent|Free"
SQL> column "DiskCnt" format 9999 Heading "Disk|Count"
SQL> column "Type" format A10 Heading "Diskgroup|Redundancy"
SQL> SELECT g.name "Diskgroup",
100*(max((d.total_mb-d.free_mb)/d.total_mb)-min((d.total_mb-d.free_mb)/d.total_mb))/max((d.total_mb-d.free_mb)/d.total_mb) "Imbalance",
100*(max(d.total_mb)-min(d.total_mb))/max(d.total_mb) "Variance",
100*(min(d.free_mb/d.total_mb)) "MinFree",
count(*) "DiskCnt",
g.type "Type"
FROM v$asm_disk d, v$asm_diskgroup g
WHERE d.group_number = g.group_number and
d.group_number <> 0 and
d.state = 'NORMAL' and
d.mount_status = 'CACHED'
GROUP BY g.name, g.type;
Percent Minimum
Percent Disk Size Percent Disk Diskgroup
Diskgroup Imbalance Variance Free Count Redundancy
------------------------------ --------- --------- ------- ----- ----------
ACFS .0 .0 12.5 2 NORMAL
DATA .0 .0 48.4 2 EXTERN
PLAY 3.3 .0 98.1 3 NORMAL
RECO .0 .0 82.9 2 EXTERN
NOTE: The above query is from Oracle Press book Oracle Automatic Storage Management, Under-the-Hood & Practical Deployment Guide, by Nitin Vengurlekar, Murali Vallath and Rich Long.
Hello,
Our client has implemented a Two Node Oracle 10g R2 RAC on HP-UX v2. The database is on ASM and on an HP EVA 4000 SAN. The database size is around 1.2 TB.
Now the requirement is to migrate the Database and Clusterware files to a New SAN (EVA 6400).
SAN to SAN migration can't be done as the customer didn't get license for such storage migration.
My immediate suggestion was to connect the New SAN and present the LUNs and add the Disks from New SAN and wait for rebalance to complete. Then drop the Old Disks which are on Old SAN.
[Exact Steps To Migrate ASM Diskgroups To Another SAN Without Downtime. (Doc ID 837308.1).]
The client wants us to suggest alternate solutions, as they are worried that presenting LUNs from the old SAN and the new SAN at the same time may cause issues, and also that if the rebalance fails it may affect the database. They are also not able to estimate the time to rebalance a 1.2 TB database across disks from two different SANs. The downtime window is only 48 hours.
Is it possible to roughly estimate the time to rebalance a 1 TB banking solution database?
Rgds,
Thiru
Hi Thiru,
Your suggestion is fine. I would just add that it is best to connect both SANs and then ADD new disks and DROP existing disks in the same alter diskgroup command. That way, there will be only one rebalance operation.
There is really no need for downtime here, unless for some reason they need to shut down servers so they can connect to new SAN.
If the customer is worried they need to make sure they have current backups, and if they cannot afford downtime, they need to have some sort of standby system.
Now to your question about estimating the rebalance time. There is no easy answer to that :( You may want to consider the following to get an idea about the time it will take:
1. Review their ASM alert logs and look for past rebalance operations. How long did those take? What was the operation? If they added a new disk/LUN, the alert log will show you how long it took. Note the rebalance power used.
2. Use the new SAN to create a disk group on another server. Restore the production database there. Now add 1TB worth of disks and see how long it takes to rebalance. Now drop 1TB and see how long it takes. This will not be the same as in production as you will have two different SANs, with different I/O characteristics, but it will give you an idea. You only need a single instance setup for this test, as the rebalance runs on one node anyway.
Hope this helps.
Cheers,
Bane
Hi,
There are a few other questions that interest me - how are file extents being "evenly distributed across all disks in a disk group":
1) There used to be some level of imbalance - at least up to 11.1 a parameter existed _asm_imbalance_tolerance which wasn't 0 by default. (Probably for performance reasons).
2) Consider that very often the LUNs of an ASM disk group are created as volumes from the same RAID array, striped at the hardware level (let's say it's RAID10 or RAID0). Due to the double striping it's possible for two adjacent extents of the same file to be placed on the same physical disk (even though they belong to different LUNs) - thus defeating the idea behind the striping.
So I've been wondering - how big is this issue, what is the probability of it happening, and is ASM doing something special to try to avoid such situations?
Hi Yavor,
File extents are evenly distributed to all disks by placing file extents on all disks in a disk group. Let's say we have 5 disks in an external redundancy disk group with 1MB allocation unit size. And let's say we create a 10 MB file. Each extent will be 1MB, so there will be 2 extents per disk. But if our file is say 11MB, there will be an extra extent on one of the disks - hence the imbalance.
1) When extents of a new file are being placed on disks in a disk group, a starting disk is picked at random. Following my example above, if we now create a 3 MB file and ASM picks as the starting disk the one that has an extra extent, that disk will end up with two extra extents compared to some other disks. The hidden parameter _asm_imbalance_tolerance was created so that ASM performs an extra check and makes sure the imbalance stays within the limit set by that parameter. Yes, that extra check and the possible extra extent moves have a small performance hit.
2) You are correct - that might be an issue and ASM cannot do anything about it. That being said, in my experience I haven't seen this being an issue, or at least I haven't had a case where that was the cause of any problems.
Cheers,
Bane
Hi
Excellent explanation Bane
Suppose we are doing a rebalance of a disk group with power 6 and we feel that the rebalance is slow and would like to increase the power limit - how do we do it?
Do we have to stop the current rebalance with power 0 and then start the rebalance manually with the required power limit, say 100?
In this case, will ASM start the rebalance from the beginning or will it continue from where it left off? Is there any marker in the disks of the disk group?
Thanks
Raghu
Thanks Raghu,
To change the rebalance power you just run the rebalance command with a new power level. There is no need for the intermediate (power 0) step.
When you change the power in the middle of a running rebalance, the current operation stops and the new rebalance continues where the previous one ended. ASM is not going to put all the extents back where they were before the initial rebalance started.
How does ASM know where the previous rebalance got to?
First, ASM tracks the rebalance in the Continuing Operations Directory (see http://asmsupportguy.blogspot.com.au/2012/01/asm-file-number-4.html). This is more important for forcibly aborted rebalance operations, e.g. when ASM instance 1 crashes in the middle of a rebalance, a second ASM instance can use that information to take over and complete the rebalance.
Second, at the start of every rebalance, ASM looks at the extent distribution map and creates the rebalance plan. That way a new rebalance knows what it needs to do to get the disk group in a balanced state.
Please let me know if that answers your questions.
Cheers,
Bane
Hello,
So glad I came upon this page!
I am currently troubleshooting an ASM anomaly.
This ASM environment holds two databases, one over 1TB. There are 12 disks in the diskgroup, all presented from a SAN Raid5 and all 500GB in size.
From this snapshot of iostat you can see that of the 12 LUNs in the disk group, the two in question (emcpowers and emcpowerw) are the only ones consistently hitting high await, svctm and %util during normal business user processing.
I suspect it is a matter of how ASM has striped the data for this database, which includes one large 'blob' (700GB).
Is it possible we are hitting Bug 7699985: UNBALANCED DISTRIBUTION OF FILES ACROSS DISKS?
Although the imbalance report does not seem too bad -
-----------------output------------------
Columns Described in Script Minimum
Percent Percent Disk Diskgroup
Diskgroup Name Imbalance Variance Free Count Redundancy
------------------------------ --------- --------------- ------- ----- ----------
ASM_FRA .3 0 56.7 45 EXTERN
ASM_DATA1 .7 0 44.2 108 EXTERN
2 rows selected.
Is there anyway to verify this?
Thanks,
Michele
Linux OSW v3.0
zzz ***Thu Aug 23 14:00:49 EDT 2012
avg-cpu: %user %nice %system %iowait %steal %idle
0.63 0.00 0.46 11.67 0.00 87.24
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
emcpowerk 0.00 0.00 30.33 0.67 11744.00 21.33 379.53 1.62 52.19 10.38 32.17
emcpowers 0.00 0.00 30.00 0.00 13616.00 0.00 453.87 4.33 150.81 33.19 99.57
emcpowerw 0.00 0.00 22.33 0.00 10213.33 0.00 457.31 3.88 175.52 44.79 100.03
emcpowerz 0.00 0.00 32.00 0.00 13098.67 0.00 409.33 1.04 32.50 5.44 17.40
emcpoweraa 0.00 0.00 29.00 0.00 13194.67 0.00 454.99 1.63 56.10 9.70 28.13
emcpowerab 0.00 0.00 28.33 4.67 12416.00 20.67 376.87 0.64 19.35 4.42 14.60
emcpowerac 0.00 0.00 32.00 0.00 12629.00 0.00 394.66 0.93 29.07 6.79 21.73
emcpowerad 0.00 0.00 31.00 0.33 11984.00 10.67 382.81 0.74 23.49 6.23 19.53
emcpowerae 0.00 0.00 25.67 0.00 10965.33 0.00 427.22 1.29 50.22 11.96 30.70
emcpoweraf 0.00 0.00 28.67 0.00 12587.33 0.00 439.09 0.70 24.05 5.03 14.43
emcpowero 0.00 0.00 13.29 1.99 4826.58 31.89 317.91 0.41 26.89 6.02 9.20
emcpowerh 0.00 0.00 11.96 2.33 5464.45 51.16 386.09 0.30 20.81 7.47 10.66
You didn't tell me your ASM version. Bug 7699985 is fixed in 11.2. If you are on an earlier version the workaround is to set init parameter _asm_imbalance_tolerance to 0 (in ASM instance) and restart ASM.
It's easy to find out if your disks are unbalanced. Just select total_mb, free_mb from v$asm_disk and see if free_mb is similar for all 12 disks. If not, your disks are not balanced.
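Something along these lines should do, where DATA is just a placeholder for your disk group name:
SQL> select d.name, d.total_mb, d.free_mb
from v$asm_disk d, v$asm_diskgroup g
where d.group_number = g.group_number and g.name = 'DATA'
order by d.name;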
But the problem is more likely to be a hot block(s) issue. While ASM spreads the files across all disks, the minimum storage unit is 1MB (I don't know your allocation unit size, but 1MB is default). We can fit quite a few oracle blocks in that 1MB, so if you have hot blocks on those two disks, we would see something like that.
The other thing you can do is request another 500GB LUN from your storage admin. Then drop emcpowerw and add that new LUN (at the same time so that there is only one rebalance). Return 500GB (emcpowerw) to the storage admin. Now check if the high q/await/%util migrated to some other disk or went away. If it went away, do the same with emcpowers. If the issue simply migrates to another disk, you have hot blocks and you need to chase those down.
Cheers,
Bane
I am facing this same issue, i.e.: some disks in a diskgroup are hot as reported by iostats and OEM,
only it is an online redo log disk group (one of three; +LOG1,+LOG2,+LOG3).
+LOG1 is very unbalanced as to IO, while +LOG2 and +LOG3 are very evenly balanced.
Two disks have over a billion IOs while many others are less than 1 million.
ASM version is 11.2.0.1.0
Linux RedHat 5.8
EMC power SAN
Columns Described in Script Percent Minimum
Percent Disk Size Percent Disk Diskgroup
Diskgroup Name Imbalance Variance Free Count Redundancy
--------------- --------- --------- ------- ----- ----------
LOG1____________1.3_______.0________84.6____16____EXTERN
LOG2____________1.3_______.0________84.6____16____EXTERN
LOG3_____________.8_______.0________69.6____16____EXTERN
ARCHLOG_________1.5_______.0________99.6_____8____EXTERN
CRS_VOTE_________.0_______.0________91.4_____1____EXTERN
DATAFILE_________.7_______.0__________.7____38____EXTERN
TEMPFILE_________.1_______.0__________.6_____6____EXTERN
=> OEM Disk Group I/O Cumulative Statistics (zero relative) <=
Disk Group___Avg__Avg__Total______Read_______Write
_____________Resp_Thru_I/O Call___Total______Total
__LOG1_0000__0.01_0.01_13475148___5425489____8049659
__LOG1_0001__0____0____3029180____3020449____8731
__LOG1_0002__0.01_0____445260_____444938_____322
__LOG1_0003__0.01_0____534222_____534222_____0
__LOG1_0004__0____0.02_30350066___30325422___24644
__LOG1_0005__0.02_0____446791_____446721_____70
__LOG1_0006__0.01_0____461979_____461886_____93
__LOG1_0007__0____0.25_386913815__386781883__131932
__LOG1_0008__0____1.02_1588774475_1582051461_6723014
__LOG1_0009__0____0.22_339324949__339181388__143561
__LOG1_0010__0.01_0____444896_____444868_____28
__LOG1_0011__0____0.18_280314002__280149889__164113
__LOG1_0012__0____0.5__778163759__772053271__6110488
__LOG1_0013__0.02_0____444044_____444035_____9
__LOG1_0014__0.02_0____438811_____438783_____28
__LOG1_0015__0____0.87_1342205588_1329654489_12551099
=> iostats (one relative) <=
ASM Disk___SAN Disk____Rds___Wrts_______R/sec__W/sec___Util
LOG3_0005 "emcpowerad"_0.4___25.8_______3.2____1651.2__2.32
LOG2_0004 "emcpowerao"_1.6___38.2______12.8____1638.4__2.46
LOG2_0016 "emcpowerap"_1.6___23.8______12.8____1638.4__1.92
LOG2_0008 "emcpowerp"__0.4___33.2_______3.2____1638.4__2.24
LOG1_0016 "emcpowerat"_420.8__0.4___13456________12.8_17.4
LOG1_0014 "emcpoweraa"_0.4__ _0_________3.2_______0____0.02
LOG2_0011 "emcpowerv"__1.6___14.6______12.8_____647.8__1.12
LOG1_0011 "emcpowerc"__1.6____0________12.8_______0____0.06
LOG1_0015 "emcpowerd"__1.6____0________12.8_______0____0.02
LOG1_0013 "emcpoweri"__420.8__0_____13427.2_______0___17.2
LOG1_0012 "emcpowerz"__3.4____0_______70.4________0____0.06
LOG2_0006 "emcpoweru"__1.6___23.8______12.8____1638.4__1.94
LOG2_0007 "emcpowerq"__1.6____0________12.8_______0____0.04
LOG2_0005 "emcpowero"__0.4____0_________3.2_______0____0
LOG2_0010 "emcpowern"__0.4___54.2_______3.2____1638.4__3.06
LOG2_0002 "emcpowerm"__0.4___29.4_______3.2____1638.4__1.7
LOG2_0001 "emcpowers"__1.6____0.4______12.8_______3.2__0.02
LOG2_0014 "emcpowerar"_20.4__26______9830.4____1638.4__5.84
LOG3_0004 "emcpowerx"__0______0_________0_________0____0
LOG2_0013 "emcpoweram"_17_____0______8192_________0____3
LOG3_0007 "emcpoweral"_0______0.4_______0_______87.2___0.08
LOG3_0002 "emcpowerak"_0_____31_________0_____1638.4___3.08
LOG3_0006 "emcpoweraf"_0______0_________0________0_____0
LOG2_0009 "emcpoweraq"_18.6___0______9011________0_____3.38
LOG2_0015 "emcpoweraj"_20.4___8.6____9830.4____535.4___4.28
LOG3_0003 "emcpowerah"_0_____22_________0____1638.4____1.68
LOG2_0012 "emcpowerw"__20.4__29.2____9830.4___1651.2___6.7
LOG2_0003 "emcpowerae"_20.6___1______9836.8____106.4___3.54
Hi Don,
Very interesting. Given that the LOG disk groups have redo logs only, hot blocks are out of the question. I assume all LOG disk groups are for the same database, right? Not sure what LOG1_0016 is doing here, as LOG1 shows 16 disks (LOG1_0000-LOG1_0015)?
Can you get me the following please and I will have a look:
1. spool /tmp/asm_gv.html
set markup HTML on
break on INST_ID on GROUP_NUMBER
alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS';
select SYSDATE "Date and Time" from DUAL;
select * from GV$ASM_OPERATION order by 1;
select * from V$ASM_DISKGROUP_STAT order by 1, 2;
select * from V$ASM_DISK_STAT order by 1, 2, 3;
select * from V$ASM_ATTRIBUTE where NAME not like 'template%' order by 1;
select * from V$VERSION where BANNER like '%Database%' order by 1;
select * from V$ASM_CLIENT order by 1, 2;
show parameter asm
show parameter cluster
show parameter instance
show parameter spfile
show sga
spool off
exit
2. The result of
asmcmd find +LOG1 "*"
asmcmd find +LOG2 "*"
asmcmd find +LOG3 "*"
It may be best to email me that at bane.radulovic at gmail.com and once we find out what is going on, we can post back here.
Cheers,
Bane
Hi Bane,
I surely have too many disks in my log groups (16). It's nice to see someone confirm that craziness; since I did not set it up I do not know why it was done.
What’s even more interesting is that OEM and Unix iostat show the same anomaly on LOG1 group disks.
Since LOG2 and LOG3 stats look good, LOG3 is already shared between nodes, and there is plenty of space, I plan to put LOG1 files on LOG2 and be done with LOG1 as you suggested.
Thanks a lot,
Don C.
PS: Thanks for the new command 'asmcmd find "*"'. Very cool.
Bane,
Thank you for the quick response. We are using version 11.1.0.7. I am the Sys Admin, so I asked the DBA to check v$asm_disk for me and the disks look pretty balanced -
TOTAL_MB FREE_MB
--------------- ---------------
511993 228048
511993 227985
511993 228039
511993 226152
511993 228002
511993 228014
511993 228021
511993 228003
511993 228061
511993 226119
511993 226159
511993 226175
This same symptom occurs in our acceptance environment with similar data (a large blob), so you are very likely to be correct about the hot blocks.
How do we investigate and mitigate a hot block issue?
I will also talk to storage admin to see if we can do the test with a new LUN.
Thank you,
Michele
Hi Michele,
Yes, as expected the disk group is balanced. Your DBA should know how to track down hot blocks. If not, they should engage Oracle Support.
Cheers,
Bane
Thank you again for your help!
It's a pleasure!
Bane,
Nice blog...
I have a few more questions regarding ASM
Can you explain the memory initialization parameters of Oracle ASM, like
DB_CACHE_SIZE
LARGE_POOL_SIZE
SHARED_POOL_SIZE
How does ASM use the memory that is allocated to them?
Thanks
naveen
Hi Naveen,
An ASM instance manages ASM metadata, so the buffer cache in an ASM instance will have ASM metadata blocks. There will be no Oracle database blocks in ASM SGA at any time.
In the shared pool of an ASM instance you will find SQL statements, but also the file extent maps that need to be passed to the database instances.
The large pool would be used for ASM related packages, like dbms_diskgroup.
The SGA requirements for an ASM instance will depend on the number of databases it serves, the size of the files it manages, the number of files, etc.
Unfortunately, there is no good info on SGA tuning for ASM...
Feel free to let me know if you have a more specific question or concern.
Cheers,
Bane
Thanks Bane,
Are there any books that discuss these parameters in more detail?
Thanks
naveen
Not really. My favourite book, Oracle Automatic Storage Management: Under-the-Hood & Practical Deployment Guide, barely mentions them. And with the push for automatic memory management, I don't expect much on this topic in the future either.
Cheers,
Bane
Hi,
I need some clarification about dropping disks from ASM disk groups on Windows.
Currently our disk groups have 4 and 2 disks each.
SQL> select group_number, name, TOTAL_MB, FREE_MB from V$asm_disk_stat order by name;
GROUP_NUMBER NAME TOTAL_MB FREE_MB
------------ ------------------------------ ---------- ----------
1 DATA1_0000 255997 244604
1 DATA1_0001 255997 244550
1 DATA1_0002 255997 244590
1 DATA1_0003 255997 244524
2 DATA2_0000 255997 235618
2 DATA2_0001 255997 235642
2 DATA2_0002 255997 235626
2 DATA2_0003 255997 235621
3 DATA3_0000 255997 236167
3 DATA3_0001 255997 236172
4 FLASH_0000 255997 252834
4 FLASH_0001 255997 252829
And I am going to use the drop commands below to release the disks -
alter diskgroup FLASH drop disk FLASH_0001;
alter diskgroup DATA3 drop disk DATA3_0001;
alter diskgroup DATA2 drop disk DATA2_0003;
alter diskgroup DATA2 drop disk DATA2_0002;
alter diskgroup DATA2 drop disk DATA2_0001;
alter diskgroup DATA1 drop disk DATA1_0003;
alter diskgroup DATA1 drop disk DATA1_0002;
alter diskgroup DATA1 drop disk DATA1_0001;
I need some clarification -
1. How can we reduce the execution time of the above commands? The ASM_POWER_LIMIT value is 1.
2. What is the actual command to execute this?
3. For rollback, is this the right operation -
ALTER DISKGROUP FLASH ADD DISK '\\.\ORCLDISKFLASH1' NAME FLASH_0001 NOFORCE ;
4. And importantly - how can I estimate the duration before executing the drop disk command?
Thanks in advance.
Hi Anil,
1. How can we reduce the execution time of the above commands? The ASM_POWER_LIMIT value is 1.
Use the REBALANCE POWER clause like this:
SQL> alter diskgroup FLASH drop disk FLASH_0001 rebalance power 10;
If you have RAC (say 4 nodes), you can drop disks from all 4 disk groups at the same time - one per ASM instance. If you are running a single instance, you have to do this serially. Drop disks from FLASH first, wait for the rebalance to complete, drop disks from DATA1, wait for the rebalance to complete, etc.
2. What is the actual command to execute this?
The commands would be:
SQL> alter diskgroup FLASH drop disk FLASH_0001 rebalance power 10;
If you are running a single instance ASM, wait for the rebalance to complete, then proceed to drop disks from DATA1. If you are in RAC, you can run the next command immediately:
SQL> alter diskgroup DATA1 drop disk DATA1_0001, DATA1_0002, DATA1_0003 rebalance power 10;
Note that all 3 disks are dropped in a single command. That is better than 3 separate DROP DISK commands as there will be only one rebalance operation - saving you time to complete the drop.
SQL> alter diskgroup DATA2 drop disk DATA2_0001, DATA2_0002, DATA2_0003 rebalance power 10;
Same here - drop all 3 disks in a single command. And finally, the last one:
SQL> alter diskgroup DATA3 drop disk DATA3_0001 rebalance power 10;
3. For Rollback, Is it right operation -
ALTER DISKGROUP FLASH ADD DISK '\\.\ORCLDISKFLASH1' NAME FLASH_0001 NOFORCE ;
Not sure what you mean by rollback. If you want to cancel the DROP DISK command for some reason, you can do this:
SQL> alter diskgroup FLASH undrop disks;
That can be done BEFORE the drop completes, i.e. while the rebalance is still running.
If you want to add the disk back, once it is dropped, you do this:
SQL> alter diskgroup FLASH add disk '\\.\ORCLDISKFLASH1';
or
SQL> alter diskgroup FLASH add disk '\\.\ORCLDISKFLASH1' rebalance power 10;
The second ADD DISK should complete faster as it has the higher rebalance power. Note that there is no need to specify the disk name and there is no need for NOFORCE option.
4. How can I estimate the duration before executing the drop disk command?
There is no way to estimate the time before you start the drop. Once you issue DROP DISK, run the following query:
SQL> select GROUP_NUMBER, INST_ID, OPERATION, STATE, SOFAR, EST_WORK, EST_MINUTES from GV$ASM_OPERATION;
The EST_MINUTES will give you an indication of the time to complete. For a complete discussion on that topic have a look at
http://asmsupportguy.blogspot.com.au/2012/07/when-will-my-rebalance-complete.html.
Cheers,
Bane
Thanks a lot... Bane.
It will really be very helpful to me.
Does the rbal background process only kick in when adding/dropping ASM disks?
When there is no rebalancing to be done, does the rbal process read any kind of ASM metadata?
The main role of the RBAL is to coordinate the rebalance - come up with the plan, make sure the rebalance power is observed, etc, but it also takes care of interrupted rebalance operations.
So, yes it will 'kick in' during disk drop and add, but remember that other operations can also trigger the rebalance - disk resize, file zone change, manual rebalance, etc.
Cheers,
Bane
Hi Bane,
I've been doing some research on ASM high redundancy and noticed that the imbalance was very high - 50%.
Digging further into x$kffxp I can see that the primary extents are evenly distributed but the mirror and 2nd mirror copy are not...
Is this expected behaviour?
I'm running 11.2.0.3.
Chris
Hi Chris,
No, this is not expected. Everything should be evenly balanced.
Would you be able to share your findings to see if I can spot anything unusual? You can post here or if you want to provide more details, feel free to email me at bane.radulovic at gmail.com.
Cheers,
Bane
Hi Bane,
Thanks for getting back to me.
Here's some analysis:
1) The imbalance is 50%
Percent Minimum
Percent Disk Size Percent Disk Diskgroup
Diskgroup Imbalance Variance Free Count Redundancy
------------------------------ --------- --------- ------- ----- ----------
DATA 50.2 .0 95.9 18 HIGH
RECO 50.1 .0 84.2 17 HIGH
REDO .3 .0 91.0 4 HIGH
2) Analysis of an example file from x$kffxp; as you can see, the primary extents are appropriately distributed while the mirror/2nd mirror extents are not, which confirms the imbalance finding.
select DISK_KFFXP,LXN_KFFXP,count(1)
from x$kffxp
where GROUP_KFFXP=1
and NUMBER_KFFXP=262
group by DISK_KFFXP, LXN_KFFXP
order by LXN_KFFXP, DISK_KFFXP
DISK_KFFXP LXN_KFFXP COUNT(1)
---------- ---------- ----------
2 0 401
3 0 402
4 0 402
5 0 400
6 0 401
7 0 401
8 0 401
9 0 400
10 0 401
11 0 400
12 0 402
13 0 401
14 0 401
15 0 401
16 0 401
17 0 400
18 0 401
19 0 401
2 1 196
3 1 224
4 1 677
5 1 523
6 1 192
7 1 196
8 1 415
9 1 396
10 1 397
11 1 425
12 1 396
13 1 396
14 1 389
15 1 389
16 1 689
17 1 518
18 1 415
19 1 384
2 2 205
3 2 175
4 2 525
5 2 679
6 2 207
7 2 205
8 2 386
9 2 407
10 2 405
11 2 377
12 2 408
13 2 404
14 2 412
15 2 414
16 2 515
17 2 686
18 2 389
19 2 418
Prior to digging into this I attempted a rebalance operation and an ASM disk check, which passed.
Thanks again,
Chris.
Hi Chris,
Thanks for posting the details.
There are actually two separate issues here.
First, your disk groups are probably well balanced, but that imbalance query is not accurate for disk groups with a lot of disks and a lot of free space. I will add that to my post, but for now run this query, which should give us a more accurate (im)balance figure for your disk groups:
select g.name "Diskgroup",
100*(max((d.total_mb-d.free_mb + (128*g.allocation_unit_size/1048576))/(d.total_mb + (128*g.allocation_unit_size/1048576)))-min((d.total_mb-d.free_mb + (128*g.allocation_unit_size/1048576))/(d.total_mb + (128*g.allocation_unit_size/1048576))))/max((d.total_mb-d.free_mb + (128*g.allocation_unit_size/1048576))/(d.total_mb + (128*g.allocation_unit_size/1048576))) "Imbalance",
100*(max(d.total_mb)-min(d.total_mb))/max(d.total_mb) "Variance",
100*(min(d.free_mb/d.total_mb)) "MinFree",
count(*) "DiskCnt",
g.type "Type"
from &asm_disk d , &asm_diskgroup g
where
d.group_number = g.group_number and
d.group_number <> 0 and
d.state = 'NORMAL' and
d.mount_status = 'CACHED'
group by g.name, g.type;
For the record, and credits, that query is from MOS Doc ID 1271089.1.
BTW, your disk group RECO also has a lot of free space and not many disks, so for that one the original query returns the correct result.
The second issue, with secondary/tertiary extent imbalance is a bug. I did not know about it until you pointed this out. I tested it and managed to reproduce it straight away. I then checked if someone logged the bug for it and they sure did. Someone from Exadata team actually spotted this. Now the bug is unpublished, with no public entries, so there is really no point to give you the bug number.
Note that the second issue is not actually causing the disk group imbalance. While the extents for that one file are not evenly balanced, some other file will have more extents on the disks where this one has fewer, so the disk group should be balanced. The new imbalance query should confirm that.
Cheers,
Bane
Thanks Bane, this environment happens to be an ODA loaner which I'm currently assessing.
Using the query in Doc ID 1271089.1 I'm getting similar results:
Columns Described in Script Minimum
Percent Percent Disk Diskgroup
Diskgroup Name Imbalance Varience Free Count Redundancy
------------------------------ --------- ---------- ------- ----- ----------
DATA 47.6 0 96.0 18 HIGH
RECO 49.6 0 84.2 17 HIGH
REDO .3 0 91.0 4 HIGH
Columns Described in Script Partner Partner Inactive
Count Space % Failgroup Partnership
Diskgroup Name Imbalance Imbalance Count Count
------------------------------ --------- --------- --------- -----------
DATA 2 50.0 18 0
RECO 2 50.0 17 0
REDO 0 .0 4 0
It would be great if you could upload the bug number for my records.
Thanks,
Chris.
Sure,
Please email me bane.radulovic at gmail.com.
Cheers,
Bane
Alejandro, thank you for this very informative article on ASM rebalancing! We recently had a Fiber Channel connected NetApp FAS6080 (aka IBM N7900) with an ASM disk group, set for external redundancy and originally 28TB in size, spend more than 3 hours in the second phase (extent relocation) and an additional 3 hours in the third phase (compacting) while adding 4TB of LUNs. Each LUN is 2TB in size and the new LUNs were added together in one command at a rebalance power of 10. All LUNs in the disk group are the same size. The third phase was not a “fraction of the second phase” in duration. This is an 11.2.0.3 grid infrastructure with the April 2013 PSU applied. ASM compatibility is 11.2.0.0. No databases were served by the particular ASM instance used to add the LUNs.
It would seem unlikely that the ASM instance knows the physical geometry of the NetApp filer LUNs to move data to the outside edge of the physical spindles in this third phase. This appears best suited for JBOD implementations. Should we, as a best practice, simply use “_DISABLE_REBALANCE_COMPACT=TRUE”? And if so, is that set at the ASM instance or the database instances served?
Thanks again!
Bane, apologies for referring to you as Alejandro above – cut and pasted the wrong name :’)
No worries at all.
I agree that the third phase may not take just a "fraction of the second phase". In fact, I have also seen this take hours. I guess I need to correct that in the post.
You are also right about the compacting being a waste of time on non-JBOD systems. You can set _DISABLE_REBALANCE_COMPACT to TRUE in ASM instances. Now, that is not what Oracle calls the best practice, but I think it should be.
Cheers,
Bane
Hello,
How can I accurately determine the rebalance power performance impact on the application databases when swapping out (45) 1088GB devices for (90) 500GB devices?
There is no formula or procedure to do that. The most important factors will be the size of the disk group (that would be a known value) and the actual I/O load at the time you are going to do this (that could be known if your load is very stable or if you are going to shut down all databases, but it sounds like you want to do it with no downtime). Any spare I/O bandwidth can be used to perform the disk swap.
If you have no record of your previous disk add/drop stats, you can start the process with power 5 and see how it goes. You can always change it to 1 if it slows things down significantly or go higher if it's fine.
Cheers,
Bane
Hi Bane... So we have recently added around 3TB of space to our ASM disk group named DATA01, and while running the rebalance step I found that the compact rebalance phase ran for almost 6-7 hours; we had this issue for the first time.
Also, I now see that there are a lot of trace files generated in the management database, i.e. MGMTDB.
What could be the reason, and is there any relation between the space addition and the traces being generated in MGMTDB?