December 8, 2015

ASM data scrubbing

According to Wikipedia, data scrubbing is “an error correction technique that uses a background task to periodically inspect main memory or storage for errors, and then correct detected errors using redundant data in form of different checksums or copies of data. Data scrubbing reduces the likelihood that single correctable errors will accumulate, leading to reduced risks of uncorrectable errors.”

Oracle documentation says that ASM disk scrubbing “improves availability and reliability by searching for data that may be less likely to be read. Disk scrubbing checks logical data corruptions and repairs them automatically in normal and high redundancy disks groups. The scrubbing process repairs logical corruptions using the mirror disks. Disk scrubbing can be combined with disk group rebalance to reduce I/O resources.” This feature is available in ASM version 12c.

Setup

First let's look at my setup. I have a single normal redundancy disk group DATA, with ASM filter driver (AFD) disks.

[grid@dbserver]$ sqlplus / as sysasm
SQL*Plus: Release 12.1.0.2.0 Production on Tue Dec 8 14:08:22 2015
...

SQL> select NAME, TYPE, STATE from v$asm_diskgroup_stat;

NAME         TYPE   STATE
------------ ------ -----------
DATA         NORMAL MOUNTED

SQL> select DISK_NUMBER, PATH from v$asm_disk_stat;

DISK_NUMBER PATH
----------- ----------------
        0 AFD:ASMDISK01
        2 AFD:ASMDISK03
        4 AFD:ASMDISK05
        1 AFD:ASMDISK02
        6 AFD:ASMDISK07
        5 AFD:ASMDISK06
        3 AFD:ASMDISK04

7 rows selected.

SQL>

And I have some datafiles in that disk group:

[grid@dbserver]$ asmcmd ls +DATA/BR/DATAFILE
SYSTEM.261.887497785
USERS.262.8874978313
UNDOTBS1.269.887497831
SYSAUX.270.887497739
T1.305.897911209
T2.306.897911479
T3.307.897911659

Scrubbing a file

We can scrub a disk group, an individual disk or an individual file. In this post I will demonstrate the feature by scrubbing a single file.

To scrub a file, we connect to an ASM instance and run the ALTER DISKGROUP SCRUB FILE command:

SQL> alter diskgroup DATA scrub file '+DATA/BR/DATAFILE/t3.307.897911659' repair power high wait;
...

The REPAIR keyword is optional, in which case a corruption would be reported, but not repaired.

In the ASM alert log we see the time stamped command, and once the scrubbing finishes, we see the outcome, which is something like this:

Tue Dec 08 11:55:56 2015
SQL> alter diskgroup DATA scrub file '+DATA/BR/DATAFILE/t3.307.897911659' repair power high wait
Tue Dec 08 11:55:56 2015
NOTE: Waiting for scrubbing to finish
Tue Dec 08 11:56:43 2015
NOTE: Scrubbing finished
Tue Dec 08 11:56:43 2015
SUCCESS: alter diskgroup DATA scrub file '+DATA/BR/DATAFILE/t3.307.897911659' repair power high wait

There were no corruptions in my file, so none were reported.

While the scrubbing was running, I saw two ASM processes doing the actual work:

[grid@dbserver]$ ps -ef | grep asm_sc
grid     17902     1  0 11:27 ?        00:00:00 asm_scrb_+ASM
grid     24365     1  0 11:49 ?        00:00:01 asm_scc0_+ASM
...

SCRB - ASM disk scrubbing master process that coordinates disk scrubbing operations.

SCCn - ASM disk scrubbing slave check process that performs the checking operations. The possible processes are SCC0-SCC9.
We would see additional processes during the repair operation:

SCRn - ASM disk scrubbing slave repair process that performs repair operation. The possible processes are SCR0-SCR9.
  
SCVn - ASM disk scrubbing slave verify process that performs the verifying operations. The possible processes are SCV0-SCV9.

Corrupted block found

To make this interesting, let's corrupt one block - say block 200 - in that datafile. First I used find_block.pl script, to locate both copies of the block:

[grid@dbserver ]$ $ORACLE_HOME/perl/bin/perl find_block.pl +DATA/BR/DATAFILE/t3.307.897911659 200
dd if=/dev/sdo1 bs=8192 count=1 skip=1460552 of=block_200.dd
dd if=/dev/sdd1 bs=8192 count=1 skip=1462088 of=block_200.dd

Then I used those dd commands to extract the blocks.

[root@dbserver]# dd if=/dev/sdo1 bs=8192 count=1 skip=1460552 of=block_200_primary.dd
[root@dbserver]# dd if=/dev/sdd1 bs=8192 count=1 skip=1462088 of=block_200_mirror.dd
Using diff command I verified both copies are exactly the same.

[root@dbserver]# diff block_200_primary.dd block_200_mirror.dd
[root@dbserver]#

Let's now corrupt the mirror block, which is not trivial with the ASM filter driver.

First create a text file, to use as a corruption. Sure, I could have just used /dev/zero for that.

[root@dbserver]# od -c block_200_mirror.dd > block_200_corrupt.txt

Shut down the database and ASM, then unload the ASM filter driver, otherwise I cannot dd into that disk.

[root@dbserver]# modprobe -r oracleafd
[root@dbserver]# lsmod | grep oracle
oracleacfs           3308260  0
oracleadvm            508030  0
oracleoks             506741  2 oracleacfs,oracleadvm

Now corrupt the mirror copy of block 200.

[root@dbserver]# dd if=block_200_corrupt.txt of=/dev/sdd1 bs=8192 count=1 seek=1462088
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.00160027 s, 5.1 MB/s

Confirm the mirror block is corrupt, by reading the content again.

[root@dbserver]# dd if=/dev/sdd1 bs=8192 count=1 skip=1462088 of=block_200_corr.dd
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.00152298 s, 5.4 MB/s

[root@dbserver]# diff block_200_primary.dd  block_200_corr.dd
Binary files block_200_primary.dd and block_200_corr.dd differ
[root@dbserver]#

As we can see, the content of the mirror block is now different.

Scrub a file

Load the ASM filter driver back.

[root@dbserver]# insmod /lib/modules/.../oracleafd.ko

And restart the ASM instance.

We can now scrub the datafile.

SQL> alter diskgroup DATA scrub file '+DATA/BR/DATAFILE/t3.307.897911659' wait;
...

The ASM alert log shows that a corrupted block was found.

Tue Dec 08 13:25:48 2015
SQL> alter diskgroup DATA scrub file '+DATA/BR/DATAFILE/t3.307.897911659' wait
Starting background process SCRB
Tue Dec 08 13:25:48 2015
SCRB started with pid=25, OS id=4585
Tue Dec 08 13:25:48 2015
NOTE: Waiting for scrubbing to finish
Tue Dec 08 13:25:49 2015
NOTE: Corrupted block 200 found in file +DATA/BR/DATAFILE/t3.307.897911659
Tue Dec 08 13:25:50 2015
NOTE: Corrupted block 200 found in file +DATA/BR/DATAFILE/t3.307.897911659
Tue Dec 08 13:26:39 2015
NOTE: Scrubbing finished
Tue Dec 08 13:26:39 2015
SUCCESS: alter diskgroup DATA scrub file '+DATA/BR/DATAFILE/t3.307.897911659' wait

In the background dump destination we see newly generated trace files.

[grid@dbserver]$ ls -lrt | tail
-rw-r-----. 1 grid oinstall  36887 Dec  8 13:25 +ASM_scc0_4587.trc
-rw-r-----. 1 grid oinstall  36967 Dec  8 13:25 +ASM_scv0_4592.trc
-rw-r-----. 1 grid oinstall   5088 Dec  8 13:25 +ASM_gmon_16705.trc
-rw-r-----. 1 grid oinstall  36965 Dec  8 13:25 +ASM_scv1_4599.trc
-rw-r-----. 1 grid oinstall 551218 Dec  8 13:26 alert_+ASM.log

And if we have a closer look, in, say the SCC0 trace file, we see the scrubbing report.

[grid@dbserver]$ more ./+ASM_scc0_4587.trc
Trace file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_scc0_4587.trc
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
...
HARD VIOLATION(S) DETECTED!
CORRUPTED BLOCK INFORMATION:
memory address: 0x7f13531fb000
block size: 8192
block number: 200 [0xc8]
block offset in this write request of 1 block(s): 0
file type: datafile (type value: 2)
...
HARD check failed at block 200
One way to get a quick scrubbing report would be to grep the trace files for the relevant key words.

[grid@dbserver]$ egrep -i "corrupted|block number|failed" *sc*trc
+ASM_scc0_4587.trc:CORRUPTED BLOCK INFORMATION:
+ASM_scc0_4587.trc:  block number: 200 [0xc8]
+ASM_scc0_4587.trc:  Magic number  Block size  Block number  Checksum  Head-tail match
+ASM_scc0_4587.trc:FAILED CHECK(S):
+ASM_scc0_4587.trc:  MAGIC NUMBER CHECK FAILED
+ASM_scc0_4587.trc:  BLOCK SIZE CHECK FAILED
+ASM_scc0_4587.trc:  BLOCK NUMBER CHECK FAILED
+ASM_scc0_4587.trc:  HEAD-TAIL MATCH CHECK FAILED
+ASM_scc0_4587.trc:HARD check failed at block 200

+ASM_scv0_4592.trc:CORRUPTED BLOCK INFORMATION:
+ASM_scv0_4592.trc:  block number: 200 [0xc8]
+ASM_scv0_4592.trc:  Magic number  Block size  Block number  Checksum  Head-tail match
+ASM_scv0_4592.trc:FAILED CHECK(S):
+ASM_scv0_4592.trc:  MAGIC NUMBER CHECK FAILED
+ASM_scv0_4592.trc:  BLOCK SIZE CHECK FAILED
+ASM_scv0_4592.trc:  BLOCK NUMBER CHECK FAILED
+ASM_scv0_4592.trc:  HEAD-TAIL MATCH CHECK FAILED
+ASM_scv0_4592.trc:HARD check failed at block 200
+ASM_scv0_4592.trc:NOTE: Corrupted block 200 found in file +DATA/BR/DATAFILE/t3.307.897911659

+ASM_scv1_4599.trc:CORRUPTED BLOCK INFORMATION:
+ASM_scv1_4599.trc:  block number: 200 [0xc8]
+ASM_scv1_4599.trc:  Magic number  Block size  Block number  Checksum  Head-tail match
+ASM_scv1_4599.trc:FAILED CHECK(S):
+ASM_scv1_4599.trc:  MAGIC NUMBER CHECK FAILED
+ASM_scv1_4599.trc:  BLOCK SIZE CHECK FAILED
+ASM_scv1_4599.trc:  BLOCK NUMBER CHECK FAILED
+ASM_scv1_4599.trc:  HEAD-TAIL MATCH CHECK FAILED
+ASM_scv1_4599.trc:HARD check failed at block 200
+ASM_scv1_4599.trc:NOTE: Corrupted block 200 found in file +DATA/BR/DATAFILE/t3.307.897911659
[grid@dbserver]$

Fix the corruption

But the corruption wasn't repaired, since I didn't ask for that. Let's fix it - sorry, scrub it.

SQL> alter diskgroup DATA scrub file '+DATA/BR/DATAFILE/t3.307.897911659' repair wait;
...

This time the ASM alert log shows the block was scrubbed.

Tue Dec 08 13:35:02 2015
SQL> alter diskgroup DATA scrub file '+DATA/BR/DATAFILE/t3.307.897911659' repair wait
Tue Dec 08 13:35:02 2015
NOTE: Waiting for scrubbing to finish
Tue Dec 08 13:35:03 2015
NOTE: Corrupted block 200 found in file +DATA/BR/DATAFILE/t3.307.897911659
Tue Dec 08 13:35:04 2015
NOTE: Scrubbing block 200 in file 307.897911659 in slave
NOTE: Successfully scrubbed block 200 in file 307.897911659
Tue Dec 08 13:35:53 2015
NOTE: Scrubbing finished
Tue Dec 08 13:35:53 2015
SUCCESS: alter diskgroup DATA scrub file '+DATA/BR/DATAFILE/t3.307.897911659' repair wait

Let's check, by reading the block back and comparing it to the primary copy.

[root@dbserver ~]# dd if=/dev/sdd1 bs=8192 count=1 skip=1462088 of=block_200_after_scrub_repair.dd
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000856456 s, 9.6 MB/s

[root@dbserver]# diff block_200_primary.dd block_200_after_scrub_repair.dd
[root@dbserver]#

This shows that the block was successfully repaired.

Conclusion

ASM data scrubbing can detect and repair Oracle data blocks that are affected by media or logical corruptions. It can also correct Oracle data blocks written to incorrect location, or overwritten by an external, non-Oracle process (like in my example).

November 25, 2015

Error count

This tiny Perl script might be used to report the error type and count for ASM disks in engineered systems, including Exadata. In those systems, the ASM uses griddisks, that are created from celldisks. The celldisks are in turn created from the physical disks.

errorCount.pl script

To quickly check for errors on any of those disks, we can use errorCount.pl Perl script. This is the complete script with comments:

#!/usr/bin/perl
# Process lines from standard input or a file(s)
while (<>) {
 # Strip whitespace
 s/\s+//g;
 # Get a disk name
 if ( /name/ ) {
  $name = $_;
 }
 # Get error type for non-zero counts
 elsif ( /err.*?Count:[1-9]/ ) {
  $errTypeCount = $_;
  # Print the disk name and the error type/count
  print "$name, $errTypeCount\n";
 }
}

Stripped to bare bones the errorCount.pl becomes:

#!/usr/bin/perl
while (<>) {
 s/\s+//g;
 if ( /name/ ) { $name = $_ }
 elsif ( /err.*?Count:[1-9]/ ) { print "$name, $_\n" }
}

Usage

Use the script with the output of the cellcli -e list physicaldisk|celldisk|griddisk detail command, on an Exadata storage cell. For example:

# cellcli -e list griddisk detail | errorCount.pl
name:DATA_CD_00_exacell03, errorCount:342
name:RECO_CD_00_exacell03, errorCount:728
name:RECO_CD_06_exacell03, errorCount:8
#

Use the script with the output of a dcli command, that is normally run on a database server. For example:

# dcli -g cell_group -l root cellcli -e list celldisk detail | errorCount.pl
exacell01:name:CD_03_exacell01, exacell01:errorCount:80
exacell01:name:CD_06_exacell01, exacell01:errorCount:64
#

The above shows the errors on cell disks 3 and 6, on storage cell 1. Have a closer look at those cell disks:

# dcli -c exacell01 -l root cellcli -e list celldisk CD_03_exacell01,CD_06_exacell01 detail
exacell01: name:                       CD_03_exacell01
exacell01: comment:
exacell01: creationTime:               2015-09-22T10:59:08+10:00
exacell01: deviceName:                 /dev/sdd
exacell01: devicePartition:            /dev/sdd
exacell01: diskType:                   HardDisk
exacell01: errorCount:                 80
exacell01: freeSpace:                  0
exacell01: id:                         bb74cae4-bb47-4d95-b7ee-e3cc5bdf780f
exacell01: interleaving:               none
exacell01: lun:                        0_3
exacell01: physicalDisk:               E1D9RY
exacell01: raidLevel:                  0
exacell01: size:                       557.859375G
exacell01: status:                     normal
exacell01:
exacell01: name:                       CD_06_exacell01
exacell01: comment:
exacell01: creationTime:               2015-09-22T10:59:08+10:00
exacell01: deviceName:                 /dev/sdg
exacell01: devicePartition:            /dev/sdg
exacell01: diskType:                   HardDisk
exacell01: errorCount:                 64
exacell01: freeSpace:                  0
exacell01: id:                         404565b2-1be7-4171-8678-9991157156da
exacell01: interleaving:               none
exacell01: lun:                        0_6
exacell01: physicalDisk:               E1EB4J
exacell01: raidLevel:                  0
exacell01: size:                       557.859375G
exacell01: status:                     normal
#

Use the script with the sundiag [physicaldisk|celldisk|griddisk]-detail.out files. For example on a celldisk detailed report:

# errorCount.pl celldisk-detail.out
name:CD_00_exacell03, errorCount:1070
name:CD_04_exacell03, errorCount:4200
name:CD_06_exacell03, errorCount:8
name:FD_02_exacell03, errorCount:5300
#

Or on a physical disk detailed report:

# errorCount.pl physicaldisk-detail.out
name:20:0, errMediaCount:1000
name:20:5, errMediaCount:2000
name:FLASH_1_0, errHardWriteCount:3000
name:FLASH_1_0, errMediaCount:4000
name:FLASH_1_0, errSeekCount:5000
name:FLASH_1_1, errOtherCount:6000
name:FLASH_4_0, errHardReadCount:7000
#

Yes, I made the numbers up, to make the output interesting.

The diamond operator (<>) in the while loop, lets us process multiple files, like this:

# errorCount.pl celldisk-detail.out physicaldisk-detail.out
...

But a quicker way to do the above would be:

# cat *detail.out | errorCount.pl
name:CD_03_dmq1cel04, errorCount:2
name:CD_07_dmq1cel04, errorCount:2
name:CD_09_dmq1cel04, errorCount:1
name:CD_11_dmq1cel04, errorCount:1
name:DATA_CD_03_dmq1cel04, errorCount:2
name:DATA_CD_07_dmq1cel04, errorCount:2
name:DATA_CD_09_dmq1cel04, errorCount:1
name:DATA_CD_11_dmq1cel04, errorCount:1
#

Check any count

The script can be easily modified to report on any disk attribute that reports a non-zero count. For example to check if there is free space on a cell disk, we can use the modified script freeSpace.pl:

#!/usr/bin/perl
while (<>) {
 s/\s+//g;
 if ( /name/ ) { $name = $_ }
 elsif ( /freeSpace:[1-9]/ ) { print "$name, $_\n" }
}

Like this:

# dcli -g cell_group -l root cellcli -e list celldisk detail | freeSpace.pl
exacell01:name:CD_00_exacell01, exacell01:freeSpace:528.6875G
#

Conclusion

In engineered systems, including Exadata, the ASM uses griddisks, that are created from celldisks. The celldisks are in turn created from the physical disks. To quickly check for errors on any of those disks, we can use the errorCount.pl Perl script, either directly on the cell or via the dcli utility, that we run on a database server.