ASM Support Guy: November 2015

This tiny Perl script reports the error type and count for ASM disks in engineered systems, including Exadata. In those systems, ASM uses griddisks, which are created from celldisks. The celldisks are in turn created from the physical disks.

errorCount.pl script

To quickly check for errors on any of those disks, we can use the errorCount.pl Perl script. This is the complete script with comments:

#!/usr/bin/perl

# Process lines from standard input or a file(s)

while (<>) {

# Strip whitespace

s/\s+//g;

# Get a disk name

if ( /name/ ) {

$name = $_;

}

# Get error type for non-zero counts

elsif ( /err.*?Count:[1-9]/ ) {

$errTypeCount = $_;

# Print the disk name and the error type/count

print "$name, $errTypeCount\n";

}

}

Stripped to the bare bones, errorCount.pl becomes:

#!/usr/bin/perl

while (<>) {

s/\s+//g;

if ( /name/ ) { $name = $_ }

elsif ( /err.*?Count:[1-9]/ ) { print "$name, $_\n" }

}
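To see what the whitespace stripping and the two patterns do, here is a minimal sketch of the same loop wrapped in a subroutine and run on a few sample lines. The subroutine name report_errors and the sample input are illustrative assumptions, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same logic as errorCount.pl, applied to a list of lines
sub report_errors {
    my @lines = @_;
    my $name  = '';
    my @report;
    for my $line (@lines) {
        # Remove all whitespace: "  errorCount:   80" becomes "errorCount:80"
        (my $s = $line) =~ s/\s+//g;
        # Remember the most recent disk name
        if ( $s =~ /name/ ) { $name = $s }
        # Keep only error counts that start with a non-zero digit
        elsif ( $s =~ /err.*?Count:[1-9]/ ) { push @report, "$name, $s" }
    }
    return @report;
}

# Made-up sample in the shape of a cellcli detail listing
my @sample = (
    "\t name:        \t CD_03_exacell01\n",
    "\t errorCount:  \t 80\n",
    "\t name:        \t CD_04_exacell01\n",
    "\t errorCount:  \t 0\n",
);
print "$_\n" for report_errors(@sample);
```

Only the first disk is reported, because the second one has a zero count and the [1-9] test rejects it.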

Usage

Use the script with the output of the cellcli -e list physicaldisk|celldisk|griddisk detail command, on an Exadata storage cell. For example:

# cellcli -e list griddisk detail | errorCount.pl

name:DATA_CD_00_exacell03, errorCount:342

name:RECO_CD_00_exacell03, errorCount:728

name:RECO_CD_06_exacell03, errorCount:8

Use the script with the output of a dcli command, which is normally run on a database server. For example:

# dcli -g cell_group -l root cellcli -e list celldisk detail | errorCount.pl

exacell01:name:CD_03_exacell01, exacell01:errorCount:80

exacell01:name:CD_06_exacell01, exacell01:errorCount:64

The above shows the errors on cell disks 3 and 6, on storage cell 1. Have a closer look at those cell disks:

# dcli -c exacell01 -l root cellcli -e list celldisk CD_03_exacell01,CD_06_exacell01 detail

exacell01: name: CD_03_exacell01

exacell01: comment:

exacell01: creationTime: 2015-09-22T10:59:08+10:00

exacell01: deviceName: /dev/sdd

exacell01: devicePartition: /dev/sdd

exacell01: diskType: HardDisk

exacell01: errorCount: 80

exacell01: freeSpace: 0

exacell01: id: bb74cae4-bb47-4d95-b7ee-e3cc5bdf780f

exacell01: interleaving: none

exacell01: lun: 0_3

exacell01: physicalDisk: E1D9RY

exacell01: raidLevel: 0

exacell01: size: 557.859375G

exacell01: status: normal

exacell01:

exacell01: name: CD_06_exacell01

exacell01: comment:

exacell01: creationTime: 2015-09-22T10:59:08+10:00

exacell01: deviceName: /dev/sdg

exacell01: devicePartition: /dev/sdg

exacell01: diskType: HardDisk

exacell01: errorCount: 64

exacell01: freeSpace: 0

exacell01: id: 404565b2-1be7-4171-8678-9991157156da

exacell01: interleaving: none

exacell01: lun: 0_6

exacell01: physicalDisk: E1EB4J

exacell01: raidLevel: 0

exacell01: size: 557.859375G

exacell01: status: normal
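In the dcli report earlier, the cell name appears twice on every line, because dcli prefixes each output line with it. The sketch below is one way to split that prefix off. It assumes every line has the "cell: attribute: value" shape shown above and that cell names contain no colons. The subroutine name dcli_errors and its output format are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn dcli-prefixed "cell: attribute: value" lines
# into "cell disk attribute=value" strings for non-zero error counts
sub dcli_errors {
    my @lines = @_;
    my %disk;    # most recent disk name seen, keyed by cell
    my @report;
    for my $line (@lines) {
        (my $s = $line) =~ s/\s+//g;
        # A limit of 3 keeps colons inside the value (timestamps, names like 20:0)
        my ( $cell, $attr, $value ) = split /:/, $s, 3;
        # Skip separator lines and attributes with an empty value
        next unless defined $value && length $value;
        if ( $attr eq 'name' ) { $disk{$cell} = $value }
        elsif ( $attr =~ /^err.*Count$/ && $value =~ /^[1-9]/ ) {
            my $d = $disk{$cell} // '(unknown)';
            push @report, "$cell $d $attr=$value";
        }
    }
    return @report;
}

print "$_\n" for dcli_errors(
    "exacell01: name:       CD_03_exacell01\n",
    "exacell01: errorCount: 80\n",
    "exacell01: status:     normal\n",
);
```

For the sample lines this prints the cell name once, followed by the disk name and the count.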

Use the script with the sundiag [physicaldisk|celldisk|griddisk]-detail.out files. For example, on a celldisk detailed report:

# errorCount.pl celldisk-detail.out

name:CD_00_exacell03, errorCount:1070

name:CD_04_exacell03, errorCount:4200

name:CD_06_exacell03, errorCount:8

name:FD_02_exacell03, errorCount:5300

Or on a physical disk detailed report:

# errorCount.pl physicaldisk-detail.out

name:20:0, errMediaCount:1000

name:20:5, errMediaCount:2000

name:FLASH_1_0, errHardWriteCount:3000

name:FLASH_1_0, errMediaCount:4000

name:FLASH_1_0, errSeekCount:5000

name:FLASH_1_1, errOtherCount:6000

name:FLASH_4_0, errHardReadCount:7000

Yes, I made the numbers up, to make the output interesting.

The diamond operator (<>) in the while loop lets us process multiple files, like this:

# errorCount.pl celldisk-detail.out physicaldisk-detail.out

...

But a quicker way to do the above would be:

# cat *detail.out | errorCount.pl

name:CD_03_dmq1cel04, errorCount:2

name:CD_07_dmq1cel04, errorCount:2

name:CD_09_dmq1cel04, errorCount:1

name:CD_11_dmq1cel04, errorCount:1

name:DATA_CD_03_dmq1cel04, errorCount:2

name:DATA_CD_07_dmq1cel04, errorCount:2

name:DATA_CD_09_dmq1cel04, errorCount:1

name:DATA_CD_11_dmq1cel04, errorCount:1
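With several files concatenated, the report no longer says which file a line came from. When the diamond operator reads files named on the command line, Perl keeps the current file name in the $ARGV variable, so a variant of the script could print it. This is a minimal sketch under that approach. The subroutine name errors_by_file is hypothetical, and unlike the original it only runs when file names are given.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of errorCount.pl: prefix each report line with its source file
sub errors_by_file {
    local @ARGV = @_;    # <> reads the files listed in @ARGV
    my $name = '';
    my @report;
    while ( my $line = <> ) {
        $line =~ s/\s+//g;
        if ( $line =~ /name/ ) { $name = $line }
        elsif ( $line =~ /err.*?Count:[1-9]/ ) {
            # $ARGV holds the name of the file currently being read
            push @report, "$ARGV: $name, $line";
        }
    }
    return @report;
}

# Run only when file names are given; this sketch does not read standard input
if (@ARGV) {
    print "$_\n" for errors_by_file(@ARGV);
}
```

The file names are only available when the files are passed as arguments. Input piped through cat arrives on standard input as a single stream.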

Check any count

The script can easily be modified to report on any disk attribute with a non-zero value. For example, to check if there is free space on a cell disk, we can use the modified script freeSpace.pl:

#!/usr/bin/perl

while (<>) {

s/\s+//g;

if ( /name/ ) { $name = $_ }

elsif ( /freeSpace:[1-9]/ ) { print "$name, $_\n" }

}

Like this:

# dcli -g cell_group -l root cellcli -e list celldisk detail | freeSpace.pl

exacell01:name:CD_00_exacell01, exacell01:freeSpace:528.6875G
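Rather than keeping one script per attribute, the attribute pattern could be passed in as a parameter. The following is a minimal sketch of that idea. The subroutine name report_attr and the sample lines are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical generalisation: report any attribute matching $attr_re
# whose value starts with a non-zero digit
sub report_attr {
    my ( $attr_re, @lines ) = @_;
    my $name = '';
    my @report;
    for my $line (@lines) {
        (my $s = $line) =~ s/\s+//g;
        if ( $s =~ /name/ ) { $name = $s }
        elsif ( $s =~ /(?:$attr_re):[1-9]/ ) { push @report, "$name, $s" }
    }
    return @report;
}

# Made-up sample lines in the shape of a celldisk detail listing
print "$_\n" for report_attr(
    'freeSpace',
    "name:      CD_00_exacell01\n",
    "freeSpace: 528.6875G\n",
    "name:      CD_01_exacell01\n",
    "freeSpace: 0\n",
);
```

Passing 'err.*?Count' as the first argument gives the behaviour of errorCount.pl. Like the original, the [1-9] test only matches values whose first digit is non-zero, so a value written as 0.5G would be skipped.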

Conclusion

In engineered systems, including Exadata, ASM uses griddisks, which are created from celldisks. The celldisks are in turn created from the physical disks. To quickly check for errors on any of those disks, we can use the errorCount.pl Perl script, either directly on the cell or via the dcli utility, which we run on a database server.

ASM Support Guy

November 25, 2015
