[Pkg-nagios-devel] Bug#959956: No out of space check_snmp_storage warning on btrfs

IB Development Team dev at ib.pl
Thu May 7 13:09:05 BST 2020


Package: nagios-snmp-plugins
Version: 2.0.0-1

Hello,

We noticed an out-of-space condition on one of the btrfs filesystems
monitored with check_snmp_storage; the plugin did not detect the
problem.

Filesystem status:

root@mysrv:~# btrfs fi df -b /mnt
Data, single: total=2222194688, used=2222194688
System, DUP: total=8388608, used=16384
System, single: total=4194304, used=0
Metadata, DUP: total=484835328, used=163921920
Metadata, single: total=8388608, used=0
GlobalReserve, single: total=16777216, used=0

root@mysrv:~# df -B1 /mnt
Filesystem       1B-blocks       Used     Available Use% Mounted on
/dev/mapper/myvg 3221225472   2566848512         0  100% /mnt
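
As a rough cross-check of the two outputs above, the sketch below adds up the raw space occupied by the btrfs chunks. It assumes that DUP chunks take twice their logical size on the device and that GlobalReserve is carved out of metadata rather than allocated separately, so it is left out of the sum. The names CHUNKS, COPIES, raw_allocated and data_free are illustrative only.

```python
# Figures copied from `btrfs fi df -b /mnt` above:
# (chunk type, profile, total bytes, used bytes)
CHUNKS = [
    ("Data", "single", 2222194688, 2222194688),
    ("System", "DUP", 8388608, 16384),
    ("System", "single", 4194304, 0),
    ("Metadata", "DUP", 484835328, 163921920),
    ("Metadata", "single", 8388608, 0),
]

# "1B-blocks" column of `df -B1 /mnt`
DEVICE_SIZE = 3221225472

# Assumption: DUP keeps two copies on the device, single keeps one.
COPIES = {"single": 1, "DUP": 2}


def raw_allocated(chunks):
    """Raw bytes on the device that are already assigned to chunks."""
    return sum(total * COPIES[profile] for _, profile, total, _ in chunks)


def data_free(chunks):
    """Bytes still free inside the allocated data chunks."""
    return sum(total - used for kind, _, total, used in chunks if kind == "Data")


print("raw allocated:      ", raw_allocated(CHUNKS))
print("unallocated:        ", DEVICE_SIZE - raw_allocated(CHUNKS))
print("free in data chunks:", data_free(CHUNKS))
```

Under these assumptions the whole device is already allocated to chunks and the data chunks are full, which is consistent with df reporting 0 bytes available even though the metadata chunks still have room.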

Net-SNMP data for this filesystem:

hrStorageTable (used by check_snmp_storage):

iso.3.6.1.2.1.25.2.3.1.1.69 = INTEGER: 69                  // hrStorageIndex
iso.3.6.1.2.1.25.2.3.1.2.69 = OID: iso.3.6.1.2.1.25.2.1.4  // hrStorageType
iso.3.6.1.2.1.25.2.3.1.3.69 = STRING: "/mnt"               // hrStorageDescr
iso.3.6.1.2.1.25.2.3.1.4.69 = INTEGER: 4096                // hrStorageAllocationUnits
iso.3.6.1.2.1.25.2.3.1.5.69 = INTEGER: 786432              // hrStorageSize
iso.3.6.1.2.1.25.2.3.1.6.69 = INTEGER: 626672              // hrStorageUsed
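
To make the hrStorageTable numbers easier to compare with the df output, here is a minimal sketch that converts the allocation units to bytes and derives the used percentage. The function name hr_storage_summary is hypothetical; it is not part of check_snmp_storage.

```python
# Values from the hrStorageTable row for /mnt shown above.
ALLOCATION_UNITS = 4096   # hrStorageAllocationUnits (bytes per unit)
SIZE_UNITS = 786432       # hrStorageSize
USED_UNITS = 626672       # hrStorageUsed


def hr_storage_summary(units, size, used):
    """Convert hrStorage counters to bytes and a used percentage."""
    total_bytes = size * units
    used_bytes = used * units
    return {
        "total_bytes": total_bytes,
        "used_bytes": used_bytes,
        # The table has no "available" column, so free can only be derived.
        "derived_free_bytes": total_bytes - used_bytes,
        "used_pct": 100.0 * used / size,
    }


summary = hr_storage_summary(ALLOCATION_UNITS, SIZE_UNITS, USED_UNITS)
for key, value in summary.items():
    print(key, value)
```

The total and used byte counts match the df figures (3221225472 and 2566848512), and the used percentage comes out just under 80%, but the derived free space is about 654 MB while df reports 0 available.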

dskTable (not used by check_snmp_storage):

iso.3.6.1.4.1.2021.9.1.1.19 = INTEGER: 19                   // dskIndex
iso.3.6.1.4.1.2021.9.1.2.19 = STRING: "/mnt"                // dskPath
iso.3.6.1.4.1.2021.9.1.3.19 = STRING: "/dev/mapper/myvg"    // dskDevice
iso.3.6.1.4.1.2021.9.1.4.19 = INTEGER: -1                   // dskMinimum
iso.3.6.1.4.1.2021.9.1.5.19 = INTEGER: 10                   // dskMinPercent
iso.3.6.1.4.1.2021.9.1.6.19 = INTEGER: 3145728              // dskTotal (Total size of the disk/partition (kBytes))
iso.3.6.1.4.1.2021.9.1.7.19 = INTEGER: 0                    // dskAvail (Available space on the disk)
iso.3.6.1.4.1.2021.9.1.8.19 = INTEGER: 2506688              // dskUsed (Used space on the disk)
iso.3.6.1.4.1.2021.9.1.9.19 = INTEGER: 80                   // dskPercent (Percentage of space used on disk)
iso.3.6.1.4.1.2021.9.1.10.19 = INTEGER: 0                   // dskPercentNode (Percentage of inodes used on disk)
iso.3.6.1.4.1.2021.9.1.11.19 = Gauge32: 3145728             // dskTotalLow (Total size of the disk/partition (kBytes). Together with dskTotalHigh composes 64-bit number.)
iso.3.6.1.4.1.2021.9.1.12.19 = Gauge32: 0                   // dskTotalHigh (Total size of the disk/partition (kBytes). Together with dskTotalLow composes 64-bit number.)
iso.3.6.1.4.1.2021.9.1.13.19 = Gauge32: 0                   // dskAvailLow (Available space on the disk (kBytes). Together with dskAvailHigh composes 64-bit number.)
iso.3.6.1.4.1.2021.9.1.14.19 = Gauge32: 0                   // dskAvailHigh (Available space on the disk (kBytes). Together with dskAvailLow composes 64-bit number.)
iso.3.6.1.4.1.2021.9.1.15.19 = Gauge32: 2506688             // dskUsedLow (Used space on the disk (kBytes). Together with dskUsedHigh composes 64-bit number.)
iso.3.6.1.4.1.2021.9.1.16.19 = Gauge32: 0                   // dskUsedHigh (Used space on the disk (kBytes). Together with dskUsedLow composes 64-bit number.)
iso.3.6.1.4.1.2021.9.1.100.19 = INTEGER: 1                  // dskErrorFlag (Error flag signaling that the disk or partition is under the minimum required space configured for it.)
iso.3.6.1.4.1.2021.9.1.101.19 = STRING: "/mnt: less than 10% free (= 0%)"  // dskErrorMsg (A text description providing a warning and the space left on the disk.)
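
The Low/High pairs in dskTable are two 32-bit halves of one 64-bit kByte counter. The sketch below shows one way to combine them and to compute the free percentage from the reported avail value; the helper name combine_64 is made up for illustration and is not taken from the plugin or the patch.

```python
def combine_64(low, high):
    """Join the Low and High Gauge32 halves into one 64-bit value."""
    return (high << 32) | low


# Values from the dskTable row for /mnt shown above (all in kBytes).
total_kb = combine_64(3145728, 0)   # dskTotalLow, dskTotalHigh
avail_kb = combine_64(0, 0)         # dskAvailLow, dskAvailHigh
used_kb = combine_64(2506688, 0)    # dskUsedLow, dskUsedHigh

free_pct = 100.0 * avail_kb / total_kb
used_pct = 100.0 * used_kb / total_kb

print("total bytes:", total_kb * 1024)
print("used bytes: ", used_kb * 1024)
print("free %:     ", free_pct)
print("used %:     ", round(used_pct, 1))
```

Here the free percentage is 0 while the used percentage is just under 80, so the two do not add up to 100 on this filesystem.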

The cause of the problem is that on btrfs the free space may be less
than total minus used. By default check_snmp_storage checks used space,
which in this case was about 80%, while 0% was available at the same
time and the OS was returning out-of-space errors on write.

The solution is to configure the warn/crit levels for %free instead of
%used and to take the available space from dskTable, because
hrStorageTable does not provide this information (check_snmp_storage
calculates free = total - used, which is wrong for btrfs, as shown
above).
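
A minimal sketch of the difference between the two approaches, using the values from this report. The thresholds (warn at 15% free, crit at 5% free) and the function status_by_free are hypothetical and only illustrate the idea; they do not reproduce the plugin's option handling.

```python
# Values taken from the SNMP walks in this report.
HR_SIZE_UNITS = 786432    # hrStorageSize
HR_USED_UNITS = 626672    # hrStorageUsed
DSK_TOTAL_KB = 3145728    # dskTotal
DSK_AVAIL_KB = 0          # dskAvail

# Hypothetical thresholds in percent free; not taken from the report.
WARN_FREE_PCT = 15.0
CRIT_FREE_PCT = 5.0


def status_by_free(free_pct, warn, crit):
    """Map a free-space percentage to a Nagios-style status string."""
    if free_pct <= crit:
        return "CRITICAL"
    if free_pct <= warn:
        return "WARNING"
    return "OK"


# hrStorageTable: free has to be derived as total - used.
derived_free_pct = 100.0 * (HR_SIZE_UNITS - HR_USED_UNITS) / HR_SIZE_UNITS

# dskTable: free is reported directly by the agent.
reported_free_pct = 100.0 * DSK_AVAIL_KB / DSK_TOTAL_KB

print("hrStorageTable:", status_by_free(derived_free_pct, WARN_FREE_PCT, CRIT_FREE_PCT))
print("dskTable:      ", status_by_free(reported_free_pct, WARN_FREE_PCT, CRIT_FREE_PCT))
```

With these assumed thresholds the derived value (about 20% free) stays OK, while the value reported in dskTable (0% free) is CRITICAL.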

Please find attached a patch generated against the upstream source

https://raw.githubusercontent.com/dnsmichi/manubulon-snmp/master/plugins/check_snmp_storage.pl

that works for us. It adds a new -u switch that makes the plugin use
dskTable and its avail value instead of the default hrStorageTable and
its free = total - used calculation. It also adds a few spaces to the
plugin output to make the messages more readable.

-- 
Regards,
Pawel Boguslawski

IB Development Team
https://dev.ib.pl/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: check_snmp_storage.pl-1.3.3-ib1.patch
Type: text/x-patch
Size: 14304 bytes
Desc: not available
URL: <http://alioth-lists.debian.net/pipermail/pkg-nagios-devel/attachments/20200507/89f02830/attachment.bin>

