We want to alert on specific drives that tend to fill up quickly when something goes wrong, and we want those alerts at different levels than the defaults.
Sounds simple enough, right? An override and it’s time for coffee. Well, being the careful person I am, I decided that perhaps a test was in order. So I created a folder on the drive and put a 10 MB file in it called 1.log, plus the following batch file:
:loop
copy 1.log %time:~0,2%%time:~3,2%%time:~6,2%.log
PING 188.8.131.52 -n 1 -w 2000 >NUL
goto loop
Simply put, this script copies 1.log to a new file whose name is the current hour, minute and second. Then it pings an address with a 2000 ms (2 second) timeout as a delay, and does it all again.
Don’t wander off while this is running or use it for evil please 😉
So I let this run until my drive was about 55% full, or was that 45% empty? But I was bothered by the complete absence of any kind of alert, alarm or even status change for the server in question.
If you look up the definition of redundancy you will find two things of note: the good kind of redundancy (6. Electronics Duplication or repetition of elements in electronic equipment to provide alternative functional channels in case of failure.) and the kind we are going to deal with here (2. Something redundant or excessive; a superfluity.)
I didn’t notice it the first time, but there is a paragraph on the properties of the Logical Disk Free Space monitor, and although I am glad it wasn’t harder to find, I am bothered by its content.
The Logical Disk Free Space monitoring routine is a high configurable solution that enables Operators to set varying threshold values for system and non-system logical disk volumes. In addition separate threshold values can be set for Warning and Error states.
Since logical disk volumes may vary in size from a few gigabytes to many terabytes or more the Logical Disk Free Space monitoring routine requires that an Operator indicate both the Megabyte and Percentage based threshold values that must be passed before the Warning and Error thresholds reached. This means that in order for the threshold to be reached both the Megabyte and Percentage based threshold values for the System or Non-System Drive must be breached.
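To make that last sentence concrete, here is a minimal sketch of the "both must be breached" rule as I read it. The function name is made up for illustration, and whether the real monitor compares with "less than" or "less than or equal" is an assumption on my part.

```python
def is_breached(free_mb: float, free_pct: float,
                mb_threshold: float, pct_threshold: float) -> bool:
    """Return True only when free space is below BOTH thresholds.

    Hypothetical helper: the strict '<' comparison is an assumption,
    not something taken from the management pack itself.
    """
    return free_mb < mb_threshold and free_pct < pct_threshold


# A drive with 52% and 36452 MB free, percent override set to 55,
# Mbytes threshold left at 1000 MB (the 1 GB default mentioned in this post):
print(is_breached(36452, 52, mb_threshold=1000, pct_threshold=55))

# Same drive, but with the Mbytes threshold raised to roughly 55% of the disk:
print(is_breached(36452, 52, mb_threshold=38555, pct_threshold=55))
```

The first call reports no breach because the Mbytes check is still healthy, and the second reports a breach because both checks are now below their thresholds.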
So let’s say that, like me, you have several drives of varying sizes that you want alerts on, and the defaults from the table below just don’t do it for you. Like me, you probably figured you could just set the Non-System Drive Error Percent Threshold and be done with it. Then, like me, you find that you get no alarm, because although your free space is below the Non-System Drive Error Percent Threshold, it is still above the Non-System Drive Error Mbytes Threshold, which defaults to 1 GB. Sadly, your option now is to check the full size of each drive you are monitoring, do the math to figure out how many MB is X% of that drive, and enter that value in the Non-System Drive Error Mbytes Threshold in addition to the % you already set. Then, an interval later, you will get an alert like the one quoted at the end of this post.
System Drive Free Space Thresholds (Defaults)
System Drive Error Mbytes Threshold
System Drive Error Percent Threshold
System Drive Warning Mbytes Threshold
System Drive Warning Percent Threshold
Non-System Drive Free Space Thresholds (Defaults)
Non-System Drive Error Mbytes Threshold
Non-System Drive Error Percent Threshold
Non-System Drive Warning Mbytes Threshold
Non-System Drive Warning Percent Threshold
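If you would rather not do the per-drive math by hand, it can be sketched in a few lines. The function names here are my own, and treating 1 MB as 1024 * 1024 bytes is an assumption about how the monitor counts Mbytes.

```python
import shutil


def mb_for_percent(total_mb: float, percent: float) -> int:
    """Megabytes that correspond to `percent` of a drive of `total_mb` megabytes."""
    return round(total_mb * percent / 100)


def drive_total_mb(path: str) -> float:
    """Total size in MB of the volume holding `path` (assumes 1 MB = 1024 * 1024 bytes)."""
    return shutil.disk_usage(path).total / (1024 * 1024)


# A drive of about 70100 MB with a 55% threshold needs this Mbytes value:
print(mb_for_percent(70100, 55))
```

On the server itself, something like mb_for_percent(drive_total_mb("J:\\"), 55) would give the number to type into the Mbytes override for that drive.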
The Solution :
Set overrides for both Mbytes and Percent thresholds as they both have to be breached to throw an alarm.
If you hate math perhaps you could just set the MB alarm to some unreasonably large value so that it is always breached, thus making the % monitor the only one that changes.
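Here is a small sketch of why the "unreasonably large MB value" trick works, again assuming the monitor alarms only when free space is below both thresholds. The names and the sample drive sizes are invented for the example.

```python
ALWAYS_BREACHED_MB = 10**15  # hypothetical "unreasonably large" Mbytes threshold


def is_breached(free_mb: float, free_pct: float,
                mb_threshold: float, pct_threshold: float) -> bool:
    """Assumed rule: alarm only when free space is below BOTH thresholds."""
    return free_mb < mb_threshold and free_pct < pct_threshold


# (name, total MB, free MB) for two made-up drives of very different sizes
drives = [("small", 20_000, 9_000), ("large", 4_000_000, 2_500_000)]

results = {}
for name, total_mb, free_mb in drives:
    free_pct = free_mb / total_mb * 100
    # The Mbytes check is always breached, so only the 50% check decides.
    results[name] = is_breached(free_mb, free_pct, ALWAYS_BREACHED_MB, 50)

print(results)
```

The small drive (45% free) alarms and the large one (62.5% free) does not, with no per-drive MB math needed.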
Update – Nov 30, 2009
Billy made some comments that started me thinking about a larger solution, and I fear it all comes down to overrides.
First, create a series of groups that match your needs, like Alarm System at 100MB, Alarm non-System at 1GB, Alarm System at 100GB, Alarm System at 5%, Alarm non-System at 15%, Alarm System at 50%, really whatever makes you happy. Isn’t that what we all really want? Then create a series of overrides based on the groups. For example, for the override targeted at “Alarm System at 100MB”, set the system MB to 100 MB and set the system % to something unreasonably high, like 100%, so that the percent check is always breached and only the MB value matters. When creating a percentage-based override, work it the other way: set the % to what you want and the MB to 1,000,000,000,000,000 or something similar. Then, as you configure each new machine, you just decide how you want it to work for that machine and add it to the static groups you defined earlier. Someone please correct me if I am wrong, but you may want to decide whether % or MB is more important and set the Enforced check box on that override, just in case you ever assign a machine to both groups. I figure this will help SCOM determine which override should apply, but I have not tested that and could be wrong there.
Hey Microsoft :
Is it not the point of a percent-based alarm that you don’t need to go to every drive of a different size and figure it out for yourself? I would expect that a person could say “send me an alarm whenever a drive is 50% full”, but at the same time might also want to know when some very old, small drives have less than 10 GB free, even if this does not constitute 50% of the drive. I simply can’t wrap my head around the idea that because “logical disk volumes may vary in size from a few gigabytes to many terabytes or more” there would be any situation where you would want to set 2 different thresholds that both have to be triggered to cause an alarm. Does the alarm in your house only go off if a burglar has both your front and back doors open at the same time?
Last modified time: 19/11/2009 3:13:35 PM
Alert description: The disk J: on computer X is running out of disk space. The values that exceeded the threshold are 52% free space and 36452 free Mbytes.