The Problem:
We want to alert at different levels than the defaults on specific drives that tend to fill up quickly when something goes wrong.
The Process:
Sounds simple enough, right? An override and it’s time for coffee. Well, being the careful person I am, I decided perhaps a test was in order. So I created a folder on the drive and put a 10 MB file in it called 1.log, plus the following batch file:
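:10
rem Build a file name from the current time as HHMMSS
set Now=%time:~0,2%%time:~3,2%%time:~6,2%
rem Copy the 10 MB file to a new name to eat disk space
copy 1.log %Now%.log
rem Crude 2 second delay: wait up to 2000 ms for a reply (use an address that does not answer)
PING 1.1.1.1 -n 1 -w 2000 >NUL
goto 10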
Simply put, this script copies 1.log to a new file whose name is the current hour, minute and second. Then it tries to ping something, waiting up to 2000 ms (2 seconds) for a reply, and does it all again.
Don’t wander off while this is running or use it for evil please 😉
So I let this run till my drive was about 55% full, or was that 45% empty? But I was bothered by the complete absence of any kind of alert or alarm or even a status change on the server in question.
If you look up the definition of redundancy here you will find two things of note: the good kind of redundancy (6. Electronics Duplication or repetition of elements in electronic equipment to provide alternative functional channels in case of failure.) and the kind we are going to deal with here (2. Something redundant or excessive; a superfluity.)
I didn’t notice it the first time, but there is a paragraph on the properties of the Logical Disk Free Space Monitor, and although I am glad it wasn’t harder to find, I am bothered by its content.
Configuration
The Logical Disk Free Space monitoring routine is a highly configurable solution that enables Operators to set varying threshold values for system and non-system logical disk volumes. In addition, separate threshold values can be set for Warning and Error states.
Since logical disk volumes may vary in size from a few gigabytes to many terabytes or more, the Logical Disk Free Space monitoring routine requires that an Operator indicate both the Megabyte and Percentage based threshold values that must be passed before the Warning and Error thresholds are reached. This means that in order for the threshold to be reached, both the Megabyte and Percentage based threshold values for the System or Non-System Drive must be breached.
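To make that "both must be breached" rule concrete, here is a minimal Python sketch of the logic as I read it from the paragraph above. It is not the management pack's actual code. The `Thresholds` class and `is_breached` function are made-up names, the strict less-than comparison is an assumption, the 55% override is a hypothetical value, and the drive size of roughly 70100 MB is inferred from the sample alert in this post (36452 MB free at 52%).

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Free-space thresholds for one severity level (hypothetical structure)."""
    mbytes: float   # free megabytes threshold
    percent: float  # free space percentage threshold


def is_breached(free_mb: float, total_mb: float, t: Thresholds) -> bool:
    """Return True only when BOTH the MB and the % thresholds are crossed."""
    free_percent = 100.0 * free_mb / total_mb
    # Assumption: a threshold counts as breached when free space is strictly below it.
    return free_mb < t.mbytes and free_percent < t.percent


# Drive with 36452 MB free out of an assumed 70100 MB total (about 52% free).
free_mb, total_mb = 36452, 70100

default_error = Thresholds(mbytes=1000, percent=5)   # non-system drive defaults
percent_only = Thresholds(mbytes=1000, percent=55)   # only the % overridden
both_set = Thresholds(mbytes=38555, percent=55)      # MB set to 55% of 70100

print(is_breached(free_mb, total_mb, default_error))  # defaults: no alert
print(is_breached(free_mb, total_mb, percent_only))   # % alone: still no alert
print(is_breached(free_mb, total_mb, both_set))       # both overridden: alert
```

The middle case is the trap: the percentage is breached, but 36452 MB free is nowhere near the 1000 MB default, so nothing fires.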
So let’s say, like me, you have several drives of varying sizes that you want alerts on, and the defaults from the table below just don’t do it for you. Like me, you probably figured you could just set the Non-System Drive Error Percent Threshold and be done with it. Then, like me, you find that you get no alarm, because although you are below the Non-System Drive Error Percent Threshold you are still over the Non-System Drive Error Mbytes Threshold, which defaults to 1000 MB (roughly 1 GB). Sadly, your option now is to check the full size of each drive you are monitoring, do the math to figure out how many MB is X% of that drive, and enter that value in Non-System Drive Error Mbytes Threshold in addition to the % you already set. Then an interval later you will get an alert; the sample alert text appears near the end of this post, just before the comments.
System Drive Free Space Thresholds (Defaults)
Parameter                                     Default Value
System Drive Error Mbytes Threshold           100
System Drive Error Percent Threshold          5
System Drive Warning Mbytes Threshold         200
System Drive Warning Percent Threshold        10

Non-System Drive Free Space Thresholds (Defaults)
Parameter                                     Default Value
Non-System Drive Error Mbytes Threshold       1000
Non-System Drive Error Percent Threshold      5
Non-System Drive Warning Mbytes Threshold     2000
Non-System Drive Warning Percent Threshold    10
The Solution:
Set overrides for both Mbytes and Percent thresholds as they both have to be breached to throw an alarm.
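The matching Mbytes value for a chosen percentage is simple arithmetic. Here is a small Python sketch, assuming you already know each drive's total size in MB. The function name `mbytes_for_percent`, the drive letters, the sizes and the 15% figure are all made up for illustration.

```python
import math


def mbytes_for_percent(total_mb: float, percent: float) -> int:
    """Return the free-MB threshold equal to `percent` of a drive of `total_mb` MB."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Round up so the MB condition is met whenever the percentage condition is.
    return math.ceil(total_mb * percent / 100)


# Hypothetical drives and their total sizes in MB.
drives = {"E:": 70100, "F:": 512000, "G:": 2097152}

for letter, total_mb in drives.items():
    # Value to enter in the Mbytes threshold alongside a 15% override.
    print(letter, mbytes_for_percent(total_mb, 15))
```

Every drive size needs its own Mbytes override, which is exactly the chore a percentage threshold was supposed to spare us.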
If you hate math perhaps you could just set the MB alarm to some unreasonably large value so that it is always breached, thus making the % monitor the only one that changes.
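Here is a rough Python sketch of why the big-number trick works, again assuming the monitor only alerts when both values are below their thresholds. The function names and drive sizes are hypothetical, and I have not checked how large a value the override field will actually accept.

```python
# Hypothetical sentinel: far more MB than any real volume will ever have free.
ALWAYS_BREACHED_MB = 10**15


def alerts(free_mb: float, total_mb: float,
           mb_threshold: float, pct_threshold: float) -> bool:
    """Assumed monitor logic: alert only when both thresholds are breached."""
    return free_mb < mb_threshold and (100.0 * free_mb / total_mb) < pct_threshold


def alerts_percent_only(free_mb: float, total_mb: float, pct_threshold: float) -> bool:
    """What we actually want: a pure percentage check."""
    return (100.0 * free_mb / total_mb) < pct_threshold


# Drives of very different sizes (in MB), each checked at 10% and at 60% free.
for total_mb in (20_000, 500_000, 8_000_000):
    for free_fraction in (0.10, 0.60):
        free_mb = total_mb * free_fraction
        combined = alerts(free_mb, total_mb, ALWAYS_BREACHED_MB, 50)
        # With the sentinel in place, the two checks always agree.
        assert combined == alerts_percent_only(free_mb, total_mb, 50)
        print(total_mb, free_fraction, combined)
```

One override value then works for every drive, whatever its size.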
Update – Nov 30, 2009
Billy made some comments that started me thinking about a larger solution, and I fear it all comes down to overrides.
First create a series of groups that match your needs, like Alarm System at 100MB, Alarm non-System at 1GB, Alarm System at 100GB, Alarm System at 5%, Alarm non-System at 15%, Alarm System at 50%, really whatever makes you happy. Isn’t that what we all really want? Then create a series of overrides based on the groups. For example, for the override targeted at “Alarm System at 100MB”, set the system MB to 100 and the system % to 100, so the percentage check is always breached and the MB value decides. When creating a percentage-based override, work it the other way: set the % to what you want and the MB to 1,000,000,000,000,000 or something similar. Then as you configure each new machine you just decide how you want it to work for that machine and add it to the static groups you defined earlier. Someone please correct me if I am wrong, but you may want to decide whether % or MB is more important and set the enforced check box on that override, just in case you ever assign a machine to both groups. I figure this will help SCOM determine which override should apply, but I have not tested that and could be wrong there.
Hey Microsoft:
Is it not the point of a percent-based alarm that you don’t need to go to every drive of a different size and figure it out for yourself? I would expect that a person could say "send me an alarm whenever a drive is 50% full", but at the same time may also want to know when some very old small drives have less than 10GB free, even if this does not constitute 50% of the drive. I simply can’t wrap my head around the idea that because “logical disk volumes may vary in size from a few gigabytes to many terabytes or more” there would be any situation where you would want to set 2 different thresholds that both have to be triggered to cause an alarm. Does the alarm in your house only go off if a burglar has both your front and back doors open at the same time?
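For contrast, the behaviour I am asking for is an "either one" check. This is only a sketch of the wish, not of anything the monitor does today. The function name `wanted_alert` and the drive sizes are made up, while the 50% and 10GB figures come from the paragraph above.

```python
def wanted_alert(free_mb: float, total_mb: float,
                 min_free_mb: float, min_free_pct: float) -> bool:
    """Alert when EITHER threshold is breached (the wished-for behaviour)."""
    free_pct = 100.0 * free_mb / total_mb
    return free_mb < min_free_mb or free_pct < min_free_pct


# Thresholds: 10 GB (10240 MB) free, or 50% free.
# Small old drive: 60% free but only 6000 MB left, so the MB rule fires.
print(wanted_alert(6000, 10000, 10240, 50))
# Large drive: 40% free with plenty of MB left, so the % rule fires.
print(wanted_alert(400_000, 1_000_000, 10240, 50))
# Large drive at 70% free: neither rule fires.
print(wanted_alert(700_000, 1_000_000, 10240, 50))
```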
The alert mentioned earlier looked like this:
Last modified time: 19/11/2009 3:13:35 PM
Alert description: The disk J: on computer X is running out of disk space. The values that exceeded the threshold are 52% free space and 36452 free Mbytes.
Hi,
I have also had an issue where the monitor was showing that the volume on the server was fine when in fact it had 1% available disk space. The monitor had been set up to alert on warning at 15%, however the Mbytes had been left at the default, a really small amount. Setting the Mbytes to a very large number worked for me as well.
Is there another way of solving this issue? Can an override be created to disable the Mbytes and only work off percentage? When dealing with a large environment with more than 200 servers it is not practical to set different Mbytes for every server, so it would be ideal if it only worked off percentage.
How about disabling the monitor with an override for all logical disks, and enabling a custom monitor that only does percentage?
You are quite right, I think this monitor design is flawed. Sometime I would like someone to explain the logic to me, ‘cause I don’t get it. From my perspective you have 2 options. The first is to decide whether your environment is going to run all % or all MB, then set the type you are not using to a really massive number or 100%. This will cause that type to always be breached, resulting in an alert whenever the other condition is breached. The second option you mention here is a good one: override and disable this monitor, and create your own from scratch that does what you want. I think when I get a minute I will add to this entry how to do that.
Thanks.
Hey Scott,
I forgot to ask you another question, but I guess your title answers this for me. We are looking at upgrading to SCOM 2007 R2, which is currently not in place in our prod environment but has been set up in our test environment. I have not had the chance to play around with it, and I thought I’d pose the question in case by some miracle there is a solution. Does R2 have anything in place that solves our problem?
Sadly, all the content on this blog is R2; we went right to it in production as we hadn’t launched yet.
I think the only hope is an updated management pack.
Thanks for your help Scott.
Made a quick update that might be a slightly nicer solution for you.