SCOM, SNMP and TRAPS or The Good, the Bad and the Ugly : Part 2

If you have made your way through Part 1, then you have written your management pack, complete with your own custom discovery, and imported it into SCOM. Once you have ensured that it is discovering only the devices you wish to manage in this pack, it is time to begin writing the monitors and rules that will apply to the detected devices. As mentioned in Part 1, a program such as MIB Browser can be very handy for sorting through all of the OIDs and the healthy values that correspond to each one.

Creating an SNMP Get Monitor in SCOM 2007 R2

I find the easiest way to create a new monitor or rule is to start with the System Center Operations Manager Console. I will admit it’s not the best and does not give you many of the options you probably want, but I find it’s the easiest way to get the XML started. We can then edit the XML afterwards to get exactly what we want.

We will start in the management console. Click the Authoring tab, expand Management Pack Objects, right-click Monitors and choose Create a Monitor \ Unit Monitor. This opens the screen where you select the monitor type.

First we will create a simple expression Get monitor, and later we will deal with traps, so we pick SNMP – Probe Based Detection – Simple Event Detection – Event Monitor…

Be sure to create this in the management pack you created for the discovery of the object.

Now we have to name our monitor, select a target and add a parent monitor. The target is the device type you defined in Part 1; you may have to click the “View all targets” radio button for it to appear. The parent monitor defines where in the health view tree your new monitor will appear.

Personally I always use the discovery community string, but you could use something custom if you want. The frequency is how often you want the monitor to poll the device. The object identifier, or OID, is the bit that will be used in the SNMP GET call. I find it works most reliably if you don’t have a leading period.
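If you copy OIDs out of a MIB browser, a small helper can tidy them before you paste them into the wizard. This is only a sketch: `normalize_oid` is a made-up name, and the leading-period advice is my own observation rather than documented SCOM behaviour.

```python
import re


def normalize_oid(oid: str) -> str:
    """Return the OID without a leading period, after a basic sanity check."""
    cleaned = oid.strip().lstrip(".")
    # An OID is a dot-separated sequence of non-negative integers.
    if not re.fullmatch(r"\d+(\.\d+)*", cleaned):
        raise ValueError(f"not a numeric OID: {oid!r}")
    return cleaned


# The UPS battery OID used later in this post, with a leading period.
print(normalize_oid(".1.3.6.1.2.1.33.1.2.4.0"))
```

The validation is deliberately strict, so a symbolic name pasted by mistake raises an error instead of ending up in the monitor.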

We need to create an expression that causes an alarm. I will keep the expressions simple so you can get a feel for one that works. Click +Insert at the top and you are presented with three fields. The first field, Parameter Name, is the magic field.

/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value

This is the value you are going to compare. It is based on the first SnmpProbe from the step before. I have read that if you have more than one OID listed in the SnmpProbe, the numbering is in reverse order, so [1] is at the bottom of the list and [2] is just above it. Personally I have only one OID per provider right now, so I don’t know. Let me know if you figure it out for sure.

The operator gives you a drop-down of choices. I will get into it more below, but think about this one carefully. If you can use a simple equals or does not equal, you can make things much easier. Think of it like this: if a UPS battery charge of anything less than 100% is bad, then use an expression like “/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value – does not equal – 100” instead of “/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value – less than – 100”. It will save you a bunch of extra steps, even if it is not quite as flexible.
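To get a feel for what the parameter name selects, here is a minimal Python sketch that runs the same path against a made-up data item. The XML layout is an assumption inferred from the XPath itself, not a captured SCOM data item, and `first_varbind_value` is a hypothetical helper. In standard XPath, [1] is the first matching element in document order; whether SCOM lists the varbinds in the order you entered them is a separate question that this sketch does not answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical data item; the element layout is assumed from the XPath above.
SAMPLE = """
<DataItem>
  <SnmpVarBinds>
    <SnmpVarBind>
      <OID>1.3.6.1.2.1.33.1.2.4.0</OID>
      <Value>100</Value>
    </SnmpVarBind>
  </SnmpVarBinds>
</DataItem>
"""


def first_varbind_value(data_item_xml: str) -> str:
    """Return the text of /DataItem/SnmpVarBinds/SnmpVarBind[1]/Value."""
    root = ET.fromstring(data_item_xml)  # root is the <DataItem> element
    # ElementTree paths are relative to the root, so the leading /DataItem is dropped.
    node = root.find("SnmpVarBinds/SnmpVarBind[1]/Value")
    if node is None or node.text is None:
        raise LookupError("no value in the first SnmpVarBind")
    return node.text


value = first_varbind_value(SAMPLE)
# The "does not equal 100" expression from the text, applied to the sample.
print(value, value != "100")
```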

 

The second SnmpProbe lets you pick an OID just as you did for the first. Personally, all the monitors I have so far use the same OID in both providers, as I am watching for a single value to be either good or bad. The second expression is built exactly the same way as the first. If you want a monitor that will not recover (so you have to reset the health state manually), I use something like “/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value – does not match wildcard – *”. Since any GET will return some result, the value will always match the wildcard, so the expression is never true and the monitor never recovers.
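The "never recovers" trick can be checked with a few lines of Python. The sketch uses `fnmatch` from the standard library as a stand-in for SCOM's wildcard matching, which is an assumption; the sample values are invented.

```python
from fnmatch import fnmatchcase


def does_not_match_wildcard(value: str, pattern: str) -> bool:
    """Rough stand-in for the 'does not match wildcard' operator."""
    return not fnmatchcase(value, pattern)


# Whatever the GET returns, "*" matches it, so the recovery expression is never true.
samples = ["100", "0", "battery low", ""]
print([does_not_match_wildcard(v, "*") for v in samples])
```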

Configure health lets you decide how the device health will change when the monitor gets tripped. I use second event raised as healthy and first event raised as warning or critical, depending on what’s going on.
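If the two-event idea is hard to picture, this simplified Python sketch shows how the health state moves as readings arrive. It is not how SCOM implements the monitor type: the real monitor polls each probe on its own interval, and `run_monitor`, the starting state and the readings are all made up for illustration.

```python
def run_monitor(samples, first_expr, second_expr, unhealthy="Warning"):
    """Track health across readings: first event trips the monitor, second event recovers it."""
    state = "Healthy"  # assumed starting state
    history = []
    for value in samples:
        if first_expr(value):
            state = unhealthy
        elif second_expr(value):
            state = "Healthy"
        history.append(state)
    return history


# UPS battery charge readings, using the "does not equal 100" example from earlier.
charge = [100, 100, 97, 80, 100]
print(run_monitor(charge, lambda v: v != 100, lambda v: v == 100))

# A second expression that is never true gives a monitor that needs a manual reset.
print(run_monitor(charge, lambda v: v != 100, lambda v: False))
```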

The last option is whether you want to create an alert or not; that is up to you.

Not so simple expressions

So let’s say you don’t want a simple equals or does not equal kind of expression. It’s there in the drop-down, so what’s the big deal, you say? Well, the SCOM console makes what I consider a bad assumption when creating rules and monitors: all the data types are strings. So although “100” does not equal “10” produces a true result, “100” is greater than “10” has no numeric meaning when the values are compared as strings. Fixing this is actually not so hard, and you have two choices. If the next bit is clear to you, go for manual XML editing; if that makes you nervous, hold on for the second option.
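A quick way to see why string comparison bites is to try it outside SCOM. The Python sketch below compares the same readings as strings and as integers. Python's character-by-character string ordering is only a stand-in here, and SCOM's exact string comparison rules may differ; the readings and the threshold of 96 are illustrative.

```python
def compare(raw: str, threshold: str):
    """Return (string result, integer result) for 'raw is less than threshold'."""
    as_string = raw < threshold             # compared character by character
    as_integer = int(raw) < int(threshold)  # compared numerically
    return as_string, as_integer


readings = ["100", "97", "95", "9"]
for raw in readings:
    as_string, as_integer = compare(raw, "96")
    print(f"{raw:>3}  string: {as_string!s:<5}  integer: {as_integer}")
```

The interesting row is "100": as a string it sorts before "96" because "1" comes before "9", so a fully charged battery would look like it is below the threshold.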

Option 1 : Advanced

Export your MP to XML and open it in your favorite xml editor.

Way at the bottom you will find an ElementID linked to the text label you assigned to the monitor. Use this ElementID to find your monitor or rule and alter it as follows. There are four places where you must change “String” to “Integer”: the Type attribute on the XPathQuery and on the Value in both FirstExpression and SecondExpression. Save the file, re-import it into SCOM, and your monitor should be working.

      <UnitMonitor ID="UIGeneratedMonitor38d2a38d163b4c1f971885f7ea686f16" Accessibility="Public" Enabled="true" Target="GEUPS.Single.Phase.Management.Pack.SNMPDevice" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="Snmp!System.SnmpProbe.2SingleEvent2StateMonitorType" ConfirmDelivery="false">
        <Category>Custom</Category>
        <AlertSettings AlertMessage="UIGeneratedMonitor38d2a38d163b4c1f971885f7ea686f16_AlertMessageResourceID">
          <AlertOnState>Error</AlertOnState>
          <AutoResolve>true</AutoResolve>
          <AlertPriority>Normal</AlertPriority>
          <AlertSeverity>Error</AlertSeverity>
          <AlertParameters>
            <AlertParameter1>$Data/Context/SnmpVarBinds/SnmpVarBind[1]/Value$</AlertParameter1>
          </AlertParameters>
        </AlertSettings>
        <OperationalStates>
          <OperationalState ID="UIGeneratedOpStateId8a4572649aec48b58c336b84182c464b" MonitorTypeStateID="SecondEventRaised" HealthState="Success" />
          <OperationalState ID="UIGeneratedOpStateId2bcf307948d14f4a892a99088603714a" MonitorTypeStateID="FirstEventRaised" HealthState="Error" />
        </OperationalStates>
        <Configuration>
          <FirstInterval>60</FirstInterval>
          <FirstIsWriteAction>false</FirstIsWriteAction>
          <FirstIP>$Target/Property[Type="NetLib!Microsoft.SystemCenter.NetworkDevice"]/IPAddress$</FirstIP>
          <FirstCommunityString>$Target/Property[Type="NetLib!Microsoft.SystemCenter.NetworkDevice"]/CommunityString$</FirstCommunityString>
          <FirstVersion>$Target/Property[Type="NetLib!Microsoft.SystemCenter.NetworkDevice"]/Version$</FirstVersion>
          <FirstSnmpVarBinds>
            <SnmpVarBind>
              <OID>.1.3.6.1.2.1.33.1.2.4.0</OID>
              <Syntax>0</Syntax>
              <Value VariantType="8" />
            </SnmpVarBind>
          </FirstSnmpVarBinds>
          <FirstExpression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="Integer">/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value</XPathQuery>
              </ValueExpression>
              <Operator>Less</Operator>
              <ValueExpression>
                <Value Type="Integer">96</Value>
              </ValueExpression>
            </SimpleExpression>
          </FirstExpression>
          <SecondInterval>60</SecondInterval>
          <SecondIsWriteAction>false</SecondIsWriteAction>
          <SecondIP>$Target/Property[Type="NetLib!Microsoft.SystemCenter.NetworkDevice"]/IPAddress$</SecondIP>
          <SecondCommunityString>$Target/Property[Type="NetLib!Microsoft.SystemCenter.NetworkDevice"]/CommunityString$</SecondCommunityString>
          <SecondVersion>$Target/Property[Type="NetLib!Microsoft.SystemCenter.NetworkDevice"]/Version$</SecondVersion>
          <SecondSnmpVarBinds>
            <SnmpVarBind>
              <OID>.1.3.6.1.2.1.33.1.2.4.0</OID>
              <Syntax>0</Syntax>
              <Value VariantType="8" />
            </SnmpVarBind>
          </SecondSnmpVarBinds>
          <SecondExpression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="Integer">/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value</XPathQuery>
              </ValueExpression>
              <Operator>GreaterEqual</Operator>
              <ValueExpression>
                <Value Type="Integer">96</Value>
              </ValueExpression>
            </SimpleExpression>
          </SecondExpression>
        </Configuration>
      </UnitMonitor>
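If you have a lot of monitors to fix, the same four-attribute change could be scripted. The following is a rough Python sketch, not a tested tool: `set_expression_types` is a hypothetical helper, the sample monitor is a cut-down version of the XML above, and it assumes the exported XML uses plain (un-namespaced) element names. Keep a backup of the exported management pack before trying anything like this on a real file.

```python
import xml.etree.ElementTree as ET

# Cut-down, hypothetical monitor with the console's default String types.
SAMPLE = """<UnitMonitor ID="Example.Monitor">
  <Configuration>
    <FirstSnmpVarBinds><SnmpVarBind><Value VariantType="8" /></SnmpVarBind></FirstSnmpVarBinds>
    <FirstExpression><SimpleExpression>
      <ValueExpression><XPathQuery Type="String">/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value</XPathQuery></ValueExpression>
      <Operator>Less</Operator>
      <ValueExpression><Value Type="String">96</Value></ValueExpression>
    </SimpleExpression></FirstExpression>
    <SecondExpression><SimpleExpression>
      <ValueExpression><XPathQuery Type="String">/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value</XPathQuery></ValueExpression>
      <Operator>GreaterEqual</Operator>
      <ValueExpression><Value Type="String">96</Value></ValueExpression>
    </SimpleExpression></SecondExpression>
  </Configuration>
</UnitMonitor>"""


def set_expression_types(monitor: ET.Element, new_type: str = "Integer") -> int:
    """Set Type on each XPathQuery and Value inside the two expressions; return the count changed."""
    changed = 0
    # Only look inside the expressions, so the <Value VariantType="8" /> varbinds are left alone.
    for expr_name in ("FirstExpression", "SecondExpression"):
        for expr in monitor.iter(expr_name):
            for tag in ("XPathQuery", "Value"):
                for node in expr.iter(tag):
                    if node.get("Type") != new_type:
                        node.set("Type", new_type)
                        changed += 1
    return changed


monitor = ET.fromstring(SAMPLE)
print(set_expression_types(monitor))  # number of Type attributes changed
```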

Option 2 : Easy

Export your management pack to XML, then open it using the System Center Operations Manager 2007 R2 Authoring Console. You may be asked for dependencies; these are usually found in “C:\Program Files\System Center Operations Manager 2007”, and failing that a search for *.mp will find them easily enough.

Once you find your monitor or rule of choice, right-click it, choose Properties and go to the Configuration tab. Under each XPathQuery and Value you will see @Type; you need to change four Types to Integer. You can see examples of the last two changed in the image below.

Then once you are finished just save the management pack and re-import it into SCOM and it should work.

 

Note: I am writing this in a somewhat sleep-deprived state. I have not talked about rules at all, but they are simpler than monitors, so I hope it’s clear where the magic is. I would also like to thank David Allen for some blog posts that helped me, although I can’t find them right now. If things here are not clear or more detail is needed, please comment or contact me and I will see what I can do.


15 thoughts on “SCOM, SNMP and TRAPS or The Good, the Bad and the Ugly : Part 2”

  3. Shawn

    How did you determine to use “/DataItem/SnmpVarBinds/SnmpVarBind[1]/Value” as the parameter name? I’m having trouble catching specific traps, and I don’t know that I’m populating this field properly …

  4. Scott Garrett

    I am sorry, clearly I didn’t see fit to record where I found that. It’s been a long time and I don’t recall. I suppose it may not be the same in 2012 either way.
    I do recall that I started with a wildcard trap alarm and looked at the details that arrived… perhaps it was from that log or alert?
    Wish I could help more, but I don’t have the system to test on any more.

  5. Shawn

    Yeah, I did the same … created a rule to trap everything, and thought that info would help, but so far no dice. Thanks for getting back to me about it anyhow.

    Cheers

  6. mafspeedy

    What if I want to do some more complicated processing with ValueExpression, something like converting a bit value to bytes or megabytes? Can you tell me how to do that, or point me to useful resources on this?

    Many Thanks

  7. Jesty Sam

    Hi,
    We have a requirement to monitor total free RAM on SNMP devices. I changed the data type to Integer, but now I am not getting any alerts.

    OID : 1.3.6.1.4.1.2021.10.1.3.2

  8. Scott Garrett

    It’s been quite a while but if you would export the MP and post the relevant XML I will see if anything stands out

  9. Scott Garrett

    One other thought….
    If you query the value manually, what do you get back?
    What unit is the value in… K, M, G?

    A quick Google suggests 1.3.6.1.4.1.2021.10.1.3.2 is a CPU counter for Linux, reported as a 5-minute average and a percent, so an integer would always get a 0.

  10. Jesty Sam

    Hi Scott,
    Thank you for your quick response.

    I have 16 GB of RAM in total and I want to alert if less than 3 GB of it is free. I configured /DataItem/SnmpVarBinds/SnmpVarBind[1]/Value less than 3467826 as critical, with an alert. The other condition is /DataItem/SnmpVarBinds/SnmpVarBind[1]/Value greater than 3467826 as healthy.
    Once configured, I was getting alerts for higher values like 19458676.

    As you suggested, I changed the data type to Integer, and I do not get any alerts now.

    When I query the values manually I get them in KB, so I configured the thresholds accordingly.

  11. Jesty Sam

    Hi Scott,

    If we need to configure monitoring for high CPU using the raw idle CPU time, would that be configured in terms of a percentage, like 90%, or in terms of integer values?

  12. Scott Garrett

    Would you send me a couple of screenshots and a chunk of the XML?
    I am wondering if you are exceeding the limits of an integer data type.
    Perhaps you could use a decimal or long data type?

    It’s been a long time and I don’t recall the exact options, but if you send screenshots and XML I will try.
    Any chance there is another OID that reports in MB?

  13. Scott Garrett

    I think those counters are usually a decimal: 0.90 = 90%.
    You may want to try the “Double” data type.
