
How to add a metric extension and use a ‘control’ table to test results


I have previously blogged about creating a User Defined Metric (UDM) (7 years ago!), but the method I am using today demonstrates how you can test the alert using specific data.

The aim was to see if integration data is flowing through a table.  If there is a backlog we want to know about it.

Simon Ryan, a colleague, did all the preparatory work and created all 18 (!) screenshots, which I had to copy and paste into this entry.

select * from sm7.EVENTOUTM1; -- Transient table so data not usually present.

The data is processed quickly but occasionally you can see a handful of rows for a few seconds.

select DESCRIPTOR,
       EVTYPE,
       TO_CHAR(EVTIME, 'DD-MON-YYYY HH24:MI:SS'),
       EVSYSSEQ,
       EVUSRSEQ
from   sm7.EVENTOUTM1;

DESCRIPTOR  EVTYPE  EVTIME                EVSYSSEQ          EVUSRSEQ
----------  ------  --------------------  ----------------  --------
(BLOB)      email   08-NOV-2017 11:40:01  0000000005254328

A typical query for the ME would be very simple:

select count(*) from sm7.EVENTOUTM1;

However, we are going to plug in a ‘control’ table so that we can test the alert by artificially adjusting the results.

Test ME Table:

create table system.em_me_control (metric_extension char(40),reason char(20), value number);
grant select on system.em_me_control to dbsnmp,sys;
grant select on sm7.EVENTOUTM1 to dbsnmp,sys;

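As an optional sanity check, the grants can be verified by running the two queries the ME will depend on while connected as DBSNMP:

-- Run as DBSNMP: both selects should succeed before the ME is created.
select count(*) from sm7.EVENTOUTM1;
select count(*) from system.em_me_control;
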
Test ME SQL:

insert into system.em_me_control values ('sm9_cc_integration_check','Test ME Warning',0);
commit;

Updated ME SQL:

select
 ( select count(*) from sm7.EVENTOUTM1 )
+ 
 ( select value from system.em_me_control where metric_extension='sm9_cc_integration_check' )
as total_rows
from dual;

Including the Metric Extension name in the query allows us to use this table for multiple Metric Extensions on the same database.

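For example, a second Metric Extension (a hypothetical one, named here purely for illustration) could share the same control table with its own row:

-- Hypothetical second ME sharing the control table; its query filters on its own name.
insert into system.em_me_control values ('sm9_cc_other_check','Test ME Warning',0);
commit;

Each ME query then filters on its own metric_extension value, so the test rows can be adjusted independently.
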
As the source table is usually empty it would be difficult to test that the ME is working, but now we can simply update the value in em_me_control as we please to create a warning, a critical and a clear.

Create the Metric Extension

Name: me$sm9_cc_integration_check

Display Name: CORE_SM9_CC_INTEGRATION_CHECK

Select the adapter as “SQL”

Select “Alerting and Historical Trending”

Set frequency to x mins

Add the SQL

Add a Column

Ensure “Manually Clearable Alert” is set to false; the name is misleading, and if it is set to true the alert does not auto-clear.

Note: in the alert/clear message you can use the variables %keyValue%, %value% and %columnName% to add specific values to the message e.g. Integration Table has 318 rows in the table.  In this scenario we aren’t particularly concerned with the exact number of rows, just that there are a number of rows present so we’ve used a generic message.

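For example, an alert message template along the lines of “Integration table has %value% rows in the table” (an illustrative template, not necessarily the exact one configured here) would be rendered with the collected count substituted in, e.g. “Integration table has 318 rows in the table”.
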
If monitoring credentials are already set up then use those.

Add a test target and run a test to ensure the ME is working correctly

Review the ME before finishing

 

Deploy the ME to the required databases (including any standby databases).  First make it a Deployable Draft.

Check the ME has been deployed and that values are being picked up, via Oracle Database > Monitoring > All Metrics.

 

Over time the data will show this table having little or no data in it, but for now we will test that the control table is working.

Artificially inflate the result:

update system.em_me_control set value=6 where metric_extension='sm9_cc_integration_check' and reason='Test ME Warning';

commit;

The collection is every five mins so the next collection will show the new value of at least 6 (6 artificial plus any genuine rows in the table at the time of execution).  It should also generate a warning.

If you want to check when the next collection is you can issue the following command:

$ emctl status agent scheduler | grep -i sm9

2017-11-23 15:58:53.816 : oracle_database:SMRPRD1A:ME$sm9_cc_integration_check

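If you would rather not wait for the scheduler, the agent can usually be told to run the collection immediately. A minimal sketch, assuming the target name, target type and collection item shown in the scheduler output above:

$ emctl control agent runCollection SMRPRD1A:oracle_database ME$sm9_cc_integration_check
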
Now check the metric and we can see the last collected value is 6:

 

Set the result back to 0:

update system.em_me_control set value=0 where metric_extension='sm9_cc_integration_check' and reason='Test ME Warning';

commit;

Overnight we now have some historical data:

 

From the above graph we can see that the data collection has been working overnight.  The last Collection Timestamp is within the last 5 mins.  As expected, due to the nature of the table, the High Value is only 2, which is not enough to trigger any form of alert.  In the past we would have gone into Metric and Collection Settings and temporarily adjusted the thresholds in order to force an alert to be produced, e.g. Warning 0 and Critical 1.  For this metric that approach is possible, the only risk being that the thresholds are left incorrectly set.  However, if our Comparison Operator were an “=” rather than a “>” then that could be problematic.  The “control” table makes this really easy: to test a warning and a critical alert we can now simply adjust the table.

Artificially inflate the result:

update system.em_me_control set value=6 where metric_extension='sm9_cc_integration_check' and reason='Test ME Warning';

commit;

After the next scheduled collection we can now see that the warning alert has been triggered.

We have standard notification rules set up, so this alert was sent to our HP OMI tool automatically.  You can see this under “Last Comment”, as that script sends the alert to OMI for processing.

The incident shows the details of the breach:

The alert in HP OMI, passed out of OEM and into our monitoring/alerting system, HP OpenView.

Now we can make this a critical alert by increasing the value to 11.

update system.em_me_control set value=11 where metric_extension='sm9_cc_integration_check' and reason='Test ME Warning';

commit;

Again, we can see the change reflected in the metric results:

Our alert in HP OMI has now changed to critical.

Now we need to both clear the alert and return the metric to its correct state for future monitoring.

update system.em_me_control set value=0 where metric_extension='sm9_cc_integration_check' and reason='Test ME Warning';

commit;

At this point you can either wait for the next collection or drill down into the incident and reevaluate the alert:

This shows the alert has now cleared.  For completeness, we can see this in the metric history:

If we specifically look at the Metric Alert History we can see a summary of what we’ve tested based on just adjusting the metric control table.

 

