Friday, March 30, 2012

How do I monitor my SQL JOB?

Hullo. I have a SQL SCHEDULER Job that keeps hanging.

1) How do I monitor this? I would like to have another Sql Job monitor the first one hourly, and send me an email if it is hanging. Isn't there a system table that tells me a job status?

1a) Once I find the job, how can I "Stop it" automatically?

2) Is there a way to have the Job stop itself it it runs for more than 20 minutes?

Thanks.

~LeNo one Knows...?

~Le

"No one knows? No one Knows if the Hot Tea is Hot or Cold...?"
"Is it Iced Tea?"
"NO!"
"Well then, I have no idea."|||Well, I have a suspision that most dba's would probably want to spend their time investigating the cause of a 'hanging' job rather than developing other jobs to manage such a situation. I think it's more correct to say that a job step stays 'in-progress'. The only time I have seen this is in relation to replication jobs that seem to have a tendency to report being continiously 'in-progress'. Apart from that 'feature', I can't remember a situation where a job would routinely misbehave without a good reason.

Have you looked at the job history to see which step the job is failing on? I know it's not much use if it's a one step job but I thought I would mention it anyway. Either way, look at the code in the step that is staying 'in-progress' and see if you can isolate it further. Perhaps add a few audit steps to write out to a table (like print statements if you were running code via QA). The objective should be to keep drilling down until you've found the cause.

Perhaps post more detail about the code that is staying in-progress and more useful feedback will come your way.

Clive|||Well, I have a suspision that most dba's would probably want to spend their time investigating the cause of a 'hanging' job rather than developing other jobs to manage such a situation. I think it's more correct to say that a job step stays 'in-progress'. The only time I have seen this is in relation to replication jobs that seem to have a tendency to report being continiously 'in-progress'. Apart from that 'feature', I can't remember a situation where a job would routinely misbehave without a good reason.

Have you looked at the job history to see which step the job is failing on? I know it's not much use if it's a one step job but I thought I would mention it anyway. Either way, look at the code in the step that is staying 'in-progress' and see if you can isolate it further. Perhaps add a few audit steps to write out to a table (like print statements if you were running code via QA). The objective should be to keep drilling down until you've found the cause.

Perhaps post more detail about the code that is staying in-progress and more useful feedback will come your way.

Clive

LOL. You are right, the better thing to do would be to stop it from hanging in the first place! The SQL Agent fires off a VB App that I wrote. The VB App is hanging on the the custom PGP command. But it does not happen often. The PGP 8 Service itself seems to be hanging about every 13-14 days, which in turn causes my VB App to hang, which causes the SQL Agent to hang.

Since there is a code freeze, I cannot edit the VB app to trap the error at the moment. So I just want to monitor the SQL Agent jobs, and if possible stop it if it is hanging.

`Le|||You can find what jobs are running by looking in the sysprocesses table for jobs running:

select *
from master..sysprocesses
where program_name like 'SQLAgent%Job%'

The problem is that the job_id you get: 0xC778DE2759DD354B9C219B301886EA43
equates to the binary version of the job_id stored in the sysjobs table.

Still if you know you shouldn't have any jobs running at all except the one you are checking on to see if it hung, you should be able to use this select to get the SPID of the hanging job and kill it if need be until you can figure out what's causing it to fail.

No comments:

Post a Comment