• Management Pack:  HPC Server
  • MP Version:  3.1.3266.0 for HPC Server 2008 R2
  • Released:  2/14/2011
  • Publisher:  Microsoft

Daily Job Queue Time Monitor

  • ID:  Microsoft.HPC.2008R2.Monitor.JobScheduler.Performance.WaitTime
  • Description:  Daily job queue time performance monitor for HPC 2008 R2
  • Target:  HPC 2008 R2 Job Scheduler
  • Enabled:  No

Operational States

Name                            State     Description
UnderThreshold1                 Success
OverThreshold1UnderThreshold2   Warning
OverThreshold2                  Error

Overridable Parameters

Parameter Name    Default Value   Description
Frequency         300
Lower Threshold   -2              Lower threshold value
Upper Threshold   -1              Upper threshold value
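
The relationship between the two thresholds and the three operational states can be sketched as follows. This is a minimal illustration, not the management pack's implementation: the function name, the enum, and the example threshold values (30 and 60, in whatever unit the wait time is measured) are hypothetical, and treating a value equal to a threshold as not exceeding it is an assumption.

```python
from enum import Enum


class HealthState(Enum):
    # Values mirror the operational state names listed for this monitor.
    SUCCESS = "UnderThreshold1"
    WARNING = "OverThreshold1UnderThreshold2"
    ERROR = "OverThreshold2"


def classify_wait_time(avg_wait, lower_threshold, upper_threshold):
    """Map an average wait time to one of the three operational states.

    Assumption: a value exactly equal to a threshold does not exceed it.
    """
    if lower_threshold > upper_threshold:
        raise ValueError("lower threshold must not exceed upper threshold")
    if avg_wait > upper_threshold:
        return HealthState.ERROR
    if avg_wait > lower_threshold:
        return HealthState.WARNING
    return HealthState.SUCCESS


# Hypothetical override values: lower = 30, upper = 60.
for sample in (10, 45, 90):
    print(sample, classify_wait_time(sample, 30, 60).name)
```

Only the last case (above the upper threshold) corresponds to the state that raises an alert.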

Alert Details

Monitor State            Message                                                     Priority   Severity   Auto Resolution
OverThreshold2 (Error)   HPC Daily Job Queue Time has exceeded the upper threshold   Medium     Critical   Yes

Run As Profiles

Name
HPC Server Admin Action Account

Monitor Knowledgebase

Summary

This monitor tracks the average job queue (wait) time. The wait time is one indicator of whether the cluster is congested. This monitor is disabled by default because job queue times vary widely between organizations.
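
As a rough illustration of the quantity being tracked, the sketch below averages the wait time over a set of jobs. It assumes that a job's wait time is the interval between its submit time and its start time, and that jobs still in the queue are excluded; the monitor's actual calculation is not documented here, and the function name and input shape are hypothetical.

```python
from datetime import datetime, timedelta


def average_wait_time(jobs):
    """Return the mean of (start - submit) over jobs that have started.

    `jobs` is an iterable of (submit_time, start_time) datetime pairs.
    A start_time of None marks a job that is still queued; it is skipped.
    Returns None when no job has started.
    """
    waits = [start - submit for submit, start in jobs if start is not None]
    if not waits:
        return None
    return sum(waits, timedelta()) / len(waits)


# Made-up sample data: two started jobs and one still queued.
sample_jobs = [
    (datetime(2011, 2, 14, 9, 0), datetime(2011, 2, 14, 9, 10)),
    (datetime(2011, 2, 14, 9, 5), datetime(2011, 2, 14, 9, 35)),
    (datetime(2011, 2, 14, 9, 30), None),
]
print(average_wait_time(sample_jobs))
```

Comparing a figure like this against your organization's normal queue times is one way to choose sensible threshold overrides before enabling the monitor.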

Causes

This error can be caused by any of the following:

  • Some large jobs require many nodes, and not enough nodes are available to run them. These jobs wait in the queue, which increases the average wait time.

  • The cluster is busy. In HPC Cluster Manager, in Charts and Reports, review charts such as “Cluster CPU Usage” to determine if the cluster is exhibiting high CPU usage. Alternatively, in HPC Cluster Manager, in Node Management, you can add the “Running Jobs” metric to the heat map and determine if most nodes are occupied with jobs.

  • Job configurations are not optimized. Some job configurations, such as those that give a job exclusive access to a node, can slow down other jobs. Configurations that are better suited to the requirements of the application can help jobs finish faster.

Resolutions

To troubleshoot and fix this problem:

  • If Charts and Reports in HPC Cluster Manager shows that the cluster load is consistently high, add more resources to the cluster (for example, more compute nodes, or more CPU and memory on the nodes).

  • Improve job configurations to increase cluster efficiency (for example, check whether jobs really need exclusive access to nodes).

External References
This monitor does not contain any external references.
