Congestion

Introduction

This module shows the cluster state (Optimal, Underutilized, Contention, or Congestion) over time, along with the job life cycle.

Optimal: Users are not waiting too long and the cluster is well loaded.

Underutilized: Users are not waiting too long and the cluster is under-used.

Contention: Users are waiting too long and the cluster is well loaded.

Congestion: Users are waiting too long and the cluster is under-used.
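The four states above combine two yes/no questions: are users waiting too long, and is the cluster well loaded? The sketch below shows one way to express that mapping. The function name and both threshold values are hypothetical and chosen only for illustration; this page does not state the cut-offs the module actually uses.

```python
def classify_state(running_pct, waiting_pct,
                   load_threshold=50.0, wait_threshold=50.0):
    """Map a (running, waiting) point to one of the four cluster states.

    Both thresholds are illustrative assumptions, not documented values.
    """
    well_loaded = running_pct >= load_threshold
    waiting_too_long = waiting_pct >= wait_threshold

    if waiting_too_long:
        # Long waits: Contention if the cluster is busy, Congestion if not.
        return "Contention" if well_loaded else "Congestion"
    # Short waits: Optimal if the cluster is busy, Underutilized if not.
    return "Optimal" if well_loaded else "Underutilized"


print(classify_state(running_pct=85.0, waiting_pct=10.0))  # Optimal
print(classify_state(running_pct=20.0, waiting_pct=90.0))  # Congestion
```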

image0

  • The X-axis shows normalized running core-hours, ranging from 0 to 100 percent, where 100 percent represents full utilization of available cores per resolution tick.

  • The Y-axis shows normalized waiting core-hours, starting from 0 and increasing upward, where values represent how many cores were queueing relative to the available cores per resolution tick.

  • Both axes show core-hours per day relative to the cores available on that day.
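As a rough illustration of the normalization described in the list above, the sketch below divides core-hours by the capacity available during one tick. The function name, the 24-hour default tick, and the sample numbers are assumptions made for this example; the module's exact formula is not given on this page.

```python
def normalize_core_hours(core_hours, available_cores, tick_hours=24.0):
    """Express core-hours as a percentage of the capacity of one tick.

    Capacity is assumed to be available_cores * tick_hours.
    """
    capacity = available_cores * tick_hours
    return 100.0 * core_hours / capacity


# Hypothetical day on a 64-core cluster: capacity is 64 * 24 = 1536 core-hours.
running = normalize_core_hours(768.0, available_cores=64)
waiting = normalize_core_hours(3072.0, available_cores=64)
print(running)  # 50.0  (half of the cores were busy on average)
print(waiting)  # 200.0 (queued demand was twice the available capacity)
```

Under this reading, the running value cannot exceed 100 percent, while the waiting value has no upper bound because queued jobs may request more cores than the cluster offers.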

Warning

If your cluster configuration sets the number of cores or GPUs available to 0, you might see points labeled R:0, W:101. This happens when jobs are queued during a period in which no resources were set as available in the cluster. In those cases, the jobs are considered to request 101% of the cluster resources.
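The special case in this warning can be pictured as a guard around the normalization step. The sketch below is an assumption-laden illustration: the function name, the 24-hour tick, and the behavior when nothing is queued are hypothetical. Only the R:0, W:101 result for queued jobs with no available resources comes from this page.

```python
def plot_point(running_core_hours, waiting_core_hours,
               available_cores, tick_hours=24.0):
    """Return (R, W) percentages for one tick, with a zero-capacity guard."""
    capacity = available_cores * tick_hours
    if capacity == 0:
        # No resources configured: nothing can run, and any queued demand
        # is reported as 101% (the sentinel value described in the warning).
        # Returning W = 0 when nothing is queued is an assumption.
        waiting = 101.0 if waiting_core_hours > 0 else 0.0
        return 0.0, waiting
    return (100.0 * running_core_hours / capacity,
            100.0 * waiting_core_hours / capacity)


print(plot_point(0.0, 480.0, available_cores=0))  # (0.0, 101.0)
```

A value of 101 is convenient as a marker because it sits just above full utilization, so such points stand out as configuration gaps rather than real measurements.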