2024-09-09 - Mjolnir System Updates: New Monitoring Tools & Usage Statistics Available

Created by Bent Petersen, Modified on Mon, 9 Sep, 2024 at 3:25 PM by Bent Petersen

Dear everyone,


I’m pleased to announce some updates made to Mjolnir over the past month:


CPU/Memory Monitoring on mjolnirgate01fl:

Two scripts now continuously monitor CPU load and memory usage. If one of your processes exceeds a set threshold (e.g., due to an unintentional overload), you’ll receive an email.

If its usage keeps rising beyond a second, higher threshold, the process will be terminated and you’ll be notified.
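The two-step escalation described above can be sketched as a two-level threshold check. This is a minimal illustration only, not the actual monitoring scripts: the `Limits` values, the `Action` names and the `decide` function are hypothetical, and the sketch leaves out how usage is sampled, how emails are sent and how processes are terminated.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    OK = 0    # usage is within limits
    WARN = 1  # first threshold crossed: email the process owner
    KILL = 2  # second, higher threshold crossed: terminate and notify


@dataclass(frozen=True)
class Limits:
    # Hypothetical values; the real thresholds on mjolnirgate01fl are not stated here.
    warn_cpu_pct: float = 200.0  # CPU percent where 100 = one fully used core
    kill_cpu_pct: float = 400.0
    warn_mem_gb: float = 16.0
    kill_mem_gb: float = 32.0


def decide(cpu_pct: float, mem_gb: float, limits: Limits) -> Action:
    """Return the most severe action triggered by either CPU or memory usage."""
    if cpu_pct > limits.kill_cpu_pct or mem_gb > limits.kill_mem_gb:
        return Action.KILL
    if cpu_pct > limits.warn_cpu_pct or mem_gb > limits.warn_mem_gb:
        return Action.WARN
    return Action.OK


if __name__ == "__main__":
    limits = Limits()
    # A process whose CPU usage climbs over three samples.
    for cpu, mem in [(50, 4), (250, 8), (450, 8)]:
        print(decide(cpu, mem, limits).name)  # prints OK, then WARN, then KILL
```

Checking the more severe threshold first means a single sample that jumps straight past both limits leads directly to termination rather than only a warning.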

The new CPU and memory monitoring should significantly improve the node’s stability by keeping processes that exceed the thresholds under control.


New Website Section - Server Usage:

Explore the live hourly cluster utilization page to see how busy Mjolnir is:

https://mjolnir-ucph.dk/current_cluster_utilization.html



There’s also a section showing cluster utilization over the past week. Please note that this feature was only implemented on Friday evening, so there isn’t much historical data yet.

https://mjolnir-ucph.dk/cluster_utilization_over_time.html


Additionally, the SLURM Usage Statistics page provides a detailed breakdown of jobs run on the cluster since June 2024:

https://mjolnir-ucph.dk/job_statistics.html


On the statistics page, you can track:


  • Total Jobs: Number of jobs processed.
  • Average Elapsed Time: Average time jobs took to run, from start to finish.
  • Average Wait Time: Average time jobs waited in the queue before starting.
  • Unique Users: Number of distinct users who submitted jobs to the cluster.
  • Job State Breakdown: Number of jobs that completed, failed, timed out, were canceled, or ran out of memory.
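As a rough illustration of how the figures listed above relate to per-job records, here is a sketch that computes them from a list of jobs. The `Job` record and the `summarize` function are hypothetical; the fields are loosely modelled on the User, State, Submit, Start and End fields that Slurm’s `sacct` can report, and this is not necessarily how the statistics page computes its numbers.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    # Hypothetical record layout, loosely modelled on sacct's User/State/Submit/Start/End.
    user: str
    state: str
    submit: datetime
    start: datetime
    end: datetime


def summarize(jobs: list[Job]) -> dict:
    """Compute the headline statistics for a list of jobs that have all started."""
    n = len(jobs)
    zero = timedelta()
    total_elapsed = sum((j.end - j.start for j in jobs), zero)  # run time per job
    total_wait = sum((j.start - j.submit for j in jobs), zero)  # queue time per job
    # sacct can report e.g. "CANCELLED by 1234"; keep only the state keyword.
    states = Counter(j.state.split()[0] for j in jobs)
    return {
        "total_jobs": n,
        "avg_elapsed": total_elapsed / n if n else zero,
        "avg_wait": total_wait / n if n else zero,
        "unique_users": len({j.user for j in jobs}),
        "state_counts": dict(states),
    }


if __name__ == "__main__":
    t0 = datetime(2024, 6, 1, 12, 0)
    jobs = [
        Job("alice", "COMPLETED", t0, t0 + timedelta(minutes=10), t0 + timedelta(hours=1, minutes=10)),
        Job("bob", "CANCELLED by 1001", t0, t0 + timedelta(minutes=30), t0 + timedelta(hours=2, minutes=30)),
    ]
    print(summarize(jobs))
```

The sketch assumes every job has actually started; a job canceled while still pending has no start time and would need separate handling.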


Feel free to explore the updates, and let me know if you have any comments or ideas for improvement or further expansion.


Best regards,


Bent

