Dear everyone,
I’m pleased to announce some updates to Mjolnir from the past month:
CPU/Memory Monitoring on mjolnirgate01fl:
Two scripts now continuously monitor CPU load and memory usage. If your process exceeds the set thresholds (for example, through an unintentional overload), you’ll receive an email.
If usage continues to rise beyond another threshold, the process will be terminated, and you’ll be informed.
This monitoring should make the node noticeably more stable, since processes that exceed the set thresholds are now kept under control.
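For anyone curious how such a two-step check behaves, here is a minimal sketch in Python. It is not the actual monitoring script: the threshold values and the names Thresholds, classify and decide are assumptions made up for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    warn: float  # hypothetical level at which the user gets an email
    kill: float  # hypothetical higher level at which the process is terminated


def classify(usage: float, limits: Thresholds) -> str:
    """Map a single usage reading to 'ok', 'warn' or 'terminate'."""
    if usage >= limits.kill:
        return "terminate"
    if usage >= limits.warn:
        return "warn"
    return "ok"


def decide(cpu: float, mem: float,
           cpu_limits: Thresholds, mem_limits: Thresholds) -> str:
    """Return the more severe of the CPU and memory verdicts."""
    severity = {"ok": 0, "warn": 1, "terminate": 2}
    verdicts = [classify(cpu, cpu_limits), classify(mem, mem_limits)]
    return max(verdicts, key=severity.__getitem__)


# Illustrative limits only; the real thresholds are not stated here.
CPU_LIMITS = Thresholds(warn=80.0, kill=95.0)
MEM_LIMITS = Thresholds(warn=70.0, kill=90.0)
```

With these made-up limits, a process at 85% CPU and 40% memory would trigger the warning email, and one that climbs to 92% memory would be terminated.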
New Website Section - Server Usage:
Explore our live hourly cluster utilization to track how busy Mjolnir is. https://mjolnir-ucph.dk/current_cluster_utilization.html
There’s also a section showing cluster performance over the past week. Please note that this feature was only implemented on Friday evening, so there isn’t much historical data yet.
https://mjolnir-ucph.dk/cluster_utilization_over_time.html
Additionally, the SLURM Usage Statistics page provides a detailed breakdown since June 2024.
https://mjolnir-ucph.dk/job_statistics.html
On the statistics page, you can track:
- Total Jobs: Number of jobs processed.
- Average Elapsed Time: Average completion time.
- Average Wait Time: Time before jobs started.
- Unique Users: Distinct users submitting jobs on the cluster.
- Job State Breakdown: Counts for completed, failed, timed out, canceled, or out-of-memory jobs.
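To make these figures concrete, the sketch below shows one way they could be computed from a list of job records. It is an illustration only, not the code behind the statistics page: the record fields (user, state, elapsed_s, wait_s) and the sample jobs are hypothetical.

```python
from collections import Counter
from statistics import mean


def summarize(jobs: list[dict]) -> dict:
    """Compute the summary figures listed above from job records."""
    if not jobs:
        # Avoid averaging over an empty list.
        return {"total_jobs": 0, "avg_elapsed_s": 0.0, "avg_wait_s": 0.0,
                "unique_users": 0, "states": {}}
    return {
        "total_jobs": len(jobs),
        "avg_elapsed_s": mean(j["elapsed_s"] for j in jobs),
        "avg_wait_s": mean(j["wait_s"] for j in jobs),
        "unique_users": len({j["user"] for j in jobs}),
        "states": dict(Counter(j["state"] for j in jobs)),
    }


# Made-up sample records, for illustration only.
SAMPLE_JOBS = [
    {"user": "alice", "state": "COMPLETED", "elapsed_s": 100, "wait_s": 10},
    {"user": "bob", "state": "FAILED", "elapsed_s": 200, "wait_s": 20},
    {"user": "alice", "state": "COMPLETED", "elapsed_s": 300, "wait_s": 30},
]

summary = summarize(SAMPLE_JOBS)
```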
Feel free to explore the updates, and let me know if you have any comments or ideas for improvement or further expansion.
Best regards,
Bent