2025-03-03 - New QoS & Partition Limits on Mjolnir: Optimized Resource Allocation for Faster Job Scheduling!

Created by Bent Petersen, Modified on Mon, 3 Mar at 3:26 PM by Bent Petersen

Dear Mjolnir Users,

KU-IT has completed maintenance, and Mjolnir is now open for all users. To improve job scheduling, data transfers, and system fairness, I have introduced important changes that will enhance job efficiency and resource management.

What’s New?

I have restructured the QoS (Quality of Service) settings and partitions to optimize resource usage and prioritize jobs effectively.

Key Changes:
  1. New QoS enforcement
    • All jobs must now specify a QoS. Unfortunately, this means you will have to modify all your current scripts.
    • New normal QoS (applies to most compute jobs).
    • New filetransfer QoS: Dedicated for data transfer jobs.
    • teaching QoS: Restricted to teaching sessions only.
  2. Updated Partitions
    • cpuqueue (default): General compute jobs.
    • gpuqueue: For GPU-intensive jobs.
    • filetransfer: Optimized for fast data movement across storage locations.
    • teaching: Reserved for educational purposes.
How This Benefits You Long-Term

Faster job starts: The new QoS structure will prevent long-running jobs from blocking short jobs, improving job throughput.
Efficient resource allocation: Dedicated partitions and limits ensure fair resource sharing among users.
Seamless file transfers: The filetransfer partition (currently all CPU nodes, but this will change after the migration) prevents large rsync/move jobs from slowing down compute jobs.
Fair Scheduling: QoS prioritization allows better access for users who haven’t been running many jobs recently.
In addition, I will be able to add and modify QoS settings for specific purposes in the future without any compute downtime.

Below are the specific limitations for each QoS.

Normal QoS
    • Purpose: General computing for all users.
    • Max Runtime: Up to 14 days (336 hours).
    • Max CPU Allocation per User: Up to 96 CPUs.
    • Memory Limit per User: No hard limit, but subject to available resources.
    • Priority: Standard priority (fairshare-based).
Use this QoS for: Regular computational jobs that don’t fall under specialized categories.

Filetransfer QoS (For Data Transfers)
    • Purpose: Dedicated for file transfers (e.g., rsync, scp).
    • Max Runtime: Up to 4 days (96 hours).
    • CPU Limit per Job: 1 CPU per job (to prevent excessive load).
    • Memory Limit per Job: 20 GB max.
    • Priority: Higher than normal QoS to ensure quick data transfers.
Use this QoS for:
    • Transferring large files without slowing down compute jobs.
    • Running rsync, scp, or similar commands.

Important: Do NOT use filetransfer for compute jobs! This QoS is optimized for I/O operations, not CPU-intensive workloads.

Teaching QoS (For Educational Use)
    • Purpose: Reserved for teaching and workshops.
    • Max Runtime: Up to 1 day (24 hours).
    • Max CPU Allocation per User: Limited to the assigned teaching node.
    • Memory Allocation: Limited to available resources on the teaching node.
    • Priority: Moderate (restricted to course participants).
Use this QoS for:
    • Running student assignments during teaching.
    • Hands-on practical sessions during workshops.
    • Jobs scheduled as part of a teaching session.
Important: This QoS is only accessible to students and teachers enrolled in a course.
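
The limits above can be checked before a job is submitted. The following is only a minimal sketch and not part of any Mjolnir tooling: check_request is a hypothetical helper, and the numbers are copied from this announcement. The scheduler remains the authority on what is actually allowed. Note that the 96-CPU limit of the normal QoS is per user, so a single request above it could never start.

```shell
# Hypothetical helper: compare a planned request with the limits listed above.
# Usage: check_request <qos> <hours> <cpus> <mem_gb>
# Returns 0 if the request fits, 1 if it exceeds a limit, 2 for an unknown QoS.
check_request() {
  qos="$1"; hours="$2"; cpus="$3"; mem_gb="$4"

  # A value of 0 means "no fixed number is given in the announcement".
  case "$qos" in
    normal)       max_hours=336; max_cpus=96; max_mem=0  ;;
    filetransfer) max_hours=96;  max_cpus=1;  max_mem=20 ;;
    teaching)     max_hours=24;  max_cpus=0;  max_mem=0  ;;
    *) echo "unknown QoS: $qos" >&2; return 2 ;;
  esac

  if [ "$hours" -gt "$max_hours" ]; then
    echo "$qos: ${hours}h exceeds the ${max_hours}h runtime limit" >&2
    return 1
  fi
  if [ "$max_cpus" -gt 0 ] && [ "$cpus" -gt "$max_cpus" ]; then
    echo "$qos: $cpus CPUs exceeds the limit of $max_cpus" >&2
    return 1
  fi
  if [ "$max_mem" -gt 0 ] && [ "$mem_gb" -gt "$max_mem" ]; then
    echo "$qos: ${mem_gb} GB exceeds the ${max_mem} GB memory limit" >&2
    return 1
  fi
  return 0
}
```

For example, check_request filetransfer 24 1 20 succeeds, while check_request filetransfer 24 2 20 reports that the 1-CPU limit is exceeded.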

How to Submit Jobs with a QoS
Because of these changes, you must add the new flags to all your existing scripts, or the jobs will fail.
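
If you have many existing batch scripts, adding the QoS line by hand can be tedious. The sketch below is one possible way to do it, under two assumptions: the script starts with a shebang line, and its options are given as #SBATCH directives. The function name add_qos is hypothetical. Please check the result before submitting, and make a backup first.

```shell
# Hypothetical helper: insert "#SBATCH --qos=<qos>" after the first line of a
# batch script, unless the script already contains a --qos directive.
# Usage: add_qos <script> [qos]   (qos defaults to "normal")
add_qos() {
  script="$1"
  qos="${2:-normal}"

  # Leave the script alone if a QoS is already requested.
  if grep -q '^#SBATCH.*--qos' "$script"; then
    return 0
  fi

  tmp="$(mktemp)"
  # Print line 1 (assumed to be the shebang), then the new directive,
  # then the rest of the script unchanged.
  awk -v line="#SBATCH --qos=${qos}" \
      'NR == 1 { print; print line; next } { print }' "$script" > "$tmp" &&
    cat "$tmp" > "$script"   # overwrite in place, keeping file permissions
  rm -f "$tmp"
}
```

Calling add_qos my_job.sh adds the normal QoS, and add_qos transfer.sh filetransfer adds the filetransfer QoS. Running it twice on the same file changes nothing the second time.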

Regular compute jobs
sbatch --partition=cpuqueue --qos=normal --cpus-per-task=4 --mem=32G --time=12:00:00 --wrap="your_command_here"

File transfers (using rsync, scp, etc.). Please note that I renamed the QoS from rsync to filetransfer.
sbatch --partition=filetransfer --qos=filetransfer --cpus-per-task=1 --mem=20G --time=1-00:00:00 --wrap="rsync -av /source/ /destination/"

GPU jobs
sbatch --partition=gpuqueue --qos=normal --gres=gpu:1 --cpus-per-task=8 --mem=64G --time=12:00:00 --wrap="your_gpu_command_here"

Teaching Jobs (for course participants only)
sbatch --partition=teaching --qos=teaching --cpus-per-task=2 --mem=8G --time=2:00:00 --wrap="your_command_here"
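
The same options can also be placed inside a batch script as #SBATCH directives, so they do not have to be repeated on the command line. The following is a minimal sketch for a regular compute job. The file name job.sh is only an example, and your_command_here is a placeholder to replace with your own command.

```shell
# Write an example batch script; the quoted 'EOF' keeps the text literal.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=cpuqueue
#SBATCH --qos=normal
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=12:00:00

your_command_here
EOF
```

The script can then be submitted with sbatch job.sh. For the other job types, change the partition and QoS values to match the command-line examples above.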

Users will only gain access to the normal QoS once all data has been moved from /projects/mjolnir1 as previously announced.

Thank you for your patience during the past few days.

Best regards,
Bent 
