Each project folder follows this structure:
- Shared software & environments → Store shared Conda environments, custom scripts, and compiled software in /projects/{project_name}/apps/.
- Shared project data → Use /projects/{project_name}/data/ for large datasets, reference files, or results that multiple users need access to, so that duplicate copies are avoided.
- Private user data → Keep your personal working files and data inside /projects/{project_name}/people/ku-ID/.
- Temporary data (for short-term use) → Use /projects/{project_name}/scratch/ for intermediate files that do not require long-term storage. Scratch data is not backed up, and older files may be deleted automatically to free space.
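For reference, the resulting layout looks like this:

/projects/{project_name}/
├── apps/      (shared environments, scripts, and software)
├── data/      (shared datasets, references, and results)
├── people/
│   └── ku-ID/ (your private working files)
└── scratch/   (temporary files; not backed up)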
2. Optimizing Storage: Compressing Large Data Efficiently
Before transferring large datasets, compress old, uncompressed data first to save space, reduce storage costs, and shorten transfer times.
Instead of using standard gzip, which runs on a single CPU core, use pigz (parallel gzip) for faster compression by utilizing multiple threads.
Remember to run multithreaded processes only by submitting them as jobs to the queue.
2.1. Compressing a Single File with pigz
pigz -p 8 -9 large_file.txt
• -p 8 → Uses 8 CPU cores (adjust based on need).
• -9 → Uses the highest compression level.
This creates large_file.txt.gz.
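To restore the original file later, decompress it in place with pigz's -d flag:

pigz -d large_file.txt.gz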
2.2. Compressing Entire Directories with tar + pigz
For directories, create a tarball and compress it using pigz:
tar cf - large_directory/ | pigz -p 8 -9 > large_directory.tar.gz
- This method streams the entire directory through pigz, compressing it across multiple threads.
- The output is a compressed tarball (.tar.gz).
- Creating a tarball takes time, especially for directory trees containing many files.
To extract later, use:
tar xvf large_directory.tar.gz
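If you want the contents extracted somewhere other than the current directory, tar's -C flag selects a destination (the path below is a placeholder):

tar xvf large_directory.tar.gz -C /path/to/destination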
2.3. Submitting Compression as a SLURM Job
Since compression can be CPU-intensive, you should submit it as a job instead of running it on the login node.
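A minimal sketch of such a batch script is shown below; the job name, resource values, and file names are illustrative, and your cluster may additionally require --partition or --account settings:

#!/bin/bash
#SBATCH --job-name=compress
#SBATCH --cpus-per-task=8
#SBATCH --mem=8G
#SBATCH --time=04:00:00

# Compress a directory using as many pigz threads as CPUs allocated above
tar cf - large_directory/ | pigz -p ${SLURM_CPUS_PER_TASK} -9 > large_directory.tar.gz

Save it as, e.g., compress_job.sh and submit it with:

sbatch compress_job.sh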
3. Best Practices for Transferring Data
To ensure a smooth transition, DO NOT use mv, scp, or cp, as they lack error checking and cannot resume interrupted transfers.
Instead, use rsync, which offers better control, efficiency, and reliability.
Moving Data and Deleting Old Files Automatically
To move your data and remove the original files after transfer, use:
rsync -avh --progress --remove-source-files /projects/mjolnir1/people/KU-ID/yourdata /projects/{projectname}/people/KU-ID/yourdata
-a → Preserves file permissions, timestamps, symbolic links, etc.
-v → Enables verbose output, listing each file as it is transferred.
-h → Displays human-readable file sizes.
--progress → Shows real-time transfer progress.
--remove-source-files → Deletes each source file after it has been successfully transferred.
If a transfer gets interrupted, rerun the same rsync command; it will only copy missing or incomplete files instead of restarting from scratch.
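To preview what a transfer would do without copying or deleting anything, you can first run the same command with rsync's -n (--dry-run) flag:

rsync -avhn --progress --remove-source-files /projects/mjolnir1/people/KU-ID/yourdata /projects/{projectname}/people/KU-ID/yourdata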
Important Notes:
• This rsync command removes files from the original location only after a successful transfer.
• Directories are NOT deleted, so you may need to clean them up manually:
find /projects/mjolnir1/people/ku-ID/your_data/ -type d -empty -delete
The command above deletes all empty directories left behind after the file transfer.
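To preview which directories are currently empty before removing anything, run the same command without -delete:

find /projects/mjolnir1/people/ku-ID/your_data/ -type d -empty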
Important Note: Do NOT Store Project Data in Your Home Directory (/home/ku-ID/)
Your home directory (/home/ku-ID/) has a strict 100GB quota and is NOT meant for storing project data.
Your home directory should only be used for:
- Personal scripts or configurations (e.g., .bashrc, .vimrc).
- Small temporary files (but not large datasets).
- Software environments that don’t belong in a shared project folder.
- Note: since software environments can grow large, it is recommended to store them in your /projects/{project_name}/people/ku-ID/ folder instead.
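To see how much of the 100GB home quota you are currently using, a simple check is du (the exact quota-reporting tool varies by system):

du -sh /home/ku-ID/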
Summary of Best Practices
- Use rsync instead of mv, scp, or cp to ensure error checking and resuming capabilities.
- Delete unnecessary files before moving data to save storage and backup costs.
- Compress large files before transferring using pigz for multi-threaded compression.
- Use /data/ for shared project data, /people/ for personal files, and /scratch/ for temporary files.
- Submit CPU-intensive compression jobs to SLURM instead of running them on the login node.
- After moving data, clean up empty directories with find <path> -type d -empty -delete.