Difference between revisions of "HPC:FAQ"
From HPC wiki
Revision as of 17:43, 23 February 2018
This page has all the answers you are looking for ..... OK, maybe not! But you will find answers to some of the most common questions we get about the PMACS HPC System.
Other Pages
Administrivia
- How much does it cost to use the PMACS HPC cluster?
Usage costs are published here and here
Requesting Accounts
- How do I request an account on the PMACS HPC Cluster?
- Step 0: Get a UPENN PennKey.
- Step 1: As part of our account creation process, we routinely collect several pieces of information listed here and here. Send us all this information in an email.
- Step 2: If you are not the PI, cc your PI/BA in your account request email so we can follow up with them directly for email authorization. If you are the PI/BA, you don't have to do anything else.
- I requested an account per the instructions outlined above. How long does it take to create the account?
Typically, less than 2 business days. Sometimes, emails do get missed, so feel free to nudge us again!
- OK, I got an email confirming my account. Now what?
Use the cluster to do your research!
General Questions
- Before I begin using the PMACS HPC cluster, I'd like to know how much it would cost me to do my work?
Unfortunately, there is no easy answer to this question. Cost of usage depends greatly on the kind of work you do, whether you already have a working pipeline or are only now building one, the tools you use, etc.
- I have a limited amount of funds available. Can my usage be capped once I hit a certain limit?
No, we currently have no way to cap usage once compute/storage costs reach a given dollar amount.
- How do I check the status of my jobs?
You can use the "bjobs" command. Condensed output of bjobs:
$ bjobs 27002288
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
27002288 asrini RUN interactiv consign.hpc node107.hpc bash Feb 21 21:50
- How do I check if my job has stalled?
First, check the error and output files associated with the job.
If error/output files for the job don't provide the necessary information, check the long listing of bjobs a few times over the course of a few minutes, to verify if additional CPU time has accrued.
To get the long listing for the bjobs command, run "bjobs -l <jobid>" (example below):
$ bjobs -l 27007909
Job <27007909>, User <asrini>, Project <default>, Status <RUN>, Queue <normal>,
Command <sleep 1000>, Share group charged </asrini>
Thu Feb 22 11:31:32: Submitted from host <consign.hpc.local>, CWD <$HOME>;
Thu Feb 22 11:31:32: Started 1 Task(s) on Host(s) <node041.hpc.local>, Allocate
d 1 Slot(s) on Host(s) <node041.hpc.local>, Execution Home
</home/asrini>, Execution CWD </home/asrini>;
Thu Feb 22 11:31:34: Resource usage collected.
MEM: 0 Mbytes; SWAP: 0 Mbytes; NTHREAD: 1
PGID: 19597; PIDs: 19597
SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
nfsops uptime
loadSched - -
loadStop - -
RESOURCE REQUIREMENT DETAILS:
Combined: select[type == local] order[r15s:pg] span[ptile='!',Intel_EM64T:32]
same[model] affinity[thread(1)*1]
Effective: select[type == local] order[r15s:pg] span[ptile='!',Intel_EM64T:32]
same[model] affinity[thread(1)*1]
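The check described above (sample the long listing twice and see whether CPU time has grown) can be sketched as a pair of small shell helpers. This is only a sketch: it assumes that "bjobs -l" reports CPU usage on a line of the form "The CPU time used is N seconds." with a whole number of seconds (the idle sleep job in the sample above shows no such line), and the function names cpu_seconds and has_progressed are hypothetical, not part of LSF.

```shell
#!/bin/sh
# cpu_seconds: read `bjobs -l` output on stdin and print the CPU seconds.
# ASSUMPTION: usage appears as "The CPU time used is N seconds."
# Prints nothing if no such line is present (e.g. an idle job).
cpu_seconds() {
    sed -n 's/.*The CPU time used is \([0-9][0-9]*\) seconds.*/\1/p' | tail -n 1
}

# has_progressed BEFORE AFTER: succeed if CPU time grew between two samples.
# An empty sample is treated as 0 seconds.
has_progressed() {
    [ "${2:-0}" -gt "${1:-0}" ]
}

# Example use on the cluster (kept as comments; the 120 s gap is arbitrary):
#   before=$(bjobs -l 27007909 | cpu_seconds)
#   sleep 120
#   after=$(bjobs -l 27007909 | cpu_seconds)
#   has_progressed "$before" "$after" || echo "No CPU time accrued; job may be stalled"
```

A job that is waiting on I/O or sleeping can legitimately accrue no CPU time, so treat a negative result as a prompt to look closer rather than proof of a stall.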
- How do I terminate a job?
Use the "bkill <jobid>" command without any additional flags, first:
$ bkill 27023685
Job <27023685> is being terminated

$ bjobs 27023685
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
27023685 asrini EXIT normal consign.hpc node037.hpc *eep 10100 Feb 23 12:33
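If the above fails, try "bkill -s 7 <JOBID>". The "-s 7" option sends signal number 7 to the job, and the scheduler waits for confirmation that the signal took effect. Note that on Linux x86-64, signal 7 is SIGBUS, not SIGTERM (15) or SIGKILL (9); "kill -l" lists the signal numbering on a given host.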
$ bkill -s 7 27023688
Job <27023688> is being signaled

$ bjobs 27023688
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
27023688 asrini EXIT normal consign.hpc node012.hpc *eep 10100 Feb 23 12:39
If the above fails, you can then try "bkill -r <JOBID>". This attempts to kill the job as before, but the scheduler does not wait for confirmation and proceeds to remove the job from the queue.
$ bkill -r 27023755
Job <27023755> is being terminated

$ bjobs 27023755
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
27023755 asrini EXIT normal consign.hpc node037.hpc *eep 10100 Feb 23 12:41
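The three-step escalation above can be wrapped in a small shell function. The following is a rough sketch, not an official PMACS tool: the function names job_status and terminate_job, the 30-second default wait, and the list of states treated as "still active" are all assumptions made for illustration. It relies only on the bkill forms and the bjobs column layout shown above.

```shell
#!/bin/sh
# job_status JOBID: print the STAT column (e.g. RUN, EXIT) from `bjobs`.
# Assumes the two-line layout shown above: a header line, then the job line.
job_status() {
    bjobs "$1" 2>/dev/null | awk 'NR == 2 { print $3 }'
}

# terminate_job JOBID [WAIT_SECONDS]: try plain bkill, then -s 7, then -r.
# Returns 0 as soon as the job is no longer active, 1 if all three forms fail.
terminate_job() {
    jobid=$1
    wait_s=${2:-30}   # ASSUMPTION: 30 s between attempts; adjust to taste
    for flags in "" "-s 7" "-r"; do
        # $flags is deliberately unquoted so "-s 7" becomes two arguments
        bkill $flags "$jobid"
        sleep "$wait_s"
        case $(job_status "$jobid") in
            RUN|PEND|PSUSP|USUSP|SSUSP) ;;   # still active: escalate
            *) return 0 ;;                    # EXIT, DONE, or not listed
        esac
    done
    return 1
}

# Example use on the cluster (kept as a comment):
#   terminate_job 27023685 || echo "Job is still listed as active; contact the HPC team"
```

Stopping at the first form that works keeps "bkill -r" as a last resort, since that form removes the job from the queue without waiting for confirmation.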
PMACS ERA Team Contact Info
- What is the best way to ask questions about the PMACS HPC System?
Send all PMACS HPC related questions/requests to our group's email: psom-pmacshpc@pennmedicine.upenn.edu
- I'm submitting a grant application and would like to include some information about the computational resources available.
We have information here that you can copy-paste directly into your application.
- Do I need to acknowledge the PMACS HPC system in my publication?
Not necessarily, but a significant portion of our HPC and Archive systems was funded through an NIH grant (1S10OD012312), so it would be great if you acknowledged this grant (and us!).
