HPC:FAQ

On this page you will find answers to some of the most common questions we get about the PMACS HPC System.

Costs


  • Is the PMACS HPC service free?
 No. The PMACS HPC service is a pay-for-use service. See details below.
  • How much does it cost to have a PMACS HPC account?
 There are no costs associated with requesting or maintaining a PMACS HPC account. There are, however, costs associated with using system resources, be that disk or archive (for storage) or computational resources (like CPU and RAM). See below.
  • How much does it cost to use the PMACS HPC cluster?
 Usage costs are published here and here
  • I have not used the PMACS HPC for some time, yet, I keep getting usage charges. Why?
 If there is data left in an individual user's home directory or in the project and/or archive directory that is owned by the individual user, disk usage charges will apply.
  • Why are disk/archive usage charges incurred when the individual user's account has been disabled?
 Usage charges, be that disk or archive, are billed against UNIX groups. When accounts are disabled, the associated UNIX group is not deleted. Any data owned by that UNIX group is not deleted either. 
 Therefore, usage charges will continue to appear in the monthly invoices. See Billing section below for details. 
  • I'm the PI/BA and wish to have disk/archive usage charges for a former member of my lab removed from my monthly bill. How can this be accomplished?
 Having storage (disk/archive) charges for former lab members appear in monthly invoices, while a bit confusing, is usually an indication that the former lab member did not follow the correct procedure to transfer ownership/delete the data in their home directory, prior to their departure from the lab.
 This can be easily resolved by changing the ownership of the data in question. See Billing section below for details on how storage charges are billed.

Billing


  • How is PMACS HPC usage billed?
 PMACS HPC usage is billed based on the system resources used by the individual user. System resources can either be storage (disk/archive), compute (using the PMACS HPC's available CPU cores/RAM to do analysis), or both compute and storage. 
 See Costs section above for details.
  • How are storage charges computed?
 Disk and Archive storage usage for every individual PMACS HPC user is monitored daily and costs are computed for usage accrued in a given month.
  • How are charges for compute calculated?
 The PMACS HPC system uses a job scheduling system - an IBM product called Spectrum LSF (formerly called Platform LSF). The job scheduler keeps track of resources used when a job is run on the PMACS HPC system. 
 Cost of computational time on the PMACS HPC system is calculated when a job has completed and resource utilization information has been logged by the job scheduler. 

  • Are storage charges associated with individual users or with the lab the individual user works in?
 Both. Storage charges for any data placed either on the PMACS HPC's disk or archive system are associated with the UNIX group that owns the data in question. In a POSIX-compliant system, each file or directory has 
 two "owners": the individual user who created the file and a UNIX group whose members are also designated to have access to the file/directory. 
 Example 1:
 $ ls -lh $HOME/my_large_file1 
 -rw-rw-r-- 1 asrini asrini 1.0G Nov  6 12:15 /home/asrini/my_large_file1 
 
 The above file, created within a user's home directory, is set to be owned by the individual user and the UNIX group of that user by default. Monthly usage charges for data owned by the individual user will be listed as a line item with the individual's name.   
 
 Example 2:
 $ ls -lh /project/asrinilab/my_large_file2
 -rw-rw-r-- 1 asrini asrinilab 1.0G Nov  6 12:15 /project/asrinilab/my_large_file2 
 
 The above file, created within a lab's project directory, is set to be owned by the UNIX group of that lab by default. Monthly usage charges for data owned by the lab's UNIX group will be listed as a separate line item with the lab/project directory's name.
  • Why are disk usage charges for former lab members being billed when their home directory is empty?
 Disk and archive usage charges are billed based on UNIX group ownership (see above). When an individual uses the UNIX move operation (the "mv" command) to move the contents of their home directory into the lab's project directory, the group ownership and permissions set on the project directory are not applied to the moved files.
 Therefore, the data continues to be owned by the individual's UNIX group rather than the lab's UNIX group, and a line item for disk usage charges with the former lab member's name continues to appear in the monthly invoice. 
 This can be fixed by changing the group ownership of the data (see the example at the end of this section).
  • Why are archive usage charges for former lab members being billed when their home directory is empty?
 Archive usage charges are billed based on UNIX group ownership. Not having data on disk does not affect archive usage charges. If data was placed in the archive using the UNIX move operation (the "mv" command), or if the prescribed procedure for placing 
 files in the archive was not followed, the data continues to be owned by the individual's UNIX group rather than the lab's UNIX group. Thus, a line item for archive usage charges with the former lab member's name continues to appear in the monthly invoice. 
 This can be fixed by changing the group ownership of the data. See the Archive usage page for details on how to access the PMACS archive.
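 Changing group ownership can be done with the standard "chgrp" command. A minimal sketch, assuming the lab's UNIX group is "asrinilab" (as in the examples above) and the data sits in the lab's project directory; substitute your own group name and path:
  $ chgrp -R asrinilab /project/asrinilab/data_from_former_member   # recursively re-assign group ownership to the lab's UNIX group
  $ ls -lh /project/asrinilab/data_from_former_member               # verify the new group ownership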

Accounts


  • What is a PMACS account?
 A PMACS Account is a user name & password combination that grants individual users access to resources managed by the Penn Medicine Academic Computing Services (PMACS) organization.
  • Is the PMACS account the same as PennKey?
 No. A PennKey is a user name/password combination that grants individuals access to systems managed by UPENN's central computing division; having a PennKey alone does not grant individuals access to resources managed by PMACS. 
 However, having a valid PennKey is a requirement for requesting a PMACS account. See Requesting Accounts section below.    
  • Is the PMACS HPC account different from a "regular" PMACS Account?
 No. A PMACS HPC account is a PMACS account which has been granted access to the PMACS HPC system resources.
  • Can I share my PMACS HPC account with other members of my lab?
 NO. Sharing of PMACS accounts, be that for accessing the PMACS HPC system or any other resource managed by PMACS, is not permitted. 
 Each individual is expected to have their own account. See Requesting Accounts section below to request an account.    
  • What is the PMACS Password policy?
 All PMACS account passwords are set to expire every 180 days. Email notifications are sent to individual users 14 days prior to password expiration. 
 PMACS passwords can be reset via the PMACS self-service password reset portal.
  • Why are PMACS accounts disabled?
 PMACS Accounts can be disabled for a few different reasons:
 1. Removal of valid PennKey affiliation: this is standard practice when individuals change departments or if the affiliation was set with an expiration date.
 2. Account was set to be disabled on specific date: this is a special case when a request to have the account disabled was made at the time of account creation.
 3. PennKey expired: when individuals leave UPENN, their PennKey is set to expire.
  • Why was my PMACS HPC access disrupted?
 PMACS HPC access can be disrupted for the following reasons:
 1. Expired passwords (see FAQ re: PMACS Password policy, above)
 2. Disabled accounts (see FAQ re: why PMACS accounts are disabled, above)
 3. A PI/BA no longer wishes to sponsor the individual user and requested HPC access to be revoked. 
Requesting Accounts

  • How do I request an account on the PMACS HPC Cluster?
 - Step 0 : Get a UPENN PennKey
 - Step 1 : As part of our account creation process, we routinely collect several pieces of information listed here and here. Send us all this information in an email.
 - Step 2 : If you are not the PI, cc your PI/BA in your account request email so we can follow up with them directly for email authorization. If you are the PI/BA, you don't have to do anything else.

  • I requested an account per the instructions outlined above; how long does it take to create the account?
 Typically, less than 2 business days. Sometimes, emails do get missed, so feel free to nudge us again! 
  • OK, I got an email confirming my account. Now what?
 - Use the cluster to do your research
Closing Accounts

  • How do I request to close my PMACS HPC account?
 There is no prescribed procedure for closing an account. This is because PMACS accounts rely on having valid PennKeys. So when an individual's PennKey expires, their PMACS HPC access is terminated.
 That being said, if the intent is to ensure no additional usage charges are billed for PMACS HPC access, below are the steps that need to be taken: 
 If the request to close an account is from the individual user (student/post-doc/staff):
 - Step 1 : Delete or transfer ownership of all data stored on the PMACS HPC's disk and archive systems that is not backed up elsewhere (data stored on the PMACS HPC disk or archive systems is not backed-up and cannot be restored later).
 - Step 2 : Send us an email requesting to disable your PMACS HPC access and cc your PI. 
 If the request to close an account is from a PI/BA:
 - Step 1 : Delete all data stored on the PMACS HPC's disk and archive systems that is not backed up elsewhere (data stored on the PMACS HPC disk or archive systems is not backed-up and cannot be restored later).
 - Step 2 : Send us an email requesting to disable your PMACS HPC access.
 - Step 3 : Ensure that there are funds to pay for the last invoice. 
 
  • How do individual users transfer ownership of data before leaving the lab/closing accounts?
 Data stored within an individual's home directory is set to be owned by them and thus inaccessible to others. This data must be copied (using rsync/cp and not mv) to the lab's project directory and subsequently deleted from their own home directory.
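 A minimal sketch of this copy-then-delete approach (the directory names below are illustrative; substitute your own data directory and your lab's project directory):
  $ rsync -av $HOME/my_data/ /project/asrinilab/my_data/    # copy the data into the lab's project directory
  $ ls -lh /project/asrinilab/my_data/                      # confirm the copy and that the group is the lab's UNIX group (fix with chgrp if needed)
  $ rm -rf $HOME/my_data                                    # only then delete the original copy from the home directory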

General Questions


  • Before I begin using the PMACS HPC cluster, I'd like to know how much it will cost me to do my work.
 Unfortunately, there is no easy answer to this question. The cost of usage varies greatly depending on the kind of work you do, whether you already have a working pipeline or are still building one, the tools you use, etc. 
  • I have a limited amount of funds available; can my usage be capped once I hit a certain limit?
 No, we currently have no way to cap usage once compute/storage costs have reached a given dollar amount.  

PMACS ERA Team Contact Info


  • What is the best way to ask questions about the PMACS HPC System?
  Send all PMACS HPC related questions/requests to our group's email: psom-pmacshpc@pennmedicine.upenn.edu

Grant related questions


  • I'm submitting a grant application and would like to include some information about the computational resources available.
 We have information here that you can copy-paste directly into your application.
  • Do I need to acknowledge the PMACS HPC system in my publication?
 Not necessarily, but a significant portion of our HPC and Archive systems was funded through an NIH grant (1S10OD012312), so it would be great if you do acknowledge this grant (and us!).

Accessing the PMACS HPC


  • How do I access the PMACS HPC cluster?

See our Wiki page about Connecting to the PMACS Cluster
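In short, access is via SSH to the cluster's head node. A minimal example, using the consign host name shown elsewhere on this page and a placeholder user name (substitute your own PMACS user name):
  $ ssh asrini@consign.pmacs.upenn.edu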

Job related Questions


  • How to check the status of my jobs?
You can use the "bjobs" command
 
Condensed output of bjobs 
 $ bjobs 27002288
JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
27002288   asrini  RUN   interactiv consign.hpc node107.hpc bash       Feb 21 21:50
 


  • How to check if my job has stalled?
First, check the error and output files associated with the job.
If the error/output files for the job don't provide the necessary information, check the long listing of bjobs a few times over the course of a few minutes to verify whether additional CPU time has accrued. 
To get the long listing from the bjobs command, run "bjobs -l <jobid>" (example below):
$ bjobs -l 27007909

Job <27007909>, User <asrini>, Project <default>, Status <RUN>, Queue <normal>,
                     Command <sleep 1000>, Share group charged </asrini>
Thu Feb 22 11:31:32: Submitted from host <consign.hpc.local>, CWD <$HOME>;
Thu Feb 22 11:31:32: Started 1 Task(s) on Host(s) <node041.hpc.local>, Allocate
                     d 1 Slot(s) on Host(s) <node041.hpc.local>, Execution Home
                      </home/asrini>, Execution CWD </home/asrini>;
Thu Feb 22 11:31:34: Resource usage collected.
                     MEM: 0 Mbytes;  SWAP: 0 Mbytes;  NTHREAD: 1
                     PGID: 19597;  PIDs: 19597 


 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -  
 loadStop    -     -     -     -       -     -    -     -     -      -      -  

           nfsops  uptime 
 loadSched     -       -  
 loadStop      -       -  

 RESOURCE REQUIREMENT DETAILS:
 Combined: select[type == local] order[r15s:pg] span[ptile='!',Intel_EM64T:32] 
                     same[model] affinity[thread(1)*1]
 Effective: select[type == local] order[r15s:pg] span[ptile='!',Intel_EM64T:32]
                      same[model] affinity[thread(1)*1] 

 


  • How to terminate a job
 Use the "bkill <jobid>" command without any additional flags, first:
  $ bkill 27023685
  Job <27023685> is being terminated
 
  $ bjobs 27023685
  JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
  27023685   asrini  EXIT  normal     consign.hpc node037.hpc *eep 10100 Feb 23 12:33
  
 If the above fails, try "bkill -s 9 <JOBID>". The "-s 9" option sends a SIGKILL (force kill) signal to the job, but the scheduler waits for confirmation that the signal took effect.
  $ bkill -s 9 27023688
  Job <27023688> is being signaled
  
  $ bjobs 27023688
  JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
  27023688   asrini  EXIT  normal     consign.hpc node012.hpc *eep 10100 Feb 23 12:39
  
 If the above fails, you can then try the "bkill -r <JOBID>" approach. This does the same as above, but the scheduler does not wait and proceeds to remove the job from the queue.
  $ bkill -r 27023755
  Job <27023755> is being terminated
  
  $ bjobs 27023755
  JOBID      USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
  27023755   asrini  EXIT  normal     consign.hpc node037.hpc *eep 10100 Feb 23 12:41
  

Note: Read the man page for bkill for a more detailed explanation


  • How to terminate all my jobs
 $ bkill 0
 Job <3797180923> is being terminated
 Job <3797180924> is being terminated
 Job <3797180925> is being terminated
 Job <3797180926> is being terminated


Note: Read the man page for bkill for a more detailed explanation


  • How do I make sure my job does not run too long?
 Set a runtime limit, in minutes, for the job (a.k.a. a wall-clock limit) using the -W option. The scheduler then terminates the job, via bkill, if it is still running when the preconfigured runtime limit is reached. 
  $ bsub -W 1 sleep 300
  Job <27576257> is submitted to default queue <normal>.


  $ bjobs -l

Job <27576257>, User <asrini>, Project <default>, Status <RUN>, Queue <normal>,
                     Command <sleep 300>, Share group charged </asrini>
Fri Mar  2 12:12:04: Submitted from host <consign.hpc.local>, CWD <$HOME>;

 RUNLIMIT                
 1.0 min of node120.hpc.local
Fri Mar  2 12:12:08: Started 1 Task(s) on Host(s) <node120.hpc.local>, Allocate
                     d 1 Slot(s) on Host(s) <node120.hpc.local>, Execution Home
                      </home/asrini>, Execution CWD </home/asrini>;

 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -  
 loadStop    -     -     -     -       -     -    -     -     -      -      -  

           nfsops  uptime 
 loadSched     -       -  
 loadStop      -       -  

 RESOURCE REQUIREMENT DETAILS:
 Combined: select[type == local] order[r15s:pg] span[ptile='!',Intel_EM64T:32] 
                     same[model] affinity[thread(1)*1]
 Effective: select[type == local] order[r15s:pg] span[ptile='!',Intel_EM64T:32]
                      same[model] affinity[thread(1)*1] 
  


  • How do I request more than the default 6GB RAM limit for my jobs?
 Use the -M <mem_in_MB> bsub option. For example, to request 10GB of RAM: 
  $ bsub -M 10240 sh test_r.sh
  


  • Why is my job in pending (PEND) state?
 There can be many reasons for this. Always check the output of "bjobs -l"
   $ bjobs -l 35184277

   Job <35184277>, User <asrini>, Project <default>, Status <PEND>, Queue <normal>
                     , Command <sh myjob.sh>
   Tue May  8 11:48:11: Submitted from host <consign.hpc.local>, CWD <$HOME>, 16 T
                     ask(s);
   PENDING REASONS:
   Not specified in job submission: 68 hosts;
   Affinity resource requirement cannot be met because there are not enough processor units to satisfy the job affinity request: 8 hosts;
   Job slot limit reached: 11 hosts;
   Load information unavailable: 5 hosts;
   Not enough job slot(s): 26 hosts;
   Closed by LSF administrator: 9 hosts;
   Just started a job recently: 10 hosts;
   Not enough hosts to meet the job's spanning requirement: 2 hosts;

   SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
   loadSched   -     -     -     -       -     -    -     -     -      -      -  
   loadStop    -     -     -     -       -     -    -     -     -      -      -  

           nfsops  uptime 
   loadSched     -       -  
   loadStop      -       -  

   RESOURCE REQUIREMENT DETAILS:
   Combined: select[type == local] order[r15s:pg] span[ptile='!',Intel_EM64T:24] 
                     same[model] affinity[thread(1)*1]
   Effective: -

  
 Notice above, under "PENDING REASONS", that one reason the job remains in the pending state is listed as "not enough processor units to satisfy the job affinity request: 8 hosts;"
  • How do I setup my jobs to depend/wait on other jobs?
 See the section on Job Dependency
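 As a quick illustration, LSF's -w option lets a job wait for another job to finish; the job ID and script names below are placeholders, and the Job Dependency section covers the full syntax:
  $ bsub sh step1.sh                        # submit the first job and note the job ID it reports
  $ bsub -w "done(27580001)" sh step2.sh    # 27580001 is a placeholder: use the job ID reported for step1.sh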

Software Installation


  • Am I allowed to install software in my home directory?
  Yes. See the Manual Software Installation section of our wiki for some pointers.
  • What are the rules for software installation?
  Rule 0 : All software MUST be installed on a compute node. Use an interactive session (bsub -Is bash) for all software installation. Do NOT use the head node.
  Rule 1 : If you get a missing library or header file error during the software installation, make sure you've read and followed RULE 0  
  Rule 2 : Do NOT use "sudo" in your installation steps. This will result in a permissions error and will generate an alert to us, the Admins.
  Rule 3 : Read the rest of this section
  Rule 4 : If you still run into problems, send us a note with details on the steps you followed and exactly which interactive session you tried to install the program on so we can investigate.


  • I get a "Permission denied" error when I try to install a program in my home directory by running "make install". How do I fix it?
  The reason for this error is that, by default, most software packages are configured to be compiled and installed in system-level directories like /usr/bin, /usr/local/bin, etc.
  You have two options:
  
  Option 1: Compile the program with a "prefix" flag during the "configure" step:
   ./configure --prefix=$HOME
   
  Option 2: Install the program in a different destination after the compilation is done (here using your home directory; change it if you want to):
   make install DESTDIR=$HOME

   OR

   make install DESTDIR=/home/<usr_name>

   Replace "<usr_name>" with your user name 
   
  • OK I tried one of the above options to install the program in my home directory and it worked! How do I use it?
 If you followed Option 1 above, the program was likely installed in a "bin" directory under your $HOME directory. First verify that the program exists there. Using vcftools as an example: 
   $ ls $HOME/bin/vcftools
   /home/asrini/bin/vcftools
  
  If you followed Option 2 above, then the program was likely installed under $HOME/usr/local/bin or $HOME/usr/bin/; in addition to checking $HOME/bin, check these locations as well. Again, using vcftools as an example:
     $ ls $HOME/bin/vcftools

     $ ls $HOME/usr/bin/vcftools

     $ ls $HOME/usr/local/bin/vcftools
  
  Once you've determined the location of the installed binary, you can add that directory to your $PATH variable for easy use.
  Assuming the binary is in $HOME/bin: 
   export PATH=$HOME/bin:$PATH
   
  The above line can be added to your .bashrc or .bash_profile files.
  • OK I tried both of the above options to install the program in my home directory and still get an error. Now what?
 If you are trying to install a Python package, make sure you're using a Python virtual environment; see this page for details and the sketch below.
 If not, send us a note with details on the steps you followed and exactly which interactive session you tried to install the program on so we can investigate.
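 A minimal virtual-environment workflow looks like the following (the environment name, path, and package are placeholders; see the linked page for the recommended Python setup on this system):
  $ bsub -Is bash                            # per Rule 0, do the installation on a compute node
  $ python3 -m venv $HOME/venvs/myenv        # create a virtual environment in your home directory
  $ source $HOME/venvs/myenv/bin/activate    # activate it for the current shell
  (myenv) $ pip install numpy                # packages now install into the environment, not system paths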

Troubleshooting Job Failures


More to come soon ...

Downloading Sequencing data from BaseSpace


  • Is Illumina's BaseMount software available on the PMACS HPC system so I can download sequencing data directly from BaseSpace?
 Yes. BaseMount is installed on the PMACS HPC's dedicated file transfer system: mercury.pmacs.upenn.edu. Contact us for information on how to use it.
  • I have sequencing data stored in Illumina's BaseSpace. How do I download the data?
 You can set up the BaseSpace Sequence Hub CLI tool to download sequencing data. Information on how to do this is available here
 OR
 You can use BaseMount

Sharing Data

Sharing data among/with HPC users or with collaborators both at UPENN and elsewhere can be accomplished in different ways. Below are a few options.

Sharing data with other HPC users (and other UPENN users)

HPC users can share data with each other in one of the following ways:

Project directories for each lab

All labs with sponsored accounts on the HPC will have a dedicated space for sharing data among members of their own lab. These directories are set up under the /project path and use the PI's PennKey with the "lab" suffix as the name for the directory. For example, for a PI whose PennKey is "asrini":


[asrini@consign ~]$ ls -ld /project/asrinilab 
drwxrws-r-- 1 asrini asrinilab 1344 Nov  6 2014 /project/asrinilab 

New data created within these project directories or copied, using cp/rsync, to these directories will be billed to the budget code on file for the PI's lab. See note about HPC Billing above, for more details.

Project directories for inter-lab collaboration

All labs with sponsored accounts on the HPC can share data with members of another lab with sponsored HPC accounts using the inter-lab shared directories. These directories are also created under the /project path and use the primary PI's PennKey, followed by the secondary PI's PennKey, with the "lab" suffix as the directory name. For example, a collaboration space between a primary PI whose PennKey is "rgodshal" and a secondary PI whose PennKey is "asrini" would be:


[asrini@consign ~]$ ls -ld /project/asrini_rgodshal_lab 
drwxrws-r-- 1 asrini asrinilab 1344 Nov  6 2014 /project/asrini_rgodshal_lab
 

There is no limit to the number of such shared directories each lab can have and there is no limit to the number of sponsored HPC users who can have access to these directories.

To request such directories, send an email to psom-pmacshpc@pennmedicine.upenn.edu with the following information:

1. Primary PI Name:

2. Primary PI Email:

3. Secondary PIs (name & UPENN email of all other PIs):

4. List of HPC Users who need access:

5. Name of Business Administrator (BA):

6. Email contact for BA:

7. Budget code to bill for data stored in the new shared project directory:

8. Will you store PHI data or data that requires HIPAA compliance in this directory?

NOTE 1: The Primary PI is the one who assumes financial responsibility for the data storage charges accrued from data stored in the shared space.

NOTE 2: There can be multiple secondary PIs

NOTE 3: If a lab with sponsored HPC accounts wishes to share data with another lab that does not have sponsored accounts on the HPC, a shared space will be created using the same naming convention above and members of the secondary PI's lab will need to go through the standard account request process to gain HPC access first and subsequently have access to the shared space. Information for setting up a new HPC account is available here.

New data created within these project directories or copied, using cp/rsync, to these directories will be billed to the budget code on file for the PI's lab. See note about HPC Billing above, for more details.

Sharing data with non-HPC users (both at UPENN and elsewhere)

If sponsored HPC users wish to share data with collaborators either at UPENN or elsewhere, then this can be done in a few different ways too:


Sharing data with non-HPC users at UPENN using the HPC

If a lab with sponsored HPC accounts wishes to share data with another lab that does not have sponsored accounts on the HPC, a shared space will be created using the same naming convention above and members of the secondary PI's lab will need to go through the standard account request process to gain HPC access first and subsequently have access to the shared space, as described above. Information for setting up a new HPC account is available here.

Sharing data with collaborators using cloud-based services

There are a few more options available when sharing data with collaborators both at UPENN or at other institutions:

  • UPENN Box: This is an excellent and cost-free option for storing/sharing data. It is the ONLY cloud-based storage option that is officially supported by the University of Pennsylvania at this time.

NOTE: Each individual file that can be stored on UPENN Box can be no larger than 15GB. However, there is no limit to the size of the entire dataset/directory that is uploaded to UPENN Box.

Box is an excellent option for sharing data easily with yourself or other individuals. UPenn offers unlimited Box storage for all individuals with a PennKey. Go to https://upenn.app.box.com to access your Box.

Please refer to either the HPC lftp page or the HPC Rclone page for instructions on how to push and pull files to and from Box from the HPC.


  • Other cloud-based services: While uploading/downloading data from the HPC to/from other cloud-based services like AWS S3 buckets, Azure Blobs, Dropbox, etc. is possible, it is the end-user's responsibility to ensure that the data does NOT require HIPAA or other regulatory compliance before storing it using such services.

NOTE 1: The HPC team, Penn Medicine/PSOM IT and UPENN IT teams strongly discourage storing data that requires HIPAA or other regulatory compliance on any cloud-based solution without prior consultation with IT staff

NOTE 2: It is the end-user/lab's responsibility to pay for any of these cloud-based storage services as these are NOT free of charge


Please refer to the HPC Rclone page for instructions on how to push and pull files to/from a cloud-based storage service from/to the HPC using rclone.
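As a rough sketch, assuming you have already configured an rclone remote named "box" via "rclone config" (the remote name and paths below are placeholders; see the HPC Rclone page for the supported setup):
  $ rclone copy $HOME/my_results box:my_results    # push a local directory to the cloud remote
  $ rclone copy box:my_results $HOME/my_results    # pull the same data back to the HPC
  $ rclone ls box:my_results                       # list what is stored on the remote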

Technical Questions


  • Do you have Jupyter Notebook support?

Yes we do! Please refer to the Jupyter page for instructions on how to get started.

  • Why don't some of my commands work? For example:
[asrini@consign ~]$ bsub -Is bash
Job <64070984> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on node082.hpc.local>>

(base) bash-4.2$ module avail
module avail
bash: module: command not found

The issue here is Conda. While tools like Conda/Anaconda make local package management and the reuse of Python virtual environments easy, they also make it easy for novice (and, in some cases, experienced) HPC users to inadvertently overwrite their .bashrc and/or .bash_profile files, thus modifying crucial environment variables such as $PATH.

We encourage users to tread with caution when using Conda.

The fix for the above is to make a copy of the Conda-modified .bashrc and .bash_profile files, restore the system default versions of these files, and then re-apply only the non-Conda related changes from the backed-up .bashrc/.bash_profile files:


[asrini@consign ~]$ mv $HOME/.bashrc $HOME/.bashrc_bak

[asrini@consign ~]$ mv $HOME/.bash_profile $HOME/.bash_profile_bak

[asrini@consign ~]$ cp -i /home/apps/user_defaults/skel/.{bashrc,bash_profile} $HOME/

[asrini@consign ~]$ logout
Connection to consign.pmacs.upenn.edu closed

% ssh consign.pmacs.upenn.edu
[asrini@consign ~]$ 

NOTE: The last step in the above command set (logging out and back in) is required; a clean login profile ensures the environment variables are all reset correctly.

Once the system defaults have been restored (after logout and subsequent login, into the HPC), the non-Conda related changes can be restored from the $HOME/.bashrc_bak and $HOME/.bash_profile_bak files.

We recommend only enabling/activating Conda environments as needed, i.e. manually, to avoid such issues (see the sketch below).

Also see Conda page for more information.
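One way to do this is to keep the "conda initialize" block out of your .bashrc and load Conda by hand only when you need it. A minimal sketch, assuming Miniconda is installed under $HOME/miniconda3 and an environment named "myenv" already exists (adjust both to match your own installation):
  $ source $HOME/miniconda3/etc/profile.d/conda.sh   # load Conda into the current shell only
  $ conda activate myenv                             # activate an environment on demand
  (myenv) $ conda deactivate                         # return to the default environment when done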

Other Pages