Data Management and Sharing

From HPC wiki

Sharing Data

Storage used in project directories is measured nightly and billed monthly to the budget code on file for the PI's lab. See HPC Billing above for more details.

Sharing data within UPENN

Access to the HPC is restricted to HPC account holders. Accounts are free; usage is billed monthly. Information on setting up a new HPC account is available here.


Project directories for intra-lab sharing

Labs in HPC have dedicated spaces for sharing data among lab group members.
These directories are set up under the /project path and are named using the PI's PennKey suffixed with "lab".

For example, for a PI whose PennKey is wpenn:

[wpenn@hpclogin ~]$ id 
uid=101644(wpenn) gid=101644(wpenn) groups=101644(wpenn),19104(wpennlab)
[wpenn@hpclogin ~]$ ls -ld /project/wpennlab
drwxrwxr-x 13 root wpennlab 4096 Oct 24  1644 /project/wpennlab
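
The naming convention above can be checked from a login node with a short shell sketch. It assumes only the /project/<PennKey>lab pattern described here and the standard id command; lab_project_dir and in_group are hypothetical helper names, not tools provided by the HPC.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the intra-lab naming convention.

# Print the intra-lab project directory for a PI's PennKey,
# assuming the /project/<PennKey>lab convention.
lab_project_dir() {
    local pennkey="$1"
    printf '/project/%slab\n' "$pennkey"
}

# Return 0 if the current user belongs to the named Unix group.
in_group() {
    local group="$1"
    # id -nG lists the names of all groups of the current user, space-separated.
    id -nG | tr ' ' '\n' | grep -qx -- "$group"
}

# Example: check whether you can use the wpenn lab's shared directory.
pi=wpenn
dir="$(lab_project_dir "$pi")"
if in_group "${pi}lab"; then
    echo "You are in ${pi}lab; shared data lives in $dir"
else
    echo "You are not a member of ${pi}lab"
fi
```

Group membership (wpennlab in the id output above) is what grants write access through the directory's group permissions, which is why the sketch checks the group rather than the directory owner.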

Project directories for inter-lab collaboration

All labs with sponsored HPC accounts can share data with members of other labs that also have sponsored HPC accounts by using inter-lab shared directories. These directories are also created under the /project path and are named with the primary PI's PennKey, followed by the secondary PI's PennKey and the "lab" suffix, separated by underscores. For example, the collaboration space between a primary PI whose PennKey is "asrini" and a secondary PI whose PennKey is "rgodshal" would be:

[asrini@consign ~]$ ls -ld /project/asrini_rgodshal_lab 
drwxrws-r-- 1 asrini asrinilab 1344 Nov  6 2014 /project/asrini_rgodshal_lab
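
The inter-lab naming pattern can be sketched the same way. This assumes the two-PI form shown in the listing above (PennKeys joined by underscores and ending in "lab", with the primary PI's PennKey first as the text describes). How names are formed when there are several secondary PIs is not stated here, so this hypothetical helper, interlab_project_dir, handles only two.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the inter-lab shared directory name for a
# primary and one secondary PI, following /project/<primary>_<secondary>_lab.
interlab_project_dir() {
    if [ "$#" -ne 2 ]; then
        echo "usage: interlab_project_dir PRIMARY_PENNKEY SECONDARY_PENNKEY" >&2
        return 2
    fi
    printf '/project/%s_%s_lab\n' "$1" "$2"
}

dir="$(interlab_project_dir asrini rgodshal)"
# List the directory only if it exists and is readable by the current user.
if [ -r "$dir" ]; then
    ls -ld "$dir"
else
    echo "$dir is not readable from this account"
fi
```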

To request such a directory, send the following information using either of the contact methods below:

1. Primary PI Name:
2. Primary PI Email:
3. Secondary PIs (name & UPENN email of all other PIs):
4. List of HPC Users who need access:
5. Name of Business Administrator (BA):
6. Email contact for BA:
7. Budget code to bill for data stored in the new shared project directory:
8. Will you store PHI data or data that requires HIPAA compliance in this directory?
Questions? Comments? Concerns?
Open a Systems ticket in the PMACS Helpdesk
Email PMACS HPC here: psom-pmacshpc@pennmedicine.upenn.edu

NOTE:

  1. The Primary PI assumes financial responsibility for the data storage charges accrued in the shared space.
  2. There is no limit to the number of such shared directories a lab can have.
  3. There can be multiple secondary PIs.
  4. There is no limit to the number of sponsored HPC users who can have access to these directories.

Sharing data with collaborators using cloud-based services


PennBox

This is an excellent, cost-free option for storing and sharing data. It is the ONLY cloud-based storage option officially supported by the University of Pennsylvania at this time.

Individual files are limited to 50GB.
Penn's "free" tier includes 1TB per account; up to 5TB can be purchased for $525.30.
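
Because Box limits individual files to 50GB, it can help to scan a directory for oversized files before uploading. The sketch below uses find's -size test in bytes; it assumes Box's 50GB means 50 × 10^9 bytes, which is the conservative reading, so adjust the limit if Box documents a different definition. The function name find_oversized_for_box is hypothetical.

```shell
#!/usr/bin/env bash
# List regular files under DIR that are larger than LIMIT_BYTES.
# Default limit assumes Box's 50GB cap means 50 * 10^9 bytes (an assumption).
find_oversized_for_box() {
    local dir="$1"
    local limit_bytes="${2:-50000000000}"
    # -size +Nc matches files strictly larger than N bytes.
    find "$dir" -type f -size +"${limit_bytes}c" -print
}
```

For example, find_oversized_for_box /project/wpennlab/results prints any file that would exceed the limit; an empty result means every file is under it.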

Please refer to either the HPC lftp page or the HPC Rclone page for instructions on how to push and pull files between Box and the HPC.
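
For orientation, an Rclone transfer to or from Box typically uses rclone copy between a local path and a configured remote. This sketch assumes a remote has already been created with rclone config and is named box (a hypothetical name; use whatever name you chose), and that rclone is on your PATH. The HPC Rclone page remains the authoritative guide; box_push and box_pull are hypothetical wrapper names.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around `rclone copy` for a remote assumed to be
# named "box" (created beforehand with `rclone config`).
BOX_REMOTE="${BOX_REMOTE:-box}"

# Push a local file or directory to a folder in Box.
box_push() {
    local src="$1" dest="$2"
    rclone copy "$src" "${BOX_REMOTE}:${dest}" --progress
}

# Pull a Box folder (or file) into a local directory.
box_pull() {
    local src="$1" dest="$2"
    rclone copy "${BOX_REMOTE}:${src}" "$dest" --progress
}
```

Note that rclone copy copies the contents of a source directory into the destination and skips files that are already identical there, so re-running an interrupted push is safe.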

Azure/AWS/GCP

The HPC team, Penn Medicine/PSOM IT and UPENN IT teams STRONGLY DISCOURAGE storing data that requires HIPAA or other regulatory compliance in any of these cloud-based solutions without prior consultation with IT staff.

End users are responsible for regulatory compliance (e.g., HIPAA) and for paying for these services (HPC cannot bill for them).

Globus

Although there is no Globus endpoint on the HPC, the HPC has a queue dedicated to Globus (globus) with the Globus CLI commands pre-installed.

For documentation, see https://docs.globus.org/cli/#documentation_topics, run globus --help, or see the examples below.

Personal Endpoint vs Server and Penn Sponsored Subscriptions

Personal endpoints are intended for transfers by an individual user, for example between their laptop and the HPC.

The following FAQ entry, quoted from https://docs.globus.org/faq/globus-connect-endpoints/#are_transfers_between_globus_connect_personal_endpoints_possible, gives more detail about sharing between institutions:

Are transfers between Globus Connect Personal endpoints possible?
Yes. To transfer between two Globus Connect Personal (GCP) endpoints, one of the users must create a guest collection hosted on their GCP endpoint, then grant the other user(s) access to that collection. Any user that has access to that guest collection (via an individual permission or group permission ACL) can transfer between it and their GCP endpoint. To create a guest collection, the endpoint must be associated with a Globus subscription.
You do not need to be a subscriber to transfer files between a Globus Connect Personal endpoint (e.g. on your laptop) and a Globus Connect Server endpoint (e.g. on your lab server or campus cluster). Globus Connect Personal can execute a transfer as long as either the source or destination endpoint has a routable IP address (which is the case for almost all Globus Connect Server endpoints).

Adding a Penn subscription
  1. Log in at https://app.globus.org using your UPenn credentials
  2. Go to SETTINGS on the left, and then select the "Subscriptions" tab at the top
  3. Click "Find a Subscription" on the right
  4. Search for UPENN and add the relevant subscription

Usage Examples

Start an interactive session on the globus queue:
$ bsub -q globus -Is bash

Log in:
$ globus login --no-local-server

Optionally, create a personal endpoint:
$ globus endpoint create --personal --no-managed --no-force-encryption --default-directory /home/your_directory/any_dir YOUR_DISPLAY_NAME

You can then use the globus transfer command to move data from your personal endpoint to any other Globus endpoint.
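
A transfer from inside a globus-queue session can be scripted with globus transfer and then monitored with globus task wait. This is a sketch, not an HPC-provided script: it assumes you have already run globus login, and the endpoint IDs in SRC_EP and DST_EP are hypothetical placeholders you must replace (for example with IDs found through globus endpoint search or the web app). The helper names submit_dir_transfer and transfer_and_wait are also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: submit a Globus transfer and wait for it to finish.
# Assumes `globus login` has already been run for this account.

# Hypothetical placeholders: replace with real endpoint/collection UUIDs.
SRC_EP="${SRC_EP:-SOURCE-ENDPOINT-UUID}"
DST_EP="${DST_EP:-DEST-ENDPOINT-UUID}"

# Submit a recursive directory transfer and print only the task ID.
submit_dir_transfer() {
    local src_path="$1" dst_path="$2" label="$3"
    globus transfer "${SRC_EP}:${src_path}" "${DST_EP}:${dst_path}" \
        --recursive --label "$label" \
        --jmespath 'task_id' --format=UNIX
}

# Submit, then block until the task finishes and show its final state.
transfer_and_wait() {
    local task_id
    task_id="$(submit_dir_transfer "$@")" || return 1
    if [ -z "$task_id" ]; then
        echo "no task ID returned" >&2
        return 1
    fi
    echo "submitted task $task_id"
    globus task wait "$task_id"
    globus task show "$task_id"
}
```

From an interactive session started with bsub -q globus -Is bash, source the script and call transfer_and_wait with a source path, a destination path and a short label. Extracting task_id with --jmespath keeps the script from having to parse the CLI's human-readable output.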