Difference between revisions of "HPC:Archive System"

From HPC wiki

Revision as of 19:05, 15 April 2015

This page has details about the PMACS Archive System that is attached to the PMACS HPC Cluster.

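Other Pages

*[[HPC:Login|Connecting to the PMACS cluster]]
*[[HPC:User_Guide|User Guide]]
*[[HPC:Software|Available Software]]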

Using the Archive

Adding Data to the Archive:

Access to the archive is via our server "mercury". Once there, use the rsync command with the specific options given below to copy files and directory structures into the archive. Because this is a user-accessible archive system, what you see in that directory structure is not the actual files (which will have been moved off to a staging area and eventually copied to 2 separate tapes) but a representation of them. In this way, you can always see what's in the archive (including file sizes and dates last modified) and delete anything you wish, at any time. Note that deleting a file from the archive makes it inaccessible immediately, and we have no other "backup" system in place.

Here are the steps to place your files and folders into the archive:

Step 1: Log in to the PMACS Cluster's file transfer server

ssh to our server "mercury.pmacs.upenn.edu" <-- this step is often overlooked

Step 2: rsync files

Use this specific rsync command to copy files into the archive:

rsync -rplot --inplace --no-partial --whole-file --bwlimit=50000 --no-checksum --max-size=250GB --stats /{source} /{destination}/

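For example: rsync -rplot --inplace --no-partial --whole-file --bwlimit=50000 --no-checksum --max-size=250GB --stats /project/mylab/me /archivetape/mylab/

Because the source has no trailing "/", this creates /archivetape/mylab/me. Giving /archivetape/mylab/me as the destination instead would nest the copy one level deeper, at /archivetape/mylab/me/me.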

**CRITICAL INSTRUCTION** Pay attention to the first two lines of output from the "--stats" option, which tell you the number of files in your source directory and the number of files copied. If those two numbers are not the same, please be sure you know why. Note that the rsync command limits the maximum size of any single file to 250GB; a file larger than that WILL NOT be transferred. If that happens, contact pmacshpc@upenn.edu to make arrangements to move your larger files. To check your source directory beforehand for files over the size limit, use the command below. rsync reads "250GB" as 250 x 1000^3 bytes, whereas find reads "250G" as 250 x 1024^3 bytes, so the limit is given to find in bytes (the "c" suffix):

 find {source} -type f -size +250000000000c
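The size check and the file-count comparison described above can be scripted. The following is a minimal sketch, not an official PMACS tool: the function names are hypothetical, and it assumes file names without embedded newlines. Keep in mind that the totals printed by rsync's --stats may include directories as well as regular files, whereas this helper counts regular files only.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of any PMACS tooling.

# 250GB as rsync reads --max-size=250GB (decimal: 250 * 1000^3 bytes).
LIMIT_BYTES=250000000000

# Print every regular file under $1 that is larger than $2 bytes
# (defaults to the archive limit above).
list_oversize() {
    find "$1" -type f -size +"${2:-$LIMIT_BYTES}"c
}

# Print the number of regular files under $1.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

Under these assumptions, list_oversize {source} should print nothing before you start the copy, and count_files should report the same number for {source} and for the corresponding directory in the archive afterwards.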

*** NOTE *** Arrange your folder structure before you archive. Your archive directory will always keep the layout you start with: you cannot execute a "move" command afterwards, because tape is a sequential access medium, not a random access medium. The way the data is laid down on tape is the way it stays until you delete it from the archive.


==> TIP: The use of a trailing "/" makes a difference! A slash at the end of the source path copies only the contents of that last subdirectory (including all of its subdirectories), not the directory itself, whereas omitting the slash copies the directory itself and then its contents.

 For example:
 $ rsync -rplot {options omitted for brevity} /home/rgodshal/pub/ /archivetape/rrg  <-- trailing "/" on {source}
 [rgodshal@mercury ~]$ ls -l /archivetape/rrg
 drwxrwxr-x 2 rgodshal rgodshal      4096 Oct 10  2013 consign-opt  <-- these files are the contents of /pub, in the rrg folder
 drwxr-xr-x 3 rgodshal rgodshal      4096 Aug 25 11:30 mathworks_downloads
 -rw-r--r-- 1 rgodshal rgodshal   1017044 Jan  9  2014 RFS-v5 2 1-4145-release-notes.pdf
 compared to:
 $ rsync -rplot /home/rgodshal/pub /archivetape/rrg  <-- no trailing "/" on {source}
 [rgodshal@mercury ~]$ ls -l /archivetape/rrg
 drwxrwx--- 4 rgodshal rgodshal 32768 Mar  4 10:38 pub  <-- there's /pub (with all its contents)
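To make the trailing-slash rule concrete, here is a small sketch that predicts where the contents of the source will land for a given source and destination. The function rsync_target is a hypothetical helper written for this page; it only manipulates the path strings and does not run rsync.

```shell
#!/bin/sh
# Hypothetical helper: print the directory that will receive the contents of
# SOURCE when running "rsync ... SOURCE DEST" on a directory.
rsync_target() {
    src=$1
    dest=${2%/}    # a trailing slash on the destination makes no difference
    case $src in
        */) printf '%s\n' "$dest" ;;                          # contents only
        *)  printf '%s/%s\n' "$dest" "$(basename "$src")" ;;  # the directory itself
    esac
}

rsync_target /home/rgodshal/pub/ /archivetape/rrg   # prints /archivetape/rrg
rsync_target /home/rgodshal/pub  /archivetape/rrg   # prints /archivetape/rrg/pub
```

Checking the predicted path before a large transfer matters here because the layout cannot be rearranged once the data is in the archive.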

Retrieving Data From the Archive:

When you wish to retrieve data from the archive, you can copy single files, sets of files, or directories back to your /home or /project directory on mercury with cp, or use rsync with the source and destination directories reversed from the command you used to place data into the archive.

 Single file example:  cp /archivetape/mylab/me/veryold.doc /project/mylab/me/
 Folder example:       cp -r /archivetape/mylab/me/completed /project/mylab
 Rsync example:        rsync -rplot --inplace --no-partial --whole-file --bwlimit=50000 --no-checksum --max-size=250GB --stats /archivetape/mylab/me /project/mylab/

Deleting Data from Archive:

PLEASE be sure that you have retrieved files you want to keep before deleting them from the archive. This is your only "backup" copy in the HPC environment! Use the "rm" command as you would for ordinary files and directories:

 rm -rf /archivetape/mylab/me/completed
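Since a deletion cannot be undone, it may help to confirm that the retrieved copy matches the archive listing before running rm. The sketch below is one possible check, not a PMACS-provided tool: tree_listing and same_tree are hypothetical names, it assumes GNU find (for -printf), and it compares only relative paths and file sizes, not file contents.

```shell
#!/bin/sh
# Hypothetical helpers; assume GNU find, which provides -printf.

# List every regular file under $1 as "relative/path<TAB>size-in-bytes", sorted.
tree_listing() {
    ( cd "$1" && find . -type f -printf '%P\t%s\n' | LC_ALL=C sort )
}

# Succeed only if both directories exist and hold the same files with the same sizes.
same_tree() {
    [ -d "$1" ] && [ -d "$2" ] &&
        [ "$(tree_listing "$1")" = "$(tree_listing "$2")" ]
}
```

With these definitions, a guarded delete could look like: same_tree /archivetape/mylab/me/completed /project/mylab/completed && rm -rf /archivetape/mylab/me/completed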