File Transfers & Accessing Data Drives


Transfers To/From Non-CfN-Cluster Collaborators

If you have large amounts of data to transfer to/from collaborators outside of the CfN cluster, you can:

  1. use hard drives to transfer the data
  2. contact the sysadmins to discuss using our secure FTP server

Transferring from PACS/Sectra or Scanners

Data from PACS and scanners can be transferred directly to the cluster. See the PACS page.

Transferring To/From Your Cluster Account

Generally you'll need to move files between your local/desktop computer and data directories on the cluster. There are a number of ways to do this:

Which server to transfer to/from?

Instead of using the tools below to connect to chead for your transfers, you can connect directly to the file server that's hosting your data. This will be slightly more efficient and keep things less complicated in the event of unforeseen complications(!). Here are the server names/IPs:

cfile           - /data/jag, /data/jet, /data/joy
170.212.169.225 - /data/picsl, /data/grossman
170.212.169.49  - /data/tesla-data, /data/tesla-home
crich           - /data/jux
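
As a rough sketch, the table above can be turned into a small shell helper that picks the server for a given data path. The function name server_for and the fallback to chead for unlisted paths are assumptions made for this illustration, not part of the cluster setup.

```shell
#!/bin/sh
# Hypothetical helper: print the file server that hosts a given /data path,
# based on the table above. Unlisted paths fall back to chead (an assumption).
server_for() {
    # Fields 2-3 of /data/<name>/... give "data/<name>".
    top=$(printf '%s\n' "$1" | cut -d/ -f2-3)
    case "$top" in
        data/jag|data/jet|data/joy)       echo "cfile" ;;
        data/picsl|data/grossman)         echo "170.212.169.225" ;;
        data/tesla-data|data/tesla-home)  echo "170.212.169.49" ;;
        data/jux)                         echo "crich" ;;
        *)                                echo "chead" ;;
    esac
}

server_for /data/jet/mgstauff/destination   # prints: cfile
```

You could then write the destination as <yourusername>@$(server_for /data/jet/mgstauff/destination):/data/jet/mgstauff/destination in the transfer commands below.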

The Linux/Unix/Mac OSX scp command is not recommended for larger data transfers, because it does not verify the data integrity upon receipt at the destination. Note that TCP/IP packet-level checksums will be used for any internet traffic, but scp does not compute and compare a checksum on transferred data like rsync does. Generally, use rsync instead.


rsync - secure remote copy

Linux & Mac OSX ( & Windows )

To use rsync on Windows, see here.

This is a powerful command line program for copying files between computers on the network. The recommended command for transferring to the cluster is this:

rsync -prltD --chmod=Dug+rwx,Dg+s,Fug+rw,o-rwx <path-to-files-on-your-computer> <yourusername>@chead:</data/your-data-directory/sub-directory>

The options specified above tell rsync to recursively copy all files and directories that you specify in the command, to set file and directory permissions in a way that's appropriate for most cluster directories, and to preserve modification times.

Also, symlinks are copied as symlinks, meaning the directory or file to which the symlink points is not copied.

<path-to-files-on-your-computer> is the path to the files on your computer that you want to copy to chead. NOTE that you should NOT have a / slash at the end of the path, if you want the directory itself to be copied to your destination. If you do have a / slash at the end, only the contents of the directory will be copied, and not the directory itself. Here's the example of this from the rsync man page:

Each of the following commands copies the files in the same way,
including their setting of the attributes of /dest/foo:

  rsync -av /src/foo /dest
  rsync -av /src/foo/ /dest/foo

<yourusername>@chead tells rsync to log in on chead using your username.

:</data/your-data-directory/sub-directory> is the path to your data directory on chead, e.g. /data/jet/mgstauff/destination

Permissions and Ownership considerations

Typically, users simply use the -a option to rsync rather than the detailed options above. The -a option is an aggregate option, and among others it includes the -p option to preserve file permissions, and the -o and -g options to preserve file user and group ownership. This means the permissions, user and group from the source files will be copied to the destination directory. Sometimes this is what you want. Other times, you don't want it.

For example, if you have a data dir on your local machine that you're sync'ing to your cluster data dir whenever you acquire new data, you may want different permissions/ownership on the cluster. You may have files that haven't changed locally, but on the cluster you've changed their group ownership to allow other users to access them. When you next rsync from your local dir and use just the -a option, the group ownership on the cluster will revert to what's on your local machine. You may also want different group permissions on the cluster to facilitate sharing with other users.

To overcome this, you may use the options listed above, or some combination:

 rsync -prltD --chmod=Dug+rwx,Dg+s,Fug+rw,o-rwx 

The -a option is really an aggregate of these options: -rlptgoD. So above, I've passed all of those manually except -o and -g. This means that ownership on the cluster will be preserved for any files that already exist there. Also, if you're copying into a directory that uses the group setgid bit (sometimes loosely called the group 'sticky bit') to make all new files be owned by the directory's group (as we do for group data directories), then new files will get the appropriate group on the cluster.

The --chmod option tells rsync to make certain permission changes to files and directories that are copied to the destination. In this example, directories will get rwx (full) permissions for user and group and get their setgid bit set for easier group sharing; files will get rw permissions for user and group; and everyone else gets no permissions on either.

Robust in the event of interruptions and file changes

If your rsync command gets interrupted, simply run it again and it will intelligently only copy files that haven't been copied yet, or that have been changed since the first copy. This means that if you have a large directory to copy, you can start the copy and continue to work in the directory. Then after the copy is finished, stop your work, run the rsync command again, and only the changes you made will get copied. If you have deleted files locally during this process, you'll want to add the --delete option to rsync if you want those files to also be deleted at the destination.

Verifying transferred files

You can do a checksum verification on the transferred data on disk at the destination. After your full rsync process has finished, run it again but add the -c option. This time, when rsync sees that the same filename exists at the destination and has the same size as the original, it will compute a checksum of the file on disk at the destination and compare it to the original file. If there's a difference it will re-transfer the original. Note that otherwise, rsync does a checksum comparison only of the data transferred across the network, but not of the data on disk. Adding this option to your first rsync transfer will not do a checksum comparison with the data after it's written to disk. It only works if the data is already on disk at the destination.


sftp - SSH file transfer protocol

All platforms

This is a secure file transfer protocol that runs over SSH. It fills the same role as the venerable FTP.

Use any FTP client that supports SFTP, like the popular free program FileZilla.

Be sure to choose the SFTP protocol during setup. The FTP protocol won't work.

Linux & Mac OSX ( & Windows with Cygwin or Windows Subsystem For Linux )

You can also run sftp from the terminal command line for a text-based version.
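
As a rough sketch of a terminal session: after connecting with sftp <yourusername>@chead you can type commands such as cd, put, get, ls and bye at the sftp> prompt. The same commands can also be placed in a batch file and passed with the -b option. The script below only writes such a batch file and prints the command to run, so it works without cluster access. The username, directory and file name are placeholders, and note that batch mode does not prompt for a password, so this assumes key-based login, which may not be set up for your account.

```shell
#!/bin/sh
# Hypothetical values: substitute your own username, directory and file name.
user="yourusername"
host="chead"
remote_dir="/data/your-data-directory/sub-directory"
local_file="results.tar.gz"

# Write the sftp commands to a batch file.
batch=$(mktemp)
cat > "$batch" <<EOF
cd $remote_dir
put $local_file
ls -l
bye
EOF

echo "sftp commands:"
cat "$batch"

# Printed rather than executed, so the sketch runs without cluster access.
echo "run with: sftp -b $batch $user@$host"
```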

To use sftp from the terminal on Windows, try here.


SSHFS - Directly connect/mount your data directory

All Platforms

Another option is to directly connect to your CfN data directory by mounting it on your desktop. You may be familiar with this using the NFS protocol or Windows Share. For better security and simplicity we now use the SSHFS protocol. This connects using ssh and your cluster username and password, and mimics a regular filesystem on top of that.

  • TIP - You can change the encryption type for most SSHFS setups. Choose a lighter-weight encryption for faster transfers. I don't have any specifics on this now, but googling should get some quick answers. I will try to get some more details in the future.

A helpful page for all platforms: https://www.digitalocean.com/community/tutorials/how-to-use-sshfs-to-mount-remote-file-systems-over-ssh

Mac OSX

FUSE/SSHFS

You'll need FUSE for OSX and then the SSHFS package: http://osxfuse.github.io https://github.com/osxfuse/osxfuse/wiki/SSHFS

This guide is very helpful: https://blogs.law.harvard.edu/acts/2013/11/08/the-newbie-how-to-set-up-sshfs-on-mac-os-x/

The mounted drive will show up in the Finder.

  • TIP - If you have trouble mounting to your desired local directory location (e.g. if you try to mount onto a local dir named /data/jet/mydirname), it may be because you don't own or have write permission on the local directory. The best fix is to give yourself ownership of the local directory (mount point). Otherwise you can use the allow_other option to sshfs to make it work, but this may cause other issues.

ExpanDrive

There is a commercial product that may be easier to use, though I haven't tried it: ExpanDrive

There's an educational discount code for $18 off: EDUCATION913

Windows

win-sshfs

It looks like the win-sshfs project on google code is the way to go for free software.

WARNING from the win-sshfs project page: Due the nature of symlink mapping(Windows is unaware of it) the deletion of symlink that points to directory will result in deleting of source directories content.

These sites have some more help: http://igikorn.com/sshfs-windows-8/ https://www.digitalocean.com/community/tutorials/how-to-use-sshfs-to-mount-remote-file-systems-over-ssh

ExpanDrive

There is a commercial product that may be easier to use, though I haven't tried it: ExpanDrive

There's an educational discount code for $18 off: EDUCATION913

Linux

You'll need to install the sshfs package.

This site has some more info: https://www.digitalocean.com/community/tutorials/how-to-use-sshfs-to-mount-remote-file-systems-over-ssh
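
Once sshfs is installed, mounting and unmounting from the Linux command line looks roughly like the sketch below. The username, remote directory and local mount point are placeholders, and the run wrapper is a made-up helper that only prints each command unless you set DRY_RUN=0, so the script is safe to try as-is. On Mac OSX the unmount step would use umount instead of fusermount -u.

```shell
#!/bin/sh
# Hypothetical values: replace with your own username, data directory and mount point.
user="yourusername"
host="chead"
remote_dir="/data/your-data-directory"
mount_point="$HOME/cfn-data"

# Dry-run wrapper: prints each command by default; set DRY_RUN=0 to really run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create the local mount point, then mount the remote directory over ssh.
mount_data() {
    run mkdir -p "$mount_point" &&
    run sshfs "$user@$host:$remote_dir" "$mount_point"
}

# Unmount when you are finished (Linux).
unmount_data() {
    run fusermount -u "$mount_point"
}

mount_data
```

Call unmount_data later, when you no longer need the mounted directory.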

fileaccess.txt · Last modified: 2018/03/26 14:57 by mgstauff