TCSC Narvi cluster’s new user documentation!

Narvi is Tampere University’s HPC Linux cluster. It serves all researchers of the University. It is almost identical to the other FGCI clusters, and it is also very similar to the large CSC clusters; of course, those are larger, but Narvi may be easier to start with because it is more integrated into the local environment.

If you are here for the FGCI kickstart, the link to the local differences is here.

Cluster overview

Shared resource

Narvi is a joint installation of TCSC member faculties. It is now available to all Tuni researchers.

As of 2016, Narvi is part of FGCI - Finnish Grid and Cloud Infrastructure (the successor of the Finnish Grid Infrastructure, FGI). Through the national grid and cloud infrastructure, Narvi is also part of the European Grid Infrastructure.

Hardware

Frontends: narvi.cc.tut.fi and narvi-shell.cc.tut.fi

  • 24 cores E5-2620
  • 128GB memory

Compute nodes:

  • 25 HP SL230 nodes with 16 cores E5-2670 and 128GB memory (me55-me102)
  • 80 Dell M630 nodes with 24 cores E5-2680 v3 and 64GB memory (me151-280)
  • 40 Dell C6320 nodes with 24 cores E5-2680 v3 and 256GB memory (na01-na40)
  • 20 Dell C6420 nodes with 40 cores Xeon Gold 6148 with at least 400GB memory (na41-na60)
  • 8 Dell C6420 nodes with 40 cores Xeon Gold 6248 with at least 400GB memory (na61-na68)
  • 3 Dell C6525 nodes with 64 cores AMD EPYC 7502 with 512GB memory (na69-na72)
  • 18 Dell C4130 and C4140 GPU nodes with 4 NVIDIA Tesla P100 or V100 GPUs, 12-32GB memory per GPU (nag02-nag19)
  • 1 Dell R740 with 3 Quadro RTX 8000 with 45GB per GPU (nag20)

Storage:

  • ~90TB /home partition
  • 500TB /lustre partition for computation data

All compute nodes are identical with respect to software and access to the common file system, except that CUDA drivers are installed only on the GPU nodes. Each node has its own unique hostname and IP address.

Networking

The cluster has two internal networks: Infiniband for MPI and file I/O, and Gigabit Ethernet for everything else, such as batch-system communication and ssh.

The internal networks are inaccessible from the outside. Only the login nodes have an extra Ethernet connection to the outside.

Software

The cluster is running open source software infrastructure: CentOS 7, with SLURM as the scheduler and batch system.

User Accounts

Application

  1. Go to page https://id.tuni.fi
  2. Choose “Identity management” → “Manage service entitlements” → “Apply for a new entitlement”
  3. Choose correct contract (if many)

Note

If you fill in the application as a student, please also give the course name and the responsible teacher.

  4. Scroll down and choose “Linux Servers (LINUX-SERVERS) TCSC HPC Cluster”
  5. Fill in the necessary information and press “Submit”

Please note that since your application needs confirmation from your manager (this does not apply to students), it may take some time before we receive it for account creation.

You will get two emails after this:

  • the first one when your application is approved, and
  • the second when your user account is created. This email also includes a request for you to create an SSH keypair.

Creating SSH-keypair

New Narvi users should create a password-protected SSH key pair (ed25519 or RSA, with a minimum key length of 2048 bits for RSA), and email the public key (NEVER send your private key) to TCSC, preferably as an attachment. Instructions on how to create the key on Linux and on Windows can be found in the sections below.

So you don’t have a password on the cluster at all. The passphrase of the private key is a kind of replacement for an ssh password, as it is needed to be able to use the private key and thus access the cluster. You can create several keys and add several public keys to your authorized_keys file, or you can copy your private key to several machines if you need to access the cluster from more than one machine. If you work on multiple, random machines, it might be easier to carry your private key with you, e.g. on a USB stick.

Key generation on Linux (openssh)

The command

ssh-keygen -t ed25519 -f ~/.ssh/${USER}_narvi_key

creates a key pair (keyname & keyname.pub) in the .ssh directory. It will ask for a new passphrase for your private key.

Note

TUNI Linux: on a maintained TUNI Red Hat Linux machine, you need to create the ssh keypair in the ~/.ssh directory, because only then will the SELinux context be correct.

If you have a working MTA configuration (rare nowadays), you can send the public key to us with the command:

mailx -a ~/.ssh/${USER}_narvi_key.pub -s "Narvi SSH key (${USER})" tcsc.tau@tuni.fi

Using the ssh key on Linux

When using a key that doesn’t have the default name, you need to specify which key to use, e.g.:

ssh -i ~/.ssh/${USER}_narvi_key your_tuni_username@narvi.tut.fi

(note that the username is NOT your email address!)

or you can add it to your ~/.ssh/config file like this:

Host narvi*.tut.fi
    IdentityFile    ~/.ssh/${USER}_narvi_key

Key generation on Windows

PuTTY is the preferred client and should be available on TUT intra machines by default.

Putty(gen)
  1. Start puttygen.
  2. Check at the bottom that the key type is ED25519 (older versions of PuTTY don’t support ed25519 keys, so use e.g. a 4096-bit RSA key instead).
  3. Click Generate and move the mouse around in the PuTTYgen window to give it the randomness it needs.
  4. Enter a passphrase for the key twice, and note the location where you save the key.
  5. Copy the public key in OpenSSH format from the box at the top and save it to a file, e.g. narvi.pub.
  6. Mail that .pub file as an email attachment to tcsc.tau@tuni.fi with a subject like “Narvi SSH Key (yourusername)”.

Windows 10 OpenSSH

If you want to use Windows 10’s “native” OpenSSH, follow the Linux instructions above.

Warning

Never send or give your private key to anyone!!!

Connecting to Narvi

All access to Narvi is via Secure Shell (ssh).

You can connect to narvi.tut.fi from anywhere with your private ssh key.

Note

Are you here for a SciComp KickStart course? You just need to make sure you have an account and then be able to connect via ssh (the first section here); you don’t need to worry about the graphical application parts. Everything else will be discussed during the course.

Note

Narvi uses TUNI accounts, but since we don’t use passwords, the account needs to be activated first and you need to provide your public ssh key as instructed in the accounts section.

See also

The shell crash course is a kind of prerequisite to this material.

Connecting via ssh

Linux

All Linux distributions come with an ssh client, so you don’t need to do anything. To use graphical applications, use the standard -X option; nothing extra is needed:

ssh narvi.tut.fi
# OR, if your username is different from local machine:
ssh username@narvi.tut.fi

Mac

ssh is installed by default, the same as on Linux. Run it from a terminal, with the same command as on Linux. To run graphical applications, you need to install an X server (XQuartz).

Windows

You need to install an ssh client yourself: PuTTY is the standard one. If you want to run graphical programs, you need an X server on Windows: see this link for some hints. (Side note: putty dot org is an advertisement site trying to get you to install something else.)

You should configure this with the hostname, username, and save the settings so that you can connect quickly.

Newer Windows 10 builds also include a native OpenSSH client, so most of the Linux example commands (e.g. ssh-keygen and ssh) work out of the box. If you want even better Linux compatibility, you can install WSL (Windows Subsystem for Linux), which provides just about everything a normal Linux shell does, apart from kernel services.

Advanced options

See the advanced ssh information to learn how to log in without a password, automatically save your username and more. It really will save you time.

ssh is one of the most fundamental Linux programs: by using it well, you can really do almost anything from anywhere. The .ssh/config file is valuable to set up. If ssh is annoying to use, ask for some help in getting it working well.
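
As a concrete starting point, a ~/.ssh/config entry could look roughly like this (a minimal sketch; the alias, username, and key file name are placeholders you should adapt):

Host narvi
    HostName narvi.tut.fi
    User your_tuni_username
    IdentityFile ~/.ssh/your_tuni_username_narvi_key

After this, a plain ssh narvi connects with the right username and key.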

Exercise

  1. Connect to Narvi. List your home directory and work directory $WRKDIR.
  2. Check the uptime and load of the login node: uptime and htop (q to quit). What else can you learn about the node?
  3. Check what your default shell is: echo $SHELL. Go ahead and change your shell to bash if it’s not yet (see below).

What’s next?

The next tutorial is about software and modules.

Transferring files to and from the cluster

There are a few ways to transfer data in and out:

  • Directly with sftp/scp
  • mounting a directory from the cluster on your workstation with sshfs
  • accessing your TUNI intra directory directly from the cluster

From a Linux machine

SCP/SFTP

You can copy your files to the Narvi cluster with the scp command, which does not need any mounts.

For example, to copy one file from localhost to Narvi:

scp yourfile narvi:/home/${USER}

To copy a directory from localhost to Narvi:

scp -r yourdirectory narvi:/home/${USER}

You can also use this in the other direction, or between any other two hosts.

For more information:

man scp

SSHFS

You can mount your home directory from the Narvi cluster onto your local host with the sshfs command.

First, create a directory where you will mount your Narvi home:

mkdir ${HOME}/narvi-home

Then use sshfs to mount your Narvi home in the directory you just created.

This example assumes that your home on Narvi is /home/$USER; if it is not, change it to the correct path:

sshfs ${USER}@narvi.tut.fi:/home/${USER} ${HOME}/narvi-home

When the file transfers are done and you no longer need the mount, unmount narvi-home from your local machine with:

fusermount -u ${HOME}/narvi-home

For more information:

man sshfs
man fusermount

From a Windows machine

Moving files between your Windows machine and the cluster is probably easiest with WinSCP, after you add your private key to it.

Alternatively, you can keep your files in your INTRA home directory and access them directly on the cluster frontend:

TUNI home&group -directories

You can access your TUNI home and group directories directly on the frontend nodes (narvi and narvi-shell):

The /tuni/groups directory contains all the group directories you have access to, but you’ll have to know the specific directory’s name. Until you have visited a specific group directory, /tuni/groups will appear empty.
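
For example, if your group directory were called some-group (a placeholder name), you would go to it directly by name:

cd /tuni/groups/some-group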

Slurm

The FGCI clusters use Slurm as the batch-queue system.

There is a lot of excellent documentation elsewhere, e.g. Slurm’s own.

For more detailed info about our configuration, please refer to the configuration page.

  • Simple serial job
  • Parallel job
  • Interactive session
  • Notes about resource reservations

Below are some basic examples of usage:

First you have to create a job script that describes the resources the job needs and does the actual execution.

Simple serial job
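
A minimal serial job script (serial-job.sh) might look roughly like this; the partition name and program are placeholders, so adapt them to your own case:

#!/bin/bash
#SBATCH --job-name=serial-test
#SBATCH --partition=normal        # placeholder partition name
#SBATCH --time=01:00:00           # hh:mm:ss
#SBATCH --mem=2G
#SBATCH --cpus-per-task=1
#SBATCH --output=serial-test.out

srun ./my_serial_program          # placeholder program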

Then you submit that script with the command:
sbatch serial-job.sh

Parallel job

For a parallel job, you just need to change the job’s resource allocation, as in the sketch below.
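
For example, an MPI job spanning two nodes could be sketched roughly like this; the partition, module, and program names are placeholders:

#!/bin/bash
#SBATCH --job-name=parallel-test
#SBATCH --partition=normal        # placeholder partition name
#SBATCH --time=02:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --mem=60G                 # memory per node
#SBATCH --output=parallel-test.out

module load openmpi               # placeholder module name
srun ./my_mpi_program             # placeholder MPI binary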

Interactive session

An interactive shell session on a compute node can be requested with:

srun --pty -J "Bash session" --partition=test --mem=10000 --time=4:0:0 /bin/bash -i

Notes about resource reservations

As the compute node’s operating system also needs some memory to run, some of the memory isn’t allocatable by Slurm. This means that if you request e.g. 128G of memory, in practice you’ll have to wait for a node with 256G to become available. It may therefore be better to request only 126G if you don’t absolutely need 128G. To show the memory available for Slurm jobs on the nodes:

scontrol show node |grep RealMemory

Data storage

In this tutorial, we go over places to store data on Narvi and how to access it remotely.

Optimizing data storage isn’t very glamorous, but is an important part of high-performance computing.

Basics

Narvi has various ways to store data. Each has a purpose, and when you are dealing with large data sets or intensive I/O, efficiency becomes important.

Roughly, we have home directories (only for configuration files), the large Lustre filesystem (scratch and work: large, primary calculation data), and special places for scratch space during computations (local disks). At TUNI, there are also TUNI home and TUNI project directories which, unlike Narvi’s storage, are backed up, but they don’t scale to the size of Narvi.

A file consists of its contents and metadata. The metadata is information like user, group, timestamps, permissions. To view metadata, use ls -l or stat.
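
For example (the file name is a placeholder):

ls -l myfile.txt     # owner, group, permissions, size, modification time
stat myfile.txt      # full metadata, including access/change timestamps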

Filesystem performance can be measured by both IOPS (input-output operations per second) and stream I/O speed. /usr/bin/time -v can give you some hints here. You can see the profiling page for more information.
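
For instance, you can wrap a command to get a rough I/O profile (the program name is a placeholder):

/usr/bin/time -v ./my_program     # reports page faults, file system inputs/outputs, peak memory, etc.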

Think about I/O before you start! - General notes

When people think of computer speed, they usually think of CPU speed. But this is missing an important factor: How fast can data get to the CPU? In many cases, input/output (IO) is the true bottleneck and must be considered just as much as processor speed. In fact, modern computers and especially GPUs are so fast that it becomes very easy for a few GPUs with bad data access patterns to bring the cluster down for everyone.

The solution is similar to how you have to consider memory: there are different types of filesystems with different tradeoffs between speed, size, and performance, and you have to use the right one for the right job. Often you have to use several in tandem: for example, store the original data on archive, put your working copy on scratch, and maybe even make a per-calculation copy on local disks.

The following factors are useful to consider:

  • How much I/O are you doing in the first place? Do you continually re-read the same data?
  • What’s the pattern of your I/O and which filesystem is best for it? If you read all at once, scratch is fine. But if there are many small files or random access, local disks may help.
  • Do you write log files/checkpoints more often than is needed?
  • Some programs can use local disk as swap space. Only turn this on if you know it is reasonable.

There’s a checklist in the storage details page.

Avoid many small files! Use a few big ones instead. (we have a dedicated page on the matter)
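
A common pattern is to pack a directory of many small files into one archive before putting it on Lustre, and unpack it only on a node’s local disk (all names here are placeholders):

tar czf dataset.tar.gz many_small_files/
tar xzf dataset.tar.gz -C /path/to/local/disk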

Available data storage options

Home directories

The place where you start when you log in. The home directory should be used for init files, small config files, etc. It is, however, not suitable for storing calculation data; you usually want to use scratch instead. Home directories are backed up daily.

scratch and work: Lustre

Scratch is the big, high-performance, 2PB Triton storage. It is the primary place for calculations, data analysis, etc. It is not backed up, but it is reliable against hardware failures (RAID6, redundant servers); it is not safe against human error. It is shared by all nodes and has very fast access. It is divided into two parts: scratch (by group) and work (per user). In general, always change to $WRKDIR or a group scratch directory when you first log in and start doing work.

Lustre separates metadata and contents onto separate object and metadata servers. This allows fast access to large files, but induces a larger overhead than normal filesystems. See our small files page for more information.
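
If you are curious how a particular file is laid out across the Lustre object storage targets, you can inspect its striping (the path is a placeholder):

lfs getstripe /scratch/work/USERNAME/somefile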

See ../usage/lustre

Local disks

Local disks are on each node separately. They are used for the fastest I/O with single-node jobs and are cleaned up after the job has finished. Since 2019, things have gotten a bit more complicated, because our newest (skl) nodes don’t have local disks. If you want to ensure you have local storage, submit your job with --gres=spindle.
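
In practice that means adding the flag to your job script or directly to the sbatch command line; only the --gres=spindle flag comes from the text above, and the script name is a placeholder:

#SBATCH --gres=spindle
# or, equivalently:
# sbatch --gres=spindle my_job.sh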

See the Compute node local drives page for further details and script examples.

ramfs - fast and highly temporary storage

On login nodes only, $XDG_RUNTIME_DIR is a ramfs, which means that it looks like files but is stored only in memory. Because of this, it is extremely fast, but it has no persistence whatsoever. Use it if you have to make small temporary files that don’t need to last long. Note that this is no different from just holding the data in memory; if you can hold the data in memory directly, that’s better.
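
A rough example of the kind of short-lived use it is meant for (the file names and commands are placeholders):

mkdir "$XDG_RUNTIME_DIR/tmpwork"
sort big_list.txt > "$XDG_RUNTIME_DIR/tmpwork/sorted.txt"
rm -r "$XDG_RUNTIME_DIR/tmpwork"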

Quotas

All directories under /scratch (as well as /home) have quotas. Two quotas are set per-filesystem: disk space and file number.

Disk quota and current usage are printed with the command quota. ‘space’ is the disk-space limit and ‘files’ the limit on the total number of files. There is a separate quota for each group the user is a member of.

$ quota
User quotas for darstr1
     Filesystem   space   quota   limit   grace   files   quota   limit   grace
/home              484M    977M   1075M           10264       0       0
/scratch          3237G    200G    210G       -    158M      1M      1M       -

Group quotas
Filesystem   group                  space   quota   limit   grace   files   quota   limit   grace
/scratch     domain users            132G     10M     10M       -    310M    5000    5000       -
/scratch     some-group              534G    524G    524G       -    7534   1000M   1000M       -
/scratch     other-group              16T     20T     20T       -   1088M      5M      5M       -

If you get a quota error, see the quotas page for a solution.

Accessing and transferring files remotely

Transferring files to/from Triton is exactly the same as any other remote Linux server.

Remote mounting using SMB

By far, remote mounting of files is the easiest method to transfer files. If you are not on the Aalto networks (wired, eduroam, or the aalto network with an Aalto-managed laptop), connect to the Aalto VPN first. Note that this is automatically done on some department workstations (see below) - if not, request it!

The scratch filesystem can be remote mounted using SMB inside secure Aalto networks at the URLs

  • scratch: smb://data.triton.aalto.fi/scratch/.
  • work: smb://data.triton.aalto.fi/work/$username/.

On different operating systems:

  • Linux (Ubuntu for example): File manager (Nautilus) → File → Connect to server. Use the smb:// URLs above.
  • Windows: In the file manager, go to Computer (in the menu bar on top, at least in Windows 10) and choose “Map Network Drive”. In Windows 10: “This PC” → right click → “Map Network Drive”. (Note that this is different from right-clicking “Add network location”, which just makes a folder link and has had some problems in the past.) Use the URLs above but replace smb:// with \\ and / with \. For example, \\data.triton.aalto.fi\scratch\.
  • Mac: Finder → Go → Connect to Server. Use the smb:// URLs above.

Depending on your OS, you may need to use either your username directly or AALTO\username.

Remote mounting using sshfs

sshfs is a neat program that lets you mount remote filesystems via ssh only. It is well supported on Linux, and somewhat on other operating systems. Its real advantage is that you can mount any remote ssh server - it doesn’t have to be specially set up for SMB or any other type of mounting. On Ubuntu, you can mount by “File → Connect to server” and using sftp://triton.aalto.fi/scratch/work/USERNAME.

The example below uses command-line programs to do the same, and makes the triton_work directory on your local computer access all files in /scratch/work/USERNAME. The same can be done with other folders:

mkdir triton_work
sshfs USERNAME@triton.aalto.fi:/scratch/work/USERNAME triton_work

Note that ssh underlies many ways of accessing Triton, with similar syntax and options. ssh is a very important program that ties together all types of remote access, and learning to use it well will help you for a long time.

Using sftp

The SFTP protocol uses ssh to transfer files. On Linux and Mac, the sftp command-line program is the most fundamental way to do this, and it is available everywhere.

A more user-friendly way of doing this (with a nice GUI) is the Filezilla program. Make sure you are using Aalto VPN, then you can put triton.aalto.fi as SFTP server with port 22.

Below is an example of the “raw” SFTP usage:

# Copying from HOME to local PC
user@pc123 $ sftp user12@triton.aalto.fi:filename
Connected to triton.aalto.fi.
Fetching /home/user12/filename to filename
# copying to HOME
user@pc123 $ sftp -b - user12@triton <<< 'put testCluster.m'
sftp> put foo
# copying to WRKDIR
user@pc123 $ sftp -b - user12@triton:/scratch/work/USERNAME/ <<< 'put testCluster.m'
...

With all modern operating systems it is also possible to just open your OS file manager (e.g. Nautilus on Linux) and put this address in the address bar:

sftp://triton.aalto.fi

If you are connecting remotely and cannot use the VPN, you can connect instead to department machines like kosh.aalto.fi, taltta.aalto.fi, amor.org.aalto.fi (for NBE). The port is 22. Note: If you do not see your shared folder, you need to manually specify the full path (i.e. the folder is there, just not yet visible).

Using rsync

Rsync is similar to sftp, but it is smarter at restarting interrupted transfers. Use rsync for large file transfers. rsync actually uses the ssh protocol, so you can rsync from anywhere you can ssh from. rsync is installed by default on Linux and Mac terminals. On Windows machines we recommend using Git Bash.

While there are better places on the internet to read about rsync, it is good to try it out by synchronising a local folder with your scratch folder on Triton. Sometimes the issue with copying files is related to group permissions. This command takes care of permissions and makes sure that all your local files are identical (= same MD5 fingerprint) to your remote files:

rsync -avzc -e "ssh" --chmod=g+s,g+rw --group=GROUPNAME PATHTOLOCALFOLDER USERNAME@triton.aalto.fi:/scratch/DEPT/PROJECTNAME/REMOTEFOLDER/

Replace the bits in CAPS with your own case. Briefly, -a tries to preserve all attributes of the file, -v increases verbosity to see what rsync is doing, -z uses compression, -c skips files that have identical MD5 checksum, -e specifies to use ssh (not necessary but needed for the commands coming after), --chmod sets the group permissions to shared (as common practice on scratch project folders), and --group sets the groupname to the group you belong to (note that GROUPNAME == PROJECTNAME on our scratch filesystem).

If you want to just check whether your local files differ from the remote ones, you can run rsync in “dry run” mode, so that you only see what the command would do, without actually doing anything:

rsync --dry-run -avzc ...

Sometimes you want to copy only certain files, e.g. go through all folders and consider only files ending with .py:

rsync -avzc --include '*/' --include '*.py' --exclude '*' ...

Sometimes you want to copy only files under a certain size (e.g. 100MB):

rsync -avzc --max-size=100m ...

Rsync does NOT delete files by default, i.e. if you delete a file from the local folder, the remote file will not be deleted automatically, unless you specify the --delete option.

Please note that when working with files containing code or plain text, git is a better option for synchronising your local folder with your remote one: not only will it keep the two folders in sync, but you will also gain version control, so that you can revert to previous versions of your code or of txt/csv files.

Accessing files from Department workstations

This varies per department, with some strategies that work from everywhere.

These mounts that are already on workstations require a valid Kerberos ticket (usually generated when you log in). On long sessions these might expire, and you have to renew them with kinit to keep going.
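
Renewing and checking the ticket typically looks like this:

kinit      # prompts for your password and renews the Kerberos ticket
klist      # lists your current tickets and their expiry times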

Generic

The staff shell server taltta.aalto.fi has scratch and work mounted at /m/triton, and department directories are also in the standard paths /m/{cs,nbe}/{scratch,work}/.

NBE

Work directories are available at /m/nbe/work and group scratch directories at /m/nbe/scratch/$project/.

PHYS

Directories available on demand through SSHFS. See the Data transferring page at PHYS Intranet (accessible by PHYS users only).

CS

Work directories are available at /m/cs/work/, and group scratch directories at /m/cs/scratch/$project/.

Exercises

strace is a command which tracks system calls, basically the number of times the operating system has to do something. It can be used as a rudimentary way to see how much I/O load there is.

  1. Use strace -c to compare the number of system calls in ls, ls -l, ls --no-color, and ls --color. You can use the directory /scratch/scip/lustre_2017/many-files/ as a place with many files in it. How many system calls per file were there for each option?
  2. Using strace -c, compare the times of find and lfs find on the directory mentioned above. Why is it different?
  3. (Advanced, requires slurm knowledge from future tutorials) You will find some sample files in /scratch/scip/hpc-examples/io. Create a temporary directory and…
    1. Run create_iodata.sh to make some data files in data/
    2. Compare the IO operations of find and lfs find on this directory.
    3. use the iotest.sh script to do some basic analysis. How long does it take? Submit it as a slurm batch job.
    4. Modify the iotest.sh script to copy the data/ directory to local storage, do the operations, then remove the data. Compare to previous strategy.
    5. Use tar to compress the data while it is on lustre. Unpack this tar archive to local storage, do the operations, then remove. Compare to previous strategies.
  4. Mount your work directory by SMB - and alternatively sftp or sshfs - and transfer a file to Triton. Note that you must be on eduroam, on the aalto network with an Aalto laptop, or connected to the Aalto VPN.
  5. (Advanced) If you have a Linux or Mac computer, study the rsync manual page and try to transfer a file.
  6. What do all of the following have in common?
    1. A job is submitted but fails with no output or messages.
    2. I can’t start a Jupyter server on jupyter.triton.
    3. Some files are randomly empty. Or the file had content, I tried to save it again, and now it’s empty!
    4. I can’t log in.
    5. I can log in with ssh, but ssh -X doesn’t work for graphical programs.
    6. I get an error message about corruption, such as InvalidArchiveError("Error with archive ... You probably need to delete and re-download or re-create this file.
    7. I can’t install my own Python/R/etc libraries.

What’s next?

See also

  • ../usage/lustre
  • ../usage/localstorage
  • ../usage/quotas
  • ../usage/smallfiles
  • If you are doing heavy I/O: ../usage/storage

The next tutorial is about interactive jobs.

Locally installed software

Software     Version(s)                                  Module              Description
Matlab       2016b, 2017b, 2019a, 2019b, 2020a, 2021b    matlab/r[version]   Multiple versions, with 2020a as default
R            3.4.0                                       Module              Compiled with Intel icc and MKL
Lumerical    2021                                        lumerical/2021      FDTD
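
To use one of the packages above, load its module on the cluster, for example (module names follow the table; check with module avail if unsure):

module avail                 # list all installed modules
module load matlab/r2020a    # load the default Matlab version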