SSH Tunneling

From SciNet Users Documentation

What is SSH tunneling?

SSH tunneling is a method of using a gateway computer to connect two computers that cannot connect to each other directly.

SSH tunneling is necessary in certain cases because compute nodes on Niagara do not have direct access to the internet, nor can they be contacted directly from the internet.

The following use cases require SSH tunnels:

  1. Running commercial software on a compute node that needs to contact a license server over the internet.
  2. Running visualization software on a compute node that needs to be contacted by client software on a user's local computer.
  3. Running a Jupyter notebook on a compute node that needs to be contacted by the web browser on a user's local computer.
  4. Connecting to the Cedar database server from somewhere other than the Cedar head node, e.g., your desktop.

In the first case, the license server is located outside the compute cluster and is rarely under the user's control, whereas in the other cases the server runs on the compute node and the challenge is to connect to it from outside. We therefore consider these two kinds of cases separately.

Contacting a license server from a compute node

Certain commercially licensed programs must connect to a license server machine somewhere on the internet via a predetermined port. If the compute node where the program runs has no internet access, a gateway server that does have access must forward communications on that port from the compute node to the license server. This requires setting up an SSH tunnel, an arrangement also called port forwarding.

In most cases, creating an SSH tunnel in a batch job requires just two or three commands in your job script. You will need the following information:

  1. The IP address, or the name, of the license server. Let's call this LICSERVER.
  2. The port number of the license service. Let's call this LICPORT.

You should obtain this information from whoever maintains the license server. That server must also allow connections from the login nodes; for Niagara, the outgoing IP address will be either 142.1.174.227 or 142.1.174.228.

With this information, you can now set up the SSH tunnel.

The gateway server on Niagara is called nia-gw.

You also need to choose a port number on the compute node to use for the tunnel. Let's call it COMPUTEPORT.
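A minimal sketch, assuming the job script runs under bash, of sanity-checking a candidate COMPUTEPORT before using it. The function name port_is_usable is hypothetical; the 1024 to 65535 range reflects the usual Linux rule that non-root users cannot bind privileged ports; and the in-use test relies on bash's /dev/tcp pseudo-device, so it only detects ports that already accept connections on this node.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if PORT looks safe to use as COMPUTEPORT.
port_is_usable() {
  local port=$1
  # Plain decimal number without leading zeros, at most five digits.
  [[ $port =~ ^[1-9][0-9]{0,4}$ ]] || return 1
  # Non-root users can normally bind only unprivileged ports (1024-65535).
  (( port >= 1024 && port <= 65535 )) || return 1
  # If a connection to 127.0.0.1:PORT succeeds, something already listens there.
  if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    return 1
  fi
  return 0
}
```

In a job script one might call `port_is_usable 9999 || exit 1` before opening the tunnel, so that the job stops early instead of failing later with a confusing license error.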

The ssh command to issue in the job script is then:

ssh nia-gw -L COMPUTEPORT:LICSERVER:LICPORT -n -N -f

In this command, the string following the -L parameter specifies the port forwarding: connections to port COMPUTEPORT on the compute node are forwarded through nia-gw to port LICPORT on LICSERVER. The parameter -n prevents ssh from reading standard input (it could not do so in a compute job anyway), -N tells ssh not to open a shell or run a command on the gateway, and -f tells ssh to go to the background, allowing the job script to proceed past this ssh command.
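The same command can be wrapped in a small bash function so that the three values are set in one place. This is a sketch only: start_license_tunnel and the DRY_RUN switch are hypothetical names introduced here, and the extra option -o ExitOnForwardFailure=yes (a standard OpenSSH option that is not part of the command above) makes ssh exit with an error, rather than linger in the background without a working tunnel, if the -L forwarding cannot be set up.

```shell
#!/bin/bash
# Hypothetical helper: open the license-server tunnel from inside a job script.
# Usage: start_license_tunnel COMPUTEPORT LICSERVER LICPORT [GATEWAY]
start_license_tunnel() {
  local computeport=$1 licserver=$2 licport=$3 gateway=${4:-nia-gw}
  local -a cmd
  # -o ExitOnForwardFailure=yes : exit if the -L forwarding cannot be set up
  # -L A:B:C : connections to local port A go via the gateway to host B, port C
  # -n : no stdin, -N : no remote shell or command, -f : go to the background
  cmd=(ssh -o ExitOnForwardFailure=yes
       -L "${computeport}:${licserver}:${licport}" -n -N -f "$gateway")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    # Only print the command, e.g. to check a job script before submitting it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `start_license_tunnel 9999 licenseserver.institution.ca 9999` sets up the same forwarding as the example job script below, and prefixing the call with `DRY_RUN=1` only prints the ssh command instead of running it.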

A further command in the job script should tell the software that the license server is on port COMPUTEPORT of the server 'localhost'. Here, 'localhost' is not a placeholder but the literal name to use; 'localhost' is the standard host name by which a computer refers to itself. Exactly how to inform your software to use this port on 'localhost' depends on the application and the type of license server, but often it is simply a matter of setting an environment variable in the job script, such as

export MLM_LICENSE_FILE=COMPUTEPORT@localhost
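One possible way to continue, assuming bash and that COMPUTEPORT has already been chosen: build the license variable from COMPUTEPORT instead of hard-coding the number, and wait until the local end of the tunnel accepts connections before starting the software. MLM_LICENSE_FILE is only the example variable from above (other applications read other variables), and wait_for_port, the default port 9999 and the 10-try limit are hypothetical choices.

```shell
#!/bin/bash
# Assumed value; in a real job script COMPUTEPORT is set earlier.
COMPUTEPORT=${COMPUTEPORT:-9999}

# 'localhost' is literal: the ssh tunnel listens on the compute node itself.
export MLM_LICENSE_FILE="${COMPUTEPORT}@localhost"

# Hypothetical helper: return 0 once something accepts TCP connections on
# 127.0.0.1:PORT, retrying once per second up to TRIES times (bash /dev/tcp).
wait_for_port() {
  local port=$1 tries=${2:-10} i
  for (( i = 0; i < tries; i++ )); do
    if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

A job script could then use `wait_for_port "$COMPUTEPORT" || { echo "license tunnel is not up" >&2; exit 1; }` just before launching the licensed program.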

Example job script

The following job script sets up an SSH tunnel to contact the license server licenseserver.institution.ca on port 9999:

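#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 40
#SBATCH --time 3:00:00

# Forward port 9999 on this compute node, via nia-gw, to port 9999 on the license server
ssh nia-gw -L 9999:licenseserver.institution.ca:9999 -n -N -f
# Tell the software to look for its license server at the local end of the tunnel
export MLM_LICENSE_FILE=9999@localhost

module load thesoftware/2.0
mpirun thesoftware .....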

Contacting a visualization, JupyterHub, database or other server running on a compute node

SSH tunneling can also be used in the context of the Alliance (formerly Compute Canada) to let a user's computer connect to a compute node on a cluster through an encrypted tunnel routed via the cluster's login node. This technique allows the graphical output of applications such as a Jupyter notebook or visualization software to be displayed transparently on the user's local workstation, even while the applications run on a compute node of the cluster. When connecting to a database server that is reachable only through the head node, an SSH tunnel can forward an arbitrary local port through the head node of the cluster to the database server.
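As an illustration of the client side, the command run on the user's own computer usually has the form ssh -N -L LOCALPORT:TARGET:TARGETPORT user@loginhost, where TARGET is resolved on the login host. The sketch below only prints such commands. The helper local_tunnel_cmd, the user name someuser, the compute node name nia0123 and DBSERVER are hypothetical placeholders, the login host names should be checked against the cluster documentation, and the Jupyter case assumes the notebook listens on the compute node's network interface rather than only on 127.0.0.1.

```shell
#!/bin/bash
# Hypothetical helper: print the ssh command to run on your LOCAL computer.
# Usage: local_tunnel_cmd LOCALPORT TARGET TARGETPORT USER@LOGINHOST
local_tunnel_cmd() {
  local localport=$1 target=$2 targetport=$3 login=$4
  # TARGET is looked up on the login host, so use the name the cluster knows.
  printf 'ssh -N -L %s:%s:%s %s\n' "$localport" "$target" "$targetport" "$login"
}

# Jupyter notebook on port 8888 of compute node nia0123 (placeholder names):
local_tunnel_cmd 8888 nia0123 8888 someuser@niagara.scinet.utoronto.ca
# PostgreSQL (default port 5432) on a database server reachable from Cedar:
local_tunnel_cmd 5432 DBSERVER 5432 someuser@cedar.computecanada.ca
```

While the printed command runs on the local computer, a browser pointed at http://localhost:8888 (or a database client pointed at localhost, port 5432) reaches the remote server through the tunnel.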