S4H
Introduction
S4H (formerly SciNet4Health) is our secure computing environment pilot, which lets users run Trillium jobs on confidential data. The subsystem consists of a dedicated login node and a storage appliance, and is tightly integrated with Trillium. Security concerns are addressed by:
- Hardened access
- Encryption at rest
- Group isolation
- Data egress control (optional)
Usage of S4H is by request only. Access must be requested by a principal investigator (PI) on behalf of their group members (i.e. sponsored users on CCDB).
Policies
Each user is assigned one of three policies:
- Permissive: the user may connect to the login node using SSH from pre-approved source IP addresses, and has unrestricted internet access from the login node
- Restrictive: the user may connect to the login node using SSH from pre-approved source IP addresses, but internet access from the login node is restricted
- Prohibitive: the user may only connect to the login node using a remote desktop client program from pre-approved source IP addresses, and internet access from the login node is restricted
If you don't know which policy applies to you, ask your PI.
Login
Direct
Users with permission to connect directly to the login node (permissive and restrictive policies) should first make sure that they are able to log in to Trillium (i.e. they have uploaded an SSH public key to CCDB and set up second-factor authentication there). If access to Trillium is successful, use the same username and SSH key to log in to the following address:
s4h.scinet.utoronto.ca
You should be prompted for the second factor, as on Trillium.
The connection must be made from one of the IP addresses pre-approved by the PI for that user (e.g. a workstation or a jump host in your lab).
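A direct login can be sketched as follows. The username "alice" and the key path are hypothetical placeholders that mirror the example user on this page; substitute your own CCDB username and the key you use for Trillium. Setting DRY_RUN=1 only prints the command so it can be reviewed before connecting.

```shell
# Hypothetical values: replace with your own CCDB username and key file.
S4H_USER="alice"
S4H_KEY="$HOME/.ssh/ccdb_ed25519"
S4H_HOST="s4h.scinet.utoronto.ca"

# Log in to the S4H login node, or only print the command when DRY_RUN=1.
s4h_login() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'ssh -i %s %s@%s\n' "$S4H_KEY" "$S4H_USER" "$S4H_HOST"
    else
        ssh -i "$S4H_KEY" "$S4H_USER@$S4H_HOST"
    fi
}
```

Calling s4h_login from a pre-approved IP address should lead to the second-factor prompt. From any other address the connection is expected to be refused.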
Through the graphical gateway
Users with permission to connect through the graphical gateway should use an RDP-enabled remote desktop client and log in to the following address using their CCDB username and password:
s4h-ggw.scinet.utoronto.ca
It is recommended to set the display resolution to 1600×900 if the resolution is not picked up automatically by the program.
Additionally, a "pre-login" step has to be performed, in which an SSH agent is forwarded to the above address. The graphical gateway asks the user's SSH agent to perform the authentication, so the workstation from which the user is connecting must hold the private key, and the key must have been added to the agent. In a sense this is three-factor authentication: one needs the password, the SSH private key, and either a YubiKey or the Duo mobile app registered with CCDB. Here is an example of this process:
eval $(ssh-agent)
ssh-add /home/alice/.ssh/ccdb_ed25519
ssh -T -A alice@s4h-ggw.scinet.utoronto.ca
Note that the ssh command is not expected to open a shell. Instead, the remote desktop window should now prompt for the YubiKey passcode or a Duo mobile app push. Once that is completed as well, the SSH client prints _Login successful_ and quits.
The connection must be made from one of the IP addresses pre-approved by the PI for that user (e.g. a workstation or a jump host in your lab).
Storage
Directories
Trillium file systems are accessible via their usual paths but are read-only on S4H (to prevent accidentally saving sensitive data there). Instead, home, scratch, and project spaces are provided on alternative paths under /s4h (the encrypted storage appliance).
If the user "alice" belongs to the group "def-bob" on S4H, their home directory (which can be expanded from ~ or $HOME) is /s4h/def-bob/home/alice, and their scratch directory (which can be expanded from $SCRATCH) is /s4h/def-bob/scratch/alice. The project directory is /s4h/def-bob/project; users may create their own directories there as needed.
The environment variables $TRIHOME, $TRISCRATCH, and $TRIPROJECT expand to the corresponding file system paths for Trillium (as noted above, these are read-only on S4H).
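The directory layout described above can be summarized with a small helper. The function name s4h_dirs is hypothetical and exists only for illustration; it prints the expected paths for a given group and user.

```shell
# Hypothetical helper: print the S4H directories for a group and a user.
s4h_dirs() {
    group="$1"
    user="$2"
    printf 'home:    /s4h/%s/home/%s\n' "$group" "$user"
    printf 'scratch: /s4h/%s/scratch/%s\n' "$group" "$user"
    printf 'project: /s4h/%s/project\n' "$group"
}

# Example with the user and group used on this page.
s4h_dirs def-bob alice
```

On the login node, the first two paths should match what echo "$HOME" and echo "$SCRATCH" print for that user.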
Data transfer
Users under the permissive and restrictive policies should use an SSH-based program (such as scp or rsync) to transfer data directly to and from the S4H login node. There is no dedicated datamover for S4H.
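As a minimal sketch of such a transfer, the helpers below wrap rsync over SSH. The function names and the user "alice" are hypothetical, and the destination paths must be adapted to your own group and username. With DRY_RUN=1 the command is printed instead of executed.

```shell
# Hypothetical user; replace with your own CCDB username.
S4H_REMOTE="alice@s4h.scinet.utoronto.ca"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Upload a local path to a directory on the encrypted storage.
s4h_push() {
    run rsync -av "$1" "$S4H_REMOTE:$2"
}

# Download a path from the encrypted storage to the local machine.
s4h_pull() {
    run rsync -av "$S4H_REMOTE:$1" "$2"
}
```

For example, s4h_push results/ /s4h/def-bob/scratch/alice/results/ would copy a local directory into the example user's scratch space. The transfer must be started from one of the pre-approved IP addresses.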
Under the prohibitive policy, users may not transfer sensitive data into or out of S4H. They may only upload non-sensitive data to Trillium (whose storage is not encrypted at rest), where it can be accessed from S4H. For egress purposes, the PI should designate at least one user (possibly themselves) who is not under the prohibitive policy. Other users in the group can share files for egress with the designated user (e.g. by putting them in the group's project directory).
Data policies
It is important to understand that:
- Within a group, file access is managed by traditional POSIX permissions and access-control lists, as on Trillium. If users in a group are working on separate sub-projects that should not be mutually accessible, it is their responsibility to make sure that permissions are set up correctly.
- There is no way to facilitate cross-group file sharing of sensitive files on S4H; each group has a different encryption key and the system is set up so that a compute node can only use one key at a time.
- No backup is provided for encrypted storage; deletion is irreversible. This ensures that data are securely disposed of, in compliance with a provision found in many data sharing agreements.
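For the first point, separating sub-projects within a group could look like the sketch below. The function names are hypothetical, and the ACL step assumes that the standard setfacl tool is available and supported on this storage, which is not confirmed here.

```shell
# Create a sub-project directory that only its owner can access.
make_private_dir() {
    mkdir -p "$1" && chmod 700 "$1"
}

# Assumption: setfacl works on this file system.
# Grant one named collaborator read access to the directory itself.
grant_read() {
    setfacl -m "u:$2:rX" "$1"
}

# Example (hypothetical paths and user names):
#   make_private_dir /s4h/def-bob/project/subproject-a
#   grant_read /s4h/def-bob/project/subproject-a carol
```

Running ls -ld on the directory afterwards is a quick way to confirm the resulting mode; a trailing + in the mode string indicates that an ACL is set.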
Software
Same as Trillium.
Note that on S4H you may use software (including, for example, Python virtual environments) installed in your Trillium file systems, but not vice versa. This could be useful for users under the restrictive or prohibitive policy, who may otherwise have difficulty installing software in their encrypted storage spaces.
Submitting jobs
This is largely the same as on Trillium. Note, however, that job metadata are not kept confidential! In particular, the submitting user, work directory, job name, comment, and command should be considered public information, and users must not include any sensitive information in these.
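One way to keep job metadata neutral is sketched below. The helper make_job_script, the job name, the comment, the directory name, and run_step.sh are all hypothetical, and the header is not a complete Trillium job script; resource requests must follow the usual Trillium conventions.

```shell
# Print a job script whose public metadata carries no sensitive details.
make_job_script() {
    cat <<'EOF'
#!/bin/bash
#SBATCH --job-name=run-001
#SBATCH --comment=batch-a
#SBATCH --time=01:00:00

# The work directory is public as well, so its name is kept neutral.
cd "$SCRATCH/run-001"
./run_step.sh
EOF
}
```

The script could be written out with make_job_script > job.sh and submitted with sbatch job.sh, keeping the script name and any arguments equally neutral.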