ISH

From SciNet Users Documentation

ish - inventory shell

With ish, you can browse the contents of directories, tarballs, and purge lists from a local inventory file (or index). The typical use case is that of remotely stored (tar) files, which you cannot access directly but whose contents you would like to know. HPSS is a prime example.

Note that ish is not intended to be able to change your existing (tar) files. The index files are simply a convenient, but read-only view of the files that are stored.

Ish is open source: [1]

Typical usage cases

Typical local usage

The typical usage case outside of hpss is that you have a (gzipped) tar file data.tgz, which you'd like to inspect without having to unpack it. If this is a local file, you can of course just use 'tar -tvf', but this gives you only a flat listing. Instead, in ish you can do the following (starting from the bash prompt):

rzon@scinet02:~$ ls
data.tgz
rzon@scinet02:~$ /scinet/niagara/bin/ish
ish 0.99
Ramses van Zon - SciNet/Toronto/Canada/Sep 8, 2011
[ish]hpss.igz> index data.tgz
[ish]data.tgz.igz> ls -l
drwxr-xr-x rzon/scinet            0 2011-02-10 13:57:01 data/ 
-rw-r--r-- rzon/scinet        16714 2010-10-05 12:41:45 input.ini
-rwxr-xr-x rzon/scinet          293 2011-06-30 12:42:57 submit.pbs
[ish]data.tgz.igz> cd data
[ish]data.tgz.igz> ls
run1/   run2/
[ish]data.tgz.igz> find important*.dat
run1/important01.dat  run1/important02.dat  run1/important03.dat
run1/important04.dat  run1/important05.dat  run1/important06.dat
run2/important01.dat  run2/important02.dat  run2/important03.dat
run2/important04.dat  run2/important05.dat  run2/important06.dat
[ish]exit
rzon@scinet02:~$

The first ish line contains the 'index' command, which creates an index file from the content of data.tgz. Index files are named after the entity that they index, followed by the extension .igz. By default, this index file is stored in $HOME/.ish_register. After its creation, the index file is loaded, as the ish prompt indicates. One can then list the content in much the same way as in a shell, and search for specific files.
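The same inspection can also be scripted without the interactive prompt, using the single-command form 'ish INDEX COMMAND' described under Usage. The following is a minimal sketch: it assumes that ish is on your PATH (on SciNet you may need the full path), and inspect_tarball is a made-up function name, not part of ish.

```shell
#!/bin/bash
# Sketch: index a local tarball, then query the index non-interactively.
# Assumes 'ish' is on the PATH; inspect_tarball is a hypothetical helper.
inspect_tarball() {
    tarfile=$1
    pattern=$2
    # Create (or refresh) the index; it is stored in the register directory.
    ish index "$tarfile" || return
    # A tar index is named TARFILE.igz, with slashes replaced by underscores.
    index=$(printf '%s' "$tarfile" | tr '/' '_').igz
    # Long listing of the top level, then a recursive search for the pattern.
    ish "$index" ls -l || return
    ish "$index" find "$pattern"
}

# Example (quote the pattern so that the local shell does not expand it):
#   inspect_tarball data.tgz 'important*.dat'
```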

Typical hpss usage: browse the hpss tree

In an hpss context, ish can be used to list all of the files in your /archive directory. To accomplish this, one needs to run the hindex command, but this only works on a machine that has HSI installed, so the following script has to be submitted to the archive queue.

#!/bin/bash
# This script is named: data-list.sh 
#PBS -q archive
#PBS -N hpss_index
#PBS -j oe
#PBS -m e
/scinet/gpc/bin/ish hindex

Submit this job with 'qsub data-list.sh' from a gpc development node, and wait for it to finish. The 'hindex' command will then have created an index for your whole hpss file tree. The index is once again stored in ~/.ish_register, and it is called hpss.igz.

You can browse through the index from any SciNet machine. For instance, you can now do

[gpc01]$ /scinet/gpc/bin/ish
ish 0.98
Ramses van Zon - SciNet/Toronto/Canada/July 8, 2011
[ish]hpss.igz> ls 
aliases.gz              fg.tar                  summaries/ 
bin.tgz                 gmd.tar.idx             summary0512.txt 
cbin.tar.idx            gmd.tar                 Tests_ljmpi.tar.idx 
cbin.tar                .hsikeysets/            Tests_ljmpi.tar 
fakegas.tar.idx         large.tar               TOOLS/ 
fakegas.tar             mmm/                    TOOLS.tar.idx 
fg.tar.idx              mpitests/               TOOLS.tar 
[ish]exit
$

Typical hpss usage: browse a remote tar file

You can also create an index file of a tar file on hpss, using e.g. 'ish hindex bin.tgz' in a similar job script as above. Depending on whether the file bin.tgz was moved already onto tape within the hpss system, this can take some time, but once it is done, you have the local index file in .ish_register.
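Such a job script might look as follows. This is only a sketch modelled on data-list.sh above: the file name hindex-bin.sh and the job name hpss_index_bin are made up, and bin.tgz stands for whichever tar file you have on hpss. The snippet writes the job script to the current directory; the submission itself is left as a comment, because it only works where the archive queue is available.

```shell
#!/bin/bash
# Sketch: write a job script that indexes one hpss-resident tar file.
# hindex-bin.sh and hpss_index_bin are hypothetical names.
cat > hindex-bin.sh <<'EOF'
#!/bin/bash
#PBS -q archive
#PBS -N hpss_index_bin
#PBS -j oe
#PBS -m e
/scinet/gpc/bin/ish hindex bin.tgz
EOF

# Then, from a gpc development node:
#   qsub hindex-bin.sh
```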

Other features of ish

  • Colourized listings
  • List all available index files
  • Switch between different index files
  • Save older versions of index files with a date stamp
  • Show how an index file was created
  • Create indices for large remote, gzipped tar files
  • User-defined location of index files
  • Index local directories
  • Index the monthly scratch purging notices at SciNet
  • Tar and create an index with one command
  • Htar and create a (local) index with one command
  • Check exit codes of ish, tar, htar and hsi commands
  • Extensive help system

ish is located in /scinet/gpc/bin/ish. The current version is 0.98 (a beta version). Note that it currently does not support filenames containing spaces, and does not store checksums or the targets of symbolic links (that is, the metadata will show a file as being a symlink, but the file it points to is not stored).

Ramses van Zon - SciNet/Toronto/Canada/July 8, 2011

Usage

From the command line

   ish -h|--help                      show this help
   ish --version                      show version number
   ish [INDEX]                        interactive shell for file INDEX
   ish [INDEX] COMMAND                perform single COMMAND on file INDEX

When INDEX is omitted, hpss.igz is loaded if available.

Shell Commands

Most common

   ls [-lr]    [DIR/[FILES]] ...       list FILES from DIR in index
   du [-r]     [DIR]                   sum file numbers and sizes
   cd          DIR                     set current directory in index
   find        PATTERN ...             find files following PATTERN in index
   index       DIR|TARFILE             make index for folder/tarfile
   exit                                exit the ish shell

More

   avail       [-a]                    list (all) available index files
   colour      1|0                     set colour usage
   help        [COMMAND]               show help on (all) commands
   register    [DIR]                   set new index file location 
   use         [INDEX]                 use INDEX or list available ones
   unuse                               use the previous index file again
   info                                show properties of the index file
   pwd                                 show current directory
   settings                            show settings (colour, etc.)
   tar -[z]cf  TARFILE DIR[/FILES] ... tar and make index
   check [-n]  [COMMENT]               exit ish if error in last command
   !COMMAND    [ARGS]                  local commands (ls, cd, pwd only)
   pindex      [FILE]                  make index from a purge listing

Only in archive queue (hpss)

   hindex      DIR|TARFILE             make index for hpss folder/tarfile
   htar -[p]cf TARFILE DIR[/FILES] ... htar and make index

Command line examples

 Local tar and gzip directory 'code':  ish tar -czf code.tgz code
 Long list of its content:             ish code.tgz.igz ls -l 'code/*'
 Make index file for existing tar:     ish index another.tar
 List its top level content:           ish another.tar.igz ls
 List all of its content:              ish another.tar.igz ls -r
 Find file 'hello.txt' in it:          ish another.tar.igz find hello.txt
 Make index hpss.igz for hpss tree:    ish hindex
 Create index file for an htar file:   ish hindex data.tar
 Create htar and index file:           ish htar -cpf data.tar data/

Ish commands

help - show help on ish commands

Usage

   help [COMMAND]

If no COMMAND is given, a list of all ish commands, with a brief description, is given.

ls - list directory contents

Usage

   ls [OPTION] [ PATTERN [PATTERN ... ] ]

Lists files in the index according to one or more patterns. In interactive mode, the files are displayed in colour. The list is sorted alphabetically by name.

Patterns are of the form [PATH[/]][FILES] and may contain wildcards * and ?. Without FILES, the form PATH only lists the directory name, while PATH/ lists the files in the directory.

When no patterns are given, ls lists files in the current directory in the current index file (as given on the command line or set with use).

Paths are relative to the current directory (as set with 'cd'). To specify a path from the root of the index, an initial colon (:) should be put in front of the path. If the index contains absolute paths, the root can be indicated by an initial slash (/) as well.

The optional argument can be

 -l  list in long format, displaying information on file sizes,
     modification times, and other metadata present in the index.
 -r  lists recursively into subdirectories.

To give both options, they have to be combined into one, e.g. -lr.
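As an illustration of these path rules, the sketch below feeds a few ls commands to ish through a here document. It assumes that ish is on your PATH and reuses the layout of the data.tgz example above (a top-level directory data/ containing run1/ and run2/); list_runs is a made-up function name.

```shell
#!/bin/bash
# Sketch: ls path forms, run against the index of the data.tgz example.
# Assumes 'ish' is on the PATH; list_runs is a hypothetical helper.
# The four commands sent to ish are, in order:
#   cd data              make data/ the current directory in the index
#   ls run1              PATH only: lists the directory name itself
#   ls run1/             PATH/: lists the files inside run1
#   ls -lr :data/run2/   colon: path from the root of the index, long and recursive
list_runs() {
    ish data.tgz.igz <<'EOF'
cd data
ls run1
ls run1/
ls -lr :data/run2/
EOF
}

# Example:
#   list_runs
```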


du - sum file sizes in directories

Usage

   du [-r] [PATH]

Lists the number of files contained in the directory PATH, and their total size in kilobytes.

When no PATH is given, du sums files in the current directory in the current index file (as given on the command line or set with use). Note that the count is cumulative, i.e. files in subdirectories are also counted.

PATHs are relative to the current directory (as set with 'cd'). Wildcards are not supported.

The optional argument -r can be given to recursively list the sizes and the number of files in subdirectories.

cd - set current directory in index

Usage

   cd [PATH]

Change the current directory within the index to the directory specified by PATH.

The new path is relative to the previous current directory, unless preceded by a colon (:), in which case it specifies a path from the root of the index. If the index contains absolute paths, the root can be indicated by an initial slash (/) as well.

If no PATH is given, the current directory is set to the root directory of the index file.

The current directory is always displayed in the ish prompt.

find - recursively find files in index

Usage

   find PATTERN ...

Recursively searches the directory tree in the index from the current directory (set by 'cd') for files following one or more patterns. PATTERN may be of the form PATH/FILE or just FILE, and can contain the wildcards * and ?.

exit - exit shell

Usage

   exit 

Ish will exit with the exit status of the last command that was run.

When input is redirected to be read from a file or here document, the end-of-file also exits ish, so in that case exit is optional.

register - set index file location

Usage

   register [DIR]

Sets the location where index files are stored. This location is called the register. DIR must be an existing directory on the file system. If DIR is omitted, the register is set to the default location.

The default location of the register is the (hidden) directory ~/.ish_register, or is set by the environment variable ISHREGISTER.
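From a regular shell, the location that ish would use by default can be worked out along these lines. This is a sketch of the rule as stated here, not of ish's own code: ish_register_dir is a made-up helper, and treating an empty ISHREGISTER the same as an unset one is an assumption.

```shell
#!/bin/bash
# Sketch: resolve the default register directory as described in the text.
# ish_register_dir is a hypothetical helper, not part of ish.
ish_register_dir() {
    # Use ISHREGISTER if it is set and non-empty, else ~/.ish_register.
    printf '%s\n' "${ISHREGISTER:-$HOME/.ish_register}"
}

# Example: list the index files stored in the register.
#   ls "$(ish_register_dir)"
```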

Index files in the current directory will be found by 'use' as well, regardless of the register setting. But to store an index file in the current directory (using 'index', 'hindex', or 'pindex'), the register has to be set to '.' first.

The current setting of the register can be found with 'settings'.

index - make an index for a local directory or tarfile

Usage

   index PATH|TARFILE

Makes an index file of a directory (PATH) or tar file (TARFILE). The PATH, or the paths stored in the tar file, can be either absolute or relative (not both). The tar file can be in compressed format, provided the local tar installation supports it.

The index files are stored in the register directory (~/.ish_register by default). See 'register' for details.

For consistency, if the parent directory of any directory in PATH or the TARFILE is not in that PATH or in that TARFILE, it gets added to the index as a stub (i.e., not with its full content). For example, 'index /home/rzon' will include /home as a stub, which will seem to only contain the directory rzon. Similarly, if test.tar was created (not from within ish) with 'tar cf test.tar /home/rzon', then 'index test.tar' will add a stub for /home.

Ish automatically assigns a name to the index file, following this naming convention

  • A directory index stored in the current directory will be called PATH.igz, with any slashes replaced by underscores.
  • A directory index stored in the register will be called ABSPATH.igz, where ABSPATH is the absolute path to the directory PATH, again with slashes replaced by underscores.
  • A tar index will be called TARFILE.igz, with slashes replaced with underscores.
  • When running an index command again, or running an index command that would overwrite an existing index, the existing index file NAME.igz gets renamed to NAME_DATE_TIME.igz, where DATE and TIME are those of the existing index file. The exception to this rule is that if the new index file has identical content to the old one, the old one is removed.
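
The slash-to-underscore rule can be sketched in shell as follows. The helper igz_name is made up for illustration and only mimics the rule as stated; how ish treats corner cases such as a leading slash is not specified here, so the literal replacement shown is an assumption.

```shell
#!/bin/bash
# Sketch: derive an index file name from a tar file name or path.
# igz_name is a hypothetical helper; it does not call ish.
igz_name() {
    # Replace every slash by an underscore, then append the .igz extension.
    printf '%s.igz\n' "$(printf '%s' "$1" | tr '/' '_')"
}

# Examples:
#   igz_name data.tgz           prints data.tgz.igz
#   igz_name backups/code.tar   prints backups_code.tar.igz
```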

Once the indexing is done, the index file becomes the current one in the ish shell.

The index file for a tar file can be created when the tar is made, using the ish command tar (see 'tar').

hindex - make index for remote hpss folder/tarfile (hpss only)

Usage

   hindex [PATH|TARFILE]

Makes an index file of a directory (PATH) or tar file (TARFILE) on hpss. The distinction is based on whether the argument has the extension .tar, .tar.gz, .tgz, .tar.bz2, .tbz2, or .tb2.
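
The extension test can be pictured with a small shell function. This is only a sketch of the rule as stated above, not ish's actual code, and looks_like_tar is a made-up name.

```shell
#!/bin/bash
# Sketch: decide from the name alone whether an argument is a tar file.
# looks_like_tar is a hypothetical helper; it does not call ish or hsi.
looks_like_tar() {
    case $1 in
        # Extensions listed in the text (the last one taken to be .tb2).
        *.tar|*.tar.gz|*.tgz|*.tar.bz2|*.tbz2|*.tb2) return 0 ;;
        # Anything else would be treated as a directory.
        *) return 1 ;;
    esac
}

# Example:
#   looks_like_tar bin.tgz && echo "tar file" || echo "directory"
```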

The PATH, or the paths stored in the tar file, can be either absolute or relative. If the tar file is uncompressed, then it should ideally have been created with htar.

If neither a PATH nor a TARFILE is given, hindex will generate an index for your whole hpss directory, and store the index in hpss.igz.

For consistency, if the parent directory of any directory in PATH or the TARFILE is not in that PATH or in that TARFILE, it gets added to the index as a stub (i.e., not with its full content). For example, 'hindex /archive/scinet/rzon' will include /archive and /archive/scinet as stubs, each of which will seem to contain only one directory. Similarly, if test.tar was created (not from within ish) with 'htar cf test.tar /home/rzon', then 'hindex test.tar' will add a stub for /home.
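
To see which directories end up as stubs, one can list the ancestors of the indexed path. The sketch below does this for an absolute path only; stub_parents is a made-up helper for illustration and does not call ish.

```shell
#!/bin/bash
# Sketch: print the ancestor directories of an absolute path, top-down.
# These are the directories that would appear as stubs in the index.
# stub_parents is a hypothetical helper; absolute paths only.
stub_parents() {
    path=${1%/}          # drop a trailing slash, if any
    rest=${path#/}       # remaining components, without the leading slash
    prefix=
    # Loop while 'rest' still contains a slash, i.e. has a component after it.
    while [ "${rest#*/}" != "$rest" ]; do
        prefix="$prefix/${rest%%/*}"
        printf '%s\n' "$prefix"
        rest=${rest#*/}
    done
}

# Example:
#   stub_parents /archive/scinet/rzon
```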

Ish automatically assigns a name to the index file, following this naming convention

  • The directory index will be called PATH.igz, with any slashes replaced by underscores.
  • A tar index will be called TARFILE.igz, with slashes replaced with underscores.
  • The index will be called hpss.igz when hindex is called without argument.
  • When running an hindex command again, or running an hindex command that would result in an existing index getting overwritten, the existing index file NAME.igz gets renamed to NAME_DATE_TIME.igz, where DATE and TIME are of the existing index file. The exception to this rule is that if the new index file has identical content to the old one, the old one is removed.

Ish calls the hsi application under the hood to get the directory listings. This means that this command has to be run on a system that has hpss with hsi. This may mean you have to submit a job for building the index. With the index file, you can then locally traverse the listings in the index and see the modification times and file sizes.

If the tar file was created with htar (outside of ish), then the index is created from the remote index file generated by a call to htar. So this command may have to be run on a system that has hpss with htar.

If the hpss-resident tar file was created with tar (outside of ish), then no remote index file exists and ish will first request htar to generate one. Because this may require the tar to be reclaimed from tape, this is an expensive operation. So it is recommended to 1) use htar, or, 2) at the point of tarring, use ish's tar command, or, 3) if the tar is already created and still available locally, to use the ish command 'index'.

Once the indexing is done, the index file becomes the current one in the ish shell.

The index files are stored in the register directory (~/.ish_register by default). See 'register' for details.

Note that the htarring and the creation of a local index file can be done with one command, using the 'htar' ish command.

use - set the index file

Usage

   use INDEX

Sets the current index file to INDEX. This file should reside in the register directory (see 'register') or the current directory. INDEX should include the index file extension '.igz'.

Note that the index file can also be set when starting ish, by giving the name of the index file as the first argument (e.g. ish data.tgz.igz). Omitting the index file when starting ish will cause it to look for hpss.igz.

See 'index', 'hindex' and 'pindex' on how to create index files, and on their naming scheme.

The current index file is always displayed in the ish prompt.

info - show information on the current index file

Usage

   info 

Lists information such as the creation date, the ish version, and how the index was created.

colour - set colour usage

Usage

   colour ARG

Switches the usage of colour in listings on if ARG=1 and off if ARG=0. By default, colour is on in interactive sessions, and off in single command mode.

The current colour setting can be found with the 'settings' command.

pwd - show current directory

Usage

   pwd 

Shows the current directory within the index (as set by 'cd').

This command should be rarely needed, as the current directory is always displayed as part of the ish prompt.

settings - list ish settings

Usage

   settings 

Lists settings such as colour and register location.

tar - local tar and make index

Usage

   tar [OPTION] TARFILE [DIR][FILES] ...

Tars a directory or a list of files as usual, AND creates the corresponding index file. It should be equivalent to running the same tar command from the shell (i.e. not within ish), followed by "ish index TARFILE".

Common tar options

 -c  create the file.
 -z  gzip the tarball. 
 -j  bzip2 the tarball.
 -f  Indicates that output should go to the tarfile. 
     Has to be the last option.

Other options can be found from the tar man page. Multiple options have to be combined into one, e.g. -czf.

avail - list available index files

Usage

   avail [-a]

Displays a list of the index files in the register directory (see 'register') or the current directory. These index files can be accessed with the 'use' command.

The list contains the creation date and time of the igz file. Note that this need not be the same as the creation date of the corresponding tar.

By default, older versions of the igz files (which are automatically saved by index and hindex) are not shown.

The optional argument can be

 -a  When given, older saved versions of the igz files are also listed.

See 'index', 'hindex' and 'pindex' on how to create index files, and on their naming scheme.

!COMMAND - run local commands (ls, cd, pwd only)

Usage

   !COMMAND [ARGS]

Runs a local command. This is intended to allow the local directory to be changed and listed. Any arguments will be passed to the command.

Only the commands ls, cd and pwd will be accepted to run locally.

htar - run htar and make index (hpss only)

Usage

   htar OPTIONS TARFILE [DIR][FILES] ...

Htars a directory or a set of files as usual, AND creates the corresponding local index file. It should be equivalent to running the same htar command from the shell (i.e. not from within ish), followed by "ish hindex TARFILE".

Common htar options

 -c  create the file. 
 -p  Preserve time stamps.
 -f  always the last option; indicates that output should go to a file. 

To give multiple options, they have to be combined into one, e.g. -cpf. Other options can be found from the htar man page.

check - exit if the previous command had an error

Usage

   check [-n] [COMMENT]

Ish will print the COMMENT, then the error message of the last command, and will exit with its exit code. Very useful and recommended when doing multiple nontrivial actions in one ish session.

If the optional argument -n is given, ish does not exit.
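
A batch session using check might look like the following sketch. It assumes that ish is on your PATH and that a local directory called code exists; the function name archive_code and the comment texts passed to check are made up. The commands are fed through a here document, so the end of input exits ish.

```shell
#!/bin/bash
# Sketch: tar a directory, index it, and stop at the first failing step.
# Assumes 'ish' is on the PATH; archive_code is a hypothetical helper.
# Each 'check' line follows the command it guards; its arguments are the
# COMMENT that ish prints before the error message.
archive_code() {
    ish <<'EOF'
tar -czf code.tgz code
check tar of code failed
ls -l
check listing failed
EOF
}

# Example: report a failure to the calling script.
#   archive_code || echo "ish session failed with status $?" >&2
```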

pindex - make a local index for a purge file

Usage

   pindex [FILE]

Makes an index file from the monthly scratch purge file generated for you at SciNet. (Obviously, this is a very SciNet-specific command.) This list only contains files, so for consistency, stubs are added for directories.

If no FILE is given, ish searches for your most current purge list.

The name of the index file is composed of the file name of the purge file, without underscores, and prepended by a path with slashes replaced by underscores. The index file is stored in the register directory (~/.ish_register by default). See 'register' for details.

Once the indexing is done, the index file becomes the current one in the ish shell.

