ISH
ish - inventory shell
With ish, you can browse the contents of directories, tarballs and purge lists from a local inventory file (or index). The typical use case is that of remotely stored (tar) files, which you cannot access directly but whose contents you would like to know. HPSS is a prime example.
Note that ish is not intended to be able to change your existing (tar) files. The index files are simply a convenient, but read-only view of the files that are stored.
Ish is open source: [1]
Typical usage cases
Typical local usage
The typical usage case outside of hpss is that you have a (gzipped) tar file data.tgz, which you'd like to inspect without having to unpack it. If this is a local file, you can of course just use 'tar -tvf', but this gives you only a flat listing. Instead, in ish you can do the following (starting from the bash prompt):
    rzon@scinet02:~$ ls
    data.tgz
    rzon@scinet02:~$ /scinet/niagara/bin/ish
    ish 0.99
    Ramses van Zon - SciNet/Toronto/Canada/Sep 8, 2011
    [ish]hpss.igz> index data.tgz
    [ish]data.tgz.igz> ls -l
    drwxr-xr-x rzon/scinet     0 2011-02-10 13:57:01 data/
    -rw-r--r-- rzon/scinet 16714 2010-10-05 12:41:45 input.ini
    -rwxr-xr-x rzon/scinet   293 2011-06-30 12:42:57 submit.pbs
    [ish]data.tgz.igz> cd data
    [ish]data.tgz.igz> ls
    run1/  run2/
    [ish]data.tgz.igz> find important*.dat
    run1/important01.dat
    run1/important02.dat
    run1/important03.dat
    run1/important04.dat
    run1/important05.dat
    run1/important06.dat
    run2/important01.dat
    run2/important02.dat
    run2/important03.dat
    run2/important04.dat
    run2/important05.dat
    run2/important06.dat
    [ish]exit
    rzon@scinet02:~$
The first ish line contains the 'index' command, which creates an index file from the content of data.tgz. Index files are named after the entity that they index, followed by the extension .igz. By default, this index file is stored in $HOME/.ish_register. After its creation, the index file is loaded, as the ish prompt indicates. One can then list the content in much the same way as in a shell, and search for specific files.
Typical hpss usage: browse the hpss tree
In an hpss context, ish can be used to list all of the files in your /archive directory. To accomplish this, one needs to run the hindex command, but this only works on a machine that has HSI installed, so the following script has to be submitted to the archive queue.
    #!/bin/bash
    # This script is named: data-list.sh
    #PBS -q archive
    #PBS -N hpss_index
    #PBS -j oe
    #PBS -m e
    /scinet/gpc/bin/ish hindex
Submit this job with 'qsub data-list.sh' from a gpc development node, and wait for it to finish. The 'hindex' command will then have created an index for your whole hpss file tree. The location of the index is once again ~/.ish_register, and it is called hpss.igz.
You can browse through the index from any SciNet machine. For instance, you can now do
    [gpc01]$ /scinet/gpc/bin/ish
    ish 0.98
    Ramses van Zon - SciNet/Toronto/Canada/July 8, 2011
    [ish]hpss.igz> ls
    aliases.gz       fg.tar        summaries/
    bin.tgz          gmd.tar.idx   summary0512.txt
    cbin.tar.idx     gmd.tar       Tests_ljmpi.tar.idx
    cbin.tar         .hsikeysets/  Tests_ljmpi.tar
    fakegas.tar.idx  large.tar     TOOLS/
    fakegas.tar      mmm/          TOOLS.tar.idx
    fg.tar.idx       mpitests/     TOOLS.tar
    [ish]exit
    $
Typical hpss usage: browse a remote tar file
You can also create an index file of a tar file on hpss, using e.g. 'ish hindex bin.tgz' in a similar job script as above. Depending on whether the file bin.tgz was moved already onto tape within the hpss system, this can take some time, but once it is done, you have the local index file in .ish_register.
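As a hedged sketch of such a job script, the shell function below prints a variant of data-list.sh that indexes a single tar file on hpss. The function name make_hindex_job and the job name hpss_index_tar are made up for illustration; the PBS directives and the ish path are copied from the script above.

```shell
# Hypothetical helper: print a job script that indexes one tar file on hpss.
# Usage: make_hindex_job TARFILE > job.sh
make_hindex_job() {
    tarfile=$1
    cat <<EOF
#!/bin/bash
#PBS -q archive
#PBS -N hpss_index_tar
#PBS -j oe
#PBS -m e
/scinet/gpc/bin/ish hindex $tarfile
EOF
}

# Example: show the job script for bin.tgz on standard output.
make_hindex_job bin.tgz
```

Redirect the output to a file and submit that file with qsub from a gpc development node, as for data-list.sh.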
Other features of ish
- Colourized listings
- List all available index files
- Switch between different index files
- Save older versions of index files with a date stamp
- Show how an index file was created
- Create indices for large remote, gzipped tar files.
- User-defined location of index files
- Index local directories
- Index the monthly scratch purging notices at SciNet
- Tar and create an index with one command
- Htar and create a (local) index with one command
- Check exit codes of ish, tar, htar and hsi commands
- Extensive help system
ish is located in /scinet/gpc/bin/ish. The current version is 0.98 (a beta version). It should be mentioned that it currently does not have support for filenames with spaces and does not store checksums or symbolic links (that is, the meta data will show a file as being a symlink, but the file it points to is not stored).
Ramses van Zon - SciNet/Toronto/Canada/July 8, 2011
Usage
From the command line
    ish -h|--help          show this help
    ish --version          show version number
    ish [INDEX]            interactive shell for file INDEX
    ish [INDEX] COMMAND    perform single COMMAND on file INDEX
When INDEX is omitted, hpss.igz is loaded if available.
Shell Commands
Most common
    ls [-lr] [DIR/[FILES]] ...    list FILES from DIR in index
    du [-r] [DIR]                 sum file numbers and sizes
    cd DIR                        set current directory in index
    find PATTERN ...              find files following PATTERN in index
    index DIR|TARFILE             make index for folder/tarfile
    exit                          exit the ish shell
More
    avail [-a]                            list (all) available index files
    colour 1|0                            set colour usage
    help [COMMAND]                        show help on (all) commands
    register [DIR]                        set new index file location
    use [INDEX]                           use INDEX or list available ones
    unuse                                 use the previous index file again
    info                                  show properties of the index file
    pwd                                   show current directory
    settings                              show settings (colour, etc.)
    tar -[z]cf TARFILE DIR[/FILES] ...    tar and make index
    check [-n] [COMMENT]                  exit ish if error in last command
    !COMMAND [ARGS]                       local commands (ls, cd, pwd only)
    pindex [FILE]                         make index from a purge listing
Only in archive queue (hpss)
    hindex DIR|TARFILE                     make index for hpss folder/tarfile
    htar -[p]cf TARFILE DIR[/FILES] ...    htar and make index
Command line examples
    Local tar and gzip directory 'code':    ish tar -czf code.tgz code
    Long list of its content:               ish code.tgz.igz ls -l 'code/*'
    Make index file for existing tar:       ish index another.tar
    List its top level content:             ish another.tar.igz ls
    List all of its content:                ish another.tar.igz ls -r
    Find file 'hello.txt' in it:            ish another.tar.igz find hello.txt
    Make index hpss.igz for hpss tree:      ish hindex
    Create index file for an htar file:     ish hindex data.tar
    Create htar and index file:             ish htar -cpf data.tar data/
Ish commands
help - show help on ish commands
Usage
help [COMMAND]
If no COMMAND is given, a list of all ish commands, with a brief description, is given.
ls - list directory contents
Usage
ls [OPTION] [ PATTERN [PATTERN ... ] ]
Lists files in the index according to one or more patterns. In interactive mode, the files are displayed in colour. The list is sorted alphabetically by name.
Patterns are of the form [PATH[/]][FILES] and may contain wildcards * and ?. Without FILES, the form PATH only lists the directory name, while PATH/ lists the files in the directory.
When no patterns are given, ls lists files in the current directory in the current index file (as given on the command line or set with use).
Paths are relative to the current directory (as set with 'cd'). To specify a path from the root of the index, an initial colon (:) should be put in front of the path. If the index contains absolute paths, the root can be indicated by an initial slash (/) as well.
The optional argument can be
    -l    list in long format, displaying information on file sizes, modification times, and other metadata present in the index.
    -r    list recursively into subdirectories.
To give both options, they have to be combined into one, e.g. -lr.
du - sum file sizes in directories
Usage
du [-r] [PATH]
Lists the number of files contained in the directory PATH and their total size in kilobytes.
When no PATH is given, du sums files in the current directory in the current index file (as given on the command line or set with use). Note that the count is cumulative, i.e. files in subdirectories are also counted.
PATHs are relative to the current directory (as set with 'cd'). Wildcards are not supported.
The optional argument -r can be given to recursively list the sizes and the number of files in subdirectories.
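To make the reported quantities concrete, the sketch below computes a cumulative file count and a total size in kilobytes from a long listing in the format printed by 'ls -l' in the example session (the same format as 'tar -tvf'). It is not how ish implements du: the function name du_from_listing, the rounding up to whole kilobytes and the output format are all assumptions.

```shell
# Hypothetical helper: read long-listing lines on standard input and print
# the number of files and their total size in kilobytes (rounded up).
# Assumed line format: PERMISSIONS OWNER/GROUP SIZE DATE TIME NAME
du_from_listing() {
    awk '
        $1 !~ /^d/ { files++; bytes += $3 }   # count everything but directories
        END { printf "%d files, %d kB\n", files, int((bytes + 1023) / 1024) }
    '
}

# Example input: the listing from the example session on this page.
du_from_listing <<'EOF'
drwxr-xr-x rzon/scinet 0 2011-02-10 13:57:01 data/
-rw-r--r-- rzon/scinet 16714 2010-10-05 12:41:45 input.ini
-rwxr-xr-x rzon/scinet 293 2011-06-30 12:42:57 submit.pbs
EOF
```

For these three lines the function prints "2 files, 17 kB", since the directory entry is skipped and 16714 + 293 bytes round up to 17 kilobytes.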
cd - set current directory in index
Usage
cd [PATH]
Change the current directory within the index to the directory specified by PATH.
The new path is relative to the previous current directory, unless preceded by a colon (:), in which case it specifies a path from the root of the index. If the index contains absolute paths, the root can be indicated by an initial slash (/) as well.
If no PATH is given, the current directory is set to the root directory of the index file.
The current directory is always displayed in the ish prompt.
find - recursively find files in index
Usage
find PATTERN ...
Recursively searches the directory tree in the index from the current directory (set by 'cd') for files following one or more patterns. PATTERN may be of the form PATH/FILE or just FILE, and can contain the wildcards * and ?.
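The matching rule for a bare FILE pattern can be illustrated on a flat listing, such as the output of 'tar -tf'. The sketch below is not ish's implementation: it assumes that a pattern without a slash is compared against the last path component only, and the function name find_in_listing is invented for this example.

```shell
# Hypothetical helper: read one path per line on standard input and print
# the paths whose file name (last component) matches the shell pattern $1.
find_in_listing() {
    pattern=$1
    while IFS= read -r path; do
        name=${path##*/}              # strip everything up to the last slash
        case $name in
            $pattern) printf '%s\n' "$path" ;;   # unquoted: * and ? act as wildcards
        esac
    done
}

# Example: select the important*.dat files from a small flat listing.
printf '%s\n' run1/important01.dat run1/notes.txt run2/important02.dat |
    find_in_listing 'important*.dat'
```

The pattern is quoted on the command line so that the calling shell does not expand it against local files before the function sees it.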
exit - exit shell
Usage
exit
Ish will exit with the exit status of the last command that was run.
When input is redirected to be read from a file or here document, the end-of-file also exits ish, so in that case exit is optional.
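Because ish exits with the status of the last command, single-command calls can be chained from a shell script. The wrapper below is a minimal sketch: run_ish and the ISH variable are names chosen for this example, and it assumes that ish follows the usual convention of a zero exit status on success.

```shell
# Path to ish as given in this document; set ISH to use another copy.
ISH=${ISH:-/scinet/gpc/bin/ish}

# Hypothetical wrapper: run one ish command and report a non-zero exit status.
run_ish() {
    "$ISH" "$@" || {
        status=$?
        echo "ish $* failed with exit status $status" >&2
        return "$status"
    }
}

# Intended use, stopping at the first failure:
#   run_ish index another.tar && run_ish another.tar.igz ls -l
```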
register - set index file location
Usage
register [DIR]
Sets the location where index files are stored. This location is called the register. DIR must be an existing directory on the file system. If DIR is omitted, the register is set to the default location.
The default location of the register is the (hidden) directory ~/.ish_register, or is set by the environment variable ISHREGISTER.
Index files in the current directory will be found by 'use' as well, regardless of the register setting. But to store an index file in the current directory (using 'index', 'hindex', or 'pindex'), the register has to be set to '.' first.
The current setting of the register can be found with 'settings'.
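A shell script that needs to locate index files outside of ish can mirror the default described above. This is a sketch based on a literal reading of the text: register_dir and list_indexes are hypothetical helper names, and ISHREGISTER is assumed to take precedence over ~/.ish_register whenever it is set and non-empty.

```shell
# Print the default register directory: $ISHREGISTER if set, else ~/.ish_register.
register_dir() {
    printf '%s\n' "${ISHREGISTER:-$HOME/.ish_register}"
}

# List the names of the index files (*.igz) in that directory, one per line.
list_indexes() {
    dir=$(register_dir)
    for f in "$dir"/*.igz; do
        if [ -e "$f" ]; then          # skip the unexpanded pattern if no file matches
            printf '%s\n' "${f##*/}"
        fi
    done
}
```

Note that this does not see a register changed with the 'register' command inside a running ish session.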
index - make an index for a local directory or tarfile
Usage
index PATH|TARFILE
Makes an index file of a directory (PATH) or tar file (TARFILE). The PATH, or the paths stored in the tar file, can be either absolute or relative (not both). The tar file can be in compressed format, provided the local tar installation supports it.
The index files are stored in the register directory (~/.ish_register by default). See 'register' for details.
For consistency, if the parent directory of any directory in PATH or the TARFILE is not in that PATH or in that TARFILE, it gets added to the index as a stub (i.e., not with its full content). For example, 'index /home/rzon' will include /home as a stub, which will seem to only contain the directory rzon. Similarly, if test.tar was created (not from within ish) with 'tar cf test.tar /home/rzon', then 'index test.tar' will add a stub for /home.
Ish automatically assigns a name to the index file, following this naming convention
- A directory index stored in the current directory will be called PATH.igz, with any slashes replaced by underscores.
- A directory index stored in the register will be called ABSPATH.igz, where ABSPATH is the absolute path to the directory PATH, again with slashes replaced by underscores.
- A tar index will be called TARFILE.igz, with slashes replaced by underscores.
- When running an index command again, or running an index command that would overwrite an existing index of the same name, the existing index file NAME.igz gets renamed to NAME_DATE_TIME.igz, where DATE and TIME are those of the existing index file. The exception to this rule is that if the new index file has identical content to the old one, the old one is removed.
Once the indexing is done, the index file becomes the current one in the ish shell.
The index file for a tar file can be created when the tar is made, using the ish command tar (see 'tar').
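The basic naming rule for tar and directory indices can be written as a short shell function. This follows a literal reading of the convention above; igz_name is a hypothetical name, and how ish treats leading or trailing slashes is an assumption here, not something the text specifies.

```shell
# Hypothetical helper: index file name for a path, with slashes replaced
# by underscores and the .igz extension appended.
igz_name() {
    printf '%s.igz\n' "$(printf '%s' "$1" | tr '/' '_')"
}

igz_name data.tgz          # prints data.tgz.igz
igz_name backup/run1.tar   # prints backup_run1.tar.igz
```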
hindex - make index for remote hpss folder/tarfile (hpss only)
Usage
hindex [PATH|TARFILE]
Makes an index file of a directory (PATH) or tar file (TARFILE) on hpss. The argument is treated as a tar file if it has the extension .tar, .tar.gz, .tgz, .tar.bz2, .tbz2, or .tb2, and as a directory otherwise.
The PATH, or the paths stored in the tar file, can be either absolute or relative. If the tar file is uncompressed, then it should ideally have been created with htar.
If neither a PATH nor a TARFILE is given, hindex will generate an index for your whole hpss directory, and store the index in hpss.igz.
For consistency, if the parent directory of any directory in PATH or the TARFILE is not in that PATH or in that TARFILE, it gets added to the index as a stub (i.e., not with its full content). For example, 'hindex /archive/scinet/rzon' will include /archive and /archive/scinet as stubs, each of which will seem to contain only one directory. Similarly, if test.tar was created (not from within ish) with 'htar cf test.tar /home/rzon', then 'hindex test.tar' will add a stub for /home.
Ish automatically assigns a name to the index file, following this naming convention
- The directory index will be called PATH.igz, with any slashes replaced by underscores.
- A tar index will be called TARFILE.igz, with slashes replaced with underscores.
- The index will be called hpss.igz when hindex is called without argument.
- When running an hindex command again, or running an hindex command that would result in an existing index getting overwritten, the existing index file NAME.igz gets renamed to NAME_DATE_TIME.igz, where DATE and TIME are of the existing index file. The exception to this rule is that if the new index file has identical content to the old one, the old one is removed.
Ish calls the hsi application under the hood to get the directory listings. This means that this command has to be run on a system that has hpss with hsi. This may mean you have to submit a job for building the index. With the index file, you can then locally traverse the listings in the index and see the modification times and file sizes.
If the tar file was created with htar (outside of ish), then the index is created from the remote index file generated by a call to htar. So this command may have to be run on a system that has hpss with htar.
If the hpss-resident tar file was created with tar (outside of ish), then no remote index file exists and ish will first request htar to generate one. Because this may require the tar to be reclaimed from tape, this is an expensive operation. So it is recommended to 1) use htar, or, 2) at the point of tarring, use ish's tar command, or, 3) if the tar is already created and still available locally, to use the ish command 'index'.
Once the indexing is done, the index file becomes the current one in the ish shell.
The index files are stored in the register directory (~/.ish_register by default). See 'register' for details.
Note that the htarring and the creation of a local index file can be done with one command, using the 'htar' ish command.
use - set the index file
Usage
use INDEX
Sets the current index file to INDEX. This file should reside in the register directory (see 'register') or the current directory. INDEX should include the index file extension '.igz'.
Note that the index file can also be set when starting ish with the name of the index file as the first argument (e.g. ish data.tgz.igz). Omitting the index file when starting ish will cause it to look for hpss.igz.
See 'index', 'hindex' and 'pindex' on how to create index files, and on their naming scheme.
The current index file is always displayed in the ish prompt.
info - show information on the current index file
Usage
info
Lists information such as the creation date, the ish version, and how the index was created.
colour - set colour usage
Usage
colour ARG
Switches the usage of colour in listings on if ARG=1 and off if ARG=0. By default, colour is on in interactive sessions, and off in single command mode.
The current colour setting can be found with the 'settings' command.
pwd - show current directory
Usage
pwd
Shows the current directory within the index (as set by 'cd').
This command should be rarely needed, as the current directory is always displayed as part of the ish prompt.
settings - list ish settings
Usage
settings
Lists settings such as colour and register location.
tar - local tar and make index
Usage
tar [OPTION] TARFILE [DIR][FILES] ...
Tars a directory or list of files as usual, AND creates the corresponding index file. It should be equivalent to running the same tar command from the shell (i.e. not within ish), followed by "ish index TARFILE".
Common tar options
    -c    create the file.
    -z    gzip the tarball.
    -j    bzip2 the tarball.
    -f    indicates that output should go to the tarfile. Has to be the last option.
Other options can be found from the tar man page. Multiple options have to be combined into one, e.g. -czf.
avail - list available index files
Usage
avail [-a]
Displays a list of the index files in the register directory (see 'register') or the current directory. These index files can be accessed with the 'use' command.
The list contains the creation date and time of the igz file. Note that this need not be the same as the creation date of the corresponding tar.
By default, older versions of the igz files (which are automatically saved by index and hindex) are not shown.
The optional argument can be
-a When given, older saved versions of the igz files are also listed.
See 'index', 'hindex' and 'pindex' on how to create index files, and on their naming scheme.
!COMMAND - run local commands (ls, cd, pwd only)
Usage
!COMMAND [ARGS]
Runs a local command. This is intended to allow the local directory to be changed and listed. Any arguments will be passed to the command.
Only the commands ls, cd and pwd will be accepted to run locally.
htar - run htar and make index (hpss only)
Usage
htar OPTIONS TARFILE [DIR][FILES] ...
Htars a directory or set of files as usual, AND creates the corresponding local index file. It should be equivalent to running the same htar command from the shell (i.e. not from within ish), followed by "ish hindex TARFILE".
Common htar options
    -c    create the file.
    -p    preserve time stamps.
    -f    always the last option; indicates that output should go to a file.
To give multiple options, they have to be combined into one, e.g. -cpf. Other options can be found from the htar man page.
check - exit if the previous command had an error
Usage
check [-n] [COMMENT]
Ish will print the COMMENT, then the error message of the last command, and will exit with its exit code. Very useful and recommended when doing multiple nontrivial actions in one ish session.
If the optional argument -n is given, ish does not exit.
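A non-interactive session that combines several commands with check might look like the sketch below. The ish commands are taken from the examples on this page; the helper name session_commands is invented, and it is an assumption that ish reads the comment after check as free text up to the end of the line and that piped input behaves like input redirected from a file. If ish is not installed at the path used in this document, the script only prints the commands.

```shell
# Path to ish as given in this document; set ISH to use another copy.
ISH=${ISH:-/scinet/gpc/bin/ish}

# Hypothetical helper: print the ish commands for one batch session.
# Each check stops the session if the command before it failed.
session_commands() {
    cat <<'EOF'
index data.tgz
check indexing data.tgz failed
cd data
check no data directory in the index
find important*.dat
EOF
}

if [ -x "$ISH" ]; then
    session_commands | "$ISH"     # end-of-file ends the session, so no exit is needed
else
    session_commands              # ish not found: show what would have been run
fi
```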
pindex - make a local index for a purge file
Usage
pindex [FILE]
Makes an index file from the monthly scratch purge file generated for you at SciNet. (Obviously, this is a very SciNet-specific command.) This list only contains files, so for consistency, stubs are added for directories.
If no FILE is given, ish searches for your most current purge list.
The name of the index file is composed of the file name of the purge file, without underscores, and prepended by a path with slashes replaced by underscores. The index file is stored in the register directory (~/.ish_register by default). See 'register' for details.
Once the indexing is done, the index file becomes the current one in the ish shell.