An ``interpreter script'' is a file which has been set executable (see chmod(2)) and which has a first line of the form:
#! pathname [argument]
The ``#!'' must appear as the first two characters of the file. A space between the ``#!'' and pathname is optional. At most one argument may follow pathname, and the length of the entire line is limited (see below).
If such a file is executed (such as via the execve(2) system call), the interpreter specified by the pathname is executed by the system. (The pathname is executed without regard to the PATH variable, so in general pathname should be an absolute path.)
The arguments passed to the interpreter will be as follows. argv[0] will be the path to the interpreter itself, as specified on the first line of the script. If there is an argument following pathname on the first line of the script, it will be passed as argv[1]. The subsequent elements of argv will be the path to the interpreter script file itself (i.e. the original argv[0]) followed by any further arguments passed when execve(2) was invoked to execute the script file.
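The argument layout described above can be observed with a small experiment. The following is only a sketch: it assumes that /bin/echo exists as an executable binary (as it does on most POSIX style systems) and uses it as a stand-in interpreter, so that everything from argv[1] onward is printed. The function name show_script_argv and the argument ``interp-arg'' are made up for this illustration.

```shell
#!/bin/sh
# Sketch: use /bin/echo as a stand-in "interpreter" to reveal argv[1..].
# Assumption: /bin/echo is an executable binary on this system.
show_script_argv() {
    dir=$(mktemp -d) || return 1
    script="$dir/script"
    # First line names the interpreter and a single argument.
    printf '%s\n' '#!/bin/echo interp-arg' '[...]' > "$script"
    chmod 755 "$script"
    # echo prints its arguments: the "#!" argument, the script path,
    # and then whatever was passed on the command line.
    "$script" "$@"
    status=$?
    rm -rf "$dir"
    return $status
}

show_script_argv one two three
```

The printed line should begin with ``interp-arg'', followed by the temporary path of the script and then the three command line arguments.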
By convention, it is expected that an interpreter will open the script file passed as an argument and process the commands within it. Typical interpreters treat `#' as a comment character, and thus will ignore the initial line of the script because it begins ``#!'', but there is no requirement for this per se.
On NetBSD, the length of the ``#!'' line, excluding the ``#!'' itself, is limited to PATH_MAX (as defined in <limits.h>). Other operating systems impose much smaller limits on the length of the ``#!'' line (see below).
Note that the interpreter may not itself be an interpreter script. If pathname does not point to an executable binary, execution of the interpreter script will fail.
Interpreters are often installed in different locations on different systems, and the kernel executes the interpreter without regard to PATH. This makes it somewhat challenging to set the ``#!'' line of a script so that it will run identically on different systems. Since the env(1) utility executes a command passed to it on its command line, it is often used as a ``trampoline'' to render scripts portable.
If the leading line of a script reads
#! /usr/bin/env interp
then the env(1) command will execute the ``interp'' command it finds in its PATH, passing on to it all subsequent arguments with which it itself was called. Since /usr/bin/env is found on almost all POSIX style systems, this trick is frequently exploited by authors who need a script to execute without change on multiple systems.
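As a minimal sketch of the trampoline, the following creates a throwaway script whose first line goes through env(1) and then runs it. It assumes that /usr/bin/env exists and that some ``sh'' can be found via PATH; the function name run_env_trampoline and the script name hello are hypothetical.

```shell
#!/bin/sh
# Sketch: run a script whose "#!" line uses the env(1) trampoline.
# Assumptions: /usr/bin/env exists and "sh" is reachable via PATH.
run_env_trampoline() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/hello" <<'EOF'
#!/usr/bin/env sh
echo "hello, args: $*"
EOF
    chmod 755 "$dir/hello"
    # The kernel executes /usr/bin/env, which looks up "sh" in PATH
    # and hands it the script path plus the remaining arguments.
    "$dir/hello" "$@"
    status=$?
    rm -rf "$dir"
    return $status
}

run_env_trampoline one two
```

Note that ``sh'' occupies the single argument slot on the ``#!'' line, so no further argument can be passed portably to the interpreter itself.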
Historically, on AT&T UNIX, there was only one interpreter used on the system, /bin/sh, and the shell treated any file that failed to execute with an ENOEXEC error (see intro(2)) as a shell script.
Most shells (such as sh(1)) and certain other facilities (including execlp(3) and execvp(3) but not other types of exec(3) calls) still pass interpreter scripts that do not include the ``#!'' (and thus fail to execute with ENOEXEC) to /bin/sh. As this behavior is implemented outside the kernel, there is no mechanism that forces it to be respected by all programs that execute other programs. It is thus not completely reliable. It is therefore important to always include
#!/bin/sh
in front of Bourne shell scripts, and to treat the traditional behavior as obsolete.
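One way to catch scripts that still depend on the traditional fallback is to check that a file begins with ``#!''. The helper below is a sketch for illustration; the name has_shebang is hypothetical, and it inspects only the first line of the file.

```shell
#!/bin/sh
# Sketch: succeed if the named file starts with "#!", fail otherwise.
has_shebang() {
    first_line=
    # read fails on an empty file or a missing final newline; that is
    # fine here, since first_line still holds whatever was read.
    IFS= read -r first_line < "$1" || :
    case $first_line in
        '#!'*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example use (path is illustrative):
#   has_shebang /tmp/script || echo "/tmp/script: missing #! line" >&2
```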
Suppose that an executable binary exists in /bin/interp and that the file /tmp/script contains:
#!/bin/interp -arg
[...]
and that /tmp/script is set mode 755. Executing
$ /tmp/script one two three
at the shell will result in /bin/interp being executed, receiving the following arguments in argv (numbered from 0):
"/bin/interp", "-arg", "/tmp/script", "one", "two", "three"
Consider the following variation on the previous example. Suppose that an executable binary exists in /bin/interp and that the file /tmp/script contains:
#!/bin/interp -x -y
[...]
and that /tmp/script is set mode 755. Executing
$ /tmp/script one two three
at the shell will result in /bin/interp being executed, receiving the following arguments in argv (numbered from 0):
"/bin/interp", "-x -y", "/tmp/script", "one", "two", "three"
Note that "-x -y" will be passed on NetBSD as a single argument.
Although most POSIX style operating systems will pass only one argument, the behavior when multiple arguments are included is not consistent between platforms. Some, such as current releases of NetBSD, will concatenate multiple arguments into a single argument (as above), some will truncate them, and at least one will pass them as multiple arguments.
The NetBSD behavior is common but not universal. Sun's Solaris would present the above argument as "-x", dropping the "-y" entirely. Perhaps uniquely, recent versions of Apple's OS X will actually pass multiple arguments properly, i.e.:
"/bin/interp", "-x", "-y", "/tmp/script", "one", "two", "three"
The behavior of the system in the face of multiple arguments is thus not currently standardized, should not be relied on, and may be changed in future releases. In general, pass at most one argument, and do not rely on multiple arguments being concatenated.
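A script can be checked for this portability hazard by counting the blank-separated words that follow the pathname on its first line. The following is a sketch only: the name shebang_arg_count is hypothetical, and splitting on blanks is an assumption about how a reader would count arguments, not a description of what any particular kernel does.

```shell
#!/bin/sh
# Sketch: print how many blank-separated arguments follow the pathname
# on a script's "#!" line. The body runs in a subshell so that the
# "set -f" and "set --" below do not leak into the caller.
shebang_arg_count() (
    line=
    IFS= read -r line < "$1" || :
    case $line in
        '#!'*) ;;
        *) exit 1 ;;            # not an interpreter script
    esac
    set -f                      # no pathname expansion while splitting
    set -- ${line#'#!'}         # $1 = pathname, the rest = arguments
    [ "$#" -ge 1 ] || exit 1    # "#!" with no pathname at all
    echo "$(( $# - 1 ))"
)

# Example use (path is illustrative):
#   [ "$(shebang_arg_count /tmp/script)" -le 1 ] || echo "not portable" >&2
```

A result greater than 1 indicates a ``#!'' line whose behavior will differ between systems.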
The behavior is partially (but not completely) described in the System V Interface Definition, Fourth Edition (``SVID4'').
Although it has never been formally standardized, the behavior described is largely portable across POSIX style systems, with two significant exceptions: the maximum length of the ``#!'' line, and the behavior if multiple arguments are passed. Please be aware that some operating systems limit the line to 32 or 64 characters, and that (as described above) the behavior in the face of multiple arguments is not consistent across systems.
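The length limits mentioned above can be checked mechanically. The sketch below measures the whole first line of a script, including the ``#!'', and compares it against a limit that defaults to 32 characters, the smallest figure quoted above. Whether a given system counts the ``#!'' or a trailing newline toward its limit is not covered here, so treat the result as a conservative hint. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report the length of a script's first line in characters.
shebang_length() {
    line=
    IFS= read -r line < "$1" || :
    echo "${#line}"
}

# Succeed if the first line is longer than the given limit
# (default 32, the smallest limit mentioned in the text above).
shebang_too_long() {
    [ "$(shebang_length "$1")" -gt "${2:-32}" ]
}

# Example use (path is illustrative):
#   shebang_too_long /tmp/script && echo "first line may be truncated" >&2
```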
AT&T UNIX
.
A Usenet posting to net.unix by Guy Harris on October 16, 1984 claims that the idea for the ``#!'' behavior was first proposed by Dennis Ritchie but that the first implementation was on BSD.
Historical manuals (specifically the exec man page) indicate that the behavior was present in 4BSD at least as early as April, 1981. Information on precisely when it was first implemented, and in which version of UNIX, is solicited.
In addition to the fact that many interpreters (and scripts) are simply not designed to be robust in a setuid context, a race condition exists between the moment that the kernel examines the interpreter script file and the moment that the newly invoked interpreter opens the file itself.
Because of these security issues, NetBSD does not allow setuid interpreter scripts by default. In order to turn on setuid interpreter scripts,
options SETUIDSCRIPTS
must be set in the configuration of the running kernel. Setting this option implies the FDSCRIPTS option, which causes the kernel to open the script file on behalf of the interpreter and pass it in argv as /dev/fd/[fdnum]. (See fd(4) for an explanation of the /dev/fd/[fdnum] devices.) This design avoids the race condition, at the cost of denying the interpreter the actual name of the script file. See options(4) for more information.
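The effect on the interpreter can be approximated from userland. The following sketch does not involve the kernel or FDSCRIPTS at all: it merely opens a script on descriptor 3 and hands the shell /dev/fd/3 in place of the file name, to show that the script then no longer sees its own path. It assumes that /dev/fd is available and that /bin/sh exists; the function name show_fd_script_name is hypothetical.

```shell
#!/bin/sh
# Userland simulation only: the kernel's FDSCRIPTS code is not used.
# Assumptions: /dev/fd is available and /bin/sh exists.
show_fd_script_name() {
    dir=$(mktemp -d) || return 1
    # A one-line script that prints the name it was invoked under.
    printf '%s\n' 'echo "$0"' > "$dir/script"
    # Open the script on descriptor 3 and pass /dev/fd/3 as its name.
    /bin/sh /dev/fd/3 3< "$dir/script"
    status=$?
    rm -rf "$dir"
    return $status
}

show_fd_script_name
```

The script reports /dev/fd/3 rather than its real location, which illustrates the cost noted above: the actual name of the script file is not available to the interpreter.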
However, the FDSCRIPTS mechanism is not a cure-all for security issues in setuid interpreters and scripts. Subtle techniques can be used to subvert even seemingly well written scripts. Scripts executed by Bourne type shells can be subverted in numerous ways, such as by setting the IFS variable before executing the script. Other interpreters possess their own vulnerabilities. Turning on SETUIDSCRIPTS is therefore very dangerous, and should not be done lightly if at all.