NAME
PolyglotMan, rman - reverse compile man pages from formatted
form to a number of source formats
SYNOPSIS
rman [ options ] [ file ]
DESCRIPTION
PolyglotMan takes man pages from most of the popular flavors
of UNIX and transforms them into any of a number of text source
formats. PolyglotMan was formerly known as RosettaMan. The
binary is still called rman, for scripts that depend
on that name; mnemonically, just think "reverse man". Previously
PolyglotMan required pages to be formatted by nroff prior
to its processing. With version 3.0, it prefers [tn]roff source
and usually produces even better results from it. Source
processing is also the only way to translate tables. Source format
translation is not as mature as formatted translation, however, so try formatted
translation as a backup.
In parsing [tn]roff source, one could implement an arbitrarily
large subset of [tn]roff, which I did not and will not do, so
the results can be off. I did implement a significant subset
of those used in man pages, however, including tbl (but not eqn),
if tests, and general macro definitions, so usually the results
look great. If they don't, format the page with nroff before
sending it to PolyglotMan. If PolyglotMan doesn't recognize a
key macro used by a large class of pages, however, e-mail me
the source and a uuencoded nroff-formatted page and I'll see
what I can do. When running PolyglotMan with man page source
that includes or redirects to other [tn]roff source using the .so (source
or inclusion) macro, you should be in the parent directory of
the page, since pages are written with this assumption. For example,
if you are translating /usr/man/man1/ls.1, first cd into /usr/man.
PolyglotMan accepts man pages from: SunOS, Sun Solaris,
Hewlett-Packard HP-UX, AT&T System V, OSF/1 aka Digital UNIX,
DEC Ultrix, SGI IRIX, Linux, FreeBSD, SCO. Source processing
works for: SunOS, Sun Solaris, Hewlett-Packard HP-UX, AT&T System
V, OSF/1 aka Digital UNIX, DEC Ultrix. It can produce printable
ASCII-only (control characters stripped), section headers-only,
Tk, TkMan, [tn]roff (traditional man page source), SGML, HTML,
MIME, LaTeX, LaTeX2e, RTF, Perl 5 POD. A modular architecture
makes it easy to add further output formats.
The latest version of PolyglotMan is always available from
ftp://ftp.cs.berkeley.edu/ucb/people/phelps/tcltk/rman.tar.Z .
OPTIONS
The following options should not be used with any others and
exit PolyglotMan without processing any input.
-h|--help
Show list of command line options and exit.
-v|--version
Show version number and exit.
You should specify the filter first, as this sets a number
of parameters, and then specify other options.
-f|--filter <ASCII|roff|TkMan|Tk|Sections|HTML|SGML|MIME|LaTeX|LaTeX2e|RTF|POD>
Set the output filter. Defaults to ASCII.
-S|--source
PolyglotMan tries to automatically determine whether its input
is source or formatted; use this option to declare source input.
-F|--format|--formatted
PolyglotMan tries to automatically determine whether its input
is source or formatted; use this option to declare formatted
input.
-l|--title printf-string
In HTML mode this sets the title of the man pages, given the
same parameters as -r.
-r|--reference|--manref printf-string
In HTML and SGML modes this sets the URL form by which to retrieve
other man pages. The string can use two supplied parameters:
the man page name and its section. (See the Examples section.)
If the string is null (as if set from a shell by "-r ''"), `-'
or `off', then man page references will not be HREFs, just set
in italics. If your printf supports XPG3 positional specifiers,
this can be quite flexible.
-V|--volumes <colon-separated list>
Set the list of valid volumes to check against when looking for
cross-references to other man pages. Defaults to 1:2:3:4:5:6:7:8:9:o:l:n:p (volume
names can be multicharacter). If a non-whitespace string in
the page is immediately followed by a left parenthesis, then
one of the valid volumes, then optional other characters
and a right parenthesis, that string is reported as
a reference to another manual page. If this -V string starts
with an equals sign, then no optional characters are allowed
between the match to the list of valid volumes and the right parenthesis. (This
option is needed for SCO UNIX.)
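The -V matching rule can be approximated with a regular expression. The following Python sketch is only an illustration of the rule as described above, not PolyglotMan's own code; the function names build_xref_pattern and find_xrefs are hypothetical, and excluding parentheses from the page name is an assumption made to keep the pattern simple.

```python
import re

DEFAULT_VOLUMES = "1:2:3:4:5:6:7:8:9:o:l:n:p"


def build_xref_pattern(volumes=DEFAULT_VOLUMES):
    """Compile a regex approximating the -V cross-reference rule."""
    # A leading '=' forbids extra characters between the volume and ')'.
    strict = volumes.startswith("=")
    if strict:
        volumes = volumes[1:]
    # Try longer volume names first so multicharacter names are not
    # shadowed by a shorter name that is their prefix.
    names = sorted((v for v in volumes.split(":") if v), key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in names)
    suffix = "" if strict else r"[^\s()]*"
    return re.compile(r"([^\s()]+)\((" + alternation + ")" + suffix + r"\)")


def find_xrefs(text, volumes=DEFAULT_VOLUMES):
    """Return (name, volume) pairs for every cross-reference found."""
    pattern = build_xref_pattern(volumes)
    return [(m.group(1), m.group(2)) for m in pattern.finditer(text)]
```

With the default list, find_xrefs("See ls(1), printf(3S) and foo(x).") reports ls in volume 1 and printf in volume 3, and ignores foo(x). With "=1:2:3" the reference printf(3S) is no longer accepted, because the trailing S is not allowed.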
The following options apply only when formatted pages are given
as input. With source input they either do not apply or are
always handled correctly.
-b|--subsections
Try to recognize subsection titles in addition to section titles.
This can cause problems on some UNIX flavors.
-K|--nobreak
Indicate manual pages don't have page breaks, so don't look for
footers and headers around them. (Older nroff -man macros always
put in page breaks, but lately some vendors have realized that
printouts are made through troff, whereas nroff -man is used to
format pages for reading on screen, and so have eliminated page
breaks.) PolyglotMan usually gets this right even without
this flag.
-k|--keep
Keep headers and footers, as a canonical report at the end of
the page.
changeleft
Move changebars, such as those found in the Tcl/Tk manual pages,
to the left.
notaggressive
Disable aggressive man page parsing. Aggressive man page parsing,
which is on by default, elides headers and footers,
identifies sections and more.
-n|--name name
Set name of man page (used in roff format). If the filename is
given in the form "name.section", the name and
section are automatically determined. If the page is being parsed
from [tn]roff source and it has a .TH line, this information
is extracted from that line.
-p|--paragraph
Toggle paragraph mode. The filter determines whether lines should
be linebroken as they were by nroff, or whether lines should
be flowed together into paragraphs. Mainly for internal use.
-s|--section #
Set volume (aka section) number of man page (used in roff format).
tables
Turn on aggressive table parsing.
-t|--tabstops #
For those macro sets that use tabs in place of spaces where
possible in order to reduce the number of characters used, set
tabstops every # columns. Defaults to 8.
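To make the effect of the tabstop setting concrete, here is a small Python sketch of tab expansion at a fixed column interval. It only illustrates the idea; it is not taken from PolyglotMan, and the function name expand_tabs is hypothetical.

```python
def expand_tabs(line, tabstop=8):
    """Replace each tab with spaces up to the next multiple of tabstop."""
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            # Pad to the next tab stop; a tab always advances at least one column.
            pad = tabstop - (col % tabstop)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

With the default of 8, the tab in "ab\tc" becomes six spaces, so "c" lands in column 8 (counting from 0). With a tabstop of 4 it becomes two spaces. A wrong tabstop value therefore shifts columns and can misalign indented text and tables.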
NOTES ON FILTER TYPES
ROFF
Some flavors of UNIX ship man pages without [tn]roff source, making
one's laser printer little more than a laser-powered daisy wheel.
This filter tries to intuit the original [tn]roff directives,
which can then be recompiled by [tn]roff.
TkMan
TkMan, a hypertext man page browser, uses PolyglotMan
to show man pages without the (usually) useless headers and footers
on each page. It also collects section and (optionally) subsection
heads for direct access from a pulldown menu. TkMan and Tcl/Tk,
the toolkit in which it's written, are available via anonymous
ftp from ftp://ftp.smli.com/pub/tcl/
Tk
This option outputs the text in a series of Tcl lists consisting
of text-tag pairs, where tag names roughly correspond to HTML.
This output can be inserted into a Tk text widget by doing an
eval <textwidget> insert end <text>. This format should be
relatively easy for other programs to parse when they want both the
text and the tags. Also see ASCII.
ASCII
When printed on a line printer, man pages try to produce special
text effects by overstriking characters with themselves (to produce
bold) and underscores (underlining). Other text processing software,
such as text editors, searchers, and indexers, must counteract
this. The ASCII filter strips away this formatting. Piping nroff
output through col -b also strips away this formatting,
but it leaves behind unsightly page headers and footers. Also
see Tk.
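The overstrike convention described above can be undone with a single substitution. This Python sketch is an illustration only, closer to what col -b does than to the full ASCII filter, since it leaves page headers and footers in place. The function name strip_overstrike is hypothetical.

```python
import re

# Any character followed by a backspace (0x08) is overwritten by
# whatever comes next, so dropping the pair keeps the visible result.
_OVERSTRIKE = re.compile(r".\x08")


def strip_overstrike(text):
    """Remove nroff bold (c BS c) and underline (_ BS c) sequences."""
    return _OVERSTRIKE.sub("", text)
```

For bold text such as "N\bNA\bAM\bME\bE" the result is "NAME", and for underlined text such as "_\bf_\bi_\bl_\be" the result is "file". Characters struck more than twice collapse to a single character as well.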
Sections
Dumps section and (optionally) subsection titles. This might
be useful for another program that processes man pages.
HTML
With a simple extension to an HTTP server for Mosaic or other
World Wide Web browsers, PolyglotMan can produce high quality
HTML on the fly. Several such extensions and pointers to several
others are included in PolyglotMan's contrib directory.
SGML
This is approaching the Docbook DTD, but I'm hoping that
someone with a real interest in this will polish the tags
generated. Try it to see how close the tags are now.
MIME
MIME (Multipurpose Internet Mail Extensions) text/enriched, as defined by RFC 1563,
good for consumption by MIME-aware e-mailers or as Emacs (>=19.29)
enriched documents.
LaTeX and LaTeX2e
Why not?
RTF
Use the output on Mac, NeXT or other systems. Random man
pages could perhaps be integrated better with NeXT's documentation
system, though NeXT may have its own man page macros that do this.
PostScript and FrameMaker
To produce PostScript, use groff or psroff. To
produce FrameMaker MIF, use FrameMaker's built-in filter. In both
cases you need [tn]roff source, so if you only have a
formatted version of the manual page, use PolyglotMan's
roff filter first.
EXAMPLES
To convert the formatted man page named ls.1 back
into [tn]roff source form:
rman -f roff /usr/local/man/cat1/ls.1 > /usr/local/man/man1/ls.1
Long man pages are often compressed to conserve space (compression
is especially effective on formatted man pages as many of the
characters are spaces). A long man page such as automount probably
has subsections, which we try to separate out with -b (some macro sets
don't distinguish subsections well enough for PolyglotMan
to detect them). Let's convert this page to LaTeX format:
pcat /usr/catman/a_man/cat1/automount.z | rman -b -n automount -s 1 -f latex > automount.man
Alternatively:
man 1 automount | rman -b -n automount -s 1 -f latex > automount.man
For HTML/Mosaic users, PolyglotMan can, without modification
of the source code, produce HTML links that point to other HTML
man pages either pregenerated or generated on the fly. First
let's assume pregenerated HTML versions of man pages stored in /usr/man/html.
Generate these one-by-one with the following form:
rman -f html -r 'http:/usr/man/html/%s.%s.html' /usr/man/cat1/ls.1 > /usr/man/html/ls.1.html
If you've extended your HTML client to generate HTML on the fly
you should use something like:
rman -f html -r 'http:~/bin/man2html?%s:%s' /usr/man/cat1/ls.1
when generating HTML.
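To see what the -r string turns into for a given cross-reference, the substitution can be mimicked in Python. This is a sketch under the assumption that the two parameters are filled in the order name, then section, as the examples above suggest. The function name expand_reference is hypothetical and is not part of PolyglotMan.

```python
def expand_reference(template, name, section):
    """Fill a -r style printf string with a page name and section.

    Returns None when the template disables linking ('', '-' or 'off'),
    in which case references are only set in italics.
    """
    if template in ("", "-", "off"):
        return None
    # Python's % operator handles plain %s conversions; XPG3 positional
    # specifiers such as %2$s are not supported by it.
    return template % (name, section)
```

For a reference to ls(1), the template 'http:/usr/man/html/%s.%s.html' expands to http:/usr/man/html/ls.1.html, and 'http:~/bin/man2html?%s:%s' expands to http:~/bin/man2html?ls:1.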
BUGS/INCOMPATIBILITIES
PolyglotMan is not perfect in all cases, but it usually
does a good job, and in any case reduces the problem of converting
man pages to light editing.
Tables in formatted pages, especially H-P's, aren't handled very
well. Be sure to pass in source for the page to recognize tables.
The man pager woman applies its own idea of formatting
for man pages, which can confuse PolyglotMan. Bypass
woman by passing the formatted manual page text directly
into PolyglotMan.
The [tn]roff output format uses \fB to turn on boldface. If your
macro set requires .B, you'll have to postprocess the PolyglotMan
output.
SEE ALSO
tkman(1), xman(1), man(1), man(7)
or man(5) depending on your flavor of UNIX
AUTHOR
PolyglotMan
by Thomas A. Phelps (phelps@ACM.org)
developed at the
University of California, Berkeley
Computer Science Division
Manual page last updated on $Date: 2004/03/05 14:26:40 $