-* Portability:: Making @code{tar} Archives More Portable
-* Compression:: Using Less Space through Compression
-* Attributes:: Handling File Attributes
-* Standard:: The Standard Format
-* Extensions:: GNU Extensions to the Archive Format
-* cpio:: Comparison of @code{tar} and @code{cpio}
-@end menu
-
-@node Portability, Compression, Formats, Formats
-@section Making @code{tar} Archives More Portable
-
-Creating a @code{tar} archive on a particular system that is meant to be
-useful later on many other machines and with other versions of @code{tar}
-is more challenging than you might think. @code{tar} archive formats
-have been evolving since the first versions of Unix. Many such formats
-are around, and are not always compatible with each other. This section
-discusses a few problems, and gives some advice about making @code{tar}
-archives more portable.
-
-One golden rule is simplicity. For example, limit your @code{tar}
-archives to contain only regular files and directories, avoiding
-other kinds of special files. Do not attempt to save sparse files or
-contiguous files as such. Let's discuss a few more problems, in turn.
-
-@menu
-* Portable Names:: Portable Names
-* dereference:: Symbolic Links
-* old:: Old V7 Archives
-* posix:: POSIX archives
-* Checksumming:: Checksumming Problems
-@end menu
-
-@node Portable Names, dereference, Portability, Portability
-@subsection Portable Names
-
-Use @emph{straight} file and directory names, made up of printable
-ASCII characters, avoiding colons, slashes, backslashes, spaces, and
-other @emph{dangerous} characters. Avoid deep directory nesting.
-To accommodate older System V machines, limit your file and directory
-names to 14 characters or fewer.
-
-If you intend your @code{tar} archives to be read under MSDOS,
-you should not rely on case distinction for file names, and you might
-use the GNU @code{doschk} program to help you diagnose
-illegal MSDOS names, which are even more limited than System V's.
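-
-As a rough illustration of the advice above, here is a minimal Python
-sketch that flags member names likely to cause trouble. It is not part
-of @code{tar}. The function name @code{portability_problems}, the exact
-set of rejected characters, and the nesting threshold of 8 levels are
-assumptions made for the example; only the 14-character limit comes
-from the text.
-
```python
import string

# Assumption: the "dangerous" characters are the ones the text names
# (colon, backslash, space) plus anything outside printable ASCII.
# Slashes only appear as separators, so they never reach the check.
FORBIDDEN = set(':\\ ')
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation) - FORBIDDEN
MAX_COMPONENT = 14   # old System V limit quoted in the text
MAX_DEPTH = 8        # hypothetical threshold for "deep" nesting


def portability_problems(member_name):
    """Return a list of human-readable problems for one member name."""
    problems = []
    components = [c for c in member_name.split('/') if c]
    if len(components) > MAX_DEPTH:
        problems.append('nested deeper than %d levels' % MAX_DEPTH)
    for comp in components:
        if len(comp) > MAX_COMPONENT:
            problems.append('%r is longer than %d characters'
                            % (comp, MAX_COMPONENT))
        bad = sorted(set(comp) - ALLOWED)
        if bad:
            problems.append('%r contains %r' % (comp, ''.join(bad)))
    return problems
```
-
-An empty list means the name passed every check; each string in a
-non-empty list describes one reason the name might not survive on an
-older system.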
-
-@node dereference, old, Portable Names, Portability
-@subsection Symbolic Links
-@cindex File names, using symbolic links
-@cindex Symbolic link as file name
-
-Normally, when @code{tar} archives a symbolic link, it writes a
-block to the archive naming the target of the link. In that way, the
-@code{tar} archive is a faithful record of the filesystem contents.
-@value{op-dereference} is used with @value{op-create}, and causes @code{tar}
-to archive the files symbolic links point to, instead of the links
-themselves. With this option, whenever @code{tar} encounters a
-symbolic link, it archives the linked-to file, instead of simply
-recording the presence of a symbolic link.
-
-The name under which the file is stored in the file system is not
-recorded in the archive. To record both the symbolic link name and
-the file name in the system, archive the file under both names. If
-all links were recorded automatically by @code{tar}, an extracted file
-might be linked to a file name that no longer exists in the file
-system.
-
-If a linked-to file is encountered again by @code{tar} while creating
-the same archive, an entire second copy of it will be stored. (This
-@emph{might} be considered a bug.)
-
-So, for portable archives, do not archive symbolic links as such,
-and use @value{op-dereference}: many systems do not support
-symbolic links, and moreover, your distribution might be unusable if
-it contains unresolved symbolic links.
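-
-The effect of following links can be tried out with Python's standard
-@code{tarfile} module, whose @code{dereference} argument plays a role
-similar to @value{op-dereference}. This is a sketch using a different
-tool, not GNU @code{tar} itself, and it assumes a system that supports
-symbolic links. The helper names and the @file{pkg} member prefix are
-hypothetical.
-
```python
import os
import tarfile
import tempfile


def archive_following_links(src_dir, archive_path):
    """Archive src_dir, storing the linked-to files instead of the links."""
    with tarfile.open(archive_path, 'w', format=tarfile.USTAR_FORMAT,
                      dereference=True) as tf:
        tf.add(src_dir, arcname='pkg')


def member_types(archive_path):
    """Map each member name to 'file', 'dir', 'symlink' or 'other'."""
    kinds = {}
    with tarfile.open(archive_path) as tf:
        for member in tf.getmembers():
            if member.isreg():
                kinds[member.name] = 'file'
            elif member.isdir():
                kinds[member.name] = 'dir'
            elif member.issym():
                kinds[member.name] = 'symlink'
            else:
                kinds[member.name] = 'other'
    return kinds


def demo():
    """Build a tiny tree holding one symbolic link and archive it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'src')
        os.mkdir(src)
        with open(os.path.join(src, 'data.txt'), 'w') as f:
            f.write('hello\n')
        # alias.txt is a symbolic link pointing at data.txt
        os.symlink('data.txt', os.path.join(src, 'alias.txt'))
        out = os.path.join(tmp, 'out.tar')
        archive_following_links(src, out)
        return member_types(out)
```
-
-In this sketch both @file{pkg/data.txt} and @file{pkg/alias.txt} come
-out as regular files, so the contents are stored twice, which matches
-the second-copy behavior described above.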
-
-@node old, posix, dereference, Portability
-@subsection Old V7 Archives
-@cindex Format, old style
-@cindex Old style format
-@cindex Old style archives
-
-Certain old versions of @code{tar} cannot handle additional
-information recorded by newer @code{tar} programs. To create an
-archive in V7 format (not ANSI), which can be read by these old
-versions, specify the @value{op-old-archive} option in
-conjunction with @value{op-create}. @code{tar} also
-accepts @samp{--portability} for this option. When you specify it,
-@code{tar} leaves out information about directories, pipes, fifos,
-contiguous files, and device files, and specifies file ownership by
-group and user IDs instead of group and user names.
-
-When updating an archive, do not use @value{op-old-archive}
-unless the archive was created using this option.
-
-In most cases, a @emph{new} format archive can be read by an @emph{old}
-@code{tar} program without serious trouble, so this option should
-seldom be needed. On the other hand, most modern @code{tar}s are
-able to read old format archives, so it might be safer for you to
-always use @value{op-old-archive} for your distributions.
-
-@node posix, Checksumming, old, Portability
-@subsection GNU @code{tar} and POSIX @code{tar}
-
-GNU @code{tar} was based on an early draft of the POSIX 1003.1
-@code{ustar} standard. GNU extensions to @code{tar}, such as the
-support for file names longer than 100 characters, use portions of the
-@code{tar} header record which were specified in that POSIX draft as
-unused. Subsequent changes in POSIX have allocated the same parts of
-the header record for other purposes. As a result, GNU @code{tar} is
-incompatible with the current POSIX spec, and with @code{tar} programs
-that follow it.
-
-We plan to reimplement these GNU extensions in a new way which is
-upward compatible with the latest POSIX @code{tar} format, but we
-don't know when this will be done.
-
-In the meantime, there is simply no telling what might happen if you
-read a GNU @code{tar} archive, which uses the GNU extensions, using
-some other @code{tar} program. So if you want to read the archive
-with another @code{tar} program, be sure to write it using the
-@samp{--old-archive} option (@samp{-o}).
-
-@FIXME{is there a way to tell which flavor of tar was used to write a
-particular archive before you try to read it?}
-
-Traditionally, old @code{tar}s limit file names to 100 characters. GNU
-@code{tar} attempted two different approaches to overcome this limit,
-using and extending a format specified by a draft of P1003.1.
-The first way was not that successful, and involved @file{@@MaNgLeD@@}
-file names, or such; while a second approach used @file{././@@LongLink}
-and other tricks, yielding better success. In theory, GNU @code{tar}
-should be able to handle file names of practically unlimited length.
-So, if GNU @code{tar} fails to dump and retrieve files having names
-longer than 100 characters, then there is indeed a bug in GNU @code{tar}.
-
-But, staying strictly within POSIX, the limit was still 100 characters.
-For various other purposes, GNU @code{tar} used areas left unassigned
-in the POSIX draft. POSIX later revised the P1003.1 @code{ustar} format by
-assigning previously unused header fields, in such a way that the upper
-limit for file name length was raised to 256 characters. However, the
-actual POSIX limit oscillates between 100 and 256, depending on the
-precise location of slashes in the full file name (this is rather ugly).
-Since GNU @code{tar} uses the same fields for quite other purposes,
-it became incompatible with the latest POSIX standards.
-
-For longer or non-fitting file names, we plan to use yet another set
-of GNU extensions, but this time, complying with the provisions POSIX
-offers for extending the format, rather than conflicting with it.
-Whenever an archive uses old GNU @code{tar} extension format or POSIX
-extensions, whether for very long file names or other specialities,
-this archive becomes non-portable to other @code{tar} implementations.
-In fact, anything can happen. The most forgiving @code{tar}s will
-merely unpack the file using a wrong name, and maybe create another
-file named something like @file{@@LongName}, with the true file name
-in it. @code{tar}s not protecting themselves may crash with a
-segmentation violation!
-
-Compatibility concerns make all of this more difficult, as we
-will have to support @emph{all} these things together, for a while.
-GNU @code{tar} should be able to produce and read true POSIX format
-files, while being able to detect old GNU @code{tar} formats, besides
-old V7 format, and process them conveniently. It would take years
-before this whole area stabilizes@dots{}
-
-There are plans to raise this 100 limit to 256, and yet produce POSIX
-conformant archives. Past 256, I do not know yet if GNU @code{tar}
-will go non-POSIX again, or merely refuse to archive the file.
-
-There are plans for GNU @code{tar} to support the latest POSIX format
-more fully, while being able to read old V7 format, GNU (semi-POSIX plus
-extension), as well as full POSIX. One may ask if there is part of
-the POSIX format that we still cannot support. This simple question
-has a complex answer. Perhaps, on closer inspection, some strong
-limitations will pop up, but until now, nothing sounds too difficult
-(but see below). I only have these few pages of POSIX telling about
-`Extended tar Format' (P1003.1-1990 -- section 10.1.1), and there are
-references to other parts of the standard I do not have, which should
-normally enforce limitations on stored file names (I suspect things
-like fixing what @kbd{/} and @kbd{@key{NUL}} mean). There are also
-some points which the standard does not make clear. Existing practice
-will then drive what I should do.
-
-POSIX mandates that, when a file name cannot fit within 100 to
-256 characters (the variance comes from the fact that a @kbd{/} is
-ideally needed as the 156th character), or a link name cannot
-fit within 100 characters, a warning should be issued and the file
-@emph{not} be stored. Unless some @value{op-posix} option is given
-(or @code{POSIXLY_CORRECT} is set), I suspect that GNU @code{tar}
-should disobey this specification, and automatically switch to using
-GNU extensions to overcome file name or link name length limitations.
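-
-To make the 100-to-256 variance concrete, here is a minimal Python
-sketch of how a name might be split for a POSIX header. It assumes the
-usual @code{ustar} layout of a 100-byte name field and a 155-byte prefix
-field joined by an implied @kbd{/}, and it treats one character as one
-byte, which only holds for ASCII names. The function name
-@code{split_ustar_name} is hypothetical.
-
```python
NAME_MAX = 100     # size of the ustar "name" field
PREFIX_MAX = 155   # size of the ustar "prefix" field


def split_ustar_name(path):
    """Return (prefix, name) if path fits in a ustar header, else None."""
    if len(path) <= NAME_MAX:
        return '', path
    # The slash separating prefix from name is implied rather than stored,
    # so look for a slash with at most 155 characters before it and
    # between 1 and 100 characters after it.
    for i, ch in enumerate(path):
        if ch != '/':
            continue
        prefix, name = path[:i], path[i + 1:]
        if 0 < len(prefix) <= PREFIX_MAX and 0 < len(name) <= NAME_MAX:
            return prefix, name
    return None
```
-
-A 256-character name fits only when a slash falls exactly at the 156th
-character, while a 131-character name whose only slash is the 11th
-character does not fit at all. In the latter case a strictly POSIX
-@code{tar} would have to warn and skip the file.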
-
-There is a problem, however, which I have not yet studied closely.
-Given a truly POSIX archive with names having more than 100 characters,
-I guess that GNU @code{tar} up to 1.11.8 will process it as if it were an
-old V7 archive, and be fooled by some fields which are coded differently.
-So, the question is to decide if the next generation of GNU @code{tar}
-should produce POSIX format by default, whenever possible, producing
-archives older versions of GNU @code{tar} might not be able to read
-correctly. I fear that we will have to suffer such a choice one of these
-days, if we want GNU @code{tar} to go closer to POSIX. We can rush it.
-Another possibility is to produce the current GNU @code{tar} format
-by default for a few years, but have GNU @code{tar} versions from some
-1.@var{POSIX} and up able to recognize all three formats, and let older
-GNU @code{tar} fade out slowly. Then, we could switch to producing POSIX
-format by default, with not much harm to those still having (very old at
-that time) GNU @code{tar} versions prior to 1.@var{POSIX}.
-
-POSIX format cannot represent very long names, volume headers,
-splitting of files in multi-volumes, sparse files, and incremental
-dumps; these would all be disallowed if @value{op-posix} is given or
-@code{POSIXLY_CORRECT} is set. Otherwise, if @code{tar} is given long
-names, or @samp{-[VMSgG]}, then it should automatically go non-POSIX.
-I think this is easily granted without much discussion.
-
-Another point is that only @code{mtime} is stored in POSIX
-archives, while GNU @code{tar} currently also stores @code{atime}
-and @code{ctime}. If we want GNU @code{tar} to go closer to POSIX,
-my choice would be to drop @code{atime} and @code{ctime} support on
-average. On the other hand, I perceive that full dumps or incremental
-dumps need @code{atime} and @code{ctime} support, so for those special
-applications, POSIX has to be avoided altogether.
-
-A few users requested that @value{op-sparse} be always active by
-default. I think that before replying to them, we have to decide
-if we want GNU @code{tar} to go closer to POSIX on average, while
-producing files. My choice would be to go closer to POSIX in the
-long run. Besides possible double reading, I do not see any point
-of not trying to save files as sparse when creating archives which
-are neither POSIX nor old-V7, so the actual @value{op-sparse} would
-become selected by default when producing such archives, whatever
-the reason is. So, @value{op-sparse} alone might be redefined to force
-GNU-format archives, and recover its previous meaning from this fact.
-
-GNU-format as it exists now can easily fool other POSIX @code{tar}s,
-as it uses fields which POSIX considers to be part of the file name
-prefix. I wonder if it would not be a good idea, in the long run,
-to try changing GNU-format so any added field (like @code{ctime},
-@code{atime}, file offset in subsequent volumes, or sparse file
-descriptions) be wholly and always pushed into an extension block,
-instead of using space in the POSIX header block. I could manage
-to do that portably between future GNU @code{tar}s. So other POSIX
-@code{tar}s might be at least able to provide more or less correct listings
-for the archives produced by GNU @code{tar}, if not able to process
-them otherwise.
-
-Using these projected extensions might induce older @code{tar}s to fail.
-We would use the same approach as for POSIX. I'll put out a @code{tar}
-capable of reading POSIXier, yet extended archives, but will not produce
-this format by default, in GNU mode. In a few years, when newer GNU
-@code{tar}s will have flooded out @code{tar} 1.11.X and previous, we
-could switch to producing POSIXier extended archives, with no real harm
-to users, as almost all existing GNU @code{tar}s will be ready to read
-POSIXier format. In fact, I'll do both changes at the same time, in a
-few years, and just prepare @code{tar} for both changes, without effecting
-them, from 1.@var{POSIX}. (Both changes: 1---using POSIX convention for
-getting over 100 characters; 2---avoiding mangling POSIX headers for GNU
-extensions, using only POSIX mandated extension techniques).
-
-So, a future @code{tar} will have a @value{op-posix}
-flag forcing the usage of truly POSIX headers, and so, producing
-archives previous GNU @code{tar} will not be able to read.
-So, @emph{once} a pretest release announces that feature, it would be
-particularly useful for users to test how exchangeable archives are
-between GNU @code{tar} with @value{op-posix} and other POSIX @code{tar}s.
-
-In a few years, when GNU @code{tar} produces POSIX headers by
-default, @value{op-posix} will have a strong meaning and will disallow
-GNU extensions. But in the meantime, for a long while, @value{op-posix}
-in GNU @code{tar} will not disallow GNU extensions like @value{op-label},
-@value{op-multi-volume}, @value{op-sparse}, or very long file or link names.
-However, @value{op-posix} with GNU extensions will use POSIX
-headers with reserved-for-users extensions to headers, and I will be
-curious to know how well or badly POSIX @code{tar}s will react to these.
-
-GNU @code{tar} prior to 1.@var{POSIX}, and after 1.@var{POSIX} without
-@value{op-posix}, generates and checks @samp{ustar@w{ }@w{ }}, with two
-suffixed spaces. This is sufficient for older GNU @code{tar} not to
-recognize POSIX archives, and consequently, wrongly decide those archives
-are in old V7 format. It is a useful bug for me, because GNU @code{tar}
-has other POSIX incompatibilities, and I need to segregate GNU @code{tar}
-semi-POSIX archives from truly POSIX archives, for GNU @code{tar} should
-be somewhat compatible with itself, while migrating closer to latest
-POSIX standards. So, I'll be very careful about how and when I will do
-the correction.
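-
-The two-space magic described above suggests one way to guess which
-flavor wrote an archive before reading it. The following Python sketch
-assumes the magic sits at byte offset 257 of a 512-byte header block,
-as in the @code{ustar} layout, and that the archive is not compressed.
-The function name @code{tar_flavor} and the labels it returns are
-hypothetical.
-
```python
def tar_flavor(header):
    """Classify one 512-byte tar header block by its magic field."""
    if len(header) < 512:
        raise ValueError('a tar header block is 512 bytes long')
    magic = header[257:265]      # magic (6 bytes) followed by version (2 bytes)
    if magic == b'ustar  \x00':  # "ustar", two spaces, then a NUL
        return 'old GNU'
    if magic[:6] == b'ustar\x00':
        return 'POSIX ustar'
    return 'V7 or unknown'
```
-
-Passing the first 512 bytes of an archive to this function gives only a
-hint: a header without either magic may be old V7 format, or may not be
-a @code{tar} archive at all.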
-
-@node Checksumming, , posix, Portability
-@subsection Checksumming Problems
-
-SunOS and HP-UX @code{tar} fail to accept archives created using GNU
-@code{tar} and containing non-ASCII file names, that is, file names
-having characters with the eighth bit set, because they use signed
-checksums, while GNU @code{tar} uses unsigned checksums while creating
-archives, as per POSIX standards. On reading, GNU @code{tar} computes
-both checksums and accepts either. It is somewhat worrying that a lot of
-people may go around backing up their files using faulty (or at
-least non-standard) software, not learning about it until it's time
-to restore their missing files with an incompatible file extractor,
-or vice versa.
-
-GNU @code{tar} computes checksums both ways, and accepts either on read,
-so GNU @code{tar} can read Sun tapes even with their wrong checksums.
-GNU @code{tar} produces the standard checksum, however, raising
-incompatibilities with Sun. That is to say, GNU @code{tar} has not
-been modified to @emph{produce} incorrect archives to be read by buggy
-@code{tar}s. I've been told that more recent Sun @code{tar} now
-reads standard archives, so maybe Sun did a similar patch, after all?
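-
-The difference between the two conventions can be sketched in a few
-lines of Python. This is an illustration, not GNU @code{tar}'s actual
-code. It assumes the common header layout in which the 8-byte checksum
-field sits at offset 148 and is counted as spaces while summing. The
-function names are hypothetical.
-
```python
def header_checksums(header):
    """Return (unsigned_sum, signed_sum) for a 512-byte tar header."""
    if len(header) != 512:
        raise ValueError('a tar header block is 512 bytes long')
    # The checksum field itself is summed as if it held eight spaces.
    data = header[:148] + b' ' * 8 + header[156:]
    unsigned = sum(data)
    # Signed convention: bytes with the eighth bit set count as negative.
    signed = sum(b - 256 if b > 127 else b for b in data)
    return unsigned, signed


def stored_checksum(header):
    """Parse the octal checksum recorded in the header."""
    field = header[148:156].strip(b' \x00')
    return int(field or b'0', 8)


def checksum_ok(header):
    """Accept either convention, as the text says GNU tar does on reading."""
    return stored_checksum(header) in header_checksums(header)
```
-
-For a header made only of ASCII bytes the two sums are equal, which is
-why the incompatibility shows up only with file names having the eighth
-bit set. Each such byte makes the two sums differ by 256.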
-
-The story seems to be that when Sun first imported @code{tar}
-sources on their system, they recompiled it without realizing that
-the checksums were computed differently, because of a change in
-the default signing of @code{char}s in their compiler. So they
-started computing checksums wrongly. When they later realized their
-mistake, they merely decided to stay compatible with it, and with
-themselves afterwards. Presumably, but I do not really know, HP-UX
-chose to make their @code{tar} archives compatible with Sun's.
-The current standards do not favor Sun @code{tar} format. In any
-case, it now falls on the shoulders of SunOS and HP-UX users to get
-a @code{tar} able to read the good archives they receive.