From 6488588c5fc97c8d4d6a86be76016ca980fc871f Mon Sep 17 00:00:00 2001 From: Sergey Poznyakoff Date: Mon, 26 Jun 2006 08:08:47 +0000 Subject: [PATCH] Update --- doc/tar.texi | 1908 +++++++++++++++++++++++++------------------------- 1 file changed, 949 insertions(+), 959 deletions(-) diff --git a/doc/tar.texi b/doc/tar.texi index 1d1131d..ccfe881 100644 --- a/doc/tar.texi +++ b/doc/tar.texi @@ -3055,6 +3055,13 @@ names. @xref{listing member and file names}. Invokes a @acronym{GNU} extension when adding files to an archive that handles sparse files efficiently. @xref{sparse}. +@opsummary{sparse-version} +@item --sparse-version=@var{version} + +Specified the @dfn{format version} to use when archiving sparse +files. Implies @option{--sparse}. @xref{sparse}. For the description +of the supported sparse formats, @xref{Sparse Formats}. + @opsummary{starting-file} @item --starting-file=@var{name} @itemx -K @var{name} @@ -7726,1180 +7733,1163 @@ to create archives in @samp{gnu} format, however, future version will switch to @samp{posix}. @menu -* Portability:: Making @command{tar} Archives More Portable * Compression:: Using Less Space through Compression * Attributes:: Handling File Attributes +* Portability:: Making @command{tar} Archives More Portable * cpio:: Comparison of @command{tar} and @command{cpio} @end menu -@node Portability -@section Making @command{tar} Archives More Portable - -Creating a @command{tar} archive on a particular system that is meant to be -useful later on many other machines and with other versions of @command{tar} -is more challenging than you might think. @command{tar} archive formats -have been evolving since the first versions of Unix. Many such formats -are around, and are not always compatible with each other. This section -discusses a few problems, and gives some advice about making @command{tar} -archives more portable. - -One golden rule is simplicity. For example, limit your @command{tar} -archives to contain only regular files and directories, avoiding -other kind of special files. Do not attempt to save sparse files or -contiguous files as such. Let's discuss a few more problems, in turn. - -@FIXME{Discuss GNU extensions (incremental backups, multi-volume -archives and archive labels) in GNU and PAX formats.} +@node Compression +@section Using Less Space through Compression @menu -* Portable Names:: Portable Names -* dereference:: Symbolic Links -* old:: Old V7 Archives -* ustar:: Ustar Archives -* gnu:: GNU and old GNU format archives. -* posix:: @acronym{POSIX} archives -* Checksumming:: Checksumming Problems -* Large or Negative Values:: Large files, negative time stamps, etc. -* Other Tars:: How to Extract GNU-Specific Data Using - Other @command{tar} Implementations +* gzip:: Creating and Reading Compressed Archives +* sparse:: Archiving Sparse Files @end menu -@node Portable Names -@subsection Portable Names +@node gzip +@subsection Creating and Reading Compressed Archives +@cindex Compressed archives +@cindex Storing archives in compressed format -Use portable file and member names. A name is portable if it contains -only ASCII letters and digits, @samp{/}, @samp{.}, @samp{_}, and -@samp{-}; it cannot be empty, start with @samp{-} or @samp{//}, or -contain @samp{/-}. Avoid deep directory nesting. For portability to -old Unix hosts, limit your file name components to 14 characters or -less. +@GNUTAR{} is able to create and read compressed archives. It supports +@command{gzip} and @command{bzip2} compression programs. For backward +compatibilty, it also supports @command{compress} command, although +we strongly recommend against using it, since there is a patent +covering the algorithm it uses and you could be sued for patent +infringement merely by running @command{compress}! Besides, it is less +effective than @command{gzip} and @command{bzip2}. -If you intend to have your @command{tar} archives to be read under -MSDOS, you should not rely on case distinction for file names, and you -might use the @acronym{GNU} @command{doschk} program for helping you -further diagnosing illegal MSDOS names, which are even more limited -than System V's. +Creating a compressed archive is simple: you just specify a +@dfn{compression option} along with the usual archive creation +commands. The compression option is @option{-z} (@option{--gzip}) to +create a @command{gzip} compressed archive, @option{-j} +(@option{--bzip2}) to create a @command{bzip2} compressed archive, and +@option{-Z} (@option{--compress}) to use @command{compress} program. +For example: -@node dereference -@subsection Symbolic Links -@cindex File names, using symbolic links -@cindex Symbolic link as file name +@smallexample +$ @kbd{tar cfz archive.tar.gz .} +@end smallexample -@opindex dereference -Normally, when @command{tar} archives a symbolic link, it writes a -block to the archive naming the target of the link. In that way, the -@command{tar} archive is a faithful record of the file system contents. -@option{--dereference} (@option{-h}) is used with @option{--create} (@option{-c}), and causes -@command{tar} to archive the files symbolic links point to, instead of -the links themselves. When this option is used, when @command{tar} -encounters a symbolic link, it will archive the linked-to file, -instead of simply recording the presence of a symbolic link. +Reading compressed archive is even simpler: you don't need to specify +any additional options as @GNUTAR{} recognizes its format +automatically. Thus, the following commands will list and extract the +archive created in previous example: -The name under which the file is stored in the file system is not -recorded in the archive. To record both the symbolic link name and -the file name in the system, archive the file under both names. If -all links were recorded automatically by @command{tar}, an extracted file -might be linked to a file name that no longer exists in the file -system. +@smallexample +# List the compressed archive +$ @kbd{tar tf archive.tar.gz} +# Extract the compressed archive +$ @kbd{tar xf archive.tar.gz} +@end smallexample -If a linked-to file is encountered again by @command{tar} while creating -the same archive, an entire second copy of it will be stored. (This -@emph{might} be considered a bug.) +The only case when you have to specify a decompression option while +reading the archive is when reading from a pipe or from a tape drive +that does not support random access. However, in this case @GNUTAR{} +will indicate which option you should use. For example: -So, for portable archives, do not archive symbolic links as such, -and use @option{--dereference} (@option{-h}): many systems do not support -symbolic links, and moreover, your distribution might be unusable if -it contains unresolved symbolic links. +@smallexample +$ @kbd{cat archive.tar.gz | tar tf -} +tar: Archive is compressed. Use -z option +tar: Error is not recoverable: exiting now +@end smallexample -@node old -@subsection Old V7 Archives -@cindex Format, old style -@cindex Old style format -@cindex Old style archives -@cindex v7 archive format +If you see such diagnostics, just add the suggested option to the +invocation of @GNUTAR{}: -Certain old versions of @command{tar} cannot handle additional -information recorded by newer @command{tar} programs. To create an -archive in V7 format (not ANSI), which can be read by these old -versions, specify the @option{--format=v7} option in -conjunction with the @option{--create} (@option{-c}) (@command{tar} also -accepts @option{--portability} or @samp{op-old-archive} for this -option). When you specify it, -@command{tar} leaves out information about directories, pipes, fifos, -contiguous files, and device files, and specifies file ownership by -group and user IDs instead of group and user names. +@smallexample +$ @kbd{cat archive.tar.gz | tar tfz -} +@end smallexample -When updating an archive, do not use @option{--format=v7} -unless the archive was created using this option. +Notice also, that there are several restrictions on operations on +compressed archives. First of all, compressed archives cannot be +modified, i.e., you cannot update (@option{--update} (@option{-u})) them or delete +(@option{--delete}) members from them. Likewise, you cannot append +another @command{tar} archive to a compressed archive using +@option{--append} (@option{-r})). Secondly, multi-volume archives cannot be +compressed. -In most cases, a @emph{new} format archive can be read by an @emph{old} -@command{tar} program without serious trouble, so this option should -seldom be needed. On the other hand, most modern @command{tar}s are -able to read old format archives, so it might be safer for you to -always use @option{--format=v7} for your distributions. +The following table summarizes compression options used by @GNUTAR{}. -@node ustar -@subsection Ustar Archive Format +@table @option +@opindex gzip +@opindex ungzip +@item -z +@itemx --gzip +@itemx --ungzip +Filter the archive through @command{gzip}. -@cindex ustar archive format -Archive format defined by @acronym{POSIX}.1-1988 specification is called -@code{ustar}. Although it is more flexible than the V7 format, it -still has many restrictions (@xref{Formats,ustar}, for the detailed -description of @code{ustar} format). Along with V7 format, -@code{ustar} format is a good choice for archives intended to be read -with other implementations of @command{tar}. +You can use @option{--gzip} and @option{--gunzip} on physical devices +(tape drives, etc.) and remote files as well as on normal files; data +to or from such devices or remote files is reblocked by another copy +of the @command{tar} program to enforce the specified (or default) record +size. The default compression parameters are used; if you need to +override them, set @env{GZIP} environment variable, e.g.: -To create archive in @code{ustar} format, use @option{--format=ustar} -option in conjunction with the @option{--create} (@option{-c}). +@smallexample +$ @kbd{GZIP=--best tar cfz archive.tar.gz subdir} +@end smallexample -@node gnu -@subsection @acronym{GNU} and old @GNUTAR{} format +@noindent +Another way would be to avoid the @option{--gzip} (@option{--gunzip}, @option{--ungzip}, @option{-z}) option and run +@command{gzip} explicitly: -@cindex GNU archive format -@cindex Old GNU archive format -@GNUTAR{} was based on an early draft of the -@acronym{POSIX} 1003.1 @code{ustar} standard. @acronym{GNU} extensions to -@command{tar}, such as the support for file names longer than 100 -characters, use portions of the @command{tar} header record which were -specified in that @acronym{POSIX} draft as unused. Subsequent changes in -@acronym{POSIX} have allocated the same parts of the header record for -other purposes. As a result, @GNUTAR{} format is -incompatible with the current @acronym{POSIX} specification, and with -@command{tar} programs that follow it. +@smallexample +$ @kbd{tar cf - subdir | gzip --best -c - > archive.tar.gz} +@end smallexample -In the majority of cases, @command{tar} will be configured to create -this format by default. This will change in the future releases, since -we plan to make @samp{posix} format the default. +@cindex corrupted archives +About corrupted compressed archives: @command{gzip}'ed files have no +redundancy, for maximum compression. The adaptive nature of the +compression scheme means that the compression tables are implicitly +spread all over the archive. If you lose a few blocks, the dynamic +construction of the compression tables becomes unsynchronized, and there +is little chance that you could recover later in the archive. -To force creation a @GNUTAR{} archive, use option -@option{--format=gnu}. +There are pending suggestions for having a per-volume or per-file +compression in @GNUTAR{}. This would allow for viewing the +contents without decompression, and for resynchronizing decompression at +every volume or file, in case of corrupted archives. Doing so, we might +lose some compressibility. But this would have make recovering easier. +So, there are pros and cons. We'll see! -@node posix -@subsection @GNUTAR{} and @acronym{POSIX} @command{tar} +@opindex bzip2 +@item -j +@itemx --bzip2 +Filter the archive through @code{bzip2}. Otherwise like @option{--gzip}. -@cindex POSIX archive format -@cindex PAX archive format -Starting from version 1.14 @GNUTAR{} features full support for -@acronym{POSIX.1-2001} archives. +@opindex compress +@opindex uncompress +@item -Z +@itemx --compress +@itemx --uncompress +Filter the archive through @command{compress}. Otherwise like @option{--gzip}. -A @acronym{POSIX} conformant archive will be created if @command{tar} -was given @option{--format=posix} (@option{--format=pax}) option. No -special option is required to read and extract from a @acronym{POSIX} -archive. +The @acronym{GNU} Project recommends you not use +@command{compress}, because there is a patent covering the algorithm it +uses. You could be sued for patent infringement merely by running +@command{compress}. -@menu -* PAX keywords:: Controlling Extended Header Keywords. -@end menu +@opindex use-compress-program +@item --use-compress-program=@var{prog} +Use external compression program @var{prog}. Use this option if you +have a compression program that @GNUTAR{} does not support. There +are two requirements to which @var{prog} should comply: -@node PAX keywords -@subsubsection Controlling Extended Header Keywords +First, when called without options, it should read data from standard +input, compress it and output it on standard output. -@table @option -@opindex pax-option -@item --pax-option=@var{keyword-list} -Handle keywords in @acronym{PAX} extended headers. This option is -equivalent to @option{-o} option of the @command{pax} utility. +Secondly, if called with @option{-d} argument, it should do exactly +the opposite, i.e., read the compressed data from the standard input +and produce uncompressed data on the standard output. @end table -@var{Keyword-list} is a comma-separated -list of keyword options, each keyword option taking one of -the following forms: +@cindex gpg, using with tar +@cindex gnupg, using with tar +@cindex Using encrypted archives +The @option{--use-compress-program} option, in particular, lets you +implement your own filters, not necessarily dealing with +compression/decomression. For example, suppose you wish to implement +PGP encryption on top of compression, using @command{gpg} (@pxref{Top, +gpg, gpg ---- encryption and signing tool, gpg, GNU Privacy Guard +Manual}). The following script does that: -@table @code -@item delete=@var{pattern} -When used with one of archive-creation commands, -this option instructs @command{tar} to omit from extended header records -that it produces any keywords matching the string @var{pattern}. +@smallexample +@group +#! /bin/sh +case $1 in +-d) gpg --decrypt - | gzip -d -c;; +'') gzip -c | gpg -s ;; +*) echo "Unknown option $1">&2; exit 1;; +esac +@end group +@end smallexample -When used in extract or list mode, this option instructs tar -to ignore any keywords matching the given @var{pattern} in the extended -header records. In both cases, matching is performed using the pattern -matching notation described in @acronym{POSIX 1003.2}, 3.13 -(@pxref{wildcards}). For example: +Suppose you name it @file{gpgz} and save it somewhere in your +@env{PATH}. Then the following command will create a commpressed +archive signed with your private key: @smallexample ---pax-option delete=security.* +$ @kbd{tar -cf foo.tar.gpgz --use-compress=gpgz .} @end smallexample -would suppress security-related information. +@noindent +Likewise, the following command will list its contents: -@item exthdr.name=@var{string} +@smallexample +$ @kbd{tar -tf foo.tar.gpgz --use-compress=gpgz .} +@end smallexample -This keyword allows user control over the name that is written into the -ustar header blocks for the extended headers. The name is obtained -from @var{string} after making the following substitutions: +@ignore +The above is based on the following discussion: -@multitable @columnfractions .25 .55 -@headitem Meta-character @tab Replaced By -@item %d @tab The directory name of the file, equivalent to the -result of the @command{dirname} utility on the translated pathname. -@item %f @tab The filename of the file, equivalent to the result -of the @command{basename} utility on the translated pathname. -@item %p @tab The process ID of the @command{tar} process. -@item %% @tab A @samp{%} character. -@end multitable + I have one question, or maybe it's a suggestion if there isn't a way + to do it now. I would like to use @option{--gzip}, but I'd also like + the output to be fed through a program like @acronym{GNU} + @command{ecc} (actually, right now that's @samp{exactly} what I'd like + to use :-)), basically adding ECC protection on top of compression. + It seems as if this should be quite easy to do, but I can't work out + exactly how to go about it. Of course, I can pipe the standard output + of @command{tar} through @command{ecc}, but then I lose (though I + haven't started using it yet, I confess) the ability to have + @command{tar} use @command{rmt} for it's I/O (I think). -Any other @samp{%} characters in @var{string} produce undefined -results. + I think the most straightforward thing would be to let me specify a + general set of filters outboard of compression (preferably ordered, + so the order can be automatically reversed on input operations, and + with the options they require specifiable), but beggars shouldn't be + choosers and anything you decide on would be fine with me. -If no option @samp{exthdr.name=string} is specified, @command{tar} -will use the following default value: + By the way, I like @command{ecc} but if (as the comments say) it can't + deal with loss of block sync, I'm tempted to throw some time at adding + that capability. Supposing I were to actually do such a thing and + get it (apparently) working, do you accept contributed changes to + utilities like that? (Leigh Clayton @file{loc@@soliton.com}, May 1995). + + Isn't that exactly the role of the + @option{--use-compress-prog=@var{program}} option? + I never tried it myself, but I suspect you may want to write a + @var{prog} script or program able to filter stdin to stdout to + way you want. It should recognize the @option{-d} option, for when + extraction is needed rather than creation. -@smallexample -%d/PaxHeaders.%p/%f -@end smallexample + It has been reported that if one writes compressed data (through the + @option{--gzip} or @option{--compress} options) to a DLT and tries to use + the DLT compression mode, the data will actually get bigger and one will + end up with less space on the tape. +@end ignore -@item globexthdr.name=@var{string} -This keyword allows user control over the name that is written into -the ustar header blocks for global extended header records. The name -is obtained from the contents of @var{string}, after making -the following substitutions: +@node sparse +@subsection Archiving Sparse Files +@cindex Sparse Files -@multitable @columnfractions .25 .55 -@headitem Meta-character @tab Replaced By -@item %n @tab An integer that represents the -sequence number of the global extended header record in the archive, -starting at 1. -@item %p @tab The process ID of the @command{tar} process. -@item %% @tab A @samp{%} character. -@end multitable +Files in the file system occasionally have @dfn{holes}. A @dfn{hole} +in a file is a section of the file's contents which was never written. +The contents of a hole reads as all zeros. On many operating systems, +actual disk storage is not allocated for holes, but they are counted +in the length of the file. If you archive such a file, @command{tar} +could create an archive longer than the original. To have @command{tar} +attempt to recognize the holes in a file, use @option{--sparse} +(@option{-S}). When you use this option, then, for any file using +less disk space than would be expected from its length, @command{tar} +searches the file for consecutive stretches of zeros. It then records +in the archive for the file where the consecutive stretches of zeros +are, and only archives the ``real contents'' of the file. On +extraction (using @option{--sparse} is not needed on extraction) any +such files have holes created wherever the continuous stretches of zeros +were found. Thus, if you use @option{--sparse}, @command{tar} archives +won't take more space than the original. -Any other @samp{%} characters in @var{string} produce undefined results. +@table @option +@opindex sparse +@item -S +@itemx --sparse +This option istructs @command{tar} to test each file for sparseness +before attempting to archive it. If the file is found to be sparse it +is treated specially, thus allowing to decrease the amount of space +used by its image in the archive. -If no option @samp{globexthdr.name=string} is specified, @command{tar} -will use the following default value: +This option is meaningful only when creating or updating archives. It +has no effect on extraction. +@end table -@smallexample -$TMPDIR/GlobalHead.%p.%n -@end smallexample +Consider using @option{--sparse} when performing file system backups, +to avoid archiving the expanded forms of files stored sparsely in the +system. -@noindent -where @samp{$TMPDIR} represents the value of the @var{TMPDIR} -environment variable. If @var{TMPDIR} is not set, @command{tar} -uses @samp{/tmp}. +Even if your system has no sparse files currently, some may be +created in the future. If you use @option{--sparse} while making file +system backups as a matter of course, you can be assured the archive +will never take more space on the media than the files take on disk +(otherwise, archiving a disk filled with sparse files might take +hundreds of tapes). @xref{Incremental Dumps}. -@item @var{keyword}=@var{value} -When used with one of archive-creation commands, these keyword/value pairs -will be included at the beginning of the archive in a global extended -header record. When used with one of archive-reading commands, -@command{tar} will behave as if it has encountered these keyword/value -pairs at the beginning of the archive in a global extended header -record. +However, be aware that @option{--sparse} option presents a serious +drawback. Namely, in order to determine if the file is sparse +@command{tar} has to read it before trying to archive it, so in total +the file is read @strong{twice}. So, always bear in mind that the +time needed to process all files with this option is roughly twice +the time needed to archive them without it. +@FIXME{A technical note: -@item @var{keyword}:=@var{value} -When used with one of archive-creation commands, these keyword/value pairs -will be included as records at the beginning of an extended header for -each file. This is effectively equivalent to @var{keyword}=@var{value} -form except that it creates no global extended header records. +Programs like @command{dump} do not have to read the entire file; by +examining the file system directly, they can determine in advance +exactly where the holes are and thus avoid reading through them. The +only data it need read are the actual allocated data blocks. +@GNUTAR{} uses a more portable and straightforward +archiving approach, it would be fairly difficult that it does +otherwise. Elizabeth Zwicky writes to @file{comp.unix.internals}, on +1990-12-10: -When used with one of archive-reading commands, @command{tar} will -behave as if these keyword/value pairs were included as records at the -end of each extended header; thus, they will override any global or -file-specific extended header record keywords of the same names. -For example, in the command: +@quotation +What I did say is that you cannot tell the difference between a hole and an +equivalent number of nulls without reading raw blocks. @code{st_blocks} at +best tells you how many holes there are; it doesn't tell you @emph{where}. +Just as programs may, conceivably, care what @code{st_blocks} is (care +to name one that does?), they may also care where the holes are (I have +no examples of this one either, but it's equally imaginable). -@smallexample -tar --format=posix --create \ - --file archive --pax-option gname:=user . -@end smallexample +I conclude from this that good archivers are not portable. One can +arguably conclude that if you want a portable program, you can in good +conscience restore files with as many holes as possible, since you can't +get it right. +@end quotation +} -the group name will be forced to a new value for all files -stored in the archive. +@cindex sparse formats, defined +When using @samp{POSIX} archive format, @GNUTAR{} is able to store +sparse files using in three distinct ways, called @dfn{sparse +formats}. A sparse format is identified by its @dfn{number}, +consisting, as usual of two decimal numbers, delimited by a dot. By +default, format @samp{1.0} is used. If, for some reason, you wish to +use an earlier format, you can select it using +@option{--sparse-version} option. + +@table @option +@opindex sparse-version +@item --sparse-version=@var{version} + +Select the format to store sparse files in. Valid @var{version} values +are: @samp{0.0}, @samp{0.1} and @samp{1.0}. @xref{Sparse Formats}, +for a detailed description of each format. @end table -@node Checksumming -@subsection Checksumming Problems +Using @option{--sparse-format} option implies @option{--sparse}. -SunOS and HP-UX @command{tar} fail to accept archives created using -@GNUTAR{} and containing non-ASCII file names, that -is, file names having characters with the eight bit set, because they -use signed checksums, while @GNUTAR{} uses unsigned -checksums while creating archives, as per @acronym{POSIX} standards. On -reading, @GNUTAR{} computes both checksums and -accept any. It is somewhat worrying that a lot of people may go -around doing backup of their files using faulty (or at least -non-standard) software, not learning about it until it's time to -restore their missing files with an incompatible file extractor, or -vice versa. +@node Attributes +@section Handling File Attributes +@UNREVISED -@GNUTAR{} compute checksums both ways, and accept -any on read, so @acronym{GNU} tar can read Sun tapes even with their -wrong checksums. @GNUTAR{} produces the standard -checksum, however, raising incompatibilities with Sun. That is to -say, @GNUTAR{} has not been modified to -@emph{produce} incorrect archives to be read by buggy @command{tar}'s. -I've been told that more recent Sun @command{tar} now read standard -archives, so maybe Sun did a similar patch, after all? +When @command{tar} reads files, it updates their access times. To +avoid this, use the @option{--atime-preserve[=METHOD]} option, which can either +reset the access time retroactively or avoid changing it in the first +place. -The story seems to be that when Sun first imported @command{tar} -sources on their system, they recompiled it without realizing that -the checksums were computed differently, because of a change in -the default signing of @code{char}'s in their compiler. So they -started computing checksums wrongly. When they later realized their -mistake, they merely decided to stay compatible with it, and with -themselves afterwards. Presumably, but I do not really know, HP-UX -has chosen that their @command{tar} archives to be compatible with Sun's. -The current standards do not favor Sun @command{tar} format. In any -case, it now falls on the shoulders of SunOS and HP-UX users to get -a @command{tar} able to read the good archives they receive. +Handling of file attributes -@node Large or Negative Values -@subsection Large or Negative Values -@cindex large values -@cindex future time stamps -@cindex negative time stamps -@UNREVISED{} +@table @option +@opindex atime-preserve +@item --atime-preserve +@itemx --atime-preserve=replace +@itemx --atime-preserve=system +Preserve the access times of files that are read. This works only for +files that you own, unless you have superuser privileges. -The above sections suggest to use @samp{oldest possible} archive -format if in doubt. However, sometimes it is not possible. If you -attempt to archive a file whose metadata cannot be represented using -required format, @GNUTAR{} will print error message and ignore such a -file. You will than have to switch to a format that is able to -handle such values. The format summary table (@pxref{Formats}) will -help you to do so. +@option{--atime-preserve=replace} works on most systems, but it also +restores the data modification time and updates the status change +time. Hence it doesn't interact with incremental dumps nicely +(@pxref{Incremental Dumps}), and it can set access or data modification times +incorrectly if other programs access the file while @command{tar} is +running. -In particular, when trying to archive files larger than 8GB or with -timestamps not in the range 1970-01-01 00:00:00 through 2242-03-16 -12:56:31 @sc{utc}, you will have to chose between @acronym{GNU} and -@acronym{POSIX} archive formats. When considering which format to -choose, bear in mind that the @acronym{GNU} format uses -two's-complement base-256 notation to store values that do not fit -into standard @acronym{ustar} range. Such archives can generally be -read only by a @GNUTAR{} implementation. Moreover, they sometimes -cannot be correctly restored on another hosts even by @GNUTAR{}. For -example, using two's complement representation for negative time -stamps that assumes a signed 32-bit @code{time_t} generates archives -that are not portable to hosts with differing @code{time_t} -representations. +@option{--atime-preserve=system} avoids changing the access time in +the first place, if the operating system supports this. +Unfortunately, this may or may not work on any given operating system +or file system. If @command{tar} knows for sure it won't work, it +complains right away. -On the other hand, @acronym{POSIX} archives, generally speaking, can -be extracted by any tar implementation that understands older -@acronym{ustar} format. The only exception are files larger than 8GB. +Currently @option{--atime-preserve} with no operand defaults to +@option{--atime-preserve=replace}, but this is intended to change to +@option{--atime-preserve=system} when the latter is better-supported. -@FIXME{Describe how @acronym{POSIX} archives are extracted by non -POSIX-aware tars.} +@opindex touch +@item -m +@itemx --touch +Do not extract data modification time. -@node Other Tars -@subsection How to Extract GNU-Specific Data Using Other @command{tar} Implementations +When this option is used, @command{tar} leaves the data modification times +of the files it extracts as the times when the files were extracted, +instead of setting it to the times recorded in the archive. -In previous sections you became acquainted with various quircks -necessary to make your archives portable. Sometimes you may need to -extract archives containing GNU-specific members using some -third-party @command{tar} implementation or an older version of -@GNUTAR{}. Of course your best bet is to have @GNUTAR{} installed, -but if it is for some reason impossible, this section will explain -how to cope without it. +This option is meaningless with @option{--list} (@option{-t}). -When we speak about @dfn{GNU-specific} members we mean two classes of -them: members split between the volumes of a multi-volume archive and -sparse members. You will be able to always recover such members if -the archive is in PAX format. In addition split members can be -recovered from archives in old GNU format. The following subsections -describe the required procedures in detail. +@opindex same-owner +@item --same-owner +Create extracted files with the same ownership they have in the +archive. -@menu -* Split Recovery:: Members Split Between Volumes -* Sparse Recovery:: Sparse Members -@end menu +This is the default behavior for the superuser, +so this option is meaningful only for non-root users, when @command{tar} +is executed on those systems able to give files away. This is +considered as a security flaw by many people, at least because it +makes quite difficult to correctly account users for the disk space +they occupy. Also, the @code{suid} or @code{sgid} attributes of +files are easily and silently lost when files are given away. -@node Split Recovery -@subsubsection Extracting Members Split Between Volumes +When writing an archive, @command{tar} writes the user id and user name +separately. If it can't find a user name (because the user id is not +in @file{/etc/passwd}), then it does not write one. When restoring, +it tries to look the name (if one was written) up in +@file{/etc/passwd}. If it fails, then it uses the user id stored in +the archive instead. -If a member is split between several volumes of an old GNU format archive -most third party @command{tar} implementation will fail to extract -it. To extract it, use @command{tarcat} program (@pxref{Tarcat}). -This program is available from -@uref{http://www.gnu.org/@/software/@/tar/@/utils/@/tarcat, @GNUTAR{} -home page}. It concatenates several archive volumes into a single -valid archive. For example, if you have three volumes named from -@file{vol-1.tar} to @file{vol-2.tar}, you can do the following to -extract them using a third-party @command{tar}: +@opindex no-same-owner +@item --no-same-owner +@itemx -o +Do not attempt to restore ownership when extracting. This is the +default behavior for ordinary users, so this option has an effect +only for the superuser. -@smallexample -$ @kbd{tarcat vol-1.tar vol-2.tar vol-3.tar | tar xf -} -@end smallexample +@opindex numeric-owner +@item --numeric-owner +The @option{--numeric-owner} option allows (ANSI) archives to be written +without user/group name information or such information to be ignored +when extracting. It effectively disables the generation and/or use +of user/group name information. This option forces extraction using +the numeric ids from the archive, ignoring the names. -You could use this approach for many (although not all) PAX -format archives as well. However, extracting split members from a PAX -archive is a much easier task, because PAX volumes are constructed in -such a way that each part of a split member is extracted as a -different file by @command{tar} implementations that are not aware of -GNU extensions. More specifically, the very first part retains its -original name, and all subsequent parts are named using the pattern: +This is useful in certain circumstances, when restoring a backup from +an emergency floppy with different passwd/group files for example. +It is otherwise impossible to extract files with the right ownerships +if the password file in use during the extraction does not match the +one belonging to the file system(s) being extracted. This occurs, +for example, if you are restoring your files after a major crash and +had booted from an emergency floppy with no password file or put your +disk into another machine to do the restore. -@smallexample -%d/GNUFileParts.%p/%f.%n -@end smallexample +The numeric ids are @emph{always} saved into @command{tar} archives. +The identifying names are added at create time when provided by the +system, unless @option{--old-archive} (@option{-o}) is used. Numeric ids could be +used when moving archives between a collection of machines using +a centralized management for attribution of numeric ids to users +and groups. This is often made through using the NIS capabilities. -@noindent -where symbols preceeded by @samp{%} are @dfn{macro characters} that -have the following meaning: +When making a @command{tar} file for distribution to other sites, it +is sometimes cleaner to use a single owner for all files in the +distribution, and nicer to specify the write permission bits of the +files as stored in the archive independently of their actual value on +the file system. The way to prepare a clean distribution is usually +to have some Makefile rule creating a directory, copying all needed +files in that directory, then setting ownership and permissions as +wanted (there are a lot of possible schemes), and only then making a +@command{tar} archive out of this directory, before cleaning +everything out. Of course, we could add a lot of options to +@GNUTAR{} for fine tuning permissions and ownership. +This is not the good way, I think. @GNUTAR{} is +already crowded with options and moreover, the approach just explained +gives you a great deal of control already. -@multitable @columnfractions .25 .55 -@headitem Meta-character @tab Replaced By -@item %d @tab The directory name of the file, equivalent to the -result of the @command{dirname} utility on its full name. -@item %f @tab The file name of the file, equivalent to the result -of the @command{basename} utility on its full name. -@item %p @tab The process ID of the @command{tar} process that -created the archive. -@item %n @tab Ordinal number of this particular part. -@end multitable +@xopindex{same-permissions, short description} +@xopindex{preserve-permissions, short description} +@item -p +@itemx --same-permissions +@itemx --preserve-permissions +Extract all protection information. -For example, if, a file @file{var/longfile} was split during archive -creation between three volumes, and the creator @command{tar} process -had process ID @samp{27962}, then the member names will be: +This option causes @command{tar} to set the modes (access permissions) of +extracted files exactly as recorded in the archive. If this option +is not used, the current @code{umask} setting limits the permissions +on extracted files. This option is by default enabled when +@command{tar} is executed by a superuser. -@smallexample -var/longfile -var/GNUFileParts.27962/longfile.1 -var/GNUFileParts.27962/longfile.2 -@end smallexample -When you extract your archive using a third-party @command{tar}, these -files will be created on your disk, and the only thing you will need -to do to restore your file in its original form is concatenate them in -the proper order, for example: +This option is meaningless with @option{--list} (@option{-t}). -@smallexample -@group -$ @kbd{cd var} -$ @kbd{cat GNUFileParts.27962/longfile.1 \ - GNUFileParts.27962/longfile.2 >> longfile} -$ rm -f GNUFileParts.27962 -@end group -@end smallexample +@opindex preserve +@item --preserve +Same as both @option{--same-permissions} and @option{--same-order}. -Notice, that if the @command{tar} implementation you use supports PAX -format archives, it will probably emit warnings about unknown keywords -during extraction. They will lool like this: +The @option{--preserve} option has no equivalent short option name. +It is equivalent to @option{--same-permissions} plus @option{--same-order}. -@smallexample -@group -Tar file too small -Unknown extended header keyword 'GNU.volume.filename' ignored. -Unknown extended header keyword 'GNU.volume.size' ignored. -Unknown extended header keyword 'GNU.volume.offset' ignored. -@end group -@end smallexample +@FIXME{I do not see the purpose of such an option. (Neither I. FP.) +Neither do I. --Sergey} -@noindent -You can safely ignore these warnings. +@end table -If your @command{tar} implementation is not PAX-aware, you will get -more warnigns and more files generated on your disk, e.g.: +@node Portability +@section Making @command{tar} Archives More Portable -@smallexample -@group -$ @kbd{tar xf vol-1.tar} -var/PaxHeaders.27962/longfile: Unknown file type 'x', extracted as -normal file -Unexpected EOF in archive -$ @kbd{tar xf vol-2.tar} -tmp/GlobalHead.27962.1: Unknown file type 'g', extracted as normal file -GNUFileParts.27962/PaxHeaders.27962/sparsefile.1: Unknown file type -'x', extracted as normal file -@end group -@end smallexample +Creating a @command{tar} archive on a particular system that is meant to be +useful later on many other machines and with other versions of @command{tar} +is more challenging than you might think. @command{tar} archive formats +have been evolving since the first versions of Unix. Many such formats +are around, and are not always compatible with each other. This section +discusses a few problems, and gives some advice about making @command{tar} +archives more portable. -Ignore these warnings. The @file{PaxHeaders.*} directories created -will contain files with @dfn{extended header keywords} describing the -extracted files. You can delete them, unless they describe sparse -members. Read further to learn more about them. +One golden rule is simplicity. For example, limit your @command{tar} +archives to contain only regular files and directories, avoiding +other kind of special files. Do not attempt to save sparse files or +contiguous files as such. Let's discuss a few more problems, in turn. -@node Sparse Recovery -@subsubsection Extracting Sparse Members +@FIXME{Discuss GNU extensions (incremental backups, multi-volume +archives and archive labels) in GNU and PAX formats.} -Any @command{tar} implementation will be able to extract sparse members from a -PAX archive. However, the extracted files will be @dfn{condensed}, -i.e. any zero blocks will be removed from them. When we restore such -a condensed file to its original form, by adding zero bloks (or -@dfn{holes}) back to their original locations, we call this process -@dfn{expanding} a compressed sparse file. +@menu +* Portable Names:: Portable Names +* dereference:: Symbolic Links +* old:: Old V7 Archives +* ustar:: Ustar Archives +* gnu:: GNU and old GNU format archives. +* posix:: @acronym{POSIX} archives +* Checksumming:: Checksumming Problems +* Large or Negative Values:: Large files, negative time stamps, etc. +* Other Tars:: How to Extract GNU-Specific Data Using + Other @command{tar} Implementations +@end menu -To expand a file, you will need a simple auxiliary program called -@command{xsparse}. It is available in source form from -@uref{http://www.gnu.org/@/software/@/tar/@/utils/@/xsparse, @GNUTAR{} -home page}. +@node Portable Names +@subsection Portable Names -Let's begin with archive members in @dfn{sparse format -version 1.0}@footnote{@xref{PAX 1}.}, which are the easiest to expand. -The condensed file will contain both file map and file data, so no -additional data will be needed to restore it. If the original file -name was @file{@var{dir}/@var{name}}, then the condensed file will be -named @file{@var{dir}/@/GNUSparseFile.@var{n}/@/@var{name}}, where -@var{n} is a decimal number@footnote{technically speaking, @var{n} is a -@dfn{process ID} of the @command{tar} process which created the -archive (@pxref{PAX keywords}).}. +Use portable file and member names. A name is portable if it contains +only ASCII letters and digits, @samp{/}, @samp{.}, @samp{_}, and +@samp{-}; it cannot be empty, start with @samp{-} or @samp{//}, or +contain @samp{/-}. Avoid deep directory nesting. For portability to +old Unix hosts, limit your file name components to 14 characters or +less. -To expand a version 1.0 file, run @command{xsparse} as follows: +If you intend to have your @command{tar} archives to be read under +MSDOS, you should not rely on case distinction for file names, and you +might use the @acronym{GNU} @command{doschk} program for helping you +further diagnosing illegal MSDOS names, which are even more limited +than System V's. -@smallexample -$ @kbd{xsparse @file{cond-file}} -@end smallexample +@node dereference +@subsection Symbolic Links +@cindex File names, using symbolic links +@cindex Symbolic link as file name -@noindent -where @file{cond-file} is the name of the condensed file. The utility -will deduce the name for the resulting expanded file using the -following algorithm: +@opindex dereference +Normally, when @command{tar} archives a symbolic link, it writes a +block to the archive naming the target of the link. In that way, the +@command{tar} archive is a faithful record of the file system contents. +@option{--dereference} (@option{-h}) is used with @option{--create} (@option{-c}), and causes +@command{tar} to archive the files symbolic links point to, instead of +the links themselves. When this option is used, when @command{tar} +encounters a symbolic link, it will archive the linked-to file, +instead of simply recording the presence of a symbolic link. -@enumerate 1 -@item If @file{cond-file} does not contain any directories, -@file{../cond-file} will be used; +The name under which the file is stored in the file system is not +recorded in the archive. To record both the symbolic link name and +the file name in the system, archive the file under both names. If +all links were recorded automatically by @command{tar}, an extracted file +might be linked to a file name that no longer exists in the file +system. -@item If @file{cond-file} has the form -@file{@var{dir}/@var{t}/@var{name}}, where both @var{t} and @var{name} -are simple names, with no @samp{/} characters in them, the output file -name will be @file{@var{dir}/@var{name}}. +If a linked-to file is encountered again by @command{tar} while creating +the same archive, an entire second copy of it will be stored. (This +@emph{might} be considered a bug.) -@item Otherwise, if @file{cond-file} has the form -@file{@var{dir}/@var{name}}, the output file name will be -@file{@var{name}}. -@end enumerate +So, for portable archives, do not archive symbolic links as such, +and use @option{--dereference} (@option{-h}): many systems do not support +symbolic links, and moreover, your distribution might be unusable if +it contains unresolved symbolic links. -In the unlikely case when this algorithm does not suite your needs, -you can explicitely specify output file name as a second argument to -the command: +@node old +@subsection Old V7 Archives +@cindex Format, old style +@cindex Old style format +@cindex Old style archives +@cindex v7 archive format -@smallexample -$ @kbd{xsparse @file{cond-file}} -@end smallexample +Certain old versions of @command{tar} cannot handle additional +information recorded by newer @command{tar} programs. To create an +archive in V7 format (not ANSI), which can be read by these old +versions, specify the @option{--format=v7} option in +conjunction with the @option{--create} (@option{-c}) (@command{tar} also +accepts @option{--portability} or @option{--old-archive} for this +option). When you specify it, +@command{tar} leaves out information about directories, pipes, fifos, +contiguous files, and device files, and specifies file ownership by +group and user IDs instead of group and user names. -It is often a good idea to run @command{xsparse} in @dfn{dry run} mode -first. In this mode, the command does not actually expand the file, -but verbosely lists all actions it would be taking to do so. The dry -run mode is enabled by @option{-n} command line argument: +When updating an archive, do not use @option{--format=v7} +unless the archive was created using this option. -@smallexample -@group -$ @kbd{xsparse -n /home/gray/GNUSparseFile.6058/sparsefile} -Reading v.1.0 sparse map -Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to -`/home/gray/sparsefile' -Finished dry run -@end group -@end smallexample +In most cases, a @emph{new} format archive can be read by an @emph{old} +@command{tar} program without serious trouble, so this option should +seldom be needed. On the other hand, most modern @command{tar}s are +able to read old format archives, so it might be safer for you to +always use @option{--format=v7} for your distributions. Notice, +however, that @samp{ustar} format is a better alternative, as it is +free from many of @samp{v7}'s drawbacks. -To actually expand the file, you would run: +@node ustar +@subsection Ustar Archive Format -@smallexample -$ @kbd{xsparse /home/gray/GNUSparseFile.6058/sparsefile} -@end smallexample +@cindex ustar archive format +Archive format defined by @acronym{POSIX}.1-1988 specification is called +@code{ustar}. Although it is more flexible than the V7 format, it +still has many restrictions (@xref{Formats,ustar}, for the detailed +description of @code{ustar} format). Along with V7 format, +@code{ustar} format is a good choice for archives intended to be read +with other implementations of @command{tar}. -@noindent -The program behaves the same way all UNIX utilities do: it will keep -quiet unless it has simething important to tell you (e.g. an error -condition or something). If you wish it to produce verbose output, -similar to that from the dry run mode, give it @option{-v} option: +To create archive in @code{ustar} format, use @option{--format=ustar} +option in conjunction with the @option{--create} (@option{-c}). -@smallexample -@group -$ @kbd{xsparse -v /home/gray/GNUSparseFile.6058/sparsefile} -Reading v.1.0 sparse map -Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to -`/home/gray/sparsefile' -Done -@end group -@end smallexample +@node gnu +@subsection @acronym{GNU} and old @GNUTAR{} format -Additionally, if your @command{tar} implementation has extracted the -@dfn{extended headers} for this file, you can instruct @command{xstar} -to use them in order to verify the integrity of the expanded file. -The option @option{-x} sets the name of the extended header file to -use. Continuing our example: +@cindex GNU archive format +@cindex Old GNU archive format +@GNUTAR{} was based on an early draft of the +@acronym{POSIX} 1003.1 @code{ustar} standard. @acronym{GNU} extensions to +@command{tar}, such as the support for file names longer than 100 +characters, use portions of the @command{tar} header record which were +specified in that @acronym{POSIX} draft as unused. Subsequent changes in +@acronym{POSIX} have allocated the same parts of the header record for +other purposes. As a result, @GNUTAR{} format is +incompatible with the current @acronym{POSIX} specification, and with +@command{tar} programs that follow it. -@smallexample -@group -$ @kbd{xsparse -v -x /home/gray/PaxHeaders.6058/sparsefile \ - /home/gray/GNUSparseFile.6058/sparsefile} -Reading extended header file -Found variable GNU.sparse.major = 1 -Found variable GNU.sparse.minor = 0 -Found variable GNU.sparse.name = sparsefile -Found variable GNU.sparse.realsize = 217481216 -Reading v.1.0 sparse map -Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to -`/home/gray/sparsefile' -Done -@end group -@end smallexample +In the majority of cases, @command{tar} will be configured to create +this format by default. This will change in the future releases, since +we plan to make @samp{POSIX} format the default. -An @dfn{extended header} is a special @command{tar} archive header -that precedes an archive member and contains a set of -@dfn{variables}, describing the member properties that cannot be -stored in the standard @code{ustar} header. While optional for -expanding sparse version 1.0 members, use of extended headers is -mandatory when expanding sparse members in older sparse formats: v.0.0 -and v.0.1 (The sparse formats are described in detail in @pxref{Sparse -Formats}). So, for this format, the question is: how to obtain -extended headers from the archive? +To force creation a @GNUTAR{} archive, use option +@option{--format=gnu}. -If you use a @command{tar} implementation that does not support PAX -format, extended headers for each member will be extracted as a -separate file. If we represent the member name as -@file{@var{dir}/@var{name}}, then the extended header file will be -named @file{@var{dir}/@/PaxHeaders.@var{n}/@/@var{name}}, where -@var{n} is an integer number. +@node posix +@subsection @GNUTAR{} and @acronym{POSIX} @command{tar} -Things become more difficult if your @command{tar} implementation -does support PAX headers, because in this case you will have to -manually extract the headers. We recommend the following algorithm: +@cindex POSIX archive format +@cindex PAX archive format +Starting from version 1.14 @GNUTAR{} features full support for +@acronym{POSIX.1-2001} archives. -@enumerate 1 -@item -Consult the documentation for your @command{tar} implementation for an -option that will print @dfn{block numbers} along with the archive -listing (analogous to @GNUTAR{}'s @option{-R} option). For example, -@command{star} has @option{-block-number}. +A @acronym{POSIX} conformant archive will be created if @command{tar} +was given @option{--format=posix} (@option{--format=pax}) option. No +special option is required to read and extract from a @acronym{POSIX} +archive. -@item -Obtain the verbose listing using the @samp{block number} option, and -find the position of the sparse member in question and the member -immediately following it. For example, running @command{star} on our -archive we obtain: +@menu +* PAX keywords:: Controlling Extended Header Keywords. +@end menu + +@node PAX keywords +@subsubsection Controlling Extended Header Keywords + +@table @option +@opindex pax-option +@item --pax-option=@var{keyword-list} +Handle keywords in @acronym{PAX} extended headers. This option is +equivalent to @option{-o} option of the @command{pax} utility. +@end table + +@var{Keyword-list} is a comma-separated +list of keyword options, each keyword option taking one of +the following forms: + +@table @code +@item delete=@var{pattern} +When used with one of archive-creation commands, +this option instructs @command{tar} to omit from extended header records +that it produces any keywords matching the string @var{pattern}. + +When used in extract or list mode, this option instructs tar +to ignore any keywords matching the given @var{pattern} in the extended +header records. In both cases, matching is performed using the pattern +matching notation described in @acronym{POSIX 1003.2}, 3.13 +(@pxref{wildcards}). For example: @smallexample -@group -$ @kbd{star -t -v -block-number -f arc.tar} -@dots{} -star: Unknown extended header keyword 'GNU.sparse.size' ignored. -star: Unknown extended header keyword 'GNU.sparse.numblocks' ignored. -star: Unknown extended header keyword 'GNU.sparse.name' ignored. -star: Unknown extended header keyword 'GNU.sparse.map' ignored. -block 56: 425984 -rw-r--r-- gray/users Jun 25 14:46 2006 GNUSparseFile.28124/sparsefile -block 897: 65391 -rw-r--r-- gray/users Jun 24 20:06 2006 README -@dots{} -@end group +--pax-option delete=security.* @end smallexample -@noindent -(as usual, ignore the warnings about unknown keywords.) +would suppress security-related information. + +@item exthdr.name=@var{string} + +This keyword allows user control over the name that is written into the +ustar header blocks for the extended headers. The name is obtained +from @var{string} after making the following substitutions: + +@multitable @columnfractions .25 .55 +@headitem Meta-character @tab Replaced By +@item %d @tab The directory name of the file, equivalent to the +result of the @command{dirname} utility on the translated pathname. +@item %f @tab The filename of the file, equivalent to the result +of the @command{basename} utility on the translated pathname. +@item %p @tab The process ID of the @command{tar} process. +@item %% @tab A @samp{%} character. +@end multitable -@item -Let the size of the sparse member be @var{size}, its block number be -@var{Bs} and the block number of the next member be @var{Bn}. -Compute: +Any other @samp{%} characters in @var{string} produce undefined +results. + +If no option @samp{exthdr.name=string} is specified, @command{tar} +will use the following default value: @smallexample -@var{N} = @var{Bs} - @var{Bn} - @var{size}/512 - 2 +%d/PaxHeaders.%p/%f @end smallexample -@noindent -This number gives the size of the extended header part in tar @dfn{blocks}. -In our example, this formula gives: @code{897 - 56 - 425984 / 512 - 2 -= 7}. +@item globexthdr.name=@var{string} +This keyword allows user control over the name that is written into +the ustar header blocks for global extended header records. The name +is obtained from the contents of @var{string}, after making +the following substitutions: -@item -Use @command{dd} to extract the headers: +@multitable @columnfractions .25 .55 +@headitem Meta-character @tab Replaced By +@item %n @tab An integer that represents the +sequence number of the global extended header record in the archive, +starting at 1. +@item %p @tab The process ID of the @command{tar} process. +@item %% @tab A @samp{%} character. +@end multitable + +Any other @samp{%} characters in @var{string} produce undefined results. + +If no option @samp{globexthdr.name=string} is specified, @command{tar} +will use the following default value: @smallexample -@kbd{dd if=@var{archive} of=@var{hname} bs=512 skip=@var{Bs} count=@var{N}} +$TMPDIR/GlobalHead.%p.%n @end smallexample @noindent -where @var{archive} is the archive name, @var{hname} is a name of the -file to store the extended header in, @var{Bs} and @var{N} are -computed in previous steps. +where @samp{$TMPDIR} represents the value of the @var{TMPDIR} +environment variable. If @var{TMPDIR} is not set, @command{tar} +uses @samp{/tmp}. -In our example, this command will be +@item @var{keyword}=@var{value} +When used with one of archive-creation commands, these keyword/value pairs +will be included at the beginning of the archive in a global extended +header record. When used with one of archive-reading commands, +@command{tar} will behave as if it has encountered these keyword/value +pairs at the beginning of the archive in a global extended header +record. -@smallexample -$ @kbd{dd if=arc.tar of=xhdr bs=512 skip=56 count=7} -@end smallexample -@end enumerate +@item @var{keyword}:=@var{value} +When used with one of archive-creation commands, these keyword/value pairs +will be included as records at the beginning of an extended header for +each file. This is effectively equivalent to @var{keyword}=@var{value} +form except that it creates no global extended header records. -Finally, you can expand the condensed file, using the obtained header: +When used with one of archive-reading commands, @command{tar} will +behave as if these keyword/value pairs were included as records at the +end of each extended header; thus, they will override any global or +file-specific extended header record keywords of the same names. +For example, in the command: @smallexample -@group -$ @kbd{xsparse -v -x xhdr GNUSparseFile.6058/sparsefile} -Reading extended header file -Found variable GNU.sparse.size = 217481216 -Found variable GNU.sparse.numblocks = 208 -Found variable GNU.sparse.name = sparsefile -Found variable GNU.sparse.map = 0,2048,1050624,2048,@dots{} -Expanding file `GNUSparseFile.28124/sparsefile' to `sparsefile' -Done -@end group +tar --format=posix --create \ + --file archive --pax-option gname:=user . @end smallexample -@node Compression -@section Using Less Space through Compression +the group name will be forced to a new value for all files +stored in the archive. +@end table -@menu -* gzip:: Creating and Reading Compressed Archives -* sparse:: Archiving Sparse Files -@end menu +@node Checksumming +@subsection Checksumming Problems -@node gzip -@subsection Creating and Reading Compressed Archives -@cindex Compressed archives -@cindex Storing archives in compressed format +SunOS and HP-UX @command{tar} fail to accept archives created using +@GNUTAR{} and containing non-ASCII file names, that +is, file names having characters with the eight bit set, because they +use signed checksums, while @GNUTAR{} uses unsigned +checksums while creating archives, as per @acronym{POSIX} standards. On +reading, @GNUTAR{} computes both checksums and +accept any. It is somewhat worrying that a lot of people may go +around doing backup of their files using faulty (or at least +non-standard) software, not learning about it until it's time to +restore their missing files with an incompatible file extractor, or +vice versa. -@GNUTAR{} is able to create and read compressed archives. It supports -@command{gzip} and @command{bzip2} compression programs. For backward -compatibilty, it also supports @command{compress} command, although -we strongly recommend against using it, since there is a patent -covering the algorithm it uses and you could be sued for patent -infringement merely by running @command{compress}! Besides, it is less -effective than @command{gzip} and @command{bzip2}. +@GNUTAR{} compute checksums both ways, and accept +any on read, so @acronym{GNU} tar can read Sun tapes even with their +wrong checksums. @GNUTAR{} produces the standard +checksum, however, raising incompatibilities with Sun. That is to +say, @GNUTAR{} has not been modified to +@emph{produce} incorrect archives to be read by buggy @command{tar}'s. +I've been told that more recent Sun @command{tar} now read standard +archives, so maybe Sun did a similar patch, after all? -Creating a compressed archive is simple: you just specify a -@dfn{compression option} along with the usual archive creation -commands. The compression option is @option{-z} (@option{--gzip}) to -create a @command{gzip} compressed archive, @option{-j} -(@option{--bzip2}) to create a @command{bzip2} compressed archive, and -@option{-Z} (@option{--compress}) to use @command{compress} program. -For example: +The story seems to be that when Sun first imported @command{tar} +sources on their system, they recompiled it without realizing that +the checksums were computed differently, because of a change in +the default signing of @code{char}'s in their compiler. So they +started computing checksums wrongly. When they later realized their +mistake, they merely decided to stay compatible with it, and with +themselves afterwards. Presumably, but I do not really know, HP-UX +has chosen that their @command{tar} archives to be compatible with Sun's. +The current standards do not favor Sun @command{tar} format. In any +case, it now falls on the shoulders of SunOS and HP-UX users to get +a @command{tar} able to read the good archives they receive. -@smallexample -$ @kbd{tar cfz archive.tar.gz .} -@end smallexample +@node Large or Negative Values +@subsection Large or Negative Values +@cindex large values +@cindex future time stamps +@cindex negative time stamps +@UNREVISED{} -Reading compressed archive is even simpler: you don't need to specify -any additional options as @GNUTAR{} recognizes its format -automatically. Thus, the following commands will list and extract the -archive created in previous example: +The above sections suggest to use @samp{oldest possible} archive +format if in doubt. However, sometimes it is not possible. If you +attempt to archive a file whose metadata cannot be represented using +required format, @GNUTAR{} will print error message and ignore such a +file. You will than have to switch to a format that is able to +handle such values. The format summary table (@pxref{Formats}) will +help you to do so. -@smallexample -# List the compressed archive -$ @kbd{tar tf archive.tar.gz} -# Extract the compressed archive -$ @kbd{tar xf archive.tar.gz} -@end smallexample +In particular, when trying to archive files larger than 8GB or with +timestamps not in the range 1970-01-01 00:00:00 through 2242-03-16 +12:56:31 @sc{utc}, you will have to chose between @acronym{GNU} and +@acronym{POSIX} archive formats. When considering which format to +choose, bear in mind that the @acronym{GNU} format uses +two's-complement base-256 notation to store values that do not fit +into standard @acronym{ustar} range. Such archives can generally be +read only by a @GNUTAR{} implementation. Moreover, they sometimes +cannot be correctly restored on another hosts even by @GNUTAR{}. For +example, using two's complement representation for negative time +stamps that assumes a signed 32-bit @code{time_t} generates archives +that are not portable to hosts with differing @code{time_t} +representations. -The only case when you have to specify a decompression option while -reading the archive is when reading from a pipe or from a tape drive -that does not support random access. However, in this case @GNUTAR{} -will indicate which option you should use. For example: +On the other hand, @acronym{POSIX} archives, generally speaking, can +be extracted by any tar implementation that understands older +@acronym{ustar} format. The only exception are files larger than 8GB. -@smallexample -$ @kbd{cat archive.tar.gz | tar tf -} -tar: Archive is compressed. Use -z option -tar: Error is not recoverable: exiting now -@end smallexample +@FIXME{Describe how @acronym{POSIX} archives are extracted by non +POSIX-aware tars.} -If you see such diagnostics, just add the suggested option to the -invocation of @GNUTAR{}: +@node Other Tars +@subsection How to Extract GNU-Specific Data Using Other @command{tar} Implementations -@smallexample -$ @kbd{cat archive.tar.gz | tar tfz -} -@end smallexample +In previous sections you became acquainted with various quircks +necessary to make your archives portable. Sometimes you may need to +extract archives containing GNU-specific members using some +third-party @command{tar} implementation or an older version of +@GNUTAR{}. Of course your best bet is to have @GNUTAR{} installed, +but if it is for some reason impossible, this section will explain +how to cope without it. -Notice also, that there are several restrictions on operations on -compressed archives. First of all, compressed archives cannot be -modified, i.e., you cannot update (@option{--update} (@option{-u})) them or delete -(@option{--delete}) members from them. Likewise, you cannot append -another @command{tar} archive to a compressed archive using -@option{--append} (@option{-r})). Secondly, multi-volume archives cannot be -compressed. +When we speak about @dfn{GNU-specific} members we mean two classes of +them: members split between the volumes of a multi-volume archive and +sparse members. You will be able to always recover such members if +the archive is in PAX format. In addition split members can be +recovered from archives in old GNU format. The following subsections +describe the required procedures in detail. -The following table summarizes compression options used by @GNUTAR{}. +@menu +* Split Recovery:: Members Split Between Volumes +* Sparse Recovery:: Sparse Members +@end menu -@table @option -@opindex gzip -@opindex ungzip -@item -z -@itemx --gzip -@itemx --ungzip -Filter the archive through @command{gzip}. +@node Split Recovery +@subsubsection Extracting Members Split Between Volumes -You can use @option{--gzip} and @option{--gunzip} on physical devices -(tape drives, etc.) and remote files as well as on normal files; data -to or from such devices or remote files is reblocked by another copy -of the @command{tar} program to enforce the specified (or default) record -size. The default compression parameters are used; if you need to -override them, set @env{GZIP} environment variable, e.g.: +If a member is split between several volumes of an old GNU format archive +most third party @command{tar} implementation will fail to extract +it. To extract it, use @command{tarcat} program (@pxref{Tarcat}). +This program is available from +@uref{http://www.gnu.org/@/software/@/tar/@/utils/@/tarcat, @GNUTAR{} +home page}. It concatenates several archive volumes into a single +valid archive. For example, if you have three volumes named from +@file{vol-1.tar} to @file{vol-2.tar}, you can do the following to +extract them using a third-party @command{tar}: @smallexample -$ @kbd{GZIP=--best tar cfz archive.tar.gz subdir} +$ @kbd{tarcat vol-1.tar vol-2.tar vol-3.tar | tar xf -} @end smallexample -@noindent -Another way would be to avoid the @option{--gzip} (@option{--gunzip}, @option{--ungzip}, @option{-z}) option and run -@command{gzip} explicitly: +You could use this approach for many (although not all) PAX +format archives as well. However, extracting split members from a PAX +archive is a much easier task, because PAX volumes are constructed in +such a way that each part of a split member is extracted as a +different file by @command{tar} implementations that are not aware of +GNU extensions. More specifically, the very first part retains its +original name, and all subsequent parts are named using the pattern: @smallexample -$ @kbd{tar cf - subdir | gzip --best -c - > archive.tar.gz} +%d/GNUFileParts.%p/%f.%n @end smallexample -@cindex corrupted archives -About corrupted compressed archives: @command{gzip}'ed files have no -redundancy, for maximum compression. The adaptive nature of the -compression scheme means that the compression tables are implicitly -spread all over the archive. If you lose a few blocks, the dynamic -construction of the compression tables becomes unsynchronized, and there -is little chance that you could recover later in the archive. - -There are pending suggestions for having a per-volume or per-file -compression in @GNUTAR{}. This would allow for viewing the -contents without decompression, and for resynchronizing decompression at -every volume or file, in case of corrupted archives. Doing so, we might -lose some compressibility. But this would have make recovering easier. -So, there are pros and cons. We'll see! - -@opindex bzip2 -@item -j -@itemx --bzip2 -Filter the archive through @code{bzip2}. Otherwise like @option{--gzip}. - -@opindex compress -@opindex uncompress -@item -Z -@itemx --compress -@itemx --uncompress -Filter the archive through @command{compress}. Otherwise like @option{--gzip}. - -The @acronym{GNU} Project recommends you not use -@command{compress}, because there is a patent covering the algorithm it -uses. You could be sued for patent infringement merely by running -@command{compress}. +@noindent +where symbols preceeded by @samp{%} are @dfn{macro characters} that +have the following meaning: -@opindex use-compress-program -@item --use-compress-program=@var{prog} -Use external compression program @var{prog}. Use this option if you -have a compression program that @GNUTAR{} does not support. There -are two requirements to which @var{prog} should comply: +@multitable @columnfractions .25 .55 +@headitem Meta-character @tab Replaced By +@item %d @tab The directory name of the file, equivalent to the +result of the @command{dirname} utility on its full name. +@item %f @tab The file name of the file, equivalent to the result +of the @command{basename} utility on its full name. +@item %p @tab The process ID of the @command{tar} process that +created the archive. +@item %n @tab Ordinal number of this particular part. +@end multitable -First, when called without options, it should read data from standard -input, compress it and output it on standard output. +For example, if, a file @file{var/longfile} was split during archive +creation between three volumes, and the creator @command{tar} process +had process ID @samp{27962}, then the member names will be: -Secondly, if called with @option{-d} argument, it should do exactly -the opposite, i.e., read the compressed data from the standard input -and produce uncompressed data on the standard output. -@end table +@smallexample +var/longfile +var/GNUFileParts.27962/longfile.1 +var/GNUFileParts.27962/longfile.2 +@end smallexample -@cindex gpg, using with tar -@cindex gnupg, using with tar -@cindex Using encrypted archives -The @option{--use-compress-program} option, in particular, lets you -implement your own filters, not necessarily dealing with -compression/decomression. For example, suppose you wish to implement -PGP encryption on top of compression, using @command{gpg} (@pxref{Top, -gpg, gpg ---- encryption and signing tool, gpg, GNU Privacy Guard -Manual}). The following script does that: +When you extract your archive using a third-party @command{tar}, these +files will be created on your disk, and the only thing you will need +to do to restore your file in its original form is concatenate them in +the proper order, for example: @smallexample @group -#! /bin/sh -case $1 in --d) gpg --decrypt - | gzip -d -c;; -'') gzip -c | gpg -s ;; -*) echo "Unknown option $1">&2; exit 1;; -esac +$ @kbd{cd var} +$ @kbd{cat GNUFileParts.27962/longfile.1 \ + GNUFileParts.27962/longfile.2 >> longfile} +$ rm -f GNUFileParts.27962 @end group @end smallexample -Suppose you name it @file{gpgz} and save it somewhere in your -@env{PATH}. Then the following command will create a commpressed -archive signed with your private key: +Notice, that if the @command{tar} implementation you use supports PAX +format archives, it will probably emit warnings about unknown keywords +during extraction. They will lool like this: @smallexample -$ @kbd{tar -cf foo.tar.gpgz --use-compress=gpgz .} +@group +Tar file too small +Unknown extended header keyword 'GNU.volume.filename' ignored. +Unknown extended header keyword 'GNU.volume.size' ignored. +Unknown extended header keyword 'GNU.volume.offset' ignored. +@end group @end smallexample @noindent -Likewise, the following command will list its contents: +You can safely ignore these warnings. + +If your @command{tar} implementation is not PAX-aware, you will get +more warnigns and more files generated on your disk, e.g.: @smallexample -$ @kbd{tar -tf foo.tar.gpgz --use-compress=gpgz .} +@group +$ @kbd{tar xf vol-1.tar} +var/PaxHeaders.27962/longfile: Unknown file type 'x', extracted as +normal file +Unexpected EOF in archive +$ @kbd{tar xf vol-2.tar} +tmp/GlobalHead.27962.1: Unknown file type 'g', extracted as normal file +GNUFileParts.27962/PaxHeaders.27962/sparsefile.1: Unknown file type +'x', extracted as normal file +@end group @end smallexample -@ignore -The above is based on the following discussion: - - I have one question, or maybe it's a suggestion if there isn't a way - to do it now. I would like to use @option{--gzip}, but I'd also like - the output to be fed through a program like @acronym{GNU} - @command{ecc} (actually, right now that's @samp{exactly} what I'd like - to use :-)), basically adding ECC protection on top of compression. - It seems as if this should be quite easy to do, but I can't work out - exactly how to go about it. Of course, I can pipe the standard output - of @command{tar} through @command{ecc}, but then I lose (though I - haven't started using it yet, I confess) the ability to have - @command{tar} use @command{rmt} for it's I/O (I think). - - I think the most straightforward thing would be to let me specify a - general set of filters outboard of compression (preferably ordered, - so the order can be automatically reversed on input operations, and - with the options they require specifiable), but beggars shouldn't be - choosers and anything you decide on would be fine with me. - - By the way, I like @command{ecc} but if (as the comments say) it can't - deal with loss of block sync, I'm tempted to throw some time at adding - that capability. Supposing I were to actually do such a thing and - get it (apparently) working, do you accept contributed changes to - utilities like that? (Leigh Clayton @file{loc@@soliton.com}, May 1995). - - Isn't that exactly the role of the - @option{--use-compress-prog=@var{program}} option? - I never tried it myself, but I suspect you may want to write a - @var{prog} script or program able to filter stdin to stdout to - way you want. It should recognize the @option{-d} option, for when - extraction is needed rather than creation. - - It has been reported that if one writes compressed data (through the - @option{--gzip} or @option{--compress} options) to a DLT and tries to use - the DLT compression mode, the data will actually get bigger and one will - end up with less space on the tape. -@end ignore - -@node sparse -@subsection Archiving Sparse Files -@cindex Sparse Files -@UNREVISED - -@table @option -@opindex sparse -@item -S -@itemx --sparse -Handle sparse files efficiently. -@end table - -This option causes all files to be put in the archive to be tested for -sparseness, and handled specially if they are. The @option{--sparse} -(@option{-S}) option is useful when many @code{dbm} files, for example, are being -backed up. Using this option dramatically decreases the amount of -space needed to store such a file. - -In later versions, this option may be removed, and the testing and -treatment of sparse files may be done automatically with any special -@acronym{GNU} options. For now, it is an option needing to be specified on -the command line with the creation or updating of an archive. +Ignore these warnings. The @file{PaxHeaders.*} directories created +will contain files with @dfn{extended header keywords} describing the +extracted files. You can delete them, unless they describe sparse +members. Read further to learn more about them. -Files in the file system occasionally have @dfn{holes}. A @dfn{hole} in a file -is a section of the file's contents which was never written. The -contents of a hole read as all zeros. On many operating systems, -actual disk storage is not allocated for holes, but they are counted -in the length of the file. If you archive such a file, @command{tar} -could create an archive longer than the original. To have @command{tar} -attempt to recognize the holes in a file, use @option{--sparse} (@option{-S}). When -you use this option, then, for any file using less disk space than -would be expected from its length, @command{tar} searches the file for -consecutive stretches of zeros. It then records in the archive for -the file where the consecutive stretches of zeros are, and only -archives the ``real contents'' of the file. On extraction (using -@option{--sparse} is not needed on extraction) any such -files have holes created wherever the continuous stretches of zeros -were found. Thus, if you use @option{--sparse}, @command{tar} archives -won't take more space than the original. +@node Sparse Recovery +@subsubsection Extracting Sparse Members -A file is sparse if it contains blocks of zeros whose existence is -recorded, but that have no space allocated on disk. When you specify -the @option{--sparse} option in conjunction with the @option{--create} -(@option{-c}) operation, @command{tar} tests all files for sparseness -while archiving. If @command{tar} finds a file to be sparse, it uses a -sparse representation of the file in the archive. @xref{create}, for -more information about creating archives. +Any @command{tar} implementation will be able to extract sparse members from a +PAX archive. However, the extracted files will be @dfn{condensed}, +i.e. any zero blocks will be removed from them. When we restore such +a condensed file to its original form, by adding zero bloks (or +@dfn{holes}) back to their original locations, we call this process +@dfn{expanding} a compressed sparse file. -@option{--sparse} is useful when archiving files, such as dbm files, -likely to contain many nulls. This option dramatically -decreases the amount of space needed to store such an archive. +To expand a file, you will need a simple auxiliary program called +@command{xsparse}. It is available in source form from +@uref{http://www.gnu.org/@/software/@/tar/@/utils/@/xsparse, @GNUTAR{} +home page}. -@quotation -@strong{Please Note:} Always use @option{--sparse} when performing file -system backups, to avoid archiving the expanded forms of files stored -sparsely in the system. +Let's begin with archive members in @dfn{sparse format +version 1.0}@footnote{@xref{PAX 1}.}, which are the easiest to expand. +The condensed file will contain both file map and file data, so no +additional data will be needed to restore it. If the original file +name was @file{@var{dir}/@var{name}}, then the condensed file will be +named @file{@var{dir}/@/GNUSparseFile.@var{n}/@/@var{name}}, where +@var{n} is a decimal number@footnote{technically speaking, @var{n} is a +@dfn{process ID} of the @command{tar} process which created the +archive (@pxref{PAX keywords}).}. -Even if your system has no sparse files currently, some may be -created in the future. If you use @option{--sparse} while making file -system backups as a matter of course, you can be assured the archive -will never take more space on the media than the files take on disk -(otherwise, archiving a disk filled with sparse files might take -hundreds of tapes). @xref{Incremental Dumps}. -@end quotation +To expand a version 1.0 file, run @command{xsparse} as follows: -@command{tar} ignores the @option{--sparse} option when reading an archive. +@smallexample +$ @kbd{xsparse @file{cond-file}} +@end smallexample -@table @option -@item --sparse -@itemx -S -Files stored sparsely in the file system are represented sparsely in -the archive. Use in conjunction with write operations. -@end table +@noindent +where @file{cond-file} is the name of the condensed file. The utility +will deduce the name for the resulting expanded file using the +following algorithm: -However, users should be well aware that at archive creation time, -@GNUTAR{} still has to read whole disk file to -locate the @dfn{holes}, and so, even if sparse files use little space -on disk and in the archive, they may sometimes require inordinate -amount of time for reading and examining all-zero blocks of a file. -Although it works, it's painfully slow for a large (sparse) file, even -though the resulting tar archive may be small. (One user reports that -dumping a @file{core} file of over 400 megabytes, but with only about -3 megabytes of actual data, took about 9 minutes on a Sun Sparcstation -ELC, with full CPU utilization.) - -This reading is required in all cases and is not related to the fact -the @option{--sparse} option is used or not, so by merely @emph{not} -using the option, you are not saving time@footnote{Well! We should say -the whole truth, here. When @option{--sparse} is selected while creating -an archive, the current @command{tar} algorithm requires sparse files to be -read twice, not once. We hope to develop a new archive format for saving -sparse files in which one pass will be sufficient.}. +@enumerate 1 +@item If @file{cond-file} does not contain any directories, +@file{../cond-file} will be used; -Programs like @command{dump} do not have to read the entire file; by -examining the file system directly, they can determine in advance -exactly where the holes are and thus avoid reading through them. The -only data it need read are the actual allocated data blocks. -@GNUTAR{} uses a more portable and straightforward -archiving approach, it would be fairly difficult that it does -otherwise. Elizabeth Zwicky writes to @file{comp.unix.internals}, on -1990-12-10: +@item If @file{cond-file} has the form +@file{@var{dir}/@var{t}/@var{name}}, where both @var{t} and @var{name} +are simple names, with no @samp{/} characters in them, the output file +name will be @file{@var{dir}/@var{name}}. -@quotation -What I did say is that you cannot tell the difference between a hole and an -equivalent number of nulls without reading raw blocks. @code{st_blocks} at -best tells you how many holes there are; it doesn't tell you @emph{where}. -Just as programs may, conceivably, care what @code{st_blocks} is (care -to name one that does?), they may also care where the holes are (I have -no examples of this one either, but it's equally imaginable). +@item Otherwise, if @file{cond-file} has the form +@file{@var{dir}/@var{name}}, the output file name will be +@file{@var{name}}. +@end enumerate -I conclude from this that good archivers are not portable. One can -arguably conclude that if you want a portable program, you can in good -conscience restore files with as many holes as possible, since you can't -get it right. -@end quotation +In the unlikely case when this algorithm does not suite your needs, +you can explicitely specify output file name as a second argument to +the command: -@node Attributes -@section Handling File Attributes -@UNREVISED +@smallexample +$ @kbd{xsparse @file{cond-file}} +@end smallexample -When @command{tar} reads files, it updates their access times. To -avoid this, use the @option{--atime-preserve[=METHOD]} option, which can either -reset the access time retroactively or avoid changing it in the first -place. +It is often a good idea to run @command{xsparse} in @dfn{dry run} mode +first. In this mode, the command does not actually expand the file, +but verbosely lists all actions it would be taking to do so. The dry +run mode is enabled by @option{-n} command line argument: -Handling of file attributes +@smallexample +@group +$ @kbd{xsparse -n /home/gray/GNUSparseFile.6058/sparsefile} +Reading v.1.0 sparse map +Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to +`/home/gray/sparsefile' +Finished dry run +@end group +@end smallexample -@table @option -@opindex atime-preserve -@item --atime-preserve -@itemx --atime-preserve=replace -@itemx --atime-preserve=system -Preserve the access times of files that are read. This works only for -files that you own, unless you have superuser privileges. +To actually expand the file, you would run: -@option{--atime-preserve=replace} works on most systems, but it also -restores the data modification time and updates the status change -time. Hence it doesn't interact with incremental dumps nicely -(@pxref{Backups}), and it can set access or data modification times -incorrectly if other programs access the file while @command{tar} is -running. +@smallexample +$ @kbd{xsparse /home/gray/GNUSparseFile.6058/sparsefile} +@end smallexample -@option{--atime-preserve=system} avoids changing the access time in -the first place, if the operating system supports this. -Unfortunately, this may or may not work on any given operating system -or file system. If @command{tar} knows for sure it won't work, it -complains right away. +@noindent +The program behaves the same way all UNIX utilities do: it will keep +quiet unless it has simething important to tell you (e.g. an error +condition or something). If you wish it to produce verbose output, +similar to that from the dry run mode, give it @option{-v} option: -Currently @option{--atime-preserve} with no operand defaults to -@option{--atime-preserve=replace}, but this is intended to change to -@option{--atime-preserve=system} when the latter is better-supported. +@smallexample +@group +$ @kbd{xsparse -v /home/gray/GNUSparseFile.6058/sparsefile} +Reading v.1.0 sparse map +Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to +`/home/gray/sparsefile' +Done +@end group +@end smallexample -@opindex touch -@item -m -@itemx --touch -Do not extract data modification time. +Additionally, if your @command{tar} implementation has extracted the +@dfn{extended headers} for this file, you can instruct @command{xstar} +to use them in order to verify the integrity of the expanded file. +The option @option{-x} sets the name of the extended header file to +use. Continuing our example: -When this option is used, @command{tar} leaves the data modification times -of the files it extracts as the times when the files were extracted, -instead of setting it to the times recorded in the archive. +@smallexample +@group +$ @kbd{xsparse -v -x /home/gray/PaxHeaders.6058/sparsefile \ + /home/gray/GNUSparseFile.6058/sparsefile} +Reading extended header file +Found variable GNU.sparse.major = 1 +Found variable GNU.sparse.minor = 0 +Found variable GNU.sparse.name = sparsefile +Found variable GNU.sparse.realsize = 217481216 +Reading v.1.0 sparse map +Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to +`/home/gray/sparsefile' +Done +@end group +@end smallexample -This option is meaningless with @option{--list} (@option{-t}). +An @dfn{extended header} is a special @command{tar} archive header +that precedes an archive member and contains a set of +@dfn{variables}, describing the member properties that cannot be +stored in the standard @code{ustar} header. While optional for +expanding sparse version 1.0 members, use of extended headers is +mandatory when expanding sparse members in older sparse formats: v.0.0 +and v.0.1 (The sparse formats are described in detail in @pxref{Sparse +Formats}). So, for this format, the question is: how to obtain +extended headers from the archive? -@opindex same-owner -@item --same-owner -Create extracted files with the same ownership they have in the -archive. +If you use a @command{tar} implementation that does not support PAX +format, extended headers for each member will be extracted as a +separate file. If we represent the member name as +@file{@var{dir}/@var{name}}, then the extended header file will be +named @file{@var{dir}/@/PaxHeaders.@var{n}/@/@var{name}}, where +@var{n} is an integer number. -This is the default behavior for the superuser, -so this option is meaningful only for non-root users, when @command{tar} -is executed on those systems able to give files away. This is -considered as a security flaw by many people, at least because it -makes quite difficult to correctly account users for the disk space -they occupy. Also, the @code{suid} or @code{sgid} attributes of -files are easily and silently lost when files are given away. +Things become more difficult if your @command{tar} implementation +does support PAX headers, because in this case you will have to +manually extract the headers. We recommend the following algorithm: -When writing an archive, @command{tar} writes the user id and user name -separately. If it can't find a user name (because the user id is not -in @file{/etc/passwd}), then it does not write one. When restoring, -it tries to look the name (if one was written) up in -@file{/etc/passwd}. If it fails, then it uses the user id stored in -the archive instead. +@enumerate 1 +@item +Consult the documentation for your @command{tar} implementation for an +option that will print @dfn{block numbers} along with the archive +listing (analogous to @GNUTAR{}'s @option{-R} option). For example, +@command{star} has @option{-block-number}. -@opindex no-same-owner -@item --no-same-owner -@itemx -o -Do not attempt to restore ownership when extracting. This is the -default behavior for ordinary users, so this option has an effect -only for the superuser. +@item +Obtain the verbose listing using the @samp{block number} option, and +find the position of the sparse member in question and the member +immediately following it. For example, running @command{star} on our +archive we obtain: -@opindex numeric-owner -@item --numeric-owner -The @option{--numeric-owner} option allows (ANSI) archives to be written -without user/group name information or such information to be ignored -when extracting. It effectively disables the generation and/or use -of user/group name information. This option forces extraction using -the numeric ids from the archive, ignoring the names. +@smallexample +@group +$ @kbd{star -t -v -block-number -f arc.tar} +@dots{} +star: Unknown extended header keyword 'GNU.sparse.size' ignored. +star: Unknown extended header keyword 'GNU.sparse.numblocks' ignored. +star: Unknown extended header keyword 'GNU.sparse.name' ignored. +star: Unknown extended header keyword 'GNU.sparse.map' ignored. +block 56: 425984 -rw-r--r-- gray/users Jun 25 14:46 2006 GNUSparseFile.28124/sparsefile +block 897: 65391 -rw-r--r-- gray/users Jun 24 20:06 2006 README +@dots{} +@end group +@end smallexample -This is useful in certain circumstances, when restoring a backup from -an emergency floppy with different passwd/group files for example. -It is otherwise impossible to extract files with the right ownerships -if the password file in use during the extraction does not match the -one belonging to the file system(s) being extracted. This occurs, -for example, if you are restoring your files after a major crash and -had booted from an emergency floppy with no password file or put your -disk into another machine to do the restore. +@noindent +(as usual, ignore the warnings about unknown keywords.) -The numeric ids are @emph{always} saved into @command{tar} archives. -The identifying names are added at create time when provided by the -system, unless @option{--old-archive} (@option{-o}) is used. Numeric ids could be -used when moving archives between a collection of machines using -a centralized management for attribution of numeric ids to users -and groups. This is often made through using the NIS capabilities. +@item +Let @var{size} be the size of the sparse member, @var{Bs} be its block number +and @var{Bn} be the block number of the next member. +Compute: -When making a @command{tar} file for distribution to other sites, it -is sometimes cleaner to use a single owner for all files in the -distribution, and nicer to specify the write permission bits of the -files as stored in the archive independently of their actual value on -the file system. The way to prepare a clean distribution is usually -to have some Makefile rule creating a directory, copying all needed -files in that directory, then setting ownership and permissions as -wanted (there are a lot of possible schemes), and only then making a -@command{tar} archive out of this directory, before cleaning -everything out. Of course, we could add a lot of options to -@GNUTAR{} for fine tuning permissions and ownership. -This is not the good way, I think. @GNUTAR{} is -already crowded with options and moreover, the approach just explained -gives you a great deal of control already. +@smallexample +@var{N} = @var{Bs} - @var{Bn} - @var{size}/512 - 2 +@end smallexample -@xopindex{same-permissions, short description} -@xopindex{preserve-permissions, short description} -@item -p -@itemx --same-permissions -@itemx --preserve-permissions -Extract all protection information. +@noindent +This number gives the size of the extended header part in tar @dfn{blocks}. +In our example, this formula gives: @code{897 - 56 - 425984 / 512 - 2 += 7}. -This option causes @command{tar} to set the modes (access permissions) of -extracted files exactly as recorded in the archive. If this option -is not used, the current @code{umask} setting limits the permissions -on extracted files. This option is by default enabled when -@command{tar} is executed by a superuser. +@item +Use @command{dd} to extract the headers: +@smallexample +@kbd{dd if=@var{archive} of=@var{hname} bs=512 skip=@var{Bs} count=@var{N}} +@end smallexample -This option is meaningless with @option{--list} (@option{-t}). +@noindent +where @var{archive} is the archive name, @var{hname} is a name of the +file to store the extended header in, @var{Bs} and @var{N} are +computed in previous steps. -@opindex preserve -@item --preserve -Same as both @option{--same-permissions} and @option{--same-order}. +In our example, this command will be -The @option{--preserve} option has no equivalent short option name. -It is equivalent to @option{--same-permissions} plus @option{--same-order}. +@smallexample +$ @kbd{dd if=arc.tar of=xhdr bs=512 skip=56 count=7} +@end smallexample +@end enumerate -@FIXME{I do not see the purpose of such an option. (Neither I. FP.) -Neither do I. --Sergey} +Finally, you can expand the condensed file, using the obtained header: -@end table +@smallexample +@group +$ @kbd{xsparse -v -x xhdr GNUSparseFile.6058/sparsefile} +Reading extended header file +Found variable GNU.sparse.size = 217481216 +Found variable GNU.sparse.numblocks = 208 +Found variable GNU.sparse.name = sparsefile +Found variable GNU.sparse.map = 0,2048,1050624,2048,@dots{} +Expanding file `GNUSparseFile.28124/sparsefile' to `sparsefile' +Done +@end group +@end smallexample @node cpio @section Comparison of @command{tar} and @command{cpio} -- 2.44.0