From 39e5d9182c02b0a5204d406794640ef6e71bdcb8 Mon Sep 17 00:00:00 2001
From: Sergey Poznyakoff <gray@gnu.org.ua>
Date: Sun, 25 Jun 2006 12:45:03 +0000
Subject: [PATCH] (Other Tars): New node describing how to extract GNU-specific
 member formats using third-party tars.

---
 doc/tar.texi | 375 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 364 insertions(+), 11 deletions(-)

diff --git a/doc/tar.texi b/doc/tar.texi
index d39c2de..1d1131d 100644
--- a/doc/tar.texi
+++ b/doc/tar.texi
@@ -109,8 +109,8 @@ Appendices
 
 * Changes::
 * Configuring Help Summary::
-* Genfile::
 * Tar Internals::
+* Genfile::
 * Free Software Needs Free Documentation::
 * Copying This Manual::
 * Index of Command Line Options::
@@ -330,11 +330,18 @@ Making @command{tar} Archives More Portable
 * posix::                       @acronym{POSIX} archives
 * Checksumming::                Checksumming Problems
 * Large or Negative Values::    Large files, negative time stamps, etc.
+* Other Tars::                  How to Extract GNU-Specific Data Using
+                                Other @command{tar} Implementations
 
 @GNUTAR{} and @acronym{POSIX} @command{tar}
 
 * PAX keywords:: Controlling Extended Header Keywords.
 
+How to Extract GNU-Specific Data Using Other @command{tar} Implementations
+
+* Split Recovery::       Members Split Between Volumes
+* Sparse Recovery::      Sparse Members
+
 Using Less Space through Compression
 
 * gzip::                        Creating and Reading Compressed Archives
@@ -369,12 +376,6 @@ Using Multiple Tapes
 * Tarcat::                      Concatenate Volumes into a Single Archive
 
 
-Genfile
-
-* Generate Mode::     File Generation Mode.
-* Status Mode::       File Status Mode.
-* Exec Mode::         Synchronous Execution mode.
-
 Tar Internals
 
 * Standard::           Basic Tar Format
@@ -389,6 +390,12 @@ Storing Sparse Files
 * PAX 0::                PAX Format, Versions 0.0 and 0.1
 * PAX 1::                PAX Format, Version 1.0
 
+Genfile
+
+* Generate Mode::     File Generation Mode.
+* Status Mode::       File Status Mode.
+* Exec Mode::         Synchronous Execution mode.
+
 Copying This Manual
 
 * GNU Free Documentation License::  License for copying this manual
@@ -7753,6 +7760,8 @@ archives and archive labels) in GNU and PAX formats.}
 * posix::                       @acronym{POSIX} archives
 * Checksumming::                Checksumming Problems
 * Large or Negative Values::    Large files, negative time stamps, etc.
+* Other Tars::                  How to Extract GNU-Specific Data Using
+                                Other @command{tar} Implementations
 @end menu
 
 @node Portable Names
@@ -8070,6 +8079,350 @@ be extracted by any tar implementation that understands older
 @FIXME{Describe how @acronym{POSIX} archives are extracted by non
 POSIX-aware tars.}
 
+@node Other Tars
+@subsection How to Extract GNU-Specific Data Using Other @command{tar} Implementations
+
+In previous sections you became acquainted with various quirks
+necessary to make your archives portable.  Sometimes you may need to
+extract archives containing GNU-specific members using some
+third-party @command{tar} implementation or an older version of
+@GNUTAR{}.  Of course, your best bet is to have @GNUTAR{} installed,
+but if that is impossible for some reason, this section explains
+how to cope without it.
+
+When we speak about @dfn{GNU-specific} members we mean two classes of
+them: members split between the volumes of a multi-volume archive and
+sparse members.  You will always be able to recover such members if
+the archive is in PAX format.  In addition, split members can be
+recovered from archives in old GNU format.  The following subsections
+describe the required procedures in detail.
+
+@menu
+* Split Recovery::       Members Split Between Volumes
+* Sparse Recovery::      Sparse Members
+@end menu
+
+@node Split Recovery
+@subsubsection Extracting Members Split Between Volumes
+
+If a member is split between several volumes of an old GNU format archive,
+most third-party @command{tar} implementations will fail to extract
+it.  To extract it, use the @command{tarcat} program (@pxref{Tarcat}).
+This program is available from the
+@uref{http://www.gnu.org/@/software/@/tar/@/utils/@/tarcat, @GNUTAR{}
+home page}.  It concatenates several archive volumes into a single
+valid archive.  For example, if you have three volumes named from
+@file{vol-1.tar} to @file{vol-3.tar}, you can do the following to
+extract them using a third-party @command{tar}:
+
+@smallexample
+$ @kbd{tarcat vol-1.tar vol-2.tar vol-3.tar | tar xf -}
+@end smallexample
+
+You could use this approach for many (although not all) PAX
+format archives as well.  However, extracting split members from a PAX
+archive is a much easier task, because PAX volumes are constructed in
+such a way that each part of a split member is extracted as a
+different file by @command{tar} implementations that are not aware of
+GNU extensions.  More specifically, the very first part retains its
+original name, and all subsequent parts are named using the pattern:
+
+@smallexample
+%d/GNUFileParts.%p/%f.%n
+@end smallexample
+
+@noindent
+where symbols preceded by @samp{%} are @dfn{meta-characters} that
+have the following meaning:
+
+@multitable @columnfractions .25 .55
+@headitem Meta-character @tab Replaced By
+@item %d @tab  The directory name of the file, equivalent to the
+result of the @command{dirname} utility on its full name.
+@item %f @tab  The file name of the file, equivalent to the result
+of the @command{basename} utility on its full name.
+@item %p @tab  The process ID of the @command{tar} process that
+created the archive.
+@item %n @tab  Ordinal number of this particular part.
+@end multitable
+
+For example, if a file @file{var/longfile} was split during archive
+creation between three volumes, and the creator @command{tar} process
+had process ID @samp{27962}, then the member names will be:
+
+@smallexample
+var/longfile
+var/GNUFileParts.27962/longfile.1
+var/GNUFileParts.27962/longfile.2
+@end smallexample
+
+When you extract your archive using a third-party @command{tar}, these
+files will be created on your disk, and the only thing you will need
+to do to restore your file in its original form is concatenate them in
+the proper order, for example:
+
+@smallexample
+@group
+$ @kbd{cd var}
+$ @kbd{cat GNUFileParts.27962/longfile.1 \
+  GNUFileParts.27962/longfile.2 >> longfile}
+$ @kbd{rm -rf GNUFileParts.27962}
+@end group
+@end smallexample
+
+Note that if the @command{tar} implementation you use supports PAX
+format archives, it will probably emit warnings about unknown keywords
+during extraction.  They will look like this:
+
+@smallexample
+@group
+Tar file too small
+Unknown extended header keyword 'GNU.volume.filename' ignored.
+Unknown extended header keyword 'GNU.volume.size' ignored.
+Unknown extended header keyword 'GNU.volume.offset' ignored.
+@end group
+@end smallexample
+
+@noindent
+You can safely ignore these warnings.
+
+If your @command{tar} implementation is not PAX-aware, you will get
+more warnings and more files generated on your disk, e.g.:
+
+@smallexample
+@group
+$ @kbd{tar xf vol-1.tar}
+var/PaxHeaders.27962/longfile: Unknown file type 'x', extracted as
+normal file 
+Unexpected EOF in archive
+$ @kbd{tar xf vol-2.tar}
+tmp/GlobalHead.27962.1: Unknown file type 'g', extracted as normal file
+GNUFileParts.27962/PaxHeaders.27962/sparsefile.1: Unknown file type
+'x', extracted as normal file
+@end group
+@end smallexample
+
+Ignore these warnings.  The @file{PaxHeaders.*} directories created
+will contain files with @dfn{extended header keywords} describing the
+extracted files.  You can delete them, unless they describe sparse
+members.  Read further to learn more about them.
+
+@node Sparse Recovery
+@subsubsection Extracting Sparse Members
+
+Any @command{tar} implementation will be able to extract sparse members from a
+PAX archive.  However, the extracted files will be @dfn{condensed},
+i.e., any zero blocks will be removed from them.  When we restore such
+a condensed file to its original form, by adding zero blocks (or
+@dfn{holes}) back to their original locations, we call this process
+@dfn{expanding} a condensed sparse file.
+
+To expand a file, you will need a simple auxiliary program called
+@command{xsparse}.  It is available in source form from the
+@uref{http://www.gnu.org/@/software/@/tar/@/utils/@/xsparse, @GNUTAR{}
+home page}.
+
+Let's begin with archive members in @dfn{sparse format
+version 1.0}@footnote{@xref{PAX 1}.}, which are the easiest to expand.
+The condensed file will contain both file map and file data, so no
+additional data will be needed to restore it.  If the original file
+name was @file{@var{dir}/@var{name}}, then the condensed file will be
+named @file{@var{dir}/@/GNUSparseFile.@var{n}/@/@var{name}}, where
+@var{n} is a decimal number@footnote{Technically speaking, @var{n} is the
+@dfn{process ID} of the @command{tar} process which created the
+archive (@pxref{PAX keywords}).}.
+
+To expand a version 1.0 file, run @command{xsparse} as follows:
+
+@smallexample
+$ @kbd{xsparse @file{cond-file}}
+@end smallexample
+
+@noindent
+where @file{cond-file} is the name of the condensed file.  The utility
+will deduce the name for the resulting expanded file using the
+following algorithm:
+
+@enumerate 1
+@item If @file{cond-file} does not contain any directories,
+@file{../cond-file} will be used;
+
+@item If @file{cond-file} has the form
+@file{@var{dir}/@var{t}/@var{name}}, where both @var{t} and @var{name}
+are simple names, with no @samp{/} characters in them, the output file
+name will be @file{@var{dir}/@var{name}}.
+
+@item Otherwise, if @file{cond-file} has the form
+@file{@var{dir}/@var{name}}, the output file name will be
+@file{@var{name}}.
+@end enumerate
+
+In the unlikely case when this algorithm does not suit your needs,
+you can explicitly specify the output file name as a second argument to
+the command:
+
+@smallexample
+$ @kbd{xsparse @file{cond-file} @file{out-file}}
+@end smallexample
+
+It is often a good idea to run @command{xsparse} in @dfn{dry run} mode
+first.  In this mode, the command does not actually expand the file,
+but verbosely lists all actions it would take to do so.  The dry
+run mode is enabled by the @option{-n} command line option:
+
+@smallexample
+@group
+$ @kbd{xsparse -n /home/gray/GNUSparseFile.6058/sparsefile}
+Reading v.1.0 sparse map
+Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to
+`/home/gray/sparsefile'
+Finished dry run
+@end group
+@end smallexample
+
+To actually expand the file, you would run:
+
+@smallexample
+$ @kbd{xsparse /home/gray/GNUSparseFile.6058/sparsefile}
+@end smallexample
+
+@noindent
+The program behaves the same way all UNIX utilities do: it will keep
+quiet unless it has something important to tell you (e.g., an error
+condition).  If you wish it to produce verbose output,
+similar to that from the dry run mode, give it the @option{-v} option:
+
+@smallexample
+@group
+$ @kbd{xsparse -v /home/gray/GNUSparseFile.6058/sparsefile}
+Reading v.1.0 sparse map
+Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to
+`/home/gray/sparsefile'
+Done
+@end group
+@end smallexample
+
+Additionally, if your @command{tar} implementation has extracted the
+@dfn{extended headers} for this file, you can instruct @command{xsparse}
+to use them in order to verify the integrity of the expanded file.
+The option @option{-x} sets the name of the extended header file to
+use.  Continuing our example:
+
+@smallexample
+@group
+$ @kbd{xsparse -v -x /home/gray/PaxHeaders.6058/sparsefile \
+  /home/gray/GNUSparseFile.6058/sparsefile}
+Reading extended header file
+Found variable GNU.sparse.major = 1
+Found variable GNU.sparse.minor = 0
+Found variable GNU.sparse.name = sparsefile
+Found variable GNU.sparse.realsize = 217481216
+Reading v.1.0 sparse map
+Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to
+`/home/gray/sparsefile'
+Done
+@end group
+@end smallexample
+
+An @dfn{extended header} is a special @command{tar} archive header
+that precedes an archive member and contains a set of
+@dfn{variables}, describing the member properties that cannot be
+stored in the standard @code{ustar} header.  While optional for
+expanding sparse version 1.0 members, use of extended headers is
+mandatory when expanding sparse members in older sparse formats: v.0.0
+and v.0.1 (the sparse formats are described in detail in @ref{Sparse
+Formats}).  So, for these formats, the question is: how to obtain
+extended headers from the archive?
+
+If you use a @command{tar} implementation that does not support PAX
+format, extended headers for each member will be extracted as a 
+separate file.  If we represent the member name as
+@file{@var{dir}/@var{name}}, then the extended header file will be
+named @file{@var{dir}/@/PaxHeaders.@var{n}/@/@var{name}}, where
+@var{n} is an integer number.
+
+Things become more difficult if your @command{tar} implementation
+does support PAX headers, because in this case you will have to
+manually extract the headers.  We recommend the following algorithm:
+
+@enumerate 1
+@item 
+Consult the documentation for your @command{tar} implementation for an
+option that will print @dfn{block numbers} along with the archive
+listing (analogous to @GNUTAR{}'s @option{-R} option).  For example,
+@command{star} has @option{-block-number}.
+
+@item
+Obtain the verbose listing using the @samp{block number} option, and
+find the position of the sparse member in question and the member
+immediately following it.  For example, running @command{star} on our
+archive we obtain:
+
+@smallexample
+@group
+$ @kbd{star -t -v -block-number -f arc.tar}
+@dots{}
+star: Unknown extended header keyword 'GNU.sparse.size' ignored.
+star: Unknown extended header keyword 'GNU.sparse.numblocks' ignored.
+star: Unknown extended header keyword 'GNU.sparse.name' ignored.
+star: Unknown extended header keyword 'GNU.sparse.map' ignored.
+block        56:  425984 -rw-r--r--  gray/users Jun 25 14:46 2006 GNUSparseFile.28124/sparsefile
+block       897:   65391 -rw-r--r--  gray/users Jun 24 20:06 2006 README
+@dots{}
+@end group
+@end smallexample
+
+@noindent
+(As usual, ignore the warnings about unknown keywords.)
+
+@item
+Let the size of the sparse member be @var{size}, its block number be
+@var{Bs} and the block number of the next member be @var{Bn}.
+Compute: 
+
+@smallexample
+@var{N} = @var{Bn} - @var{Bs} - @var{size}/512 - 2
+@end smallexample
+
+@noindent
+This number gives the size of the extended header part in tar @dfn{blocks}.
+In our example, this formula gives: @code{897 - 56 - 425984 / 512 - 2
+= 7}.
+
+@item
+Use @command{dd} to extract the headers:
+
+@smallexample
+@kbd{dd if=@var{archive} of=@var{hname} bs=512 skip=@var{Bs} count=@var{N}}
+@end smallexample
+
+@noindent
+where @var{archive} is the archive name, @var{hname} is the name of the
+file to store the extended header in, and @var{Bs} and @var{N} are
+the values obtained in the previous steps.
+
+In our example, this command will be:
+
+@smallexample
+$ @kbd{dd if=arc.tar of=xhdr bs=512 skip=56 count=7}
+@end smallexample
+@end enumerate
+
+Finally, you can expand the condensed file, using the obtained header:
+
+@smallexample
+@group
+$ @kbd{xsparse -v -x xhdr GNUSparseFile.28124/sparsefile}
+Reading extended header file
+Found variable GNU.sparse.size = 217481216
+Found variable GNU.sparse.numblocks = 208
+Found variable GNU.sparse.name = sparsefile
+Found variable GNU.sparse.map = 0,2048,1050624,2048,@dots{}
+Expanding file `GNUSparseFile.28124/sparsefile' to `sparsefile'
+Done
+@end group
+@end smallexample
+
 @node Compression
 @section Using Less Space through Compression
 
@@ -10432,14 +10785,14 @@ output. Default is 12.
 Right margin of the text output. Used for wrapping.
 @end deftypevr
 
-@node Genfile
-@appendix Genfile
-@include genfile.texi
-
 @node Tar Internals
 @appendix Tar Internals
 @include intern.texi
 
+@node Genfile
+@appendix Genfile
+@include genfile.texi
+
 @node Free Software Needs Free Documentation
 @appendix Free Software Needs Free Documentation
 @include freemanuals.texi
-- 
2.44.0