@c This is part of the paxutils manual.
@c Copyright (C) 2006 Free Software Foundation, Inc.
@c This file is distributed under GFDL 1.1 or any later version
@c published by the Free Software Foundation.

The notion of a sparse file, and the ways of handling it from the point
of view of a @GNUTAR{} user, have been described in detail in
@ref{sparse}.  This chapter describes the internal formats @GNUTAR{}
uses to store such files.

The support for sparse files in @GNUTAR{} has a long history.  The
earliest version featuring this support that I was able to find was 1.09,
released in November, 1990.  The format introduced back then is called
@dfn{old GNU} sparse format and, in spite of the fact that its design
contained many flaws, it was the only format @GNUTAR{} supported
until version 1.14 (May, 2004), which introduced initial support for
sparse files in @acronym{PAX} archives (@pxref{posix}).  This
format was not free from design flaws either, and it was subsequently
improved in versions 1.15.2 (November, 2005) and 1.15.92 (June,
2006).

In addition to the GNU sparse formats, @GNUTAR{} is able to read and
extract sparse files archived by @command{star}.

The following subsections describe each format in detail.

@menu
* Old GNU Format::
* PAX 0::                PAX Format, Versions 0.0 and 0.1
* PAX 1::                PAX Format, Version 1.0
@end menu

@node Old GNU Format
@appendixsubsec Old GNU Format

This format was introduced some time around 1990 (v. 1.09).  It was
designed on top of standard @code{ustar} headers in such an
unfortunate way that some of its fields overwrote fields required by
POSIX.

An old GNU sparse header is designated by type @samp{S}
(@code{GNUTYPE_SPARSE}) and has the following layout:

@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
@headitem Offset @tab Size @tab Name   @tab Data type   @tab Contents
@item          0 @tab 345  @tab        @tab N/A         @tab Not used.
@item        345 @tab  12  @tab atime  @tab Number      @tab @code{atime} of the file.
@item        357 @tab  12  @tab ctime  @tab Number      @tab @code{ctime} of the file.
@item        369 @tab  12  @tab offset @tab Number      @tab For
multivolume archives: the offset of the start of this volume.
@item        381 @tab   4  @tab        @tab N/A         @tab Not used.
@item        385 @tab   1  @tab        @tab N/A         @tab Not used.
@item        386 @tab  96  @tab sp     @tab @code{sparse_header} @tab (4 entries) File map.
@item        482 @tab   1  @tab isextended @tab Bool        @tab @code{1} if an
extension sparse header follows, @code{0} otherwise.
@item        483 @tab  12  @tab realsize @tab Number      @tab Real size of the file.
@end multitable

Each @code{sparse_header} object in the array at offset 386 describes a
single data chunk.  It has the following structure:

@multitable @columnfractions 0.10 0.10 0.20 0.60
@headitem Offset @tab Size @tab Data type   @tab Contents
@item          0 @tab   12 @tab Number      @tab Offset of the
beginning of the chunk.
@item         12 @tab   12 @tab Number      @tab Size of the chunk.
@end multitable
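
As an illustration, the layout above could be decoded along the
following lines.  This is a minimal Python sketch, not the code
@GNUTAR{} itself uses.  The function names are hypothetical.  It assumes
that each Number is an octal ASCII string padded with NULs or spaces
(any binary encoding of large values is ignored), that an all-blank
entry ends the list, and that any @code{isextended} value other than
binary zero or the character @samp{0} means ``true''.

```python
BLOCK_SIZE = 512
SP_OFFSET = 386          # start of the sp array in the main header
SP_ENTRIES = 4           # number of entries in the main header
ENTRY_SIZE = 24          # 12-byte offset followed by 12-byte size
ISEXTENDED_OFFSET = 482
REALSIZE_OFFSET = 483


def parse_number(field):
    """Decode an octal ASCII number padded with NULs or spaces."""
    text = field.strip(b"\0 ").decode("ascii")
    return int(text, 8) if text else 0


def parse_entries(block, start, count):
    """Return (offset, size) pairs, stopping at the first blank entry."""
    entries = []
    for i in range(count):
        pos = start + i * ENTRY_SIZE
        raw = block[pos:pos + ENTRY_SIZE]
        if not raw.strip(b"\0 "):
            break  # assumption: an all-blank entry ends the list
        entries.append((parse_number(raw[:12]), parse_number(raw[12:])))
    return entries


def parse_old_gnu_header(block):
    """Extract the sparse-related fields from a 512-byte header block."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("a header block is 512 bytes long")
    return {
        "sp": parse_entries(block, SP_OFFSET, SP_ENTRIES),
        # assumption: anything but binary 0 or the character '0' is true
        "isextended": block[ISEXTENDED_OFFSET] not in (0, ord("0")),
        "realsize": parse_number(block[REALSIZE_OFFSET:REALSIZE_OFFSET + 12]),
    }
```

The returned dictionary holds the chunk list under @code{sp}, so a
caller can tell from @code{isextended} whether more map entries follow
the main header.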

If the member contains more than four chunks, the @code{isextended}
field of the header has the value @code{1} and the main header is
followed by one or more @dfn{extension headers}.  Each such header has
the following structure:

@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
@headitem Offset @tab Size @tab Name   @tab Data type   @tab Contents
@item          0 @tab  504 @tab sp     @tab @code{sparse_header} @tab
(21 entries) File map.
@item        504 @tab    1 @tab isextended @tab Bool    @tab @code{1} if an
extension sparse header follows, or @code{0} otherwise.
@end multitable

A header with @code{isextended=0} ends the map.
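
Walking the chain of extension headers could look roughly like the
sketch below.  It is a hedged Python illustration with hypothetical
names, and it makes the same assumptions as before: octal ASCII
numbers, an all-blank entry ending the list within a header, and
@code{isextended} being false when it is binary zero or the character
@samp{0}.  The function takes the raw archive bytes and the position of
the first extension header, and returns the collected entries together
with the position just past the last extension header.

```python
BLOCK_SIZE = 512
ENTRY_SIZE = 24               # 12-byte offset followed by 12-byte size
EXT_ENTRIES = 21              # entries per extension header
EXT_ISEXTENDED_OFFSET = 504


def parse_number(field):
    """Decode an octal ASCII number padded with NULs or spaces."""
    text = field.strip(b"\0 ").decode("ascii")
    return int(text, 8) if text else 0


def read_extension_headers(data, pos=0):
    """Return (entries, next_pos) for the extension headers found at pos."""
    entries = []
    while True:
        block = data[pos:pos + BLOCK_SIZE]
        if len(block) != BLOCK_SIZE:
            raise ValueError("truncated extension header")
        pos += BLOCK_SIZE
        for i in range(EXT_ENTRIES):
            raw = block[i * ENTRY_SIZE:(i + 1) * ENTRY_SIZE]
            if not raw.strip(b"\0 "):
                break  # assumption: an all-blank entry ends the list
            entries.append((parse_number(raw[:12]), parse_number(raw[12:])))
        if block[EXT_ISEXTENDED_OFFSET] in (0, ord("0")):
            return entries, pos  # isextended=0 ends the map
```

The entries returned here would be appended to the four entries taken
from the main header to obtain the complete file map.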

@node PAX 0
@appendixsubsec PAX Format, Versions 0.0 and 0.1
@UNREVISED{}

There are two formats available in this branch.  Version @code{0.0}
is the initial version of the sparse format, used by @command{tar}
versions 1.14--1.15.1.  The sparse file map is kept in extended
(@code{x}) PAX header variables:

@table @code
@item GNU.sparse.size
Real size of the stored file

@item GNU.sparse.numblocks
Number of blocks in the sparse map

@item GNU.sparse.offset
Offset of the data block

@item GNU.sparse.numbytes
Size of the data block
@end table

The latter two variables repeat for each data block, so the overall
structure is like this:

@smallexample
@group
GNU.sparse.size=@var{size}
GNU.sparse.numblocks=@var{numblocks}
repeat @var{numblocks} times
  GNU.sparse.offset=@var{offset}
  GNU.sparse.numbytes=@var{numbytes}
end repeat
@end group
@end smallexample
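
A reader of this format has to process the variables in the order in
which they appear, because a plain keyword-to-value dictionary would
keep only the last @code{offset}/@code{numbytes} pair.  The following
Python sketch shows one way to do that.  The function name is
hypothetical, the records are assumed to have been split into
(keyword, value) pairs already, and the values are assumed to be
decimal strings.

```python
def parse_sparse_0_0(records):
    """Rebuild the sparse map from ordered (keyword, value) pairs.

    Returns (realsize, entries), where entries is a list of
    (offset, numbytes) tuples.
    """
    realsize = None
    numblocks = None
    entries = []
    offset = None
    for key, value in records:
        if key == "GNU.sparse.size":
            realsize = int(value)
        elif key == "GNU.sparse.numblocks":
            numblocks = int(value)
        elif key == "GNU.sparse.offset":
            offset = int(value)
        elif key == "GNU.sparse.numbytes":
            if offset is None:
                raise ValueError("numbytes without a preceding offset")
            entries.append((offset, int(value)))
            offset = None
    if numblocks is not None and numblocks != len(entries):
        raise ValueError("numblocks does not match the number of pairs")
    return realsize, entries
```

Checking the pair count against @code{GNU.sparse.numblocks} is an extra
sanity test added for the sketch.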

This format presented the following two problems:

@enumerate 1
@item
Whereas the POSIX specification allows a variable to appear multiple
times in a header, it requires that only the last occurrence be
meaningful.  Thus, multiple occurrences of @code{GNU.sparse.offset} and
@code{GNU.sparse.numbytes} conflict with the POSIX specification.

@item
Attempting to extract such archives using third-party @command{tar}s
results in extraction of sparse files in @emph{condensed form}.  If
the @command{tar} implementation in question does not support POSIX
format, it will also extract a file containing extension header
attributes.  This file can be used to expand the file to its original
state.  However, POSIX-aware @command{tar}s will usually ignore the
unknown variables, which makes restoring the file much more
difficult@FIXME-xref{how to extract sparse file using third-party @command{tar}s}.
@end enumerate

@GNUTAR{} 1.15.2 introduced sparse format version @code{0.1}, which
attempted to solve these problems.  Like its predecessor, this format
stores the sparse map in the extended POSIX header.  It retains the
@code{GNU.sparse.size} and @code{GNU.sparse.numblocks} variables, but
instead of @code{GNU.sparse.offset}/@code{GNU.sparse.numbytes} pairs
it uses a single variable:

@table @code
@item GNU.sparse.map
Map of non-null data chunks.  It is a string consisting of
comma-separated values "@var{offset},@var{size}[,@var{offset-1},@var{size-1}...]"
@end table
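
Decoding such a string amounts to splitting it on commas and pairing
up the numbers.  A minimal Python sketch follows.  The function name is
hypothetical and the values are assumed to be decimal.

```python
def parse_sparse_map(value):
    """Turn "offset,size,offset,size,..." into (offset, size) tuples."""
    numbers = [int(item) for item in value.split(",")] if value else []
    if len(numbers) % 2:
        raise ValueError("the map must hold an even number of values")
    # Pair every even-indexed value (offset) with the one after it (size).
    return list(zip(numbers[0::2], numbers[1::2]))
```

For instance, @code{parse_sparse_map("0,512,2048,1024")} yields two
chunks, one of 512 bytes at offset 0 and one of 1024 bytes at offset
2048.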

To address the second problem, the @code{name} field in the @code{ustar}
header is replaced with a special name, constructed using the following
pattern:

@smallexample
%d/GNUSparseFile.%p/%f
@end smallexample

The real name of the sparse file is stored in the variable
@code{GNU.sparse.name}.  Thus, those @command{tar} implementations
that are not aware of GNU extensions will at least extract the files
into separate directories, giving the user the possibility to expand them
afterwards @FIXME-ref{how to extract sparse file using third-party
@command{tar}s}.
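
The pattern is not explained further here.  The Python sketch below
assumes that @samp{%d} stands for the directory part of the member
name, @samp{%p} for the process ID of the archiving @command{tar}, and
@samp{%f} for the file name without its directory.  Both the function
name and this interpretation are assumptions made for illustration.

```python
import os
import posixpath


def sparse_member_name(path, pid=None):
    """Build a %d/GNUSparseFile.%p/%f style name for the given path."""
    if pid is None:
        pid = os.getpid()  # assumption: %p is the archiver's process ID
    directory = posixpath.dirname(path) or "."
    return "%s/GNUSparseFile.%d/%s" % (directory, pid, posixpath.basename(path))
```

Under these assumptions, a member @file{dir/sub/file.img} archived by
process 1234 would be stored as
@file{dir/sub/GNUSparseFile.1234/file.img}.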

The resulting @code{GNU.sparse.map} string can be @emph{very} long.
Although POSIX does not impose any limit on the length of an @code{x}
header variable, such a long value can confuse some @command{tar}
implementations.

@node PAX 1
@appendixsubsec PAX Format, Version 1.0
@UNREVISED{}

Version @code{1.0} of the sparse format was introduced with @GNUTAR{}
1.15.92.  Its main objective was to make the resulting file
extractable with little effort even by non-POSIX-aware @command{tar}
implementations.  Starting from this version, the extended header
preceding a sparse member always contains the following variables that
identify the format being used:

@table @code
@item GNU.sparse.major
Major version

@item GNU.sparse.minor
Minor version
@end table

The @code{name} field in the @code{ustar} header contains a special name,
constructed using the following pattern:

@smallexample
%d/GNUSparseFile.%p/%f
@end smallexample

The real name of the sparse file is stored in the variable
@code{GNU.sparse.name}.  The real size of the file is stored in the
variable @code{GNU.sparse.realsize}.

The sparse map itself is stored in the file data block, preceding the actual
file data.  It consists of a series of decimal numbers of arbitrary length,
delimited by newlines.  The map is padded with nulls to the nearest block
boundary.

The first number gives the number of entries in the map.  Following are map
entries, each one consisting of two numbers giving the offset and size of the
data block it describes.
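
Reading such a map could be sketched in Python as follows.  The
function name is hypothetical.  The number base is left as a parameter
and defaults to 10 on the assumption that the numbers are written in
decimal, and a block is assumed to be 512 bytes long.  The function
returns the map entries and the position at which the actual file data
begins.

```python
BLOCK_SIZE = 512  # assumption: the map is padded to a 512-byte boundary


def parse_sparse_1_0_map(data, base=10):
    """Return (entries, data_start) for the stored data of a member."""
    pos = 0

    def next_number():
        """Read one newline-terminated number and advance past it."""
        nonlocal pos
        end = data.index(b"\n", pos)  # raises ValueError if no newline
        number = int(data[pos:end].decode("ascii"), base)
        pos = end + 1
        return number

    count = next_number()
    entries = []
    for _ in range(count):
        offset = next_number()
        size = next_number()
        entries.append((offset, size))
    # Round the end of the map up to the next block boundary.
    data_start = -(-pos // BLOCK_SIZE) * BLOCK_SIZE
    return entries, data_start
```

The bytes from @code{data_start} onwards are the data chunks themselves,
stored one after another in map order.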

The format is designed in such a way that non-POSIX-aware @command{tar}s
and @command{tar}s not supporting @code{GNU.sparse.*} keywords will
extract each sparse file in its condensed form with the file map
prepended and will place it into a separate directory.  Then, using a
simple program, it would be possible to expand the file to its original
form even without @GNUTAR{}.
@FIXME-xref{how to extract sparse file using third-party
@command{tar}s}. @FIXME{Write the program and give its URL here}.
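
Pending such a program, the core of the expansion step can be sketched
in a few lines of Python.  This is an illustration under stated
assumptions, not the program the text refers to.  The function name is
hypothetical, the map is assumed to have been decoded already into
(offset, size) pairs, @code{condensed} holds the data chunks
concatenated in map order with the map itself stripped, and
@code{realsize} is the original file size.

```python
import io


def expand_sparse(entries, condensed, realsize, out):
    """Write the expanded file to out, a seekable binary file object."""
    pos = 0
    for offset, size in entries:
        chunk = condensed[pos:pos + size]
        if len(chunk) != size:
            raise ValueError("condensed data is shorter than the map says")
        out.seek(offset)   # skip over the hole preceding this chunk
        out.write(chunk)
        pos += size
    # Materialize a trailing hole, if any, by writing the final byte.
    end = out.seek(0, io.SEEK_END)
    if end < realsize:
        out.seek(realsize - 1)
        out.write(b"\0")
```

Seeking past the current end of the output before writing leaves the
skipped range reading as zeros.  Whether that range is stored as a real
hole on disk depends on the file system.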