@c This is part of the paxutils manual.
@c Copyright (C) 2006 Free Software Foundation, Inc.
@c This file is distributed under GFDL 1.1 or any later version
@c published by the Free Software Foundation.

The notion of a sparse file, and the ways of handling it from the point
of view of a @GNUTAR{} user, have been described in detail in
@ref{sparse}. This chapter describes the internal formats @GNUTAR{}
uses to store such files.

The support for sparse files in @GNUTAR{} has a long history. The
earliest version featuring this support that I was able to find was 1.09,
released in November, 1990. The format introduced back then is called
the @dfn{old GNU} sparse format. Although its design contained many
flaws, it was the only format @GNUTAR{} supported until version 1.14
(May, 2004), which introduced initial support for sparse files in
@acronym{PAX} archives (@pxref{posix}). This format was not free from
design flaws either, and it was subsequently improved in versions
1.15.2 (November, 2005) and 1.15.92 (June, 2006).

In addition to the GNU sparse formats, @GNUTAR{} is able to read and
extract sparse files archived by @command{star}.

The following subsections describe each of the GNU formats in detail.

@menu
* Old GNU Format::
* PAX 0:: PAX Format, Versions 0.0 and 0.1
* PAX 1:: PAX Format, Version 1.0
@end menu

@node Old GNU Format
@appendixsubsec Old GNU Format

This format was introduced some time around 1990 (version 1.09). It
was designed on top of standard @code{ustar} headers in such an
unfortunate way that some of its fields overwrote fields required by
POSIX.

An old GNU sparse header is designated by type @samp{S}
(@code{GNUTYPE_SPARSE}) and has the following layout:

@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
@item 0 @tab 345 @tab @tab N/A @tab Not used.
@item 345 @tab 12 @tab atime @tab Number @tab @code{atime} of the file.
@item 357 @tab 12 @tab ctime @tab Number @tab @code{ctime} of the file.
@item 369 @tab 12 @tab offset @tab Number @tab For
multivolume archives: the offset of the start of this volume.
@item 381 @tab 4 @tab @tab N/A @tab Not used.
@item 385 @tab 1 @tab @tab N/A @tab Not used.
@item 386 @tab 96 @tab sp @tab @code{sparse_header} @tab (4 entries) File map.
@item 482 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
extension sparse header follows, @code{0} otherwise.
@item 483 @tab 12 @tab realsize @tab Number @tab Real size of the file.
@end multitable
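
The following C sketch restates the table above as a structure, purely as an illustration. The type and member names are hypothetical (they are not taken from the @GNUTAR{} sources), and the trailing @code{pad} member, which does not appear in the table, only fills the header up to the 512-byte tar block. The @code{_Static_assert} checks make the compiler verify each offset.

```c
#include <stddef.h>

/* Hypothetical mirror of an old GNU sparse header block.  Every member
   is a char array (alignment 1), so each one starts at the byte offset
   listed in the table; the assertions below check this. */
struct old_gnu_sparse_header {
    char unused1[345];   /*   0: not used by the sparse format       */
    char atime[12];      /* 345: atime of the file                   */
    char ctime[12];      /* 357: ctime of the file                   */
    char offset[12];     /* 369: multivolume: start of this volume   */
    char unused2[4];     /* 381: not used                            */
    char unused3[1];     /* 385: not used                            */
    char sp[4][24];      /* 386: four 24-byte sparse_header entries  */
    char isextended[1];  /* 482: 1 if an extension header follows    */
    char realsize[12];   /* 483: real size of the file               */
    char pad[17];        /* 495: filler up to the 512-byte block end */
};

/* Compile-time checks that the layout matches the table. */
_Static_assert(offsetof(struct old_gnu_sparse_header, atime) == 345, "atime");
_Static_assert(offsetof(struct old_gnu_sparse_header, ctime) == 357, "ctime");
_Static_assert(offsetof(struct old_gnu_sparse_header, offset) == 369, "offset");
_Static_assert(offsetof(struct old_gnu_sparse_header, sp) == 386, "sp");
_Static_assert(offsetof(struct old_gnu_sparse_header, isextended) == 482,
               "isextended");
_Static_assert(offsetof(struct old_gnu_sparse_header, realsize) == 483,
               "realsize");
_Static_assert(sizeof(struct old_gnu_sparse_header) == 512, "one tar block");
```

Because no member needs alignment padding, such a structure could be overlaid on a raw 512-byte header block read from the archive.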

Each @code{sparse_header} object at offset 386 describes a single
data chunk. It has the following structure:

@multitable @columnfractions 0.10 0.10 0.20 0.60
@headitem Offset @tab Size @tab Data type @tab Contents
@item 0 @tab 12 @tab Number @tab Offset of the
beginning of the chunk.
@item 12 @tab 12 @tab Number @tab Size of the chunk.
@end multitable
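
Decoding a single @code{sparse_header} entry amounts to reading two 12-byte @code{Number} fields. The sketch below assumes the usual tar encoding of such fields as octal ASCII digits ended by a space or NUL, and does not handle the base-256 extension used for very large values. It also assumes that an unused slot is marked by a NUL in the first byte of its size field. The names @code{sparse_chunk}, @code{tar_octal} and @code{decode_sparse_entry} are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded entry of the sparse map. */
struct sparse_chunk {
    uintmax_t offset;    /* offset of the beginning of the chunk */
    uintmax_t numbytes;  /* size of the chunk */
};

/* Decode a tar Number field.  Assumption: octal ASCII digits, possibly
   preceded by spaces and ended by a space or NUL.  The base-256
   extension for large values is not handled. */
static uintmax_t tar_octal(const char *field, size_t len)
{
    uintmax_t value = 0;
    size_t i = 0;

    while (i < len && field[i] == ' ')
        i++;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        value = value * 8 + (uintmax_t)(field[i] - '0');
    return value;
}

/* Decode one 24-byte sparse_header entry into *OUT.  Returns 1 for a
   used entry, 0 for an unused one.  Assumption: an unused slot has a
   NUL in the first byte of its size field. */
static int decode_sparse_entry(const char entry[24], struct sparse_chunk *out)
{
    if (entry[12] == '\0')
        return 0;
    out->offset = tar_octal(entry, 12);         /* bytes 0..11  */
    out->numbytes = tar_octal(entry + 12, 12);  /* bytes 12..23 */
    return 1;
}
```

For instance, the entry @samp{00000001000 00000000400 } describes a 256-byte chunk (octal 400) starting at offset 512 (octal 1000).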

If the member contains more than four chunks, the @code{isextended}
field of the header has the value @code{1} and the main header is
followed by one or more @dfn{extension headers}. Each such header has
the following structure:

@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
@item 0 @tab 504 @tab sp @tab @code{sparse_header} @tab
(21 entries) File map.
@item 504 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
extension sparse header follows, or @code{0} otherwise.
@end multitable

A header with @code{isextended=0} ends the map.
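
Putting the two header layouts together, a reader collects up to four entries from the main header and then up to 21 from each extension header, for as long as @code{isextended} is set. This is a minimal sketch with hypothetical names. It assumes octal @code{Number} fields (no base-256 values), unused slots marked by a NUL in the first byte of the size field, and an @code{isextended} byte holding the value 0 or 1 rather than an ASCII digit. The caller supplies the main header and its extension headers as one contiguous buffer of 512-byte blocks.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCKSIZE 512

struct sparse_chunk {
    uintmax_t offset;
    uintmax_t numbytes;
};

/* Assumption: octal ASCII Number fields; base-256 is not handled. */
static uintmax_t tar_octal(const char *field, size_t len)
{
    uintmax_t value = 0;
    size_t i = 0;

    while (i < len && field[i] == ' ')
        i++;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        value = value * 8 + (uintmax_t)(field[i] - '0');
    return value;
}

/* Append the used entries of an N-entry sp array to MAP, stopping at
   the first unused slot or when MAP holds MAX chunks.  Returns the new
   chunk count. */
static size_t collect(const char *sp, int n, struct sparse_chunk *map,
                      size_t count, size_t max)
{
    for (int i = 0; i < n && count < max; i++) {
        const char *entry = sp + 24 * i;

        if (entry[12] == '\0')          /* assumed: unused slot */
            break;
        map[count].offset = tar_octal(entry, 12);
        map[count].numbytes = tar_octal(entry + 12, 12);
        count++;
    }
    return count;
}

/* BLOCKS holds NBLOCKS consecutive blocks: the type 'S' header followed
   by its extension headers.  Returns the number of chunks stored in
   MAP, or (size_t)-1 if an isextended flag points past the last block. */
static size_t read_old_gnu_map(const char *blocks, size_t nblocks,
                               struct sparse_chunk *map, size_t max)
{
    size_t count = collect(blocks + 386, 4, map, 0, max); /* main header */
    int extended = blocks[482] != 0;    /* assumed: binary 0/1 flag */

    for (size_t b = 1; extended; b++) {
        const char *ext;

        if (b >= nblocks)
            return (size_t)-1;          /* chain is truncated */
        ext = blocks + b * BLOCKSIZE;
        count = collect(ext, 21, map, count, max); /* 21 entries at 0 */
        extended = ext[504] != 0;
    }
    return count;
}
```

If @code{map} fills up before the chain ends, this sketch simply stops storing chunks; a complete reader would grow the array instead.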

@node PAX 0
@appendixsubsec PAX Format, Versions 0.0 and 0.1
@UNREVISED{}

There are two formats available in this branch. Version @code{0.0}
is the initial version of the sparse format, used by @command{tar}
versions 1.14--1.15.1. It keeps the sparse file map in extended
(@code{x}) PAX header variables:

@table @code
@item GNU.sparse.size
Real size of the stored file

@item GNU.sparse.numblocks
Number of blocks in the sparse map

@item GNU.sparse.offset
Offset of the data block

@item GNU.sparse.numbytes
Size of the data block
@end table

The latter two variables repeat for each data block, so the overall
structure is like this:

@smallexample
@group
GNU.sparse.size=@var{size}
GNU.sparse.numblocks=@var{numblocks}
repeat @var{numblocks} times
  GNU.sparse.offset=@var{offset}
  GNU.sparse.numbytes=@var{numbytes}
end repeat
@end group
@end smallexample
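
Each variable above ends up as one record of the pax extended header. In the POSIX pax format a record has the form @samp{@var{length} @var{keyword}=@var{value}} followed by a newline, where @var{length} is the decimal size of the whole record, including its own digits. The sketch below computes that self-referential length and emits the version 0.0 variables in the order shown. The function names are hypothetical, and the code illustrates the layout rather than reproducing the @GNUTAR{} writer.

```c
#include <stdio.h>
#include <string.h>

/* Format one pax record "LEN KEY=VALUE\n" into BUF.  LEN counts the
   whole record, its own digits included, so iterate until stable. */
static int pax_record(char *buf, size_t size, const char *key,
                      const char *value)
{
    size_t body = strlen(key) + strlen(value) + 3;  /* ' ', '=', '\n' */
    size_t len = body + 1;
    char digits[32];

    for (;;) {
        size_t d = (size_t)snprintf(digits, sizeof digits, "%zu", len);
        if (body + d == len)
            break;
        len = body + d;
    }
    return snprintf(buf, size, "%zu %s=%s\n", len, key, value);
}

/* Append one numeric variable to OUT; returns -1 if it does not fit. */
static int append(char *out, size_t size, size_t *used,
                  const char *key, unsigned long long v)
{
    char value[32];
    int n;

    snprintf(value, sizeof value, "%llu", v);
    n = pax_record(out + *used, size - *used, key, value);
    if (n < 0 || (size_t)n >= size - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Hypothetical writer for the version 0.0 variables of a file of size
   REALSIZE whose map has N chunks.  Returns the number of bytes written
   to OUT (NUL-terminated), or -1 if OUT is too small. */
static long sparse_v00_records(char *out, size_t size,
                               unsigned long long realsize,
                               const unsigned long long *offset,
                               const unsigned long long *numbytes, size_t n)
{
    size_t used = 0;

    if (size == 0
        || append(out, size, &used, "GNU.sparse.size", realsize)
        || append(out, size, &used, "GNU.sparse.numblocks", n))
        return -1;
    for (size_t i = 0; i < n; i++)
        if (append(out, size, &used, "GNU.sparse.offset", offset[i])
            || append(out, size, &used, "GNU.sparse.numbytes", numbytes[i]))
            return -1;
    return (long)used;
}
```

For example, a 1024-byte file whose only data chunk is 512 bytes long at offset 512 yields four records totalling 102 bytes, the first being @samp{24 GNU.sparse.size=1024}.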

This format presented the following two problems:

@enumerate 1
@item
Whereas the POSIX specification allows a variable to appear multiple
times in a header, it requires that only the last occurrence be
meaningful. Thus, multiple occurrences of @code{GNU.sparse.offset} and
@code{GNU.sparse.numbytes} conflict with the POSIX specification.

@item
Attempting to extract such archives using third-party @command{tar}s
results in sparse files being extracted in @emph{condensed form}. If
the @command{tar} implementation in question does not support POSIX
format, it will also extract a file containing the extended header
attributes. This file can be used to expand the file to its original
state. However, POSIX-aware @command{tar}s will usually ignore the
unknown variables, which makes restoring the file much more
difficult@FIXME-xref{how to extract sparse file using third-party @command{tar}s}.
@end enumerate

@GNUTAR{} 1.15.2 introduced sparse format version @code{0.1}, which
attempted to solve these problems. Like its predecessor, this format
stores the sparse map in the extended POSIX header. It retains the
@code{GNU.sparse.size} and @code{GNU.sparse.numblocks} variables, but
instead of @code{GNU.sparse.offset}/@code{GNU.sparse.numbytes} pairs
it uses a single variable:

@table @code
@item GNU.sparse.map
Map of non-null data chunks: a string of comma-separated values of the
form "@var{offset},@var{size}[,@var{offset-1},@var{size-1}...]"
@end table
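
A reader of version 0.1 archives has to split @code{GNU.sparse.map} back into (offset, size) pairs. The following is a minimal sketch, assuming the values are plain decimal integers, as pax header values normally are; @code{parse_sparse_map} is a hypothetical name. It rejects empty fields, a trailing comma and an odd number of values rather than guessing.

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>

/* Parse a GNU.sparse.map value "OFF,SIZE[,OFF,SIZE...]" into MAP.
   Assumption: values are plain decimal integers.  Returns the number
   of pairs, or -1 on a malformed value, an odd number of values, or
   more than MAX pairs. */
static long parse_sparse_map(const char *s, uintmax_t (*map)[2], size_t max)
{
    size_t nvals = 0;

    while (*s != '\0') {
        char *end;
        uintmax_t v;

        if (*s < '0' || *s > '9')  /* strtoumax would accept signs, blanks */
            return -1;
        errno = 0;
        v = strtoumax(s, &end, 10);
        if (errno == ERANGE || nvals / 2 >= max)
            return -1;
        map[nvals / 2][nvals % 2] = v;  /* [0] = offset, [1] = size */
        nvals++;
        s = end;
        if (*s == ',' && s[1] != '\0')
            s++;
        else if (*s != '\0')
            return -1;                  /* junk, or a trailing comma */
    }
    return nvals % 2 == 0 ? (long)(nvals / 2) : -1;
}
```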

To address the second problem, the @code{name} field in the @code{ustar}
header is replaced with a special name, constructed using the following
pattern:

@smallexample
%d/GNUSparseFile.%p/%f
@end smallexample

The real name of the sparse file is stored in the variable
@code{GNU.sparse.name}. Thus, @command{tar} implementations
that are not aware of GNU extensions will at least extract the files
into separate directories, giving the user the possibility to expand them
afterwards @FIXME-ref{how to extract sparse file using third-party
@command{tar}s}.
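
The placeholders in the pattern are not defined in this section. Assuming they follow the conventions of the @GNUTAR{} @option{--pax-option} name templates (@samp{%d} for the directory part of the file name, @samp{%p} for the process ID of @command{tar}, @samp{%f} for the base name), the special name could be built as in this sketch. The helper name is hypothetical, and using @samp{.} for a name without a directory part is a further assumption of the sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand "%d/GNUSparseFile.%p/%f" for NAME and PID
   into BUF.  Assumptions: %d is the directory part of NAME ("." if it
   has none), %p the process ID, %f the base name.  Returns the
   snprintf result. */
static int sparse_member_name(char *buf, size_t size, const char *name,
                              long pid)
{
    const char *slash = strrchr(name, '/');

    if (slash == NULL)
        return snprintf(buf, size, "./GNUSparseFile.%ld/%s", pid, name);
    return snprintf(buf, size, "%.*s/GNUSparseFile.%ld/%s",
                    (int)(slash - name), name, pid, slash + 1);
}
```

Under these assumptions, @file{dir/sub/file.img} archived by process 4242 would be stored under the name @file{dir/sub/GNUSparseFile.4242/file.img}.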

The resulting @code{GNU.sparse.map} string can be @emph{very} long.
Although POSIX does not impose any limit on the length of an @code{x}
header variable, such long values may confuse some @command{tar}
implementations.

@node PAX 1
@appendixsubsec PAX Format, Version 1.0
@UNREVISED{}

Version @code{1.0} of the sparse format was introduced with @GNUTAR{}
1.15.92. Its main objective was to make the resulting file
extractable with little effort even by non-POSIX-aware @command{tar}
implementations. Starting from this version, the extended header
preceding a sparse member always contains the following variables that
identify the format being used:

@table @code
@item GNU.sparse.major
Major version

@item GNU.sparse.minor
Minor version
@end table
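
Since the versions store different variables, a reader can tell them apart by inspecting the extended header. The sketch below relies only on the rules stated in this appendix: @code{GNU.sparse.major}/@code{GNU.sparse.minor} identify version 1.0, @code{GNU.sparse.map} indicates 0.1, and @code{GNU.sparse.offset} indicates 0.0. It assumes the major/minor variables appear only in version 1.0 headers. The @code{pax_kv} structure, standing in for an already parsed extended header, and all other names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One keyword/value pair from an already parsed extended header. */
struct pax_kv {
    const char *key;
    const char *value;
};

enum sparse_format {
    SPARSE_NONE, SPARSE_0_0, SPARSE_0_1, SPARSE_1_0, SPARSE_UNKNOWN
};

/* Value of KEY in the header, or NULL if absent. */
static const char *lookup(const struct pax_kv *kv, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(kv[i].key, key) == 0)
            return kv[i].value;
    return NULL;
}

/* Hypothetical classifier for the sparse format of one member. */
static enum sparse_format classify(const struct pax_kv *kv, size_t n)
{
    const char *major = lookup(kv, n, "GNU.sparse.major");
    const char *minor = lookup(kv, n, "GNU.sparse.minor");

    if (major != NULL || minor != NULL)  /* self-identifying header */
        return (major != NULL && minor != NULL
                && strcmp(major, "1") == 0 && strcmp(minor, "0") == 0)
               ? SPARSE_1_0 : SPARSE_UNKNOWN;
    if (lookup(kv, n, "GNU.sparse.map") != NULL)
        return SPARSE_0_1;
    if (lookup(kv, n, "GNU.sparse.offset") != NULL)
        return SPARSE_0_0;
    return SPARSE_NONE;
}
```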

The @code{name} field in the @code{ustar} header contains a special name,
constructed using the following pattern:

@smallexample
%d/GNUSparseFile.%p/%f
@end smallexample

The real name of the sparse file is stored in the variable
@code{GNU.sparse.name}. The real size of the file is stored in the
variable @code{GNU.sparse.realsize}.

The sparse map itself is stored in the file data block, preceding the
actual file data. It consists of a series of decimal numbers of
arbitrary length, delimited by newlines. The map is padded with nulls
to the next block boundary.

The first number gives the number of entries in the map. It is followed
by the map entries, each one consisting of two numbers giving the offset
and size of the data block it describes.
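
Reading the version 1.0 map means parsing newline-terminated numbers from the start of the member data and then skipping to the next block boundary, where the file data begins. In this sketch the numbers are assumed to be decimal, the base @GNUTAR{} uses when writing them, and the block size is the standard 512-byte tar block. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCKSIZE 512

/* Read one newline-terminated decimal number from BUF[*POS..LEN).
   Returns 0 on success, -1 if the field is malformed, truncated or
   overflows. */
static int read_num(const char *buf, size_t len, size_t *pos, uintmax_t *out)
{
    uintmax_t v = 0;
    size_t p = *pos;

    if (p >= len || buf[p] < '0' || buf[p] > '9')
        return -1;
    for (; p < len && buf[p] >= '0' && buf[p] <= '9'; p++) {
        uintmax_t d = (uintmax_t)(buf[p] - '0');

        if (v > (UINTMAX_MAX - d) / 10)
            return -1;                  /* would overflow */
        v = v * 10 + d;
    }
    if (p >= len || buf[p] != '\n')
        return -1;
    *pos = p + 1;
    *out = v;
    return 0;
}

/* Parse a version 1.0 map at the start of BUF.  On success stores the
   pairs in MAP (at most MAX), sets *COUNT, and returns the offset at
   which the file data begins, i.e. the map size rounded up to a whole
   block.  Returns (size_t)-1 on error. */
static size_t parse_v1_map(const char *buf, size_t len, uintmax_t (*map)[2],
                           size_t max, size_t *count)
{
    size_t pos = 0;
    uintmax_t n;

    if (read_num(buf, len, &pos, &n) != 0 || n > max)
        return (size_t)-1;
    for (uintmax_t i = 0; i < n; i++)
        if (read_num(buf, len, &pos, &map[i][0]) != 0     /* offset */
            || read_num(buf, len, &pos, &map[i][1]) != 0) /* size   */
            return (size_t)-1;
    *count = (size_t)n;
    return (pos + BLOCKSIZE - 1) / BLOCKSIZE * BLOCKSIZE;
}
```

For example, the map @samp{2}, @samp{0}, @samp{512}, @samp{4096}, @samp{100} (one number per line) occupies 17 bytes, so the file data starts at offset 512 of the member.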

The format is designed in such a way that non-POSIX-aware @command{tar}s
and @command{tar}s not supporting @code{GNU.sparse.*} keywords will
extract each sparse file in its condensed form, with the file map
prepended, and will place it into a separate directory. Then, using a
simple program, it would be possible to expand the file to its original
form even without @GNUTAR{}.
@FIXME-xref{how to extract sparse file using third-party
@command{tar}s}. @FIXME{Write the program and give its URL here}.
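
As a rough idea of the core of such a program, the following sketch copies each chunk of a condensed file to its offset in the output. It assumes the map has already been decoded into (offset, size) pairs, that the input stream is positioned just past the padded map, and that the real size is taken from @code{GNU.sparse.realsize}; @code{expand_sparse} is a hypothetical name. Holes are produced by seeking past the current end of the output, which POSIX guarantees reads back as zeros.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Expand a condensed member into OUT.  IN must be positioned at the
   first data byte; the chunks are stored back to back in map order.
   Gaps are skipped with fseek() and never written (POSIX: they read
   back as zeros).  Offsets above LONG_MAX are rejected; fseeko() would
   lift that limit on POSIX systems.  Returns 0 on success, -1 on error. */
static int expand_sparse(FILE *in, FILE *out, uintmax_t (*map)[2],
                         size_t n, uintmax_t realsize)
{
    char buf[4096];
    uintmax_t end = 0;   /* end of the furthest chunk written so far */

    for (size_t i = 0; i < n; i++) {
        uintmax_t left = map[i][1];

        if (map[i][0] > LONG_MAX
            || fseek(out, (long)map[i][0], SEEK_SET) != 0)
            return -1;
        while (left > 0) {
            size_t want = left < sizeof buf ? (size_t)left : sizeof buf;

            if (fread(buf, 1, want, in) != want
                || fwrite(buf, 1, want, out) != want)
                return -1;
            left -= want;
        }
        if (map[i][0] + map[i][1] > end)
            end = map[i][0] + map[i][1];
    }
    /* A file ending in a hole: write its last byte so OUT reaches
       REALSIZE. */
    if (end < realsize) {
        if (realsize - 1 > LONG_MAX
            || fseek(out, (long)(realsize - 1), SEEK_SET) != 0
            || fputc(0, out) == EOF)
            return -1;
    }
    return fflush(out) == 0 ? 0 : -1;
}
```

The gaps are left unwritten rather than filled with explicit zeros, so on file systems that support it the restored file can be sparse again.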