From: Sergey Poznyakoff
Date: Fri, 23 Jun 2006 15:18:18 +0000 (+0000)
Subject: New files
X-Git-Url: https://git.dogcows.com/gitweb?p=chaz%2Ftar;a=commitdiff_plain;h=9588a106a7192cf276e8db0d51c7818be286bf41

New files
---
diff --git a/doc/sparse.texi b/doc/sparse.texi
new file mode 100644
index 0000000..7b9145d
--- /dev/null
+++ b/doc/sparse.texi
@@ -0,0 +1,217 @@
+@c This is part of the paxutils manual.
+@c Copyright (C) 2006 Free Software Foundation, Inc.
+@c This file is distributed under GFDL 1.1 or any later version
+@c published by the Free Software Foundation.
+
+The notion of a sparse file, and the ways of handling it from the point
+of view of a @GNUTAR{} user, have been described in detail in
+@ref{sparse}. This chapter describes the internal format @GNUTAR{}
+uses to store such files.
+
+The support for sparse files in @GNUTAR{} has a long history. The
+earliest version featuring this support that I was able to find was 1.09,
+released in November, 1990. The format introduced back then is called
+@dfn{old GNU} sparse format and, in spite of the fact that its design
+contained many flaws, it was the only format @GNUTAR{} supported
+until version 1.14 (May, 2004), which introduced initial support for
+sparse files in @acronym{PAX} archives (@pxref{posix}). This
+format was not free from design flaws either, and it was subsequently
+improved in versions 1.15.2 (November, 2005) and 1.15.92 (June,
+2006).
+
+In addition to the GNU sparse formats, @GNUTAR{} is able to read and
+extract sparse files archived by @command{star}.
+
+The following subsections describe each format in detail.
+
+@menu
+* Old GNU Format::
+* PAX 0:: PAX Format, Versions 0.0 and 0.1
+* PAX 1:: PAX Format, Version 1.0
+@end menu
+
+@node Old GNU Format
+@appendixsubsec Old GNU Format
+
+The format was introduced some time around 1990 (v. 1.09). It was
+designed on top of standard @code{ustar} headers in such an
+unfortunate way that some of its fields overwrote fields required by
+POSIX.
+
+An old GNU sparse header is designated by type @samp{S}
+(@code{GNUTYPE_SPARSE}) and has the following layout:
+
+@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
+@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
+@item 0 @tab 345 @tab @tab N/A @tab Not used.
+@item 345 @tab 12 @tab atime @tab Number @tab @code{atime} of the file.
+@item 357 @tab 12 @tab ctime @tab Number @tab @code{ctime} of the file.
+@item 369 @tab 12 @tab offset @tab Number @tab For
+multivolume archives: the offset of the start of this volume.
+@item 381 @tab 4 @tab @tab N/A @tab Not used.
+@item 385 @tab 1 @tab @tab N/A @tab Not used.
+@item 386 @tab 96 @tab sp @tab @code{sparse_header} @tab (4 entries) File map.
+@item 482 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
+extension sparse header follows, @code{0} otherwise.
+@item 483 @tab 12 @tab realsize @tab Number @tab Real size of the file.
+@end multitable
+
+Each @code{sparse_header} object at offset 386 describes a single
+data chunk. It has the following structure:
+
+@multitable @columnfractions 0.10 0.10 0.20 0.60
+@headitem Offset @tab Size @tab Data type @tab Contents
+@item 0 @tab 12 @tab Number @tab Offset of the
+beginning of the chunk.
+@item 12 @tab 12 @tab Number @tab Size of the chunk.
+@end multitable
+
+If the member contains more than four chunks, the @code{isextended}
+field of the header has the value @code{1} and the main header is
+followed by one or more @dfn{extension headers}. Each such header has
+the following structure:
+
+@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
+@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
+@item 0 @tab 504 @tab sp @tab @code{sparse_header} @tab
+(21 entries) File map.
+@item 504 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
+extension sparse header follows, or @code{0} otherwise.
+@end multitable
+
+A header with @code{isextended=0} ends the map.
+
+@node PAX 0
+@appendixsubsec PAX Format, Versions 0.0 and 0.1
+@UNREVISED{}
+
+There are two formats available in this branch. The version @code{0.0}
+is the initial version of sparse format used by @command{tar}
+versions 1.14--1.15.1. The sparse file map is kept in extended
+(@code{x}) PAX header variables:
+
+@table @code
+@item GNU.sparse.size
+Real size of the stored file
+
+@item GNU.sparse.numblocks
+Number of blocks in the sparse map
+
+@item GNU.sparse.offset
+Offset of the data block
+
+@item GNU.sparse.numbytes
+Size of the data block
+@end table
+
+The latter two variables repeat for each data block, so the overall
+structure is like this:
+
+@smallexample
+@group
+GNU.sparse.size=@var{size}
+GNU.sparse.numblocks=@var{numblocks}
+repeat @var{numblocks} times
+  GNU.sparse.offset=@var{offset}
+  GNU.sparse.numbytes=@var{numbytes}
+end repeat
+@end group
+@end smallexample
+
+This format presented the following two problems:
+
+@enumerate 1
+@item
+Whereas the POSIX specification allows a variable to appear multiple
+times in a header, it requires that only the last occurrence be
+meaningful. Thus, multiple occurrences of @code{GNU.sparse.offset} and
+@code{GNU.sparse.numbytes} conflict with the POSIX specification.
+
+@item
+Attempting to extract such archives using third-party @command{tar}s
+results in extraction of sparse files in @emph{condensed form}. If
+the @command{tar} implementation in question does not support POSIX
+format, it will also extract a file containing extension header
+attributes. This file can be used to expand the file to its original
+state. However, POSIX-aware @command{tar}s will usually ignore the
+unknown variables, which makes restoring the file much more
+difficult@FIXME-xref{how to extract sparse file using third-party @command{tar}s}.
+@end enumerate
+
+@GNUTAR{} 1.15.2 introduced sparse format version @code{0.1}, which
+attempted to solve these problems.
Like its predecessor, this format
+stores the sparse map in the extended POSIX header. It retains the
+@code{GNU.sparse.size} and @code{GNU.sparse.numblocks} variables, but
+instead of @code{GNU.sparse.offset}/@code{GNU.sparse.numbytes} pairs
+it uses a single variable:
+
+@table @code
+@item GNU.sparse.map
+Map of non-null data chunks. It is a string consisting of
+comma-separated values "@var{offset},@var{size}[,@var{offset-1},@var{size-1}...]"
+@end table
+
+To address the second problem, the @code{name} field in the @code{ustar}
+header is replaced with a special name, constructed using the following pattern:
+
+@smallexample
+%d/GNUSparseFile.%p/%f
+@end smallexample
+
+The real name of the sparse file is stored in the variable
+@code{GNU.sparse.name}. Thus, those @command{tar} implementations
+that are not aware of GNU extensions will at least extract the files
+into separate directories, giving the user a possibility to expand them
+afterwards @FIXME-ref{how to extract sparse file using third-party
+@command{tar}s}.
+
+The resulting @code{GNU.sparse.map} string can be @emph{very} long.
+Although POSIX does not impose any limit on the length of an @code{x}
+header variable, a very long value can confuse some @command{tar}s.
+
+@node PAX 1
+@appendixsubsec PAX Format, Version 1.0
+@UNREVISED{}
+
+The version @code{1.0} of sparse format was introduced with @GNUTAR{}
+1.15.92. Its main objective was to make the resulting file
+extractable with little effort even by non-POSIX-aware @command{tar}
+implementations.
Starting from this version, the extended header
+preceding a sparse member always contains the following variables that
+identify the format being used:
+
+@table @code
+@item GNU.sparse.major
+Major version
+
+@item GNU.sparse.minor
+Minor version
+@end table
+
+The @code{name} field in the @code{ustar} header contains a special name,
+constructed using the following pattern:
+
+@smallexample
+%d/GNUSparseFile.%p/%f
+@end smallexample
+
+The real name of the sparse file is stored in the variable
+@code{GNU.sparse.name}. The real size of the file is stored in the
+variable @code{GNU.sparse.realsize}.
+
+The sparse map itself is stored in the file data block, preceding the actual
+file data. It consists of a series of decimal numbers of arbitrary length, delimited
+by newlines. The map is padded with nulls to the nearest block boundary.
+
+The first number gives the number of entries in the map. Following are map entries,
+each one consisting of two numbers giving the offset and size of the
+data block it describes.
+
+The format is designed in such a way that non-POSIX-aware tars and tars not
+supporting @code{GNU.sparse.*} keywords will extract each sparse file
+in its condensed form with the file map prepended and will place it
+into a separate directory. Then, using a simple program it would be
+possible to expand the file to its original form even without GNU tar.
+@FIXME-xref{how to extract sparse file using third-party
+@command{tar}s}. @FIXME{Write the program and give its URL here}.
+
diff --git a/tests/spmvp00.at b/tests/spmvp00.at
new file mode 100644
index 0000000..526289d
--- /dev/null
+++ b/tests/spmvp00.at
@@ -0,0 +1,26 @@
+# Process this file with autom4te to create testsuite. -*- Autotest -*-
+
+# Test suite for GNU tar.
+# Copyright (C) 2006 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2, or (at your option)
+# any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+# 02110-1301, USA.
+
+AT_SETUP([sparse files in PAX MV archives, v.0.0])
+AT_KEYWORDS([sparse multiv sparsemvp sparsemvp00])
+
+TAR_MVP_TEST(0.0, [0 ABCDEFGHI 1M ABCDEFGHI], [0 ABCDEFGH 1M ABCDEFGHI])
+
+AT_CLEANUP
diff --git a/tests/spmvp01.at b/tests/spmvp01.at
new file mode 100644
index 0000000..a2123cc
--- /dev/null
+++ b/tests/spmvp01.at
@@ -0,0 +1,26 @@
+# Process this file with autom4te to create testsuite. -*- Autotest -*-
+
+# Test suite for GNU tar.
+# Copyright (C) 2006 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2, or (at your option)
+# any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+# 02110-1301, USA.
+
+AT_SETUP([sparse files in PAX MV archives, v.0.1])
+AT_KEYWORDS([sparse multiv sparsemvp sparsemvp01])
+
+TAR_MVP_TEST(0.1, [0 ABCDEFGHIJK 1M ABCDEFGHI], [0 ABCDEFGHIJ 1M ABCDEFGHI])
+
+AT_CLEANUP
diff --git a/tests/spmvp10.at b/tests/spmvp10.at
new file mode 100644
index 0000000..e35908d
--- /dev/null
+++ b/tests/spmvp10.at
@@ -0,0 +1,26 @@
+# Process this file with autom4te to create testsuite. -*- Autotest -*-
+
+# Test suite for GNU tar.
+# Copyright (C) 2006 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2, or (at your option)
+# any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+# 02110-1301, USA.
+
+AT_SETUP([sparse files in PAX MV archives, v.1.0])
+AT_KEYWORDS([sparse multiv sparsemvp sparsemvp10])
+
+TAR_MVP_TEST(1.0, [0 ABCDEFGH 1M ABCDEFGHI], [0 ABCDEFG 1M ABCDEFGHI])
+
+AT_CLEANUP
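
Note on the version 1.0 layout documented in doc/sparse.texi in this patch: the "simple program" that expands a condensed sparse file could look roughly like the sketch below. It is not part of the commit and is not GNU tar code. The names read_sparse_map and expand are hypothetical, the 512-byte block size is assumed from the usual tar block size, and the real size has to be supplied by the caller because it lives only in the GNU.sparse.realsize header variable. The numeric base is left as a parameter (default 10) so the sketch does not hard-code how the map numbers are encoded.

```python
import io

BLOCK = 512  # assumed tar block size; the map is NUL-padded to a multiple of it


def read_sparse_map(stream, base=10):
    """Read a version 1.0 sparse map from the start of a binary stream.

    Returns a list of (offset, size) pairs and leaves the stream positioned
    at the first byte of the condensed file data.
    """
    consumed = 0

    def next_number():
        nonlocal consumed
        line = stream.readline()
        if not line.endswith(b"\n"):
            raise ValueError("truncated sparse map")
        consumed += len(line)
        return int(line.decode("ascii").strip(), base)

    # The first number is the entry count; each entry is an offset
    # followed by a size, one number per line.
    count = next_number()
    entries = [(next_number(), next_number()) for _ in range(count)]
    # Skip the NUL padding that rounds the map up to a block boundary.
    stream.read(-consumed % BLOCK)
    return entries


def expand(condensed, out, realsize, base=10):
    """Rebuild the original file in `out`, a seekable binary stream."""
    for offset, size in read_sparse_map(condensed, base):
        out.seek(offset)
        out.write(condensed.read(size))
    # Recreate a trailing hole, if any, by extending to the real size.
    end = out.seek(0, io.SEEK_END)
    if end < realsize:
        out.seek(realsize - 1)
        out.write(b"\0")
```

A caller would open the file extracted under the GNUSparseFile.%p directory in binary mode, open the destination for writing, and pass the value recorded in GNU.sparse.realsize as realsize. Seeking past the end before writing lets the file system leave the gaps as holes where it supports them.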