@c This is part of the paxutils manual.
@c Copyright (C) 2006, 2014 Free Software Foundation, Inc.
@c This file is distributed under GFDL 1.1 or any later version
@c published by the Free Software Foundation.

@menu
* Standard::           Basic Tar Format
* Extensions::         @acronym{GNU} Extensions to the Archive Format
* Sparse Formats::     Storing Sparse Files
* Snapshot Files::
* Dumpdir::
@end menu

@node Standard
@unnumberedsec Basic Tar Format
@UNREVISED

While an archive may contain many files, the archive itself is a
single ordinary file.  Like any other file, an archive file can be
written to a storage device such as a tape or disk, sent through a
pipe or over a network, saved on the active file system, or even
stored in another archive.  An archive file is not easy to read or
manipulate without using the @command{tar} utility or Tar mode in
@acronym{GNU} Emacs.

Physically, an archive consists of a series of file entries terminated
by an end-of-archive entry, which consists of two 512-byte blocks of
zero bytes.  A file entry usually describes one of the files in the
archive (an @dfn{archive member}), and consists of a file header and
the contents of the file.  File headers contain file names and
statistics, checksum information which @command{tar} uses to detect
file corruption, and information about file types.

Archives are permitted to have more than one member with the same
member name.  One way this situation can occur is if more than one
version of a file has been stored in the archive.  For information
about adding new versions of a file to an archive, see @ref{update}.

In addition to entries describing archive members, an archive may
contain entries which @command{tar} itself uses to store information.
@xref{label}, for an example of such an archive entry.

A @command{tar} archive file contains a series of blocks.  Each block
contains @code{BLOCKSIZE} bytes.  Although this format may be thought
of as being on magnetic tape, other media are often used.

Each file archived is represented by a header block which describes
the file, followed by zero or more blocks which give the contents
of the file.  At the end of the archive file there are two 512-byte blocks
filled with binary zeros as an end-of-file marker.  A reasonable system
should write such an end-of-file marker at the end of an archive, but
must not assume that such a marker exists when reading an archive.  In
particular, @GNUTAR{} always issues a warning if it does not encounter it.
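
As an illustration of this layout, the following Python sketch walks an
in-memory archive header by header and stops at the first zero block,
without insisting that the marker be present.  The function name
@code{member_offsets} is hypothetical, and the position of the
@code{size} field (bytes 124 to 135) is assumed from the POSIX ustar
header layout rather than stated in the text.  Members that carry extra
blocks, such as sparse files with extended headers, are not handled.

```python
BLOCKSIZE = 512  # bytes per tar block


def member_offsets(archive: bytes):
    """Yield the offset of each header block in an in-memory archive."""
    zero_block = bytes(BLOCKSIZE)
    offset = 0
    while offset + BLOCKSIZE <= len(archive):
        header = archive[offset:offset + BLOCKSIZE]
        if header == zero_block:
            return  # end-of-archive marker; a missing second block is tolerated
        yield offset
        # Assumed layout: the size field occupies bytes 124..135 as octal ASCII.
        digits = header[124:136].rstrip(b"\0 ").decode("ascii")
        size = int(digits, 8) if digits else 0
        # File contents are stored in whole blocks, so round the size up.
        data_blocks = (size + BLOCKSIZE - 1) // BLOCKSIZE
        offset += BLOCKSIZE * (1 + data_blocks)
```

A reader built this way simply runs out of data on an archive that
lacks the marker, which is where a warning could be issued.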

The blocks may be @dfn{blocked} for physical I/O operations.
Each record of @var{n} blocks (where @var{n} is set by the
@option{--blocking-factor=@var{512-size}} (@option{-b @var{512-size}})
option to @command{tar}) is written with a single
@w{@samp{write ()}} operation.  On magnetic tapes, the result of
such a write is a single record.  When writing an archive,
the last record of blocks should be written at the full size, with
blocks after the zero block containing all zeros.  When reading
an archive, a reasonable system should properly handle an archive
whose last record is shorter than the rest, or which contains garbage
records after a zero block.
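
The padding rule for the last record can be sketched in Python as
follows.  The helper name @code{pad_to_record} is hypothetical, and the
default blocking factor of 20 is an assumption chosen to mirror the
usual @GNUTAR{} default; it is not taken from the text above.

```python
BLOCKSIZE = 512  # bytes per tar block


def pad_to_record(archive: bytes, blocking_factor: int = 20) -> bytes:
    """Zero-pad an archive so its length is a whole number of records."""
    record_size = blocking_factor * BLOCKSIZE
    remainder = len(archive) % record_size
    if remainder:
        # Fill the rest of the last record with zero bytes.
        archive += bytes(record_size - remainder)
    return archive
```

With a blocking factor of 20, a three-block archive grows from 1536 to
10240 bytes, so that the final write is a full record.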

The header block is defined in C as follows.  In the @GNUTAR{}
distribution, this is part of file @file{src/tar.h}:

@smallexample
@include header.texi
@end smallexample

All characters in header blocks are represented by using 8-bit
characters in the local variant of ASCII.  Each field within the
structure is contiguous; that is, there is no padding used within
the structure.  Each character on the archive medium is stored
contiguously.

Bytes representing the contents of files (after the header block
of each file) are not translated in any way and are not constrained
to represent characters in any character set.  The @command{tar} format
does not distinguish text files from binary files, and no translation
of file contents is performed.

The @code{name}, @code{linkname}, @code{magic}, @code{uname}, and
@code{gname} fields are null-terminated character strings.  All other
fields are zero-filled octal numbers in ASCII.  Each numeric field of
width @var{w} contains @var{w} minus 1 digits, and a null.
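
The numeric field encoding can be illustrated with a short Python
sketch.  The names @code{encode_octal} and @code{decode_octal} are
hypothetical, and the decoder's tolerance for space padding and empty
fields is an assumption made for robustness, not a rule stated above.

```python
def encode_octal(value: int, width: int) -> bytes:
    """Encode value as width-1 zero-filled octal digits followed by a null."""
    if value < 0:
        raise ValueError("octal fields hold non-negative values only")
    digits = "%0*o" % (width - 1, value)
    if len(digits) > width - 1:
        raise ValueError("value does not fit in a %d-byte field" % width)
    return digits.encode("ascii") + b"\0"


def decode_octal(field: bytes) -> int:
    """Decode a numeric header field; an empty field counts as zero."""
    digits = field.split(b"\0", 1)[0].strip(b" ").decode("ascii")
    return int(digits, 8) if digits else 0
```

For example, a file size of 5 in a 12-byte field is stored as eleven
digits, @samp{00000000005}, followed by a null.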

The @code{name} field is the file name of the file, with directory names
(if any) preceding the file name, separated by slashes.

@FIXME{how big a name before field overflows?}

The @code{mode} field provides nine bits specifying file permissions
and three bits to specify the Set @acronym{UID}, Set @acronym{GID}, and Save Text
(@dfn{sticky}) modes.  Values for these bits are defined above.
When special permissions are required to create a file with a given
mode, and the user restoring files from the archive does not hold such
permissions, the mode bit(s) specifying those special permissions
are ignored.  Modes which are not supported by the operating system
restoring files from the archive will be ignored.  Unsupported modes
should be faked up when creating or updating an archive; e.g., the
group permission could be copied from the @emph{other} permission.
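
A minimal Python sketch of how the twelve mode bits divide up, and of
the suggested way to fake a missing group permission, is given below.
The function names are hypothetical.  The bit values (04000, 02000 and
01000 for Set @acronym{UID}, Set @acronym{GID} and sticky) are those of
the POSIX header definitions and are assumed here, since they come from
the included header rather than from the text.

```python
TSUID, TSGID, TSVTX = 0o4000, 0o2000, 0o1000  # set-UID, set-GID, sticky


def split_mode(mode: int) -> dict:
    """Separate the nine permission bits from the three special bits."""
    return {
        "permissions": mode & 0o777,
        "setuid": bool(mode & TSUID),
        "setgid": bool(mode & TSGID),
        "sticky": bool(mode & TSVTX),
    }


def fake_group_from_other(mode: int) -> int:
    """Copy the 'other' permission bits into the 'group' position."""
    other = mode & 0o007
    return (mode & ~0o070) | (other << 3)
```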

The @code{uid} and @code{gid} fields are the numeric user and group
@acronym{ID} of the file owners, respectively.  If the operating system does
not support numeric user or group @acronym{ID}s, these fields should
be ignored.

The @code{size} field is the size of the file in bytes; linked files
are archived with this field specified as zero.

The @code{mtime} field is the data modification time of the file at
the time it was archived.  It is the ASCII representation of the octal
value of the last time the file's contents were modified, represented
as an integer number of
seconds since January 1, 1970, 00:00 Coordinated Universal Time.
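
Decoding an @code{mtime} field is then a matter of reading the octal
digits and interpreting the result as seconds since the epoch.  The
following Python sketch shows this; the name @code{decode_mtime} is
hypothetical.

```python
from datetime import datetime, timezone


def decode_mtime(field: bytes) -> datetime:
    """Convert an octal ASCII mtime field to an aware UTC datetime."""
    digits = field.rstrip(b"\0 ").decode("ascii")
    seconds = int(digits, 8) if digits else 0
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```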

The @code{chksum} field is the ASCII representation of the octal value
of the simple sum of all bytes in the header block.  Each 8-bit
byte in the header is added to an unsigned integer, initialized to
zero, the precision of which shall be no less than seventeen bits.
When calculating the checksum, the @code{chksum} field is treated as
if it were all blanks.
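
The checksum rule can be sketched in Python as shown below.  The
function names are hypothetical, and the position of the @code{chksum}
field (bytes 148 to 155) is assumed from the POSIX ustar layout rather
than stated in the text.

```python
BLOCKSIZE = 512  # bytes per tar block


def header_checksum(header: bytes) -> int:
    """Sum all header bytes, treating the chksum field as eight blanks."""
    if len(header) != BLOCKSIZE:
        raise ValueError("a header block must be exactly 512 bytes")
    # Assumed layout: the chksum field occupies bytes 148..155.
    blanked = header[:148] + b" " * 8 + header[156:]
    return sum(blanked)


def checksum_matches(header: bytes) -> bool:
    """Compare the stored octal checksum with the computed one."""
    try:
        stored = header[148:156].split(b"\0", 1)[0].strip(b" ").decode("ascii")
        return int(stored, 8) == header_checksum(header)
    except ValueError:
        return False  # empty or non-octal field
```

An all-zero block sums to 256 under this rule, because the eight blanks
each contribute 32.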

The @code{typeflag} field specifies the type of file archived.  If a
particular implementation does not recognize or permit the specified
type, the file will be extracted as if it were a regular file.  As this
action occurs, @command{tar} issues a warning to the standard error.

The @code{atime} and @code{ctime} fields are used in making incremental
backups; they store, respectively, the particular file's access and
status change times.

The @code{offset} field is used by the @option{--multi-volume}
(@option{-M}) option, when making a multi-volume archive.  The offset
is the number of bytes into the file at which to resume when the file
is continued on the next tape; that is, it records where a continued
file picks up.

The following fields were added to deal with sparse files.  A file
is @dfn{sparse} if it contains unallocated blocks which end up being
represented as zeros, i.e., no useful data.  A test to see if a file
is sparse is to look at the number of blocks allocated for it versus the
number of characters in the file; if there are fewer blocks allocated
for the file than would normally be allocated for a file of that
size, then the file is sparse.  This is the method @command{tar} uses to
detect a sparse file, and once such a file is detected, it is treated
differently from non-sparse files.
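
The block-count test described above might look like this in Python.
This is only a sketch of the heuristic, not the code @command{tar}
uses; both function names are hypothetical, and the 512-byte unit for
@code{st_blocks} is the POSIX convention, assumed here.

```python
import os


def is_sparse(size: int, allocated_blocks: int, unit: int = 512) -> bool:
    """True if fewer bytes are allocated than the file's size requires."""
    return allocated_blocks * unit < size


def file_is_sparse(path: str) -> bool:
    """Apply the heuristic to a file on disk (POSIX systems only)."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX; it is absent on some systems.
    return is_sparse(st.st_size, st.st_blocks)
```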

Sparse files are often @code{dbm} files, or other database-type files
which have data at some points and emptiness in the greater part of
the file.  Such files can appear to be very large when an
@samp{ls -l} is done on them, when in truth, there may be a very small
amount of important data contained in the file.  It is thus undesirable
to have @command{tar} think that it must back up this entire file, as
great quantities of room are wasted on empty blocks, which can lead
to running out of room on a tape far earlier than is necessary.
Thus, sparse files are dealt with so that these empty blocks are
not written to the tape.  Instead, what is written to the tape is a
description, of sorts, of the sparse file: where the holes are, how
big the holes are, and how much data is found at the end of the hole.
This way, the file takes up potentially far less room on the tape,
and when the file is extracted later on, it will look exactly the way
it looked beforehand.  The following is a description of the fields
used to handle a sparse file:

The @code{sp} field is an array of @code{struct sparse}.  Each
@code{struct sparse} contains two 12-character strings which represent
an offset into the file and a number of bytes to be written at that
offset.  The offset is absolute, and not relative to the offset in the
preceding array element.

The header can hold four of these @code{struct sparse} at the moment;
if more are needed, they are not stored in the header.

The @code{isextended} flag is set when an @code{extended_header}
is needed to deal with a file.  Note that this means that this flag
can only be set when dealing with a sparse file, and it is only set
in the event that the description of the file will not fit in the
allotted room for sparse structures in the header.  In other words,
an @code{extended_header} is needed.

The @code{extended_header} structure is used for sparse files which
need more sparse structures than can fit in the header.  The header can
fit 4 such structures; if more are needed, the flag @code{isextended}
gets set and the next block is an @code{extended_header}.

Each @code{extended_header} structure contains an array of 21
sparse structures, along with an @code{isextended} flag similar to
the one in the header.  There can be an indeterminate number of such
@code{extended_header}s to describe a sparse file.
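
The arithmetic of the sparse map (four entries in the header, 21 in
each @code{extended_header}) and the decoding of the 12-character
fields can be sketched in Python.  The function names are hypothetical,
and treating an all-null slot as the end of the map is an assumption of
this sketch rather than something the text specifies.

```python
SPARSES_IN_HEADER = 4      # entries that fit in the main header
SPARSES_IN_EXTENDED = 21   # entries per extended_header block
ENTRY_SIZE = 24            # two 12-character fields per struct sparse


def extended_headers_needed(entries: int) -> int:
    """Number of extended_header blocks required for a sparse map."""
    overflow = max(0, entries - SPARSES_IN_HEADER)
    return -(-overflow // SPARSES_IN_EXTENDED)  # ceiling division


def parse_sparse_array(raw: bytes, slots: int) -> list:
    """Decode up to `slots` (offset, numbytes) pairs from a sparse array."""
    result = []
    for i in range(slots):
        entry = raw[i * ENTRY_SIZE:(i + 1) * ENTRY_SIZE]
        offset = entry[:12].rstrip(b"\0 ")
        numbytes = entry[12:].rstrip(b"\0 ")
        if not offset or not numbytes:
            break  # assumed convention: an empty slot ends the map
        result.append((int(offset, 8), int(numbytes, 8)))
    return result
```

A map of 26 entries, for instance, needs two extended headers: four
entries go in the header, 21 in the first extension, and one in the
second.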

@table @asis

@item @code{REGTYPE}
@itemx @code{AREGTYPE}
These flags represent a regular file.  In order to be compatible
with older versions of @command{tar}, a @code{typeflag} value of
@code{AREGTYPE} should be silently recognized as a regular file.
New archives should be created using @code{REGTYPE}.  Also, for
backward compatibility, @command{tar} treats a regular file whose name
ends with a slash as a directory.

@item @code{LNKTYPE}
This flag represents a file linked to another file, of any type,
previously archived.  Such files are identified in Unix by each
file having the same device and inode number.  The linked-to name is
specified in the @code{linkname} field with a trailing null.

@item @code{SYMTYPE}
This represents a symbolic link to another file.  The linked-to name
is specified in the @code{linkname} field with a trailing null.

@item @code{CHRTYPE}
@itemx @code{BLKTYPE}
These represent character special files and block special files
respectively.  In this case the @code{devmajor} and @code{devminor}
fields will contain the major and minor device numbers respectively.
Operating systems may map the device specifications to their own
local specification, or may ignore the entry.

@item @code{DIRTYPE}
This flag specifies a directory or sub-directory.  The directory
name in the @code{name} field should end with a slash.  On systems where
disk allocation is performed on a directory basis, the @code{size} field
will contain the maximum number of bytes (which may be rounded to
the nearest disk block allocation unit) which the directory may
hold.  A @code{size} field of zero indicates no such limiting.  Systems
which do not support limiting in this manner should ignore the
@code{size} field.

@item @code{FIFOTYPE}
This specifies a FIFO special file.  Note that the archiving of a
FIFO file archives the existence of this file and not its contents.

@item @code{CONTTYPE}
This specifies a contiguous file, which is the same as a normal
file except that, in operating systems which support it, all its
space is allocated contiguously on the disk.  Operating systems
which do not allow contiguous allocation should silently treat this
type as a normal file.

@item @code{A} @dots{} @code{Z}
These are reserved for custom implementations.  Some of these are
used in the @acronym{GNU} modified format, as described below.

@end table

Other values are reserved for specification in future revisions of
the P1003 standard, and should not be used by any @command{tar} program.
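
The handling rules in the table above can be summarised in a small
Python sketch.  The function name @code{classify} and the label
@samp{vendor extension} are hypothetical.  The character values
(@samp{0} through @samp{7}, with a null byte for @code{AREGTYPE}) are
those of the POSIX ustar format and are assumed here, since they are
defined in the included header rather than in the text.

```python
TYPEFLAGS = {
    b"0": "REGTYPE", b"\0": "AREGTYPE", b"1": "LNKTYPE",
    b"2": "SYMTYPE", b"3": "CHRTYPE", b"4": "BLKTYPE",
    b"5": "DIRTYPE", b"6": "FIFOTYPE", b"7": "CONTTYPE",
}


def classify(typeflag: bytes, name: bytes) -> str:
    """Map a one-byte typeflag to the type a reader should act on."""
    kind = TYPEFLAGS.get(typeflag)
    if kind is None:
        if len(typeflag) == 1 and b"A" <= typeflag <= b"Z":
            return "vendor extension"
        return "REGTYPE"  # unrecognized: extract as a regular file
    if kind in ("REGTYPE", "AREGTYPE"):
        # Backward compatibility: a regular file named "dir/" is a directory.
        return "DIRTYPE" if name.endswith(b"/") else "REGTYPE"
    return kind
```

A real reader would also warn on standard error when it falls back to
treating an unrecognized type as a regular file.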

The @code{magic} field indicates that this archive was output in
the P1003 archive format.  If this field contains @code{TMAGIC},
the @code{uname} and @code{gname} fields will contain the ASCII
representation of the owner and group of the file respectively.
If these names are found on the system restoring the files, the
corresponding user and group @acronym{ID}s are used rather than the
values in the @code{uid} and @code{gid} fields.
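
The preference for names over numeric @acronym{ID}s can be sketched in
Python as follows.  The function names and the @code{known_ids} mapping
are hypothetical; a real implementation would consult the system's user
and group databases instead.  The value of @code{TMAGIC}
(@samp{ustar} followed by a null) and the field positions are assumed
from the POSIX ustar layout.

```python
TMAGIC = b"ustar\0"  # six bytes, including the trailing null


def owner_names(header: bytes):
    """Return (uname, gname) if the header carries TMAGIC, else None."""
    # Assumed layout: magic at 257..262, uname at 265..296, gname at 297..328.
    if header[257:263] != TMAGIC:
        return None
    uname = header[265:297].split(b"\0", 1)[0].decode("ascii", "replace")
    gname = header[297:329].split(b"\0", 1)[0].decode("ascii", "replace")
    return uname, gname


def resolve_id(name: str, numeric_id: int, known_ids: dict) -> int:
    """Prefer the local ID for name; fall back to the archived numeric ID."""
    return known_ids.get(name, numeric_id)
```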

For references, see ISO/IEC 9945-1:1990 or IEEE Std 1003.1-1990, pages
169-173 (section 10.1) for @cite{Archive/Interchange File Format}; and
IEEE Std 1003.2-1992, pages 380-388 (section 4.48) and pages 936-940
(section E.4.48) for @cite{pax - Portable archive interchange}.

@node Extensions
@unnumberedsec @acronym{GNU} Extensions to the Archive Format
@UNREVISED

The @acronym{GNU} format uses additional file types to describe new types of
files in an archive.  These are listed below.

@table @code
@item GNUTYPE_DUMPDIR
@itemx 'D'
This represents a directory and a list of files created by the
@option{--incremental} (@option{-G}) option.  The @code{size} field gives the total
size of the associated list of files.  Each file name is preceded by
either a @samp{Y} (the file should be in this archive) or an @samp{N}
(the file is a directory, or is not stored in the archive).  Each file
name is terminated by a null.  There is an additional null after the
last file name.

@item GNUTYPE_MULTIVOL
@itemx 'M'
This represents a file continued from another volume of a multi-volume
archive created with the @option{--multi-volume} (@option{-M}) option.  The original
type of the file is not given here.  The @code{size} field gives the
maximum size of this piece of the file (assuming the volume does
not end before the file is written out).  The @code{offset} field
gives the offset from the beginning of the file where this part of
the file begins.  Thus @code{size} plus @code{offset} should equal
the original size of the file.

@item GNUTYPE_SPARSE
@itemx 'S'
This flag indicates that we are dealing with a sparse file.  Note
that archiving a sparse file requires special operations to find
the holes in the file, recording the position of each hole along
with the number of bytes of data to be found after it.

@item GNUTYPE_VOLHDR
@itemx 'V'
This file type is used to mark the volume header that was given with
the @option{--label=@var{archive-label}} (@option{-V @var{archive-label}})
option when the archive was created.  The @code{name}
field contains the @code{name} given after the
@option{--label=@var{archive-label}} (@option{-V @var{archive-label}}) option.
The @code{size} field is zero.  Only the first file in each volume
of an archive should have this type.

@end table
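
As an illustration of the @code{GNUTYPE_DUMPDIR} contents described in
the table above, the following Python sketch splits such a list into
flag and file name pairs.  The function name @code{parse_dumpdir} is
hypothetical, and the sketch follows only the description given in this
section: a @samp{Y} or @samp{N} prefix, a null after each name, and an
additional null at the end.

```python
def parse_dumpdir(data: bytes) -> list:
    """Split a dumpdir list into (flag, file name) pairs."""
    entries = []
    for item in data.split(b"\0"):
        if not item:
            break  # the additional null after the last name ends the list
        flag = chr(item[0])
        name = item[1:].decode("utf-8", "surrogateescape")
        entries.append((flag, name))
    return entries
```

Given the bytes @samp{Yfile1}, null, @samp{Nsubdir}, null, null, this
yields one entry flagged @samp{Y} and one flagged @samp{N}.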

You may have trouble reading a @acronym{GNU} format archive on a
non-@acronym{GNU} system if the options @option{--incremental} (@option{-G}),
@option{--multi-volume} (@option{-M}), @option{--sparse} (@option{-S}), or
@option{--label=@var{archive-label}} (@option{-V @var{archive-label}}) were
used when writing the archive.  In general, if @command{tar} does not
use the @acronym{GNU}-added fields of the header, other versions of
@command{tar} should be able to read the archive.  Otherwise, the
@command{tar} program will give an error, the most likely one being a
checksum error.

@node Sparse Formats
@unnumberedsec Storing Sparse Files
@include sparse.texi

@node Snapshot Files
@unnumberedsec Format of the Incremental Snapshot Files
@include snapshot.texi

@node Dumpdir
@unnumberedsec Dumpdir
@include dumpdir.texi