(Other Tars): New node describing how to extract

GNU-specific member formats using third-party tars.
This commit is contained in:
Sergey Poznyakoff
2006-06-25 12:45:03 +00:00
parent c7aa519f09
commit 39e5d9182c

View File

@@ -109,8 +109,8 @@ Appendices
* Changes::
* Configuring Help Summary::
* Genfile::
* Tar Internals::
* Genfile::
* Free Software Needs Free Documentation::
* Copying This Manual::
* Index of Command Line Options::
@@ -330,11 +330,18 @@ Making @command{tar} Archives More Portable
* posix:: @acronym{POSIX} archives
* Checksumming:: Checksumming Problems
* Large or Negative Values:: Large files, negative time stamps, etc.
* Other Tars:: How to Extract GNU-Specific Data Using
Other @command{tar} Implementations
@GNUTAR{} and @acronym{POSIX} @command{tar}
* PAX keywords:: Controlling Extended Header Keywords.
How to Extract GNU-Specific Data Using Other @command{tar} Implementations
* Split Recovery:: Members Split Between Volumes
* Sparse Recovery:: Sparse Members
Using Less Space through Compression
* gzip:: Creating and Reading Compressed Archives
@@ -369,12 +376,6 @@ Using Multiple Tapes
* Tarcat:: Concatenate Volumes into a Single Archive
Genfile
* Generate Mode:: File Generation Mode.
* Status Mode:: File Status Mode.
* Exec Mode:: Synchronous Execution mode.
Tar Internals
* Standard:: Basic Tar Format
@@ -389,6 +390,12 @@ Storing Sparse Files
* PAX 0:: PAX Format, Versions 0.0 and 0.1
* PAX 1:: PAX Format, Version 1.0
Genfile
* Generate Mode:: File Generation Mode.
* Status Mode:: File Status Mode.
* Exec Mode:: Synchronous Execution mode.
Copying This Manual
* GNU Free Documentation License:: License for copying this manual
@@ -7753,6 +7760,8 @@ archives and archive labels) in GNU and PAX formats.}
* posix:: @acronym{POSIX} archives
* Checksumming:: Checksumming Problems
* Large or Negative Values:: Large files, negative time stamps, etc.
* Other Tars:: How to Extract GNU-Specific Data Using
Other @command{tar} Implementations
@end menu
@node Portable Names
@@ -8070,6 +8079,350 @@ be extracted by any tar implementation that understands older
@FIXME{Describe how @acronym{POSIX} archives are extracted by non
POSIX-aware tars.}
@node Other Tars
@subsection How to Extract GNU-Specific Data Using Other @command{tar} Implementations
In previous sections you became acquainted with various quircks
necessary to make your archives portable. Sometimes you may need to
extract archives containing GNU-specific members using some
third-party @command{tar} implementation or an older version of
@GNUTAR{}. Of course your best bet is to have @GNUTAR{} installed,
but if it is for some reason impossible, this section will explain
how to cope without it.
When we speak about @dfn{GNU-specific} members we mean two classes of
them: members split between the volumes of a multi-volume archive and
sparse members. You will be able to always recover such members if
the archive is in PAX format. In addition split members can be
recovered from archives in old GNU format. The following subsections
describe the required procedures in detail.
@menu
* Split Recovery:: Members Split Between Volumes
* Sparse Recovery:: Sparse Members
@end menu
@node Split Recovery
@subsubsection Extracting Members Split Between Volumes
If a member is split between several volumes of an old GNU format archive
most third party @command{tar} implementation will fail to extract
it. To extract it, use @command{tarcat} program (@pxref{Tarcat}).
This program is available from
@uref{http://www.gnu.org/@/software/@/tar/@/utils/@/tarcat, @GNUTAR{}
home page}. It concatenates several archive volumes into a single
valid archive. For example, if you have three volumes named from
@file{vol-1.tar} to @file{vol-2.tar}, you can do the following to
extract them using a third-party @command{tar}:
@smallexample
$ @kbd{tarcat vol-1.tar vol-2.tar vol-3.tar | tar xf -}
@end smallexample
You could use this approach for many (although not all) PAX
format archives as well. However, extracting split members from a PAX
archive is a much easier task, because PAX volumes are constructed in
such a way that each part of a split member is extracted as a
different file by @command{tar} implementations that are not aware of
GNU extensions. More specifically, the very first part retains its
original name, and all subsequent parts are named using the pattern:
@smallexample
%d/GNUFileParts.%p/%f.%n
@end smallexample
@noindent
where symbols preceeded by @samp{%} are @dfn{macro characters} that
have the following meaning:
@multitable @columnfractions .25 .55
@headitem Meta-character @tab Replaced By
@item %d @tab The directory name of the file, equivalent to the
result of the @command{dirname} utility on its full name.
@item %f @tab The file name of the file, equivalent to the result
of the @command{basename} utility on its full name.
@item %p @tab The process ID of the @command{tar} process that
created the archive.
@item %n @tab Ordinal number of this particular part.
@end multitable
For example, if, a file @file{var/longfile} was split during archive
creation between three volumes, and the creator @command{tar} process
had process ID @samp{27962}, then the member names will be:
@smallexample
var/longfile
var/GNUFileParts.27962/longfile.1
var/GNUFileParts.27962/longfile.2
@end smallexample
When you extract your archive using a third-party @command{tar}, these
files will be created on your disk, and the only thing you will need
to do to restore your file in its original form is concatenate them in
the proper order, for example:
@smallexample
@group
$ @kbd{cd var}
$ @kbd{cat GNUFileParts.27962/longfile.1 \
GNUFileParts.27962/longfile.2 >> longfile}
$ rm -f GNUFileParts.27962
@end group
@end smallexample
Notice, that if the @command{tar} implementation you use supports PAX
format archives, it will probably emit warnings about unknown keywords
during extraction. They will lool like this:
@smallexample
@group
Tar file too small
Unknown extended header keyword 'GNU.volume.filename' ignored.
Unknown extended header keyword 'GNU.volume.size' ignored.
Unknown extended header keyword 'GNU.volume.offset' ignored.
@end group
@end smallexample
@noindent
You can safely ignore these warnings.
If your @command{tar} implementation is not PAX-aware, you will get
more warnigns and more files generated on your disk, e.g.:
@smallexample
@group
$ @kbd{tar xf vol-1.tar}
var/PaxHeaders.27962/longfile: Unknown file type 'x', extracted as
normal file
Unexpected EOF in archive
$ @kbd{tar xf vol-2.tar}
tmp/GlobalHead.27962.1: Unknown file type 'g', extracted as normal file
GNUFileParts.27962/PaxHeaders.27962/sparsefile.1: Unknown file type
'x', extracted as normal file
@end group
@end smallexample
Ignore these warnings. The @file{PaxHeaders.*} directories created
will contain files with @dfn{extended header keywords} describing the
extracted files. You can delete them, unless they describe sparse
members. Read further to learn more about them.
@node Sparse Recovery
@subsubsection Extracting Sparse Members
Any @command{tar} implementation will be able to extract sparse members from a
PAX archive. However, the extracted files will be @dfn{condensed},
i.e. any zero blocks will be removed from them. When we restore such
a condensed file to its original form, by adding zero bloks (or
@dfn{holes}) back to their original locations, we call this process
@dfn{expanding} a compressed sparse file.
To expand a file, you will need a simple auxiliary program called
@command{xsparse}. It is available in source form from
@uref{http://www.gnu.org/@/software/@/tar/@/utils/@/xsparse, @GNUTAR{}
home page}.
Let's begin with archive members in @dfn{sparse format
version 1.0}@footnote{@xref{PAX 1}.}, which are the easiest to expand.
The condensed file will contain both file map and file data, so no
additional data will be needed to restore it. If the original file
name was @file{@var{dir}/@var{name}}, then the condensed file will be
named @file{@var{dir}/@/GNUSparseFile.@var{n}/@/@var{name}}, where
@var{n} is a decimal number@footnote{technically speaking, @var{n} is a
@dfn{process ID} of the @command{tar} process which created the
archive (@pxref{PAX keywords}).}.
To expand a version 1.0 file, run @command{xsparse} as follows:
@smallexample
$ @kbd{xsparse @file{cond-file}}
@end smallexample
@noindent
where @file{cond-file} is the name of the condensed file. The utility
will deduce the name for the resulting expanded file using the
following algorithm:
@enumerate 1
@item If @file{cond-file} does not contain any directories,
@file{../cond-file} will be used;
@item If @file{cond-file} has the form
@file{@var{dir}/@var{t}/@var{name}}, where both @var{t} and @var{name}
are simple names, with no @samp{/} characters in them, the output file
name will be @file{@var{dir}/@var{name}}.
@item Otherwise, if @file{cond-file} has the form
@file{@var{dir}/@var{name}}, the output file name will be
@file{@var{name}}.
@end enumerate
In the unlikely case when this algorithm does not suite your needs,
you can explicitely specify output file name as a second argument to
the command:
@smallexample
$ @kbd{xsparse @file{cond-file}}
@end smallexample
It is often a good idea to run @command{xsparse} in @dfn{dry run} mode
first. In this mode, the command does not actually expand the file,
but verbosely lists all actions it would be taking to do so. The dry
run mode is enabled by @option{-n} command line argument:
@smallexample
@group
$ @kbd{xsparse -n /home/gray/GNUSparseFile.6058/sparsefile}
Reading v.1.0 sparse map
Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to
`/home/gray/sparsefile'
Finished dry run
@end group
@end smallexample
To actually expand the file, you would run:
@smallexample
$ @kbd{xsparse /home/gray/GNUSparseFile.6058/sparsefile}
@end smallexample
@noindent
The program behaves the same way all UNIX utilities do: it will keep
quiet unless it has simething important to tell you (e.g. an error
condition or something). If you wish it to produce verbose output,
similar to that from the dry run mode, give it @option{-v} option:
@smallexample
@group
$ @kbd{xsparse -v /home/gray/GNUSparseFile.6058/sparsefile}
Reading v.1.0 sparse map
Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to
`/home/gray/sparsefile'
Done
@end group
@end smallexample
Additionally, if your @command{tar} implementation has extracted the
@dfn{extended headers} for this file, you can instruct @command{xstar}
to use them in order to verify the integrity of the expanded file.
The option @option{-x} sets the name of the extended header file to
use. Continuing our example:
@smallexample
@group
$ @kbd{xsparse -v -x /home/gray/PaxHeaders.6058/sparsefile \
/home/gray/GNUSparseFile.6058/sparsefile}
Reading extended header file
Found variable GNU.sparse.major = 1
Found variable GNU.sparse.minor = 0
Found variable GNU.sparse.name = sparsefile
Found variable GNU.sparse.realsize = 217481216
Reading v.1.0 sparse map
Expanding file `/home/gray/GNUSparseFile.6058/sparsefile' to
`/home/gray/sparsefile'
Done
@end group
@end smallexample
An @dfn{extended header} is a special @command{tar} archive header
that precedes an archive member and contains a set of
@dfn{variables}, describing the member properties that cannot be
stored in the standard @code{ustar} header. While optional for
expanding sparse version 1.0 members, use of extended headers is
mandatory when expanding sparse members in older sparse formats: v.0.0
and v.0.1 (The sparse formats are described in detail in @pxref{Sparse
Formats}). So, for this format, the question is: how to obtain
extended headers from the archive?
If you use a @command{tar} implementation that does not support PAX
format, extended headers for each member will be extracted as a
separate file. If we represent the member name as
@file{@var{dir}/@var{name}}, then the extended header file will be
named @file{@var{dir}/@/PaxHeaders.@var{n}/@/@var{name}}, where
@var{n} is an integer number.
Things become more difficult if your @command{tar} implementation
does support PAX headers, because in this case you will have to
manually extract the headers. We recommend the following algorithm:
@enumerate 1
@item
Consult the documentation for your @command{tar} implementation for an
option that will print @dfn{block numbers} along with the archive
listing (analogous to @GNUTAR{}'s @option{-R} option). For example,
@command{star} has @option{-block-number}.
@item
Obtain the verbose listing using the @samp{block number} option, and
find the position of the sparse member in question and the member
immediately following it. For example, running @command{star} on our
archive we obtain:
@smallexample
@group
$ @kbd{star -t -v -block-number -f arc.tar}
@dots{}
star: Unknown extended header keyword 'GNU.sparse.size' ignored.
star: Unknown extended header keyword 'GNU.sparse.numblocks' ignored.
star: Unknown extended header keyword 'GNU.sparse.name' ignored.
star: Unknown extended header keyword 'GNU.sparse.map' ignored.
block 56: 425984 -rw-r--r-- gray/users Jun 25 14:46 2006 GNUSparseFile.28124/sparsefile
block 897: 65391 -rw-r--r-- gray/users Jun 24 20:06 2006 README
@dots{}
@end group
@end smallexample
@noindent
(as usual, ignore the warnings about unknown keywords.)
@item
Let the size of the sparse member be @var{size}, its block number be
@var{Bs} and the block number of the next member be @var{Bn}.
Compute:
@smallexample
@var{N} = @var{Bs} - @var{Bn} - @var{size}/512 - 2
@end smallexample
@noindent
This number gives the size of the extended header part in tar @dfn{blocks}.
In our example, this formula gives: @code{897 - 56 - 425984 / 512 - 2
= 7}.
@item
Use @command{dd} to extract the headers:
@smallexample
@kbd{dd if=@var{archive} of=@var{hname} bs=512 skip=@var{Bs} count=@var{N}}
@end smallexample
@noindent
where @var{archive} is the archive name, @var{hname} is a name of the
file to store the extended header in, @var{Bs} and @var{N} are
computed in previous steps.
In our example, this command will be
@smallexample
$ @kbd{dd if=arc.tar of=xhdr bs=512 skip=56 count=7}
@end smallexample
@end enumerate
Finally, you can expand the condensed file, using the obtained header:
@smallexample
@group
$ @kbd{xsparse -v -x xhdr GNUSparseFile.6058/sparsefile}
Reading extended header file
Found variable GNU.sparse.size = 217481216
Found variable GNU.sparse.numblocks = 208
Found variable GNU.sparse.name = sparsefile
Found variable GNU.sparse.map = 0,2048,1050624,2048,@dots{}
Expanding file `GNUSparseFile.28124/sparsefile' to `sparsefile'
Done
@end group
@end smallexample
@node Compression
@section Using Less Space through Compression
@@ -10432,14 +10785,14 @@ output. Default is 12.
Right margin of the text output. Used for wrapping.
@end deftypevr
@node Genfile
@appendix Genfile
@include genfile.texi
@node Tar Internals
@appendix Tar Internals
@include intern.texi
@node Genfile
@appendix Genfile
@include genfile.texi
@node Free Software Needs Free Documentation
@appendix Free Software Needs Free Documentation
@include freemanuals.texi