| author | B. Watson <urchlay@slackware.uk> | 2025-11-13 05:39:38 -0500 |
|---|---|---|
| committer | B. Watson <urchlay@slackware.uk> | 2025-11-13 05:39:38 -0500 |
| commit | e2da2bffe58a76c091d3496bd3ca2d2f18ea2eb6 | |
| tree | 5195b221457842d781fadcb94331c93058046744 /doc/Arcinfo | |
initial commit
Diffstat (limited to 'doc/Arcinfo')
| -rw-r--r-- | doc/Arcinfo | 124 |
1 file changed, 124 insertions, 0 deletions
diff --git a/doc/Arcinfo b/doc/Arcinfo
new file mode 100644
index 0000000..6c9d500
--- /dev/null
+++ b/doc/Arcinfo
@@ -0,0 +1,124 @@

ARC-FILE.INF, created by Keith Petersen, W8SDZ, 21-Sep-86, extracted
from UNARC.INF by Robert A. Freed.

From: Robert A. Freed
Subject: Technical Information for ARC files
Date: June 24, 1986

Note: In the following discussion, UNARC refers to my CP/M-80 program
for extracting files from MS-DOS ARCs. The definitions of the ARC file
format are based on MS-DOS ARC512.EXE.

ARCHIVE FILE FORMAT
-------------------

Component files are stored sequentially within an archive. Each entry
is preceded by a 29-byte header, which contains the directory
information. There is no wasted space between entries. (This is in
contrast to the centralized directory used by Novosielski libraries.
Although random access to subfiles within an archive can be noticeably
slower than with libraries, archives do have the advantage of not
requiring pre-allocation of directory space.)

Archive entries are normally maintained in sorted name order. The
format of the 29-byte archive header is as follows:

Byte 1: 1A Hex.
  This marks the start of an archive header. If this byte is not found
  when expected, UNARC will scan forward in the file (up to 64K bytes)
  in an attempt to find it (followed by a valid compression version).
  If a valid header is found in this manner, a warning message is
  issued and archive file processing continues. Otherwise, the file is
  assumed to be an invalid archive and processing is aborted. (This is
  compatible with MS-DOS ARC version 5.12.) Note that a special
  exception is made at the beginning of an archive file, to accommodate
  "self-unpacking" archives (see below).

Byte 2: Compression version, as follows:

    0 = end of file marker (remaining bytes not present)
    1 = unpacked (obsolete)
    2 = unpacked
    3 = packed
    4 = squeezed (after packing)
    5 = crunched (obsolete)
    6 = crunched (after packing) (obsolete)
    7 = crunched (after packing, using faster hash algorithm) (obsolete)
    8 = crunched (after packing, using dynamic LZW variations)

Bytes 3-15: ASCII file name, nul-terminated.

(All of the following numeric values are stored low-byte first.)

Bytes 16-19: Compressed file size in bytes.

Bytes 20-21: File date, in 16-bit MS-DOS format:
  Bits 15:9 = year - 1980
  Bits 8:5 = month of year
  Bits 4:0 = day of month
  (All zero means no date.)

Bytes 22-23: File time, in 16-bit MS-DOS format:
  Bits 15:11 = hour (24-hour clock)
  Bits 10:5 = minute
  Bits 4:0 = second/2 (not displayed by UNARC)

Bytes 24-25: Cyclic redundancy check (CRC) value (see below).

Bytes 26-29: Original (uncompressed) file length in bytes.
  (This field is not present for version 1 entries, byte 2 = 1.
  I.e., in this case the header is only 25 bytes long. Because
  version 1 files are uncompressed, the value normally found in
  this field may be obtained from bytes 16-19.)


SELF-UNPACKING ARCHIVES
-----------------------

A "self-unpacking" archive is one which can be renamed to a .COM file
and executed as a program. An example of such a file is the MS-DOS
program ARC512.COM, which is a standard archive file preceded by a
three-byte jump instruction. The first entry in this file is a simple
"bootstrap" program in uncompressed form, which loads the subfile
ARC.EXE (also uncompressed) into memory and passes control to it. In
anticipation of a similar scheme for future distribution of UNARC, the
program permits up to three bytes to precede the first header in an
archive file (with no error message).


CRC COMPUTATION
---------------

Archive files use a 16-bit cyclic redundancy check (CRC) for error
control.
The particular CRC polynomial used is x^16 + x^15 + x^2 + 1,
which is commonly known as "CRC-16" and is used in many data
transmission protocols (e.g. DEC DDCMP and IBM BSC), as well as by
most floppy disk controllers. Note that this differs from the CCITT
polynomial (x^16 + x^12 + x^5 + 1), which is used by the XMODEM-CRC
protocol and the public domain CHEK program (although these do not
adhere strictly to the CCITT standard). The MS-DOS ARC program does
perform a mathematically sound and accurate CRC calculation. (We
mention this because it contrasts with some unfortunately popular
public domain programs we have witnessed, which from time immemorial
have based their calculation on an obscure magazine article which
contained a typographical error!)

Additional note (while we are on the subject of CRCs): The validity
of using a 16-bit CRC for checking an entire file is somewhat
questionable. Many people quote the statistics related to these
functions (e.g. "all two-bit errors, all single burst errors of 16 or
fewer bits, 99.997% of all single 17-bit burst errors, etc."), without
realizing that these claims are valid only if the total number of bits
checked is less than 32767 (which is why they are used in small-packet
data transmission protocols). I.e., for file sizes in excess of about
4K bytes, a 16-bit CRC is not really as good as what is often claimed.
This is not to say that it is bad, but there are more reliable methods
available (e.g. the 32-bit AUTODIN-II polynomial). (End of lecture!)

        Bob Freed
        62 Miller Road
        Newton Centre, MA 02159
        Telephone (617) 332-3533
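The Python sketch below ties together the header layout, the MS-DOS date/time words and the CRC described above. It is not UNARC's or ARC's code. The names `crc16_arc`, `parse_header`, `decode_dos_datetime` and `find_first_header` are hypothetical. The sketch reads headers only and does no unpacking, squeezing or crunching. It also omits the 64K forward rescan described under Byte 1. The leading-byte check for self-unpacking archives is simplified: it just looks for 1A followed by a version 0-8 at offsets 0 through 3.

For the CRC, the sketch assumes bit-serial processing, low bit first, with 0xA001 (the bit-reversed form of x^16 + x^15 + x^2 + 1) and a starting value of 0. These are the parameters usually catalogued as "CRC-16/ARC", whose check value for the ASCII string "123456789" is 0xBB3D.

```python
import struct
from typing import NamedTuple, Optional, Tuple

ARC_MARK = 0x1A   # byte 1 of every entry header
MAX_VERSION = 8   # highest compression version in the table above
MAX_PREFIX = 3    # bytes tolerated before the first header (self-unpacking)


class ArcHeader(NamedTuple):
    version: int          # byte 2
    name: str             # bytes 3-15, nul-terminated ASCII
    compressed_size: int  # bytes 16-19
    date: int             # bytes 20-21, MS-DOS date word
    time: int             # bytes 22-23, MS-DOS time word
    crc: int              # bytes 24-25
    original_size: int    # bytes 26-29 (taken from bytes 16-19 for version 1)
    header_len: int       # 29, or 25 for version 1


def crc16_arc(data: bytes, crc: int = 0) -> int:
    """CRC-16 (x^16 + x^15 + x^2 + 1), low bit first; start value 0 is assumed."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0xA001 is 0x8005 (x^16 term dropped) with its bits reversed.
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def parse_header(buf: bytes, offset: int = 0) -> Optional[ArcHeader]:
    """Decode the entry header at buf[offset]; None means end of archive."""
    if buf[offset] != ARC_MARK:
        raise ValueError("no 1A marker at offset %d" % offset)
    version = buf[offset + 1]
    if version == 0:
        return None  # end-of-file marker: remaining header bytes are absent
    if version == 1:
        # Obsolete 25-byte header without the original-length field.
        name, csize, date, time, crc = struct.unpack_from("<13sIHHH", buf, offset + 2)
        osize, hlen = csize, 25
    else:
        name, csize, date, time, crc, osize = struct.unpack_from(
            "<13sIHHHI", buf, offset + 2)
        hlen = 29
    name = name.split(b"\0", 1)[0].decode("ascii")
    return ArcHeader(version, name, csize, date, time, crc, osize, hlen)


def decode_dos_datetime(date: int, time: int) -> Optional[Tuple[int, ...]]:
    """Split MS-DOS date/time words into (year, month, day, hour, min, sec)."""
    if date == 0:
        return None  # all zero means no date
    return ((date >> 9) + 1980, (date >> 5) & 0x0F, date & 0x1F,
            time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2)


def find_first_header(buf: bytes) -> int:
    """Offset of the first header, allowing up to three leading bytes."""
    for skip in range(MAX_PREFIX + 1):
        if (skip + 1 < len(buf) and buf[skip] == ARC_MARK
                and buf[skip + 1] <= MAX_VERSION):
            return skip
    raise ValueError("not an ARC file")


if __name__ == "__main__":
    data = b"123456789"
    date = ((1986 - 1980) << 9) | (6 << 5) | 24   # 24 June 1986
    time = (12 << 11) | (30 << 5)                 # 12:30:00
    # Version 2 (unpacked): the stored bytes are the original data.
    hdr = struct.pack("<BB13sIHHHI", ARC_MARK, 2, b"TEST.TXT",
                      len(data), date, time, crc16_arc(data), len(data))
    # Fake 3-byte jump, one entry, then the end-of-file marker.
    archive = b"\xe9\x10\x00" + hdr + data + bytes([ARC_MARK, 0])
    pos = find_first_header(archive)
    h = parse_header(archive, pos)
    body = archive[pos + h.header_len: pos + h.header_len + h.compressed_size]
    print(h.name, hex(h.crc), crc16_arc(body) == h.crc,
          decode_dos_datetime(h.date, h.time))
    print(parse_header(archive, pos + h.header_len + h.compressed_size))
```

Run as a script, this should print `TEST.TXT 0xbb3d True (1986, 6, 24, 12, 30, 0)` and then `None`. A table-driven CRC (256 precomputed words) gives the same results faster. The bit-serial loop is used here only because it shows the polynomial directly.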
