aboutsummaryrefslogtreecommitdiff
path: root/doc/Arcinfo
diff options
context:
space:
mode:
Diffstat (limited to 'doc/Arcinfo')
-rw-r--r--doc/Arcinfo124
1 files changed, 124 insertions, 0 deletions
diff --git a/doc/Arcinfo b/doc/Arcinfo
new file mode 100644
index 0000000..6c9d500
--- /dev/null
+++ b/doc/Arcinfo
@@ -0,0 +1,124 @@
+
+ARC-FILE.INF, created by Keith Petersen, W8SDZ, 21-Sep-86, extracted
+from UNARC.INF by Robert A. Freed.
+
+From: Robert A. Freed
+Subject: Technical Information for ARC files
+Date: June 24, 1986
+
+Note: In the following discussion, UNARC refers to my CP/M-80 program
+for extracting files from MSDOS ARCs. The definitions of the ARC file
+format are based on MSDOS ARC512.EXE.
+
+ARCHIVE FILE FORMAT
+-------------------
+
+Component files are stored sequentially within an archive. Each entry
+is preceded by a 29-byte header, which contains the directory
+information. There is no wasted space between entries. (This is in
+contrast to the centralized directory used by Novosielski libraries.
+Although random access to subfiles within an archive can be noticeably
+slower than with libraries, archives do have the advantage of not
+requiring pre-allocation of directory space.)
+
+Archive entries are normally maintained in sorted name order. The
+format of the 29-byte archive header is as follows:
+
+Byte 1: 1A Hex.
+ This marks the start of an archive header. If this byte is not found
+ when expected, UNARC will scan forward in the file (up to 64K bytes)
+ in an attempt to find it (followed by a valid compression version).
+ If a valid header is found in this manner, a warning message is
+ issued and archive file processing continues. Otherwise, the file is
+ assumed to be an invalid archive and processing is aborted. (This is
+ compatible with MS-DOS ARC version 5.12). Note that a special
+ exception is made at the beginning of an archive file, to accomodate
+ "self-unpacking" archives (see below).
+
+Byte 2: Compression version, as follows:
+
+ 0 = end of file marker (remaining bytes not present)
+ 1 = unpacked (obsolete)
+ 2 = unpacked
+ 3 = packed
+ 4 = squeezed (after packing)
+ 5 = crunched (obsolete)
+ 6 = crunched (after packing) (obsolete)
+ 7 = crunched (after packing, using faster hash algorithm) (obsolete)
+ 8 = crunched (after packing, using dynamic LZW variations)
+
+Bytes 3-15: ASCII file name, nul-terminated.
+
+(All of the following numeric values are stored low-byte first.)
+
+Bytes 16-19: Compressed file size in bytes.
+
+Bytes 20-21: File date, in 16-bit MS-DOS format:
+ Bits 15:9 = year - 1980
+ Bits 8:5 = month of year
+ Bits 4:0 = day of month
+ (All zero means no date.)
+
+Bytes 22-23: File time, in 16-bit MS-DOS format:
+ Bits 15:11 = hour (24-hour clock)
+ Bits 10:5 = minute
+ Bits 4:0 = second/2 (not displayed by UNARC)
+
+Bytes 24-25: Cyclic redundancy check (CRC) value (see below).
+
+Bytes 26-29: Original (uncompressed) file length in bytes.
+ (This field is not present for version 1 entries, byte 2 = 1.
+ I.e., in this case the header is only 25 bytes long. Because
+ version 1 files are uncompressed, the value normally found in
+ this field may be obtained from bytes 16-19.)
+
+
+SELF-UNPACKING ARCHIVES
+-----------------------
+
+A "self-unpacking" archive is one which can be renamed to a .COM file
+and executed as a program. An example of such a file is the MS-DOS
+program ARC512.COM, which is a standard archive file preceded by a
+three-byte jump instruction. The first entry in this file is a simple
+"bootstrap" program in uncompressed form, which loads the subfile
+ARC.EXE (also uncompressed) into memory and passes control to it. In
+anticipation of a similar scheme for future distribution of UNARC, the
+program permits up to three bytes to precede the first header in an
+archive file (with no error message).
+
+
+CRC COMPUTATION
+---------------
+
+Archive files use a 16-bit cyclic redundancy check (CRC) for error
+control. The particular CRC polynomial used is x^16 + x^15 + x^2 + 1,
+which is commonly known as "CRC-16" and is used in many data
+transmission protocols (e.g. DEC DDCMP and IBM BSC), as well as by
+most floppy disk controllers. Note that this differs from the CCITT
+polynomial (x^16 + x^12 + x^5 + 1), which is used by the XMODEM-CRC
+protocol and the public domain CHEK program (although these do not
+adhere strictly to the CCITT standard). The MS-DOS ARC program does
+perform a mathematically sound and accurate CRC calculation. (We
+mention this because it contrasts with some unfortunately popular
+public domain programs we have witnessed, which from time immemorial
+have based their calculation on an obscure magazine article which
+contained a typographical error!)
+
+Additional note (while we are on the subject of CRC's): The validity
+of using a 16-bit CRC for checking an entire file is somewhat
+questionable. Many people quote the statistics related to these
+functions (e.g. "all two-bit errors, all single burst errors of 16 or
+fewer bits, 99.997% of all single 17-bit burst errors, etc."), without
+realizing that these claims are valid only if the total number of bits
+checked is less than 32767 (which is why they are used in small-packet
+data transmission protocols). I.e., for file sizes in excess of about
+4K bytes, a 16-bit CRC is not really as good as what is often claimed.
+This is not to say that it is bad, but there are more reliable methods
+available (e.g. the 32-bit AUTODIN-II polynomial). (End of lecture!)
+
+ Bob Freed
+ 62 Miller Road
+ Newton Centre, MA 02159
+ Telephone (617) 332-3533
+
+