1 files changed, 124 insertions, 0 deletions
diff --git a/doc/Arcinfo b/doc/Arcinfo
new file mode 100644
index 0000000..6c9d500
--- /dev/null
+++ b/doc/Arcinfo
@@ -0,0 +1,124 @@
+
+ARC-FILE.INF, created by Keith Petersen, W8SDZ, 21-Sep-86, extracted
+from UNARC.INF by Robert A. Freed.
+
+From:     Robert A. Freed
+Subject:  Technical Information for ARC files
+Date:     June 24, 1986
+
+Note: In the following discussion, UNARC refers to my CP/M-80 program
+for extracting files from MSDOS ARCs.  The definitions of the ARC file
+format are based on MSDOS ARC512.EXE.
+
+ARCHIVE FILE FORMAT
+-------------------
+
+Component files are stored sequentially within an archive.  Each entry
+is preceded by a 29-byte header, which contains the directory
+information.  There is no wasted space between entries.  (This is in
+contrast to the centralized directory used by Novosielski libraries.
+Although random access to subfiles within an archive can be noticeably
+slower than with libraries, archives do have the advantage of not
+requiring pre-allocation of directory space.)
+
+Archive entries are normally maintained in sorted name order.  The
+format of the 29-byte archive header is as follows:
+
+Byte 1:  1A Hex.
+         This marks the start of an archive header.  If this byte is not found 
+         when expected, UNARC will scan forward in the file (up to 64K bytes) 
+         in an attempt to find it (followed by a valid compression version).  
+         If a valid header is found in this manner, a warning message is 
+         issued and archive file processing continues.  Otherwise, the file is 
+         assumed to be an invalid archive and processing is aborted.  (This is
+         compatible with MS-DOS ARC version 5.12.)  Note that a special
+         exception is made at the beginning of an archive file, to accommodate
+         "self-unpacking" archives (see below).
+
+Byte 2:  Compression version, as follows:
+
+         0 = end of file marker (remaining bytes not present)
+         1 = unpacked (obsolete)
+         2 = unpacked
+         3 = packed
+         4 = squeezed (after packing)
+         5 = crunched (obsolete)
+         6 = crunched (after packing) (obsolete)
+         7 = crunched (after packing, using faster hash algorithm) (obsolete)
+         8 = crunched (after packing, using dynamic LZW variations)
+
+Bytes 3-15:  ASCII file name, nul-terminated.
+
+(All of the following numeric values are stored low-byte first.)
+
+Bytes 16-19:  Compressed file size in bytes.
+
+Bytes 20-21:  File date, in 16-bit MS-DOS format:
+              Bits 15:9 = year - 1980
+              Bits  8:5 = month of year
+              Bits  4:0 = day of month
+              (All zero means no date.)
+
+Bytes 22-23:  File time, in 16-bit MS-DOS format:
+              Bits 15:11 = hour (24-hour clock)
+              Bits 10:5  = minute
+              Bits  4:0  = second/2 (not displayed by UNARC)
+
+Bytes 24-25:  Cyclic redundancy check (CRC) value (see below).
+
+Bytes 26-29:  Original (uncompressed) file length in bytes.
+              (This field is not present for version 1 entries, i.e. when
+              byte 2 = 1; in that case the header is only 25 bytes long.
+              Because version 1 files are uncompressed, the value normally
+              found in this field may be obtained from bytes 16-19.)
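
The header layout above can be sketched in Python as follows.  This is a
minimal illustration rather than the code used by UNARC or ARC: the names
ArcHeader and parse_header are hypothetical, and the sketch performs no
resynchronization if the 1A hex marker is missing.

```python
import struct
from typing import NamedTuple, Optional


class ArcHeader(NamedTuple):
    method: int         # byte 2: compression version
    name: str           # bytes 3-15: nul-terminated ASCII file name
    packed_size: int    # bytes 16-19: compressed size
    date: int           # bytes 20-21: raw MS-DOS date word
    time: int           # bytes 22-23: raw MS-DOS time word
    crc: int            # bytes 24-25: CRC value
    original_size: int  # bytes 26-29 (absent for version 1)
    header_length: int  # 25 for version 1, otherwise 29


def parse_header(buf: bytes, offset: int = 0) -> Optional[ArcHeader]:
    """Parse one header; return None at the end-of-file marker (version 0)."""
    if buf[offset] != 0x1A:
        raise ValueError("missing 1A hex header marker")
    method = buf[offset + 1]
    if method == 0:
        return None  # remaining header bytes are not present
    raw_name = buf[offset + 2:offset + 15]
    name = raw_name.split(b"\0", 1)[0].decode("ascii")
    # All numeric fields are stored low-byte first (little-endian).
    packed_size, date, time, crc = struct.unpack_from("<LHHH", buf, offset + 15)
    if method == 1:
        # Version 1 entries omit the length field; the data is uncompressed.
        return ArcHeader(method, name, packed_size, date, time, crc,
                         packed_size, 25)
    (original_size,) = struct.unpack_from("<L", buf, offset + 25)
    return ArcHeader(method, name, packed_size, date, time, crc,
                     original_size, 29)
```

Because entries are stored back to back, the next header would be expected
at offset + header_length + packed_size.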
+
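The bit fields of the date and time words (bytes 20-23) can be unpacked
with shifts and masks.  The following is a sketch only; decode_date and
decode_time are hypothetical helper names, and no range checking is done
on the resulting values.

```python
from typing import Optional, Tuple


def decode_date(value: int) -> Optional[Tuple[int, int, int]]:
    """Return (year, month, day), or None when the word is all zero."""
    if value == 0:
        return None  # all zero means no date
    year = 1980 + ((value >> 9) & 0x7F)  # bits 15:9
    month = (value >> 5) & 0x0F          # bits 8:5
    day = value & 0x1F                   # bits 4:0
    return year, month, day


def decode_time(value: int) -> Tuple[int, int, int]:
    """Return (hour, minute, second); seconds have 2-second resolution."""
    hour = (value >> 11) & 0x1F          # bits 15:11
    minute = (value >> 5) & 0x3F         # bits 10:5
    second = (value & 0x1F) * 2          # bits 4:0 hold second/2
    return hour, minute, second
```

For instance, 24-Jun-86 is stored as (6 << 9) | (6 << 5) | 24, which is
0CD8 hex.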
+
+SELF-UNPACKING ARCHIVES
+-----------------------
+
+A "self-unpacking" archive is one which can be renamed to a .COM file
+and executed as a program.  An example of such a file is the MS-DOS
+program ARC512.COM, which is a standard archive file preceded by a
+three-byte jump instruction.  The first entry in this file is a simple
+"bootstrap" program in uncompressed form, which loads the subfile
+ARC.EXE (also uncompressed) into memory and passes control to it.  In
+anticipation of a similar scheme for future distribution of UNARC, the
+program permits up to three bytes to precede the first header in an
+archive file (with no error message).
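
The allowance for up to three leading bytes might be handled along these
lines.  This is an assumption-laden sketch, not UNARC's actual logic: the
function name first_header_offset is hypothetical, and "valid compression
version" is taken to mean any of the values 0 through 8 listed earlier.

```python
VALID_VERSIONS = range(0, 9)  # compression versions 0..8 from the table


def first_header_offset(data: bytes, max_leading: int = 3) -> int:
    """Return the offset of the first header, allowing a short prefix."""
    for offset in range(max_leading + 1):
        marker = data[offset:offset + 1]
        version = data[offset + 1:offset + 2]
        # A header starts with 1A hex followed by a valid version byte.
        if marker == b"\x1a" and version and version[0] in VALID_VERSIONS:
            return offset
    raise ValueError("no archive header within the permitted leading bytes")
```

An ordinary archive yields offset 0, while one preceded by a three-byte
jump instruction yields offset 3.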
+
+
+CRC COMPUTATION
+---------------
+
+Archive files use a 16-bit cyclic redundancy check (CRC) for error
+control.  The particular CRC polynomial used is x^16 + x^15 + x^2 + 1,
+which is commonly known as "CRC-16" and is used in many data
+transmission protocols (e.g. DEC DDCMP and IBM BSC), as well as by
+most floppy disk controllers.  Note that this differs from the CCITT
+polynomial (x^16 + x^12 + x^5 + 1), which is used by the XMODEM-CRC
+protocol and the public domain CHEK program (although these do not
+adhere strictly to the CCITT standard).  The MS-DOS ARC program does
+perform a mathematically sound and accurate CRC calculation.  (We
+mention this because it contrasts with some unfortunately popular
+public domain programs we have witnessed, which from time immemorial
+have based their calculation on an obscure magazine article which
+contained a typographical error!)
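
A bit-at-a-time sketch of a CRC over the polynomial x^16 + x^15 + x^2 + 1
is shown below.  The text above names only the polynomial; the bit order
(low-order bit first, giving the reflected constant A001 hex) and the zero
initial value are assumptions here, taken from the parameters commonly
catalogued as "CRC-16/ARC".  The function name crc16_arc is hypothetical.

```python
def crc16_arc(data: bytes, crc: int = 0) -> int:
    """Update a 16-bit CRC (polynomial 8005 hex, processed LSB first)."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                # A001 hex is the bit-reversed form of 8005 hex.
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc
```

The running value can be passed back in as the second argument, so a file
may be checked in pieces: crc16_arc(b"6789", crc16_arc(b"12345")) gives
the same result as crc16_arc(b"123456789").  A table-driven version would
be faster but is omitted to keep the sketch short.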
+
+Additional note (while we are on the subject of CRCs): The validity
+of using a 16-bit CRC for checking an entire file is somewhat
+questionable.  Many people quote the statistics related to these
+functions (e.g. "all two-bit errors, all single burst errors of 16 or
+fewer bits, 99.997% of all single 17-bit burst errors, etc."), without
+realizing that these claims are valid only if the total number of bits
+checked is less than 32767 (which is why they are used in small-packet
+data transmission protocols).  I.e., for file sizes in excess of about
+4K bytes, a 16-bit CRC is not really as good as what is often claimed.
+This is not to say that it is bad, but there are more reliable methods
+available (e.g. the 32-bit AUTODIN-II polynomial).  (End of lecture!)
+
+                           Bob Freed
+                           62 Miller Road
+                           Newton Centre, MA  02159
+                           Telephone (617) 332-3533
+
+