精华公布栏

发信人: micheal (平凡的世界), 信区: ECE
标  题: A Painless Guide to CRC [5]
发信站: 大红花的国度 (Fri Jun  2 09:54:40 2000), 转信

10. A Slightly Mangled Table-Driven Implementation
--------------------------------------------------
Despite the terse beauty of the line
   r=0; while (len--) r = ((r << 8) | *p++) ^ t[(r >> 24) & 0xFF];
those optimizing hackers couldn't leave it alone. The trouble, you
see, is that this loop operates upon the AUGMENTED message and in
order to use this code, you have to append W/8 zero bytes to the end
of the message before pointing p at it. Depending on the run-time
environment, this may or may not be a problem; if the block of data
was handed to us by some other code, it could be a BIG problem. One
alternative is simply to append the following line after the above
loop, once for each zero byte:
      for (i=0; i<W/4; i++) r = (r << 8) ^ t[(r >> 24) & 0xFF];
This looks like a sane enough solution to me. However, at the further
expense of clarity (which, you must admit, is already a pretty scare
commodity in this code) we can reorganize this small loop further so
as to avoid the need to either augment the message with zero bytes, or
to explicitly process zero bytes at the end as above. To explain the
optimization, we return to the processing diagram given earlier.
                   3    2    1    0   Bytes
                +----+----+----+----+
         +-----<|    |    |    |    | <----- Augmented message
         |      +----+----+----+----+
         |                ^
         |                |
         |               XOR
         |                |
         |     0+----+----+----+----+       Algorithm
         v      +----+----+----+----+       ---------
         |      +----+----+----+----+       1. Shift the register left by
         |      +----+----+----+----+          one byte, reading in a new
         |      +----+----+----+----+          message byte.
         |      +----+----+----+----+       2. Use the top byte just rotated
         |      +----+----+----+----+          out of the register to index
         +----->+----+----+----+----+          the table of 256 32-bit values.
                +----+----+----+----+       3. XOR the table value into the
                +----+----+----+----+          register.
                +----+----+----+----+       4. Goto 1 iff more augmented
                +----+----+----+----+          message bytes.
             255+----+----+----+----+
Now, note the following facts:
TAIL: The W/4 augmented zero bytes that appear at the end of the
      message will be pushed into the register from the right as all
      the other bytes are, but their values (0) will have no effect
      whatsoever on the register because 1) XORing with zero does not
      change the target byte, and 2) the four bytes are never
      propagated out the left side of the register where their
      zeroness might have some sort of influence. Thus, the sole
      function of the W/4 augmented zero bytes is to drive the
      calculation for another W/4 byte cycles so that the end of the
      REAL data passes all the way through the register.
HEAD: If the initial value of the register is zero, the first four
      iterations of the loop will have the sole effect of shifting in
      the first four bytes of the message from the right. This is
      because the first 32 control bits are all zero and so nothing is
      XORed into the register. Even if the initial value is not zero,
      the first 4 byte iterations of the algorithm will have the sole
      effect of shifting the first 4 bytes of the message into the
      register and then XORing them with some constant value (that is
      a function of the initial value of the register).
These facts, combined with the XOR property
   (A xor B) xor C = A xor (B xor C)
mean that message bytes need not actually travel through the W/4 bytes
of the register. Instead, they can be XORed into the top byte just
before it is used to index the lookup table. This leads to the
following modified version of the algorithm.
         +-----<Message (non augmented)
         |
         v         3    2    1    0   Bytes
         |      +----+----+----+----+
        XOR----<|    |    |    |    |
         |      +----+----+----+----+
         |                ^
         |                |
         |               XOR
         |                |
         |     0+----+----+----+----+       Algorithm
         v      +----+----+----+----+       ---------
         |      +----+----+----+----+       1. Shift the register left by
         |      +----+----+----+----+          one byte, reading in a new
         |      +----+----+----+----+          message byte.
         |      +----+----+----+----+       2. XOR the top byte just rotated
         |      +----+----+----+----+          out of the register with the
         +----->+----+----+----+----+          next message byte to yield an
                +----+----+----+----+          index into the table ([0,255]).
                +----+----+----+----+       3. XOR the table value into the
                +----+----+----+----+          register.
                +----+----+----+----+       4. Goto 1 iff more augmented
             255+----+----+----+----+          message bytes.
Note: The initial register value for this algorithm must be the
initial value of the register for the previous algorithm fed through
the table four times. Note: The table is such that if the previous
algorithm used 0, the new algorithm will too.
This is an IDENTICAL algorithm and will yield IDENTICAL results. The C
code looks something like this:
   r=0; while (len--) r = (r<<8) ^ t[(r >> 24) ^ *p++];
and THIS is the code that you are likely to find inside current
table-driven CRC implementations. Some FF masks might have to be ANDed
in here and there for portability's sake, but basically, the above
loop is IT. We will call this the DIRECT TABLE ALGORITHM.
During the process of trying to understand all this stuff, I managed
to derive the SIMPLE algorithm and the table-driven version derived
from that. However, when I compared my code with the code found in
real-implementations, I was totally bamboozled as to why the bytes
were being XORed in at the wrong end of the register! It took quite a
while before I figured out that theirs and my algorithms were actually
the same. Part of why I am writing this document is that, while the
link between division and my earlier table-driven code is vaguely
apparent, any such link is fairly well erased when you start pumping
bytes in at the "wrong end" of the register. It looks all wrong!
If you've got this far, you not only understand the theory, the
practice, the optimized practice, but you also understand the real
code you are likely to run into. Could get any more complicated? Yes
it can.
11. "Reflected" Table-Driven Implementations
--------------------------------------------
Despite the fact that the above code is probably optimized about as
much as it could be, this did not stop some enterprising individuals
from making things even more complicated. To understand how this
happened, we have to enter the world of hardware.
DEFINITION: A value/register is reflected if it's bits are swapped
around its centre. For example: 0101 is the 4-bit reflection of 1010.
0011 is the reflection of 1100.
0111-0101-1010-1111-0010-0101-1011-1100 is the reflection of
0011-1101-1010-0100-1111-0101-1010-1110.
Turns out that UARTs (those handy little chips that perform serial IO)
are in the habit of transmitting each byte with the least significant
bit (bit 0) first and the most significant bit (bit 7) last (i.e.
reflected). An effect of this convention is that hardware engineers
constructing hardware CRC calculators that operate at the bit level
took to calculating CRCs of bytes streams with each of the bytes
reflected within itself. The bytes are processed in the same order,
but the bits in each byte are swapped; bit 0 is now bit 7, bit 1 is
now bit 6, and so on. Now this wouldn't matter much if this convention
was restricted to hardware land. However it seems that at some stage
some of these CRC values were presented at the software level and
someone had to write some code that would interoperate with the
hardware CRC calculation.
In this situation, a normal sane software engineer would simply
reflect each byte before processing it. However, it would seem that
normal sane software engineers were thin on the ground when this early
ground was being broken, because instead of reflecting the bytes,
whoever was responsible held down the byte and reflected the world,
leading to the following "reflected" algorithm which is identical to
the previous one except that everything is reflected except the input
bytes.
             Message (non augmented) >-----+
                                           |
           Bytes   0    1    2    3        v
                +----+----+----+----+      |
                |    |    |    |    |>----XOR
                +----+----+----+----+      |
                          ^                |
                          |                |
                         XOR               |
                          |                |
                +----+----+----+----+0     |
                +----+----+----+----+      v
                +----+----+----+----+      |
                +----+----+----+----+      |
                +----+----+----+----+      |
                +----+----+----+----+      |
                +----+----+----+----+      |
                +----+----+----+----+<-----+
                +----+----+----+----+
                +----+----+----+----+
                +----+----+----+----+
                +----+----+----+----+
                +----+----+----+----+255
Notes:
   * The table is identical to the one in the previous algorithm
   except that each entry has been reflected.
   * The initial value of the register is the same as in the previous
   algorithm except that it has been reflected.
   * The bytes of the message are processed in the same order as
   before (i.e. the message itself is not reflected).
   * The message bytes themselves don't need to be explicitly
   reflected, because everything else has been!
At the end of execution, the register contains the reflection of the
final CRC value (remainder). Actually, I'm being rather hard on
whoever cooked this up because it seems that hardware implementations
of the CRC algorithm used the reflected checksum value and so
producing a reflected CRC was just right. In fact reflecting the world
was probably a good engineering solution - if a confusing one.
We will call this the REFLECTED algorithm.
Whether or not it made sense at the time, the effect of having
reflected algorithms kicking around the world's FTP sites is that
about half the CRC implementations one runs into are reflected and the
other half not. It's really terribly confusing. In particular, it
would seem to me that the casual reader who runs into a reflected,
table-driven implementation with the bytes "fed in the wrong end"
would have Buckley's chance of ever connecting the code to the concept
of binary mod 2 division.
It couldn't get any more confusing could it? Yes it could.

--
┌───────────────────────────────────┐
│　Ｃ = Ｗ·㏒﹝1﹢ＳＮＲ﹞　　　　　　▁▄▇█▇▄▁　　　　　　　　　│
│　﹋﹋﹋﹋﹋﹋﹋﹋﹋﹋﹋﹋      　　▅███████▅　　　　　　　　│
│　噪声不是我的错　:)　　　　　　▃███████████▃　　　　　　│
│　﹌﹌﹌﹌﹌﹌﹌﹌﹌  　　　▂▇█████████████▇▂　　　　│
│▁▁▁▁▁▁▁▂▂▂▃▃▅▇█████████████████▇▄▃▂│

--
☆ 来源:．哈工大紫丁香 bbs.hit.edu.cn．[FROM: sillystone.bbs@smth.]