From: micheal (平凡的世界), Board: ECE
Subject: A Painless Guide to CRC [1]
Posted at: 大红花的国度 (Fri Jun  2 09:52:27 2000), forwarded

Abstract
--------
This document explains CRCs (Cyclic Redundancy Codes) and their
table-driven implementations in full, precise detail. Much of the
literature on CRCs, and in particular on their table-driven
implementations, is a little obscure (or at least seems so to me).
This document is an attempt to provide a clear and simple no-nonsense
explanation of CRCs and to absolutely nail down every detail of the
operation of their high-speed implementations. In addition to this,
this document presents a parameterized model CRC algorithm called the
"Rocksoft^tm Model CRC Algorithm". The model algorithm can be
parameterized to behave like most of the CRC implementations around,
and so acts as a good reference for describing particular algorithms.
A low-speed implementation of the model CRC algorithm is provided in
the C programming language. Lastly, there is a section giving two forms
of high-speed table-driven implementations, and providing a program
that generates CRC lookup tables.

1. Introduction: Error Detection
--------------------------------
The aim of an error detection technique is to enable the receiver of a
message transmitted through a noisy (error-introducing) channel to
determine whether the message has been corrupted. To do this, the
transmitter constructs a value (called a checksum) that is a function
of the message, and appends it to the message. The receiver can then
use the same function to calculate the checksum of the received
message and compare it with the appended checksum to see if the
message was correctly received. For example, if we chose a checksum
function which was simply the sum of the bytes in the message mod 256
(i.e. modulo 256), then it might go something like this. All numbers
are in decimal.
   Message                    :  6 23  4
   Message with checksum      :  6 23  4 33
   Message after transmission :  6 27  4 33
In the above, the second byte of the message was corrupted from 23 to
27 by the communications channel. However, the receiver can detect
this by comparing the transmitted checksum (33) with the computed
checksum of 37 (6 + 27 + 4). If the checksum itself is corrupted, a
correctly transmitted message might be incorrectly identified as a
corrupted one. However, this is a safe-side failure. A dangerous-side
failure occurs where the message and/or checksum is corrupted in a
manner that results in a transmission that is internally consistent.
Unfortunately, this possibility is completely unavoidable and the best
that can be done is to minimize its probability by increasing the
amount of information in the checksum (e.g. widening the checksum from
one byte to two bytes).
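The exchange above can be sketched in a few lines of Python. This is only
an illustration of the sum-of-bytes checksum described in this section;
the function names (sum_checksum, append_checksum, is_consistent) are
made up for the example and do not come from any library.

```python
def sum_checksum(message: bytes) -> int:
    """Return the sum of the message bytes modulo 256."""
    return sum(message) % 256


def append_checksum(message: bytes) -> bytes:
    """Transmitter side: append the one-byte checksum to the message."""
    return message + bytes([sum_checksum(message)])


def is_consistent(transmission: bytes) -> bool:
    """Receiver side: recompute the checksum and compare it with the last byte."""
    body, received = transmission[:-1], transmission[-1]
    return sum_checksum(body) == received


sent = append_checksum(bytes([6, 23, 4]))   # 6 23 4 33
corrupted = bytes([6, 27, 4, 33])           # second byte damaged in transit
bad_checksum = bytes([6, 23, 4, 34])        # message intact, checksum damaged

print(list(sent), is_consistent(sent))              # [6, 23, 4, 33] True
print(list(corrupted), is_consistent(corrupted))    # [6, 27, 4, 33] False
print(is_consistent(bad_checksum))                  # False (safe-side failure)
```

The last case is the safe-side failure mentioned above: the message
arrived intact, but the damaged checksum causes it to be rejected.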
Other error detection techniques exist that involve performing complex
transformations on the message to inject it with redundant
information. However, this document addresses only CRC algorithms,
which fall into the class of error detection algorithms that leave the
data intact and append a checksum to the end, i.e.:
      <original intact message> <checksum>

2. The Need For Complexity
--------------------------
In the checksum example in the previous section, we saw how a
corrupted message was detected using a checksum algorithm that simply
sums the bytes in the message mod 256:
   Message                    :  6 23  4
   Message with checksum      :  6 23  4 33
   Message after transmission :  6 27  4 33
A problem with this algorithm is that it is too simple. If a number of
random corruptions occur, there is a 1 in 256 chance that they will
not be detected. For example:
   Message                    :  6 23  4
   Message with checksum      :  6 23  4 33
   Message after transmission :  8 20  5 33
To strengthen the checksum, we could change from an 8-bit register to
a 16-bit register (i.e. sum the bytes mod 65536 instead of mod 256) so
as to apparently reduce the probability of failure from 1/256 to
1/65536. While basically a good idea, it fails in this case because
the formula used is not sufficiently "random"; with a simple summing
formula, each incoming byte affects roughly only one byte of the
summing register no matter how wide it is. For example, in the second
example above, the summing register could be a Megabyte wide, and the
error would still go undetected. This problem can only be solved by
replacing the simple summing formula with a more sophisticated formula
that causes each incoming byte to have an effect on the entire
checksum register.
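A quick way to convince yourself of this is to sum the two messages from
the example in registers of different widths. The sketch below is
illustrative only; sum_checksum and its width_bits parameter are names
invented for this example.

```python
def sum_checksum(message: bytes, width_bits: int) -> int:
    """Sum the message bytes in a register width_bits wide (mod 2**width_bits)."""
    return sum(message) % (1 << width_bits)


original = bytes([6, 23, 4])
corrupted = bytes([8, 20, 5])   # every byte changed, but the changes cancel out

# 8-bit, 16-bit, and one-megabyte-wide summing registers.
for width_bits in (8, 16, 8 * 1024 * 1024):
    undetected = (sum_checksum(original, width_bits)
                  == sum_checksum(corrupted, width_bits))
    print(width_bits, undetected)   # undetected is True for every width
```

Both messages sum to 33, so widening the register changes nothing: the
sum never grows large enough to use the extra bits.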
Thus, we see that at least two aspects are required to form a strong
checksum function:
   WIDTH: A register width wide enough to provide a low a-priori
          probability of failure (e.g. 32 bits gives a 1/2^32 chance
          of failure).
   CHAOS: A formula that gives each input byte the potential to change
          any number of bits in the register.
Note: The term "checksum" was presumably used to describe early
summing formulas, but has now taken on a more general meaning
encompassing more sophisticated algorithms such as the CRC ones. The
CRC algorithms to be described satisfy the second condition very well,
and can be configured to operate with a variety of checksum widths.

3. The Basic Idea Behind CRC Algorithms
---------------------------------------
Where might we go in our search for a more complex function than
summing? All sorts of schemes spring to mind. We could construct
tables using the digits of pi, or hash each incoming byte with all the
bytes in the register. We could even keep a large telephone book
on-line, and use each incoming byte combined with the register bytes
to index a new phone number which would be the next register value.
The possibilities are limitless.
However, we do not need to go so far; the next arithmetic step
suffices. While addition is clearly not strong enough to form an
effective checksum, it turns out that division is, so long as the
divisor is about as wide as the checksum register.
The basic idea of CRC algorithms is simply to treat the message as an
enormous binary number, to divide it by another fixed binary number,
and to make the remainder from this division the checksum. Upon
receipt of the message, the receiver can perform the same division and
compare the remainder with the "checksum" (transmitted remainder).
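As a rough sketch of this idea, Python's arbitrary-precision integers let
us treat a whole message as one number directly. The function name
division_checksum and the divisor 9 are assumptions chosen for
illustration, and this is ordinary integer division, not a real CRC. The
two messages are the ones from Section 2 that the summing checksum could
not tell apart.

```python
def division_checksum(message: bytes, divisor: int) -> int:
    """Treat the message as one big-endian binary number; return number % divisor."""
    return int.from_bytes(message, "big") % divisor


DIVISOR = 9  # illustrative value only

original = bytes([6, 23, 4])    # the number 0x061704
corrupted = bytes([8, 20, 5])   # the number 0x081405

print(division_checksum(original, DIVISOR))    # 3
print(division_checksum(corrupted, DIVISOR))   # 6
```

In this instance the remainders differ, so the corruption that slipped
past the summing checksum is caught. A divisor this small would still
miss many other corruptions.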
Example: Suppose that the message consisted of the two bytes (6,23) as
in the previous example. These can be considered to be the hexadecimal
number 0617 which can be considered to be the binary number
0000-0110-0001-0111. Suppose that we use a checksum register one byte
wide and a constant divisor of 1001; the checksum is then the
remainder after 0000-0110-0001-0111 is divided by 1001. While in this
case the calculation could obviously be performed using common
garden-variety 32-bit registers, in the general case this is messy. So
instead, we'll do the division using good ol' long division, which you
learnt in school (remember?). Except this time, it's in binary:
          ...0000010101101 = 00AD =  173 = QUOTIENT
         ____-___-___-___-
9= 1001 ) 0000011000010111 = 0617 = 1559 = DIVIDEND
DIVISOR   0000.,,....,.,,,
          ----.,,....,.,,,
           0000,,....,.,,,
           0000,,....,.,,,
           ----,,....,.,,,
            0001,....,.,,,
            0000,....,.,,,
            ----,....,.,,,
             0011....,.,,,
             0000....,.,,,
             ----....,.,,,
              0110...,.,,,
              0000...,.,,,
              ----...,.,,,
               1100..,.,,,
               1001..,.,,,
               ====..,.,,,
                0110.,.,,,
                0000.,.,,,
                ----.,.,,,
                 1100,.,,,
                 1001,.,,,
                 ====,.,,,
                  0111.,,,
                  0000.,,,
                  ----.,,,
                   1110,,,
                   1001,,,
                   ====,,,
                    1011,,
                    1001,,
                    ====,,
                     0101,
                     0000,
                     ----
                      1011
                      1001
                      ====
                      0010 = 02 = 2 = REMAINDER
In decimal this is "1559 divided by 9 is 173 with a remainder of 2".
Although the effect of each bit of the input message on the quotient
is not all that significant, the 4-bit remainder gets kicked about
quite a lot during the calculation, and if more bytes were added to
the message (dividend) its value could change radically again very
quickly. This is why division works where addition doesn't.
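The long division worked through by hand can be sketched as a loop that
brings down one dividend bit per step. This is a minimal illustration
using ordinary binary subtraction, not a CRC implementation; the function
name long_division is invented for the example. Note that the working
remainder never needs more than one bit beyond the divisor's width, no
matter how long the message is.

```python
from typing import Tuple


def long_division(dividend_bits: str, divisor: int) -> Tuple[int, int]:
    """Binary long division, processing one dividend bit per step.

    Returns (quotient, remainder).
    """
    quotient = 0
    remainder = 0
    for bit in dividend_bits:
        remainder = (remainder << 1) | int(bit)   # bring down the next bit
        quotient <<= 1
        if remainder >= divisor:                  # the divisor "goes into" it
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


q, r = long_division("0000011000010111", 0b1001)
print(q, r)                   # 173 2
print(divmod(0x0617, 9))      # (173, 2), the same result from built-in division
```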
In case you're wondering, using this 4-bit checksum the transmitted
message would look like this (in hexadecimal): 06172 (where the 0617
is the message and the 2 is the checksum). The receiver would divide
0617 by 9 and see whether the remainder was 2.
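A small sketch of both ends of this exchange follows. The names transmit,
receive_ok and CHECK_BITS are assumptions made for the example; the
message is held as a plain integer, and the checksum is packed into the
four low-order bits (one extra hexadecimal digit).

```python
DIVISOR = 0b1001   # 9, as in the worked example
CHECK_BITS = 4     # a remainder in the range 0..8 fits in four bits


def transmit(message: int) -> int:
    """Append the 4-bit remainder to the message as one extra hex digit."""
    return (message << CHECK_BITS) | (message % DIVISOR)


def receive_ok(transmission: int) -> bool:
    """Split off the checksum, redo the division, and compare remainders."""
    message = transmission >> CHECK_BITS
    checksum = transmission & ((1 << CHECK_BITS) - 1)
    return message % DIVISOR == checksum


sent = transmit(0x0617)
print(f"{sent:05X}")          # 06172
print(receive_ok(sent))       # True
print(receive_ok(0x06572))    # False: 0x0657 leaves remainder 3, not 2
```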

※ Modified: violence edited this article on May 20 00:32:16 [FROM: 162.105.160.170]

--
☆ Source: 哈工大紫丁香 bbs.hit.edu.cn [FROM: sillystone.bbs@smth.]