Digest Bulletin Board

From: Ambi_A@bbs.ustc.edu.cn (阿翔), Board: cnlinux
Subject: Welcome to linux-kernel (fwd, 3)
Posted at: USTC BBS (Fri Apr 24 19:04:02 1998)
Relayed via: Lilac!ustcnews!ustcbbs

= 3. = The kernel

= 3.1. = Common problems

= 3.1.1. = Before you dive into it

    Read up on all strategies for error recovery.  Your file system
    corruption might be caused by the same problem that causes another
    user's Signal 11 trouble; a kernel is a complex piece of software,
    and errors happening in the kernel or in hardware below might
    cause every thinkable and unthinkable kind of problem.  Errors
    propagate in unforeseeable ways. There are hardware problems which
    show up only on Linux while Win95, NT, OS/2, Doom and Quake run
    fine on the same computer.  You are dealing with software handling
    hardware; this is extremely complex and right next to black magic.
    Flipping one bit of memory in the kernel creates strange results,
    just like adding a drop of tabasco sauce to a witch's cauldron
    might make her conjure up a horde of croaking frogs instead of the
    (by her) desired white prince.

= 3.1.2. = File system corruption

    "On a block device (block size 1024 bytes) with ext2fs, I don't
    reliably get back from a file what I first wrote in."

    Normally any kind of file system corruption is a sign of hardware
    problems or of problems with a low-level I/O driver.  Ext2fs is a
    well-tested and stable file system, but because of its high
    performance it is likely to expose problems in the lower levels
    of the system.

    When I experienced ext2fs file system corruption myself, a query
    on the linux newsgroups showed others had experienced similar
    problems.  It turned out to be a firmware problem in the Conner
    CFP1060S hard drive: when data was read _really fast_, the buffer
    cache algorithm in the drive firmware failed.

    There are a lot of things to try when you get ext2fs corruption:

    - Use tune2fs to set your file system to drop into read-only mode
      when an error occurs.  This will prevent small errors from
      causing catastrophes. The command is:

      tune2fs -e remount-ro /dev/???

      You should also use tune2fs (its -i option) to set a time
      interval between file system checks, because (especially on
      long-running servers) it can take eons to reach the maximum
      mount count.

    - Check the partition tables

      Especially on the Intel x86 platform, partition tables can
      easily be broken.  Such a problem can occur when the drives were
      partitioned using a different disk controller from the one they
      are used with.  Use fdisk to dump the tables (fdisk -l lists
      them) and ensure the partitions are set up correctly.

    - Tune Linux and your BIOS to slow and safe parameters.  Turn off
      bus (PCI) optimizations.

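    - Use an empty partition to check if the problem lurks in the
      kernel levels below the file system.  Probably the simplest
      test is to copy /dev/zero to the partition (using dd) and then
      compare the partition with /dev/zero (using cmp) to see if
      there are any differences.  Because /dev/zero never ends, a
      clean run makes cmp report EOF on the partition; any "differ"
      message means the data read back is not what was written.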

      A very thorough test has been suggested on the mailing list by
      Doug Ledford <URL:mailto:dledfor@dialnet.net>:

        "I'll go one step further with this.  I would recommend that
        the people having problems with ext2fs corruption run the
        following test (if possible):

        Let's say you have a hard drive partition of decent size that
        you don't mind losing the data on (or even if you do mind,
        this test can turn up a lot of errors so if you have an
        inconvenient way of getting back, then you should probably do
        this anyway)

        First, get the exact size of the partition (or the whole drive
        as the case may be in some circumstances) in 1K blocks.

        Divide this total number of blocks into 4 equal chunks (most
        drives do this easily, some may have a few odd sized chunks).

        Write a script like this:

        badblocks -w -s -b 1024 -o /tmp/list.1 /dev/??? (blocks * .25) 0 &
        badblocks -w -s -b 1024 -o /tmp/list.2 /dev/??? (blocks * .5) (blocks * .25) &
        badblocks -w -s -b 1024 -o /tmp/list.3 /dev/??? (blocks * .75) (blocks * .5) &
        badblocks -w -s -b 1024 -o /tmp/list.4 /dev/??? (blocks) (blocks * .75) &

        A simple shell script like this will run four simultaneous
        badblocks programs on the drive.  A person can then check the
        files in the /tmp directory to see if any were returned as
        bad.  With modern IDE or SCSI drives, all of these files
        should have a zero length unless one of two things is true.
        One, you have a drive developing too many bad sectors to be
        mapped out (which is cause for alarm in itself) or two, you
        have corruption in your low level driver (or other low level
        hardware such as memory or cache or bus transfer problems).
        If these tests return all zero-length files, then we should start
        looking elsewhere for the problem.  Run the test several
        times, as a single pass may not show the problem.  If you are
        really courageous, you can try doubling the tests by splitting
        the drive into 8 equal chunks (or if you have two drives you
        can do both drives at four chunks each at the same time).
        This is a standard test I use with the aic7xxx driver to find
        problems with tagged queueing and high commands per lun
        values.  It seems to show problems much quicker than any
        file system activity would (in my case, I had as many as 24 of
        these running simultaneously on 6 drives in order to test this
        out, talk about a dog slow machine, it took about 5 minutes
        just to start X windows under this load).

        In any case, running tests like these to rule out hardware
        corruption would help greatly in increasing the level of
        confidence that somehow the ext2fs layer is at fault (which I
        personally don't think it is except under very rare occasions
        since I have a hard hit news server running that file system
        without problems, but I've taken the care and gone to the
        lengths to run these tests on the particular hardware in that
        machine and identified bad combinations that can cause
        problems and worked around them at the driver level)."

      Later on Doug followed up to another article on the ext2fs
      corruption thread:

        "Correct.  And it's very useful information to have at that.
        If you can produce corruption problems without going through
        the ext2fs code, then you have hardware corruption of some
        sort.  An example of some of the things in the past that I
        have personally seen cause hardware corruption which made one
        *THINK* that something was wrong with the ext2fs code when
        there wasn't:

        1. Bad CPU fans on pentium and high speed 486 machines
        2. Bad SCSI cables
        3. Memory timing settings in BIOS being just a tad too
           aggressive
        4. Bad memory
        5. Bad Pipeline Burst (or other) cache
        6. Too long of a SCSI or IDE cable
        7. Interference between SCSI and IDE cables running in
           close proximity to each other
        8. Flaky CPU (had been overclocked and partially burnt out)
        9. Esoteric BIOS options being enabled when they shouldn't be
           (this takes some experimentation to find and fix: a "change
           BIOS settings, test to see if the problem is gone, if not,
           reboot and change settings again" type of thing)

        These are a few examples.  A second thing to keep in mind is
        that the ext2fs is a rather fast filesystem by unix standards
        (it beats the hell out of the EAFS HTFS DTFS etc filesystems
        from SCO, but who's comparing SCO to linux anyway :) so if you
        have hardware corruption problems that don't show up except
        under heavy load, ext2fs is a good filesystem to bring those
        out :)

        And of course, the very reason I posted my original email as
        part of this thread.  A person needs to always keep in mind
        that if they are getting ext2fs errors about corruption, this
        does *NOT* always mean the ext2fs is at fault.  It means that
        somewhere along the way, either due to code in the ext2fs, or
        code in the block driver you are using, or code in the low
        level driver you are using, or somewhere between the CPU, RAM,
        cache, bus, controller, drive bus, drive, and magnetic media,
        something is getting corrupted.  It is important in these
        cases to try and isolate software faults from hardware faults.
        The purpose of the "script" I posted was to give a convenient
        way of trying to narrow down the line between hardware and
        software.  There is still software involved with that script,
        but not as much.  You are down to just the badblocks program,
        the various buffer mechanisms, and the block driver itself
        (with its underlying low level driver).  Generally speaking,
        the buffer cache is considered to be safe code, so you can
        rule that out.  Most of the block drivers are considered to be
        the same, so they can be ruled out.  This leaves the
        underlying low level driver and the badblocks program as
        suspect.  The badblocks program is rather simple in design,
        and an inspection of the source will result in the conclusion
        that it too can be ruled out (not to mention how many times
        it's been used to find these problems, yet I've never once
        heard of it causing sectors that are fine to be mapped as bad
        unless the underlying driver had problems).  That means that
        the script I posted is really stressing hardware and your
        underlying low level driver.  All in all, that greatly reduces
        the number of variables to look at.  So, a failure during the
        testing by the badblocks program gives a person somewhere to
        look.  They can either fiddle with compile options for their
        low level driver, or they can start the process of trying to
        enable/disable things in the computer's BIOS to try and find a
        culprit (disable cache this run, delay memory timings that
        run, etc) which then allows a person to try and pinpoint the
        exact problem, get it fixed, and be on their way :) Further,
        as long as you fail this test, there is no sense at all in
        even looking at the ext2fs code since you won't know if you've
        fixed anything by changing it unless something you did just
        happened to slow things down enough to keep the problem from
        showing up.  In this case, instead of slowing the machine down
        to be reliable and leaving fast code in place, you've slowed
        the code down so it doesn't break your faulty hardware."

    - Use debugging tools to check your system.

      Memtest-86, a thorough, standalone memory test for 386, 486 and
      586 systems:

      <URL:ftp://sunsite.unc.edu/pub/Linux/system/misc/memtest86-1.2.tar.gz>

      If you think a particular tool should be listed, please send
      mail to <URL:mailto:kernelfaq@iconsult.com>
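    Both empty-partition tests described above can be scripted. What
    follows is a minimal POSIX shell sketch, not Doug's script: the
    function names zero_test and badblocks_cmds are hypothetical, and
    zero_test swaps cmp for a count of non-zero bytes (tr deletes the
    zeros, wc counts what is left) so that it checks an exact number of
    blocks. badblocks_cmds turns the "(blocks * .25)" placeholders into
    real block numbers and prints the commands instead of running them,
    so you can inspect them first. Both tests destroy every byte they
    touch.

```sh
#!/bin/sh
# Hypothetical helpers for the empty-partition tests (not from the FAQ).
# WARNING: both overwrite every block they touch on the target.

# zero_test TARGET BLOCKS
# Write BLOCKS 1K blocks of zeros to TARGET, read the same blocks back and
# print how many bytes did not come back as zero. Returns 0 only if none.
zero_test() {
    target=$1 blocks=$2
    dd if=/dev/zero of="$target" bs=1024 count="$blocks" conv=notrunc \
        2>/dev/null || return 2
    sync
    # With lots of RAM the read-back may come from the page cache rather
    # than the disk; dropping clean caches (root only) forces a real read.
    (echo 3 > /proc/sys/vm/drop_caches) 2>/dev/null || :
    # tr deletes every zero byte, so wc counts only the corrupted bytes.
    bad=$(dd if="$target" bs=1024 count="$blocks" 2>/dev/null |
        tr -d '\000' | wc -c)
    bad=$(( $bad + 0 ))     # drop the padding some wc versions print
    echo "$bad non-zero bytes in $blocks blocks"
    [ "$bad" -eq 0 ]
}

# badblocks_cmds DEVICE BLOCKS [CHUNKS]
# Print one "badblocks -w" command per chunk (default 4) with the block
# ranges filled in, followed by "wait", ready to save and run as a script.
badblocks_cmds() {
    dev=$1 blocks=$2 chunks=${3:-4}
    i=0
    while [ "$i" -lt "$chunks" ]; do
        first=$(( blocks * i / chunks ))
        # badblocks takes "last_block first_block"; last_block is treated
        # as inclusive, so each chunk stops one block before the next.
        last=$(( blocks * (i + 1) / chunks - 1 ))
        echo "badblocks -w -s -b 1024 -o /tmp/list.$((i + 1)) $dev $last $first &"
        i=$((i + 1))
    done
    echo wait
}
```

    On a current Linux system, 'blockdev --getsize64 /dev/???' prints
    the partition size in bytes; divide by 1024 to get BLOCKS. Then
    'badblocks_cmds /dev/??? BLOCKS > /tmp/bbtest.sh' writes the four
    commands plus a final wait to a file you can review and run with
    'sh /tmp/bbtest.sh'; pass 8 as a third argument for the eight-chunk
    variant.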

= 3.1.3. = Signal 11

    If your processes frequently die because of a signal 11, there
    might be a problem with your hardware or software.  There's a FAQ
    regarding signal 11 at <URL:http://www.bitwizard.nl/sig11>.

    You should read the Signal 11 FAQ even if you have a different
    problem; the procedures mentioned in the FAQ will probably help
    you find that one, too.

= 3.1.4. = Seasonal Problems

= 3.1.4.1. = Warning: possible SYN flooding. Sending cookies.

    > I got 44 of these 2 days ago, then another 35 more.
    >
    >    " Warning: possible SYN flooding. Sending cookies."

    This need not be an attack.  It _does_ mean that your backlog has
    become full.  This can be a consequence of crummy network
    connections between you and legitimate remote sites.  If you
    normally don't see 67 connection attempts per second then it's
    probably an attack.

    > My interpretation is that somebody has flooded the irc port to
    > kill the server, am I right?  What are the chances that this is
    > not an attack, but just "one of those things?"

    It very much depends on how busy your irc port is and how bad
    network conditions are between you and the users of your irc.  To
    really find out if you are being attacked you would need to start
    taking TCP dumps and look for streams of SYN packets with
    addresses that are unreachable.  Large numbers of packets from the
    same unreachable address would be a giveaway.

    Answered by Eric Schenk <URL:mailto:Eric.Schenk@dna.lth.sh>
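    Counting SYNs per source address is easy to automate once you have
    a capture. The sketch below assumes the text layout printed by a
    current 'tcpdump -n', where the source appears as ADDRESS.PORT right
    after the IP (or IP6) keyword and a bare SYN is shown as
    "Flags [S],"; older tcpdump versions print flags differently, so
    that pattern is an assumption to adjust. The function name
    syn_sources is hypothetical.

```sh
# syn_sources: read "tcpdump -n" text on stdin and print "COUNT ADDRESS"
# for every source that sent bare SYNs (Flags [S], i.e. no ACK),
# busiest source first. Assumes the line layout of a current tcpdump.
syn_sources() {
    awk '
        /Flags \[S\],/ {
            src = ""
            # The source "ADDRESS.PORT" is the field right after IP / IP6.
            for (i = 1; i < NF; i++)
                if ($i == "IP" || $i == "IP6") { src = $(i + 1); break }
            if (src == "") next
            sub(/\.[0-9]+$/, "", src)     # strip the trailing .PORT
            count[src]++
        }
        END { for (a in count) print count[a], a }
    ' | sort -k1,1nr -k2,2
}
```

    A capture to feed it might be taken with
    'tcpdump -n -c 2000 "tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn and dst port 6667" | syn_sources',
    where 6667 is assumed to be your irc port. One or two addresses
    with very large counts, especially unreachable ones, suggest an
    attack; a long tail of ordinary clients points to slow networks.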

= 3.1.4.2. = Kernel hangs / no output after "Now booting the kernel ..."

    > The kernel is loaded, uncompressed and it hangs after the
    > message "Now booting the kernel...".

    Are you sure you have virtual terminal (VT) support enabled?  It
    became optional in 2.1.31 and is disabled by default.
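    If you build your own kernels, the quickest check is the .config
    file of the kernel you booted. The sketch below assumes the option
    names CONFIG_VT (virtual terminal) and CONFIG_VT_CONSOLE (console on
    virtual terminal) as they appear in later kernel configurations;
    check the names in your own tree. The helper name check_vt_config is
    hypothetical. A kernel that should print its boot messages on the
    screen needs both set to y.

```sh
# check_vt_config [CONFIG_FILE]
# Report whether virtual terminal support and the VT console are enabled
# in a kernel .config file (default: ./.config). Returns 1 if either is off.
check_vt_config() {
    cfg=${1:-.config}
    ok=0
    for opt in CONFIG_VT CONFIG_VT_CONSOLE; do
        # Anchored match, so CONFIG_VT does not also match CONFIG_VT_CONSOLE.
        if grep -q "^$opt=y\$" "$cfg"; then
            echo "$opt=y"
        else
            echo "$opt is not set to y"
            ok=1
        fi
    done
    return "$ok"
}
```

    Run it from the top of the kernel source tree (for example
    /usr/src/linux) after configuring; if either option reports "not
    set", rerun make config (or make menuconfig) and enable the virtual
    terminal options, typically listed under Character devices.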

= 3.1.4.3. = Ignoring P6 Local APIC Spurious Interrupt Bug

    > Is there a problem with the P6? Or with the board?  If it's a
    > problem with the P6, do all P6's have this problem?  Does this
    > bug affect the system in any way?

    It's a problem with the Local APIC on most steppings of the
    Pentium Pro CPU.  Specifically, a spurious interrupt is delivered
    as an exception 15 (a reserved code) rather than as interrupt 15.
    The bug is benign, so long as the kernel ignores the exception 15.

    > Is it a problem to comment out the line in the kernel that shows
    > this message? It's messing up the display.

    You can safely comment it out.

    Answered by Leonard N. Zubkoff <URL:mailto:lnz@dandelion.com>

= 3.2. = How to get started on kernel development

    Cameron MacKinnon <URL:mailto:mackin@interlog.com> wrote a wonderful
    article on that topic:

      "... I'm not a pro, but I generally know what's going on for at
      least part of the time. Here's what I did:

      I bought books. Here are reviews: LINUX Kernel Internals, Beck et
      al, Addison Wesley, 0-201-87741-4. I read about a third of
      it. It's dated (1.2 kernels) and doesn't have anything about
      SCSI in it, but it's the only Linux kernel book out
      there. There's a new version out for 2.0 kernels, but only in
      the original German. 'The Design and Implementation of the 4.4
      BSD Operating System', McKusick et al, Addison Wesley,
      0-201-54979-4. A much more readable book, IMHO. It talks about
      the BSD design in general, why things changed over time, why and
      how specific performance tradeoffs were made, etcetera. Also,
      'The Magic Garden Explained' or something like that, borrowed,
      pub. and ISBN unknown. This book is a very thorough coverage of
      the design of System 5 Release 4 (SVR4), but not as easy to read
      as the BSD book. Bottom line: Beg, borrow, check out or steal
      one book, any book, on the design of the UNIX operating
      system. Sit in a library or a bookstore reading it, if you
      haven't got the money. You need to understand how schedulers,
      pagers, swappers, top and bottom halves, wait queues, inodes,
      ttys, the boot process, init and some other stuff work. Most of
      this stuff will be applicable to Linux at the concept level,
      regardless of the book (ignore anything on SysV STREAMS). Unless
      you're extremely gifted, the concepts won't reveal themselves to
      you from kernel source code. LEARN THE CONCEPTS. The Linux
      community is not a good place to do this - this list assumes
      that if you're here, you already know them. If you're one of
      those truly unlucky people with no access to such a book, try to
      find this info on the net. I've never really looked. If all else
      fails, proceed to step two:

      I read Michael Johnson's Kernel Hackers' Guide. It wasn't
      perfect when I read it, but that was a while ago. 1) It's
      probably perfect by now. 2) It's free. You can get it anywhere,
      including here: <URL:http://www.redhat.com:8080/HyperNews/get/khg.html>
      It does a good job of mapping the concepts you just learned to
      actual kernel function calls and processes in Linux. Also, many
      kernel functions have man pages, though they're horribly out of
      date.

      I subscribed to mailing lists. Initially I was all over: gcc,
      kernel, a few scsi lists, security... Now I've got it down to a
      core of kernel, two SCSI driver lists, DIALD, security and
      SMP. Don't be afraid to subscribe to a lot of lists (read-only!)
      for a few weeks to see what interests you. You can always
      unsubscribe later. Some people prefer reading the lists via
      news, but I'd recommend mail: You SAVE the mail on your hard
      disk. It becomes your personal reference library (N.B. UNIX has
      some really great text search and processing tools). You read
      all the mail. This gives you a feel for what's being worked on
      and what's not, who knows what they're talking about and who
      doesn't, and what snags are troubling other users. This is
      important so you can ask senior developers PRIVATELY when you
      have questions relating to The Code - unless you genuinely
      believe that a lot of list subscribers also want the
      answer. Also, some of the news gateways appear to be brutally
      broken, randomly mixing messages from different linux lists like
      a cypherpunk remailer gone mad. I recommend going straight to
      the source: send 'help' to <URL:mailto:majordomo@vger.rutgers.edu>

      I quickly got over the idea that I could learn everything about
      the kernel. Last time I looked, it was over 600,000 lines of
      source. I can muck around with SCSI and network device drivers,
      I understand the mid level SCSI code, and I've got a reasonably
      good handle on the scheduler.  That leaves high level
      networking, filesystems, the buffer cache and memory management,
      to name a few, ABOUT WHICH I HAVEN'T A CLUE. Pick an area you
      want to diddle with, and concentrate on that. If you don't
      believe me, grab a dictionary and look up 'hubris'.

      I read most (some?) of the important stuff in Documentation/
      (you should read it all) and then: I dove into the code,
      wholeheartedly, for nights (days?) at a time. Pick
      drivers. Concentrate on the simple ones - you want concepts, not
      nasty workarounds for buggy hardware. Try 'wc *.c|sort' in your
      favorite directory. Pick ones that look well formatted and well
      commented, and see how they're written and how they interact
      with the higher level stuff. Go into each subdirectory in the
      whole linux/ tree, and learn what lives there. You should be
      able to identify what's what from the stuff you read in those
      books. Note especially mm/ and kernel/, along with their
      counterparts under arch/.  Here lie most of the important
      functions for juggling memory, interrupts, processes
      etcetera. Learn to use grep, find and xargs effectively. If you
      have a strong constitution, look in the scripts/ directory and
      the Makefiles everywhere to see how the kernel actually gets
      built. If you're a bit twiddler at heart, look at the low level
      stuff for your favorite architecture under arch/.

      If you've still got the lust for knowledge at this point, you
      will probably have found 'that special something' that interests
      you in the kernel. You will know generally how things work from
      the source, and you will know the right people to ask from the
      source and the mailing lists.  If you have a question, go ahead
      and ask it. I've found developers to be very helpful when asked
      questions by someone who's obviously studied the sources. Play
      around. Recompile. Benchmark. Test.

      One thing that's probably overlooked by a lot of Linux people:
      BSD, 'the other free UNIX'. I can't even tell you the difference
      between FreeBSD and NetBSD, but for my purposes, I don't
      care. They're available free on the net or on CD, just like Linux
      (see <URL:http://ftp.freebsd.org> and <URL:http://www.freebsd.org>).
      If you're stumped by something in Linux, seeing how BSD does it
      is often helpful, especially for device drivers. Also (ahem) BSD
      code sometimes seems to be commented and formatted somewhat
      better. I don't run it, I just look at the source.

      At this stage your hats will no longer fit, and your dog will
      have run off with your girlfriend. No matter, because you'll be
      able to ask, and sometimes answer, intelligent questions about
      kernel design, in your particular specialty areas. You'll be
      fixing insidious bugs, improving performance, and posting things
      like 'this patch is from memory and untested, but it will solve
      your problem on 2.1.87: [proper patch syntax]'

      I'm not at this stage yet, and I've been working at it for a
      while.  That's why I usually post answers to questions like
      'where do I begin' rather than 'why did it hang'. The above is
      working for me, it might work for you. May the Source be With
      You, Always."
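    The 'wc *.c|sort' tip and the advice to learn grep, find and xargs
    can be turned into two small helpers. This is a sketch, not part of
    Cameron's article: the function names smallest_sources and who_calls
    are hypothetical, and smallest_sources uses 'wc -l' with a numeric
    'sort -n' instead of the plain sort in the quote.

```sh
# smallest_sources DIR [N]
# Print the N (default 10) shortest .c files directly in DIR as
# "LINES PATH", smallest first: a rough way to find simple drivers.
smallest_sources() {
    dir=$1 n=${2:-10}
    for f in "$dir"/*.c; do
        [ -f "$f" ] || continue       # the glob matched nothing
        printf '%s %s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
    done | sort -n | head -n "$n"
}

# who_calls TREE IDENT
# List every .c and .h file under TREE that mentions IDENT as a whole word,
# e.g. to see which drivers use a particular kernel function.
who_calls() {
    find "$1" -type f \( -name '*.c' -o -name '*.h' \) -print0 |
        xargs -0 grep -l -w -- "$2" | sort
}
```

    From the top of a kernel tree, 'smallest_sources drivers/char'
    suggests short drivers to read first, and
    'who_calls drivers register_chrdev' shows which drivers register
    character devices; register_chrdev is just an example identifier.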

--
※ Source: USTC BBS [bbs.ustc.edu.cn]