
From: clx (楚留香), Board: Unix
Subject: The SysV init 2.6 boot process (part 1)
Site: 紫丁香 (Sat Jun 27 13:56:41 1998), forwarded

From: Hanky@FruitTea1 (骗谁啊？), Board: Linux
Subject: The SysV init 2.6 boot process (part 1)
Site: 果茶小站 (Wed May 29 19:38:46 1996)

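First, why SysV init 2.6? Because many distributions shipped the
buggy 2.5x releases, we will cover 2.6, using Linux as the platform.

That is an odd way to open, but never mind.

In general, once kernel bootstrapping is finished the system runs
init, the "father of all processes"; only after init exists can any
child processes be spawned. On Linux, two kernel-type processes also
start running at this point, kflushd and kswapd: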

        * process ID 1: init
        * process ID 2: kflushd
        * process ID 3: kswapd

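Only init is a genuine user-space process; the other two are kernel
code placed on the scheduler under the guise of processes.

The first thing init does is read /etc/inittab, which spells out for
each runlevel which rc scripts to run and which processes to spawn.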

--------[/etc/inittab sample file]---------------------------------

# /etc/inittab: init(8) configuration.
# $Id: inittab,v 1.4 1996/03/10 11:47:55 miquels Exp $

# The default runlevel; we set it to 3. Runlevels are explained below.
id:3:initdefault:

# The system configuration/initialization script run at boot.
# This is run first except when booting in emergency (-b) mode.
si::sysinit:/etc/init.d/boot

# What to do in single-user mode. sulogin stands for Single User LOGIN.
~~:S:wait:/sbin/sulogin

# /etc/init.d executes the S and K scripts upon change
# of runlevel. /etc/init.d/rc is a shell script; the argument
# 0-6 tells it which runlevel's set of scripts to run.
# Runlevel 0 is halt.
# Runlevel 1 is single-user.
# Runlevels 2-5 are multi-user.
# Runlevel 6 is reboot.

l0:0:wait:/etc/init.d/rc 0
l1:1:wait:/etc/init.d/rc 1
l2:2:wait:/etc/init.d/rc 2
l3:3:wait:/etc/init.d/rc 3
l4:4:wait:/etc/init.d/rc 4
l5:5:wait:/etc/init.d/rc 5
l6:6:wait:/etc/init.d/rc 6

# What to do when CTRL-ALT-DEL is pressed; usually shutdown -r now.
ca:12345:ctrlaltdel:/sbin/shutdown -t1 -r now
#                                  ^^^ wait one second first

# Action on special keypress (ALT-UpArrow). I have not been able to
# find details on this one yet, sorry.
kb::kbrequest:/bin/echo "Keyboard Request--edit /etc/inittab to let this work."

# What to do when the power fails/returns. Covered later, in the UPS part.
pf::powerwait:/etc/init.d/genpowerfail start
#pn::powerfailnow:/etc/init.d/genpowerfail now
po::powerokwait:/etc/init.d/genpowerfail stop
#pg::powerokwait:/etc/init.d/genpowerfail stop

# /sbin/getty invocations for the runlevels.
# ^^^^^^^^^^^ this is what opens the consoles.
# The "id" field MUST be the same as the last
# characters of the device (after "tty").
# For example, runlevel 3 gets six virtual consoles (tty1-tty6).
# Format:
#  <id>:<runlevels>:<action>:<process>
1:2345:respawn:/sbin/agetty 19200 tty1
2:23:respawn:/sbin/agetty 19200 tty2
3:23:respawn:/sbin/agetty 19200 tty3
4:23:respawn:/sbin/agetty 19200 tty4
5:23:respawn:/sbin/agetty 19200 tty5
6:23:respawn:/sbin/agetty 19200 tty6

----------[End of /etc/inittab]---------------------------------
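
To make the <id>:<runlevels>:<action>:<process> format concrete, here is a minimal sketch in POSIX shell that lists the entries whose runlevels field contains a given runlevel. The function name inittab_for_runlevel and the temporary sample file are assumptions made for illustration; the real init parses the file internally and treats actions such as sysinit, ctrlaltdel and the power actions specially, which this sketch does not model.

```shell
#!/bin/sh
# List inittab entries whose <runlevels> field contains the given runlevel.
# inittab_for_runlevel is a hypothetical helper, not part of SysV init.
inittab_for_runlevel() {
    level=$1
    file=$2
    # Drop comments and blank lines, then split each entry on ':'.
    # The process field may itself contain ':', so rebuild it from $4 on.
    grep -v '^[[:space:]]*#' "$file" | grep -v '^[[:space:]]*$' |
    awk -F: -v lvl="$level" '
        index($2, lvl) > 0 {
            proc = $4
            for (i = 5; i <= NF; i++) proc = proc ":" $i
            printf "%s\t%s\t%s\n", $1, $3, proc
        }'
}

# Demo on a few entries copied from the sample inittab above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# sample entries
id:3:initdefault:
l3:3:wait:/etc/init.d/rc 3
1:2345:respawn:/sbin/agetty 19200 tty1
2:23:respawn:/sbin/agetty 19200 tty2
EOF
inittab_for_runlevel 4 "$sample"
rm -f "$sample"
```

For runlevel 4 the demo prints only the tty1 agetty entry, which matches the sample file: tty2 through tty6 are listed for runlevels 2 and 3 only.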

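With the settings above, as soon as kernel bootstrapping is done the
system runs the /etc/init.d/boot shell script; if nothing goes wrong
it then enters the default runlevel. Under SysV the runlevels usually
have the following meanings: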

0: halt (shut down only, no reboot)
1: Single-user
2: Multi-user
3: Multi-user, with network resources exported
4: Multi-user with exported network resources; usually reserved for xdm
5: Rarely used.
6: reboot (shut down, then reboot)

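Runlevels 2-5 are all multi-user, and usually the higher the
runlevel (2 through 5), the more services are provided. When a
system resource changes (power, for example), we can use telinit to
tell init to change runlevel. If the current runlevel is 3, telinit 2
drops it to 2, which shuts down some network services. telinit S and
telinit 1 both lead to single-user mode, with one difference:
telinit S simply runs a /bin/sh for you directly on /dev/console,
while telinit 1 runs /etc/init.d/rc 1. telinit 6 is equivalent to
reboot.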
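
As a quick reference, the runlevel table can be written as a small shell function. This is only a sketch: runlevel_meaning is a hypothetical helper, and the telinit and runlevel calls are left as comments because they need root on a machine that is really running SysV init (and assume the runlevel utility is installed).

```shell
#!/bin/sh
# Map a runlevel to the conventional meaning listed in the table above.
# runlevel_meaning is a hypothetical helper for illustration only.
runlevel_meaning() {
    case "$1" in
        0)       echo "halt" ;;
        1|S|s)   echo "single-user" ;;
        2|3|4|5) echo "multi-user" ;;
        6)       echo "reboot" ;;
        *)       echo "unknown"; return 1 ;;
    esac
}

# Typical use on a running system (needs root, so left commented out):
#   runlevel      # show the previous and current runlevel
#   telinit 2     # ask init to switch to runlevel 2
for lvl in 0 1 3 6; do
    echo "runlevel $lvl: $(runlevel_meaning "$lvl")"
done
```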

OK, if you have any questions, please raise them now.

Next we look at what the /etc/init.d/boot script has to do to give
us the system we want.

Now that kernel bootstrapping is done, we need some very basic
checks and settings, plus some "preparation work".

After kernel bootstrapping, and before remounting root read-only,
a few things happen first. Taking Linux as the example:

        * set PATH and umask for this script
        * start the kerneld daemon (related to kernel modules, covered later)
        * mdadd -ar, to bring up the md devices (also covered later)
        * swapon -a, to enable all swap partitions
        * start the update (bdflush) daemon

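kerneld arrived when Linux kernel 1.3.xx became modular. It is a
daemon that automatically inserts modules into the kernel and also
removes modules that have gone unused for a while. We will go into
modules in detail when we get to the kernel.

The md device is a feature added after Linux kernel 1.3.69. It
combines two or more partitions into one large md device, on which
you can directly create a file system or swap space. It can also
interleave block positions across the partitions, much like RAID-0,
so it not only joins a pile of small partitions into a big one but
can also improve speed.

The update (bdflush) daemon flushes dirty blocks back to disk at
regular intervals (every 5 seconds by default). It must be started
before any major I/O such as fsck.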

Good. With the basics in place, the next job is fsck. First the
root is remounted, read-only:

        * mount -n -o remount,ro /

The -n option keeps mount from recording the operation in /etc/mtab:
'/' is being mounted read-only, so nothing can be written anyway.
Then we start fsck:

        * fsck -A -a

The -A option checks everything listed in /etc/fstab, and -a means
auto-repair. If something cannot be repaired, the script runs
sulogin and then reboots. If all is well, '/' is remounted
read-write:

        * mount -n -o remount,rw /

Because a mount -a comes later, we still pass -n here.
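
The decision the boot script makes after fsck can be sketched as follows. fsck_verdict is a hypothetical helper; the threshold follows the rule used in the full script further down, where a return code of 2 or larger counts as a failure and 1 means errors were corrected. Current fsck(8) documentation describes the exit status as a sum of bit flags, which is an assumption about your fsck version rather than something stated in this talk.

```shell
#!/bin/sh
# Decide what the boot script should do with fsck's exit status.
# fsck_verdict is a hypothetical helper for illustration only.
fsck_verdict() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "clean"        # nothing to fix, continue booting
    elif [ "$status" -eq 1 ]; then
        echo "repaired"     # errors were corrected, boot may proceed
    else
        echo "failed"       # drop to sulogin, then reboot
    fi
}

for code in 0 1 4; do
    echo "fsck exit $code -> $(fsck_verdict "$code")"
done
```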

Next come the things to do as soon as '/' is writable:

        * run the modules setup
        * remove certain files under /etc and /
        * update the psdatabase

For example:

----------[excerpt from /etc/init.d/boot]-------------------------
# Load the appropriate modules.
if [ -x /etc/init.d/modules ]
then
  /etc/init.d/modules
fi

# Remove /etc/mtab*, /etc/rmtab, /etc/nologin and /fastboot.
rm -f /etc/mtab* /etc/nologin /fastboot /etc/rmtab

# update /etc/psdatabase
psupdate 2> /dev/null
# or
#ps -U 2> /dev/null
---------[end of example]--------------------------------------------

I think everyone can follow the above.

Next, mount all the local partitions:

        * mount -avt nonfs

Why -t nonfs? Simple: the network has not been configured yet.
Then, if some swap files only become visible after mount -a,
swapon has to be run once more:

        * swapon -a 2>/dev/null

so that those swap files are put to use.
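
To see which fstab entries a -t nonfs pass would pick up, here is a rough sketch that filters an fstab-style file. local_mounts is a hypothetical helper, and skipping swap and noauto entries is a simplifying assumption about how mount -a behaves.

```shell
#!/bin/sh
# Print the mount points in an fstab-style file whose type is not nfs,
# roughly the set that "mount -a -t nonfs" would act on.
# local_mounts is a hypothetical helper for illustration only.
local_mounts() {
    awk '
        $1 ~ /^#/ || NF < 4 { next }           # comments and short lines
        $3 == "nfs" || $3 == "swap" { next }   # remote or swap entries
        $4 ~ /(^|,)noauto(,|$)/ { next }       # not mounted by -a
        { print $2 }
    ' "$1"
}

# Demo on a made-up fstab in a temporary file.
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
# device        mountpoint  type  options  dump pass
/dev/hda1       /           ext2  defaults 1 1
/dev/hda2       none        swap  sw       0 0
/dev/hda3       /usr        ext2  defaults 1 2
server:/export  /mnt/nfs    nfs   rw       0 0
/dev/fd0        /floppy     msdos noauto   0 0
EOF
local_mounts "$fstab"
rm -f "$fstab"
```

The NFS line is the one left for the later mount -a -t nfs pass, once the network is up.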
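OK, then configure the network (this calls a separate script; if
that script does not exist, the network cannot be configured) and
the host name, and then run mount -a -t nfs to mount the file
systems that other hosts export.

------[example]------------------------------------------------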
if [ -x /etc/init.d/network ]
then
  /etc/init.d/network
fi
# Then set the hostname.
# If there's no /etc/HOSTNAME, fall back on this default:
if [ ! -r /etc/HOSTNAME ]; then
   echo "Henry.Dorm10.NCTU.edu.tw" > /etc/HOSTNAME
fi
cat /etc/HOSTNAME | cut -f1 -d . > /etc/hostname
hostname --file /etc/hostname

# Now that TCP/IP is configured, mount the NFS file systems in /etc/fstab.
echo "Mounting remote file systems..."
mount -a -t nfs
------[end of example]------------------------------------------------
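
The cat/cut pipeline in the example takes everything before the first dot of the fully qualified name. As an aside, the same result can be had with POSIX parameter expansion alone; short_hostname below is a hypothetical helper, not something the boot script defines.

```shell
#!/bin/sh
# Derive the short host name from a fully qualified one without
# spawning cat and cut. short_hostname is a hypothetical helper.
short_hostname() {
    # ${1%%.*} removes the longest suffix that starts at a dot,
    # which leaves the text before the first dot.
    printf '%s\n' "${1%%.*}"
}

short_hostname "Henry.Dorm10.NCTU.edu.tw"
```

With the default name from the example this prints Henry, the same value the cut command writes to /etc/hostname.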

Only now are all the file systems (NFS included) mounted, so the
things to do right away are: update /etc/ld.so.cache, set the system
clock, and clear out most of the junk under /tmp, /var/run and
/var/lock:

        * /sbin/ldconfig
        * clock -s
        * clean out /tmp, /var/run, /var/lock

OK, with /tmp, /var/run and /var/lock cleaned out, it is time to
run all the scripts under /etc/rc.boot/ (run-parts is a utility that
runs every script in the directory given as its argument, once each):

        * run-parts /etc/rc.boot

If you do not have run-parts, you might write one yourself as a
shell script; or take the crude route and just put everything
directly in the /etc/init.d/boot script.
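
A home-made replacement can be quite short. The following is a minimal sketch, not the real run-parts: simple_run_parts is a hypothetical name, and the real tool also filters file names and accepts options that are ignored here.

```shell
#!/bin/sh
# Minimal stand-in for run-parts: run every executable regular file
# in a directory, in the shell's sorted glob order.
# simple_run_parts is a hypothetical name for illustration only.
simple_run_parts() {
    dir=$1
    for script in "$dir"/*; do
        # Skip directories, non-executables, and the unexpanded
        # pattern left behind when the directory is empty.
        [ -f "$script" ] && [ -x "$script" ] || continue
        "$script" || echo "warning: $script exited with status $?" >&2
    done
}

# Demo in a scratch directory instead of the real /etc/rc.boot.
demo=$(mktemp -d)
printf '#!/bin/sh\necho first\n'  > "$demo/10first"
printf '#!/bin/sh\necho second\n' > "$demo/20second"
printf 'not executable\n'         > "$demo/README"
chmod +x "$demo/10first" "$demo/20second"
simple_run_parts "$demo"
rm -rf "$demo"
```

In the boot script this would be called as simple_run_parts /etc/rc.boot in place of the run-parts line.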

Then fix the permissions on /dev/ttyXX (pty256 is covered later):

        * chmod 666 /dev/tty[pqrstuvwxyzabcde]*
        * chown root.tty /dev/tty[pqrstuvwxyzabcde]*

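After that, anything else you want taken care of at this stage can
also go here, or equally well in a script dropped into /etc/rc.boot/.
Starting powerd, creating /etc/motd, creating /etc/issue.net and
creating some links are all good candidates.

Here is my /etc/init.d/boot script:

[Attachment 1]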

PATH="/sbin:/bin:/usr/sbin:/usr/bin"
umask 022

echo
echo "Running /etc/init.d/boot..."
echo

# enable kerneld
if [ -x /sbin/kerneld ]; then
  /sbin/kerneld
fi

# Bring up the md devices
if [ -s /etc/mdtab -a -f /sbin/mdadd ]
then
  mdadd -ar
fi

echo "Activating swap..."
swapon -a 2>/dev/null

# Ensure that bdflush (update) is running before any major I/O is
# performed (the following is a good example of such activity :).
update &

# Check the integrity of all file systems (if not a fastboot).
if [ ! -f /fastboot ]
then
  # Ensure that root is quiescent and read-only before fsck'ing.
  mount -n -o remount,ro /
  if [ $? = 0 ]
  then
    echo "Checking file systems..."
    fsck -A -a
    # If there was a failure, drop into single-user mode.
    #
    # NOTE: "failure" is defined as exiting with a return code of
    # 2 or larger.  A return code of 1 indicates that file system
    # errors were corrected but that the boot may proceed.
    if [ $? -gt 1 ]
    then
      # Surprise! Re-directing from a HERE document (as in
      # "cat << EOF") won't work, because the root is read-only.
      echo
      echo "fsck failed.  Please repair manually and reboot.  Please note"
      echo "that the root file system is currently mounted read-only.  To"
      echo "remount it read-write:"
      echo
      echo "   bash# mount -n -o remount,rw /"
      echo
      echo "CONTROL-D will reboot the system."
      echo
      # Start a single user shell on the console
      /sbin/sulogin /dev/console
      sync
      reboot
    fi
    echo
  else
    echo "*** ERROR!  Cannot fsck because root is not read-only!"
    echo
  fi
else
  echo "*** Fast boot ... skipping disk checks"
  echo
fi

# Remount the root file system in read-write mode.
mount -n -o remount,rw /

# Load the appropriate modules.
if [ -x /etc/init.d/modules ]
then
  /etc/init.d/modules
fi

# Remove /etc/mtab*, /etc/rmtab, /etc/nologin and /fastboot.
rm -f /etc/mtab* /etc/nologin /fastboot /etc/rmtab

# update /etc/psdatabase
psupdate 2> /dev/null
# or
#ps -U 2> /dev/null

# Mount local file systems in /etc/fstab.
echo "Mounting local file systems..."
mount -avt nonfs

# Execute swapon command again, in case we want to swap to
# a file on a now mounted filesystem.
swapon -a 2>/dev/null

# Setup the network interfaces. Note that /var/run and /var/lock
# are cleaned up after this, so don't put anything in the "network"
# script that leave a pidfile or a lockfile.
if [ -x /etc/init.d/network ]
then
  /etc/init.d/network
fi

# Set hostname.
# If there's no /etc/HOSTNAME, fall back on this default:
if [ ! -r /etc/HOSTNAME ]; then
   echo "Henry.Dorm10.NCTU.edu.tw" > /etc/HOSTNAME
fi
cat /etc/HOSTNAME | cut -f1 -d . > /etc/hostname
hostname --file /etc/hostname

# Now that TCP/IP is configured, mount the NFS file systems in /etc/fstab.
echo "Mounting remote file systems..."
mount -a -t nfs

# Update all the shared library links automatically
echo "Update /etc/ld.so.cache and all the shared library links."
/sbin/ldconfig

# Set GMT="-u" if your system clock is set to GMT, and GMT=""
# if not.
GMT=""
# Set and adjust the CMOS clock.
clock -s $GMT
if [ ! -f /etc/adjtime ]
then
  echo "0.0 0 0.0" > /etc/adjtime
fi
clock -a $GMT

# Now that /usr/lib/zoneinfo should be available, announce the local time.
echo
echo "Local time: `date`"
echo

# Wipe /tmp (and don't erase `lost+found', `quota.user' or `quota.group')!
# Note that files _in_ lost+found _are_ deleted.
echo -n "Cleaning up /tmp, /var/run and /var/lock... "
( cd /tmp && \
  find . \
  ! -name . \
  ! \( -name lost+found -uid 0 \) \
  ! \( -name quota.user -uid 0 \) \
  ! \( -name quota.group -uid 0 \) \
    -depth -exec rm -rf -- {} \; )
# Clean up any stale locks.
( cd /var/lock && find . -type f -exec rm -f -- {} \; )
# Clean up /var/run and create /var/run/utmp so that we can login.
( cd /var/run && find . ! -type d -exec rm -f -- {} \; )
: > /var/run/utmp
echo "done."

# Run the package-specific boot scripts in /etc/rc.boot.
run-parts /etc/rc.boot
# Set pseudo-terminal access permissions.
chmod 666 /dev/tty[pqrstuvwxyzabcde]*
chown root.tty /dev/tty[pqrstuvwxyzabcde]*

# Setup the /etc/issue.net to reflect the current kernel level:
cat /etc/issue > /etc/issue.net
uname -a >> /etc/issue.net
echo >> /etc/issue.net

touch /etc/motd

# and startup powerd
echo "Start up the genpowerd" ; /sbin/genpowerd /dev/UPS henry

--------------------- --------------------- ---------------------
That completes /etc/init.d/boot. According to /etc/inittab we go
to runlevel 3, which means running /etc/init.d/rc 3.

/etc/init.d/rc is a nice (clever?) shell script; let us look at how
it works.

Under the /etc/ used by SysV init 2.6, besides the two
subdirectories init.d/ and rc.boot/, there are also:

drwxr-xr-x   2 root         1024 May 28 09:08 rc0.d/
drwxr-xr-x   2 root         1024 May 28 09:08 rc1.d/
drwxr-xr-x   2 root         1024 May 29 09:54 rc2.d/
drwxr-xr-x   2 root         1024 May 28 09:08 rc3.d/
drwxr-xr-x   2 root         1024 May 28 09:08 rc4.d/
drwxr-xr-x   2 root         1024 May 28 09:08 rc5.d/
drwxr-xr-x   2 root         1024 May 28 09:08 rc6.d/

Taking rc3.d as the example of these seven subdirectories:

# dir rc3.d/
S20cron -> ../init.d/cron*
S20gpm -> ../init.d/gpm*
S20httpd -> ../init.d/httpd*
S20innbbsd -> ../init.d/innbbsd*
S20ip_acct -> ../init.d/ip_acct*
S20lpd -> ../init.d/lpd*
S20netbase -> ../init.d/netbase*
S20netstd_misc -> ../init.d/netstd_misc*
S20nfs -> ../init.d/nfs*
S20quota -> ../init.d/quota*
S30sendmail -> ../init.d/sendmail*
S30syslogd -> ../init.d/syslogd*

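Each of these is a symbolic link to a script under /etc/init.d/.
The S20 prefix is a control tag: S means start (K means kill), and
the number 20 gives the execution order (20 runs before 30; equal
numbers run in alphabetical order).

Given a runlevel number, /etc/init.d/rc looks in the matching
directory and uses these links to decide which scripts to run.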
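
The core of that behaviour can be sketched in a few lines of shell. This is an illustration under assumptions, not the actual /etc/init.d/rc: run_rc and its rc_root argument are hypothetical, the K-before-S order is the usual convention, and a real rc does more (for instance, taking the previous runlevel into account).

```shell
#!/bin/sh
# Sketch of the core loop of an rc script: for the target runlevel,
# call every K* link with "stop", then every S* link with "start".
# Glob expansion is sorted, so S20netbase runs before S20nfs and
# both run before S30sendmail. run_rc is a hypothetical helper.
run_rc() {
    rc_root=$1
    level=$2
    for link in "$rc_root/rc$level.d"/K[0-9][0-9]*; do
        [ -x "$link" ] && "$link" stop
    done
    for link in "$rc_root/rc$level.d"/S[0-9][0-9]*; do
        [ -x "$link" ] && "$link" start
    done
    return 0
}

# Demo with a scratch tree instead of the real /etc.
root=$(mktemp -d)
mkdir -p "$root/init.d" "$root/rc3.d"
for svc in netbase nfs sendmail; do
    printf '#!/bin/sh\necho "%s $1"\n' "$svc" > "$root/init.d/$svc"
    chmod +x "$root/init.d/$svc"
done
ln -s ../init.d/nfs      "$root/rc3.d/S20nfs"
ln -s ../init.d/netbase  "$root/rc3.d/S20netbase"
ln -s ../init.d/sendmail "$root/rc3.d/S30sendmail"
run_rc "$root" 3
rm -rf "$root"
```

The demo starts netbase, then nfs, then sendmail, regardless of the order in which the links were created.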

So what about the scripts under init.d/ themselves?
Let us take a look:

root@Henry:/etc# dir init.d/
total 45
-rwxr-xr-x   1 root         5231 May 28 17:47 boot*
-rwxr-xr-x   1 root          290 May 28 09:08 cron*
-rwxr-xr--   1 root         2250 May 29 01:29 genpowerfail*
-rwxr-xr-x   1 root          283 May 28 06:41 gpm*
-rwxr-xr-x   1 root          707 Feb 29 20:17 halt*
-rwxr-xr-x   1 root          718 May 28 06:27 httpd*
-rwxr-xr-x   1 root          494 May 28 07:05 innbbsd*
-rwxr-x---   1 root          333 May 28 07:12 ip_acct*
-rwxr-xr-x   1 root          343 May 28 06:35 lpd*
-rwxr-xr-x   1 root          500 May 28 05:25 modules*
-rwxr-xr-x   1 root          699 May 28 06:12 netbase*
-rwxr-xr-x   1 root          391 Mar 19 10:32 netstd_init*
-rwxr-xr-x   1 root          598 May 29 09:54 netstd_misc*
-rwxr-xr-x   1 root         1372 May 29 10:05 network*
-rwxr-xr-x   1 root         1208 May 28 05:36 nfs*
-rwxr-xr-x   1 root         1258 Dec 28 08:02 powerfail*
-rwxr-x---   1 root          891 May 28 06:45 quota*
-rwxr-xr-x   1 root         2928 Jan  4 19:59 rc*
-rwxr-xr-x   1 root          653 Feb 29 20:17 reboot*
-rwxr-xr-x   1 root          696 May 28 07:34 sendmail*
-rwxr-xr-x   1 root          527 Mar 20 00:44 single*
-rwxr-xr-x   1 root         1078 Dec 28 08:21 skeleton*
-rwxr-xr-x   1 root          640 May 29 08:21 syslogd*

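More than you expected, right? A workstation that offers a fair
number of services needs roughly this many. Apart from boot, rc,
network and modules, which we have already covered, the rest exist
to be symlinked from /etc/rc[0-6].d/.

OK, let me pull up one script to show you: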

-------[/etc/init.d/netbase]----------------------------
#!/bin/sh
#
# Start networking daemons.

test -f /usr/sbin/rpc.portmap || exit 0

case "$1" in
  start)
        echo -n "Starting base networking daemons: "
        echo -n "rpc.portmap "
        start-stop-daemon --start --quiet --exec /usr/sbin/rpc.portmap
        echo -n "xinetd "
        start-stop-daemon --start --quiet --exec /usr/sbin/xinetd
        echo
        ;;
  stop)
        start-stop-daemon --stop --quiet --oknodo --exec /usr/sbin/xinetd
        start-stop-daemon --stop --quiet --oknodo --exec /usr/sbin/rpc.portmap
        killall -9 slattach 2>/dev/null || exit 0
        ;;
  reload)
        start-stop-daemon --stop --oknodo --signal 10 --exec /usr/sbin/xinetd
        ;;
  *)
        echo "Usage: /etc/init.d/netbase {start|stop|reload}"
        exit 1
        ;;
esac

exit 0

-------[end]-----------------------------------------------------------------

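start-stop-daemon is a perl script. The --start option makes it
start the daemon in question (xinetd, for example), and --stop makes
it kill that daemon. With the right signal (the default is 15,
SIGTERM) you can instead make the daemon reload its configuration
file; for xinetd that is 10, SIGUSR1, while most daemons use 1,
SIGHUP.

The S20netbase symbolic link makes /etc/init.d/rc start it, that
is, run '/etc/init.d/netbase start', which brings up rpc.portmap and
xinetd, exactly what we want.

lpd, gpm, httpd, nfs and the others work the same way. One more
point: the nfs daemons only work correctly once rpc.portmap is
running (NFS is itself an RPC service). Here S20netbase runs before
S20nfs (alphabetical order), so there should be no problem. Paying
attention to execution order matters.

One more addition: Debian has a very useful utility, update-rc.d: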

-----------[man update-rc.d]-------------------------------------
NAME
       update-rc.d  -  install  and  remove  System-V  style init
       script links

SYNOPSIS
       update-rc.d <basename> remove

       update-rc.d <basename> defaults [ <codenumber> | <startco-
       denumber> <stopcodenumber> ]

       update-rc.d  <basename>  start  |  stop <codenumber> <run-
       level> [ <runlevel> [ <runlevel> [...]]]  .

----------[end of part]------------------------------------------

It helps you add the symbolic links for a newly written script to
/etc/rc[0-6].d/. With defaults, it creates S20xxxxx links under
rc[2-5].d/ and K20xxxxx links under rc[016].d/. Not bad, eh?
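
What the defaults form sets up can be imitated by hand. The sketch below works in a scratch tree rather than the real /etc; make_default_links, etc_root and the mydaemon script name are all hypothetical, and the real update-rc.d does additional checking that is not shown.

```shell
#!/bin/sh
# Sketch of what "update-rc.d <basename> defaults" sets up:
# S20 links in rc2.d..rc5.d and K20 links in rc0.d, rc1.d, rc6.d.
# make_default_links is a hypothetical helper for illustration only.
make_default_links() {
    etc_root=$1
    name=$2
    for level in 2 3 4 5; do
        mkdir -p "$etc_root/rc$level.d"
        ln -sf "../init.d/$name" "$etc_root/rc$level.d/S20$name"
    done
    for level in 0 1 6; do
        mkdir -p "$etc_root/rc$level.d"
        ln -sf "../init.d/$name" "$etc_root/rc$level.d/K20$name"
    done
}

# Demo in a scratch tree; mydaemon is a made-up script name.
root=$(mktemp -d)
mkdir -p "$root/init.d"
: > "$root/init.d/mydaemon"
make_default_links "$root" mydaemon
ls "$root"/rc?.d
rm -rf "$root"
```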

Good, /etc/init.d/rc 3 has finished too, and next the consoles are
opened. According to inittab, six virtual consoles are opened on
tty1 through tty6, so we can switch between them with Alt-F1
through Alt-F6. After logging in, we see the following processes:

root@Henry:/etc# ps ax |grep getty
  165 v02 S     0:00 /sbin/agetty 19200 tty2
  166 v03 S     0:00 /sbin/agetty 19200 tty3
  167 v04 S     0:00 /sbin/agetty 19200 tty4
  168 v05 S     0:00 /sbin/agetty 19200 tty5
  169 v06 S     0:00 /sbin/agetty 19200 tty6

Since we have already logged in and are using tty1, no agetty
shows up for tty1.

That is all for SysV init 2.6 for now. Any questions?

Next up is how a UPS is connected to powerd. Let us take a
ten-minute break first.

--
※ Source: 紫丁香 bbs.hit.edu.cn [FROM: 202.118.244.16]
