
From: Zinux (Linux技工), Board: Embedded_system
Subject: Design of a Fully Preemptable Linux Kernel
Site: 哈工大紫丁香 (Sun Sep 23 12:46:02 2001), forwarded

George Anzinger and Nigel Gamble, MontaVista Software (September 6, 2000)

Introduction

MontaVista has developed a hard real-time fully preemptable Linux kernel,
based on Linux kernel 2.4. The preemptable kernel has the potential to
dramatically improve application responsiveness of the Linux kernel, while
fully preserving the standard Linux programming model. The current prototype
of the preemptable Linux kernel (for IA32/X86 platforms) is available for ftp
download, at ftp://ftp.mvista.com.

Preemption model

The preemption model used is to allow the kernel to be preempted at any time
when it is not locked. This is very different from the model where preemption
is actively requested by the currently executing code. Using this model, when
an event occurs that makes a higher priority task runnable, the system will
preempt the current task and run the higher priority task. Of course, there
are times when this should not be done. These include:
While handling interrupts

While doing "bottom half" processing. Bottom half processing is work that an
interrupt routine needs to do, but which can be done at a more relaxed pace.

While holding a spinlock, writelock, or readlock. These locks were put in the
kernel to protect it from other processors in Symmetric Multiprocessing (SMP)
Systems. While these locks are held, the kernel is not preemptable for
reentrancy or data protection reasons (just as in the SMP case).

While the kernel is executing the scheduler itself. The scheduler is charged
with executing the "best" task and if it is engaged in making that decision,
it should not be confused by asking it to give up the processor.

At all other times the MontaVista algorithm allows preemption. Also, whenever
the system exits from one of the above states, a test is made to see if
preemption is called for. If so, the current task is preempted. (Please see
the final paragraph below for important comments on this attribute.)
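
The rule above (no preemption while locked, and a test for preemption on
leaving a locked state) can be sketched in C as follows. This is only an
illustrative model, not MontaVista's code: the names preempt_lock_count,
resched_pending, preempt_lock(), preempt_unlock() and
higher_priority_task_ready() are assumptions made for this sketch, and the
"context switch" is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; all names are illustrative assumptions. */
static int  preempt_lock_count;  /* > 0 means the kernel is "locked"      */
static bool resched_pending;     /* a higher priority task became runnable */
static int  preemptions_taken;   /* stand-in for real context switches     */

/* Stand-in for switching to the higher priority task. */
static void preempt_current_task(void)
{
    resched_pending = false;
    preemptions_taken++;
}

/* Enter a non-preemptable state (interrupt, bottom half, lock, scheduler). */
static void preempt_lock(void)
{
    preempt_lock_count++;
}

/* Leave a non-preemptable state, then test whether preemption is called for. */
static void preempt_unlock(void)
{
    assert(preempt_lock_count > 0);
    if (--preempt_lock_count == 0 && resched_pending)
        preempt_current_task();
}

/* An event makes a higher priority task runnable. */
static void higher_priority_task_ready(void)
{
    resched_pending = true;
    if (preempt_lock_count == 0)   /* kernel not locked: preempt at once */
        preempt_current_task();
}
```

In this model, an event that arrives while two locks are nested is only
remembered; the preemption happens when the outermost preempt_unlock() brings
the count back to zero.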

MontaVista's effort to bring preemption to Linux focused on two changes.

First, the basic interrupt entry and completion code ("entry.S" and its
helpers) was modified to ensure that it never returns to user code with a
pending soft interrupt, context switch, or signal delivery. It was also
modified to never return to system code with a pending soft interrupt or a
pending context switch that is allowed. Here "allowed" means that the
preemption lock count is zero. The preemption lock count is incremented
whenever a spinlock, writelock, readlock, interrupt or trap is taken and
decremented whenever these conditions clear.
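
As a rough illustration of the return-path checks just described, the decision
can be written as a small C function. The real logic lives in assembly in
"entry.S"; the structure, the field names, and the order in which the
conditions are tested below are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the interrupt/trap exit path. */
enum return_action { RETURN_NOW, RUN_SOFTIRQ, CALL_SCHEDULER, DELIVER_SIGNAL };

/* Hypothetical snapshot of the state examined on the way out. */
struct cpu_state {
    bool returning_to_user;   /* interrupted code was user code          */
    bool softirq_pending;     /* soft interrupt work is waiting          */
    bool resched_pending;     /* a context switch has been requested     */
    bool signal_pending;      /* a signal is waiting for the current task */
    int  preempt_lock_count;  /* zero means preemption is "allowed"      */
};

/* Decide what must happen before control may return (assumed ordering). */
static enum return_action next_return_action(const struct cpu_state *s)
{
    if (s->softirq_pending)
        return RUN_SOFTIRQ;            /* never return with softirqs pending */

    if (s->returning_to_user) {
        if (s->resched_pending)
            return CALL_SCHEDULER;
        if (s->signal_pending)
            return DELIVER_SIGNAL;
        return RETURN_NOW;
    }

    /* Returning to system code: preempt only if the lock count is zero. */
    if (s->resched_pending && s->preempt_lock_count == 0)
        return CALL_SCHEDULER;
    return RETURN_NOW;
}
```

A caller would presumably repeat the check after each action until
RETURN_NOW is reached, since running a soft interrupt can itself request a
context switch.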

The second change centered on modifying the header files for spinlock,
writelock, readlock, and some SMP code to produce code that modified the
preemption count.

The scheduler was changed only slightly to directly prevent its own preemption
and to honor an additional "state" flag, TASK_PREEMPTING. This flag is set
whenever preemption is taken and tells the scheduler that the task is to be
treated as running, even though its actual state may be other than running.
This allows preemption to occur during wake_up setup times, when the kernel
sets the current task's state to something other than running and then does
other setup work on the way to calling the scheduler. By using the
TASK_PREEMPTING flag on the state (the underlying state is preserved) the
scheduler can distinguish between the preemption call and the completion of
the wake_up setup. Without this flag, the task could be put to sleep prior to
completion of the wake_up setup, and thus would never wake up.
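
The effect of the flag can be shown with a minimal sketch. TASK_RUNNING and
TASK_INTERRUPTIBLE have their usual Linux values; the bit chosen for
TASK_PREEMPTING, the struct layout, the function names, and the point at which
the flag is cleared are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TASK_RUNNING        0x0
#define TASK_INTERRUPTIBLE  0x1
#define TASK_PREEMPTING     0x100   /* hypothetical bit value */

/* Hypothetical, heavily reduced task structure. */
struct task {
    unsigned state;
    bool     on_run_queue;
};

/* Preemption is taken: add the flag, preserving the underlying state. */
static void mark_preempted(struct task *t)
{
    t->state |= TASK_PREEMPTING;
}

/* What the scheduler does with the task it is switching away from. */
static void schedule_away_from(struct task *prev)
{
    if (prev->state & TASK_PREEMPTING) {
        /* Preemption call: treat the task as running and keep it queued. */
        prev->state &= ~TASK_PREEMPTING;   /* assumed place to clear it */
        return;
    }
    if (prev->state != TASK_RUNNING)
        prev->on_run_queue = false;        /* voluntary sleep */
}
```

A task that has already set its state to TASK_INTERRUPTIBLE but is preempted
before finishing its setup stays on the run queue, so it can resume and
complete the setup. Only its own later call to the scheduler puts it to sleep.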

Continuing development

MontaVista is continuing to work on preemption and is currently looking at the
following:
Change the spinlock, writelock, and readlock macros that also disable
interrupts to not bother with the preemption counter. The interrupt system
being off prevents preemption, so the counter is not needed.

Set up the spinlock, writelock, and readlock macros so that both SMP and
preemption can be turned on at the same time. There is no reason an SMP system
cannot be preemptable.

Measure the impact on throughput of a variety of workloads. Note that under
combined I/O and CPU intensive workloads, throughput may actually improve.

Do timing analysis on the system locks.

Using the above analysis for short locks, consider using the interrupt system
as a lock, instead of the preemption counter. This will reduce system
overhead.

Again, using the above analysis, consider ways to reduce the longer times.
This would most likely entail using mutex locks to protect the regions, thus
blocking only code that needed the same function, rather than the whole
processor. These mutex locks would be based on priority-inheriting binary
semaphores. (Priority-inheriting semaphores are designed such that a task
holding such a semaphore runs at, or above, the priority of the highest
priority task waiting for the semaphore. This prevents a low priority task
that holds a semaphore wanted by a higher priority task from being kept off
the processor by tasks of intermediate priority.)
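
The priority-inheritance rule in the last item can be sketched as below. This
is a simplified model under stated assumptions, not a kernel implementation:
the struct and function names are invented for the example, larger numbers
mean higher priority, a task holds at most one semaphore, and real blocking
and wake-up are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task: larger priority value = more urgent. */
struct task {
    int base_prio;   /* priority the task normally runs at      */
    int eff_prio;    /* priority currently used by the scheduler */
};

/* Hypothetical priority-inheriting binary semaphore. */
struct pi_sem {
    struct task *owner;   /* NULL when the semaphore is free */
};

/* Try to take the semaphore. Returns 1 on success; returns 0 when the
 * caller would have to block, after boosting the owner if needed. */
static int pi_sem_acquire(struct pi_sem *s, struct task *t)
{
    if (s->owner == NULL) {
        s->owner = t;
        return 1;
    }
    /* Owner inherits the priority of its highest priority waiter. */
    if (t->eff_prio > s->owner->eff_prio)
        s->owner->eff_prio = t->eff_prio;
    return 0;
}

/* Release the semaphore and drop any inherited priority. */
static void pi_sem_release(struct pi_sem *s)
{
    assert(s->owner != NULL);
    s->owner->eff_prio = s->owner->base_prio;
    s->owner = NULL;
}
```

With a low priority owner (priority 1) and a waiter of priority 10, the owner
runs at priority 10 until it releases the semaphore, so tasks of priority 2
through 9 cannot delay the release.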

Over the long term, MontaVista is investigating whether preemption locks can
be eliminated (or at least greatly reduced in number) by protecting all the
short-duration critical regions with spinlocks that also disable interrupts on
the local CPU, and the long-duration critical regions with mutex locks. ("Long
duration" means much greater than the time taken by two context switches.)
This will reduce the overhead of the preemptable kernel, since there will no
longer be any need to test for preemption ("polling for preemption") at the
end of a preemption-locked region (which could happen tens of thousands of
times per second on an average system). Instead, preemption would happen
automatically as part of the interrupt servicing that causes a higher-priority
process to become runnable ("event-driven preemption"). Typically, this only
happens a few times to a few tens of times per second with an average system
workload, making the "event-driven preemption" model much more efficient than
the "polling for preemption" model. This method also has an added efficiency
in that the system will take advantage of the cache disruption caused by the
interrupt (which is unavoidable) to continue with the preemption.

--

※ Source: 哈工大紫丁香 http://bbs.hit.edu.cn [FROM: 202.118.239.146]
