Embedded board (Digest area)
Posted by: Zinux (Linux technician), Board: Embedded_system
Subject: Porting Device Drivers To Linux 2.2, part II
Posted at: HIT Lilac BBS (Friday, October 26, 2001, 18:27:19), internal post
Linux Magazine (http://www.linux-mag.com) June 1999
Copyright Linux Magazine © 1999
GEARHEADS
Porting Device Drivers To Linux 2.2: part II
by Alan Cox
If you followed last issue's "Gearheads" column, all of your block and
character devices should be running under Linux 2.2, albeit possibly
with warnings about obsolete PCI interfaces. In this article, I will
finish up with some of the smaller changes that may catch a driver
author, cover networking, and then look at the new PCI layer.
Figure 1
if (skb->protocol == htons(ETH_P_MYPROTO))
{
    /* The card requires we mask the addresses for this */
    u8 v;
    skb = skb_cow(skb, 0);
    if (skb == NULL)
        return 1; /* Whoops no memory */
    skb->data[4] &= 0x7F;
    skb->data[5] &= 0x7F;
}
I'll start with the small stuff, since that is nice and easy. The
first of these is signal handling. Linux 2.2 has more signals as well
as POSIX real-time signal queues. This fact changes the driver code to
determine whether a process has received a signal.
In Linux 2.0, drivers check for signals directly. This is done with
code such as:
    if (current->signal & ~current->blocked)
        return -EINTR;
which ensures that pressing Ctrl-C on the terminal will return the
EINTR error code from the device driver function.
Linux 2.2 replaces this with a function which hides the implementation
of signals, which also means that we can avoid changing drivers again
in the future. The above code now becomes:
    if (signal_pending(current))
        return -EINTR;
which is much cleaner.
The second, related issue is timeouts. Linux provides device drivers
with several ways to handle timeouts. The normal mechanism is to use
the add_timer() and del_timer() functions. It is also possible to
sleep on a wait queue or reschedule with a timeout.
In Linux 2.0, the code for rescheduling with a timeout (essentially,
to cause the process to sleep for a certain delay within the kernel)
was:
    current->state = TASK_INTERRUPTIBLE;
    current->timeout = jiffies + MY_DELAY;
    schedule();
In Linux 2.2, this becomes:
    current->state = TASK_INTERRUPTIBLE;
    schedule_timeout(MY_DELAY);
The same pattern is followed for sleeping on a wait queue. This is
done quite simply with:
    interruptible_sleep_on_timeout(wait_queue, MY_DELAY);
These changes to the timeout functions are done to improve scheduler
performance. Instead of the scheduling code spending time managing
processes that are almost never running, the scheduler and the timer
handling are now split.
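One practical consequence when porting: the 2.0 code stored an absolute wake-up time (jiffies + MY_DELAY), while the 2.2 calls take a relative delay. If an existing driver keeps an absolute deadline around, it has to be converted. The helper below is a userspace sketch of that conversion; deadline_to_delay() is a hypothetical name, not a kernel function, and the tick counter is passed in as a plain argument instead of reading the kernel's jiffies.

```c
#include <assert.h>
#include <limits.h>

/*
 * Convert a 2.0-style absolute deadline (a tick count) into the
 * relative delay that a 2.2-style timeout call expects.
 * Unsigned subtraction keeps the result correct when the tick
 * counter wraps around between "now" and the deadline.
 */
static long deadline_to_delay(unsigned long deadline, unsigned long now)
{
    unsigned long diff = deadline - now;   /* wraps modulo ULONG_MAX + 1 */

    if (diff > (unsigned long)LONG_MAX)    /* deadline is already in the past */
        return 0;
    return (long)diff;
}
```

A deadline that has already passed yields a delay of zero rather than a huge positive value.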
Porting Network Interfaces
Network driver functions have changed between Linux 2.0 and Linux 2.2.
The actual routines have changed little in terms of functionality, but
the calling conventions have changed considerably due to the addition
of extensive SMP support.
The most obvious difference is that the functions for freeing buffers
have changed. In order to avoid memory-accounting errors (and to make
the programmer's life easier) the network buffers now remember which
resources they are using and whether the buffer is in the sending or
receiving path. This means that the second FREE_READ or FREE_WRITE
argument to dev_kfree_skb() is a thing of the past.
Secondly, the buffers handed to a network driver belong solely to that
driver. This gives the driver (almost) total freedom to play with the
sk_buff structure which it is handed. It can't change skb->data
(which is shared) but it can play with the rest of the object. It is
best to avoid playing with the data anyway, as this potentially
requires a copy. However, you can use the skb_cow() function to obtain
a private copy of the buffer. If the buffer was already private it
simply hands back the buffer you gave it, taking virtually no time to
execute. Thus you might use something like Figure 1.
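The copy-on-write idea can be illustrated outside the kernel. The sketch below is a userspace model only: struct data_model, struct buf_model and the *_model functions are hypothetical names made up for this example, and the real skb_cow() works on sk_buff objects with different bookkeeping. What it does show is the behaviour described above: a private buffer is handed straight back, a shared one is copied before being modified.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_DATA_MAX 64

/* Hypothetical packet storage that several buffer handles may share. */
struct data_model {
    int refs;                            /* number of handles using this storage */
    unsigned char bytes[MODEL_DATA_MAX];
};

/* Hypothetical buffer handle, loosely playing the role of an sk_buff. */
struct buf_model {
    struct data_model *data;
};

/*
 * Return the handle unchanged if its storage is already private,
 * otherwise give it a private copy first.
 * Returns NULL if the copy cannot be allocated.
 */
static struct buf_model *buf_cow_model(struct buf_model *b)
{
    struct data_model *copy;

    assert(b != NULL && b->data != NULL);
    if (b->data->refs == 1)
        return b;                        /* already private: nothing to do */

    copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    *copy = *b->data;                    /* duplicate the shared bytes */
    copy->refs = 1;
    b->data->refs--;                     /* drop our claim on the shared storage */
    b->data = copy;
    return b;
}

/* Mirrors Figure 1: take a private copy, then mask two bytes. */
static int mask_addresses_model(struct buf_model *b)
{
    b = buf_cow_model(b);
    if (b == NULL)
        return 1;                        /* no memory */
    b->data->bytes[4] &= 0x7F;
    b->data->bytes[5] &= 0x7F;
    return 0;
}
```

With two handles sharing one block, masking through the first handle leaves the bytes seen through the second untouched.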
Another function provided to help device and protocol authors is
skb_realloc_headroom(). The kernel guarantees that when your driver is
passed a buffer, that buffer has at least dev->hard_header_len bytes
for the hardware headers. The ether_setup() function sets this to 14
(for example), which leaves space for the Ethernet header to fit.
Sometimes you get low-speed drivers that occasionally need a lot of
header space, and it's undesirable to allocate the entire header space
all of the time. In these cases you can use:
    skb = skb_realloc_headroom(skb, 128);
    if (skb == NULL)
        /* Whoops no memory */
        return 0;
to make a copy of the buffer if need be, which has at least 128 bytes
of space at the beginning. In general, you want to avoid this function
as copies impact performance. In some cases, such as tunnel devices,
you can never be sure how much header space is needed in advance
(e.g., as a tunnel may itself be tunneled). In these cases you set the
device header length to cover normal cases and bite the overhead on
the occasional unusual frame by using skb_realloc_headroom().
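The trade-off between header space and copying is easier to see with a small model. The following is a userspace sketch, not kernel code: struct pkt_model and all the *_model functions are hypothetical names, and the real sk_buff layout and the exact copying behaviour of skb_realloc_headroom() are not reproduced here. It only shows the idea described above: prepending a header is free when there is room in front of the packet, and costs an allocation plus a copy when there is not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer: [head ... data) is free header space. */
struct pkt_model {
    unsigned char *head;   /* start of the allocation */
    unsigned char *data;   /* first byte of the packet */
    size_t len;            /* packet length in bytes */
};

/* Allocate a packet with the given headroom and payload; 0 on success. */
static int pkt_init_model(struct pkt_model *p, size_t headroom,
                          const void *payload, size_t len)
{
    assert(p != NULL && payload != NULL);
    p->head = malloc(headroom + len + 1);   /* +1 so the size is never zero */
    if (p->head == NULL)
        return -1;
    p->data = p->head + headroom;
    p->len = len;
    memcpy(p->data, payload, len);
    return 0;
}

static size_t pkt_headroom_model(const struct pkt_model *p)
{
    return (size_t)(p->data - p->head);
}

/*
 * If there is already enough space in front of the packet do nothing,
 * otherwise copy the packet into a larger allocation.
 * Returns 0 on success, -1 if out of memory.
 */
static int pkt_ensure_headroom_model(struct pkt_model *p, size_t need)
{
    unsigned char *bigger;

    if (pkt_headroom_model(p) >= need)
        return 0;                            /* cheap path: no copy */
    bigger = malloc(need + p->len + 1);
    if (bigger == NULL)
        return -1;
    memcpy(bigger + need, p->data, p->len);  /* the expensive part */
    free(p->head);
    p->head = bigger;
    p->data = bigger + need;
    return 0;
}

/* Prepend n header bytes; returns the new packet start or NULL if no room. */
static unsigned char *pkt_push_model(struct pkt_model *p, size_t n)
{
    if (pkt_headroom_model(p) < n)
        return NULL;
    p->data -= n;
    p->len += n;
    return p->data;
}
```

In this model a packet created with 14 bytes of headroom cannot take a 128-byte header until pkt_ensure_headroom_model() has copied it into a larger buffer.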
Locking and SMP
The actual receive and transmit paths of most drivers are unchanged
between 2.0 and 2.2. There are real changes in the handling of ARP and
headers but these are invisible to most drivers as they use the
standard setup functions.
The interaction between receives and transmits has changed
considerably. In Linux 2.0, the SMP lock ensured that the transmit and
receive paths never ran in parallel. A receive interrupt might well
occur during a transmit, but the opposite was never true.
In 2.2, it is quite likely on a multiprocessor machine that the
transmit and receive paths will run at the same time. While this is
good for performance, it does mean that driver authors may need to
manage locks explicitly. Modern Ethernet controllers are sometimes
designed to make this easy, but not always.
Most drivers that need to do some locking use spin locks (discussed in
the previous issue). The simple changes applied to most drivers are:
* Adding a spin lock to the device private structure, as in Figure 2A.
* Initializing the lock when the device is probed (in device_probe):
lp->lock = SPIN_LOCK_UNLOCKED.
* Setting the lock in the transmit function, as in Figure 2B.
* Using the lock in the interrupt handler, as in Figure 2C.
Figure 2A
The 3c509 driver in 2.2 has the following structure:
struct el3_private {
    struct enet_statistics stats;
    struct device *next_dev;
    spinlock_t lock; /* The device lock */
    int head, size;
    struct sk_buff *queue[SKB_QUEUE_SIZE];
    char mca_slot;
};
Figure 2B
if (test_and_set_bit(0, (void *)&dev->tbusy) != 0)
    printk("%s: Transmitter access conflict.\n", dev->name);
else {
    spin_lock_irqsave(&lp->lock, flags);
    /* Transmit code */
    spin_unlock_irqrestore(&lp->lock, flags);
}
Figure 2C
lp = (struct el3_private *)dev->priv;
spin_lock(&lp->lock);
if (dev->interrupt)
    printk("%s: Re-entering the interrupt handler.\n",
           dev->name);
dev->interrupt = 1;
The above usage of locks enforces single threading between the
transmit and receive paths if required by the device. If you can avoid
such locking, it's best to do so, especially on devices capable of
full-duplex networking. Avoiding locks means you can be simultaneously
sending data on one processor and receiving data on another.
Two other functions which have not changed in themselves are, however,
tangled up in the locking. The first is the get_stats() function. This
is called whenever a user asks for statistics on the device -- for
example, through ifconfig or the /proc/net/dev file. It is quite
common for the statistics function to query the card itself -- often
the card maintains the counters rather than the driver. Therefore,
get_stats() may need to be locked against the transmit and receive
paths to prevent conflicts.
Figure 3
static struct enet_statistics *el3_get_stats(struct device *dev)
{
    struct el3_private *lp = (struct el3_private *)dev->priv;
    unsigned long flags;
    spin_lock_irqsave(&lp->lock, flags);
    update_stats(dev);
    spin_unlock_irqrestore(&lp->lock, flags);
    return &lp->stats;
}
The example in Figure 3 is from the 3c509 driver where the statistics
query cannot be done during a transmit or receive. Here, you can see
the statistics update function is guarded by the device spin lock
ensuring that all three of the statistics, transmit, and receive paths
are serialized -- that is, only one of the three is executed at any
given time.
Figure 4
static void set_multicast_list(struct device *dev)
{
    unsigned long flags;
    struct el3_private *lp = (struct el3_private *)dev->priv;
    int ioaddr = dev->base_addr;
    spin_lock_irqsave(&lp->lock, flags);
    if (dev->flags & IFF_PROMISC) {
        outw(SetRxFilter | RxStation | RxMulticast |
             RxBroadcast | RxProm, ioaddr + EL3_CMD);
    }
    else if (dev->mc_count || (dev->flags & IFF_ALLMULTI)) {
        outw(SetRxFilter | RxStation | RxMulticast | RxBroadcast,
             ioaddr + EL3_CMD);
    }
    else
        outw(SetRxFilter | RxStation | RxBroadcast,
             ioaddr + EL3_CMD);
    spin_unlock_irqrestore(&lp->lock, flags);
}
The final function that tends to get involved with SMP locking is the
multicast list update. This can be called from both a user process
updating its multicast listening list and also from the IPv6 network
layer. On some cards, updating the multicast list requires you to stop
transmit and receive, and perhaps prevent statistics querying. Again,
a spin lock can be used to ensure this. The example in Figure 4 is
from the 3c509 driver as well.
By now you are probably thinking that the kernel is out to get you. It
does, however, provide a set of sensible guarantees to eliminate most
headaches:
* An interrupt handler will not be re-entered while running. This
means you will not get two processors trying to receive packets at the
same time.
* The sending function is single threaded. The kernel will not pass
you any packets to send while you are executing your packet
transmission function. It will wait for you to return and then feed
you the next packet if you are ready for it.
Nevertheless, you do need to be aware of the fact that on a
four-processor machine you may be running a get_stats, a multicast
update, a receive and a transmit at the same time. The locks suggested
should get your driver working. Optimizing it beyond that really needs
an SMP machine and a lot of testing.
If your driver uses the common core drivers for things like the NS8390
(8390.o), the core driver modules handle SMP locking. In the case of
the 8390 driver this is very good news for driver authors as the chip
was not designed for SMP use. In fact, at times it appears to have
been designed to prevent SMP use, mostly due to its age!
Header Handling
Header caches and ARP handling have changed significantly since Linux
2.0. ARP is a protocol used by many networking layers to discover
other IP hosts. Physical networks such as Ethernet use their own
addressing scheme and it is thus necessary to map an IP address to an
Ethernet address before sending any packets. ARP solves this through
the simple approach of broadcasting messages such as, "Whoever has
1.2.3.4, own up and tell 1.2.3.10 your Ethernet address". The
results are then cached by the kernel. You can inspect this cache
through /proc/net/arp.
Because most drivers use an existing protocol layer for their physical
headers, the header cache and ARP changes are not issues for most
driver authors. The Ethernet, FDDI and token ring setup
functions (init_etherdev(), init_trdev(), etc.) already cover the
changes.
If you do need to touch these layers, all you probably need to know is
that while the build_header() function behaves as it did in Linux 2.0,
the rebuild() function has changed. Previously this function passed a
whole series of mostly unnecessary parameters to the driver. Now it
passes only the buffer. This makes sense because the other fields you
need are the device and the data pointers, which can be obtained from
the buffer itself.
Figure 5A
int eth_rebuild_header(void *buff, struct device *dev,
                       unsigned long dst, struct sk_buff *skb)
{
    struct ethhdr *eth = (struct ethhdr *)buff;
    /* ... */
}
Figure 5B
int eth_rebuild_header(struct sk_buff *skb)
{
    struct ethhdr *eth = (struct ethhdr *)skb->data;
    struct device *dev = skb->dev;
    /* ... */
}
Thus, Figure 5A becomes like Figure 5B, which takes the other
parameters from the packet itself.
If you look at the Ethernet layer as a good example
(net/ethernet/eth.c) you will see that the kernel ARP functions have
also been cleaned up in the same way. The only arguments now passed
around are:
    arp_find(u8 *where, struct sk_buff *skb)
where skb is the buffer we are trying to complete an ARP query for,
and where is the place within that buffer to put the answer.
Final Cleanup
The last small piece that has changed with network drivers is the
statistics structure. Previously called struct enet_statistics, this
structure is now called struct net_device_stats to reflect its more
generic nature. Using the old name is fine for now, but that may break
in 2.3.
Also, the stats structure now contains byte counts, so you will want
to go over your driver and add code to update tx_bytes and rx_bytes
when you update tx_packets and rx_packets. These extra byte counters
are needed for accurate SNMP
network monitoring of Linux boxes.
The Linux 2.2 PCI Layer
Now it's time to look at tidying up the PCI usage in 2.2 drivers.
The PCI code in 2.2 changed for a good reason. In Linux 2.0, PCI
basically meant x86, or to a limited extent, Alpha. Only the Intel x86
has the PCI BIOS interface provided by the kernel. With Linux 2.2, you
can be using PCI devices on numerous platforms, including some bus
layouts that the PCI BIOS does not support.
Therefore, the kernel provides an abstract PCI layer that is built on
top of architecture-dependent code. On the x86 this includes both
direct PCI and PCI BIOS access. On other platforms (such as the
PowerPC) this is done by talking to the boot ROMs and directly to the
PCI bus.
Linux 2.2 builds a list of PCI devices at boot time. Each entry is a
struct pci_dev, which contains PCI configuration information about the
device.
The PCI bus functions take a struct pci_dev pointer as an argument,
enabling strange bus architectures to be hidden from the device
driver. To a driver, a PCI device is almost a platform-independent
object.
Under Linux 2.0 a program using PCI would use
    if (!pcibios_present())
        return -ENODEV;
to check if the PCI services existed. On Linux 2.2, this becomes
    if (!pci_present())
        return -ENODEV;
After this, you scan the bus looking for your device. A PCI device has
a vendor and device identifier that are unique for each different type
of card.
Figure 6A
unsigned char bus, devfn;
int index = 0;
while (!(pcibios_find_device(MY_PCI_VENDOR, MY_PCI_DEVICE, index++,
                             &bus, &devfn)))
{
    /* Check this device */
    /* ... */
}
Figure 6B
struct pci_dev *pdev = NULL;
while ((pdev = pci_find_device(MY_PCI_VENDOR, MY_PCI_DEVICE, pdev))
       != NULL)
{
    /* Check this device */
    /* ... */
}
Under Linux 2.0, drivers would use something like Figure 6A to walk
systematically through all matching cards. In Linux 2.2 the equivalent
code looks like Figure 6B.
As you can see, Linux 2.0 uses a counter to walk through the device
list and refers to devices by their bus and device-function
identifiers, which is how PCI is addressed at the device level. Linux
2.2 uses the struct pci_dev instead, which hides all sorts of
mysteries and sins that may be in the underlying PCI architecture.
The initial assignment of
struct pci_dev *pdev = NULL;
is done because NULL means "start from the beginning" when passed as
the third argument to pci_find_device().
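The "pass back the previous result to get the next one" convention can be tried out in userspace. The following is a sketch only: struct pdev_model and find_device_model() are hypothetical names, the list is passed in explicitly rather than being a list the kernel built at boot, and the ID values used with it are arbitrary. It demonstrates why starting from NULL and feeding each result back in visits every matching device exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device entry in a singly linked list. */
struct pdev_model {
    unsigned short vendor;
    unsigned short device;
    struct pdev_model *next;
};

/*
 * Return the first matching entry after "from", or the first matching
 * entry in the whole list when "from" is NULL. Returns NULL when there
 * are no more matches.
 */
static struct pdev_model *find_device_model(struct pdev_model *list,
                                            unsigned short vendor,
                                            unsigned short device,
                                            struct pdev_model *from)
{
    struct pdev_model *p = (from != NULL) ? from->next : list;

    for (; p != NULL; p = p->next)
        if (p->vendor == vendor && p->device == device)
            return p;
    return NULL;
}
```

Because the cursor is the device entry itself, the caller needs no separate index variable, unlike the counter-based loop in Figure 6A.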
Once you have a handle on your PCI device you have access to its
memory, I/O and IRQ assignment. In PCI, these are encoded in what are
known as the Base Address Registers (BAR registers). Each of these may
be used to hold either an I/O or memory address as well as its
properties.
Figure 7
struct pci_dev *pdev;
/* ... */
membase = (pdev->base_address[0] & PCI_BASE_ADDRESS_MEM_MASK);
mydev->mem = ioremap(membase, MY_DEVICE_SIZE);
If your card manual says "base address register 0 specifies the memory
address for the card", you would use something like Figure 7.
For I/O space (rather than memory space), PCI_BASE_ADDRESS_IO_MASK is
used instead of PCI_BASE_ADDRESS_MEM_MASK.
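The masks exist because the low bits of a base address register are flags rather than address bits. The userspace sketch below decodes a 32-bit BAR value by hand. The BAR_* constants are defined locally with the values the PCI header layout uses for the space indicator bit and the two masks; the function names are hypothetical, and 64-bit memory BARs are deliberately ignored to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define BAR_SPACE_IO  0x01u      /* bit 0 set: register describes I/O space */
#define BAR_MEM_MASK  (~0x0Fu)   /* memory BAR: low four bits are flags */
#define BAR_IO_MASK   (~0x03u)   /* I/O BAR: low two bits are flags */

/* True if the register describes an I/O port range. */
static int bar_is_io_model(uint32_t bar)
{
    return (bar & BAR_SPACE_IO) != 0;
}

/* Strip the flag bits and return the base address itself. */
static uint32_t bar_address_model(uint32_t bar)
{
    if (bar_is_io_model(bar))
        return bar & BAR_IO_MASK;
    return bar & BAR_MEM_MASK;
}
```

Forgetting the mask leaves the flag bits in the address, which is an easy mistake to make because many memory BARs happen to have all flag bits clear.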
The interrupt line for the card is found in pdev->irq. Each card has
only one interrupt but this may be shared between devices on the card
and between cards. The interrupt will have been assigned for you at
boot time, either by the BIOS or boot ROM, or by the kernel itself.
PCI devices which are capable of generating bus read/write requests
themselves (say, to access host memory or another PCI device) are
called "bus masters". If a card is bus-mastering, it is up to the
driver to set the bus master flag in the PCI configuration register of
the board. This is such a common operation that the function
pci_set_master(pdev) is provided to do this.
From the above information you can find and map both memory and I/O
spaces on a PCI card. If you've ever read a PCI card manual or looked
at Linux 2.0 PCI code you will see there is a third PCI address space
-- the "configuration space". It contains a mix of vendor-specific and
standard registers that can be read and sometimes written.
The registers holding the vendor and device ID, which are used to find
your card on the bus, are examples of configuration space registers.
Other examples are the BAR registers (which themselves
point to memory or I/O space).
Figure 8
error = pci_read_config_byte(struct pci_dev *, u8 where, u8 *val);
error = pci_read_config_word(struct pci_dev *, u8 where, u16 *val);
error = pci_read_config_dword(struct pci_dev *, u8 where, u32 *val);
error = pci_write_config_byte(struct pci_dev *, u8 where, u8 val);
error = pci_write_config_word(struct pci_dev *, u8 where, u16 val);
error = pci_write_config_dword(struct pci_dev *, u8 where, u32 val);
Linux 2.2 provides functions to read and write byte, word (16-bit) and
dword (32-bit) values in the PCI configuration space. These have a
straightforward mapping to the Linux 2.0 PCI BIOS functions. They can
be seen in Figure 8.
Here, where is the address (from 0-255) in the configuration space to
access, and val is the value to read or write. In Linux 2.0, these
functions looked like:
    error = pcibios_read_config_word(u8 bus, u8 devfn,
                                     u8 where, u16 *value);
and so forth. This makes porting PCI configuration handling from
Linux 2.0 to 2.2 fairly painless.
You may notice that in Linux 2.0, some drivers used the PCI BIOS
functions directly in ways that you now want to avoid. Directly
accessing the PCI configuration space (e.g., for reading IRQs, BAR
registers, and setting the bus master flag) was required in version
2.0.
In Linux 2.2, however, it may be the case that the configuration space
values don't match those found in the struct pci_dev structure. This is
because the kernel knows about things such as interrupt re-mapping on
non-x86 hardware and has been quietly fiddling with these values
behind your back. In short, you should always use the values in
struct pci_dev, and not probe the PCI configuration space directly if
you can help it.