Posted by: Zinux (Linux技工), Board: Embedded_system
Subject: Writing a Module for netfilter
Posted from: HIT Lilac BBS (Friday, October 26, 2001, 18:24:27), on-site post

Linux Magazine (http://www.linux-mag.com) June 2000

Copyright Linux Magazine ©2000

GEARHEADS ONLY
Writing a Module for netfilter
by Paul "Rusty" Russell

Figure One: The netfilter architecture allows you to hook into a
protocol stack at several points.
With Linux 2.4 right around the corner, now is a very good time to
discuss the new packet observation and filtering mechanism, called
netfilter, that was introduced during the 2.3 kernel development. I
discussed the netfilter architecture briefly back in my Best Defense
column in October 1999
(http://www.linux-mag.com/1999-10/bestdefense_01.html), and more
thoroughly in the January 2000 issue of Linux Magazine.

netfilter is a framework inside the kernel that allows a module to
observe and modify packets as they pass through the IP stack. Since I
wrote that article in January, netfilter hooks similar to those
described here for IPv4 have been added to the IPv6 (the next
generation of IP) and DECnet (a more obscure protocol) layers.

Inside the kernel you will see calls such as the following throughout
the protocol code (this is from ip_local_deliver() in
net/ipv4/ip_input.c):

return NF_HOOK(PF_INET, NF_IP_LOCAL_IN,
               skb, skb->dev, NULL,
               ip_local_deliver_finish);

NF_HOOK is a macro that calls any registered netfilter hooks for the
given protocol (PF_INET) and hook (NF_IP_LOCAL_IN), with the given
packet (skb). It also handles information on the incoming and outgoing
devices (skb->dev and NULL, respectively). Once everyone registered
to listen on that hook has returned NF_ACCEPT, the function specified
by the last argument is called to continue packet traversal
(ip_local_deliver_finish). If a hook returns NF_DROP, the packet is
freed, and the function is never called.
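The traversal just described can be modelled in ordinary userspace C. This is only a sketch of the control flow, not kernel code: the `model_` names, the packet structure, and the -1 return value for a dropped packet are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Verdicts named after the kernel's; the numeric values are illustrative. */
enum model_verdict { MODEL_DROP = 0, MODEL_ACCEPT = 1 };

/* Hypothetical stand-in for an skbuff. */
struct model_pkt {
    int len;        /* packet length in bytes */
    int freed;      /* set when the packet is dropped */
    int delivered;  /* set when the continuation function ran */
};

typedef enum model_verdict (*model_hookfn)(struct model_pkt *pkt);
typedef int (*model_okfn)(struct model_pkt *pkt);

/*
 * Call every hook in order. Free the packet and stop at the first DROP;
 * call okfn only if every hook accepted. Returns okfn's result, or -1
 * (an arbitrary choice for this model) when the packet was dropped.
 */
int model_nf_hook(model_hookfn *hooks, size_t nhooks,
                  struct model_pkt *pkt, model_okfn okfn)
{
    size_t i;

    assert(pkt != NULL && okfn != NULL);
    for (i = 0; i < nhooks; i++) {
        if (hooks[i](pkt) == MODEL_DROP) {
            pkt->freed = 1;   /* stands in for freeing the skbuff */
            return -1;
        }
    }
    return okfn(pkt);
}

/* Sample hooks and a sample continuation function. */
enum model_verdict model_accept_all(struct model_pkt *pkt)
{
    (void)pkt;
    return MODEL_ACCEPT;
}

enum model_verdict model_drop_len_200(struct model_pkt *pkt)
{
    return pkt->len == 200 ? MODEL_DROP : MODEL_ACCEPT;
}

int model_deliver_finish(struct model_pkt *pkt)
{
    pkt->delivered = 1;
    return 0;
}
```

With both sample hooks in the array, an 84-byte packet reaches model_deliver_finish(), while a 200-byte packet is marked freed and the continuation function never runs.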

If CONFIG_NETFILTER is set to n when the kernel is compiled, then the
above macro simply calls the final argument, which is declared inline
(a gcc extension taken from C++) so there is no overhead for that
case.
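The compile-time switch can be illustrated with a small userspace sketch. The macro and function names below (MODEL_NF_HOOK, MODEL_CONFIG_NETFILTER, cfg_hook_slow) are hypothetical stand-ins chosen for this example, not the kernel's definitions; the point is only that, with the option off, the macro expands to a plain call of its final argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet. */
struct cfg_pkt { int delivered; };

int cfg_slow_calls = 0;   /* counts trips through the hook-walking path */

/* Stand-in for the path that would walk registered hooks before okfn. */
int cfg_hook_slow(struct cfg_pkt *pkt, int (*okfn)(struct cfg_pkt *))
{
    assert(pkt != NULL && okfn != NULL);
    cfg_slow_calls++;
    return okfn(pkt);
}

/*
 * Define MODEL_CONFIG_NETFILTER before this point to model a build with
 * hooks enabled. Without it, the macro is just a call to okfn.
 */
#ifdef MODEL_CONFIG_NETFILTER
#define MODEL_NF_HOOK(pkt, okfn) cfg_hook_slow((pkt), (okfn))
#else
#define MODEL_NF_HOOK(pkt, okfn) (okfn)(pkt)
#endif

/* Sample continuation function. */
int cfg_deliver_finish(struct cfg_pkt *pkt)
{
    pkt->delivered = 1;
    return 0;
}
```

Built as shown, without MODEL_CONFIG_NETFILTER, a call to MODEL_NF_HOOK(&pkt, cfg_deliver_finish) never touches cfg_hook_slow().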

Where to put these NF_HOOK calls in your protocol stack is of fairly
limited interest (there are only about a dozen protocols in the Linux
kernel), but of more interest is the other side of the framework: How
do you register to listen for packets at a certain point? Many people
have specialized packet watching or mangling needs, so I'll explain
what they can expect.

First, you have to decide what protocol you wish to hook into.
netfilter divides up hooks on a per-protocol basis: there is no way to
hook into all packets at once, for example. Usually this will be IP
(protocol PF_INET inside the kernel).

Each protocol defines a number of points you can hook into. IPv4
defines five points, and the other protocols have so far followed the
model shown in Figure One (although DECnet added some new ones).

As you can see in the figure, a hook can observe all valid incoming
packets by registering at NF_IP_PRE_ROUTING. If you only want to
observe packets destined for this IP address, you can do that by
hooking into NF_IP_LOCAL_IN, and locally generated packets at
NF_IP_LOCAL_OUT. Packets being forwarded through the machine will hit
the NF_IP_FORWARD hook, and immediately before IP packets are
transmitted they will pass through the NF_IP_POST_ROUTING hook.

Since many hooks can be registered at the same point, a priority
must be assigned to each hook to determine the order in which they are
executed. Hooks with a lower priority number are called first.

For IPv4, linux/netfilter_ipv4.h has an enumerated type that offers
some standard values. Traditionally, 0 is for packet filtering, so
negative numbers are used for executing hooks before filtering, and
positive numbers for after filtering.
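One way to picture the ordering rule is a list kept sorted by priority as hooks register. The sketch below is a userspace model under that assumption; the structure and function names are invented for illustration and say nothing about how the kernel stores its hooks.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_HOOKS 8

/* Hypothetical record for one registered hook. */
struct prio_hook {
    const char *name;
    int priority;          /* lower numbers run first */
};

struct prio_chain {
    struct prio_hook hooks[MODEL_MAX_HOOKS];
    size_t count;
};

/*
 * Insert a hook so the array stays sorted by ascending priority.
 * Hooks with equal priority keep their registration order.
 * Returns 0 on success, -1 if the chain is full.
 */
int prio_register(struct prio_chain *c, const char *name, int priority)
{
    size_t pos = 0;
    size_t i;

    assert(c != NULL);
    if (c->count >= MODEL_MAX_HOOKS)
        return -1;

    /* Find the first slot whose priority is strictly greater. */
    while (pos < c->count && c->hooks[pos].priority <= priority)
        pos++;

    /* Shift later entries up by one to make room. */
    for (i = c->count; i > pos; i--)
        c->hooks[i] = c->hooks[i - 1];

    c->hooks[pos].name = name;
    c->hooks[pos].priority = priority;
    c->count++;
    return 0;
}
```

Registering hooks with priorities 0, 100, and -1 in that order leaves them stored as -1, 0, 100, so a hook registered just below the filtering priority runs before the filter.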

To register a hook, you fill in an nf_hook_ops structure with the
priority, hook point, and a pointer to your hook function, and call
nf_register_hook(). In keeping with kernel tradition, this function
returns 0 for success, and a negative error number for failure. A good
example to look at is Jamal Salim's ingress filtering in
net/sched/sch_ingress.c, which uses a single netfilter hook, or the
more complex examples in the net/ipv4/netfilter/ directory.

A Silly Example

For the purposes of this article we're going to work a little bit on
the demonstration-only linuxmag.o kernel module. This tiny module will
corrupt locally generated IP packets that are of length 100, and drop
packets that are of length 200. First, we define the nf_hook_ops
structure:

static struct nf_hook_ops linuxmag_ops
= { { NULL, NULL }, linuxmag_hook,
   PF_INET, NF_IP_LOCAL_OUT,
   NF_IP_PRI_FILTER-1 };

The first element in the structure ({ NULL, NULL},) is a
doubly-linked-list element, which is used internally. The second is
the function to call (which in this case is the linuxmag_hook
function). Following that is the protocol (PF_INET), the hook point
(NF_IP_LOCAL_OUT) for locally generated packets, and the priority
(just before packet filtering).

All we need to do now is write the function that does the actual work
(see Listing One).

Listing One: The linuxmag_hook Function

static unsigned int
linuxmag_hook(unsigned int hook, struct sk_buff **pskb,
              const struct net_device *indev,
              const struct net_device *outdev,
              int (*okfn)(struct sk_buff *))
{
   /* Get a handle to the packet data */
   unsigned char *data = (void *)(*pskb)->nh.iph +
                         (*pskb)->nh.iph->ihl*4;

   (*pskb)->nfcache |= NFC_UNKNOWN;

   switch ((*pskb)->len) {
   case 100:
      printk("linuxmag: corrupting packet\n");
      /* Last byte of the packet; data starts after the IP header */
      data[99 - (*pskb)->nh.iph->ihl*4]++;
      (*pskb)->nfcache |= NFC_ALTERED;
      return NF_ACCEPT;

   case 200:
      printk("linuxmag: dropping packet\n");
      return NF_DROP;

   default:
      return NF_ACCEPT;
   }
}

We can see that the hook function takes five arguments:

1. The Hook. This will always be NF_IP_LOCAL_OUT in this module, as
that is the only place we register this function.

2. A Pointer to a Pointer to the skbuff. This represents the packet.
We will use the double-pointer so that we can replace the entire
packet with another one if that becomes necessary.

3. A Pointer to the Input Device. This is set to NULL for the
NF_IP_LOCAL_OUT hook.

4. A Pointer to the Output Device. For the NF_IP_LOCAL_OUT hook, this
is set to the interface the packet is heading out on.

5. A Pointer to the Function that Will be Called if All the Hooks are
Successful. This should never be called directly, except for special
effects (it is a hack for modules that need to fragment packets).

In this function, we only care about the packet itself, so we use only
the pskb parameter. The first thing we do is obtain a pointer to the
packet data, which starts just after the IP header. We know the header
field (nh.iph) is valid, because we registered this as a PF_INET hook,
so we will only ever be passed IP packets.

The second thing we do is a little tricky. Each skbuff has a field
that should identify which skbuff fields were examined by a hook.
Values for this are given in include/linux/netfilter_ipv4.h.
For example, if a module examined the source IP address, we would set
the NFC_IP_SRC bit in the nfcache field. In the future this field
could be used to cache the decisions made by modules. There is no
field for packet length, so we set the NFC_UNKNOWN bit, which means "I
looked at something that the framework doesn't understand, so make
sure I get every packet."
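The nfcache field is an ordinary bitmask, which a short userspace sketch can make concrete. The `MODEL_` bit values and the structure below are assumptions for illustration only; the real definitions are in the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values; consult the kernel headers for the real ones. */
#define MODEL_NFC_IP_SRC   0x0001u   /* hook examined the source address */
#define MODEL_NFC_UNKNOWN  0x4000u   /* hook examined something else */
#define MODEL_NFC_ALTERED  0x8000u   /* hook changed the packet */

/* Hypothetical stand-in for the relevant skbuff fields. */
struct cache_skb {
    unsigned int len;
    unsigned int nfcache;
};

/* Record that a hook based its decision on a field with no bit of its own. */
void cache_mark_unknown(struct cache_skb *skb)
{
    assert(skb != NULL);
    skb->nfcache |= MODEL_NFC_UNKNOWN;
}

/* Record that a hook modified the packet contents. */
void cache_mark_altered(struct cache_skb *skb)
{
    assert(skb != NULL);
    skb->nfcache |= MODEL_NFC_ALTERED;
}

/* Test a single bit without disturbing the others. */
int cache_has(const struct cache_skb *skb, unsigned int bit)
{
    return (skb->nfcache & bit) != 0;
}
```

Because each hook only ORs bits in, marks left by hooks that ran earlier are preserved.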

Next, we decide what to do based on packet length. If the packet
length is 100, we increment the last byte. Because we altered the
packet, we must mark it altered by setting the NFC_ALTERED bit. This
is particularly important for the NF_IP_LOCAL_OUT hook, which needs to
look up the route on the packet again in case we were to change the
way routing should be done. We then return NF_ACCEPT, which means to
let the packet through.

If the length is 200, we simply return NF_DROP, which means the packet
should be dropped. Otherwise, the packet passes unscathed, by
returning NF_ACCEPT.

Polishing Our Example

We need very little else to turn these two code fragments into a
complete kernel module. At the top of the code, we need the headers
and a comment:

/* Example kernel module for Linux Magazine. */
#include <linux/config.h>
#include <linux/module.h>
#include <linux/netfilter_ipv4.h>
#include <linux/ip.h>

Following this comes the linuxmag_hook function, then the linuxmag_ops
structure, then finally the glue needed to turn it into a module:

static int __init init(void)
{
  return nf_register_hook(&linuxmag_ops);
}

static void __exit fini(void)
{
  nf_unregister_hook(&linuxmag_ops);
}

module_init(init);
module_exit(fini);

So now we have a complete kernel module: the init function registers
our hook function (returning a negative error code if it fails) and
the fini function unregisters it. Then we only need to use the
module_init and module_exit macros to tell the kernel that these are
our module initialization and cleanup functions. The __init and __exit
keywords matter if this is built into the kernel: they mean that the
init function will be discarded after boot, freeing memory, and that
the fini function will never be needed at all, and hence should not be
included in the kernel image.

Testing Our Example

Let's look at what happens when we install our module and test it
using the ping program:

# insmod ./linuxmag.o
# ping -c1 linuxcare.com.au
PING linuxcare.com.au (203.29.91.49): 56
   data bytes
64 bytes from 203.29.91.49: icmp_seq=0
   ttl=249 time=204.0 ms

--- linuxcare.com.au ping statistics ---
1 packets transmitted, 1 packets received,
   0% packet loss

Now let's send a packet of length 200 (which means we must use the
ping option -s172, since there are 20 bytes for the IP header, and 8
for the ICMP header):

# ping -c1 -s172 linuxcare.com.au
PING linuxcare.com.au (203.29.91.49): 172
   data bytes
ping: sendto: Operation not permitted
ping: wrote linuxcare.com.au 180 chars, ret=-1

--- linuxcare.com.au ping statistics ---
1 packets transmitted, 0 packets received,
   100% packet loss

And from dmesg we can see:

# dmesg -c
linuxmag: dropping packet

A packet of length 100 is corrupted (the ICMP checksum will be
incorrect after we've modified it), and so we will receive no reply:

# ping -c1 -s72 linuxcare.com.au
PING linuxcare.com.au (203.29.91.49):
   72 data bytes

--- linuxcare.com.au ping statistics ---
1 packets transmitted, 0 packets received,
   100% packet loss

And once again dmesg shows our little message:

# dmesg -c
linuxmag: corrupting packet

If you were to do a tcpdump on a remote machine, you would see the
modified packet on the wire.
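The -s values used in the tests above come from subtracting the header sizes from the desired packet length. As a small worked sketch (the function name is made up for this example, and it assumes an IPv4 header without options):

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HEADER_LEN 20   /* IPv4 header without options */
#define ICMP_HEADER_LEN  8   /* ICMP echo header */

/*
 * Return the ping -s payload size that yields an IP packet of
 * total_len bytes, or -1 if total_len cannot even hold the headers.
 */
int ping_payload_for_total(int total_len)
{
    int payload;

    assert(total_len >= 0);   /* a length is never negative */
    payload = total_len - IPV4_HEADER_LEN - ICMP_HEADER_LEN;
    return payload >= 0 ? payload : -1;
}
```

For a 200-byte packet this gives 172, for a 100-byte packet 72, and the default 56-byte payload corresponds to an 84-byte packet, which matches neither case in the hook.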

Beyond Our Example

Hook functions can return things other than NF_ACCEPT and NF_DROP.
You can return NF_STOLEN, which means "I've taken control of the
packet, so don't refer to it again." This is different from NF_DROP,
which tries to free the packet using kfree_skb(). You can also return
NF_REPEAT, which is like NF_ACCEPT, but calls this hook function
again, rather than moving on to the next one.
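How these extra verdicts change the walk over the hooks can be sketched in userspace C. Everything here (the `V_` and `R_` enums, the packet structure, the sample hooks) is invented for illustration and only mirrors the behaviour described above; NF_QUEUE is left out.

```c
#include <assert.h>
#include <stddef.h>

enum v_verdict { V_DROP, V_ACCEPT, V_STOLEN, V_REPEAT };
enum v_result  { R_CONTINUE, R_FREED, R_GONE };

/* Hypothetical packet used only to observe what the hooks did. */
struct v_pkt {
    int repeats_left;   /* how many more times the sample hook asks to repeat */
    int visits;         /* total number of hook invocations */
};

typedef enum v_verdict (*v_hookfn)(struct v_pkt *pkt);

/* Walk the hooks, acting on each verdict in turn. */
enum v_result v_run_hooks(v_hookfn *hooks, size_t n, struct v_pkt *pkt)
{
    size_t i = 0;

    assert(pkt != NULL);
    while (i < n) {
        switch (hooks[i](pkt)) {
        case V_ACCEPT:
            i++;              /* move on to the next hook */
            break;
        case V_REPEAT:
            break;            /* call the same hook again */
        case V_STOLEN:
            return R_GONE;    /* the hook owns the packet now; do not touch it */
        case V_DROP:
        default:
            return R_FREED;   /* the framework frees the packet */
        }
    }
    return R_CONTINUE;        /* every hook accepted */
}

/* Sample hook: asks to be repeated a few times, then accepts. */
enum v_verdict v_repeat_then_accept(struct v_pkt *pkt)
{
    pkt->visits++;
    if (pkt->repeats_left > 0) {
        pkt->repeats_left--;
        return V_REPEAT;
    }
    return V_ACCEPT;
}

/* Sample hook: takes ownership of the packet. */
enum v_verdict v_steal(struct v_pkt *pkt)
{
    pkt->visits++;
    return V_STOLEN;
}
```

A hook that returns the repeat verdict forever would keep this loop spinning, so a real hook needs some condition under which it eventually accepts, drops, or steals the packet.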

Finally, you can also return NF_QUEUE, which allows the packet to be
queued for asynchronous packet handling. If a handler is registered
(for IP, this is in net/ipv4/netfilter/ip_queue.c) then it will be
handed the packet, and then processing will finish. At some later
time, the packet will be reinjected, and processing will continue.

This is a very useful technique for dealing with packets in userspace,
where the kernel cannot wait while the processing is going on. In
fact, if ultra-high speed is not a requirement, you can do everything
you would do in the kernel in a simple userspace program, using James
Morris' libipq.

Where to Find Out More

As well as building on top of the netfilter framework directly, you
can use existing components that provide higher-level
functionality for IP (especially for packet filtering). You can find
details on all these in the netfilter-hacking-HOWTO, which is
available in my Unreliable Guides collection at
http://netfilter.kernelnotes.org/unreliable-guides.

The mailing list for serious kernel network development under Linux is
called netdev, and is hosted by SGI: netdev@oss.sgi.com. There is also
a netfilter mailing list, which is hosted by the SAMBA team and can be
found on one of the three netfilter mirrors:

* http://antarctica.penguincomputing.com/~netfilter/

* http://www.samba.org/netfilter/

* http://netfilter.kernelnotes.org/

The netfilter core team generally does not answer netfilter help
requests that are sent to them directly, so these resources are your
best starting point.

Happy hacking!

--

Paul "Rusty" Russell is the Linux kernel IP packet filter maintainer,
and gets to develop cool networky stuff for the Linux kernel. He can
be reached at paul.russell@rustcorp.com.au.
