Programming Board (Digest Section)
From: JJason (C++ Primer), Board: Programming
Subject: Fast UDP-Based Network Storage (repost)
Posted from: 哈工大紫丁香 BBS (Friday, November 22, 2002, 09:47:26), site-internal mail
The following is an article published in issue 12, 2002, of CUJ magazine.
///////////////////////////////////////////////////////////////////////
Fast UDP-Based Network Storage
Tim Kientzle
If network performance is critical, you can bypass that database of yours
with this clever approach.
-----------------------------------------------------------------------------
The relational database is the hands-down winner for general-purpose data
storage. But that generality makes it less than ideal for some applications.
In many cases, especially those where very high performance is critical, a
custom server will fit your requirements much better.
Over the last few years, I’ve been examining ways to manage session data for
clusters of web servers. This particular application needs high-performance,
low-latency storage of small pieces of data. After examining a number of
different approaches, I found that a combination of UDP-based transport and
memory-based storage gave exemplary performance in less than 200 lines of
straightforward C code.
In this article, I’ll explain why UDP is often a better fit than TCP for
performance-driven network applications. I’ll then share the techniques you’ll
need to build your own custom storage service. Full example source code is
available at <www.cuj.com/code>.
TCP’s Drawbacks
TCP is a natural fit for many services. A client opens a TCP connection to
the server, and the two then exchange commands and results over that
connection. The TCP connection itself defines a persistent session that the
server uses to associate various resources with a particular client.
TCP can be complex to use. Because connections can be long lived, TCP servers
must support multiple simultaneous connections. Many servers use a separate
thread or process for each connection, but this approach degrades under high
load due to task-switching overhead. Single-threaded servers reduce
task-switching overhead, but must use complex asynchronous I/O to juggle the
connections.
Latency is also a problem with TCP. TCP’s three-way handshake adds a
noticeable delay to each new connection. To reduce overhead, the operating
system deliberately delays short packets in the hope that more data will
follow, which adds further latency at the end of each request or response.
If most requests are short, that delay is effectively tacked onto every
request.
UDP to the Rescue
UDP is less well known than TCP. It provides a mechanism for programs to send
and receive single-packet messages over IP. Although it has some significant
limitations, UDP does manage to avoid the latency and connection-management
problems that dog TCP.
UDP clients send explicit packets; there is no connection to worry about and
no delay before a packet is sent. Similarly, the server does not need to
maintain knowledge of each client. The server simply receives a request,
builds a response packet, and fires it back. This makes UDP ideal for simple
request-response protocols such as DNS, NTP, or CLDAP.
UDP’s Limitations
UDP’s primary limitations are reliability and security. UDP provides no
transmission guarantees, which means that a message might not get to its
intended recipient. Over switched Ethernet, this is not much of a problem, as
lost packets are rare. However, robust clients do need to be prepared to
retry requests.
Security is a more subtle issue. UDP servers tend to be subject to spoofing
attacks, in which a request packet is sent with a forged source address. As a
result, UDP is most useful inside a network protected by a good firewall. Any
authentication information must be provided with each request, which can lead
to excessive overhead if strong cryptographic authentication is required. Of
course, if you’re choosing UDP for performance reasons, strong cryptographic
authentication is probably not a concern.
The UDP API
UDP is supported by the standard Unix sockets API. Because there are no
connections to deal with, only sending and receiving of packets, this portion
of the API is really quite simple.
The first step for the server is to create the socket using the socket call
and bind it to a specific port using the bind call, as shown in Listing 1.
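Listing 1 itself isn’t reproduced in this post, but the setup it describes amounts to something like the following sketch. The helper name, port argument, and error handling here are my own placeholders, not the article’s code.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Minimal sketch of the server setup: create a UDP socket and bind it
 * to a well-known port. */
int open_server_socket(unsigned short port)
{
    struct sockaddr_in addr;
    int fd;

    fd = socket(AF_INET, SOCK_DGRAM, 0);          /* SOCK_DGRAM selects UDP */
    if (fd < 0) {
        perror("socket");
        exit(1);
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);     /* accept packets on any interface */
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        exit(1);
    }
    return fd;
}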
The server then simply waits for a packet to arrive and responds when it
does. The sockets API provides a recvfrom call, which waits for an incoming
packet and places it into your buffer. The recvfrom call also provides you
with the address of the sender, which you can then use in a corresponding
sendto call to send a response. Listing 2 outlines this process.
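Listing 2 isn’t included here either; a rough sketch of that receive-and-respond loop might look like this, assuming the socket was created and bound as in the previous sketch and that a hypothetical handle_request() fills in the reply.

#include <netinet/in.h>
#include <sys/socket.h>

#define MAX_PACKET 1500   /* assumption: one Ethernet-sized datagram */

/* Hypothetical handler: parses 'req', writes a reply into 'resp', and
 * returns the reply length. */
extern int handle_request(const char *req, int reqlen, char *resp);

void serve_forever(int fd)
{
    char req[MAX_PACKET], resp[MAX_PACKET];
    struct sockaddr_in client;
    socklen_t client_len;
    int reqlen, resplen;

    for (;;) {
        client_len = sizeof(client);
        /* Block until a request packet arrives; recvfrom also reports
         * who sent it. */
        reqlen = recvfrom(fd, req, sizeof(req), 0,
                          (struct sockaddr *)&client, &client_len);
        if (reqlen < 0)
            continue;                 /* ignore errors, wait for the next packet */

        resplen = handle_request(req, reqlen, resp);

        /* Fire the response straight back at the sender's address. */
        sendto(fd, resp, resplen, 0,
               (struct sockaddr *)&client, client_len);
    }
}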
The client is only a little different. Because the client doesn’t care what
port it uses, the bind call is unnecessary. The client needs to build an addr
structure with the server address and then send the request using sendto.
A naive client would then wait for a response with recvfrom. Unfortunately,
this will simply hang if the response packet doesn’t arrive. As a practical
matter, it’s necessary to enforce some sort of timeout. The standard
technique on Unix is to use a select call to wait for data on the socket and
then only call recvfrom when and if data does arrive. Listing 3 provides a
rough outline of the client logic.
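As a sketch of the client logic Listing 3 outlines (the function name, timeout handling, and return conventions are assumptions of mine, and retry-on-timeout is left to the caller):

#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Send one request, then use select() to wait up to 'timeout_sec'
 * seconds for the response.  Returns the response length, or -1 on
 * timeout or error. */
int udp_transact(int fd, const struct sockaddr_in *server,
                 const char *req, int reqlen,
                 char *resp, int respmax, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv;

    /* No connection and no bind needed: just send the datagram. */
    if (sendto(fd, req, reqlen, 0,
               (const struct sockaddr *)server, sizeof(*server)) < 0)
        return -1;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* Only call recvfrom if select() says data actually arrived. */
    if (select(fd + 1, &readfds, NULL, NULL, &tv) <= 0)
        return -1;                    /* timed out or failed: caller may retry */

    return recvfrom(fd, resp, respmax, 0, NULL, NULL);
}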
Session Storage
The specific application that led me to consider UDP was session storage for
web server farms. Websites routinely track a certain amount of information
about each visitor. This involves assigning each visitor a session token,
which is generally stored in a client-side cookie. The client provides that
information with each request, and the server can then use it to access
information about the user. For example, the server might store the user’s
name, geographic location, or shopping cart information.
Many websites store this data on each front-end server, often in RAM. This
creates a number of awkward problems: it is difficult or impossible to mix
different implementation technologies; the failure of a front-end server will
lose session data; the load-balancing mechanism must provide “session
persistence,” in which requests from a single user are consistently
dispatched to the same server.
Using a shared back-end session server eliminates these problems. A standard
protocol allows different implementation technologies to be mixed on the same
site. Front-end servers can come and go without affecting current users.
Session persistence is not required, since any front-end server can access
the session data at any time.
UDP-Based Session Storage
Session storage requires only a few operations: clients need to be able to
allocate a session, read a session, or write to a session. To keep things
simple, I’ve kept the request and response formats identical: 1-byte
operation/status code, 7-byte session identifier, and up to 1,024 bytes of
session data. Valid responses always set the status byte to zero, so you can
simply treat the first eight bytes as a session key.
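The article doesn’t give the on-the-wire declarations, but the format described above could be expressed roughly as the following C layout; the field names are illustrative only.

#define SESSION_DATA_MAX 1024

/* Request and response share the same layout: a 1-byte
 * operation/status code, a 7-byte session identifier, and up to
 * 1,024 bytes of session data. */
struct session_packet {
    unsigned char op;           /* request: operation; response: status (0 = OK) */
    unsigned char id[7];        /* session identifier chosen by the server */
    unsigned char data[SESSION_DATA_MAX];
};

/* Because the status byte of a valid response is always zero, the
 * first eight bytes (op + id) can be treated as a single 8-byte
 * session key. */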
To create a new session, the client sends a packet whose first byte is zero.
The server responds with a newly created session, which includes a newly
allocated session key. That key will generally be encoded into a cookie for
the user’s web browser. To read or write the session data, the client
provides the correct operation code (1 to read, 2 to write) and session key.
For a write operation, the session data is appended. In either case, the
server will always respond with the full session (whose first byte is always
zero) unless there is an error, in which case the first byte of the response
indicates the nature of the problem.
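As one illustration of that protocol, a client might assemble requests and check responses along these lines; the helper names and the exact length of a create request are assumptions of mine, since the article says only that its first byte must be zero.

#include <string.h>

/* Byte offsets follow the format described above: byte 0 is the
 * operation/status, bytes 1-7 the session ID, bytes 8 and up the
 * session data.  Codes: 0 = create, 1 = read, 2 = write (append). */
enum { OP_CREATE = 0, OP_READ = 1, OP_WRITE = 2 };

/* Build a "create session" request.  Sending a full 8-byte all-zero
 * header is one reasonable encoding. */
int build_create(unsigned char *pkt)
{
    memset(pkt, 0, 8);
    return 8;
}

/* Build a "write" request that appends 'len' bytes to the session
 * named by the 7-byte ID copied from an earlier response. */
int build_write(unsigned char *pkt, const unsigned char *id,
                const void *data, int len)
{
    pkt[0] = OP_WRITE;
    memcpy(pkt + 1, id, 7);
    memcpy(pkt + 8, data, len);
    return 8 + len;
}

/* A response is valid if its status byte is zero; its first eight
 * bytes then form the session key to hand back as a browser cookie. */
int response_ok(const unsigned char *pkt, int len)
{
    return len >= 8 && pkt[0] == 0;
}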
Limitations
This protocol is very simple and should be easy to implement in just about
any programming language. However, there are a few awkward points that you
may need to consider. First, there is no explicit support for session
timeouts; if the web application needs sessions to expire, it will need to
store an expiration time in the session data.
If your session data is very large, the resulting UDP packets will be
transferred as multiple IP fragments. This increases the likelihood of
dropped packets, so it should be avoided over long-haul networks.
One way to limit the size of your session data is to store only the most
critical data on the session server. The remaining data can be stored in a
slower but more flexible relational database. For example, a shopping cart
might store the total number of items in the session server so that every
page can display a basic shopping cart status, while the complete list of
items is kept in a relational database.
Implementation Notes
The programs udpserver.c and udpclient.c (available for download at
<www.cuj.com/code>) provide a sample implementation of this protocol. The
client exercises the full get/put protocol to ensure that it gets consistent
results and should be easy to modify for your own requirements.
The server is more interesting. To keep things as efficient as possible, the
server is single threaded and stores all session data in memory. By limiting
each session to 1,024 bytes of data, a server with 1 GB of memory can handle
just under one million simultaneous sessions without swapping. This is more
than enough for most websites.
The combination of UDP and simple memory-based storage gives impressive
performance. This server can easily saturate a 100 Mbps Ethernet — upwards
of 15,000 requests per second. A full client request, including the network
round trip, takes a mere 300 microseconds, which is faster than most hard
disks. I’ve not yet had the opportunity to benchmark this server over
Gigabit Ethernet.
Recall that the server chooses the session IDs. This is a key feature of the
protocol, which allows the server to store the array index directly as part
of the session ID. Incoming requests can then be immediately mapped to a
particular array slot. Since there are only a limited number of array slots,
the server keeps a list of the least recently used slots and reuses them as
necessary. Filling the remainder of the session ID with a random value helps
guard against any conflicts caused by this reuse.
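The article’s server source isn’t shown in this post, but the slot-plus-random scheme it describes could be sketched roughly as follows; the 3-byte/4-byte split of the ID and the table size are assumptions of mine.

#include <stdlib.h>
#include <string.h>

#define MAX_SESSIONS     65536      /* placeholder; size to fit available RAM */
#define SESSION_DATA_MAX 1024

struct session {
    unsigned char id[7];            /* the ID as handed to the client */
    int in_use;
    int datalen;
    unsigned char data[SESSION_DATA_MAX];
};

static struct session sessions[MAX_SESSIONS];

/* Encode a slot number plus random padding into a new session ID:
 * 3 bytes of array index followed by 4 random bytes. */
static void make_id(unsigned char *id, unsigned slot)
{
    int i;
    id[0] = (unsigned char)(slot >> 16);
    id[1] = (unsigned char)(slot >> 8);
    id[2] = (unsigned char)slot;
    for (i = 3; i < 7; i++)
        id[i] = (unsigned char)(rand() & 0xff);   /* guards against stale keys */
}

/* Map an incoming ID straight to its array slot, but reject it if the
 * slot has since been reused for a different session. */
static struct session *lookup(const unsigned char *id)
{
    unsigned slot = ((unsigned)id[0] << 16) | ((unsigned)id[1] << 8) | id[2];
    if (slot >= MAX_SESSIONS)
        return NULL;
    if (!sessions[slot].in_use || memcmp(sessions[slot].id, id, 7) != 0)
        return NULL;                /* stale or forged key */
    return &sessions[slot];
}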
Future Directions
The current implementation does not provide any persistence. While this is
fine for most e-commerce sites that only need short-term session management,
it’s less appropriate for portal sites that want to use long-lived cookies
to provide easy access. Long-lived sessions really require some form of
disk-based storage. Adding disk-based storage requires care to avoid
impacting performance.
One approach would use two threads. One thread is similar to the current
fast-running server, handling most requests directly from memory. Sessions
that are not immediately available in memory are placed on a queue to be
handled by the second thread. That thread moves sessions from disk into
memory before sending a response. Of course, the simple approach of embedding
a slot number into the session ID will not work with such a design. Instead,
use a hash table or balanced tree to map session IDs to sessions.
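One way to sketch that handoff, assuming POSIX threads: the fast in-memory thread pushes cache misses onto a queue protected by a mutex and condition variable, and the slower worker thread pops them, loads the session from disk, and replies. Everything below, including load_from_disk() and send_session_response(), is hypothetical.

#include <netinet/in.h>
#include <pthread.h>
#include <stdlib.h>

/* A "pending" request carries everything the slow thread needs to
 * reply on its own. */
struct pending {
    struct sockaddr_in client;      /* where to send the eventual response */
    unsigned char key[8];           /* session key to load from disk */
    struct pending *next;
};

static struct pending *queue_head = NULL;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_ready = PTHREAD_COND_INITIALIZER;

/* Called by the fast, in-memory thread when a session is not cached. */
void enqueue_miss(struct pending *p)
{
    pthread_mutex_lock(&queue_lock);
    p->next = queue_head;
    queue_head = p;
    pthread_cond_signal(&queue_ready);
    pthread_mutex_unlock(&queue_lock);
}

/* The slow thread's loop: pop a request, pull the session off disk
 * into memory, then respond. */
void *disk_thread(void *arg)
{
    struct pending *p;
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        while (queue_head == NULL)
            pthread_cond_wait(&queue_ready, &queue_lock);
        p = queue_head;
        queue_head = p->next;
        pthread_mutex_unlock(&queue_lock);

        /* Hypothetical stand-ins for the real work:
         *   load_from_disk(p->key);
         *   send_session_response(&p->client, p->key); */
        free(p);
    }
    return NULL;
}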
About the Author
Tim Kientzle is an independent consultant, instructor, and software developer
based in Oakland, California. He can be contacted at kientzle@acm.org.