mirror of
https://github.com/uw-imap/imap.git
synced 2024-11-16 10:28:23 +01:00
418 lines
19 KiB
Plaintext
418 lines
19 KiB
Plaintext
|
/* ========================================================================
|
|||
|
* Copyright 1988-2006 University of Washington
|
|||
|
*
|
|||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
|||
|
* you may not use this file except in compliance with the License.
|
|||
|
* You may obtain a copy of the License at
|
|||
|
*
|
|||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
|||
|
*
|
|||
|
*
|
|||
|
* ========================================================================
|
|||
|
*/
|
|||
|
|
|||
|
UNIX Advisory File Locking Implications on c-client
|
|||
|
Mark Crispin, 28 November 1995
|
|||
|
|
|||
|
|
|||
|
THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE FACT THAT
|
|||
|
LINUX SUPPORTS BOTH flock() AND fcntl() AND THAT OSF/1
|
|||
|
HAS BEEN BROKEN SO THAT IT ONLY SUPPORTS fcntl().
|
|||
|
-- JUNE 15, 2004
|
|||
|
|
|||
|
THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE CODE IN THE
|
|||
|
IMAP-4 TOOLKIT AS OF NOVEMBER 28, 1995. SOME STATEMENTS
|
|||
|
IN THIS DOCUMENT DO NOT APPLY TO EARLIER VERSIONS OF THE
|
|||
|
IMAP TOOLKIT.
|
|||
|
|
|||
|
INTRODUCTION
|
|||
|
|
|||
|
Advisory locking is a mechanism by which cooperating processes
|
|||
|
can signal to each other their usage of a resource and whether or not
|
|||
|
that usage is critical. It is not a mechanism to protect against
|
|||
|
processes which do not cooperate in the locking.
|
|||
|
|
|||
|
The most basic form of locking involves a counter. This counter
|
|||
|
is -1 when the resource is available. If a process wants the lock, it
|
|||
|
executes an atomic increment-and-test-if-zero. If the value is zero,
|
|||
|
the process has the lock and can execute the critical code that needs
|
|||
|
exclusive usage of a resource. When it is finished, it sets the lock
|
|||
|
back to -1. In C terms:
|
|||
|
|
|||
|
while (++lock) /* try to get lock */
|
|||
|
invoke_other_threads (); /* failed, try again */
|
|||
|
.
|
|||
|
. /* critical code here */
|
|||
|
.
|
|||
|
lock = -1; /* release lock */
|
|||
|
|
|||
|
This particular form of locking appears most commonly in
|
|||
|
multi-threaded applications such as operating system kernels. It
|
|||
|
makes several presumptions:
|
|||
|
(1) it is alright to keep testing the lock (no overflow)
|
|||
|
(2) the critical resource is single-access only
|
|||
|
(3) there is shared writeable memory between the two threads
|
|||
|
(4) the threads can be trusted to release the lock when finished
|
|||
|
|
|||
|
In applications programming on multi-user systems, most commonly
|
|||
|
the other threads are in an entirely different process, which may even
|
|||
|
be logged in as a different user. Few operating systems offer shared
|
|||
|
writeable memory between such processes.
|
|||
|
|
|||
|
A means of communicating this is by use of a file with a mutually
|
|||
|
agreed upon name. A binary semaphore can be passed by means of the
|
|||
|
existance or non-existance of that file, provided that there is an
|
|||
|
atomic means to create a file if and only if that file does not exist.
|
|||
|
In C terms:
|
|||
|
|
|||
|
/* try to get lock */
|
|||
|
while ((fd = open ("lockfile",O_WRONLY|O_CREAT|O_EXCL,0666)) < 0)
|
|||
|
sleep (1); /* failed, try again */
|
|||
|
close (fd); /* got the lock */
|
|||
|
.
|
|||
|
. /* critical code here */
|
|||
|
.
|
|||
|
unlink ("lockfile"); /* release lock */
|
|||
|
|
|||
|
This form of locking makes fewer presumptions, but it still is
|
|||
|
guilty of presumptions (2) and (4) above. Presumption (2) limits the
|
|||
|
ability to have processes sharing a resource in a non-conflicting
|
|||
|
fashion (e.g. reading from a file). Presumption (4) leads to
|
|||
|
deadlocks should the process crash while it has a resource locked.
|
|||
|
|
|||
|
Most modern operating systems provide a resource locking system
|
|||
|
call that has none of these presumptions. In particular, a mechanism
|
|||
|
is provided for identifying shared locks as opposed to exclusive
|
|||
|
locks. A shared lock permits other processes to obtain a shared lock,
|
|||
|
but denies exclusive locks. In other words:
|
|||
|
|
|||
|
current state want shared want exclusive
|
|||
|
------------- ----------- --------------
|
|||
|
unlocked YES YES
|
|||
|
locked shared YES NO
|
|||
|
locked exclusive NO NO
|
|||
|
|
|||
|
Furthermore, the operating system automatically relinquishes all
|
|||
|
locks held by that process when it terminates.
|
|||
|
|
|||
|
A useful operation is the ability to upgrade a shared lock to
|
|||
|
exclusive (provided there are no other shared users of the lock) and
|
|||
|
to downgrade an exclusive lock to shared. It is important that at no
|
|||
|
time is the lock ever removed; a process upgrading to exclusive must
|
|||
|
not relenquish its shared lock.
|
|||
|
|
|||
|
Most commonly, the resources being locked are files. Shared
|
|||
|
locks are particularly important with files; multiple simultaneous
|
|||
|
processes can read from a file, but only one can safely write at a
|
|||
|
time. Some writes may be safer than others; an append to the end of
|
|||
|
the file is safer than changing existing file data. In turn, changing
|
|||
|
a file record in place is safer than rewriting the file with an
|
|||
|
entirely different structure.
|
|||
|
|
|||
|
|
|||
|
FILE LOCKING ON UNIX
|
|||
|
|
|||
|
In the oldest versions of UNIX, the use of a semaphore lockfile
|
|||
|
was the only available form of locking. Advisory locking system calls
|
|||
|
were not added to UNIX until after the BSD vs. System V split. Both
|
|||
|
of these system calls deal with file resources only.
|
|||
|
|
|||
|
Most systems only have one or the other form of locking. AIX
|
|||
|
and newer versions of OSF/1 emulate the BSD form of locking as a jacket
|
|||
|
into the System V form. Ultrix and Linux implement both forms.
|
|||
|
|
|||
|
BSD
|
|||
|
|
|||
|
BSD added the flock() system call. It offers capabilities to
|
|||
|
acquire shared lock, acquire exclusive lock, and unlock. Optionally,
|
|||
|
the process can request an immediate error return instead of blocking
|
|||
|
when the lock is unavailable.
|
|||
|
|
|||
|
|
|||
|
FLOCK() BUGS
|
|||
|
|
|||
|
flock() advertises that it permits upgrading of shared locks to
|
|||
|
exclusive and downgrading of exclusive locks to shared, but it does so
|
|||
|
by releasing the former lock and then trying to acquire the new lock.
|
|||
|
This creates a window of vulnerability in which another process can
|
|||
|
grab the exclusive lock. Therefore, this capability is not useful,
|
|||
|
although many programmers have been deluded by incautious reading of
|
|||
|
the flock() man page to believe otherwise. This problem can be
|
|||
|
programmed around, once the programmer is aware of it.
|
|||
|
|
|||
|
flock() always returns as if it succeeded on NFS files, when in
|
|||
|
fact it is a no-op. There is no way around this.
|
|||
|
|
|||
|
Leaving aside these two problems, flock() works remarkably well,
|
|||
|
and has shown itself to be robust and trustworthy.
|
|||
|
|
|||
|
SYSTEM V/POSIX
|
|||
|
|
|||
|
System V added new functions to the fnctl() system call, and a
|
|||
|
simple interface through the lockf() subroutine. This was
|
|||
|
subsequently included in POSIX. Both offer the facility to apply the
|
|||
|
lock to a particular region of the file instead of to the entire file.
|
|||
|
lockf() only supports exclusive locks, and calls fcntl() internally;
|
|||
|
hence it won't be discussed further.
|
|||
|
|
|||
|
Functionally, fcntl() locking is a superset of flock(); it is
|
|||
|
possible to implement a flock() emulator using fcntl(), with one minor
|
|||
|
exception: it is not possible to acquire an exclusive lock if the file
|
|||
|
is not open for write.
|
|||
|
|
|||
|
The fcntl() locking functions are: query lock station of a file
|
|||
|
region, lock/unlock a region, and lock/unlock a region and block until
|
|||
|
have the lock. The locks may be shared or exclusive. By means of the
|
|||
|
statd and lockd daemons, fcntl() locking is available on NFS files.
|
|||
|
|
|||
|
When statd is started at system boot, it reads its /etc/state
|
|||
|
file (which contains the number of times it has been invoked) and
|
|||
|
/etc/sm directory (which contains a list of all remote sites which are
|
|||
|
client or server locking with this site), and notifies the statd on
|
|||
|
each of these systems that it has been restarted. Each statd then
|
|||
|
notifies the local lockd of the restart of that system.
|
|||
|
|
|||
|
lockd receives fcntl() requests for NFS files. It communicates
|
|||
|
with the lockd at the server and requests it to apply the lock, and
|
|||
|
with the statd to request it for notification when the server goes
|
|||
|
down. It blocks until all these requests are completed.
|
|||
|
|
|||
|
There is quite a mythos about fcntl() locking.
|
|||
|
|
|||
|
One religion holds that fcntl() locking is the best thing since
|
|||
|
sliced bread, and that programs which use flock() should be converted
|
|||
|
to fcntl() so that NFS locking will work. However, as noted above,
|
|||
|
very few systems support both calls, so such an exercise is pointless
|
|||
|
except on Ultrix and Linux.
|
|||
|
|
|||
|
Another religion, which I adhere to, has the opposite viewpoint.
|
|||
|
|
|||
|
|
|||
|
FCNTL() BUGS
|
|||
|
|
|||
|
For all of the hairy code to do individual section locking of a
|
|||
|
file, it's clear that the designers of fcntl() locking never
|
|||
|
considered some very basic locking operations. It's as if all they
|
|||
|
knew about locking they got out of some CS textbook with not
|
|||
|
investigation of real-world needs.
|
|||
|
|
|||
|
It is not possible to acquire an exclusive lock unless the file
|
|||
|
is open for write. You could have append with shared read, and thus
|
|||
|
you could have a case in which a read-only access may need to go
|
|||
|
exclusive. This problem can be programmed around once the programmer
|
|||
|
is aware of it.
|
|||
|
|
|||
|
If the file is opened on another file designator in the same
|
|||
|
process, the file is unlocked even if no attempt is made to do any
|
|||
|
form of locking on the second designator. This is a very bad bug. It
|
|||
|
means that an application must keep track of all the files that it has
|
|||
|
opened and locked.
|
|||
|
|
|||
|
If there is no statd/lockd on the NFS server, fcntl() will hang
|
|||
|
forever waiting for them to appear. This is a bad bug. It means that
|
|||
|
any attempt to lock on a server that doesn't run these daemons will
|
|||
|
hang. There is no way for an application to request flock() style
|
|||
|
``try to lock, but no-op if the mechanism ain't there''.
|
|||
|
|
|||
|
There is a rumor to the effect that fcntl() will hang forever on
|
|||
|
local files too if there is no local statd/lockd. These daemons are
|
|||
|
running on mailer.u, although they appear not to have much CPU time.
|
|||
|
A useful experiment would be to kill them and see if imapd is affected
|
|||
|
in any way, but I decline to do so without an OK from UCS! ;-) If
|
|||
|
killing statd/lockd can be done without breaking fcntl() on local
|
|||
|
files, this would become one of the primary means of dealing with this
|
|||
|
problem.
|
|||
|
|
|||
|
The statd and lockd daemons have quite a reputation for extreme
|
|||
|
fragility. There have been numerous reports about the locking
|
|||
|
mechanism being wedged on a systemwide or even clusterwide basis,
|
|||
|
requiring a reboot to clear. It is rumored that this wedge, once it
|
|||
|
happens, also blocks local locking. Presumably killing and restarting
|
|||
|
statd would suffice to clear the wedge, but I haven't verified this.
|
|||
|
|
|||
|
There appears to be a limit to how many locks may be in use at a
|
|||
|
time on the system, although the documentation only mentions it in
|
|||
|
passing. On some of their systems, UCS has increased lockd's ``size
|
|||
|
of the socket buffer'', whatever that means.
|
|||
|
|
|||
|
C-CLIENT USAGE
|
|||
|
|
|||
|
c-client uses flock(). On System V systems, flock() is simulated
|
|||
|
by an emulator that calls fcntl().
|
|||
|
|
|||
|
|
|||
|
BEZERK AND MMDF
|
|||
|
|
|||
|
Locking in the traditional UNIX formats was largely dictated by
|
|||
|
the status quo in other applications; however, additional protection
|
|||
|
is added against inadvertantly running multiple instances of a
|
|||
|
c-client application on the same mail file.
|
|||
|
|
|||
|
(1) c-client attempts to create a .lock file (mail file name with
|
|||
|
``.lock'' appended) whenever it reads from, or writes to, the mail
|
|||
|
file. This is an exclusive lock, and is held only for short periods
|
|||
|
of time while c-client is actually doing the I/O. There is a 5-minute
|
|||
|
timeout for this lock, after which it is broken on the presumption
|
|||
|
that it is a stale lock. If it can not create the .lock file due to
|
|||
|
an EACCES (protection failure) error, it once silently proceeded
|
|||
|
without this lock; this was for systems which protect /usr/spool/mail
|
|||
|
from unprivileged processes creating files. Today, c-client reports
|
|||
|
an error unless it is built otherwise. The purpose of this lock is to
|
|||
|
prevent against unfavorable interactions with mail delivery.
|
|||
|
|
|||
|
(2) c-client applies a shared flock() to the mail file whenever
|
|||
|
it reads from the mail file, and an exclusive flock() whenever it
|
|||
|
writes to the mail file. This lock is freed as soon as it finishes
|
|||
|
reading. The purpose of this lock is to prevent against unfavorable
|
|||
|
interactions with mail delivery.
|
|||
|
|
|||
|
(3) c-client applies an exclusive flock() to a file on /tmp
|
|||
|
(whose name represents the device and inode number of the file) when
|
|||
|
it opens the mail file. This lock is maintained throughout the
|
|||
|
session, although c-client has a feature (called ``kiss of death'')
|
|||
|
which permits c-client to forcibly and irreversibly seize the lock
|
|||
|
from a cooperating c-client application that surrenders the lock on
|
|||
|
demand. The purpose of this lock is to prevent against unfavorable
|
|||
|
interactions with other instances of c-client (rewriting the mail
|
|||
|
file).
|
|||
|
|
|||
|
Mail delivery daemons use lock (1), (2), or both. Lock (1) works
|
|||
|
over NFS; lock (2) is the only one that works on sites that protect
|
|||
|
/usr/spool/mail against unprivileged file creation. Prudent mail
|
|||
|
delivery daemons use both forms of locking, and of course so does
|
|||
|
c-client.
|
|||
|
|
|||
|
If only lock (2) is used, then multiple processes can read from
|
|||
|
the mail file simultaneously, although in real life this doesn't
|
|||
|
really change things. The normal state of locks (1) and (2) is
|
|||
|
unlocked except for very brief periods.
|
|||
|
|
|||
|
|
|||
|
TENEX AND MTX
|
|||
|
|
|||
|
The design of the locking mechanism of these formats was
|
|||
|
motivated by a design to enable multiple simultaneous read/write
|
|||
|
access. It is almost the reverse of how locking works with
|
|||
|
bezerk/mmdf.
|
|||
|
|
|||
|
(1) c-client applies a shared flock() to the mail file when it
|
|||
|
opens the mail file. It upgrades this lock to exclusive whenever it
|
|||
|
tries to expunge the mail file. Because of the flock() bug that
|
|||
|
upgrading a lock actually releases it, it will not do so until it has
|
|||
|
acquired an exclusive lock (2) first. The purpose of this lock is to
|
|||
|
prevent against expunge taking place while some other c-client has the
|
|||
|
mail file open (and thus knows where all the messages are).
|
|||
|
|
|||
|
(2) c-client applies a shared flock() to a file on /tmp (whose
|
|||
|
name represents the device and inode number of the file) when it
|
|||
|
parses the mail file. It applies an exclusive flock() to this file
|
|||
|
when it appends new mail to the mail file, as well as before it
|
|||
|
attempts to upgrade lock (1) to exclusive. The purpose of this lock
|
|||
|
is to prevent against data being appended while some other c-client is
|
|||
|
parsing mail in the file (to prevent reading of incomplete messages).
|
|||
|
It also protects against the lock-releasing timing race on lock (1).
|
|||
|
|
|||
|
OBSERVATIONS
|
|||
|
|
|||
|
In a perfect world, locking works. You are protected against
|
|||
|
unfavorable interactions with the mailer and against your own mistake
|
|||
|
by running more than one instance of your mail reader. In tenex/mtx
|
|||
|
formats, you have the additional benefit that multiple simultaneous
|
|||
|
read/write access works, with the sole restriction being that you
|
|||
|
can't expunge if there are any sharers of the mail file.
|
|||
|
|
|||
|
If the mail file is NFS-mounted, then flock() locking is a silent
|
|||
|
no-op. This is the way BSD implements flock(), and c-client's
|
|||
|
emulation of flock() through fcntl() tests for NFS files and
|
|||
|
duplicates this functionality. There is no locking protection for
|
|||
|
tenex/mtx mail files at all, and only protection against the mailer
|
|||
|
for bezerk/mmdf mail files. This has been the accepted state of
|
|||
|
affairs on UNIX for many sad years.
|
|||
|
|
|||
|
If you can not create .lock files, it should not affect locking,
|
|||
|
since the flock() locks suffice for all protection. This is, however,
|
|||
|
not true if the mailer does not check for flock() locking, or if the
|
|||
|
the mail file is NFS-mounted.
|
|||
|
|
|||
|
What this means is that there is *no* locking protection at all
|
|||
|
in the case of a client using an NFS-mounted /usr/spool/mail that does
|
|||
|
not permit file creation by unprivileged programs. It is impossible,
|
|||
|
under these circumstances, for an unprivileged program to do anything
|
|||
|
about it. Worse, if EACCES errors on .lock file creation are no-op'ed
|
|||
|
, the user won't even know about it. This is arguably a site
|
|||
|
configuration error.
|
|||
|
|
|||
|
The problem with not being able to create .lock files exists on
|
|||
|
System V as well, but the failure modes for flock() -- which is
|
|||
|
implemented via fcntl() -- are different.
|
|||
|
|
|||
|
On System V, if the mail file is NFS-mounted and either the
|
|||
|
client or the server lacks a functioning statd/lockd pair, then the
|
|||
|
lock attempt would have hung forever if it weren't for the fact that
|
|||
|
c-client tests for NFS and no-ops the flock() emulator in this case.
|
|||
|
Systemwide or clusterwide failures of statd/lockd have been known to
|
|||
|
occur which cause all locks in all processes to hang (including
|
|||
|
local?). Without the special NFS test made by c-client, there would
|
|||
|
be no way to request BSD-style no-op behavior, nor is there any way to
|
|||
|
determine that this is happening other than the system being hung.
|
|||
|
|
|||
|
The additional locking introduced by c-client was shown to cause
|
|||
|
much more stress on the System V locking mechanism than has
|
|||
|
traditionally been placed upon it. If it was stressed too far, all
|
|||
|
hell broke loose. Fortunately, this is now past history.
|
|||
|
|
|||
|
TRADEOFFS
|
|||
|
|
|||
|
c-client based applications have a reasonable chance of winning
|
|||
|
as long as you don't use NFS for remote access to mail files. That's
|
|||
|
what IMAP is for, after all. It is, however, very important to
|
|||
|
realize that you can *not* use the lock-upgrade feature by itself
|
|||
|
because it releases the lock as an interim step -- you need to have
|
|||
|
lock-upgrading guarded by another lock.
|
|||
|
|
|||
|
If you have the misfortune of using System V, you are likely to
|
|||
|
run into problems sooner or later having to do with statd/lockd. You
|
|||
|
basically end up with one of three unsatisfactory choices:
|
|||
|
1) Grit your teeth and live with it.
|
|||
|
2) Try to make it work:
|
|||
|
a) avoid NFS access so as not to stress statd/lockd.
|
|||
|
b) try to understand the code in statd/lockd and hack it
|
|||
|
to be more robust.
|
|||
|
c) hunt out the system limit of locks, if there is one,
|
|||
|
and increase it. Figure on at least two locks per
|
|||
|
simultaneous imapd process and four locks per Pine
|
|||
|
process. Better yet, make the limit be 10 times the
|
|||
|
maximum number of processes.
|
|||
|
d) increase the socket buffer (-S switch to lockd) if
|
|||
|
it is offered. I don't know what this actually does,
|
|||
|
but giving lockd more resources to do its work can't
|
|||
|
hurt. Maybe.
|
|||
|
3) Decide that it can't possibly work, and turn off the
|
|||
|
fcntl() calls in your program.
|
|||
|
4) If nuking statd/lockd can be done without breaking local
|
|||
|
locking, then do so. This would make SVR4 have the same
|
|||
|
limitations as BSD locking, with a couple of additional
|
|||
|
bugs.
|
|||
|
5) Check for NFS, and don't do the fcntl() in the NFS case.
|
|||
|
This is what c-client does.
|
|||
|
|
|||
|
Note that if you are going to use NFS to access files on a server
|
|||
|
which does not have statd/lockd running, your only choice is (3), (4),
|
|||
|
or (5). Here again, IMAP can bail you out.
|
|||
|
|
|||
|
These problems aren't unique to c-client applications; they have
|
|||
|
also been reported with Elm, Mediamail, and other email tools.
|
|||
|
|
|||
|
Of the other two SVR4 locking bugs:
|
|||
|
|
|||
|
Programmer awareness is necessary to deal with the bug that you
|
|||
|
can not get an exclusive lock unless the file is open for write. I
|
|||
|
believe that c-client has fixed all of these cases.
|
|||
|
|
|||
|
The problem about opening a second designator smashing any
|
|||
|
current locks on the file has not been addressed satisfactorily yet.
|
|||
|
This is not an easy problem to deal with, especially in c-client which
|
|||
|
really doesn't know what other files/streams may be open by Pine.
|
|||
|
|
|||
|
Aren't you so happy that you bought an System V system?
|