/* ========================================================================
 * Copyright 1988-2006 University of Washington
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 *
 * ========================================================================
 */

UNIX Advisory File Locking Implications on c-client
Mark Crispin, 28 November 1995


THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE FACT THAT
LINUX SUPPORTS BOTH flock() AND fcntl() AND THAT OSF/1
HAS BEEN BROKEN SO THAT IT ONLY SUPPORTS fcntl().
-- JUNE 15, 2004

THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE CODE IN THE
IMAP-4 TOOLKIT AS OF NOVEMBER 28, 1995. SOME STATEMENTS
IN THIS DOCUMENT DO NOT APPLY TO EARLIER VERSIONS OF THE
IMAP TOOLKIT.

INTRODUCTION

Advisory locking is a mechanism by which cooperating processes
can signal to each other their usage of a resource and whether or not
that usage is critical. It is not a mechanism to protect against
processes which do not cooperate in the locking.

The most basic form of locking involves a counter. This counter
is -1 when the resource is available. If a process wants the lock, it
executes an atomic increment-and-test-if-zero. If the value is zero,
the process has the lock and can execute the critical code that needs
exclusive usage of a resource. When it is finished, it sets the lock
back to -1. In C terms:

    while (++lock)              /* try to get lock */
      invoke_other_threads ();  /* failed, try again */
     .
     .                          /* critical code here */
     .
    lock = -1;                  /* release lock */

This particular form of locking appears most commonly in
multi-threaded applications such as operating system kernels. It
makes several presumptions:
    (1) it is all right to keep testing the lock (no overflow)
    (2) the critical resource is single-access only
    (3) there is shared writeable memory between the two threads
    (4) the threads can be trusted to release the lock when finished

In applications programming on multi-user systems, most commonly
the other threads are in an entirely different process, which may even
be logged in as a different user. Few operating systems offer shared
writeable memory between such processes.

A means of communicating this is by use of a file with a mutually
agreed upon name. A binary semaphore can be passed by means of the
existence or non-existence of that file, provided that there is an
atomic means to create a file if and only if that file does not exist.
In C terms:

                                /* try to get lock */
    while ((fd = open ("lockfile",O_WRONLY|O_CREAT|O_EXCL,0666)) < 0)
      sleep (1);                /* failed, try again */
    close (fd);                 /* got the lock */
     .
     .                          /* critical code here */
     .
    unlink ("lockfile");        /* release lock */

This form of locking makes fewer presumptions, but it still is
guilty of presumptions (2) and (4) above. Presumption (2) limits the
ability to have processes sharing a resource in a non-conflicting
fashion (e.g. reading from a file). Presumption (4) leads to
deadlocks should the process crash while it has a resource locked.

Most modern operating systems provide a resource locking system
call that has none of these presumptions. In particular, a mechanism
is provided for identifying shared locks as opposed to exclusive
locks. A shared lock permits other processes to obtain a shared lock,
but denies exclusive locks. In other words:

    current state           want shared     want exclusive
    -------------           -----------     --------------
    unlocked                YES             YES
    locked shared           YES             NO
    locked exclusive        NO              NO

Furthermore, the operating system automatically relinquishes all
locks held by a process when that process terminates.

A useful operation is the ability to upgrade a shared lock to
exclusive (provided there are no other shared users of the lock) and
to downgrade an exclusive lock to shared. It is important that at no
time is the lock ever removed; a process upgrading to exclusive must
not relinquish its shared lock.

Most commonly, the resources being locked are files. Shared
locks are particularly important with files; multiple simultaneous
processes can read from a file, but only one can safely write at a
time. Some writes may be safer than others; an append to the end of
the file is safer than changing existing file data. In turn, changing
a file record in place is safer than rewriting the file with an
entirely different structure.


FILE LOCKING ON UNIX

In the oldest versions of UNIX, the use of a semaphore lockfile
was the only available form of locking. Advisory locking system calls
were not added to UNIX until after the BSD vs. System V split, so each
branch has its own. Both of these system calls deal with file
resources only.

Most systems only have one or the other form of locking. AIX
and newer versions of OSF/1 emulate the BSD form of locking as a jacket
into the System V form. Ultrix and Linux implement both forms.

BSD

BSD added the flock() system call. It offers capabilities to
acquire a shared lock, acquire an exclusive lock, and unlock.
Optionally, the process can request an immediate error return instead
of blocking when the lock is unavailable.
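
The flock() operations described above can be sketched as follows.
This is only an illustration; the helper names try_lock() and
drop_lock() are assumptions of this sketch and are not part of
c-client.

```c
#include <errno.h>
#include <sys/file.h>   /* flock(), LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN */

/* Hypothetical helper: try to take a shared or exclusive flock() lock
 * on fd without blocking.  Returns 1 if the lock was obtained, 0 if a
 * conflicting lock is held elsewhere, -1 on any other error.
 */
int try_lock (int fd,int exclusive)
{
  int op = (exclusive ? LOCK_EX : LOCK_SH) | LOCK_NB;
  if (!flock (fd,op)) return 1;         /* got the lock */
  return (errno == EWOULDBLOCK) ? 0 : -1;
}

/* Hypothetical helper: release whatever lock is held through fd.
 * Returns 0 on success, -1 on error.
 */
int drop_lock (int fd)
{
  return flock (fd,LOCK_UN);
}
```

Omitting LOCK_NB from the operation gives the blocking form, which
waits until the lock can be granted.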


FLOCK() BUGS

flock() advertises that it permits upgrading of shared locks to
exclusive and downgrading of exclusive locks to shared, but it does so
by releasing the former lock and then trying to acquire the new lock.
This creates a window of vulnerability in which another process can
grab the exclusive lock. Therefore, this capability is not useful,
although many programmers have been deluded by incautious reading of
the flock() man page to believe otherwise. This problem can be
programmed around, once the programmer is aware of it.

flock() always returns as if it succeeded on NFS files, when in
fact it is a no-op. There is no way around this.

Leaving aside these two problems, flock() works remarkably well,
and has shown itself to be robust and trustworthy.

SYSTEM V/POSIX

System V added new functions to the fcntl() system call, and a
simple interface through the lockf() subroutine. These were
subsequently included in POSIX. Both offer the facility to apply the
lock to a particular region of the file instead of to the entire file.
lockf() only supports exclusive locks, and calls fcntl() internally;
hence it won't be discussed further.

Functionally, fcntl() locking is a superset of flock(); it is
possible to implement a flock() emulator using fcntl(), with one minor
exception: it is not possible to acquire an exclusive lock if the file
is not open for write.

The fcntl() locking functions are: query the lock status of a
file region, lock/unlock a region, and lock/unlock a region and block
until the lock is obtained. The locks may be shared or exclusive. By
means of the statd and lockd daemons, fcntl() locking is available on
NFS files.
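
A flock()-style jacket over fcntl() might look roughly like the
sketch below. It is not c-client's emulator: emu_flock() and the EMU_*
operation bits are names invented for this illustration, and the NFS
test that c-client performs is omitted. Locking the whole file is
expressed with l_whence of SEEK_SET and l_start and l_len both zero.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Assumed stand-ins for the BSD flock() operation bits, renamed so
 * that this sketch does not collide with <sys/file.h>.
 */
#define EMU_SH 1                /* shared lock */
#define EMU_EX 2                /* exclusive lock */
#define EMU_NB 4                /* do not block */
#define EMU_UN 8                /* unlock */

/* Hypothetical flock()-style wrapper over fcntl() whole-file locking.
 * Returns 0 on success, -1 with errno set on failure.
 */
int emu_flock (int fd,int op)
{
  struct flock fl;
  memset (&fl,0,sizeof (fl));   /* l_start 0, l_len 0: whole file */
  fl.l_whence = SEEK_SET;
  switch (op & ~EMU_NB) {
  case EMU_SH: fl.l_type = F_RDLCK; break;
  case EMU_EX: fl.l_type = F_WRLCK; break;      /* fd must be open for write */
  case EMU_UN: fl.l_type = F_UNLCK; break;
  default: errno = EINVAL; return -1;
  }
  while (fcntl (fd,(op & EMU_NB) ? F_SETLK : F_SETLKW,&fl) < 0) {
    if (errno == EINTR) continue;       /* interrupted while waiting, retry */
                                /* some systems report a held lock this way */
    if (errno == EACCES) errno = EWOULDBLOCK;
    return -1;
  }
  return 0;
}
```

Requesting EMU_EX on a descriptor that is open read-only fails with
EBADF, which is the exception noted above.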

When statd is started at system boot, it reads its /etc/state
file (which contains the number of times it has been invoked) and
/etc/sm directory (which contains a list of all remote sites which are
client or server locking with this site), and notifies the statd on
each of these systems that it has been restarted. Each statd then
notifies the local lockd of the restart of that system.

lockd receives fcntl() requests for NFS files. It communicates
with the lockd at the server and requests it to apply the lock, and
with the statd to request notification when the server goes down. It
blocks until all these requests are completed.

There is quite a mythos about fcntl() locking.

One religion holds that fcntl() locking is the best thing since
sliced bread, and that programs which use flock() should be converted
to fcntl() so that NFS locking will work. However, as noted above,
very few systems support both calls, so such an exercise is pointless
except on Ultrix and Linux.

Another religion, which I adhere to, has the opposite viewpoint.


FCNTL() BUGS

For all of the hairy code to do individual section locking of a
file, it's clear that the designers of fcntl() locking never
considered some very basic locking operations. It's as if all they
knew about locking they got out of some CS textbook with no
investigation of real-world needs.

It is not possible to acquire an exclusive lock unless the file
is open for write. You could have append with shared read, and thus
you could have a case in which a read-only access may need to go
exclusive. This problem can be programmed around once the programmer
is aware of it.

If the file is opened on another file designator in the same
process, the file is unlocked even if no attempt is made to do any
form of locking on the second designator. This is a very bad bug. It
means that an application must keep track of all the files that it has
opened and locked.
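
The second-designator problem can be observed with a sketch such as
the one below. In POSIX terms, a process's fcntl() locks on a file are
released when that process closes any descriptor for the file, so the
sketch opens and then closes a second descriptor. The function names
are invented for this illustration. A child process does the checking
because F_GETLK never reports the caller's own locks.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask, from a child process, whether some other process holds a lock
 * that would conflict with a write lock on the whole file.  Returns 1
 * if so, 0 if not, -1 on error.
 */
static int locked_by_parent (const char *path)
{
  int status;
  pid_t pid = fork ();
  if (pid < 0) return -1;
  if (!pid) {                   /* child: query and report via exit code */
    struct flock fl;
    int fd = open (path,O_RDWR);
    memset (&fl,0,sizeof (fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if ((fd < 0) || (fcntl (fd,F_GETLK,&fl) < 0)) _exit (2);
    _exit ((fl.l_type == F_UNLCK) ? 0 : 1);
  }
  if ((waitpid (pid,&status,0) < 0) || !WIFEXITED (status) ||
      (WEXITSTATUS (status) > 1)) return -1;
  return WEXITSTATUS (status);
}

/* Hypothetical demonstration: write-lock path through one descriptor,
 * then open and close a second descriptor that is never locked.  The
 * lock state seen by another process is stored in *before and *after.
 * Returns 0 if the experiment ran, -1 on error.
 */
int second_descriptor_demo (const char *path,int *before,int *after)
{
  struct flock fl;
  int fd2,fd = open (path,O_RDWR|O_CREAT,0600);
  if (fd < 0) return -1;
  memset (&fl,0,sizeof (fl));
  fl.l_type = F_WRLCK;
  fl.l_whence = SEEK_SET;
  if (fcntl (fd,F_SETLK,&fl) < 0) {
    close (fd);
    return -1;
  }
  *before = locked_by_parent (path);
  if ((fd2 = open (path,O_RDONLY)) >= 0) close (fd2);
  *after = locked_by_parent (path);
  close (fd);
  return ((fd2 < 0) || (*before < 0) || (*after < 0)) ? -1 : 0;
}
```

On a system with POSIX fcntl() semantics, one would expect *before to
be 1 and *after to be 0, even though the first descriptor is still
open at that point.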

If there is no statd/lockd on the NFS server, fcntl() will hang
forever waiting for them to appear. This is a bad bug. It means that
any attempt to lock on a server that doesn't run these daemons will
hang. There is no way for an application to request flock() style
``try to lock, but no-op if the mechanism ain't there''.

There is a rumor to the effect that fcntl() will hang forever on
local files too if there is no local statd/lockd. These daemons are
running on mailer.u, although they appear not to have much CPU time.
A useful experiment would be to kill them and see if imapd is affected
in any way, but I decline to do so without an OK from UCS! ;-) If
killing statd/lockd can be done without breaking fcntl() on local
files, this would become one of the primary means of dealing with this
problem.

The statd and lockd daemons have quite a reputation for extreme
fragility. There have been numerous reports about the locking
mechanism being wedged on a systemwide or even clusterwide basis,
requiring a reboot to clear. It is rumored that this wedge, once it
happens, also blocks local locking. Presumably killing and restarting
statd would suffice to clear the wedge, but I haven't verified this.

There appears to be a limit to how many locks may be in use at a
time on the system, although the documentation only mentions it in
passing. On some of their systems, UCS has increased lockd's ``size
of the socket buffer'', whatever that means.

C-CLIENT USAGE

c-client uses flock(). On System V systems, flock() is simulated
by an emulator that calls fcntl().


BEZERK AND MMDF

Locking in the traditional UNIX formats was largely dictated by
the status quo in other applications; however, additional protection
is added against inadvertently running multiple instances of a
c-client application on the same mail file.

(1) c-client attempts to create a .lock file (mail file name with
``.lock'' appended) whenever it reads from, or writes to, the mail
file. This is an exclusive lock, and is held only for short periods
of time while c-client is actually doing the I/O. There is a 5-minute
timeout for this lock, after which it is broken on the presumption
that it is a stale lock. If it can not create the .lock file due to
an EACCES (protection failure) error, it once silently proceeded
without this lock; this was for systems which protect /usr/spool/mail
from unprivileged processes creating files. Today, c-client reports
an error unless it is built otherwise. The purpose of this lock is to
protect against unfavorable interactions with mail delivery.

(2) c-client applies a shared flock() to the mail file whenever
it reads from the mail file, and an exclusive flock() whenever it
writes to the mail file. This lock is freed as soon as the I/O
finishes. The purpose of this lock is to protect against unfavorable
interactions with mail delivery.

(3) c-client applies an exclusive flock() to a file on /tmp
(whose name represents the device and inode number of the file) when
it opens the mail file. This lock is maintained throughout the
session, although c-client has a feature (called ``kiss of death'')
which permits c-client to forcibly and irreversibly seize the lock
from a cooperating c-client application that surrenders the lock on
demand. The purpose of this lock is to protect against unfavorable
interactions with other instances of c-client (rewriting the mail
file).
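
Lock (3) can be pictured with the sketch below. The function name
session_lock() and the "/tmp/.%lx.%lx" name format are assumptions of
this illustration, and the ``kiss of death'' negotiation is left out
entirely. Naming the lock file after the device and inode means that
every path to the same mail file (links, different mount names) maps
to the same lock.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open (creating if needed) a lock file on /tmp
 * named after the device and inode of `mailbox', and try to take an
 * exclusive flock() on it without blocking.  Returns the lock file
 * descriptor, which the caller keeps open for the whole session, or -1
 * if the mailbox can not be examined or the lock is already held.
 */
int session_lock (const char *mailbox)
{
  char lock[64];
  struct stat sb;
  int fd;
  if (stat (mailbox,&sb)) return -1;
                                /* assumed name format, see above */
  snprintf (lock,sizeof (lock),"/tmp/.%lx.%lx",
            (unsigned long) sb.st_dev,(unsigned long) sb.st_ino);
  if ((fd = open (lock,O_RDWR|O_CREAT,0666)) < 0) return -1;
  if (flock (fd,LOCK_EX|LOCK_NB)) {     /* another session has the mailbox */
    close (fd);
    return -1;
  }
  return fd;
}
```

Closing the returned descriptor at the end of the session releases the
lock; so does process termination.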

Mail delivery daemons use lock (1), (2), or both. Lock (1) works
over NFS; lock (2) is the only one that works on sites that protect
/usr/spool/mail against unprivileged file creation. Prudent mail
delivery daemons use both forms of locking, and of course so does
c-client.

If only lock (2) is used, then multiple processes can read from
the mail file simultaneously, although in real life this doesn't
really change things. The normal state of locks (1) and (2) is
unlocked except for very brief periods.


TENEX AND MTX

The design of the locking mechanism of these formats was
motivated by a desire to enable multiple simultaneous read/write
access. It is almost the reverse of how locking works with
bezerk/mmdf.

(1) c-client applies a shared flock() to the mail file when it
opens the mail file. It upgrades this lock to exclusive whenever it
tries to expunge the mail file. Because of the flock() bug that
upgrading a lock actually releases it, it will not do so until it has
acquired an exclusive lock (2) first. The purpose of this lock is to
protect against expunge taking place while some other c-client has the
mail file open (and thus knows where all the messages are).

(2) c-client applies a shared flock() to a file on /tmp (whose
name represents the device and inode number of the file) when it
parses the mail file. It applies an exclusive flock() to this file
when it appends new mail to the mail file, as well as before it
attempts to upgrade lock (1) to exclusive. The purpose of this lock
is to protect against data being appended while some other c-client is
parsing mail in the file (to prevent reading of incomplete messages).
It also protects against the lock-releasing timing race on lock (1).
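
The way lock (2) guards the upgrade of lock (1) can be sketched as
below. This is an illustration of the idea, not c-client's code;
guarded_upgrade() is an invented name, and the sketch assumes that the
caller holds a shared flock() on mailfd and no lock on guardfd.

```c
#include <sys/file.h>

/* Hypothetical helper: upgrade the shared flock() held on `mailfd' to
 * exclusive.  `guardfd' is the second lock file; since every
 * cooperating process takes it exclusively before upgrading, nobody
 * else can grab the exclusive lock on `mailfd' during the window in
 * which flock() has released the shared lock.  Returns 0 with `mailfd'
 * locked exclusively, or -1 with it locked shared (other sharers
 * exist, or the guard could not be obtained).
 */
int guarded_upgrade (int mailfd,int guardfd)
{
  int ret = 0;
  if (flock (guardfd,LOCK_EX)) return -1;       /* serialize upgraders */
                                /* may drop the shared lock as a side effect */
  if (flock (mailfd,LOCK_EX|LOCK_NB)) {
    flock (mailfd,LOCK_SH);     /* other sharers exist, go back to shared */
    ret = -1;
  }
  flock (guardfd,LOCK_UN);
  return ret;
}
```

The re-acquisition of the shared lock after a failed upgrade can not
be blocked by a new exclusive holder, because taking the exclusive
lock requires the guard that this process still holds.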

OBSERVATIONS

In a perfect world, locking works. You are protected against
unfavorable interactions with the mailer and against your own mistake
of running more than one instance of your mail reader. In tenex/mtx
formats, you have the additional benefit that multiple simultaneous
read/write access works, with the sole restriction being that you
can't expunge if there are any sharers of the mail file.

If the mail file is NFS-mounted, then flock() locking is a silent
no-op. This is the way BSD implements flock(), and c-client's
emulation of flock() through fcntl() tests for NFS files and
duplicates this functionality. There is no locking protection for
tenex/mtx mail files at all, and only protection against the mailer
for bezerk/mmdf mail files. This has been the accepted state of
affairs on UNIX for many sad years.

If you cannot create .lock files, it should not affect locking,
since the flock() locks suffice for all protection. This is, however,
not true if the mailer does not check for flock() locking, or if the
mail file is NFS-mounted.

What this means is that there is *no* locking protection at all
in the case of a client using an NFS-mounted /usr/spool/mail that does
not permit file creation by unprivileged programs. It is impossible,
under these circumstances, for an unprivileged program to do anything
about it. Worse, if EACCES errors on .lock file creation are
no-op'ed, the user won't even know about it. This is arguably a site
configuration error.

The problem with not being able to create .lock files exists on
System V as well, but the failure modes for flock() -- which is
implemented via fcntl() -- are different.

On System V, if the mail file is NFS-mounted and either the
client or the server lacks a functioning statd/lockd pair, then the
lock attempt would have hung forever if it weren't for the fact that
c-client tests for NFS and no-ops the flock() emulator in this case.
Systemwide or clusterwide failures of statd/lockd have been known to
occur which cause all locks in all processes to hang (including
local?). Without the special NFS test made by c-client, there would
be no way to request BSD-style no-op behavior, nor is there any way to
determine that this is happening other than the system being hung.

The additional locking introduced by c-client was shown to cause
much more stress on the System V locking mechanism than has
traditionally been placed upon it. If it was stressed too far, all
hell broke loose. Fortunately, this is now past history.

TRADEOFFS

c-client based applications have a reasonable chance of winning
as long as you don't use NFS for remote access to mail files. That's
what IMAP is for, after all. It is, however, very important to
realize that you can *not* use the lock-upgrade feature by itself
because it releases the lock as an interim step -- you need to have
lock-upgrading guarded by another lock.

If you have the misfortune of using System V, you are likely to
run into problems sooner or later having to do with statd/lockd. You
basically end up with one of five unsatisfactory choices:
    1) Grit your teeth and live with it.
    2) Try to make it work:
       a) avoid NFS access so as not to stress statd/lockd.
       b) try to understand the code in statd/lockd and hack it
          to be more robust.
       c) hunt out the system limit of locks, if there is one,
          and increase it. Figure on at least two locks per
          simultaneous imapd process and four locks per Pine
          process. Better yet, make the limit be 10 times the
          maximum number of processes.
       d) increase the socket buffer (-S switch to lockd) if
          it is offered. I don't know what this actually does,
          but giving lockd more resources to do its work can't
          hurt. Maybe.
    3) Decide that it can't possibly work, and turn off the
       fcntl() calls in your program.
    4) If nuking statd/lockd can be done without breaking local
       locking, then do so. This would make SVR4 have the same
       limitations as BSD locking, with a couple of additional
       bugs.
    5) Check for NFS, and don't do the fcntl() in the NFS case.
       This is what c-client does.

Note that if you are going to use NFS to access files on a server
which does not have statd/lockd running, your only choices are (3),
(4), or (5). Here again, IMAP can bail you out.

These problems aren't unique to c-client applications; they have
also been reported with Elm, Mediamail, and other email tools.

Of the other two SVR4 locking bugs:

Programmer awareness is necessary to deal with the bug that you
can not get an exclusive lock unless the file is open for write. I
believe that c-client has fixed all of these cases.

The problem about opening a second designator smashing any
current locks on the file has not been addressed satisfactorily yet.
This is not an easy problem to deal with, especially in c-client which
really doesn't know what other files/streams may be open by Pine.

Aren't you so happy that you bought a System V system?