mirror of
https://github.com/lunarmodules/luasocket.git
synced 2024-11-16 02:08:21 +01:00
This commit was manufactured by cvs2svn to create tag 'luasocket-2-0-2'.
Sprout from master 2007-10-11 21:16:28 UTC Diego Nehab <diego@tecgraf.puc-rio.br> 'Tested each sample.' Cherrypick from master 2007-05-31 22:27:40 UTC Diego Nehab <diego@tecgraf.puc-rio.br> 'Before sending to Roberto.': gem/ltn012.tex gem/makefile
parent
52ac60af81
commit
81ebe649f0
347
gem/ltn012.tex
@ -6,10 +6,7 @@
|
|||||||
\DefineVerbatimEnvironment{mime}{Verbatim}{fontsize=\small,commandchars=\$\#\%}
|
\DefineVerbatimEnvironment{mime}{Verbatim}{fontsize=\small,commandchars=\$\#\%}
|
||||||
\newcommand{\stick}[1]{\vbox{\setlength{\parskip}{0pt}#1}}
|
\newcommand{\stick}[1]{\vbox{\setlength{\parskip}{0pt}#1}}
|
||||||
\newcommand{\bl}{\ensuremath{\mathtt{\backslash}}}
|
\newcommand{\bl}{\ensuremath{\mathtt{\backslash}}}
|
||||||
\newcommand{\CR}{\texttt{CR}}
|
|
||||||
\newcommand{\LF}{\texttt{LF}}
|
|
||||||
\newcommand{\CRLF}{\texttt{CR~LF}}
|
|
||||||
\newcommand{\nil}{\texttt{nil}}
|
|
||||||
|
|
||||||
\title{Filters, sources, sinks, and pumps\\
|
\title{Filters, sources, sinks, and pumps\\
|
||||||
{\large or Functional programming for the rest of us}}
|
{\large or Functional programming for the rest of us}}
|
||||||
@ -21,30 +18,29 @@
|
|||||||
|
|
||||||
\begin{abstract}
|
\begin{abstract}
|
||||||
Certain data processing operations can be implemented in the
|
Certain data processing operations can be implemented in the
|
||||||
form of filters. A filter is a function that can process
|
form of filters. A filter is a function that can process data
|
||||||
data received in consecutive invocations, returning partial
|
received in consecutive function calls, returning partial
|
||||||
results each time it is called. Examples of operations that
|
results after each invocation. Examples of operations that can be
|
||||||
can be implemented as filters include the end-of-line
|
implemented as filters include the end-of-line normalization
|
||||||
normalization for text, Base64 and Quoted-Printable transfer
|
for text, Base64 and Quoted-Printable transfer content
|
||||||
content encodings, the breaking of text into lines, SMTP
|
encodings, the breaking of text into lines, SMTP dot-stuffing,
|
||||||
dot-stuffing, and there are many others. Filters become
|
and there are many others. Filters become even
|
||||||
even more powerful when we allow them to be chained together
|
more powerful when we allow them to be chained together to
|
||||||
to create composite filters. In this context, filters can be
|
create composite filters. In this context, filters can be seen
|
||||||
seen as the internal links in a chain of data transformations.
|
as the middle links in a chain of data transformations. Sources and sinks
|
||||||
Sources and sinks are the corresponding end points in these
|
are the corresponding end points of these chains. A source
|
||||||
chains. A source is a function that produces data, chunk by
|
is a function that produces data, chunk by chunk, and a sink
|
||||||
chunk, and a sink is a function that takes data, chunk by
|
is a function that takes data, chunk by chunk. In this
|
||||||
chunk. Finally, pumps are procedures that actively drive
|
article, we describe the design of an elegant interface for filters,
|
||||||
data from a source to a sink, and indirectly through all
|
sources, sinks, and chaining, and illustrate each step
|
||||||
intervening filters. In this article, we describe the design of an
|
with concrete examples.
|
||||||
elegant interface for filters, sources, sinks, chains, and
|
|
||||||
pumps, and we illustrate each step with concrete examples.
|
|
||||||
\end{abstract}
|
\end{abstract}
|
||||||
|
|
||||||
|
|
||||||
\section{Introduction}
|
\section{Introduction}
|
||||||
|
|
||||||
Within the realm of networking applications, we are often
|
Within the realm of networking applications, we are often
|
||||||
required to apply transformations to streams of data. Examples
|
required to apply transformations to streams of data. Examples
|
||||||
include the end-of-line normalization for text, Base64 and
|
include the end-of-line normalization for text, Base64 and
|
||||||
Quoted-Printable transfer content encodings, breaking text
|
Quoted-Printable transfer content encodings, breaking text
|
||||||
into lines with a maximum number of columns, SMTP
|
into lines with a maximum number of columns, SMTP
|
||||||
@ -54,10 +50,11 @@ transfer coding, and the list goes on.
|
|||||||
Many complex tasks require a combination of two or more such
|
Many complex tasks require a combination of two or more such
|
||||||
transformations, and therefore a general mechanism for
|
transformations, and therefore a general mechanism for
|
||||||
promoting reuse is desirable. In the process of designing
|
promoting reuse is desirable. In the process of designing
|
||||||
\texttt{LuaSocket~2.0}, we repeatedly faced this problem.
|
\texttt{LuaSocket~2.0}, David Burgess and I were forced to deal with
|
||||||
The solution we reached proved to be very general and
|
this problem. The solution we reached proved to be very
|
||||||
convenient. It is based on the concepts of filters, sources,
|
general and convenient. It is based on the concepts of
|
||||||
sinks, and pumps, which we introduce below.
|
filters, sources, sinks, and pumps, which we introduce
|
||||||
|
below.
|
||||||
|
|
||||||
\emph{Filters} are functions that can be repeatedly invoked
|
\emph{Filters} are functions that can be repeatedly invoked
|
||||||
with chunks of input, successively returning processed
|
with chunks of input, successively returning processed
|
||||||
@ -65,33 +62,34 @@ chunks of output. More importantly, the result of
|
|||||||
concatenating all the output chunks must be the same as the
|
concatenating all the output chunks must be the same as the
|
||||||
result of applying the filter to the concatenation of all
|
result of applying the filter to the concatenation of all
|
||||||
input chunks. In fancier language, filters \emph{commute}
|
input chunks. In fancier language, filters \emph{commute}
|
||||||
with the concatenation operator. More importantly, filters
|
with the concatenation operator. As a result, chunk
|
||||||
must handle input data correctly no matter how the stream
|
boundaries are irrelevant: filters correctly handle input
|
||||||
has been split into chunks.
|
data no matter how it is split.
|
||||||
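One way to state this property precisely (the notation below is an illustration, not part of the original article): let $F$ denote the transformation applied to a whole stream, and let $f$ be a filter for it. Write $\cdot$ for concatenation and $f(\bot)$ for the final call that tells the filter there is no more data. Then, for every way of splitting an input $s$ into chunks $s = s_1 \cdot s_2 \cdots s_n$, a correct filter should satisfy
\[
f(s_1) \cdot f(s_2) \cdots f(s_n) \cdot f(\bot) \;=\; F(s).
\]
For end-of-line normalization, for instance, feeding the chunk ``a CR'' followed by the chunk ``LF b'' should produce the same output as feeding ``a CR LF b'' in one call.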
|
|
||||||
A \emph{chain} is a function that transparently combines the
|
A \emph{chain} transparently combines the effect of one or
|
||||||
effect of one or more filters. The interface of a chain is
|
more filters. The interface of a chain is
|
||||||
indistinguishable from the interface of its component
|
indistinguishable from the interface of its components.
|
||||||
filters. This allows a chained filter to be used wherever
|
This allows a chained filter to be used wherever an atomic
|
||||||
an atomic filter is accepted. In particular, chains can be
|
filter is expected. In particular, chains can be
|
||||||
themselves chained to create arbitrarily complex operations.
|
themselves chained to create arbitrarily complex operations.
|
||||||
|
|
||||||
Filters can be seen as internal nodes in a network through
|
Filters can be seen as internal nodes in a network through
|
||||||
which data will flow, potentially being transformed many
|
which data will flow, potentially being transformed many
|
||||||
times along the way. Chains connect these nodes together.
|
times along its way. Chains connect these nodes together.
|
||||||
The initial and final nodes of the network are
|
To complete the picture, we need \emph{sources} and
|
||||||
\emph{sources} and \emph{sinks}, respectively. Less
|
\emph{sinks}. These are the initial and final nodes of the
|
||||||
abstractly, a source is a function that produces new data
|
network, respectively. Less abstractly, a source is a
|
||||||
every time it is invoked. Conversely, sinks are functions
|
function that produces new data every time it is called.
|
||||||
that give a final destination to the data they receive.
|
Conversely, sinks are functions that give a final
|
||||||
Naturally, sources and sinks can also be chained with
|
destination to the data they receive. Naturally, sources
|
||||||
filters to produce filtered sources and sinks.
|
and sinks can also be chained with filters to produce
|
||||||
|
filtered sources and sinks.
|
||||||
|
|
||||||
Finally, filters, chains, sources, and sinks are all passive
|
Finally, filters, chains, sources, and sinks are all passive
|
||||||
entities: they must be repeatedly invoked in order for
|
entities: they must be repeatedly invoked in order for
|
||||||
anything to happen. \emph{Pumps} provide the driving force
|
anything to happen. \emph{Pumps} provide the driving force
|
||||||
that pushes data through the network, from a source to a
|
that pushes data through the network, from a source to a
|
||||||
sink, and indirectly through all intervening filters.
|
sink.
|
||||||
|
|
||||||
In the following sections, we start with a simplified
|
In the following sections, we start with a simplified
|
||||||
interface, which we later refine. The evolution we present
|
interface, which we later refine. The evolution we present
|
||||||
@ -101,28 +99,27 @@ concepts within our application domain.
|
|||||||
|
|
||||||
\subsection{A simple example}
|
\subsection{A simple example}
|
||||||
|
|
||||||
The end-of-line normalization of text is a good
|
Let us use the end-of-line normalization of text as an
|
||||||
example to motivate our initial filter interface.
|
example to motivate our initial filter interface.
|
||||||
Assume we are given text in an unknown end-of-line
|
Assume we are given text in an unknown end-of-line
|
||||||
convention (including possibly mixed conventions) out of the
|
convention (including possibly mixed conventions) out of the
|
||||||
commonly found Unix (\LF), Mac OS (\CR), and
|
commonly found Unix (LF), Mac OS (CR), and DOS (CRLF)
|
||||||
DOS (\CRLF) conventions. We would like to be able to
|
conventions. We would like to be able to write code like the
|
||||||
use the following code to normalize the end-of-line markers:
|
following:
|
||||||
\begin{quote}
|
\begin{quote}
|
||||||
\begin{lua}
|
\begin{lua}
|
||||||
@stick#
|
@stick#
|
||||||
local CRLF = "\013\010"
|
local in = source.chain(source.file(io.stdin), normalize("\r\n"))
|
||||||
local input = source.chain(source.file(io.stdin), normalize(CRLF))
|
local out = sink.file(io.stdout)
|
||||||
local output = sink.file(io.stdout)
|
pump.all(in, out)
|
||||||
pump.all(input, output)
|
|
||||||
%
|
%
|
||||||
\end{lua}
|
\end{lua}
|
||||||
\end{quote}
|
\end{quote}
|
||||||
|
|
||||||
This program should read data from the standard input stream
|
This program should read data from the standard input stream
|
||||||
and normalize the end-of-line markers to the canonic
|
and normalize the end-of-line markers to the canonic CRLF
|
||||||
\CRLF\ marker, as defined by the MIME standard.
|
marker, as defined by the MIME standard. Finally, the
|
||||||
Finally, the normalized text should be sent to the standard output
|
normalized text should be sent to the standard output
|
||||||
stream. We use a \emph{file source} that produces data from
|
stream. We use a \emph{file source} that produces data from
|
||||||
standard input, and chain it with a filter that normalizes
|
standard input, and chain it with a filter that normalizes
|
||||||
the data. The pump then repeatedly obtains data from the
|
the data. The pump then repeatedly obtains data from the
|
||||||
@ -130,28 +127,27 @@ source, and passes it to the \emph{file sink}, which sends
|
|||||||
it to the standard output.
|
it to the standard output.
|
||||||
|
|
||||||
In the code above, the \texttt{normalize} \emph{factory} is a
|
In the code above, the \texttt{normalize} \emph{factory} is a
|
||||||
function that creates our normalization filter, which
|
function that creates our normalization filter. This filter
|
||||||
replaces any end-of-line marker with the canonic marker.
|
will replace any end-of-line marker with the canonic
|
||||||
The initial filter interface is
|
`\verb|\r\n|' marker. The initial filter interface is
|
||||||
trivial: a filter function receives a chunk of input data,
|
trivial: a filter function receives a chunk of input data,
|
||||||
and returns a chunk of processed data. When there are no
|
and returns a chunk of processed data. When there are no
|
||||||
more input data left, the caller notifies the filter by invoking
|
more input data left, the caller notifies the filter by invoking
|
||||||
it with a \nil\ chunk. The filter responds by returning
|
it with a \texttt{nil} chunk. The filter responds by returning
|
||||||
the final chunk of processed data (which could of course be
|
the final chunk of processed data.
|
||||||
the empty string).
|
|
||||||
|
|
||||||
Although the interface is extremely simple, the
|
Although the interface is extremely simple, the
|
||||||
implementation is not so obvious. A normalization filter
|
implementation is not so obvious. A normalization filter
|
||||||
respecting this interface needs to keep some kind of context
|
respecting this interface needs to keep some kind of context
|
||||||
between calls. This is because a chunk boundary may lie between
|
between calls. This is because a chunk boundary may lie between
|
||||||
the \CR\ and \LF\ characters marking the end of a single line. This
|
the CR and LF characters marking the end of a line. This
|
||||||
need for contextual storage motivates the use of
|
need for contextual storage motivates the use of
|
||||||
factories: each time the factory is invoked, it returns a
|
factories: each time the factory is invoked, it returns a
|
||||||
filter with its own context so that we can have several
|
filter with its own context so that we can have several
|
||||||
independent filters being used at the same time. For
|
independent filters being used at the same time. For
|
||||||
efficiency reasons, we must avoid the obvious solution of
|
efficiency reasons, we must avoid the obvious solution of
|
||||||
concatenating all the input into the context before
|
concatenating all the input into the context before
|
||||||
producing any output chunks.
|
producing any output.
|
||||||
|
|
||||||
To that end, we break the implementation into two parts:
|
To that end, we break the implementation into two parts:
|
||||||
a low-level filter, and a factory of high-level filters. The
|
a low-level filter, and a factory of high-level filters. The
|
||||||
@ -171,10 +167,10 @@ end-of-line normalization filters:
|
|||||||
\begin{quote}
|
\begin{quote}
|
||||||
\begin{lua}
|
\begin{lua}
|
||||||
@stick#
|
@stick#
|
||||||
function filter.cycle(lowlevel, context, extra)
|
function filter.cycle(low, ctx, extra)
|
||||||
return function(chunk)
|
return function(chunk)
|
||||||
local ret
|
local ret
|
||||||
ret, context = lowlevel(context, chunk, extra)
|
ret, ctx = low(ctx, chunk, extra)
|
||||||
return ret
|
return ret
|
||||||
end
|
end
|
||||||
end
|
end
|
||||||
@ -182,30 +178,27 @@ end
|
|||||||
|
|
||||||
@stick#
|
@stick#
|
||||||
function normalize(marker)
|
function normalize(marker)
|
||||||
return filter.cycle(eol, 0, marker)
|
return cycle(eol, 0, marker)
|
||||||
end
|
end
|
||||||
%
|
%
|
||||||
\end{lua}
|
\end{lua}
|
||||||
\end{quote}
|
\end{quote}
|
||||||
|
|
||||||
The \texttt{normalize} factory simply calls a more generic
|
The \texttt{normalize} factory simply calls a more generic
|
||||||
factory, the \texttt{cycle}~factory, passing the low-level
|
factory, the \texttt{cycle} factory. This factory receives a
|
||||||
filter~\texttt{eol}. The \texttt{cycle}~factory receives a
|
|
||||||
low-level filter, an initial context, and an extra
|
low-level filter, an initial context, and an extra
|
||||||
parameter, and returns a new high-level filter. Each time
|
parameter, and returns a new high-level filter. Each time
|
||||||
the high-level filter is passed a new chunk, it invokes the
|
the high-level filter is passed a new chunk, it invokes the
|
||||||
low-level filter with the previous context, the new chunk,
|
low-level filter with the previous context, the new chunk,
|
||||||
and the extra argument. It is the low-level filter that
|
and the extra argument. It is the low-level filter that
|
||||||
does all the work, producing the chunk of processed data and
|
does all the work, producing the chunk of processed data and
|
||||||
a new context. The high-level filter then replaces its
|
a new context. The high-level filter then updates its
|
||||||
internal context, and returns the processed chunk of data to
|
internal context, and returns the processed chunk of data to
|
||||||
the user. Notice that we take advantage of Lua's lexical
|
the user. Notice that we take advantage of Lua's lexical
|
||||||
scoping to store the context in a closure between function
|
scoping to store the context in a closure between function
|
||||||
calls.
|
calls.
|
||||||
|
|
||||||
\subsection{The C part of the filter}
|
Concerning the low-level filter code, we must first accept
|
||||||
|
|
||||||
As for the low-level filter, we must first accept
|
|
||||||
that there is no perfect solution to the end-of-line marker
|
that there is no perfect solution to the end-of-line marker
|
||||||
normalization problem. The difficulty comes from an
|
normalization problem. The difficulty comes from an
|
||||||
inherent ambiguity in the definition of empty lines within
|
inherent ambiguity in the definition of empty lines within
|
||||||
@ -215,39 +208,39 @@ mixed input. It also does a reasonable job with empty lines
|
|||||||
and serves as a good example of how to implement a low-level
|
and serves as a good example of how to implement a low-level
|
||||||
filter.
|
filter.
|
||||||
|
|
||||||
The idea is to consider both \CR\ and~\LF\ as end-of-line
|
The idea is to consider both CR and~LF as end-of-line
|
||||||
\emph{candidates}. We issue a single break if any candidate
|
\emph{candidates}. We issue a single break if any candidate
|
||||||
is seen alone, or if it is followed by a different
|
is seen alone, or followed by a different candidate. In
|
||||||
candidate. In other words, \CR~\CR~and \LF~\LF\ each issue
|
other words, CR~CR~and LF~LF each issue two end-of-line
|
||||||
two end-of-line markers, whereas \CR~\LF~and \LF~\CR\ issue
|
markers, whereas CR~LF~and LF~CR issue only one marker each.
|
||||||
only one marker each. It is easy to see that this method
|
This method correctly handles the Unix, DOS/MIME, VMS, and Mac
|
||||||
correctly handles the most common end-of-line conventions.
|
OS conventions.
|
||||||
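The candidate rule described above can be sketched in plain C without the Lua buffer interface. This is only an illustration under stated assumptions: the name `normalize_chunk`, the caller-provided output buffer, and the `IS_CANDIDATE` macro are hypothetical and are not part of LuaSocket. The context is 0, CR, or LF, and it is returned so that the next chunk can pick up where this one stopped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CR '\r'
#define LF '\n'
#define IS_CANDIDATE(c) ((c) == CR || (c) == LF)

/* Hypothetical helper: appends the normalized form of `in` to `out` and
 * returns the new context. `ctx` is 0, CR, or LF: the pending end-of-line
 * candidate, if any. `*olen` is the number of bytes already in `out`.
 * The caller sizes `out`; for a two-byte marker, 2 * ilen additional
 * bytes always suffice. */
static int normalize_chunk(const char *in, size_t ilen, int ctx,
                           const char *marker, char *out, size_t *olen) {
    size_t mlen = strlen(marker);
    assert(in != NULL || ilen == 0);
    for (size_t i = 0; i < ilen; i++) {
        int c = (unsigned char) in[i];
        if (IS_CANDIDATE(c)) {
            if (IS_CANDIDATE(ctx)) {
                /* The same candidate twice means two line breaks; a
                 * different one completes a pair already emitted. */
                if (c == ctx) {
                    memcpy(out + *olen, marker, mlen);
                    *olen += mlen;
                }
                ctx = 0;
            } else {
                /* First candidate seen: emit one break and remember it. */
                memcpy(out + *olen, marker, mlen);
                *olen += mlen;
                ctx = c;
            }
        } else {
            /* Ordinary byte: copy it and clear the context. */
            out[(*olen)++] = (char) c;
            ctx = 0;
        }
    }
    return ctx;
}
```

Because the context travels between calls, a CR at the end of one chunk and an LF at the start of the next still produce a single marker, which is the chunk-boundary behavior the text asks for.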
|
|
||||||
With this in mind, we divide the low-level filter into two
|
\subsection{The C part of the filter}
|
||||||
simple functions. The inner function~\texttt{pushchar} performs the
|
|
||||||
normalization itself. It takes each input character in turn,
|
Our low-level filter is divided into two simple functions.
|
||||||
deciding what to output and how to modify the context. The
|
The inner function performs the normalization itself. It takes
|
||||||
context tells if the last processed character was an
|
each input character in turn, deciding what to output and
|
||||||
end-of-line candidate, and if so, which candidate it was.
|
how to modify the context. The context tells if the last
|
||||||
For efficiency, we use Lua's auxiliary library's buffer
|
processed character was an end-of-line candidate, and if so,
|
||||||
interface:
|
which candidate it was. For efficiency, it uses
|
||||||
|
Lua's auxiliary library's buffer interface:
|
||||||
\begin{quote}
|
\begin{quote}
|
||||||
\begin{C}
|
\begin{C}
|
||||||
@stick#
|
@stick#
|
||||||
@#define candidate(c) (c == CR || c == LF)
|
@#define candidate(c) (c == CR || c == LF)
|
||||||
static int pushchar(int c, int last, const char *marker,
|
static int process(int c, int last, const char *marker,
|
||||||
luaL_Buffer *buffer) {
|
luaL_Buffer *buffer) {
|
||||||
if (candidate(c)) {
|
if (candidate(c)) {
|
||||||
if (candidate(last)) {
|
if (candidate(last)) {
|
||||||
if (c == last)
|
if (c == last) luaL_addstring(buffer, marker);
|
||||||
luaL_addstring(buffer, marker);
|
|
||||||
return 0;
|
return 0;
|
||||||
} else {
|
} else {
|
||||||
luaL_addstring(buffer, marker);
|
luaL_addstring(buffer, marker);
|
||||||
return c;
|
return c;
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
luaL_pushchar(buffer, c);
|
luaL_putchar(buffer, c);
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -255,20 +248,15 @@ static int pushchar(int c, int last, const char *marker,
|
|||||||
\end{C}
|
\end{C}
|
||||||
\end{quote}
|
\end{quote}
|
||||||
|
|
||||||
The outer function~\texttt{eol} simply interfaces with Lua.
|
The outer function simply interfaces with Lua. It receives the
|
||||||
It receives the context and input chunk (as well as an
|
context and input chunk (as well as an optional
|
||||||
optional custom end-of-line marker), and returns the
|
custom end-of-line marker), and returns the transformed
|
||||||
transformed output chunk and the new context.
|
output chunk and the new context:
|
||||||
Notice that if the input chunk is \nil, the operation
|
|
||||||
is considered to be finished. In that case, the loop will
|
|
||||||
not execute a single time and the context is reset to the
|
|
||||||
initial state. This allows the filter to be reused many
|
|
||||||
times:
|
|
||||||
\begin{quote}
|
\begin{quote}
|
||||||
\begin{C}
|
\begin{C}
|
||||||
@stick#
|
@stick#
|
||||||
static int eol(lua_State *L) {
|
static int eol(lua_State *L) {
|
||||||
int context = luaL_checkint(L, 1);
|
int ctx = luaL_checkint(L, 1);
|
||||||
size_t isize = 0;
|
size_t isize = 0;
|
||||||
const char *input = luaL_optlstring(L, 2, NULL, &isize);
|
const char *input = luaL_optlstring(L, 2, NULL, &isize);
|
||||||
const char *last = input + isize;
|
const char *last = input + isize;
|
||||||
@ -281,18 +269,24 @@ static int eol(lua_State *L) {
|
|||||||
return 2;
|
return 2;
|
||||||
}
|
}
|
||||||
while (input < last)
|
while (input < last)
|
||||||
context = pushchar(*input++, context, marker, &buffer);
|
ctx = process(*input++, ctx, marker, &buffer);
|
||||||
luaL_pushresult(&buffer);
|
luaL_pushresult(&buffer);
|
||||||
lua_pushnumber(L, context);
|
lua_pushnumber(L, ctx);
|
||||||
return 2;
|
return 2;
|
||||||
}
|
}
|
||||||
%
|
%
|
||||||
\end{C}
|
\end{C}
|
||||||
\end{quote}
|
\end{quote}
|
||||||
|
|
||||||
|
Notice that if the input chunk is \texttt{nil}, the operation
|
||||||
|
is considered to be finished. In that case, the loop will
|
||||||
|
not execute a single time and the context is reset to the
|
||||||
|
initial state. This allows the filter to be reused many
|
||||||
|
times.
|
||||||
|
|
||||||
When designing your own filters, the challenging part is to
|
When designing your own filters, the challenging part is to
|
||||||
decide what will be in the context. For line breaking, for
|
decide what will be in the context. For line breaking, for
|
||||||
instance, it could be the number of bytes that still fit in the
|
instance, it could be the number of bytes left in the
|
||||||
current line. For Base64 encoding, it could be a string
|
current line. For Base64 encoding, it could be a string
|
||||||
with the bytes that remain after the division of the input
|
with the bytes that remain after the division of the input
|
||||||
into 3-byte atoms. The MIME module in the \texttt{LuaSocket}
|
into 3-byte atoms. The MIME module in the \texttt{LuaSocket}
|
||||||
@ -300,22 +294,19 @@ distribution has many other examples.
|
|||||||
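To make the line-breaking case concrete, here is a minimal sketch of a low-level line breaker whose context is the number of bytes that still fit in the current line. The function name `wrap_chunk`, its signature, and the use of a bare LF as the break are assumptions made for illustration; the actual LuaSocket MIME module may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical low-level line breaker. `left` is the context: how many
 * more bytes fit in the current output line. `length` is the maximum
 * line length. Returns the updated context. The caller sizes `out`;
 * 2 * ilen additional bytes always suffice. */
static size_t wrap_chunk(const char *in, size_t ilen, size_t left,
                         size_t length, char *out, size_t *olen) {
    assert(length > 0 && left <= length);
    for (size_t i = 0; i < ilen; i++) {
        char c = in[i];
        if (c == '\n') {
            /* An existing break starts a fresh line. */
            out[(*olen)++] = c;
            left = length;
            continue;
        }
        if (left == 0) {
            /* The line is full: break before emitting the next byte. */
            out[(*olen)++] = '\n';
            left = length;
        }
        out[(*olen)++] = c;
        left--;
    }
    return left;
}
```

A single integer is enough context here because the only thing the filter needs to remember across chunk boundaries is how far into the current line it already is.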
|
|
||||||
\section{Filter chains}
|
\section{Filter chains}
|
||||||
|
|
||||||
Chains greatly increase the power of filters. For example,
|
Chains add a lot to the power of filters. For example,
|
||||||
according to the standard for Quoted-Printable encoding,
|
according to the standard for Quoted-Printable encoding,
|
||||||
text should be normalized to a canonic end-of-line marker
|
text must be normalized to a canonic end-of-line marker
|
||||||
prior to encoding. After encoding, the resulting text must
|
prior to encoding. To help specify complex
|
||||||
be broken into lines of no more than 76 characters, with the
|
transformations like this, we define a chain factory that
|
||||||
use of soft line breaks (a line terminated by the \texttt{=}
|
creates a composite filter from one or more filters. A
|
||||||
sign). To help specify complex transformations like
|
chained filter passes data through all its components, and
|
||||||
this, we define a chain factory that creates a composite
|
can be used wherever a primitive filter is accepted.
|
||||||
filter from one or more filters. A chained filter passes
|
|
||||||
data through all its components, and can be used wherever a
|
|
||||||
primitive filter is accepted.
|
|
||||||
|
|
||||||
The chaining factory is very simple. The auxiliary
|
The chaining factory is very simple. The auxiliary
|
||||||
function~\texttt{chainpair} chains two filters together,
|
function~\texttt{chainpair} chains two filters together,
|
||||||
taking special care if the chunk is the last. This is
|
taking special care if the chunk is the last. This is
|
||||||
because the final \nil\ chunk notification has to be
|
because the final \texttt{nil} chunk notification has to be
|
||||||
pushed through both filters in turn:
|
pushed through both filters in turn:
|
||||||
\begin{quote}
|
\begin{quote}
|
||||||
\begin{lua}
|
\begin{lua}
|
||||||
@ -331,9 +322,9 @@ end
|
|||||||
|
|
||||||
@stick#
|
@stick#
|
||||||
function filter.chain(...)
|
function filter.chain(...)
|
||||||
local f = select(1, ...)
|
local f = arg[1]
|
||||||
for i = 2, select('@#', ...) do
|
for i = 2, @#arg do
|
||||||
f = chainpair(f, select(i, ...))
|
f = chainpair(f, arg[i])
|
||||||
end
|
end
|
||||||
return f
|
return f
|
||||||
end
|
end
|
||||||
@ -346,11 +337,11 @@ define the Quoted-Printable conversion as such:
|
|||||||
\begin{quote}
|
\begin{quote}
|
||||||
\begin{lua}
|
\begin{lua}
|
||||||
@stick#
|
@stick#
|
||||||
local qp = filter.chain(normalize(CRLF), encode("quoted-printable"),
|
local qp = filter.chain(normalize("\r\n"),
|
||||||
wrap("quoted-printable"))
|
encode("quoted-printable"))
|
||||||
local input = source.chain(source.file(io.stdin), qp)
|
local in = source.chain(source.file(io.stdin), qp)
|
||||||
local output = sink.file(io.stdout)
|
local out = sink.file(io.stdout)
|
||||||
pump.all(input, output)
|
pump.all(in, out)
|
||||||
%
|
%
|
||||||
\end{lua}
|
\end{lua}
|
||||||
\end{quote}
|
\end{quote}
|
||||||
@ -369,14 +360,14 @@ gives a final destination to the data.
|
|||||||
\subsection{Sources}
|
\subsection{Sources}
|
||||||
|
|
||||||
A source returns the next chunk of data each time it is
|
A source returns the next chunk of data each time it is
|
||||||
invoked. When there is no more data, it simply returns~\nil.
|
invoked. When there is no more data, it simply returns
|
||||||
In the event of an error, the source can inform the
|
\texttt{nil}. In the event of an error, the source can inform the
|
||||||
caller by returning \nil\ followed by the error message.
|
caller by returning \texttt{nil} followed by an error message.
|
||||||
|
|
||||||
Below are two simple source factories. The \texttt{empty} source
|
Below are two simple source factories. The \texttt{empty} source
|
||||||
returns no data, possibly returning an associated error
|
returns no data, possibly returning an associated error
|
||||||
message. The \texttt{file} source yields the contents of a file
|
message. The \texttt{file} source works harder, and
|
||||||
in a chunk by chunk fashion:
|
yields the contents of a file in a chunk by chunk fashion:
|
||||||
\begin{quote}
|
\begin{quote}
|
||||||
\begin{lua}
|
\begin{lua}
|
||||||
@stick#
|
@stick#
|
||||||
@ -407,7 +398,7 @@ A filtered source passes its data through the
|
|||||||
associated filter before returning it to the caller.
|
associated filter before returning it to the caller.
|
||||||
Filtered sources are useful when working with
|
Filtered sources are useful when working with
|
||||||
functions that get their input data from a source (such as
|
functions that get their input data from a source (such as
|
||||||
the pumps in our examples). By chaining a source with one or
|
the pump in our first example). By chaining a source with one or
|
||||||
more filters, the function can be transparently provided
|
more filters, the function can be transparently provided
|
||||||
with filtered data, with no need to change its interface.
|
with filtered data, with no need to change its interface.
|
||||||
Here is a factory that does the job:
|
Here is a factory that does the job:
|
||||||
@ -415,18 +406,14 @@ Here is a factory that does the job:
|
|||||||
\begin{lua}
|
\begin{lua}
|
||||||
@stick#
|
@stick#
|
||||||
function source.chain(src, f)
|
function source.chain(src, f)
|
||||||
return function()
|
return source.simplify(function()
|
||||||
if not src then
|
if not src then return nil end
|
||||||
return nil
|
|
||||||
end
|
|
||||||
local chunk, err = src()
|
local chunk, err = src()
|
||||||
if not chunk then
|
if not chunk then
|
||||||
src = nil
|
src = nil
|
||||||
return f(nil)
|
return f(nil)
|
||||||
else
|
else return f(chunk) end
|
||||||
return f(chunk)
|
end)
|
||||||
end
|
|
||||||
end
|
|
||||||
end
|
end
|
||||||
%
|
%
|
||||||
\end{lua}
|
\end{lua}
|
||||||
@ -434,20 +421,20 @@ end
|
|||||||
|
|
||||||
\subsection{Sinks}
|
\subsection{Sinks}
|
||||||
|
|
||||||
Just as we defined an interface for a source of data,
|
Just as we defined an interface for a data source,
|
||||||
we can also define an interface for a data destination.
|
we can also define an interface for a data destination.
|
||||||
We call any function respecting this
|
We call any function respecting this
|
||||||
interface a \emph{sink}. In our first example, we used a
|
interface a \emph{sink}. In our first example, we used a
|
||||||
file sink connected to the standard output.
|
file sink connected to the standard output.
|
||||||
|
|
||||||
Sinks receive consecutive chunks of data, until the end of
|
Sinks receive consecutive chunks of data, until the end of
|
||||||
data is signaled by a \nil\ input chunk. A sink can be
|
data is signaled by a \texttt{nil} chunk. A sink can be
|
||||||
notified of an error with an optional extra argument that
|
notified of an error with an optional extra argument that
|
||||||
contains the error message, following a \nil\ chunk.
|
contains the error message, following a \texttt{nil} chunk.
|
||||||
If a sink detects an error itself, and
|
If a sink detects an error itself, and
|
||||||
wishes not to be called again, it can return \nil,
|
wishes not to be called again, it can return \texttt{nil},
|
||||||
followed by an error message. A return value that
|
followed by an error message. A return value that
|
||||||
is not \nil\ means the sink will accept more data.
|
is not \texttt{nil} means the sink will accept more data.
|
||||||
|
|
||||||
Below are two useful sink factories.
|
Below are two useful sink factories.
|
||||||
The table factory creates a sink that stores
|
The table factory creates a sink that stores
|
||||||
@ -482,7 +469,7 @@ end
|
|||||||
|
|
||||||
Naturally, filtered sinks are just as useful as filtered
|
Naturally, filtered sinks are just as useful as filtered
|
||||||
sources. A filtered sink passes each chunk it receives
|
sources. A filtered sink passes each chunk it receives
|
||||||
through the associated filter before handing it down to the
|
through the associated filter before handing it to the
|
||||||
original sink. In the following example, we use a source
|
original sink. In the following example, we use a source
|
||||||
that reads from the standard input. The input chunks are
|
that reads from the standard input. The input chunks are
|
||||||
sent to a table sink, which has been coupled with a
|
sent to a table sink, which has been coupled with a
|
||||||
@ -492,10 +479,10 @@ standard out:
|
|||||||
\begin{quote}
|
\begin{quote}
|
||||||
\begin{lua}
|
\begin{lua}
|
||||||
@stick#
|
@stick#
|
||||||
local input = source.file(io.stdin)
|
local in = source.file(io.stdin)
|
||||||
local output, t = sink.table()
|
local out, t = sink.table()
|
||||||
output = sink.chain(normalize(CRLF), output)
|
out = sink.chain(normalize("\r\n"), out)
|
||||||
pump.all(input, output)
|
pump.all(in, out)
|
||||||
io.write(table.concat(t))
|
io.write(table.concat(t))
|
||||||
%
|
%
|
||||||
\end{lua}
|
\end{lua}
|
||||||
@ -503,11 +490,11 @@ io.write(table.concat(t))
|
|||||||
|
|
||||||
\subsection{Pumps}
|
\subsection{Pumps}
|
||||||
|
|
||||||
Although not on purpose, our interface for sources is
|
Adrian Sietsma noticed that, although not on purpose, our
|
||||||
compatible with Lua iterators. That is, a source can be
|
interface for sources is compatible with Lua iterators.
|
||||||
neatly used in conjunction with \texttt{for} loops. Using
|
That is, a source can be neatly used in conjunction
|
||||||
our file source as an iterator, we can write the following
|
with \texttt{for} loops. Using our file
|
||||||
code:
|
source as an iterator, we can write the following code:
|
||||||
\begin{quote}
|
\begin{quote}
|
||||||
\begin{lua}
|
\begin{lua}
|
||||||
@stick#
|
@stick#
|
||||||
@ -552,22 +539,20 @@ end
|
|||||||
The \texttt{pump.step} function moves one chunk of data from
|
The \texttt{pump.step} function moves one chunk of data from
|
||||||
the source to the sink. The \texttt{pump.all} function takes
|
the source to the sink. The \texttt{pump.all} function takes
|
||||||
an optional \texttt{step} function and uses it to pump all the
|
an optional \texttt{step} function and uses it to pump all the
|
||||||
data from the source to the sink.
|
data from the source to the sink. We can now use everything
|
||||||
Here is an example that uses the Base64 and the
|
we have to write a program that reads a binary file from
|
||||||
line wrapping filters from the \texttt{LuaSocket}
|
|
||||||
distribution. The program reads a binary file from
|
|
||||||
disk and stores it in another file, after encoding it to the
|
disk and stores it in another file, after encoding it to the
|
||||||
Base64 transfer content encoding:
|
Base64 transfer content encoding:
|
||||||
\begin{quote}
|
\begin{quote}
|
||||||
\begin{lua}
|
\begin{lua}
|
||||||
@stick#
|
@stick#
|
||||||
local input = source.chain(
|
local in = source.chain(
|
||||||
source.file(io.open("input.bin", "rb")),
|
source.file(io.open("input.bin", "rb")),
|
||||||
encode("base64"))
|
encode("base64"))
|
||||||
local output = sink.chain(
|
local out = sink.chain(
|
||||||
wrap(76),
|
wrap(76),
|
||||||
sink.file(io.open("output.b64", "w")))
|
sink.file(io.open("output.b64", "w")))
|
||||||
pump.all(input, output)
|
pump.all(in, out)
|
||||||
%
|
%
|
||||||
\end{lua}
|
\end{lua}
|
||||||
\end{quote}
|
\end{quote}
|
||||||
@ -576,17 +561,19 @@ The way we split the filters here is not intuitive, on
|
|||||||
purpose. Alternatively, we could have chained the Base64
|
purpose. Alternatively, we could have chained the Base64
|
||||||
encode filter and the line-wrap filter together, and then
|
encode filter and the line-wrap filter together, and then
|
||||||
chain the resulting filter with either the file source or
|
chain the resulting filter with either the file source or
|
||||||
the file sink. It doesn't really matter.
|
the file sink. It doesn't really matter. The Base64 and the
|
||||||
|
line wrapping filters are part of the \texttt{LuaSocket}
|
||||||
|
distribution.
|
||||||
|
|
||||||
\section{Exploding filters}
|
\section{Exploding filters}
|
||||||
|
|
||||||
Our current filter interface has one serious shortcoming.
|
Our current filter interface has one flagrant shortcoming.
|
||||||
Consider for example a \texttt{gzip} decompression filter.
|
When David Burgess was writing his \texttt{gzip} filter, he
|
||||||
During decompression, a small input chunk can be exploded
|
noticed that a decompression filter can explode a small
|
||||||
into a huge amount of data. To address this problem, we
|
input chunk into a huge amount of data. To address this
|
||||||
decided to change the filter interface and allow exploding
|
problem, we decided to change the filter interface and allow
|
||||||
filters to return large quantities of output data in a chunk
|
exploding filters to return large quantities of output data
|
||||||
by chunk manner.
|
in a chunk by chunk manner.
|
||||||
|
|
||||||
More specifically, after passing each chunk of input to
|
More specifically, after passing each chunk of input to
|
||||||
a filter, and collecting the first chunk of output, the
|
a filter, and collecting the first chunk of output, the
|
||||||
@ -595,11 +582,11 @@ filtered data is left. Within these secondary calls, the
|
|||||||
caller passes an empty string to the filter. The filter
|
caller passes an empty string to the filter. The filter
|
||||||
responds with an empty string when it is ready for the next
|
responds with an empty string when it is ready for the next
|
||||||
input chunk. In the end, after the user passes a
|
input chunk. In the end, after the user passes a
|
||||||
\nil\ chunk notifying the filter that there is no
|
\texttt{nil} chunk notifying the filter that there is no
|
||||||
more input data, the filter might still have to produce too
|
more input data, the filter might still have to produce too
|
||||||
much output data to return in a single chunk. The user has
|
much output data to return in a single chunk. The user has
|
||||||
to loop again, now passing \nil\ to the filter each time,
|
to loop again, now passing \texttt{nil} to the filter each time,
|
||||||
until the filter itself returns \nil\ to notify the
|
until the filter itself returns \texttt{nil} to notify the
|
||||||
user it is finally done.
|
user it is finally done.
|
||||||
|
|
||||||
Fortunately, it is very easy to modify a filter to respect
|
Fortunately, it is very easy to modify a filter to respect
|
||||||
@ -617,8 +604,8 @@ filters practical.
|
|||||||
\section{A complex example}
|
\section{A complex example}
|
||||||
|
|
||||||
The LTN12 module in the \texttt{LuaSocket} distribution
|
The LTN12 module in the \texttt{LuaSocket} distribution
|
||||||
implements all the ideas we have described. The MIME
|
implements the ideas we have described. The MIME
|
||||||
and SMTP modules are tightly integrated with LTN12,
|
and SMTP modules are especially integrated with LTN12,
|
||||||
and can be used to showcase the expressive power of filters,
|
and can be used to showcase the expressive power of filters,
|
||||||
sources, sinks, and pumps. Below is an example
|
sources, sinks, and pumps. Below is an example
|
||||||
of how a user would proceed to define and send a
|
of how a user would proceed to define and send a
|
||||||
@ -635,9 +622,9 @@ local message = smtp.message{
|
|||||||
to = "Fulano <fulano@example.com>",
|
to = "Fulano <fulano@example.com>",
|
||||||
subject = "A message with an attachment"},
|
subject = "A message with an attachment"},
|
||||||
body = {
|
body = {
|
||||||
preamble = "Hope you can see the attachment" .. CRLF,
|
preamble = "Hope you can see the attachment\r\n",
|
||||||
[1] = {
|
[1] = {
|
||||||
body = "Here is our logo" .. CRLF},
|
body = "Here is our logo\r\n"},
|
||||||
[2] = {
|
[2] = {
|
||||||
headers = {
|
headers = {
|
||||||
["content-type"] = 'image/png; name="luasocket.png"',
|
["content-type"] = 'image/png; name="luasocket.png"',
|
||||||
@ -678,18 +665,6 @@ abstraction for final data destinations. Filters define an
|
|||||||
interface for data transformations. The chaining of
|
interface for data transformations. The chaining of
|
||||||
filters, sources and sinks provides an elegant way to create
|
filters, sources and sinks provides an elegant way to create
|
||||||
arbitrarily complex data transformations from simpler
|
arbitrarily complex data transformations from simpler
|
||||||
components. Pumps simply push the data through.
|
components. Pumps simply move the data through.
|
||||||
|
|
||||||
\section{Acknowledgements}
|
|
||||||
|
|
||||||
The concepts described in this text are the result of long
|
|
||||||
discussions with David Burgess. A version of this text has
|
|
||||||
been released on-line as the Lua Technical Note 012, hence
|
|
||||||
the name of the corresponding LuaSocket module,
|
|
||||||
\texttt{ltn12}. Wim Couwenberg contributed to the
|
|
||||||
implementation of the module, and Adrian Sietsma was the
|
|
||||||
first to notice the correspondence between sources and Lua
|
|
||||||
iterators.
|
|
||||||
|
|
||||||
|
|
||||||
\end{document}
|
\end{document}
|
||||||
|
@ -12,12 +12,3 @@ clean:
|
|||||||
|
|
||||||
pdf: ltn012.pdf
|
pdf: ltn012.pdf
|
||||||
open ltn012.pdf
|
open ltn012.pdf
|
||||||
|
|
||||||
test: gem.so
|
|
||||||
|
|
||||||
|
|
||||||
gem.o: gem.c
|
|
||||||
gcc -c -o gem.o -Wall -ansi -W -O2 gem.c
|
|
||||||
|
|
||||||
gem.so: gem.o
|
|
||||||
export MACOSX_DEPLOYMENT_TARGET="10.3"; gcc -bundle -undefined dynamic_lookup -o gem.so gem.o
|
|
||||||
|