Notes on Apache filters

Study 1

What are you talking about? RemoveOutputFilter .shtml DOESN'T REMOVE A FILTER!!!
It eliminates the association between .shtml and whatever filters had otherwise been configured with AddOutputFilter somefilters .shtml. It works _exactly_ like every other AddSomething/RemoveSomething directive for MIME association.
Notice that your DeleteFilter does exactly that: it would delete a filter added by ANY METHOD :) So that's pretty unambiguous. DeleteOutputFilter Includes would take out that filter whether it was added by AddOutputFilter, AddOutputFilterByType, SetOutputFilter, or InsertOutputFilter. Pretty powerful, very useful.
Bill

How to use ap_remove_output_filter

Study 2

In mod_perl 2.0 we register only four filter names (in:out:req:conn) and then we install the actual perl callbacks using one of these four filter names,
storing the actual filter's callback information in f->ctx. If later on we want to do something with an inserted filter we have no API to find it, since
it's not identified by name, but by the data inside f->ctx. Therefore we need a new API to traverse the filter chain and find what we want using a custom
callback.

Here is the API and implementation that I came up with (it needs the docco and the standard DECLARE stuff, which I ask to disregard for now and concentrate
on the API/implementation itself; once it's polished I'll post a complete patch).

The function that I'm requesting to add is somewhat similar to apr_table_do.

typedef int (ap_filter_chain_traverse_fh_t)(void *data, ap_filter_t *f);
int ap_filter_chain_traverse(ap_filter_chain_traverse_fh_t *traverse,
                             void *data, ap_filter_t **chain);

int ap_filter_chain_traverse(ap_filter_chain_traverse_fh_t *traverse,
                             void *data, ap_filter_t **chain)
{
    int rv = 0;
    ap_filter_t *curr = *chain;

    /* call the callback on each filter; a non-zero return stops the walk */
    while (curr) {
        if ((rv = (*traverse)(data, curr)) != 0) {
            return rv;
        }
        curr = curr->next;
    }
    return rv;
}

I'm not sure regarding the chain argument. Looking at util_filter.c, it looks in both the proto_ and the normal chain. Could there be a problem on the caller side?
Won't it be enough to use one of r->output_filters, c->output_filters, r->input_filters, c->input_filters if all I care about is the custom filters?
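
To make the proposed traversal easier to try outside of httpd, here is a minimal, self-contained sketch of the same idea. Everything in it is an assumption made for illustration: mock_filter is a hypothetical stand-in for ap_filter_t with only the fields the walk needs, and find_filter_by_ctx is a made-up helper, not part of Apache or mod_perl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ap_filter_t: only the fields this sketch needs. */
typedef struct mock_filter {
    const char *name;          /* registered filter name */
    const void *ctx;           /* per-filter private data (a string here) */
    struct mock_filter *next;  /* next filter in the chain */
} mock_filter;

typedef int (mock_traverse_fn)(void *data, mock_filter *f);

/* Visit each filter in order; stop as soon as the callback returns non-zero. */
static int mock_chain_traverse(mock_traverse_fn *fn, void *data,
                               mock_filter *chain)
{
    mock_filter *curr;

    assert(fn != NULL);
    for (curr = chain; curr != NULL; curr = curr->next) {
        int rv = fn(data, curr);
        if (rv != 0) {
            return rv;
        }
    }
    return 0;
}

/* Query passed through the void *data argument. */
typedef struct {
    const char *wanted_ctx;
    mock_filter *found;
} find_by_ctx;

/* Callback: match on the data stored in ctx rather than on the filter name. */
static int match_ctx(void *data, mock_filter *f)
{
    find_by_ctx *q = data;

    if (f->ctx != NULL && strcmp((const char *)f->ctx, q->wanted_ctx) == 0) {
        q->found = f;
        return 1; /* found, stop the walk */
    }
    return 0;
}

/* Convenience wrapper: returns the matching filter or NULL. */
static mock_filter *find_filter_by_ctx(mock_filter *chain, const char *wanted)
{
    find_by_ctx q = { wanted, NULL };

    mock_chain_traverse(match_ctx, &q, chain);
    return q.found;
}
```

The callback receives its search criteria and reports its result through the same void *data pointer, which is the apr_table_do-like shape the proposal describes.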

Here is an example of usage by mod_perl 2.0. This is an implementation of a function that will remove a filter by its perl handler name, e.g.:

$r->remove_output_filter("TestHandler::out_filter");

There is a wrapper that translates this perl-side call to C's modperl_filter_remove_by_handler_name().

typedef struct {
    char* filter_name;
    char* handler_name;
    ap_filter_t* f;
} filter_chain_traverse_t;

static int find_filter_by_handler_name(void *data, ap_filter_t *f)
{
    apr_pool_t* p = f->r ? f->r->pool : f->c->pool;
    filter_chain_traverse_t* traverse = (filter_chain_traverse_t*)data;
    char *normalized_name;

    /* 'name' in frec is always lowercased */
    normalized_name = apr_pstrdup(p, traverse->filter_name);
    ap_str_tolower(normalized_name);

    /* skip non-mod_perl filters */
    if (strNE(f->frec->name, normalized_name)) {
        return 0;
    } else {
        modperl_filter_ctx_t *ctx = f->ctx;
        if (strEQ(ctx->handler->name, traverse->handler_name)) {
            traverse->f = f;
            return 1; /* found what we wanted */
        }
    }

    return 0;
}

/* modperl_filter_remove_by_handler_name(aTHX_ r, c,
 *                                       MP_OUTPUT_FILTER_MODE,
 *                                       "MyFilter::output_lc")
 */
void modperl_filter_remove_by_handler_name(pTHX_ request_rec *r,
                                           conn_rec *c,
                                           modperl_filter_mode_e mode,
                                           char* handler_name)
{
    int rv = 0;
    apr_pool_t *pool = r ? r->pool : c->pool;
    ap_filter_t *f;
    /* allocate the whole struct, not just a pointer to it */
    filter_chain_traverse_t *traverse =
        apr_pcalloc(pool, sizeof(filter_chain_traverse_t));

    /* XXX: generalize for conn/req in/out */
    traverse->filter_name = MP_FILTER_REQUEST_OUTPUT_NAME;
    traverse->handler_name = handler_name;

    rv = ap_filter_chain_traverse(find_filter_by_handler_name, traverse,
                                  &r->output_filters); /* XXX: generalize */
    if (rv) {
        f = traverse->f; /* XXX: validate */
        MP_TRACE_f(MP_FUNC, "found filter handler %s\n", handler_name);
    }
    else {
        Perl_croak(aTHX_ "unable to find filter handler '%s'\n", handler_name);
    }

    MP_TRACE_f(MP_FUNC, "removing filter %s\n", handler_name);
    if (mode == MP_INPUT_FILTER_MODE) {
        ap_remove_input_filter(f);
    }
    else {
        ap_remove_output_filter(f);
    }
}

Study 3

Exactly. filter_init has a very specific purpose. Removing it makes certain classes of filters (like the now removed PHP httpd-2.x filter) unable to
work correctly. – justin

Study 4

As long as you can remove all filters in a subdirectory by specifying an empty 'SetOutputFilter' (or maybe even 'ClearOutputFilter'), the command
should be renamed to SetOutputFilterChain, since you are setting/removing the entire chain.
The other issue I have with filters is 'enabling' them. I looked at the INCLUDE and CASE filters, and they both have a method of turning them on (either by allowing
includes, in the mod-includes case, or by caseFilter On).
We need to come to an agreement on whether we need to do this. Setting the filter in the chain should enable it. Period.

..Ian

Study 5

This isn't possible. It was my first design, and it just doesn't work. First of all, figuring out which state you are in is hard at the
http_filter level, because you really don't know what the headers look like. In the future, I expect http_filter to be the thing that actually
generates headers_in, and then all the other filters will be installed on top of that.
The problem with having four different filters is: where does the brigade go when we are switching filters? Since the brigade has to sit in
somebody's ctx pointer, it needs to be associated with a filter. Because the core_filter doesn't know how much data to pass up at a time, the
brigade will almost always end up sitting in http_filter's ctx pointer. If you try to remove that filter the brigade disappears, and you
can't continue the request.
As far as http_filter switching on the state, that requires http_filter really knowing the protocol, because otherwise it can't determine what
state it is in. There is no communication between the request_rec and the http_filter. Since the request_rec is the thing that knows the state, the
http_filter doesn't. The only solution, is to add an http_body filter, which is also a bit buggy, because that assumes that http_filter will
never be called during body processing, which is just incorrect. As soon as we need data from the socket, we will call back down the stack, and
http_filter will need to know if we are in body or header state.
This is just wrong. I'm sorry. The filters were always designed to be one way and one way only. It is not possible for our filters to push data
back down. The thing is that currently we have a hack, because I want to make progress. Getline is bogus. http_filter should be the entity
creating the request_rec, and filling it out. That solves this whole issue, because then http_filter could actually really easily determine
which state it is in, and it can act accordingly. Plus, it makes adding different protocols as easy as replacing http_filter.
Those issues are gone now. The http_filter solved them. Please read what I wrote above, and then let me know if it still doesn't make sense as to
what some of the issues are here. The input filtering is non-intuitive, because we are operating on data on the way back up the stack. The
problem with that, is that it makes it much harder to add and remove filters while filtering data.
We can add/remove filters based on URI, but only at certain times. The problem with your patch from last week, is that it makes input filtering
look like output filtering, even though they are dramatically different to the end user.

Ryan
_______________________________________________________________________________

How I Think Filters Should Work

Let me be more precise. I'm not saying that we shouldn't use brigades. What I'm saying is we shouldn't be dealing with specific types
of data at this level. Right now, by requiring a filter to request "bytes" or "lines", we are seriously constraining the performance of
the filters. A filter should only inspect the types of the buckets it retrieves and then move on. The bytes should only come into play once
we have actually retrieved a bucket of a certain type that we are able to process.

Furthermore, we should be using a dynamic type system, and liberally creating new bucket types as we invent new implementations. Filters need
not know which filters are upstream or downstream from them, but they should have been strategically placed to consume certain buckets from
upstream filters and to produce certain buckets required by downstream filters.

[Warning: long-winded brainstorm follows:]

I want a typical filter chain to look like this:

input_source ---> protocol filters --> sub-protocol filters --> handlers

an input socket would produce this:

SOCKET
EOS

an http header parser filter would produce these:

HEADER
HEADER
HEADER
DATA (extra data read past headers)
SOCKET
EOS

an http request parser would only work at the request level, performing
dechunking, dealing with content-length, and dealing with pipelined
requests. It would produce these:

BEGIN_OF_REQUEST
HEADERS
BEGIN_OF_BODY_DATA
BODY_DATA
BODY_DATA
BODY_DATA
BODY_DATA
END_OF_BODY_DATA
TRAILERS...
END_OF_REQUEST
... and so on

a multipart input handler would then pass all types except BODY_DATA,
which it could use to produce:

...
MULTIPART_SECTION_BEGIN
BODY_DATA
MULTIPART_SECTION_END
...

or a magic mime filter could simply buffer enough BODY_DATA buckets until
it knew the type, prepending a MIME_TYPE to the front and sending
the whole thing downstream.

...
MIME_TYPE
BODY_DATA
BODY_DATA
...

The basic pattern for any input filter (which is pull-based at the moment in Apache) would be the following:

1. retrieve next "abstract data unit"
2. inspect "abstract data unit", can we operate on it?
3. if yes, operate_on(unit) and pass the result to the next filter.
4. if no, pass the current unit to the next filter.
5. go to #1

In this model, the operate_on() behavior has been separated from the mechanics of passing data around. I believe this would improve filter
performance as well as simplify the implementation details that module authors must understand. I also think this would dramatically
improve the extensibility of the Apache Filters system.
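
As a rough illustration of the five-step pattern above, here is a small self-contained sketch. It is not Apache code: the unit struct, the UNIT_* type names, and run_filter are all hypothetical, and an array stands in for the upstream and downstream filters. The example filter operates only on body data (it upper-cases it) and passes every other unit through untouched.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical unit types, loosely following the bucket names in the post. */
typedef enum { UNIT_HEADERS, UNIT_BODY_DATA, UNIT_EOS } unit_type;

typedef struct {
    unit_type type;
    char data[32];   /* NUL-terminated payload */
} unit;

/* operate_on(): the only place that touches bytes; here it upper-cases them. */
static void operate_on(unit *u)
{
    char *p;

    for (p = u->data; *p != '\0'; ++p) {
        *p = (char)toupper((unsigned char)*p);
    }
}

/* Run one filter over in[0..n) and write the results to out (which must
 * have room for n units). Returns the number of units passed downstream. */
static size_t run_filter(const unit *in, size_t n, unit *out)
{
    size_t i, produced = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; ++i) {
        unit u = in[i];                 /* 1. retrieve the next unit */
        if (u.type == UNIT_BODY_DATA) { /* 2. can we operate on it? */
            operate_on(&u);             /* 3. yes: transform it */
        }
        out[produced++] = u;            /* 3/4. pass it downstream */
        if (u.type == UNIT_EOS) {
            break;                      /* nothing follows end of stream */
        }
    }                                   /* 5. go back to step 1 */
    return produced;
}
```

The filter decides what to do purely from the unit's type; the payload bytes are only read inside operate_on().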

[Sorry for the long brain dump. Some of these ideas have been floating around in my head for a long time. When they become clear enough I will
write up a more formal and concise proposal on how I think the future filter system should work (possibly for 2.1 or beyond). I think the
apr-serf project is a perfect place to play with some of these ideas. I would appreciate any constructive comments on the above.]

-aaron

That's fine, as long as you ensure that the retrieval can be bounded. When the HTTP processor realizes that it can only read 100 more bytes from the
next-filter, then you're outside of "abstract data unit" and into "concrete 100 bytes."

Due to the presence of the Upgrade: header, an HTTP processing filter must always be per-request, and must never read past the end of its request. That
enforces a number of limitations on your design.

[Unless you go for "pushback", which I believe is a poor design.]

What would be neat is to have a connection-level filter that does HTTP processing, but can be signalled to morph itself into a simple buffer. For
example, let's say that filter pulls 10k from next-filter ("pull" here, remember). It parses up the data into some headers and a 500 byte body. It
has 9k leftover, which it holds to the side.

Now, the request processor sees an "Upgrade" and switches protocols to something else entirely. The connection filter gets demoted to a simple
buffer, returning the 9k without processing. When the buffer is empty, it removes itself from the filter stack.
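
The demotion described above could look roughly like the following sketch. All names here (conn_filter, conn_mode, hold_leftover, on_protocol_change, buffer_read) are hypothetical, and the fixed 64-byte array is only a stand-in for whatever storage a real filter would use for its leftover data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { MODE_HTTP, MODE_BUFFER, MODE_REMOVED } conn_mode;

/* Hypothetical connection-level filter state. */
typedef struct {
    conn_mode mode;
    char leftover[64];  /* bytes read past the end of the current request */
    size_t len;         /* number of leftover bytes held */
    size_t pos;         /* number of leftover bytes already handed out */
} conn_filter;

/* Stash data that was pulled from the next filter but not yet consumed. */
static void hold_leftover(conn_filter *f, const char *data, size_t n)
{
    assert(n <= sizeof f->leftover);
    memcpy(f->leftover, data, n);
    f->len = n;
    f->pos = 0;
}

/* Called on a protocol switch: stop HTTP processing and become a plain
 * buffer, or drop out immediately if nothing is being held. */
static void on_protocol_change(conn_filter *f)
{
    f->mode = (f->pos < f->len) ? MODE_BUFFER : MODE_REMOVED;
}

/* In buffer mode, return up to max held bytes without processing them.
 * Once the buffer is drained the filter marks itself as removed. */
static size_t buffer_read(conn_filter *f, char *dst, size_t max)
{
    size_t avail, n;

    if (f->mode != MODE_BUFFER) {
        return 0;
    }
    avail = f->len - f->pos;
    n = avail < max ? avail : max;
    memcpy(dst, f->leftover + f->pos, n);
    f->pos += n;
    if (f->pos == f->len) {
        f->mode = MODE_REMOVED;
    }
    return n;
}
```

In a real server "removed" would mean unlinking the filter from the stack; here it is only a state flag so that the sketch stays self-contained.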

The implication here is that filters need to register with particular hooks in the server. In particular, with a hook to state that a protocol change
has occurred on <this> connection (also implying an input and an output filter stack). The protocol-related filters in the stack can then take
appropriate action (in the above example, to disable HTTP processing and just be a buffer). Other subsystems may have also registered with the hook
and will *install* new protocol handler filters.

You could even use this protocol-change hook to set up the initial HTTP processing filters. Go from "null" protocol to "http", and that installs
your chunking, http processing, etc. It could even be the mechanism which tells the MPM to call ap_run_request (??) (the app-level thing which starts
sucking input from the filter stack and processing it).
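
One way to picture the protocol-change hook is as a small registry of callbacks that is run whenever the connection switches protocol. The sketch below is only an assumption about the shape such a hook might take; protocol_hooks, hook_protocol_change, and run_protocol_change are invented names, not existing httpd functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HOOKS 8

/* Callback invoked when the protocol on a connection changes. */
typedef void (protocol_change_fn)(void *subscriber, const char *new_protocol);

/* Hypothetical per-connection hook registry. */
typedef struct {
    protocol_change_fn *fn[MAX_HOOKS];
    void *subscriber[MAX_HOOKS];
    size_t count;
} protocol_hooks;

/* Register a callback; returns 0 on success, -1 if the registry is full. */
static int hook_protocol_change(protocol_hooks *h, protocol_change_fn *fn,
                                void *subscriber)
{
    assert(h != NULL && fn != NULL);
    if (h->count == MAX_HOOKS) {
        return -1;
    }
    h->fn[h->count] = fn;
    h->subscriber[h->count] = subscriber;
    h->count++;
    return 0;
}

/* Notify every registered subscriber, in registration order. */
static void run_protocol_change(protocol_hooks *h, const char *new_protocol)
{
    size_t i;

    for (i = 0; i < h->count; ++i) {
        h->fn[i](h->subscriber[i], new_protocol);
    }
}

/* Example subscriber: an HTTP filter that is active only for "http". */
typedef struct {
    int http_enabled;
} http_filter_state;

static void http_filter_on_change(void *subscriber, const char *new_protocol)
{
    http_filter_state *s = subscriber;

    s->http_enabled = (strcmp(new_protocol, "http") == 0);
}
```

Going from the "null" protocol to "http" enables the HTTP filter through the same hook that later disables it on an upgrade, which matches the symmetry suggested in the post.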
