Open Anonymity - The System's Architecture:
Open Anonymity consists of two Apache modules,
one for filtering the content (Filter Module)
and one for marking the content (XML Producer
Module). The XML Producer connects to the database
where the words to be anonymized are held, and
updates a file called openanonymity.xml in every
single directory. The filter reads the data from
this file and filters out the specified words. It
also filters words manually marked with <anonym>
tags.
Apache Filter Module:
The Filter is responsible for removing the <anonym>
tags and all words listed in the openanonymity.xml
file located in the directory of the current response.
An example: when a request is made for http://localhost/index.html,
a file http://localhost/openanonymity.xml should
be present. This file tells the filter which of the
words to be anonymized appear in the response. The
filter searches for these words and cuts them out
of the response. This approach was chosen for
performance reasons: there are simpler ways to do
it, but this way the filter never has to query the
database itself. If openanonymity.xml is not
present, the filter does nothing. It also skips
the request in two cases: when cookie checking is
enabled in httpd.conf and the user carries the
isHuman cookie, or, when cookie checking is
disabled, when the Browser-Id is not in the list
of known spiders.
Apache XML Producer:
Like the filter, the XML Producer works on one
specific URL at a time. It connects to the DB,
checks whether any words are specified for the
directory of this request, checks which of these
words really appear in this response, and writes
the list of those words into openanonymity.xml.
It only updates an existing file: if
openanonymity.xml is not present, it does nothing.
The module is intended to start working only when
the requester is known by IP address and Browser-Id,
both configurable in httpd.conf (not implemented yet).
OpenAnonymity.xml
The relevant parts of an openanonymity.xml file are:
The Config element: its list items are all the words
that can appear in any URL of this directory and
should be anonymized. This list can become very
large on bigger systems, so it is not used when
processing responses: the Filter ignores it, and
the XML Producer does not read it while DBConnect
is enabled; in that case it is written only for
logging purposes. When the XML Producer is started
with DBConnect off in httpd.conf, this list is used
to create the Page elements. In this case, and only
in this case, manually inserted items make sense.
The User-agent-trust and anonymize tags are not
implemented yet. The first is meant to switch
between trusting the Browser-Id and forcing a
Turing test; the second is meant to switch
anonymizing on or off for this directory. For now
you can switch it off by deleting openanonymity.xml
or by not creating one. As you can see in the list
items (or not :-), the search algorithm of both
modules is case-sensitive. Whether this really makes
sense is open to discussion, but I thought it could
be useful to have more choice over what to anonymize
and what not. Imagine two people with the same name,
let's say Smith, in the same file: one wants to be
anonymized, the other does not. If you write the
first as Smith and the second as smith, only one
of them is filtered.
<config>
  <User-agent-trust type="trusted"/>
  <anonymize type="on"/>
  <directory>
    <list>
      <listitem>Reisinger</listitem>
      <listitem>Mathias</listitem>
      <listitem>KARL</listitem>
      <listitem>matl@aon.at</listitem>
      <listitem>Kimpl</listitem>
    </list>
  </directory>
</config>
The Page element: this element is produced by the
XML Producer when the request for its URL occurs
(e.g. /opan/file01.html). When this URL is requested
by a spider, the Filter module searches for the
words given in the list element and filters them
out of the response. The URL in the link element
can also include GET parameters, which is one reason
why this file can get very big; it remains to be
evaluated whether changes to the architecture, such
as wildcards, would then be needed (look at Buglist
and Todo).
For example:
<page>
  <link>/opan/file01.html</link>
  <list>
    <anonymize>Mathias</anonymize>
    <anonymize>Kimpl</anonymize>
    <anonymize>matl@aon.at</anonymize>
  </list>
</page>

<page>
  <link>/opan/getUserValues.php?userId=12326</link>
  <list>
    <anonymize>Mathias</anonymize>
    <anonymize>Kimpl</anonymize>
    <anonymize>matl@aon.at</anonymize>
    <anonymize>KARL</anonymize>
  </list>
</page>
Database
The database in use is PostgreSQL, but Open
Anonymity uses libdbi as a database-independent
abstraction layer, so Oracle or MySQL could also
be used without any changes to the code. Check the
libdbi page for details and upcoming drivers. The
DB is a straightforward one-table design with 5
fields: "id", "dir", "anonymize", "owner" and
"stamp". Only dir and anonymize are used within
Open Anonymity at the moment; the rest is reserved
for future development.
Updating Routine
The XML Producer has to be called for every single
URL that should be anonymized. This is done by a
so-called Updating Routine, which can be as simple
as a well-configured wget command. Configure the
wget command so that it really reaches all of the
desired URLs, because the security (in the sense
of anonymity) depends on it. If you forget a single
file, or a single GET-parameter variant of a URL,
it's your fault. Open Anonymity is only as good as
its Updating Routine. Likewise, how up to date the
system is depends on how often the Updating Routine
refreshes the whole system: between two runs the
system is out of date (unless you are using a
special configuration of Open Anonymity; look at
Features for details).
wget command (also look at the official wget
manual page):

wget -r -nd --wait=1 --delete-after --accept=php,html,php3,pdf --domains=your-host.com --input-file=fetchUrlList.txt

-r              recursive retrieval
-nd             don't create a directory hierarchy
--wait=1        wait one second between requests
--delete-after  delete the files after downloading (we don't need them!)
--accept        comma-separated list of file suffixes to accept
--domains       restrict link following to the listed domain(s)
--input-file    file with a list of URLs to access and follow