Open Anonymity
How it works
Diploma Thesis
Thesis: Open Anonymity - Anonymity in indicated data networks

Open Anonymity - The System's Architecture:
Open Anonymity consists of two Apache modules: one for filtering the content (Filter Module) and one for marking the content (XML Producer Module). The XML Producer connects to the database where the anonymous words are held and updates a file called openanonymity.xml for every single directory. The Filter reads the data from this file and filters out the specified words. It also filters words manually marked with the <anonym> tags.

Apache Filter Module:
The Filter is responsible for filtering the <anonym> tags and all words specified in the openanonymity.xml file located in the directory of the current response. For example, when a request is made for http://localhost/index.html, a file http://localhost/openanonymity.xml should be present. This file tells the Filter which anonymous words appear in the response. The Filter searches the response for this list of words and cuts them out. This method was chosen for performance reasons: there are certainly simpler ways to do it, but this approach has some advantages. If openanonymity.xml isn't present, the Filter does nothing. It also stops processing the request when cookie checking is enabled in httpd.conf and the user is marked with the isHuman cookie, or, when cookie checking is not enabled, when the Browser-Id is not in the list of known spiders.
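
The decision logic and the filtering step described above can be sketched roughly as follows. This is only an illustration in Python, not the module's actual C code. The function names, the KNOWN_SPIDERS entries, the way the cookie is checked (presence only) and the choice to remove the <anonym> tags together with their content are all assumptions.

```python
import re

# Assumption: example entries only; the real list of known spiders
# is configured elsewhere and is not shown on this page.
KNOWN_SPIDERS = ("Googlebot", "Slurp")


def should_filter(user_agent, cookies, cookie_check_enabled):
    """Decide whether the response has to be anonymized for this request."""
    if cookie_check_enabled:
        # A requester marked with the isHuman cookie gets the unfiltered page.
        return "isHuman" not in cookies
    # Without cookie checking, only known spiders get the filtered page.
    return any(spider in user_agent for spider in KNOWN_SPIDERS)


def filter_response(body, words):
    """Cut manually marked spans and all listed words out of the body."""
    # Assumption: the tags are removed together with the text they mark.
    body = re.sub(r"<anonym>.*?</anonym>", "", body, flags=re.DOTALL)
    for word in words:
        # str.replace is case-sensitive, like the search in both modules.
        body = body.replace(word, "")
    return body
```

With the word list ["Smith"], an occurrence of "Smith" is removed while "smith" stays in the response, because the match is case-sensitive.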

Apache XML Producer:
The XML Producer is also called for one specific URL. It connects to the DB, checks whether words are specified for this request's directory, checks which words from this list really appear in the response, and writes the list of those words to openanonymity.xml. If this file is not present, it does nothing. This module will only start working when the requester is known by IP address and Browser-Id, both configurable in httpd.conf (not implemented yet).
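
The check for words that really appear in the response could look like the following minimal sketch. It is written in Python for illustration only; the function name is hypothetical and the real module is an Apache module that may work differently.

```python
def words_in_response(candidate_words, body):
    """Return the configured words that actually occur in the response body.

    The match is a plain case-sensitive substring test, and the order of
    the configured list is kept.
    """
    return [word for word in candidate_words if word in body]
```

For a directory configured with "Smith", "Miller" and "Jones", a page that mentions only Smith and Jones would get a list containing just those two words.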

openanonymity.xml could look like this example:
The relevant parts are:

The Config-Element: The list items of this list are all the words that can appear in any URL in this directory and should be anonymized. Because this list could become very large on bigger systems, the design works as follows. The Filter does not use this list, and the XML Producer does not read from it either when the DB-Connect is enabled; in that case it is only produced for logging purposes. When the XML Producer is started with DBConnect off in httpd.conf, this list is used to create the Page elements. In this case, and only in this case, manually inserted items make sense. The User-agent-trust and anonymize tags are not implemented yet. The first switches between trusting the Browser-Id and forcing a Turing test; the second switches anonymizing on or off in this directory. For now you can switch it off by deleting the openanonymity.xml file or by not creating one. As you can see in the list items (or not :-), the search algorithm of both modules is case-sensitive. Whether this really makes sense is open to discussion, but I thought it could be good to have more choice about what to anonymize and what not. Imagine you have two people with the same name, let's say Smith, in the same file; one wants to be anonymized, the other does not. You can write one as Smith and the other as smith; one would be filtered, the other not.

  <User-agent-trust type="trusted"/>
  <anonymize type="on"/>  

The Page Element: This element is produced by the XML Producer at the moment a request for this URL occurs (e.g. /opan/file01.html). When this URL is requested by a spider, the Filter module searches for the words specified in the list element and filters them out of the response. The URL specified in the link element can also include GET parameters. This is one reason why this file could get very big; it remains to be examined whether changes to the architecture, such as wildcards, would then be needed (see Buglist and Todo).
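
To illustrate how a per-URL lookup in such a file might work, here is a small Python sketch. The XML layout in SAMPLE is invented for this example: the element names page, link, list and listitem are assumptions based on the description above, not the real file format, and words_for_url is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: a made-up layout, only loosely modelled on the description.
SAMPLE = """<openanonymity>
  <page>
    <link>/opan/file01.html</link>
    <list>
      <listitem>Smith</listitem>
    </list>
  </page>
  <page>
    <link>/opan/file01.html?id=2</link>
    <list>
      <listitem>Miller</listitem>
    </list>
  </page>
</openanonymity>"""


def words_for_url(xml_text, url):
    """Return the words listed for exactly this URL, or an empty list."""
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        # The link is compared as a whole, including any GET parameters.
        if page.findtext("link") == url:
            return [item.text for item in page.iter("listitem")]
    return []
```

Because every combination of GET parameters is its own entry in this sketch, the file grows with each variant of a URL, which is the size problem mentioned above.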

The database in use is PostgreSQL, but Open Anonymity uses libdbi as an independent database abstraction layer, so Oracle or MySQL could also be used without any changes to the code. Check the libdbi page for details and upcoming drivers. The DB is a straightforward one-table design; a DB dump is available for download. There are 5 fields: "id", "dir", "anonymize", "owner", "stamp". Only dir and anonymize are used within Open Anonymity at the moment; the rest is for future development.
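
The one-table design can be tried out with a few lines of Python. This sketch uses an in-memory SQLite database as a stand-in for PostgreSQL and libdbi. The table name anonymity, the column types and the idea of storing one word per row are assumptions; only the five field names come from the description above.

```python
import sqlite3


def words_for_dir(conn, directory):
    """Return the words to anonymize for one directory, in insertion order."""
    rows = conn.execute(
        "SELECT anonymize FROM anonymity WHERE dir = ? ORDER BY id",
        (directory,),
    )
    return [row[0] for row in rows]


# Assumption: hypothetical table name and column types.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE anonymity (
           id INTEGER PRIMARY KEY,
           dir TEXT NOT NULL,
           anonymize TEXT NOT NULL,
           owner TEXT,  -- unused for now
           stamp TEXT   -- unused for now
       )"""
)
conn.executemany(
    "INSERT INTO anonymity (dir, anonymize) VALUES (?, ?)",
    [("/opan/", "Smith"), ("/opan/", "Miller"), ("/other/", "Jones")],
)
```

Calling words_for_dir(conn, "/opan/") then returns the two words stored for that directory and ignores the row for /other/.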

Updating Routine
The XML Producer has to be called for every single URL that should be anonymized. This is done by a so-called Updating Routine, which can be as simple as a well-configured wget command. Configure the wget command so that it really reaches all of the desired URLs, because the security (in the sense of anonymity) depends on it. If you forget one file (or even one possible GET parameter of a URL), it's your fault. Open Anonymity is only as good as this Updating Routine. The up-to-dateness also depends on how often the Updating Routine refreshes the whole system: between two runs of the routine the system is out of date (unless you are using some special configuration of Open Anonymity; see Features for details).

wget command (see also the official wget manual page):

wget -r -nd --wait=1 --delete-after --accept php,html,php3,pdf --domains=your-host.com --input-file=fetchUrlList.txt

-r retrieve recursively
-nd don't create directories
--wait=1 wait one second between requests
--delete-after delete the files after downloading (we don't need them!)
--accept comma-separated list of file name suffixes to accept
--domains restrict followed links to the listed domains, so links to other hosts are not followed
--input-file file with a list of URLs to access and follow
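
Since a forgotten GET parameter means a page that is never anonymized, it can help to generate the URL list instead of writing it by hand. The following Python sketch shows one possible way. The function name and the shape of the pages mapping are assumptions, and the host and paths are placeholders.

```python
from urllib.parse import urlencode


def build_url_list(base, pages):
    """Build every URL the Updating Routine should request.

    pages maps a path to a list of GET parameter sets (dicts). The plain
    path is always included, followed by one URL per parameter set.
    """
    urls = []
    for path, param_sets in pages.items():
        urls.append(base + path)
        for params in param_sets:
            urls.append(base + path + "?" + urlencode(params))
    return urls
```

Writing the result to fetchUrlList.txt, one URL per line, gives a file that can be passed to wget with --input-file.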