He (Neil) already uses something close to this for his personal website, based on Metakit and tDOM: he stores XML and converts it to HTML. This is currently done offline. The moment this is changed to online generation, the StarSite is complete.

AK: Related to this is a notion from BrowseX. This browser allows the retrieval of a website and its storage into a zip archive. This is usually meant for easy transfer of a website, but it also allows display of the archive via BrowseX, without unpacking it (IIRC - AK).
AK: Obvious extensions of the concept:
- A local mode, i.e. running the Starkit containing the StarSite, or providing access to it [*], in a non-web environment pops up a Tk-based display which allows browsing the site without a web browser.
- An extension of such a local mode would be to enable the editing of pages in the site.
- Allow the starkit to run not only as cgi-type application, but as its own web server.
- Storing data with a mime-type association (text/html, text/xml, image/gif etc). I don't believe that mk4vfs does this presently.
- Allowing the database to be viewed either as a Metakit database or as a filesystem (both are useful at times).
- Some sort of authentication/access-control built in. Wiki-type applications with universal access are useful for some things, but often you want more security. To be effective, this needs to be designed in from the start.
- Versioning/Archiving (just like the wiki, but maybe more fine-grained?)
- Ability to run as standalone HTTP or as CGI, with a consistent scripting API in both environments (i.e. a script shouldn't care).
- Some mechanism for plugging in XML/XSLT transformations.
- Ability to query database using XPath???
- Ability to group items together (for instance, grouping identical pictures in different formats: a .gif/.jpg for web grouped with a bitmap for WAP).
NEM - Excellent summary. This is exactly what I was planning. My main interest was in XML/XSLT generation of content, but really anything should be possible. The StarSite would sit on the server, and intercept requests using the PATH_TRANSLATED variable. So, for instance, on my website currently, the script xml.cgi can be invoked like http://www.tallniel.co.uk/cgi-bin/xml.cgi/home.xml which grabs the home.xml file and applies the necessary stylesheets to it. Likewise, images could also be requested and returned from the database. The fact that MetaKit is the backend allows for sophisticated searching and user interface options (session management, personalization etc). Mirroring a site would be a case of copying one, highly-compressed MetaKit datafile. I find this concept quite exciting.
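The xml.cgi example above amounts to dispatching on the extra path component of the URL. A minimal sketch of that step, assuming the standard CGI PATH_INFO variable; the proc name and the index.xml default are invented here, not part of any existing StarSite code:

```tcl
# Map the CGI PATH_INFO value onto the document requested inside the
# site datafile.  .../cgi-bin/xml.cgi/home.xml arrives with
# PATH_INFO set to "/home.xml".
proc requestedFile {pathInfo} {
    set f [string trimleft $pathInfo /]
    if {$f eq ""} { set f "index.xml" }  ;# assumed default document
    return $f
}
```

The CGI script would then fetch that file from the MetaKit-backed VFS and run it through the stylesheet step.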
Note that the Wikit is a case of putting the contents of a website into a file. I see above that StarSite would include a web server and, rather than using a markup style and conversion as the wikit does right now, would use XML as the markup and tDOM or TclXML as the conversion software. Another difference appears to be that wikit is about content management, in a sense, in that visitors to the web site have the ability to update the pages. What other differences are envisioned?

Well - as far as I am aware, wikit only allows the inclusion of the textual content in one file. The StarSite concept takes this a bit further by allowing images, media etc to be stored in the same file, as well as other information (e.g. a user database). The idea of a StarSite, as I (NEM) envision it, is that it should be able to do whatever a normal website can do, but with the added advantage of having everything in one file. So you could, theoretically, put a wikit inside a StarSite. That is how I see it developing. At the moment it is nothing but this collection of ideas. When things start to reach a more coherent state, I (and any others who wish to join me) will sit down and start making it. The ability to update a StarSite (or parts thereof) over the web is a feature I would like to include. The XML references are just there because that is what I like to create my site in. However, I feel StarSite should be broader than that: it should be a means of encapsulating a whole web site, with various common functionality available to make things easier (collaborative editing, authentication, session management, data storage etc). In the simplest case, a person would fire it up at home and use the Tk GUI to add static content (HTML, pictures etc). When finished, they would simply ftp the file to the webspace they use (in a cgi directory), and it would just work (just like starkits - no-hassle installation). Alternatively, it could run as its own webserver, for intranets and the like.
StarKits solve installation problems for regular applications. StarSites would solve them for web applications.

AK:
- Look at Ideas for Wikit enhancements and Christophe Muller to see the overlaps.
- Using mime-type association for the content: Exactly as proposed for the wiki. Note that the wiki stores its pages directly in Metakit tables. It does not use the mk4VFS for its contents.
- mime-types / mk4vfs: Interesting idea. Generalized: User-defined attributes for files. I am not sure, but I believe there are even native filesystems which might support this. Needs research.
- Authentication/Security: Agree with building this in from the start.
- Authentication/Security: Has to allow deactivation. Example: the Wiki.
- Versioning/Archiving: The wiki codebase itself remembers the times of any change, and also saves out any change to a directory, if so configured. It only does not remember the exact changes/diffs in the internal database. The history of the [Tcler's Wiki] itself is a daily CVS import of the current state, making this more coarse-grained than the wiki codebase is able to support.
- Regarding plugins: Ties to mime-types in my view. Based on the mime-type of a content page, and the chosen output medium we can choose which renderer to use, which editor to use, etc. The wiki already has several Wiki Markup renderers chosen automatically upon 'format' flag and medium (Tk vs. Web).
NEM 30Nov2002: Latest brainstorming on this (flow of control of a request coming into a starsite):
- The whole system sits on a special virtual filesystem, with some differences:
- Files have a mime-type associated with them
- As well as directories, there is the concept of sections. These are mounted onto directory points, and control access to all files from that point down (until a new section starts).
- These sections are essentially directories, but with some procedures associated with them - namely a handle request procedure, and a handle error procedure. (Possibly others).
- Sections have an access-control list associated with them. This consists of a list of groups and a set of permissions. Initially, I think the following permissions:
- page - create, delete, read, edit.
- subsection - create, delete, read, edit.
- groups are like they are on UNIX. Users are people viewing/editing the site. Users belong to groups. There are two special users: anonymous is a non-logged-in user; webmaster is the super-user. There are likewise two such-named groups which contain these users. The webmaster (or admin, or root, ...) group has complete access to everything, while the anonymous group typically would only have read-only access (notice, though, how a section can override this in its access control list, so a wiki could work). A user who has edit permissions on a section can alter the permissions (?? - maybe).
- Access to this VFS is through a special API (probably not the standard Tcl VFS API, due to the need for mime-type associations).
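As a sketch of what such a non-standard, mime-type-aware API could look like - all names here are invented, and a real version would sit on Metakit rather than a Tcl dict:

```tcl
namespace eval site {
    variable db {}   ;# path -> {mime-type content ...}, a stand-in for Metakit
}

# Store content under a path, tagged with its mime-type.
proc site::put {path mime data} {
    variable db
    dict set db $path $mime $data
}

# Fetch content; with no mime-type given, return the first stored variant.
proc site::get {path {mime ""}} {
    variable db
    set entry [dict get $db $path]
    if {$mime eq ""} { set mime [lindex [dict keys $entry] 0] }
    dict get $entry $mime
}
```

One logical file can then carry alternatives, e.g. a text/html and a text/xml rendering of the same page.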
- Right, now onto how this all works:
- A request comes into the starsite (either through the built in webserver, or CGI or...). The first stage is to authenticate the user. A separate (replaceable) module handles this. It simply does all it has to do to determine who the user is. It removes any trace of its mechanism from the input (so, if it used a cookie, it would remove the cookie from the list passed in). It returns the username of the person making the request, or anonymous if they are not logged in. This module could work in any way, and so will be replaceable. It only works out the user name, it does not do access-control.
- Next step, the starsite runtime looks at the requested URL, and figures out which section it falls in (as sections are mapped onto directories, this will be by just finding the most specific directory which is a section map point).
- The starsite works out the format that the client wants the result in. It will use a (customizable) algorithm based on the specific request (e.g. if .html was requested then return HTML), accept-type headers, and finally, as a last resort, user-agent strings.
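That selection algorithm might be shaped roughly like this; the supported type list and the text/html fallback are assumptions for the sketch, not a fixed design:

```tcl
# Pick a result mime-type: explicit extension first, then the Accept
# header, then a crude user-agent fallback.
proc chooseMime {url accept userAgent} {
    switch -glob -- $url {
        *.html { return text/html }
        *.xml  { return text/xml }
    }
    foreach item [split $accept ,] {
        # Strip any ";q=..." parameters from the Accept entry
        set type [string trim [lindex [split $item ";"] 0]]
        if {$type in {text/html text/xml}} { return $type }
    }
    if {[string match -nocase "*wap*" $userAgent]} { return text/vnd.wap.wml }
    return text/html   ;# assumed last-resort default
}
```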
- Access-control: The star-site then looks up the access control list of the section in question, and compares it against the groups which this user belongs to. If the user has access to this section, then we call the request handler for this section, passing in the requested URL, the requested mime-type, and the arguments passed (from ?blah=foo&a=b stuff, and from POSTed data etc. PATH_INFO/PATH_TRANSLATED stuff will not be passed in here - it will be used to figure out the requested URL).
- The request handler retrieves the file, and performs whatever processing it needs to do (e.g. dynamically generating the file, applying style-sheets, etc), and returns the file contents. The mime-type etc will already have been set. There may be an API for adding extra headers etc.
- If an error occurs at any time, or if the user doesn't have the correct permissions, then the section's error-handler proc will be called with a mime-type and the error message. It should format a nice error message and return it.
- How to enforce access-control within a section handler? Well, here I thought the best way would be to only allow access to the VFS through a special API. When a request comes in, the request handler is called in a new interp (or one from a pool). This interp is a safe interp with access to the VFS API set up through aliases. These aliases incorporate the username, so that they can check access control without the content-handler having to pass through the name of the user (that would be open to attack).
- All access to the VFS and StarSite internals would be through these safe interps with checked access control. This keeps the starsite secure (at least, I think so, but I'm not a security expert - comments appreciated).
- Versioning/history could be activated on a per-section basis, by adding more information to the interpreter aliases for the API - if an argument is flagged in the call then a versioning routine is called. In fact, the sections could each have an update-handler which handles edits of files, and can store away the old version.
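The alias trick in the points above could be sketched like this; checkAccess and fetch are stubs standing in for the real ACL and VFS layers, and all names are invented:

```tcl
# Stubs for the real access-control and storage layers:
proc checkAccess {user path perm} { expr {$user ne "anonymous" || $perm eq "read"} }
proc fetch {path} { return "contents of $path" }

# The trusted side of the API.  The username is the first argument,
# but handler code never supplies it - the alias does.
proc vfsGet {user path} {
    if {![checkAccess $user $path read]} { error "access denied" }
    fetch $path
}

set user anonymous                       ;# determined by the auth module
set slave [interp create -safe]
# Bake the username into the alias, so code in the safe interp cannot forge it.
interp alias $slave site.get {} vfsGet $user
```

Handler code inside $slave then simply calls `site.get /some/page`, and the access check runs on the trusted side with the right username.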
- page:
- read - allow reading only
- append - allow appending to the end of a page
- edit - full editing of a page
- create - create new pages
- delete - delete a page
- section:
- create - can create new sub-sections
- delete - can delete sub-sections
- admin - can alter permissions on this section. Also general access to alter the scripts associated with the section.
    page    section
    raecd   cda

In general, anonymous users would have just page read access (permission 10000000), whereas a webmaster would have 11111111. People could be designated as editors of a section with permission 11110000 - i.e. they have the ability to read, edit and create pages, but cannot delete pages or change permissions.

Another item for the implementation will be to associate a lock with each section, so that updating of the database can be done safely, and with per-section lock granularity. This could be upped to a per-page lock if deemed necessary (I think per-section will be acceptable for most sites). This locking will be done automatically in the VFS layer, so section handlers need not worry about it. To start with, locking will be implemented for a single-threaded tclhttpd implementation. Later work will expand this to work with threads, and CGI. CGI is the most difficult, as without marshalling all accesses through a single process it is difficult to perform effective locking. Lock files would have to be used (for CGI), but these are nasty. A possible implementation would have all updates written as separate files into a directory, and then a separate process would lock the whole database and apply all the changes at some point in time (for instance, when the web-master logs in and runs a command).

Time to get coding...
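A permission string in that scheme could be checked with something like this; the bit ordering follows the page/section permission lists above, and the proc name is invented:

```tcl
# Test one bit of an 8-bit permission string such as "11110000".
# Bit order assumed: page {read append edit create delete},
# then section {create delete admin}.
proc allowed {permString what} {
    set bits {page.read page.append page.edit page.create page.delete
              section.create section.delete section.admin}
    set i [lsearch $bits $what]
    expr {$i >= 0 && [string index $permString $i] eq "1"}
}
```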
26jan03 NEM - Well, starting coding has brought me round to a new implementation idea:

Generalize Starsite into a Persistent, Authenticated Object System

After spending some time yesterday contemplating design issues, I have hit upon a design which I think could be useful. Instead of writing starsite as an application with a secure database API, why not write a secure, persistent web application framework and then implement StarSite in that system?

The details:
- Applications are written in terms of objects and classes. This is the standard OO bit, but there are some differences.
- All member data is stored directly in a metakit database.
- Access to read and write the member data is performed through an API.
- The API is authenticated. By this, I mean that all object instantiations are performed in the context of a safe interpreter, and with the permissions of a particular user (more details below).
- Object data can be anything, but it is always stored with a mime-type tag. This allows the same object to reference different data depending on the mime-type that is being requested. Wildcard mime-types would be allowed. The main benefit of this is that you can group content together but have alternative versions for different output interfaces - e.g. image/gif for web browsers, image/bmp (or whatever the mime-type is) for WAP devices. This greatly simplifies the task of the web application, as it can behave consistently with little regard for what client is using it. The object system would determine the correct mime-type (or most specific match) and load the appropriate data. Of course, there will be a mechanism for accessing the other mime-type data if necessary.
- The basic look of a class definition would be:
    class Foo {
        field title
        field body

        method foo {args} {
            # Accessing a member field:
            $this get title
            # Accessing a specific mimetype
            $this get title -mimetype text/html
            # Setting a field
            $this set body "<h1>Hello, World!</h1>"
            # Setting for particular mimetype
            $this set body "<header>Hello, World!</header>" -mimetype text/xml
            # What is the mimetype requested?:
            $this mimetype
            # Change the mimetype (changes the HTTP output headers as well)
            $this mimetype "text/xml"
            # Append to a field
            $this append title "<h2>This is a comment</h2>"
            # etc.
        }
        method foo image/gif {args} {
            # Override the general foo method for image/gif mimetype requests
        }
    }

Objects can be instantiated in a hierarchy. There is always one main object for the site, which is the starsite object (or a different object if a different web application is being created). This is similar to the Tk widget/object structure, with "starsite" or whatever being the root object which always exists. You can then do:
    Foo /starsite/myfoo

and this will create an object of class Foo named "myfoo" under the starsite object. Calls to the objects are handled by processing incoming HTTP requests. For instance, if a request came in to:
    http://my.server.com/cgi-bin/starsite/myfoo/foo?arg1=hello&arg2=world

This would cause a lookup to see what the most specific object being requested is. In this example it is "myfoo" (otherwise it would be "starsite" - the root). So, a call is made to the "foo" method of the object "myfoo", like so:
    $myfoo foo {arg1 hello} {arg2 world}

There are a couple of details here too:
- If a call was made to /cgi-bin/starsite/myfoo?arg1=hello&arg2=world the system would try to look for a method "myfoo" on the object "starsite". If no such method exists, then the call is redirected to /cgi-bin/starsite/myfoo/?arg1=hello&arg2=world. This is done with a standard HTTP redirect message.
- If a call comes in to /cgi-bin/starsite/myfoo/?arg1=hello&arg2=world, then the call is sent to the "default" method of the object. This method must always be implemented.
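Those two rules, plus the method call from the earlier example, might be sketched as a lookup like this; $methods, a dict mapping object paths to their public method names, is invented for the sketch:

```tcl
# Decide what a request path means: an {objectPath methodName} pair,
# or {redirect newUrl} for the trailing-slash redirect case.
proc dispatch {methods path} {
    if {[string index $path end] eq "/"} {
        # Trailing slash: call the object's "default" method
        return [list [string trimright $path /] default]
    }
    set parent [file dirname $path]
    set method [file tail $path]
    if {[dict exists $methods $parent] && $method in [dict get $methods $parent]} {
        return [list $parent $method]
    }
    # No such method on the parent object: redirect with a slash appended
    return [list redirect $path/]
}
```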
    starsite::redirect /starsite/newfoo -> /starsite/myfoo

would cause all method invocations on /starsite/newfoo to result in an HTTP redirect to /starsite/myfoo.

A mapping is maintained between hierarchy positions (URLs) and objects. This is a strictly one-to-one mapping.

Each object has the following properties associated with it:
- An owner - the userid of the person who created, or otherwise owns this object. This person has full control over the object.
- An access-control list. Specifies which groups of users have what permissions for accessing this object.
- A flag saying whether changes to this object should be archived. This is a hook for version-control systems.
- A database lock. This lock must be acquired before changes to the object can be made. I may implement two locks - a read lock (shared) and a write lock (exclusive). I may even implement this as synchronized blocks like java, but I'm not sure about that. Needs thought as to what the best mutual exclusion policy is.
- A collection of member data, organised by mimetype.
- A collection of methods. It will be possible to flag whether methods are public (accessible through direct HTTP calls), or private (accessible only to other methods within the same object). All public methods will automatically cope with SOAP requests (and probably WebDAV) too.
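The "most specific match" rule for the mime-type-organised member data could work roughly like this; bestMatch is an invented name, and the exact-then-wildcard ordering is an assumption:

```tcl
# Given the mime-types stored for a field and the type the client
# wants, return the most specific stored match (wildcards allowed).
proc bestMatch {stored wanted} {
    if {$wanted in $stored} { return $wanted }
    set major [lindex [split $wanted /] 0]
    if {"$major/*" in $stored} { return "$major/*" }
    if {"*/*" in $stored} { return "*/*" }
    return ""   ;# no variant suitable for this client
}
```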
NEM 22May2003 - Haven't looked at this in a while. Now my final exams are coming to a close, I'd better start thinking about implementing all these ideas. In the meantime, I'm going to play around with as many open source/free projects of a similar nature as I can (both Tcl things like Apache Rivet and OpenACS/AOLServer, and also non-Tcl stuff like [Zope]), to get a feel for what is useful in such a beast. Please read the above and comment, although this page is getting pretty big now. Please note, also, that I'm not at all sure the above design is good anymore. Some tell-it-like-it-is criticism would be welcome!

escargo - I had been asking about e-mail notification of changes to wiki pages for this wiki, but part of the problem is that it realistically requires authentication before people can sign up (since you want to prevent people from signing up for notifications for anyone but themselves). You are already planning on doing authentication. You could extend your design (or allow hooks that could be used to extend it) so that if a wiki page changed (or one of your more generalized objects), then an action could be triggered (in my case, an e-mail notification could be generated). This would require user data to include an e-mail address; objects would have to have change listeners; somewhere there would be per-object metadata (for the listener list and the application data needed to keep the mail notification list). This is just "bells and whistles" as far as your desired features are concerned, but I thought this might give you an idea of how somebody might want to build on what you are doing.

NEM - Good idea. This can probably be added by the application author, if I design things right, and wouldn't have to be present in the base system. Nevertheless, I'll bear this in mind. Right now, I'm thinking about performance. The problem with my "everything's an object" system above is that it adds unnecessary overhead to things which are just static content (images, static HTML etc).
For these, I'm thinking of object wrappers which are automatically created when a feature is needed, but not before. This way, if a piece of data is just returned without any processing, it has no overhead. Ahh... much more thinking is needed on this.

PT 23Jun2003: I've recently been looking at Twiki, which does support e-mail notification. There the user's home page contains the e-mail address to use, and each 'web' (a Twiki term which basically means a sub-tree of the entire wiki) can be configured to enable e-mail notification on a per-user basis by adding usernames to the relevant page. Authentication is provided by having the user registration script automatically append a suitably constructed line to the site's .htpasswd file. In this wiki, changes tend to require authentication, so it's always possible to know who made what change. I'm not certain that we need such overhead on this site, but Twiki has been designed to be useful in a corporate intranet environment, where traceability is important. As a side note - Twiki is reasonably easy to install under unix and windows.
DAG: See also CHMvfs.