I have been wanting a wiki to do the following thing for a long time. It
looks to me like SmallWiki will make it much easier. Your comments on
persistence were helpful, so I would like to discuss this for a while.
I teach a course in which students do most of the lecturing. We read
several books, and each student presents a chapter. They also write a
couple of study questions, and students are expected to read the chapter and
answer the study questions. The presenter will grade the answers.
I'd like to do it on the wiki. At first, all answers are secret. Students
can read what they wrote, but nobody else can, except the presenter/grader.
Once the answers are graded, the grader will publish them, making special
notes of the good ones.
Here is how I think it will work. I'll need a new kind of structure called
a "virtual folder".
First, each student has their own folder named after their UIUC net ID.
There will be a folder called Students and it has subfolders for each
student. A student's subfolder is private. The students create pages called
"chapter 1" and "chapter 2" for their answers. The administrator will
create a special page in the folder of the grader. This is the virtual
folder, which is a capability page. It pretends to be a folder with all the
answers in it. It might be called "chapter 1 answers" and it will be
parameterized to show all the pages in subfolders of Students that are
called "chapter 1". The grader will make a new page that discusses the
answers, says which ones are especially good, and contains pointers to them.
Then he will publish the page in read-only mode.
The virtual folder is a way of changing roles of structure. Pages in it
will use the security policies that it defines rather than the policies of
their own folder.
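To make the idea concrete, here is a minimal sketch of a virtual folder in Python rather than Smalltalk. The names Page, Folder and VirtualFolder are hypothetical and are not SmallWiki classes; the sketch only shows the "collect every page with a given name from the subfolders of Students" part, not the security part.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Page:
    name: str
    text: str = ""


@dataclass
class Folder:
    name: str
    pages: Dict[str, Page] = field(default_factory=dict)
    subfolders: Dict[str, "Folder"] = field(default_factory=dict)


@dataclass
class VirtualFolder:
    """Hypothetical: pretends to be a folder holding every page called
    `page_name` found in the immediate subfolders of `source`."""
    name: str
    source: Folder
    page_name: str

    def children(self) -> Dict[str, Page]:
        # Key each answer by the student folder it came from.
        return {
            student: sub.pages[self.page_name]
            for student, sub in sorted(self.source.subfolders.items())
            if self.page_name in sub.pages
        }


students = Folder("Students")
students.subfolders["alice1"] = Folder(
    "alice1", {"chapter 1": Page("chapter 1", "Alice's answers")})
students.subfolders["bob2"] = Folder(
    "bob2", {"chapter 2": Page("chapter 2", "Bob's answers")})

answers = VirtualFolder("chapter 1 answers", students, "chapter 1")
print(list(answers.children()))  # ['alice1']
```

The virtual folder stores no pages of its own. It computes its children each time, so a student who creates "chapter 1" later shows up without anyone updating the grader's folder.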
From my quick reading of the security paper, it appears that security is
implemented using Chain of Responsibility from the root, which means that
a structure inherits from its parent because control passes through the
parent. Therefore, a virtual folder should be able to change security
policies. In contrast, if security were implemented by having each structure
ask its parent whenever it needed a security policy, this would not work.
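A small sketch of why the direction matters, again in Python with made-up names. It assumes the request is passed down from the root and that each node on the path may replace the policy. This is my reading of the design, not SmallWiki's actual code. The same page reached through two different paths ends up with two different policies.

```python
from typing import Dict, List, Optional

Node = Dict[str, Optional[str]]


def effective_policy(path: List[Node], default: str = "public") -> str:
    """Pass the request down the path from the root.

    Every node on the way may replace the policy, so a node inherits
    from whichever parent the request actually travelled through.
    """
    policy = default
    for node in path:
        if node.get("policy") is not None:
            policy = node["policy"]
    return policy


# Hypothetical nodes; a policy of None means "use whatever came from above".
root = {"name": "root", "policy": "public"}
students = {"name": "Students", "policy": None}
alice = {"name": "alice1", "policy": "owner-only"}
page = {"name": "chapter 1", "policy": None}
grader = {"name": "grader", "policy": None}
virtual = {"name": "chapter 1 answers", "policy": "grader-only"}

via_owner = effective_policy([root, students, alice, page])
via_virtual = effective_policy([root, grader, virtual, page])
print(via_owner, via_virtual)  # owner-only grader-only
```

If the page instead asked its one real parent for a policy, it would always get "owner-only", and the virtual folder would have no way to intervene.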
So, what do you think? Is there already a virtual folder class? If not,
will it be hard to make?
-Ralph Johnson
>I don't understand why pages and resources are handled differently
>(pages are put into the directory, resources get an extra directory).
That is not what I meant. Pages get an extra directory, too. I didn't
mention folders, but they should get an extra directory.
>Maybe pages and folders should be put into some artificial folders
>(e.g. _classname_) as well, so that there is more of an uniformity:
Yes, that is what I want. If we use _classname_ then it should be
possible for someone to add a new class and for the storage manager
to automatically add it.
>Or more simple (shouldn't we do it as simple as possible?) one could
>put all the data into the same directory and to put the type/class-name
>as file-extension:
I don't think it is simpler. I think that the subdirectory is just
as easy to program, maybe simpler. And it has the added benefit of
making directories smaller, thus making open() be faster.
>Personally, I prefer to have human-readable tags (like the ones
>suggested by John Brant). Also name-clashes from different extensions
I want to tag every line so that the system can better handle the case
where only half of the page is written. Note that there will probably
only be one class that writes these tags, so different extensions will
have to worry about clashes to this class anyway.
-Ralph
Writes will dominate reads, because once the wiki is loaded
from the file system, pages and folders will never be read.
Only resources will be read. Keeping everything in memory is
good because it is fast and easy. Memory is cheap. If I can
run wiki.cs.uiuc.edu on a 200 MHz machine then it is proof that
big systems can run on old computers if you store it all in memory.
I avoid database systems when I can. Smalltalk is more powerful
than an RDBMS. The file system is mostly to prevent loss of data.
This works until your image gets to 4G (which has never happened
to me). Then you have to move to a 64 bit Smalltalk.
-Ralph
>> I can now see that SmallWiki is a tree of folders, with leaf nodes being
>> pages and resources. Each folder has its own name space. WikiWorks has two
>> levels, SmallWiki allows an arbitrary number. This is a big difference from
>> other wikis, and you do not emphasize it like you should.
>See chapter 3.2, page 16 in the documentation.
That is completely inadequate. I had read it several times before
posting my message. If I did not understand it, few people will.
Class hierarchies are abstract. People do not learn how to use an
abstraction by reading the abstraction. They learn abstractions
by learning examples. Your documentation needs to be more concrete.
It should be based on using a wiki, not on looking at the Smalltalk code.
Once people know what SmallWiki does then you can give them class
diagrams and they will make sense.
"Structure" is a wretched name. It is too vague. Section 3.2
is too abstract. It says little that I couldn't get by reading
the code. The result is that the section makes sense only
AFTER someone has figured out SmallWiki.
A structure "represents the model of a single page". But there is
a Page class, as well. This is confusing. I think that what is
important about Structure is that it has a URL. Though actions
also have URLs.
If a structure is a model, what are its observers? The storage
manager is one. Anything else?
-Ralph
"Structure" is a bad name. It is vague and can mean just about anything.
PageComponent could be called "structure", too. Maybe rename Structure to
WikiComponent. Or call it WikiStructure in the program and in writing but
just call it Structure when you are talking.
I can now see that SmallWiki is a tree of folders, with leaf nodes being
pages and resources. Each folder has its own name space. WikiWorks has two
levels, SmallWiki allows an arbitrary number. This is a big difference from
other wikis, and you do not emphasize it like you should.
How do you link to a page in a different folder? You need to say that when
you describe the wiki syntax. Does searching just
When you make a link on a page to something, it is created in the same
folder that the page is in.
A resource is something that is uploaded to the wiki. Are these stored in
the image? Wiki.cs.uiuc.edu has over a gigabyte uploaded. I wouldn't want
them to be stored in the image! That should be easy to fix by using
proxies.
Why are the classes List, Document, ListItem, and Paragraph essentially
empty? They have no instance variables and only have the one method to
support visiting? Is this because any data they hold is in children?
-Ralph
Here are my current plans. Any comments?
Everything will be stored in files. It will be very portable. It will be
fast. It will be robust. You can edit all the text files safely.
A SmallWiki folder is a directory. It has one directory called resources
and one called pages. A page is stored as a text file in the "pages"
directory. A resource is stored as a binary file in the "resources"
directory. A folder will probably have other files in its directory. If we
make new kinds of structures, we can make new subdirectories.
Each new version of a page gets added to the end of the file. Each delta
has a timestamp, the author, maybe the version number, and the data. A
timestamp line starts with T, an author line with A, the version number with
V, and the data lines with D. The delta ends with a line that starts with
E. Lines end with one of a set of end of line characters, including CR and
LF. Blank lines are ignored. This should make it so we don't care about
the end-of-line rules of the creator of the file, so it should be easy to
move from Unix to Windows.
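Here is a rough sketch of that format, in Python for brevity. The function names are made up, and two details are assumptions of mine: that the T line always comes first in a delta, and that a delta with no closing E line (a half-written page after a crash) is simply dropped on load.

```python
import re
from typing import Dict, List


def format_delta(timestamp: str, author: str, version: int, text: str) -> str:
    """Render one delta as tagged lines: T, A, V, then D lines, then E."""
    lines = [f"T{timestamp}", f"A{author}", f"V{version}"]
    # An empty line of page text becomes a bare "D", so it is never
    # confused with the blank lines the reader ignores.
    lines += [f"D{line}" for line in text.splitlines()]
    lines.append("E")
    return "\n".join(lines) + "\n"


def parse_deltas(raw: str) -> List[Dict[str, object]]:
    """Read every complete delta; skip blank lines and unfinished deltas."""
    deltas: List[Dict[str, object]] = []
    current = None
    # Splitting on runs of CR and LF handles Unix, Windows and old Mac
    # files alike, and discards blank lines in the same step.
    for line in re.split(r"[\r\n]+", raw):
        if not line:
            continue
        tag, rest = line[0], line[1:]
        if tag == "T":  # assumption: T opens a delta
            current = {"timestamp": rest, "author": None,
                       "version": None, "data": []}
        elif current is None:
            continue  # stray line outside any delta
        elif tag == "A":
            current["author"] = rest
        elif tag == "V":
            current["version"] = rest
        elif tag == "D":
            current["data"].append(rest)
        elif tag == "E":
            deltas.append(current)
            current = None
    return deltas


complete = format_delta("2003-05-01 10:00", "ralph", 1, "Hello\n\nWorld")
half_written = "T2003-05-02 11:00\nAralph\nDonly part of the"
history = parse_deltas(complete + half_written)
print(len(history), history[0]["data"])  # 1 ['Hello', '', 'World']
```

Because every line carries its own tag, the loader never has to guess where a truncated delta stops: it resynchronizes at the next T line.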
Resources are only stored in the file system. Folders and pages are stored
in the image. The disk versions of the folders and pages are only read when
the image is starting up. Otherwise, they are only written to, not read.
It should be easy to write the storage manager to handle new pages, new
folders, page edits, and resources. However, I am worried about renames.
Renaming a file is easy. But don't we also have to change all the files
that are in existing pages?
Also, I said this is fast, but it has to open a file for each write, and it
might be opening files in huge directories.
First, there won't be huge directories. If directories get too big then
we'll split them. If a directory is "big" (for some definition of big) then
it will divide its contents into groups with the same first letter in their
name. If it is really big then it will divide them into groups with the
same first two letters in their name.
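The splitting rule could look something like the sketch below (Python, hypothetical names). The two thresholds are placeholders for "big" and "really big", not measured values. Lowercasing the prefix is my own addition, so that the layout behaves the same on case-insensitive file systems.

```python
def bucket_for(name: str, entry_count: int,
               big: int = 1000, really_big: int = 50000) -> str:
    """Return the subdirectory prefix for `name`, or "" if no split is needed."""
    if entry_count <= big:
        return ""
    width = 2 if entry_count > really_big else 1
    return name[:width].lower()


def storage_path(directory: str, name: str, entry_count: int) -> str:
    """Build the path of `name` inside `directory`, split if it is big."""
    parts = [directory]
    bucket = bucket_for(name, entry_count)
    if bucket:
        parts.append(bucket)
    parts.append(name)
    return "/".join(parts)


print(storage_path("pages", "FrontPage", 10))      # pages/FrontPage
print(storage_path("pages", "FrontPage", 5000))    # pages/f/FrontPage
print(storage_path("pages", "FrontPage", 100000))  # pages/fr/FrontPage
```

One thing the sketch leaves out is the migration step: when a directory crosses a threshold, the existing files have to be moved into their buckets.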
Also, the storage manager could cache open files. It could try to reuse
open files and close them on a LRU basis. Since there is a lot of locality
of writing, this should reduce the number of file opens. But I will
measure the performance before implementing this, because I am not sure it
will be necessary. I am pretty sure the first one will be, though. 10K
files in one directory takes a long time to search.
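For the open-file cache, a sketch along these lines would probably do (Python, with a hypothetical class name). It keeps at most `capacity` files open for appending and closes the least recently used one when it runs out of room. The `opens` counter is only there so the effect can be measured.

```python
import os
import tempfile
from collections import OrderedDict


class OpenFileCache:
    """Hypothetical LRU cache of files kept open for appending."""

    def __init__(self, capacity: int = 32) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.opens = 0  # how many real open() calls were needed
        self._files = OrderedDict()

    def append(self, path: str, text: str) -> None:
        handle = self._files.pop(path, None)
        if handle is None:
            handle = open(path, "a", encoding="utf-8")
            self.opens += 1
        self._files[path] = handle  # most recently used goes last
        if len(self._files) > self.capacity:
            _, oldest = self._files.popitem(last=False)
            oldest.close()
        handle.write(text)
        handle.flush()  # the point of the files is not losing data

    def close_all(self) -> None:
        while self._files:
            _, handle = self._files.popitem()
            handle.close()


with tempfile.TemporaryDirectory() as root:
    cache = OpenFileCache(capacity=2)
    a, b, c = (os.path.join(root, n) for n in ("a.txt", "b.txt", "c.txt"))
    for path in (a, b, a, c, b):
        cache.append(path, "x\n")
    cache.close_all()
    open_count = cache.opens
    with open(a, encoding="utf-8") as f:
        a_contents = f.read()

print(open_count)  # 4 opens for 5 writes
```

With only five writes the saving is small, but the same counter would show whether the cache is worth having under a real write load.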
In addition to writing a storage manager to update these files, I'll have to
write something to build up a wiki from a file system, and will have to make
proxies for resources so they don't have to be in the image.
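The resource proxy can be very small. In this sketch (Python, hypothetical class name) the in-memory object holds only the path, and the bytes are read from disk each time they are asked for.

```python
import os
import tempfile


class ResourceProxy:
    """Stands in for an uploaded resource; only the path stays in memory."""

    def __init__(self, path: str) -> None:
        self.path = path

    def size(self) -> int:
        # Answered by the file system, without loading the contents.
        return os.path.getsize(self.path)

    def read(self) -> bytes:
        with open(self.path, "rb") as f:
            return f.read()


with tempfile.TemporaryDirectory() as root:
    upload = os.path.join(root, "slides.pdf")
    with open(upload, "wb") as f:
        f.write(b"%PDF-1.4 example bytes")
    proxy = ResourceProxy(upload)
    proxy_size = proxy.size()
    proxy_bytes = proxy.read()

print(proxy_size)  # 22
```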
Please tell me what is wrong with this.
-Ralph Johnson
>External SIXX persistance exists.
I haven't looked at it yet, but I am skeptical. Do you think
it will do what I want? XML is just barely human readable.
It tends to be slow. It is OK for a least common denominator,
but you can almost always do better.
I've looked at the mailing list. I didn't read the whole thing,
though, and I probably didn't understand a lot that I read.
-Ralph Johnson
I can't find how to add sub-folders to the top folder.
When I start it up, I get a top folder that contains
a folder called "Information". I'd like to add siblings
to "Information". The Workspace should tell me how to
do this. Surely that is more important information than
how to change the callback cache!
So, where is this documented? If it is not documented,
how do you do it?
-Ralph Johnson
It appears to me that the only SmallWiki persistence mechanism
so far is to save the entire image. Is that correct? I'd like
a persistence mechanism a bit more like that of WikiWorks.
Having used WikiWorks for a long time, I'd like to improve upon it.
In particular, file names should be meaningful, and all the versions
of a page should be stored in one file. It should be easy to delete
past history. The file format should make the system very tolerant
of crashes. It should be plain text and easy to edit with any text
editor.
wiki.cs.uiuc.edu has 33 wikis on it. Most of them are small, but at
least three of them have several thousand pages apiece. The image
is over 100 meg, since all pages are stored in memory (like SmallWiki).
When the machine pages, I will buy more memory. It gets several hits
a second, all day long. It runs for several months between reboots.
I need a persistence mechanism that can handle this.
-Ralph Johnson
Does SmallWiki have support for robots.txt? There needs to be a way to tell
the robots to stay away from history and edit pages. The way I did this in
WikiWorks was to generate a robots.txt file that tells them to stay away.
For every wiki XXX, it puts in lines of the form
wiki.cs.uiuc.edu/XXX/EDIT
wiki.cs.uiuc.edu/XXX/HISTORY
This only works because the page name comes AFTER the command. SmallWiki
puts the command after the page name, so you'd have to generate a line for
every command on every page. Or am I missing something?
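For reference, a sketch of the command-first case in Python. The function name is made up, and I have written the rules in the standard robots.txt form ("User-agent" followed by "Disallow" with a path) rather than with the host name in front. Disallow rules match by path prefix, which is exactly why one line per command per wiki is enough when the command comes before the page name.

```python
from typing import Iterable, Sequence


def robots_txt(wiki_names: Iterable[str],
               commands: Sequence[str] = ("EDIT", "HISTORY")) -> str:
    """Generate rules that keep all robots away from command URLs."""
    lines = ["User-agent: *"]
    for wiki in wiki_names:
        for command in commands:
            # Prefix match: this covers /<wiki>/<command>/<any page>.
            lines.append(f"Disallow: /{wiki}/{command}")
    return "\n".join(lines) + "\n"


print(robots_txt(["XXX"]), end="")
# User-agent: *
# Disallow: /XXX/EDIT
# Disallow: /XXX/HISTORY
```

With the command after the page name there is no common prefix to disallow, so the file would need one line per command per page, as noted above.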
Suppose I wanted to change the URLs around so that I could prevent robots
from executing actions. Would this be a major change, or easy?
-Ralph