Buckybase

wiki naming solved

I have found the optimal solution to the wiki naming system question:

Solution = unique ID & optional, unique slug & optional title

The slug is used to create links and in URLs. If it exists, the title replaces the slug in link labels, headlines, etc.

This system doesn't require a slug for every page (so you can just jot down some notes into the system without bothering about naming), but without a slug you can't link easily to a page (so, later when you want to link to a note, you can go back, and give it a slug). Slugs must be unique because otherwise the planned social joins (which join pages from different users/systems via their slugs) become tricky.

Novice users can simply use slugs as if they were titles, and get slightly uncool URLs, e.g. /My+Holiday+in+%C3%96sterreich. More experienced users can assign short, hopefully more durable slugs (e.g. /holiday-2007) and an additional longer title.
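Here is a minimal sketch of this naming scheme in Python. It assumes a simple in-memory store, and the `Page` and `Wiki` names are hypothetical, not Buckybase code. Falling back to the bare ID in labels is also an assumption. `urllib.parse.quote_plus` produces the same /My+Holiday+in+%C3%96sterreich form shown above.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote_plus


@dataclass
class Page:
    id: str                      # unique, always present, normally hidden
    slug: Optional[str] = None   # optional; unique if present; used in links and URLs
    title: Optional[str] = None  # optional; replaces the slug in labels and headlines


class Wiki:
    def __init__(self) -> None:
        self.by_id: dict[str, Page] = {}
        self.by_slug: dict[str, Page] = {}

    def add(self, page: Page) -> None:
        if page.id in self.by_id:
            raise ValueError(f"duplicate id: {page.id}")
        if page.slug is not None and page.slug in self.by_slug:
            raise ValueError(f"duplicate slug: {page.slug}")
        self.by_id[page.id] = page
        if page.slug is not None:
            self.by_slug[page.slug] = page

    def set_slug(self, page_id: str, slug: str) -> None:
        # Give an existing note a slug later, once you want to link to it.
        if slug in self.by_slug:
            raise ValueError(f"duplicate slug: {slug}")
        page = self.by_id[page_id]
        if page.slug is not None:
            del self.by_slug[page.slug]
        page.slug = slug
        self.by_slug[slug] = page

    @staticmethod
    def label(page: Page) -> str:
        # Title if present, else slug, else the bare ID (assumed fallback).
        return page.title or page.slug or page.id

    @staticmethod
    def url(page: Page) -> Optional[str]:
        # Without a slug there is no easy URL to link to.
        return "/" + quote_plus(page.slug) if page.slug is not None else None
```

In this sketch a quick note starts with only an ID. It becomes linkable as soon as `set_slug` gives it a name.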

datawiki syntax

Here's a draft syntax for the datawiki I'm working on. Its data model treats each page as a multimap that maps keys to lists of values.

The goal is to stay close to how one usually takes notes, while allowing the expression of arbitrary metadata associated with each page.

Basics

Pages come in three forms:

1) Only body block

Remember to call Bob!

If a page has only one block, it is automatically the body of the page. So this equals:

body: Remember to call Bob!

2) One title and N body blocks (no key-value pairs)

TODO

Get EC2 working.

Try Hunchentoot SSL support.

If a page has more than one block, but no key-value pairs, the first block is the title, the others are the body of the page. So this equals:

title: TODO

body: Get EC2 working.

Try Hunchentoot SSL support.

3) Page with key-value pairs

Time Out of Mind

by: Bob Dylan
rating: *****

Must be one of his best records of late.

If a page has key-value pairs, everything before them is the title, everything after them is the body of the page. So this equals:

title: Time Out of Mind

by: Bob Dylan
rating: *****

body: Must be one of his best records of late.
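The three forms can be told apart with a few lines of Python. This is a simplified sketch, not the actual parser. Blocks are separated by empty lines, and a block counts as key-value pairs if every line looks like `key: value`. Only the first such block is read as fields, so the multi-line values of rule A below are not handled.

```python
import re

# A line of the form "key: value" (see rules B and C below).
KEY_LINE = re.compile(r"^([^:\s][^:]*):\s*(.*)$")


def split_blocks(text: str) -> list[str]:
    """Split a page into blocks separated by empty lines."""
    return [b.strip("\n") for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]


def is_kv_block(block: str) -> bool:
    return all(KEY_LINE.match(line) for line in block.splitlines())


def classify(text: str) -> dict[str, list[str]]:
    blocks = split_blocks(text)
    if not blocks:
        return {}
    kv = [i for i, b in enumerate(blocks) if is_kv_block(b)]
    page: dict[str, list[str]] = {}
    if not kv:
        if len(blocks) == 1:
            page["body"] = blocks                 # form 1: only a body
        else:
            page["title"] = blocks[:1]            # form 2: title, then body blocks
            page["body"] = blocks[1:]
        return page
    # Form 3 (simplified): only the first key-value block holds fields.
    first = kv[0]
    if first > 0:
        page["title"] = ["\n\n".join(blocks[:first])]
    for line in blocks[first].splitlines():
        key, value = KEY_LINE.match(line).groups()
        page.setdefault(key.strip(), []).append(value)
    if first + 1 < len(blocks):
        page["body"] = blocks[first + 1:]
    return page
```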

Key-value pairs

Rules for parsing them:

A) Values may span multiple lines, except for the last value

We want to allow values that span multiple lines, including empty ones, like so:

value: bla bla bla

bla bla bla

value 2: even more bla
...

This conflicts a bit with the rule that everything after the key-value pairs is the body, so the last value of a page cannot contain empty lines:

...
last value: bla bla bla

even more bla

equals:

...
last value: bla bla bla

body: even more bla
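Here is a sketch of rule A. It assumes the key-value part of the page has already been cut out as text, and the function name `parse_region` is hypothetical. Any non-key line, including an empty one, continues the current value. Whatever follows the first empty line of the last value becomes the body.

```python
import re

KEY_LINE = re.compile(r"^([^:\s][^:]*):\s*(.*)$")


def parse_region(text: str) -> tuple[list[tuple[str, str]], str]:
    """Parse key-value pairs per rule A; return (fields, body)."""
    fields: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        m = KEY_LINE.match(line)
        if m:
            fields.append((m.group(1).strip(), [m.group(2)]))
        elif fields:
            # Any other line, even an empty one, continues the current value.
            fields[-1][1].append(line)
    body = ""
    if fields:
        key, lines = fields[-1]
        blanks = [i for i, line in enumerate(lines) if not line.strip()]
        if blanks:
            # The last value cannot contain empty lines: everything after
            # its first empty line is the body of the page.
            body = "\n".join(lines[blanks[0]:]).strip()
            fields[-1] = (key, lines[:blanks[0]])
    return [(k, "\n".join(ls).strip()) for k, ls in fields], body
```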

B) Keys can contain any character except the ':'

The following are valid keys: 

author of
message-id

C) Spaces at the start of the line disable key-value parsing

In order to be able to write a value that contains a ':', key-value parsing is disabled if the line starts with one or more spaces.

For example, this defines a single key-value pair, and not two key-value pairs:

story: bla bla
  He said: bla bla
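Rules B and C fit into one regular expression. In this sketch the first character of a key may not be any whitespace, tabs included, which slightly generalizes rule C. The key stops at the first ':', so values such as URLs can still contain colons.

```python
import re
from typing import Optional

# Rule B: the key is any run of characters other than ':'.
# Rule C: a line whose first character is whitespace never starts a key.
KEY_LINE = re.compile(r"^([^:\s][^:]*):\s*(.*)$")


def match_key_line(line: str) -> Optional[tuple[str, str]]:
    """Return (key, value) if the line starts a key-value pair, else None."""
    m = KEY_LINE.match(line)
    return (m.group(1).strip(), m.group(2)) if m else None
```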

D) Commas split a value into a list (sometimes)

If a value contains commas, and the comma-separated parts are rather short, and the value does not end with sentence-end punctuation, it is parsed as a list.

tags: lisp, semweb, semistructured

equals:

tags: lisp
tags: semweb
tags: semistructured

But this does not parse as a list:

description: A nice house, even if somewhat old.
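Rule D's heuristic might look like the sketch below. The source does not say what counts as "rather short", so the `max_words=3` threshold is an assumption, as is the set of sentence-end characters.

```python
def split_list(value: str, max_words: int = 3) -> list[str]:
    """Rule D: split on commas only if every part is short (assumed threshold)."""
    text = value.strip()
    # A trailing sentence-end character marks prose, not a list.
    if "," not in text or text.endswith((".", "!", "?")):
        return [text]
    parts = [p.strip() for p in text.split(",")]
    if all(p and len(p.split()) <= max_words for p in parts):
        return parts
    return [text]
```

With these thresholds, the Deli NYC `url` value further down stays one value. Its first comma-separated part is five words long.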

RFC: wiki naming system

Descriptiveness and durability of names are intertwined:

ID: non-descriptive, durable (P8420028043)
Slug: semi-descriptive, semi-durable (foorore-notes)
Title: descriptive, non-durable (Foorore: because notes are already hyperdata)

For example, changing the text of an article may require that its title be changed, while its (less descriptive) slug could still be valid. No change to the text of an article will ever require a change of its ID, because it is not descriptive of the article.

I am currently undecided between two naming systems for my semistructured datawiki:

System A: unique slug & optional title

Every page has a unique slug, which is used in URLs, e.g. foorore-notes (it could also be Foorore: because notes are already hyperdata, but then you get uncool URLs).

A longer title can optionally be specified and is automatically used by the user interface (in link labels and such).

If you use a short, meaningful slug, you may have a cool URL, even if the page changes considerably.

System B: unique, invisible ID & optional slug & optional title

Every page has an ugly, unique ID, but it is usually hidden.

A title is usually specified, and sometimes a slug for cool URLs.

Hmmm, entia non sunt multiplicanda and such, but OTOH, we would have all three classes of names, which could be useful. For example, if you really wanted to, you could link to the ID, and thereby have a really hard link.

In a pile of paper you don't have unique names either, but you do have invisible, unique identities (the pages themselves).

We could allow links to all three names:

  • /P8420028043
  • /foorore-notes
  • /Foorore: because notes are already hyperdata

Slugs and titles are not automatically unique in system B, so such links could be ambiguous (alternatively, the user could be prohibited from saving a page whose title or slug already exists, which sucks).
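To make the ambiguity in system B concrete, here is a hypothetical resolver for the three link forms. Checking IDs first, then slugs, then titles is an assumed precedence. Any result with more than one page is the ambiguous case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    id: str
    slug: Optional[str] = None
    title: Optional[str] = None


def resolve(pages: list[Page], name: str) -> list[Page]:
    """Return every page the link /name could mean (assumed order: ID, slug, title)."""
    for attr in ("id", "slug", "title"):
        hits = [p for p in pages if getattr(p, attr) == name]
        if hits:
            return hits
    return []
```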

Foorore: because notes are already hyperdata

Here's my most recent thinking on semistructured databases...

The system should store notes that usually consist of a title, some structured information, and a longer text, like so:

Deli NYC

tel: 6326 2835
location: Shanghai
special: Super Burrito (Sat and Sun only)
tags: shanghai, food, sandwich, excellent
rating: *****
url: http://www.delinyc.com/nyc.htm (seems to be down, CNY related)

Deli NYC plain rocks, especially the Tuna Melt, Pastrami and Mozzarella, Ham and Egg Salad, and of course the California Style Super Burrito.

Delivery usually takes 30 minutes.

The extracted data should look somewhat like this:

title: Deli NYC
tel: 6326 2835
location: Shanghai
special: Super Burrito (Sat and Sun only)
tags: shanghai
tags: food
tags: sandwich
tags: excellent
rating: *****
url: http://www.delinyc.com/nyc.htm (seems to be down, CNY related)
body: Deli NYC plain rocks, especially the Tuna Melt, Pastrami and Mozzarella, Ham and Egg Salad, and of course the California Style Super Burrito.
body: Delivery usually takes 30 minutes.

Note that the system uses some heuristics to find out that tags is actually a list of items, and not one long item. (The heuristic could be that each item is short, has little punctuation, and the list does not end with a dot.)

There are some issues with the use of heuristics, and ideally there would be ways to disable them when there's a problem.

Sloppy linking and backlinks: Links to other items (e.g. Shanghai, the rating *****) should be very sloppy (i.e. case-insensitive, also insensitive to punctuation), and also lead to discoverable backlinks from those items back to Deli NYC, as in Buckybase.
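Here is one way to get sloppy links and backlinks, sketched under a few assumptions. Names are lowercased and punctuation is replaced by spaces before comparison. A name made only of punctuation, like the rating *****, is kept as-is so it still has a key. `normalize` and `LinkIndex` are illustrative names.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Sloppy link key: ignore case and punctuation (assumed rules)."""
    key = " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())
    # A name made only of punctuation, like the rating *****, is kept as-is.
    return key or name.strip()


class LinkIndex:
    def __init__(self) -> None:
        self._backlinks: dict[str, set[str]] = defaultdict(set)

    def add_links(self, source: str, targets: list[str]) -> None:
        # Record source -> target links under the target's sloppy key.
        for target in targets:
            self._backlinks[normalize(target)].add(source)

    def backlinks_to(self, name: str) -> set[str]:
        return set(self._backlinks.get(normalize(name), set()))
```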

Pretty URLs: The URL for the Deli NYC item could be /manuel/Deli+NYC, but it should be possible to specify a different, easier-to-type URL for an item, especially if it has a long title. This could be done with a key-value pair:

slug: delinyc

Then /manuel/delinyc would also be a URL for the item.

User interface: Like Google, the system should have a search bar at the top, with two buttons, "search" and "go" (I'm feeling lucky).  Search gives you a list of matching items, while go jumps to an item that has the string you entered as title (and can also be used to create new items: going to an item that doesn't exist brings up the edit mode.)
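Here is a rough sketch of the URL scheme and the two buttons. It assumes items live in a plain list, and `urls`, `search` and `go` are hypothetical helpers. `go` matches titles and slugs exactly, and `search` matches substrings case-insensitively. Both choices are assumptions.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote_plus


@dataclass
class Item:
    title: str
    text: str = ""
    slug: Optional[str] = None


def urls(user: str, item: Item) -> list[str]:
    """The title-based URL, plus a slug-based one if a slug is set."""
    result = [f"/{user}/{quote_plus(item.title)}"]
    if item.slug:
        result.append(f"/{user}/{quote_plus(item.slug)}")
    return result


def search(items: list[Item], query: str) -> list[Item]:
    """'search': every item whose title or text contains the query."""
    q = query.lower()
    return [i for i in items if q in i.title.lower() or q in i.text.lower()]


def go(items: list[Item], query: str) -> tuple[str, Item]:
    """'go': show the item titled (or slugged) query, else edit a new one."""
    for item in items:
        if item.title == query or item.slug == query:
            return "show", item
    # Going to an item that doesn't exist brings up edit mode.
    return "edit", Item(title=query)
```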

Versioning: Ideally, the system would provide versioning for items, but without some sort of stable identifier for each item, this could be tricky. The system could use a hidden identifier in the edit form, and update the existing item, though.
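The hidden-identifier idea might look like the sketch below. The edit form carries the item's ID in a hidden field, so saving updates the same item and keeps its previous version even if the title changed. The `Store` class, the form field name `id` and storing history as (title, text) pairs are all assumptions for illustration.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Item:
    id: int                                     # hidden, stable identifier
    title: str
    text: str
    history: list[tuple[str, str]] = field(default_factory=list)


class Store:
    def __init__(self) -> None:
        self.items: dict[int, Item] = {}
        self._ids = itertools.count(1)

    def create(self, title: str, text: str) -> Item:
        item = Item(next(self._ids), title, text)
        self.items[item.id] = item
        return item

    @staticmethod
    def hidden_field(item: Item) -> str:
        # Embedded in the edit form; the user never sees or types it.
        return f'<input type="hidden" name="id" value="{item.id}">'

    def save(self, hidden_id: int, title: str, text: str) -> Item:
        """Handle an edit-form submission: update the item, keep the old version."""
        item = self.items[hidden_id]
        item.history.append((item.title, item.text))
        item.title, item.text = title, text
        return item
```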

Basically, the system should be as sloppy and useful as a pile of paper, but bring some of the benefits of computers (searching, backlinks, versioning, and further down the road advanced slice-and-dice-ability and presentation tools.)

Buckybase replication

From an email I wrote, explaining Buckybase's multi-device replication (which is put on ice until the basic server protocol has a reference implementation):

> ... users never have to merge if they don't want to: the
> system stores all variants of a document separately (at each device),
> and when I look at my laptop I can see
> that this document has title "foo" on my laptop, but title "bar" on my
> server, and title "quux" on my cellphone. The GUI would show me all
> three values, but on each device the "local" value would be displayed in
> a preferred style.
>
> Now, of course, this just defers the merging problem. As a next step,
> Simple Sharing Extensions could be added, so we can show the user only
> the real conflicts. I just hope that the real conflicts between a
> user's devices are rather small, and when the devices are synced e.g.
> daily, quite manageable.
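Here is a sketch of the display logic described in the email. It assumes each device's variant of a document is a flat dict of field values keyed by device name. `real_conflicts` is only a naive field-by-field comparison, not an implementation of Simple Sharing Extensions.

```python
from typing import Optional

Variants = dict[str, dict[str, str]]   # device name -> that device's field values


def ordered_values(variants: Variants, field: str, local: str) -> list[tuple[str, Optional[str]]]:
    """All devices' values for one field, the local device's value first."""
    devices = sorted(variants, key=lambda d: (d != local, d))
    return [(d, variants[d].get(field)) for d in devices]


def real_conflicts(variants: Variants) -> set[str]:
    """Fields whose values differ between at least two devices."""
    fields = set().union(*(v.keys() for v in variants.values()))
    return {f for f in fields if len({v.get(f) for v in variants.values()}) > 1}
```

On the laptop, the GUI would render the first entry of `ordered_values` in the preferred style and the remaining entries next to it.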

Buckybase HTTP API Summary

Base

List accounts
GET http://buckybase/

Account

Create account
POST http://buckybase/?account=...&password=...

List account's folders
GET http://buckybase/account

Delete account
DELETE http://buckybase/account

Folder

Create folder
POST http://buckybase/account?folder=...

Feed of folder's recently updated pages
GET http://buckybase/account/folder

Upload binary media to folder
POST http://buckybase/account/folder

Delete folder
DELETE http://buckybase/account/folder

Page

Show page
GET http://buckybase/account/folder/page

Update page
PUT http://buckybase/account/folder/page

Delete page
DELETE http://buckybase/account/folder/page

From Buckybase microprotocol.
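Because the URLs are hierarchical and guessable, a server could dispatch on the HTTP method and the number of path segments alone. Below is a hypothetical dispatch table for the operations listed in the summary. It is not Buckybase code, the operation names are just labels, and query arguments such as ?account=... are ignored.

```python
from typing import Optional
from urllib.parse import urlsplit

# (method, number of path segments) -> operation
ROUTES = {
    ("GET", 0): "list accounts",
    ("POST", 0): "create account",
    ("GET", 1): "list account's folders",
    ("POST", 1): "create folder",
    ("DELETE", 1): "delete account",
    ("GET", 2): "feed of folder's recently updated pages",
    ("POST", 2): "upload binary media to folder",
    ("DELETE", 2): "delete folder",
    ("GET", 3): "show page",
    ("PUT", 3): "update page",
    ("DELETE", 3): "delete page",
}


def route(method: str, url: str) -> Optional[str]:
    """Map a request to an operation, or None if the summary doesn't list it."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return ROUTES.get((method.upper(), len(segments)))
```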

Buckybase microprotocol: closer to the REST-metal

Can your website be your API? -- Kevin Marks and Tantek Çelik

This is an access protocol and document microformat (based on XHTML 2.0 metadata attributes) for the data model introduced in Buckybase, a document database with bidirectional hyperlinks and reverse chronological access*:

  • Every page has a unique name.
  • A page has any number of user-defined fields.
  • All fields are multi-valued and ordered (single-value fields are just multi-valued fields with one value).
  • Field values can be XML snippets or links to other pages. (A field may contain a mixture of XML and links as values.)
  • Inverse fields are automatically made available and contain backlinks. However, the order of inverse fields cannot be set manually.
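Here is a minimal Python model of these rules, for illustration only. The `Page`, `Link` and `inverse_fields` names are hypothetical. Values are kept in insertion order, and an XML snippet is stored as a plain string. Inverse fields are derived from the links rather than stored, which in this sketch is also why their order cannot be set by hand.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Link:
    target: str   # unique name of the linked page


Value = Union[str, Link]   # an XML snippet (kept as a string here) or a link


@dataclass
class Page:
    name: str   # unique
    fields: dict[str, list[Value]] = field(default_factory=dict)

    def add(self, key: str, value: Value) -> None:
        # Every field is multi-valued and ordered; a single value is a list of one.
        self.fields.setdefault(key, []).append(value)


def inverse_fields(pages: list[Page]) -> dict[str, dict[str, list[str]]]:
    """target page -> field name -> names of pages linking to it via that field."""
    inv: dict[str, dict[str, list[str]]] = {}
    for page in pages:
        for key, values in page.fields.items():
            for v in values:
                if isinstance(v, Link):
                    inv.setdefault(v.target, {}).setdefault(key, []).append(page.name)
    return inv
```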

A Buckybase server has many accounts. An account has many folders. A folder contains many pages. Folders serve as units of syndication and access control.

This protocol uses simple, tailored REST requests (GET, PUT, POST, and DELETE to hierarchical, guessable URLs with arguments) to access and manipulate the accounts, folders, and pages of a Buckybase server.

Any kind of XML (not only HTML, but also RSS or SVG) can be used for pages, as long as a small number of XHTML 2.0 attributes from the Metainformation module (about, rel, rev, href, property) are used to express fields with links and XML values.

Sample Page

A sample page that represents a bug report:

Software doesn't exist yet

Would be nice to have it by the end of 2006.
2007 ain't bad either.
Is also more realistic.
Related: bug-2, bug-3.

<div about="http://buckybase/manuel/bugs/bug-1"
  xmlns:buckybase="http://buckybase.googlecode.com/">
  <!-- property: a literal value of the named field -->
  <h1 property="buckybase:title">Software doesn't exist yet</h1>
  <!-- notes is multi-valued: one element per value, in order -->
  <div property="buckybase:notes">Would be nice to have it by the end of 2006.</div>
  <div property="buckybase:notes">2007 ain't bad either.</div>
  <div property="buckybase:notes">Is also more realistic.</div>
  Related:
  <!-- rel + href: a link value pointing to another page -->
  <a rel="buckybase:related" href="http://buckybase/manuel/bugs/bug-2">bug-2</a>,
  <a rel="buckybase:related" href="http://buckybase/manuel/bugs/bug-3">bug-3</a>.
</div>
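Reading fields back out of such a page needs only the attributes. The sketch below uses Python's standard `xml.etree.ElementTree` and assumes well-formed markup. It flattens each `property` value to its text content, whereas a real implementation would presumably keep the XML. `rev` is ignored here.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_fields(xhtml: str) -> tuple[Optional[str], list[tuple[str, str]]]:
    """Return the page URL (about) and its (field, value) pairs in document order."""
    root = ET.fromstring(xhtml)
    pairs: list[tuple[str, str]] = []
    for el in root.iter():
        if "property" in el.attrib:
            # Literal value, flattened here to the element's text content.
            pairs.append((el.attrib["property"], "".join(el.itertext()).strip()))
        elif "rel" in el.attrib and "href" in el.attrib:
            # Link value: the URL of another page.
            pairs.append((el.attrib["rel"], el.attrib["href"].strip()))
    return root.get("about"), pairs
```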


Buckybase HTTP API, Google Group

A tentative design for the HTTP API of the Buckybase document database is checked in.

Plus, I have created a buckybase group at Google Groups, for discussions and notes.

Buckypraise

My Buckybase vaporspec has received some very nice comments:

Thanks guys, you made my day!

Private messages I've received basically all say "nice and simple", so I guess I gotsta start hacking RSN.

First steps towards multi-device support in Buckybase

[Note: these ideas are not very well thought through. Nevertheless I'd like to get them out.]

I'd like to replicate my Buckybase documents on a central server, my laptop, my phone, and probably on some other machines, too. Per the Buckybase design, my documents are only ever edited by me, so there are no shared state issues between multiple users to think about.

The server will have a Buckybase installed, whereas the laptop and cellphone may have only a browser with client-side persistent storage (and possibly not even that), which cannot be accessed from the network (yet).

I'd like to be able to decide which feeds are replicated on which devices. For example, a public weblog or wiki could be replicated on all devices, whereas a private database could reside on the laptop only.

From my (the user's) perspective a document exists in multiple variants, one for each device. On each device, I would like to see the other devices' variants. The guarantee that each variant arrives at all other devices eventually seems sufficient.
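Here is one way to get the "arrives everywhere eventually" guarantee. The sketch assumes each device only ever edits its own variant and numbers its edits. Every replica keeps the latest known variant per device, and two replicas merge by taking the higher version for each device. This merge is commutative, associative and idempotent, so devices that keep exchanging state end up holding the same set of variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    version: int   # bumped by the origin device on every local edit (assumed)
    title: str     # stands in for the whole document


Replica = dict[str, Variant]   # origin device -> latest variant known here


def merge(a: Replica, b: Replica) -> Replica:
    """Combine what two replicas know, keeping the newest variant per device."""
    out = dict(a)
    for device, variant in b.items():
        if device not in out or variant.version > out[device].version:
            out[device] = variant
    return out
```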

When a document's variants differ, I'd like some smart GUI to show me the conflicts (for example, if a document's title was different on the laptop and cellphone, I'd like to see both titles. Heh, a document with two different titles, that should make some people I know nervous!)

Basically, this seems like ultra-weak consistency: because all variants are maintained at all endpoints, smart conflict detection (e.g. via Simple Sharing Extensions) and merging seem to become policy choices, higher up the stack.