Here's a draft syntax for the datawiki I'm working on. Its data model treats each page as a multimap that maps keys to lists of values.
The goal is to stay close to how one usually takes notes, while allowing the expression of arbitrary metadata associated with each page.
Basics
Pages come in three forms:
1) Only body block
Remember to call Bob!
If a page has only one block, it is automatically the body of the page. So this equals:
body: Remember to call Bob!
2) One title and N body blocks (no key-value pairs)
TODO
Get EC2 working.
Try Hunchentoot SSL support.
If a page has more than one block but no key-value pairs, the first block is the title and the remaining blocks form the body of the page. So this equals:
title: TODO
body: Get EC2 working.
Try Hunchentoot SSL support.
3) Page with key-value pairs
Time Out of Mind
by: Bob Dylan
rating: *****

Must be one of his best records of late.
If a page has key-value pairs, everything before them is the title and everything after them is the body of the page. So this equals:
title: Time Out of Mind
by: Bob Dylan
rating: *****
body: Must be one of his best records of late.
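The three forms above can be sketched as a small parser. This is only an illustration under stated assumptions, not the datawiki's actual implementation: blocks are assumed to be separated by empty lines, a key line is assumed to be an unindented line of the form `key: value`, and values are kept to a single line for brevity. The names `KEY_LINE` and `parse_page` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a key line is unindented, and the key runs up to the first ':'.
KEY_LINE = re.compile(r"^([^\s:][^:]*):\s*(.*)$")

def parse_page(text):
    """Parse a page into a multimap (dict mapping keys to lists of values)."""
    page = defaultdict(list)
    lines = text.strip().split("\n")
    first = next((i for i, ln in enumerate(lines) if KEY_LINE.match(ln)), None)

    if first is None:
        # Forms 1 and 2: no key-value pairs, so work on blocks.
        # Assumption: blocks are separated by one or more empty lines.
        blocks = [b.strip() for b in re.split(r"\n\s*\n", text.strip())]
        if len(blocks) == 1:
            page["body"].append(blocks[0])
        else:
            page["title"].append(blocks[0])
            page["body"].append("\n\n".join(blocks[1:]))
        return dict(page)

    # Form 3: title, then key-value pairs, then body.
    title = "\n".join(lines[:first]).strip()
    if title:
        page["title"].append(title)
    i = first
    # Simplification: only single-line values on consecutive lines.
    while i < len(lines) and (m := KEY_LINE.match(lines[i])):
        page[m.group(1).strip()].append(m.group(2).strip())
        i += 1
    body = "\n".join(lines[i:]).strip()
    if body:
        page["body"].append(body)
    return dict(page)
```

For instance, `parse_page("Remember to call Bob!")` yields a page with only a `body` key, while the Bob Dylan example yields `title`, `by`, `rating` and `body`.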
Key-value pairs
Rules for parsing them:
A) Values may span multiple lines, except for the last value
We want to allow values that span multiple lines, including empty ones, like so:
value: bla bla bla

bla bla bla
value 2: even more bla
...
This conflicts a bit with the rule that everything after the key-value pairs is the body, so the last value of the page cannot contain empty lines. The first empty line after it starts the body:
...
last value: bla bla bla

even more bla
equals:
...
last value: bla bla bla
body: even more bla
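One way rule A could be implemented is sketched below. It is an assumption-laden illustration rather than the real parser: it assumes the input starts at the first key line, treats every unindented `key: value` line as the start of a new pair, and ends the last value at the first empty line that follows it. The names `KEY_LINE` and `split_pairs_and_body` are hypothetical.

```python
import re

# Assumption: a key line is unindented, and the key runs up to the first ':'.
KEY_LINE = re.compile(r"^([^\s:][^:]*):\s*(.*)$")

def split_pairs_and_body(lines):
    """Split lines (starting at the first key line) into (pairs, body)."""
    key_idx = [i for i, ln in enumerate(lines) if KEY_LINE.match(ln)]
    if not key_idx:
        return [], "\n".join(lines).strip()

    # The last value ends at the first empty line after its key line.
    end = key_idx[-1] + 1
    while end < len(lines) and lines[end].strip():
        end += 1

    pairs = []
    bounds = key_idx + [end]
    for start, stop in zip(key_idx, bounds[1:]):
        m = KEY_LINE.match(lines[start])
        # Earlier values run up to the next key line, empty lines included.
        value_lines = [m.group(2)] + lines[start + 1:stop]
        pairs.append((m.group(1).strip(), "\n".join(value_lines).strip()))

    body = "\n".join(lines[end:]).strip()
    return pairs, body
```

A limitation of this sketch is that an unindented body line containing a ':' would be mistaken for a key line.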
B) Keys can contain any character except the ':'
The following are valid keys:
author of
message-id
C) Spaces at the start of the line disable key-value parsing
In order to be able to write a value that contains a ':', key-value parsing is disabled if the line starts with one or more spaces.
For example, this defines a single key-value pair, and not two key-value pairs:
story: bla bla
 He said: bla bla
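Rules B and C could be sketched as follows. This is a minimal illustration, assuming that the key is everything up to the first ':' and that an indented line is appended to the value of the preceding pair. The function names `parse_key_line` and `parse_pairs` are hypothetical.

```python
import re

# Rule B: a key may contain any character except ':'.
KEY_LINE = re.compile(r"^([^:]+):\s*(.*)$")

def parse_key_line(line):
    """Return (key, value) if the line starts a key-value pair, else None."""
    # Rule C: leading whitespace disables key-value parsing.
    if line[:1].isspace():
        return None
    m = KEY_LINE.match(line)
    if not m:
        return None
    return m.group(1).strip(), m.group(2)

def parse_pairs(lines):
    """Fold lines into (key, value) pairs; other lines continue the last value."""
    pairs = []
    for line in lines:
        parsed = parse_key_line(line)
        if parsed:
            pairs.append(list(parsed))
        elif pairs:
            # Assumption: the indentation itself is not part of the value.
            pairs[-1][1] += "\n" + line.strip()
    return [tuple(p) for p in pairs]
```

With the indented second line, the story example gives one pair whose value spans two lines. Without the indentation it would give two pairs, `story` and `He said`.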
D) Commas split a value into a list (sometimes)
If a value contains commas, and the comma-separated parts are rather short, and the value does not end with sentence-end punctuation, it is parsed as a list.
tags: lisp, semweb, semistructured
equals:
tags: lisp
tags: semweb
tags: semistructured
But this does not parse as a list:
description: A nice house, even if somewhat old.
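The heuristic in rule D might look roughly like this. The draft does not say how short "rather short" is, so the 20-character limit below is purely an assumption, as is the choice of '.', '!' and '?' as sentence-end punctuation. The names `MAX_PART_LENGTH`, `SENTENCE_END` and `split_value` are hypothetical.

```python
MAX_PART_LENGTH = 20             # assumption: what counts as "rather short"
SENTENCE_END = (".", "!", "?")   # assumption: sentence-end punctuation

def split_value(value):
    """Return the value as a list: several items if it looks like a list."""
    value = value.strip()
    # No commas, or it reads like a sentence: keep it as a single value.
    if "," not in value or value.endswith(SENTENCE_END):
        return [value]
    parts = [p.strip() for p in value.split(",")]
    # Empty or long parts suggest prose rather than a list.
    if any(not p or len(p) > MAX_PART_LENGTH for p in parts):
        return [value]
    return parts
```

Here `split_value("lisp, semweb, semistructured")` gives three values, while the house description stays a single value because it ends with a period.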
Very nice! What are you planning to do for queries?
Posted by: Kragen Sitaker | February 23, 2007 at 13:32
Thanks!
The system will provide three fundamental ways to do basic queries, which correspond closely to user interface views:
1.) topic-based access: show me all pages with a given topic (where the topic is the slug/short title of a page).
2.) time-based access: show me the most recently changed pages as a paged list.
3.) fuzzy full-text access: show me all pages based on relevance to a search query.
Now, all of these access mechanisms are parametrizable with a second axis: in each method, you can define whether you want to see results from A) all users, B) a single user's whole dataset, or C) a single dataset (wiki, blog, project...) of a single user.
For example, you can say "show me recently changed pages by user U in his wiki (but not in his other datasets)", and "show me all pages with the topic T by all users", etc.
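To illustrate the two axes, here is a small in-memory sketch of topic-based and time-based access with an optional user and dataset scope. It says nothing about how the real system stores or indexes pages. The `Page` fields and the function names are assumptions made for the example, and full-text access is left out.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Page:
    # Hypothetical fields, chosen only for this example.
    user: str
    dataset: str
    topic: str
    changed: datetime

def in_scope(page, user=None, dataset=None):
    """Scope axis: A) all users, B) one user, C) one dataset of one user."""
    if dataset is not None and user is None:
        raise ValueError("a dataset scope requires a user")
    if user is not None and page.user != user:
        return False
    if dataset is not None and page.dataset != dataset:
        return False
    return True

def by_topic(pages, topic, user=None, dataset=None):
    """Topic-based access: all pages with the given topic, within the scope."""
    return [p for p in pages if p.topic == topic and in_scope(p, user, dataset)]

def recently_changed(pages, user=None, dataset=None, page_size=10, page_number=0):
    """Time-based access: most recently changed pages, as a paged list."""
    hits = sorted(
        (p for p in pages if in_scope(p, user, dataset)),
        key=lambda p: p.changed,
        reverse=True,
    )
    start = page_number * page_size
    return hits[start:start + page_size]
```

"Recently changed pages by user U in his wiki" then becomes `recently_changed(pages, user="U", dataset="wiki")`, and "all pages with the topic T by all users" becomes `by_topic(pages, "T")`.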
All pages returned by a query contain the typed backlinks/inverse fields from other pages to that page.
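A rough sketch of how such inverse fields could be computed from the multimap model follows. It rests on one strong assumption that is not stated above: a value counts as a link when it is exactly the slug of another page. The function name `backlinks` is hypothetical.

```python
from collections import defaultdict

def backlinks(pages):
    """pages maps slug -> multimap (dict of key -> list of values).

    Returns slug -> {key: [slugs of pages that point here via that key]}.
    """
    inverse = {slug: defaultdict(list) for slug in pages}
    for source, fields in pages.items():
        for key, values in fields.items():
            for value in values:
                # Assumption: a value equal to another page's slug is a link.
                if value in pages and value != source:
                    inverse[value][key].append(source)
    return {slug: dict(fields) for slug, fields in inverse.items()}
```

With a page `time-out-of-mind` whose `by` field is `bob-dylan`, the page `bob-dylan` gets the inverse field `by: time-out-of-mind`. The key is what makes the backlink typed.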
I developed a system similar to this as a bug tracking system once, and it was already extremely useful (especially the backlinks), much more than conventional bug trackers.
Of course, more structured queries would be nice, and the Google Base Query Language would seem a perfect fit. This could probably be translated to Lucene queries in many cases.
I know how to implement the basic queries above efficiently, but I don't know whether I have the computational resources for the more general Base queries. I should take a second look at the Partial Match Index.
Posted by: Manuel | February 23, 2007 at 14:49