Sat 12 Nov 2005

twinql: a SPARQLing Autopsy, Part 2

Continuing on from my previous twinql retrospective, this post polishes off the example, and continues to explain just how things work. It probably doesn't make the blindest bit of sense, because I'm writing it on an aeroplane, with a splitting headache, and no elbow room. Such is life.

You may recall (if you don't, go and read it again!) that we had a set of queries that we had derived to represent a triple pattern, each with appropriate bindings from an earlier pattern. Running these queries against the store returns a number of triples — they are simply standard (s p o) queries.
John
John's synthesised triple pattern results are these two triples:
ex:john foaf:knows ex:mike .
ex:john foaf:knows ex:steve .
You'll see both of these match the triple pattern. From the triple pattern, (ex:john foaf:knows z), and these results, we can synthesise two sets of bindings, which twinql represents as hashes. Each hash contains mappings from the pattern variables to the results of the queries.

Each newly-returned triple adds another variable — z, in this case — into our existing bindings (y). After running each pattern, we end up with a list of hashes, each of which contains (unique) mappings — from x, y, and z — to results in the graph. The idea is that each binding's variables could be substituted into the query patterns to yield a set of triples that exist in the original graph. We've made a mistake somewhere if we're missing a set of bindings (there are paths through the instance graph that are not generated by our bindings) or have additional bindings (we can generate a set of triples that don't all exist in the graph).

That's how graph pattern resolution works. Each triple pattern is used to generate and then progressively refine results by binding against query patterns; the results are binding hashes, which are provided as input to successive patterns. As we work, we record which variables have been ‘touched’, allowing us to decide whether to expand or refine the previous results. When we reach the end of our triple patterns, we have a list of bindings: our final results.
Code
How does this look in twinql? Here are the function signatures.
(defun process-triple-pattern (db triple-pattern
bindings-list touched
optionalp)
"Run a triple pattern against a database and some
bindings. The pattern may be processed as an
optional."
…)

(defun compile-graph-pattern (graph-pattern &optional
optionalp)
…)

(defmethod run-pattern ((pattern graph-pattern)
db &optional
(bindings-list nil)
(touched (make-hash-table)))
"Run the given pattern against DB. Optionally take
existing bindings and touched variables."
…)
There are other methods defined for the optional-graph-pattern class, and so on.
More complex patterns
So how do we extend this to all the features that SPARQL offers — nested patterns, OPTIONAL patterns, UNION, and so on? It's actually quite simple.
UNION
Run each sub-pattern and concatenate the results.
OPTIONAL
Ordinarily, when a graph pattern fails to provide results, it returns nothing, discarding whatever bindings were provided to it. (This is why queries fail if one of their parts isn't matched.) An optional pattern simply returns its inputs instead.
Nested patterns
These are run just like triple patterns, taking pre-existing bindings and returning the effects of integrating the sub-parts of the query.
And that really is it for basic querying. Using these constructs we can take arbitrary nestings of triple patterns, sub-graphs, optional subgraphs (which can themselves be nested), and unions, and resolve them against a database.

This diagram (PDF) shows how things work for twinql as a whole. The parser takes textual input and produces s-expressions, the most relevant part of which is the graph pattern (we later refine down the selected variables, amongst other things). These s-expressions are either executed directly or precompiled into objects; these are then dispatched to generic functions to evaluate them against a Wilbur database. We get a list of bindings hashes from this; the results can be used directly (for programmatic integration with SPARQL) or presented as XML, plain text, etc.: this is SELECT. CONSTRUCT is implemented by piping the bindings back into triple patterns to synthesise new triples; ASK checks if we have any results at all; DESCRIBE takes the selected resources and returns their CBDs.

All that's left are distinctness, filtering, and sorting of results, which I will cover next time.

Posted at 2005-11-12 10:48:29 by RichardLink to twinql: a SPARQLin…
Comments, trackbacks.

Welcome back, sir

I'm back in the States! It's 10:15am here now, and I've got rid of my hangover. Awesome.

The title of this post comes from the immigration officer who processed me. It was a good start — everyone at Tellme was the perfect combination of superbly welcoming and open-mouthed in shock. Mere hours after landing I was in St. Steven's Green, eating mixed grill and drinking Bass with the crew, and Mervyn's was definitely on the agenda. Special thanks to Jen and Jen for coming out with the crew!

Now I have milk to buy and a bank account to open….

Posted at 2005-11-12 10:19:22 by RichardLink to Welcome back, sir
Comments, trackbacks.

Google
Web holygoat.co.uk
  • richard is: