twinql: a SPARQLing Autopsy, Part 2
Continuing on from my previous twinql retrospective, this post polishes off the example, and continues to explain just how things work. It probably doesn't make the blindest bit of sense, because I'm writing it on an aeroplane, with a splitting headache, and no elbow room. Such is life.You may recall (if you don't, go and read it again!) that we had a set of queries that we had derived to represent a triple pattern, each with appropriate bindings from an earlier pattern. Running these queries against the store returns a number of triples — they are simply standard (s p o) queries.
John
John's synthesised triple pattern results are these two triples:You'll see both of these match the triple pattern. From the triple pattern, (ex:john foaf:knows z), and these results, we can synthesise two sets of bindings, which twinql represents as hashes. Each hash contains mappings from the pattern variables to the results of the queries.ex:john foaf:knows ex:mike .
ex:john foaf:knows ex:steve .
Each newly-returned triple adds another variable — z, in this case — into our existing bindings (y). After running each pattern, we end up with a list of hashes, each of which contains (unique) mappings — from x, y, and z — to results in the graph. The idea is that each binding's variables could be substituted into the query patterns to yield a set of triples that exist in the original graph. We've made a mistake somewhere if we're missing a set of bindings (there are paths through the instance graph that are not generated by our bindings) or have additional bindings (we can generate a set of triples that don't all exist in the graph).
That's how graph pattern resolution works. Each triple pattern is used to generate and then progressively refine results by binding against query patterns; the results are binding hashes, which are provided as input to successive patterns. As we work, we record which variables have been ‘touched’, allowing us to decide whether to expand or refine the previous results. When we reach the end of our triple patterns, we have a list of bindings: our final results.
Code
How does this look in twinql? Here are the function signatures.There are other methods defined for the optional-graph-pattern class, and so on.(defun process-triple-pattern (db triple-pattern
bindings-list touched
optionalp)
"Run a triple pattern against a database and some
bindings. The pattern may be processed as an
optional."
…)
(defun compile-graph-pattern (graph-pattern &optional
optionalp)
…)
(defmethod run-pattern ((pattern graph-pattern)
db &optional
(bindings-list nil)
(touched (make-hash-table)))
"Run the given pattern against DB. Optionally take
existing bindings and touched variables."
…)
More complex patterns
So how do we extend this to all the features that SPARQL offers — nested patterns, OPTIONAL patterns, UNION, and so on? It's actually quite simple.- UNION
- Run each sub-pattern and concatenate the results.
- OPTIONAL
- Ordinarily, when a graph pattern fails to provide results, it returns nothing, discarding whatever bindings were provided to it. (This is why queries fail if one of their parts isn't matched.) An optional pattern simply returns its inputs instead.
- Nested patterns
- These are run just like triple patterns, taking pre-existing bindings and returning the effects of integrating the sub-parts of the query.
This diagram (PDF) shows how things work for twinql as a whole. The parser takes textual input and produces s-expressions, the most relevant part of which is the graph pattern (we later refine down the selected variables, amongst other things). These s-expressions are either executed directly or precompiled into objects; these are then dispatched to generic functions to evaluate them against a Wilbur database. We get a list of bindings hashes from this; the results can be used directly (for programmatic integration with SPARQL) or presented as XML, plain text, etc.: this is SELECT. CONSTRUCT is implemented by piping the bindings back into triple patterns to synthesise new triples; ASK checks if we have any results at all; DESCRIBE takes the selected resources and returns their CBDs.
All that's left are distinctness, filtering, and sorting of results, which I will cover next time.
Posted at 2005-11-12 10:48:29 by Richard • Link to twinql: a SPARQLin…
Comments, trackbacks.
