Tuesday, July 15, 2014

I adjusted the query according to the previous post and also added patterns for the cell the property is actually on (oops):
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix openworm: <http://openworm.org/entities/>
prefix neuron: <http://openworm.org/entities/Neuron/>
prefix sp: <http://openworm.org/entities/SimpleProperty/>
select distinct ?type where
{ 
?Neuron rdf:type openworm:Neuron .

?Neuron neuron:lineageName ?Neuron_lineageName .
?Neuron_lineageName rdf:type openworm:Neuron_lineageName .
?Neuron_lineageName sp:value ?lineageName .

?Neuron neuron:name ?Neuron_name .
?Neuron_name rdf:type openworm:Neuron_name .
?Neuron_name sp:value "AVAL" .

?Neuron neuron:type ?Neuron_type .
?Neuron_type rdf:type openworm:Neuron_type .
?Neuron_type sp:value ?type .

?Neuron neuron:receptor ?Neuron_receptor .
?Neuron_receptor rdf:type openworm:Neuron_receptor .
?Neuron_receptor sp:value ?receptor 
}
For the data stores we've been using, such queries don't seem to execute very efficiently.

I would expect it to be quick, since:

  1. there should be exactly one statement matching (a, sp:value, "AVAL"), found by a lookup in the 'pos' index;
  2. there should then be only one statement matching (b, neuron:name, a), found by a lookup in the 'osp' index;
  3. checking that 'b' has the correct type is a single lookup in the 'spo' index, which should be efficient;
  4. the neuron should then have only one neuron:name;
  5. and it should have one of each of the other properties, each of which can be looked up in the 'spo' index.
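
To make the lookup steps above concrete, here is a minimal sketch of a toy in-memory triple store with 'spo', 'pos' and 'osp' indexes, written in Python. It is not how Sesame or any other store implements its indexes; the class TripleIndex, the function neuron_types, the node names (n1, n1_name, n1_type) and the value "interneuron" are all made up for illustration.

```python
from collections import defaultdict


class TripleIndex:
    """Toy triple store keeping three nested-dict indexes (illustrative only)."""

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))  # subject -> predicate -> objects
        self.pos = defaultdict(lambda: defaultdict(set))  # predicate -> object -> subjects
        self.osp = defaultdict(lambda: defaultdict(set))  # object -> subject -> predicates

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def subjects_via_pos(self, p, o):
        """Match (?, p, o) with the 'pos' index."""
        return set(self.pos.get(p, {}).get(o, ()))

    def subjects_via_osp(self, o, p):
        """Match (?, p, o) with the 'osp' index: find o, then filter on p."""
        return {s for s, preds in self.osp.get(o, {}).items() if p in preds}

    def objects(self, s, p):
        """Match (s, p, ?) with the 'spo' index."""
        return set(self.spo.get(s, {}).get(p, ()))


def neuron_types(store, name):
    """Follow the numbered steps to find the type values of the named neuron."""
    types = set()
    for name_node in store.subjects_via_pos("sp:value", name):           # step 1
        for neuron in store.subjects_via_osp(name_node, "neuron:name"):  # step 2
            if "openworm:Neuron" not in store.objects(neuron, "rdf:type"):  # step 3
                continue
            for type_node in store.objects(neuron, "neuron:type"):       # step 5
                types |= store.objects(type_node, "sp:value")
    return types


# Placeholder data, not taken from the real dataset.
store = TripleIndex()
store.add("n1", "rdf:type", "openworm:Neuron")
store.add("n1", "neuron:name", "n1_name")
store.add("n1_name", "rdf:type", "openworm:Neuron_name")
store.add("n1_name", "sp:value", "AVAL")
store.add("n1", "neuron:type", "n1_type")
store.add("n1_type", "rdf:type", "openworm:Neuron_type")
store.add("n1_type", "sp:value", "interneuron")

result = neuron_types(store, "AVAL")
print(result)
```

Every step here is a dictionary lookup or a walk over a handful of entries, which is the behaviour I am hoping the real indexes provide.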

For queries against Sesame, I need to use the Java API to confirm that these indexes actually exist; otherwise I'm making a lot of assumptions about how well the indexes work. Beyond that, I have no easy way to see how the queries are executed, short of measuring the (often very long) query times. Maybe I can get some sense of how these queries are handled by running them against some test data.
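
As a rough idea of what measuring with test data could look like, the sketch below generates synthetic neurons in pure Python and times one (?, p, o) lookup done by full scan against the same lookup done through a 'pos'-style dictionary. It does not touch Sesame at all; the helper names (make_triples, scan_lookup, build_pos, timed) and the NAME0, NAME1, ... values are assumptions invented for this example.

```python
import time


def make_triples(n):
    """Generate n synthetic neurons using the name pattern from the query."""
    triples = []
    for i in range(n):
        neuron, name_node = f"neuron{i}", f"neuron{i}_name"
        triples.append((neuron, "rdf:type", "openworm:Neuron"))
        triples.append((neuron, "neuron:name", name_node))
        triples.append((name_node, "sp:value", f"NAME{i}"))
    return triples


def scan_lookup(triples, p, o):
    """Match (?, p, o) by scanning every triple (no index)."""
    return {s for (s, p2, o2) in triples if p2 == p and o2 == o}


def build_pos(triples):
    """Build a 'pos'-style index: (predicate, object) -> set of subjects."""
    pos = {}
    for s, p, o in triples:
        pos.setdefault((p, o), set()).add(s)
    return pos


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


triples = make_triples(50_000)
pos = build_pos(triples)
target = ("sp:value", "NAME49999")

scanned, scan_time = timed(scan_lookup, triples, *target)
indexed, index_time = timed(lambda p, o: pos.get((p, o), set()), *target)
print(f"scan: {scan_time:.6f}s, index: {index_time:.6f}s")
```

The same timing wrapper could be pointed at a real query call instead of the lambda, with the data size varied to see how query time grows.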

I've started reading a little about ZODB, a Python object database. The core of what I've been making for PyOpenWorm is an Object<->RDF mapper. The choice of RDF storage solutions to back PyOpenWorm was based on the work that had already been done, my experience in working with some RDF tools, and an expectation that joining with existing datasets may be easier with RDF than with other storage options. It may be useful to re-evaluate that choice going forward.
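
For a sense of what an Object<->RDF mapping involves, here is a minimal sketch that turns a neuron's properties into the triple pattern used in the query above and reads them back. This is not PyOpenWorm's actual code: the functions neuron_to_triples and triples_to_neuron, the urn:example: identifier scheme and the property values are assumptions made for illustration. Only the http://openworm.org/entities/ URIs come from the query.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://openworm.org/entities/"
VALUE = BASE + "SimpleProperty/value"  # sp:value in the query


def neuron_to_triples(ident, **props):
    """Map one neuron and its simple properties to (s, p, o) triples."""
    neuron = f"urn:example:neuron:{ident}"  # hypothetical identifier scheme
    triples = [(neuron, RDF_TYPE, BASE + "Neuron")]
    for prop, value in props.items():
        node = f"{neuron}:{prop}"  # hypothetical property-node identifier
        triples.append((neuron, f"{BASE}Neuron/{prop}", node))    # neuron:<prop>
        triples.append((node, RDF_TYPE, f"{BASE}Neuron_{prop}"))  # openworm:Neuron_<prop>
        triples.append((node, VALUE, value))
    return triples


def triples_to_neuron(triples, neuron):
    """Recover the property dict of one neuron from its triples."""
    prefix = BASE + "Neuron/"
    by_sp = {(s, p): o for s, p, o in triples}
    props = {}
    for (s, p), node in by_sp.items():
        if s == neuron and p.startswith(prefix):
            props[p[len(prefix):]] = by_sp[(node, VALUE)]
    return props


# Placeholder values, not taken from the real dataset.
triples = neuron_to_triples("n1", name="AVAL", type="interneuron")
restored = triples_to_neuron(triples, "urn:example:neuron:n1")
print(restored)
```

Each property costs three triples here, which is part of why a query touching four properties needs so many patterns.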
