Wednesday, May 28, 2014

Lineage data

I finally dug into the daughter_of cell division data (unfortunately a forest rather than a tree; see below).

After cleaning it up a little, I put it into OpenRDF: http://107.170.133.175:8080/openrdf-workbench/repositories/OpenWorm2

There are some discontinuities which make the graph unconnected. The roots in our forest of division trees can be shown with this query:

PREFIX ns1:<http://openworm.org/entities/>

select distinct ?z where { ?y ns1:daughter_of ?z . filter(not exists { ?z ns1:daughter_of ?p }) }

These discontinuities have nothing to do with the input errors mentioned in the GitHub issue, as all of those in the daughter_of table were easily fixed.

To look at the path to the root for 'AB plppaaaapa':

PREFIX ns1:<http://openworm.org/entities/>

select distinct ?z where { ns1:AB_plppaaaapa ns1:daughter_of+ ?z }

The code I used to extract and upload: http://pastebin.com/NVnAfD8D
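
To run these queries outside the workbench, here's a minimal sketch using SPARQLWrapper. Note the endpoint path is an assumption (Sesame usually exposes repositories under /openrdf-sesame rather than the workbench URL above), so adjust it to your deployment.

from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed endpoint path; the workbench URL above is the UI, not the API.
ENDPOINT = "http://107.170.133.175:8080/openrdf-sesame/repositories/OpenWorm2"

# The root-finding query from above
QUERY = """
PREFIX ns1:<http://openworm.org/entities/>
SELECT DISTINCT ?z WHERE {
  ?y ns1:daughter_of ?z .
  FILTER(NOT EXISTS { ?z ns1:daughter_of ?p })
}
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["z"]["value"])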

It's just the cell divisions that we've had for some time. We've discussed bringing in volume data for each of the cells in development in order to model the differentiation waves discussed by Richard Gordon and colleagues, but as yet we don't have that data readily available.

Thursday, May 22, 2014

OpenRDF problems

I was having difficulty setting up OpenRDF on a Digital Ocean machine, but figured it out: a permissions problem was preventing Sesame from creating its log files. Changing the ownership of /usr/share/tomcat7 to the tomcat7 user fixed it.

Tuesday, May 20, 2014

I've switched from using a local SQLite database to an OpenRDF SPARQL service which supports SPARQL 1.1 updates. The update statements should include mandatory information about the updater, as well as other checks specific to the kind of data being modified. I'm looking into an inference engine that stands between the DB and PyOpenWorm, but in the meantime I'll be encoding the rules in PyOpenWorm itself.
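
As a rough sketch of what such an update might look like (the ns1:updated_by predicate, the user name, the parent cell, and the /statements endpoint path are all illustrative assumptions, not settled schema):

from SPARQLWrapper import SPARQLWrapper, POST

# Sesame accepts SPARQL 1.1 updates at .../repositories/<repo>/statements;
# the path here is an assumption about our deployment.
ENDPOINT = "http://107.170.133.175:8080/openrdf-sesame/repositories/OpenWorm2/statements"

# ns1:updated_by and the values below are illustrative placeholders
UPDATE = """
PREFIX ns1:<http://openworm.org/entities/>
INSERT DATA {
  ns1:AB_plppaaaapa ns1:daughter_of ns1:AB_plppaaaap .
  ns1:AB_plppaaaapa ns1:updated_by "some_user" .
}
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setMethod(POST)  # updates must be POSTed
sparql.setQuery(UPDATE)
sparql.query()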

Sunday, May 18, 2014

I have an OWL file with WormBase data in it, and a SPARQL endpoint that serves the same data.

WormBase has it going on (with respect to anatomical tree displays).

Also:

Work has started on the requirements writeup.

Saturday, May 10, 2014

Updates, Persistence, and Berkeley DB store

For persisting updates to data through PyOpenWorm, we need some stable store. For now I'm looking at how we can save the updates locally. From there we would have mechanisms to write local changes back to the shared repository, possibly with staging for review by other members, like a GitHub pull request.

Stephen L. has suggested integrating git with the database system, in particular by presenting changes as diffs of SQL statements. With the scheme above, this could involve dumping to SQL statements on commit so that the diff is legible. To extract a meaningful diff from these dumps, we'd need to sort the values first so that no change would mean matching subsequences for diff to find. From a cursory assessment, I don't think this is too difficult, and we can already see that it would be valuable for exposing changes in a human-readable format and for integrating into existing GitHub-based workflows.
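
A variant of the same idea that skips SQL entirely: N-Triples is one statement per line, so dumping the graph as sorted N-Triples gives diff stable lines to match against. A rough sketch, with illustrative file names:

from rdflib import Graph

g = Graph()
g.parse("openworm.nt", format="nt")  # illustrative local dump

data = g.serialize(format="nt")
if isinstance(data, bytes):  # older RDFlib versions return bytes here
    data = data.decode("utf-8")

# One statement per line, so a lexicographic sort yields a canonical
# ordering and unchanged regions diff cleanly between dumps.
with open("openworm_sorted.nt", "w") as f:
    f.write("\n".join(sorted(data.splitlines())) + "\n")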

I have started experimenting with Berkeley DB using the "Sleepycat" store for RDFlib. I've done a little profiling and found that queries are about 10 times slower (0.02 ms in-memory vs. 0.17 ms for Sleepycat). What's interesting is that query times fall precipitously after a number of repeated queries, suggesting some caching. I'd like to read up on Berkeley DB to get a fuller picture. I'll also be looking at times for writes back to the database.
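
For reference, a comparison like this can be set up along the following lines; the store path and the query are illustrative, and the Sleepycat store needs the Berkeley DB Python bindings installed:

import timeit
from rdflib import Graph

g = Graph(store="Sleepycat")  # Berkeley DB-backed store
g.open("/tmp/openworm_bdb", create=True)

QUERY = """
PREFIX ns1:<http://openworm.org/entities/>
SELECT ?z WHERE { ns1:AB_plppaaaapa ns1:daughter_of ?z }
"""

# Mean per-query time over repeated runs; repetition is also what
# exposes the caching effect mentioned above.
print(timeit.timeit(lambda: list(g.query(QUERY)), number=100) / 100)
g.close()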

Wednesday, May 7, 2014

Coming out of the data meeting held today, I looked further into the movement validation repo. I'll review it in more detail later today.

Tuesday, May 6, 2014

Data sources and collection

This is a list of data sources I've identified as actually being used in projects, starting from here. The goal is to understand the different types of data sources, so we know how we can access them and what input/output formats should be targeted:

I've also found a lot of scripts for extracting data from different formats:

I'm still looking around. I'll make a new post with any updates.

Sunday, May 4, 2014

neuroConstruct and CElegansNeuroML

I am thinking that a good model to start working from is the CElegansNeuroML project. For right now I'm going to browse around to figure out how it all fits together. One problem I have is that my neuroConstruct build can't open the 3D display. I'm using Xephyr in XMonad so I can get a non-tiling WM in here (Java AWT programs don't seem to play well with XMonad). Maybe I can use it in a different session.

For right now, I'm going to explore some more data sources and start looking at where there are shared entities. Regardless of what we end up doing, we'll need to handle this somehow.

Friday, May 2, 2014

I'm thinking of trying out the RDFlib store API.
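
What I like is that the Graph interface stays the same regardless of the backing store, so code written now should carry over when we swap stores later. A tiny sketch (entity names are illustrative):

from rdflib import Graph, Namespace

NS = Namespace("http://openworm.org/entities/")

g = Graph()  # default in-memory store; other stores plug in by name
g.add((NS.AB_plppaaaapa, NS.daughter_of, NS.AB_plppaaaap))

# Triple-pattern matching works the same against any store
for s, p, o in g.triples((None, NS.daughter_of, None)):
    print(s, p, o)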

Thursday, May 1, 2014

PyOpenWorm and NeuroML

I'm trying to make a NeuroML file from the generated RDF database in PyOpenWorm, and trying to make sense of the schema. Everything is related through integer keys, and I haven't found a human-readable description yet.

Update 1:
All of the methods read through to an RDF graph built up in the network module. However, a neuron is not checked for existence in the database on creation. I'm going to add a primary access method with the check.
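
Something like this is what I have in mind; the function and the way cells are keyed are illustrative, not the actual PyOpenWorm API:

from rdflib import Namespace

NS = Namespace("http://openworm.org/entities/")

def get_neuron(graph, name):
    """Return the node for `name` only if the graph already knows it."""
    node = NS[name]
    # Match any triple with the node as subject; absence means unknown
    if (node, None, None) not in graph:
        raise KeyError("no such neuron: " + name)
    return node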

Update 2:
Thinking about the structure of these modules. I'm guessing a config file would be good for moving the configurations around. PyOpenWorm should be a library that can be used to make the wrappers, so it shouldn't be tied to any specific database. Sounds like a job for dependency inversion (i.e., parametrizing the network class).
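
A sketch of the parametrization; the class shape and the ns1:Neuron type are illustrative:

class Network:
    # The graph is injected rather than constructed here, so this class
    # works with any RDFlib-style object that implements query()
    def __init__(self, graph):
        self._graph = graph

    def neurons(self):
        q = """
        PREFIX ns1:<http://openworm.org/entities/>
        SELECT ?n WHERE { ?n a ns1:Neuron }
        """
        return [row[0] for row in self._graph.query(q)]

An in-memory graph, a Sleepycat-backed graph, or a wrapper around the SPARQL service could then be chosen in the config file without touching the class itself.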

Update 3:
Learning libNeuroML from Padraig G's examples in the GitHub repo. Right now, I'm leaning towards separating the file-type-specific components from PyOpenWorm proper. Eventually I want to stop opening up the library files (closely tied to the schema) and spend my time with the file-specific converters.