Introduction
Over a series of blogs called "adventures in the galaxy", I will show how to create a customized importer of data into galaxy.In case you don't know what galaxy is: its a web frontend for bioinformatic programs. The nice thing about galaxy is that it remembers the complete history of a data-set. This includes all the transformations together with the parameters and all the data-sets it has been combined with.
The builtin ability to import data lacked some features that we needed: association with metadata of the experiments, and presenting the data in a structured way. We also want the data to stay in its place. The main motivation of this blog is to have some documentation for my coworkers. Its my first blogging, so please be kind.
Basic setup
The developer site of galaxy seems to have moved to bitbucket galaxy-central. So if you want to submit bug reports or check the issues, go there.
In order to develop clone the repository, create a new branch, hack around and commit your changes. Then switch to the default branch, pull from upstream and merge the updated branch back into your own branch.
hg clone http://bitbucket.org/galaxy/galaxy-central galaxy_blog cd galaxy_blog hg branch importer #create branch hg commit -m "created branch importer" #...hack around hg commit -m "useful changes" hg update -C default #switch branch hg pull hg update -C importer hg merge default #now you have to resolve the merge conflicts
To run galaxy you should: run setup.sh, change the created universe_wsgi.ini by adding an adminstrative user. Whenever you want to run just start run.sh. You can backup the database by copying database/universe.sqlite. After erasing the database it will be recreated when you run run.sh. A nice tool for development with sqlite is sqliteman3. For production you will probably switch to a different RDBM-system.
Initialization
Galaxy does a lot during startup, esp. connecting to the db and setup of the orm. The importer does this like the scripts in the scripts directory. A relatively complete version is available at github as a gist.startup shell script
The startup shell script calls the actual python script.#!/bin/sh cd `dirname $0` python -ES ./importer/importer.py universe_wsgi.ini $@
python importer
The python script (which I don't include here - get it from github) first does some pythonpath mangling. From line 72 on it parses the configuration file (universe_wsgi.ini) and returns an ImportData object. Now our importer we can use the python object model that was created by the galaxy developers.
As an example, we create a user in the database and associate her with a role. To create an object, just instantiate it and call flush(). You also have to create the association table object.
def createUser(app, username, password): """creates a user and a private role ("r" + username) and associates them in the database""" password = sha.new( password ).hexdigest() user = app.model.User(username, password) user.flush() role = app.model.Role( name="r" + username, description="private role of " + username, type=app.model.Role.types.PRIVATE ) role.flush() userRoleAssociation = app.model.UserRoleAssociation(user, role) userRoleAssociation.flush()
We call createUser from the __main__ method. We have wrapped everything up in a transaction. You have to comment out the raising of the exception on line 8 in oder to save the user.
def __main__(): app = parseConfig() username = "name" + "1" session = app.model.session transaction = session.begin() try: createUser(app, username, "123456") raise Exception('test for rollback', 'test for rollback') transaction.commit() except Exception: transaction.rollback() print("error: rollback") user = app.model.User.filter_by(email=username).first() if not user: print "user not created" else: print "user created: " + user.email + " " + " ".join(map(lambda r: r.name, user.all_roles()))
Next
From here on its rather simple, but if you don't want to experiment yourself with the galaxy object model yet, the next part of the series will show you how to create libraries, folders and datasets and associate everything with nice forms so authorized users will be able to store comments for each dataset
It would be very nice of you to give me feedback: was it useful, wordy, superfluous etc..
you rule for doing this, and you will benefit 1000 people for every one who leaves a comment. on behalf of the 999 lurkers, thx.
AntwortenLöschenyour formatting renders well on google reader, and on the site.
where possible, stick to a single style for anything which the user needs to type. don't hide hide 'run setup.sh' in paragraph text, instead be explicit about 'sh setup.sh' vs './setup.sh'.