Montag, 31. August 2009

Mappable Map

Back from vacation. This three liner creates a mappable map of the genome. It takes about a day on our new 16 core machine that I wanted to stress a little bit. splitter is of course from emboss. bowtie is used for quick exact matching.

BOWTIE="/projects/solexadst/bin/bowtie-0.10.1/bowtie"
BOWTIEINDEX="/projects/solexasrc/genomes/mouse/ncbi37_mm9/bowtie/mm9_bowtie"
FASTADIR="/projects/solexasrc/genomes/mouse/ncbi37_mm9/genome/fasta/"
for fasta in $(ls ${FASTADIR}*.fa);
do splitter $fasta raw::stdout -size 36 -overlap 35 -auto | $BOWTIE -r -v 0 -m 1 -p 16 $BOWTIEINDEX - | cut -f 1 | sort -n | awk -v fa=$(basename $fasta .fa) 'BEGIN{chr="chr" substr(fa,2)}{ if(NR==1){firstpos=$1;lastpos=$1};if($1 > lastpos + 1){ print chr "\t" firstpos "\t" lastpos; firstpos = $1 } lastpos = $1 }END{ print chr "\t" firstpos  "\t" lastpos}' > $(basename $fasta .fa).bed;
done

Sonntag, 16. August 2009

Adventures in the galaxy Pt.1

Introduction

Over a series of blogs called "adventures in the galaxy", I will show how to create a customized importer of data into galaxy.In case you don't know what galaxy is: its a web frontend for bioinformatic programs. The nice thing about galaxy is that it remembers the complete history of a data-set. This includes all the transformations together with the parameters and all the data-sets it has been combined with.

The builtin ability to import data lacked some features that we needed: association with metadata of the experiments, and presenting the data in a structured way. We also want the data to stay in its place. The main motivation of this blog is to have some documentation for my coworkers. Its my first blogging, so please be kind.

Basic setup

The developer site of galaxy seems to have moved to bitbucket galaxy-central. So if you want to submit bug reports or check the issues, go there.

In order to develop clone the repository, create a new branch, hack around and commit your changes. Then switch to the default branch, pull from upstream and merge the updated branch back into your own branch.

hg clone http://bitbucket.org/galaxy/galaxy-central galaxy_blog
cd galaxy_blog
hg branch importer #create branch
hg commit -m "created branch importer"
#...hack around
hg commit -m "useful changes"
hg update -C default #switch branch
hg pull
hg update -C importer
hg merge default #now you have to resolve the merge conflicts

To run galaxy you should: run setup.sh, change the created universe_wsgi.ini by adding an adminstrative user. Whenever you want to run just start run.sh. You can backup the database by copying database/universe.sqlite. After erasing the database it will be recreated when you run run.sh. A nice tool for development with sqlite is sqliteman3. For production you will probably switch to a different RDBM-system.

Initialization

Galaxy does a lot during startup, esp. connecting to the db and setup of the orm. The importer does this like the scripts in the scripts directory. A relatively complete version is available at github as a gist.

startup shell script

The startup shell script calls the actual python script.
#!/bin/sh
cd `dirname $0`
python -ES ./importer/importer.py universe_wsgi.ini $@

python importer

The python script (which I don't include here - get it from github) first does some pythonpath mangling. From line 72 on it parses the configuration file (universe_wsgi.ini) and returns an ImportData object. Now our importer we can use the python object model that was created by the galaxy developers.

As an example, we create a user in the database and associate her with a role. To create an object, just instantiate it and call flush(). You also have to create the association table object.

def createUser(app, username, password):
   """creates a user and a private role ("r" + username) and associates them in the database"""
   password = sha.new( password ).hexdigest()
   user = app.model.User(username, password)
   user.flush()
   role = app.model.Role( name="r" + username, description="private role of " + username, type=app.model.Role.types.PRIVATE )
   role.flush()
   userRoleAssociation = app.model.UserRoleAssociation(user, role)
   userRoleAssociation.flush()

We call createUser from the __main__ method. We have wrapped everything up in a transaction. You have to comment out the raising of the exception on line 8 in oder to save the user.

def __main__():
   app = parseConfig()
   username = "name" + "1"
   session = app.model.session
   transaction = session.begin()
   try:
      createUser(app, username, "123456")
      raise Exception('test for rollback', 'test for rollback')
      transaction.commit()
   except Exception:
      transaction.rollback()
      print("error: rollback")
   user = app.model.User.filter_by(email=username).first()
   if not user:
      print "user not created"
   else:
      print "user created: " + user.email + " " + " ".join(map(lambda r: r.name, user.all_roles()))

Next

From here on its rather simple, but if you don't want to experiment yourself with the galaxy object model yet, the next part of the series will show you how to create libraries, folders and datasets and associate everything with nice forms so authorized users will be able to store comments for each dataset

It would be very nice of you to give me feedback: was it useful, wordy, superfluous etc..