Back from vacation. This three liner creates a mappable map of the genome. It takes about a day on our new 16 core machine that I wanted to stress a little bit. splitter is of course from emboss. bowtie is used for quick exact matching.
BOWTIE="/projects/solexadst/bin/bowtie-0.10.1/bowtie" BOWTIEINDEX="/projects/solexasrc/genomes/mouse/ncbi37_mm9/bowtie/mm9_bowtie" FASTADIR="/projects/solexasrc/genomes/mouse/ncbi37_mm9/genome/fasta/" for fasta in $(ls ${FASTADIR}*.fa); do splitter $fasta raw::stdout -size 36 -overlap 35 -auto | $BOWTIE -r -v 0 -m 1 -p 16 $BOWTIEINDEX - | cut -f 1 | sort -n | awk -v fa=$(basename $fasta .fa) 'BEGIN{chr="chr" substr(fa,2)}{ if(NR==1){firstpos=$1;lastpos=$1};if($1 > lastpos + 1){ print chr "\t" firstpos "\t" lastpos; firstpos = $1 } lastpos = $1 }END{ print chr "\t" firstpos "\t" lastpos}' > $(basename $fasta .fa).bed; done
Keine Kommentare:
Kommentar veröffentlichen