Current Projects

Some of the programs have been packaged and published on the Python Package Index (PyPI). To install one of these tools, run the following command as root, replacing packagename with the name of the package:

pip install packagename

Next Generation Sequencing

NGS barcodes

This program can be used for the design and validation of NGS barcodes. It supports different metrics to assess the distance between barcodes. When designing barcodes for certain platforms, the length of mononucleotide stretches should also be limited; the program supports this as well.
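
As an illustration of what such a validation involves, here is a minimal sketch. It is not the program's actual code: the function names, the choice of Hamming and Levenshtein distance as example metrics, and the shape of the result are all assumptions made for this example.

```python
from itertools import combinations, groupby


def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Edit distance counting substitutions, insertions and deletions."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution or match
            ))
        previous = current
    return previous[-1]


def longest_stretch(sequence):
    """Length of the longest mononucleotide stretch in a sequence."""
    return max((len(list(run)) for _, run in groupby(sequence)), default=0)


def validate(barcodes, min_distance, max_stretch, distance=hamming):
    """Return the offending barcodes and barcode pairs (hypothetical helper).

    A 1-tuple marks a barcode with a stretch that is too long, a 2-tuple
    marks a pair that is too close. An empty list means the set passes.
    """
    problems = []
    for barcode in barcodes:
        if longest_stretch(barcode) > max_stretch:
            problems.append((barcode,))
    for a, b in combinations(barcodes, 2):
        if distance(a, b) < min_distance:
            problems.append((a, b))
    return problems
```

Passing distance=levenshtein instead of the default makes the check sensitive to insertions and deletions as well, which may matter on platforms where such errors are common.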

FASTA/FASTQ analysis and manipulation

FASTA and FASTQ are very popular formats for reference data and read data respectively. Manipulation of these files is often needed, and while there are lots of small scripts available, bundling them together in one tool kit still seemed like a good idea. This tool kit has the following functionalities:

  • Sanitising FASTA files. This can be useful when making a new reference sequence from other reference sequences.
  • Conversion between FASTA, FASTQ and GenBank formats.
  • Adding a sequence to the 3' end. This is useful in SAGE analysis.
  • Calculating the Levenshtein distance between two FASTA files.
  • Report the length of all records in a file.
  • Fragment all sequences with a restriction enzyme.
  • Remove mononucleotide stretches.
  • Conversion of quality formats.
  • Counting substrings while allowing for errors.
  • Select a fixed region in all records.
  • Retrieve a slice of a reference sequence.
  • Complement (not reverse complement) a FASTA file.
  • Generate a random FASTA file.
  • Retrieve a reference sequence.
  • Retrieve the sequence content.
  • Split a file in two based on a length threshold.
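
To give an impression of a few of the operations listed above (record lengths, complementing and splitting on a length threshold), here is a minimal sketch in plain Python. It is not the tool kit's own code; the function names and the choice to put records of exactly the threshold length in the "long" group are assumptions made for this example.

```python
import io

# Translation table for complementing (not reverse complementing) DNA.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def lengths(records):
    """Report the length of every record."""
    return [(header, len(sequence)) for header, sequence in records]


def complement(records):
    """Complement every record without reversing it."""
    return [(header, sequence.translate(COMPLEMENT))
            for header, sequence in records]


def split_by_length(records, threshold):
    """Split records into (shorter than threshold, threshold or longer)."""
    short, long_ = [], []
    for record in records:
        (long_ if len(record[1]) >= threshold else short).append(record)
    return short, long_


# Small in-memory example instead of a file on disk.
example = list(read_fasta(io.StringIO(">a\nACGT\nAC\n>b\nGG\n")))
```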

k-mer analysis

Raw datasets can be analysed based on the occurrence of substrings of a fixed length (k-mers). This program is a tool kit with a lot of functionalities. Please see the following article for more information.
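
The basic idea of counting substrings of a fixed length can be sketched as follows. This is only an illustration, not the program's implementation; the function name and the decision to skip k-mers that contain an N are assumptions made for this example.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every substring of length k over a collection of reads."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    counts = Counter()
    for sequence in sequences:
        sequence = sequence.upper()
        for i in range(len(sequence) - k + 1):
            kmer = sequence[i:i + k]
            # Assumption: k-mers with an ambiguous base are not counted.
            if "N" not in kmer:
                counts[kmer] += 1
    return counts
```

The resulting profiles of two datasets can then be compared, for instance by looking at the k-mers whose counts differ most.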

(M)pileup files

The (m)pileup format is widely used as an intermediate before variant calling, generating coverage plots, etc. The following program, named piletools, can generate several types of wiggle tracks from an mpileup file.

For RNA-Seq data analysis, we currently use split read aligners like GMAP or TopHat. The gaps introduced by the split reads also end up in the mpileup file, so they should be removed. Piletools has an option to remove these gaps, and also an option to show only these gaps. The latter may be handy for the analysis of splicing events.
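
The sketch below shows one way to separate gaps from real coverage. It is not the piletools code. It assumes a single-sample line as written by samtools mpileup (chromosome, position, reference base, depth, read bases, qualities), in which a reference skip is marked with > or < in the read bases column and is included in the depth column. The function names are made up for this example.

```python
def count_skips(bases):
    """Count reference skips (> or <) in an mpileup read bases string."""
    skips = 0
    i = 0
    while i < len(bases):
        if bases[i] == "^":
            # A read start marker is followed by a mapping quality
            # character, which may itself be > or <, so skip both.
            i += 2
            continue
        if bases[i] in "<>":
            skips += 1
        i += 1
    return skips


def split_depth(line):
    """Return (chromosome, position, depth without gaps, gap depth)."""
    fields = line.rstrip("\n").split("\t")
    chromosome, position, depth, bases = (
        fields[0], int(fields[1]), int(fields[3]), fields[4])
    gaps = count_skips(bases)
    return chromosome, position, depth - gaps, gaps
```

Writing the third value to a wiggle track gives coverage without gaps; writing the fourth value gives a track of the gaps only.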

For SAGE and CAGE analysis we are usually only interested in the mapping position, not in the actual coverage. Piletools is able to record only the 5' or 3' end of a read while also taking strandedness into account.
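
Taking strandedness into account means that the 5' end of a read on the reverse strand is its rightmost mapped position. A minimal sketch of this idea follows. It works on alignment coordinates rather than on the mpileup format, and the function names and the 0-based, end-exclusive coordinates are assumptions made for this example.

```python
from collections import Counter


def read_end(start, end, strand, which="5"):
    """Position of the 5' or 3' end of an alignment.

    start is inclusive and end is exclusive, both 0-based (assumption).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if which not in ("5", "3"):
        raise ValueError("which must be '5' or '3'")
    # The leftmost position is the 5' end on the forward strand and
    # the 3' end on the reverse strand.
    leftmost = (strand == "+") == (which == "5")
    return start if leftmost else end - 1


def end_profile(alignments, which="5"):
    """Count read ends per (strand, position) instead of full coverage."""
    return Counter(
        (strand, read_end(start, end, strand, which))
        for start, end, strand in alignments
    )
```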

Targeted characterisation of short structural variation

For the analysis of amplicons, but also short tandem repeats, the following program can be used. Please see this article for more information.

Calculating the distance between wiggle files

On this page you can find an implementation of an algorithm that calculates the "multiset distance" between two wiggle files. This package is no longer maintained, since its functionality has been incorporated in the wiggelen tool kit. See this article for more information.

Simulated reads

For testing purposes, it can be very handy to make a simulated dataset. The following program does just that. It has multiple modes of operation:

  • Generate simulated reads from a local reference sequence.
  • Generate simulated reads from a slice of a reference sequence. In this mode a list of variants in HGVS format can be supplied to incorporate mismatches, insertions, deletions, inversions, etc.
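
The two modes above can be illustrated with a small sketch. It is not the program's own code: the function names are made up, reads are sampled uniformly without sequencing errors, and only simple genomic substitutions such as g.3G>T are handled. The other variant types in the HGVS format are left out on purpose.

```python
import random
import re

# Only substitutions like "g.3G>T" are recognised in this sketch.
SUBSTITUTION = re.compile(r"^g\.(\d+)([ACGT])>([ACGT])$")


def apply_substitution(reference, variant):
    """Apply one HGVS-style substitution (1-based position)."""
    match = SUBSTITUTION.match(variant)
    if match is None:
        raise ValueError("unsupported variant description: " + variant)
    position, ref_base, alt_base = (
        int(match.group(1)), match.group(2), match.group(3))
    if position > len(reference) or reference[position - 1] != ref_base:
        raise ValueError("variant does not match the reference")
    return reference[:position - 1] + alt_base + reference[position:]


def simulate_reads(reference, read_length, number, seed=None):
    """Sample reads of a fixed length from uniformly chosen positions."""
    if not 0 < read_length <= len(reference):
        raise ValueError("read length must fit in the reference")
    rng = random.Random(seed)  # a seed makes the dataset reproducible
    reads = []
    for _ in range(number):
        start = rng.randint(0, len(reference) - read_length)
        reads.append((start, reference[start:start + read_length]))
    return reads
```

Applying the variants to the reference first and sampling reads afterwards yields a dataset in which the expected variant calls are known in advance.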

Variant databases

To store large quantities of variant calling data, the Diagnostic Variant Database (DVD) was created. This code is still maintained, but will probably be replaced with this implementation in the long run.

Mutalyzer

Mutalyzer is an extensive web-based service for checking descriptions of sequence variants against the Human Genome Variation Society (HGVS) nomenclature. Please see the site for more information.

We have published a BNF grammar as a formalised description of the HGVS nomenclature. For a full description of Mutalyzer, see the following article. Also see this news item.

Supervision

cli-mate

LeveDNA! blattery

phased-variants

GAPSS3

lims

Other

generations

poster

presentation

ngs-misc