Text editing from the command line with vim

When learning bioinformatics, you will probably need to create or edit text files, shell scripts or Python scripts from the command line. Using a Unix-based text editor is also good practice for getting used to the environment if you are new to the command line. Most people I have met have a preferred editor, usually nano, emacs or vim. I started with nano, as it is quite straightforward to use, but then I moved to vim (probably because it had lots of colours).

How I teach myself new bioinformatics tools

I’m not sure if there’s a name for people who thought they would be doing lab work for the rest of their lives and then found themselves thrust into the deep end of bioinformatics, but I am one of them. This seems to be a common occurrence in research labs, and will probably continue to be until undergraduate programs catch up with the bioinformatics skills required in many fields of research. Fortunately I quite enjoy the stuff, but I am continually learning new things, and because much of it is self-taught, it can be a long process to learn and then run each analysis.

Using Bookdown for tidy documentation

In my last blog post, I described how I use R Markdown as a tool in my research to document the analyses I do. I find this very useful for keeping a record of the mass of troubleshooting and trial and error involved in starting a new analysis, but when it came to having a neat record of the final pipeline for someone else to read (and hopefully understand), I needed to create something tidier.

How I use R Markdown to document my bioinformatics analyses

Complete, neat and thorough documentation of our research is something that we probably all aim to achieve. In the wet lab, lab notebooks are essential, and some labs are migrating to online versions like LabArchives. (Update 27-9-17: LabArchives now supports Markdown syntax!) For bioinformaticians, documentation of code commonly goes on GitHub. However, as a biologically trained student entering the realm of bioinformatics, it was not always clear to me how best to document my analyses, as they most often involved using commands to run other people’s code rather than writing my own. I moved to writing short bash scripts to run different tools, but there was still an awful lot I wanted to write down about what I was learning as I went, not to mention the importance of recording everything that went wrong.

Parallel gzip compression with pigz

The unfortunate thing about dealing with large volumes of sequencing data is that analysing it can take a lot of time. I’m okay with computational bottlenecks if I’m going to see exciting results once the job has finished running. When the bottleneck is decompressing the raw data just so I can start? Not so much.
