I recently survived the final year of a PhD. Yes, it can be done. Look! I’m actually handing it in and moving on with my life! (Sort of.)
Earlier this month I was at ASM 2018 in Brisbane, which featured a lot of great microbiology! I always love spending time in Brisbane, but this time I was giving a talk at 5:20pm on my birthday… at least it was a productive birthday!
This is the final year of my PhD, so I’m going to become a bit of a hermit and will probably be blogging less.
When learning bioinformatics, you may need to create or edit text files, shell scripts or Python scripts from the command line. Using a Unix-based text editor is also good practice for getting used to the environment if you are new to the command line. In my experience, most people have a preference for nano, emacs or vim. I started with nano, as it is quite straightforward to use, but then I moved to vim (probably because it had lots of colours).
I’m not sure if there’s a name for people who thought they would be doing lab work for the rest of their lives and then find themselves thrust into the deep end of bioinformatics, but I am one of them. This seems to be a common occurrence in research labs, and will probably continue to be until undergraduate programs catch up with the bioinformatics skills required in many fields of research. Fortunately, I quite enjoy the stuff, but I am continually learning new things, and because much of it is self-taught, learning and then running each analysis can be a long process.
In my last blog post, I described how I use R Markdown as a tool in my research to document the analyses I do. I find this very useful for keeping a record of the mass of troubleshooting and trial and error I do when I start a new analysis, but when it came to having a neat record of the final pipeline for someone else to read (and hopefully understand), I needed to create something tidier.
Complete, neat and thorough documentation of our research is something that we probably all aim to achieve. In the wet lab, lab notebooks are essential and some labs are migrating to online versions like LabArchives. (Update 27-9-17: LabArchives now supports Markdown syntax!) For bioinformaticians, documentation of code commonly goes on GitHub. However, as a biologically trained student entering the realm of bioinformatics, it was not always clear how best to document my analyses, as this most often involved using commands to run other people’s code rather than writing my own. I moved to writing short bash scripts to run different tools, but there was still an awful lot I wanted to write down regarding what I was learning as I went, not to mention the importance of recording everything that went wrong.
The unfortunate thing about dealing with large volumes of sequencing data is that the analysis of this data can take a lot of time. I’m okay with computational bottlenecks if I’m going to see exciting results once it’s finished running. When it’s decompressing the raw data so you can start? Not so much.
I recently submitted my first manuscript (AHHHHH) and was required to submit our raw sequencing data to an online repository. There were several I could choose from, but I decided to upload to the NCBI Sequence Read Archive.