Friday, January 8, 2016

Another Paper on Reproducible Research

Here is yet another paper on our lack of reproducibility. OK, we hopefully now "get it" and understand that we need to do a better job of reporting and publishing our protocols, including bioinformatics pipelines. Can we please start publishing guidelines and standards to adhere to? And then have journals enforce them?

So today I'm going to give three suggestions, drawn from my own experience, on how to better capture your pipeline.


Cartoon by Sidney Harris (The New Yorker)


1. Flow Charts
I often express my love and appreciation for flow charts. I love visuals. I love making lists. And I super love making a list into a visual - TaDa - flow chart. Look, it just helps me stay on track, and clients appreciate having something to look at while I verbally run them through the pipeline I am using with their data. Creating a flow chart can take some time (depending on how OCD you are and how much pride you take in your work before giving something to a client), but I truly believe that the impact on your work and your client relationship is worth the time you put into it.
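If you already keep your pipeline as a list, turning it into a flow chart can be partly automated. Below is a minimal sketch, assuming you are happy to render the chart with Graphviz: it converts an ordered list of steps into DOT text. The function name `steps_to_dot` and the step names are hypothetical, chosen only for illustration.

```python
def steps_to_dot(steps, name="pipeline"):
    """Turn an ordered list of step names into Graphviz DOT text."""
    lines = [f"digraph {name} {{", "    rankdir=TB;", "    node [shape=box];"]
    # One box per step, in the order given.
    for i, step in enumerate(steps):
        label = step.replace('"', '\\"')  # escape quotes inside labels
        lines.append(f'    s{i} [label="{label}"];')
    # One arrow from each step to the next.
    for i in range(len(steps) - 1):
        lines.append(f"    s{i} -> s{i + 1};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical steps; replace with your own pipeline.
    example_steps = ["Raw reads", "Quality control", "Alignment", "Report"]
    print(steps_to_dot(example_steps))
```

Save the printed text to a file such as `pipeline.dot` and render it with Graphviz (for example `dot -Tpng pipeline.dot -o pipeline.png`). This only handles a straight-line pipeline; branches would need explicit edges.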



2. Version Control
Oh yes, version control is all the rage. Look, it is no secret that different versions can give different results. Version releases are important. Releases clean up mistakes, decrease compute time, help maintain relevancy as technology changes, etc. But even a small release that changes how a number is rounded can result in huge changes in outcome. So write down your versions and MAKE NOTES when you upgrade them! This is incredibly important for anyone trying to replicate your results.
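Writing down versions by hand is easy to forget, so here is a minimal sketch of one way to automate it, assuming your pipeline depends on Python packages. It uses the standard library's `importlib.metadata` to look up installed versions and appends a dated record, with a free-text note, to a log file. The function names, the file name, and the package list are hypothetical.

```python
import json
import platform
from datetime import datetime, timezone
from importlib.metadata import PackageNotFoundError, version


def snapshot_versions(packages, note=""):
    """Collect the installed version of each package, plus the Python version."""
    record = {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "note": note,  # e.g. why you upgraded
        "packages": {},
    }
    for name in packages:
        try:
            record["packages"][name] = version(name)
        except PackageNotFoundError:
            record["packages"][name] = None  # not installed in this environment
    return record


def append_snapshot(path, record):
    """Append one record per line so the upgrade history is preserved."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    # Hypothetical package list; use whatever your pipeline imports.
    rec = snapshot_versions(["numpy", "pandas"], note="before upgrading pandas")
    append_snapshot("versions_log.jsonl", rec)
```

Command-line tools that are not Python packages are not covered here; for those you would still need to record the version each tool reports.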

3. Debugging
(scene set): You have identified a new, amazing bioinformatics tool. People are blowing up Twitter about it, Bioinformatics has published on it, etc. And you have a light bulb moment: Ah .. ha, maybe I can use this tool for my research, since my data are somewhat similar to what other people are using this tool for! So you start out making a new FLOW CHART to include this new tool. Should be easy, but all of a sudden you keep getting errors that you can't figure out (so you post on Reddit). You go to the documentation; nothing helps. You search all of the forums, but you either hit a dead end or find out that everyone else is having issues. You email the tool's contact only to find out that they have since been hired by SAS and just don't have the time to help you.


You are now frustrated (probably because someone has you on a timeline or wrongly thinks you aren't working since there are "no results") and you just know that if you could get this tool to work, the result will be INCREDIBLE (I don't know why, but I always think IT WILL BE INCREDIBLE). So you start debugging by trying everything you can possibly think of that could be wrong. This is where we generally stop documenting the process. If you are like me, you get some sort of sick kick out of debugging that is OBSESSIVE IN NATURE. You can't stop to write down what you just tried because your mind is already racing to the next trial. AND when you FINALLY get that thing to work (because you probably won't stop until you get it to bend to your every whim) you have the BEST HIGH EVER. And if you are anything like me, you'll start high-fiving yourself, anyone who is around you ... and you will immediately forget the details of the last several hours. BUMMER. But you don't care because you are still riding that high.


I know this about myself. I accept it. But I still need to document that process so that I don't have to repeat it (because several months later I might have to). PRO TIP: SCREEN RECORD THE PROCESS. I also like to include audio because it makes the videos more fun for me to watch later. Seriously, get some friends over later, have some drinks and put that video on. If you have geek friends or have an amusing narrative style, it will be the highlight of the party.
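For anyone who also wants a searchable text trail to go with the recording, here is a minimal sketch of a one-line-per-attempt debug log. It is an add-on to the screen recording idea, not a replacement for it, and the function names and log file name are hypothetical.

```python
from datetime import datetime, timezone


def format_attempt(what_i_tried, outcome, when=None):
    """Build one tab-separated log line: timestamp, attempt, outcome."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y-%m-%d %H:%M:%S UTC")
    # Keep each entry on a single line so the log stays easy to grep.
    tried = " ".join(what_i_tried.split())
    result = " ".join(outcome.split())
    return f"{stamp}\t{tried}\t{result}\n"


def log_attempt(path, what_i_tried, outcome):
    """Append one attempt to the log file and return the line written."""
    line = format_attempt(what_i_tried, outcome)
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line)
    return line


if __name__ == "__main__":
    # Hypothetical entry; the point is to capture it before moving on.
    log_attempt("debug_log.tsv", "reinstalled the tool from source", "same error")
```

Typing one line per trial is about as much documentation as a racing brain will tolerate, and the timestamps make it easy to line the notes up with the video later.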




