This past spring, I offered my services to a biology lab at the University of Virginia. This lab was pushing the limits on a few fronts, including malaria diagnosis, cell imaging, and DNA processing. My job was to help the lead researchers understand advanced programming concepts, organize code, and administer servers.
Setting Goals
I first met with researchers to establish goals. The goals themselves were relatively straightforward, but the steps to reach them required a little experience and a lot of planning. For example, we needed a pipeline to process and study our DNA data, but…
- The data is a few terabytes uncompressed, and we don’t have a computer with enough storage to hold it.
- The pipeline requires about a dozen dependencies, each of which requires a different version of C++ and Python.
- We are missing critical documentation, and we are producing code faster than we can document it.
The researchers had their goals, and they were going to keep pursuing them whether or not I was there to help. I knew I was only with them for a semester, so I set my own goals to help them in the long term, such as…
- Teach Git, set up a Git server, and encourage code-sharing and version control within the lab.
- Draft a protocol for creating minimal reproducible examples on big data projects.
- Create OS-independent programming environments to facilitate teamwork.
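As a rough illustration of the minimal-reproducible-example goal, one simple approach is to cut a small sample from the head of a large file, so a bug can be reproduced and shared without moving terabytes around. The sketch below is an assumption on my part rather than the lab's actual protocol: the function name `write_sample`, the gzip-compressed text format, and the default line count are all hypothetical.

```python
import gzip
from itertools import islice


def write_sample(source_path, sample_path, max_lines=4000):
    """Copy the first max_lines lines of a gzip text file into a smaller gzip file.

    Hypothetical helper: assumes the data is line-oriented, gzip-compressed text.
    Returns the number of lines actually written.
    """
    count = 0
    with gzip.open(source_path, "rt") as source, gzip.open(sample_path, "wt") as sample:
        # islice reads lazily, so the full file is never loaded into memory.
        for line in islice(source, max_lines):
            sample.write(line)
            count += 1
    return count
```

A small sample like this can be committed alongside the code that fails on it. Taking only the head of a file may not be representative of the full dataset, so it suits reproducing errors better than drawing scientific conclusions.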
Whether or not we met our short-term goals that semester, I knew it was important to teach technical project management (TPM) for the times when they didn’t have a TPM professional around.
Meeting Goals
Midway through the semester, we were working hard to meet these goals. To my delight, all of the researchers adapted very well to the new practices. To solve some of the technical problems, we ended up…
- Launching AWS servers with a few terabytes of Elastic Block Store (EBS) storage to hold the uncompressed DNA data.
- Discovering Anaconda channels with newer versions of many of the dependencies.
- Picking up Markdown on GitHub to make documentation and task management quick and convenient.
To my surprise, solving the technical issues they faced before my arrival also helped us reach my TPM goals for the lab. For example, Anaconda solved their dependency issues and gave them OS-independent programming environments. Likewise, writing more and better documentation introduced Git and GitHub as collaboration tools. I have one explanation for this: Best practices are “best” for a reason. Starting with the best practice helps us finish with fewer problems.
Conclusion
I am doubling down on my criticism of engineering curricula by lumping science curricula in with it. We have scientists doing potentially life-saving research getting hung up on basic code-sharing, dependency management, and server administration issues. The cutting edge is more quantitative than ever, so we should be introducing these concepts to scientists early in college. In the meantime, I encourage the life sciences to continue to cooperate with computer science departments to clear roadblocks and get the research out.