Data scientists are often cloistered from the rest of the software development world. We are noble purists who value elegant math and efficient logic. As a data scientist turned project manager, I have some advice for managing developers who work in domains unfamiliar to you. A few situations where this applies:
- You need an API-backed iOS app and you have no experience with web development or Objective-C.
- Your client needs your complex algorithm deployed as a user-friendly single-page web app, but you have no CGI, PHP, or Django experience.
- Your organization needs a machine-learning procedure deployed in SAS for federal compliance purposes, but you are a religious R user.
In each of these scenarios, it is necessary to hire experts in the right fields to deliver your product with any semblance of haste. Some pointers to follow!
User Interface Matters
Data scientists like me are often tempted to build bare-bones (but fully functional) software that looks unattractive and is not at all intuitive to use. I once made the mistake of assuming that users of a web app would understand how to enter a “tab-delimited table” in a large white HTML form field. We need to remember that “tab-delimited” is not in the average user’s vocabulary, and it is impractical to ask a user to type a .fwf (fixed-width file) by hand into a form field. We later fixed this by adding an “Upload Excel Spreadsheet” button with parsing routines to find and auto-extract tables.
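To give a rough idea of what “find and auto-extract tables” can involve, here is a minimal Python sketch. It is not the routine we shipped. It assumes the spreadsheet has already been read into a list of rows (any spreadsheet library can do that part), and the names is_blank and extract_largest_table are made up for this example. The heuristic is also an assumption: treat the longest run of consecutive non-blank rows as the table.

```python
from typing import List, Optional

Cell = Optional[str]


def is_blank(row: List[Cell]) -> bool:
    """A row is blank when every cell is None or only whitespace."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def extract_largest_table(grid: List[List[Cell]]) -> List[List[str]]:
    """Return the longest run of consecutive non-blank rows,
    trimmed to the columns that hold at least one value."""
    best: List[List[Cell]] = []
    current: List[List[Cell]] = []
    # The extra empty row at the end flushes the final run.
    for row in grid + [[]]:
        if is_blank(row):
            if len(current) > len(best):
                best = current
            current = []
        else:
            current.append(row)
    if not best:
        return []

    # Pad ragged rows so every row has the same number of cells.
    width = max(len(row) for row in best)
    padded = [list(row) + [None] * (width - len(row)) for row in best]

    # Keep only columns that contain at least one non-blank cell.
    used = [c for c in range(width)
            if any(not is_blank([row[c]]) for row in padded)]
    return [["" if row[c] is None else str(row[c]).strip() for c in used]
            for row in padded]
```

A sheet with a title row, a blank row, and then a three-row table offset by one column would come back as just the three table rows with the empty leading column dropped. Real spreadsheets are messier (merged cells, multiple tables, totals rows), so treat this as a starting point only.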
Additionally, visibly dated UIs look and feel cheap. Almost every business owner I have met is willing to spend the extra money for an attractive UI. It felt unnatural to me the first time I hired an Adobe Photoshop professional to draft a modern UI for an iPad app, but it turned out to be the best expense I made on that specific project. An update does not feel like an update if users see the same ugly interface.
Budget for Support
When hiring developers in unfamiliar languages, you as the project leader sacrifice some control over the quality and sustainability of a codebase. When striking agreements to develop and hand off software to the client, budget your time and money for incidental support. When striking work agreements with developers for the project, include provisions for future support. Enter these agreements knowing that there is a very high likelihood of an untimely failure that you will be expected to fix at a moment’s notice. If you are developing in an unfamiliar language, your developers also need to be available at short notice, because they are both the domain experts and the people most familiar with the codebase.
Preparing for failure is a less critical procedure when data scientists are working with other data scientists. More likely than not, any R programmer can debug any R program, any Python programmer can debug any Python program, and so on. In these situations, as long as you or one other team member is available to investigate the problem, it will be fixed in a timely manner.
Have Sharp Administrative Skills
As a project manager, you want to maximize ease of development and proprietary control over the project. In other words, you want to enable developers to work while still maintaining total and real-time control over the assets being created. This requires an array of administrative skills including:
- Cloud administration
  - Aim for familiarity with at least two cloud providers and at least one cloud CLI.
  - Avoid deploying on shared instances owned or leased by developers. Lease your own hardware at all costs.
  - Learn to manage networks and firewalls for different types of applications.
- Linux administration
  - User and permissions management are very important for securing servers with multiple developers and/or web-facing servers.
  - Package and dependency management are essential skills. Know which Linux distributions meet your needs.
- Git/GitHub administration
  - Know how to operate, analyze, and organize repositories, even if you won’t be making commits.
  - Assess the need for private Git servers, and be able to deploy them.
- Database administration
  - As with the cloud, avoid using database instances owned or leased by anyone other than yourself.
  - Understand the differences between database backups and file-system backups, and make frequent database backups.
  - Authorize users and set permissions with the same discretion you apply to Linux file systems.
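The difference between a database backup and a file-system backup is easier to see with a small example. The sketch below uses SQLite only because it ships with Python; your project may well run on a different engine with its own backup tooling. The function names backup_database and verify_backup are made up for this example. The point is that the database engine makes the copy itself, so the result is a consistent snapshot even if the database is in use, which a plain file copy of a live database does not guarantee.

```python
import sqlite3


def backup_database(source_path: str, backup_path: str) -> None:
    """Copy a SQLite database using the engine's online backup API
    (sqlite3.Connection.backup, available since Python 3.7)."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        # The engine copies pages itself, so the snapshot is consistent.
        source.backup(target)
    finally:
        target.close()
        source.close()


def verify_backup(backup_path: str) -> bool:
    """Run SQLite's built-in integrity check on the backup file."""
    conn = sqlite3.connect(backup_path)
    try:
        result = conn.execute("PRAGMA integrity_check").fetchone()[0]
        return result == "ok"
    finally:
        conn.close()
```

In practice you would schedule something like backup_database to run frequently, store the output on hardware you control, and check every backup before trusting it.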
Data scientists can do a number of things to improve their project management skills, and the pointers above are only a broad introduction to the world of DevOps. A thorough treatment of DevOps as a practice can seem overwhelming and foreign, especially for programmers who have not worked in enterprise settings. I encourage anyone interested to investigate DevOps and the more advanced tools used to maintain and deploy software at a large scale. For a practical treatment of such tools, I recommend The DevOps 2.0 Toolkit by Viktor Farcic.