In my discussion of the role of data scientists, I reinforced that good data scientists are good software developers. When I talk about my role as a software developer separately from my role as a data scientist, I am highlighting my skills with…
- GUI Development (Python Tkinter, PyQt)
- Computational Efficiency
- Cloud Management & Server Administration
- Website, API & Mobile Development
- Technical Project & Dependency Management
These are skills I have learned in the pursuit of deploying and delivering analytics projects.
I frequently talk about how I started programming in the R language but quickly picked up more languages as new goals required them. A data scientist can be a competent analyst with only knowledge of the R language, but it limits his ability to deploy, communicate, and actuate his ideas.
If your program requires user input, there aren’t many options in pure R, plain and simple. The following case study illustrates this concept.
Case Study: Data Enrichment Pipeline
I was tasked with manually acquiring and enriching over 10,000 high-resolution pictures for a machine-vision database. I needed one set of personnel to upload specific pictures and another set of personnel to mark them up with advanced OpenCV tools. Both the inputs and outputs were complex, and each step required custom software.
I formulated my project requirements and got to work building my own data enrichment pipeline. My pipeline’s requirements:
- Accept uploaded pictures from photographers. Manage logs and payments due to the photographers.
- Filter, decompress, sort, and rename pictures uploaded by photographers. Back up the pictures.
- Distribute pictures to data enrichment personnel. Evenly distribute pictures and software.
- Accept uploaded metadata from personnel. Send personnel new workloads. Manage logs and payments due to the personnel.
- Deliver completed pictures and metadata to machine learning personnel.
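To make the "evenly distribute pictures" requirement concrete, here is a minimal Python sketch of one way it could be done: a round-robin assignment, so no worker ends up with more than one picture above any other. This is an illustration under assumptions, not the code from the original pipeline; the function name `distribute_evenly`, the file names, and the worker IDs are all hypothetical.

```python
from itertools import cycle


def distribute_evenly(pictures, workers):
    """Assign pictures round-robin so workloads differ by at most one."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignments = {worker: [] for worker in workers}
    # cycle() repeats the worker list; zip() stops when pictures run out.
    for picture, worker in zip(pictures, cycle(workers)):
        assignments[worker].append(picture)
    return assignments


if __name__ == "__main__":
    # Hypothetical file names and worker IDs, for illustration only.
    batch = [f"img_{i:05d}.jpg" for i in range(7)]
    for worker, workload in distribute_evenly(batch, ["worker_a", "worker_b", "worker_c"]).items():
        print(worker, workload)
```

Round-robin is the simplest fair split. A real system would probably also weigh how quickly each person finishes their current workload.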
A combination of languages allowed me to prototype this application in a day.
- Use PHP for emails and payment systems.
- Hook Dropbox’s API into Linux system calls for file management.
- Use the R language and R’s session persistence to manage job data and dispatch system calls.
- Use Python Tkinter and OpenCV to create OS-independent manual enrichment software.
- Use AWS CLI called through Bash, Cron, and R to manage networks.
This illustrates how in-depth knowledge of multiple languages can help accomplish tasks quickly. Eventually, I rebuilt the pipeline solely in Python to minimize dependencies, but the proof-of-concept was invaluable for the first few weeks of its use.
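As a rough idea of what a Python-only version has to cover, the filter, sort, rename, and back-up requirement can be sketched with the standard library alone. This is not the original code. The function name `ingest_uploads`, the accepted file extensions, and the naming scheme are assumptions made for illustration, and decompression is left out.

```python
import shutil
from pathlib import Path

# Assumed set of accepted image formats (hypothetical).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def ingest_uploads(upload_dir, archive_dir, prefix="img"):
    """Copy image files from upload_dir into archive_dir under sequential names."""
    upload_dir, archive_dir = Path(upload_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Filter out non-image files, then sort by name for a stable ordering.
    images = sorted(
        p for p in upload_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    renamed = []
    for index, source in enumerate(images, start=1):
        target = archive_dir / f"{prefix}_{index:05d}{source.suffix.lower()}"
        # Copy rather than move, so the original upload remains as a backup.
        shutil.copy2(source, target)
        renamed.append(target)
    return renamed
```

Calling `ingest_uploads("uploads", "archive")` would return the list of renamed copies, leaving the uploaded originals untouched.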
Software as a Stepping Stone
This enrichment pipeline, which I remain very proud of, was ultimately just a stepping stone in a broader data science project. The software and its 30+ users generated over half a terabyte of data used for advanced computer vision and machine learning projects.
Technical Literacy & Quality of Life
Even in light of my proudest achievements, software developers are often reduced to being "just good at computers." Programmers don’t typically like to hear this generalization, but I welcome it, because technical literacy improves corporate quality of life. Simple tasks like launching blogs, configuring SSL certificates, and securing desktop computers are second nature to software developers, but significant roadblocks to most people.
For example, I launched a charity server a few years ago to host blogs, resumes, and email for whoever needs them. The most significant blog hosted there is Father Michael Sliney’s Sliney.org. I offered to create this blog after being on Father Michael’s mailing list for years and realizing it would be helpful to catalog his great content. From there, I incrementally tuned the site for search engine discovery, and we started seeing a lot of traffic. Father Michael has made 700+ posts and reached hundreds of thousands of people through web content. Update: He was recently featured on Fox News speaking about the Catholic Church’s presence in Guam during the North Korea crisis.
When I have the time, I work on optimizing this server, resolving SEO issues for its users, and onboarding new services. It is an inexpensive way for my technical literacy to have a big impact on people I know who could use the help.
So, I don’t mind being just good at computers, because not everyone is.