One of the biggest jobs of CTOs and Technical Project Managers is to answer:
What is the best software stack for building XYZ?
On the web…
Internet forums like Quora and Stack Overflow are littered with these kinds of questions. This is understandable, because committing to the wrong software stack can be deadly in the long term for any project.
In education…
Teachers and authors often have a strong preference for specific languages. They often position their language-of-choice as the one-size-fits-all solution to computer science. This is very prevalent in Python and JavaScript education.
In organizations…
Companies often hire for specific languages to create uniform teams. Adding a new language to a small or mid-size company’s stack can be chaotic and confusing for the team.
The Answer
The answer to the CTO’s question can vary wildly depending on a lot of internal and external factors. In this article, we will answer the CTO’s question assuming access to near-infinite resources. We will do a cursory categorization of over 1,000 open-source repositories from Google, Microsoft, Amazon, Apple, and Facebook to discover what the best language is for any given project. For engineers looking to learn new languages, this can be a helpful look at not only what a language can do, but where that language can perform at the top level. Spoiler: There is no do-anything language.
The Rules
- Start with a T-table of some common languages, including:
  - Python
  - Java
  - C/C++
  - JavaScript
  - PHP
- Scroll top-to-bottom on recent open-source repositories from Google, Amazon/AWS, Facebook, Apple, and Microsoft, round-robin style.
- If a language appears as the backbone language for an un-added topic, add the topic in the corresponding column. Ignore company-specific languages in repositories created by the parent company. For example, a project where Google uses Objective-C is eligible to be added, but a project where Google uses Go is not eligible to be added. Company-specific languages include:
  - Hack (Facebook)
  - TypeScript and C# (Microsoft)
  - Go (Google)
  - Objective-C and Swift (Apple)
- Stop when the T-table fills up. Condense as necessary to make a semi-attractive infographic.
- Add outlier languages to the T-table if appropriate.
- Ignore language-native frameworks (Laravel is a PHP framework written in PHP, React is a JS framework written in JS, etc.)
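The rules above amount to a small filtering procedure, which can be sketched in Python. This is only an illustration under assumptions, not the process actually used for the tables (that was done by hand): the repository records, the `build_table` and `round_robin` names, and the `is_native_framework` hook are all hypothetical, and the sample entries are made up. The sketch also assumes a topic counts as "un-added" per language column, which the rules leave open.

```python
from itertools import zip_longest

# Company-specific languages, taken from the rules above.
COMPANY_LANGUAGES = {
    "Facebook": {"Hack"},
    "Microsoft": {"TypeScript", "C#"},
    "Google": {"Go"},
    "Apple": {"Objective-C", "Swift"},
}


def round_robin(repos_by_company):
    """Yield (company, repo) pairs, taking one repo from each company in turn."""
    companies = list(repos_by_company)
    for row in zip_longest(*(repos_by_company[c] for c in companies)):
        for company, repo in zip(companies, row):
            if repo is not None:  # this company has run out of repos
                yield company, repo


def build_table(repos_by_company, is_native_framework=lambda repo: False):
    """Return {language: [topics]} following the rules sketched above.

    Each repo is a hypothetical dict with "language" and "topic" keys.
    """
    table = {}
    for company, repo in round_robin(repos_by_company):
        language, topic = repo["language"], repo["topic"]
        # Skip a company's own language in that company's repositories.
        if language in COMPANY_LANGUAGES.get(company, set()):
            continue
        # Skip frameworks written in the language they target.
        if is_native_framework(repo):
            continue
        # Assumption: a topic is only "un-added" relative to its own column.
        if topic in table.get(language, []):
            continue
        table.setdefault(language, []).append(topic)
    return table


# Made-up entries for illustration only, not data from the survey.
sample = {
    "Google": [
        {"language": "Go", "topic": "container tooling"},
        {"language": "Objective-C", "topic": "iOS utilities"},
    ],
    "Facebook": [
        {"language": "OCaml", "topic": "static analysis"},
        {"language": "Hack", "topic": "web backend"},
    ],
}
table = build_table(sample)
print(table)
```

In this sample, the Go and Hack entries are dropped because each sits in a repository owned by the language's parent company, while Google's Objective-C entry is kept.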
The Results
I got through about 250 repositories across the five companies before there was nothing but repeats. The results were very interesting. See the tables below.
Discussion
There are a lot of misconceptions to debunk using these tables. I’ll touch on some of the biggest points and leave the remaining discussion to the comment section (which I will respond to).
Do They Like Using their Own Languages?
It is interesting to note the C# and PHP tables are empty, and the Objective-C and Go tables are nearly empty. Hack, TypeScript, and Swift don’t even have tables. We know that Microsoft built C# and TypeScript, Facebook built Hack and loves PHP, Apple built Objective-C and Swift, and Google built Go. Cool, so why don’t they have entries here?
I found that most of the repositories supporting these languages were libraries written in other languages. So, while Apple is pushing Swift and creating a ton of libraries, documentation, and support for it, they aren’t building open-source tools with it themselves. When they have the choice, they will use the much more mature and flexible Objective-C language. The Go language has tons of libraries and excellent support, but when Google needs something done, they have it written in C++. The same pattern prevails for each company that creates its own language.
I was most surprised by Microsoft’s lack of open-source tools in C#, given that the language has been around since the year 2000. I may be missing some very important legal or procedural reason for this, but given the pattern we see among similar companies, all I can assume is that their engineers prefer other languages.
In case anyone was curious, the single original Go repository was a cloud administration suite created by AWS.
Wrapping up this subsection, we should remember that we have no idea what these companies use internally for closed-source applications. Still, my gut feeling is that these company-built languages are underutilized by the companies that created them.
Python Doesn’t Crunch Numbers
Big companies use Python as a scripting language in the strictest sense of the word. Code academies and universities often teach Python as a tool for data science. R and Python are constantly in a war for data science supremacy (though Python has gained a strong lead recently). One would think that Python has been used to build some serious tools for data science, machine learning, parallel computing, and general number crunching. It is used in all these applications, but thanks to highly optimized libraries written in C/C++, not thanks to native Python code.
When big companies want number crunching and data processing algorithms built, they skip over Python and go straight to C/C++.
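To make the distinction concrete, here is a minimal sketch comparing an explicit Python loop with the built-in `sum()`, which in the standard CPython interpreter is implemented in C. The function names and the input size are my own choices for illustration, and the timings will vary by machine, so none are quoted here.

```python
import timeit


def loop_total(n):
    """Add up 0..n-1 with an explicit loop executed by the interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_total(n):
    """Same result, but the loop runs inside the C-implemented built-in sum()."""
    return sum(range(n))


if __name__ == "__main__":
    n = 1_000_000
    # Both approaches must agree before timing them is meaningful.
    assert loop_total(n) == builtin_total(n)
    for fn in (loop_total, builtin_total):
        seconds = timeit.timeit(lambda: fn(n), number=5)
        print(f"{fn.__name__}: {seconds:.3f}s for 5 runs")
```

On CPython the built-in version typically finishes noticeably faster, which is the same effect, on a tiny scale, that the C/C++-backed data science libraries rely on.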
Python is for Speed-Insensitive Logic
Big companies turn to Python when they need to write complex logic for speed-insensitive applications. All of the entries in the Python section of the table are oriented toward logic, administration, and testing. None of them do a lot of data processing or number crunching. For example, Google’s MacOps library contains some incredibly detailed and complex logic regarding Mac fleet administration, but at no point does it do a million-iteration for loop. The code is long and advanced, but the computer executes the programs extremely quickly, oftentimes skipping huge chunks of code hidden behind if-else statements.
It’s Hard to Write Python in Silicon Valley
Based on this analysis, I imagine it is very hard for a pure Python programmer to find work in Silicon Valley. In other words, if Python is your lowest-level language, there won’t be many positions available. While Silicon Valley definitely uses Python (it is the second-biggest table in the infographic), most of the tasks they perform in Python are very advanced. We have topics in there like stress testing, hardware security, fleet administration, and compliance testing. It looks like those using Python in Silicon Valley know much, much more than just Python.
This is interesting to me, because I know many consultants who make great livings working purely in Python. I never considered them to have any significant weaknesses in their tech education. It seems that, if they were required to play at this level, they would have to drop Python and pick up C/C++.
Java Isn’t Evolving
We know that Java is huge, and it isn’t going anywhere anytime soon. It is extremely popular internally at Google and Amazon. Still, we simply aren’t seeing a lot of new technology built in Java, and most of what we are seeing is Android-based. Things that are done at this high level in Java are simply a subset of what is done in C/C++, with the addition of Android-ware. On this note, I would say a pure C/C++ programmer looking to pick up a second language should learn anything but Java.
C/C++ Still Rules
I have praised computer science programs in the past for dropping Java and picking up Python in their curricula. In light of these tables and my recent use of C/C++, I would suggest a combined Python and C++ curriculum for the best job outcomes. C/C++ is the only language big companies consider for crunching numbers and processing data. Fortran may be faster sometimes, but still, C/C++ is the language of choice for these companies. After perusing these repos, I would feel naked working at Google without a C++ background.
Facebook Loves OCaml
I do not know anything about OCaml, but Facebook seems to use it successfully for a number of projects involving language-agnostic code optimization. Worth researching some more.
JavaScript is the Open-Source UI Solution
JavaScript has a lot of transpilers that can deploy UI-driven applications on the web, OSX, iOS, Android, Windows, and Linux. This would normally require maintaining separate codebases in C#, C++, and Objective-C, or using some commercial framework like Qt. I think JavaScript’s open-source transpilers are an oft-ignored feature that makes it a very important language. The JavaScript and Node.js ecosystem seems alive and well. I encourage developers not to shy away from it in spite of its strange syntax.
Conclusion
I have wanted to do this analysis for a while, and I found it to be productive and rewarding. The tables pictured can change by the day depending on which open-source repositories are active within the selected companies, but the results are still instructive. Let me know any of your thoughts in the comments, and we will continue to build out the table.
Did you check https://github.com/dotnet ? Also https://github.com/Azure comes to mind.
You might find more C# there.
Great article, Chris. Very creative method to answer a very complex question.