Programming Languages
Back in the late ’80s I went on a job interview. The interviewer described with great pride the environment and tools they were using as some of the most advanced available at that time. Then she asked about the most powerful language I had ever worked with, to which I replied that I was adept at 80x86 Assembler, a low-level language that translates directly into machine language.
The interviewer was incensed that I would think something as archaic as assembler or machine language could do anything near as advanced as the modern language they used on their team! When she demanded I explain my reasoning I explained that everything a computer does is done in machine language. Every other language is simply a higher level façade that must be translated down to machine language. I explained that the most advanced language on the globe must eventually either compile down to machine language or utilize some method of interpretation to translate the higher level instructions into machine language. Either way, nothing happens on a computer if it doesn’t happen in machine language.
The interview ended abruptly and I was escorted out by the interviewer, who was still mumbling comments about my ignorance the whole way out. The situation was unbelievable and all I could do was laugh (after I left, of course).
So why bring this up? Because before we can have any discussion about the value of any language we need to first understand that NOTHING happens on a computer if the CPU doesn’t receive a machine language instruction! CPUs only understand machine language opcodes, absolutely nothing else (operands are just parameter data).
So at this point a number of you reading this will be caught in the same dilemma my interviewer had: the belief that your language is so much more advanced than machine language. Your language has inheritance, classes, polymorphism, code-first design features and a host of other object oriented capabilities specific to that language. Machine language couldn’t possibly do what your language does!
Remember, nothing happens on a computer without the CPU, and the CPU only speaks machine language opcodes. So machine language already does all of the things your language does; you just didn’t know it!
What’s in a language?
So hopefully a realization is sinking in that your language, whatever it is, is nothing more than a facade for what’s really happening under the covers. So what’s the point of all these different languages?
It’s far easier if you try thinking of a language as just another type of vehicle sharing the same road to a common destination.
When you drive on the road, are all vehicles exactly the same as yours? Probably not. Each has its own specific strengths and weaknesses. A Lamborghini will get you to your destination in style and speed but won’t be so helpful if you need to take your large dog to the vet! A tractor trailer could transport your large pet to the vet and more but would be impossible to maneuver on small streets. An SUV could also get your pet to the vet but won’t be as stylish or as fast as a Lamborghini, nor will it carry as much as a tractor trailer. Languages are exactly the same: collections of features and tradeoffs aimed at a specific use or goal. There is no perfect answer. No right or wrong, just a fit for your specific need or situation.
Language packaging
So the simplest of the features that we concern ourselves with is the execution method, or packaging. Although the education system will say there are only two execution methods, I’d offer that there are three: compiled, interpreted and what I call bytecode, a name I’ve appropriated from the name of its resulting executable file.
Originally, compiling a program meant that the commands in your program would be translated directly into CPU machine language for direct execution by the CPU. The key words here are “Translate to machine language”. In later years assembler and machine language fell out of fashion in favor of simpler tools and the word “compile” morphed to mean, “Translate to something else” which we’ll see later.
Compiling is something that is only meant to be done once per code change, to translate and package the code text into its final executable program for use by the CPU. Compilers usually also perform a code optimization pass during their compile. Because they can examine all of the code without having to execute it, compilers can get a sense of the programmer’s intention and decide the most efficient way to translate the code. So the advantages of machine level compilers are speed of execution, especially with a code optimization pass, and occasionally raw power. The disadvantage of these kinds of compilers is the need to compile a version for every different CPU you may be executing the code on. The file size may also be bulky compared to other options, which was a factor years ago as you’ll see.
Conversely, interpreter based programs can be thought of very much like the human language interpreters you see on TV. The text of the program is converted into a machine executable form on the fly while the program is running. This naturally carries implications. The interpreter’s perspective is only of the single program command it’s currently translating, so it can’t infer any particular intention on the part of the programmer and can’t really make any optimizations. Additionally, a permanent copy of the translation the interpreter has made is never kept, so the translation must occur every time the program is run. Code interpretation is therefore thought of as one of the worst packaging types from a performance perspective. It’s also one of the most portable, since a custom interpreter can be written to interpret the same program file on any number of different CPUs. That makes it possible to run the same program on different CPUs without any change to the program itself and without having to recompile and maintain different versions.
If you’re going to be running your program on very different CPUs and operating systems then this is important, but if the target audience is all Intel CPUs running Windows, this advantage becomes moot, leaving you with nothing but the execution performance cost.
The third packaging type, which I have named “Bytecode”, is a hybrid of the other two types. A bytecode program is compiled prior to use, just as a conventional compiled program is, and often has a code optimization pass, but no machine language instructions are recorded in the executable file. Instead, the program is translated into another “middle” language called “Bytecode” that the language developers literally made up when they designed the language. Bytecode usually consists of numeric codes for specific actions the program needs performed. This newly compiled file of custom, “made up” codes is then passed through a custom interpreter that translates those codes into machine language actions.
If you’re thinking they’ve added a redundant step, you’d be right. Java became well known for this and would cite the advantage that compiling to bytecode shrank the Java program considerably for faster downloading over the internet. Potential precompile optimizations and smaller physical program files are the ONLY advantages to this approach. Later Microsoft adopted this approach for their new .NET platform but added that their bytecode acted as an intermediate language, which allowed developers to develop in any language they wanted while ensuring that all of the programs and libraries could work together seamlessly, but these features were never embraced. As with Java, this approach offers the potential benefits of code optimization and reduced file size but beyond that is just an interpreter language with all of the same costs and advantages of that packaging type.
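The bytecode idea described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the opcodes PUSH, ADD and MUL and the functions compile_expression and run are invented for this illustration and do not correspond to the instruction set of Java, .NET or any other real platform.

```python
# Hypothetical numeric opcodes, made up for this sketch only.
PUSH, ADD, MUL = 1, 2, 3


def compile_expression(tokens):
    """'Compile' a postfix expression such as ['2', '3', '+'] into numeric codes."""
    bytecode = []
    for token in tokens:
        if token == "+":
            bytecode.append(ADD)
        elif token == "*":
            bytecode.append(MUL)
        else:
            # Anything else is treated as an integer literal: opcode, then operand.
            bytecode.extend([PUSH, int(token)])
    return bytecode


def run(bytecode):
    """Interpret the numeric codes one at a time using a simple stack."""
    stack = []
    pc = 0  # position of the next code to examine
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        # Work spent here deciding what to do is overhead that
        # code compiled to machine language does not pay at run time.
        if op == PUSH:
            stack.append(bytecode[pc])
            pc += 1
        elif op == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack.pop()


# (2 + 3) * 4 written in postfix form.
program = compile_expression("2 3 + 4 *".split())
result = run(program)
print(program, result)
```

The compile step happens once and produces a compact list of numbers. The run step still has to inspect every code before acting on it, which is the extra interpretation work discussed above.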
Most companies use their code on backend or internal implementations and have little or no need for CPU portability so interpreter based packaging is almost lost in today’s world. While I would never advocate throwing away a feature, the need for portability must be carefully weighed against the performance costs of other packaging types.
The optimization pass offered by compilers, even for use as bytecode, can be a nice boost to performance and is often very helpful in closing the high level language/low level language performance gap. But don’t be fooled: with the extra steps interpreters introduce, they can never match the direct execution of the machine language code that purely compiled languages like assembler, unmanaged C and C++ compile down to. It’s simple math really. Every step in machine language is an execution step, while many of the steps in an interpreter based program are wasted just trying to evaluate what needs to be done next. While your Java or C# interpreter is resolving the next piece of bytecode, programs compiled to machine language are already executing that next step.
Language Type
There are really only three language types, each of which could be further subdivided. For our discussion we’ll concentrate on the main two types, Procedural and Object Oriented. This is a current point of “Perception”. What I mean is that every developer wants to be seen as being a certain type of developer. Everyone wants to claim they’re OO developers and every tool claims to be an OO language. I’ve encountered Object Oriented "Assembler" programmers and while I can agree some aspects of OO design can fit anywhere, I think in many areas, especially interpreter only languages, the “Object Oriented” title is a huge stretch. I mention this because I’ll be doing some bubble bursting here so hang on to your hats.
The simplest and most direct way to describe procedural languages is to say they’re a step by step list of instructions. It really is that simple. Just like the turn-by-turn GPS in your car, procedural code is a program that lists a set of simple and direct steps the computer must perform to achieve the desired goal. Procedural languages tend to have minimal language structure. Some examples of procedural languages are Machine Language, Assembler, Basic, Visual Basic, SQL and most scripting languages with no compiler, to name a few.
Here’s the part some Object Oriented programmers won’t like: no program can execute without procedural code!
Which brings us to what an Object Oriented language is and for that I'd like to remind you that NOTHING runs on a computer without the CPU and the CPU only knows one language and it’s not Object Oriented.
An Object Oriented language is really a vast array of structural mechanisms to organize code and data into self-contained “Objects”. The actual work done in an Object Oriented program is done within a boundary called a method. Within that method is the code needed to perform tasks on the data. THAT code is entirely procedural. Put another way, if you remove ALL of the method code (which is procedural) from an Object Oriented program, you have a program that doesn’t do anything. No matter what, there is no escaping the need for a step by step list of instructions. That’s procedural. Everything else in an object oriented language is meant to define data and how one “thing” in the program relates to other “Things”.
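As a minimal sketch of the point above, consider the small class below. The Account class, its fields and its withdraw method are hypothetical names chosen for illustration. The class itself only organizes data; the body of the method is an ordinary step by step list of instructions.

```python
class Account:
    """Hypothetical class: the structure describes a 'thing' and its data."""

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

    def withdraw(self, amount):
        """The body of this method is plain procedural code."""
        # Step 1: reject amounts that make no sense.
        if amount <= 0:
            raise ValueError("amount must be positive")
        # Step 2: check that there is enough money.
        if amount > self.balance:
            return False
        # Step 3: update the data and report success.
        self.balance -= amount
        return True


account = Account("Pat", 100)
succeeded = account.withdraw(30)
print(succeeded, account.balance)
```

Delete the three steps inside withdraw and the class still describes an account, but the program no longer does anything with it.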
The big benefits we’re supposed to get from using Object Oriented languages are:
- Readability – The program should be easy to read and analyze because we put so much work into defining everything upfront.
- Maintainability – Because the program is so readable and well organized, we should be able to make changes in a very meaningful and strategic way without having to rewrite code throughout the program.
- Reusability – The program is so well organized that the code that defines all of the “Things” the program needs to work with can be used in other programs that may need to work with the same “Things” or even reused within the same program for other similar data elements.
Well, that was the dream! Naturally reality is a little different. Usually there’s far more debate than practice in the Object Oriented world. I worked with a PhD who taught Java for 15 years at a university before joining the programming world. Coding in his beloved Java, he wrote 12 versions of exactly the same method in his program, each one supporting a slight variation. The OO approach would have been to overload the same method for each data object, but he hadn’t organized his code around the data in that way. So I suggested the procedural approach, which is to write one method and add a special parameter to signal a change in the way the method needs to behave for each circumstance. He went with that, which reduced his 12 methods to one central method and vastly simplified any other variation he might need to make to that method in the code.
Suffice it to say, just because an Object Oriented language is in use by a well-informed Object Oriented programmer doesn’t mean anything is being done in an Object Oriented way. OO design is first and foremost a choice, but procedural methods are the natural tendency.
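The "one method plus a signalling parameter" approach from the story above might look something like the sketch below. It is not the colleague's actual code, which was Java and isn't shown here; the export_row function, its variant parameter and the three variants are hypothetical stand-ins.

```python
def export_row(row, variant="csv"):
    """One central function instead of a near-identical copy per variation."""
    # Shared steps that every former copy would have repeated.
    cleaned = [str(field).strip() for field in row]

    # The slight variation, selected by the extra parameter.
    separators = {"csv": ",", "tsv": "\t", "pipe": "|"}
    if variant not in separators:
        raise ValueError(f"unknown variant: {variant}")
    return separators[variant].join(cleaned)


row = [" widget", 12, "blue "]
print(export_row(row))
print(export_row(row, variant="pipe"))
```

Adding a thirteenth variation then means adding one entry to the lookup rather than copying the whole function again.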
The last language type is Script languages. Script languages have become very popular these days. They are either bytecode or interpreter languages meant for small jobs and usually carry a whole host of issues that aren’t immediately obvious. Script languages are usually chosen for their simplicity. They are often guilty of some puffery as they tout abilities usually associated with full languages while being unable to fully implement those abilities, or while needing special workarounds or techniques to perform tasks that are native in full languages. Python and R are the tools of choice in the Big Data and AI realms, even though something like Java could replace both script languages entirely in these realms and in others.
Script languages usually have no code optimizations at all but more than that, they tend to use very simple data storage mechanisms (variables), which impacts performance. Simplicity is never a free feature. It always comes at the cost of control, as the language makes gross assumptions on your behalf in order to simplify your experience. In Python, variable handling is grossly simplified in some areas vs other languages and vastly more complex in others. This all has a performance cost. Performance losses like this may seem trivial until one considers their impact on code that is iterating through millions of pieces of data, as in the case of big data and AI applications.
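One way to see the storage cost described above is to compare how much memory a single whole number takes in an ordinary Python list with the same number in a packed array. This is a rough sketch that assumes the standard CPython interpreter; the exact sizes are implementation details and will differ elsewhere. The names count, boxed and packed are illustrative only.

```python
import sys
from array import array

count = 1000

# An ordinary list holds references to full integer objects.
boxed = list(range(count))

# array("q") stores raw 8-byte signed integers side by side.
packed = array("q", range(count))

# Size of one integer object versus one packed array slot, in bytes.
bytes_per_boxed_int = sys.getsizeof(boxed[-1])
bytes_per_packed_int = packed.itemsize

print("bytes per integer object:", bytes_per_boxed_int)
print("bytes per packed integer:", bytes_per_packed_int)
```

On CPython the integer object is several times larger than the packed value, and the list additionally stores a reference to each object. Multiply that by millions of items and the convenience has a visible price.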
Putting it all together
I could spend hours going over the advantages and disadvantages of each language. I’ve tried to give you a quick overview of the really big ticket items to look out for. The most important things to remember are:
- There is no magic bullet, everything has value and everything has a cost.
- The industry is full of technology evangelists and misinformation which is often just fodder to support a misguided point or bash a tool that someone would be unable or unwilling to work with. Pick your sources carefully and decide for yourself.
Over the years I’ve literally been asked by developers not to mention the advantages of certain languages because they didn’t want to have to learn them. In one case while working for one of the big banks, I came under political attack by a Lotus Notes developer for advising management that a Java/SQL solution would offer some advantages over their current VB5 flat database solution. Decide for yourself.
There is no such thing as a shortcut in development. For every second you save on your development time, there are recurring hours waiting for you on the other side of your delivery. This is something I like to call “Technical Debt”. It’s the gift that keeps on giving! Remember, time is immutable; you can't change it, but you can alter where in a process you spend or waste it: during design and construction, or during regular usage?
Remember that your tools are only as strong as their weakest link. This may sound obvious but if you look at Microsoft languages you’ll find they usually provide everything in a single development environment. This is something they’ve been doing since the late ’70s when I started. In fact developers were their primary customers back then. I still have my copy of Microsoft Visual Basic 4! Meanwhile tools like Java rely on a host of third party tools for use in the mainstream: tools like Eclipse by the Eclipse Foundation or IntelliJ IDEA by JetBrains for the programming IDE, Hibernate by Red Hat, Spring by Pivotal and then often a server platform like JBoss or something similar, as well as some sort of source code repository. Don’t get me wrong, I like Java as a language, but I don’t enjoy so many disparate pieces, each with its own versions and development state, bugs and quirks. More tools and more vendors mean more points of failure and more work to update and maintain. This has become so extreme in Java that tools like IntelliJ now come with features just for managing all of the third party tools and versions.
Finally, there’s the question of tool viability in the marketplace. Go to your favorite job board and do a search on the language you’re considering. How many job openings do you find for that language? If you find none, odds are you’ll not only have trouble hiring someone to do it but your current programmers may leave if you try to saddle them with an unmarketable language. But if you find tons of job postings for that language, it means the language is viable and in high demand. You shouldn’t have any shortage of applicants but you may have to put in some extra work to attract them to your side of the pond.