10 nuggets of knowledge that I picked up from “Code Complete”

My notes and my Code Complete book

You could probably find “Code Complete” by Steve McConnell in any of the top-10 lists of books recommended for all programmers. I recently finished reading that book too. I have to confess that at times the tips and knowledge in there were a bit too basic and fundamental but, on the other hand, I also think that it contains a lot of golden nuggets of knowledge.

While reading it, I was writing down notes after every section. Essentially, these were the three or four ideas that I thought were the most important in each section. Sometimes they were paraphrased from the conclusions that the author himself wrote at the end of each section; in other cases, I summarised them in my own words.

Note that some of these things I knew already, obviously. However, I think that they are still very important for all programmers to know, regardless of their skill level. Let me know what you think of it and whether you picked up any other things from that book! Here is my assorted list of 10 ideas that I got from this book:

  1. Defects introduced in the requirements/architecture stage are very costly. In general, the later an error is detected, the more you will have to pay in terms of money or time. For example, if you only notice a requirements or architecture problem during the implementation stage, you have to go back a long way: not only do you have to redo your requirements/architecture, you also have to redesign your code and essentially start almost from scratch. That is also why code reusability is paramount. (Chapter 3)
  2. Software development is a heuristic process – there is no methodology that works for all cases. That is why prototyping is important – do not be afraid to write small pieces of code for testing something out. (Chapter 5)
  3. The imperative of software design is to reduce complexity; a design should be rethought or thrown away if it does the opposite. Keep accidental complexity to a minimum. Essential complexity is inevitable. Learn to tell which is which. Try at least a few designs before settling on the final one. (Chapter 5)
  4. Make sure that related statements are in groups, close together. Relatively independent groups of statements should be moved into their own functions. Also, code should be written to be read from top to bottom. That’s why early exits are important, IMHO. (Chapter 14)
  5. Consider jump tables. They offer a good opportunity to simplify complex code with a lot of conditional statements. They can be index-based or staircase-based (when ranges of values are used rather than exact values, so you would otherwise have to duplicate entries). Think about whether the index calculation should go into a separate function instead of duplicating code (see the first sketch after this list). (Chapter 18)
  6. Testing by itself is not very effective. Consider combining multiple quality-assurance techniques according to your organisation's goals to achieve maximum effectiveness. Make quality objectives clear, because people will optimise for them. You should also formalise this process to make it even clearer. (Chapter 20)
  7. Use binary search with hypotheses to narrow down the search space of where the error might be. Understand the root of the problem before trying to fix it because you might introduce more defects while fixing it. Set the compiler to the pickiest level possible. It will save more time in the long run. Don’t ignore the warnings. The compiler is your friend. (Chapter 23)
  8. Do not stop with the first code-tuning technique; there almost always exists a better one. Move expensive operations out of loops. Always benchmark your changes because results vary wildly depending on a plethora of variables. Apply optimisation with care: readability and maintainability are still paramount. As Sir Tony Hoare said: “premature optimisation is the root of all evil”. (Chapter 26)
  9. Consider rewriting a routine/function if its decision count is more than 10. Make boolean checks positive; do not use double negatives. Write boolean comparisons to follow the number line so that they read naturally from left to right (see the second sketch after this list). (Chapter 19)
  10. Always consider whether arrays are really the right choice. Research shows that programs using higher-level container data structures – queues, stacks, et cetera – had fewer bugs. Use enumerated types instead of constants because they enforce more type-checking and thus make the program more correct. Abstract data types should be oriented as much as possible towards their functional purpose; don’t create ADTs just to store arbitrary data. (Chapter 12)
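
To make the jump-table idea from point 5 more concrete, here is a small Python sketch (the examples are my own, not from the book). The index-based table maps a value directly to a table entry, while the staircase-based table walks a list of range boundaries so that each range appears only once:

# Index-based jump table: the month number maps directly to an index,
# replacing a chain of twelve if/elif branches.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def days_in_month(month):
    return DAYS_IN_MONTH[month - 1]

# Staircase-based table: each entry covers a range of values,
# so every boundary is written once instead of duplicating entries.
GRADE_STEPS = [(90, 'A'), (80, 'B'), (70, 'C'), (60, 'D'), (0, 'F')]

def grade(score):
    for lower_bound, letter in GRADE_STEPS:
        if score >= lower_bound:
            return letter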
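
And a tiny sketch for the number-line ordering from point 9 (again my own illustration, not from the book): when a value must lie inside a range, writing the variable between its bounds makes the check read naturally from left to right:

MIN_AGE, MAX_AGE = 18, 65

def is_working_age(age):
    # Reads like the number line: MIN_AGE ... age ... MAX_AGE
    return MIN_AGE <= age <= MAX_AGE

# Compare with the harder-to-read form: age >= MIN_AGE and age <= MAX_AGE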

Why os.rename() Sometimes Does Not Work And Why shutil.move() Is The Savior?

Recently I ran into an issue where code like this fails:

import os

os.rename('/foo/a.txt', '/bar/b.txt')

Traceback (most recent call last):
File "", line 1, in 
OSError: [Errno 18] Invalid cross-device link

The documentation for the os module says that sometimes it might fail when the source and the destination are on different file-systems:

os.rename(src, dst, *, src_dir_fd=None, dst_dir_fd=None)

Rename the file or directory src to dst. If dst is a directory, OSError will be raised. On Unix, if dst exists and is a file, it will be replaced silently if the user has permission. The operation may fail on some Unix flavors if src and dst are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement). On Windows, if dst already exists, OSError will be raised even if it is a file.

How could it fail? Renaming (moving) a file seems like such a rudimentary operation. Let’s try to investigate and find out the exact reasons…

Reason why it might fail

The fact that the rename function is inside the os module implies that it uses the facilities provided by the operating system. As the Python documentation puts it:

This module provides a portable way of using operating system dependent functionality.

In this case, it (probably – it depends on the implementation, obviously) uses the rename function from the C standard library, because that is what moving a file is, and that in turn invokes the rename system call. The call is even, in a way, defined in the POSIX collection of standards: there, rename is an extension of the rename function in the C standard, which sort of implies that an official system call exists as well. As the official text of POSIX says, rename can fail if:

[EXDEV] The links named by new and old are on different file systems and the implementation does not support links between file systems.

Even Linux does not support renaming across file systems, so this error message is not that uncommon. As the rename(2) manual page says:

EXDEV oldpath and newpath are not on the same mounted filesystem. (Linux permits a filesystem to be mounted at multiple points, but rename() does not work across different mount points, even if the same filesystem is mounted on both.)
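
You can observe this errno from Python directly. Here is a minimal sketch (the paths are placeholders for files on two different mounts) that catches the OSError and checks whether it is EXDEV:

import errno
import os

try:
    os.rename('/foo/a.txt', '/bar/b.txt')  # placeholder paths on different file systems
except OSError as e:
    if e.errno == errno.EXDEV:
        print('source and destination are on different file systems')
    else:
        raise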

The curious case of Linux

The Linux kernel has, as we guessed, an official system call for renaming files. Actually, it even has a family of system calls related to this operation: renameat2, renameat, and rename. Internally, they all call the function sys_renameat2 with different actual parameters. Inside of it, the code checks whether src and dst are on the same mounted file system. If not, -EXDEV is returned, which is then propagated to the user-space program:

/* ... */
        error = -EXDEV;
        if (old_path.mnt != new_path.mnt)
                goto exit2;
/* ... */

Then the error is returned:

/* ... */
exit2:
        if (retry_estale(error, lookup_flags))
                should_retry = true;
        path_put(&new_path);
        putname(to);
exit1:
        path_put(&old_path);
        putname(from);
        if (should_retry) {
                should_retry = false;
                lookup_flags |= LOOKUP_REVAL;
                goto retry;
        }
exit:
        return error;
}

This explicit check has been in this function forever. Why is it there, though? I guess Linux just took a simpler (and more elegant, should I say) path here and added this constraint from the beginning, just so that the code of the individual file systems would not have to account for this case. The code of the different file systems is complex enough as it is. You can find the whole source code of this function in fs/namei.c.

Indeed, old_dir->i_op->rename() is later called in sys_renameat2. old_dir is of type struct inode *, and i_op is a pointer to a const struct inode_operations. That structure defines a bunch of pointers to functions that perform various operations on inodes. Different file systems then define their own variable of type struct inode_operations and pass it to the kernel. It seems to me that it would indeed be a lot of work to make each file system's rename() inode operation work with every other file system. Plus, how would any file system ensure future compatibility with other file systems that the user could add by loading some custom-made kernel module?
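
As an aside, from user space you can roughly approximate this check by comparing the device numbers that stat() reports for the source and the destination directory. This is only a heuristic of my own, not what the kernel actually checks (as the man page notes, rename() can fail with EXDEV even for the same file system mounted at two different points):

import os

def probably_same_filesystem(src, dst_dir):
    # Rough heuristic: st_dev usually differs for paths on different file systems.
    src_dir = os.path.dirname(src) or '.'
    return os.stat(src_dir).st_dev == os.stat(dst_dir).st_dev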

Fortunately, we can move files by other means, not just by directly calling the rename system call. This is where shutil.move() comes in…

Difference between os.rename() and shutil.move()

shutil.move() side-steps this issue by being a bit more high-level. Before moving, it checks whether src and dst reside on the same file system. The difference lies in what it does when they do not. os.rename() would blindly fall on its face here, but shutil.move() is “smarter”: it does not just issue the system call and expect it to succeed. Instead, shutil.move() copies the contents of the src file into the dst file, and afterwards the src file is removed. As the Python documentation puts it:

If the destination is on the current filesystem, then os.rename() is used. Otherwise, src is copied (using shutil.copy2()) to dst and then removed.

So not only does it copy the contents of the file, but shutil.copy2() ensures that the metadata is copied as well. This does present an issue of its own: the operation might get interrupted between copying the contents and removing src, so you could potentially end up with two copies of the same file. Thus, shutil.move() is the preferred solution to the original problem presented at the start of the article, but be wary of this possibility and make sure that your code handles that case if it might pose a problem to your program.
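
If the cross-device case matters for your program, a small defensive wrapper makes the fallback explicit. This is only a sketch of my own (shutil.move() already does the copy-then-delete fallback for you), but it shows the trade-off described above:

import errno
import os
import shutil

def move(src, dst):
    try:
        os.rename(src, dst)  # atomic when src and dst are on the same file system
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise
        # Cross-device move: copy contents and metadata, then remove the source.
        # If interrupted between these two steps, both copies may exist.
        shutil.copy2(src, dst)
        os.unlink(src)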