2017-05-20

On scripting languages

Teething rings (pacifiers) found on the great interwebz.

Foreword

Historically, the difference between scripting languages and real programming languages has been understood as the presence or absence of a compilation step. However, in recent decades the distinction has blurred; from time to time we have seen:

  • Interpreters for languages that were originally meant to be compiled.
  • Compilers for languages that were originally meant to be interpreted.
  • Scripting engines internally converting source code to bytecode before interpreting it.
  • Real languages compiling to bytecode which is then mostly interpreted and rarely converted to machine code.

So, compiled vs. interpreted does not seem to be the real differentiating factor between scripting languages and real programming languages. Nonetheless, we can usually tell a scripting language when we see one. So, what is it that we see?

(Useful pre-reading: About these papers)

I would suggest that the actual differentiating factor between real programming and scripting languages is nothing but the presence or absence of semantic checking. In other words, it boils down to strong vs. weak typing, or, to use the more precise terms, static vs. dynamic type checking.

TypeScript is the odd exception to the rule, and this is to be expected, because the impetus for the creation of TypeScript was vastly different from that of other scripting languages: scripting languages tend to be one-man efforts, and usually come into existence as nothing more than little toy projects. In contrast, TypeScript was the result of a deliberate group effort backed by a big company (Microsoft) starting with the realization that JavaScript is unfortunately here to stay, and setting out specifically to correct one of its major deficiencies, namely the lack of strong typing.

The trend of real programming languages to be compiled and of scripting languages to be interpreted can be explained in full as a consequence of the primary choice of strong vs. weak typing:

  • If a language is strongly typed, then semantic errors in the code are detectable, so a compilation step is very useful because it will locate this entire class of errors before the program is ever run.
  • If a language is weakly typed, then semantic errors are undetectable ahead of time, so there is little to gain from parsing code in advance. A compilation step could only reveal syntactic errors, which can also be detected by any halfway decent IDE.
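
To make the second bullet concrete, here is a minimal sketch in Python, chosen only as a familiar example of a scripting language; the `Account` class and the `close_account` function are hypothetical. The misspelled attribute sits in a rarely taken branch, so nothing complains until that branch actually executes, whereas a compiler for a strongly typed language would reject the equivalent line before the program ever ran.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance


def close_account(account):
    # Common path: nothing wrong here, and this is the path everyone exercises.
    if account.balance == 0:
        return f"closed account of {account.owner}"
    # Rare path: 'balanse' is a typo. Python accepts the source without
    # complaint and raises AttributeError only if this line is reached.
    return f"cannot close: {account.balanse} still outstanding"


print(close_account(Account("alice", 0)))  # the common path works fine

try:
    close_account(Account("bob", 25))
except AttributeError as error:
    print("found out at run time:", error)
```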

So, allowing for the exception of TypeScript, this leaves us with the following rule:

Real languages are strongly typed, meaning that they employ semantic checking.

Scripting languages are weakly typed, meaning that they lack semantic checking.

And yet, many people tend to like scripting languages, and actually write lots of code in them, supposedly because they are "easier". This brings to mind the famous quote by Edsger W. Dijkstra:

[...] some people found error messages they couldn't ignore more annoying than wrong results, and, when judging the relative merits of programming languages, some still seem to equate "the ease of programming" with the ease of making undetected mistakes.

(From Edsger W. Dijkstra, On the foolishness of "natural language programming".)

The original text in which the above quote appeared was about Natural Language Programming, but the particular passage from which the quote was taken refers to programming languages in general. Luckily (or naturally, if you will permit the pun), Natural Language Programming did not catch on, but little did ol' Edsger know that in the decades that followed his nightmares would come true in a different way, because scripting languages did catch on. Apparently, people love making undetected mistakes.

Arguments I keep hearing in favor of scripting languages

Argument: It is easy to write code in it; look, the "hello, world!" program is a one-liner.

Rebuttal: What this means is that this scripting language is a very good choice for writing the "hello, world!" program. The ease with which you may write "hello, world!" is no indication whatsoever of the ease with which a non-trivial system may be developed, tested, debugged, maintained, and extended. On the contrary, a scripting language which makes it possible for you to write "hello, world!" in a single line achieves this by introducing a few trade-offs: it offers built-in functionality without the need to explicitly import it, which in turn means that there will always be irrelevant identifiers polluting every scope; it does not require code to be placed in classes, which means that it is either not object-oriented, or it mixes paradigms; and it does not require code to be placed in functions, which means that either its syntax is trivial, or again, it mixes paradigms. The moment you write anything non-trivial, you will of course need to be able to import namespaces, to put all code in methods, and to put all methods in classes, so the fact that the language does not require them buys you absolutely nothing. (But you are still stuck with all scopes being polluted with unwanted stuff.)

Argument: No, I mean it is really terse. There are many things besides "hello, world!" that I can write in one line.

Rebuttal: Sure, you can write them in one line. But can you read them? One of the most important aspects of code is readability (second only to correctness), but terse code is not necessarily easy to read; if that were the case, Perl would be the most readable language ever, but instead it enjoys the dubious distinction of being the least readable among all widely used programming languages. Terseness usually represents a trade-off between verbosity and complexity. The more terse the code, the less of it you have to read, but also the harder it is to untangle its complexity. Thus, it is debatable whether terseness correlates with readability. Terseness appears to be the modern trend, so as real programming languages keep evolving they are also receiving features that make them more and more terse; take lambdas and the fluent style of invocations, for example. So, terseness is not the exclusive domain of scripting languages, and to the extent that scripting languages go further in this domain it is debatable whether it is an advantage or a disadvantage.

Argument: There are lots of libraries for it.

Rebuttal: Seriously? There are more libraries for your scripting language than there are for Java?

Argument: I don't have to compile it; I just write my code and run it.

Rebuttal: I also just write my code and run it. When I hit the "launch" button, my IDE compiles my code in the blink of an eye and runs it. The difference between you and me is that if I have made any semantic mistakes in my code, I will be told so before wasting my time trying to run it. But what am I saying, being told that there are semantic mistakes in your code probably counts as a disadvantage for you, right?

Argument: I can modify my program as it runs.

Rebuttal: I can also modify my program as it runs; the ability to do this is available in most real programming languages, and it is called "edit and continue" or "hot reload" depending on the language. Look it up.

Argument: I do not need type safety. I am better off without it.

Rebuttal: Right. So, you are the technological equivalent of an anti-vaxxer. (Credit: danluu)

Argument: I do not have to use an IDE, I can just use my favorite text editor.

Rebuttal: Oh sure. You are also the technological equivalent of an Amish farmer.

Argument: I am not worried about errors, because I use testing.

Rebuttal: Oh really? Are your tests achieving even a mere 60% code coverage as we speak? And supposing that they do, how do you feel about the fact that in the remaining 40%, every single line is liable to break due to reasons as trivial and yet as common as a typo? Testing is an indispensable quality assurance mechanism for software, but it does not, in and of itself, guarantee correctness. It is too custom-made, too subjective, and too fragmentary. You can easily forget to test something, you can easily test the wrong thing, and you can easily test "around" a bug, essentially creating tests that pretty much require the bug to be in place in order to pass. Despite these deficiencies, testing is still very important, but it is nothing more than a weapon in our arsenal against bugs. This arsenal happens to also include another weapon, which is closer to the forefront of the battle against bugs than testing is, and it is comprehensive, generic, 100% objective, and definitive. This weapon is called strong typing.
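
As an illustration of testing "around" a bug, here is a minimal sketch in Python; the function and its test are hypothetical. The last assertion was written by observing what the function returns rather than by consulting the calendar, so the test suite passes only for as long as the bug stays in place.

```python
def is_leap_year(year):
    # Bug: the century rule is missing (1900 was not a leap year).
    return year % 4 == 0


def test_is_leap_year():
    assert is_leap_year(2016)
    assert not is_leap_year(2017)
    # This expectation was copied from the function's actual output,
    # so it enshrines the bug instead of exposing it.
    assert is_leap_year(1900)


test_is_leap_year()
print("all tests pass, bug included")
```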

Argument: It has lots and lots of built-in features.

Rebuttal: Sure, and that's why scripting languages are not entirely useless. If the only thing that matters is to accomplish a certain highly self-contained goal of severely limited scope in as little time as possible, then please, by all means, do go ahead and use your favorite scripting language with its awesome built-in features. However, if the project is bound to take on a life of its own, you are far better off investing a couple of minutes to create a project in a real programming language, and to include the external libraries that will give you the same functionality in that language. Built-in features do not only come with benefits; in contrast to libraries, they are much more difficult to evolve, because even a minute change in them may break existing code, resulting in people being reluctant to migrate to the latest version of the language. Furthermore, built-in features usually have to be supported forever, even after better alternatives have been invented, or after they simply go out of style, so over time scripting languages tend to gather lots of unnecessary baggage. We have tried feature-bloated programming languages before (with Ada, for example), and the consensus is that they are not the way to go.

Argument: But really, it is so much easier! Look here, in one statement I obtain a list and assign its elements to individual variables!

Rebuttal: That's great, I bet this has slashed your time to market by half. What happens if the number of elements in the list differs from the number of variables that you decompose it into? I bet there is no error, because you don't want to be bothered with errors, right? Seriously, my compiled language of choice has its own unique, arcane syntax quirks which I could, if I wanted to, claim make things so much easier for me. Some of them are not even that arcane; for example, instead of using special notation within comments to indicate the actual types of arguments so that the IDE can parse those comments and provide me with some rudimentary argument-type checking, I get to simply declare the type of each argument right next to the argument name, as part of the actual syntax of the language! Imagine that!
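
As one data point, here is a minimal sketch in Python; the `load_point` helper is hypothetical. In this particular language a mismatch does produce an error, but only at the moment the statement executes, since nothing checks it beforehand. (In JavaScript, by contrast, destructuring an array that is too short silently leaves the extra variables `undefined`.)

```python
def load_point(raw):
    # Hypothetical helper: splits a string such as "3,4" into its parts.
    return raw.split(",")


x, y = load_point("3,4")          # two elements, two variables
print(x, y)

try:
    x, y = load_point("3,4,5")    # three elements, two variables
except ValueError as error:
    # Python notices the mismatch, but only when this line actually runs.
    print("found out at run time:", error)
```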

Argument: Scripting languages are trendy. They are hip.

No contest here. I can't argue with hipsters.

The problems with scripting languages

The nonsense

I don't need to say much here; just watch the legendary "Wat" video by Gary Bernhardt from CodeMash 2012, which is only 4 minutes long: https://www.destroyallsoftware.com/talks/wat

The reason for all this nonsense is that all these languages are hacks.

When the foundation that you are working on is a hack, then either anything you build upon it will in turn be a hack, or you are going to be putting in an enormous effort to circumvent the hackiness of the foundation in order to build something reasonable over it. Why handicap yourself?

The errors

Lack of semantic checking means that mistakes can be made, which will not be caught at the earliest moment possible, that is, during compilation, or preferably during editing in any decent IDE. Therefore, lack of semantic checking necessarily means that there will be more errors.

It is a given fact that a certain percentage of errors will always pass testing and make it to production, which in turn inescapably means that there will be a somewhat increased number of bugs in production.

This alone is enough to classify scripting languages as unsuitable for anything but tinkering, and the debate should be over right there, end of story.

The crippled IDE

Lack of semantic checking means that your IDE cannot provide you with many useful features that you get with strongly typed languages. Specifically, some or all of the following features are either limited or missing altogether:

  1. Context-sensitive argument auto-completion. Since any parameter to any function can be of any type, the IDE usually has no clue as to which of the variables in scope may be passed to a certain parameter of a certain function. Therefore, it has to suggest everything that happens to be in scope. Most of these suggestions are preposterous, and some even treacherous.

  2. Member auto-completion. Since a variable does not have a specific type, the IDE usually has no clue as to what member fields and functions are exposed by that variable. Therefore, either it cannot give any suggestions, or it has to suggest every single member of every single known type and the kitchen sink.

  3. Listing all usages of a type. Since any variable can be of any type, the IDE usually has no clue as to where a given type is used, or if it is used at all. Contrast this with strongly typed languages where the IDE can very accurately list all usages of any given type and even provide you with visual clues about unused types.

  4. Type-sensitive search. If you have multiple different types where each one of them contains, say, a `Name` member, you cannot search for all references of the `Name` member of only one of those types. You have to use text search, which will also return every unrelated member that happens to share the same name. This can be okay in tiny projects, but it very quickly becomes non-viable as the project size increases.

  5. Refactoring. When the IDE has no knowledge of the semantics of your code, it is incapable of performing various useful refactoring operations on it. IDEs that offer refactoring features on untyped languages are actually faking it; they should not be calling it refactoring, they should be calling it Cunning Search and Replace. Needless to say, it does not always work as intended, and it does sometimes severely mess up the code. (When this happens, it is called Search and Destroy.) Furthermore, since there is no compiler, you have no way of knowing that your code has been messed up until you try running it.
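
Items 2, 4, and 5 of the list above can be illustrated with a minimal sketch, here in Python with its optional type annotations; `Customer`, `Product`, and both functions are hypothetical. The two functions behave identically at run time; the only difference is how much a tool can know about them without running them.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str


@dataclass
class Product:
    name: str


def label(item):
    # No declared type: a tool cannot tell whether this 'name' belongs to
    # Customer, to Product, or to something else entirely, so renaming
    # Customer.name degenerates into a text search for "name".
    return item.name.upper()


def customer_label(customer: Customer) -> str:
    # Declared type: this 'name' is unambiguously Customer.name, so a
    # type-aware tool can list it as a usage and rename it safely.
    return customer.name.upper()


print(label(Product("kettle")), customer_label(Customer("alice")))
```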

That little performance issue

Performance is generally not an issue, because scripting languages tend to be used in situations where performance is not required. 

(There are of course some situations where people opt to use a scripting language despite the fact that performance matters, and in those situations people do in fact suffer the consequences of poor performance; take web servers written in Node.js, for example.)

In today's world where the majority of personal computers are running on precious battery power, it can be argued that even the tiniest bit of performance matters, but we can let that one slide, since battery technology is constantly improving.

In cases where performance matters but the task at hand is relatively isolated, performance is again not an issue for scripting languages because external libraries are quickly developed to handle those tasks. (These external libraries are written in guess what: real programming languages.)

Having explained that performance is usually not an issue, let us also quickly mention before moving on that on computationally expensive tasks, such as iterating over all pixels of an image to manipulate each one of them, and assuming a competent programmer in each language, the following things hold true:

  • there is no way that a scripting language will perform as well as Java, just as:
  • there is no way that Java will perform as well as C++, just as:
  • there is no way that C++ will perform as well as Assembly. 

Stop arguing about this.

The horrible syntax

Most scripting languages suffer from a severe case of capriciously arcane and miserably grotesque syntax. No, beauty is not in the eye of the beholder, and there is only a certain extent up to which aesthetics are subjective.  

The syntax of scripting languages tends to suffer for various reasons, the most common being:

  • Their priorities are all wrong to begin with.
  • They were hacked together in a weekend.
  • Plain incompetence on the part of their creators.

Scripting languages that have their priorities wrong are, for example, all the shell scripting languages. These languages aim to make strings (filenames) look and feel as if they are identifiers, so that you can type commands without having to enclose them in quotes, as if the convenience of not having to use quotes was the most important thing ever. If all we want to do in a shell script is to list a sequence of commands to execute, then this convenience is perhaps all we care for, but the moment we try to use any actual programming construct, like variables, what we have in our hands is a string-escaping nightmare of epic proportions.

A scripting language that owes its bad syntax to being hastily hacked together is JavaScript. Brendan Eich, its creator, has admitted that JavaScript was developed in 10 days, and that the language was not meant for anything but short isolated snippets of code. He is honest enough to speak of his own creation in derogatory terms, and to accept blame. (See TEDxVienna 2016, opening statement, "Hello, I am to blame for JavaScript".)  Also, pretty much anyone deeply involved with JavaScript will admit that it has serious problems. One of the most highly acclaimed books on the language is JavaScript: The Good Parts, authored by Douglas Crockford and published by O'Reilly; you can take the title of the book as a hint.

A scripting language that owes its horrific syntax to lack of competence is PHP. Its creator, Rasmus Lerdorf, is quoted on the Wikipedia article about PHP as saying "I don’t know how to stop it, there was never any intent to write a programming language […] I have absolutely no idea how to write a programming language, I just kept adding the next logical step on the way."

So, from the above it should be obvious that most scripting languages are little toy projects that were created by individuals who simply wanted to prove that they could build something like that, without actually intending it to be used outside of their own workbench.

The cheapness

The lack of semantic checking in scripting languages is usually not a deliberate design choice, but instead a consequence of the very limited effort that has gone into creating them. In many cases the creators of scripting languages would not know how to add semantic checking to the language even if they wanted to. In all cases, the amount of work required to add semantic checking would have been several orders of magnitude greater than the total amount of work that went into the creation of the language in the first place.

In this sense, the comparison between scripting languages and real programming languages is a lot like comparing children's tinker toy tools with tools for professionals: sure, a plastic screwdriver is inexpensive, lightweight and easy to use, but try screwing anything but plastic screws with it.  

(I was going to also add "you cannot hurt yourself with it", but this analogy does not transfer to programming: you can very easily hurt yourself with a scripting language.)

What scripting languages are good for

Scripting languages are useful when embedded within applications (applications written in real programming languages) as evaluators of user-supplied expressions, or as executors of user-supplied code snippets.

Scripting languages are useful when shortening the time from the moment you fire up the code editor to the moment you first run your program is more important than everything else. By "everything else" we really mean everything: understandability, maintainability, performance, even correctness.

Scripting languages are useful when the program to be written is so trivial, and its expected lifetime is so short, that it is hardly worth the effort of creating a new folder with a new project file in it. The corollary of this is that if it is worth creating a project for it, then it is worth using a real programming language.

Scripting languages are useful when the code to be written is so small and simple that bugs can be detected by simply skimming through the code. The corollary of this is that if the program is to be even slightly complex, it should be written in a real programming language. (Adding insult to injury, many scripting languages tend to have such capricious write-only syntax that it is very hard to grasp what any piece of code does, let alone skim through it and vouch for it being bug-free.)

Scripting languages are useful for getting non-programmers into programming as quickly as possible. Most of us programmers have had a friend who was not a programmer, and who one day asked us how to get into programming. The thought process should be familiar: you think about it for a moment, you start making a mental list of things they would need in order to get started with a real programming language, and you quickly change your mind and suggest that they try Python, because this answer stands some chance of fitting within your friend's attention span. However, the truth of the matter is that this recommendation will only save your friend maybe an hour of preparatory work, and it would be a crime if it condemned them to thousands of hours wasted over the course of a forty-year-long career due to the use of an inferior programming language. In this sense, scripting languages are a lot like teething rings (pacifiers): it is okay to start with one, but you must get rid of it as soon as you grow some teeth.

Conclusion

The fact that some scripting languages catch on and spread like wildfire simply shows how eager the industry is to adopt any contemptible piece of nonsense without any critical thinking whatsoever, as long as it helps optimize some short-sighted concern, such as how to get non-programmers into programming as quickly as possible. It is a truly deplorable situation that kids nowadays learn JavaScript as their first programming language due to it being so accessible to them: all you need is a browser, and one day instead of F11 for full-screen you accidentally hit F12 which opens up the developer tools, and you realize that you have an entire development environment for JavaScript sitting right there, ready to use. The availability of JavaScript to small children is frightening.

Usually, once a language becomes extremely popular, tools are created to lessen the impact of its deficiencies. Thanks to the herculean efforts of teams that develop scripting engines, and through all kinds of sorcery being done under the hood in these engines, the most popular scripting languages are considerably faster today than they used to be. However, the sorcery is not always applicable, even when it is applicable it is imperfect, and besides, it incurs a penalty of its own, so scripting languages will never match the performance of real programming languages. Also, modern IDEs have evolved to provide some semblance of semantic checking in some scripting languages, but since this checking has been added as an afterthought, it is always partial, unreliable, hacky, and generally an uphill battle.

So, you might ask, what about the hundreds of thousands of successful projects written in scripting languages? Are they all junk? And what about the hundreds of thousands of programmers all over the world who are making extensive use of scripting languages every day and are happy with them? Are they all misguided? Can't they see all these problems? Are they all ensnared in a monstrous collective delusion?

Yep, that's exactly it. You took the words right out of my mouth.

mandatory grumpy cat meme

Also read: michael.gr - Tablecloth - A short high-tech sci-fi horror story




Note: This is a draft. It may contain inaccuracies or mistakes. There are bound to be corrections after I receive some feedback.

---------------------------------------------------------------------

Scratch

See:

http://stackoverflow.com/questions/397418/when-to-use-a-scripting-language

From http://wiki.c2.com/?SeriousVersusScriptingLanguages

Scripting Languages emphasize quickly writing one-off programs

serious languages emphasize writing long-lived, maintainable, fast-running programs.

light-duty "gluing" of components and languages.

From https://danluu.com/empirical-pl/

“I think programmers who doubt that type systems help are basically the tech equivalent of an anti-vaxxer”

the effect isn’t quantifiable by a controlled experiment.

Misinformation people want to believe spreads faster than information people don’t want to believe.

https://stackoverflow.blog/2023/01/19/adding-structure-to-dynamic-languages

